Columbia Photographic Images and Photorealistic Computer Graphics Dataset

  All look so photorealistic ... which one is really from a camera?

Introduction

Passive-blind image authentication is a new area of research, and a suitable dataset for experimenting with and comparing new techniques is important for its progress. In response to this need, the Columbia Photographic Images and Photorealistic Computer Graphics Dataset is made available to the passive-blind image authentication research community. The dataset is composed of four component image sets: the Photorealistic Computer Graphics Set, the Personal Photographic Image Set, the Google Image Set, and the Recaptured Computer Graphics Set. The dataset, available from the TrustFoto website, is intended for those working on the photographic images (PIM) versus photorealistic computer graphics (PRCG) classification problem, which is a subproblem of passive-blind image authentication. The technical report below describes the design and implementation of the dataset and also serves as a user guide.

Component Sets

The dataset consists of four sets of images, as shown in the above figure. A detailed description can be found in the technical report.

Figure 1: (a) Subcategories within the PRCG image set and (b) subcategories within the Personal image set. The numbers are image counts.
  1. 800 PRCG images from the Internet (PRCG): These images are categorized by content into architecture, game, nature, object and life; see Figure 1(a). They are mainly collected from 40 3D-graphics websites, such as www.softimage.com, www.3ddart.org and www.3d-ring.com. The rendering software used includes 3ds MAX, softimage-xsi, Maya and Terragen, among others. The geometry modelling tools used include AutoCAD, Rhinoceros and softimage-3D, among others. The high-end rendering techniques used include global illumination with ray tracing or radiosity, simulation of the camera depth-of-field effect, soft shadows and caustics (i.e., the specular light pattern seen near a glass when the glass is illuminated).
  2. PIM from personal collections (Personal): The Personal set consists of two parts: 800 images from the authors' personal collections (Personal Columbia) and 400 images from the personal collection of Philip Greenspun (Personal Greenspun). Images from Greenspun's collection are included to increase the diversity of the Personal set in terms of image content, camera models and photographer styles. The Personal Greenspun set consists mainly of travel images with content such as indoor and outdoor scenes, people, objects and buildings. The Personal Columbia set was acquired by the authors using professional single-lens-reflex (SLR) cameras, the Canon 10D and the Nikon D70. It has content diversity in terms of indoor or outdoor scenes, natural or artificial objects, and lighting conditions of daytime, dusk or nighttime. See Figure 1(b).
  3. 800 PIM from Google Image Search (Google): These images are search results for keywords that match the categories within the PRCG set. The keywords are architecture, nature scene, landscape, animal, building, people, scenery, indoor, object, machine, insect, interior, plant, forest, vehicle, fruit and statue. The images are filtered subjectively to include only PIM whose smaller dimension (width or height) is larger than 300 pixels.
  4. 800 photographed PRCG (Recaptured PRCG): These are photographs of the screen display of the images from the PRCG set. The computer graphics are displayed on a 17-inch (gamma-linearized) LCD monitor with a display resolution of 1280x1024 and photographed with a Canon G3 digital camera. The acquisition is conducted in a dark room to reduce reflections from the ambient scene.
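
The size rule used for the Google set can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `passes_size_filter` and the constant `MIN_SHORT_SIDE` are hypothetical, the example image sizes are made up, and the subjective screening for PIM content is not modelled.

```python
# Hypothetical helper: illustrates the Google-set size rule only.
MIN_SHORT_SIDE = 300  # pixels; the threshold stated in the set description


def passes_size_filter(width: int, height: int,
                       min_short_side: int = MIN_SHORT_SIDE) -> bool:
    """Return True when the smaller of width and height exceeds the threshold."""
    return min(width, height) > min_short_side


# Example image sizes as (width, height), made up for illustration.
candidates = [(1024, 768), (640, 300), (301, 301), (200, 1600)]
kept = [size for size in candidates if passes_size_filter(*size)]
print(kept)
```

Only the shorter side matters under this rule, so a tall but narrow image such as 200x1600 is rejected even though its height is large.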

Note:

The rationale for collecting two different sets of PIM is as follows: the Google set has diverse image content and involves more types of cameras, photographer styles and lighting conditions, but its ground truth may not be reliable, whereas the Personal set comes from reliable sources but has limited diversity in cameras and photographer styles.
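
The trade-off between the two PIM sources can be made concrete by tallying the image counts listed under Component Sets. The sketch below is only bookkeeping; the dictionary keys and variable names are assumptions chosen for illustration, not file or folder names from the dataset.

```python
# Image counts as stated in the Component Sets list.
# Keys are illustrative labels, not actual directory names.
SET_COUNTS = {
    "PRCG": 800,
    "Personal Columbia": 800,
    "Personal Greenspun": 400,
    "Google": 800,
    "Recaptured PRCG": 800,
}

# The two PIM sources: Personal (two parts) and Google.
personal_total = SET_COUNTS["Personal Columbia"] + SET_COUNTS["Personal Greenspun"]
pim_total = personal_total + SET_COUNTS["Google"]

print("Personal (reliable sources):", personal_total)
print("Google (more diverse, less reliable ground truth):", SET_COUNTS["Google"])
print("All PIM:", pim_total, "vs PRCG:", SET_COUNTS["PRCG"])
```

Pooling both PIM sources gives more photographic images than PRCG images, so an experiment that mixes them may need to account for the class imbalance.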

The reason for having the Recaptured PRCG image set is as follows: based on the two-level definition of image authenticity introduced in [1], i.e., imaging-process authenticity and scene authenticity, we should be able to restore the imaging-process authenticity of the PRCG by recapturing them with a camera. We therefore produce the Recaptured PRCG image set to evaluate how much of the scene authenticity can be captured by a classifier.
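
One way to read the two-level definition is as a pair of flags per image set. The sketch below encodes that reading, assuming that PIM carry both kinds of authenticity and that PRCG carry neither; the class name, field names and set labels are hypothetical, and [1] remains the authoritative definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authenticity:
    imaging_process: bool  # assumed meaning: the image was acquired by a camera
    scene: bool            # assumed meaning: the scene content is real, not rendered


# Assumed reading of the two-level definition; see [1] for the formal version.
AUTHENTICITY = {
    "PIM": Authenticity(imaging_process=True, scene=True),
    "PRCG": Authenticity(imaging_process=False, scene=False),
    "Recaptured PRCG": Authenticity(imaging_process=True, scene=False),
}

# Recapturing restores imaging-process authenticity only, so the recaptured
# set differs from PIM in scene authenticity alone.
pim = AUTHENTICITY["PIM"]
recaptured = AUTHENTICITY["Recaptured PRCG"]
differs_only_in_scene = (
    pim.imaging_process == recaptured.imaging_process
    and pim.scene != recaptured.scene
)
print(differs_only_in_scene)
```

Under this reading, a classifier that still separates Recaptured PRCG from PIM is plausibly responding to scene-level cues rather than to traces of the camera imaging process.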


Dataset Download

We appreciate the information you provide below:

Full Name:
Organization:
Email:
Planned Usage:

Download Dataset:

1 Photorealistic Computer Graphics Set
2 Personal Columbia Set
3 Personal Greenspun Set
4 Google Set

Terms of Use:

If you have read and agree to these terms of use, click below to continue to the download:


People

  1. Shih-Fu Chang (sfchang@ee.columbia.edu)
  2. Tian-Tsong Ng (ttng@ee.columbia.edu)
  3. Jessie Hsu (yfhsu@ee.columbia.edu)
  4. Martin Pepeljugoski

Dataset Technical Report (Please Cite This)

Columbia Photographic Images and Photorealistic Computer Graphics Dataset,
Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Martin Pepeljugoski,
ADVENT Technical Report #205-2004-5, Columbia University, Feb 2005 [PDF] [HTML]

@techreport{ng04prcgdataset,
author = "T.-T. Ng and S.-F. Chang and J. Hsu and M. Pepeljugoski",
title = "Columbia Photographic Images and Photorealistic Computer Graphics Dataset",
institution = "ADVENT, Columbia University",
number = "205-2004-5",
year = "2004"
}

Related Publications

[1] Physics-Motivated Features for Distinguishing Photographic Images and Computer Graphics,
Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Lexing Xie, Mao-Pei Tsui,
ACM Multimedia, Singapore, November 2005 [Abstract] [PDF]

@inproceedings{ng05physicscg,
author = {T.-T. Ng and S.-F. Chang and J. Hsu and L. Xie and M.-P. Tsui},
title = {Physics-Motivated Features for Distinguishing Photographic Images and Computer Graphics},
booktitle = {ACM Multimedia},
year = {2005},
location = {Singapore}
}

Acknowledgements

This project is supported in part by the NSF CyberTrust program (IIS-04-30258) and a Singapore A*STAR scholarship (for the first author). The authors thank Philip Greenspun for granting us permission to use his personal images and Lexing Xie for her kind help. The authors in DVMM would also like to thank Martin Pepeljugoski for contributing some original ideas and his time.

Links

  1. TrustFoto
  2. "Fake or Foto?" quiz webpage by the 3D graphics rendering software company, Alias
  3. Columbia University Digital Video and Multimedia Lab
  4. Columbia University Graphics Lab