Columbia Photographic Images and Photorealistic Computer Graphics Dataset

All look so photorealistic ... which one is really from a camera?

Introduction

Passive-blind image authentication is a new area of research, and a suitable dataset for experimenting with and comparing new techniques is important for its progress. In response to this need, the Columbia Photographic Images and Photorealistic Computer Graphics Dataset is made available to the passive-blind image authentication research community. The dataset is composed of four component image sets: the Photorealistic Computer Graphics Set, the Personal Photographic Image Set, the Google Image Set, and the Recaptured Computer Graphics Set. Available from the Trustfoto website, it is intended for those who work on the photographic images (PIM) versus photorealistic computer graphics (PRCG) classification problem, which is a subproblem of passive-blind image authentication. The technical report below describes the design and implementation of the dataset and also serves as a user guide.

Component Sets

The dataset consists of four sets of images, as shown in the above figure. A detailed description can be found in the technical report.

Figure 1: (a) Subcategories within the PRCG image set and (b) subcategories within the Personal image set. The numbers are image counts.

800 PRCG images from the Internet (PRCG): These images are categorized by content into architecture, game, nature, object and life; see Figure 1(a). The PRCG images were collected mainly from 40 3D-graphics websites, such as www.softimage.com, www.3ddart.org and www.3d-ring.com. The rendering software used includes 3ds MAX, softimage-xsi, Maya and Terragen, among others. The geometry modelling tools used include AutoCAD, Rhinoceros and softimage-3D, among others. The high-end rendering techniques used include global illumination with ray tracing or radiosity, simulation of the camera depth-of-field effect, soft shadows, and caustics (i.e., the specular light pattern seen near a glass when the glass is illuminated).

PIM images from the personal collections (Personal): The Personal set consists of two parts: 800 images from the authors' personal collections (Personal Columbia) and 400 images from the personal collection of Philip Greenspun (Personal Greenspun). Images from Greenspun's collection are included to increase the diversity of the Personal set in terms of image content, camera models and photographer styles. The Personal Greenspun set consists mainly of travel images with content such as indoor and outdoor scenes, people, objects and buildings. The Personal Columbia set was acquired by the authors using the professional single-lens-reflex (SLR) Canon 10D and Nikon D70 cameras. It is diverse in content, covering indoor and outdoor scenes, natural and artificial objects, and daytime, dusk and night-time lighting conditions. See Figure 1(b).

800 PIM from Google Image Search (Google): These images are search results for keywords that match the categories within the PRCG set. The keywords are architecture, nature scene, landscape, animal, building, people, scenery, indoor, object, machine, insect, interior, plant, forest, vehicle, fruit and statue. The results are filtered subjectively to include only PIM whose smaller dimension (width or height) is larger than 300 pixels.
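The size criterion above can be sketched in a few lines of Python. This is only an illustration of the stated rule (the smaller of width and height must exceed 300 pixels), not the authors' tooling; the function names and the (name, width, height) record layout are hypothetical, and the subjective PIM-versus-graphics screening is not modelled.

```python
from typing import Iterable, List, Tuple

# Threshold stated in the dataset description, in pixels.
MIN_SHORT_SIDE = 300


def passes_size_filter(width: int, height: int,
                       min_short_side: int = MIN_SHORT_SIDE) -> bool:
    """Return True when the smaller of width and height exceeds the threshold."""
    return min(width, height) > min_short_side


def filter_by_size(images: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Keep the names of images that satisfy the size criterion.

    `images` is a hypothetical iterable of (name, width, height) records.
    """
    return [name for name, w, h in images if passes_size_filter(w, h)]
```

For example, a 1024x301 image would be kept, while a 1024x300 image would be dropped, since the text says "larger than" 300 pixels.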

800 photographed PRCG (Recaptured PRCG): These are photographs of the screen display of the images from the PRCG set. The computer graphics are displayed on a 17-inch (gamma-linearized) LCD monitor with a display resolution of 1280x1024 and photographed with a Canon G3 digital camera. The acquisition is conducted in a dark room to reduce reflections from the ambient scene.

Note:

The rationale for collecting two different sets of PIM is the following: the Google set has diverse image content and involves more types of cameras, photographer styles and lighting conditions, but its ground truth may not be reliable, whereas the Personal set comes from reliable sources but has limited diversity in cameras and photographer styles.

The reason for including the Recaptured PRCG image set is as follows: based on the two-level definition of image authenticity introduced in [1], i.e., the imaging-process authenticity and the scene authenticity, we should be able to restore the imaging-process authenticity of the PRCG by recapturing them with a camera. We therefore produce the Recaptured PRCG image set to evaluate how much of the scene authenticity can be captured by a classifier.
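As a rough summary of the component sets and the two-level authenticity idea, the sketch below records each set's image count (taken from the descriptions above) alongside two boolean flags. The class and field names are hypothetical. Marking the PRCG set as lacking both kinds of authenticity, and the PIM sets as having both, is an assumption read from the text rather than a labelling shipped with the dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImageSet:
    name: str
    count: int
    imaging_process_authentic: bool  # image was produced by a camera
    scene_authentic: bool            # depicted scene is a real-world scene


# Counts come from the set descriptions; the flags are an assumed reading.
SETS: List[ImageSet] = [
    ImageSet("PRCG", 800, False, False),
    ImageSet("Personal Columbia", 800, True, True),
    ImageSet("Personal Greenspun", 400, True, True),
    # Google ground truth may not be fully reliable, per the note above.
    ImageSet("Google", 800, True, True),
    # Recapturing restores imaging-process authenticity only.
    ImageSet("Recaptured PRCG", 800, True, False),
]


def total_images(sets: List[ImageSet]) -> int:
    """Sum the image counts over the given sets."""
    return sum(s.count for s in sets)


def camera_but_not_real_scene(sets: List[ImageSet]) -> List[str]:
    """Names of sets that pass through a camera yet depict a rendered scene."""
    return [s.name for s in sets
            if s.imaging_process_authentic and not s.scene_authentic]
```

Under these assumptions, a classifier that still separates Recaptured PRCG from the PIM sets must be responding to scene-level cues rather than to camera artifacts alone.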

Dataset Download

Download Dataset:

1 Photorealistic Computer Graphics Set

2 Personal Columbia Set

3 Personal Greenspun Set

4 Google Set

Terms of Use:

The Columbia Photographic Images and Photorealistic Computer Graphics Dataset version 1 Terms of Use

Copyright (c) 2002-2004 by the Columbia DVMM Laboratory, Department of Electrical Engineering, 1312 S.W. Mudd, 500 West 120th Street, New York, NY 10027

If you intend to use this dataset for non-commercial purposes, such as academic research, the dataset is free. Since the dataset includes images from Philip Greenspun's collection, users of the dataset should abide by the terms set for that collection as well. If you use this dataset in your research, please acknowledge the Columbia Photographic Images and Photorealistic Computer Graphics Dataset and its authors, and link back to the Trustfoto project page, www.ee.columbia.edu/trustfoto.

Please cite the dataset in academic publications as: Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Martin Pepeljugoski, Columbia Photographic Images and Photorealistic Computer Graphics Dataset. ADVENT Technical Report #205-2004-5, Columbia University, Feb 2005.

People

Shih-Fu Chang

Tian-Tsong Ng

Jessie Hsu

Martin Pepeljugoski

Dataset Technical Report (Please Cite This)

Columbia Photographic Images and Photorealistic Computer Graphics Dataset,
Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Martin Pepeljugoski,
ADVENT Technical Report #205-2004-5, Columbia University, Feb 2005 [PDF] [HTML]

@techreport{ng04prcgdataset,
author = "T.-T. Ng and S.-F. Chang and J. Hsu and M. Pepeljugoski",
title = "Columbia Photographic Images and Photorealistic Computer Graphics Dataset",
institution = "ADVENT, Columbia University",
number = "205-2004-5",
year = "2004"
}

Related Publications

[1]
Physics-Motivated Features for Distinguishing Photographic Images and Computer Graphics,
Tian-Tsong Ng, Shih-Fu Chang, Jessie Hsu, Lexing Xie, Mao-Pei Tsui,
ACM Multimedia, Singapore, November 2005 [Abstract] [PDF]

@inproceedings{ng05physicscg,
author = {T.-T. Ng and S.-F. Chang and J. Hsu and L. Xie and M.-P. Tsui},
title = {Physics-Motivated Features for Distinguishing Photographic Images and Computer Graphics},
booktitle = {ACM Multimedia},
year = {2005},
location = {Singapore}
}

Acknowledgements

This project is supported in part by the NSF CyberTrust program (IIS-04-30258) and a Singapore A*STAR scholarship (for the first author). The authors thank Philip Greenspun for granting us permission to use his personal images and Lexing Xie for her kind help. The authors in DVMM would also like to thank Martin Pepeljugoski for contributing some original ideas and his time.

Links

TrustFoto

"Fake or Foto?" quiz webpage by the 3D graphics rendering software company, Alias

Columbia University Digital Video and Multimedia Lab

Columbia University Graphics Lab
