Columbia Uncompressed Image Splicing Detection Evaluation Dataset


Introduction

Copying and pasting, or image splicing, is the most common form of image tampering seen today. Although splicing is often followed by various post-processing techniques, we provide a benchmark set containing only the splicing operation so that its effect can be studied in isolation. The images are high resolution and uncompressed, which also removes compression as a concern. EXIF information is retained, so this set is not restricted to splicing detection; it is also suitable for other computer vision algorithms, since the ground-truth exposure settings are accessible. Please feel free to download the set and test your algorithm on it.

Another relevant spliced image dataset can be found here.

Image Content

There are 2 directories in this dataset: 4cam_auth and 4cam_splc. 4cam_auth contains authentic images, and 4cam_splc contains spliced images. By 'authentic', we mean an image taken with just one camera.

In 4cam_auth, there are 183 images, and in 4cam_splc, there are 180. Image sizes range from 757x568 to 1152x768 pixels. All images are uncompressed, in either TIFF or BMP format. The spliced images are created from the authentic images, without any post-processing. Full EXIF information is included in authentic images.

The images are mostly indoor scenes: labs, desks, books, etc. Only 27 images, or 15%, were taken outdoors, on a cloudy day (which makes the outdoor illumination similar to indoor conditions). Several examples are shown below.

Figure 1: Example images in the dataset (panels: Authentic, Spliced)

Naming Conventions

The naming conventions are listed in Table 1. All images were taken with 4 cameras: Canon G3, Nikon D70, Canon 350D Rebel XT, and Kodak DCS 330. For each scene, an original set of 3 bracketed images (3 different exposures) was taken, and only the medium exposure was chosen for this dataset. The image index for authentic images therefore starts at 2 and increases in steps of 3 (except for the Kodak DCS 330, for which no bracketing was performed). Each scene is further divided into 9 smaller images (Table 1 lists a single sub-index for the Kodak DCS 330). There are a total of 183 images in the authentic category.

Spliced images were created from authentic ones by copying and pasting visually salient objects in Adobe Photoshop. No post-processing was performed. We created 30 images for each camera pair; with 4 cameras there are 6 pairs, giving 30 x 6 = 180 images in the spliced category.
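The two totals can be checked directly from the index lists in Table 1. The snippet below is only a sanity check of the arithmetic. The dictionary layout and variable names are illustrative assumptions and are not part of the dataset.

```python
from itertools import combinations

# (image indices, sub-indices) per camera, copied from Table 1.
AUTHENTIC = {
    "Canon G3": (range(2, 9, 3), range(1, 10)),              # 2, 5, 8
    "Nikon D70": (range(2, 12, 3), range(1, 10)),            # 2, 5, 8, 11
    "Canon 350D Rebel XT": (range(2, 39, 3), range(1, 10)),  # 2, 5, ..., 38
    "Kodak DCS 330": (range(1, 4), range(1, 2)),             # 1, 2, 3; sub-index 1 only
}

# Authentic images: one file per (image index, sub-index) combination.
authentic_total = sum(len(idx) * len(sub) for idx, sub in AUTHENTIC.values())

# Spliced images: 30 per unordered pair of distinct cameras.
camera_pairs = list(combinations(AUTHENTIC, 2))
spliced_total = 30 * len(camera_pairs)

print(authentic_total, len(camera_pairs), spliced_total)
```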

Each TIFF file is about 3 to 4 MB. The total dataset takes up 800 MB of space.

Table 1: Naming Conventions for Authentic and Spliced Images

AUTHENTIC CATEGORY: (Camera)_(ImageIndex)_sub_(SubIndex)

    Camera                 ImageIndex                                         SubIndex
    Canon G3               2, 5, 8                                            1~9
    Nikon D70              2, 5, 8, 11                                        1~9
    Canon 350D Rebel XT    2, 5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38    1~9
    Kodak DCS 330          1, 2, 3                                            1

SPLICED CATEGORY: (Camera1)_(Camera2)_sub_(SubIndex)

    Camera1                Camera2                SubIndex
    Canon G3               Nikon D70              1~30
    Canon G3               Canon 350D Rebel XT    1~30
    Canon G3               Kodak DCS 330          1~30
    Nikon D70              Canon 350D Rebel XT    1~30
    Nikon D70              Kodak DCS 330          1~30
    Canon 350D Rebel XT    Kodak DCS 330          1~30
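A filename stem that follows the patterns in Table 1 can be split into its fields as sketched below. This is a minimal sketch under assumptions that this page does not state: camera tokens contain no underscores and are never purely numeric, and the stems used in the example (such as "canong3_02_sub_01") are hypothetical, since the exact camera strings, zero-padding, and file extensions are not given here. The optional '_edgemask' and '_edgemask_3' suffixes, described later in this section, are also recognized.

```python
import re
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    category: str               # "authentic" or "spliced"
    camera1: str
    camera2: Optional[str]      # None for authentic images
    image_index: Optional[int]  # None for spliced images
    sub_index: int
    mask: Optional[str]         # None, "edgemask", or "edgemask_3"


# (first)_(second)_sub_(SubIndex), optionally followed by an edgemask suffix.
_PATTERN = re.compile(
    r"^(?P<first>[^_]+)_(?P<second>[^_]+)_sub_(?P<sub>\d+)"
    r"(?:_(?P<mask>edgemask(?:_3)?))?$"
)


def parse_name(stem: str) -> ImageName:
    """Parse a filename stem (no directory, no extension)."""
    m = _PATTERN.match(stem)
    if m is None:
        raise ValueError(f"unrecognized name: {stem!r}")
    second, sub, mask = m["second"], int(m["sub"]), m["mask"]
    if second.isdigit():
        # Numeric second field: (Camera)_(ImageIndex)_sub_(SubIndex)
        return ImageName("authentic", m["first"], None, int(second), sub, mask)
    # Otherwise: (Camera1)_(Camera2)_sub_(SubIndex)
    return ImageName("spliced", m["first"], second, None, sub, mask)


# Hypothetical stems, for illustration only.
auth = parse_name("canong3_02_sub_01")
splc = parse_name("canong3_nikond70_sub_12_edgemask_3")
```

Deciding the category from whether the second field is numeric keeps the parser independent of any fixed list of camera strings, at the cost of relying on the assumption above.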

Under each directory there is a subdirectory storing edgemasks for the images. The edgemasks were created manually and label the regions within each image that come from different cameras. For authentic images, since there is no actual splicing boundary, we picked a salient object boundary as the suspicious splicing boundary.

Edgemask filenames follow the naming convention of the corresponding image, with the string '_edgemask' or '_edgemask_3' appended. In the former case, the image is divided into 4 regions: bright red (255,0,0), bright green (0,255,0), regular red (200,0,0), and regular green (0,200,0).

Bright red indicates the part near the suspicious splicing boundary that comes from camera 1.
Bright green indicates the part near the suspicious splicing boundary that comes from camera 2.
Regular red indicates the part far from the suspicious splicing boundary that comes from camera 1.
Regular green indicates the part far from the suspicious splicing boundary that comes from camera 2.

For '_edgemask_3', the image is divided into 3 regions, with bright red and bright green merged into a single 'splicing boundary' region in regular blue (0,0,200).
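The color coding above can be turned into region labels with a simple lookup. The following is a minimal sketch, assuming the mask has already been decoded into (R, G, B) tuples by an image library of your choice. The label strings and function names are illustrative and not part of the dataset.

```python
from collections import Counter

# RGB values as listed above; the label names are illustrative.
REGION_BY_COLOR = {
    (255, 0, 0): "camera1_near_boundary",     # bright red
    (0, 255, 0): "camera2_near_boundary",     # bright green
    (200, 0, 0): "camera1_far_from_boundary", # regular red
    (0, 200, 0): "camera2_far_from_boundary", # regular green
    (0, 0, 200): "splicing_boundary",         # regular blue, '_edgemask_3' only
}


def label_pixel(rgb):
    """Return the region label for one (R, G, B) pixel, or 'unknown'."""
    return REGION_BY_COLOR.get(tuple(rgb), "unknown")


def region_histogram(pixels):
    """Count how many pixels fall into each region."""
    return Counter(label_pixel(p) for p in pixels)


# Toy 5-pixel "mask" for illustration; real masks have one entry per image pixel.
toy_mask = [(255, 0, 0), (255, 0, 0), (0, 200, 0), (0, 0, 200), (1, 2, 3)]
hist = region_histogram(toy_mask)
```

Pixels whose color is not in the table are reported as 'unknown' rather than raising an error, which makes it easy to spot masks that deviate from the stated color coding.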


Figure 2(a): Images and edgemasks in authentic category (panels: Image, Edgemask, Edgemask 3)

Figure 2(b): Images and edgemasks in spliced category (panels: Image, Edgemask, Edgemask 3)


Dataset Download

To download, please fill out the request form at this page.


People

  1. Jessie Hsu (yfhsu@ee.columbia.edu)
  2. Shih-Fu Chang (sfchang@ee.columbia.edu)

Citation

Please kindly cite our ICME 06 paper if you use our dataset:

Detecting Image Splicing Using Geometry Invariants And Camera Characteristics Consistency
Yu-Feng Hsu, Shih-Fu Chang
International Conference on Multimedia and Expo (ICME), Toronto, Canada, July 2006.

@inproceedings{hsu06crfcheck,
author = {Y.-F. Hsu and S.-F. Chang},
title = {Detecting Image Splicing Using Geometry Invariants and Camera Characteristics Consistency},
booktitle = {International Conference on Multimedia and Expo},
year = {2006},
location = {Toronto, Canada}
}

Links

  1. TrustFoto
  2. Columbia Image Splicing Detection Evaluation Dataset
  3. Research Description Page - Image Splicing Detection Using Camera Response Function Inconsistency
  4. Columbia University Digital Video and Multimedia Lab
  5. Columbia University Graphics Lab