Columbia Image Splicing Detection Evaluation Dataset - Details
Our goal is to compile a dataset open to the research community so that the discovery and development of new technologies can be expedited. The current data set was collected with sample diversity in mind. It has 933 authentic and 912 spliced image blocks of size 128 x 128 pixels. The image blocks are extracted from images in the CalPhotos image set [CalPhotos'00]. The data set can be improved in many ways and should be considered a preliminary effort addressing the increasingly important topic of benchmarking.
We emphasized the following points when creating the data set:
- Content diversity: The data set contains 1845 image blocks (128 x 128 pixels) of diverse content extracted from images on the CalPhotos site as well as from a small set of 10 images captured by us.
- Balanced distribution: The numbers of the authentic and spliced images are approximately the same.
- Realistic operation: We simulate the process of creating spliced images with two types of operations, crop-and-paste along object boundaries vs. crop-and-paste of horizontal (or vertical) strips. Image objects and strips can be from the same image or two separate source images. Objects spliced together can be the same or different types, smooth or textured.
- Localized detection: We decompose the authentic as well as spliced images into separate local blocks of a fixed size (128 pixels x 128 pixels). The block is kept at a reasonable size to ensure that sufficiently accurate statistical features can be estimated using the empirical data of each block.
The copyrights of the original images from the CalPhotos site are owned by the providers of the images. Information about the usage rights and other copyright issues can be found at
We thank the Berkeley Digital Library group for their generous support in making the original image set available for internal research. Downloading this dataset is subject to additional terms and conditions, which are listed on the overview page.
Structure of the Dataset
The data set consists of 1845 image blocks of size 128x128 pixels with two main categories:
- (Au) = Authentic category: 933 image blocks
- (Sp) = Spliced category: 912 image blocks
The Authentic and Spliced categories are each subdivided into five subcategories:
- (T) = Image block with an entirely homogeneous textured region
- (S) = Image block with an entirely homogeneous smooth region
- (TS) = Image block with an object boundary between a textured region and a smooth region
- (TT) = Image block with an object boundary between two textured regions
- (SS) = Image block with an object boundary between two smooth regions
Then, the subcategories (TS), (TT) and (SS) are further subdivided into 3 sub-subcategories, according to the orientation of the object boundary:
- (V) = with vertical object boundary
- (H) = with horizontal object boundary
- (O) = other than (V) and (H)
Examples of Images in the Dataset
Figure 1: Typical image blocks in the dataset
The following table shows the numbers of image blocks in each subcategory:
Table 1: Numbers of image blocks in different subcategories
One Textured Background
One Smooth Background
Class Notation and Naming Convention
Each sub-subcategory is called a class. We denote a particular class with the following naming convention:
For example, a class in the authentic category whose image blocks have a vertical object boundary separating a textured region from a smooth region is denoted as the Au-TS-V class.
For example, a class in the spliced category whose image blocks contain an entirely homogeneous textured region is denoted as the Sp-T class.
Each class of image blocks is kept in a directory with the same name as the class within the zip file.
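The naming convention can be made concrete with a short sketch. The Python below enumerates every class name implied by the categories, subcategories and orientations listed above, and splits a name back into its parts. The names `BlockClass`, `all_class_names` and `parse_class_name` are illustrative only and are not part of the dataset distribution.

```python
from typing import NamedTuple, Optional

CATEGORIES = ("Au", "Sp")        # authentic, spliced
HOMOGENEOUS = ("T", "S")         # no object boundary, so no orientation suffix
BOUNDARY = ("TS", "TT", "SS")    # contain an object boundary
ORIENTATIONS = ("V", "H", "O")   # vertical, horizontal, other


class BlockClass(NamedTuple):
    category: str
    region: str
    orientation: Optional[str]   # None for the T and S subcategories


def all_class_names() -> list[str]:
    """Return every class name the convention can produce."""
    names = []
    for cat in CATEGORIES:
        names += [f"{cat}-{r}" for r in HOMOGENEOUS]
        names += [f"{cat}-{r}-{o}" for r in BOUNDARY for o in ORIENTATIONS]
    return names


def parse_class_name(name: str) -> BlockClass:
    """Split a class name such as 'Au-TS-V' or 'Sp-T' into its parts."""
    parts = name.split("-")
    if len(parts) == 2 and parts[0] in CATEGORIES and parts[1] in HOMOGENEOUS:
        return BlockClass(parts[0], parts[1], None)
    if (len(parts) == 3 and parts[0] in CATEGORIES
            and parts[1] in BOUNDARY and parts[2] in ORIENTATIONS):
        return BlockClass(parts[0], parts[1], parts[2])
    raise ValueError(f"not a valid class name: {name!r}")
```

The convention yields 2 x (2 + 3 x 3) = 22 possible names. Since each class is stored in a directory of the same name, the same strings can serve as directory names when reading the extracted zip file.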
The original images used in this dataset consist of 312 images from the CalPhotos collection and 10 images captured by us using a digital camera.
We consider images that come directly out of imaging devices as authentic and images resulting from the splicing of image regions (of the same or different images) as spliced. More discussion and definitions of image authenticity can be found in [NgChang'04a]. We assumed that images from the CalPhotos site are the original captured data from imaging devices (cameras or scanners) without editing or post-processing. However, this assumption has not been confirmed with the original image providers.
The following table lists the IDs of the images from the CalPhotos collection. CalPhotos images are originally organized in a directory structure in which the directory names are numeric codes.
Table 2: IDs of images from the CalPhotos image collection
No of Images
All (86 images)
All (172 images)
All (3 images)
0023, 0057, 0058, 0126, 0122, 0125
0099, 0125, 0166, 0177
0003, 0016, 0012
0220, 0358, 0362, 0383, 0389, 0437
0015, 0016, 0023, 0063, 0070, 0088, 0095, 0097, 0137
0010, 0011, 0012
0064, 0065, 0066
To further diversify the image types, we used a digital camera to capture 10 additional images. We mainly use this small set of images to add image blocks to the Au-SS-V/O and Au-TS-H/V/O classes, as shown in Table 3, which also gives more detailed information about the 10 digital images. Figure 2 shows two examples of the digital images.
Table 3: Characteristics of the 10 digital images captured by using a digital camera
Canon PowerShot S40
1600 x 1200
Image Blocks Generated From the 10 Images
Au-SS-O (blk_11_57.bmp - blk_11_76.bmp)
Au-SS-V (blk_10_32.bmp - blk_10_36.bmp)
Au-TS-H (blk_3_86.bmp - blk_3_110.bmp)
Au-TS-O (blk_5_138.bmp - blk_5_163.bmp)
Au-TS-V (blk_4_58.bmp - blk_4_95.bmp)
Au-TT-O (blk_8_77.bmp - blk_8_78.bmp)
Au-TT-V (blk_7_22.bmp - blk_7_32.bmp)
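The file-name ranges in Table 3 can be expanded into explicit file names with a small helper. This is only a sketch: it assumes the ranges are inclusive, that the last number in each name is a running index with no gaps, and that both ends of a range share the same first number. The document does not state what the two numeric fields mean, and `expand_block_range` is a hypothetical name.

```python
import re

# Matches names of the form blk_<a>_<b>.bmp as they appear in Table 3.
_BLOCK_RE = re.compile(r"^blk_(\d+)_(\d+)\.bmp$")


def expand_block_range(first: str, last: str) -> list[str]:
    """List the file names from `first` to `last`, both included.

    Assumes contiguous numbering of the final field (not confirmed by the source).
    """
    m_first, m_last = _BLOCK_RE.match(first), _BLOCK_RE.match(last)
    if m_first is None or m_last is None:
        raise ValueError("expected names of the form blk_<a>_<b>.bmp")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("both ends of a range must share the first number")
    prefix = m_first.group(1)
    lo, hi = int(m_first.group(2)), int(m_last.group(2))
    if lo > hi:
        raise ValueError("range is reversed")
    return [f"blk_{prefix}_{i}.bmp" for i in range(lo, hi + 1)]
```

Under these assumptions, `expand_block_range("blk_3_86.bmp", "blk_3_110.bmp")` returns the 25 candidate file names for the Au-TS-H entry.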
Figure 2: Examples from the 10 images that we took and used for generating image blocks in the dataset
Dataset Production Procedure
In this section, we describe the procedures used in generating the authentic and spliced image blocks.
For the authentic category, the image blocks are cropped directly from the CalPhotos images and the images captured by the camera without any further manipulations.
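Cropping a fixed-size block without any further manipulation amounts to copying a rectangular window of pixels. The sketch below shows this for an image held as a list of pixel rows. That representation, the function name `crop_block` and the `(top, left)` parameters are assumptions made for illustration, and the source does not say how the crop positions were chosen.

```python
def crop_block(image, top, left, size=128):
    """Return a size x size block whose top-left corner is at (top, left).

    `image` is a list of rows, each row a list of pixel values.
    Pixel values are copied unchanged; the input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("block does not fit inside the image")
    return [row[left:left + size] for row in image[top:top + size]]
```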
For the spliced category, we first create composite images by object-based image splicing, i.e., cropping an object from one image and pasting it into another image, without any post-processing such as edge smoothing. Then, we crop image blocks that contain part of the splicing boundary from the composite image; these form the image blocks of the (Sp-TS-x), (Sp-TT-x) and (Sp-SS-x) classes, where x represents V, H or O. The cut and paste of image objects is performed using Adobe Photoshop 6.0 on Windows XP. When cutting an object from an image, the outline of the object is traced using the Magnetic Lasso tool. Plain cut and paste commands are used to transfer the cut object to another image.
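The dataset's composites were made by hand in Photoshop, but the effect of a plain cut and paste without edge smoothing can be illustrated programmatically as a hard-mask copy: every pixel inside the object outline replaces the target pixel outright, with no blending at the boundary. The sketch below is a hypothetical analogue, not the procedure used for the dataset. The binary `mask` stands in for the outline traced with the lasso tool, and images are represented as lists of pixel rows.

```python
def paste_object(background, source, mask, top, left):
    """Paste `source` pixels onto a copy of `background` at (top, left).

    `mask` has the same shape as `source`; a truthy entry marks a pixel
    that belongs to the cut object. Pixels are copied as-is (no feathering
    or blending), and pixels falling outside the background are skipped.
    """
    result = [list(row) for row in background]
    for r, (src_row, mask_row) in enumerate(zip(source, mask)):
        for c, (pixel, keep) in enumerate(zip(src_row, mask_row)):
            if not keep:
                continue
            rr, cc = top + r, left + c
            if 0 <= rr < len(result) and 0 <= cc < len(result[rr]):
                result[rr][cc] = pixel
    return result
```

Because nothing is blended, the splicing boundary stays sharp, which is the kind of boundary the (Sp-TS-x), (Sp-TT-x) and (Sp-SS-x) blocks are cropped around.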
The image blocks of the (Sp-T) and (Sp-S) classes (i.e., spliced image blocks of homogeneous textured and smooth regions) are created from the corresponding image blocks of the (Au-T) and (Au-S) classes (i.e., authentic image blocks of homogeneous textured and smooth regions), respectively. Specifically, given an image block of the (Au-T) or (Au-S) class, the corresponding image block for the (Sp-T) or (Sp-S) class is produced by copying a vertical or horizontal strip 20 pixels wide (the strip orientation is chosen at random) from a randomly selected location within the given block and pasting it at another randomly selected, different location within the same image block (see Figure 3). This procedure imitates the likely scenario of an image forger patching up the void left behind after removing an object from an image; such a void is likely to be covered using small patches similar to the homogeneous background. (Note that the (Au-T) and (Au-S) classes consist of image blocks with a homogeneous textured or homogeneous smooth region.)
Figure 3: The process for creating image blocks of the (Sp-T) and (Sp-S) classes from the corresponding image blocks of the (Au-T) and (Au-S) classes, respectively. The example shows image block no. 10 of (Sp-T) being created from image block no. 10 of (Au-T).
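The strip copy-and-paste used for the (Sp-T) and (Sp-S) classes can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes a square block stored as a list of pixel rows, a strip that spans the full height (or width) of the block, uniformly random offsets, and that the source and destination strips may overlap as long as their offsets differ. The name `splice_strip` and its return values are hypothetical.

```python
import random


def splice_strip(block, strip_width=20, rng=None):
    """Copy a random 20-pixel strip to a different offset in the same block.

    Returns (new_block, vertical, src, dst); the input block is not modified.
    """
    rng = rng or random.Random()
    size = len(block)
    if any(len(row) != size for row in block):
        raise ValueError("expected a square block")
    max_start = size - strip_width
    if max_start < 1:
        raise ValueError("block too small for two distinct strip positions")

    vertical = rng.random() < 0.5          # strip orientation chosen at random
    src = rng.randint(0, max_start)        # offset of the copied strip
    dst = rng.randint(0, max_start)        # offset where it is pasted
    while dst == src:                      # the two locations must differ
        dst = rng.randint(0, max_start)

    out = [list(row) for row in block]
    if vertical:
        # Columns src..src+w-1 overwrite columns dst..dst+w-1 in every row.
        for r in range(size):
            out[r][dst:dst + strip_width] = block[r][src:src + strip_width]
    else:
        # Rows src..src+w-1 overwrite rows dst..dst+w-1.
        for i in range(strip_width):
            out[dst + i] = list(block[src + i])
    return out, vertical, src, dst
```

Passing a seeded generator, for example `splice_strip(block, rng=random.Random(0))`, makes the result reproducible, which is convenient when regenerating a spliced block from its authentic counterpart.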
- A Data Set of Authentic and Spliced Image Blocks, ADVENT Technical Report #203-2004-3, Columbia University, June 2004
- A Model for Image Splicing, Tian-Tsong Ng, Shih-Fu Chang, IEEE International Conference on Image Processing (ICIP), Singapore, October 2004
- Blind Detection of Photomontage Using Higher Order Statistics, Tian-Tsong Ng, Shih-Fu Chang, Qibin Sun, IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, Canada, May 2004
- Blind Detection of Digital Photomontage using Higher Order Statistics, Tian-Tsong Ng, Shih-Fu Chang, ADVENT Technical Report #201-2004-1, Columbia University, June 2004
- A database of photos of plants, animals, habitats and other natural history subjects,
- Detecting Digital Forgeries Using Bispectral Analysis, MIT AI Memo AIM-1657, MIT, 1999
- A Picture Tells a Thousand Lies, New Scientist, vol. 179, pp. 38-41, 2003
- When Is Seeing Believing?, W. J. Mitchell, Scientific American, pp. 44-49, 1994
- Tian-Tsong Ng (firstname.lastname@example.org)
- Shih-Fu Chang (email@example.com)
We sincerely thank Ginger Ogle of the Berkeley Digital Library Project for help in contacting the original photographers and transferring the image files.
- Research Description Page - Image Splicing Detection Using Higher Order Statistics
- Columbia University Digital Video and Multimedia Lab
- Columbia University Graphics Lab