Columbia Image Splicing Detection Evaluation Dataset

Columbia Image Splicing Detection Evaluation Dataset - Details

Objective

Our goal is to compile a dataset open to the research community so that the discovery and development of new technologies can be expedited. The current data set was collected with sample diversity in mind. It has 933 authentic and 912 spliced image blocks of size 128 x 128 pixels. The image blocks are extracted from images in the CalPhotos image set [CalPhotos'00]. The data set can be improved in many ways and should be considered a preliminary effort addressing the increasingly important topic of benchmarking.

Design Criteria

We emphasize the following points while creating the data set.

Content diversity: The data set contains 1845 image blocks (128 x 128 pixels) of diverse content, extracted from images on the CalPhotos site as well as from a small set of 10 images that we captured ourselves.

Balanced distribution: The numbers of the authentic and spliced images are approximately the same.

Realistic operation: We simulate the process of creating spliced images with two types of operations: crop-and-paste along object boundaries, and crop-and-paste of horizontal (or vertical) strips. Image objects and strips can come from the same image or from two separate source images. Objects spliced together can be of the same or different types, smooth or textured.

Localized detection: We decompose the authentic as well as spliced images into separate local blocks of a fixed size (128 pixels x 128 pixels). The block is kept at a reasonable size to ensure that sufficiently accurate statistical features can be estimated using the empirical data of each block.
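As a rough illustration of the fixed-size block decomposition, the following Python sketch tiles an image array into 128 x 128 blocks. It assumes NumPy, a regular non-overlapping grid, and that partial tiles at the right and bottom edges are discarded. The function name tile_blocks is hypothetical, and this description does not state how block positions were actually selected for the dataset.

```python
from typing import List

import numpy as np

BLOCK = 128  # block side length in pixels, as stated in the text


def tile_blocks(image: np.ndarray, block: int = BLOCK) -> List[np.ndarray]:
    """Split an H x W (x C) array into non-overlapping block x block tiles.

    Assumption: tiles lie on a regular grid and partial tiles at the
    right and bottom edges are dropped.
    """
    height, width = image.shape[:2]
    tiles = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            tiles.append(image[top:top + block, left:left + block].copy())
    return tiles


# Synthetic stand-in image: 1200 rows by 1600 columns, 3 channels.
demo = np.zeros((1200, 1600, 3), dtype=np.uint8)
blocks = tile_blocks(demo)
```

Each returned tile has the same size, so statistical features computed per block are directly comparable across the data set.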

Copyright

The copyrights of the original images from the CalPhotos site are owned by the providers of the images. Information about the usage rights and other copyright issues can be found at
http://elib.cs.berkeley.edu/photos/use.html

We thank the Berkeley Digital Library group for their generous support in making the original image set available for internal research. Downloading this dataset is subject to additional terms and conditions, which are listed on the overview page.

Structure of the Dataset

The data set consists of 1845 image blocks of size 128x128 pixels with two main categories:

(Au) = Authentic category: 933 image blocks

(Sp) = Spliced category: 912 image blocks

The Authentic and Spliced categories are each subdivided into five subcategories:

(T) = Image block with an entirely homogeneous textured region

(S) = Image block with an entirely homogeneous smooth region

(TS) = Image block with an object boundary between a textured region and a smooth region

(TT) = Image block with an object boundary between two textured regions

(SS) = Image block with an object boundary between two smooth regions

Then, the subcategories (TS), (TT) and (SS) are further subdivided into 3 sub-subcategories, according to the orientation of the object boundary:

(V) = with vertical object boundary

(H) = with horizontal object boundary

(O) = other than (V) and (H)

Examples of Images in the Dataset

Figure 1: Typical image blocks in the dataset. Examples are shown for both the Authentic and Spliced categories, in five groups each: Homogeneous Smooth, Homogeneous Textured, Textured-Smooth, Textured-Textured, and Smooth-Smooth.

The following table shows the numbers of image blocks in each subcategory:

Table 1: Numbers of image blocks in different subcategories

Category | One Textured Background (T) | One Smooth Background (S) | Textured-Smooth Interface (TS) | Textured-Textured Interface (TT) | Smooth-Smooth Interface (SS)
--- | --- | --- | --- | --- | ---
Authentic (Au) | 126 | 54 | 409 | 179 | 165
Spliced (Sp) | 126 | 54 | 298 | 287 | 147
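The counts in Table 1 can be cross-checked against the category totals given earlier (933 authentic, 912 spliced, 1845 overall). The short Python sketch below does that arithmetic; the dictionary layout and variable names are illustrative only.

```python
# Block counts per subcategory, copied from Table 1.
COUNTS = {
    "Au": {"T": 126, "S": 54, "TS": 409, "TT": 179, "SS": 165},
    "Sp": {"T": 126, "S": 54, "TS": 298, "TT": 287, "SS": 147},
}

# Sum each row to get the per-category totals, then the overall total.
totals = {category: sum(sub.values()) for category, sub in COUNTS.items()}
grand_total = sum(totals.values())

# Share of spliced blocks, a quick check of the "balanced distribution" criterion.
spliced_fraction = totals["Sp"] / grand_total

print(totals, grand_total, round(spliced_fraction, 3))
```

The row sums match the stated totals, and spliced blocks make up roughly 49% of the data set. Within the interface subcategories the balance is looser: TS has more authentic blocks, while TT has more spliced blocks.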

Class Notation and Naming Convention

Each sub-subcategory is referred to as a class. We denote a particular class with the following naming convention:

(main category)-(subcategory)-(orientation)

(e.g.) The class under the authentic category whose image blocks have a vertical object boundary separating a textured and a smooth region is denoted as the Au-TS-V class.

(e.g.) The class under the spliced category whose image blocks consist of an entirely homogeneous textured region is denoted as the Sp-T class.

Each class of image blocks is kept in a directory with the same name as its class name within the zip file.
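The naming convention above can be made concrete with a small Python sketch that enumerates the valid class names and splits a name into its parts. The identifiers (BlockClass, all_class_names, parse_class_name) are hypothetical helpers, not part of the dataset distribution. The sketch assumes that the homogeneous subcategories (T) and (S) never carry an orientation suffix, as in the Sp-T example.

```python
from itertools import product
from typing import List, NamedTuple, Optional

CATEGORIES = ("Au", "Sp")
HOMOGENEOUS = ("T", "S")          # no object boundary, hence no orientation
BOUNDARY = ("TS", "TT", "SS")     # subcategories with an object boundary
ORIENTATIONS = ("V", "H", "O")


class BlockClass(NamedTuple):
    category: str
    subcategory: str
    orientation: Optional[str]    # None for the homogeneous subcategories


def all_class_names() -> List[str]:
    """Enumerate every class name allowed by the naming convention."""
    names = [f"{c}-{s}" for c, s in product(CATEGORIES, HOMOGENEOUS)]
    names += [f"{c}-{s}-{o}"
              for c, s, o in product(CATEGORIES, BOUNDARY, ORIENTATIONS)]
    return names


def parse_class_name(name: str) -> BlockClass:
    """Split a class (directory) name such as 'Au-TS-V' into its parts."""
    parts = name.split("-")
    if len(parts) == 2 and parts[0] in CATEGORIES and parts[1] in HOMOGENEOUS:
        return BlockClass(parts[0], parts[1], None)
    if (len(parts) == 3 and parts[0] in CATEGORIES
            and parts[1] in BOUNDARY and parts[2] in ORIENTATIONS):
        return BlockClass(parts[0], parts[1], parts[2])
    raise ValueError(f"not a valid class name: {name!r}")
```

For example, parse_class_name("Au-TS-V") yields category "Au", subcategory "TS" and orientation "V". Under the stated assumption, all_class_names() lists 22 classes: 4 homogeneous classes plus 18 boundary classes.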

Image Sources

The original images used in this dataset consist of 312 images from the CalPhotos collection and 10 images that we captured using a digital camera.

We consider images that come directly out of imaging devices as authentic, and images resulting from the splicing of image regions (from the same or different images) as spliced. Further discussion and definitions of image authenticity can be found in [NgChang'04a]. We assume that images from the CalPhotos site are the original data captured by imaging devices (cameras or scanners), without editing or post-processing. However, this assumption has not been confirmed with the original image providers.

The following table lists the IDs of the images taken from the CalPhotos collection. CalPhotos images are originally organized in a directory structure in which the directory names are number codes.

Table 2: IDs of images from the CalPhotos image collection

Directory | Image IDs | No. of Images
--- | --- | ---
0162_2013\0556 | All (86 images) | 86
0000_0000\0101 | All (172 images) | 172
0000_0000\0001 | All (3 images) | 3
0000_0000\0201 | 0023, 0057, 0058, 0126, 0122, 0125 | 6
0000_0000\0301 | 0064, 0127 | 2
0000_0000\0401 | 0012, 0021 | 2
0000_0000\0600 | 0007 | 1
0000_0000\0716 | 0001 | 1
0000_0000\0800 | 0099, 0125, 0166, 0177 | 4
0000_0000\0999 | 0003, 0016, 0012 | 3
0000_0000\1000 | 0220, 0358, 0362, 0383, 0389, 0437 | 6
0000_0000\1100 | 0046 | 1
0000_0000\1200 | 0254, 0258 | 2
0024_3291\1997 | 0015, 0016, 0023, 0063, 0070, 0088, 0095, 0097, 0137 | 9
0024_3291\1998 | 0136 | 1
0072_3301\1163 | 0134 | 1
0215 | 0040 | 1
1618 | 0007 | 1
1622 | 0038, 0063 | 2
1626 | 0010, 0011, 0012 | 3
1878 | 0064, 0065, 0066 | 3
1111_1111\1111 | 0007, 0181 | 2
Total | | 312

To further diversify the image types, we used a digital camera to capture 10 additional images. We mainly use this small set of images to populate more image blocks for the Au-SS-V/O and Au-TS-H/V/O classes. Table 3 gives more detailed information about the 10 digital images, and Figure 2 shows two examples of them.

Table 3: Characteristics of the 10 digital images captured by using a digital camera

Camera Model: Canon PowerShot S40

Image Size: 1600 x 1200

Image Format: JPEG

Image Blocks Generated From the 10 Images:
Au-SS-O (blk_11_57.bmp - blk_11_76.bmp)
Au-SS-V (blk_10_32.bmp - blk_10_36.bmp)
Au-TS-H (blk_3_86.bmp - blk_3_110.bmp)
Au-TS-O (blk_5_138.bmp - blk_5_163.bmp)
Au-TS-V (blk_4_58.bmp - blk_4_95.bmp)
Au-TT-O (blk_8_77.bmp - blk_8_78.bmp)
Au-TT-V (blk_7_22.bmp - blk_7_32.bmp)
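The file ranges in Table 3 can be expanded into explicit file names, for instance to separate camera-derived blocks from CalPhotos-derived blocks. The Python sketch below is one way to do this. It assumes that each range is inclusive and numbered without gaps, and that file names follow the pattern blk_<prefix>_<index>.bmp seen in the table. Neither assumption is confirmed by this description, and the helper name expand_range is hypothetical.

```python
import re
from typing import List

# Ranges copied from Table 3: class name -> (first file, last file).
CAMERA_BLOCK_RANGES = {
    "Au-SS-O": ("blk_11_57.bmp", "blk_11_76.bmp"),
    "Au-SS-V": ("blk_10_32.bmp", "blk_10_36.bmp"),
    "Au-TS-H": ("blk_3_86.bmp", "blk_3_110.bmp"),
    "Au-TS-O": ("blk_5_138.bmp", "blk_5_163.bmp"),
    "Au-TS-V": ("blk_4_58.bmp", "blk_4_95.bmp"),
    "Au-TT-O": ("blk_8_77.bmp", "blk_8_78.bmp"),
    "Au-TT-V": ("blk_7_22.bmp", "blk_7_32.bmp"),
}

_NAME = re.compile(r"blk_(\d+)_(\d+)\.bmp")


def expand_range(first: str, last: str) -> List[str]:
    """List file names from first to last (assumed inclusive and gap-free)."""
    m_first, m_last = _NAME.fullmatch(first), _NAME.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError(f"unexpected file range: {first!r} - {last!r}")
    prefix = m_first.group(1)
    start, stop = int(m_first.group(2)), int(m_last.group(2))
    return [f"blk_{prefix}_{i}.bmp" for i in range(start, stop + 1)]


# Number of camera-derived blocks per class, under the assumptions above.
camera_counts = {cls: len(expand_range(*rng))
                 for cls, rng in CAMERA_BLOCK_RANGES.items()}
```

Under these assumptions the 10 camera images would account for 127 of the 933 authentic blocks, most of them in the Au-TS classes.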

Figure 2: Examples from the 10 images that we took and used for generating image blocks in the dataset

Dataset Production Procedure

In this section, we describe the procedures used in generating the authentic and spliced image blocks.

For the authentic category, the image blocks are cropped directly from the CalPhotos images and the images captured by the camera, without any further manipulation.

For the spliced category, we first create composite images by object-based image splicing, i.e., cropping an object from one image and pasting it into another image, without any post-processing such as edge smoothing. Then, we crop image blocks that contain part of the splicing boundary from the composite image. These become the image blocks of the (Sp-TS-x), (Sp-TT-x) and (Sp-SS-x) classes, where x represents V, H or O. The cut and paste of image objects is performed using Adobe Photoshop 6.0 on Windows XP. When cutting an object from an image, the outline of the object is traced using the Magnetic Lasso tool. Plain cut and paste commands are used for transferring the cut object to another image.

The image blocks of the (Sp-T) and (Sp-S) classes (i.e., spliced image blocks of homogeneous textured and smooth regions) are created from the corresponding image blocks of the (Au-T) and (Au-S) classes (i.e., authentic image blocks of homogeneous textured and smooth regions), respectively. Specifically, given an image block of the (Au-T) or (Au-S) class, the corresponding image block for the (Sp-T) or (Sp-S) class is produced by copying a vertical or horizontal strip 20 pixels wide (the strip orientation is chosen at random) from a randomly selected location within the given image block and pasting it at another randomly selected but different location within the same image block (see Figure 3). The rationale for producing (Sp-T) and (Sp-S) blocks in this way is to imitate the likely scenario of an image forger patching up the void left behind after removing an object from an image. In this case, the void is likely to be covered up using small patches similar to the homogeneous background.

Figure 3: The process for creating image blocks of the (Sp-T) and (Sp-S) classes from the corresponding image blocks of the (Au-T) and (Au-S) classes, respectively. The example shows the creation of image block no. 10 of (Sp-T) from image block no. 10 of (Au-T).
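The strip-based procedure for the (Sp-T) and (Sp-S) classes can be sketched in Python as follows. This is a minimal illustration, not the authors' actual tool. It assumes NumPy, a strip that spans the full width or height of the block, uniform random choice of orientation and positions, and that the source and destination strips may overlap as long as their positions differ. The function name splice_strip is hypothetical.

```python
import numpy as np

STRIP_WIDTH = 20  # strip width in pixels, as stated in the text


def splice_strip(block, rng, strip_width=STRIP_WIDTH):
    """Copy one random strip of `block` to a different position in a copy of it.

    Returns the spliced copy and (axis, src, dst) for inspection, where
    axis 0 means a horizontal strip (rows) and axis 1 a vertical strip (columns).
    """
    spliced = block.copy()
    axis = int(rng.integers(2))                   # random strip orientation
    limit = block.shape[axis] - strip_width + 1   # number of valid start offsets
    src = int(rng.integers(limit))
    dst = int(rng.integers(limit))
    while dst == src:                             # destination must differ from source
        dst = int(rng.integers(limit))
    if axis == 0:
        spliced[dst:dst + strip_width, :] = block[src:src + strip_width, :]
    else:
        spliced[:, dst:dst + strip_width] = block[:, src:src + strip_width]
    return spliced, (axis, src, dst)


# Usage on a synthetic 128 x 128 block with distinct pixel values.
rng = np.random.default_rng(0)
authentic = np.arange(128 * 128).reshape(128, 128)
spliced, (axis, src, dst) = splice_strip(authentic, rng)
```

The authentic block is left unchanged, so each authentic block yields exactly one spliced counterpart, consistent with the equal (T) and (S) counts in Table 1.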

Technical Report

A Data Set of Authentic and Spliced Image Blocks,
Tian-Tsong Ng, Shih-Fu Chang,
ADVENT Technical Report #203-2004-3 Columbia University, June 2004

References

[1]
A Model for Image Splicing,
Tian-Tsong Ng, Shih-Fu Chang,
IEEE International Conference on Image Processing (ICIP), Singapore, October 2004

[2]
Blind Detection of Photomontage Using Higher Order Statistics,
Tian-Tsong Ng, Shih-Fu Chang, Qibin Sun,
IEEE International Symposium on Circuits and Systems (ISCAS), Vancouver, Canada, May 2004

[3]
Blind Detection of Digital Photomontage using Higher Order Statistics,
Tian-Tsong Ng, Shih-Fu Chang,
ADVENT Technical Report #201-2004-1 Columbia University, June 2004

[4]
A database of photos of plants, animals, habitats and other natural history subjects,
CalPhotos,
http://elib.cs.berkeley.edu/photos/, 2000

[5]
Detecting Digital Forgeries Using Bispectral Analysis,
H. Farid,
MIT AI Memo AIM-1657, MIT, 1999

[6]
A Picture Tells a Thousand Lies,
H. Farid,
New Scientist, vol. 179, pp. 38-41, 2003

[7]
When Is Seeing Believing?,
W. J. Mitchell,
Scientific American, pp. 44-49, 1994

People

Tian-Tsong Ng

Shih-Fu Chang

Acknowledgements

We sincerely thank Ginger Ogle of the Berkeley Digital Library Project for her help in contacting the original photographers and transferring the image files.

Links

TrustFoto

Research Description Page - Image Splicing Detection Using Higher Order Statistics

Columbia University Digital Video and Multimedia Lab

Columbia University Graphics Lab