Introduction to Columbia Image Splicing Detection Evaluation Dataset
Introduction
Our objective is to compile a data set open to the research community so that the discovery and development of new technologies can be expedited. The current data set was collected with sample diversity in mind. It has 933 authentic and 912 spliced image blocks of size 128 x 128 pixels. The image blocks are extracted from images in the CalPhotos image set [CalPhotos'00]. The data set can be greatly improved in many ways, and should be considered a preliminary effort addressing the increasingly important topic of benchmarking.

Design Criteria
We emphasized the following points while creating the data set.
Copyright
The copyrights of the original images from the CalPhotos site are owned by the providers of the images. See http://elib.cs.berkeley.edu/photos/use.html for information about usage rights and other copyright issues. We thank the Berkeley Digital Library group for their generous support in making the original image set available for internal research. We are currently seeking permission from the respective owners of the images to release the data set as a research benchmark. The status of these permissions and information about download procedures will be posted on the following site: http://www.ee.columbia.edu/dvmm/downloads/AuthSplicedDataSet/AuthSplicedDataSet.htm

Structure of Data Set
The data set consists of 1845 image blocks of size 128 x 128 pixels. The data set has two main categories:
The Authentic and Spliced categories are each subdivided into five subcategories:
The subcategories (TS), (TT) and (SS) are each further subdivided into three sub-subcategories, according to the orientation of the object boundary:
Below are some typical image blocks in each subcategory of the data set:
Figure 1: Typical image blocks in the data set
The following table shows the number of image blocks in each subcategory:
Table 1: Numbers of image blocks in different subcategories
Class Notation and File Structure
A sub-subcategory is called a class. We denote a particular class with the following naming convention: (main-categories)-(sub-categories)-(orientation). For example, the class under the authentic category whose image blocks have a vertical object boundary separating a textured and a smooth region is denoted Au-TS-V. Likewise, the class under the spliced category whose image blocks consist of an entirely homogeneous textured region is denoted Sp-T. Within the zip file, each class of image blocks is kept in a directory with the same name as the class.

Image Sources
The original images used in this dataset consist of 312 images from the CalPhotos collection and 10 images that we captured using a digital camera. We consider images that come directly out of imaging devices as authentic, and images resulting from the splicing of image regions (of the same or different images) as spliced. Further discussion and definitions of image authenticity can be found in [NgChang'04a]. We assumed that images from the CalPhotos site are the original data captured by imaging devices (cameras or scanners), without editing or post-processing. However, this assumption has not been confirmed with the original image providers. The following table lists the IDs of the images taken from the CalPhotos collection. CalPhotos images are originally organized in a directory structure in which each directory name is a numeric code.
Table 2: IDs of images from the CalPhotos image collection.
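Returning to the class notation, the following is a minimal sketch that enumerates the class names (and hence the expected directory names) implied by the naming convention. The subcategory codes (T, S, TS, TT, SS) and orientation codes (V, H, O) are taken from the class names used in this document; the function name `class_names` and the constant names are hypothetical and only for illustration.

```python
# Hypothetical helper: enumerate class names following the
# (main-category)-(sub-category)-(orientation) convention.
MAIN_CATEGORIES = ["Au", "Sp"]              # authentic, spliced
PLAIN_SUBCATEGORIES = ["T", "S"]            # no orientation suffix
BOUNDARY_SUBCATEGORIES = ["TS", "TT", "SS"]  # contain an object boundary
ORIENTATIONS = ["V", "H", "O"]              # orientation codes used in the text


def class_names():
    """Return every class name implied by the naming convention."""
    names = []
    for main in MAIN_CATEGORIES:
        for sub in PLAIN_SUBCATEGORIES:
            names.append(f"{main}-{sub}")
        for sub in BOUNDARY_SUBCATEGORIES:
            for orientation in ORIENTATIONS:
                names.append(f"{main}-{sub}-{orientation}")
    return names


if __name__ == "__main__":
    for name in class_names():
        print(name)
```

Under these assumptions the convention yields 2 x (2 + 3 x 3) = 22 classes, which can serve as a quick check against the directories found in the zip file.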
To further diversify the image types, we used a digital camera to capture 10 additional images. We mainly used this small set of images to add image blocks to the Au-SS-V/O and Au-TS-H/V/O classes, as shown in Table 3, which also gives more detailed information about the 10 digital images. In addition, Figure 2 shows two examples of the digital images.
Table 3: Characteristics of the 10 images captured using a digital camera
Data Set Production Procedure
For the authentic category, the image blocks are cropped directly from the CalPhotos images and the images captured by the camera, without any further manipulation. For the spliced category, we first created composite images by object-based image splicing, i.e., cropping an object from one image and pasting it into another image, without any post-processing such as edge smoothing. We then cropped image blocks that contain part of the splicing boundary from the composite images to form the (Sp-TS-x), (Sp-TT-x) and (Sp-SS-x) classes, where x represents V, H or O. The cutting and pasting of image objects was performed using Adobe Photoshop 6.0 on Windows XP. When cutting an object from an image, the outline of the object was traced using the Magnetic Lasso tool. Plain cut and paste commands were used to transfer the cut object to another image.

The image blocks of the (Sp-T) and (Sp-S) classes (i.e., spliced image blocks of homogeneous textured and smooth regions) are created from the corresponding image blocks of the (Au-T) and (Au-S) classes (i.e., authentic image blocks of homogeneous textured and smooth regions), respectively. Specifically, given an image block of the (Au-T) or (Au-S) class, the corresponding image block of the (Sp-T) or (Sp-S) class is produced by copying a vertical or horizontal strip 20 pixels wide (the strip orientation is chosen at random) from a randomly selected location within the given image block and pasting it at another randomly selected, but different, location within the same image block (see Figure 3). The rationale for producing the (Sp-T) and (Sp-S) image blocks in this way is to imitate the likely scenario of an image forger patching up the void left behind after removing an object from an image.
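The strip copy-and-paste step described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used to build the data set: the text does not specify the strip length, the sampling distribution, or whether source and destination may overlap, so the sketch assumes the strip spans the full block, offsets are drawn uniformly, and overlap is allowed. The function name `splice_strip` and its return values are hypothetical.

```python
import random

STRIP_WIDTH = 20  # strip width in pixels, as stated in the text


def splice_strip(block, rng=None):
    """Copy one strip of a square block to a different location.

    `block` is a list of rows, each row a list of pixel values.
    Returns (new_block, vertical, src, dst); the input is not modified.
    Assumption: the strip spans the full height or width of the block.
    """
    rng = rng or random.Random()
    size = len(block)
    out = [row[:] for row in block]      # work on a copy
    vertical = rng.random() < 0.5        # random strip orientation
    last_start = size - STRIP_WIDTH      # last valid strip offset
    src = rng.randint(0, last_start)     # source offset
    dst = rng.randint(0, last_start)     # destination offset
    while dst == src:                    # destination must differ from source
        dst = rng.randint(0, last_start)
    for k in range(STRIP_WIDTH):
        if vertical:
            # copy column src+k onto column dst+k, reading from the original
            for r in range(size):
                out[r][dst + k] = block[r][src + k]
        else:
            # copy row src+k onto row dst+k, reading from the original
            out[dst + k] = block[src + k][:]
    return out, vertical, src, dst
```

Reading from the unmodified input while writing into the copy keeps the result well defined even when the source and destination strips overlap.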
In such a forgery, the void is likely to be covered up using small patches similar to the homogeneous background. (Note that the (Au-T) and (Au-S) classes consist of image blocks with a homogeneous textured or homogeneous smooth region.)
Figure 3: The process for creating image blocks of the (Sp-T) and (Sp-S) classes from the corresponding image blocks of the (Au-T) and (Au-S) classes, respectively. The example shows the creation of image block no. 10 of (Sp-T) from image block no. 10 of (Au-T).

References
Questions and Comments
Please forward any questions and comments
Research Work which uses the Dataset
Links
Acknowledgement
We sincerely thank Ginger Ogle of the Berkeley Digital Library Project for help in contacting the original photographers and transferring the image files.