
on images, such as JPEG compression, resizing, the various in-camera image processing operations for PIM, and so on.
3. The robustness of the classifier to various computer graphics techniques, such as simulated camera depth-of-field (DoF) effects, soft shadows, and so on.
4. The robustness of the classifier to various adversarial attacks. When the algorithm of a classifier is known, an attacker may be able to pre-process a PRCG so that it is classified as a photographic image.
5. The sensitivity of the classifier to image content, in particular to ambiguous content such as recaptured PRCG or paintings, PRCG of natural scenes, PIM of artificial objects, and so on.

Apart from facilitating the evaluation of a PIM versus PRCG classifier according to the aspects listed above, a good dataset for the PIM and PRCG classification problem in the passive-blind image authentication setting should also model the authentic and the fake images well. Hence, we have to ensure that the authenticity of the PIM is reliable, and that the PRCG come from reliable sources and are of high photorealism.

The concern with the photorealism of PRCG arises because only PRCG of high photorealism will be used to fake PIM in realistic situations. Unfortunately, PRCG of high photorealism are not readily available in abundance on the Internet. There are many computer graphics on the Internet, but many of them are not truly photorealistic, so a conscious effort is needed to select only PRCG with high photorealism.

Besides that, we also need to make sure that the content of the PRCG is comparable to that of the PIM. The concern with content compatibility between PIM and PRCG is to ensure that we are comparing apples to apples. Otherwise, a trained classifier may overfit to the content discrepancy between the two image sets; for example, this can happen if the dataset contains mainly PIM of buildings and PRCG of forests. There are two ways to ensure that the content matches.
The first way is to narrowly restrict the image content in both the PIM and the PRCG sets; for example, we can restrict the dataset to images of vegetation only. The second way is to define a broader scope for the content but ensure content diversity within that scope, in order to lower the likelihood of content mismatch. In our case, we follow the second way: we define the content scope to be natural scenes and ensure content diversity within the defined scope.
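
The content-matching concern can be illustrated with a minimal Python sketch that compares the mix of content categories in the two image sets. It assumes that each image already carries a content-category label. The labels, the function names `category_distribution` and `content_mismatch`, and the choice of total variation distance as the mismatch measure are all assumptions made for illustration, not a procedure taken from this report.

```python
from collections import Counter


def category_distribution(labels):
    """Return the relative frequency of each content category."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def content_mismatch(pim_labels, prcg_labels):
    """Total variation distance between the two category distributions.

    0.0 means both sets have the same content mix;
    1.0 means the sets share no content category at all.
    """
    p = category_distribution(pim_labels)
    q = category_distribution(prcg_labels)
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical labels, for illustration only.
# Mismatched case: PIM mostly buildings, PRCG mostly forests.
mismatched_pim = ["building"] * 9 + ["forest"]
mismatched_prcg = ["forest"] * 9 + ["building"]

# Matched case: both sets spread evenly over the same natural-scene categories.
matched_pim = ["forest", "mountain", "lake", "field"] * 5
matched_prcg = ["forest", "mountain", "lake", "field"] * 3

print("mismatched sets:", content_mismatch(mismatched_pim, mismatched_prcg))
print("matched sets:   ", content_mismatch(matched_pim, matched_prcg))
```

A value close to 1 for the buildings-versus-forests case signals that a classifier could separate the two sets by content alone, whereas a value near 0 suggests that content is unlikely to be the discriminating cue. Such a check only covers the labelled categories; it says nothing about content differences that the labels do not capture.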