Summary

We have observed that many popular images on the web are frequently copied, manipulated, and re-posted by many different web authors. This aggregate behavior essentially gives rise to a "family tree" for a given image, where parent-child relationships emerge from having a child image derived from the parent through some editing action. The emergent structure of this family tree, which we call a visual migration map (VMM), can have a number of interesting attributes. The top-most node might be the closest to the original version of the image. The leaf nodes, with high amounts of visual manipulation, might have completely divergent content, which might change the meaning of the original image or the perspective that it conveys. We propose that many of these image editing actions leave artifacts in the visual content of the image that can be used as cues to infer whether a parent-child relationship might exist between any two images and to automatically extract an approximation of the image editing history across a plurality of examples of a given image.

Visual Migration Maps

In the figure below, we show an example of what a visual migration map might look like for a given image. Here, each image within the graph represents an instance of the image that a user on the web has created. The edges between the images imply a parent-child relationship between the two images: the child was derived from the parent through a series of visual manipulations. At the top of the graph, we see the highest-resolution image with the largest crop area, which is most likely the closest to the original instance of the image, from which all other images are descended. At the bottom-right, we see some highly-manipulated versions of the image where external information has been overlaid on the original content. These may be of interest, since their content is highly divergent from the original image and they may represent subversions of the viewpoint conveyed by the original. At the bottom-left, we see other highly-manipulated versions of the image, where information was only removed from the original (either by scaling down or by cropping). These instances may be of much less interest.

Automatically Constructing Image Histories

We believe that it is infeasible to ever create a true visual migration map for any given image, since, without the specific knowledge of image sources and manipulations provided by the image author, we can never definitively know the parent-child relationship (or if one even exists) between any two images. We do suggest, however, that aspects of the visual content of images can provide important cues about the parent-child relationships between images. Specifically, certain contradictions in the visual content can lead us to rule out a parent-child relationship for some image pairs and to judge such a relationship plausible for other pairs.

We propose that many types of image manipulations imply a directionality between the two images. Some examples of these manipulations are shown in the figure below. The manipulations are defined as follows. Scaling is the creation of a smaller, lower-resolution version of the image by decimating the larger image. In general, the smaller-scale image is assumed to be derived from the larger-scale image, as this usually results in preservation of image quality. Cropping is the creation of a new image out of a subsection of the original image. The image with the smaller crop area is assumed to have been derived from the image with the larger crop area. Grayscale is the removal of color from an image. We generally assume that grayscale images are derived from color images. Overlay is the addition of text information or some segment of an external image on top of the original image. It is generally assumed that the image containing the overlay is derived from an image where the overlay is absent. Insertion is the process of inserting the image inside of another image. Typical examples might be creating an image with two distinct images placed side by side or inserting the image in some border with additional external information. It is assumed that the image resulting from the insertion is derived from the other image.

Of course, there are exceptions in the directions of each of these manipulations: it is possible, though not ideal, to scale images up, and an overlay could be removed with retouching software. Still, we assume the directions that we have specified are true in most cases.
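
The directional assumptions above can be written down as a small set of rules. The following is a minimal sketch, not the detection method itself: it assumes that each image has already been reduced to a few measurements, and the ImageInfo fields and the edit_directions function are hypothetical names introduced only for illustration. Estimating these measurements from pixels is the hard part and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ImageInfo:
    """Hypothetical per-image measurements (assumed to be estimated elsewhere)."""
    name: str
    scale: float        # relative resolution; larger means higher resolution
    crop_area: float    # fraction of the original scene that is visible
    is_color: bool      # False for grayscale images
    has_overlay: bool   # text or external image content placed on top
    is_inserted: bool   # image sits inside a larger composite or border


def _pick(a_is_parent: bool, b_is_parent: bool) -> Optional[str]:
    """Name the implied parent, or None when this edit type gives no cue."""
    if a_is_parent:
        return "a"
    if b_is_parent:
        return "b"
    return None


def edit_directions(a: ImageInfo, b: ImageInfo) -> Dict[str, Optional[str]]:
    """For each edit type, report which image the cue implies is the parent."""
    return {
        # The larger-scale image is assumed to be the parent.
        "scaling": _pick(a.scale > b.scale, b.scale > a.scale),
        # The image with the larger crop area is assumed to be the parent.
        "cropping": _pick(a.crop_area > b.crop_area, b.crop_area > a.crop_area),
        # The color image is assumed to be the parent of the grayscale one.
        "grayscale": _pick(a.is_color and not b.is_color,
                           b.is_color and not a.is_color),
        # The image without the overlay is assumed to be the parent.
        "overlay": _pick(b.has_overlay and not a.has_overlay,
                         a.has_overlay and not b.has_overlay),
        # The image that was not inserted into a composite is assumed to be the parent.
        "insertion": _pick(b.is_inserted and not a.is_inserted,
                           a.is_inserted and not b.is_inserted),
    }


original = ImageInfo("original", 1.0, 1.0, True, False, False)
small_gray = ImageInfo("small_gray", 0.5, 1.0, False, False, False)
directions = edit_directions(original, small_gray)
```

In this toy pair, the scaling and grayscale cues both point to the first image as the parent, while the remaining cues are silent. Measured values would be noisy in practice, so the exact comparisons here would likely need tolerances.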

Once we have the knowledge of the implied directionality of each of these edits, we can then check whether or not they tell a consistent story. If one edit type implies that image A is derived from image B and another edit type implies just the opposite, then it is highly unlikely that any parent-child relationship exists at all. If all of the edit types agree, then it is plausible that the parent-child relationship is actually true. The figure below shows examples of cases where these edits are either consistent or inconsistent.
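
The consistency check described above amounts to asking whether all available cues vote for the same parent. The sketch below assumes each cue has already been reduced to "a", "b", or None (no evidence); the function name and the returned labels are hypothetical, and the treatment of pairs with no cues at all is our own assumption rather than something specified here.

```python
from typing import Iterable, Optional


def infer_relationship(cues: Iterable[Optional[str]]) -> str:
    """Combine per-edit direction cues ('a', 'b', or None) into one decision."""
    votes = {cue for cue in cues if cue is not None}
    if votes == {"a"}:
        return "a_is_parent"      # every informative cue agrees on a
    if votes == {"b"}:
        return "b_is_parent"      # every informative cue agrees on b
    if not votes:
        return "no_evidence"      # assumption: no cue fired, so no edge is drawn
    return "inconsistent"         # cues disagree: a relationship is highly unlikely


consistent = infer_relationship(["a", None, "a", None, None])
contradictory = infer_relationship(["a", "b", None, None, None])
```

Here the first pair is judged plausible, with image a as the parent, and the second is rejected because the cues contradict each other.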

The resulting parent-child relationship decisions across a plurality of examples of the image will then give rise to a graph structure, which is the automatically-constructed visual migration map. The following figure demonstrates how these pair-wise cues across the four example images above result in a visual migration map.
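
As a rough illustration of how pairwise decisions turn into a graph, the sketch below visits every pair of images and records an edge whenever a parent is judged plausible. The build_migration_map function and the parent_of callback are hypothetical names, and the toy callback simply prefers the larger image. This version keeps every plausible edge, including ones that are implied transitively; any pruning of such edges is left out.

```python
from itertools import combinations
from typing import Callable, Dict, List, Optional


def build_migration_map(
    images: List[str],
    parent_of: Callable[[str, str], Optional[str]],
) -> Dict[str, List[str]]:
    """Return a parent -> children adjacency map from pairwise decisions.

    parent_of(x, y) must return x or y as the plausible parent, or None
    when no parent-child relationship is plausible for the pair.
    """
    children: Dict[str, List[str]] = {image: [] for image in images}
    for x, y in combinations(images, 2):
        parent = parent_of(x, y)
        if parent is None:
            continue  # inconsistent or uninformative pair: no edge
        child = y if parent == x else x
        children[parent].append(child)
    return children


# Toy stand-in for the real pairwise detector: the larger image is the parent.
toy_sizes = {"A": 3, "B": 2, "C": 2}


def larger_is_parent(x: str, y: str) -> Optional[str]:
    if toy_sizes[x] == toy_sizes[y]:
        return None
    return x if toy_sizes[x] > toy_sizes[y] else y


migration_map = build_migration_map(["A", "B", "C"], larger_is_parent)
```

With these toy sizes, A becomes the parent of both B and C, and no edge is drawn between B and C.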

Experiments and Applications

We apply the above-described approach to images crawled from the web. Given a set of images that are found to be copies of each other, we automatically detect the parent-child relationships between each pair of images and arrive at an automatically-generated visual migration map. We apply the approach to 22 image sets related to political figures and find that the emerging automatically-detected migration maps are highly similar to migration maps generated through human annotations. An example of one such automatic map is shown below.

We see that despite some errors in the detection of edits and parent-child relationships, the overall structure of the graph is similar to what we would like. Particularly, the parent node is similar to the original image in that it has the highest resolution and largest crop area. We also see that the interesting highly-manipulated images are all located on sink nodes. Below, we show some example results of finding the most original and most manipulated images directly from these visual migration maps.
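
Reading candidates for the most original and most manipulated images off a migration map can be sketched as finding nodes with no parent and nodes with no children. The code below assumes the map is stored as a parent -> children dictionary; the roots_and_sinks function and the example graph are hypothetical. Deciding which sink nodes are actually interesting (for example, overlays rather than plain crops) would need additional information that is not modeled here.

```python
from typing import Dict, List, Tuple


def roots_and_sinks(children: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (nodes with no parent, nodes with no children)."""
    has_parent = {child for kids in children.values() for child in kids}
    roots = [node for node in children if node not in has_parent]
    sinks = [node for node, kids in children.items() if not kids]
    return roots, sinks


# Hypothetical map: A is the source; C and D are leaves.
example_map = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
roots, sinks = roots_and_sinks(example_map)
```

Note that an image with no edges at all would be reported as both a root and a sink under this definition.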

We further explore the results by manually labeling each of the images as either highly-manipulated or original. We also examine the webpages that originally referenced the images and label the ideological viewpoint of each webpage as positive, neutral, or negative with respect to the image. We observe that, indeed, there is a correlation between the addition of high degrees of manipulation and the subversion of the meaning or viewpoint conveyed by an image. Some examples of these correlations are shown in the figure below.

Acknowledgment

This material is based upon work supported by the National Science Foundation under Grant No. 0716203. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

People

Lyndon Kennedy, Shih-Fu Chang.

Publication

Lyndon Kennedy, Shih-Fu Chang. Internet Image Archaeology: Automatically Tracing the Manipulation Histories of Images on the Web. ACM Multimedia 2008, Vancouver, Canada, October 2008.

Last updated: May 15th, 2008.