Summary

We have observed that many popular images on the web are frequently copied,
manipulated, and re-posted by many different web authors. This aggregate
behavior essentially gives rise to a "family tree" for a given
image, where parent-child relationships emerge from having a child image
derived from the parent through some editing action. The emergent structure
of this family tree, which we call a visual migration map (VMM), can have
a number of interesting attributes. The top-most node might be the closest
to the original version of the image. The leaf nodes, with high amounts
of visual manipulation, might have completely divergent content, which
might change the meaning of the original image or the perspective that
it conveys. We propose that many of these image editing actions leave
artifacts in the visual content of the image that can be used as cues
to infer if a parent-child relationship might exist between any two images
and to automatically extract an approximation of the image editing history
across a plurality of examples of a given image.
Visual Migration Maps
In the figure below, we show an example of what a visual migration map
might look like for a given image. Here, each particular image within
the graph would represent an instance of the image that a user on the
web has created. The edges between the images imply a parent-child relationship
between the two images: the child was derived from the parent through
a series of visual manipulations. At the top of the graph, we see the
highest-resolution image with the largest crop area, which is most likely
the closest to the original instance of the image, from which all other
images are descended. At the bottom-right, we see some highly-manipulated
versions of the image where external information has been overlaid on
the original content. These may be of interest, since their content is
highly divergent from the original image and they may represent subversions
of the viewpoint conveyed by the original. At the bottom-left, we see
other highly-manipulated versions of the image, where information was
only removed from the original (either by scaling-down or by cropping).
These instances may be of much less interest.

Automatically Constructing Image Histories
We propose that it is infeasible to ever create a true visual migration
map for any given image, since, without the specific knowledge of image
sources and manipulations provided by the image author, we can never definitively
know the parent-child relationship (or if one even exists) between any
two images. We do suggest, however, that aspects of the visual content
of images can provide important cues about the parent-child relationships
between images and that, specifically, certain contradictions in the visual
content can lead us to definitively state that it is impossible
that certain image pairs have parent-child relationships and that it is
plausible that these relationships exist between other pairs.
We propose that many types of image manipulations imply a directionality
between the two images. Some examples of these manipulations are shown
in the figure below. The manipulations are defined as follows. Scaling
is the creation of a smaller, lower-resolution version of the image by
decimating the larger image. In general, the smaller-scale image is assumed
to be derived from the larger-scale image, since scaling down usually
preserves image quality. Cropping is the creation
of a new image out of a subsection of the original image. The image with
the smaller crop area is assumed to have been derived from the image with
the larger crop area. Grayscale is the removal of color
from an image. We generally assume that the grayscale images are derived
from color images. Overlay is the addition of text information
or some segment of an external image on top of the original image. It
is generally assumed that the image containing the overlay is derived
from an image where the overlay is absent. Insertion
is the process of inserting the image inside of another image. Typical
examples might be creating an image with two distinct images placed side
by side or inserting the image in some border with additional external
information. It is assumed that the image resulting from the insertion
is derived from the other image. Of course, there are exceptions in the
directions of each of these manipulations: it is possible, though not
ideal, to scale images up, or an overlay could be removed
with retouching software. Still, we assume the directions that we have
specified are true in most cases.
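
As an illustration, the directional assumptions above could be encoded roughly as follows. This is a minimal sketch, not the detection method itself: the `ImageInfo` fields and the `implied_directions` function are hypothetical names, and we assume that some earlier step has already estimated each attribute from the visual content of the two copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageInfo:
    """Hypothetical per-image summary, assumed to come from an earlier detector."""
    name: str
    resolution: float   # pixels per unit of scene, comparable across crops
    crop_area: float    # fraction of the scene still visible, 0..1
    is_color: bool
    has_overlay: bool
    is_inserted: bool   # image sits inside a larger composite or border


def implied_directions(a, b):
    """Return {edit: (parent_name, child_name)} for each edit that differs."""
    votes = {}

    def vote(edit, a_is_parent):
        parent, child = (a, b) if a_is_parent else (b, a)
        votes[edit] = (parent.name, child.name)

    if a.resolution != b.resolution:
        vote("scaling", a.resolution > b.resolution)    # larger scale is the parent
    if a.crop_area != b.crop_area:
        vote("cropping", a.crop_area > b.crop_area)     # larger crop area is the parent
    if a.is_color != b.is_color:
        vote("grayscale", a.is_color)                   # color version is the parent
    if a.has_overlay != b.has_overlay:
        vote("overlay", not a.has_overlay)              # overlay-free version is the parent
    if a.is_inserted != b.is_inserted:
        vote("insertion", not a.is_inserted)            # stand-alone version is the parent
    return votes


# Illustrative values only.
original = ImageInfo("original", 4.0, 1.0, True, False, False)
thumbnail = ImageInfo("thumbnail", 1.0, 0.8, False, False, False)
print(implied_directions(original, thumbnail))
```

Edits that do not differ between the two images cast no vote, so an exact duplicate pair yields an empty dictionary.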

Once we have the knowledge of the implied directionality of each of these
edits, we can then check whether or not they tell a consistent story.
If one edit type implies that image A is derived from image B and another
edit type implies just the opposite, then it is highly unlikely that any
parent-child relationship exists at all. If all of the edit types agree,
then it is plausible that the parent-child relationship is actually true.
The figure below shows examples of cases where these edits are either
consistent or inconsistent.
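
The consistency check described above can be sketched in a few lines. The function name `judge_pair` and the three status labels are our own illustrative choices; the input is assumed to be a mapping from each detected edit type to the (parent, child) direction it implies.

```python
def judge_pair(votes):
    """Decide whether the detected edits tell a consistent story.

    `votes` maps an edit name to the (parent, child) pair that edit implies.
    """
    directions = set(votes.values())
    if not directions:
        return ("no_evidence", None)          # no detected edit differs
    if len(directions) == 1:
        return ("plausible", directions.pop())  # every edit agrees on one direction
    return ("implausible", None)              # edits contradict each other


consistent = {"scaling": ("A", "B"), "grayscale": ("A", "B")}
conflicting = {"scaling": ("A", "B"), "overlay": ("B", "A")}
print(judge_pair(consistent))   # ('plausible', ('A', 'B'))
print(judge_pair(conflicting))  # ('implausible', None)
```

Note that a "plausible" result only means that no contradiction was found; it does not confirm that the relationship actually exists.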

The resulting parent-child relationship decisions across a plurality
of examples of the image will then give rise to a graph structure, which
is the automatically-constructed visual migration map. The following figure
demonstrates how these pair-wise cues across the four example images above
result in a visual migration map.
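
A rough sketch of how pair-wise decisions might be assembled into a graph is given below. The function `build_migration_map`, the image names, and the example edges are all hypothetical; we simply assume that each plausible relationship arrives as a (parent, child) pair.

```python
def build_migration_map(image_names, plausible_edges):
    """Build a directed graph as {image: set of child images}."""
    graph = {name: set() for name in image_names}
    for parent, child in plausible_edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())  # make sure leaf images appear as nodes
    return graph


# Illustrative input: four images and the pairs judged plausible.
names = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D")]
vmm = build_migration_map(names, edges)
for parent in sorted(vmm):
    print(parent, "->", sorted(vmm[parent]))
```

An adjacency dictionary is used here only because it keeps the example free of dependencies; any directed-graph representation would serve.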

Experiments and Applications
We apply the above-described approach to images crawled from the web.
Given a set of images that are found to be copies of each other, we automatically
detect the parent-child relationships between each pair of images
and arrive at an automatically-generated visual migration map. We apply
the approach to 22 image sets related to political figures and find that
the emerging automatically-detected migration maps are highly similar
to migration maps generated through human annotations. An example of one
such automatic map is shown below.

We see that despite some errors in the detection of edits and parent-child
relationships, the overall structure of the graph is similar to what we
would like. In particular, the parent node is similar to the original image
in that it has the highest resolution and largest crop area. We also see
that the interesting highly-manipulated images are all located on sink
nodes. Below, we show some example results of finding the most original
and most manipulated images directly from these visual migration maps.
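
Reading candidates off a migration map could look something like the sketch below. It assumes the map is stored as a dictionary from each image to the set of its children; the function name `roots_and_sinks` and the example graph are illustrative. Nodes without parents are candidates for the most original image, and nodes without children (sink nodes) are candidates for the most manipulated ones.

```python
def roots_and_sinks(graph):
    """Return (roots, sinks) for a graph given as {image: set of children}."""
    has_parent = {child for kids in graph.values() for child in kids}
    roots = sorted(n for n in graph if n not in has_parent)  # no incoming edges
    sinks = sorted(n for n in graph if not graph[n])         # no outgoing edges
    return roots, sinks


# Illustrative graph only.
vmm = {"A": {"B", "C"}, "B": {"D"}, "C": set(), "D": set()}
roots, sinks = roots_and_sinks(vmm)
print(roots)  # ['A']
print(sinks)  # ['C', 'D']
```

Not every sink is necessarily interesting: an image that was only scaled down or cropped also ends up as a sink, so in practice the detected edit types would still need to be inspected. An isolated image with no edges is reported as both a root and a sink.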

We further explore the results by manually labeling each of the images
as either highly manipulated or original. We then examine the webpages
that originally referenced the images and label the ideological
viewpoint of each webpage as positive, neutral, or negative
with respect to the image. We observe that, indeed, there is a correlation
between the addition of high degrees of manipulation and the subversion
of the meaning or viewpoint conveyed by an image. Some examples of these
correlations are shown in the figure below.

People

Publication

- Lyndon Kennedy, Shih-Fu Chang. Internet Image Archaeology: Automatically
Tracing the Manipulation Histories of Images on the Web. ACM Multimedia
2008, Vancouver, Canada, October 2008. [pdf]

Last updated: May 15th, 2008.