AMOS is a video object segmentation
and retrieval system. In this framework, a video object (e.g. person,
car) is modeled and tracked as a set of regions with corresponding visual
features and spatio-temporal relations. The region-based model also provides
an effective base for similarity retrieval of video objects.

AMOS effectively combines user
input and automatic region segmentation for defining and tracking video
objects at a semantic level. First, the user roughly outlines the contour
of an object at the starting frame, which is used to create a video object
with underlying homogeneous regions. This process is based on a region
segmentation method that involves color and edge features and a region
aggregation method that classifies regions into foreground and background.
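As a rough illustration of the aggregation step, the sketch below labels each segmented region as foreground when most of its pixels fall inside the user's outline. The overlap rule, the 0.5 threshold, and the names `point_in_polygon` and `classify_regions` are assumptions made for this example; the description above does not say how AMOS makes this decision.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_regions(regions, outline, threshold=0.5):
    """Map each region id to 'foreground' or 'background'.

    regions: dict of region id -> non-empty list of (x, y) pixels.
    outline: list of (x, y) vertices of the user-drawn contour.
    threshold: assumed fraction of pixels that must lie inside the outline.
    """
    labels = {}
    for region_id, pixels in regions.items():
        inside = sum(point_in_polygon(x, y, outline) for x, y in pixels)
        ratio = inside / len(pixels)
        labels[region_id] = "foreground" if ratio > threshold else "background"
    return labels


outline = [(0, 0), (10, 0), (10, 10), (0, 10)]
regions = {
    "a": [(2, 2), (3, 3), (4, 4)],      # entirely inside the outline
    "b": [(9, 9), (12, 12), (13, 13)],  # mostly outside the outline
}
labels = classify_regions(regions, outline)
print(labels)
```

A rough outline is enough for a rule of this kind, since a region only has to lie mostly inside the contour to count as part of the object.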
Then, the object and the homogeneous regions are tracked through successive
frames. This process uses affine motion models to project regions from
frame to frame and color-based region growing to determine the final
projected regions. Users can stop the segmentation at any time to correct
the contour of video objects. Extensive experiments have demonstrated
excellent results. Most tracking errors are caused by uncovered regions
and can be corrected with a few user inputs.
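To make the projection step concrete, here is a minimal sketch of moving a region's pixels into the next frame with a six-parameter affine motion model, x' = a1*x + a2*y + a3 and y' = a4*x + a5*y + a6. The parameter ordering and the function name `project_region` are assumptions for illustration. How AMOS estimates the parameters, and how region growing refines the projected region, is not shown.

```python
def project_region(pixels, params):
    """Project (x, y) pixels into the next frame with an affine model.

    params = (a1, a2, a3, a4, a5, a6), giving
        x' = a1*x + a2*y + a3
        y' = a4*x + a5*y + a6
    Results are rounded to integer pixel positions and de-duplicated.
    """
    a1, a2, a3, a4, a5, a6 = params
    projected = set()
    for x, y in pixels:
        xp = a1 * x + a2 * y + a3
        yp = a4 * x + a5 * y + a6
        projected.add((round(xp), round(yp)))
    return projected


# A 2x2 region moved by a pure translation of (+3, -2).
region = [(0, 0), (1, 0), (0, 1), (1, 1)]
translation = (1, 0, 3, 0, 1, -2)
moved = project_region(region, translation)
print(sorted(moved))
```

The same function covers rotation, scaling, and shear by choosing other values for a1, a2, a4, and a5.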
AMOS also extracts salient
regions within video objects that users can interactively create and manipulate.
Visual features and spatio-temporal relations are computed for video objects
and salient regions and stored in a database for similarity matching.
The features include motion trajectory, dominant color, texture, shape,
and time descriptors. Currently three types of relations among the regions
of a video object are supported: orientation spatial (angle between two
regions), topological spatial (contains, does not contain, or inside),
and directional temporal (starts before, at the same time as, or after). Users
can enter textual annotations for the objects.

AMOS accepts queries in the form of sketches or examples and returns similar
video objects based on different features and relations. Candidate video
objects for a query are found with a filtering step followed by a joining
step. The first step finds a list of candidate regions
from the database for each query region based on the visual features.
Then, the region lists are joined to obtain candidate objects and their
total distance to the query is computed by matching the spatio-temporal relations.
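The filter-then-join idea can be sketched as follows. Each query region first selects database regions whose feature distance falls under a threshold; the per-region candidate lists are then joined on the owning object, so only objects that match every query region survive. The record layout, the single scalar `color` feature, the threshold, and the names `filter_candidates` and `join_candidates` are all hypothetical, and the relation-matching part of the distance is left out for brevity.

```python
from itertools import product

# Hypothetical database: each region belongs to one video object and
# carries one scalar feature (a stand-in for, say, dominant color).
DATABASE = [
    {"object": "car1", "region": "body", "color": 0.20},
    {"object": "car1", "region": "wheel", "color": 0.90},
    {"object": "person1", "region": "shirt", "color": 0.25},
    {"object": "person1", "region": "hair", "color": 0.50},
]


def filter_candidates(query_region, database, threshold=0.1):
    """Filtering step: database regions close to one query region."""
    result = []
    for rec in database:
        dist = abs(rec["color"] - query_region["color"])
        if dist <= threshold:
            result.append((rec, dist))
    return result


def join_candidates(candidate_lists):
    """Joining step: keep combinations whose regions share one object.

    Returns a dict of object -> smallest summed feature distance.
    """
    best = {}
    for combo in product(*candidate_lists):
        objects = {rec["object"] for rec, _ in combo}
        regions = {rec["region"] for rec, _ in combo}
        # All regions must come from the same object and be distinct.
        if len(objects) != 1 or len(regions) != len(combo):
            continue
        obj = objects.pop()
        total = sum(dist for _, dist in combo)
        if obj not in best or total < best[obj]:
            best[obj] = total
    return best


query = [{"color": 0.22}, {"color": 0.88}]
lists = [filter_candidates(q, DATABASE) for q in query]
best = join_candidates(lists)
ranked = sorted(best.items(), key=lambda kv: kv[1])
print(ranked)
```

Here `person1` passes the filter for the first query region but drops out in the join, because none of its regions matches the second query region.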
Last updated: June 12, 2002.