Conceptual Structures and Computational Methods for Indexing and Organization of Visual Information

Alejandro Jaimes


We address the problem of automatic indexing and organization of visual information through user interaction at multiple levels. Our work focuses on the following three important areas: (1) understanding of visual content and the way users search and index it; (2) construction of flexible computational methods that learn how to automatically classify images and videos from user input at multiple levels; (3) integration of generic visual detectors in solving practical tasks in the specific domain of consumer photography.

In particular, we present the following: (1) novel conceptual structures for classifying visual attributes (the Multi-Level Indexing Pyramid); (2) a novel framework for learning structured visual detectors from user input (the Visual Apprentice); (3) a new study of human eye movements in observing images of different visual categories; (4) a new framework for the detection of non-identical duplicate consumer photographs in an interactive consumer image organization system; (5) a detailed study of duplicate consumer photographs.

In the Visual Apprentice (VA), first a user defines a model via a multiple-level definition hierarchy (a scene consists of objects, object-parts, etc.). Then, the user labels example images or videos based on the hierarchy (a handshake image contains two faces and a handshake) and visual features are extracted from each example. Finally, several machine learning algorithms are used to learn classifiers for different nodes of the hierarchy. The best classifiers and features are automatically selected to produce a Visual Detector (e.g., for a handshake), which is applied to new images or videos.

In the human eye tracking experiments we examine variations in the way people look at images within and across different visual categories and explore ways of integrating eye tracking analysis with the VA framework.

Finally, we present a novel framework for the detection of non-identical duplicate consumer images for systems that help users automatically organize their collections. Our approach is based on a multiple strategy that combines knowledge about the geometry of multiple views of the same scene, the extraction of low-level features, the detection of objects using the VA and domain knowledge.