dvmmPub97

Alejandro Jaimes. Conceptual Structures and Computational Methods for Indexing and Organization of Visual Information. PhD Thesis Graduate School of Arts and Sciences, Columbia University, 2003.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Note on this paper

Advisor: Prof. Chang

Abstract

We address the problem of automatic indexing and organization of visual information through user interaction at multiple levels. Our work focuses on the following three important areas: (1) understanding of visual content and the way users search and index it; (2) construction of flexible computational methods that learn how to automatically classify images and videos from user input at multiple levels; (3) integration of generic visual detectors in solving practical tasks in the specific domain of consumer photography. In particular, we present the following: (1) novel conceptual structures for classifying visual attributes (the Multi-Level Indexing Pyramid); (2) a novel framework for learning structured visual detectors from user input (the Visual Apprentice); (3) a new study of human eye movements in observing images of different visual categories; (4) a new framework for the detection of non-identical duplicate consumer photographs in an interactive consumer image organization system; (5) a detailed study of duplicate consumer photographs. In the Visual Apprentice (VA), first a user defines a model via a multiple-level definition hierarchy (a scene consists of objects, object-parts, etc.). Then, the user labels example images or videos based on the hierarchy (a handshake image contains two faces and a handshake) and visual features are extracted from each example. Finally, several machine learning algorithms are used to learn classifiers for different nodes of the hierarchy. The best classifiers and features are automatically selected to produce a Visual Detector (e.g., for a handshake), which is applied to new images or videos. In the human eye tracking experiments we examine variations in the way people look at images within and across different visual categories and explore ways of integrating eye tracking analysis with the VA framework. 
Finally, we present a novel framework for the detection of non-identical duplicate consumer images for systems that help users automatically organize their collections. Our approach is based on a multi-strategy method that combines knowledge about the geometry of multiple views of the same scene, the extraction of low-level features, the detection of objects using the VA, and domain knowledge
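
The Visual Apprentice workflow described in the abstract (a definition hierarchy, labeled examples per node, several learning algorithms, automatic selection of the best classifier) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not code from the thesis: the names Node, build_detector and LEARNERS are hypothetical, the two toy learners (nearest centroid and 1-nearest-neighbor) stand in for the unspecified "several machine learning algorithms", leave-one-out accuracy stands in for the unspecified selection criterion, and the feature vectors are made-up numbers. Feature selection, which the abstract also mentions, is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, bool]            # (feature vector, user-supplied label)
Classifier = Callable[[Vector], bool]
Learner = Callable[[List[Example]], Classifier]


@dataclass
class Node:
    """One node of a hypothetical definition hierarchy (scene, object, object-part)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(train: List[Example]) -> Classifier:
    """Toy learner: assign the class whose mean feature vector is closer."""
    pos = [v for v, y in train if y]
    neg = [v for v, y in train if not y]
    if not pos or not neg:               # degenerate fold: only one class seen
        only = bool(pos)
        return lambda v: only
    cp = [sum(c) / len(pos) for c in zip(*pos)]
    cn = [sum(c) / len(neg) for c in zip(*neg)]
    return lambda v: _dist2(v, cp) <= _dist2(v, cn)


def nearest_neighbor(train: List[Example]) -> Classifier:
    """Toy learner: copy the label of the closest training example."""
    return lambda v: min(train, key=lambda ex: _dist2(v, ex[0]))[1]


def loo_accuracy(learner: Learner, examples: List[Example]) -> float:
    """Leave-one-out accuracy of a learner on one node's labeled examples."""
    hits = 0
    for i, (v, y) in enumerate(examples):
        clf = learner(examples[:i] + examples[i + 1:])
        hits += clf(v) == y
    return hits / len(examples)


def build_detector(root: Node,
                   labeled: Dict[str, List[Example]],
                   learners: Dict[str, Learner]) -> Dict[str, Tuple[str, Classifier]]:
    """For every node, keep the learner with the best leave-one-out accuracy."""
    detector = {}
    for node in root.walk():
        examples = labeled[node.name]
        best = max(learners, key=lambda n: loo_accuracy(learners[n], examples))
        detector[node.name] = (best, learners[best](examples))
    return detector


LEARNERS: Dict[str, Learner] = {
    "nearest_centroid": nearest_centroid,
    "nearest_neighbor": nearest_neighbor,
}

# Hypothetical hierarchy for the handshake example: a scene made of two object types.
scene = Node("handshake", [Node("face"), Node("clasped-hands")])

# Made-up 2-D feature vectors; True marks a positive example for that node.
labeled = {
    "handshake": [([0.9, 0.8], True), ([0.8, 0.9], True),
                  ([0.1, 0.2], False), ([0.2, 0.1], False)],
    "face": [([0.85, 0.9], True), ([0.9, 0.85], True),
             ([0.15, 0.1], False), ([0.1, 0.15], False)],
    "clasped-hands": [([0.8, 0.8], True), ([0.9, 0.9], True),
                      ([0.2, 0.2], False), ([0.1, 0.1], False)],
}

detector = build_detector(scene, labeled, LEARNERS)
for name, (learner_name, clf) in detector.items():
    print(name, learner_name, clf([0.85, 0.85]))
```

In this sketch each node of the hierarchy gets its own classifier, chosen independently of the others. How the per-node classifiers are combined into a single Visual Detector is not stated in the abstract, so the sketch stops at the per-node selection.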

Contact

Alejandro Jaimes

BibTeX Reference

@PhdThesis{dvmmPub97,
   Author = {Jaimes, Alejandro},
   Title = {Conceptual Structures and Computational Methods for Indexing and Organization of Visual Information},
   School = {Graduate School of Arts and Sciences, Columbia University},
   Year = {2003}
}
