

Alejandro Jaimes, Shih-Fu Chang. Concepts and Techniques for Indexing Visual Semantics. In Image Databases, Search and Retrieval of Digital Imagery, V. Castelli, L. Bergman (eds.), Chap. 17, pp. 497-556, Wiley, 2002.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

In this chapter, we focus on the semantics of visual information (e.g., objects and scenes). In the first part, we discuss a ten-level conceptual framework for indexing visual information. We stress the differences between syntax (e.g., color and texture) and semantics, and show that indexing can occur at multiple levels and that not all indexing techniques are suitable for all levels (e.g., a color histogram cannot be used to find the image of a person). We also emphasize the differences between query formulation paradigms (e.g., query-by-example or similarity) and indexing techniques (e.g., placing items in semantic categories or indexing using low-level features). We explain how recent interactive techniques (e.g., relevance feedback) address some of the limitations of current query formulation approaches, and discuss the limitations of different indexing techniques. In the second part of the chapter, we focus on automatic semantic classification (e.g., of objects and scenes) and present a brief overview of some object recognition approaches from the Computer Vision literature. We highlight the relationship between content-based image retrieval (CBIR) and object recognition, emphasizing the differences and outlining the new challenges in applying object recognition approaches in CBIR. In the last part of the chapter, we discuss the Visual Apprentice, a framework in which structured visual detectors are learned from user input at multiple levels. Users define classes using a multiple-level definition hierarchy (e.g., a scene is composed of objects, which are composed of object-parts) and label image and video examples. The system uses the training examples to learn classifiers that are used to automatically index new images and videos (i.e., label scenes of objects based on the hierarchy defined by the user). The Visual Apprentice is discussed in some detail because it addresses some of the limitations of current CBIR systems. We also discuss some of the issues that arise when machine learning techniques are used in real-world applications of CBIR (e.g., baseball video).


Contact

Alejandro Jaimes
Shih-Fu Chang

BibTex Reference

   Author = {Jaimes, Alejandro and Chang, Shih-Fu},
   Title = {Concepts and Techniques for Indexing Visual Semantics},
   BookTitle = {Image Databases, Search and Retrieval of Digital Imagery},
   editor = {Castelli, V. and Bergman, L.},
   Chapter = {17},
   Pages = {497--556},
   Publisher = {Wiley},
   Year = {2002}


