Learning a Lot from a Little: Efficient Recognition with Partial Image Annotations
Kristen Grauman, University of Texas at Austin
Tuesday, January 13, 2009 - 11:00am
EE Dept. Conference Room, Mudd 1312
Abstract
Visual category recognition and image search are fundamental problems in computer vision. They remain challenging in large part due to the complexity and variability of real-world natural images. While recent progress has been made, current methods often require substantial manual intervention in which a human supervisor provides careful image annotations and specifies consistent examples from which the system can learn. Such dependence on human labeling effort becomes prohibitively expensive for learning large numbers of categories, and may even lead to unintentional biases in the scope of objects learned.
In this talk I will present our recent work addressing scalable image search and recognition from partially annotated image data. I will focus on our approach for performing sub-linear time search with metrics learned from an incomplete set of similarity constraints, and introduce an active learning strategy able to cope with the multiple granularities at which image annotations can be specified.
Given pairwise similarity constraints between some images, we learn a distance function that captures the images' underlying relationships well. To allow sub-linear time similarity search under the learned metric, we show how to encode the metric parameterization into randomized locality-sensitive hash functions. Our learned metrics improve accuracy relative to commonly used metric baselines, while our hashing construction enables efficient indexing of very large databases under learned distances. To minimize the amount of human input, we show how the system can actively choose desired annotations among a mixture of weakly and strongly labeled image examples. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort.
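For readers unfamiliar with the hashing idea mentioned in the abstract, the following is a minimal sketch of one common way to fold a learned metric into locality-sensitive hash functions. It assumes the learned metric is a positive-definite Mahalanobis matrix A and uses random-hyperplane hashing; the construction presented in the talk may differ. All names here (learned_metric_projections, hash_codes, learned_angle) and the toy matrix A are hypothetical and chosen only for illustration.

```python
import numpy as np


def learned_metric_projections(A, num_bits, rng):
    """Build num_bits random hyperplanes for similarity under the metric A.

    A is assumed to be symmetric positive definite (a hypothetical
    stand-in for a learned Mahalanobis metric).
    """
    # Factor A = G^T G. Cholesky returns L with A = L L^T, so G = L^T.
    G = np.linalg.cholesky(A).T
    # One Gaussian random direction r per hash bit.
    R = rng.standard_normal((num_bits, A.shape[0]))
    # Row i is r_i^T G, so bit i of a point x is the sign of r_i^T G x.
    return R @ G


def hash_codes(P, X):
    """Map each row of X to a binary code, one bit per hyperplane."""
    return (X @ P.T >= 0).astype(np.uint8)


def learned_angle(A, x, y):
    """Angle between x and y under the inner product x^T A y."""
    cos = (x @ A @ y) / np.sqrt((x @ A @ x) * (y @ A @ y))
    return np.arccos(np.clip(cos, -1.0, 1.0))


rng = np.random.default_rng(0)
d = 8
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)  # toy positive-definite "learned" metric
P = learned_metric_projections(A, num_bits=4096, rng=rng)

x, y = rng.standard_normal(d), rng.standard_normal(d)
codes = hash_codes(P, np.stack([x, y]))

# The fraction of differing bits estimates (angle under A) / pi.
mismatch = np.mean(codes[0] != codes[1])
print(mismatch, learned_angle(A, x, y) / np.pi)
```

Under these assumptions, two images collide on a given bit with probability 1 - theta/pi, where theta is their angle under the learned metric, so images that are similar under A tend to share hash buckets. Search then only examines the buckets a query falls into rather than scanning the whole database.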
Speaker Bio
Kristen Grauman is a Clare Boothe Luce Assistant Professor in the Department of Computer Sciences at the University of Texas at Austin. Before joining UT-Austin, she received her Ph.D. and S.M. degrees from the MIT Computer Science and Artificial Intelligence Laboratory. Her research in computer vision and machine learning focuses on object recognition and image retrieval. She is a Microsoft Research New Faculty Fellow and a recipient of an NSF CAREER award and the Frederick A. Howes Scholar Award in Computational Science.
Homepage: http://www.cs.utexas.edu/~grauman/