dongqing:phdthesis

Dongqing Zhang. Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling. PhD Thesis, Graduate School of Arts and Sciences, Columbia University, 2005.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Note on this paper

Advisor: Prof. Chang

Abstract

The automatic analysis and indexing of visual content in unconstrained domains are important and challenging problems for a variety of multimedia applications. Much of the prior research deals with these problems by modeling images and videos as feature vectors, such as global histograms or block-based representations. Despite substantial research efforts on analysis and indexing algorithms based on these representations, their performance remains unsatisfactory. This dissertation explores the problem from a different perspective through a part-based representation, in which images and videos are represented as collections of parts with their appearance and relational features. Such a representation is partly motivated by human vision research showing that the human visual system adopts a similar mechanism to perceive images. Although part-based representations have been investigated for decades, most of the prior work has focused on ad hoc or deterministic approaches, which require manual design of the models and often perform poorly on real-world images or videos because they cannot model uncertainty and noise. The main focus of this thesis is instead on incorporating statistical modeling and machine learning techniques into the paradigm of part-based modeling so as to alleviate the burden of manual design, achieve robustness to content variation and noise, and maximize performance by learning from examples. We address the following three fundamental problems for visual content indexing and analysis: measuring the similarity of images, detecting objects and learning object models, and assigning semantic labels to the regions in images. We focus on a general graph-based representation for images and objects, called the Attributed Relational Graph (ARG), and explore new statistical algorithms based upon this representation.
Our main contributions include the following. First, we introduce a new principled similarity measure for ARGs that can be learned from training data. We establish a theoretical framework for the similarity calculation and learning, and we apply the developed method to the detection of near-duplicate images. Second, we extend the ARG model and the traditional Random Graph to a new model called the Random Attributed Relational Graph (Random ARG) to represent an object model. We show how to achieve object detection by constructing Markov Random Fields, mapping parameters, and performing approximations using advanced inference and learning algorithms. Third, we explore a higher-order relational model and efficient inference algorithms for the region labeling problem, using video scene text detection as a test case.

Contact

Dongqing Zhang

BibTeX Reference

@PhdThesis{dongqing:phdthesis,
   Author = {Zhang, Dongqing},
   Title = {Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling},
   School = {Graduate School of Arts and Sciences, Columbia University},
   Year = {2005}
}

EndNote Reference

Get EndNote Reference (.ref)

 

This document was translated automatically from BibTEX by bib2html (Copyright 2003 © Eric Marchand, INRIA, Vista Project).