
Part-based Object/Scene Detection by Learning




Motivation and Introduction

A simplified generative process for the part-based
representation of an image

Within this generative framework, object detection reduces to a likelihood test. We need to compute the generation likelihood and learn the parameters of the generative process above. In our paper, we show that the likelihood calculation can be reduced to inference in a binary pairwise Markov random field (MRF) defined on the association graph between the Random Attributed Relational Graph (RARG) and the Attributed Relational Graph (ARG) (figure below). The generation likelihood is shown to be related to the partition functions of the MRFs. Because the log partition function of an MRF has a variational representation, we can compute the likelihood by variational inference (for example, Loopy Belief Propagation) and learn the parameters by variational Expectation-Maximization (EM).
The association graph on which the binary pairwise
MRF is defined
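
The role of the partition function can be illustrated on a toy model. The Python sketch below is not the implementation from our paper: the potentials are arbitrary illustrative values, and the function names `brute_force_log_z` and `bethe_log_z` are hypothetical. It runs sum-product belief propagation on a small binary pairwise MRF and evaluates the Bethe approximation of the log partition function, then compares it with exhaustive enumeration. On a tree-structured graph the two agree; on a graph with loops the Bethe value is only an approximation.

```python
import itertools
import numpy as np

def brute_force_log_z(unary, pairwise):
    """Exact log partition function by enumerating all binary assignments."""
    n = len(unary)
    total = 0.0
    for x in itertools.product([0, 1], repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][x[i]]
        for (i, j), psi in pairwise.items():
            w *= psi[x[i], x[j]]
        total += w
    return float(np.log(total))

def bethe_log_z(unary, pairwise, n_iters=50):
    """Bethe estimate of log Z from (loopy) sum-product belief propagation."""
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def psi(i, j):
        # Pairwise potential indexed as [x_i, x_j], whichever way it was stored.
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    # msgs[(i, j)] is the message from node i to node j, a vector over x_j.
    msgs = {(i, j): np.full(2, 0.5) for i in neighbors for j in neighbors[i]}

    def incoming(i, exclude=None):
        # Product of all messages arriving at node i, optionally skipping one sender.
        out = np.ones(2)
        for k in neighbors[i]:
            if k != exclude:
                out = out * msgs[(k, i)]
        return out

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            m = (unary[i] * incoming(i, exclude=j)) @ psi(i, j)
            new[(i, j)] = m / m.sum()  # normalize to avoid underflow
        msgs = new

    # Bethe free energy; log Z is approximated by its negative.
    free_energy = 0.0
    for i in range(n):
        b = unary[i] * incoming(i)
        b = b / b.sum()
        degree = len(neighbors[i])
        free_energy -= (degree - 1) * np.sum(b * (np.log(b) - np.log(unary[i])))
    for (i, j), p in pairwise.items():
        joint = np.outer(unary[i] * incoming(i, exclude=j),
                         unary[j] * incoming(j, exclude=i)) * p
        joint = joint / joint.sum()
        local = np.outer(unary[i], unary[j]) * p
        free_energy += np.sum(joint * (np.log(joint) - np.log(local)))
    return float(-free_energy)

# Toy potentials (assumed values, chosen only for illustration).
unary = [np.array([1.0, 2.0]), np.array([1.0, 0.5]), np.array([1.0, 1.5])]
chain = {(0, 1): np.array([[1.0, 0.5], [0.5, 2.0]]),
         (1, 2): np.array([[1.0, 0.2], [0.2, 1.0]])}
triangle = dict(chain)
triangle[(0, 2)] = np.array([[1.5, 1.0], [1.0, 1.5]])

exact_chain = brute_force_log_z(unary, chain)
bethe_chain = bethe_log_z(unary, chain)
exact_triangle = brute_force_log_z(unary, triangle)
bethe_triangle = bethe_log_z(unary, triangle)
print("chain:   ", exact_chain, bethe_chain)
print("triangle:", exact_triangle, bethe_triangle)
```

Exhaustive enumeration is feasible only for a handful of nodes, which is why an approximate scheme such as Loopy Belief Propagation is attractive for graphs of realistic size.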

Multi-view Object Detection
To extend single-view object detection to multi-view object detection, we developed a Mixture of RARG (MOR) model. In the MOR, each component RARG captures the statistics of one object view. The detection likelihood then becomes a linear combination of the detection likelihoods of the individual RARGs.
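
As a minimal sketch of the linear combination described above, the Python snippet below combines per-view log-likelihoods with mixture weights using the log-sum-exp trick for numerical stability. The function name `mixture_log_likelihood`, the weights, and the per-view values are assumptions made for illustration; in the MOR model the per-view likelihoods would come from the individual RARGs.

```python
import numpy as np

def mixture_log_likelihood(component_log_liks, weights):
    """Return log(sum_k w_k * p_k) given log p_k, computed stably."""
    log_terms = np.log(weights) + np.asarray(component_log_liks)
    m = np.max(log_terms)
    return float(m + np.log(np.sum(np.exp(log_terms - m))))

# Hypothetical mixture coefficients and per-view log-likelihoods for one image.
weights = np.array([0.5, 0.3, 0.2])
log_liks = np.array([-120.0, -95.0, -130.0])

score = mixture_log_likelihood(log_liks, weights)
print(score)
```

Working in the log domain matters here because likelihoods of whole images are typically far too small to represent directly as floating-point numbers.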
Besides the MOR model, we also explored an SVM with a Fisher kernel for multi-view object detection. Unlike the MOR model, the SVM-Fisher kernel method captures the statistics of an object view using a set of support vectors. The Fisher kernel approach learns only a single RARG model and performs detection by mapping the input ARG into the tangent space of the RARG likelihood manifold. The SVM-based approach can alleviate the overfitting problem encountered in the MOR model when learning the mixture coefficients, and it greatly increases the detection and learning speed.
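
The tangent-space mapping can be sketched with a much simpler generative model than the RARG. In the Python example below, a diagonal Gaussian stands in for the learned model (an assumption made purely for illustration), and each input is mapped to its Fisher score, the gradient of the log-likelihood with respect to the model parameters. The inner products of these scores give a Gram matrix that a kernel SVM could consume. The Fisher information matrix is replaced by the identity here, a common simplification, and the names `log_lik`, `fisher_score`, and `fisher_gram` are hypothetical.

```python
import numpy as np

def log_lik(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian (stand-in generative model)."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var))

def fisher_score(x, mean, var):
    """Gradient of log_lik with respect to (mean, var), concatenated into one vector."""
    diff = x - mean
    d_mean = diff / var
    d_var = -0.5 / var + 0.5 * diff ** 2 / var ** 2
    return np.concatenate([d_mean, d_var])

def fisher_gram(samples, mean, var):
    """Gram matrix of Fisher scores, using the identity in place of the Fisher information."""
    scores = np.stack([fisher_score(x, mean, var) for x in samples])
    return scores @ scores.T

# Assumed model parameters and inputs, for illustration only.
mean = np.array([0.0, 1.0])
var = np.array([1.0, 2.0])
samples = np.array([[0.5, 1.0], [-1.0, 2.0], [0.2, 0.0]])

gram = fisher_gram(samples, mean, var)
print(gram)
```

Because every input is mapped to a fixed-length gradient vector of a single model, one generative model suffices regardless of how many views the training set contains; the discriminative classifier trained on the Gram matrix handles the variation across views.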
Multi-view objects (images are from the web)

Experiments
We compare the performance of our system with the constellation model developed by the Oxford and Caltech computer vision groups. The constellation model is considered the state-of-the-art method in the community. We use the same image data sets. We achieved comparable detection performance with significantly increased learning speed. Learning in our system is more than 2 times faster than the constellation model when using Gibbs sampling, and more than 5 times faster when using Loopy Belief Propagation, measured either by the number of learning iterations or by the total learning time.
For multi-view object detection, we built our own data set by querying the Google and AltaVista image search engines. Using the Mixture of RARG model, we improved on the performance of the single RARG model by about 5 percent.
People
Publication
Dongqing Zhang, Shih-Fu Chang. A Generative-Discriminative Hybrid Method for Multi-View Object Detection. In IEEE CVPR, New York City, New York, June 2006.
Dongqing Zhang. Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labeling. PhD Thesis, Graduate School of Arts and Sciences, Columbia University, 2005.
Dongqing Zhang, Shih-Fu Chang. Learning Random Attributed Relational Graph for Part-based Object Detection. ADVENT Technical Report #21220056, Columbia University, May 2005.
Dong-Qing Zhang, Shih-Fu Chang. Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning. In ACM Multimedia (ACM MM), 2004.
Dong-Qing Zhang, Shih-Fu Chang. Stochastic Attributed Relational Graph Matching for Image Near-Duplicate Detection. ADVENT Technical Report #20620046, Columbia University, October 2004.
Related Projects