
Unsupervised Discovery of Video Structure
with Statistical Temporal Models

Abstract
This project deals with the problem of unsupervised mining of structures in video using multi-scale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim to alleviate the burden of labelling and training, and to provide a scalable way of generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing work on unsupervised video structuring primarily uses clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularities remain unexplored.
Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact, relevant feature sets using hybrid wrapper-filter methods. When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) the automatically selected feature sets for soccer and baseball videos match those selected manually with domain knowledge; (2) the system automatically discovers high-level structures that match the semantic events in the video; (3) the system achieves slightly better accuracy in detecting semantic events in unlabelled soccer videos than a competing supervised approach designed and trained with domain knowledge.
 
 
Publications and Reports
L. Xie, S.-F. Chang, A. Divakaran and H. Sun, Unsupervised Mining of Statistical Temporal Structures in Video, in Video Mining, A. Rosenfeld, D. Doermann, D. DeMenthon, Eds., Kluwer Academic Publishers, 2003
L. Xie, S.-F. Chang, A. Divakaran and H. Sun, Unsupervised Discovery of Multilevel Statistical Video Structures Using Hierarchical Hidden Markov Models, (PDF) ICME 2003, July 2003
L. Xie, S.-F. Chang, A. Divakaran and H. Sun, Feature Selection for Unsupervised Discovery of Statistical Temporal Structures in Video, to appear at ICIP 2003, September 2003
L. Xie, S.-F. Chang, A. Divakaran and H. Sun, Learning Hierarchical Hidden Markov Models for Unsupervised Structure Discovery from Video, Advent Tech. Report, Dec. 2002 (PS.GZ/PDF)
 
Last update: May 25, 2003