dvmmPub236

Lexing Xie, Shih-Fu Chang, Ajay Divakaran, Huifang Sun. Unsupervised Mining of Statistical Temporal Structures in Video. In Video Mining, A. Rosenfeld, D. Doermann, D. DeMenthon (eds.), Chap. 10, Kluwer Academic Publishers, 2003.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

In this paper, we present algorithms for unsupervised mining of structures in video using multi-scale statistical models. Video structures are repetitive segments in a video stream with consistent statistical characteristics. Such structures can often be interpreted in relation to distinctive semantics, particularly in structured domains like sports. While much work in the literature explores the link between the observations and the semantics using supervised learning, we propose unsupervised structure mining algorithms that aim to alleviate the burden of labelling and training, and to provide a scalable solution for generalizing video indexing techniques to heterogeneous content collections such as surveillance and consumer videos. Existing work on unsupervised video structuring primarily uses clustering techniques, while the rich statistical characteristics in the temporal dimension at different granularities remain unexplored. Automatically identifying structures from an unknown domain poses significant challenges when domain knowledge is not explicitly present to assist algorithm design, model selection, and feature selection. In this work, we model multi-level statistical structures with hierarchical hidden Markov models based on a multi-level Markov dependency assumption. The parameters of the model are efficiently estimated using the EM algorithm. We have also developed a model structure learning algorithm that uses stochastic sampling techniques to find the optimal model structure, and a feature selection algorithm that automatically finds compact relevant feature sets using hybrid wrapper-filter methods.
When tested on sports videos, the unsupervised learning scheme achieves very promising results: (1) the automatically selected feature sets for soccer and baseball videos match those manually selected with domain knowledge; (2) the system automatically discovers high-level structures that match the semantic events in the video; (3) the system achieves even slightly better accuracy in detecting semantic events in unlabelled soccer videos than a competing supervised approach designed and trained with domain knowledge.
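
As an illustration of the multi-level Markov dependency assumption mentioned in the abstract, the sketch below flattens a toy two-level hierarchical hidden Markov model into an ordinary Markov transition matrix over (top state, child state) pairs. It is not the authors' implementation: the function name, the parameterisation (a per-child-state exit probability, a top-level transition matrix, and per-top-state child entry distributions), and all numbers are assumptions chosen only to make the idea concrete.

```python
def flatten_two_level_hhmm(a_top, a_child, pi_child, exit_prob):
    """Flatten a two-level HHMM into one transition matrix.

    Hypothetical parameterisation (not taken from the paper):
      a_top[q][q2]      top-level transition probability, used on exit
      a_child[q][i][j]  child transition inside top state q
      pi_child[q][j]    probability of entering child j of top state q
      exit_prob[q][i]   probability that child i of q hands control up
    Flat state index is q * num_child + i.
    """
    num_top = len(a_top)
    num_child = len(pi_child[0])
    size = num_top * num_child
    flat = [[0.0] * size for _ in range(size)]
    for q in range(num_top):
        for i in range(num_child):
            row = q * num_child + i
            leave = exit_prob[q][i]
            stay = 1.0 - leave
            # Case 1: remain under the same top-level state.
            for j in range(num_child):
                flat[row][q * num_child + j] += stay * a_child[q][i][j]
            # Case 2: exit, move at the top level, re-enter a child.
            for q2 in range(num_top):
                for j in range(num_child):
                    col = q2 * num_child + j
                    flat[row][col] += leave * a_top[q][q2] * pi_child[q2][j]
    return flat


# Toy model: two top-level states, each with two child states.
A_TOP = [[0.0, 1.0], [1.0, 0.0]]
A_CHILD = [
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.2, 0.8]],
]
PI_CHILD = [[1.0, 0.0], [0.5, 0.5]]
EXIT_PROB = [[0.1, 0.2], [0.3, 0.0]]

FLAT = flatten_two_level_hhmm(A_TOP, A_CHILD, PI_CHILD, EXIT_PROB)
for flat_row in FLAT:
    print([round(p, 3) for p in flat_row])
```

Each row of the flattened matrix is a proper probability distribution, so standard HMM routines could in principle be run on it. The flattening grows quickly with the number of levels and states, which is one reason dedicated hierarchical inference is usually preferred over this naive construction.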

Contact

Lexing Xie
Shih-Fu Chang

BibTex Reference

@InCollection{dvmmPub236,
   Author = {Xie, Lexing and Chang, Shih-Fu and Divakaran, Ajay and Sun, Huifang},
   Title = {Unsupervised Mining of Statistical Temporal Structures in Video},
   BookTitle = {Video Mining},
   editor = {Rosenfeld, A. and Doermann, D. and DeMenthon, D.},
   Chapter = {10},
   Publisher = {Kluwer Academic Publishers},
   Year = {2003}
}
