Lexing Xie, Shih-Fu Chang, Ajay Divakaran, Huifang Sun
Abstract
Structure elements in a time sequence are repetitive segments that bear consistent
deterministic or stochastic characteristics. While most existing work on detecting
structures follows a supervised paradigm, we propose a fully unsupervised statistical
solution in this paper. We present a unified approach to structure discovery
from long video sequences as simultaneously finding the statistical descriptions
of structure and locating segments that match the descriptions. We model the
multilevel statistical structure as hierarchical hidden Markov models, and present
efficient algorithms for learning both the parameters and the model
structure, including the complexity of each structure element and the number
of elements in the stream. We also propose feature selection algorithms
that iterate between a wrapper and a filter method to partition the large feature
pool into consistent and compact subsets, upon which the hierarchical hidden
Markov model is learned. When tested on a specific domain, soccer video, the
unsupervised learning scheme achieves promising results: the automatically
selected feature set includes the feature that was manually identified as
intuitively the most significant, and the system automatically discovers
the statistical descriptions of high-level structures while
achieving slightly better accuracy in detecting the discovered structures in
unlabelled videos than a supervised approach designed with domain knowledge
and trained with comparable hidden Markov models.
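
As a rough illustration of what "locating segments that match the descriptions" can mean for a hidden Markov model, the sketch below runs Viterbi decoding over a flat, two-state HMM with discrete observations and collapses the decoded state path into segments. It is a simplification under stated assumptions, not the method of the paper: the models in the paper are hierarchical and learned without supervision, whereas here the parameters are fixed by hand, and the function names `viterbi` and `segments` are hypothetical.

```python
import math


def _log(p):
    # Log-probability with log(0) mapped to -inf instead of raising.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state path for a discrete-observation HMM (flat, not hierarchical)."""
    n_states = len(start_p)
    # Scores are kept in log space to avoid underflow on long sequences.
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev = score
        score, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a state path into (state, start, end) runs, with end exclusive."""
    runs = []
    start = 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((path[start], start, i))
            start = i
    return runs


if __name__ == "__main__":
    # Hand-picked toy parameters (assumptions for illustration only).
    start_p = [0.5, 0.5]
    trans_p = [[0.9, 0.1], [0.1, 0.9]]   # states tend to persist
    emit_p = [[0.9, 0.1], [0.1, 0.9]]    # state k mostly emits symbol k
    obs = [0, 0, 0, 1, 1, 1, 0, 0]
    print(segments(viterbi(obs, start_p, trans_p, emit_p)))
```

Each returned tuple names a decoded state and the half-open index range it covers, so runs with the same state label play the role of repeated structure elements in this toy setting.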