hsu05trecvid04

Winston Hsu, Lyndon Kennedy, Shih-Fu Chang, Martin Franz, John R. Smith. COLUMBIA-IBM NEWS VIDEO STORY SEGMENTATION IN TRECVID 2004. ADVENT Technical Report #207-2005-3, Columbia University, 2005.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

In this technical report, we give an overview of our technical developments for the story segmentation task in TRECVID 2004. Among them, we propose an information-theoretic framework, visual cue cluster construction (VC3), to automatically discover adequate mid-level features. The problem is posed as mutual information maximization, through which optimal cue clusters are discovered to preserve the highest information about the semantic labels. We extend the Information Bottleneck framework to high-dimensional continuous features and further propose a projection method to map each video into probabilistic memberships over all the cue clusters. The biggest advantage of the proposed approach is that it removes the dependence on manual selection of mid-level features and the huge labor cost of annotating a training corpus for each mid-level feature detector. When tested on TRECVID 2004 news video story segmentation, the proposed approach achieves a promising performance gain over representations derived from conventional clustering techniques and even over manually selected mid-level features; it also achieves one of the top performances, F1=0.65, close to the highest performance, F1=0.69, reported by other groups. We also experiment with other promising visual features and continue investigating effective prosody features. The introduction of post-processing also provides practical improvements. Furthermore, the contribution of fusion with other modalities, such as speech prosody features and ASR-based segmentation scores, is significant and has been confirmed again in this experiment.
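As a rough illustration of the mutual information criterion mentioned in the abstract, the following sketch computes I(C;Y) between cluster assignments C and semantic labels Y from a table of co-occurrence counts. It is not the VC3 algorithm from the report; the function name `mutual_information` and the toy count tables are hypothetical, chosen only to show why a clustering that separates the labels scores higher than one that does not.

```python
import math


def mutual_information(joint_counts):
    """Return I(C;Y) in bits from counts joint_counts[c][y].

    Rows index (hypothetical) cue clusters, columns index semantic labels.
    """
    total = sum(sum(row) for row in joint_counts)
    n_labels = len(joint_counts[0])
    # Marginal distributions over clusters and over labels.
    p_c = [sum(row) / total for row in joint_counts]
    p_y = [sum(row[y] for row in joint_counts) / total for y in range(n_labels)]

    mi = 0.0
    for c, row in enumerate(joint_counts):
        for y, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing to the sum
                p_cy = count / total
                mi += p_cy * math.log2(p_cy / (p_c[c] * p_y[y]))
    return mi


# Toy data (assumed, not from the report): columns are
# (story boundary, non-boundary), rows are two clusters.
informative = [[40, 10], [10, 40]]      # clusters mostly separate the labels
uninformative = [[25, 25], [25, 25]]    # clusters say nothing about the labels

mi_informative = mutual_information(informative)
mi_uninformative = mutual_information(uninformative)
print(mi_informative, mi_uninformative)
```

In this toy setting the second table gives zero mutual information, while the first gives a positive value, which is the sense in which a set of cue clusters can "preserve information" about the labels.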

Contact

Winston Hsu
Lyndon Kennedy
Shih-Fu Chang
John R. Smith

BibTeX Reference

@TechReport{hsu05trecvid04,
   Author = {Hsu, Winston and Kennedy, Lyndon and Chang, Shih-Fu and Franz, Martin and Smith, John R.},
   Title = {COLUMBIA-IBM NEWS VIDEO STORY SEGMENTATION IN TRECVID 2004},
   Institution = {Columbia University},
   Year = {2005}
}

