dvmmPub18

Winston Hsu, Shih-Fu Chang, Chih-Wei Huang, Lyndon Kennedy, Ching-Yung Lin, Giridharan Iyengar. Discovery and Fusion of Salient Multi-modal Features towards News Story Segmentation. In IS&T/SPIE Symposium on Electronic Imaging: Science and Technology - SPIE Storage and Retrieval of Image/Video Database, San Jose, CA, January 2004.

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

In this paper, we present our new results in news video story segmentation and classification in the context of TRECVID video retrieval benchmarking event 2003. We applied and extended the Maximum Entropy statistical model to effectively fuse diverse features from multiple levels and modalities, including visual, audio, and text. We have included various features such as motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model is used to automatically discover relevant features contributing to the detection of story boundaries. One novel aspect of our method is the use of a feature wrapper to address different types of features, i.e., asynchronous, discrete, continuous and delta ones. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance, i.e., F1 measures up to 0.76 in ABC news and 0.73 in CNN news, present how these multi-level multi-modal features construct the probabilistic framework, and more importantly observe an interesting opportunity for further improvement.
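
For readers unfamiliar with the two quantities the abstract refers to, the following is a minimal sketch, not the authors' implementation. It shows the standard binary form of a maximum entropy (log-linear) model, in which binary features that fire at a candidate point contribute learned weights to the boundary probability, and the standard F1 measure as the harmonic mean of precision and recall. The feature names, weights, and function names are hypothetical and chosen only for illustration.

```python
import math


def boundary_probability(active_features, weights):
    """P(boundary | x) under a binary maximum entropy (log-linear) model.

    active_features: names of the binary features that fire at a candidate point.
    weights: mapping from feature name to its weight (hypothetical values).
    With features tied to the "boundary" class only, the normalized
    exponential form reduces to a logistic function of the summed weights.
    """
    score = sum(weights.get(name, 0.0) for name in active_features)
    return 1.0 / (1.0 + math.exp(-score))


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical weights for illustration only; not taken from the paper.
    example_weights = {"long_pause": 1.2, "anchor_face": 0.8, "music_segment": -0.5}
    p = boundary_probability(["long_pause", "anchor_face"], example_weights)
    print(f"P(boundary) = {p:.3f}")
    print(f"F1 = {f1_measure(0.75, 0.60):.3f}")
```

In this form, a feature with a positive weight raises the boundary probability when it fires and a feature with a negative weight lowers it, which is one way to read which features a trained model treats as relevant.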

Contact

Winston Hsu
Shih-Fu Chang
Lyndon Kennedy
Ching-Yung Lin

BibTeX Reference

@InProceedings{dvmmPub18,
   Author = {Hsu, Winston and Chang, Shih-Fu and Huang, Chih-Wei and Kennedy, Lyndon and Lin, Ching-Yung and Iyengar, Giridharan},
   Title = {Discovery and Fusion of Salient Multi-modal Features towards News Story Segmentation},
   BookTitle = {IS&T/SPIE Symposium on Electronic Imaging: Science and Technology - SPIE Storage and Retrieval of Image/Video Database},
   Address = {San Jose, CA},
   Month = {January},
   Year = {2004}
}
