Topic Tracking across Broadcast News Videos with Visual Duplicates and Semantic Concepts

Summary

With the rapid growth of Internet bandwidth and broadcast channels, video streams are readily accessible in many forms, such as news broadcasts, blogs, and podcasts. When a critical event breaks (e.g., a tsunami or hurricane), bursts of news stories on the same topic emerge from both professional news outlets and amateur videos. Topic threading, which organizes video content from distributed sources into coherent topics, is an essential step for further processing such as browsing or search. Current solutions rely primarily on text features and run into difficulty when the text is noisy or unavailable.

Video stories from different sources often share recurrent visual patterns that can help topic threading. For example, the figure below shows a broadcast news video and three web news articles in different languages (Arabic, English, and Chinese) covering the same topic, “Pope sorry for his remarks on Islam.” Near-duplicate images of Pope Benedict XVI appear across all the news sources covering this topic. Our analysis confirms that such duplicates are effective for threading news across languages.

In this work, we develop novel approaches for story topic tracking using multimodal information, including text, visual duplicates, and semantic visual concepts. We propose a general fusion framework for combining diverse cues and analyze the performance impact of each component. Evaluated on the TRECVID 2005 data set, fusing visual duplicates with the state-of-the-art text-based approach improves it consistently, by up to 25%. For certain topics, visual duplicates alone even outperform the text-based approach. In addition, we propose an information-theoretic method for selecting the subsets of semantic visual concepts that are most relevant to topic tracking.
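The two ideas in the paragraph above, fusing per-modality similarities and ranking concepts by an information-theoretic score, can be illustrated with a minimal sketch. It is not the method from the paper: the weighted linear fusion, the default weights, the use of mutual information as the selection criterion, and all function and concept names (`fuse_similarity`, `select_concepts`, "crowd", "studio") are assumptions made for illustration only.

```python
import math
from collections import Counter


def fuse_similarity(text_sim, dup_sim, concept_sim, weights=(0.6, 0.3, 0.1)):
    """Combine per-modality story similarities (each assumed in [0, 1]).

    Assumption: a plain weighted sum with hand-picked weights; the source
    only says a general fusion framework is used.
    """
    w_text, w_dup, w_concept = weights
    return w_text * text_sim + w_dup * dup_sim + w_concept * concept_sim


def mutual_information(concept_present, topic_labels):
    """Mutual information (in bits) between a binary concept and topic labels.

    concept_present[i] is 1 if the concept is detected in story i, else 0;
    topic_labels[i] is the topic of story i.
    """
    n = len(concept_present)
    joint = Counter(zip(concept_present, topic_labels))
    count_c = Counter(concept_present)
    count_t = Counter(topic_labels)
    mi = 0.0
    for (c, t), count in joint.items():
        p_ct = count / n
        p_c = count_c[c] / n
        p_t = count_t[t] / n
        mi += p_ct * math.log2(p_ct / (p_c * p_t))
    return mi


def select_concepts(concept_table, topic_labels, k):
    """Return the k concept names with the highest mutual information."""
    scores = {
        name: mutual_information(values, topic_labels)
        for name, values in concept_table.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: four stories, two topics, two hypothetical concepts.
topics = ["pope", "pope", "storm", "storm"]
concepts = {
    "crowd": [1, 1, 0, 0],   # tracks the topic exactly
    "studio": [1, 0, 1, 0],  # independent of the topic
}
selected = select_concepts(concepts, topics, k=1)
fused = fuse_similarity(text_sim=0.2, dup_sim=0.9, concept_sim=0.5)
```

In this toy setup a story pair with weak text overlap but a strong visual-duplicate link still receives a moderate fused score, which is the kind of case where noisy or missing text would otherwise break the thread. In practice the weights would be tuned on held-out data rather than fixed by hand.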

Examples of a broadcast news video (d) and three web news articles (a-c) in different languages covering the same topic, “Pope sorry for his remarks on Islam,” collected on September 17, 2006. Images of Pope Benedict XVI (e.g., the two in the red rectangle) are widely reused, as near-duplicates, across all the news sources covering the topic. Besides text transcripts or web text tokens, visual duplicates provide another similarity link among broadcast news videos and web news, and so help cross-domain topic threading.

People

Publications

Winston Hsu, Shih-Fu Chang. Topic Tracking across Broadcast News Videos with Visual Duplicates and Semantic Concepts. In International Conference on Image Processing (ICIP), Atlanta, GA, USA, 2006. (PDF)

LSCOM Lexicon Definitions and Annotations Version 1.0, DTO Challenge Workshop on Large Scale Concept Ontology for Multimedia. ADVENT Technical Report #217-2006-3, Columbia University, March 2006. (PDF)

Dong-Qing Zhang, Shih-Fu Chang. Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning. In ACM Multimedia, New York City, USA, October 2004. (PDF)

Last updated: January 10, 2007.