Summary
Due to the explosion of Internet bandwidth and broadcast channels, video streams are easily accessible in many
forms, such as news video broadcasts, blogs, and podcasts. When a critical event breaks out (e.g., a tsunami or
hurricane), bursts of news stories on the same topic emerge from both professional news sources and amateur videos. Topic
threading is an essential task for organizing video content from distributed sources into coherent topics for further
manipulation such as browsing or search. Current solutions rely primarily on text features and encounter
difficulty when text is noisy or unavailable.
Video stories on the same topic usually share recurrent visual patterns across sources, and these patterns can help topic threading. For
example, the following figure shows a broadcast news video and three web news articles in different
languages (Arabic, English, and Chinese) covering the same topic, “Pope sorry for his remarks on Islam.” Visual duplicates of Pope Benedict XVI are used widely across all the news sources on this topic. Such duplicates, as our analysis confirms, are effective for news threading across languages.
In this work, we develop novel approaches for story topic tracking using multimodal information, including
text, visual duplicates, and semantic visual concepts. We propose a general fusion framework for combining diverse
cues and analyze the performance impact of each component. Evaluated on the TRECVID 2005 data set, fusion with
visual duplicates consistently improves the state-of-the-art text-based approach, by up to 25%. For certain topics,
visual duplicates alone even outperform the text-based approach. In addition, we propose an information-theoretic
method for selecting subsets of semantic visual concepts that are most relevant to topic tracking.
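The two ideas in the paragraph above can be illustrated with a minimal sketch. This is not the method from the publication: the summary does not specify the fusion rule or the selection criterion, so the sketch assumes a weighted linear fusion of per-modality similarity scores and ranks concepts by mutual information with the topic label. All function names, modality names, and weights are hypothetical.

```python
import math
from collections import Counter


def fuse_scores(scores, weights):
    """Weighted linear fusion of per-modality similarity scores.

    `scores` maps a modality name to a similarity in [0, 1]. Modalities
    absent from `scores` (e.g., no transcript available) are skipped and
    the remaining weights are renormalized. Assumed rule, for illustration.
    """
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total


def mutual_information(concept_present, on_topic):
    """Mutual information in bits between two equal-length binary sequences."""
    n = len(concept_present)
    joint = Counter(zip(concept_present, on_topic))
    p_concept = Counter(concept_present)
    p_topic = Counter(on_topic)
    mi = 0.0
    for (c, t), count in joint.items():
        p_ct = count / n
        mi += p_ct * math.log2(p_ct / ((p_concept[c] / n) * (p_topic[t] / n)))
    return mi


def select_concepts(concept_table, on_topic, k):
    """Return the k concept names with the highest mutual information."""
    ranked = sorted(
        concept_table,
        key=lambda name: mutual_information(concept_table[name], on_topic),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical story pair with no usable text: only visual cues are fused.
weights = {"text": 0.5, "duplicate": 0.3, "concept": 0.2}
fused = fuse_scores({"duplicate": 0.9, "concept": 0.4}, weights)

# Hypothetical concept annotations for four stories, two of them on topic.
table = {"crowd": [1, 1, 0, 0], "sky": [1, 0, 1, 0]}
best = select_concepts(table, on_topic=[1, 1, 0, 0], k=1)
```

Skipping missing modalities and renormalizing the weights is one simple way to keep a fused score defined when text is noisy or unavailable; a learned fusion model would be a natural alternative.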
Examples of a broadcast news video (d) and three web news articles (a-c) in different languages covering the same topic, “Pope sorry for his remarks on Islam,” collected on September 17, 2006. Images of Pope Benedict XVI (e.g., the two in the red rectangle) are used widely, as near-duplicates, across all the news sources on the same topic. In addition to text transcripts and web text tokens, visual duplicates provide another similarity link between broadcast news videos and web news, and help cross-domain topic threading.
People
Publication
Winston Hsu, Shih-Fu Chang. Topic Tracking across Broadcast News Videos with Visual Duplicates and Semantic Concepts. In International Conference on Image Processing (ICIP), Atlanta, GA, USA, 2006. (PDF)
LSCOM Lexicon Definitions and Annotations Version 1.0, DTO Challenge Workshop on Large Scale Concept Ontology for Multimedia. ADVENT Technical Report #217-2006-3, Columbia University, March 2006. (PDF)
Dong-Qing Zhang, Shih-Fu Chang. Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning. In ACM Multimedia, New York City, USA, October 2004. (PDF)
Last updated: January 10, 2007.