

Shih-Fu Chang, R. Manmatha, Tat-Seng Chua. Combining Text and Audio-Visual Features in Video Indexing. In IEEE ICASSP 2005, Philadelphia, PA, March 2005.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance will be described, primarily in the broadcast news video domain.


Contact

Shih-Fu Chang

BibTex Reference

   Author = {Chang, Shih-Fu and Manmatha, R. and Chua, Tat-Seng},
   Title = {Combining Text and Audio-Visual Features in Video Indexing},
   BookTitle = {IEEE ICASSP 2005},
   Address = {Philadelphia, PA},
   Month = {March},
   Year = {2005}


