

Shih-Fu Chang, Dan Ellis, Wei Jiang, Keansub Lee, Akira Yanagawa, Alexander C. Loui, Jiebo Luo. Large-Scale Multimodal Semantic Concept Detection for Consumer Video. In ACM SIGMM International Workshop on Multimedia Information Retrieval, Germany, September 2007.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract
In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which were carefully selected based on user studies and extensively annotated across more than 1,300 videos from real users. Our goals are to assess the state of the art of multimedia analytics (including both audio and visual analysis) in consumer video classification and to discover new research opportunities. We investigated several statistical approaches built upon global/local visual features, audio features, and audio-visual combinations. Three multimodal fusion frameworks (ensemble, context fusion, and joint boosting) are also evaluated. Experimental results show that visual and audio models perform best for different sets of concepts. Both contribute significantly to multimodal fusion, via expansion of the classifier pool for context fusion and of the feature bases for feature sharing. The fused multimodal models are shown to significantly reduce detection errors compared to single-modality models, resulting in a promising accuracy of 83% over diverse concepts. To the best of our knowledge, this is the first systematic investigation of multimodal classification using a large-scale ontology and a realistic video corpus.

Contact
Shih-Fu Chang
Wei Jiang
Akira Yanagawa

BibTeX Reference

   Author = {Chang, Shih-Fu and Ellis, Dan and Jiang, Wei and Lee, Keansub and Yanagawa, Akira and Loui, Alexander C. and Luo, Jiebo},
   Title = {{Large-Scale Multimodal Semantic Concept Detection for Consumer Video}},
   BookTitle = {{ACM SIGMM International Workshop on Multimedia Information Retrieval}},
   Address = {Germany},
   Month = {September},
   Year = {2007}



