

Wei Jiang, Courtenay Cotton, Shih-Fu Chang, Dan Ellis, Alexander C. Loui. Short-Term Audio-Visual Atoms for Generic Video Concept Classification. In Proceeding of ACM international conference on Multimedia (ACM MM), October 2009.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

We investigate the challenging issue of joint audio-visual analysis of generic videos for semantic concept detection. We propose to extract a novel representation, the Short-term Audio-Visual Atom (S-AVA), for improved concept detection. An S-AVA is defined as a short-term region track associated with regional visual features and background audio features. An effective algorithm, named Short-Term Region tracking with joint Point Tracking and Region Segmentation (STR-PTRS), is developed to extract S-AVAs from generic videos under challenging conditions such as uneven lighting, clutter, occlusions, and complicated motions of both objects and camera. Discriminative audio-visual codebooks are constructed on top of S-AVAs using Multiple Instance Learning. Codebook-based features are generated for semantic concept detection. We extensively evaluate our algorithm over Kodak's consumer benchmark video set from real users. Experimental results confirm significant performance improvements: over 120% MAP gain compared to alternative approaches using static region segmentation without temporal tracking. The joint audio-visual features also outperform visual features alone by an average of 8.5% (in terms of AP) over 21 concepts, with many concepts achieving more than 20%.
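The abstract reports results as AP (average precision) per concept and MAP (mean AP over concepts). The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. It assumes non-interpolated AP over a ranked list, and it reads "gain" as a relative improvement over a baseline; the abstract does not state either choice, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP (an assumed definition, not taken from the paper).

    Items are ranked by descending score; AP is the mean of precision@k
    taken at each rank k where a positive item (label == 1) appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # No positives in the list: define AP as 0.0 for this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_concept_ap: Dict[str, float]) -> float:
    """MAP: unweighted mean of the per-concept AP values."""
    return sum(per_concept_ap.values()) / len(per_concept_ap)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement over a baseline (one possible reading of 'gain')."""
    return (new - baseline) / baseline
```

Under this reading, a MAP that rises from 0.20 to 0.44 would be a 120% gain; those two numbers are illustrative only and do not come from the paper.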


Contact

Wei Jiang
Shih-Fu Chang

BibTex Reference

   Author = {Jiang, Wei and Cotton, Courtenay and Chang, Shih-Fu and Ellis, Dan and Loui, Alexander C.},
   Title = {Short-Term Audio-Visual Atoms for Generic Video Concept Classification},
   BookTitle = {Proceeding of ACM international conference on Multimedia (ACM MM)},
   Month = {October},
   Year = {2009}


