Jump to : Download | Abstract | Contact | BibTex reference | EndNote reference |


Yu-Gang Jiang, Qi Dai, Tao Mei, Yong Rui, Shi-Fu Chang. Super Fast Event Recognition in Internet Videos. IEEE Transactions on Multimedia, 2013.

Download [help]

Download paper: Adobe portable document (pdf)

Copyright notice:This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Techniques for recognizing high-level events in consumer videos on the Internet have many applications. Systems that produced state-of-the-art recognition performance usually contain modules requiring extensive computation, such as the extraction of the temporal motion trajectories, which cannot be deployed on large-scale datasets. In this paper, we provide a comprehensive study on efficient methods in this area and identify technical options for super fast event recognition in Internet videos. We start from analyzing a multimodal baseline that has produced good performance on popular benchmarks, by systematically evaluating each component in terms of both computational cost and contribution to recognition accuracy. After that, we identify alternative features, classifiers, and fusion strategies that can all be efficiently computed. In addition, we also provide a study on the following interesting question: for event recognition in Internet videos, what is the minimum number of visual and audio frames needed to obtain a comparable accuracy to that of using all the frames? Results on two rigorously designed datasets indicate that similar results can be maintained by using only a small portion of the visual frames. We also find that, different from the visual frames, the soundtracks contain little redundant information and thus sampling is always harmful. Integrating all the findings, our suggested recognition system is 2,350-fold faster than a baseline approach with even higher recognition accuracies. It recognizes 20 classes on a 120-second video sequence in just 1.78 seconds, using a regular desktop computer


Yu-Gang Jiang
Shih-Fu Chang

BibTex Reference

   Author = {Jiang, Yu-Gang and Dai, Qi and Mei, Tao and Rui, Yong and Chang, Shi-Fu},
   Title = {Super Fast Event Recognition in Internet Videos},
   Journal = {IEEE Transactions on Multimedia},
   Publisher = {IEEE},
   Year = {2013}

EndNote Reference [help]

Get EndNote Reference (.ref)


For problems or questions regarding this web site contact The Web Master.

This document was translated automatically from BibTEX by bib2html (Copyright 2003 © Eric Marchand, INRIA, Vista Project).