

Liangliang Cao, Yadong Mu, Apostol Natsev, Shih-Fu Chang, Gang Hua, John R. Smith. Scene Aligned Pooling for Complex Video Recognition. In European Conference on Computer Vision (ECCV), 2012.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

Real-world videos often contain dynamic backgrounds and evolving human activities, especially user-generated web videos captured in unconstrained scenarios. This paper proposes a new visual representation, scene aligned pooling, for the task of event recognition in complex videos. Based on the observation that a video clip is often composed of shots of different scenes, the key idea of scene aligned pooling is to decompose any video feature into concurrent scene components and to construct classification models that adapt to different scenes. Experiments on two large-scale real-world datasets, the TRECVID Multimedia Event Detection 2011 dataset and the Human Motion Recognition Database (HMDB), show that the new representation consistently improves, by a significant margin, a wide range of visual features: low-level color and texture features, mid-level histograms of local descriptors such as SIFT or space-time interest points, and high-level semantic model features. For example, we improve the state-of-the-art accuracy on the HMDB dataset by 20%.
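The abstract describes the pooling idea only at a high level. As a minimal sketch, assuming each frame already has a feature vector and a probability of belonging to each of K scenes, scene aligned pooling could be illustrated as below. The function names `soft_assign` and `scene_aligned_pool`, the softmax over distances to scene centroids, and the probability-weighted average per scene are all hypothetical choices made for illustration, not the authors' implementation; the scene-adaptive classifiers are not shown.

```python
import math
from typing import List, Sequence


def soft_assign(descriptor: Sequence[float],
                centroids: Sequence[Sequence[float]],
                temperature: float = 1.0) -> List[float]:
    """Hypothetical soft scene assignment: softmax over negative squared
    Euclidean distances from one frame descriptor to each scene centroid."""
    scores = [-sum((x - c) ** 2 for x, c in zip(descriptor, centroid)) / temperature
              for centroid in centroids]
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scene_aligned_pool(features: Sequence[Sequence[float]],
                       scene_probs: Sequence[Sequence[float]]) -> List[float]:
    """Pool frame features separately for each scene.

    features[t] is the feature vector of frame t; scene_probs[t][k] is the
    (assumed) probability that frame t belongs to scene k. Each scene gets a
    probability-weighted average of the frame features, and the per-scene
    averages are concatenated into one video-level vector of length K * D.
    """
    if not features or len(features) != len(scene_probs):
        raise ValueError("need at least one frame and one scene distribution per frame")
    dim = len(features[0])
    num_scenes = len(scene_probs[0])
    sums = [[0.0] * dim for _ in range(num_scenes)]
    mass = [0.0] * num_scenes
    for feat, probs in zip(features, scene_probs):
        for k, p in enumerate(probs):
            mass[k] += p
            for d in range(dim):
                sums[k][d] += p * feat[d]
    pooled: List[float] = []
    for k in range(num_scenes):
        if mass[k] > 0:
            pooled.extend(s / mass[k] for s in sums[k])
        else:
            pooled.extend([0.0] * dim)  # scene never observed in this clip
    return pooled


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # toy 2-D frame features
    hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # frames 0-1 in scene 0, frame 2 in scene 1
    print(scene_aligned_pool(frames, hard))        # [1.0, 0.0, 0.0, 1.0]
```

In this toy clip, plain global average pooling would produce [2/3, 1/3], mixing both scenes into one vector, while the scene-aligned version keeps one block per scene, so a downstream classifier can weigh each scene's evidence separately.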


Contact

Yadong Mu
John R. Smith

BibTex Reference

   Author = {Cao, Liangliang and Mu, Yadong and Natsev, Apostol and Chang, Shih-Fu and Hua, Gang and Smith, John R.},
   Title = {Scene Aligned Pooling for Complex Video Recognition},
   BookTitle = {European Conference on Computer Vision (ECCV)},
   Year = {2012}

EndNote Reference

Get EndNote Reference (.ref)


