

Zheng Shou, Dongang Wang, Shih-Fu Chang. Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.


Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


Abstract

We address temporal action localization in untrimmed long videos. This is important because videos in real applications are usually unconstrained and contain multiple action instances plus video content of background scenes or other activities. To address this challenging issue, we exploit the effectiveness of deep networks in temporal action localization via three segment-based 3D ConvNets: (1) a proposal network identifies candidate segments in a long video that may contain actions; (2) a classification network learns a one-vs-all action classification model to serve as initialization for the localization network; and (3) a localization network fine-tunes the learned classification network to localize each action instance. We propose a novel loss function for the localization network to explicitly consider temporal overlap and achieve high temporal localization accuracy. In the end, only the proposal network and the localization network are used during prediction. On two large-scale benchmarks, our approach achieves significantly superior performance compared with other state-of-the-art systems: mAP increases from 1.7% to 7.4% on MEXaction2 and from 15.0% to 19.0% on THUMOS 2014.
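
The temporal overlap between a candidate segment and a ground-truth action instance is commonly measured as temporal intersection-over-union (IoU). The following is a minimal sketch of that measure only, assuming segments are given as (start, end) pairs in seconds. The function name and the example values are illustrative and are not taken from the paper, and this is not the loss function the paper proposes.

```python
def temporal_iou(seg_a, seg_b):
    """Temporal intersection-over-union of two (start, end) segments."""
    start_a, end_a = seg_a
    start_b, end_b = seg_b
    # Length of the shared interval; zero when the segments are disjoint.
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    # Total time covered by either segment.
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical example: a predicted segment against a ground-truth instance.
predicted = (12.0, 20.0)
ground_truth = (15.0, 25.0)
print(temporal_iou(predicted, ground_truth))
```

Here the two segments share 5 seconds out of a combined span of 13 seconds, so the overlap is 5/13. Disjoint segments score 0 and identical segments score 1.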


Contact

Zheng Shou
Shih-Fu Chang

BibTeX Reference

   Author = {Shou, Zheng and Wang, Dongang and Chang, Shih-Fu},
   Title = {Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs},
   BookTitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
   Year = {2016}


