Jump to : Download | Abstract | Contact | BibTex reference | EndNote reference |


Liangliang Cao, Zhenguo Li, Yadong Mu, Shih-Fu Chang. Submodular Video Hashing: A Unified Framework Towards Video Pooling and Hashing. In ACM Multimedia, Nara, Japan, November 2012.

Download [help]

Download paper: Adobe portable document (pdf)

Copyright notice:This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


This paper develops a novel framework for efficient large-scale video retrieval. We aim to find video according to higher level similarities, which is beyond the scope of traditional near duplicate search. Following the popular hashing technique we employ compact binary codes to facilitate nearest neighbor search. Unlike the previous methods which capitalize on only one type of hash code for retrieval, this paper combines heterogeneous hash codes to effectively describe the diverse and multi-scale visual contents in videos. Our method integrates feature pooling and hashing in a single framework. In the pooling stage, we cast video frames into a set of pre-specified components, which capture a variety of semantics of video contents. In the hashing stage, we represent each video component as a compact hash code, and combine multiple hash codes into hash tables for effective search. To speed up the retrieval while retaining most informative codes, we propose a graph-based influence maximization method to bridge the pooling and hashing stages. We show that the influence maximization problem is submodular, which allows a greedy optimization method to achieve a nearly optimal solution. Our method works very efficiently, retrieving thousands of video clips from TRECVID dataset in about 0.001 second. For a larger scale synthetic dataset with 1M samples, it uses less than 1 second in response to 100 queries. Our method is extensively evaluated in both unsupervised and supervised scenarios, and the results on TRECVID Multimedia Event Detection and Columbia Consumer Video datasets demonstrate the success of our proposed technique


Zhenguo Li
Yadong Mu

BibTex Reference

   Author = {Cao, Liangliang and Li, Zhenguo and Mu, Yadong and Chang,       Shih-Fu},
   Title = {Submodular Video Hashing: A Unified Framework Towards Video       Pooling and Hashing},
   BookTitle = {ACM Multimedia},
   Address = {Nara, Japan},
   Month = {November},
   Year = {2012}

EndNote Reference [help]

Get EndNote Reference (.ref)


For problems or questions regarding this web site contact The Web Master.

This document was translated automatically from BibTEX by bib2html (Copyright 2003 © Eric Marchand, INRIA, Vista Project).