Download the Consumer Video Dataset.
(7.9 MB file; expands to 20.8 MB on disk.)
Semantic indexing of images and videos in the consumer domain has become an important problem for both research and practical applications.
In this work we developed Kodak's consumer video benchmark data set, which includes (1) a significant number of videos from actual users (1358 video clips from consumers and 1873 clips from YouTube), (2) a rich lexicon that accommodates consumers' needs (more than 100 concepts), and (3) the annotation of a subset of concepts (25) over the entire video data set. To the best of our knowledge, this is the first systematic work in the consumer domain aimed at the definition of a large lexicon, the construction of a large benchmark data set, and the annotation of videos in a rigorous fashion. We expect this effort to have significant impact by providing a sound foundation for developing and evaluating large-scale, learning-based semantic indexing and annotation techniques in the consumer domain.
Details about the data and the data structures used in this dataset release can be found in this paper. The dataset includes the annotations, the extracted visual features of the videos from consumers, and the URLs of the videos from YouTube. To get sample anonymized video clips or keyframes for the videos from consumers, please send requests to us.
Akira Yanagawa, Alexander C. Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, and Keansub Lee, "Kodak consumer video benchmark data set: concept definition and annotation," Columbia University ADVENT Technical Report 246-2008-4, September 2008.
Shih-Fu Chang, Dan Ellis, Wei Jiang, Keansub Lee, Akira Yanagawa, Alexander C. Loui, and Jiebo Luo, "Large-Scale Multimodal Semantic Concept Detection for Consumer Video," In ACM SIGMM International Workshop on Multimedia Information Retrieval, Germany, September 2007.
Alexander C. Loui, Jiebo Luo, Shih-Fu Chang, Dan Ellis, Wei Jiang, Lyndon Kennedy, Keansub Lee, and Akira Yanagawa, "Kodak's Consumer Video Benchmark Data Set: Concept Definition and Annotation," In ACM SIGMM International Workshop on Multimedia Information Retrieval, Germany, September 2007.