%O Report %F akira:Consumer %A Yanagawa, Akira %A Loui, Alexander C. %A Luo, Jiebo %A Chang, Shih-Fu %A Ellis, Dan %A Jiang, Wei %A Kennedy, Lyndon %A Lee, Keansub %T Kodak consumer video benchmark data set: concept definition and annotation %I Columbia University %X Semantic indexing of images and videos in the consumer domain has become an important issue for both research and practical applications. In this work, we developed Kodak's consumer video benchmark data set, which includes (1) a significant number of videos from actual users, (2) a rich lexicon that accommodates consumers' needs, and (3) the annotation of a subset of concepts over the entire video data set. To the best of our knowledge, this is the first systematic work in the consumer domain aimed at the definition of a large lexicon, construction of a large benchmark data set, and annotation of videos in a rigorous fashion. Such an effort will have significant impact by providing a sound foundation for developing and evaluating large-scale learning-based semantic indexing/annotation techniques in the consumer domain. This report includes information about the concept definitions, the annotation process, the video collection process, and the data structures used in the release file. The released data set includes the annotations, extracted visual features (for videos from Kodak), and URLs of videos from YouTube. The Appendix also includes the full list of concepts (more than 100 concepts in 7 categories) that have been defined in the consumer video domain. %U http://www.ee.columbia.edu/dvmm/publications/08/kodak_consumer.pdf %8 September %D 2008