
Yanagawa:Columbia374

Akira Yanagawa, Shih-Fu Chang, Lyndon Kennedy, Winston Hsu. Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts. ADVENT Technical Report #222-2006-8, Columbia University, March 2007.

Download

Download paper: Adobe Portable Document Format (PDF)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

Semantic concept detection represents a key requirement in accessing large collections of digital images and videos. However, due to limited resources, evaluations of concept detection are usually much smaller in scope than is generally thought necessary for effectively leveraging concept detection for video search. While such annotation data is certainly valuable, it should also be noted that building automatic concept detectors is a complicated and computationally expensive process. To help address much of this replication-of-effort problem, we are releasing a set of 374 semantic concept detectors (called "Columbia374"), along with the ground truth, the features, and the detection results produced by our baseline detection method in TRECVID 2005/2006, with the goal of fostering innovation in concept detection and enabling the exploration of a large set of concept detectors for video search. When future data sets become available (e.g., TRECVID 2007), we will also release features and detection results over the new data. The 374 concepts are selected from the LSCOM ontology, which includes more than 834 visual concepts jointly defined by researchers, information analysts, and ontology specialists according to the criteria of usefulness, feasibility, and observability. These concepts are related to events, objects, locations, people, and programs found in general broadcast news video. Columbia374 employs a simple baseline method composed of three types of features, individual SVMs trained independently over each feature space, and a simple late fusion of the SVM outputs. Such an approach is rather light-weight compared to top-performing TRECVID submissions. Nonetheless, running even this light-weight training process for all 374 concepts takes approximately three weeks on 20 machines in parallel, or more than a year of total machine time. Clearly, this is not an effort that needs to be duplicated at dozens of research groups around the world. Despite the simple features and classification methods used, the Columbia374 detectors achieve very good performance in the TRECVID 2006 concept detection benchmark and therefore provide a strong baseline platform for researchers to expand upon.
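
The baseline pipeline summarized above (one SVM trained independently per feature space, followed by simple late fusion) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature names and dimensions, the use of scikit-learn, the RBF kernel settings, and score averaging as the fusion rule are assumptions made for the example only; the actual Columbia374 detectors use the features and fusion procedure described in the technical report.

# Minimal sketch: per-feature SVMs plus late fusion by score averaging.
# Feature names, dimensions, and random data are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical low-level features for 100 shots (rows = shots).
features = {
    "color_moments": rng.normal(size=(100, 225)),
    "texture": rng.normal(size=(100, 48)),
    "edge_histogram": rng.normal(size=(100, 73)),
}
labels = rng.integers(0, 2, size=100)  # 1 = concept present, 0 = absent

# Train one SVM per feature space, independently of the others.
models = {
    name: SVC(kernel="rbf", probability=True).fit(X, labels)
    for name, X in features.items()
}

# Late fusion: average the per-feature concept probabilities.
def fused_scores(test_features):
    per_feature = [models[name].predict_proba(X)[:, 1]
                   for name, X in test_features.items()]
    return np.mean(per_feature, axis=0)

print(fused_scores(features)[:5])  # fused detection scores for the first 5 shots

In practice one such fused detector is trained for every one of the 374 concepts, which is what makes the end-to-end training cost (weeks of parallel machine time) substantial and worth sharing rather than duplicating.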

Contact

Akira Yanagawa
Shih-Fu Chang
Lyndon Kennedy
Winston Hsu

BibTex Reference

@TechReport{Yanagawa:Columbia374,
   Author = {Yanagawa, Akira and Chang, Shih-Fu and Kennedy, Lyndon and Hsu, Winston},
   Title = {Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts},
   Institution = {Columbia University},
   Type = {ADVENT Technical Report},
   Number = {222-2006-8},
   Month = {March},
   Year = {2007}
}

EndNote Reference

Get EndNote Reference (.ref)

 