%O Report
%F Yanagawa:Columbia374
%A Yanagawa, Akira
%A Chang, Shih-Fu
%A Kennedy, Lyndon
%A Hsu, Winston
%T Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts
%I Columbia University
%X Semantic concept detection represents a key requirement in accessing large collections of digital images/videos. However, due to limitations on resources, the evaluation of concept detection is usually much smaller in scope than is generally thought to be necessary for effectively leveraging concept detection for video search. While such annotation data is certainly valuable, it should also be noted that building automatic concept detectors is a complicated and computationally expensive process. To help address much of this replication-of-effort problem, we are releasing a set of 374 semantic concept detectors (called "Columbia374") with the ground truth, the features, and the results of the detectors based on our baseline detection method in TRECVID2005/2006, with the goal of fostering innovation in concept detection and enabling the exploration of the use of a large set of concept detectors for video search. When future datasets become available (e.g., TRECVID 2007), we will also release features and detection results over the new data set. The 374 concepts are selected from the LSCOM ontology, which includes more than 834 visual concepts jointly defined by researchers, information analysts, and ontology specialists according to the criteria of usefulness, feasibility, and observability. These concepts are related to events, objects, locations, people, and programs that can be found in general broadcast news videos. Columbia374 employs a simple baseline method, composed of three types of features, individual SVMs trained independently over each feature space, and a simple late fusion of the SVMs. Such an approach is rather light-weight, when compared to top-performing TRECVID submissions.
Nonetheless, running even such a light-weight training process for all 374 concepts takes approximately 3 weeks using 20 machines in parallel, or more than a year of machine time. Clearly this is not an effort that needs to be duplicated at dozens of research groups around the world. Despite the simple features and classification methods used for the Columbia374 detectors, the resulting baseline models achieve very good performance in the TRECVID2006 concept detection benchmark and therefore provide a strong baseline platform for researchers to expand upon.
%U http://www.ee.columbia.edu/dvmm/publications/07/Yanagawa_Columbia374.pdf
%8 March
%D 2007