Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection

Quick Guide to CU-VIREO374

CU-VIREO374 Citation:

Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, Chong-Wah Ngo, "CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008. [bibtex]

Download the prediction scores on TRECVID 2008 data set (552MB)

Download the prediction scores on TRECVID 2009 data set (113MB) note09

Download the prediction scores on TRECVID 2010 data set (43MB) note10  
Top 200 keyframes in TV10: (keyframe gallery)

Semantic concept detection is an active research topic, as it can provide semantic filters and aid automatic search of image and video databases. The annual NIST TRECVID video retrieval benchmarking event has greatly contributed to this area by providing benchmark datasets and performing system evaluations. Because acquiring ground-truth labels for semantic concepts is time-consuming, only 10-20 concepts are selected for evaluation in TRECVID each year. This is insufficient for general video retrieval tasks, for which most researchers believe hundreds or thousands of concepts would be more appropriate. In light of this, several efforts, such as LSCOM, have developed and released annotation data for hundreds of concepts.

Although the annotations are publicly available, building detectors for hundreds of concepts is complicated and time-consuming. To stimulate innovation of new techniques and reduce the effort of replicating similar methods, several efforts have developed and released large-scale concept detectors, including MediaMill-101, Columbia374, and VIREO374. MediaMill-101 includes 101 detectors over the TRECVID 2005/2006 datasets, together with ground-truth labels, features, and detection scores. Columbia374 and VIREO374 released detectors for a larger set of 374 semantic concepts selected from the LSCOM ontology. Columbia374 employed a simple and efficient baseline method using three types of global features, while VIREO374 adopted a similar framework with an emphasis on local keypoint features.

Keypoint features describe local structures in an image and do not contain color information, whereas global features are statistics of the overall distribution of color, texture, or edge information in an image. We therefore expect the two types of features to be complementary for semantic concept detection, which may require global color information (e.g., for concepts water and desert), local structure information (e.g., for US-flag and car), or both (e.g., for mountain). It is interesting not only to compare the performance of the various features, but also to see whether their combination further improves performance. As Columbia374 and VIREO374 cover the same set of concepts, we unify the output formats and fuse the detection scores of the two detector sets. With the goal of stimulating innovation in concept detection and providing better large-scale concept detectors for video search, we are releasing the fused detection scores on the TRECVID datasets to the multimedia community.
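The idea of score-level fusion can be sketched as follows. This is a minimal illustration only, assuming a simple weighted average of min-max-normalized scores per concept; the exact normalization and fusion settings used for CU-VIREO374 are given in the technical report.

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Fuse two detectors' prediction scores for the same keyframes.

    Illustrative sketch: min-max normalize each detector's scores to
    [0, 1], then take a weighted average. Not the exact CU-VIREO374
    procedure; see the technical report for the actual method.
    """
    def normalize(scores):
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [0.0] * len(scores)
        return [(s - lo) / (hi - lo) for s in scores]

    na, nb = normalize(scores_a), normalize(scores_b)
    return [weight * a + (1 - weight) * b for a, b in zip(na, nb)]

# Hypothetical scores for three keyframes on one concept,
# from a global-feature detector and a keypoint-based detector:
fused = fuse_scores([0.2, 0.9, 0.5], [0.1, 0.7, 0.8])
```

Normalizing before averaging matters here because the two detector sets were trained with different features and produce scores on different scales.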

Details about the fusion method, performance comparisons, and the data format can be found in our technical report.

note09: in addition to the detection scores generated from the old models trained on the TRECVID 2005 development data, the 2009 release also includes detection scores for the 20 concepts announced in TRECVID 2009, generated from new models trained on the TRECVID 2009 development data using a similar method.

note10: the 2010 release is based on models re-trained on the TRECVID 2010 development set, using the basic fusion method described in our 2008 technical report. These models use multiple bag-of-visual-words local features computed over various spatial partitions, and incorporate the DASD algorithm, which exploits concept relationships (context) for improved detection. The technical citation for general usage of these new scores should be the 2008 technical report:

Y.-G. Jiang, A. Yanagawa, S.-F. Chang, C.-W. Ngo, "CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008. [pdf & bibtex]

For citation to the new contextual diffusion algorithm DASD, please use the following ICCV '09 paper:

Y.-G. Jiang, J. Wang, S.-F. Chang, C.-W. Ngo, "Domain Adaptive Semantic Diffusion for Large Scale Context-Based Video Annotation", International Conference on Computer Vision (ICCV), Kyoto, Japan, September 2009. [pdf & bibtex]
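The spatial bag-of-visual-words representation mentioned in note10 can be sketched as below. Keypoint descriptors are assumed to have already been quantized into "visual word" indices against a codebook; the vocabulary size, grid layout, and L1 normalization here are illustrative assumptions, not the exact CU-VIREO374 settings.

```python
def bow_histogram(visual_words, positions, vocab_size, grid=(2, 2)):
    """Spatial bag-of-visual-words sketch: histogram the quantized
    local descriptors ("visual words") per cell of a spatial grid,
    then concatenate the per-cell histograms.

    positions are (x, y) in [0, 1] normalized image coordinates.
    """
    gx, gy = grid
    hist = [0.0] * (gx * gy * vocab_size)
    for word, (x, y) in zip(visual_words, positions):
        cx = min(int(x * gx), gx - 1)  # clamp points on the image border
        cy = min(int(y * gy), gy - 1)
        hist[(cx * gy + cy) * vocab_size + word] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Three hypothetical keypoints, a 2-word vocabulary, a 2x2 grid:
feature = bow_histogram([0, 1, 0], [(0.1, 0.1), (0.6, 0.6), (0.2, 0.8)], 2)
```

Concatenating per-cell histograms keeps coarse layout information (e.g., sky words at the top, road words at the bottom) that a single image-wide histogram would discard, which is the motivation for computing the features over multiple spatial partitions.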

For problems or questions regarding this download site, please contact Yu-Gang Jiang.
Last updated: Aug 10, 2010