CU-VIREO374

Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection

Quick Guide to CU-VIREO374


CU-VIREO374 Citation

Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, Chong-Wah Ngo, "CU-VIREO374: Fusing Columbia374 and VIREO374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008.

Download the prediction scores for the TRECVID 2008 data set (552MB)

Download the prediction scores for the TRECVID 2009 data set (113MB; see note below)

Summary

Semantic concept detection is an active research topic because detected concepts can serve as semantic filters and aid automatic search of image and video databases. The annual NIST TRECVID video retrieval benchmark has contributed greatly to this area by providing benchmark datasets and performing system evaluation. Because acquiring ground truth for semantic concepts is time-consuming, TRECVID selects only 10-20 concepts for evaluation each year. This is insufficient for general video retrieval, for which most researchers believe hundreds or thousands of concepts would be more appropriate. In light of this, several efforts, such as LSCOM, have developed and released annotations for hundreds of concepts.

Although the annotations are publicly available, building detectors for hundreds of concepts is complicated and time-consuming. To stimulate the development of new techniques and reduce the effort of replicating similar methods, several groups have developed and released large-scale concept detectors, including Mediamill-101, Columbia374, and VIREO374. Mediamill-101 includes 101 detectors over the TRECVID 2005/2006 datasets, along with ground-truth labels, features, and detection scores. Columbia374 and VIREO374 released detectors for a larger set of 374 semantic concepts selected from the LSCOM ontology. Columbia374 employed a simple and efficient baseline method using three types of global features. VIREO374 adopted a similar framework but emphasized local keypoint features.

While keypoint features describe local structures in an image and contain no color information, global features are statistics about the overall distribution of color, texture, or edge information in an image. We therefore expect the two feature types to be complementary for semantic concept detection, since some concepts depend mainly on global color information (e.g., water, desert), others on local structure (e.g., US-flag, car), and some on both (e.g., mountain). It is interesting not only to compare the performance of the individual features but also to see whether combining them further improves performance. Because Columbia374 and VIREO374 cover the same set of concepts, we unify their output formats and fuse the detection scores of both detector sets. With the goal of stimulating innovation in concept detection and providing better large-scale concept detectors for video search, we are releasing the fused detection scores on the TRECVID 2008 corpora to the multimedia community.

Details about the fusion method, performance comparisons, and the data format can be found in our technical report.
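To make the idea of fusing two detector sets concrete, here is a minimal late-fusion sketch for a single concept. It assumes equal-weight averaging of min-max normalized scores, a common baseline that is not necessarily the method used in the technical report. The function names (`min_max_normalize`, `fuse_scores`, `rank_shots`) and the shot-ID keys are hypothetical and do not reflect the format of the released score files.

```python
from typing import Dict, List

# Hypothetical layout: one concept's scores, keyed by shot ID.
Scores = Dict[str, float]


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one detector set's scores to [0, 1] so the two sets are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Degenerate case: every shot received the same score.
        return {shot: 0.5 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def fuse_scores(columbia: Scores, vireo: Scores, weight: float = 0.5) -> Scores:
    """Weighted-average late fusion over the shots scored by both detector sets.

    Assumption: equal weights (0.5) and min-max normalization, chosen only
    for illustration.
    """
    a = min_max_normalize(columbia)
    b = min_max_normalize(vireo)
    common = a.keys() & b.keys()
    return {shot: weight * a[shot] + (1 - weight) * b[shot] for shot in common}


def rank_shots(fused: Scores) -> List[str]:
    """Return shot IDs ordered from most to least likely to contain the concept."""
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Made-up scores for three shots; the two detector sets use different ranges.
    columbia = {"shot1_1": 0.9, "shot1_2": 0.1, "shot2_1": 0.5}
    vireo = {"shot1_1": 2.0, "shot1_2": 3.0, "shot2_1": -1.0}
    print(rank_shots(fuse_scores(columbia, vireo)))
```

With these made-up inputs the fused scores are 0.875, 0.5, and 0.25, so the ranking is `shot1_1`, `shot1_2`, `shot2_1`. Normalizing before averaging matters because the two detector sets need not produce scores on the same scale; without it, the set with the wider range would dominate the fused ranking.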

Note: the 2009 release also includes detection scores for the 20 concepts announced in TRECVID 2009, generated from new models trained on the TRECVID 2009 development data using a similar method. The citation for these new scores remains the technical report above.


For problems or questions regarding this download site, please contact the Web Master.
Last updated: Aug 22, 2009