Columbia374

Columbia Universityfs Baseline Detectors for 374 LSCOM Semantic Visual Concepts

Quick Guide to Columbia374


Columbia374 Citation:

Akira Yanagawa, Shih-Fu Chang, Lyndon Kennedy and Winston Hsu, "Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts", Columbia University ADVENT Technical Report #222-2006-8, March 20, 2007. [pdf]
  1. Visual Features & Lists

    Download the Visual Features and the lists in Columbia374.
    (269 MB file. Expands to 695 MB on disk.)


  2. Models for LIBSVM (Ver. 2.81)

     Download the Models trained by LIBSVM (Ver. 2.81) in Columbia374.
    (3.5 GB file. Expands to 4 GB on disk.)


  3. Scores

     Download the Scores in Columbia374.
    (3.9 GB file. Expands to 15 GB on disk.)


  4. Annotation

     Download the annotation for 374 concepts in Columbia374.
    (72 MB file. Expands to 581 MB on disk.)


  5. Visual Features & Lists of TRECVID2007

    Download the Visual Features and the list file for 2007 data.
    (57 MB file. Expands to 142 MB on disk.)


  6. Scores of TRECVID2007

     Download the Scores of TRECVID2007.
    (785 MB file. Expands to 2.8 GB on disk.)


  7. Features and Scores of external search examples for TRECVID2007

     Download the Features and Scores of external search examples for TRECVID2007.
    (2.7 MB file. Expands to 14 MB on disk.)


  8. Visual Features & Lists of TRECVID2008

    Download the Visual Features and the list file for 2008 data.
    (145 MB file. Expands to 356 MB on disk.)

    Summary

    Semantic concept detection represents a key requirement in accessing large collections of digital images and videos. Automatic detection of the presence of a large number of semantic concepts, such as "person," "waterfront," or "explosion," allows intuitive indexing and retrieval of visual content at the semantic level. The development of effective concept detectors and systematic evaluation methods has become an active research topic in recent years. For example, a major video retrieval benchmarking event, NIST TRECVID [1], has contributed to this emerging area through (1) the provision of large sets of common data and (2) the organization of common benchmark tasks to perform over this data.

    However, due to limitations on resources, the evaluation of concept detection is usually much smaller in scope than is generally thought to be necessary for effectively leveraging concept detection for video search. In particular, the TRECVID benchmark has typically focused on evaluating, at most, 20 visual concepts, while providing annotation data for 39 concepts. Still, many researchers believe that a set of hundreds or thousands of concept detectors would be more appropriate for general video retrieval tasks.  To bridge this gap, several efforts have developed and released annotation data for hundreds of concepts [2, 5, 6].

    While such annotation data is certainly valuable, it should also be noted that building automatic concept detectors is a complicated and computationally expensive process. Research results over the last few years have converged on the finding that an approach using grid- and global-level feature representations of keyframes from a video shot, combined with a support vector machine (SVM) classifier, provides an adequate baseline for building a strong concept detection system [8, 3]. As a result, many research groups invest serious effort in replicating these baseline SVM-based methods, which leaves little time for investigating new and innovative approaches.

    The MediaMill Challenge dataset [2] helped address much of this replication-of-effort problem by releasing detectors for 101 semantic concepts over the TRECVID2005/2006 dataset [1], including the ground truth, the features, and the results of the detectors. This dataset is useful for reducing the large computational costs in concept detection and allowing for a focus on innovative new approaches. In this same spirit, we are releasing a set of 374 semantic concept detectors (called "Columbia374"), with the ground truth, the features, and the results of the detectors, based on our baseline detection method in TRECVID2005/2006 [3, 4], with the goal of fostering innovation in concept detection and enabling the exploration of a large set of concept detectors for video search. When future datasets become available (e.g., TRECVID 2007), we will also release features and detection results over the new data.

    The 374 concepts are selected from the LSCOM ontology [6], which includes more than 834 visual concepts jointly defined by researchers, information analysts, and ontology specialists according to the criteria of usefulness, feasibility, and observability. These concepts are related to events, objects, locations, people, and programs found in general broadcast news videos. The definition of the LSCOM concept list and the annotation of a 449-concept subset can be found in [5].

    Columbia374 employs a simple baseline method composed of three types of features, individual SVMs [7] trained independently over each feature space, and a simple late fusion of the SVM outputs. Such an approach is rather lightweight compared to top-performing TRECVID submissions. Nonetheless, running even this lightweight training process for all 374 concepts takes approximately three weeks using 20 machines in parallel, or more than a year of machine time. Clearly, this is not an effort that needs to be duplicated at dozens of research groups around the world. Despite the simple features and classification methods used for the Columbia374 detectors, the resulting baseline models achieve very good performance in the TRECVID2006 concept detection benchmark and therefore provide a strong baseline platform for researchers to expand upon.
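    The late-fusion step described above can be sketched in a few lines of Python: one score per keyframe comes from each per-feature SVM, and the fused score is a weighted average. The feature names, score values, and uniform weighting below are illustrative assumptions only; they do not reproduce the released Columbia374 configuration.

```python
# Minimal sketch of score-level late fusion: one SVM per feature space,
# fused by (weighted) averaging. All names and values are hypothetical.

def late_fusion(score_lists, weights=None):
    """Average per-feature SVM scores for each keyframe.

    score_lists: list of per-feature score lists, one score per keyframe.
    weights: optional per-feature weights; defaults to uniform averaging.
    """
    n_features = len(score_lists)
    if weights is None:
        weights = [1.0 / n_features] * n_features
    n_frames = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_frames)
    ]

# Hypothetical scores from three per-feature SVMs over four keyframes.
color_scores   = [0.9, 0.2, 0.6, 0.1]
texture_scores = [0.8, 0.3, 0.5, 0.2]
edge_scores    = [0.7, 0.1, 0.7, 0.3]

fused = late_fusion([color_scores, texture_scores, edge_scores])
```

    Uniform averaging is the simplest fusion rule; per-feature weights could instead be tuned on a validation set, at the cost of extra computation per concept.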

    In TRECVID 2007, NIST provided subshot definitions only for the development set. To obtain better temporal granularity, however, we computed and extracted subshots and their keyframes for the 2007 test set as well. Scores for the 2007 test set represent the MAX subshot score within each master shot. Features extracted from each subshot keyframe for the entire 2007 set are also included.
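    The subshot-to-master-shot aggregation described above amounts to max-pooling per-subshot scores within each master shot. A minimal sketch, with hypothetical shot IDs and score values:

```python
# Sketch of MAX-pooling subshot scores into master-shot scores.
# Shot IDs and scores are hypothetical.

def max_pool_subshots(subshot_scores):
    """subshot_scores: list of (master_shot_id, score) pairs.

    Returns a dict mapping each master shot ID to the maximum score
    among its subshots.
    """
    shot_scores = {}
    for shot_id, score in subshot_scores:
        if shot_id not in shot_scores or score > shot_scores[shot_id]:
            shot_scores[shot_id] = score
    return shot_scores

scores = [("shot1", 0.4), ("shot1", 0.9), ("shot2", 0.3), ("shot2", 0.1)]
pooled = max_pool_subshots(scores)
# → {"shot1": 0.9, "shot2": 0.3}
```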

    Additionally, please note that the current models for the Columbia374 baseline were trained using less than 40% of the development set of TRECVID 2005 data. Performance of the models can be improved by increasing the size of the training data set or fusing more models for each detector.

    Details about the features, classification methods, training procedures, and data structures used in the Columbia374 release can be found in [9]. Example applications of Columbia374 detectors in improving video search and concept detection can be found therein also.
     

    References

    [1] NIST, "TREC video retrieval evaluation (TRECVID)," 2001-2006. http://www-nlpir.nist.gov/projects/trecvid/

    [2] C. G. M. Snoek, M. Worring, J. C. v. Gemert, J.-M. Geusebroek, and A. W. M. Smeulders, "The challenge problem for automated detection of 101 semantic concepts in multimedia," in Proceedings of the 14th annual ACM international conference on Multimedia Santa Barbara, CA, USA 2006.

    [3] S.-F. Chang, W. Hsu, L. Kennedy, L. Xie, A. Yanagawa, E. Zavesky, and D.-Q. Zhang, "Columbia University TRECVID-2005 Video Search and High-Level Feature Extraction," in NIST TRECVID workshop Gaithersburg, MD, 2005.

    [4] S.-F. Chang, W. Jiang, W. Hsu, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky, "Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction," in NIST TRECVID workshop Gaithersburg, MD, 2006.

    [5] "LSCOM Lexicon Definitions and Annotations Version 1.0, DTO Challenge Workshop on Large Scale Concept Ontology for Multimedia," Columbia University ADVENT Technical Report #217-2006-3, March 2006. Data set download site: http://www.ee.columbia.edu/dvmm/lscom/

    [6]  M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, J. Curtis, "Large-Scale Concept Ontology for Multimedia, " IEEE Multimedia Magazine, 13(3), 2006.

    [7] C.-C. Chang and C.-J. Lin, "LIBSVM: a Library for Support Vector Machines," 2001

    [8] A. Amir, J. Argillander, M. Campbell, A. Haubold, G. Iyengar, S. Ebadollahi, F. Kang, M. R. Naphade A. P. Natsev, John R. Smith, J. Tesic, and T. Volkmer, "IBM Research TRECVID-2005 Video Retrieval System," in NIST TRECVID workshop Gaithersburg, MD, 2005.

    [9] A. Yanagawa, S.-F. Chang, L. Kennedy, and W. Hsu, "Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts," Columbia University ADVENT Technical Report #222-2006-8, March 20, 2007. (download site: http://www.ee.columbia.edu/dvmm/columbia374)


    Link to 374 SVM models (CU-VIREO374) that fuse global features and local features trained using TRECVID video data
    Last updated: Jan. 12, 2011