Download the Visual Features and the lists in Columbia374. (269 MB file. Expands to 695 MB on disk.)
Download the Models trained by LIBSVM (Ver. 2.81) in Columbia374. (3.5 GB file. Expands to 4 GB on disk.)
Download the Scores. (3.9 GB file. Expands to 15 GB on disk.)
Download the annotation for 374 concepts in Columbia374. (72 MB file. Expands to 581 MB on disk.)
Download the Visual Features and the list file for 2007 data. (57 MB file. Expands to 142 MB on disk.)
Download the Scores of TRECVID2007. (785 MB file. Expands to 2.8 GB on disk.)
Download the Features and Scores of external Search Examples for TRECVID2007. (2.7 MB file. Expands to 14 MB on disk.)
Download the Visual Features and the list file for 2008 data. (145 MB file. Expands to 356 MB on disk.)
Semantic concept detection is a key requirement for accessing large collections of digital images and videos. Automatically detecting the presence of a large number of semantic concepts, such as "person," "waterfront," or "explosion," allows intuitive indexing and retrieval of visual content at the semantic level. Developing effective concept detectors and systematic evaluation methods has become an active research topic in recent years. For example, a major video retrieval benchmarking event, NIST TRECVID [1], has contributed to this emerging area through (1) the provision of large sets of common data and (2) the organization of common benchmark tasks performed over this data.
However, due to limitations on resources, the evaluation of
concept detection is usually much smaller in scope than is generally thought to
be necessary for effectively leveraging concept detection for video search. In
particular, the TRECVID benchmark has typically focused on evaluating, at most,
20 visual concepts, while providing annotation data for 39 concepts. Still, many
researchers believe that a set of hundreds or thousands of concept detectors
would be more appropriate for general video retrieval tasks. To bridge
this gap, several efforts have developed and released annotation data for
hundreds of concepts [2, 5, 6].
While such annotation data is certainly
valuable, it should also be noted that building automatic concept detectors is a
complicated and computationally expensive process. Research results over the last few years have converged on the finding that an approach combining grid- and global-level feature representations of keyframes from a video shot with a support vector machine (SVM) classifier provides an adequate baseline for building a strong concept detection system [8, 3]. As a result, many research groups invest serious effort in replicating these baseline SVM-based methods, which leaves little time for investigating new and innovative approaches.
The MediaMill Challenge dataset [2] helped address much of this replication-of-effort problem by releasing detectors for 101 semantic concepts over the TRECVID2005/2006 dataset [1], including the ground truth, the features, and the results of the detectors. This dataset is useful for reducing the large computational costs of concept detection and allowing a focus on innovative new approaches. In the same spirit, we are releasing a set of 374 semantic concept detectors (called "Columbia374"), together with the ground truth, the features, and the results of the detectors based on our baseline detection method in TRECVID2005/2006 [3, 4], with the goal of fostering innovation in concept detection and enabling the exploration of a large set of concept detectors for video search. When future datasets become available (e.g., TRECVID 2007), we will also release features and detection results over the new data.
The 374 concepts are selected from the LSCOM ontology [6],
which includes more than 834 visual concepts jointly defined by researchers,
information analysts, and ontology specialists according to the criteria of
usefulness, feasibility, and observability. These concepts are related to
events, objects, locations, people, and programs that can be found in general
broadcast news videos. The definition of the LSCOM concept list and the annotation of a 449-concept subset may be found in [5].
Columbia374
employs a simple baseline method, composed of three types of features,
individual SVMs [7] trained independently over each feature space, and a simple
late fusion of the SVMs. Such an approach is rather lightweight compared to top-performing TRECVID submissions. Nonetheless, running even this lightweight training process for all 374 concepts takes approximately 3 weeks on 20 machines in parallel, or roughly more than a year of machine time. Clearly this is not an effort that needs to be duplicated at dozens of research groups around the world. Despite the simple features and classification methods used for the Columbia374 detectors, the resulting baseline models achieve very good performance in the TRECVID2006 concept detection benchmark and therefore provide a strong baseline platform for researchers to expand upon.
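The late-fusion step described above can be thought of as a simple (possibly weighted) average of the per-feature SVM outputs for each keyframe. The sketch below is a minimal illustration of that idea, assuming the per-feature scores have already been mapped to a common probability-like scale; the feature names and score values are hypothetical, not taken from the release.

```python
import numpy as np

def late_fuse(score_lists, weights=None):
    """Fuse per-feature detector scores by a (weighted) average.

    score_lists: list of 1-D arrays, one per feature-specific SVM,
    each holding probability-like scores for the same keyframes.
    """
    scores = np.vstack(score_lists)  # shape: (n_features, n_keyframes)
    if weights is None:
        # Default to a uniform average over the feature-specific SVMs.
        weights = np.ones(len(score_lists)) / len(score_lists)
    return weights @ scores  # weighted average, one score per keyframe

# Hypothetical scores from three feature-specific SVMs for 4 keyframes.
color_scores   = np.array([0.9, 0.2, 0.6, 0.1])
texture_scores = np.array([0.8, 0.3, 0.5, 0.2])
edge_scores    = np.array([0.7, 0.1, 0.7, 0.3])

fused = late_fuse([color_scores, texture_scores, edge_scores])
# fused is [0.8, 0.2, 0.6, 0.2]: the per-keyframe mean of the three scores.
```

A weighted variant (e.g., validation-tuned weights per feature) fits the same interface by passing an explicit `weights` array.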
In TRECVID 2007, NIST provided subshot definitions only for the development set. To obtain finer temporal granularity, however, we computed and extracted subshots and their keyframes for the 2007 test set as well. Scores for the 2007 test set represent the MAX subshot score for each master shot. Features extracted from each subshot keyframe for the entire 2007 set are also included.
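The MAX aggregation described above reduces to taking, for each master shot, the largest score among its subshots. A small sketch of that reduction follows; the shot IDs and scores are made up for illustration.

```python
# Hypothetical (master_shot_id, subshot_score) pairs from one detector.
subshot_scores = [
    ("shot100", 0.35), ("shot100", 0.72), ("shot100", 0.41),
    ("shot101", 0.10), ("shot101", 0.25),
]

# Keep the maximum subshot score seen so far for each master shot.
shot_scores = {}
for shot_id, score in subshot_scores:
    shot_scores[shot_id] = max(score, shot_scores.get(shot_id, float("-inf")))

# shot_scores is {"shot100": 0.72, "shot101": 0.25}
```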
Additionally, please note that
the current models for the Columbia374 baseline were trained using less than
40% of the development set of TRECVID 2005 data. Performance of the models can
be improved by increasing the size of the training data set or fusing more
models for each detector.
Details about the features, classification
methods, training procedures, and data structures used in the Columbia374
release can be found in [9]. Example applications of the Columbia374 detectors for improving video search and concept detection can also be found there.
[1] NIST, "TREC Video Retrieval Evaluation (TRECVID)," 2001-2006. http://www-nlpir.nist.gov/projects/trecvid/
[2] C. G. M. Snoek, M. Worring, J. C. van Gemert, J.-M. Geusebroek, and A. W. M. Smeulders, "The challenge problem for automated detection of 101 semantic concepts in multimedia," in Proceedings of the 14th Annual ACM International Conference on Multimedia, Santa Barbara, CA, USA, 2006.
[3] S.-F. Chang, W. Hsu, L. Kennedy, L. Xie, A. Yanagawa, E. Zavesky, and D.-Q. Zhang, "Columbia University TRECVID-2005 Video Search and High-Level Feature Extraction," in NIST TRECVID Workshop, Gaithersburg, MD, 2005.
[4] S.-F. Chang, W. Jiang, W. Hsu, L. Kennedy, D. Xu, A. Yanagawa, and E. Zavesky, "Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction," in NIST TRECVID Workshop, Gaithersburg, MD, 2006.
[5] "LSCOM Lexicon Definitions and Annotations Version 1.0, DTO Challenge Workshop on Large Scale Concept Ontology for Multimedia," Columbia University ADVENT Technical Report #217-2006-3, March 2006. Data set download site: http://www.ee.columbia.edu/dvmm/lscom/
[6] M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis, "Large-Scale Concept Ontology for Multimedia," IEEE MultiMedia, 13(3), 2006.
[7] C.-C. Chang and C.-J. Lin, "LIBSVM: A Library for Support Vector Machines," 2001.
[8] A. Amir, J. Argillander, M. Campbell, A. Haubold, G. Iyengar, S. Ebadollahi, F. Kang, M. R. Naphade, A. P. Natsev, J. R. Smith, J. Tesic, and T. Volkmer, "IBM Research TRECVID-2005 Video Retrieval System," in NIST TRECVID Workshop, Gaithersburg, MD, 2005.
[9] A. Yanagawa, S.-F. Chang, L. Kennedy, and W. Hsu, "Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts," Columbia University ADVENT Technical Report #222-2006-8, March 20, 2007. Download site: http://www.ee.columbia.edu/dvmm/columbia374