Prof. Shih-Fu Chang Research Activities:
My research areas include content-based image and video search, video content classification, video adaptation, and media forensics. Our work leverages novel formulations and applications of techniques drawn from image processing, computer vision, machine learning, and related fields such as information retrieval, databases, and information theory.
Our ultimate goal is to develop next-generation search engines for
digital images and videos. With my students and co-workers, I have made
contributions in several key areas and developed several well-known systems for image and video
search. For this work, we have been fortunate to receive
the 2009 IEEE Kiyo Tomiyasu Technical
Field Award for "pioneer contributions to automatic image search and classification."
In 1998, we developed one of the first video object search systems,
VideoQ,
which supported automated spatio-temporal indexing at the object region level.
Users were able to specify the search criteria in terms of the spatial-temporal
composition of visual regions using interactive drawing tools, rather than typing
keywords only. The corresponding paper won the IEEE Transactions on Circuits and
Systems for Video Technology Best Paper Award in 2000.
The automated object region indexing techniques also formed the basis for our two
well-known image search engines,
VisualSEEk and
WebSEEk.
Recently, through collaboration with partners in other universities and industry,
we have been advocating new representation paradigms for video search.
We proposed developing a large number of analytic models for detecting
visual concepts contained in images and videos. Such models, numbering
in the hundreds to thousands, provide a basic visual language which can be used
to describe the generic concepts of objects, people, scenes, events, and
domain-related production syntax contained in the images and videos.
We made a systematic effort to define
a large-scale concept ontology for multimedia (LSCOM),
which defined more
than 1000 visual concepts that were considered useful, observable, and detectable.
We led a team of researchers and students to evaluate the validity and utility
of this ontology, and to carefully annotate about 450 concept categories over
60,000 video shots. The resulting ontology and annotated video corpus can be downloaded
here.
In addition, we have applied advanced feature extraction and machine learning methods
to develop 374 automatic concept detection models (CU-VIREO 374) which can be downloaded
here.
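As a rough illustration of how a bank of concept detectors can serve as a "visual language" for search, the sketch below ranks video shots by a weighted sum of detector confidences. It is a minimal sketch under stated assumptions, not the LSCOM or CU-VIREO 374 implementation: the shot identifiers, concept names, scores, and the function `rank_shots` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical detector outputs: shot id -> {concept name: confidence in [0, 1]}.
# Names and numbers are invented for illustration only.
SHOT_SCORES: Dict[str, Dict[str, float]] = {
    "shot_001": {"person": 0.9, "outdoor": 0.2, "vehicle": 0.1},
    "shot_002": {"person": 0.3, "outdoor": 0.8, "vehicle": 0.7},
    "shot_003": {"person": 0.1, "outdoor": 0.9, "vehicle": 0.05},
}


def rank_shots(
    query: Dict[str, float], shots: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Rank shots by the weighted sum of the concept scores named in the query.

    `query` maps concept names to weights; concepts a shot has no score for
    contribute zero.
    """
    ranked = []
    for shot_id, scores in shots.items():
        score = sum(w * scores.get(concept, 0.0) for concept, w in query.items())
        ranked.append((shot_id, score))
    # Highest-scoring shots first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


if __name__ == "__main__":
    # A text query such as "vehicles outdoors" mapped (by assumption) to two concepts.
    for shot_id, score in rank_shots({"outdoor": 1.0, "vehicle": 1.0}, SHOT_SCORES):
        print(shot_id, round(score, 2))
```

A linear combination is only one possible way to fuse detector outputs; it is used here because it keeps the mapping from query concepts to ranking easy to follow.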
One of the major challenges in indexing large video collections is the need to process
a vast amount of video data in real time or even faster. By exploring the compressed
formats of video data, we have developed novel and efficient algorithms for processing coded
videos in the compressed domain without full decoding
[ref].
We demonstrated order-of-magnitude
speedups in various video manipulation tasks and deployed the first Web-based compressed
video editing engine,
CVEPS.
Our compressed-domain video processing technologies made possible real-time video
editing using remote thin clients over the Internet without demanding heavy-weight computers.
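One simple example of working in the compressed domain, given here only as an illustration and not as the algorithms used in CVEPS, is building a reduced-resolution thumbnail directly from DCT coefficients. For an orthonormal 8x8 DCT, the DC coefficient equals 8 times the block mean, so each block's average intensity can be read off without an inverse transform. The sketch assumes unquantized coefficients of intra-coded blocks; the helper `dct2` merely stands in for an encoder so that the example is self-contained.

```python
import math
from typing import List

N = 8  # block size used by JPEG and MPEG-style codecs
Block = List[List[float]]


def dct2(block: Block) -> Block:
    """Orthonormal 2-D DCT-II of an N x N block (stand-in for an encoder)."""
    def alpha(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def dc_thumbnail(coeff_blocks: List[List[Block]]) -> List[List[float]]:
    """Return one pixel per block, using only the DC coefficient.

    For the orthonormal DCT above, DC = sum(block) / N, so DC / N is the
    block mean. No inverse DCT is performed.
    """
    return [[blk[0][0] / N for blk in row] for row in coeff_blocks]


if __name__ == "__main__":
    flat = [[100.0] * N for _ in range(N)]
    ramp = [[float(x * N + y) for y in range(N)] for x in range(N)]
    print(dc_thumbnail([[dct2(flat), dct2(ramp)]]))
```

Real bitstreams add quantization, level shifting, and motion-compensated frames, all of which this sketch leaves out.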
Another challenge in video indexing is to find effective ways of condensing long
videos into compact summaries that meet real-world constraints, such as hardware
capability, network bandwidth, and user preference. We have developed a unifying
framework based on domain-relevant production theories and human psychological models
for solving open problems in video adaptation and summarization. Our solution extends
the classical rate-distortion information theory and adds analytical computational
models of resources, subjective utilities, and adaptation operations into a joint
optimization framework. Our approach has facilitated development of innovative
applications in practice, including
a real-time sports video highlight generation
system developed in 2001 and licensed to several industry groups.
Our publication on video skimming won a Best Student Paper Award at
the 10th ACM Multimedia Conference in 2002.
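The core idea of relating adaptation operations, resources, and utilities can be sketched as a small constrained selection problem: among candidate operations, pick the one with the highest utility whose resource cost fits the budget. This is a deliberately simplified sketch, not the framework itself; the `Operation` type, the operation names, and all bitrate and utility figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Operation:
    """A candidate adaptation with its (assumed) resource cost and utility."""
    name: str
    bitrate_kbps: float  # resource needed after applying the operation
    utility: float       # subjective quality score, higher is better


def best_operation(
    candidates: List[Operation], budget_kbps: float
) -> Optional[Operation]:
    """Return the highest-utility operation that fits within the bitrate budget."""
    feasible = [op for op in candidates if op.bitrate_kbps <= budget_kbps]
    if not feasible:
        return None  # no adaptation satisfies the constraint
    return max(feasible, key=lambda op: op.utility)


if __name__ == "__main__":
    # Invented figures for illustration only.
    candidates = [
        Operation("no_adaptation", 1000.0, 1.00),
        Operation("drop_b_frames", 600.0, 0.80),
        Operation("halve_resolution", 300.0, 0.65),
        Operation("keyframes_only", 100.0, 0.30),
    ]
    choice = best_operation(candidates, budget_kbps=650.0)
    print(choice.name if choice else "no feasible operation")
```

In practice the utility of each operation would depend on the content and the user, and several resources may be constrained at once; a single budget is used here to keep the trade-off visible.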
We have extended the content-based approaches to other emerging applications,
such as media forensics. In the era of pervasive video production and editing,
trustworthiness of image and video content should no longer be taken for granted.
In applications such as forensics and criminal investigation, commonly
encountered problems include determining whether an image has been tampered with and,
in cases of dispute, identifying the true source of a video. In our
2001 TCSVT paper, we proposed a novel semi-fragile image authentication
technique that combined content-based invariant features with cryptographic techniques
to distinguish malicious attacks from benign manipulations of images (e.g., compression).
It reflected an important theme of our research: embedding automated visual content analysis
into real-world applications. Recently, we launched an NSF-funded major effort called
TrustFoto, in which physics-based
features and natural image statistics were used to distinguish natural photographs
from computer graphics, and to detect artifacts and inconsistencies caused by image tampering.
We have also deployed
an online
demo for distinguishing photographic images from computer graphics.
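To make the semi-fragile authentication idea above more concrete, the sketch below pairs an invariant feature with a keyed hash. It assumes one particular invariant: because rounding is monotone, quantizing two transform coefficients with the same step cannot reverse their order (it can only make them equal), so the recorded order survives compression while a large local edit can flip it. The coefficient values, the pairing scheme, and the use of HMAC in place of a public-key signature are all assumptions made for illustration, not details taken from the paper.

```python
import hashlib
import hmac
from typing import List, Sequence, Tuple

Pairs = Sequence[Tuple[int, int]]


def feature_bits(coeffs: Sequence[float], pairs: Pairs) -> List[int]:
    """One bit per pair: 1 if the first coefficient is >= the second, else 0."""
    return [1 if coeffs[i] >= coeffs[j] else 0 for i, j in pairs]


def sign(bits: Sequence[int], key: bytes) -> bytes:
    """Protect the feature bits with a keyed hash (stand-in for a signature)."""
    return hmac.new(key, bytes(bits), hashlib.sha256).digest()


def quantize(coeffs: Sequence[float], step: float) -> List[float]:
    """Uniform quantization, as a simple model of lossy compression."""
    return [round(c / step) * step for c in coeffs]


def verify(
    coeffs: Sequence[float], pairs: Pairs, bits: Sequence[int], tag: bytes, key: bytes
) -> bool:
    """Accept if the bits are authentic and no coefficient order has been reversed."""
    if not hmac.compare_digest(sign(bits, key), tag):
        return False
    for (i, j), bit in zip(pairs, bits):
        diff = coeffs[i] - coeffs[j]
        # Quantization may shrink a difference to zero but cannot change its sign.
        if (bit == 1 and diff < 0) or (bit == 0 and diff > 0):
            return False
    return True


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret
    coeffs = [52.0, 47.0, -13.0, -9.0, 30.0, 31.0]
    pairs = [(0, 1), (2, 3), (4, 5)]
    bits = feature_bits(coeffs, pairs)
    tag = sign(bits, key)

    compressed = quantize(coeffs, step=10.0)
    tampered = list(coeffs)
    tampered[1] = 90.0  # a large local change

    print(verify(compressed, pairs, bits, tag, key))
    print(verify(tampered, pairs, bits, tag, key))
```

A check of this kind tolerates compression by construction, but it only detects edits that reverse a recorded order, so a practical scheme would need many pairs and additional safeguards.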
My group has actively
participated in Columbia's ADVENT University-Industry Research Consortium, Columbia's
Persival Digital Library, NIST
TRECVID video retrieval evaluation event, and the development of MPEG-7
and MPEG-21 standards.
For more information
about our projects, please visit the website of the Digital
Video and Multimedia Lab.