Shih-Fu Chang Research Activities:
My research areas include multimedia analysis, search, communication, and forensics, with applications in next-generation media search engines, visual communication systems, and others. Our work leverages novel formulations and techniques drawn from signal processing, computer vision, machine learning, communication, and related fields such as information retrieval and information theory.
We have made contributions in
several key areas and developed several well-known systems for visual search.
For this work, we have been fortunate to receive the
2009 IEEE Kiyo Tomiyasu Technical Field Award for "pioneer
contributions to automatic image search and classification." We have also long
advocated the use of content-based processing in the communication pipeline,
ranging from determining the optimal spatio-temporal resolutions in video
coding, through video adaptation optimized for heterogeneous communication and
user platforms, to adaptive event-based video streaming.
In 1998, we developed one of
the first video object search systems, VideoQ,
which supported automated spatio-temporal indexing at the object region
level. The corresponding paper received the IEEE Transactions on Circuits and
Systems for Video Technology Best Paper Award in 2000. The automated object region
indexing techniques also formed the basis for our two highly cited image search
engines, VisualSEEk and WebSEEk.
Recently, we have been developing large-scale
analytics models for detecting a large number of visual concepts contained in
images and videos. We made a systematic effort to define a
large-scale concept ontology for multimedia (LSCOM), which defined more
than 1000 visual concepts that were considered useful, observable, and
detectable. We jointly led an effort to evaluate the validity and utility of
this ontology, and carefully annotated about 450 concept categories over 60,000
video shots (download site).
In addition, we have applied advanced feature extraction and machine learning
methods to develop 374 automatic concept detection models (CU-VIREO 374), which have
been broadly used (download site). We have participated in the
TRECVID international video retrieval benchmarking event and achieved top
performance in the past several years.
One of the major challenges
in indexing large video collections is the need to process a vast amount of
video data in real time or even faster. By exploiting the compressed formats
of video data, we have developed novel and efficient algorithms for processing
coded videos in the compressed domain without full decoding.
We demonstrated order-of-magnitude speedups in various video manipulation tasks,
and deployed the first Web-based compressed video editing engine, CVEPS,
which made possible real-time video editing using thin clients over the
Internet without heavy-weight servers.
Another challenge in video search
and communication is to condense long videos into compact summaries that meet
real-world communication constraints, such as hardware capability, network
bandwidth, and user preference. We have developed a unifying optimization framework
based on video production theories and psychological models. Our solution
extends the classical rate-distortion information theory and adds analytical
computational models of resources, subjective utilities, and adaptation
operations into a joint optimization framework. Our approach has facilitated
development of innovative applications, including a
real-time sports video highlight system that has been licensed to several companies.
The
related paper won a Best Student Paper Award at the 10th ACM Multimedia
Conference in 2002.
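As an illustrative reading of the joint optimization described above, the problem can be written in the following generic form. All symbols are assumptions introduced here for exposition and are not taken from the original papers: $a$ is an adaptation operation chosen from a set $\mathcal{A}$ of allowed operations, $U(a)$ is the subjective utility of the adapted video, and $R_k(a)$ is its consumption of the $k$-th resource (for example bandwidth or display size) with budget $R_k^{\max}$.

```latex
a^{*} \;=\; \arg\max_{a \in \mathcal{A}} \; U(a)
\quad \text{subject to} \quad
R_k(a) \le R_k^{\max}, \qquad k = 1, \dots, K
```

Under this reading, the classical rate-distortion problem is the special case with a single resource (bit rate) and utility defined as negative distortion, $U(a) = -D(a)$.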
We have extended the
content-based approaches to other emerging applications, such as media
forensics. In the era of pervasive video production and editing,
trustworthiness of image and video content should no longer be taken for
granted. In applications such as forensics and criminal investigation, common
questions are whether an image has been tampered with and, in cases of
dispute, what the true source of a video is. In our 2001
TCSVT paper, we proposed a novel semi-fragile image authentication
technique that combined content-based invariant features with cryptographic
techniques to distinguish malicious attacks from benign manipulations of images
(e.g., compression). It reflected an important theme of our research:
incorporating automated visual content analysis into real-world applications.
Recently, we launched an NSF-funded major effort called TrustFoto, in which
physics-based features and natural image statistics were used to distinguish
natural photographs from computer graphics, and to detect artifacts and
inconsistency caused by image tampering. We have deployed an online demo for
distinguishing photographic images from computer graphics.
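The semi-fragile authentication idea mentioned above (content-based invariant features combined with cryptography) can be illustrated with a minimal Python sketch. It is not the implementation from the 2001 paper. It assumes one simple invariant: if two coefficients are quantized with the same step, their order can collapse into a tie but can never be reversed. The function names, the toy coefficient lists, and the use of HMAC-SHA256 as the cryptographic component are all hypothetical choices made for this example.

```python
import hashlib
import hmac
import math


def relation_bits(block_a, block_b):
    """One bit per coefficient position: 1 if a >= b, else 0."""
    return [1 if a >= b else 0 for a, b in zip(block_a, block_b)]


def quantize(block, step):
    """Uniform quantization, standing in for benign lossy compression."""
    return [math.floor(c / step + 0.5) * step for c in block]


def sign(block_a, block_b, key):
    """Return (bits, tag): the relation bits and an HMAC protecting them."""
    bits = relation_bits(block_a, block_b)
    tag = hmac.new(key, bytes(bits), hashlib.sha256).digest()
    return bits, tag


def verify(block_a, block_b, bits, tag, key):
    """Accept if the bits are authentic and the blocks do not contradict them."""
    expected = hmac.new(key, bytes(bits), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False  # the signature itself was altered
    for a, b, bit in zip(block_a, block_b, bits):
        # Same-step quantization may create a tie but never reverses order,
        # so only a strict reversal counts as tampering.
        if bit == 1 and a < b:
            return False
        if bit == 0 and a > b:
            return False
    return True


# Toy coefficients from two blocks (hypothetical values).
key = b"demo-key"
block_a = [52.0, -13.0, 7.5, 0.0]
block_b = [40.0, -2.0, 7.0, 3.0]
bits, tag = sign(block_a, block_b, key)

# Benign manipulation: both blocks quantized with the same step.
compressed_a = quantize(block_a, 8)
compressed_b = quantize(block_b, 8)

# Malicious manipulation: one coefficient changed so the order flips.
tampered_a = list(compressed_a)
tampered_a[1] = 32.0
```

In this sketch, `verify(compressed_a, compressed_b, bits, tag, key)` accepts the quantized blocks, while `verify(tampered_a, compressed_b, bits, tag, key)` rejects the altered one. A plain cryptographic hash of the pixels would reject both, which is the distinction the semi-fragile approach is meant to draw.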
My group has led Columbia's
ADVENT University-Industry Research Consortium, and actively participated in
Columbia's Digital Library, NIST TRECVID video retrieval evaluation, and MPEG-7
and MPEG-21 international standards.
For more information about our
projects, please visit the website of the Digital
Video and Multimedia Lab.