Shih-Fu Chang Research Activities:


My research areas include multimedia analysis, search, communication, and forensics, with applications in next-generation media search engines, visual communication systems, and others. Our work leverages novel formulations and techniques drawn from signal processing, computer vision, machine learning, communication, and related fields such as information retrieval and information theory.

We have made contributions in several key areas and developed several well-known systems for visual search. For these, we have been fortunate to receive the 2009 IEEE Kiyo Tomiyasu Technical Field Award for "pioneer contributions to automatic image search and classification." We have also long advocated the use of content-based processing in the communication pipeline, from determining the optimal spatio-temporal resolutions in video coding, through video adaptation optimized for heterogeneous communication and user platforms, to adaptive event-based video streaming.

In 1998, we developed one of the first video object search systems, VideoQ, which supported automated spatio-temporal indexing at the object region level. The corresponding paper received the IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award in 2000. The automated object region indexing techniques also formed the basis for our two highly cited image search engines, VisualSEEk and WebSEEk.

More recently, we have been developing large-scale analytics models for detecting a large number of visual concepts contained in images and videos. We made a systematic effort to define a Large-Scale Concept Ontology for Multimedia (LSCOM), which defined more than 1000 visual concepts that were considered useful, observable, and detectable. We jointly led an effort to evaluate the validity and utility of this ontology, and to carefully annotate about 450 concept categories over 60,000 video shots (download site). In addition, we have applied advanced feature extraction and machine learning methods to develop 374 automatic concept detection models (CU-VIREO 374), which have been broadly used (download site). We have participated in the TRECVID international video retrieval benchmarking event and achieved top performance in the past several years.

One of the major challenges in indexing large video collections is the need to process a vast amount of video data in real time or even faster. By exploiting the compressed formats of video data, we have developed novel and efficient algorithms for processing coded videos in the compressed domain without full decoding [link]. We demonstrated an order-of-magnitude speedup in various video manipulation tasks, and deployed the first Web-based compressed video editing engine, CVEPS, which made real-time video editing possible using thin clients over the Internet without heavy-weight servers.

Another challenge in video search and communication is to condense long videos into compact summaries that meet real-world communication constraints, such as hardware capability, network bandwidth, and user preference. We have developed a unifying optimization framework based on video production theories and psychological models. Our solution extends classical rate-distortion theory and adds analytical computational models of resources, subjective utilities, and adaptation operations into a joint optimization framework. Our approach has facilitated the development of innovative applications, including a real-time sports video highlight system that has been licensed to several companies. The related paper won a Best Student Paper Award at the 10th ACM Multimedia Conference in 2002.

We have extended the content-based approaches to other emerging applications, such as media forensics. In the era of pervasive video production and editing, the trustworthiness of image and video content can no longer be taken for granted. In applications such as forensics and criminal investigation, commonly encountered problems include determining whether an image has been tampered with and, in cases of dispute, identifying the true source of a video. In our 2001 TCSVT paper, we proposed a novel semi-fragile image authentication technique that combined content-based invariant features with cryptographic techniques to distinguish malicious attacks from benign manipulations of images (e.g., compression). This work reflects an important theme of our research: incorporating automated visual content analysis into real-world applications. Recently, we launched a major NSF-funded effort called TrustFoto, in which physics-based features and natural image statistics are used to distinguish natural photographs from computer graphics, and to detect artifacts and inconsistencies caused by image tampering. We have deployed an online demo for distinguishing photographic images from computer graphics.

My group has led Columbia's ADVENT University-Industry Research Consortium, and actively participated in Columbia's Digital Library, NIST TRECVID video retrieval evaluation, and MPEG-7 and MPEG-21 international standards.

For more information on our projects, please visit the website of the Digital Video and Multimedia Lab.