Prof. Shih-Fu Chang: Research Activities


My research areas include content-based image and video search, video content classification, video adaptation, and media forensics. Our work involves novel formulations and applications of techniques drawn from image processing, computer vision, and machine learning, as well as related fields such as information retrieval, databases, and information theory.

Our ultimate goal is to develop next-generation search engines for digital images and videos. With my students and co-workers, I have made contributions in several key areas and developed several well-known systems for image and video search. For this work, we were fortunate to receive the 2009 IEEE Kiyo Tomiyasu Technical Field Award for "pioneer contributions to automatic image search and classification."

In 1998, we developed one of the first video object search systems, VideoQ, which supported automated spatio-temporal indexing at the level of object regions. Instead of only typing keywords, users could specify search criteria as the spatio-temporal composition of visual regions using interactive drawing tools. The corresponding paper won the IEEE Transactions on Circuits and Systems for Video Technology Best Paper Award in 2000. The automated object-region indexing techniques also formed the basis of our two well-known image search engines, VisualSEEk and WebSEEk.

Recently, through collaboration with partners at other universities and in industry, we have been advocating new representation paradigms for video search. We proposed developing a large number of analytic models for detecting the visual concepts contained in images and videos. Numbering in the hundreds to thousands, such models provide a basic visual language for describing the generic concepts of objects, people, scenes, events, and domain-related production syntax in images and videos. We made a systematic effort to define a Large-Scale Concept Ontology for Multimedia (LSCOM), which includes more than 1,000 visual concepts considered useful, observable, and detectable. We led a team of researchers and students in evaluating the validity and utility of this ontology and in carefully annotating about 450 concept categories over 60,000 video shots. The resulting ontology and annotated video corpus are available for download. In addition, we applied advanced feature extraction and machine learning methods to develop 374 automatic concept detection models (CU-VIREO 374), which are also available for download.

One of the major challenges in indexing large video collections is the need to process vast amounts of video data in real time or even faster. By exploiting the structure of compressed video formats, we have developed novel, efficient algorithms that process coded video directly in the compressed domain, without full decoding [ref]. We demonstrated order-of-magnitude speedups in various video manipulation tasks and deployed the first Web-based compressed video editing engine, CVEPS. Our compressed-domain video processing technologies made real-time video editing possible from remote thin clients over the Internet, without requiring heavyweight computers.
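Many simple editing operations can be carried out in the compressed domain because the block DCT used by JPEG- and MPEG-style codecs is linear. The Python sketch below illustrates the idea on a single 8x8 block: a cross-dissolve and a uniform brightness change are applied directly to DCT coefficients, with no inverse transform. It is a simplified illustration under stated assumptions, not the CVEPS algorithms: it assumes the coefficients have already been entropy-decoded and dequantized, it ignores motion compensation, and the function names (dct2, dissolve_dct, brighten_dct) are hypothetical.

```python
import math

N = 8  # block size used by JPEG/MPEG-style codecs


def _c(k):
    # Orthonormal DCT-II scale factor.
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N pixel block (list of lists)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _c(u) * _c(v) * s
    return out


def dissolve_dct(coeffs_a, coeffs_b, alpha):
    """Cross-dissolve alpha*A + (1 - alpha)*B computed directly on DCT coefficients.

    Valid because the DCT is linear, so no inverse transform is needed.
    """
    return [[alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(coeffs_a, coeffs_b)]


def brighten_dct(coeffs, delta):
    """Add `delta` to every pixel by changing only the DC coefficient.

    With the orthonormal DCT, DC = (1/N) * (sum of pixels), so a uniform shift
    of `delta` over N*N pixels raises DC by N * delta; AC terms are unchanged.
    """
    out = [row[:] for row in coeffs]
    out[0][0] += N * delta
    return out
```

Computing dct2 of a pixel-domain blend gives the same coefficients, up to floating-point error, as dissolve_dct applied to the two blocks' coefficients; that equivalence is what lets the decoder skip the inverse transform for this kind of operation.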

Another challenge in video indexing is finding effective ways to condense long videos into compact summaries that satisfy real-world constraints such as hardware capability, network bandwidth, and user preference. We have developed a unifying framework, based on domain-relevant production theories and models of human psychology, for solving open problems in video adaptation and summarization. Our solution extends classical rate-distortion theory by incorporating analytical computational models of resources, subjective utilities, and adaptation operations into a joint optimization framework. The approach has enabled innovative practical applications, including a real-time sports video highlight generation system developed in 2001 and licensed to several industry groups. Our publication on video skimming won a Best Student Paper Award at the 10th ACM Multimedia Conference in 2002.
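As a toy illustration of resource-constrained utility maximization, the Python sketch below gives each video segment a few hypothetical adaptation options (drop, low quality, full quality), each with an integer resource cost and a utility score, and chooses one option per segment to maximize total utility within a budget. This is a multiple-choice knapsack solved by dynamic programming, chosen here only for simplicity; it is not the group's published framework, and the function name best_adaptation and all costs and utilities are made-up placeholders.

```python
from typing import List, Tuple

Option = Tuple[int, float]  # (resource cost in integer units, utility score)


def best_adaptation(segments: List[List[Option]], budget: int) -> Tuple[float, List[int]]:
    """Choose one option per segment to maximize total utility with total cost <= budget.

    Multiple-choice knapsack solved by dynamic programming over the integer budget.
    """
    NEG = float("-inf")
    best = [NEG] * (budget + 1)  # best[b]: max utility using exactly cost b so far
    best[0] = 0.0
    trace = []  # per segment, back[b] = (chosen option, cost before this segment)
    for options in segments:
        new = [NEG] * (budget + 1)
        back = [(-1, -1)] * (budget + 1)
        for b, util_so_far in enumerate(best):
            if util_so_far == NEG:
                continue
            for j, (cost, util) in enumerate(options):
                nb = b + cost
                if nb <= budget and util_so_far + util > new[nb]:
                    new[nb] = util_so_far + util
                    back[nb] = (j, b)
        best = new
        trace.append(back)
    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == NEG:
        raise ValueError("no feasible combination of options within the budget")
    plan, b = [], end
    for back in reversed(trace):  # walk back from the last segment to the first
        j, b = back[b]
        plan.append(j)
    plan.reverse()
    return best[end], plan


# Hypothetical options per segment: (cost, utility) for drop / low quality / full quality.
segments = [
    [(0, 0.0), (2, 3.0), (4, 4.0)],  # e.g., a scoring play
    [(0, 0.0), (2, 0.5), (4, 0.8)],  # e.g., a routine play
    [(0, 0.0), (2, 2.0), (4, 2.5)],  # e.g., a replay
]
print(best_adaptation(segments, budget=6))  # (6.0, [2, 0, 1])
```

With a budget of 6 units, the sketch keeps the scoring play at full quality, drops the routine play, and keeps the replay at low quality. A real adaptation framework would use measured resource models and subjective utility models in place of these constants.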

We have extended our content-based approaches to other emerging applications, such as media forensics. In an era of pervasive video production and editing, the trustworthiness of image and video content should no longer be taken for granted. In forensics and criminal investigation, common problems include determining whether an image has been tampered with and, in cases of dispute, identifying the true source of a video. In our 2001 TCSVT paper, we proposed a novel semi-fragile image authentication technique that combines content-based invariant features with cryptographic techniques to distinguish malicious attacks from benign image manipulations (e.g., compression). This work reflects an important theme of our research: embedding automated visual content analysis into real-world applications. More recently, we launched a major NSF-funded effort called TrustFoto, which uses physics-based features and natural image statistics to distinguish natural photographs from computer graphics and to detect artifacts and inconsistencies caused by image tampering. We have also deployed an online demo for distinguishing photographic images from computer graphics.
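The semi-fragile idea can be sketched with one invariant of this kind. JPEG quantizes a given coefficient position with the same step in every block of a component, and rounding is monotonic, so if a coefficient in one block exceeds the co-located coefficient in another, compression can make them equal but cannot reverse their order. The Python sketch below records such order relations as feature bits, protects the bits with HMAC-SHA256 (used here as a stand-in for a public-key digital signature), and flags any reversed relation as tampering. It is a simplified illustration, not the scheme from the paper: the block pairing, coefficient positions, key, and function names are hypothetical, and the DCT itself is omitted.

```python
import hashlib
import hmac
import math

# Each block is a flat list of DCT coefficients (hypothetical layout).


def feature_bits(blocks, positions):
    """Bit = 1 if block 2k's coefficient exceeds block 2k+1's at that position, else 0."""
    bits = []
    for p, q in zip(blocks[0::2], blocks[1::2]):
        for v in positions:
            bits.append(1 if p[v] > q[v] else 0)
    return bits


def sign(bits, key):
    """Cryptographic tag over the feature bits (HMAC-SHA256 stand-in for a signature)."""
    return hmac.new(key, bytes(bits), hashlib.sha256).digest()


def quantize(blocks, step):
    """Simulate JPEG-style uniform quantization with the same step for every block."""
    return [[math.floor(c / step + 0.5) * step for c in block] for block in blocks]


def verify(blocks, positions, bits, tag, key):
    """Accept if the bits are authentic and no recorded order relation is reversed."""
    if not hmac.compare_digest(sign(bits, key), tag):
        return False  # the stored feature bits themselves were altered
    i = 0
    for p, q in zip(blocks[0::2], blocks[1::2]):
        for v in positions:
            # Quantization may turn '>' into '=', but a reversal signals tampering.
            if bits[i] == 1 and p[v] < q[v]:
                return False
            if bits[i] == 0 and p[v] > q[v]:
                return False
            i += 1
    return True


key = b"demo-key"  # hypothetical secret key
blocks = [[120.0, 13.0, -7.0], [95.0, 21.0, -3.0], [60.0, -4.0, 9.0], [64.0, -11.0, 2.0]]
positions = [0, 1, 2]
bits = feature_bits(blocks, positions)  # [1, 0, 0, 0, 1, 1]
tag = sign(bits, key)
print(verify(quantize(blocks, 10), positions, bits, tag, key))  # True: benign compression
edited = quantize(blocks, 10)
edited[2][0] = 90  # simulated local edit
print(verify(edited, positions, bits, tag, key))  # False: an order relation is reversed
```

In the example, quantization turns the pair 60 and 64 into 60 and 60; the check tolerates this tie, which is what makes the scheme robust to compression while still catching the simulated edit.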

My group has actively participated in Columbia's ADVENT University-Industry Research Consortium, Columbia's Persival Digital Library, the NIST TRECVID video retrieval evaluation, and the development of the MPEG-7 and MPEG-21 standards.

For more information about our projects, please visit the website of the Digital Video and Multimedia Lab.