Shih-Fu Chang: Finding a Visual Needle in the Digital Haystack

(Photo by Jeffrey Schifman)

Research indicates that we upload and share as many as 1.8 billion pictures each day on the Internet—everything from photos of our kids, vacations, and meals to important events and natural phenomena. And those are just the tip of the digital image iceberg. Scientists are sharing images too—photos that capture how cells divide, videos of how the most minute particles interact, and satellite images of new planets in different solar systems. Then there are the surveillance and assistive systems around the world that upload photos and videos every minute of every day, and the media networks that release video and photos of breaking news.

The problem with all those images is apparent when you try to search for one. Right now, most image search engines rely on keywords, or descriptive text, linked to a photo or video. That’s an unreliable way to find a specific image, since such tags are often missing, wrong, or too generic.

Lucky for the world, Shih-Fu Chang, Richard Dicker Professor of Electrical Engineering, professor of computer science, and director of the Digital Video and Multimedia (DVMM) Lab, began thinking about that problem more than 20 years ago.

“I have long been fascinated by how images can be used to help people communicate ideas, sense the physical environment, or even express personal emotions,” explains Chang. “My goal has been to develop intelligent systems that can extract useful information from vast amounts of visual data and use that information in innovative ways to address grand challenges.”

A treemap visualization of part of the Visual Sentiment Ontology (VSO) associated with the “joy” emotion and the visual concepts related to different adjectives. You can see the popularity, detection accuracy, and sentiment (indicated by the color of the cell) of each Adjective Noun Pair (ANP)—for example, “Beautiful Clouds,” shown in the picture. (Images courtesy of Shih-Fu Chang)

Chang, who is ranked by Microsoft Academic Search as the most influential researcher in the field of multimedia, leverages machine learning, computer vision, multimedia analytics, and video processing to develop intelligent visual search solutions for society and industry. His contributions include groundbreaking search paradigms and prototype tools that allow users to find content with similar visual attributes, to search videos using a very large pool of visual concept classifiers, and to summarize the event patterns and anomalies found in a large array of video sources.
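At its core, searching by visual attributes rather than keywords means comparing feature vectors extracted from images and returning the nearest matches. The toy sketch below illustrates that idea with cosine similarity over random vectors standing in for image descriptors; the function name and data are illustrative assumptions, not taken from Chang’s actual systems.

```python
import numpy as np

def cosine_search(query_vec, index_vecs, top_k=3):
    """Return indices of the top_k index vectors most similar to the query,
    ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity to every indexed image
    return np.argsort(-sims)[:top_k]  # best matches first

# Toy "image descriptors": 100 images, 64-dimensional features.
rng = np.random.default_rng(0)
index = rng.normal(size=(100, 64))
# A query that is a slightly perturbed copy of image 42 (a near-duplicate).
query = index[42] + 0.05 * rng.normal(size=64)

top = cosine_search(query, index)
print(top[0])  # the near-duplicate, image 42, ranks first
```

In a real system the descriptors would come from a learned model rather than random numbers, and the brute-force comparison would be replaced by an approximate nearest-neighbor index to scale to billions of images.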

But the deeper Chang goes in developing visual search capability, the more opportunities he finds to push the science further. Currently, he is focused on developing practical applications of multimedia information extraction that help machines maintain situation awareness of events portrayed in open-source videos. His objective is to enable better decision making in complex, dynamic situations such as international conflicts, socioeconomic and sociopolitical movements, and emergency services.

“In order to reach these goals, we are leveraging theories and tools culled from many fields including machine learning, computer vision, signal processing, and natural language processing, for which we have been privileged to enjoy very fruitful collaborations across departments and schools at Columbia,” he says.

One of those collaborations is with Peter Allen, professor of computer science, to further user-machine interaction and enhance assistive technology.

“Collaboration with Peter Allen’s group is a very natural outcome since the intelligent capabilities of visual search emerge not only in the field of information extraction, but also in all other areas dealing with human-machine interaction,” he notes.

For their collaboration, Chang and Allen are investigating one of the stickier problems in robotics and computer vision: identifying and manipulating deformable objects like clothing or food. Imagine an intelligent assistive robot that can identify a pair of jeans in any shape it may be in—such as hanging on a hanger or lying in a pile of clothes—and figure out how to correctly fold it. This will be of tremendous value for improving productivity in the textile manufacturing industry, or even for advancing personalized robotic care, which may one day be common in our rapidly aging society.

“Together, we are combining the intelligent visual recognition techniques developed by my group with intelligent robotics control and planning systems developed by Professor Allen’s group,” he adds. “We are working on intelligent machines that can resolve the ambiguity of visual appearance, and enhancing robotic performance with regard to recognition and control tasks.”

While a picture may still be worth a thousand words, the possibility of developing breakthroughs in visual information processing across the gamut of scientific and practical domains inspires infinite possibilities for Chang.

“With the broad strength we have in all engineering disciplines and the vibrant collaboration culture at Columbia, I am confident that we will be able to continue leading the way and make fundamental contributions in this space,” he says.


—by Amy Biemiller


©2014 Columbia University