Ching-Yung Lin -- Projects


[ This page is under construction!! ]


PART I: REAL-WORLD MULTIMODALITY LEARNING MACHINES

 
We are investigating the theory, algorithm, and system issues involved in constructing automatic cognitive learning machines. With the recent success of machine learning algorithms, much traditional thinking about system design can be re-examined through machine learning approaches. Within a machine learning infrastructure, system designers no longer assign rules by hand; instead, they design algorithms that allow systems to learn to solve problems on their own. For instance, this approach has significantly increased the number of concepts that a machine can understand. Previously, researchers worked for decades to model a few concepts, e.g., face, car, people. With machine learning approaches, however, the number of concept detectors has grown into the hundreds, with accuracy competitive with or better than prior methods. To further increase machines' problem-solving capability, we propose to create a new kind of learning system. These learning machines automatically capture data from multimodal sensing sources (audio, visual, text, etc.), execute recognition, and then learn to infer more complicated concepts and knowledge. The learning system can also leverage existing knowledge bases, e.g., electronic dictionaries, to extend and connect concepts.
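As a rough illustration of the idea described above, the Python sketch below fuses per-modality concept scores and then extends the detected concepts through a lookup table standing in for a knowledge base. The concept names, scores, thresholds, and fusion/extension rules are all hypothetical; this is only a sketch of the pipeline, not the actual system.

# Illustrative sketch only: a toy version of the multimodal learning pipeline
# described above. Detector outputs and the knowledge base are hypothetical.

from dataclasses import dataclass, field


@dataclass
class Observation:
    """Concept scores produced by per-modality recognizers for one time window."""
    audio: dict = field(default_factory=dict)   # e.g. {"speech": 0.9}
    visual: dict = field(default_factory=dict)  # e.g. {"face": 0.8, "car": 0.1}
    text: dict = field(default_factory=dict)    # e.g. {"politics": 0.7}


def infer_higher_level_concepts(obs, knowledge_base):
    """Combine per-modality concept scores, then extend them with related
    concepts looked up in a knowledge base (e.g. an electronic dictionary)."""
    combined = {}
    for modality_scores in (obs.audio, obs.visual, obs.text):
        for concept, score in modality_scores.items():
            combined[concept] = max(combined.get(concept, 0.0), score)

    # Extend: any concept detected with enough confidence pulls in its
    # related concepts from the knowledge base at a discounted score.
    extended = dict(combined)
    for concept, score in combined.items():
        if score >= 0.5:
            for related in knowledge_base.get(concept, []):
                extended[related] = max(extended.get(related, 0.0), 0.5 * score)
    return extended


if __name__ == "__main__":
    kb = {"face": ["person"], "speech": ["person"], "car": ["vehicle"]}
    obs = Observation(audio={"speech": 0.9}, visual={"face": 0.8, "car": 0.2})
    print(infer_higher_level_concepts(obs, kb))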


(1) Autonomous Learning   (with Xiaodan Song and Ming-Ting Sun)


(2) Smart Semantic Video Camera  (with Victor Sutan and Jason Cardillo)


(3) Imperfect and Continuous Learning (with Xiaodan Song, Navneet Panda, and Gang Wu)


(4) Multimedia Semantic Analysis


Multimedia Semantic Concept Analysis

 My research objectives are:

(1) Object Detection - Robust and accurate detection, localization, and counting of specific objects;
(2) Object Recognition - Determine the specific instance of an object class (e.g., a particular person);
(3) Event Understanding - Form inferences from occurrences or recurrences of activity;
(4) Multi-Modal Fusion - Combine multiple sources to maximize the salient information that can be extracted from the video;
(5) Video Query by Example - Retrieve information through database inquiry using a video sequence or content descriptors (a sketch follows this list);
(6) Video Summary - Methods to reduce the information representation and to perform scenario-based activity summarization;
(7) Multi-Modal Video Mining - Automatically discover trends, patterns, and associations in video;
(8) Object Tracking - Determine the path of a known object within a video sequence;
(9) Motion Analysis - Quantify the movement of objects or phenomena in a video sequence; and
(10) Kinematics Analysis - Identify an object or phenomenon by its motion.
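As a rough illustration of objective (5), the sketch below treats query-by-example as cosine similarity between content descriptors, ranking indexed shots against a query example. The descriptor vectors and shot identifiers are made up for illustration; real systems would use richer features and proper indexing.

# Illustrative sketch only: query-by-example via cosine similarity between
# content descriptors. Shot names and feature values are hypothetical.

import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_by_example(query_descriptor, shot_index, top_k=3):
    """Rank indexed shots by descriptor similarity to the query example."""
    ranked = sorted(
        shot_index.items(),
        key=lambda item: cosine_similarity(query_descriptor, item[1]),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    index = {
        "shot_001": [0.9, 0.1, 0.0],
        "shot_002": [0.2, 0.8, 0.1],
        "shot_003": [0.8, 0.2, 0.1],
    }
    print(query_by_example([1.0, 0.0, 0.0], index, top_k=2))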

 Our objectives were to address the challenging problem of fully automatic indexing and retrieval of unstructured video content, to engage the research and industry communities in establishing a benchmark for video content retrieval, to participate in that benchmark and leverage it to advance video content retrieval technology, and to establish IBM Research as a thought leader in multimedia indexing and semantic understanding.

 Our effort resulted in the following accomplishments. First, we helped form the TREC video retrieval benchmark and its tasks, and we have participated in the benchmark since its establishment in 2001. We took the leadership role in establishing the "concept detection" task within the TREC video retrieval benchmark: IBM proposed the idea to NIST in November 2001 and followed through by leading the effort to design the benchmark and test methodology and to choose the concepts for detection. We also led the establishment of the "MPEG-7 concept/transcript/shot exchange" task of the TREC-2002 benchmark, with the goal of accelerating the pace of technological advancement by allowing different participants to focus on different aspects of the multimedia indexing problem. In 2003, we initiated and organized a collaborative video annotation forum, in which we worked jointly with colleagues in 23 groups to build ground-truth labels on 62 hours of video. Nearly 500K labels (after hierarchical propagation) have been annotated on 45K shots. These ground-truth labels have been widely used for video semantic concept training and system evaluation.
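As a rough illustration of the hierarchical propagation mentioned above, the sketch below assumes a concept hierarchy in which a positive label on a concept also implies positive labels on its ancestors. The hierarchy and shot labels here are hypothetical; the actual annotation forum ontology and data are not reproduced.

# Illustrative sketch only: propagate each annotated concept up a hypothetical
# concept hierarchy so that ancestor concepts are also labeled positive.


def propagate_labels(shot_labels, parent_of):
    """For each shot, add every ancestor of each annotated concept."""
    propagated = {}
    for shot, concepts in shot_labels.items():
        expanded = set()
        for concept in concepts:
            node = concept
            while node is not None:
                expanded.add(node)
                node = parent_of.get(node)
        propagated[shot] = expanded
    return propagated


if __name__ == "__main__":
    parent_of = {"anchor_person": "person", "person": "people_related", "car": "vehicle"}
    labels = {"shot_17": {"anchor_person"}, "shot_42": {"car"}}
    print(propagate_labels(labels, parent_of))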


(Collaborators: Belle L. Tseng, Milind Naphade, Apostol Natsev, John R. Smith)




Last Updated: 01/24/2006