Keynote Speech at International
Workshop on Image Analysis for Multimedia Interactive Services, Santorini,
Greece, June 2007
[slide
6MB]
With the widespread success of Internet search, researchers are facing new
opportunities and challenges: developing next-generation multimedia
search technologies that may reach a performance level similar to that of
text search. Despite the grand scale of the challenges, some promising directions
have emerged recently. In this talk, I will focus on two exciting
areas – semantic annotation and multimedia document ranking. For the
former, we are witnessing significant developments in large-scale image/video
collections for benchmarking, multimedia lexicons, and large numbers of
semantic classifiers. For example, we have developed classifiers for 374
semantic concepts with encouraging performance using more than 160 hours
of digital videos in TRECVID 2006. The collective power of a large number
of semantic models offers great potential – I will share recent results
of this approach in video searching, topic threading, and temporal pattern
mining. For the second area, I will present recent efforts to model video
retrieval as a document ranking problem, similar to page ranking
in web search. I will discuss how the semantic models and visual duplicate
information may be used to approximate the information required in constructing
document link graphs. I will conclude the talk with discussions of open
issues in this dynamic and exciting area.
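The ranking idea in the abstract above can be illustrated with a small sketch. It assumes that pairwise affinities between videos (for example, derived from visual duplicate detection or semantic concept scores) are already available as a matrix, and it applies a generic PageRank-style random walk to them. The function name, the damping value, and the toy affinity matrix are all hypothetical; this is not the method presented in the talk.

```python
def rank_by_random_walk(similarity, damping=0.85, iterations=50):
    """Score items with a PageRank-style random walk over a similarity graph.

    similarity[i][j] is an assumed non-negative affinity between videos i and j,
    standing in for the link information that web pages provide explicitly.
    """
    n = len(similarity)
    # Row-normalize affinities into transition probabilities; a row with no
    # affinities at all jumps uniformly to every item.
    transitions = []
    for row in similarity:
        total = sum(row)
        transitions.append([w / total for w in row] if total > 0 else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each item keeps a small uniform share and receives the rest from
        # the items that point to it.
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transitions[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Hypothetical affinities: videos 0, 1, and 2 are mutual near-duplicates,
# and video 3 is linked only to video 2.
toy_similarity = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_scores = rank_by_random_walk(toy_similarity)
```

In this toy graph the best-connected video (index 2) receives the highest score and the weakly connected one (index 3) the lowest.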
Keynote Speech at IEEE
Multimedia Signal Processing Workshop (MMSP), Shanghai, Oct. 2005
Keynote Speech at Conference on Computer Vision and Graphic Image Processing
(CVGIP), Taipei, 2005 [slide
3MB]
With the significant progress made in information analysis in text, audio,
and video and the recent availability of large-scale benchmarking events,
new opportunities emerge in developing and testing novel frameworks for
integrating multi-modal information to solve many challenging problems,
such as automatic annotation, story segmentation, multi-modal retrieval,
and topic clustering. In this talk, I discuss the opportunities, state
of the art, and open research issues in using multi-modal integration in
video indexing. In addition, I discuss applications of some of the techniques
to another class of problems – video mining, namely, automatic discovery
of meaningful patterns in videos without expert domain knowledge or manual
supervision. Case studies showing promising performance will be described,
primarily in the broadcast news video domain.
Invited paper at ICASSP
March 2005, Philadelphia. [slide]
Joint paper with R. Manmatha and Tat-Seng Chua.
I gave an overview of recent approaches and progress in fusing multi-modal
features for solving different problems in video indexing, such as story
segmentation, topic detection and tracking, multi-modal retrieval, and automatic
annotation. I reviewed the use of common mathematical models such as maximum
entropy, probabilistic clustering, query-dependent retrieval, and cross-media
relevance models. In addition, I discussed the new issues arising from the
unique problems in dealing with heterogeneous, high-dimensional multimedia
features.
[slide] [project web site]
Overview of our new project aiming at blind detection of digital tampering of photographs, without using any watermark or digital signature. Our approach combines statistical modeling of natural image signals, camera filters and transfer functions, and 3D scene-lighting-reflectance modeling. We have developed algorithms and software for detecting block-level splicing and computer graphics generated images. We have also constructed benchmark data sets that can be used by researchers for evaluating performance.
Keynote Speech at International
Conference on Image and Video Retrieval, Dublin, Ireland, July 2004. [slide]
In this talk, I advocated a new research direction of unsupervised mining
of patterns from large collections of videos. Patterns are recurrent entities
that exhibit consistent spatio-temporal structures and attributes. In particular,
I focused on continuous, stochastic temporal video patterns that may correspond
to useful semantics such as play/break events in sports, production patterns
in news broadcasts, recurrent human activities in surveillance video, or
topical news threads across multiple news channels. I presented our recent
results in using hierarchical HMMs to discover temporal patterns at multiple
levels from sports and news video. I also presented several fusion methods
using the news transcripts (through ASR or closed captions) to automatically
discover the meanings of the discovered temporal patterns (tokens).
Keynote Speech at International Conference on Image Analysis and Processing (ICIAP), Mantova, Italy, Sept. 2003. [slide]
In this talk, I used ubiquitous media access scenarios to advocate exploration of content analysis techniques for new applications such as adaptive video presentation, streaming, and transcoding. I presented the need and some solutions for real-time event detection and unsupervised pattern discovery in new domains. I also described our recent results in real-time content-based utility function prediction for automatically selecting the optimal MPEG-4 transcoding options.
I presented our recent work on unsupervised discovery of temporal patterns in video sequences. Specifically, we used hierarchical HMMs to model the multi-level events in video. We investigated the difficult issues of parameter estimation, model adaptation, and feature selection. Experiments on soccer and baseball videos showed promising results, discovering the play/break structures automatically with accuracy comparable to that of supervised approaches.
This talk includes an overview of our research in video indexing, structure discovery, skimming, and their applications. Specific topics include
I presented the utility-based optimization framework to model the relationships between different types of resources, utilities, and adaptation operations. The framework is analogous to the conventional rate-distortion model, in which rate (resource), distortion (utility), and coding parameters (adaptation operators) are considered. I described how the framework can be used to help derive optimal solutions for video skimming and transcoding operator selection.
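The resource/utility/operator relationship described above can be pictured with a minimal sketch. It assumes each candidate adaptation operator comes with a known resource cost and a predicted utility, and it simply picks the highest-utility operator that fits a resource budget. The class, function, option names, and numbers are hypothetical illustrations, not the framework from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AdaptationOption:
    name: str        # hypothetical label for an adaptation operator
    resource: float  # resource needed, e.g. bit rate in kbps (the "rate")
    utility: float   # predicted utility, e.g. a quality score (the inverse of "distortion")


def select_operator(options: Sequence[AdaptationOption],
                    budget: float) -> Optional[AdaptationOption]:
    """Return the highest-utility option whose resource need fits the budget."""
    feasible = [option for option in options if option.resource <= budget]
    if not feasible:
        return None  # no operator satisfies the resource constraint
    return max(feasible, key=lambda option: option.utility)


# Made-up candidates for illustration only.
candidates = [
    AdaptationOption("full frame rate", resource=800.0, utility=0.95),
    AdaptationOption("drop B-frames", resource=500.0, utility=0.80),
    AdaptationOption("key frames only", resource=150.0, utility=0.40),
]
chosen = select_operator(candidates, budget=600.0)
```

With a budget of 600, the sketch selects "drop B-frames", the best option that still fits. A fuller treatment would also have to predict the utility values, which this example takes as given.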
In this panel presentation, I attempted to provide a list of criteria for assessing the potential impact of any automatic content-based analysis technique. I tried to justify the adopted criteria, and applied them to some application systems we are developing in medical, sports, and film domains.
The paper I wrote for the vision column of IEEE Multimedia Magazine, "The Holy Grail of Content-Based Media Analysis," may be a good companion reading to the above presentation.
We proposed a new paradigm called content-adaptive video streaming. The focus is to explore the synergy between content analysis, video coding, and transmission. We presented a real-time system that dynamically changes the resource allocation (e.g., bit rate) within a live video streaming session, according to the "importance" of each video segment. The segment importance is defined based on high-level events detectable in specific domains, such as pitching and scoring in sports.
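As a rough illustration of importance-driven allocation, the sketch below splits a total bit-rate budget across a window of segments in proportion to their importance scores, with an optional minimum rate per segment. The proportional rule, the function name, and the numbers are assumptions made for this example. The abstract does not specify how the real-time system computes its allocation.

```python
def allocate_bit_rates(importances, total_rate, min_rate=0.0):
    """Split total_rate across segments in proportion to importance.

    importances: assumed non-negative importance score per segment.
    min_rate: floor given to every segment before the rest is shared out.
    """
    count = len(importances)
    if count == 0:
        return []
    spare = total_rate - min_rate * count
    if spare < 0:
        raise ValueError("total_rate is too small to give every segment min_rate")
    total_importance = sum(importances)
    if total_importance == 0:
        # No segment stands out, so share the budget evenly.
        return [total_rate / count] * count
    return [min_rate + spare * weight / total_importance for weight in importances]


# Hypothetical window: a low-importance segment followed by a scoring event.
rates = allocate_bit_rates([1.0, 3.0], total_rate=400.0, min_rate=50.0)
# rates == [125.0, 275.0]
```

A live system would have to work with importance estimates as segments arrive rather than over a known window, so this is only a simplified offline view.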