Audio-Visual Scene Segmentation
Summary
In this project, we develop novel
algorithms for computing scenes and within-scene structures in digital video,
experimenting with film content. We integrate insights from film-making rules
and experimental results from the psychology of audition into a computational
scene model. We define a computable scene to be a segment of audio-visual
data that exhibits long-term consistency with regard to three properties:
(a) chromaticity, (b) lighting, and (c) ambient sound. Central to the computational
model is the notion of a causal, finite-memory viewer model: in both audio
and video, we measure how strongly the most recent data in the memory
correlates with the rest of the buffer. Synchronization and complementary relations
between audio and visual scene boundaries allow us to define different types
of audio-visual scenes.
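
The memory model can be made concrete with a short sketch. The Python code below is an illustrative implementation under our own assumptions: it takes per-frame feature vectors (e.g. chromaticity and lighting statistics), keeps a finite causal buffer, and declares a boundary when the most recent frames decorrelate from the older buffer contents. The buffer sizes, the cosine-similarity coherence measure, and the threshold are placeholders, not the parameters actually used in the project.

```python
import numpy as np

def scene_boundaries(features, memory=60, attention=12, threshold=0.3):
    """Causal, finite-memory scene-boundary detector (illustrative sketch).

    features  : (T, d) array of per-frame descriptors, e.g. mean
                chromaticity and lighting statistics.
    memory    : total number of frames the "viewer" retains.
    attention : number of most recent frames compared against the
                older contents of the memory.
    """
    boundaries = []
    for t in range(memory, len(features) + 1):
        buf = features[t - memory:t]            # finite memory buffer
        recent = buf[-attention:].mean(axis=0)  # most recent data
        past = buf[:-attention].mean(axis=0)    # older memory contents
        # Cosine similarity as the coherence measure (our assumption;
        # the project's actual correlation measure may differ).
        denom = np.linalg.norm(recent) * np.linalg.norm(past)
        coherence = recent @ past / denom if denom else 1.0
        if coherence < threshold:   # recent data has decorrelated from
            boundaries.append(t - 1)  # the past: declare a boundary
    return boundaries
```

The same detector runs independently on audio and video feature streams; comparing where their boundaries coincide or diverge yields the different audio-visual scene types.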
In addition, we detect syntactic structures such as dialogues
in films by analyzing the statistics of a periodic analysis transform of the shot
sequence. Tests on five films show the following results: scene boundary
detection, 88% recall and 72% precision; dialogue detection, 91% recall
and 100% precision.
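
For intuition, the sketch below flags dialogue-like runs by directly testing the shot sequence for period-two alternation (the A-B-A-B pattern typical of shot/reverse-shot dialogue coverage). This is a simplified stand-in for the statistical periodic-analysis test used in the project; `shot_labels`, the window size, and the match ratio are all illustrative assumptions.

```python
def find_dialogs(shot_labels, window=6, min_ratio=0.9):
    """Flag stretches of the shot sequence whose labels alternate with
    period two (A-B-A-B), the visual signature of a dialogue scene.

    shot_labels : list of cluster ids, one per shot (assumed to come
                  from grouping visually similar shots beforehand).
    Returns (start, end) shot-index ranges that look like dialogues.
    """
    dialogs = []
    i = 0
    while i + window <= len(shot_labels):
        win = shot_labels[i:i + window]
        # Fraction of positions that match the label two shots earlier.
        matches = sum(win[k] == win[k - 2] for k in range(2, window))
        if matches / (window - 2) >= min_ratio:
            # Extend the run for as long as the alternation continues.
            j = i + window
            while j < len(shot_labels) and shot_labels[j] == shot_labels[j - 2]:
                j += 1
            dialogs.append((i, j - 1))
            i = j
        else:
            i += 1
    return dialogs
```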