Audio-Visual Scene Segmentation
Summary
In this project, we develop novel
algorithms for computing scenes and within-scene structures in digital video,
experimenting with film content. We integrate insights from film-making rules
and experimental results from the psychology of audition into a computational
scene model. We define a computable scene to be a segment of audio-visual
data that exhibits long-term consistency with regard to three properties:
(a) chromaticity, (b) lighting, and (c) ambient sound. Central to the computational
model is the notion of a causal, finite-memory viewer model: in both audio
and video, we measure how strongly the most recent data in the memory
correlates with the rest of the buffer. Synchronization and complementary relations
between audio and visual scene boundaries allow us to define different types
of audio-visual scenes.
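
The memory model can be made concrete with a short sketch. The Python code below is an illustrative implementation under our own assumptions: it takes per-frame feature vectors (e.g. chromaticity and lighting statistics), keeps a finite causal buffer, and declares a boundary when the most recent frames decorrelate from the older buffer contents. The buffer sizes, the cosine-similarity coherence measure, and the threshold are placeholders, not the parameters actually used in the project.

```python
import numpy as np

def scene_boundaries(features, memory=60, attention=12, threshold=0.3):
    """Causal, finite-memory scene-boundary detector (illustrative sketch).

    features  : (T, d) array of per-frame descriptors, e.g. mean
                chromaticity and lighting statistics.
    memory    : total number of frames the "viewer" retains.
    attention : number of most recent frames compared against the
                older contents of the memory.
    """
    boundaries = []
    for t in range(memory, len(features) + 1):
        buf = features[t - memory:t]            # finite memory buffer
        recent = buf[-attention:].mean(axis=0)  # most recent data
        past = buf[:-attention].mean(axis=0)    # older memory contents
        # Cosine similarity as the coherence measure (our assumption;
        # the project's actual correlation measure may differ).
        denom = np.linalg.norm(recent) * np.linalg.norm(past)
        coherence = recent @ past / denom if denom else 1.0
        if coherence < threshold:   # recent data has decorrelated from
            boundaries.append(t - 1)  # the past: declare a boundary
    return boundaries
```

The same detector runs independently on audio and video feature streams; comparing where their boundaries coincide or diverge yields the different audio-visual scene types.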
In addition, we detect syntactic structures such as dialogues
in films by analyzing the statistics of a periodic analysis transform of the shot
sequence. Tests on five films show the following results: scene boundary
detection, 88% recall and 72% precision; dialogue detection, 91% recall
and 100% precision.
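
For intuition, the sketch below flags dialogue-like runs by directly testing the shot sequence for period-two alternation (the A-B-A-B pattern typical of shot/reverse-shot dialogue coverage). This is a simplified stand-in for the statistical periodic-analysis test used in the project; `shot_labels`, the window size, and the match ratio are all illustrative assumptions.

```python
def find_dialogs(shot_labels, window=6, min_ratio=0.9):
    """Flag stretches of the shot sequence whose labels alternate with
    period two (A-B-A-B), the visual signature of a dialogue scene.

    shot_labels : list of cluster ids, one per shot (assumed to come
                  from grouping visually similar shots beforehand).
    Returns (start, end) shot-index ranges that look like dialogues.
    """
    dialogs = []
    i = 0
    while i + window <= len(shot_labels):
        win = shot_labels[i:i + window]
        # Fraction of positions that match the label two shots earlier.
        matches = sum(win[k] == win[k - 2] for k in range(2, window))
        if matches / (window - 2) >= min_ratio:
            # Extend the run for as long as the alternation continues.
            j = i + window
            while j < len(shot_labels) and shot_labels[j] == shot_labels[j - 2]:
                j += 1
            dialogs.append((i, j - 1))
            i = j
        else:
            i += 1
    return dialogs
```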