AUDIO SCENE SEGMENTATION USING MULTIPLE FEATURES, MODELS AND TIME SCALES
Hari Sundaram, Shih-Fu Chang
Dept. of Electrical Engineering, Columbia University,
New York, New York 10027.
Email: {sundaram, sfchang}@ctr.columbia.edu



ABSTRACT
In this paper we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment characterized by a few dominant sources of sound; a scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) a definition of an audio scene, (b) multiple feature models that characterize the dominant sources, and (c) a simple, causal listener model that mimics human audition using multiple time scales. We define a correlation function that measures the correlation of current data with past data and use it to locate segmentation boundaries. The algorithm was tested on a difficult data set: a one-hour audio segment of a film. It achieves an audio scene change detection accuracy of 97%.
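The abstract does not give the paper's exact formulation, but the causal listener model it describes can be sketched as follows: a short "attention" window of recent feature vectors is correlated against a longer "memory" window of past data, and a scene boundary is declared where that correlation drops. All function names, window lengths, and the threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def detect_scene_changes(features, memory_len=30, attention_len=3, threshold=0.3):
    """Causal scene-change sketch (illustrative, not the paper's exact method).

    features: (T, D) array, one D-dimensional feature vector per frame.
    Compares the mean of the recent 'attention' window against the mean
    of the preceding 'memory' window; low correlation suggests that the
    dominant sources have changed.
    """
    boundaries = []
    for t in range(memory_len + attention_len, len(features)):
        past = features[t - memory_len - attention_len : t - attention_len]
        recent = features[t - attention_len : t]
        # Pearson correlation between the two windows' mean feature vectors
        c = np.corrcoef(past.mean(axis=0), recent.mean(axis=0))[0, 1]
        if c < threshold:
            boundaries.append(t)
    return boundaries
```

In practice, runs of consecutive low-correlation frames would be merged into a single reported boundary; the sketch omits that step for brevity.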