
Hari Sundaram, Shih-Fu Chang. Audio Scene Segmentation using Multiple Models, Features and Time Scales. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, June 2000.

Download

Download paper: Adobe portable document (pdf)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Abstract

In this paper we present an algorithm for audio scene segmentation. An audio scene is a semantically consistent sound segment that is characterized by a few dominant sources of sound. A scene change occurs when a majority of the sources present in the data change. Our segmentation framework has three parts: (a) a definition of an audio scene, (b) multiple feature models that characterize the dominant sources, and (c) a simple, causal listener model, which mimics human audition using multiple time scales. We define a correlation function that measures how strongly the current data correlate with past data, and we use this correlation to determine segmentation boundaries. The algorithm was tested on a difficult data set, a one-hour audio segment from a film, where it achieves an audio scene change detection accuracy of 97%.
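The idea of comparing incoming audio against a causal memory of past data can be illustrated with a small sketch. This is not the authors' algorithm: it uses a single feature model and a single time scale, and it assumes Pearson correlation between averaged feature vectors as a stand-in for the paper's correlation function. The function name `detect_scene_changes` and the parameters `memory`, `window` and `threshold` are hypothetical illustrations.

```python
import numpy as np


def detect_scene_changes(features, memory=15, window=5, threshold=0.5):
    """Hypothetical sketch of causal, correlation-based change detection.

    features  : (T, D) array, one feature vector per audio frame (assumed input).
    memory    : number of past frames the simulated listener remembers (assumed).
    window    : number of most recent frames compared against that memory (assumed).
    threshold : correlation below this value flags a change (assumed).
    Returns the frame counts t at which a change is first detected.
    """
    features = np.asarray(features, dtype=float)
    detections = []
    flagged = False
    for t in range(memory + window, len(features) + 1):
        # Only frames before t are used, so the detector is causal.
        past = features[t - window - memory : t - window].mean(axis=0)
        recent = features[t - window : t].mean(axis=0)
        # Pearson correlation between the averaged past and recent feature vectors.
        r = np.corrcoef(past, recent)[0, 1]
        is_change = r < threshold
        # Report only the first frame of each run of low-correlation frames.
        if is_change and not flagged:
            detections.append(t)
        flagged = is_change
    return detections


# Synthetic example: 60 frames of one "scene", then 60 frames of another.
a = np.tile([1.0, 2.0, 3.0, 4.0], (60, 1))
b = np.tile([4.0, 3.0, 2.0, 1.0], (60, 1))
print(detect_scene_changes(np.vstack([a, b])))  # [63]
```

In this toy input the true change is at frame 60; the detector fires a few frames later, once most of the recent window comes from the new scene. The paper additionally combines multiple feature models and multiple time scales, which this sketch does not attempt.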

Contact

Hari Sundaram
Shih-Fu Chang

BibTex Reference

@InProceedings{dvmmPub93,
   Author = {Sundaram, Hari and Chang, Shih-Fu},
   Title = {Audio Scene Segmentation using Multiple Models, Features and Time Scales},
   BookTitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
   Address = {Istanbul, Turkey},
   Month = {June},
   Year = {2000}
}

EndNote Reference

Get EndNote Reference (.ref)
