Dan Ellis: Research Projects:

Audio Feature Toolkit

It is generally recognized that the soundtrack to video carries extensive information of value to content analysis and indexing applications. Many labs are investigating these kinds of issues, and several have looked at soundtrack analysis as a source of cues. Many more researchers might like to use audio information, but are discouraged by the prospect of developing a whole new set of primitives for a modality that is not their primary concern.

Thanks to the long history of speech recognition, computer music, audio compression etc., there are a number of different useful feature types that have been defined. While this is still an active research area, there are plenty of applications where the development of novel features is not needed: using a well-established feature would probably be fine.

In addition to features, there are numerous classification and segmentation algorithms, many derived from speech recognition, that are relatively mature and would be useful to a wide range of researchers if an easy-to-use implementation were available.

This project is about developing an integrated suite of sound analysis and classification tools with the goal of providing something convenient and useful for multimedia content researchers who wish to avoid the investment of developing their own code. Thus, the resulting programs should be:

Starting points

A place to start would be to reproduce the common speech/music/other segmentation that has been reported by various authors (e.g. Hain et al.'s description of Broadcast News segmentation), using Gaussian Mixture Models trained on manually-labeled data and Hidden Markov Model decoding.
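As a rough illustration of that pipeline, the sketch below scores each feature frame against one diagonal-covariance GMM per class and then smooths the frame decisions with Viterbi decoding over a small HMM. It is not taken from any of the cited systems: the function names, the assumption that the GMM parameters have already been trained, and the "sticky" transition matrix controlled by stay_prob are all illustrative choices.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, K)
    log_comp = log_comp + np.log(weights)
    # Log-sum-exp over mixture components, done stably.
    peak = log_comp.max(axis=1, keepdims=True)
    return peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))

def viterbi(log_obs, log_trans, log_prior):
    """Most likely state sequence given (T, S) observation log-likelihoods."""
    n_frames, n_states = log_obs.shape
    score = log_prior + log_obs[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans      # cand[i, j]: move from state i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = score.argmax()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def segment(frames, class_gmms, stay_prob=0.99):
    """Label each frame with a class index (needs at least two classes).

    class_gmms: list of (weights, means, variances) tuples, one per class,
    assumed to have been trained beforehand on manually labeled audio.
    stay_prob is a hypothetical smoothing knob: the self-transition probability.
    """
    n = len(class_gmms)
    log_obs = np.stack([gmm_loglik(frames, *g) for g in class_gmms], axis=1)
    trans = np.full((n, n), (1.0 - stay_prob) / (n - 1))
    np.fill_diagonal(trans, stay_prob)
    prior = np.full(n, 1.0 / n)
    return viterbi(log_obs, np.log(trans), np.log(prior))
```

Calling segment(frames, [speech_gmm, music_gmm, other_gmm]) would return one class index per frame. A high stay_prob discourages rapid switching between classes, so isolated frames that weakly favor another class tend to be absorbed into the surrounding segment.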

We already have feacalc to calculate low-order warped-frequency cepstra (including MFCCs), although it was built for large collections of short clips (speech databases) rather than a few huge soundtracks.
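For readers unfamiliar with this kind of feature, the following is a minimal textbook-style sketch of mel-frequency cepstral coefficients. It does not reproduce feacalc's implementation or options; the function names, the mel formula constants, and the default frame length, hop, filter count, and cepstral order are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * sr / n_fft
    fb = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i:i + 3]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc(signal, sr, n_fft=512, hop=160, n_filters=20, n_ceps=13):
    """Return an (n_frames, n_ceps) array of cepstra for a 1-D signal."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)              # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # power spectrum
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    log_mel = np.log(mel_energy + 1e-10)                  # small floor avoids log(0)
    # Unnormalized DCT-II across the filterbank channels.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_mel @ basis.T
```

For a long soundtrack, a sketch like this would need to be run over successive blocks of samples rather than loading the whole signal at once, which is the kind of adaptation the paragraph above has in mind.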

Related work

Other people have identified the value of a common set of audio processing tools, and it is a natural urge to package up one's tools in the hope that all the invested effort might benefit someone else. There are a number of current related efforts that this work might beneficially exist within or alongside:

Last updated: $Date: 2001/05/29 02:52:40 $
Dan Ellis <dpwe@ee.columbia.edu>