Research project ideas

Below is a list of research projects that are either just beginning or based on ideas that I'm still working on. One reason for having them here is to allow me to connect with students who might be interested in working on one of these topics, or on some similar project. If you are, please drop me a line.

Some topics are still waiting for me to create full pages describing them. This list is updated sporadically.

You might also take a look at the more recent list of Class Project Ideas for my class ELEN E6820 Speech and Audio Processing and Recognition.

Meeting recorder segmentation: The Meeting Recorder is a large-scale project on the automatic processing of recorded meetings. We currently have some raw recordings, but very few tools to process them. A relatively tractable first stage would be a system to automatically extract speaker turns based on the individual close-talk mic channels. This could also lead to a system for the much more difficult problem of extracting speaker turns using only a pair of tabletop mics.
Alarm sound detection: Alarm sounds are pretty easily identified as such by human listeners, so it might be practical to build a computer system to do the same thing.
Machine listener: There's an awful lot of sound data out there; for instance, tuning a broadcast receiver to a radio or TV station gives an essentially limitless stream of sound data. Is there anything we can do to exploit this data, for instance by using statistical techniques to 'learn' the characteristics of real-world sounds? This project is about finding out.
Sound browser/visualization: Having good data investigation tools can have an enormous impact on the kind and quality of research that is performed. In a research environment, where interests are hard to tie down and frequently change, these tools must be very flexible and extensible. Although we are using a range of good third-party sound analysis tools, it would be worthwhile to have an in-house system that supports integrated browsing of audio data and all the kinds of derived descriptions that come out of the different projects. If we do a good job, such a tool could be valuable elsewhere too.
Voice modeling and transformation: Current speech recognition systems are based on a very crude reduction of the speech signal to maybe 13 spectral coefficients sampled 50-100 times a second. If you resynthesize a voice from these models, it is barely intelligible. More detailed models of the voice, for instance those used in coding and synthesis, can provide additional information about the speaker and speech. One interesting application is the effort to 'transform' one speaker's voice into another's, based on joint statistical modeling at an abstract level.
Audio feature toolkit: Lots of people would like to use soundtrack features in their content analysis work, but may not be interested in learning too much about acoustic properties and representations. An easily deployed toolkit that applies various standard algorithms to produce a rough-and-ready feature stream from any soundtrack might be widely appreciated.
Content-based audio retrieval: In some senses, this project encompasses all the interests of the group: the general problem of finding information in a sound stream without having to listen to it all yourself. While the specific algorithms for information extraction from audio developed in the other projects provide the representation necessary for this kind of retrieval, there are a host of high-level user-interface issues to consider, concerning exactly what an audio retrieval system might look like and what it might be good for.
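
To give a flavor of the meeting recorder segmentation idea above, here is a minimal sketch of how speaker turns might be pulled out of per-speaker close-talk channels. It is only an illustration under simple assumptions, not the project's actual method: it labels each frame with the loudest channel by short-time energy and merges runs of identical labels into turns. The function names and the parameters frame_len, threshold and min_frames are hypothetical, and real close-talk recordings have cross-talk and level differences that would need more careful handling.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def speaker_turns(channels, frame_len=160, threshold=1e-3, min_frames=3):
    """Return a list of (channel_index, start_frame, end_frame) turns.

    channels: one list of float samples per close-talk mic (assumed
    time-aligned and at the same sample rate). end_frame is exclusive.
    All parameter values are illustrative guesses, not tuned settings.
    """
    energies = [frame_energies(ch, frame_len) for ch in channels]
    n_frames = min(len(e) for e in energies)

    # Label each frame with the loudest channel, or None if every
    # channel is below the energy threshold (treated as silence).
    labels = []
    for t in range(n_frames):
        best = max(range(len(channels)), key=lambda c: energies[c][t])
        labels.append(best if energies[best][t] > threshold else None)

    # Merge runs of identical labels into turns, dropping silence
    # and runs shorter than min_frames.
    turns = []
    start = 0
    for t in range(1, n_frames + 1):
        if t == n_frames or labels[t] != labels[start]:
            if labels[start] is not None and t - start >= min_frames:
                turns.append((labels[start], start, t))
            start = t
    return turns
```

Called with one sample list per participant, speaker_turns returns frame-indexed turns; multiplying the frame indices by frame_len and dividing by the sample rate converts them to seconds. Overlapping speech is not represented at all in this sketch, since each frame gets at most one label.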

Also see this older list of suggested research projects.

Last updated: $Date: 2001/05/28 21:32:45 $
Dan Ellis <[email protected]>