Dan Ellis:
Research project ideas
Below is a list of research projects that are either just beginning
or based on ideas that I'm still working out. One reason for listing
them here is to connect with students who might be interested
in working on one of these topics, or on a similar project. If you are,
please drop me a line.
Some topics are still waiting for me to create full pages describing
them. This list is updated sporadically.
You might also take a look at the more recent list of
Class Project Ideas for my course ELEN E6820, Speech and Audio Processing and Recognition.
- Meeting recorder segmentation: The
Meeting Recorder is a large-scale project on the automatic processing of
recorded meetings. We currently have some raw recordings, but very few
tools for processing them. A relatively tractable first stage would be a system
that automatically extracts speaker turns from the individual close-talk
mic channels. This could also lead to a system for the much more difficult
problem of extracting speaker turns using only a pair of tabletop mics.
- Alarm sound detection: Alarm
sounds are pretty easily identified as such by human listeners, so it might
be practical to build a computer system to do the same thing.
- Machine listener: There's an
awful lot of sound data out there; for instance, tuning a broadcast receiver
to a radio or TV station gives an essentially limitless stream of sound
data. Is there anything we can do to exploit this data, for instance by
using statistical techniques to 'learn' the characteristics of real-world
sounds? This project is about finding out.
- Sound browser/visualization:
Having good data investigation tools can have an enormous impact on
the kind and quality of research that is performed. In a research environment,
where interests are hard to tie down and frequently change, these tools
must be very flexible and extensible. Although we are using a range of
good third-party sound analysis tools, it would be worthwhile to have an
in-house system that supports integrated browsing of audio data and all
the kinds of derived descriptions that come out of the different projects.
If we do a good job, such a tool could be valuable elsewhere too.
- Voice modeling and transformation:
Current speech recognition
systems are based on a very crude reduction of the speech signal to maybe
13 spectral coefficients sampled 50-100 times a second. If you resynthesize
a voice from this representation, it is barely intelligible. More detailed models
of the voice, for instance those used in coding and synthesis, can provide
additional information about the speaker and the speech. One interesting application
is the effort to 'transform' one speaker's voice into another's, based
on joint statistical modeling at an abstract level.
- Audio feature toolkit: Lots of people would like to use soundtrack
features in their content analysis work, but may not be interested in learning
too much about acoustic properties and representations. An easily deployed
toolkit that applies various standard algorithms to
produce a rough-and-ready feature stream from any soundtrack might be
widely appreciated.
- Content-based audio retrieval: In some senses, this project
encompasses all the interests of the group: the general problem of finding
information in a sound stream without having to listen to it all yourself.
While the specific algorithms for extracting information from audio developed
in the other projects provide the representations necessary for this kind
of retrieval, there are a host of high-level user-interface issues that
need to be considered, relating to exactly what an audio retrieval system
might look like and what it might be good for.
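As an illustration of the meeting recorder segmentation idea above, one possible starting point is to compare per-frame energy across the close-talk channels and call the loudest channel the active speaker. The sketch below is only that, a sketch: the function names, the frame length, and the energy floor are all assumptions made for the example, and it ignores crosstalk between mics and overlapping speech, both of which real meeting recordings would have.

```python
from itertools import groupby


def frame_energies(samples, frame_len):
    """Mean-square energy of each non-overlapping frame of one channel."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def extract_turns(channels, frame_len=4, floor=0.01):
    """Label each frame with the loudest channel, then merge runs into turns.

    channels: list of equal-length sample lists, one per close-talk mic
              (hypothetical input format chosen for this example).
    Returns (speaker_index, start_frame, end_frame_exclusive) tuples.
    Frames where no channel exceeds `floor` are treated as silence.
    """
    energies = [frame_energies(ch, frame_len) for ch in channels]
    n_frames = min(len(e) for e in energies)

    labels = []
    for t in range(n_frames):
        frame = [e[t] for e in energies]
        loudest = max(range(len(frame)), key=frame.__getitem__)
        labels.append(loudest if frame[loudest] > floor else None)

    # Merge consecutive frames with the same label into one turn.
    turns, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None:
            turns.append((label, start, start + length))
        start += length
    return turns


# Toy example: speaker 0 talks, then a pause, then speaker 1 talks.
mic_a = [0.5] * 8 + [0.0] * 12
mic_b = [0.0] * 12 + [0.6] * 8
print(extract_turns([mic_a, mic_b]))  # [(0, 0, 2), (1, 3, 5)]
```

A fixed energy floor is the crudest possible silence detector; a real system would more likely need per-channel normalization and some smoothing of the frame labels before turns are merged.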
Also see this older list of suggested research
projects.
Last updated: 2001/05/28 21:32:45
Dan Ellis <[email protected]>