Student Research Projects

I am looking for some master's degree students to work on the following research projects during the Fall 2000 semester. If you are interested, please contact me, including a brief resume of your relevant experience. These projects will most likely involve programming in C/C++ on a Linux platform, so experience in this area is important.

Thanks - Dan Ellis

Meeting Recorder

This is a large project that I helped set up at my previous position with the International Computer Science Institute (ICSI) in Berkeley, CA. We are interested in the problem of automatically processing audio recorded during conventional meetings. Currently, we have collected a few hours of recordings, which we are having transcribed. The audio is recorded simultaneously on 16 channels, both from head-mounted microphones and from microphones placed on the conference table. The eventual goal is to develop useful speech recognition from desktop microphones, and to use it for summarization and retrieval of the recorded information, but there are many stages along this path. Full details of the meeting recorder data capture setup are available on my ICSI Meeting Recorder web site.

There are several projects I would like to pursue based on analysis of the existing recordings, including:

Tandem Acoustic Modeling for Automatic Speech Recognition

This is a new approach to modeling the speech signal that combines the standard Gaussian Mixture Model / Hidden Markov Model (GMM/HMM) approach with the more unusual connectionist (neural network) approach. Our experiments last year on a connected-digit task achieved a relative improvement of more than 50% in word error rate over a standard baseline system, as reported by Hermansky, Ellis & Sharma at ICASSP-2000 (see also these slides from a recent talk on the Tandem approach, in PDF format). However, that system is really just a first pass; we would like to investigate variations to determine which parts matter most and how they can be improved. Projects in this general area include:
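To make the basic idea concrete, here is a minimal C++ sketch (C/C++ being the working language for these projects) of a per-frame tandem feature transform: the network's phone posteriors are log-compressed and then decorrelated before being handed to the GMM/HMM back end as ordinary feature vectors. The function name and the assumption of a pre-computed PCA/KLT projection matrix are illustrative only, not a description of the actual system.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical tandem transform for one frame: log-compress the
// network's posterior probabilities (the log makes the skewed
// posteriors more Gaussian-like), then decorrelate them with a
// pre-computed PCA/KLT matrix so they better suit the diagonal-
// covariance Gaussians of a conventional GMM/HMM system.
std::vector<double> tandem_transform(
    const std::vector<double>& posteriors,
    const std::vector<std::vector<double>>& pca)  // rows = output dims
{
    std::vector<double> logp(posteriors.size());
    for (std::size_t i = 0; i < posteriors.size(); ++i)
        logp[i] = std::log(posteriors[i] + 1e-10);  // floor avoids log(0)

    std::vector<double> feat(pca.size(), 0.0);
    for (std::size_t r = 0; r < pca.size(); ++r)
        for (std::size_t c = 0; c < logp.size(); ++c)
            feat[r] += pca[r][c] * logp[c];
    return feat;
}
```

In a real system the projection matrix would be estimated from training data; here it is simply taken as given, and the resulting vectors would replace (or augment) cepstral features at the GMM/HMM front end.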

Recognition based on partial information

I am involved in the European project RESPITE, which is concerned with speech recognition when some of the underlying data is missing or obscured. Within this project, there are a number of open research areas:
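One common missing-data technique is to score each acoustic model using only the spectral channels judged reliable, marginalizing over the obscured ones. A minimal C++ sketch of that marginal likelihood for a single diagonal-covariance Gaussian is below; the function name and the idea that a reliability mask is supplied from elsewhere are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical missing-data scoring: evaluate a diagonal-covariance
// Gaussian log-likelihood using only the dimensions flagged as
// reliable. For a diagonal Gaussian, marginalizing out the missing
// dimensions simply means omitting their terms from the sum.
double masked_loglik(const std::vector<double>& x,
                     const std::vector<double>& mean,
                     const std::vector<double>& var,
                     const std::vector<bool>&   reliable)
{
    const double LOG2PI = std::log(2.0 * 3.14159265358979323846);
    double ll = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        if (!reliable[d]) continue;            // marginalize missing dims
        double diff = x[d] - mean[d];
        ll += -0.5 * (LOG2PI + std::log(var[d]) + diff * diff / var[d]);
    }
    return ll;
}
```

The interesting research questions lie upstream of this step: deciding which channels are reliable, and refining the marginalization (e.g. using the corrupted value as an upper bound rather than discarding it entirely).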

Unsupervised learning of audio signals

One of the main research themes of LabROSA is automatic extraction of audio content structure for use in indexing and retrieval. The ideal is to simulate the skills of a human 'librarian' who will preview a large archive of multimedia material, figure out the significant, recurrent content, and build an appropriate index.

An important step towards this goal would be the development of algorithms that can recognize recurrent patterns or structures in large audio databases without any manual input or labels - i.e. via unsupervised learning. There are several threads I would like to pursue:
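As one very simple starting point for such unsupervised structure discovery, per-frame feature vectors (e.g. cepstra, computed elsewhere) can be clustered so that recurring acoustic patterns fall into the same cluster, and a recording can then be indexed by its cluster sequence. The C++ sketch below implements plain k-means under that assumption; the initialization scheme and function name are illustrative choices, not part of any existing system.

```cpp
#include <cassert>
#include <vector>

// Illustrative k-means over per-frame audio feature vectors: returns
// a cluster label for each frame, so recurrent acoustic patterns map
// to recurrent label subsequences that can be indexed and searched.
std::vector<int> kmeans_labels(const std::vector<std::vector<double>>& X,
                               int k, int iters = 10)
{
    const std::size_t n = X.size(), d = X[0].size();
    std::vector<std::vector<double>> C(k);
    for (int j = 0; j < k; ++j)
        C[j] = X[j * n / k];                   // spread initial centroids
    std::vector<int> label(n, 0);
    for (int it = 0; it < iters; ++it) {
        // Assignment step: nearest centroid by squared distance.
        for (std::size_t i = 0; i < n; ++i) {
            double best = 1e300; int bj = 0;
            for (int j = 0; j < k; ++j) {
                double dist = 0.0;
                for (std::size_t t = 0; t < d; ++t) {
                    double diff = X[i][t] - C[j][t];
                    dist += diff * diff;
                }
                if (dist < best) { best = dist; bj = j; }
            }
            label[i] = bj;
        }
        // Update step: move each centroid to the mean of its members.
        for (int j = 0; j < k; ++j) {
            std::vector<double> sum(d, 0.0); int cnt = 0;
            for (std::size_t i = 0; i < n; ++i)
                if (label[i] == j) {
                    for (std::size_t t = 0; t < d; ++t) sum[t] += X[i][t];
                    ++cnt;
                }
            if (cnt)
                for (std::size_t t = 0; t < d; ++t) C[j][t] = sum[t] / cnt;
        }
    }
    return label;
}
```

This is only the most naive baseline; the research questions begin where it ends, e.g. choosing the number of clusters automatically, or modeling temporal structure rather than treating frames independently.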

Last updated: 2000/09/11 17:37:32
Dan Ellis