Separating Speech from Speech Noise

Separating speech in complex acoustic environments, such as picking out a single voice at a cocktail party, is extremely difficult. Many speech enhancement and separation techniques fail when the target and the interference are both speech, because the two signals then share the same properties. This project applies novel models, based on Computational Auditory Scene Analysis (CASA) and on trained models of the speech signal, to see how well speech can be separated in this situation. In particular, our goal is to provide separations that are demonstrably of benefit to human listeners, hence our collaboration with perceptual experimentalists at EBIRE and Boston University.

Partners

East Bay Institute for Research and Education - Pierre Divenyi
Boston University - Barbara Shinn-Cunningham
Columbia University - Dan Ellis
Ohio State University - DeLiang Wang

Resources

A page of examples of very challenging acoustic environments
alignSpondee.tgz - a package of HTK scripts for making 1ms-resolution alignments between experimental tokens and phone labels. We are using "spondees" (dog-house, fire-truck) to control for stress prosody in our experiments.
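
As a rough illustration of working with alignments like these: HTK label files conventionally store one segment per line as start time, end time, and label, with times in units of 100 ns. The sketch below assumes the alignSpondee scripts emit that standard label format, which is not confirmed here. The function name parse_htk_labels and the sample phone timings are hypothetical.

```python
from typing import List, Tuple

# HTK label times are integers in units of 100 ns, so 10,000 units = 1 ms.
HTK_UNITS_PER_MS = 10_000


def parse_htk_labels(text: str) -> List[Tuple[float, float, str]]:
    """Parse HTK-style label lines ("start end label") into
    (start_ms, end_ms, label) tuples.

    Assumes the standard HTK label layout; this is a sketch, not
    part of the alignSpondee package.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start, end, label = int(fields[0]), int(fields[1]), fields[2]
        segments.append(
            (start / HTK_UNITS_PER_MS, end / HTK_UNITS_PER_MS, label)
        )
    return segments


# Invented alignment for part of the spondee "dog-house";
# the times and phone symbols are for illustration only.
example = """\
0 1230000 d
1230000 3100000 ao
3100000 4020000 g
"""

for start_ms, end_ms, phone in parse_htk_labels(example):
    print(f"{phone}: {start_ms:.0f}-{end_ms:.0f} ms")
```

Converting to milliseconds up front keeps later comparisons against the 1 ms alignment resolution straightforward.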

Related Publications

R. Weiss and D. Ellis (2006)
Estimating single-channel source separation masks: Relevance Vector Machine classifiers vs. pitch-based masking
Proc. Workshop on Statistical and Perceptual Audition SAPA-06, pp. 31-36, Pittsburgh PA, Oct 2006. (6pp)
D. Ellis and R. Weiss (2006)
Model-Based Monaural Source Separation Using a Vector-Quantized Phase-Vocoder Representation
Proc. ICASSP-06, Toulouse, May 2006, pp. V-957-960. (4pp)
M. Mandel, D. Ellis, and T. Jebara (2006)
An EM algorithm for localizing multiple sound sources in reverberant environments
Proc. Neural Info. Proc. Sys., Vancouver CA, Dec 2006. (8pp)
M. Mandel and D. Ellis (2006)
A probability model for interaural phase difference
Proc. Workshop on Statistical and Perceptual Audition SAPA-06, pp. 1-6, Pittsburgh PA, Oct 2006. (6pp)
M. Athineos and D. Ellis (2007)
Autoregressive Modeling of Temporal Envelopes
IEEE Tr. Signal Processing, accepted for publication. (9pp)
D. Ellis (2006)
Model-Based Scene Analysis
Chapter 4 of Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, D. Wang & G. Brown, eds., Wiley/IEEE Press, pp. 115-146, 2006. (46pp)

Reports

Annual Report 2006

Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Grant No. IIS-05-35168. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last updated: 2005/08/09 03:26:12
Dan Ellis