Topics in Signal Processing: Speech Recognition

ELEN E6884/COMS 86884 - Fall 2005

Each Topic link leads to the slides for that lecture. Slides are posted by 8pm the night before the lecture, so links to slides for future lectures will not work yet. The PDF links in the Readings column lead to PDF versions of all required readings; if no PDF version is available for a paper, that paper is not required reading.

Key for sources of readings:

  • [Holmes]: Speech Synthesis and Recognition, J. Holmes, W. Holmes
  • [R+J]: Fundamentals of Speech Recognition, Rabiner, Juang
  • [J+M]: Speech and Language Processing, Jurafsky, Martin
  • [Jelinek]: Statistical Methods for Speech Recognition, Jelinek
  • [HAH]: Spoken Language Processing, Huang, Acero, Hon

Lecture | Date | Topic | Readings
1 | 2005-09-08 | Introduction: brief history of ASR; speech production+perception; signal processing basics. Lab 0 (PDF, HTML). | Optional: speech production: [Holmes] Ch. 2, [R+J] Sec. 2.1-2.4; speech perception: [Holmes] Ch. 3, [HAH] p. 29-36, [R+J] Sec. 3.5; speech capture: [HAH] p. 486-497; signal processing: [HAH] p. 201-223, 242-245, [R+J] p. 69-91.
2 | 2005-09-15 | Signal processing and dynamic-time warping. Lab 1 (PDF, HTML). | Required (PDF): MFCC: [HAH] Sec. 6.5.2; LPC: [R+J] Sec. 3.3.1-3.3.3; PLP: [Gold+Morgan] Sec. 22.1-22.2; DTW: [Holmes] Sec. 8.6-8.7. Optional: DTW: [R+J] p. 200-226, [Sakoe+Chiba] paper.
3 | 2005-09-22 | Gaussian mixture models; intro to HMM's. | Required (PDF): GMM's: [Duda+Hart+Stork] p. 84-90, p. 517-528, [HAH] p. 92-95; HMM's: [Holmes] p. 127-132, [HAH] p. 377-385.
4 | 2005-09-29 | Hidden Markov models. Lab 2 (PDF, HTML). | Required (PDF): HMM's: [Rabiner] "A tutorial on HMM's", [Poritz] "HMM's: A Guided Tour", [Holmes] p. 133-158, [HAH] p. 385-396, p. 441-443, [Duda+Hart+Stork] p. 128-138.
5 | 2005-10-06 | The Big Picture; Language modeling. | Required (PDF): N-gram's: [J+M] Ch. 6. Optional: [Chen+Goodman] paper.
6 | 2005-10-13 | Pronunciation modeling and decision trees. | Required (PDF): pronunciation modeling: [HAH] p. 428-436, [Holmes] p. 187-196; decision trees: [HAH] p. 175-189, [Duda+Hart+Stork] p. 395-413.
7 | 2005-10-20 | LVCSR training and introduction to FST's. Lab 3 (PDF, HTML). | Required (PDF). Optional: FST's: [Pereira+Riley] paper.
8 | 2005-10-27 | Search. | Required (PDF): [Mohri+Pereira+Riley] paper, [Aubert] paper. Optional: [Ney+Ortmanns] paper, [HAH] p. 608-630; minimization: [Aho+Sethi+Ullman] p. 141-144.
9 | 2005-11-03 | Robustness; adaptation. Lab 4 (PDF, HTML). | Required (PDF): [HAH] p. 107-109, p. 444-451 (MAP and MLLR); [HAH] p. 515-519 (spectral subtraction), p. 522-525 (CMR), p. 528-529 (retraining); [Leggetter+Woodland] paper (MLLR); [Gauvain+Lee] paper (MAP); [Acero+Stern] paper (CDCN); [Gales+Young] paper (PMC).
10 | 2005-11-10 | Discriminative training; ROVER and consensus. | Required (PDF): [Duda+Hart+Stork] p. 114-124 (LDA); [Povey+Woodland] paper (MMIE); [Mangu+Brill+Stolcke] paper (consensus decoding); [Fiscus] paper (ROVER system combination).
11 | 2005-11-17 | Advanced language modeling; maximum entropy models. | Optional (PDF): class n-grams: [Brown] paper; grammatical LM's: [Chelba] paper; topic LM's: [Seymore] paper; maximum entropy and triggers: [Rosenfeld] paper; everything and a bag of chips: [Goodman] paper.
 | 2005-11-24 | Thanksgiving |
12 | 2005-12-01 | Audio-visual speech recognition; The Malach Project. | Optional: AVSR: project publications; Malach: project publications.
13 | 2005-12-08 | Project presentations |

