Department of Electrical Engineering / Columbia University

Speech Recognition

EECS E6870 — Spring 2016


Course outline

The Topic links lead to the slides for each lecture; slides should be posted by 10pm the night before the lecture. The PDF links in the Readings column lead to PDF versions of all required readings (i.e., if no PDF version is available for a paper, that paper is not required reading).

Key for sources of readings:

  • [Holmes]: Speech Synthesis and Recognition, J. Holmes, W. Holmes.

  • [R+S]: Theory and Applications of Digital Speech Processing, Rabiner, Schafer.

  • [R+J]: Fundamentals of Speech Recognition, Rabiner, Juang.

  • [J+M]: Speech and Language Processing, Jurafsky, Martin, 2nd ed.

  • [Jelinek]: Statistical Methods for Speech Recognition, Jelinek.

  • [HAH]: Spoken Language Processing, Huang, Acero, Hon.

Lecture | Date | Topic | Readings
--- | --- | --- | ---
1 | 01/20 | Introduction: brief history of ASR; speech production+perception; signal processing basics. | Optional: speech production: [Holmes] Ch. 2, [R+S] Ch. 3; speech perception: [Holmes] Ch. 3, [HAH] p. 29-36, [R+S] Sec. 4.5; speech capture: [HAH] p. 486-497; signal processing: [HAH] p. 201-223, 242-245, [R+J] p. 69-91. (PDF)
2 | 01/27 | Signal processing and dynamic time warping. Setting up your account. Lab 1 (HTML, PDF). | Required: MFCC: [HAH] Sec. 6.5.2; LPC: [R+S] Sec. 9.2-9.2.2; PLP: [Gold+Morgan] Sec. 22.1-22.2; DTW: [Holmes] Sec. 8.6-8.7. Optional: DTW: [R+J] p. 200-226, [Sakoe+Chiba] paper. (PDF)
3 | 02/03 | Gaussian mixture models; EM algorithm. | Required: GMM's: [Duda+Hart+Stork] p. 84-90, p. 517-528, [HAH] p. 92-95; HMM's: [Holmes] p. 127-132, [HAH] p. 377-385. (PDF)
4 | 02/10 | Hidden Markov models. Lab 2 (HTML, PDF). | Required: HMM's: [Rabiner] A tutorial on HMM's, [Poritz] HMM's: A Guided Tour, [Holmes] p. 133-158, [HAH] p. 385-396, p. 441-443, [Duda+Hart+Stork] p. 128-138. (PDF)
5 | 02/17 | The big picture. | Required: N-gram's: [J+M] Ch. 4. Optional: [Chen+Goodman] paper. (PDF)
6 | 02/24 | Language modeling. Lab 3 (HTML, PDF). | Required: pronunciation modeling: [HAH] p. 428-436, [Holmes] p. 186-196; decision trees: [HAH] p. 175-189, [Duda+Hart+Stork] p. 395-413. (PDF)
7 | 03/02 | Pronunciation modeling and decision trees. | None (PDF)
8 | 03/09 | LVCSR training. | Optional: FST's: [Pereira+Riley] paper. (PDF)
recess | 03/16 | |
9 | 03/23 | LVCSR search. Lab 4 (HTML, PDF); Debugging tips. | Required: [Mohri+Pereira+Riley] paper, [Aubert] paper. Optional: [Ney+Ortmanns] paper, [HAH] p. 608-630, [Aho+Sethi+Ullman] p. 141-144. (PDF)
10 | 03/30 | Advanced language modeling: maximum entropy models, Model M, and neural network LM's. | Optional: class n-grams: [Brown] paper; grammatical LM's: [Chelba] paper; topic LM's: [Seymore] paper; maximum entropy and triggers: [Rosenfeld] paper; everything and a bag of chips: [Goodman] paper; Model M: [Chen] paper; neural network LM's: [Bengio et al.] paper. (PDF)
11 | 04/06 | Robustness; adaptation. | Required: [HAH] p. 107-109, p. 444-451 (MAP and MLLR); [HAH] p. 522-525 (CMR), p. 528-529 (retraining); [Leggetter+Woodland] paper, [Gales] paper (MLLR); [Gauvain+Lee] paper (MAP). (PDF)
12 | 04/13 | Discriminative training; ROVER and consensus. | Required: [Duda+Hart+Stork] p. 114-124 (LDA); [Povey+Woodland] paper (MMI); [Povey+Woodland] paper (MPE); [Mangu+Brill+Stolcke] paper (consensus decoding); [Fiscus] paper (ROVER system combination). (PDF)
13 | 04/20 | Deep neural networks I. | TBA (PDF)
14 | 04/27 | Deep neural networks II. | TBA (PDF)
study | 05/04 | |
finals | 05/11 | Project presentations. |


Stanley F. Chen <stanchen@us.ibm.com>
Last updated: 2016 Apr 27