Course outline

Speech Recognition

EECS E6870 — Fall 2012

The Topic links will take you to the slides for that lecture. Slides for a lecture will be posted by 8pm the night before the lecture. The PDF links in the Readings column will take you to PDF versions of all required readings (i.e., if no PDF version is available for a paper, the paper is not required reading).

Key for sources of readings:

[Holmes]: Speech Synthesis and Recognition, J. Holmes, W. Holmes
[R+S]: Theory and Applications of Digital Speech Processing, Rabiner, Schafer
[R+J]: Fundamentals of Speech Recognition, Rabiner, Juang
[J+M]: Speech and Language Processing, Jurafsky, Martin, 2nd ed.
[Jelinek]: Statistical Methods for Speech Recognition, Jelinek
[HAH]: Spoken Language Processing, Huang, Acero, Hon

Lecture Date Topic Readings

1 2012-09-10 Introduction: brief history of ASR; speech production and perception; signal processing basics (4up). Optional: speech production: [Holmes] Ch. 2, [R+S] Ch. 3; speech perception: [Holmes] Ch. 3, [HAH] p. 29-36, [R+S] Sec. 4.5; speech capture: [HAH] p. 486-497; signal processing: [HAH] p. 201-223, 242-245, [R+J] p. 69-91.

2 2012-09-17 Signal processing and dynamic time warping (4up). Setting up your account. Lab 1 (PDF, HTML). Required (PDF): MFCC: [HAH] Sec. 6.5.2; LPC: [R+S] Sec. 9.2-9.2.2; PLP: [Gold+Morgan] Sec. 22.1-22.2; DTW: [Holmes] Sec. 8.6-8.7. Optional: DTW: [R+J] p. 200-226, [Sakoe+Chiba] paper.

3 2012-09-24 DTW, Gaussian mixture models, and intro to HMMs (4up). Required (PDF): GMMs: [Duda+Hart+Stork] p. 84-90, p. 517-528, [HAH] p. 92-95; HMMs: [Holmes] p. 127-132, [HAH] p. 377-385.

4 2012-10-01 Hidden Markov models (4up). Lab 2 (PDF, HTML). Required (PDF): HMMs: [Rabiner] "A tutorial on hidden Markov models and selected applications in speech recognition", [Poritz] "Hidden Markov models: a guided tour", [Holmes] p. 133-158, [HAH] p. 385-396, p. 441-443, [Duda+Hart+Stork] p. 128-138.

5 2012-10-08 Language modeling (4up). Required (PDF): N-grams: [J+M] Ch. 4. Optional: [Chen+Goodman] paper.

6 2012-10-15 Pronunciation modeling and decision trees (4up). Lab 3 (PDF, HTML). Required (PDF): pronunciation modeling: [HAH] p. 428-436, [Holmes] p. 186-196; decision trees: [HAH] p. 175-189, [Duda+Hart+Stork] p. 395-413.

7 2012-10-22 Pronunciation modeling, cont'd (4up); LVCSR training (4up). Optional (PDF): FSTs: [Pereira+Riley] paper.

2012-10-29 Hurricane Sandy

2012-11-05 Academic holiday

8 2012-11-12 LVCSR training and search (4up). Required (PDF): [Mohri+Pereira+Riley] paper, [Aubert] paper. Optional: [Ney+Ortmanns] paper, [HAH] p. 608-630, [Aho+Sethi+Ullman] p. 141-144.

9 2012-11-19 LVCSR search (cont'd) (4up). Lab 4 (PDF, HTML).

10 2012-11-26 Advanced language modeling: maximum entropy models, Model M, and neural network LMs (4up). Optional (PDF): class n-grams: [Brown] paper; grammatical LMs: [Chelba] paper; topic LMs: [Seymore] paper; maximum entropy and triggers: [Rosenfeld] paper; everything and a bag of chips: [Goodman] paper; Model M: [Chen] paper; neural network LMs: [Bengio et al.] paper.

11 2012-12-03 Robustness; adaptation (4up). Required (PDF): [HAH] p. 107-109, p. 444-451 (MAP and MLLR); [HAH] p. 515-519 (spectral subtraction), p. 522-525 (CMR), p. 528-529 (retraining); [Leggetter+Woodland] paper, [Gales] paper (MLLR); [Gauvain+Lee] paper (MAP); [Acero+Stern] paper (CDCN); [Gales+Young] paper (PMC).

12 2012-12-10 Discriminative training; ROVER and consensus (4up). Required (PDF): [Duda+Hart+Stork] p. 114-124 (LDA); [Povey+Woodland] paper (MMI); [Povey+Woodland] paper (MPE); [Mangu+Brill+Stolcke] paper (consensus decoding); [Fiscus] paper (ROVER system combination).

13 2012-12-12 Neural networks and deep belief networks for acoustic modeling (2up). Optional: TBA.

14 2012-12-17 Project presentations.

Stanley F. Chen
Last updated: 07/31/12