Course outline
The Topic links will take you to the slides for that lecture.
Slides for a lecture will be posted by 8pm the night
before the lecture. The PDF links in the Readings
column will take you to PDF versions of all required readings (i.e.,
if no PDF version is available for a paper, the paper is
not required reading).
Key for sources of readings:
- [Holmes]: Speech Synthesis and Recognition, J. Holmes, W. Holmes
- [R+J]: Fundamentals of Speech Recognition, Rabiner, Juang
- [J+M]: Speech and Language Processing, Jurafsky, Martin, 2nd ed.
- [Jelinek]: Statistical Methods for Speech Recognition, Jelinek
- [HAH]: Spoken Language Processing, Huang, Acero, Hon
| Lecture | Date | Topic | Readings |
|---|---|---|---|
| 1 | 2009-09-08 | Introduction: brief history of ASR; speech production+perception; speech capture; signal processing basics (4up). | Optional: speech production: [Holmes] Ch. 2, [R+J] Sec. 2.1-2.4; speech perception: [Holmes] Ch. 3, [HAH] p. 29-36, [R+J] Sec. 3.5; speech capture: [HAH] p. 486-497; signal processing: [HAH] p. 201-223, 242-245, [R+J] p. 69-91. |
| 2 | 2009-09-15 | Signal processing and dynamic time warping (4up). Setting up your account. Lab 1 (PDF, HTML). | Required (PDF): MFCC: [HAH] Sec. 6.5.2; LPC: [R+J] Sec. 3.3.1-3.3.3; PLP: [Gold+Morgan] Sec. 22.1-22.2; DTW: [Holmes] Sec. 8.6-8.7. Optional: DTW: [R+J] p. 200-226, [Sakoe+Chiba] paper. |
| 3 | 2009-09-22 | DTW, Gaussian mixture models, and intro to HMM's (4up). | Required (PDF): GMM's: [Duda+Hart+Stork] p. 84-90, p. 517-528, [HAH] p. 92-95; HMM's: [Holmes] p. 127-132, [HAH] p. 377-385. |
| 4 | 2009-09-29 | Hidden Markov models (4up/ps). Lab 2 (PDF, HTML). | Required (PDF): HMM's: [Rabiner] "A tutorial on HMM's", [Poritz] "HMM's: A Guided Tour", [Holmes] p. 133-158, [HAH] p. 385-396, p. 441-443, [Duda+Hart+Stork] p. 128-138. |
| 5 | 2009-10-06 | Language modeling (4up). | Required (PDF): N-grams: [J+M] Ch. 4. Optional: [Chen+Goodman] paper. |
| 6 | 2009-10-13 | Pronunciation modeling and decision trees (4up/ps). Lab 3 (PDF, HTML). | Required (PDF): pronunciation modeling: [HAH] p. 428-436, [Holmes] p. 186-196; decision trees: [HAH] p. 175-189, [Duda+Hart+Stork] p. 395-413. |
| 7 | 2009-10-20 | LVCSR training and introduction to FST's (4up). | Optional (PDF): FST's: [Pereira+Riley] paper. |
| 8 | 2009-10-27 | Search (4up). Lab 4 (PDF, HTML). | Required (PDF): [Mohri+Pereira+Riley] paper, [Aubert] paper. Optional: [Ney+Ortmanns] paper, [HAH] p. 608-630, [Aho+Sethi+Ullman] p. 141-144. |
| | 2009-11-03 | Election Day | |
| 9 | 2009-11-10 | Robustness; adaptation (4up). | Required (PDF): [HAH] p. 107-109, p. 444-451 (MAP and MLLR); [HAH] p. 515-519 (spectral subtraction), p. 522-525 (CMR), p. 528-529 (retraining); [Leggetter+Woodland] paper, [Gales] paper (MLLR); [Gauvain+Lee] paper (MAP); [Acero+Stern] paper (CDCN); [Gales+Young] paper (PMC). |
| 10 | 2009-11-17 | Advanced language modeling; maximum entropy models (4up). | Optional (PDF): class n-grams: [Brown] paper; grammatical LM's: [Chelba] paper; topic LM's: [Seymore] paper; maximum entropy and triggers: [Rosenfeld] paper; everything and a bag of chips: [Goodman] paper. |
| 11 | 2009-11-24 | Discriminative training; ROVER and consensus (4up). | Required (PDF): [Duda+Hart+Stork] p. 114-124 (LDA); [Povey+Woodland] paper (MMI); [Povey+Woodland] paper (MPE); [Mangu+Brill+Stolcke] paper (consensus decoding); [Fiscus] paper (ROVER system combination). |
| 12 | 2009-12-01 | Spoken document retrieval (4up/ps); speech-to-speech translation (4up/ps). | Optional: None. |
| 13 | 2009-12-08 | Project presentations. | |

Stanley F. Chen
<[email protected]>
Last updated: 2009 Sep 07