
Topics in Signal Processing: Speech Recognition

ELEN E6884/COMS 86884 - Fall 2005

General Information

Instructors: Michael Picheny <[email protected]>
Ellen Eide <[email protected]>
Stanley F. Chen <[email protected]>
(Note: In E-mail, start subject line with "ELEN E6884:")
IBM T.J. Watson Research Center, Yorktown Heights, NY

Instructor office hours: Thursday 6:40-7:30pm (after class); before class by appointment

Teaching assistant: TBA

Required text: none; see below for recommended texts

Lectures: Thursday, 4:10-6:40pm
Location: 1306 Mudd

Credits: 3

Course web site: http://www.ee.columbia.edu/~stanchen/e6884/

Overview

The first portion of the course will cover fundamental topics in speech recognition: signal processing, Gaussian mixture distributions, hidden Markov models, pronunciation modeling, acoustic state tying, decision trees, finite-state transducers, search, and language modeling. Topics will be covered in sufficient detail for students to be able to implement a basic large vocabulary speech recognizer.

In the remainder of the course, selected topics from the current state of the art will be discussed. We will cover several key areas in more depth and survey some advanced topics, including acoustic adaptation, discriminative training, and audio-visual speech recognition.

Prerequisites

The course assumes knowledge of basic probability and statistics. Knowledge of digital signal processing (ELEN E4810) is recommended. In addition, there will be many challenging C++ programming projects during the semester, so knowledge of C or C++ is required, and a basic knowledge of Unix or Linux will be helpful. While an effort will be made not to use an excessive number of C++ constructs in the programming assignments, students who do not know C++ will be expected to pick up the requisite C++ competence on their own. If you do not have the prerequisites and would still like to take the course, please contact one of the instructors.

Readings

Readings will be taken from a variety of sources; PDF versions of the appropriate readings will be made available before each lecture. There is no required text; below are recommended and reference texts that students might find useful:

Recommended text:

Speech Synthesis and Recognition, John Holmes and Wendy Holmes, 2nd edition (paperback, 256 pp., 2001, ISBN 0748408576, ~$45)
- Good introductory text covering many areas.

Reference texts:

Fundamentals of Speech Recognition, Rabiner, Juang (paperback, 496 pp., 1993, ISBN 0130151572, ~$98)
- Reference for signal processing.
Speech and Language Processing, Jurafsky, Martin (hardcover, 960 pp., 2000, ISBN 0130950696, ~$83)
- Reference for language modeling and text processing.
Statistical Methods for Speech Recognition, Jelinek (hardcover, 300 pp., 1998, ISBN 0262100665, ~$45)
- Hardcore coverage of selected topics.
Spoken Language Processing, Huang, Acero, Hon (paperback, 1008 pp., 2001, ISBN 0130226165, ~$85)
- Exhaustive reference for ASR.

Coursework and Grading

The coursework will consist of four programming assignments and a final reading project. Programming assignments will use C++ with the GNU g++ compiler on x86 PCs running Linux; students will be given accounts on the EE department's ILAB computer cluster.

The programming assignments will involve implementing various portions of a basic speech recognition system. Initially, a simple dynamic time warping recognizer will be written, and this will be incrementally extended during the semester to form a large vocabulary continuous speech recognizer. Extensive code infrastructure will be provided for the programming exercises, to allow students to focus on the essential algorithms rather than mundane tasks such as file input/output.

For the final reading project, students will be asked to read one or more papers about a topic not covered in depth in class, and to give a 15-minute presentation summarizing the paper(s). A list of suggested papers will be provided, or students may choose their own with instructor approval.

The final grade will be broken down as follows:

Programming assignments: 80%

Final project: 20%

Stanley F. Chen <[email protected]>
Last updated: 09/07/05