Speech Recognition

EECS E6870 — Fall 2012

General Information

Instructors: Bhuvana Ramabhadran <[email protected]>
Michael Picheny <[email protected]>
Stanley F. Chen <[email protected]>
(Note: in e-mail, start the subject line with "EECS E6870:")
IBM T.J. Watson Research Center, Yorktown Heights, NY

Instructor office hours: Monday 6:40-7:30pm (after class); before class by appointment

Teaching assistant: Xiao-Ming Wu <[email protected]>

TA office hours: Monday 2-4pm, 7LE3 Schapiro (CEPSR) or by appointment

Required text: none; see below for recommended texts

Lectures: Monday, 4:10-6:40pm
Location: 633 Mudd

Credits: 3

Course web site: http://www.ee.columbia.edu/~stanchen/fall12/e6870/

Overview

The first portion of the course will cover fundamental topics in speech recognition: signal processing, Gaussian mixture distributions, hidden Markov models, pronunciation modeling, acoustic state tying, decision trees, finite-state transducers, search, and language modeling. Topics will be covered in sufficient detail for students to be able to implement a basic large vocabulary speech recognizer.

In the remainder of the course, selected topics from the current state of the art will be discussed. We will cover several key areas in more depth and survey some advanced topics, including acoustic adaptation, discriminative training, maximum entropy models, and deep belief networks.

Prerequisites

The course assumes knowledge of basic probability and statistics. Knowledge of digital signal processing (ELEN E4810) is helpful but not required. In addition, there will be several programming assignments during the semester; we recommend using C++, but some support for other programming languages may be provided. Proficiency in at least one programming language is therefore required, and a basic knowledge of Unix or Linux is helpful. If you do not have the prerequisites and would still like to take the course, please contact one of the instructors.

Readings

Readings will be taken from a variety of sources; PDF versions of the appropriate readings will be made available before each lecture. There is no required text; below are recommended and reference texts that students might find useful:

Recommended text:

Speech Synthesis and Recognition, John Holmes and Wendy Holmes (2nd ed., paperback, 298 pp., 2001, ISBN 0748408576, ~$63)
- Good introductory text covering many areas.

Reference texts:

Theory and Applications of Digital Signal Processing, Rabiner, Schafer (hardcover, 1056 pp., 2010, ISBN 0136034284, ~$143)
- Reference for signal processing.
Speech and Language Processing, Jurafsky, Martin (2nd ed., hardcover, 1024 pp., 2008, ISBN 0131873210, ~$116)
- Reference for language modeling and text processing.
Statistical Methods for Speech Recognition, Jelinek (hardcover, 305 pp., 1998, ISBN 0262100665, ~$38)
- Hardcore coverage of selected topics.
Spoken Language Processing, Huang, Acero, Hon (paperback, 1008 pp., 2001, ISBN 0130226165, ~$73)
- Exhaustive reference for ASR.

Coursework and Grading

The coursework will consist of four programming assignments and a final reading project. The programming assignments will involve implementing various portions of a basic speech recognition system. Students will first write a simple dynamic time warping recognizer, which will be incrementally extended during the semester to form a large vocabulary continuous speech recognizer. The recommended language is C++, but some support for other programming languages may be provided. In particular, we will be providing C/C++ libraries for some of the exercises, so any language you select must be able to call these libraries. Students will be given accounts on the EE department's ILAB computer cluster to complete the programming assignments.
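
For orientation, here is a minimal sketch of the idea behind a dynamic time warping recognizer: an utterance is compared against one stored template per word, and the word whose template aligns most cheaply is chosen. This is an illustration only, not the assignment's starter code or reference solution. The function names, the Euclidean local distance, the three allowed alignment steps, and the one-dimensional toy features are all assumptions, and Python is used here only for brevity.

```python
import math


def dtw_distance(a, b):
    """Cumulative DTW alignment cost between two sequences of feature vectors.

    Assumptions for this sketch: Euclidean local distance and the three
    basic steps (advance in a, advance in b, or advance in both).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the lowest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Hypothetical toy data: one-dimensional "feature vectors" per frame.
    templates = {
        "yes": [(0.0,), (1.0,), (2.0,)],
        "no": [(5.0,), (5.0,), (4.0,)],
    }
    # A slower rendition of "yes": some frames are repeated.
    utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
    print(recognize(utterance, templates))
```

Because the alignment may repeat frames, the slower rendition in the toy data still matches its template despite the difference in length. A real recognizer would use acoustic feature vectors computed by a signal processing front end rather than hand-written numbers.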

For the final reading project, students will be asked to read one or more papers about a topic not covered in depth in class, and to give a 10-minute presentation summarizing the paper(s). A list of suggested papers will be provided, or students may choose their own with instructor approval.

Instead of the final reading project, motivated students have the option of doing a programming/experimental project, either individually or in a group. A list of projects will be provided, or students may propose their own, subject to approval from the instructors. As with the reading project, each student will be required to give a short presentation.

The final grade will be broken down as follows:

Programming assignments: 80%

Final project: 20%

Stanley F. Chen <[email protected]>
Last updated: 2012 Jul 31