Department of Electrical Engineering / Columbia University
Speech Recognition
EECS E6870 — Fall 2012
General Information
Overview
The first portion of the course will cover fundamental topics in speech recognition: signal processing, Gaussian mixture distributions, hidden Markov models, pronunciation modeling, acoustic state tying, decision trees, finite-state transducers, search, and language modeling. Topics will be covered in sufficient detail for students to be able to implement a basic large vocabulary speech recognizer.
In the remainder of the course, selected topics from the current state-of-the-art will be discussed. We will cover several key areas in more depth and survey some advanced topics, including acoustic adaptation, discriminative training, maximum entropy models, and deep belief networks.

Prerequisites
The course assumes a knowledge of basic probability and statistics. Knowledge of digital signal processing (ELEN E4810) is helpful but not required. In addition, there will be several programming assignments during the semester; we recommend using C++ but some support for other programming languages may be provided. Thus, proficiency in at least one programming language is required and a basic knowledge of Unix or Linux is helpful.
If you do not have the prerequisites and would still like to take the course, please contact one of the instructors.

Readings
Readings will be taken from a variety of sources; PDF versions of the appropriate readings will be made available before each lecture. There is no required text; below are recommended and reference texts that students might find useful:
Recommended text:

Coursework and Grading
The coursework will consist of four programming assignments and a final reading project.
The programming assignments will involve implementing various portions of a basic speech recognition system. Initially, a simple dynamic time warping recognizer will be written, and this will be incrementally extended during the semester to form a large vocabulary continuous speech recognizer. The recommended language is C++, but some support for other programming languages may be provided. In particular, we will be providing C/C++ libraries for some of the exercises, and these must be accessible from any selected programming language. Students will be given accounts on the EE department's ILAB computer cluster to complete the programming assignments.
For the final reading project, students will be asked to read one or more papers about a topic not covered in depth in class, and to give a 10-minute presentation summarizing the paper(s). A list of suggested papers will be provided, or students may choose their own with instructor approval.
Instead of the final reading project, motivated students have the option of doing a programming/experimental project, either individually or in a group. A list of projects will be provided or students may propose their own, subject to approval from the instructors. Again, a short presentation will be required for each student.
The final grade will be broken down as follows:

Stanley F. Chen <[email protected]>
Last updated: 2012 Jul 31