Department of Electrical Engineering / Columbia University
Speech Recognition
EECS E6870, Spring 2016
General Information
Overview

The first portion of the course will cover fundamental topics in speech recognition: signal processing, Gaussian mixture distributions, the Expectation-Maximization algorithm, deep neural networks, hidden Markov models, pronunciation modeling, decision trees, language modeling, finite-state transducers, and search. Topics will be covered in sufficient detail for students to be able to implement a basic large vocabulary speech recognizer.

In the remainder of the course, selected topics from the current state of the art will be discussed. We will cover several key areas in more depth and survey some advanced topics, including acoustic adaptation, discriminative training, and maximum entropy models.

Prerequisites

The course assumes a knowledge of basic probability and statistics. Knowledge of digital signal processing (ELEN E4810) is helpful but not required. In addition, there will be several programming assignments in C++. Only basic features of C++ will be used, so while we do not require proficiency in C++, proficiency in at least one programming language is required. A basic knowledge of Unix or Linux is also helpful. If you do not have the prerequisites and would still like to take the course, please contact one of the instructors.

Readings

Readings will be taken from a variety of sources; PDF versions of the appropriate readings will be made available before each lecture. There is no required text; below are recommended and reference texts that students might find useful. (Prices are from Amazon as of Jan. 2016.)

Recommended text:
Coursework and Grading

The coursework will consist of five programming assignments in C++ and a final reading project. The programming assignments will involve implementing various portions of a basic speech recognition system. Initially, a simple dynamic time warping recognizer will be written, and this will be incrementally extended during the semester to form a large vocabulary continuous speech recognizer. Students will be given accounts on the EE department's ILAB computer cluster to complete the programming assignments.

For the final reading project, students will be asked to read one or more papers about a topic not covered in depth in class, and to write a 1500-2500 word paper reviewing and analyzing the material. A list of suggested papers will be provided, or students may choose their own with instructor approval.

Instead of the final reading project, motivated students have the option of doing a programming/experimental project, either individually or in a group. A list of projects will be provided, or students may propose their own, subject to approval from the instructors. Each team must write a paper describing its work and give a 10-15 minute presentation to the class.

The final grade will be broken down as follows:
Stanley F. Chen <[email protected]>