Topics in Signal Processing: Speech Recognition
ELEN E6884/COMS 86884 - Fall 2005
The first portion of the course will cover fundamental topics in speech recognition: signal processing, Gaussian mixture distributions, hidden Markov models, pronunciation modeling, acoustic state tying, decision trees, finite-state transducers, search, and language modeling. Topics will be covered in sufficient detail for students to be able to implement a basic large vocabulary speech recognizer.
In the remainder of the course, selected topics from the current state-of-the-art will be discussed. We will cover several key areas in more depth, and survey some advanced topics, including acoustic adaptation, discriminative training, and audio-visual speech recognition.
The course assumes a knowledge of basic probability and statistics. Knowledge of digital signal processing (ELEN E4810) is recommended. In addition, there will be many challenging C++ programming projects during the semester, so a knowledge of C or C++ is required and a basic knowledge of Unix or Linux will be helpful. While an effort will be made not to use an excessive number of C++ constructs in the programming assignments, students who do not know C++ will be expected to pick up the requisite C++ competence on their own. If you do not have the prerequisites and would still like to take the course, please contact one of the instructors.
Readings will be taken from a variety of sources; PDF versions of the appropriate readings will be made available before each lecture. There is no required text; below are recommended and reference texts that students might find useful:
Recommended text:
Coursework and Grading
The coursework will consist of four programming assignments and a final reading project. Programming assignments will use C++ with the GNU g++ compiler on x86 PCs running Linux; students will be given accounts on the EE department's ILAB computer cluster.
The programming assignments will involve implementing various portions of a basic speech recognition system. Initially, a simple dynamic time warping recognizer will be written, and this will be incrementally extended during the semester to form a large vocabulary continuous speech recognizer. Extensive code infrastructure will be provided for the programming exercises, to allow students to focus on the essential algorithms rather than mundane tasks such as file input/output.
For the final reading project, students will be asked to read one or more papers about a topic not covered in depth in class, and to give a 15-minute presentation summarizing the paper(s). A list of suggested papers will be provided, or students may choose their own with instructor approval.
The final grade will be broken down as follows:
Stanley F. Chen <[email protected]>
Last updated: 09/07/05