The goal of this assignment is for you, the student, to write a basic ASR front end and to evaluate it using a dynamic time warping (DTW) recognition system. It is meant to help you understand which basic signal processing steps are used in ASR, and why they are taken.
The lab consists of the following parts:
Part 0: Familiarization with the data (Required) --- Listen to a few utterances in the data set, until you believe you are processing actual speech signals.
Part 1: Write a front end (Required) --- Write a complete mel-frequency cepstral coefficient (MFCC) front end, except for the FFT, which will be provided.
Part 2: Implement dynamic time warping (Optional) --- Write a function that implements DTW.
Part 3: Evaluate different front ends using a DTW recognizer (Required) --- Run experiments on the TIDIGITS data set comparing the performance of different portions of the front end you implemented.
Part 4: Try to beat the MFCC front end (Optional) --- Modify the MFCC front end (or do something completely different) to get better performance. The student achieving the best performance on a test set (which will not be released until after the assignment is due) will be awarded some sort of crappy prize.
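To give a rough idea of what the DTW recognizer in Parts 2 and 3 does, here is a minimal sketch in Python. It is not the code provided with the lab: the function names dtw_distance and recognize, the Euclidean frame distance, and the three allowed moves (advance in one sequence, the other, or both) are all assumptions made for illustration, and the lab may specify a different language, distance, or set of path constraints.

```python
import math


def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of feature vectors.

    Assumption: frames are compared with Euclidean distance, and the warping
    path may advance in `a`, in `b`, or in both at each step.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a
    # with the first j frames of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the lowest DTW cost.

    `templates` is a hypothetical dict mapping a label (e.g. a digit name)
    to one reference sequence of feature vectors.
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))
```

In this sketch each utterance is a list of frames, and each frame is a list of floats such as the MFCC vector for that frame. Recognition simply picks the training template that warps onto the test utterance most cheaply, which is why changes to the front end show up directly in the recognizer's error rate.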
All of the files needed for the lab can be found in the directory ~stanchen/e6884/lab1/. Before starting the lab, please read the file lab1.txt; this includes all of the questions you will have to answer while doing the lab. Questions about the lab can be posted on Courseworks (https://courseworks.columbia.edu/); a discussion topic will be created for each lab. Note: The hyperlinks in this document are enclosed in square brackets; you need an online version of this document to find out where they point to.