To familiarize you with what the programming exercises will be like, we will go through a mini-exercise. So that you only have to write the interesting bits of code in a speech recognizer for the labs, we have written extensive amounts of “glue” code. Each program will be compiled from a large number of C++ files, but almost all of these files have already been written for you. We will just leave out parts of a file or two for you to fill in.
To see what this is like, let's get started on the mini-exercise. First, create a new subdirectory for us to work in and go there:
    mkdir -p ~/e6884/lab0/
    cd ~/e6884/lab0/
Next, copy over the files needed for the exercise:
    cp ~stanchen/e6884/lab0/Lab0_FE.C .
    cp ~stanchen/e6884/lab0/.mk_chain .
Now, open the file Lab0_FE.C in a text editor. You might notice a bunch of weird constructs in the file that you don't understand. Don't freak out yet; there will be plenty of time for this later. Look for the markers BEGIN_LAB and END_LAB near the end of the file. The code between these markers is the only section of the file you need to understand. The rest of the file can be ignored, though you may want to read the comments there for your own edification. If you are interested in exploring the related header files and source code, look in the following directories:
    ~stanchen/pub/zeeapi/inc/
    ~stanchen/pub/zeeapi/src/
    ~stanchen/pub/zeelib/inc/
    ~stanchen/pub/zeelib/src/
    ~stanchen/pub/e6884/inc/
    ~stanchen/pub/e6884/src/
In this exercise, you will be writing a simple signal processing module that takes as input a 2-D array containing a vector of floating-point numbers (or features) for each time unit (or frame) in a speech signal, and outputs a scaled version of the array. Since this is Lab 0, we are going to tell you what the answer is. Type/paste in the following code between the BEGIN_LAB and END_LAB markers:
    for (int frm = 0; frm < inFrames; ++frm) {
        for (int dim = 0; dim < inDim; ++dim)
            outBuf[frm][dim] = inBuf[frm][dim] * scaleFactorM;
    }
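If you would like to try the same loop outside the lab framework, here is a minimal standalone sketch. It is not the code in Lab0_FE.C: the `Matrix` alias and the `scaleFrames` function are hypothetical names made up for illustration, and a `std::vector` of rows stands in for whatever buffer types the lab code actually uses. It also assumes every frame has the same number of dimensions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the lab's 2-D buffers:
// one row per frame, one column per feature dimension.
using Matrix = std::vector<std::vector<double>>;

// Returns a scaled copy of inBuf. The loop mirrors the lab code above;
// scaleFactor plays the role of scaleFactorM.
// Assumes inBuf is rectangular (all frames have the same length).
Matrix scaleFrames(const Matrix& inBuf, double scaleFactor) {
    const std::size_t inFrames = inBuf.size();
    const std::size_t inDim = inFrames > 0 ? inBuf[0].size() : 0;

    // Allocate an output buffer with the same shape as the input.
    Matrix outBuf(inFrames, std::vector<double>(inDim));

    for (std::size_t frm = 0; frm < inFrames; ++frm) {
        for (std::size_t dim = 0; dim < inDim; ++dim)
            outBuf[frm][dim] = inBuf[frm][dim] * scaleFactor;
    }
    return outBuf;
}
```

For example, calling `scaleFrames` on a 3-frame, 2-dimensional input with a scale factor of 2.0 gives back a 3-by-2 array in which every value has been doubled.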
In terms of the bigger picture, we can view signal processing in ASR as being composed of a number of processing steps applied in sequence. Each processing module takes the matrix of values produced by the previous module (consisting of feature values for each frame in an utterance) and generates a matrix of values to be fed to the next module. The above example implements a module that does simple scaling; in Lab 1, you'll be implementing a number of modules needed in producing MFCC features.
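The idea of modules applied in sequence can be sketched roughly as follows. This is only an illustration of the concept, not how the lab's glue code is organized: `Matrix`, `Module`, `makeScaler`, `makeOffset`, and `runPipeline` are all hypothetical names, and the offset module is invented just to have a second stage to chain.

```cpp
#include <functional>
#include <vector>

// Hypothetical types: a matrix of feature values (one row per frame),
// and a module that maps one such matrix to another.
using Matrix = std::vector<std::vector<double>>;
using Module = std::function<Matrix(const Matrix&)>;

// Hypothetical module: multiplies every feature value by a constant.
Module makeScaler(double factor) {
    return [factor](const Matrix& in) {
        Matrix out = in;
        for (auto& frame : out)
            for (auto& value : frame)
                value *= factor;
        return out;
    };
}

// Hypothetical module: adds a constant to every feature value.
Module makeOffset(double offset) {
    return [offset](const Matrix& in) {
        Matrix out = in;
        for (auto& frame : out)
            for (auto& value : frame)
                value += offset;
        return out;
    };
}

// Feeds the output of each module into the next one, in order.
Matrix runPipeline(const std::vector<Module>& modules, Matrix features) {
    for (const auto& stage : modules)
        features = stage(features);
    return features;
}
```

With this sketch, `runPipeline({makeScaler(2.0), makeOffset(1.0)}, features)` first doubles every value and then adds one to each result. Each stage only needs to know the shape of the matrix it receives, which is what makes it possible to swap individual modules in and out.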