Getting Started

EECS E6870: Speech Recognition

September 13, 2009


Table of Contents
1. Overview
2. Getting set up for remote login
3. Getting an ILAB account and logging on
4. Learning basic UNIX commands and text editing
5. Setting up your ILAB account
6. Language-specific setup
6.1. C++
6.2. Java
6.3. Matlab
6.4. Python
7. Learning how to debug programs

1. Overview

This document covers general information relevant for completing the programming exercises for the course. See the table of contents for the list of topics covered.


2. Getting set up for remote login

We will be using the ILAB Linux machines for the lab exercises. These machines are located in Mudd 1235, but the room requires badge access, so most of you will need to log in remotely. To log into these computers remotely, you need to use the program ssh. See http://www.ee.columbia.edu/pages/resources/systems_group/connecting_remotely.html for some basic information. Here are two free ssh implementations: PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) and OpenSSH (http://www.openssh.com/); you might want to check whether one of them is already installed on your machine.

Before you download a standalone ssh program, however, note that the best setup is to run ssh under X Windows. X Windows is a windowing system that will let you run graphical programs like Matlab on the ILAB cluster and see their output on your own (local) machine. In many contexts, ssh comes bundled with X Windows, so there's no need to install it separately.

Most UNIX/Linux machines have X Windows and ssh already installed on them. For Windows machines, one way to get X Windows and ssh is to use Cygwin (free); see http://x.cygwin.com/docs/ug/cygwin-x-ug.html for more information.


3. Getting an ILAB account and logging on

In the first class, you should have submitted information that will let us create a computer account for you on the ILAB computer cluster. If you haven't done this, contact one of the professors ASAP to get this process started. Once your account has been created, a username and password will be emailed to the address you provided.

Once you have a username and password, you can try logging onto a machine in the ILAB cluster. The machines are named micro1.ilab.columbia.edu through micro17.ilab.columbia.edu. To be able to use graphical programs like Matlab, you need to use the -X flag with ssh, e.g.,
ssh -X username@micro17.ilab.columbia.edu
Do not be alarmed if you see error messages like
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Not all of the machines may be up at a given time, so if you are having trouble with one machine, try another. To log out, type
exit


4. Learning basic UNIX commands and text editing

For those of you who are not familiar with Linux or other variants of UNIX, you need to learn how to use UNIX for basic tasks such as making directories, moving/copying files, redirection, etc. and for editing text files. You're basically on your own for this, but we did some quick Google searches and here are some pointers:


5. Setting up your ILAB account

In this section, we discuss what you need to do to set up your account for this course. By default, you will be assigned the shell bash. If you don't know what this means, you are probably using bash. If you are using a different shell, you will have to adjust the commands in this section accordingly, but in that case you should already know how.

First, check if you already have a .bash_profile and/or .bashrc file in your home directory (e.g., via ls -a ~). For the ones that don't exist, copy the versions of these files from /user1/faculty/stanchen/e6870/, e.g.:
cp /user1/faculty/stanchen/e6870/.bash_profile ~
cp /user1/faculty/stanchen/e6870/.bashrc ~
If one or both of these files already exist, manually merge the contents of the version we supply with the existing version.

You can type . ~/.bashrc (or log out and log in again) to have these changes take effect.


6. Language-specific setup

Given the surveys you filled out at the first lecture, it sounds like everybody will be doing the labs in either C++, Java, or Matlab. We'll also throw in a little Python support as we'll be using it ourselves to do some scripting.

We will be providing a small C++ library supplying basic input/output routines and some key data structures. To provide access to this library in Java and Python, we are using SWIG to generate the necessary wrapper code. This library won't be accessible from Matlab, but Matlab is only suitable for Lab 1, for which the course library doesn't give you anything that Matlab doesn't already have. Documentation for this library will be given in the labs. Matlab will also be used for visualization purposes.

For speed, Labs 2 and 4 will need to be written in C++ or Java. C++ will be the best-supported language and code examples will be given in C++, as that is the language the instructors know best. This is also the language the majority of speech recognition software is written in.


6.1. C++

We'll be using the GNU C++ compiler, g++. To compile a program, one can use an incantation like
g++ -g -Wall source-files -o output-file -lm
The flag -g signals that debugging information should be included in the executable (see Section 7); the flag -Wall signals to print lots of warnings about questionable programming constructs; and the flag -lm signals that the math library should be linked in, which you'll probably need for all of the labs. The g++ version installed, 3.4.6, is a few years old so it doesn't have the most recently proposed features of the Standard Template Library, but some extra stuff can be found in /usr/include/c++/3.4.6/ext/, e.g., hash_set (which is like unordered_set).

To automate the compilation process, one can use make; see http://mrbook.org/tutorials/make/ for a quick tutorial. Here's a sample Makefile:
CXX = g++
CXXFLAGS = -g -Wall
LDLIBS = -lm

hello : hello.C
	$(CXX) $(CXXFLAGS) hello.C -o hello $(LDLIBS)
Note that the indentation in the last line must be a tab character, not spaces. If these lines are placed in a file named Makefile in the current directory, you can recompile by typing the command make hello.


6.2. Java

We'll be using the GNU Java compiler, gcj. To compile a program Hello.java, run
gcj -C Hello.java
This will produce the file Hello.class. To run this program, you would then type
java Hello arguments

If you completed Section 5 correctly, your CLASSPATH and LD_LIBRARY_PATH environment variables should be set up correctly to use the course Java libraries located under /user1/faculty/stanchen/e6870/lib/. To test this, try compiling and running the following program Hello.java:
import edu.columbia.asr.*;

class Hello {
    static {
        System.loadLibrary("asr_lib");
    }

    public static void main(String[] args) {
        System.out.println(asr_lib.getG_zeroLogProb());
    }
}


6.3. Matlab

Typing matlab should start up Matlab. If you are running X Windows (and used the -X flag with ssh), you should see the graphical interface; otherwise, the command-line interface will start up (see Section 2). If you can't find Matlab, the full path is /usr/cad/matlab/bin/matlab.

If for some reason you are having trouble with the Matlab license server, another option is to use octave, which is open-source software mimicking Matlab. This uses the same syntax for most things, though its function library is not as extensive.


6.4. Python

If you completed Section 5 correctly, your PATH and PYTHONPATH environment variables should be set up correctly to use the desired version of Python and the course Python libraries. In particular, you should be using the Python from /user1/faculty/stanchen/pub/exec/ (run the command type python to check) and PYTHONPATH should include the directory /user1/faculty/stanchen/e6870/lib/.

The course Python module is named asr_lib. To test that things are working, you can try:
python
from asr_lib import *
print g_zeroLogProb


7. Learning how to debug programs

For those of you who are not robots or cyborgs, you will undoubtedly introduce bugs into your source code at some time or another. One simple way to debug stuff is to place “print” statements everywhere. However, it will make your life much easier in the long run if you learn how to use a debugger. A debugger lets you stop the execution of your program at any point and examine any values you wish, which makes this much more general and powerful than the “print” statement technique.

For C++, the GNU gdb debugger is available. Here are a couple of tutorials: http://www.cs.cmu.edu/~gilpin/tutorial/ and http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html; you can also get some documentation within gdb by typing help. To start gdb, type gdb program. Within gdb, type run arguments to start the program. Commands that you should know about include break, print, cont, step, next, where, up, down, and finish. Don't forget to compile your program with the -g flag (see Section 6.1).

Tip: in the provided C++ library, errors are reported either as assert() failures or as exceptions. It's useful to be able to set breakpoints on these events when debugging. To set a breakpoint on an assert() failure, you can do
b __assert_fail
If you're asked Make breakpoint pending ... library load?, say yes. To set a breakpoint on an exception such as a runtime_error, you can do
b 'std::runtime_error::runtime_error(std::string const&)'

For Java or Matlab, you're on your own.

For future reference, in the interest of preempting debugging questions, here are three procedures you should perform before asking for help in finding a bug: