This document covers general information relevant for completing the programming exercises for the course. See the table of contents for the list of topics covered.
We will be using the ILAB Linux machines for the lab exercises. These machines are located in Mudd 1235 (badge access). We don't have permission to enter the lab, so you will need to log in remotely using the program ssh. See http://www.ee.columbia.edu/pages/resources/systems_group/connecting_remotely.html for some basic information. Here are two free ssh implementations: PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) and OpenSSH (http://www.openssh.com/). (Tip: if your backspace key is acting like the delete key with PuTTY, try going to Terminal/Keyboard settings and setting the backspace key to Control-H.)
Instead of using a standalone ssh program, you can run ssh under X Windows. X Windows is a windowing system that will let you run graphical software like Matlab on the ILAB cluster while seeing the graphical output on your local machine. In many contexts, ssh will automatically come with X Windows, so there's no need to install it separately.
Most UNIX/Linux machines have X Windows and ssh already installed on them. For Windows machines, one way to get X Windows and ssh is to use Cygwin (free); see http://x.cygwin.com/docs/ug/cygwin-x-ug.html for more information. (Note: Installing Cygwin takes some effort.) In any case, the labs don't require any graphics, so installing X Windows is entirely optional.
In the first class, you should have submitted information that will let us create a computer account for you on the ILAB computer cluster. If you haven't done this, contact one of the professors ASAP to get this process started. When an account has been created, a username and password will be E-mailed to the address that was provided to us.
Once you have a username and password, you can try logging onto a machine in the ILAB cluster. The machines are named micro1.ilab.columbia.edu through micro17.ilab.columbia.edu.
To be able to use graphical programs like Matlab when using X Windows, you need to use the -X flag with ssh, e.g.,
    ssh -X username@micro17.ilab.columbia.edu
When connecting, you may see warnings such as
    Warning: untrusted X11 forwarding setup failed: xauth key data not generated
    Warning: No xauth data; using fake authentication data for X11 forwarding.
To log out, type
    exit
If many people are using the same machine, the machine may get slow and you may want to consider switching to a different machine. To see how busy a machine is, you can use the command w. This will list all the users on the machine, and the numbers after “load average” give the average number of processes running or waiting to run (averaged over the past 1, 5, and 15 minutes).
For those of you who are not familiar with Linux or other variants of UNIX, you need to learn how to use UNIX for basic tasks such as making directories, moving/copying files, redirection, etc., and for editing text files. You're basically on your own for this, but we did some quick Google searches and here are some pointers:
Here are a couple of UNIX tutorials: http://www.ee.surrey.ac.uk/Teaching/Unix/ and http://people.ischool.berkeley.edu/~kevin/unix-tutorial/toc.html. Type UNIX tutorial or some other relevant string into Google for pointers to more material.
To edit text files, if you do not already know a UNIX text editor, one option is to use X Windows as described in Section 2 and run emacs. In this scenario, emacs acts pretty much like a generic text editor.
Otherwise, the two most popular editing tools for UNIX are emacs and vi. The editor vi has weird key mappings, but is simple and compact. The editor emacs is extremely powerful and also has some weird key mappings. For tutorials, see http://jeremy.zawodny.com/emacs/emacs.html or http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html.
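If you are not running X Windows, don't already know emacs or vi, and are too lazy to read a real tutorial, here is a 1-minute tutorial for emacs. To edit the file foo, type
    emacs foo
Type text as you would in any editor. To save the file, press Control-x followed by Control-s; to exit emacs, press Control-x followed by Control-c.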
In this section, we discuss the things you need to do to set up your account for this course. By default, you will be assigned the shell bash. If you don't know what we're talking about, you are probably using bash. If you are using a different shell, you will have to adjust the commands in this section appropriately, but in that case you should know how.
If your account was newly created, first back up the default .bash_profile and .bashrc files in your home directory:
    cp ~/.bash_profile ~/.bash_profile.bak
    cp ~/.bashrc ~/.bashrc.bak
Then copy the course versions of these files into your home directory:
    cp /user1/faculty/stanchen/e6870/.bash_profile ~
    cp /user1/faculty/stanchen/e6870/.bashrc ~
You can type . ~/.bashrc (or log out and log in again) to have these changes take effect.
Given the surveys you filled out at the first lecture, we will be supporting C++ and Java for the labs. C++ will be the best-supported language as that is the language the instructors know best. This is also the language the majority of speech recognition software is written in. We'll also mention a little about Matlab and Python, in case you want to play around in these languages by yourself.
We will be providing a small C++ library supplying basic input/output routines and some key data structures. To provide access to this library in Java and Python, we are using SWIG to generate the wrapping code necessary for making this happen. (The library isn't accessible from Matlab.) Documentation for this library will be given in the labs.
We'll be using the GNU C++ compiler, g++. To compile a program, one can use an incantation like
    g++ -g -Wall source-files -o output-file -lm
The flag -g signals that debugging information should be included in the executable (see Section 7); the flag -Wall signals to print lots of warnings about questionable programming constructs; and the flag -lm signals that the math library should be linked in, which you'll probably need for all of the labs.
We'll be using version 4.7.1 of g++, a recent version that has many modern features of C++ (e.g., TR1).
To automate the compilation process, one can use make; see http://mrbook.org/tutorials/make/ for a quick tutorial. Here's a sample Makefile:
    CXX = g++
    CXXFLAGS = -g -Wall
    LDLIBS = -lm

    hello : hello.C
    	$(CXX) $(CXXFLAGS) hello.C -o hello $(LDLIBS)
(In the actual Makefile, the command line under the hello target must begin with a tab character.)
We'll be using the Java compiler javac. To compile a program Hello.java, run
    javac Hello.java
To run the compiled program, type
    java Hello arguments
If you completed Section 5 correctly, your CLASSPATH and LD_LIBRARY_PATH environment variables should be set up correctly to use the course Java libraries located under /user1/faculty/stanchen/e6870/lib/. To test this, try compiling and running the following program Hello.java:
    import edu.columbia.asr.*;

    class Hello {
        static {
            System.loadLibrary("asr_lib");
        }

        public static void main(String[] args) {
            System.out.println(asr_lib.getG_zeroLogProb());
        }
    }
Typing matlab should start up Matlab. If you are running X Windows (and used the -X flag with ssh), you should see the graphical interface; otherwise, the command-line interface will start up (see Section 2). If you can't find Matlab, the full path is /usr/cad/matlab/bin/matlab.
If for some reason you are having trouble with the Matlab license server, another option is to use octave, which is open-source software mimicking Matlab. This uses the same syntax for most things, though its function library is not as extensive.
If you completed Section 5 correctly, your PATH and PYTHONPATH environment variables should be set up correctly to use the desired version of Python and the course Python libraries. In particular, you should be using the Python from /user1/faculty/stanchen/pub/exec/ (run the command type python to check) and PYTHONPATH should include the directory /user1/faculty/stanchen/e6870/lib/. The course Python module is named asr_lib. To test that things are working, you can try:
    python
    from asr_lib import *
    print g_zeroLogProb
For those of you who are not robots or cyborgs, you will undoubtedly introduce bugs into your source code at some time or another. One simple way to debug stuff is to place “print” statements everywhere. However, it will make your life much easier in the long run if you learn how to use a debugger. A debugger lets you stop the execution of your program at any point and examine any values you wish, which makes this much more general and powerful than the “print” statement technique.
For C++, the GNU gdb debugger is available. Here are a couple of tutorials: http://www.cs.cmu.edu/~gilpin/tutorial/ and http://www.unknownroad.com/rtfm/gdbtut/gdbtoc.html; you can also get some documentation within gdb by typing help. To start gdb, type gdb program. Within gdb, type run arguments to start the program. Commands that you should know about include break, print, cont, step, next, where, up, down, and finish. Don't forget to compile your program with the -g flag (see Section 6.1).
Tip: in the provided C++ library, errors are reported either as assert() failures or as exceptions. It's useful to be able to set breakpoints on these events when debugging. To set a breakpoint on an assert() failure, you can do
    b __assert_fail
To set a breakpoint on a thrown exception, you can do
    b __cxa_throw
For Java or Matlab, you're on your own.
For future reference, in the interest of preempting debugging questions, here are three procedures you should perform before asking for help in finding a bug:
Code review — Carefully read each line of code to make sure it says what you intended it to say. People make a surprising number of essentially typographical mistakes.
Data review — For each variable with a nontrivial lifetime, read the code and make sure the variable is constructed, initialized, updated, and destructed correctly.
Step through the code — Step through each line of code in a debugger, examining variables to make sure they have the values you think they should have. You may need to step through the same code multiple times, to test different situations that may arise.