2. Part 1: Playing With FSM's and the IBM FSM Toolkit

In this part, we introduce the IBM FSM toolkit as a first step in learning more about the static graph expansion process. In particular, you will be using the program FsmOp, which is a utility that can perform a variety of finite-state operations on weighted FSA's and FST's. The program FsmOp is like a calculator that operates on FSM's rather than real values, where arguments are input using [reverse Polish notation] as is used in some [HP calculators]. For example, to produce the composition of an FSA held in the file foo.fsm and an FST held in bar.fsm, you would use the command
FsmOp foo.fsm bar.fsm -compose > result.fsm
Operations begin with the “-” character, and include -compose, -determinize, and -minimize among many others. By default, the resulting FSM is written to standard output, but if the last argument supplied is a filename (rather than an operation), the resulting FSM will be written to that file instead. Thus, the command
FsmOp foo.fsm bar.fsm -compose result.fsm
has the same effect as the last example.

We explain the default FSA format through an example:
1   2   foo
1   2   bar
2   3   moo
3
Each line with three fields describes an arc in the FSM; the format is: src-state dst-state label. States need not be numbered starting from 1; the label <epsilon> is used to represent the empty label. Each line with a single field lists a final state. The first state mentioned in the file is the start state. Thus, the above FSA file corresponds to:

We drew the above Postscript diagram using the following command:
FsmOp foo.fsm -draw | dot -Tps > foo.ps
You can use similar commands to help you visualize FSM's.

For weighted FSA's, each line can optionally be followed with a cost, or negative log probability base 10. If a cost is omitted on a line, it is taken to be zero. For example, here is a weighted FSA:
1   2   foo
1   2   bar 2.4
2   3   <epsilon>
2   1.2
3
corresponding to

Finite-state transducers have a similar format, except lines representing arcs have an extra field:
src-state dst-state in-label out-label [optional-cost]
In addition, to signal that a file holds an FST rather than an FSA, the following line should be included at the start of the file:
# transducer: true
Here is an example FST:
# transducer: true
1   2   ax  AX
1   2   bar BAR 1.0
2   3   moo <epsilon>
2   1.2
3

For this part of the lab, you will have to create various FSM's and perform operations on them, as described in lab4.txt. Here are some hints:

To prepare for this part, create the relevant subdirectory and copy over the needed files:
mkdir -p ~/e6884/lab4/
cd ~/e6884/lab4/
cp ~stanchen/e6884/lab4/copy/* .
cp ~stanchen/e6884/lab4/.mk_chain .
This will also copy over all files needed in later parts of the lab.