The goal of this assignment is for you, the student, to implement
basic algorithms for `n`-gram language modeling. This lab involves
counting `n`-grams and implementing basic `n`-gram smoothing.
For this lab, we will be working with *Switchboard* data.
The Switchboard corpus is a collection of recordings of telephone
conversations; participants were told to have a discussion
on one of seventy topics (*e.g.*, pollution, gun control).

The lab consists of the following parts, all of which are required:

* *Part 1: Implement `n`-gram counting* --- Given some text, collect the counts of all `n`-grams needed in building a trigram language model.
* *Part 2: Implement “`+delta`” smoothing* --- Write code to compute LM probabilities for a trigram model smoothed with “`+delta`” smoothing.
* *Part 3: Implement Witten-Bell smoothing* --- Write code to compute LM probabilities for a trigram model smoothed with Witten-Bell smoothing.
* *Part 4: Evaluate various `n`-gram models on the task of `N`-best list rescoring* --- See how `n`-gram order and smoothing affect WER when doing `N`-best list rescoring for Switchboard.
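To give a feel for Parts 1–3, here is a minimal sketch in Python of trigram counting, “`+delta`” smoothing, and Witten-Bell smoothing. All function names and the padding convention (`<s>`/`</s>`) are illustrative assumptions, not the lab's actual interface, and the Witten-Bell sketch backs off recursively to a uniform distribution rather than whatever base distribution the lab specifies:

```python
from collections import Counter

def count_ngrams(tokens, order=3):
    """Collect counts of all 1- through `order`-grams from a token list,
    padding with <s>/</s> (an assumed convention, not the lab's)."""
    padded = ["<s>"] * (order - 1) + list(tokens) + ["</s>"]
    counts = Counter()
    for n in range(1, order + 1):
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return counts

def prob_plus_delta(counts, history, word, vocab_size, delta=0.5):
    """P(word | history) with "+delta" smoothing:
    (c(h, w) + delta) / (c(h) + delta * |V|)."""
    num = counts[history + (word,)] + delta
    den = counts[history] + delta * vocab_size
    return num / den

def prob_witten_bell(counts, history, word, vocab_size):
    """P(word | history) with Witten-Bell smoothing:
    lam(h) * p_ML(word | h) + (1 - lam(h)) * P(word | shorter h),
    where lam(h) = c(h) / (c(h) + N1+(h)) and N1+(h) is the number
    of distinct words seen after history h.  The base case here is
    uniform over the vocabulary (a simplifying assumption)."""
    if not history:
        return 1.0 / vocab_size
    h_count = counts[history]
    # number of distinct word types observed following this history
    n_types = len({k[-1] for k in counts
                   if len(k) == len(history) + 1 and k[:-1] == history})
    lower = prob_witten_bell(counts, history[1:], word, vocab_size)
    if h_count + n_types == 0:
        return lower  # unseen history: fall back entirely
    lam = h_count / (h_count + n_types)
    p_ml = counts[history + (word,)] / h_count if h_count else 0.0
    return lam * p_ml + (1 - lam) * lower
```

For example, `count_ngrams(["a", "b", "a"])` yields counts such as `c[("a", "b", "a")] == 1`, and both probability functions consume those counts directly; `Counter` returns 0 for unseen `n`-grams, so no explicit missing-key handling is needed.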

All of the files needed for the lab can be found in the
directory `~stanchen/e6884/lab3/`. Before
starting the lab, please read the file `lab3.txt`; this
includes all of the questions you will have to answer while
doing the lab. Questions about the lab can be posted
on Courseworks (`https://courseworks.columbia.edu/`);
a discussion topic will be created for each lab.
*Note:* The hyperlinks in this document
are enclosed in square brackets; you need an online version
of this document to find out where they point to.