HW #6

Due Monday May 5th 11:59pm

1) Decision surface for two squares with Gaussian and nearest neighbor classifiers (30pts)
We observe four data points each for two pattern classes y=1 and y=2 in 2D (x1, x2):

C1:  (0, 0), (0, 1), (1, 0), (1, 1)
C2:  (2, 3), (2, 2), (3, 2), (3, 3)

1.1) (10pts) Assume that both classes have Gaussian probability density functions and equal priors, i.e., P(y=1)=P(y=2)=1/2.
Compute the means and covariance matrices of the two class-conditional distributions p(x|y=1) and p(x|y=2).
Obtain the equation of the Bayes decision boundary between class 1 and class 2, and sketch it on the x1-x2 plane.
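
As a rough illustration of the quantities involved, the sketch below estimates a mean and covariance for 2D points and evaluates the Gaussian discriminant g(x) = -1/2 (x-m)^T S^-1 (x-m) - 1/2 ln|S| + ln P(y). The Bayes boundary is where the two discriminants are equal. The points, the function names, and the choice of the maximum-likelihood covariance (dividing by n rather than n-1) are assumptions made for this example only; they are not the data or the conventions of the assignment.

```python
import math

def mean_and_cov(points):
    """Sample mean and maximum-likelihood covariance (divide by n) of 2D points."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / n
    s22 = sum((p[1] - m2) ** 2 for p in points) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    return (m1, m2), ((s11, s12), (s12, s22))

def discriminant(x, mean, cov, prior):
    """Gaussian log-discriminant, dropping the constant shared by all classes."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x-m)^T S^-1 (x-m) using the closed-form 2x2 inverse.
    quad = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) + math.log(prior)

def classify(x, models):
    """Pick the label whose discriminant is largest; models maps label -> (mean, cov, prior)."""
    return max(models, key=lambda label: discriminant(x, *models[label]))

# Hypothetical toy points, chosen only so the numbers are easy to check by hand.
class_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
class_b = [(4, 0), (6, 0), (4, 2), (6, 2)]
mean_a, cov_a = mean_and_cov(class_a)
mean_b, cov_b = mean_and_cov(class_b)

equal_priors = {"A": (mean_a, cov_a, 0.5), "B": (mean_b, cov_b, 0.5)}
skewed_priors = {"A": (mean_a, cov_a, 0.8), "B": (mean_b, cov_b, 0.2)}
print(classify((2.5, 1.0), equal_priors), classify((3.0, 1.0), skewed_priors))
```

When the two estimated covariance matrices are equal, the quadratic terms cancel and the boundary is a straight line. Changing the priors only adds a constant to each discriminant, which shifts that line without rotating it.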

1.2) (10pts) Does the decision boundary change if P(y=1) = 0.6, or if P(y=1) = 0.2? If yes, sketch or plot the new boundary (or boundaries).

1.3) (10pts) Assume we use a nearest-neighbor classifier on these eight points. Draw the Voronoi diagram on the x1-x2 plane, and find the corresponding decision boundary.

2) Recognizing hand-written digits (40 pts + 15 pt bonus)

Read, understand, and run the classification example hw6_example.m. It involves four implemented steps, with two more steps left to you:

%% 1. load the data %%%%%%%%%%%%

%% 2. Specify the training and testing data involved in this task
% .. we're working with four digits [1 3 5 7] only

%% 3. classification with minimum distance classifiers
% this needs mdist_learn.m and mdist_classify.m for learning and applying the minimum distance classifier
% .. this step should finish in a second or two and yield an error rate ~17%

%% 4. 1NN classification with Euclidean metric %%%%%%%%%%
% this needs NN_euclidean.m for performing 1-NN classification
% .. this step should finish in about 2.5 minutes and yield an error rate ~1.5%

Can we do better in classifying digits than (a) the simple minimum distance classifier, or (b) the nearest neighbor classifier?

2.1) (10 pts) Change the 1-nearest-neighbor algorithm into k-nearest-neighbor with the L3 norm, i.e., the distance d(u, v) = (sum_i |u_i - v_i|^3)^(1/3). Run the classifier with k=3 and k=5, and report the classification error rates.
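
The change from 1-NN to k-NN amounts to swapping the distance function and replacing "take the closest label" with a majority vote. The sketch below shows that logic on made-up 2D points. It is written in Python rather than in the MATLAB of NN_euclidean.m, and the function names, the toy data, and the tie-breaking rule are assumptions for illustration, not part of the provided code.

```python
from collections import Counter

def minkowski(u, v, p=3):
    """L_p distance between two equal-length vectors (p=3 gives the L3 norm)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def knn_classify(train_x, train_y, query, k=3, p=3):
    """Return (label, votes) from a majority vote over the k nearest training points."""
    # Sort training indices by distance to the query, nearest first.
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:k])
    # Ties fall to the label encountered first, i.e. the one with the nearest vote.
    label, count = votes.most_common(1)[0]
    return label, count

# Hypothetical toy data: two well-separated clusters.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(train_x, train_y, (1, 1), k=3))
print(knn_classify(train_x, train_y, (3, 3), k=5))
```

Returning the vote count alongside the label makes it easy to inspect how unanimous the neighbors were for each test sample.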

2.2) (10pts) Find out which digits are misclassified by 1-NN and k-NN. Display their images and include them in your writeup. Are the errors reasonable? Do the neighbor votes in k-NN correspond to the confidence about a digit?

2.3) (5 pts) Implement one of the following three options and report the classification error rate: (1) PCA/KL-transform on the vector, followed by k-NN; (2) linear perceptron (with netlab); (3) SVM classifier (with libSVM).

2.4) (15 pts) Discuss other possible ways of improving performance. List at least three approaches from what has been covered in class so far, in books, or in reference papers and websites. For each proposed approach, briefly describe how to realize it technically and why it may be useful for this task.

2.5) (10 bonus points) Implement one of your proposed solutions in 2.4 and see if it indeed improves the result.

2.6) 5 more bonus points for the lowest classification error rate obtained among all submissions of question 2.5.

3)  Source codes for letters and words (30pts)

In this problem we look at a language written with half the English alphabet. The 13 designated letters appear with the following probabilities.

Letter        a       b       c       e       g       i       l       m       o       r       s       t       u
Probability   0.1000  0.0500  0.0500  0.1500  0.0500  0.1000  0.0500  0.0500  0.0800  0.0800  0.0800  0.0800  0.0800
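
A probability table like this is all that entropy and Huffman coding need as input. The sketch below shows one way to compute both for a small three-symbol alphabet. The alphabet, the function names, and the tie-breaking order are made up for illustration and are not the letters or the conventions of this problem.

```python
import heapq
import math

def entropy_bits(probs):
    """Entropy in bits per symbol of an i.i.d. source with the given probabilities."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def huffman_code(probs):
    """Build a binary Huffman code; returns a dict symbol -> codeword string."""
    # Heap entries are (probability, tie_breaker, partial code table).
    # The integer tie_breaker keeps the dicts from ever being compared.
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codewords.
        p0, _, c0 = heapq.heappop(heap)
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

def average_length(probs, code):
    """Expected codeword length in bits per symbol."""
    return sum(probs[s] * len(code[s]) for s in probs)

# Hypothetical toy alphabet, not the table above.
toy = {"x": 0.5, "y": 0.25, "z": 0.25}
code = huffman_code(toy)
print(entropy_bits(toy), code, average_length(toy, code))
```

The individual codewords depend on how ties are broken and on which branch is labeled 0 or 1, so two correct Huffman codes can look different. Their average lengths are the same.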

3.1) (5pts) Compute the entropy of the language, assuming the letters in any string are i.i.d.
3.2) (10 pts) Construct a binary Huffman code for these symbols and compute its average code length.
3.3) (5 pts) Find the arithmetic code for the word "letter".
3.4) (10 pts) The arithmetic code for a five-letter word is 0.45518. Decode this word.
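
For the arithmetic coding parts, the sketch below shows the interval-narrowing idea on a made-up three-symbol alphabet. Several conventions here are assumptions rather than anything fixed by this problem: symbols are assigned sub-intervals of [0, 1) in the order they are listed, intervals are half-open, the decoder is told the message length, and any number inside the final interval counts as a valid code. The convention used in class (symbol ordering, and which number in the interval is reported) may differ. Exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction

def cumulative_intervals(probs):
    """Assign each symbol a half-open slice [low, high) of [0, 1), in listed order."""
    intervals, low = {}, Fraction(0)
    for symbol, p in probs.items():
        intervals[symbol] = (low, low + p)
        low += p
    return intervals

def encode_interval(message, probs):
    """Narrow [0, 1) once per symbol; any value in the result encodes the message."""
    intervals = cumulative_intervals(probs)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = intervals[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, length, probs):
    """Recover `length` symbols by locating the value and rescaling it each step."""
    intervals = cumulative_intervals(probs)
    out = []
    for _ in range(length):
        for symbol, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                out.append(symbol)
                value = (value - s_low) / (s_high - s_low)
                break
        else:
            raise ValueError("value lies outside [0, 1)")
    return "".join(out)

# Hypothetical toy alphabet, not the 13-letter table of this problem.
toy = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = encode_interval("bac", toy)
print(low, high, decode((low + high) / 2, 3, toy))
```

The width of the final interval equals the product of the symbol probabilities, which is why likely messages need fewer digits to pin down.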

Prepared by Lexing Xie < xlx at ee dot columbia dot edu >, 2008-04-21