EE 4830 Spring 2007

Problem Set #6

Due Monday April 23rd 11:59pm as hardcopy in Junfeng's mailbox #K4; optionally *also* submit an electronic copy (report + code) to both Lexing (lx21) and Junfeng (jh2700).

50 points + bonus. Problem 1 is analytical; problem 2 is experimental.

1) Decision surface of Gaussian variables.
Two pattern classes C1 and C2 have Gaussian probability density functions in two dimensions (x1, x2). We observe four data points from each class:

C1:  (0, 0), (0, 2), (2, 0), (2, 2)
C2:  (4, 6), (4, 4), (6, 4), (6, 6)
a) (10pts) Assume Pr(C1)=Pr(C2) =1/2. Obtain the equation of the Bayes decision boundary between C1 and C2, and sketch it on the x1-x2 plane.

b) (8 pts) Does the decision boundary change if Pr(C1) = 0.6, or Pr(C1) = 0.2? If so, sketch the new boundary (or boundaries).

c) (7 pts) Assume we use a Nearest-Neighbor classifier on these eight points. Draw the Voronoi diagram on the x1-x2 plane, and find the corresponding decision boundary.
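For reference, recall the standard result for two Gaussian classes that share a common covariance matrix Sigma: the Bayes decision boundary is linear. A sketch of the derivation, with mu_1, mu_2 the class means (this is a general reminder, not the worked answer):

```latex
g_i(x) = \ln \Pr(C_i) - \tfrac{1}{2}(x - \mu_i)^\top \Sigma^{-1} (x - \mu_i), \quad i = 1, 2
```

Setting $g_1(x) = g_2(x)$ and cancelling the common quadratic term $x^\top \Sigma^{-1} x$ gives the linear boundary

```latex
w^\top x + w_0 = 0, \qquad
w = \Sigma^{-1}(\mu_1 - \mu_2), \qquad
w_0 = -\tfrac{1}{2}(\mu_1 + \mu_2)^\top \Sigma^{-1} (\mu_1 - \mu_2) + \ln \frac{\Pr(C_1)}{\Pr(C_2)}
```

Note how the prior ratio enters only through $w_0$: changing $\Pr(C_1)$ shifts the boundary without rotating it.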

2) Gender classification with face images.
Download the downsized Rice face image database (in PNG) from here (the originals are here, at a resolution of 250x300). Build a classifier to distinguish male faces from female faces. Here are a few steps you can follow.

2.1) List all image files

% find all face images in the dir
imgdir = 'D:\DIP\rice_elec301_faces\classify_25x30_png';
imgname = dir(fullfile(imgdir, '*.png'));
imgname = {imgname.name};
N = length(imgname);

% extract people's names from file names
for i = 1 : N
    j = find(double(imgname{i}) < 65);
    person_name{i} = imgname{i}(1:j(1)-1);
end

all_names = unique(person_name);
% verify that length(all_names) = 29 : 29 unique people in the DB
female_names = {'amber', 'amy', 'anita', 'jill', 'joan', 'pooja', 'tulika'};
2.2) (5 pts) Read images. Visualize the resulting set of image vectors (matlab command: imagesc() ).
% read in images, stretch them to a vector, and compile labels
imdim = [25 30];
imvec = zeros(N, prod(imdim));
imlabel = -ones(N, 1);
person_label = -ones(N, 1);
for i = 1 : N
    im = imread(fullfile(imgdir, imgname{i}));
    imvec(i, :) = double(im(:));

    % label gender: female +1 / male -1
    if any(strcmp(person_name{i}, female_names))
        imlabel(i) = 1;
    end

    % remember which person this image belongs to
    person_label(i) = strmatch(person_name{i}, all_names, 'exact');
end
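One possible way to visualize the image vectors for 2.2, assuming the variables imvec and imdim from the snippet above:

```matlab
% view the whole dataset: each row of imvec is one image vector
figure; imagesc(imvec); colormap(gray); colorbar;
xlabel('pixel index'); ylabel('image index');

% sanity check: reshape one row back into a 25x30 image
figure; imagesc(reshape(imvec(1, :), imdim));
colormap(gray); axis image;
```

Visible horizontal bands in the first plot usually correspond to lighting differences between images; this can motivate the normalization step in 2.4.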
2.3) (10pts) Build two baseline classifiers and report their error rates: (a) a minimum distance classifier, taking the class mean as the prototype for each class; (b) a linear SVM classifier. Use an SVM package of your choice, e.g., libSVM or SVMlight. The libSVM authors have also provided a matlab version; simply download it and add it to your matlab path. For those using matlab earlier than R14SP3 who have problems building the .dll s, there is a pre-built version here (for windows).
You can also look at A practical guide to SVM classification for a head start on SVM tips and tricks.
pred_label = zeros(N, 1); % a vector for storing the prediction output

for j = 1 : length(all_names) % loop over each person

    person_ind = find(person_label == j);
    train_ind = setdiff(1:N, person_ind);
    train_vec = imvec(train_ind, :);
    train_label = imlabel(train_ind);

    test_vec = imvec(person_ind, :);
    test_label = imlabel(person_ind);

    % implement a minimum distance classifier here
    % measure and report its error rate ...

    % train an SVM with linear kernel '-t 0', and use it to classify the test images
    svm_model = svmtrain(train_label, train_vec, '-t 0');
    pred_label(person_ind) = svmpredict(test_label, test_vec, svm_model);

end

% report your evaluation, see lecture 11 slide #22 for how to compute error rate
fprintf('error rate: %f %%\n', ...
% optionally list/plot the correctly classified and incorrectly classified images
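One way to fill in the minimum-distance placeholder inside the loop above, using the scaffold's train_vec / train_label / test_vec variables (the proto_* and d_* names are illustrative):

```matlab
% minimum distance classifier: the mean training vector of each
% gender serves as that class's prototype
proto_f = mean(train_vec(train_label ==  1, :), 1);  % female prototype
proto_m = mean(train_vec(train_label == -1, :), 1);  % male prototype

% assign each test image to the nearer prototype (squared Euclidean distance)
n_test = size(test_vec, 1);
d_f = sum((test_vec - repmat(proto_f, n_test, 1)).^2, 2);
d_m = sum((test_vec - repmat(proto_m, n_test, 1)).^2, 2);

md_label = -ones(n_test, 1);  % default: male
md_label(d_f < d_m) = 1;      % closer to the female prototype
```

Since the distances are only compared, the square root can be omitted; the resulting md_label vector can be scored against test_label exactly like the SVM predictions.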
2.4) (10pts) Improve your results by reducing the error rate. The improvement can come from any one of the three aspects below, or others:
	% 1) any data pre-processing techniques (normalization, dimensionality reduction, feature extraction)
	% 2) alternatives to the linear SVM: other SVMs, techniques in SVM training (e.g., searching for parameters and kernels),
	%    nearest-neighbor, k-nearest neighbor, etc.
	% 3) any post-processing?

2.5) (5 pts bonus) See how good this model is on new data: take two clean, frontal face images (one male, one female, from the web or your photo collection), tightly crop, convert to grayscale, and resize to 25x30. Use one of the SVM models trained above to classify them, and see the result.
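A possible preprocessing pipeline for a new photo, so that it matches the training vectors (the file name 'new_face.jpg' is illustrative, and svm_model is a model trained as in section 2.3):

```matlab
im = imread('new_face.jpg');
if size(im, 3) == 3
    im = rgb2gray(im);          % convert color images to grayscale
end
im = imresize(im, [25 30]);     % same [rows cols] as the training images
x  = double(im(:))';            % stretch into a 1 x 750 row vector

% the first argument of svmpredict is the true label; pass a dummy
% value if you only care about the predicted label
pred = svmpredict(1, x, svm_model);
```

The crop matters as much as the resize: the training faces are tightly cropped, so a loosely cropped test photo will look very different in pixel space.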

2.6) (bonus question, 5 pts; additional points will be given to very insightful answers) Suppose you have another collection of everyday images of people, such as those found at http://images.google.com/images?q=group or http://images.google.com/images?q=people . Draw system diagrams to show (a) how you would apply the gender classifier above to find male and female faces in these pictures, and (b) if you were to use these pictures as additional training data to improve the classifier, how you would do it. Feel free to use any approach mentioned in class or seen in the literature.

 


Prepared by Lexing Xie < xlx at ee dot columbia dot edu >, 2007-04-16