2. Gaussian Mixture Models

(If you didn't go through the steps in part 1,
you can download this file: ftrslabs.mat, then
load it into Matlab with `load ftrslabs`, to set up the `ftrs`
and `labs` variables).

Now we have our training set nicely prepared in terms of feature vectors
(the rows of `ftrs`) and their corresponding labels as singing or
not (the values of `labs`, one for each row of `ftrs`, where
1 indicates singing). We're ready to build a classifier!

If we knew the probability distribution functions (PDFs) of the signal
features under the two possible conditions, singing or no singing, we
could classify a new test feature vector according to the *likelihood
ratio* - the ratio of the values of the PDFs for the two classes at
that point. In order to do this, we are going to estimate PDFs from the
individual samples present in our training data. To get an idea of what
we're trying to do, we'll first look at the data in just 2 dimensions
(so we can plot it), but in practice we can use all 13 cepstra, or all
39 dimensions including the deltas and double-deltas.
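Before looking at real data, the likelihood-ratio rule itself can be sketched in a few lines. The following is an illustrative Python fragment (not part of the Matlab lab); the two one-dimensional Gaussian class PDFs and their parameters are made up purely to show the decision rule:

```python
import math

def gauss_pdf(x, mu, sigma):
    """PDF of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical class-conditional PDFs for a single cepstral feature
def p_sung(x):   return gauss_pdf(x, mu=-16.0, sigma=3.0)
def p_unsung(x): return gauss_pdf(x, mu=-20.0, sigma=3.0)

def classify(x):
    """Label 1 ('sung') if the likelihood ratio exceeds 1, else 0."""
    return 1 if p_sung(x) / p_unsung(x) > 1.0 else 0

print(classify(-14.0))  # closer to the 'sung' mean -> 1
print(classify(-22.0))  # closer to the 'unsung' mean -> 0
```

With equal class priors, comparing the likelihood ratio to 1 (or its log to 0) is exactly the rule applied to the real GMM likelihoods later in this section.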

We will be using the Netlab functions for this part, so you will need to make sure that Netlab is downloaded and installed on your machine, and that Matlab knows where to find it. You'll need to type something like this (substituting the appropriate local path):

path('/homes/dpwe/matlab/netlab', path);

Now let's compare the distribution of the first two cepstra for all the frames labeled as sung versus all the others:

% Sort training data into sung and unsung frames
ddS = ftrs(labs==1,:);
ddM = ftrs(labs==0,:);
% Compare scatter plots of 1st 2 dimensions
% First with sung frames (red dots) in front:
subplot(221)
plot(ddM(:,1),ddM(:,2),'.b',ddS(:,1),ddS(:,2),'.r')
% then with unsung frames (blue dots) in front:
subplot(222)
plot(ddS(:,1),ddS(:,2),'.r',ddM(:,1),ddM(:,2),'.b')
% Heavily overlapped, but some difference...

Now we estimate a continuous PDF over these two datasets by fitting a weighted set of 2-dimensional Gaussians - a Gaussian Mixture Model or GMM. Using the standard Netlab procedure, we first initialize the Gaussian means using k-means clustering, then refine the models using Expectation-Maximization. Netlab makes this all pretty straightforward:

% Define how many dimensions we are going to use,
% and how many Gaussian components in our mixture
ndim = 2;
nmix = 5;
% Initialize diagonal-covariance models for both PDFs to be estimated
gmS = gmm(ndim,nmix,'diag');   % Singing GMM
gmM = gmm(ndim,nmix,'diag');   % Music GMM
options = foptions;            % default optimization options
options(14) = 5;               % 5 iterations of k-means in initialization
gmS = gmminit(gmS, ddS(:,1:2), options);   % Initialize from sung data
Warning: Maximum number of iterations has been exceeded
gmM = gmminit(gmM, ddM(:,1:2), options);   % Initialize from unsung data
Warning: Maximum number of iterations has been exceeded
% Now full EM re-estimation:
options = zeros(1, 18);
options(14) = 20;              % Number of iterations
gmS = gmmem(gmS, ddS(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
gmM = gmmem(gmM, ddM(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
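Under the hood, `gmmem` is running the Expectation-Maximization algorithm: alternately computing each component's "responsibility" for each data point (E-step), then re-estimating the component weights, means, and variances from those responsibilities (M-step). As a rough illustration of that loop (in Python rather than Matlab, with a made-up 1-D two-component model and a crude split-in-half initialization standing in for k-means), not a description of Netlab's actual implementation:

```python
import math

def gauss(x, mu, var):
    """1-D Gaussian PDF with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, n_iter=50):
    """Fit a 2-component 1-D GMM by EM (toy sketch of what gmmem does)."""
    # Crude initialization: split the sorted data in half
    srt = sorted(data)
    half = len(srt) // 2
    mu = [sum(srt[:half]) / half, sum(srt[half:]) / (len(srt) - half)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            p = [w[k] * gauss(x, mu[k], var[k]) for k in range(2)]
            tot = p[0] + p[1]
            resp.append([p[0] / tot, p[1] / tot])
        # M-step: re-estimate weight, mean, variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, data)) / nk, 1e-6)
    return w, mu, var

# Two well-separated clumps of points
data = [-1.2, -1.0, -0.8, -1.1, 3.8, 4.0, 4.2, 3.9]
w, mu, var = em_gmm_1d(data)
print(sorted(mu))  # one mean lands near each clump
```

Netlab's version generalizes this to arbitrary dimension and component count, with k-means providing a much better starting point than the split used here.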

To see what we have got, we need to sample the PDFs defined by the GMMs across a regular grid of values, then plot the result in 2D or 3D:

% Define sample points across the range in question
xx = linspace(-28,-8);
yy = linspace(-5,5);
[x,y] = meshgrid(xx,yy);
% x and y are 100x100 arrays defining the grid sample points.
% We'll 'unravel' them to 10,000-point vectors, evaluate the
% probability at each point, then re-ravel the result to 100x100:
ppS = gmmprob(gmS, [x(:),y(:)]);
ppS = reshape(ppS, 100, 100);
subplot(223)
imagesc(xx,yy,ppS)
axis xy
% Repeat for the unsung (music) model:
ppM = gmmprob(gmM, [x(:),y(:)]);
ppM = reshape(ppM, 100, 100);
subplot(224)
imagesc(xx,yy,ppM)
axis xy
% You can see the principal Gaussian peaks, and more differences between the two PDFs

Assuming equal prior likelihoods for each class, the decision rule is that points in feature space for which one PDF is larger are classified as belonging to that class. Thus, we can visualize the decision rule by plotting the two surfaces on top of one another in 3D:

% 3D plot in full window
subplot(111)
surf(xx,yy,ppM, 0*ppM)
% 4th argument to surf sets color for each point - make them all 0
% Plot sung-feature PDF GMM on same display, with colorvalue = 1
hold on
surf(xx,yy,ppS, 1+0*ppS)
% Where red surface pokes through, classification will be as sung frames

Now we need to see how these PDF estimates fare as classifiers. One quick thing we can do is try applying the classifier to the training data itself. This will be optimistic, since in general new test data won't look exactly like the training data, but it at least lets us confirm that the classifier is doing something like what we expect. We calculate the log of the ratio of the likelihoods under each model for all the training frames, then see how many of the frames labeled "singing" have log-ratios greater than zero, and vice-versa:

% Figure likelihoods of all training data under each model
lS = gmmprob(gmS, ftrs(:,[1 2]));
lM = gmmprob(gmM, ftrs(:,[1 2]));
% Hence, log-likelihood ratio
llrSM = log(lS./lM);
% If we call LLR > 0 "sung", how many do we get right?
mean( (llrSM > 0) == labs)
ans =
    0.6676
% 66.7% - well, it's better than guessing (~50%)
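The accuracy calculation above is just an elementwise comparison averaged over frames. Here is the same computation sketched in Python with a handful of made-up per-frame likelihoods standing in for the real `lS`, `lM`, and `labs`:

```python
import math

# Toy stand-ins for the Matlab variables: per-frame likelihoods under
# each model, and the true labels (1 = sung). Values are invented.
lS   = [0.30, 0.05, 0.20, 0.01]
lM   = [0.10, 0.25, 0.15, 0.30]
labs = [1,    0,    1,    1]

# Log-likelihood ratio per frame, as in llrSM = log(lS./lM)
llr = [math.log(s / m) for s, m in zip(lS, lM)]

# Accuracy of the "LLR > 0 means sung" rule, as in mean((llrSM > 0) == labs)
acc = sum(int(r > 0) == y for r, y in zip(llr, labs)) / len(labs)
print(acc)  # 3 of the 4 toy frames are classified correctly -> 0.75
```

The last toy frame has a sung label but a likelihood ratio below 1, so it is misclassified, just as roughly a third of the real training frames are at 66.7% accuracy.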

Finally, to simplify matters, we can put all these stages into a function, `traingmms`, that takes the number of Gaussian mixture components, and the data dimensions to use, as input parameters, then trains the models and evaluates them on the training data:

% Redo 5 mixtures on 1st two dimensions
[M0,M1] = traingmms(ftrs(:,[1 2]), labs, 5);
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Accuracy on training data = 66.1%
Elapsed time = 36.7439 secs
% How about 10 mixtures?
[M0,M1] = traingmms(ftrs(:,[1 2]), labs, 10);
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Accuracy on training data = 66.8%
Elapsed time = 73.1332 secs
% Minimal accuracy gain for extra complexity/compute time

Investigate the classifier accuracy as the number of Gaussian mixture components and/or the number of feature dimensions is varied. You can even vary the amount of training data used by subselecting rows in `ftrs` and `labs`. Can you make a plot of accuracy versus system complexity? How about versus training time?


Last updated: 2003/07/02 15:40:03

Dan Ellis <dpwe@ee.columbia.edu>