Dan Ellis : Music Content Analysis : Practical :

A Practical Investigation of Singing Detection:
3. Neural Networks

GMM estimates of PDFs are only one of many available classification schemes. As a comparison, we now look at solving the same task with a neural network (NN), specifically a multi-layer perceptron with a single hidden layer. An NN of this kind is essentially a flexible nonlinear mapping whose parameters can be optimized to match training data by gradient descent (the so-called back-propagation algorithm). When trained to predict the hand-marked 1/0 singing label, its output comes to approximate the posterior probability that a particular frame contains voice. Netlab again makes the training very easy for us:
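Before turning to Netlab, it can help to see what the forward pass of such a network actually computes. The sketch below is in Python rather than MATLAB, purely for illustration; the weights here are made up, and only the structure matters (tanh hidden units feeding a logistic output unit, the arrangement Netlab's mlp uses with the 'logistic' output function):

```python
import math

def logistic(a):
    """Sigmoid squashing function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with logistic output.
    x: input features; W1, b1: hidden-unit weights and biases;
    W2, b2: output weights (one per hidden unit) and output bias."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    out = logistic(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return out  # after training, interpretable as Pr(singing | x)

# Made-up weights for a 2-input, 2-hidden-unit net (illustration only)
W1 = [[1.0, -0.5], [0.3, 0.8]]
b1 = [0.1, -0.2]
W2 = [0.7, -1.2]
b2 = 0.05
p = mlp_forward([0.5, 1.5], W1, b1, W2, b2)
print(0.0 < p < 1.0)  # prints True: the logistic output is always a valid probability
```

Because the final logistic unit squashes everything into (0, 1), the net's output can be read directly as a probability estimate, which is exactly what we exploit below.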

>> % Set up the training parameters
>> options = zeros(1,18);
>> options(9) = 1;     % Check the gradient calculations
>> options(14) = 10;   % Number of training cycles
>> nhid = 5;           % Hidden units in network - analogous to Gauss components
>> nout = 1;           % Single output is Pr(singing)
>> alpha = 0.2;        % Weight-decay (regularization) coefficient - some experimentation needed
>> ndim = 2;
>> net = mlp(ndim, nhid, nout, 'logistic', alpha);
>> % Training is via a generalized optimization routine
>> net = netopt(net, options, ftrs(:,1:2), labs, 'quasinew');

Because we're still only classifying on two dimensions, we can again sample the network output over a range of values and see what we get. We can reuse the grid defined for the GMMs:

>> % Run the net 'forward' on the grid points
>> nno = mlpfwd(net, [x(:),y(:)]);
>> nno = reshape(nno, 100, 100);
>> subplot(221)
>> imagesc(xx,yy,nno)
>> axis xy
>> % Notice how the MLP output is built from soft, roughly planar transitions
>> % Compare to GMM likelihood ratio
>> subplot(222)
>> imagesc(xx,yy,log(ppS./ppM))
>> axis xy
>> % Plot the actual decision regions
>> subplot(223)
>> imagesc(xx,yy,nno>0.5);
>> axis xy
>> subplot(224)
>> imagesc(xx,yy,log(ppS./ppM)>0)
>> axis xy
[Comparative images of decisions by NN and GMM]

We can calculate the overall accuracy on the training data as before:

>> % Run the net on the training data
>> nnd = mlpfwd(net, ftrs(:,[1 2]));
>> % How well does it agree with the labels?
>> mean( (nnd>0.5) == labs)
ans =
>> % Pretty close to simple GMMs
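For clarity, the accuracy figure here is nothing more than the fraction of frames on which the thresholded network output agrees with the hand-marked labels. A minimal Python illustration of the same computation, with made-up numbers:

```python
# Made-up network outputs and ground-truth labels (illustration only)
nnd  = [0.9, 0.2, 0.7, 0.4, 0.6]   # network's Pr(singing) per frame
labs = [1,   0,   0,   0,   1  ]   # hand-marked 1/0 singing labels

# Threshold at 0.5, then count agreements - same as mean((nnd>0.5)==labs)
preds = [1 if p > 0.5 else 0 for p in nnd]
accuracy = sum(p == l for p, l in zip(preds, labs)) / len(labs)
print(accuracy)  # 4 of the 5 frames agree -> 0.8
```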

Finally, we can again wrap all this up in a neat parameterized function, trainnns:

>> % Try again with 2 dimensions and 5 hidden units, trained for 10 iterations
>> net = trainnns(ftrs(:,[1 2]), labs, 5, 10);
Accuracy on training data = 65.5%
Elapsed time = 87.5088 secs
>> % There's a random element in the training, so results will vary from run to run
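The code of trainnns itself isn't reproduced here, but a wrapper of this kind generally just builds the network, trains it for the requested number of cycles, and reports training accuracy and elapsed time. The Python sketch below is a hypothetical stand-in: it substitutes plain gradient descent on a single logistic unit for Netlab's MLP training, so only the overall shape of the wrapper matches trainnns:

```python
import math
import time

def train_nn_sketch(ftrs, labs, ncycles):
    """Hypothetical trainnns-style wrapper (illustration only): fit a
    single logistic unit by gradient descent on the log-loss, then
    report training accuracy and elapsed time in the same format."""
    t0 = time.time()
    ndim = len(ftrs[0])
    w = [0.0] * ndim
    b = 0.0
    lr = 0.5                             # learning rate for the sketch
    for _ in range(ncycles):
        for x, y in zip(ftrs, labs):
            a = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-a))
            err = p - y                  # gradient of log-loss w.r.t. activation
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    preds = [1 if (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 else 0
             for x in ftrs]
    acc = sum(p == y for p, y in zip(preds, labs)) / len(labs)
    print('Accuracy on training data = %.1f%%' % (100 * acc))
    print('Elapsed time = %.4f secs' % (time.time() - t0))
    return w, b

# Toy linearly separable data (illustration only): label follows dim 1
ftrs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labs = [0, 0, 1, 1]
w, b = train_nn_sketch(ftrs, labs, 300)
```

Unlike this deterministic toy, Netlab's MLP training starts from random initial weights, which is why the tutorial's results vary from run to run.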


Try to find neural networks whose complexity (e.g. as measured by training time) parallels that of the GMMs you investigated before. How do the two approaches compare in terms of accuracy?


Last updated: $Date: 2003/07/02 15:40:30 $

Dan Ellis <dpwe@ee.columbia.edu>