A Practical Investigation of Singing Detection:
3. Neural Networks

GMM estimates of PDFs are only one of many available classification schemes. As a comparison, we now look at solving the same task with a neural network (NN), specifically a multi-layer perceptron with a single hidden layer. An NN of this kind is essentially a flexible nonlinear mapping whose parameters can be optimized to match some training data by gradient descent (the so-called back-propagation algorithm). By training it to predict the 1/0 singing label, we end up with an estimate of the posterior probability that a particular frame contains voice. Netlab again makes the training very easy for us:


>> % Set up the training parameters
>> options = zeros(1,18);
>> options(9) = 1;     % Check the gradient calculations
>> options(14) = 10;   % Number of training cycles
>> nhid = 5;           % Hidden units in network - analogous to Gauss components
>> nout = 1;           % Single output is Pr(singing)
>> alpha = 0.2;        % Weight-decay (regularization) coefficient - some experimentation needed
>> ndim = 2;           % Number of input feature dimensions
>> net = mlp(ndim, nhid, nout, 'logistic', alpha);
>> % Training is via a generalized optimization routine
>> net = netopt(net, options, ftrs(:,1:2), labs, 'quasinew');
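
To see concretely what this "nonlinear mapping" is, we can compute the network's forward pass by hand and check it against mlpfwd. This is a sketch that assumes Netlab's mlp conventions (tanh hidden units, with weights stored in the fields w1, b1, w2, b2); the two columns below should agree:


>> % Forward pass by hand (a sketch, assuming Netlab's tanh hidden units)
>> X = ftrs(1:5, 1:2);                          % a few example frames
>> h = tanh(X*net.w1 + repmat(net.b1, 5, 1));   % hidden-layer activations
>> p = 1./(1 + exp(-(h*net.w2 + repmat(net.b2, 5, 1))));  % logistic output = Pr(singing)
>> [p, mlpfwd(net, X)]                          % columns should match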

Because we're still only classifying on two dimensions, we can again sample the network output over a range of values and see what we get. We can reuse the grid defined for the GMMs:
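
If the grid is no longer in the workspace, it can be rebuilt along the lines of the GMM section. The exact ranges there come from the feature values, so the following is just a sketch (the 100-point resolution matches the reshape below):


>> % Rebuild the 100x100 sampling grid (sketch; ranges as in the GMM section)
>> xx = linspace(min(ftrs(:,1)), max(ftrs(:,1)), 100);
>> yy = linspace(min(ftrs(:,2)), max(ftrs(:,2)), 100);
>> [x,y] = meshgrid(xx, yy);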


>> % Run the net 'forward' on the grid points
>> nno = mlpfwd(net, [x(:),y(:)]);
>> nno = reshape(nno, 100, 100);
>> subplot(221)
>> imagesc(xx,yy,nno)
>> axis xy
>> % Notice how MLP outputs are soft planar intersections
>> % Compare to GMM likelihood ratio
>> subplot(222)
>> imagesc(xx,yy,log(ppS./ppM))
>> axis xy
>> % Plot the actual decision regions
>> subplot(223)
>> imagesc(xx,yy,nno>0.5);
>> axis xy
>> subplot(224)
>> imagesc(xx,yy,log(ppS./ppM)>0)
>> axis xy
[Comparative images of the decisions made by the NN and the GMM]

We can calculate the overall accuracy on the training data as before:


>> % Run the net on the training data
>> nnd = mlpfwd(net, ftrs(:,[1 2]));
>> % How well does it agree with the labels?
>> mean( (nnd>0.5) == labs)
ans =
    0.6500
>> % Pretty close to simple GMMs
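
Since the two classes may not be equally common in the training data, it can also be revealing to split the accuracy by class (a quick sketch using the same variables):


>> % Per-class rates (sketch)
>> mean( nnd(labs==1) > 0.5 )    % singing frames correctly detected
>> mean( nnd(labs==0) <= 0.5 )   % non-singing frames correctly rejected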

Finally, we can again wrap all this up in a neat parameterized function, trainnns:


>> % Try again with 2 dimensions and 5 hidden units, trained for 10 iterations
>> net = trainnns(ftrs(:,[1 2]), labs, 5, 10);
Accuracy on training data = 65.5%
Elapsed time = 87.5088 secs
>> % There's a random element in the training, so results will vary from run to run
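
For reference, the inside of such a wrapper might look something like this (a sketch only; the trainnns supplied with the practical may differ in its details):


function net = trainnns(ftrs, labs, nhid, ncycles)
% net = trainnns(ftrs, labs, nhid, ncycles)
%    Train a one-hidden-layer MLP on features ftrs with 0/1 targets labs,
%    using nhid hidden units and ncycles training cycles, then report
%    training accuracy and elapsed time.  (Hypothetical sketch.)
tic;
[ndat, ndim] = size(ftrs);
options = zeros(1,18);
options(14) = ncycles;                       % number of training cycles
net = mlp(ndim, nhid, 1, 'logistic', 0.2);   % 0.2 = weight-decay coefficient
net = netopt(net, options, ftrs, labs, 'quasinew');
acc = mean( (mlpfwd(net, ftrs) > 0.5) == labs );
disp(['Accuracy on training data = ', num2str(100*acc), '%']);
disp(['Elapsed time = ', num2str(toc), ' secs']);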

Assignment

Try to find neural networks that parallel the complexity (e.g. training time) of the GMMs you investigated before. How do they compare in terms of accuracy?
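
One way to start is a simple sweep over network sizes (hypothetical values; pick sizes whose training times bracket those of your GMMs):


>> % Sweep hidden-layer sizes (sketch)
>> for nhid = [2 5 10 20]
>>   net = trainnns(ftrs(:,[1 2]), labs, nhid, 10);
>> end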



Last updated: 2003/07/02

Dan Ellis <[email protected]>