
# A Practical Investigation of Singing Detection: 3. Neural Networks

GMM estimates of PDFs are only one of a large number of available classification schemes. As a comparison, we now look at solving the same task with a neural network (NN), specifically a multi-layer perceptron with a single hidden layer. An NN of this kind is essentially just a complex nonlinear mapping whose parameters can be optimized to match some training data by gradient descent (the so-called back-propagation algorithm). By training it to predict the binary 1/0 singing label, its output ends up approximating the posterior probability that a particular frame contains voice. Netlab again makes the training very easy for us:

```
% Set up the training parameters
options = zeros(1,18);
options(9) = 1;     % Check the gradient calculations
options(14) = 10;   % Number of training cycles
nhid = 5;           % Hidden units in network - analogous to Gauss components
nout = 1;           % Single output is Pr(singing)
alpha = 0.2;        % Weight decay coefficient - some experimentation needed
ndim = 2;           % Number of input feature dimensions
net = mlp(ndim, nhid, nout, 'logistic', alpha);
% Training is via a generalized optimization routine
net = netopt(net, options, ftrs(:,1:2), labs, 'quasinew');
```
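
If you want to watch the optimization converge, Netlab's options(1) is a display flag: set it to 1 and netopt prints the error value as training proceeds. Note that calling netopt again like this continues training the existing net for another ten cycles:

```
% Optional: print error values during training (Netlab display flag)
options(1) = 1;
net = netopt(net, options, ftrs(:,1:2), labs, 'quasinew');
```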

Because we're still classifying in only two dimensions, we can again sample the network output over a range of values and see what we get. We can reuse the grid defined for the GMMs:

```
% Run the net 'forward' on the grid points
nno = mlpfwd(net, [x(:),y(:)]);
nno = reshape(nno, 100, 100);
subplot(221)
imagesc(xx,yy,nno)
axis xy
% Notice how the MLP output is built from soft (sigmoidal) planar boundaries
% Compare to the GMM log-likelihood ratio
subplot(222)
imagesc(xx,yy,log(ppS./ppM))
axis xy
% Plot the actual decision regions
subplot(223)
imagesc(xx,yy,nno>0.5);
axis xy
subplot(224)
imagesc(xx,yy,log(ppS./ppM)>0)
axis xy
```
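
This assumes x, y, xx, and yy are still defined from the GMM part of the practical. If not, something along these lines recreates a 100-by-100 grid spanning the feature values (the exact ranges are an assumption - adjust them to your data):

```
% Recreate the sampling grid if it's no longer in the workspace
% (100 points per axis, to match the reshape above)
xx = linspace(min(ftrs(:,1)), max(ftrs(:,1)), 100);
yy = linspace(min(ftrs(:,2)), max(ftrs(:,2)), 100);
[x, y] = meshgrid(xx, yy);
```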

We can calculate the overall accuracy on the training data as before:

```
% Run the net on the training data
nnd = mlpfwd(net, ftrs(:,[1 2]));
% How well does it agree with the labels?
mean( (nnd>0.5) == labs)
ans =
0.6500
% Pretty close to simple GMMs
```
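
Raw accuracy can hide an imbalance between the two classes. A quick per-class breakdown (assuming labs is a 0/1 column vector, as used above) is:

```
% Accuracy broken down by true class
correct = (nnd > 0.5) == labs;
mean(correct(labs==1))   % accuracy on frames labeled as singing
mean(correct(labs==0))   % accuracy on frames labeled as not singing
```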

Finally, we can again wrap all this up in a neat parameterized function, trainnns:

```
% Try again with 2 dimensions and 5 hidden units, trained for 10 iterations
net = trainnns(ftrs(:,[1 2]), labs, 5, 10);
Accuracy on training data = 65.5%
Elapsed time = 87.5088 secs
% There's a random element in the training, so results will vary from run to run
```
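
The function itself isn't reproduced here, but a minimal sketch of such a wrapper, assuming the same Netlab calls and defaults as above, might look like:

```
function net = trainnns(ftrs, labs, nhid, ncycles)
% trainnns - train an MLP with one hidden layer of nhid units on
% features ftrs against 0/1 labels labs, for ncycles training cycles.
% (A sketch; the function shipped with the practical may differ.)
tic;
ndim = size(ftrs, 2);          % number of input feature dimensions
alpha = 0.2;                   % weight decay coefficient, as above
options = zeros(1,18);
options(14) = ncycles;         % number of training cycles
net = mlp(ndim, nhid, 1, 'logistic', alpha);
net = netopt(net, options, ftrs, labs, 'quasinew');
% Report training accuracy and elapsed time, matching the output above
acc = mean( (mlpfwd(net, ftrs) > 0.5) == labs );
disp(['Accuracy on training data = ', num2str(100*acc), '%']);
disp(['Elapsed time = ', num2str(toc), ' secs']);
```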

### Assignment

Try to find neural networks that parallel the complexity (e.g. training time) of the GMMs you investigated before. How do they compare in terms of accuracy?

Last updated: 2003/07/02 15:40:30

Dan Ellis <dpwe@ee.columbia.edu>