
E6820 Assignment 3

For some reason, this page is a lot longer than the previous pages.  So here are links to the appropriate sections:

Reading assignment
Practical assignment
Project

Reading assignment

Paper: “Construction and evaluation of a robust multifeature speech/music discriminator,” E. Scheirer and M. Slaney, Proc. ICASSP-97, Munich, pp. 1331-1334.

Summary:
The authors examine many different kinds of features and several kinds of classification systems in order to discriminate between speech and music.  They managed to build a pretty robust system with error rates below 2%.  Interestingly, when the authors tried to add a third category of speech with music, their success rate went down to 65%.

Thoughts:
It's interesting that the authors found that the log of the features improved the fit of the normal distributions.  I wonder why that is, especially since the authors claim it worked for all 13 features.
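One possible explanation, which is my own guess rather than something the paper says: many features of this kind (energies, rates, and their variances) are non-negative with long right tails, and taking the log pulls in that tail so the values look more like a symmetric bell curve. Here is a small Python sketch of the effect on synthetic log-normal data. That data is an assumption chosen to make the effect obvious, not the paper's data. Skewness is used as a rough measure of asymmetry; a normal distribution has skewness 0.

```python
import math
import random


def skewness(xs):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)
# Hypothetical positive, right-skewed "feature" values (log-normal by construction).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(10000)]
logged = [math.log(x) for x in raw]

print(f"skewness of raw feature:  {skewness(raw):.2f}")
print(f"skewness of log(feature): {skewness(logged):.2f}")
```

For data that really is log-normal, the log turns it into an exactly normal distribution, so this is a best case; real features would only move part of the way.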

I wonder how the CPU time was really calculated. The authors ran these tests on workstations, which have operating systems. So how did they decide what the CPU usage for a particular task was? If they just took overall CPU usage, they would have included things like the operating system's own use of the CPU. For that matter, even supposing that the operating system broke down the CPU usage per process, how would it know? Every context switch incurs some overhead that is performed on behalf of the process. Was this counted? There's also the fact that the cache probably no longer contains data of interest to the newly switched-in process, so at first the process spends time stalled on cache misses. How could any operating system take that into account?
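For what it's worth, here is how the two kinds of time differ when measured from inside a program. This Python sketch uses the standard library's time.process_time() (user plus system CPU time charged to the current process, excluding time spent sleeping) and time.perf_counter() (elapsed time). It shows that sleeping costs elapsed time but almost no CPU time. What it cannot separate out are the indirect costs described above: stalls on cache misses after a context switch are simply counted as part of the process's own CPU time.

```python
import time


def measure(task):
    """Run task() and return (elapsed seconds, CPU seconds charged to this process)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall0, time.process_time() - cpu0


def busy():
    # Pure computation: CPU time should roughly track elapsed time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle():
    # Sleeping: elapsed time passes, but this process is not using the CPU.
    time.sleep(0.2)


busy_wall, busy_cpu = measure(busy)
idle_wall, idle_cpu = measure(idle)
print(f"busy: elapsed={busy_wall:.3f}s cpu={busy_cpu:.3f}s")
print(f"idle: elapsed={idle_wall:.3f}s cpu={idle_cpu:.3f}s")
```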


Practical assignment

I wasn't sure where to put my answers to the "assignments" at the end of each section of the practical, so I've put them here. My Matlab diary follows the assignments.

Assignments:

I. Data and Features:

Here is my Matlab code for the assignment. It plays just the sung portions of each sample.
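The linked code isn't reproduced on this page. As a rough illustration of the idea, here is a Python sketch that turns the labelled segments into whole-sample index ranges and keeps only the 'vox' segments. The function names are my own, hypothetical ones. The ranges are 0-based and half-open, as in Python, whereas Matlab indexing is 1-based. The sample rate of 22050 Hz is an assumption for the example; the diary never prints sr.

```python
def sung_sample_ranges(starts, durations, labels, sr, target="vox"):
    """Convert labelled segments (start and duration in seconds) into 0-based,
    half-open sample ranges [begin, end) for segments whose label is target.

    Rounding to whole samples avoids fractional indices such as 3.347 * sr.
    """
    ranges = []
    for start, dur, lab in zip(starts, durations, labels):
        if lab == target:
            ranges.append((round(start * sr), round((start + dur) * sr)))
    return ranges


def extract_sung(samples, ranges):
    """Concatenate the samples that fall inside the given ranges."""
    out = []
    for begin, end in ranges:
        out.extend(samples[begin:end])
    return out


# First four segments of labels/3.lab, as printed in the diary below.
starts = [0.0, 3.347, 4.401, 7.020]
durations = [3.347, 1.054, 2.619, 1.286]
labels = ["vox", "mus", "vox", "mus"]
sr = 22050  # assumed sample rate, for illustration only

ranges = sung_sample_ranges(starts, durations, labels, sr)
print(ranges)  # [(0, 73801), (97042, 154791)]
```

Rounding the segment boundaries to whole samples also avoids the "Integer operands are required for colon operator" warning that appears in the diary, where d was indexed with 3.347*sr directly.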

II. Gaussian Mixture Models:

Here is the code I wrote to produce the plot below. As you can see, the accuracy barely went up as the complexity (the number of MFCC dimensions) increased, but the computation time went up almost linearly. I could easily have stopped at 4 MFCCs with almost no loss in accuracy.

Gaussian mixture model accuracy and training time.
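Again the actual code is only linked, so here is a simplified Python sketch of the same kind of sweep. It makes three simplifications, all assumptions of mine. Each class is modelled by a single diagonal-covariance Gaussian instead of a mixture trained with EM. The data is synthetic. "Complexity" means the number of feature dimensions kept. Classification follows the diary's rule of calling a frame singing when the log-likelihood ratio log(lS/lM) is positive.

```python
import math
import random
import time


def fit_diag_gaussian(rows):
    """Per-dimension mean and variance: a one-component, diagonal-covariance model."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # The small constant keeps every variance strictly positive.
    variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n + 1e-6 for j in range(d)]
    return means, variances


def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian (a sum of per-dimension terms)."""
    means, variances = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))


def sweep(features, labels, dims_list):
    """Train and test on the first k dimensions for each k; return (k, accuracy, seconds)."""
    results = []
    for k in dims_list:
        t0 = time.perf_counter()
        data = [f[:k] for f in features]
        model_s = fit_diag_gaussian([x for x, y in zip(data, labels) if y == 1])
        model_m = fit_diag_gaussian([x for x, y in zip(data, labels) if y == 0])
        # Log-likelihood ratio > 0 means "singing" (label 1), like log(lS./lM) > 0.
        preds = [1 if log_likelihood(x, model_s) - log_likelihood(x, model_m) > 0 else 0
                 for x in data]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        results.append((k, acc, time.perf_counter() - t0))
    return results


# Synthetic stand-in data (NOT the assignment's MFCCs): only dimension 0 separates
# the classes, so the extra dimensions add cost without adding accuracy.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(2000)]
features = [[rng.gauss(1.0 if y else -1.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(12)]
            for y in labels]

for k, acc, secs in sweep(features, labels, [1, 2, 4, 8, 13]):
    print(f"{k:2d} dims: accuracy={acc:.3f} time={secs:.3f}s")
```

With diagonal covariances, both fitting and scoring take time proportional to the number of dimensions, which is one plausible reason for the roughly linear growth in time.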


III. Neural Networks

I again had to graph complexity versus accuracy and training time. Here is the code I wrote to produce the plot below. Oddly enough, adding extra MFCC dimensions did not seem to improve the neural network's accuracy, and its effect on training time was not entirely predictable either. I also noticed that the network's accuracy varied quite widely when I ran it several times with exactly the same parameters. So for this problem, the Gaussian mixture model may be the better bet: it takes less time and gives more consistent results.

Neural network complexity versus accuracy and time
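One likely source of the run-to-run variation, though I haven't checked it against the Netlab code, is the random initialization of the weights. An MLP's error surface has local minima, so different starting weights can end up in different places. This minimal pure-Python sketch trains the same small network on the same toy XOR-style data with several random seeds. The network size, learning rate, number of epochs and data are all illustrative assumptions, and it uses plain stochastic gradient descent rather than the quasi-Newton optimizer from the diary.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_mlp(data, labels, seed, nhid=3, epochs=200, lr=0.5):
    """Train a tanh-hidden, logistic-output MLP by SGD; return training accuracy.

    Only `seed` (the initial weights) changes between runs.
    """
    rng = random.Random(seed)
    ndim = len(data[0])
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(ndim)] for _ in range(nhid)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(nhid)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(nhid)]
    b2 = rng.gauss(0.0, 1.0)

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)

    for _ in range(epochs):
        for x, y in zip(data, labels):
            h, out = forward(x)
            err = out - y  # cross-entropy gradient w.r.t. the output pre-activation
            for j in range(nhid):
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)  # uses w2[j] before its update
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * grad_h
                for i in range(ndim):
                    w1[j][i] -= lr * grad_h * x[i]
            b2 -= lr * err

    correct = sum((forward(x)[1] > 0.5) == (y == 1) for x, y in zip(data, labels))
    return correct / len(data)


# Toy data: four noisy clusters labelled by XOR of their coordinates.
rng = random.Random(42)
data, labels = [], []
for _ in range(40):
    a, b = rng.randint(0, 1), rng.randint(0, 1)
    data.append([a + rng.gauss(0.0, 0.1), b + rng.gauss(0.0, 0.1)])
    labels.append(a ^ b)

accuracies = [train_mlp(data, labels, seed=s) for s in range(5)]
print(accuracies)
```

Fixing the seed makes a single run repeatable, and reporting the mean and spread of accuracy over several seeds gives a fairer comparison between numbers of MFCC dimensions than any one run.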


Matlab diary:

[d,sr] = wavread('music/1.wav');
soundsc(d,sr);
[d,sr] = wavread('music/3.wav');
soundsc(d,sr);
[stt,dur,lab] = textread('labels/3.lab', '%f %f %s','commentstyle','shell');
[stt(1:4),dur(1:4)]

ans =

         0    3.3470
    3.3470    1.0540
    4.4010    2.6190
    7.0200    1.2860

lab(1:4)

ans =

    'vox'
    'mus'
    'vox'
    'mus'

ll = zeros(length(lab),1);
ll(strmatch('vox',lab)) = 1;
tt = 0.020:0.020:14.980;
lsamp = labsamplabs(tt,[stt,dur],ll);
subplot(311)
plot(tt,lsamp)
axis([0 15 0 1.1])
subplot(312)
specgram(d,512,sr)
soundsc(d,sr)
soundsc(d((1+0*sr):(3.347*sr)),sr)
Warning: Integer operands are required for colon operator when used as index.
cc = mfcc(d,sr,1/0.020);
size(cc)

ans =

    13   749

subplot(313)
imagesc(cc)
axis xy
frmpersong = 749;
nsong = 60;
nftrs = 3 * 13;
ftrs = zeros(nsong*frmpersong, nftrs);
for i = 1:60;
     [d,sr]=wavread(['music/',num2str(i),'.wav']);
     cc = mfcc(d,sr,1/.020);
     ftrs((i-1)*frmpersong+[1:frmpersong],:) = [cc', deltas(cc)', deltas(deltas(cc,5),5)'];
   end
labs = zeros(nsong*frmpersong, 1);
for i = 1:60;
     [stt,dur,lab] = textread(['labels/',num2str(i),'.lab'], '%f %f %s','commentstyle','shell');
     ll = zeros(length(lab),1);
     ll(strmatch('vox',lab)) = 1;
     lsamp = labsamplabs(tt,[stt,dur],ll);
     labs((i-1)*frmpersong+[1:frmpersong])=lsamp;
   end
size(labs)

ans =

       44940           1

size(ftrs)

ans =

       44940          39

mean(ftrs)

ans =

  Columns 1 through 7

  -14.4471    0.3160   -0.1459   -0.0065   -0.1342   -0.0503   -0.0562

  Columns 8 through 14

   -0.0376   -0.0295   -0.0271   -0.0001   -0.0596   -0.0061   -0.0114

  Columns 15 through 21

   -0.0151    0.0057    0.0040    0.0019   -0.0086    0.0056    0.0122

  Columns 22 through 28

    0.0020    0.0013    0.0013   -0.0006    0.0043    0.0066    0.0025

  Columns 29 through 35

    0.0044    0.0013    0.0004    0.0004    0.0013    0.0017   -0.0024

  Columns 36 through 39

   -0.0001    0.0015    0.0018   -0.0025

mean(labs)

ans =

    0.4740

ddS = ftrs(labs==1,:);
ddM = ftrs(labs==0,:);
subplot(221)
plot(ddM(:,1),ddM(:,2),'.b',ddS(:,1),ddS(:,2),'.r')
subplot(222)
plot(ddS(:,1),ddS(:,2),'.r',ddM(:,1),ddM(:,2),'.b')
ndim = 2;
nmix = 5;
gmS = gmm(ndim,nmix,'diag');
gmM = gmm(ndim,nmix,'diag');
options = foptions;
options(14) = 5;
gmS = gmminit(gmS, ddS(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
gmM = gmminit(gmM, ddM(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
options = zeros(1, 18);
options(14) = 20;
gmS = gmmem(gmS, ddS(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
gmM = gmmem(gmM, ddM(:,1:2), options);
Warning: Maximum number of iterations has been exceeded
xx = linspace(-28,-8);
yy = linspace(-5,5);
[x,y] = meshgrid(xx,yy);
ppS = gmmprob(gmS, [x(:),y(:)]);
ppS = reshape(ppS, 100, 100);
subplot(223)
imagesc(xx,yy,ppS)
axis xy
ppM = gmmprob(gmM, [x(:),y(:)]);
ppM = reshape(ppM, 100, 100);
subplot(224)
imagesc(xx,yy,ppM)
axis xy
subplot(111)
surf(xx,yy,ppM, 0*ppM)
hold on
surf(xx,yy,ppS, 1+0*ppS)
lS = gmmprob(gmS, ftrs(:,[1 2]));
lM = gmmprob(gmM, ftrs(:,[1 2]));
llrSM = log(lS./lM);
mean( (llrSM > 0) == labs)

ans =

    0.6661

[M0,M1] = traingmms(ftrs(:,[1 2]), labs, 5);
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Accuracy on training data = 66.3%
Elapsed time = 2.5 secs
[M0,M1] = traingmms(ftrs(:,[1 2]), labs, 10);
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Warning: Maximum number of iterations has been exceeded
Accuracy on training data = 66.7%
Elapsed time = 4.657 secs
options = zeros(1,18);
options(9) = 1;
options(14) = 10;
nhid = 5;
nout = 1;
alpha = 0.2;
ndim = 2;
net = mlp(ndim, nhid, nout, 'logistic', alpha);
net = netopt(net, options, ftrs(:,1:2), labs, 'quasinew');
Checking gradient ...

   analytic   diffs     delta

  1.0e+003 *

   -0.0286   -0.0286    0.0000
   -0.0007   -0.0007   -0.0000
    0.0010    0.0010   -0.0000
   -0.0000   -0.0000    0.0000
    1.7724    1.7724   -0.0000
    0.0894    0.0894   -0.0000
   -0.0003   -0.0003    0.0000
   -0.0001   -0.0001    0.0000
    0.0001    0.0001    0.0000
   -0.0000   -0.0000    0.0000
    0.0025    0.0025    0.0000
   -0.0001   -0.0001   -0.0000
   -0.1487   -0.1487   -0.0000
   -0.0000   -0.0000   -0.0000
    0.0001    0.0001   -0.0000
    8.0081    8.0081   -0.0000
    8.0114    8.0114   -0.0000
    7.6842    7.6842   -0.0000
   -8.0115   -8.0115    0.0000
    8.0114    8.0114   -0.0000
   -8.0115   -8.0115    0.0000

Warning: Maximum number of iterations has been exceeded in quasinew
nno = mlpfwd(net, [x(:),y(:)]);
nno = reshape(nno, 100, 100);
subplot(221)
imagesc(xx,yy,nno)
axis xy
subplot(222)
imagesc(xx,yy,log(ppS./ppM))
axis xy
subplot(223)
imagesc(xx,yy,nno>0.5);
axis xy
subplot(224
??? subplot(224
               |
Error: ")" expected, "end of line" found.

subplot(224)
imagesc(xx,yy,log(ppS./ppM)>0)
axis xy
nnd = mlpfwd(net, ftrs(:,[1 2]));
mean( (nnd>0.5) == labs)

ans =

    0.6333

net = trainnns(ftrs(:,[1 2]), labs, 5, 10);
Checking gradient ...

   analytic   diffs     delta

  1.0e+003 *

   -0.0003   -0.0003   -0.0000
    0.0001    0.0001    0.0000
   -0.0208   -0.0208    0.0000
   -0.0013   -0.0013   -0.0000
   -0.5552   -0.5552   -0.0000
   -0.0716   -0.0716   -0.0000
    0.0001    0.0001   -0.0000
   -0.0001   -0.0001    0.0000
   -0.1515   -0.1515    0.0000
    0.0308    0.0308    0.0000
   -0.0001   -0.0001   -0.0000
    0.0020    0.0020    0.0000
    0.0526    0.0526   -0.0000
   -0.0002   -0.0002    0.0000
    0.0082    0.0082    0.0000
   -1.0855   -1.0855   -0.0000
   -1.0839   -1.0839   -0.0000
    0.8255    0.8255    0.0000
   -1.0853   -1.0853   -0.0000
   -1.1682   -1.1682    0.0000
   -1.0854   -1.0854   -0.0000

Warning: Maximum number of iterations has been exceeded in quasinew
Accuracy on training data = 66.1%
Elapsed time = 12.172 secs
diary off
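The labsamplabs function used in the diary came with the practical, and its source isn't shown here. Below is a hypothetical Python equivalent of what the diary uses it for, as I understand it: turning the segment list from a .lab file into one 0/1 label per 20 ms frame. The name sample_labels is my own, and how the real function treats a frame time that lands exactly on a segment boundary is a guess.

```python
from bisect import bisect_right


def sample_labels(times, starts, durations, values, default=0):
    """Return, for each frame time, the value of the labelled segment containing it.

    A segment covers [start, start + duration). Frames in no segment get `default`.
    Assumes the segments are sorted by start time and do not overlap.
    """
    out = []
    for t in times:
        i = bisect_right(starts, t) - 1  # last segment starting at or before t
        if i >= 0 and t < starts[i] + durations[i]:
            out.append(values[i])
        else:
            out.append(default)
    return out


# Frame times every 20 ms, like tt = 0.020:0.020:14.980 in the diary (749 frames).
times = [0.020 * k for k in range(1, 750)]
starts = [0.0, 3.347, 4.401, 7.020]
durations = [3.347, 1.054, 2.619, 1.286]
values = [1, 0, 1, 0]  # 1 = 'vox', 0 = 'mus'

labels = sample_labels(times, starts, durations, values)
print(len(labels), sum(labels))
```

With 749 frames per song and 60 songs, this per-frame labelling gives the 44940 rows reported by size(labs).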


Project

Work on the project can be found on my project page here.


Christine Smit
