Beat Tracking by Dynamic Programming

This page illustrates the use of the beat_* functions to implement a simple music audio beat tracker based on dynamic programming, as described in: D. Ellis "Beat Tracking by Dynamic Programming" J. New Music Research 36(1): 51-60, March 2007, DOI: 10.1080/09298210701653344.

% First, load an example sound file
wavfilename = 'train01.wav';
[d,sr] = audioread(wavfilename);  % on MATLAB releases before R2012b, use wavread instead

% Now, calculate the "onset strength" function as the sum of the
% differentiated and half-rectified energy in a Mel-scale
% time-frequency transform:
[onset_fn, osr, sgram, tt, ff] = beat_onset(d,sr);
% osr is the sampling rate of the onset_fn frames.
% sgram is the Mel spectrogram as an array, with tt and ff
% giving its time and frequency axis labels.

% The dynamic programming approach needs an estimate of global
% tempo, so we calculate one by autocorrelating the onset function,
% applying a bias window, and choosing the biggest peak.
% With display set to 1, beat_tempo also plots the windowed autocorrelation.
subplot(211)
display = 1;
tempo = beat_tempo(onset_fn, osr, display);

% Now we can run the dynamic programming beat tracker
beats = beat_simple(onset_fn, osr, tempo);

% We have some helper functions: one to plot the beat times on top
% of the Mel spectrogram (which we saved from beat_onset)
subplot(212)
beat_plot(beats, '-r', tt, ff, sgram);

% And we can listen to the result; the system-found beats are
% marked by little bursts of white noise superimposed on the
% original:
beat_play(beats, d, sr);

% We can go straight from soundfile to beats with beat_track, which
% just provides a wrapper around the steps above:
beats = beat_track(wavfilename);
Warning: The playback thread did not start within one second. 
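
To make the two middle steps more concrete, here is a minimal, self-contained sketch of tempo estimation by biased autocorrelation followed by a dynamic-programming beat search. It is not the source of beat_tempo or beat_simple: the synthetic onset envelope, the log-Gaussian window parameters (t0, sigma), the squared-log interval penalty, the search range, and the weight alpha are all assumptions chosen for illustration.

```matlab
% Illustrative sketch only; NOT the source of beat_tempo or beat_simple.
osr = 100;                          % assumed onset frame rate (frames/s)
period_true = 50;                   % synthetic beat period: 50 frames = 120 BPM
n = 1000;                           % number of frames (10 s)
onset = zeros(1, n);
onset(10:period_true:n) = 1;        % idealised onset strength: one spike per beat

% --- Tempo: autocorrelation weighted by a bias window ---
maxlag = 200;
ac = zeros(1, maxlag);
for lag = 1:maxlag
    ac(lag) = sum(onset(1:n-lag) .* onset(1+lag:n));
end
lags_sec = (1:maxlag) / osr;
t0 = 0.5; sigma = 1.4;              % assumed window centre (s) and width (octaves)
w = exp(-0.5 * (log2(lags_sec / t0) / sigma).^2);
[~, period] = max(w .* ac);         % lag (in frames) of the biggest weighted peak
tempo_bpm = 60 * osr / period;

% --- Beats: dynamic programming over the onset envelope ---
alpha = 100;                        % assumed weight on tempo consistency
cscore = onset;                     % best cumulative score of a track ending at each frame
backlink = zeros(1, n);             % best preceding beat for each frame (0 = none)
for t = 1:n
    % candidate previous beats lie between half a period and two periods back
    prev = max(1, t - 2*period) : (t - round(period/2));
    if isempty(prev), continue; end
    % penalise inter-beat intervals that deviate from the target period
    penalty = -alpha * (log((t - prev) / period)).^2;
    [best, idx] = max(cscore(prev) + penalty);
    cscore(t) = onset(t) + best;
    backlink(t) = prev(idx);
end

% Trace back from the best-scoring frame to recover the beat sequence
[~, t] = max(cscore);
beats = t;
while backlink(t) > 0
    t = backlink(t);
    beats = [t beats];
end
beat_times = beats / osr;           % beat times in seconds
```

On this idealised input the sketch recovers a 50-frame period (120 BPM) and places beats at frames 10, 60, ..., 960. Real onset envelopes are much noisier, which is where the trade-off controlled by alpha between following onsets and keeping a steady tempo starts to matter.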

Ground truth

We can also read in ground truth for the mirex06 McKinney/Moelants tapping data:

tapfilename = 'train01.txt';
truth = beat_ground_truth(tapfilename);
% By default this only returns the subset of tap records that are
% consistent with the most popular tempo.  Multiple sequences are
% returned in separate cells of a cell array; we can plot and
% listen to them too:
beat_plot(truth{1}, 'xb');
beat_play(truth{1}, d(1:10*sr), sr);

% We can also plot all the individual ground-truth taps in a
% "scatter" format:
subplot(211)
beat_gt_plot(truth)

Scoring results

We can score a beat track against a ground truth by counting how many true beats are missed (deletions), and how many system-generated beats don't correspond to true beats (insertions):

collar = 0.2; % accept a beat within +/-20% of the tempo period
verbose = 1;
[err,ins,del,tru,hh,dd] = beat_score(beats,truth{1},collar,verbose);
% In this case, only the first beat is 'wrong', because the human
% was late to pick up the beat.
% We can score against all the ..
length(truth)
% .. 35 different tap records for this tempo by passing them all to
% the scoring function:
[err,ins,del,tru,hh,dd] = beat_score(beats,truth,collar,verbose);
% We don't line up with all the human taps, but then no single beat
% track ever could because the humans have too much spread.
Overall error=   3.2% (   1 ins,    1 del,   63 true)

ans =

    40

Overall error=  25.1% ( 330 ins,   89 del, 2279 true)
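
The single-track figure above is consistent with an error rate of (insertions + deletions) / true beats, since (1 + 1) / 63 is about 3.2%. The sketch below shows one way such a collar-based score could be computed. It is an assumption-laden illustration, not the source of beat_score: the beat times are made up, the tempo period is taken as the median inter-tap interval, and matching is a simple greedy nearest-neighbour pass.

```matlab
% Hypothetical sketch of collar-based scoring; NOT the source of beat_score.
sysbeats  = [0.52 1.01 1.49 2.00 2.75 3.02];  % made-up system beat times (s)
truebeats = [0.50 1.00 1.50 2.00 2.50 3.00];  % made-up reference taps (s)
collar = 0.2;                                  % fraction of the beat period

period = median(diff(truebeats));  % assumed definition of the tempo period
tol = collar * period;             % absolute tolerance in seconds

used = false(size(sysbeats));      % system beats already matched to a true beat
del = 0;                           % true beats with no system beat nearby
for k = 1:length(truebeats)
    dist = abs(sysbeats - truebeats(k));
    dist(used) = Inf;              % each system beat may match only once
    [dmin, j] = min(dist);
    if dmin <= tol
        used(j) = true;            % hit: nearest free system beat is within the collar
    else
        del = del + 1;             % miss: count a deletion
    end
end
ins = sum(~used);                  % system beats that matched no true beat
err = (ins + del) / length(truebeats);
fprintf('Overall error=%6.1f%% (%d ins, %d del, %d true)\n', ...
        100*err, ins, del, length(truebeats));
```

On this toy data the system beat at 2.75 s falls outside the 0.1 s collar around the tap at 2.50 s, so the sketch counts one insertion and one deletion against six true beats.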

Testing against a set of examples

We can evaluate the beat tracker against a whole set of examples:

dirname = 'mirex06examples';
files = dir(fullfile(dirname,'*.wav'));
fnames = cell(1, length(files));  % start from a fresh list of the right size
for i = 1:length(files)
    fnames{i} = fullfile(dirname, files(i).name);
end
beat_test(fnames);
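% The overall average error rate is high, but it varies widely across
% examples: it stays below 22% for the seven files where the estimated
% tempo matches the users' tempo, and exceeds 60% wherever the two
% differ (by a factor of roughly 1.5, 2, or 3).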
Error for train01 (tempo est=129.2, users=129.3 BPM, 35 true tracks): Overall error=  11.6% ( 159 ins,   84 del, 2130 true)
Error for train02 (tempo est=168.1, users= 83.9 BPM, 26 true tracks): Overall error= 113.3% (1129 ins,   13 del, 1016 true)
Error for train03 (tempo est=101.8, users=153.7 BPM, 26 true tracks): Overall error= 110.1% ( 740 ins, 1291 del, 1851 true)
Error for train04 (tempo est=127.6, users= 42.0 BPM, 28 true tracks): Overall error= 223.6% (1179 ins,    4 del,  538 true)
Error for train05 (tempo est= 68.4, users= 68.5 BPM, 34 true tracks): Overall error=  11.0% (  71 ins,   36 del, 1087 true)
Error for train06 (tempo est=164.1, users= 82.0 BPM, 25 true tracks): Overall error= 164.6% (1128 ins,   22 del,  920 true)
Error for train07 (tempo est=103.4, users= 56.5 BPM, 34 true tracks): Overall error= 143.4% (1035 ins,  214 del,  879 true)
Error for train08 (tempo est=147.7, users=147.8 BPM, 26 true tracks): Overall error=  21.5% ( 238 ins,  155 del, 1815 true)
Error for train09 (tempo est=128.4, users=127.7 BPM, 25 true tracks): Overall error=  17.0% ( 165 ins,   88 del, 1498 true)
Error for train10 (tempo est= 61.2, users= 61.2 BPM, 20 true tracks): Overall error=  10.3% (  41 ins,   15 del,  574 true)
Error for train11 (tempo est=139.2, users=139.9 BPM, 26 true tracks): Overall error=  20.0% ( 202 ins,  133 del, 1725 true)
Error for train12 (tempo est=122.0, users= 54.1 BPM, 27 true tracks): Overall error= 161.4% ( 999 ins,   81 del,  675 true)
Error for train13 (tempo est=119.5, users=181.3 BPM, 30 true tracks): Overall error= 112.3% (1039 ins, 1810 del, 2541 true)
Error for train14 (tempo est=130.0, users=130.7 BPM, 29 true tracks): Overall error=  12.2% ( 126 ins,   94 del, 1824 true)
Error for train15 (tempo est=185.4, users= 62.0 BPM, 28 true tracks): Overall error= 207.2% (1709 ins,    4 del,  831 true)
Error for train16 (tempo est=182.9, users= 90.7 BPM, 20 true tracks): Overall error= 131.3% (1022 ins,   28 del,  806 true)
Error for train17 (tempo est= 93.1, users= 46.1 BPM, 24 true tracks): Overall error= 127.0% ( 611 ins,   66 del,  535 true)
Error for train18 (tempo est=129.6, users= 61.2 BPM, 34 true tracks): Overall error= 151.1% (1320 ins,  177 del,  996 true)
Error for train19 (tempo est= 93.5, users=188.1 BPM, 34 true tracks): Overall error=  63.5% ( 221 ins, 1730 del, 3073 true)
Error for train20 (tempo est=110.0, users=219.8 BPM, 38 true tracks): Overall error=  76.9% ( 560 ins, 2551 del, 4043 true)
Overall average error: 94.5% (13694 ins,  8596 del, 29357 true)
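
The numbers in this listing can be checked with a few lines of arithmetic. The sketch below copies the per-file values from the output above. It suggests that the 94.5% figure is the unweighted mean of the 20 per-file error rates rather than a ratio pooled over all taps, and it separates the files by whether the estimated tempo is close to the users' tempo. The 0.1-octave threshold used for "close" is an arbitrary assumption made for this illustration.

```matlab
% Values copied from the beat_test output above (train01 .. train20)
rates = [11.6 113.3 110.1 223.6 11.0 164.6 143.4 21.5 17.0 10.3 ...
         20.0 161.4 112.3 12.2 207.2 131.3 127.0 151.1 63.5 76.9];   % error, %
est   = [129.2 168.1 101.8 127.6  68.4 164.1 103.4 147.7 128.4  61.2 ...
         139.2 122.0 119.5 130.0 185.4 182.9  93.1 129.6  93.5 110.0]; % BPM
users = [129.3  83.9 153.7  42.0  68.5  82.0  56.5 147.8 127.7  61.2 ...
         139.9  54.1 181.3 130.7  62.0  90.7  46.1  61.2 188.1 219.8]; % BPM
ins_total = 13694; del_total = 8596; true_total = 29357;

% Two ways of averaging: per-file mean versus pooled over all taps
mean_rate   = mean(rates);
pooled_rate = 100 * (ins_total + del_total) / true_total;
fprintf('mean of per-file rates: %.1f%%\n', mean_rate);
fprintf('pooled (ins+del)/true:  %.1f%%\n', pooled_rate);

% Split files by tempo agreement (threshold of 0.1 octave is an assumption)
match = abs(log2(est ./ users)) < 0.1;
fprintf('%d files with matching tempo, worst error %.1f%%\n', ...
        sum(match), max(rates(match)));
fprintf('%d files with mismatched tempo, best error %.1f%%\n', ...
        sum(~match), min(rates(~match)));
```

The split makes the source of the high average visible: the large error rates coincide with files where the estimated tempo sits at a different metrical level from the one most listeners tapped.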

Download

You can download the source code for this demo here.

Acknowledgements

This material was developed for my course ELEN E4896 Music Signal Processing, under partial support from the NSF under project IIS-1117015.

2012-03-28 Dan Ellis dpwe@ee.columbia.edu