beatles_fprint - fingerprint-based alignment of labels to Beatles audio

This package provides a set of routines and a precomputed database that automatically align the Beatles annotations available from http://isophonics.net/ to local versions of the relevant audio.

Because of variability in the mastering process, classic albums that have multiple digital masterings will end up with multiple, slightly different, digital versions. Not only may there be different silent gaps at the beginning of each track, but even the timing within the track will differ (thanks, e.g., to imperfections in the tape running speed, or stretching of the master tapes). In our experience, differences of 0.1% or larger are quite common; this is entirely imperceptible, but labels made on the basis of one version of the track will be off by entire beats by the end of a typical track of several hundred seconds. Hence the need for a tool that estimates these deviations, and modifies the original label files to match the locally-available audio.

The routines here compute an audio fingerprint for any input track, then identify it within the 180-track Beatles canon (in the provided database). The fingerprint is robust to small timing changes; however, by looking at the trend of timing matches throughout the track, we can estimate a timing offset and scaling that maximize agreement between the input audio and the reference landmarks.
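To illustrate the idea of reading an offset and scaling off the trend of timing matches, here is a minimal sketch on synthetic data. It is not the code used inside find_beatles_timeskew; the match times, the jitter, the spurious matches, and the 1 second rejection threshold are all assumptions made up for the example. It fits a least-squares line, drops matches far from the median residual, and refits.

```matlab
% Synthetic landmark matches (all values here are made up for illustration)
true_offset = -0.93;                        % seconds
true_slope  = 0.9989;                       % ratio
t_ref = (0:2:258)';                         % match times in the reference track
t_in  = true_offset + true_slope * t_ref ...
        + 0.005 * sin(37 * t_ref);          % small deterministic timing jitter
t_in(10:10:end) = t_in(10:10:end) + 20;     % a few spurious matches

% First pass: ordinary least-squares line through all matches
p = polyfit(t_ref, t_in, 1);
resid = t_in - polyval(p, t_ref);

% Second pass: drop matches far from the median residual, then refit
keep = abs(resid - median(resid)) < 1.0;    % 1 second tolerance (arbitrary)
p = polyfit(t_ref(keep), t_in(keep), 1);

est_slope  = p(1);
est_offset = p(2);
fprintf('Estimated mapping: %.5f + %.5f*T\n', est_offset, est_slope);
```

The second pass matters because a handful of spurious hash matches can pull a plain least-squares line well away from the true trend.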

These values are reported. They can also be applied to label files that were based on the reference audio to construct a modified label file whose labels are the same, but whose timings are matched to the local audio.

Example usage

Passing a waveform as a vector plus a sampling rate performs the fingerprint matching and timing estimation. Setting the third argument to 1 also plots the scatter of matching landmarks, highlighting the linear fit used to estimate the slope.

FN = 'come_together.wav';
% Note: wavread was removed from later MATLAB releases; audioread(FN)
% is the replacement there.
[d,sr] = wavread(FN);
doplot = 1;
[T,O,S] = find_beatles_timeskew(d,sr,doplot);
% T returns the track name, O the offset (in sec), and S the timing
% slope (ratio).
% Zoom in on the matching part of the fingerprint scatter plot to show
% the slight drift in time difference
axis([0 260 0.5 1.5])

Track <input data> matched as The Beatles/11_-_Abbey_Road/01_-_Come_Together (0.23636 hash matches)
Best match for time T sec in reference track is -0.92948+0.99891*T in input track
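To make the reported line concrete, the short sketch below evaluates the mapping at a few reference times. The offset and slope are copied from the example output above; it assumes the returned O and S follow the same convention as the printed "Best match" line, and the chosen reference times are arbitrary.

```matlab
% Values copied from the example output above
O = -0.92948;                   % offset in seconds
S = 0.99891;                    % timing slope (ratio)

t_ref = [0 60 120 180 240];     % times in the reference track (sec)
t_in  = O + S * t_ref;          % corresponding times in the input track
shift = t_in - t_ref;           % how far an unmodified label would be off

% One row per reference time: reference time, input time, shift
disp([t_ref; t_in; shift]');
```

The shift starts at the offset and grows in magnitude through the track, which is the drift visible in the zoomed scatter plot.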

Modifying Label Files

You can rewrite label files using the offset and slope returned by find_beatles_timeskew. Here, I have separately installed the Isophonics Beatles data in the directory beatles/.

ref_labeldir = 'annotations/chordlab';
dst_labeldir = 'scratch';
skew_labelfile(T, ref_labeldir, dst_labeldir, O, S);
% This reads a label file with the stem T from the ref_labeldir,
% and writes a modified version with the same name to dst_labeldir.

wrote scratch/The Beatles/11_-_Abbey_Road/01_-_Come_Together.lab
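For intuition about what rewriting a label file involves, here is a minimal sketch of the time mapping applied to a few label rows. It is not the implementation of skew_labelfile. The rows are made up, the "start end label" layout is an assumption about the chord label format, and clipping negative times to zero is just one possible choice for labels that fall before the start of the local audio.

```matlab
% Offset and slope from the Come Together example
O = -0.92948;
S = 0.99891;

% Hypothetical label rows in an assumed "start end label" layout
starts = [0.000; 1.010; 4.650];
ends   = [1.010; 4.650; 8.300];
labels = {'N'; 'D:min'; 'A:7'};

% Map reference times to local audio times; labels are left unchanged
new_starts = max(O + S * starts, 0);   % clip times before the audio starts
new_ends   = max(O + S * ends,   0);

for i = 1:numel(labels)
    fprintf('%.6f\t%.6f\t%s\n', new_starts(i), new_ends(i), labels{i});
end
```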

Modifying Audio Files

In August 2013, we had a mini-crisis when we tried to use Isophonics segmentation files and got terrible results. It turned out we were pairing the original Isophonics label files with audio that had been used with "isophonics" chord labels, and those chord labels had in fact been realigned to new audio using this package. This convinced me that modifying label files, and thus proliferating versions of label files that are identical except for their times, is a mistake. Instead, I think we should keep the label files unique and canonical, then fix the audio files to align to the labels. The argument is that there are already multiple versions of the audio, so we are not making things much worse by adding more. So my recommendation is not to rewrite label files, but to use this tool to resample your audio, like this:

outaudio = 'come_together_aligned.wav';
skew_audiofiles(FN, outaudio, O, S);
% Or, you can write directly from the alignment to a file named
% according to the fingerprint match:
outputdir = 'alignedaudio';
[T,O,S] = find_beatles_timeskew(FN,0,0,outputdir);
% Re-running the match on the aligned audio shows it is now well aligned
doplot = 1;
find_beatles_timeskew(fullfile(outputdir, [T, '.wav']),0,doplot);
axis([0 260 -0.5 0.5])

Wrote skewed audio to come_together_aligned.wav
wrote come_together_aligned.wav
Track come_together.wav matched as The Beatles/11_-_Abbey_Road/01_-_Come_Together (0.23681 hash matches)
Best match for time T sec in reference track is -0.92950+0.99891*T in input track
Wrote skewed audio to alignedaudio/The Beatles/11_-_Abbey_Road/01_-_Come_Together.wav
Track alignedaudio/The Beatles/11_-_Abbey_Road/01_-_Come_Together.wav matched as The Beatles/11_-_Abbey_Road/01_-_Come_Together (0.32581 hash matches)
Best match for time T sec in reference track is -0.00010+1.00000*T in input track
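As a rough picture of what aligning audio to the reference means, the sketch below warps a test tone so that output time T corresponds to input time O + S*T. It is only an illustration under stated assumptions, not how skew_audiofiles works: the tone, the sampling rate, and the O and S values are made up, and plain linear interpolation is used for brevity where a real tool would want proper band-limited resampling.

```matlab
sr = 8000;                            % sampling rate (illustrative)
t  = (0:sr-1)' / sr;                  % time stamps for one second of input
x  = sin(2 * pi * 440 * t);           % stand-in for the local audio
O  = -0.05;                           % illustrative offset (sec)
S  = 1.001;                           % illustrative timing slope

% Output sample n sits at reference time n/sr and is read from the
% input at time O + S*n/sr
n_out = floor((numel(x) - 1 - O * sr) / S) + 1;
t_ref = (0:n_out-1)' / sr;
t_in  = O + S * t_ref;

% Linear interpolation; times outside the input are filled with silence
y = interp1(t, x, t_in, 'linear', 0);
```

With a negative offset, as here, the output begins with a short stretch of silence, because the reference starts before the local audio does.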

Bulk usage for Labels

% Passing a filename will read an audio file (using my audioread
% package, available at
% http://www.ee.columbia.edu/~dpwe/resources/matlab/audioread/ ):
[T,O,S] = find_beatles_timeskew(FN);

% Passing in a cell array of filenames will return a cell array of
% matched track names, and vectors of offsets and slopes.
TT = myls('beatles/mp3s-32k/Abbey_Road/*.mp3');
[T,O,S] = find_beatles_timeskew(TT);
% You can also provide an <outputdir> to write a whole batch of
% aligned output audio files.

% To modify a whole set of labels (if that's really what you want
% to do - see above) then you can rewrite the label files in a
% single stroke by passing a cell array of names, and vectors of
% offsets and slopes:
skew_labelfile(T, ref_labeldir, dst_labeldir, O, S);

Track come_together.wav matched as The Beatles/11_-_Abbey_Road/01_-_Come_Together (0.23681 hash matches)
Best match for time T sec in reference track is -0.92950+0.99891*T in input track
Track beatles/mp3s-32k/Abbey_Road/01-Come_Together.mp3 matched as The Beatles/11_-_Abbey_Road/01_-_Come_Together (0.1174 hash matches)
Best match for time T sec in reference track is 1.58036+1.00031*T in input track
Track beatles/mp3s-32k/Abbey_Road/02-Something.mp3 matched as The Beatles/11_-_Abbey_Road/02_-_Something (0.10397 hash matches)
Best match for time T sec in reference track is 0.92896+0.99972*T in input track
Track beatles/mp3s-32k/Abbey_Road/03-Maxwell_s_Silver_Hammer.mp3 matched as The Beatles/11_-_Abbey_Road/03_-_Maxwell's_Silver_Hammer (0.11016 hash matches)
Best match for time T sec in reference track is 0.90552+1.00019*T in input track
Track beatles/mp3s-32k/Abbey_Road/04-Oh_Darling.mp3 matched as The Beatles/11_-_Abbey_Road/04_-_Oh!_Darling (0.11215 hash matches)
Best match for time T sec in reference track is 0.72737+1.00019*T in input track
Track beatles/mp3s-32k/Abbey_Road/05-Octopus_s_Garden.mp3 matched as The Beatles/11_-_Abbey_Road/05_-_Octopus's_Garden (0.091792 hash matches)
Best match for time T sec in reference track is 0.42004+0.99981*T in input track
Track beatles/mp3s-32k/Abbey_Road/06-I_Want_You_She_s_So_Heavy_.mp3 matched as The Beatles/11_-_Abbey_Road/06_-_I_Want_You (0.24647 hash matches)
Best match for time T sec in reference track is 0.99069+0.99991*T in input track
Track beatles/mp3s-32k/Abbey_Road/07-Here_Comes_The_Sun.mp3 matched as The Beatles/11_-_Abbey_Road/07_-_Here_Comes_The_Sun (0.042691 hash matches)
Best match for time T sec in reference track is 1.17460+0.99974*T in input track
Track beatles/mp3s-32k/Abbey_Road/08-Because.mp3 matched as The Beatles/11_-_Abbey_Road/08_-_Because (0.094481 hash matches)
Best match for time T sec in reference track is 0.54610+1.00010*T in input track
Track beatles/mp3s-32k/Abbey_Road/09-You_Never_Give_Me_Your_Money.mp3 matched as The Beatles/11_-_Abbey_Road/09_-_You_Never_Give_Me_Your_Money (0.11821 hash matches)
Best match for time T sec in reference track is 1.17793+1.00031*T in input track
Track beatles/mp3s-32k/Abbey_Road/10-Sun_King.mp3 matched as The Beatles/11_-_Abbey_Road/10_-_Sun_King (0.095495 hash matches)
Best match for time T sec in reference track is 5.66477+1.00015*T in input track
Track beatles/mp3s-32k/Abbey_Road/11-Mean_Mr_Mustard.mp3 matched as The Beatles/11_-_Abbey_Road/11_-_Mean_Mr_Mustard (0.11119 hash matches)
Best match for time T sec in reference track is 0.56729+1.00024*T in input track
Track beatles/mp3s-32k/Abbey_Road/12-Polythene_Pam.mp3 matched as The Beatles/11_-_Abbey_Road/12_-_Polythene_Pam (0.049852 hash matches)
Best match for time T sec in reference track is 0.33060+1.00024*T in input track
Track beatles/mp3s-32k/Abbey_Road/13-She_Came_In_Through_The_Bathroom_Window.mp3 matched as The Beatles/11_-_Abbey_Road/13_-_She_Came_In_Through_The_Bathroom_Window (0.097372 hash matches)
Best match for time T sec in reference track is 24.80262+1.00030*T in input track
Track beatles/mp3s-32k/Abbey_Road/14-Golden_Slumbers.mp3 matched as The Beatles/11_-_Abbey_Road/14_-_Golden_Slumbers (0.088551 hash matches)
Best match for time T sec in reference track is 0.58838+0.99981*T in input track
Track beatles/mp3s-32k/Abbey_Road/15-Carry_That_Weight.mp3 matched as The Beatles/11_-_Abbey_Road/15_-_Carry_That_Weight (0.10436 hash matches)
Best match for time T sec in reference track is 0.59553+0.99995*T in input track
Track beatles/mp3s-32k/Abbey_Road/16-The_End.mp3 matched as The Beatles/11_-_Abbey_Road/16_-_The_End (0.093376 hash matches)
Best match for time T sec in reference track is 1.04492+1.00035*T in input track
Track beatles/mp3s-32k/Abbey_Road/17-Her_Majesty.mp3 matched as The Beatles/11_-_Abbey_Road/17_-_Her_Majesty (0.080092 hash matches)
Best match for time T sec in reference track is 1.01315+1.00042*T in input track
wrote scratch/The Beatles/11_-_Abbey_Road/01_-_Come_Together.lab
wrote scratch/The Beatles/11_-_Abbey_Road/02_-_Something.lab
wrote scratch/The Beatles/11_-_Abbey_Road/03_-_Maxwell's_Silver_Hammer.lab
wrote scratch/The Beatles/11_-_Abbey_Road/04_-_Oh!_Darling.lab
wrote scratch/The Beatles/11_-_Abbey_Road/05_-_Octopus's_Garden.lab
wrote scratch/The Beatles/11_-_Abbey_Road/06_-_I_Want_You.lab
wrote scratch/The Beatles/11_-_Abbey_Road/07_-_Here_Comes_The_Sun.lab
wrote scratch/The Beatles/11_-_Abbey_Road/08_-_Because.lab
wrote scratch/The Beatles/11_-_Abbey_Road/09_-_You_Never_Give_Me_Your_Money.lab
wrote scratch/The Beatles/11_-_Abbey_Road/10_-_Sun_King.lab
wrote scratch/The Beatles/11_-_Abbey_Road/11_-_Mean_Mr_Mustard.lab
wrote scratch/The Beatles/11_-_Abbey_Road/12_-_Polythene_Pam.lab
wrote scratch/The Beatles/11_-_Abbey_Road/13_-_She_Came_In_Through_The_Bathroom_Window.lab
wrote scratch/The Beatles/11_-_Abbey_Road/14_-_Golden_Slumbers.lab
wrote scratch/The Beatles/11_-_Abbey_Road/15_-_Carry_That_Weight.lab
wrote scratch/The Beatles/11_-_Abbey_Road/16_-_The_End.lab
wrote scratch/The Beatles/11_-_Abbey_Road/17_-_Her_Majesty.lab

Working with beat files

Colin Raffel modified this code to work with the beat annotation files instead of the chord annotations. The changes were all in skew_labelfile, and I include his modified version as skew_beatfile. You should be able to run this directly on directories of beat annotations from isophonics.net to adjust their timings to match your audio.

Thanks to Colin for sharing these changes.

Installation

You can download a zip file containing the Matlab scripts and the hash table database from beatles_fprint.zip.

This includes the script rebuild_ref which can be modified and used to rebuild the reference hash table for new audio (or for any other purpose).

% Changes

% 2013-08-30 - Modified matching ordering to be normalized by
%              number of hashes for each track in hash table.
%            - Added provisions for rewriting resampled audio.

% Last updated: $Date: 2013/08/30 23:28:57 $
% Dan Ellis <[email protected]>