
[timbre-chroma image]

Artist Identification of Music Audio
by Timbral and Chroma Features in Matlab


For information about the artist20 dataset, see the companion Artist ID page.

Music audio classification has most often been addressed by modeling the statistics of broad spectral features, which, by design, exclude pitch information and reflect mainly instrumentation. We investigate using instead beat-synchronous chroma features, designed to reflect melodic and harmonic content and to be invariant to instrumentation. Chroma features are less informative for classes such as artist, but contain information that is almost entirely independent of the spectral features, and hence the two can be profitably combined: using a simple Gaussian classifier on an 18-way pop music artist identification task, we achieve 48% accuracy with MFCCs, 25% with 4-frame chroma vectors, and 52% by combining the two.


Data

These experiments are performed over the 18-artist subset of uspop2002, for which precalculated MFCCs (and some chroma features) are freely available. This artist set was used in Mandel & Ellis 2005 and in Mandel, Poliner & Ellis 2006. The list files are available in the mandelset directory (which is also included in the timbrechroma.tgz package). The routines assume that the MFCC .htk data files and the beat-chroma .mat files live in sibling directories with standard path naming.
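
For concreteness, here is the sibling-directory layout assumed by the example transcripts below (the root directory names are just the ones used in those transcripts; any sibling layout with consistent naming should work). Each feature file mirrors the relative path of its audio file, with only the root directory and extension changed:

  ../mp3s/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.mp3
  ../mfccs/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.htk
  ../chromftrs/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.mat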

Although the code and examples on this page let you calculate these features for your own datasets, you can also download the precalculated features for both MFCCs (mfccs.tgz, 1.4GB) and beat-chroma matrices (chromfeats.tgz, 114MB). Note that the 1.4GB file is a significant download that may take some time.

We have now developed and released an improved artist identification dataset, artist20. This consists of 1412 tracks, drawn from 6 albums by each of 20 artists. Follow the link for instructions on how to obtain those data in various formats.


Code

See also the separate pages on Chroma Feature Analysis and Synthesis and on MFCC calculation, which give more detail about the feature representations used here.

You can download all the code in the timbrechroma.tgz package.

Main routines

  calc_mfccs_list.m - calculate MFCC feature files for every track in a list file
  calclistftrs.m - calculate beat-chroma feature files for every track in a list file (from the coversongID package)
  do_expt.m - train per-artist Gaussian models on the MFCC features and classify the test tracks
  do_expt_chroma.m - the corresponding experiment using the beat-chroma features
  score_lhoods.m - score a matrix of per-model log-likelihoods against ground-truth labels

Subsidiary functions

  listfileread.m - read a list file into a cell array with one string per line

Example Usage

Here's an example of calculating MFCC features for a set of tracks, building a full-covariance single-Gaussian model for each group of training tracks that shares a common label, classifying each test track to the best-matching model, and scoring the results against a given ground truth.

The code relies on two list files, tracks-train.txt and tracks-test.txt, each of which specifies a set of tracks (audio files, one per line) used for training and testing the models, respectively. The code prefixes labels- to these filenames (giving labels-tracks-train.txt and labels-tracks-test.txt in this case) to find files with the same number of lines but just one token per line, which is taken as the ground-truth label for the corresponding track.
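
For example, the first and last lines of tracks-test.txt and the matching labels-tracks-test.txt for this 18-artist set would look something like this (entries taken from the transcript below):

  tracks-test.txt:
    aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle
    ...
    u2/The_Unforgettable_Fire/10-MLK

  labels-tracks-test.txt:
    aerosmith
    ...
    u2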

>> % Make sure the MFCC calculation code is in scope
>> addpath('rastamat'); % from http://labrosa.ee.columbia.edu/matlab/rastamat/
>> % Make sure the machine learning routines are available
>> addpath('netlab'); % from http://www.ncrg.aston.ac.uk/netlab/
>> addpath('KPMtools'); % from http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
>> addpath('KPMstats'); % .. or get the whole package from http://bnt.sourceforge.net/
>> % Calculate MFCC features for the training tracks
>> tic; tks = calc_mfccs_list('tracks-train.txt','../mp3s/','.mp3','../mfccs/','.htk'); toc
>> % .. and for the test tracks
>> tic; tks = calc_mfccs_list('tracks-test.txt','../mp3s/','.mp3','../mfccs/','.htk'); toc
>> % Do an experiment: single Gaussian trained on 1000 frames using MFCCs 2:20, verbose
>> tic; [a1,c1,l1,m1] = do_expt('tracks-train.txt','tracks-test.txt',1,1000,2:20,1); toc
>> % ..reports accuracy at the end
Features = mfcc
training for aerosmith ...
...
training for u2 ...
** Matching by max l/hood of test samples
testing aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle...
...
testing u2/The_Unforgettable_Fire/10-MLK...
Classification accuracy = 52.1064%
Elapsed time is 151.350983 seconds.
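
Under the hood, the modeling and matching steps amount to fitting one full-covariance Gaussian per artist and scoring test frames by log-likelihood. Here is a minimal sketch of that idea (not do_expt's actual internals); X is a hypothetical (nframes x ndims) matrix of pooled training frames for one artist, and Xt holds the frames of one test track:

>> mu = mean(X);                   % artist model: mean (1 x ndims)
>> C = cov(X);                     %  .. and full covariance (ndims x ndims)
>> % per-frame log-likelihood of the test track under this model
>> d = Xt - repmat(mu, size(Xt,1), 1);
>> ll = -0.5*(size(Xt,2)*log(2*pi) + log(det(C)) + sum((d/C).*d, 2));
>> score = mean(ll);               % repeat per artist; report the max

Classifying a test track then just means computing this score under every artist's model and picking the largest.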

The same experiment with chroma features looks like this:

>> % Make sure the chroma calculation code is in scope
>> addpath('coversongID'); % from http://labrosa.ee.columbia.edu/projects/coversongs/
>> % Calculate chroma features for the training tracks
>> tic; tks = calclistftrs('tracks-train.txt','../mp3s/','.mp3','../chromftrs/','.mat'); toc
>> % .. and for the test tracks
>> tic; tks = calclistftrs('tracks-test.txt','../mp3s/','.mp3','../chromftrs/','.mat'); toc
>> % Do an experiment: 64 mix GMM trained on 1000 frames using 1 temporal frame with key normalization, verbose
>> tic; [a2,c2,l2,m2] = do_expt_chroma('tracks-train.txt','tracks-test.txt',64,1000,1,1,1); toc
training for aerosmith ...
...
training for u2 ...
testing aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle...
...
testing u2/The_Unforgettable_Fire/10-MLK...
Classification accuracy = 24.612%
Elapsed time is 1413.342648 seconds.
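
The key normalization flag asks do_expt_chroma to fold out overall transposition before modeling. One simple scheme (a sketch of the idea only; the exact normalization inside do_expt_chroma may differ) is to rotate each track's 12 x nbeats chroma matrix F so that its strongest average bin lands in position 1:

>> [mx, k] = max(mean(F, 2));   % dominant chroma bin for this track
>> Fnorm = circshift(F, 1 - k); % rotate rows so bin k becomes bin 1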

Chroma features are not great by themselves, but the likelihoods from those models combine profitably with those from MFCC:

>> % Build the ground-truth index vector
>> test_labs = listfileread('labels-tracks-test.txt');
>> ulb = unique(test_labs);
>> for i = 1:length(test_labs); gt(i) = find(strcmp(ulb,test_labs{i})); end
>> % score_lhoods runs just the scoring part of do_expt; run on the log-likelihood matrix from the MFCC experiment
>> score_lhoods(l1,gt);
Classification accuracy = 52.1064%
>> % .. as before.  Now add in weighted log-likelihoods from the chroma experiment
>> score_lhoods(l1+.75*l2,gt);                                    
Classification accuracy = 54.7672%
>> % .. it helps (a little!)
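
The 0.75 weighting above is just one choice; since score_lhoods prints its accuracy, it is easy to sweep the chroma weight and see where the combination peaks (the weights here are illustrative):

>> for w = 0:0.25:1.5; fprintf('w = %.2f: ', w); score_lhoods(l1 + w*l2, gt); end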

Papers

D. Ellis (2007). Classifying Music Audio with Timbral and Chroma Features,
submitted to the Int. Conf. on Music Information Retrieval (ISMIR-07), Vienna, September 2007.

Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Grant Nos. IIS-0238301 and IIS-0713334. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

This work was also supported by the Columbia Academic Quality Fund.


Last updated: 2007/05/01
Dan Ellis <[email protected]>