
[timbre-chroma image]

Artist Identification of Music Audio
by Timbral and Chroma Features in Matlab


For information about the artist20 dataset, see the companion Artist ID page.

Music audio classification has most often been addressed by modeling the statistics of broad spectral features, which, by design, exclude pitch information and reflect mainly instrumentation. We investigate using instead beat-synchronous chroma features, designed to reflect melodic and harmonic content and to be invariant to instrumentation. Chroma features are less informative for classes such as artist, but contain information that is almost entirely independent of the spectral features, and hence the two can be profitably combined: using a simple Gaussian classifier on an 18-way pop music artist identification task, we achieve 48% accuracy with MFCCs, 25% with 4-frame chroma vectors, and 52% by combining the two.


Data

These experiments are performed over the 18-artist subset of uspop2002, for which precalculated MFCCs (and some chroma features) are freely available. This artist set was used in Mandel & Ellis 2005 and in Mandel, Poliner & Ellis 2006. The list files are available in the mandelset directory (which is also included in the timbrechroma.tgz package). The routines assume that the MFCC .htk data files and the beat-chroma .mat files live in sibling directories with standard path naming.
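
For concreteness, here is the sibling-directory layout assumed by the example transcripts below (the root directory names are just the ones used in those transcripts; any sibling layout with consistent naming should work). Each feature file mirrors the relative path of its audio file, with only the root directory and extension changed:

  ../mp3s/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.mp3
  ../mfccs/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.htk
  ../chromftrs/aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle.mat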

Although the code and examples on this page let you calculate these features for your own datasets, you can also download the precalculated features for both MFCCs (mfccs.tgz, 1.4GB) and beat-chroma matrices (chromfeats.tgz, 114MB). Note that the 1.4GB file is a significant download that may take some time.

We have now developed and released an improved artist identification dataset, artist20. This consists of 1412 tracks, drawn from 6 albums by each of 20 artists. Follow the link for instructions on how to obtain those data in various formats.


Code

See also the separate pages on Chroma Feature Analysis and Synthesis and on MFCC calculation, which give more detail about the feature representations used here.

You can download all the code in the timbrechroma.tgz package.

Main routines

  calc_mfccs_list.m - calculate MFCC feature files for every track in a list file
  calclistftrs.m - calculate beat-chroma feature files for every track in a list file (from the coversongID package)
  do_expt.m - train per-artist Gaussian models on the MFCC features and classify the test tracks
  do_expt_chroma.m - the corresponding experiment using the beat-chroma features
  score_lhoods.m - score a matrix of per-model log-likelihoods against ground-truth labels

Subsidiary functions

  listfileread.m - read a list file into a cell array with one string per line

Example Usage

Here's an example of calculating MFCC features for a set of tracks, building a full-covariance single-Gaussian model for each group of training tracks that shares a common label, classifying each test track to the best-matching model, and scoring the results against a given ground truth.

The code relies on two list files, tracks-train.txt and tracks-test.txt, each of which specifies a set of tracks (audio files, one per line) used for training and testing the models, respectively. The code prefixes labels- to these filenames (giving labels-tracks-train.txt and labels-tracks-test.txt in this case) to find files with the same number of lines but just one token per line, which is taken as the ground-truth label for the corresponding track.
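
For example, the first and last lines of tracks-test.txt and the matching labels-tracks-test.txt for this 18-artist set would look something like this (entries taken from the transcript below):

  tracks-test.txt:
    aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle
    ...
    u2/The_Unforgettable_Fire/10-MLK

  labels-tracks-test.txt:
    aerosmith
    ...
    u2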

>> % Make sure the MFCC calculation code is in scope
>> addpath('rastamat'); % from http://labrosa.ee.columbia.edu/matlab/rastamat/
>> % Make sure the machine learning routines are available
>> addpath('netlab'); % from http://www.ncrg.aston.ac.uk/netlab/
>> addpath('KPMtools'); % from http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html
>> addpath('KPMstats'); % .. or get the whole package from http://bnt.sourceforge.net/
>> % Calculate MFCC features for the training tracks
>> tic; tks = calc_mfccs_list('tracks-train.txt','../mp3s/','.mp3','../mfccs/','.htk'); toc
>> % .. and for the test tracks
>> tic; tks = calc_mfccs_list('tracks-test.txt','../mp3s/','.mp3','../mfccs/','.htk'); toc
>> % Do an experiment: single Gaussian trained on 1000 frames using MFCCs 2:20, verbose
>> tic; [a1,c1,l1,m1] = do_expt('tracks-train.txt','tracks-test.txt',1,1000,2:20,1); toc
>> % ..reports accuracy at the end
Features = mfcc
training for aerosmith ...
...
training for u2 ...
** Matching by max l/hood of test samples
testing aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle...
...
testing u2/The_Unforgettable_Fire/10-MLK...
Classification accuracy = 52.1064%
Elapsed time is 151.350983 seconds.
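
Under the hood, the modeling and matching steps amount to fitting one full-covariance Gaussian per artist and scoring test frames by log-likelihood. Here is a minimal sketch of that idea (not do_expt's actual internals); X is a hypothetical (nframes x ndims) matrix of pooled training frames for one artist, and Xt holds the frames of one test track:

>> mu = mean(X);                   % artist model: mean (1 x ndims)
>> C = cov(X);                     %  .. and full covariance (ndims x ndims)
>> % per-frame log-likelihood of the test track under this model
>> d = Xt - repmat(mu, size(Xt,1), 1);
>> ll = -0.5*(size(Xt,2)*log(2*pi) + log(det(C)) + sum((d/C).*d, 2));
>> score = mean(ll);               % repeat per artist; report the max

Classifying a test track then just means computing this score under every artist's model and picking the largest.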

The same experiment with chroma features looks like this:

>> % Make sure the chroma calculation code is in scope
>> addpath('coversongID'); % from http://labrosa.ee.columbia.edu/projects/coversongs/
>> % Calculate chroma features for the training tracks
>> tic; tks = calclistftrs('tracks-train.txt','../mp3s/','.mp3','../chromftrs/','.mat'); toc
>> % .. and for the test tracks
>> tic; tks = calclistftrs('tracks-test.txt','../mp3s/','.mp3','../chromftrs/','.mat'); toc
>> % Do an experiment: 64 mix GMM trained on 1000 frames using 1 temporal frame with key normalization, verbose
>> tic; [a2,c2,l2,m2] = do_expt_chroma('tracks-train.txt','tracks-test.txt',64,1000,1,1,1); toc
training for aerosmith ...
...
training for u2 ...
testing aerosmith/A_Little_South_Of_Sanity_Explicit_Disc_2_/01-Back_In_The_Saddle...
...
testing u2/The_Unforgettable_Fire/10-MLK...
Classification accuracy = 24.612%
Elapsed time is 1413.342648 seconds.
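
The key normalization flag asks do_expt_chroma to fold out overall transposition before modeling. One simple scheme (a sketch of the idea only; the exact normalization inside do_expt_chroma may differ) is to rotate each track's 12 x nbeats chroma matrix F so that its strongest average bin lands in position 1:

>> [mx, k] = max(mean(F, 2));   % dominant chroma bin for this track
>> Fnorm = circshift(F, 1 - k); % rotate rows so bin k becomes bin 1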

Chroma features are not great by themselves, but the likelihoods from those models combine profitably with those from MFCC:

>> % Build the ground-truth index vector
>> test_labs = listfileread('labels-tracks-test.txt');
>> ulb = unique(test_labs);
>> for i = 1:length(test_labs); gt(i) = find(strcmp(ulb,test_labs{i})); end
>> % score_lhoods runs just the scoring part of do_expt; run on the log-likelihood matrix from the MFCC experiment
>> score_lhoods(l1,gt);
Classification accuracy = 52.1064%
>> % .. as before.  Now add in weighted log-likelihoods from the chroma experiment
>> score_lhoods(l1+.75*l2,gt);                                    
Classification accuracy = 54.7672%
>> % .. it helps (a little!)
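
The 0.75 weighting above is just one choice; since score_lhoods prints its accuracy, it is easy to sweep the chroma weight and see where the combination peaks (the weights here are illustrative):

>> for w = 0:0.25:1.5; fprintf('w = %.2f: ', w); score_lhoods(l1 + w*l2, gt); end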

Papers

D. Ellis (2007). Classifying Music Audio with Timbral and Chroma Features,
submitted to the Int. Conf. on Music Information Retrieval (ISMIR-07), Vienna, September 2007.

Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Grant Nos. IIS-0238301 and IIS-0713334. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

This work was also supported by the Columbia Academic Quality Fund.


Last updated: 2007/05/01
Dan Ellis <[email protected]>