%%%%%%%%%%%%%%%%%%%%%%%%
Classification of Consumer Video by Soundtrack

Courtenay Cotton
cvcotton@ee.columbia.edu
11/30/2010

%%%%%%%%%%%%%%%%%%%%%%%%
Introduction:

This package contains audio data and Matlab code to reproduce the baseline concept classification experiments 
(single-Gaussian models only) described in:

K. Lee and D. Ellis (2010). Audio-Based Semantic Concept Classification for Consumer Video.
IEEE Trans. Audio, Speech, and Lang. Proc., vol. 18, no. 6, pp. 1406-1416, Aug. 2010.


%%%%%%%%%%%%%%%%%%%%%%%%
Contents:

- 1873 consumer video soundtracks (in mp3 format)
- train, validation, and test lists for 5 partitions of the data, with binary labels over 25 concept classes
- list of concept names
- Matlab code to perform concept classification experiment


%%%%%%%%%%%%%%%%%%%%%%%%
Installation and Use:

To run this code, add it to your Matlab path.
The experiment also depends on three external packages, which you must install separately:

- mp3read:
www.ee.columbia.edu/~dpwe/resources/matlab/mp3read.html
You won't need mp3write.

- rastamat:
www.ee.columbia.edu/~dpwe/resources/matlab/rastamat

- LIBSVM's Matlab interface:
www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab
You will need the package that is maintained by the LIBSVM authors at National Taiwan University.
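
A quick way to confirm that the three packages are visible to Matlab is sketched below.
The function names checked here are assumptions about what each package provides (mp3read from mp3read, melfcc from rastamat, svmtrain/svmpredict from LIBSVM); runBaseline.m may call other functions as well.

```matlab
% Check that one representative function from each external package is on the path.
% The names below are assumptions, not a list taken from runBaseline.m.
needed = {'mp3read', 'melfcc', 'svmtrain', 'svmpredict'};
found  = false(size(needed));
for k = 1:numel(needed)
    % exist(...,'file') returns 2 for an .m file and 3 for a MEX file
    found(k) = exist(needed{k}, 'file') > 0;
    if found(k)
        fprintf('%-12s found: %s\n', needed{k}, which(needed{k}));
    else
        fprintf('%-12s NOT found on the path\n', needed{k});
    end
end
```

Some older Matlab toolboxes also shipped a function named svmtrain, so it is worth checking that the path printed for svmtrain points to the LIBSVM folder.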


To run the experiment:

"runBaseline(distType,plotResults,dataDir)"

distType = 1 for Mahalanobis distance (default), 2 for KL divergence
plotResults = 1 to display bar graph of average precision results (default), 0 otherwise
dataDir = location of 'data' folder with mp3s and labels (by default, looks in the same directory where "runBaseline.m" resides)
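
For example, the call below runs the KL divergence variant without the bar graph.
The data location is a hypothetical placeholder, and the guard is only there so that the snippet fails gently if the package is not yet on the path.

```matlab
distType    = 2;                      % 2 = KL divergence
plotResults = 0;                      % 0 = do not display the bar graph
dataDir     = fullfile(pwd, 'data');  % hypothetical location of the 'data' folder

if exist('runBaseline', 'file') == 2
    runBaseline(distType, plotResults, dataDir);
else
    warning('runBaseline.m is not on the Matlab path.');
end
```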


%%%%%%%%%%%%%%%%%%%%%%%%
Experiment overview:

The function "runBaseline.m" computes the mean and covariance of MFCC features for each soundtrack file.
It then performs the following experiment over each of the 5 partitions of the data files.
The distances between files are computed according to the distType specified.
For each of the 25 concepts, an SVM is trained with a kernel created from these distances.
(More precisely, the SVM parameters {gamma,C} are first selected by training SVMs with different settings and testing each on the validation set.
A final SVM for the concept is then trained with the best settings.)
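
The steps up to this point can be illustrated with a minimal sketch on synthetic data.
It is not the code in runBaseline.m: the random frames stand in for MFCC features, the shared covariance used for the Mahalanobis distance (the average of the per-file covariances), the symmetrized form of the KL divergence, the exponential kernel, and the gamma value are all assumptions made for illustration.

```matlab
% Synthetic stand-in for per-file MFCC frames (frames x dims); illustrative only.
nFiles = 6;
nDims  = 4;
mu     = zeros(nFiles, nDims);
Sigma  = zeros(nDims, nDims, nFiles);
for i = 1:nFiles
    frames       = randn(200, nDims) * (0.5 + 0.1*i) + i/10;
    mu(i,:)      = mean(frames, 1);    % single-Gaussian model: mean ...
    Sigma(:,:,i) = cov(frames);        % ... and full covariance
end

% Assumption: one shared covariance, taken as the average over files.
S = mean(Sigma, 3);

D   = zeros(nFiles);   % Mahalanobis-style distance between file means
Dkl = zeros(nFiles);   % symmetrized KL divergence between the Gaussians
for i = 1:nFiles
    for j = 1:nFiles
        d  = (mu(i,:) - mu(j,:))';
        Si = Sigma(:,:,i);
        Sj = Sigma(:,:,j);
        D(i,j)   = d' * (S \ d);
        % KL(i||j) + KL(j||i) in closed form; the log-determinant terms cancel.
        Dkl(i,j) = 0.5 * (trace(Sj \ Si) + trace(Si \ Sj) - 2*nDims ...
                          + d' * (Si \ d) + d' * (Sj \ d));
    end
end

% Assumption: distances are turned into a kernel as exp(-gamma * distance).
gamma = 1 / mean(D(:));   % hypothetical value; the experiment tunes gamma
K     = exp(-gamma * D);
```

LIBSVM's Matlab interface accepts a precomputed kernel through the '-t 4' option, with a column of sample indices prepended to the kernel matrix; whether runBaseline.m takes that route is not stated in this README.
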
For each concept, files in the test set are ranked according to the SVM decision values and the performance is evaluated 
by calculating the average precision (AP) of this list.
The AP results, averaged over the 5 partitions, are reported and displayed alongside the expected AP of random guessing.
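
As an illustration of the evaluation step, the sketch below computes the average precision of one ranked list.
The scores and labels are made-up values, and the guessing baseline is approximated here by the fraction of positive files, which is an assumption about how the reported baseline is obtained.

```matlab
% Hypothetical SVM decision values and binary labels for one concept.
scores = [0.9 0.8 0.7 0.6 0.5 0.4];
labels = [1   0   1   0   0   1];

% Rank files by decreasing decision value.
[~, order] = sort(scores, 'descend');
ranked     = labels(order);

% Precision at each rank, then average over the ranks of the positive files.
hits    = cumsum(ranked);
precAtK = hits ./ (1:numel(ranked));
ap      = sum(precAtK .* ranked) / sum(ranked);

% Assumed guessing baseline: fraction of positive files for the concept.
prior = mean(labels);

fprintf('AP = %.3f, guessing baseline = %.3f\n', ap, prior);
```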

%%%%%%%%%%%%%%%%%%%%%%%%
Results:

If everything is working correctly, you should get approximately the following results (for distType = 1, Mahalanobis):

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Average Precision results: %%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 
1.animal: 0.314
2.baby: 0.333
3.beach: 0.274
4.birthday: 0.302
5.boat: 0.189
6.crowd: 0.688
7.graduation: 0.261
8.group of three or more: 0.891
9.group of two: 0.261
10.museum: 0.196
11.night: 0.414
12.one person: 0.416
13.park: 0.275
14.picnic: 0.225
15.playground: 0.146
16.show: 0.588
17.sports: 0.216
18.sunset: 0.300
19.wedding: 0.370
20.dancing: 0.310
21.parade: 0.196
22.singing: 0.589
23.ski: 0.394
24.cheer: 0.619
25.music: 0.864
Mean Average Precision (MAP) over all concepts: 0.385

%%%%%%%%%%%%%%%%%%%%%%%%


