ERRATA for the "uspop2002" Pop Music data set DVD distributions


As part of a project to find the most similar beat-chroma matrices within uspop, we identified 13 near-duplicate tracks, and split one track into two. As a result, the uspop used at LabROSA now consists of 8764-13+1 = 8752 files. Here are the details:

Tracks 15-27 (13 tracks) of beach_boys/Pet_Sounds are simply stereo versions of tracks 1-13, so we have excluded them. That leaves 8751 tracks in uspop. In addition, one of the original (DVD) tracks has been split in two: radiohead/Kid_A/10-Motion_Picture_Soundtrack actually contains two tracks, and has been replaced by two files, 10-Motion_Picture_Soundtrack_trimmed_ and 11-Genchildren_hidden_, for a total of 8752 uspop2002 tracks.

The canonical uspop DVDs still contain all 8764 tracks; they differ from the revised set only in the 13 extra beach_boys files and the unsplit radiohead file. Otherwise, naming is identical. The new list of 8752 names is uspop2002-aset-files-8752.txt.
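If you keep your own list of the original 8764 names, the two changes described above can be applied mechanically. The following is only a sketch: it assumes each name is a slash-separated path ending in artist/album/NN-Title (with or without a file extension), which may not match how your list is laid out. The function name apply_errata and the constants are made up for this illustration.

```python
import posixpath
import re

# Directory and file stems taken from the errata text above.
PET_SOUNDS_DIR = "beach_boys/Pet_Sounds"
KID_A_DIR = "radiohead/Kid_A"
UNSPLIT_STEM = "10-Motion_Picture_Soundtrack"
SPLIT_STEMS = ["10-Motion_Picture_Soundtrack_trimmed_", "11-Genchildren_hidden_"]


def apply_errata(names):
    """Drop Pet_Sounds tracks 15-27 and split the Kid_A track 10 entry in two.

    Assumes names look like '.../artist/album/NN-Title[.ext]' (hypothetical layout).
    """
    revised = []
    for name in names:
        directory, base = posixpath.split(name)
        stem, ext = posixpath.splitext(base)
        if directory.endswith(PET_SOUNDS_DIR):
            match = re.match(r"(\d+)-", stem)
            if match and 15 <= int(match.group(1)) <= 27:
                continue  # stereo duplicate of tracks 1-13
        if directory.endswith(KID_A_DIR) and stem == UNSPLIT_STEM:
            # Replace the single unsplit entry by the two new names.
            revised.extend(posixpath.join(directory, s + ext) for s in SPLIT_STEMS)
            continue
        revised.append(name)
    return revised
```

If the naming assumption holds, running this on the full 8764-name list should give 8764-13+1 = 8752 names, which can then be compared against uspop2002-aset-files-8752.txt.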


We just discovered that at some point, the ISO disk images we were using to burn the DVDs became corrupt: a few sectors in each of the files somehow got replaced with the wrong data. I didn't know this was possible, but there it is! We don't know when this corruption occurred or how many sets are affected; hopefully very few.

About 1 sector in a million was affected, with the result that there are 24 files that are suspect. These files are listed at the end of this message. The quick way to check if your set is affected is to calculate the md5 checksum of your own copies of one or more of these files, and see if they match the ones below; a mismatch means your copy is corrupt. Another way to check is to see if you get any NaNs when reading these HTK files into Matlab. There should be no NaN values in any of the uspop data files.
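For those not using Matlab, the NaN check can be sketched in Python. This assumes the files are uncompressed HTK parameter files in the standard HTK layout: a 12-byte big-endian header (nSamples, sampPeriod, sampSize, parmKind) followed by 32-bit floats. We have not verified that assumption against every file here, and the function name htk_has_nan is just for this example. Note that a corrupt sector will not necessarily produce a NaN, so the md5 comparison remains the more reliable test.

```python
import math
import struct


def htk_has_nan(path):
    """Return True if any float32 value in an uncompressed HTK file is NaN.

    Assumes big-endian data with the standard 12-byte HTK header.
    """
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) != 12:
            raise ValueError("file too short to hold an HTK header")
        # nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (uint16)
        n_samples, samp_period, samp_size, parm_kind = struct.unpack(">iihH", header)
        if parm_kind & 0o2000:
            # The _C flag marks HTK-compressed data, which is stored as int16.
            raise ValueError("compressed HTK file; this sketch handles float32 only")
        payload = f.read()
    n_floats = len(payload) // 4
    values = struct.unpack(">%df" % n_floats, payload[: n_floats * 4])
    return any(math.isnan(v) for v in values)
```

Calling htk_has_nan on each of the 24 suspect files and getting True for any of them would indicate an affected copy.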

You can fix your copy of uspop (if you have it on disk) by downloading the following gzipped tar file, which contains the correct version of the 24 affected files: (31 MB)

Hopefully, this will eliminate the problem.

If you have a set of DVDs which you find to be affected, you might want to destroy them, or at least put a written warning on the label advising future recipients to be sure to check this uspop errata web page.

Our apologies for this situation. Thanks to Anthony Brew of University College Dublin for alerting us to this problem.

Affected files, with md5 checksums of the correct versions

19e1734f219ecc48d57e3275590c9624  uspop2002-dvd1/artists/alice_deejay/Who_Needs_Guitars_Anyway_/09-Waiting_For_Your_Love.htk
2696c062c5c258975eba6a6a6ea5ed25  uspop2002-dvd1/artists/ani_difranco/Dilate/10-Adam_and_Eve.htk
192127c392bf1dc8f3419a31869533e9  uspop2002-dvd1/artists/beastie_boys/Licensed_To_Ill/02-The_New_Style.htk
438f2c1ebb71f386cb15c6f66d33a786  uspop2002-dvd1/artists/bryan_adams/Waking_Up_The_Neighbours/07-House_Arrest.htk
7158e06003139112452baf06e8beb885  uspop2002-dvd1/artists/coldplay/Parachutes/05-Yellow.htk
211a99144b6550ab11e32ce9b1e0ec7b  uspop2002-dvd1/artists/corrs/In_Blue/02-Give_me_a_reason.htk
a0b27546fe3bf6bcc255f3cb639939ac  uspop2002-dvd1/artists/deftones/Around_The_Fur/07-Lotion.htk
2bdc5f047bb34537eee1e00ca8cf9685  uspop2002-dvd1/artists/eric_clapton/Crossroads_2_Disc_2/04-Badge.htk
a7b7ddf1975c3171bc9425a2b2c57c04  uspop2002-dvd1/artists/eurythmics/Sweet_Dreams_Are_Made_Of_This_/10-This_City_Never_Sleeps.htk
803faf164987011d9d0cc7470c07a07c  uspop2002-dvd1/artists/everclear/Sparkle_And_Fade/12-Pale_Green_Stars.htk
0f8c42e6965b39bdca8f8dfd784d34df  uspop2002-dvd1/artists/jessica_andrews/Who_Am_I/05-Helplessly_Hoplessly_Recklessly.htk
b3a6ae0e3684de2661dbc23643430e31  uspop2002-dvd1/artists/pat_benatar/Live_From_Earth/05-Hell_Is_For_Children.htk
dd7785c2266350c83c93c5d27f991e1b  uspop2002-dvd2/artists/edwin_mccain/Misguided_Roses/05-How_Strange_It_Seems.htk
52213f3281af8df0ecad46ae9ca51da9  uspop2002-dvd2/artists/lfo/LFO/12-My_Block.htk
25ad1caa184f453c9f1d744cb88ab5c3  uspop2002-dvd2/artists/live/Throwing_Copper/02-Selling_The_Drama.htk
4c3c04b0b4e051a58a839477c9667559  uspop2002-dvd2/artists/men_at_work/Brazil/15-It_s_A_Mistake.htk
5ac96046e65132b23e1b06366270384e  uspop2002-dvd2/artists/metallica/S_M_Disc_1_/08-No_Leaf_Clover.htk
0a28ab170a569f7d8ee68908896cca98  uspop2002-dvd2/artists/natalie_imbruglia/Left_of_the_Middle/11-City.htk
4b2db52e353d57b2560683819c8e2fd8  uspop2002-dvd2/artists/new_radicals/Maybe_You_ve_Been_Brainwashed_Too/01-Mother_We_Just_Can_t_Get_Enough.htk
4625eb6ebdf157d1fe822557b74b0e4c  uspop2002-dvd3/artists/jessica_simpson/Sweet_Kisses/10-Your_Faith_In_Me.htk
e4be6860c03c57827085ee1aaa7a4273  uspop2002-dvd3/artists/neil_young/Harvest/02-Harvest.htk
c48967ee226d145b01d45eb64b98e1c4  uspop2002-dvd3/artists/stevie_wonder/Songs_In_The_Key_Of_Life_Disc_1_/04-Contusion.htk
e295703d4ebe54c7020fba185fd6a1d1  uspop2002-dvd3/artists/stone_temple_pilots/Tiny_Music_Songs_from_the_Vatican_Gift_Shop/06-And_So_I_Know.htk
23b4233612af9c8cf77ee20560cd4093  uspop2002-dvd3/artists/styx/Caught_in_the_Act_-_Live_Disc_1_of_2_/01-Music_Time.htk
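The checksums listed above can be compared against your own copies with a few lines of Python. This is a minimal sketch, not an official tool: the names EXPECTED_MD5, md5_of_file and find_suspect_files are made up for the example, only two of the 24 entries are filled in, and it assumes your copy keeps the same uspop2002-dvdN/artists/... layout under some root directory.

```python
import hashlib
import os

# Two entries copied from the list above; add the other 22 in the same way.
EXPECTED_MD5 = {
    "uspop2002-dvd1/artists/coldplay/Parachutes/05-Yellow.htk":
        "7158e06003139112452baf06e8beb885",
    "uspop2002-dvd3/artists/neil_young/Harvest/02-Harvest.htk":
        "e4be6860c03c57827085ee1aaa7a4273",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspect_files(root, expected=EXPECTED_MD5):
    """Return the relative paths found under root whose md5 differs from the list.

    Files that are missing under root are skipped rather than reported.
    """
    suspect = []
    for rel_path, good_md5 in expected.items():
        full_path = os.path.join(root, rel_path)
        if os.path.isfile(full_path) and md5_of_file(full_path) != good_md5:
            suspect.append(rel_path)
    return suspect
```

An empty result from find_suspect_files("/path/to/your/copy") means every listed file that was found matches the correct version; any path it returns should be replaced from the tar file mentioned above.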


The first "official" release of the 3 DVD set containing the MFCC features for the 8764 tracks of uspop2002 was in June 2005. It is noted as "release 0.1 2005-06-02" on the disk label. These are the problems we have noticed subsequently.

Last updated: $Date: 2005/06/02 04:27:39 $
Dan Ellis