Practicals
This page contains descriptions and instructions for weekly practical sessions.
Weds 2012-04-25: Unmixing
This week we'll investigate some of the techniques for unmixing audio signals that we saw on Monday. Specifically, we'll compare adding and subtracting stereo channels with time-frequency masking based on between-channel level difference. We'll be working in Pd with the following patches (you can download them all as one in prac13.zip):
You'll also need playsound_st~ to play back stereo audio files (including mp3), and sel4~, a simple 4-way audio selector.
Things to investigate:
Weds 2012-04-18: Fingerprinting
Our practical investigation of fingerprinting will use a re-implementation of the Shazam fingerprint system that I put together. You can look at the explanation and examples on that web page to see how it works, but we will use a more recent version of the code, which is more efficient: prac12.zip. We will also use pre-built hash tables, since populating the hash table takes about 5 sec / track, which adds up for the artist20 database of 1413 tracks (6 albums from each of 20 artists) we are using today. We have three tables to play with: HTA20-20hps.mat (30MB) is the largest and most detailed, generated with 20 hashes/sec. HTA20-10hps.mat (17MB) is smaller since it only recorded about 10 hashes/sec for the reference items, and HTA20-10hps-20c.mat (13MB) saved only a maximum of 20 tracks per hash (instead of 100), giving it a much smaller RAM footprint of 80 MB compared to 400 MB for the other two. The Matlab script below leads you through how to use these tools and data:
% Set up commands
addpath('mp3readwrite'); % mp3 reading
% Calculate fingerprints for some audio
[d,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/beatles/Revolver/01-Taxman.mp3');
% ("Let It Be" is not in this database!)
% Find the landmark pairs
L = find_landmarks(d,sr);
size(L)
L(1:10,:)
% Each row of L is {start-time-col start-freq-row end-freq-row delta-time}
% in the quantized units of the time-frequency cells
% Visualize them superimposed on the spectrogram (zoom in on first 20 secs)
show_landmarks(d,sr,L)
axis([0 20 0 4000])
% We can convert each pair into a single, 20 bit hash:
H = landmark2hash(L);
size(H)
H(1:10,:)
% Each row of H is {track_id start_time hash_val}, where the track_id defaults to 0
% For building the database, we'd store the track_id and start_time keyed by the hash_val
% Try loading the precalculated database
global HashTable HashTableCounts
load HTA20-10hps-20c
whos
% The simple hash table has 1048576 columns (one for each possible 20 bit value)
% Each column consists of 32 bit values; the top 18 bits are the track_id
% and the bottom 14 bits (2^14 = 16384) are the time offset within the track.
% We can see which tracks contain any given hash (3rd column of H):
HashTable(:,H(1,3)+1)
% (we add 1 to the hash value to avoid trying to access array element 0)
% Zero entries in the HashTable are where it's empty.
% We get the track indices by dividing by 16384; Names{} converts the
% track indices into the actual tracks recorded in the hash table
Names{HashTable(1:5,H(1,3)+1)/16384}
% "Taxman" is there, although it's not the only one. However, if we try another hash:
Names{HashTable(1:5,H(2,3)+1)/16384}
% .. we get a different set of tracks, with Taxman as the only repeat.
% match_query is the main routine to search the hash table, and
% illustrate_match shows the results, as described in the fingerprinting web page
% You can use fingerprints to examine the relationship between two "related" tracks:
[d1,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/depeche_mode/Speak_and_Spell/11-Just_Can_t_Get_Enough.mp3');
[d2,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/depeche_mode/Speak_and_Spell/16-Just_Can_t_Get_Enough_Schizo_Mix_.mp3');
bestlinalign(d1,sr,d2,sr);
% bestlinalign attempts to find a linear time warp between the hashes in two files.
% It also shows the scatter of how one track appears in the other.
% Note the section between 160 and 180 s which is skewed differently from the rest
% We can evaluate the fingerprinter by generating a set of queries from random positions in a few true tracks:
% (we're grabbing the soundfiles across the network).
% The default is 30s queries with no added white noise:
tru = 100:100:1400;
[Q,SR] = gen_random_queries(addprefixsuffix(Names(tru),'http://labrosa.ee.columbia.edu:8013/','.mp3'));
% If we run the fingerprint query on these, we should get the "tru" track_ids back
% eval_fprint just runs match_query for each element of Q and returns the top hit
% You can truncate and add noise too; here, truncate to 4 sec, but very little noise (high SNR)
Ttrunc = 4.0;
SNR = 60;
[S,R,QT] = eval_fprint(Q,SR,tru, Ttrunc, SNR);
R
% You should see most of the first row matching the values of tru, but sometimes not.
% You can examine a match in more detail. QT returns the truncated/noised queries.
% Show the matching landmarks
[dm,srm] = illustrate_match(QT{2},SR,addprefixsuffix(Names,'http://labrosa.ee.columbia.edu:8013/','.mp3'));
% Listen to both query and candidate match in stereo:
soundsc([dm,QT{2}],SR)
% (When it's wrong, it's completely wrong)
Things to investigate:
Weds 2012-04-11: Incipit Matching
This week we will use the Echo Nest Analyze API to find "incipits" -- fragments of music from the beginnings of phrases -- and search for them within a database of recordings. We'll be working in Matlab. The Matlab script below is mostly self-explanatory. You can download the scripts in prac11.zip and data in AllIncipits.mat (43MB).
% Set up commands
addpath('mp3readwrite'); % mp3 reading
% Run the Echo Nest Analyze on an MP3 file
ENA = en_analyze('test.mp3',1)
% Look at the results - plot the per-segment chroma using the segment times
axs = subplot(311)
plot_chroma(ENA.pitches, ENA.segment);
% Compare to spectrogram
[d,sr] = mp3read('test.mp3',0,1,2);
ax = subplot(312)
specgram(d,512,sr);
caxis([-30 30]);
colormap(1-gray);
% (linkaxes will let you scroll the panes in sync)
axs = [axs,ax];
linkaxes(axs,'x')
axis([0 30 0 5000]);
% Superimpose the segment start times to check they make sense
overplot_times(ENA.segment,'y');
% Now look at the beat times
overplot_times(ENA.beat,'r');
% .. and the bar times
overplot_times(ENA.bar,'b');
% .. and the sections (major phrase breaks)
overplot_times(ENA.section,'g');
% Do the sections make sense? Listen to 10 s excerpts
soundsc(seltime(d,sr,ENA.section(1),ENA.section(1)+10),sr);
soundsc(seltime(d,sr,ENA.section(2),ENA.section(2)+10),sr);
% sometimes...
% Calculate the beat-synchronous chroma from the segments
BC = en_beatchroma(ENA);
ax = subplot(313);
plot_chroma(BC,ENA.beat);
axs = [axs,ax];
linkaxes(axs,'x');
% We define incipits as the first N beats after each
% section (starting from the nearest bar division),
% and represent them with beat-chroma matrices
[In,St,En,Bt] = make_incipits(ENA);
subplot(321)
In1 = squeeze(In(1,:,:));
imagesc(In1);
axis xy
title('1st incipit of test track')
% Incipits are key-normalized, so may not exactly match raw beat-chroma
% St, En are the actual start and end times
% listen to audio
soundsc(seltime(d,sr,St(1),En(1)),sr);
% compare to chroma resynthesis
% (make sure synthesis times start from 0)
soundsc(synthesize_chroma(In1,ENA.beat(Bt(1)+[0:31])-ENA.beat(Bt(1)),sr),sr)
% AllIncipits.mat contains incipits from 8000+ tracks of uspop2002
AI = load('AllIncipits.mat');
% Calculate the distance to all of them
% Incipits are stored in AI.Incipits as unravelled vectors of 384 (=12x32) values
% Just match on the first 16 beats (of 32) - a smaller space
mb = 16;
nchr = 12;
dist = sqrt(sum((AI.Incipits(:,1:mb*nchr) - ...
    repmat(reshape(In1(:,1:mb),1,mb*nchr),size(AI.Incipits,1),1)).^2,2));
% Sort by distance
[vv,xx] = sort(dist);
% Plot the most similar one
ix = xx(1);
subplot(323)
imagesc(reshape(AI.Incipits(ix,:),12,32));
axis xy
% Which track is it?
title(['Incipit from ',AI.Names{AI.Tracks(ix)},' @ time ',num2str(AI.Starts(ix))], ...
    'Interpreter','none')
% ('Interpreter' stops it trying to translate underscores)
% Listen to the chroma resynth to see if it's similar
soundsc(synthesize_chroma(reshape(AI.Incipits(ix,:),12,32),0.35,sr),sr)
% Download & listen to the original audio
[d2,sr2] = mp3read(['http://labrosa.ee.columbia.edu:8013/',AI.Names{AI.Tracks(ix)},'.mp3']);
soundsc(seltime(d2,sr2,AI.Starts(ix),AI.Ends(ix)),sr2);
% Similar?
Things to try:
Weds 2012-04-04: Chord Recognition
This week we will train and use a simple Hidden Markov Model to do chord recognition. We will be using precomputed chroma features along with the ground-truth chord labels for the Beatles opus that were created by Chris Harte of Queen Mary, University of London. This practical is all run in Matlab. Here's an example of using the Matlab scripts. You can download them all as chords_code.zip, and the associated data as chords_data.zip.
TrainFileList = textread('trainfilelist.txt','%s');
% Load beat-synchronous chroma for "Let It Be" - item 135
[Chroma,Times] = load_chroma(TrainFileList{135});
% Resynthesize with Shepard tones
SR = 16000;
X = synthesize_chroma(Chroma,Times,SR);
% Listen to first 20 seconds
soundsc(X(1:20*SR),SR)
% Somewhat recognizable
% Train Gaussian models for each chord from whole training set
[Models,Transitions,Priors] = train_chord_models(TrainFileList);
% Look at the means of the 25 learned models (nochord + 12 major + 12 minor)
for i = 1:25; MM(:,i) = Models(i).mean'; end
imagesc(MM)
% Try recognizing chords in Let It Be (which was in the train set, so cheating)
[HypChords, LHoods] = recognize_chords(Chroma,Models,Transitions,Priors);
% We can look at the best (Viterbi) path overlaid on the per-frame log likelihoods
imagesc(max(-10,log10(LHoods)));
colormap(1-gray)
colorbar
hold on; plot(HypChords+1,'-r'); hold off
% Look just at the first hundred beats
axis([0 100 0.5 25.5])
xlabel('time / beats');
ylabel('chord');
keylabels = '-|C|C#|D|D#|E|F|F#|G|G#|A|A#|B|c|c#|d|d#|e|f|f#|g|g#|a|a#|b';
set(gca,'YTick',1:25);
set(gca,'YTickLabel',keylabels);
% Compare the Viterbi (HMM) path to the simple most-likely model for each frame
[Val,Idx] = max(LHoods);
hold on; plot(Idx, 'og'); hold off
% The HMM transition matrix makes it more likely to stay in any given state,
% thus it smooths the chord sequence (eliminates single-frame chords)
% Evaluate accuracy compared to ground-truth
TrueChords = load_labels(TrainFileList{135});
% Add the true labels to the plot
hold on; plot(TrueChords+1, '.y'); hold off
legend('Viterbi','Best','True')
% HypChords and TrueChords are simple vectors of labels in range 0..24.
% What is the average accuracy for this track?
mean(HypChords==TrueChords)
% 71.5% - pretty good!
% For reference, the best per-frame model, without the HMM, gives
mean(Idx-1 == TrueChords) % subtract 1 to convert indices 1..25 into chords 0..24
% 44.9% - nowhere near as good
% To get the full confusion matrix (rows=true, cols=recognized as):
[S,C] = score_chord_recognition(HypChords,TrueChords);
imagesc(C);
set(gca,'XTick',1:25);
set(gca,'XTickLabel',keylabels);
set(gca,'YTick',1:25);
set(gca,'YTickLabel',keylabels);
% Most common confusion is F being recognized as C.
% What do the true chords sound like when rendered as Shepard tones?
LabelChroma = labels_to_chroma(TrueChords);
% .. creates a simple chroma array with canonical triads for each chord
X2 = synthesize_chroma(LabelChroma,Times,SR);
soundsc(X2(1:20*SR),SR)
% Compare "target" chroma, actual chroma, and both true and hypothesized labels
subplot(311)
imagesc(LabelChroma);
axis xy
set(gca, 'YTick', [1 3 5 8 10 12]'); set(gca, 'YTickLabel', 'C|D|E|G|A|B');
subplot(312)
imagesc(Chroma);
axis xy
set(gca, 'YTick', [1 3 5 8 10 12]'); set(gca, 'YTickLabel', 'C|D|E|G|A|B');
subplot(313)
plot(1:length(TrueChords),TrueChords,'o',1:length(HypChords),HypChords,'.r')
legend('True','Hyp')
set(gca,'YTick',[1 3 5 8 10 13 15 17 20 22]); set(gca,'YTickLabel','C|D|E|G|A|c|d|e|g|a');
axis([0 length(TrueChords) 0 25])
colormap hot
% This gives the picture above
% Evaluate recognition over entire test set
TestFileList = textread('testfilelist.txt','%s');
[S,C] = test_chord_models(TestFileList,Models,Transitions,Priors);
% Overall recognition accuracy = 57.7%
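The smoothing effect noted in the script, where the Viterbi path suppresses single-frame chords that the per-frame maximum would pick, can be seen on a toy example. The sketch below is written in Python so that it runs without the course toolboxes. It is a generic log-domain Viterbi decoder, not the viterbi_path.m from the HMM Toolbox, and the two-state priors, transition matrix, and likelihoods are made-up values chosen purely for illustration.

```python
import math

def viterbi_path(priors, transitions, likelihoods):
    """Most likely state sequence; likelihoods[t][s] is P(frame t | state s)."""
    n_states = len(priors)
    # Work with log probabilities so long sequences do not underflow.
    score = [math.log(priors[s]) + math.log(likelihoods[0][s])
             for s in range(n_states)]
    back_pointers = []
    for frame in likelihoods[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + math.log(transitions[p][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev]
                             + math.log(transitions[best_prev][s])
                             + math.log(frame[s]))
        score = new_score
        back_pointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy data: two chords, seven beats; only beat 3 favors chord 1.
priors = [0.5, 0.5]
transitions = [[0.9, 0.1],   # strong preference for staying in the same chord
               [0.1, 0.9]]
likelihoods = [[0.8, 0.2], [0.8, 0.2], [0.8, 0.2], [0.4, 0.6],
               [0.8, 0.2], [0.8, 0.2], [0.8, 0.2]]

per_frame = [max(range(2), key=lambda s: frame[s]) for frame in likelihoods]
smoothed = viterbi_path(priors, transitions, likelihoods)
print("per-frame best:", per_frame)
print("Viterbi path:  ", smoothed)
```

The per-frame choice switches chord for the single beat where chord 1 is slightly more likely, while the Viterbi path stays on chord 0 because two low-probability transitions cost more than one mildly unlikely frame.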
Things to try:
Notes: The code includes and makes use of gaussian_prob.m, viterbi_path.m, and normalise.m, all from Kevin Murphy's wonderful HMM Toolbox.
Weds 2012-03-28: Beat tracking
As promised, we now move on from the real-time processing of Pd to do some offline analysis using Matlab. (In fact, beat tracking in real time is an interesting and worthwhile problem, but doing it offline is much simpler.) I've put together a cut-down version of my dynamic programming beat tracker for us to play with. It includes:
There are also a number of helper/utility functions:
All these functions are available in prac09.zip. The collection of 20 example excerpts and human tapping data that McKinney and Moelants donated for MIREX 2006 (some 50 MB) is separately available as mirex06examples.zip. Here are some things to try:
Weds 2012-03-21: Autotune
Given pitch tracking and pitch modification, we can now put them both together to modify the pitch towards a target derived from the current input pitch, i.e., autotune, in which a singer's pitch is moved to the nearest exact note to compensate for problems in their intonation. We can use sigmund~ both to track the singing pitch, and to analyze the voice into sinusoids which we can then resynthesize after possibly changing the pitch. We'll use the following Pd patches:
You can download all these patches in prac08.zip. Loading a sound file then playing it into the patch should generate a close copy of the original voice, but quantized to semitone pitches. The "pitch smoothing" slider controls how abruptly the pitch moves between notes. Try it on some voice files, such as the Marvin Gaye voice.wav, the query-by-singing example 00014.wav, or my pitch sweep ahh.wav. You can also try it on live input by hooking up the adc~ instead of the soundfile playback, but you will probably need to use headphones to avoid feedback. Here are some things to investigate:
Weds 2012-03-07: Pitch tracking
Miller Puckette (author of Pd) created a complex pitch tracking object called sigmund~. This week we'll investigate its use and function. You will use the following Pd patches:
You can download these in prac07.zip. sigmund~ operates in various different modes - as a raw pitch tracker, as a note detector/segmenter, and also as a sinusoid tracker. We'll try each mode.
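To make the idea of a raw pitch tracker concrete, here is a minimal Python sketch of a different and much simpler technique than the one in sigmund~: picking the lag that maximizes the autocorrelation of a single frame. The function names, the 80 to 500 Hz search range, and the synthetic 220 Hz test tone are all assumptions made for illustration; the conversion to a fractional MIDI note number uses the standard A4 = 440 Hz = note 69 convention.

```python
import math

def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=500.0):
    """Return the pitch (Hz) whose lag maximizes the frame's autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

def hz_to_midi(freq_hz):
    """Convert frequency to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

# Synthetic test tone standing in for one frame of a sung note.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 220.0 * n / sample_rate) for n in range(1024)]

pitch_hz = estimate_pitch_hz(frame, sample_rate)
midi_pitch = hz_to_midi(pitch_hz)
print("estimated pitch: %.1f Hz, MIDI %.2f" % (pitch_hz, midi_pitch))
```

Because only integer lags are tested, the estimate is quantized (the true period here is about 36.4 samples), which is one reason practical trackers interpolate or work from sinusoidal peaks instead.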
Weds 2012-02-29: Reverb
For this week's practical you will examine a reverberation algorithm, trying to understand the link between the algorithm controls and pieces, and the subjective experience of the reverberation. We will be working with the algorithm in the rev2~ reverberator patch that comes with Pd, although we'll be modifying our own version of it. It's based on the design described in the 1982 Stautner and Puckette paper. You will use the following Pd patches:
You can download them all in prac06.zip. The main test harness allows you to adjust the control parameters of the reverb patch, and to feed in impulses, short tone bursts of different durations, or sound files. You can also sample the impulse response and write it out to a wave file, to be analyzed by an external program.
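As one example of what an external analysis of the saved impulse response might look like, the Python sketch below estimates the reverberation time (RT60) using Schroeder backward integration, a common approach that is assumed here rather than taken from the practical. To stay self-contained it uses a synthetic exponentially decaying impulse response with a known RT60 of 0.5 s in place of a wave file written by the patch; the function names are hypothetical.

```python
import math

def energy_decay_curve_db(impulse_response):
    """Schroeder backward integration: remaining energy at each sample, in dB."""
    remaining = 0.0
    curve = []
    for sample in reversed(impulse_response):
        remaining += sample * sample
        curve.append(remaining)
    curve.reverse()
    total = curve[0]
    # Floor the ratio so trailing digital silence cannot cause log10(0).
    return [10.0 * math.log10(max(value / total, 1e-30)) for value in curve]

def estimate_rt60(impulse_response, sample_rate):
    """Estimate RT60 from the -5 dB to -35 dB span of the decay curve."""
    decay_db = energy_decay_curve_db(impulse_response)
    index_5 = next(i for i, level in enumerate(decay_db) if level <= -5.0)
    index_35 = next(i for i, level in enumerate(decay_db) if level <= -35.0)
    # 30 dB of decay took (index_35 - index_5) samples; extrapolate to 60 dB.
    return 2.0 * (index_35 - index_5) / sample_rate

# Synthetic stand-in for a measured response: amplitude falls 60 dB in 0.5 s.
sample_rate = 8000
true_rt60 = 0.5
impulse_response = [
    (-1.0) ** n * 10.0 ** (-3.0 * n / (true_rt60 * sample_rate))
    for n in range(sample_rate)
]

rt60 = estimate_rt60(impulse_response, sample_rate)
print("estimated RT60: %.3f s" % rt60)
```

Running the same function on responses sampled at different reverb settings gives a number to set beside the subjective impression of how long the tail lasts.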
Here are some sound files you can use to try out the reverberator: voice.wav, guitar.wav, drums.wav.
Wed 2012-02-22: LPC
This week we will experiment with LPC analysis/synthesis in Pd using the lpc~ and lpreson~ units by Edward Kelly and Nicolas Chetry, which are included with the Pd-extended package. The patch lpc.pd uses these units to perform LPC analysis and synthesis, including taking the LPC filters based on one sound and applying them to a different sound (cross-synthesis). The main patch handles loading, playing, and selecting the soundfiles, and uses the additional patches loadsoundfile.pd to read in sound files, playloop.pd to play or loop a sound file, and audiosel~.pd to allow selecting one of several audio streams.
The LPC part is done by the lpcanalysis subpatch of the main patch, shown to the right. It "pre-emphasizes" both voice and excitation to boost high frequencies, then applies lpc~ to the voice signal to generate a set of LPC filter coefficients and a residual. The [myenv] subpatch then calculates the energy (envelope) of the residual, and the excitation is scaled to reflect this envelope. This excitation, along with the original filter coefficients (delayed by one block to match the delay introduced by [myenv]), is passed to lpreson~, which simply implements the all-pole filter to impose the voice's formant structure on the excitation. A final de-emphasis re-balances low and high frequencies. You can download these patches in prac05.zip.
The entire lpcanalysis subpatch is applied to overlapping windows of 1024 samples at half the sampling rate of the parent patch (i.e. 1024/22050 = 46.4 ms) thanks to the block~ unit, a special feature of Pd which allows subpatches to have different blocking etc. from their parents. On coming out of the subpatch, Pd will overlap-add as necessary, so we apply a final tapered window (from the $0-hanning array) to the outputs. tabreceive~ repeatedly reads the hanning window on every frame.
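The analysis/resynthesis chain described above (filter coefficients plus residual, then an all-pole filter driven by an excitation) can be sketched outside Pd. The Python below is a minimal illustration that assumes the autocorrelation method with the Levinson-Durbin recursion; the page does not say which algorithm lpc~ uses internally, and the function names and the two-pole test signal are made up for the example. Pre-emphasis, envelope scaling, and windowing are left out.

```python
import math

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin (frame must be non-silent).

    Returns [1, a1, ..., ap] with residual e[n] = x[n] + a1*x[n-1] + ... + ap*x[n-p].
    """
    r = [sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                      # reflection coefficient
        previous = a[:]
        for j in range(1, i):
            a[j] = previous[j] + k * previous[i - j]
        a[i] = k
        error *= 1.0 - k * k                  # remaining prediction error energy
    return a

def inverse_filter(signal, a):
    """FIR analysis filter A(z): returns the prediction residual."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]

def all_pole_filter(excitation, a):
    """IIR synthesis filter 1/A(z): imposes the spectral envelope on the excitation."""
    out = []
    for n, sample in enumerate(excitation):
        feedback = sum(a[k] * out[n - k] for k in range(1, len(a)) if n - k >= 0)
        out.append(sample - feedback)
    return out

# Hypothetical "voice": the impulse response of a single two-pole resonance.
true_a = [1.0, -2.0 * 0.9 * math.cos(math.pi / 4), 0.81]
voice = all_pole_filter([1.0] + [0.0] * 255, true_a)

a = lpc_coefficients(voice, order=2)
residual = inverse_filter(voice, a)
resynth = all_pole_filter(residual, a)        # same filter, original residual
print("estimated coefficients:", [round(c, 4) for c in a])
```

Feeding all_pole_filter a different excitation in place of residual is the cross-synthesis case: the coefficients carry the formant structure of one sound, and the excitation supplies the fine detail of another.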
Here is a list of options for things to investigate:
Weds 2012-02-15: Sinusoidal Synthesis
This week we use Pd to perform additive synthesis by controlling the frequencies and magnitudes of a bank of sinewave oscillators based on parameters read from an analysis file written by the SPEAR program we saw in class. The main additive patch instantiates a bank of 32 oscillators, and provides the controls to load an analysis file, to trigger playback, and to modify the time and frequency scale. The actual parsing of the SPEAR file is provided by loadspearfile, and the individual sinusoid partials are rendered by mypartial. Here are some analysis files for notes from a violin, trumpet, and guitar. You can download all these files in prac04.zip. You can experiment with playing back each of these, turning individual partials on or off, and adjusting the time and frequency scales. When the analysis file contains more harmonics than the number of oscillators (32), some of the sinusoids are dropped from the synthesis. You can identify (roughly) which harmonic is which by the average frequency and summed-up magnitude of each harmonic, which gets displayed on each individual [mypartial] patch in additive.pd. As with last week, we will break into small groups to work on this practical, with each group presenting their discoveries to the whole class in a brief report-back session at the end of the class. Here are some suggestions for investigation:
Note: if you modify the mypartial patch, it's probably a good idea to close and reopen the parent additive patch. I'm not sure how good Pd is at keeping multiple instantiations in sync when one instance is edited. Re-opening the parent patch will re-instantiate all the mypartials, so they should all get updated.
Followup: Here are my attempts to (a) add keyboard control and vibrato: additive+kbd (requires keybd, lfo, cmap, and twoway), and (b) add looping while the key is held down: additive+kbd+loop.
Weds 2012-02-08: Analog synthesis
This week we'll experiment with simulating an analog synthesizer with Pd. The Pd analog synth simulator consists of several patches:
You can download all these patches along with some support functions in the zip file prac03.zip. Load demo_voice.pd into Pd, and the synth should run. This week we're going to try breaking up into teams of 4 or 5 people to work on the practical. Each team should choose one of the topics below to work on, then at the end of class we'll get a brief report-back from each team. If your team completes one topic, or gets stuck, feel free to try another. Things to investigate:
A good reference for Pd is Miller Puckette's book The Theory and Technique of Electronic Music. To look up individual units, you can try the Index of the online HTML version. A more terse description of the basic operation of Pd is in Miller's original Pd manual. See also the Introduction to Pd excerpted from Andy Farnell's book, Designing Sound - Practical synthetic sound design for film, games and interactive media using dataflow.
Weds 2012-02-01: Filtering
Last week we looked at a fairly complex structure built in Pd. This week, we'll back up a bit and play with some simple filters within Pd. Pd provides a range of built-in filtering objects: [lop~], [hip~], [bp~] (and its sample-rate-controllable twin [vcf~]), the more general [biquad~], and the elemental [rpole~], [cpole~] and [rzero~], [czero~] (see the filters section of the Floss Pd manual, and the Subtractive Synthesis chapter of Johannes Kreidler's Pd Tutorial). The patch demo_filters.pd provides a framework to play with these filter types. It uses playsound~.pd to allow you to play (and loop) a WAV file, select4~.pd to select one of 4 audio streams via "radio buttons", and plotpowspec~.pd (slightly improved from last week) to plot the running FFT spectrum, as well as the [output~] built-in from Pd-extended. You can download all these patches in prac02.zip.
Try listening to the various sound inputs provided by the top [select4~] through the different filters provided by the bottom [select4~]. Try changing the cutoff frequency with the slider (as well as the Q of the bpf with the number box); listen to the results and look at the short-time spectrum. You can try loading the speech and guitar samples into the [playsound~] unit to see how filtering affects "real" sounds (click the button at the bottom right of [playsound~] to bring up the file selection dialog; click the button on the left to play the sound, and check the toggle to make it loop indefinitely). Here are a few further experiments you could try:
Followup: I implemented a low-frequency-modulated sweeping notch filter modnotch~.pd and modified the main patch to include it (demo_modnotch.pd) which builds a chain of 8 notches, all modulated at slightly different (and hence incoherent) frequencies. The effect is somewhat interesting, but doesn't really make a single voice sound like a chorus; maybe you need more than 8 notches, or more structure in the notch motion?
Weds 2012-01-25: Plucked string
This week's practical looks at the Karplus-Strong plucked string simulation in Pure Data (Pd). The general pattern of these weekly practical sessions is to give you a piece of code to start with, then ask you to investigate some aspects, by using and changing the code. However, the areas to investigate are left somewhat open, in the hope that we'll each discover different things -- that we can then share. We start with demo_karpluck.pd, my wrapper around Loomer's karpluck~.pd. In addition to the keybd.pd patch used to provide MIDI-like controls from the computer keyboard, this one also uses grapher~.pd to provide an oscilloscope-like time-domain waveform plot, and plotpowspec~.pd (based on the original by Ron Weiss) to provide a smoothed Fourier transform view. You can download all these patches in prac01.zip. This patch provides three main controls for the sound of the plucked string:
Here are some things to try:
Followup: During class, I modified karpluck~ to support a few extra parameters: a single low-pass filter applied to the excitation (to get low-pass noise initialization instead of white, like using a softer pick), and tremolo depth and modulation rate, to modulate the feedback low-pass to give a kind of tremolo/vibrato effect. See karpluckvib~.pd and demo_karpluckvib~.pd.
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Dan Ellis <dpwe@ee.columbia.edu>
Last updated: Wed Apr 25 09:40:04 AM EDT 2012