Department of Electrical Engineering - Columbia University

ELEN E4896 - Spring 2012

MUSIC SIGNAL PROCESSING

Practicals

This page contains descriptions and instructions for weekly practical sessions.


Weds 2012-04-25: Unmixing

[unmixing image]

This week we'll investigate some of the techniques for unmixing audio signals that we saw on Monday. Specifically, we'll compare adding and subtracting stereo channels with time-frequency masking based on the between-channel level difference. We'll be working in Pd with the following patches (you can download them all as one in prac13.zip):

  • Unmix_demo is the main patch which lets you try the three different methods.
  • pan_st~ is simple stereo panning i.e. it lets you attenuate the left or right channels by moving the slider left or right.
  • LRcancel~ is the slightly more complex parametric summing of channels that we saw in class on Monday. It converts the 2 channel stereo into a single channel, which you can make the sum, difference, or either of the two channels, by moving the slider.
  • ILDgate~ is full time-frequency masking based on the energy difference between left and right at each frequency band. It includes controls for the target ILD (energy ratio), the "width" of accepted ILDs, and a temporal smoothing parameter, alpha.
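
As a rough illustration of the masking idea behind ILDgate~, here is a minimal Python sketch (not the Pd patch itself). It assumes the ILD is measured in dB, that the mask is binary, and that alpha is a one-pole smoothing coefficient applied across frames; the function names ild_db, ild_mask and smooth_mask are hypothetical, and the actual patch may differ on all of these points.

```python
import math

def ild_db(left_energy, right_energy, eps=1e-12):
    """Inter-channel level difference in dB (positive = louder on the left)."""
    return 10.0 * math.log10((left_energy + eps) / (right_energy + eps))

def ild_mask(left_energies, right_energies, target_db, width_db):
    """Binary mask: 1.0 for cells whose ILD lies within width_db/2 of target_db."""
    return [1.0 if abs(ild_db(l, r) - target_db) <= width_db / 2.0 else 0.0
            for l, r in zip(left_energies, right_energies)]

def smooth_mask(prev_mask, new_mask, alpha):
    """One-pole smoothing across frames (assumed role of alpha; 0 = no smoothing)."""
    return [alpha * p + (1.0 - alpha) * n for p, n in zip(prev_mask, new_mask)]

# One frame with three frequency bins: centered, hard left, hard right
left = [1.0, 4.0, 0.01]
right = [1.0, 0.04, 1.0]
mask = ild_mask(left, right, target_db=0.0, width_db=6.0)
print(mask)  # only the centered bin is kept
```

Widening width_db lets more of the neighbouring pan positions through, which is one way to think about the "width" question in the list of things to investigate.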

You'll also need playsound_st~ to play back stereo audio files (including mp3), and sel4~, a simple 4-way audio selector.

Things to investigate:

  • Listen to maxwell.mp3. First, listen to the left and right channels alone - what do you hear? What about when you listen to the difference, left-right?
  • How well can you isolate these instruments with the ILD parameter of ILDgate~? What is the effect of the "width" parameter?
  • What is the effect of the alpha parameter?
  • Try again with undertow.mp3. Is the style of mixing different here? How many instruments can you isolate?

Weds 2012-04-18: Fingerprinting

[fingerprinting image]

Our practical investigation of fingerprinting will use a re-implementation of the Shazam fingerprint system that I put together. You can look at the explanation and examples on that web page to see how it works, but we will use a more recent version of the code, which is more efficient: prac12.zip. We will also use pre-built hash tables, since populating the hash table takes about 5 sec / track, which adds up for the artist20 database of 1413 tracks (6 albums from each of 20 artists) we are using today. We have three tables to play with: HTA20-20hps.mat (30MB) is the largest and most detailed, generated with 20 hashes/sec. HTA20-10hps.mat (17MB) is smaller since it only recorded about 10 hashes/sec for the reference items, and HTA20-10hps-20c.mat (13MB) saved only a maximum of 20 tracks per hash (instead of 100), giving it a much smaller RAM footprint of 80 MB compared to 400 MB for the other two.

The Matlab script below leads you through how to use these tools and data:

% Set up commands
addpath('mp3readwrite');  % mp3 reading

% Calculate fingerprints for some audio
[d,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/beatles/Revolver/01-Taxman.mp3');
% ("Let It Be" is not in this database!)
% Find the landmark pairs
L = find_landmarks(d,sr);
size(L)
L(1:10,:)
% Each row of L is {start-time-col start-freq-row end-freq-row delta-time}
% in the quantized units of the time-frequency cells
% Visualize them superimposed on the spectrogram (zoom in on first 20 secs)
show_landmarks(d,sr,L)
axis([0 20 0 4000])

% We can convert each pair into a single, 20 bit hash:
H = landmark2hash(L);
size(H)
H(1:10,:)
% Each row of H is {track_id start_time hash_val}, where the track_id defaults to 0
% For building the database, we'd store the track_id and start_time keyed by the hash_val

% Try loading the precalculated database
global HashTable HashTableCounts
load HTA20-10hps-20c
whos
% The simple hash table has 1048576 columns (one for each possible 20 bit value)
% Each column consists of 32 bit values; the top 17 bits are the track_id
% and the bottom 15 bits are the time offset within the track.

% We can see which tracks contain any given hash (3rd column of H):
HashTable(:,H(1,3)+1)
% (we add 1 to the hash value to avoid trying to access array element 0)
% Zero entries in the HashTable are where it's empty.
% We get the track indices by dividing by 16384; Names{} converts the 
% track indices into the actual tracks recorded in the hash table
Names{HashTable(1:5,H(1,3)+1)/16384}
% "Taxman" is there, although it's not the only one.  However, if we try another hash:
Names{HashTable(1:5,H(2,3)+1)/16384}
% .. we get a different set of tracks, with Taxman as the only repeat.
% match_query is the main routine to search the hash table, and
% illustrate_match shows the results, as described in the fingerprinting web page

% You can use fingerprints to examine the relationship between two "related" tracks:
[d1,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/depeche_mode/Speak_and_Spell/11-Just_Can_t_Get_Enough.mp3');
[d2,sr] = mp3read('http://labrosa.ee.columbia.edu:8013/depeche_mode/Speak_and_Spell/16-Just_Can_t_Get_Enough_Schizo_Mix_.mp3');
bestlinalign(d1,sr,d2,sr);
% bestlinalign attempts to find a linear time warp between the hashes in two files.
% It also shows the scatter of how one track appears in the other.
% Note the section between 160 and 180 s which is skewed differently from the rest

% We can evaluate the fingerprinter by generating a set of queries from random positions in a few true tracks:
% (we're grabbing the soundfiles across the network).  
% The default is 30s queries with no added white noise:
tru = 100:100:1400;
[Q,SR] = gen_random_queries(addprefixsuffix(Names(tru),'http://labrosa.ee.columbia.edu:8013/','.mp3'));
% If we run the fingerprint query on these, we should get the "tru" track_ids back
% eval_fprint just runs match_query for each element of Q and returns the top hit
% You can truncate and add noise too; here, truncate to 4 sec, but very little noise (high SNR)
Ttrunc = 4.0;
SNR = 60;
[S,R,QT] = eval_fprint(Q,SR,tru, Ttrunc, SNR);
R
% You should see most of the first row matching the values of tru, but sometimes not.
% You can examine a match in more detail. QT returns the truncated/noised queries.
% Show the matching landmarks
[dm,srm] = illustrate_match(QT{2},SR,addprefixsuffix(Names,'http://labrosa.ee.columbia.edu:8013/','.mp3'));
% Listen to both query and candidate match in stereo:
soundsc([dm,QT{2}],SR)
% (When it's wrong, it's completely wrong)
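
To make the hash-and-vote idea concrete, here is a small self-contained Python sketch (separate from the Matlab code in prac12.zip). The 8/6/6 bit split in pack_hash, the dictionary-based table, and the function names are assumptions for illustration only; the Matlab routines may pack and store things differently.

```python
from collections import Counter

def pack_hash(f1, df, dt):
    """Pack a landmark pair into 20 bits.
    Assumed layout: 8 bits start frequency, 6 bits frequency change, 6 bits time change."""
    return ((f1 & 0xFF) << 12) | ((df & 0x3F) << 6) | (dt & 0x3F)

def build_table(tracks):
    """tracks: {track_id: [(start_time, hash), ...]} -> {hash: [(track_id, start_time), ...]}"""
    table = {}
    for track_id, landmarks in tracks.items():
        for t, h in landmarks:
            table.setdefault(h, []).append((track_id, t))
    return table

def vote_offsets(table, query):
    """Count votes for each (track_id, time offset) pair.
    A true match piles up many hashes at one consistent offset."""
    votes = Counter()
    for t_query, h in query:
        for track_id, t_ref in table.get(h, []):
            votes[(track_id, t_ref - t_query)] += 1
    return votes.most_common()

ha, hb, hc, hd = (pack_hash(10, 3, 5), pack_hash(40, 1, 9),
                  pack_hash(99, 20, 2), pack_hash(7, 7, 7))
table = build_table({1: [(0, ha), (5, hb), (9, hc)], 2: [(3, ha), (20, hd)]})
# A query excerpt taken from track 1, starting 5 time steps into it
result = vote_offsets(table, [(0, hb), (4, hc)])
print(result[0])  # best (track_id, offset) pair and its vote count
```

A wrong track can share individual hashes with the query, but only by chance will several of them agree on the same time offset, which is why the count of consistent hashes is a useful confidence measure.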

Things to investigate:

  • The second column of the return from match_query (and eval_fprint) is the number of matching hashes. Do you notice the trend here for bad matches?
  • The fourth argument to eval_fprint truncates the length of the queries (in sec). How long do they have to be to get reliable retrieval?
  • The fifth argument to eval_fprint adds white noise at that amplitude to the queries. Try a value around 0.01. Does it hurt? Can you compensate with longer queries?
  • In addition to added noise, the fingerprinter should be able to handle spectral modifications (e.g. filter([1 1], [1 -2*0.9*cos(.1) 0.9^2], Q{1}) ). Can you generate test queries modified along these lines (listen to see how they sound), and see how it affects performance?
  • What happens if you add two query waveforms together, e.g. Q{1}+0.2*Q{2} ?
  • The denser hash table, HTA20-20hps.mat, needs at least 500MB of RAM, but if you can load it, try it out and see how much better it does.

Weds 2012-04-11: Incipit Matching

[incipits image]

This week we will use the Echo Nest Analyze API to find "incipits" -- fragments of music from the beginnings of phrases -- and search for them within a database of recordings. We'll be working in Matlab. The Matlab script below is mostly self-explanatory. You can download the scripts in prac11.zip and data in AllIncipits.mat (43MB).

% Set up commands
addpath('mp3readwrite');  % mp3 reading

% Run the Echo Nest Analyze on an MP3 file
ENA = en_analyze('test.mp3',1)
% Look at the results - plot the per-segment chroma using the segment times
axs = subplot(311)
plot_chroma(ENA.pitches, ENA.segment);
% Compare to spectrogram
[d,sr] = mp3read('test.mp3',0,1,2);
ax = subplot(312)
specgram(d,512,sr);
caxis([-30 30]);
colormap(1-gray);
% (linkaxes will let you scroll the panes in sync)
axs = [axs,ax];
linkaxes(axs,'x')
axis([0 30 0 5000]);
% Superimpose the segment start times to check they make sense
overplot_times(ENA.segment,'y');
% Now look at the beat times
overplot_times(ENA.beat,'r');
% .. and the bar times
overplot_times(ENA.bar,'b');
% .. and the sections (major phrase breaks)
overplot_times(ENA.section,'g');

% Do the sections make sense?  Listen to 10 s excerpts
soundsc(seltime(d,sr,ENA.section(1),ENA.section(1)+10),sr);
soundsc(seltime(d,sr,ENA.section(2),ENA.section(2)+10),sr);
% sometimes...

% Calculate the beat-synchronous chroma from the segments
BC = en_beatchroma(ENA);
ax = subplot(313);
plot_chroma(BC,ENA.beat);
axs = [axs,ax];
linkaxes(axs,'x');

% We define incipits as the first N beats after each 
% section (starting from the nearest bar division), 
% and represent them with beat-chroma matrices
[In,St,En,Bt] = make_incipits(ENA);
subplot(321)
In1 = squeeze(In(1,:,:));
imagesc(In1); axis xy
title('1st incipit of test track')
% Incipits are key-normalized, so may not exactly match raw beat-chroma
% St, En are the actual start and end times
% listen to audio
soundsc(seltime(d,sr,St(1),En(1)),sr);
% compare to chroma resynthesis
% (make sure synthesis times start from 0)
soundsc(synthesize_chroma(In1,ENA.beat(Bt(1)+[0:31])-ENA.beat(Bt(1)),sr),sr)

% AllIncipits.mat contains incipits from 8000+ tracks of uspop2002
AI = load('AllIncipits.mat');
% Calculate the distance to all of them
% Incipits are stored in AI.Incipits as unravelled vectors of 384 (=12x32) values
% Just match on the first 16 beats (of 32) - a smaller space
mb = 16;
nchr = 12;
dist = sqrt(sum((AI.Incipits(:,1:mb*nchr) - ...
                 repmat(reshape(In1(:,1:mb),1,mb*nchr),size(AI.Incipits,1),1)).^2,2));
% Sort by distance
[vv,xx] = sort(dist);
% Plot the most similar one
ix = xx(1);
subplot(323)
imagesc(reshape(AI.Incipits(ix,:),12,32)); axis xy
% Which track is it?
title(['Incipit from ',AI.Names{AI.Tracks(ix)},' @ time ',num2str(AI.Starts(ix))], ...
      'Interpreter','none')
% ('Interpreter' stops it trying to translate underscores)
% Listen to the chroma resynth to see if it's similar
soundsc(synthesize_chroma(reshape(AI.Incipits(ix,:),12,32),0.35,sr),sr)
% Download & listen to the original audio
[d2,sr2] = mp3read(['http://labrosa.ee.columbia.edu:8013/',AI.Names{AI.Tracks(ix)},'.mp3']);
soundsc(seltime(d2,sr2,AI.Starts(ix),AI.Ends(ix)),sr2);
% Similar?
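
The distance search in the script above can be restated as a short Python sketch on toy data. It assumes each incipit is stored beat by beat (all chroma bins of beat 1, then beat 2, and so on, which is what Matlab's column-major reshape of a 12x32 matrix produces), so that truncating the vector keeps the first few beats. The names incipit_distance and rank_incipits are hypothetical.

```python
import math

def incipit_distance(a, b, n_values):
    """Euclidean distance over the first n_values entries of two unravelled incipits."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:n_values], b[:n_values])))

def rank_incipits(query, database, n_beats=16, n_chroma=12):
    """Return database row indices sorted from most to least similar, plus all distances."""
    n = n_beats * n_chroma
    dists = [incipit_distance(query, row, n) for row in database]
    order = sorted(range(len(database)), key=lambda i: dists[i])
    return order, dists

# Toy data: 2 chroma bins x 4 beats per incipit, matching on the first 2 beats only
query = [1, 0, 0, 1, 9, 9, 9, 9]
database = [
    [0, 1, 1, 0, 9, 9, 9, 9],    # differs in the first two beats
    [1, 0, 0, 1, 0, 0, 0, 0],    # identical first two beats, different ending
    [1, 0, 0.5, 1, 9, 9, 9, 9],  # small difference in beat 2
]
order, dists = rank_incipits(query, database, n_beats=2, n_chroma=2)
print(order, dists)
```

Note that the second row ranks first even though its later beats are completely different, because only the first n_beats are compared.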

Things to try:

  • Does the Echo Nest Analyze time segmentation make sense? You can try adding "blips" corresponding to beat, bar, and section times to your original audio (using beat_play.m from the beat-tracking practical) and see if they sound right.
  • Can you find any particularly close matches to any examples (i.e. where dist is small)? Does the chroma sound similar? Does the audio sound similar? Why might these be different?
  • Try modifying the chroma values, e.g. with the power-law compression we tried last week. Does that improve the matching?
  • Do you find certain "degenerate" matching patches? What do you think makes them come up so often?

Weds 2012-04-04: Chord Recognition

[chroma image]

This week we will train and use a simple Hidden Markov Model to do chord recognition. We will be using precomputed chroma features along with the ground-truth chord labels for the Beatles opus that were created by Chris Harte of Queen Mary, University of London. This practical is all run in Matlab.

Here's an example of using the Matlab scripts. You can download them all as chords_code.zip, and the associated data as chords_data.zip.

TrainFileList = textread('trainfilelist.txt','%s');
% Load beat-synchronous chroma for "Let It Be" - item 135
[Chroma,Times] = load_chroma(TrainFileList{135});
% Resynthesize with Shepard tones
SR = 16000;
X = synthesize_chroma(Chroma,Times,SR);
% Listen to first 20 seconds
soundsc(X(1:20*SR),SR)
% Somewhat recognizable

% Train Gaussian models for each chord from whole training set
[Models,Transitions,Priors] = train_chord_models(TrainFileList);
% Look at the means of the 25 learned models (nochord + 12 major + 12 minor)
for i = 1:25; MM(:,i) = Models(i).mean'; end
imagesc(MM)

% Try recognizing chords in Let It Be (which was in the train set, so cheating)
[HypChords, LHoods] = recognize_chords(Chroma,Models,Transitions,Priors);
% We can look at the best (Viterbi) path overlaid on the per-frame log likelihoods
imagesc(max(-10,log10(LHoods)));
colormap(1-gray)
colorbar
hold on; plot(HypChords+1,'-r'); hold off
% Look just at the first hundred beats
axis([0 100 0.5 25.5])
xlabel('time / beats');
ylabel('chord');
keylabels = '-|C|C#|D|D#|E|F|F#|G|G#|A|A#|B|c|c#|d|d#|e|f|f#|g|g#|a|a#|b';
set(gca,'YTick',1:25);
set(gca,'YTickLabel',keylabels);
% Compare the Viterbi (HMM) path to the simple most-likely model for each frame
[Val,Idx] = max(LHoods);
hold on; plot(Idx, 'og'); hold off
% The HMM transition matrix makes it more likely to stay in any given state, 
% thus it smooths the chord sequence (eliminates single-frame chords)

% Evaluate accuracy compared to ground-truth
TrueChords = load_labels(TrainFileList{135});
% Add the true labels to the plot
hold on; plot(TrueChords+1, '.y'); hold off
legend('Viterbi','Best','True')

% HypChords and TrueChords are simple vectors of labels in range 0..24.
% What is the average accuracy for this track?
mean(HypChords==TrueChords)
% 71.5% - pretty good!
% For reference, the best per-frame model, without the HMM, gives
mean(Idx-1 == TrueChords)  % subtract 1 to convert indices 1..25 into chords 0..24
% 44.9% - nowhere near as good

% To get the full confusion matrix (rows=true, cols=recognized as):
[S,C] = score_chord_recognition(HypChords,TrueChords);
imagesc(C);
set(gca,'XTick',1:25);
set(gca,'XTickLabel',keylabels);
set(gca,'YTick',1:25);
set(gca,'YTickLabel',keylabels);
% Most common confusion is F being recognized as C.

% What do the true chords sound like when rendered as Shepard tones?
LabelChroma = labels_to_chroma(TrueChords);
% .. creates a simple chroma array with canonical triads for each chord
X2 = synthesize_chroma(LabelChroma,Times,SR);
soundsc(X2(1:20*SR),SR)

% Compare "target" chroma, actual chroma, and both true and hypothesized labels
subplot(311)
imagesc(LabelChroma);
axis xy
set(gca, 'YTick', [1 3 5 8 10 12]'); set(gca, 'YTickLabel', 'C|D|E|G|A|B');
subplot(312)
imagesc(Chroma);
axis xy
set(gca, 'YTick', [1 3 5 8 10 12]'); set(gca, 'YTickLabel', 'C|D|E|G|A|B');
subplot(313)
plot(1:length(TrueChords),TrueChords,'o',1:length(HypChords),HypChords,'.r')
legend('True','Hyp')
set(gca,'YTick',[1 3 5 8 10 13 15 17 20 22]); set(gca,'YTickLabel','C|D|E|G|A|c|d|e|g|a');
axis([0 length(TrueChords) 0 25])
colormap hot
% This gives the picture above

% Evaluate recognition over entire test set
TestFileList = textread('testfilelist.txt','%s');
[S,C] = test_chord_models(TestFileList,Models,Transitions,Priors);
% Overall recognition accuracy = 57.7%

Things to try:

  • Try training on different numbers of training tracks (e.g. TrainFileList(1:50)). How does this affect performance?
  • Look at the variation in accuracy on the different items in the test set. How much does it vary? Can you explain this variation from your knowledge of the different tracks?
  • If you listen to the resynthesis of the recognized chords (i.e., labels_to_chroma(recognize_chords(...))), can you hear why the recognizer made mistakes, i.e., do the mistaken chords make sense?
  • load_chroma includes simple feature normalization, to make the maximum value of each chroma vector equal to 1. You can probably improve performance by modifying this e.g. by applying a compressive nonlinearity such as Chroma.^0.25. See how this affects performance.
  • The Hidden Markov Model smoothing relies on the large self-loop probabilities of the Transitions matrix. See what happens if you replace Transitions with a flat matrix e.g. ones(25,25) (which is equivalent to not using an HMM and just taking the most-likely model for each frame), or some mixture between that and a large self-loop probability e.g. eye(25)+0.1*ones(25,25).
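
To see why large self-loop probabilities smooth the chord sequence, here is a minimal Viterbi sketch in Python on a two-state toy problem. It is an independent illustration, not the viterbi_path.m used by the practical; the likelihood values and the helper log_rows are made up for the example.

```python
import math

def viterbi(log_lik, log_trans, log_prior):
    """log_lik[t][s] is the log likelihood of state s at frame t.
    Returns the most likely state sequence."""
    n_states = len(log_prior)
    score = [log_prior[s] + log_lik[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_lik)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + log_lik[t][s])
        score = new_score
        back.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

def log_rows(matrix):
    """Normalize each row to sum to 1, then take logs."""
    return [[math.log(v / sum(row)) for v in row] for row in matrix]

# Frame 3 slightly prefers state 1; every other frame prefers state 0
lik = [[0.6, 0.4], [0.6, 0.4], [0.45, 0.55], [0.6, 0.4], [0.6, 0.4]]
log_lik = [[math.log(v) for v in frame] for frame in lik]
prior = [math.log(0.5), math.log(0.5)]

sticky = viterbi(log_lik, log_rows([[0.9, 0.1], [0.1, 0.9]]), prior)
flat = viterbi(log_lik, log_rows([[1, 1], [1, 1]]), prior)
print(sticky, flat)
```

With the flat matrix the path simply follows the per-frame maximum, including the single-frame excursion; with the sticky matrix that excursion is not worth two transitions and is smoothed away.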

Notes: The code includes and makes use of gaussian_prob.m, viterbi_path.m, and normalise.m, all from Kevin Murphy's wonderful HMM Toolbox.
You can calculate your own beat-synchronous chroma features with the chrombeatftrs.m routine, part of my cover song detection system.


Weds 2012-03-28: Beat tracking

[beat_track image]

As promised, we now move on from the real-time processing of Pd to do some offline analysis using Matlab. (In fact, beat tracking in real time is an interesting and worthwhile problem, but doing it offline is much simpler.)

I've put together a cut-down version of my dynamic programming beat tracker for us to play with. It includes:

  • beat_onset.m to calculate the onset strength function;
  • beat_tempo.m to estimate the global tempo from the onset function;
  • beat_simple.m - the core of the dynamic programming search;
  • beat_track.m - a wrapper around these three functions to go from audio file to beat times.

There are also a number of helper/utility functions:

  • beat_plot.m to plot beat times on top of a spectrogram;
  • beat_play.m to make a sound file indicating where the beats were placed;
  • beat_ground_truth.m to read in a file of the McKinney/Moelants ground truth and filter for the dominant tempo;
  • beat_score.m to compare two beat tracks;
  • beat_test.m to run the beat tracker on a list of files and score each one;
  • beat_demo.m - a script that shows an example of using these functions.

All these functions are available in prac09.zip. The collection of 20 example excerpts and human tapping data that McKinney and Moelants donated for MIREX 2006 (some 50 MB) is separately available as mirex06examples.zip.

Here are some things to try:

  • Go through the steps in beat_demo to understand the basic pieces.
  • Investigate one of the examples where the beat tracker doesn't do so well. What went wrong? How consistent are the human results?
  • The one parameter of the DP beat tracker is alpha, the balance between local onset strength and inter-onset-time consistency. A larger alpha results in a more rigid tempo track, better able to bridge over rhythmically ambiguous regions, but less able to cope with small variations in tempo. Can you improve the baseline performance by changing the value of alpha (e.g., the default value set at the top of beat_simple)?
  • The tempo estimation is essentially independent of the beat tracking. Sometimes you can improve the agreement with human tappers simply by "biasing" the tempo picking differently. See if you can improve performance by modifying the bpmmean parameter in beat_tempo.
  • The onset function in beat_onset is very simple and unweighted. Can you perhaps weight different frequency bands to improve performance?
  • The beat tracker sometimes has a systematic shift relative to the humans (it's often early). The hh and tt returns of beat_score give a histogram of where system beats occur relative to human ground truth. How much can you improve system performance by systematically shifting every beat time (e.g., beats = beats + 0.1)?
  • What happens if the tempo varies within a single performance? Maybe you can find an example like that on your computer - try it out! (You might need mp3read.m).
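
The role of alpha in the dynamic programming search can be sketched in a few lines of Python. This is a simplified illustration in the spirit of the search, not a port of beat_simple.m: the search window of half to twice the period, the squared-log transition penalty, and the function name track_beats are assumptions of this sketch.

```python
import math

def track_beats(onset, period, alpha=100.0):
    """Choose beat frames that maximize onset strength plus tempo consistency.
    onset: onset strength per frame; period: target beat spacing in frames."""
    n = len(onset)
    first_lag = max(1, int(round(period / 2)))
    lags = list(range(first_lag, int(round(2 * period)) + 1))
    # Penalty grows as the spacing departs from the target period
    txcost = [-alpha * math.log(lag / period) ** 2 for lag in lags]
    cumscore = list(onset)
    backlink = [-1] * n
    for t in range(n):
        best, best_prev = None, -1
        for lag, cost in zip(lags, txcost):
            prev = t - lag
            if prev < 0:
                break  # larger lags reach even further before the start
            cand = cumscore[prev] + cost
            if best is None or cand > best:
                best, best_prev = cand, prev
        if best is not None:
            cumscore[t] = onset[t] + best
            backlink[t] = best_prev
    # Trace back from the best-scoring frame
    t = max(range(n), key=lambda i: cumscore[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = backlink[t]
    return beats[::-1]

# Synthetic onset envelope: a pulse every 10 frames, starting at frame 3
onset = [0.0] * 100
for t in range(3, 100, 10):
    onset[t] = 1.0
print(track_beats(onset, period=10))
```

Setting one of the pulses to zero (for example onset[43] = 0.0) still yields a beat at that frame, because the transition penalty makes it cheaper to bridge the gap at the expected spacing than to shift the beat.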

Weds 2012-03-21: Autotune

[autotune image]

Given pitch tracking and pitch modification, we can now put them both together to modify the pitch towards a target derived from the current input pitch, i.e., autotune, in which a singer's pitch is moved to the nearest exact note to compensate for problems in their intonation.

We can use sigmund both to track the singing pitch, and to analyze the voice into sinusoids which we can then resynthesize after possibly changing the pitch. We'll use the following Pd patches:

  • autotune.pd - the main patch, shown above
  • sigmundosc~.pd - a patch to take sigmund's "tracks" message stream and resynthesize into audio
  • sineosc~.pd - the individual sinusoid oscillator, modified from last time to accept frequency scaling messages
  • my_lop.pd - a simple one-pole low-pass filter for control (not audio) values
  • playsound~.pd and grapher.pd from last time.

You can download all these patches in prac08.zip.

Loading a sound file then playing it into the patch should generate a close copy of the original voice, but quantized to semitone pitches. The "pitch smoothing" slider controls how abruptly the pitch moves between notes. Try it on some voice files, such as the Marvin Gaye voice.wav, the query-by-singing example 00014.wav, or my pitch sweep ahh.wav. You can also try it on live input by hooking up the adc~ instead of the soundfile playback, but you will probably need to use headphones to avoid feedback.

Here are some things to investigate:

  • What is the purpose of the moses 0 block in the middle of the patch? If you remove it, and hook the raw pitch input directly into the quantizer, what changes?
  • What is the left-hand output of the moses block achieving? Can you hear the difference if it is disconnected?
  • The "center" tones are defined by the A440 reference of the default MIDI scale. However, the query-by-humming example seems to be misaligned. Can you add a tuning offset to allow the input pitch to be systematically offset prior to quantization? Does this help?
  • Currently, the quantization just takes the nearest semitone. How could you make it select only the "white notes" of a major scale, i.e. skipping every 2nd semitone except between E/F and B/C?
  • Occasionally you may hear glitches in the output pitch. What is causing these? How might they be avoided?
  • The current system generates singing with a mostly steady pitch, but singers often add vibrato (pitch modulation at around 6 Hz). How could we add vibrato with the autotune?
  • How would you impose an entirely unrelated pitch (e.g. a monotone, or a steady slow vibrato) on an input signal?
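
The pitch arithmetic behind several of these questions can be sketched in Python (the Pd patch does the equivalent with its own objects). The sketch assumes the standard MIDI convention of note 69 = A440; the function names, the offset_semitones parameter, and the choice of C major for the "white notes" are illustrative assumptions.

```python
import math

A4_HZ = 440.0
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # pitch classes of C major (the "white notes")

def hz_to_midi(freq_hz):
    """Convert frequency to a fractional MIDI note number (69 = A440)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)

def midi_to_hz(note):
    """Convert a MIDI note number back to frequency."""
    return A4_HZ * 2.0 ** ((note - 69.0) / 12.0)

def quantize_semitone(freq_hz, offset_semitones=0.0):
    """Snap to the nearest semitone of a grid shifted by offset_semitones."""
    note = round(hz_to_midi(freq_hz) - offset_semitones) + offset_semitones
    return midi_to_hz(note)

def quantize_to_scale(freq_hz, scale=MAJOR_SCALE):
    """Snap to the nearest note whose pitch class belongs to the scale."""
    midi = hz_to_midi(freq_hz)
    base = math.floor(midi)
    candidates = [n for n in range(base - 2, base + 4) if n % 12 in scale]
    return midi_to_hz(min(candidates, key=lambda n: abs(n - midi)))

def smooth(prev_value, target, coeff):
    """One-pole low-pass on a control value; larger coeff gives slower glides."""
    return coeff * prev_value + (1.0 - coeff) * target

print(quantize_semitone(450.0))   # a sharp A4 is pulled back to the A
print(quantize_to_scale(470.0))   # slightly above A#4 is pushed up to B4
```

Applying smooth to the quantized pitch from frame to frame gives the gliding behaviour controlled by a "pitch smoothing" setting, and adding a slow sinusoid to the target note before midi_to_hz is one way to think about vibrato.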

Weds 2012-03-07: Pitch tracking

[sigmund image]

Miller Puckette (author of Pd) created a complex pitch tracking object called sigmund~. This week we'll investigate its use and function.

You can download the Pd patches for this practical in prac07.zip.

sigmund~ operates in various different modes - as a raw pitch tracker, as a note detector/segmenter, and also as a sinusoid tracker. We'll try each mode.

  • In the test_sigmund patch, the left-most path runs sigmund~ in raw pitch tracking mode, echoing the pitch and energy it detects with a sawtooth oscillator. Try running this with the mic input (adc~). How well can it track your voice? What kind of errors does it make?
  • If instead you hook up the playsound~ unit, you can feed an existing soundfile into the tracker. Try voice.wav, speech.wav, and piano.wav. How does it do?
  • The second path runs sigmund~ in note-detecting mode i.e. it first detects the raw pitch, then tries to convert that into a series of note events. We have fed these to our analog synthesizer patch so we can hear them as distinct notes. Try this for the voice and piano.
  • Note tracking depends on a lot of thresholds, which can be set via messages to sigmund~. Try varying the "growth" (minimum level increase, in dB, before a new note is reported) and "vibrato" (maximum pitch deviation, in semitones, before a new note is reported) using the provided sliders. Can you improve the note extraction? How do the best settings depend on the material?
  • The third branch uses sigmund~ as a sinusoid tracker (analogous to SPEAR), then connects the output to a set of sinusoidal oscillators to reconstruct a version of the sound. Try feeding different sounds into this: how does it do? You can vary the cross-fade time of the oscillators with the "fadetime" slider - what does this do?

Weds 2012-02-29: Reverb

[reverb image]

For this week's practical you will examine a reverberation algorithm, trying to understand the link between the algorithm controls and pieces, and the subjective experience of the reverberation. We will be working with the algorithm in the rev2~ reverberator patch that comes with Pd, although we'll be modifying our own version of it. It's based on the design described in the 1982 Stautner and Puckette paper.

You can download the Pd patches for this practical in prac06.zip.

The main test harness allows you to adjust the control parameters of the reverb patch, and to feed in impulses, short tone bursts of different durations, or sound files. You can also sample the impulse response and write it out to a wave file, to be analyzed by an external program.

  • Try varying the main reverb controls: Liveness, Crossover Frequency, HF Damping, as well as the balance between "wet" and "dry" sound. What effect does each control have on the sound? How does the effect you hear depend on the kind of sound you use as test input?
  • Try writing out the impulse response of a setting to a sound file (using the "write -wave ..." message at the bottom right) and load it into Matlab (or another sound editor) to look in more detail at its waveform and spectrogram. Does it look like you expect? How do the controls affect these objective properties of the signal?
  • Open the my_rev2 subpatch (shown to the right). The main work is done by four delay lines, implemented with the four delwrite~/delread~ pairs. You can vary the actual delays of these four lines. How does this affect sound? Are the default values critical to the sound?
  • The other main part of the reverberator is the early echoes, implemented in the early-reflect subpatch. If you open that subpatch, you'll find a fader to choose between including those early echoes, or just feeding the direct path into the delay lines. How does this affect the sound?
  • This reverberator uses a mixing matrix, implemented with the 4x4 block of *~ 1 and *~ -1 elements in the lower left. Try flipping some of these; what happens?
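
The structure of four delay lines coupled through a matrix of +1 and -1 gains can be sketched in Python as a small feedback delay network. This is a toy model, not the rev2~ patch: it leaves out the early reflections and damping filters, and the particular sign pattern (a Hadamard matrix scaled by 1/2 so that it preserves energy), the delay lengths, and the function name are assumptions.

```python
def fdn_impulse_response(delays, feedback_gain, n_samples):
    """Impulse response of four delay lines fed back through a +/-1 mixing matrix.
    delays: four delay lengths in samples; feedback_gain < 1 sets the decay rate."""
    # 4x4 Hadamard sign pattern scaled by 1/2, which makes the matrix orthogonal
    signs = ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])
    mix = [[0.5 * s for s in row] for row in signs]
    buffers = [[0.0] * d for d in delays]  # one circular buffer per delay line
    pos = [0] * 4
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0                       # unit impulse input
        taps = [buffers[i][pos[i]] for i in range(4)]    # delayed signals
        out.append(sum(taps))
        for i in range(4):
            fed_back = sum(mix[i][j] * taps[j] for j in range(4))
            buffers[i][pos[i]] = x + feedback_gain * fed_back
            pos[i] = (pos[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response([7, 11, 13, 17], feedback_gain=0.8, n_samples=2000)
early = sum(v * v for v in ir[:500])
late = sum(v * v for v in ir[-500:])
print(early, late)
```

Because the scaled matrix is orthogonal, each pass around the loop loses energy only through feedback_gain, so the decay time is set by that one number. Flipping a single sign breaks this property, which is one reason the sound can change noticeably.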

Here are some sound files you can use to try out the reverberator: voice.wav, guitar.wav, drums.wav.


Weds 2012-02-22: LPC

[lpcanalysis image]

This week we will experiment with LPC analysis/synthesis in Pd using the lpc~ and lpreson~ units by Edward Kelly and Nicolas Chetry, which are included with the Pd-extended package.

The patch lpc.pd uses these units to perform LPC analysis and synthesis, including taking the LPC filters based on one sound and applying them to a different sound (cross-synthesis). The main patch handles loading, playing, and selecting the soundfiles, and uses the additional patches loadsoundfile.pd to read in sound files, playloop.pd to play or loop a sound file, and audiosel~.pd to allow selecting one of several audio streams. The LPC part is done by the lpcanalysis subpatch of the main patch, shown to the right. It "pre-emphasizes" both voice and excitation to boost high frequencies, then applies lpc~ to the voice signal to generate a set of LPC filter coefficients and a residual. The [myenv] subpatch then calculates the energy (envelope) of the residual, and the excitation is scaled to reflect this envelope. This excitation, along with the original filter coefficients (delayed by one block to match the delay introduced by [myenv]), is passed to lpreson~, which simply implements the all-pole filter to impose the voice's formant structure on the excitation. A final de-emphasis re-balances low and high frequencies. You can download these patches in prac05.zip.

The entire lpcanalysis subpatch is applied to overlapping windows of 1024 samples at half the sampling rate of the parent patch (i.e. 1024/22050 = 46.4 ms) thanks to the block~ unit, a special feature of Pd which allows subpatches to have different blocking etc. from their parents. On coming out of the subpatch, Pd will overlap-add as necessary, so we apply a final tapered window (from the $0-hanning array) to the outputs. tabreceive~ repeatedly reads the hanning window on every frame.

Here is a list of options for things to investigate:

  • Run the basic cross-synthesis function by loading sm1-44k.wav as the voice input, and Juno106_Strings.wav as the excitation input. (You'll probably need to enable the dsp on the output~ unit and turn up the volume.) Notice how the strings sound can still be understood as speech.
  • Try selecting the noise input as excitation (using the audiosel~ panel) instead. This lets you hear the information being carried by the LPC filters in its most neutral form.
  • Try varying the LPC Filter Order (also called the model order) with the number box (click and drag up and down to change the number). How does it affect the synthesis sound?
  • Change the patching to the output~ unit to listen to the "residual" output from the lpcanalysis subpatch. How does the residual sound change with the model order?
  • The subpatch uses the residual's envelope (extracted by [myenv]) to scale the excitation. If you disable this (e.g. by patching the excitation directly into the input of lpreson~), how does this affect the sound?
  • The LPC analysis relies on Pd's block~ unit to manage the block size, overlap-add, and sample rate scaling. You can try varying this to see e.g. how the sound varies if you use a longer block size and/or different sampling rate. (Remember to regenerate the hanning window array if you change the block size). Be warned, however: I have been able to crash both Pd and the audio drivers with these settings.
  • One weakness with this setup is that the excitation may not have a "flat" spectrum to provide sufficient high-frequency energy for the LPC filters to "shape". One approach to this is to use a second lpc~ unit to extract a (whitened) residual for the excitation too (the actual LPC coefficients from the excitation are discarded). Try this out and see how it affects the sound.
  • In principle, LPC cross-synthesis can be used for time scaling, simply by slowing down the frame rate between analysis and synthesis (or possibly by interpolating the LPC parameters). The framework in lpcanalysis won't support this, however, as it uses LPC parameters extracted in real time from the speech waveform. To do timescaling, you'd need to implement something more like the way we did sinusoidal synthesis last week, where the parameters are written to table, then read back as needed on synthesis. You should, however, be able to implement both analysis and synthesis stages within Pd, without needing to read or write the parameters to an external file.
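
For reference, the analysis and resynthesis steps can be written out in plain Python. The sketch below uses the autocorrelation method with the Levinson-Durbin recursion, which is one standard way to compute LPC coefficients; whether lpc~ uses exactly this method is an assumption, and the function names are hypothetical. The all-pole filter plays the role described for lpreson~.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a[0..order] (a[0] = 1) of A(z) = sum_k a[k] z^-k; returns (a, error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def residual(x, a):
    """Inverse (FIR) filter: e[n] = sum_k a[k] x[n-k]."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def allpole(excitation, a):
    """All-pole synthesis filter: y[n] = e[n] - sum_{k>=1} a[k] y[n-k]."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0))
    return y

# Test signal: impulse response of a known two-pole resonator
true_a = [1.0, -1.3, 0.6]
x = allpole([1.0] + [0.0] * 199, true_a)
a, err = levinson_durbin(autocorrelation(x, 2), 2)
e = residual(x, a)          # whitened residual (close to the original impulse)
y = allpole(e, a)           # resynthesis from residual and filter
print(a)
```

Cross-synthesis corresponds to calling allpole with a different excitation in place of e, while keeping the coefficients a estimated from the voice.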

Weds 2012-02-15: Sinusoidal Synthesis

[mypartial image]

This week we use Pd to perform additive synthesis by controlling the frequencies and magnitudes of a bank of sinewave oscillators based on parameters read from an analysis file written by the SPEAR program we saw in class.

The main additive patch instantiates a bank of 32 oscillators, and provides the controls to load an analysis file, to trigger playback, and to modify the time and frequency scale. The actual parsing of the SPEAR file is provided by loadspearfile, and the individual sinusoid partials are rendered by mypartial. Here are some analysis files for notes from a violin, trumpet, and guitar. You can download all these files in prac04.zip.

You can experiment with playing back each of these, turning individual partials on or off, and adjusting the time and frequency scales. When the analysis file contains more harmonics than the number of oscillators (32), some of the sinusoids are dropped from the synthesis. You can identify (roughly) which harmonic is which by the average frequency and summed-up magnitude of each harmonic, which gets displayed on each individual [mypartial] patch in additive.pd.

As with last week, we will break into small groups to work on this practical, with each group presenting their discoveries to the whole class in a brief report-back session at the end of the class. Here are some suggestions for investigation:

  • Using the keybd patch from last week, make the sound resynthesize under keyboard control, modifying its pitch to reflect the key pressed.
  • Try adding inharmonicity to the sound i.e. stretching the frequencies not by a constant factor, but according to some other rule. For instance, what happens if you add a constant offset to each frequency? What happens if you scale only the frequencies above a certain minimum?
  • Add vibrato (slow frequency modulation) to the resynthesized note.
  • Implement looping sustain, where the parameters cycle within some subportion of the analysis data while a key is held down (to provide sustain), then proceed through to the end of the sample once the key is released.
  • Try making a sinusoidal analysis of a sound of your own (using SPEAR) and see how well it works for other sounds, perhaps a spoken word. Remember that the system will only resynthesize the first 32 sinusoids it reads from file, so be careful to prune out the less important tracks before you "export" from SPEAR. (You can also modify the SPEAR analysis files in Matlab using spearread.m and spearwrite.m).
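To make the inharmonicity and vibrato suggestions concrete, here is a small Python sketch of per-partial frequency rules. These are just possible interpretations of the suggestions above; the function names, the 5 Hz rate, and the 1% depth are illustrative assumptions, and in Pd the same arithmetic would be applied to the frequency sent to each oscillator.

```python
import math


def add_offset(freq_hz, offset_hz):
    """Shift every partial by the same number of Hz (breaks harmonicity)."""
    return freq_hz + offset_hz


def scale_above(freq_hz, min_hz, factor):
    """Scale only the partials above min_hz; leave lower ones alone."""
    return freq_hz * factor if freq_hz > min_hz else freq_hz


def vibrato(freq_hz, t, rate_hz=5.0, depth=0.01):
    """Slow sinusoidal frequency modulation around the original value."""
    return freq_hz * (1.0 + depth * math.sin(2.0 * math.pi * rate_hz * t))


# Harmonics of a 220 Hz note, before and after a 30 Hz constant offset.
harmonics = [220.0 * k for k in range(1, 5)]
shifted = [add_offset(f, 30.0) for f in harmonics]
```

After the offset, the partials are no longer integer multiples of the lowest one, which is why a constant shift sounds so different from a constant scale factor.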

Note: if you modify the mypartial patch, it's probably a good idea to close and reopen the parent additive patch. I'm not sure how good Pd is at keeping multiple instantiations in sync when one instance is edited. Re-opening the parent patch will re-instantiate all the mypartials, so they should all get updated.

Followup: Here are my attempts to (a) add keyboard control and vibrato: additive+kbd (requires keybd, lfo, cmap, and twoway), and (b) add looping while the key is held down: additive+kbd+loop.


Weds 2012-02-08: Analog synthesis

[monosynth image]

This week we'll experiment with simulating an analog synthesizer with Pd.

The Pd analog synth simulator consists of several patches:

  • oscillator~.pd implements a band-limited square wave with pulse-width modulation, including a low-frequency oscillator (LFO) to modulate the pulse width.
  • adsr0~.pd provides a simple ADSR envelope generator. We use two of these, one for the amplitude and one for the filter.
  • voice~.pd includes an [oscillator~] and two [adsr0~]s along with [vcf~] to provide a complete synthesizer voice.
  • demo_voice.pd is a simple main patch that routes events from the keyboard to the [voice~] and provides sliders to control the various parameters.
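As a reference for what an ADSR envelope generator computes, here is a minimal Python sketch of a linear attack/decay/sustain/release curve. It is not derived from adsr0~.pd; the function name, the default times, and the choice to start the release from whatever level was reached at key-up are assumptions.

```python
def adsr(t, note_off, attack=0.01, decay=0.1, sustain=0.7, release=0.2):
    """Linear ADSR level in [0, 1] at time t (seconds).

    The key goes down at t = 0 and is released at t = note_off.
    attack, decay and release are assumed to be positive durations.
    """
    def held(u):
        # Envelope while the key is still down.
        if u < attack:
            return u / attack
        if u < attack + decay:
            return 1.0 - (1.0 - sustain) * (u - attack) / decay
        return sustain

    if t < 0.0:
        return 0.0
    if t < note_off:
        return held(t)
    # Release: ramp linearly to zero from the level reached at key-up.
    start = held(note_off)
    return max(0.0, start * (1.0 - (t - note_off) / release))
```

Using one such envelope to scale the output amplitude and a second to move the filter cutoff gives the two-envelope arrangement described for the voice.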

You can download all these patches along with some support functions in the zip file prac03.zip. Load demo_voice.pd into Pd, and the synth should run.

This week we're going to try breaking up into teams of 4 or 5 people to work on the practical. Each team should choose one of the topics below to work on, then at the end of class we'll get a brief report-back from each team. If your team completes one topic, or gets stuck, feel free to try another.

Things to investigate:

  1. Analog synth programming: You could choose one of the sounds from the Juno-106 examples page, or you could download and run the stand-alone demo version of the Loomer Aspect analog synth emulator that I demo'd in class (you need a MIDI keyboard to drive it, but you can use the open-source Virtual MIDI Piano Keyboard). Although our basic synth simulation is simple (and only monophonic), it includes all the basic components you should need to make the full range of classic analog synth sounds. By starting from one of the Aspect presets (many of which sound amazing), you can actually see how the sound is put together as a basis for duplicating it (although you might need to consult the Aspect manual to fully decipher how things are connected together: most controls have up to three inputs which are summed).
  2. Additional oscillators: One thing Aspect and the Juno have that's not in the basic synth is multiple oscillators, including a "sub-oscillator", for instance a square wave tuned an octave below the main oscillator. Can you modify the simple synth to include this? How does it sound? You might want to also deploy an LFO, perhaps using the lfo.pd patch.
  3. Modified filter: Another difference with the Juno is that its variable filter is a 24 dB/octave low-pass, much steeper than the single vcf (voltage controlled filter) unit in the patch. You could improve this by adding multiple filters in cascade; how does this affect the sound?
  4. Improved ADSR: The existing ADSR generates linear amplitude envelopes, but these don't sound as good as nonlinear envelopes, for instance in dB or quartic mappings. Miller's book discusses this in chapter 4; see if you can change the amplitude mapping from the ADSR to improve the sound.
  5. Drum sounds: You can also create drum sounds with analog synthesizers, but you usually need a noise source instead of, or in addition to, an oscillator. Try using the "noise~" unit to broaden the range of sounds from the synthesizer, and see if you can get some convincing drum sounds (or other interesting sounds). A good drum sound might also want some pitch variation, perhaps generated by an additional ADSR unit.
  6. Velocity sensitivity: One big musical limitation of our synthesizer is that it has no dynamic control - no variation of the sound based on how hard you strike the keys. With the computer keyboard as controller, we're pretty much stuck, but I have a couple of USB musical keyboards as well. If you want to try playing with one of these, they will give you a "velocity" value out of the keybd unit. You could then use this to modulate the amplitude, and maybe even the timbre, of the synth.
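For the improved-ADSR topic, the two nonlinear mappings mentioned can be written as simple functions of the linear envelope level. This Python sketch shows one plausible form of each; the function names and the 60 dB range are assumptions for illustration, not values taken from the patches or from Miller's book.

```python
def quartic(level):
    """Map a linear 0..1 envelope level to amplitude via a fourth power."""
    return level ** 4


def db_map(level, range_db=60.0):
    """Treat a linear 0..1 level as a position on a decibel scale.

    level = 1 gives full amplitude (0 dB); level just above 0 gives
    roughly -range_db; level = 0 is forced to silence.
    """
    if level <= 0.0:
        return 0.0
    return 10.0 ** ((level - 1.0) * range_db / 20.0)
```

Both curves spend much more of the envelope's travel at low amplitudes than a linear ramp does, which is closer to how loudness is perceived.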

A good reference for Pd is Miller Puckette's book The Theory and Technique of Electronic Music. To look up individual units, you can try the Index of the online HTML version. A more terse description of the basic operation of Pd is in Miller's original Pd manual. See also the Introduction to Pd excerpted from Andy Farnell's book, Designing Sound - Practical synthetic sound design for film, games and interactive media using dataflow.


Weds 2012-02-01: Filtering

[demo_filters screenshot]

Last week we looked at a fairly complex structure built in Pd. This week, we'll back up a bit and play with some simple filters within Pd.

Pd provides a range of built-in filtering objects: [lop~], [hip~], [bp~] (and its sample-rate-controllable twin [vcf~]), the more general [biquad~], and the elemental [rpole~], [cpole~] and [rzero~], [czero~] (see the filters section of the Floss Pd manual, and the Subtractive Synthesis chapter of Johannes Kreidler's Pd Tutorial).
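To connect the simplest of these objects to the underlying difference equations, here is a Python sketch of a one-pole low-pass and a complementary high-pass. It illustrates the general technique only; the coefficient formula, the function names, and building the high-pass as "input minus low-pass" are assumptions and may not match the exact behavior of [lop~] and [hip~].

```python
import math


def one_pole_lowpass(x, cutoff_hz, sr=44100):
    """y[n] = y[n-1] + a * (x[n] - y[n-1]), a single real pole."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, state = [], 0.0
    for v in x:
        state += a * (v - state)
        y.append(state)
    return y


def one_pole_highpass(x, cutoff_hz, sr=44100):
    """First-order high-pass formed as the input minus its low-passed copy."""
    low = one_pole_lowpass(x, cutoff_hz, sr)
    return [v - l for v, l in zip(x, low)]
```

A single pole gives only a gentle 6 dB/octave slope, which is worth keeping in mind when comparing the sound of these filters with the resonant band-pass types.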

The patch demo_filters.pd provides a framework to play with these filter types. It uses playsound~.pd to allow you to play (and loop) a WAV file, select4~.pd to select one of 4 audio streams via "radio buttons", and plotpowspec~.pd (slightly improved from last week) to plot the running FFT spectrum, as well as the [output~] built-in from Pd-extended. You can download all these patches in prac02.zip.

Try listening to the various sound inputs provided by the top [select4~] through the different filters provided by the bottom [select4~]. Try changing the cutoff frequency with the slider (as well as the Q of the bpf with the number box); listen to the results and look at the short-time spectrum. You can try loading the speech and guitar samples into the [playsound~] unit to see how filtering affects "real" sounds (click the button at the bottom right of [playsound~] to bring up the file selection dialog; click the button on the left to play the sound, and check the toggle to make it loop indefinitely).

Here are a few further experiments you could try:

Followup: I implemented a low-frequency-modulated sweeping notch filter modnotch~.pd and modified the main patch to include it (demo_modnotch.pd) which builds a chain of 8 notches, all modulated at slightly different (and hence incoherent) frequencies. The effect is somewhat interesting, but doesn't really make a single voice sound like a chorus; maybe you need more than 8 notches, or more structure in the notch motion?


Weds 2012-01-25: Plucked string

[plucked_string screenshot]

This week's practical looks at the Karplus-Strong plucked string simulation in Pure Data (Pd). The general pattern of these weekly practical sessions is to give you a piece of code to start with, then ask you to investigate some aspects, by using and changing the code. However, the areas to investigate are left somewhat open, in the hope that we'll each discover different things -- that we can then share.

We start with demo_karpluck.pd, my wrapper around Loomer's karpluck~.pd. In addition to the keybd.pd patch used to provide MIDI-like controls from the computer keyboard, this one also uses grapher~.pd to provide an oscilloscope-like time-domain waveform plot, and plotpowspec~.pd (based on the original by Ron Weiss) to provide a smoothed Fourier transform view. You can download all these patches in prac01.zip.

This patch provides three main controls for the sound of the plucked string:

  • the "Click width", which determines the duration of the noise burst fed into the digital waveguide to initiate the note;
  • the "Decay constant", which scales the overall amplitude of the reflected pulse in each cycle; and
  • the "Feedback lowpass", which is an additional attenuation applied to higher frequencies.
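The roles of these three controls can be seen in a bare-bones Karplus-Strong loop. The Python below is a sketch of the general algorithm, not a port of karpluck~: the parameter names, their default values, treating the click width as a fraction of one period, and using a two-point weighted average as the feedback low-pass are all assumptions.

```python
import random


def karplus_strong(freq_hz, dur_s, sr=44100, decay=0.996,
                   lowpass=0.5, click_width=1.0, seed=0):
    """Plucked-string tone from a noise burst in a feedback delay line.

    decay       : gain applied on every trip round the loop (< 1)
    lowpass     : 0..1 weight of the previous sample in the loop filter
    click_width : fraction (0..1) of one period filled with noise
    """
    rng = random.Random(seed)
    period = int(round(sr / freq_hz))
    burst = max(1, min(period, int(round(period * click_width))))
    # The delay line starts out holding the excitation burst.
    delay = [rng.uniform(-1.0, 1.0) if i < burst else 0.0
             for i in range(period)]
    out = []
    prev = 0.0
    for n in range(int(round(dur_s * sr))):
        idx = n % period
        x = delay[idx]                 # sample written one period ago
        out.append(x)
        # Feedback path: gentle low-pass, then overall attenuation.
        filtered = (1.0 - lowpass) * x + lowpass * prev
        prev = x
        delay[idx] = decay * filtered
    return out
```

With `lowpass=0.5` the loop filter is the classic two-sample average, so high harmonics die away faster than low ones, while `decay` shortens or lengthens the whole note.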

Here are some things to try:

  • Try playing with these three parameters to see what different kinds of "real" sounds you can approximate.
  • You could also try modifying the "innards" of the karpluck patch. For instance, can you soften the envelope of the noise burst generated by the excitation~ subpatch of karpluck~? How does that affect the sound?
  • What about changing the order or type of the lop~ (low-pass filter) in the body of karpluck~? Can you explain these effects?
  • Adding a low-frequency modulation to the feedback lowpass can give a nice, subtle vibrato/tremolo effect. Can you do this?
  • One of the nice things about the waveguide physical models is that they can support different kinds of interaction. In class, I tried to demonstrate my Processing-based string model that supported "damping" to get harmonics of the string. Can you do something like this in Pd?
  • The string model is the simplest waveguide model. Instruments like clarinet and trumpet can also be simulated, but need a nonlinear energy input simulating the reed or lips (see for instance this paper on clarinet modeling by Gary Scavone). I haven't yet found a good implementation of this for Pd, but it would be an interesting thing to look for and/or try.

Followup: During class, I modified karpluck~ to support a few extra parameters: a single low-pass filter applied to the excitation (to get low-pass noise initialization instead of white, like using a softer pick), and tremolo depth and modulation rate, to modulate the feedback low-pass to give a kind of tremolo/vibrato effect. See karpluckvib~.pd and demo_karpluckvib~.pd.


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Dan Ellis <dpwe@ee.columbia.edu>
Last updated: Wed Apr 25 09:40:04 AM EDT 2012