Dan Ellis
Matlab Audio Processing Examples
Introduction
This area contains several little pieces of Matlab code that might be fun
or useful to play with.
-
Robust landmark-based audio fingerprinting
This is my implementation of the music audio fingerprinting scheme
invented by Avery Wang for Shazam. It is able to match short and noisy
excerpts of music against a reference database. The database part is a
bit vestigial in Matlab, but the landmark hashing works pretty well.
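The landmark idea can be illustrated with a minimal Python sketch. It assumes spectral peaks have already been picked as (frame, bin) pairs. Each peak is paired with a few later peaks to form a hash (anchor bin, target bin, frame gap), and a query is matched by voting for the time offset that the most hash hits agree on. The names `fan_out` and `max_dt` and the tuple layout of the hash are illustrative assumptions, not the parameters used by Wang or by this Matlab code.

```python
from collections import Counter, defaultdict

def landmark_hashes(peaks, fan_out=3, max_dt=63):
    """peaks: iterable of (frame, freq_bin). Returns [((f1, f2, dt), t1), ...]."""
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:                    # skip peaks in the same frame
                continue
            if dt > max_dt or paired >= fan_out:
                break                      # too far ahead, or enough pairs for this anchor
            hashes.append(((f1, f2, dt), t1))
            paired += 1
    return hashes

def build_index(ref_hashes):
    """Map each hash to the reference frames where it occurs."""
    index = defaultdict(list)
    for h, t in ref_hashes:
        index[h].append(t)
    return index

def best_offset(query_hashes, index):
    """Return (offset, votes) for the most common ref_frame - query_frame, or None."""
    votes = Counter()
    for h, tq in query_hashes:
        for tr in index.get(h, []):
            votes[tr - tq] += 1
    return votes.most_common(1)[0] if votes else None
```

Because each hash stores only relative frequencies and a time gap, a short excerpt cut from anywhere in a track still produces matching hashes, and the consistent offset reveals where it came from.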
-
Compiled audio fingerprint database creation + query
To make it easier to use from outside Matlab (and for people without Matlab licenses),
I redid my fingerprint code as a compiled Matlab binary, available for Mac and Linux.
-
Fingerprint-based Label Alignment to Beatles Audio
This is a special pre-computed fingerprint database plus associated routines
that can be used to automatically modify the timings of the
Beatles annotations from isophonics.net to match the local versions of the relevant
Beatles audio (which may have different time shifts and even speeds due to differences
between the different digital masterings of these albums).
-
Aligning MIDI scores to music audio
- It can be very useful to take a MIDI piece that closely matches the
notes in a real performance, then make a precise temporal alignment between
the two; this allows the discrete MIDI description to be treated as
an approximate transcript of the audio. This code performs this
alignment, based on the approach of constructing an approximate, schematic
version of the expected spectrogram from the MIDI notes.
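The "schematic spectrogram" idea can be sketched as follows. Each MIDI note paints energy at its fundamental and a few harmonics for the frames it is active. The note format (start frame, end frame, MIDI pitch), the three harmonics and the 1/h weighting are assumptions chosen for illustration. The result could then be compared column by column with the real spectrogram and aligned with dynamic programming.

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def schematic_spectrogram(notes, n_frames, n_bins, bin_hz, n_harmonics=3):
    """notes: list of (start_frame, end_frame, midi_pitch), end exclusive.
    Returns S[frame][bin] with weight 1/h at the h-th harmonic of each sounding note."""
    S = [[0.0] * n_bins for _ in range(n_frames)]
    for start, end, pitch in notes:
        f0 = midi_to_hz(pitch)
        for t in range(max(0, start), min(end, n_frames)):
            for h in range(1, n_harmonics + 1):
                k = round(h * f0 / bin_hz)      # nearest linear-frequency bin
                if k < n_bins:
                    S[t][k] += 1.0 / h          # higher harmonics drawn weaker
    return S
```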
-
Chroma Features Analysis and Synthesis
- Chroma features capture the melodic/harmonic signature of spectra, and form a nice musical complement to more common spectral features such as MFCCs. This package includes a few different ways to calculate them, as well as a resynthesis routine that modulates "Shepard tones" to resynthesize audio with the chromatic content defined by the features.
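The core of a chroma calculation is folding spectral energy into 12 pitch classes regardless of octave. The minimal sketch below maps each linear-frequency bin to its nearest semitone relative to an assumed reference `fref` (440 Hz, so index 0 is A) and sums magnitudes per class. The `fmin` cutoff and peak normalization are illustrative choices. The package's own methods, which the page says come in several variants, may weight or interpolate bins differently.

```python
import math

def chroma_from_spectrum(mags, bin_hz, fmin=55.0, fref=440.0):
    """Fold a magnitude spectrum into 12 pitch classes.
    Index 0 is the pitch class of fref, index 1 one semitone higher, and so on."""
    chroma = [0.0] * 12
    for k, m in enumerate(mags):
        f = k * bin_hz
        if f < fmin:                      # skip DC and very low bins
            continue
        semitones = 12.0 * math.log2(f / fref)
        chroma[round(semitones) % 12] += m
    peak = max(chroma)
    return [c / peak for c in chroma] if peak > 0 else chroma
```

With 10 Hz bins, energy at 440 Hz and 880 Hz lands in the same class (A), while 660 Hz lands seven semitones up (E).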
-
Beat Tracking and Music Matching
- an easy-to-use beat tracker in Matlab, plus code for describing music audio as per-beat chroma features, which is an effective representation for matching songs that have the same melodic/harmonic content despite changes in tempo or instrumentation.
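Per-beat features give tempo invariance because every inter-beat interval is reduced to one averaged vector, however many frames it spans. A minimal sketch, assuming per-frame feature vectors (for example, chroma) and beat times already expressed as frame indices:

```python
def beat_sync(frames, beat_frames):
    """Average per-frame feature vectors over each interval [beat_i, beat_{i+1})."""
    out = []
    for start, end in zip(beat_frames[:-1], beat_frames[1:]):
        seg = frames[start:end]
        if not seg:
            continue
        out.append([sum(col) / len(seg) for col in zip(*seg)])  # column means
    return out
```

A performance played at half speed (every frame doubled, beats twice as far apart) yields the same beat-synchronous sequence as the original.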
-
Beat Tracking by Dynamic Programming
- a simplified version of my beat tracker that I put together for my Music Signal Processing class.
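A generic dynamic-programming beat tracker can be sketched as follows. This is an illustration of the technique, not a transcription of the class code. Each frame's cumulative score is its onset strength plus the best predecessor score, minus a penalty on how far the gap deviates (as a log ratio) from a target beat period. The period is assumed to be known already, since tempo estimation is not shown, and `alpha` is an illustrative weight.

```python
import math

def dp_beats(onset, period, alpha=100.0):
    """onset: onset strength per frame; period: assumed beat period in frames.
    Returns the beat frame indices found by dynamic programming."""
    n = len(onset)
    score = [float(v) for v in onset]
    back = [-1] * n
    for t in range(n):
        best, arg = -math.inf, -1
        # candidate predecessors lie between two periods and half a period back
        for p in range(max(0, t - 2 * period), t - max(1, period // 2) + 1):
            s = score[p] - alpha * math.log((t - p) / period) ** 2
            if s > best:
                best, arg = s, p
        if arg >= 0:
            score[t] += best
            back[t] = arg
    # backtrace from the best-scoring frame within the final period
    t = max(range(max(0, n - period), n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = back[t]
    return beats[::-1]
```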
-
Constant-Q (Log-Frequency) Spectrogram
- an equivalent of Matlab's short-time Fourier transform calculation/display routine specgram() that uses a log-frequency axis instead, so that an octave (a doubling of frequency) spans a constant number of bins regardless of absolute frequency. This projection is particularly useful in music processing, since musical transposition then corresponds to a simple translation along the frequency axis. Also includes a drop-in replacement for specgram()
(for people without the Signal Processing Toolbox).
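The translation property follows from geometric bin spacing, f_k = fmin * 2^(k/B) with B bins per octave: multiplying every frequency by a constant ratio adds a constant to every bin index. The sketch below computes bin centers, the shared quality factor Q = 1/(2^(1/B) - 1), and nearest-bin indices. The `fmin` and `B` values in the check are illustrative assumptions.

```python
import math

def cq_center_freqs(fmin, bins_per_octave, n_bins):
    """Geometrically spaced bin centres f_k = fmin * 2**(k / bins_per_octave)."""
    return [fmin * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def cq_bin(freq, fmin, bins_per_octave):
    """Nearest log-frequency bin index for a frequency in Hz."""
    return round(bins_per_octave * math.log2(freq / fmin))

def cq_q(bins_per_octave):
    """Ratio of centre frequency to bandwidth, identical for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
```

With fmin = 55 Hz and 12 bins per octave, the partials 110, 220 and 330 Hz fall in bins 12, 24 and 31. Transposing them up a fifth moves each one by exactly 7 bins.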
-
Gammatone (auditory) Spectrogram
- another specgram()-like function, this time for calculating time-frequency surfaces based on the gammatone approximations to auditory filters.
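For reference, a gammatone filter's impulse response is t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t). The sketch below samples it using commonly quoted values (order 4, b = 1.019, and the Glasberg and Moore ERB formula). These values are assumptions here, and the package may use a faster approximation. A gammatone spectrogram would filter the signal with a bank of these at ERB-spaced centres, then take frame-by-frame energies.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, sr, dur=0.05, order=4, b=1.019):
    """Sampled gammatone impulse response, normalized to peak absolute value 1."""
    n = round(dur * sr)
    decay = 2.0 * math.pi * b * erb(fc)          # envelope decay rate (1/s)
    ir = []
    for i in range(n):
        t = i / sr
        ir.append(t ** (order - 1) * math.exp(-decay * t) * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]
```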
-
Time-domain audio scrambling
- removes some of the identifiability of audio signals by shuffling
overlapping time windows.
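A minimal sketch of window shuffling with overlap-add follows. The window length, 50% overlap, the sin² window (which sums to exactly one across overlapping frames), and shuffling within fixed blocks of frames are all assumptions. The page's code may limit how far frames move in some other way. With `block=1` nothing is shuffled, and the interior of the signal is reconstructed unchanged.

```python
import math
import random

def scramble(x, win=256, block=8, seed=0):
    """Shuffle 50%-overlapped windowed frames within blocks of `block` frames,
    then overlap-add them back at the original frame positions."""
    hop = win // 2
    # sin^2 window: w[n] + w[n + hop] == 1, so overlapping frames sum to unity gain
    w = [math.sin(math.pi * (n + 0.5) / win) ** 2 for n in range(win)]
    starts = list(range(0, len(x) - win + 1, hop))
    order = list(range(len(starts)))
    rng = random.Random(seed)
    for b0 in range(0, len(order), block):
        chunk = order[b0:b0 + block]
        rng.shuffle(chunk)
        order[b0:b0 + block] = chunk
    y = [0.0] * len(x)
    for dst, src in zip(starts, (starts[i] for i in order)):
        for n in range(win):
            y[dst + n] += w[n] * x[src + n]
    return y
```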
-
RASTA/PLP/MFCC feature calculation and inversion - a Matlab implementation of popular speech recognition feature extraction, including MFCC and PLP (as defined by Hermansky and Morgan), as well as code to map features back to (noise-excited) audio. Includes a page on reproducing the feature outputs of common programs.
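Two steps sit at the heart of MFCC extraction: spacing triangular filters evenly on a mel scale, and taking a DCT of the log filter energies. The sketch below uses the HTK-style mel formula, 2595 log10(1 + f/700). That choice is an assumption here, since the Matlab code may default to a different mel variant. The filter-energy computation itself is omitted.

```python
import math

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_filters, fmin, fmax):
    """n_filters + 2 frequencies (Hz) equally spaced in mel; triangle i spans
    edges[i]..edges[i + 2] and peaks at edges[i + 1]."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def cepstra(log_energies, n_ceps):
    """Unnormalized DCT-II of log filterbank energies -> MFCC-style coefficients."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (i + 0.5) / n) for i, e in enumerate(log_energies))
            for k in range(n_ceps)]
```

A spectrally flat input (equal log energies) puts all its weight in the zeroth coefficient. This is why c0 is often treated as an overall level term.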
-
mp3read and mp3write
- a wrapper to read MPEG-Audio layer III (MP3) files into Matlab that behaves like wavread(), and another one to write MP3 files that behaves like wavwrite().
-
m4aread
- a wrapper to use "faad" to read MPEG4 Audio (AAC / M4A) files into Matlab, just like wavread() and mp3read() above.
-
audioread
- a wrapper that sits above wavread, mp3read, m4aread, flacread, etc. to allow format-independent reading of audio files.
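The wrapper pattern amounts to dispatching on the file extension. The sketch below uses a registry of reader functions. The registry and its reader signature (path in, result out) are hypothetical, standing in for format-specific readers such as those listed above.

```python
import os

def make_audioread(readers):
    """readers: dict mapping a lower-case extension ('.wav', '.mp3', ...) to a
    hypothetical reader function taking a path."""
    def audioread(path):
        ext = os.path.splitext(path)[1].lower()
        if ext not in readers:
            raise ValueError(f"no reader registered for {ext!r} files")
        return readers[ext](path)
    return audioread
```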
-
cache_results
- a general function to save the results of a function to a disk file, then re-read it if the same computation is requested in future.
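The same disk-caching idea can be sketched in Python. The cache key is a hash of the function's name and its pickled arguments, and the result is stored as a pickle file. The directory layout and key scheme are assumptions. Note that the key ignores the function's code, so a stale cache must be cleared by hand after the function changes.

```python
import hashlib
import os
import pickle
import tempfile

def cache_results(func, args, cache_dir=None):
    """Return func(*args), reusing a pickled result from disk when the same
    function name and arguments have been computed before."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "cache_results")
    key = hashlib.sha1(pickle.dumps((func.__module__, func.__qualname__, args))).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = func(*args)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result
```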
-
popen for Matlab
- source code for Mex extensions that allow access to the Unix popen()
function to create processes that provide or accept long streams of
data a chunk at a time. The neat version of mp3write uses this, but
it's only available on Unix (Linux, Mac OS X, etc.).
-
Dynamic Time Warp - a simple implementation of
dynamic programming that aligns the STFTs of two 'similar' sound examples and then uses the phase vocoder to warp the timebase of one to match the other.
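The alignment step can be sketched as classic DTW over a local cost matrix. In practice, `cost[i][j]` would be something like the cosine distance between STFT magnitude columns. The symmetric three-step pattern with equal weights is an assumption. The resulting path maps each frame of one sound onto the other and could drive a time-varying stretch.

```python
def dtw(cost):
    """cost[i][j]: local distance between frame i of A and frame j of B.
    Returns (total cost, path) where path lists aligned (i, j) frame pairs."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # D[i][j] = cheapest cost of aligning A[:i] with B[:j]; row/col 0 are padding
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = cost[i - 1][j - 1] + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # trace back from the end, always stepping to the cheapest predecessor
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    return D[n][m], path[::-1]
```

For example, aligning A = [1, 2, 3] with B = [1, 1, 2, 3] under absolute-difference cost gives total cost 0, with the first frame of A stretched over the first two frames of B.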
-
Phase vocoder - an implementation of the
popular computer music algorithm for arbitrarily altering the time base
of a sound without changing its short-time spectral character.
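The key step of a phase vocoder time stretch is propagating phase per bin. The approach is to estimate each bin's true frequency from the frame-to-frame phase advance at the analysis hop, then advance the output phase by that frequency times the synthesis hop. The sketch covers only that step: windowing, FFT and overlap-add are omitted, and the page's implementation may differ in detail.

```python
import math

def princarg(phi):
    """Wrap a phase to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def pv_phase_step(psi_prev, phi_prev, phi_cur, k, n_fft, hop_a, hop_s):
    """Next synthesis phase for bin k. phi_*: analysis phases hop_a samples apart;
    psi_prev: previous synthesis phase. Stretch factor is hop_s / hop_a."""
    omega_k = 2.0 * math.pi * k / n_fft                    # bin centre (rad/sample)
    dev = princarg(phi_cur - phi_prev - omega_k * hop_a)   # deviation from expected advance
    inst_freq = omega_k + dev / hop_a                      # refined frequency estimate
    return psi_prev + inst_freq * hop_s                    # advance at the synthesis hop
```

For a sinusoid lying a quarter bin above bin 10 (N = 512, analysis hop 128), the recovered frequency is exact. The output phase therefore advances by that frequency times the 256-sample synthesis hop, which doubles the duration without changing the pitch.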
-
SOLAFS - an implementation of the popular
speech processing algorithm for changing the timescale of speech by deleting
or duplicating entire pitch cycles.
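What makes pitch cycles get deleted or duplicated cleanly is the synchronization search. Before each new segment is overlap-added, its read position in the input is nudged to the shift that best matches the tail of the output so far, measured by normalized cross-correlation. The minimal sketch below covers only that search, in the spirit of SOLAFS (synchronized overlap-add with a fixed synthesis hop). The window length and search range are assumptions.

```python
import math

def best_lag(tail, x, start, max_shift):
    """Shift s in [-max_shift, max_shift] for which x[start+s : start+s+len(tail)]
    best matches `tail` (the end of the output so far), by normalized correlation."""
    L = len(tail)
    e_tail = sum(v * v for v in tail)
    best_s, best_r = 0, -math.inf
    for s in range(-max_shift, max_shift + 1):
        a = start + s
        if a < 0 or a + L > len(x):
            continue                              # candidate falls outside the input
        seg = x[a:a + L]
        num = sum(p * q for p, q in zip(tail, seg))
        den = math.sqrt(e_tail * sum(q * q for q in seg)) or 1.0
        if num / den > best_r:
            best_s, best_r = s, num / den
    return best_s
```

For a periodic input, the chosen shift lines the new segment up in phase with the previous one, so the crossfade joins whole cycles.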
-
Sinewave Speech Analysis/Synthesis - code to
resynthesize sinewave speech samples from the example parameter files made available at the Haskins site.
2001-03-12 Update: Sinewave parameter analysis, based on
simple LPC pole fitting, is now available. (Pure LPC analysis/synthesis
is included as a bonus!)
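Resynthesis from sinewave parameters amounts to summing a few sinusoids whose frequencies and amplitudes change from frame to frame. The sketch assumes a simple layout of one frequency (Hz) and one amplitude per track per frame, held constant within each frame. This layout is an assumption, not the Haskins file format. Each track's phase is accumulated sample by sample, so it stays continuous when the frequency changes.

```python
import math

def sinewave_synth(freqs, amps, frame_len, sr):
    """freqs[f][i], amps[f][i]: frequency (Hz) and amplitude of track i in frame f."""
    n_tracks = len(freqs[0])
    phase = [0.0] * n_tracks
    out = []
    for f_frame, a_frame in zip(freqs, amps):
        for _ in range(frame_len):
            s = 0.0
            for i in range(n_tracks):
                phase[i] += 2.0 * math.pi * f_frame[i] / sr   # running phase per track
                s += a_frame[i] * math.sin(phase[i])
            out.append(s)
    return out
```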
-
Spectral warping of LPC models - a warping transformation applied to an LPC-extracted vocal-tract resonance model can change the apparent 'size' of the speaker.
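One common way to realize such a warp is the first-order all-pass substitution, which maps a normalized frequency ω to ω + 2 atan(α sin ω / (1 − α cos ω)). It keeps 0 and π fixed and shifts everything in between. Whether the page's code uses exactly this mapping is an assumption. The sketch applies it to resonance (for example, LPC pole-angle) frequencies.

```python
import math

def warp_freq(omega, alpha):
    """First-order all-pass frequency warp for omega in [0, pi], |alpha| < 1.
    alpha > 0 moves resonances up (a 'smaller' speaker), alpha < 0 moves them down."""
    return omega + 2.0 * math.atan(alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega)))

def warp_formant_hz(f_hz, sr, alpha):
    """Warp a resonance frequency given in Hz at sample rate sr."""
    omega = 2.0 * math.pi * f_hz / sr
    return warp_freq(omega, alpha) * sr / (2.0 * math.pi)
```

Warping with α and then with −α returns the original frequency, so the transformation can be undone exactly.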
-
Plucked String Synthesis - a simple example of the digital waveguide synthesis of musical instruments developed at Stanford's CCRMA.
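The simplest member of this family is the Karplus-Strong plucked string. A burst of noise circulates in a delay line one pitch period long, and the feedback path averages adjacent samples (a gentle low-pass) and scales them by a decay factor. The page's code may model the string differently, and the `decay` value and random pluck are assumptions.

```python
import random

def karplus_strong(f0, sr, n_samples, decay=0.996, seed=0):
    """Plucked-string tone from a noise-filled delay line with averaging feedback.
    The two-point average adds half a sample of delay, so pitch is slightly below f0."""
    period = max(2, round(sr / f0))                          # delay-line length in samples
    rng = random.Random(seed)
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]   # the initial "pluck"
    out = []
    for i in range(n_samples):
        j = i % period
        s = line[j]
        out.append(s)
        line[j] = decay * 0.5 * (s + line[(j + 1) % period])  # low-pass and decay
    return out
```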
-
Sinewave (Harmonic) Modeling
- a simple implementation of sinusoid modeling based on picking
peaks in the short-time Fourier transform magnitude. (Also known
as harmonic modeling or McAulay-Quatieri modeling). Includes
some provision for LPC modeling of the noisy residual, along the lines
of Harmonic+Noise modeling, or Serra's
Spectral Modeling Synthesis (SMS).
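The peak-picking stage can be sketched by taking local maxima of a magnitude spectrum and refining each one with a parabola through the peak bin and its two neighbours. This gives a fractional-bin frequency and an interpolated height, and is usually applied to dB magnitudes. The threshold is an illustrative parameter. Tracking peaks from frame to frame, as in McAulay-Quatieri, is not shown.

```python
def pick_peaks(mag, threshold=0.0):
    """Return [(fractional_bin, interpolated_height), ...] for local maxima above threshold."""
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > threshold and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            a, b, c = mag[k - 1], mag[k], mag[k + 1]
            denom = a - 2.0 * b + c
            p = 0.5 * (a - c) / denom if denom != 0 else 0.0   # vertex offset in bins
            peaks.append((k + p, b - 0.25 * (a - c) * p))
    return peaks
```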
-
Time-frequency automatic gain control
- takes an audio waveform, and adjusts its gain (in time and frequency) to approach a constant energy level.
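A simplified illustration of time-frequency gain control follows. It is not the package's algorithm. Each band's magnitude is divided by the square root of a local energy estimate that is averaged over neighbouring bands and smoothed over time with a one-pole filter. The array layout `mags[t][b]` and the smoothing constants are assumptions.

```python
import math

def tf_agc(mags, t_smooth=0.9, f_radius=1, eps=1e-12):
    """mags[t][b]: non-negative magnitudes per frame t and band b.
    Returns magnitudes normalized toward unit local energy."""
    n_bands = len(mags[0])
    env = None
    out = []
    for frame in mags:
        energy = [m * m for m in frame]
        local = []
        for b in range(n_bands):
            nb = energy[max(0, b - f_radius):b + f_radius + 1]   # neighbouring bands
            local.append(sum(nb) / len(nb))
        if env is None:
            env = local                                      # start from the first frame
        else:
            env = [t_smooth * e + (1.0 - t_smooth) * l for e, l in zip(env, local)]
        out.append([m / math.sqrt(e + eps) for m, e in zip(frame, env)])
    return out
```

After a sudden tenfold increase in level, the output first jumps and then settles back toward unity as the smoothed energy estimate catches up.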
See also local copies of code I have submitted to
the Matlab File Exchange.
Acknowledgment
This material is based in part upon work supported by the National
Science Foundation under Grant No. IIS-0238301. Any opinions, findings
and conclusions or recommendations expressed in this material are those
of the author(s) and do not necessarily reflect the views of the
National Science Foundation (NSF).
Last updated: 2006/11/21 14:51:30
Dan Ellis