
Matlab Audio Processing Examples

Introduction

This area contains several little pieces of Matlab code that might be fun or useful to play with.

Robust landmark-based audio fingerprinting - This is my implementation of the music audio fingerprinting scheme invented by Avery Wang for Shazam. It is able to match short and noisy excerpts of music against a reference database. The database part is a bit vestigial in Matlab, but the landmark hashing works pretty well.

Compiled audio fingerprint database creation + query - To make it easier to use from outside Matlab (and for people without Matlab licenses), I redid my fingerprint code as a compiled Matlab binary, available here (for Mac and Linux).

Fingerprint-based Label Alignment to Beatles Audio - This is a special pre-computed fingerprint database plus associated routines that can be used to automatically modify the timings of the Beatles annotations from isophonics.net to match the local versions of the relevant Beatles audio (which may have different time shifts and even speeds due to differences between the different digital masterings of these albums).

Aligning MIDI scores to music audio - It can be very useful to take a MIDI piece that closely matches the notes in a real performance, then make a precise temporal alignment between the two; this allows the discrete MIDI description to be treated as an approximate transcript of the audio. This code performs this alignment, based on the approach of constructing an approximate, schematic version of the expected spectrogram from the MIDI notes.

Chroma Features Analysis and Synthesis - Chroma features capture the melodic/harmonic signature of spectra, and form a nice musical complement to more common spectral features such as MFCCs. This package includes a few different ways to calculate them, as well as a resynthesis routine that modulates "Shepard tones" to resynthesize audio with the chromatic content defined by the features.

Beat Tracking and Music Matching - an easy-to-use beat tracker in Matlab, plus code for describing music audio as per-beat chroma features, which is an effective representation for matching songs that have the same melodic/harmonic content despite changes in tempo or instrumentation.

Beat Tracking by Dynamic Programming - a simplified version of my beat tracker that I put together for my Music Signal Processing class.

Constant-Q (Log-Frequency) Spectrogram - an equivalent for Matlab's short-time Fourier transform calculation/display routine specgram() that instead uses a log-frequency axis, so that an octave (doubling in frequency) corresponds to a constant number of bins, regardless of absolute frequency. This projection is particularly useful in music processing, since musical transposition then corresponds to a simple translation along the frequency axis. Also includes a drop-in replacement for specgram() (for people without the signal processing toolbox).
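To make the log-frequency idea concrete, here is a small Python sketch (not the Matlab code linked above). The function names, the 55 Hz lowest bin, and the choice of 12 bins per octave are assumptions made only for illustration.

```python
import math

def log_freq_bins(f_min, bins_per_octave, n_bins):
    """Center frequencies (Hz) of a log-spaced axis starting at f_min."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def bin_index(freq, f_min, bins_per_octave):
    """Nearest bin index for a frequency on that axis."""
    return round(bins_per_octave * math.log2(freq / f_min))

# Hypothetical axis: 12 bins per octave, lowest bin at 55 Hz.
freqs = log_freq_bins(f_min=55.0, bins_per_octave=12, n_bins=49)

# Successive octaves are always the same number of bins apart.
a2 = bin_index(110.0, 55.0, 12)
a3 = bin_index(220.0, 55.0, 12)
a4 = bin_index(440.0, 55.0, 12)

# A major triad, then the same triad transposed up 5 semitones:
# every component moves by the same number of bins.
chord = [110.0 * 2.0 ** (s / 12) for s in (0, 4, 7)]
transposed = [f * 2.0 ** (5 / 12) for f in chord]
chord_bins = [bin_index(f, 55.0, 12) for f in chord]
transposed_bins = [bin_index(f, 55.0, 12) for f in transposed]
shifts = [t - c for c, t in zip(chord_bins, transposed_bins)]
```

On a linear-frequency axis the same transposition would stretch the spacing between components instead of shifting them all by one offset.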

Gammatone (auditory) Spectrogram - another specgram()-like function, this time for calculating time-frequency surfaces based on the gammatone approximations to auditory filters.

Time-domain audio scrambling - removes some of the identifiability of audio signals by shuffling overlapping time windows.

RASTA/PLP/MFCC feature calculation and inversion - a Matlab implementation of popular speech recognition feature extraction including MFCC and PLP (as defined by Hermansky and Morgan), as well as code to map features back to (noise-excited) audio. Includes a page on Reproducing the feature outputs of common programs.

mp3read and mp3write - a wrapper to read MPEG-Audio layer III (MP3) files into Matlab that behaves like wavread(), and another one to write MP3 files that behaves like wavwrite().

m4aread - a wrapper to use "faad" to read MPEG4 Audio (AAC / M4A) files into Matlab, just like wavread() and mp3read() above.

audioread - a wrapper to sit above wavread, mp3read, m4aread, flacread, etc. to allow format-independent reading of audio files.

cache_results - a general function to save the results of a function to a disk file, then re-read it if the same computation is requested in future.
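The general pattern can be sketched in a few lines of Python. This is only an illustration of disk caching keyed on the function name and arguments; the signature, the hashing scheme, and the file naming are assumptions and are not taken from the Matlab function.

```python
import hashlib
import os
import pickle
import tempfile

def cache_results(func, args, cache_dir):
    """Return func(*args), reusing a result saved on disk when one exists."""
    # Key the cache file on the function name and its arguments (assumed scheme).
    key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
    path = os.path.join(cache_dir, key + ".pkl")
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = func(*args)
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def slow_square(x):
    """Stand-in for an expensive computation; records each real call."""
    calls.append(x)
    return x * x

cache_dir = tempfile.mkdtemp()
first = cache_results(slow_square, (12,), cache_dir)   # computed and saved
second = cache_results(slow_square, (12,), cache_dir)  # re-read from disk
```

A sketch like this does not notice when the function body itself changes, so stale results have to be cleared by hand.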

popen for Matlab - source code for Mex extensions that allow access to the Unix popen() function to create processes that provide or accept long streams of data incrementally. The neat version of mp3write uses this, but it's only available on Unix (Linux, Mac OS X, etc.).

Dynamic Time Warp - A simple implementation of dynamic programming to align the STFTs of two 'similar' sound examples, then use the Phase Vocoder to warp the timebase of one to match the other.
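The dynamic programming step can be sketched as follows in Python. This is a minimal illustration on short sequences of numbers, not the linked Matlab code: the function name, the three allowed step types, and the absolute-difference local cost are assumptions, and the STFT comparison and phase vocoder stages are left out.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum-cost alignment between sequences a and b.

    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cumulative cost of aligning a[:i+1] with b[:j+1].
    cost = [[inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(a[i], b[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Allowed predecessors: diagonal, advance in a only, advance in b only.
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            best, prev = min(candidates)
            cost[i][j] = d + best
            back[i][j] = prev
    # Trace the best path back from the final cell to (0, 0).
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return cost[n - 1][m - 1], path

# The second sequence lingers on the value 1 for an extra frame.
total, path = dtw_path([0, 1, 2, 3], [0, 1, 1, 2, 3])
```

The returned path says which frame of one sequence lines up with which frame of the other; here frame 1 of the first sequence is matched to both frames 1 and 2 of the second.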

Phase vocoder - an implementation of the popular computer music algorithm for arbitrarily altering the time base of a sound without changing its short-time spectral character.

SOLAFS - an implementation of the popular speech processing algorithm for changing the timescale of speech by deleting or duplicating entire pitch cycles.

Sinewave Speech Analysis/Synthesis - code to resynthesize sinewave speech samples from the example parameter files made available at the Haskins site.
2001-03-12 Update: Sinewave parameter analysis, based on simple LPC pole fitting, is now available! (Pure LPC analysis/synthesis is included as a bonus!)

Spectral warping of LPC models - a warping transformation applied to an LPC-extracted vocal tract resonance model can change the apparent 'size' of the speaker.

Plucked String Synthesis - a simple example of the digital waveguide synthesis of musical instruments developed at Stanford's CCRMA.
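For a feel of how little is needed, here is a Python sketch of the Karplus-Strong plucked-string algorithm, which is commonly presented as the simplest case of this family of models. Whether the linked Matlab code uses exactly this structure is an assumption, as are the function name, the decay value, and the noise initialisation.

```python
import random

def pluck(frequency, sample_rate, duration, decay=0.996, seed=0):
    """Karplus-Strong style plucked string, returned as a list of samples."""
    rng = random.Random(seed)
    # The delay-line length sets the pitch: one period of the target frequency.
    period = int(round(sample_rate / frequency))
    # Filling the delay line with noise plays the role of the pluck.
    line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    n_samples = int(sample_rate * duration)
    for n in range(n_samples):
        idx = n % period
        current = line[idx]
        following = line[(idx + 1) % period]
        out.append(current)
        # Averaging two neighbours is a gentle lowpass filter, and
        # decay < 1 damps the string a little on every pass.
        line[idx] = decay * 0.5 * (current + following)
    return out

# Hypothetical settings: a 220 Hz note, half a second at 8 kHz.
tone = pluck(frequency=220.0, sample_rate=8000, duration=0.5)
period = int(round(8000 / 220.0))
early_energy = sum(s * s for s in tone[:period])
late_energy = sum(s * s for s in tone[-period:])
```

The high frequencies die away fastest because the averaging filter is applied once per trip around the delay line, which is what gives the characteristic plucked sound.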

Sinewave (Harmonic) Modeling - a simple implementation of sinusoid modeling based on picking peaks in the short-time Fourier transform magnitude. (Also known as harmonic modeling or McAulay-Quatieri modeling). Includes some provision for LPC modeling of noisy residual, along the lines of Harmonic+Noise modeling, or Serra's Spectral Modeling Synthesis (SMS).

Time-frequency automatic gain control - takes an audio waveform, and adjusts its gain (in time and frequency) to approach a constant energy level.

See also local copies of code I have submitted to the Matlab File Exchange.

Acknowledgment

This material is based in part upon work supported by the National Science Foundation under Grant No. IIS-0238301. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Last updated: 2006/11/21 14:51:30

Dan Ellis <[email protected]>