About the class projects
The class will include a programming project. Your choice of project
should be agreed with me; the easiest way is probably via email.
A list of ideas is given below. The use of Matlab is encouraged, though
other languages or systems can be used by prior arrangement.
Projects can be done individually, or by teams of two students.
I encourage you to consider teaming up, primarily because it will
allow you to do a more interesting project. I will naturally be
judging two-person projects differently from individual projects,
but I won't be applying a tougher standard - just looking for a
more extensive investigation. The best arrangement is to choose a
division of the project so that each of you can work on
separate but interlocking parts.
Learning teamwork is also one of the more general goals of an engineering
education, so team projects will pick up points for demonstrating a
successful ability to work with others.
The projects will be graded based on a project report
(of around 5 pages)
as well as an optional in-class presentation. If you do a presentation,
I will accept the presentation slidepack as part of the report, although
I would still want a few pages of narration to explain the slides. I do
encourage you to make in-class presentations - it's particularly fun
if you have demonstrations such as sound examples. However, we won't have
time for everyone to make presentations, so please let me know as soon
as possible if you would like to present.
You can see what some previous students have done in my collection of
example projects from previous years.
The project reports are due one week after the final class, i.e., on
Monday, Dec 17, 2012. Electronic submission is encouraged, but please use
PDF format and not Word .DOC files if at all possible, since I often have
formatting problems with Word files. Reports in the format of web pages
to allow multimedia inclusions are also welcome. Print-outs of reports
can be left in my mailbox in the department office in 1300 Mudd.
Some project suggestions
These projects reflect my personal bias towards audio, but projects
concerned with images, or any other kind of signal processing, are also welcome.
- DTMF decoder - convert a recording of 'touch tones'
from a real telephone into the corresponding set of digits.
There is a set of example sounds for you to
work with on the sound examples page.
Mitra's book includes some simple code for
DTMF detection (at the start of chapter 11), but this
doesn't deal with the case of multiple tones in a single
soundfile, or necessarily handle the varying signal
quality and spurious sounds (such as dial tone) in some
of the examples.
- Channel equalization. This is a classic signal processing
problem, where the signal has been subject to some unknown filter,
and the goal is to infer what that filter was, then invert its
effects. The sound examples page
contains a set of speech examples subject to several different
filters for you to work with. In those cases, you also
have access to the 'clean' signal before it was filtered,
so it should be possible to do a pretty good job; however,
you can look for algorithms that don't rely on the
clean reference. The basic idea would be to assume you
know what the average spectrum of a speech signal would
look like (since it should be more or less constant, on
average), look at the actual spectrum of your corrupted
example, then design a filter to make the example's spectrum
look more like the ideal average.
- Signal denoising.
Signals get corrupted by noise in all kinds of ways - by
electrical interference, by mechanical damage of recording
media, or simply because there were unwanted sounds present
during the original recording. On the
sound examples page there are
a number of speech examples that have been artificially
corrupted by additive noise or by reverberation. Some
approaches to noise reduction involve estimating the
steady-state noise spectrum (by looking during 'gaps' in
the speech), then designing a Wiener filter (see a
signals and systems text) to optimize the signal-to-noise
ratio. More advanced techniques can use the noise floor
estimate to apply dynamic time-frequency gain that attempts
to boost signal while cutting noise. The 'noise gate' found
in a recording studio is one example; a more sophisticated
frequency-dependent approach is the widely-used technique
of spectral subtraction, which is described in the book
Discrete-time processing of speech signals by
Deller, Proakis and Hansen (Macmillan, 1993).
- Speech endpointing
- find the beginning and end of each speech phrase or utterance
in a recording (which may include background noise).
The speech examples with added noise on the
sound examples page can be used here,
as well as the speech-over-music examples. The basic idea is
to design a filter that will do the best job of distinguishing
speech energy from the nonspeech noise, then
somehow set a threshold to mark the beginning and end
of the speech where the filtered energy exceeds that threshold.
The following recent paper, chosen somewhat arbitrarily,
describes a more sophisticated approach as well as
including references to more classic papers:
"Robust Entropy-based Endpoint Detection for Speech Recognition
in Noisy Environments," Shen, Hung and Lee, Proc. Int. Conf. on
Spoken Lang. Processing, Sydney, 1998.
- Speech/music discrimination
- classify example fragments as speech, music, or some other class.
There are a few examples on the
sound examples page taken from a larger
database. This paper (written by the people
who collected the database) gives a nice explanation of the
problem, and although some of their methods are quite
involved, others are quite straightforward. They also
give some good references:
"Construction and evaluation of a robust multifeature
speech/music discriminator," Scheirer & Slaney, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Munich, 1997.
- Pitch extraction. This is a widespread problem in
speech and music processing: identifying the local periodicity
of a sound with a perceived pitch. Autocorrelation is the
classic method, but it is prone to characteristic errors (octave errors in particular), so there
are many approaches to improving it. You can use any of the
speech or music examples on the
sound examples page
to work with. Several of the older algorithms are described in
chapter 4 of Digital processing of speech signals by
Rabiner and Schafer (Prentice-Hall, 1978). A more recent
approach that I like, although it's aimed at a more
difficult multi-signal situation, is described in:
"Multi-pitch and periodicity analysis model for sound separation
and auditory scene analysis,"
Karjalainen and Tolonen, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Phoenix, 1999.
- Modeling musical instruments. It turns out that many
musical instruments can be modeled with surprisingly good
quality by relatively simple signal processing networks.
One very fruitful approach has been the "physical modeling
synthesis" developed by Julius Smith and others at
Stanford's CCRMA. Here's a recent article by Julius Smith:
Physical Modeling Synthesis Update. You can follow
links from there to lots of other information on his site.
It's quite easy to implement some of these models,
and they can sound surprisingly realistic.
- Timescale modification. In the very first lecture I
played an example of speech that had been slowed down
without lowering the pitch. There are numerous
approaches to this; the following paper describes a
fairly sophisticated recent one, but includes references
to some simpler versions:
"MACH1: Nonuniform time-scale modification of speech,"
Covell, Withgott and Slaney, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Seattle, 1998.
- Sound visualization. We've seen the spectrogram as an
example of rendering a sound as an image. However, there are
very many parameters to vary, with pros and cons to each variation.
This project would choose a particular goal, say a clear display
of a certain kind of sound in a variety of backgrounds, then
investigate the best possible processing to facilitate that
display. There are some interesting ideas in the following paper:
"Audio Information Retrieval (AIR) Tools,"
Tzanetakis and Cook, Proc. Int. Symp. on Music Info. Retrieval
(MUSIC IR 2000), Plymouth, October 2000.
- Steganography/watermarking. There are various motivations
for 'hiding' data in a soundfile without making the alteration
audible. One is watermarking, so that a sound can be
recognized as 'valid' without knowing its content ahead of time.
Another is embedding copyright markers that cannot easily
be removed by counterfeiters. This project will investigate
some mechanisms for encoding data in sound, and examine the
limits of how much data can be included, what degradation
to sound quality is entailed, and how hard it is to remove,
or to simulate, the marking.
- Artificial reverberation. I mentioned that one use of
allpass filters is in the simulation of room reverberation.
This project will build a simulated room reverberator and investigate
several enhancements to increase the realism. There are several
good papers to start from.
- Compression. Audio signal compression is a current
hot topic. This project would involve implementing one or more
simple compression schemes, and investigating how they perform
for different kinds of signals, as well as the kinds of distortion
they introduce. One of the classic, high-efficiency
algorithms is ADPCM, described in many places including
chapter 5 of Digital processing of speech signals by
Rabiner and Schafer (Prentice-Hall, 1978). A more modern
lossless compression scheme is the "shorten" program
by Tony Robinson. You can read his technical report on
this page (or see this
A more general overview, including details
of both ADPCM and MPEG-Audio compression, is this paper:
"Digital Audio Compression", Davis Pan, Digital Technical
Journal Vol. 5 No. 2, Spring 1993.
- Time-delay angle-of-arrival estimation. We use our
two ears to detect the direction from which a sound
arrives. The strongest cue is probably the slight difference
in arrival time at the two ears, which arises because sound
travels at a finite speed. Cross-correlation can reveal this
time difference, and hence indicate the azimuth from which
a sound arrives. I have access to several different kinds of
simulated binaural (stereo) recordings; you can experiment
with algorithms to estimate the time difference between the
channels and hence the direction of the source sound.
- Doubletalk detection. In speech processing we frequently
assume that the signal contains just a single voice; in many
cases this is not true, for instance when people interrupt
one another on a telephone call or in a meeting. Separating
these voices is hard, but we would like at least to be able
to detect when it is happening, so we know not to attempt
normal processing. In this project, a set of examples containing
a certain amount of speaker overlap will be provided, along with
some papers describing different possible approaches to detection.
- Synthesizing 3D sound. Since we have some understanding
of how the ear uses binaural (stereo) cues to infer the direction
of different sound sources, we should be able to construct
artificial sounds including those cues that will appear to
come from particular directions. This project will implement
and evaluate some algorithms proposed for this effect.
One source for this is the paper
A Structural Model for Binaural Sound Synthesis by Brown & Duda,
IEEE Tr. Speech & Audio, Sep 1998.
- Cross-synthesis. An interesting effect in electronic
music synthesis is to somehow 'combine' two sounds into a single
sound that appears to have properties of both sources. This
project will implement some variants of how this can be done.
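For the DTMF decoder suggestion above, one common building block is the Goertzel algorithm, which computes the power of a single DFT bin with a simple second-order recursion, so it is cheap to evaluate at just the eight DTMF frequencies. The Python sketch below (function names are hypothetical) classifies one frame that is assumed to contain exactly one digit. It has no energy threshold, no check on relative row/column levels and no segmentation, so by itself it does not handle several digits per file or dial tone.

```python
import math

# Standard DTMF low-group (row) and high-group (column) frequencies in Hz.
ROW_FREQS = [697, 770, 852, 941]
COL_FREQS = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]


def goertzel_power(frame, freq, fs):
    """Squared magnitude of the spectrum of `frame` at `freq` Hz (Goertzel recursion)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def decode_frame(frame, fs):
    """Return the key whose row tone and column tone are strongest in this frame."""
    row = max(range(4), key=lambda i: goertzel_power(frame, ROW_FREQS[i], fs))
    col = max(range(4), key=lambda j: goertzel_power(frame, COL_FREQS[j], fs))
    return KEYPAD[row][col]
```

For instance, a 50 ms frame at 8 kHz containing 770 Hz plus 1336 Hz should come back as "5". A full decoder would run this on short successive frames and require a digit to persist for several frames before reporting it.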
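The channel equalization recipe above (measure the average spectrum of the filtered speech, compare it with an assumed average speech spectrum, and correct the difference) might be sketched as follows. This assumes non-overlapping rectangular frames and uses a direct DFT only to stay dependency-free; the target spectrum is simply passed in (for example, averaged from clean speech), and the cap `max_gain` is an assumed safeguard against boosting bins where the unknown filter has a deep notch.

```python
import cmath
import math


def average_power_spectrum(x, n_fft):
    """Long-term average power spectrum: mean |DFT|^2 over non-overlapping
    rectangular frames of length n_fft (bins 0 .. n_fft/2).
    A direct O(n_fft^2) DFT keeps this dependency-free; use an FFT in practice."""
    n_frames = len(x) // n_fft
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    spectrum = [0.0] * (n_fft // 2 + 1)
    for f in range(n_frames):
        frame = x[f * n_fft:(f + 1) * n_fft]
        for k in range(len(spectrum)):
            bin_value = sum(v * cmath.exp(-2j * math.pi * k * n / n_fft)
                            for n, v in enumerate(frame))
            spectrum[k] += abs(bin_value) ** 2 / n_frames
    return spectrum


def equalizer_gains(observed, target, eps=1e-12, max_gain=10.0):
    """Per-bin magnitude gains that would move the observed average power
    spectrum toward the target one, capped at max_gain."""
    return [min(math.sqrt((t + eps) / (o + eps)), max_gain)
            for o, t in zip(observed, target)]
```

The resulting per-bin gains could then serve as the desired magnitude response of an FIR equalizer, for example designed by frequency sampling.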
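For the signal denoising suggestion, the core of power spectral subtraction is a per-bin gain computed from the noisy frame's power spectrum and a noise estimate taken from gaps in the speech. The sketch below computes only those gains; the framing, windowing, FFT and overlap-add around it are left out, and the over-subtraction factor `alpha` and the gain `floor` are assumed tuning parameters rather than recommended values.

```python
import math


def spectral_subtraction_gains(noisy_power, noise_power, alpha=1.0, floor=0.05):
    """Per-bin magnitude gains for power spectral subtraction.

    noisy_power: |X(k)|^2 for one frame of the noisy signal
    noise_power: estimated noise power per bin, e.g. averaged over speech gaps
    alpha:       over-subtraction factor (values > 1 subtract more aggressively)
    floor:       minimum gain, which limits 'musical noise' artifacts
    """
    gains = []
    for px, pn in zip(noisy_power, noise_power):
        if px <= 0.0:
            gains.append(floor)
            continue
        # Estimated clean power is px - alpha * pn; as a fraction of px:
        clean_fraction = max(1.0 - alpha * pn / px, 0.0)
        # Magnitude gain is the square root of the power ratio.
        gains.append(max(math.sqrt(clean_fraction), floor))
    return gains
```

Each noisy STFT frame would be multiplied by these gains (keeping the noisy phase) and resynthesized. Using `clean_fraction` directly instead of its square root gives a Wiener-style gain, which attenuates low-SNR bins more strongly.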
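A minimal version of the speech endpointing idea: compute the energy of short frames, estimate the noise floor, and mark the first and last frames that rise a fixed margin above it. Here the input is used unfiltered (the suggestion above is to filter first, which would simply be applied to `x` beforehand), the noise floor is assumed to be the quietest frame, and the 10 dB margin is an arbitrary choice.

```python
import math


def frame_energies_db(x, frame_len):
    """Mean-square energy in dB of each non-overlapping frame."""
    energies = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        mean_square = sum(v * v for v in frame) / frame_len
        energies.append(10.0 * math.log10(mean_square + 1e-12))
    return energies


def find_endpoints(x, frame_len, margin_db=10.0):
    """Return (first, last) frame indices whose energy exceeds the estimated
    noise floor by margin_db, or None if no frame does."""
    energies = frame_energies_db(x, frame_len)
    noise_floor = min(energies)  # assumption: the quietest frame is nonspeech
    active = [i for i, e in enumerate(energies) if e > noise_floor + margin_db]
    if not active:
        return None
    return active[0], active[-1]
```

Real endpointers usually smooth this decision, for example by requiring several consecutive active frames and extending the boundaries slightly, so that short clicks are ignored and weak word onsets are not cut off.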
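For speech/music discrimination, one of the simplest frame-level features is the zero-crossing rate. Speech alternates between voiced and unvoiced sounds, so its zero-crossing rate tends to vary more from frame to frame than that of most music, which makes the variance over a second or so of frames a plausible discriminating feature. The sketch below only computes the feature; the frame length and any decision threshold are assumptions left to the project.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def zcr_variance(x, frame_len):
    """Variance of the per-frame zero-crossing rate over non-overlapping frames."""
    rates = [zero_crossing_rate(x[i:i + frame_len])
             for i in range(0, len(x) - frame_len + 1, frame_len)]
    mean = sum(rates) / len(rates)
    return sum((r - mean) ** 2 for r in rates) / len(rates)
```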
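The classic autocorrelation pitch estimator mentioned under pitch extraction can be sketched in a few lines: search the autocorrelation over the lags that correspond to plausible pitches and take the strongest. The 60 to 500 Hz search range is an assumption, and the sketch makes no voicing decision and applies none of the usual refinements (such as center clipping), so it will show exactly the kinds of errors the project would try to fix.

```python
def autocorr_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate pitch (Hz) as fs divided by the lag of the largest
    autocorrelation value within the lag range [fs/fmax, fs/fmin]."""
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), len(frame) - 1)
    if min_lag > max_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized (biased) autocorrelation: fewer terms at longer lags,
        # which mildly favors the true period over its multiples.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

For example, a pulse train with a 40-sample period at 8 kHz gives 200.0 Hz.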
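For the musical instrument modeling suggestion, a very small starting point is the Karplus-Strong plucked string: a delay line filled with noise, fed back through a two-point averaging filter. It predates, and is much simpler than, the waveguide models in Smith's article, though it is often described as a rudimentary waveguide string. The decay factor and the rounding of the delay length below are simplifying assumptions; the rounding means the pitch is only approximately `freq`.

```python
import random


def karplus_strong(freq, fs, n_samples, decay=0.996, seed=0):
    """Plucked-string tone via the Karplus-Strong recursion
    y[n] = decay * (y[n-N] + y[n-N-1]) / 2, started from a burst of noise."""
    # The loop delay is N + 0.5 samples (delay line plus the averaging
    # filter's half-sample delay), so N is rounded; this limits tuning accuracy.
    n_delay = max(1, int(round(fs / freq - 0.5)))
    rng = random.Random(seed)
    y = [rng.uniform(-1.0, 1.0) for _ in range(n_delay + 1)]  # the "pluck"
    while len(y) < n_samples:
        n = len(y)
        y.append(decay * 0.5 * (y[n - n_delay] + y[n - n_delay - 1]))
    return y[:n_samples]
```

At 8 kHz, `karplus_strong(220.0, 8000, 16000)` gives two seconds of a decaying string-like tone; scaling it and writing it to a sound file is the easiest way to judge it.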
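For the artificial reverberation project, the classic Schroeder arrangement feeds the input through several feedback comb filters in parallel and then through allpass sections in series. A minimal sketch follows; the particular delays (in samples) and gains are illustrative guesses, not tuned values, and none of the realism enhancements the project would investigate are included.

```python
def comb(x, delay, g):
    """Feedback comb filter y[n] = x[n-D] + g*y[n-D]: a decaying train of echoes."""
    y = []
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(x_d + g * y_d)
    return y


def allpass(x, delay, g):
    """Schroeder allpass section y[n] = -g*x[n] + x[n-D] + g*y[n-D];
    its magnitude response is flat, but it smears the signal in time."""
    y = []
    for n in range(len(x)):
        x_d = x[n - delay] if n >= delay else 0.0
        y_d = y[n - delay] if n >= delay else 0.0
        y.append(-g * x[n] + x_d + g * y_d)
    return y


def schroeder_reverb(x, comb_delays=(1031, 1123, 1201, 1277), comb_g=0.8,
                     allpass_delays=(347, 113), allpass_g=0.7):
    """Parallel feedback combs, averaged, followed by series allpasses.
    The default delays and gains are illustrative values only."""
    outputs = [comb(x, d, comb_g) for d in comb_delays]
    wet = [sum(samples) / len(outputs) for samples in zip(*outputs)]
    for d in allpass_delays:
        wet = allpass(wet, d, allpass_g)
    return wet
```

Because each allpass section has a flat magnitude response, a quick sanity check is that the energy of its impulse response sums to 1.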
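For the steganography/watermarking suggestion, the simplest baseline is least-significant-bit (LSB) embedding: overwrite the lowest bit of each integer PCM sample with one message bit. Each sample changes by at most one quantization step, so at 16 bits the marking is unlikely to be audible, but it is also trivially destroyed by added noise, requantization or lossy compression, which is exactly the kind of trade-off the project asks about. The helpers below are hypothetical and assume the samples are Python integers.

```python
def text_to_bits(text):
    """UTF-8 bytes of `text` as a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]


def bits_to_text(bits):
    """Inverse of text_to_bits."""
    data = bytes(int("".join(str(b) for b in bits[i:i + 8]), 2)
                 for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def embed_lsb(samples, bits):
    """Overwrite the least significant bit of the first len(bits) integer
    samples (e.g. 16-bit PCM values) with the message bits."""
    if len(bits) > len(samples):
        raise ValueError("not enough samples to hold the message")
    marked = list(samples)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples, n_bits):
    """Read back the lowest bit of the first n_bits samples."""
    return [s & 1 for s in samples[:n_bits]]
```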
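For the compression suggestion, the lossless side (in the spirit of shorten) can be sketched as two steps: predict each sample from its predecessors, then Rice-code the usually small prediction residual. This is not Robinson's code or file format; the fixed second-order predictor and the fixed Rice parameter `k` are simplifying assumptions, since a real coder would choose both per block.

```python
def poly2_residual(x):
    """Residual of the fixed second-order polynomial predictor
    x_hat[n] = 2*x[n-1] - x[n-2], taking samples before the start as 0."""
    padded = [0, 0] + list(x)
    return [padded[n] - 2 * padded[n - 1] + padded[n - 2] for n in range(2, len(padded))]


def poly2_reconstruct(residual):
    """Exact inverse of poly2_residual for integer samples."""
    padded = [0, 0]
    for e in residual:
        padded.append(e + 2 * padded[-1] - padded[-2])
    return padded[2:]


def zigzag(e):
    """Map signed integers to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u):
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k):
    """Rice code: quotient in unary (ones ended by a zero), then k remainder bits."""
    bits = []
    for e in values:
        u = zigzag(e)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.extend([1] * q + [0])
        bits.extend((r >> i) & 1 for i in range(k - 1, -1, -1))
    return bits


def rice_decode(bits, k, count):
    """Decode `count` values produced by rice_encode with the same k."""
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == 1:
            q, pos = q + 1, pos + 1
        pos += 1  # skip the terminating zero
        r = 0
        for _ in range(k):
            r, pos = (r << 1) | bits[pos], pos + 1
        values.append(unzigzag((q << k) | r))
    return values
```

On a straight-line ramp such as `[3 * n for n in range(100)]`, every residual after the first two samples is zero, so with k = 0 the whole block encodes in 106 bits.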
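For time-delay angle-of-arrival estimation, a direct approach is to pick the lag that maximizes the cross-correlation between the two channels and convert it to an angle. The conversion below uses the far-field, free-field formula sin(theta) = c * tau / d for two sensors a distance d apart; that is an assumption, since a real head diffracts sound and binaural recordings would need an effective spacing or a head model instead. The brute-force lag search is chosen for clarity, not speed.

```python
import math


def estimate_delay(left, right, max_lag):
    """Lag (in samples) in [-max_lag, max_lag] maximizing the cross-correlation
    sum_n left[n] * right[n + lag]; a positive lag means the right channel lags."""
    best_lag, best = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n, v in enumerate(left):
            m = n + lag
            if 0 <= m < len(right):
                acc += v * right[m]
        if acc > best:
            best_lag, best = lag, acc
    return best_lag


def azimuth_degrees(lag, fs, spacing_m, speed_of_sound=343.0):
    """Far-field angle from straight ahead for two sensors spacing_m apart,
    from sin(theta) = c * tau / d, clipped to the valid range of asin."""
    s = speed_of_sound * (lag / fs) / spacing_m
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))
```

With this sign convention, a positive angle means the sound reached the left channel first, i.e. the source is on the left.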
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Last updated: Mon Sep 03 09:36:15 PM EDT 2012