About the class projects
The class includes a programming project. Your choice of project
should be agreed with me; the easiest way is probably via email.
Some ideas are given at the bottom of this page, but they are only
to get you thinking; your project can be on any topic that interests
you, and I encourage you to choose something that will help harness
The biggest benefit of doing a project is that it will expose you
to a broader range of the activities that come up in real engineering
applications of signal processing, instead of the narrow core we've
mostly covered in class. For this reason, the core requirement is that
you do some kind of processing of some kind of real signal -- i.e.,
I don't want a purely theoretical discussion, and I don't want
experiments on ideal signals. In some cases, it's useful to work
with synthetic signals (ones you have constructed yourself, rather
than gathered from the real world), but they should be as
"real" as possible, e.g., by adding random noise.
I strongly recommend you do your project in Matlab because it frees
you from many of the low-level details, but
other languages or systems can be used by prior arrangement.
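To illustrate the point above about making synthetic signals more realistic, here is a minimal sketch that adds white Gaussian noise to a sinusoid at a chosen signal-to-noise ratio. It is written in plain Python rather than Matlab purely for illustration; the function names, the 440 Hz tone and the 10 dB target are assumptions, not part of the course materials.

```python
import math
import random


def noisy_sinusoid(freq_hz, sample_rate, duration_s, snr_db, seed=0):
    """Return (clean, noisy): a unit-amplitude sinusoid and the same
    signal with white Gaussian noise added at roughly snr_db."""
    rng = random.Random(seed)
    n = int(round(sample_rate * duration_s))
    clean = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    signal_power = sum(x * x for x in clean) / n
    # SNR (dB) = 10 log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noisy = [x + rng.gauss(0.0, sigma) for x in clean]
    return clean, noisy


def measured_snr_db(clean, noisy):
    """SNR actually achieved, computed from the clean reference."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_energy / noise_energy)


clean, noisy = noisy_sinusoid(440.0, 8000, 1.0, snr_db=10.0)
print(round(measured_snr_db(clean, noisy), 2))
```

Sweeping `snr_db` downward is one simple way to find the point at which an algorithm starts to fail.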
Projects can be done individually, or by a small team (typically two students).
I encourage you to consider teaming up, primarily because it will
allow you to do a more interesting project. I will naturally be
judging two-person projects differently from individual projects,
but I won't be applying a tougher standard - just looking for a
more extensive investigation. The best arrangement is to choose a
division of the project so that each of you can work on
separate but interlocking parts.
Learning teamwork is also one of the goals of an engineering
education, so team projects will pick up points for demonstrating a
successful ability to work with others.
The projects will be graded based on a project report
(of around 5 pages)
as well as an optional in-class presentation. If you do a presentation,
I will accept the presentation slidepack as part of the report, although
I would still want a few pages of narration to explain the slides and to
cover the points below. I do
encourage you to make in-class presentations - it's particularly fun
if you have demonstrations such as sound examples. However, we won't have
time for everyone to make presentations, so please let me know as soon
Your report must have the following structure, using these section headings:
- Introduction: A general description of the area of your project and why you're doing it.
- Problem Specification: A clear and succinct technical description of the problem you're addressing. Formulating a general problem (e.g., transcribing music) into a well-defined technical goal (e.g., reporting a list of estimated fundamental periods at each time frame) is often the most important part of a project.
- Data: What are the real-world and/or synthetic signals you are going to use to develop and evaluate your work?
- Evaluation Criteria: How are you going to measure how well your project performs? The best criteria are objective, quantitative, and discriminatory. You want to be able to demonstrate and measure improvements in your system.
- Approach: A description of how you went about trying to solve the problem. Sometimes you can make a nice project by contrasting two or more different approaches.
- Results and Analysis: What happened when you evaluated your system using the data and criteria introduced above? What were the principal shortfalls? (This may require you to choose or synthesize data that will reveal these shortcomings.) Your analysis of what happened is one of the most important opportunities to display your command of signal processing concepts.
- Development: If possible, you will come up with ideas about how to improve the shortcomings identified in the previous section, and then implement and evaluate them. Did they, in fact, help? Were there unexpected side-effects?
- Conclusions: What did you learn from doing the project? What did you demonstrate about how to solve your problem?
- References: Complete list of sources you used in completing your project, with explanations of what you got from each.
The reason for this somewhat arbitrary structure is simply to help you
avoid some of the more problematic weaknesses I've seen in past years.
If you're having trouble fitting your work into these sections, you
should probably think more carefully about your project. If you have
a good reason for deviating from this structure, talk to me or the TA.
You can see what some previous students have done in my collection of
example projects from previous years.
The project reports are due one week after the final exam, i.e., on
Monday Aug 31 2015. Electronic submission is encouraged, but please use
PDF format and not Word .DOC files if at all possible, since I often have
formatting problems with Word files. Reports in the format of web pages
to allow multimedia inclusions are also welcome.
Some project suggestions
These projects reflect my personal bias towards audio, but projects
concerned with images, or any other kind of signal processing, are
- DTMF decoder - convert a recording of 'touch tones'
from a real telephone into the corresponding set of digits.
There is a set of example sounds for you to
work with on the sound examples page.
Mitra's book includes some simple code for
DTMF detection (at the start of chapter 11), but this
doesn't deal with the case of multiple tones in a single
soundfile, or necessarily handle the varying signal
quality and spurious sounds (such as dialtone) in some
of the examples.
- Channel equalization. This is a classic signal processing
problem, where the signal has been subject to some unknown filter,
and the goal is to infer what that filter was, then invert its
effects. The sound examples page
contains a set of speech examples subject to several different
filters for you to work with. In those cases, you also
have access to the 'clean' signal before it was filtered,
so it should be possible to do a pretty good job; however,
you can look for algorithms that don't rely on the
clean reference. The basic idea would be to assume you
know what the average spectrum of a speech signal would
look like (since it should be more or less constant, on
average), look at the actual spectrum of your corrupt
example, then design a filter to make the example's spectrum
look more like the ideal average.
- Signal denoising.
Signals get corrupted by noise in all kinds of ways - by
electrical interference, by mechanical damage of recording
media, or simply because there were unwanted sounds present
during the original recording. On the
sound examples page there are
a number of speech examples that have been artificially
corrupted by additive noise or by reverberation. Some
approaches to noise reduction involve estimating the
steady-state noise spectrum (by looking during 'gaps' in
the speech), then designing a Wiener filter (see a
signals and systems text) to optimize the signal-to-noise
ratio. More advanced techniques can use the noise floor
estimate to apply dynamic time-frequency gain that attempts
to boost signal while cutting noise. The 'noise gate' found
in a recording studio is one example; a more sophisticated
frequency-dependent approach is the widely-used technique
of spectral subtraction, which is described in the book
Discrete-time processing of speech signals by
Deller, Proakis and Hansen (Macmillan, 1993).
- Speech endpointing
- find the beginning and end of each speech phrase or utterance
in a recording (which may include background noise).
The speech examples with added noise on the
sound examples page can be used here,
as well as the speech-over-music examples. The basic idea is
to design a filter that will do the best job of identifying
speech energy as compared to the nonspeech noise, then
set a threshold, marking the beginning and end of the speech
where the filtered energy crosses that threshold.
The following recent paper, chosen somewhat arbitrarily,
describes a more sophisticated approach as well as
including references to more classic papers:
"Robust Entropy-based Endpoint Detection for Speech Recognition
in Noisy Environments," Shen, Hung and Lee, Proc. Int. Conf. on
Spoken Lang. Processing, Sydney, 1998.
- Speech/music discrimination
- classify example fragments as speech, music, or some other class.
There are a few examples on the
sound examples page taken from a larger
database. This paper (written by the people
who collected the database) gives a nice explanation of the
problem, and although some of their methods are quite
involved, others are quite straightforward. They also
give some good references:
"Construction and evaluation of a robust multifeature
Scheirer & Slaney, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Munich, 1997.
- Pitch extraction. This is a widespread problem in
speech and music processing: identifying the local periodicity
of a sound with a perceived pitch. Autocorrelation is the
classic method, but it makes a lot of common errors, so there
are many approaches to improving it. You can use any of the
speech or music examples on the
sound examples page
to work with. Several of the older algorithms are described in
chapter 4 of Digital processing of speech signals by
Rabiner and Schafer (Prentice-Hall, 1978). A more recent
approach that I like, although it's aimed at a more
difficult multi-signal situation, is described in:
"Multi-pitch and periodicity analysis model for sound separation
and auditory scene analysis,"
Karjalainen and Tolonen, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Phoenix, 1999.
- Modeling musical instruments. It turns out that many
musical instruments can be modeled with surprisingly good
quality by relatively simple signal processing networks.
One very fruitful approach has been the "physical modeling
synthesis" developed by Julius Smith and others at
Stanford's CCRMA. Here's a recent article by Julius:
Physical Modeling Synthesis Update. You can follow
links from there to lots of other information on his site.
It's quite easy to implement some of these models,
and they can sound surprisingly realistic.
- Timescale modification. In the very first lecture I
played an example of speech that had been slowed down
without lowering the pitch. There are numerous
approaches to this; the following paper describes a
fairly sophisticated recent one, but includes references
to some simpler versions:
"MACH1: Nonuniform time-scale modification of speech,"
Covell, Withgott and Slaney, Proc. Int. Conf. on Acous., Speech and
Signal Processing, Seattle, 1998.
- Sound visualization. We've seen the spectrogram as an
example of rendering a sound as an image. However, there are
very many parameters to vary, with pros and cons to each variation.
This project would choose a particular goal, say a clear display
of a certain kind of sound in a variety of backgrounds, then
investigate the best possible processing to facilitate that
display. There are some interesting ideas in the following paper:
"Audio Information Retrieval (AIR) Tools,"
Tzanetakis and Cook, Proc. Int. Symp. on Music Info. Retrieval
(MUSIC IR 2000), Plymouth, October 2000.
- Steganography/watermarking. There are various motivations
for 'hiding' data in a soundfile without making the alteration
audible. One is watermarking, so that a sound can be
recognized as 'valid' without knowing its content ahead of time.
Another is embedding copyright markers that cannot easily
be removed by counterfeiters. This project will investigate
some mechanisms for encoding data in sound, and examine the
limits of how much data can be included, what degradation
to sound quality is entailed, and how hard it is to remove,
or to simulate, the marking.
- Artificial reverberation. I mentioned that one use of
allpass filters is in the simulation of room reverberation.
This project will build a simulated room reverberator and investigate
several enhancements to increase the realism. There are several
good papers to start from.
- Compression. Audio signal compression is a current
hot topic. This project would involve implementing one or more
simple compression schemes, and investigating how they perform
for different kinds of signals, as well as the kinds of distortion
they introduce. One of the classic, high-efficiency
algorithms is ADPCM, described in many places including
chapter 5 of Digital processing of speech signals by
Rabiner and Schafer (Prentice-Hall, 1978). A more modern
lossless compression scheme is the "shorten" program
by Tony Robinson. You can read his technical report on
this page (or see this
A more general overview, including details
of both ADPCM and MPEG-Audio compression is this paper:
"Digital Audio Compression", Davis Pan, Digital Technical
Journal Vol. 5 No. 2, Spring 1993.
- Time-delay angle-of-arrival estimation. We use our
two ears to be able to detect the direction from which a sound
arrives. The strongest cue is probably the slight time
differences that occur due to the finite speed of sound
traveling to each side. Cross-correlation can reveal this
time difference, and indicate the azimuth from which sounds
occur. I have access to several different kinds of
simulated binaural (stereo) recordings; you can experiment
with algorithms to estimate the time difference between the
channels and hence the direction of the source sound.
- Doubletalk detection. In speech processing we frequently
assume that the signal contains just a single voice; in many
cases this is not true, for instance when people interrupt
one another on a telephone call or in a meeting. Separating
these voices is hard, but we would like at least to be able
to detect when it is happening, so we know not to attempt
normal processing. In this project, a set of examples containing
a certain amount of speaker overlap will be provided, along with
some papers describing different possible approaches to detection.
- Synthesizing 3D sound. Since we have some understanding
of how the ear uses binaural (stereo) cues to infer the direction
of different sound sources, we should be able to construct
artificial sounds including those cues that will appear to
come from particular directions. This project will implement
and evaluate some algorithms proposed for this effect.
One source for this is the paper
A Structural Model for Binaural Sound Synthesis by Brown & Duda,
IEEE Tr. Speech & Audio, Sep 1998.
- Cross-synthesis. An interesting effect in electronic
music synthesis is to somehow 'combine' two sounds into a single
sound that appears to have properties of both sources. This
project will implement some variants of how this can be done.
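As a possible starting point for the DTMF decoder suggestion, the sketch below detects a single touch-tone digit with the Goertzel algorithm, a standard way to measure signal power at a handful of known frequencies. It is not the code from Mitra's book, and it assumes one clean tone per buffer: splitting a recording into individual tones and rejecting dialtone or noise are left out. The function names and the 8 kHz sampling rate are illustrative assumptions.

```python
import math

LOW_FREQS = (697, 770, 852, 941)        # keypad row frequencies, Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # keypad column frequencies, Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq_hz, sample_rate):
    """Squared magnitude of the signal's component at freq_hz."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_digit(samples, sample_rate):
    """Pick the strongest row and column frequency and look up the key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_FREQS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_FREQS[i], sample_rate))
    return KEYPAD[row][col]


def dtmf_tone(digit, sample_rate=8000, duration_s=0.05):
    """Synthesize an ideal two-tone burst for one key (test input only)."""
    row = next(r for r, keys in enumerate(KEYPAD) if digit in keys)
    col = KEYPAD[row].index(digit)
    n = int(round(sample_rate * duration_s))
    return [math.sin(2 * math.pi * LOW_FREQS[row] * i / sample_rate)
            + math.sin(2 * math.pi * HIGH_FREQS[col] * i / sample_rate)
            for i in range(n)]


print(detect_digit(dtmf_tone("7"), 8000))
```

A real decoder would also compare the winning powers against the total signal energy, so that silence or speech is not reported as a digit.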
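For the speech endpointing suggestion, the threshold step can be sketched as follows. This is a deliberately simple version: it uses raw short-time energy (no speech-selective filter), estimates the noise floor from the first few frames, and assumes those frames contain no speech. The frame length, threshold ratio and function names are assumptions chosen for illustration.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_endpoints(samples, frame_len=80, threshold_ratio=4.0, noise_frames=5):
    """Return (start_sample, end_sample) spanning every frame whose energy
    exceeds threshold_ratio times the noise floor, or None if none does.
    The noise floor is the mean energy of the first noise_frames frames,
    which are assumed (hypothetically) to contain background only."""
    energies = frame_energies(samples, frame_len)
    head = energies[:noise_frames]
    if not head:
        return None
    noise_floor = sum(head) / len(head)
    active = [i for i, e in enumerate(energies)
              if e > threshold_ratio * noise_floor]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

This version reports a single region from the first to the last active frame; finding several separate utterances would need extra logic, such as a minimum gap length between them.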
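For the pitch extraction suggestion, the classic autocorrelation method can be sketched in a few lines. The search range of 60 to 500 Hz and the function name are assumptions; the estimate is quantized to whole-sample lags, and no attempt is made to correct the octave errors that the method is known for.

```python
def autocorr_pitch(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame by finding
    the lag, within [1/fmax, 1/fmin], with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(samples[i] * samples[i + lag]
                  for i in range(len(samples) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag
```

Because the sum is unnormalized, longer lags have fewer terms and are slightly penalized, which happens to favor the true period over its multiples in clean periodic signals; noisy or weakly voiced frames are where the errors show up.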
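For the musical instrument modeling suggestion, one of the simplest models to try is the Karplus-Strong plucked string, a well-known algorithm closely related to the waveguide models used in physical modeling synthesis. The sketch below is not taken from the article mentioned above; the damping value and function name are illustrative assumptions.

```python
import random


def karplus_strong(freq_hz, sample_rate, duration_s, damping=0.996, seed=0):
    """Plucked-string tone: a delay line filled with noise, recirculated
    through a two-point average (a gentle lowpass) with slight damping."""
    rng = random.Random(seed)
    period = int(round(sample_rate / freq_hz))      # delay-line length
    delay_line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(int(round(sample_rate * duration_s))):
        idx = n % period
        out.append(delay_line[idx])
        nxt = delay_line[(idx + 1) % period]
        # Replace the sample just played with the damped average of
        # itself and its neighbor; it will be heard one period later.
        delay_line[idx] = damping * 0.5 * (delay_line[idx] + nxt)
    return out
```

The averaging removes high harmonics faster than low ones, which is what gives the decaying tone its string-like quality. The pitch is only approximate because the delay length is rounded to a whole number of samples.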
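For the artificial reverberation suggestion, the allpass building block can be sketched as below, using the Schroeder allpass form y[n] = -g x[n] + x[n-D] + g y[n-D]. The delays and gain are illustrative assumptions rather than tuned values, and a cascade of allpass sections alone is only a starting point: classic designs also combine them with parallel comb filters.

```python
def allpass(samples, delay, gain):
    """Schroeder allpass section:
    y[n] = -gain * x[n] + x[n - delay] + gain * y[n - delay]."""
    out = []
    for n, x in enumerate(samples):
        x_delayed = samples[n - delay] if n >= delay else 0.0
        y_delayed = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_delayed + gain * y_delayed)
    return out


def simple_reverb(samples, delays=(347, 113, 37), gain=0.7):
    """Cascade several allpass sections with different delays so that
    their echo patterns interleave into a denser decaying tail."""
    for d in delays:
        samples = allpass(samples, d, gain)
    return samples
```

Since each section has a flat magnitude response, the cascade changes the timing of the signal's energy without coloring its long-term spectrum, which can be checked by confirming that the impulse response has unit energy.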
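For the time-delay angle-of-arrival suggestion, the cross-correlation step might look like the sketch below. The conversion from delay to azimuth uses a simple far-field model, sin(angle) = c * delay / spacing, with an assumed sensor spacing of 0.18 m and a speed of sound of 343 m/s; both values, and the function names, are assumptions for illustration and ignore the acoustic effects of the head.

```python
import math


def estimate_delay(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of the two
    channels. A positive result means the right channel lags the left."""
    n = min(len(left), len(right))
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += left[i] * right[j]
        if total > best_val:
            best_lag, best_val = lag, total
    return best_lag


def delay_to_azimuth_deg(lag, sample_rate, spacing_m=0.18, speed_of_sound=343.0):
    """Far-field azimuth in degrees; positive when the right channel lags,
    i.e. the source is toward the left. Out-of-range delays are clamped."""
    ratio = speed_of_sound * (lag / sample_rate) / spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

At typical audio sampling rates the maximum physical delay is only a handful of samples, so in practice the correlation peak usually needs interpolating to get useful angular resolution.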
This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Last updated: Tue Jun 02 10:19:29 EDT 2015