2009-01-13: Here is the list of
new project suggestions,
some from this year, and some from earlier.
2009-01-13: Here are some nice projects
from the Spring 2008 semester class.
A major part of the class will be the projects undertaken by students
in some area of speech and audio processing and recognition. You are
encouraged to be thinking about your project from the earliest possible
date, and to discuss ideas with me in order to develop the best
project plans. Various resources, such as corpora of sound files or access
to existing software tools, will be provided where possible.
Each project will culminate in a presentation, either in person in the
final class, or via your web pages (or both). It is expected that on-campus
students will make in-person presentations. The final two classes of the semester
(2009-04-28 and 2009-04-30) will consist of these presentations. You will also make
a short project proposal presentation just before spring break
(2009-03-10 and 2009-03-12), which will be the basis of your midterm grade -- more details below.
Don't worry about making the presentations terribly formal or polished;
think of them rather as an opportunity to explain to the rest of the class
some key ideas leading to or learned from the project. The emphasis is
on communication and sharing ideas and knowledge.
I particularly encourage demonstrations and sound examples.
Each project will also be described by a short written
report detailing the work undertaken. This can be in the form of a
single document (printed or online), or as a set of web pages (which has the advantage
of supporting linked examples). In both cases, the report should
follow the broad format of a research publication, with an introduction
describing the problem, a description of the approach, a presentation
of the results, perhaps a discussion, and final conclusions.
A written report
will typically be 5-10 pages in length, including figures.
Project reports should be handed in (or posted to the web) by
one week after the final presentations, i.e. by Thursday 2009-05-07.
Project scope and assessment
To give you some idea of the amount of work expected in the project,
bear in mind that it accounts for half of a 4.5 credit class, which
should work out to something like an average of a day a week over the
semester. At the same time, the best projects have simple and clear
concepts at their core, rather than ballooning into vast investigations.
Here's a recipe for one possible project 'shape':
- Identify the area of the investigation.
- Define a specific, concrete task within that area. For instance, in a sound classification project, this would be the set of target classes into which classification will be performed and a corpus that will be used for testing. In other cases, the goal might be more open-ended (for instance, making a recorded male speaker sound female), but should still be explicit, concrete, and have an identified target domain.
- Define evaluation metrics. This is a crucial habit for quality research. Without some kind of measure of how you are getting on, it's too easy to get lost working on a problem without making real progress. For classification problems, evaluation is easily achieved by measuring error rates on a given test set. Other projects may require more thought to come up with suitable evaluations. For the voice transformation example, it is the subjective success ("does it sound like a male or a female speaker?") that really matters, and subjective testing is the only real way to assess this work. But subjective impressions are cumbersome to collect, so some other measure (e.g. signal-level distance from a prototype female utterance?) might be more practical.
- Identify the particular approach you intend to use to solve your chosen problem - the kinds of features you plan to extract, the basic signal processing sequence, etc.
- Make an implementation (and debug it!).
- Measure its performance with your evaluation metrics. Also, make a qualitative investigation into its shortcomings. What are the aspects of its behavior that differ from what you hoped or intended? How might they be improved?
- Based on this analysis, modify your implementation (or make a new implementation) in order to address some of the shortcomings.
- Assess this new iteration and compare it to the original. Were you able to improve things relative to the first attempt? How has the pattern of performance changed?
- If you haven't run out of time, you can repeat this cycle indefinitely.
- Finally, step back and look at the whole path you've come down. If you were going to start the project over again, how would you do it? What have you learned about the nature of the problem in the course of your investigation? What are the most promising avenues for future work?
That recipe is sufficiently vague to cover a wide range of projects, but even so it certainly isn't the only way to do things. However, the emphasis on well-defined goals and evaluation standards (so you can be clear about what's relevant and what isn't), and the idea of iterating over an implementation in light of performance analysis, are aspects I consider very valuable.
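As a rough illustration of the evaluation step for a classification project, the sketch below computes an error rate on a test set and tallies which confusions occur, for a first attempt and a revised one. The class names, label lists, and function names are all made up for the example; a real project would substitute the output of its own classifier on its own corpus.

```python
from collections import Counter


def error_rate(reference, predicted):
    """Fraction of test items whose predicted label differs from the reference."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not reference:
        raise ValueError("test set is empty")
    errors = sum(1 for r, p in zip(reference, predicted) if r != p)
    return errors / len(reference)


def confusions(reference, predicted):
    """Count each (true label, predicted label) pair that was an error."""
    return Counter((r, p) for r, p in zip(reference, predicted) if r != p)


# Hypothetical labels for a six-item test set (invented for illustration).
reference = ["alarm", "speech", "music", "alarm", "speech", "music"]
baseline = ["alarm", "music", "music", "speech", "speech", "alarm"]
revised = ["alarm", "speech", "music", "speech", "speech", "music"]

for name, predicted in [("baseline", baseline), ("revised", revised)]:
    print(name, "error rate: %.3f" % error_rate(reference, predicted))
    # Listing the individual confusions shows how the pattern of errors
    # changes between iterations, not just the overall rate.
    for (true, guess), n in confusions(reference, predicted).most_common():
        print("   %s -> %s: %d" % (true, guess, n))
```

Keeping the test set and the metric fixed across iterations is what makes the two numbers comparable; the confusion tally is one simple way to start the qualitative look at shortcomings.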
The projects will be graded on several dimensions:
- Project structure: How well the basic investigation is defined,
how systematically it is pursued, how well the effort invested
was balanced between different areas.
- Technical content: The breadth and depth of understanding of
audio processing-related ideas displayed within the project.
- Presentation: How well the ideas and results of the project are
communicated.
Finally, conciseness is always a virtue, particularly in the eyes of
the reader. There is a fine art in editing down reports and presentations
(and lectures!) to contain only the important points and nothing extraneous,
while still presenting enough to make the material intelligible. As always,
blindly generating vast volumes of results is a big warning sign that you
should step back and refocus on your objectives.
Project proposal presentations
All the on-campus students
will make a brief oral presentation of their project plans in the
class meetings of 2009-03-10 and 2009-03-12 (directly before spring break).
Each presentation is limited to 5-7 minutes.
The goal is to explain the general idea
of the topic you are addressing, what experiments you will perform, and
how you will assess the results.
These presentations are
assessed by the rest of the class as the 'midterm' component of the grade.
Some project ideas
This list is offered to stimulate ideas, rather than to define some
limited domain; interesting ideas that fall outside the categories
below are also encouraged.
- Speech recognition variants: There are several speech recognition
frameworks available that can be used to build a working speech recognizer
in a relatively modest amount of time. Access to existing implementations
and corpora will allow students to focus on modifying a certain aspect
of these highly complex systems (such as feature representations, model
structures, or training procedure) and make quantitative measures of the
impact on recognizer performance.
- Audio compression variants: Different ideas for audio signal
compression can be investigated either by starting from scratch or by modifying
one of the packages available in source code. Bitrate reductions can be
measured, although quality judgments are harder to obtain.
- Nonspeech signal recognition: Many of the techniques used in
speech recognition are in fact applicable over a much wider domain. Speech
is only one kind of complex sound, of course; recognizers could be built
for alarm sounds, particular acoustic events in movie soundtracks, animal
calls etc. With suitable corpora and well-defined experiments, the
performance of such systems can be measured accurately.
- Speaker identification and characterization: Speech recognition
has focused on the lexical content of speech (i.e. the words) and worked
quite hard to exclude other aspects of the signal. Yet when we listen to
speech, we infer considerable information about the speaker, such as gender,
age, country of origin etc. All this information should be present in the
signal; it is simply a matter of finding the right features and training
the right recognizer.
- Spatial location analysis and synthesis: Auditory spatial perception
is a favorite topic of psychoacoustics research, and many models have been
proposed of how the brain recovers spatial information (azimuth, elevation
and range) from the signals at the two ears. These models can be used both
to attempt the recognition of a given sound's origin, and to synthesize
sounds that appear to come from a specific point in space.
- Prosody detection: Prosody refers to variable aspects of the
speech signal apart from those defining the phonetic content; these include
pitch (melody), timing, stress etc. The focus on speech transcription has
left these aspects of the signal relatively neglected, yet they are certainly
informative, particularly if we wish to understand more than simply the
word sequence. This project would investigate extracting reliable correlates
of such features from speech signals.
- Music synthesis: A huge range of algorithms have been used in
computer and electronic music; some of these could be investigated, compared,
and perhaps extended.
- Music analysis: Automatic transcription
of recorded music is still a major challenge, even for relatively constrained
subsets, but there are other kinds of information, such as rhythm, genre,
instruments and perhaps chord progressions or bass lines that can be more
- Audio and music retrieval: It's not at all obvious how to define
'similarity' between two sounds, be they one-second sound effects or one-hour
orchestral recordings. But if we could, we might be able to build a useful
analog of a search engine working purely on sound. Several groups have
tried; their approaches could be examined, or a new approach could be developed.
- Temporal structure recovery: If you listen to just the soundtrack
of a movie, you can probably get a pretty good idea of what's going on.
Even if you don't understand the words, you may still recognize the sound
effects, or respond to the soundtrack music. What useful coarse-time structural
information can we recover simply by processing the sound channel of multimedia
Last updated: Thu Jan 22 10:26:30 EST 2009