Department of Electrical Engineering - Columbia University


ELEN E4810 - Summer 2015



Class projects

About the class projects

The class includes a programming project. Your choice of project should be agreed with me; the easiest way is probably via email. Some ideas are given at the bottom of this page, but they are only to get you thinking; your project can be on any topic that interests you, and I encourage you to choose something that will help harness your enthusiasm.

The biggest benefit of doing a project is that it will expose you to a broader range of the activities that come up in real engineering applications of signal processing, instead of the narrow core we've mostly covered in class. For this reason, the core requirement is that you do some kind of processing of some kind of real signal -- i.e., I don't want a purely theoretical discussion, and I don't want experiments on ideal signals. In some cases, it's useful to work with synthetic signals (ones you have constructed yourself, rather than gathered from the real world), but they should be as "real" as possible, e.g., by adding random noise.
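As a concrete illustration of making a synthetic signal more "real", here is a minimal sketch (in Python for self-containedness, though Matlab's awgn function does much the same job) of generating a test tone and adding white Gaussian noise at a controlled signal-to-noise ratio. The function name and defaults are illustrative inventions, not part of the course materials:

```python
import math
import random

def make_noisy_tone(freq_hz, dur_s, sr, snr_db, seed=0):
    """Generate a sine tone and add white Gaussian noise at a target SNR.

    A sketch of how to 'roughen up' a synthetic test signal; the name
    and interface are illustrative only.
    """
    rng = random.Random(seed)
    n = int(dur_s * sr)
    tone = [math.sin(2 * math.pi * freq_hz * i / sr) for i in range(n)]
    # Measure the tone's power, then scale the noise to hit the requested SNR.
    sig_pow = sum(v * v for v in tone) / n
    noise_pow = sig_pow / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_pow)
    return [v + rng.gauss(0.0, sigma) for v in tone]
```

Evaluating your algorithm at several SNR settings is a simple way to get the discriminative, quantitative results the report structure below asks for.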

I strongly recommend you do your project in Matlab because it frees you from many of the low-level details, but other languages or systems can be used by prior arrangement.

Projects can be done individually, or by a small team (typically two students). I encourage you to consider teaming up, primarily because it will allow you to do a more interesting project. I will naturally be judging two-person projects differently from individual projects, but I won't be applying a tougher standard - just looking for a more extensive investigation. The best arrangement is to choose a division of the project so that each of you can work on separate but interlocking parts. Learning teamwork is also one of the goals of an engineering education, so team projects will pick up points for demonstrating a successful ability to work with others.

The projects will be graded based on a project report (of around 5 pages) as well as an optional in-class presentation. If you do a presentation, I will accept the presentation slidepack as part of the report, although I would still want a few pages of narration to explain the slides and to cover the points below. I do encourage you to make in-class presentations - it's particularly fun if you have demonstrations such as sound examples. However, we won't have time for everyone to make presentations, so please let me know as soon as possible if you would like to present.

Your report must have the following structure, using these section headings:

  1. Introduction: A general description of the area of your project and why you're doing it.
  2. Problem Specification: A clear and succinct technical description of the problem you're addressing. Formulating a general problem (e.g., transcribing music) into a well-defined technical goal (e.g., reporting a list of estimated fundamental periods at each time frame) is often the most important part of a project.
  3. Data: What are the real-world and/or synthetic signals you are going to use to develop and evaluate your work?
  4. Evaluation Criteria: How are you going to measure how well your project performs? The best criteria are objective, quantitative, and discriminatory. You want to be able to demonstrate and measure improvements in your system.
  5. Approach: A description of how you went about trying to solve the problem. Sometimes you can make a nice project by contrasting two or more different approaches.
  6. Results and Analysis: What happened when you evaluated your system using the data and criteria introduced above? What were the principal shortcomings? (This may require you to choose or synthesize data that will reveal these shortcomings.) Your analysis of what happened is one of the most important opportunities to display your command of signal processing concepts.
  7. Development: If possible, you will come up with ideas about how to address the shortcomings identified in the previous section, and then implement and evaluate them. Did they, in fact, help? Were there unexpected side-effects?
  8. Conclusions: What did you learn from doing the project? What did you demonstrate about how to solve your problem?
  9. References: Complete list of sources you used in completing your project, with explanations of what you got from each.

The reason for this somewhat arbitrary structure is simply to help you avoid some of the more problematic weaknesses I've seen in past years. If you're having trouble fitting your work into these sections, you should probably think more carefully about your project. If you have a good reason for deviating from this structure, talk to me or the TA.

You can see what some previous students have done in my collection of example projects from previous years.

The project reports are due one week after the final exam, i.e., on Monday Aug 31, 2015. Electronic submission is encouraged, but please use PDF format and not Word .DOC files if at all possible, since I often have formatting problems with Word files. Reports in the format of web pages, to allow multimedia inclusions, are also welcome.

Some project suggestions

These projects reflect my personal bias towards audio, but projects concerned with images, or any other kind of signal processing, are equally acceptable.

  • DTMF decoder - convert a recording of 'touch tones' from a real telephone into the corresponding set of digits. There is a set of example sounds for you to work with on the sound examples page. Mitra's book includes some simple code for DTMF detection (at the start of chapter 11), but this doesn't deal with the case of multiple tones in a single soundfile, or necessarily handle the varying signal quality and spurious sounds (such as dialtone) in some of the examples.
  • Channel equalization. This is a classic signal processing problem, where the signal has been subject to some unknown filter, and the goal is to infer what that filter was, then invert its effects. The sound examples page contains a set of speech examples subject to several different filters for you to work with. In those cases, you also have access to the 'clean' signal before it was filtered, so it should be possible to do a pretty good job; however, you can look for algorithms that don't rely on the clean reference. The basic idea would be to assume you know what the average spectrum of a speech signal would look like (since it should be more or less constant, on average), look at the actual spectrum of your corrupt example, then design a filter to make the example's spectrum look more like the ideal average.
  • Signal denoising. Signals get corrupted by noise all kinds of ways - by electrical interference, by mechanical damage of recording media, or simply because there were unwanted sounds present during the original recording. On the sound examples page there are a number of speech examples that have been artificially corrupted by additive noise or by reverberation. Some approaches to noise reduction involve estimating the steady-state noise spectrum (by looking during 'gaps' in the speech), then designing a Wiener filter (see a signals and systems text) to optimize the signal-to-noise ratio. More advanced techniques can use the noise floor estimate to apply dynamic time-frequency gain that attempts to boost signal while cutting noise. The 'noise gate' found in a recording studio is one example; a more sophisticated frequency-dependent approach is the widely-used technique of spectral subtraction, which is described in the book Discrete-time processing of speech signals by Deller, Proakis and Hansen (Macmillan, 1993).
  • Speech endpointing - find the beginning and end of each speech phrase or utterance in a recording (which may include background noise). The speech examples with added noise on the sound examples page can be used here, as well as the speech-over-music examples. The basic idea is to design a filter that will do the best job of identifying speech energy as compared to the nonspeech noise, then somehow set a threshold to mark the beginning and end of the speech when the filtered energy exceeds that threshold. The following recent paper, chosen somewhat arbitrarily, describes a more sophisticated approach as well as including references to more classic papers: "Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments," Shen, Hung and Lee, Proc. Int. Conf. on Spoken Lang. Processing, Sydney, 1998.
  • Speech/music discrimination - classify example fragments as speech, music, or some other class. There are a few examples on the sound examples page taken from a larger database. This paper (written by the people who collected the database) gives a nice explanation of the problem, and although some of their methods are quite involved, others are quite straightforward. They also give some good references: "Construction and evaluation of a robust multifeature speech/music discriminator" Scheirer & Slaney, Proc. Int. Conf. on Acous., Speech and Signal Processing, Munich, 1997.
  • Pitch extraction. This is a widespread problem in speech and music processing: identifying the local periodicity of a sound with a perceived pitch. Autocorrelation is the classic method, but it makes a lot of common errors, so there are many approaches to improving it. You can use any of the speech or music examples on the sound examples page to work with. Several of the older algorithms are described in chapter 4 of Digital processing of speech signals by Rabiner and Schafer (Prentice-Hall, 1978). A more recent approach that I like, although it's aimed at a more difficult multi-signal situation, is described in: "Multi-pitch and periodicity analysis model for sound separation and auditory scene analysis," Karjalainen and Tolonen, Proc. Int. Conf. on Acous., Speech and Signal Processing, Phoenix, 1999.
  • Modeling musical instruments. It turns out that many musical instruments can be modeled with surprisingly good quality by relatively simple signal processing networks. One very fruitful approach has been the "physical modeling synthesis" developed by Julius Smith and others at Stanford's CCRMA. Here's a recent article by Julius called Physical Modeling Synthesis Update. You can follow links from there to lots of other information on his site. It's quite easy to implement some of these models, and they can sound surprisingly realistic.
  • Timescale modification. In the very first lecture I played an example of speech that had been slowed down without lowering the pitch. There are numerous approaches to this; the following paper describes a fairly sophisticated recent one, but includes references to some simpler versions: "MACH1: Nonuniform time-scale modification of speech," Covell, Withgott and Slaney, Proc. Int. Conf. on Acous., Speech and Signal Processing, Seattle, 1998.
  • Sound visualization. We've seen the spectrogram as an example of rendering a sound as an image. However, there are very many parameters to vary, with pros and cons to each variation. This project would choose a particular goal, say a clear display of a certain kind of sound in a variety of backgrounds, then investigate the best possible processing to facilitate that display. There are some interesting ideas in the following paper: "Audio Information Retrieval (AIR) Tools," Tzanetakis and Cook, Proc. Int. Symp. on Music Info. Retrieval (MUSIC IR 2000), Plymouth, October 2000.
  • Steganography/watermarking. There are various motivations for 'hiding' data in a soundfile without making the alteration audible. One is watermarking, so that a sound can be recognized as 'valid' without knowing its content ahead of time. Another is embedding copyright markers that cannot easily be removed by counterfeiters. This project will investigate some mechanisms for encoding data in sound, and examine the limits of how much data can be included, what degradation to sound quality is entailed, and how hard it is to remove, or to simulate, the marking.
  • Artificial reverberation. I mentioned that one use of allpass filters is in the simulation of room reverberation. This project will build a simulated room reverberator and investigate several enhancements to increase the realism. There are several good papers to start from.
  • Compression. Audio signal compression is a current hot topic. This project would involve implementing one or more simple compression schemes, and investigating how they perform for different kinds of signals, as well as the kinds of distortion they introduce. One of the classic, high-efficiency algorithms is ADPCM, described in many places including chapter 5 of Digital processing of speech signals by Rabiner and Schafer (Prentice-Hall, 1978). A more modern lossless compression scheme is the "shorten" program by Tony Robinson. You can read his technical report on it through this page (or see this local copy). A more general overview, including details of both ADPCM and MPEG-Audio compression is this paper: "Digital Audio Compression", Davis Pan, Digital Technical Journal Vol. 5 No. 2, Spring 1993.
  • Time-delay angle-of-arrival estimation. We use our two ears to be able to detect the direction from which a sound arrives. The strongest cue is probably the slight time differences that occur due to the finite speed of sound traveling to each side. Cross-correlation can reveal this time difference, and indicate the azimuth from which sounds occur. I have access to several different kinds of simulated binaural (stereo) recordings; you can experiment with algorithms to estimate the time difference between the channels and hence the direction of the source sound.
  • Doubletalk detection. In speech processing we frequently assume that the signal contains just a single voice; in many cases this is not true, for instance when people interrupt one another on a telephone call or in a meeting. Separating these voices is hard, but we would like at least to be able to detect when it is happening, so we know not to attempt normal processing. In this project, a set of examples containing a certain amount of speaker overlap will be provided, along with some papers describing different possible approaches to detection.
  • Synthesizing 3D sound. Since we have some understanding of how the ear uses binaural (stereo) cues to infer the direction of different sound sources, we should be able to construct artificial sounds including those cues that will appear to come from particular directions. This project will implement and evaluate some algorithms proposed for this effect. One source for this is the paper A Structural Model for Binaural Sound Synthesis by Brown & Duda, IEEE Tr. Speech & Audio, Sep 1998.
  • Cross-synthesis. An interesting effect in electronic music synthesis is to somehow 'combine' two sounds into a single sound that appears to have properties of both sources. This project will implement some variants of how this can be done.

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
Dan Ellis <[email protected]>
Last updated: Tue Jun 02 10:19:29 EDT 2015