E6880: Project Info

E6880: Topics in Signal Processing (Media Representation)

Term Project Guidelines

Quick Index

General Project Information
Some Examples of Projects
Bonus for Using Flavor
Image File Formats
Image Viewers
Sample Source Code
Sample Images
Reference Material
The MPEG-2 Video Specification

General Project Information

As the course information indicates, the class requires a programming project. There are no specific requirements on the project, except that it should involve programming and media representation, and be of non-trivial size. Typical projects run from 2,000 to 10,000 lines of code, with most falling somewhere in the middle. Any subject can be pursued, but it first has to be approved by the instructor. You can email a one-paragraph description of your idea, or discuss it during the regular office hours or after class.

There is no requirement in terms of the programming language used for the project, and you can use whatever you are most comfortable with (C++, C, Pascal, Fortran, etc.). Use of high-level languages such as Matlab or Mathematica is not allowed, because they are particularly bad for media representation anyway. Also, any computing facility can be used (Unix workstations, PCs, Macs, etc.). Students are particularly encouraged to consider using Flavor in their projects, or making Flavor their project as well (see below).

At the end of the project, on April 29th, you are expected to present your project in class. You should also submit a short report indicating what you have done and what results you obtained, together with your source code and representative results. Reports should be created in HTML, accompanied by the necessary graphics (HTML is another form of media representation, and this way we can make sure that everyone knows how it works).

Some Examples of Projects

Standard Codecs

The most natural application for media representation projects is encoders and decoders. As a result, any compression algorithm that we have discussed in class is a good candidate for a project. Examples include:

GIF 87a and 89a encoder/decoder: You can write code to encode and decode images. The image input and output format can be any of the popular ones (raw, PBM Plus, etc.). This would be appropriate for a single person.
JPEG: JPEG will be discussed in the next few lectures. It addresses compression of natural images (photographs) and is more sophisticated than GIF. This project would be appropriate for a two-person team, but single person implementations are also doable.
MPEG: MPEG is more complex than JPEG, and addresses compression of digital audio and video. A group of two or three persons can tackle MPEG-1 video within the allocated time frame. People interested in audio may prefer to work on that instead. Finally, the MPEG specification also includes a multiplexing layer that combines audio and video in a single bitstream. This would be appropriate for a two or three person team. There are actual MPEG hardware and software decoders in the Image and Advanced Television Laboratory that can be made available for you to test your code (i.e., see if the bitstreams you generate can be decoded by MPEG-compliant chips).

Even though the projects mentioned above are image and video-centric, projects that deal with audio compression (e.g., using LPC or CELP) are perfectly acceptable as well.

Non-standard Codecs

You can also create non-standard codecs (i.e., codecs that do not comply with any particular industry specification). These can try to improve existing designs or provide for special features that are not supported by current representation schemes.

Graphics Codecs: You can also design custom codecs for graphics. For example, you might want to try and improve on the GIF LZW algorithm, by eliminating the need for one-pass encoding, thus allowing you to obtain (and include with the compressed data) an optimized Huffman code. Or you may want to focus on black-and-white (bi-level) images, typically found in printed or faxed pages, and consider coding strategies especially suitable for them (e.g., you might want to implement the G3 algorithm that can be found in all facsimile devices around the world).
Wavelet Codecs: Wavelet-based codecs are able to support multiple resolutions of the same content using parts of the same bitstream (as we are going to see later in class). If you are interested in finding out more about wavelets, you can design a wavelet codec using current literature as a reference point. Note that wavelets are not going to be covered very extensively in class.

Applications

Other project areas of interest are applications that use media representation in non-trivial ways. For example, you could implement a simple image manipulating and editing tool in Java. The tool would support a file format that, in addition to the image data, also contains the shape of the object depicted in the image. One way to do this would be to use the background color defined in GIF, in the same way that Web browsers use it to ignore pixels of that color.

Another example of an application would be to make an animated GIF composer, i.e., a program that takes a set of simple GIF files and creates an animated GIF file. The program should be interactive, and allow user control for the animation. The program could also deviate from the GIF format, and include support for animated text, etc.

The above are just representative examples, and are in no way exhaustive.

Bonus for Using Flavor

Projects that use Flavor in their implementation are of particular interest to us, and are strongly encouraged. Using Flavor will also allow you to learn how the syntax of the forthcoming MPEG-4 standard is specified. A translator from Flavor to C++ is already available, while one for Java is in the works (an alpha version should be available soon). As the Flavor tools are still under development, projects that decide to use Flavor will have to interact with the development team to iron out bugs and problems. To compensate for the potential time lost because of the status of the development tools, and in recognition of the exploratory nature of the work, special consideration will be given when assessing the project's grade.

Image File Formats

If your project involves images, you will need to know some basics about how to deal with them, as well as some samples to start playing with. (Note: If you are planning to work on audio, please contact the instructor for further information.)

Image file format refers to the way an image is written in a computer file. In the course of several years, a very large number of formats has been developed. In the following we describe some of the most popular ones.

Raw Format

The raw format is the most trivial one: the pixel values are saved in a raster-scan order (left-to-right, top-to-bottom), one value at a time. For grayscale images having values in the range 0-255, 8 bits or 1 byte per pixel is sufficient. Consequently, an NxM image occupies NxM bytes on disk, with each byte corresponding to the value of one pixel.

The drawback of this format is that the original size of the image cannot be inferred directly by inspection of the file. For example a 256x512 image could not be distinguished from a 512x256 one, unless the image is displayed on the screen.
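As a minimal sketch of the layout described above, the following C fragment reads a raw 8-bit grayscale image into memory and indexes it by column and row. The function names raw_load and raw_pixel are illustrative only and are not part of any course code; the caller has to supply the width and height, precisely because the file does not record them.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a width x height raw 8-bit grayscale image into a newly
   allocated buffer. Returns NULL on failure. The caller must supply
   the dimensions, since the raw file does not record them. */
unsigned char *raw_load(const char *filename, size_t width, size_t height)
{
    FILE *fp = fopen(filename, "rb");
    unsigned char *pixels;
    size_t count = width * height;

    if (fp == NULL)
        return NULL;
    pixels = malloc(count);
    if (pixels != NULL && fread(pixels, 1, count, fp) != count) {
        free(pixels);       /* file is shorter than width*height bytes */
        pixels = NULL;
    }
    fclose(fp);
    return pixels;
}

/* Raster-scan order: row y starts at byte offset y * width. */
unsigned char raw_pixel(const unsigned char *pixels, size_t width,
                        size_t x, size_t y)
{
    return pixels[y * width + x];
}
```

Note that the same file loads without complaint as either a 256x512 or a 512x256 image; only the interpretation of the bytes changes, which is the ambiguity mentioned above.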

All other formats were created so that the file itself contains useful information about the image, including size, color map entries (for pseudo-color images, i.e., those that use Color Look-Up Tables, or CLUTs), etc.

PBMplus Formats

These are part of a very popular public domain package for Unix systems called pbmplus. It defines three basic formats: PBM (Portable Bitmap, for bi-level images), PGM (Portable Graymap, for grayscale images), and PPM (Portable Pixel Map, for color images). It also includes more than 100 utilities to convert from/to other formats, including raw, GIF, JPEG, G3, TIFF, etc. It acts as a least common denominator, trying to alleviate the problems of the format pollution that currently exists.

If you are on a Unix system, it will probably be convenient to either deal directly with one of the PBMplus formats, or process your images using the raw format and convert to PBM/PGM/PPM only when you want to display your results. Please consult the on-line manual pages of your system for more information on the structure of these file formats (for example, type man 5 pbm; 5 is the section of the manual documenting file formats).

Some system administrators install the PBMplus programs without the relevant on-line manual pages. For your convenience, I have placed copies here: pbm.5, pgm.5, ppm.5.
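To give a feel for how little these formats add on top of raw data, here is a small sketch that writes an 8-bit grayscale buffer as a binary ("rawbits") PGM file. It assumes the P5 variant with a maximum gray value of 255; the function name pgm_write is hypothetical, and the manual pages remain the authoritative description of the format.

```c
#include <stdio.h>

/* Write an 8-bit grayscale image as a binary ("rawbits") PGM file:
   a short text header followed by the pixels in raster-scan order.
   Returns 0 on success, -1 on failure. */
int pgm_write(const char *filename, const unsigned char *pixels,
              unsigned width, unsigned height)
{
    FILE *fp = fopen(filename, "wb");
    size_t count = (size_t)width * height;
    int ok;

    if (fp == NULL)
        return -1;
    /* Header: magic number "P5", width, height, maximum gray value. */
    ok = fprintf(fp, "P5\n%u %u\n255\n", width, height) > 0
         && fwrite(pixels, 1, count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Unlike the raw format, the resulting file carries its own width and height, so a viewer needs no extra information to display it.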

CompuServe GIF

GIF (Graphics Interchange Format) is a very popular format, especially on PCs and Macs, that has been developed by CompuServe. It supports pseudo-color images, using a CLUT. It provides capabilities for lossless compression of the image as it is stored on the file, as well as 4-way interlaced storage so that a rough outline of the image is available very quickly on the display (this is a very popular option in Web pages, since many people access the Web using slow modem lines). Two flavors have been developed, GIF87a, and GIF89a.
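The 4-way interlacing mentioned above stores the image rows in four passes: every 8th row starting at row 0, every 8th row starting at row 4, every 4th row starting at row 2, and finally every 2nd row starting at row 1. The sketch below computes the resulting row order; the function name gif_interlace_order is illustrative, and the GIF specifications remain the reference for the exact rules.

```c
/* Fill order[] with the image row stored at each position of an
   interlaced GIF: order[i] is the row number of the i-th stored row.
   order must have room for height entries. */
void gif_interlace_order(int height, int *order)
{
    static const int start[4] = {0, 4, 2, 1};   /* first row of each pass */
    static const int step[4]  = {8, 8, 4, 2};   /* row spacing in each pass */
    int pass, row, i = 0;

    for (pass = 0; pass < 4; pass++)
        for (row = start[pass]; row < height; row += step[pass])
            order[i++] = row;
}
```

A decoder that has received only the first pass can already display a coarse version of the whole image, which is why the rough outline appears so quickly over a slow connection.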

GIF is not ideal for image processing with grayscale images, because of the presence of the CLUT. A properly constructed grayscale GIF must have a color map with entries starting from (0, 0, 0), (1, 1, 1), up to (255, 255, 255). The pixel values can then be considered as regular grayscale levels.
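As a small illustration of the color map requirement above, the following sketch builds such a grayscale map and checks whether an existing map has that form. It assumes the map is held in memory as 256 consecutive (R, G, B) byte triples; the function names are hypothetical.

```c
/* Fill a 256-entry color map (stored as R, G, B byte triples) with the
   identity grayscale ramp: entry i is (i, i, i). */
void gray_colormap_fill(unsigned char map[256 * 3])
{
    int i;

    for (i = 0; i < 256; i++)
        map[3 * i] = map[3 * i + 1] = map[3 * i + 2] = (unsigned char)i;
}

/* Return 1 if the map is the identity grayscale ramp, so that pixel
   indices can be used directly as gray levels; 0 otherwise. */
int gray_colormap_check(const unsigned char map[256 * 3])
{
    int i, c;

    for (i = 0; i < 256; i++)
        for (c = 0; c < 3; c++)
            if (map[3 * i + c] != i)
                return 0;
    return 1;
}
```

If the check fails, the pixel values are only indices into the map and have to be translated through it before they can be treated as gray levels.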

You can use existing tools to convert your raw images to this format. The PBMplus package supports GIF. You can read the GIF87a and GIF89a specifications below:

GIF87a.txt

GIF89a.txt

Image Viewers

If your project involves imagery, you will need to view your source or output images. A variety of programs exist for that purpose. Netscape and other Web browsers have internal support for viewing a variety of image formats, including GIF and JPEG (a sophisticated lossy compression standard that we will discuss in class). None of them, unfortunately, supports any of the PBMplus formats.

Unix: If you are on a Unix system running the X Window System (X11), then your best choice is the program xv. It supports all the PBMplus formats, GIF, and many others, and it offers a powerful interface for manipulating images.
Windows (PC): If you are working on a PC running Windows (Windows 3.1, Windows 95, or Windows NT 3.x/4.x), you can either use existing imaging software (e.g., the very powerful but expensive Adobe PhotoShop) or one of the shareware or public domain viewers that are abundant on the Internet. You can search for those at the Yahoo directory (www.yahoo.com). I personally use LView Pro, which you can download from here (lview.zip).
Mac: Similar to the PC, you can either use a commercial package (PhotoShop, if you already have it), or a shareware or public domain viewer.

Sample Source Code

Unix Utilities

Before looking at source code, it's good to note that Unix has several utilities that are useful when developing programs that process images.

cmp image1 image2: Determines if the two files/images are byte-by-byte identical. If they are, it will return silently (no output); if they are not, it will print the position of the first differing byte. This position is given as a character (byte) count from the start of the file, together with a line number, where line is understood in a textual sense (i.e., a new line starts whenever the character '\n' (or 0x0A) appears); the line number is not very useful for image data.
diff program1.c program2.c: Shows how two versions of a program are different.
od -x image: Shows the contents of the file image in hexadecimal. Useful for checking general characteristics of an image, such as if it is all-zero or a constant gray level.
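Since cmp reports a position within the file rather than within the image, it can be handy to have a small helper that reports the first differing pixel as a column and row. The following is one possible sketch for raw images already loaded in memory; the name raw_first_diff is illustrative only.

```c
#include <stddef.h>

/* Compare two raw images of the same dimensions. Returns 1 and stores
   the coordinates of the first differing pixel (in raster-scan order)
   in *x and *y; returns 0 if the images are identical. */
int raw_first_diff(const unsigned char *a, const unsigned char *b,
                   size_t width, size_t height, size_t *x, size_t *y)
{
    size_t i;

    for (i = 0; i < width * height; i++) {
        if (a[i] != b[i]) {
            *x = i % width;   /* column */
            *y = i / width;   /* row */
            return 1;
        }
    }
    return 0;
}
```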

If you are not very familiar with C, but would like to use it in your project, there are several good books around. The definitive book is The C Programming Language by Kernighan and Ritchie (2nd ed.). For C++, you may want to use Stroustrup's The C++ Programming Language (2nd ed.). Both the Columbia Bookstore and Papyrus carry several popular books on various programming languages, as do, of course, major technical bookstores in the city and elsewhere.

Source Code Samples

rawtopgm

This is a simple converter from the 'raw' to 'pgm' format. It accepts as options the width and height of the image (both default to 256), and requires the name of the raw input file, and the name of the pgm output file. You can access the source code in:

rawtopgm.c

(You can use your browser's 'Save As ...' option to save the file on your system.)

To compile the program in a Unix system, use the command:

cc -o rawtopgm rawtopgm.c
or
gcc -o rawtopgm rawtopgm.c

'cc' is the C compiler that is common in older Unixes, and it supports only Kernighan and Ritchie (K&R) C. You will be better off using a more advanced compiler, supporting the ANSI C standard. 'gcc' is the GNU C compiler, which supports ANSI C as well as K&R C. It's freely available, so almost all Unix installations have it (if not, your system administrator is not doing a good job).

This program will most likely not compile on PCs or Macs, because the C/C++ compilers for such systems typically lack support for the getopt() function that is used within the program. In these systems, you can use the following version:

rawtopgm.c (version with no getopt() support)

To run the program on a 512x256 image called foo.raw, simply run the command:

rawtopgm -x 512 -y 256 foo.raw foo.pgm

If the width or height is 256, you don't need to specify them in the command line. In addition, the program can also work using the standard input and standard output. As an example, the above command will work equally well as follows:

rawtopgm -x 512 < foo.raw > foo.pgm

For the alternate version with no getopt() support, you must specify the image width and height every time, and there is no support for the program to work as a filter. Hence the above command should be written as:

rawtopgm 512 256 foo.raw foo.pgm
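For readers who have not used getopt() before, the sketch below shows one way the command-line conventions described above could be handled: -x and -y for the width and height (both defaulting to 256), followed by optional input and output file names. This is not the code of rawtopgm.c; the struct and function names are hypothetical, and it assumes a POSIX system where getopt() is declared in <unistd.h>.

```c
#define _POSIX_C_SOURCE 200809L   /* request the getopt() declaration */
#include <stdlib.h>
#include <unistd.h>

struct image_args {
    int width, height;
    const char *input;    /* NULL means standard input */
    const char *output;   /* NULL means standard output */
};

/* Parse "[-x width] [-y height] [input [output]]".
   Returns 0 on success, -1 on an unknown option or missing value. */
int parse_image_args(int argc, char *argv[], struct image_args *args)
{
    int c;

    args->width = args->height = 256;       /* defaults */
    args->input = args->output = NULL;
    while ((c = getopt(argc, argv, "x:y:")) != -1) {
        switch (c) {
        case 'x': args->width = atoi(optarg); break;
        case 'y': args->height = atoi(optarg); break;
        default:  return -1;
        }
    }
    if (optind < argc)
        args->input = argv[optind++];
    if (optind < argc)
        args->output = argv[optind++];
    return 0;
}
```

A program using this would fall back to the standard input or standard output whenever the corresponding file name is absent, which is what allows it to work as a filter.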

rawmse

This is a program that computes the Mean Squared Error (MSE), Signal to Noise Ratio (SNR), and Peak SNR between two images in raw format. As before, the program accepts as options the width and height of the image (both default to 256), and requires the names of the raw original and processed files. You can access the source code in:

rawmse.c

(You can use your browser's 'Save As ...' option to save the file on your system.)

To compile the program in a Unix system, use the command:

cc -o rawmse rawmse.c -lm
or
gcc -o rawmse rawmse.c -lm

The program is written in ANSI C, so the 'cc' compiler on your Unix system may not be able to compile it. In that case, you can use 'gcc', or ask your system administrator for the location of an ANSI C compiler in your system. The -lm option links in the math library, which provides the logarithm functions used in the program (among several other math functions).

As with 'rawtopgm', this code will most likely not compile on PCs or Macs, because the C/C++ compilers for such systems typically lack support for the getopt() function that is used within the program. In these systems, you can use the following version:

rawmse.c (version with no getopt() support)

To run the program on two 512x256 images called orig.raw and foo.raw, simply run the command:

rawmse -x 512 -y 256 orig.raw foo.raw

If the width or height is 256, you don't need to specify them in the command line. As an example, the above command will work equally well as follows:

rawmse -x 512 orig.raw foo.raw

For the alternate version with no getopt() support, you must specify the image width and height every time. Hence the above command should be written as:

rawmse 512 256 orig.raw foo.raw

If you have any questions, please contact the instructor or the TA.

Sample Images

The following four sample images can be used to work on your project. They exhibit several different image features, and are good for testing algorithms in a variety of situations. The images are available as GIF, "raw", and PGM files. The latter two are compressed (using both 'pkzip' and 'compress'), so that they are easier for you to download. The GIF files are there to allow you to preview the images with your Web browser.

To uncompress the .raw or .pgm files in Unix, download the file, and run the command: uncompress lenna.raw.Z. This will create the original file lenna.raw in the current directory. On a PC, just use the pkunzip program on the .zip file.

Lenna (512x512, grayscale)

This is undoubtedly the most used image in the field of Digital Image Processing.

View GIF file lenna.gif
Download compressed "raw" file lenna.raw.Z
Download compressed PGM file lenna.pgm.Z
Download zip-ed "raw" and PGM files lenna.zip

Kiel Harbor (512x512, grayscale)

View GIF file kiel.gif
Download compressed "raw" file kiel.raw.Z
Download compressed PGM file kiel.pgm.Z
Download zip-ed "raw" and PGM files kiel.zip

House (256x256, grayscale)

View GIF file house.gif
Download compressed "raw" file house.raw.Z
Download compressed PGM file house.pgm.Z
Download zip-ed "raw" and PGM files house.zip

Lake (512x512, grayscale)

View GIF file lake.gif
Download compressed "raw" file lake.raw.Z
Download compressed PGM file lake.pgm.Z
Download zip-ed "raw" and PGM files lake.zip

Reference Material

The following books and journals contain useful material relating to the course. It is not required or expected that you read any of these. They are made available in case you would like to explore a topic in more detail for your project or if you are just interested in learning more about a particular subject.

N. S. Jayant and P. Noll, "Digital Coding of Waveforms: Principles and Applications to Speech and Video," Prentice Hall, Englewood Cliffs, New Jersey, 1984. A classic for traditional approaches to audiovisual compression (especially audio), which unfortunately is out of print. It is available in the Columbia library, and all technical libraries. Excellent coverage of quantization.
A. N. Netravali and B. G. Haskell, "Digital Pictures: Representation, Compression, and Standards," 2nd ed., Plenum Press, New York, 1995. A very good reference for image and video compression. This 2nd edition includes chapters on JPEG, H.261, and MPEG.
K. R. Rao and P. Yip, "Discrete Cosine Transform: Algorithms, Advantages, Applications," Academic Press, San Diego, California, 1990. Everything you ever wanted to know about the theoretical aspects of the DCT is included here. It also has a very good presentation of the KLT, and its relationship to the various versions of the DCT.
W. Pennebaker and J. Mitchell, "The JPEG Still Image Data Compression Standard," Van Nostrand Reinhold, New York, NY, 1993. Excellent reference for JPEG; it includes the entire specification in an appendix.
J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, eds., "MPEG Video Compression Standard," Chapman and Hall, 1996. A very good book on MPEG video; it includes detailed material on both MPEG-1 and MPEG-2.
Y. Fisher, ed., "Fractal Image Compression," Springer-Verlag, New York, 1995. A very good reference for fractal compression as applied to images. It does not, however, include contributions from Jacquin/Barnsley who pioneered the approach.
M. Vetterli and J. Kovacevic, "Wavelets and Subband Coding," Prentice Hall, Englewood Cliffs, New Jersey, 1995. Excellent reference on wavelets and filter banks.
A. Gersho and R. M. Gray, "Vector Quantization and Signal Compression," Kluwer Academic Publishers, Boston, Massachusetts, 1992. A more in-depth treatment of quantization, with approximately half of the book dedicated to vector quantization. This material is not covered in the course.
IEEE Proceedings, Special Issue on Wavelets, J. Kovacevic and I. Daubechies, eds., Vol. 84, No. 4, April 1996. A good general reference on wavelets, covering several applications (not just compression).
L. Torres and M. Kunt, eds., "Video Coding: The Second Generation Approach," Kluwer Academic, Boston, Massachusetts, 1996. A collection of research papers on new approaches to video compression.
O. Avaro, P. Chou, A. Eleftheriadis, C. Herpel, and C. Reader, "The MPEG-4 System and Description Languages," Signal Processing: Image Communication, Special Issue on MPEG-4, 1997 (to appear). This paper is also available at http://www.ee.columbia.edu/~eleft/papers/icj97-oa.html.

The MPEG-2 Video Specification

The following is a draft of the MPEG-2 Video International Standard (dated January 20, 1995). Only minor corrections have appeared after this version. The official version is available from ANSI.

ISO/IEC International Standard 13818-2, ITU-T Recommendation H.262, Information Technology - Generic Coding of Moving Pictures and Associated Audio (is138182.pdf, 937KB).

A. Eleftheriadis, [email protected]