E6880: Topics in Signal Processing (Media Representation)


Homeworks



Homework 1 (due Feb. 18)

Cover and Thomas, Chapter 2, Problems 10, 16 (a)-(e), 21. Chapter 4, Problem 5 (a)-(d). Chapter 5, Problems 4 (a)-(b), 6, 8, 15.

Read the Flavor description (will be covered later on in class).

Download Microsoft Word for Windows file: flavor13.zip (15 KB)
Download PostScript file: flavor-1.3.ps (230 KB)

Solutions (prepared by Y. Fang)

Homework 2 (due Mar. 4)

Theory

Cover and Thomas, Chapter 12, Problem 12. Make sure you indicate the contents of the dictionary after encoding is complete. Lempel-Ziv compression is covered in Chapter 12, Section 10 of the Cover and Thomas book (p. 319).

Practice

The following two problems have the following purposes: 1) to familiarize you with the details of the GIF file format, 2) to familiarize you with Flavor, and how it can be used to represent bitstram syntax, and 3) to get you started in terms of image representation programming. If you have trouble figuring out how the Flavor language works let me ([email protected]) know; we can devote some time in the next lecture to go through it in some detail. As it stands, I am assuming that, with some minimal programming background, it should be fairly easy to follow, so I am only going to briefly discuss it in the next lecture.

Problem 1 (Flavor)

Use Flavor to represent the GIF87a bitstream specification. Go as deep in the structure of the data as you can. You should at least be able to represent everything except the pixel data as LZW codes. You might want to use the Flavor translator to C++ to check that the Flavor code runs through the translator (flavorc, see below) without warning (i.e., it is valid Flavor code). The translator is available for almost all Unix platforms.

Problem 2 (Web browser robustness test)

Write a simple program that reads a GIF file, modifies one of the headers (not the LZW-coded data!), and creates a new output GIF file. Pick various values (both valid and invalid) and display the file using a web browser of your choice. Report on how the browser reacts to the various parameters. For example, you might want to put invalid image width or height, or try different screen widths and heights, or change the background color. Together with your report, submit also the source code as well as the original and modified GIF files (mail them directly to the TA).

If you already know some C++ or are willing to learn its very basics, you could use the result of Problem 1 and our flavorc translator to automatically generate the C++ code that will parse the data for you.

Reading Assignment

A commonly used format in Unix workstations is PBM Plus (developed by Jef Poskanzer). It's a quite simple format, deriving its power by the fact that the author has written more than 120 utilities to convert from/to this format and many many other popular ones (GIF, TIFF, XBM, PostScript, etc.). PBM Plus actually consists of three very similar but different formats: PBM (Portable Bit Map - black and white images), PGM (Portable Gray Map - grayscale images), and PPM (Portable Pixel Map - color images). They all consist of a simple ASCII header for the width, height, and maximum value for the pixels, followed by the pixel values in ASCII form. There is also a "binary" version, in which the pixel data is represented in binary format (one byte per pixel per component) to minimize the file size. In either case, the data is represented uncompressed. Thus, PBM Plus is very close to a "raw" image file (binary data with no header).

Supporting Material

Read the GIF87a specification: GIF87a.txt

A very good explanation of how the LZW compression algorithm works, as well as its variation within the GIF specification (by J. Barkaus): zlw.txt

The newer GIF89a specification is also available, in case you are interested: GIF89a.txt

The Flavor web site (currently under construction, but you can get the flavorc programs): http://www.ee.columbia.edu/flavor (PS: We should be releasing Version 1.2 by Feb. 24)

The PBM Plus file formats: PBM, PGM, PPM.

Solutions (prepared by Y. Fang)

Homework 3 (due Apr. 1)

Transform coding, and in particular DCT, is discussed in Chapter 6 of Haskell et al.

Evaluate the 2-D DCT of the following two 4x4 signals x1(m,n), x2(m,n).

x1(m,n)
m\n
(0)
(1)
(2)
(3)
(0)
0
0
1
1
(1)
0
0
1
1
(2)
0
0
1
1
(3)
0
0
1
1

x2(m,n)
m\n
(0)
(1)
(2)
(3)
(0)
0
1
0
1
(1)
1
0
1
0
(2)
0
1
0
1
(3)
1
0
1
0

After you have found X1(u,v) and X2(u,v) for u, v=0, 1, 2, 3 (and you have understood which frequency corresponds to each integer value of u, v), make some comments about the high spatial frequency content of an edge (with a specific orientation) and of noise (note, however, that x2(m,n) is not really random-looking).

Now discard the highest frequency coefficient Xi(3, 3) for both i=1, 2. In other words, consider Yi(u,v)= Xi(u,v)S(u,v), in which S(u,v)=0 if u=v=3, and 1 otherwise. Evaluate the inverse 2-D DCT yi(m,n) for i=1, 2. Discuss your results. Compute the mean squared error in both cases.

Note: To simplify your computations, first obtain the 4x4 DCT matrix, and then compute the transformations using matrix multiplication. You can use Matlab or another programming facility if you want to, but in this case you must include your source code.

Solutions (prepared by Y. Fang)

Homework 4 (due Apr. 22)

Write a program in your favorite programming language that computes the motion vector field (the set of motion vectors) between two frames. Consider only forward motion vectors (i.e., P picture prediction), full-pel accuracy, a macroblock (motion estimation block) size of 16, and a search range of +/- 16. Use full-search block matching and an additional block-based motion estimation technique of your choice (conjugate directions, n-step, etc.).

Run your program on the following input frames (luminance only), given in raw format: m000.raw.Z or m000.zip (reference), m001.raw.Z or m001.zip (current). Both are of size 704x480. Your program should output an ASCII file, containing one line per motion vector, with the two values separated by a single space. Compute the running time of your program for the two implemented motion estimation techniques.

Extend your program to compute the number of bits required to code the entire set of motion vectors using the VLC Table 7.2 (page 155, Haskell et al.). Use differential coding with restart at the beginning of each line of mabcroblocks. Note: you do not have to produce the coded bits; just compute how many of them are required.

Solutions (prepared by Y. Fang)



Back to E6880 Home Page


A. Eleftheriadis, [email protected]
General Syllabus Homeworks Projects Announcements