As the course information indicates, the class requires a programming project. There are no specific requirements on the project, except that it should involve programming and media representation, and be of non-trivial size. Typical projects run from 2,000 to 10,000 lines of code, usually somewhere in the middle. Any subject can be pursued, but it first has to be approved by the instructor. You can email a one-paragraph description of your idea, or discuss it during the regular office hours or after class.
There is no requirement in terms of the programming language used for the project, and you can use whatever you are most comfortable with (C++, C, Pascal, Fortran, etc.). Use of high-level languages such as Matlab or Mathematica is not allowed, because they are particularly bad for media representation anyway. Also, any computing facility can be used (Unix workstations, PCs, Macs, etc.). Students are particularly encouraged to consider using Flavor in their projects, or making Flavor their project as well (see below).
At the end of the project, on April 29th, you are expected to present your project in class. You should also submit a short report describing what you have done and the results you obtained, along with your source code and representative results. Reports should be written in HTML, accompanied by the necessary graphics (HTML is another form of media representation, and this way we can make sure that everyone knows how it works).
The most natural application for media representation projects is encoders and decoders. As a result, any compression algorithm that we have discussed in class is a good candidate for a project. Examples include:
Even though the projects mentioned above are image- and video-centric, projects that deal with audio compression (e.g., using LPC or CELP) are perfectly acceptable as well.
You can also create non-standard codecs (i.e., codecs that do not comply with any particular industry specification). These can try to improve existing designs or provide special features that are not supported by current representation schemes.
Other project areas of interest are applications that use media representation in non-trivial ways. For example, you could implement a simple image manipulation and editing tool in Java. The tool would support a file format that, in addition to the image data, also contains the shape of the object depicted in the image. One way to do this would be to use the transparent color index defined in GIF89a, in the same way that Web browsers use it to ignore pixels of that color.
Another example of an application would be an animated GIF composer, i.e., a program that takes a set of simple GIF files and creates an animated GIF file. The program should be interactive, and allow the user to control the animation. The program could also deviate from the GIF format, and include support for animated text, etc.
The above are just representative examples, and are in no way exhaustive.
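Both of the GIF-based ideas above rely on a GIF89a feature that GIF87a lacks: the Graphic Control Extension, an 8-byte block placed before an image that carries the frame delay used by animated GIFs (in hundredths of a second) and an optional transparent color index. The following is a minimal sketch of building that block, following the layout in the GIF89a specification; the function name gif_graphic_control is hypothetical, and a complete composer would still need the GIF header, color tables, and LZW-compressed image data.

```c
#include <stddef.h>

/* Fill buf with a GIF89a Graphic Control Extension (8 bytes). It precedes
 * an image in the data stream and sets:
 *   - delay_cs: how long to show the image, in hundredths of a second
 *   - transparent: a color index (0-255) to leave undrawn, or < 0 for none
 *   - disposal: what to do with the image afterwards (3-bit field)
 * gif_graphic_control is a hypothetical helper name. Returns 8. */
size_t gif_graphic_control(unsigned char buf[8], unsigned delay_cs,
                           int transparent, unsigned disposal)
{
    buf[0] = 0x21;                                   /* extension introducer */
    buf[1] = 0xF9;                                   /* graphic control label */
    buf[2] = 0x04;                                   /* block size */
    buf[3] = (unsigned char)(((disposal & 0x07) << 2)    /* bits 4-2 */
                             | (transparent >= 0 ? 0x01 : 0x00)); /* bit 0 */
    buf[4] = (unsigned char)(delay_cs & 0xFF);       /* delay, low byte */
    buf[5] = (unsigned char)((delay_cs >> 8) & 0xFF);/* delay, high byte */
    buf[6] = (unsigned char)(transparent >= 0 ? transparent : 0);
    buf[7] = 0x00;                                   /* block terminator */
    return 8;
}
```

For example, a delay of 300 (3 seconds) is stored little-endian as the bytes 0x2C 0x01.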
Projects that use Flavor in their implementation are of particular interest to us, and are therefore strongly encouraged. They will also allow you to learn how the syntax of the forthcoming MPEG-4 standard is specified. A translator from Flavor to C++ is already available, and one for Java is in the works (an alpha version should be available soon). As the Flavor tools are still under development, projects that use Flavor will have to interact with the development team to iron out bugs and problems. To compensate for the potential time lost because of the status of the development tools, and in recognition of the exploratory nature of the work, special consideration will be given when assessing the project's grade.
If your project involves images, you will need to know some basics about how to deal with them, as well as some samples to start playing with. (Note: If you are planning to work on audio, please contact the instructor for further information.)
Image file format refers to the way an image is written in a computer file. Over the years, a very large number of formats have been developed. Below we describe some of the most popular ones.
The raw format is the most trivial one: the pixel values are saved in raster-scan order (left-to-right, top-to-bottom), one value at a time. For grayscale images with values in the range 0-255, 8 bits (1 byte) per pixel suffice. Consequently, an NxM image occupies NxM bytes on disk, with each byte corresponding to the value of one pixel.
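A minimal sketch of reading such a file in C, assuming the caller already knows the width and height (the format itself does not record them). The names read_raw and raw_pixel are hypothetical, not part of the course software.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a width x height 8-bit grayscale image stored in raw format
 * (raster-scan order, one byte per pixel). Returns a malloc'ed buffer of
 * width*height bytes, or NULL on allocation failure or a short file. */
unsigned char *read_raw(FILE *fp, int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    unsigned char *img = malloc(n);
    if (img == NULL)
        return NULL;
    if (fread(img, 1, n, fp) != n) {   /* file shorter than expected */
        free(img);
        return NULL;
    }
    return img;
}

/* Pixel at row r (0 = top) and column c (0 = left). Raster-scan order
 * stores the rows one after another, each one left to right. */
unsigned char raw_pixel(const unsigned char *img, int width, int r, int c)
{
    return img[(size_t)r * width + c];
}
```

Because the buffer is stored row by row, pixel (r, c) lives at offset r * width + c; passing the wrong width does not cause an error, it simply scrambles the image.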
The drawback of this format is that the dimensions of the image cannot be inferred by inspecting the file. For example, a 256x512 image and a 512x256 image both occupy 131,072 bytes, and cannot be told apart unless the image is displayed on the screen.
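To make the ambiguity concrete, the following hypothetical helper (not part of any course program) lists every width/height pair consistent with a given file size. For a 131,072-byte file and sides of at least 256 pixels, it reports both 256 x 512 and 512 x 256.

```c
#include <stdio.h>

/* Print every width x height pair whose product is nbytes, i.e. every size
 * a headerless 8-bit raw file of nbytes bytes could have, keeping only
 * sides of at least min_side pixels. Returns the number of candidates. */
int raw_candidate_sizes(long nbytes, long min_side)
{
    int count = 0;
    if (min_side < 1)
        min_side = 1;                 /* avoid dividing by zero */
    for (long w = min_side; w * min_side <= nbytes; w++) {
        if (nbytes % w == 0) {        /* height = nbytes / w >= min_side */
            printf("%ld x %ld\n", w, nbytes / w);
            count++;
        }
    }
    return count;
}
```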
All other formats were created so that the file itself contains useful information about the image, including its size, color map entries (for pseudo-color images, i.e., those that use Color Look-Up Tables, or CLUTs), etc.
The PBM, PGM, and PPM formats are part of a very popular public domain package for Unix systems called pbmplus. It defines three basic formats: PBM (Portable Bitmap, for bi-level images), PGM (Portable Graymap, for grayscale images), and PPM (Portable Pixel Map, for color images). It also includes more than 100 utilities to convert from/to other formats, including raw, GIF, JPEG, G3, TIFF, etc. It acts as a least common denominator, trying to alleviate the problems caused by the current proliferation of formats.
If you are on a Unix system, it will probably be convenient either to work directly with one of the PBMplus formats, or to process your images in the raw format and convert to PBM/PGM/PPM only when you want to display your results. Please consult the on-line manual pages of your system for more information on the structure of these files (for example, type man 5 pbm; section 5 of the manual documents file formats).
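If you keep your images in raw format, converting them to PGM for display only requires prepending a short text header. The following is a minimal sketch based on the binary ("P5") variant of PGM described in pgm(5), with the maximum gray value assumed to be 255; write_pgm is a hypothetical name. Note that, unlike the headerless raw format above, the binary PGM variant still carries a header.

```c
#include <stdio.h>

/* Write a width x height 8-bit grayscale image as a binary PGM: the magic
 * number "P5", the width, the height, and the maximum gray value, separated
 * by whitespace; then a single whitespace character (here '\n') and the
 * pixels in raster-scan order. Returns 0 on success, -1 on error. */
int write_pgm(FILE *fp, const unsigned char *img, int width, int height)
{
    size_t n = (size_t)width * (size_t)height;
    if (fprintf(fp, "P5\n%d %d\n255\n", width, height) < 0)
        return -1;
    return fwrite(img, 1, n, fp) == n ? 0 : -1;
}
```

For a 3x2 image the header "P5\n3 2\n255\n" takes 11 bytes, so the file is 17 bytes long in total.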
Some system administrators install the PBMplus programs without the relevant on-line manual pages. For your convenience, I have placed copies here: pbm.5, pgm.5, ppm.5.
GIF (Graphics Interchange Format) is a very popular format, especially on PCs and Macs, that was developed by CompuServe. It supports pseudo-color images, using a CLUT. It provides lossless compression of the image as it is stored in the file, as well as four-pass interlaced storage, so that a rough outline of the image is available very quickly on the display (this is a very popular option in Web pages, since many people access the Web over slow modem lines). Two versions have been developed: GIF87a and GIF89a.
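The four interlacing passes follow a fixed row order, given in the GIF specifications: every 8th row starting at row 0, then every 8th row starting at row 4, then every 4th row starting at row 2, and finally every 2nd row starting at row 1. A small sketch that computes this order (gif_interlace_order is a hypothetical name):

```c
/* Fill order[] with the row numbers of an interlaced GIF image in the
 * order they are stored in the file (4 passes). order[] must have room
 * for height entries. Returns the number of rows written (= height). */
int gif_interlace_order(int height, int order[])
{
    static const int start[4] = {0, 4, 2, 1};   /* first row of each pass */
    static const int step[4]  = {8, 8, 4, 2};   /* row spacing per pass */
    int k = 0;
    for (int pass = 0; pass < 4; pass++)
        for (int row = start[pass]; row < height; row += step[pass])
            order[k++] = row;
    return k;
}
```

For a 10-row image the stored order is 0, 8, 4, 2, 6, 1, 3, 5, 7, 9, so a viewer that replicates each early row downward can show a coarse version of the image after the first pass.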
GIF is not ideal for image processing with grayscale images, because of the presence of the CLUT. A properly constructed grayscale GIF must have a color map with entries starting from (0, 0, 0), (1, 1, 1), up to (255, 255, 255). The pixel values can then be considered as regular grayscale levels.
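A sketch of building such a color map, and of checking whether an existing one has this form, assuming the standard GIF layout of 3 bytes (red, green, blue) per entry. The function names are hypothetical.

```c
/* Build a 256-entry GIF color table in which entry i is (i, i, i), so that
 * color-map indices can be read directly as gray levels. Each entry is
 * stored as 3 bytes: red, green, blue. */
void gray_color_table(unsigned char table[256 * 3])
{
    for (int i = 0; i < 256; i++) {
        table[3 * i + 0] = (unsigned char)i;   /* red */
        table[3 * i + 1] = (unsigned char)i;   /* green */
        table[3 * i + 2] = (unsigned char)i;   /* blue */
    }
}

/* Return 1 if a color table of n entries is exactly the identity
 * grayscale map above, 0 otherwise (e.g. a reordered palette). */
int is_identity_gray(const unsigned char *table, int n)
{
    if (n != 256)
        return 0;
    for (int i = 0; i < n; i++)
        if (table[3 * i] != i || table[3 * i + 1] != i || table[3 * i + 2] != i)
            return 0;
    return 1;
}
```

If the check fails, the pixel values are palette indices rather than gray levels, and each must be mapped through the color table before processing.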
You can use existing tools to convert your raw images to this format. The PBMplus package supports GIF. You can read the GIF87a and GIF89a specifications below:
GIF87a.txt
GIF89a.txt
If your project involves imagery, you will need to view your source or output images. A variety of programs exist for that purpose. Netscape and other Web browsers have internal support for viewing a variety of image formats, including GIF and JPEG (a sophisticated lossy compression standard that we will discuss in class). Unfortunately, none of them supports any of the PBMplus formats.
If you are on a Unix system running the X Window System (X11), then your best choice is the program xv. It supports all the PBMplus formats, GIF, and many others, and it offers a powerful interface for manipulating images.
If you are working on a PC running Windows (Windows 3.1, Windows 95, or Windows NT 3.x/4.x), you can either use existing imaging software (e.g., the very powerful but expensive Adobe PhotoShop) or one of the shareware or public domain viewers that are abundant on the Internet. You can search for those at the Yahoo directory (www.yahoo.com). I personally use LView Pro, which you can download from here (lview.zip).
On a Mac, as on the PC, you can either use a commercial package (PhotoShop, if you already have it) or a shareware or public domain viewer.
Determines whether two files (e.g., two images) are byte-by-byte identical. If they are, it returns silently (no output); if they are not, it prints the position of the first differing byte, counted from the start of the file, together with a line number. The line number is not very useful for images, since a line is understood in a textual sense (i.e., a new line starts whenever the character '\n' (or 0x0A) appears).
Shows how two versions of a program are different.
Shows the contents of an image file in hexadecimal. Useful for checking general characteristics of an image, such as whether it is all zeros or a constant gray level.
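When two raw images differ, what you usually want to know is which pixel differs rather than which byte. The following is a minimal sketch, assuming 8-bit raw images whose width you know; first_diff_offset and offset_to_pixel are hypothetical helpers, not standard utilities.

```c
#include <stdio.h>

/* Compare two files byte by byte. Returns -1 if they are identical,
 * otherwise the 0-based offset of the first difference (a length
 * mismatch counts as a difference at the end of the shorter file). */
long first_diff_offset(FILE *a, FILE *b)
{
    long offset = 0;
    for (;;) {
        int ca = fgetc(a), cb = fgetc(b);
        if (ca != cb)
            return offset;      /* differing bytes, or one file hit EOF */
        if (ca == EOF)
            return -1;          /* both ended together: identical */
        offset++;
    }
}

/* For a raw 8-bit image of the given width (> 0), turn a byte offset into
 * the (row, column) of the pixel, counting from 0 at the top-left. */
void offset_to_pixel(long offset, int width, long *row, long *col)
{
    *row = offset / width;
    *col = offset % width;
}
```

For a 256-pixel-wide image, a first difference at byte offset 517 (counting from 0) is the pixel at row 2, column 5.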
If you are not very familiar with C but would like to use it for your project, there are several good books around. The definitive book is The C Programming Language by Kernighan and Ritchie (2nd ed.). For C++, you may want to use Stroustrup's The C++ Programming Language (2nd ed.). Both the Columbia Bookstore and Papyrus carry several popular books on various programming languages, as do, of course, major technical bookstores in the city and elsewhere.
You can use your browser's 'Save As ...' option to save the file on your system.
To compile the program on a Unix system, use the command:
'cc' is the C compiler that is common in older Unixes, and it supports only Kernighan and Ritchie (K&R) C. You will be better off using a more advanced compiler, supporting the ANSI C standard. 'gcc' is the GNU C compiler, which supports ANSI C as well as K&R C. It's freely available, so almost all Unix installations have it (if not, your system administrator is not doing a good job).
This program will most likely not compile on PCs or Macs, because the C/C++ compilers for such systems typically lack support for the getopt() function that is used within the program. In these systems, you can use the following version:
To run the program on a 512x256 image called foo.raw, simply run the command:
If the width or height is 256, you don't need to specify them in the command line. In addition, the program can also work using the standard input and standard output. As an example, the above command will work equally well as follows:
For the alternate version with no getopt() support, you must specify the image width and height every time, and there is no support for the program to work as a filter. Hence the above command should be written as:
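Since the missing piece on PCs and Macs is only getopt(), one portable alternative is to parse the command line by hand. The following sketch keeps the defaults (256) and the filter behavior described above; the option letters -w and -h, the argument order, and the names raw_args and parse_args are assumptions for illustration, not the actual interface of the course program.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Command-line settings for a raw-image filter. NULL file names mean
 * standard input / standard output. */
struct raw_args {
    int width, height;
    const char *in, *out;
};

/* Hypothetical syntax: [-w width] [-h height] [infile [outfile]],
 * with width and height defaulting to 256. Parsed by hand, without
 * getopt(). Returns 0 on success, -1 on a malformed command line. */
int parse_args(int argc, char *argv[], struct raw_args *a)
{
    int i = 1;
    a->width = a->height = 256;
    a->in = a->out = NULL;
    while (i < argc && argv[i][0] == '-') {
        int *dst;
        char *end;
        long v;
        if (strcmp(argv[i], "-w") == 0)
            dst = &a->width;
        else if (strcmp(argv[i], "-h") == 0)
            dst = &a->height;
        else
            return -1;                      /* unknown option */
        if (i + 1 >= argc)
            return -1;                      /* option given without a value */
        v = strtol(argv[i + 1], &end, 10);
        if (*end != '\0' || v <= 0 || v > INT_MAX)
            return -1;                      /* not a positive integer */
        *dst = (int)v;
        i += 2;
    }
    if (i < argc)
        a->in = argv[i++];
    if (i < argc)
        a->out = argv[i++];
    return i == argc ? 0 : -1;              /* too many file names */
}
```

In main(), a NULL in or out would be replaced by stdin or stdout, which is what lets such a program work as a filter.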
This is a program that computes the Mean Squared Error (MSE), Signal to Noise Ratio (SNR), and Peak SNR between two images in raw format. As before, the program accepts as options the width and height of the image (both default to 256), and requires the names of the raw original and processed files. You can access the source code in:
You can use your browser's 'Save As ...' option to save the file on your system.
To compile the program on a Unix system, use the command:
The program is written in ANSI C, so the 'cc' compiler on your Unix system may not be able to compile it. In that case, you can use 'gcc', or ask your system administrator for the location of an ANSI C compiler in your system. The -lm option links in the math library, which provides the logarithm functions used in the program (among several other math functions).
As with 'rawtopgm', this code will most likely not compile on PCs or Macs, because the C/C++ compilers for such systems typically lack support for the getopt() function that is used within the program. In these systems, you can use the following version:
To run the program on two 512x256 images called orig.raw and foo.raw, simply run the command:
If the width or height is 256, you don't need to specify them in the command line. As an example, the above command will work equally well as follows:
For the alternate version with no getopt() support, you must specify the image width and height every time. Hence the above command should be written as:
If you have any questions, please contact the instructor or the TA.
The following four sample images can be used to work on your project. They exhibit several different image features, and are good for testing algorithms in a variety of situations. The images are available as GIF, "raw", and PGM files. The latter two are compressed (using both 'pkzip' and 'compress') so that they are easier to download. The GIF files are there to allow you to preview the images with your Web browser.
To uncompress the .raw or .pgm files on Unix, download the file and run the command: uncompress lenna.raw.Z. This will create the original file lenna.raw in the current directory. On a PC, just use the pkunzip program on the .zip file.
This is undoubtedly the most used image in the field of Digital Image Processing.
View GIF file lenna.gif
Download compressed "raw" file lenna.raw.Z
Download compressed PGM file lenna.pgm.Z
Download zip-ed "raw" and PGM files lenna.zip
View GIF file kiel.gif
Download compressed "raw" file kiel.raw.Z
Download compressed PGM file kiel.pgm.Z
Download zip-ed "raw" and PGM files kiel.zip
View GIF file house.gif
Download compressed "raw" file house.raw.Z
Download compressed PGM file house.pgm.Z
Download zip-ed "raw" and PGM files house.zip
View GIF file lake.gif
Download compressed "raw" file lake.raw.Z
Download compressed PGM file lake.pgm.Z
Download zip-ed "raw" and PGM files lake.zip
The following books and journals contain useful material relating to the course. It is not required or expected that you read any of these. They are made available in case you would like to explore a topic in more detail for your project or if you are just interested in learning more about a particular subject.
The following is a draft of the MPEG-2 Video International Standard (dated January 20, 1995). Only minor corrections have appeared after this version. The official version is available from ANSI.