A Syntactic Description Language for MPEG-4

Alexandros Eleftheriadis
Department of Electrical Engineering, Columbia University

Contribution M0546, 32nd MPEG Meeting, Dallas, Texas, November 1995

Introduction

This document proposes a concrete starting point for a syntactic representation framework for audio-visual objects, as embodied in MSDL. It builds on the C-like syntax that has been successfully used to describe the structure of coded audio-visual components in MPEG-1 and MPEG-2. This syntax has two levels: a textual one that associates coded information with meaningful names (e.g., picture_start_code), and a binary one that denotes the actual bits placed in the bitstream. MSDL's first and foremost purpose is to describe coded audio-visual information; as such, it must associate bit-level representations with meaningful quantities, and describe the ways in which these can be combined to form valid bitstreams. For this purpose, the existing framework of MPEG-1/2 is deemed a very suitable starting point.
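The two levels can be illustrated with a minimal sketch in Python. It is not MSDL syntax: the `BitReader` class, the `PICTURE_HEADER` table, and the `parse` function are hypothetical names introduced here for illustration. The field names and widths follow the first fields of the MPEG-1/2 video picture header; the example bytes are made up.

```python
class BitReader:
    """Reads unsigned integers of arbitrary bit width, most significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, nbits: int) -> int:
        if self.pos + nbits > 8 * len(self.data):
            raise EOFError("not enough bits left in the bitstream")
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Textual level: meaningful names.
# Binary level: the number of bits each name occupies in the bitstream.
PICTURE_HEADER = [
    ("picture_start_code", 32),
    ("temporal_reference", 10),
    ("picture_coding_type", 3),
    ("vbv_delay", 16),
]


def parse(fields, data: bytes) -> dict:
    """Walk the field table in order and bind each name to its decoded value."""
    reader = BitReader(data)
    return {name: reader.read(width) for name, width in fields}


# Made-up input: start code 0x00000100, temporal_reference 5,
# picture_coding_type 1, vbv_delay 0xFFFF, then 3 padding bits.
example = bytes([0x00, 0x00, 0x01, 0x00, 0x01, 0x4F, 0xFF, 0xF8])
header = parse(PICTURE_HEADER, example)
```

Keeping the description as data (a table of names and widths) rather than hand-written parsing code is what makes the same description usable both for documentation and for machine translation into a parser.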

The features added to the old scheme include object-orientation and more thoroughly defined semantics that can be used for potential machine translation. Even though the current short-term focus in MSDL is Level-1 (L1) functionality (numbering levels from 0 to 2), the long-term goal is to provide full Level-2 (L2) capabilities; care must therefore be taken in the syntactic description to allow this to happen. The primary difference between L1 and L2 is that in the latter one can specify new tools. This necessitates an MSDL that can actually describe new syntactic components (e.g., an octagonal block) and their processing (or decoding) algorithms, which in effect makes it an application-specific programming language. MSDL must therefore be able to describe the following three items:

  1. syntax (similar to MPEG-1 and MPEG-2; sufficient for L0 and L1),
  2. syntactic description (meta-data, used to convey the structure of new syntactic elements to the decoder, for example the octagonal block; needed for L1 and L2),
  3. algorithm description (the algorithmic part of the decoder; needed only for L2).

This document primarily describes the data representation aspects of this approach, i.e., how one can describe the data that are placed in the bitstream. If, however, one wants to add L2 functionality, it is straightforward to continue and add items 2 and 3. The process for doing so is outlined briefly. For item 2, it is simply a matter of using a bit-efficient binary representation of the formalism used for item 1. For item 3, the process is more involved, and necessitates the use of a virtual machine.
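As a rough illustration of what a bit-efficient binary representation of a syntactic description might look like, the sketch below packs a list of field widths into bytes and recovers it again. The layout is purely an assumption made for this example and is not taken from the proposal: an 8-bit field count followed by 6 bits per field holding the width minus one. The function names `encode_description` and `decode_description` are likewise hypothetical.

```python
def encode_description(widths):
    """Pack field widths (each 1..64 bits) into a compact byte string.

    Assumed layout: 8-bit field count, then 6 bits per field storing
    (width - 1), zero-padded to the next byte boundary.
    """
    if len(widths) > 255:
        raise ValueError("at most 255 fields fit in the 8-bit count")
    bits = format(len(widths), "08b")
    for width in widths:
        if not 1 <= width <= 64:
            raise ValueError("width must be in 1..64")
        bits += format(width - 1, "06b")
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def decode_description(data):
    """Inverse of encode_description: recover the list of field widths."""
    bits = "".join(format(byte, "08b") for byte in data)
    count = int(bits[:8], 2)
    return [int(bits[8 + 6 * i:14 + 6 * i], 2) + 1 for i in range(count)]


# Four fields of 32, 10, 3 and 16 bits: 8 + 4 * 6 = 32 bits, i.e. 4 bytes.
widths = [32, 10, 3, 16]
encoded = encode_description(widths)
decoded = decode_description(encoded)
```

In this sketch only the widths are transmitted; a real scheme would also need to convey types, nesting, and conditional structure, which is where the design effort for item 2 would go.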

It is important to note that this framework is in no way restricted to A/V objects in their classical sense. In fact, it can describe any structure that is represented by a series of bits. This includes synthetic objects based on 2-D or 3-D graphics models. Concreteness is necessary in order for MSDL to become useful in specifying coding tools and algorithms, while generality is needed so that unanticipated developments can be easily accommodated. It is worth noting that the framework described here can describe the entire current set of MPEG specifications.

We should point out that this document describes work in progress, and should be considered as such.
