Welcome To Yong Wang's Homepage!

Low-Complexity H.264 Decoder:
Motion Estimation and Mode Decision

Most of today's video coding systems encode video bit streams to achieve the best video quality under a given bitrate constraint. Meanwhile, many media devices, such as mobile handhelds, are getting smaller and lighter, and the computational resources available on them are becoming relatively scarce given the increasing functionality and complexity of the applications they run. As a result, there is growing interest in complexity-aware (power-aware) video coding solutions.

Emerging video coding standards such as H.264 achieve significant advances in improving video quality and reducing bandwidth, but at the cost of greatly increased computational complexity at both the encoder and the decoder. Playing videos encoded with such standards requires substantial computational resources, and thus power, on the handheld devices that are becoming increasingly popular in mobile applications.

Among the components of the decoding system, the interpolation procedure used in motion compensation consumes the largest share of computation (about 50%) because of the use of sub-pixel motion vectors. See the complexity breakdown below.

[Figure: Breakdown of H.264 decoding complexity for the test sequence Foreman]
(Reference: Lappalainen, V.; Hallapuro, A.; Hamalainen, T. D., "Complexity of optimized H.26L video decoder implementation", IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 717-725, July 2003.)
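
To make the cost of sub-pixel motion vectors concrete, here is a minimal one-dimensional sketch of luma interpolation. It uses the H.264 half-sample 6-tap filter (1, -5, 20, 20, -5, 1) with rounding, and the rounded average used for quarter-sample positions. The function names, the restriction to a single row, and the edge clamping are simplifying assumptions for illustration only; a real decoder works on two-dimensional blocks.

```python
# Illustrative 1-D sketch of H.264 luma sub-pixel interpolation.
# Function names and the single-row simplification are assumptions.

TAPS = (1, -5, 20, 20, -5, 1)  # H.264 half-sample luma filter taps


def clip8(value):
    """Clamp a value to the 8-bit sample range [0, 255]."""
    return max(0, min(255, value))


def integer_pel(row, i):
    """Integer position: a plain copy, no arithmetic needed."""
    return row[i]


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1].

    Needs six neighbouring samples (i-2 .. i+3), six multiplies,
    a rounding add, a shift, and a clip. Edges are clamped here.
    """
    n = len(row)
    acc = 0
    for k, tap in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), n - 1)
        acc += tap * row[j]
    return clip8((acc + 16) >> 5)


def quarter_pel(row, i):
    """Quarter-sample value: rounded average of the integer sample
    and the adjacent half-sample value (so it costs even more)."""
    return (row[i] + half_pel(row, i) + 1) >> 1
```

An integer motion vector only needs `integer_pel`, which is why steering the encoder away from fractional vectors removes filter work at the decoder.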

One way to reduce this major cost is to change the coding algorithm so that the generated bit streams contain fewer non-integer motion vectors and thus require fewer interpolation operations. The figure below shows the basic coding architecture of H.264. While reducing the number of sub-pixel motion vectors, we want to select the best block mode and the best motion vectors so that minimal video quality loss is introduced. The highlighted box indicates where our new algorithm is applied; it can be incorporated into any existing H.264 encoder. Note that the decoder is not changed at all.

We have developed a novel Complexity Adaptive Motion Estimation and Mode Decision (CAMED) system that improves the selection of motion vectors and motion compensation block modes in order to significantly reduce the computational cost while keeping the video quality virtually unchanged. Our extensive tests so far show a reduction of the interpolation cost at the decoder of 30%-60% while keeping the quality loss within 0.3dB.

We accomplish this goal by:

(1) applying a rigorous methodology to extend the conventional rate-distortion optimization framework to include the computation term, forming a rate-distortion-complexity optimization problem,

(2) developing a complexity model that can reliably determine the appropriate parameter (i.e., Lagrange multiplier) needed for optimizing the rate-distortion-complexity tradeoff relationships, and

(3) developing a complexity control algorithm to meet a specified target complexity level while keeping the complexity as consistent as possible throughout the video sequence.
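
The joint optimization in step (1) can be sketched as choosing, among candidate motion vectors, the one that minimizes a cost of the form J = D + lambda_r * R + lambda_c * C. This is only an illustrative sketch: the exact cost form, the names `interp_cost` and `select_mv`, and the 0/1/2 complexity units are assumptions, not the complexity model used in CAMED. Motion vectors are assumed to be in quarter-pixel units, as in H.264.

```python
# Hypothetical sketch of rate-distortion-complexity motion vector selection.
# The cost table below is illustrative, not the CAMED complexity model.


def interp_cost(mv):
    """Toy interpolation cost of a quarter-pel motion vector (x, y).

    0: both components are integer (no interpolation)
    1: exactly one component is fractional
    2: both components are fractional
    """
    frac_x = mv[0] & 3  # low two bits hold the quarter-pel fraction
    frac_y = mv[1] & 3
    return int(frac_x != 0) + int(frac_y != 0)


def select_mv(candidates, lambda_r, lambda_c):
    """Return the motion vector minimizing J = D + lambda_r*R + lambda_c*C.

    candidates: iterable of (mv, distortion, rate_bits) tuples.
    """
    def joint_cost(candidate):
        mv, distortion, rate_bits = candidate
        return distortion + lambda_r * rate_bits + lambda_c * interp_cost(mv)

    return min(candidates, key=joint_cost)[0]


# Made-up candidates: (mv in quarter-pel units, distortion, rate in bits).
candidates = [((4, 8), 1000, 10), ((5, 8), 980, 12), ((5, 9), 975, 14)]
best_rd = select_mv(candidates, lambda_r=4, lambda_c=0)    # plain R-D choice
best_rdc = select_mv(candidates, lambda_r=4, lambda_c=20)  # complexity-aware
```

With `lambda_c = 0` the sketch reduces to conventional rate-distortion selection and picks the fractional vector (5, 8); raising `lambda_c` shifts the choice to the integer vector (4, 8) at a small distortion penalty. A complexity control loop as in step (3) would adjust `lambda_c` to track a target complexity.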

Our method can be applied to any existing H.264 encoder system and is compatible with any standard-compliant decoder. Since the interpolation operation is the largest computational cost component at the decoder, our results have great potential for reducing the power consumption of any practical video decoding system based on recent video coding techniques such as MPEG-4, H.264, and Motion Compensated Embedded Zero Block Coding (MC-EZBC).

Our extensive experiments with different video contents, bit rates, and complexity levels show very promising results, reducing the number of interpolation operations by up to 60% while keeping the video quality almost intact (quality difference less than 0.2dB).

Specifically, the experiment conditions are summarized in the following table. H.264 JM82 was used.

The following figure shows an example of the performance obtained when applying CAMED to the 'stefan' sequence. We can see that when the motion compensation complexity is reduced by 60%, the quality degradation is nearly imperceptible (0.197dB on average in this case).

The following table summarizes the performance of complexity control at 1000kbps. Complexity Control Error measures the ability to achieve a specific complexity target. It is calculated as the difference between the actual resulting complexity and the target complexity, normalized by the target complexity. Complexity Saving is the percentage of the original computational cost that is removed. Quality Degradation is the quality difference (in PSNR) between the bit stream generated using the original H.264 encoder and the one generated using our complexity control method. These results confirm that large savings in computational complexity (30% to 60%) can be achieved with small quality degradation (about 0.3dB). The improvement varies across video clips, depending on the type of content and the complexity of the signal. The most challenging case is the 'Mobile' sequence, which has steady camera motion (slowly panning left), so the SKIP/DIRECT mode is frequently used. Even in this challenging case, our proposed CAMED method still achieves about 33% complexity saving while keeping the video quality more or less intact.
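
The three measures defined above can be written out as a short sketch. The function names and the example numbers are hypothetical, and the control error is kept signed here as an assumption (the definition above does not say whether an absolute value is taken).

```python
# Sketch of the three evaluation measures; names and inputs are illustrative.


def complexity_control_error(actual, target):
    """(actual - target) / target; positive means the target was overshot."""
    return (actual - target) / target


def complexity_saving(original, reduced):
    """Percentage of the original computational cost that is removed."""
    return 100.0 * (original - reduced) / original


def quality_degradation(psnr_original, psnr_camed):
    """PSNR loss in dB relative to the stream from the original encoder."""
    return psnr_original - psnr_camed


# Made-up numbers, only to show the arithmetic:
error = complexity_control_error(actual=105, target=100)    # 5% over target
saving = complexity_saving(original=250, reduced=100)       # 60% removed
loss = quality_degradation(psnr_original=36.5, psnr_camed=36.25)
```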

The following table summarizes the performance of complexity control at 100kbps with a frame rate of 10fps and I and P frames only. The performance at the lower bit rate is clearly better than at the higher bit rate. The reason is that at lower bit rates the motion information is less critical, because the residual errors are coarse due to severe quantization. In other words, it is more reliable to reduce the MC complexity with trivial quality degradation by using R-D-C optimized motion information. Typical power-aware video decoding scenarios (such as mobile video applications) usually involve low bit rates and low frame rates. Our results indicate that CAMED is an excellent solution for such applications.

We provide some test sequences below to demonstrate the complexity saving achieved by our proposed CAMED method, compared to the same sequences encoded by the H.264 JM82 reference software. Different video clips at different bit rates are provided. The "xxxK" suffix in each file name indicates the target complexity level.

To play the compressed video streams, you can use the H.264 player available at Moonlight. If you cannot find a suitable H.264 player (decoder), you can also directly use the attached YUV files (each around 45MB). To render the YUV files, you can use the tool VisGenie or YUVGenius.
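
If you prefer to compare the decoded YUV files numerically rather than visually, the sketch below reads the luma plane of each frame and computes PSNR. It assumes planar YUV 4:2:0 with 8-bit samples; the frame size must be supplied by the caller, since it is not stated on this page (CIF, 352x288, is a common choice for these sequences and is only an assumption here). The function names are illustrative.

```python
# Sketch: per-frame luma PSNR between two raw YUV 4:2:0 streams.
# Layout (planar, 8-bit) and frame size are assumptions of this example.
import io
import math


def read_luma_frames(stream, width, height):
    """Yield the luma (Y) plane of each frame as a bytes object."""
    luma_size = width * height
    chroma_size = 2 * (width // 2) * (height // 2)  # U and V planes
    while True:
        luma = stream.read(luma_size)
        if len(luma) < luma_size:
            return  # end of stream (or a truncated last frame)
        stream.seek(chroma_size, io.SEEK_CUR)  # skip chroma
        yield luma


def psnr(plane_a, plane_b):
    """PSNR in dB between two equally sized 8-bit planes."""
    mse = sum((a - b) ** 2 for a, b in zip(plane_a, plane_b)) / len(plane_a)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(255.0 ** 2 / mse)


def sequence_psnr(stream_a, stream_b, width, height):
    """Average luma PSNR over all frame pairs of two streams."""
    values = [psnr(a, b) for a, b in zip(read_luma_frames(stream_a, width, height),
                                         read_luma_frames(stream_b, width, height))]
    return sum(values) / len(values)
```

With files opened in binary mode, `sequence_psnr(open(path_a, "rb"), open(path_b, "rb"), 352, 288)` would give an average luma PSNR; comparing each decoded stream against the uncompressed source and subtracting the two results gives a quality degradation figure of the kind reported here.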

Sequence	Target Bit Rate (Kbps)	CAMED	H.264 JM82	Quality Degradation (dB)	Complexity Saving (%)
Foreman	1000	Foreman_1000_100K.264 (YUV)	Foreman_1000.264 (YUV)	0.28	60.15
Stefan	1000	Stefan_1000_150K.264 (YUV)	Stefan_1000.264 (YUV)	0.35	39.25
Mobile	1000	Mobile_1000_200K.264 (YUV)	Mobile_1000.264 (YUV)	0.31	33.27
Foreman	600	Foreman_600_100K.264 (YUV)	Foreman_600.264 (YUV)	0.32	59.26
Stefan	600	Stefan_600_150K.264 (YUV)	Stefan_600.264 (YUV)	0.34	39.27
Mobile	600	Mobile_600_250K.264 (YUV)	Mobile_600.264 (YUV)	0.20	25.55
Foreman	100	Foreman_100_10K.264 (YUV)	Foreman_100.264 (YUV)	0.16	81.70
Stefan	100	Stefan_100_20K.264 (YUV)	Stefan_100.264 (YUV)	0.09	65.64
Mobile	100	Mobile_100_20K.264 (YUV)	Mobile_100.264 (YUV)	0.17	66.08

The binary code for the H.264 encoder with our proposed CAMED technique can be downloaded here.

You can use any H.264 decoder to play the bit streams provided above. Here is one H.264 player available from Moonlight.

If you are interested in parsing the bit streams for more statistical parameters, you can download the latest H.264 reference software package.

The proposed CAMED system has great potential for realizing an efficient low-power video decoder product. Several interesting topics would benefit from further investigation. First, in practice, many video encoder implementations use fast motion estimation procedures to reduce power consumption on the encoder side. An interesting topic is to study how the proposed technique affects video quality and computational complexity when such fast encoder implementations are applied. Second, the actual computational performance on hardware platforms also depends on many other factors, such as hardware architecture, memory access strategy, and system-level design issues. It will be important to combine the advantages provided by our method (in reducing the sub-pixel interpolation operations) with architecture- or hardware-level power reduction strategies to achieve the overall target performance. Third, several components of the proposed framework, such as the complexity modeling, are not fully optimized and present interesting opportunities for further improvement.

Yong Wang, Shih-Fu Chang. Complexity Adaptive H.264 Encoding for Light Weight Streams. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Toulouse, France, 2006. [pdf] [Slides]
Yong Wang. Resource Constrained Video Coding/Adaptation. PhD Thesis, Graduate School of Arts and Sciences, Columbia University, 2005. [pdf]


	Copyright © By Yong Wang All Rights Reserved
	Last updated: Jan 14th, 2003