Yong Wang's Homepage | Research

Utility Based Video Adaptation and Content Based Optimal Adaptation Operation Prediction



Utility-based Video Adaptation
Content-based Optimal Adaptation Operation Prediction
    1) MPEG-4 FD-CD
    2) Spatio-Temporal Scalability of MC-EZBC using Subjective Quality Evaluation




Video adaptation allows direct manipulation of existing encoded video streams to meet new resource constraints without re-encoding the video from scratch. Many techniques exist for adapting videos to satisfy heterogeneous resource conditions or user preferences. Moreover, multi-dimensional scalable coding offers a flexible representation for video adaptation in multiple dimensions, such as spatial detail and temporal resolution, thus providing great benefits for universal media access applications. However, most existing adaptation research either focuses on the optimization of fixed, predefined video adaptation methods or approaches the multi-dimensional adaptation (MDA) problem in an ad hoc manner.

To provide a systematic solution, we present a general conceptual framework called the utility function (UF), which models video entities, adaptations, resources, utilities, and the mapping relations among them. The framework extends the conventional rate-distortion model in terms of flexibility and generality. UF allows various adaptation problems to be formulated as resource-constrained utility maximization, and supports adaptation schemes in an interoperable way.

Furthermore, in order to address the computational complexity of generating UF and to support real-time video transcoding, we present a content-based statistical approach that facilitates the prediction of UF for real-time transcoding of live videos. It formulates the prediction as a classification problem: each new video clip is classified into one of a set of distinct categories. If necessary, local regression can also be applied to refine the utility function. The adaptation operation is then guided by the predicted utility function. Our extensive experimental results based on MPEG-4 and MC-EZBC adaptation demonstrate that the proposed content-based prediction method achieves very promising performance.

Utility-based Video Adaptation


UF is an efficient tool for representing the relationship between the utility of a video and the required resources. UF originates from the Adaptation-Resource-Utility (ARU) space concept, in which the relationships among diverse types of adaptations, resources (e.g., bandwidth, power, and display) and utilities (e.g., objective or subjective quality) are modeled. We use the term “space” in a loose sense here to indicate the multiple dimensions potentially involved. Figure 1 depicts the definition of the ARU spaces. The entity refers to the basic unit of video data that undergoes the adaptation process. Adaptation operators are the methods used to reshape the video entities, such as requantization and frame dropping. All permissible adaptations for a given video entity constitute the adaptation space. Resources are constraints from terminals or networks, including bandwidth, display resolution, power, etc. Utility represents the quality of an entity when it is rendered on an end device after adaptation, such as PSNR, perceptual quality, or even high-level comprehensibility. Multiple utilities define the utility space.

The mapping relationship among the ARU spaces is illustrated in Figure 1. Typically, there exist multiple adaptation solutions that satisfy the same resource constraints while yielding different utilities. In Figure 1, the points in the oval-shaped region in the adaptation space indicate such a constant-resource region. It is this multi-option situation that makes the adaptation problem interesting: we want to choose the optimal option with the highest utility or the minimal resource. When we consider bandwidth as the resource and objective video quality as the utility, the relations among the ARU spaces can be represented by the UF, as indicated in the right part of Figure 1. Each point on the UF is associated with one specific adaptation operator, which may combine multiple operations (such as frame dropping and coefficient dropping).

Figure 1. The instantiation of utility function from the resource-adaptation-utility space mapping.
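The ARU relationship described above can be sketched in code. This is an illustrative model only: the operator names and (resource, utility) numbers are made-up assumptions, not values from the MPEG-21 DIA specification or our experiments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaptationPoint:
    operator: str        # hypothetical operator label
    resource: float      # e.g. required bandwidth in kbps
    utility: float       # e.g. PSNR in dB

# A utility function (UF) for one entity is the set of all permissible points.
# Note the two points with identical resource but different utility: they lie
# in the same constant-resource region of the adaptation space.
utility_function = [
    AdaptationPoint("no-drop",       900.0, 38.2),
    AdaptationPoint("drop-1B+10%CD", 700.0, 36.5),
    AdaptationPoint("drop-allB",     700.0, 35.1),
    AdaptationPoint("I-only+50%CD",  300.0, 29.8),
]
```

The constant-resource region is what makes the adaptation problem non-trivial: the two 700 kbps points above require identical bandwidth yet differ in utility.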

One example of using the utility-based framework to realize video adaptation is illustrated in Figure 2, where a two-dimensional adaptation space is constructed, each dimension of which is indexed by a finite set of adaptation operations. For a specific video clip, given an adaptation operator a_i, its corresponding resource and utility values are denoted as r_i and u_i. An operator specifies the exact adaptation parameter in each dimension to meet the resource constraint. Given a resource constraint r, all of the operators meeting that constraint, such as a_A and a_B in Figure 2, can be found from the UF, and the one with the optimal utility value is chosen (a_A in this case). Such a utility-based adaptation mechanism was also accepted as part of MPEG-21 Digital Item Adaptation (DIA).

Figure 2. Utility based video adaptation
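The selection step just described can be sketched as a small utility-maximization routine. The operator names and numbers below are hypothetical placeholders used only to illustrate the decision rule.

```python
def select_operator(utility_function, resource_budget):
    """Pick the highest-utility operator whose resource cost fits the budget.

    utility_function: list of (operator_name, resource, utility) tuples.
    """
    feasible = [p for p in utility_function if p[1] <= resource_budget]
    if not feasible:
        return None  # no operator satisfies the constraint
    return max(feasible, key=lambda p: p[2])

# Toy UF: a_A and a_B meet the same resource constraint but differ in utility.
uf = [
    ("a_A", 700.0, 36.5),
    ("a_B", 700.0, 35.1),
    ("a_C", 300.0, 29.8),
]
best = select_operator(uf, resource_budget=700.0)  # -> ("a_A", 700.0, 36.5)
```

With a budget of 700, both a_A and a_B are feasible and a_A wins on utility, mirroring the choice of a_A in Figure 2.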

Content-based Optimal Adaptation Operation Prediction


Utility-function-based transcoding is an efficient, systematic solution for choosing the optimal media transcoding operation to meet dynamic resource constraints. However, to date the real-time generation of utility functions has not been feasible due to computational complexity. We propose a novel content-based utility function prediction model, whose basic idea is to approach this problem from the standpoint of statistical pattern classification, using automatically extracted content features as input and the utility function category as the target.

The system architecture of the utility function prediction is shown in Figure 3 below. The upper part shows a high-level view of the utility-based adaptation procedure for an incoming live video. For each entity, the content features are extracted and the utility function prediction is applied. The adaptation engine then reshapes the stream according to the predicted utility function. The lower part details the utility function prediction module. It consists of an offline training routine and an online prediction routine: the former comprises unsupervised clustering, classification learning, and (optional) regression learning, while the latter includes online classification and linear regression. More specifically, for offline training we first build up a media pool using training video clips. For each clip in the pool, the utility function is computed in advance. The content features are then used to obtain clustering results, and each clip is mapped to one of the clusters accordingly. Given the labeled instances in the pool, the classification function is trained using the content features. In the online prediction routine, the content features of the incoming video are analyzed by the classification function. If necessary, according to the classification result, the corresponding regression model for the selected class can be activated to obtain the refined predicted utility function.

Figure 3. Overall architecture of the proposed content based utility function prediction framework
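The online prediction routine can be sketched as follows. The centroids, feature values, and per-class utility functions are made-up placeholders; in the real system these are learned offline (clustering plus classifier training), and an optional per-class regression refines the result.

```python
import math

# Hypothetical pre-trained cluster centroids in a 2-D content-feature space:
# (motion intensity, texture complexity), both normalized to [0, 1].
CENTROIDS = {
    "low-motion":  (0.2, 0.1),
    "high-motion": (0.8, 0.6),
}

# Hypothetical representative utility function per class.
CLASS_UF = {
    "low-motion":  [("keep-frames", 34.0)],
    "high-motion": [("keep-coeffs", 31.0)],
}

def classify(features):
    # Nearest-centroid decision; stands in for the trained classifier.
    return min(CENTROIDS, key=lambda c: math.dist(features, CENTROIDS[c]))

def predict_uf(features):
    # Look up the predicted class's representative utility function.
    return CLASS_UF[classify(features)]
```

A clip with high motion intensity would be routed to the "high-motion" class and adapted with that class's predicted utility function, all without generating the true UF at run time.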




1) MPEG-4 FD-CD

FD-CD stands for the combination of frame dropping (FD) and DCT coefficient dropping (CD). FD-CD provides freedom in optimizing the trade-off between spatial and temporal quality. Both operations can be implemented efficiently in the compressed domain without fully decoding the compressed streams.

In our MPEG-4 FD-CD experiment, we selected 2066 video clips, each one second long in SIF format (352x240). The clips were carefully selected to cover a wide range of content features. Every clip was extracted from within a shot, so no abrupt transitions such as shot changes occurred within a clip. The proposed prediction framework was tested using standard cross-validation, in which training and testing were done with random partitions of the pool (70% for training and 30% for testing) over multiple iterations. Four FD operators were adopted: “no frame dropped”, “the first B frame dropped in each sub-GOP”, “all B frames dropped”, and “all B and P frames dropped” (i.e., keeping only the I frames). For each FD operator, six CD levels were chosen: from 0% to 50% in 10% increments. As a result, there are four operation curves in each utility function.
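The operator grid described above can be enumerated directly. The FD labels below are shorthand for the four operators quoted in the text; combining them with the six CD levels yields the 24 adaptation operators (four curves of six points each in every utility function).

```python
# Four frame-dropping levels (shorthand labels for the operators in the text).
FD_LEVELS = ["none", "first-B-per-subGOP", "all-B", "all-B-and-P"]

# Six coefficient-dropping levels: 0% to 50% in 10% increments.
CD_LEVELS = [0, 10, 20, 30, 40, 50]

# Cartesian product: one (FD, CD) pair per adaptation operator.
operators = [(fd, cd) for fd in FD_LEVELS for cd in CD_LEVELS]
```

Each FD level contributes one operation curve, sampled at the six CD percentages, which is why every utility function in this experiment consists of four curves.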

We applied the above UF framework and demonstrated a content-adaptive, utility-based MPEG-4 FD-CD transcoding system. Content features such as texture complexity and motion intensity are extracted from each incoming video segment and used to predict the utility function based on pre-trained classifiers. The optimal adaptation operator among all possible options is then automatically selected based on the predicted utility function. Our extensive experiments show very accurate prediction of both the utility function and the optimal operator: up to 89% accuracy in choosing the transcoding operation with the highest quality from multiple alternatives meeting the same target bit rate. More importantly, the whole process of feature extraction, classification, and prediction can be done in real time without exhaustive comparison of the different options.

Figure 4 below shows a screenshot of our live demo system, which simulates the real-time utility function prediction procedure for MPEG-4 FD-CD. It also shows the extracted features, the dynamic network condition, a comparison of the actual and predicted utility functions, and a comparison of the final transcoded video quality.

Figure 4: Screen shot of the MPEG-4 FD-CD Adaptation Demo

2) Spatio-Temporal Scalability of MC-EZBC using Subjective Quality Evaluation

Motion-Compensated Embedded Zero Block Coding (MC-EZBC) is one of the latest motion-compensated 3-D subband/wavelet (MC-3DSBC) scalable video coding systems. Scalable video coding based on MC-3DSBC is becoming increasingly popular, as it yields coding performance competitive with state-of-the-art non-scalable codecs while providing low implementation complexity and high flexibility to match instantaneous network conditions and different receiver capabilities. The spatio-temporal scalability of MC-EZBC is realized by truncating spatial bit planes and temporal subbands, respectively, resulting in bit streams with a lower temporal sampling rate (i.e., frame rate) and degraded PSNR quality.
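The frame-rate effect of temporal subband truncation can be sketched with simple rate arithmetic. This models only the halving of the temporal sampling rate per dropped temporal decomposition level, not the actual bitstream truncation mechanics.

```python
def adapted_frame_rate(full_rate_fps, dropped_temporal_levels):
    """Frame rate after dropping the given number of temporal subband levels.

    Each dropped level halves the temporal sampling rate; the dyadic
    temporal decomposition assumed here is a simplification.
    """
    return full_rate_fps / (2 ** dropped_temporal_levels)

# e.g. a 30 fps source truncated by two temporal levels plays at 7.5 fps
```

Spatial bit-plane truncation, by contrast, keeps the frame rate but lowers PSNR; the adaptation problem is choosing between these two kinds of degradation at a given bit rate.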

In order to understand the influence of different MC-EZBC adaptation operations on subjective video quality, we conducted extensive subjective tests involving 31 subjects, 128 video clips, a wide range of bandwidths (50 Kbps to 1 Mbps), and formal subjective quality metrics. Subjects were asked to evaluate the perceptual quality of video clips generated using different spatio-temporal adaptation operations that yield the same bit rate. Figure 5 shows a screenshot of the subjective test interface.

Figure 5. Subjective quality evaluation test

After obtaining the subjective quality data, we analyzed the results using statistical testing methods and investigated the dependence of the optimal frame rate on the user, the bandwidth, and the video content characteristics. Our findings indicate agreement among most users and the existence of switching bandwidths at which the preferred frame rates change. A dependence of the preference on video content type is also revealed.
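The switching-bandwidth finding can be illustrated with a toy decision rule. The 300 kbps threshold and the two frame rates below are made-up assumptions for illustration, not measured values from our subjective tests.

```python
def preferred_frame_rate(bandwidth_kbps, switching_kbps=300, full_fps=30):
    """Toy model of a switching bandwidth: below the threshold viewers
    prefer a halved frame rate (spending the bits on per-frame quality);
    at or above it, the full frame rate is preferred."""
    if bandwidth_kbps >= switching_kbps:
        return full_fps
    return full_fps / 2
```

In the real system, such thresholds vary with video content type, which is why content features help predict the optimal adaptation operation.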

Based on these findings, we applied our proposed prediction framework to MC-EZBC adaptation to choose the optimal adaptation operation matching subjective perceptual quality. Statistical analysis of the experimental results confirms the excellent accuracy of using domain knowledge and content features to predict the optimal adaptation operations: we achieve up to 95% accuracy in selecting the optimal operations, and compared with an approach using empirical data instead of content-based classification, our method improves performance by up to 20%. We also find interesting patterns in the preferred frame rates at different bit rates for different categories of videos.


  Direct Output
  1. Y. Wang, J.-G. Kim, and S.-F. Chang, Utility-based Video Adaptation for UMA and Content-based Utility Function Prediction for Real Time Video Transcoding, under revision for IEEE Trans. Multimedia.
  2. Y. Wang, M. v. d. Schaar, S.-F. Chang, A. C. Loui, Content-Based Optimal Adaptation Operation Prediction For Scalable Video Coding Systems Using Subjective Quality Evaluation. Preparing for IEEE CSVT Special Issue on Analysis and Understanding for Video Adaptation. 2004
  3. D. Mukherjee, E. Delfosse, J.-G. Kim, Y. Wang, Terminal and Network Quality of Service, invited paper, IEEE Trans. On Multimedia Special Issue on MPEG-21. 2004
  4. Y. Wang, S.-F. Chang, A. C. Loui. Subjective Preference of Spatio-Temporal Rate in Video Adaptation Using Multi-Dimensional Scalable Coding. IEEE ICME 2004 special session on Mobile Imaging: technology and applications.Volume: 3, Pages:1719 - 1722, June 27-30, 2004.
  5. Y. Wang, T.-T. Ng, M. v. d. Schaar, S.-F. Chang, Predicting Optimal Operation of MC-3DSBC Multi-Dimensional Scalable Video Coding Using Subjective Quality Measurement. Proc. SPIE Video Communications and Image Processing (VCIP), San Jose, CA, January 2004
  6. J.-G. Kim, Y. Wang, S.F. Chang, Content-Adaptive Utility-Based Video Adaptation, IEEE ICME-2003. July 6-9, 2003. Baltimore, Maryland.
  7. Y. Wang, J.-G. Kim, and S.-F. Chang, Content-based utility function prediction for real-time MPEG-4 transcoding, ICIP 2003, September 14-17, 2003, Barcelona, Spain.
  8. Y. Wang, S.-F. Chang, A. C. Loui. Content-Based Prediction of Optimal Video Adaptation Operations Using Subjective Quality Evaluation. Columbia University ADVENT Technical Report #202-2004-2, January 2004.
  9. Y. Wang, J.-G. Kim, S.-F. Chang, “MPEG-4 Real Time FD-CD Transcoding,” Columbia University ADVENT Technical Report #11122003, 2003.
  10. J.-G. Kim, Y. Wang, S.-F. Chang, K. Kang, J. Kim, "Description of utility function based optimum transcoding," ISO/IEC JTC1/SC29/WG11 M8319 Fairfax May 2002.
Related Contribution
  1. S.-F. Chang, Optimal Video Adaptation and Skimming Using a Utility-Based Framework, Tyrrhenian International Workshop on Digital Communications (IWDC-2002), Capri Island, Italy, Sept. 2002
  2. P. Bocheck, Y. Nakajima and S.-F. Chang, Real-time Estimation of Subjective Utility Functions for MPEG-4 Video Objects, Proceedings of the Packet Video'99 (PV'99), New York, USA, April 26-27, 1999.



Yong Wang
Shih-Fu Chang

in collaboration with (in alphabetical order by last name)

Jae-Gon Kim (Electronics and Telecommunications Research Institute (ETRI), Korea)
Dr. Alex Loui (Eastman Kodak)
Tian-Tsong Ng
Prof. Mihaela van der Schaar (University of California at Davis)





Old UMA Project Homepage



For problems or questions regarding this web page, please contact me at .
Copyright © By Yong Wang All Rights Reserved
Last updated: Jan 14th, 2003