Yong Wang's Homepage | Research

Utility Based Video Adaptation and Content Based Optimal Adaptation Operation Prediction



Utility-based Video Adaptation
Content-based Optimal Adaptation Operation Prediction
    1) MPEG-4 FD-CD
    2) Spatio-Temporal Scalability of MC-EZBC using Subjective Quality Evaluation




Video adaptation allows direct manipulation of existing encoded video streams to meet new resource constraints without re-encoding the video from scratch. Many techniques exist for adapting videos to satisfy heterogeneous resource conditions or user preferences. Moreover, multi-dimensional scalable coding offers a flexible representation for video adaptation in multiple dimensions, such as spatial detail and temporal resolution, thus providing great benefits for universal media access applications. However, most existing adaptation research either focuses on the optimization of fixed, predefined video adaptation methods or approaches the multi-dimensional adaptation (MDA) problem in an ad hoc manner.

To provide a systematic solution, we present a general conceptual framework called the utility function (UF), which models video entities, adaptations, resources, utilities, and the mapping relations among them. The framework extends the conventional rate-distortion model in terms of flexibility and generality. UF allows various adaptation problems to be formulated as resource-constrained utility maximization, and supports adaptation schemes in an interoperable way.

Furthermore, in order to address the computational complexity of generating UF and to support real-time video transcoding, we present a content-based statistical approach that facilitates the prediction of UF for real-time transcoding of live videos. It formulates the prediction as a classification problem: each new video clip is classified into one of a set of distinct categories. If necessary, local regression can also be applied to refine the utility function. The adaptation operation is then guided by the predicted utility function. Our extensive experimental results based on MPEG-4 and MC-EZBC adaptation demonstrate that the proposed content-based prediction method achieves very promising performance.

Utility-based Video Adaptation


UF is an efficient tool for representing the relationship between the utility of a video and the required resources. UF originates from the Adaptation-Resource-Utility (ARU) space concept, in which the relationships among diverse types of adaptations, resources (e.g., bandwidth, power, and display) and utilities (e.g., objective or subjective quality) are modeled. We use the term “space” in a loose sense here to indicate the multiple dimensions potentially involved. Figure 1 depicts the definition of the ARU spaces. The entity refers to the basic unit of video data that undergoes the adaptation process. Adaptation operators are the methods used to reshape the video entities, such as requantization and frame dropping. All permissible adaptations for a given video entity constitute the adaptation space. Resources are constraints from terminals or networks, including bandwidth, display resolution, power, etc. Utility represents the quality of an entity when it is rendered on an end device after adaptation, such as PSNR, perceptual quality, or even high-level comprehensibility. Multiple utilities define the utility space.

The mapping relationship among the ARU spaces is illustrated in Figure 1. Typically, there exist multiple adaptation solutions that satisfy the same resource constraints while yielding different utilities. In Figure 1, the points in the oval-shaped region in the adaptation space indicate such a constant-resource region. It is this multi-option situation that makes the adaptation problem interesting: we want to choose the optimal option with the highest utility or the minimal resource. When we consider bandwidth as the resource and objective video quality as the utility, the relations among the ARU spaces can be represented by the UF, as indicated in the right part of Figure 1. Each point on the UF is associated with one specific adaptation operator, which may combine multiple operations (such as frame dropping and coefficient dropping).

Figure 1. The instantiation of utility function from the resource-adaptation-utility space mapping.
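The ARU relationship described above can be sketched in code. This is an illustrative model only: the operator names and (resource, utility) numbers are made-up assumptions, not values from the MPEG-21 DIA specification or our experiments.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaptationPoint:
    operator: str        # hypothetical operator label
    resource: float      # e.g. required bandwidth in kbps
    utility: float       # e.g. PSNR in dB

# A utility function (UF) for one entity is the set of all permissible points.
# Note the two points with identical resource but different utility: they lie
# in the same constant-resource region of the adaptation space.
utility_function = [
    AdaptationPoint("no-drop",       900.0, 38.2),
    AdaptationPoint("drop-1B+10%CD", 700.0, 36.5),
    AdaptationPoint("drop-allB",     700.0, 35.1),
    AdaptationPoint("I-only+50%CD",  300.0, 29.8),
]
```

The constant-resource region is what makes the adaptation problem non-trivial: the two 700 kbps points above require identical bandwidth yet differ in utility.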

One example of using the utility-based framework to realize video adaptation is illustrated in Figure 2, where a two-dimensional adaptation space is constructed, each dimension of which is indexed by a finite set of adaptation operations. For a specific video clip, given an adaptation operator a_i, its corresponding resource and utility values are denoted as r_i and u_i. An operator specifies the exact adaptation parameter in each dimension to meet the resource constraint. Given a resource constraint r, all of the operators meeting that constraint, such as a_A and a_B in Figure 2, can be found from the UF, and the one with the optimal utility value is chosen (a_A in this case). Such a utility-based adaptation mechanism was also accepted as part of MPEG-21 Digital Item Adaptation (DIA).

Figure 2. Utility based video adaptation
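The selection step just described can be sketched as a small utility-maximization routine. The operator names and numbers below are hypothetical placeholders used only to illustrate the decision rule.

```python
def select_operator(utility_function, resource_budget):
    """Pick the highest-utility operator whose resource cost fits the budget.

    utility_function: list of (operator_name, resource, utility) tuples.
    """
    feasible = [p for p in utility_function if p[1] <= resource_budget]
    if not feasible:
        return None  # no operator satisfies the constraint
    return max(feasible, key=lambda p: p[2])

# Toy UF: a_A and a_B meet the same resource constraint but differ in utility.
uf = [
    ("a_A", 700.0, 36.5),
    ("a_B", 700.0, 35.1),
    ("a_C", 300.0, 29.8),
]
best = select_operator(uf, resource_budget=700.0)  # -> ("a_A", 700.0, 36.5)
```

With a budget of 700, both a_A and a_B are feasible and a_A wins on utility, mirroring the choice of a_A in Figure 2.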

Content-based Optimal Adaptation Operation Prediction


Utility-function-based transcoding is an efficient, systematic solution for choosing the optimal media transcoding operation to meet dynamic resource constraints. However, to date the real-time generation of utility functions has not been feasible due to computational complexity. We propose a novel content-based utility function prediction model, whose basic idea is to approach this problem from the standpoint of statistical pattern classification, using automatically extracted content features as input and the utility function category as the target.

The system architecture of the utility function prediction is shown in Figure 3 below. The upper part shows a high-level view of the utility-based adaptation procedure for an incoming live video. For each entity, the content features are extracted and the utility function prediction is applied. The adaptation engine then reshapes the stream according to the predicted utility function. The lower part details the utility function prediction module. It consists of an offline training routine and an online prediction routine: the former comprises unsupervised clustering, classification learning, and (optional) regression learning, while the latter includes online classification and linear regression. More specifically, for offline training we first build up a media pool using training video clips. For each clip in the pool, the utility function is computed in advance. The content features are then used to obtain clustering results, and each clip is mapped to one of the clusters accordingly. Given the labeled instances in the pool, the classification function is trained using the content features. In the online prediction routine, the content features of the incoming video are analyzed by the classification function. If necessary, according to the classification result, the corresponding regression model for the selected class can be activated to obtain the refined predicted utility function.

Figure 3. Overall architecture of the proposed content based utility function prediction framework
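The online prediction routine can be sketched as follows. The centroids, feature values, and per-class utility functions are made-up placeholders; in the real system these are learned offline (clustering plus classifier training), and an optional per-class regression refines the result.

```python
import math

# Hypothetical pre-trained cluster centroids in a 2-D content-feature space:
# (motion intensity, texture complexity), both normalized to [0, 1].
CENTROIDS = {
    "low-motion":  (0.2, 0.1),
    "high-motion": (0.8, 0.6),
}

# Hypothetical representative utility function per class.
CLASS_UF = {
    "low-motion":  [("keep-frames", 34.0)],
    "high-motion": [("keep-coeffs", 31.0)],
}

def classify(features):
    # Nearest-centroid decision; stands in for the trained classifier.
    return min(CENTROIDS, key=lambda c: math.dist(features, CENTROIDS[c]))

def predict_uf(features):
    # Look up the predicted class's representative utility function.
    return CLASS_UF[classify(features)]
```

A clip with high motion intensity would be routed to the "high-motion" class and adapted with that class's predicted utility function, all without generating the true UF at run time.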




1) MPEG-4 FD-CD

FD-CD stands for the combination of frame dropping (FD) and DCT coefficient dropping (CD). FD-CD provides freedom in optimizing the trade-off between spatial and temporal quality. Both operations can be implemented efficiently in the compressed domain without fully decoding the compressed streams.

In our MPEG-4 FD-CD experiment, we selected 2066 video clips, each one second long in SIF format (352x240). The clips were carefully selected to cover a wide range of content features. Every clip was extracted from within a shot, so no abrupt transitions such as shot changes occurred within a clip. The proposed prediction framework was tested using standard cross-validation, in which training and testing were done with random partitions of the pool (70% for training and 30% for testing) over multiple iterations. Four FD operators were adopted: “no frame dropped”, “the first B frame dropped in each sub-GOP”, “all B frames dropped”, and “all B and P frames dropped” (i.e., keeping only the I frames). For each FD operator, six CD levels were chosen: from 0% to 50% in 10% increments. As a result, there are four operation curves in each utility function.
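The operator grid described above can be enumerated directly. The FD labels below are shorthand for the four operators quoted in the text; combining them with the six CD levels yields the 24 adaptation operators (four curves of six points each in every utility function).

```python
# Four frame-dropping levels (shorthand labels for the operators in the text).
FD_LEVELS = ["none", "first-B-per-subGOP", "all-B", "all-B-and-P"]

# Six coefficient-dropping levels: 0% to 50% in 10% increments.
CD_LEVELS = [0, 10, 20, 30, 40, 50]

# Cartesian product: one (FD, CD) pair per adaptation operator.
operators = [(fd, cd) for fd in FD_LEVELS for cd in CD_LEVELS]
```

Each FD level contributes one operation curve, sampled at the six CD percentages, which is why every utility function in this experiment consists of four curves.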

We applied the above UF framework and demonstrated a content-adaptive, utility-based MPEG-4 FD-CD transcoding system. Content features such as texture complexity and motion intensity are extracted from each incoming video segment and used to predict the utility function based on pre-trained classifiers. The optimal adaptation operator among all possible options is then automatically selected based on the predicted utility function. Our extensive experiments show very accurate prediction of both the utility function and the optimal operator: up to 89% accuracy in choosing the transcoding operation with the highest quality from multiple alternatives meeting the same target bit rate. More importantly, the whole process of feature extraction, classification, and prediction can be done in real time without exhaustive comparison of the different options.

Figure 4 below shows a screenshot of our live demo system, which simulates the real-time utility function prediction procedure for MPEG-4 FD-CD. It also shows the extracted features, the dynamic network condition, a comparison of the actual and predicted utility functions, and a comparison of the final transcoded video quality.

Figure 4: Screen shot of the MPEG-4 FD-CD Adaptation Demo

2) Spatio-Temporal Scalability of MC-EZBC using Subjective Quality Evaluation

Motion-Compensated Embedded Zero Block Coding (MC-EZBC) is one of the latest motion-compensated 3-D subband/wavelet (MC-3DSBC) scalable video coding systems. Scalable video coding based on MC-3DSBC is becoming increasingly popular, as it yields coding performance competitive with state-of-the-art non-scalable codecs while providing low implementation complexity and high flexibility to match instantaneous network conditions and different receiver capabilities. The spatio-temporal scalability of MC-EZBC is realized by truncating spatial bit planes and temporal subbands, respectively, resulting in bit streams with a lower temporal sampling rate (i.e., frame rate) and degraded PSNR quality.
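The frame-rate effect of temporal subband truncation can be sketched with simple rate arithmetic. This models only the halving of the temporal sampling rate per dropped temporal decomposition level, not the actual bitstream truncation mechanics.

```python
def adapted_frame_rate(full_rate_fps, dropped_temporal_levels):
    """Frame rate after dropping the given number of temporal subband levels.

    Each dropped level halves the temporal sampling rate; the dyadic
    temporal decomposition assumed here is a simplification.
    """
    return full_rate_fps / (2 ** dropped_temporal_levels)

# e.g. a 30 fps source truncated by two temporal levels plays at 7.5 fps
```

Spatial bit-plane truncation, by contrast, keeps the frame rate but lowers PSNR; the adaptation problem is choosing between these two kinds of degradation at a given bit rate.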

In order to understand the influence of different MC-EZBC adaptation operations on subjective video quality, we conducted extensive subjective tests involving 31 subjects, 128 video clips, a wide range of bandwidths (50 Kbps to 1 Mbps), and formal subjective quality metrics. Subjects were asked to evaluate the perceptual quality of video clips generated using different spatio-temporal adaptation operations that yield the same bit rate. Figure 5 shows a screenshot of the subjective test interface.

Figure 5. Subjective quality evaluation test

After obtaining the subjective quality data, we analyzed the results using statistical testing methods and investigated the dependence of the optimal frame rate on the user, the bandwidth, and the video content characteristics. Our findings indicate agreement among most users and the existence of switching bandwidths at which the preferred frame rates change. A dependence of the preference on video content type is also revealed.
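The switching-bandwidth finding can be illustrated with a toy decision rule. The 300 kbps threshold and the two frame rates below are made-up assumptions for illustration, not measured values from our subjective tests.

```python
def preferred_frame_rate(bandwidth_kbps, switching_kbps=300, full_fps=30):
    """Toy model of a switching bandwidth: below the threshold viewers
    prefer a halved frame rate (spending the bits on per-frame quality);
    at or above it, the full frame rate is preferred."""
    if bandwidth_kbps >= switching_kbps:
        return full_fps
    return full_fps / 2
```

In the real system, such thresholds vary with video content type, which is why content features help predict the optimal adaptation operation.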

Based on these findings, we applied our proposed prediction framework to MC-EZBC adaptation to choose the optimal adaptation operation matching subjective perceptual quality. Statistical analysis of the experimental results confirms the excellent accuracy of using domain knowledge and content features to predict the optimal adaptation operations: we achieve up to 95% accuracy in selecting the optimal operations, and compared with an approach using empirical data instead of content-based classification, our method improves performance by up to 20%. We also find interesting patterns in the preferred frame rates at different bit rates for different categories of videos.


  Direct Output
  1. Y. Wang, J.-G. Kim, and S.-F. Chang, Utility-based Video Adaptation for UMA and Content-based Utility Function Prediction for Real Time Video Transcoding, under revision for IEEE Trans. Multimedia.
  2. Y. Wang, M. v. d. Schaar, S.-F. Chang, A. C. Loui, Content-Based Optimal Adaptation Operation Prediction For Scalable Video Coding Systems Using Subjective Quality Evaluation. Preparing for IEEE CSVT Special Issue on Analysis and Understanding for Video Adaptation. 2004
  3. D. Mukherjee, E. Delfosse, J.-G. Kim, Y. Wang, Terminal and Network Quality of Service, invited paper, IEEE Trans. On Multimedia Special Issue on MPEG-21. 2004
  4. Y. Wang, S.-F. Chang, A. C. Loui. Subjective Preference of Spatio-Temporal Rate in Video Adaptation Using Multi-Dimensional Scalable Coding. IEEE ICME 2004 special session on Mobile Imaging: technology and applications.Volume: 3, Pages:1719 - 1722, June 27-30, 2004.
  5. Y. Wang, T.-T. Ng, M. v. d. Schaar, S.-F. Chang, Predicting Optimal Operation of MC-3DSBC Multi-Dimensional Scalable Video Coding Using Subjective Quality Measurement. Proc. SPIE Video Communications and Image Processing (VCIP), San Jose, CA, January 2004
  6. J.-G. Kim, Y. Wang, S.F. Chang, Content-Adaptive Utility-Based Video Adaptation, IEEE ICME-2003. July 6-9, 2003. Baltimore, Maryland.
  7. Y. Wang, J.-G. Kim, and S.-F. Chang, Content-based utility function prediction for real-time MPEG-4 transcoding, ICIP 2003, September 14-17, 2003, Barcelona, Spain.
  8. Y. Wang, S.-F. Chang, A. C. Loui. Content-Based Prediction of Optimal Video Adaptation Operations Using Subjective Quality Evaluation. Columbia University ADVENT Technical Report #202-2004-2, January 2004.
  9. Y. Wang, J.-G. Kim, S.-F. Chang, “MPEG-4 Real Time FD-CD Transcoding,” Columbia University ADVENT Technical Report #11122003, 2003.
  10. J.-G. Kim, Y. Wang, S.-F. Chang, K. Kang, J. Kim, "Description of utility function based optimum transcoding," ISO/IEC JTC1/SC29/WG11 M8319 Fairfax May 2002.
Related Contribution
  1. S.-F. Chang, Optimal Video Adaptation and Skimming Using a Utility-Based Framework, Tyrrhenian International Workshop on Digital Communications (IWDC-2002), Capri Island, Italy, Sept. 2002
  2. P. Bocheck, Y. Nakajima and S.-F. Chang, Real-time Estimation of Subjective Utility Functions for MPEG-4 Video Objects, Proceedings of the Packet Video'99 (PV'99), New York, USA, April 26-27, 1999.



Yong Wang
Shih-Fu Chang

in collaboration with (in alphabetical order by last name)

Jae-Gon Kim (Electronics and Telecommunications Research Institute (ETRI), Korea)
Dr. Alex Loui (Eastman Kodak)
Tian-Tsong Ng
Prof. Mihaela van der Schaar (University of California at Davis)





Old UMA Project Homepage



For problems or questions regarding this web page, please contact me at .
Copyright © By Yong Wang All Rights Reserved
Last updated: Jan 14th, 2003