Yong Wang's Homepage | Research


I am currently a Ph.D. candidate in the Department of Electrical Engineering at Columbia University. I received my M.S. degree in 2001 and B.E. in 1999, both in EE, from Tsinghua University. Under the direction of Prof. Shih-Fu Chang, my research at Columbia focuses on resource-constrained video adaptation. I am also interested in information visualization, video understanding, and content-based video analysis using statistical pattern recognition methods.





Thesis Work: Resource Constrained Video Adaptation with Utility Maximization


Conventional video coding / transcoding technologies focus on rate-distortion optimization: the target bit rate is the resource limitation, and objective quality (MSE or PSNR) is the performance evaluation criterion. Nevertheless, in the scenario of universal multimedia access (UMA), advanced mobile and broadcast multimedia services aim to automatically provide different media presentations to suit various terminals, networks and user interests. In this situation, video coding / transcoding is more challenging for three reasons. First, objective quality is no longer the sole performance evaluation criterion; subjective quality, comprehensibility, or even the delivery of semantic meaning can be important considerations. Second, although bandwidth remains an important limitation, in some applications, particularly mobile video solutions, other constraints such as computational complexity and power consumption become the bottleneck. Third, in terms of adaptation, state-of-the-art video codecs provide more freedom in reshaping video bit streams: instead of applying a single pre-defined operation, multi-dimensional adaptation (MDA) should be considered jointly in order to maximize adaptation performance. All of these facts make video adaptation in UMA a very challenging research topic. Resolving the whole problem within one thesis would be too ambitious, so we define a sub-area of this topic, named "Resource Constrained Video Adaptation with Utility Maximization", and make our contributions in the following areas (clicking on each link leads to the corresponding project homepage):
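The core idea of resource-constrained, utility-maximizing adaptation can be illustrated with a toy sketch: among the candidate adaptation operations that fit the resource budget, pick the one with the highest utility. The operation names, resource costs, and utility values below are hypothetical placeholders, not measured data or the actual selection algorithm.

```python
# A minimal sketch of utility-based adaptation selection, assuming each
# candidate operation is summarized by a (resource, utility) pair.

def select_adaptation(operations, budget):
    """Pick the highest-utility operation whose resource cost
    (e.g. bit rate, complexity, or power) fits within the budget."""
    feasible = [op for op in operations if op["resource"] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda op: op["utility"])

# Hypothetical multi-dimensional adaptation (MDA) candidates: combinations
# of frame dropping (FD) and coefficient dropping (CD). Values are made up.
candidates = [
    {"name": "FD=0, CD=0",     "resource": 512, "utility": 1.00},
    {"name": "FD=1/2, CD=0",   "resource": 300, "utility": 0.82},
    {"name": "FD=0, CD=30%",   "resource": 350, "utility": 0.78},
    {"name": "FD=1/2, CD=30%", "resource": 200, "utility": 0.60},
]

best = select_adaptation(candidates, budget=320)
print(best["name"])  # FD=1/2, CD=0: fits the budget with maximum utility
```

In the real work the utility values come from subjective quality evaluation or content-based prediction rather than a fixed table; the sketch only shows the constrained maximization step.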

VisGenie: Video Based Information Visualization System


In this project, we designed and implemented VisGenie, a generic system for visualizing multimedia streams and their associated metadata. Information visualization is useful not only for end users, but also for researchers who analyze media data, as it improves our understanding and presentation of multimedia information. VisGenie provides a set of visualization components that are flexible and extensible to different kinds of visualization applications. It also makes it very convenient to build demonstration platforms for visualizing research results. Furthermore, VisGenie defines its own SDK so that users can construct their own video analysis platforms quickly. We believe researchers and developers in the following areas can benefit from using VisGenie: image processing, computer vision, machine learning, pattern recognition, video coding, transcoding, retrieval, understanding, management, etc.

Enter VisGenie homepage

Off-Campus Projects

The following projects were done when I was a visiting student at Microsoft Research Asia.
(1) Bi-level Video
(2) Concentric Mosaic Array
(3) Image-based Walkthrough over the Internet

The following project was done when I was a summer intern at HP Labs.
(4) Real Time Motion Analysis for Video Content Understanding


(1) Bi-level Video

The rapid development of wired and wireless networks has tremendously facilitated communication between people. However, most current wireless networks still operate at low bandwidths, and mobile devices still suffer from weak computational power, short battery lifetime and limited display capability. We developed a very low bit-rate bi-level video coding technique that can be used for video communication almost anywhere, anytime, on any device. The product "Microsoft Portrait", based on this research, can be found on and downloaded for free from my mentor's homepage. The image at right shows a sample frame from this communication software. The spirit of this method is that rather than giving highest priority to the "basic colors" of an image, as conventional DCT-based compression methods do, we give preference to the outline features of scenes when bandwidth is limited. These features can be represented by bi-level image sequences converted from gray-scale image sequences. By analyzing the temporal correlation between successive frames and exploiting the flexibility of presenting scenes with bi-level images, we achieve very high compression ratios with our bi-level video compression scheme. Experiments show that at low bandwidths, our method provides clearer shapes, smoother motion, shorter initial latency and much lower computational cost than DCT-based methods. Our method is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones, which have small display screens and limited computational power and operate over low-bandwidth wireless networks. We have built PC and Pocket PC versions of bi-level video phone systems, which typically provide QCIF-size video at a frame rate of 5-15 fps over a 9.6 Kbps channel.
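The two ingredients described above, converting gray-scale frames to 1-bit-per-pixel bi-level frames and coding only what changes between successive frames, can be sketched in a few lines. The threshold and the tiny frames below are illustrative; the real system uses adaptive thresholding and an entropy coder rather than a raw change count.

```python
# Toy sketch of bi-level conversion plus temporal-correlation savings.

def to_bilevel(gray_frame, threshold=128):
    """Map each grayscale pixel (0-255) to 0 or 1 by thresholding."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray_frame]

def changed_pixels(prev, curr):
    """Count pixels that differ between successive bi-level frames; only
    these need to be coded, which is where the bit-rate saving comes from."""
    return sum(p != c for pr, cr in zip(prev, curr) for p, c in zip(pr, cr))

# Two tiny hypothetical 2x2 grayscale frames.
frame1 = [[10, 200], [90, 160]]
frame2 = [[12, 198], [180, 158]]

b1, b2 = to_bilevel(frame1), to_bilevel(frame2)
print(b1)                      # [[0, 1], [0, 1]]
print(changed_pixels(b1, b2))  # 1: only one pixel crossed the threshold
```

Note how small grayscale fluctuations (10 to 12, 200 to 198) produce no bi-level change at all, so they cost nothing to transmit.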

(2) Concentric Mosaic Array

This project introduces a novel image-based rendering system to capture, represent and render real-world and synthetic scenes. In our system, a longitudinally aligned camera array is mounted on a rotating arm supported by a tripod. The cameras are always aimed along the radial direction. The scene is captured by the camera array as it rotates along a circle. Each pixel of the captured images is indexed by four parameters: the rotation angle of the camera array, the longitudinal number of the camera, the image column number and the image row number. Given the position and viewing direction of an observer, the system can generate novel views by interpolating the captured pixels in real time, without any geometric representation. If the observer is constrained to move on a plane, the size of the scene data can be further reduced to that of an approximately 3.5D plenoptic function. Compared with the light field and the Lumigraph, our method provides an easier inside-looking-out capture configuration and a uniform spatial sampling pattern. Our system goes a step further than concentric mosaics by allowing users to move continuously within a 3D cylindrical space, so users can experience significant lateral as well as longitudinal parallax and lighting changes of a scene. Moreover, our method provides an image-based solution to wandering through a large environment by concatenating multiple wandering circles. Our technique has potential applications in entertainment, e-commerce and communication.
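The four-parameter indexing described above amounts to a 4D table of captured rays. The sketch below uses a made-up pixel generator and nearest-sample lookup in place of real captured data and real interpolation, just to show the addressing scheme (angle, camera number, column, row).

```python
# Toy sketch of 4-parameter pixel indexing for the concentric mosaic array.

N_ANGLES, N_CAMERAS, N_COLS, N_ROWS = 360, 8, 4, 4

def capture_pixel(angle, cam, col, row):
    # Stand-in for a real captured pixel value.
    return (angle * 31 + cam * 7 + col * 3 + row) % 256

# Dense 4D table: rotation angle -> camera -> image column -> image row.
# Sparse angular sampling (every 90 degrees) keeps the toy table small.
SAMPLED_ANGLES = range(0, N_ANGLES, 90)
table = {
    (a, c, u, v): capture_pixel(a, c, u, v)
    for a in SAMPLED_ANGLES
    for c in range(N_CAMERAS)
    for u in range(N_COLS)
    for v in range(N_ROWS)
}

def render_pixel(angle, cam, col, row):
    """Nearest-sample lookup; the real renderer interpolates between the
    nearest captured rays instead of snapping to a single one."""
    nearest_angle = min(SAMPLED_ANGLES, key=lambda a: abs(a - angle))
    return table[(nearest_angle, cam, col, row)]

print(render_pixel(85, 3, 2, 1))  # snaps to the sample at angle 90
```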

(3) Image-based Walkthrough over the Internet

In this project, we describe a client-server image-based walkthrough system. The system employs pictures, panoramas, and concentric mosaics captured from real scenes. These data are compressed and stored on a remote server and can be retrieved over the Internet. Rather than transmitting a video sequence frame by frame, our system allows users at the client end to selectively retrieve image segments for specific viewpoints and viewing directions. This selective retrieval is achieved by a client-server communication protocol. Cache strategies are designed to ensure a smooth viewing experience. Experiments show that within a local area network, our system can reach over 15 frames per second (fps) at a resolution of 360 x 288 pixels while the user is rotating, and 3 fps while the user is translating. Even over a 56 Kbps modem, our system can still reach over 15 fps at a resolution of 180 x 144 pixels for rotation and 1 fps for translation. With the partial cache scheme, our system uses no more than 1 MB of RAM at the client end. Our system is therefore particularly suitable for online virtual tours, shopping, and communication on handheld or palm-size PCs and mobile phones.

(4) Real Time Motion Analysis for Video Content Understanding

Video motion analysis and its applications have been a classic research topic for decades. In this project, we explore real-time video semantic understanding based on motion information. The work has two parts: global / camera motion estimation and object motion analysis. The former involves optical flow analysis and semantic meaning parsing, while the latter involves object detection and tracking. Although each of these topics has been studied extensively in the literature, a complete system combining all of them without human intervention, especially in a real-time semantic understanding scenario, is still worthy of further investigation. We develop our approach toward this goal and propose an integrated architecture. The usability and efficiency of the proposed system have been demonstrated through experiments. The results of this project have numerous applications in digital entertainment, such as video and image summarization, annotation, retrieval and editing.
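The split between camera motion and object motion can be sketched as follows: estimate a global translation robustly from per-block motion vectors, then flag blocks whose residual motion is large after the camera motion is removed. The median-based estimate, the threshold, and the motion vectors below are illustrative simplifications of the system's actual parametric estimation.

```python
# Toy sketch: separate global (camera) motion from object motion,
# assuming per-block motion vectors are already available.
from statistics import median

def estimate_global_motion(vectors):
    """Robust global translation: component-wise median of block motion
    vectors, so a few moving objects do not bias the camera estimate."""
    dx = median(v[0] for v in vectors)
    dy = median(v[1] for v in vectors)
    return (dx, dy)

def object_motion(vectors, global_mv, threshold=2.0):
    """Blocks whose residual motion exceeds the threshold after removing
    camera motion are candidate object regions."""
    gx, gy = global_mv
    return [i for i, (x, y) in enumerate(vectors)
            if abs(x - gx) + abs(y - gy) > threshold]

# Mostly a panning camera (vectors near (3, 0)) plus one moving object.
mvs = [(3, 0), (3, 1), (2, 0), (3, 0), (10, 8), (3, -1)]
g = estimate_global_motion(mvs)
print(g)                      # (3.0, 0.0): the pan
print(object_motion(mvs, g))  # [4]: the outlier block
```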



Publications

Disclaimer: Papers are on this page for the sole purpose of timely information dissemination; the copyright remains with the corresponding publishers. Please make sure your use of these materials falls under fair use.
Journal Papers
  1. Y. Wang, J.-G. Kim, and S.-F. Chang, Utility-based Video Adaptation for UMA and Content-based Utility Function Prediction for Real Time Video Transcoding, revised for IEEE Trans. Multimedia.
  2. J.-G. Kim, Y. Wang, S.-F. Chang, and H.-M. Kim, An optimal framework of video adaptation and its application to rate adaptation transcoding, ETRI Journal, Volume 27, Number 4, August 2005.
  3. Y. Wang, M. v. d. Schaar, S.-F. Chang, A. C. Loui, Classification-Based Prediction of Optimal Multi-Dimensional Adaptation for Scalable Video Coding Using Subjective Quality Evaluation, accepted by IEEE CSVT Special Issue on Analysis and Understanding for Video Adaptation, 2005.
  4. D. Mukherjee, E. Delfosse, J.-G. Kim, Y. Wang, Terminal and Network Quality of Service, invited paper, IEEE Trans. On Multimedia Special Issue on MPEG-21. 2004 [PDF]
Conference Papers
  1. Y. Wang, T. Zhang, D. Tretter. Real Time Motion Analysis for Video Content Understanding. Submitted to VCIP'05.
  2. Y. Wang, S.-F. Chang, A. C. Loui. Subjective Preference of Spatio-Temporal Rate in Video Adaptation Using Multi-Dimensional Scalable Coding. IEEE ICME 2004 special session on Mobile Imaging: technology and applications. Volume: 3, Pages:1719 - 1722, June 27-30, 2004. [PDF]
  3. Y. Wang, T.-T. Ng, M. v. d. Schaar, S.-F. Chang, Predicting Optimal Operation of MC-3DSBC Multi-Dimensional Scalable Video Coding Using Subjective Quality Measurement. Proc. SPIE Video Communications and Image Processing (VCIP), San Jose, CA, January 2004 [PDF]
  4. J.-G. Kim, Y. Wang, S.F. Chang, Content-Adaptive Utility-Based Video Adaptation, IEEE ICME-2003. July 6-9, 2003. Baltimore, Maryland. [PDF]
  5. Y. Wang, J.-G. Kim, and S.-F. Chang, Content-based utility function prediction for real-time MPEG-4 transcoding, ICIP 2003, September 14-17, 2003, Barcelona, Spain. [PDF]
  6. J. Li, G. Chen, J. Xu, Y. Wang, H. Zhou, K. Yu, K. T. Ng and H.-Y. Shum, "Bi-level Video: Video Communications at Very Low Bit Rates", ACM Multimedia Conference 2001, Sep. 30 - Oct. 5, Ottawa, Ontario, Canada, pages 392 - 400. [PDF]
  7. J. Li, K. Yu, G. Chen, Y. Wang, H. Zhou, J. Xu, K. T. Ng, K. Wang, L. Wang and H.-Y. Shum, "Portrait Video Phone", ACM Multimedia Conference 2001, Sep. 30 - Oct. 5, Ottawa, Ontario, Canada, pages 597 - 598. [PDF]
  8. J. Li, Y. Tong, Y. Wang, H.-Y. Shum, Y.-Q. Zhang, "Image-based Walkthrough over the Internet", International Workshop on Very Low Bitrate Video Coding (VLBV01), October 2001, Athens, Greece.[PDF]
  9. J. Li, K. Zhou, Y. Wang, H.-Y. Shum, "A Novel Image-Based Rendering System With A Longitudinally Aligned Camera Array", EuroGraphics 2000 Short Presentations, pp.107-114, Interlaken, Switzerland, 21-25 August, 2000. [PDF]
Technical Reports
  1. Y. Wang, S.-F. Chang, Motion Estimation and Mode Decision for Low-Complexity H.264 Decoder. Columbia University ADVENT Technical Report #211-2005-5, 2005.
  2. Y. Wang, L. Xie, S.-F. Chang, VisGenie: a Generic Video Visualization System. Columbia University ADVENT Technical Report #210-2005-4, 2005.
  3. Y. Wang, S.-F. Chang, A. C. Loui. Content-Based Prediction of Optimal Video Adaptation Operations Using Subjective Quality Evaluation. Columbia University ADVENT Technical Report #202-2004-2, January 2004.
  4. Y. Wang, J.-G. Kim, S.-F. Chang, “MPEG-4 Real Time FD-CD Transcoding,” Columbia University ADVENT Technical Report #11122003, 2003.
MPEG Contribution
  1. J.-G. Kim, Y. Wang, S.-F. Chang, K. Kang, J. Kim, "Description of utility function based optimum transcoding," ISO/IEC JTC1/SC29/WG11 M8319, Fairfax, May 2002. [PDF]



Patents and Invention Disclosures
  1. Yong Wang, Shih-Fu Chang, Motion Estimation and Mode Decision for Low-Complexity H.264 Decoder. Provisional patent filed, 2005.
  2. Tong Zhang, Yong Wang, Daniel Tretter, Real Time Video Motion Analysis, Invention disclosure filed 2003
  3. Jae-Gon Kim, Yong Wang, Shih-Fu Chang, Method and System for Optimal Video Transcoding. Invention disclosure filed 2002



Demos
  1. Yong Wang, Shih-Fu Chang, H.264 Encoder with complexity adaptive motion estimation and mode decision (CAMED)
  2. Yong Wang, Lexing Xie, Shih-Fu Chang, VISGenie: A generic video based information visualization system
  3. Yong Wang, Shih-Fu Chang, MPEG-4 transcoder based on frame dropping and coefficient dropping (FD-CD)
  4. J. Li, K. Yu, G. Chen, Y. Wang, H. Zhou, J. Xu, K. T. Ng, K. Wang, L. Wang and H.-Y. Shum, Microsoft Portrait Video Phone

Other useful software tools:

  1. YUVGenius: Rendering YUV files through Windows Media Player
  2. Frame Dumper: Dump video frames into bitmap files.
  3. Frame Ripper: Rip video frames and save them into YUV files.


Columbia | EE Department | ADVENT | DVMM

For problems or questions regarding this web page, please contact me at .
Copyright © By Yong Wang All Rights Reserved
Last updated: Jan 14th, 2003