Research | Yong Wang's Homepage
Thesis Work: Resource Constrained Video Adaptation with Utility Maximization
Conventional video coding / transcoding technologies focus on rate-distortion optimization: the target bit rate is the resource limitation, and objective quality (MSE or PSNR) is the performance evaluation criterion. In the scenario of universal multimedia access (UMA), however, advanced mobile and broadcast multimedia services aim to automatically provide different media presentations to suit various terminals, networks and user interests. In this situation, video coding / transcoding becomes more challenging for three reasons. First, objective quality is no longer the only performance criterion: subjective quality, comprehensibility, or even the delivery of semantic meaning can be important considerations. Second, although bandwidth remains an important limitation, in some applications, particularly mobile video solutions, other constraints such as computational complexity and power consumption become the bottleneck. Third, in terms of adaptation, state-of-the-art video codecs provide more freedom in reshaping video bit streams; instead of applying a single pre-defined operation, multi-dimensional adaptation (MDA) should be considered jointly in order to maximize adaptation performance. All of these facts make video adaptation in UMA a very challenging research topic. Resolving the whole problem within one thesis would be too ambitious, so we define a sub-area of this topic, named "Resource Constrained Video Adaptation with Utility Maximization", and make our contributions in the following aspects (clicking each link leads to the corresponding project homepage):
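The utility-maximization idea can be illustrated with a toy selection problem: given a set of adaptation operating points, pick the one with the highest utility whose resource cost fits the budget. The operating points, utility values and costs below are purely illustrative, not results from the thesis:

```python
# Hypothetical adaptation operating points along several adaptation dimensions
# (spatial resolution, frame rate, quantizer). Utility and cost numbers are
# illustrative placeholders, not measured values.
operating_points = [
    {"name": "CIF@30fps,QP28",  "utility": 0.90, "cost": 1.00},
    {"name": "CIF@15fps,QP28",  "utility": 0.75, "cost": 0.55},
    {"name": "QCIF@30fps,QP28", "utility": 0.60, "cost": 0.40},
    {"name": "QCIF@15fps,QP34", "utility": 0.40, "cost": 0.20},
]

def best_adaptation(points, budget):
    """Return the operating point with maximal utility whose cost fits the budget."""
    feasible = [p for p in points if p["cost"] <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p["utility"])

print(best_adaptation(operating_points, budget=0.5)["name"])  # → QCIF@30fps,QP28
```

With multi-dimensional adaptation the point set grows combinatorially, which is why the choice is framed as a constrained optimization rather than a fixed rule.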
VisGenie: Video Based Information Visualization System
In this project, we designed and built VisGenie, a generic system for visualizing multimedia streams and their associated metadata. Information visualization is useful not only for end users, but also for researchers who analyze media data and work to improve our understanding and presentation of multimedia information. VisGenie provides a set of visualization components that are flexible and extensible to many kinds of visualization applications, and it makes it very convenient to build demonstration platforms for visualizing research results. Furthermore, VisGenie defines its own SDK so that users can construct their own video analysis platforms quickly. We believe researchers and developers in the following areas can benefit from using VisGenie: image processing, computer vision, machine learning, pattern recognition, video coding, transcoding, retrieval, understanding, management, etc. Enter VisGenie homepage
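As a rough illustration of the component idea (the class and method names below are hypothetical sketches, not VisGenie's actual SDK), a plugin-style visualization pipeline might look like:

```python
# Hypothetical sketch of a plugin-style visualization component API, loosely
# inspired by the component/SDK design described above. Not VisGenie's real API.
class VisComponent:
    """Base class: each subclass renders one view of a frame plus its metadata."""
    def render(self, frame, metadata):
        raise NotImplementedError

class MotionOverlay(VisComponent):
    """Example component: draws motion vectors on top of the frame."""
    def render(self, frame, metadata):
        vectors = metadata.get("motion_vectors", [])
        return f"frame {frame}: drew {len(vectors)} motion vectors"

class Pipeline:
    """Dispatches each incoming frame to every registered component."""
    def __init__(self):
        self.components = []
    def register(self, comp):
        self.components.append(comp)
    def process(self, frame, metadata):
        return [c.render(frame, metadata) for c in self.components]

pipe = Pipeline()
pipe.register(MotionOverlay())
print(pipe.process(7, {"motion_vectors": [(1, 0), (0, 2)]}))
```

The point of such a design is that new analysis results only require a new component class, not changes to the player or the stream handling.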
(1) Bi-level Video

The rapid development of wired and wireless networks has tremendously facilitated communication between people. However, most current wireless networks still operate at low bandwidths, and mobile devices still suffer from weak computational power, short battery lifetime and limited display capability. We developed a very low bit-rate bi-level video coding technique that can be used for video communication almost anywhere, anytime, on any device. "Microsoft Portrait", a product based on this research, can be downloaded free from my mentor's homepage. The image on the right shows an example frame from this communication software.

The spirit of this method is that, rather than giving the highest priority to the "basic colors" of an image as in conventional DCT-based compression methods, we give preference to the outline features of scenes when bandwidth is limited. These features can be represented by bi-level image sequences converted from gray-scale image sequences. By analyzing the temporal correlation between successive frames and the flexibility of presenting scenes with bi-level images, our bi-level video compression scheme achieves very high compression ratios. Experiments show that at low bandwidths, our method provides clearer shapes, smoother motion, shorter initial latency and much lower computational cost than DCT-based methods. It is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones, which have small display screens and light computational power and operate on low-bandwidth wireless networks. We have built PC and Pocket PC versions of bi-level video phone systems, which typically provide QCIF-size video at a frame rate of 5-15 fps over a 9.6 Kbps bandwidth.

(2) Image-based Rendering with a Longitudinally Aligned Camera Array

This project introduces a novel image-based rendering system to capture, represent and render real-world and synthetic scenes. In our system, a longitudinally aligned camera array is mounted on a rotating arm supported by a tripod. The cameras always aim along the radial direction, and the scene is captured as the camera array rotates along a circle. Each pixel of the captured images is indexed by four parameters: the rotation angle of the camera array, the longitudinal number of the camera, the image column number and the image row number. Given the position and viewing direction of an observer, the system generates novel views in real time by interpolating the captured pixels, without any geometric representation. If the observer is constrained to move on a plane, the size of the scene data can be further reduced to that of an approximately 3.5D plenoptic function. Compared with the light field and the Lumigraph, our method provides an easier inside-looking-out capture configuration and a uniform spatial sampling pattern. Our system goes a step further than concentric mosaics by allowing users to move continuously within a 3D cylindrical space, so users can experience significant lateral as well as longitudinal parallax and lighting changes of a scene. Moreover, our method provides an image-based solution for wandering through a large environment by concatenating wandering circles. The technique has potential applications in entertainment, e-commerce and communication.

(3) Image-based Walkthrough over the Internet

In this project, we describe a client-server image-based walkthrough system. The system employs pictures, panoramas and concentric mosaics captured from real scenes. These data are compressed and stored on a remote server, and can be retrieved over the Internet. Rather than transmitting a video sequence frame by frame, our system lets clients selectively retrieve image segments for specific viewpoints and viewing directions, via a client-server communication protocol. Cache strategies are designed to ensure a smooth viewing experience for the users. Experiments show that within a local area network, our system reaches over 15 frames per second (fps) at a resolution of 360 x 288 pixels while the user is rotating, and 3 fps while the user is translating. Even over a 56 Kbps modem, the system still reaches over 15 fps at a resolution of 180 x 144 pixels for rotation and 1 fps for translation. With the partial cache scheme, the client uses no more than 1 MB of RAM. The system is therefore particularly suitable for online virtual tours, shopping and communication on handheld or palm-size PCs and mobile phones.

(4) Real Time Motion Analysis for Video Content Understanding

Video motion analysis and its applications have been a classic research topic for decades. In this project, we explore real-time video semantics understanding based on motion information. The work divides into two parts: global / camera motion estimation, and object motion analysis. The former involves optical flow analysis and semantic meaning parsing; the latter involves object detection and tracking. Although each of these topics has been studied extensively in the literature, a complete system combining all of them without human intervention, especially under a real-time semantic understanding scenario, is still worthy of further investigation. We develop an approach toward this goal and propose an integral architecture; the usability and efficiency of the proposed system have been demonstrated through experiments. The results of this project have numerous applications in digital entertainment, such as video and image summarization, annotation, retrieval and editing.
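The core idea of the bi-level coding in project (1), thresholding gray-scale frames to bi-level and exploiting temporal correlation between successive frames, can be sketched as follows (an illustrative toy, not the actual codec):

```python
# Illustrative sketch only: threshold a gray-scale scanline to bi-level, then
# exploit temporal correlation by run-length coding the XOR difference against
# the previous bi-level frame. Pixel values and the threshold are made up.
def to_bilevel(gray_row, threshold=128):
    """Convert gray-scale samples (0-255) to a bi-level (0/1) sequence."""
    return [1 if v >= threshold else 0 for v in gray_row]

def rle(bits):
    """Run-length encode a bit sequence as (bit, run_length) pairs."""
    runs, prev, count = [], bits[0], 1
    for b in bits[1:]:
        if b == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = b, 1
    runs.append((prev, count))
    return runs

prev_frame = to_bilevel([10, 10, 200, 200, 200, 10, 10, 10])
curr_frame = to_bilevel([10, 10, 200, 200, 200, 200, 10, 10])
diff = [a ^ b for a, b in zip(prev_frame, curr_frame)]
print(rle(diff))  # mostly-zero runs between similar frames compress very well
```

Because successive frames differ only along moving outlines, the XOR difference is sparse, which is where the very high compression ratios come from.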
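The four-parameter pixel index used by the camera-array rendering system can be sketched with a toy nearest-neighbour lookup (the real system interpolates between captured pixels; all sizes below are made-up values, not the actual rig's):

```python
# Toy sketch of the four-parameter pixel index described above: (rotation angle
# of the arm, longitudinal camera number, image column, image row).
# Array sizes are illustrative placeholders.
N_ANGLES, N_CAMERAS, COLS, ROWS = 360, 8, 320, 240

def nearest_sample(theta_deg, cam_number, col, row):
    """Snap a requested ray to the nearest captured sample (nearest neighbour;
    the real system interpolates between captured pixels)."""
    a = int(round(theta_deg)) % N_ANGLES             # rotation angle wraps around
    c = min(max(int(round(cam_number)), 0), N_CAMERAS - 1)
    return (a, c, min(col, COLS - 1), min(row, ROWS - 1))

print(nearest_sample(359.6, 3.4, 100, 100))  # → (0, 3, 100, 100)
```

Constraining the observer to a plane removes one degree of freedom from this index, which is what reduces the data to roughly a 3.5D plenoptic function.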
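The selective-retrieval-plus-cache design of project (3) can be sketched as a bounded client-side cache keyed by viewpoint and viewing direction (names and sizes are illustrative, not the actual protocol):

```python
# Minimal sketch of the client side: request only the image segments for the
# current viewpoint, and keep a bounded LRU cache so client RAM stays small.
# Class names, the key scheme and the capacity are illustrative assumptions.
from collections import OrderedDict

class SegmentCache:
    def __init__(self, capacity=32):
        self.capacity = capacity
        self.cache = OrderedDict()   # key -> segment, oldest first
        self.misses = 0
    def fetch(self, viewpoint, direction, server):
        key = (viewpoint, direction)
        if key in self.cache:
            self.cache.move_to_end(key)        # cache hit: no network round trip
            return self.cache[key]
        self.misses += 1
        segment = server(viewpoint, direction)  # simulated server request
        self.cache[key] = segment
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least-recently-used segment
        return segment

fake_server = lambda vp, d: f"segment({vp},{d})"
cache = SegmentCache(capacity=2)
cache.fetch(0, "N", fake_server)
cache.fetch(0, "N", fake_server)   # second request is a hit
print(cache.misses)  # → 1
```

Bounding the cache is what keeps the client under a small RAM budget (the page cites under 1 MB) at the cost of occasional re-fetches when the user revisits a viewpoint.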
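The global-motion stage of project (4) can be illustrated with a toy robust estimator: take the median of per-block motion vectors as the camera motion, and flag blocks that deviate strongly as independently moving objects (a simplification of the optical-flow analysis described above; the vectors are made up):

```python
import statistics

# Sketch of the global / camera-motion stage: a robust (median) estimate of the
# dominant translation separates camera pan from independently moving objects.
# The block vectors below are fabricated for illustration.
block_vectors = [(2, 0), (2, 1), (2, 0), (1, 0), (2, 0), (9, 7)]  # last = object

def global_motion(vectors):
    """Median of per-block displacements: robust to outlier (object) blocks."""
    dx = statistics.median(v[0] for v in vectors)
    dy = statistics.median(v[1] for v in vectors)
    return dx, dy

def moving_objects(vectors, gm, threshold=3):
    """Blocks whose vector deviates strongly from the global motion."""
    return [v for v in vectors if abs(v[0] - gm[0]) + abs(v[1] - gm[1]) > threshold]

gm = global_motion(block_vectors)
print(gm, moving_objects(block_vectors, gm))
```

In the full system the global-motion estimate would feed semantic parsing (e.g. detecting pans), while the outlier blocks seed object detection and tracking.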
Disclaimer: Papers are on this page for the sole purpose of timely information dissemination; the copyright remains with the respective publishers. Please make sure your use of these materials qualifies as fair use.
Journal
Conference
Technical Report
Others
Other useful software tools:
Columbia | EE Department | ADVENT | DVMM
For problems or questions regarding this web page, please contact me at .
Copyright © Yong Wang. All Rights Reserved.
Last updated: Jan 14th, 2003