| |
(1) Bi-level Video
The rapid development of wired and wireless networks has greatly facilitated communication between people. However, most current wireless networks still operate at low bandwidths, and mobile devices still suffer from weak computational power, short battery life and limited display capability. We developed a very low bit-rate bi-level video coding technique that can be used for video communication almost anywhere, at any time, on any device. The product "Microsoft Portrait", which is based on this research, can be downloaded for free from my mentor's homepage. The image at right shows an example frame from this communication software. The idea behind this method is that, rather than giving the highest priority to the "basic colors" of an image as conventional DCT-based compression methods do, we give preference to the outline features of a scene when bandwidth is limited. These features can be represented by bi-level image sequences converted from gray-scale image sequences. By exploiting the temporal correlation between successive frames and the flexibility that bi-level images allow in presenting a scene, we achieve very high compression ratios with our bi-level video compression scheme. Experiments show that at low bandwidths our method provides clearer shapes, smoother motion, shorter initial latency and much lower computational cost than DCT-based methods. Our method is especially suitable for small mobile devices such as handheld PCs, palm-size PCs and mobile phones, which have small display screens and limited computational power and work over low-bandwidth wireless networks. We have built PC and Pocket PC versions of bi-level video phone systems, which typically provide QCIF-size video at a frame rate of 5-15 fps at a bandwidth of 9.6 Kbps.
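To make the figures above concrete, here is a minimal Python sketch of the two ideas the description relies on: converting a gray-scale frame to a bi-level frame, and measuring how much two successive bi-level frames differ. The fixed threshold of 128 and the function names are assumptions made for illustration; the description does not say how the conversion or the coding is actually done. The last two helpers only do arithmetic on the stated numbers (QCIF is 176 x 144 pixels).

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144  # standard QCIF frame size in pixels


def to_bilevel(gray_frame, threshold=128):
    """Map each gray value (0-255) to 1 or 0.

    A fixed threshold is an assumption for illustration only; the
    actual conversion method is not specified in the description.
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray_frame]


def changed_pixels(prev_frame, curr_frame):
    """Count pixels that differ between two bi-level frames (pixel-wise XOR)."""
    return sum(
        p ^ c
        for prev_row, curr_row in zip(prev_frame, curr_frame)
        for p, c in zip(prev_row, curr_row)
    )


def bits_per_frame(bandwidth_bps, fps):
    """Average bit budget available for one frame at a given frame rate."""
    return bandwidth_bps / fps


def required_ratio(bandwidth_bps, fps, width=QCIF_WIDTH, height=QCIF_HEIGHT):
    """Compression ratio needed relative to raw 1-bit-per-pixel frames."""
    return (width * height) / bits_per_frame(bandwidth_bps, fps)
```

A raw bi-level QCIF frame holds 176 x 144 = 25,344 bits. At 9.6 Kbps, `bits_per_frame(9600, 15)` gives an average budget of 640 bits per frame, so sending 15 fps requires an average compression ratio of about 39.6 even after the reduction to one bit per pixel; at 5 fps the required ratio is about 13.2. A small `changed_pixels` count between successive frames is what makes such ratios plausible.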
(2) Concentric Mosaic Array
This project introduces a novel image-based rendering system to capture, represent and render real-world and synthetic scenes. In our system, a longitudinally aligned camera array is mounted on a rotating arm supported by a tripod. The cameras always point along the radial direction. The scene is captured as the camera array rotates along a circle. Each pixel of the captured images is indexed by four parameters: the rotation angle of the camera array, the longitudinal number of the camera, the image column number and the image row number. Given the position and viewing direction of an observer, the system can generate novel views in real time by interpolating the captured pixels, without any geometric representation. If the observer is constrained to move on a plane, the size of the scene data can be further reduced to that of an approximately 3.5D plenoptic function. Compared with the light field and the Lumigraph, our method provides an easier inside-looking-out capture configuration and a uniform spatial sampling pattern. Our system goes a step further than concentric mosaics by allowing users to move continuously within a 3D cylindrical space, so that users can experience significant lateral as well as longitudinal parallax and lighting changes in a scene. Moreover, our method provides an image-based solution for wandering through a large environment by concatenating multiple wandering circles. Our technique has potential applications in entertainment, e-commerce and communication.
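The four-parameter indexing described above can be sketched as a flat array with a computed offset. The class and method names below are hypothetical, and the `sample` method interpolates along the rotation angle only, which is a deliberate simplification: the description does not specify the interpolation scheme, and a real renderer would presumably also blend across cameras and image columns depending on the viewing ray.

```python
import math


class ArrayMosaicData:
    """Captured pixels indexed by (angle index, camera index, column, row).

    Hypothetical container for illustration; not the system's actual format.
    """

    def __init__(self, num_angles, num_cameras, width, height):
        self.num_angles = num_angles
        self.num_cameras = num_cameras
        self.width = width
        self.height = height
        self._pixels = [0.0] * (num_angles * num_cameras * width * height)

    def _offset(self, angle_idx, camera_idx, col, row):
        angle_idx %= self.num_angles  # the capture circle wraps around
        return ((angle_idx * self.num_cameras + camera_idx) * self.width + col) * self.height + row

    def set(self, angle_idx, camera_idx, col, row, value):
        self._pixels[self._offset(angle_idx, camera_idx, col, row)] = value

    def get(self, angle_idx, camera_idx, col, row):
        return self._pixels[self._offset(angle_idx, camera_idx, col, row)]

    def sample(self, angle_deg, camera_idx, col, row):
        """Linearly blend the two captured angles nearest to angle_deg."""
        pos = (angle_deg % 360.0) / 360.0 * self.num_angles
        lower = math.floor(pos)
        weight = pos - lower
        a = self.get(lower, camera_idx, col, row)
        b = self.get(lower + 1, camera_idx, col, row)
        return (1.0 - weight) * a + weight * b
```

With four capture angles (0, 90, 180 and 270 degrees), a request at 45 degrees returns the average of the pixels stored at angle indices 0 and 1, and a request at 315 degrees blends index 3 with index 0 because the circle wraps.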
(3) Image-based Walkthrough over the Internet
In this project, we describe a client-server image-based walkthrough system. The system employs pictures, panoramas and concentric mosaics captured from real scenes. These data are compressed and stored on a remote server, and can be retrieved over the Internet. Rather than transmitting a video sequence frame by frame, the client allows users to selectively retrieve image segments for specific viewpoints and viewing directions. This selective retrieval is achieved through a client-server communication protocol. Cache strategies are designed to ensure a smooth viewing experience for users. Experiments show that within a local area network, our system can reach over 15 frames per second (fps) at a resolution of 360 x 288 pixels while the user is rotating, and 3 fps while the user is translating. Even with a 56 Kbps modem, our system can still reach over 15 fps at a resolution of 180 x 144 pixels for rotation and 1 fps for translation. With a partial cache scheme, our system uses no more than 1 MB of RAM at the client end. Our system is therefore particularly suitable for online virtual tours, shopping and communication on handheld PCs, palm-size PCs or mobile phones.
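One way to picture a client-side cache for selectively retrieved segments is a size-bounded, least-recently-used store keyed by viewpoint and viewing direction. The sketch below is an assumption made for illustration: the description does not state which cache strategies the system uses, and `SegmentCache`, the `fetch` callback and the default capacity (chosen to echo the 1 MB figure) are hypothetical.

```python
from collections import OrderedDict


class SegmentCache:
    """Size-bounded LRU cache of image segments (illustrative only)."""

    def __init__(self, fetch, capacity_bytes=1_000_000):
        self._fetch = fetch              # hypothetical callback: asks the server for one segment
        self._capacity = capacity_bytes
        self._used = 0
        self._segments = OrderedDict()   # (viewpoint, direction) -> bytes, oldest first
        self.requests_sent = 0

    def get(self, viewpoint, direction):
        key = (viewpoint, direction)
        if key in self._segments:
            self._segments.move_to_end(key)  # mark as most recently used
            return self._segments[key]
        data = self._fetch(viewpoint, direction)
        self.requests_sent += 1
        self._segments[key] = data
        self._used += len(data)
        # Evict least recently used segments until the cache fits again,
        # always keeping at least the segment that was just fetched.
        while self._used > self._capacity and len(self._segments) > 1:
            _, evicted = self._segments.popitem(last=False)
            self._used -= len(evicted)
        return data
```

When the user rotates in place, nearby viewing directions tend to hit segments that are already cached, so no request is sent; a segment is requested from the server only on a miss.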
(4) Real Time Motion Analysis for Video Content Understanding
Video motion analysis and its applications have been a classic research topic for decades. In this paper, we explore the problem of real-time video semantics understanding based on motion information. The work can be divided into two parts: global (camera) motion estimation and object motion analysis. The former involves optical flow analysis and semantic meaning parsing, and the latter involves object detection and tracking. Although each of these topics has been studied extensively in the literature, a complete system that combines all of them without human intervention, especially for real-time semantic understanding, still merits further investigation. In this paper we develop our approach toward this goal and propose an integrated architecture. The usability and efficiency of the proposed system have been demonstrated through experiments. The results of this project have numerous applications in digital entertainment, such as video and image summarization, annotation, retrieval and editing. |