Tao Chen


I am a postdoctoral researcher at Columbia University working with Professor Shih-Fu Chang. Before that I was a postdoctoral researcher in the Department of Computer Science and Technology at Tsinghua University, China, and in the School of Computer Science at Tel Aviv University, Israel. I received my doctoral and bachelor's degrees from the Department of Computer Science and Technology and the Department of Physics (Fundamental Science Class), Tsinghua University, in 2011 and 2005, respectively.

My research interests include: computer graphics; image/video processing, editing and composition; social network multimedia. [ CV ]

Personal info: I was born on Feb. 3, 1984, in Yantai, Shandong Province, China.
My contact info:

Email: taochen (at) ee.columbia.edu
Address: 500 W. 120th St. Rm 1312, Department of Electrical Engineering, Columbia University, New York, NY 10027, USA



Assistive Image Comment Robot – A Novel Mid-Level Concept-Based Representation
Yan-Ying Chen, Tao Chen, Taikun Liu, Hong-Yuan Mark Liao, Shih-Fu Chang
IEEE Transactions on Affective Computing. Accepted.

We present a general framework and working system for predicting likely affective responses of viewers in the social media environment after an image is posted online. Our approach emphasizes a mid-level concept representation, in which the intended affects of the image publisher are characterized by a large pool of visual concepts (termed PACs) detected directly from image content instead of textual metadata, evoked viewer affects are represented by concepts (termed VACs) mined from online comments, and statistical methods are used to model the correlations between these two types of concepts. We demonstrate the utility of this approach by developing an end-to-end Assistive Comment Robot application, which further includes components for multi-sentence comment generation, interactive interfaces, and relevance feedback functions. Through user studies, we showed that machine-suggested comments were accepted by users for online posting in 90% of completed user sessions, while very favorable results were also observed in various dimensions (plausibility, preference, and realism) when assessing the quality of the generated image comments.

[ paper available soon ] [ video ]


Object-Based Visual Sentiment Concept Analysis and Application
Tao Chen, Felix X. Yu, Jiawei Chen, Yin Cui, Yan-Ying Chen, Shih-Fu Chang
ACM Multimedia 2014 (full paper).

This paper studies the problem of modeling object-based visual concepts such as "crazy car" and "shy dog" with the goal of extracting emotion-related information from social multimedia content. We focus on detecting such adjective-noun pairs because of their strong co-occurrence relation with image tags about emotions. This problem is very challenging due to the highly subjective nature of adjectives like "crazy" and "shy". However, associating adjectives with concrete physical nouns makes the combined visual concepts more detectable and tractable. We propose a hierarchical system to handle the concept classification in an object-specific manner and decompose the hard problem into object localization and sentiment-related concept modeling. In order to resolve the ambiguity of concepts, we propose a novel classification approach that models concept similarity, leveraging an online commonsense knowledge base. The proposed framework also allows us to interpret the classifiers by discovering discriminative features. Comparisons between our method and several baselines show great improvement in classification performance. We further demonstrate the power of the proposed system through a few novel applications such as a sentiment-aware music slide show of personal albums.

[ paper ] [ video available soon ]


Predicting Viewer Affective Comments Based on Image Content in Social Media
Yan-Ying Chen, Tao Chen, Winston H. Hsu, Hong-Yuan Mark Liao, Shih-Fu Chang
ACM ICMR 2014 (full paper).

Visual sentiment analysis is getting increasing attention because of the rapidly growing number of images in online social interactions and several emerging applications such as online propaganda and advertisement. Recent studies have shown promising progress in analyzing visual affect concepts intended by the media content publisher. In contrast, this paper focuses on predicting what viewer affect concepts will be triggered when the image is perceived by the viewers. For example, given an image tagged with "yummy food", the viewers are likely to comment "delicious" and "hungry", which we refer to as viewer affect concepts (VAC) in this paper. To the best of our knowledge, this is the first work explicitly distinguishing intended publisher affect concepts and induced viewer affect concepts associated with social visual content, and aiming at understanding their correlations. We present around 400 VACs automatically mined from million-scale real user comments associated with images in social media. Furthermore, we propose an automatic visual-based approach to predict VACs by first detecting publisher affect concepts in image content and then applying statistical correlations between such publisher affect concepts and the VACs. We demonstrate major benefits of the proposed methods in several real-world tasks: recommending images to invoke certain target VACs among viewers, increasing the accuracy of predicting VACs by 20.1%, and finally developing a social assistant tool that may suggest plausible, content-specific and desirable comments when users view new images.

[ paper ] [ bibtex ]
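
The two-step idea in the abstract above (detect publisher affect concepts, then map them to VACs through statistical correlations) can be illustrated with a toy sketch. This is not the paper's model: the smoothed co-occurrence estimate, the score-weighted averaging, the function names, and the tiny data set are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_vac_given_pac(pairs, smoothing=1.0):
    """Estimate P(VAC | PAC) from (pac, vac) co-occurrence pairs.

    Uses additive smoothing; this estimator is an illustrative assumption.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for pac, vac in pairs:
        counts[pac][vac] += 1
        vocab.add(vac)
    vocab = sorted(vocab)
    table = {}
    for pac, c in counts.items():
        total = sum(c.values()) + smoothing * len(vocab)
        table[pac] = {v: (c[v] + smoothing) / total for v in vocab}
    return table


def predict_vacs(pac_scores, table, top_k=2):
    """Rank VACs by the detector-score-weighted average of P(VAC | PAC)."""
    weight = sum(s for p, s in pac_scores.items() if p in table)
    if weight == 0:
        return []  # no known publisher concept was detected
    scores = Counter()
    for pac, s in pac_scores.items():
        for vac, prob in table.get(pac, {}).items():
            scores[vac] += s * prob / weight
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [vac for vac, _ in ranked[:top_k]]


# Hypothetical (publisher concept, viewer comment concept) observations.
pairs = [
    ("yummy food", "delicious"), ("yummy food", "hungry"),
    ("yummy food", "delicious"), ("cute dog", "adorable"),
    ("cute dog", "adorable"), ("cute dog", "hungry"),
]
table = estimate_vac_given_pac(pairs)
# Hypothetical detector scores for one new image.
ranked = predict_vacs({"yummy food": 0.9, "cute dog": 0.1}, table)
print(ranked)
```

With these made-up inputs, the image dominated by "yummy food" ranks "delicious" and "hungry" highest, matching the example given in the abstract.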


Modeling Attributes from Category-Attribute Proportions
Felix X. Yu, Liangliang Cao, Michele Merler, Noel Codella, Tao Chen, John Smith, Shih-Fu Chang
ACM Multimedia 2014 (short paper).

Attribute-based representation has been widely used in visual recognition and retrieval due to its interpretability and cross-category generalization properties. However, classic attribute learning requires manually labeling attributes on the images, which is very expensive and not scalable. In this paper, we propose to model attributes from category-attribute proportions. The proposed framework can model attributes without attribute labels on the images. Specifically, given a multi-class image dataset with N categories, we model an attribute based on an N-dimensional category-attribute proportion vector, where each element of the vector characterizes the proportion of images in the corresponding category having the attribute. The attribute learning can be formulated as a learning with label proportion (LLP) problem. Our method is based on a newly proposed machine learning algorithm called ∝SVM. We show that the category-attribute proportions can be estimated from multiple modalities such as human commonsense knowledge, NLP tools, and other domain knowledge. The value of the proposed approach is demonstrated by various applications including modeling animal attributes, visual sentiment attributes, and scene attributes.

[ paper ]
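
To make the proportion-based supervision in the abstract above concrete, the sketch below checks how closely a set of binary attribute predictions matches a given category-attribute proportion vector. It only evaluates a proportion-matching error; it does not implement the learning algorithm used in the paper. The function names, the mean absolute error measure, and the toy data are assumptions for illustration.

```python
def predicted_proportions(predictions, categories, n_categories):
    """Per category, the fraction of images predicted to have the attribute."""
    positives = [0] * n_categories
    totals = [0] * n_categories
    for pred, cat in zip(predictions, categories):
        totals[cat] += 1
        positives[cat] += 1 if pred else 0
    # Categories with no images get proportion 0.0 (an arbitrary convention).
    return [p / t if t else 0.0 for p, t in zip(positives, totals)]


def proportion_error(predicted, target):
    """Mean absolute gap between predicted and target proportions."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


# Toy data: 10 images in N = 3 categories, with binary attribute predictions.
categories = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
predictions = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
# Hypothetical category-attribute proportion vector (one entry per category).
target = [1.0, 0.0, 0.5]

props = predicted_proportions(predictions, categories, 3)
error = proportion_error(props, target)
print(props, error)
```

A learning-with-label-proportions method would search for a classifier whose predictions keep this kind of error small, without ever seeing per-image attribute labels.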


3-Sweep: Extracting Editable Objects from a Single Photo
Tao Chen, Zhe Zhu, Ariel Shamir, Shi-Min Hu, Daniel Cohen-Or
ACM Transactions on Graphics (TOG). Vol. 32. No. 6. Siggraph Asia 2013.

We introduce an interactive technique for manipulating simple 3D shapes based on extracting them from a single photograph. Such extraction requires understanding of the components of the shape, their projections, and relations. These simple cognitive tasks for humans are particularly difficult for automatic algorithms. Thus, our approach combines the cognitive abilities of humans with the computational accuracy of the machine to solve this problem. Our technique provides the user the means to quickly create editable 3D parts: human assistance implicitly segments a complex object into its components, and positions them in space. In our interface, three strokes are used to generate a 3D component that snaps to the shape's outline in the photograph, where each stroke defines one dimension of the component. The computer reshapes the component to fit the image of the object in the photograph as well as to satisfy various inferred geometric constraints imposed by its global 3D structure. We show that with this intelligent interactive modeling tool, the daunting task of object extraction is made simple. Once the 3D object has been extracted, it can be quickly edited and placed back into photos or 3D scenes, permitting object-driven photo editing tasks which are impossible to perform in image-space. We show several examples and present a user study illustrating the usefulness of our technique.

[ paper ] [ video ] [ project page (TAU&IDC) ] [ project page (THU) ] [ bibtex ]


Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs
Damian Borth, Rongrong Ji, Tao Chen, Thomas Breuel and Shih-Fu Chang
ACM Multimedia Brave New Idea, Barcelona, Spain, Oct 2013.

To address the challenge of sentiment analysis from visual content, we propose a novel approach based on understanding of the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP). Second, we propose SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image. Experiments on detecting sentiment of image tweets demonstrate significant improvement in detection accuracy when comparing the proposed SentiBank based predictors with the text-based approaches. The effort also leads to a large publicly available resource consisting of a visual sentiment ontology, a large detector library, and the training/testing benchmark for visual sentiment analysis.

[ paper ] [ video ] [ project page ] [ bibtex ]


Towards a Comprehensive Computational Model for Aesthetic Assessment of Videos
Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang and Mubarak Shah
ACM Multimedia Grand Challenge, Barcelona, Spain, Oct 2013.

In this paper we propose a novel aesthetic model emphasizing psychovisual statistics extracted from multiple levels in contrast to earlier approaches that rely only on descriptors suited for image recognition or based on photographic principles. Our approach demonstrates strong correlation with human prediction on 1,000 broadcast quality videos released by NHK as an aesthetic evaluation dataset.

[ paper ] [ bibtex ]


SentiBank: Large-Scale Ontology and Classifiers for Detecting Sentiment and Emotions in Visual Content
Damian Borth, Tao Chen, Rong-Rong Ji and Shih-Fu Chang
ACM Multimedia, Barcelona, Spain, Oct 2013.

We demonstrate a novel system which combines sound structures from psychology and the folksonomy extracted from social multimedia to develop a large visual sentiment ontology consisting of 1,200 concepts and associated classifiers called SentiBank. Each concept, defined as an Adjective Noun Pair (ANP), is made of an adjective strongly indicating emotions and a noun corresponding to objects or scenes that have a reasonable prospect of automatic detection. We demonstrate novel applications made possible by SentiBank including live sentiment prediction of social media and visualization of visual content in a rich intuitive semantic space.

[ paper ] [ video1 ] [ video2 ] [ project page ] [ bibtex ]


Internet visual media processing: a survey with graphics and vision applications
Shi-Min Hu, Tao Chen, Kun Xu, Ming-Ming Cheng, Ralph R. Martin
The Visual Computer, Vol. 29, No. 5, 393-405. 2013.

In recent years, the computer graphics and computer vision communities have devoted significant attention to research based on Internet visual media resources. The huge number of images and videos continually being uploaded by millions of people has stimulated a variety of visual media creation and editing applications, while also posing serious challenges of retrieval, organization, and utilization. This article surveys recent research on processing large collections of images and video, including work on analysis, manipulation, and synthesis. It discusses the problems involved, and suggests possible future directions in this emerging research area.

[ paper ] [ bibtex ]


Motion-Aware Gradient Domain Video Composition
Tao Chen, Jun-Yan Zhu, Ariel Shamir, Shi-Min Hu
IEEE Transactions on Image Processing, Vol. 22, No. 7, 2532-2544. 2013.

Gradient domain composition methods like Poisson blending offer practical solutions for uncertain object boundaries and differences in illumination conditions. However, adapting Poisson image blending to videos faces new challenges due to the addition of the temporal dimension. In videos, the human eye is sensitive to small changes in the blending boundaries across frames, and to slight differences in the motion of the source patch and the target video. We present a novel video blending approach that tackles these problems by merging the gradients of the source and target videos and optimizing a consistent blending boundary according to a user-provided blending trimap for the source video. We extend mean-value coordinates interpolation to support hybrid blending with a dynamic boundary while maintaining interactive performance. We also provide a user interface and a source object positioning method that can efficiently deal with complex video sequences beyond the capability of alpha blending.

[ paper ] [ video ] [ bibtex ]


PoseShop: Human Image Database Construction and Personalized Content Synthesis
Tao Chen, Ping Tan, Li-Qian Ma, Ming-Ming Cheng, Ariel Shamir, Shi-Min Hu
IEEE Transactions on Visualization and Computer Graphics, Vol. 19, No. 5, 824-837. 2013.

We present PoseShop, a pipeline to construct a segmented human image database with minimal manual intervention. By downloading, analyzing, and filtering massive amounts of human images from the Internet, we obtain a database which contains 400 thousand human figures that are segmented out of their background. The human figures are organized based on action semantics and clothes attributes, and indexed by the shape of their poses. They can be queried using either a silhouette sketch or a skeleton to find a given pose. We demonstrate applications of this database to multi-frame personalized content synthesis in the form of comic strips, where the main character is the user or his/her friends. We address the two challenges of such synthesis, namely personalization and consistency over a set of frames, by introducing head swapping and clothes swapping techniques. We also demonstrate an action correlation analysis application to show the usefulness of the database for vision applications.

[ paper ] [ video ] [ supplemental ] [ online human database ] [ bibtex ]


Data-Driven Object Manipulation in Images
Chen Goldberg, Tao Chen, Fang-Lue Zhang, Ariel Shamir, Shi-Min Hu
Computer Graphics Forum, vol. 31, no. 2pt1, pp. 265-274. Eurographics 2012.

We present a framework for interactively manipulating objects in a photograph using related objects obtained from internet images. Given an image, the user selects a scene-object to modify, and provides keywords to describe it. The application then retrieves and segments objects with a similar shape from online images matching the keyword, and deforms them to correspond with the selected object. By matching the candidate object and adjusting manipulation parameters, the application appropriately modifies candidate objects and composites them into the scene. Supported manipulations include transferring texture, color and shape from the matched object to the target in a seamless manner. We demonstrate the versatility of our framework using several inputs of varying complexity, showing applications to object completion, augmentation, replacement and revealing. We also present an evaluation of our results with a user study.

[ paper ] [ video ] [ user study ] [ bibtex ]


Visual Storylines: Semantic Visualization of Movie Sequence
Tao Chen, Aidong Lu, Shi-Min Hu
Computers & Graphics 36.4: 241-249. 2012.

This paper presents a video summarization approach that automatically extracts and visualizes movie storylines in a static image for the purposes of efficient representation and quick overview. A new type of video visualization, Visual Storylines, is designed to summarize video storylines in a succinct visual format while preserving the elegance of the original videos. This is achieved with a series of video analysis, image synthesis, relationship quantification and geometric layout optimization techniques. Specifically, we analyze video contents and quantify video story unit relationships automatically by clustering video shots according to both visual and audio data. A multi-level storyline visualization method then organizes and synthesizes a suitable amount of representative information, including locations and objects and characters of interest, with the assistance of special visual languages, according to the relationships between video story units and the temporal structure of the video sequence. Several results demonstrate that our approach is able to abstract the storylines of professionally edited video such as commercial movies and TV series. Preliminary user studies have been performed to evaluate our approach, and the results show that it can help viewers grasp video contents efficiently, especially when they are familiar with the context of the video or a text synopsis is provided.

[ paper ] [ supplemental ] [ bibtex ]

Sketch2Photo: Internet Image Montage
Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, Shi-Min Hu
ACM Transactions on Graphics (TOG). Vol. 28. No. 5. Siggraph Asia 2009.

We present a system that composes a realistic picture from a simple freehand sketch annotated with text labels. The composed picture is generated by seamlessly stitching several photographs in agreement with the sketch and text labels; these are found by searching the Internet. Although online image search generates many inappropriate results, our system is able to automatically select suitable photographs to generate a high quality composition, using a filtering scheme to exclude undesirable images. We also provide a novel image blending algorithm to allow seamless image composition. Each blending result is given a numeric score, allowing us to find an optimal combination of discovered images. Experimental results show the method is very successful; we also evaluate our system using the results from two user studies.

[ paper ] [ project page ] [ bibtex ]

Vectorizing Cartoon Animations
Song-Hai Zhang, Tao Chen, Yi-Fei Zhang, Shi-Min Hu, Ralph R. Martin
IEEE Transactions on Visualization and Computer Graphics, 15(4), 618-629. 2009.

We present a system for vectorizing 2D raster format cartoon animations. The output animations are visually flicker free, smaller in file size, and easy to edit. We identify decorative lines separately from coloured regions. We use an accurate and semantically meaningful image decomposition algorithm which supports an arbitrary color model for each region. To ensure temporal coherence in the output cartoon, we reconstruct a universal background for all frames, and separately extract foreground regions. Simple user-assistance is required to complete the background. Each region and decorative line is vectorized and stored together with its motion from frame to frame.

[ paper ] [ video ] [ bibtex ]

Video-Based Running Water Animation in Chinese Painting Style
Song-Hai Zhang, Tao Chen, Yi-Fei Zhang, Shi-Min Hu, Ralph R. Martin
Science in China Series F: Information Sciences, 52(2), 162-171. 2009.

This paper presents a novel algorithm for synthesizing animations of running water, such as waterfalls and rivers, in the style of Chinese paintings, for applications such as cartoon making. All video frames are first registered in a common coordinate system, simultaneously segmenting the water from background and computing optical flow of the water. Taking artists' advice into account, we produce a painting structure to guide painting of brush strokes. Flow lines are placed in the water following an analysis of variance of optical flow, to cause strokes to be drawn where the water is flowing smoothly, rather than in turbulent areas: this allows a few moving strokes to depict the trends of the water flows. A variety of brush strokes is then drawn using a template determined from real Chinese paintings. The novel contributions of this paper are: a method for painting structure generation for flows in videos, and a method for stroke placement, with the necessary temporal coherence.

[ paper ] [ video ] [ bibtex ]




  • Natural Science Award of Ministry of Education, China, First Class, 2012.
  • China Computer Federation Best Dissertation Award, 2011.
  • Best Dissertation Award of Tsinghua University, 2011.
  • The Netexplorateur Internet Invention Award, 2010.
  • "LuZengYong" CAD&CG High Technology Award, 2009.

Academic Services



My Supervisor Shih-Fu Chang's group and personal homepage.

My former Supervisor Shi-Min Hu's group and personal homepage.

My collaborators' and friends' homepages: Ralph R. Martin, Ariel Shamir, Daniel Cohen-Or, Ping Tan, Aidong Lu, Ming-Ming Cheng, Kun Xu, Yong Li.