Dong-Qing Zhang 

Email: dqzhang at

Ph.D. Student in the Digital Video and Multimedia Lab (DVMM),
Department of Electrical Engineering, Columbia University.

Research Interests

My research interests are in artificial intelligence, multimedia content analysis, high-level computer vision, and data mining using statistical and machine learning methods. My Ph.D. research focuses on statistical relational (part-based) models for visual content analysis, including image matching, visual concept detection, and related problems.

Recent Research

1. Learning Random Attributed Relational Graph for Part-based Object/scene Detection

With the popularity of digital cameras and camcorders, the volume of visual content such as photos and videos has increased dramatically in recent years. This rapid accumulation of content calls for efficient and accurate indexing and search. Users are usually more interested in searching photos and videos at the semantic level than at the traditional visual feature level. Semantic visual indexing therefore indexes visual content by tagging photos or video segments with semantic labels, known as visual concepts, which include single or composite objects, visual scenes, events, or their compositions. Because the number of visual concepts to index is large, the system needs to learn the concept models automatically from training data rather than rely on models designed manually by researchers.

We have developed a novel model called the Random Attributed Relational Graph for part-based static concept (object and scene) detection. The model extends the traditional Attributed Relational Graph or Random Graph by attaching random variables to the graph; these variables capture the statistics of part appearance features and part relational features. Random Attributed Relational Graphs can be learned from training images in an unsupervised manner (i.e., without labeling the parts), and they offer high accuracy in detecting objects and scenes as well as higher learning speed than previous methods. For more details, please see our object detection web site.

2. Learning-based Attributed Relational Graph Matching for Part-based Image Similarity

In this project, we aim to develop a part-based image similarity measure for detecting Image Near-Duplicates in image databases. In recent years, evidence from computer vision research has shown the promise of part-based models for object recognition and scene understanding. This motivates us to develop a part-based image similarity measure for accurate image matching and retrieval.
We realize part-based modeling using the Attributed Relational Graph (ARG), an extension of the ordinary graph that attaches real-valued, multidimensional attributes to the vertices and edges. We propose a novel transformation-based framework for ARG similarity, which defines the similarity as the likelihood ratio of whether or not the data graph is transformed from the model graph. This framework not only offers a principled definition of ARG similarity but also provides a way to learn ARG similarity from training data in an unsupervised manner. For more details, please see our Image Near-Duplicate Detection web site.
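
The likelihood-ratio idea can be illustrated with a minimal sketch. It is not the framework itself: the fixed vertex correspondence, the isotropic Gaussian noise model, the zero-mean background model, and all names (`ARG`, `log_likelihood_ratio`, `sigma_t`, `sigma_bg`) are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph: real-valued attribute vectors on vertices and edges."""
    vertex_attrs: dict = field(default_factory=dict)  # vertex id -> tuple of floats
    edge_attrs: dict = field(default_factory=dict)    # (u, v) -> tuple of floats


def log_gaussian(x, mean, sigma):
    """Log density of an isotropic Gaussian N(mean, sigma^2 I) evaluated at x."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * sq / sigma ** 2 - len(x) * math.log(sigma * math.sqrt(2 * math.pi))


def log_likelihood_ratio(model, data, match, sigma_t=0.1, sigma_bg=1.0):
    """Toy log likelihood ratio for a FIXED correspondence `match` (model id -> data id).

    H1 (assumed): each data attribute is the matched model attribute plus
        Gaussian noise with standard deviation sigma_t.
    H0 (assumed): each data attribute comes from a broad zero-mean background
        Gaussian with standard deviation sigma_bg.
    """
    score = 0.0
    for m_id, d_id in match.items():
        x = data.vertex_attrs[d_id]
        score += log_gaussian(x, model.vertex_attrs[m_id], sigma_t)
        score -= log_gaussian(x, (0.0,) * len(x), sigma_bg)
    for (mu, mv), attr in model.edge_attrs.items():
        key = (match.get(mu), match.get(mv))
        if key in data.edge_attrs:  # only edges present in both graphs contribute
            x = data.edge_attrs[key]
            score += log_gaussian(x, attr, sigma_t)
            score -= log_gaussian(x, (0.0,) * len(x), sigma_bg)
    return score
```

A positive score favors the hypothesis that the data graph is a transformed copy of the model graph; a negative score favors the background hypothesis. A full treatment would also have to account for the unknown correspondence, which this sketch simply takes as given.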

3. Visual Text Detection and Recognition for Visual Indexing

Visual text is text embedded in visual scenes or overlaid on photos or videos. Since visual text is highly correlated with the visual content of the photos and videos, detecting and recognizing it is very useful for visual indexing and search. In the past, we have explored several problems related to visual text detection and recognition, including Overlay Text Extraction, Videotext Recognition, Sports Video Summarization by Videotext Recognition, and scene text detection.

4. Graph Theoretical Methods for Multimedia Data Mining

The vector space model has been the most popular model in text/multimedia search and clustering applications. However, much real-world data, such as graphs or trees, is difficult to model as feature vectors. This project aims to apply graph theoretical methods to multimedia data clustering, where the data and their similarities are modeled as weighted graphs without resorting to a vector-based representation. Video clustering is realized by partitioning the weighted graphs into disjoint subgraphs. We have explored spectral clustering methods in the context of bipartite or k-partite graphs for News Video Threading across Sources. For more details, see our publication.
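
As a hedged illustration of spectral partitioning on a bipartite graph, the sketch below splits the two vertex sets using the second singular vector of the degree-normalized weight matrix, a standard spectral co-clustering formulation. It is not necessarily the method used in the publication. The function name, the power-iteration solver, and the sign-based two-way split are assumptions, and every row and column is assumed to have positive total weight.

```python
import math


def bipartite_spectral_bipartition(A, iters=300):
    """Split the rows and columns of a nonnegative weight matrix A into two groups.

    A[i][j] is the edge weight between left vertex i and right vertex j.
    The split uses the second left/right singular vectors of
    D1^{-1/2} A D2^{-1/2}, found by power iteration with deflation.
    """
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                             # left-vertex degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]  # right-vertex degrees
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)] for i in range(m)]

    # The leading left singular vector is known in closed form: sqrt(d1), normalized.
    total = math.sqrt(sum(d1))
    u1 = [math.sqrt(d) / total for d in d1]

    def right(u):  # An^T u
        return [sum(An[i][j] * u[i] for i in range(m)) for j in range(n)]

    def left(v):   # An v
        return [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]

    u = [float(i + 1) for i in range(m)]                     # deterministic start vector
    for _ in range(iters):
        u = left(right(u))                                   # one step of (An An^T) u
        proj = sum(a * b for a, b in zip(u, u1))
        u = [a - proj * b for a, b in zip(u, u1)]            # remove the trivial component
        norm = math.sqrt(sum(a * a for a in u))
        u = [a / norm for a in u]
    v = right(u)                                             # matching right vector (unnormalized)

    # Undo the degree scaling and split by sign. The scaling does not change
    # signs, but it matters if k-means replaces the sign split for k > 2.
    rows = [0 if u[i] / math.sqrt(d1[i]) < 0 else 1 for i in range(m)]
    cols = [0 if v[j] / math.sqrt(d2[j]) < 0 else 1 for j in range(n)]
    return rows, cols
```

Left and right vertices that receive the same label fall in the same subgraph. Which group is labeled 0 and which is labeled 1 depends on the sign the iteration converges to, so only the grouping is meaningful.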


Demo Snapshots

Past Projects

1. Audio processing class final project

2. Facial expression recognition (in KRDL, Singapore)

Last Update: June 2005.