Semantic Concept Classification by Joint Semi-supervised Learning of Feature Subspaces and Support Vector Machines

Back to Project List



Exploding amounts of multimedia data increasingly require automatic indexing and classification, e.g. training classifiers to produce high-level features, or semantic concepts, chosen to represent image content, like car, person, etc. The scarcity of labeled training data relative to the high-dimensionality multi-modal features is one of the major obstacles for semantic concept classification of images and videos. Semi-supervised learning leverages the large amount of unlabeled data in developing effective classifiers. Feature subspace learning finds optimal feature subspaces for representing data and helping classification. In this paper, we present a novel algorithm, Locality Preserving Semi-supervised Support Vector Machines (LPSSVM), to jointly learn an optimal feature subspace as well as a large margin SVM classifier. Over both labeled and unlabeled data, an optimal feature subspace is learned that can maintain the smoothness of local neighborhoods as well as being discriminative for classification. Simultaneously, an SVM classifier is optimized in the learned feature subspace to have large margin. The resulting classifier can be readily used to handle unseen test data. Additionally, we show that the LPSSVM algorithm can be used in a Reproducing Kernel Hilbert Space for nonlinear classification.

Locality Preserving Semi-supervised SVM

To jointly learn an optimal feature subspace as well as a large margin SVM in a semi-supervised manner, we propose an algorithm called Locality Preserving Semi-supervised SVM (LPSSVM). The cost function is described below. We atim at learning an optimal feature subspace determined by projection matrix a so that the local smoothness revealed by both labeled data XL and unlabeled data XU can be preserved, and the learned feature subspace is discriminative for classifying labeled data XL. Simultaneously we want to learn an optimal large margin SVM classifier in the projected feature subspace.

With a generally used assumption that we pursue the projection matrix in the span of seen data points:


we can obtain an iterative optimization process through coordinate ascent:

The final LPSSVM algorithm is summarized in as below:


Experiments on UCI data

We compare our proposed LPSSVM algorithm with some techniques considered state of the arts, including supervised SVM, semi-supervised LapSVM and LapRLS. Also, we compare with a naive approach, LPP+SVM, to combine LPP and SVM. First kernel-based LPP is applied to get a projection matrix and then an SVM is built over the projected feature vectors. The figures below show the performance comparison over two UCI data sets, where LPSSVM consistently outperforms others when we change the number of labeled training data.


Experiments on Caltech 101

We also test on Caltech 101 image classification task. We use the bag-of-features representation with local SIFT descriptors. We construct a visual codebook containing 50 codewords. The Spatial Pyramid Match kernel with 3 spatial levels is used over the bag-of-features representation and the SPM kernel is fed into SVM for classification. The figure below shows the results. Our method performs much better than other compared semi-supervised algorithms over this task and can outperform supervised SVM with SPM kernels.


Experiments on Consumer Videos

We use a challenging consumer's video set for evaluation, which contains 1358 videos labeled to 21 semantic concepts. It is a multi-label corpus since each keyframe can be labeled to multiple concepts. 5166 keyframes are extracted from the videos. The figure below gives an example keyframe for each concept.

Both visual features like color, texture, edge, and audio features like MFCC coefficients are extracted, and visual features and audio features are concatenated to generate a 2896-dim multi-modal feature vector. 136 videos are randomly sampled as labeled training data, and the rest as unlabeled data (also for evaluation). The figure below shows the performance comparison, where LPSSVM significantly outperforms others in terms of MAP in general. LPSSVM performs particularly well over concepts with strong cues from both visual and audio channels, e.g., crowd, parade, sports, etc., where LPSSVM is able to find discriminative feature subspaces from the high-dimension multi-modal feature space.




  1. Wei Jiang, Shih-Fu Chang, Tony Jebara, Alexander Loui. Semantic concept classification by joint semi-supervised learning of feature subspaces and support vector machines. In European Conference on Computer Vision, Pages 270-283, 2008. [pdf]

For problems or questions regarding this web site contact The Web Master.
Last updated: Janunary 25th, 2009.