|
|
Summary
Exploding amounts of multimedia data increasingly require automatic indexing and classification,
e.g. training classifiers to produce high-level features, or semantic concepts, chosen to represent image
content, like car, person, etc. The scarcity of labeled training data relative to the high-dimensionality multi-modal features
is one of the major obstacles for semantic concept classification of images and videos. Semi-supervised learning leverages
the large amount of unlabeled data in developing effective classifiers. Feature subspace learning finds
optimal feature subspaces for representing data and helping classification. In this paper, we present a
novel algorithm, Locality Preserving Semi-supervised Support Vector Machines (LPSSVM), to jointly learn
an optimal feature subspace as well as a large margin SVM classifier. Over both labeled and unlabeled
data, an optimal feature subspace is learned that can maintain the smoothness of local neighborhoods as
well as being discriminative for classification. Simultaneously, an SVM classifier is optimized in the
learned feature subspace to have large margin. The resulting classifier can be readily used to handle
unseen test data. Additionally, we show that the LPSSVM algorithm can be used in a Reproducing Kernel
Hilbert Space for nonlinear classification.
Locality Preserving Semi-supervised SVM
To jointly learn an optimal feature subspace as well as a large margin SVM in a semi-supervised manner,
we propose an algorithm called Locality Preserving Semi-supervised SVM (LPSSVM). The cost function is
described below. We atim at learning an optimal feature subspace determined by projection matrix a
so that the local smoothness revealed by both labeled data XL and unlabeled data XU
can be preserved, and the learned feature subspace is discriminative for classifying labeled data XL.
Simultaneously we want to learn an optimal large margin SVM classifier in the projected feature subspace.

With a generally used assumption that we pursue the projection matrix in the span of seen data points:
,
we can obtain an iterative optimization process through coordinate ascent:
The final LPSSVM algorithm is summarized in as below:
Experiments on UCI data
We compare our proposed LPSSVM algorithm with some techniques considered state of the arts, including supervised SVM,
semi-supervised LapSVM and LapRLS. Also, we compare with a naive approach, LPP+SVM, to combine LPP and SVM. First
kernel-based LPP is applied to get a projection matrix and then an SVM is built over the projected feature vectors.
The figures below show the performance comparison over two UCI data sets, where LPSSVM consistently outperforms others
when we change the number of labeled training data.
Experiments on Caltech 101
We also test on Caltech 101 image classification task. We use the bag-of-features representation with local SIFT descriptors.
We construct a visual codebook containing 50 codewords. The Spatial Pyramid Match kernel with 3 spatial levels is used over the
bag-of-features representation and the SPM kernel is fed into SVM for classification. The figure below shows the results.
Our method performs much better than other compared semi-supervised algorithms over this task and can outperform
supervised SVM with SPM kernels.
Experiments on Consumer Videos
We use a challenging consumer's video set for evaluation, which contains 1358 videos labeled to 21 semantic concepts.
It is a multi-label corpus since each keyframe can be labeled to multiple concepts. 5166 keyframes are extracted from
the videos. The figure below gives an example keyframe for each concept.
Both visual features like color, texture, edge, and audio features like MFCC coefficients are extracted, and
visual features and audio features are concatenated to generate a 2896-dim multi-modal feature vector. 136 videos are randomly
sampled as labeled training data, and the rest as unlabeled data (also for evaluation). The figure below shows the performance
comparison, where LPSSVM significantly outperforms others in terms of MAP in general. LPSSVM performs particularly well over
concepts with strong cues from both visual and audio channels, e.g., crowd, parade, sports, etc., where LPSSVM is able to
find discriminative feature subspaces from the high-dimension multi-modal feature space.
People
Publication
-
Wei Jiang, Shih-Fu Chang, Tony Jebara, Alexander Loui.
Semantic concept classification by joint semi-supervised learning of feature subspaces
and support vector machines. In European Conference on Computer Vision, Pages 270-283, 2008.
[pdf]
For problems or questions
regarding this web site contact The
Web Master.
Last updated: Janunary 25th, 2009. |