Kernel-based Dynamic Subspace Method (KDSM)

COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION
I-Ling Chen1, Bor-Chen Kuo1, Chen-Hsuan Li2, Chih-Cheng Hung3
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.
Outline
• Introduction
  – Statement of problems
  – The objective
• Literature Review
  – Support Vector Machines: kernel method
  – Multiple Classifier System: random subspace method, dynamic subspace method
  – An optimal kernel method for selecting the RBF kernel parameter
• Optimal Kernel-based Dynamic Subspace Method
• Experimental Design and Results
• Conclusion and Future Work

INTRODUCTION
Research problem
• The Hughes phenomenon

Challenges
• Time-consuming training
• Choice of kernel function

Methods
• Support Vector Machines (SVM)
• Multiple Classifier System
  (Ho, T. K. (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.)
  (Yang, J-M., Kuo, B-C., Yu, P-T., & Chuang, C-H. (2010). A Dynamic Subspace Method for Hyperspectral Image Classification. IEEE Transactions on Geoscience and Remote Sensing, 48(7), 2840-2853.)

Hughes Phenomenon (Hughes, 1968)
• Also called the curse of dimensionality or the peaking phenomenon.
• With a small sample size N and a high dimensionality d (N ≪ d), classification performance is low.
Support Vector Machines (SVM)
• Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998).
• Robust and effective against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006).
• SVM includes:
  – the kernel trick
  – support vector learning
The Goal of Kernel Method for Classification
• The samples in the same class can be mapped into the same area.
• The samples in different classes can be mapped into different areas.
Support Vector Learning
• SV learning learns a linear separating hyperplane for a two-class classification problem from a given training set.
• Illustration of SV learning with the kernel trick: a nonlinear feature mapping $\phi$: original space → feature space.
[Figure: samples with labels $y_i = +1$ and $y_i = -1$ in the feature space; the support vectors lie on the margin hyperplanes $w^T \phi(x) + b = \pm 1$, and the optimal hyperplane is $w^T \phi(x) + b = 0$.]
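As a concrete illustration of SV learning with the RBF kernel trick, here is a minimal sketch on toy two-class data. It uses scikit-learn's SVC, which is our choice of library, not necessarily the implementation behind the talk:

```python
# Minimal sketch of SV learning with an RBF kernel (library choice is ours).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Toy two-class problem with labels -1 / +1, as in the hyperplane figure.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

# In scikit-learn, gamma = 1 / (2 * sigma**2) plays the role of the RBF width.
clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)

# Support vectors are the samples on (or inside) the margins
# w^T phi(x) + b = +/- 1; decision_function returns w^T phi(x) + b.
print("support vectors per class:", clf.n_support_)
print("decision values of first 3 samples:", clf.decision_function(X[:3]))
```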
Multiple Classifier System
• Approaches to building classifier ensembles (Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.)
• There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets (Ho, T. K., 1998; Yang, J-M., Kuo, B-C., Yu, P-T., & Chuang, C-H., 2010).
THE FRAMEWORK OF RANDOM SUBSPACE METHOD
(RSM) BASED ON SVM (HO, 1998)
Given the learning algorithm, SVM, and the ensemble size, S.
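Conceptually, RSM trains each base SVM on a feature subset drawn uniformly at random and fuses the ensemble by majority vote. A minimal sketch follows; the parameter names (subspace size w, ensemble size S) mirror the slides, while the defaults and helper names are our own:

```python
# Sketch of Ho's (1998) random subspace method with SVM base learners.
# Labels are assumed to be nonnegative integer class indices.
import numpy as np
from sklearn.svm import SVC

def rsm_train(X, y, S, w, rng):
    """Train S SVMs, each on w features drawn uniformly at random."""
    ensemble = []
    for _ in range(S):
        feats = rng.choice(X.shape[1], size=w, replace=False)
        clf = SVC(kernel="rbf", gamma="scale").fit(X[:, feats], y)
        ensemble.append((feats, clf))
    return ensemble

def rsm_predict(ensemble, X):
    """Fuse the base classifiers by majority voting over their predictions."""
    votes = np.stack([clf.predict(X[:, f]) for f, clf in ensemble])
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
```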
THE INADEQUACIES OF RSM
Given the learning algorithm, SVM, the ensemble size, S, and the subspace dimensionality, w.

* Implicit number
How should a suitable subspace dimensionality be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.

* Irregular rule
Each individual feature potentially possesses a different discriminative power for classification, but a randomized feature-selection strategy is unable to distinguish informative features from redundant ones.
DYNAMIC SUBSPACE METHOD (DSM)
(Yang et al., 2010)
• Two importance distributions:
  – Importance distribution of feature weight (W distribution): models the selection probability of each feature.
    [Figure: W distributions over the 191 features, estimated from the class separability of LDA and from the re-substitution accuracy of the ML, SVM, kNN, and BCC classifiers; y-axis: density (%).]
  – Importance distribution of subspace dimensionality (R distribution): automatically determines the suitable subspace size.
    [Figure: initialization of the R distribution R0 over the subspace dimensionality (1-191) by kernel smoothing; y-axis: density (%).]
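To make the two distributions concrete, here is a small sketch of one DSM-style sampling step: the subspace size is drawn from the R distribution and the features from the W distribution. The stand-in W and R values below are placeholders; in DSM they come from per-feature importance (e.g., LDA separability or re-substitution accuracy) and kernel smoothing:

```python
# Sketch of one subspace draw in the style of DSM (Yang et al., 2010).
import numpy as np

def dsm_sample_subspace(W, R, rng):
    d = len(W)
    # Draw the subspace dimensionality r from the R distribution.
    r = rng.choice(np.arange(1, d + 1), p=R)
    # Draw r distinct features with probability proportional to W.
    feats = rng.choice(d, size=r, replace=False, p=W / W.sum())
    return np.sort(feats)

rng = np.random.default_rng(0)
d = 191                          # number of bands in the DC Mall data
W = rng.random(d)                # placeholder feature weights
R = np.ones(d) / d               # placeholder (uniform) R distribution
print(dsm_sample_subspace(W, R, rng))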
THE FRAMEWORK OF DSM BASED ON SVM
Given the learning algorithm, SVM, and the ensemble size, S.
INADEQUACIES OF DSM
Given the learning algorithm, SVM, and the ensemble size, S.

* Time-consuming
Choosing a proper kernel function, or a good kernel parameter, for the SVM is quite important yet ordinarily time-consuming. In particular, in DSM the updated R distribution is obtained from the re-substitution accuracy, which must be recomputed repeatedly.

* Kernel function
The SVM algorithm provides an effective way to perform supervised classification; however, the kernel function critically influences the performance of the SVM.
An Optimal Kernel Method for Selecting
RBF Kernel Parameter
• The performance of SVM depends on choosing a proper kernel function and proper values of its parameters.
• Li, Lin, Kuo, and Chu (2010) presented a criterion for choosing a proper parameter $\sigma$ of the RBF kernel function automatically.

Gaussian Radial Basis Function (RBF) kernel:
$$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right), \quad \sigma \in \mathbb{R} \setminus \{0\}, \quad 0 < k(x, z) \le 1$$

• In the feature space determined by the RBF kernel, the norm of every sample is one and all kernel values are positive. Hence, the samples are mapped onto the surface of a hypersphere.
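A quick numerical check of these stated properties, in a short sketch of our own: $k(x,x) = 1$, so every mapped sample has unit norm $\|\phi(x)\| = \sqrt{k(x,x)} = 1$, and all kernel values lie in $(0, 1]$:

```python
# Verify k(x,x) = 1 (unit norm in feature space) and 0 < k(x,z) <= 1.
import numpy as np

def rbf(x, z, sigma):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
x, z = rng.normal(size=5), rng.normal(size=5)
sigma = 2.0

print(rbf(x, x, sigma))   # 1.0 -> the sample lies on the unit hypersphere
print(rbf(x, z, sigma))   # strictly between 0 and 1
```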
Kernel-based Dynamic Subspace
Method (KDSM)
KDSM combines:
• the importance distribution of feature membership,
• the importance distribution of subspace dimensionality, and
• the optimal RBF kernel parameter.

The optimal RBF kernel algorithm supplies optimal parameters for each dimension of the kernel function, based on a separability criterion.
THE FRAMEWORK OF KDSM
[Flowchart] The framework proceeds as follows:
1. Start from the original dataset X.
2. Apply the optimal RBF kernel algorithm to each feature (band); its separability values define the kernel-based feature-selection distribution Mdist (the kernel-based W distribution) in the L-dimensional kernel space.
3. Apply the optimal RBF kernel algorithm plus kernel smoothing to obtain the subspace-dimensionality distribution; draw subspaces via $\tilde{X} = MFS(X, M, w)$ and $\tilde{c} = WDS(W)$.
4. Train multiple classifiers on the subspace pool (reduced dataset), repeating until the performance of classification is stable.
5. Fuse the decisions by majority voting.
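The outer loop can be sketched schematically as below. All control flow and names here are our own; the talk specifies only that the ensemble grows until the performance of classification is stable and that fusion is by majority voting (dsm_sample_subspace is the helper from the DSM sketch above, and a held-out validation set is assumed for monitoring stability):

```python
# Schematic KDSM outer loop (our reading of the flowchart, not the
# authors' exact algorithm). Labels: nonnegative integer class indices.
import numpy as np
from sklearn.svm import SVC

def kdsm_loop(X, y, X_val, y_val, W, R, rng, tol=1e-3, max_S=50):
    ensemble, prev_acc = [], 0.0
    for _ in range(max_S):
        feats = dsm_sample_subspace(W, R, rng)   # from the DSM sketch
        ensemble.append((feats, SVC(kernel="rbf").fit(X[:, feats], y)))
        # Fuse current ensemble by majority vote on the validation set.
        votes = np.stack([c.predict(X_val[:, f]) for f, c in ensemble])
        fused = np.apply_along_axis(
            lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)
        acc = (fused == y_val).mean()
        if abs(acc - prev_acc) < tol:            # performance is stable
            break
        prev_acc = acc
    return ensemble
```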
Experiment Design
Algorithm    Description
SVM_CV       A single SVM without any dimension reduction; parameters chosen by the CV method
SVM_OP       A single SVM without any dimension reduction; parameters chosen by the OP method
DSM_WACC     DSM with the re-substitution accuracy as the feature weights
DSM_WLDA     DSM with the separability of Fisher's LDA as the feature weights
KDSM         The kernel-based dynamic subspace method proposed in this research
• OP: the optimal kernel method for choosing the RBF kernel parameter.
• CV: 5-fold cross-validation. We use a grid search over the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper value of the RBF kernel parameter (2σ²), and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper value of the slack (regularization) parameter that controls the margins.
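The SVM_CV baseline can be sketched with scikit-learn's GridSearchCV. Note that scikit-learn parameterizes the RBF kernel as exp(-gamma ||x - z||²), so gamma = 1 / (2σ²); the 20-point grid resolution below is our assumption, since the slide gives only the range:

```python
# Sketch of the SVM_CV baseline: 5-fold CV grid search over 2*sigma^2
# in [0.01, 10] and the slack penalty C from the quoted candidate set.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

two_sigma_sq = np.linspace(0.01, 10, 20)      # grid over 2*sigma^2
param_grid = {
    "gamma": 1.0 / two_sigma_sq,              # gamma = 1 / (2*sigma^2)
    "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train)   # X_train, y_train: the labeled samples
```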
EXPERIMENTAL DATASET
Hyperspectral image data (IR image): Washington, DC Mall

Image (no. of bands):   Washington, DC Mall (d = 191)
Number of classes:      7
Category (no. of labeled data): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840)
Experimental Results
• There are three cases in Washington, DC Mall:
  – case 1: $N_i = 20$, so $N = 140 < d = 220$
  – case 2: $N_i = 40$, so $d = 220 < N = 280$
  – case 3: $N_i = 300$, so $d = 220 < N = 2100$
Method      Case 1                Case 2                 Case 3
            Acc (%)   CPU (s)     Acc (%)   CPU (s)      Acc (%)   CPU (s)
SVM_CV      83.66     30.35       86.39     116.02       94.69     5858.18
SVM_OP      83.79     3.10        87.89     6.65         95.31     376.99
DSM_WACC    85.49     6045.31     88.74     21113.75     95.94     1165048.6
DSM_WLDA    87.47     2188.62     89.43     4883.92      96.94     220121.62
KDSM        88.64     155.31      92.53     308.26       97.43     17847.7

$N_i$: the number of training samples in class i; $N$: the total number of training samples.
Experiment Results in Washington, DC Mall
Classification results of the various multiple classifier systems:

Method      Case 1                 Case 2                 Case 3
            Accuracy   Ratio       Accuracy   Ratio       Accuracy   Ratio
DSM_WACC    85.49%     38.924      88.74%     68.493      95.94%     65.277
DSM_WLDA    87.47%     14.092      89.43%     15.844      96.94%     12.333
KDSM        88.64%     1           92.53%     1           97.43%     1

Ratio: CPU time relative to KDSM (e.g., in case 1, 6045.31 / 155.31 ≈ 38.9).
Classification Maps with $N_i = 20$ in Washington, DC Mall
[Legend: □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow]
[Maps: SVM_CV, SVM_OP, DSM_WACC, DSM_WLDA, KDSM]

Classification Maps (roof) with $N_i = 40$
[Legend: □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow]
[Maps: SVM_CV, SVM_OP, DSM_WACC, DSM_WLDA, KDSM]

Classification Maps with $N_i = 300$ in Washington, DC Mall
[Legend: □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow]
[Maps: SVM_CV, SVM_OP, DSM_WACC, DSM_WLDA, KDSM]
Conclusions
• The core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF kernel parameter and the dynamic subspace method within a subspace-selection-based MCS, improving classification results on high-dimensional datasets.
• The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in every case of the Washington, DC Mall dataset.
• Moreover, these results show that, compared with DSM, KDSM not only obtains more accurate classification results but also economizes on computation time.
Thank You