COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION

I-Ling Chen¹, Bor-Chen Kuo¹, Chen-Hsuan Li², Chih-Cheng Hung³
1 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
2 Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
3 School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.

Outline
• Introduction: statement of problems, and the objective
• Literature review: support vector machines and the kernel method; multiple classifier systems (random subspace method, dynamic subspace method)
• An optimal kernel method for selecting the RBF kernel parameter
• Optimal kernel-based dynamic subspace method
• Experimental design and results
• Conclusion and future work

INTRODUCTION

Challenges: classification of high-dimensional data is time-consuming, and the choice of kernel function strongly affects performance.

Methods considered:
• Support vector machines (SVM)
• Multiple classifier systems
  – Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.
  – Yang, J-M., Kuo, B-C., Yu, P-T., & Chuang, C-H. (2010). A dynamic subspace method for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 48(7), 2840-2853.

Research Problem: the Hughes Phenomenon

The Hughes phenomenon (Hughes, 1968), also called the curse of dimensionality or the peaking phenomenon: when the number of training samples N is small relative to the dimensionality d, classification performance degrades.

Support Vector Machines (SVM)

Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998), SVM is robust against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006). SVM combines two components: the kernel trick and support vector learning.

The Goal of the Kernel Method for Classification
• Samples in the same class should be mapped into the same area of the feature space.
• Samples in different classes should be mapped into different areas.

Support Vector Learning

SV learning learns a linear separating hyperplane for a two-class classification problem from a given training set. With the kernel trick, a nonlinear feature mapping φ carries samples from the original space into the feature space, where SV learning finds the optimal hyperplane wᵀφ(x) + b = 0. The margin boundaries are wᵀφ(x) + b = +1 and wᵀφ(x) + b = −1 (for labels yᵢ = ±1), and the training samples lying on these margins are the support vectors.

Multiple Classifier System

There are several approaches to building classifier ensembles (Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons). Two effective approaches generate an ensemble of diverse base classifiers via different feature subsets (Ho, 1998; Yang et al., 2010).

THE FRAMEWORK OF THE RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (Ho, 1998)

Given the learning algorithm (SVM) and the ensemble size S, each base classifier is trained on w features drawn at random from the original feature set, and the S resulting SVMs are combined by decision fusion (a code sketch follows the list of inadequacies below).

THE INADEQUACIES OF RSM
• Implicit number: the subspace dimensionality w must be chosen by the user. Without an appropriate subspace dimensionality for the SVM, RSM might be inferior to a single classifier.
• Irregular rule: each individual feature potentially possesses different discriminating power for classification, but a randomized feature-selection strategy cannot distinguish informative features from redundant ones.
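To make the RSM procedure concrete, here is a minimal sketch assuming scikit-learn's SVC as the base learner and majority voting as the fusion rule. The ensemble size S, subspace size w, and SVC parameters are illustrative placeholders (not values from the paper), and class labels are assumed to be non-negative integers.

```python
import numpy as np
from sklearn.svm import SVC

def rsm_svm_train(X, y, S=10, w=30, rng=None):
    """Random subspace method (Ho, 1998): train S SVMs, each on a random
    w-dimensional feature subset drawn uniformly without replacement."""
    rng = np.random.default_rng(rng)
    ensemble = []
    for _ in range(S):
        subset = rng.choice(X.shape[1], size=w, replace=False)
        clf = SVC(kernel="rbf", gamma="scale", C=1.0)   # placeholder SVM parameters
        clf.fit(X[:, subset], y)
        ensemble.append((subset, clf))
    return ensemble

def rsm_svm_predict(ensemble, X):
    """Decision fusion by majority voting over the S member predictions."""
    votes = np.stack([clf.predict(X[:, subset]) for subset, clf in ensemble])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```

Note the two knobs RSM leaves implicit: w is fixed and user-chosen, and the uniform draw treats informative and redundant features alike. These are exactly the inadequacies that the dynamic subspace method addresses next.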
DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)

DSM replaces the fixed, uniform random sampling of RSM with two importance distributions:
• The importance distribution of feature weight (the W distribution) models the selection probability of each feature. [Figure: density (%) of each of the 191 features under different weightings — the class separability of Fisher's LDA for each feature, and the re-substitution accuracy for each feature under ML, SVM, kNN, and BCC base classifiers.]
• The importance distribution of subspace dimensionality (the R distribution) automatically determines a suitable subspace size. [Figure: density (%) over subspace dimensionalities 1-191, from the initialization R0 through kernel smoothing.]

THE FRAMEWORK OF DSM BASED ON SVM

Given the learning algorithm (SVM) and the ensemble size S, each member SVM is trained on a feature subset whose size is drawn from the R distribution and whose features are drawn according to the W distribution.

THE INADEQUACIES OF DSM
• Time-consuming: choosing a proper kernel function, or a good kernel parameter, for SVM is important yet ordinarily time-consuming. In particular, the updated R distribution in DSM is obtained from re-substitution accuracies, which requires training many SVMs.
• Kernel function: the SVM algorithm provides an effective way to perform supervised classification, but the kernel function critically influences the performance of SVM.

AN OPTIMAL KERNEL METHOD FOR SELECTING THE RBF KERNEL PARAMETER

The performance of SVM depends on choosing proper kernel functions or proper parameters of a kernel function. Li, Lin, Kuo, and Chu (2010) present a criterion for choosing the parameter σ of the RBF kernel function automatically. The Gaussian radial basis function (RBF) kernel is

    k(x, z) = exp(−‖x − z‖² / (2σ²)),   σ ∈ ℝ∖{0},   with 0 < k(x, z) ≤ 1.

In the feature space determined by the RBF kernel, the norm of every sample is one and all kernel values are positive, so the samples are mapped onto the surface of a hypersphere. A good σ should therefore place same-class samples close together on this hypersphere (kernel values near 1) and different-class samples far apart (kernel values near 0).
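The slides do not reproduce the exact criterion of Li et al. (2010), so the following is a minimal sketch of one plausible criterion in this family, under the assumption just stated: maximize the mean within-class kernel value while minimizing the mean between-class kernel value. The function names and the candidate grid for σ are illustrative, not from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(X, Z, sigma):
    """Gaussian RBF kernel k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-cdist(X, Z, "sqeuclidean") / (2.0 * sigma ** 2))

def kernel_separability(X, y, sigma):
    """Mean within-class kernel value minus mean between-class kernel value.
    A stand-in criterion (assumption): large values mean same-class samples
    sit close together on the RBF hypersphere and classes sit far apart."""
    K = rbf_kernel(X, X, sigma)
    same = y[:, None] == y[None, :]
    diff = ~same                    # diagonal is already False here
    np.fill_diagonal(same, False)   # drop the trivial k(x, x) = 1 terms
    return K[same].mean() - K[diff].mean()

# Pick sigma from an illustrative grid by maximizing the criterion:
# sigmas = np.logspace(-2, 2, 30)
# sigma_opt = max(sigmas, key=lambda s: kernel_separability(X_train, y_train, s))
```

With sigma_opt in hand, the corresponding scikit-learn SVC parameter would be gamma = 1 / (2 * sigma_opt**2), since that library writes the RBF kernel as exp(-gamma * ||x - z||^2).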
KERNEL-BASED DYNAMIC SUBSPACE METHOD (KDSM)

KDSM combines three components: an importance distribution of feature membership (a kernel-based W distribution), an importance distribution of subspace dimensionality, and the optimal RBF kernel algorithm, which selects an optimal parameter for the kernel function of each subspace based on class separability.

THE FRAMEWORK OF KDSM

[Flowchart: original dataset X → kernel-based feature selection over the features (bands) in the L-dimensional kernel space, yielding the distribution Mdist and the kernel-based W distribution; feature subsets X̃ = MFS(X, M, w) and c̃ = WDS(W) form the subspace pool (reduced dataset); the optimal RBF kernel algorithm plus kernel smoothing sets the subspace-dimensionality distribution; multiple classifiers are trained on the pool and combined by decision fusion (majority voting), repeating until the classification performance is stable.] A compact code sketch of this loop closes the paper, after the conclusions.

EXPERIMENT DESIGN

Algorithm   Description
SVM_CV      A single SVM without any dimension reduction; parameters chosen by the CV method
SVM_OP      A single SVM without any dimension reduction; parameters chosen by the OP method
DSM_WACC    DSM with the re-substitution accuracy as the feature weights
DSM_WLDA    DSM with the separability of Fisher's LDA as the feature weights
KDSM        The kernel-based dynamic subspace method proposed in this research

Here OP denotes the optimal method of parameter selection and CV denotes 5-fold cross-validation. For CV we use a grid search within the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose the RBF kernel parameter 2σ², and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose the slack-variable penalty that controls the margins.

EXPERIMENTAL DATASET

Hyperspectral image data: Washington, DC Mall (d = 191 bands), 7 classes. Labeled data per category: Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840). [Figure: IR image of the scene.]

EXPERIMENTAL RESULTS

Three cases in Washington, DC Mall, where Nᵢ is the number of training samples in class i and N is the total number of training samples:
• Case 1: Nᵢ = 20, N = 140
• Case 2: Nᵢ = 40, N = 280
• Case 3: Nᵢ = 300, N = 2100

Method      Case 1 Acc. (%)  Case 1 CPU (s)  Case 2 Acc. (%)  Case 2 CPU (s)  Case 3 Acc. (%)  Case 3 CPU (s)
SVM_CV      83.66            30.35           86.39            116.02          94.69            5858.18
SVM_OP      83.79            3.10            87.89            6.65            95.31            376.99
DSM_WACC    85.49            6045.31         88.74            21113.75        95.94            1165048.6
DSM_WLDA    87.47            2188.62         89.43            4883.92         96.94            220121.62
KDSM        88.64            155.31          92.53            308.26          97.43            17847.7

Comparing the multiple classifier systems, with Ratio denoting CPU time relative to KDSM:

Method      Case 1 Acc. / Ratio   Case 2 Acc. / Ratio   Case 3 Acc. / Ratio
DSM_WACC    85.49% / 38.924       88.74% / 68.493       95.94% / 65.277
DSM_WLDA    87.47% / 14.092       89.43% / 15.844       96.94% / 12.333
KDSM        88.64% / 1            92.53% / 1            97.43% / 1

[Figure: classification maps of Washington, DC Mall for SVM_CV, SVM_OP, DSM_WACC, DSM_WLDA, and KDSM with Nᵢ = 20, Nᵢ = 40 (roof detail), and Nᵢ = 300; legend: Background, Water, Tree, Path, Grass, Roof, Road, Shadow.]

CONCLUSIONS

The core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high-dimensional datasets. The experimental results show that the classification accuracy of KDSM is invariably the best among all classifiers in every case of the Washington, DC Mall dataset. Moreover, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computation time.
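As a closing illustration, the sketch below ties the components together in a KDSM-style loop. It reuses kernel_separability from the optimal-kernel sketch and the majority-voting rsm_svm_predict from the RSM sketch. The feature weighting borrows the Fisher separability used by the DSM_WLDA variant as a simple stand-in for KDSM's kernel-based W distribution, and the uniform draw of the subspace size is a stand-in for the kernel-smoothed R distribution; none of these stand-ins is the paper's exact procedure.

```python
import numpy as np
from sklearn.svm import SVC

def fisher_weights(X, y):
    """Per-feature Fisher separability, normalized to a probability vector.
    A stand-in for KDSM's kernel-based W distribution (assumption)."""
    classes = np.unique(y)
    grand = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - grand) ** 2 for c in classes)
    within = sum(X[y == c].var(axis=0) for c in classes) + 1e-12
    w = between / within
    return w / w.sum()

def kdsm_sketch(X, y, S=10, sigmas=np.logspace(-2, 2, 20), rng=None):
    """KDSM-style ensemble: weighted random subspaces, with an RBF parameter
    chosen per subspace by the separability criterion sketched earlier."""
    rng = np.random.default_rng(rng)
    W, d = fisher_weights(X, y), X.shape[1]
    ensemble = []
    for _ in range(S):
        m = int(rng.integers(d // 10 + 2, d // 2 + 3))  # stand-in for the R distribution
        subset = rng.choice(d, size=m, replace=False, p=W)
        s = max(sigmas, key=lambda t: kernel_separability(X[:, subset], y, t))
        clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * s ** 2)).fit(X[:, subset], y)
        ensemble.append((subset, clf))
    return ensemble  # predict with rsm_svm_predict from the RSM sketch
```

Per-subspace parameter selection by a closed-form criterion, rather than cross-validated retraining, is what lets this style of method avoid the re-substitution cost that dominates DSM's running time.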