Modifying SVM to Deal with High-Dimensional Data

Ammon Washburn, Neng Fan
University of Arizona
April 26, 2016

Classification using SVM

- You have data living in a very high-dimensional space and you want to build a robust classifier.
- SVMs can classify email (spam), pictures (faces), cancer (malignant), and many other things.
- They can deal with non-separable, non-linear data as well.
- In big-data problems there are thousands of dimensions, but really only a few have actual predictive power: let your algorithm pick the dimensions that matter.
- If you have thousands of dimensions but only a few data points, then a sparse SVM is essential to avoid over-fitting (e.g., genetics).
- Using the ℓ1 norm also means the problem can be reformulated as a linear program (LP).
- Basic ideas in the activity: support vectors, margin errors, and margins.

SVMs and Maximum Margin Programs

The basic (linear) maximum-margin program is defined by the following optimization problem:

    \min_{w,b,\xi}\; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{m} \xi_i
    \text{s.t. } y_i(w^\top x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,m

This program finds a hyperplane between the groups of data points and uses it to categorize new data. ξ_i is the penalty paid when a data point is "moved" to the "right" side, while C is a heuristic constant. ‖w‖ and the margin between the groups are inversely related.

Sparse SVMs

The basic sparse linear SVM is exactly the same as before, except that we now use the ℓ1 norm on R^n [1, 2]:

    \min_{w,b,\xi}\; \|w\|_1 + C \sum_{i=1}^{m} \xi_i
    \text{s.t. } y_i(w^\top x_i - b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,m

- The sparsest "norm" is ℓ0, which counts the number of non-zero entries.
- ℓ0 is not continuous, so ℓ1 is the next best choice.
- For intuition: (1, 0) and (1/\sqrt{2}, 1/\sqrt{2}) both have norm 1 in ℓ2, but (1/\sqrt{2}, 1/\sqrt{2}) has norm \sqrt{2} in ℓ1, so ℓ1 penalizes vectors that spread weight across many coordinates.

Full LP Sparse ν-SVM

    \min_{\rho,w,b,\xi}\; \|w\|_1 - \nu\rho + \frac{1}{m} \sum_{i=1}^{m} \xi_i
    \text{s.t. } y_i(w^\top x_i - b) \ge \rho - \xi_i, \quad i = 1,\dots,m
    \|w\|_1 \le 1
    \xi_i \ge 0,\ \rho \ge 0, \quad i = 1,\dots,m

ν has three properties that make it better to use than C:
1. It is an upper bound on the fraction of margin errors (points x_i with ξ_i > 0), ME/m.
2. It is a lower bound on the fraction of support vectors (points on the boundary), SV/m.
3. If the data are drawn i.i.d. from a distribution, then asymptotically, with probability one, ν equals both the fraction of margin errors and the fraction of support vectors.

Algorithms

- The simplex algorithm gives exact answers (many dimensions will be exactly zero).
- Simplex has a big problem with degeneracy when ν is too small.
- So: use an interior-point method to find valid values of ν, then use simplex to find exact answers (see the sketches below).
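To make the LP above concrete, here is a minimal sketch of the full LP sparse ν-SVM written with the cvxpy modeling library. cvxpy, the function name, and the default solver are illustrative assumptions on our part; the slides do not prescribe any particular tool.

```python
# Minimal sketch of the LP sparse nu-SVM (cvxpy is an assumption,
# not something the slides prescribe; any LP solver would work).
import numpy as np
import cvxpy as cp

def lp_sparse_nu_svm(X, y, nu):
    """X is an m x n data matrix, y an m-vector of labels in {-1, +1}."""
    m, n = X.shape
    w = cp.Variable(n)                # hyperplane normal; l1 terms drive it sparse
    b = cp.Variable()                 # offset
    rho = cp.Variable(nonneg=True)    # margin variable
    xi = cp.Variable(m, nonneg=True)  # slacks; xi_i > 0 marks a margin error

    objective = cp.Minimize(cp.norm(w, 1) - nu * rho + cp.sum(xi) / m)
    constraints = [
        cp.multiply(y, X @ w - b) >= rho - xi,  # y_i (w^T x_i - b) >= rho - xi_i
        cp.norm(w, 1) <= 1,                     # fixes the scale of w
    ]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, rho.value
```

The "Dims" column in the experiments below then corresponds to counting the entries of w that come back non-zero, e.g. np.sum(np.abs(w) > 1e-8) with a small tolerance for solver noise.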
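Because both sparse programs are LPs, the solver choice discussed on the Algorithms slide matters. The sketch below, again our assumption rather than the authors' code, writes the same ν-SVM in explicit LP matrix form using the standard w = u − v split for the ℓ1 norm, and solves it with SciPy's HiGHS interior-point method ("highs-ipm") or dual simplex ("highs-ds"), mirroring the interior-point-then-simplex strategy.

```python
# The same LP in matrix form for scipy.optimize.linprog (an illustrative
# sketch; the slides do not say which LP implementations were used).
import numpy as np
from scipy.optimize import linprog

def nu_svm_linprog(X, y, nu, method="highs-ds"):
    m, n = X.shape
    # Variable order: u (n), v (n), b (1), rho (1), xi (m), with w = u - v
    # and u, v >= 0, so that sum(u) + sum(v) stands in for ||w||_1.
    c = np.concatenate([np.ones(2 * n), [0.0, -nu], np.ones(m) / m])

    # Margin constraints rewritten as
    # -y_i x_i^T u + y_i x_i^T v + y_i b + rho - xi_i <= 0.
    Yx = y[:, None] * X
    A_margin = np.hstack([-Yx, Yx, y[:, None], np.ones((m, 1)), -np.eye(m)])
    # Scale constraint sum(u) + sum(v) <= 1, i.e. ||w||_1 <= 1.
    A_scale = np.concatenate([np.ones(2 * n), [0.0, 0.0], np.zeros(m)])
    A_ub = np.vstack([A_margin, A_scale])
    b_ub = np.concatenate([np.zeros(m), [1.0]])

    bounds = [(0, None)] * (2 * n) + [(None, None), (0, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method=method)
    w = res.x[:n] - res.x[n:2 * n]
    return w, res

# Probe a candidate nu with interior point, then rerun with (dual) simplex,
# whose vertex solution makes the zero coordinates of w exact:
#   _, probe = nu_svm_linprog(X, y, nu=0.2, method="highs-ipm")
#   w, final = nu_svm_linprog(X, y, nu=0.2, method="highs-ds")
```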
Experiments (1)

We used the Wisconsin Breast Cancer data: 30 features (dimensions), a benign vs. malignant label, and an ID. The data were split into training data and testing data. In the tables, Ratio Train and Ratio Test are the classification accuracies on the training and testing sets, ME Ratio is the fraction of margin errors, and Dims is the number of non-zero dimensions of w.

Type | Train pts | ν      | Ratio Train | Ratio Test | ME Ratio | Dims
-----|-----------|--------|-------------|------------|----------|-----
L2   | 50        | 0.14   | 0.96        | 0.9017     | 0.0180   | 30
L2   | 100       | 0.01   | 1           | 0.9381     | 0.0093   | 30
L2   | 200       | 0.04   | 0.995       | 0.9512     | 0.0047   | 30
L2   | 300       | 0.0433 | 0.99        | 0.9702     | 0.0032   | 30
L2   | 500       | 0.022  | 0.994       | 0.9710     | 0.0019   | 30
L1   | 50        | 0.12   | 0.96        | 0.8689     | 0.0173   | 2
L1   | 100       | 0.12   | 0.95        | 0.9019     | 0.0090   | 1
L1   | 200       | 0.175  | 0.935       | 0.9159     | 0.0045   | 2
L1   | 300       | 0.2033 | 0.9166      | 0.9256     | 0.0030   | 2
L1   | 500       | 0.192  | 0.92        | 0.9130     | 0.0018   | 2

Experiments (2)

[Figure 1: 34 malignant tumors (diamonds) and 66 benign tumors (pluses) with ν = 0.24. The first graph is the full view; the second zooms in on the important region.]

Experiments (3)

Simulated sparse data: the two classes are simulated from multivariate normal distributions in which only 5 of the dimensions have different means, and the variance is very large. In other words, only 5 dimensions are important.

Type | Train pts | ν      | Ratio Train | Ratio Test | ME Ratio | Dims
-----|-----------|--------|-------------|------------|----------|-----
L2   | 50        | 0.62   | 1           | 0.763      | 0.58     | 100
L2   | 100       | 0.85   | 0.97        | 0.844      | 0.81     | 100
L2   | 200       | 0.78   | 0.935       | 0.866      | 0.785    | 100
L2   | 300       | 0.8233 | 0.9366      | 0.889      | 0.82     | 100
L2   | 500       | 0.928  | 0.908       | 0.897      | 0.934    | 100
L1   | 50        | 0.54   | 0.9         | 0.866      | 0.0173   | 7
L1   | 100       | 0.52   | 0.91        | 0.891      | 0.0089   | 4
L1   | 200       | 0.51   | 0.91        | 0.894      | 0.0044   | 4
L1   | 300       | 0.5333 | 0.9166      | 0.895      | 0.0029   | 3
L1   | 500       | 0.574  | 0.892       | 0.91       | 0.0018   | 4

Future Research

- Incorporate uncertainty into the model: moment information or nominal information.
- Try other sparse penalty functions.
- Extend the ideas to other classifiers.

References

[1] Chiranjib Bhattacharyya, L. R. Grate, Michael I. Jordan, L. El Ghaoui, and I. Saira Mian. Robust sparse hyperplane classifiers: application to uncertain molecular profiling data. Journal of Computational Biology, 11(6):1073–1089, 2004.

[2] Jinbo Bi, Kristin Bennett, Mark Embrechts, Curt Breneman, and Minghu Song. Dimensionality reduction via sparse support vector machines. The Journal of Machine Learning Research, 3:1229–1243, 2003.