Power Talk - University of Arizona

Modifying SVM to deal with high-dimensional data
Ammon Washburn, Neng Fan
University of Arizona
April 26, 2016
Classification using SVM
You have data living in a very high-dimensional space and you want to build a robust classifier
Can classify email (spam), pictures (faces), cancer (malignant vs. benign), and many others
Can deal with non-separable, non-linear data as well
In big data problems there are thousands of dimensions, but really only a few have actual predictive power
Let your algorithm pick the dimensions that matter
If you have thousands of dimensions but just a few data points, then a sparse SVM is essential to avoid over-fitting (e.g., genetics)
Also, using $\ell_1$ means the problem can be reformulated as a linear program (LP); a quick sketch of the resulting sparsity follows this list
Basic ideas in this activity: support vectors, margin errors, and margins
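As a concrete illustration of the sparsity point, here is a minimal sketch using scikit-learn's off-the-shelf $\ell_1$-penalized linear SVM (LinearSVC) on synthetic data; this is a stand-in for the LP formulation developed later in the talk, not the talk's own implementation:

```python
# Sketch: an l1-penalized linear SVM zeroes out uninformative dimensions,
# effectively picking the features that matter. Synthetic data: 1000
# dimensions, of which only the first two carry signal.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))

# penalty="l1" requires dual=False in scikit-learn's liblinear backend
clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)
print("non-zero weights:", np.count_nonzero(clf.coef_))  # typically very few
```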
SVMs and Maximum Margin programs
The basic (linear) maximum margin program is defined by the following
optimization problem.
$$\min_{w,\,b,\,\xi_i}\ \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\ \ y_i(w^\top x_i - b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m$$
This program finds a hyperplane between the groups of data points and uses it to categorize new data.
$\xi_i$ is the penalty incurred if a data point is "moved" to the "right" side, while $C$ is a heuristic trade-off constant.
$\|w\|_2$ and the margin between the groups are inversely related: the margin width is $2/\|w\|_2$ (a cvxpy sketch follows).
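For readers who want to experiment, here is a direct transcription of the program above into cvxpy (my choice of tool, not one used in the talk), on small synthetic data:

```python
# Sketch: the soft-margin maximum-margin program above, in cvxpy.
import cvxpy as cp
import numpy as np

m, n, C = 100, 5, 1.0
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))
y = np.sign(X @ rng.normal(size=n) + 0.1 * rng.normal(size=m))

w = cp.Variable(n)
b = cp.Variable()
xi = cp.Variable(m, nonneg=True)  # per-point slack penalties

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w - b) >= 1 - xi]  # y_i(w^T x_i - b) >= 1 - xi_i
cp.Problem(objective, constraints).solve()

print("margin width:", 2 / np.linalg.norm(w.value))  # margin = 2 / ||w||_2
```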
Sparse SVMs
The basic sparse linear SVM is exactly the same as before, but we now use the $\ell_1$ norm on $\mathbb{R}^n$ [1, 2].
$$\min_{w,\,b,\,\xi_i}\ \|w\|_1 + C\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\ \ y_i(w^\top x_i - b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,m$$
The sparsest norm is $\ell_0$, which counts the number of non-zero entries
This norm isn't continuous, so $\ell_1$ is the next best
$(1, 0)$ and $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$ both have norm 1 in $\ell_2$, but $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$ has norm $\sqrt{2}$ in $\ell_1$ (checked numerically below)
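A two-line numerical check of this example in plain numpy; note also that in the cvxpy sketch above, swapping the objective to cp.norm1(w) + C * cp.sum(xi) gives this sparse variant:

```python
# Both vectors have unit l2 norm, but the dense one pays more under l1,
# so minimizing the l1 norm pushes weight onto few coordinates.
import numpy as np

sparse = np.array([1.0, 0.0])
dense = np.array([1 / np.sqrt(2), 1 / np.sqrt(2)])

print(np.linalg.norm(sparse, 2), np.linalg.norm(dense, 2))  # 1.0 1.0
print(np.linalg.norm(sparse, 1), np.linalg.norm(dense, 1))  # 1.0 ~1.414
```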
Full LP sparse ν-SVM
$$\min_{\rho,\,w,\,b,\,\xi_i}\ \|w\|_1 - \nu\rho + \frac{1}{m}\sum_{i=1}^{m}\xi_i$$
$$\text{s.t.}\ \ y_i(w^\top x_i - b) \ge \rho - \xi_i,\quad i = 1,\dots,m$$
$$\|w\|_1 \le 1$$
$$\xi_i \ge 0,\ \rho \ge 0,\quad i = 1,\dots,m$$
ν has three properties which make it better than using $C$:
1. It is an upper bound on the fraction of margin errors (points $x_i$ with $\xi_i > 0$), $\mathrm{ME}/m$
2. It is a lower bound on the fraction of support vectors (points on the boundary), $\mathrm{SV}/m$
3. If the data is drawn i.i.d. from a distribution, then asymptotically, with probability one, ν equals both the fraction of margin errors and the fraction of support vectors
A cvxpy transcription of this LP is sketched below.
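A minimal sketch of the LP exactly as written above, again in cvxpy (my choice of tool, with toy data for illustration; the function name is mine):

```python
# Sketch: the full LP sparse nu-SVM, transcribed from the slide.
import cvxpy as cp
import numpy as np

def lp_sparse_nu_svm(X, y, nu):
    m, n = X.shape
    w = cp.Variable(n)
    b = cp.Variable()
    rho = cp.Variable(nonneg=True)
    xi = cp.Variable(m, nonneg=True)
    objective = cp.Minimize(cp.norm1(w) - nu * rho + cp.sum(xi) / m)
    constraints = [
        cp.multiply(y, X @ w - b) >= rho - xi,  # y_i(w^T x_i - b) >= rho - xi_i
        cp.norm1(w) <= 1,
    ]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value, rho.value

# Toy usage: two well-separated point clouds in 3 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([np.ones((10, 3)), -np.ones((10, 3))]) + 0.1 * rng.normal(size=(20, 3))
y = np.concatenate([np.ones(10), -np.ones(10)])
w, b, rho = lp_sparse_nu_svm(X, y, nu=0.2)
print(w, rho)
```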
Algorithms
The simplex algorithm will give exact answers (many of the weights will be exactly zero)
Simplex has a big problem with degeneracy when ν is too small
Use an interior-point method to find valid ν values, then use simplex to find exact answers (sketch below)
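A sketch of the two-stage idea with scipy's HiGHS LP solvers, which expose both an interior-point and a (dual) simplex method. The tiny LP here is a stand-in: putting the ν-SVM into standard form (splitting $w$ into positive and negative parts to linearize $\|w\|_1$) is omitted.

```python
# Sketch: probe with interior point, then re-solve with simplex to get a
# vertex solution in which weights are exactly zero rather than merely tiny.
import numpy as np
from scipy.optimize import linprog

# Stand-in LP: min x1 + 2*x2  s.t.  x1 + x2 >= 1,  x >= 0.
c = np.array([1.0, 2.0])
A_ub = np.array([[-1.0, -1.0]])
b_ub = np.array([-1.0])

# Stage 1: interior point, robust when the LP is degenerate (nu too small).
probe = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs-ipm")

# Stage 2: simplex, for an exact vertex solution.
if probe.status == 0:  # 0 = terminated successfully
    exact = linprog(c, A_ub=A_ub, b_ub=b_ub, method="highs-ds")
    print(exact.x)  # [1. 0.]
```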
Experiments 1
Used the Wisconsin Breast Cancer data: 30 features (dimensions), benign vs. malignant labels, and an ID
Split the data into training and testing sets (a loading sketch follows)
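The slide doesn't say how the split was done; as a sketch, scikit-learn bundles a copy of the same dataset (already stripped of the ID column):

```python
# Sketch: load the Wisconsin Breast Cancer data and make a 200-point
# training split, matching one row of the table below.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # 569 points, 30 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=200, random_state=0)
print(X_tr.shape, X_te.shape)  # (200, 30) (369, 30)
```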
Type | Train pts | ν      | Ratio Train | Ratio Test | ME Ratio | Dims
---- | --------- | ------ | ----------- | ---------- | -------- | ----
L2   | 50        | 0.14   | 0.96        | 0.9017     | 0.0180   | 30
L2   | 100       | 0.01   | 1           | 0.9381     | 0.0093   | 30
L2   | 200       | 0.04   | 0.995       | 0.9512     | 0.0047   | 30
L2   | 300       | 0.0433 | 0.99        | 0.9702     | 0.0032   | 30
L2   | 500       | 0.022  | 0.994       | 0.9710     | 0.0019   | 30
L1   | 50        | 0.12   | 0.96        | 0.8689     | 0.0173   | 2
L1   | 100       | 0.12   | 0.95        | 0.9019     | 0.0090   | 1
L1   | 200       | 0.175  | 0.935       | 0.9159     | 0.0045   | 2
L1   | 300       | 0.2033 | 0.9166      | 0.9256     | 0.0030   | 2
L1   | 500       | 0.192  | 0.92        | 0.9130     | 0.0018   | 2
Experiments 2
Figure 1: There are 34 malignant tumors (diamonds) and 66 benign tumors (pluses) with ν = 0.24. The first graph is the full view, while the second zooms in on the important region.
Experiments 3
Simulated sparse data: the two classes are drawn from multivariate normal distributions in which only 5 of the dimensions have different means, and the variance is very large.
In other words, there are only 5 important dimensions (a generator sketch follows).
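A sketch of a matching data generator; the unit mean shift, the value of the shared variance, and the function name are my assumptions, since the slide gives no parameters:

```python
# Sketch: two classes from multivariate normals that differ in only 5 of
# 100 dimensions, with a large shared variance (parameters assumed).
import numpy as np

def make_sparse_classes(m, n=100, informative=5, sigma=5.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(n)
    mu[:informative] = 1.0  # means differ only in the first 5 dimensions
    y = rng.choice([-1.0, 1.0], size=m)
    X = sigma * rng.normal(size=(m, n)) + np.outer(y, mu)  # class-dependent mean
    return X, y

X, y = make_sparse_classes(500)
print(X.shape)  # (500, 100)
```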
Type | Train pts | ν      | Ratio Train | Ratio Test | ME Ratio | Dims
---- | --------- | ------ | ----------- | ---------- | -------- | ----
L2   | 50        | 0.62   | 1           | 0.763      | 0.58     | 100
L2   | 100       | 0.85   | 0.97        | 0.844      | 0.81     | 100
L2   | 200       | 0.78   | 0.935       | 0.866      | 0.785    | 100
L2   | 300       | 0.8233 | 0.9366      | 0.889      | 0.82     | 100
L2   | 500       | 0.928  | 0.908       | 0.897      | 0.934    | 100
L1   | 50        | 0.54   | 0.9         | 0.866      | 0.0173   | 7
L1   | 100       | 0.52   | 0.91        | 0.891      | 0.0089   | 4
L1   | 200       | 0.51   | 0.91        | 0.894      | 0.0044   | 4
L1   | 300       | 0.5333 | 0.9166      | 0.895      | 0.0029   | 3
L1   | 500       | 0.574  | 0.892       | 0.91       | 0.0018   | 4
Future Research
Incorporate uncertainty into the model: Moment information or
nominal information
Try other sparse penalty functions
Extend ideas to other classifiers
References

[1] Chiranjib Bhattacharyya, L. R. Grate, Michael I. Jordan, L. El Ghaoui, and I. Saira Mian. Robust sparse hyperplane classifiers: application to uncertain molecular profiling data. Journal of Computational Biology, 11(6):1073–1089, 2004.

[2] Jinbo Bi, Kristin Bennett, Mark Embrechts, Curt Breneman, and Minghu Song. Dimensionality reduction via sparse support vector machines. The Journal of Machine Learning Research, 3:1229–1243, 2003.