SVM RFE

Gene Selection for Cancer
Classification using
Support Vector Machines
Jesye
Content
• Feature ranking with correlation
coefficients
• SVM Recursive Feature Elimination
(SVM RFE)
• Data sets
• Experimental results
Ranking criteria
wi = (mi(+) − mi(−)) / (si(+) + si(−))
(Golub, 1999)
mi(+): mean of gene i over class (+)
mi(−): mean of gene i over class (−)
si(+): standard deviation of gene i over class (+)
si(−): standard deviation of gene i over class (−)
(Golub, 1999): wi, with an equal number of genes selected for the (+) and (−) classes
(Furey, 2000): |wi|, the absolute value of wi
(Pavlidis, 2000): (mi(+) − mi(−))² / (si(+)² + si(−)²),
similar to Fisher's discriminant criterion
This paper: wi²
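As an illustrative sketch (not code from the paper), the Golub criterion and the Fisher-like Pavlidis variant could be computed per gene with NumPy; the function names and the {+1, −1} label encoding are assumptions:

```python
import numpy as np

def golub_ranking(X, y):
    """Golub (1999) criterion, computed for every gene at once.

    X: (samples, genes) expression matrix; y: labels in {+1, -1}.
    Returns wi = (mi(+) - mi(-)) / (si(+) + si(-)) per gene.
    """
    pos, neg = X[y == 1], X[y == -1]
    m_pos, m_neg = pos.mean(axis=0), neg.mean(axis=0)
    s_pos, s_neg = pos.std(axis=0), neg.std(axis=0)
    return (m_pos - m_neg) / (s_pos + s_neg)

def fisher_like_ranking(X, y):
    """Pavlidis (2000) variant: (mi(+) - mi(-))^2 / (si(+)^2 + si(-)^2)."""
    pos, neg = X[y == 1], X[y == -1]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.std(axis=0) ** 2 + neg.std(axis=0) ** 2
    return num / den
```

A gene whose class means are far apart relative to its within-class spread gets a large score under either criterion.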
Recursive Feature Elimination
• 1) Train the classifier.
• 2) Compute the ranking criterion for all features.
• 3) Remove the feature with the smallest ranking criterion.
• 4) Repeat from step 1 until no features remain.
SVM RFE
Inputs:
Training examples
X0 = [x1, x2, … xk, …xl]
Class labels
y = [y1, y2, … yk, … yl]
Output:
Feature ranked list r.
Initialize:
Surviving features
s = [1, 2, … n]
Feature ranked list
r=[]
Repeat until s = [ ]:
    X = X0(:, s)
    a = SVM-train(X, y)
    w = Σk ak yk xk
    ci = (wi)², for all i
    f = argmin(c)
    r = [s(f), r]
    s = s(1:f−1, f+1:length(s))
End
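The loop above can be sketched in Python, with scikit-learn's linear SVC standing in for SVM-train; the function name `svm_rfe` and the default C value are assumptions, and one feature is removed per iteration as on the slides:

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X0, y, C=1.0):
    """SVM-RFE sketch: repeatedly train a linear SVM and drop the
    surviving feature with the smallest wi^2, prepending it to r.

    X0: (samples, features); y: labels in {+1, -1}.
    Returns feature indices, best-ranked first.
    """
    s = list(range(X0.shape[1]))   # surviving features
    r = []                         # feature ranked list
    while s:
        X = X0[:, s]                               # X = X0(:, s)
        clf = SVC(kernel="linear", C=C).fit(X, y)  # a = SVM-train(X, y)
        w = clf.coef_.ravel()                      # w = sum_k ak yk xk
        c = w ** 2                                 # ci = (wi)^2
        f = int(np.argmin(c))                      # f = argmin(c)
        r.insert(0, s[f])                          # r = [s(f), r]
        del s[f]                                   # remove feature f from s
    return r
```

Because the last-removed feature is prepended last, the most important feature ends up at the front of r.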
SVM Model
Minimize over ak:
J = (1/2) Σh,k yh yk ah ak (xh·xk + λ δhk) − Σk ak
Subject to:
0 ≤ ak ≤ C for all k, and Σk ak yk = 0
Outputs: parameters ak
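As an illustrative sketch of this dual problem (the toy data and C value are assumptions), scikit-learn's SVC exposes yk·ak for the support vectors via `dual_coef_`, so the constraints and the weight recovery w = Σk ak yk xk can be checked directly; note that sklearn's soft margin uses only the box constraint on ak, not the diagonal λ δhk term:

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data, assumed for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)

a_y = clf.dual_coef_.ravel()     # yk * ak, support vectors only
a = np.abs(a_y)                  # ak >= 0 by construction

# Dual constraints: 0 <= ak <= C and sum_k ak yk = 0
assert np.all(a <= 10.0 + 1e-9)
assert abs(a_y.sum()) < 1e-9

# Primal weights recovered from the dual: w = sum_k ak yk xk
w = a_y @ clf.support_vectors_
assert np.allclose(w, clf.coef_.ravel())
```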
Data sets
• Leukemia: 7129 genes × 72 samples
• Colon: 2000 genes × 62 samples
For example:

Gene \ Sample   Sample 1 (Cancer)   Sample 2   ……   Sample k (Normal)
Gene 1          29                  19         ……   16
Gene 2          5                   17         ……   40
……              ……                  ……         ……   ……
Gene n          13                  8          ……   2
Experimental results
• Leukemia (SVM-RFE)

Number of genes     100     50      34      20      10      8       5       3       1
Train accuracy (%)  100     100     100     100     100     100     100     100     92.093
Test accuracy (%)   99.31   98.276  99.31   98.621  98.621  96.552  95.172  92.759  78.966
• Colon (SVM-RFE)

Number of genes     100    50     33    20     10     8      5       3       1
Train accuracy (%)  100    100    100   100    100    100    99.189  95.405  80
Test accuracy (%)   80.4   80.8   82    79.2   78.8   77.6   75.6    77.6    71.6
The End
Thank you for watching!