Support Vector Machines

Presented By
Sherwin Shaidaee
Papers

• Vladimir N. Vapnik, "Statistical Learning Theory", Springer, 1998.
• Yunqiang Chen, Xiang Zhou, and Thomas S. Huang (University of Illinois), "One-Class SVM for Learning in Image Retrieval", 2001.
• A. Ganapathiraju, J. Hamaker, and J. Picone, "Applications of Support Vector Machines to Speech Recognition", 2003.
Introduction to Support Vector Machines

• SVMs are based on statistical learning theory; the aim is to solve only the problem of interest, without solving a more difficult problem as an intermediate step.
• SVMs are based on the structural risk minimisation principle, which incorporates capacity control to prevent over-fitting.
Introduction to Support Vector Machines

The Separable Case
• Two-class classification: a positive class P and a negative class N, with labels $y_i \in \{+1, -1\}$.
• The support vector algorithm simply looks for the separating hyperplane with the largest margin:
$w^T x_i + b \ge +1$, for all $x_i \in P$
$w^T x_i + b \le -1$, for all $x_i \in N$

or

$y_i (w^T x_i + b) \ge 1$, for all $x_i \in P \cup N$
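To make the margin constraint concrete, here is a minimal numpy sketch (the toy data and the candidate values of w and b are assumptions for illustration, not from the slides) that checks $y_i (w^T x_i + b) \ge 1$ for every training point:

import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0],      # positive class P
              [-1.0, -1.0], [-2.0, 0.0]])  # negative class N
y = np.array([+1, +1, -1, -1])

w = np.array([1.0, 1.0])   # candidate weight vector (illustrative values)
b = -1.0                   # candidate bias

margins = y * (X @ w + b)  # y_i (w^T x_i + b), one value per point
print(margins)             # here every entry is >= 1, so (w, b) separates with margin
print(np.all(margins >= 1))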
Convex Quadratic Problem
$\min_{w,\,b} \; \Phi(w) = \frac{1}{2}\|w\|^2$
subject to $y_i (w^T x_i + b) \ge 1, \quad i = 1, \dots, l$.
The Lagrangian for this problem:

$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$

where $\alpha = (\alpha_1, \dots, \alpha_l)^T$ are the Lagrange multipliers.
Convex Quadratic Problem
Differentiating with respect to w and b:

$\frac{\partial L(w, b, \alpha)}{\partial w} = w - \sum_{i=1}^{l} \alpha_i y_i x_i = 0$

$\frac{\partial L(w, b, \alpha)}{\partial b} = \sum_{i=1}^{l} \alpha_i y_i = 0$

Substituting these conditions back into $L$ yields the dual functional:

$F(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\|w\|^2 = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$
Support vectors
• Lie closest to the separating hyperplane:
$y_i (w^T x_i + b) = 1$

• Optimal weights:
$w^* = \sum_{i=1}^{l} \alpha_i^* y_i x_i$

• Optimal bias (from any support vector $x_i$):
$b^* = y_i - w^{*T} x_i$
Types of Support Vectors

Figure: (a) two-class linear, (b) one-class, (c) non-linear.

Decision function:

$f(x) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^* \, x^T x_i + b^* \right)$
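Continuing the toy example above, the decision function can be evaluated directly from the dual solution, summing over support vectors only (again a sketch assuming scikit-learn rather than code from the slides):

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([+1, +1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

x_new = np.array([0.5, 0.5])
# f(x) = sgn( sum_i y_i alpha_i* x^T x_i + b* ); dual_coef_ already holds y_i alpha_i*
f = np.sign(clf.dual_coef_ @ (clf.support_vectors_ @ x_new) + clf.intercept_)
print(f, clf.predict([x_new]))   # both give the same class label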
Kernel Feature Spaces

• Feature space: $\Phi : X \to H, \quad x \mapsto \Phi(x)$

• Decision function:

$f(x) = \mathrm{sgn}\left( \Phi(x)^T w^* + b^* \right) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^* \, \Phi(x)^T \Phi(x_i) + b^* \right)$
Kernel Function

The decision function uses the data only through inner products, which can be computed by a kernel function $K(x, z) = \Phi(x)^T \Phi(z)$ without forming $\Phi$ explicitly:

$f(x) = \mathrm{sgn}\left( \Phi(x)^T w^* + b^* \right) = \mathrm{sgn}\left( \sum_{i=1}^{l} y_i \alpha_i^* \, K(x, x_i) + b^* \right)$

Type                               Kernel function
Polynomial of degree p             $K(x, z) = (x^T z + 1)^p$
Radial basis function              $K(x, z) = \exp\left(-\|x - z\|^2 / (2\sigma^2)\right)$
2-layer sigmoidal neural network   $K(x, z) = \tanh(\kappa\, x^T z + \theta)$
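A minimal sketch of the three kernels (the formulas are the standard textbook forms assumed above, since the slide's table entries did not survive extraction), including a check that the degree-2 polynomial kernel really equals an inner product in a feature space:

import numpy as np

def poly_kernel(x, z, p=2):
    return (x @ z + 1.0) ** p                                 # polynomial of degree p

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))   # radial basis function

def sigmoid_kernel(x, z, kappa=1.0, theta=0.0):
    return np.tanh(kappa * (x @ z) + theta)                   # 2-layer sigmoidal network

# Kernel trick check for p = 2 in two dimensions:
# Phi(x) = (1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2)
# satisfies Phi(x)^T Phi(z) = (x^T z + 1)^2.
def phi(x):
    return np.array([1.0, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly_kernel(x, z), phi(x) @ phi(z))   # identical values (both 25.0)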
Non-Separable Data
$\min_{w,\,b,\,\xi} \; \Phi(w, b, \xi) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i^k$
subject to $y_i (w^T \Phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, l$
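The parameter C trades margin width against slack. A minimal sketch, assuming scikit-learn and made-up overlapping data (sklearn's SVC solves exactly this soft-margin problem with k = 1):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 1.0, (50, 2)),    # positive class, overlapping
               rng.normal(-1.0, 1.0, (50, 2))])   # negative class
y = np.array([+1] * 50 + [-1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C tolerates many margin violations (more support vectors);
    # large C penalises slack heavily and narrows the margin
    print(C, len(clf.support_), clf.score(X, y))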
Image Retrieval: An Application for SVMs

• Relevance feedback
• Problem: a small number of training samples and a high-dimensional feature space
Image Retrieval

• One-Class SVM
• Estimate the distribution of the target images in the feature space without over-fitting to the user feedback.
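A minimal sketch of this idea, with scikit-learn's OneClassSVM standing in for the paper's one-class formulation and made-up feature vectors (the feature dimension and nu are illustrative assumptions):

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
feedback = rng.normal(0.0, 1.0, (10, 37))    # 10 relevant images from user feedback
database = rng.normal(0.0, 2.0, (500, 37))   # candidate images to rank

oc = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(feedback)

# Rank the database by signed distance to the one-class boundary:
# larger scores are more likely to belong to the target class.
scores = oc.decision_function(database)
print(np.argsort(scores)[::-1][:20])         # indices of the top-20 images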
One-Class SVM: Decision Function (LOC-SVM)

Non-Linear Case Using a Kernel (KOC-SVM)
Experiment

• Five classes: airplanes, cars, horses, eagles, glasses
• 100 images per class
• 10 images are randomly drawn as training samples.
• Hit rates within the first 20 and the first 100 returned images are used as the performance measure.
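A minimal sketch of the performance measure, with a made-up hit_rate helper and dummy scores (not code from the paper):

import numpy as np

def hit_rate(scores, is_relevant, k):
    """Fraction of relevant images among the top-k ranked results."""
    top_k = np.argsort(scores)[::-1][:k]
    return is_relevant[top_k].mean()

scores = np.random.default_rng(2).random(500)   # dummy retrieval scores
is_relevant = np.zeros(500, dtype=bool)
is_relevant[:90] = True                         # 90 unseen target images
print(hit_rate(scores, is_relevant, 20), hit_rate(scores, is_relevant, 100))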
Results