
Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science
AAAI 2005
Learning Support Vector Machine Classifiers From Distributed Data Sources
Cornelia Caragea, Doina Caragea and Vasant Honavar
Learning from Data
Given a data set D, a hypothesis class H, and a performance criterion P, the learning algorithm L outputs a hypothesis h ∈ H that optimizes P.
Learning from Distributed Data
Given the data fragments D1, …, DN of a data set D distributed across N sites, a set of constraints Z, a hypothesis class H, and a performance criterion P, the task of the learner Ld is to output a hypothesis h ∈ H that optimizes P, using only operations allowed by Z.
Sufficient Statistics: A statistic sL(D) is a sufficient statistic for learning a hypothesis h using a learning algorithm L applied to a data set D if there exists a procedure that takes sL(D) as input and outputs h. Usually, we cannot compute all the sufficient statistics at once. Instead, we can only compute sufficient statistics for the refinement of a hypothesis hi into a hypothesis hi+1, denoted s(D, hi→hi+1).
Exactness: An algorithm Ld for learning from distributed data sets D1, …, DN is said to be exact relative to its centralized counterpart L if the hypothesis produced by Ld is identical to the one obtained by L from the complete data set D formed by appropriately combining D1, …, DN.
[Figure: separating hyperplane w · x + b = 0 with support vectors and margin of separation for a data set D]
Counterexample to Naïve Distributed SVM
[Figure: a data set D partitioned into fragments D1, D2, D3; the separating hyperplanes and margins of separation obtained from the fragments differ from those obtained from D]
Support Vector Machines
SVM finds a separating hyperplane that maximizes the margin of separation between classes when the data are linearly separable. Kernels can be used to make data sets separable in high-dimensional feature spaces. SVM is among the most effective machine learning algorithms for classification problems.
The maximum-margin hyperplane is found by solving
$\arg\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \geq 1$ for all $i = 1, \dots, N$
Optimal solution: $w^{*} = \sum_{i=1}^{N} \alpha_i^{*} y_i x_i$ and $b^{*} = y_i - w^{*} \cdot x_i$ for any $i$ such that $\alpha_i^{*} > 0$.
SVM Algorithm
Learning Phase
SVM(D: data, K: kernel)
Solve the optimization problem:
$\max_{\alpha}\ \Phi(\alpha) = \sum_{i=1}^{t} \alpha_i - \frac{1}{2} \sum_{i=1}^{t} \sum_{j=1}^{t} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$
subject to:
$\sum_{i=1}^{t} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq C,\ i = 1, \dots, t$
Let $\alpha^{*}$ be the solution of this optimization problem.
Classification Phase
For a new instance x, assign x to the class
$f(x) = \operatorname{sign}\big(\sum_{i=1}^{t} y_i \alpha_i^{*} K(x, x_i) + b^{*}\big)$
[Figure: learning from data — L applied to D outputs classifier h; learning from distributed data — Ld applied to D1, …, DN outputs classifier h]
Exactness condition: q(D) = C(q1(D1), …, qN(DN))
Our approach relies on identifying sufficient statistics for learning SVMs. We present
an algorithm that learns SVMs from distributed data by iteratively computing the set
of refinement sufficient statistics. Our algorithm is exact with respect to its centralized
counterpart and efficient in terms of time complexity.
Information extraction from distributed data + hypothesis generation:
[Figure: the learner poses a statistical query s(D, hi→hi+1) for refining the partial hypothesis hi; the query answering engine decomposes it into queries q1, …, qN against the data sources D1, …, DN, composes the answers into s(D, hi→hi+1), and the learner generates hi+1 ← R(hi, s(D, hi→hi+1))]
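As an illustration of the exactness condition above, here is a minimal sketch using class counts as the statistic; the function names are hypothetical, chosen only to mirror the notation q, qk, and C:

```python
# Illustration of the exactness condition q(D) = C(q1(D1), ..., qN(DN))
# with class counts as the statistic. All names here are illustrative.
from collections import Counter

def q(D):
    """Statistic over the full data set: counts of each class label."""
    return Counter(label for _, label in D)

def q_k(Dk):
    """The same statistic, computed locally at data source k."""
    return Counter(label for _, label in Dk)

def C(answers):
    """Composition procedure: merge the local answers."""
    total = Counter()
    for a in answers:
        total += a
    return total

D1 = [((0.1, 0.2), 1), ((0.4, 0.1), -1)]
D2 = [((0.9, 0.8), 1)]
assert q(D1 + D2) == C([q_k(D1), q_k(D2)])  # exactness holds for counts
```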
$\mathrm{VConv}(\mathrm{VConv}(D_1) \cup \mathrm{VConv}(D_2)) = \mathrm{VConv}(D_1 \cup D_2)$
Exact learning requires all of the boundary information $\mathrm{VConv}(D^{+}) \cup \mathrm{VConv}(D^{-})$, where VConv(D) is the set of vertices that define the convex hull of D. The resulting algorithm is exponential in the number of dimensions.
SVM from Horizontally Distributed Data
Learning Phase
Initialize SV ← ∅ (the global set of support vectors).
repeat
{
    Let SV' ← SV.
    Send SV' to all data sources Dk.
    for (each data source Dk)
    {
        Apply SVM(Dk ∪ SV') and find the local support vectors SVk.
        Send the support vectors SVk to the central location.
    }
    At the central location:
        Compute SVD = SV1 ∪ … ∪ SVN.
        Apply SVM(SVD) to find the new SV.
}
until (SV = SV')
Let $x_{i_1}, y_{i_1}, \dots, x_{i_p}, y_{i_p}$ be the final set of support vectors and $\alpha_{i_1}^{*}, \dots, \alpha_{i_p}^{*}$ their corresponding weights.
Classification Phase
For a new instance x, assign x to the class
$f(x) = \operatorname{sign}\big(\sum_{l=1}^{p} y_{i_l} \alpha_{i_l}^{*} K(x, x_{i_l}) + b^{*}\big)$
The support vectors (xi, yi) and their corresponding coefficients $\alpha_i^{*}$ can be seen as sufficient statistics.
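A compact single-machine simulation of this loop, with scikit-learn's SVC standing in for SVM(·); the helper names are hypothetical, and each fragment is assumed to contain examples of both classes:

```python
# Sketch of the iterative algorithm above, simulated on one machine.
# SVC stands in for SVM(.); helper names are illustrative, not the poster's.
import numpy as np
from sklearn.svm import SVC

def support_vectors(X, y, kernel="linear", C=1.0):
    """Run SVM(X, y) and return its support vectors with their labels."""
    clf = SVC(kernel=kernel, C=C).fit(X, y)
    return X[clf.support_], y[clf.support_]

def distributed_svm(fragments, kernel="linear", C=1.0, max_iter=50):
    """fragments: list of (X_k, y_k) pairs, one per data source D_k.
    Assumes every fragment contains examples of both classes."""
    d = fragments[0][0].shape[1]
    SV_X, SV_y = np.empty((0, d)), np.empty(0)
    for _ in range(max_iter):
        prev = {(tuple(x), c) for x, c in zip(SV_X, SV_y)}
        # Each source learns on its own data augmented with the global SV set.
        local = [support_vectors(np.vstack([Xk, SV_X]),
                                 np.concatenate([yk, SV_y]), kernel, C)
                 for Xk, yk in fragments]
        # Central location: union of the local support vector sets
        # (deduplicated), then one more SVM run to get the new global SV set.
        union = {(tuple(x), c) for Xk, yk in local for x, c in zip(Xk, yk)}
        U_X = np.array([x for x, _ in union])
        U_y = np.array([c for _, c in union])
        SV_X, SV_y = support_vectors(U_X, U_y, kernel, C)
        if {(tuple(x), c) for x, c in zip(SV_X, SV_y)} == prev:
            break  # until SV = SV'
    return SVC(kernel=kernel, C=C).fit(SV_X, SV_y)  # final classifier
```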
Learning Support Vector Machines from Distributed Data
Naïve approach: apply SVM to each fragment Di to obtain the local support vector sets SV(Di), take the union SV = SV(D1) ∪ … ∪ SV(DN), and apply SVM to the set SV to find the final classifier. The resulting algorithm is not exact (see the counterexample figure above).
[Figure: the statistical query formulation instantiated for SVM — the query answering engine decomposes the query across the data sources D1, …, DN, collects the local support vector sets SV(Di), and composes the answer by taking their union]
Exact and Efficient Learning SVM From Distributed Data
We ran experiments on artificially generated data and on protein function classification data. The results show that our algorithm converges to the exact solution in a relatively small number of iterations, which makes it preferable to previous algorithms for learning SVMs from distributed data.
Data source              Naïve Tr. Acc.  Naïve Ts. Acc.  Iter. Tr. Acc.  Iter. Ts. Acc.  Centr. Tr. Acc.  Centr. Ts. Acc.  No. of iter.
Artificially generated   0.75            0.61            1.00            1.00            1.00             1.00             3
Human-Yeast protein      0.60            0.57            0.69            0.63            0.69             0.63             3
(Tr. = training accuracy, Ts. = test accuracy; Naïve = naïve distributed SVM, Iter. = our iterative algorithm, Centr. = centralized SVM.)
Acknowledgements: This work is supported in part by grants from the National Science Foundation (IIS 0219699) and the National Institutes of Health (GM 066387) to Vasant Honavar.