Towards Scalable Support Vector Machines Using Squashing
• Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth
• Information and Computer Science, University of California
• Advisor: Dr. Hsu
• Reporter: Hung Ching-Wen
Outline
• 1. Motivation
• 2. Objective
• 3. Introduction
• 4. SVM
• 5. Squashing for SVM
• 6. Experiments
• 7. Conclusion
Motivation
• SVMs provide classification models with a strong theoretical foundation and excellent empirical performance.
• However, a major drawback of SVMs is the need to solve a large-scale quadratic programming (QP) problem during training.
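• As standard background (not spelled out on the slides): the QP in question is the SVM dual, which has one multiplier per training point,

\[
\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i\alpha_j\, y_i y_j\,\langle x_i, x_j\rangle
\quad \text{s.t.} \quad 0 \le \alpha_i \le C,\;\; \sum_{i=1}^{N}\alpha_i y_i = 0,
\]

• so both the time and the memory for the kernel matrix grow rapidly with N, which is what motivates squashing.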
Objective
• This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.
Introduction
• The applicability of SVMs to large datasets is limited because of their high computational cost.
• Speed-up training algorithms: chunking, Osuna's decomposition method, and SMO.
• These methods accelerate training but do not scale well with the size of the training data.
Introduction
• Approaches to reducing the computational cost:
• Sampling
• Boosting
• Squashing (DuMouchel et al., Madigan et al.)
• The authors propose squash-SMO to address the high computational cost of SVM training.
SVM
• Training data: D = {(x_i, y_i) : i = 1, …, N}
• x_i is a feature vector; y_i ∈ {+1, −1}
• A linear SVM classifies via y = sign(⟨w, x⟩ + b)
• w is the normal vector of the separating hyperplane
• b is its intercept
SVM (non-separable)
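• A standard form of the non-separable (soft-margin) SVM, consistent with this slide title:

\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i
\quad \text{s.t.} \quad y_i\big(\langle w, x_i\rangle + b\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0,
\]

• where the slack variables ξ_i absorb margin violations and C trades margin width against training error.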
SVM (a prior on w)
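• The probabilistic reading behind this slide, sketched under the assumption of a hinge-style likelihood (constant factors glossed over): with a prior P(w) ∝ exp(−‖w‖²), the SVM solution is a MAP estimate, since

\[
-\log P(D, w, b) \;=\; \|w\|^2 \;+\; C\sum_{i=1}^{N}\max\!\big(0,\; 1 - y_i(\langle w, x_i\rangle + b)\big) \;+\; \text{const},
\]

• so minimizing the regularized hinge loss is equivalent to maximizing the posterior.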
Squashing for SVM
• (1) Select a probabilistic model P((X, Y) | θ).
• (2) The objective is to find the maximum likelihood estimate θ_ML (see below).
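• Concretely, with i.i.d. training data the MLE is

\[
\theta_{ML} \;=\; \arg\max_{\theta}\ \sum_{i=1}^{N}\log P\big((x_i, y_i)\,\big|\,\theta\big).
\]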
Squashing for SVM
• (3) The training data D = {(x_i, y_i) : i = 1, …, N} can be grouped into N_c clusters.
• (X_c, Y_c)^sq: the squashed data point placed at cluster c
• β_c: the weight of cluster c (see the approximation below)
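• The idea of likelihood-based squashing is that the weighted log-likelihood of the squashed points approximates the full-data log-likelihood:

\[
\sum_{i=1}^{N}\log P\big((x_i, y_i)\,\big|\,\theta\big)
\;\approx\;
\sum_{c=1}^{N_c}\beta_c\,\log P\big((X_c, Y_c)^{sq}\,\big|\,\theta\big),
\]

• typically with β_c equal to the number of original points assigned to cluster c.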
Squashing for SVM
• If the prior on w is taken to be P(w) ∝ exp(−‖w‖²), the weighted optimization model on the next slide follows.
Squashing for SVM
• (4) The optimization model for the squashed data:
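• A plausible reconstruction of this model, assuming the usual weighted soft-margin form (the β_c weight each squashed point's slack):

\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{c=1}^{N_c}\beta_c\,\xi_c
\quad \text{s.t.} \quad Y_c\big(\langle w, X_c\rangle + b\big) \ge 1 - \xi_c,\;\; \xi_c \ge 0.
\]

• This is an ordinary weighted SVM, so any SMO-style solver that accepts per-example weights can train on the N_c ≪ N squashed points.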
Squashing for SVM
• Important design issues for the squashing algorithm (a sketch follows this list):
• (1) the choice of the number and location of the squashing points;
• (2) sampling values of w from the prior p(w);
• (3) obtaining b from the optimization model;
• (4) with w and b fixed, evaluating the likelihood of each training point, and repeating the selection procedure L times (L is the length of each point's likelihood profile).
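• A minimal Python sketch of steps (1)–(4), assuming a hinge-based likelihood, k-means clustering of the likelihood profiles, and cluster means as pseudo-points; these are illustrative choices, not necessarily the paper's exact design:

import numpy as np
from sklearn.cluster import KMeans

def hinge_neg_loglik(X, y, w, b):
    # Negative log-likelihood under a hinge-style model:
    # -log P(y | x) is proportional to max(0, 1 - y * (<w, x> + b)).
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def likelihood_squash(X, y, n_clusters=100, L=10, seed=0):
    # Likelihood-based squashing sketch: points with similar likelihood
    # profiles over L parameter draws are merged into one weighted
    # pseudo-point per cluster.
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(scale=np.sqrt(0.5), size=(L, d))  # draws from p(w) ~ exp(-||w||^2)
    bs = rng.normal(size=L)  # crude draws for b (an assumption; the paper derives b)
    # Likelihood profile: one column per sampled (w, b) pair.
    profiles = np.column_stack(
        [hinge_neg_loglik(X, y, W[l], bs[l]) for l in range(L)]
    )
    Xsq, ysq, beta = [], [], []
    for label in (-1, 1):  # squash each class separately to keep labels pure
        mask = y == label
        k = max(1, min(int(n_clusters * mask.mean()), int(mask.sum())))
        km = KMeans(n_clusters=k, n_init=5).fit(profiles[mask])
        for c in range(k):
            members = km.labels_ == c
            Xsq.append(X[mask][members].mean(axis=0))  # pseudo-point at cluster mean
            ysq.append(label)
            beta.append(int(members.sum()))            # weight = points merged
    return np.asarray(Xsq), np.asarray(ysq), np.asarray(beta)

• The squashed set (Xsq, ysq, beta) can then be passed to any weighted SVM trainer; for instance, scikit-learn's SVC(kernel='linear').fit(Xsq, ysq, sample_weight=beta) plays the role of the weighted SMO step (scikit-learn is an assumption here, not what the paper used).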
EXPERIMENTS
• Experiment datasets:
• Synthetic data
• UCI Machine Learning Repository
• UCI KDD Archive
EXPERIMENTS
• Methods evaluated: full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO
• Results are averaged over 100 runs.
• Performance measures: misclassification rate, learning time, and memory usage
EXPERIMENTS (Results on Synthetic Data)
• (W_f, b_f): hyperplane estimated by full-SMO
• (W_s, b_s): hyperplane estimated from the squashed or sampled data
• [figure slides with the synthetic-data results]
EXPERIMENTS (Results on Benchmark Data)
• [figure slides with the benchmark-data results]
Conclusion
• 1. We describe how squashing makes SVM training applicable to large datasets.
• 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory costs.
• 3. srs-SMO has a higher misclassification rate.
• 4. squash-SMO and boost-SMO allow parameter tuning via cross-validation, which is impractical for full-SMO.
Conclusion
• 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems.
• 6. However, squash-SMO offers better interpretability of the model and can be expected to run faster than SMO on datasets that do not fit in memory.
Opinion
• It is a good idea that the authors describe how the use of squashing makes SVM training applicable to large datasets.
• We could change the prior distribution on w according to the nature of the data, e.g., an exponential or log-normal distribution, or use a nonparametric approach.