
ECE 5424: Introduction to Machine Learning
Topics:
– SVM
– Lagrangian Duality
– (Maybe) SVM dual & kernels
Readings: Barber 17.5
Stefan Lee
Virginia Tech
Recap of Last Time
[Recap figures omitted. (C) Dhruv Batra; images courtesy of Arthur Gretton]
Generative vs. Discriminative
• Generative Approach (Naïve Bayes)
– Estimate p(x|y) and p(y)
– Use Bayes Rule to predict y
• Discriminative Approach
– Estimate p(y|x) directly (Logistic Regression)
– Learn “discriminant” function f(x) (Support Vector Machine)
SVMs are Max-Margin Classifiers
[Figure omitted: the margin between the closest points x+ and x−. Maximize this while getting examples correct.]
Last Time
• Hard-Margin SVM Formulation

$\min_{w,b} \; w^\top w \quad \text{s.t.} \quad y_j (w^\top x_j + b) \ge 1, \;\; \forall j$

• Soft-Margin SVM Formulation

$\min_{w,b} \; w^\top w + C \sum_j \xi_j \quad \text{s.t.} \quad y_j (w^\top x_j + b) \ge 1 - \xi_j, \;\; \xi_j \ge 0, \;\; \forall j$
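To make the soft-margin formulation concrete, here is a minimal sketch that hands the quadratic program above to an off-the-shelf convex solver (cvxpy). The synthetic data, the choice C = 1, and all variable names are illustrative assumptions, not part of the slides.

```python
# A minimal sketch of the soft-margin QP above, solved with cvxpy.
# The synthetic data, C = 1, and all names here are illustrative assumptions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 40, 2
X = np.vstack([rng.normal(+1.5, 1.0, (n // 2, d)),   # class +1 cluster
               rng.normal(-1.5, 1.0, (n // 2, d))])  # class -1 cluster
y = np.array([1.0] * (n // 2) + [-1.0] * (n // 2))

C = 1.0
w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n)  # slack variables, one per example

objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,  # y_j (w'x_j + b) >= 1 - xi_j
               xi >= 0]
cp.Problem(objective, constraints).solve()
print("w =", w.value, "b =", b.value)
```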
Last Time
• Can rewrite as a hinge loss:

$\min_{w,b} \; \frac{1}{2} w^\top w + C \sum_i h\big(y_i (w^\top x_i + b)\big)$

where $h(z) = \max(0, 1 - z)$ is the hinge loss.
[Figure omitted: SVM hinge loss vs. LR logistic loss.]
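Because this rewritten objective is unconstrained, it can be attacked directly with (sub)gradient descent. A minimal sketch, assuming NumPy; the step size and iteration count are arbitrary choices:

```python
# (Sub)gradient descent on the unconstrained hinge-loss objective above.
# Step size and iteration count are arbitrary choices for this sketch.
import numpy as np

def svm_subgradient_descent(X, y, C=1.0, lr=0.01, iters=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1  # examples with nonzero hinge loss
        # Subgradient of 0.5 w'w + C * sum_i max(0, 1 - y_i (w'x_i + b))
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

On the same synthetic data this should land near the QP solution above; the 1/2 factor on $w^\top w$ only rescales the effective C.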
Today
• I want to show you how useful SVMs really are by explaining the kernel trick, but to do that…
• …we need to talk about Lagrangian duality.
Constrained Optimization
• How do I solve these sorts of problems:

$\min_w \; f(w) \quad \text{s.t.} \quad h_i(w) = 0 \;\; (i = 1, \dots, k), \quad g_j(w) \le 0 \;\; (j = 1, \dots, m)$
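Before deriving anything by hand, note that generic solvers handle exactly this form. Below is a toy instance handed to SciPy's SLSQP; the particular f, h, and g are made-up examples. One caution: SciPy's 'ineq' convention is fun(x) ≥ 0, so a g(w) ≤ 0 constraint must be passed as −g(w).

```python
# A toy instance of the general constrained problem, handed to SciPy's SLSQP.
# Caution: SciPy's 'ineq' convention is fun(x) >= 0, so a g(w) <= 0
# constraint must be passed as -g(w). The f, h, g here are made up.
import numpy as np
from scipy.optimize import minimize

f = lambda w: (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2  # objective
h = lambda w: w[0] + w[1] - 1.0                      # equality:   h(w) = 0
g = lambda w: w[0] - 0.25                            # inequality: g(w) <= 0

res = minimize(f, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "eq", "fun": h},
                            {"type": "ineq", "fun": lambda w: -g(w)}])
print(res.x)  # ~[0.0, 1.0] for this toy instance
```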
Introducing the Lagrangian
• Let's assume we have this problem:

$\min_w f(w) \quad \text{s.t.} \quad h(w) = 0, \;\; g(w) \le 0$

• Define the Lagrangian function:

$L(w, \beta, \alpha) = f(w) + \beta h(w) + \alpha g(w), \quad \text{s.t.} \;\; \alpha \ge 0$

where $\beta$ and $\alpha$ are called Lagrange multipliers (or dual variables).
Gradient of Lagrangian

$L(w, \beta, \alpha) = f(w) + \beta h(w) + \alpha g(w)$

$\frac{\partial L}{\partial \alpha} = g(w) = 0$

$\frac{\partial L}{\partial \beta} = h(w) = 0$

$\frac{\partial L}{\partial w} = \frac{\partial f(w)}{\partial w} + \beta \frac{\partial h(w)}{\partial w} + \alpha \frac{\partial g(w)}{\partial w} = 0$

Setting these gradients to zero finds critical points of the constrained problem.
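As a quick sanity check of this bookkeeping, the same stationarity system can be reproduced symbolically with SymPy's undefined functions; f, h, and g stay abstract, so this is a sketch of the algebra only.

```python
# The stationarity system above, reproduced with SymPy's undefined functions;
# f, h, g stay abstract, so this is just symbolic bookkeeping.
import sympy as sp

w, alpha, beta = sp.symbols("w alpha beta")
f, h, g = sp.Function("f"), sp.Function("h"), sp.Function("g")

L = f(w) + beta * h(w) + alpha * g(w)
print(sp.diff(L, alpha))  # g(w): recovers the inequality constraint
print(sp.diff(L, beta))   # h(w): recovers the equality constraint
print(sp.diff(L, w))      # f'(w) + beta h'(w) + alpha g'(w)
```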
Building Intuition

$L(w, \beta) = f(w) + \beta h(w)$

$\frac{\partial L}{\partial w} = \frac{\partial f(w)}{\partial w} + \beta \frac{\partial h(w)}{\partial w} = 0 \;\;\Longrightarrow\;\; \frac{\partial f(w)}{\partial w} = -\beta \frac{\partial h(w)}{\partial w}$

At a critical point, the gradient of the objective is parallel (up to sign) to the gradient of the constraint.
Geometric Intuition
[Figures omitted: level curves of f and the constraint curve h; at the constrained optimum their gradients align. Image credit: Wikipedia]
A simple example
• Minimize $f(x, y) = x + y$ s.t. $x^2 + y^2 = 1$
• $L(x, y, \beta) = x + y + \beta (x^2 + y^2 - 1)$

$\frac{\partial L}{\partial x} = 1 + 2\beta x = 0, \qquad \frac{\partial L}{\partial y} = 1 + 2\beta y = 0$

$\frac{\partial L}{\partial \beta} = x^2 + y^2 - 1 = 0$

• $x = y = -\frac{1}{2\beta} \;\Rightarrow\; \frac{2}{4\beta^2} = 1 \;\Rightarrow\; \beta = \pm\frac{\sqrt{2}}{2}$
• This makes $x = y = \mp\frac{\sqrt{2}}{2}$; the minimum is at $x = y = -\frac{\sqrt{2}}{2}$.
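A quick symbolic check of this example (a sketch assuming SymPy is available): build the Lagrangian, set all partial derivatives to zero, and solve the resulting system.

```python
# Symbolic check of the example: build L, zero out all partials, solve.
import sympy as sp

x, y, beta = sp.symbols("x y beta", real=True)
L = x + y + beta * (x**2 + y**2 - 1)

stationarity = [sp.diff(L, v) for v in (x, y, beta)]
print(sp.solve(stationarity, (x, y, beta), dict=True))
# Two critical points, x = y = +/- sqrt(2)/2; the minimizer is x = y = -sqrt(2)/2.
```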
Why go through this trouble?
• Often the Lagrangian form of a problem is easier to solve.
• Sometimes it offers useful intuition about the problem.
• It builds character.
Lagrangian Duality
• More formally on the overhead:
– Starring a game-theoretic interpretation, the duality gap, and KKT conditions.