ECE 5424: Introduction to Machine Learning

Topics:
– SVM
– Lagrangian Duality
– (Maybe) SVM dual & kernels

Readings: Barber 17.5

Stefan Lee, Virginia Tech
(C) Dhruv Batra

Recap of Last Time

[Four recap figure slides omitted; images courtesy of Arthur Gretton.]

Generative vs. Discriminative

• Generative approach (Naïve Bayes)
  – Estimate p(x|y) and p(y)
  – Use Bayes' rule to predict y
• Discriminative approach
  – Estimate p(y|x) directly (logistic regression)
  – Learn a "discriminant" function f(x) (support vector machine)

SVMs are Max-Margin Classifiers

[Figure: a separating hyperplane with support points x+ and x− on either margin.]
Maximize the margin while getting every example correct.

Last Time

• Hard-margin SVM formulation:

  \min_{w,b} \; w^T w \quad \text{s.t.} \quad y_j (w^T x_j + b) \ge 1 \;\; \forall j

• Soft-margin SVM formulation:

  \min_{w,b} \; w^T w + C \sum_i \xi_i \quad \text{s.t.} \quad y_j (w^T x_j + b) \ge 1 - \xi_j, \;\; \xi_j \ge 0 \;\; \forall j

• The soft-margin problem can be rewritten as an unconstrained objective using the hinge loss h(z) = \max(0, 1 - z):

  \min_{w,b} \; \tfrac{1}{2} w^T w + C \sum_i h\big(y_i (w^T x_i + b)\big)

  SVM uses the hinge loss; logistic regression uses the logistic loss. (A runnable sketch of this objective appears at the end of these notes.)

Today

• I want to show you how useful SVMs really are by explaining the kernel trick, but to do that…
• …we need to talk about Lagrangian duality.

Constrained Optimization

• How do we solve problems of this form?

  \min_w \; f(w) \quad \text{s.t.} \quad h_i(w) = 0 \;\; (i = 1, \dots, k), \quad g_j(w) \le 0 \;\; (j = 1, \dots, m)

Introducing the Lagrangian

• Let's assume we have this problem:

  \min_w \; f(w) \quad \text{s.t.} \quad h(w) = 0, \;\; g(w) \le 0

• Define the Lagrangian function

  L(w, \beta, \alpha) = f(w) + \beta h(w) + \alpha g(w), \quad \text{with } \alpha \ge 0,

  where \beta and \alpha are called Lagrange multipliers (or dual variables).

Gradient of the Lagrangian

  \frac{\partial L}{\partial \alpha} = g(w) = 0

  \frac{\partial L}{\partial \beta} = h(w) = 0

  \frac{\partial L}{\partial w} = \frac{\partial f(w)}{\partial w} + \beta \frac{\partial h(w)}{\partial w} + \alpha \frac{\partial g(w)}{\partial w} = 0

• Setting these derivatives to zero finds critical points of the constrained problem.

Building Intuition

• With only the equality constraint, L(w, \beta) = f(w) + \beta h(w), so

  \frac{\partial L}{\partial w} = \frac{\partial f(w)}{\partial w} + \beta \frac{\partial h(w)}{\partial w} = 0
  \quad \Longrightarrow \quad
  \frac{\partial f(w)}{\partial w} = -\beta \frac{\partial h(w)}{\partial w}

• In other words, at a constrained critical point the gradient of f is parallel to the gradient of h.

Geometric Intuition

[Two figure slides omitted: level curves of f against the constraint curve h(w) = 0. Image credit: Wikipedia.]

A Simple Example

• Minimize f(x, y) = x + y subject to x^2 + y^2 = 1.
• L(x, y, \beta) = x + y + \beta (x^2 + y^2 - 1)
• \frac{\partial L}{\partial x} = 1 + 2\beta x = 0, \quad \frac{\partial L}{\partial y} = 1 + 2\beta y = 0
• \frac{\partial L}{\partial \beta} = x^2 + y^2 - 1 = 0
• So x = y = -\frac{1}{2\beta}, and substituting into the constraint gives \frac{2}{4\beta^2} = 1, i.e. \beta = \pm\frac{\sqrt{2}}{2}.
• This makes x = y = \mp\frac{\sqrt{2}}{2}; the minimum is at x = y = -\frac{\sqrt{2}}{2}, where f = -\sqrt{2}. (A numeric check appears at the end of these notes.)

Why Go Through This Trouble?

• Solutions derived from the Lagrangian form are often easier to obtain.
• Sometimes they offer useful intuition about the problem.
• It builds character.

Lagrangian Duality

• More formally, on the overhead
  – Starring a game-theoretic interpretation, the duality gap, and the KKT conditions.
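Appendix: A Hinge-Loss SVM Sketch

To make the hinge-loss objective above concrete, here is a minimal sketch of soft-margin SVM training by subgradient descent on (1/2) w^T w + C \sum_i \max(0, 1 - y_i (w^T x_i + b)). The toy data, step size, C value, and iteration count are illustrative assumptions, not part of the lecture.

import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data in two clusters; labels must be in {-1, +1} for the hinge loss.
X_pos = rng.normal(loc=+2.0, scale=1.0, size=(50, 2))
X_neg = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

C = 1.0        # slack penalty from the soft-margin objective (assumed value)
lr = 0.01      # step size (assumption)
w = np.zeros(2)
b = 0.0

for _ in range(1000):
    margins = y * (X @ w + b)       # y_i (w^T x_i + b)
    active = margins < 1            # points with nonzero hinge loss
    # Subgradient of (1/2) w^T w + C * sum_i max(0, 1 - y_i (w^T x_i + b))
    grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w -= lr * grad_w
    b -= lr * grad_b

objective = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - y * (X @ w + b)).sum()
accuracy = (np.sign(X @ w + b) == y).mean()
print(f"objective={objective:.3f}  train accuracy={accuracy:.2%}")

Note that only points with margin below 1 (the "active" points) contribute to the update; at the optimum these are exactly the points that carry the hinge loss.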
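Appendix: Checking the Simple Example

This is a quick numeric verification (not lecture code) of the worked example: minimize f(x, y) = x + y subject to x^2 + y^2 = 1. The grid resolution is an arbitrary assumption.

import numpy as np

beta = np.sqrt(2.0) / 2.0          # the multiplier found analytically
x = y = -1.0 / (2.0 * beta)        # stationarity: 1 + 2*beta*x = 0

# The candidate satisfies the constraint and both gradient conditions:
print(x**2 + y**2)                          # -> 1.0  (dL/d(beta) = 0)
print(1 + 2 * beta * x, 1 + 2 * beta * y)   # -> 0.0, 0.0  (dL/dx, dL/dy)

# Compare against a dense search over the unit circle:
theta = np.linspace(0.0, 2.0 * np.pi, 100_000)
vals = np.cos(theta) + np.sin(theta)
print(x + y, vals.min())                    # both approx -sqrt(2)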