Support Vector Machines

More information can be found at http://www.cs.cmu.edu/~awm/tutorials

Linear Classifiers

f(x, w, b) = sign(w·x − b)

[Figure: a 2-D dataset whose points are labeled +1 or −1, with an input x feeding the classifier f, which outputs the estimate y_est. The slide repeats with several different candidate separating lines drawn through the data.]

How would you classify this data? Any of these lines would be fine... but which is best?

Classifier Margin

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Maximum Margin

The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM, for linear SVM). Support vectors are those datapoints that the margin pushes up against.

Why Maximum Margin?

1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very very well.

Estimate the Margin

What is the distance expression for a point x to the line w·x + b = 0? It is |w·x + b| / ‖w‖.

What is the expression for the margin? Scaling (w, b) so that the closest points satisfy |w·x + b| = 1, the margin width is 2 / ‖w‖.

Maximize Margin

Maximize 2 / ‖w‖, i.e. minimize w·w, subject to classifying every training point correctly: y_i (w·x_i + b) ≥ 1 for all i. Posed over both the boundary and the closest datapoints, this is a min-max (game) problem. Strategy: fix the scale of (w, b) so that the constraints bind at the closest points; maximizing the margin then reduces to an ordinary constrained minimization.

Maximum Margin Linear Classifier

Minimize ½ w·w subject to y_i (w·x_i + b) ≥ 1 for i = 1, …, N. How to solve it?

Learning via Quadratic Programming

QP is a well-studied class of optimization algorithms to maximize a quadratic function of some real-valued variables subject to linear constraints.

Quadratic Programming

Find arg max_u  c + dᵀu − (uᵀRu)/2    (quadratic criterion)

subject to n additional linear inequality constraints:

  a_11 u_1 + a_12 u_2 + … + a_1m u_m ≤ b_1
  a_21 u_1 + a_22 u_2 + … + a_2m u_m ≤ b_2
  …
  a_n1 u_1 + a_n2 u_2 + … + a_nm u_m ≤ b_n

and subject to e additional linear equality constraints:

  a_(n+1)1 u_1 + a_(n+1)2 u_2 + … + a_(n+1)m u_m = b_(n+1)
  a_(n+2)1 u_1 + a_(n+2)2 u_2 + … + a_(n+2)m u_m = b_(n+2)
  …
  a_(n+e)1 u_1 + a_(n+e)2 u_2 + … + a_(n+e)m u_m = b_(n+e)
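To make the quadratic program concrete, here is a minimal sketch of the maximum margin linear classifier solved with a general-purpose constrained optimizer. The toy dataset, the choice of scipy, and all names (X, y, objective) are illustrative assumptions, not part of the original slides.

```python
# Hard-margin linear SVM as a small QP (a sketch, not a production solver).
# Criterion: minimize 0.5 * w.w; constraints: y_i (w.x_i + b) >= 1 for all i.
import numpy as np
from scipy.optimize import minimize

# Assumed linearly separable toy data: two clusters in 2-D.
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -3.0], [-2.5, -3.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

# Decision variables u = (w1, w2, b).
def objective(u):
    w = u[:2]
    return 0.5 * np.dot(w, w)

# One linear inequality per training point: y_i (w.x_i + b) - 1 >= 0.
constraints = [{"type": "ineq",
                "fun": lambda u, xi=xi, yi=yi: yi * (np.dot(u[:2], xi) + u[2]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, " b =", b)

# Support vectors are the points whose constraint is (numerically) tight.
margins = y * (X @ w + b)
print("support vectors:", X[np.isclose(margins, 1.0, atol=1e-3)])
```

In practice one would hand this to a dedicated QP solver or an SVM library; the sketch only mirrors the criterion and constraints stated above.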
Uh-oh!

[Figure: a 2-D dataset of +1 and −1 points that overlap, so no straight line separates them.]

This is going to be a problem! What should we do?

Idea 1: Find the minimum w·w while also minimizing the number of training set errors. Problemette: two things to minimize makes for an ill-defined optimization.

Idea 1.1: Minimize w·w + C (#train errors), where C is a tradeoff parameter. There’s a serious practical problem that’s about to make us reject this approach. Can you guess what it is? It can’t be expressed as a quadratic programming problem, so solving it may be too slow. (Also, it doesn’t distinguish between disastrous errors and near misses.)

Idea 2.0: Minimize w·w + C (distance of error points to their correct place).

Support Vector Machine (SVM) for Noisy Data

[Figure: the same dataset with three margin-violating points, whose distances to the correct side are labeled ε_1, ε_2, ε_3.]

Minimize ½ w·w + C Σ_i ε_i subject to y_i (w·x_i + b) ≥ 1 − ε_i and ε_i ≥ 0. Any problem with the above formulation? It balances the trade-off between the margin and the classification errors. How do we determine the appropriate value for C? It is usually tuned on held-out data, for example by cross-validation.

An Equivalent QP: Determine b

Fix w; finding b alone is then a linear programming problem!

Suppose we’re in 1 dimension

What would SVMs do with this data? [Figure: negative points to the left of x = 0 and positive points to the right of it, on a single axis.] Not a big surprise: the positive “plane” and the negative “plane” are just the two margin points bracketing the widest gap.

Harder 1-dimensional dataset

[Figure: the negatives now cluster around x = 0 with positives on both sides.] That’s wiped the smirk off SVM’s face. What can be done about this?

Remember how permitting nonlinear basis functions made linear regression so much nicer? Let’s permit them here too: map each datapoint to z_k = (x_k, x_k²). In the lifted 2-D space the two classes become linearly separable again.
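As a sketch of this basis-function idea, and of the earlier question of how to pick C, the code below assumes scikit-learn; the 1-D toy dataset and the C grid are illustrative inventions. No single threshold on x separates the classes, but after the map z_k = (x_k, x_k²) a line does.

```python
# Lifting a harder 1-D dataset with the basis function z_k = (x_k, x_k^2),
# then fitting a linear SVM and choosing C by cross-validation (a sketch).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

x = np.array([-4.0, -3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0, 4.0])
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1])   # negatives cluster around x = 0

Z = np.column_stack([x, x ** 2])   # each 1-D point becomes the 2-D point (x, x^2)

# The slides ask how to determine C: a grid search scored by cross-validation
# is the usual answer (3 folds here only because the toy set is tiny).
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1.0, 10.0, 100.0]}, cv=3)
search.fit(Z, y)
print("best C:", search.best_params_["C"])
print("accuracy in the lifted space:", search.best_estimator_.score(Z, y))
```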
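Finally, to close the loop on the noisy-data formulation above (minimize ½ w·w + C Σ ε_i subject to y_i (w·x_i + b) ≥ 1 − ε_i, ε_i ≥ 0), here is a sketch with explicit slack variables; the overlapping toy data, the C value, and all names are again assumptions for illustration.

```python
# Soft-margin linear SVM with explicit slack variables eps_i (a sketch).
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.5],     # third point is a stray +1
              [-2.0, -2.0], [-3.0, -3.0], [1.0, 1.5]])  # sixth point is a stray -1
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
C, n = 10.0, len(X)

# Decision variables u = (w1, w2, b, eps_1, ..., eps_n).
def objective(u):
    w, eps = u[:2], u[3:]
    return 0.5 * np.dot(w, w) + C * np.sum(eps)

# y_i (w.x_i + b) - 1 + eps_i >= 0 for each training point.
cons = [{"type": "ineq",
         "fun": lambda u, xi=xi, yi=yi, i=i:
             yi * (np.dot(u[:2], xi) + u[2]) - 1.0 + u[3 + i]}
        for i, (xi, yi) in enumerate(zip(X, y))]

bounds = [(None, None)] * 3 + [(0.0, None)] * n   # eps_i >= 0
res = minimize(objective, x0=np.zeros(3 + n), bounds=bounds, constraints=cons)
w, b, eps = res.x[:2], res.x[2], res.x[3:]
print("w =", w, " b =", b)
print("slacks:", np.round(eps, 3))   # nonzero eps_i marks a margin violator
```

Larger C punishes slack more heavily (fewer violations, narrower margin); smaller C favors a wide margin at the cost of more violations.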