Constrained Optimization and Support Vector Machines

Man-Wai MAK
Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University
[email protected]
http://www.eie.polyu.edu.hk/~mwmak

References:
C. Bishop, "Pattern Recognition and Machine Learning," Appendix E, Springer, 2006.
S.Y. Kung, M.W. Mak and S.H. Lin, "Biometric Authentication: A Machine Learning Approach," Prentice Hall, 2005, Chapter 4.

March 22, 2017

Overview
1 Motivations: Why Study Constrained Optimization and SVM
  Why Study Constrained Optimization
  Why Study SVM
2 Constrained Optimization
  Hard-Constrained Optimization
  Lagrangian Function
  Inequality Constraint
  Multiple Constraints
  Software Tools
3 Support Vector Machines
  Linear SVM: Separable Case
  Linear SVM: Fuzzy Separation (Optional)
  Nonlinear SVM
  SVM for Pattern Classification
  Software Tools

Why Study Constrained Optimization?

Constrained optimization is used in almost every discipline:
Power Electronics: "Design of a boost power factor correction converter using optimization techniques," IEEE Transactions on Power Electronics, vol. 19, no. 6, pp. 1388-1396, Nov. 2004.
Wireless Communication: "Energy-constrained modulation optimization," IEEE Transactions on Wireless Communications, vol. 4, no. 5, pp. 2349-2360, Sept. 2005.
Photonics: "Module Placement Based on Resistive Network Optimization," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 3, no. 3, pp. 218-225, July 1984.
Multimedia: "Nonlinear total variation based noise removal algorithms," Physica D: Nonlinear Phenomena, vol. 60, no. 1-4, pp. 259-268, 1992.

Why Study SVM?

SVM is a typical application of constrained optimization. SVMs are used everywhere:
Power Electronics: "Support Vector Machines Used to Estimate the Battery State of Charge," IEEE Transactions on Power Electronics, vol. 28, no. 12, pp. 5919-5926, Dec. 2013.
Wireless Communication: "Localization in Wireless Sensor Networks Based on Support Vector Machines," IEEE Transactions on Parallel and Distributed Systems, vol. 19, no. 7, pp. 981-994, July 2008.
Photonics: "Development of robust calibration models using support vector machines for spectroscopic monitoring of blood glucose," Analytical Chemistry, vol. 82, no. 23, pp. 9719-9726, 2010.
Multimedia: "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308-311, May 2006.
Bioinformatics: "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.

Constrained Optimization

Constrained optimization is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables.
The objective function is either a cost (or energy) function, which is to be minimized, or a reward (or utility) function, which is to be maximized.
Constraints can be either hard constraints, which set conditions on the variables that must be satisfied, or soft constraints, which penalize the objective function when the conditions on the variables are not satisfied.
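The hard/soft distinction can be made concrete with a small sketch. The toy objective, constraint, and penalty weight below are my own illustration (not from the slides), using the scipy.optimize.minimize routine mentioned later under Software Tools:

```python
# Illustrative sketch: contrast a hard constraint with a soft (penalty) version
# of the same problem. Objective, constraint, and penalty weight are invented.
from scipy.optimize import minimize

f = lambda x: (x[0] - 2.0)**2 + (x[1] - 1.0)**2   # cost function to minimize
g = lambda x: x[0] + x[1] - 1.0                   # constraint g(x) = 0

# Hard constraint: the solver must satisfy g(x) = 0 exactly.
hard = minimize(f, x0=[0.0, 0.0], constraints={'type': 'eq', 'fun': g})

# Soft constraint: violations of g(x) = 0 are merely penalized in the objective.
penalty = 10.0
soft = minimize(lambda x: f(x) + penalty * g(x)**2, x0=[0.0, 0.0])

print(hard.x)   # satisfies x0 + x1 = 1
print(soft.x)   # only approximately satisfies x0 + x1 = 1
```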
Constrained Optimization

A general constrained minimization problem:

min f(x)
subject to g_i(x) = c_i for i = 1, ..., n (equality constraints)
           h_j(x) ≥ d_j for j = 1, ..., m (inequality constraints)   (1)

where g_i(x) = c_i and h_j(x) ≥ d_j are called hard constraints.
If the constrained problem has only equality constraints, the method of Lagrange multipliers can be used to convert it into an unconstrained problem whose number of variables is the original number of variables plus the original number of equality constraints.

Constrained Optimization

Example: maximization of a function of two variables with an equality constraint:

max f(x, y)
subject to g(x, y) = 0   (2)

At the optimal point (x*, y*), the gradients of f(x, y) and g(x, y) are anti-parallel, i.e.,

∇f(x*, y*) = −λ∇g(x*, y*),

where λ is called the Lagrange multiplier. (See Tutorial for explanation.)

Constrained Optimization

Extension to a function of D variables:

max f(x)
subject to g(x) = 0   (3)

where x ∈ ℝ^D. The optimum occurs when

∇f(x) + λ∇g(x) = 0.   (4)

Note that the constraint surface g(x) = 0 is of dimension D − 1.

Lagrangian Function

Define the Lagrangian function as

L(x, λ) ≡ f(x) + λg(x)   (5)

where λ ≠ 0 is the Lagrange multiplier.
The optimality condition (Eq. 4) is satisfied when ∇_x L = 0. Note that ∂L/∂λ = 0 leads to the constraint equation g(x) = 0.
The constrained maximization can be written as:

max L(x, λ) = f(x) + λg(x)
subject to λ ≠ 0, g(x) = 0   (6)

Lagrangian Function: 2D Example

Find the stationary point of the function f(x1, x2):

max f(x1, x2) = 1 − x1² − x2²
subject to g(x1, x2) = x1 + x2 − 1 = 0   (7)

Lagrangian function: L(x, λ) = 1 − x1² − x2² + λ(x1 + x2 − 1)

Differentiating L(x, λ) w.r.t. x1, x2, and λ and setting the results to 0, we obtain

−2x1 + λ = 0
−2x2 + λ = 0
x1 + x2 − 1 = 0

The solution is (x1*, x2*) = (1/2, 1/2), and the corresponding λ = 1.

Inequality Constraint

Maximization with an inequality constraint:

max f(x)
subject to g(x) ≥ 0   (8)

Two possible solutions for the maximum of L(x, μ) = f(x) + μg(x):

Inactive constraint: g(x) > 0, μ = 0, ∇f(x) = 0
Active constraint:   g(x) = 0, μ > 0, ∇f(x) = −μ∇g(x)   (9)

Therefore, the maximization can be rewritten as

max L(x, μ) = f(x) + μg(x)
subject to g(x) ≥ 0, μ ≥ 0, μg(x) = 0   (10)

which is known as the Karush-Kuhn-Tucker (KKT) condition.

Inequality Constraint

For minimization,

min f(x)
subject to g(x) ≥ 0   (11)

we can also express the minimization as

min L(x, μ) = f(x) − μg(x)
subject to g(x) ≥ 0, μ ≥ 0, μg(x) = 0   (12)

Multiple Constraints

Maximization with multiple equality and inequality constraints:

max f(x)
subject to g_j(x) = 0 for j = 1, ..., J
           h_k(x) ≥ 0 for k = 1, ..., K   (13)

This maximization can be written as

max L(x, {λ_j}, {μ_k}) = f(x) + Σ_{j=1}^{J} λ_j g_j(x) + Σ_{k=1}^{K} μ_k h_k(x)
subject to λ_j ≠ 0, g_j(x) = 0 for j = 1, ..., J
           and μ_k ≥ 0, h_k(x) ≥ 0, μ_k h_k(x) = 0 for k = 1, ..., K.   (14)
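The 2D example above can be checked mechanically. The sketch below uses sympy (an assumed helper library, not part of the course software) to solve the stationarity conditions of the Lagrangian and recovers the same solution:

```python
# Solve the stationarity conditions of L(x, lambda) for the 2D example.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda')
f = 1 - x1**2 - x2**2          # objective to maximize
g = x1 + x2 - 1                # equality constraint g(x) = 0
L = f + lam * g                # Lagrangian L(x, lambda)

# Set all partial derivatives of L to zero and solve.
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], (x1, x2, lam), dict=True)
print(sol)   # [{x1: 1/2, x2: 1/2, lambda: 1}], matching the slide
```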
Software Tools for Constrained Optimization

Matlab Optimization Toolbox: fmincon can find the minimum of a function subject to nonlinear multivariable constraints.
Python: scipy.optimize.minimize provides a common interface to unconstrained and constrained minimization algorithms for multivariate scalar functions.

Linear SVM: Separable Case

Consider a training set {x_i, y_i; i = 1, ..., N} ∈ X × {+1, −1}, where X is the set of input data in ℝ^D and y_i are the labels.

Figure 1: Linear SVM on a 2-D space. Points with y = +1 and y = −1 lie on opposite sides of the decision plane w · x + b = 0; the margin hyperplanes w · x + b = +1 and w · x + b = −1 pass through the closest points x1 and x2, and are separated by the margin d along the direction −w/‖w‖.

Linear SVM: Separable Case

A linear support vector machine (SVM) aims to find a decision plane (a line in the 2-D case)

x · w + b = 0

that maximizes the margin of separation (see Fig. 1).
Assume that all data points satisfy the constraints:

x_i · w + b ≥ +1 for i ∈ {1, ..., N} where y_i = +1   (15)
x_i · w + b ≤ −1 for i ∈ {1, ..., N} where y_i = −1   (16)

Data points x1 and x2 in Fig. 1 satisfy the equality constraints:

x1 · w + b = +1
x2 · w + b = −1   (17)

Linear SVM: Separable Case

Using Eq. 17 and Fig. 1, the distance between the two margin hyperplanes (also called the margin of separation) can be computed:

d(w) = (x1 − x2) · w / ‖w‖ = 2 / ‖w‖

Maximizing d(w) is equivalent to minimizing ‖w‖². So, the constrained optimization problem in SVM is

min (1/2)‖w‖²
subject to y_i(x_i · w + b) ≥ 1, ∀i = 1, ..., N   (18)

Equivalently, minimizing a Lagrangian function:

min L(w, b, {α_i}) = (1/2)‖w‖² − Σ_{i=1}^{N} α_i [y_i(x_i · w + b) − 1]
subject to α_i ≥ 0, y_i(x_i · w + b) − 1 ≥ 0, α_i [y_i(x_i · w + b) − 1] = 0, ∀i = 1, ..., N   (19)

Linear SVM: Separable Case

Setting

∂L(w, b, {α_i})/∂b = 0 and ∂L(w, b, {α_i})/∂w = 0,   (20)

subject to the constraint α_i ≥ 0, results in

Σ_{i=1}^{N} α_i y_i = 0 and w = Σ_{i=1}^{N} α_i y_i x_i.   (21)

Substituting these results back into the Lagrangian function:

L(w, b, {α_i}) = (1/2)(w · w) − Σ_{i=1}^{N} α_i y_i (x_i · w) − Σ_{i=1}^{N} α_i y_i b + Σ_{i=1}^{N} α_i
               = (1/2) Σ_{i=1}^{N} α_i y_i x_i · Σ_{j=1}^{N} α_j y_j x_j − Σ_{i=1}^{N} α_i y_i x_i · Σ_{j=1}^{N} α_j y_j x_j + Σ_{i=1}^{N} α_i
               = Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j (x_i · x_j).

Linear SVM: Separable Case

This results in the following Wolfe dual formulation:

max_α Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j (x_i · x_j)   (22)
subject to Σ_{i=1}^{N} α_i y_i = 0 and α_i ≥ 0, i = 1, ..., N.

The solution contains two kinds of Lagrange multipliers:
1 α_i = 0: the corresponding x_i are irrelevant.
2 α_i > 0: the corresponding x_i are critical.
The x_k for which α_k > 0 are called support vectors.

Linear SVM: Separable Case

The SVM output is given by

f(x) = w · x + b = Σ_{k∈S} α_k y_k (x_k · x) + b

where S is the set of indexes for which α_k > 0.
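As a minimal sketch of how Eq. 22 can be solved with the scipy tool listed above, the code below builds the dual of a tiny hand-made 2-D data set (the four points, the 1e-6 support-vector threshold, and the hyperparameter-free setup are my own illustrative choices) and then recovers w and b from the support vectors:

```python
# Solve the hard-margin Wolfe dual (Eq. 22) numerically on toy data.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])   # toy 2-D points
y = np.array([1.0, 1.0, -1.0, -1.0])                             # labels +/- 1
Q = (y[:, None] * y[None, :]) * (X @ X.T)                        # Q_ij = y_i y_j (x_i . x_j)

dual = lambda a: 0.5 * a @ Q @ a - a.sum()        # negative of the dual objective
res = minimize(dual, x0=np.zeros(len(y)),
               bounds=[(0, None)] * len(y),                       # alpha_i >= 0
               constraints={'type': 'eq', 'fun': lambda a: a @ y})  # sum_i alpha_i y_i = 0

alpha = res.x
S = alpha > 1e-6                                  # support vectors: alpha_k > 0
w = (alpha[S] * y[S]) @ X[S]                      # w = sum_k alpha_k y_k x_k
b = np.mean(y[S] - X[S] @ w)                      # from y_k (x_k . w + b) = 1 on the margin
print(w, b, alpha)
```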
b can be computed by using the KKT condition, i.e., for any k such that y_k = 1 and α_k > 0, we have

α_k [y_k(x_k · w + b) − 1] = 0  ⟹  b = 1 − x_k · w.

Linear SVM: Fuzzy Separation (Optional)

If the data patterns are not separable by a linear hyperplane, a set of slack variables {ξ_1, ..., ξ_N} is introduced, with ξ_i ≥ 0, such that the inequality constraints in SVM become

y_i(x_i · w + b) ≥ 1 − ξ_i, ∀i = 1, ..., N.   (23)

The slack variables {ξ_i}_{i=1}^{N} allow some data to violate the constraints in Eq. 18. The value of ξ_i indicates the degree of violation of the constraint.
The minimization problem becomes

min (1/2)‖w‖² + C Σ_i ξ_i
subject to y_i(x_i · w + b) ≥ 1 − ξ_i,   (24)

where C is a user-defined penalty parameter that penalizes any violation of the safety margin for all training data.

Linear SVM: Fuzzy Separation (Optional)

The new Lagrangian is

L(w, b, α) = (1/2)‖w‖² + C Σ_i ξ_i − Σ_{i=1}^{N} α_i (y_i(x_i · w + b) − 1 + ξ_i) − Σ_{i=1}^{N} β_i ξ_i,   (25)

where α_i ≥ 0 and β_i ≥ 0 are, respectively, the Lagrange multipliers that ensure y_i(x_i · w + b) ≥ 1 − ξ_i and ξ_i ≥ 0.
Differentiating L(w, b, α) w.r.t. w, b, and ξ_i, we obtain the Wolfe dual:

max_α Σ_{i=1}^{N} α_i − (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} α_i α_j y_i y_j (x_i · x_j)   (26)
subject to 0 ≤ α_i ≤ C, i = 1, ..., N, and Σ_{i=1}^{N} α_i y_i = 0.

Linear SVM: Fuzzy Separation (Optional)

Three types of support vectors (illustrated in the slide figure): those lying exactly on the margin (0 < α_k < C, ξ_k = 0), those inside the margin but still correctly classified (α_k = C, 0 < ξ_k ≤ 1), and those that are misclassified (α_k = C, ξ_k > 1).

Nonlinear SVM

Assume that we have a nonlinear function φ(x) that maps x from the input space to a much higher (possibly infinite) dimensional space called the feature space.
While the data are not linearly separable in the input space, they become linearly separable in the feature space.

Figure: Use of a kernel function K(x, x_i) to map nonlinearly separable data from the input space to the feature space, where a linear decision boundary separates the classes y_i = +1 and y_i = −1.

Nonlinear SVM

The SVM output becomes

f(x) = Σ_{i=1}^{N} α_i y_i φ(x_i) · φ(x) + b

However, the dimension of φ(x) is very high and could be infinite in some cases, meaning that this function may not be implementable.
Fortunately, the dot product φ(x_i) · φ(x) can be replaced by a kernel function:

φ(x_i) · φ(x) = φ(x_i)ᵀ φ(x) = K(x_i, x)

which can be efficiently implemented.
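The sketch below is my own small numerical check (not from the slides) of why the kernel trick works: for the degree-2 polynomial kernel with σ = 1, the explicit feature map φ and the kernel K give identical dot products, but evaluating K never requires building φ(x) explicitly.

```python
# Kernel trick check: phi(x).phi(z) equals K(x, z) = (1 + x.z)^2 in 2-D.
import numpy as np

def phi(x):
    # Explicit feature map whose dot product reproduces (1 + x.z)^2.
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

def K(x, z):
    return (1.0 + x @ z) ** 2        # kernel evaluated directly in the input space

x = np.array([0.5, -1.0])
z = np.array([2.0, 3.0])
print(phi(x) @ phi(z), K(x, z))      # both print 1.0
```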
Nonlinear SVM

Common kernel functions include

Polynomial kernel: K(x, x_i) = (1 + x · x_i / σ²)^p, p > 0   (27)
RBF kernel:        K(x, x_i) = exp(−‖x − x_i‖² / (2σ²))   (28)
Sigmoidal kernel:  K(x, x_i) = 1 / (1 + e^{−(x · x_i + b)/σ²})   (29)

Nonlinear SVM

Figure: Comparing kernels on the same 2-D data set.
Linear SVM: C = 1000.0, #SV = 7, acc = 95.00%, ‖w‖ = 0.94
RBF SVM: 2σ² = 8.0, C = 1000.0, #SV = 7, acc = 100.00%
Polynomial SVM: degree = 2, C = 10.0, #SV = 7, acc = 90.00%

SVM for Pattern Classification

SVM is good for binary classification:

f(x) > 0 ⟹ x ∈ Class 1;  f(x) ≤ 0 ⟹ x ∈ Class 2

To classify multiple classes, we can use the one-vs-rest approach, which converts K binary classifications into a K-class classification:

k* = argmax_k f^(k)(x)

For example, in digit recognition, SVM0 with output f^(0)(x) = Σ_{i∈SV_0} y_i^(0) α_i^(0) x_iᵀ x + b^(0) classifies digit '0' against the rest, ..., SVM9 with output f^(9)(x) = Σ_{i∈SV_9} y_i^(9) α_i^(9) x_iᵀ x + b^(9) classifies digit '9' against the rest, and the class with the maximum score is picked.

Software Tools for SVM

Matlab: fitcsvm trains an SVM for two-class classification.
Python: svm from the sklearn package provides a set of supervised learning methods used for classification, regression and outlier detection.
C/C++: LibSVM is a library for SVMs. It also has Java, Perl, Python, CUDA, and Matlab interfaces.
Java: SVM-JAVA implements sequential minimal optimization for training SVMs in Java.
Javascript: http://cs.stanford.edu/people/karpathy/svmjs/demo/
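A minimal sketch of the one-vs-rest digit classification described above, using the sklearn tools just listed (the digits data set, C, and gamma values are illustrative choices, not values from the slides):

```python
# One binary RBF-kernel SVM per digit; prediction picks the class whose SVM
# gives the largest score, i.e. k* = argmax_k f^(k)(x).
from sklearn import datasets, svm
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# C is the penalty parameter of Eq. 24; gamma corresponds to 1/(2*sigma^2) of the RBF kernel.
clf = OneVsRestClassifier(svm.SVC(kernel='rbf', C=10.0, gamma=0.001))
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```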