Constrained Convex Optimization
Sebastian Urban
[email protected]

Convexity

Why do we need convex functions for optimization?

[Figure: a non-convex function with several stationary points where ∇f = 0]

Non-convex functions can have multiple local minima with ∇f(x) = 0. Gradient descent will not necessarily find the global minimum.

Convex function

Definition (convex function). A function f : X → R defined on a convex set X is called convex if, for any two points x, y ∈ X and any t ∈ [0, 1],

    f(tx + (1 − t)y) ≤ t f(x) + (1 − t) f(y) .

If a function is convex, any two points of the function's graph can be connected by a straight line that lies above the graph.

Concave function

Definition (concave function). A function f : X → R is concave if and only if −f is convex.

First-order convexity condition

[Figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph at (1 − t)x + ty]

Convexity imposes a rate-of-rise constraint on the function. From

    f((1 − t)x + ty) ≤ (1 − t) f(x) + t f(y)

it follows that

    f(y) − f(x) ≥ [f((1 − t)x + ty) − f(x)] / t .

The difference between f(y) and f(x) is bounded by function values between x and y.

First-order convexity condition

    f(y) − f(x) ≥ [f(x + t(y − x)) − f(x)] / t

Let t → 0 and apply the definition of the directional derivative:

    f(y) − f(x) ≥ (y − x)ᵀ ∇f(x) .

Theorem. Suppose f : X → R is a differentiable function and X is convex. Then f is convex if and only if for x, y ∈ X

    f(y) ≥ f(x) + (y − x)ᵀ ∇f(x) .

Proof. See Boyd p. 70.

Minimization of convex functions

Given a function f₀ : Rᴺ → R, minimize f₀(x).

Local minima have zero gradient. At a (local) minimum point x∗ the gradient must be zero,

    ∇f₀(x∗) = 0 ,

otherwise we could follow the gradient to get an even lower value,

    f₀(x∗ − ε∇f₀(x∗)) = f₀(x∗) − ε‖∇f₀(x∗)‖₂² + O(ε²‖∇f₀(x∗)‖₂²) < f₀(x∗)

for sufficiently small ε.

Zero gradient implies minimum for convex functions. If f₀ is a convex function and ∇f₀(x∗) = 0, then x∗ is a (global) minimum of f₀.
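The two properties above can be illustrated numerically. The sketch below (not part of the slides) uses an assumed convex quadratic f₀ with minimizer (1, 2): it checks the convexity inequality along a random chord, then runs plain gradient descent, which for a convex function reaches the global minimum.

```python
import numpy as np

def f0(x):
    # Assumed example: convex quadratic with global minimum at (1, 2).
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

def grad_f0(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] - 2.0)])

# Check the definition of convexity along a random chord.
rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
for t in np.linspace(0.0, 1.0, 11):
    chord = t * f0(x) + (1.0 - t) * f0(y)
    assert f0(t * x + (1.0 - t) * y) <= chord + 1e-12

# Gradient descent: the iterates approach the point where the gradient vanishes,
# which for a convex f0 is the global minimizer.
xk = np.zeros(2)
for _ in range(200):
    xk = xk - 0.1 * grad_f0(xk)
print(xk)  # converges to the minimizer (1, 2)
```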
Using the first-order convexity criterion, we have

    f₀(y) ≥ f₀(x∗) + (y − x∗)ᵀ ∇f₀(x∗) = f₀(x∗)

for all y.

Convex set

Definition (convex set). A set X in a vector space is called convex if, for all x, y ∈ X and any t ∈ [0, 1],

    (1 − t)x + ty ∈ X .

[Figure: a convex set and a non-convex set]

Constrained Optimization

Optimization with inequality constraints

Constrained optimization problem. Given f₀ : Rᴺ → R and fᵢ : Rᴺ → R, minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , p.

Feasibility. A point x ∈ Rᴺ is called feasible if and only if it satisfies all constraints fᵢ(x) ≤ 0, i = 1, . . . , p of the optimization problem.

Minimum and minimizer. We call the optimal value the minimum p∗, and the point where the minimum is attained the minimizer x∗. Thus p∗ = f₀(x∗).

Minimization with inequality constraints

Example: minimize f₀(x) = −(x₁ + x₂) subject to f₁(x) = x₁² + x₂² − 1 ≤ 0.

- If the minimum is in the interior of the circle (f₁(x∗) < 0), we have reached it when ∇f₀(x∗) = 0.
- If the minimum is on the circle (f₁(x∗) = 0), we have reached it when the negative gradient −∇f₀(x∗) has no component that we could follow without changing the value of f₁.

[Figure: feasible disc f₁(x) ≤ 0 with −∇f₀ and ∇f₁ at the minimizer; f₀(x) is color coded.]

Thus at the minimizer x∗ we have

    −∇f₀(x∗) = α ∇f₁(x∗)

with α ≥ 0 to ensure that −∇f₀ and ∇f₁ are not anti-parallel.

Multiple inequality constraints

Given f₀ : Rᴺ → R and fᵢ : Rᴺ → R, minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , p.

For multiple constraints f₁, . . . , fₚ we have reached the minimum when the negative gradient has no component that we could follow without changing the value of any constraint. This is the case when −∇f₀ is a linear combination of the ∇fᵢ's,

    −∇f₀(x∗) = Σᵢ₌₁ᵖ αᵢ ∇fᵢ(x∗) ,  αᵢ ≥ 0 .

Lagrangian

minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , m

Definition (Lagrangian). We define the Lagrangian L : Rᴺ × Rᵐ → R associated with the above problem as

    L(x, α) = f₀(x) + Σᵢ₌₁ᵐ αᵢ fᵢ(x) .

We refer to αᵢ ≥ 0 as the Lagrange multiplier associated with the inequality constraint fᵢ(x) ≤ 0.

Calculating the gradient of L w.r.t. x shows that our optimality criterion at x∗ is recovered:

    ∇ₓ L(x∗, α) = ∇f₀(x∗) + Σᵢ₌₁ᵐ αᵢ ∇fᵢ(x∗) = 0 .

Interpretation of the Lagrangian

We can rewrite the problem "minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , m" as the unconstrained problem

    minimize f₀(x) + Σᵢ₌₁ᵐ I(fᵢ(x))

where the indicator function

    I(u) = 0 for u ≤ 0,  ∞ for u > 0,

ensures that solutions that violate the constraints are rejected. If we "approximate" I(fᵢ(x)) by αᵢ fᵢ(x) the objective becomes

    minimize L(x, α) = f₀(x) + Σᵢ₌₁ᵐ αᵢ fᵢ(x) .

Interpretation of the Lagrangian

[Animation: lagrangian2.avi]

For every α the minimum of L(x, α) w.r.t. x is a lower bound on the optimal value of the constrained problem.

Interpretation of the Lagrangian

[Figure: magenta line shows minₓ L(x, α) for α ∈ [0, 5].]

How does it relate to the solution of the constrained optimization problem?

Lagrange dual function

Definition (Lagrange dual function). The Lagrange dual function g : Rᵐ → R is the minimum of the Lagrangian over x given α,

    g(α) = minₓ L(x, α) = minₓ ( f₀(x) + Σᵢ₌₁ᵐ αᵢ fᵢ(x) ) .

It is concave since it is the pointwise minimum of a family of affine functions of α. (Exercise)

Lagrange dual function

Theorem. For any α ≥ 0 the value of the Lagrange dual function g(α) is a lower bound on the minimum p∗ of the constrained problem, g(α) ≤ p∗.

Proof. Suppose α ≥ 0 and x̃ is a feasible point, that is fᵢ(x̃) ≤ 0.
Then we have

    Σᵢ₌₁ᵐ αᵢ fᵢ(x̃) ≤ 0

and therefore

    L(x̃, α) = f₀(x̃) + Σᵢ₌₁ᵐ αᵢ fᵢ(x̃) ≤ f₀(x̃) .

Hence

    g(α) = minₓ L(x, α) ≤ L(x̃, α) ≤ f₀(x̃) .

Lagrange dual problem

For each α ≥ 0 the Lagrange dual function g(α) gives us a lower bound on the optimal value p∗ of the original optimization problem. What is the best (highest) lower bound?

Lagrange dual problem: maximize g(α) = minₓ L(x, α) subject to αᵢ ≥ 0, i = 1, . . . , m.

The maximum d∗ of the Lagrange dual problem is the best lower bound on p∗ that we can achieve by using the Lagrangian.

Dualities

Weak duality (always). Since g(α) is a lower bound on p∗ we have weak duality,

    d∗ ≤ p∗ .

The difference between the solutions of the original and the dual problem,

    p∗ − d∗ ≥ 0 ,

is called the (optimal) duality gap.

Strong duality (under certain conditions). Under certain conditions we have

    d∗ = p∗ ,

i.e. the solution to the Lagrange dual problem is a solution of the original constrained optimization problem. We then say that strong duality holds.

Constraint qualifications for convex problems

Consider the constrained optimization problem: minimize f₀(x) subject to fᵢ(x) ≤ 0.

Slater's constraint qualification. If f₀, f₁, . . . , fₘ are convex and there exists an x ∈ Rᴺ such that fᵢ(x) < 0, i = 1, . . . , m, or the constraints are affine, that is

    fᵢ(x) = wᵢᵀ x + bᵢ ≤ 0 ,

then the duality gap is zero.

Proof. See Boyd p. 234.

Dual solution

Problem: Even when d∗ = p∗, we do not know if the minimizer x̃ = arg minₓ L(x, α∗) of the dual problem equals the minimizer x∗ of the original problem. So far the minimizer x̃ of the Lagrangian could differ from the minimizer x∗ of the original problem.

Dual solution

Let x∗ be a minimizer of the primal problem and α∗ a maximizer of the dual problem. If strong duality holds, then

    L(x∗, α∗) = g(α∗) = minₓ L(x, α∗) .

Proof. Let x̃ be a minimizer of L(x, α∗) over x.
We have

    d∗ = g(α∗) = minₓ L(x, α∗) = L(x̃, α∗)        (x̃ minimizer of L over x)
       = f₀(x̃) + Σᵢ₌₁ᵐ αᵢ∗ fᵢ(x̃)                (definition of the Lagrangian)
       ≤ f₀(x∗) + Σᵢ₌₁ᵐ αᵢ∗ fᵢ(x∗)               (x̃ minimizes L over x)
       ≤ f₀(x∗) = p∗                             (α∗ ≥ 0; x∗ feasible: fᵢ(x∗) ≤ 0)

and since d∗ = p∗, we have g(α∗) = L(x̃, α∗) = L(x∗, α∗).

Dual solution

Corollary. Let x∗ be a minimizer of the primal problem and α∗ a maximizer of the dual problem. If strong duality holds and L(x, α) is convex in x, then

    x∗ = arg minₓ L(x, α∗) .

Recipe for solving constrained optimization problems

The constrained optimization problem, minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , m, with f₀, f₁, . . . , fₘ convex, can be solved as follows:

1. Calculate the Lagrangian

    L(x, α) = f₀(x) + Σᵢ₌₁ᵐ αᵢ fᵢ(x) .

2. Obtain the Lagrange dual function by solving ∇ₓ L(x, α) = 0 for the minimizer x∗ and substituting it back into L.

3. Solve the dual problem: maximize g(α) = L(x∗, α) subject to αᵢ ≥ 0, i = 1, . . . , m.

The solution of this problem is also the solution of the original problem if Slater's condition is satisfied.

Example

minimize f₀(x) = −(x₁ + x₂) subject to f₁(x) = x₁² + x₂² − 1 ≤ 0

Write down the Lagrangian:

    L(x, α) = −x₁ − x₂ + α(x₁² + x₂² − 1) .

Calculate the derivative of L(x, α) w.r.t. x and set it to zero:

    ∇ₓ L(x, α) = −(1, 1)ᵀ + 2αx = 0  ⇒  x₁ = x₂ = 1/(2α) .

Substituting this back into L(x, α) gives the dual function

    g(α) = 1/(2α) − α − 1/α = −1/(2α) − α .

Example

We must now maximize the dual function g(α) subject to α ≥ 0. Since g(α) is concave, we set its derivative to zero and solve for α.
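Before carrying out the algebra, the dual maximization can be sanity-checked numerically. The sketch below (not from the slides) assumes the simplified dual of the circle example, g(α) = −1/(2α) − α, and maximizes it on a grid over α > 0.

```python
import numpy as np

def g(alpha):
    # Dual function of the circle example (assumed simplified form).
    return -1.0 / (2.0 * alpha) - alpha

# Dense grid over alpha > 0; g is concave, so the grid maximum
# approximates the true maximizer.
alphas = np.linspace(1e-3, 5.0, 200001)
a_star = alphas[np.argmax(g(alphas))]

print(a_star)     # close to 1/sqrt(2) ≈ 0.7071
print(g(a_star))  # close to -sqrt(2) ≈ -1.4142
```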
    dg/dα = 1/α² − 1 − 1/(2α²) = 1/(2α²) − 1 = 0
    ⇒ α² = 1/2
    ⇒ α₁,₂ = ±1/√2
    ⇒ α∗ = 1/√2    since we require α ≥ 0

Example

The optimization problem is convex and Slater's condition holds, therefore the minimal value of f₀(x) is

    p∗ = g(1/√2) = −√2 .

We calculate the minimizer x∗ by substituting α = 1/√2 into x₁ = x₂ = 1/(2α) and get

    x₁∗ = x₂∗ = 1/√2 .

Example

[Figure: feasible disc f₁(x) ≤ 0 with −∇f₀ and ∇f₁ at the minimizer]

    α∗ = 1/√2 ,  p∗ = −√2 ,  x₁∗ = x₂∗ = 1/√2

Properties of the solution of a constrained optimization problem

What can we say about the solution? Is there yet another method to solve a constrained optimization problem?

KKT conditions for convex problems

Consider the constrained optimization problem, minimize f₀(x) subject to fᵢ(x) ≤ 0, with f₀, f₁, . . . , fₘ convex and such that strong duality holds.

KKT conditions. The points x∗ and α∗ are optimal solutions of the constrained optimization problem and the corresponding Lagrange dual problem if and only if they satisfy the Karush-Kuhn-Tucker (KKT) conditions: for i = 1, . . . , m,

    fᵢ(x∗) ≤ 0            (primal feasibility)
    αᵢ∗ ≥ 0               (dual feasibility)
    αᵢ∗ fᵢ(x∗) = 0        (complementary slackness)
    ∇ₓ L(x∗, α∗) = 0      (x∗ minimizes the Lagrangian)

KKT conditions for convex problems

Proof. (x∗, α∗ optimal solutions ⇒ KKT conditions.) The problem is convex and therefore x∗ minimizes the Lagrangian L(x, α∗),

    ∇ₓ L(x∗, α∗) = 0 .

By strong duality we have

    L(x∗, α∗) = f₀(x∗) + Σᵢ₌₁ᵐ αᵢ∗ fᵢ(x∗) = f₀(x∗)

and with α∗ ≥ 0 and fᵢ(x∗) ≤ 0 it follows that

    αᵢ∗ fᵢ(x∗) = 0 ,  i = 1, . . . , m .

KKT conditions for convex problems

Proof. (KKT conditions ⇒ x∗, α∗ optimal solutions.) Since αᵢ∗ ≥ 0, the Lagrangian L(x, α∗) is convex in x because it is a sum of convex functions.
From ∇ₓ L(x∗, α∗) = 0 it follows that x∗ is a minimizer of L(x, α∗) and thus

    g(α∗) = L(x∗, α∗) = f₀(x∗) + Σᵢ₌₁ᵐ αᵢ∗ fᵢ(x∗) = f₀(x∗)

since αᵢ∗ fᵢ(x∗) = 0. Let x̃ be a minimizer of the primal problem. Since g(α∗) is a lower bound on f₀(x) for feasible x, we have

    f₀(x∗) = g(α∗) ≤ f₀(x̃) ≤ f₀(x∗)

and thus

    p∗ = f₀(x̃) = f₀(x∗) = g(α∗) .

Complementary slackness

The KKT condition αᵢ∗ fᵢ(x∗) = 0 can only be satisfied if αᵢ∗ = 0 or fᵢ(x∗) = 0. If the constraint fᵢ is inactive (fᵢ(x∗) < 0), we have αᵢ∗ = 0. αᵢ∗ > 0 is only possible if the constraint is active (fᵢ(x∗) = 0).

Explanation: If the minimum is in the interior of the feasible region (inactive constraint), we have

    ∇f₀(x∗) = 0 .

However, if the minimum is on the boundary (active constraint), we can only demand

    −∇f₀(x∗) = α₁ ∇f₁(x∗)

with α₁ > 0.

[Figure: feasible disc f₁(x) ≤ 0 with −∇f₀ and ∇f₁ at the minimizer]

Complementary slackness: example

minimize f₀(x) = −(x₁ + x₂) subject to f₁(x) = x₁² + x₂² − 1 ≤ 0

Solution: α∗ = 1/√2, x₁∗ = x₂∗ = 1/√2.

Because α∗ > 0 we know that the constraint f₁ is active, i.e.

    f₁(x∗) = (1/√2)² + (1/√2)² − 1 = 0 .

f₁ is actively limiting the achievable minimum, because if we moved further towards the minimum of f₀ we would get f₁(x) > 0.

Summary

- The Lagrange formalism reformulates a constrained optimization problem into a dual problem.
- The KKT conditions characterize the solution and sometimes allow us to obtain it without minimization.
- Convexity is necessary to guarantee optimality.

Constrained optimization with equality constraints

Given f₀ : Rᴺ → R, fᵢ : Rᴺ → R and hᵢ : Rᴺ → R, minimize f₀(x) subject to fᵢ(x) ≤ 0, i = 1, . . . , m, and hᵢ(x) = 0, i = 1, . . . , p.

Lagrangian:

    L(x, α, β) = f₀(x) + Σᵢ₌₁ᵐ αᵢ fᵢ(x) + Σᵢ₌₁ᵖ βᵢ hᵢ(x)

with αᵢ ≥ 0, i = 1, . . . , m (no sign constraint on the βᵢ's). Use the same solution method.

Literature

S. Boyd, L.
Vandenberghe: Convex Optimization. Cambridge University Press, 2004. http://www.stanford.edu/~boyd/cvxbook/
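As a closing cross-check (not part of the original deck), the worked circle example can also be reproduced with a generic constrained solver. This sketch assumes SciPy's SLSQP method; note that SciPy's 'ineq' convention requires the constraint written as fun(x) ≥ 0.

```python
import numpy as np
from scipy.optimize import minimize

# Circle example from the slides: minimize -(x1 + x2) s.t. x1^2 + x2^2 <= 1.
res = minimize(
    fun=lambda x: -(x[0] + x[1]),
    x0=np.array([0.0, 0.0]),
    method="SLSQP",
    # SciPy expects inequality constraints as fun(x) >= 0.
    constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x[0] ** 2 - x[1] ** 2}],
)

print(res.x)    # should be close to (1/sqrt(2), 1/sqrt(2))
print(res.fun)  # should be close to -sqrt(2)
```

The solver agrees with the analytic solution obtained via the Lagrange dual: x∗ = (1/√2, 1/√2) and p∗ = −√2.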