Math 273a: Optimization
Lagrange Duality
Instructor: Wotao Yin, Department of Mathematics, UCLA, Winter 2015
Online discussions on piazza.com

Gradient descent / forward Euler

• assume the function f is proper, closed, convex, and differentiable
• consider min f(x)
• gradient descent iteration (with step size c):
  x^{k+1} = x^k − c ∇f(x^k)
• x^{k+1} minimizes the following local quadratic approximation of f:
  f(x^k) + ⟨∇f(x^k), x − x^k⟩ + (1/2c) ‖x − x^k‖²₂
• compare with the forward Euler iteration, a.k.a. the explicit update:
  x(t+1) = x(t) − Δt ∙ ∇f(x(t))

Backward Euler / implicit gradient descent

• backward Euler iteration, also known as the implicit update:
  solve x(t+1) ← x(t+1) = x(t) − Δt ∙ ∇f(x(t+1))
• equivalent to:
  1. solve u(t+1) ← u = ∇f(x(t) − Δt ∙ u);
  2. x(t+1) = x(t) − Δt ∙ u(t+1)
• we can view it as the "implicit gradient" descent:
  solve (x^{k+1}, u^{k+1}) ← x = x^k − cu, u = ∇f(x)
• c is the "step size", though it behaves very differently from a standard (explicit) step size
• the explicit (implicit) update uses the gradient at the start (end) point

Implicit gradient step = proximal operation

• proximal update:
  prox_{cf}(z) := argmin_x f(x) + (1/2c) ‖x − z‖²
• optimality condition: 0 = c ∇f(x*) + (x* − z)
• given input z,
  – prox_{cf}(z) returns the solution x*
  – ∇f(prox_{cf}(z)) returns u*
  solve (x*, u*) ← x = z − cu, u = ∇f(x)

Proposition. The proximal operator is equivalent to an implicit gradient (backward Euler) step.

Proximal operator handles sub-differentiable f

• assume f is a closed, proper, sub-differentiable convex function
• ∂f(x) denotes the subdifferential of f at x; recall that u ∈ ∂f(x) if
  f(x′) ≥ f(x) + ⟨u, x′ − x⟩ for all x′ ∈ Rⁿ
• ∂f is a point-to-set map; neither direction is unique
• prox is well defined for sub-differentiable f; it is point-to-point, mapping any input to a unique point

Proximal operator

prox_{cf}(z) := argmin_x f(x) + (1/2c) ‖x − z‖²
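To make the prox/backward-Euler equivalence concrete, here is a small numerical sketch (my own illustration, not from the slides; the matrix Q, point z, and step size c are arbitrary choices). For the strongly convex quadratic f(x) = ½xᵀQx, the proximal step has a closed-form linear solve, while the backward Euler step solves the implicit equation x = z − c∇f(x) by fixed-point iteration; the two land on the same point.

```python
import numpy as np

# Illustrative f(x) = 0.5 * x^T Q x with Q symmetric positive definite.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
z = np.array([1.0, -2.0])
c = 0.3  # chosen so that c * lambda_max(Q) < 1 and the fixed-point loop converges

# Proximal step: argmin_x f(x) + ||x - z||^2 / (2c).
# Optimality 0 = c*grad f(x) + (x - z) gives the linear system (I + cQ) x = z.
x_prox = np.linalg.solve(np.eye(2) + c * Q, z)

# Backward Euler (implicit) step: solve x = z - c * grad f(x) = z - c Q x
# by fixed-point iteration, without using the closed-form solve above.
x_impl = z.copy()
for _ in range(200):
    x_impl = z - c * (Q @ x_impl)

# The prox output and the implicit gradient step agree.
print(x_prox, x_impl)
```

For a general sub-differentiable f the implicit equation has no such simple iteration, but the prox subproblem remains a strongly convex minimization with a unique solution.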
• since the objective is strongly convex, the solution prox_{cf}(z) is unique
• since f is proper, dom prox_{cf} = Rⁿ
• the following are equivalent:
  prox_{cf}(z) = x* = argmin_x f(x) + (1/2c) ‖x − z‖²,
  solve x* ← 0 ∈ c ∂f(x) + (x − z),
  solve (x*, u*) ← x = z − cu, u ∈ ∂f(x)
• a point x* minimizes f if and only if x* = prox_f(x*)

Lagrange duality

Convex problem:
  minimize f(x) subject to Ax = b.
Relax the constraints and price their violation (pay a price if violated one way; get paid if violated the other way; the payment is linear in the violation):
  L(x; y) := f(x) + yᵀ(Ax − b).
For later use, define the augmented Lagrangian:
  L_A(x; y, c) := f(x) + yᵀ(Ax − b) + (c/2) ‖Ax − b‖².
Minimize L for a fixed price y:
  d(y) := −min_x L(x; y).
d(y) is always convex. The Lagrange dual problem:
  minimize_y d(y).
Given a dual solution y*, recover x* = argmin_x L(x; y*) (under which conditions?).
Question: how do we compute the explicit/implicit gradients of d(y)?

Dual explicit gradient (ascent) algorithm

Assume d(y) is differentiable (true if f(x) is strictly convex). Gradient descent iteration (if the maximizing form of the dual is used, it is called gradient ascent):
  y^{k+1} = y^k − c ∇d(y^k).
It turns out to be relatively easy to compute ∇d, via an unconstrained subproblem:
  ∇d(y) = b − Ax̄, where x̄ = argmin_x L(x; y).
Dual gradient iteration:
  1. solve x^k ← min_x L(x; y^k);
  2. y^{k+1} = y^k − c(b − Ax^k).

Sub-gradient of d(y)

Assume d(y) is sub-differentiable (which condition on the primal guarantees this?).

Lemma. Given a dual point y and x̄ = argmin_x L(x; y), we have b − Ax̄ ∈ ∂d(y).

Proof. Recall
  (i) u ∈ ∂d(y) if d(y′) ≥ d(y) + ⟨u, y′ − y⟩ for all y′;
  (ii) d(y) := −min_x L(x; y).
From (ii) and the definition of x̄,
  d(y) + ⟨b − Ax̄, y′ − y⟩ = −L(x̄; y) + (b − Ax̄)ᵀ(y′ − y)
                           = −[f(x̄) + yᵀ(Ax̄ − b)] + (b − Ax̄)ᵀ(y′ − y)
                           = −[f(x̄) + (y′)ᵀ(Ax̄ − b)]
                           = −L(x̄; y′)
                           ≤ d(y′).
From (i), b − Ax̄ ∈ ∂d(y). ∎

Dual explicit (sub)gradient iteration

The iteration:
  1. solve x^k ← min_x L(x; y^k);
  2. y^{k+1} = y^k − c_k(b − Ax^k).
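A minimal numerical sketch of the dual explicit gradient iteration (my own construction, not from the slides; A, b, and c are illustrative choices). With f(x) = ½‖x‖², f is strictly convex, so d is differentiable, and step 1 has the closed form x̄ = argmin_x L(x; y) = −Aᵀy:

```python
import numpy as np

# Toy primal: minimize 0.5*||x||^2  subject to  Ax = b  (A, b illustrative).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

c = 0.1          # fixed step size; here convergence needs c < 2 / lambda_max(A A^T)
y = np.zeros(2)  # dual variable
for _ in range(3000):
    # step 1: x^k = argmin_x L(x; y^k); optimality x + A^T y = 0 gives x = -A^T y
    x = -A.T @ y
    # step 2: y^{k+1} = y^k - c (b - A x^k), using grad d(y^k) = b - A x^k
    y = y - c * (b - A @ x)

# For this f, the primal solution is the minimum-norm solution of Ax = b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
print(x, x_star)
```

Note the role of the step-size bound: because this d happens to have a Lipschitz continuous gradient, a fixed c works; for a merely sub-differentiable d, a diminishing sequence c_k would typically be needed.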
Notes:
• (b − Ax^k) ∈ ∂d(y^k), as shown on the last slide
• the iteration does not require d(y) to be differentiable
• convergence may require a careful choice of c_k (e.g., a diminishing sequence) if d(y) is only sub-differentiable (or lacks a Lipschitz continuous gradient)

Dual implicit gradient

Goal: descend using the (sub)gradient of d at the next point y^{k+1}. From the Lemma, we have
  b − Ax^{k+1} ∈ ∂d(y^{k+1}), where x^{k+1} = argmin_x L(x; y^{k+1}).
Since the implicit step is y^{k+1} = y^k − c(b − Ax^{k+1}), we can derive
  x^{k+1} = argmin_x L(x; y^{k+1})
  ⟺ 0 ∈ ∂_x L(x^{k+1}; y^{k+1}) = ∂f(x^{k+1}) + Aᵀy^{k+1} = ∂f(x^{k+1}) + Aᵀ(y^k − c(b − Ax^{k+1})),
and the last expression is exactly ∂_x L_A(x^{k+1}; y^k, c). Therefore, while x^{k+1} is a solution to min_x L(x; y^{k+1}), it is also a solution to
  min_x L_A(x; y^k, c) = f(x) + (y^k)ᵀ(Ax − b) + (c/2) ‖Ax − b‖²,
which is independent of y^{k+1}.

Dual implicit gradient

Proposition. Assuming y′ = y − c(b − Ax′), the following are equivalent:
  1. solve x′ ← min_x L(x; y′);
  2. solve x′ ← min_x L_A(x; y, c).

Dual implicit gradient iteration

The iteration y^{k+1} = prox_{cd}(y^k) is commonly known as the augmented Lagrangian method or the method of multipliers. Implementation:
  1. solve x^{k+1} ← min_x L_A(x; y^k, c);
  2. y^{k+1} = y^k − c(b − Ax^{k+1}).

Proposition. The following are equivalent:
  1. the augmented Lagrangian iteration;
  2. the implicit gradient iteration on d(y);
  3. the proximal iteration y^{k+1} = prox_{cd}(y^k).

Dual explicit/implicit (sub)gradient computation

Definitions:
• L(x; y) = f(x) + yᵀ(Ax − b)
• L_A(x; y, c) = L(x; y) + (c/2) ‖Ax − b‖²
Objective: d(y) = −min_x L(x; y).

Explicit (sub)gradient iteration, y^{k+1} = y^k − c ∇d(y^k) (or use a subgradient in ∂d(y^k)):
  1. x^{k+1} = argmin_x L(x; y^k);
  2. y^{k+1} = y^k − c(b − Ax^{k+1}).

Implicit (sub)gradient step, y^{k+1} = prox_{cd}(y^k):
  1. x^{k+1} = argmin_x L_A(x; y^k, c);
  2. y^{k+1} = y^k − c(b − Ax^{k+1}).

The implicit iteration is more stable; the "step size" c does not need to diminish.
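The augmented Lagrangian iteration can be sketched on the same kind of toy problem (again my own illustration; A, b, and c are arbitrary choices, not from the slides). With f(x) = ½‖x‖², step 1 reduces to a linear solve, and, unlike the explicit dual subgradient method, the fixed c needs no diminishing schedule:

```python
import numpy as np

# Toy primal: minimize 0.5*||x||^2  subject to  Ax = b  (A, b illustrative).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
n = A.shape[1]

c = 1.0          # fixed "step size" / penalty parameter; no diminishing sequence
y = np.zeros(2)
for _ in range(100):
    # step 1: x^{k+1} = argmin_x L_A(x; y^k, c); optimality
    # x + A^T y + c A^T (A x - b) = 0  =>  (I + c A^T A) x = A^T (c b - y)
    x = np.linalg.solve(np.eye(n) + c * (A.T @ A), A.T @ (c * b - y))
    # step 2: y^{k+1} = y^k - c (b - A x^{k+1})
    y = y - c * (b - A @ x)

# The constraint Ax = b is driven to (numerical) satisfaction, and x
# approaches the minimum-norm solution of the toy primal.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
print(x, x_star)
```

Compared with the explicit dual sketch, the per-iteration subproblem is harder (it couples x through AᵀA), which is the usual trade-off of the method of multipliers: stability for subproblem cost.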