A Greedy Framework for First-Order Optimization
Jonathan Huggins (Massachusetts Institute of Technology) and Jacob Steinhardt (Stanford University)
December 10, 2013

Motivation

We want to solve the following saddle point problem:

$$\min_u \max_\theta L(u, \theta), \quad \text{where} \quad L(u, \theta) \stackrel{\text{def}}{=} h(u) + u^T \theta - R(\theta).$$

(Assume $h$, $R$ convex.)

Tie-in with optimization: we can think of this as minimizing

$$L(u) \stackrel{\text{def}}{=} \max_\theta L(u, \theta) = h(u) + R^*(u),$$

where $R^*$ denotes the convex conjugate of $R$.

Being (Too) Greedy

Let's try the following updates ("iterative best response"):

$$u_t = \arg\min_u L(u, \theta_t), \qquad \theta_{t+1} = \arg\max_\theta L(u_t, \theta).$$

Issue: let $u, \theta \in \mathbb{R}$, $h(u) = \frac{1}{2}u^2$, $R(\theta) = \frac{1}{2}\theta^2$. Then:

$$u_t = \arg\min_u \tfrac{1}{2}u^2 + u\theta_t = -\theta_t, \qquad \theta_{t+1} = \arg\max_\theta u_t\theta - \tfrac{1}{2}\theta^2 = u_t,$$

so the iterates oscillate forever.

Being Just Greedy Enough

We can get what we want if we replace $u_t$ with the running average $\hat{u}_t \stackrel{\text{def}}{=} \frac{1}{t}\sum_{s=1}^t u_s$:

$$u_t = \arg\min_u L(u, \theta_t), \qquad \theta_{t+1} = \arg\max_\theta L(\hat{u}_t, \theta).$$

Theorem. If $R$ is strongly convex, then $|L(\hat{u}_t, \theta_t) - L(u^*, \theta^*)| \le O\left(\frac{\log T}{T}\right)$.

Note: we can get $O(1/T)$ convergence if we use a weighted average for $\hat{u}_t$.

Frank-Wolfe

We recover Frank-Wolfe when $h \equiv 0$:

$$u_t = \arg\min_u L(u, \theta_t) = \arg\min_u u^T \theta_t, \qquad \theta_{t+1} = \arg\max_\theta L(\hat{u}_t, \theta) = \arg\max_\theta \hat{u}_t^T \theta - R(\theta) = \partial R^*(\hat{u}_t).$$

General updates:

$$u_t = \partial h^*(-\theta_t), \qquad \theta_{t+1} = \partial R^*(\hat{u}_t).$$

Applications

Instantiating $L(u, \theta)$ recovers a number of known algorithms:

- $h(u) + u^T \theta - f(\theta)$: mirror descent
- $\|u\|_1 + (Au - y)^T \theta - \frac{1}{2}\|\theta\|_2^2$: thresholded Frank-Wolfe
- $\mathrm{Tr}(A^T X) + \sum_{i=1}^n y_i (X_{ii} - 1 - \eta \log y_i)$: AHK low-rank SDP
- $\mathbb{E}_{x\sim\mu}[\theta^T(\phi(x) - \bar{\phi})] - \frac{1}{q}\|\theta\|_q^q$: q-herding

(Note: some of the entries above use a dual version of the algorithm.)
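To make the oscillation issue and the averaged fix concrete, here is a minimal numerical sketch (not from the talk; function names and iteration counts are illustrative choices) of both update schemes on the quadratic example above, $h(u) = \frac{1}{2}u^2$, $R(\theta) = \frac{1}{2}\theta^2$, whose saddle point is $(u^*, \theta^*) = (0, 0)$:

```python
# Sketch of the two update schemes on the quadratic example
# h(u) = u^2/2, R(theta) = theta^2/2, with saddle point (0, 0).

def best_response(T=6, theta0=1.0):
    """Pure iterative best response: u_t = -theta_t, theta_{t+1} = u_t."""
    theta = theta0
    iterates = []
    for _ in range(T):
        u = -theta      # u_t = argmin_u  u^2/2 + u*theta_t
        theta = u       # theta_{t+1} = argmax_theta  u_t*theta - theta^2/2
        iterates.append((u, theta))
    return iterates

def averaged_best_response(T=6, theta0=1.0):
    """Same updates, but theta best-responds to the running average u_hat."""
    theta, u_sum = theta0, 0.0
    iterates = []
    for t in range(1, T + 1):
        u = -theta            # u_t = argmin_u L(u, theta_t)
        u_sum += u
        u_hat = u_sum / t     # u_hat_t = (1/t) * sum_{s <= t} u_s
        theta = u_hat         # theta_{t+1} = argmax_theta L(u_hat_t, theta)
        iterates.append((u_hat, theta))
    return iterates

print(best_response())           # (-1,-1), (1,1), (-1,-1), ... : oscillation
print(averaged_best_response())  # (-1,-1), (0,0), (0,0), ... : at the saddle
```

In this symmetric example the average cancels the oscillation exactly after two steps; in general the theorem above only promises an $O(\log T / T)$ duality gap when $R$ is strongly convex.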
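Likewise, a sketch of how the averaged updates instantiate Frank-Wolfe: as in Frank-Wolfe, assume the minimization over $u$ is restricted to a compact set (here the probability simplex), and take $R^*(u) = \frac{1}{2}\|u - c\|_2^2$ as the objective. The target $c$ and iteration count are made-up illustrations; the running average $\hat{u}_t$ is the Frank-Wolfe iterate with step size $1/(t+1)$:

```python
# Sketch of the general updates u_t = argmin_u u^T theta_t (over the
# probability simplex) and theta_{t+1} = grad R*(u_hat_t), with the
# illustrative objective R*(u) = 0.5 * ||u - c||^2.
import numpy as np

def frank_wolfe_simplex(c, T=200):
    n = len(c)
    u_hat = np.full(n, 1.0 / n)          # start at the simplex barycenter
    for t in range(1, T + 1):
        theta = u_hat - c                # theta_{t+1} = grad R*(u_hat_t)
        u = np.zeros(n)
        u[np.argmin(theta)] = 1.0        # u_t: best simplex vertex for theta
        u_hat += (u - u_hat) / (t + 1)   # running average = FW step 1/(t+1)
    return u_hat

c = np.array([0.7, 0.2, 0.1])
print(frank_wolfe_simplex(c))  # approaches c, since c lies in the simplex
```

The linear step over the vertices is the usual Frank-Wolfe oracle; maintaining the uniform running average incrementally is what turns the "just greedy enough" scheme into the familiar projection-free iteration.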