SYS 6003: Optimization Fall 2016 Lecture 20 Date: Nov 7th Instructor: Quanquan Gu We are going to introduce the properties of proximal mapping (proximal operator) and the calculation rules of proximal mapping. Definition 1 (Proximal Mapping) The proximal mapping of h(x) is 1 Proxh (x) = arg min ku − xk22 + h(u). u 2 Lemma 1 (Non-expansiveness) If u = Proxh (x), v = Proxh (y), we have (u − v)> (x − y) ≥ ku − vk22 . (1) ku − vk2 ≤ kx − yk2 . (2) In addition, Remark 1 (2) implies that kProxh (x) − Proxh (y)k2 ≤ kx − yk2 . It means that the proximal function is 1-Lipschitz function. Proof: Since u = Proxh (x), we have x − u ∈ ∂h(u), which implies h(v) ≥ h(u) + (x − u)> (v − u). Similarly we have y − v ∈ ∂h(v), which implies h(u) ≥ h(v) + (y − v)> (u − v). Thus we have (x − u + v − y)> (u − v) ≥ 0. Rearrange the inequality, we have (x − y)> (u − v) ≥ ku − vk22 . By (3) we have ku − vk22 ≤ (x − y)> (u − v) ≤ kx − yk2 · ku − vk2 . Thus ku − vk2 ≤ kx − yk2 . Now we focus on the calculation rules of proximal mapping. 1 (3) 1. Separable sum f (x, y) = g(x) + h(y), we have x Proxg (x) Proxf = . y Proxg (y) Example 1 f (x) = kxk1 = d X |xi | = |x1 | + |x2 | + · · · + |xd |. i=1 2. Scaling and translation of argument For a 6= 0, x ∈ R, f (x) = g(ax + b), we have Proxf (x) = 1 Proxa2 g (ax + b) − b a Proof: 1 û = Proxf (x) = arg min (u − x)2 + g(au + b) u 2 Let t = au + b, thus u = (t − b)/a, we have 2 1 t−b − x + g(t) t̂ = arg min t 2 a 1 = arg min 2 (t − b − ax)2 + g(t) t 2a 1 = arg min (t − b − ax)2 + a2 g(t) t 2 = Proxa2 g (ax + b) Thus we have Proxf (x) = û = t̂ − b 1 = Proxa2 g (ax + b) − b . a a 3. “Right” scalar multiplication f (x) = λg(x/λ), 2 we have Proxf (x) = λ · Proxλ−1 g (x/λ) Proof: 1 û = Proxf (x) = arg min (u − x)2 + f (u) u 2 1 = arg min (u − x)2 + λf (u/λ) u 2 Let t = u/λ, thus u = λt, we have 1 t̂ = arg min (λt − x)2 + λg(t) t 2 λ2 x 2 = arg min t− + λg(t) t 2 λ 1 x 2 1 = arg min t − + g(t) t 2 λ λ = Proxλ−1 g (x/λ) Thus we have Proxf (x) = û = λt̂ = λ · Proxλ−1 g (x/λ). 4. Addition to linear function For a, x ∈ Rd , f (x) = g(x) + a> x, we have Proxf (x) = λ · Proxg (x − a) Proof: 1 Proxf (x) = arg min ku − xk22 + f (u) u 2 1 = arg min ku − xk22 + λg(u) + a> u u 2 1 = arg min [u> u − 2u> x + x> x + 2a> u] + g(u) u 2 1 = arg min ku − x + ak22 + g(u) u 2 = Proxg (x − a) 3 5. Addition to quadratic function For µ > 0, µ f (x) = g(x) + kx − ak22 , 2 we have Proxf (x) = Proxθg θx + (1 − θ)a , where θ = 1/(1 + µ). Example 2 (Elastic Net) min x 1 kAx − bk22 + λkxk1 + µkxk22 2n Proof: 1 Proxf (x) = arg min ku − xk22 + f (u) u 2 1 µ = arg min ku − xk22 + g(u) + kx − ak22 u 2 2 1 > = arg min [u u − 2u> x + x> x + µu> u − 2µu> a + µa> a] + g(u) u 2 1 = arg min [(1 + µ)u> u − 2(x + µa)> u + C] + g(u) u 2 x + µa > 1+µ > 0 u + C + g(u) = arg min u u−2 u 2 1+µ x + µa 1 1 2 g(u) = arg min u − + u 2 1+µ 2 1+µ = Proxθg θx + (1 − θ)a Examples: 1. Quadratic function: For A > 0, x, b ∈ Rd , c ∈ R, 1 f (x) = x> Ax + b> x + c, 2 we have Proxtf (x) = (I + tA)−1 (x − tb). 2. Logarithmic function: f (x) = − d X log(xi ), i=1 we have xi + Proxtf (x) i = 4 p x2i + 4t . 2 3. Euclidean norm(`2 norm): f (x) = kxk2 , we have ( Proxtf (x) = 1− 0 5 t kxk2 x , kxk2 > t , otherwise
© Copyright 2026 Paperzz