Lecture 20

SYS 6003: Optimization
Fall 2016
Lecture 20
Date: Nov 7th
Instructor: Quanquan Gu
We are going to introduce the properties of proximal mapping (proximal operator) and
the calculation rules of proximal mapping.
Definition 1 (Proximal Mapping) The proximal mapping of h(x) is
1
Proxh (x) = arg min ku − xk22 + h(u).
u 2
Lemma 1 (Non-expansiveness) If u = Proxh (x), v = Proxh (y), we have
(u − v)> (x − y) ≥ ku − vk22 .
(1)
ku − vk2 ≤ kx − yk2 .
(2)
In addition,
Remark 1 (2) implies that kProxh (x) − Proxh (y)k2 ≤ kx − yk2 . It means that the proximal
function is 1-Lipschitz function.
Proof: Since u = Proxh (x), we have x − u ∈ ∂h(u), which implies
h(v) ≥ h(u) + (x − u)> (v − u).
Similarly we have y − v ∈ ∂h(v), which implies
h(u) ≥ h(v) + (y − v)> (u − v).
Thus we have
(x − u + v − y)> (u − v) ≥ 0.
Rearrange the inequality, we have
(x − y)> (u − v) ≥ ku − vk22 .
By (3) we have
ku − vk22 ≤ (x − y)> (u − v) ≤ kx − yk2 · ku − vk2 .
Thus
ku − vk2 ≤ kx − yk2 .
Now we focus on the calculation rules of proximal mapping.
1
(3)
1. Separable sum
f (x, y) = g(x) + h(y),
we have
x
Proxg (x)
Proxf
=
.
y
Proxg (y)
Example 1
f (x) = kxk1 =
d
X
|xi | = |x1 | + |x2 | + · · · + |xd |.
i=1
2. Scaling and translation of argument
For a 6= 0, x ∈ R, f (x) = g(ax + b), we have
Proxf (x) =
1
Proxa2 g (ax + b) − b
a
Proof:
1
û = Proxf (x) = arg min (u − x)2 + g(au + b)
u 2
Let t = au + b, thus u = (t − b)/a, we have
2
1 t−b
− x + g(t)
t̂ = arg min
t 2
a
1
= arg min 2 (t − b − ax)2 + g(t)
t 2a
1
= arg min (t − b − ax)2 + a2 g(t)
t 2
= Proxa2 g (ax + b)
Thus we have
Proxf (x) = û =
t̂ − b
1
= Proxa2 g (ax + b) − b .
a
a
3. “Right” scalar multiplication
f (x) = λg(x/λ),
2
we have
Proxf (x) = λ · Proxλ−1 g (x/λ)
Proof:
1
û = Proxf (x) = arg min (u − x)2 + f (u)
u 2
1
= arg min (u − x)2 + λf (u/λ)
u 2
Let t = u/λ, thus u = λt, we have
1
t̂ = arg min (λt − x)2 + λg(t)
t 2
λ2 x 2
= arg min
t−
+ λg(t)
t
2
λ
1
x 2 1
= arg min t −
+ g(t)
t 2
λ
λ
= Proxλ−1 g (x/λ)
Thus we have
Proxf (x) = û = λt̂ = λ · Proxλ−1 g (x/λ).
4. Addition to linear function
For a, x ∈ Rd ,
f (x) = g(x) + a> x,
we have
Proxf (x) = λ · Proxg (x − a)
Proof:
1
Proxf (x) = arg min ku − xk22 + f (u)
u 2
1
= arg min ku − xk22 + λg(u) + a> u
u 2
1
= arg min [u> u − 2u> x + x> x + 2a> u] + g(u)
u 2
1
= arg min ku − x + ak22 + g(u)
u 2
= Proxg (x − a)
3
5. Addition to quadratic function
For µ > 0,
µ
f (x) = g(x) + kx − ak22 ,
2
we have
Proxf (x) = Proxθg θx + (1 − θ)a ,
where θ = 1/(1 + µ).
Example 2 (Elastic Net)
min
x
1
kAx − bk22 + λkxk1 + µkxk22
2n
Proof:
1
Proxf (x) = arg min ku − xk22 + f (u)
u 2
1
µ
= arg min ku − xk22 + g(u) + kx − ak22
u 2
2
1 >
= arg min [u u − 2u> x + x> x + µu> u − 2µu> a + µa> a] + g(u)
u 2
1
= arg min [(1 + µ)u> u − 2(x + µa)> u + C] + g(u)
u 2
x + µa >
1+µ >
0
u + C + g(u)
= arg min
u u−2
u
2
1+µ
x + µa 1
1
2
g(u)
= arg min u −
+
u 2
1+µ 2 1+µ
= Proxθg θx + (1 − θ)a
Examples:
1. Quadratic function: For A > 0, x, b ∈ Rd , c ∈ R,
1
f (x) = x> Ax + b> x + c,
2
we have
Proxtf (x) = (I + tA)−1 (x − tb).
2. Logarithmic function:
f (x) = −
d
X
log(xi ),
i=1
we have
xi +
Proxtf (x) i =
4
p
x2i + 4t
.
2
3. Euclidean norm(`2 norm):
f (x) = kxk2 ,
we have
( Proxtf (x) =
1−
0
5
t
kxk2
x , kxk2 > t
, otherwise