Gradient Descent
Converges to Minimizers
Jason D. Lee, Max Simchowitz, Michael I. Jordan,
and Benjamin Recht
Presented by Qiuwei Li
EECS, Colorado School of Mines
SINE: SIgnals and NEtworks
Outline
▶ Introduction to Gradient Descent
▶ Almost Surely Does Not Converge to a Saddle
▶ Proof by the Stable-Center Manifold Theorem
Gradient Descent
▶ Objective: minimize the function f : R^n → R
▶ Input: initial point x_0, stepsize α
▶ Update: x_{k+1} = x_k − α∇f(x_k) (see the sketch below)
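For concreteness, here is a minimal Python sketch of the update rule above; the test function f(x) = 2(x_1^2 + x_2^2) reappears on the next slide, while the stepsize and iteration count are illustrative choices, not part of the original slides.

```python
# A minimal sketch of the update x_{k+1} = x_k - alpha * grad_f(x_k).
# The test function and stepsize below are illustrative choices.
import numpy as np

def gradient_descent(grad_f, x0, alpha, num_iters=100):
    """Run gradient descent with a fixed stepsize alpha."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)
    return x

# f(x) = 2*(x1^2 + x2^2), so grad_f(x) = 4*x and the minimizer is the origin.
x_final = gradient_descent(lambda x: 4.0 * x, x0=[1.0, -2.0], alpha=0.1)
print(x_final)  # very close to [0, 0]
```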
When Does Gradient Descent Converge?
▶ Gradient descent can only converge to critical points, i.e. points where ∇f(x) = 0
▶ Local minima, e.g. f(x) = 2(x_1^2 + x_2^2)
▶ Saddle points, e.g. f(x) = −x_1^2 + x_2^2
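A short sketch that distinguishes the two example critical points (both at the origin) by the sign of the smallest Hessian eigenvalue; the Hessians are written out by hand for these two functions.

```python
# Classify the two example critical points (both at the origin) by the sign of
# the smallest Hessian eigenvalue; the Hessians are hand-coded constants.
import numpy as np

H_min    = np.diag([4.0, 4.0])    # Hessian of f(x) = 2*(x1^2 + x2^2)
H_saddle = np.diag([-2.0, 2.0])   # Hessian of f(x) = -x1^2 + x2^2

for name, H in [("local min", H_min), ("saddle", H_saddle)]:
    lam_min = np.linalg.eigvalsh(H).min()
    print(name, "-> smallest Hessian eigenvalue:", lam_min)
# lam_min < 0 at the saddle: exactly the "strict saddle" condition defined later.
```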
Hard to Converge to a Saddle
▶ The set of initial points that converge to a saddle typically has measure zero.
▶ Example: only the red region converges to the saddle (sketch below).
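A sketch of the measure-zero claim for the saddle example f(x) = −x_1^2 + x_2^2: the iterates reach the saddle at the origin only when the start lies on the x_2-axis. The stepsize and iteration count below are arbitrary choices.

```python
# Measure-zero claim for f(x) = -x1^2 + x2^2: gradient descent reaches the
# saddle at the origin only if the starting point has x1 = 0.
import numpy as np

alpha = 0.1  # assumed stepsize, well below 1/L = 1/2

def gd(x0, iters=200):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * np.array([-2.0 * x[0], 2.0 * x[1]])  # grad f(x)
    return x

print(gd([0.0, 1.0]))   # on the x2-axis: converges to the saddle (0, 0)
print(gd([1e-6, 1.0]))  # any nonzero x1 is amplified by (1 + 2*alpha) and escapes
```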
Terminologies and Assumptions
▶ x* is a strict saddle if λ_min(∇²f(x*)) < 0
▶ The gradient map: g(x) = x − α∇f(x)
▶ The stable set of x*: W^s(x*) = {x : lim_k g^k(x) = x*}
▶ ∇f is L-Lipschitz: ‖∇f(x) − ∇f(y)‖_2 ≤ L‖x − y‖_2
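A small sketch of these definitions on a hypothetical quadratic f(x) = (1/2)xᵀHx (not from the slides): the gradient map g, the strict-saddle test, and the Lipschitz constant L, which for a quadratic is the largest Hessian eigenvalue in absolute value.

```python
# Terminology on a hypothetical quadratic f(x) = 0.5 * x^T H x.
import numpy as np

H = np.diag([1.0, -0.5])                     # Hessian with one negative eigenvalue
grad_f = lambda x: H @ x
L = np.max(np.abs(np.linalg.eigvalsh(H)))    # Lipschitz constant of grad f
alpha = 0.5 / L                              # any 0 < alpha < 1/L is admissible

g = lambda x: x - alpha * grad_f(x)          # the gradient map
x_star = np.zeros(2)                         # critical point of f
strict_saddle = np.linalg.eigvalsh(H).min() < 0
print("L =", L, "| strict saddle:", strict_saddle, "| g(x*) =", g(x_star))
```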
Main Result
▶ Assume f ∈ C²
▶ Let x* be a strict saddle
▶ Assume 0 < α < 1/L
Then Vol(W^s(x*)) = 0.
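A rough numerical reading of the main result on the same hypothetical quadratic: with 0 < α < 1/L, none of many random starts ends up at the strict saddle. This is only a sanity check, not a substitute for the proof.

```python
# Sanity check of the main result on the hypothetical quadratic: with
# 0 < alpha < 1/L, random starts should (almost surely) not converge to the saddle.
import numpy as np

H = np.diag([1.0, -0.5])
L = np.max(np.abs(np.linalg.eigvalsh(H)))
alpha = 0.5 / L

rng = np.random.default_rng(0)
hits = 0
for _ in range(1000):
    x = rng.standard_normal(2)
    for _ in range(500):
        x = x - alpha * (H @ x)
    hits += np.linalg.norm(x) < 1e-6          # landed on the saddle at 0?
print("starts that converged to the saddle:", hits, "out of 1000")  # expect 0
```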
Stable-Center Manifold Theorem
▶ Let x* be a fixed point of some map g
▶ g is a diffeomorphism
▶ E_cs = Span{v_i : |λ_i(Dg(x*))| ≤ 1}
Then there exist a disk W^cs_loc around x* (tangent to E_cs at x*) and a neighbourhood B of x* such that
   g(W^cs_loc) ∩ B ⊂ W^cs_loc
   g^k(x) ∈ B, ∀k ≥ 0 ⇒ x ∈ W^cs_loc
Tangent Spaces Have the Same Dimension as the Manifold
Proof by the Stable-Center Manifold Theorem
▶ Show the gradient map g is a diffeomorphism. Then
   x ∈ W^s(x*) ⇒ g^t(x) ∈ B, ∀t ≥ T_x            (1)
              ⇒ g^k(g^t(x)) ∈ B, ∀k ≥ 0           (2)
              ⇒ g^t(x) ∈ W^cs_loc                  (3)
              ⇒ x ∈ g^{−t}(W^cs_loc)               (4)
              ⇒ x ∈ ∪_{t≥0} g^{−t}(W^cs_loc)       (5)
▶ x* is a strict saddle ⇒ λ_min(∇²f(x*)) < 0
▶ Dg(x*) = I − α∇²f(x*) ⇒ dim(E_cs) < n ⇒ Vol(W^cs_loc) = 0
▶ A diffeomorphism maps zero volume to zero volume, so Vol(g^{−t}(W^cs_loc)) = 0 for every t; a countable union of zero-volume sets then gives Vol(W^s(x*)) = 0
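A sketch of the dimension-counting step on the hypothetical quadratic used earlier: at a strict saddle, Dg(x*) = I − α∇²f(x*) has an eigenvalue strictly larger than 1, so E_cs has dimension less than n.

```python
# Dimension count at the strict saddle of the hypothetical quadratic:
# Dg(x*) = I - alpha * Hessian has an eigenvalue > 1, so
# E_cs = span of eigenvectors with |eigenvalue| <= 1 has dimension < n.
import numpy as np

H = np.diag([1.0, -0.5])                      # Hessian at x* = 0, lambda_min < 0
n = H.shape[0]
L = np.max(np.abs(np.linalg.eigvalsh(H)))
alpha = 0.5 / L

Dg = np.eye(n) - alpha * H                    # Jacobian of the gradient map at x*
eigs = np.linalg.eigvalsh(Dg)
dim_E_cs = int(np.sum(np.abs(eigs) <= 1.0))
print("eigenvalues of Dg(x*):", eigs)
print("dim(E_cs) =", dim_E_cs, "< n =", n)    # so the local manifold has zero volume
```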
Show g Is a Diffeomorphism if 0 < α < 1/L
▶ Diffeomorphism = bijection + both g and g^{−1} continuously differentiable
▶ Show bijectivity by constructing the inverse map of y = g(x):
   x = Prox_{α(−f)}(y) = arg min_x (1/2)‖x − y‖² + α(−f)(x)
▶ ∇²((1/2)‖x − y‖² − αf(x)) = I − α∇²f(x) ≻ 0
▶ Strong convexity ⇒ a unique minimizer x_y with x_y − y − α∇f(x_y) = 0 for every y, i.e. g(x_y) = y
▶ f ∈ C² ⇒ g is continuously differentiable
▶ Inverse Function Theorem: Dg(x) = I − α∇²f(x) ≻ 0 ⇒ g^{−1} is continuously differentiable
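A sketch of the inverse-map construction on a concrete quadratic (an assumed test case, not from the slides): minimizing the prox objective recovers the unique x_y with g(x_y) = y. The scipy optimizer is just one convenient way to solve the strongly convex subproblem.

```python
# Inverse-map construction on a concrete quadratic (assumed test case):
# Prox_{alpha(-f)}(y) is the unique x_y with g(x_y) = y when 0 < alpha < 1/L.
import numpy as np
from scipy.optimize import minimize

H = np.diag([1.0, -0.5])
f = lambda x: 0.5 * x @ H @ x
grad_f = lambda x: H @ x
L = np.max(np.abs(np.linalg.eigvalsh(H)))
alpha = 0.5 / L
g = lambda x: x - alpha * grad_f(x)

y = np.array([0.3, -0.7])
# The prox objective 0.5*||x - y||^2 - alpha*f(x) is strongly convex here,
# since its Hessian I - alpha*H is positive definite.
prox_obj = lambda x: 0.5 * np.sum((x - y) ** 2) - alpha * f(x)
x_y = minimize(prox_obj, x0=np.zeros(2)).x    # solve the strongly convex subproblem
print(np.allclose(g(x_y), y, atol=1e-5))      # g(x_y) == y, so x_y = g^{-1}(y)
```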
An Example
▶ f(x) = (1/2) xᵀHx ∈ C² with H = diag(λ_1, ..., λ_n)
▶ λ_1, ..., λ_k > 0 and λ_{k+1}, ..., λ_n < 0
▶ x* = 0 is the unique saddle point, and it is a strict saddle
▶ g(x) = (I − αH)x is a diffeomorphism
▶ E_s = Span{e_1, ..., e_k} ⇒ zero volume
Then 0 < α < 1/|λ|_max ⇒ Vol(W^s(x*)) = 0.
Double check (numerical sketch below):
▶ g^t(x) = (I − αH)^t x = Σ_{i=1}^{k} (1 − αλ_i)^t x_i e_i + Σ_{j=k+1}^{n} (1 − αλ_j)^t x_j e_j
▶ Since |1 − αλ_i| < 1 for i ≤ k and 1 − αλ_j > 1 for j > k, the iterates converge to x* = 0 only if x_{k+1} = ... = x_n = 0
▶ W^s(x*) = {x : x_{k+1} = ... = x_n = 0} ⇒ zero volume
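A numerical double check of this example with hypothetical eigenvalues (k = 2 positive, n − k = 2 negative): iterating g(x) = (I − αH)x from a point whose last two coordinates are zero converges to the saddle, while an arbitrarily small perturbation of those coordinates escapes.

```python
# Double check with hypothetical eigenvalues: only starts whose last n - k
# coordinates are zero are driven to the saddle x* = 0.
import numpy as np

lam = np.array([1.0, 2.0, -1.0, -3.0])        # k = 2 positive, 2 negative
H = np.diag(lam)
alpha = 0.5 / np.max(np.abs(lam))             # 0 < alpha < 1/|lambda|_max
A = np.eye(4) - alpha * H                     # one step of the gradient map g

def iterate(x, t=300):
    x = np.array(x, dtype=float)
    for _ in range(t):
        x = A @ x
    return x

print(iterate([1.0, -2.0, 0.0, 0.0]))   # in W^s(x*): converges to 0
print(iterate([1.0, -2.0, 1e-8, 0.0]))  # tiny perturbation off W^s(x*): escapes
```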
An Example: f(x) = −x_1^2 + x_2^2