Gradient Methods

April 2004
Preview



Background
Steepest Descent
Conjugate Gradient
Background



Motivation
The gradient notion
The Wolfe Theorems
Motivation

The min(max) problem:

$\min_{x} f(x)$

But we learned in calculus how to solve that kind of question!
Motivation



Not exactly:

Functions: $f : \mathbb{R}^n \to \mathbb{R}$

High-order polynomials, e.g. $x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$

What about functions that don't have an analytic representation: a "Black Box"?
Motivation: a "real world" problem

Connectivity shapes (Isenburg, Gumhold, Gotsman):

$\text{mesh} = \{\, C = (V, E),\ \text{geometry} \,\}$

What do we get from C alone, without the geometry?
Motivation: a "real world" problem

First we introduce error functionals and then try to minimize them:

$E_s(x \in \mathbb{R}^{3n}) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert - 1 \right)^2$

$E_r(x \in \mathbb{R}^{3n}) = \sum_{i=1}^{n} \lVert L(x_i) \rVert^2$, where $L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} (x_j - x_i)$
Motivation: a "real world" problem

Then we minimize:

$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$

This is a high-dimensional, non-linear problem. The authors use the conjugate gradient method, which is perhaps the most popular of the optimization techniques we'll see here.
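As a concrete illustration, here is a minimal sketch (not the authors' code) of how these energies could be evaluated, assuming the connectivity C is an (m, 2) integer edge array, x is an (n, 3) array of vertex positions, and every vertex has at least one neighbor; the function names are illustrative.

```python
# A minimal sketch (not the authors' code) of the energies above.
import numpy as np

def spring_energy(x, edges):
    # E_s(x) = sum over edges (i,j) of (||x_i - x_j|| - 1)^2
    d = np.linalg.norm(x[edges[:, 0]] - x[edges[:, 1]], axis=1)
    return np.sum((d - 1.0) ** 2)

def laplacian_energy(x, edges):
    # E_r(x) = sum_i ||L(x_i)||^2, with L(x_i) = (1/d_i) * sum_{(i,j) in E} (x_j - x_i)
    n = x.shape[0]
    neighbor_sum = np.zeros_like(x)
    degree = np.zeros(n)
    for i, j in edges:                       # each undirected edge touches both ends
        neighbor_sum[i] += x[j]; degree[i] += 1
        neighbor_sum[j] += x[i]; degree[j] += 1
    L = neighbor_sum / degree[:, None] - x
    return np.sum(L ** 2)

def total_energy(x, edges, lam):
    # The weighted objective (1 - lambda) * E_s(x) + lambda * E_r(x)
    return (1.0 - lam) * spring_energy(x, edges) + lam * laplacian_energy(x, edges)
```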
Motivation: a "real world" problem

Changing the parameter $\lambda$:

$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{3n}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$
Motivation


General problem: find a global min(max).
This lecture will concentrate on finding a local minimum.
Background



Motivation
The gradient notion
The Wolfe Theorems
1 
1 
f := ( x , y )cos x  cos y  x
2 
2 
Directional Derivatives:
first, the one-dimensional derivative:

Directional Derivatives:
Along the axes…

$\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$
Directional Derivatives:
In a general direction…

$v \in \mathbb{R}^2, \quad \lVert v \rVert = 1: \qquad \frac{\partial f(x, y)}{\partial v}$
Directional Derivatives

(plots of $\frac{\partial f(x, y)}{\partial x}$ and $\frac{\partial f(x, y)}{\partial y}$)
The Gradient: Definition in $\mathbb{R}^2$

$f : \mathbb{R}^2 \to \mathbb{R}$

$\nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$ in the plane
The Gradient: Definition
$f : \mathbb{R}^n \to \mathbb{R}$

$\nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$
The Gradient Properties

The gradient defines a (hyper)plane approximating the function infinitesimally:

$\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$
The Gradient Properties

By the chain rule (important for later use), for $\lVert v \rVert = 1$:

$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p , v \right\rangle$
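As a quick sanity check of this identity (illustrative, not part of the lecture), the sketch below compares a finite-difference directional derivative with the inner product $\langle (\nabla f)_p, v \rangle$ for the example surface shown earlier; the step size 1e-6 is an arbitrary choice.

```python
# Numerical check that df/dv (p) = <(grad f)_p, v> for f(x, y) = cos(x/2) cos(y/2) x.
import numpy as np

def f(p):
    x, y = p
    return np.cos(0.5 * x) * np.cos(0.5 * y) * x

def numerical_gradient(f, p, h=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(p)
    for k in range(p.size):
        e = np.zeros_like(p); e[k] = h
        g[k] = (f(p + e) - f(p - e)) / (2 * h)
    return g

p = np.array([1.0, 2.0])
v = np.array([3.0, 4.0]); v /= np.linalg.norm(v)            # unit direction

directional = (f(p + 1e-6 * v) - f(p - 1e-6 * v)) / 2e-6    # finite-difference df/dv
print(directional, numerical_gradient(f, p) @ v)            # the two values agree
```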
The Gradient Properties

Proposition 1: $\frac{\partial f}{\partial v}$ is maximal when choosing $v = \frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p$

and minimal when choosing $v = -\frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p$

(Intuitively: the gradient points in the direction of greatest change.)
The Gradient Properties

Proof (only for the minimum case):

Assign $v = -\frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p$. Then, by the chain rule:

$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p , -\frac{1}{\lVert (\nabla f)_p \rVert} (\nabla f)_p \right\rangle = -\frac{1}{\lVert (\nabla f)_p \rVert} \left\langle (\nabla f)_p , (\nabla f)_p \right\rangle = -\lVert (\nabla f)_p \rVert$
The Gradient Properties

On the other hand, for a general unit vector $v$, by the Cauchy-Schwarz inequality:

$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p , v \right\rangle \geq -\lVert (\nabla f)_p \rVert \cdot \lVert v \rVert = -\lVert (\nabla f)_p \rVert$

$\Rightarrow \quad \frac{\partial f}{\partial v}(p) \geq -\lVert (\nabla f)_p \rVert$
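A tiny numerical illustration of Proposition 1 (not from the lecture): for a simple quadratic picked here as an example, no random unit direction produces a directional derivative lower than the value attained along $-\nabla f / \lVert \nabla f \rVert$.

```python
# Illustrative check of Proposition 1 on f(x) = x0^2 + 3*x1^2 (chosen for convenience).
import numpy as np

def grad_f(x):
    return np.array([2 * x[0], 6 * x[1]])

p = np.array([1.0, -2.0])
g = grad_f(p)
lowest = -np.linalg.norm(g)               # value attained along v = -g / ||g||

rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)                # random unit direction
    assert g @ v >= lowest - 1e-12        # never below -||(grad f)_p||
print("directional derivative along -grad:", lowest)
```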
The Gradient Properties

Proposition 2: Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth $C^1$ function around $p$. If $f$ has a local minimum (maximum) at $p$, then

$(\nabla f)_p = 0$

(Intuitively: this is a necessary condition for a local min(max).)
The Gradient Properties
Proof:
Intuitively: (figure)
The Gradient Properties
Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get

$0 = \frac{d f(p + t \cdot v)}{dt}(0) = \left\langle (\nabla f)_p , v \right\rangle \quad \Rightarrow \quad (\nabla f)_p = 0$
The Gradient Properties



We found the best INFINITESIMAL DIRECTION at each point.
Looking for a minimum: a "blind man" procedure.
How can we derive the way to the minimum using this knowledge?
Background



Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem


This is the link from the previous gradient properties to a constructive algorithm.
The problem:

$\min_{x} f(x)$
The Wolfe Theorem
We introduce a model algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \geq 0} f(x_i + \lambda \cdot h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
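The model algorithm translates almost directly into code. The sketch below is an illustrative rendering under the assumption that the caller supplies the gradient, the search-direction rule, and a line-search routine standing in for the exact argmin over $\lambda \geq 0$.

```python
# A generic rendering of the model algorithm (illustrative, not the lecture's code).
import numpy as np

def model_algorithm(grad, direction, line_search, x0, tol=1e-8, max_iter=1000):
    """grad(x) -> gradient of f at x; direction(x) -> search direction h_i;
    line_search(x, h) -> step size approximating argmin_{lam >= 0} f(x + lam * h)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # Step 1: stop near a critical point
            break
        h = direction(x)                  # Step 1: search direction h_i
        lam = line_search(x, h)           # Step 2: step size lambda_i
        x = x + lam * h                   # Step 3: update and go to Step 1
    return x
```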
The Wolfe Theorem
The Theorem: Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function $k : \mathbb{R}^n \to [0, 1]$ with

$\forall x : \ \nabla f(x) \neq 0 \ \Rightarrow \ k(x) > 0,$

and the search vectors constructed by the model algorithm satisfy

$\left\langle \nabla f(x_i), h_i \right\rangle \leq -k(x_i) \cdot \lVert \nabla f(x_i) \rVert \cdot \lVert h_i \rVert$
The Wolfe Theorem
And f
( y)  0  hi  0
Then if { xi }i0 is the sequence constructed by
the algorithm model,
then any accumulation point y of this sequence
satisfy:
f ( y )  0
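The descent condition in the theorem is easy to test numerically. The helper below is an illustrative sketch (the function name is made up for this example).

```python
# Illustrative check of the descent condition
# <grad f(x_i), h_i>  <=  -k(x_i) * ||grad f(x_i)|| * ||h_i||.
import numpy as np

def satisfies_descent_condition(g, h, k):
    """g: gradient at x_i, h: proposed search direction, k: the value k(x_i)."""
    return g @ h <= -k * np.linalg.norm(g) * np.linalg.norm(h)

# Steepest descent, h = -g, satisfies the condition with k = 1:
g = np.array([1.0, -2.0, 0.5])
print(satisfies_descent_condition(g, -g, k=1.0))   # True
```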
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction, i.e. the search direction $h_i$ must make an obtuse angle with $\nabla f(x_i)$.
Preview



Background
Steepest Descent
Conjugate Gradient
Steepest Descent




What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem:

$\min_{x} f(x)$
Steepest Descent
Steepest descent algorithm:

Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$ stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \geq 0} f(x_i + \lambda \cdot h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i \cdot h_i$ and go to Step 1
Steepest Descent


Theorem: if $\{x_i\}_{i \geq 0}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies

$\nabla f(y) = 0$

Proof: from the Wolfe theorem.

Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given analytically (i.e. are calculated numerically).
Steepest Descent

From the chain rule, at the optimal step size:

$\frac{d}{d\lambda} f(x_i + \lambda_i \cdot h_i) = \left\langle \nabla f(x_i + \lambda_i \cdot h_i), h_i \right\rangle = 0$

So each new gradient is orthogonal to the previous search direction, and the method of steepest descent follows a zigzag path toward the minimum.
Steepest Descent



Steepest descent finds a critical point, typically a local minimum.
Implicit step-size rule: we actually reduced the problem to finding the minimum of a one-dimensional function $f : \mathbb{R} \to \mathbb{R}$.
There are extensions that give the step-size rule in a discrete sense (Armijo); a sketch follows below.
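One such discrete rule is Armijo backtracking: start from an initial step and halve it until a sufficient-decrease condition holds. The sketch below uses conventional default parameters, which are not taken from the lecture.

```python
# Illustrative Armijo backtracking: shrink the step until f decreases by at
# least a fixed fraction of the decrease predicted by the gradient.
import numpy as np

def armijo_step(f, grad, x, h, lam0=1.0, beta=0.5, sigma=1e-4, max_halvings=50):
    fx = f(x)
    slope = grad(x) @ h                  # directional derivative along h (negative)
    lam = lam0
    for _ in range(max_halvings):
        if f(x + lam * h) <= fx + sigma * lam * slope:   # sufficient decrease
            return lam
        lam *= beta                      # otherwise shrink the step
    return lam

# Usage with a steepest-descent direction h = -grad(x):
f = lambda x: x[0] ** 2 + 10 * x[1] ** 2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
x0 = np.array([3.0, 1.0])
print(armijo_step(f, grad, x0, -grad(x0)))
```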
Steepest Descent

Back to our connectivity shapes: the authors solve the one-dimensional problem

$\lambda_i = \arg\min_{\lambda \geq 0} f(x_i + \lambda \cdot h_i)$

analytically. They change the spring energy so that it becomes a quartic polynomial in $x$ (and hence a quartic in $\lambda$ along the search line, whose minimum can be found by solving a cubic):

$E_s(x \in \mathbb{R}^{3n}) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert^2 - 1 \right)^2$
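Minimizing a one-variable quartic analytically amounts to solving its cubic derivative for roots and comparing candidates. The sketch below assumes the quartic coefficients of $f(x_i + \lambda h_i)$ have already been computed (that expansion depends on the mesh and is omitted); NumPy's polynomial root finder stands in for an explicit cubic formula.

```python
# Illustrative analytic minimization of a quartic q(lam) over lam >= 0:
# candidates are lam = 0 and the nonnegative real roots of q'(lam).
import numpy as np

def minimize_quartic(coeffs):
    """coeffs: [c4, c3, c2, c1, c0] of q(lam), highest degree first, c4 > 0."""
    q = np.poly1d(coeffs)
    dq = q.deriv()                         # the derivative is a cubic
    candidates = [0.0]
    for r in dq.roots:
        if abs(r.imag) < 1e-12 and r.real >= 0.0:
            candidates.append(r.real)
    return min(candidates, key=q)          # lambda with the smallest q(lambda)

# Made-up example quartic: q(lam) = lam^4 - 3 lam^2 + lam + 5
print(minimize_quartic([1.0, 0.0, -3.0, 1.0, 5.0]))
```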
Preview



Background
Steepest Descent
Conjugate Gradient