Gradient Methods
April 2004
Preview
Background
Steepest Descent
Conjugate Gradient
Background
Motivation
The gradient notion
The Wolfe Theorems
Motivation
The min (max) problem:
$$\min_{x} f(x)$$
But we learned in calculus how to solve that
kind of question!
Motivation
Not exactly.
Functions: $f : \mathbb{R}^n \to \mathbb{R}$
High-order polynomials, e.g.:
$$x - \frac{1}{6}x^3 + \frac{1}{120}x^5 - \frac{1}{5040}x^7$$
What about functions that don't have an analytic representation: a "Black Box"?
Motivation: a "real world" problem
Connectivity Shapes (Isenburg, Gumhold, Gotsman)
mesh = $\{C(V, E), \text{geometry}\}$
What do we get from C alone, without the geometry?
Motivation: a "real world" problem
First we introduce error functionals and then try to minimize them (a code sketch follows):
$$E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert - 1 \right)^2$$
$$E_r\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{i=1}^{n} \lVert L(x_i) \rVert^2, \qquad L(x_i) = \frac{1}{d_i} \sum_{(i,j) \in E} x_j \;-\; x_i$$
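To make the functionals concrete, here is a minimal sketch, assuming vertex positions in an n x 3 NumPy array and a Python edge list; the names spring_energy and roughness_energy are mine, not from the paper.

```python
# A minimal sketch of the two error functionals above. Assumes every
# vertex has at least one incident edge, so d_i > 0.
import numpy as np

def spring_energy(x, edges):
    """E_s(x) = sum over (i,j) in E of (||x_i - x_j|| - 1)^2."""
    return sum((np.linalg.norm(x[i] - x[j]) - 1.0) ** 2 for i, j in edges)

def roughness_energy(x, edges):
    """E_r(x) = sum_i ||L(x_i)||^2, L(x_i) = (1/d_i) sum_j x_j - x_i."""
    n = x.shape[0]
    deg = np.zeros(n)
    nbr_sum = np.zeros_like(x)
    for i, j in edges:                   # each edge contributes both ways
        nbr_sum[i] += x[j]; nbr_sum[j] += x[i]
        deg[i] += 1.0; deg[j] += 1.0
    lap = nbr_sum / deg[:, None] - x     # one row L(x_i) per vertex
    return float((lap ** 2).sum())

# Tiny example: a triangle (3 vertices, 3 edges).
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.8, 0.0]])
E = [(0, 1), (1, 2), (2, 0)]
print(spring_energy(x, E), roughness_energy(x, E))
```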
Motivation: a "real world" problem
Then we minimize:
$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$$
This is a high-dimensional, non-linear problem.
The authors use the conjugate gradient method, which is perhaps the most popular of the optimization techniques we will see here.
Motivation: a "real world" problem
Changing the parameter $\lambda$:
$$E(C, \lambda) = \arg\min_{x \in \mathbb{R}^{n \times 3}} \left[ (1 - \lambda) E_s(x) + \lambda E_r(x) \right]$$
Motivation
General problem: find a global min (max).
This lecture will concentrate on finding a local minimum.
Background
Motivation
The gradient notion
The Wolfe Theorems
An example surface (plotted in the original slides):
$$f := (x, y) \mapsto \cos\left(\tfrac{1}{2}x\right)\cos\left(\tfrac{1}{2}y\right)x$$
Directional Derivatives:
First, the one-dimensional derivative:
Directional Derivatives: Along the Axes…
$$\frac{\partial f(x, y)}{\partial x}, \qquad \frac{\partial f(x, y)}{\partial y}$$
Directional Derivatives: In a General Direction…
$$v \in \mathbb{R}^2, \quad \lVert v \rVert = 1: \qquad \frac{\partial f(x, y)}{\partial v}$$
Directional Derivatives
(Figure: the axis-aligned derivatives $\frac{\partial f(x,y)}{\partial x}$ and $\frac{\partial f(x,y)}{\partial y}$ as slopes of the surface.)
The Gradient: Definition in $\mathbb{R}^2$
$$f : \mathbb{R}^2 \to \mathbb{R}, \qquad \nabla f(x, y) := \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$$
In the plane.
The Gradient: Definition
$$f : \mathbb{R}^n \to \mathbb{R}, \qquad \nabla f(x_1, \ldots, x_n) := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$$
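Since the motivation mentioned "black box" functions, here is a hedged sketch (my own, not from the slides) of approximating this definition numerically by central finite differences; the step size eps is a typical untuned choice.

```python
# A minimal sketch: approximating grad f = (df/dx_1, ..., df/dx_n)
# for a black-box f by central finite differences.
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        grad[k] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return grad

# Example: f(x, y) = x^2 + 3y^2, whose true gradient is (2x, 6y).
f = lambda p: p[0] ** 2 + 3.0 * p[1] ** 2
print(numerical_gradient(f, [1.0, 2.0]))  # ~ [2.0, 12.0]
```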
The Gradient Properties
The gradient defines the (hyper)plane approximating the function infinitesimally:
$$\Delta z = \frac{\partial f}{\partial x} \Delta x + \frac{\partial f}{\partial y} \Delta y$$
The Gradient properties
By the chain rule (important for later use), for $\lVert v \rVert = 1$:
$$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p , v \right\rangle$$
The Gradient properties
Proposition 1: $\frac{\partial f}{\partial v}$ is maximal when choosing $v = \frac{(\nabla f)_p}{\lVert (\nabla f)_p \rVert}$,
and minimal when choosing $v = -\frac{(\nabla f)_p}{\lVert (\nabla f)_p \rVert}$.
(Intuition: the gradient points in the direction of greatest change.)
The Gradient properties
Proof (only for the minimum case):
Assign $v = -\frac{(\nabla f)_p}{\lVert (\nabla f)_p \rVert}$; by the chain rule:
$$\frac{\partial f}{\partial v}(p) = \left\langle (\nabla f)_p , -\frac{(\nabla f)_p}{\lVert (\nabla f)_p \rVert} \right\rangle = -\frac{\left\langle (\nabla f)_p , (\nabla f)_p \right\rangle}{\lVert (\nabla f)_p \rVert} = -\frac{\lVert (\nabla f)_p \rVert^2}{\lVert (\nabla f)_p \rVert} = -\lVert (\nabla f)_p \rVert$$
The Gradient properties
On the other hand, for a general $v$ with $\lVert v \rVert = 1$, by Cauchy-Schwarz:
$$\left| \frac{\partial f}{\partial v}(p) \right| = \left| \left\langle (\nabla f)_p , v \right\rangle \right| \le \lVert (\nabla f)_p \rVert \, \lVert v \rVert = \lVert (\nabla f)_p \rVert$$
so $\frac{\partial f}{\partial v}(p) \ge -\lVert (\nabla f)_p \rVert$, and the bound is attained by the choice above. $\square$
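A small numeric illustration of Proposition 1 (my own, not in the slides): sampling random unit vectors shows $\langle (\nabla f)_p, v \rangle$ never drops below $-\lVert (\nabla f)_p \rVert$, and the bound is attained at $v = -(\nabla f)_p / \lVert (\nabla f)_p \rVert$.

```python
# Numeric check of Proposition 1: among unit vectors v, <g, v> is
# bounded below by -||g||. Here g stands in for (grad f)_p.
import numpy as np

rng = np.random.default_rng(0)
g = np.array([2.0, -1.0, 0.5])
vs = rng.normal(size=(100_000, 3))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)   # random unit vectors

print((vs @ g).min())                  # close to, but not below, -||g||
print(-np.linalg.norm(g))              # the bound -||g||
print(g @ (-g / np.linalg.norm(g)))    # exactly -||g|| at the minimizer
```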
The Gradient Properties
Proposition 2: let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth $C^1$ function around $p$.
If $f$ has a local minimum (maximum) at $p$, then
$$(\nabla f)_p = 0$$
(Intuition: this is a necessary condition for a local min (max).)
The Gradient Properties
Proof:
Intuitively: at a local minimum no direction is downhill, so every directional derivative must vanish.
The Gradient Properties
Formally: for any $v \in \mathbb{R}^n \setminus \{0\}$ we get
$$\frac{d f(p + t v)}{d t}(0) = \left\langle (\nabla f)_p , v \right\rangle = 0$$
and since this holds for every $v$, it follows that $(\nabla f)_p = 0$. $\square$
The Gradient Properties
We found the best INFINITESIMAL DIRECTION at each point.
Looking for a minimum: a "blind man" procedure.
How can we derive the way to the minimum using this knowledge?
Background
Motivation
The gradient notion
The Wolfe Theorems
The Wolfe Theorem
This is the link from the previous gradient properties to the constructive algorithm.
The problem:
$$\min_x f(x)$$
The Wolfe Theorem
We introduce a model for the algorithm (a code sketch follows the steps):
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$, stop; else, compute a search direction $h_i \in \mathbb{R}^n$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
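A hedged sketch of the model algorithm in code, with the direction rule and the one-dimensional minimization left as caller-supplied parameters (both names are mine); the stopping test uses a tolerance, since exact $\nabla f = 0$ is rarely hit in floating point.

```python
# A minimal sketch of the model algorithm. Concrete choices for
# `search_direction` and `line_search` come later in the slides.
import numpy as np

def model_algorithm(f, grad_f, search_direction, line_search,
                    x0, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)          # Data: x_0 in R^n
    for _ in range(max_iter):                # Step 0: i = 0, then loop
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:         # Step 1: grad f(x_i) = 0 -> stop
            break
        h = search_direction(x, g)           # Step 1: direction h_i
        lam = line_search(f, x, h)           # Step 2: argmin over lam >= 0
        x = x + lam * h                      # Step 3: x_{i+1} = x_i + lam_i h_i
    return x
```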
The Wolfe Theorem
The Theorem: suppose $f : \mathbb{R}^n \to \mathbb{R}$ is $C^1$ smooth, and there exists a continuous function
$$k : \mathbb{R}^n \to [0, 1]$$
with
$$\forall x : \nabla f(x) \ne 0 \;\Rightarrow\; k(x) > 0,$$
and the search vectors constructed by the model algorithm satisfy:
$$\left\langle -\nabla f(x_i), h_i \right\rangle \ge k(x_i) \, \lVert \nabla f(x_i) \rVert \, \lVert h_i \rVert$$
The Wolfe Theorem
And:
$$\nabla f(x_i) \ne 0 \;\Rightarrow\; h_i \ne 0$$
Then if $\{x_i\}_{i=0}^{\infty}$ is the sequence constructed by the algorithm model, any accumulation point $y$ of this sequence satisfies:
$$\nabla f(y) = 0$$
The Wolfe Theorem
The theorem has a very intuitive interpretation: always go in a descent direction, i.e. keep the angle between $h_i$ and $-\nabla f(x_i)$ acute.
Preview
Background
Steepest Descent
Conjugate Gradient
Steepest Descent
What does it mean?
We now use what we have learned to implement the most basic minimization technique.
First we introduce the algorithm, which is a version of the model algorithm.
The problem:
$$\min_x f(x)$$
Steepest Descent
Steepest descent algorithm (a code sketch follows):
Data: $x_0 \in \mathbb{R}^n$
Step 0: set $i = 0$
Step 1: if $\nabla f(x_i) = 0$, stop; else, set the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$ and go to Step 1
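Instantiating the model: a hedged, self-contained sketch of steepest descent with $h_i = -\nabla f(x_i)$. The exact step size is approximated here with SciPy's bounded scalar minimizer over an assumed interval $[0, 10]$; any one-dimensional minimizer over $\lambda \ge 0$ would do.

```python
# A minimal sketch of steepest descent with an (approximate) exact line search.
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) <= tol:
            break
        h = -g                                    # h_i = -grad f(x_i)
        phi = lambda lam, x=x, h=h: f(x + lam * h)
        lam = minimize_scalar(phi, bounds=(0.0, 10.0), method='bounded').x
        x = x + lam * h
    return x

# Example: f(x, y) = x^2 + 3y^2; the iterates zigzag toward the origin.
f = lambda p: p[0] ** 2 + 3.0 * p[1] ** 2
grad_f = lambda p: np.array([2.0 * p[0], 6.0 * p[1]])
print(steepest_descent(f, grad_f, [4.0, 1.0]))    # ~ [0, 0]
```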
Steepest Descent
Theorem: if $\{x_i\}_{i=0}^{\infty}$ is a sequence constructed by the SD algorithm, then every accumulation point $y$ of the sequence satisfies:
$$\nabla f(y) = 0$$
Proof: from the Wolfe theorem.
Remark: the Wolfe theorem gives us numerical stability if the derivatives aren't given exactly (i.e., are calculated numerically).
Steepest Descent
From the chain rule:
$$\frac{d}{d\lambda} f(x_i + \lambda_i h_i) = \left\langle \nabla f(x_i + \lambda_i h_i), h_i \right\rangle = 0$$
Therefore each new gradient is orthogonal to the previous search direction, and the method of steepest descent looks like this:
Steepest Descent
(Figure: the characteristic zigzag path of steepest descent; consecutive steps are orthogonal.)
Steepest Descent
Steepest descent finds a critical point, typically a local minimum.
Implicit step-size rule: we have actually reduced the problem to a one-dimensional minimization $f : \mathbb{R} \to \mathbb{R}$ along the search ray.
There are extensions that give the step-size rule in a discrete sense (Armijo); a sketch follows.
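A hedged sketch of the Armijo (backtracking) rule just mentioned: instead of solving the one-dimensional problem exactly, shrink $\lambda$ until a sufficient-decrease condition holds. The constants sigma and beta below are conventional defaults, not taken from the slides.

```python
# A minimal sketch of the Armijo backtracking step-size rule.
import numpy as np

def armijo_step(f, grad_f, x, h, lam0=1.0, beta=0.5, sigma=1e-4):
    """Shrink lam until f(x + lam*h) <= f(x) + sigma*lam*<grad f(x), h>."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    slope = float(grad_f(x) @ h)     # directional derivative; < 0 for descent h
    lam = lam0
    while f(x + lam * h) > f(x) + sigma * lam * slope and lam > 1e-12:
        lam *= beta                  # backtrack
    return lam
```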
Steepest Descent
Back to our connectivity shapes: the authors solve the one-dimensional problem analytically.
$$\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$$
They change the spring energy and get a quartic polynomial in $\lambda$:
$$E_s\left(x \in \mathbb{R}^{n \times 3}\right) = \sum_{(i,j) \in E} \left( \lVert x_i - x_j \rVert^2 - 1 \right)^2$$
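Why this helps: along the search ray, $\varphi(\lambda) = E_s(x + \lambda h)$ is a quartic polynomial, so the exact minimizer over $\lambda \ge 0$ can be read off the real roots of the cubic $\varphi'$. A hedged sketch with illustrative coefficients (not from the paper):

```python
# A minimal sketch: exact minimization of a quartic phi(lam) over
# lam >= 0 via the roots of its cubic derivative.
import numpy as np

def minimize_quartic(coeffs):
    """coeffs: [c4, c3, c2, c1, c0], highest degree first, with c4 > 0."""
    phi = np.poly1d(coeffs)
    crit = phi.deriv().roots                             # cubic phi' = 0
    cand = [r.real for r in crit
            if abs(r.imag) < 1e-12 and r.real >= 0.0]    # real, feasible roots
    cand.append(0.0)                                     # boundary lam = 0
    return min(cand, key=phi)

print(minimize_quartic([1.0, -2.0, -1.0, 0.5, 3.0]))     # illustrative quartic
```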
Preview
Background
Steepest Descent
Conjugate Gradient