Chapter 10 Conjugate Direction Methods

An Introduction to Optimization
Spring, 2012
Wei-Ta Chu
2012/4/13
Introduction
Conjugate direction methods can be viewed as being
intermediate between the method of steepest descent and
Newton’s method.
They solve quadratics of $n$ variables in $n$ steps.
The usual implementation, the conjugate gradient algorithm,
requires no Hessian matrix evaluations.
No matrix inversion and no storage of an $n \times n$ matrix are required.
The conjugate direction methods typically perform better than
the method of steepest descent, but not as well as Newton’s
method.
Introduction
For a quadratic function of $n$ variables
$$ f(x) = \frac{1}{2} x^T Q x - x^T b, \quad x \in \mathbb{R}^n, \; Q = Q^T > 0, $$
the best direction of search is a $Q$-conjugate direction.
Basically, two directions $d^{(1)}$ and $d^{(2)}$ in $\mathbb{R}^n$ are said to be $Q$-conjugate if $d^{(1)T} Q d^{(2)} = 0$.
Definition 10.1: Let $Q$ be a real symmetric $n \times n$ matrix. The directions $d^{(0)}, d^{(1)}, \ldots, d^{(m)}$ are $Q$-conjugate if for all $i \neq j$, we have
$$ d^{(i)T} Q d^{(j)} = 0. $$
Introduction
Lemma 10.1: Let $Q$ be a symmetric positive definite $n \times n$ matrix. If the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)} \in \mathbb{R}^n$, $k \leq n-1$, are nonzero and $Q$-conjugate, then they are linearly independent.
Proof: Let $\alpha_0, \ldots, \alpha_k$ be scalars such that
$$ \alpha_0 d^{(0)} + \alpha_1 d^{(1)} + \cdots + \alpha_k d^{(k)} = 0. $$
Premultiplying this equality by $d^{(j)T} Q$, $0 \leq j \leq k$, yields
$$ \alpha_j d^{(j)T} Q d^{(j)} = 0, $$
because all other terms $d^{(j)T} Q d^{(i)} = 0$, $i \neq j$, by $Q$-conjugacy. But $Q = Q^T > 0$ and $d^{(j)} \neq 0$; hence, $\alpha_j = 0$, $j = 0, 1, \ldots, k$. Therefore, $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$ are linearly independent.
Example
Let
$$ Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}. $$
Note that $Q = Q^T$. The matrix $Q$ is positive definite because all its leading principal minors are positive:
$$ \Delta_1 = 3 > 0, \quad \Delta_2 = \det \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} = 12 > 0, \quad \Delta_3 = \det Q = 20 > 0. $$
Our goal is to construct a set of $Q$-conjugate vectors $d^{(0)}, d^{(1)}, d^{(2)}$.
Let $d^{(0)} = [1, 0, 0]^T$, $d^{(1)} = [d_1^{(1)}, d_2^{(1)}, d_3^{(1)}]^T$, $d^{(2)} = [d_1^{(2)}, d_2^{(2)}, d_3^{(2)}]^T$. We require that $d^{(0)T} Q d^{(1)} = 0$. We have
$$ d^{(0)T} Q d^{(1)} = 3 d_1^{(1)} + d_3^{(1)}. $$
Example
Let $d_1^{(1)} = 1$, $d_2^{(1)} = 0$, $d_3^{(1)} = -3$. Then, $d^{(1)} = [1, 0, -3]^T$, and thus $d^{(0)T} Q d^{(1)} = 0$.
To find the third vector $d^{(2)}$, which would be $Q$-conjugate with $d^{(0)}$ and $d^{(1)}$, we require that $d^{(0)T} Q d^{(2)} = 0$ and $d^{(1)T} Q d^{(2)} = 0$. We have
$$ d^{(0)T} Q d^{(2)} = 3 d_1^{(2)} + d_3^{(2)} = 0, \qquad d^{(1)T} Q d^{(2)} = -6 d_2^{(2)} - 8 d_3^{(2)} = 0. $$
If we take $d^{(2)} = [1, 4, -3]^T$, then the resulting set of vectors $d^{(0)}, d^{(1)}, d^{(2)}$ is mutually $Q$-conjugate.
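As a quick numerical check, the short NumPy sketch below verifies that the three directions constructed above are mutually $Q$-conjugate (the variable names are illustrative):

```python
import numpy as np

# Matrix and directions from the example above.
Q = np.array([[3.0, 0.0, 1.0],
              [0.0, 4.0, 2.0],
              [1.0, 2.0, 3.0]])
d = [np.array([1.0, 0.0, 0.0]),
     np.array([1.0, 0.0, -3.0]),
     np.array([1.0, 4.0, -3.0])]

# Check d_i^T Q d_j = 0 for all i != j.
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, d[i] @ Q @ d[j])   # each product should print 0.0
```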
The Conjugate Direction Algorithm
This method of finding $Q$-conjugate vectors is inefficient. A systematic procedure for finding $Q$-conjugate vectors can be devised using the idea underlying the Gram-Schmidt process of transforming a given basis of $\mathbb{R}^n$ into an orthonormal basis of $\mathbb{R}^n$.
The Conjugate Direction Algorithm
Consider minimizing the quadratic function of $n$ variables
$$ f(x) = \frac{1}{2} x^T Q x - x^T b, $$
where $Q = Q^T > 0$, $b \in \mathbb{R}^n$. Note that because $Q > 0$, the function $f$ has a global minimizer that can be found by solving $Qx = b$.
Basic Conjugate Direction Algorithm. Given a starting point $x^{(0)}$ and $Q$-conjugate directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$; for $k \geq 0$,
$$ g^{(k)} = \nabla f(x^{(k)}) = Q x^{(k)} - b, $$
$$ \alpha_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}, $$
$$ x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}. $$
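A minimal sketch of the basic conjugate direction algorithm for this quadratic, assuming the update formulas above (the function name and arguments are illustrative):

```python
import numpy as np

def conjugate_direction(Q, b, x0, directions):
    """Minimize f(x) = 0.5 x^T Q x - x^T b along given Q-conjugate directions."""
    x = np.asarray(x0, dtype=float)
    for d in directions:
        g = Q @ x - b                      # gradient g^(k) = Q x^(k) - b
        alpha = -(g @ d) / (d @ Q @ d)     # exact step along d^(k)
        x = x + alpha * d                  # x^(k+1) = x^(k) + alpha_k d^(k)
    return x
```

Given $n$ $Q$-conjugate directions, the returned point should agree with the solution of $Qx = b$.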
The Conjugate Direction Algorithm
Theorem 10.1: For any starting point $x^{(0)}$, the basic conjugate direction algorithm converges to the unique $x^*$ (that solves $Qx = b$) in $n$ steps; that is, $x^{(n)} = x^*$.
Proof: Consider $x^* - x^{(0)} \in \mathbb{R}^n$. Because the $d^{(i)}$ are linearly independent, there exist constants $\beta_i$, $i = 0, \ldots, n-1$, such that
$$ x^* - x^{(0)} = \beta_0 d^{(0)} + \cdots + \beta_{n-1} d^{(n-1)}. $$
Now premultiply both sides of this equation by $d^{(k)T} Q$, $0 \leq k \leq n-1$, to obtain
$$ d^{(k)T} Q (x^* - x^{(0)}) = \beta_k d^{(k)T} Q d^{(k)}, $$
where the terms $d^{(k)T} Q d^{(i)} = 0$, $i \neq k$, by the $Q$-conjugate property. Hence,
$$ \beta_k = \frac{d^{(k)T} Q (x^* - x^{(0)})}{d^{(k)T} Q d^{(k)}}. $$
Now, we can write
$$ x^{(k)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}. $$
Therefore,
$$ x^{(k)} - x^{(0)} = \alpha_0 d^{(0)} + \cdots + \alpha_{k-1} d^{(k-1)}. $$
So writing
$$ x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)}) $$
and premultiplying the above by $d^{(k)T} Q$, we obtain
$$ d^{(k)T} Q (x^* - x^{(0)}) = d^{(k)T} Q (x^* - x^{(k)}) = -d^{(k)T} g^{(k)}, $$
because $d^{(k)T} Q (x^{(k)} - x^{(0)}) = 0$ by $Q$-conjugacy, $Q x^* = b$, and $g^{(k)} = Q x^{(k)} - b$. Thus,
$$ \beta_k = -\frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}} = \alpha_k, $$
and hence $x^* = x^{(n)}$, which completes the proof.
Example
Find the minimizer of
$$ f(x_1, x_2) = \frac{1}{2} x^T \begin{bmatrix} 4 & 2 \\ 2 & 2 \end{bmatrix} x - x^T \begin{bmatrix} -1 \\ 1 \end{bmatrix} $$
using the conjugate direction method with the initial point $x^{(0)} = [0, 0]^T$ and $Q$-conjugate directions $d^{(0)} = [1, 0]^T$ and $d^{(1)} = [-3/8, 3/4]^T$.
We have $g^{(0)} = Q x^{(0)} - b = -b = [1, -1]^T$, and hence
$$ \alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}} = -\frac{1}{4}. $$
Thus,
$$ x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} = [-1/4, 0]^T. $$
To find $x^{(2)}$, we compute $g^{(1)} = Q x^{(1)} - b = [0, -3/2]^T$ and
$$ \alpha_1 = -\frac{g^{(1)T} d^{(1)}}{d^{(1)T} Q d^{(1)}} = -\frac{-9/8}{9/16} = 2. $$
Therefore,
$$ x^{(2)} = x^{(1)} + \alpha_1 d^{(1)} = [-1/4, 0]^T + 2\,[-3/8, 3/4]^T = [-1, 3/2]^T. $$
Because $f$ is a quadratic function in two variables, $x^{(2)} = x^*$.
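A short standalone check of this computation, assuming the data as given above:

```python
import numpy as np

# Data from the two-variable example above.
Q = np.array([[4.0, 2.0], [2.0, 2.0]])
b = np.array([-1.0, 1.0])
d0 = np.array([1.0, 0.0])
d1 = np.array([-3/8, 3/4])

print(d0 @ Q @ d1)                # 0.0: the two directions are Q-conjugate
print(np.linalg.solve(Q, b))      # [-1.  1.5]: agrees with x^(2) found above
```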
The Conjugate Direction Algorithm
For a quadratic function of $n$ variables, the conjugate direction method reaches the solution after $n$ steps.
Suppose that we start at $x^{(0)}$ and search in the direction $d^{(0)}$ to obtain
$$ x^{(1)} = x^{(0)} - \frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}}\, d^{(0)}. $$
We claim that
$$ g^{(1)T} d^{(0)} = 0. $$
The equation $x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}$, where $\alpha_0 = -g^{(0)T} d^{(0)} / (d^{(0)T} Q d^{(0)})$, implies that $x^{(1)}$ has the property that $g^{(1)T} d^{(0)} = 0$. To see this, apply the chain rule to $\phi_0(\alpha) = f(x^{(0)} + \alpha d^{(0)})$ to get
$$ \frac{d\phi_0}{d\alpha}(\alpha) = \nabla f(x^{(0)} + \alpha d^{(0)})^T d^{(0)} = g^{(0)T} d^{(0)} + \alpha\, d^{(0)T} Q d^{(0)}. $$
Evaluating the above at $\alpha = \alpha_0$, we get
$$ \frac{d\phi_0}{d\alpha}(\alpha_0) = g^{(1)T} d^{(0)} = 0. $$
Because $\phi_0$ is a quadratic function of $\alpha$, and the coefficient of the $\alpha^2$ term in $\phi_0$ is $\frac{1}{2} d^{(0)T} Q d^{(0)} > 0$, the above also implies that $\alpha_0 = \arg\min_{\alpha} \phi_0(\alpha)$; that is, $x^{(1)}$ minimizes $f$ along the line through $x^{(0)}$ in the direction $d^{(0)}$.
The Conjugate Direction Algorithm
Using a similar argument, we can show that for all $k$,
$$ g^{(k+1)T} d^{(k)} = 0, $$
and hence,
$$ \alpha_k = \arg\min_{\alpha} f(x^{(k)} + \alpha d^{(k)}). $$
Lemma 10.2: In the conjugate direction algorithm,
$$ g^{(k+1)T} d^{(i)} = 0 $$
for all $k$, $0 \leq k \leq n-1$, and $0 \leq i \leq k$.
Proof: Note that
$$ Q(x^{(k+1)} - x^{(k)}) = g^{(k+1)} - g^{(k)}, $$
because $g^{(k)} = Q x^{(k)} - b$, and $x^{(k+1)} - x^{(k)} = \alpha_k d^{(k)}$. Thus,
$$ g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}. $$
The Conjugate Direction Algorithm
We prove the lemma by induction. The result is true for $k = 0$ because
$$ g^{(1)T} d^{(0)} = 0, $$
as shown above. We now show that if the result is true for $k-1$ (i.e., $g^{(k)T} d^{(i)} = 0$ for $i \leq k-1$), then it is true for $k$, i.e., $g^{(k+1)T} d^{(i)} = 0$ for $i \leq k$.
Fix $k > 0$ and $i \leq k-1$. By the induction hypothesis, $g^{(k)T} d^{(i)} = 0$.
Because
$$ g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}, $$
and $d^{(k)T} Q d^{(i)} = 0$ by the $Q$-conjugacy, we have
$$ g^{(k+1)T} d^{(i)} = g^{(k)T} d^{(i)} + \alpha_k d^{(k)T} Q d^{(i)} = 0. $$
It remains to be shown that
$$ g^{(k+1)T} d^{(k)} = 0. $$
The Conjugate Direction Algorithm
Indeed,
$$ g^{(k+1)T} d^{(k)} = (g^{(k)} + \alpha_k Q d^{(k)})^T d^{(k)} = g^{(k)T} d^{(k)} - \frac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}\, d^{(k)T} Q d^{(k)} = 0, $$
because $\alpha_k = -g^{(k)T} d^{(k)} / (d^{(k)T} Q d^{(k)})$.
Therefore, by induction, $g^{(k+1)T} d^{(i)} = 0$ for all $0 \leq k \leq n-1$ and $0 \leq i \leq k$.
The Conjugate Direction Algorithm
By Lemma 10.2 we see that $g^{(k+1)}$ is orthogonal to any vector from the subspace spanned by $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$.
We now show that $x^{(k+1)}$ not only satisfies $g^{(k+1)T} d^{(k)} = 0$, but also minimizes $f$ over the set $x^{(0)} + \operatorname{span}[d^{(0)}, \ldots, d^{(k)}]$.
In other words, if we write $\mathcal{V}_k = \operatorname{span}[d^{(0)}, d^{(1)}, \ldots, d^{(k)}]$, then we can express
$$ x^{(k+1)} = \arg\min_{x \in x^{(0)} + \mathcal{V}_k} f(x). $$
As $k$ increases, the subspace $\mathcal{V}_k$ "expands," and will eventually fill the whole of $\mathbb{R}^n$ (provided that $d^{(0)}, \ldots, d^{(n-1)}$ are linearly independent).
The Conjugate Direction Algorithm
Therefore, for some sufficiently large $k$, $x^*$ will lie in $x^{(0)} + \mathcal{V}_k$. For this reason, the above result is sometimes called the expanding subspace theorem.
To prove the expanding subspace theorem, define the matrix
$$ D^{(k)} = [d^{(0)}, d^{(1)}, \ldots, d^{(k)}]. $$
Note that $x^{(0)} + \mathcal{V}_k = \{ x^{(0)} + D^{(k)} a : a \in \mathbb{R}^{k+1} \}$. Also,
$$ x^{(k+1)} = x^{(0)} + \alpha_0 d^{(0)} + \cdots + \alpha_k d^{(k)} = x^{(0)} + D^{(k)} \alpha, \quad \alpha = [\alpha_0, \ldots, \alpha_k]^T. $$
Hence, $x^{(k+1)} \in x^{(0)} + \mathcal{V}_k$.
The Conjugate Direction Algorithm
Now, consider any vector $x \in x^{(0)} + \mathcal{V}_k$. There exists a vector $a$ such that $x = x^{(0)} + D^{(k)} a$. Let $\phi_k(a) = f(x^{(0)} + D^{(k)} a)$. Note that $\phi_k$ is a quadratic function of $a$ and has a unique minimizer that satisfies the FONC. By the chain rule,
$$ \nabla \phi_k(a) = D^{(k)T} \nabla f(x^{(0)} + D^{(k)} a). $$
Therefore,
$$ \nabla \phi_k(\alpha) = D^{(k)T} \nabla f(x^{(k+1)}) = D^{(k)T} g^{(k+1)}. $$
By Lemma 10.2, $D^{(k)T} g^{(k+1)} = 0$. Therefore, $\alpha$ satisfies the FONC for the quadratic function $\phi_k$, and hence $\alpha$ is the minimizer of $\phi_k$; that is,
$$ f(x^{(k+1)}) = \min_{x \in x^{(0)} + \mathcal{V}_k} f(x). $$
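A small numerical illustration of Lemma 10.2 and the expanding subspace property; the quadratic and directions below are illustrative choices (eigenvectors of a symmetric $Q$ are one convenient source of $Q$-conjugate directions):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)          # an illustrative symmetric positive definite matrix
b = rng.standard_normal(4)

# Eigenvectors of a symmetric Q are mutually Q-conjugate, so they serve as directions.
_, V = np.linalg.eigh(Q)
directions = [V[:, i] for i in range(4)]

x = np.zeros(4)
for k, d in enumerate(directions):
    g = Q @ x - b
    alpha = -(g @ d) / (d @ Q @ d)
    x = x + alpha * d
    g_next = Q @ x - b
    # Lemma 10.2: the new gradient is orthogonal to every direction used so far.
    print(k, max(abs(g_next @ directions[i]) for i in range(k + 1)))  # ~0 each step

print(np.allclose(x, np.linalg.solve(Q, b)))   # True: x^(n) = x*
```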
The Conjugate Gradient Algorithm
The conjugate direction algorithm is very effective. However, to use the algorithm, we need to specify the $Q$-conjugate directions.
Fortunately, there is a way to generate $Q$-conjugate directions as we perform iterations.
The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm proceeds.
At each stage of the algorithm, the direction is calculated as a linear combination of the previous direction and the current gradient, in such a way that all the directions are mutually $Q$-conjugate.
The Conjugate Gradient Algorithm
For a quadratic function of $n$ variables, we can locate the function minimizer by performing $n$ searches along mutually conjugate directions.
We consider the quadratic function
$$ f(x) = \frac{1}{2} x^T Q x - x^T b, $$
where $Q = Q^T > 0$. Our first search direction from an initial point $x^{(0)}$ is in the direction of steepest descent; that is, $d^{(0)} = -g^{(0)}$, where $g^{(0)} = \nabla f(x^{(0)}) = Q x^{(0)} - b$.
Thus,
$$ x^{(1)} = x^{(0)} + \alpha_0 d^{(0)}, \quad \text{where} \quad \alpha_0 = \arg\min_{\alpha \geq 0} f(x^{(0)} + \alpha d^{(0)}) = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}}. $$
The Conjugate Gradient Algorithm
We next search in a direction $d^{(1)}$ that is $Q$-conjugate to $d^{(0)}$. We choose $d^{(1)}$ as a linear combination of $g^{(1)}$ and $d^{(0)}$. In general, at the $(k+1)$th step, we choose $d^{(k+1)}$ as a linear combination of $g^{(k+1)}$ and $d^{(k)}$. Specifically, we choose
$$ d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}, \quad k = 0, 1, 2, \ldots $$
The coefficients $\beta_k$, $k = 0, 1, \ldots$, are chosen in such a way that $d^{(k+1)}$ is $Q$-conjugate to $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$. This is accomplished by choosing $\beta_k$ to be
$$ \beta_k = \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}. $$
The Conjugate Gradient Algorithm
The algorithm:
1. Set $k := 0$; select the initial point $x^{(0)}$.
2. $g^{(0)} = \nabla f(x^{(0)})$. If $g^{(0)} = 0$, stop; else, set $d^{(0)} = -g^{(0)}$.
3. $\alpha_k = -\dfrac{g^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}$.
4. $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$.
5. $g^{(k+1)} = \nabla f(x^{(k+1)})$. If $g^{(k+1)} = 0$, stop.
6. $\beta_k = \dfrac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}$.
7. $d^{(k+1)} = -g^{(k+1)} + \beta_k d^{(k)}$.
8. Set $k := k + 1$; go to step 3.
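A minimal sketch of these eight steps for the quadratic case, assuming the formulas above; the function name and the tolerance-based stopping test are illustrative:

```python
import numpy as np

def conjugate_gradient_quadratic(Q, b, x0, tol=1e-10):
    """Minimize f(x) = 0.5 x^T Q x - x^T b with Q symmetric positive definite."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                            # step 2: initial gradient
    d = -g                                   # initial direction: steepest descent
    for _ in range(len(b)):                  # at most n steps for an n-variable quadratic
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ Q @ d)       # step 3
        x = x + alpha * d                    # step 4
        g = Q @ x - b                        # step 5
        beta = (g @ (Q @ d)) / (d @ Q @ d)   # step 6
        d = -g + beta * d                    # step 7
    return x
```

After at most $n$ iterations the returned point agrees with the solution of $Qx = b$.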
Example
Proposition 10.1: In the conjugate gradient algorithm, the directions $d^{(0)}, d^{(1)}, \ldots, d^{(n-1)}$ are $Q$-conjugate.
Consider the quadratic function
$$ f(x_1, x_2, x_3) = \frac{3}{2} x_1^2 + 2 x_2^2 + \frac{3}{2} x_3^2 + x_1 x_3 + 2 x_2 x_3 - 3 x_1 - x_3. $$
We find the minimizer using the conjugate gradient algorithm, using the starting point $x^{(0)} = [0, 0, 0]^T$.
We can represent $f$ as $f(x) = \frac{1}{2} x^T Q x - x^T b$, where
$$ Q = \begin{bmatrix} 3 & 0 & 1 \\ 0 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}. $$
Example
We have $g^{(0)} = Q x^{(0)} - b = -b = [-3, 0, -1]^T$, so $d^{(0)} = -g^{(0)} = [3, 0, 1]^T$ and
$$ \alpha_0 = -\frac{g^{(0)T} d^{(0)}}{d^{(0)T} Q d^{(0)}} = \frac{10}{36} \approx 0.2778. $$
Hence,
$$ x^{(1)} = x^{(0)} + \alpha_0 d^{(0)} \approx [0.8333, 0, 0.2778]^T, \qquad g^{(1)} = Q x^{(1)} - b \approx [-0.2222, 0.5556, 0.6667]^T. $$
Example
Hence,
$$ \beta_0 = \frac{g^{(1)T} Q d^{(0)}}{d^{(0)T} Q d^{(0)}} \approx 0.0802, \qquad d^{(1)} = -g^{(1)} + \beta_0 d^{(0)} \approx [0.4630, -0.5556, -0.5864]^T. $$
Repeating the updates for two more iterations gives $x^{(2)}$ and then $x^{(3)} = x^*$, the solution of $Qx = b$.
Proof of Proposition 10.1
We use induction. We first show that $d^{(0)T} Q d^{(1)} = 0$.
Substituting for $d^{(1)} = -g^{(1)} + \beta_0 d^{(0)}$ and using the expression for $\beta_0$, we see that
$$ d^{(0)T} Q d^{(1)} = d^{(0)T} Q \left( -g^{(1)} + \frac{g^{(1)T} Q d^{(0)}}{d^{(0)T} Q d^{(0)}}\, d^{(0)} \right) = -d^{(0)T} Q g^{(1)} + g^{(1)T} Q d^{(0)} = 0. $$
Assume now that $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$, $k < n-1$, are $Q$-conjugate directions. From Lemma 10.2 we have
$$ g^{(k+1)T} d^{(j)} = 0, \quad j = 0, 1, \ldots, k. $$
Thus, $g^{(k+1)}$ is orthogonal to each of the directions $d^{(0)}, d^{(1)}, \ldots, d^{(k)}$.
We now show that
$$ g^{(k+1)T} g^{(j)} = 0, \quad j = 0, 1, \ldots, k. $$
Proof of Proposition 10.1
Fix $j \in \{0, \ldots, k\}$. We have
$$ d^{(j)} = -g^{(j)} + \beta_{j-1} d^{(j-1)}. $$
Substituting this equation into the previous one yields
$$ 0 = g^{(k+1)T} d^{(j)} = -g^{(k+1)T} g^{(j)} + \beta_{j-1}\, g^{(k+1)T} d^{(j-1)}. $$
Because $g^{(k+1)T} d^{(j-1)} = 0$ as well, it follows that $g^{(k+1)T} g^{(j)} = 0$.
We are now ready to show that
$$ d^{(k+1)T} Q d^{(j)} = 0, \quad j = 0, \ldots, k. $$
We have
$$ d^{(k+1)T} Q d^{(j)} = (-g^{(k+1)} + \beta_k d^{(k)})^T Q d^{(j)}. $$
If $j < k$, then $d^{(k)T} Q d^{(j)} = 0$, by virtue of the induction hypothesis. Hence, we have
$$ d^{(k+1)T} Q d^{(j)} = -g^{(k+1)T} Q d^{(j)}. $$
Proof of Proposition 10.1
But $Q d^{(j)} = (g^{(j+1)} - g^{(j)}) / \alpha_j$. Because $g^{(k+1)T} g^{(j)} = g^{(k+1)T} g^{(j+1)} = 0$, we get
$$ d^{(k+1)T} Q d^{(j)} = -\frac{g^{(k+1)T} (g^{(j+1)} - g^{(j)})}{\alpha_j} = 0. $$
Thus, $d^{(k+1)T} Q d^{(j)} = 0$ for $j = 0, \ldots, k-1$. It remains to show that $d^{(k+1)T} Q d^{(k)} = 0$. Using the expression for $\beta_k$, we get
$$ d^{(k+1)T} Q d^{(k)} = (-g^{(k+1)} + \beta_k d^{(k)})^T Q d^{(k)} = -g^{(k+1)T} Q d^{(k)} + \frac{g^{(k+1)T} Q d^{(k)}}{d^{(k)T} Q d^{(k)}}\, d^{(k)T} Q d^{(k)} = 0, $$
which completes the proof.
The Conjugate Gradient Algorithm for Nonquadratic Problems
The algorithm can be extended to general nonlinear functions by interpreting $f(x) = \frac{1}{2} x^T Q x - x^T b$ as a second-order Taylor series approximation of the objective function.
For a quadratic, the matrix $Q$, the Hessian of the quadratic, is constant. However, for a general nonlinear function the Hessian is a matrix that has to be reevaluated at each iteration of the algorithm.
Observe that $Q$ appears only in the computation of the scalars $\alpha_k$ and $\beta_k$. Because $\alpha_k = \arg\min_{\alpha \geq 0} f(x^{(k)} + \alpha d^{(k)})$ can be replaced by a numerical line search procedure, we need only concern ourselves with the formula for $\beta_k$.
The Conjugate Gradient Algorithm for Nonquadratic Problems
Hestenes-Stiefel Formula. We replace the term $Q d^{(k)}$ by the term $(g^{(k+1)} - g^{(k)}) / \alpha_k$. The two terms are equal in the quadratic case.
To see this, note that $x^{(k+1)} = x^{(k)} + \alpha_k d^{(k)}$. Premultiplying both sides by $Q$, subtracting $b$ from both sides, and recognizing that $g^{(k)} = Q x^{(k)} - b$, we get $g^{(k+1)} = g^{(k)} + \alpha_k Q d^{(k)}$, which we can rewrite as $Q d^{(k)} = (g^{(k+1)} - g^{(k)}) / \alpha_k$.
Therefore, we obtain the Hestenes-Stiefel formula
$$ \beta_k = \frac{g^{(k+1)T} (g^{(k+1)} - g^{(k)})}{d^{(k)T} (g^{(k+1)} - g^{(k)})}. $$
The Conjugate Gradient Algorithm for Nonquadratic Problems
Polak-Ribiere Formula. Starting from the Hestenes-Stiefel formula, we multiply out the denominator to get
$$ d^{(k)T} (g^{(k+1)} - g^{(k)}) = d^{(k)T} g^{(k+1)} - d^{(k)T} g^{(k)}. $$
By Lemma 10.2, $d^{(k)T} g^{(k+1)} = 0$. Also, since $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$, premultiplying this by $g^{(k)T}$, we get
$$ g^{(k)T} d^{(k)} = -g^{(k)T} g^{(k)} + \beta_{k-1}\, g^{(k)T} d^{(k-1)} = -g^{(k)T} g^{(k)}, $$
where once again we used Lemma 10.2. Hence, we get the Polak-Ribiere formula
$$ \beta_k = \frac{g^{(k+1)T} (g^{(k+1)} - g^{(k)})}{g^{(k)T} g^{(k)}}. $$
The Conjugate Gradient Algorithm for Nonquadratic Problems
Fletcher-Reeves Formula. Starting with the Polak-Ribiere formula, we multiply out the numerator to get
$$ g^{(k+1)T} (g^{(k+1)} - g^{(k)}) = g^{(k+1)T} g^{(k+1)} - g^{(k+1)T} g^{(k)}. $$
We now use $g^{(k+1)T} g^{(k)} = 0$, which we get by using the equation $d^{(k)} = -g^{(k)} + \beta_{k-1} d^{(k-1)}$ and applying Lemma 10.2. This leads to the Fletcher-Reeves formula
$$ \beta_k = \frac{g^{(k+1)T} g^{(k+1)}}{g^{(k)T} g^{(k)}}. $$
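As a rough illustration of how the three formulas are used when $Q$ is unavailable, the sketch below combines them with a simple backtracking line search standing in for the exact minimization; the function names, the line search, and its constants are illustrative assumptions, not taken from the text:

```python
import numpy as np

def backtracking(f, x, d, g, step=1.0, rho=0.5, c=1e-4):
    """Crude Armijo backtracking line search standing in for the exact alpha_k."""
    while f(x + step * d) > f(x) + c * step * (g @ d) and step > 1e-12:
        step *= rho
    return step

def nonlinear_cg(f, grad, x0, formula="FR", iters=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        if g @ d >= 0:                  # safeguard: restart with steepest descent
            d = -g
        alpha = backtracking(f, x, d, g)
        x_new = x + alpha * d
        g_new = grad(x_new)
        y = g_new - g
        if formula == "HS":             # Hestenes-Stiefel
            beta = (g_new @ y) / (d @ y)
        elif formula == "PR":           # Polak-Ribiere
            beta = (g_new @ y) / (g @ g)
        else:                           # Fletcher-Reeves
            beta = (g_new @ g_new) / (g @ g)
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

Only the computation of $\beta_k$ differs between the three variants; the rest of the iteration is identical.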
The Conjugate Gradient Algorithm for Nonquadratic Problems
Without the Hessian matrix $Q$, all we need are the objective function and gradient values at each iteration. For the quadratic case the three expressions for $\beta_k$ are exactly equal.
A very important issue in minimization problems of nonquadratic functions is the line search. If the line search is known to be inaccurate, the Hestenes-Stiefel formula for $\beta_k$ is recommended.
Homework 2
Exercises 8.3, 8.15
Exercise 9.1
Exercise 10.9
Hand in your homework at the class on Apr. 27.