
LMBOPT – A limited memory method
for bound-constrained optimization
Arnold Neumaier
Behzad Azmi
Fakultät für Mathematik, Universität Wien
Nordbergstr. 15, A-1090 Wien, Austria
email: [email protected]
WWW: http://www.mat.univie.ac.at/~neum/
/matlab/opt/LM/lmbopt1.tex
Preliminary version, June 27, 2014
confidential – do not distribute
Abstract.
This is an early draft of a paper on the bound constrained optimization problem solver
LMBOPT and the theory on which it is based.
This draft is incomplete and may still contain major errors.
Contents
1 Introduction
2 Smooth bound-constrained optimization problems
3 The line search
4 The bent search path
5 Convergence
6 Subspace steps
7 Implementation issues
8 Numerical results
1
Introduction
2
Smooth bound-constrained optimization problems
{s.problem}
Let f : C ⊆ Rⁿ → R be a continuously differentiable function with gradient
g(x) := f′(x)ᵀ ∈ Rⁿ.
We consider the bound-constrained optimization problem
min f(x)  s.t.  x ∈ 𝐱 := {x ∈ Rⁿ | x̲ ≤ x ≤ x̄}.    (1) {e.bopt}
Here one-sided or missing bounds are accounted for by allowing the value −∞ for components of x̲ and the value ∞ for components of x̄.
To have a well-defined optimization problem, the box 𝐱 must be part of the domain C of definition of f. (In practice, one may allow a smaller domain of definition if f satisfies the coercivity condition that, as x approaches the boundary of the domain of definition, f(x) exceeds the function value f(x0) at a known starting point x0.)
f is called the objective function. A point x is called feasible if it belongs to the box 𝐱. Given a feasible point x and an index i, we call the bound x̲ᵢ or x̄ᵢ active if xᵢ = x̲ᵢ or xᵢ = x̄ᵢ, respectively. In both cases, we also call the index i and the component xᵢ active. Otherwise, i.e., if xᵢ ∈ ]x̲ᵢ, x̄ᵢ[, the index i, the component xᵢ, and the bounds x̲ᵢ and x̄ᵢ are called nonactive or free. We write
I(x) := {i | x̲ᵢ < xᵢ < x̄ᵢ}    (2) {e.I}
for the set of free indices.
If the gradient g = g(x) has a nonzero component gᵢ at a nonactive index i, we may change xᵢ slightly without leaving the feasible region. The value of the objective function is reduced by moving slightly to smaller or larger values depending on whether gᵢ > 0 or gᵢ < 0, respectively. However, if xᵢ is active, only changes of xᵢ in one direction are possible without losing feasibility. The value of the objective function can then possibly be reduced by moving slightly in the feasible direction only when
gᵢ ≤ 0 if xᵢ = x̲ᵢ,
gᵢ ≥ 0 if xᵢ = x̄ᵢ,    (3) {e.wa}
and a decrease is guaranteed only if the slightly stronger condition
gᵢ < 0 if xᵢ = x̲ᵢ,
gᵢ > 0 if xᵢ = x̄ᵢ    (4) {e9.03}
holds. We say that the active variable xᵢ is weakly active if (3) holds, and strongly active if (3) is violated. Thus changing a single strongly active variable alone cannot lead to a better feasible point. If (4) holds, we call the active index i freeable and say that the variable xᵢ can be freed from its bound.
Combining the various cases, we see that a decrease is always possible unless the reduced gradient gred(x) at x, with components
gred(x)ᵢ :=  0             if x̲ᵢ = xᵢ = x̄ᵢ,
             min(0, gᵢ(x)) if xᵢ = x̲ᵢ < x̄ᵢ,
             max(0, gᵢ(x)) if xᵢ = x̄ᵢ > x̲ᵢ,
             gᵢ(x)         otherwise,    (5) {e9.09}
vanishes. A feasible point x with gred(x) = 0 is called a stationary point of the optimization problem (1). By the above, a local minimizer x of (1) must be a stationary point. This statement is a concise expression of the first order optimality conditions for bound-constrained optimization.
Note that although the gradient is assumed to be continuous, the reduced gradient need not be continuous, since it may change abruptly when a bound becomes active. If no bound is active, gred(x) = g(x). The set of free or freeable indices can be written in terms of the reduced gradient as
I+(x) := {i | x̲ᵢ < xᵢ < x̄ᵢ or (gred)ᵢ ≠ 0}.    (6) {e.I+}
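For concreteness, the componentwise definition (5) and the index set (6) can be transcribed as follows. This is an illustrative Python sketch only (the LMBOPT code itself is Matlab), and the function names are ours, not part of the package:

```python
def reduced_gradient(x, g, low, upp):
    """Componentwise reduced gradient gred(x) as in (5)."""
    gred = []
    for xi, gi, li, ui in zip(x, g, low, upp):
        if li == xi == ui:            # variable fixed by equal bounds
            gred.append(0.0)
        elif xi == li:                # at lower bound: keep descent part only
            gred.append(min(0.0, gi))
        elif xi == ui:                # at upper bound: keep descent part only
            gred.append(max(0.0, gi))
        else:                         # free variable
            gred.append(gi)
    return gred

def free_or_freeable(x, g, low, upp):
    """Index set I+(x) of free or freeable indices as in (6)."""
    gred = reduced_gradient(x, g, low, upp)
    return [i for i in range(len(x))
            if low[i] < x[i] < upp[i] or gred[i] != 0.0]
```

For example, a component sitting on its lower bound with a positive gradient component is strongly active, so its reduced gradient component vanishes and its index is excluded from I+(x).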
3
The line search
{s.LS}
Our optimization method improves an initial feasible point x0 by constructing a sequence
x0 , x1 , x2 , . . . of feasible points with decreasing function values. To ensure this, we search in
each iteration along an appropriate search path x(α) with x(0) = xℓ , and take xℓ+1 = x(αℓ )
where αℓ is determined by a line search. If the iteration index ℓ is fixed, we simply write x
for the current point xℓ .
A line search proceeds by searching points x(α) on a curve of feasible points parameterized
by a step size α > 0 starting at the current point x = x(0). The goal is to find a
value for the step size such that f(x(α)) is sufficiently smaller than f(x), with a notion of
"sufficiently" to be made precise. If the gradient g = g(x) is nonzero, the existence of such
an α > 0 is guaranteed if the tangent vector p := x′ (0) exists and satisfies
g T p < 0.
(7) {131}
In the unconstrained case, the curve is frequently taken to be a ray from x in direction p,
giving x(α) = x + αp.
A good and computationally useful measure of progress of a line search is the Goldstein quotient
µ(α) := (f(x(α)) − f(x)) / (αgᵀp)  for α > 0.    (8) {133'}
µ can be extended to a continuous function on [0, ∞] by defining µ(0) := 1 since, by l'Hôpital's rule,
lim_{α→0} µ(α) = lim_{α→0} f′(x(α))x′(α)/(gᵀp) = f′(x)x′(0)/(gᵀp) = 1.
By assumption (7) we have f (x(α)) < f (x) iff α > 0 and µ(α) > 0. Restrictions on the
values of the Goldstein quotient define regions where sufficient descent is achieved. We
consider here the sufficient descent condition
µ(α)|µ(α) − 1| ≥ β
(9) {e.Goldfast}
with fixed β > 0. This condition requires µ(α) to be sufficiently positive, forcing f (x(α)) <
f (x) and forbidding steps that are too long, and to be not too close to 1, forbidding steps
that are too short. Hence satisfying the condition promises to deliver a sensible decrease
in the objective function. A step size with (9) can be found constructively, at least when
the objective function is bounded below.
3.1 Theorem. Let β ∈ ]0, 1/4[ and gᵀp < 0. If f(x(α)) is bounded below as α → ∞ then the equation µ(α̂) = 1/2 has a solution α̂ > 0, and any α sufficiently close to α̂ satisfies (9).
Proof. Let f̲ := inf_{α≥0} f(x(α)) and µ₀ := inf_{α≥0} µ(α). If µ₀ > 0 then (8) implies for α > 0 the inequality
f̲ − f(x) ≤ f(x(α)) − f(x) = αgᵀpµ(α) ≤ αgᵀpµ₀,    (10) {e.ub}
but since gᵀp < 0, this is impossible for sufficiently large α. Therefore µ₀ ≤ 0. By continuity, µ(α̂) = 1/2 has a solution α̂ > 0. Since µ(α̂)|µ(α̂) − 1| = 1/4 > β, (9) holds for all α sufficiently close to α̂. ⊓⊔
Near a local minimizer, twice continuously differentiable functions are bounded below and almost quadratic. For a linear search path and a quadratic function that is bounded below,
f(x + αp) = f(x) + αg(x)ᵀp + (α²/2)pᵀG(x)p =: f + αa + (α²/2)b  with a < 0 < b,    (11) {135b}
so that µ(α) = 1 + αb/(2a) is linear in α. Thus
µ(α) < 1,  α/(2(1 − µ(α))) = −a/b > 0
for all α > 0. The minimizer along the search line can therefore be computed from any α,
α̂ = α/(2(1 − µ(α))),    (12) {e.alpLin}
giving µ(α̂) = 1/2. In general, we therefore use the formula (12) after the first function evaluation. If the resulting value for µ does not satisfy the sufficient descent condition (9), the function is far from quadratic, and we proceed with a simple bisection scheme: We use extrapolation by constant factors towards 0 or ∞ until we have a nontrivial bracket for α̂; then we use geometric mean steps since the bracket may span several orders of magnitude. However, we quit the line search once the stopping test is satisfied, and accept as final step size the α with the best function value.
3.2 Algorithm. Curved line search (CLS)
{a.cls}
Purpose: find a step size α with µ(α)|µ(α) − 1| ≥ β
Input: x(α) (search path), f0 = f(x(0)), d0 = g(x(0))ᵀx′(0), αinit (initial step size)
Requirements: d0 < 0, αinit > 0
Parameters: β ∈ ]0, 1/4[, q > 1
start=true;
α̲ = 0; ᾱ = ∞;
α = αinit; µ = (f(x(α)) − f0)/(αd0);
while start or µ|µ − 1| < β,
  if µ ≥ 1/2, α̲ = α;
  else ᾱ = α;
  end;
  if start,
    start=0;
    if µ < 1, α = α/(2(1 − µ));
    else α = αq;
    end;
  else
    if ᾱ = ∞, α = αq;
    elseif α̲ = 0, α = α/q;
    else α = √(α̲ᾱ);
    end;
  end;
  µ = (f(x(α)) − f0)/(αd0);
end;
return the α with the best f(x(α)) found.
Note that this idealized line search needs an extra stopping test to end after finitely many
steps when f is unbounded below along the search curve.
In practice, one may use a small value such as β = 0.02, and a large value of q such as
q = 50. The best values depend on the particular algorithm calling the line search, and
must be determined by calibration on a set of test problems.
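Algorithm 3.2 can be transcribed into Python as follows. This is an illustrative sketch, not the LMBOPT implementation; the function name and the max_evals safeguard (which plays the role of the extra stopping test mentioned above) are ours:

```python
from math import isinf, sqrt

def curved_line_search(path, f0, d0, alpha_init, beta=0.02, q=50.0,
                       max_evals=50):
    """Seek alpha with mu(alpha)*|mu(alpha)-1| >= beta, where
    mu(alpha) = (path(alpha) - f0)/(alpha*d0) and d0 < 0."""
    assert d0 < 0 and alpha_init > 0
    alo, ahi = 0.0, float('inf')       # bracket for the target step size
    alpha = alpha_init
    best = (path(alpha), alpha)        # (best f value found, its step size)
    mu = (best[0] - f0) / (alpha * d0)
    start, evals = True, 1
    while (start or mu * abs(mu - 1.0) < beta) and evals < max_evals:
        if mu >= 0.5:
            alo = alpha                # mu still close to 1: step too short
        else:
            ahi = alpha                # step too long
        if start:
            start = False              # secant/quadratic step (12)
            alpha = alpha / (2.0 * (1.0 - mu)) if mu < 1 else alpha * q
        elif isinf(ahi):
            alpha = alpha * q          # extrapolate towards infinity
        elif alo == 0.0:
            alpha = alpha / q          # extrapolate towards zero
        else:
            alpha = sqrt(alo * ahi)    # geometric mean step in the bracket
        fa = path(alpha)
        best = min(best, (fa, alpha))
        mu = (fa - f0) / (alpha * d0)
        evals += 1
    return best[1]
```

On an exactly quadratic path the first interpolation step (12) already lands on the minimizer, as predicted by the theory above.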
The sufficient descent condition (9) is closely related to the so-called Goldstein condition
f(x) + αµ′′gᵀp ≤ f(x(α)) ≤ f(x) + αµ′gᵀp,    (13) {138g}
where 0 < µ′ < µ′′ < 1. Indeed, (13) is equivalent to
µ′ ≤ µ(α) ≤ µ′′,    (14) {138}
hence (9) holds with
β = min(µ′(1 − µ′), µ′′(1 − µ′′)) > 0.
Conversely, (9) implies that either (13) holds with
µ′ = 2β/(1 + √(1 − 4β)),  µ′′ = (1 + √(1 − 4β))/2,
or the alternative fast descent condition
µ(α) ≥ µ′′′    (15) {138f}
holds with
µ′′′ = (1 + √(1 + 4β))/2.
The Goldstein condition (14) can be interpreted geometrically by drawing in the graph of
f (x(α)) the cone defined by the two lines through (0, f ) with slopes µ′ g T p and µ′′ g T p. This
cone cuts out a section of the graph, which defines the admissible step size parameters.
The Goldstein condition allows only a tiny and inefficient range of step sizes in cases where
for small α, the graph of f (x(α)) is concave and fairly flat, while for larger α, f (x(α)) is
strongly increasing. This is one of the reasons why most currently used line searches are
based on the so-called Wolfe condition, which needs gradient evaluations during the line
search.
Our line search is gradient-free but avoids the defects of the Goldstein condition. Indeed, in the above cases, the range allowed by (9) is considerably larger than that of the Goldstein condition, since it includes the values where (15) holds. Our line search is even more liberal, as it accepts the step size with the best known function value once a step size satisfying (9) has been found; this step size may itself violate (9).
{t132}
3.3 Theorem. Suppose that the objective function has a Lipschitz continuous gradient, i.e.,
‖g(y) − g(x)‖₂ ≤ γ‖y − x‖₂  for x, y ∈ Rⁿ.    (16) {e.lcg}
If the search path is a ray along the direction p then
(f(x) − f(x(α′)))‖p‖₂² / (g(x)ᵀp)² ≥ 2β/γ    (17) {135c}
holds for any step size α′ with f(x(α′)) ≤ f(x(α)) for some α > 0 satisfying the sufficient descent condition (9).
Proof. By assumption, x(α) = x + αp. The function ψ defined by
ψ(α) := f(x + αp) − αg(x)ᵀp
satisfies
ψ′(α) = g(x + αp)ᵀp − g(x)ᵀp = (g(x + αp) − g(x))ᵀp.
The Cauchy–Schwarz inequality gives
|ψ′(α)| ≤ ‖g(x + αp) − g(x)‖₂‖p‖₂ ≤ γ‖αp‖₂‖p‖₂ = γα‖p‖₂²,
hence
|f(x + αp) − f(x) − αg(x)ᵀp| = |ψ(α) − ψ(0)| = |∫₀^α ψ′(s) ds| ≤ ∫₀^α |ψ′(s)| ds ≤ ∫₀^α γs‖p‖₂² ds = (γα²/2)‖p‖₂².
Therefore
|µ(α) − 1| = |f(x + αp) − f(x) − αg(x)ᵀp| / |αg(x)ᵀp| ≤ (αγ/2)‖p‖₂²/|g(x)ᵀp|.
On the other hand, since g(x)ᵀp < 0,
(f(x) − f(x(α)))/|g(x)ᵀp| = (f(x(α)) − f(x))/(g(x)ᵀp) = αµ(α).
Taking the product of the two relations, we conclude that
(f(x) − f(x(α′)))‖p‖₂²/(g(x)ᵀp)² ≥ (f(x) − f(x(α)))‖p‖₂²/(g(x)ᵀp)² = αµ(α)‖p‖₂²/|g(x)ᵀp| ≥ (2/γ)µ(α)|µ(α) − 1| ≥ 2β/γ. ⊓⊔
4
The bent search path
{s.bent}
For an arbitrary point x ∈ Rⁿ, we define its feasible projection π[x] by
π[x]ᵢ := max(x̲ᵢ, min(x̄ᵢ, xᵢ)) =  x̲ᵢ if xᵢ ≤ x̲ᵢ,  x̄ᵢ if xᵢ ≥ x̄ᵢ,  xᵢ otherwise.    (18) {e.projB}
For solving the bound constrained optimization problem (1), we do each line search along a bent search path
x(α) := π[x + αp],    (19) {bentpath}
obtained by projecting the ray x + αp (α ≥ 0) into the feasible set. The bent search path is piecewise linear, with breakpoints at the elements of the set
S := ({(x̄ᵢ − xᵢ)/pᵢ : pᵢ > 0} ∪ {(x̲ᵢ − xᵢ)/pᵢ : pᵢ < 0}) \ {0, ∞}.
If the breakpoints α₁, ..., αₘ are ordered such that
0 = α₀ < α₁ < ... < αₘ < αₘ₊₁ = ∞,
the bent search path is linear on each interval [αᵢ₋₁, αᵢ] (i = 1, ..., m + 1). By setting some entries of p to zero if necessary, we may assume that
x(α) = x + αp for 0 ≤ α ≤ α₁.
This is equivalent to requiring
pᵢ ≥ 0 if xᵢ = x̲ᵢ,  pᵢ ≤ 0 if xᵢ = x̄ᵢ.    (20) {e.p1}
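The projection (18) and the breakpoint set of the bent path (19) can be sketched as follows (illustrative Python; the helper names are ours, and breakpoints at infinite bounds are simply skipped when the set is collected):

```python
def project(x, low, upp):
    """Componentwise feasible projection pi[x] as in (18)."""
    return [max(l, min(u, xi)) for xi, l, u in zip(x, low, upp)]

def bent_path(x, p, low, upp):
    """Return the map alpha -> pi[x + alpha*p] of (19) together with the
    sorted finite positive breakpoints of the piecewise linear path."""
    path = lambda a: project([xi + a * pi for xi, pi in zip(x, p)], low, upp)
    S = set()
    for xi, pi, l, u in zip(x, p, low, upp):
        if pi > 0 and u != float('inf'):
            S.add((u - xi) / pi)        # component i hits its upper bound
        elif pi < 0 and l != float('-inf'):
            S.add((l - xi) / pi)        # component i hits its lower bound
    return path, sorted(a for a in S if 0.0 < a < float('inf'))
```

Beyond the last breakpoint the path is constant in every component that has hit a bound, as in the example below where the path ends in a corner of the box.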
Algorithms unable to identify the set of optimal active bound constraints may free and fix the same subset of variables alternately in a large number of successive iterations. Such a zigzagging behaviour is a major cause of inefficiency that must be avoided as far as possible. The most useful way to prevent this zigzagging behaviour is to control the conditions under which the variables are freed. This is done in the following algorithmic scheme.
{a.bopt}
4.1 Algorithm. (for bound constrained optimization)
Given a feasible initial point x0.
x = x0; I = I+(x);
while gred(x) ≠ 0,
  if I = ∅, update I = I+(x); % freeing step
  update x by performing the line search of Algorithm 3.2 along a bent search path satisfying B1–B2;
  update I = I(x);
  if no new bound has been fixed but some variable can be freed,
    update I = I+(x); % freeing step
  end;
end;
Suppose that in the previous step no variable has been fixed at one of its bounds; then we must decide between performing a freeing step and an iteration step. To prevent zigzagging, we first perform an iteration step. Then we check whether a new variable became active. If so, we repeat the iteration step with respect to the new subspace of nonactive variables. After that, we again check whether an active variable can be freed. If a variable can be freed, we do a freeing step; otherwise, we repeat the iteration step.
The bent search path’s tangent direction is determined by a subspace step method over the
subspace of variables corresponding to I.
4.2 Proposition. Suppose p is a feasible direction at x satisfying (20), gredᵀp < 0, and
pᵢ = 0 if either xᵢ = x̲ᵢ, gᵢ > 0 or xᵢ = x̄ᵢ, gᵢ < 0.
Then x(α) := x + αp is feasible for sufficiently small α > 0, and we have f(x(α)) < f(x) for such α.
Proof. By the definition of the reduced gradient and the second condition,
gᵢpᵢ = (gred)ᵢpᵢ  for all i = 1, ..., n.    (21)
Hence we have
gᵀp = gredᵀp < 0.    (22)
Consequently, for sufficiently small α > 0,
f(x(α)) = f(x + αp) = f(x) + αgᵀp + o(α) = f(x) + α(gᵀp + o(1)) < f(x). ⊓⊔
5
Convergence
According to our convergence theory below, we impose the following conditions:
B1 If S = ∅ or α < min S, the line search is efficient.
B2 If α ≥ min S, the condition
f(x(α)) ≤ f(x) + µ′αgᵀp    (23) {descent}
is satisfied for a fixed µ′ ∈ (0, 1).
We now prove that if conditions B1–B2 are satisfied, the algorithm terminates at a stationary point of the problem; consequently the first order optimality conditions hold at this point. The convergence analysis consists of an analysis of how the activities may change, together with an analysis of what happens in the free affine subspaces consisting of all points in which the active variables have fixed values.
From now on, we suppress the index l of the outer iteration for simplicity and write x for xˡ, f for fˡ = f(xˡ), g for the gradient g(xˡ), x̄ for xˡ⁺¹, f̄ for fˡ⁺¹ = f(xˡ⁺¹), and ḡ for the gradient g(xˡ⁺¹).
{t.nozigzagb}
5.1 Theorem. Suppose that all search directions p used in Algorithm 4.1 satisfy
pᵢ = 0 if either xᵢ = x̲ᵢ, gᵢ > 0 or xᵢ = x̄ᵢ, gᵢ < 0    (24) {e.psub}
and the reduced angle condition
gredᵀp / (‖gred‖₂‖p‖₂) ≤ −δ < 0.    (25) {e.rac}
If the sequence of iteration points xˡ (l = 0, 1, ...) is bounded then:
(i) The reduced gradients gredˡ satisfy
inf_{l≥0} ‖gredˡ‖₂ = 0.
(ii) If the iteration does not stop but xˡ → x̂ for l → ∞ then x̂ satisfies the first order optimality conditions gred(x̂) = 0, and for all i and sufficiently large l, we have
xᵢˡ = x̂ᵢ = x̲ᵢ if gᵢ(x̂) > 0,    (26) {e.bfixl}
xᵢˡ = x̂ᵢ = x̄ᵢ if gᵢ(x̂) < 0.    (27) {e.bfixu}
Proof. Everything is trivial when the algorithm stops at a stationary point; hence we may assume that infinitely many iterations are taken. By removing the components of x fixed by the bound constraints, we may also assume that x̲ᵢ < x̄ᵢ for all i.
(i) Suppose that the line search is efficient only finitely often. Then there is an integer L such that in all iterations l > L, condition B1 is violated; in particular, the set of breakpoints is nonempty, and at least one new bound is fixed. Thus ultimately, each iteration fixes some new bound, and no bound is freed. Since the number of activities is finite, this can happen only a finite number of times, a contradiction.
Thus the line search is efficient infinitely often, and, by the definition of efficiency, there is a number ρ > 0 such that infinitely often
(f − f̄)‖p‖₂²/(gᵀp)² ≥ ρ.    (28) {e.135}
Now by (24) and the definition (5), we have either (gred)ᵢ = gᵢ, or (gred)ᵢ = 0 and pᵢ = 0. Therefore, in both cases, (gred)ᵢpᵢ = gᵢpᵢ, and summing this gives
gᵀp = gredᵀp.
Since the xˡ are bounded and f is continuous, the fˡ = f(xˡ) are bounded, hence f̂ := inf fˡ is finite. Since we have a descent sequence, lim fˡ = f̂. In the infinitely many iterations satisfying (28), we conclude from (25) that
δ‖gred‖₂ ≤ −gredᵀp/‖p‖₂ = |gᵀp|/‖p‖₂ ≤ √((fˡ − fˡ⁺¹)/ρ) → 0,
so that inf ‖gredˡ‖₂ = 0.
(ii) By continuity of the gradient, ĝ := g(x̂) = lim gˡ.
Let i be an index for which ĝᵢ > 0. Then there is a number L such that gᵢˡ > 0 for l > L, and the definition (5) of the reduced gradient implies that for l > L,
(gredˡ)ᵢ = 0 if xᵢˡ = x̲ᵢ,  (gredˡ)ᵢ = gᵢˡ otherwise.
By part (i), a subsequence of the gredˡ converges to zero, and we conclude that xᵢˡ = x̲ᵢ for infinitely many l > L. Since also gᵢˡ > 0 for l > L, (4) implies that the bound cannot be freed anymore. Thus xᵢˡ = x̲ᵢ for all sufficiently large l, and x̂ᵢ = lim xᵢˡ = x̲ᵢ, so that (26) holds. Similarly, if i is an index for which ĝᵢ < 0 then (27) holds for sufficiently large l. Using (26), (27), and the definition of the reduced gradient, we may now conclude that gred(x̂) = lim gredˡ = 0. Thus the first order optimality conditions hold. ⊓⊔
Thus all strongly active variables will be ultimately fixed. As a consequence, the algorithm
is asymptotically zigzag-free for all problems whose stationary points have no activities
with zero gradient.
6
Subspace steps
{s.SS}
The solution x of a linear system
Gx = b
with symmetric, positive definite coefficient matrix G ∈ Rⁿˣⁿ and right hand side b ∈ Rⁿ may be viewed as the minimizer of the strictly convex quadratic function
f(x) := (1/2)xᵀGx − bᵀx + f₀,
since its gradient
g(x) = Gx − b    (29) {233}
vanishes precisely at the solution. For any s ∈ Rⁿ, we may compute the Hessian-vector product
y := Gs = g(x + s) − g(x)
in terms of gradient information only; here the choice of x ∈ Rⁿ is completely arbitrary. If we have a list
S := [s¹, ..., sᵐ]
of vectors sℓ for which
yℓ := Gsℓ
is available, we may form the matrix
Y := [y¹, ..., yᵐ] = GS,
and find for any z ∈ Rᵐ the subspace expansion
f(x − Sz) = f(x) − g(x)ᵀSz + (1/2)(Sz)ᵀGSz = f − cᵀz + (1/2)zᵀHz,
where
f := f(x),  c := Sᵀg(x),  H := SᵀGS = SᵀY = YᵀS.
We assume that the columns of S are linearly independent; then H is positive definite. The minimum of f(x − Sz) with respect to z is attained at
ẑ := H⁻¹c = H⁻¹Sᵀg(x),
and the associated point and gradient are
x̂ = x − Sẑ,  ĝ := g(x) − Yẑ.
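A dense toy version of the subspace step may be sketched as follows. This is illustrative Python with hypothetical helper names; a serious implementation would update H and c incrementally and use a Cholesky factorization instead of the naive elimination used here:

```python
def matvec(G, v):
    """Matrix-vector product for a dense matrix stored as a list of rows."""
    return [sum(Gij * vj for Gij, vj in zip(row, v)) for row in G]

def solve(A, b):
    """Solve A z = b for a small symmetric positive definite A
    by Gaussian elimination without pivoting (toy code only)."""
    n = len(b)
    A = [row[:] for row in A]; b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            A[i] = [aij - m * akj for aij, akj in zip(A[i], A[k])]
            b[i] -= m * b[k]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (b[i] - sum(A[i][j] * z[j] for j in range(i + 1, n))) / A[i][i]
    return z

def subspace_step(G, bvec, x, S):
    """Given columns S = [s^1, ..., s^m] (as a list of vectors), return
    xhat = x - S zhat and ghat = g(x) - Y zhat with zhat = H^{-1} c."""
    g = [gi - bi for gi, bi in zip(matvec(G, x), bvec)]   # g(x) = Gx - b
    Y = [matvec(G, s) for s in S]                         # y^l = G s^l
    H = [[sum(si * yj for si, yj in zip(s, y)) for y in Y] for s in S]
    c = [sum(si * gi for si, gi in zip(s, g)) for s in S]
    z = solve(H, c)
    xhat = [xi - sum(zl * s[i] for zl, s in zip(z, S)) for i, xi in enumerate(x)]
    ghat = [gi - sum(zl * y[i] for zl, y in zip(z, Y)) for i, gi in enumerate(g)]
    return xhat, ghat
```

When S spans the whole space, the subspace step reproduces the exact minimizer of the quadratic and the updated gradient vanishes.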
If we calculate y = Gs at an additional direction s ≠ 0, we have the consistency relations
h := SᵀGs = Yᵀs = Sᵀy,
0 < γ := sᵀGs = yᵀs = (f(x + αs) − f − αgᵀs)/(α²/2)
for all α ∈ R. We may now form the augmented matrices
S := [S, s],  Y = GS = [Y, y],  H = SᵀGS = ( H  h ; hᵀ  γ ).
7
Implementation issues
In the program, the ith components of x̲ and x̄ are stored in low(i) and upp(i), respectively.
The initial point. Conjugate direction methods and their limited memory versions produce search directions that are linear combinations of the previous steps and the preconditioned gradient. This may lead to inefficiencies, e.g., for minimizing the quadratic function
f(x) := (x₁ − 1)² + Σ_{i=2}^n (xᵢ − xᵢ₋₁)²
when started from x0 = 0 with a diagonal preconditioner. It is easy to see by induction that, for any method that chooses its search directions as linear combinations of the previously computed preconditioned gradients, the ith iteration point has zeros in all coordinates k > i and its gradient has zeros in all coordinates k > i + 1. Since the solution is the all-one vector, this implies that at least n iterations are needed to reduce the maximal error in the components of x to below one.
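The base case of this induction is easy to check numerically: at x0 = 0 the gradient of the chained quadratic is nonzero only in its first coordinate, so a single preconditioned-gradient step cannot move the later coordinates. A Python sketch (the function name is ours):

```python
def chain_grad(x):
    """Gradient of f(x) = (x_1 - 1)^2 + sum_{i>=2} (x_i - x_{i-1})^2."""
    n = len(x)
    g = [0.0] * n
    g[0] = 2.0 * (x[0] - 1.0)
    for i in range(1, n):
        d = 2.0 * (x[i] - x[i - 1])   # each coupling term touches two entries
        g[i] += d
        g[i - 1] -= d
    return g
```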
Thus for large-scale problems, some attention should be given to choosing the initial point
not too special, so that the gradient contains significant information about all components.
This is especially important in the bound constrained case, where the signs of gradient
components determine which variables may be freed.
The following piece of Matlab code moves a user-given initial point x slightly into the
relative interior of the feasible domain. deltax is a number in [0, 1[; choosing it as 0 just
projects the starting point into the feasible box.
% force interior starting point when deltax>0;
ind=(x<=low);
x(ind)=low(ind)+min(1,deltax*(upp(ind)-low(ind)));
ind=(x>=upp);
x(ind)=upp(ind)-min(1,deltax*(upp(ind)-low(ind)));
Avoiding zigzagging. Zigzagging is the main source of inefficiency of simple methods such as steepest descent. Any search direction p must satisfy gᵀp < 0. In order to avoid zigzagging we choose the search direction p as the vector with a fixed value gᵀp = −γ < 0 closest (with respect to the 2-norm) to the previous search direction.
{t512}
7.1 Theorem. Among all p ∈ Rⁿ with gᵀp = −γ < 0, the distance
‖p − p₀‖₂² = (p − p₀)ᵀ(p − p₀)
becomes minimal for
p = p₀ − λg,    (30) {212}
where
λ = (γ + gᵀp₀)/(gᵀg).    (31) {213}
Proof. This optimization problem can be solved using Lagrange multipliers. We have to find a stationary point of the Lagrange function
L(p) := (1/2)(p − p₀)ᵀ(p − p₀) + λgᵀp,
giving the condition p − p₀ + λg = 0, hence (30). The Lagrange multiplier λ is determined from the constraint gᵀp = −γ, and yields (31). ⊓⊔
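Formulas (30) and (31) translate directly into code (an illustrative Python sketch; the function name is ours):

```python
def closest_descent_direction(g, p0, gamma):
    """Return the p closest to p0 in the 2-norm with g^T p = -gamma < 0,
    following (30) and (31)."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    lam = (gamma + dot(g, p0)) / dot(g, g)           # multiplier (31)
    return [p0i - lam * gi for p0i, gi in zip(p0, g)]  # direction (30)
```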
Enforcing the angle condition. In general, if we have a direction q, we may always add a multiple of the gradient to enforce the angle condition for the modified direction
p = q − tg    (32) {e.pId}
with a suitable factor t ≥ 0; the case t = 0 corresponds to the case where q already satisfies the bounded angle condition. The choice of t depends on the three numbers
σ₁ := gᵀB⁻¹g > 0,  σ₂ := qᵀB⁻¹q > 0,  σ := gᵀB⁻¹q;
these are related by the Cauchy–Schwarz inequality
c := σ/√(σ₁σ₂) ∈ [−1, 1].
We want to choose t such that the angle condition
gᵀp / √(gᵀB⁻¹g · pᵀBp) ≤ −δangle    (33) {e.angle}
holds. In terms of the σᵢ, this reads
(σ − tσ₁)/√(σ₁(σ₂ − 2tσ + t²σ₁)) ≤ −δangle.
If c ≤ −δangle, this holds for t = 0, and we make this choice. Otherwise we enforce equality. Squaring, multiplying with the denominator, and subtracting δangle²(σ − tσ₁)² gives
(1 − δangle²)(σ − tσ₁)² = δangle²σ₁(σ₂ − 2tσ + t²σ₁) − δangle²(σ − tσ₁)² = δangle²(σ₁σ₂ − σ²),
hence
|σ − tσ₁| = δangle√w,  where  w := σ₁σ₂(1 − c²)/(1 − δangle²).
(To ensure that w ≥ 0 even in finite precision arithmetic, one should use max(ε, 1 − c²) in place of 1 − c², where ε is the machine precision.) For the larger solution
t = (σ + δangle√w)/σ₁,
p satisfies the angle condition with equality.
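The resulting choice of t can be sketched as follows (illustrative Python with hypothetical names; eps plays the role of the machine precision safeguard mentioned above):

```python
from math import sqrt

def angle_factor(sigma1, sigma2, sigma, delta, eps=2.2e-16):
    """Return t >= 0 so that the modified direction of (32) satisfies the
    angle condition with bound -delta, given sigma1, sigma2, sigma."""
    c = sigma / sqrt(sigma1 * sigma2)
    if c <= -delta:
        return 0.0                     # q already satisfies the condition
    # safeguarded w from the derivation above
    w = sigma1 * sigma2 * max(eps, 1.0 - c * c) / (1.0 - delta * delta)
    return (sigma + delta * sqrt(w)) / sigma1   # larger solution: equality
```

For t returned by this function, (σ − tσ₁)/√(σ₁(σ₂ − 2tσ + t²σ₁)) evaluates to −δangle up to roundoff.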
Taking account of finite precision effects.
Stopping. Algorithm 4.1 is appropriate for a rigorous convergence analysis. In practice, one needs additional stopping criteria that guarantee an approximate result after finitely many steps. In our implementation, we terminate the algorithm whenever one of the following conditions is satisfied:
• The norm of the reduced gradient is sufficiently small.
• A large number of iterations has been done.
8
Numerical results
References