Lecture 2

Selected Numerical Methods
Part 2: iterative methods
for nonlinear equations
Roberto Ferretti
• General features of iterative methods
• Methods for scalar equations: fixed–point iterations
• Methods for scalar equations: Newton’s method and its variants
General features of iterative methods
In making an iterative method
x_{k+1} = T(x_k)    (1)
work properly, we need several ingredients:
• Approximate location of the solution x̄ – most iterative methods converge only locally
• Suitable definition of the function T(·) – it must be a contraction
• Suitable definition of the stopping criterion – that is, a correct
estimation of the error at a given step
Approximate location of the solution: especially in nonlinear cases, we cannot expect the equation (or system) to have a unique solution, so the mapping T(·) cannot in general be a contraction on the whole of R^n. Typically, it may occur that
• The definition of T itself depends on the neighbourhood we choose
• The definition of T does not depend on the neighbourhood, but its
convergence does (as in Newton’s method)
Definition of the iteration function T(·): this is the core of the numerical theory. In practice, the requirements are:
• T must be a contraction, at least in the neighbourhood of the
solution
• The error should decrease quickly, so that the required accuracy is achieved with low computational complexity
• The construction of T should not require complex information (even derivatives may not be explicitly known)
Error ‖x_k − x̄‖: it can be bounded in two different ways:
• If it is possible to give a bound on the initial error ‖x_0 − x̄‖, then
‖x_k − x̄‖ ≤ L_T ‖x_{k−1} − x̄‖ ≤ · · · ≤ L_T^k ‖x_0 − x̄‖
• If not, from the increment ‖x_k − x_{k−1}‖ the error can be estimated as
‖x_k − x̄‖ ≤ ‖x_{k+1} − x_k‖ + ‖x_{k+2} − x_{k+1}‖ + · · · ≤
≤ L_T ‖x_k − x_{k−1}‖ + L_T^2 ‖x_k − x_{k−1}‖ + · · · ≤ (L_T/(1 − L_T)) ‖x_k − x_{k−1}‖
• In defining a suitable stopping criterion for the iterations, it is also necessary to take into account the residual |f(x_k)| of the equation (see the sketch below)
[Figures: the case |x_k − x̄| small but |f(x_k)| large (steep f near x̄), and the case |x_k − x̄| large but |f(x_k)| small (flat f near x̄)]
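For illustration, a possible stopping test combining the two indicators (a minimal sketch, not from the slides; the tolerances tol_x and tol_f are hypothetical parameters):

    def should_stop(x_new, x_old, f_new, tol_x=1e-10, tol_f=1e-10):
        # Stop only when both the increment |x_k - x_{k-1}| (which controls
        # the error up to the factor L_T/(1 - L_T)) and the residual |f(x_k)|
        # are small: as the figures suggest, either test alone can mislead.
        return abs(x_new - x_old) < tol_x and abs(f_new) < tol_f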
Estimating the error at the k–th step as a function of the error at the
(k − 1)–th step,
‖x_k − x̄‖ ≤ L_T ‖x_{k−1} − x̄‖ ≤ · · · ≤ L_T^k ‖x_0 − x̄‖
shows that in a fixed-point iterative method the Lipschitz constant
of T should be kept as low as possible
• Working in a neighbourhood of x̄ is crucial for obtaining a small
Lipschitz constant
• Convergence is (at least) exponential, but can still be very slow if L_T ≈ 1
In some cases, this behaviour can be improved: we define the order of convergence of a method as the largest exponent γ such that
‖x_k − x̄‖ ≤ C ‖x_{k−1} − x̄‖^γ
• In methods based on a contraction we typically have C = L_T and γ = 1, but the interest is in methods for which γ > 1 (as a rule, a higher order implies a faster reduction of the error)
• The case γ > 1 corresponds to a convergence speed faster than exponential
Example: ‖x_0 − x̄‖ = 0.1, errors for a linear and a quadratic method

iter.   γ = 1, C = 0.1   γ = 2, C = 1
0       0.1              0.1
1       0.01             0.01
2       0.001            0.0001
3       0.0001           0.00000001
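These values can be reproduced in a few lines of Python, assuming the model error recursion e_k = C e_{k−1}^γ (a sketch for illustration only):

    # Model error recursion e_k = C * e_{k-1}**gamma, starting from e_0 = 0.1
    for gamma, C in [(1, 0.1), (2, 1.0)]:
        e = 0.1
        print(f"gamma = {gamma}, C = {C}:")
        for k in range(4):
            print(f"  iter {k}: error = {e:.0e}")
            e = C * e ** gamma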
The reduction of the error depends more strongly on the exponent γ than on the constant C. This motivates the effort in constructing methods for which γ > 1: in particular, “superlinear methods” (secant, Muller) for 1 < γ < 2 and “quadratic methods” (Newton) for γ = 2
Methods for scalar equations: fixed–point iterations
In one dimension, we rewrite methods in the form (1) as
x_{k+1} = g(x_k)    (g : R → R)    (2)
• The contractivity condition becomes |g′(x)| ≤ L_g < 1
• In one dimension it is possible to give a graphic interpretation of the construction and possible convergence of the sequence x_k, since the solution is the intersection of the graphs y = x and y = g(x)
[Figures: behaviour of the fixed-point iterates for 0 < g′(x) < 1 and −1 < g′(x) < 0 (convergence), and for g′(x) > 1 and g′(x) < −1 (divergence)]
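A minimal Python sketch of the fixed-point iteration (2); the example map g(x) = cos x, the initial guess and the tolerance are assumptions chosen for illustration:

    import math

    def fixed_point(g, x0, tol=1e-12, kmax=200):
        """Iterate x_{k+1} = g(x_k) until the increment falls below tol."""
        x = x0
        for k in range(kmax):
            x_new = g(x)
            if abs(x_new - x) < tol:
                return x_new, k + 1
            x = x_new
        raise RuntimeError("no convergence within kmax iterations")

    # g(x) = cos(x) is a contraction near its fixed point x̄ ≈ 0.739
    x_bar, iters = fixed_point(math.cos, 1.0)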
A standard way to set the scalar equation f(x) = 0 in fixed–point form is to rewrite it as
x = x + α(x) f(x)    (3)
• For this form to be equivalent to the original equation, the function α(x) must not vanish in a neighbourhood of x̄.
• Possibly, α(x) ≡ α (a nonzero constant)
• Using (3) to define an iterative method, it is usually assumed that x̄ is a simple root (in fact, if x̄ is a multiple root, we have g′(x̄) = 1 and therefore g cannot be a contraction)
Starting with the case α(x) ≡ α, the ideal situation for convergence would be to have
g′(x̄) = 1 + α f′(x̄) = 0
since in this case the contraction coefficient may be arbitrarily small,
if restricted to a sufficiently small neighbourhood of x̄.
• Since x̄ is unknown (and the explicit expression of f′ might not be available), the constant α should be an approximation of the optimal value ᾱ = −1/f′(x̄)
• A first possibility is to replace f′(x̄) with the difference quotient of f computed on a (sufficiently small) interval [a, b] containing x̄; the resulting method is
x_{k+1} = x_k − (b − a)/(f(b) − f(a)) · f(x_k)
• A second possibility is to replace f′(x̄) with f′(x_0), provided x_0 is close enough to x̄:
x_{k+1} = x_k − f(x_k)/f′(x_0)
• Theory confirms that both methods are convergent if f ∈ C^1, f′(x̄) ≠ 0 and a, b, x_0 are close enough to x̄ (a sketch of both variants follows)
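Both variants are fixed-slope (chord) iterations, differing only in how the slope approximating f′(x̄) is chosen; a minimal sketch (the name chord, the test function and the tolerances are illustrative):

    def chord(f, slope, x0, tol=1e-12, kmax=200):
        """Fixed-slope iteration x_{k+1} = x_k - f(x_k)/slope, where slope
        approximates f'(x̄): either (f(b) - f(a))/(b - a) on a small interval
        [a, b] containing x̄ (first variant), or f'(x0) (second variant)."""
        x = x0
        for _ in range(kmax):
            x_new = x - f(x) / slope
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence within kmax iterations")

    # First variant on [a, b] = [1, 2] for f(x) = x^2 - 2
    f = lambda x: x * x - 2
    root = chord(f, (f(2) - f(1)) / (2 - 1), x0=1.5)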
The choice of α may be made rigorous by means of the following result, giving the order of convergence of a method in the form (2):
• If g ∈ C^{m+1} and g′(x̄) = · · · = g^{(m)}(x̄) = 0, g^{(m+1)}(x̄) ≠ 0, then the method converges with order m + 1 if x_0 is sufficiently close to x̄
• In particular, if g(x) is in the form (3) with α(x) = α (constant), then for α = −1/f′(x̄) convergence is quadratic (in general, such a structure of the method cannot ensure convergence with order greater than 2)
Methods for scalar equations: Newton’s method and its variants
Newton’s method is obtained using the form (3), with α(x) = −1/f′(x):
x_{k+1} = x_k − f(x_k)/f′(x_k)
• We must assume that the explicit expression of the derivative is known
• If f ∈ C^2 and f′(x̄) ≠ 0, then
g′(x̄) = 1 − (f′(x̄)^2 − f(x̄) f″(x̄)) / f′(x̄)^2 = 0
and the method converges with quadratic order if x_0 is close enough to x̄
• The approximation x_{k+1} is the zero of the tangent to the graph of f at (x_k, f(x_k))
• Vanishing of the derivative around x̄ should be avoided (in such a case, the tangent would become parallel to the x axis)
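A minimal sketch of Newton’s method, with a guard for the vanishing-derivative situation noted above (names and tolerances are illustrative):

    def newton(f, fprime, x0, tol=1e-12, kmax=50):
        """Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k)."""
        x = x0
        for _ in range(kmax):
            dfx = fprime(x)
            if dfx == 0:
                # the tangent is parallel to the x axis and has no zero
                raise ZeroDivisionError("f'(x_k) = 0")
            x_new = x - f(x) / dfx
            if abs(x_new - x) < tol:
                return x_new
            x = x_new
        raise RuntimeError("no convergence within kmax iterations")

    # Example: the positive zero of f(x) = x^2 - 2, starting from x0 = 1.5
    root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)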
• It can be proved that Newton’s method has monotone, global convergence in any of the following cases:
x_0 > x̄,  with f(x) increasing & convex, or f(x) decreasing & concave, on [x̄, x_0]
x_0 < x̄,  with f(x) increasing & concave, or f(x) decreasing & convex, on [x_0, x̄]
• In the case of zeroes of multiplicity m > 1, the method can be corrected so as to preserve quadratic convergence (see the correction below)
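One standard correction of this kind (stated here as a known fact; the slides do not give the formula) uses the multiplicity m explicitly:
x_{k+1} = x_k − m f(x_k)/f′(x_k)
which restores quadratic convergence when m is known.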
If the expression of the derivative is unknown, the secant method replaces the computation of f′(x) with the difference quotient between the abscissas x_{k−1} and x_k, thus obtaining the scheme
x_{k+1} = x_k − (x_k − x_{k−1})/(f(x_k) − f(x_{k−1})) · f(x_k)
• The explicit expression of the derivative is not required; moreover, f is still computed only once per iteration
• If f ∈ C^2 and f′(x̄) ≠ 0, convergence is superlinear, with exponent
γ = (1 + √5)/2 ≈ 1.618
provided x_0 and x_1 are close enough to x̄
• The approximation x_{k+1} is the zero of the line through the points (x_k, f(x_k)) and (x_{k−1}, f(x_{k−1}))
• As in Newton’s method, it is required that x̄ is a simple zero of the function f
• Monotone convergence holds under the same conditions as in Newton’s method
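A minimal sketch of the secant method (names, tolerances and starting points are illustrative); note that only one new evaluation of f is needed per iteration:

    def secant(f, x0, x1, tol=1e-12, kmax=100):
        """Secant iteration: f'(x) is replaced by the difference quotient
        (f(x_k) - f(x_{k-1}))/(x_k - x_{k-1})."""
        f0, f1 = f(x0), f(x1)
        for _ in range(kmax):
            x2 = x1 - (x1 - x0) / (f1 - f0) * f1
            if abs(x2 - x1) < tol:
                return x2
            x0, f0 = x1, f1
            x1, f1 = x2, f(x2)  # one new evaluation of f per iteration
        raise RuntimeError("no convergence within kmax iterations")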
In Muller’s method the same idea as in the secant method is used, defining x_{k+1} as a zero of the so–called interpolating polynomial of f with degree n = 2 passing through the points (x_k, f(x_k)), (x_{k−1}, f(x_{k−1})) and (x_{k−2}, f(x_{k−2}))
• Suitable strategies allow one to choose between the two roots, and it is possible to converge to complex roots (for this reason, the method is often used for algebraic equations)
• The method converges with order γ ≈ 1.84 if f ∈ C^3 and x_0, x_1 and x_2 are close enough to x̄
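A minimal sketch of Muller’s method; the root-selection strategy shown (taking the larger-magnitude denominator, i.e. the root closest to x_k) is one common choice among the “suitable strategies” mentioned above, and cmath lets the iterates become complex:

    import cmath

    def muller(f, x0, x1, x2, tol=1e-12, kmax=100):
        """x_{k+1} is a zero of the parabola through the last three points."""
        f0, f1, f2 = f(x0), f(x1), f(x2)
        for _ in range(kmax):
            # divided differences of the degree-2 interpolating polynomial
            d1 = (f1 - f0) / (x1 - x0)
            d2 = (f2 - f1) / (x2 - x1)
            a = (d2 - d1) / (x2 - x0)
            b = d2 + a * (x2 - x1)                 # slope of the parabola at x2
            disc = cmath.sqrt(b * b - 4 * a * f2)  # may be complex
            # pick the larger-magnitude denominator (stable root choice)
            den = b + disc if abs(b + disc) > abs(b - disc) else b - disc
            x3 = x2 - 2 * f2 / den
            if abs(x3 - x2) < tol:
                return x3
            x0, x1, x2 = x1, x2, x3
            f0, f1, f2 = f1, f2, f(x3)             # one new evaluation per step
        raise RuntimeError("no convergence within kmax iterations")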
Comparison among the various iterative methods:
• In order to apply Newton’s method, the function and its derivative should be known in closed explicit form; within this limitation, the method is very efficient
• Whenever f is not explicitly known, the most efficient methods are the secant and Muller methods, provided the function f is sufficiently smooth
• If f lacks regularity, other techniques must be applied (typically, bisection)
scheme        complexity             order       regularity
bisection     comp. of f(x)          γ = 1       C^0
fixed point   comp. of f(x)          γ = 1       C^1
secant        comp. of f(x)          γ ≈ 1.62    C^2
Muller        comp. of f(x)          γ ≈ 1.84    C^3
Newton        comp. of f(x), f′(x)   γ = 2       C^2