Lecture C12: Krylov subspace methods, Part I
Carl Christian Kjelgaard Mikkelsen
Department of Computing Science and HPC2N
Umeå University
October 2014

A motivating example
Consider the matrix

        [  0  2  1 ]   [ 1  1  1 ] [ 1  0  0 ] [  1  −1   0 ]
    A = [ −1  3  1 ] = [ 0  1  1 ] [ 0  2  0 ] [  1   0  −1 ]
        [ −2  2  3 ]   [ 1  0  1 ] [ 0  0  3 ] [ −1   1   1 ]

The characteristic polynomial is

    p(t) = (t − 1)(t − 2)(t − 3) = t³ − 6t² + 11t − 6

and, by the Cayley–Hamilton theorem,

    p(A) = A³ − 6A² + 11A − 6I = 0.

It follows that the inverse matrix

    A^{−1} = (1/6)(A² − 6A + 11I)

is a polynomial in A!
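A quick MATLAB check of these two identities (a minimal sketch):

    % Verify p(A) = 0 and the polynomial formula for inv(A)
    % for the 3-by-3 example above.
    A = [0 2 1; -1 3 1; -2 2 3];
    I = eye(3);
    norm(A^3 - 6*A^2 + 11*A - 6*I)        % ~0 by Cayley-Hamilton
    norm(inv(A) - (A^2 - 6*A + 11*I)/6)   % ~0, so inv(A) is a polynomial in A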
Another motivating example!
Let A = I − E where ‖E‖ = ρ < 1. Then A is nonsingular and

    A^{−1} = (I − E)^{−1} = Σ_{j=0}^{∞} E^j

is a power series in A, and the polynomial

    p_m(A) = Σ_{j=0}^{m−1} E^j = Σ_{j=0}^{m−1} (I − A)^j

satisfies

    ‖A^{−1} − p_m(A)‖ = ‖Σ_{j=m}^{∞} E^j‖ ≤ Σ_{j=m}^{∞} ρ^j = ρ^m/(1 − ρ).

In short, A^{−1} can be approximated by a polynomial in A!

Motivation
Conclusion: It certainly appears plausible that the solution of a linear system

    Ax = b

can be computed, or at least approximated, as x = p(A)b for a suitable polynomial p.
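A minimal MATLAB sketch of the truncated Neumann series, with an assumed example matrix E:

    % Truncated Neumann series for A = I - E, here with ||E||_2 = 0.4 < 1.
    n = 5;
    E = (0.4/n)*ones(n);            % assumed example; ||E||_2 = 0.4
    A = eye(n) - E;
    P = zeros(n); Ej = eye(n);
    for j = 1:20                    % P = sum_{j=0}^{19} E^j
        P = P + Ej;
        Ej = Ej*E;
    end
    norm(inv(A) - P)                % at most 0.4^20/0.6, i.e. tiny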
The motivation for Krylov subspace methods

Theorem
Let A ∈ R^{n×n} be nonsingular and let b ∈ R^n. Then there exists a real polynomial p,

    p(t) = Σ_{j=0}^{m−1} α_j t^j,

such that the solution x = A^{−1}b of Ax = b satisfies

    x = p(A)b = Σ_{j=0}^{m−1} α_j A^j b.

(This follows from the Cayley–Hamilton theorem, exactly as in the 3-by-3 example above.)

Remark
In general, m = n, but perhaps we can approximate

    x = A^{−1}b ≈ q(A)b

where q is a polynomial of lower degree?
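A sketch of where such a p comes from, using MATLAB's poly to obtain the characteristic polynomial of an assumed random A; this is an illustration, not a practical solver:

    % x = p(A)b where p is built from the characteristic polynomial:
    % A^n + c_2 A^{n-1} + ... + c_{n+1} I = 0 (Cayley-Hamilton), so
    % inv(A) = -(A^{n-1} + c_2 A^{n-2} + ... + c_n I)/c_{n+1}.
    rng(1); n = 6; A = randn(n); b = randn(n,1);   % assumed example
    c = poly(A);                    % coefficients, c(1) = 1
    x = zeros(n,1);
    for k = 1:n                     % Horner evaluation of p(A)b
        x = A*x + c(k)*b;
    end
    x = -x/c(n+1);
    norm(A*x - b)                   % ~0 up to rounding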
Motivation for preconditioners
In general, low order approximations are not sufficient, unless ...

Preconditioning
If M is nonsingular, then

    Ax = b  ⇔  M^{−1}Ax = M^{−1}b.

Perhaps we can approximate

    x = A^{−1}b = (M^{−1}A)^{−1}(M^{−1}b) ≈ w(M^{−1}A)M^{−1}b,

where w is a polynomial of lower degree?

Definition
The matrix M is the preconditioner and the system

    M^{−1}Ax = M^{−1}b

is the preconditioned system.
Krylov subspaces

Definition
Let A ∈ R^{n×n} and let b ∈ R^n. Then the Krylov subspace K_j(A, b) is defined as

    K_j(A, b) = span{b, Ab, A²b, ..., A^{j−1}b} ⊆ R^n.

Theorem
Let A ∈ R^{n×n} and let b ∈ R^n. Then the Krylov subspaces

    K_j(A, b) = span{A^i b : i = 0, 1, 2, ..., j − 1}

form a monotone increasing sequence of vector spaces, and

    K(A, b) := ∪_{j=1}^{∞} K_j(A, b) ⊆ R^n

is the smallest A-invariant subspace which contains b.
Krylov subspace methods
Krylov subspace methods for the linear system

    Ax = b

attempt to approximate the solution x = A^{−1}b with

    x_j ∈ x_0 + K_j(A, r_0),    r_0 = b − Ax_0,

that is,

    x_j = x_0 + p_j(A)r_0 = x_0 + Σ_{i=0}^{j−1} α_i^{(j)} A^i r_0,

where the formula used, i.e. the specific choice of polynomials, depends on the method. They are iterative methods, computing and evaluating each approximation x_j one at a time.

Numerical considerations
Remarks
1. If A has a dominant eigenvector, then it is likely that

       W_j = [ b  Ab  A²b  ...  A^{j−1}b ]

   is very, very ill conditioned for large values of j. (Why? See the sketch below.)
2. We need orthonormal vectors {v_i}_{i=1}^{j} such that

       K_j(A, b) = span{v_1, v_2, ..., v_j}

   in order to avoid numerical problems.
3. The vectors v_j can be computed using the Arnoldi algorithm, which we will cover during the next lecture.
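A minimal sketch of remark 1, with an assumed random A and b; the columns A^i b all align with the dominant eigenvector, so W_j rapidly loses numerical rank:

    % Condition number of the Krylov basis W_j = [b Ab ... A^{j-1}b].
    rng(0); n = 50; A = randn(n); b = randn(n,1);   % assumed example
    W = b;
    for j = 2:10
        W = [W, A*W(:,end)];        % append the column A^{j-1} b
        fprintf('j = %2d, cond(W_j) = %8.2e\n', j, cond(W));
    end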
Standard Krylov subspace methods
Krylov subspace methods for the linear system

    Ax = b

include:
1. CG: The Conjugate Gradient algorithm applies to systems where A is symmetric positive definite (SPD).
2. GMRES: The Generalized Minimal RESidual algorithm is the first method to try if A is not SPD.
3. BiCG: The BiConjugate Gradient algorithm applies to general linear systems, but the convergence can be quite erratic.
They are implemented in MATLAB as pcg, gmres, and bicg (a common calling pattern is sketched below).

Personal observation
In general, they are all equally good or equally bad. The choice of the preconditioner is much more critical!

CG algorithm
Let r_0 denote the initial residual,

    r_0 = b − Ax_0,

where x_0 denotes your initial guess for x.

Abstract characterization of CG
The CG algorithm returns approximations

    x_j ∈ x_0 + K_j(A, r_0),    j = 0, 1, ...,

such that

    ‖x − x_j‖_A = min_{q ∈ P_{j−1}} ‖(I − Aq(A))(x − x_0)‖_A,

where

    ‖x‖_A := √(x^T Ax)

is the norm induced by the matrix A.
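A sketch of the common calling convention for the three solvers, on an assumed SPD tridiagonal model problem:

    % Assumed example: SPD tridiagonal system solved by all three methods.
    n = 200; e = ones(n,1);
    A = spdiags([-e 2*e -e], -1:1, n, n);
    b = e;
    x1 = pcg(A, b, 1e-10, n);            % CG (A is SPD)
    x2 = gmres(A, b, [], 1e-10, n);      % GMRES, no restarts
    x3 = bicg(A, b, 1e-10, n);           % BiCG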
CG algorithm
Applies to the linear system

    Ax = b

where A is symmetric positive definite.

Algorithm 1 CG algorithm
    1: r_0 := b − Ax_0, p_0 := r_0
    2: for j = 0, 1, ..., until convergence do
    3:     α_j := (r_j, r_j)/(Ap_j, p_j)
    4:     x_{j+1} := x_j + α_j p_j
    5:     r_{j+1} := r_j − α_j Ap_j
    6:     β_j := (r_{j+1}, r_{j+1})/(r_j, r_j)
    7:     p_{j+1} := r_{j+1} + β_j p_j
    8: end for

We require inner products (x, y) = x^T y and matrix-vector products x → Ax. The A-inner product is (x, y)_A = (Ax, y).

Convergence of CG algorithm
Theorem
The jth iterate x_j satisfies

    ‖x − x_j‖_A ≤ 2 [ (√κ(A) − 1)/(√κ(A) + 1) ]^j ‖x − x_0‖_A,

where x is the solution of Ax = b, x_0 is the initial guess, and

    κ(A) = ‖A‖_2 ‖A^{−1}‖_2 = σ_max(A)/σ_min(A).

Proof.
Saad, Section 6.11.3.
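Algorithm 1 translates almost line for line into MATLAB; a minimal, unpreconditioned sketch, where the stopping test is one plausible choice:

    function x = cg(A, b, x0, tol, maxit)
    % Minimal CG sketch following Algorithm 1 (no preconditioning).
    x = x0; r = b - A*x; p = r; rho = r'*r;
    for j = 1:maxit
        Ap = A*p;
        alpha = rho/(p'*Ap);            % line 3
        x = x + alpha*p;                % line 4
        r = r - alpha*Ap;               % line 5
        rho_new = r'*r;
        if sqrt(rho_new) <= tol*norm(b), break; end
        beta = rho_new/rho;             % line 6
        p = r + beta*p;                 % line 7
        rho = rho_new;
    end
    end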
Preconditioning for CG, Part I
If M = LL^T can be found such that
1. the SPD matrix A′ = L^{−1}AL^{−T} has condition number

       1 ≤ κ₂(A′) ≪ κ₂(A),

2. the action of A′, i.e. the map x → A′x, can be computed quickly or, equivalently, systems of the form

       Ly = g    and    L^T z = h

   can be solved quickly,
then it would be extremely sensible to apply CG to the system

    L^{−1}AL^{−T}(L^T x) = L^{−1}b.

Preconditioning for CG, Part II
MATLAB proceeds in a manner which is mathematically equivalent. In general, if M is SPD, then

    Ã = M^{−1}A

is not symmetric, but Ã is symmetric with respect to the inner product

    (x, y)_M = (Mx, y) = (x, My).

The PCG algorithm is CG adapted to this inner product: the dot products, the linear updates, and the actions of A and M^{−1}. It is implemented in MATLAB as pcg.
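A sketch of point 1 on an assumed model problem; forming A′ explicitly is done here for illustration only:

    % Condition estimates with and without incomplete Cholesky.
    A  = delsq(numgrid('S', 52));       % SPD 2D Laplacian, 50x50 grid
    L  = ichol(A);
    Ap = (L\A)/L';                      % A' = inv(L)*A*inv(L'), illustration only
    fprintf('condest(A)  = %.2e\n', condest(A));
    fprintf('condest(A'') = %.2e\n', condest(Ap));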
PCG algorithm in MATLAB
First preconditioner to try:

    L = ichol(A)

Run the CG algorithm on M^{−1}Ax = M^{−1}b, M = LL^T:

    [x,flag,relres,it,resvec] = pcg(A,b,tol,maxit,L,L')

tol is the tolerance; the iteration stops if ‖r_j‖_2 ≤ tol ‖r_0‖_2.
relres is the relative residual for the final approximation x.
it is the number of completed iterations.
resvec contains the 2-norms of the computed residuals.

Warning
You have to compute the residuals for the original system!

MATLAB example

    >> N=100; G=numgrid('S',N+2); A=delsq(G);
    >> b=ones(N^2,1);
    >> L=ichol(A);
    >> [x,flag,relres,it,resvec] = pcg(A,b,1e-16,100,L,L');

[Figure: log10 of the relative residual versus the number of iterations (0 to 100), with and without preconditioning.]
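A sketch that reproduces the comparison in the figure; the plotting details are assumptions:

    % Compare plain CG and incomplete-Cholesky PCG on the example above.
    N = 100; G = numgrid('S', N+2); A = delsq(G); b = ones(N^2,1);
    L = ichol(A);
    [~,~,~,~,rv0] = pcg(A, b, 1e-16, 100);          % no preconditioning
    [~,~,~,~,rv1] = pcg(A, b, 1e-16, 100, L, L');   % M = L*L'
    semilogy(0:numel(rv0)-1, rv0/norm(b), ...
             0:numel(rv1)-1, rv1/norm(b));
    legend('No preconditioning', 'Preconditioning');
    xlabel('number of iterations'); ylabel('relative residual');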
GMRES algorithm
Let r_0 denote the initial residual,

    r_0 = b − Ax_0,

where x_0 is your initial guess for x.

Abstract characterization of GMRES
The GMRES algorithm returns approximations x_j ∈ x_0 + K_j(A, r_0) that minimize the corresponding residual, specifically

    ‖b − Ax_j‖_2 = min{‖b − Ax‖_2 : x ∈ x_0 + K_j(A, r_0)},

which ensures that the residual norms are monotonically decreasing.

Convergence of GMRES
Theorem
Let A be such that ‖I − A‖_2 = ρ < 1. Set r_j = b − Ax_j. Then

    ‖r_j‖ ≤ [(1 + ρ)/(1 − ρ)] ρ^j ‖r_0‖_2.

Proof.
Definition of GMRES and analysis of the Jacobi iteration.

Remark
We want M such that M^{−1}A = I − E, where E = I − M^{−1}A satisfies

    ‖E‖_2 = ρ < 1,

and then we will apply GMRES to

    M^{−1}Ax = M^{−1}b.
GMRES in MATLAB
First preconditioner to try:

    [L,U] = ilu(A)

Run the GMRES algorithm on M^{−1}Ax = M^{−1}b with M = LU:

    [x,flag,relres,it,resvec] = gmres(A,b,[],tau,maxit,L,U)

A is the matrix.
b is the right hand side.
tau is the tolerance; the iteration stops if ‖M^{−1}r_j‖_2 ≤ τ ‖M^{−1}r_0‖_2.
maxit is the maximum number of iterations.
L and U make up the preconditioner M = LU.

Warning
You have to compute the residuals for the original system!

Summary
1. Krylov subspace methods for Ax = b hinge on the fact that

       A^{−1}b = p(A)b

   for at least one polynomial p.
2. They build approximations x_j of the form

       x_j ∈ x_0 + K_j(A, r_0),    r_0 = b − Ax_0.

3. The Krylov subspaces are given by

       K_j(A, r_0) = span{r_0, Ar_0, A²r_0, ..., A^{j−1}r_0}.

4. The choice of the preconditioner M is all-critical; we apply our Krylov subspace methods to the preconditioned linear system

       M^{−1}Ax = M^{−1}b,

   which is equivalent to Ax = b.
Further reading
1. Short lecture notes by Robert Skeel, available at https://www.cs.purdue.edu/homes/skeel/CS515/, deal with numerical linear algebra in general and contain many review questions.
2. Skeel's chapters 8 and 9 deal with Krylov subspace methods and point to:
3. Yousef Saad's book "Iterative methods for sparse linear systems", which is freely available at http://www-users.cs.umn.edu/~saad/PS/