
Numerical Linear Algebra
Chap. 3: Eigenvalue Problems
Heinrich Voss
[email protected]
Hamburg University of Technology
Institute of Numerical Simulation
2006
Eigenvalues
λ ∈ C is an eigenvalue of A ∈ Cn×n if the homogeneous linear system of
equations
Ax = λx
has a nontrivial solution x ∈ Cn \ {0}. Then, x is called an eigenvector of A
corresponding to λ.
The set of all eigenvalues of A is called the spectrum of A and is denoted by
σ(A).
λ is an eigenvalue of A if and only if
det(A − λI) = 0.
χ(λ) := det(A − λI) is a polynomial of degree n, the characteristic polynomial
of A.
Eigenvalues ct.
If λ̃ is a root of χ of multiplicity k (i.e. the polynomial χ(λ) is divisible by
(λ − λ̃)^k but not by (λ − λ̃)^{k+1}), then k is called the algebraic multiplicity of λ̃.
The algebraic multiplicity of λ̃ is denoted by α(λ̃).
For A ∈ C^{n×n} the characteristic polynomial χ has degree n. Hence, the sum of
all algebraic multiplicities of eigenvalues equals n.
If λ is an eigenvalue of A then
Eλ := {x ∈ Cn : (A − λI)x = 0}
is a subspace of Cn , which is called the eigenspace of A corresponding to λ.
γ(λ) := dim Eλ is the geometric multiplicity of an eigenvalue λ of A.
It can be shown that γ(λ) ≤ α(λ) for every eigenvalue λ.
Similar matrices
Let X ∈ Cn×n be nonsingular. Then
A and B := X −1 AX
are called similar matrices. A ↦ X^{−1}AX is called a similarity transformation.
Since
det(B − λI) = det(X^{−1}(A − λI)X) = det(X^{−1}) det(A − λI) det(X) = det(A − λI),
similar matrices have the same eigenvalues including their algebraic
multiplicities.
It can be shown that the geometric multiplicities coincide as well.
Diagonalizable matrix
Let Ax^j = λ_j x^j, j = 1, . . . , k, where λ_i ≠ λ_j for i ≠ j. Then the set {x^1, . . . , x^k}
is linearly independent.
Let x = Σ_{j=1}^k α_j x^j = 0. For j ∈ {1, . . . , k} it follows
(A − λ_1 I) · · · (A − λ_{j−1} I)(A − λ_{j+1} I) · · · (A − λ_k I)x = α_j ∏_{i=1, i≠j}^k (λ_j − λ_i) x^j = 0,
and therefore α_j = 0.
In particular, if A has n different eigenvalues λ_j with eigenvectors x^j, then
X := (x^1, . . . , x^n) is nonsingular, and it holds
AX = (Ax^1, . . . , Ax^n) = (λ_1 x^1, . . . , λ_n x^n) = XΛ   ⟺   X^{−1}AX = Λ,
where Λ := diag(λ_1, . . . , λ_n) denotes the diagonal matrix with entries λ_1, . . . , λ_n.
Hence, A is diagonalizable, i.e. similar to a diagonal matrix.
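As a quick numerical illustration (not part of the original slides), the relation X^{−1}AX = Λ can be checked with NumPy's eigendecomposition; the 3×3 matrix below is an arbitrary example with distinct eigenvalues.

```python
import numpy as np

# Arbitrary example matrix with distinct eigenvalues.
A = np.array([[4.0, 1.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, X = np.linalg.eig(A)          # columns of X are eigenvectors x^1, ..., x^n
Lam = np.linalg.inv(X) @ A @ X     # X^{-1} A X

print(np.allclose(Lam, np.diag(lam)))   # True: A is diagonalized by X
```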
Diagonalizable matrix ct.
More generally, if for all eigenvalues λ_j, j = 1, . . . , k, of A the algebraic and
geometric multiplicities coincide (α(λ_j) = γ(λ_j)), then choosing in each of the
eigenspaces E_{λ_j} a basis x^{j,1}, . . . , x^{j,α(λ_j)}, the matrix
X = (x^{1,1}, . . . , x^{1,α(λ_1)}, x^{2,1}, . . . , x^{k,α(λ_k)})
is nonsingular, and it diagonalizes A.
It can be shown that A is diagonalizable if and only if α(λj ) = γ(λj ) for every
eigenvalue λj of A.
For
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
we have α(0) = 2 ≠ 1 = γ(0), and therefore not every matrix is diagonalizable.
Jordan’s canonical form
Let A ∈ C^{n×n} with distinct eigenvalues λ_1, . . . , λ_k. Then there exists a
nonsingular matrix X such that

X^{−1}AX = diag(J_1, . . . , J_k) := \begin{pmatrix}
J_1 & O & \dots & O \\
O & J_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & O \\
O & \dots & O & J_k
\end{pmatrix}

is a block diagonal matrix.
Each of the diagonal blocks J_j = diag(J_{j,1}, . . . , J_{j,γ(λ_j)}) is a block diagonal
matrix of dimension α(λ_j) with γ(λ_j) blocks, where

J_{j,i} = \begin{pmatrix}
λ_j & 1 & \dots & 0 \\
0 & λ_j & \ddots & \vdots \\
\vdots & \ddots & \ddots & 1 \\
0 & \dots & 0 & λ_j
\end{pmatrix}
Hermitian matrices
A ∈ R^{n×n} is symmetric if A = A^T. More generally, A ∈ C^{n×n} is a Hermitian
matrix if A^H := \bar{A}^T = A, where \bar{A} denotes the matrix obtained from A by
replacing each of its entries by its complex conjugate.
All eigenvalues of a Hermitian matrix are real: for Ax = λx, x ≠ 0, it holds
x^H Ax = x^H(λx) = λ x^H x   and   x^H Ax = (A^H x)^H x = (Ax)^H x = (λx)^H x = λ̄ x^H x,
from which we get λ̄ = λ, i.e. λ ∈ R.
Eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are
orthogonal: for Ax = λx, Ay = µy and λ ≠ µ it holds
y^H Ax = λ y^H x   and   y^H Ax = (A^H y)^H x = (Ay)^H x = µ y^H x.
Hence, (λ − µ) y^H x = 0, and λ ≠ µ implies y^H x = 0.
Invariant subspace
A subspace V of Cn is an invariant subspace of A if Ax ∈ V for every x ∈ V .
Every invariant subspace of A contains an eigenvector of A.
Let x^1, . . . , x^k ∈ C^n be a basis of V. Then for j = 1, . . . , k there exist b_{ij} ∈ C
such that Ax^j = Σ_{i=1}^k b_{ij} x^i.
Let λ be an eigenvalue of B = (b_{ij}) ∈ C^{k×k} with eigenvector ξ = (ξ_1, . . . , ξ_k)^T,
and let x := Σ_{i=1}^k ξ_i x^i ≠ 0. Then
Ax = Σ_{j=1}^k ξ_j Ax^j = Σ_{j=1}^k Σ_{i=1}^k ξ_j b_{ij} x^i = Σ_{i=1}^k (Σ_{j=1}^k b_{ij} ξ_j) x^i = Σ_{i=1}^k λ ξ_i x^i = λx,
i.e. x ∈ V is an eigenvector of A corresponding to λ.
Hermitian matrices are diagonalizable
Let A be a Hermitian matrix. Then there exists a unitary matrix U ∈ Cn×n (i.e.
U H U = I) such that
U H AU = diag(λ1 , . . . , λn ).
Let x 1 be an eigenvector of A such that Ax 1 = λ1 x 1 and (x 1 )H x 1 = 1. Then
for x ∈ Cn such that x H x 1 = 0 it holds
(Ax)H x 1 = x H AH x 1 = x H (Ax 1 ) = λ1 x H x 1 = 0.
Hence, V1 := {x ∈ Cn : x H x 1 = 0} is an invariant subspace of A, and
therefore it contains an eigenvector x 2 which can be normalized such that
(x 2 )H x 2 = 1.
If x 1 , . . . , x j are j orthogonal eigenvectors of A, then in the same way as before
Vj := {x 1 , . . . , x j }⊥ = {x ∈ Cn : x H x i = 0, i = 1, . . . , j}
is an invariant subspace of A, and hence there exists an eigenvector x j+1
which is orthogonal to x 1 , . . . , x j .
U := (x^1, . . . , x^n) then has the desired property.
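A small numerical sketch of this statement (the random Hermitian test matrix is our own choice): numpy.linalg.eigh returns real eigenvalues and a unitary matrix U with U^H A U diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2                     # Hermitian test matrix

lam, U = np.linalg.eigh(A)                   # real eigenvalues, orthonormal eigenvectors
print(np.allclose(U.conj().T @ U, np.eye(4)))        # U^H U = I
print(np.allclose(U.conj().T @ A @ U, np.diag(lam))) # U^H A U = diag(lambda_1, ..., lambda_n)
```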
Rayleigh’s principle
Let A ∈ C^{n×n} be a Hermitian matrix. Then for x ≠ 0
R_A(x) := (x^H A x) / (x^H x)
is called the Rayleigh quotient of A at x.
Let λ_1 ≤ λ_2 ≤ · · · ≤ λ_n be the eigenvalues of A, and let x^1, . . . , x^n be a
corresponding orthonormal system of eigenvectors. Then it holds
λ_1 = min_{x ≠ 0} R_A(x)   and   λ_n = max_{x ≠ 0} R_A(x).
More generally, for i = 1, 2, . . . , n it holds
λ_i = min{R_A(x) : x ∈ C^n, x ≠ 0, x^H x^j = 0, j = 1, . . . , i − 1}
    = max{R_A(x) : x ∈ C^n, x ≠ 0, x^H x^j = 0, j = i + 1, . . . , n}.
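A minimal sketch of Rayleigh's principle (the helper function and the random symmetric test matrix are ours, not from the slides): every Rayleigh quotient of a Hermitian matrix lies between λ_1 and λ_n.

```python
import numpy as np

def rayleigh(A, x):
    """Rayleigh quotient R_A(x) = x^H A x / (x^H x)."""
    return (x.conj() @ (A @ x)) / (x.conj() @ x)

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # real symmetric (hence Hermitian) test matrix
lam = np.linalg.eigvalsh(A)             # lambda_1 <= ... <= lambda_n

for _ in range(3):
    x = rng.standard_normal(5)
    print(lam[0] <= rayleigh(A, x) <= lam[-1])   # True up to rounding
```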
Proof of Rayleigh’s principle
Let x^1, . . . , x^n be an orthonormal system of eigenvectors of A ∈ C^{n×n} where
Ax^j = λ_j x^j.
For x ∈ C^n, x ≠ 0, let x = Σ_{j=1}^n ξ_j x^j. Then
x^H x = (Σ_{j=1}^n ξ_j x^j)^H (Σ_{k=1}^n ξ_k x^k) = Σ_{j,k=1}^n \bar{ξ}_j ξ_k (x^j)^H x^k = Σ_{j=1}^n |ξ_j|²,
x^H A x = (Σ_{j=1}^n ξ_j x^j)^H A (Σ_{k=1}^n ξ_k x^k) = (Σ_{j=1}^n ξ_j x^j)^H (Σ_{k=1}^n ξ_k λ_k x^k) = Σ_{j=1}^n λ_j |ξ_j|².
Hence,
R_A(x) = Σ_{j=1}^n α_j λ_j   with   α_j = |ξ_j|² / Σ_{k=1}^n |ξ_k|².
Proof of Rayleigh’s principle
From 0 ≤ α_j ≤ 1 and Σ_{j=1}^n α_j = 1 one obtains
λ_1 = Σ_{j=1}^n α_j λ_1 ≤ Σ_{j=1}^n α_j λ_j = R_A(x) ≤ Σ_{j=1}^n α_j λ_n = λ_n,
and the bounds are attained: λ_1 = R_A(x^1), λ_n = R_A(x^n).
The characterizations
λ_i = min{R_A(x) : x ∈ C^n, x^H x^j = 0, j = 1, . . . , i − 1}
    = max{R_A(x) : x ∈ C^n, x^H x^j = 0, j = i + 1, . . . , n}
follow in a similar way since ξ_1 = · · · = ξ_{i−1} = 0 if x^H x^j = 0 for j = 1, . . . , i − 1.
Numerical methods
Linear systems of equations Ax = b can be solved by a finite algorithm (i.e. a
finite number of operations) such as Gaussian elimination.
Determining an eigenvalue of a matrix A ∈ R^{n×n} is equivalent to finding a root
of the characteristic polynomial
χ(λ) := det(A − λI) = 0.
It is known (by Abel's theorem) that for n ≥ 5 there is no general solution formula for
det(A − λI) = 0
in terms of λ. Hence, the eigenvalue problem Ax = λx usually can be solved only by
iterative methods.
Example


A = \begin{pmatrix} 0.2 & 0.3 & 0.4 \\ 0.6 & 0.2 & 0.5 \\ 0.2 & 0.5 & 0.1 \end{pmatrix}

Choose any vector x^0 ∈ R^3 and compute the sequence
x^k := A x^{k−1}, k = 1, 2, 3, . . .
After a small number of steps (≈ 10) we obtain
x^k ≈ (0.5122, 0.6974, 0.5013)^T and ‖A x^k − x^k‖ small.
x k seems to be an eigenvector corresponding to the eigenvalue λ = 1.
Is this a miracle?
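The experiment can be reproduced with a few lines of NumPy (a sketch; the starting vector is arbitrary, so the limit agrees with the vector above only up to scaling):

```python
import numpy as np

A = np.array([[0.2, 0.3, 0.4],
              [0.6, 0.2, 0.5],
              [0.2, 0.5, 0.1]])

x = np.ones(3)                     # any starting vector x^0
for k in range(10):
    x = A @ x                      # x^k := A x^(k-1)

print(x)                           # a multiple of (0.5122, 0.6974, 0.5013)
print(np.linalg.norm(A @ x - x))   # small: x is (nearly) an eigenvector for lambda = 1
```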
A is stochastic
All elements of A are nonnegative, and every column of A adds to 1. Matrices
with these properties are called stochastic. They describe the behavior of
Markov chains.
If A is stochastic, then every row of AT adds to 1, and therefore (1, 1, . . . , 1)T
is an eigenvector of AT corresponding to the eigenvalue 1.
det(A − λI) = det(AT − λI)
implies that the eigenvalues of A and AT coincide. Hence, every stochastic
matrix has one eigenvalue λ = 1.
Power method
Assume that A is diagonalizable, i.e. there exist n linearly independent
eigenvectors u^1, . . . , u^n of A, and assume that λ_1 is a dominant eigenvalue:
|λ_1| > |λ_j| for j = 2, . . . , n.
The initial vector x^0 can be represented as
x^0 = Σ_{j=1}^n α_j u^j.
Then
A x^0 = A Σ_{j=1}^n α_j u^j = Σ_{j=1}^n α_j A u^j = Σ_{j=1}^n α_j λ_j u^j.
Power method ct.
A² x^0 = A Σ_{j=1}^n α_j λ_j u^j = Σ_{j=1}^n α_j λ_j A u^j = Σ_{j=1}^n α_j λ_j² u^j.
By induction it follows
A^m x^0 = Σ_{j=1}^n α_j λ_j^m u^j = λ_1^m ( α_1 u^1 + Σ_{j=2}^n α_j (λ_j/λ_1)^m u^j ).
From |λ_j|/|λ_1| < 1 it follows that (λ_j/λ_1)^m → 0. Hence, if α_1 ≠ 0, then the
sequence
λ_1^{−m} A^m x^0 = α_1 u^1 + Σ_{j=2}^n α_j (λ_j/λ_1)^m u^j
converges to an eigenvector corresponding to λ_1.
Power method ct.
If |λ_1| ≠ 1, then for increasing m one obtains overflow or underflow.
Apply the method to
B = \begin{pmatrix} 0.2 & 0.3 & 0.4 \\ 0.6 & -0.1 & 0.5 \\ 0.2 & 0.5 & 0.1 \end{pmatrix}
The sequence x m converges to the null vector. The largest eigenvalue of B in
modulus seems to be smaller than 1.
Power method
Normalize x^m in each step to avoid underflow or overflow.

Power method
1: Given initial vector x^0
2: for m = 0, 1, 2, . . . until convergence do
3:    y^{m+1} = A x^m
4:    k_{m+1} = ‖y^{m+1}‖
5:    x^{m+1} = y^{m+1} / k_{m+1}
6: end for
With this modification the power method converges in a reasonable number of
steps to an eigenvector corresponding to the dominant eigenvalue
λ1 = 0.9304.
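A possible NumPy realization of the algorithm above (a sketch; the function name, the Rayleigh-quotient estimate of the eigenvalue, and the stopping test are our additions):

```python
import numpy as np

def power_method(A, x0, tol=1e-10, maxit=1000):
    """Normalized power iteration; returns an estimate of the dominant eigenpair."""
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(maxit):
        y = A @ x                          # y^(m+1) = A x^m
        k = np.linalg.norm(y)              # k_(m+1) = ||y^(m+1)||
        x = y / k                          # x^(m+1) = y^(m+1) / k_(m+1)
        lam = x @ (A @ x)                  # Rayleigh-quotient estimate of lambda_1
        if np.linalg.norm(A @ x - lam * x) < tol:
            break
    return lam, x

B = np.array([[0.2, 0.3, 0.4],
              [0.6, -0.1, 0.5],
              [0.2, 0.5, 0.1]])
lam, u = power_method(B, np.ones(3))
print(lam)                                   # estimate of the dominant eigenvalue of B
print(np.max(np.abs(np.linalg.eigvals(B))))  # reference value for comparison
```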
Observations
λ_1^{−m} A^m x^0 = α_1 u^1 + Σ_{j=2}^n α_j (λ_j/λ_1)^m u^j
demonstrates that the speed of convergence depends on
q := max_{j=2,...,n} |λ_j| / |λ_1|.
The smaller q is, the faster the power method converges.
If the initial vector x 0 has no component of the eigenvector corresponding to
the dominant eigenvalue (i.e. α1 = 0), then in the course of the algorithm
rounding errors usually produce a component of u 1 which is amplified in
further iterations until convergence.
Starting the power method for A with a linear combination of eigenvectors
corresponding to λ2 and λ3 one obtains a reasonable approximation to an
eigenvector corresponding to λ1 after 40 iterations.
Observations ct.
If λ_1 is a multiple dominant eigenvalue of A,
λ_1 = λ_2 = · · · = λ_p,   |λ_1| > |λ_j| for j = p + 1, . . . , n,
and A is diagonalizable, then all considerations above stay true.

For
|λ_1| = |λ_2| > |λ_j| for j = 3, . . . , n,   and λ_1 ≠ λ_2,
one does not obtain convergence of the power method.
In steps 4 and 5 of the power method the normalization can be replaced by a
scaling
k_{m+1} = ℓ^T y^{m+1},
where ℓ ∈ R^n is a vector which is not orthogonal to the eigenvector u^1
corresponding to the dominant eigenvalue.
Inverse iteration
Applying the power method to the inverse matrix A−1 one can determine the
smallest eigenvalue in modulus.
Inverse iteration
Given initial vector x 0
for m = 0, 1, 2, . . . until convergence do
Solve Ay m+1 = x m for y m+1
km+1 = ky m+1 k
x m+1 = y m+1 /km+1
end for
Applying inverse iteration to the matrix B one gets fast convergence to an
eigenvector corresponding to the smallest eigenvalue λ3 = −0.2111. For A
the convergence is very slow. What is the difference?
Inverse iteration ct.
The shifted matrix A − λ̃I has eigenvalues λ_j − λ̃, if λ_j are the eigenvalues of A.
If λ̃ is not an eigenvalue of A, then (A − λ̃I)^{−1} has eigenvalues 1/(λ_j − λ̃).
If |λ_p − λ̃| < |λ_j − λ̃| for j = 1, . . . , n, j ≠ p, then
Inverse iteration with fixed shift
Given initial vector x 0
for m = 0, 1, 2, . . . until convergence do
Solve (A − λ̃I)y m+1 = x m for y m+1
km+1 = `T y m+1
x m+1 = y m+1 /km+1
end for
converges to an eigenvector corresponding to λ_p. The rate of convergence is
q = max_{j ≠ p} |λ_p − λ̃| / |λ_j − λ̃|.
Inverse iteration with variable shifts
For large m it holds that x^m is an approximate eigenvector corresponding to
λ_p and ℓ^T x^m = 1. Hence,
k_{m+1} = ℓ^T y^{m+1} = ℓ^T ((A − λ̃I)^{−1} x^m) ≈ (1/(λ_p − λ̃)) ℓ^T x^m = 1/(λ_p − λ̃).
This observation suggests iterating the shift as well:
k_{m+1} ≈ 1/(λ_p − λ_m)   ⟹   λ_{m+1} := λ_m + 1/k_{m+1}
Inverse iteration with variable shifts
Given initial vector x 0 and initial approximation λ0
for m = 0, 1, 2, . . . until convergence do
Solve (A − λm I)y m+1 = x m for y m+1
km+1 = `T y m+1
x m+1 = y m+1 /km+1
λm+1 = λm + 1/km+1
end for
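A sketch of this iteration in NumPy (the function name, stopping test, and the choice ℓ = (1, . . . , 1)^T are ours); applied to the stochastic matrix A from the earlier example with a starting shift near 1 it converges in a handful of steps:

```python
import numpy as np

def inverse_iteration(A, x0, lam0, ell=None, tol=1e-10, maxit=50):
    """Inverse iteration with variable shifts, following the pseudocode above."""
    n = A.shape[0]
    ell = np.ones(n) if ell is None else ell          # scaling vector l with l^T u != 0
    x, lam = x0, lam0
    for _ in range(maxit):
        y = np.linalg.solve(A - lam * np.eye(n), x)   # solve (A - lambda_m I) y^(m+1) = x^m
        k = ell @ y                                   # k_(m+1) = l^T y^(m+1)
        x = y / k                                     # x^(m+1) = y^(m+1) / k_(m+1)
        lam_new = lam + 1.0 / k                       # lambda_(m+1) = lambda_m + 1/k_(m+1)
        if abs(lam_new - lam) < tol * (1.0 + abs(lam_new)):
            return lam_new, x
        lam = lam_new
    return lam, x

A = np.array([[0.2, 0.3, 0.4],
              [0.6, 0.2, 0.5],
              [0.2, 0.5, 0.1]])
lam, u = inverse_iteration(A, np.ones(3), lam0=0.9)   # start near the eigenvalue 1
print(lam)                                            # converges (quadratically) towards 1.0
```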
Quadratic convergence
Let λ̃ be an algebraically simple eigenvalue of A (i.e. λ̃ is a simple root of
det(A − λI) = 0), and let ũ be a corresponding eigenvector such that ℓ^T ũ = 1.
Then inverse iteration with variable shifts converges locally and quadratically
to (λ̃, ũ): there exists a constant C > 0 such that, if λ_0 is
sufficiently close to λ̃ and x^0 is sufficiently close to ũ, then it holds
|λ̃ − λ_{m+1}| ≤ C|λ̃ − λ_m|²   and   ‖ũ − x^{m+1}‖ ≤ C‖ũ − x^m‖².
Deflation
Assume that we have already obtained the largest (smallest, closest to a
given shift) eigenvalue λ̃ and corresponding eigenvector ũ.
How can we compute further eigenpairs by the power method?
Let ỹ be a left eigenvector of A corresponding to some eigenvalue µ̃ ≠ λ̃, i.e.
ỹ^T A = µ̃ ỹ^T.
Then it holds
µ̃ ỹ^T ũ = (ỹ^T A)ũ = ỹ^T (Aũ) = λ̃ ỹ^T ũ   ⟹   ỹ^T ũ = 0.
Deflation ct.
Let B := A − ũ w^T, where w ∈ R^n satisfies w^T ũ ≠ 0. Then
B ũ = Aũ − ũ w^T ũ = (λ̃ − w^T ũ)ũ,
i.e. ũ is an eigenvector of B corresponding to the eigenvalue λ̃ − w^T ũ.
For an eigenvalue µ̃ ≠ λ̃ of A with corresponding left eigenvector ỹ it holds
ỹ^T B = ỹ^T A − (ỹ^T ũ)w^T = µ̃ ỹ^T.
Hence, all eigenvalues µ̃ ≠ λ̃ of A are kept (only the right eigenvectors may change),
whereas the eigenvalue λ̃ is replaced by λ̃ − w^T ũ, which can be moved anywhere by
the choice of w (for instance to 0, in order to compute the second largest eigenvalue
of A in modulus by the power method).
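A small sketch of this deflation in NumPy (for brevity the pair (λ̃, ũ) is taken from numpy.linalg.eig instead of the power method; the particular w below is one choice with w^T ũ = λ̃, which moves λ̃ to 0):

```python
import numpy as np

A = np.array([[0.2, 0.3, 0.4],
              [0.6, 0.2, 0.5],
              [0.2, 0.5, 0.1]])

lam, V = np.linalg.eig(A)
i = np.argmax(np.abs(lam))                    # dominant eigenpair (here the eigenvalue 1 is real)
lam1, u1 = lam[i].real, V[:, i].real

w = lam1 * u1 / (u1 @ u1)                     # w^T u1 = lam1, so lam1 is moved to 0
B = A - np.outer(u1, w)                       # deflated matrix B = A - u1 w^T

print(np.sort_complex(np.linalg.eigvals(A)))  # eigenvalues of A: a complex pair and 1
print(np.sort_complex(np.linalg.eigvals(B)))  # same spectrum, with 1 replaced by 0
```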
Symmetric matrices
Let A = A^T ∈ R^{n×n} be a symmetric matrix, λ̃ an eigenvalue of A, and ũ a
corresponding eigenvector such that ‖ũ‖ = 1.
Let
B = A − λ̃ ũ ũ^T.
If v ∈ R^n is an eigenvector of A (Av = µv) such that v^T ũ = 0, then
Bv = Av − λ̃ ũ (ũ^T v) = Av = µv.
Hence, all eigenvalues of A which are different from λ̃ are eigenvalues of B as
well. 0 is an eigenvalue of B replacing λ̃. If λ̃ is a multiple eigenvalue of A,
then λ̃ is an eigenvalue of B, but the multiplicity is reduced by 1.
QR Algorithm
QR algorithm
A0 := A
for m = 0, 1, 2, . . . until convergence do
Factorize Am = Qm Rm
Am+1 = Rm Qm
end for
A_{m+1} = R_m Q_m = Q_m^T (Q_m R_m) Q_m = Q_m^T A_m Q_m.
Hence, all Am are (orthogonally) similar, and therefore they have the same
eigenvalues.
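A direct transcription of this iteration in NumPy (a sketch; a fixed iteration count replaces the convergence test, and the test matrix is the first example from the Examples slide below):

```python
import numpy as np

def qr_algorithm(A, iters=100):
    """Unshifted QR algorithm: A_{m+1} = R_m Q_m with A_m = Q_m R_m."""
    Am = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Am)   # factorize A_m = Q_m R_m
        Am = R @ Q                # A_{m+1} = R_m Q_m, similar to A_m
    return Am

A = np.array([[1.0, -1.0, -1.0],
              [4.0,  6.0,  3.0],
              [-4.0, -4.0, -1.0]])
print(np.round(qr_algorithm(A), 4))    # (nearly) upper triangular; the eigenvalues of A appear on the diagonal
print(np.linalg.eigvals(A))            # reference eigenvalues
```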
QR algorithm ct.
If the eigenvalues of A are pairwise distinct in modulus,
|λ_1| > |λ_2| > · · · > |λ_n|,
and if a further technical condition is satisfied, then the QR algorithm
converges in the following sense:
If (A_m)_{jk} = a_{jk}^{(m)}, then
lim_{m→∞} a_{jk}^{(m)} = 0   for j > k,
lim_{m→∞} a_{jj}^{(m)} = λ_j   for j = 1, . . . , n.
QR algorithm and power method
With
U_m = Q_1 Q_2 · · · Q_m,   S_m = R_m R_{m−1} · · · R_1
it holds
A^m = U_m S_m.   (∗)
For m = 1 the statement is trivial: A = Q_1 R_1 = U_1 S_1.
A_{m+1} = R_m Q_m = Q_m^T A_m Q_m yields by induction A_{m+1} = U_m^T A U_m.
QR algorithm and power method ct.
If (∗) is valid for some m − 1, then it follows from the definition of A_{m+1} that
R_m = A_{m+1} Q_m^T = U_m^T A U_m Q_m^T = U_m^T A U_{m−1}.
Multiplying by S_{m−1} from the right and by U_m from the left we obtain
U_m S_m = A U_{m−1} S_{m−1} = A^m,
which is the proposition for m.
From (∗) we obtain for the first unit vector e^1, with ρ := (S_m)_{1,1},
A^m e^1 = U_m S_m e^1 = ρ U_m e^1.
Hence, the first column of U_m has the same direction as the m-th iterate of the
power method with initial vector e^1, and it is not surprising that a_{11}^{(m)} converges
to the largest eigenvalue of A in modulus and the first column of U_m to a
corresponding eigenvector.
Examples
For
A = \begin{pmatrix} 1 & -1 & -1 \\ 4 & 6 & 3 \\ -4 & -4 & -1 \end{pmatrix}
the upper triangular form appears after approximately 10 steps, and the
diagonal elements are in the right order.
For
B = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & -1 \\ -2 & -2 & 2 \end{pmatrix}
the upper triangular form is reached after approximately 20 steps, but the
diagonal elements are not ordered by magnitude (so the technical condition
of the last theorem is not satisfied).
After a further 50 steps the diagonal elements are ordered by magnitude.
QR algorithm with shifts
QR algorithm with shifts
A0 := A
for m = 0, 1, 2, . . . until convergence do
Choose a suitable shift κm
Factorize Am − κm I = Qm Rm
Am+1 = Rm Qm + κm I
end for
Again all matrices A_m are similar:
A_{m+1} = R_m Q_m + κ_m I = Q_m^T (Q_m R_m) Q_m + κ_m I = Q_m^T (A_m − κ_m I) Q_m + κ_m I = Q_m^T A_m Q_m,
and therefore all eigenvalues of the matrices A_m coincide.
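The same loop in NumPy, using the shift κ_m = a_{n,n}^{(m)} that is motivated on the "Choice of shifts" slides below (a sketch; the simple deflation step, which shrinks the active block once its last row has converged, is our addition and is not part of the pseudocode above):

```python
import numpy as np

def qr_algorithm_shifted(A, tol=1e-12, maxit=1000):
    """QR algorithm with shifts kappa_m = (A_m)_{n,n} on the active leading block,
    with a simple deflation step once the last row of the block has converged."""
    Am = np.array(A, dtype=float)
    n = Am.shape[0]
    k = n                                            # size of the active leading block
    for _ in range(maxit):
        if k == 1:
            break
        if np.linalg.norm(Am[k - 1, :k - 1]) < tol * np.linalg.norm(Am):
            k -= 1                                   # deflate: eigenvalue at (k-1, k-1) has converged
            continue
        kappa = Am[k - 1, k - 1]                     # shift: last diagonal entry of the active block
        Q, R = np.linalg.qr(Am[:k, :k] - kappa * np.eye(k))
        Qfull = np.eye(n)
        Qfull[:k, :k] = Q
        Am = Qfull.T @ Am @ Qfull                    # similarity transform: block becomes R Q + kappa I
    return Am

B = np.array([[1.0,  0.0,  1.0],
              [2.0,  3.0, -1.0],
              [-2.0, -2.0, 2.0]])
print(np.round(qr_algorithm_shifted(B), 6))          # (nearly) upper triangular; eigenvalues on the diagonal
```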
Choice of shifts
Let Qj and Rj be the orthogonal and upper triangular matrices obtained in the
QR algorithm with shifts κj , and let
U_m = Q_1 Q_2 · · · Q_m,   S_m = R_m R_{m−1} · · · R_1.
Then
U_m S_m = (A − κ_m I)(A − κ_{m−1} I) · · · (A − κ_1 I).   (+)
From A_{m+1} = Q_m^T A_m Q_m it follows immediately by induction that A_{m+1} = U_m^T A U_m.
For m = 1 equation (+) reads
U_1 S_1 = Q_1 R_1 = A − κ_1 I,
which is the decomposition in the first step of the QR algorithm with shifts.
Choice of shifts ct.
Assume that (+) holds for some m − 1. From the definition of A_{m+1} it follows that
R_m = (A_{m+1} − κ_m I)Q_m^T = U_m^T (A − κ_m I) U_m Q_m^T = U_m^T (A − κ_m I) U_{m−1}.
Multiplying with S_{m−1} from the right and U_m from the left one obtains
U_m S_m = (A − κ_m I)U_{m−1} S_{m−1} = (A − κ_m I)(A − κ_{m−1} I) · · · (A − κ_1 I).
Choice of shifts ct.
From (+) one gets for the last unit vector e^n
(A^T − κ_m I)^{−1} · · · (A^T − κ_1 I)^{−1} e^n = U_m (S_m^T)^{−1} e^n.
Since S_m^T and (S_m^T)^{−1} are lower triangular matrices, it holds that
U_m (S_m^T)^{−1} e^n = σ U_m e^n for some σ.
Hence
(A^T − κ_m I)^{−1} · · · (A^T − κ_1 I)^{−1} e^n = σ U_m e^n,
and the last column of U_m can be interpreted as the result of m steps of
inverse iteration with shifts κ_1, . . . , κ_m and initial vector e^n.
This suggests choosing κ_m = a_{n,n}^{(m)}, which is expected to converge to λ_n.
Reducing the cost
The most expensive part in the QR algorithm (shifted or not) is the
computation of the QR factorization in every step.
This cost can be reduced considerably, if the matrix is transformed to upper
Hessenberg form first:

A = \begin{pmatrix}
a_{11} & a_{12} & a_{13} & \dots  & a_{1,n-1} & a_{1n} \\
a_{21} & a_{22} & a_{23} & \dots  & a_{2,n-1} & a_{2n} \\
0      & a_{32} & a_{33} & \dots  & a_{3,n-1} & a_{3n} \\
\vdots & \ddots & \ddots & \ddots &           & \vdots \\
\vdots &        & \ddots & \ddots & \ddots    & \vdots \\
0      & \dots  & \dots  & 0      & a_{n,n-1} & a_{nn}
\end{pmatrix}

A has upper Hessenberg form if a_{jk} = 0 for j > k + 1.
Reducing the cost ct.
Assume that Am has upper Hessenberg form. Then a QR decomposition can
be obtained in the following way:
Multiply Am from the left by a rotation in the plane spanned by the first two unit
vectors e1 and e2 , i.e. by a matrix


U_{12} = \begin{pmatrix}
\cos θ  & \sin θ & 0 & 0 & \dots & 0 \\
-\sin θ & \cos θ & 0 & 0 & \dots & 0 \\
0       & 0      & 1 & 0 & \dots & 0 \\
0       & 0      & 0 & 1 & \dots & 0 \\
\vdots  & \vdots & \vdots & \vdots & \ddots & \vdots \\
0       & 0      & 0 & 0 & \dots & 1
\end{pmatrix}
Then U_{12} A_m contains in its first two rows linear combinations of the first two
rows of A_m, and rows 3, . . . , n are the same as in A_m. The rotation angle
can be chosen such that the element in position (2, 1) is annihilated.
Reducing the cost ct.
Multiplying U12 Am from the left by a rotation matrix U23 corresponding to rows
2 and 3, we annihilate the element in position (3, 2), which does not change
the element 0 in the (2, 1) position.
Continuing in this way we annihilate the element in position (i + 1, i) by a
rotation U_{i,i+1} in the plane spanned by e^i and e^{i+1}.
We finally arrive at
U_{n−1,n} · · · U_{23} U_{12} A_m = R,
i.e. A_m = QR with Q = U_{12}^T · · · U_{n−1,n}^T.
Reducing the cost ct.
A_{m+1} = RQ = R U_{12}^T · · · U_{n−1,n}^T.
Multiplying R by U_{12}^T combines the first two columns of R and leaves the other
columns unchanged. Multiplying by U_{23}^T combines columns 2 and 3 and
leaves the other ones unchanged, etc.
Obviously,
A_{m+1} = R U_{12}^T · · · U_{n−1,n}^T
becomes an upper Hessenberg matrix.
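A sketch of one such QR step for an upper Hessenberg matrix using n − 1 Givens rotations, roughly O(n²) work instead of O(n³) (the function name and the random test matrix are ours):

```python
import numpy as np

def hessenberg_qr_step(H):
    """One QR step A_{m+1} = R Q for an upper Hessenberg matrix H = A_m,
    using the n-1 rotations U_{i,i+1} described above."""
    n = H.shape[0]
    R = np.array(H, dtype=float)
    rotations = []
    for i in range(n - 1):                        # annihilate the subdiagonal entry (i+1, i)
        a, b = R[i, i], R[i + 1, i]
        r = np.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0 else (a / r, b / r)
        G = np.array([[c, s], [-s, c]])
        R[i:i + 2, :] = G @ R[i:i + 2, :]         # U_{i,i+1} combines rows i and i+1
        rotations.append((i, c, s))
    A_next = R                                    # R is now upper triangular
    for i, c, s in rotations:                     # A_{m+1} = R U_12^T U_23^T ... U_{n-1,n}^T
        G = np.array([[c, s], [-s, c]])
        A_next[:, i:i + 2] = A_next[:, i:i + 2] @ G.T   # combines columns i and i+1
    return A_next

H = np.triu(np.random.default_rng(2).standard_normal((5, 5)), -1)   # a random Hessenberg matrix
H1 = hessenberg_qr_step(H)
print(np.allclose(np.tril(H1, -2), 0))            # H1 is again upper Hessenberg
print(np.allclose(np.sort_complex(np.linalg.eigvals(H)),
                  np.sort_complex(np.linalg.eigvals(H1))))           # same spectrum
```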
Reduction to Hessenberg form
A given matrix can be transformed to upper Hessenberg form using
Householder matrices.
For
A = \begin{pmatrix} a_{11} & c^T \\ b & B \end{pmatrix},   B ∈ R^{(n−1)×(n−1)},   b, c ∈ R^{n−1},
let w ∈ R^{n−1}, ‖w‖ = 1, be such that the Householder matrix Q_1 = I − 2ww^T maps
b to a multiple of the first unit vector in R^{n−1}.
Then with
P_1 = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}
we get
A_1 := P_1 A P_1 = \begin{pmatrix} a_{11} & c^T Q_1 \\ Q_1 b & Q_1 B Q_1 \end{pmatrix},   Q_1 b = (k, 0, . . . , 0)^T,
and the first column already has the desired form.
The following columns can be transformed in a similar way.
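A sketch of this reduction in NumPy (the sign choice in the Householder vector is a standard stabilization, not spelled out on the slide):

```python
import numpy as np

def to_hessenberg(A):
    """Reduce A to upper Hessenberg form by Householder similarity transformations,
    column by column as described above (a sketch)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for j in range(n - 2):
        b = H[j + 1:, j].copy()                    # part of column j below the diagonal
        sign = -1.0 if b[0] >= 0 else 1.0
        alpha = sign * np.linalg.norm(b)           # Q maps b to alpha * e^1 (stable sign choice)
        v = b
        v[0] -= alpha
        nv = np.linalg.norm(v)
        if nv == 0.0:                              # nothing to annihilate in this column
            continue
        w = v / nv                                 # unit vector of the Householder matrix Q = I - 2 w w^T
        # similarity transformation H <- P H P with P = diag(I, Q):
        H[j + 1:, :] -= 2.0 * np.outer(w, w @ H[j + 1:, :])   # rows j+1..n:    Q * (block)
        H[:, j + 1:] -= 2.0 * np.outer(H[:, j + 1:] @ w, w)   # columns j+1..n: (block) * Q
    return H

A = np.random.default_rng(3).standard_normal((6, 6))
H = to_hessenberg(A)
print(np.allclose(np.tril(H, -2), 0))                          # H is upper Hessenberg
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(H))))      # same eigenvalues as A
```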