we finally get
\[
\sum_{k=1}^{N} k^2 \;-\; N^2 \;=\; \frac{N^3}{3} - \frac{N^2}{2} + \frac{N}{6}
\]
multiplications for the LU decomposition. Additionally we have to perform
N(N − 1)/2 divisions.
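This follows directly from the closed form of the sum of squares:
\[
\sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6} = \frac{N^3}{3} + \frac{N^2}{2} + \frac{N}{6},
\]
and subtracting $N^2$ gives $\frac{N^3}{3} - \frac{N^2}{2} + \frac{N}{6}$.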
Often we have to solve the same linear system for different right-hand sides b.
In that case the factorization step needs to be performed only once, and only
the (cheaper) forward and backward substitution steps have to be repeated
for the different right-hand sides. A particular example is the numerical
evaluation of the mathematical expression
\[
A^{-1} B .
\]
This can be rewritten as
\[
AX = B,
\]
where X is a matrix with columns $x^{(i)}$. Every column is then the solution of
a linear system
\[
Ax^{(i)} = b^{(i)},
\]
where $b^{(i)}$ denotes the i-th column of B.
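A minimal MATLAB sketch of this idea; the matrix A and the right-hand sides collected in B below are arbitrary illustrative values:

    A = [4 1 0; 1 4 1; 0 1 4];      % example matrix (illustrative)
    B = [1 2; 0 1; 3 0];            % right-hand sides b^(1), b^(2) as columns
    [L, U, P] = lu(A);              % one LU factorization with partial pivoting
    X = zeros(size(B));
    for i = 1:size(B, 2)
        y = L \ (P * B(:, i));      % forward substitution
        X(:, i) = U \ y;            % backward substitution
    end
    % X now contains the columns x^(i), i.e. X is (numerically) A^(-1)*B.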
2.1.2 Matrix Norms, Inner Products and Condition Numbers
Numerical computations are always influenced by errors. The errors in the
results of the algorithms we have considered so far have mainly two sources:
• round-off errors
• errors in the input data.
Later we will meet a third error source, the truncation error, which occurs
when solving problems iteratively.
In order to study the effects of errors we have to be able to measure the size
of errors. Errors are often described as relative quantities, i.e.
\[
\text{relative error} = \frac{\text{absolute error}}{\text{exact solution}},
\]
and as the exact solution is often not available we consider instead
\[
\text{relative error} = \frac{\text{absolute error}}{\text{obtained solution}}.
\]
An error in the result of a linear system is a vector, thus we have to be able
to measure sizes of vectors. To this end we introduce norms.
Definition 52
A vector norm is defined by $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ with
• $\|x\| \geq 0$
• $\|x\| = 0 \Leftrightarrow x = 0$
• $\|x + y\| \leq \|x\| + \|y\|$
• $\|\alpha x\| = |\alpha|\,\|x\|$ for all $\alpha \in \mathbb{R}$
(see also [Spa94, p. 111]).
Norms we use in this course:
\[
\|x\|_p := \left(|x_1|^p + \dots + |x_n|^p\right)^{1/p},
\]
the so-called p-norm or Hölder norm.

Example 53
\begin{align*}
\|x\|_1 &= |x_1| + \dots + |x_n| \\
\|x\|_2 &= \left(|x_1|^2 + \dots + |x_n|^2\right)^{1/2} \quad \text{(Euclidean norm)} \\
\|x\|_\infty &= \max_i |x_i|
\end{align*}
Theorem 54
All norms on $\mathbb{R}^n$ are equivalent in the following sense: for any two norms
$\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ there are constants $c_1, c_2 > 0$ such that for all $x$
\[
c_1 \|x\|_\alpha \leq \|x\|_\beta \leq c_2 \|x\|_\alpha
\]
holds.

Example 55
\begin{align*}
\|x\|_\infty &\leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty \\
\|x\|_2 &\leq \|x\|_1 \leq \sqrt{n}\,\|x\|_2 \\
\|x\|_\infty &\leq \|x\|_1 \leq n\,\|x\|_\infty
\end{align*}
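As a quick illustration, consider the vector $x = (1, 1, \dots, 1)^T \in \mathbb{R}^n$. Then
\[
\|x\|_\infty = 1, \qquad \|x\|_2 = \sqrt{n}, \qquad \|x\|_1 = n,
\]
so all three right-hand inequalities above are attained with equality for this vector.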
Recall from your calculus course that the definition of convergence is based
on norms. The ultimate consequence of this theorem is that an iteration
process in a finite dimensional space that converges in one norm also converges
in any other norm. For proving convergence we can simply select the norm
which is most convenient for the particular proof. Note that in infinite
dimensional spaces (function spaces) this nice property is lost.
We now relate vector norms to matrices. The concept is based on viewing
matrices as linear maps
\[
A : \mathbb{R}^n \longrightarrow \mathbb{R}^n .
\]
Definition 56
\[
\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p
\]
defines a matrix norm, which is called subordinate to the vector norm $\|x\|_p$.
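In particular, the definition immediately gives the bound that will be used repeatedly below:
\[
\|Ax\|_p \leq \|A\|_p\,\|x\|_p \qquad \text{for all } x \in \mathbb{R}^n .
\]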
Some matrix norms:
\begin{align*}
\|A\|_1 &= \max_j \sum_i |a_{ij}| \\
\|A\|_2 &= \sqrt{\max_i \lambda_i(A^T A)}, \quad \text{where } \lambda_i(A) \text{ denotes the } i\text{-th eigenvalue of } A \\
\|A\|_\infty &= \max_i \sum_j |a_{ij}| \\
\|A\|_F &= \Bigl(\sum_i \sum_j |a_{ij}|^2\Bigr)^{1/2} \quad \text{(Frobenius norm)}
\end{align*}
Vector and matrix norms can be computed in MATLAB by the command norm,
which takes an additional argument to define the type of norm, e.g. 'inf'
stands for the infinity norm.
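For example (the concrete vector and matrix below are arbitrary illustrations):

    x = [1; -2; 3];
    norm(x)            % 2-norm (default)
    norm(x, 1)         % 1-norm
    norm(x, inf)       % infinity norm

    A = [1 2; 3 4];
    norm(A, 1)         % maximum absolute column sum
    norm(A, inf)       % maximum absolute row sum
    norm(A, 2)         % 2-norm, the largest singular value of A
    norm(A, 'fro')     % Frobenius norm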
We consider now the sensitivity of the linear system Ax = b with respect to
perturbations $\Delta b$ of the input data b:
\[
A\hat{x} = A(x + \Delta x) = b + \Delta b .
\]
How is the relative input error
\[
\frac{\|\Delta b\|}{\|b\|}
\]
related to the relative output error
\[
\frac{\|\Delta x\|}{\|x\|} \; ?
\]
From
\[
A \Delta x = \Delta b \;\Rightarrow\; \Delta x = A^{-1} \Delta b
\]
we obtain by taking norms
\begin{equation}
\Delta x = A^{-1} \Delta b \;\Rightarrow\; \|\Delta x\| \leq \|A^{-1}\|\,\|\Delta b\| . \tag{2.5}
\end{equation}
Note that the right inequality is a direct consequence of Def. 56. Analogously we get
\[
b = Ax \;\Rightarrow\; \|b\| \leq \|A\|\,\|x\| .
\]
This leads to
\[
\frac{\|\Delta x\|}{\|x\|} \leq \frac{\|A^{-1}\|\,\|\Delta b\|}{\|x\|} = \frac{\|A\|\,\|A^{-1}\|\,\|\Delta b\|}{\|A\|\,\|x\|} .
\]
Thus,
\[
\frac{\|\Delta x\|}{\|x\|} \leq \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|} .
\]
Definition 57
$\kappa(A) := \|A\|\,\|A^{-1}\|$ is called the condition number of A.
Condition numbers can be obtained in MATLAB by using the commands cond
and condest. The first command computes the condition number exactly
and takes as an argument a specification of the type of norm used for computing
this number. condest estimates the condition number with respect to the
1-norm.
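A small illustration, using the notoriously ill-conditioned Hilbert matrix as an example:

    A = hilb(8);               % 8x8 Hilbert matrix, very ill-conditioned
    kappa = cond(A)            % 2-norm condition number, on the order of 1e10
    condest(A)                 % estimate of the 1-norm condition number

    b  = ones(8, 1);
    x  = A \ b;
    db = 1e-10 * randn(8, 1);  % tiny perturbation of the right-hand side
    dx = A \ (b + db) - x;
    % The relative output error can be up to kappa times the relative input error:
    norm(dx) / norm(x)
    norm(db) / norm(b)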
We also consider perturbations of the matrix A:
\[
(A + \Delta A)(x + \Delta x) = b
\]
and define for this end $A(t) := A + t\Delta A$ and $x(t) := x + t\Delta x$, with $t \in \mathbb{R}$.
Consider
\[
A(t)x(t) = b
\]
and take the derivative with respect to t:
\begin{align*}
A'(t)x(t) + A(t)x'(t) &= 0 \\
x'(t) &= -A(t)^{-1} A'(t) x(t) .
\end{align*}
Thus, at $t = 0$,
\[
\frac{\|x'\|}{\|x\|} \leq \|A^{-1}\|\,\|A'\| = \|A^{-1}\|\,\|A\|\,\frac{\|A'\|}{\|A\|} .
\]
Note that $A'(t) = \Delta A$ and $x'(t) = \Delta x$. Thus
\[
\frac{\|\Delta x\|}{\|x\|} \leq \kappa(A)\,\frac{\|\Delta A\|}{\|A\|} .
\]
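A numerical sanity check of this first-order bound; the matrix, right-hand side and perturbation below are arbitrary illustrative values:

    A  = [3 1; 1 3];
    b  = [6; 4];
    x  = A \ b;
    dA = 1e-8 * randn(2);                 % small random perturbation of A
    dx = (A + dA) \ b - x;
    lhs = norm(dx) / norm(x);             % relative change of the solution
    rhs = cond(A) * norm(dA) / norm(A);   % first-order bound kappa(A)*||dA||/||A||
    % Up to higher-order terms, lhs should not exceed rhs.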
Example 58
The relative error due to round-off is bounded by the so-called machine epsilon.
Usually we have $\varepsilon \approx 10^{-16}$ when using double precision arithmetic.
If there is no other error in the input data we obtain
\[
\frac{\|\Delta x\|}{\|x\|} \leq \kappa(A)\,\varepsilon .
\]
If $\kappa(A) = 1$ no amplification of the relative input error occurs. If $\kappa(A) = 10^k$
we lose in the worst case k digits of accuracy.
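In MATLAB the double precision machine epsilon is available as the built-in constant eps:

    eps        % 2^(-52), approximately 2.2204e-16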
The number $\kappa(A)^{-1}$ can be viewed as the relative distance from A to the nearest
singular matrix.
2.2 Least Squares and Decompositions
Measurements of data collected from nature often contain “noise” due to
randomness and/or errors due to procedural, technical or human faults. Many
times the collected data overdetermine the problem which we are trying to
solve, rendering an exact solution non-existent. Given the fact that randomness in
the data itself or errors in measurements are to blame for these issues, we may
want to expand our domain of possible solutions to include those which do
not exactly overlay the data. In mathematical jargon we talk about fitting
the data instead of interpolating it. In other words, we will try to find a
solution which gets as close as possible to the collected information. Let
us be more specific in terms of how we measure “closeness”.
In that respect we consider a simple motivating example.
Example 59
We would like to fit a linear model $y = a_1 t + a_0$ through the following data:

    i   :  0   1   2
    t_i :  1  -1   1
    y_i :  2   1   3
Notice that the system is overdetermined, since the number of equations is
greater than the number of unknowns, and for this data it is in fact inconsistent.
We therefore consider a solution to our linear model which clearly will not hold
exactly for all the data provided but will be “close enough” to that data.
The least squares theory provides a solution to a problem of that type by
producing the solution which minimizes the total residuals. The residuals
are defined to be the differences between the actual data and our model.
In mathematical terms the residual is simply $r_i = y_i - (a_0 + a_1 t_i)$ for each i.
The least squares solution is produced by minimizing the sum of those residuals squared,
\[
R^2 = \sum_{i} \bigl[y_i - (a_0 + a_1 t_i)\bigr]^2 ,
\]
where the sum runs over all data points.
Our problem has therefore changed to that of obtaining the unknowns $a_0$ and
$a_1$ for which the above quantity is minimized. Note that the function $R^2$ is
continuous and convex, which implies that the minimum exists and can be
found easily with the help of minimization theory.
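One way to see this explicitly: setting the partial derivatives of $R^2$ with respect to $a_0$ and $a_1$ to zero gives
\[
\frac{\partial R^2}{\partial a_0} = -2\sum_i \bigl[y_i - (a_0 + a_1 t_i)\bigr] = 0, \qquad
\frac{\partial R^2}{\partial a_1} = -2\sum_i t_i \bigl[y_i - (a_0 + a_1 t_i)\bigr] = 0,
\]
which is exactly the linear system (the normal equations) derived in matrix form below.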
Alternatively we can simply solve the system using matrix theory. We begin
by writing the linear system in matrix form for each data point provided as
follows,
\[
\begin{pmatrix} 1 & t_0 \\ 1 & t_1 \\ 1 & t_2 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ y_2 \end{pmatrix},
\]
which upon substitution of the provided data gives the following Ax = b type
system to solve,
\[
\begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}.
\]
Since the matrix A is not square we multiply both sides by $A^T$ in order to
produce
\[
\begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
=
\begin{pmatrix} 6 \\ 4 \end{pmatrix}.
\]
The solution of this system of equations is $a_0 = 7/4$ and $a_1 = 3/4$, which
gives the line $y = 7/4 + (3/4)t$ as the one which is closest to the provided data
in Euclidean distance.
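A short MATLAB sketch of this computation; both routes return the same coefficients:

    A = [1 1; 1 -1; 1 1];
    y = [2; 1; 3];
    coeff = (A' * A) \ (A' * y)   % normal equations: returns [7/4; 3/4]
    coeff_bs = A \ y              % backslash also returns the least squares solution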
Similarly we could fit nonlinear curves to a given set of data if we suspect
that such a fit would be more appropriate.
Example 60
We would like to fit the data

    i   :   0     1     2     3     4
    t_i : -1.0  -0.5   0.0   0.5   1.0
    y_i :  1.0   0.5   0.0   0.5   2.0