we finally get
\[
  \sum_{k=1}^{N}\left(k^{2}-N\right) \;=\; \frac{N^{3}}{3}-\frac{N^{2}}{2}+\frac{N}{6}
\]
multiplications for the LU decomposition. Additionally we have to perform $N(N-1)/2$ divisions.

Often we have to solve the same linear system for different right-hand sides $b$. In that case the factorization step needs to be performed only once, and only the (cheaper) forward and backward substitution steps have to be repeated for the different right-hand sides. A particular example is the numerical evaluation of the mathematical expression
\[
  A^{-1}B .
\]
This can be rewritten as $AX = B$, where $X$ is a matrix with columns $x^{(i)}$. Every column is then the solution of a linear system $Ax^{(i)} = b^{(i)}$, where $b^{(i)}$ denotes the $i$-th column of $B$.

2.1.2 Matrix Norms, Inner Products and Condition Numbers

Numerical computations are always influenced by errors. The errors in the results of the algorithms we have considered so far have mainly two sources:

• round-off errors
• errors in the input data.

Later we will meet a third error source, the truncation error, which arises when solving problems iteratively.

In order to study the effects of errors we have to be able to measure their size. Errors are often described as relative quantities, i.e.
\[
  \text{relative error} = \frac{\text{absolute error}}{\text{exact solution}},
\]
and as the exact solution is often not available, we consider instead
\[
  \text{relative error} = \frac{\text{absolute error}}{\text{obtained solution}}.
\]
An error in the result of a linear system is a vector, thus we have to be able to measure sizes of vectors. To this end we introduce norms.

Definition 52 A vector norm is a map $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ with

• $\|x\| \geq 0$
• $\|x\| = 0 \Leftrightarrow x = 0$
• $\|x+y\| \leq \|x\| + \|y\|$
• $\|\alpha x\| = |\alpha|\,\|x\|$ for $\alpha \in \mathbb{R}$

(see also [Spa94, p. 111]). Norms we use in this course:
\[
  \|x\|_p := \left(|x_1|^p + \ldots + |x_n|^p\right)^{1/p},
\]
the so-called p-norm or Hölder norm.

Example 53
\[
  \|x\|_1 = |x_1| + \ldots + |x_n|
\]
\[
  \|x\|_2 = \left(|x_1|^2 + \ldots + |x_n|^2\right)^{1/2} \quad \text{(Euclidean norm)}
\]
\[
  \|x\|_\infty = \max_i |x_i|
\]

Theorem 54 All norms on $\mathbb{R}^n$ are equivalent in the following sense: there are constants $c_1, c_2 > 0$ such that
\[
  c_1 \|x\|_\alpha \leq \|x\|_\beta \leq c_2 \|x\|_\alpha
\]
holds for all $x$.

Example 55
\[
  \|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\,\|x\|_\infty
\]
\[
  \|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\,\|x\|_2
\]
\[
  \|x\|_\infty \leq \|x\|_1 \leq n\,\|x\|_\infty
\]

Recall from your calculus course that the definition of convergence is based on norms. The ultimate consequence of this theorem is that an iteration process in a finite-dimensional space which converges in one norm also converges in every other norm. For proving convergence we can therefore select the norm which is most convenient for the particular proof. Note that in infinite-dimensional spaces (function spaces) this nice property is lost.

We now relate vector norms to matrices. The concept is based on viewing matrices as linear maps $A : \mathbb{R}^n \to \mathbb{R}^n$.

Definition 56
\[
  \|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p
\]
defines a matrix norm, which is called subordinate to the vector norm $\|x\|_p$.

Some matrix norms:
\[
  \|A\|_1 = \max_j \sum_i |a_{ij}|
\]
\[
  \|A\|_2 = \sqrt{\max_i \lambda_i(A^T A)}, \quad \text{where } \lambda_i(A) \text{ denotes the } i\text{-th eigenvalue of } A
\]
\[
  \|A\|_\infty = \max_i \sum_j |a_{ij}|
\]
\[
  \|A\|_F = \Big(\sum_i \sum_j |a_{ij}|^2\Big)^{1/2} \quad \text{(Frobenius norm)}
\]

Vector and matrix norms can be computed in MATLAB by the command norm, which takes an additional argument to select the type of norm, e.g. 'inf' stands for the infinity norm.
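To make these definitions concrete, the following MATLAB sketch evaluates the vector and matrix norms introduced above for a small example and checks the equivalence bounds of Example 55 as well as the subordinate-norm inequality $\|Ax\|_2 \leq \|A\|_2\,\|x\|_2$. The particular vector x and matrix A are arbitrary choices for illustration only.

    % Vector and matrix norms for an arbitrary example (illustration only)
    x = [3; -4; 12];
    A = [4 -2 1; -2 4 -2; 1 -2 4];

    n1   = norm(x, 1);                 % |x_1| + ... + |x_n|
    n2   = norm(x, 2);                 % Euclidean norm
    ninf = norm(x, inf);               % max_i |x_i|

    % equivalence bounds from Example 55 (each should evaluate to true)
    n = length(x);
    ok1 = (ninf <= n2) && (n2 <= sqrt(n)*ninf);
    ok2 = (n2 <= n1)   && (n1 <= sqrt(n)*n2);
    ok3 = (ninf <= n1) && (n1 <= n*ninf);

    % subordinate matrix norms and the Frobenius norm
    A1   = norm(A, 1);                 % maximum absolute column sum
    A2   = norm(A, 2);                 % sqrt of largest eigenvalue of A'*A
    Ainf = norm(A, inf);               % maximum absolute row sum
    AF   = norm(A, 'fro');             % Frobenius norm
    ok4  = norm(A*x, 2) <= A2 * n2;    % ||Ax|| <= ||A|| ||x||

    fprintf('vector norms: %g  %g  %g\n', n1, n2, ninf);
    fprintf('matrix norms: %g  %g  %g  %g\n', A1, A2, Ainf, AF);
    fprintf('all inequalities satisfied: %d\n', ok1 && ok2 && ok3 && ok4);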
We consider now the sensitivity of the linear system $Ax = b$ with respect to perturbations $\Delta b$ of the input data $b$:
\[
  A\hat{x} = A(x + \Delta x) = b + \Delta b .
\]
How is the relative input error $\frac{\|\Delta b\|}{\|b\|}$ related to the relative output error $\frac{\|\Delta x\|}{\|x\|}$? From $A\,\Delta x = \Delta b$, i.e. $\Delta x = A^{-1}\Delta b$, we obtain by taking norms
\[
  \Delta x = A^{-1}\Delta b \;\Rightarrow\; \|\Delta x\| \leq \|A^{-1}\|\,\|\Delta b\| . \tag{2.5}
\]
Note that the right inequality is a direct consequence of Def. 56. Analogously we get
\[
  b = Ax \;\Rightarrow\; \|b\| \leq \|A\|\,\|x\| .
\]
This leads to
\[
  \frac{\|\Delta x\|}{\|x\|} \leq \frac{\|A^{-1}\|\,\|\Delta b\|}{\|x\|} = \frac{\|A\|\,\|A^{-1}\|\,\|\Delta b\|}{\|A\|\,\|x\|} .
\]
Thus, since $\|A\|\,\|x\| \geq \|b\|$,
\[
  \frac{\|\Delta x\|}{\|x\|} \leq \|A\|\,\|A^{-1}\|\,\frac{\|\Delta b\|}{\|b\|} .
\]

Definition 57 $\kappa(A) := \|A\|\,\|A^{-1}\|$ is called the condition number of $A$.

Condition numbers can be obtained in MATLAB by using the commands cond and condest. The first command computes the condition number exactly and takes as an argument a specification of the type of norm used for computing this number; condest estimates the condition number with respect to the 1-norm.

We also consider perturbations of the matrix $A$:
\[
  (A + \Delta A)(x + \Delta x) = b
\]
and define for this purpose $A(t) := A + t\,\Delta A$ and $x(t) := x + t\,\Delta x$ with $t \in \mathbb{R}$. Consider $A(t)x(t) = b$ and take the derivative with respect to $t$:
\[
  A'(t)x(t) + A(t)x'(t) = 0 \quad\Rightarrow\quad x'(t) = -A(t)^{-1}A'(t)x(t) .
\]
Thus
\[
  \frac{\|x'\|}{\|x\|} \leq \|A^{-1}\|\,\|A'\| = \|A^{-1}\|\,\|A\|\,\frac{\|A'\|}{\|A\|} .
\]
Note that $A'(t) = \Delta A$ and $x'(t) = \Delta x$. Thus
\[
  \frac{\|\Delta x\|}{\|x\|} \leq \kappa(A)\,\frac{\|\Delta A\|}{\|A\|} .
\]

Example 58 The relative error due to round-off is bounded by the so-called machine epsilon; usually we have $\varepsilon \approx 10^{-16}$ when using double precision arithmetic. If there is no other error in the input data we obtain
\[
  \frac{\|\Delta x\|}{\|x\|} \leq \kappa(A)\,\varepsilon .
\]
If $\kappa(A) = 1$, no amplification of the relative input error occurs. If $\kappa(A) = 10^k$ we lose, in the worst case, $k$ digits of accuracy. The number $\kappa(A)^{-1}$ can be viewed as the relative distance from $A$ to the nearest singular matrix.
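As a numerical illustration of Definition 57 and of the error bound above, the following MATLAB sketch solves a system with an ill-conditioned matrix, perturbs the right-hand side slightly, and compares the resulting error amplification with $\kappa(A)$. The choice of the Hilbert matrix, the perturbation size, and the variable names are our own and serve only as an example.

    % Error amplification governed by the condition number (illustration only)
    n = 8;
    A = hilb(n);                       % Hilbert matrix, notoriously ill-conditioned
    x = ones(n, 1);                    % chosen exact solution
    b = A * x;                         % matching right-hand side

    kappa2 = cond(A);                  % condition number in the 2-norm
    kappa1 = condest(A);               % estimate of the 1-norm condition number

    db  = 1e-10 * randn(n, 1);         % small perturbation of b
    xh  = A \ (b + db);                % solution of the perturbed system

    rel_in  = norm(db) / norm(b);      % relative input error
    rel_out = norm(xh - x) / norm(x);  % relative output error

    fprintf('cond(A)               = %.2e\n', kappa2);
    fprintf('condest(A)            = %.2e\n', kappa1);
    fprintf('relative input error  = %.2e\n', rel_in);
    fprintf('relative output error = %.2e\n', rel_out);
    fprintf('bound kappa * input   = %.2e\n', kappa2 * rel_in);

For hilb(8) the condition number is roughly of the order $10^{10}$, so even a tiny perturbation of $b$ can, in the worst case, be amplified by about ten orders of magnitude in the computed solution.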
2.2 Least Squares and Decompositions

Measurements of data collected from nature often contain "noise" due to randomness and/or errors due to procedural, technical or human faults. Many times the collected data overdetermine the problem which we are trying to solve, rendering an exact solution non-existent. Given that randomness in the data itself or errors in the measurements are to blame for these issues, we may want to expand our domain of possible solutions to include those which do not exactly match the data. In mathematical jargon we talk about fitting the data instead of interpolating it. In other words, we will try to find a solution which gets as close as possible to the collected information.

Let us be more specific in terms of how we measure "closeness". In that respect we consider a simple motivating example.

Example 59 We would like to fit a linear model $y = a_1 t + a_0$ through the following data:

  i   |  0    1    2
  t_i |  1   -1    1
  y_i |  2    1    3

Notice that the system is overdetermined, since the number of equations is greater than the number of unknowns, and for these data it is inconsistent. We therefore consider a solution to our linear model which clearly will not fit all the provided data exactly but will be "close enough" to that data. Least squares theory provides a solution to a problem of this type by producing the solution which minimizes the total residual. The residuals are defined to be the differences between the actual data and our model; in mathematical terms the residual is simply
\[
  r_i = y_i - (a_0 + a_1 t_i)
\]
for each $i$. The least squares solution is produced by minimizing the sum of the squared residuals,
\[
  R^2 = \sum_{i=1}^{n} \left[ y_i - (a_0 + a_1 t_i) \right]^2 .
\]
Our problem has therefore changed to that of obtaining the unknowns $a_0$ and $a_1$ for which the above quantity is minimized. Note that $R^2$, as a function of $a_0$ and $a_1$, is continuous and convex, which implies that the minimum exists and can be found easily with the help of minimization theory. Alternatively we can simply solve the system using matrix theory.

We begin by writing the linear system in matrix form, with one equation for each data point provided:
\[
  \begin{pmatrix} 1 & t_0 \\ 1 & t_1 \\ 1 & t_2 \end{pmatrix}
  \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
  =
  \begin{pmatrix} y_0 \\ y_1 \\ y_2 \end{pmatrix},
\]
which upon substitution of the provided data gives the following system of the form $Ax = b$ to solve:
\[
  \begin{pmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \end{pmatrix}
  \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
  =
  \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}.
\]
Since the matrix $A$ is not square, we multiply both sides by $A^T$ in order to produce
\[
  \begin{pmatrix} 3 & 1 \\ 1 & 3 \end{pmatrix}
  \begin{pmatrix} a_0 \\ a_1 \end{pmatrix}
  =
  \begin{pmatrix} 6 \\ 4 \end{pmatrix}.
\]
The solution of this system of equations is $a_0 = 7/4$ and $a_1 = 3/4$, which gives the line $y = 7/4 + (3/4)\,t$ as the one which is closest to the provided data in Euclidean distance.

Similarly we could fit nonlinear curves to a given set of data if we suspect that such a fit would be more appropriate.

Example 60 We would like to fit the data

  i   |   0     1     2     3     4
  t_i |  -1.0  -0.5   0.0   0.5   1.0
  y_i |   1.0   0.5   0.0   0.5   2.0
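Before turning to nonlinear fits such as Example 60 (whose model is not specified in the excerpt above), it is instructive to reproduce the computation of Example 59 numerically. The following MATLAB sketch assembles $A$ and $b$ from the data, solves the normal equations $A^T A\,a = A^T b$, and checks the result against the values $a_0 = 7/4$, $a_1 = 3/4$ derived above; the variable names are our own.

    % Least squares fit of the line y = a0 + a1*t to the data of Example 59
    t = [1; -1; 1];
    y = [2;  1; 3];

    A = [ones(size(t)), t];        % one row (1, t_i) per data point
    b = y;

    a = (A' * A) \ (A' * b);       % normal equations; expected a = [7/4; 3/4]

    % MATLAB's backslash applied to the rectangular system computes the same
    % least squares solution directly (via a QR factorization)
    a_qr = A \ b;

    r = y - A * a;                 % residuals r_i = y_i - (a0 + a1*t_i)
    fprintf('a0 = %g, a1 = %g\n', a(1), a(2));
    fprintf('sum of squared residuals R^2 = %g\n', sum(r.^2));

Once a model for the data of Example 60 has been chosen (for instance a polynomial of higher degree), the same construction applies: each model term contributes one column to $A$, and the normal equations again yield the coefficients of the best fit in the least squares sense.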