Classroom Notes for Approximation Theory Thomas Shores1 Contents 1. Operator Algebra for Approximation Theory 2. Pade Approximations 3. Trigonometric Polynomials 4. Complex Trigonometric Polynomials 5. Discrete Fourier Transforms (DFT) 6. The Fast Fourier Transform (FFT) Description Complexity Implementation 7. All About Cubic Splines General Equations for All Cubic Splines Standard Form for the Defining Cubics Extra Conditions to Uniquely Define the Interpolating Cubic Spline 8. A Review and Tour of Linear Algebra Solving Systems Matrix Arithmetic The Basics Inner Products Eigenvectors and Eigenvalues Condition Number of a Matrix Singular Value Decomposition 9. A Tour of Matlab What is Matlab? A Tutorial Introduction to Matlab References 1 4 5 7 10 13 13 14 15 17 17 18 18 20 20 22 22 25 26 27 29 30 30 31 38 1. Operator Algebra for Approximation Theory Unless otherwise stated, we assume that A is a subset of the normed vector space B with norm ∥·∥. Consult Section 8 for more linear algebra details if needed. 1 These are some classroom notes from a course in approximation theory (Math 441) that I taught in 2009 with a few corrections and updates that relate to Adam’s NA seminar, along with a review of basic linear algebra and Matlab tutorial (which applies to Octave except for a few Matlab-specific commands, mainly graphics.) Last Rev.: 2/16/16. 1 2 Definition. The norm of a (not necessarily linear) operator X : B → A is the smallest number ∥X∥ such that for all f ∈ B, ∥X (f )∥ ≤ ∥X∥ · ∥f ∥ , if such an ∥X∥ exists. In this case the operator is said to be bounded and otherwise, unbounded. Notice that in the case of a linear operator X, i.e., X (af + bg) = aX (f )+bX (g) for scalars a, b and f, g ∈ B, this is equivalent to the definition that we gave in the background notes (see page 25). Definition. An operator X : B → A is said to be an approximation (projection) operator if for all f ∈ B, we have X (X (f )) = X (f ) . In the case of an approximation operator X, the quantity ∥X∥ is often called the Lebesgue constant of the operator. A sufficient condition for an operator to be an approximation operator is the following (stronger) condition: for all a ∈ A, (3.2) X (a) = a. Most of the operators we deal with will have the property (3.2). Next, we give a name to the distance from f to the subset cA. Definition. Given f ∈ B and approximant subset A, we define d∗ (f ) = min ∥f − a∥ a∈A Here is a situation in which we can actually estimate d∗ in a useful way: Theorem. (text 3.2) If c = π/2, B = C k [a, b] and A = Pn with 1 ≤ k ≤ n, then for any f ∈ B, ! (n − k)! k ! ! ! c !f (k) ! . d∗ (f ) ≤ n! ∞ Here is a connection between two of the definitions just discussed: Theorem. (text 3.1) Let A be a finite dimensional subspace of B and X : B → A a linear operator satisfying (3.2). Then ∥f − X (f )∥ ≤ {1 + ∥X∥} d∗ (f ) . OK, now let’s do a bit of experimenting with a simple linear approximation operator: Example. Let B = C 1 [0, 1], A = P1 , and X : B → A the interpolation operator that interpolates f (x) at x = x0 , x1 , with a ≤ x0 < x1 ≤ b. Let’s experiment with a reasonable interesting function first and see how Theorems 3.1 and 3.2 fare. Recall that we can find an interpolating polynomial p (x) = ax+b through (x0 , y0 ) and (x1 , y1 ) by way of the vandermonde matrix system ax0 + b = y0 ax1 + b = y1 . Furthermore, this is a linear operator (we checked this in class), so that text Theorem 3.1 applies. The norm of X we calculated in class to be 1. 
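Before running the experiments below, here is a quick numerical check of that claim. This is a hedged sketch, not part of the original notes: for interpolation operators measured in the sup norm, the operator norm equals the maximum of the Lebesgue function |l0(x)| + |l1(x)|, where l0, l1 are the Lagrange basis polynomials for the nodes; we estimate that maximum on a fine grid for nodes at the endpoints of [0,1].
% hedged sketch: estimate the Lebesgue constant of two-point interpolation
% on [0,1] with nodes x0 = 0, x1 = 1
x0 = 0; x1 = 1;
x = linspace(0,1,1001);
l0 = (x - x1)/(x0 - x1);           % Lagrange basis polynomial at x0
l1 = (x - x0)/(x1 - x0);           % Lagrange basis polynomial at x1
lebesgue = max(abs(l0) + abs(l1))  % should display 1 for endpoint nodes
Moving the nodes into the interior (say x0 = 0.25, x1 = 0.75) and re-running the computation shows the norm grows, since the Lebesgue function exceeds 1 outside [x0, x1].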
3 % start with f(x) = exp(sin(3*x)) fcn = inline(’exp(sin(3*x))’,’x’) fcnp = inline(’exp(sin(3*x)).*cos(3*x)*3’,’x’) xnodes = (0:.01:1)’; plot(xnodes,fcn(xnodes)) hold on, grid % start with the endpoints for interpolation nodes p = [0,1;1,1]\fcn([0;1]) plot(xnodes,polyval(p,xnodes)) dfXf = norm(polyval(p,xnodes)-fcn(xnodes),inf) % not so hot, so let’s look for a “best approximation” nn = 20 astar = [1,nn]; dstar = dfXf; xgrid = linspace(0,1,nn)’; for ii = 1:nn-1 for jj = ii+1:nn p = [xgrid(ii),1;xgrid(jj),1]\fcn([xgrid(ii);xgrid(jj)]); d = norm(polyval(p,xnodes)-fcn(xnodes),inf); if (d < dstar) dstar = d; astar = [ii,jj]; end end end pstar = [xgrid(astar(1)),1;xgrid(astar(2)),1]\fcn([xgrid(astar(1));xgrid(astar(2))]); plot(xnodes,polyval(pstar,xnodes)) dstar % compare results to Theorem 3.1 prediction dfXf, (1+1)*dstar % compare results to Theorem 3.2 prediction c = pi/2 n = 1 k = 1 dstar, factorial(n-k)/factorial(n)*c*norm(fcnp(xnodes),inf) Finally, here are some calculations regarding complexity of linear system solving that is alluded to in BackgroundNotes.pdf. n = 80; a = n*eye(n)+rand(n); b = rand(n,1); tic; c = a\b; toc % now repeat with n = 160, 320 and 640...how does time grow Let’s do some more calculations x = (-5:0.1:5)’; fcn = inline(’1./(1+x.^2)’,’x’) plot(x,fcn(x)) grid, hold on n = 5 nodes = linspace(-5,5,n)’; fnodes = fcn(nodes); 4 p = vander(nodes)\fnodes;grid, pnodes = polyval(p,x); plot(x,pnodes);% what do you think? norm(pnodes-fcn(x),inf) % now start over and do it with n = 10, 20 % next, look for the culprit -- close the plot window n = 5 nodes = linspace(-5,5,n)’; prd = ones(size(x)); for ii =1:n prd = prd.*(x - nodes(ii)); end plot(x,prd) hold on, grid % now repeat the loop with n = 10, 20 2. Pade Approximations This is a rational function approximation whose basic idea is this: Let the approximating rational function to f ∈ C N +1 [a, b] be a rational function r (x) = p0 + p1 x + · · · + pn p (x) = q (x) 1 + q 1 x + · · · + q m xm where 0 ∈ (a, b), N = m + n is the “total degree” of r (x) and require that r (j) (0) = f (j) (0) , j = 0, 1, . . . , N, so that r (x) is something like a Taylor series approximation. This gives us N + 1 conditions on N + 1 coefficients p0 , . . . , pn , q1 , . . . , qm , so in principle we should be able to solve for all the coefficients (but there are no guarantees). Of course, we don’t want to be in the business of computing all those derivatives, so here’s an alternate approach: Let g (x) = f (x) − r (x), so that the derivative condition is simply that all derivatives of g up to the N -th order vanish at x = 0. We have that q (x) f (x) − p (x) g (x) = q (x) also has N + 1 continuous derivatives defined at x = 0, so has a Taylor series expansion ′ g (0) g(N ) (0) N g(N +1) (ξ) N +1 g (x) = g (0) + x + ··· + x + x 1 n! (n + 1)! for some ξ between 0 and x. So the derivative conditions are equivalent to requiring that xN +1 be a factor of g (x). However, the denominator has constant term 1, so x cannot be factored from it. Therefore, we must have that xN +1 is a factor of q (x) f (x) − p (x). Thus our strategy is to write out an expansion for f (x), say f (x) = a0 + a1 x + · · · , multiply terms and require that the first N + 1 coefficients of (1 + q1 x + · · · + qm ) (a0 + a1 x + · · · ) − (p0 + p1 x + · · · + cn xn ) 5 be zero. This won’t require complicated derivatives, and can easily be done, once we have an expansion for f . 
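The coefficient conditions are written out explicitly just below. As a concrete illustration of the strategy, here is a hedged sketch (not from the notes; the test function is chosen only for illustration) that sets up and solves these conditions for the [2/2] Pade approximant of f(x) = exp(x), whose expansion coefficients are a_k = 1/k!.
% hedged sketch: [m/n] = [2/2] Pade approximant of exp(x)
m = 2; n = 2; N = m + n;
a = 1./factorial(0:N);            % Taylor coefficients a_0, ..., a_N
% equations j = n+1, ..., N: sum_{k=1}^{m} q_k a_{j-k} = -a_j
A = zeros(m); rhs = zeros(m,1);
for row = 1:m
  j = n + row;
  for k = 1:m
    A(row,k) = a(j-k+1);          % +1 shifts to Matlab's 1-based indexing
  end
  rhs(row) = -a(j+1);
end
q = [1; A\rhs];                   % q_0 = 1, then q_1, ..., q_m
% numerator coefficients p_j = sum_{k=0}^{min(j,m)} q_k a_{j-k}
p = zeros(n+1,1);
for j = 0:n
  for k = 0:min(j,m)
    p(j+1) = p(j+1) + q(k+1)*a(j-k+1);
  end
end
p, q
% quick check at x = 0.5
polyval(flipud(p),0.5)/polyval(flipud(q),0.5), exp(0.5)
For exp(x) this recovers the classical approximant r(x) = (1 + x/2 + x^2/12)/(1 - x/2 + x^2/12).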
In fact, the n-th term can be written out explicitly, so the conditions are q0 = 1, 0= j " k=0 qk aj−k − pj , j = 0, 1, . . . , N. 3. Trigonometric Polynomials Real-Valued Functions. We covered the discrete Fourier series operator in class. Now let’s put it to work. A review: With the standard inner product in C [−π, π], we saw that the projection of of f (x) ∈ C [−π, π] into the finite dimensional subspace is the trig polynomial Wn = span {cos kx, sin kx | 0 ≤ k ≤ n} n a0 " q (x) = (ak cos kx + bk sin kx) , + 2 k=1 where, for 0 ≤ k ≤ n, 1 ak = π and, for 1 ≤ k ≤ n, ˆ π f (x) cos kx dx −π ˆ 1 π f (x) sin kx dx π −π If we use the trapezoidal rule with N + 1 equally spaced nodes xj = 2π N j on the interval [0, 2π] (remember that the location of the interval makes no difference – only that it should be of length 2π) to approximate the Fourier coefficients ak , bk with a′k , b′k , then first and last values agree by periodicity and we obtain # $ N −1 2π 2 " ′ f (xj ) cos jk ak = N N bk = j=0 b′k 2 = N N −1 " j=0 f (xj ) sin # 2π jk N $ Wait a minute, you say! That doesn’t look like the trapezoidal rule that I know – the endpoints should have one half the coefficient of interior points. However, bear in mind that assuming periodicity, we have f (x0 ) = f (xN ), so we combine the endpoints. As an aside, % & you might recall that the trapezoidal method applied to smooth functions has error O h2 , where h is the stepsize between the equally spaced nodes. As a matter of fact,% the error of & this method as applied to analytic periodic functions is spectacularly better: O e−a/h , for a suitable positive constant a. This is exponential decay of error. For details see the informative paper [3] by L. Trefethen and J. Weideman. First, let’s create a function that actually evaluates a trig polynomial at vector argument x. We understand that such a trig polynomial will be specified by the vectors of coefficients a = (a1 , . . . , an , a0 ) b = (b1 , b2 , . . . , bn ) . OK, here it is: 6 function retval = trigeval(a,b,x) % usage: y = trigeval(a,b,x) % description: given vectors of trig coefficients a, b % of length n+1 and n, respectively, and argument vector % x, this function returns the value of the trig series % p(x) = a(n+1)/2 + sum(a(j)*cos(j*x)+b(j)*sin(j*x),j=1..n). % n = length(b); if (length(a) ~= n+1) error(’incorrect vector lengths’); end retval = a(n+1)/2*ones(size(x)); for j = 1:n retval = retval + a(j)*cos(j*x)+b(j)*sin(j*x); end Now create a script that does the work of building a discrete Fourier trig polynomial from ’myfcn.m’. It would look something like this: % script: fouriertest.m % description: perform some approximation experiments with % Fourier polynomials on the function myfcn defined externally. n = 2; % account for highest frequency N = 4; % sampling number on [0,2*pi] a = zeros(1,n+1); b = zeros(1,n); % construct the coefficients for myfcn xnodes = linspace(0,2*pi,N+1); xnodes = xnodes(1:N); % trim off 2*pi xnodes = xnodes - pi; % let’s work on [-pi,pi] f = myfcn(xnodes); % dispose of 0th coefficients a(n+1) = 2*sum(f)/N; % now the rest for j = 1:n a(j) = 2*f*cos(j*xnodes)’/N; b(j) = 2*f*sin(j*xnodes)’/N; end % display norm of node error x = -pi:.01:pi; disp(’|| myfcn(x) - trigeval(a,b,x) ||:’) ,disp(norm(myfcn(x) - trigeval(a,b,x),inf)) % now try plotting clf plot(x,myfcn(x),x,trigeval(a,b,c)) 7 Now for some experiments: (1) Let’s start with something really simple like f (x) = sin 2x, N = 3, 4 and n from ' N −1 ( to N . Hmmm. Something’s seriously wrong. Try N = 5, 7. 
interpolation rate 2 (2) Now try f (x) = sin 2x − 0.2 ∗ cos 5x, but start with N = 5. What’s going on here? Nyquist discovered that in order to recover a periodic signal by sampling, a sufficient condition is that we have a sample rate more than twice the highest frequency (in a sinusoidal component) of the signal. It is not a necessary condition. This number is called the Nyquist sampling rate. Recall that frequency is defined f = 1/T , where T is the period of a function. In the first example, the highest frequency is 1/π, so our sampling rate must be greater than 2/π. Thus in an interval of width 2π, we could sample with N > 2π · π2 = 4. So N = 5 will work. What went wrong with smaller N ? Consider this simple example: If we sample with N = 2, we cannot distinguish between sin x, sin 2x, sin 3x, etc. This phenonemon is called aliasing, and is well studied in the area of signal processing. Now consider the second example. The highest frequency occurring is 5/(2π), so the Nyquist sampling rate gives that in an interval of width 2π a sufficient sampling number is N > 10. Finally, let’s break the rules and change the function to a square wave: f (x) = sign (x). Experiment with increasing sample sizes. This time, settle on a trig polynomial size and increase sampling until we are satisfied. Draw a separate figure for n = 5, 11, 21. Conclusions? Note that we saw several interesting features: (1) The discrete Fourier series does seem to converge pointwise to f (x). (2) At the discontinuity it appears to be converging to a point half way between the left and right-hand limits. (3) The “bumps” near the discontinuity appears to gradually contract in width, but their height does not go to zero. This is the so-called “Gibbs effect” that was observed around the end of the nineteenth century by J. Gibbs. 4. Complex Trigonometric Polynomials Here the trig functions are replaced the by the complex exponentials eikx , k = −n, . . . 0, . . . , n, which turn out to be an orthogonal set of functions in the space C [−π, π] of continuous complex-valued functions with domain [−π, π]. One can define the complex function ez in terms of the power series for the exponential function learned in Calculus, but for our purposes, it suffices to use this fact as definition: For real x, eix = cos x + i sin x. We use the standard complex inner product ˆ ⟨f, g⟩ = π f (x) g (x)dx −π and induced norm ∥f ∥2 = ⟨f, f ⟩ in this setting. Remember that to integrate a complex function, you simply integrate its real and complex parts, that is, if f (x) = g (x) + ih (x), with g (x) and h (x) real-valued, then ˆ b ˆ b ˆ b f (x) dx = g (x) dx + i h (x) dx. a a a Let Wn be the finite dimensional subspace spanned by these exponentials and, as with the trig polynomials, we obtain the standard projection formula for the best approximation to 8 f (x) ∈ C [−π, π] from Wn is given by q (x) = n " ck eikx , k=−n where ) * ˆ π f, eikx 1 ck = ikx ikx = f (x) e−ikx dx. ⟨e , e ⟩ 2π −π The bridge between these coefficients and the real Fourier coefficients consists of the following identities for k ≥ 0, which are easily checked from definitions: ak = ck + c−k bk = i (ck − c−k ) 1 ck = (ak − ibk ) 2 1 c−k = (ak + ibk ) . 
2 Just as with the ak and bk we can approximate the ck by a trapezoidal integration using N + 1 equally spaced nodes xj = 2π N j on the interval [0, 2π], as with real Fourier coefficients, ′ and obtain the approximation ck to ck as c′k = N −1 N −1 2πi 1 1 " 1 " f (xj ) e− N jk = yj ω kj = ⟨y, wk ⟩ ≡ Yk , N N N j=0 where j=0 yj = f (xj ) , Yk = c′k y = (y0 , y1, . . . , yN −1 ) ∈ CN and + , wk = ω 0 , ω,k . . . , ω k(N −1) ∈ CN , −1 Here we use the notation ωN = e2πi/N and ω= ωN = ω N (check these equalities for yourself) T N and the inner product ⟨y, w⟩ = y w on C . The vectors wk have some very interesting properties. For one, one checks that the sequence {wk } is periodic in that wm = wm+N for all integers m, since ω N = 1. Moreover, for 0 ≤ m, n < N , N −1 + N −1 ,j " " N, if m = n (m−n) mj −nj ω ω = ω = . ⟨wm , wn ⟩ = 0, if m ̸= n j=0 j=0 For the unequal case, check the identity % & 1 + r + r 2 + · · · + r N −1 (1 − r) = 1 − r N . Thanks to periodicity of these vectors, any sequence of vectors wk , k = m, . . . , m + N − 1, forms an orthogonal basis of CN . In particular, for any vector y the standard projection formula gives N −1 N −1 N −1 " " 1 " ⟨y, wk ⟩ Yk w k . wk = ⟨y, wk ⟩ wk = y= ⟨wk , wk ⟩ N k=0 k=0 k=0 9 Therefore, for 0 ≤ j ≤ N − 1, the ith coordinate of y is given by yj = N −1 " Yk (wk )i = k=0 N −1 " 2πi Yk e N k=0 All of this suggests an N × N matrix of the form F = 1 N express the relation between the yj and Yk in matrix form: kj . . 2πi e− N kj / k,j which enables us to Y =F y, y =N F Y. Let’s do some experiments with Matlab: % construct some DFT matrices N = 5 zeta = exp(-2*pi*i/N) z = zeta.^(0:N-1) % some random vector y = rand(N,1) % try help vander -- it doesn’t quite do what we want F = fliplr(vander(z))/N Finv = N*conj(F) Y = F*y norm(y - Finv*Y) % it works! O.K. now we are going to do some experiments that will be better handled with a script. So start Matlab and edit ’myfcn’. Change it to a square wave on [0, 2π] by retval = (abs(x)>10*eps).*sign(pi-x); Next, edit ’fouriertest’, or whatever you called the file from the previous experiments. Here is what you need: n = 4 % highest frequency in Fourier approximation N = 2*n + 1 % account for Nyquist sampling rate rule a = zeros(1,n+1); b = zeros(1,n); % construct the coefficients for myfcn xnodes = linspace(0,2*pi,N+1); xnodes = xnodes(1:N); % trim off 2*pi f = myfcn(xnodes); % dispose of 0th coefficient a(n+1) = 2*sum(f)/N; % now the rest of real Fourier coefs for k = 1:n a(k) = 2*f*cos(k*xnodes)’/N; b(k) = 2*f*sin(k*xnodes)’/N; end % next construct the complex coefficients % with a suitable DFT matrix 10 zeta = exp(-2*pi*i/N) z = zeta.^(-n:n) F = fliplr(vander(z))/N; Y = F*f.’; % ok, let’s compare ac = zeros(1,n+1); % a(n+1) is exceptional Fourier a_0 of index 0 ac(n+1) = 2*c(n+1); disp(’| a_0 - 2*c_0 | :’) ,disp(abs(a(n+1) - ac(n+1))) % the a(k) are Fourier a_k, k>0 ac(1:n) = c(n+2:N) + c(n:-1:1); disp(’|| a_k - (c_k + c_{-k}) || :’), disp(norm(a(1:n)-ac(1:n),inf)) % the b(k) are Fourier b_k, k>0 bc = i*(c(n+2:N) - c(n:-1:1) ); disp(’ || b_k - (c_k - c_{-k}) || :’), disp(norm(b - bc,inf)) Now save it as ’DFTtest’. Once you have run DFTtest, try plotting (or just add it to DFTtest): x = 0:.01:2*pi; plot(x,myfcn(x),x,trigeval(a,b,x),x,trigeval(ac,bc,x)); norm(imag(ac),inf) norm(imag(bc),inf) 5. Discrete Fourier Transforms (DFT) 2π As usual, ωN = e N i , a primitive N th root of unity. 
To recap, given periodic data [yk ] = N −1 −1 ′ {yk }k=0 , we obtain transformed data [Yk ] = {Yk }N k=0 (we’re replacing ck by Yk ), and conversely, by way of the formulas defining the discrete Fourier transform and its inverse, namely Yk = N −1 1 " −kj yj ωN ≡ FN ([yk ]) N j=0 yk = N N −1 " j=0 kj −1 ≡ FN ([Yk ]) . Yj ω N Next, let’s alter the definition of the data [yk ]. Since the data is periodic, we can interpret it as a doubly infinite sequence [yk ] = . . . , y−j , . . . , y−1 , y0 , y1 , . . . , yN −1 , yN , . . . , yj , . . . where we understand that in general, that if j = qN + r, with integers j, q, r and 0 ≤ r < N , then yj = yr . (If you know congruences, you will recognize that j ≡ r (modN ).) We have the same interpretation with transformed data [Yk ], which is also periodic, given that [yk ] is. Here is a key idea: Definition 1. Given periodic data sets [yk ] and [zk ], we define the convolution of these to be [wk ] = [yk ] ∗ [zk ] , 11 where wk = N −1 " j=0 yk zk−j , k ∈ Z. % & The complexity of the convolution operator is 2N 2 , or simply O N 2 . To illustrate the power of the DFT, we next develop two useful applications of the DFT, namely polynomial multiplication and numerical derivative calculation. Polynomial multiplication: We can interpret polynomial multiplication as convolution in the following way: Given polynomials p (x) = a0 + a1 x + · · · + am xm q (x) = b0 + b1 x + · · · + bn xn , we know that P (x) Q (x) = a0 b0 + (a0 b1 + a1 b0 ) x + · · · + am bn xm+n . The kth term is actually " aj bℓ , j+ℓ=k where 0 ≤ j ≤ m and 0 ≤ ℓ ≤ n. As far as complexity, we see that the complexity is about 2 (m + 1) (n + 1) flops, so certainly O (mn). Let N ≥ m + n + 1 and pad the vectors of coefficients with zeros to obtain vectors a = (a0 , a1 , . . . , am , 0, . . . , 0) b = (b0 , b1 , . . . , bn , 0, . . . , 0) in the space CN . Next form the periodic data sets [ak ] and [bk ], form [ck ] = [ak ] ∗ [bk ] and observe that for 0 ≤ k ≤ m + n, ck = N −1 " aj bk−j j=0 = " a j bℓ , j+ℓ=k Here the padded zeros ensure that 0 ≤ j ≤ m and 0 ≤ ℓ ≤ n. Moreover, if k > m + n, then the padded zeros will make the sum identically 0, since one factor in each term will always be zero. So this is exactly the same sum as for the coefficients of the product polynomial, padded with zeros beyond the (m + n + 1)th entry. We need one more key fact: Theorem 2. If FN ([yk ]) = [Yk ] and FN ([zk ]) = [Zk ] , then FN ([yk ] ∗ [zk ]) = [N Yk Zk ] and −1 ([N Yk Zk ]) . [yk ] ∗ [zk ] = FN 12 Proof. Let [wk ] = [yk ] ∗ [zk ] and calculate 0 1 N −1 N −1 1 " " −kj Wk = yℓ zj−ℓ ωN N j=0 ℓ=0 ⎛ ⎞ −1 N −1 N " " 1 −kj ⎝ yℓ zj−ℓ ⎠ ωN = N j=0 ℓ=0 ⎛ ⎞ N −1 N −1 " 1 " −k(j−ℓ) ⎠ −kℓ ⎝ = yℓ ωN zj−ℓ ωN N j=0 ℓ=0 0N −1−ℓ 1 N −1 " 1 " −kℓ −km yℓ ωN = zm ωN N ℓ=0 m=−ℓ 0 N −1 1 0 N −1 1 1 " 1 " −kℓ −km =N yℓ ωN zm ωN N N m=0 ℓ=0 = N Yk Zk . The last equality is immediate. All of this suggests an algorithm for computing convolutions: First, apply the DFT to each sequence, then take the coordinate-wise product of the transforms, and finally % 2take & the inverse transform of this product. Although we can say that the complexity is O N , clearly there is a factor of 10 or so that would seem to be too much work when contrasted with doing it directly. So what’s the point? Enter the FFT, which will reduce the work of a transform (or inverse transform) to O (N log2 N ). Now this idea will become profitable in some situations. 
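Before bringing in the FFT, a direct implementation of the definition is useful as a cross-check for the transform-based computation that follows. Here is a hedged sketch (not from the notes) of the O(N^2) circular convolution; the sum computed is w_k = sum over j of y_j z_{k-j}, with the index k - j interpreted modulo N.
function w = circconv(y,z)
% usage: w = circconv(y,z)
% description: hedged sketch of the direct O(N^2) circular convolution
% of two periodic data sets y and z of the same length N, with indices
% taken modulo N
y = y(:); z = z(:);
N = length(y);
w = zeros(N,1);
for k = 0:N-1
  for j = 0:N-1
    w(k+1) = w(k+1) + y(j+1)*z(mod(k-j,N)+1);
  end
end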
Let’s confirm the convolution theorem in a special case, namely for the multiplication of p (x) = (1 − x)2 and q (x) = (1 − x)3 by means of the convolution trick. N = 8 zeta = exp(-2*pi*i/N).^(0:N-1); F = fliplr(vander(zeta))/N; Finv = N*conj(F); % represent the polynomial with lowest to highest degree vector y = [1,-2,1,0,0,0,0,0]’ z = [1,-3,3,-1,0,0,0,0]’ Y = F*y; Z = F*z; W = Y.*Z; w = Finv*(N*W) Numerical derivative calculations: If we believe that the standard projection formula for the best approximation q(x) to f (x) ∈ C [−π, π] from Wn is a very good approximation to f (x), then it is reasonable to consider the derivative of q (x) to be a good approximation to the derivative of f (x), that is, f (x) ≈ q (x) = n " k=−n ck eikx , and f ′ (x) ≈ q ′ (x) = n " k=−n ikck eikx . 13 Let’s try this out with the test function f (x) = sin 2x − 0.2 cos 5x, with f ′ (x) = 2 cos 2x + sin 5x. All we have to do is modify the complex Fourier coefficients, which we illustrate in this Matlab calculation (which you should save as the script DFTderivtest): n = 5 % highest frequency in Fourier approximation N = 2*n + 1 % account for Nyquist sampling rate % construct the coefficients for myfcn, derivative myfcnp xnodes = linspace(0,2*pi,N+1)’; xnodes = xnodes(1:N); % trim off 2*pi f = myfcn(xnodes); fp = myfcnp(xnodes); % this time we’ll do the calculationDF with a suitable DFT matrix zeta = exp(-2*pi*i/N).^(-n:n); F = fliplr(vander(zeta))/N; c = F*f; cp = F*fp; cpapprox = i*(-n:n)’.*f; % ok, let’s compare coefficients disp(’|| cp - cpapprox || :’), disp(norm(cp - cpapprox,inf)) 6. The Fast Fourier Transform (FFT) Description. We’ll do pretty much everything you need to know about the development of the FFT. OK, let’s start with an update of the definition of the discrete Fourier transform (DFT): given a data sequence (yk ) of complex numbers, periodic of period N, we define the DFT of this data to be the periodic sequence (Yn ) of period N given by the formula 7 1 6 −(N −1)k −k −2k , k = 0, 1, . . . , N − 1 y0 + y1 ωN + y2 ωN + · · · + yN −1 ωN Yk = N Since the Yk are a periodic sequence, these are all the values we need to know. Symbolically, we write [Yk ] = FN ([yk ]). Now assume that N is even, say N = 2m and we can split this −k sum into even and odd indexed terms and factor ωN from the odd terms to get that for k = 0, 1, . . . , N − 1, 7 16 −k Yk = Ek + ωN Ok 2 where 6 7 −(N −2)k −2k −4k 1 y0 + y2 ωN + y4 ωN + · · · + yN −2 ωN Ek = m 7 6 −(N −2)k −2k −4k 1 Ok = m y1 + y3 ωN + y5 ωN + · · · + yN −1 ωN Here are some simple facts: −(k+m) −k ωN = −ωN Ek+m = Ek Ok+m = Ok It follows that for k = 0, 1, . . . , N/2 − 1, we have these DFT doubling formulas 6 7 −k Yk = 12 Ek + ωN Ok 6 7 −k Ok . Yk+m = 12 Ek − ωN 14 Observe that this essentially cuts the work of computing Yk in half, requiring about N 2 /2 multiplications and additions instead of N 2 . This observation lies at the heart of the FFT. However, the real power of this idea is realized when we team it up with recursion. To this end, we suppose that N is purely even, that is, N = 2p for some integer p. Now make the key 2 =ω observation that calculation of the Ek and Ok ’s is itself a DFT. For we have that ωN N/2 , from which it follows that 7 6 −(m−1)k −n −2n 1 y0 + y2 ωN/2 + y4 ωN/2 + · · · + yN −2 ωN/2 Ek = m 7 6 −(m−1)k −n −2n 1 Ok = m y1 + y3 ωN/2 + y5 ωN/2 + · · · + yN −1 ωN/2 In other words, the two quantities above are simply the DFT of the pairs of the half sized data y0 , y2 , . . . , yN −2 and y1 , y3 , . . . yN −1 . 
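The recursion is developed further below; first, here is a hedged numerical sanity check (not from the notes) that combining the two half-size DFTs by the doubling formulas reproduces the full DFT, using DFT matrices built as in the earlier experiments.
% hedged sketch: verify the DFT doubling formulas for N = 8
N = 8; m = N/2;
y = rand(N,1);
Ffull = fliplr(vander(exp(-2*pi*i/N).^(0:N-1)))/N;  % N-point DFT matrix
Fhalf = fliplr(vander(exp(-2*pi*i/m).^(0:m-1)))/m;  % m-point DFT matrix
Y = Ffull*y;                   % full DFT
E = Fhalf*y(1:2:N);            % DFT of y0, y2, ..., y_{N-2}
O = Fhalf*y(2:2:N);            % DFT of y1, y3, ..., y_{N-1}
k = (0:m-1)';
w = exp(-2*pi*i/N).^k;         % omega_N^{-k}
norm(Y - [(E + w.*O)/2; (E - w.*O)/2], inf)  % should be near machine epsilon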
Thus, we see that we can halve these two data sets again and reduce the problem to computing four quarter sized DFTs. Recursing all the way down to the bottom, we reduce the original problem to computing N DTFs of 1/N -th sized data sets. It takes us p of these steps to get down to the bottom, namely size N/20 to N/21 , then N/21 to N/22 , all the way down to N/2p−1 to N/2p = 1. However, at the bottom, the DFT of a singleton data point is simply the data point itself. Are we done? Well, not quite. This is only half the problem, because once we’ve reached the bottom, we have to reassemble the data a step at a time by way of DFT doubling formulas to build the DFT at at the top. Let’s examine how we have to rearrange our original data set for this recursion in the case p = 3, that is, N = 8. Designate levels by ℓ. ℓ=3 ℓ=2 ℓ=1 ℓ=0 y0 y0 y0 y0 y1 y2 y4 y4 y2 y4 y2 y2 y3 y6 y6 y6 y4 y1 y1 y1 y5 y3 y5 y5 y6 y5 y3 y3 y7 y7 y7 y7 Now we’re ready to work our way back up. Notice, BTW, ℓ = 1 really is the same as the bottom. Complexity. The recursive way of thinking is very handy. For one thing, it enables us to do a complexity analysis without much effort. We’ll ignore the successive divisions by 2. We could just ignore them during the computations, then at the end divide our final result by 2p = N. This only involves N real multiplications, which is negligible compared to the rest of the work. Also notice that the number of multiplications for a full reconstruction is about half the number of additions, so we’ll focus on additions. (Bear in mind that we assume the values of ωN are precomputed and available.) How much work is involved in going from the two data sets to the single one of length 2p ? Let’s denote the total number of additions by Ap . We see from the formulas for Yk and Yk+m that Ap = 2Ap−1 + 2p and down at the bottom A0 = 0. Recursion yields Ap = = .. . 2Ap−1 + 2p 2(2Ap−2 + 2p−1 ) + 2p = 2p A0 + p2p = N log2 N Similarly, one could count multiplications Mp recursively, with the recursion formula being Mp = 2Mp−1 + 2p−1 − 1, 15 0 = 1 and M = 0, to obtain that M = 1 N (log N − 2). For not counting multiplication by ωN 0 p 2 2 reasonably large N this is a very significant reduction; e.g., if N = 256 = 28 , then N 2 = 65536 and N log2 N = 2048. Implementation. We might be tempted to use recursion to program the algorithm. From a practical point of view, this is usually a bad idea, because recursion, the elegant darling of computer science aficionados, is frequently inefficient as a programming paradigm. Rather, let’s assume that we have decomposed that data set into the lowest level. That process alone is interesting, so suppose we have a routine that takes an index vector 0 : N − 1 and rearranges it in such a way that the returned index vector is what is needed to work our way back up the recursion chain of the FFT. The syntax of the routine should be retval=FFTsort(p) 2p . where N = With the routine FFTsort in hand, let’s address the issue of practical coding of the FFT. Rather than thinking from the top down (recursion), let’s think from the bottom up (induction). Of course, they’re equivalent. It suffices to understand how things work from the m = N/2 to N. So let’s re-examine the formulas from the point of view of DFTs. Consider this tabular form for the relevant data: ℓ=p Y0 Y1 . . . YN −1 ℓ = p − 1 E0 E1 . . . Em−1 O0 O1 . . . Om−1 Notice that the first half of the second row is the DFT of the first half of the second level of rearranged data (like ℓ = 2 in our N = 8 example above). 
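The implementation further below assumes the routine FFTsort is available. Here is one hedged sketch of how such a routine might be written (the course's actual FFTsort.m may differ); it returns the indices 1:2^p in bit-reversed order, which is exactly the bottom-level arrangement shown in the table above (for p = 3 it returns the order corresponding to y0 y4 y2 y6 y1 y5 y3 y7).
function idx = FFTsort(p)
% usage: idx = FFTsort(p)
% description: hedged sketch of a routine returning the permutation of
% 1:2^p that arranges the input data for the bottom level of the FFT
% recursion (bit-reversed order, 1-based indices)
idx = 1;                          % a one-point data set needs no sorting
for l = 1:p
  % each doubling: entries with even original (0-based) index come first,
  % odd-indexed entries second, each half ordered by the smaller problem
  idx = [2*idx-1, 2*idx];
end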
Let’s call the process of applying Fact 2 to the second row to get the first row DFT doubling. Back to the general case, imagine that we’re at the ℓth level in our reconstruction, where 0 ≤ ℓ < p. At this level we have constructed 2p−ℓ consecutive DFTs of size 2ℓ , where each DFT is the DFT of the corresponding data at the ℓth level of rearranged input data. Now we pair these DFTs up to construct 2p−ℓ−1 consecutive DFTs of size 2ℓ+1 . Here’s a picture of what the DFT doublings look like in our N = 8 example (start at the bottom): ℓ = 3 E = Y0 E = Y1 E = Y2 E = Y3 E = Y4 E = Y5 E = Y6 E = Y7 ℓ=2 E E E E O O O O ℓ=1 E E O O E E O O ℓ = 0 y0 = E y4 = O y2 = E y6 = O y1 = E y5 = O y3 = E y7 = O Keep going until we get to the top. When we get there we have reconstructed the Yn′ s. One can envision two nested for loops that do the whole job. In fact, here’s a simple Matlab code for the FFT. All divisions by 2 are deferred to the last line, as suggested by our earlier discussion. FFT Algorithm: function Y = FFTransform(y) % usage: Y = FFTransform(y) % description: This function accepts input vector y of length % 2^p (NB: assumed and not checked for) to and outputs % the DFT Y of y using the FFT algorithm. y = y(:); % turn y into a column 16 N = length(y); p = log2(N); omegaN = exp(-pi*i/N).^(0:N-1).’; % all the omegas we need Y = y(FFTsort(p)); % initial Y at bottom level stride = 1; for l=1:p omegal = omegaN(1:N/stride:N); stride = 2*stride; for j=1:stride:N evens = Y(j:j+stride/2-1); odds = omegal.*Y(j+stride/2:j+stride-1); Y(j:j+stride/2-1) = evens + odds; Y(j+stride/2:j+stride-1) = evens - odds; end end Y = Y/N; We might ask what one has to do about the inverse DFT. Recall that this transform is given by the equation 7 6 (N −1)k k 2k , k = 0, 1, . . . , N − 1. y k = Y0 + Y1 ω N + Y2 ω N + · · · + YN −1 ωN −1 Symbolically, we write (yk ) = FN (Yn ). This operation does what it says, that is, compute untransformed data from transformed data. The good news is that we need do nothing more than write a program for the FFT. The reason is that from the definition we can see that 7 1 6 −(N −1)k −k −2k yk = N Y 0 + Y 1 ωN + Y 2 ωN + · · · + Y N −1 ωN , k = 0, 1, . . . , N − 1, N from which it follows that (yk ) = N FN (Y n ). Therefore, it suffices to have an fast algorithm for the DFT and a fast algorithm for the inverse DFT is more or less free. There are other refinements to applications and construction of the DFT coming from signal processing; we won’t go into these. Since these notes are all about approximation theory, there is one more question: how exactly do these numbers relate to the approximation formula that started this whole business, namely, q (x) = n " ck eikx . k=−n The answer lies in a careful scrutiny of the computed approximate Fourier coefficients Yk ≈ ck . Let m = N/2 are recall from the periodicity of the Yk ’s that Yj = Yj−2m . Now re-index these computed numbers as follows: Y0 , Y1 , . . . Ym−1 , Ym , Ym+1 , Ym+2 , ... Y2m−1 Y0 , Y1 , . . . Ym−1 , Ym , Ym+1−2m , Ym+2−2m . . . Y2m−1−2m Y0 , Y1 , . . . Ym−1 , Ym , Y−(m−1) , Y−(m−2) . . . Y−1 17 Thus we are really calculating the approximation q (x) = m−1 " Yk eikx + Ym eimx + N −1 " Yk e−i(k−N )x . k=m+1 k=0 In other words, the correct sequence of exponents of eix implied by the Yk ordering is 0, 1, . . . m − 1 m, − (m − 1) , − (m − 2) , . . . , −1. 7. All About Cubic Splines We study cubic splines with a finite number of (ordered) knots at x0 , x1 , . . . , xn . The space of all such objects isS (x0 , x1 , . . . 
, xn ; 4) and such an object is a function s (x) ∈ C 2 [x0 , xn ], such that on each subinterval [xj , xj+1 ], j = 0, 1, . . . , n − 1, we have s (x) = sj (x) , a cubic polynomial defined for x ∈ [xj , xj+1 ]. General Equations for All Cubic Splines. Here is the problem presented by knot interpolation alone. Problem: Given a function f (x) ∈ C 2 [x0 , xn ], find all cubic splines that interpolate f (x) at the knots x0 , x1 , . . . , xn , that is, s (xj ) = f (xj ), j = 0, 1, . . . , n. Notation: (1) sj (x) ≡ aj + bj (x − xj ) + cj (x − xj )2 + dj (x − xj )3 , j = 0, 1, . . . , n-1. (2) yj = f (xj ), j = 0, 1, . . . , n. (3) Mj = s′′ (xj ), j = 0, 1, . . . , n. (The Mj ’s are called the second moments of the spline.) (4) hj = xj+1 − xj , j = 0, 1, . . . , n − 1. Here is how we solve this problem: Start with s′′ (x) which must be a piecewise linear continuous function. Since we know its endpoint values in the subinterval [xj , xj+1 ], we may use simple Lagrange interpolation to obtain (xj+1 − x) Mj + (x − xj ) Mj+1 . s′′j (x) = hj Integrate twice and we have to add in an arbitrary linear term which, for convenience we write as follows (xj+1 − x)3 Mj + (x − xj )3 Mj+1 + Cj (xj+1 − x) + Dj (x − xj ) , sj (x) = 6hj where Dj and Cj are to be determined. Now differentiate this expression and obtain − (xj+1 − x)2 Mj + (x − xj )2 Mj+1 − Cj + Dj . 2hj For later use we note that if we shift the indices down one, we obtain s′j (x) = − (xj − x)2 Mj−1 + (x − xj−1 )2 Mj − Cj−1 + Dj−1 . = 2hj−1 Our strategy is to reduce the determination of every unknown to the determination of the Mj ’s. Start with the interpolation conditions for s, which implies that for j = 0, 1, . . . , n − 1, s′j−1 (x) sj (xj ) = yj = so that Cj = h2j Mj + Cj hj , 6 hj Mj yj − . hj 6 18 Similarly, from sj (xj+1 ) = yj+1 = h2j Mj+1 + Dj hj , 6 we have Dj = yj+1 hj Mj+1 − . hj 6 Next, use the continuity of s′ (x) to obtain that for j = 1, . . . , n − 1, s′j−1 (xj ) = s′j (xj ) which translates into hj−1 Mj yj−1 − 2 hj−1 h2j−1 Mj h2j Mj − Cj−1 + Dj−1 = − − Cj + Dj 2hj−1 2hj hj−1 Mj−1 yj hj−1 Mj hj Mj yj hj Mj yj+1 hj Mj+1 + + − =− − + + − . 6 hj−1 6 2 hj 6 hj 6 After simplifying, we obtain # $ hj−1 hj−1 hj−1 hj hj yj+1 yj yj−1 yj hj Mj−1 + − + − Mj + Mj+1 = − + − , 6 2 6 2 6 6 hj hj hj−1 hj−1 that is, hj−1 Mj−1 + 6 # hj−1 + hj 3 $ Mj + hj Mj+1 = f [xj , xj+1 ] − f [xj−1 , xj ] . 6 Thus we are two equations short of what we need to determine the n+1 unknowns M0 , M1 , . . . , Mn . Standard Form for the Defining Cubics. Notice that if we express the cubic polynomials defining s (x) in standard form, then for j = 0, 1, . . . , n − 1, we have sj (x) = aj + bj (x − xj ) + cj (x − xj )2 + dj (x − xj )3 , s′j (x) = bj + 2cj (x − xj ) + 3dj (x − xj )2 , s′′j (x) = 2cj + 3dj (x − xj ) . We can use these equations along with our earlier description of sj (x) to obtain that for j = 0, 1, . . . , n − 1, aj = f (xj ) bj = f [xj , xj+1 ] − hj (2Mj + Mj+1 ) 6 Mj 2 1 dj = (Mj+1 − Mj ) . 6hj cj = These are the formulas that we need to put the spline in standard PP form. Extra Conditions to Uniquely Define the Interpolating Cubic Spline. 19 Clamped (complete) cubic spline. Define the spline s (x) = sc (x) by the two additional conditions s′ (x0 ) = f ′ (x0 ) , s′ (xn ) = f ′ (xn ) . Use the formula for s′ (x) to obtain additional equations h0 h0 M0 + M1 = f [x0 , x1 ] − f ′ (x0 ) 3 6 hn−1 hn−1 Mn−1 + Mn = f ′ (xn ) − f [xn−1 , xn ] . 
6 3 The resulting system has symmetric positive definite coefficient matrix, from which it follows that there is a unique clamped cubic spline interpolating f (x) at the knots of the spline. The system is (the convention is that blank entries are understood to have value zero) ⎤⎡ ⎤ ⎡ ⎤ ⎡ h0 h0 M0 f [x0 , x1 ] − f ′ (x0 ) 3 6 h1 ⎥ ⎢ M1 ⎥ ⎢ ⎥ ⎢ h0 h0 +h1 f [x1 , x2 ] − f [x0 , x1 ] 6 ⎥⎢ ⎥ ⎢ ⎥ ⎢ 6 ⎥ ⎢ ⎥ ⎢ ⎢ ⎥ .. .. .. .. .. = ⎥⎢ ⎥ ⎢ ⎢ ⎥ . . . . . ⎥⎢ ⎥ ⎢ ⎥ ⎢ hn−1 ⎦ ⎣ hn−2 hn−2 +hn−1 ⎦ ⎣ ⎣ Mn−1 f [xn−1 , xn ] − f [xn−2 , xn−1 ] ⎦ 6 6 hn−1 hn−2 Mn f ′ (xn ) − f [x0 , x1 ] 6 3 Error analysis leads to this theorem: Theorem 3. If f ∈ C 4 [a, b], knots are defined as a = x0 < x1 < · · · < xn = b, h = max0≤j≤n−1 (xj+1 − xj ), and sc (x)% is the clamped cubic spline for f , then for x ∈ [a, b] and & 5 1 3 constants cj , j = 0, 1, 2, where c = 384 , 24 , 8 , such that > > ! ! > (j) > 4−j ! (4) ! >f (x) − s(j) !f ! . c (x)> ≤ cj h ∞ Another striking fact is that cubic spline minimize “wiggles” in the following sense. Theorem 4. Among all g (x) ∈ C 2 [a, b] interpolating f (x) ∈ C 2 [a, b] at the nodes a = x0 < x1 < · · · < xn = b, and f ′ (x) at a and b, the choice g (x) = sc (x) minimizesf ˆ b % ′′ &2 g (x) dx. a Natural cubic spline. Define the spline s (x) = snat (x) by the two additional conditions s′′ (x0 ) = 0, s′′ (xn ) = 0. Obtain additional equations M0 = 0 Mn = 0. The resulting system has symmetric positive definite coefficient matrix, from which it follows that there is a unique clamped cubic spline interpolating f (x) at the knots of the spline. The system is ⎤⎡ ⎡ ⎤ ⎡ ⎤ 1 M0 0 h1 ⎥ ⎢ M1 ⎥ ⎢ ⎢ h0 h0 +h1 ⎥ f [x1 , x2 ] − f [x0 , x1 ] 6 ⎥⎢ ⎢ 6 ⎥ ⎢ ⎥ ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ . . . . . . . . . . = ⎥⎢ ⎢ ⎥ ⎢ ⎥. . . . . . ⎥ ⎢ ⎢ ⎥ ⎢ ⎥ h +h h h n−2 n−1 n−2 n−1 ⎦ ⎣ M ⎣ ⎦ ⎣ f [xn−1 , xn ] − f [xn−2 , xn−1 ] ⎦ n−1 6 6 Mn 0 1 One can obtain similar error bounds as with clamped cubic splines, but of course not as good. 20 “Not-a-knot” cubic spline. Define the spline s (x) = snk (x) to be the cubic interpolating spline at knots x0 by the two additional conditions s (x1 ) = f (x1 ) , s (xn−1 ) = f (xn−1 ) . One obtain two additional equations which again give a linear system with a unique solution by evaluating the expression for s (x) (in terms of moments Mj ) at the nodes x1 and xn−1 . In the limit, as x1 → x0 and xn−1 → xn , the not-a-knot cubic spline tends to the clamped cubic spline. As a final example, estimate error on this function $ # 1 3 x − x2 + x sin (x) , 0 ≤ x ≤ 2π f (x) = 10 using the various cubic splines. Tools for constructing these three splines are the function files CCSpp.m, NCSpp.m and NKCSpp.m. These are located in the file PPfcns.m, but they are also unpacked along with all the other functions from PPfcns.m in the folder PPfncs to be found in Course Materials. For example, here is how to construct the natural cubic spline for f (x) and plot it along with the plot of f (x). You might find it more helpful to plot the error. This could be helpful for deciding how to add knots or change their location. First edit ’myfcn.m’ to the following: retval = (0.1*x.^3 - x.^2 + x).*sin(x); Then do the following in Matlab: knots = linspace(0, 2*pi, 6); pp = NCSpp(knots,myfcn(knots)); x = 0:0.001:2*pi; plot(x, myfcn(x)) hold on, grid plot(x, PPeval(pp,x)) 8. A Review and Tour of Linear Algebra Note: This really is a brief tour. For the most part, it is material from Math 314. 
Since it is less likely that you saw a variety of norms and a development of the singular value decomposition, I’m including a little more detail concerning these topics. Solving Systems . Solving linear systems is a fundamental activity, practiced from high school forward. Recall that a linear equation is one of the form a1 x1 + a2 x2 + · · · + an xn = b, where x1 , x2 , . . . , xn are the variables and a1 , a2 , . . . , an , b are known constants. A linear system is a collection of linear equations, and a solution to the linear system is a choice of the variables that satisfies every equation in the system. What can we expect? Example 6. Use geometry to answer the question with one or more equations in 2 unknowns. Ditto 3 unknowns. Basic Fact (Number of Solutions): A linear system of m equations in n unknowns is either inconsistent (no solutions) or consistent, in which case it either has a unique solution or infinitely many solutions. In the former case, unknowns cannot exceed equations. If 21 the number of equations exceeds the number of unknowns, the system is called overdetermined. If the number of unknowns exceeds the number of equations, the system is called underdetermined. While we’re on this topic, let’s review some basic linear algebra as well: The general form for a linear system of m equations in the n unknowns x1 , x2 , . . . , xn is a11 x1 + a12 x2 + · · · + a1j xj + · · · a1n xn a21 x1 + a22 x2 + · · · + a2j xj + · · · a2n xn .. . ai1 x1 + ai2 x2 + · · · + aij xj + · · · ain xn .. . am1 x1 + am2 x2 + · · · + amj xj + · · · amn xn = = .. . = .. . = b1 b2 .. . bi .. . bm Notice how the coefficients are indexed: in the ith row the coefficient of the jth variable, xj , is the number aij , and the right hand side of the ith equation is bi . The statement “A = [aij ]” means that A is a matrix (rectangular table of numbers) whose (i, j)th entry, i.e., entry in the ith row and jth column, is denoted by aij . Generally, the size of A will be clear from context. If we want to indicate that A is an m × n matrix, we write A = [aij ]m,n . Similarly, the statement “b = [bi ]” means that b is a column vector (matrix with exactly one column) whose ith entry is denoted by bi , and “c = [cj ]” means that c is a row vector (matrix with exactly one row) whose jth entry is denoted by cj . In case the type of the vector (row or column) is not clear from context, the default is a column vector. How can we describe the matrices of the general linear system described above? First, there is the m × n coefficient matrix ⎡ a11 a21 .. . a12 a22 .. . ··· ··· ⎢ ⎢ ⎢ ⎢ A=⎢ ⎢ ai1 ai2 · · · ⎢ . .. ⎣ .. . am1 am2 · · · a1j a2j .. . ··· ··· a1n a2n .. . aij .. . ··· ain .. . amj ··· amn ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ Notice that the way we subscripted entries of this matrix is really very descriptive: the first index indicates the row position of the entry and the second, the column position of the entry. Next, there is the m × 1 right hand side vector of constants ⎡ ⎢ ⎢ ⎢ ⎢ b=⎢ ⎢ ⎢ ⎣ b1 b2 .. . bi .. . bm ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ 22 Finally, stack this matrix and vector along side each other (we use a vertical bar below to separate the two symbols) to obtain the m × (n + 1) augmented matrix ⎤ ⎡ a11 a12 · · · a1j · · · a1n b1 ⎢ a21 a22 · · · a2j · · · a2n b2 ⎥ ⎢ . .. ⎥ .. .. .. ⎥ ⎢ . . ⎥ . . . . ? = [A | b] = ⎢ A ⎥ ⎢ ⎢ ai1 ai2 · · · aij · · · ain bi ⎥ ⎢ . .. .. .. .. ⎥ ⎣ .. . . . . ⎦ am1 am2 · · · amj · · · amn bm Matrix Arithmetic. 
Given any two matrices (or vectors) A = [aij ] and B = [bij ] of the same size, we may add them or multiply by a scalar according to the rules A + B = [aij + bij ] and cA = [caij ] . For each size m×n we can form a zero matrix of size m×n: just use zero for every entry. This is a handy matrix for matrix arithmetic. Matlab knows all about these arithmetic operations, as we saw in our Matlab introduction. Moreover, there is a multiplication of matrices A and B provided that A is m × p and B is p × n. The result is an m × n matrix given by the formula @ p A " AB = aik bkj . k=1 Again, Matlab knows all about matrix multiplication. Another simple matrix operation: The transpose of a matrix A = [ai,j ] is a matrix AT = [aj,i ] in which the rows of A have been transposed into columns. Matlab knows these as well: use the apostrophe as in A’. The Basics. There are many algebra rules for matrix arithmetic. We won’t review all of them here, but one noteworthy fact is that matrix multiplication is not commutative: AB ̸= BA in general. Another is that there is no general cancellation of matrices: if AB = 0. Another is that there is an identity matrix for any square size n × n: In is the matrix with ones down the diagonal entries and zeros elsewhere. For example, ⎤ ⎡ 1 0 0 I3 = ⎣ 0 1 0 ⎦ . 0 0 1 We normally just write I and n should be clear from context. Algebra fact: if A is m × n and B is n × m , then AI = A and IB = B. One of the main virtues of matrix multiplication is that is gives us a symbolic way of representing the general linear system given above, namely, we can put it in the simple looking form Ax = b, where A and b are given as above, namely Ax = b, 23 where ⎡ ⎢ ⎢ ⎢ ⎢ x=⎢ ⎢ ⎢ ⎣ x1 x2 .. . xi .. . xn ⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ is the (column) vector of unknowns. Actually, this is putting the cart before the horse because this idea actually inspires the whole notion of matrix multiplication. This form suggests a way of thinking about the solution to the system, namely, b . A The only problem is that this doesn’t quite make sense in the matrix world. But we can make it sensible by a little matrix algebra: imagine that there were a multiplicative inverse matrix A−1 for A, that is, a matrix such that x= A−1 A = I = AA−1 . Then we could multiply both sides of Ax = b on the left by A−1 and obtain that x = A−1 b. Matlab represents this equation by x = A\b In general, a square matrix A may not have an inverse. One condition for an inverse to exist is that the matrix have nonzero determinant, where this is a number associated with a matrix such as one saw in high school algebra with Cramer’s rule. We won’t go into details here. Of course, Matlab knows all about inverses and we can compute the inverse of A by issuing the commands A^(-1) or inv(A). Sets of all matrices or vectors of a given size form “number systems” with a scalar multiplication and addition operation that satisfies a list of basic laws. Such a system is a vector space. For example, for vectors from R3 , we have operations ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ a1 b1 a1 + b1 ⎣ a2 ⎦ + ⎣ b2 ⎦ = ⎣ a2 + b2 ⎦ a3 b3 a3 + b3 ⎡ ⎤ ⎡ ⎤ a1 ca1 c ⎣ a2 ⎦ = ⎣ ca2 ⎦ a3 ca3 and similar operations for n-vectors of the vector space Rn or the vector space of n×n matrices Rn×n . Note that we can also deal with complex numbers in more or less the same fashion as reals, giving rise to the complex vector spaces Cn and Cn×n . BTW, see what a nuisance it is to write out column vectors – a space hog. Here’s a remedy. My Shorthand: a tuple like c = (c1 , c2 , . . . 
, cn ) is understood to represent a column vector [c1 , c2 , . . . , cn ]T . A more exotic example of a vector space (I’m not bothering to give the formal definition) is C[a, b] = {f (x) : [a, b] → R | f (x) is continuous on [a, b]} 24 together with the vector space operations of addition and scalar multiplication given by the formulas (f + g) (x) = f (x) + g (x) (cf ) (x) = c (f (x)) . With vector spaces come a steady flow of ideas that we won’t elaborate on here – such as subspaces, linear independence, spanning sets, bases and many more. Some key vector spaces associated with the m × n matrix A: • Null space (kernel) of A: N (A) = {x ∈ Rn | Ax = 0}, a subspace of the vector space Rn . 6 7 Bn m • Range (column space) of A: R (A) = y ∈ R | y = j=1 cj Aj , the subspace of Rm consisting of linear combinations of the columns Aj of the matrix A (a.k.a. the span of these columns.) A Simple But Key Idea: one checks easily that if x = (x1 , x2 , . . . , xn ) and A = [A1 , A2 , . . . , An ] then Ax = A1 x1 + · · · + An xn . From this we obtain a concise description of R (A), namely that R (A) = {Ax | x ∈ Rn } . Norms. Before we get started, there are some useful ideas that we need to explore. We already know how to measure the size of a scalar (number) quantity x: use |x| as a measure of its size. Thus, we have a way of thinking about things like the size of an error. Now suppose we are dealing with vectors and matrices. Question: How do we measure the size of more complex quantities like vectors or matrices? Answer: We use some kind of yardstick called a norm that assigns to each vector x a non-negative number ∥x∥ subject to the following norm laws for arbitrary vectors x, y and scalar c: • For x ̸= 0, ∥x∥ > 0 and for x = 0, ∥x∥ = 0. • ∥cx∥ = |c| ∥x∥. • ∥x + y∥ ≤ ∥x∥ + ∥y∥. Examples: For vector space Rn and vector x = [x1 , x2 ; . . . ; xn ], • 1-norm: ∥x∥1 = |x1 | + |x2 | + · · · + |xn | • 2-norm (the default norm): % &1/2 ∥x∥2 = x21 + x22 + · · · + x2n • ∞-norm:∥x∥∞ = max (|x1 | , |x2 | , · · · , |xn |) Let’s do some examples in Matlab, which knows all of these norms: >A=[1;-3;2;-1], x=[2;0;-1;2] >norm(x) >norm(x,1) >norm(x,2) >norm(x,inf)>norm(x) >norm(A,1) >norm(A,2) >norm(A,inf) 25 >norm(-2*x),abs(-2)*norm(x) >norm(A*x),norm(A)*norm(x) We can abstract the norm idea to more exotic vector spaces like C[a, b] in many ways. Here is one that is analogous to the infinity norm on Rn : ∥f ∥ ≡ max |f (x)| . a≤x≤b Matrix Norms: For every vector norm ∥x∥ of n-vectors there is a corresponding induced norm on the vector space of n × n matrices A by the formula ∥A∥ = max x̸=0 ∥Ax∥ = max ∥Ax∥ . ∥x∥ ∥x∥=1 These induced matrix norms, also called operator norms, in reference to the operator of matrix multiplication defined by a matrix, satisfy all the basic norm laws and, in addition: ∥Ax∥ ≤ ∥A∥ ∥x∥ (compatible norms) ∥AB∥ ≤ ∥A∥ ∥B∥ (multiplicative norm) One can show directly that if A = [aij ], then n " |aij | • ∥A∥1 = max 1≤j≤n i=1 C • ∥A∥2 = ρ (AT A), where ρ(B) is largest eigenvalue of square matrix B in absolute value. n " |aij | • ∥A∥∞ = max 1≤i≤n j=1 There is a more general definition of matrix norm that requires that the basic norm laws be satisfied as well as one more, the so-called multiplicative property: ∥AB∥ ≤ ∥A∥ ∥B∥ for all matrices conformable for multiplication. One such standard example is the Frobenius norm: given that A = [aij ] m " n " |aij |2 . ∥A∥F = i=1 j=1 One can check that the all the matrix norm properties are satisfied. Inner Products. The 2-norm is special. 
It can be derived from another operation on vectors called the inner product (or dot product). Here is how it works. Definition. The inner product (dot product) of vectors u, v in Rn is the scalar u · v = uT v. This operation satisfies a long list of properties, called the inner product properties: • • • • u · u ≥ 0 with equality if and only if u = 0. u·v =v·u u · (v + w) = u · v + u · w c (u · v) = (cu) · v 26 For complex numbers and complex vector spaces the only change in the above list of properties is that u · v = v · u. For example, for u, v in Cn , we could define u · v = uT v or u · v = uT v, depending on where we want to put the conjugation sign. What really makes this idea important are the following facts. Here we understand that ∥u∥ means the 2-norm ∥u∥2 . • √ ∥u∥ = u · u • If θ is an angle between the vectors represented in suitable coordinate space, then u · v = ∥u∥ ∥u∥ cos θ. • If θ is an angle between the vectors represented in suitable coordinate space, then u · v = ∥u∥ ∥u∥ cos θ. • Once we have the idea of angle between vectors, we can speak of orthogonal (perpendicular) vectors, i.e., vectors for which the angle between them is π/2 (90◦ ). • Given any two vectors u, v, with v ̸= 0, the formula u·v projv u = v v·v defines a vector parallel to v such that u − projv u is orthogonal to v. • We can introduce the concept of orthogonal set of vectors: An orthogonal set of vectors is one such that elements are pairwise orthogonal. It turns out that orthogonal sets of nonzero vectors are always linearly independent. The Gram-Schmidt algorithm gives us a way of turning any linearly independent set into an orthogonal set with the same span. We can abstract the idea of inner product to more exotic vector spaces. Consider, for example, C[a, b] with the inner product (which we now denote with angle brackets instead of the dot which is reserved for the standard vector spaces) ˆ b ⟨f, g⟩ = f (x) g (x) dx. a Of course, this gives rise to an induced norm ˆ b ∥f ∥ = f (x)2 dx a Eigenvectors and Eigenvalues. An eigenvalue for a square n × n matrix A is a number λ such that for some NONZERO vector x, called an eigenvector for λ, Ax = λx. Eigenvalue Facts: (1) The eigenvalues of A are precisely the roots of the nth degree polynomial p (λ) = |λI − A| , where || denotes determinant. (2) An n × n matrix A has exactly n eigenvalues, counting multiplicities and complex numbers. (3) The eigenvalues of αI + βA are just α + βλ, where λ runs over the eigenvalues of A. (4) More generally, if r(λ) = p(λ)/q(λ) is any rational function, i.e., p, q are polynomials, then the eigenvalues of r(A) are r(λ), where λ runs over the eigenvalues of A, provided both expressions make sense. 27 (5) The spectral radius of A is just ρ (A) = max{|λ| | λ is an eigenvalue of A} (6) Suppose that x0 is an arbitrary initial vector, {bk } is an arbitrary sequence of uniformly bounded vectors, and the sequence {xk }, k = 0, 1, 2, . . . , is given by the formula xk+1 = Axk + bk . If ρ(A) < 1, then {xk } is a uniformly bounded sequence of vectors. Such a matrix A is called a stable matrix. (7) If ρ(A) > 1, then {xk } will not be uniformly bounded for some choices of x0 . (8) The matrix A is diagonalizable, i.e., there exists an invertible matrix P and diagonal matrix D such that ⎡ ⎤ λ1 ⎢ ⎥ λ2 ⎢ ⎥ P −1 AP = D = ⎢ ⎥. . . ⎣ ⎦ . λn Here the diagonal entries of D are precisely the eigenvalues of A. 
Let’s do a few calculations with matrices to illustrate the idea of eigenvalues and eigenvectors in Matlab, which knows all about these quantities. > A = [3 1 0;-1 3 1;0 3 -2] > eig(A) > eig(A) > [V,D]=eig(A) > v = V(:,1),lam = D(1,1) > A*v,lam*v > x = [1;-2;3] > b = [2;1;0] > x = A*x+b % repeat this line Now let’s demonstrate boundedness of an arbitrary sequence as above. Try the same with a suitable A. > A = A/4 > max(abs(eig(A))) > x = [1;-2;3] > x = A*x+b % repeat this line Condition Number of a Matrix. Condition number is only defined for invertible square matrices. Definition. Let A be a square invertible matrix. The condition number of A with respect to a matrix norm ∥·∥is the number ! ! cond (A) = ∥A∥ !A−1 ! . Roughly speaking, the condition number measures the degree to which changes in A lead to changes in solutions of systems Ax = b. A large condition number means that small changes in A may lead to large changes in x. Of course this quantity is norm dependent. Definition. One of the more important applications of the idea of a matrix norm is the famous Banach Lemma. Essentially, it amounts to a matrix version of the familiar geometric series encountered in calculus. 28 Theorem 5. Let M be a square matrix such that ∥M ∥ < 1 for some operator norm ∥·∥. Then the matrix I − M is invertible. Moreover, (I − M )−1 = I + M + M 2 + · · · + M k + · · · ! ! ! ! and !(I − M )−1 ! ≤ 1/(1 − ∥M ∥). Proof. Form the familiar telescoping series + , (I − M ) I + M + M 2 + · · · + M k = I − M k+1 so that + , I − (I − M ) I + M + M 2 + · · · + M k = M k+1 Now by the multiplicative property of matrix norms and fact that ∥M ∥ < 1 ! ! ! k+1 ! k+1 → 0. !M ! ≤ ∥M ∥ k→∞ % & It follows that the matrix limk→∞ I + M + M 2 + · · · + M k = B exists and that I−(I − M ) B = 0, from which it follows that B = (I − M )−1 . Finally, note that ! ! ! ! 2 k !I + M + M 2 + · · · + M k ! ≤ ∥I∥ + ∥M ∥ + ∥M ∥ + · · · + ∥M ∥ ≤ 1 + ∥M ∥ + ∥M ∥2 + · · · + ∥M ∥k 1 ≤ . 1 − ∥M ∥ Now take the limit as k → ∞ to obtain the desired result. In the case of an operator norm, the Banach lemma has a nice application. ! Corollary 6. If A = I + N, where ∥N ∥ < 1, then cond(A) ≤ 1 + ∥N ∥ 1 − ∥N ∥ We leave the proof as an exercise. Here is a very fundamental result for numerical linear algebra. The scenario: suppose that we desire to solve the linear system Ax = b, where A is invertible. Due to arithmetic error or possibly input data error, we end up with a value x + δx which solves exactly a “nearby” system (A + δA)(x + δx) = b + δb. (It can be shown by using an idea called “backward error analysis” that this is really what happens when many algorithms are used to solve a linear system.) The question is, what is the size of the relative error ∥δx∥/∥x∥? As long as the perturbation matrix ∥δA∥ is reasonably small, there is a very elegant answer. Theorem 7. Suppose that A is invertible, Ax = b, (A+δA)(x+δx) = b+δb and ∥A−1 δA∥ = c < 1 with respect to some operator norm. Then A + δA is invertible and D E cond(A) ∥δA∥ ∥δb∥ | δx∥ ≤ + ∥x∥ 1−c ∥A∥ ∥b∥ Proof. That the matrix I + A−1 δA follows from hypothesis and the Banach lemma. Expand the perturbed equation to obtain (A + δA)(x + δx) = Ax + δAx + Aδx + δA δx = b + δb Now subtract the terms Ax = b from each side and solve for δx to obtain (A + δA)δx = A−1 (I + A−1 δA)δx = −δA · x + δb 29 so that δx = (I + A−1 δA)−1 A−1 [−δA · x + δb] Now take norms and use the additive and multiplicative properties and the Banach lemma to obtain ∥A−1 ∥ ∥δx∥ ≤ [∥δA∥∥x∥ + ∥δb∥] . 
1−c Next divide both sides by ∥x∥ to obtain D E ∥A−1 ∥ ∥δb∥ ∥δx∥ ≤ ∥δA∥ + ∥x∥ 1−c ∥x∥ Finally, notice that ∥b∥ ≤ ∥A∥∥x∥. Therefore, 1/∥x∥ ≤ ∥A∥/∥b∥. Replace 1/∥x∥ in the right hand side by ∥A∥/∥b∥ and factor out ∥A∥ to obtain D E ∥δx∥ ∥A−1 ∥∥A∥ ∥δA∥ ∥δb∥ ≤ + ∥x∥ 1−c ∥A∥ ∥b∥ which completes the proof, since by definition, cond A = ∥A−1 ∥∥A∥. ! If we believe that the inequality in the perturbation theorem can be sharp (it can!), then it becomes clear how the condition number of the matrix A is a direct factor in how relative error in the solution vector is amplified by perturbations in the coefficient matrix. Singular Value Decomposition. Here is the fundamental result: Theorem 8. Let A be an m×n real matrix. Then there exist m×m orthogonal matrix U , n×n orthogonal matrix V and m × n diagonal matrix Σ with diagonal entries σ1 ≥ σ2 ≥ . . . ≥ σp , with p = min{m, n}, such that U T AV = Σ. Moreover, the numbers σ1 , σ2 , . . . , σp are uniquely determined by A. Proof. There is no loss of generality in assuming that n = min{m, n}. For if this is not the case, we can prove the theorem for AT and by transposing the resulting SVD for AT , obtain a factorization for A. Form the n × n matrix B = AT A. This matrix is symmetric and its eigenvalues are nonnegative (we leave these facts as exercises). Because they are nonnegative, we can write the eigenvalues of B in decreasing order of magnitude as the squares of positive real numbers, say as σ12 ≥ σ22 ≥ . . . ≥ σn2 . Now we know from the Principal Axes Theorem that we can find an orthonormal set of eigenvectors corresponding to these eigenvalues, say Bvk = σk2 vk , k = 1, 2, . . . , n. Let V = [v1 , v2 , . . . , vn ]. Then V is an orthogonal n×n matrix. Next, suppose that σr+1 , σr+2 , . . . , σn are zero, while σr ̸= 0. Next set uj = σ1j Avj , j = 1, 2, . . . , r. These are orthonormal vectors in Rm since σk2 T 1 T T 1 T 0, if j ̸= k T uj u k = vj A Avk = vj Bvk = vj vk = 1, if j = k σj σk σj σk σj σk Now expand this set to an orthonormal basis u1 , u2 , . . . , um of Rm . This is possible by standard theorems in linear algebra. Now set U = [u1 , u2 , . . . , um ]. This matrix is orthogonal and we calculate that if k > r, then uTj Avk = 0 since Avk = 0, and if k < r, then 0, if j ̸= k T T T uj Avk = uj Avk = σk uj uk = σk , if j = k It follows that U T AV = [uTj Avk ] = Σ. Finally, if U, V are orthogonal matrices such that U T AV = Σ, then A = U ΣV T and therefore B = AT A = V ΣU T U ΣV T = V Σ2 V T 30 so that the squares of the diagonal entries of Σ are the eigenvalues of B. It follows that the numbers σ1 , σ2 , . . . , σn are uniquely determined by A. ! The numbers σ1 , σ2 , . . . , σp are called the singular values of the matrix A, the columns of U are the left singular vectors of A, and the columns of V are the right singular values of A. There is an interesting geometrical interpretation of this theorem from the perspective of linear transformations and change of basis as developed in Section 3.7 of [1]. It can can be stated as follows. Corollary 9. Let T : Rn → Rm be a linear transformation with matrix A with respect to the standard bases. Then there exist orthonormal bases u1 , u2 , . . . , um and v1 , v2 , . . . , vn of Rm and Rn , respectively, such that the matrix of T with these bases is diagonal with nonnegative entries down the diagonal. Proof. First observe that if U = [u1 , u2 , . . . , um ] and V = [v1 , v2 , . . . , vn ], then U and V are the change of basis matrices from the standard bases to the bases u1 , u2 , . . . 
There is an interesting geometrical interpretation of this theorem from the perspective of linear transformations and change of basis as developed in Section 3.7 of [1]. It can be stated as follows.
Corollary 9. Let T : R^n → R^m be a linear transformation with matrix A with respect to the standard bases. Then there exist orthonormal bases u1, u2, ..., um and v1, v2, ..., vn of R^m and R^n, respectively, such that the matrix of T with respect to these bases is diagonal with nonnegative entries down the diagonal.
Proof. First observe that if U = [u1, u2, ..., um] and V = [v1, v2, ..., vn], then U and V are the change of basis matrices from the standard bases to the bases u1, u2, ..., um and v1, v2, ..., vn of R^m and R^n, respectively. Also, U^{-1} = U^T. Now apply a change of basis theorem and the result follows. □
We leave the following fact as an exercise.
Corollary 10. Let U^T A V = Σ be the SVD of A and suppose that σr ≠ 0 and σr+1 = 0. Then
(1) rank A = r
(2) N(A) = span{vr+1, vr+2, ..., vn}
(3) range A = span{u1, u2, ..., ur}
We have only scratched the surface of the many facets of the SVD. Like most good ideas, it is rich in applications. We mention one more. It is based on the following fact, which can be proved by examining the entries of A = U Σ V^T: the matrix A of rank r can be written as a sum of r rank-one matrices, namely
A = σ1 u1 v1^T + σ2 u2 v2^T + ··· + σr ur vr^T,
where the σk, uk, vk are the singular values and the left and right singular vectors, respectively. In fact, it can be shown that this representation is the most economical in the sense that the partial sums
Ak = σ1 u1 v1^T + σ2 u2 v2^T + ··· + σk uk vk^T,  k = 1, 2, ..., r,
give the rank-k approximation to A that is closest among all rank-k approximations to A. Thus, if A has a small number of relatively large singular values, say k of them, then we can approximate A pretty well by Ak.
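Here is a small sketch of that idea (added for illustration; it is not part of the original notes). We manufacture a test matrix with rapidly decaying singular values and compare it with its rank-k partial sums; in the 2-norm the error ∥A − Ak∥ works out to be the (k+1)-st singular value.
% Sketch: best rank-k approximations from the SVD (made-up test matrix).
m = 50; n = 40;
[X,R1] = qr(rand(m));             % random orthogonal factors
[Y,R2] = qr(rand(n));
s = 2.^(-(1:n));                  % rapidly decaying singular values 1/2, 1/4, ...
A = X(:,1:n)*diag(s)*Y';          % a matrix with exactly these singular values
[U,S,V] = svd(A);
for k = [1 3 5 10]
    Ak = U(:,1:k)*S(1:k,1:k)*V(:,1:k)';   % the rank-k partial sum
    fprintf('k = %2d   ||A - Ak|| = %.2e   sigma(k+1) = %.2e\n', ...
            k, norm(A - Ak), S(k+1,k+1));
end
Even k = 5 or so already reproduces this particular A to several digits, which is the point: a few large singular values mean a good low-rank approximation.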
9. A Tour of Matlab
What is Matlab? Matlab is
(1) An interactive system for numerical computation.
(2) A programmable system for numerical computation.
(3) A front-end for state-of-the-art libraries, currently LAPACK and optimized BLAS.
History:
• Late 70s: Cleve Moler writes Matlab under an NSF grant as a front-end for the then state-of-the-art numerical linear algebra packages LINPACK and EISPACK.
• Early 80s: Moler founds the MathWorks and introduces a commercial version of Matlab running on a PC and written in Fortran.
• Early 90s: Matlab is rewritten in C and "handle graphics" is added to Matlab.
• Late 90s: Matlab evolves into an OOP language supporting LAPACK and optimized BLAS.
An excellent reference for this material is the superb text (recommended but not required) by Desmond Higham and Nicholas Higham, Matlab Guide, SIAM, Philadelphia, 2000.
Note: Except for graphics, you can do most of the Matlab work for this course with Octave, an open-source program available under Linux. There is a Windows version, too, but it may require some work to set it up.
A Tutorial Introduction to Matlab. Here we go.... Our procedure in all the lectures will be to work our way through the commands that are listed in the narrative that follows. Separately, one should open a Matlab window for command execution. The prompt sign "> " and typewriter text indicate a command to be typed in by the instructor or reader at the Matlab command prompt. Comments will not be preceded by the prompt and will appear in normal text. I will make comments on the commands that are executed as we read through the lecture, and students are invited to ask questions as we proceed. In the body of the Matlab command listing there will occasionally be a "%" sign. This is Matlab's way of starting a comment, and all text until the next end of line is ignored by the Matlab interpreter. It isn't necessary to type in these comments in order to work through this lesson in a Matlab session. They are inserted to give a little more explanation of the meaning of the commands for those who are reading this file and executing the commands themselves.
How to handle a Matlab assignment. Well, do it, of course. The real question is: how do I turn it in? I suggest that you start up Matlab and do the work you need to do. Once you actually know what you're doing, redo it with an eye to recording your work and turning it in to me. To keep a record of your work, issue the following command to Matlab
> diary 'myfile'
(Caution: make sure your current working directory is a place where you can write files; use the icons above the workspace area to change working directories.) Matlab will then send a copy of all your typed input and the resulting output to a file called 'myfile'. For clarity, use descriptive names for your files, such as 'jsmithasgn1', so that when I save the files that you email me, I can tell what each one is about from its title. If at any point you want to stop the diary feature, issue the command
> diary off
To resume the diary feature, simply type
> diary on
This will cause input to be appended to myfile. You can make comments in your homework file by typing % at the command line, and this too will be recorded. For example
> % This is a comment.
Be sure to start your file with the comments
> % Name: yourname
> % Email: your email address
When you end your session, the file will be closed and you can view it and even edit it with a text editor. As a matter of fact, you can even edit and view it with the Matlab Editor. Just type
> edit myfile
When you have finished the assignment, you can paste this text into another document.
Starting and stopping. Start up Matlab and quit after a few calculations and getting some help.
> 1+1
> 1+1;
> pi
> % to get help on a command or built-in variable just type:
> help pi % get help on built-in variable pi
> help quit % get help on a built-in command
> helpwin % or get help browser style
> quit
A calculator. Restart and do the following:
> x=0.3
> sin(x)
> exp(x)*sin(x)/log(x)
> 2^24
> (1+sqrt(5))/2
> x
> format long
> x
> format
> z = 2 - 3*i % complex numbers are no problem either
> z^2
> exp(z)
Simple plotting. Matlab's plotting, unlike a CAS like Maple, does not plot a function. Rather, it does a dot-to-dot on vectors of x- and y-coordinates. Most Matlab built-in scalar functions know how to handle a vector argument by applying the function to each coordinate. (Correction: Matlab used not to plot a function; there is now a function called 'ezplot' whose expanded capabilities do just that.)
> x = 0: 0.1: 3;
> y = sin(x);
> plot(x,y);
> z = x - x.^2/2 + x.^3/6;
> plot(x,z)
> % close the plot window, then
> plot(x,y)
> hold on
> plot(x,z)
Functions and scripts. There are two kinds of text files that can be "run" under Matlab, both of which have the suffix ".m". The first kind is a script file. This file contains a list of commands that one could execute at the command line. It is rather like a program in C or Fortran, but not as capable. For one thing, there is no division of the program into a "main" section and various subroutines. You cannot include function definitions in a script file. Functions, that is, value-returning subroutines, are what function files, the second kind of executable text file, are good for. Make sure a copy of chebyfcns.m is in your working directory and type
> edit chebyfcns
This file shows some of the things you can (and can't) do in Matlab. Multiple functions in one file are a no-no, unless the extra ones are private subfunctions, but this file would actually run just fine under Octave. A minimal example of a function file is sketched below.
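For concreteness, here is a small function file with a private subfunction. The file name sqbasis.m and its contents are made up for illustration; it is not one of the course files. Save it as sqbasis.m in the working directory (the first function must have the same name as the file).
function p = sqbasis(n)
% SQBASIS  Coefficient vector (descending powers) of the polynomial (x - 1)^n.
% Illustrates a function file containing a private subfunction.
p = 1;
for k = 1:n
    p = conv(p, [1 -1]);     % multiply the current polynomial by (x - 1)
end
fprintf('leading coefficient: %g\n', leadcoef(p))
end

function c = leadcoef(p)
% A private subfunction: visible only to the other functions in this file.
c = p(1);
end
Calling it from the command line with
> p = sqbasis(3)
should return [1 -3 3 -1], the coefficients of (x - 1)^3 in descending order.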
Numbers. Start Matlab. Matlab is a fine calculator. But as with any calculator, one has to be careful:
> ((2/7+1000)-1000)-2/7
Multiple commands are possible
> x=sin(.3),y=cos(.3);x^2+y^2
And of course help is always available, specific or general:
> help sqrt
> helpwin
There is another handy facility that is a word-search function:
> lookfor elliptic
One always has access to the last output:
> exp(-3)
> x=ans;
> disp(x)
And one can always load and save variables:
> save 'junkfile' x y
> % Now clear all variables:
> clear
> who
> load 'junkfile' x
> who
Arithmetic. Now a few words about arithmetic, which conforms to the IEEE double precision standard. Here eps is the distance from 1 to the next floating point number:
> eps
> 1+eps
> format long
> 1+eps
> format hex
> 1
> 1+eps
> 1+2*eps
> format
There are other kinds of numbers. Just for the record:
> realmax
> realmin
> realmin/2
> realmax/realmin
> 0/0 % indeterminate...see what Matlab reports
> cos(0)/sin(0)
Matrices. This is where Matlab really excels in a way that other computer packages don't. Building and manipulating matrices is quite easy with Matlab:
> A = [1 3 2; 2 1 1; 4 0 2] % create a matrix
> size(A)
> det(A)
> eye(3)
> inv(A)
> inv(A)*A, A*inv(A)
> A^(-1)
> A^2
> A + eye(3)
> A - 2*eye(3)
> (1:3) % create a vector using the colon operator
> b = (1:3)'
> x = A\b % solve system A*x = b
> x = A^(-1)*b % solve it using inverse of A
> (1:3) % create a vector using the colon operator
> v = (1:3)'
> eig(A) % calculate eigenvalues of A
> [V,lam]=eig(A) % eigenvectors and eigenvalues
> A*V(:,2)-lam(2,2)*V(:,2) % check eigen relation
> V^(-1)*A*V-lam % better check of eigen relation
> zeros(2) % create a 2x2 matrix of zeros
> zeros(2,1) % create a 2x1 matrix of zeros
> ones(3)
> ones(3,2)
> eye(3) % create a 3x3 identity matrix
> eye(2,3)
> rand(3)
> A=rand(3)-0.5 % create a 3x3 matrix of random numbers in [-0.5,0.5]
> size(A)
> x=ones(1,5)
> size(x)
> length(x)
Of course you can build and reshape matrices from the command line in many ways:
> a = [1 2 3
> 4, 5, 7
> 2,4, 6]
> a = [1 2 3; 4, 5, 7; 2,4, 6]
One accesses entries or changes them with a standard mathematical notation:
> a(1,3)
> a(1,3) = 10
> a(1,3)
One can access a vector (row or column) by a single coordinate:
> y = x' % the prime sign ' performs the (Hermitian) transpose operation
> x(3)
> y(4) = 7
One can build matrices in blocks:
> b = [eye(3) a; a eye(3)]
> c = repmat(b,2,3)
The all-important colon notation gets used in two different ways. First, as a separator in a type of vector constructor:
> x = 1:5
> y = 1:2:5
> z = (1:0.5:5)'
The other principal use of the colon notation is as a wild card of sorts: a colon in the row (or column) position selects all rows (or columns). For example, this picks out all rows and columns 3 through 5 of b:
> c = b(:, 3:5)
Matrix manipulations:
> triu(a)
> tril(a)
> diag(a)
> diag(diag(a))
> reshape(a,1,9)
Matrix constructions (there is a huge number of special matrices that can be constructed by a special command). A sampling:
> toeplitz((1:5),(1:5).^2)
> hilb(6)
> vander((1:6))
Multidimensional arrays: Matlab can deal with arrays that are more than two-dimensional. Try the following
> a = [1 2; 3 4]
> a(:,:,2) = [1 1;2 2]
> a
A final note on matrices: one can even use a convenient Array Editor to modify matrices. Do this: if the Workspace window is not already visible, click on the View button, then check Workspace. Now double-click on the variable 'a', a matrix we created earlier. The Array Editor will open up. Edit a few entries and close the Editor. Confirm your changes by typing at the command line:
> a
Objects. Objects are instances of classes.
Matlab has some OOP capabilities, but for now we're going to confine our attention to some fairly simple types of objects. Matlab has five built-in classes of objects, one of which we've already seen:
• double: double-precision floating point numeric matrix or array
• sparse: two-dimensional real (or complex) sparse matrix
• char: character array
• struct: structure array
• cell: cell array
Various Matlab toolboxes provide additional class definitions. Of course, we've seen lots of doubles. Here's another very familiar sort of object, namely a string object:
s = 'Hello world'
size(s)
s
disp(s)
The struct object is what it sounds like: a way to create structures and access their fields. There are two ways to build a structure: command-line assignments or the struct command. What actually gets constructed is a structure array. Try the following
> record.name = 'John Doe'
> record.hwkscore = 372
> record.ssn = [111 22 3333]
> record
> record(2).name = 'Mary Doe';
> record(2).hwkscore = 40;
> record(2).ssn = [222 33 4444];
> record
> record(1).hwkscore
Finally, a cell object is an array whose elements can be any other object. For example:
> A(1,1) = {[1 2; 3 4]};
> A(1,2) = {'John Smith'}
> A(2,1) = {249}
> A(2,2) = {[1;2;3;4]}
> A{1,2}
> A(1,2)
Basic Graphics. Start with some simple plotting. Matlab's plotting, unlike a CAS like Maple, does not plot a function. Rather, it does a dot-to-dot on vectors of x- and y-coordinates. Most Matlab built-in scalar functions know how to handle a vector argument by applying the function to each coordinate.
> x = 0: 0.1: 3;
> y = sin(x);
> plot(x,y);
> z = x - x.^2/2 + x.^3/6;
> plot(x,z)
> % close the plot window, then
> plot(x,y)
> hold on
> plot(x,z)
Next, we'll get help to guide us along. The plot command has many options. We'll hit the highlights. We'll also keep a help window open so we can peruse the help files as we go. After we open it, we'll move it to a corner.
helpwin
Now click on plot in the Help Window and examine the possibilities for the 1-3 character string S.
x = 0:0.01:1; % create array of abscissas and suppress output
y1 = 4*x.*(1-x);
plot(x, y1, 'r+:')
A plot with multiple curves is possible without doing a "hold on":
y2 = sin(pi*x);
plot(x, y1, 'r+:', x, y2, 'g')
We can make it fancier. For example
xlabel('x')
ylabel('y')
title('Comparison of 4x(1-x) and sin(\pi x).')
As a matter of fact, you can massage your plot window quite a bit with the tools available in the tool bar. For example, click on the arrow icon and then draw an arrow on the inside of the curves pointing to the inner curve. Then click on the A icon (A for ascii) and click at the base of the arrow. Then type in "4*x*(1-x)". There are lots of other things one can do to a graph from the graph window itself. You can even save your figures to many different formats. Click on the File button, then the menu choice Export. Select under "Save as type:" the choice Portable Document Format. Then browse to the directory you want, edit the name of the file, say to "junk.pdf". Then click on Save. Now use the Windows Explorer to find the file you've created and double-click on it. Acrobat will now show you your picture.
There are a number of alternatives to the plot command as well. Look in the Help Window. There's even an ezplot, which is a Maple rip-off (or is it the other way around?). Try these:
ezplot('4*x*(1-x)')
polar(x,y2)
Interesting. Let's stretch it out.
x = 0:0.01:60;
y2 = sin(pi*x);
polar(x,y2)
Axes and other controls. There are other ways to massage your graph.
For example, start with
ezplot('4*x*(1-x)')
hold on
ezplot('sin(pi*x)')
Not very good. Let's fiddle with it a bit:
axis equal
axis off
axis([0 1 0 1.2])
axis on
Well, let's just do it again:
x = 0:.01:1;
y1 = 4*x.*(1-x);
y2 = sin(pi*x);
plot(x,y1,'--', x, y2, 'r:')
legend(' 4x(1-x)', 'sin(\pi x)')
title('\it Comparison of 4x(1-x) and sin(\pi x)')
What do you notice about the graph?
Multiple plots. Here is the way to construct multiple plots, along with another variation on plot:
subplot(2,2,1)
plot(x,y1)
subplot(222)
fplot('sin(x)', [0 1])
subplot(223)
fplot('sin(round(2*pi*x))', [0 1], 'r--')
subplot(224), polar(x,y2)
A Bit of Approximation. OK, we're ready to do something serious. First recall the Weierstrass theorem (or open up the file VectorSpaces.pdf and look at the last section). Let's start by viewing polynomial approximations to the function f(x) = sin x, −π ≤ x ≤ π. In Matlab, we represent a polynomial by a vector of coefficients in descending order. Thus x^3 − 2x + 3 would be represented as [1, 0, −2, 3]. Here we go:
x = (-pi:0.1:pi)';
y = sin(x);
plot(x,y)
grid, hold on
tpoly = [1/120,0,-1/6,0,1,0] % start with a 5th degree Taylor about 0
ty = polyval(tpoly,x);
plot(x,ty); % not bad, sort of
n = 3;
nodes = linspace(-pi,pi,n+1)';
fnodes = sin(nodes);
ipoly = vander(nodes)\fnodes;
iy = polyval(ipoly,x);
plot(x,iy); % what do you think?
% play fair and do this again with n = 5
% now start over with the function f(x) = 1/(1+x^2) on the interval [-5,5]
% use y = 1./(1+x.^2); in place of y = sin(x);
References
[1] M. J. D. Powell, Approximation Theory and Methods, Cambridge University Press, Cambridge, 1981.
[2] T. Shores, Applied Linear Algebra and Matrix Analysis, Springer, New York, 2007.
[3] L. Trefethen and J. Weideman, The exponentially convergent trapezoidal rule, SIAM Rev., 56 (2014), pp. 385–458.