EE448/528 Version 1.0 John Stensby

Chapter 6
$A\vec{X} = \vec{b}$: The Minimum Norm Solution and the Least-Square-Error Problem

Like the previous chapter, this chapter deals with the linear algebraic equation problem $A\vec{X} = \vec{b}$. However, in this chapter, we impose additional structure in order to obtain specialized, but important, results.

First, we consider Problem #1: $\vec{b}$ is in the range of A. For $A\vec{X} = \vec{b}$, either a unique solution exists or an infinite number of solutions exist. We want to find the solution with minimum norm. The minimum norm solution always exists, and it is unique. Problem #1 is called the minimum norm problem.

Next, we consider Problem #2: $\vec{b}$ is not in the range of A so that $A\vec{X} = \vec{b}$ has no solution. Of all the vectors $\vec{X}$ that minimize $\|A\vec{X} - \vec{b}\|_2$, we want to find the one with minimum norm. Problem #2 is called the minimum norm, least-square-error problem. Its solution always exists and it is unique.

Problems #1 and #2 are special cases of a more general problem. That is, given $A\vec{X} = \vec{b}$ (with no conditions placed on $\vec{b}$), we want to find the $\vec{X}$ which simultaneously minimizes both $\|A\vec{X} - \vec{b}\|_2$ and $\|\vec{X}\|_2$. Such an $\vec{X}$ always exists, it is always unique, and it is linearly related to $\vec{b}$. Symbolically, we write

\[ \vec{X} = A^+ \vec{b}, \tag{6-1} \]

where $A^+$ denotes a linear operator called the pseudo inverse of A (yes, if $A^{-1}$ exists, then $A^+ = A^{-1}$ and $\vec{X} = A^{-1}\vec{b}$). For Problem #1, $\vec{X} = A^+\vec{b}$ will be a solution of $A\vec{X} = \vec{b}$, and it will be the "shortest length" (minimum norm) solution. For Problem #2, $\vec{X} = A^+\vec{b}$ simultaneously minimizes both $\|A\vec{X} - \vec{b}\|_2$ and $\|\vec{X}\|_2$ even though it does not satisfy $A\vec{X} = \vec{b}$ (which has no solution for Problem #2, where $\vec{b} \notin R(A)$).

Problem #1: The Minimum Norm Problem

In this section we consider the case where $\vec{b} \in R(A)$. The system

\[ A\vec{X} = \vec{b} \tag{6-2} \]

has a unique solution or it has an infinite number of solutions, as described by (5-7).
If (6-2) has a unique solution, then this solution is the minimum norm solution, by default. If (6-2) has an infinite number of solutions, then we must find the solution with the smallest norm. In either case, the minimum norm solution is unique, and it is characterized as being orthogonal to K(A), as shown in what follows.

Each solution $\vec{X}$ of (6-2) can be uniquely decomposed into

\[ \vec{X} = \vec{X}_{K^\perp} \oplus \vec{X}_K, \tag{6-3} \]

where

\[ \vec{X}_{K^\perp} \in K(A)^\perp = R(A^*), \qquad \vec{X}_K \in K(A). \tag{6-4} \]

However

\[ A\vec{X} = A(\vec{X}_{K^\perp} \oplus \vec{X}_K) = A\vec{X}_{K^\perp} = \vec{b}, \tag{6-5} \]

so $\vec{X}_{K^\perp} \in K(A)^\perp$ is the only part of $\vec{X}$ that is significant in generating $\vec{b}$.

Theorem 6.1
In the decomposition (6-3), there is only one vector $\vec{X}_{K^\perp} \in K(A)^\perp$ that is common to all solutions of $A\vec{X} = \vec{b}$. Equivalently, there is a unique vector $\vec{X}_{K^\perp} \in K(A)^\perp$ for which $A\vec{X}_{K^\perp} = \vec{b}$.

Proof: To show this, assume there are two, and arrive at a contradiction. We assume the existence of $\vec{X}_{K^\perp} \in K(A)^\perp$ and $\vec{Y}_{K^\perp} \in K(A)^\perp$ such that $A\vec{X}_{K^\perp} = \vec{b}$ and $A\vec{Y}_{K^\perp} = \vec{b}$. Then, simple subtraction leads to the conclusion that $A(\vec{X}_{K^\perp} - \vec{Y}_{K^\perp}) = \vec{0}$, or the difference $\vec{X}_{K^\perp} - \vec{Y}_{K^\perp} \in K(A)$. This contradiction (as we know, $\vec{X}_{K^\perp} - \vec{Y}_{K^\perp} \in K(A)^\perp$) leads to the conclusion that there is only one $\vec{X}_{K^\perp} \in K(A)^\perp$ that is common to all solutions of $A\vec{X} = \vec{b}$ (each solution has a decomposition of the form (6-3) where a common $\vec{X}_{K^\perp}$ is used).♥

Theorem 6.2
The unique solution $\vec{X}_{K^\perp} \in K(A)^\perp$ is the minimum norm solution of $A\vec{X} = \vec{b}$. That is,

\[ \|\vec{X}_{K^\perp}\|_2 < \|\vec{X}\|_2, \tag{6-6} \]

where $\vec{X}$ is any other (i.e., $\vec{X} \ne \vec{X}_{K^\perp}$) solution of $A\vec{X} = \vec{b}$.

Proof: Let $\vec{X}$, $\vec{X} \ne \vec{X}_{K^\perp}$, be any solution of $A\vec{X} = \vec{b}$. As shown by (6-3), we can write the decomposition

\[ \vec{X} = \vec{X}_{K^\perp} \oplus \vec{X}_K, \tag{6-7} \]

where $\vec{X}_K \in K(A)$.
Clearly, we have

\[
\begin{aligned}
\|\vec{X}\|_2^2 &= \|\vec{X}_{K^\perp} \oplus \vec{X}_K\|_2^2 = \langle \vec{X}_{K^\perp} \oplus \vec{X}_K,\ \vec{X}_{K^\perp} \oplus \vec{X}_K \rangle \\
&= \langle \vec{X}_{K^\perp}, \vec{X}_{K^\perp} \rangle + \langle \vec{X}_K, \vec{X}_K \rangle \quad \text{(due to orthogonality of vectors)} \\
&= \|\vec{X}_{K^\perp}\|_2^2 + \|\vec{X}_K\|_2^2,
\end{aligned}
\tag{6-8}
\]

so that $\|\vec{X}_{K^\perp}\|_2 < \|\vec{X}\|_2$ as claimed (note that $\vec{X}_K \ne \vec{0}$ because $\vec{X} \ne \vec{X}_{K^\perp}$).♥

When viewed geometrically, Theorem 6.2 is obvious. As shown by Figure 6-1, the solution set for $A\vec{X} = \vec{b}$ is a linear variety, a simple translation of K(A). Figure 6-1 shows an arbitrary solution $\vec{X}$, and it shows the "optimum", minimum norm, solution $\vec{X}_{K^\perp}$. Clearly, the solution becomes "optimum" when it is exactly orthogonal to K(A) (i.e., it is in $K(A)^\perp$). We have solved Problem #1.

[Figure 6-1: The solution set is a linear variety containing the solutions of $A\vec{X} = \vec{b}$. An arbitrary solution $\vec{X} = \vec{X}_{K^\perp} + \vec{X}_K$, with $\vec{X}_K \in K(A)$, is non-optimum; $\vec{X}_{K^\perp} \in K(A)^\perp$ is optimum (minimum norm). The minimum norm solution is orthogonal to K(A).]

The Pseudo Inverse A+

As shown by Theorems 6.1 and 6.2, when $\vec{b} \in R(A)$, a unique minimum norm solution to $A\vec{X} = \vec{b}$ always exists. And, this solution is in $R(A^*) = K(A)^\perp$. Hence we have a mapping from $\vec{b} \in R(A)$ back to $\vec{X} \in R(A^*)$ (at this point, it might be a good idea to study Fig. 5.1 once again). It is not difficult to show that this mapping is linear, one-to-one and onto. We denote this mapping by

\[ A^+ : R(A) \to R(A^*), \tag{6-9} \]

and we write $\vec{X} = A^+\vec{b}$, for $\vec{b} \in R(A)$ and $\vec{X} \in R(A^*)$. Finally, our unique minimum norm solution to the $A\vec{X} = \vec{b}$, $\vec{b} \in R(A)$, problem (i.e., "Problem #1") is denoted symbolically by $\vec{X}_{K^\perp} = A^+\vec{b}$.

As a subspace of U, $R(A^*)$ is a vector space in its own right. Every $\vec{X}$ in vector space $R(A^*) \subset U$ gets mapped by matrix A to something in vector space $R(A) \subset V$. By restricting the domain of A to be just $R(A^*)$ (instead of the whole U), we have a mapping with domain $R(A^*)$ and co-domain R(A).
Symbolically, we denote this mapping by

\[ A\big|_{R[A^*]} : R(A^*) \to R(A), \tag{6-10} \]

and we call it the restriction of A to $R(A^*)$. The mapping (6-10) is linear, one-to-one, and onto (even though A : U → V may not be one-to-one or onto). More importantly, the inverse of mapping (6-10) is the mapping (6-9). As is characteristic of an operator and its inverse, we have

\[
\begin{aligned}
\big(A\big|_{R[A^*]}\big)\, A^+ \vec{b} &= \vec{b} \quad \text{for all } \vec{b} \in R(A), \\
A^+ \big(A\big|_{R[A^*]}\big)\, \vec{X} &= \vec{X} \quad \text{for all } \vec{X} \in R(A^*),
\end{aligned}
\tag{6-11}
\]

as expected. Finally, the relationship between (6-9) and (6-10) is illustrated by Fig. 6-2.

[Figure 6-2: Restricted to $R(A^*) = K(A)^\perp$, A is one-to-one and onto $R(A) = K(A^*)^\perp$, and it has the inverse $A^+$.]

As defined so far, the domain of $A^+$ is only $R(A) \subset V$. This is adequate for our discussion of "Problem #1"-type problems where $\vec{b} \in R(A)$. However, for our discussion of "Problem #2" (and the more general optimization problem, discussed in the paragraph containing (6-1), where $\vec{b} \in V$), we must extend the domain of $A^+$ to be all of $V = R(A) \oplus K(A^*)$. $A^+$ is already defined on R(A); we need to define the operator on $K(A^*)$. We do this in a very simple manner. We define

\[ A^+\vec{b} = \vec{0}, \quad \vec{b} \in K(A^*). \tag{6-12} \]

With the extension offered by (6-12), we have the entire vector space V as the domain of $A^+$. The operations performed by A and $A^+$ are illustrated by Fig. 6-3. When defined in this manner, $A^+$ is called the Moore-Penrose pseudo inverse of A (or simply the pseudo inverse).

A number of important observations can be made from inspection of Fig. 6-3. Note that

(1) $AA^+\vec{X} = \vec{0}$ for $\vec{X} \in K(A^*) = R(A)^\perp$,
(2) $AA^+\vec{X} = \vec{X}$ for $\vec{X} \in K(A^*)^\perp = R(A)$,
(3) $A^+A\vec{X} = \vec{0}$ for $\vec{X} \in K(A) = R(A^*)^\perp$,          (6-13)
(4) $A^+A\vec{X} = \vec{X}$ for $\vec{X} \in R(A^*) = K(A)^\perp$, and
(5) $R(A^*) = R(A^+)$ and $K(A^*) = K(A^+)$ (however, $A^* \ne A^+$).
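The five observations in (6-13) can be spot-checked numerically. The following is a minimal sketch, not part of the original notes: the test matrix A, the tolerance tol, and the flag names c1 through c5 are illustrative assumptions, and MatLab's pinv, orth, null and rank functions are used to build bases for the four subspaces.

```matlab
% Sketch: numerically spot-check observations (1)-(5) of (6-13).
% The matrix A below is an arbitrary rank-deficient choice (assumption).
A   = [1 1 2; 0 2 2; 1 0 1; 1 0 1];   % 4x3 matrix of rank 2
Ap  = pinv(A);                        % Moore-Penrose pseudo inverse (3x4)
tol = 1e-10;                          % illustrative numerical tolerance

RA  = orth(A);     % orthonormal basis of R(A)
NAs = null(A');    % orthonormal basis of K(A*) = R(A)-perp
RAs = orth(A');    % orthonormal basis of R(A*)
NA  = null(A);     % orthonormal basis of K(A) = R(A*)-perp

c1 = norm(A*Ap*NAs)       < tol;      % (1) A A+ annihilates K(A*)
c2 = norm(A*Ap*RA - RA)   < tol;      % (2) A A+ is the identity on R(A)
c3 = norm(Ap*A*NA)        < tol;      % (3) A+ A annihilates K(A)
c4 = norm(Ap*A*RAs - RAs) < tol;      % (4) A+ A is the identity on R(A*)

% (5) Two subspaces of equal dimension coincide when stacking their bases
%     side by side does not raise the rank.
r  = rank(A);
c5 = rank([orth(Ap) RAs], tol) == r && ...
     rank([null(Ap) NAs], tol) == size(A,1) - r;

disp([c1 c2 c3 c4 c5])                % each flag is expected to be true
```

Checking the identities on basis vectors is enough, since every operator involved is linear.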
[Figure 6-3: How A and $A^+$ map the various parts of U and V. In space U, $R(A^*) = K(A)^\perp$ and $K(A) = R(A^*)^\perp$ are orthogonal complements; in space V, $R(A) = K(A^*)^\perp$ and $K(A^*) = R(A)^\perp$ are orthogonal complements. When restricted to $R(A^*)$, A is one-to-one from $R(A^*)$ onto R(A). $A^+$ is one-to-one from R(A) onto $R(A^*)$. A maps K(A) to $\vec{0}$, and $A^+$ maps $K(A^*)$ to $\vec{0}$.]

Note that (1) and (2) of (6-13) imply that $AA^+$ is the orthogonal projection of V onto R(A). Also, (3) and (4) of (6-13) imply that $A^+A$ is the orthogonal projection of U onto $R(A^*) = R(A^+)$. Finally, when $A^{-1}$ exists we have $A^+ = A^{-1}$.

Original Moore and Penrose Definitions of the Pseudo Inverse

In 1935, Moore defined $A^+$ as the unique n×m matrix for which

1) $AA^+$ is the orthogonal projection of V onto $R(A) = K(A^*)^\perp$, and          (6-14)
2) $A^+A$ is the orthogonal projection of U onto $R(A^*) = K(A)^\perp$.

In 1955, Penrose gave a purely algebraic definition of the pseudo inverse. He said that $A^+$ is the unique n×m matrix satisfying

3) $AA^+A = A$,
4) $A^+AA^+ = A^+$,          (6-15)
5) $AA^+$ and $A^+A$ are Hermitian.

Of course Moore's and Penrose's definitions are equivalent, and they agree with the geometric description we gave in the previous subsection. While not very inspiring, Penrose's algebraic definition makes it simple to check if a given matrix is $A^+$. Once we compute a candidate for $A^+$, we can verify our results by checking if our candidate satisfies Penrose's conditions. In addition, Penrose's definition leads to some simple and useful results as shown by the next few theorems.

Theorem 6.3

\[ (A^+)^+ = A^{++} = A \tag{6-16} \]

Proof: The result follows by direct substitution into (6-15).

Theorem 6.4

\[ (A^*)^+ = (A^+)^* \quad \text{or} \quad A^{*+} = A^{+*}. \tag{6-17} \]

That is, the order of the + and * is not important!

Proof: We claim that the pseudo inverse of $A^*$ is $A^{+*}$. Let's use (6-15) to check this out!
3′) $A^*A^{+*}A^* = A^*(AA^+)^* = ((AA^+)A)^* = (AA^+A)^* = A^*$ so that 3) holds!
4′) $A^{+*}A^*A^{+*} = A^{+*}(A^+A)^* = ((A^+A)A^+)^* = (A^+AA^+)^* = A^{+*}$ so that 4) holds!
5′) $A^{+*}A^* = (AA^+)^* = ((AA^+)^*)^* = (A^{+*}A^*)^*$ so $A^{+*}A^*$ is Hermitian! So is $A^*A^{+*}$!!

Theorem 6.5
Suppose A is Hermitian. Then $A^+$ is Hermitian.

Proof: $(A^+)^* = (A^*)^+ = A^+$.♥

Given matrices A and B, one might ask if $(AB)^+ = B^+A^+$ in general. The answer is NO! It is not difficult to find some matrices that illustrate this. Hence, a well known and venerable property of inverses (i.e., that $(AB)^{-1} = B^{-1}A^{-1}$) does not carry over to pseudo inverses.

Computing the Pseudo Inverse A+

There are three cases that need to be discussed when one tries to find $A^+$. The first case is when rank(A) < min(m,n) so that A does not have full rank. The second case is when m×n matrix A has full row rank (rank(A) = m). The third case is when A has full column rank (rank(A) = n).

Case #1: r = Rank(A) < Min(m,n)

In this case, K(A) contains non-zero vectors, and the system $A\vec{X} = \vec{b}$ has an infinite number of solutions. For this case, a simple "plug-in-the-variables" formula for $A^+$ does not exist. However, the pseudo inverse can be calculated by using the basic ideas illustrated by Fig. 6-3.

Suppose m×n matrix A has rank r. Let $\vec{X}_1, \vec{X}_2, \dots, \vec{X}_r$ be a basis of $R(A^*)$ and $\vec{Y}_1, \vec{Y}_2, \dots, \vec{Y}_{m-r}$ be a basis of $K(A^*)$. From (6-11) and Fig. 6-3, we see

\[
\begin{aligned}
A^+ \big[\, A\vec{X}_1 \ \cdots \ A\vec{X}_r \ \ \vec{Y}_1 \ \cdots \ \vec{Y}_{m-r} \,\big]
&= \big[\, A^+A\vec{X}_1 \ \cdots \ A^+A\vec{X}_r \ \ A^+\vec{Y}_1 \ \cdots \ A^+\vec{Y}_{m-r} \,\big] \\
&= \big[\, \vec{X}_1 \ \cdots \ \vec{X}_r \ \ \vec{0} \ \cdots \ \vec{0} \,\big].
\end{aligned}
\tag{6-18}
\]

Now, note that $A\vec{X}_1, \dots, A\vec{X}_r$ is a basis for R(A). Since $\vec{Y}_1, \vec{Y}_2, \dots, \vec{Y}_{m-r}$ is a basis of $K(A^*) = R(A)^\perp$, we see that the m×m matrix $[\, A\vec{X}_1 \ \cdots \ A\vec{X}_r \ \ \vec{Y}_1 \ \cdots \ \vec{Y}_{m-r} \,]$ is nonsingular. From (6-18) we have

\[
A^+ = \big[\, \vec{X}_1 \ \cdots \ \vec{X}_r \ \ \underbrace{\vec{0} \ \cdots \ \vec{0}}_{m-r \text{ zero columns}} \,\big]\,
\big[\, A\vec{X}_1 \ \cdots \ A\vec{X}_r \ \ \vec{Y}_1 \ \cdots \ \vec{Y}_{m-r} \,\big]^{-1}.
\tag{6-19}
\]

While not simple, Equation (6-19) is "useable" in many cases, especially when m and n are small integers.
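A candidate produced by (6-19) can be tested against Penrose's conditions (6-15) without reference to any built-in pseudo inverse routine. The sketch below is illustrative only: the matrix A, the tolerance tol, and the names C, p3, p4, p5 are assumptions introduced here, and orth and null are used to supply the two bases that (6-19) requires.

```matlab
% Sketch: build a candidate pseudo inverse from (6-19), then test it
% against Penrose's conditions 3), 4) and 5) of (6-15).
A = [1 1 2; 0 2 2; 1 0 1; 1 0 1];   % illustrative 4x3 matrix of rank 2
[m, n] = size(A);

X = orth(A');                       % basis of R(A*), one vector per column
Y = null(A');                       % basis of K(A*), one vector per column
r = size(X, 2);                     % r = rank(A)

% Equation (6-19): [X1 ... Xr 0 ... 0] * inv([A*X1 ... A*Xr Y1 ... Ym-r]).
% The "/" operator applies the inverse without forming it explicitly.
C = [X zeros(n, m - r)] / [A*X Y];

tol = 1e-10;                                     % illustrative tolerance
p3 = norm(A*C*A - A) < tol;                      % 3) A A+ A = A
p4 = norm(C*A*C - C) < tol;                      % 4) A+ A A+ = A+
p5 = norm(A*C - (A*C)') < tol && ...
     norm(C*A - (C*A)') < tol;                   % 5) A A+ and A+ A Hermitian

disp([p3 p4 p5])                                 % all flags expected true
```

Because the matrix satisfying Penrose's conditions is unique, a candidate that passes all three checks (to within the tolerance) can be taken as the pseudo inverse.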
Example

    % Pseudo Inverse Example
    % Enter 4x3 matrix A.
    A = [1 1 2; 0 2 2; 1 0 1; 1 0 1]
    A =
         1     1     2
         0     2     2
         1     0     1
         1     0     1
    % Have MatLab Calculate an Orthonormal Basis of R(A*)
    X = orth(A')
    X =
        0.3052   -0.7573
        0.5033    0.6430
        0.8084   -0.1144
    % Have MatLab Calculate an Orthonormal Basis of K(A*)
    Y = null(A')
    Y =
        0.7559         0
       -0.3780    0.0000
       -0.3780   -0.7071
       -0.3780    0.7071
    % Use (6-19) of class notes to calculate the pseudo inverse
    pseudo = [ X [0 0 0]' [0 0 0]' ]*inv([A*X Y])
    pseudo =
        0.1429   -0.2381    0.2619    0.2619
        0.0000    0.3333   -0.1667   -0.1667
        0.1429    0.0952    0.0952    0.0952
    % Same as that produced by MatLab's pinv function?
    pinv(A)
    ans =
        0.1429   -0.2381    0.2619    0.2619
        0.0000    0.3333   -0.1667   -0.1667
        0.1429    0.0952    0.0952    0.0952
    % YES! YES!

Case #2: r = Rank(A) = m (A has full row rank)

For this case, we have m ≤ n, the columns of A may or may not be dependent, and $A\vec{X} = \vec{b}$ may or may not have an infinite number of solutions. For this case, n×m matrix $A^*$ has full column rank and (6-19) becomes

\[ A^+ = A^*\big(AA^*\big)^{-1}, \tag{6-20} \]

a simple formula.

Case #3: Rank(A) = n (A has full column rank)

For this case, we have n ≤ m, and the columns of A are independent. For this case, the n×n matrix $A^*A$ is nonsingular (why?). On the left, we multiply the equation $A\vec{X} = \vec{b}$ by $A^*$ to obtain $A^*A\vec{X} = A^*\vec{b}$, and this leads to

\[ \vec{X}_{K^\perp} = (A^*A)^{-1}A^*\vec{b}. \tag{6-21} \]

When A has full column rank, the pseudo inverse is

\[ A^+ = (A^*A)^{-1}A^*, \tag{6-22} \]

a simple formula.

Example 6-1

\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
\vec{b} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
\]

Note that rank(A) = 2 and nullity(A) = 1. By inspection, the general solution is

\[
\vec{X} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}.
\]

Now, find the minimum norm solution. We do this two ways. Since this problem is so simple, let's compute

\[ \|\vec{X}\|_2^2 = (2 + \alpha)^2 + 1^2 + \alpha^2. \]

The minimum value of this is found by computing

\[ \frac{d}{d\alpha}\|\vec{X}\|_2^2 = 2(2 + \alpha) + 2\alpha = 0, \]

which yields α = -1.
Hence, the minimum norm solution is

\[
\vec{X}_{K^\perp} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - 1 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
\tag{6-23}
\]

a vector that is orthogonal to K(A) as required (please check orthogonality).

This same result can be computed by using (6-20) since, as outlined above, special case #2 applies here. First, we compute

\[
AA^* = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
(AA^*)^{-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix}.
\]

Then, we use these results with (6-20) to obtain

\[
\vec{X}_{K^\perp} = A^+\vec{b} = A^*(AA^*)^{-1}\vec{b}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},
\]

the same result as (6-23)!

MatLab's Backslash ( \ ) and Pinv Functions

A unique solution of $A\vec{X} = \vec{b}$ exists if $\vec{b} \in R(A)$ and nullity(A) = 0. It can be found by using MatLab's backslash ( \ ) function. The syntax is X = A\b; MatLab calls the backslash operation matrix left division.

Let r = rank(A) and nullity(A) = n - r ≥ 1. If $\vec{b} \in R(A)$, the complete solution can be described in terms of n - r independent parameters, as shown by (5-7). These independent parameters can be chosen to yield a solution $\vec{X}$ that contains at least n - r zero components. Such a solution is generated by the MatLab syntax X = A\b. So, if a unique solution exists, MatLab's backslash operator finds it; otherwise, the backslash operator finds the solution that contains the maximum number of zeros (in general, this maximum-number-of-zeros solution will not be $\vec{X}_{K^\perp}$, the minimum norm solution).

MatLab can also find the optimum, minimum norm, solution $\vec{X}_{K^\perp}$. The MatLab syntax for this is X = pinv(A)*b. If a unique solution to $A\vec{X} = \vec{b}$ exists, then you can find it using this syntax (but, for this case, the backslash operator is a better method). If $A\vec{X} = \vec{b}$ has an infinite number of solutions, then X = pinv(A)*b finds the optimum, minimum norm solution.
MatLab calculates the pseudo inverse by using the singular value decomposition of A, a method that we will discuss in Chapter 8.

Example 6-2 (Continuation of Example 6-1)

We apply MatLab to the problem given by Example 6-1. The MatLab diary file follows.

    % use the same A and b as was used in Example 6-1
    A = [1 0 1;0 1 0];
    b = [2;1];
    % find the solution with the maximum number of zero components
    X = A\b
    X =
         2
         1
         0
    norm(X)
    ans =
        2.2361
    %
    % find the optimum, minimum norm, solution
    Y = pinv(A)*b
    Y =
        1.0000
        1.0000
        1.0000
    % solution is the same as that found in example 6-1!
    norm(Y)
    ans =
        1.7321
    % the "minimum norm" solution has a smaller norm
    % than the "maximum-number-of-zeros" solution.

Problem #2: The Minimum Norm, Least-Square-Error Problem
(Or: What Can We Do With Inconsistent Linear Equations?)

There are many practical problems that result, in one way or another, in an inconsistent system $A\vec{X} = \vec{b}$ of equations (i.e., $\vec{b} \notin R(A)$). A solution does not exist in this case. However, we can "least-squares fit" a vector $\vec{X}$ to an inconsistent system. That is, we can consider the least-squares error problem of calculating an $\vec{X}$ that minimizes the error $\|A\vec{X} - \vec{b}\|_2$ (equivalent to minimizing the square error $\|A\vec{X} - \vec{b}\|_2^2$). As should be obvious (or, with just a little thought!), such an $\vec{X}$ is not always unique. Hence, we find the minimum norm vector that minimizes the error $\|A\vec{X} - \vec{b}\|_2$ (often, the error vector $A\vec{X} - \vec{b}$ is called the residual). As it turns out, this minimum norm solution to the least-squares error problem is unique, and it is denoted as $\vec{X} = A^+\vec{b}$, where $A^+$ is the pseudo inverse of A. Furthermore, with MatLab's pinv function, we can solve this minimum-norm, least-square-error problem (as it is called in the literature). However, before we jump too far into this problem, we must discuss orthogonal projections and projection operators.
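The non-uniqueness of the least-squares minimizer, and the role of the minimum norm requirement, can be seen in a small numerical experiment. This is a sketch under stated assumptions: the matrix A, the right-hand side b, and the step 0.5 along K(A) are arbitrary illustrative choices, not data from these notes.

```matlab
% Sketch: for an inconsistent, rank-deficient system, pinv(A)*b and any
% vector obtained by adding an element of K(A) give the same residual,
% but pinv(A)*b has the smaller norm.
A = [1 1 2; 0 2 2; 1 0 1; 1 0 1];   % rank 2, so K(A) contains nonzero vectors
b = [1; 2; 3; 4];                   % rows 3 and 4 of A agree while b(3) ~= b(4),
                                    % so b is not in R(A)

X1 = pinv(A)*b;                     % minimum-norm, least-squares vector
N  = null(A);                       % orthonormal basis of K(A)
X2 = X1 + 0.5*N(:,1);               % a second least-squares minimizer

fprintf('residual norms: %.6f  %.6f\n', norm(A*X1 - b), norm(A*X2 - b));
fprintf('solution norms: %.6f  %.6f\n', norm(X1), norm(X2));
```

The two residual norms should agree, because A maps the added component to zero, while the second solution norm should be the larger of the two.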
The Orthogonal Projection of One Vector on Another

Let's start out with a simple, one dimensional, projection problem. Suppose we have two vectors, $\vec{X}_1$ and $\vec{X}_2$, and we want to find the "orthogonal projection", denoted here as $\vec{X}_p$, of vector $\vec{X}_2$ onto vector $\vec{X}_1$. As depicted by Figure 6-4, the "orthogonal projection" is co-linear with $\vec{X}_1$. And, the error vector $\vec{X}_2 - \vec{X}_p$ is orthogonal to $\vec{X}_1$.

[Figure 6-4: Orthogonal projection of $\vec{X}_2$ onto $\vec{X}_1$, showing $\vec{X}_p$ along $\vec{X}_1$ and the error vector $\vec{X}_2 - \vec{X}_p$.]

We can find a formula for $\vec{X}_p$. Since the error must be orthogonal to $\vec{X}_1$, we can write

\[
0 = \langle \vec{X}_2 - \vec{X}_p, \vec{X}_1 \rangle = \langle \vec{X}_2 - \alpha\vec{X}_1, \vec{X}_1 \rangle = \langle \vec{X}_2, \vec{X}_1 \rangle - \alpha \langle \vec{X}_1, \vec{X}_1 \rangle,
\tag{6-24}
\]

which leads to

\[ \alpha = \frac{\langle \vec{X}_2, \vec{X}_1 \rangle}{\|\vec{X}_1\|_2^2} \tag{6-25} \]

and

\[
\vec{X}_p = \frac{\langle \vec{X}_2, \vec{X}_1 \rangle}{\|\vec{X}_1\|_2^2}\, \vec{X}_1
= \left\langle \vec{X}_2, \frac{\vec{X}_1}{\|\vec{X}_1\|_2} \right\rangle \frac{\vec{X}_1}{\|\vec{X}_1\|_2}.
\tag{6-26}
\]

Hence, $\vec{X}_p$ is the component of $\vec{X}_2$ in the $\vec{X}_1$ direction; $\vec{X}_2 - \vec{X}_p$ is the component of $\vec{X}_2$ in a direction that is perpendicular to $\vec{X}_1$. In the next section, these ideas are generalized to the projection of a vector on an arbitrary subspace.

Orthogonal Projections

Let W be a subspace of n-dimensional vector space U (the dimensionality of W is not important in this discussion). An n×n matrix P is said to represent an orthogonal projection operator on W (more simply, P is said to be an orthogonal projection on W) if

a) R(P) = W
b) $P^2 = P$ (i.e., P is idempotent)          (6-27)
c) $P^* = P$ (i.e., P is Hermitian)

We can obtain some simple results from consideration of definition (6-27). First, from a) we have $P\vec{X} \in W$ for all $\vec{X} \in U$. Secondly, from a) and b) we have $P\vec{X} = \vec{X}$ for all $\vec{X} \in W$. To see this, note that for $\vec{X} \in W = R(P)$ we have $P(P\vec{X} - \vec{X}) = P^2\vec{X} - P\vec{X} = P\vec{X} - P\vec{X} = \vec{0}$ so that $P\vec{X} - \vec{X} \in K(P)$. However, $P\vec{X} - \vec{X} \in R(P) = W$. The only vector that is in K(P) and R(P) simultaneously is $\vec{0}$. Hence, $P\vec{X} = \vec{X}$ for each and every $\vec{X} \in W$.
The third observation from (6-27) is that a), b) and c) tell us that $(I - P)\vec{X} \in W^\perp$ for each $\vec{X} \in U$. To see this, consider any $\vec{Y} \in W = R(P)$. Then, $\vec{Y} = P\vec{X}_2$ for some $\vec{X}_2 \in U$. Now, take any $\vec{X}_1 \in U$ and write

\[
\begin{aligned}
\langle (I - P)\vec{X}_1, \vec{Y} \rangle &= \langle (I - P)\vec{X}_1, P\vec{X}_2 \rangle = \langle P^*(I - P)\vec{X}_1, \vec{X}_2 \rangle = \langle P(I - P)\vec{X}_1, \vec{X}_2 \rangle \\
&= \langle P\vec{X}_1 - P^2\vec{X}_1, \vec{X}_2 \rangle = \langle P\vec{X}_1 - P\vec{X}_1, \vec{X}_2 \rangle \\
&= 0.
\end{aligned}
\tag{6-28}
\]

Hence, for all $\vec{X}_1 \in U$, we have $(I - P)\vec{X}_1$ orthogonal to W. Hence, $\vec{X}$ is decomposed into a part $P\vec{X} \in W$ and a part $(I - P)\vec{X} \in W^\perp$. Figure 6-5 depicts this important decomposition.

[Figure 6-5: Orthogonal projection of $\vec{X}$ onto a subspace W. $\vec{X}$ is decomposed into $P\vec{X} \in W$ and $(I - P)\vec{X} \in W^\perp$.]

The fourth observation from (6-27) is

\[ \vec{X} \in W^\perp \ \Rightarrow\ P\vec{X} = \vec{0}. \tag{6-29} \]

This is obvious; it follows from the facts that

\[
P\vec{X} \in W, \qquad (I - P)\vec{X} = \vec{X} - P\vec{X} \in W^\perp,
\tag{6-30}
\]

and if $\vec{X} \in W^\perp$, the only way (6-30) can be true is to have $P\vec{X} = \vec{0}$ for all $\vec{X} \in W^\perp$.

The fifth observation is that P plays a role in the unique direct sum decomposition of any $\vec{X} \in U$. Recall that $U = W \oplus W^\perp$, and any $\vec{X} \in U$ has the unique decomposition $\vec{X} = \vec{X}_W \oplus \vec{X}_{W^\perp}$, where $\vec{X}_W \in W$ and $\vec{X}_{W^\perp} \in W^\perp$. Note that $P\vec{X} = \vec{X}_W$ and $(I - P)\vec{X} = \vec{X}_{W^\perp}$. Hence, if P is the orthogonal projection operator on W, then (I - P) is the orthogonal projection operator on $W^\perp$.

PX is the Optimal Approximation of X

Of all vectors in W, $P\vec{X}$ is the "optimal" approximation of $\vec{X} \in U$. Consider any $\vec{X}_\delta \in W$ and any $\vec{X} \in U$, and note that $\vec{Z} \equiv P\vec{X} + \vec{X}_\delta \in W$ (think of $\vec{X}_\delta \in W$ as being a perturbation of $P\vec{X} \in W$). We show that $\|\vec{X} - P\vec{X}\|_2 \le \|\vec{X} - \{P\vec{X} + \vec{X}_\delta\}\|_2$ for all $\vec{X} \in U$ and $\vec{X}_\delta \in W$. Consider

\[
\begin{aligned}
\|\vec{X} - \{P\vec{X} + \vec{X}_\delta\}\|_2^2 &= \langle (\vec{X} - P\vec{X}) - \vec{X}_\delta,\ (\vec{X} - P\vec{X}) - \vec{X}_\delta \rangle \\
&= \langle \vec{X} - P\vec{X}, \vec{X} - P\vec{X} \rangle - \langle \vec{X}_\delta, \vec{X} - P\vec{X} \rangle - \langle \vec{X} - P\vec{X}, \vec{X}_\delta \rangle + \langle \vec{X}_\delta, \vec{X}_\delta \rangle.
\end{aligned}
\tag{6-31}
\]

However, the cross terms $\langle \vec{X}_\delta, \vec{X} - P\vec{X} \rangle$ and $\langle \vec{X} - P\vec{X}, \vec{X}_\delta \rangle$ are zero so that

\[ \|\vec{X} - \{P\vec{X} + \vec{X}_\delta\}\|_2^2 = \|\vec{X} - P\vec{X}\|_2^2 + \|\vec{X}_\delta\|_2^2. \tag{6-32} \]

From this, we conclude that

\[ \|\vec{X} - P\vec{X}\|_2^2 \le \|\vec{X} - \{P\vec{X} + \vec{X}_\delta\}\|_2^2 = \|\vec{X} - P\vec{X}\|_2^2 + \|\vec{X}_\delta\|_2^2 \tag{6-33} \]

for all $\vec{X}_\delta \in W$ and $\vec{X} \in U$. Of all the vectors in subspace W, $P\vec{X} \in W$ comes "closest" (in the 2-norm sense) to $\vec{X} \in U$.

Projection Operator on W is Unique

The orthogonal projection on a subspace is unique. Assume, for the moment, that there are n×n matrices $P_1$ and $P_2$ with the properties given by (6-27). Use these to write

\[
\begin{aligned}
\|(P_1 - P_2)\vec{X}\|_2^2 &= \vec{X}^*(P_1 - P_2)^*(P_1 - P_2)\vec{X} = (P_1\vec{X})^*(P_1 - P_2)\vec{X} - (P_2\vec{X})^*(P_1 - P_2)\vec{X} \\
&= (P_1\vec{X})^*\big(\{P_1 - I\} - \{P_2 - I\}\big)\vec{X} - (P_2\vec{X})^*\big(\{P_1 - I\} - \{P_2 - I\}\big)\vec{X}.
\end{aligned}
\tag{6-34}
\]

However both $P_1\vec{X}$ and $P_2\vec{X}$ are orthogonal to both $(P_1 - I)\vec{X}$ and $(P_2 - I)\vec{X}$. This leads to the conclusion that

\[ \|(P_1 - P_2)\vec{X}\|_2 = 0 \tag{6-35} \]

for all $\vec{X} \in U$, a result that requires $P_1 = P_2$. Hence, given subspace W, the orthogonal projection operator on W is unique, as originally claimed.

How to "Make" an Orthogonal Projection Operator

Let $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$ be n×1 orthonormal vectors that span k-dimensional subspace W of n-dimensional vector space U. Use these vectors as columns in the n×k matrix

\[ Q \equiv \big[\, \vec{v}_1 \ \ \vec{v}_2 \ \cdots \ \vec{v}_k \,\big]. \tag{6-36} \]

We claim that n×n matrix

\[ P = QQ^* \tag{6-37} \]

is the unique orthogonal projection onto subspace W. To show this, note that for any vector $\vec{X} \in U$ we have

\[
P\vec{X} = QQ^*\vec{X}
= \big[\, \vec{v}_1 \ \ \vec{v}_2 \ \cdots \ \vec{v}_k \,\big]
\begin{bmatrix} \vec{v}_1^{\,*} \\ \vec{v}_2^{\,*} \\ \vdots \\ \vec{v}_k^{\,*} \end{bmatrix} \vec{X}
= \big[\, \vec{v}_1 \ \ \vec{v}_2 \ \cdots \ \vec{v}_k \,\big]
\begin{bmatrix} \vec{v}_1^{\,*}\vec{X} \\ \vec{v}_2^{\,*}\vec{X} \\ \vdots \\ \vec{v}_k^{\,*}\vec{X} \end{bmatrix}.
\tag{6-38}
\]

Now, $P\vec{X}$ is a linear combination of the orthonormal basis $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$. As $\vec{X}$ ranges over all of U, the vector $P\vec{X}$ ranges over all of W (since $P\vec{v}_j = \vec{v}_j$, 1 ≤ j ≤ k, and $\vec{v}_1, \dots, \vec{v}_k$ span W).
As a result, R(P) = W; this is the first requirement of (6-27). The second requirement of (6-27) is met by realizing that

\[ P^2 = (QQ^*)(QQ^*) = Q\,I\,Q^* = P, \tag{6-39} \]

since $Q^*Q$ is a k×k identity matrix I. Finally, the third requirement of (6-27) is obvious, and $P = QQ^*$ is the unique orthogonal projection on the subspace W.

Special Case: One Dimensional Subspace W

Consider the one dimensional case $W = \mathrm{span}(\vec{X}_1)$, where $\vec{X}_1$ has unit length. Then (6-37) gives us n×n matrix $P = \vec{X}_1\vec{X}_1^{\,*}$. Let $\vec{X}_2 \notin W$. Then the orthogonal projection of $\vec{X}_2$ onto W is

\[
\vec{X}_p = P\vec{X}_2 = (\vec{X}_1\vec{X}_1^{\,*})\vec{X}_2 = \vec{X}_1(\vec{X}_1^{\,*}\vec{X}_2) = (\vec{X}_1^{\,*}\vec{X}_2)\vec{X}_1 = \langle \vec{X}_2, \vec{X}_1 \rangle \vec{X}_1,
\tag{6-40}
\]

where we have used the fact that $\vec{X}_1^{\,*}\vec{X}_2$ is a scalar. In light of the fact that $\vec{X}_1$ has unit length, (6-40) is the same as (6-26). Now that we are knowledgeable about projection operators, we can handle the important $A\vec{X} = \vec{b}$, $\vec{b} \notin R(A)$, problem.

The Minimum-Norm, Least-Squares-Error Problem

At last, we are in a position to solve yet another colossal problem in linear algebraic equation theory. Namely, what to do with the $A\vec{X} = \vec{b}$ problem when $\vec{b} \notin R(A)$ so that no exact solution(s) exist. This type of problem occurs in applications where there are more equations (constraints) than there are unknowns to be solved for. In this case, the problem is said to be overdetermined. Such a problem is characterized by a matrix equation $A\vec{X} = \vec{b}$, where A is m×n with m > n, and $\vec{b}$ is m×1. Clearly, the rows of A are dependent. And, $\vec{b}$ may have been obtained by making imprecise measurements, so that rank(A) ≠ rank(A ¦ b), and no solution exists. Thus, not knowing what is important in a model (i.e., which constraints are the most important and which can be thrown out), and an inability to precisely measure input $\vec{b}$, can lead to an inconsistent set of equations.
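The rank condition quoted above gives a direct numerical test for consistency. The sketch below is illustrative only: the matrix A, the exact right-hand side, the perturbation values, and the helper name isConsistent are all assumptions introduced for this example.

```matlab
% Sketch: an overdetermined system is consistent exactly when appending b
% to A does not raise the rank, i.e. rank([A b]) == rank(A).
A = [1 0; 0 1; 1 1];                      % 3 equations, 2 unknowns (m > n)

b_exact = A*[1; 2];                       % lies in R(A) by construction
b_noisy = b_exact + [0.05; -0.02; 0.04];  % hypothetical measurement errors

% Hypothetical helper: true when the system A*X = b has a solution.
isConsistent = @(A, b) rank([A b]) == rank(A);

disp(isConsistent(A, b_exact))            % exact data: consistent
disp(isConsistent(A, b_noisy))            % perturbed data: inconsistent here
```

Note that rank uses a numerical tolerance, so for noisy data of very small magnitude the test may still report consistency; the perturbation used here is large enough to avoid that.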
A "fix" for this problem is to throw out constraints (individual equations) until the modified system has a solution. However, as discussed above, it is not always clear how to accomplish this when all of the m constraints appear to be equally valid (ignorance keeps us from obtaining a "better" model).

Instead of throwing out constraints, it is sometimes better to do the best with what you have. Often, the best approach is to find an $\vec{X}$ that satisfies

\[ \|A\vec{X} - \vec{b}\|_2 = \min_{\vec{Z} \in U} \|A\vec{Z} - \vec{b}\|_2. \tag{6-41} \]

That is, we try to get $A\vec{X}$ as close as possible to $\vec{b}$. The problem of finding $\vec{X}$ that minimizes $\|A\vec{X} - \vec{b}\|_2$ is known as the least squares problem (since the 2-norm is used).

A solution to the least squares problem always exists. However, it is not unique, in general. To see this, let $\vec{X}$ minimize the norm of the residual; that is, let $\vec{X}$ satisfy (6-41) so that it is a solution to the least squares problem. Then, add to $\vec{X}$ any vector in K(A) to get yet another vector that minimizes the norm of the residual.

It is common to add structure to the least squares problem in order to force a unique solution. We choose the smallest norm $\vec{X}$ that minimizes $\|A\vec{X} - \vec{b}\|_2$. That is, we want the $\vec{X}$ that 1) minimizes $\|A\vec{X} - \vec{b}\|_2$, and 2) minimizes $\|\vec{X}\|_2$. This problem is called the minimum-norm, least-squares problem. For all coordinate vectors $\vec{b}$ in V, it is guaranteed to have a unique solution (when $\vec{b} \in R(A)$, we have a "Problem #1"-type problem; see the discussion at the beginning of this chapter). And, as shown below, from $\vec{b}$ in V back to the coordinate vector $\vec{X}$ in U, the mapping is the pseudo inverse of A. As before, we write $\vec{X} = A^+\vec{b}$ for each $\vec{b}$ in V. The pseudo inverse solves the general optimization problem that is discussed in the paragraph containing Equation (6-1).

To verify that $\vec{X} = A^+\vec{b}$ is the smallest norm $\vec{X}$ that minimizes $\|A\vec{X} - \vec{b}\|_2$, let's go about the process of minimizing the norm of the residual $A\vec{X} - \vec{b}$.
First, recall from Fig. 5-1 that $U = R(A^*) \oplus K(A)$ and $V = R(A) \oplus K(A^*)$. Hence, $\vec{X} \in U$ and $\vec{b} \in V$ have the unique decompositions

\[
\begin{aligned}
\vec{X} &= \vec{X}_{R[A^*]} \oplus \vec{X}_{K[A]}, & \vec{X}_{R[A^*]} &\in R(A^*), & \vec{X}_{K[A]} &\in K(A), \\
\vec{b} &= \vec{b}_{R[A]} \oplus \vec{b}_{K[A^*]}, & \vec{b}_{R[A]} &\in R(A), & \vec{b}_{K[A^*]} &\in K(A^*).
\end{aligned}
\tag{6-42}
\]

Now, we compute

\[
\begin{aligned}
\|A\vec{X} - \vec{b}\|_2^2 &= \big\|A\{\vec{X}_{R[A^*]} \oplus \vec{X}_{K[A]}\} - \{\vec{b}_{R[A]} \oplus \vec{b}_{K[A^*]}\}\big\|_2^2 \\
&= \big\|\{A\vec{X}_{R[A^*]} - \vec{b}_{R[A]}\} - \vec{b}_{K[A^*]}\big\|_2^2 \\
&= \|A\vec{X}_{R[A^*]} - \vec{b}_{R[A]}\|_2^2 + \|\vec{b}_{K[A^*]}\|_2^2.
\end{aligned}
\tag{6-43}
\]

The second line of this result follows from the fact $A\vec{X}_{K[A]} = \vec{0}$. The third line follows from the orthogonality of $\{A\vec{X}_{R[A^*]} - \vec{b}_{R[A]}\}$ and $\vec{b}_{K[A^*]}$. In minimizing (6-43), we have no control over $\vec{b}_{K[A^*]}$. However, how to choose $\vec{X}$ should be obvious. Select $\vec{X} = \vec{X}_{R[A^*]} \in R(A^*) = K(A)^\perp$ which satisfies $A\vec{X}_{R[A^*]} = \vec{b}_{R[A]}$. By doing this, we will minimize (6-43) and $\|\vec{X}\|_2$ simultaneously. However, this optimum $\vec{X}$ is given by

\[ \vec{X}_{R[A^*]} = A^+\vec{b}_{R[A]} = A^+\big[\vec{b}_{R[A]} \oplus \vec{b}_{K[A^*]}\big] = A^+\vec{b}. \tag{6-44} \]

From this, we conclude that the pseudo inverse solves the general optimization problem introduced by the paragraph containing Equation (6-1). Finally, when you use $\vec{X} = A^+\vec{b}$, the norm of the residual becomes $\|\vec{b}_{K[A^*]}\|_2$; folks, it just doesn't get any smaller than this!

Let's give a geometric interpretation of our result. Given $\vec{b} \in V$, the orthogonal projection of $\vec{b}$ on the range of A is $\vec{b}_{R[A]}$. Hence, when solving the general minimum norm, least-square error problem, we first project $\vec{b}$ on R(A) to obtain $\vec{b}_{R[A]}$. Then we solve $A\vec{X} = \vec{b}_{R[A]}$ for its minimum norm solution, a "Problem #1"-type problem.

Folks, this stuff is worth repeating and illustrating with a graphic!
Again, to solve the general problem outlined at the beginning of this chapter, we

1) Orthogonally project $\vec{b}$ onto R(A) to get $\vec{b}_{R[A]}$,
2) Solve $A\vec{X} = \vec{b}_{R[A]}$ for $\vec{X} \in R(A^*) = K(A)^\perp$ (a "Problem #1" exercise).

This two-step procedure produces an optimum $\vec{X}$ that minimizes $\|A\vec{X} - \vec{b}\|_2$ and $\|\vec{X}\|_2$ simultaneously! It is illustrated by Fig. 6-6.

[Figure 6-6: a) $\vec{b}_{R[A]}$ is the orthogonal projection of $\vec{b}$ onto R(A); the error is $\vec{b} - \vec{b}_{R[A]}$. b) Find the minimum norm solution to $A\vec{X} = \vec{b}_{R[A]}$ (this is a "Problem #1"-type problem). In the linear variety containing the solutions, $\vec{X}'$ is non-optimum and $\vec{X}$ is optimum.]

Application of the Pseudo Inverse: Least Squares Curve Fitting

We want to pass a straight line through a set of n data points. We want to do this in such a manner that minimizes the squared error between the line and the data. Suppose we have the data points $(x_k, y_k)$, 1 ≤ k ≤ n. As illustrated by Figure 6-7, we desire to fit the straight line

\[ y = a_1 x + a_0 \tag{6-45} \]

to the data. We want to find the coefficients $a_0$ and $a_1$ that minimize

\[ \text{Error}^2 = \sum_{k=1}^{n} \big[\, y_k - (a_1 x_k + a_0) \,\big]^2. \tag{6-46} \]

[Figure 6-7: Least-squares fit of a line $y = a_0 + a_1 x$ to a data set; x denotes a supplied data point.]

This problem can be formulated as an overdetermined linear system, and it can be solved by using the pseudo inverse operator. On the straight line, define $\tilde{y}_k$, 1 ≤ k ≤ n, to be the ordinate values that correspond to the $x_k$, 1 ≤ k ≤ n. That is, define $\tilde{y}_k = a_1 x_k + a_0$, 1 ≤ k ≤ n, as the line points. Now, we adopt the vector notation

\[
\vec{Y}_L \equiv \big[\, \tilde{y}_1 \ \ \tilde{y}_2 \ \cdots \ \tilde{y}_n \,\big]^T, \qquad
\vec{Y}_d \equiv \big[\, y_1 \ \ y_2 \ \cdots \ y_n \,\big]^T,
\tag{6-47}
\]

where $\vec{Y}_L$ denotes "line values", and $\vec{Y}_d$ denotes "y-data". Now, denote the "x-data" n×2 matrix as

\[
X_d = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.
\tag{6-48}
\]

Finally, the "line equation" is

\[ X_d \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \vec{Y}_L, \tag{6-49} \]

which defines a straight line with slope $a_1$. Our goal is to select $[a_0 \ a_1]^T$ to minimize

\[ \|\vec{Y}_L - \vec{Y}_d\|_2^2 = \left\| X_d \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} - \vec{Y}_d \right\|_2^2. \tag{6-50} \]

Unless all of the data lie on the line, the system

\[ X_d \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \vec{Y}_d \tag{6-51} \]

is inconsistent, so we want a "least squares fit" to minimize (6-50). From our previous work, we know the answer is

\[ \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = X_d^+ \vec{Y}_d, \tag{6-52} \]

where $X_d^+$ is the pseudo inverse of $X_d$. Using (6-22), we write

\[ \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = \big(X_d^T X_d\big)^{-1} X_d^T \vec{Y}_d. \tag{6-53} \]

$X_d$ is n×2 with rank 2. Hence $X_d^T X_d$ is a 2×2 positive definite symmetric (and nonsingular) matrix.

Example

Consider the 5 points

    k    xk    yk
    1    0     0
    2    1     1.4
    3    2     2.2
    4    3     3.5
    5    5     4.4

\[
X_d = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 5 \end{bmatrix}, \qquad
\vec{Y}_d = \begin{bmatrix} 0 \\ 1.4 \\ 2.2 \\ 3.5 \\ 4.4 \end{bmatrix}.
\]

MatLab yields the pseudo inverse

\[
X_d^+ = \begin{bmatrix} .5270 & .3784 & .2297 & .0811 & -.2162 \\ -.1486 & -.0811 & -.0135 & .0541 & .1892 \end{bmatrix}
\]

so that the coefficients $a_0$ and $a_1$ are

\[ \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = X_d^+ \vec{Y}_d = \begin{bmatrix} .3676 \\ .8784 \end{bmatrix}. \]

The following MatLab program plots the points and the straight line approximation.

    % EX7_3.M Least-squares curve fit with a line using
    % \ operator and polyfit. The results are displayed
    % and plotted.
    x=[0 1 2 3 5];            % Define the data points
    y=[0 1.4 2.2 3.5 4.4];
    A1=[1 1 1 1 1 ]';         % Least squares matrix
    A=[A1 x'];
    Als=A'*A;
    bls=A'*y';
    % Compute least squares fit
    Xlsq1=Als\bls;            % normal equations: Xlsq1 = [a0; a1]
    Xlsq2=polyfit(x,y,1);     % polyfit returns [a1 a0] (highest power first)
    f1=polyval(Xlsq2,x);
    err=y-f1;                 % named err to avoid shadowing MatLab's error function
    disp(' x y f1 y-f1')
    table=[x' y' f1' err'];
    disp(table)
    fprintf('Strike a key for the plot\n')
    pause
    % Plot
    clf
    plot(x,y,'xr',x,f1,'-')
    axis([-1 6 -1 6])
    title('Least Squares line Figure 7.3')
    xlabel('x')
    ylabel('y')

[Plot: "Least Squares line Figure 7.3", showing the five data points and the fitted line, with x and y axes each running from -1 to 6.]

The output of the program follows.
         x         y        f1      y-f1
         0         0    0.3676   -0.3676
    1.0000    1.4000    1.2459    0.1541
    2.0000    2.2000    2.1243    0.0757
    3.0000    3.5000    3.0027    0.4973
    5.0000    4.4000    4.7595   -0.3595
    Strike a key for the plot

N-Port Resistive Networks: Application of Pseudo Inverse

Consider the n-port electrical network. We treat the n-port as a black box that is characterized by its terminal behavior.

[Figure 6-8 graphic not reproduced. It shows a black-box n-port resistive network with port voltages V1, V2, ..., Vn and port currents I1, I2, ..., In.]

Figure 6-8: n-port network described by terminal behavior.

Suppose there are only resistors and linear dependent sources in the black box. Then we can characterize the network using complex-valued voltages and currents (i.e., “phasors”) by a set of equations of the form

$$\begin{aligned} V_1 &= z_{11} I_1 + z_{12} I_2 + \cdots + z_{1n} I_n \\ V_2 &= z_{21} I_1 + z_{22} I_2 + \cdots + z_{2n} I_n \\ &\ \ \vdots \\ V_n &= z_{n1} I_1 + z_{n2} I_2 + \cdots + z_{nn} I_n . \end{aligned}$$ (6-54)

This is equivalent to writing $\vec{V} = Z\vec{I}$, where $\vec{V} = [V_1 \ V_2 \ \cdots \ V_n]^T$, $\vec{I} = [I_1 \ I_2 \ \cdots \ I_n]^T$ and

$$Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1n} \\ z_{21} & z_{22} & \cdots & z_{2n} \\ \vdots & \vdots & & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{nn} \end{bmatrix}$$ (6-55)

is the n×n impedance matrix. The $z_{ik}$, $1 \le i,k \le n$, are known as open-circuit impedance parameters (they are real-valued since we are dealing with resistive networks). The network is said to be reciprocal if input port i can be interchanged with output port j and the relationship between port voltages and currents remains unchanged. That is, the network is reciprocal if Z is Hermitian ($Z = Z^*$, where $Z^* = Z^T$ since the matrix is real-valued). Also, for a reciprocal resistive n-port, it is possible to show that Hermitian Z is positive semi-definite.

Suppose we connect two reciprocal n-ports in parallel to form a new reciprocal n-port. Given impedance matrices $Z_1$ and $Z_2$ for the two n-ports, we want to find the impedance matrix $Z_{equ}$ of the parallel combination. Before doing this, we must give some preliminary material.

Theorem 6-6

Let $Z_1$ and $Z_2$ be the n×n impedance matrices of two resistive, reciprocal n-port networks.
Then

$$R(Z_1) + R(Z_2) = R(Z_1 + Z_2) .$$ (6-56)

Proof: As discussed above, we know that $Z_1$ and $Z_2$ are Hermitian and non-negative definite. We prove this theorem by showing

1) $R(Z_1 + Z_2) \subset R(Z_1) + R(Z_2)$, and
2) $R(Z_1) \subset R(Z_1 + Z_2)$ and $R(Z_2) \subset R(Z_1 + Z_2)$ so that $R(Z_1) + R(Z_2) \subset R(Z_1 + Z_2)$.

We show #1. Consider $\vec{Y} \in R(Z_1 + Z_2)$. Then there exists an $\vec{X}$ such that $\vec{Y} = (Z_1 + Z_2)\vec{X} = Z_1\vec{X} + Z_2\vec{X}$. But $Z_1\vec{X} \in R(Z_1)$ and $Z_2\vec{X} \in R(Z_2)$. Hence, $\vec{Y} \in R(Z_1) + R(Z_2)$, and this implies that $R(Z_1 + Z_2) \subset R(Z_1) + R(Z_2)$.

We show #2. Let $\vec{Y} \in K(Z_1 + Z_2)$ so that $(Z_1 + Z_2)\vec{Y} = \vec{0}$. Then

$$0 = \langle \vec{Y}, (Z_1 + Z_2)\vec{Y} \rangle = \langle (Z_1 + Z_2)\vec{Y}, \vec{Y} \rangle = \langle Z_1\vec{Y}, \vec{Y} \rangle + \langle Z_2\vec{Y}, \vec{Y} \rangle .$$ (6-57)

Now, $Z_1$ and $Z_2$ are non-negative definite so that both inner products on the right-hand side of (6-57) must be non-negative. Hence, we must have

$$\langle Z_1\vec{Y}, \vec{Y} \rangle = \langle Z_2\vec{Y}, \vec{Y} \rangle = 0 .$$ (6-58)

Since $Z_1$ and $Z_2$ are Hermitian and non-negative definite, Equation (6-58) can be true if and only if $Z_1\vec{Y} = \vec{0}$ and $Z_2\vec{Y} = \vec{0}$; that is, $\vec{Y} \in K(Z_1)$ and $\vec{Y} \in K(Z_2)$. Hence, we have shown that

$$\begin{aligned} K(Z_1 + Z_2) &\subset K(Z_1), & K(Z_1 + Z_2) &\subset K(Z_2) \\ R(Z_1) &\subset R(Z_1 + Z_2), & R(Z_2) &\subset R(Z_1 + Z_2) , \end{aligned}$$ (6-59)

where the second line follows from the first by taking orthogonal complements, since $K(Z)^\perp = R(Z^*) = R(Z)$ for Hermitian Z. Finally, the second set relationship in (6-59) implies $R(Z_1) + R(Z_2) \subset R(Z_1 + Z_2)$, and the theorem is proved.♥

Parallel Sum of Matrices

Let $N_1$ and $N_2$ be reciprocal n-port resistive networks that are described by n×n matrices $Z_1$ and $Z_2$, respectively. The parallel sum of $Z_1$ and $Z_2$ is denoted as $Z_1 : Z_2$, and it is defined as

$$Z_1 : Z_2 = Z_1 (Z_1 + Z_2)^+ Z_2 ,$$ (6-60)

an n×n matrix.

Theorem 6-7

Let $Z_1$ and $Z_2$ be impedance matrices of reciprocal n-port resistive networks. Then, we have

$$Z_1 : Z_2 = Z_1 (Z_1 + Z_2)^+ Z_2 = Z_2 (Z_1 + Z_2)^+ Z_1 = Z_2 : Z_1 .$$ (6-61)

Proof: By inspection, the matrix $(Z_1 + Z_2)(Z_1 + Z_2)^+(Z_1 + Z_2)$ is symmetric. Multiply out this matrix to obtain

$$(Z_1 + Z_2)(Z_1 + Z_2)^+(Z_1 + Z_2) = \left[ Z_1(Z_1 + Z_2)^+Z_1 + Z_2(Z_1 + Z_2)^+Z_2 \right] + \left[ Z_1(Z_1 + Z_2)^+Z_2 + Z_2(Z_1 + Z_2)^+Z_1 \right] .$$ (6-62)

The left-hand side of this result is symmetric. Hence, the right-hand side must be symmetric as well. In fact, both $Z_1(Z_1 + Z_2)^+Z_2$ and $Z_2(Z_1 + Z_2)^+Z_1$ are symmetric: Theorem 6-6 gives $R(Z_2) \subset R(Z_1 + Z_2)$, so $(Z_1 + Z_2)(Z_1 + Z_2)^+Z_2 = Z_2$ and therefore $Z_1(Z_1 + Z_2)^+Z_2 = Z_2 - Z_2(Z_1 + Z_2)^+Z_2$, which is symmetric (the same argument applies with the roles of $Z_1$ and $Z_2$ interchanged). Consequently,

$$Z_1(Z_1 + Z_2)^+Z_2 = \left( Z_1(Z_1 + Z_2)^+Z_2 \right)^T = Z_2^T \left( (Z_1 + Z_2)^+ \right)^T Z_1^T = Z_2^T \left( (Z_1 + Z_2)^T \right)^+ Z_1^T = Z_2(Z_1 + Z_2)^+Z_1$$ (6-63)

as claimed. Hence, we have $Z_1 : Z_2 = Z_2 : Z_1$ and the parallel sum of symmetric matrices is a symmetric matrix.♥

Theorem 6-8: Parallel n-Port Resistive Networks

Suppose that $N_1$ and $N_2$ are two reciprocal n-port resistive networks that are described by impedance matrices $Z_1$ and $Z_2$, respectively. Then the parallel connection of $N_1$ and $N_2$ is described by the impedance matrix

$$Z_{equ} = Z_1 : Z_2 = Z_1(Z_1 + Z_2)^+Z_2 ,$$ (6-64)

the parallel sum of matrices $Z_1$ and $Z_2$.

Proof: Let $\vec{I}_1$, $\vec{I}_2$ and $\vec{I} = \vec{I}_1 + \vec{I}_2$ be the current vectors entering networks $N_1$, $N_2$ and the parallel combination of $N_1$ and $N_2$, respectively. Likewise, let $\vec{V}$ be the voltage vector that is supplied to the parallel connection of $N_1$ and $N_2$. To prove this theorem, we must show that

$$\vec{V} = Z_1(Z_1 + Z_2)^+Z_2 \, \vec{I} = (Z_1 : Z_2)\vec{I} .$$ (6-65)

Since $\vec{V}$ is supplied to the parallel combination, we have

$$\vec{V} = Z_1 \vec{I}_1 = Z_2 \vec{I}_2 .$$ (6-66)

Use this last equation to write

$$\begin{aligned} Z_1\vec{I} &= Z_1(\vec{I}_1 + \vec{I}_2) = \vec{V} + Z_1\vec{I}_2 \\ Z_2\vec{I} &= Z_2(\vec{I}_1 + \vec{I}_2) = \vec{V} + Z_2\vec{I}_1 . \end{aligned}$$ (6-67)

Now, multiply these last two equations by $(Z_1 + Z_2)^+$ to obtain

$$\begin{aligned} (Z_1 + Z_2)^+ Z_1\vec{I} &= (Z_1 + Z_2)^+ \vec{V} + (Z_1 + Z_2)^+ Z_1\vec{I}_2 \\ (Z_1 + Z_2)^+ Z_2\vec{I} &= (Z_1 + Z_2)^+ \vec{V} + (Z_1 + Z_2)^+ Z_2\vec{I}_1 . \end{aligned}$$ (6-68)

On the left, multiply the first of these by $Z_2$ and the second by $Z_1$ to obtain (using the parallel sum notation)

$$\begin{aligned} (Z_2 : Z_1)\vec{I} &= Z_2(Z_1 + Z_2)^+ \vec{V} + (Z_2 : Z_1)\vec{I}_2 \\ (Z_1 : Z_2)\vec{I} &= Z_1(Z_1 + Z_2)^+ \vec{V} + (Z_1 : Z_2)\vec{I}_1 . \end{aligned}$$ (6-69)

However, we know that $Z_1 : Z_2 = Z_2 : Z_1$, $\vec{I} = \vec{I}_1 + \vec{I}_2$, and $Z_1 + Z_2$ is symmetric.
Hence, when we add the two equations in (6-69) we get

$$(Z_1 : Z_2)\vec{I} = (Z_1 + Z_2)(Z_1 + Z_2)^+ \vec{V} = P\vec{V} ,$$ (6-70)

where

$$P \equiv (Z_1 + Z_2)(Z_1 + Z_2)^+$$ (6-71)

is an orthogonal projection operator onto the range of $Z_1 + Z_2$. Since $\vec{V} = Z_1 \vec{I}_1 = Z_2 \vec{I}_2$, we know that $\vec{V} \in R(Z_1)$ and $\vec{V} \in R(Z_2)$. However, by Theorem 6-6, we know that $R(Z_1) \subset R(Z_1 + Z_2)$ and $R(Z_2) \subset R(Z_1 + Z_2)$. Hence, we have $\vec{V} \in R(Z_1 + Z_2)$ so that $P\vec{V} = \vec{V}$, where P is the orthogonal projection operator (6-71). Hence, from (6-70), we conclude that

$$(Z_1 : Z_2)\vec{I} = P\vec{V} = \vec{V} ,$$ (6-72)

so the parallel sum $Z_1 : Z_2$ is the impedance matrix for the parallel combination of $N_1$ and $N_2$.♥

Example

Consider the 3-port network shown in the accompanying figure (1 to 1', 2 to 2' and 3 to 3' are the ports; voltages are sensed positive at, and currents are defined as entering, the unprimed terminals 1, 2 and 3).

[Figure not reproduced: 3-port network built from resistors R1 and R2, with ports 1-1', 2-2' and 3-3'.]

For $1 \le i, j \le 3$, the entries in the impedance matrix are

$$z_{ij} \equiv \frac{V_i}{I_j} ,$$

where $V_i$ is the voltage across the ith port (voltage is signed positive at the unprimed port terminal), and $I_j$ is the current entering the jth port (current defined as entering the unprimed port terminal), with all other port currents set to zero. Elementary circuit theory can be used to obtain

$$Z = \begin{bmatrix} R_1 & 0 & R_1 \\ 0 & R_2 & R_2 \\ R_1 & R_2 & R_1 + R_2 \end{bmatrix}$$

as the impedance matrix for this 3-port.

Connect two of these 3-port networks in parallel to form the network depicted in the second figure.

[Figure not reproduced: parallel connection of two such 3-ports, one with resistors of 1 Ω and 2 Ω and the other with resistors of 1 Ω and 1 Ω.]

The impedance matrices are

$$Z_1 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}, \qquad Z_2 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix},$$

both of which are singular.
By Theorem 6-8, the impedance matrix for the parallel combination is

$$Z_{equ} = Z_1 : Z_2 = Z_1(Z_1 + Z_2)^+Z_2 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} \left( \begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix} \right)^{+} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1/2 & 0 & 1/2 \\ 0 & 2/3 & 2/3 \\ 1/2 & 2/3 & 7/6 \end{bmatrix} .$$
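The 3-port example can be checked numerically by evaluating the parallel sum (6-60) directly. The following is a minimal sketch rather than part of the original notes: it assumes MatLab's built-in pinv function for the pseudo inverse, and the names Zequ, Zrev and Zexpected are chosen here for illustration only.

```matlab
% Hypothetical numerical check of the 3-port example (not from the notes).
Z1 = [1 0 1; 0 2 2; 1 2 3];        % network with 1 ohm and 2 ohm resistors
Z2 = [1 0 1; 0 1 1; 1 1 2];        % network with 1 ohm and 1 ohm resistors

Zequ = Z1*pinv(Z1 + Z2)*Z2;        % parallel sum Z1:Z2, Equation (6-60)
Zrev = Z2*pinv(Z1 + Z2)*Z1;        % Z2:Z1, equal to Z1:Z2 by Theorem 6-7

% Result stated in the example, with entries 1/2, 2/3 and 7/6
Zexpected = [1/2 0 1/2; 0 2/3 2/3; 1/2 2/3 7/6];

disp(Zequ)
fprintf('max |Zequ - Zexpected| = %g\n', max(max(abs(Zequ - Zexpected))))
fprintf('max |Zequ - Zrev|      = %g\n', max(max(abs(Zequ - Zrev))))
```

Both printed differences should be at the level of floating-point round-off. Because Z1 + Z2 is singular here, inv(Z1 + Z2) cannot be used in place of pinv(Z1 + Z2), which is why the parallel sum is defined with the pseudo inverse.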