EE448/528  Version 1.0  John Stensby

Chapter 6
AX = b: The Minimum Norm Solution and the Least-Square-Error Problem
Like the previous chapter, this chapter deals with the linear algebraic equation problem
AX = b. However, in this chapter, we impose additional structure in order to obtain specialized,
but important, results. First, we consider Problem #1: b is in the range of A. For AX = b, either
a unique solution exists or an infinite number of solutions exist. We want to find the solution with
minimum norm. The minimum norm solution always exists, and it is unique. Problem #1 is called
the minimum norm problem. Next, we consider Problem #2: b is not in the range of A, so that
AX = b has no solution. Of all the vectors X that minimize ||AX − b||₂, we want to find the one
with minimum norm. Problem #2 is called the minimum norm, least-square-error problem. Its
solution always exists, and it is unique.

It should be obvious that Problems #1 and #2 are special cases of a more general
problem. That is, given AX = b (with no conditions placed on b), we want to find the X which
simultaneously minimizes both ||AX − b||₂ and ||X||₂. Such an X always exists, it is always unique,
and it is linearly related to b. Symbolically, we write

    X = A+b,                                                              (6-1)

where A+ denotes a linear operator called the pseudo inverse of A (yes, if A⁻¹ exists, then A+ = A⁻¹
and X = A⁻¹b). For Problem #1, X = A+b will be a solution of AX = b, and it will be the "shortest
length" (minimum norm) solution. For Problem #2, X = A+b simultaneously minimizes both
||AX − b||₂ and ||X||₂ even though it does not satisfy AX = b (which has no solution for Problem
#2, where b ∉ R(A)).
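
Before developing the theory, here is a quick MatLab preview (a minimal sketch; the particular
matrices and right-hand sides below are our own illustrations, not part of the development).
In both situations, pinv(A)*b returns the vector described above.

    % Preview (illustrative sketch): pinv handles both problems.
    A  = [1 0 1; 0 1 0];       % rank 2, nontrivial null space
    b1 = [2; 1];               % b1 is in R(A)      -> Problem #1
    X1 = pinv(A)*b1            % minimum norm solution of A*X = b1
    A2 = [1 0; 1 0; 0 1];      % tall matrix
    b2 = [1; 2; 3];            % b2 is not in R(A2) -> Problem #2
    X2 = pinv(A2)*b2           % minimizes norm(A2*X - b2), then norm(X)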
Problem #1: The Minimum Norm Problem
In this section we consider the case where b ∈ R(A). The system

    AX = b                                                                (6-2)

has a unique solution or it has an infinite number of solutions as described by (5-7). If (6-2) has
a unique solution, then this solution is the minimum norm solution, by default. If (6-2) has an
infinite number of solutions, then we must find the solution with the smallest norm. In either case,
the minimum norm solution is unique, and it is characterized as being orthogonal to K(A), as
shown in what follows.
Each solution X of (6-2) can be uniquely decomposed into

    X = X_K⊥ ⊕ X_K,                                                       (6-3)

where

    X_K⊥ ∈ K(A)⊥ = R(A*),    X_K ∈ K(A).                                  (6-4)

However,

    AX = A(X_K⊥ ⊕ X_K) = AX_K⊥ = b,                                       (6-5)

so X_K⊥ ∈ K(A)⊥ is the only part of X that is significant in generating b.
Theorem 6.1
In the decomposition (6-3), there is only one vector X_K⊥ ∈ K(A)⊥ that is common to all
solutions of AX = b. Equivalently, there is a unique vector X_K⊥ ∈ K(A)⊥ for which AX_K⊥ = b.
Proof: To show this, assume there are two, and arrive at a contradiction. We assume the
existence of distinct X_K⊥ ∈ K(A)⊥ and Y_K⊥ ∈ K(A)⊥ such that AX_K⊥ = b and AY_K⊥ = b. Then,
simple subtraction leads to the conclusion that A(X_K⊥ − Y_K⊥) = 0, so the difference X_K⊥ − Y_K⊥
∈ K(A). But the difference also lies in K(A)⊥, and the only vector in both K(A) and K(A)⊥ is the
zero vector. This contradicts the assumption that the two vectors are distinct, so there is only
one X_K⊥ ∈ K(A)⊥ that is common to all solutions of AX = b (each solution has a decomposition
of the form (6-3) where a common X_K⊥ is used).♥
Theorem 6.2
The unique solution X_K⊥ ∈ K(A)⊥ is the minimum norm solution of AX = b. That is,

    ||X_K⊥||₂ < ||X||₂,                                                   (6-6)

where X is any other (i.e., X ≠ X_K⊥) solution of AX = b.
Proof:
Let X, X ≠ X_K⊥, be any solution of AX = b. As shown by (6-3), we can write the decomposition

    X = X_K⊥ ⊕ X_K,                                                       (6-7)

where X_K ∈ K(A) is nonzero. Clearly, we have

    ||X||₂² = ||X_K⊥ ⊕ X_K||₂² = ⟨X_K⊥ ⊕ X_K, X_K⊥ ⊕ X_K⟩
            = ⟨X_K⊥, X_K⊥⟩ + ⟨X_K, X_K⟩    (due to orthogonality of the vectors)       (6-8)
            = ||X_K⊥||₂² + ||X_K||₂²,

so that ||X_K⊥||₂ < ||X||₂ as claimed.♥
When viewed geometrically, Theorem 6.2 is obvious. As shown by Figure 6-1, the
solution set for AX = b is a linear variety, a simple translation of K(A).

[Figure 6-1: The solution set is a linear variety. The minimum norm solution is orthogonal to
K(A). The figure shows the linear variety containing the solutions of AX = b, the subspace K(A),
a non-optimum solution X = X_K⊥ + X_K (where X_K ∈ K(A)), and the optimum (minimum
norm) solution X_K⊥ ∈ K(A)⊥.]

Figure 6-1 shows an arbitrary solution X, and it shows the "optimum", minimum norm, solution
X_K⊥. Clearly, the solution becomes "optimum" when it is exactly orthogonal to K(A) (i.e., it is in
K(A)⊥). We have solved Problem #1.
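
The claim is easy to check numerically. The following sketch (the 2×3 matrix and right-hand side
are our own illustration) uses MatLab's null and pinv functions to confirm that the pinv solution
lies in K(A)⊥ and has the smallest norm among solutions:

    % Check (illustrative): the pinv solution is orthogonal to K(A).
    A = [1 1 2; 0 2 2];              % 2x3, rank 2, nullity 1
    b = [3; 2];                      % b is in R(A) since A has full row rank
    Xmin = pinv(A)*b;                % minimum norm solution
    N = null(A);                     % orthonormal basis for K(A)
    disp(N'*Xmin)                    % approximately zero: Xmin is in K(A)^perp
    Xother = Xmin + 0.7*N(:,1);      % another solution of A*X = b
    disp([norm(Xmin) norm(Xother)])  % Xmin has the smaller 2-norm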
The Pseudo Inverse A+
As shown by Theorems 6.1 and 6.2, when b ∈ R(A), a unique minimum norm solution to
AX = b always exists. And, this solution is in R(A*) = K(A)⊥. Hence we have a mapping from b
∈ R(A) back to X ∈ R(A*) (at this point, it might be a good idea to study Fig. 5.1 once again). It
is not difficult to show that this mapping is linear, one-to-one and onto. We denote this mapping
by

    A+ : R(A) → R(A*),                                                    (6-9)

and we write X = A+b, for b ∈ R(A) and X ∈ R(A*). Finally, our unique minimum norm solution
to the AX = b, b ∈ R(A), problem (i.e., "Problem #1") is denoted symbolically by X_K⊥ = A+b.
As a subspace of U, R(A*) is a vector space in its own right. Every X in vector space
R(A*) ⊂ U gets mapped by matrix A to something in vector space R(A) ⊂ V. By restricting the
domain of A to be just R(A*) (instead of the whole U), we have a mapping with domain R(A*)
and co-domain R(A). Symbolically, we denote this mapping by

    A|R[A*] : R(A*) → R(A),                                               (6-10)

and we call it the restriction of A to R(A*). The mapping (6-10) is linear, one-to-one, and onto
(even though A : U → V may not be one-to-one or onto). More importantly, the inverse of
mapping (6-10) is the mapping (6-9). As is characteristic of an operator and its inverse, we have

    A|R[A*] A+ b = b for all b ∈ R(A),
                                                                          (6-11)
    A+ A|R[A*] X = X for all X ∈ R(A*),

as expected. Finally, the relationship between (6-9) and (6-10) is illustrated by Fig. 6-2.
As defined so far, the domain of A+ is only R(A) ⊂ V. This is adequate for our discussion
of "Problem #1"-type problems where b ∈ R(A). However, for our discussion of "Problem #2"
(and the more general optimization problem, discussed in the paragraph containing (6-1), where b
∈ V), we must extend the domain of A+ to be all of V = R(A) ⊕ K(A*). A+ is already defined on
R(A); we need to define the operator on K(A*). We do this in a very simple manner. We define

    A+b = 0,   b ∈ K(A*).                                                 (6-12)

[Figure 6-2: Restricted to R(A*) = K(A)⊥, A is one-to-one and onto R(A) = K(A*)⊥, and it has
the inverse A+.]

With the extension offered by (6-12), we have the entire vector space V as the domain of A+. The
operations performed by A and A+ are illustrated by Fig. 6-3. When defined in this manner, A+ is
called the Moore-Penrose pseudo inverse of A (or simply the pseudo inverse).
A number of important observations can be made from inspection of Fig. 6-3. Note that

    (1) AA+X = 0 for X ∈ K(A*) = R(A)⊥,
    (2) AA+X = X for X ∈ K(A*)⊥ = R(A),
    (3) A+AX = 0 for X ∈ K(A) = R(A*)⊥,                                   (6-13)
    (4) A+AX = X for X ∈ R(A*) = K(A)⊥, and
    (5) R(A*) = R(A+) and K(A*) = K(A+) (however, A* ≠ A+ in general).

[Figure 6-3: How A and A+ map the various parts of U and V. In space U, the subspaces
R(A*) = K(A)⊥ and K(A) = R(A*)⊥ are orthogonal complements; in space V, the subspaces
R(A) = K(A*)⊥ and K(A*) = R(A)⊥ are orthogonal complements. When restricted to R(A*), A is
one-to-one from R(A*) onto R(A), and A+ is one-to-one from R(A) onto R(A*). A maps K(A)
to 0, and A+ maps K(A*) to 0.]

Note that (1) and (2) of (6-13) imply that AA+ is the orthogonal projection of V onto R(A). Also,
(3) and (4) of (6-13) imply that A+A is the orthogonal projection of U onto R(A*) = R(A+).
Finally, when A⁻¹ exists we have A+ = A⁻¹.
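
These two projection properties are easy to confirm numerically. The sketch below (it reuses the
rank-deficient 4×3 matrix that appears in the example later in this chapter) checks that A·pinv(A)
and pinv(A)·A are idempotent and Hermitian, and that A·pinv(A) acts as the identity on R(A):

    % Check (illustrative): A*pinv(A) and pinv(A)*A are orthogonal projections.
    A  = [1 1 2; 0 2 2; 1 0 1; 1 0 1];   % 4x3 with rank 2
    Ap = pinv(A);
    P1 = A*Ap;                           % should project V = R^4 onto R(A)
    P2 = Ap*A;                           % should project U = R^3 onto R(A*)
    disp(norm(P1*P1 - P1))               % idempotent (approximately 0)
    disp(norm(P1' - P1))                 % Hermitian  (approximately 0)
    disp(norm(P1*A - A))                 % P1 acts as the identity on R(A)
    disp(norm(P2*P2 - P2))               % same checks for P2
    disp(norm(P2' - P2))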
Original Moore and Penrose Definitions of the Pseudo Inverse
In 1935, Moore defined A+ as the unique n×m matrix for which

    1) AA+ is the orthogonal projection of V onto R(A) = K(A*)⊥, and
                                                                          (6-14)
    2) A+A is the orthogonal projection of U onto R(A*) = K(A)⊥.

In 1955, Penrose gave a purely algebraic definition of the pseudo inverse. He said that A+
is the unique n×m matrix satisfying

    3) AA+A = A,
    4) A+AA+ = A+,                                                        (6-15)
    5) AA+ and A+A are Hermitian.

Of course Moore's and Penrose's definitions are equivalent, and they agree with the geometric
description we gave in the previous subsection.
While not very inspiring, Penrose's algebraic definition makes it simple to check if a given
matrix is A+ . Once we compute a candidate for A+, we can verify our results by checking if our
candidate satisfies Penrose's conditions. In addition, Penrose's definition leads to some simple and
useful results as shown by the next few theorems.
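
For example, the following short script (an illustrative sketch; the test matrix is arbitrary)
checks a candidate produced by pinv against conditions 3), 4) and 5):

    % Sketch: check a candidate pseudo inverse against Penrose's conditions (6-15).
    A = [1 0 1; 0 1 0];
    C = pinv(A);                                  % candidate for A+
    e3 = norm(A*C*A - A);                         % condition 3)
    e4 = norm(C*A*C - C);                         % condition 4)
    e5 = norm((A*C)' - A*C) + norm((C*A)' - C*A); % condition 5)
    disp([e3 e4 e5])                              % all approximately zero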
Theorem 6.3
    (A+)+ = A++ = A                                                       (6-16)
Proof: The result follows by direct substitution into (6-15).
Theorem 6.4

    (A*)+ = (A+)*  or  A*+ = A+*.                                         (6-17)

That is, the order of the + and * is not important!
Proof: We claim that the pseudo inverse of A* is A+*. Let's use (6-15) to check this out!
3′) A*A+*A* = A*(AA+)* = ((AA+)A)* = (AA+A)* = A* so that 3) holds!
4′) A+*A*A+* = A+*(A+A)* = ((A+A)A+)* = (A+AA+)* = A+* so that 4) holds!
5′) A+*A* = (AA+)* = ((AA+)*)* = (A+*A*)* so A+*A* is Hermitian! So is A*A+*!!
Theorem 6.5
Suppose A is Hermitian. Then A+ is Hermitian.
Proof: (A+)* = (A*)+ = A+.

Given matrices A and B, one might ask if (AB)+ = B+A+ in general. The answer is NO! It
is not difficult to find some matrices that illustrate this. Hence, a well known and venerable
property of inverses (i.e., that (AB)⁻¹ = B⁻¹A⁻¹) does not carry over to pseudo inverses.
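
One concrete illustration (our own choice of matrices, not taken from the notes) is A = [1 0] and
B = [1; 1]: here AB = 1, so (AB)+ = 1, while B+A+ = 1/2. In MatLab:

    % Counterexample sketch: (A*B)+ is not B+*A+ in general.
    A = [1 0];                   % 1x2
    B = [1; 1];                  % 2x1
    disp(pinv(A*B))              % A*B = 1, so (A*B)+ = 1
    disp(pinv(B)*pinv(A))        % = [0.5 0.5]*[1; 0] = 0.5, which differs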
Computing the Pseudo Inverse A+
There are three cases that need to be discussed when one tries to find A+. The first case is
when rank(A) < min(m,n) so that A does not have full rank. The second case is when m×n matrix
A has full row rank (rank(A) = m). The third case is when A has full column rank (rank(A) = n).
Case #1: r = Rank(A) < Min(m,n)
In this case, K(A) contains non-zero vectors, and (when b ∈ R(A)) the system AX = b has
an infinite number of solutions. For this case, a simple "plug-in-the-variables" formula for A+
does not exist. However, the pseudo inverse can be calculated by using the basic ideas illustrated
by Fig. 6-3.
Suppose m×n matrix A has rank r. Let X1, X2, ... , Xr be a basis of R(A*) and Y1, Y2, ... ,
Ym−r be a basis of K(A*). From (6-11) and Fig. 6-3, we see

    A+ [AX1 ⋯ AXr  Y1 ⋯ Ym−r] = [A+AX1 ⋯ A+AXr  A+Y1 ⋯ A+Ym−r]
                              = [X1 ⋯ Xr  0 ⋯ 0].                         (6-18)

Now, note that AX1, ..., AXr is a basis for R(A). Since Y1, Y2, ... , Ym−r is a basis of K(A*) =
R(A)⊥, we see that the m×m matrix [AX1 ⋯ AXr  Y1 ⋯ Ym−r] is nonsingular. From (6-18)
we have

    A+ = [X1 ⋯ Xr  0 ⋯ 0] [AX1 ⋯ AXr  Y1 ⋯ Ym−r]⁻¹,                       (6-19)

where the first factor contains m − r zero columns. While not simple, Equation (6-19) is "useable"
in many cases, especially when m and n are small integers.
Example

% Pseudo Inverse Example
% Enter 4x3 matrix A.
A = [1 1 2; 0 2 2; 1 0 1; 1 0 1]

A =
     1     1     2
     0     2     2
     1     0     1
     1     0     1

% Have MatLab Calculate an Orthonormal Basis of R(A*)
X = orth(A')

X =
    0.3052   -0.7573
    0.5033    0.6430
    0.8084   -0.1144

% Have MatLab Calculate an Orthonormal Basis of K(A*)
Y = null(A')

Y =
    0.7559         0
   -0.3780    0.0000
   -0.3780   -0.7071
   -0.3780    0.7071

% Use (6-19) of class notes to calculate the pseudo inverse
pseudo = [X [0 0 0]' [0 0 0]']*inv([A*X Y])

pseudo =
    0.1429   -0.2381    0.2619    0.2619
    0.0000    0.3333   -0.1667   -0.1667
    0.1429    0.0952    0.0952    0.0952

% Same as that produced by MatLab's pinv function?
pinv(A)

ans =
    0.1429   -0.2381    0.2619    0.2619
    0.0000    0.3333   -0.1667   -0.1667
    0.1429    0.0952    0.0952    0.0952

% YES! YES!
Case #2: r = Rank(A) = m (A has full row rank)
For this case, we have m ≤ n, the columns of A may or may not be dependent, and AX = b
may or may not have an infinite number of solutions. For this case, the n×m matrix A* has full
column rank and (6-19) becomes

    A+ = A*(AA*)⁻¹,                                                       (6-20)

a simple formula.
Case #3: Rank(A) = n (A has full column rank)
For this case, we have n ≤ m, and the columns of A are independent. For this case, the n×n
matrix A*A is nonsingular (why?). On the left, we multiply the equation AX = b by A* to obtain
A*AX = A*b, and this leads to

    X_K⊥ = (A*A)⁻¹A*b.                                                    (6-21)

When A has full column rank, the pseudo inverse is

    A+ = (A*A)⁻¹A*,                                                       (6-22)

a simple formula.
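
Both simple formulas are easy to check against pinv. The sketch below (the tall matrix is our own
illustration) verifies (6-22) for a full-column-rank matrix and (6-20) for its transpose, which has
full row rank:

    % Sketch: full-rank formulas (6-20) and (6-22) versus pinv.
    A = [1 0; 1 1; 1 2; 1 3];          % 4x2, rank 2 (full column rank)
    disp(norm(inv(A'*A)*A' - pinv(A))) % equation (6-22), approximately zero
    B = A';                            % 2x4, full row rank
    disp(norm(B'*inv(B*B') - pinv(B))) % equation (6-20), approximately zero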
Example 6-1
Let

    A = [1 0 1; 0 1 0],    b = [2, 1]^T.

Note that rank(A) = 2 and nullity(A) = 1. By inspection, the general solution is

    X = [2, 1, 0]^T + α[1, 0, −1]^T.
Now, find the minimum norm solution. We do this two ways. Since this problem is so simple,
let's compute

    ||X||₂² = (2 + α)² + 1 + α².

The minimum value of this is found by computing

    d/dα ||X||₂² = 2(2 + α) + 2α = 0,

which yields α = −1. Hence, the minimum norm solution is

    X_K⊥ = [2, 1, 0]^T − (1)[1, 0, −1]^T = [1, 1, 1]^T,                    (6-23)

a vector that is orthogonal to K(A) as required (please check orthogonality). This same result can
be computed by using (6-20) since, as outlined above, special case #2 applies here. First, we
compute

    AA* = [2 0; 0 1],    (AA*)⁻¹ = [1/2 0; 0 1].

Then, we use these results with (6-20) to obtain

    X_K⊥ = A+b = A*(AA*)⁻¹b = [1 0; 0 1; 1 0] [1/2 0; 0 1] [2, 1]^T
         = [1 0; 0 1; 1 0] [1, 1]^T = [1, 1, 1]^T,

the same result as (6-23)!
MatLab's Backslash ( \ ) and Pinv Functions
A unique solution of AX = b exists if b ∈ R(A) and nullity(A) = 0. It can be found by
using MatLab's backslash ( \ ) function. The syntax is X = A\b; MatLab calls its backslash
operation matrix left division.
Now, let r = rank(A) and nullity(A) = n − r ≥ 1. If b ∈ R(A), the complete solution can be
described in terms of n − r independent parameters, as shown by (5-7). These independent
parameters can be chosen to yield a solution X that contains at least n − r zero components. Such
a solution is generated by the MatLab syntax X = A\b. So, if a unique solution exists, MatLab's
backslash operator finds it; otherwise, the backslash operator finds a solution that contains the
maximum number of zeros (in general, this maximum-number-of-zeros solution will not be X_K⊥,
the minimum norm solution).
MatLab can also find the optimum, minimum norm, solution X_K⊥. The MatLab syntax
for this is X = pinv(A)*b. If a unique solution to AX = b exists, then you can find it using
this syntax (but, for this case, the backslash operator is a better method). If AX = b has an infinite
number of solutions, then X = pinv(A)*b finds the optimum, minimum norm solution.
MatLab calculates the pseudo inverse by using the singular value decomposition of A, a method
that we will discuss in Chapter 8.
Example 6-2 (Continuation of Example 6-1)
We apply MatLab to the problem given by Example 6-1. The MatLab diary file follows.

% use the same A and b as was used in Example 6-1
A = [1 0 1;0 1 0];
b = [2;1];
% find the solution with the maximum number of zero components
X = A\b
X =
     2
     1
     0
norm(X)
ans = 2.2361
%
% find the optimum, minimum norm, solution
Y = pinv(A)*b
Y =
    1.0000
    1.0000
    1.0000
% solution is the same as that found in Example 6-1!
norm(Y)
ans = 1.7321
% the "minimum norm" solution has a smaller norm
% than the "maximum-number-of-zeros" solution.
Problem #2: The Minimum Norm, Least-Square-Error Problem (Or: What Can We Do
With Inconsistent Linear Equations?)
There are many practical problems that result, in one way or another, in an inconsistent
system AX = b of equations (i.e., b ∉ R(A)). A solution does not exist in this case. However, we
can "least-squares fit" a vector X to an inconsistent system. That is, we can consider the least-
squares error problem of calculating an X that minimizes the error ||AX − b||₂ (equivalent to
minimizing the square error ||AX − b||₂²). As should be obvious (or, with just a little thought!),
such an X is not always unique. Hence, we find the minimum norm vector that minimizes the
error ||AX − b||₂ (often, the error vector AX − b is called the residual). As it turns out, this
minimum norm solution to the least-squares error problem is unique, and it is denoted as X = A+b,
where A+ is the pseudo inverse of A. Furthermore, with MatLab's pinv function, we can solve
this minimum-norm, least-square-error problem (as it is called in the literature). However, before
we jump too far into this problem, we must discuss orthogonal projections and projection
operators.
The Orthogonal Projection of One Vector on Another
Let's start out with a simple, one dimensional, projection problem. Suppose we have two
vectors, X1 and X2, and we want to find the "orthogonal projection", denoted here as Xp, of
vector X2 onto vector X1. As depicted by Figure 6-4, the "orthogonal projection" is co-linear
with X1. And, the error vector X2 − Xp is orthogonal to X1.

[Figure 6-4: Orthogonal projection Xp of X2 onto X1; the error X2 − Xp is orthogonal to X1.]

We can find a formula for Xp. Since Xp is co-linear with X1, write Xp = αX1. Since the error
must be orthogonal to X1, we can write

    0 = ⟨X2 − Xp, X1⟩ = ⟨X2 − αX1, X1⟩ = ⟨X2, X1⟩ − α⟨X1, X1⟩,             (6-24)

which leads to

    α = ⟨X2, X1⟩ / ||X1||₂²                                               (6-25)

and

    Xp = (⟨X2, X1⟩ / ||X1||₂²) X1 = ⟨X2, X1/||X1||₂⟩ (X1/||X1||₂).         (6-26)

Hence, Xp is the component of X2 in the X1 direction; X2 − Xp is the component of X2 in a
direction that is perpendicular to X1. In the next section, these ideas are generalized to the
projection of a vector on an arbitrary subspace.
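
As a small numerical illustration of (6-25) and (6-26) (the two vectors are our own choices):

    % Sketch: project X2 onto X1 using (6-25) and (6-26).
    X1 = [1; 2; 2];
    X2 = [3; 0; 4];
    alpha = (X2'*X1)/(X1'*X1);       % equation (6-25)
    Xp = alpha*X1;                   % equation (6-26)
    disp((X2 - Xp)'*X1)              % error is orthogonal to X1 (about 0)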
Orthogonal Projections
Let W be a subspace of n-dimensional vector space U (the dimensionality of W is not
important in this discussion). An n×n matrix P is said to represent an orthogonal projection
operator on W (more simply, P is said to be an orthogonal projection on W) if

    a) R(P) = W,
    b) P² = P (i.e., P is idempotent),                                    (6-27)
    c) P* = P (i.e., P is Hermitian).

We can obtain some simple results from consideration of definition (6-27). First, from a)
we have PX ∈ W for all X ∈ U. Secondly, from a) and b) we have PX = X for all X ∈ W. To
see this, note that for X ∈ W = R(P) we have P(PX − X) = P²X − PX = PX − PX = 0 so that
PX − X ∈ K(P). However, PX − X ∈ R(P) = W. The only vector that is in K(P) and R(P)
simultaneously is 0. Hence, PX = X for each and every X ∈ W. The third observation from
(6-27) is that a), b) and c) tell us that (I − P)X ∈ W⊥ for each X ∈ U. To see this, consider any Y
∈ W = R(P). Then, Y = PX2 for some X2 ∈ U. Now, take any X1 ∈ U and write

    ⟨(I − P)X1, Y⟩ = ⟨(I − P)X1, PX2⟩ = ⟨P*(I − P)X1, X2⟩ = ⟨P(I − P)X1, X2⟩
                   = ⟨PX1 − P²X1, X2⟩ = ⟨PX1 − PX1, X2⟩                   (6-28)
                   = 0.

Hence, for all X1 ∈ U, we have (I − P)X1 orthogonal to W. Hence, X is decomposed into a part
PX ∈ W and a part (I − P)X ∈ W⊥. Figure 6-5 depicts this important decomposition. The fourth
observation from (6-27) is

    X ∈ W⊥ ⇒ PX = 0.                                                      (6-29)
[Figure 6-5: Orthogonal projection of X onto a subspace W. X is decomposed into PX ∈ W and
(I − P)X ∈ W⊥.]

This is obvious; it follows from the facts that

    PX ∈ W,    (I − P)X = X − PX ∈ W⊥,                                    (6-30)

and if X ∈ W⊥, the only way (6-30) can be true is to have PX = 0. The fifth
observation is that P plays a role in the unique direct sum decomposition of any X ∈ U. Recall
that U = W ⊕ W⊥, and any X ∈ U has the unique decomposition X = X_W ⊕ X_W⊥, where X_W
∈ W and X_W⊥ ∈ W⊥. Note that PX = X_W and (I − P)X = X_W⊥. Hence, if P is the orthogonal
projection operator on W, then (I − P) is the orthogonal projection operator on W⊥.
PX is the Optimal Approximation of X
Of all vectors in W, PX is the "optimal" approximation of X ∈ U. Consider any Xδ ∈ W
and any X ∈ U, and note that Z ≡ PX + Xδ ∈ W (think of Xδ ∈ W as being a perturbation of PX
∈ W). We show that ||X − PX||₂ ≤ ||X − {PX + Xδ}||₂ for all X ∈ U and Xδ ∈ W. Consider

    ||X − {PX + Xδ}||₂² = ⟨(X − PX) − Xδ, (X − PX) − Xδ⟩
                        = ⟨X − PX, X − PX⟩ − ⟨Xδ, X − PX⟩ − ⟨X − PX, Xδ⟩ + ⟨Xδ, Xδ⟩.   (6-31)

However, the cross terms ⟨Xδ, X − PX⟩ and ⟨X − PX, Xδ⟩ are zero (X − PX = (I − P)X ∈ W⊥
while Xδ ∈ W) so that

    ||X − {PX + Xδ}||₂² = ||X − PX||₂² + ||Xδ||₂².                        (6-32)

From this, we conclude that

    ||X − PX||₂² ≤ ||X − {PX + Xδ}||₂² = ||X − PX||₂² + ||Xδ||₂²          (6-33)

for all Xδ ∈ W and X ∈ U. Of all the vectors in subspace W, PX ∈ W comes "closest" (in the
2-norm sense) to X ∈ U.
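
A short numerical illustration (using the trivially constructed projection onto the x-y plane of R³;
the numbers are our own):

    % Sketch: P*X is the closest point of W to X.
    P = diag([1 1 0]);                 % orthogonal projection of R^3 onto the x-y plane W
    X = [3; -1; 4];
    Xdelta = [0.5; 2; 0];              % a perturbation that lies in W
    disp(norm(X - P*X))                % distance from X to P*X ...
    disp(norm(X - (P*X + Xdelta)))     % ... is smaller than to any other point of W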
Projection Operator on W is Unique
The orthogonal projection on a subspace is unique. Assume, for the moment, that there
are n×n matrices P1 and P2 with the properties given by (6-27). Use these to write

    ||(P1 − P2)X||₂² = X*(P1 − P2)*(P1 − P2)X = (P1X)*(P1 − P2)X − (P2X)*(P1 − P2)X
                     = (P1X)*({P1 − I} − {P2 − I})X − (P2X)*({P1 − I} − {P2 − I})X.    (6-34)

However, both P1X and P2X are orthogonal to both (P1 − I)X and (P2 − I)X. This leads to the
conclusion that

    ||(P1 − P2)X||₂ = 0                                                   (6-35)

for all X ∈ U, a result that requires P1 = P2. Hence, given subspace W, the orthogonal projection
operator on W is unique, as originally claimed.
How to "Make" an Orthogonal Projection Operator
Let v1, v2, ... , vk be n×1 orthonormal vectors that span k-dimensional subspace W of
n-dimensional vector space U. Use these vectors as columns in the n×k matrix

    Q ≡ [v1 v2 ⋯ vk].                                                     (6-36)

We claim that the n×n matrix

    P = QQ*                                                               (6-37)

is the unique orthogonal projection onto subspace W. To show this, note that for any vector X ∈
U we have

    PX = QQ*X = [v1 v2 ⋯ vk] [v1*; v2*; ⋯; vk*] X = [v1 v2 ⋯ vk] [v1*X; v2*X; ⋯; vk*X]
       = (v1*X)v1 + (v2*X)v2 + ⋯ + (vk*X)vk.                              (6-38)

Now, PX is a linear combination of the orthonormal basis v1, v2, ... , vk. As X ranges over all of
U, the vector PX ranges over all of W (since Pvj = vj, 1 ≤ j ≤ k, and v1, ... , vk span W). As a
result, R(P) = W; this is the first requirement of (6-27). The second requirement of (6-27) is met
by realizing that

    P² = (QQ*)(QQ*) = Q I Q* = P,                                         (6-39)

since Q*Q is a k×k identity matrix I. Finally, the third requirement of (6-27) is obvious, and P =
QQ* is the unique orthogonal projection on the subspace W.
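
The recipe is easy to try in MatLab. In the sketch below (the spanning vectors are our own
illustration), orth supplies the orthonormal columns of Q, and the three requirements of (6-27)
are checked numerically:

    % Sketch: build P = Q*Q' from an orthonormal basis and verify (6-27).
    V = [1 0; 1 1; 0 2; 2 1];          % columns span a 2-D subspace W of R^4
    Q = orth(V);                       % orthonormal basis for W
    P = Q*Q';                          % claimed orthogonal projection onto W
    disp(norm(P*P - P))                % b) idempotent
    disp(norm(P' - P))                 % c) Hermitian
    disp(norm(P*V - V))                % a) P acts as the identity on W, so R(P) = W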
Special Case: One Dimensional Subspace W
Consider the one dimensional case W = span(X1), where X1 has unit length. Then (6-37)
gives us the n×n matrix P = X1X1*. Let X2 ∉ W. Then the orthogonal projection of X2 onto W is

    Xp = PX2 = (X1X1*)X2 = X1(X1*X2) = (X1*X2)X1 = ⟨X2, X1⟩X1,            (6-40)

where we have used the fact that X1*X2 is a scalar. In light of the fact that X1 has unit length,
(6-40) is the same as (6-26). Now that we are knowledgeable about projection operators, we can
handle the important AX = b, b ∉ R(A), problem.
The Minimum-Norm, Least-Squares-Error Problem
At last, we are in a position to solve yet another colossal problem in linear algebraic
equation theory. Namely, what to do with the AX = b problem when b ∉ R(A) so that no exact
solution(s) exist.
This type of problem occurs in applications where there are more equations (constraints)
than there are unknowns to be solved for. In this case, the problem is said to be overdetermined.
Such a problem is characterized by a matrix equation AX = b, where A is m×n with m > n, and b
is m×1. Clearly, the rows of A are dependent. And, b may have been obtained by making
imprecise measurements, so that rank(A) ≠ rank([A ¦ b]), and no solution exists. Thus, not
knowing what is important in a model (i.e., which constraints are the most important and which
can be thrown out), and an inability to precisely measure input b, can lead to an inconsistent set of
equations.
A "fix" for this problem is to throw out constraints (individual equations) until the
modified system has a solution. However, as discussed above, it is not always clear how to
accomplish this when all of the m constraints appear to be equally valid (ignorance keeps us from
obtaining a "better" model).
Instead of throwing out constraints, it is sometimes better to do the best with what you
have. Often, the best approach is to find an X that satisfies

    ||AX − b||₂ = min_{Z ∈ U} ||AZ − b||₂.                                (6-41)

That is, we try to get AX as close as possible to b. The problem of finding X that minimizes
||AX − b||₂ is known as the least squares problem (since the 2-norm is used).
A solution to the least squares problem always exists. However, it is not unique, in
general. To see this, let X minimize the norm of the residual; that is, let X satisfy (6-41) so that it
is a solution to the least squares problem. Then, add to X any vector in K(A) to get yet another
vector that minimizes the norm of the residual.
It is common to add structure to the least squares problem in order to force a unique
solution. We choose the smallest norm X that minimizes ||AX − b||₂. That is, we want the X that
1) minimizes ||AX − b||₂, and 2) minimizes ||X||₂. This problem is called the minimum-norm,
least-squares problem. For all coordinate vectors b in V, it is guaranteed to have a unique
solution (when b ∈ R(A), we have a "Problem #1"-type problem - see the discussion at the
beginning of this chapter). And, as shown below, the mapping from b in V back to the coordinate
vector X in U is the pseudo inverse of A. As before, we write X = A+b for each b in V. The
pseudo inverse solves the general optimization problem that is discussed in the paragraph
containing Equation (6-1).
To verify that X = A+b is the smallest norm X that minimizes ||AX − b||₂, let's go about
the process of minimizing the norm of the residual AX − b. First, recall from Fig. 5-1 that U =
R(A*) ⊕ K(A) and V = R(A) ⊕ K(A*). Hence, X ∈ U and b ∈ V have the unique decompositions

    X = X_R[A*] ⊕ X_K[A],    X_R[A*] ∈ R(A*),  X_K[A] ∈ K(A),
                                                                          (6-42)
    b = b_R[A] ⊕ b_K[A*],    b_R[A] ∈ R(A),    b_K[A*] ∈ K(A*).
Now, we compute

    ||AX − b||₂² = ||A{X_R[A*] ⊕ X_K[A]} − {b_R[A] ⊕ b_K[A*]}||₂²
                 = ||{AX_R[A*] − b_R[A]} − b_K[A*]||₂²                    (6-43)
                 = ||AX_R[A*] − b_R[A]||₂² + ||b_K[A*]||₂².

The second line of this result follows from the fact AX_K[A] = 0. The third line follows from the
orthogonality of {AX_R[A*] − b_R[A]} and b_K[A*]. In minimizing (6-43), we have no control
over b_K[A*]. However, how to choose X should be obvious. Select X = X_R[A*] ∈ R(A*) =
K(A)⊥, which satisfies AX_R[A*] = b_R[A]. By doing this, we will minimize (6-43) and ||X||₂
simultaneously. However, this optimum X is given by

    X_R[A*] = A+ b_R[A] = A+ (b_R[A] ⊕ b_K[A*]) = A+b.                    (6-44)

From this, we conclude that the pseudo inverse solves the general optimization problem
introduced by the paragraph containing Equation (6-1). Finally, when you use X = A+b, the norm
of the residual becomes ||b_K[A*]||₂; folks, it just doesn't get any smaller than this!
Let's give a geometric interpretation of our result. Given b ∈ V, the orthogonal
projection of b on the range of A is b_R[A]. Hence, when solving the general minimum norm,
least-square error problem, we first project b on R(A) to obtain b_R[A]. Then we solve AX =
b_R[A] for its minimum norm solution, a "Problem #1"-type problem.
Folks, this stuff is worth repeating and illustrating with a graphic! Again, to solve the
general problem outlined in the paragraph containing (6-1), we
1) Orthogonally project b onto R(A) to get b_R[A],
2) Solve AX = b_R[A] for X ∈ R(A*) = K(A)⊥ (a "Problem #1" exercise).
[Figure 6-6: a) b_R[A] is the orthogonal projection of b onto R(A); the residual b − b_R[A] is
orthogonal to R(A). b) Find the minimum norm solution to AX = b_R[A] (this is a "Problem #1"-
type problem): of the solutions in the linear variety (a translate of K(A)), X ∈ K(A)⊥ is optimum
and X′ is non-optimum.]

This two-step procedure produces an optimum X that minimizes ||AX − b||₂ and ||X||₂
simultaneously! It is illustrated by Fig. 6-6.
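
The two-step procedure can be checked against MatLab's pinv directly. In the sketch below (the
inconsistent system is our own illustration), projecting b onto R(A) first and then solving the
resulting "Problem #1" system gives the same answer as pinv(A)*b in one step:

    % Sketch: two-step procedure versus pinv(A)*b.
    A = [1 0; 1 0; 0 1];               % b below is not in R(A)
    b = [1; 2; 3];
    P_RA = A*pinv(A);                  % orthogonal projection of V onto R(A)
    b_RA = P_RA*b;                     % step 1: project b onto R(A)
    X1 = pinv(A)*b_RA;                 % step 2: minimum norm solution of A*X = b_RA
    X2 = pinv(A)*b;                    % one-step answer
    disp(norm(X1 - X2))                % approximately zero
    disp(norm(A*X2 - b))               % residual norm = norm of b's K(A*) component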
Application of the Pseudo Inverse: Least Squares Curve Fitting
We want to pass a straight line through a set of n data points. We want to do this in such
a manner that minimizes the squared error between the line and the data. Suppose we have the
data points (xk, yk), 1 ≤ k ≤ n. As illustrated by Figure 6-7, we desire to fit the straight line

    y = a1x + a0                                                          (6-45)

to the data. We want to find the coefficients a0 and a1 that minimize

    Error² = Σ_{k=1}^{n} [yk − (a1xk + a0)]².                             (6-46)

[Figure 6-7: Least-squares fit a line y = a1x + a0 to a data set; x denotes a supplied data point.]
This problem can be formulated as an overdetermined linear system, and it can be solved
by using the pseudo inverse operator. On the straight line, define ỹk, 1 ≤ k ≤ n, to be the ordinate
values that correspond to the xk, 1 ≤ k ≤ n. That is, define ỹk = a1xk + a0, 1 ≤ k ≤ n, as the line
points. Now, we adopt the vector notation

    YL ≡ [ỹ1 ỹ2 ⋯ ỹn]^T,    Yd ≡ [y1 y2 ⋯ yn]^T,                          (6-47)

where YL denotes "line values", and Yd denotes "y-data". Now, denote the "x-data" n×2 matrix
as

         | 1   x1 |
    Xd = | 1   x2 |.                                                      (6-48)
         | ⋮   ⋮  |
         | 1   xn |

Finally, the "line equation" is

    Xd [a0; a1] = YL,                                                     (6-49)
which defines a straight line with slope a1. Our goal is to select [a0 a1]^T to minimize

    ||YL − Yd||₂² = ||Xd [a0; a1] − Yd||₂².                               (6-50)

Unless all of the data lie on the line, the system

    Xd [a0; a1] = Yd                                                      (6-51)

is inconsistent, so we want a "least squares fit" to minimize (6-50). From our previous work, we
know the answer is

    [a0; a1] = Xd+ Yd,                                                    (6-52)

where Xd+ is the pseudo inverse of Xd. Using (6-22), we write

    [a0; a1] = (Xd^T Xd)⁻¹ Xd^T Yd.                                       (6-53)

Xd is n×2 with rank 2. Hence Xd^T Xd is a 2×2 positive definite symmetric (and nonsingular)
matrix.
Example
Consider the 5 data points

    k    xk    yk
    1    0     0
    2    1     1.4
    3    2     2.2
    4    3     3.5
    5    5     4.4

so that

         | 1   0 |           | 0   |
         | 1   1 |           | 1.4 |
    Xd = | 1   2 |,     Yd = | 2.2 |.
         | 1   3 |           | 3.5 |
         | 1   5 |           | 4.4 |

MatLab yields the pseudo inverse

    Xd+ = |  .5270   .3784   .2297   .0811  −.2162 |
          | −.1486  −.0811  −.0135   .0541   .1892 |

so that the coefficients a0 and a1 are

    [a0; a1] = Xd+ Yd = [.3676; .8784].
The following MatLab program plots the points and the straight line approximation.

% EX7_3.M Least-squares curve fit with a line using
% \ operator and polyfit. The results are displayed
% and plotted.
x=[0 1 2 3 5];              % Define the data points
y=[0 1.4 2.2 3.5 4.4];
A1=[1 1 1 1 1]';            % Least squares matrix
A=[A1 x'];
Als=A'*A;
bls=A'*y';
% Compute least squares fit
Xlsq1=Als\bls;
Xlsq2=polyfit(x,y,1);
f1=polyval(Xlsq2,x);
error=y-f1;
disp('     x        y        f1      y-f1')
table=[x' y' f1' error'];
disp(table)
fprintf('Strike a key for the plot\n')
pause
% Plot
clf
plot(x,y,'xr',x,f1,'-')
axis([-1 6 -1 6])
title('Least Squares line Figure 7.3')
xlabel('x')
ylabel('y')

[Plot: "Least Squares line Figure 7.3" - the data points (marked x) and the least-squares line,
plotted on x and y axes that both run from -1 to 6.]

The output of the program follows.

         x         y        f1      y-f1
         0         0    0.3676   -0.3676
    1.0000    1.4000    1.2459    0.1541
    2.0000    2.2000    2.1243    0.0757
    3.0000    3.5000    3.0027    0.4973
    5.0000    4.4000    4.7595   -0.3595

Strike a key for the plot
N-Port Resistive Networks: Application of Pseudo Inverse
Consider the n-port electrical network of Figure 6-8. We treat the n-port as a black box that is
characterized by its terminal behavior.

[Figure 6-8: n-port network described by terminal behavior: a black box n-port resistive network
with port voltages V1, V2, ..., Vn and port currents I1, I2, ..., In.]

Suppose there are only resistors and linear dependent sources in the black box. Then we can
characterize the network using complex-valued voltages and currents (i.e., "phasors") by a set of
equations of the form

    V1 = z11 I1 + z12 I2 + ⋯ + z1n In
    V2 = z21 I1 + z22 I2 + ⋯ + z2n In
     ⋮                                                                    (6-54)
    Vn = zn1 I1 + zn2 I2 + ⋯ + znn In.

This is equivalent to writing V = ZI, where V = [V1 V2 ... Vn]^T, I = [I1 I2 ... In]^T and

        | z11  z12  ⋯  z1n |
    Z = | z21  z22  ⋯  z2n |                                              (6-55)
        |  ⋮    ⋮        ⋮  |
        | zn1  zn2  ⋯  znn |

is the n×n impedance matrix. The zik, 1 ≤ i,k ≤ n, are known as open circuited impedance
parameters (they are real-valued since we are dealing with resistive networks).
The network is said to be reciprocal if input port i can be interchanged with output port j
and the relationship between port voltages and currents remains unchanged. That is, the network
is reciprocal if Z is Hermitian (Z* = Z^T since the matrix is real-valued). Also, for a reciprocal
resistive n-port, it is possible to show that Hermitian Z is positive semi-definite.
Suppose we connect two reciprocal n-ports in parallel to form a new reciprocal n-port.
Given impedance matrices Z1 and Z2 for the two n-ports, we want to find the impedance matrix
Zequ of the parallel combination. Before doing this, we must give some preliminary material.
Theorem 6-6
Given n×n impedance matrices Z1 and Z2 of two resistive, reciprocal n-port networks. Then

    R(Z1) + R(Z2) = R(Z1 + Z2).                                           (6-56)

Proof: As discussed above, we know that Z1 and Z2 are Hermitian and non-negative definite. We
prove this theorem by showing
1) R(Z1 + Z2) ⊂ R(Z1) + R(Z2), and
2) R(Z1) ⊂ R(Z1 + Z2) and R(Z2) ⊂ R(Z1 + Z2) so that R(Z1) + R(Z2) ⊂ R(Z1 + Z2).
We show #1. Consider Y ∈ R(Z1 + Z2). Then there exists an X such that Y = (Z1 + Z2)X = Z1X
+ Z2X. But Z1X ∈ R(Z1) and Z2X ∈ R(Z2). Hence, Y ∈ R(Z1) + R(Z2), and this implies that
R(Z1 + Z2) ⊂ R(Z1) + R(Z2).
We show #2. Let Y ∈ K(Z1 + Z2) so that (Z1 + Z2)Y = 0. Then

    0 = ⟨Y, (Z1 + Z2)Y⟩ = ⟨(Z1 + Z2)Y, Y⟩ = ⟨Z1Y, Y⟩ + ⟨Z2Y, Y⟩.           (6-57)

Now, Z1 and Z2 are non-negative definite so that both inner products on the right-hand side of
(6-57) must be non-negative. Hence, we must have

    ⟨Z1Y, Y⟩ = ⟨Z2Y, Y⟩ = 0.                                              (6-58)

Since Z1 and Z2 are Hermitian and non-negative definite, Equation (6-58) can be true if and only
if Z1Y = 0 and Z2Y = 0; that is, Y ∈ K(Z1) and Y ∈ K(Z2). Hence, we have shown that

    K(Z1 + Z2) ⊂ K(Z1),  K(Z1 + Z2) ⊂ K(Z2),
                                                                          (6-59)
    R(Z1) ⊂ R(Z1 + Z2),  R(Z2) ⊂ R(Z1 + Z2),

where the second line follows from the first by taking orthogonal complements and using
R(Z) = K(Z)⊥ for Hermitian Z. Finally, the second set relationship in (6-59) implies R(Z1) +
R(Z2) ⊂ R(Z1 + Z2), and the theorem is proved.♥
Parallel Sum of Matrices
Let N1 and N2 be reciprocal n-port resistive networks that are described by n×n matrices
Z1 and Z2, respectively. The parallel sum of Z1 and Z2 is denoted as Z1:Z2, and it is defined as

    Z1:Z2 = Z1 (Z1 + Z2)+ Z2,                                             (6-60)

an n×n matrix.
Theorem 6-7
Let Z1 and Z2 be impedance matrices of reciprocal n-port resistive networks. Then, we have

    Z1:Z2 = Z1 (Z1 + Z2)+ Z2 = Z2 (Z1 + Z2)+ Z1 = Z2:Z1.                  (6-61)

Proof:
By inspection, the matrix (Z1 + Z2)(Z1 + Z2)+(Z1 + Z2) is symmetric. Multiply out this matrix to
obtain

    (Z1 + Z2)(Z1 + Z2)+(Z1 + Z2)
      = [Z1(Z1 + Z2)+Z1 + Z2(Z1 + Z2)+Z2] + [Z1(Z1 + Z2)+Z2 + Z2(Z1 + Z2)+Z1].         (6-62)

The left-hand-side of this result is symmetric. Hence, the right-hand-side must be symmetric as
well. This requires that both Z1(Z1 + Z2)+Z2 and Z2(Z1 + Z2)+Z1 be symmetric, so that

    Z1(Z1 + Z2)+Z2 = [Z1(Z1 + Z2)+Z2]^T = Z2^T [(Z1 + Z2)+]^T Z1^T
                   = Z2^T [(Z1 + Z2)^T]+ Z1^T = Z2(Z1 + Z2)+Z1,           (6-63)

as claimed. Hence, we have Z1:Z2 = Z2:Z1 and the parallel sum of symmetric matrices is a
symmetric matrix.
Theorem 6-8: Parallel n-Port Resistive Networks
Suppose that N1 and N2 are two reciprocal n-port resistive networks that are described by
impedance matrices Z1 and Z2, respectively. Then the parallel connection of N1 and N2 is
described by the impedance matrix

    Zequ = Z1:Z2 = Z1(Z1 + Z2)+Z2,                                        (6-64)

the parallel sum of matrices Z1 and Z2.
Proof:
Let I1, I2 and I = I1 + I2 be the current vectors entering networks N1, N2 and the parallel
combination of N1 and N2, respectively. Likewise, let V be the voltage vector that is supplied to
the parallel connection of N1 and N2. To prove this theorem, we must show that

    V = Z1(Z1 + Z2)+Z2 I = (Z1:Z2)I.                                      (6-65)

Since V is supplied to the parallel combination, we have

    V = Z1 I1 = Z2 I2.                                                    (6-66)

Use this last equation to write

    Z1 I = Z1(I1 + I2) = V + Z1 I2,
                                                                          (6-67)
    Z2 I = Z2(I1 + I2) = V + Z2 I1.

Now, multiply these last two equations by (Z1 + Z2)+ to obtain

    (Z1 + Z2)+ Z1 I = (Z1 + Z2)+ V + (Z1 + Z2)+ Z1 I2,
                                                                          (6-68)
    (Z1 + Z2)+ Z2 I = (Z1 + Z2)+ V + (Z1 + Z2)+ Z2 I1.

On the left, multiply the first of these by Z2 and the second by Z1 to obtain (using the parallel sum
notation)

    (Z2:Z1) I = Z2(Z1 + Z2)+ V + (Z2:Z1) I2,
                                                                          (6-69)
    (Z1:Z2) I = Z1(Z1 + Z2)+ V + (Z1:Z2) I1.

However, we know that Z1:Z2 = Z2:Z1, I = I1 + I2, and Z1 + Z2 is symmetric. Hence, when we
add the two equations (6-69) we get

    (Z1:Z2) I = (Z1 + Z2)(Z1 + Z2)+ V = PV,                               (6-70)

where

    P ≡ (Z1 + Z2)(Z1 + Z2)+                                               (6-71)

is an orthogonal projection operator onto the range of Z1 + Z2. Since V = Z1 I1 = Z2 I2, we know
that V ∈ R(Z1) and V ∈ R(Z2). However, by Theorem 6-6, we know that R(Z1) ⊂ R(Z1 + Z2)
and R(Z2) ⊂ R(Z1 + Z2). Hence, we have V ∈ R(Z1 + Z2) so that PV = V, where P is the
orthogonal projection operator (6-71). Hence, from (6-70), we conclude that

    (Z1:Z2) I = PV = V,                                                   (6-72)

so the parallel sum Z1:Z2 is the impedance matrix for the parallel combination of N1 and N2.♥
Example
Consider the 3-port network shown in the figure (1 to 1', 2 to 2' and 3 to 3' are the ports;
voltages are sensed positive at, and currents are defined as entering, the unprimed terminals 1, 2
and 3). For 1 ≤ i, j ≤ 3, the entries in the impedance matrix are

    zij ≡ Vi / Ij    (all ports other than port j open-circuited),

where Vi is the voltage across the ith port (voltage is signed positive at the unprimed port
terminal), and Ij is the current entering the jth port (current defined as entering the unprimed port
terminal). Elementary circuit theory can be used to obtain

        | R1    0      R1      |
    Z = | 0     R2     R2      |
        | R1    R2     R1 + R2 |

as the impedance matrix for this 3-port.

[Figure: a 3-port network built from two resistors, R1 and R2.]

Connect two of these 3-port networks in parallel: one with R1 = 1 Ω and R2 = 2 Ω, the
other with R1 = 1 Ω and R2 = 1 Ω. The impedance matrices are

         | 1  0  1 |           | 1  0  1 |
    Z1 = | 0  2  2 |,     Z2 = | 0  1  1 |,
         | 1  2  3 |           | 1  1  2 |

both of which are singular. By Theorem 6-8, the impedance matrix for the parallel combination is

    Zequ = Z1:Z2 = Z1(Z1 + Z2)+Z2 = | 1/2   0    1/2 |
                                    |  0   2/3   2/3 |.
                                    | 1/2  2/3   7/6 |
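
As a closing check (a sketch that simply redoes the example numerically), MatLab's pinv
reproduces the parallel-sum impedance matrix computed above, and its first two diagonal entries
agree with the ordinary parallel-resistance values 1||1 = 1/2 Ω and 2||1 = 2/3 Ω:

    % Check (illustrative): parallel sum of the example impedance matrices.
    Z1 = [1 0 1; 0 2 2; 1 2 3];          % 3-port with R1 = 1 ohm, R2 = 2 ohm
    Z2 = [1 0 1; 0 1 1; 1 1 2];          % 3-port with R1 = 1 ohm, R2 = 1 ohm
    Zequ = Z1*pinv(Z1 + Z2)*Z2;
    disp(Zequ)                           % [1/2 0 1/2; 0 2/3 2/3; 1/2 2/3 7/6]
    disp(norm(Zequ - Zequ'))             % Zequ is symmetric (reciprocal network)
    disp([1/(1/1 + 1/1)  1/(1/2 + 1/1)]) % parallel resistances 1||1 and 2||1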