Jim Lambers
MAT 610
Summer Session 2009-10
Lecture 3 Notes
These notes correspond to Sections 2.5-2.7 in the text.
Orthogonality
A set of vectors $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ in $\mathbb{R}^n$ is said to be orthogonal if $\mathbf{x}_i^T \mathbf{x}_j = 0$ whenever $i \neq j$. If, in addition, $\mathbf{x}_i^T \mathbf{x}_i = 1$ for $i = 1, 2, \ldots, k$, then we say that the set is orthonormal. In this case, we can also write $\mathbf{x}_i^T \mathbf{x}_j = \delta_{ij}$, where
\[
\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}
\]
is called the Kronecker delta.
Example The vectors $\mathbf{x} = \begin{bmatrix} 1 & -3 & 2 \end{bmatrix}^T$ and $\mathbf{y} = \begin{bmatrix} 4 & 2 & 1 \end{bmatrix}^T$ are orthogonal, as $\mathbf{x}^T \mathbf{y} = 1(4) - 3(2) + 2(1) = 4 - 6 + 2 = 0$. Using the fact that $\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^T \mathbf{x}}$, we can normalize these vectors to obtain the orthonormal vectors
\[
\hat{\mathbf{x}} = \frac{\mathbf{x}}{\|\mathbf{x}\|_2} = \frac{1}{\sqrt{14}} \begin{bmatrix} 1 \\ -3 \\ 2 \end{bmatrix}, \qquad
\hat{\mathbf{y}} = \frac{\mathbf{y}}{\|\mathbf{y}\|_2} = \frac{1}{\sqrt{21}} \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix},
\]
which satisfy $\hat{\mathbf{x}}^T \hat{\mathbf{x}} = \hat{\mathbf{y}}^T \hat{\mathbf{y}} = 1$, $\hat{\mathbf{x}}^T \hat{\mathbf{y}} = 0$. □
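These computations are easy to verify numerically; the following is a minimal sketch assuming NumPy is available, using the vectors from the example.

```python
import numpy as np

x = np.array([1.0, -3.0, 2.0])
y = np.array([4.0, 2.0, 1.0])

# Orthogonality: the inner product should be zero.
print(x @ y)                      # 0.0

# Normalize each vector by its 2-norm.
x_hat = x / np.linalg.norm(x)     # x / sqrt(14)
y_hat = y / np.linalg.norm(y)     # y / sqrt(21)

# Orthonormality checks: unit length and zero inner product.
print(x_hat @ x_hat, y_hat @ y_hat, x_hat @ y_hat)   # approximately 1, 1, 0
```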
The concept of orthogonality extends to subspaces. If $S_1, S_2, \ldots, S_k$ is a set of subspaces of $\mathbb{R}^n$, we say that these subspaces are mutually orthogonal if $\mathbf{x}^T \mathbf{y} = 0$ whenever $\mathbf{x} \in S_i$, $\mathbf{y} \in S_j$, for $i \neq j$. Furthermore, we define the orthogonal complement of a subspace $S \subseteq \mathbb{R}^n$ to be the set $S^\perp$ defined by
\[
S^\perp = \{ \mathbf{y} \in \mathbb{R}^n \mid \mathbf{x}^T \mathbf{y} = 0 \text{ for all } \mathbf{x} \in S \}.
\]
Example Let $S$ be the subspace of $\mathbb{R}^3$ defined by $S = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\}$, where
\[
\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \qquad \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.
\]
Then $S$ has the orthogonal complement $S^\perp = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}^T \right\}$.

It can be verified directly, by computing inner products, that the single basis vector of $S^\perp$, $\mathbf{v}_3 = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}^T$, is orthogonal to either basis vector of $S$, $\mathbf{v}_1$ or $\mathbf{v}_2$. It follows that any multiple of this vector, which describes any vector in the span of $\mathbf{v}_3$, is orthogonal to any linear combination of these basis vectors, which accounts for any vector in $S$, due to the linearity of the inner product:
\[
(c_3 \mathbf{v}_3)^T (c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2) = c_3 c_1 \mathbf{v}_3^T \mathbf{v}_1 + c_3 c_2 \mathbf{v}_3^T \mathbf{v}_2 = c_3 c_1 (0) + c_3 c_2 (0) = 0.
\]
Therefore, any vector in $S^\perp$ is in fact orthogonal to any vector in $S$. □
It is helpful to note that given a basis $\{\mathbf{v}_1, \mathbf{v}_2\}$ for any two-dimensional subspace $S \subseteq \mathbb{R}^3$, a basis $\{\mathbf{v}_3\}$ for $S^\perp$ can be obtained by computing the cross product of the basis vectors of $S$,
\[
\mathbf{v}_3 = \mathbf{v}_1 \times \mathbf{v}_2 = \begin{bmatrix} (\mathbf{v}_1)_2 (\mathbf{v}_2)_3 - (\mathbf{v}_1)_3 (\mathbf{v}_2)_2 \\ (\mathbf{v}_1)_3 (\mathbf{v}_2)_1 - (\mathbf{v}_1)_1 (\mathbf{v}_2)_3 \\ (\mathbf{v}_1)_1 (\mathbf{v}_2)_2 - (\mathbf{v}_1)_2 (\mathbf{v}_2)_1 \end{bmatrix}.
\]
A similar process can be used to compute the orthogonal complement of any $(n-1)$-dimensional subspace of $\mathbb{R}^n$. This orthogonal complement will always have dimension 1, because the sum of the dimension of any subspace of a finite-dimensional vector space, and the dimension of its orthogonal complement, is equal to the dimension of the vector space. That is, if $\dim(V) = n$, and $S \subseteq V$ is a subspace of $V$, then
\[
\dim(S) + \dim(S^\perp) = n.
\]
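For the example above, the cross product can be checked numerically; a minimal sketch assuming NumPy is available:

```python
import numpy as np

v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([1.0, -1.0, 1.0])

# A basis vector for the orthogonal complement of span{v1, v2} in R^3.
v3 = np.cross(v1, v2)
print(v3)                 # [ 2.  0. -2.], a multiple of [1, 0, -1]

# v3 is orthogonal to both basis vectors, hence to all of span{v1, v2}.
print(v3 @ v1, v3 @ v2)   # 0.0 0.0
```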
Let $S = \operatorname{range}(A)$ for an $m \times n$ matrix $A$. By the definition of the range of a matrix, if $\mathbf{y} \in S$, then $\mathbf{y} = A\mathbf{x}$ for some vector $\mathbf{x} \in \mathbb{R}^n$. If $\mathbf{z} \in \mathbb{R}^m$ is such that $\mathbf{z}^T \mathbf{y} = 0$, then we have
\[
\mathbf{z}^T A \mathbf{x} = \mathbf{z}^T (A^T)^T \mathbf{x} = (A^T \mathbf{z})^T \mathbf{x} = 0.
\]
If this is true for all $\mathbf{y} \in S$, that is, if $\mathbf{z} \in S^\perp$, then we must have $A^T \mathbf{z} = \mathbf{0}$. Therefore, $\mathbf{z} \in \operatorname{null}(A^T)$, and we conclude that $S^\perp = \operatorname{null}(A^T)$. That is, the orthogonal complement of the range is the null space of the transpose.
Example Let
\[
A = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 3 & 2 \end{bmatrix}.
\]
If $S = \operatorname{range}(A) = \operatorname{span}\{\mathbf{a}_1, \mathbf{a}_2\}$, where $\mathbf{a}_1$ and $\mathbf{a}_2$ are the columns of $A$, then $S^\perp = \operatorname{span}\{\mathbf{b}\}$, where $\mathbf{b} = \mathbf{a}_1 \times \mathbf{a}_2 = \begin{bmatrix} 7 & 1 & -3 \end{bmatrix}^T$. We then note that
\[
A^T \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 2 \end{bmatrix} \begin{bmatrix} 7 \\ 1 \\ -3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
That is, $\mathbf{b}$ is in the null space of $A^T$. In fact, because of the relationships
\[
\operatorname{rank}(A^T) + \dim(\operatorname{null}(A^T)) = 3, \qquad \operatorname{rank}(A) = \operatorname{rank}(A^T) = 2,
\]
we have $\dim(\operatorname{null}(A^T)) = 1$, so $\operatorname{null}(A^T) = \operatorname{span}\{\mathbf{b}\}$. □
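The same verification can be carried out numerically; a minimal sketch assuming NumPy is available, with the matrix from the example:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, -1.0],
              [3.0, 2.0]])

# Basis vector for the orthogonal complement of range(A).
b = np.cross(A[:, 0], A[:, 1])
print(b)             # [ 7.  1. -3.]

# b lies in null(A^T): A^T b should be the zero vector.
print(A.T @ b)       # [0. 0.]
```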
A basis for a subspace $S \subseteq \mathbb{R}^n$ is an orthonormal basis if the vectors in the basis form an orthonormal set. For example, let $Q_1$ be an $n \times r$ matrix with orthonormal columns. Then these columns form an orthonormal basis for $\operatorname{range}(Q_1)$. If $r < n$, then this basis can be extended to a basis of all of $\mathbb{R}^n$ by obtaining an orthonormal basis of $\operatorname{range}(Q_1)^\perp = \operatorname{null}(Q_1^T)$.

If we define $Q_2$ to be a matrix whose columns form an orthonormal basis of $\operatorname{range}(Q_1)^\perp$, then the columns of the $n \times n$ matrix $Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}$ are also orthonormal. It follows that $Q^T Q = I$. That is, $Q^T = Q^{-1}$. Furthermore, because the inverse of a matrix is unique, $QQ^T = I$ as well. We say that $Q$ is an orthogonal matrix.
One interesting property of orthogonal matrices is that they preserve certain norms. For example, if $Q$ is an orthogonal $n \times n$ matrix, then $\|Q\mathbf{x}\|_2^2 = \mathbf{x}^T Q^T Q \mathbf{x} = \mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|_2^2$. That is, orthogonal matrices preserve $\ell_2$-norms. Geometrically, multiplication by an $n \times n$ orthogonal matrix preserves the length of a vector, and performs a rotation in $n$-dimensional space.
Example The matrix
\[
Q = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\]
is an orthogonal matrix, for any angle $\theta$. For any vector $\mathbf{x} \in \mathbb{R}^3$, the vector $Q\mathbf{x}$ can be obtained by rotating $\mathbf{x}$ clockwise by the angle $\theta$ in the $xz$-plane, while $Q^T = Q^{-1}$ performs a counterclockwise rotation by $\theta$ in the $xz$-plane. □
In addition, orthogonal matrices preserve the matrix $\ell_2$ and Frobenius norms. That is, if $A$ is an $m \times n$ matrix, and $Q$ and $Z$ are $m \times m$ and $n \times n$ orthogonal matrices, respectively, then
\[
\|QAZ\|_F = \|A\|_F, \qquad \|QAZ\|_2 = \|A\|_2.
\]
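This invariance is easy to check numerically; a minimal sketch assuming NumPy is available, with orthogonal factors generated from QR factorizations of arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))

# Orthogonal matrices obtained from QR factorizations of random square matrices.
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)))

# Frobenius and 2-norms are unchanged by orthogonal transformations.
print(np.linalg.norm(A, 'fro'), np.linalg.norm(Q @ A @ Z, 'fro'))
print(np.linalg.norm(A, 2),     np.linalg.norm(Q @ A @ Z, 2))
```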
The Singular Value Decomposition
The matrix $\ell_2$-norm can be used to obtain a highly useful decomposition of any $m \times n$ matrix $A$. Let $\mathbf{x}$ be a unit $\ell_2$-norm vector (that is, $\|\mathbf{x}\|_2 = 1$) such that $\|A\mathbf{x}\|_2 = \|A\|_2$. If $\mathbf{z} = A\mathbf{x}$, and we define $\mathbf{y} = \mathbf{z}/\|\mathbf{z}\|_2$, then we have $A\mathbf{x} = \sigma \mathbf{y}$, with $\sigma = \|A\|_2$, and both $\mathbf{x}$ and $\mathbf{y}$ are unit vectors.
We can extend $\mathbf{x}$ and $\mathbf{y}$, separately, to orthonormal bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, to obtain orthogonal matrices $V_1 = \begin{bmatrix} \mathbf{x} & V_2 \end{bmatrix} \in \mathbb{R}^{n \times n}$ and $U_1 = \begin{bmatrix} \mathbf{y} & U_2 \end{bmatrix} \in \mathbb{R}^{m \times m}$. We then have
\[
U_1^T A V_1 = \begin{bmatrix} \mathbf{y}^T \\ U_2^T \end{bmatrix} A \begin{bmatrix} \mathbf{x} & V_2 \end{bmatrix}
= \begin{bmatrix} \mathbf{y}^T A \mathbf{x} & \mathbf{y}^T A V_2 \\ U_2^T A \mathbf{x} & U_2^T A V_2 \end{bmatrix}
= \begin{bmatrix} \sigma \mathbf{y}^T \mathbf{y} & \mathbf{y}^T A V_2 \\ \sigma U_2^T \mathbf{y} & U_2^T A V_2 \end{bmatrix}
= \begin{bmatrix} \sigma & \mathbf{w}^T \\ \mathbf{0} & B \end{bmatrix}
\equiv A_1,
\]
where $B = U_2^T A V_2$ and $\mathbf{w} = V_2^T A^T \mathbf{y}$. Now, let
\[
\mathbf{s} = \begin{bmatrix} \sigma \\ \mathbf{w} \end{bmatrix}.
\]
Then $\|\mathbf{s}\|_2^2 = \sigma^2 + \|\mathbf{w}\|_2^2$. Furthermore, the first component of $A_1 \mathbf{s}$ is $\sigma^2 + \|\mathbf{w}\|_2^2$.

It follows from the invariance of the matrix $\ell_2$-norm under orthogonal transformations that
\[
\sigma^2 = \|A\|_2^2 = \|A_1\|_2^2 \geq \frac{\|A_1 \mathbf{s}\|_2^2}{\|\mathbf{s}\|_2^2} \geq \frac{(\sigma^2 + \|\mathbf{w}\|_2^2)^2}{\sigma^2 + \|\mathbf{w}\|_2^2} = \sigma^2 + \|\mathbf{w}\|_2^2,
\]
and therefore we must have $\mathbf{w} = \mathbf{0}$. We then have
\[
U_1^T A V_1 = A_1 = \begin{bmatrix} \sigma & \mathbf{0}^T \\ \mathbf{0} & B \end{bmatrix}.
\]
Continuing this process on $B$, and keeping in mind that $\|B\|_2 \leq \|A\|_2$, we obtain the decomposition
\[
U^T A V = \Sigma,
\]
where
\[
U = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_m \end{bmatrix} \in \mathbb{R}^{m \times m}, \qquad
V = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix} \in \mathbb{R}^{n \times n}
\]
are both orthogonal matrices, and
\[
\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min\{m, n\},
\]
is a diagonal matrix, in which $\Sigma_{ii} = \sigma_i$ for $i = 1, 2, \ldots, p$, and $\Sigma_{ij} = 0$ for $i \neq j$. Furthermore, we have
\[
\|A\|_2 = \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0.
\]
This decomposition of $A$ is called the singular value decomposition, or SVD. It is more commonly written as a factorization of $A$,
\[
A = U \Sigma V^T.
\]
The diagonal entries of $\Sigma$ are the singular values of $A$. The columns of $U$ are the left singular vectors, and the columns of $V$ are the right singular vectors. It follows from the SVD itself that the singular values and vectors satisfy the relations
\[
A^T \mathbf{u}_i = \sigma_i \mathbf{v}_i, \qquad A \mathbf{v}_i = \sigma_i \mathbf{u}_i, \qquad i = 1, 2, \ldots, \min\{m, n\}.
\]
For convenience, we denote the $i$th largest singular value of $A$ by $\sigma_i(A)$, and the largest and smallest singular values are commonly denoted by $\sigma_{\max}(A)$ and $\sigma_{\min}(A)$, respectively.
The SVD conveys much useful information about the structure of a matrix, particularly with regard to systems of linear equations involving the matrix. Let $r$ be the number of nonzero singular values. Then $r$ is the rank of $A$, and
\[
\operatorname{range}(A) = \operatorname{span}\{\mathbf{u}_1, \ldots, \mathbf{u}_r\}, \qquad
\operatorname{null}(A) = \operatorname{span}\{\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n\}.
\]
That is, the SVD yields orthonormal bases of the range and null space of $A$.

It follows that we can write
\[
A = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T.
\]
This is called the SVD expansion of $A$. If $m \geq n$, then this expansion yields the "economy-size" SVD
\[
A = U_1 \Sigma_1 V^T,
\]
where
\[
U_1 = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix} \in \mathbb{R}^{m \times n}, \qquad
\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_n) \in \mathbb{R}^{n \times n}.
\]
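The economy-size factorization corresponds to the full_matrices=False option of NumPy's SVD routine; a minimal sketch with an arbitrary tall test matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))          # m = 6 >= n = 3

# Economy-size SVD: U1 is 6x3, Sigma1 is 3x3, V is 3x3.
U1, sigma, Vt = np.linalg.svd(A, full_matrices=False)
print(U1.shape, sigma.shape, Vt.shape)   # (6, 3) (3,) (3, 3)

# Reconstruct A from the economy-size factors.
print(np.allclose(U1 @ np.diag(sigma) @ Vt, A))   # True
```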
Example The matrix
\[
A = \begin{bmatrix} 11 & 19 & 11 \\ 9 & 21 & 9 \\ 10 & 20 & 10 \end{bmatrix}
\]
has the SVD $A = U \Sigma V^T$ where
\[
U = \begin{bmatrix} 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -\sqrt{2/3} \end{bmatrix}, \qquad
V = \begin{bmatrix} 1/\sqrt{6} & -1/\sqrt{3} & 1/\sqrt{2} \\ \sqrt{2/3} & 1/\sqrt{3} & 0 \\ 1/\sqrt{6} & -1/\sqrt{3} & -1/\sqrt{2} \end{bmatrix},
\]
and
\[
\Sigma = \begin{bmatrix} 42.4264 & 0 & 0 \\ 0 & 2.4495 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
Let $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{bmatrix}$ and $V = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{bmatrix}$ be column partitions of $U$ and $V$, respectively. Because there are only two nonzero singular values, we have $\operatorname{rank}(A) = 2$. Furthermore, $\operatorname{range}(A) = \operatorname{span}\{\mathbf{u}_1, \mathbf{u}_2\}$, and $\operatorname{null}(A) = \operatorname{span}\{\mathbf{v}_3\}$. We also have
\[
A = 42.4264\, \mathbf{u}_1 \mathbf{v}_1^T + 2.4495\, \mathbf{u}_2 \mathbf{v}_2^T.
\]
□
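These quantities can be reproduced numerically; a minimal sketch assuming NumPy is available (note that numpy.linalg.svd returns $V^T$ rather than $V$):

```python
import numpy as np

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

# Full SVD: A = U @ diag(sigma) @ Vt, singular values in decreasing order.
U, sigma, Vt = np.linalg.svd(A)
print(sigma)                      # approximately [42.4264, 2.4495, 0]

# Rank = number of nonzero singular values (up to roundoff).
r = np.sum(sigma > 1e-12 * sigma[0])
print(r)                          # 2

# The last right singular vector spans null(A): A v3 should be zero.
print(A @ Vt[2])                  # approximately [0, 0, 0]
```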
The SVD is also closely related to the $\ell_2$-norm and Frobenius norm. We have
\[
\|A\|_2 = \sigma_1, \qquad \|A\|_F^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_p^2,
\]
and
\[
\min_{\mathbf{x} \neq \mathbf{0}} \frac{\|A\mathbf{x}\|_2}{\|\mathbf{x}\|_2} = \sigma_p, \qquad p = \min\{m, n\}.
\]
These relationships follow directly from the invariance of these norms under orthogonal transformations.
The SVD has many useful applications, but one of particular interest is that the truncated SVD expansion
\[
A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T,
\]
where $k < r = \operatorname{rank}(A)$, is the best approximation of $A$ by a rank-$k$ matrix. It follows that the distance between $A$ and the set of matrices of rank $k$ is
\[
\min_{\operatorname{rank}(B) = k} \|A - B\|_2 = \|A - A_k\|_2 = \left\| \sum_{i=k+1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^T \right\|_2 = \sigma_{k+1},
\]
because the $\ell_2$-norm of a matrix is its largest singular value, and $\sigma_{k+1}$ is the largest singular value of the matrix obtained by removing the first $k$ terms of the SVD expansion. Therefore, $\sigma_p$, where $p = \min\{m, n\}$, is the distance between $A$ and the set of all rank-deficient matrices (which is zero when $A$ is already rank-deficient). Because a matrix of full rank can have arbitrarily small, but still positive, singular values, it follows that the set of all full-rank matrices in $\mathbb{R}^{m \times n}$ is both open and dense.
Example The best approximation of the matrix $A$ from the previous example, which has rank two, by a matrix of rank one is obtained by taking only the first term of its SVD expansion,
\[
A_1 = 42.4264\, \mathbf{u}_1 \mathbf{v}_1^T = \begin{bmatrix} 10 & 20 & 10 \\ 10 & 20 & 10 \\ 10 & 20 & 10 \end{bmatrix}.
\]
The absolute error in this approximation is
\[
\|A - A_1\|_2 = \sigma_2 \|\mathbf{u}_2 \mathbf{v}_2^T\|_2 = \sigma_2 = 2.4495,
\]
while the relative error is
\[
\frac{\|A - A_1\|_2}{\|A\|_2} = \frac{2.4495}{42.4264} = 0.0577.
\]
That is, the rank-one approximation of $A$ is off by a little less than six percent. □
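The truncation and its error are easy to compute; a minimal sketch assuming NumPy is available, continuing the same example:

```python
import numpy as np

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

U, sigma, Vt = np.linalg.svd(A)

# Best rank-one approximation: keep only the leading term of the SVD expansion.
A1 = sigma[0] * np.outer(U[:, 0], Vt[0])
print(np.round(A1, 4))                  # rows of [10, 20, 10]

# Absolute and relative errors in the 2-norm.
abs_err = np.linalg.norm(A - A1, 2)
rel_err = abs_err / np.linalg.norm(A, 2)
print(abs_err, rel_err)                 # about 2.4495 and 0.0577
```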
Suppose that the entries of a matrix $A$ have been determined experimentally, for example by measurements, and the error in the entries is determined to be bounded by some value $\epsilon$. Then, if $\sigma_k > \epsilon \geq \sigma_{k+1}$ for some $k < \min\{m, n\}$, $\epsilon$ is at least as large as the distance between $A$ and the set of all matrices of rank $k$. Therefore, due to the uncertainty of the measurement, $A$ could very well be a matrix of rank $k$, but it cannot be of lower rank, because $\sigma_k > \epsilon$. We say that the $\epsilon$-rank of $A$, defined by
\[
\operatorname{rank}(A, \epsilon) = \min_{\|A - B\|_2 \leq \epsilon} \operatorname{rank}(B),
\]
is equal to $k$. If $k < \min\{m, n\}$, then we say that $A$ is numerically rank-deficient.
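In practice the $\epsilon$-rank can be obtained by counting the singular values that exceed $\epsilon$; a minimal sketch assuming NumPy is available, where eps_rank is a hypothetical helper name and the tolerance is an arbitrary choice:

```python
import numpy as np

def eps_rank(A, eps):
    """Number of singular values of A that exceed eps."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(sigma > eps))

# Example: the 3x3 matrix used earlier, with a measurement error bound of 0.1.
A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])
print(eps_rank(A, 0.1))   # 2: numerically rank-deficient as a 3x3 matrix
```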
Projections
Given a vector $\mathbf{z} \in \mathbb{R}^n$, and a subspace $S \subseteq \mathbb{R}^n$, it is often useful to obtain the vector $\mathbf{w} \in S$ that best approximates $\mathbf{z}$, in the sense that $\mathbf{z} - \mathbf{w} \in S^\perp$. In other words, the error in $\mathbf{w}$ is orthogonal to the subspace in which $\mathbf{w}$ is sought. To that end, we define the orthogonal projection onto $S$ to be an $n \times n$ matrix $P$ such that if $\mathbf{w} = P\mathbf{z}$, then $\mathbf{w} \in S$, and $\mathbf{w}^T(\mathbf{z} - \mathbf{w}) = 0$.

Substituting $\mathbf{w} = P\mathbf{z}$ into $\mathbf{w}^T(\mathbf{z} - \mathbf{w}) = 0$, and noting the commutativity of the inner product, we see that $P$ must satisfy the relations
\[
\mathbf{z}^T (P^T - P^T P)\mathbf{z} = 0, \qquad \mathbf{z}^T (P - P^T P)\mathbf{z} = 0,
\]
for any vector $\mathbf{z} \in \mathbb{R}^n$. It follows from these relations that $P$ must have the following properties: $\operatorname{range}(P) = S$, and $P^T P = P = P^T$, which yields $P^2 = P$. Using these properties, it can be shown that the orthogonal projection $P$ onto a given subspace $S$ is unique. Furthermore, it can be shown that $P = QQ^T$, where $Q$ is a matrix whose columns are an orthonormal basis for $S$.
Projections and the SVD
Let $A \in \mathbb{R}^{m \times n}$ have the SVD $A = U \Sigma V^T$, where
\[
U = \begin{bmatrix} U_r & \tilde{U}_r \end{bmatrix}, \qquad V = \begin{bmatrix} V_r & \tilde{V}_r \end{bmatrix},
\]
where $r = \operatorname{rank}(A)$ and
\[
U_r = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_r \end{bmatrix}, \qquad
V_r = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_r \end{bmatrix}.
\]
Then, we obtain the following projections related to the SVD:

• $U_r U_r^T$ is the projection onto the range of $A$

• $\tilde{V}_r \tilde{V}_r^T$ is the projection onto the null space of $A$

• $\tilde{U}_r \tilde{U}_r^T$ is the projection onto the orthogonal complement of the range of $A$, which is the null space of $A^T$

• $V_r V_r^T$ is the projection onto the orthogonal complement of the null space of $A$, which is the range of $A^T$.

These projections can also be determined by noting that the SVD of $A^T$ is $A^T = V \Sigma^T U^T$.
Example For the matrix
\[
A = \begin{bmatrix} 11 & 19 & 11 \\ 9 & 21 & 9 \\ 10 & 20 & 10 \end{bmatrix},
\]
that has rank two, the orthogonal projection onto $\operatorname{range}(A)$ is
\[
U_2 U_2^T = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix} \begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \end{bmatrix} = \mathbf{u}_1 \mathbf{u}_1^T + \mathbf{u}_2 \mathbf{u}_2^T,
\]
where $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{bmatrix}$ is the previously defined column partition of the matrix $U$ from the SVD of $A$. □
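A projector of this form can be constructed and checked against the defining properties; a minimal sketch assuming NumPy is available, reusing the matrix from the example:

```python
import numpy as np

A = np.array([[11.0, 19.0, 11.0],
              [ 9.0, 21.0,  9.0],
              [10.0, 20.0, 10.0]])

U, sigma, Vt = np.linalg.svd(A)

# Orthogonal projection onto range(A), from the first two left singular vectors.
P = U[:, :2] @ U[:, :2].T

# Defining properties: P is symmetric and idempotent.
print(np.allclose(P, P.T), np.allclose(P @ P, P))   # True True

# P leaves vectors in range(A) unchanged, e.g. the columns of A.
print(np.allclose(P @ A, A))                        # True
```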
Sensitivity of Linear Systems
Let $A \in \mathbb{R}^{n \times n}$ be nonsingular, and let $A = U \Sigma V^T$ be the SVD of $A$. Then the solution $\mathbf{x}$ of the system of linear equations $A\mathbf{x} = \mathbf{b}$ can be expressed as
\[
\mathbf{x} = A^{-1} \mathbf{b} = V \Sigma^{-1} U^T \mathbf{b} = \sum_{i=1}^{n} \frac{\mathbf{u}_i^T \mathbf{b}}{\sigma_i} \mathbf{v}_i.
\]
This formula for $\mathbf{x}$ suggests that if $\sigma_n$ is small relative to the other singular values, then the system $A\mathbf{x} = \mathbf{b}$ can be sensitive to perturbations in $A$ or $\mathbf{b}$. This makes sense, considering that $\sigma_n$ is the distance between $A$ and the set of all singular $n \times n$ matrices.
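The expansion translates directly into a computation; a minimal sketch assuming NumPy is available, with an arbitrary nonsingular test matrix and right-hand side:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, sigma, Vt = np.linalg.svd(A)

# x = sum_i (u_i^T b / sigma_i) v_i, accumulated term by term.
x = sum((U[:, i] @ b) / sigma[i] * Vt[i] for i in range(4))

# Agrees with a direct solve.
print(np.allclose(x, np.linalg.solve(A, b)))   # True
```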
In an attempt to measure the sensitivity of this system, we consider the parameterized system
\[
(A + \epsilon E)\mathbf{x}(\epsilon) = \mathbf{b} + \epsilon \mathbf{e},
\]
where $E \in \mathbb{R}^{n \times n}$ and $\mathbf{e} \in \mathbb{R}^n$ are perturbations of $A$ and $\mathbf{b}$, respectively. Taking the Taylor expansion of $\mathbf{x}(\epsilon)$ around $\epsilon = 0$ yields
\[
\mathbf{x}(\epsilon) = \mathbf{x} + \epsilon \mathbf{x}'(0) + O(\epsilon^2),
\]
where
\[
\mathbf{x}'(\epsilon) = (A + \epsilon E)^{-1}(\mathbf{e} - E\mathbf{x}(\epsilon)),
\]
which yields $\mathbf{x}'(0) = A^{-1}(\mathbf{e} - E\mathbf{x})$.

Using norms to measure the relative error in $\mathbf{x}$, we obtain
\[
\frac{\|\mathbf{x}(\epsilon) - \mathbf{x}\|}{\|\mathbf{x}\|} = |\epsilon| \frac{\|A^{-1}(\mathbf{e} - E\mathbf{x})\|}{\|\mathbf{x}\|} + O(\epsilon^2)
\leq |\epsilon| \|A^{-1}\| \left\{ \frac{\|\mathbf{e}\|}{\|\mathbf{x}\|} + \|E\| \right\} + O(\epsilon^2).
\]
Multiplying and dividing by $\|A\|$, and using $A\mathbf{x} = \mathbf{b}$ to obtain $\|\mathbf{b}\| \leq \|A\| \|\mathbf{x}\|$, yields
\[
\frac{\|\mathbf{x}(\epsilon) - \mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A) |\epsilon| \left( \frac{\|\mathbf{e}\|}{\|\mathbf{b}\|} + \frac{\|E\|}{\|A\|} \right) + O(\epsilon^2),
\]
where
\[
\kappa(A) = \|A\| \|A^{-1}\|
\]
is called the condition number of $A$. We conclude that the relative errors in $A$ and $\mathbf{b}$ can be amplified by $\kappa(A)$ in the solution. Therefore, if $\kappa(A)$ is large, the problem $A\mathbf{x} = \mathbf{b}$ can be quite sensitive to perturbations in $A$ and $\mathbf{b}$. In this case, we say that $A$ is ill-conditioned; otherwise, we say that $A$ is well-conditioned.
The definition of the condition number depends on the matrix norm that is used. Using the $\ell_2$-norm, we obtain
\[
\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1(A)}{\sigma_n(A)}.
\]
It can readily be seen from this formula that $\kappa_2(A)$ is large if $\sigma_n$ is small relative to $\sigma_1$. We also note that because the singular values are the lengths of the semi-axes of the hyperellipsoid $\{A\mathbf{x} \mid \|\mathbf{x}\|_2 = 1\}$, the condition number in the $\ell_2$-norm measures the elongation of this hyperellipsoid.
Example The matrices
\[
A_1 = \begin{bmatrix} 0.7674 & 0.0477 \\ 0.6247 & 0.1691 \end{bmatrix}, \qquad
A_2 = \begin{bmatrix} 0.7581 & 0.1113 \\ 0.6358 & 0.0933 \end{bmatrix}
\]
do not appear to be very different from one another, but $\kappa_2(A_1) = 10$ while $\kappa_2(A_2) = 10^{10}$. That is, $A_1$ is well-conditioned while $A_2$ is ill-conditioned.

To illustrate the ill-conditioned nature of $A_2$, we solve the systems $A_2 \mathbf{x}_1 = \mathbf{b}_1$ and $A_2 \mathbf{x}_2 = \mathbf{b}_2$ for the unknown vectors $\mathbf{x}_1$ and $\mathbf{x}_2$, where
\[
\mathbf{b}_1 = \begin{bmatrix} 0.7662 \\ 0.6426 \end{bmatrix}, \qquad
\mathbf{b}_2 = \begin{bmatrix} 0.7019 \\ 0.7192 \end{bmatrix}.
\]
These vectors differ from one another by roughly 10%, but the solutions
\[
\mathbf{x}_1 = \begin{bmatrix} 0.9894 \\ 0.1452 \end{bmatrix}, \qquad
\mathbf{x}_2 = \begin{bmatrix} -1.4522 \times 10^8 \\ 9.8940 \times 10^8 \end{bmatrix}
\]
differ by several orders of magnitude, because of the sensitivity of $A_2$ to perturbations. □
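The gap in conditioning can be explored numerically; a minimal sketch assuming NumPy is available. Note that the matrices above are shown rounded to four decimal places, so rerunning with these entries will not reproduce the quoted values of $\kappa_2(A_2)$ or $\mathbf{x}_2$ exactly, though $A_2$ remains far worse conditioned than $A_1$ and the two solves still disagree wildly.

```python
import numpy as np

A1 = np.array([[0.7674, 0.0477],
               [0.6247, 0.1691]])
A2 = np.array([[0.7581, 0.1113],
               [0.6358, 0.0933]])

# 2-norm condition number: ratio of the largest to the smallest singular value.
print(np.linalg.cond(A1, 2))   # modest (well-conditioned)
print(np.linalg.cond(A2, 2))   # several orders of magnitude larger

b1 = np.array([0.7662, 0.6426])
b2 = np.array([0.7019, 0.7192])

# Right-hand sides differing by roughly 10% give wildly different solutions.
print(np.linalg.solve(A2, b1))
print(np.linalg.solve(A2, b2))
```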
Just as the largest singular value of $A$ is the $\ell_2$-norm of $A$, and the smallest singular value is the distance from $A$ to the nearest singular matrix in the $\ell_2$-norm, we have, for any $\ell_p$-norm,
\[
\frac{1}{\kappa_p(A)} = \min_{A + \Delta A \ \text{singular}} \frac{\|\Delta A\|_p}{\|A\|_p}.
\]
That is, in any $\ell_p$-norm, $\kappa_p(A)$ measures the relative distance in that norm from $A$ to the set of singular matrices.
Because $\det(A) = 0$ if and only if $A$ is singular, it would appear that the determinant could be used to measure the distance from $A$ to the nearest singular matrix. However, this is generally not the case. It is possible for a matrix to have a relatively large determinant, but be very close to a singular matrix, or for a matrix to have a relatively small determinant, but not be nearly singular. In other words, there is very little correlation between $\det(A)$ and the condition number of $A$.
Example Let
\[
A = \begin{bmatrix}
1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 &  1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 &  0 &  1 & -1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 &  0 &  0 &  1 & -1 & -1 & -1 & -1 & -1 & -1 \\
0 &  0 &  0 &  0 &  1 & -1 & -1 & -1 & -1 & -1 \\
0 &  0 &  0 &  0 &  0 &  1 & -1 & -1 & -1 & -1 \\
0 &  0 &  0 &  0 &  0 &  0 &  1 & -1 & -1 & -1 \\
0 &  0 &  0 &  0 &  0 &  0 &  0 &  1 & -1 & -1 \\
0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  1 & -1 \\
0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  0 &  1
\end{bmatrix}.
\]
Then $\det(A) = 1$, but $\kappa_2(A) \approx 1{,}918$, and $\sigma_{10} \approx 0.0029$. That is, $A$ is quite close to a singular matrix, even though $\det(A)$ is not near zero. For example, the nearby matrix $\tilde{A} = A - \sigma_{10} \mathbf{u}_{10} \mathbf{v}_{10}^T$, which has entries
\[
\tilde{A} \approx \begin{bmatrix}
 1      & -1      & -1      & -1      & -1      & -1      & -1      & -1      & -1      & -1 \\
-0.0000 &  1      & -1      & -1      & -1      & -1      & -1      & -1      & -1      & -1 \\
-0.0000 & -0.0000 &  1      & -1      & -1      & -1      & -1      & -1      & -1      & -1 \\
-0.0000 & -0.0000 & -0.0000 &  1      & -1      & -1      & -1      & -1      & -1      & -1 \\
-0.0001 & -0.0000 & -0.0000 & -0.0000 &  1      & -1      & -1      & -1      & -1      & -1 \\
-0.0001 & -0.0001 & -0.0000 & -0.0000 & -0.0000 &  1      & -1      & -1      & -1      & -1 \\
-0.0003 & -0.0001 & -0.0001 & -0.0000 & -0.0000 & -0.0000 &  1      & -1      & -1      & -1 \\
-0.0005 & -0.0003 & -0.0001 & -0.0001 & -0.0000 & -0.0000 & -0.0000 &  1      & -1      & -1 \\
-0.0011 & -0.0005 & -0.0003 & -0.0001 & -0.0001 & -0.0000 & -0.0000 & -0.0000 &  1      & -1 \\
-0.0022 & -0.0011 & -0.0005 & -0.0003 & -0.0001 & -0.0001 & -0.0000 & -0.0000 & -0.0000 &  1
\end{bmatrix},
\]
is singular. □
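The quantities quoted above can be checked with a short computation; a minimal sketch assuming NumPy is available, with the matrix generated programmatically rather than typed in:

```python
import numpy as np

# Upper triangular matrix with 1 on the diagonal and -1 above it.
n = 10
A = np.triu(-np.ones((n, n)), k=1) + np.eye(n)

U, sigma, Vt = np.linalg.svd(A)
print(np.linalg.det(A))          # 1.0: the determinant gives no warning
print(np.linalg.cond(A, 2))      # about 1918
print(sigma[-1])                 # about 0.0029: A is close to a singular matrix

# Subtracting the last term of the SVD expansion produces a singular matrix.
A_tilde = A - sigma[-1] * np.outer(U[:, -1], Vt[-1])
print(np.linalg.svd(A_tilde, compute_uv=False)[-1])   # essentially zero
```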