
ICS 6N Computational Linear Algebra
Orthogonality and Least Squares
Xiaohui Xie
University of California, Irvine
[email protected]
Inner product

Let $x, y$ be vectors in $\mathbb{R}^n$. The inner product of $x$ and $y$ is defined to be
$$x \cdot y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n = x^T y$$
$x$ and $y$ are called orthogonal if $x \cdot y = 0$.
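A minimal NumPy check of the definition (the vectors here are illustrative, not from the slides):

```python
import numpy as np

x = np.array([1.0, -1.0, 0.0])
y = np.array([1.0, 1.0, 0.0])

# x . y = x1*y1 + ... + xn*yn, equivalently x^T y
print(np.dot(x, y))   # 0.0, so x and y are orthogonal
```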
Norm

The length (or norm) of a vector $x$ in $\mathbb{R}^n$ is defined by
$$\|x\| = \sqrt{x \cdot x} = \sqrt{x^T x}$$
It provides a measure of the length of a vector and satisfies the following three properties:
$\|x\| \geq 0$
Triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$
$\|cx\| = |c|\,\|x\|$
A vector whose length is 1 is called a unit vector.
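The same quantities in NumPy (example vector assumed):

```python
import numpy as np

x = np.array([3.0, 4.0])

print(np.sqrt(x @ x))       # 5.0, computed as sqrt(x . x)
print(np.linalg.norm(x))    # 5.0, the built-in norm

u = x / np.linalg.norm(x)   # unit vector in the direction of x
print(np.linalg.norm(u))    # 1.0
```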
Distance in $\mathbb{R}^n$

For $u$ and $v$ in $\mathbb{R}^n$, the distance between $u$ and $v$, written as $\mathrm{dist}(u, v)$, is the length of the vector $u - v$. That is,
$$\mathrm{dist}(u, v) = \|u - v\|$$
Distance in $\mathbb{R}^n$: an example

Compute the distance between the vectors $u = (7, 1)$ and $v = (3, 2)$.

Calculate
$$u - v = \begin{pmatrix} 7 \\ 1 \end{pmatrix} - \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \end{pmatrix}$$
$$\|u - v\| = \sqrt{4^2 + (-1)^2} = \sqrt{17}$$
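The same computation in NumPy:

```python
import numpy as np

u = np.array([7.0, 1.0])
v = np.array([3.0, 2.0])

print(np.linalg.norm(u - v))   # 4.1231..., i.e. sqrt(17)
```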
If $x \cdot y = 0$, what is $\|y - x\|$?

Solution:
$$\|y - x\|^2 = (y - x)^T (y - x) = (y^T - x^T)(y - x) = y^T y - y^T x - x^T y + x^T x = \|y\|^2 - 2x^T y + \|x\|^2$$
And since $x \cdot y = x^T y = 0$, we have
$$\|y - x\|^2 = \|y\|^2 + \|x\|^2$$
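A quick numerical check of this Pythagorean identity (the vectors are chosen to be orthogonal):

```python
import numpy as np

x = np.array([3.0, 0.0])
y = np.array([0.0, 4.0])   # x . y = 0

print(np.linalg.norm(y - x) ** 2)                        # 25.0
print(np.linalg.norm(y) ** 2 + np.linalg.norm(x) ** 2)   # 25.0
```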
Orthogonal set

A set of nonzero vectors $u_1, u_2, \dots, u_p$ in $\mathbb{R}^n$ is said to be orthogonal if $u_i \cdot u_j = 0$ for any $i \neq j$.

The set is orthonormal if it is orthogonal and $\|u_i\| = 1$ for $i = 1, 2, \dots, p$.
Example

$$u_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad u_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

We can check: $u_1^T u_2 = 0$, $u_1^T u_3 = 0$, $u_2^T u_3 = 0$. So it is an orthogonal set.

If $W = \mathrm{span}\{u_1, \dots, u_p\}$, then $u_1, \dots, u_p$ forms a basis for $W$, called an orthogonal basis.
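Checking the pairwise inner products in NumPy:

```python
import numpy as np

u1 = np.array([1.0, -1.0, 0.0])
u2 = np.array([1.0,  1.0, 0.0])
u3 = np.array([0.0,  0.0, 1.0])

print(u1 @ u2, u1 @ u3, u2 @ u3)   # 0.0 0.0 0.0: an orthogonal set
```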
Orthogonal basis

Let $W = \mathrm{span}\{u_1, \dots, u_p\}$, where $u_1, \dots, u_p$ is an orthogonal basis of $W$.

Let $x$ be a vector in $W$. Find $c_1, c_2, \dots, c_p$ such that
$$x = c_1 u_1 + c_2 u_2 + \dots + c_p u_p$$
Since $u_i^T x = u_i^T (c_1 u_1 + c_2 u_2 + \dots + c_p u_p) = c_i u_i^T u_i$,
$$c_i = \frac{u_i^T x}{u_i^T u_i}$$
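A sketch of this coordinate formula, using the orthogonal basis from the previous example (the test vector x is arbitrary):

```python
import numpy as np

us = [np.array([1.0, -1.0, 0.0]),
      np.array([1.0,  1.0, 0.0]),
      np.array([0.0,  0.0, 1.0])]
x = np.array([2.0, 4.0, 5.0])

# c_i = (u_i^T x) / (u_i^T u_i)
c = [(u @ x) / (u @ u) for u in us]
print(c)                                     # [-1.0, 3.0, 5.0]
print(sum(ci * u for ci, u in zip(c, us)))   # [2. 4. 5.], reconstructing x
```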
Matrix with orthonormal columns

Suppose $u_1, u_2, \dots, u_n$ is an orthonormal set in $\mathbb{R}^m$. Let $U$ be the $m \times n$ matrix
$$U = \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix}$$
Then
$$U^T U = \begin{pmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_n^T \end{pmatrix} \begin{pmatrix} u_1 & u_2 & \dots & u_n \end{pmatrix} = I_n$$
because the $(i, j)$ entry of $U^T U$ is $u_i^T u_j$, which is 1 when $i = j$ and 0 otherwise.
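For instance, normalizing $u_1, u_2$ from the earlier example gives a $3 \times 2$ matrix with orthonormal columns:

```python
import numpy as np

U = np.column_stack([np.array([1.0, -1.0, 0.0]) / np.sqrt(2),
                     np.array([1.0,  1.0, 0.0]) / np.sqrt(2)])

print(np.allclose(U.T @ U, np.eye(2)))   # True: U^T U = I_n
```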
Properties of matrices with orthonormal columns

(a) $\|Ux\| = \|x\|$ for any $x$ in $\mathbb{R}^n$
(b) $Ux \cdot Uy = x \cdot y$ for any $x, y$ in $\mathbb{R}^n$
(c) $Ux \cdot Uy = 0$ if $x \cdot y = 0$

Proof:
(a) $\|Ux\|^2 = (Ux)^T Ux = x^T U^T U x = x^T x = \|x\|^2$
(b) $Ux \cdot Uy = (Ux)^T Uy = x^T U^T U y = x^T y = x \cdot y$
(c) follows directly from (b).
Orthogonal matrix

An $n \times n$ matrix $U$ is called an orthogonal matrix if its column vectors are orthonormal. Then
$$U^T U = I \quad \text{and} \quad U^{-1} = U^T$$
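A numerical illustration of these facts (using QR factorization of a random matrix as a convenient source of an orthogonal matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q is orthogonal

print(np.allclose(np.linalg.inv(Q), Q.T))          # True: Q^{-1} = Q^T

x = rng.standard_normal(4)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # True: ||Qx|| = ||x||
```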
Orthogonal projection

Given a vector $u$ in $\mathbb{R}^n$, consider the problem of decomposing a vector $y$ in $\mathbb{R}^n$ into two components:
$$y = \hat{y} + z$$
where $\hat{y}$ is in $\mathrm{span}\{u\}$ and $z$ is orthogonal to $u$. $\hat{y}$ is called the orthogonal projection of $y$ onto $u$.
Orthogonal projection

Decompose $y = \hat{y} + z$. Let $\hat{y} = \alpha u$ for some scalar $\alpha$. Then
$$z = y - \hat{y} = y - \alpha u$$
Since $z \cdot u = 0$,
$$u^T (y - \alpha u) = 0$$
so we have $u^T y = \alpha u^T u$, and
$$\alpha = \frac{y^T u}{u^T u}, \qquad \mathrm{Proj}_u(y) = \hat{y} = \alpha u = \frac{y^T u}{u^T u}\, u$$
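The formula as a small helper (the function name proj is ours; the test vectors are illustrative):

```python
import numpy as np

def proj(u, y):
    """Orthogonal projection of y onto the line spanned by u."""
    return (y @ u) / (u @ u) * u

y = np.array([1.0, 2.0, 3.0])
u = np.array([1.0, 1.0, 0.0])

yhat = proj(u, y)
print(yhat)             # [1.5 1.5 0. ]
print((y - yhat) @ u)   # 0.0: the residual z is orthogonal to u
```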
Orthogonal projection: an example

Let $y = \begin{pmatrix} 7 \\ 6 \end{pmatrix}$ and $u = \begin{pmatrix} 4 \\ 2 \end{pmatrix}$. Find the orthogonal projection of $y$ onto $u$.
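Checking the arithmetic in NumPy ($\alpha = 40/20 = 2$, so $\hat{y} = (8, 4)$):

```python
import numpy as np

y = np.array([7.0, 6.0])
u = np.array([4.0, 2.0])

alpha = (y @ u) / (u @ u)   # 40 / 20 = 2.0
print(alpha * u)            # [8. 4.]
```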
Orthogonal projection

Let $W = \mathrm{span}\{u_1, \dots, u_p\}$ be a subspace of $\mathbb{R}^n$, where $u_1, \dots, u_p$ is an orthogonal set. Decompose $y$ into two components:
$$y = \hat{y} + z$$
where $\hat{y}$ is a vector in $W$ and $z$ is orthogonal to $W$. $\hat{y}$ is called the orthogonal projection of $y$ onto $W$.
Since $\hat{y}$ is in $W$, write
$$\hat{y} = c_1 u_1 + c_2 u_2 + \dots + c_p u_p$$
$z = y - \hat{y}$ is orthogonal to $W$, implying that $u_i \cdot z = 0$ for every $i$. From $u_i^T (y - \hat{y}) = 0$,
$$c_i = \frac{u_i^T y}{u_i^T u_i}$$
So the orthogonal projection of $y$ onto $W$ is
$$\mathrm{Proj}_W(y) = \hat{y} = \frac{u_1^T y}{u_1^T u_1}\, u_1 + \dots + \frac{u_p^T y}{u_p^T u_p}\, u_p$$
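The same formula as a short sketch (the function name proj_W is ours; it assumes us is an orthogonal set):

```python
import numpy as np

def proj_W(us, y):
    """Project y onto span{u_1, ..., u_p} for an orthogonal set us."""
    return sum((u @ y) / (u @ u) * u for u in us)
```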
Orthogonal projection: an example

Let $u_1 = \begin{pmatrix} 2 \\ 5 \\ -1 \end{pmatrix}$, $u_2 = \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}$ and $y = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$. Find the orthogonal projection of $y$ onto $W = \mathrm{Span}\{u_1, u_2\}$.
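Working the example numerically:

```python
import numpy as np

u1 = np.array([2.0, 5.0, -1.0])
u2 = np.array([-2.0, 1.0, 1.0])
y = np.array([1.0, 2.0, 3.0])

yhat = (u1 @ y) / (u1 @ u1) * u1 + (u2 @ y) / (u2 @ u2) * u2
print(yhat)                               # [-0.4  2.   0.2], i.e. (-2/5, 2, 1/5)
print((y - yhat) @ u1, (y - yhat) @ u2)   # both 0 (up to floating point)
```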
Best approximation theorem

Let $W$ be a subspace of $\mathbb{R}^n$, and let $y$ be any vector in $\mathbb{R}^n$. Let $\hat{y}$ be the orthogonal projection of $y$ onto $W$. Then $\hat{y}$ is the closest point in $W$ to $y$, in the sense that
$$\|y - \hat{y}\| < \|y - v\|$$
for all $v$ in $W$ distinct from $\hat{y}$.

PROOF: Take $v$ in $W$ distinct from $\hat{y}$. Then $\hat{y} - v$ is in $W$, so $y - \hat{y}$ is orthogonal to $\hat{y} - v$. By the Pythagorean theorem,
$$\|y - v\|^2 = \|y - \hat{y} + \hat{y} - v\|^2 = \|y - \hat{y}\|^2 + \|\hat{y} - v\|^2 > \|y - \hat{y}\|^2$$
Gram-Schmidt process for finding an orthogonal basis

Given a basis $x_1, \dots, x_p$ for a subspace $W$ of $\mathbb{R}^n$, find an orthogonal basis $v_1, \dots, v_p$ for $W$ such that for any $i = 1, \dots, p$
$$\mathrm{span}\{v_1, \dots, v_i\} = \mathrm{span}\{x_1, \dots, x_i\}$$

1. $v_1 = x_1$
2. $v_2 = x_2 - \mathrm{Proj}_{v_1}(x_2) = x_2 - \dfrac{x_2^T v_1}{v_1^T v_1}\, v_1$
3. $v_3 = x_3 - \mathrm{Proj}_{\mathrm{span}\{v_1, v_2\}}(x_3) = x_3 - \dfrac{x_3^T v_1}{v_1^T v_1}\, v_1 - \dfrac{x_3^T v_2}{v_2^T v_2}\, v_2$

   $\vdots$

4. $v_p = x_p - \mathrm{Proj}_{\mathrm{span}\{v_1, \dots, v_{p-1}\}}(x_p) = x_p - \dfrac{x_p^T v_1}{v_1^T v_1}\, v_1 - \dfrac{x_p^T v_2}{v_2^T v_2}\, v_2 - \dots - \dfrac{x_p^T v_{p-1}}{v_{p-1}^T v_{p-1}}\, v_{p-1}$
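A compact NumPy sketch of the process above (the function name gram_schmidt is ours; it assumes the input vectors are linearly independent):

```python
import numpy as np

def gram_schmidt(xs):
    """Orthogonalize a list of linearly independent vectors.

    Each v_i is x_i minus its projection onto span{v_1, ..., v_{i-1}}.
    """
    vs = []
    for x in xs:
        v = x - sum((x @ w) / (w @ w) * w for w in vs)
        vs.append(v)
    return vs
```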
Gram-Schmidt process: an example

Let $x_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$, $x_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}$ and $x_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}$. Construct an orthogonal basis for $W = \mathrm{Span}\{x_1, x_2, x_3\}$.
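Applying the formulas step by step to this example:

```python
import numpy as np

x1 = np.array([1.0, 1.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 1.0])
x3 = np.array([0.0, 0.0, 1.0, 1.0])

v1 = x1
v2 = x2 - (x2 @ v1) / (v1 @ v1) * v1
v3 = x3 - (x3 @ v1) / (v1 @ v1) * v1 - (x3 @ v2) / (v2 @ v2) * v2

print(v2)   # [-0.75  0.25  0.25  0.25], i.e. (-3/4, 1/4, 1/4, 1/4)
print(v3)   # approx [0. -0.667  0.333  0.333], i.e. (0, -2/3, 1/3, 1/3)
print(v1 @ v2, v1 @ v3, v2 @ v3)   # all 0 (up to floating point)
```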
Least squares problems

Suppose $Ax = b$ has no solution. Can we still find an $x$ such that $Ax$ is "closest" to $b$?

Most common case: $A$ is an $m \times n$ matrix with $m > n$. The system $Ax = b$ has more equations than variables, so in general there is no solution.

"Best solution" in the following sense: find $\hat{x}$ such that $A\hat{x}$ is the closest point to $b$. That is,
$$\|A\hat{x} - b\| \leq \|Ax - b\|$$
for all $x$ in $\mathbb{R}^n$. $\hat{x}$ is called the least-squares solution.
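NumPy can produce such a solution directly; a minimal sketch with illustrative data (the following slides derive what it computes):

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (data chosen for illustration)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 0.0])

xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(xhat)   # minimizes ||Ax - b|| over all x; here approx [0.333, 0.333]
```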
Find least-squares solutions of $Ax = b$

Problem: find $\hat{x}$ such that $A\hat{x}$ is closest to $b$.

The problem is equivalent to finding the point $\hat{b}$ in $\mathrm{Col}\,A$ that is closest to $b$.

From the best approximation theorem, the point in $\mathrm{Col}\,A$ closest to $b$ is the orthogonal projection of $b$ onto $\mathrm{Col}\,A$:
$$A\hat{x} = \hat{b} = \mathrm{proj}_{\mathrm{Col}\,A}\, b$$
Find least-squares solutions of $Ax = b$

The point in $\mathrm{Col}\,A$ closest to $b$ is
$$A\hat{x} = \mathrm{proj}_{\mathrm{Col}\,A}\, b$$
The residual $r = b - A\hat{x}$ is orthogonal to $\mathrm{Col}\,A$, implying that for every column $a_i$ of $A$,
$$(b - A\hat{x}) \perp a_i \implies a_i^T (b - A\hat{x}) = 0$$
Written in matrix form, $A^T (b - A\hat{x}) = 0$. So we have the normal equation
$$A^T A \hat{x} = A^T b$$
If $A^T A$ is invertible, then
$$\hat{x} = (A^T A)^{-1} A^T b$$
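A direct translation of the normal equation into NumPy (a sketch; the function name is ours, and it assumes $A^T A$ is invertible):

```python
import numpy as np

def least_squares(A, b):
    # Solve the normal equation A^T A xhat = A^T b
    return np.linalg.solve(A.T @ A, A.T @ b)
```

In practice np.linalg.lstsq (based on an orthogonal factorization) is preferred numerically, since forming $A^T A$ squares the condition number; the normal equation is used here because it mirrors the derivation.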
Least squares problems: an example

Find a least-squares solution of the inconsistent system $Ax = b$ for
$$A = \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 0 \\ 11 \end{pmatrix}$$
Compute
$$A^T A = \begin{pmatrix} 17 & 1 \\ 1 & 5 \end{pmatrix}, \quad A^T b = \begin{pmatrix} 19 \\ 11 \end{pmatrix}$$
Solving the normal equation $A^T A \hat{x} = A^T b$ by Gaussian elimination,
$$\hat{x} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
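The same numbers, checked in NumPy:

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

print(A.T @ A)                             # [[17. 1.] [1. 5.]]
print(A.T @ b)                             # [19. 11.]
print(np.linalg.solve(A.T @ A, A.T @ b))   # [1. 2.]
```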