Chapter 6: Orthogonality
In this chapter, we will introduce the concept of orthogonality. The concept
of orthogonality plays a vital role in linear algebra, probability theory and
functional analysis. Topics in this chapter are listed as follows:
Section 1.1: Scalar Products in a Euclidean Space
Section 1.2: Orthogonality
Section 1.3: Projections
Section 1.4: Least Squares Problems
Section 1.1: Scalar Products in a Euclidean Space
1. Definition of scalar or inner products:
Suppose V is a vector space.
For any x, y ∈ V, the scalar or inner product of x and y, denoted as <x, y>,
is a real number such that it satisfies the following three properties:
• <a x1 + b x2, y> = a <x1, y> + b <x2, y>, for all x1, x2, y ∈ V and a, b ∈ R
• <x, y> = <y, x>
• <x, x> ≥ 0; <x, x> = 0 iff x = 0
2. Definition of a Euclidean space:
A vector space V with a scalar product is called a real Euclidean space
3. A typical example of Euclidean spaces:
Consider the vector space of all n-dimensional real vectors Rn
For any two vectors x = (x1, x2, …, xn) and y = (y1, y2, …, yn),
we define the scalar product as
<x, y> = x1y1 + x2y2 + x3y3 + … + xnyn
Then, the vector space with the scalar product <x, y> is a Euclidean space
4. Example:
Suppose x = (1, 2, 3) and y = (2, 3, 6)
Then,
<x, y> = 1(2) + 2(3) + 3(6) = 26
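As a quick numerical check of this example (a minimal sketch in Python, assuming NumPy is available), the scalar product on Rn is simply the dot product:

    import numpy as np

    x = np.array([1, 2, 3])
    y = np.array([2, 3, 6])
    print(np.dot(x, y))   # 1*2 + 2*3 + 3*6 = 26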
5. Question:
How many inner or scalar products can we define on Rn?
6. Example of Euclidean spaces:
Consider the vector space of all m × n matrices M(m, n)
For any A = (aij), B = (bij) ∈ M(m, n), we define the inner product as
follows:
<A, B> = Σi Σj aij bij, where i runs from 1 to m and j runs from 1 to n
That is, the inner product is defined as the sum of the products of the
corresponding elements of A and B
Then, the vector space with the inner product <A, B> is a Euclidean space
7. Definition of trace of a square matrix:
The trace of a square matrix is the sum of all its diagonal elements
We denote the trace of a square matrix A by tr(A)
Example:
Let A = [ 1  2  1 ]
        [ 3  2  1 ]
        [ 3  3  1 ]
then tr(A) = 1 + 2 + 1 = 4
By using the concept of the trace of a square matrix, we can write the inner
product <A, B> as follows:
<A, B> = tr(AB^T)
Note that AB^T is an m × m matrix
Exercise: Check that <A, B> = tr(AB^T)
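A small numerical sketch of this exercise (illustrative only, assuming NumPy): generate two random m × n matrices and compare the element-wise sum of products with the trace of AB^T.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))   # m = 3, n = 2
    B = rng.standard_normal((3, 2))

    elementwise = np.sum(A * B)       # sum of aij * bij
    via_trace = np.trace(A @ B.T)     # tr(A B^T)
    print(np.isclose(elementwise, via_trace))   # True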
8. Definition of norm:
Suppose E is a Euclidean space.
For any vector x ∈ E, we define the norm of the vector x as a non-negative
real number || x || as follows:
|| x || = √<x, x>
Interpretation or meaning of norm:
It is a generalization of the concept of length of a vector
Example:
Consider the space of n-dimensional real vectors Rn
For x = (x1, x2, …, xn) ∈ Rn, the norm of x is given by
|| x || = (x1^2 + x2^2 + x3^2 + … + xn^2)^(1/2)
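A minimal Python sketch of this norm (assuming NumPy is available):

    import numpy as np

    x = np.array([1, 2, 3])
    print(np.sqrt(np.dot(x, x)))   # sqrt(1 + 4 + 9) = sqrt(14), about 3.742
    print(np.linalg.norm(x))       # same value from NumPy's built-in norm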
9. Cauchy-Schwarz inequality:
Suppose E is a Euclidean space.
For any vectors x, y ∈ E,
<x, y>^2 ≤ <x, x> <y, y>
Equivalently, we have
|<x, y>| ≤ || x || || y ||
Proof:
Let z = tx + y, where x, y ∈ E and t ∈ R
Then,
<z, z> = <tx + y, tx + y> = t^2 <x, x> + 2t <x, y> + <y, y> ≥ 0
Since this quadratic expression in t is non-negative for every t, the quadratic
equation t^2 <x, x> + 2t <x, y> + <y, y> = 0 can have either exactly one real
root or no real roots
Thus, its discriminant satisfies
D = 4<x, y>^2 - 4<x, x><y, y> ≤ 0
=> <x, y>^2 ≤ <x, x> <y, y>
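A quick numerical illustration (a sketch assuming NumPy; not part of the proof): for randomly chosen vectors the inequality always holds.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(5)
    y = rng.standard_normal(5)
    lhs = abs(np.dot(x, y))
    rhs = np.linalg.norm(x) * np.linalg.norm(y)
    print(lhs <= rhs)   # True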
10. Properties of a norm:
• || x || ≥ 0; || x || = 0 if and only if x = 0
• || a x || = | a | || x ||, for all x ∈ E and a ∈ R
• || x + y || ≤ || x || + || y ||, for all x, y ∈ E (triangle inequality)
Remark: The triangle inequality means that the sum of the lengths of any two
sides of a triangle is greater than or equal to the length of the third side
Section 1.2: Orthogonality
1. Definition of the angle between two vectors:
Suppose E is a Euclidean space.
Let θ be the angle between any two non-zero vectors x, y ∈ E
Then, θ is defined by the following equation:
cos θ = <x, y>/(|| x || || y ||)
Note that by the Cauchy-Schwarz inequality
|<x, y>| ≤ || x || || y ||
=> - || x || || y || ≤ <x, y> ≤ || x || || y ||
=> -1 ≤ <x, y>/(|| x || || y ||) ≤ 1
=> -1 ≤ cos θ ≤ 1
2. Note that we can define the angle between any two functions, any two
polynomials or any two matrices, as long as we can define the
corresponding Euclidean spaces.
It is not necessary to restrict the definition of the angle to vectors in
R2, R3 or Rn.
3. Exercise:
Consider two vectors x = (1, 2, 3) and y = (2, 4, 6) in R3
Find the angle θ between x and y
Solution:
cos θ = <x, y>/(|| x || || y ||)
cos θ = [(1)(2) + (2)(4) + (3)(6)] / [√(1^2 + 2^2 + 3^2) √(2^2 + 4^2 + 6^2)]
      = 28/(√14 √56) = 28/28 = 1
=> θ = 0°
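A short numerical check of this exercise (a sketch assuming NumPy):

    import numpy as np

    x = np.array([1, 2, 3])
    y = np.array([2, 4, 6])
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    print(theta)   # 0.0, since y = 2x and the vectors are parallel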
4. Exercise:
Consider the following two matrices in M(2, 2):
A = [ 1  2 ],   B = [ 1  0 ]
    [ 1  3 ]        [ 1  2 ]
Find the angle θ between A and B
Solution:
cos θ = <A, B>/(|| A || || B ||)
cos θ = [(1)(1) + (2)(0) + (1)(1) + (3)(2)] / [√(1^2 + 2^2 + 1^2 + 3^2) √(1^2 + 0^2 + 1^2 + 2^2)]
      = 8/(√15 √6) = 8/(3√10)
=> θ = cos^-1(8/(3√10))
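The same computation in Python (an illustrative sketch assuming NumPy), using the element-wise inner product from Section 1.1; note that NumPy's default matrix norm is exactly this Frobenius-type norm:

    import numpy as np

    A = np.array([[1, 2], [1, 3]])
    B = np.array([[1, 0], [1, 2]])
    inner = np.sum(A * B)                                        # <A, B> = 8
    cos_theta = inner / (np.linalg.norm(A) * np.linalg.norm(B))  # 8/(sqrt(15)*sqrt(6))
    print(np.arccos(cos_theta))                                  # about 0.569 radians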
5. Definition of normalization:
Consider a non-zero vector x in a Euclidean space E
The normalization of the vector x is defined by
x / || x ||
6. Exercise:
Consider the following matrix in M(2, 2):
A = [ 1  2 ]
    [ 1  3 ]
Find the normalization of A
Solution
|| A || = √(1^2 + 2^2 + 1^2 + 3^2) = √15
N(A) = A / || A || = (1/√15) [ 1  2 ]
                             [ 1  3 ]
7. Definition of Orthogonality:
Two vectors x, y in a Euclidean space E are said to be orthogonal
if
<x, y> = 0
Note that if x, y are orthogonal to each other,
<x, y> = 0
=> cos θ = 0
=> θ = 90°
Section 1.3: Projections
1. Definition of a projection:
Consider two vectors w and v in a Euclidean space E
The projection of the vector v along the vector w, denoted as projw v, is
equal to cw, for some c ∈ R, such that
v - cw
is orthogonal to w
2. Graphically, projw v is the vector along w obtained by dropping a
perpendicular from v onto the line through w (figure omitted)
3. In order to find projw v, we first have to find the scalar c
From the definition of projw v, we notice that v - cw and w are orthogonal
Hence, we have
<v – cw, w> = 0
=> <v, w> - c <w, w> = 0
=> c = <v, w>/<w, w>
This implies that
projw v = (<v, w>/<w, w>) w
4. Orthogonal component:
Note that from the definition v - projwv is orthogonal to w
We call the vector v - projwv the orthogonal component of the projection of
v along w
Hence, the orthogonal component is given by
v - (<v, w>/<w, w>) w
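A minimal Python sketch of these two formulas (assuming NumPy); the helper name proj is ours, for illustration only:

    import numpy as np

    def proj(v, w):
        # projection of v along w: (<v, w>/<w, w>) w
        return (np.dot(v, w) / np.dot(w, w)) * w

    v = np.array([1.0, 2.0, 3.0])
    w = np.array([2.0, 6.0, 3.0])
    p = proj(v, w)
    print(np.dot(v - p, w))   # about 0: the orthogonal component is orthogonal to w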
5. Exercise:
Consider the following two vectors in R3
x = (1, 2, 3)^T, y = (2, 6, 3)^T
(a) Find the projection of x along y
(b) Find the orthogonal component of the projection in part (a)
Solution
(a) c = <x, y>/<y, y> = [(1)(2) + (2)(6) + (3)(3)] / (2^2 + 6^2 + 3^2) = 23/49
    projy x = (23/49) y = (23/49) (2, 6, 3)^T = (46/49, 138/49, 69/49)^T
(b) orthogonal component = x - (<x, y>/<y, y>) y
    = (1, 2, 3)^T - (23/49) (2, 6, 3)^T
    = (3/49, -40/49, 78/49)^T
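A numerical check of this exercise (a sketch assuming NumPy), reusing the projection formula above:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 6.0, 3.0])
    p = (np.dot(x, y) / np.dot(y, y)) * y
    print(p)       # [46/49, 138/49, 69/49], about [0.939, 2.816, 1.408]
    print(x - p)   # [3/49, -40/49, 78/49], about [0.061, -0.816, 1.592]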
6. Exercise:
Consider the following two matrices in M(2, 2):
A = [ 1  2 ],   B = [ 1  1 ]
    [ 1  3 ]        [ 1  2 ]
(a) Find the projection of A along B
(b) Find the orthogonal component of the projection in part (a)
Solution
(a) c = <A, B>/<B, B> = [(1)(1) + (2)(1) + (1)(1) + (3)(2)] / (1^2 + 1^2 + 1^2 + 2^2) = 10/7
    projB A = (10/7) B = (10/7) [ 1  1 ] = [ 10/7  10/7 ]
                                [ 1  2 ]   [ 10/7  20/7 ]
(b) orthogonal component = A - (<A, B>/<B, B>) B
    = [ 1  2 ] - (10/7) [ 1  1 ] = [ -3/7   4/7 ]
      [ 1  3 ]          [ 1  2 ]   [ -3/7   1/7 ]
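The same check in Python for the matrix case (a sketch assuming NumPy), with the element-wise inner product <A, B> = Σ aij bij:

    import numpy as np

    A = np.array([[1.0, 2.0], [1.0, 3.0]])
    B = np.array([[1.0, 1.0], [1.0, 2.0]])
    c = np.sum(A * B) / np.sum(B * B)   # <A, B>/<B, B> = 10/7
    print(A - c * B)                    # [[-3/7, 4/7], [-3/7, 1/7]]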
Section 1.4: Least Squares Problems
A standard technique in mathematical and statistical modeling is to find a
least squares fit to a set of data points in the plane. The least squares curve is
usually the graph of a standard type of function, such as a linear function, a
polynomial, or a trigonometric polynomial.
1. Least Squares Solutions to Overdetermined Systems
Given a system of equations Ax = b, where A is an m × n matrix with m > n and
b ∈ Rm, for each x ∈ Rn we can form a residual
r(x) = b - Ax
The distance between b and Ax is given by
|| b - Ax || = || r(x) ||
We wish to find a vector x ∈ Rn for which || r(x) || will be a minimum.
Minimizing || r(x) || is equivalent to minimizing || r(x) ||^2. A vector x̂ that
accomplishes this is said to be a least squares solution to the system Ax = b.
If x̂ is a least squares solution to the system Ax = b and p = Ax̂, then p is a
vector in the column space of A that is closest to b. The following theorem
guarantees that such a closest vector p not only exists, but is unique.
Additionally, it provides an important characterization of the closest vector.
2. Theorem 1
Let S be a subspace of Rm. For each b ∈ Rm there is a unique element p of S
that is closest to b; that is,
|| b - y || > || b - p ||
for any y ≠ p in S. Furthermore, a given vector p in S will be closest to a given
vector b ∈ Rm if and only if b - p ⊥ S.
3. Theorem 2
If A is an m × n matrix of rank n, the normal equations
A^T A x = A^T b
have a unique solution
x̂ = (A^T A)^-1 A^T b
and x̂ is the unique least squares solution to the system Ax = b.
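A short Python sketch of Theorem 2 (assuming NumPy): solve the normal equations directly and compare with NumPy's built-in least squares routine; the random data below is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 2))    # overdetermined: m = 5 > n = 2
    b = rng.standard_normal(5)

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # solve A^T A x = A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # NumPy's least squares solver
    print(np.allclose(x_normal, x_lstsq))             # True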
4. Exercise
Find the least squares solution to the system
x1 + x2 = 3
-2x1 + 3x2 = 1
2x1 - x2 = 2
Solution
The normal equations for this system are
A^T A x = A^T b, that is,

[ 1  -2   2 ] [  1   1 ] [ x1 ]   [ 1  -2   2 ] [ 3 ]
[ 1   3  -1 ] [ -2   3 ] [ x2 ] = [ 1   3  -1 ] [ 1 ]
              [  2  -1 ]                        [ 2 ]

This simplifies to the 2 × 2 system

[  9  -7 ] [ x1 ]   [ 5 ]
[ -7  11 ] [ x2 ] = [ 4 ]

The solution to the 2 × 2 system is (83/50, 71/50)^T
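A quick check of this exercise in Python (an illustrative sketch assuming NumPy):

    import numpy as np

    A = np.array([[1.0, 1.0], [-2.0, 3.0], [2.0, -1.0]])
    b = np.array([3.0, 1.0, 2.0])
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    print(x_hat)        # [1.66, 1.42] = (83/50, 71/50)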
5. Given a table of data

x:   x1   x2   …   xm
y:   y1   y2   …   ym
We wish to find a linear function
y = c0 + c1 x
that best fits the data in the least squares sense. If we require that
yi = c0 + c1 xi,   for i = 1, …, m
we get a system of m equations in two unknowns.
[ 1   x1 ]            [ y1 ]
[ 1   x2 ] [ c0 ]     [ y2 ]
[ :    : ] [ c1 ]  =  [  : ]
[ 1   xm ]            [ ym ]        (1)
The linear function whose coefficients are the least squares solution to (1) is
said to be the best least squares fit to the data by a linear function.
6. Exercise
Given the data

x:   0   3   6
y:   1   4   5
Find the best least squares fit by a linear function.
Solution
For this example, the system (1) becomes
Ac = y
where
A = [ 1  0 ],   c = [ c0 ],   y = [ 1 ]
    [ 1  3 ]        [ c1 ]        [ 4 ]
    [ 1  6 ]                      [ 5 ]
The normal equations
A^T A c = A^T y
simplify to

[ 3    9 ] [ c0 ]   [ 10 ]
[ 9   45 ] [ c1 ] = [ 42 ]
The solution of this system is (4/3, 2/3). Thus the best linear least squares fit
is given by
y = 4/3 + (2/3) x
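A short Python sketch of this fit (assuming NumPy), building the design matrix of system (1) and solving the normal equations:

    import numpy as np

    x = np.array([0.0, 3.0, 6.0])
    y = np.array([1.0, 4.0, 5.0])
    A = np.column_stack([np.ones_like(x), x])   # design matrix with columns 1, x
    c = np.linalg.solve(A.T @ A, A.T @ y)
    print(c)   # [1.333..., 0.666...] = (4/3, 2/3)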
If the data does not resemble a linear function, we could use a higher-degree
polynomial. To find the coefficients c0, c1, c2, …., cn of the best least squares
fit to the data
x:   x1   x2   …   xm
y:   y1   y2   …   ym
by a polynomial of degree n, we must find the least squares solution to the
system
1 x1

1 x2
:

1 xm
(2)
x1
2
x2
2
xm
2
n
x1  c0   y 1 
n 
  
... x2   c1   y 2 

 :   : 
  
n 
... xm  cn   y n 
...
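In Python, the design matrix of system (2) can be built with NumPy's Vandermonde helper (a sketch, assuming NumPy; the sample values of x are arbitrary):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    n = 2                                      # degree of the fitting polynomial
    A = np.vander(x, n + 1, increasing=True)   # columns: 1, x, x^2
    print(A)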
7. Exercise
Find the best quadratic least squares fit to the data

x:   0   1   2   3
y:   3   2   4   4
Solution
For this example, the system (2) becomes
1
1

1

1
0 0
 3
c


0
1 1     2
 c1   
2 4    4
 c2   
3 9
 4
Thus the normal equations are
1
1 1 1 1  
0 1 2 3 1

 1
0 1 4 9 
1
0 0
 3
c
1
1
1
1




0
 2
1 1   
 c1  0 1 2 3  
  4
2 4   



  
c
0
1
4
9
 2  
3 9
 4
These simplify to

[  4    6   14 ] [ c0 ]   [ 13 ]
[  6   14   36 ] [ c1 ] = [ 22 ]
[ 14   36   98 ] [ c2 ]   [ 54 ]
The solution to this system is (2.75, -0.25, 0.25). The quadratic polynomial
that gives the best least squares fit to the data is
p(x) = 2.75 - 0.25x + 0.25x^2
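A final Python check of this quadratic fit (a sketch assuming NumPy):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([3.0, 2.0, 4.0, 4.0])
    A = np.vander(x, 3, increasing=True)    # columns: 1, x, x^2
    c = np.linalg.solve(A.T @ A, A.T @ y)
    print(c)   # [ 2.75  -0.25  0.25 ]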
End of this chapter