Recall: Dot product on R^2 :
u · v = (u1, u2) · (v1, v2) = u1 v1 + u2 v2 ,
u · u = u1^2 + u2^2 = ||u||^2 .
Geometric Meaning:
u · v = ||u|| ||v|| cos θ.
[Figure: the vectors u and v drawn from a common point, with angle θ between them.]
Reason: The side opposite to θ is given by u − v.
||u − v||^2 = (u − v) · (u − v)
            = u · u − v · u − u · v + v · v
            = ||u||^2 + ||v||^2 − 2 u · v.
By the Cosine Law: c^2 = a^2 + b^2 − 2ab cos θ, i.e.
||u − v||^2 = ||u||^2 + ||v||^2 − 2 ||u|| ||v|| cos θ.
Comparing the two equalities, we get:
u · v = ||u|| ||v|| cos θ.
Inner Product → Generalization of dot product.
Direct generalization to R^n :
u · v := u1 v1 + . . . + un vn = Σ_{i=1}^n ui vi .
Using matrix notation:
Σ_{i=1}^n ui vi = [ u1 . . . un ] [ v1 . . . vn ]^T = u^T v = v^T u.
This is called the (standard) inner product on R^n .
Thm 1 (P.359): Let u, v, w ∈ Rn and c ∈ R. Then:
(i) u · v = v · u;
(ii) (u + v) · w = u · w + v · w;
(iii) (cu) · v = c(u · v);
(iv) u · u ≥ 0, and u · u = 0 iff u = 0.
Note: (iv) is sometimes called the “positive-definite” property.
• A general inner product is defined using the above 4 properties.
• For a complex inner product, a complex conjugate must be added to (i).
Def: The length (or norm) of v is defined as:
||v|| := √(v · v).
||v|| = 1 : called unit vectors.
Def: The distance between u, v is defined as:
dist(u, v) := ||u − v||.
Def: The angle between u, v is defined as:
∠(u, v) := cos^{-1} ( (u · v) / (||u|| ||v||) ).
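As a quick numerical illustration of these definitions (a sketch assuming NumPy; the vectors are arbitrary examples, not taken from the notes):

    import numpy as np

    u = np.array([3.0, 4.0])
    v = np.array([4.0, -3.0])

    norm_u = np.sqrt(u @ u)                  # ||u|| = sqrt(u . u) = 5
    dist_uv = np.linalg.norm(u - v)          # dist(u, v) = ||u - v||
    angle = np.arccos((u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    print(norm_u, dist_uv, angle)            # 5.0, 7.07..., 1.5707... (= pi/2, since u . v = 0)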
Extra: The General Inner Product Space
Let V be a vector space over R or C.
Def: An inner product on V is a real/complex-valued function of two vector variables < u, v > such that:
(a) < u, v > = \overline{< v, u >};  (conjugate symmetric)
(b) < u + v, w > = < u, w > + < v, w >;
(c) < cu, v > = c < u, v >;  (linear in the first vector variable)
(d) < u, u > ≥ 0, and < u, u > = 0 iff u = 0.  (positive-definite property)
Def: A real/complex vector space V equipped with an inner
product is called an inner product space.
Note: (i) An inner product is conjugate linear in the second
vector variable:
< u, cv1 + dv2 > = c̄ < u, v1 > + d̄ < u, v2 > .
(ii) If we replace (a) by < u, v > “=” < v, u > (no conjugate), consider:
< iu, iu > “=” i^2 < u, u > = − < u, u >,
which is incompatible with (d).
• When working with a complex inner product space, we must take the complex conjugate when interchanging u and v.
Examples of (general) inner product spaces:
1. The dot product on C^n (∗ : conjugate transpose):
< u, v > := u1 v̄1 + . . . + un v̄n = v∗ u.
2. A non-standard inner product on R^2 :
< u, v > := u1 v1 − u1 v2 − u2 v1 + 2 u2 v2 = v^T [ 1  −1 ; −1  2 ] u.
3. An inner product on the matrix space M_{m×n} :
< A, B > := tr(B∗ A) = Σ_{j=1}^m Σ_{k=1}^n ajk b̄jk .
4. Consider the vector space V of continuous real/complex-valued functions defined on the interval [a, b]. Then the following is an inner product on V :
< f, g > := (1/(b − a)) ∫_a^b f(t) ḡ(t) dt.
[In the real case, the norm ||f|| gives the “root-mean-square” (r.m.s.) of the area bounded by the curve of f and the t-axis over the interval [a, b].]
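Example 3 can be checked numerically (a sketch assuming NumPy; the matrices below are arbitrary):

    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    B = np.array([[0.0, 1.0], [1.0, 1.0]])

    ip_trace = np.trace(B.conj().T @ A)    # <A, B> = tr(B* A)
    ip_sum = np.sum(A * B.conj())          # = sum over j,k of a_jk * conj(b_jk)

    print(ip_trace, ip_sum)                # both equal 9.0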
Schwarz’s inequality:
(a1 b1 + . . . + an bn)^2 ≤ (a1^2 + . . . + an^2)(b1^2 + . . . + bn^2).
Pf: The LHS below is a sum of squares, hence ≥ 0 for every real x, so the following quadratic equation cannot have two distinct real roots:
(a1 x + b1)^2 + . . . + (an x + bn)^2 = 0,
i.e. (a1^2 + . . . + an^2) x^2 + 2(a1 b1 + . . . + an bn) x + (b1^2 + . . . + bn^2) = 0.
So the discriminant ∆ ≤ 0, and this gives the inequality.
The Cauchy-Schwarz Inequality:
|u · v| ≤ ||u|| · ||v||,
and equality holds if, and only if, {u, v} is l.d.
Proof: When u ≠ 0, set û = (1/||u||) u. Consider w = v − (v · û)û.
[Figure: v decomposed into (v · û)û along u and the component w perpendicular to u.]
Obviously, w · w = ||w||^2 ≥ 0 ⇒ Cauchy-Schwarz Inequality.
Set k = v · û:
0 ≤ (v − k û) · (v − k û)
  = v · v − 2k (v · û) + k^2 (û · û)
  = ||v||^2 − k^2 .
Note that k = (v · u)/||u||, so:
k^2 = (v · u)^2 / ||u||^2 ≤ ||v||^2   ⇒   (u · v)^2 ≤ ||u||^2 · ||v||^2 .
Taking positive square roots, we obtain the result.
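A quick numerical sanity check of the inequality and its equality case (a sketch assuming NumPy; the vectors are randomly generated):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.standard_normal(5)
    v = rng.standard_normal(5)

    print(abs(u @ v) <= np.linalg.norm(u) * np.linalg.norm(v))    # True

    # equality exactly when {u, v} is l.d., e.g. v = 3u:
    w = 3 * u
    print(np.isclose(abs(u @ w), np.linalg.norm(u) * np.linalg.norm(w)))    # True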
Thm: (Triangle Inequality) For u, v ∈ Rn :
||u + v|| ≤ ||u|| + ||v||,
and equality holds iff one of the vectors is a non-negative
scalar multiple of the other.
Proof: Consider ||u + v||^2 :
(u + v) · (u + v) = ||u||^2 + 2(u · v) + ||v||^2
                  ≤ ||u||^2 + 2||u|| · ||v|| + ||v||^2
                  = (||u|| + ||v||)^2 .
Taking square roots, we obtain the inequality.
Orthogonality: Pythagoras’ Theorem in vector form:
||u + v||^2 = ||u||^2 + ||v||^2 .
[Figure: right triangle with legs u, v and hypotenuse u + v.]
But in general we have:
||u + v||^2 = ||u||^2 + 2(u · v) + ||v||^2 ,
so we need u · v = 0.
Def: Let u, v be two vectors in Rn . When u · v = 0, we say
that u is orthogonal to v, denoted by u ⊥ v.
• This generalizes the concept of perpendicularity.
• 0 is the only vector that is orthogonal to every vector v
in Rn .
Example: In R^2 , we have:
[ 3, 4 ]^T ⊥ [ 4, −3 ]^T .
Thm 2 (P.362): u and v are orthogonal iff
||u + v||^2 = ||u||^2 + ||v||^2 .
Common Orthogonality:
Def: Let S be a set of vectors in Rn . If u is orthogonal to
every vector in S, we will say “u is orthogonal to S”, denoted
by u ⊥ S.
i.e. we can regard u as a “common perpendicular” to S.
Examples: (i) 0 ⊥ Rn .
(ii) In R2 , let S = x-axis. Then e2 ⊥ S.
(iii) In R3 , let S = x-axis. Then both e2 ⊥ S and e3 ⊥ S.
Exercise: Let u, v ⊥ S. Show that:
(i) (au + bv) ⊥ S for any numbers a, b;
(ii) u ⊥ Span S.
***
Orthogonal Complement:
Def: Let S be a set of vectors in Rn . We define:
S ⊥ := {u ∈ Rn | u ⊥ S},
called the orthogonal complement of S in Rn .
i.e. S ⊥ collects all the “common perpendiculars” to S.
Examples: (i) {0}⊥ = Rn , (Rn )⊥ = {0}.
(ii) In R2 , let S = x-axis. Then S ⊥ = y-axis.
(iii) In R3 , take S = {e1 }. Then S ⊥ = yz-plane.
Thm: S ⊥ is always a subspace of Rn .
Checking: (i) 0 ⊥ v for every v ∈ S. So 0 ∈ S ⊥ .
(ii) Pick any u1 , u2 ∈ S ⊥ . For any scalars a, b ∈ R, consider:
(au1 + bu2 ) · v = a(u1 · v) + b(u2 · v) = a · 0 + b · 0 = 0,
whenever v ∈ S. So au1 + bu2 ∈ S ⊥ (cf. previous exercise).
Note: S itself need not be a subspace.
Thm: (a) S ⊥ = (Span S)⊥ . (b) Span S ⊆ (S ⊥ )⊥ .
Pf: (a) S ⊥ ⊇ (Span S)⊥ is easy to see, since any vector
u ⊥ Span S must also satisfy u ⊥ S.
Now, pick any u ∈ S ⊥ . For every v ∈ Span S, write:
v = c1 v1 + . . . + cp vp ,
vi ∈ S, i = 1, . . . , p.
Then since u ⊥ S:
u · v = c1 (u · v1 ) + . . . + cp (u · vp ) = 0,
and hence u ∈ (Span S)⊥ , so “S ⊥ ⊆ (Span S)⊥ ” is proved.
(b) Pick a vector w ∈ Span S and write it as a l.c.:
w = c1 v1 + . . . + cp vp .
For any u ∈ S ⊥ :
w · u = c1 (v1 · u) + . . . + cp (vp · u) = 0
⇒ w ∈ (S⊥)⊥ .
Thm 3 (P.363): Let A be an m × n matrix. Then:
(Row A)⊥ = Nul A   and   (Col A)⊥ = Nul A^T .
Pf: Write the rows of A as r1^T, . . . , rm^T. The product Ax can be rewritten as:
Ax = [ r1^T ; . . . ; rm^T ] x = [ r1 · x ; . . . ; rm · x ] .
So x ∈ Nul A ⇔ x ∈ {r1, . . . , rm}⊥ ⇔ x ∈ (Row A)⊥ .
Hence (Row A)⊥ = Nul A. Applying the result to A^T, we obtain:
(Col A)⊥ = (Row A^T)⊥ = Nul A^T .
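A small numerical illustration of Thm 3 (a sketch assuming NumPy; the matrix A is an arbitrary example, and a null-space basis is taken from the SVD):

    import numpy as np

    A = np.array([[1.0, 1.0, -1.0, -1.0],
                  [1.0, 2.0,  0.0,  3.0]])

    # A basis of Nul A: right singular vectors for the (near-)zero singular values.
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-10))
    N = Vt[rank:].T                      # columns span Nul A

    print(np.allclose(A @ N, 0))         # True: every row of A is orthogonal to Nul A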
Orthogonal sets and Orthonormal sets
Def: A set S is called orthogonal if any two vectors in S are
always orthogonal to each other.
Def: A set S is called orthonormal if (i) S is orthogonal, and
(ii) each vector in S is of unit length.
Example: Orthonormal set:
{ (1/√11) [ 3, 1, 1 ]^T ,  (1/√6) [ −1, 2, 1 ]^T ,  (1/√66) [ −1, −4, 7 ]^T } .
Thm 4 (P.366): An orthogonal set S of non-zero vectors is
always linearly independent.
Pf: Let S = {u1 , u2 , . . . , up } and consider the relation:
c1 u1 + c2 u2 + . . . + cp up = 0.
Take inner product with u1 , then:
c1 (u1 · u1) + c2 (u2 · u1) + . . . + cp (up · u1) = 0 · u1 ,
c1 ||u1||^2 + c2 · 0 + . . . + cp · 0 = 0.
As ||u1|| ≠ 0, we must have c1 = 0. Similarly for other ci.
So S must be l.i.
The method of proof of previous Thm 4 gives:
Thm 5 (P.367): Let S = {u1 , . . . , up } be an orthogonal set
of non-zero vectors and let v ∈ Span S. Then:
v = ( (v · u1)/||u1||^2 ) u1 + . . . + ( (v · up)/||up||^2 ) up .
Pf: Let c1, . . . , cp be such that v = c1 u1 + . . . + cp up. Taking the inner product with u1, we have:
v · u1 = c1 (u1 · u1) + . . . + cp (up · u1) = c1 ||u1||^2 .
So c1 = (v · u1)/||u1||^2 . Similarly for other ci.
Thm 5′ : Let S = {û1 , . . . ûp } be an orthonormal set. Then
for any v ∈ Span S, we have:
v = (v · û1 )û1 + . . . + (v · ûp )ûp .
Remark: This generalizes our familiar expression in R3 :
v = (v · i)i + (v · j)j + (v · k)k.
Example: Express v as a l.c. of the vectors in S:
v = [ 1, 2, 3 ]^T ,   S = { [ 3, 1, 1 ]^T , [ −1, 2, 1 ]^T , [ −1, −4, 7 ]^T } .
New method: Compute c1 , c2 , c3 directly:
c1 = ( [1, 2, 3]^T · [3, 1, 1]^T ) / || [3, 1, 1]^T ||^2 = 8/11 ,
c2 = ( [1, 2, 3]^T · [−1, 2, 1]^T ) / || [−1, 2, 1]^T ||^2 = 6/6 = 1 ,
c3 = ( [1, 2, 3]^T · [−1, −4, 7]^T ) / || [−1, −4, 7]^T ||^2 = 12/66 = 2/11 .
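These coefficients can be verified in code (a sketch assuming NumPy, using the vectors of this example):

    import numpy as np
    from fractions import Fraction

    v = np.array([1, 2, 3])
    S = [np.array([3, 1, 1]), np.array([-1, 2, 1]), np.array([-1, -4, 7])]

    coeffs = [Fraction(int(v @ u), int(u @ u)) for u in S]
    print(coeffs)                                    # [Fraction(8, 11), Fraction(1, 1), Fraction(2, 11)]

    # reconstruct v from the orthogonal expansion
    v_rebuilt = sum(float(c) * u for c, u in zip(coeffs, S))
    print(np.allclose(v_rebuilt, v))                 # True, so v is in Span S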
Exercise: Determine if v ∈ Span {u1 , u2 }:
v = [ 3, 2, −5 ]^T ,   u1 = [ 1, 2, 2 ]^T ,   u2 = [ −2, 2, −1 ]^T .
***
Orthogonal basis and Orthonormal basis
Def: A basis for a subspace W is called an orthogonal basis
if it is an orthogonal set.
Def: A basis for a subspace W is called an orthonormal basis
if it is an orthonormal set.
Examples: (i) {e1 , . . . , en } is an orthonormal basis for Rn .
(ii) S = { [ 3, 4 ]^T , [ 4, −3 ]^T } is an orthogonal basis for R^2 .
S′ = { [ 3/5, 4/5 ]^T , [ 4/5, −3/5 ]^T } is an orthonormal basis for R^2 .
(iii) The following set S:
S = { [ 3, 1, 1 ]^T , [ −1, 2, 1 ]^T , [ −1, −4, 7 ]^T }
is an orthogonal basis for R^3 .
(iv) The columns of an n × n orthogonal matrix A will form
an orthonormal basis for Rn .
Orthogonal matrix: a square matrix with A^T A = I_n .
Checking: Write A = [ v1 . . . vn ]. Then:
(i, j)-th entry of A^T A = vi^T vj = vi · vj ,
(i, j)-th entry of I_n = 1 if i = j, and 0 if i ≠ j.
The above checking also works for non-square matrices:
Thm 6 (P.371): The n columns of an m × n matrix U are orthonormal iff U^T U = I_n .
But for square matrices: AB = I ⇒ BA = I. So:
(iv)′ The rows of an n × n orthogonal matrix A (written in
column form) also form an orthonormal basis for Rn .
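A numerical check of Thm 6 and (iv)/(iv)′, built from the orthonormal set of the earlier example (a sketch assuming NumPy):

    import numpy as np

    # Columns: the orthonormal vectors (1/√11)(3,1,1), (1/√6)(−1,2,1), (1/√66)(−1,−4,7)
    U = np.column_stack([
        np.array([3.0, 1.0, 1.0]) / np.sqrt(11),
        np.array([-1.0, 2.0, 1.0]) / np.sqrt(6),
        np.array([-1.0, -4.0, 7.0]) / np.sqrt(66),
    ])

    print(np.allclose(U.T @ U, np.eye(3)))   # True: orthonormal columns (Thm 6)
    print(np.allclose(U @ U.T, np.eye(3)))   # True: for a square U the rows are orthonormal too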
Matrices having orthonormal columns are very special:
Thm 7 (P.371): Let T : Rn → Rm be a linear transformation given by an m × n standard matrix U with orthonormal
columns. Then for any x, y ∈ Rn :
a. ||U x|| = ||x||
(preserving length)
b. (U x) · (U y) = x · y
(preserving inner product)
c. (U x) · (U y) = 0 iff x · y = 0
(preserving orthogonality)
Pf: Direct verifications using U^T U = I_n .
These results are not true if the columns are merely orthogonal, not orthonormal.
Recall:
Let S = {u1 , . . . , up } be orthogonal. When v ∈ W = Span S,
we have:
v = ( (v · u1)/||u1||^2 ) u1 + . . . + ( (v · up)/||up||^2 ) up .
What happens if v ∉ W ?
• LHS ≠ RHS, as RHS is always a vector in W .
• v′ = RHS is still computable.
What is the relation between v and v′ ?
LHS = v ,   RHS = v′ = Σ_{i=1}^p ( (v · ui)/||ui||^2 ) ui .
Take inner product of RHS with uj :
v′ · uj = ( Σ_{i=1}^p ( (v · ui)/||ui||^2 ) ui ) · uj
        = Σ_{i=1}^p ( (v · ui)/||ui||^2 ) (ui · uj)
        = ( (v · uj)/||uj||^2 ) (uj · uj) = v · uj ,
which is the same as LHS · uj .
In other words, (v − v′ ) · uj = 0 for j = 1, . . . , p.
Thm: The vector z = v − v′ is orthogonal to every vector
in Span S, i.e. z ∈ (Span S)⊥ .
[Figure: v above the subspace W ; its projection v′ lies in W and z = v − v′ is perpendicular to W .]
Def: Let {u1 , . . . , up } be an orthogonal basis for W . For each v in R^n , the following vector in W :
projW v := ( (v · u1)/||u1||^2 ) u1 + . . . + ( (v · up)/||up||^2 ) up ,
is called the orthogonal projection of v onto W .
Remark: {u1 , . . . , up } must be orthogonal, otherwise RHS
will not give us the correct vector v′ .
Note: v = projW v ⇔ v ∈ W.
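A small helper implementing this formula (a sketch assuming NumPy; the name proj is hypothetical, and the basis passed in must be orthogonal, as the Remark above stresses):

    import numpy as np

    def proj(v, orthogonal_basis):
        """Orthogonal projection of v onto W = Span(orthogonal_basis).

        The vectors u1, ..., up passed in are assumed mutually orthogonal and non-zero.
        """
        v = np.asarray(v, dtype=float)
        w = np.zeros_like(v)
        for u in orthogonal_basis:
            u = np.asarray(u, dtype=float)
            w += (v @ u) / (u @ u) * u          # (v . u / ||u||^2) u
        return w

    # The example that follows: W = xy-plane = Span {e1, e2} in R^3
    print(proj([1.0, 2.0, 5.0], [[1, 0, 0], [0, 1, 0]]))   # [1. 2. 0.]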
Example: In R3 , consider S = {e1 , e2 }. Then W = Span S
is the xy-plane. For any vector v ∈ R3 :
[Figure: v = [ x, y, z ]^T in R^3 and its projection onto the xy-plane W = Span {e1, e2}, with components (v · e1)/||e1||^2 along e1 and (v · e2)/||e2||^2 along e2.]
projW v = [ x, y, 0 ]^T .
Exercise: In R^3 , let W = Span {u1 , u2 }. Find projW v:
v = [ 1, 0, 1 ]^T ,   u1 = [ 2, −2, 1 ]^T ,   u2 = [ 2, 1, −2 ]^T .
***
Def: The decomposition:
v = projW v + (v − projW v),
(v − projW v) ∈ W ⊥ ,
is called the orthogonal decomposition of v w.r.t. W .
[Figure: v decomposed as w = projW v in W plus z = v − projW v in W⊥.]
Thm 8 (P.376): Orthogonal decomposition w.r.t. W is the
unique way to write v = w + z with w ∈ W and z ∈ W ⊥ .
Exercise: Find the orthogonal projection of v onto W = Nul A:
A = [ 1  1  −1  −1 ] ,   v = [ 1, 2, 3, 4 ]^T .
***
Thm 9 (P.378): Let v ∈ Rn and let w ∈ W . Then we have:
||v − projW v|| ≤ ||v − w||,
and equality holds only when w = projW v.
Pf: We can rewrite v − w as:
v − w = (v − projW v) + (projW v − w).
[Figure: right-angled triangle with legs v − projW v and projW v − w, and hypotenuse v − w.]
We can apply the “Pythagoras Theorem” to this right-angled triangle:
||v − w||^2 = ||v − projW v||^2 + ||projW v − w||^2
            ≥ ||v − projW v||^2 ,
and equality holds iff ||projW v − w|| = 0 iff w = projW v.
Because of the inequality:
||v − projW v|| ≤ ||v − w|| ,
projW v is sometimes called the best approximation of v by vectors in W .
Def: The distance of v to W is defined as:
dist(v, W ) := ||v − projW v||.
Obviously, v ∈ W iff dist(v, W ) = 0.
Exercise: Let W = Span {u1 , u2 , u3 }. Find dist(v, W ):
u1 = [ 1, −1, 1, −1 ]^T ,  u2 = [ 1, 1, −1, −1 ]^T ,  u3 = [ 1, 1, 1, 1 ]^T  and  v = [ 2, 4, 6, 4 ]^T .
Sol: Remember to check that {u1 , u2 , u3 } is orthogonal.
***
Extension of Orthogonal Set
Let S = {u1 , . . . , up } be an orthogonal basis for W = Span S.
When W ≠ R^n , we can find a vector v ∉ W and:
z = v − projW v ≠ 0.
This vector z is in W ⊥ , i.e. will satisfy:
z · w = 0 for every w ∈ W .
Hence the following set will again be orthogonal:
S ∪ {z} = {u1 , . . . , up , z}.
Thm: Span(S ∪ {v}) = Span(S ∪ {z}).
In other words, we can extend an orthogonal set S by
adding the vector z.
• S1 = {u1 } orthogonal, v2 ̸∈ Span S1 , then compute z2 .
⇒ S2 = {u1 , z2 } is again orthogonal.
and Span {u1 , v2 } = Span {u1 , z2 }.
• S2 = {u1 , u2 } orthogonal, v3 ̸∈ Span S2 , compute z3 .
⇒ S3 = {u1 , u2 , z3 } is again orthogonal.
and Span {u1 , u2 , v3 } = Span {u1 , u2 , z3 }.
. . .
This is called the Gram-Schmidt orthogonalization process.
Thm 11 (P.383): Let {x1 , . . . , xp } be l.i. Define u1 = x1 and:
u2 = x2 − ( (x2 · u1)/||u1||^2 ) u1 ,
u3 = x3 − ( (x3 · u1)/||u1||^2 ) u1 − ( (x3 · u2)/||u2||^2 ) u2 ,
. . .
up = xp − Σ_{i=1}^{p−1} ( (xp · ui)/||ui||^2 ) ui .
Then {u1 , . . . , up } will be orthogonal and for 1 ≤ k ≤ p:
Span {x1 , . . . , xk } = Span {u1 , . . . , uk }.
Notes: (i) Must use {ui } to compute projWk xk+1 , since the formula:
projWk xk+1 = Σ_{i=1}^k ( (xk+1 · ui)/||ui||^2 ) ui ,
is only valid for an orthogonal set {ui }.
(ii) If we obtain uk = 0 for some k, i.e. xk = projW_{k−1} xk , we have:
xk ∈ Span {x1 , . . . , xk−1 },
so {x1 , . . . , xk } will be l.d. instead.
(iii) All the ui will be non-zero vectors as {xi } is l.i.
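A direct translation of Thm 11 into code (a sketch assuming NumPy; the helper name gram_schmidt is hypothetical, and no normalization is performed, exactly as in the theorem):

    import numpy as np

    def gram_schmidt(xs):
        """Orthogonalize the l.i. list x1, ..., xp (no normalization), as in Thm 11."""
        us = []
        for x in xs:
            x = np.asarray(x, dtype=float)
            # subtract the projection of x_k onto Span{u_1, ..., u_{k-1}}
            u = x - sum(((x @ w) / (w @ w)) * w for w in us)
            us.append(u)
        return us

    # The worked example that follows: x1 = (1,1,0), x2 = (2,0,-1), x3 = (1,1,1)
    print(gram_schmidt([[1, 1, 0], [2, 0, -1], [1, 1, 1]]))
    # [(1, 1, 0), (1, -1, -1), (1/3, -1/3, 2/3)] as floating-point arrays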
Example: Apply Gram-Schmidt Process to {x1 , x2 , x3 }:
x1 = [ 1, 1, 0 ]^T ,  x2 = [ 2, 0, −1 ]^T ,  x3 = [ 1, 1, 1 ]^T .
Solution: Take u1 = x1 . Then:
u2 = x2 − ( (x2 · u1)/||u1||^2 ) u1 = x2 − (2/2) u1 = [ 1, −1, −1 ]^T ,
u3 = x3 − ( (x3 · u2)/||u2||^2 ) u2 − ( (x3 · u1)/||u1||^2 ) u1 = x3 − (−1/3) u2 − (2/2) u1 = [ 1/3, −1/3, 2/3 ]^T .
Example: Apply Gram-Schmidt Process to {x1 , x3 , x2 }:
x1 = [ 1, 1, 0 ]^T ,  x3 = [ 1, 1, 1 ]^T ,  x2 = [ 2, 0, −1 ]^T .
Solution: Take u′1 = x1 . Then:
u′2 = x3 − ( (x3 · u′1)/||u′1||^2 ) u′1 = x3 − (2/2) u′1 = [ 0, 0, 1 ]^T ,
u′3 = x2 − ( (x2 · u′2)/||u′2||^2 ) u′2 − ( (x2 · u′1)/||u′1||^2 ) u′1 = x2 − (−1/1) u′2 − (2/2) u′1 = [ 1, −1, 0 ]^T .
Exercise: Find an orthogonal basis for Col A:
A = [ 1  3   1  2
      3  4  −2  1
      1  1   1  1
      1  2   2  2 ] .
Sol: First find a basis for Col A (e.g. pivot columns of A).
Then apply Gram-Schmidt Process.
***
Approximation Problems: Solve Ax = b.
Due to the presence of errors, a consistent system may appear
as an inconsistent system:
x1 + x2 = 1             x1 + x2 = 1.01
x1 − x2 = 0       →     x1 − x2 = 0.01
2x1 + 2x2 = 2           2x1 + 2x2 = 2.01
Also in practice, exact solutions are usually not necessary.
• How to obtain a “good” approximate “solution” for the
above inconsistent system?
Least squares solution: How to measure the “goodness”
of x0 as an approximate solution to the system:
Ax = b?
• Minimize the difference ||x − x0 ||
Problem: But x is unknown......
Another way of approximation:
x0 ≈ x
“ ⇒ ” Ax0 ≈ Ax = b.
Analysis: Find x0 such that:
Ax0 = b0 ,
and b0 is as close to b as possible.
• b0 must be in Col A.
• ||b − b0||^2 is a sum of squares → least squares solution.
Best approximation property of orthogonal projection:
||b − projW b|| ≤ ||b − w||
for every w in W = Col A.
Should take b0 = projW b.
Example: Find the least squares solution of the inconsistent system:
x1 + x2 = 1.01
x1 − x2 = 0.01
2x1 + 2x2 = 2.01
To compute projW b, we need an orthogonal basis for W = Col A first.
A basis for Col A is: { [ 1, 1, 2 ]^T , [ 1, −1, 2 ]^T }.
Then by the Gram-Schmidt Process, we get an orthogonal basis for W = Col A:
{ [ 1, 1, 2 ]^T , [ 1, −1, 2 ]^T }  →  { [ 1, 1, 2 ]^T , [ 1, −5, 2 ]^T }.
Compute b0 = projW b:
b0 = ( ( [1.01, 0.01, 2.01]^T · [1, 1, 2]^T ) / || [1, 1, 2]^T ||^2 ) [1, 1, 2]^T
   + ( ( [1.01, 0.01, 2.01]^T · [1, −5, 2]^T ) / || [1, −5, 2]^T ||^2 ) [1, −5, 2]^T .
Hence:
b0 = [ 1.006, 0.01, 2.012 ]^T .
Since b0 ∈ Col A, the system Ax0 = b0 must be consistent. Solving Ax0 = b0 :
[ 1   1 | 1.006 ]      [ 1  0 | 0.508 ]
[ 1  −1 | 0.01  ]  →   [ 0  1 | 0.498 ]
[ 2   2 | 2.012 ]      [ 0  0 | 0     ]
Thus we have the following least squares solution:
x0 = [ 0.508, 0.498 ]^T .
But we have the following result:
(Col A)⊥ = Nul A^T .
Then, since we take b0 = projCol A b:
(b − b0) ∈ (Col A)⊥ ⇔ (b − b0) ∈ Nul A^T
⇔ A^T (b − b0) = 0
⇔ A^T b0 = A^T b.
So, if x0 is a least squares solution, i.e. Ax0 = b0, we have:
A^T (A x0) = A^T b.
The above is usually called the normal equation of Ax = b.
Thm 13 (P.389): The least squares solutions of Ax = b are the solutions of the normal equation A^T A x = A^T b.
In the following case, the least squares solution will be unique:
Thm 14 (P.391): Let A be an m × n matrix with rank A = n. Then the n × n matrix A^T A is invertible.
Example: Find again the least squares solution:
x1 + x2 = 1.01
x1 − x2 = 0.01
2x1 + 2x2 = 2.01
Solution: Solve the normal equation. Compute:
A^T A = [ 1  1  2 ; 1 −1  2 ] [ 1  1 ; 1 −1 ; 2  2 ] = [ 6  4 ; 4  6 ] ,
A^T b = [ 1  1  2 ; 1 −1  2 ] [ 1.01, 0.01, 2.01 ]^T = [ 5.04, 5.02 ]^T .
So the normal equation is:
[ 6  4 ; 4  6 ] [ x1 ; x2 ] = [ 5.04 ; 5.02 ]
⇒ [ x1 ; x2 ] = [ 0.508 ; 0.498 ] .
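The same answer can be obtained numerically (a sketch assuming NumPy; solving the normal equation and calling the built-in least squares routine agree):

    import numpy as np

    A = np.array([[1.0, 1.0], [1.0, -1.0], [2.0, 2.0]])
    b = np.array([1.01, 0.01, 2.01])

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # normal equation  A^T A x = A^T b
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # built-in least squares solver

    print(x_normal)                          # [0.508 0.498]
    print(np.allclose(x_normal, x_lstsq))    # True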
Least Squares Problems
Linear Regression: Fitting data (xi , yi ) with straight line.
[Figure: data points and a fitted straight line in the xy-plane.]
To “minimize” the differences indicated by the red intervals.
When a straight line y = c + mx can pass through all the
points, it will of course “best fit” the data. This requires:
c + m x1 = y1
  . . .                ↔      [ 1  x1 ; . . . ; 1  xn ] [ c ; m ] = [ y1 , . . . , yn ]^T
c + m xn = yn
being consistent.
But in general the above system Ax = b is inconsistent.
Measurement of closeness: square sum of y-distances.
|y1 − (mx1 + c)|^2 + . . . + |yn − (mxn + c)|^2 .
Note that this is expressed as ||b − b0||^2 , where:
b = [ y1, . . . , yn ]^T ,   b0 = [ c + mx1, . . . , c + mxn ]^T .
b0 ∈ Col A since Ax = b0 is consistent.
→ Use normal equation!
Example: Find a straight line that best fits the points:
(2, 1), (5, 2), (7, 3), (8, 3),
in the sense of minimizing the square-sum of y-distances.
Sol: The (inconsistent) system is:
[ 1  2 ; 1  5 ; 1  7 ; 1  8 ] [ c ; m ] = [ 1, 2, 3, 3 ]^T .
We are going to find its least squares solution.
Compute:
A^T A = [ 1 1 1 1 ; 2 5 7 8 ] [ 1  2 ; 1  5 ; 1  7 ; 1  8 ] = [ 4  22 ; 22  142 ] ,
A^T b = [ 1 1 1 1 ; 2 5 7 8 ] [ 1, 2, 3, 3 ]^T = [ 9, 57 ]^T .
So the normal equation is:
[ 4  22 ; 22  142 ] [ c ; m ] = [ 9 ; 57 ] ,
which has a unique solution (c, m) = ( 2/7 , 5/14 ).
The “best fit” straight line will be:
y = 2/7 + (5/14) x .
Polynomial Curve Fitting:
Example: Find a polynomial curve of degree at most 2
which best fits the following data:
(2, 1), (5, 2), (7, 3), (8, 3),
in the sense of least squares.
Sol: Consider the general form of the fitting curve:
y = a0 · 1 + a1 · x + a2 · x^2 .
The curve cannot pass through all the 4 points as:
a0 · 1 + a1 · 2 + a2 · 2^2 = 1
a0 · 1 + a1 · 5 + a2 · 5^2 = 2
a0 · 1 + a1 · 7 + a2 · 7^2 = 3
a0 · 1 + a1 · 8 + a2 · 8^2 = 3
is inconsistent.
Again, use normal equation.
The corresponding normal equation A^T A x = A^T b is:
[   4    22   142 ] [ a0 ]   [   9 ]
[  22   142   988 ] [ a1 ] = [  57 ] ,
[ 142   988  7138 ] [ a2 ]   [ 393 ]
which has a unique solution of ( 19/132 , 19/44 , −1/132 ).
So the best fitting polynomial is:
y = 19/132 + (19/44) x − (1/132) x^2 .
General Curve Fitting:
Example: Find a curve in the form c0 + c1 sin x + c2 sin 2x
which best fits the following data:
(π/6, 1), (π/4, 2), (π/3, 3), (π/2, 3),
in the sense of least squares.
Sol: Let y = c0 · 1 + c1 · sin x + c2 · sin 2x. The system
c0 · 1 + c1 · sin(π/6) + c2 · sin(2π/6) = 1
c0 · 1 + c1 · sin(π/4) + c2 · sin(2π/4) = 2
c0 · 1 + c1 · sin(π/3) + c2 · sin(2π/3) = 3
c0 · 1 + c1 · sin(π/2) + c2 · sin(2π/2) = 3
is inconsistent.
Solving A^T A x = A^T b . . .
c0 = ( 184 − 39√2 − 89√3 + 9√6 ) / ( 78 − 18√2 − 38√3 + 6√6 ) ≈ −2.29169 ,
c1 = ( 9 + 3√2 − 7√3 − 2√6 ) / ( −39 + 9√2 + 19√3 − 3√6 ) ≈ 5.31308 ,
c2 = ( 8 + 9√2 − 10√3 − √6 ) / ( 78 − 18√2 − 38√3 + 6√6 ) ≈ 0.673095 .
So the best fitting function is:
(−2.29169) + (5.31308) sin x + (0.673095) sin 2x.
Extra: Continuous Curve Fitting
Find g(x) best fitting a given f (x).
[Figure: a given curve f(x) and an approximating curve g(x).]
Try to minimize the difference (area) between two curves.
• To minimize the “root-mean-square” (r.m.s.) of area between two curves:
√( (1/(b − a)) ∫_a^b |f(x) − g(x)|^2 dx ).
• Given by the following inner product:
< f, g > = (1/(b − a)) ∫_a^b f(x) g(x) dx.
• Not in R^n, not the standard inner product . . . No normal equation.
• But we can use orthogonal projection.
Recall: Formula of orthogonal projection in general:
projW y = Σ_{i=1}^p ( < y, ui > / < ui, ui > ) ui ,
where {u1 , . . . , up } is an orthogonal basis of W .
Example: Fit f (x) = x over [0, 1] by l.c. of
S = {1, sin 2πkx, cos 2πkx; k = 1, 2, . . . , n}
Sol: S is orthogonal under the inner product:
< f, g > = ∫_0^1 f(x) g(x) dx.
(direct checking)
So compute those “< y, ui >”:
< f(x), 1 > = 1/2 ,
< f(x), sin 2πkx > = −1/(2πk) ,
< f(x), cos 2πkx > = 0.
We also need those “< ui, ui >”:
< 1, 1 > = 1 ,
< sin 2πkx, sin 2πkx > = 1/2 ,
< cos 2πkx, cos 2πkx > = 1/2 .
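These integrals can be verified numerically (a sketch assuming NumPy; the midpoint-rule helper ip below is hypothetical and the grid size is arbitrary):

    import numpy as np

    def ip(f, g, a=0.0, b=1.0, n=200000):
        """Approximate <f, g> = integral_a^b f(x) g(x) dx by a midpoint Riemann sum."""
        x = a + (np.arange(n) + 0.5) * (b - a) / n
        return np.sum(f(x) * g(x)) * (b - a) / n

    k = 3   # any k = 1, 2, ..., n
    print(ip(lambda x: x, lambda x: np.ones_like(x)))        # ~0.5       = <f, 1>
    print(ip(lambda x: x, lambda x: np.sin(2*np.pi*k*x)))    # ~-0.05305  = -1/(2*pi*k)
    print(ip(lambda x: x, lambda x: np.cos(2*np.pi*k*x)))    # ~0         = <f, cos 2πkx>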
So the best fitting curve is g(x) = projW f(x):
g(x) = 1/2 − (1/π) ( sin 2πx / 1 + sin 4πx / 2 + . . . + sin 2nπx / n ) .
When n = 5:
[Figure: plot of f(x) = x and the partial sum g(x) over [0, 1].]
Example: Let f(x) = sgn(x), the sign of x:
sgn(x) = −1 for x < 0,   0 for x = 0,   1 for x > 0.
Find the best r.m.s. approximation function over [−1, 1] using l.c. of S = {1, sin kπx, cos kπx; k = 1, 2, 3, . . . , 2n + 1}.
Sol: Interval changed. Use new inner product:
< f, g > = (1/2) ∫_{−1}^{1} f(x) g(x) dx.
Then S is orthogonal (needs another checking) and:
< 1, 1 > = 1 ,   < sin kπx, sin kπx > = 1/2 = < cos kπx, cos kπx > .
So, we compute:
< sgn(x), 1 > = 0 ;
< sgn(x), sin kπx > = 0 if k is even, and 2/(kπ) if k is odd ;
< sgn(x), cos kπx > = 0.
Hence the best r.m.s. approx. to sgn(x) over [−1, 1] is:
(4/π) ( sin πx + sin 3πx / 3 + sin 5πx / 5 + . . . + sin (2n + 1)πx / (2n + 1) ) .
When 2n + 1 = 9:
[Figure: plot of sgn(x) and the partial sum over [−1, 1].]