Jim Lambers
COS 702
Spring Semester 2010-11
Lecture 1 Notes
Discrete Least Squares Approximations
One of the most fundamental problems in science and engineering is data fitting: constructing a function that, in some sense, conforms to given data points. Two such data-fitting techniques are polynomial interpolation and piecewise polynomial interpolation. Interpolation techniques, of any kind, construct functions that agree exactly with the data. That is, given points $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$, interpolation yields a function $f(x)$ such that $f(x_i) = y_i$ for $i = 1, 2, \ldots, m$.
However, fitting the data exactly may not be the best approach to describing the data with a function. We have seen that high-degree polynomial interpolation can yield oscillatory functions that behave very differently than a smooth function from which the data is obtained. Also, it may be pointless to try to fit data exactly, for if it is obtained by previous measurements or other computations, it may be erroneous. Therefore, we consider revising our notion of what constitutes a "best fit" of given data by a function.
One alternative approach to data fitting is to solve the minimax problem, which is the problem of finding a function $f(x)$ of a given form for which
$$\max_{1 \le i \le m} |f(x_i) - y_i|$$
is minimized. However, this is a very difficult problem to solve.
Another approach is to minimize the total absolute deviation of $f(x)$ from the data. That is, we seek a function $f(x)$ of a given form for which
$$\sum_{i=1}^m |f(x_i) - y_i|$$
is minimized. However, we cannot apply standard minimization techniques to this function, because, like the absolute value function that it employs, it is not differentiable.
This defect is overcome by considering the problem of finding $f(x)$ of a given form for which
$$\sum_{i=1}^m [f(x_i) - y_i]^2$$
is minimized. This is known as the least squares problem. We will first show how this problem is solved for the case where $f(x)$ is a linear function of the form $f(x) = a_1 x + a_0$, and then generalize this solution to other types of functions.
When $f(x)$ is linear, the least squares problem is the problem of finding constants $a_0$ and $a_1$ such that the function
$$E(a_0, a_1) = \sum_{i=1}^m (a_1 x_i + a_0 - y_i)^2$$
is minimized. In order to minimize this function of $a_0$ and $a_1$, we must compute its partial derivatives with respect to $a_0$ and $a_1$. This yields
$$\frac{\partial E}{\partial a_0} = \sum_{i=1}^m 2(a_1 x_i + a_0 - y_i), \qquad \frac{\partial E}{\partial a_1} = \sum_{i=1}^m 2(a_1 x_i + a_0 - y_i) x_i.$$
At a minimum, both of these partial derivatives must be equal to zero. This yields the system of
linear equations
$$m a_0 + \left( \sum_{i=1}^m x_i \right) a_1 = \sum_{i=1}^m y_i,$$
$$\left( \sum_{i=1}^m x_i \right) a_0 + \left( \sum_{i=1}^m x_i^2 \right) a_1 = \sum_{i=1}^m x_i y_i.$$
These equations are called the normal equations.
Using the formula for the inverse of a 2 × 2 matrix,
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix},$$
we obtain the solutions
$$a_0 = \frac{\left( \sum_{i=1}^m x_i^2 \right) \left( \sum_{i=1}^m y_i \right) - \left( \sum_{i=1}^m x_i \right) \left( \sum_{i=1}^m x_i y_i \right)}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2},$$
$$a_1 = \frac{m \sum_{i=1}^m x_i y_i - \left( \sum_{i=1}^m x_i \right) \left( \sum_{i=1}^m y_i \right)}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2}.$$
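These closed-form expressions translate directly into code. The following is a minimal Python sketch, assuming the data are given as two equal-length sequences; the function name linear_least_squares is our own, not from the text.

    def linear_least_squares(x, y):
        # Fit y = a1*x + a0 in the least-squares sense using the
        # closed-form solution of the normal equations derived above.
        m = len(x)
        sum_x = sum(x)
        sum_y = sum(y)
        sum_x2 = sum(xi * xi for xi in x)
        sum_xy = sum(xi * yi for xi, yi in zip(x, y))
        denom = m * sum_x2 - sum_x ** 2  # m*sum(x_i^2) - (sum x_i)^2
        a0 = (sum_x2 * sum_y - sum_x * sum_xy) / denom
        a1 = (m * sum_xy - sum_x * sum_y) / denom
        return a0, a1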
Example We wish to find the linear function $y = a_1 x + a_0$ that best approximates the data shown in Table 1, in the least-squares sense. Using the summations
$$\sum_{i=1}^{10} x_i = 56.2933, \quad \sum_{i=1}^{10} y_i = 73.8373, \quad \sum_{i=1}^{10} x_i^2 = 380.5426, \quad \sum_{i=1}^{10} x_i y_i = 485.9487,$$
we obtain
$$a_0 = \frac{380.5426 \cdot 73.8373 - 56.2933 \cdot 485.9487}{10 \cdot 380.5426 - 56.2933^2} = \frac{742.5703}{636.4906} = 1.1667,$$
$$a_1 = \frac{10 \cdot 485.9487 - 56.2933 \cdot 73.8373}{10 \cdot 380.5426 - 56.2933^2} = \frac{702.9438}{636.4906} = 1.1044.$$
     i      x_i       y_i
     1      2.0774    3.3123
     2      2.3049    3.8982
     3      3.0125    4.6500
     4      4.7092    6.5576
     5      5.5016    7.5173
     6      5.8704    7.0415
     7      6.2248    7.7497
     8      8.4431   11.0451
     9      8.7594    9.8179
    10      9.3900   12.2477

Table 1: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 10$, to be fit by a linear function
We conclude that the linear function that best fits this data in the least-squares sense is
$$y = 1.1044x + 1.1667.$$
The data, and this function, are shown in Figure 1. □

Figure 1: Data points $(x_i, y_i)$ (circles) and least-squares line (solid line)
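As a check, the example can be reproduced numerically. A short Python sketch, reusing the linear_least_squares function sketched earlier, with the data of Table 1:

    # Data from Table 1; should reproduce a0 = 1.1667 and a1 = 1.1044
    # (to four decimal places), matching the example above.
    x = [2.0774, 2.3049, 3.0125, 4.7092, 5.5016,
         5.8704, 6.2248, 8.4431, 8.7594, 9.3900]
    y = [3.3123, 3.8982, 4.6500, 6.5576, 7.5173,
         7.0415, 7.7497, 11.0451, 9.8179, 12.2477]
    a0, a1 = linear_least_squares(x, y)
    print(a0, a1)  # approximately 1.1667, 1.1044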
It is interesting to note that if we define the $m \times 2$ matrix $A$, the 2-vector $\mathbf{a}$, and the $m$-vector $\mathbf{y}$ by
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix},$$
then $\mathbf{a}$ is the solution to the system of equations
$$A^T A \mathbf{a} = A^T \mathbf{y}.$$
These equations are the normal equations defined earlier, written in matrix-vector form. They arise from the problem of finding the vector $\mathbf{a}$ such that
$$|A\mathbf{a} - \mathbf{y}|$$
is minimized, where, for any vector $\mathbf{u}$, $|\mathbf{u}|$ is the magnitude, or length, of $\mathbf{u}$. In this case, this expression is equivalent to the square root of the expression we originally intended to minimize,
$$\sum_{i=1}^m (a_1 x_i + a_0 - y_i)^2,$$
but the normal equations also characterize the solution $\mathbf{a}$, an $n$-vector, to the more general linear least squares problem of minimizing $|A\mathbf{a} - \mathbf{y}|$ for any matrix $A$ that is $m \times n$, where $m \ge n$, whose columns are linearly independent.
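In code, this matrix formulation is usually solved without forming $A^T A$ explicitly. A minimal sketch, assuming NumPy is available (the function name is ours):

    import numpy as np

    def fit_line_matrix(x, y):
        # Build the m x 2 matrix A whose columns are 1 and x_i,
        # then minimize |A a - y| directly.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        A = np.column_stack([np.ones_like(x), x])
        # lstsq solves the least squares problem; it is numerically
        # preferable to solving the normal equations A^T A a = A^T y,
        # whose matrix has a squared condition number.
        a, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
        return a  # a[0] = a0 (intercept), a[1] = a1 (slope)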
We now consider the problem of finding a polynomial of degree $n$ that gives the best least-squares fit. As before, let $(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)$ be given data points that need to be approximated by a polynomial of degree $n$. We assume that $n < m - 1$, for otherwise, we can use polynomial interpolation to fit the points exactly.
Let the least-squares polynomial have the form
$$f_n(x) = \sum_{j=0}^n a_j x^j.$$
Our goal is to minimize the sum of squares of the deviations in $f_n(x)$ from each $y$-value,
$$E(\mathbf{a}) = \sum_{i=1}^m [f_n(x_i) - y_i]^2 = \sum_{i=1}^m \left[ \sum_{j=0}^n a_j x_i^j - y_i \right]^2,$$
where $\mathbf{a}$ is a column vector of the unknown coefficients of $f_n(x)$,
$$\mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}.$$
Differentiating this function with respect to each $a_k$ yields
$$\frac{\partial E}{\partial a_k} = \sum_{i=1}^m 2 \left[ \sum_{j=0}^n a_j x_i^j - y_i \right] x_i^k, \quad k = 0, 1, \ldots, n.$$
Setting each of these partial derivatives equal to zero yields the system of equations
$$\sum_{j=0}^n \left( \sum_{i=1}^m x_i^{j+k} \right) a_j = \sum_{i=1}^m x_i^k y_i, \quad k = 0, 1, \ldots, n.$$
These are the normal equations. They are a generalization of the normal equations previously defined for the linear case, where $n = 1$. Solving this system yields the coefficients $\{a_j\}_{j=0}^n$ of the least-squares polynomial $f_n(x)$.
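A direct translation of these normal equations into code only needs the power sums $\sum_i x_i^{j+k}$. A sketch in Python with NumPy (the function name is ours; note that for larger $n$ this system becomes ill-conditioned, and library routines such as numpy.polyfit are preferable in practice):

    import numpy as np

    def poly_normal_equations(x, y, n):
        # Solve sum_j (sum_i x_i^(j+k)) a_j = sum_i x_i^k y_i, k = 0..n.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Power sums S[p] = sum_i x_i^p for p = 0, ..., 2n
        S = [np.sum(x ** p) for p in range(2 * n + 1)]
        M = np.array([[S[j + k] for j in range(n + 1)] for k in range(n + 1)])
        b = np.array([np.sum(x ** k * y) for k in range(n + 1)])
        return np.linalg.solve(M, b)  # coefficients a_0, ..., a_n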
As in the linear case, the normal equations can be written in matrix-vector form
$$A^T A \mathbf{a} = A^T \mathbf{y},$$
where
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}.$$
The normal equations can be used to compute the coefficients of any linear combination of functions $\{f_j(x)\}_{j=0}^n$ that best fits data in the least-squares sense, provided that these functions are linearly independent. In this general case, the entries of the matrix $A$ are given by $a_{ij} = f_j(x_i)$, for $i = 1, 2, \ldots, m$ and $j = 0, 1, \ldots, n$.
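The general case is just as direct to implement. A minimal Python sketch, assuming NumPy and representing the basis $\{f_j\}$ as a list of callables (a convention of ours, not from the text):

    import numpy as np

    def basis_least_squares(x, y, basis):
        # Entries a_ij = f_j(x_i): one column of A per basis function.
        x = np.asarray(x, dtype=float)
        A = np.column_stack([f(x) for f in basis])
        coeffs, residuals, rank, sv = np.linalg.lstsq(
            A, np.asarray(y, dtype=float), rcond=None)
        return coeffs

For instance, the quadratic fit of the next example corresponds to the basis [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2].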
     i      x_i       y_i
     1      2.0774    2.7212
     2      2.3049    3.7798
     3      3.0125    4.8774
     4      4.7092    6.6596
     5      5.5016   10.5966
     6      5.8704    9.8786
     7      6.2248   10.5232
     8      8.4431   23.3574
     9      8.7594   24.0510
    10      9.3900   27.4827

Table 2: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 10$, to be fit by a quadratic function

Example We wish to find the quadratic function $y = a_2 x^2 + a_1 x + a_0$ that best approximates the data shown in Table 2, in the least-squares sense. By defining
$$A = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{10} & x_{10}^2 \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ a_2 \end{bmatrix}, \quad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{10} \end{bmatrix},$$
and solving the normal equations
$$A^T A \mathbf{a} = A^T \mathbf{y},$$
we obtain the coefficients
$$a_0 = 4.7681, \quad a_1 = -1.5193, \quad a_2 = 0.4251,$$
and conclude that the quadratic function that best fits this data in the least-squares sense is
$$y = 0.4251x^2 - 1.5193x + 4.7681.$$
The data, and this function, are shown in Figure 2. □

Figure 2: Data points $(x_i, y_i)$ (circles) and quadratic least-squares fit (solid curve)
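This example, too, can be checked numerically; a short sketch assuming NumPy:

    import numpy as np

    # Data from Table 2; the quadratic fit should reproduce, approximately,
    # the coefficients a0 = 4.7681, a1 = -1.5193, a2 = 0.4251 quoted above.
    x = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016,
                  5.8704, 6.2248, 8.4431, 8.7594, 9.3900])
    y = np.array([2.7212, 3.7798, 4.8774, 6.6596, 10.5966,
                  9.8786, 10.5232, 23.3574, 24.0510, 27.4827])
    A = np.column_stack([np.ones_like(x), x, x ** 2])
    a, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
    print(a)  # approximately [4.7681, -1.5193, 0.4251]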
Least-squares fitting can also be used to fit data with functions that are not linear combinations of functions such as polynomials. Suppose we believe that given data points can best be matched to an exponential function of the form $y = be^{ax}$, where the constants $a$ and $b$ are unknown. Taking the natural logarithm of both sides of this equation yields
$$\ln y = \ln b + ax.$$
If we define $z = \ln y$ and $c = \ln b$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^m$ with an exponential function is transformed into the problem of fitting the data points $\{(x_i, z_i)\}_{i=1}^m$ with a linear function of the form $c + ax$, for unknown constants $a$ and $c$.
Similarly, suppose the given data is believed to approximately conform to a function of the form $y = bx^a$, where the constants $a$ and $b$ are unknown. Taking the natural logarithm of both sides of this equation yields
$$\ln y = \ln b + a \ln x.$$
If we define $z = \ln y$, $c = \ln b$, and $w = \ln x$, then the problem of fitting the original data points $\{(x_i, y_i)\}_{i=1}^m$ with a constant times a power of $x$ is transformed into the problem of fitting the data points $\{(w_i, z_i)\}_{i=1}^m$ with a linear function of the form $c + aw$, for unknown constants $a$ and $c$.
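Both transformations reduce the fit to a single linear least-squares problem in the transformed variables. A minimal Python sketch of the two cases, assuming NumPy (function names are ours); note that the resulting fit minimizes the squared deviations of the logarithms, not of the original data:

    import numpy as np

    def fit_exponential(x, y):
        # Fit y = b*e^(a*x) via the linear fit z = c + a*x, where z = ln y.
        z = np.log(np.asarray(y, dtype=float))
        a, c = np.polyfit(np.asarray(x, dtype=float), z, 1)  # slope, intercept
        return np.exp(c), a  # b = e^c, exponent a

    def fit_power(x, y):
        # Fit y = b*x^a via the linear fit z = c + a*w, where w = ln x, z = ln y.
        w = np.log(np.asarray(x, dtype=float))
        z = np.log(np.asarray(y, dtype=float))
        a, c = np.polyfit(w, z, 1)
        return np.exp(c), a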
     i      x_i       y_i
     1      2.0774    1.4509
     2      2.3049    2.8462
     3      3.0125    2.1536
     4      4.7092    4.7438
     5      5.5016    7.7260

Table 3: Data points $(x_i, y_i)$, for $i = 1, 2, \ldots, 5$, to be fit by an exponential function

Example We wish to find the exponential function $y = be^{ax}$ that best approximates the data shown in Table 3, in the least-squares sense. By defining
$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_5 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} c \\ a \end{bmatrix}, \quad \mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_5 \end{bmatrix},$$
where $c = \ln b$ and $z_i = \ln y_i$ for $i = 1, 2, \ldots, 5$, and solving the normal equations
$$A^T A \mathbf{c} = A^T \mathbf{z},$$
we obtain the coefficients
$$a = 0.4040, \quad b = e^c = e^{-0.2652} = 0.7670,$$
and conclude that the exponential function that best fits this data in the least-squares sense is
$$y = 0.7670e^{0.4040x}.$$
□
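A numerical check of this example, assuming NumPy and the log transformation described above:

    import numpy as np

    # Data from Table 3; the linear fit of ln(y) against x should reproduce
    # a = 0.4040 and c = ln(b) = -0.2652, i.e. b = 0.7670, as quoted above.
    x = np.array([2.0774, 2.3049, 3.0125, 4.7092, 5.5016])
    y = np.array([1.4509, 2.8462, 2.1536, 4.7438, 7.7260])
    a, c = np.polyfit(x, np.log(y), 1)
    print(np.exp(c), a)  # approximately 0.7670, 0.4040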