Best Approximation in the 2-norm

Jim Lambers
MAT 772
Fall Semester 2010-11
Lecture 12 Notes
These notes correspond to Sections 9.2 and 9.3 in the text.
Best Approximation in the 2-norm
Suppose that we wish to obtain a function $f_n(x)$ that is a linear combination of given functions $\{\phi_j(x)\}_{j=0}^n$, and best fits a function $f(x)$ at a discrete set of data points $\{(x_i, f(x_i))\}_{i=1}^m$ in a least-squares sense. That is, we wish to find constants $\{c_j\}_{j=0}^n$ such that
$$\sum_{i=1}^m [f_n(x_i) - f(x_i)]^2 = \sum_{i=1}^m \left[ \sum_{j=0}^n c_j \phi_j(x_i) - f(x_i) \right]^2$$
is minimized. This can be accomplished by solving a system of $n + 1$ linear equations for the $\{c_j\}$, known as the normal equations.
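For example, with a monomial basis $\phi_j(x) = x^j$ (an assumption made here for illustration; any basis works the same way), the discrete problem can be set up and solved as in the following minimal sketch, in which the name discrete_lsq is hypothetical:

```python
# Minimal sketch of the discrete case with the monomial basis phi_j(x) = x^j.
import numpy as np

def discrete_lsq(x, y, n):
    """Solve the normal equations (Phi^T Phi) c = Phi^T y for c_0, ..., c_n."""
    Phi = np.vander(x, n + 1, increasing=True)  # Phi[i, j] = phi_j(x_i) = x_i**j
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Fit a quadratic (n = 2) to samples of f(x) = e^x at 20 points.
x = np.linspace(0.0, 1.0, 20)
c = discrete_lsq(x, np.exp(x), 2)
```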
Now, suppose we have a continuous set of data. That is, we have a function $f(x)$ defined on an interval $[a,b]$, and we wish to approximate it as closely as possible, in some sense, by a function $f_n(x)$ that is a linear combination of given functions $\{\phi_j(x)\}_{j=0}^n$. If we choose $m$ equally spaced points $\{x_i\}_{i=1}^m$ in $[a,b]$, and let $m \to \infty$, we obtain the continuous least-squares problem of finding the function
$$f_n(x) = \sum_{j=0}^n c_j \phi_j(x)$$
that minimizes
$$E(c_0, c_1, \ldots, c_n) = \int_a^b [f_n(x) - f(x)]^2\,dx = \int_a^b \left[ \sum_{j=0}^n c_j \phi_j(x) - f(x) \right]^2 dx.$$
To obtain the coefficients $\{c_j\}_{j=0}^n$, we can proceed as in the discrete case. We compute the partial derivatives of $E(c_0, c_1, \ldots, c_n)$ with respect to each $c_k$ and obtain
$$\frac{\partial E}{\partial c_k} = \int_a^b 2\phi_k(x) \left[ \sum_{j=0}^n c_j \phi_j(x) - f(x) \right] dx,$$
and requiring that each partial derivative be equal to zero yields the normal equations
$$\sum_{j=0}^n \left[ \int_a^b \phi_k(x) \phi_j(x)\,dx \right] c_j = \int_a^b \phi_k(x) f(x)\,dx, \quad k = 0, 1, \ldots, n.$$
We can then solve this system of equations to obtain the coefficients $\{c_j\}_{j=0}^n$. This system can be solved as long as the functions $\{\phi_j(x)\}_{j=0}^n$ are linearly independent. That is, the condition
$$\sum_{j=0}^n c_j \phi_j(x) \equiv 0, \quad x \in [a,b],$$
is only true if $c_0 = c_1 = \cdots = c_n = 0$. In particular, this is the case if, for $j = 0, 1, \ldots, n$, $\phi_j(x)$ is a polynomial of degree $j$. This can be proved using a simple inductive argument.
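In practice, when the integrals are not known in closed form, the entries of this system can be computed by numerical quadrature. The following is a minimal sketch, assuming SciPy is available; the name continuous_lsq and the monomial basis are illustrative choices, not prescribed by the text:

```python
# Sketch: form and solve the continuous normal equations by quadrature.
import numpy as np
from scipy.integrate import quad

def continuous_lsq(f, phis, a, b):
    """Solve sum_j <phi_k, phi_j> c_j = <phi_k, f> for k = 0, ..., n."""
    A = np.array([[quad(lambda x: pk(x) * pj(x), a, b)[0] for pj in phis]
                  for pk in phis])
    rhs = np.array([quad(lambda x: pk(x) * f(x), a, b)[0] for pk in phis])
    return np.linalg.solve(A, rhs)

# Monomial basis 1, x, x^2 on [0, 1], approximating f(x) = e^x.
phis = [lambda x, j=j: x**j for j in range(3)]
c = continuous_lsq(np.exp, phis, 0.0, 1.0)
```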
Example We approximate $f(x) = e^x$ on the interval $[0,5]$ by a fourth-degree polynomial
$$f_4(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + c_4 x^4.$$
The normal equations have the form
$$\sum_{j=0}^n a_{ij} c_j = b_i, \quad i = 0, 1, \ldots, 4,$$
or, in matrix-vector form, $A\mathbf{c} = \mathbf{b}$, where
$$a_{ij} = \int_0^5 x^i x^j\,dx = \int_0^5 x^{i+j}\,dx = \frac{5^{i+j+1}}{i+j+1}, \quad i, j = 0, 1, \ldots, 4,$$
$$b_i = \int_0^5 x^i e^x\,dx, \quad i = 0, 1, \ldots, 4.$$
Integration by parts yields the relation
$$b_i = 5^i e^5 - i b_{i-1}, \quad b_0 = e^5 - 1.$$
Solving this system of equations yields the polynomial
$$f_4(x) = 2.3002 - 6.226x + 9.5487x^2 - 3.86x^3 + 0.6704x^4.$$
As Figure 1 shows, this polynomial is barely distinguishable from $e^x$ on $[0,5]$.
However, it should be noted that the matrix $A$ is closely related to the $n \times n$ Hilbert matrix $H_n$, which has entries
$$[H_n]_{ij} = \frac{1}{i+j-1}, \quad 1 \le i, j \le n.$$
This matrix is famous for being highly ill-conditioned, meaning that solutions to systems of linear equations involving this matrix that are computed using floating-point arithmetic are highly sensitive to roundoff error. In fact, the matrix $A$ in this example has a condition number of $1.56 \times 10^7$, which means that a change of size $\epsilon$ in the right-hand side vector $\mathbf{b}$, with entries $b_i$, can cause a change of size $1.56 \times 10^7 \epsilon$ in the solution $\mathbf{c}$. β–‘
Figure 1: Graphs of $f(x) = e^x$ (red dashed curve) and the fourth-degree continuous least-squares polynomial approximation $f_4(x)$ on $[0,5]$ (blue solid curve)
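The numbers in this example are straightforward to reproduce. The following sketch (assuming NumPy; the variable names are illustrative) builds $A$ and $\mathbf{b}$ from the closed forms above and solves the system:

```python
# Sketch: reproduce the example using a_ij = 5^(i+j+1)/(i+j+1) and the
# recurrence b_i = 5^i e^5 - i b_{i-1}, b_0 = e^5 - 1.
import numpy as np

A = np.array([[5.0**(i + j + 1) / (i + j + 1) for j in range(5)]
              for i in range(5)])
b = np.zeros(5)
b[0] = np.exp(5) - 1
for i in range(1, 5):
    b[i] = 5.0**i * np.exp(5) - i * b[i - 1]

c = np.linalg.solve(A, b)  # ~ [2.3002, -6.226, 9.5487, -3.86, 0.6704]
print(np.linalg.cond(A))   # ~ 1.56e7, as noted above
```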
Inner Product Spaces
As the preceding example shows, it is important to choose the functions $\{\phi_j(x)\}_{j=0}^n$ wisely, so that the resulting system of normal equations is not unduly sensitive to roundoff error. An even better choice is one for which this system can be solved analytically, with relatively few computations. An ideal choice of functions is one for which the task of computing $f_{n+1}(x)$ can reuse the computations needed to compute $f_n(x)$.
To that end, recall that two $m$-vectors $\mathbf{u} = \langle u_1, u_2, \ldots, u_m \rangle$ and $\mathbf{v} = \langle v_1, v_2, \ldots, v_m \rangle$ are orthogonal if
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^m u_i v_i = 0,$$
where $\mathbf{u} \cdot \mathbf{v}$ is the dot product, or inner product, of $\mathbf{u}$ and $\mathbf{v}$.
By viewing functions defined on an interval $[a,b]$ as infinitely long vectors, we can generalize the inner product, and the concept of orthogonality, to functions. Specifically, we define the inner product of two real-valued functions $f(x)$ and $g(x)$ defined on the interval $[a,b]$ by
$$\langle f, g \rangle = \int_a^b f(x) g(x)\,dx.$$
Then, we say $f$ and $g$ are orthogonal with respect to this inner product if $\langle f, g \rangle = 0$.
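As a quick numerical illustration (a sketch assuming SciPy is available), $\sin x$ and $\cos x$ are orthogonal on $[-\pi, \pi]$:

```python
# Sketch: <f, g> = integral of f(x) g(x) over [a, b], computed by quadrature.
import numpy as np
from scipy.integrate import quad

def inner(f, g, a, b):
    return quad(lambda x: f(x) * g(x), a, b)[0]

print(inner(np.sin, np.cos, -np.pi, np.pi))  # ~ 0: sin and cos are orthogonal
```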
In general, an inner product on a vector space $\mathcal{V}$ over $\mathbb{R}$, be it continuous or discrete, has the following properties:
1. $\langle f + g, h \rangle = \langle f, h \rangle + \langle g, h \rangle$ for all $f, g, h \in \mathcal{V}$
2. $\langle cf, g \rangle = c \langle f, g \rangle$ for all $c \in \mathbb{R}$ and all $f, g \in \mathcal{V}$
3. $\langle f, g \rangle = \langle g, f \rangle$ for all $f, g \in \mathcal{V}$
4. $\langle f, f \rangle \ge 0$ for all $f \in \mathcal{V}$, and $\langle f, f \rangle = 0$ if and only if $f = 0$.
This inner product can be used to define the norm of a function, which generalizes the concept of the magnitude of a vector to functions, and therefore provides a measure of the β€œmagnitude” of a function. Recall that the magnitude of a vector $\mathbf{v}$, denoted by $\|\mathbf{v}\|$, can be defined by
$$\|\mathbf{v}\| = (\mathbf{v} \cdot \mathbf{v})^{1/2}.$$
Along similar lines, we define the 2-norm of a function $f(x)$ defined on $[a,b]$ by
$$\|f\|_2 = \left( \langle f, f \rangle \right)^{1/2} = \left( \int_a^b [f(x)]^2\,dx \right)^{1/2}.$$
As we will see, it can be verified that this function does in fact satisfy the properties required of a
norm. The continuous least-squares problem can then be described as the problem of finding
𝑓𝑛 (π‘₯) =
𝑛
βˆ‘
𝑐𝑗 πœ™π‘— (π‘₯)
𝑗=0
such that
$$\|f_n - f\|_2 = \left( \int_a^b [f_n(x) - f(x)]^2\,dx \right)^{1/2}$$
is minimized. This minimization can be performed over $C[a,b]$, the space of functions that are continuous on $[a,b]$, but it is not necessary for a function $f(x)$ to be continuous for $\|f\|_2$ to be defined. Rather, we consider the space $L^2(a,b)$, the space of real-valued functions $f(x)$ such that $|f(x)|^2$ is integrable over $(a,b)$.
One very important property of $\|\cdot\|_2$ is that it satisfies the Cauchy-Schwarz inequality
$$|\langle f, g \rangle| \le \|f\|_2 \|g\|_2, \quad f, g \in \mathcal{V}.$$
This can be proven by noting that for any scalar $c \in \mathbb{R}$,
$$c^2 \|f\|_2^2 + 2c \langle f, g \rangle + \|g\|_2^2 = \|cf + g\|_2^2 \ge 0.$$
The left side is a quadratic polynomial in $c$. In order for this polynomial not to have any negative values, it must either have complex roots or a double real root. This is the case if its discriminant satisfies
$$4\langle f, g \rangle^2 - 4\|f\|_2^2 \|g\|_2^2 \le 0,$$
from which the Cauchy-Schwarz inequality immediately follows. By setting $c = 1$ and applying this inequality, we immediately obtain the triangle-inequality property of norms.
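Explicitly, setting $c = 1$ in the expansion above and applying the Cauchy-Schwarz inequality gives
$$\|f + g\|_2^2 = \|f\|_2^2 + 2\langle f, g \rangle + \|g\|_2^2 \le \|f\|_2^2 + 2\|f\|_2 \|g\|_2 + \|g\|_2^2 = \left( \|f\|_2 + \|g\|_2 \right)^2,$$
and taking square roots yields the triangle inequality $\|f + g\|_2 \le \|f\|_2 + \|g\|_2$.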
Suppose that we can construct a set of functions $\{\phi_j(x)\}_{j=0}^n$ that is orthogonal with respect to the inner product of functions on $[a,b]$. That is,
$$\langle \phi_k, \phi_j \rangle = \int_a^b \phi_k(x) \phi_j(x)\,dx = \begin{cases} 0, & k \ne j, \\ \alpha_k > 0, & k = j. \end{cases}$$
Then, the normal equations simplify to a trivial system
$$\left[ \int_a^b [\phi_k(x)]^2\,dx \right] c_k = \int_a^b \phi_k(x) f(x)\,dx, \quad k = 0, 1, \ldots, n,$$
or, in terms of norms and inner products,
βˆ₯πœ™π‘˜ βˆ₯22 π‘π‘˜ = βŸ¨πœ™π‘˜ , 𝑓 ⟩,
π‘˜ = 0, 1, . . . , 𝑛.
It follows that the coefficients $\{c_j\}_{j=0}^n$ of the least-squares approximation $f_n(x)$ are simply
π‘π‘˜ =
βŸ¨πœ™π‘˜ , 𝑓 ⟩
,
βˆ₯πœ™π‘˜ βˆ₯22
π‘˜ = 0, 1, . . . , 𝑛.
If the constants $\{\alpha_k\}_{k=0}^n$ above satisfy $\alpha_k = 1$ for $k = 0, 1, \ldots, n$, then we say that the orthogonal set of functions $\{\phi_j(x)\}_{j=0}^n$ is orthonormal. In that case, the solution to the continuous least-squares problem is simply given by
π‘π‘˜ = βŸ¨πœ™π‘˜ , 𝑓 ⟩, π‘˜ = 0, 1, . . . , 𝑛.
Next, we will learn how sets of orthogonal polynomials can be computed.