Nychka, Douglas. "The Mean Posterior Variance of a Smoothing Spline and a Consistent Estimate of the Mean Squared Error"

THE MEAN POSTERIOR VARIANCE OF A SMOOTHING SPLINE AND A CONSISTENT
ESTIMATE OF THE MEAN SQUARED ERROR

by
DOUGLAS NYCHKA

Institute of Statistics Mimeograph Series No. 1688
NORTH CAROLINA STATE UNIVERSITY
Raleigh, North Carolina

The mean posterior variance of a smoothing spline and a consistent estimate of the mean squared error.

Douglas Nychka
Department of Statistics
North Carolina State University
Abstract

A smoothing spline estimator can be interpreted in two ways: either as the solution of a variational problem or as the posterior mean when a particular Gaussian prior is placed on the unknown regression function. In order to explain the remarkable performance of her Bayesian "confidence intervals" in a simulation study, Wahba (1983) conjectured that the mean posterior variance of a spline estimate evaluated at the observation points will be close to the expected mean squared error. The estimate of the mean posterior variance proposed by Wahba is shown to converge in probability to a quantity proportional to the expected mean squared error. This result is established by relating this statistic to a consistent risk estimate based on generalized cross validation.
Subject Classification (AMS-MOS): Primary 62G05; Secondary 62G15.

Key Words and Phrases: Nonparametric regression, smoothing spline, confidence intervals for a smoothing spline, risk estimate, cross validation.
1. Introduction

Consider the additive model:

    y_kn = f(t_kn) + e_kn,   k = 1, ..., n,

where f is a smooth, unknown function evaluated at the points 0 <= t_1n <= t_2n <= ... <= t_nn <= 1, and {e_kn}, k = 1, ..., n, are independent and identically distributed errors with E(e) = 0 and E(e²) = σ².
This paper will study some of the large sample properties of a smoothing spline as a nonparametric estimate of f. For λ > 0 the m-th order natural smoothing spline estimate of f, f_λ, is the minimizer of

(1.1)    (1/n) Σ_{k=1}^n (y_kn − h(t_kn))² + λ ∫₀¹ (h^(m)(t))² dt

over all h in W₂^m[0,1] = {h : h, h′, ..., h^(m−1) are absolutely continuous and h^(m) ∈ L₂[0,1]}.
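The estimator defined by (1.1) is linear in the observations, and that linear structure is used throughout the paper. A minimal numerical sketch of it, using a second-difference penalty on a grid in place of an exact spline basis (so the matrix below is a discrete analogue of, not identical to, the spline hat matrix A(λ)):

```python
import numpy as np

def hat_matrix(n, lam):
    """Hat matrix of a ridge-type smoother: minimize |y - g|^2/n + lam*|D g|^2,
    where D takes second differences.  Mimics an m = 2 smoothing spline."""
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]   # discrete second derivative
    return np.linalg.inv(np.eye(n) + n * lam * D.T @ D)

t = np.linspace(0.0, 1.0, 50)
A = hat_matrix(50, 1e-4)
f_hat = A @ np.sin(2 * np.pi * t)          # smoothed values at the knots
```

Because the second-difference penalty vanishes on linear functions, this smoother reproduces them exactly, just as a natural spline with m = 2 does.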
This nonparametric estimate of f has an intriguing stochastic interpretation (Wahba (1978)). Suppose e_n ~ N(0, σ²I) and f is a realization from the Gaussian process

    F(t) = Σ_{j=1}^m θ_j t^{j−1} + (σ²/(nλ))^{1/2} ∫₀^t ((t−s)^{m−1}/(m−1)!) dW(s),

where θ ~ N(0, ξI) and W(·) is the standard Wiener process with W(0) = 0. If f_λ is a natural spline estimate then

    f_λ(t) = lim_{ξ→∞} E(F(t) | y).

When the smoothing parameter λ is fixed, f_λ is a linear function of y, and there is an n × n matrix A(λ) such that f_λ = (f_λ(t_1n), ..., f_λ(t_nn))ᵀ = A(λ)y. The limiting conditional covariance is

    σ² A(λ) = lim_{ξ→∞} Var(F | y),

where F = (F(t_1n), ..., F(t_nn))ᵀ.
A periodic smoothing spline is obtained by restricting the minimization of (1.1) to the set {h ∈ W₂^m[0,1] : h^(k)(0) = h^(k)(1), k = 0, ..., m−1}, and these conditional equations will also hold for a periodic spline provided the Gaussian process is modified slightly to reflect the periodicity of f.

Wahba (1983) used this correspondence between a smoothing spline estimator and the posterior distribution of f to motivate 95% pointwise confidence intervals for f(t_kn) of the form

    f_λ(t_kn) ± 1.96 σ (a_kk(λ))^{1/2},

where a_kk(λ) denotes the k-th diagonal element of A(λ).
Although these intervals were derived within a Bayesian framework, the simulation results reported in Wahba (1983) indicate that they work well when evaluated by a frequentist criterion for fixed functions.
Wahba's confidence procedure is one of the few approaches
that has been developed for spline estimates, and, given the growing
interest in the use of smoothing splines for data analysis (see Silverman
(1984) for a review), it is important to understand the statistical
properties of this method.
In explaining the success of this confidence procedure, Wahba conjectured that the mean posterior variance (MPV) for f_λ at the points {t_kn}, k = 1, ..., n, is close to the expected mean squared error. Unfortunately, several steps in Wahba's argument for her hypothesis are heuristic.
Also, her analysis does not acknowledge the fact that the value of λ was not fixed in the simulations but rather was determined by generalized cross validation.
In this paper we give a proof
for a version of Wahba's conjecture that accounts for the adaptive choice of
the smoothing parameter.
To establish this result we consider a consistent estimator of the expected mean squared error, obtained by subtracting an estimate of (1/n) Σ_{k=1}^n e_kn² from the generalized cross validation function. Asymptotically, this estimator also turns out to be proportional to the estimated MPV proposed by Wahba. Given this relationship, our version of Wahba's conjecture is simple to establish.
Let

    T_n(λ) = (1/n) Σ_{k=1}^n [f_λ(t_kn) − f(t_kn)]²

denote the mean squared error for f_λ, and let λ₀ be the minimizer of E T_n(λ) for λ ∈ [0, ∞). Wahba's conjecture is:
(1.2)  If f ∈ W₂^m[0,1] then  σ² tr(A(λ₀))/n = a E T_n(λ₀) (1 + o(1))  as n → ∞,

for some a ∈ [ ((1 − 1/(2m))(1 + 1/(4m)))⁻¹ , (1 − 1/(2m))⁻¹ ].
This conjecture is important because it links a frequentist quantity with a functional of the posterior distribution. In fact, this correspondence provides a simple explanation for the accuracy of the confidence intervals in Wahba's Monte Carlo study. For a periodic spline, consider the random variable U = f(τ) − f_λ₀(τ), where τ is a random variable independent of {e_kn}, taking on the values {t_kn}, k = 1, ..., n, with equal probability. U has a mean of zero, has variance E T_n(λ₀), and, for the test cases considered by Wahba, is approximately normally distributed (see Nychka (1986)). Thus, we are led to the 95% confidence interval for f(τ):

    f_λ₀(τ) ± 1.96 (E T_n(λ₀))^{1/2}.
Because Wahba evaluates her procedure using the average coverage probability of the pointwise confidence intervals at {t_kn}, k = 1, ..., n, she in effect is computing the confidence level for the interval given above. Moreover, because the periodic splines used in this study were applied to equally spaced data, A(λ) is a circulant matrix and thus a_kk(λ₀) = (1/n) tr A(λ₀). Therefore, for Wahba's Bayesian confidence intervals

(1.3)    f_λ₀(t_kn) ± 1.96 (σ² (1/n) tr A(λ₀))^{1/2},

the average coverage probability will be close to the nominal level provided that σ² (1/n) tr A(λ₀) is close to E T_n(λ₀).
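The average coverage probability criterion can be written out directly. A sketch, using the finite-difference smoother as a stand-in for the periodic spline hat matrix and a hypothetical test function (all constants below are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, sigma = 200, 0.1, 0.1
t = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * t)                 # hypothetical "truth"

D = np.zeros((n - 2, n))
for k in range(n - 2):
    D[k, k:k + 3] = [1.0, -2.0, 1.0]
A = np.linalg.inv(np.eye(n) + n * lam * D.T @ D)   # discrete analogue of A(lam)

y = f + sigma * rng.standard_normal(n)
f_hat = A @ y
# common half-width for all knots, as in (1.3)
halfwidth = 1.96 * np.sqrt(sigma**2 * np.trace(A) / n)
avg_coverage = np.mean(np.abs(f_hat - f) <= halfwidth)
```

Here avg_coverage is exactly the frequentist quantity Wahba tabulated: the fraction of knots at which the interval covers f.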
Although (1.2) is stated in terms of expected values where λ₀ is known, the actual confidence intervals computed in Wahba's Monte Carlo study were

(1.4)    f_λ̂(t_kn) ± 1.96 (σ̂² (1/n) tr A(λ̂))^{1/2},

where λ̂ is the minimizer of the generalized cross validation function

    V(λ) = [(1/n) ‖(I − A(λ))y‖²] / [(1/n) tr(I − A(λ))]²

and

    σ̂² = ‖(I − A(λ̂))y‖² / tr(I − A(λ̂))

is an estimator for σ².
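The quantities in (1.4) are straightforward to compute once a hat matrix is available. A sketch of V(λ) and σ̂², again with the finite-difference analogue of A(λ) and a hypothetical data set (the grid of candidate λ values is an arbitrary illustrative choice):

```python
import numpy as np

def smoother(n, lam):
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    return np.linalg.inv(np.eye(n) + n * lam * D.T @ D)

def gcv(y, lam):
    """Generalized cross validation function V(lam)."""
    n = len(y)
    A = smoother(n, lam)
    r = (np.eye(n) - A) @ y                       # residual vector
    return (r @ r / n) / (np.trace(np.eye(n) - A) / n) ** 2

rng = np.random.default_rng(1)
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)

grid = 10.0 ** np.arange(-8, 0)                   # candidate lambda values
lam_hat = min(grid, key=lambda lam: gcv(y, lam))  # discrete GCV minimizer

A = smoother(n, lam_hat)
r = (np.eye(n) - A) @ y
sigma2_hat = (r @ r) / np.trace(np.eye(n) - A)    # estimator of sigma^2
```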
The frequentist interpretation of Wahba's confidence intervals that has been briefly outlined above can be extended to the situation when λ₀ and σ² are estimated by λ̂ and σ̂². If σ̂² (1/n) tr A(λ̂) is a consistent estimate of the MPV then the difference between the average coverage probabilities for the intervals in (1.3) and (1.4) will converge to zero in probability (Nychka (1986)). This is the motivation for focusing on the "estimated" MPV, σ̂² (1/n) tr A(λ̂), and our results are summarized by the theorem stated below.

Suppose that the knots, {t_kn}, k = 1, ..., n, are a random sample from a distribution with pdf g such that g is strictly positive on [0,1] and g ∈ C¹[0,1].
Theorem 1.1. If λ̂ is the minimizer of V(λ) restricted to [λ_n, ∞) with λ_n = n^{−4/5}, f_λ is a natural spline with m = 2, f ∈ W₂^{2m}, and f satisfies the natural boundary conditions

(1.5)    f^(k)(0) = f^(k)(1) = 0,  k = m, ..., 2m−1,

then

(1.6)    [σ̂² (1/n) tr A(λ̂)] / E T_n(λ₀) →p K  as n → ∞,

where

    K = [(1 − 1/(2m))(1 + 1/(4m))]⁻¹.
The hypotheses of this theorem differ in several significant ways from Wahba's initial formulation. First, note that the frequentist interpretation of the average coverage probability outlined above does not really depend on the errors being normally distributed; all that is required is that U be approximately normal. For this reason we considered the consistency of the estimated MPV with the normality assumption replaced by the boundedness of higher order moments of the error distribution. By placing more restrictions on f we obtain convergence to a particular constant that is actually the lower end point of the range of a given in (1.2). More will be said about the boundary conditions required of f at the end of this section. Finally, we note that Theorem 1.1 will also hold if f_λ is a periodic spline, provided that the natural boundary conditions in (1.5) are replaced by the periodic ones: f^(k)(0) = f^(k)(1) for k = 0, ..., 2m−1. The three smooth periodic test functions considered by Wahba in her simulations satisfy these boundary conditions.
Theorem 1.1 will be established by comparing σ̂² (1/n) tr A(λ̂) to an estimator of E T_n(λ₀) based on the generalized cross validation function. Besides giving a clear proof, we feel this risk estimate is of interest in its own right. From Speckman (1983) and Cox (1984), under the conditions of Theorem 1.1:

(1.7)    [V(λ) − S_n²] / E T_n(λ) →p 1 uniformly for λ ∈ [λ_n, ∞), where S_n² = (1/n) Σ_{k=1}^n e_kn².

Thus, given a consistent estimate of S_n², say S̃_n², a natural estimate of E T_n(λ₀) is

    T̃_n = V(λ̂) − S̃_n².

The key point of our argument is to establish the following.
Theorem 1.2. Let

(1.8)    S̃_n² = ‖(I − A(λ̂))y‖² / tr(I − eA(λ̂)),  where e = 2 − 1/K;

then

    [S̃_n² − S_n²] / E T_n(λ₀) →p 0.

Combining this result with (1.7) we have T̃_n / E T_n(λ₀) →p 1, and Theorem 1.1 will follow easily from the consistency of T̃_n and the asymptotic behavior of tr A(λ̂).
Note that

    V(λ̂) − S̃_n² = (1/n)‖(I − A(λ̂))y‖² [ (2 − e)(1/n)tr A(λ̂) − ((1/n)tr A(λ̂))² ] / [ ((1/n)tr(I − A(λ̂)))² (1/n)tr(I − eA(λ̂)) ],

and after some algebra

(1.9)    T̃_n = (2 − e) σ̂² (1/n) tr A(λ̂) [ (1 − (1/n)tr A(λ̂)/(2 − e)) / ((1 − (1/n)tr A(λ̂))(1 − e(1/n)tr A(λ̂))) ].

Thus, T̃_n is proportional to the estimated mean posterior variance. From Lemma 3.1 and the choice of λ_n, the bracketed term in (1.9) converges to 1, and noting that K = 1/(2 − e), Theorem 1.1 now follows.
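To make the construction concrete, here is a sketch of the risk estimate T̃_n = V(λ̂) − S̃_n² with the e-adjusted variance estimate of (1.8), with m = 2 so that, under the form of K reconstructed above, e = 2 − (1 − 1/(2m))(1 + 1/(4m)) = 37/32 (an assumption carried over from that reconstruction). The discrete smoother again stands in for A(λ), and lam_hat is a fixed illustrative value, not an actual GCV minimizer; it is chosen large enough that e·trA(λ̂)/n < 1, as the asymptotic regime requires.

```python
import numpy as np

m = 2
e = 2 - (1 - 1 / (2 * m)) * (1 + 1 / (4 * m))   # e = 2 - 1/K; assumes the form of K above

def smoother(n, lam):
    D = np.zeros((n - 2, n))
    for k in range(n - 2):
        D[k, k:k + 3] = [1.0, -2.0, 1.0]
    return np.linalg.inv(np.eye(n) + n * lam * D.T @ D)

rng = np.random.default_rng(2)
n = 100
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal(n)

lam_hat = 1.0                                    # stand-in for the GCV minimizer
A = smoother(n, lam_hat)
r = (np.eye(n) - A) @ y
V = (r @ r / n) / (np.trace(np.eye(n) - A) / n) ** 2
S2_tilde = (r @ r) / np.trace(np.eye(n) - e * A)   # e-adjusted estimate (1.8)
T_tilde = V - S2_tilde                             # risk estimate for E T_n
```

The algebraic identity (1.9) can be checked numerically: recombining V and S̃_n² through σ̂² and (1/n)tr A(λ̂) reproduces T̃_n exactly.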
Arguing from an analogy with linear regression, Wahba suggested σ̂² as an estimate of σ². Although σ̂² − S_n² = o_p(1), the ratio [σ̂² − S_n²]/E T_n(λ₀) does not converge to zero in probability. This is the reason for the slight modification of σ̂² in (1.8) through the insertion of the constant e. This change is somewhat arbitrary, and there are other modifications that will also work. For example, one could obtain the same consistency results by evaluating the residual sum of squares at a suitably adjusted value of the smoothing parameter.
Besides an estimate of σ² based on the residual sum of squares, consistent estimates of σ² can be constructed using first (or second) differences of y (see Rice (1983)). This latter estimate has the advantage that no assumptions need to be made about the functional form of the bias of f_λ. Unfortunately, estimators based on differencing y will not give consistent estimates of S_n².

Risk estimators for a smoothing spline have been developed through the connection between Stein's unbiased risk estimate and generalized cross validation (Li (1985)). One interesting feature of Li's work is that his risk estimate only requires an estimate of σ² rather than S_n². His results, however, are limited to the case of normal errors and involve the asymptotic approximation of f_λ by a Stein estimate. Also, our work is related to Rice (1983) because, in the special case when t_kn = k/n, a periodic smoothing spline is also a kernel estimate.
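The difference-based estimate mentioned above has a very simple form; a sketch of the first-difference version:

```python
import numpy as np

def sigma2_first_diff(y):
    """Estimate the error variance from first differences of the data.
    If y_k = f(t_k) + e_k with f smooth on a fine grid, the differences
    y_{k+1} - y_k are nearly e_{k+1} - e_k, which has variance 2*sigma^2."""
    d = np.diff(y)
    return np.sum(d * d) / (2 * (len(y) - 1))

rng = np.random.default_rng(3)
n = 2000
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + rng.standard_normal(n)   # true sigma = 1
est = sigma2_first_diff(y)
```

No smoothing parameter is needed, which is the appeal of this estimator; the cost, as noted above, is that it targets σ² rather than S_n².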
The next section states general versions of Theorems 1.1 and 1.2, while Section 3 contains the preliminary lemmas for the proofs of these theorems. The proofs of the theorems are given in Section 4. The last section establishes the form of the mean squared bias of a natural cubic spline on an interior interval of [0,1]. This technical result was included to support the discussion at the end of this first section.

The boundary conditions on f required in Theorem 1.1 are very restrictive, and we end this section by discussing the background for this condition and suggesting a modification to avoid this limitation on f. Let

    Φ = {f : f ∈ W₂^{2m}[0,1], f^(k)(0) = f^(k)(1) = 0, k = m, ..., 2m−1}.

Members of Φ are the "very smooth" functions defined in Wahba (1977), or the case p = 2 in the appendix of Wahba (1983).
This class was considered because E T_n(λ₀) achieves its fastest rate of convergence for f ∈ Φ, and the average squared bias for these "very smooth" functions asymptotically has an explicit functional form. This form is given in condition (F3) in the next section and is the key to establishing the consistency of λ̂ and S̃_n².

Although members of Φ must have more derivatives than is required by the penalty function in (1.1), the main objection to this class is that f must satisfy the natural boundary conditions in its higher derivatives. Probably for most applications this is not a reasonable assumption. Unfortunately, if f ∈ W₂^{2m}[0,1] but does not satisfy these natural boundary conditions then condition (F3) will not hold. Rice and Rosenblatt (1984) considered a slightly different spline estimate and showed that if f ∈ C⁴[0,1] but f is not contained in Φ, the bias of the spline in the neighborhoods of 0 and 1 will dominate the average squared bias. This has the unfortunate consequence that the mean squared error (and the cross validation function) will be dominated by the behavior of the estimate at the boundaries.
A simple solution to this problem is to restrict evaluation of f_λ to an interior interval [δ, 1−δ] for some δ > 0. Although this means that information about f at the boundary is lost, it is reasonable to believe that the boundary effects cited above will be eliminated and the mean squared error for f_λ in [δ, 1−δ] will satisfy (F3). In Section 5 we show that this is indeed the case when g ≡ 1 and m = 2. If the results in Lemma 3.1 can be established when V(λ), T_n(λ) and S_n² are appropriately restricted to [δ, 1−δ], then Theorem 1.1 and Theorem 1.2 will still hold for all f ∈ C⁴[0,1]. We feel this additional analysis is beyond the scope of this paper, however, because our main objective is to give a rigorous basis for Wahba's original conjecture in the context of her simulation study.
2. General Theorem

A general version of Theorems 1.1 and 1.2 depends on three conditions which we will refer to as (F1)-(F3). Let G_n denote the empirical distribution of the knots, {t_kn}, k = 1, ..., n. Following Cox (1984) (subsequently abbreviated GA) we consider two cases:

A. Designed knots: there is a distribution function G such that

    sup_{v∈[0,1]} |G_n(v) − G(v)| = O(1/n).

B. Random knots: {t_kn} is a random sample with distribution function G.

In either case we will assume that g = G′ is strictly positive on [0,1] and g ∈ C¹[0,1].

(F1)    E|e_kn|^δ < ∞ with
        Case A: δ > 4m − 1;
        Case B: δ > 2(8m − 3)/5.

Unfortunately, due to the difficulty in obtaining uniform asymptotic approximations for f_λ over all λ, we must restrict attention to values of the smoothing parameter in an interval [λ_n, ∞) with λ_n → 0 at a particular rate. Thus, we will redefine the smoothing parameter estimate as

    λ̂ = argmin_{λ ∈ [λ_n, ∞)} V(λ).

(F2)    Case A: λ_n = n^{−4m/5} log(n)^{2m};
        Case B: λ_n = n^{−2m/5} log(n)^m.

(F3)    There is a γ > 0 such that

    (1/n) Σ_{k=1}^n [E f_λ(t_kn) − f(t_kn)]² = γλ² [1 + o(1)]

uniformly for λ ∈ [λ_n, ∞).

Theorem 2.1. Under (F1)-(F3), if S̃_n² is given by (1.8) and T̃_n = V(λ̂) − S̃_n², then

2.1a)    [S̃_n² − S_n²] / E T_n(λ₀) →p 0,

2.1b)    T̃_n / E T_n(λ₀) →p 1,

2.1c)    [σ̂² (1/n) tr A(λ̂)] / E T_n(λ₀) →p K  as n → ∞,

where K = [(1 − 1/(2m))(1 + 1/(4m))]⁻¹.
Preliminary Results
The proof of Theorem 2.1 will follow from the five lemmas given
below.
To simplify notation, let r=2m and
1
A- F
J1«A) = ~ ii
' where --\ =
(
k=l,2 and
1
F
dv •
The first lemma is a collection of some well known asymptotic
properties of a smoothing spline, while the second establishes
12
n,-
).
A
the consistency of A.
The remaining Lemmas can be motivated by examining the
terms arising from the expansion of 52 and S2 at lines (4.1) and (4.2) in the
n
proof of Theorem 2.1.
n
For all of the following lemmas it should
be assumed that (FI)-(F3) are in force and A(A) may be
interpreted to be the hat matrix for either a natural or periodic spline.
Lemma 3.1. Under (F1)-(F3), as n → ∞:

(3.1)    (1/n) tr(A(λ)^k) = μ_k(λ) (1 + o(1)),  k = 1, 2,

(3.2)    E T_n(λ) = [γλ² + σ²μ₂(λ)] (1 + o(1)),

(3.3a)   sup_{λ∈[λ_n,∞)} |T_n(λ) − E T_n(λ)| / E T_n(λ) = o_p(1),

(3.3b)   sup_{λ∈[λ_n,∞)} |V(λ) − S_n² − T_n(λ)| / E T_n(λ) = o_p(1),

(3.4)    λ₀ = [σ²β₂ / (2rγn)]^{r/(2r+1)} (1 + o(1)),

(3.5)    T_n(λ̂) / E T_n(λ₀) = 1 + o_p(1),

(3.6)    E T_n(λ₀) = σ²μ₂(λ₀) (1 + 1/(2r)) (1 + o(1)),

where (3.1) and (3.2) hold uniformly for λ ∈ [λ_n, ∞).
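The constant K in Theorem 2.1 rests on the ratio β₂/β₁ = 1 − 1/r of the integrals defined above, an identity that can be checked numerically. A sketch for r = 2m = 4, using plain trapezoidal integration on a truncated range (the truncation point and step count are arbitrary choices; the neglected tail is O(upper^{1−kr}) and negligible here):

```python
def beta(k, r=4, upper=60.0, steps=600000):
    """Trapezoidal approximation to the integral of (1 + v**r)**(-k) on [0, upper]."""
    h = upper / steps
    total = 0.5 * (1.0 + (1.0 + upper**r) ** (-k))   # endpoint terms
    for i in range(1, steps):
        v = i * h
        total += (1.0 + v**r) ** (-k)
    return total * h

ratio = beta(2) / beta(1)   # integration by parts gives exactly 1 - 1/r = 0.75
```

The design constant c_g cancels from the ratio, which is why only the integrals matter for K.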
Proof. Examining the proof of Lemma 4.4 from GA, Cox actually shows that

    (1/n) tr A(λ)^k = (1/n) Σ_{j=1}^n (1 + λB_j)^{−k} (1 + o(1))  for λ ∈ [λ_n, ∞),

where the B_j are the eigenvalues of the differential operator

    L_g φ = (−1)^m (1/g) D^{2m} φ,

with the domain of L_g consisting of φ ∈ W₂^{2m} that satisfy either the natural or the periodic boundary conditions, depending on the choice of f_λ. From Naimark (1967, pp. 78-79), B_j = (αj)^{2m} (1 + o(1)), and it follows that

    Σ_{j=1}^n (1 + λB_j)^{−k} = Σ_{j=1}^n (1 + λ(αj)^{2m})^{−k} (1 + o(1))  as λ → 0.

Applying arguments similar to those in the proof of Lemma 2.1 in Cox (1983b),

    Σ_{j=1}^n (1 + λ(αj)^{2m})^{−k} = n μ_k(λ) (1 + o(1))  as λ → 0.

(3.2) follows trivially from (3.1) and (F3). Both (3.3a) and (3.3b) are proved by Speckman (1983) for normally distributed errors and are generalized in the proof of Theorem 5.1 of GA for error distributions that satisfy the moment conditions in (F1). (3.4) is proved in Wahba (1977) and can be verified by computing the minimum of the RHS of (3.2). Note that λ₀ ∈ [λ_n, ∞) for λ_n given in (F2), and thus these approximations are appropriate for studying the mean square convergence of f_λ₀. Because λ₀ ∈ [λ_n, ∞) as n → ∞, both (3.5) and (1.7) can be easily derived from the preceding results. Finally, (3.6) may be verified by substituting the asymptotic form for λ₀ into the RHS of (3.2) and into μ_k.
Lemma 3.2. If λ̂ is the minimizer of V(λ) for λ ∈ [λ_n, ∞), then λ̂/λ₀ →p 1 as n → ∞.

Proof. Let g(λ) = γλ² + σ²β₂λ^{−1/r}/n. From (3.2) and (3.3a) we have

    sup_{λ∈[λ_n,∞)} |T_n(λ) − g(λ)| / g(λ) = o_p(1),  or  g(λ̂)/T_n(λ̂) = 1 + o_p(1).

Combining this asymptotic equivalence with (3.2) and (3.5), it now follows that

    g(λ̂)/g(λ₀) = 1 + o_p(1).

Set θ̂ = λ̂/λ₀ and

    ψ(u) = (u² + 2r u^{−1/r}) / (1 + 2r).

Because γλ₀² = σ²β₂λ₀^{−1/r}/(2rn) by (3.4), g(uλ₀)/g(λ₀) = ψ(u)(1 + o(1)), and thus ψ(θ̂) = 1 + o_p(1). Note that ψ(u) ≥ 1 with ψ(u) = 1 if and only if u = 1. Because ψ is continuous and attains its minimum only at 1, θ̂ →p 1; otherwise a contradiction would be obtained. From the definition of θ̂, the consistency of λ̂ now follows.
Lemma 3.3. (1/n) e_nᵀ A(λ₀)^k e_n = σ² μ_k(λ₀) (1 + o_p(1)),  k = 1, 2,  as n → ∞.

Proof. First we note that A(λ)^{1/2} is well defined since A(λ) is nonnegative definite. Let X = e_nᵀ A(λ₀)^k e_n / (nσ²μ_k(λ₀)) and B = A(λ₀)^{k/2}, so that e_nᵀ A(λ₀)^k e_n = ‖B e_n‖². From (3.1) it follows that E(X) = 1 + o(1). For any matrix B and e₁, ..., e_n i.i.d. with finite fourth moments, Var(e_nᵀBᵀB e_n) ≤ C tr[(BᵀB)²] for some C < ∞. Because the eigenvalues of A(λ₀) lie in [0,1], tr A(λ₀)^{2k} ≤ tr A(λ₀)^k, so

    Var(X) ≤ C′ / (n μ_k(λ₀)),

and from (3.4), 1/(n μ_k(λ₀)) → 0. Thus Var(X) → 0 and X →p 1.
Lemma 3.4.

(3.7)    [S̃_n² − (1/n)‖(I − A(λ₀))y‖² / (1 − eμ₁(λ₀))] / E T_n(λ₀) →p 0.

Proof. Adding and subtracting (1/n)‖(I − A(λ₀))y‖² / (1 − e(1/n)tr A(λ₀)), the LHS of (3.7) can be rewritten as

(3.8)    [V(λ̂)ξ(λ̂) − V(λ₀)ξ(λ₀)] / E T_n(λ₀)

(3.9)    + (1/n)‖(I − A(λ₀))y‖² Δ / E T_n(λ₀),

with

    ξ(λ) = (1 − (1/n)tr A(λ))² / (1 − e(1/n)tr A(λ))

and

    Δ = [1 − e(1/n)tr A(λ₀)]⁻¹ − [1 − eμ₁(λ₀)]⁻¹.

Expanding (3.8) using the identity

    ab − cd = ½(a − c)(b + d) + ½(a + c)(b − d)

yields

(3.10)   [V(λ̂) − V(λ₀)][ξ(λ̂) + ξ(λ₀)]/2

(3.11)   + [V(λ̂) + V(λ₀)][ξ(λ̂) − ξ(λ₀)]/2.

Now from Lemma 3.1,

    V(λ̂) − V(λ₀) = [T_n(λ̂) − T_n(λ₀)](1 + o_p(1)) = o_p(E T_n(λ₀)).

From (3.1) it is clear that ξ(λ) is bounded, and therefore (3.10) converges to zero at the necessary rate. To simplify notation, let μ_{1n}(λ) = (1/n) tr A(λ). Using the algebraic identity given above,

    ξ(λ̂) − ξ(λ₀) = [μ_{1n}(λ̂) − μ_{1n}(λ₀)] Ψ,

where

    Ψ = e [(1 − μ_{1n}(λ̂))² + (1 − μ_{1n}(λ₀))²] / (2[1 − eμ_{1n}(λ̂)][1 − eμ_{1n}(λ₀)])
        − [2 − μ_{1n}(λ̂) − μ_{1n}(λ₀)] ([1 − eμ_{1n}(λ̂)]⁻¹ + [1 − eμ_{1n}(λ₀)]⁻¹)/2.

From (3.1), Lemma 3.2 and (3.6), μ_{1n}(λ̂) − μ_{1n}(λ₀) = o_p(E T_n(λ₀)). Because Ψ and V(λ̂) + V(λ₀) are bounded in probability, ξ(λ̂) − ξ(λ₀) = o_p(E T_n(λ₀)), and it now follows that (3.11) also converges to zero at the correct rate.

Finally, we consider the term in (3.9). Here

    Δ = e[μ_{1n}(λ₀) − μ₁(λ₀)] / ([1 − eμ_{1n}(λ₀)][1 − eμ₁(λ₀)]),

and, using arguments similar to those in Lemma 3.3, one can show that (3.9) is o_p(1); the lemma now follows.
Lemma 3.5. Let f_n = (f(t_1n), ..., f(t_nn))ᵀ and

    r = (1/n) f_nᵀ(I − A(λ₀))² f_n − σ²[(2 − e)μ₁(λ₀) − μ₂(λ₀)];

then r / E T_n(λ₀) → 0 as n → ∞.

Proof. Adding and subtracting σ²(1/n)tr A(λ₀)², r/E T_n(λ₀) can be written as

(3.12)   [(1/n) f_nᵀ(I − A(λ₀))² f_n − σ²(2 − e)μ₁(λ₀) + σ²(1/n)tr A(λ₀)²] / E T_n(λ₀)

(3.13)   + σ²[μ₂(λ₀) − (1/n)tr A(λ₀)²] / E T_n(λ₀),

and (3.13) → 0 as n → ∞ by the asymptotic equivalences given in Lemma 3.1. By (F3), the quadratic form in (3.12) is the average squared bias γλ₀²(1 + o(1)), and substituting the asymptotic form for λ₀ into (3.12) yields

(3.14)   (3.12) = 1/(2r + 1) + 2r/(2r + 1) − (2 − e)[(1 − 1/r)(1 + 1/(2r))]⁻¹ + o(1).

Referring to Selby (1979), pg. 461, for the integrals β_k, we have β₂/β₁ = 1 − 1/r, so μ₁(λ₀) = μ₂(λ₀)/(1 − 1/r). Thus, by the choice of e, it is straightforward to verify that the first two terms on the RHS of (3.14) sum to 1 while the third term equals 1; hence (3.12) → 0 and the lemma follows.
4. Proof of Theorem 2.1

To simplify notation, let μ_k = μ_k(λ₀), k = 1, 2. Write

(4.1)    S̃_n² − S_n² = [S̃_n² − (1/n)‖(I − A(λ₀))y‖²/(1 − eμ₁)]
                        + [(1/n)‖(I − A(λ₀))y‖²/(1 − eμ₁) − S_n²].

The first term in (4.1) converges in probability at the necessary rate by Lemma 3.4. Noting that y = f_n + e_n, expanding the second term and adding and subtracting the appropriate centering constants, we have the expression

(4.2)    (1/n)‖(I − A(λ₀))y‖²/(1 − eμ₁) − S_n² = (R1 + R2 + R3 + r) / (1 − eμ₁),

where

    R1 = eμ₁ [S_n² − σ²],
    R2 = −2[(1/n) e_nᵀA(λ₀)e_n − σ²μ₁] + [(1/n) e_nᵀA(λ₀)²e_n − σ²μ₂],
    R3 = (2/n) f_nᵀ(I − A(λ₀))² e_n,

and r is defined in the statement of Lemma 3.5. Because 1 − eμ₁ → 1, to complete the proof it is sufficient to show that each term in the numerator of (4.2) is o_p(E T_n(λ₀)). Moreover, in view of the asymptotic relations between μ_k(λ₀) and E T_n(λ₀) in Lemma 3.1, it is enough to establish a convergence rate to zero of the form o_p(μ₁ ∨ μ₂) for each term.

The convergence of R1 follows from the fact that S_n² − σ² = O_p(n^{−1/2}). R2 is o_p(μ₁ ∨ μ₂) by Lemma 3.3. Note that E(R3) = 0 and

    Var(R3) = (4σ²/n²) f_nᵀ(I − A(λ₀))⁴ f_n ≤ (4σ²/n²) f_nᵀ(I − A(λ₀))² f_n ≤ (Cσ²/n) E T_n(λ₀),

where the first inequality follows because the eigenvalues of I − A(λ₀) lie in [0,1]. Thus, by Chebyshev's inequality, R3 converges at the appropriate rate using (3.2) and (3.4). Finally, note that r = o(E T_n(λ₀)) by Lemma 3.5, and thus, having considered all the terms in (4.2), 2.1a follows.

The second part of the theorem is a consequence of (2.1a) and (1.7). For 2.1c we refer to the discussion in the introduction and note that, from (3.1),

    (1/n) tr A(λ̂) = μ₁(λ̂)(1 + o_p(1)).

Because μ₁(λ_n) → 0, (1/n) tr A(λ̂) → 0 as n → ∞ and the bracketed term in (1.9) converges to 1.
5. Bias of a natural spline in an interior interval

In this section we will show that when a natural spline is restricted to an interior interval the average bias will satisfy condition (F3). This result complements the discussion at the end of the first section.

Theorem 5.1. Let f_λ be a natural cubic spline (m = 2) and for δ > 0 let

    b(λ,δ)² = (1/n_δ) Σ_{t_k∈[δ,1−δ]} [f(t_k) − E f_λ(t_k)]²,  n_δ = #{k : t_k ∈ [δ, 1−δ]}.

Assume that (F1) and (F2) hold and G_n(u) → u as n → ∞. If f ∈ C⁴[0,1] and D⁴f is not identically zero, then

    b(λ,δ)² = γλ² (1 + o(1))  uniformly for λ ∈ [λ_n, ∞)  as n → ∞,

where

    γ = ∫_{[δ,1−δ]} [D⁴f]² dt.

This theorem will be shown using the three lemmas developed in this section, and for each of these results it should be assumed that the hypotheses of Theorem 5.1 are in force.

Let (B_{nλ}f)(t) = E f_λ(t) − f(t). It will be useful to approximate B_{nλ} using an operator that does not depend on n. Let L be the linear differential operator λD⁴ + I with domain

    D_L = {φ ∈ W₂⁴[0,1] : φ^(k)(0) = φ^(k)(1) = 0, k = 2, 3},

and let h_λ(·,·) be the Green's function for L. That is, if ψ ∈ L₂[0,1] then

    φ(s) = ∫_{[0,1]} h_λ(s,t) ψ(t) dt

solves λD⁴φ + φ = ψ with φ ∈ D_L. The form for h_λ can be computed (using symbol manipulation software) and is given in the Appendix. Also in the appendix is a lemma giving some upper bounds on this integral kernel. One way of interpreting the Green's function is that it is a limiting form for A(λ) as the knots become dense in the interval (see Cox (1983a) for more details of this interpretation). If we let

(5.1)    (B_λ f)(s) = ∫_{[0,1]} h_λ(s,t) f(t) dt − f(s),

then Cox (1983b) shows that ‖B_{nλ}f − B_λf‖_{L₂[0,1]} = o(‖B_λf‖) as n → ∞ for λ ∈ [λ_n, ∞) under (F1) and (F2).

To simplify notation, let χ_δ be the indicator function for [δ, 1−δ] and, unless otherwise labeled, ‖·‖ should be taken to be the usual norm for L₂[0,1].
Lemma 5.1. Let f, φ ∈ C¹[0,1] and suppose that f = φ for s ∈ [δ/2, 1 − δ/2]. Then for any p > 0 there is an N < ∞ such that

a)    |B_λ(φ − f)(s)| = O(λ^p) uniformly for s ∈ [δ, 1 − δ],

b)    |B_{nλ}(φ − f)(s)| = O(λ^p) uniformly for s ∈ [δ, 1 − δ],

for λ ∈ [λ_n, ∞) and n ≥ N.

Proof. For s ∈ [δ, 1 − δ],

(5.2)    B_λ(φ − f)(s) = ∫_{[0,δ/2]} h_λ(s,t)(φ(t) − f(t)) dt + ∫_{[1−δ/2,1]} h_λ(s,t)(φ(t) − f(t)) dt.

For s ∈ [δ, 1 − δ] and t ∈ [0, δ/2] we have |s − t| > δ/2, and thus from Lemma A.1

    |∫_{[0,δ/2]} h_λ(s,t)(φ(t) − f(t)) dt| ≤ (δ/2) M ρ e^{−ρδ/2} sup_{t∈[0,δ]} |φ(t) − f(t)|,

where ρ = (√2 λ^{1/4})⁻¹ and M < ∞. Because e^{−ρδ/2} goes to zero faster than any power of λ as λ → 0, a similar bound for the second term in (5.2) gives part a).

For part b), let h_{nλ}(s,t) be the kernel such that

    E f_λ(s) = ∫_{[0,1]} h_{nλ}(s,t) f(t) dG_n(t),

and we first derive a bound similar to (A.3) for h_{nλ}(s,t). This estimate is obtained using the techniques in the proof of Theorem 3.2 in Cox (1983a). Let h_j(s) = h_λ(s, t_j). Given the bounds for h_λ in Lemma A.1 and conditions (F1) and (F2), Cox (1983a) shows that, for n sufficiently large, h_{nλ}(·, t_j) may be expanded as Σ_{ν≥0} R_{nλ}^ν h_j, where R_{nλ} is the perturbation operator associated with integrating h_λ against d(G_n − G). We claim that for ν = 0, 1, 2, ...

(5.3)    |R_{nλ}^ν h_j(s)| ≤ [B ρ² D_n]^ν M ρ e^{−ρ|s−t_j|},

where D_n = sup_{u∈[0,1]} |G_n(u) − G(u)|. The proof is by induction. Clearly (5.3) holds when ν = 0. Integrating by parts and applying a Hölder inequality,

    |R_{nλ}^{ν+1} h_j(s)| ≤ D_n sup_ξ |(d/dξ)[h_λ(s,ξ) R_{nλ}^ν h_j(ξ)]| ≤ [B ρ² D_n]^{ν+1} M ρ² I(ρ,s,t_j),

where

    I(ρ,s,t) = ∫_{[0,1]} e^{−ρ|s−ξ| − ρ|t−ξ|} dξ,

using Lemma A.1 and the induction hypothesis. Integrating I(ρ,s,t) and identifying the dominating terms, we have I(ρ,s,t_j) ≤ 3(2/ρ) e^{−ρ|s−t_j|}, and thus (5.3) holds when ν + 1 replaces ν. Given this bound, note that

    |h_{nλ}(s,t_j)| ≤ [Σ_{ν=0}^∞ (Bρ²D_n)^ν] M ρ e^{−ρ|s−t_j|},

and, using the fact that D_n = O(n^{−1/2} log(n)), Bρ²D_n → 0 as n → ∞ for λ ∈ [λ_n, ∞). Thus, for n sufficiently large, the bracketed term above is a convergent geometric series with a limit of 1 as n → ∞. With this estimate for h_{nλ}, part b) now follows in the same manner as part a).
Lemma 5.2.

(5.4)    ‖χ_δ B_λ f‖² = γλ² (1 + o(1))  uniformly for λ ∈ [λ_n, ∞).

Proof. First, we will consider φ ∈ C⁴[0,1] such that φ = f for s ∈ [δ/2, 1 − δ/2] and φ^(k) is a periodic function on [0,1] for k = 0, ..., 4. We will argue that

(5.5)    χ_δ B_λ φ = −λ χ_δ D⁴φ + o(λ),

and, given the results of Lemma 5.1a, the extension to nonperiodic f will follow. From the discussion in Cox (1983a), Section 5, h_λ has the decomposition h_λ(s,t) = K_λ(s − t) + r_λ(s,t), where K_λ is a periodic kernel with two continuous derivatives and the form for r_λ is given in the appendix. Note that

    B_λφ(s) = [∫_{[0,1]} K_λ(s − t)φ(t) dt − φ(s)] + ∫_{[0,1]} r_λ(s,t)φ(t) dt.

From the definition of K_λ(s − t) as a Green's function (see appendix),

    ∫_{[0,1]} K_λ(s − t)φ(t) dt − φ(s) = −λ D⁴ ∫_{[0,1]} K_λ(s − t)φ(t) dt,

and, extending φ periodically outside [0,1],

    −λ D⁴ ∫_{[0,1]} K_λ(s − t)φ(t) dt = −λ ∫_{[0,1]} K_λ(u) (d⁴/ds⁴)φ(s − u) du → −λ D⁴φ(s)  as λ → 0.

By the construction of r_λ reported in the appendix and Lemma A.1 (A.4), we have the rough bound

    |r_λ(s,t)| ≤ M ρ e^{−ρδ}  for s ∈ [δ, 1 − δ],

where 0 < M < ∞ and ρ = (√2 λ^{1/4})⁻¹. Therefore

    ∫_{[0,1]} r_λ(s,t)φ(t) dt = O(λ^p)

for any p, uniformly for s ∈ [δ, 1 − δ]. Taking p = 1, we have χ_δ B_λφ = −λ χ_δ D⁴φ + o(λ), and (5.5) now follows.

Finally, we relate these results to nonperiodic f. By Lemma 5.1 with p > 1,

    | ‖χ_δ B_λ f‖ − ‖χ_δ B_λ φ‖ | ≤ ‖χ_δ B_λ(f − φ)‖ = o(λ).

By construction, f and φ agree on [δ, 1 − δ], and the lemma now follows.
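The leading bias term −λD⁴φ can also be seen numerically in a discretized setting. The sketch below compares the interior bias of the finite-difference smoother with the fourth difference of a hypothetical test function; the scalings below belong to that discrete smoother, not to the exact spline operator, so only the proportionality is being illustrated:

```python
import numpy as np

n, lam = 400, 1e-5
t = np.linspace(0.0, 1.0, n)
f = np.sin(2 * np.pi * t)

D = np.zeros((n - 2, n))
for k in range(n - 2):
    D[k, k:k + 3] = [1.0, -2.0, 1.0]
A = np.linalg.inv(np.eye(n) + n * lam * D.T @ D)

bias = A @ f - f                       # noiseless smoother output minus truth
pred = -n * lam * (D.T @ D @ f)        # first-order expansion: (A - I)f ~ -(n lam) D'D f

interior = slice(40, n - 40)           # stay away from the boundary layers
corr = np.corrcoef(bias[interior], pred[interior])[0, 1]
```

Away from the boundaries the two vectors are very nearly proportional; near 0 and 1 the agreement degrades, which is the discrete shadow of the boundary effects this section works around.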
Lemma 5.3. If φ ∈ C⁴[0,1] is as in Lemma 5.2, then, uniformly for λ ∈ [λ_n, ∞),

(5.6a)    ‖χ_δ B_{nλ} φ‖ = ‖χ_δ B_λ φ‖ (1 + o(1)),

(5.6b)    (1/n_δ) Σ_{t_k∈[δ,1−δ]} (B_{nλ}φ)(t_k)² = ‖χ_δ B_{nλ}φ‖² [1 + o(1)]  as n → ∞.

Proof.

    | ‖χ_δ B_{nλ}φ‖ − ‖χ_δ B_λφ‖ | ≤ ‖χ_δ(B_{nλ} − B_λ)φ‖ ≤ ‖(B_{nλ} − B_λ)φ‖ = o(‖B_λφ‖)

by (5.1) and the smoothness of φ. Given the convergence rate for ‖χ_δ B_λφ‖ in Lemma 5.2, the first part of the lemma now follows.

Let G_n^δ denote the empirical distribution of the t_kn ∈ [δ, 1−δ] and G^δ its limit as n → ∞. With this notation, the discrete norm in (5.6b) can be reexpressed as ∫_{[0,1]} (B_{nλ}φ)² dG_n^δ. Integrating by parts and applying the Cauchy-Schwarz inequality,

    | ∫ (B_{nλ}φ)² dG_n^δ − ∫ (B_{nλ}φ)² dG^δ | ≤ 2 sup_{s∈[0,1]} |G_n^δ(s) − G^δ(s)| ‖B_{nλ}φ‖_{W₂¹} ‖B_{nλ}φ‖.

Let D_n = sup |G_n^δ − G^δ|; taking the usual norm for W₂¹[0,1], there is an M < ∞ such that the above expression is bounded by M D_n ‖B_{nλ}φ‖_{W₂¹} ‖B_{nλ}φ‖, following the arguments given in the proof of Lemma 4.1, pg. 38, of GA. Moreover, by the analysis given above and Lemma 5.2, this bound is of smaller order than ‖χ_δ B_{nλ}φ‖². Thus, we can conclude that

(5.7)    ∫ (B_{nλ}φ)² dG_n^δ = ∫ (B_{nλ}φ)² dG^δ [1 + o(1)]  uniformly for λ ∈ [λ_n, ∞),

and, given (5.6a), the second part of the lemma now follows.
Proof of Theorem 5.1. The proof consists of extending the convergence results for φ in Lemma 5.3 to functions that do not satisfy the periodic boundary conditions. For f ∈ C⁴[0,1] choose periodic φ ∈ C⁴[0,1] so that φ agrees with f on [δ/2, 1 − δ/2]. By a slight modification of the proof of part b) in Lemma 5.1, it is possible to show that

    ∫_{[0,1]} [B_{nλ}(f − φ)]² dG_n^δ = O(λ^p)  for any p > 0,

while, combining Lemmas 5.2 and 5.3, we have

    ∫_{[0,1]} [B_{nλ}φ]² dG_n^δ = γλ² (1 + o(1)).

Thus, b(λ,δ)² = γλ² (1 + o(1)).
Appendix

This appendix gives an explicit formula for the Green's function, h_λ(s,t), for the differential operator L defined in Section 5. Lemma A.1 following these formulae basically states that |h_λ(s,t)| decreases exponentially as s diverges from t. A MACSYMA batch program is included at the end of this appendix for those who wish to reproduce these results.

Let ρ = 1/(√2 λ^{1/4}). For s ≤ t, h_λ(s,t) is given below; by the symmetry in the operator L, if s > t, then h_λ(s,t) = h_λ(1 − s, 1 − t).

(A.1)    h_λ(s,t) = a₁(t) e^{ρs} cos(ρs) + a₂(t) e^{ρs} sin(ρs) + a₃(t) e^{−ρs} sin(ρs) + a₄(t) e^{−ρs} cos(ρs).

To shorten the expressions, make the substitutions ss = sin(ρs), cs = cos(ρs), es = e^{ρs}, st = sin(ρt), ct = cos(ρt), et = e^{ρt}, s = sin(ρ), c = cos(ρ), e = e^ρ. The coefficients a₁(t) and a₄(t) are ratios of polynomials in these quantities with common denominator

    e21 = 2e⁴et + 8c²e²et − 12e²et + 2et,

and

(A.2)    a₂ = a₃ = (a₁ − a₄)/2.

The explicit numerators are lengthy and may be regenerated with the MACSYMA program reproduced below.
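The four basis functions appearing in (A.1) are precisely the solutions of the homogeneous equation λD⁴g + g = 0, which can be confirmed with a computer algebra check (SymPy here, standing in for the MACSYMA used originally):

```python
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
rho = 1 / (sp.sqrt(2) * lam**sp.Rational(1, 4))

# residual of lam*g'''' + g = 0 for each component of (A.1)
basis = (sp.exp(rho * s) * sp.cos(rho * s),
         sp.exp(rho * s) * sp.sin(rho * s),
         sp.exp(-rho * s) * sp.sin(rho * s),
         sp.exp(-rho * s) * sp.cos(rho * s))
residuals = [sp.simplify(lam * sp.diff(g, s, 4) + g) for g in basis]
```

The calculation rests on (ρ(1 ± i))⁴ = −4ρ⁴ = −1/λ, which is exactly why ρ = 1/(√2 λ^{1/4}) appears throughout.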
As mentioned in Section 5,

    h_λ(s,t) = K_λ(s − t) + r_λ(s,t),

where K_λ is a Green's function for the operator λD⁴ + I with periodic boundary conditions. r_λ must have the form

    r_λ(s,t) = Σ_{ν=1}^4 c_ν(t) φ_ν(s),

where φ_ν solves (λD⁴ + I)φ_ν = 0 with the boundary conditions

    φ_ν^(2)(0) = δ_{ν1},  φ_ν^(2)(1) = δ_{ν2},  φ_ν^(3)(0) = δ_{ν3},  φ_ν^(3)(1) = δ_{ν4},

δ being Kronecker's delta. By symmetry, φ₃(s) = φ₁(1 − s) and φ₄(s) = φ₂(1 − s). We note that, given the periodicity of K_λ with respect to (t − s), r_λ simplifies: it has the same form as h_λ in (A.1), although the coefficients a₁(t) - a₄(t) will be different. Each of φ₁ and φ₂ likewise has the general form

    φ(s) = a₁ e^{ρs} cos(ρs) + a₂ e^{ρs} sin(ρs) + a₃ e^{−ρs} sin(ρs) + a₄ e^{−ρs} cos(ρs),

with coefficients that are again ratios of polynomials in the quantities defined above (the common denominators are, respectively, 8e²et − 16 c e et + 8et and 8e² − 16 c e + 8). The explicit expressions are lengthy and may be regenerated with the MACSYMA program at the end of this appendix.
Lemma A.1. For ν = 0, 1 and ρ = 1/(√2 λ^{1/4}) there is a B < ∞ such that

(A.3)    |D_s^ν h_λ(s,t)| ≤ B ρ^{1+ν} e^{−ρ|s−t|}

(A.4)    |r_λ(s,t)| ≤ B ρ [e^{−ρ(s+t)} + e^{−ρ((1−s)+(1−t))}]

for all s, t ∈ [0,1] and ρ > 0.

Proof. First assume that s ≤ t and ν = 0. Note that e21 ≥ B₀ e^{4ρ} e^{ρt} for some B₀ > 0, and by examining the separate terms in the numerator of a₁ we see that it is bounded by B₁ ρ e^{4ρ} for t ∈ [0,1] and some B₁ < ∞. Thus

    |a₁(t)| ≤ B ρ e^{−ρt},  and hence  |a₁(t) e^{ρs} cos(ρs)| ≤ B ρ e^{ρ(s−t)};

repeating these steps for a₄(t) yields a similar bound for |a₄(t) e^{−ρs} cos(ρs)|. Using (A.2), similar analyses can be applied to a₂(t) and a₃(t), and the same bounds may be obtained for the two components of h_λ involving these coefficients. (A.3) now follows for ν = 0 and s ≤ t. When s > t the bound is obtained by substituting 1 − s and 1 − t for s and t; this has the effect of simply interchanging s and t in the exponential. For ν = 1, (A.3) follows in a straightforward manner using the bounds on a₁(t) - a₄(t) described above.

For (A.4), using similar arguments to those given above, it is straightforward but tedious to show that |φ₁(s)| ≤ B ρ^{−2} e^{−ρs} for some B < ∞. By examining the form of K_λ, it is also possible to establish that the coefficient c₁(t) multiplying φ₁ in r_λ satisfies |c₁(t)| ≤ B ρ³ e^{−ρt}, and thus |c₁(t) φ₁(s)| ≤ B² ρ e^{−ρ(s+t)}. Applying this argument to the other terms in r_λ(s,t) yields similar bounds, and (A.4) now follows.
" MACSYMA batch program to compute the greens function for the operator
"$
"
"
"
"
"
"
Lg
=
"$
"$
4
(lambda* D + I ) with boundary conditions:
2
ItS
ItS
ItS
2
D g(O)= D g(l) = 0
and
3
ff
"s
3
D g(O)= D g(l) = 0
"$
"
"
"
"
"
"
In the expressions below 1= 1/ ( sqrt(2) * lambda**(1/4)
The form for the Greens function is
h = al*cs*es + a2*ss*es +a3*ss/es + a4*cs/es
for s<=t
h = bl*cs*es + b2*ss*es +b3*ss/es + b4*cs/es
for s>t
"
"
"
"
"
"
"
It
ItS
ItS
ItS
"$
"$
"$
ItS
"$
"$
"$
"$
where the a's and b's are functions of t and 1 that must be
computed and the other variables are defined in subsl below.
ItS
The a and b coefficents will be found by solving a set of
"$
linear equations prescribed by the continuity of h and its
ItS
derivatives at s=t and the boundary conditions.
"
"****** define general form for Greens function
"$
"$
h : al*cs*es + a2*ss*es +a3*ss/es + a4*cs/es $
subsl : [ss=sin(l*s),cs=cos(l*s) ,es=exp(l*s)]$
subs2: [cos(l*t)=ct,sin(l*t)=st,exp(l*t)=et]$
subs3: [cos(l)=c,sin(l)=s,exp(l)=e]$
" ******* part when s<=t
ha : ev(h,subsl)$
"S
It ******* part when s>t
hb : ev(ha,al=bl,a2=b2,a3=b3,a4=b4)$
"$
" ****** set up system of 4 equations due to the continuity
"
conditions at s=t
"$
"$
differ: ev(hb-ha,s=t)$
cont-cd : [differ=O ,diff(differ,t)=O,
diff( differ,t,2)=O,diff( differ,t,3) =4* (l**4) ]$
" ****** solve for b in terms of a and simplify
"$
bcoef:
linsolve(cont-cd,[bl,b2,b3,b4]) $
bcoef ev(bcoef,subs2)$
bcoef : ratsimp( expand ( ev(bcoef,sU*2= l-cU*2) ) ) $
" ****** system of 4 equations due to boundary condtions at 0 and 1
36
"$
temp: [diff(ha,s,2)=O, diff(ha,s,3)=O]$
temp2: [diff(hb,s,2)=O, diff(hb,s,3)=O]$
nbc: append( ev(temp,s=O), ev(temp2,s=1) )$ .
nbc: ev(nbc, subs3)$
It ****** express equations just in terms of a to give 4 equations in
It ****** 4 unknowns
It$
"$
nbca : ev(nbc,bcoef)$
" ****** solve for a
acoef: combine( expand( linsolve(nbca,[al,a2,a3,a4J) ) ) $
acoef : combine ( expand ( eve acoef, s**2= l-c**2) ) ) $
"$
****** display solution
pieces :pickapart( acoef,3);
"$
It
" ****** Note: hb can be found by the relation: ha(s,t)= hb( I-s,l-t)
37
REFERENCES

Cox, D. D. (1983a). Asymptotics for M-type smoothing splines. Ann. Statist. 11, 530-551.

Cox, D. D. (1983b). Approximation of method of regularization estimators. Technical Report 723, Department of Statistics, University of Wisconsin-Madison.

Cox, D. D. (1984). Gaussian approximation of smoothing splines. Technical Report 743, Department of Statistics, University of Wisconsin-Madison.

Li, K.-C. (1985). From Stein's unbiased risk estimates to the method of generalized cross validation. Ann. Statist. 13, 1352-1377.

Naimark, M. A. (1967). Linear Differential Operators, Part I. Frederick Ungar Publishing Co., New York.

Nychka, D. W. (1986). Manuscript in preparation.

Rice, J. and Rosenblatt, M. (1983). Smoothing splines: regression, derivatives and deconvolution. Ann. Statist. 11, 141-156.

Rice, J. (1984). Bandwidth choice for nonparametric regression. Ann. Statist. 12, 1215-1230.

Selby, S. (ed.) (1979). CRC Standard Mathematical Tables. Chemical Rubber Company, Cleveland, Ohio.

Silverman, B. W. (1984). Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J. R. Statist. Soc. B 47, 1-52.

Speckman, P. (1983). Efficient nonparametric regression with cross validated smoothing splines. Unpublished manuscript.

Wahba, G. (1977). Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14, 651-667.

Wahba, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. R. Statist. Soc. B 40, 364-372.

Wahba, G. (1983). Bayesian "confidence intervals" for the cross-validated smoothing spline. J. R. Statist. Soc. B 45, 133-150.

Dr. Douglas Nychka
Department of Statistics
North Carolina State University
Campus Box 8203
Raleigh, North Carolina 27695-8203