Spatial and Design Adaptation: Adaptive Order
Polynomial Approximation in Function Estimation*

Jianqing Fan†                          Irene Gijbels‡
Department of Statistics               Department of Mathematics
University of North Carolina           Limburgs Universitair Centrum
Chapel Hill, N.C. 27599-3260           B-3590 Diepenbeek, Belgium

September 5, 1992
Abstract
Nonparametric estimation of the mean regression function using local polynomial approximation has been shown to be attractive, from both a theoretical and a practical point of view. An important issue is the choice of the order of the polynomial approximation, as well as the associated smoothing parameter. The approximation order should depend on the local curvature and noise level: overfitting introduces a larger variance and underfitting produces a larger bias. In this paper we introduce a fully adaptive variable order approximation method for estimating the mean regression function and its derivatives. It turns out that the proposed method outperforms all fixed order approximation methods. The procedure possesses the so-called spatial-adaptation property. Moreover, we implement an appealing local variable bandwidth, which leads to an additional design-adaptation property: the resulting estimator automatically adapts to the sparsity of the design points. The bandwidth selection rule here is also innovative. We have experienced outstanding performance of our selection method for both fixed order and variable order approximations. We provide theoretical foundations for the performance of the proposed procedure, and establish empirical evidence for it via simulated examples. Finally we extend the procedure to higher dimensions.
*Supported by NSF grant DMS-8505550 while the authors visited the Mathematical Sciences Research Institute, Berkeley, California.
†Supported by NSF grant DMS-9203135.
‡Senior Research Assistant at the National Fund for Scientific Research (Belgium).
Abbreviated title. FAVORS.
AMS 1991 subject classification. Primary 62G07. Secondary 62E25, 62H99.
Key words and phrases. Adaptive order function approximation; bandwidth selection; design-adaptation; estimation of derivatives; nonparametric regression; spatial-adaptation.
1 Introduction
Let (X_1, Y_1), ..., (X_n, Y_n) be a random sample from a population (X, Y), whose relationship can be modeled by
$$ Y = m(X) + \sigma(X)\varepsilon, \qquad E\varepsilon = 0 \quad\text{and}\quad \mathrm{var}(\varepsilon) = 1, $$
where X and ε are independent. It is common practice to estimate the regression function m(x) = E(Y | X = x) and its derivatives. Among other methods, local polynomial fitting
has been proved useful. The idea was first introduced by Stone (1977), and was studied in
Cleveland (1979), Lejeune (1985), Müller (1987) and Cleveland and Devlin (1988). A thorough investigation of the local linear approximation method is provided in Fan (1992a,b), who establishes the merits of this particular local approximation method. An interesting theoretical study of higher order polynomial approximations is given in Ruppert and Wand (1992).
Fitting polynomials of higher order leads to a possible reduction of the bias, but on
the other hand also to an increase of the variability. Table 1 in Section 2.3 provides a
more precise description on the increase of the variability with the order of approximation.
The table gives an idea of the price which has to be paid for introducing more parameters.
For example, the variability of a local quadratic fit is about twice as much as that of a
local linear fit. Another interesting observation is that odd order approximations are preferable to even order approximations. For example, a local cubic fit outperforms a
local quadratic fit. In fact, a local polynomial fit of order 2q + 1 has the same asymptotic
variance as a fit of order 2q.
Further, there is no direct comparison between a polynomial fit of order 2q - 1 and one of order 2q. For example, a local linear and a local quadratic fit are not directly comparable. These and other important findings are discussed in Section 2. So, a main
concern is how to choose the order of the polynomial approximation. Intuitively it is clear
that in a flat non-sloped region a local constant approximation is preferable, whereas in a
sloped region a local linear fit is recommendable. Local quadratic and cubic approximations
should typically be used at peaks or valleys. Hence, a natural requirement is that the order
of approximation should be adjusted to the curvature of the unknown regression function.
In this paper we provide such a data-driven order approximation method, leading to the
Fully Adaptive Variable Order Regression Smoother (FAVORS). The proposed procedure is
easy to implement, and theoretically relies on simple least squares regression considerations.
Simulation results strongly support our findings and hence provide empirical evidence.
We illustrate the above points, via a simulated example from model (3.1), with a uniform design density. Figure 1.1 presents the true regression curve together with a typical
simulated data set. It is certainly desirable to use a local linear method in both ends and
local cubic/quadratic fits for the middle part of the curve. We implemented fixed order
approximations together with a variable order approximation (to be introduced in Section
2.5) for three different smoothing parameters in (2.18): k = 10, 20, and an automatic
choice of k (to be introduced in Section 2.6). Figures 1.2 - 1.4 show the pointwise mean
absolute deviation (standardized so that FAVORS has risk 1.0) at each location for different
order approximations, based on 400 simulations. As expected, the local quadratic and cubic
fits are about 1.7 times as variable as the local linear fit at both ends (where all fits are
asymptotically unbiased). The adaptive order approximation chooses the 10Gallinear fit at
these locations. In the middle part, the local linear and constant fits have a larger bias,
causing a larger estimation error; there the adaptive procedure chooses the local quadratic
and cubic method to adapt to the curvature. We also would like to mention that even if the bandwidth is increased by a factor of two, FAVORS continues to perform well in comparison with all fixed order approximations. In other words, FAVORS is less sensitive to the choice
of the bandwidth. This highlights another nice feature of FAVORS, which can be explained
as follows. When the bandwidth is too large for a local linear fit, the adaptive method tends
to choose a higher order approximation; when the bandwidth is too small for a local cubic
fit, the procedure opts for a lower order approximation. This 'robustness' to the choice of
the smoothing parameter outlines another superiority of the proposed method.
[Put Figures 1.1 - 1.4 about here]
Another decisive aspect in the estimation procedure is how to choose the neighborhood,
and hence the effective number of data points used. This issue is addressed by introducing
and discussing an appealing variable bandwidth, which automatically adapts to the design
of the data points and which moreover possesses a large flexibility to change the degree
of adaptation. Using this concept of variable bandwidth in combination with the proposed
variable order approximation method leads to an outstanding procedure that is highlighted
by two major properties: spatial-adaptation and design-adaptation. The smoothing parameter is estimated from an approximation point of view, and is expected to be semiparametrically
efficient. Essentially, the idea consists of providing a good finite sample estimate for the
Average Mean Squared Error (AMSE). The intuition in this estimation step relies on the interplay between least squares theory and function approximation considerations. These new
developments enable us to easily evaluate the approximation error (bias) and the variability of
the local polynomial approximation method, and hence to estimate its mean squared error.
Moreover, they open a gateway to deeper insight into local polynomial approximation. See
Section 2.2 for details. The performance of this automatic selection procedure is illustrated via
simulated examples and the motorcycle data.
The proposed estimation procedure can be extended to the multivariate setting in a
straightforward way. However, the "curse of dimensionality" makes higher order approximation impractical due to lacking of effective local data points and a considerable increase
of the number of parameters. For example, in the two-dimensional case, the local cubic fit already involves the estimation of 10 parameters. A more important issue in multivariate
settings is how to choose appropriate amounts of smoothing in the various directions. In
addressing this question we focus our attention on local linear approximation, keeping the
number of parameters in the model on a moderate side.
The organization of the paper is as follows.
In the next section we introduce and
explain in detail each step of the Fully Adaptive Variable Order Regression Smoother. We
propose methods for assessing MSE, choosing the smoothing parameter and the order of
approximation. The theoretical findings in Section 2 are supported by empirical evidence
which is summarized in the results of four simulated examples, presented in Section 3. The
implementation of the methodology in higher dimensions is addressed in Section 4. Some
concluding remarks in Section 5 end this exposition.
2 Fully Adaptive Variable Order Approximation
We focus on 1-dimensional nonparametric regression to introduce the basic ideas of the methodology. For convenience of notation, we order the data according to the X's, i.e. X_1 ≤ ... ≤ X_n, throughout this section. In the sequel, we explain step by step the concept of the Fully Adaptive Variable Order Regression Smoother (FAVORS).
2.1 Local polynomial approximation
Suppose that we are interested in estimating the regression function at a point x_0. If the (r + 1)th derivative of m(x) at the point x_0 exists, we can approximate m(x) locally by a polynomial of order r:
$$ m(x) \approx m(x_0) + m'(x_0)(x - x_0) + \cdots + m^{(r)}(x_0)(x - x_0)^r / r!, \qquad (2.1) $$
for x in a neighborhood of x_0. This suggests carrying through a local polynomial regression: minimize
$$ \sum_{i=1}^{n} \Big\{ Y_i - \sum_{j=0}^{r} \beta_j (X_i - x_0)^j \Big\}^2 K\Big(\frac{X_i - x_0}{h_n}\Big), \qquad (2.2) $$
where K(·) denotes a nonnegative weight function and h_n, a smoothing parameter, determines the size of the neighborhood of x_0. Let {β̂_j} denote the solution to the weighted least squares problem (2.2). It is clear from (2.1) that j! β̂_j estimates m^(j)(x_0), j = 0, ..., r.
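To make the fitting step concrete, the following is a minimal Python sketch (not from the paper) of the weighted least squares problem (2.2); the function name, the Epanechnikov weight and the illustrative data are our own choices.

```python
import numpy as np

def local_poly_fit(x, y, x0, h, r=1):
    """Solve the weighted least squares problem (2.2) around x0 and return
    (beta_0, ..., beta_r); j! * beta_j then estimates m^(j)(x0)."""
    u = (x - x0) / h
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)   # Epanechnikov weights
    X = np.vander(x - x0, N=r + 1, increasing=True)        # columns (X_i - x0)^j
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# toy usage on data resembling model (3.1)
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = x + 2 * np.exp(-16 * x**2) + rng.normal(0, 0.5, 200)
beta = local_poly_fit(x, y, x0=0.5, h=0.3, r=1)
print("m(0.5) estimate:", beta[0], " m'(0.5) estimate:", beta[1])
```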
With r = 0, the above method leads to the Nadaraya-Watson (1964) estimator, but now motivated from a functional approximation point of view. It has been shown that this local constant approximation method is insufficient from both theoretical and practical considerations: the bias of the estimator is large and boundary effects are largely notable (see
Fan (1992b)). A local linear approximation repairs those drawbacks, as was investigated by Fan (1992a) and Fan and Gijbels (1992a). Advantages of such a local linear approximation method are, among others, adaptation to various types of designs: random and fixed; highly clustered and nearly uniform; and absence of boundary effects. The above local approximation idea was introduced by Stone (1977) and Cleveland (1979), among others. It is only recently, however, that the merits of this estimation method have been pronounced. A local linear approximation might not be sufficient at a location near a peak or valley.
There a higher order approximation might be preferable. Cleveland and Devlin (1988)
investigated the local quadratic approximation method. Ruppert and Wand (1992) further
extend some of the results of Fan (1992a,b) and Fan and Gijbels (1992a) from local linear
approximation to higher order approximation and to the multivariate case.
2.2 Assessing bias and variance
The main purpose of this section is to give finite sample bias and variance expressions for
the local polynomial approximation method. These expressions will be used later on to
select the order of the approximation and the amount of smoothing.
Let W be a diagonal matrix with entries K((X_l - x_0)/h_n). Denote by X the design matrix whose (l, j)th element is (X_l - x_0)^{j-1}, and let y = (Y_1, ..., Y_n)^T. Then, the weighted least squares problem (2.2) can be written in matrix form as: minimize
$$ (y - X\beta)^T W (y - X\beta), $$
where β = (β_0, ..., β_r)^T. Ordinary least squares theory provides the solution
$$ \hat\beta = (X^T W X)^{-1} X^T W y, \qquad (2.3) $$
whose conditional bias and variance are:
$$ \mathrm{bias}(\hat\beta \mid X_1, \ldots, X_n) = (X^T W X)^{-1} X^T W r, \qquad \mathrm{var}(\hat\beta \mid X_1, \ldots, X_n) = (X^T W X)^{-1} (X^T \Sigma X) (X^T W X)^{-1}, \qquad (2.4) $$
where m = (m(X_1), ..., m(X_n))^T, r = m - Xβ and Σ = diag(K²((X_i - x_0)/h_n) σ²(X_i)).
Since the approximation (2.1) is local, it follows that the ith element of r is approximately
$$ r_i \approx \beta_{r+1}(X_i - x_0)^{r+1} + \cdots + \beta_{r+4}(X_i - x_0)^{r+4}, \qquad (2.5) $$
where β_j = m^(j)(x_0)/j!. The approximation order given in (2.5) is sufficient, as is inspired by the root-bandwidth selection method (see for example Hall, Sheather, Jones and Marron (1991)). Indeed, in the case of estimating m^(d)(x_0) with r - d odd, the approximation in (2.5) up to the (r + 3)rd term guarantees an efficient procedure. However, the finite sample bias would be underestimated, especially in the case of a higher order fit. To elaborate this point, let us consider for example a local cubic fit. Then, the leading term in (2.5), {(X_i - x_0)^4}, is highly correlated with the column {(X_i - x_0)^2} in the design matrix. Furthermore, the second term in (2.5) has a high correlation with the column {(X_i - x_0)^3} in the design matrix. Therefore, direct substitution of r = (r_i) into (2.4) would lead to a bias expression
of the form
$$ (X^T W X)^{-1} X^T W r \approx O\Big( h_n^{2} (X^T W X)^{-1} X^T W \{(X_i - x_0)^2\} + h_n^{3} (X^T W X)^{-1} X^T W \{(X_i - x_0)^3\} \Big) + \cdots = O\big( h_n^{2} (0, 0, 1, 0)^T + h_n^{3} (0, 0, 0, 1)^T \big) + \cdots. $$
Hence, the first two main terms in (2.5) do not contribute much in estimating the bias of the regression coefficients β_0 and β_1. To improve the finite sample performance, we remedy this collinearity problem as follows. Let
$$ s_{n,j} = \sum_{i=1}^{n} (X_i - x_0)^j K\Big(\frac{X_i - x_0}{h_n}\Big). \qquad (2.6) $$
Then, the approximated bias is
$$ \widehat{\mathrm{bias}} \approx S_n^{-1} \Big( \beta_{r+1} s_{n,r+1} + \cdots + \beta_{r+4} s_{n,r+4}, \;\ldots\;, \beta_{r+1} s_{n,2r+1} + \cdots + \beta_{r+4} s_{n,2r+4} \Big)^T, \qquad (2.7) $$
where S_n = X^T W X is the (r + 1) × (r + 1) matrix whose (i, j)th element is s_{n,i+j-2}. To remove the collinearity effect, without influencing the asymptotic performance, the higher order terms s_{n,r+a+1}, ..., s_{n,2r+4} in the (r + 1) × 1 vector in (2.7) are replaced by zero, where
$$ a = \begin{cases} 3, & \text{if } (r - d) \text{ is odd} \\ 4, & \text{if } (r - d) \text{ is even.} \end{cases} \qquad (2.8) $$
In the case that r = 1, d = 0, we recommend taking a = 4. In other words, the collinearity problem is handled by setting some appropriate higher order terms in (X_i - x_0)^j equal to zero. Clearly, such an operation has no effect on the asymptotic properties. Our empirical experience shows an important gain in finite sample performance.
Using the local homoscedasticity σ²(X_i) ≈ σ²(x_0), the variance in (2.4) can be approximated by
$$ \mathrm{var}(\hat\beta \mid X_1, \ldots, X_n) \approx \sigma^2(x_0)\, S_n^{-1} \tilde S_n S_n^{-1}, \qquad (2.9) $$
where \tilde S_n denotes the (r + 1) × (r + 1) matrix whose (i, j)th element is \tilde s_{n,i+j-2}, with \tilde s_{n,j} = \sum_{i=1}^{n} (X_i - x_0)^j K^2((X_i - x_0)/h_n). Remark that β_{r+1}, ..., β_{r+a} and σ²(x_0) can easily be estimated by the least squares method. More precisely, natural estimates for β_{r+1}, ..., β_{r+a} are simply the estimated regression coefficients from fitting an (r + a)th order polynomial locally, and an intuitive estimate σ̂²(x_0) for σ²(x_0) is the weighted residual sum of squares of this fit, given in (2.10). In case the homoscedastic linear model holds true, this estimator of the variance is indeed unbiased. Substituting the estimates for β_{r+1}, ..., β_{r+a} and σ²(x_0) into (2.7) and (2.9), we obtain from (2.4) the following estimated bias and variance for β̂:
$$ \hat b = S_n^{-1} \Big( \hat\beta_{r+1} s_{n,r+1} + \cdots + \hat\beta_{r+4} s_{n,r+4}, \;\ldots\;, \hat\beta_{r+1} s_{n,2r+1} + \cdots + \hat\beta_{r+4} s_{n,2r+4} \Big)^T, \qquad \hat V = \hat\sigma^2(x_0)\, S_n^{-1} \tilde S_n S_n^{-1}, \qquad (2.11) $$
where, as explained above, s_{n,r+a+1} = \cdots = s_{n,2r+4} = 0.
This approach provides an effective and easily-applicable method for estimating the finite
sample bias and variance.
We remark that (2.11) gives the estimated bias and variance not only for m̂(x_0) but also for m̂^(d)(x_0) = d! β̂_d. For example, the estimated bias for m̂^(d)(x_0) is the (d + 1)th element of the first expression in (2.11) multiplied by d!. Its estimated variance is given by the (d + 1)th diagonal element of the second expression in (2.11) times (d!)².
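As an illustration of how (2.6)-(2.11) fit together, here is a hedged Python sketch. The helper names are ours, a single bandwidth is used for both the auxiliary and the main fit, and the simple normalisation of the weighted residual sum of squares merely stands in for the unspecified (2.10).

```python
import numpy as np
from math import factorial

def epan(u):
    """Epanechnikov kernel."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def estimated_bias_var(x, y, x0, h, r, d=0):
    """Estimate bias and variance of the order-r local fit of m^(d)(x0),
    in the spirit of (2.6)-(2.11); see the lead-in for the simplifications."""
    a = 3 if (r - d) % 2 == 1 else 4
    u = x - x0
    K = epan(u / h)

    # auxiliary fit of order r + a gives beta_{r+1}, ..., beta_{r+a} and sigma^2(x0)
    Xa = np.vander(u, N=r + a + 1, increasing=True)
    Wa = np.diag(K)
    beta_aux = np.linalg.solve(Xa.T @ Wa @ Xa, Xa.T @ Wa @ y)
    resid = y - Xa @ beta_aux
    sigma2 = np.sum(K * resid**2) / np.sum(K)   # stands in for the unspecified (2.10)

    # s_{n,j} of (2.6) and its kernel-squared analogue, j = 0, ..., 2r + 4
    s = np.array([np.sum(u**j * K) for j in range(2 * r + 5)])
    st = np.array([np.sum(u**j * K**2) for j in range(2 * r + 5)])
    s_trunc = s.copy()
    s_trunc[r + a + 1:] = 0.0                   # collinearity fix (2.8)

    Sn = np.array([[s[i + j] for j in range(r + 1)] for i in range(r + 1)])
    Snt = np.array([[st[i + j] for j in range(r + 1)] for i in range(r + 1)])

    # bias vector as in (2.7)/(2.11), variance matrix as in (2.9)/(2.11)
    rhs = np.array([sum(beta_aux[r + m] * s_trunc[i + r + m]
                        for m in range(1, 5) if r + m < len(beta_aux))
                    for i in range(r + 1)])
    bias_vec = np.linalg.solve(Sn, rhs)
    var_mat = sigma2 * np.linalg.inv(Sn) @ Snt @ np.linalg.inv(Sn)

    # d! * beta_d estimates m^(d)(x0), so scale accordingly
    return factorial(d) * bias_vec[d], factorial(d)**2 * var_mat[d, d]
```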
2.3 Even versus odd order approximation
In the sequel we study in detail even versus odd order polynomial approximations. We argue in favor of odd order approximations: local linear is preferable to local constant approximation, local cubic outperforms local quadratic approximation, and so on. As a consequence, if the interest is in estimating m(x_0), only odd order polynomial approximations should be emphasized in practical implementations.
To provide evidence in favor of approximations of odd order r = 2q + 1 over approximations of even order r = 2q, we consider the asymptotic expressions for the bias and variance of m̂(x_0). It was shown by Ruppert and Wand (1992) that for a nonrandom bandwidth h_n, under certain conditions, the asymptotic bias and variance for an odd order (2q + 1) polynomial approximation are of the form
$$ \mathrm{bias} \approx C_q\, m^{(2q+2)}(x_0)\, h_n^{2q+2}, \qquad \mathrm{var} \approx \frac{D_q\, \sigma^2(x_0)}{n h_n f_X(x_0)}, \qquad (2.12) $$
where C_q and D_q are constants depending on q and the kernel function K, and f_X(·) denotes the marginal density of X, i.e. the design density. On the other hand, the asymptotic bias for an even order 2q polynomial approximation involves the design-dependent factor m^(2q+1)(x_0) f_X'(x_0)/f_X(x_0), while its asymptotic variance is the same as that of the order 2q + 1 fit in (2.12).
The constant factors in the asymptotic bias and variance can be calculated directly from least squares theory. Indeed, for example, it is easy to see from (2.9) that
$$ \mathrm{var}(\hat\beta \mid X_1, \ldots, X_n) \approx H A^{-1} B A^{-1} H\, \frac{\sigma^2(x_0)}{n h_n f_X(x_0)}, \qquad (2.13) $$
where H = diag(1, h_n^{-1}, ..., h_n^{-r}), and A and B are (r + 1) × (r + 1) matrices whose (i, j)th elements are μ_{i+j-2} and ν_{i+j-2}, respectively, with μ_j = ∫ u^j K(u) du and ν_j = ∫ u^j K²(u) du. The asymptotic variance of m̂(x_0) is the (1, 1)th element of the matrix on the right-hand side of (2.13). For example, for a local cubic fit (q = 1) we find the constants C_1 and D_1 given in (2.14). In contrast, a local linear and a local constant fit involve the constant factors C_0 = ½ μ_2 and D_0 = ν_0; for a standard normal kernel these constants can be evaluated explicitly.
For the symmetric beta kernel
$$ K(u) = \frac{(1 - u^2)^{\gamma}}{\mathrm{Beta}(1/2, \gamma + 1)}, \qquad |u| \le 1, \qquad (2.15) $$
we have
$$ \mu_{2j} = \frac{\mathrm{Beta}(j + 1/2, \gamma + 1)}{\mathrm{Beta}(1/2, \gamma + 1)} \qquad\text{and}\qquad \nu_{2j} = \frac{\mathrm{Beta}(j + 1/2, 2\gamma + 1)}{[\mathrm{Beta}(1/2, \gamma + 1)]^2}. $$
Note that the choices γ = 0, 1, 2, and 3 lead, respectively, to the uniform, the Epanechnikov, the Biweight and the Triweight kernel function. The following table shows how much the variance increases with the order of the approximation, relative to the variance of the Nadaraya-Watson estimator (local constant fit). The calculations can be carried through easily by using (2.13).
Table 1. Increase of the variability with the order of the polynomial approximation

Order of approximation    Normal   Uniform   Epanechnikov   Biweight   Triweight
 1 = Local Linear         1        1         1              1          1
 2 = Local Quadratic      1.6876   2.2500    2.0833         1.9703     1.9059
 3 = Local Cubic          1.6876   2.2500    2.0833         1.9703     1.9059
 4                        2.2152   3.5156    3.1550         2.8997     2.7499
 5                        2.2152   3.5156    3.1550         2.8997     2.7499
 6                        2.6762   4.7852    4.2222         3.8133     3.5689
 7                        2.6762   4.7852    4.2222         3.8133     3.5689
 8                        3.1224   6.0562    5.2872         4.7193     4.3753
 9                        3.1224   6.0562    5.2872         4.7193     4.3753
10                        3.5704   7.3281    6.3509         5.6210     5.1744
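Entries of this kind can be reproduced numerically from (2.13). The sketch below (with the use of scipy for the moment integrals being our own implementation choice) computes the first diagonal element of A^{-1} B A^{-1} relative to ν_0.

```python
import numpy as np
from scipy import integrate

def variance_ratio(kernel, r, support=(-1.0, 1.0)):
    """First diagonal element of A^{-1} B A^{-1} from (2.13), divided by nu_0,
    i.e. the asymptotic variance of an order-r fit relative to a local constant fit."""
    mu = lambda j: integrate.quad(lambda u: u**j * kernel(u), *support)[0]
    nu = lambda j: integrate.quad(lambda u: u**j * kernel(u)**2, *support)[0]
    A = np.array([[mu(i + j) for j in range(r + 1)] for i in range(r + 1)])
    B = np.array([[nu(i + j) for j in range(r + 1)] for i in range(r + 1)])
    Ainv = np.linalg.inv(A)
    return (Ainv @ B @ Ainv)[0, 0] / nu(0)

epanechnikov = lambda u: 0.75 * (1 - u**2) * (abs(u) <= 1)
gaussian = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

for r in range(1, 5):
    print(r, round(variance_ratio(epanechnikov, r), 4),
          round(variance_ratio(gaussian, r, support=(-np.inf, np.inf)), 4))
# orders 1-4 should reproduce the Epanechnikov and Normal columns of Table 1
```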
Note that there is no loss in terms of asymptotic variance by doing a local linear instead of
a local constant fit. This remark applies to the comparison of any even order approximation
with its consecutive odd order approximation. However, the asymptotic variance increases
when moving from an odd order approximation to its consecutive even order approximation.
For example, in case of a Gaussian kernel, the variance increases by a factor of 1.6876 when
a local quadratic instead of a local linear approximation is used. The increase in variability
is more pronounced for the four other kernels: the smoother the kernel the less the increase
in variability.
The above results are an extension of a result of Fan (1992b), who establishes the asymptotic bias and variance expressions for the local linear approximation method (q = 0). Moreover, he provides extensive evidence in favor of the local linear approximation method over the local constant fit. A decision theoretical consideration suggests that the local constant fit has very low efficiency, partially caused by the appearance of the factor m^(2q+1)(x_0) f_X'(x_0)/f_X(x_0) in the asymptotic bias. This annoying factor creates a large bias whenever m^(2q+1)(x_0) is large. Furthermore, the even order 2q polynomial fit does not reduce the asymptotic variance in comparison with a polynomial fit of order 2q + 1. Therefore, the even order approximation can only be used when m^(2q+1)(x_0) is small. For example, the local constant fit can only perform well when the true regression curve is flat in the neighborhood of x_0. In that case, the local linear fit has approximately the same bias and variance as the local constant fit. However, the former definitely outperforms the latter when the regression curve is sloped. In any case, the local linear fit is preferable. Design-adaptation and boundary-effect considerations give even more favor to the local linear fit (see Fan and Gijbels (1992a)). The preceding intuitions and basic arguments apply also to the comparison of any consecutive pair of order 2q and 2q + 1 polynomial approximations. We remark that there is no direct comparison among odd order approximations (such as the local linear and local cubic fit). Such a comparison depends on the curvature of the unknown regression function around x_0. Therefore, a variable order approximation procedure is desirable. This issue will be discussed in Section 2.5.
The preceding arguments are further supported by the following empirical evidence. We simulate from the model
$$ Y = 3 + cX^2 + \varepsilon, \qquad X \sim N(0, 1), \qquad \varepsilon \sim N(0, 0.5^2). $$
We use the Epanechnikov kernel, and for the smoothing parameter h_n we take the variable bandwidth (2.18). The tuning parameter is chosen to be k = [0.3 n^{0.8}], where [z] denotes the integer part of z. In the simulation, we take c = 0 and 2, corresponding to a flat and a quadratic curve. The following table reports the Mean Average Squared Error (MASE) based on 400 simulations. For each simulation, we calculated the Average Squared Error (ASE) as follows:
$$ \mathrm{ASE} = \frac{1}{101} \sum_{i=0}^{100} \big( \hat m(x_i) - m(x_i) \big)^2, \qquad (2.16) $$
and MASE is the average of the 400 different ASE values, where the grid points are x_i = -1.8 + 0.036 i. Further, the contributions of the bias and of the variance to the mean average squared error are given by:
$$ (\mathrm{AB})^2 = \frac{1}{101} \sum_{i=0}^{100} \Big\{ \frac{1}{400} \sum_{j=1}^{400} \hat m_j(x_i) - m(x_i) \Big\}^2, \qquad (\mathrm{ASD})^2 = \mathrm{MASE} - (\mathrm{AB})^2. \qquad (2.17) $$
This Average Bias (AB) and the Average Standard Deviation (ASD) are listed in the table.
The MASE can easily be computed via (2.17). Further, the table provides conditional
(simulating a set of design points once and keeping them fixed across simulations) and
unconditional (design points change across simulations) results.
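For completeness, here is a small Monte Carlo sketch of the error measures (2.16)-(2.17); the smoother `fit` is left abstract, and the design and noise follow the simulation model of this subsection.

```python
import numpy as np

def mc_error_measures(fit, m, n=200, nsim=400, seed=1):
    """Monte Carlo approximation of MASE, AB and ASD as in (2.16)-(2.17),
    redrawing the design each run (the 'unconditional' setting)."""
    rng = np.random.default_rng(seed)
    grid = -1.8 + 0.036 * np.arange(101)
    mhat = np.empty((nsim, grid.size))
    for j in range(nsim):
        x = rng.normal(0, 1, n)
        y = m(x) + rng.normal(0, 0.5, n)
        mhat[j] = fit(x, y, grid)                       # any smoother evaluated on the grid
    ase = np.mean((mhat - m(grid))**2, axis=1)          # (2.16), one value per run
    mase = ase.mean()
    ab2 = np.mean((mhat.mean(axis=0) - m(grid))**2)     # (AB)^2 from (2.17)
    asd2 = mase - ab2                                   # (ASD)^2 from (2.17)
    return mase, np.sqrt(ab2), np.sqrt(asd2)
```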
For the constant model c = 0, all local fits in Table 2 are unbiased, and hence the
conditional and unconditional results are similar. However, it is clear that local quadratic
and cubic approximations overfit the model, causing more variability. When k is moderate,
such as k
= 20 and 36 in Table 2, the local linear fit is very close to the local constant fit,
as justified by the asymptotic theory.
Table 2. Comparison of odd and even order polynomial approximations

                       Constant model c = 0                    Quadratic model c = 2
                   Cond. results      Uncond. results      Cond. results      Uncond. results
n, k      order    AB      ASD        AB      ASD          AB      ASD        AB      ASD
n = 200     0      0.0052  0.0958     0.0047  0.0936       0.2527  0.0958     0.2125  0.1798
k = 20      1      0.0055  0.1028     0.0049  0.0994       0.0770  0.1028     0.0738  0.1017
            2      0.0087  0.1406     0.0068  0.1394       0.0087  0.1406     0.0068  0.1394
            3      0.0094  0.1509     0.0073  0.1561       0.0094  0.1509     0.0073  0.1561
n = 400     0      0.0039  0.0697     0.0038  0.0687       0.2396  0.0697     0.1969  0.1251
k = 36      1      0.0040  0.0741     0.0037  0.0724       0.0582  0.0741     0.0675  0.0740
            2      0.0060  0.1029     0.0062  0.1019       0.0060  0.1029     0.0062  0.1019
            3      0.0060  0.1077     0.0062  0.1058       0.0060  0.1077     0.0062  0.1058
For the quadratic model, with c = 2, the local quadratic and cubic fits are unbiased,
whereas the local constant and linear fits are biased. As a consequence, this nonnegligible
bias causes the larger variability in the unconditional case. In the conditional case however,
this effect does not exist and hence, as can be seen from the table, the variance increases
with the order as already expected from the asymptotics. Further, the local constant fit
performs very poorly: the bias is considerably large due to its dependence on m'( x) = 4x.
For the presented moderate values of k the performance of the local quadratic and cubic
fits is very similar, which is again an illustration of the asymptotic findings.
2.4 Adaptation to Design Sparsity
The kernel method described in Section 2.1 uses the same amount of smoothing at each location. It ignores the information given by the sparsity of the data, and a serious criticism is that it is not able to adapt to the design density. Silverman (1992) uses such an adaptation argument to argue in favor of the smoothing spline technique, where a variable bandwidth (proportional to the (-1/4)th power of the design density) is inherent to the method (see Silverman (1984)). A similar adaptation is obtained by introducing the variable smoothing parameter
$$ h_k(x_0) = \frac{X_{l+k} - X_{l-k}}{2}, \qquad (2.18) $$
where l is the index of the design point X_l closest to x_0, and k is a factor that determines the number of local data points. (Recall that the X_i's have been ordered.) As intuitively hoped, it uses a larger amount of smoothing in sparse regions and a smaller amount in
dense regions. The justification for this statement is given in Fan and Gijbels (1992b), who show that
$$ h_{k_n}(x_0) = \frac{k_n}{n f_X(x_0)} (1 + o_p(1)), \qquad (2.19) $$
whenever k_n → ∞ such that k_n/n → 0. Hence, as in smoothing splines, the variable bandwidth adapts automatically to the design of the data points. Moreover, this type of variable bandwidth is very appealing since it introduces extra flexibility for changing the degree of adaptation to the design points. For example, h_{k_n}^{1/4}(x_0) is equivalent to a smoothing spline estimator, and h_{k_n}(x_0) itself corresponds to a nearest neighbor type of
smoothing spline estimator, and hkn (xo) itself corresponds with a nearest neighbor type of
estimator. A special feature of the variable bandwidth in (2.18) is that it adapts to the
sparsity of the data points in a very natural and easily-implemented way. This natural
discretization is well in contrast with the continuous smoothing parameters appearing in
smoothing splines as well as in the usual kernel regression estimators, where extra numerical
efforts are needed in order to find an optimal bandwidth which minimizes some object
function such as a cross-validation curve. The advantage of the variable bandwidth in (2.18)
over a nearest-neighborhood type of bandwidth is that it usually produces a smoother curve
and is easier to deal with mathematically. Combination of this variable bandwidth with
the local polynomial regression method enhances the flexibility and interpretability of the
resulting estimators.
When x_0 is near the boundary, the convention for the variable bandwidth (2.18) is as follows. If l - k ≤ 0 or X_{l-k} ≤ c_left, then X_{l-k} is taken to be c_left, where c_left is a prescribed left limiting point. A similar convention, involving c_right, is used at the right boundary. This convention prevents having too few data points near the boundary, which occurs in certain design situations. With the above convention, the bandwidth (2.18) is "asymmetric" at the boundary, and can be regarded as an attempt towards extrapolation. We finally remark that with the smoothing parameter (2.18), the local polynomial approximation method is equivariant under linear transformations of the data.
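A direct transcription of (2.18) with the boundary convention just described might look as follows; the defaults for c_left and c_right are our own assumption, since the paper only says they are prescribed limiting points.

```python
import numpy as np

def variable_bandwidth(x_sorted, x0, k, c_left=None, c_right=None):
    """Variable bandwidth (2.18): half the distance between the k-th ordered design
    points to the left and right of the point closest to x0, with the boundary
    convention of Section 2.4 (range defaults for c_left/c_right are an assumption)."""
    n = len(x_sorted)
    c_left = x_sorted[0] if c_left is None else c_left
    c_right = x_sorted[-1] if c_right is None else c_right
    l = int(np.argmin(np.abs(x_sorted - x0)))        # index of the closest design point
    lo = c_left if l - k < 0 else max(x_sorted[l - k], c_left)
    hi = c_right if l + k >= n else min(x_sorted[l + k], c_right)
    return (hi - lo) / 2.0
```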
2.5 Variable order approximation
The intuition behind the idea of variable order approximation is as follows. Higher order
approximation gives a smaller bias, but results in a larger variance for the estimated parameters in (2.1), as discussed in Section 2.3. Therefore, it is desirable to change the degree
of approximation according to the curvature of the regression function. For instance, at
a flat region, a local constant fit would be a sufficient approximation. By using this local
constant approximation, we indeed improve upon the stability of the estimated coefficients.
At a sloped region, a local linear approximation is preferable and at peaks or valleys, local
quadratic or cubic approximations are recommendable. In other words, one has to adapt
the order of the approximation to the curvature of the regression function. This kind of spatial-adaptation is also aimed at by wavelet methods such as in Donoho (1992). This idea, equipped with the variable bandwidth (2.18), results in a design-adaptive and spatial-adaptive procedure. As explained in Section 2.3, local constant and quadratic fits do not have to be considered when estimating m(·), since those fits are outperformed by local linear and cubic fits. Nevertheless, we still include the local quadratic fit in the following algorithm.
For a given point xo, the basic idea for selecting an order of polynomial approximation
is to choose that particular order which has the smallest estimated mean squared error.
The estimated mean squared error can be derived from (2.11). Suppose that the goal
is to estimate m(d)(xo), the dth order derivative of m in xo, and that we are interested
only in polynomial approximations up to order R. For a given point Xo and a given tuning
parameter k, the algorithm of adaptive variable order approximation consists of the following
steps:
Step 1. Fit a local polynomial (2.1) of order R + a, with a as in (2.8), in order to obtain the estimated regression coefficients and σ̂²(x_0). In this step, we use the inflated bandwidth 2.5 h_k(x_0).

Step 2. For each r (d ≤ r ≤ R), compute the estimated MSE for the rth order polynomial fit from (2.11).

Step 3. For each order r, obtain MSE_r(x_i) (1 ≤ i ≤ n_grid) for all grid points, and calculate their smoothed values by taking a weighted local average of the neighboring 2[n_grid k/n] + 1 values.

Step 4. Choose the order r̂_{x_0} which has the smallest estimated smoothed MSE, and use an r̂_{x_0}th order polynomial approximation to estimate m^(d)(x_0).
We first of all remark that the particular choice of the inflation factor 2.5 in Step 1 is
of minor importance. Further, Step 3 in the above algorithm may be ignored. However, we
experienced that this smoothing step produces a slight improvement on the results.
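Steps 1, 2 and 4 can be sketched as follows, reusing the `estimated_bias_var` helper from the sketch in Section 2.2. This simplified variant refits the auxiliary coefficients for every candidate order, and omits the inflation factor 2.5 of Step 1 as well as the MSE smoothing of Step 3.

```python
def favors_order(x, y, x0, h, R=3, d=0):
    """Pick the order r (d <= r <= R) with the smallest estimated MSE at x0."""
    best_r, best_mse = d, float("inf")
    for r in range(d, R + 1):
        bias, var = estimated_bias_var(x, y, x0, h, r, d)
        mse = bias**2 + var
        if mse < best_mse:
            best_r, best_mse = r, mse
    return best_r, best_mse
```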
Remarks on Computational Aspects. First of all, note that the design matrices in Step 2 are submatrices of the design matrix in Step 1. Hence, it suffices to compute the latter. Secondly, the estimated bias b̂_r(x_0) can be computed as a by-product of computing β̂_r(x_0), since both involve (X^T W X)^{-1}. Further, the design matrix for order r can be obtained by deleting the last column of the design matrix for order r + 1. Hence, a stepwise deletion algorithm can be used to invert the matrices. The above algorithm is computationally fast if all these structural features are fully exploited.

Suppose interest is focused on estimating the regression function. Then, from an applied point of view, an approximation order up to cubic (R = 3) turns out to be flexible enough for moderate sample sizes. It should be mentioned that higher order approximations can possibly result in an ill-conditioned matrix inversion, causing numerical instability.
2.6 Selecting the smoothing parameter
As discussed in Section 2.4, the number of effective local data points is determined by the parameter k (see (2.18)). A way to determine this parameter is to choose k such that it minimizes the estimated average mean squared error:
$$ \hat k = \mathop{\mathrm{argmin}}_{k_0 < k < [n/2]} \; \frac{1}{n_{\mathrm{grid}}} \sum_{i=1}^{n_{\mathrm{grid}}} \widehat{\mathrm{MSE}}(x_i; k), \qquad (2.20) $$
where k_0 is some initial integer value. This provides a rule for selecting the smoothing parameter k, which can also be applied for any fixed order approximation. To summarize
the method, let us consider a local rth order fit for estimating m^(d)(·). For this, the algorithm reads as follows:

Step 1. For a given k and a grid point x_i, fit a local polynomial (2.2) of order r + a, with a as in (2.8), in order to obtain β̂_{r+1}, ..., β̂_{r+a} and σ̂²(x_i). Here, the inflated bandwidth 2.5 h_k(x_i) is used.

Step 2. Calculate MSE_r(x_i; k) as in (2.11) for each grid point, and choose k̂ as in (2.20).

Step 3. Fit a local rth order polynomial using k̂ and take its estimated (d + 1)st coefficient multiplied by d! as the estimated dth-derivative curve.
Remark that the parameter k is discrete. Therefore, the above minimization problem is very easy to solve. Also, when the sample size is large, one can further restrict the domain of the minimization problem to a coarser set of candidate values, determined by a factor κ > 1 and an integer i_0, reducing the computation significantly.
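The selection rule (2.20) then reduces to a discrete search. A sketch, again reusing the earlier helpers (`variable_bandwidth`, `estimated_bias_var`) and ignoring the bandwidth inflation of Step 1 for brevity:

```python
import numpy as np

def select_k(x, y, grid, k_candidates, r=3, d=0):
    """Choose k minimising the average estimated MSE over the grid points, as in (2.20)."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_k, best_amse = None, np.inf
    for k in k_candidates:
        mse_sum = 0.0
        for x0 in grid:
            h = variable_bandwidth(xs, x0, k)
            bias, var = estimated_bias_var(xs, ys, x0, h, r, d)
            mse_sum += bias**2 + var
        amse = mse_sum / len(grid)
        if amse < best_amse:
            best_k, best_amse = k, amse
    return best_k
```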
We use the motorcycle data provided by Schmidt, Mattern and Schüler (1981), and analyzed using smoothing splines by Silverman (1985), to illustrate the automatic choice of the bandwidth. Figure 2.1 shows the real data as well as the FAVORS fit. Figure 2.2 presents the local constant, linear, quadratic and cubic fits, using for each of them the automatic choice of the tuning parameter k, in overlay with FAVORS. In terms of bandwidth, they all work nicely, but FAVORS fits slightly better at valleys and corners.

[Put Figures 2.1 - 2.2 about here]
Our selection procedure has a similar virtue as cross-validation or "plug-in" techniques. But the cross-validation method suffers from a slow rate of convergence. See, for example, Härdle and Marron (1985), Härdle, Hall and Marron (1988), Vieu (1991), and Hall and Johnstone (1992). This drawback encourages statisticians to consider modern data-driven bandwidth selection rules based on the "plug-in" idea applied to the MISE optimal bandwidth. See, for example, Gasser, Kneip and Köhler (1991), Hall, Sheather, Jones and Marron (1991), and Jones and Sheather (1991).
We would expect that the proposed selection procedure is an asymptotically semiparametric efficient method with a good finite sample performance, as already indicated while explaining the approach in Section 2.2. For instance, the measure AMSE is "less asymptotic" than the measure IMSE because we only estimate the curve at the grid points. When n_grid → ∞, both measures coincide. In this limit situation, an optimal choice for the tuning parameter k is obtained by minimizing the asymptotic weighted IMSE, which can be determined from expression (2.12). For example, for a local cubic fit, this optimal k is
$$ k_{\mathrm{opt}} = \Big( \frac{D_1}{8 C_1} \Big)^{1/9} \Big( \frac{\int \sigma^2(x) w(x)\, dx}{\int (m^{(4)}(x))^2 w(x) / f_X^{8}(x)\, dx} \Big)^{1/9} n^{8/9}, $$
where w(·) is a nonnegative weight function and C_1 and D_1 are as in (2.14). Our k̂ would be an asymptotically efficient estimate of k_opt when n_grid = O(n).
3 Examples
We now present four simulated examples in order to compare the performance of various fixed order approximations and the proposed variable order polynomial approximation. The empirical investigations speak strongly in favor of FAVORS, the Fully Adaptive Variable Order Regression Smoother. In addition, FAVORS is robust to the choice of the bandwidth, as evidenced below. Also, for each fixed order approximation we have experienced the outstanding performance of the proposed bandwidth selection method.

In each of the examples the sample size n is 200 and the number of simulations is 400. The Epanechnikov kernel is used and the effective local data points are determined by the variable bandwidth (2.18). Below we report the pointwise mean absolute deviation error, standardized by the L_1-risk of FAVORS at each location. It is this ratio which is plotted in the figures. In Tables 3 - 6 the Mean Absolute Deviation Error (MADE) was computed in accordance with (2.16), except that the L_1-loss is used, with its SD indicated after ±. We prefer to report here the L_1-risk instead of the L_2-risk, because for the L_2-loss the errors from peak/valley regions dominate the measure of overall performance too much. The Average Bias (AB) and the Average Standard Deviation (ASD) are calculated as in (2.17). Further, in each table we list the bias and standard deviation of the regression smoother at the points x_0 = -1 and x_0 = 0 (see the last two columns). For each of the four examples we investigate the performance of the methods in case a fixed tuning parameter k, 10 or 20, is chosen, and in case the automatic k̂ (see (2.20)) is applied.
Example 1. We first of all consider the model:
$$ Y = X + 2\exp(-16X^2) + \varepsilon, \qquad X \sim \mathrm{Uniform}(-2, 2), \qquad \varepsilon \sim N(0, 0.5^2). \qquad (3.1) $$
The results were reported in Figure 1 and are summarized in the following table.
Table 3. MADE, Biases and Standard Deviations for Example 1*

k    Method    MADE               AB ± ASD           Bias and SD at x0 = 0   Bias and SD at x0 = -1
10   FAVORS    0.0945 ± 0.0211    0.0448 ± 0.1149    -0.1634 ± 0.1528         0.0035 ± 0.0999
     0         0.1426 ± 0.0226    0.1811 ± 0.1122    -0.6548 ± 0.1760         0.0041 ± 0.0957
     1         0.1370 ± 0.0228    0.1777 ± 0.1002    -0.6459 ± 0.1729         0.0034 ± 0.0911
     2         0.1107 ± 0.2105    0.0467 ± 0.1329    -0.1688 ± 0.1518         0.0026 ± 0.1264
     3         0.1109 ± 0.0214    0.0450 ± 0.1337    -0.1634 ± 0.1528         0.0021 ± 0.1285
20   FAVORS    0.1058 ± 0.0254    0.0864 ± 0.1203    -0.3048 ± 0.2251        -0.0050 ± 0.1064
     0         0.1864 ± 0.0566    0.2500 ± 0.1652    -0.8854 ± 0.4057         0.0376 ± 0.1032
     1         0.1575 ± 0.0409    0.2117 ± 0.1333    -0.7572 ± 0.3289         0.0029 ± 0.0900
     2         0.1310 ± 0.0298    0.1242 ± 0.1358    -0.4385 ± 0.2529        -0.0136 ± 0.1118
     3         0.1174 ± 0.0229    0.0450 ± 0.1438    -0.1638 ± 0.1925        -0.0041 ± 0.1364
k̂    FAVORS    0.1123 ± 0.0210    0.0283 ± 0.1420    -0.0474 ± 0.2048         0.0046 ± 0.1258
     0         0.1153 ± 0.0218    0.0666 ± 0.1343    -0.2516 ± 0.1733         0.0054 ± 0.1246
     1         0.1115 ± 0.0214    0.0632 ± 0.1285    -0.2421 ± 0.1708         0.0042 ± 0.1252
     2         0.1453 ± 0.0206    0.0106 ± 0.1833    -0.0233 ± 0.1865         0.0041 ± 0.1800
     3         0.1500 ± 0.0212    0.0104 ± 0.1895    -0.0213 ± 0.1931         0.0021 ± 0.1846

* Mean Average Squared Error can be obtained via MASE = AB² + ASD²; MSE at a point can be computed by MSE = Bias² + SD².
Note that the MADE is smallest for the adaptive method, with automatic choice of
k. From Figure 1 and Table 3 it can be seen first of all that the fully adaptive method
outperforms all other fixed order polynomial approximation methods. Further, it is clear
that FAVORS is 'robust' to the choice of k, as explained in the introduction.
Example 2. In this example, we consider a somewhat more sophisticated regression curve:
$$ Y = \sin(2X) + 2\exp(-16X^2) + \varepsilon, \qquad X \sim \mathrm{Uniform}(-2, 2), \qquad \varepsilon \sim N(0, 0.4^2). \qquad (3.2) $$
The true regression curve as well as the performance of each procedure is demonstrated in Figures 3.1 - 3.4. The overall performance, using the automatic choice of k̂ for each method, is further summarized in Table 4.
[Put Figures 3.1 - 3.4 about here]

Table 4. MADE, Biases and Standard Deviations for Example 2, k = k̂

Method    MADE               AB ± ASD           Bias and SD at x0 = 0   Bias and SD at x0 = -1
FAVORS    0.0920 ± 0.0206    0.0663 ± 0.1050    -0.2344 ± 0.1937         0.0096 ± 0.0969
0         0.1119 ± 0.0322    0.1001 ± 0.1296    -0.3725 ± 0.2817         0.0339 ± 0.0971
1         0.1029 ± 0.0262    0.0960 ± 0.1108    -0.3531 ± 0.2415         0.0293 ± 0.0934
2         0.1078 ± 0.0249    0.1003 ± 0.1151    -0.3581 ± 0.2231        -0.0046 ± 0.0914
3         0.0967 ± 0.0191    0.0432 ± 0.1176    -0.1572 ± 0.1709         0.0036 ± 0.1094
The conclusions which can be drawn from this table are similar to those mentioned for Example 1.

Example 3. In this example, we focus on the model
$$ Y = 0.3\exp(-4(X + 1)^2) + 0.7\exp(-16(X - 1)^2) + \varepsilon, \qquad X \sim \mathrm{Uniform}(-2, 2), \qquad (3.3) $$
where ε ~ N(0, 0.2²). Figures 4.1 - 4.4 give an overview of the performance of FAVORS, compared with the fixed order approximation methods. Table 5 reports the simulation results with the automatic choice of k̂.
[Put Figures 4.1 - 4.4 about here]

Table 5. MADE, Biases and Standard Deviations for Example 3, k = k̂

Method    MADE               AB ± ASD           Bias and SD at x0 = -1   Bias and SD at x0 = 0
FAVORS    0.0401 ± 0.0080    0.0194 ± 0.0479    -0.0191 ± 0.0491          0.0048 ± 0.0352
0         0.0553 ± 0.0163    0.0616 ± 0.0536    -0.0322 ± 0.0448          0.0059 ± 0.0380
1         0.0468 ± 0.0103    0.0449 ± 0.0458    -0.0200 ± 0.0435          0.0036 ± 0.0385
2         0.0447 ± 0.0083    0.0279 ± 0.0508    -0.0017 ± 0.0444         -0.0047 ± 0.0458
3         0.0457 ± 0.0088    0.0137 ± 0.0565    -0.0016 ± 0.0540         -0.0004 ± 0.0547
Example 4. As a last example, we consider a simple linear model to show that FAVORS does not tend to choose a higher order fit, and hence is an 'honest' procedure:
$$ Y = 0.4X + 1 + \varepsilon, \qquad X \sim N(0, 1), \qquad \varepsilon \sim N(0, 0.3^2). \qquad (3.4) $$
Figures 5.1 - 5.4 clearly indicate that the performance of FAVORS is as good as that of the local linear fit. Hence, FAVORS clearly does not overfit the model. Table 6 further supports our assertion.

[Put Figures 5.1 - 5.4 about here]
Table 6. MADE, Biases and Standard Deviations for Example 4, k = k̂

Method    MADE               AB ± ASD           Bias and SD at x0 = 0   Bias and SD at x0 = -1
FAVORS    0.0338 ± 0.0126    0.0020 ± 0.0459     0.0011 ± 0.0312         0.0011 ± 0.0360
0         0.0470 ± 0.0145    0.0297 ± 0.0523    -0.0003 ± 0.0437         0.0332 ± 0.0439
1         0.0322 ± 0.0120    0.0022 ± 0.0430     0.0007 ± 0.0258         0.0015 ± 0.0347
2         0.0411 ± 0.0129    0.0032 ± 0.0545     0.0007 ± 0.0390         0.0029 ± 0.0436
3         0.0516 ± 0.0168    0.0045 ± 0.0702     0.0017 ± 0.0531         0.0009 ± 0.0537
4 Extension to Higher dimensions
In this section, we discuss briefly how the methodology described in Section 2 can be applied
in the multivariate case. Because of the intrinsic difficulty of the "curse of dimensionality",
one should not expect to apply this method successfully to more than 3 dimensions. The
philosophy and basic ideas of the variable order approximation can be easily extended to
the multivariate setting. However, the number of parameters increases drastically with the number of dimensions if higher order approximation is used. For example, a local cubic fit in two dimensions already involves the estimation of 10 parameters. Hence, it is not really recommendable to use an order of approximation higher than a local linear fit. Consequently,
we will not concentrate on variable order approximation in the multivariate setting, although the extension is straightforward. Throughout this section, we will focus on the
local linear approximation. The challenging question is how to determine the appropriate
amount of smoothing.
To introduce an appropriate amount of smoothing in each direction, let (X_{11}, ..., X_{1p}, Y_1), ..., (X_{n1}, ..., X_{np}, Y_n) be an i.i.d. sample. Let x = (x_1, ..., x_p) be a certain fixed point where the unknown regression function has to be estimated. An intuitive procedure consists of applying the variable bandwidth (2.18) in each direction separately. More precisely, the window width in the jth direction is given by
$$ h_{j,k}(x) = \frac{X^*_{l_j + k,\, j} - X^*_{l_j - k,\, j}}{2}, \qquad (4.1) $$
where k determines the amount of smoothing, X^*_{1,j} ≤ ... ≤ X^*_{n,j} are the ordered X-values in the jth dimension, and l_j is the index of the order statistic closest to x_j. Evidently, such an approach is scale-equivariant.
An appropriate multivariate version of a nonparametric regression estimator based on a local linear approximation is obtained via the following least squares problem: minimize
$$ \sum_{i=1}^{n} \Big\{ Y_i - \beta_0 - \sum_{j=1}^{p} \beta_j (X_{ij} - x_j) \Big\}^2 K\Big( \frac{X_{i1} - x_1}{h_{1,k}(x)}, \ldots, \frac{X_{ip} - x_p}{h_{p,k}(x)} \Big), \qquad (4.2) $$
where K is a nonnegative multivariate kernel function. Let β̂ denote the solution vector which minimizes (4.2). Define the nonparametric local linear regression smoother as
$$ \hat m_k(x) = \hat\beta_0. \qquad (4.3) $$
Then, these estimators are location- and scale-equivariant. Clearly, this estimator m̂_k(x) inherits all nice features of the local linear smoother in 1 dimension and moreover adapts automatically to the sparsity of the design density. The kernel function K is typically chosen to be of one of the following forms:
$$ K(u_1, \ldots, u_p) = \prod_{j=1}^{p} W(u_j) \qquad\text{or}\qquad K(u_1, \ldots, u_p) = W\Big( \Big( \sum_{j=1}^{p} u_j^2 \Big)^{1/2} \Big), $$
where the 1-dimensional kernel function W is commonly chosen to be either a Gaussian density or a symmetric beta density.
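A bivariate sketch of (4.1)-(4.3) follows; the product Epanechnikov kernel is one of the two forms mentioned above, and the simple clipping of the neighbor indices at the sample range near the boundary is our own simplification.

```python
import numpy as np

def bivariate_local_linear(X, y, x0, k):
    """Local linear fit (4.2)-(4.3) at x0 with direction-wise nearest-neighbour
    bandwidths as in (4.1), using a product Epanechnikov kernel."""
    n, p = X.shape
    x0 = np.asarray(x0, dtype=float)
    h = np.empty(p)
    for j in range(p):                                 # bandwidth per direction, (4.1)
        xs = np.sort(X[:, j])
        l = int(np.argmin(np.abs(xs - x0[j])))
        lo = xs[max(l - k, 0)]
        hi = xs[min(l + k, n - 1)]
        h[j] = (hi - lo) / 2.0
    U = (X - x0) / h
    w = np.prod(np.where(np.abs(U) <= 1, 0.75 * (1 - U**2), 0.0), axis=1)
    D = np.hstack([np.ones((n, 1)), X - x0])           # design matrix (1, X_i - x0)
    W = np.diag(w)
    beta = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
    return beta[0]                                     # m_hat_k(x0) = beta_0, as in (4.3)
```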
We now introduce the method of bandwidth selection. To avoid complicated notation, we outline the idea for the two-dimensional case. For each point x = (x_1, x_2), let X be the n × 3 design matrix with ith row (1, X_{i,1} - x_1, X_{i,2} - x_2). The bias and variance of m̂_k(x) are approximately the first element of
$$ (X^T W X)^{-1} X^T W r \qquad\text{and}\qquad (X^T W X)^{-1} (X^T \Sigma X) (X^T W X)^{-1}, \qquad (4.4) $$
respectively, where W is the diagonal matrix with weights as in (4.2), Σ is as in Section 2.2, and r is an approximation error whose ith element is given by
$$ r_i \approx \beta_3 (X_{i,1} - x_1)^2 + \beta_4 (X_{i,1} - x_1)(X_{i,2} - x_2) + \beta_5 (X_{i,2} - x_2)^2. \qquad (4.5) $$
The coefficients β_3, β_4, β_5 and σ²(x) can be estimated by running a local quadratic fit (6 parameters) with a smoothing parameter inflated by a factor 2 in each direction. When there is a lack of data for a local linear fit at a particular grid point, we use a local constant fit. Its estimated bias is (4.4) with (4.5) suitably modified by adding linear terms. Finally, choose the tuning parameter k̂ that minimizes the weighted average of the estimated MSE at the prescribed grid points.

The above idea is a natural extension of the 1-dimensional selection rule. We only approximate the bias up to the second order in (4.5) because higher order approximations require more parameters to fit, resulting in heavy computation and the possibility of lacking sufficient local data points.
The following two simulated examples illustrate the methodology. A random sample of size n = 400 is simulated from each of the models:
$$ Y = 0.7\exp\big(-3((X_1 + 0.8)^2 + 8(X_2 - 0.5)^2)\big) + \exp\big(-3((X_1 - 0.8)^2 + 8(X_2 - 0.5)^2)\big) + \varepsilon, $$
$$ Y = \frac{5}{\pi} - \exp(-5X_1^2/8) + \varepsilon, \qquad (4.6) $$
where ε ~ N(0, 0.1²), X_1 ~ Uniform(-2, 2) and X_2 ~ Uniform(0, 1), independently. Model (4.6) is used in Herrmann, Wand, Engel and Gasser (1992), but here we doubled the noise level. Also here, the Epanechnikov kernel was used. The following figures show the true as well as the estimated surfaces, based on 51 × 51 grid points. The automatic smoothing parameter k̂ = 52, respectively 53, was selected in the first, respectively the second, model.
[Put Figures 6.1 - 6.4 about here]

5 Concluding Remarks
The concept of variable order polynomial approximation is intuitively appealing. It outperforms all fixed order approximation methods. FAVORS adapts to the curvature of the
regression curve, which enhances the efficiency of the smoothing method. Also, with the
variable order approximation, the choice of the bandwidth is less crucial than for any fixed
order approximation. This feature makes the task of curve fitting much easier and more robust.
Another important aspect of this paper is the choice of the smoothing parameter. We
introduce a smoothing parameter which automatically adapts to the sparsity of the design density. The amount of smoothing is determined based on a function approximation
idea. The proposed bandwidth selection method does not strongly rely on the asymptotic
expansions and hence performs well for finite samples. We have experienced a very good
performance of the selection procedure. The asymptotic efficiency of the proposed bandwidth selection method is currently under investigation, and we will report it elsewhere.
The presented idea of variable order approximation as well as the bandwidth selection
rule can easily be extended to more sophisticated models such as generalized linear models,
censoring models and the L1-method. These extensions are under study.
REFERENCES

Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. Jour. Amer. Statist. Assoc., 74, 829-836.

Cleveland, W.S. and Devlin, S.J. (1988). Locally-weighted regression: an approach to regression analysis by local fitting. Jour. Amer. Statist. Assoc., 83, 597-610.

Donoho, D.L. (1992). Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition. Department of Statistics, Stanford University, Technical report.

Fan, J. (1992a). Local linear regression smoothers and their minimax efficiency. Ann. Statist., to appear.

Fan, J. (1992b). Design-adaptive nonparametric regression. Jour. Amer. Statist. Assoc., 87, to appear.

Fan, J. and Gijbels, I. (1992a). Variable bandwidth and local linear regression smoothers. Ann. Statist., to appear.

Fan, J. and Gijbels, I. (1992b). Censored regression: nonparametric techniques and their applications. Mathematical Sciences Research Institute, Berkeley, California, Technical report no. 044-92.

Gasser, T., Kneip, A. and Köhler, W. (1991). A flexible and fast method for automatic smoothing. Jour. Amer. Statist. Assoc., 86, 643-652.

Härdle, W., Hall, P. and Marron, J.S. (1988). How far are automatically chosen regression smoothing parameters from their optimum? (with discussion). Jour. Amer. Statist. Assoc., 83, 86-101.

Härdle, W. and Marron, J.S. (1985). Optimal bandwidth selection in nonparametric regression function estimation. Ann. Statist., 13, 1465-1481.

Hall, P. and Johnstone, I. (1992). Empirical functionals and efficient smoothing parameter selection (with discussion). Jour. Royal Statist. Soc. B, 54, 475-530.

Hall, P., Sheather, S.J., Jones, M.C. and Marron, J.S. (1991). On optimal data-based bandwidth selection in kernel density estimation. Biometrika, 78, 263-271.

Herrmann, E., Wand, M.P., Engel, J. and Gasser, T. (1992). A bandwidth selector for bivariate kernel regression. To appear in Jour. Royal Statist. Soc. B.

Jones, M.C. and Sheather, S.J. (1991). Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives. Statist. Prob. Letters, 11, 511-514.

Lejeune, M. (1985). Estimation non-paramétrique par noyaux: régression polynomiale mobile. Revue de Statist. Appliquée, 33, 43-68.

Müller, H.G. (1987). Weighted local regression and kernel methods for nonparametric curve fitting. Jour. Amer. Statist. Assoc., 82, 231-238.

Nadaraya, E.A. (1964). On estimating regression. Theory Probab. Appl., 9, 141-142.

Ruppert, D. and Wand, M.P. (1992). Multivariate weighted least squares regression. Department of Statistics, Rice University, Technical report no. 92-4.

Schmidt, G., Mattern, R. and Schüler, F. (1981). Biomechanical investigation to determine physical and traumatological differentiation criteria for the maximum load capacity of head and vertebral column with and without protective helmet under effects of impact. EEC Research Program on Biomechanics of Impacts, Final Report Phase III, Project 65, Institut für Rechtsmedizin, Universität Heidelberg, Germany.

Silverman, B.W. (1984). Spline smoothing: the equivalent variable kernel method. Ann. Statist., 12, 898-916.

Silverman, B.W. (1985). Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). Jour. Royal Statist. Soc. B, 47, 1-52.

Silverman, B.W. (1992). Discussion of "Choosing a kernel regression estimator" by Chu, C.K. and Marron, J.S. Statist. Sci., 6, 404-436.

Stone, C.J. (1977). Consistent nonparametric regression. Ann. Statist., 5, 595-645.

Vieu, P. (1991). Nonparametric regression: optimal local bandwidth choice. Jour. Royal Statist. Soc. B, 53, 453-464.

Watson, G.S. (1964). Smooth regression analysis. Sankhyā Ser. A, 26, 359-372.
[Figure 1.1. Simulated Example 1: true curve with a typical simulated data set. Figures 1.2 - 1.4. Fits with k = 10, k = 20 and with the automatic choice of k (local constant, local linear, local quadratic, local cubic).]

[Figures 2.1 - 2.2. Motorcycle data, acceleration (g) versus time: FAVORS fit, and the fixed order fits with automatically chosen k.]

[Figure 3.1. Simulated Example 2. Figures 3.2 - 3.4. Fits with k = 10, k = 20 and with the automatic choice of k.]

[Figure 4.1. Simulated Example 3. Figures 4.2 - 4.4. Fits with k = 10, k = 20 and with the automatic choice of k.]

[Figure 5.1. Simulated Example 4. Figures 5.2 - 5.4. Fits with k = 10, k = 20 and with the automatic choice of k.]

[Figures 6.1 and 6.3. True surfaces (bimodal and tunnel). Figures 6.2 and 6.4. Estimated surfaces (bimodal and tunnel).]