The live method for generalized additive volatility models

LSE Research Online
Article (refereed)
Woocheol Kim and Oliver B. Linton
Originally published in Econometric Theory, 20 (6), pp. 1094–1139. © 2004 Cambridge University Press.
Available at: http://eprints.lse.ac.uk/archive/00000321 (available online July 2005)
Econometric Theory, 20, 2004, 1094–1139. Printed in the United States of America.
DOI: 10.1017/S026646660420603X
THE LIVE METHOD FOR GENERALIZED ADDITIVE VOLATILITY MODELS

WOOCHEOL KIM
Korea Institute of Public Finance and Humboldt University of Berlin

OLIVER LINTON
The London School of Economics
We investigate a new separable nonparametric model for time series, which includes many autoregressive conditional heteroskedastic (ARCH) models and autoregressive (AR) models already discussed in the literature. We also propose a new estimation procedure called LIVE, or local instrumental variable estimation, that is based on a localization of the classical instrumental variable method. Our method has considerable computational advantages over the competing marginal integration or projection method. We also consider a more efficient two-step likelihood-based procedure and show that this yields both asymptotic and finite-sample performance gains.
1. INTRODUCTION
Volatility models are of considerable interest in empirical finance. There are many types of parametric volatility models, following the seminal work of Engle (1982). These models are typically nonlinear, which poses difficulties both in computation and in deriving useful tools for statistical inference. Parametric models are prone to misspecification, especially when there is no theoretical reason to prefer one specification over another. Nonparametric models can provide greater flexibility. However, the greater generality of these models comes at a cost: including a large number of lags requires estimation of a high-dimensional smooth, which is known to behave very badly (Silverman, 1986). The "curse of dimensionality" puts severe limits on the dynamic flexibility of
This paper is based on Chapter 2 of the first author's Ph.D. dissertation from Yale University. We thank Wolfgang Härdle, Joel Horowitz, Peter Phillips, and Dag Tjøstheim for helpful discussions. We are also grateful to Donald Andrews and two anonymous referees for valuable comments. The second author thanks the National Science Foundation and the ESRC for financial support. Address correspondence to Woocheol Kim, Korea Institute of Public Finance, 79-6 Garak-Dong, Songpa-Gu, Seoul, Republic of Korea, 138-774; e-mail: wkim@kipf.re.kr. Oliver Linton, Department of Economics, The London School of Economics, Houghton Street, London WC2A 2AE, United Kingdom; e-mail: lintono@lse.ac.uk.
nonparametric models. Separable models offer an intermediate position between the complete generality of nonparametric models and the restrictiveness of parametric models. These models have been investigated in cross-sectional settings and also in time series settings.
In this paper, we investigate a generalized additive nonlinear autoregressive conditional heteroskedastic model (GANARCH):

$$y_t = m(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) + u_t, \qquad u_t = v^{1/2}(y_{t-1}, y_{t-2}, \ldots, y_{t-d})\,\varepsilon_t, \tag{1.1}$$

$$m(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) = F_m\Big(c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha})\Big), \tag{1.2}$$

$$v(y_{t-1}, y_{t-2}, \ldots, y_{t-d}) = F_v\Big(c_v + \sum_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Big), \tag{1.3}$$
where m_α(·) and v_α(·) are smooth but unknown functions and F_m(·) and F_v(·) are known monotone transformations (whose inverses are G_m(·) and G_v(·), respectively).¹ The error process {ε_t} is assumed to be a martingale difference with unit scale, that is, E(ε_t | F_{t-1}) = 0 and E(ε_t² | F_{t-1}) = 1, where F_t is the σ-algebra of events generated by {y_k}_{k=-∞}^t. Under some weak assumptions, time series of nonlinear autoregressive models can be shown to be stationary and strongly mixing with mixing coefficients decaying exponentially fast. Auestad and Tjøstheim (1990) use α-mixing or geometric ergodicity to identify their nonlinear time series model. Similar results are obtained for the additive nonlinear autoregressive conditional heteroskedastic (ARCH) process by Masry and Tjøstheim (1997); see also Cai and Masry (2000) and Carrasco and Chen (2002). We follow the same argument as Masry and Tjøstheim (1997) and will assume all the necessary conditions for stationarity and the mixing property of the process {y_t}_{t=1}^n in (1.1). The standard identification for the components of the mean and variance is made by

$$E[m_\alpha(y_{t-\alpha})] = 0 \quad\text{and}\quad E[v_\alpha(y_{t-\alpha})] = 0 \tag{1.4}$$

for all α = 1, …, d. The notable aspect of the model is additivity via known links for the conditional mean and volatility functions. As will be shown later, (1.1)–(1.3) include a wide variety of time series models in the literature. See Horowitz (2001) for a discussion of generalized additive models in a cross-sectional context.
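To fix ideas, a process of the form (1.1)–(1.3) is easy to simulate. The sketch below is a minimal illustration, not taken from the paper: it uses d = 2, an identity mean link F_m, and an exponential volatility link F_v, with arbitrary bounded component functions chosen only to keep the process stable.

```python
import numpy as np

def simulate_ganarch(n, burn=500, seed=0):
    """Simulate a toy GANARCH process with d = 2 lags:
       y_t = m(x_t) + v(x_t)**0.5 * eps_t,
       m(x_t) = c_m + m1(y_{t-1}) + m2(y_{t-2})        (identity link F_m)
       v(x_t) = exp(c_v + v1(y_{t-1}) + v2(y_{t-2}))   (exponential link F_v).
    All component functions are illustrative, chosen bounded for stability."""
    rng = np.random.default_rng(seed)
    m1 = lambda y: 0.3 * np.tanh(y)
    m2 = lambda y: -0.2 * np.tanh(y)
    v1 = lambda y: 0.2 * y**2 / (1.0 + y**2)   # bounded news-impact-type shape
    v2 = lambda y: 0.1 * y**2 / (1.0 + y**2)
    c_m, c_v = 0.0, -1.0
    y = np.zeros(n + burn + 2)
    for t in range(2, len(y)):
        mean = c_m + m1(y[t - 1]) + m2(y[t - 2])
        var = np.exp(c_v + v1(y[t - 1]) + v2(y[t - 2]))
        y[t] = mean + np.sqrt(var) * rng.standard_normal()
    return y[burn + 2:]          # drop the burn-in so the start-up is forgotten

y = simulate_ganarch(2000)
```

Because the drift and volatility components are bounded, the simulated path stays in a compact range with high probability, mimicking the stationarity and mixing conditions assumed above.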
In a much simpler univariate setup, Robinson (1983), Auestad and Tjøstheim (1990), and Härdle and Vieu (1992) study the kernel estimation of the conditional mean function m(·) in (1.1). The so-called CHARN (conditionally heteroskedastic autoregressive nonlinear) model is the same as (1.1) except that m(·) and v(·) are univariate functions of y_{t-1}. Masry and Tjøstheim (1995) and Härdle and Tsybakov (1997) apply the Nadaraya–Watson and local linear
smoothing methods, respectively, to jointly estimate v(·) together with m(·). Alternatively, Fan and Yao (1996) and Ziegelmann (2002) propose local linear least squares estimation for the volatility function, with the extension given by Avramidis (2002) based on local linear maximum likelihood estimation. Also, in a nonlinear vector autoregressive (VAR) context, Härdle, Tsybakov, and Yang (1998) deal with the estimation of the conditional mean in a multilagged extension similar to (1.1). Unfortunately, however, introducing more lags in nonparametric time series models has unpleasant consequences, more so than in the parametric approach. As is well known, smoothing methods in high dimensions suffer from a slower convergence rate, the "curse of dimensionality." Under twice differentiability of m(·), the optimal rate is n^{-2/(4+d)}, which gets rapidly worse with dimension. In high dimensions it is also difficult to describe the function m graphically.
The additive structure has been proposed as a useful way to circumvent these problems in multivariate smoothing. By assuming the target function to be a sum of functions of covariates, say, m(y_{t-1}, y_{t-2}, …, y_{t-d}) = c_m + Σ_{α=1}^d m_α(y_{t-α}), we can effectively reduce the dimensionality of the regression problem and improve the implementability of multivariate smoothing up to that of the one-dimensional case. Stone (1985, 1986) shows that it is possible to estimate m_α(·) and m(·) with the one-dimensional optimal rate of convergence — for example, n^{-2/5} for twice differentiable functions — regardless of d. The estimates are easily illustrated and interpreted. For these reasons, since the 1980s, additive models have been fundamental to nonparametric regression among both econometricians and statisticians. Regarding the estimation method for achieving the one-dimensional optimal rate, the literature suggests two different approaches: backfitting and marginal integration. The former, originally suggested by Breiman and Friedman (1985), Buja, Hastie, and Tibshirani (1989), and Hastie and Tibshirani (1987, 1990), executes iterative calculations of one-dimensional smoothing until some convergence criterion is satisfied. Though intuitively appealing, the statistical properties of the backfitting algorithm were not clearly understood until the recent works of Opsomer and Ruppert (1997) and Mammen, Linton, and Nielsen (1999). They develop specific (linear) backfitting procedures and establish the geometric convergence of their algorithms and the pointwise asymptotic distributions under some conditions. However, one disadvantage of these procedures is the time-consuming iteration required for implementation. Also, the proofs for the linear case cannot be easily generalized to nonlinear cases such as generalized additive models.

A more recent approach, called marginal integration (MI), is theoretically more tractable — its statistical properties are easy to derive, because it simply averages multivariate kernel estimates. Developed independently by Newey (1994), Tjøstheim and Auestad (1994), and Linton and Nielsen (1995), its simplicity inspired subsequent applications such as Linton, Wang, Chen, and Härdle (1995) for transformation models and Linton, Nielsen, and van de Geer (2003) for hazard models with censoring. In the time series models that are special cases of (1.1) and (1.2) with F_m being the identity, Chen and Tsay (1993a, 1993b) and Masry and Tjøstheim (1997) apply backfitting and MI, respectively, to estimate the conditional mean function. Mammen et al. (1999) provide useful results for the same type of models by improving the previous backfitting method with some modification and successfully deriving the asymptotic properties under weak conditions. The separability assumption is also used in volatility estimation by Yang, Härdle, and Nielsen (1999), where the nonlinear ARCH model has additive mean and multiplicative volatility in the form of

$$y_t = c_m + \sum_{\alpha=1}^{d} m_\alpha(y_{t-\alpha}) + \Big(c_v \prod_{\alpha=1}^{d} v_\alpha(y_{t-\alpha})\Big)^{1/2}\varepsilon_t. \tag{1.5}$$

To estimate (1.5), they rely on marginal integration with local linear fits as a pilot estimate and derive asymptotic properties.
This paper makes two contributions to the additive literature. The first concerns the theoretical development of a new estimation tool called the local instrumental variable estimator for the components of additive models (LIVE for CAM), which was outlined for simple additive cross-sectional regression in Kim, Linton, and Hengartner (1999). The novelty of the procedure lies in the simple definition of the estimator based on univariate smoothing combined with new kernel weights. That is, adjusting the kernel weights via the conditional density of the covariate enables a univariate kernel smoother to estimate consistently the corresponding additive component function. In many respects, the new estimator preserves the good properties of univariate smoothers. The instrumental variable method is analytically tractable for asymptotic theory: it is shown to attain the optimal one-dimensional rate. Furthermore, it is computationally more efficient than the two existing methods (backfitting and MI) in the sense that it reduces the computations by a factor of n smoothings. The other contribution relates to the general coverage of the model we work with. The model in (1.1)–(1.3) extends ARCH models to a generalized additive framework where both the mean and variance functions are additive after some known transformation (see Hastie and Tibshirani, 1990). All the time series models in our previous discussion are regarded as a subclass of the data generating process for {y_t} in (1.1)–(1.3). For example, setting G_m to be the identity and G_v a logarithmic function reduces our model to (1.5). Similar efforts to apply transformations have been made in parametric ARCH models. Nelson (1991) considers a model for the log of the conditional variance, the exponential (G)ARCH class, to embody the multiplicative effects of volatility. The Box–Cox transformation has also been proposed for volatility; it is intermediate between linear and logarithmic and allows nonseparable news impact curves. Because it is hard to tell a priori which structure of volatility is more realistic, and this should be determined by the data, our generalized additive model provides useful flexible specifications for empirical work. Additionally, from the perspective of potential misspecification problems, the transformation used here alleviates the restriction imposed by the additivity assumption, which increases the approximating power of our model. Note that when the lagged variables in (1.1)–(1.3) are replaced by different covariates and the observations are independent and identically distributed (i.i.d.), the model becomes the cross-sectional additive model studied by Linton and Härdle (1996). Finally, we also consider more efficient estimation along the lines of Linton (1996, 2000).
The rest of the paper is organized as follows. Section 2 describes the main estimation idea in a simple setting. In Section 3, we define the estimator for the full model. In Section 4 we give our main results, including the asymptotic normality of our estimators. Section 5 discusses more efficient estimation. Section 6 reports a small Monte Carlo study. The proofs are contained in the Appendix.
2. NONPARAMETRIC INSTRUMENTAL VARIABLES: THE MAIN IDEA
This section explains the basic idea behind the instrumental variable method and defines the estimation procedure. For ease of exposition, this is carried out using an example of simple additive models with i.i.d. data. We then extend the definition to the generalized additive ARCH case in (1.1)–(1.3).

Consider a bivariate additive regression model for i.i.d. data (y, X_1, X_2),

$$y = m_1(X_1) + m_2(X_2) + \varepsilon,$$

where E(ε | X) = 0 with X = (X_1, X_2) and the components satisfy the identification conditions E[m_α(X_α)] = 0, for α = 1, 2 (the constant term is assumed to be zero, for simplicity). Letting η = m_2(X_2) + ε, we rewrite the model as

$$y = m_1(X_1) + \eta, \tag{2.6}$$

which is a classical example of "omitted variable" regression. That is, although (2.6) appears to take the form of a univariate nonparametric regression model, smoothing y on X_1 will incur a bias due to the omitted variable η, because η contains X_2, which in general depends on X_1. One solution is suggested by the classical econometric notion of an instrumental variable. That is, we look for an instrument W such that

$$E(W \mid X_1) \neq 0; \qquad E(W\eta \mid X_1) = 0 \tag{2.7}$$

with probability one.² If such a random variable exists, we can write

$$m_1(x_1) = \frac{E(Wy \mid X_1 = x_1)}{E(W \mid X_1 = x_1)}. \tag{2.8}$$

This suggests that we estimate the function m_1(·) by nonparametric smoothing of Wy on X_1 and of W on X_1. In parametric models the choice of instrument is
usually not obvious and requires some caution. However, our additive model has a natural class of instruments: p_2(X_2)/p(X) times any measurable function of X_1 will do, where p(·), p_1(·), and p_2(·) are the density functions of the covariates X, X_1, and X_2, respectively. It follows that

$$\frac{E(Wy \mid X_1)}{E(W \mid X_1)}
= \frac{\displaystyle\int W(X)m(X)\frac{p(X)}{p_1(X_1)}\,dX_2}{\displaystyle\int W(X)\frac{p(X)}{p_1(X_1)}\,dX_2}
= \frac{\displaystyle\int m(X)p_2(X_2)\,dX_2}{\displaystyle\int p_2(X_2)\,dX_2}
= \int m(X)p_2(X_2)\,dX_2,$$

as required. This formula shows what the instrumental variable estimator is estimating when m is not additive: an average of the regression function over the X_2 direction, exactly the same as the target of the marginal integration estimator. For simplicity we take

$$W(X) = \frac{p_2(X_2)}{p(X)} \tag{2.9}$$

throughout.³
Up to now, it was implicitly assumed that the distributions of the covariates are known a priori. In practice, this is rarely true, and we have to rely on estimates of these quantities. Let p̂(·), p̂_1(·), and p̂_2(·) be kernel estimates of the densities p(·), p_1(·), and p_2(·), respectively. The feasible procedure is then defined by replacing the instrumental variable W with Ŵ = p̂_2(X_2)/p̂(X) and taking sample averages instead of population expectations. Section 3 provides a rigorous statistical treatment of feasible instrumental variable estimators based on local linear estimation. See Kim et al. (1999) for a slightly different approach.
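The two smooths in (2.8), with the instrument (2.9) replaced by kernel density estimates, can be sketched in a few lines. This is a simplified Nadaraya–Watson version of the feasible procedure (the formal treatment in Section 3 uses local linear fits); all bandwidth values and the simulated design are illustrative assumptions.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def live_m1(x1_grid, y, X1, X2, h=0.3, g=0.3):
    """Feasible LIVE for y = m1(X1) + m2(X2) + eps: estimate the
    instrument W = p2(X2)/p(X1, X2) by kernel density estimates, then
    smooth W*y and W on X1 and take the ratio as in (2.8)."""
    # kernel density estimates p2_hat(X2) and p_hat(X1, X2) at the data points
    d2 = gauss((X2[:, None] - X2[None, :]) / g) / g
    p2_hat = d2.mean(axis=1)
    d1 = gauss((X1[:, None] - X1[None, :]) / g) / g
    p_hat = (d1 * d2).mean(axis=1)                  # product-kernel joint density
    W_hat = p2_hat / p_hat                          # feasible instrument
    # Nadaraya-Watson smooths of W*y and W on X1, evaluated on the grid
    Kx = gauss((x1_grid[:, None] - X1[None, :]) / h) / h
    return (Kx @ (W_hat * y)) / (Kx @ W_hat)

# toy additive design: m1(x) = sin(x), m2(x) = 0.5*x, dependent covariates
rng = np.random.default_rng(1)
n = 1500
X1 = rng.standard_normal(n)
X2 = 0.5 * X1 + rng.standard_normal(n)   # dependence makes naive smoothing biased
y = np.sin(X1) + 0.5 * X2 + 0.3 * rng.standard_normal(n)
grid = np.array([-1.0, 0.0, 1.0])
m1_hat = live_m1(grid, y, X1, X2)
```

On this design a naive smooth of y on X_1 would be biased because X_2 is correlated with X_1, while the instrument-weighted ratio recovers m_1 up to the usual smoothing error.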
Next, we come to the main advantage of the local instrumental variable method: its computational cost. The marginal integration method needs n² regression smoothings evaluated at the pairs (X_{1i}, X_{2j}), for i, j = 1, …, n, whereas the backfitting method requires nr operations, where r is the number of iterations needed to achieve convergence. The instrumental variable procedure, in contrast, takes at most 2n operations of kernel smoothing in a preliminary step for estimating the instrumental variable and another n operations for the regressions. Thus, it can easily be combined with the bootstrap, whose computational cost often becomes prohibitive in the case of marginal integration (see Kim et al., 1999).
Finally, we show how the instrumental variable approach can be applied to generalized additive models. Let F(·) be the inverse of a known link function G(·) and let m(X) = E(y | X). The model is defined as

$$y = F(m_1(X_1) + m_2(X_2)) + \varepsilon, \tag{2.10}$$

or equivalently G(m(X)) = m_1(X_1) + m_2(X_2). We maintain the same identification condition, E[m_α(X_α)] = 0. Unlike in the simple additive model, there is no direct way to relate Wy to m_1(X_1) here, so (2.8) cannot be implemented. However, under additivity,

$$m_1(X_1) = \frac{E[W G(m(X)) \mid X_1]}{E[W \mid X_1]} \tag{2.11}$$

for the W defined in (2.9). Because m(·) is unknown, we need consistent estimates of m(X) in a preliminary step; then the calculation in (2.11) is feasible. In the next section we show how these ideas are translated into estimators for the general time series setting.
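A sketch of the two-step recipe for the link case: a bivariate pilot estimate of m(X), the known transformation G, and then the instrument-weighted ratio (2.11). The design, link choice (G = log, F = exp), bandwidths, and the clipping floor inside the logarithm are all illustrative assumptions, not choices made in the paper.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def live_link_m1(x1_grid, y, X1, X2, h=0.35, g=0.35):
    """Generalized-additive step (2.11) with G = log: (a) bivariate
    pilot smooth m_hat(X) of E(y|X), (b) apply the known link G,
    (c) instrument-weighted univariate smoothing on X1."""
    D1 = gauss((X1[:, None] - X1[None, :]) / h) / h
    D2 = gauss((X2[:, None] - X2[None, :]) / h) / h
    Kj = D1 * D2
    m_pilot = (Kj @ y) / Kj.sum(axis=1)            # pilot estimate of m(X) = E(y|X)
    Gm = np.log(np.clip(m_pilot, 1e-3, None))      # G(m_hat(X)); clip guards the log
    L1 = gauss((X1[:, None] - X1[None, :]) / g) / g
    L2 = gauss((X2[:, None] - X2[None, :]) / g) / g
    W_hat = L2.mean(axis=1) / (L1 * L2).mean(axis=1)  # feasible instrument p2_hat/p_hat
    Kx = gauss((x1_grid[:, None] - X1[None, :]) / h) / h
    return (Kx @ (W_hat * Gm)) / (Kx @ W_hat)      # the ratio in (2.11)

# toy design with F = exp, m1(x) = 0.5 sin(x), m2(x) = 0.25 x
rng = np.random.default_rng(2)
n = 1500
X1 = rng.standard_normal(n)
X2 = 0.5 * X1 + rng.standard_normal(n)
y = np.exp(0.5 * np.sin(X1) + 0.25 * X2) + 0.2 * rng.standard_normal(n)
grid = np.array([-1.0, 0.0, 1.0])
m1_hat = live_link_m1(grid, y, X1, X2)
```

The pilot estimate enters only through G(m̂(X)), so its full-dimensional (slow) convergence rate affects the bias but, as Theorem 1 below makes precise for the time series case, not the one-dimensional rate of the final component estimate.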
3. INSTRUMENTAL VARIABLE PROCEDURE FOR GANARCH
We start with some simplifying notation that will be used repeatedly in what follows. Let x_t be the vector of the d lagged variables up to t − 1, that is, x_t = (y_{t-1}, …, y_{t-d}), or concisely, x_t = (y_{t-α}, ȳ_{t-α}), where ȳ_{t-α} = (y_{t-1}, …, y_{t-α-1}, y_{t-α+1}, …, y_{t-d}). Defining m_{αt}(ȳ_{t-α}) = Σ_{β=1,β≠α}^d m_β(y_{t-β}) and v_{αt}(ȳ_{t-α}) = Σ_{β=1,β≠α}^d v_β(y_{t-β}), we can reformulate (1.1)–(1.3) with a focus on the αth components of the mean and variance as

$$y_t = m(x_t) + v^{1/2}(x_t)\varepsilon_t,$$
$$m(x_t) = F_m(c_m + m_\alpha(y_{t-\alpha}) + m_{\alpha t}(\bar y_{t-\alpha})),$$
$$v(x_t) = F_v(c_v + v_\alpha(y_{t-\alpha}) + v_{\alpha t}(\bar y_{t-\alpha})).$$

To save space we use the following abbreviations for the functions to be estimated:

$$H_\alpha(y_{t-\alpha}) \equiv [m_\alpha(y_{t-\alpha}), v_\alpha(y_{t-\alpha})]^\top, \qquad c \equiv [c_m, c_v]^\top,$$
$$H_{\alpha t}(\bar y_{t-\alpha}) \equiv [m_{\alpha t}(\bar y_{t-\alpha}), v_{\alpha t}(\bar y_{t-\alpha})]^\top,$$
$$r_t \equiv H(x_t) = [G_m(m(x_t)), G_v(v(x_t))]^\top,$$
$$w_\alpha(y_\alpha) = c + H_\alpha(y_\alpha).$$

Note that the components [m_α(·), v_α(·)]ᵀ are identified, up to the constant c, by w_α(·), which will be our major interest in estimation. Subsequently, we examine in some detail each step for computing the feasible nonparametric instrumental variable estimator of w_α(·). The set of observations is given by Y = {y_t}_{t=1}^{n'}, where n' = n + d.
3.1. Step I: Preliminary Estimation of r_t = H(x_t)

Because r_t is unknown, we start by computing pilot estimates of the regression surface by a local linear smoother. Let m̃(x) be the first component of (ã, b̃ᵀ)ᵀ that solves

$$\min_{a,b} \sum_{t=d+1}^{n'} K_h(x_t - x)\{y_t - a - b^\top(x_t - x)\}^2, \tag{3.12}$$

where K_h(x) = Π_{i=1}^d K(x_i/h)/h^d, K is a one-dimensional kernel function, and h = h(n) is a bandwidth sequence. In a similar way, we get the estimate of the volatility surface, ṽ(·), from (3.12) by replacing y_t with the squared residuals ε̃_t² = (y_t − m̃(x_t))². Transforming m̃ and ṽ by the known links then leads to consistent estimates of r_t:

$$\tilde r_t = \tilde H(x_t) = [G_m(\tilde m(x_t)), G_v(\tilde v(x_t))]^\top.$$
3.2. Step II: Instrumental Variable Estimation of Additive Components

This step involves the estimation of w_α(·), which is equivalent to [m_α(·), v_α(·)]ᵀ up to the constant c. Let p(·) and p_{αt}(·) denote the density functions of the random variables (y_{t-α}, ȳ_{t-α}) and ȳ_{t-α}, respectively. Define the feasible instrument as

$$\hat W_t = \frac{\hat p_{\alpha t}(\bar y_{t-\alpha})}{\hat p(y_{t-\alpha}, \bar y_{t-\alpha})},$$

where p̂_{αt}(·) and p̂(·) are computed using the kernel function L(·); for example, p̂(x) = Σ_{t=1}^n Π_{i=1}^d L_g(x_{it} − x_i)/n, with L_g(·) ≡ L(·/g)/g and g = g(n) a bandwidth sequence. The instrumental variable local linear estimates ŵ_α(y_α) are given as (â_1, â_2)ᵀ through minimizing the localized squared errors elementwise:

$$\min_{a_j, b_j} \sum_{t=d+1}^{n'} K_h(y_{t-\alpha} - y_\alpha)\hat W_t\{\tilde r_{jt} - a_j - b_j(y_{t-\alpha} - y_\alpha)\}^2, \tag{3.13}$$

where r̃_{jt} is the jth element of r̃_t.⁴ The closed form of the solution is

$$\hat w_\alpha(y_\alpha)^\top = e_1^\top(\mathcal{Y}_2^\top \mathbf{K}\,\mathcal{Y}_2)^{-1}\mathcal{Y}_2^\top \mathbf{K}\tilde R, \tag{3.14}$$

where e_1 = (1, 0)ᵀ, 𝒴_2 = [i, Y_2], **K** = diag[K_h(y_{d+1-α} − y_α)Ŵ_{d+1}, …, K_h(y_{n'-α} − y_α)Ŵ_{n'}], and R̃ = (r̃_{d+1}, …, r̃_{n'})ᵀ, with i = (1, …, 1)ᵀ and Y_2 = (y_{d+1-α} − y_α, …, y_{n'-α} − y_α)ᵀ.
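The weighted least squares problem (3.13) and its closed form (3.14) amount to an ordinary local linear fit whose kernel weights are multiplied by Ŵ_t. The sketch below uses i.i.d. data, identity links (so that r̃_t can be taken as y_t itself, as in Theorem 2 below), and an instrument built from known densities; the simulated design and tuning constants are illustrative assumptions.

```python
import numpy as np

def iv_local_linear(ya_grid, y_lag, r_tilde, W, h):
    """IV local linear step (3.13)-(3.14): weighted least squares of the
    pilot quantity r_tilde on (y_lag - ya), with weights K_h(.) * W_t.
    Returns the intercept, i.e., the estimate of w_a(ya), on a grid."""
    out = np.empty(len(ya_grid))
    for i, ya in enumerate(ya_grid):
        u = y_lag - ya
        w = np.exp(-0.5 * (u / h)**2) * W          # K_h(u)*W_t (the 1/h scale cancels)
        Z = np.column_stack([np.ones_like(u), u])  # the matrix [i, Y_2] of (3.14)
        WZ = Z * w[:, None]
        coef = np.linalg.solve(Z.T @ WZ, WZ.T @ r_tilde)
        out[i] = coef[0]                           # e_1'(Z'KZ)^{-1} Z'K r
    return out

# illustrative check: y = m1(X1) + m2(X2) + noise, instrument from the
# (here known) Gaussian densities p2(X2)/p(X1, X2)
rng = np.random.default_rng(4)
n = 3000
X1 = rng.standard_normal(n)
X2 = 0.5 * X1 + rng.standard_normal(n)            # dependent covariates
y = np.sin(X1) + 0.5 * X2 + 0.3 * rng.standard_normal(n)
phi = lambda u, s=1.0: np.exp(-0.5 * (u / s)**2) / (s * np.sqrt(2.0 * np.pi))
W = phi(X2, np.sqrt(1.25)) / (phi(X1) * phi(X2 - 0.5 * X1))
grid = np.array([-1.0, 0.0, 1.0])
w_hat = iv_local_linear(grid, X1, y, W, h=0.3)
```

Note that the loop does n smoothing operations per evaluation point, in line with the operation counts discussed in Section 2.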
4. MAIN RESULTS
Let F_a^b be the σ-algebra of events generated by {y_t}_a^b and α(k) the strong mixing coefficient of {y_t}, defined by

$$\alpha(k) \equiv \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{k}^{\infty}} |P(A \cap B) - P(A)P(B)|.$$

Throughout the paper, we make the following assumptions.

Assumption A.

A1. {y_t}_{t=1}^∞ is a stationary and strongly mixing process generated by (1.1)–(1.3), with a mixing coefficient such that Σ_{k=0}^∞ k^a {α(k)}^{1−2/ν} < ∞, for some ν > 2 and 0 < a < 1 − 2/ν.

As pointed out by Masry and Tjøstheim (1997), the condition on the mixing coefficient in A1 is milder than that assumed for the standard mixing process, where the coefficient decreases at a geometric rate, that is, α(k) = r^{−βk} (for some β > 0). Some technical conditions for regularity are stated here. For simplicity, we assume that the process {y_t}_{t=1}^∞ has a compact support.

A2. The additive component functions m_α(·) and v_α(·), for α = 1, …, d, are continuous and twice differentiable on the compact support.

A3. The link functions G_m and G_v have bounded continuous second-order derivatives over any compact interval.

A4. The joint and marginal density functions p(·), p_{αt}(·), and p_α(·), for α = 1, …, d, are continuous, twice differentiable with bounded (partial) derivatives, and bounded away from zero on the compact support.

A5. The kernel functions K(·) and L(·) are real, bounded, nonnegative, symmetric (around zero) functions on a compact support satisfying ∫K(u) du = ∫L(u) du = 1 and ∫uK(u) du = ∫uL(u) du = 0. Also, assume that the kernel functions are Lipschitz-continuous: |K(u) − K(v)| ≤ C|u − v|.

A6. (i) g → 0, ng^d → ∞, (log n)²√h/(√n g^d) → 0. (ii) h → 0, (log n)²/√(nh^{2d+1}) → 0. (iii) The bandwidth satisfies √(n/h)·α(τ(n)) → 0, where {τ(n)} is a sequence of positive integers with τ(n) → ∞ such that τ(n) = o(√(nh)).

Conditions A2–A5 are standard in kernel estimation. The continuity assumptions in A2 and A4, together with the compact support, imply that the functions are bounded. The bandwidth conditions in A6(i) and A6(ii) are necessary for showing negligibility of the stochastic error terms arising from the preliminary estimation of m, v, and p_α(·). Under twice differentiability of these functions, as in A2–A4, the given side conditions are satisfied when d ≤ 4. Our asymptotic results can be extended to the more general case of
d > 4, although we do not prove this in the paper. One way of extending to higher dimensions is to strengthen the differentiability conditions in A2–A4 and use higher order polynomials (see Kim et al., 1999). The additional bandwidth condition in A6(iii) is needed to control the effects of the dependence of the mixing process in showing the asymptotic normality of the instrumental variable estimates. The proof of consistency, however, does not require this condition. Define D²f(x_1, …, x_d) = Σ_{l=1}^d ∂²f(x)/∂x_l² and [∇G_m(t), ∇G_v(t)] = [dG_m(t)/dt, dG_v(t)/dt]. Let (K ∗ K)_i(u) = ∫K(w)K(w + u)w^i dw, a convolution of kernel functions, μ²_{K∗K} = ∫(K ∗ K)_0(u)u² du, and let ‖K‖₂² denote ∫K²(u) du. The asymptotic properties of the feasible instrumental variable estimates in (3.14) are summarized in the following theorem, whose proof is in the Appendix. Let κ₃(y_α, z_{αt}) = E[ε_t³ | x_t = (y_α, z_{αt})] and κ₄(y_α, z_{αt}) = E[(ε_t² − 1)² | x_t = (y_α, z_{αt})]. A ⊙ B denotes the matrix Hadamard product.
THEOREM 1. Assume that conditions A1–A6 hold. Then

$$\sqrt{nh}\,[\hat w_\alpha(y_\alpha) - w_\alpha(y_\alpha) - B_\alpha] \;\overset{d}{\longrightarrow}\; N[0, \Sigma_\alpha^*(y_\alpha)],$$

where

$$B_\alpha(y_\alpha) = \frac{h^2}{2}\mu_K^2 D^2 w_\alpha(y_\alpha)
+ \frac{h^2}{2}\int \big[\mu_{K*K}^2 D^2 w_\alpha(y_\alpha) + \mu_K^2 D^2 w_{\alpha t}(z_{\alpha t})\big] \odot \big[\nabla G_m(m(y_\alpha, z_{\alpha t})), \nabla G_v(v(y_\alpha, z_{\alpha t}))\big]^\top p_{\alpha t}(z_{\alpha t})\,dz_{\alpha t}$$
$$\qquad\qquad + \frac{g^2}{2}\mu_K^2 \int \Big[D^2 p_{\alpha t}(z_{\alpha t}) - \frac{p_{\alpha t}(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})} D^2 p(y_\alpha, z_{\alpha t})\Big] H_{\alpha t}(z_{\alpha t})\,dz_{\alpha t},$$

$$\Sigma_\alpha^*(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}
\begin{bmatrix} m_{\alpha t}^2(z_{\alpha t}) & m_{\alpha t}(z_{\alpha t})v_{\alpha t}(z_{\alpha t}) \\ m_{\alpha t}(z_{\alpha t})v_{\alpha t}(z_{\alpha t}) & v_{\alpha t}^2(z_{\alpha t}) \end{bmatrix} dz_{\alpha t}$$
$$\qquad\qquad + \|(K*K)_0\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}
\begin{bmatrix} \nabla G_m(m)^2 v & (\nabla G_m\nabla G_v)(\kappa_3 v^{3/2}) \\ (\nabla G_m\nabla G_v)(\kappa_3 v^{3/2}) & \nabla G_v(v)^2 \kappa_4 v^2 \end{bmatrix}(y_\alpha, z_{\alpha t})\,dz_{\alpha t}.$$
Remarks.

1. To estimate [m_α(y_α), v_α(y_α)]ᵀ we can use the recentered estimates ŵ_α(y_α) − ĉ, where ĉ = [ĉ_m, ĉ_v]ᵀ = (1/n)[Σ_t y_t, Σ_t ε̃_t²]ᵀ and ε̃_t = y_t − m̃(x_t). Because ĉ = c + O_p(1/√n), the bias and variance of [m̂_α(y_α), v̂_α(y_α)]ᵀ are the same as those of ŵ_α(y_α). For y = (y_1, …, y_d), the estimates for the conditional mean and volatility are defined by

$$[\hat m(y), \hat v(y)] \equiv \Big[F_m\Big(-(d-1)\hat c_m + \sum_{\alpha=1}^{d} \hat w_{\alpha 1}(y_\alpha)\Big),\; F_v\Big(-(d-1)\hat c_v + \sum_{\alpha=1}^{d} \hat w_{\alpha 2}(y_\alpha)\Big)\Big].$$

Let ∇F(y) ≡ [∇F_m(m(y)), ∇F_v(v(y))]ᵀ. Then, by Theorem 1 and the delta method, their asymptotic distribution satisfies

$$\sqrt{nh}\,[\hat m(y) - m(y) - b_m(y),\; \hat v(y) - v(y) - b_v(y)]^\top \;\overset{d}{\longrightarrow}\; N[0, \Sigma^*(y)],$$

where [b_m(y), b_v(y)]ᵀ = ∇F(y) ⊙ Σ_{α=1}^d B_α(y_α) and Σ*(y) = [∇F(y)∇F(y)ᵀ] ⊙ [Σ*_1(y_1) + … + Σ*_d(y_d)]. It is easy to see that ŵ_α(y_α) and ŵ_β(y_β) are asymptotically uncorrelated for any α and β, and that the asymptotic variance of their sum is the sum of the variances of ŵ_α(y_α) and ŵ_β(y_β).
2. The first term of the bias is of standard form, depending only on the second derivatives, as in other local linear smoothing. The last term reflects the biases from using estimates of the density functions to construct the feasible instrumental variable, p̂_{αt}(ȳ_{t-α})/p̂(x_t). When the instrument consisting of known density functions, p_{αt}(ȳ_{t-α})/p(x_t), is used in (3.13), the asymptotic properties of the instrumental variable estimates are the same as those in Theorem 1, except that the asymptotic bias then includes only the first two terms of B_α(y_α).

3. The convolution kernel (K ∗ K)(·) is the legacy of double smoothing in the instrumental variable estimation of "generalized" additive models, because we smooth [G_m(m̃(·)), G_v(ṽ(·))] with m̃(·) and ṽ(·) given by (multivariate) local linear fits. When G_m(·) is the identity, we can directly smooth y instead of G_m(m̃(x_t)) to estimate the components of the conditional mean function. Then, as the following theorem shows, the second term of the bias B_α does not arise, and the convolution kernel in the variance is replaced by the usual kernel function.
Suppose that F_m(t) = F_v(t) = t in (1.2) and (1.3). In this case, the instrumental variable estimates of w_α(y_α) can be defined in a simpler way. For w_α(y_α) = [M_α(y_α), V_α(y_α)] = [c_m + m_α(y_α), c_v + v_α(y_α)], we define [M̂_α(y_α), V̂_α(y_α)] by the solution to the adjusted-kernel least squares in (3.13) with the modification that the (2 × 1) vector r̃_t is replaced by [y_t, ε̃_t²]ᵀ, where ε̃_t is given in Step I in Section 3.1. Theorem 2 shows the asymptotic normality of these estimates. The proof is almost the same as that of Theorem 1 and thus is omitted.
THEOREM 2. Under the same conditions as Theorem 1,

(i) $\sqrt{nh}\,[\hat M_\alpha(y_\alpha) - M_\alpha(y_\alpha) - b_{\alpha m}] \overset{d}{\longrightarrow} N[0, \sigma_{\alpha m}^2(y_\alpha)]$, where

$$b_{\alpha m}(y_\alpha) = \frac{h^2}{2}\mu_K^2 D^2 m_\alpha(y_\alpha) + \frac{g^2}{2}\mu_K^2 \int \Big[D^2 p_{\alpha t}(z_{\alpha t}) - \frac{p_{\alpha t}(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}D^2 p(y_\alpha, z_{\alpha t})\Big] m_{\alpha t}(z_{\alpha t})\,dz_{\alpha t},$$
$$\sigma_{\alpha m}^2(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\big[m_{\alpha t}^2(z_{\alpha t}) + v(y_\alpha, z_{\alpha t})\big]\,dz_{\alpha t};$$

and

(ii) $\sqrt{nh}\,[\hat V_\alpha(y_\alpha) - V_\alpha(y_\alpha) - b_{\alpha v}] \overset{d}{\longrightarrow} N[0, \sigma_{\alpha v}^2(y_\alpha)]$, where

$$b_{\alpha v}(y_\alpha) = \frac{h^2}{2}\mu_K^2 D^2 v_\alpha(y_\alpha) + \frac{g^2}{2}\mu_K^2 \int \Big[D^2 p_{\alpha t}(z_{\alpha t}) - \frac{p_{\alpha t}(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}D^2 p(y_\alpha, z_{\alpha t})\Big] v_{\alpha t}(z_{\alpha t})\,dz_{\alpha t},$$
$$\sigma_{\alpha v}^2(y_\alpha) = \|K\|_2^2 \int \frac{p_{\alpha t}^2(z_{\alpha t})}{p(y_\alpha, z_{\alpha t})}\big[v_{\alpha t}^2(z_{\alpha t}) + \kappa_4(y_\alpha, z_{\alpha t})v^2(y_\alpha, z_{\alpha t})\big]\,dz_{\alpha t}.$$
Although the instrumental variable estimators achieve the one-dimensional optimal convergence rate, there is room for improvement in terms of variance. For example, compared with the marginal integration estimators of Linton and Härdle (1996) or Linton and Nielsen (1995), the asymptotic variances of the instrumental variable estimates of m_1(·) in Theorems 1 and 2 include an additional factor of m_2²(·). This is because the instrumental variable approach treats η = m_2(X_2) + ε in (2.6) as if it were the error term of the regression equation for m_1(·). Note that the second term of the asymptotic covariance in Theorem 2 is the same as that in Yang et al. (1999), where the authors only considered the case with additive mean and multiplicative volatility functions. The issue of efficiency in estimating an additive component was first addressed by Linton (1996), based on "oracle efficiency" bounds of infeasible estimators that use knowledge of the other components. By this standard, both the instrumental variable and marginal integration estimators are inefficient, but they can attain the efficiency bounds through one simple additional step, following Linton (1996, 2000) and Kim et al. (1999).
5. MORE EFFICIENT ESTIMATION
5.1. Oracle Standard
In this section we define a standard of efficiency that could be achieved in the presence of certain information, and then we show how to achieve it in practice. There are several routes to efficiency here, depending on the assumptions one is willing to make about ε_t. We take an approach based on likelihood; that is, we assume that ε_t is i.i.d. with known density function f, like the normal or the t with given degrees of freedom. It is easy to generalize this to the case where f contains unknown parameters, but we shall not do so here. It is also possible to build an efficiency standard based on the moment conditions in (1.1)–(1.3). We choose the likelihood approach because it leads to easy calculations, links with existing work, and is the most common method for estimating parametric ARCH/GARCH models in applied work.
There are several standards that we could apply here. First, suppose that we know (c_m, {m_β(·) : β ≠ α}) and (c_v, {v_β(·) : β}); then what is the best estimator we can obtain for the function m_α within the local polynomial paradigm? Similarly, suppose that we know (c_m, {m_β(·) : β}) and (c_v, {v_β(·) : β ≠ α}); then what is the best estimator we can obtain for the function v_α? It turns out that this standard is very high and cannot be achieved in practice. Instead we ask: suppose that we know (c_m, {m_β(·) : β ≠ α}) and (c_v, {v_β(·) : β ≠ α}); then what is the best estimator we can obtain for the pair of functions (m_α, v_α)? It turns out that this standard can be achieved in practice. Let ρ denote −log f(·), where f(·) is the density function of ε_t. We use z_t to denote (x_t, y_t), where x_t = (y_{t-1}, …, y_{t-d}) = (y_{t-α}, ȳ_{t-α}). For θ = (θ_a, θ_b) = (a_m, a_v, b_m, b_v), we define
$$l_t^*(\theta, g_a) = l^*(z_t; \theta, g_a) = \rho\left(\frac{y_t - F_m(g_{1a}(\underline{y}_{t-a}) + \alpha_m + \beta_m(y_{t-a} - y_a))}{F_v^{1/2}(g_{2a}(\underline{y}_{t-a}) + \alpha_v + \beta_v(y_{t-a} - y_a))}\right) + \frac{1}{2}\log F_v(g_{2a}(\underline{y}_{t-a}) + \alpha_v + \beta_v(y_{t-a} - y_a)),$$

$$l_t(\theta, g_a) = l(z_t; \theta, g_a) = K_h(y_{t-a} - y_a)\, l^*(z_t; \theta, g_a), \tag{5.15}$$
where $g_a(\underline{y}_{t-a}) = (g_{1a}(\underline{y}_{t-a}), g_{2a}(\underline{y}_{t-a})) = \big(c_m + \sum_{b\neq a}^{d} m_b(y_{t-b}),\ c_v + \sum_{b\neq a}^{d} v_b(y_{t-b})\big)$. With $l_t(\theta, g_a)$ being the (negative) conditional local log likelihood, the infeasible local likelihood estimator $\hat\theta = (\hat\alpha_m, \hat\alpha_v, \hat\beta_m, \hat\beta_v)$ is defined as the minimizer of

$$Q_n(\theta) = \sum_{t=d+1}^{n} l_t(\theta, g_a^0),$$

where $g_a^0(\cdot) = (g_{1a}^0(\cdot), g_{2a}^0(\cdot)) = \big(c_m^0 + \sum_{b\neq a} m_b^0(\cdot),\ c_v^0 + \sum_{b\neq a} v_b^0(\cdot)\big)$ is $g_a$ evaluated at the true constants and component functions. From the definition of the score functions

$$s_t^*(\theta, g_a) = s^*(z_t; \theta, g_a) = \frac{\partial l^*(z_t; \theta, g_a)}{\partial\theta}, \qquad s_t(\theta, g_a) = s(z_t; \theta, g_a) = \frac{\partial l(z_t; \theta, g_a)}{\partial\theta},$$
the first-order condition for $\hat\theta$ is given by

$$0 = \bar{s}_n(\hat\theta, g_a^0) = \frac{1}{n}\sum_{t=d+1}^{n} s_t(\hat\theta, g_a^0).$$
The asymptotic distribution of the local maximum likelihood estimator has been studied by Avramidis (2002). For $y = (y_1, \ldots, y_d) = (y_a, \underline{y}_a)$, define

$$V_a = V_a(y_a) = \int V(y; \theta_0, g_a^0)\, p(y)\, d\underline{y}_a; \qquad D_a = D(y_a) = \int D(y; \theta_0, g_a^0)\, p(y)\, d\underline{y}_a,$$

where

$$V(y; \theta, g_a) = E[s^*(z_t; \theta, g_a)\, s^*(z_t; \theta, g_a)^\top \mid x_t = y]; \qquad D(y; \theta, g_a) = E(\nabla_\theta s_t^*(z_t; \theta, g_a) \mid x_t = y).$$

With a minor generalization of the results of Avramidis (2002, Theorem 2), we obtain the following asymptotic properties of the infeasible estimators $\hat w_a^{\mathrm{inf}}(y_a) = [\hat m_a^{\mathrm{inf}}(y_a), \hat v_a^{\mathrm{inf}}(y_a)]^\top = [\hat\alpha_m, \hat\alpha_v]^\top$. Let $w_a^c(y_a) \equiv [m_a(y_a), v_a(y_a)]^\top$, that is, $w_a^c(y_a) = w_a(y_a) - c$, where $c = (c_m, c_v)$.
THEOREM 3. Under Assumption C in the Appendix, it holds that

$$\sqrt{nh}\,[\hat w_a^{\mathrm{inf}}(y_a) - w_a^c(y_a) - B_a] \stackrel{d}{\longrightarrow} N[0, V_a^*(y_a)],$$

where $B_a = \tfrac{1}{2} h^2 \mu_2^K\, [m_a''(y_a), v_a''(y_a)]^\top$ and $V_a^*(y_a) = \|K\|_2^2\, D_a^{-1} V_a D_a^{-1}$.
A more specific form for the asymptotic variance can be calculated. For example, suppose that the error density function $f(\cdot)$ is symmetric. Then the asymptotic variance of the volatility function is given by

$$v_{22}(y_a) = \frac{\displaystyle\int \Big\{\int g^2(y) f(y)\, dy\Big\}\, (\nabla F_v / F_v)^2(G_v(v(y)))\, p(y)\, d\underline{y}_a}{\Big[\displaystyle\int \Big\{\int q(y) f(y)\, dy\Big\}\, (\nabla F_v / F_v)^2(G_v(v(y)))\, p(y)\, d\underline{y}_a\Big]^2},$$

where $g(y) = f'(y) f^{-1}(y)\, y + 1$ and $q(y) = [\,y^2 f''(y) f(y) + y f'(y) f(y) - y^2 f'(y)^2\,] f^{-2}(y)$.
When the error distribution is Gaussian, we can further simplify the asymptotic variance; that is,

$$v_{11}(y_a) = \Big[\int v^{-1}(y)\, \nabla F_m^2(G_m(m(y)))\, p(y)\, d\underline{y}_a\Big]^{-1}; \qquad v_{22}(y_a) = 2\Big[\int v^{-2}(y)\, \nabla F_v^2(G_v(v(y)))\, p(y)\, d\underline{y}_a\Big]^{-1}; \qquad v_{12} = v_{21} = 0.$$
In this case, one can easily see that the infeasible estimator has lower asymptotic variance than the instrumental variable estimator. To see this, note that $\nabla G_m = 1/\nabla F_m$ and $\|K\|_2^2 \le \|(K\ast K)_0\|_2^2$, and apply the Cauchy–Schwarz inequality to get

$$\|(K\ast K)_0\|_2^2 \int \frac{p_{\underline a}^2(\underline{y}_a)}{p(y_a, \underline{y}_a)}\, \nabla G_m(m)^2\, v(y_a, \underline{y}_a)\, d\underline{y}_a \ \ge\ \|K\|_2^2 \Big[\int v^{-1}(y_a, \underline{y}_a)\, \nabla F_m^2(G_m(m))\, p(y_a, \underline{y}_a)\, d\underline{y}_a\Big]^{-1},$$

where $p_{\underline a}(\cdot)$ denotes the marginal density of $\underline{y}_a$.
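Spelling out the Cauchy–Schwarz step (a short check under the identity $\nabla G_m \cdot \nabla F_m = 1$ used above, with the arguments suppressed):

```latex
\left(\int \frac{p_{\underline a}^2}{p}\,\nabla G_m(m)^2\, v \, d\underline{y}_a\right)
\left(\int v^{-1}\,\nabla F_m^2(G_m(m))\, p \, d\underline{y}_a\right)
\ \ge\ \left(\int p_{\underline a}\,\nabla G_m(m)\,\nabla F_m(G_m(m))\, d\underline{y}_a\right)^{2}
= \left(\int p_{\underline a}\, d\underline{y}_a\right)^{2} = 1,
```

so the left-hand integral is at least the reciprocal of the right-hand one, and $\|K\|_2^2 \le \|(K\ast K)_0\|_2^2$ completes the comparison.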
In a similar way, from $\kappa_4 = 3$ under the Gaussianity assumption on $\varepsilon$, it follows that

$$\|(K\ast K)_0\|_2^2\, \kappa_4 \int \frac{p_{\underline a}^2(\underline{y}_a)}{p(y_a, \underline{y}_a)}\, \nabla G_v(v)^2\, v^2(y_a, \underline{y}_a)\, d\underline{y}_a \ \ge\ 2\Big[\int v^{-2}(y)\, \nabla F_v^2(G_v(v(y)))\, p(y)\, d\underline{y}_a\Big]^{-1}.$$
These, together with $\kappa_3 = 0$, imply that the second term of $\Sigma_a^*(y_a)$ in Theorem 1 is greater than $V_a^*(y_a)$ in the sense of positive definiteness, and hence $\Sigma_a^*(y_a) \ge V_a^*(y_a)$, because the first term of $\Sigma_a^*(y_a)$ is a nonnegative definite matrix. The infeasible estimator is more efficient than the instrumental variable estimator because the former uses more information concerning the mean–variance structure. We finally remark that the infeasible estimator is also more efficient than the marginal integration estimator in Yang et al. (1999), whose asymptotic variance corresponds to the second term of $\Sigma_a^*(y_a)$; see the discussion following Theorem 2.
5.2. Feasible Estimation
Let $(\tilde c_m, \{\tilde m_b(\cdot) : b \neq a\})$ and $(\tilde c_v, \{\tilde v_b(\cdot) : b \neq a\})$ be the estimators from (3.12) and (3.13) in Section 3, with the common bandwidth parameter $h_0$ chosen for the kernel function $K(\cdot)$. We define the feasible local likelihood estimator $\hat\theta^* = (\hat\alpha_m^*, \hat\alpha_v^*, \hat\beta_m^*, \hat\beta_v^*)$ as the minimizer of

$$\tilde Q_n(\theta) = \sum_{t=d+1}^{n} l_t(\theta, \bar g_a),$$

where $\bar g_a(\cdot) = (\bar g_{1a}(\cdot), \bar g_{2a}(\cdot)) = \big(\tilde c_m + \sum_{b\neq a}\tilde m_b(\cdot),\ \tilde c_v + \sum_{b\neq a}\tilde v_b(\cdot)\big)$ and $l_t(\cdot)$ is given by (5.15), with the additional bandwidth parameter $h$, possibly different from $h_0$. Then the first-order condition for $\hat\theta^*$ is given by

$$0 = \bar s_n(\hat\theta^*, \bar g_a) = \frac{1}{n}\sum_{t=d+1}^{n} s_t(\hat\theta^*, \bar g_a). \tag{5.16}$$

Let $\hat w_a^*(y_a) = (\hat m_a^*(y_a), \hat v_a^*(y_a))^\top = (\hat\alpha_m^*, \hat\alpha_v^*)^\top$. We have the following result.
THEOREM 4. Under Assumptions B and C in the Appendix, it holds that

$$\sqrt{nh}\,[\hat w_a^*(y_a) - \hat w_a^{\mathrm{inf}}(y_a)] \stackrel{p}{\longrightarrow} 0.$$
This result shows that the oracle efficiency bound is achieved by the two-step estimator.
6. NUMERICAL EXAMPLES
A small-scale simulation is carried out to investigate the finite-sample properties of both the instrumental variable and two-step estimators. The design in our experiment is an additive nonlinear ARCH(2):

$$y_t = [0.2 + v_1(y_{t-1}) + v_2(y_{t-2})]^{1/2}\varepsilon_t,$$

$$v_1(y) = 0.4\,\Phi(|2y|)[2 - \Phi(y)]\,y^2, \qquad v_2(y) = 0.4\{1/\sqrt{1 + 0.1y^2} + \ln(1 + 4y^2) - 1\},$$

where $\Phi(\cdot)$ is the (cumulative) standard normal distribution function and $\varepsilon_t$ is i.i.d. $N(0,1)$. Figure 1 (solid lines) depicts the shapes of the volatility functions defined by $v_1(\cdot)$ and $v_2(\cdot)$. Based on the preceding model, we simulate 500 samples of ARCH processes with sample size $n = 500$. For each realization of the ARCH process, we apply the instrumental variable estimation procedure in (3.13) with $\tilde r_t = y_t^2$ to get preliminary estimates of $v_1(\cdot)$ and $v_2(\cdot)$. These estimates are then used to compute the two-step estimates of the volatility functions based on the feasible local maximum likelihood estimator in Section 5.2, under the normality assumption for the errors. The infeasible oracle estimates are also provided for comparison. The Gaussian kernel is used for all the nonparametric estimates, and bandwidths are chosen according to the rule of thumb (Härdle, 1990), $h = c_h\,\mathrm{std}(y_t)\, n^{-1/(4+d)}$, where $\mathrm{std}(y_t)$ is the standard deviation of $y_t$. We fix $c_h = 1$ for both the density estimates (for computing the instruments, $\hat W$) and the instrumental variable estimates in (3.13), and $c_h = 1.5$ for the (feasible and infeasible) local maximum likelihood estimators. To evaluate the performance of the estimators, we calculate the mean squared error, together with the mean absolute deviation error, for each simulated data set; for $a = 1, 2$,
$$e_{a,\mathrm{MSE}} = \Big\{\frac{1}{50}\sum_{i=1}^{50}[v_a(y_i) - \hat v_a(y_i)]^2\Big\}^{1/2}, \qquad e_{a,\mathrm{MAE}} = \frac{1}{50}\sum_{i=1}^{50}|v_a(y_i) - \hat v_a(y_i)|,$$

where $\{y_1, \ldots, y_{50}\}$ are grid points on $[-1, 1)$. The grid range covers about 70% of the observations on average. Table 1 gives averages of the $e_{a,\mathrm{MSE}}$'s and $e_{a,\mathrm{MAE}}$'s from 500 repetitions.
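The data-generating process and the rule-of-thumb bandwidth are straightforward to reproduce. The sketch below is illustrative only (it is not the authors' code): it reads the design as $y_t = [0.2 + v_1(y_{t-1}) + v_2(y_{t-2})]^{1/2}\varepsilon_t$, uses only the standard library, and adds a burn-in so the path is approximately stationary.

```python
import math
import random

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def v1(y):
    """First-lag volatility component of the design."""
    return 0.4 * Phi(abs(2.0 * y)) * (2.0 - Phi(y)) * y * y

def v2(y):
    """Second-lag volatility component of the design."""
    return 0.4 * (1.0 / math.sqrt(1.0 + 0.1 * y * y)
                  + math.log(1.0 + 4.0 * y * y) - 1.0)

def simulate(n, burn=200, seed=0):
    """Draw one path of the additive nonlinear ARCH(2) process."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n + burn):
        var = 0.2 + v1(y[-1]) + v2(y[-2])   # conditional variance
        y.append(math.sqrt(var) * rng.gauss(0.0, 1.0))
    return y[-n:]

def rule_of_thumb_h(y, c_h=1.0, d=2):
    """h = c_h * std(y) * n^(-1/(4+d))  (rule of thumb, Hardle, 1990)."""
    n = len(y)
    mean = sum(y) / n
    sd = math.sqrt(sum((yi - mean) ** 2 for yi in y) / n)
    return c_h * sd * n ** (-1.0 / (4 + d))
```

With $n = 500$ and $c_h = 1$ this reproduces the bandwidth used for the density and instrumental variable steps; $c_h = 1.5$ gives the local maximum likelihood bandwidth.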
Figure 1. Averages of volatility estimates (demeaned): (a) first lag; (b) second lag.

Figure 2. Volatility estimates (demeaned).
Table 1. Average MSE and MAE for three volatility estimators

                 e_{1,MSE}   e_{2,MSE}   e_{1,MAE}   e_{2,MAE}
  Oracle est.     0.07636     0.08310     0.06049     0.06816
  IV est.         0.08017     0.11704     0.06660     0.09725
  Two-step        0.08028     0.08524     0.06372     0.07026
Table 1 shows that the infeasible oracle estimator is the best of the three, as would be expected. The performance of the instrumental variable estimator seems reasonably good compared to the local maximum likelihood estimators, at least in estimating the volatility function of the first lagged variable. However, the overall accuracy of the instrumental variable estimates is improved by the two-step procedure, which behaves almost as well as the infeasible one, confirming our theoretical results in Theorem 4. For further comparison, Figure 1 shows the averaged estimates of the volatility functions, where the averages are taken, at each grid point, over the 500 simulations. In Figure 2, we also illustrate the estimates for three typical (consecutive) realizations of the ARCH processes.
NOTES
1. The extension to allow the $F$ transformations to be of unknown functional form is considerably more complicated; see Horowitz (2001).
2. Note the contrast with the marginal integration or projection method. In this approach one defines $m_1$ by some unconditional expectation

$$m_1(x_1) = E[m(x_1, X_2) W(X_2)]$$

for some weighting function $W$ that depends only on $X_2$ and that satisfies

$$E[W(X_2)] = 1; \qquad E[W(X_2)\, m_2(X_2)] = 0.$$
3. If instead we take

$$W(X) = \frac{p_1(X_1)\, p_2(X_2)}{p(X)},$$

this satisfies $E(W \mid X_1) = 1$ and $E(Wh \mid X_1) = 0$. However, the term $p_1(X_1)$ cancels out of the expression and is redundant.
4. For simplicity, we choose the common bandwidth parameter for the kernel function $K(\cdot)$ in (3.12) and (3.13), which amounts to undersmoothing (for our choice of $h$) for the purposes of estimating $m$. Undersmoothing in the preliminary estimation of step I allows us to control the biases from estimating $m$ and $v$. In addition, the convolution kernel function in the asymptotic variance of Theorem 1 relies on the condition of the same bandwidth for $K(\cdot)$.
REFERENCES

Andrews, D.W.K. (1994) Empirical process methods in econometrics. In R.F. Engle & D. McFadden (eds.), Handbook of Econometrics, vol. IV, pp. 2247–2294. North-Holland.
Auestad, B. & D. Tjøstheim (1990) Identification of nonlinear time series: First order characterization and order estimation. Biometrika 77, 669–687.
Avramidis, P. (2002) Local maximum likelihood estimation of volatility function. Manuscript, LSE.
Breiman, L. & J.H. Friedman (1985) Estimating optimal transformations for multiple regression and correlation (with discussion). Journal of the American Statistical Association 80, 580–619.
Buja, A., T. Hastie, & R. Tibshirani (1989) Linear smoothers and additive models (with discussion). Annals of Statistics 17, 453–555.
Cai, Z. & E. Masry (2000) Nonparametric estimation of additive nonlinear ARX time series: Local linear fitting and projections. Econometric Theory 16, 465–501.
Carrasco, M. & X. Chen (2002) Mixing and moment properties of various GARCH and stochastic volatility models. Econometric Theory 18, 17–39.
Chen, R. (1996) A nonparametric multi-step prediction estimator in Markovian structures. Statistica Sinica 6, 603–615.
Chen, R. & R.S. Tsay (1993a) Nonlinear additive ARX models. Journal of the American Statistical Association 88, 955–967.
Chen, R. & R.S. Tsay (1993b) Functional-coefficient autoregressive models. Journal of the American Statistical Association 88, 298–308.
Engle, R.F. (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987–1008.
Fan, J. & Q. Yao (1996) Efficient estimation of conditional variance functions in stochastic regression. Biometrika 85, 645–660.
Gozalo, P. & O. Linton (2000) Local nonlinear least squares: Using parametric information in nonparametric regression. Journal of Econometrics 99(1), 63–106.
Hall, P. & C. Heyde (1980) Martingale Limit Theory and Its Application. Academic Press.
Härdle, W. (1990) Applied Nonparametric Regression. Econometric Monograph Series 19. Cambridge University Press.
Härdle, W. & A.B. Tsybakov (1997) Locally polynomial estimators of the volatility function. Journal of Econometrics 81, 223–242.
Härdle, W., A.B. Tsybakov, & L. Yang (1998) Nonparametric vector autoregression. Journal of Statistical Planning and Inference 68(2), 221–245.
Härdle, W. & P. Vieu (1992) Kernel regression smoothing of time series. Journal of Time Series Analysis 13, 209–232.
Hastie, T. & R. Tibshirani (1987) Generalized additive models: Some applications. Journal of the American Statistical Association 82, 371–386.
Hastie, T. & R. Tibshirani (1990) Generalized Additive Models. Chapman and Hall.
Horowitz, J. (2001) Estimating generalized additive models. Econometrica 69, 499–513.
Jones, M.C., S.J. Davies, & B.U. Park (1994) Versions of kernel-type regression estimators. Journal of the American Statistical Association 89, 825–832.
Kim, W., O. Linton, & N. Hengartner (1999) A computationally efficient oracle estimator of additive nonparametric regression with bootstrap confidence intervals. Journal of Computational and Graphical Statistics 8, 1–20.
Linton, O.B. (1996) Efficient estimation of additive nonparametric regression models. Biometrika 84, 469–474.
Linton, O.B. (2000) Efficient estimation of generalized additive nonparametric regression models. Econometric Theory 16, 502–523.
Linton, O.B. & W. Härdle (1996) Estimating additive regression models with known links. Biometrika 83, 529–540.
Linton, O.B. & J. Nielsen (1995) A kernel method of estimating structured nonparametric regression based on marginal integration. Biometrika 82, 93–100.
Linton, O.B., J. Nielsen, & S. van de Geer (2003) Estimating multiplicative and additive hazard functions by kernel methods. Annals of Statistics 31, 464–492.
Linton, O.B., N. Wang, R. Chen, & W. Härdle (1995) An analysis of transformation for additive nonparametric regression. Journal of the American Statistical Association 92, 1512–1521.
Mammen, E., O.B. Linton, & J. Nielsen (1999) The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Annals of Statistics 27, 1443–1490.
Masry, E. (1996) Multivariate local polynomial regression for time series: Uniform strong consistency and rates. Journal of Time Series Analysis 17, 571–599.
Masry, E. & D. Tjøstheim (1995) Nonparametric estimation and identification of nonlinear ARCH time series: Strong convergence and asymptotic normality. Econometric Theory 11, 258–289.
Masry, E. & D. Tjøstheim (1997) Additive nonlinear ARX time series and projection estimates. Econometric Theory 13, 214–252.
Nelson, D.B. (1991) Conditional heteroskedasticity in asset returns: A new approach. Econometrica 59, 347–370.
Newey, W.K. (1994) Kernel estimation of partial means. Econometric Theory 10, 233–253.
Opsomer, J.D. & D. Ruppert (1997) Fitting a bivariate additive model by local polynomial regression. Annals of Statistics 25, 186–211.
Pollard, D. (1990) Empirical Processes: Theory and Applications. CBMS Conference Series in Probability and Statistics, vol. 2. Institute of Mathematical Statistics.
Robinson, P.M. (1983) Nonparametric estimation for time series models. Journal of Time Series Analysis 4, 185–208.
Silverman, B.W. (1986) Density Estimation for Statistics and Data Analysis. Chapman and Hall.
Stein, E.M. (1970) Singular Integrals and Differentiability Properties of Functions. Princeton University Press.
Stone, C.J. (1985) Additive regression and other nonparametric models. Annals of Statistics 13, 685–705.
Stone, C.J. (1986) The dimensionality reduction principle for generalized additive models. Annals of Statistics 14, 592–606.
Tjøstheim, D. & B. Auestad (1994) Nonparametric identification of nonlinear time series: Projections. Journal of the American Statistical Association 89, 1398–1409.
Volkonskii, V. & Y. Rozanov (1959) Some limit theorems for random functions. Theory of Probability and Its Applications 4, 178–197.
Yang, L., W. Härdle, & J. Nielsen (1999) Nonparametric autoregression with multiplicative volatility and additive mean. Journal of Time Series Analysis 20, 579–604.
Ziegelmann, F. (2002) Nonparametric estimation of volatility functions: The local exponential estimator. Econometric Theory 18, 985–992.
APPENDIX
A.1. Proofs for Section 4. The proof of Theorem 1 consists of three steps. Without loss of generality we deal with the case $a = 1$; here we use the subscript 2, for expositional convenience, to denote the nuisance direction. That is, $p_2(\underline{y}_{k-1}) = p_{1v}(\underline{y}_{k-1})$ in the case of the density function. For the component functions, $m_2(\underline{y}_{k-1})$, $v_2(\underline{y}_{k-1})$, and $H_2(\underline{y}_{k-1})$ will be used instead of $m_{1v}(\underline{y}_{k-1})$, $v_{1v}(\underline{y}_{k-1})$, and $H_{1v}(\underline{y}_{k-1})$, respectively. We start by decomposing the estimation error, $\hat w_1(y_1) - w_1(y_1)$, into a main stochastic term and a bias term. We use $X_n \simeq Y_n$ to denote $X_n = Y_n\{1 + o_p(1)\}$ in what follows. Let $\mathrm{vec}(X)$ denote the vectorization of the elements of the matrix $X$ along its columns.
Proof of Theorem 1.

Step I. Decompositions and Approximations. Because $\hat w_1(y_1)$ is a column vector, the vectorization of equation (3.14) gives

$$\hat w_1(y_1) = [I_2 \otimes e_1^\top (Y_2^\top K Y_2)^{-1}](I_2 \otimes Y_2^\top K)\,\mathrm{vec}(\tilde R).$$

A similar form is obtained for the true function $w_1(y_1)$,

$$[I_2 \otimes e_1^\top (Y_2^\top K Y_2)^{-1}](I_2 \otimes Y_2^\top K)\,\mathrm{vec}(i w_1^\top(y_1) + Y_2 \nabla w_1^\top(y_1)),$$

by the identity

$$w_1(y_1) = \mathrm{vec}\{e_1^\top (Y_2^\top K Y_2)^{-1} Y_2^\top K [i w_1^\top(y_1) + Y_2 \nabla w_1^\top(y_1)]\},$$

because

$$e_1^\top (Y_2^\top K Y_2)^{-1} Y_2^\top K i = 1, \qquad e_1^\top (Y_2^\top K Y_2)^{-1} Y_2^\top K Y_2 = 0.$$

By defining $D_h = \mathrm{diag}(1, h)$ and $Q_n = D_h^{-1} Y_2^\top K Y_2 D_h^{-1}$, the estimation error is

$$\hat w_1(y_1) - w_1(y_1) = [I_2 \otimes e_1^\top Q_n^{-1}]\, t_n,$$

where

$$t_n = (I_2 \otimes D_h^{-1} Y_2^\top K)\,\mathrm{vec}[\tilde R - i w_1^\top(y_1) - Y_2 \nabla w_1^\top(y_1)].$$
Observing that

$$t_n = \frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W_k}(y_{k-1} - y_1)\,[\tilde r_k - w_1(y_1) - (y_{k-1} - y_1)\nabla w_1(y_1)] \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top,$$

where $K_h^{\hat W_k}(y) = K_h(y)\hat W_k$, it follows by adding and subtracting $r_k = w_1(y_{k-1}) + H_2(\underline{y}_{k-1})$ that

$$t_n = \frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W_k}(y_{k-1} - y_1)\,[\tilde r_k - r_k + H_2(\underline{y}_{k-1})] \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top$$

$$\quad + \frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W_k}(y_{k-1} - y_1)\,[w_1(y_{k-1}) - w_1(y_1) - (y_{k-1} - y_1)\nabla w_1(y_1)] \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top.$$
As a result of the boundedness condition in Assumption A2, a Taylor expansion of $[G_m(\tilde m(x_k)), G_v(\tilde v(x_k))]$ at $[m(x_k), v(x_k)]$ yields the first term of $t_n$ as

$$\tilde t_n \equiv \frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W_k}(y_{k-1} - y_1)\Big[\tilde u_k \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top,$$

where $\tilde u_k \equiv \tilde r_{k1} + \tilde r_{k2} + H_2(\underline{y}_{k-1})$,

$$\tilde r_{k1} \equiv \{\nabla G_m(m(x_k))[\tilde m(x_k) - m(x_k)],\ \nabla G_v(v(x_k))[\tilde v(x_k) - v(x_k)]\}^\top,$$

$$\tilde r_{k2} \equiv \frac{1}{2}\{D^2 G_m(m^*(x_k))[\tilde m(x_k) - m(x_k)]^2,\ D^2 G_v(v^*(x_k))[\tilde v(x_k) - v(x_k)]^2\}^\top,$$

and $m^*(x_k)$ [$v^*(x_k)$] is between $\tilde m(x_k)$ [$\tilde v(x_k)$] and $m(x_k)$ [$v(x_k)$], respectively. In a similar way, a Taylor expansion of $w_1(y_{k-1})$ at $y_1$ gives the second term of $t_n$ as

$$s_{0n} = \frac{h^2}{2}\,\frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W_k}(y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^2\Big[D^2 w_1(y_1) \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top \times (1 + o_p(1)).$$
The term $\tilde t_n$ can be simplified by some further approximations. Define the marginal expectations of the estimated density functions $\hat p_2(\cdot)$ and $\hat p(\cdot)$ as follows:

$$\bar p(y_{k-1}, \underline{y}_{k-1}) \equiv \int L_g(z_1 - y_{k-1})\, L_g(z_2 - \underline{y}_{k-1})\, p(z_1, z_2)\, dz_1\, dz_2, \qquad \bar p_2(\underline{y}_{k-1}) \equiv \int L_g(z_2 - \underline{y}_{k-1})\, p_2(z_2)\, dz_2.$$
In the first approximation, we replace the estimated instrument $\hat W$ by the ratio of the expectations of the kernel density estimates, $\bar p_2(\underline{y}_{k-1})/\bar p(x_k)$, and deal with the linear terms in the Taylor expansions. That is, $\tilde t_n$ is approximated, with an error of $o_p(1/\sqrt{nh})$, by $t_{1n} + t_{2n}$:

$$t_{1n} \equiv \frac{1}{n}\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)\,\frac{\bar p_2(\underline{y}_{k-1})}{\bar p(x_k)}\Big[\tilde r_{k1} \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top,$$

$$t_{2n} \equiv \frac{1}{n}\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)\,\frac{\bar p_2(\underline{y}_{k-1})}{\bar p(x_k)}\Big[H_2(\underline{y}_{k-1}) \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top,$$

based on the following results:

(i) $(1/n)\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)[\hat p_2(\underline{y}_{k-1})/\hat p(x_k)]\,[\tilde r_{k2} \otimes (1, (y_{k-1} - y_1)/h)^\top] = o_p(1/\sqrt{nh})$,
(ii) $(1/n)\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)[\hat p_2(\underline{y}_{k-1})/\hat p(x_k) - \bar p_2(\underline{y}_{k-1})/\bar p(x_k)]\,[H_2(\underline{y}_{k-1}) \otimes (1, (y_{k-1} - y_1)/h)^\top] = o_p(1/\sqrt{nh})$,
(iii) $(1/n)\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)[\hat p_2(\underline{y}_{k-1})/\hat p(x_k) - \bar p_2(\underline{y}_{k-1})/\bar p(x_k)]\,[\tilde r_{k1} \otimes (1, (y_{k-1} - y_1)/h)^\top] = o_p(1/\sqrt{nh})$.
To show (i), consider the first two elements of the term, for example, which are bounded elementwise by

$$\max_k |\tilde m(x_k) - m(x_k)|^2\ \frac{1}{2}\,\frac{1}{n}\sum_k K_h(y_{k-1} - y_1)\,\frac{\hat p_2(\underline{y}_{k-1})}{\hat p(x_k)}\, D^2 G_m(m(x_k))\Big(1, \frac{y_{k-1} - y_1}{h}\Big)^\top = o_p(1/\sqrt{nh}).$$

The last equality is direct from the uniform convergence theorems in Masry (1996), which give

$$\max_t |\tilde m(x_t) - m(x_t)| = O_p(\log n/\sqrt{nh^d}) \tag{A.1}$$

and $(1/n)\sum_k K_h(y_{k-1} - y_1)[\hat p_2(\underline{y}_{k-1})/\hat p(x_k)]\, D^2 G_m(m(x_k))(1, (y_{k-1} - y_1)/h)^\top = O_p(1)$.
The proof for (ii) is given by applying Lemma A.1, which follows. The negligibility of (iii) follows in a similar way from (ii), considering (A.1). Although the asymptotic properties of $s_{0n}$ and $t_{2n}$ are relatively easy to derive, an additional approximation is necessary to make $t_{1n}$ more tractable. Note that the estimation errors of the local linear fits, $\tilde m(x_k) - m(x_k)$, decompose into

$$\frac{1}{n}\sum_l \frac{K_h(x_l - x_k)}{p(x_l)}\, v^{1/2}(x_l)\varepsilon_l\ +\ \text{the remaining bias},$$

from the approximation results for the local linear smoother in Jones, Davies, and Park (1994). A similar expression holds for the volatility estimates, $\tilde v(x_k) - v(x_k)$, with a stochastic term of $(1/n)\sum_l [K_h(x_l - x_k)/p(x_l)]\, v(x_l)(\varepsilon_l^2 - 1)$.
Define

$$J_{k,n}(x_l) \equiv \frac{1}{nh^d}\sum_k \frac{K((y_{k-1} - y_1)/h)\, K((x_l - x_k)/h)\, p_2(\underline{y}_{k-1})}{p(x_l)\, p(x_k)}\Big[\mathrm{diag}(\nabla G_m, \nabla G_v) \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top$$

and let $\bar J(x_l)$ denote the marginal expectation of $J_{k,n}$ with respect to $x_k$. Then the stochastic term of $t_{1n}$, after rearranging its double sums, is approximated by

$$\tilde t_{1n} = \frac{1}{nh}\sum_l \bar J(x_l)\,[(v^{1/2}(x_l)\varepsilon_l,\ v(x_l)(\varepsilon_l^2 - 1))^\top \otimes I_2],$$

because the approximation error from $\bar J(X_l)$ is negligible, that is,

$$\frac{1}{nh}\sum_l (J_{k,n} - \bar J)\,[(v^{1/2}(X_l)\varepsilon_l,\ v(X_l)(\varepsilon_l^2 - 1))^\top \otimes I_2]^\top = o_p(1/\sqrt{nh}),$$

applying the same method as in Lemma A.1. A straightforward calculation gives
$$\bar J(x_l) \simeq \frac{1}{h}\int K\Big(\frac{u_1 - y_1}{h}\Big) K\Big(\frac{u_1 - y_{l-1}}{h}\Big)\frac{1}{h^{d-1}} K\Big(\frac{\underline{y}_{l-1} - u_2}{h}\Big)\frac{p_2(u_2)}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(u), \nabla G_v(u)) \otimes \Big(1, \frac{u_1 - y_1}{h}\Big)\Big]^\top du_2\, du_1$$

$$\simeq \frac{1}{h}\int K\Big(\frac{u_1 - y_1}{h}\Big) K\Big(\frac{u_1 - y_{l-1}}{h}\Big)\frac{p_2(\underline{y}_{l-1})}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(u_1, \underline{y}_{l-1}), \nabla G_v(u_1, \underline{y}_{l-1})) \otimes \Big(1, \frac{u_1 - y_1}{h}\Big)\Big]^\top du_1$$

$$\simeq \frac{p_2(\underline{y}_{l-1})}{p(x_l)}\Big[\mathrm{diag}(\nabla G_m(y_1, \underline{y}_{l-1}), \nabla G_v(y_1, \underline{y}_{l-1})) \otimes \Big((K\ast K)_0\Big(\frac{y_{l-1} - y_1}{h}\Big),\ (K\ast K)_1\Big(\frac{y_{l-1} - y_1}{h}\Big)\Big)\Big]^\top,$$

where

$$(K\ast K)_i\Big(\frac{y_{l-1} - y_1}{h}\Big) = \int w_1^i\, K(w_1)\, K\Big(w_1 + \frac{y_{l-1} - y_1}{h}\Big)\, dw.$$
Observe that $(K\ast K)_i((y_{l-1} - y_1)/h)$ in $\bar J(X_l)$ is actually a convolution kernel and behaves just like a one-dimensional kernel function of $y_{l-1}$. This means that the standard method (central limit theorem or law of large numbers) for univariate kernel estimates can be applied to obtain the asymptotics of

$$\tilde t_{1n} = \frac{1}{nh}\sum_l \frac{p_2(\underline{y}_{l-1})}{p(x_l)}\left\{\begin{bmatrix}\nabla G_m(y_1, \underline{y}_{l-1})\, v^{1/2}(X_l)\varepsilon_l\\[2pt] \nabla G_v(y_1, \underline{y}_{l-1})\, v(X_l)(\varepsilon_l^2 - 1)\end{bmatrix} \otimes \begin{bmatrix}(K\ast K)_0\big(\frac{y_{l-1} - y_1}{h}\big)\\[2pt] (K\ast K)_1\big(\frac{y_{l-1} - y_1}{h}\big)\end{bmatrix}\right\}.$$
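The claim that $(K\ast K)_i$ behaves like a one-dimensional kernel is easy to verify numerically. The sketch below (an illustration, not part of the proof) approximates $(K\ast K)_i(u) = \int w^i K(w)K(w+u)\,dw$ for the Gaussian kernel by the trapezoid rule and lets one check the moment properties used in the bias calculation later: $\int (K\ast K)_0(u)\,du = 1$, $\int (K\ast K)_0(u)\,u\,du = 0$, and $\int (K\ast K)_1(u)\,u^2\,du = 0$.

```python
import math

def K(w):
    """Gaussian kernel."""
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def trapezoid(f, lo, hi, n):
    """Trapezoid-rule integral of f over [lo, hi] with n subintervals."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for j in range(1, n):
        total += f(lo + j * step)
    return total * step

def KK(i, u):
    """(K*K)_i(u) = int w^i K(w) K(w + u) dw."""
    return trapezoid(lambda w: (w ** i) * K(w) * K(w + u), -8.0, 8.0, 1200)

def moment(i, r):
    """int (K*K)_i(u) u^r du."""
    return trapezoid(lambda u: KK(i, u) * (u ** r), -8.0, 8.0, 800)
```

For the Gaussian kernel, $(K\ast K)_0$ is the $N(0,2)$ density, so `moment(0, 0)` is 1 up to quadrature error, while the odd moments vanish by symmetry.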
If we define $s_{1n}$ as the remaining bias term of $t_{1n}$, the estimation error $\hat w_1(y_1) - w_1(y_1)$ consists of two stochastic terms, $[I_2 \otimes e_1^\top Q_n^{-1}](\tilde t_{1n} + \tilde t_{2n})$, and three bias terms, $[I_2 \otimes e_1^\top Q_n^{-1}](s_{0n} + s_{1n} + s_{2n})$, where

$$\tilde t_{2n} = \frac{1}{n}\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)\,\frac{p_2(\underline{y}_{k-1})}{p(X_k)}\Big[H_2(\underline{y}_{k-1}) \otimes \Big(1, \frac{Y_{k-1} - y_1}{h}\Big)\Big]^\top, \qquad s_{2n} = t_{2n} - \tilde t_{2n}.$$
Step II. Computation of Variance and Bias. We start by establishing the order of the main stochastic term,

$$\tilde t_n^* = \tilde t_{1n} + \tilde t_{2n} = \frac{1}{n}\sum_k \xi_k, \qquad \text{where } \xi_k = \xi_{1k} + \xi_{2k},$$

$$\xi_{1k} = \frac{p_2(\underline{y}_{k-1})}{p(y_{k-1}, \underline{y}_{k-1})}\left\{\begin{bmatrix}\nabla G_m(y_1, \underline{y}_{k-1})\, v^{1/2}(X_k)\varepsilon_k\\[2pt] \nabla G_v(y_1, \underline{y}_{k-1})\, v(X_k)(\varepsilon_k^2 - 1)\end{bmatrix} \otimes \frac{1}{h}\begin{bmatrix}(K\ast K)_0\big(\frac{y_{k-1} - y_1}{h}\big)\\[2pt] (K\ast K)_1\big(\frac{y_{k-1} - y_1}{h}\big)\end{bmatrix}\right\},$$

$$\xi_{2k} = \frac{p_2(\underline{y}_{k-1})}{p(y_{k-1}, \underline{y}_{k-1})}\left\{\begin{bmatrix}m_2(\underline{y}_{k-1})\\[2pt] v_2(\underline{y}_{k-1})\end{bmatrix} \otimes \frac{1}{h}\,K\Big(\frac{y_{k-1} - y_1}{h}\Big)\begin{bmatrix}1\\[2pt] \frac{y_{k-1} - y_1}{h}\end{bmatrix}\right\},$$

by calculating its asymptotic variance. Dividing the normalized variance of $\tilde t_n^*$ into sums of variances and covariances gives

$$\mathrm{var}(\sqrt{nh}\,\tilde t_n^*) = \mathrm{var}\Big(\frac{\sqrt h}{\sqrt n}\sum_k \xi_k\Big) = \frac{h}{n}\sum_k \mathrm{var}(\xi_k) + \frac{h}{n}\sum\sum_{k\neq l}\mathrm{cov}(\xi_k, \xi_l) = h\,\mathrm{var}(\xi_d) + 2\sum_k \Big[\frac{n-k}{n}\Big] h\,\mathrm{cov}(\xi_d, \xi_{d+k}),$$

where the last equality comes from the stationarity assumption.

We claim that

(a) $h\,\mathrm{var}(\xi_k) \to \Sigma_1(y_1)$,
(b) $\sum_k [1 - (k/n)]\, h\,\mathrm{cov}(\xi_d, \xi_{d+k}) = o(1)$, and
(c) $nh\,\mathrm{var}(\tilde t_n^*) \to \Sigma_1(y_1)$,

where

$$\Sigma_1(y_1) = \int \frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}\nabla G_m(y_1, z_2)^2\, v(y_1, z_2) & (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2)\\[2pt] (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2) & \nabla G_v(y_1, z_2)^2\, \kappa_4(y_1, z_2)\, v^2(y_1, z_2)\end{bmatrix} dz_2 \otimes \begin{bmatrix}\|(K\ast K)_0\|_2^2 & 0\\ 0 & 0\end{bmatrix}$$

$$\quad + \left\{\int \frac{p_2^2(z_2)}{p(y_1, z_2)}\, H_2(z_2) H_2^\top(z_2)\, dz_2\right\} \otimes \begin{bmatrix}\|K\|_2^2 & 0\\ 0 & \int K^2(u)\, u^2\, du\end{bmatrix}.$$
Proof of (a). Noting that $E(\xi_{1k}) = E(\xi_{2k}) = 0_{4\times 1}$ and $E(\xi_{1k}\xi_{2k}^\top) = 0_{4\times 4}$,

$$h\,\mathrm{var}(\xi_k) = h E(\xi_{1k}\xi_{1k}^\top) + h E(\xi_{2k}\xi_{2k}^\top)$$

by the stationarity assumption. Applying integration with substitution of variables and a Taylor expansion, the expectation terms are

$$h E(\xi_{1k}\xi_{1k}^\top) = \int \frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}\nabla G_m(y_1, z_2)^2\, v(y_1, z_2) & (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2)\\[2pt] (\nabla G_m\cdot\nabla G_v)(\kappa_3\cdot v^{3/2})(y_1, z_2) & \nabla G_v(y_1, z_2)^2\, \kappa_4(y_1, z_2)\, v^2(y_1, z_2)\end{bmatrix} dz_2 \otimes \begin{bmatrix}\|(K\ast K)_0\|_2^2 & 0\\ 0 & 0\end{bmatrix} + o(1)$$

and

$$h E(\xi_{2k}\xi_{2k}^\top) = \int \frac{p_2^2(z_2)}{p(y_1, z_2)}\begin{bmatrix}m_2^2(z_2) & m_2(z_2) v_2(z_2)\\[2pt] m_2(z_2) v_2(z_2) & v_2^2(z_2)\end{bmatrix} dz_2 \otimes \begin{bmatrix}\|K\|_2^2 & 0\\ 0 & \int K^2(u)\, u^2\, du\end{bmatrix} + o(1),$$

where $\kappa_3(y_1, z_2) = E[\varepsilon_t^3 \mid x_t = (y_1, z_2)]$ and $\kappa_4(y_1, z_2) = E[(\varepsilon_t^2 - 1)^2 \mid x_t = (y_1, z_2)]$.
Proof of (b). Because $E(\xi_{1k}\xi_{1j}^\top)|_{j\neq k} = E(\xi_{1k}\xi_{2j}^\top)|_{j\neq k} = 0$, we have $\mathrm{cov}(\xi_{d+1}, \xi_{d+1+k}) = \mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})$. By choosing $c(n)$ with $c(n)h \to 0$ as $n \to \infty$, we separate the covariance terms into two parts:

$$\sum_{k=1}^{c(n)}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) + \sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}).$$
To show the negligibility of the first part of the covariances, note that the dominated convergence theorem, applied after a Taylor expansion and integration with substitution of variables, gives

$$|\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \simeq \left|\int H_2(\underline{y}_d) H_2^\top(\underline{y}_{d+k})\, \frac{p(y_1, \underline{y}_d, y_1, \underline{y}_{d+k})}{p_{1|2}(y_1 \mid \underline{y}_d)\, p_{1|2}(y_1 \mid \underline{y}_{d+k})}\, d(\underline{y}_d, \underline{y}_{d+k})\right| \otimes \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}.$$

Therefore, it follows from the boundedness condition in Assumption A2 that

$$|\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le E|H_2(\underline{y}_d)|\, E|H_2^\top(\underline{y}_{d+k})| \int \frac{p(y_1, \underline{y}_d, y_1, \underline{y}_{d+k})}{p_{1|2}(y_1 \mid \underline{y}_d)\, p_{1|2}(y_1 \mid \underline{y}_{d+k})}\, d(\underline{y}_d, \underline{y}_{d+k}) \otimes \begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix} \equiv A^*,$$

where $A \le B$ means $a_{ij} \le b_{ij}$ for all elements of the matrices $A$ and $B$. By the construction of $c(n)$,

$$\sum_{k=1}^{c(n)}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le 2c(n)\,|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le 2c(n) h A^* \to 0,$$

as $n \to \infty$.
Next, we turn to the negligibility of the second part of the covariances,

$$\sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}).$$

Let $\xi_{2k}^i$ be the $i$th element of $\xi_{2k}$, for $i = 1, \ldots, 4$. Using Davydov's lemma (in Hall and Heyde, 1980, Theorem A.5), we obtain

$$|h\,\mathrm{cov}(\xi_{2,d+1}^i, \xi_{2,d+1+k}^j)| = |\mathrm{cov}(\sqrt h\,\xi_{2,d+1}^i, \sqrt h\,\xi_{2,d+1+k}^j)| \le 8[\alpha(k)^{1-2/v}]\Big[\max_{i=1,\ldots,4} E(\sqrt h\,|\xi_{2k}^i|)^v\Big]^{2/v}$$

for some $v > 2$. The boundedness of $E(\sqrt h\,|\xi_{2k}^1|)^v$, for example, is evident from the direct calculation that

$$E(|\sqrt h\,\xi_{2k}^1|^v) = h^{v/2}\Big(\frac{1}{h}\Big)^{v-1}\int \frac{p_2^v(z_2)}{p^{v-1}(y_1, z_2)}\,|m_2^v(z_2)|\, dz_2 \int K^v(u)\, du\,(1 + o(1)) = O\Big(\Big(\frac{1}{h}\Big)^{v/2-1}\Big).$$
Thus, the covariance is bounded by

$$|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le C\Big[\Big(\frac{1}{h}\Big)^{v/2-1}\Big]^{2/v}[\alpha(k)^{1-2/v}].$$
This implies

$$\sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le 2\sum_{k=c(n)+1}^{\infty}|h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k})| \le C'\Big[\frac{1}{h^{1-2/v}}\Big]\sum_{k=c(n)+1}^{\infty}[\alpha(k)^{1-2/v}] \le C'\sum_{k=c(n)+1}^{\infty} k^a[\alpha(k)^{1-2/v}],$$

if $a$ is such that

$$k^a \ge (c(n) + 1)^a \ge c(n)^a = \frac{1}{h^{1-2/v}},$$

for example, $c(n)^a h^{1-2/v} = 1$, which implies $c(n) \to \infty$. If we further restrict $a$ so that $a > 1 - 2/v$, say $a = 1 - 2/v + \delta$ with $\delta > 0$, then $c(n)^a h^{1-2/v} = 1$ implies $[c(n)h]^{1-2/v} = c(n)^{-\delta} \to 0$; thus $c(n)h \to 0$ as required. Therefore,

$$\sum_{k=c(n)+1}^{n}\Big[1 - \frac{k}{n}\Big] h\,\mathrm{cov}(\xi_{2,d+1}, \xi_{2,d+1+k}) \le C'\sum_{k=c(n)+1}^{\infty} k^a[\alpha(k)^{1-2/v}] \to 0,$$

as $n$ goes to $\infty$.

The proof of (c) is immediate from (a) and (b).
Next, we consider the asymptotic bias. Using the standard result on kernel-weighted sums of a stationary series, we first get

$$s_{0n} \stackrel{p}{\longrightarrow} \frac{h^2}{2}\,[D^2 w_1(y_1) \otimes (\mu_2^K, 0)^\top],$$

because

$$\frac{1}{n}\sum_{k=d+1}^{n} K_h^{\hat W}(y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^2\Big[D^2 w_1(y_1) \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top$$

$$\stackrel{p}{\longrightarrow} \int K_h^{\hat W}(z_1 - y_1)\Big(\frac{z_1 - y_1}{h}\Big)^2\Big[D^2 w_1(y_1) \otimes \Big(1, \frac{z_1 - y_1}{h}\Big)\Big]^\top p(z)\, dz$$

$$\simeq \int K_h(z_1 - y_1)\, p_2(z_2)\Big(\frac{z_1 - y_1}{h}\Big)^2\Big[D^2 w_1(y_1) \otimes \Big(1, \frac{z_1 - y_1}{h}\Big)\Big]^\top dz$$

$$= \int K_h(z_1 - y_1)\Big(\frac{z_1 - y_1}{h}\Big)^2\Big[D^2 w_1(y_1) \otimes \Big(1, \frac{z_1 - y_1}{h}\Big)\Big]^\top dz_1 = [D^2 w_1(y_1) \otimes (\mu_2^K, 0)^\top].$$
For the asymptotic bias of $s_{1n}$, we again use the approximation results in Jones et al. (1994). The first component of $s_{1n}$, for example, is

$$\frac{1}{n}\sum_k K_h(y_{k-1} - y_1)\,\frac{p_2(\underline{y}_{k-1})}{p(x_k)}\,\nabla G_m(m(x_k))\left\{\frac{1}{2}\,\frac{1}{n}\sum_l \frac{K_h(x_l - x_k)}{p(x_l)}\sum_{a=1}^{d}(y_{l-a} - y_{k-a})^2\,\frac{\partial^2 m(x_k)}{\partial y_{k-a}^2}\right\}$$

and converges to

$$\frac{h^2}{2}\int p_2(z_2)\,\nabla G_m(m(y_1, z_2))\,[\mu_2^{K\ast K} D^2 m_1(y_1) + \mu_2^K D^2 m_2(z_2)]\, dz_2,$$

based on the argument for the convolution kernel given previously. A convolution of symmetric kernels is symmetric, so that $\int (K\ast K)_0(u)\, u\, du = 0$ and $\int (K\ast K)_1(u)\, u^2\, du = \int\!\!\int w K(w) K(w + u)\, u^2\, dw\, du = 0$. This implies that

$$\tilde s_{1n} \stackrel{p}{\longrightarrow} \frac{h^2}{2}\int p_2(z_2)\,\{[\nabla G_m(m(y_1, z_2)), \nabla G_v(v(y_1, z_2))]^\top \odot [\mu_2^{K\ast K} D^2 w_1(y_1) + \mu_2^K D^2 w_2(z_2)]\} \otimes (1, 0)^\top dz_2,$$

where $\odot$ denotes the elementwise product.
To calculate $s_{2n}$, we use the Taylor series expansion of $\bar p_2(\underline{y}_{k-1})/\bar p(X_k)$:

$$\frac{\bar p_2(\underline{y}_{k-1})}{\bar p(X_k)} - \frac{p_2(\underline{y}_{k-1})}{p(X_k)} = \Big[\bar p_2(\underline{y}_{k-1}) - p_2(\underline{y}_{k-1})\,\frac{\bar p(X_k)}{p(X_k)}\Big]\frac{1}{\bar p(X_k)}$$

$$= \Big[\bar p_2(\underline{y}_{k-1}) - p_2(\underline{y}_{k-1})\,\frac{\bar p(X_k)}{p(X_k)}\Big]\frac{1}{p(X_k)}\Big[1 - \frac{\bar p(X_k) - p(X_k)}{p(X_k)} + \cdots\Big]$$

$$= \Big[\frac{\bar p_2(\underline{y}_{k-1})}{p(X_k)} - \frac{p_2(\underline{y}_{k-1})\,\bar p(X_k)}{p^2(X_k)}\Big](1 + o_p(1)).$$
Thus,

$$s_{2n} = \frac{1}{n}\sum_{k=d+1}^{n} K_h(y_{k-1} - y_1)\Big[\frac{\bar p_2(\underline{y}_{k-1})}{\bar p(X_k)} - \frac{p_2(\underline{y}_{k-1})}{p(X_k)}\Big]\Big[H_2(\underline{y}_{k-1}) \otimes \Big(1, \frac{y_{k-1} - y_1}{h}\Big)\Big]^\top$$

$$\stackrel{p}{\longrightarrow} \int K_h(z_1 - y_1)\Big[\frac{\bar p_2(z_2)}{p(z)} - \frac{p_2(z_2)\,\bar p(z)}{p^2(z)}\Big]\Big[H_2(z_2) \otimes \Big(1, \frac{z_1 - y_1}{h}\Big)\Big]^\top p(z)\, dz$$

$$\simeq \int K_h(z_1 - y_1)\Big[\bar p_2(z_2) - \frac{p_2(z_2)\,\bar p(z)}{p(z)}\Big]\Big[H_2(z_2) \otimes \Big(1, \frac{z_1 - y_1}{h}\Big)\Big]^\top dz$$

$$\simeq \frac{g^2}{2}\left[\int D^2 p_2(z_2) H_2(z_2)\, dz_2 - \int \frac{p_2(z_2)}{p(y_1, z_2)}\, D^2 p(y_1, z_2) H_2(z_2)\, dz_2\right] \otimes (\mu_2^K, 0)^\top.$$
Finally, for the probability limit of $[I_2 \otimes e_1^\top Q_n^{-1}]$, we note that

$$Q_n = D_h^{-1} Y_2^\top K Y_2 D_h^{-1} = [\hat q_{n,i+j-2}(y_1; h)]_{(i,j)=1,2}, \qquad \hat q_{ni} = \frac{1}{n}\sum_{k=d}^{n} K_h^{\hat W}(Y_{k-1} - y_1)\Big(\frac{y_{k-1} - y_1}{h}\Big)^i,$$

and

$$\hat q_{ni} \stackrel{p}{\longrightarrow} \int K_h(z_1 - y_1)\Big(\frac{z_1 - y_1}{h}\Big)^i p_2(z_2)\, dz = \int K(u_1)\, u_1^i\, du_1 \equiv q_i$$

for $i = 0, 1, 2$, where $q_0 = 1$, $q_1 = 0$, and $q_2 = \mu_2^K$. Thus

$$Q_n \stackrel{p}{\longrightarrow} \begin{bmatrix}1 & 0\\ 0 & \mu_2^K\end{bmatrix}, \qquad Q_n^{-1} \stackrel{p}{\longrightarrow} \frac{1}{\mu_2^K}\begin{bmatrix}\mu_2^K & 0\\ 0 & 1\end{bmatrix}, \qquad e_1^\top Q_n^{-1} \stackrel{p}{\longrightarrow} e_1^\top.$$

Therefore,
$$B_{1n}(y_1) = [I_2 \otimes e_1^\top Q_n^{-1}](s_{0n} + s_{1n} + s_{2n})$$

$$= \frac{h^2}{2}\,\mu_2^K D^2 w_1(y_1) + \frac{h^2}{2}\int [\mu_2^{K\ast K} D^2 w_1(y_1) + \mu_2^K D^2 w_2(z_2)] \odot [\nabla G_m(m(y_1, z_2)), \nabla G_v(v(y_1, z_2))]^\top p_2(z_2)\, dz_2$$

$$\quad + \frac{g^2}{2}\,\mu_2^K\int D^2 p_2(z_2) H_2(z_2)\, dz_2 - \frac{g^2}{2}\,\mu_2^K\int \frac{p_2(z_2)}{p(y_1, z_2)}\, D^2 p(y_1, z_2) H_2(z_2)\, dz_2 + o_p(h^2) + o_p(g^2).$$
Step III. Asymptotic Normality of $\tilde t_n^*$. Applying the Cramér–Wold device, it is sufficient to show

$$\Delta_n \equiv \frac{1}{\sqrt n}\sum_k \sqrt h\,\bar\xi_k \stackrel{d}{\longrightarrow} N(0, b^\top \Sigma_1 b)$$

for all $b \in \mathbb{R}^4$, where $\bar\xi_k = b^\top \xi_k$. We use the small block–large block argument (see Masry and Tjøstheim, 1997). Partition the set $\{d, d+1, \ldots, n\}$ into $2k + 1$ subsets with large blocks of size $r = r_n$ and small blocks of size $s = s_n$, where

$$k = \Big[\frac{n}{r_n + s_n}\Big]$$

and $[x]$ denotes the integer part of $x$. Define

$$\eta_j = \sum_{t=j(r+s)}^{j(r+s)+r-1}\sqrt h\,\bar\xi_t, \qquad \nu_j = \sum_{t=j(r+s)+r}^{(j+1)(r+s)-1}\sqrt h\,\bar\xi_t, \qquad 0 \le j \le k-1, \qquad \zeta_k = \sum_{t=k(r+s)}^{n}\sqrt h\,\bar\xi_t.$$

Then

$$\Delta_n = \frac{1}{\sqrt n}\Big(\sum_{j=0}^{k-1}\eta_j + \sum_{j=0}^{k-1}\nu_j + \zeta_k\Big) \equiv \frac{1}{\sqrt n}(S_n' + S_n'' + S_n''').$$

Because of Assumption A6, there exists a sequence $a_n \to \infty$ such that

$$a_n s_n = o(\sqrt{nh}) \quad\text{and}\quad a_n\sqrt{n/h}\,\alpha(s_n) \to 0, \quad\text{as } n \to \infty, \tag{A.2}$$
defining the large block size as
$$
r_n = \left[\frac{\sqrt{nh}}{a_n}\right]. \tag{A.3}
$$
It is easy to show from (A.2) and (A.3) that, as $n \to \infty$,
$$
\frac{s_n}{r_n} \to 0, \qquad \frac{r_n}{n} \to 0, \qquad \frac{r_n}{\sqrt{nh}} \to 0, \tag{A.4}
$$
and $(n/r_n)\,\alpha(s_n) \to 0$.
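The rate conditions in (A.4) can be checked numerically. The specific choices $h = n^{-1/5}$, $a_n = \log n$, and $s_n = \log n$ below are our own illustrations; the proof only requires sequences satisfying (A.2) and (A.3):

```python
import math

# Numerical illustration of the rate conditions in (A.4).  The bandwidth
# h = n^(-1/5) and tuning sequences a_n = s_n = log n are illustrative
# choices, not requirements of the proof.
def rates(n):
    h = n ** (-0.2)
    a_n = math.log(n)
    r_n = math.sqrt(n * h) / a_n          # large block size from (A.3)
    s_n = math.log(n)                     # a valid small block size
    return s_n / r_n, r_n / n, r_n / math.sqrt(n * h)

small_n = rates(10**4)
large_n = rates(10**8)
# every ratio in (A.4) shrinks as n grows
print(all(b < a for a, b in zip(small_n, large_n)))
```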
We first show that $S_n''$ and $S_n'''$ are asymptotically negligible. The same argument used in Step II yields
$$
\operatorname{var}(\nu_j) = s\operatorname{var}(\sqrt h\,\breve\xi_t)
+ 2s\sum_{k=1}^{s-1}\Bigl(1-\frac{k}{s}\Bigr)\operatorname{cov}(\sqrt h\,\breve\xi_{d+1}, \sqrt h\,\breve\xi_{d+1+k})
= s\,b^\top\Sigma_1 b\,(1+o(1)), \tag{A.5}
$$
which implies
$$
\sum_{j=0}^{k-1}\operatorname{var}(\nu_j) = O(ks) \sim \frac{n s_n}{r_n+s_n} \sim \frac{n s_n}{r_n} = o(n),
$$
from condition (A.4). Next, consider
$$
\sum_{i,j=0,\,i\ne j}^{k-1}\operatorname{cov}(\nu_i,\nu_j)
= \sum_{i,j=0,\,i\ne j}^{k-1}\sum_{k_1=1}^{s}\sum_{k_2=1}^{s}
\operatorname{cov}(\sqrt h\,\breve\xi_{N_i+k_1}, \sqrt h\,\breve\xi_{N_j+k_2}),
$$
where $N_j = j(r+s)+r$. Because $|N_i - N_j + k_1 - k_2| \ge r$ for $i \ne j$, the covariance term is bounded by
$$
2\sum_{k_1=1}^{n-r}\sum_{k_2=k_1+r}^{n}|\operatorname{cov}(\sqrt h\,\breve\xi_{k_1}, \sqrt h\,\breve\xi_{k_2})|
\le 2n\sum_{j=r+1}^{n}|\operatorname{cov}(\sqrt h\,\breve\xi_{d+1}, \sqrt h\,\breve\xi_{d+1+j})| = o(n).
$$
The last equality also follows from Step II. Hence, $(1/n)E\{(S_n'')^2\} \to 0$ as $n \to \infty$.
Repeating a similar argument for $S_n'''$, we get
$$
\frac{1}{n}E\{(S_n''')^2\}
\le \frac{1}{n}[n-k(r+s)]\operatorname{var}(\sqrt h\,\breve\xi_{d+1})
+ 2\,\frac{n-k(r+s)}{n}\sum_{j=1}^{n-k(r+s)}\operatorname{cov}(\sqrt h\,\breve\xi_{d+1}, \sqrt h\,\breve\xi_{d+1+j})
\le \frac{r_n+s_n}{n}\,b^\top\Sigma_1 b + o(1) \to 0,
$$
as $n \to \infty$.
Now it remains to show that $(1/\sqrt n)S_n' = (1/\sqrt n)\sum_{j=0}^{k-1}\eta_j \xrightarrow{d} N(0, b^\top\Sigma_1 b)$. Because $\eta_j$ is a function of $\{\breve\xi_t\}_{t=j(r+s)}^{j(r+s)+r-1}$, the lemma of Volkonskii and Rozanov (1959) stated in the appendix of Masry and Tjøstheim (1997) implies that, with $\tilde s_n = s_n - d + 1$,
$$
\Bigl|E\exp\Bigl(it\frac{1}{\sqrt n}\sum_{j=0}^{k-1}\eta_j\Bigr)
- \prod_{j=0}^{k-1}E\exp\Bigl(it\frac{\eta_j}{\sqrt n}\Bigr)\Bigr|
\le 16k\,\alpha(\tilde s_n)
\sim \frac{n}{r_n+s_n}\,\alpha(\tilde s_n)
\sim \frac{n}{r_n}\,\alpha(\tilde s_n) = o(1),
$$
where the last two equalities follow from (A.4). Thus the summands $\{\eta_j\}$ in $S_n'$ are asymptotically independent. Because an operation similar to (A.5) yields
$$
\operatorname{var}(\eta_j) = r_n\,b^\top\Sigma_1 b\,(1+o(1)),
$$
we have
$$
\operatorname{var}\Bigl(\frac{1}{\sqrt n}S_n'\Bigr)
= \frac{1}{n}\sum_{j=0}^{k-1}E(\eta_j^2)
= \frac{k r_n}{n}\,b^\top\Sigma_1 b\,(1+o(1)) \to b^\top\Sigma_1 b.
$$
Finally, because of the boundedness of the density and kernel functions, the Lindeberg–Feller condition for the asymptotic normality of $S_n'$ holds:
$$
\frac{1}{n}\sum_{j=0}^{k-1}E\bigl[\eta_j^2\,1\{|\eta_j| > \sqrt n\,\delta\sqrt{b^\top\Sigma_1 b}\}\bigr] \to 0
$$
for every $\delta > 0$. This completes the proof of Step III.
From $e_1^\top Q_n^{-1} \xrightarrow{p} e_1^\top$, the Slutsky theorem implies $\sqrt{nh}\,[I_2 \otimes e_1^\top Q_n^{-1}]\tilde t_n^* \xrightarrow{d} N(0, \Sigma_1^*)$, where $\Sigma_1^* = [I_2 \otimes e_1^\top]\Sigma_1[I_2 \otimes e_1^\top]^\top$. In sum, $\sqrt{nh}\,(\hat w_1(y_1) - w_1(y_1) - B_n) \xrightarrow{d} N(0, \Sigma_1^*)$, with $\Sigma_1^*(y_1)$ given by
$$
\|K*K\|_2^2 \int \frac{p_2^2(z_2)}{p(y_1,z_2)}
\begin{bmatrix}
\nabla G_m(y_1,z_2)^2\, v(y_1,z_2) & (\nabla G_m\!\cdot\!\nabla G_v)(\kappa_3\!\cdot\! v^{3/2})(y_1,z_2)\\
(\nabla G_m\!\cdot\!\nabla G_v)(\kappa_3\!\cdot\! v^{3/2})(y_1,z_2) & \nabla G_v(y_1,z_2)^2\,\kappa_4(y_1,z_2)v^2(y_1,z_2)
\end{bmatrix} dz_2
+ \|K\|_2^2 \int \frac{p_2^2(z_2)}{p(y_1,z_2)}\,H_2(z_2)H_2^\top(z_2)\,dz_2. \qquad\blacksquare
$$
LEMMA A.1. Assume the conditions in Assumptions A1 and A4–A6. For a bounded function $F(\cdot)$, it holds that

(a) $r_{1n} = (\sqrt h/\sqrt n)\sum_{k=d}^n K_h(y_{k-1}-y_1)\bigl(\hat p_2(\tilde y_{k-2}) - \tilde p_2(\tilde y_{k-2})\bigr)F(x_k) = o_p(1)$,

(b) $r_{2n} = (\sqrt h/\sqrt n)\sum_{k=d}^n K_h(y_{k-1}-y_1)\bigl(\hat p(x_k) - p(x_k)\bigr)F(x_k) = o_p(1)$.
Proof. The proof of (b) is almost the same as that of (a), so we only show (a). By adding and subtracting $\bar L_{l|k}(y_{l-2}|y_{k-2})$, the conditional expectation of $L_g(\tilde y_{l-2}-\tilde y_{k-2})$ given $\tilde y_{k-2}$ in $r_{1n}$, we get $r_{1n} = \xi_{1n} + \xi_{2n}$, where
$$
\xi_{1n} = \frac{1}{n^2}\sum_{k=d}^n\sum_{l=d}^n K_h(y_{k-1}-y_1)F(x_k)\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(y_{l-2}|y_{k-2})\bigr],
$$
$$
\xi_{2n} = \frac{1}{n^2}\sum_k\sum_l K_h(y_{k-1}-y_1)F(x_k)\bigl[\bar L_{l|k}(y_{l-2}|y_{k-2}) - \tilde p_2(\tilde y_{k-2})\bigr].
$$
Rewrite $\xi_{2n}$ as
$$
\frac{1}{n^2}\sum_k\sum_{s<k^*(n)} K_h(y_{k-1}-y_1)F(x_k)\bigl[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2}) - \tilde p_2(\tilde y_{k-2})\bigr]
+ \frac{1}{n^2}\sum_k\sum_{s\ge k^*(n)} K_h(y_{k-1}-y_1)F(x_k)\bigl[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2}) - \tilde p_2(\tilde y_{k-2})\bigr],
$$
where $k^*(n)$ increases to infinity as $n \to \infty$. Let
$$
B = E\bigl\{K_h(y_{k-1}-y_1)F(x_k)\bigl[\bar L_{k+s|k}(y_{k+s-2}|y_{k-2}) - \tilde p_2(\tilde y_{k-2})\bigr]\bigr\},
$$
which exists as a result of the boundedness of $F(x_k)$. Then, for large $n$, the first part of $\xi_{2n}$ is asymptotically equivalent to $(1/n)k^*(n)B$. The second part of $\xi_{2n}$ is bounded by
$$
\sup_{s\ge k^*(n)}\bigl|p_{k+s|k}(y_{k+s-2}|y_{k-2}) - p(y_{k-2})\bigr|\,
\frac{1}{n}\sum_k K_h(y_{k-1}-y_1)|F(x_k)| \le \rho^{k^*(n)}O_p(1).
$$
Therefore, $\sqrt{nh}\,\xi_{2n} \le O_p\bigl((\sqrt h/\sqrt n)\,k^*(n)\bigr) + O_p\bigl(\rho^{2k^*(n)}\sqrt{nh}\bigr) = o_p(1)$, for $k^*(n) = \log n$, for example.
It remains to show that $\xi_{1n} = o_p(1/\sqrt{nh})$. Because $E(\xi_{1n}) = 0$ by the law of iterated expectations, we just compute
$$
E(\xi_{1n}^2) = \frac{1}{n^4}\sum\sum_{k\ne l}\sum\sum_{i\ne j}
E\bigl\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_l)
\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(\tilde y_{k-2})\bigr]
\bigl[L_g(\tilde y_{j-2}-\tilde y_{i-2}) - \bar L_{j|i}(\tilde y_{i-2})\bigr]\bigr\}.
$$
(1) Consider the case $k = i$ and $l \ne j$:
$$
\frac{1}{n^4}\sum_k\sum\sum_{l\ne j} E\bigl\{K_h^2(y_{k-1}-y)F^2(x_k)
\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(\tilde y_{k-2})\bigr]
\bigl[L_g(\tilde y_{j-2}-\tilde y_{k-2}) - \bar L_{j|k}(\tilde y_{k-2})\bigr]\bigr\} = 0,
$$
because, by the law of iterated expectations and the definition of $\bar L_{j|k}(\tilde y_{k-2})$,
$$
E_{|k,l}\bigl[L_g(\tilde y_{j-2}-\tilde y_{k-2}) - \bar L_{j|k}(\tilde y_{k-2})\bigr]
= E_{|k}\bigl[L_g(\tilde y_{j-2}-\tilde y_{k-2})\bigr] - \bar L_{j|k}(\tilde y_{k-2}) = 0.
$$
(2) Consider the case $l = j$ and $k \ne i$:
$$
\frac{1}{n^4}\sum\sum_{k\ne i}\sum_l E\bigl\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_i)
\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(\tilde y_{k-2})\bigr]
\bigl[L_g(\tilde y_{l-2}-\tilde y_{i-2}) - \bar L_{l|i}(\tilde y_{i-2})\bigr]\bigr\}.
$$
We only calculate
$$
\frac{1}{n^4}\sum\sum_{k\ne i}\sum_l E\bigl\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)
L_g(\tilde y_{l-2}-\tilde y_{k-2})L_g(\tilde y_{l-2}-\tilde y_{i-2})F(x_k)F(x_i)\bigr\} \tag{A.6}
$$
because the rest of the triple sum consists of expectations of standard kernel estimates and is $O(1/n)$. Note that
$$
E_{|(i,k)}\,L_g(\tilde y_{l-2}-\tilde y_{k-2})L_g(\tilde y_{l-2}-\tilde y_{i-2})
\approx (L*L)_g(\tilde y_{k-2}-\tilde y_{i-2})\,p_{l|(k,i)}(\tilde y_{k-2}|\tilde y_{k-2},\tilde y_{i-2}),
$$
where $(L*L)_g(\cdot) = (1/g)\int L(u)L(u+\cdot/g)\,du$ is a convolution kernel. Thus, (A.6) is
$$
\frac{1}{n^4}\sum\sum_{k\ne i}\sum_l E\bigl[K_h(y_{k-1}-y)K_h(y_{i-1}-y)
(L*L)_g(\tilde y_{k-2}-\tilde y_{i-2})F(x_k)F(x_i)\,p_{l|(k,i)}(\tilde y_{k-2}|\tilde y_{k-2},\tilde y_{i-2})\bigr]
= O\Bigl(\frac{1}{n}\Bigr).
$$
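Since $(L*L)_g$ is obtained by convolving a density-like kernel with itself, it integrates to one. A numerical check, with the Epanechnikov kernel as our own illustrative choice of $L$:

```python
import numpy as np

# Check that the convolution kernel (L*L)_g(x) = (1/g) int L(u) L(u + x/g) du
# integrates to one, as a kernel should.  L is the Epanechnikov kernel,
# chosen only for illustration.
def L(u):
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

g = 0.3
u = np.linspace(-1.0, 1.0, 4001)
du = u[1] - u[0]

def LL_g(x):
    # (1/g) * Riemann sum over u of L(u) L(u + x/g)
    return (L(u) * L(u + x / g)).sum() * du / g

x = np.linspace(-2.0 * g, 2.0 * g, 2001)   # support of (L*L)_g is [-2g, 2g]
dx = x[1] - x[0]
total = sum(LL_g(xi) for xi in x) * dx
print(total)
```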
(3) Consider the case $i = k$, $j = l$:
$$
\frac{1}{n^4}\sum\sum_{k\ne l} E\bigl\{K_h^2(y_{k-1}-y)F^2(x_k)
\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(\tilde y_{k-2})\bigr]^2\bigr\}
= O\Bigl(\frac{1}{n^2 hg}\Bigr) = o\Bigl(\frac{1}{nh}\Bigr).
$$
(4) Consider the case $k \ne i$, $l \ne j$:
$$
\frac{1}{n^4}\sum\sum_{k\ne l}\sum\sum_{i\ne j} E\bigl\{K_h(y_{k-1}-y)K_h(y_{i-1}-y)F(x_k)F(x_i)
\bigl[L_g(\tilde y_{l-2}-\tilde y_{k-2}) - \bar L_{l|k}(\tilde y_{k-2})\bigr]
\bigl[L_g(\tilde y_{j-2}-\tilde y_{i-2}) - \bar L_{j|i}(\tilde y_{i-2})\bigr]\bigr\} = 0,
$$
for the same reason as in (1). $\blacksquare$
A.2. Proofs for Section 5. Recall that $x_t = (y_{t-1},\dots,y_{t-d}) = (y_{t-\alpha},\tilde y_{t-\alpha})$ and $z_t = (x_t, y_t)$. In a similar fashion, let $x = (y_1,\dots,y_d) = (y_\alpha,\tilde y_\alpha)$ and $z = (x, y_0)$. For the score function $s^*(z,\theta,g_\alpha) = s^*(z,\theta,g_\alpha(\tilde y_\alpha))$, we define its first derivative with respect to the parameter $\theta$ by
$$
\nabla_\theta s^*(z,\theta,g_\alpha) = \frac{\partial s^*(z,\theta,g_\alpha)}{\partial\theta}
$$
and use $\bar s^*(\theta,g_\alpha)$ and $\bar\nabla_\theta s^*(\theta,g_\alpha)$ to denote $E[s^*(z_t,\theta,g_\alpha)]$ and $E[\nabla_\theta s^*(z_t,\theta,g_\alpha)]$, respectively. Also, the score function $s^*(z,\theta,\cdot)$ is said to be Fréchet differentiable (with respect to the sup norm $\|\cdot\|_\infty$) if there is $S^*(z,\theta,g_\alpha)$ such that, for all $g_\alpha$ with $\|g_\alpha - g_\alpha^0\|_\infty$ small enough,
$$
\|s^*(z,\theta,g_\alpha) - s^*(z,\theta,g_\alpha^0) - S^*(z,\theta,g_\alpha^0(\tilde y_\alpha))(g_\alpha - g_\alpha^0)\|
\le b(z)\,\|g_\alpha - g_\alpha^0\|^2, \tag{A.7}
$$
for some bounded function $b(\cdot)$. The term $S^*(z,\theta,g_\alpha^0)$ is called the functional derivative of $s^*(z,\theta,g_\alpha)$ with respect to $g_\alpha$. In a similar way, we define $\nabla_g S^*(z,\theta,g_\alpha)$ to be the functional derivative of $S^*(z,\theta,g_\alpha)$ with respect to $g_\alpha$.
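A toy example may help fix ideas: for the hypothetical functional $r(g) = g^2$ (our own illustration, not the paper's score), the functional derivative at $g^0$ is the linear map $t \mapsto 2g^0 t$, and the remainder in the bound (A.7) is exactly $(g-g^0)^2$, so the quadratic bound holds with $b(z) \equiv 1$:

```python
import numpy as np

# Toy illustration of the Frechet-differentiability bound (A.7) for the
# hypothetical functional r(g) = g^2, whose derivative at g0 is the linear
# map t -> 2*g0*t.  The remainder equals (g - g0)^2 pointwise, so the
# quadratic sup-norm bound holds with constant 1.
x = np.linspace(0.0, 1.0, 1001)
g0 = np.sin(x)
g = g0 + 0.05 * np.cos(3 * x)          # a nearby function

remainder = g**2 - g0**2 - 2 * g0 * (g - g0)
sup_rem = np.max(np.abs(remainder))
sup_diff = np.max(np.abs(g - g0))
print(sup_rem <= sup_diff**2 + 1e-12)
```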
Assumption B. Suppose that (i) $\bar\nabla_\theta s^*(\theta_0)$ is nonsingular; (ii) $S^*(z,\theta,g_\alpha(\tilde y_\alpha))$ and $\nabla_g S^*(z,\theta,g_\alpha(\tilde y_\alpha))$ exist and have square integrable envelopes $\bar S^*(\cdot)$ and $\bar\nabla_g S^*(\cdot)$ satisfying
$$
\|S^*(z,\theta,g_\alpha(\tilde y_\alpha))\| \le \bar S^*(z),
\qquad
\|\nabla_g S^*(z,\theta,g_\alpha(\tilde y_\alpha))\| \le \bar\nabla_g S^*(z);
$$
and (iii) both $s^*(z,\theta,g_\alpha)$ and $S^*(z,\theta,g_\alpha)$ are continuously differentiable in $\theta$, with derivatives bounded by square integrable envelopes.
Note that the first condition is related to the identification of the component functions, whereas the second concerns Fréchet differentiability (up to second order) of the score function and uniform boundedness of the functional derivatives. For the main results in Section 5, we need the following conditions; some of them are stronger than their counterparts in Assumption A in Section 4. Let $h_0$ and $h$ denote the bandwidth parameters used for the preliminary instrumental variable estimate and the two-step estimate, respectively, and let $g$ denote the bandwidth parameter for the kernel density estimate.
Assumption C.

1. $\{y_t\}_{t=1}^\infty$ is stationary and strongly mixing with mixing coefficient $\alpha(k) = \rho^{-\beta k}$, for some $\beta > 0$, and $E(\varepsilon_t^4|x_t) < \infty$, where $\varepsilon_t = y_t - E(y_t|x_t)$.
2. The joint density function $p(\cdot)$ is bounded away from zero and $q$-times continuously differentiable on the compact support $\mathcal X = \mathcal X_\alpha \times \mathcal X_{\tilde\alpha}$, with Lipschitz continuous remainders; that is, there exists $C < \infty$ such that, for all $x, x' \in \mathcal X$, $|D_x^\mu p(x) - D_x^\mu p(x')| \le C\|x - x'\|$, for all vectors $\mu = (\mu_1,\dots,\mu_d)$ with $\sum_{i=1}^d \mu_i \le q$.
3. The component functions $m_\alpha(\cdot)$ and $v_\alpha(\cdot)$, for $\alpha = 1,\dots,d$, are $q$-times continuously differentiable on $\mathcal X_\alpha$ with Lipschitz continuous $q$th derivatives.
4. The link functions $G_m$ and $G_v$ are $q$-times continuously differentiable over any compact interval of the real line.
5. The kernel functions $K(\cdot)$ and $L(\cdot)$ have bounded support, are symmetric about zero, satisfy $\int K(u)\,du = \int L(u)\,du = 1$, and are of order $q$; that is, $\int u^i K(u)\,du = \int u^i L(u)\,du = 0$ for $i = 1,\dots,q-1$. Also, the kernel functions are $q$-times differentiable with Lipschitz continuous $q$th derivatives.
6. The true parameters $\theta_0 = (m_\alpha(y_\alpha), v_\alpha(y_\alpha), m_\alpha'(y_\alpha), v_\alpha'(y_\alpha))$ lie in the interior of the compact parameter space $\Theta$.
7. (i) $g \to 0$, $ng^d \to \infty$; and (ii) $h_0 \to 0$, $nh_0 \to \infty$.
8. (i) $nh_0^2/[(\log n)^2 h] \to \infty$ and $\sqrt{nh}\,h_0^q \to 0$; and, for some integer $\nu > d/2$, (ii) $nh_0^{2\nu+1}h/\log n \to \infty$ and $h_0^{q-\nu}h^{-\nu-1/2} \to 0$; (iii) $nh_0^{d+4\nu+1}/\log n \to \infty$ and $q \ge 2\nu + 1$.
Some facts about empirical processes will be useful in the discussion that follows. Define the $L_2$-Sobolev norm (of order $q$) on the class of real-valued functions with domain $W_0$:
$$
\|\tau\|_{q,2,W_0} = \Bigl(\sum_{|\mu|\le q}\int_{W_0}(D_x^\mu\tau(x))^2\,dx\Bigr)^{1/2},
$$
where, for $x \in W_0 \subset \mathbb R^k$ and a $k$-vector $\mu = (\mu_1,\dots,\mu_k)$ of nonnegative integers,
$$
D^\mu\tau(x) = \frac{\partial^{\sum_{i=1}^k\mu_i}\tau(x)}{\partial^{\mu_1}x_1\cdots\partial^{\mu_k}x_k}
$$
and $q \ge 1$ is some positive integer. Let $\mathcal X_\alpha$ be an open set in $\mathbb R^1$ with minimally smooth boundary as defined by, for example, Stein (1970), and let $\mathcal X = \times_{b=1}^d\mathcal X_b$, with $\mathcal X_{\tilde\alpha} = \times_{b=1(\ne\alpha)}^d\mathcal X_b$. Define $\mathcal T_1$ as a class of smooth functions on $\mathcal X_{\tilde\alpha}$ whose $L_2$-Sobolev norm is bounded by some constant: $\mathcal T_1 = \{\tau : \|\tau\|_{q,2,\mathcal X_{\tilde\alpha}} \le C\}$. In a similar way, $\mathcal T_2 = \{\tau : \|\tau\|_{q,2,\mathcal X} \le C\}$.
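A grid approximation of the $L_2$-Sobolev norm of order 1, for a concrete function of our own choosing, illustrates the definition:

```python
import numpy as np

# Grid approximation of the order-1 L2-Sobolev norm
#   ||t||_{1,2,W0} = ( int t^2 + int (t')^2 )^(1/2)
# for t(x) = sin(x) on W0 = [0, pi].  The exact value is
# ( pi/2 + pi/2 )^(1/2) = sqrt(pi), since both integrals equal pi/2.
x = np.linspace(0.0, np.pi, 100001)
dx = x[1] - x[0]
t = np.sin(x)
dt = np.gradient(t, dx)                 # numerical first derivative

norm = np.sqrt((t**2).sum() * dx + (dt**2).sum() * dx)
print(norm)                             # ~ sqrt(pi)
```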
Define (i) an empirical process $\nu_{1n}(\cdot)$ indexed by $\tau_1 \in \mathcal T_1$:
$$
\nu_{1n}(\tau_1) = \frac{1}{\sqrt n}\sum_{t=1}^n\bigl[f_1(x_t;\tau_1) - Ef_1(x_t;\tau_1)\bigr], \tag{A.8}
$$
with pseudometric $\rho_1(\cdot,\cdot)$ on $\mathcal T_1$:
$$
\rho_1(\tau,\tau') = \Bigl[\int_{\mathcal X}\bigl(f_1(w;\tau(\tilde w_\alpha)) - f_1(w;\tau'(\tilde w_\alpha))\bigr)^2 p(w)\,dw\Bigr]^{1/2},
$$
where $f_1(w;\tau) = h^{-1/2}K((w_\alpha-y_\alpha)/h)\,\bar s^*(w,g_\alpha^0)\,\tau_1(\tilde w_\alpha)$; and (ii) an empirical process $\nu_{2n}(\cdot,\cdot)$ indexed by $(y_\alpha,\tau_2) \in \mathcal X_\alpha \times \mathcal T_2$:
$$
\nu_{2n}(y_\alpha,\tau_2) = \frac{1}{\sqrt n}\sum_{t=1}^n\bigl[f_2(x_t;y_\alpha,\tau_2) - Ef_2(x_t;y_\alpha,\tau_2)\bigr], \tag{A.9}
$$
with pseudometric $\rho_2(\cdot,\cdot)$ on $\mathcal T_2$:
$$
\rho_2((y_\alpha,\tau_2),(y_\alpha',\tau_2')) = \Bigl[\int_{\mathcal X}\bigl(f_2(w;y_\alpha,\tau_2) - f_2(w;y_\alpha',\tau_2')\bigr)^2 p(w)\,dw\Bigr]^{1/2},
$$
where $f_2(w;y_\alpha,\tau_2) = h_0^{-1/2}K[(w_\alpha-y_\alpha)/h_0]\,[p_{\tilde\alpha}(\tilde w_\alpha)/p(w)]\,G_m'(m(w))\,\tau_2(w)$.
We say that the processes $\{\nu_{1n}(\cdot)\}$ and $\{\nu_{2n}(\cdot,\cdot)\}$ are stochastically equicontinuous at $\tau_1^0$ and $(y_\alpha^0,\tau_2^0)$, respectively (with respect to the pseudometrics $\rho_1(\cdot,\cdot)$ and $\rho_2(\cdot,\cdot)$), if for all $\varepsilon, \eta > 0$ there exists $\delta > 0$ such that
$$
\lim_{n\to\infty}P^*\Bigl[\sup_{\rho_1(\tau,\tau^0)<\delta}|\nu_{1n}(\tau_1) - \nu_{1n}(\tau_1^0)| > \eta\Bigr] < \varepsilon \tag{A.10}
$$
and
$$
\lim_{n\to\infty}P^*\Bigl[\sup_{\rho_2((y_\alpha,\tau_2),(y_\alpha^0,\tau_2^0))<\delta}|\nu_{2n}(y_\alpha,\tau_2) - \nu_{2n}(y_\alpha^0,\tau_2^0)| > \eta\Bigr] < \varepsilon, \tag{A.11}
$$
respectively, where $P^*$ denotes the outer measure of the corresponding probability measure.

Let $\mathcal F_1$ be the class of functions $f_1(\cdot)$ defined previously. Note that (A.10) follows if Pollard's entropy condition is satisfied by $\mathcal F_1$ with some square integrable envelope $\bar F_1$; see Pollard (1990) for details. Because $f_1(w;\tau_1) = c_1(w)\tau_1(\tilde w_\alpha)$ is the product of smooth functions $\tau_1$ from an infinite-dimensional class (with uniformly bounded partial derivatives up to order $q$) and a single unbounded function $c_1(w) = h^{-1/2}K((w_\alpha-y_\alpha)/h)\bar s^*(w,g_\alpha^0)$, the entropy condition is verified by Theorem 2 in Andrews (1994) on a class of functions of type III. Square integrability of the envelope $\bar F_1$ comes from Assumption B(ii). In a similar way, we can show (A.11) by applying the "mix and match" argument of Theorem 3 in Andrews (1994) to $f_2(w;y_\alpha,\tau_2) = c_2(w)h_0^{-1/2}K((w_\alpha-y_\alpha)/h_0)\tau_2(w)$, where $K(\cdot)$ is Lipschitz continuous in $y_\alpha$, that is, a function of type II.
Proof of Theorem 4. We only give a sketch, because the whole proof is lengthy and relies on arguments similar to those of Andrews (1994) or Gozalo and Linton (2000) for the i.i.d. case. Expanding the first-order condition in (5.16) and solving for $(\hat\theta^* - \theta_0)$ yields
$$
\hat\theta^* - \theta_0 = -\Bigl[\frac{1}{n}\sum_{t=d+1}^{n'}\nabla_\theta s(z_t,\bar\theta,\tilde g_\alpha)\Bigr]^{-1}
\frac{1}{n}\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha),
$$
where $\bar\theta$ is the mean value between $\hat\theta$ and $\theta_0$ and $s(z_t,\tilde g_\alpha) = s(z_t,\theta_0,\tilde g_\alpha)$. By the uniform law of large numbers in Gozalo and Linton (1995), we have $\sup_{\theta\in\Theta}|Q_n(\theta) - E(Q_n(\theta))| \xrightarrow{p} 0$, which, together with (i) uniform convergence of $\tilde g_\alpha$ by Lemma A.3 and (ii) uniform continuity of the localized likelihood function $Q_n(\theta,g_\alpha)$ over $\Theta\times\mathcal G_\alpha$, yields $\sup_{\theta\in\Theta}|\tilde Q_n(\theta) - E(Q_n(\theta))| \xrightarrow{p} 0$ and thus consistency of $\hat\theta^*$. Based on the ergodic theorem for stationary time series and an argument similar to that of Theorem 1 in Andrews (1994), consistency of $\hat\theta^*$ and uniform convergence of $\tilde g_\alpha$ imply
$$
\frac{1}{n}\sum_{t=d+1}^{n'}\nabla_\theta s(z_t,\bar\theta,\tilde g_\alpha)
\xrightarrow{p} E[\nabla_\theta s(z_t,\theta_0,g_\alpha^0)] \equiv D_\alpha(y_\alpha). \tag{A.12}
$$
For the numerator, we first linearize the score function. Under Assumption B(ii), $s^*(z,\theta,g_\alpha)$ is Fréchet differentiable and (A.7) holds, which, because $\sqrt{nh}\,\|\tilde g_\alpha - g_\alpha^0\|_\infty^2 \xrightarrow{p} 0$ (by Lemma A.3 and Assumption C.8(i)), yields a proper linearization of the score term:
$$
\frac{1}{n}\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha)
= \frac{1}{n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)s^*(z_t,g_\alpha^0)
+ \frac{1}{n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))
\bigl[\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})\bigr]
+ o_p(1/\sqrt{nh}),
$$
where $S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha})) = S^*(z_t,\theta_0,g_\alpha^0(\tilde y_{t-\alpha}))$. Equivalently, letting
$$
\bar s^*(y,g_\alpha^0(\tilde y_\alpha)) = E\bigl[S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha}))\,\big|\,x_t = y\bigr]
$$
and $u_t = S^*(z_t,g_\alpha^0(\tilde y_{t-\alpha})) - \bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))$, we have
$$
\frac{\sqrt{nh}}{n}\sum_{t=d+1}^{n'}s(z_t,\tilde g_\alpha)
= \frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)s^*(z_t,g_\alpha^0)
+ \frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))
\bigl[\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})\bigr]
$$
$$
+ \frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)u_t
\bigl[\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})\bigr] + o_p(1)
\equiv T_{1n} + T_{2n} + T_{3n} + o_p(1).
$$
Note that the asymptotic expansion of the infeasible estimator is exactly the first term of the linearized score premultiplied by the inverse Hessian matrix in (A.12). Because of the asymptotic boundedness of (A.12), it therefore suffices to show that the second and third terms are negligible.
To calculate the asymptotic order of $T_{2n}$, we use the preceding stochastic equicontinuity results. For a real-valued function $\delta(\cdot)$ on $\mathcal X_{\tilde\alpha}$ and $\mathcal T = \{\delta : \|\delta\|_{\nu,2,\mathcal X_{\tilde\alpha}} \le C\}$, we define an empirical process
$$
\nu_n(y_\alpha,\delta) = \frac{1}{\sqrt n}\sum_{t=d+1}^{n'}\bigl[f(x_t;y_\alpha,\delta) - E(f(x_t;y_\alpha,\delta))\bigr],
$$
where $f(x_t;y_\alpha,\delta) = K((y_{t-\alpha}-y_\alpha)/h)\,h^\nu\,\bar s^*(x_t,g_\alpha^0(\tilde y_{t-\alpha}))\,\delta(\tilde y_{t-\alpha})$, for some integer $\nu > d/2$. Let $\hat\delta = h^{-\nu-1/2}[\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})]$. From the uniform convergence rate in Lemma A.3 and the bandwidth condition C.8(ii), it follows that
$$
\|\hat\delta\|_{\nu,2,\mathcal X_{\tilde\alpha}}
= O_p\Bigl(h^{-\nu-1/2}\Bigl[\sqrt{\frac{\log n}{nh_0^{2\nu+1}}} + h_0^{q-\nu}\Bigr]\Bigr) = o_p(1).
$$
Because $\hat\delta$ is bounded uniformly over $\mathcal X_{\tilde\alpha}$ with probability approaching one, it holds that $\Pr(\hat\delta \in \mathcal T) \to 1$. Also, because, for some positive constant $C < \infty$,
$$
\rho^2((y_\alpha,\hat\delta),(y_\alpha,0)) \le Ch^{-(2\nu+1)}\|\tilde g_\alpha - g_\alpha^0\|_{\nu,2,\mathcal X_{\tilde\alpha}}^2 = o_p(1),
$$
we have $\rho((y_\alpha,\hat\delta),(y_\alpha,0)) \xrightarrow{p} 0$. Hence, following Andrews (1994, p. 2257), the stochastic equicontinuity condition of $\nu_n(y_\alpha,\cdot)$ at $\delta^0 = 0$ implies that $|\nu_n(y_\alpha,\hat\delta) - \nu_n(y_\alpha,\delta^0)| = |\nu_n(y_\alpha,\hat\delta)| = o_p(1)$; that is, $T_{2n}$ is approximated (with an $o_p(1)$ error) by
$$
T_{2n}^* = \sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)
\bigl[\tilde g_\alpha(\tilde y_\alpha) - g_\alpha^0(\tilde y_\alpha)\bigr]p(x)\,dx.
$$
We proceed to show the negligibility of $T_{2n}^*$. From the integrability condition on $S^*(z,g_\alpha^0(\tilde y_\alpha))$, it follows, by a change of variables and the dominated convergence theorem, that $\int K_h(y_\alpha-y_\alpha^0)S^*(z,g_\alpha^0(\tilde y_\alpha))\,dF_0(z) = \int S^*[(y,y_\alpha^0,\tilde y_\alpha),g_\alpha^0(\tilde y_\alpha)]\,p(y,y_\alpha^0,\tilde y_\alpha)\,d(y,\tilde y_\alpha) < \infty$, which, together with the $\sqrt n$-consistency of $\hat c = (\hat c_m,\hat c_v)^\top$, means that $(\hat c - c)\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)S^*(z,g_\alpha^0(\tilde y_\alpha))\,dF_0(z) = o_p(1)$. Because
$$
\tilde g_\alpha(\tilde y_\alpha) - g_\alpha^0(\tilde y_\alpha)
= \sum_{b=1,\ne\alpha}^d\bigl(\hat w_b(y_b) - w_b^0(y_b)\bigr) - (d-2)(\hat c - c),
$$
this yields
$$
T_{2n}^* = \sum_{b=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)
\bigl(\hat w_b(y_b) - w_b^0(y_b)\bigr)p(x)\,dx + o_p(1).
$$
From Lemma A.3,
$$
\hat w_b(y_b) - w_b(y_b) = h_0^q\,\bar b_b(y_b)
+ \frac{1}{n}\sum_t(K*K)_{h_0}(y_{t-b}-y_b)\frac{p_2(\tilde y_{t-b})}{p(x_t)}\bigl(\nabla G(x_b,\tilde y_{t-b})\odot\xi_t\bigr)
+ \frac{1}{n}\sum_t K_{h_0}(y_{t-b}-y_b)\frac{p_2(\tilde y_{t-b})}{p(x_t)}g_\alpha^{*0}(\tilde y_{t-b})
+ O_p(r_n^2) + o_p(n^{-1/2}),
$$
where $\xi_t = (\varepsilon_t, (\varepsilon_t^2-1))^\top$, $\nabla G(x_t) = [\nabla G_m(y_b,\tilde y_{t-b})v(x_t)^{1/2}, \nabla G_v(y_b,\tilde y_{t-b})v(x_t)]^\top$, $\odot$ denotes the elementwise product, and $g_\alpha^{*0}(\tilde y_{t-b}) = g_\alpha^0(\tilde y_{t-b}) - c_0$. Under condition C.8(i), $\sqrt{nh}\,h_0^q = o(1)$, and integrability of the bias function $\bar b_b(y_b)$ and of $S^*(z,\theta_0,g_\alpha^0(\tilde y_\alpha))$ imply
$$
T_{2n}^* = S_{1n} + S_{2n} + o_p(1),
$$
where
$$
S_{1n} = \sum_{b=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,
\frac{1}{n}\sum_t(K*K)_{h_0}(y_{t-b}-y_b)\frac{p_2(\tilde y_{t-b})}{p(x_t)}
\bigl(\nabla G(x_b,\tilde y_{t-b})\odot\xi_t\bigr)p(x)\,dx
$$
and
$$
S_{2n} = \sum_{b=1,\ne\alpha}^d\sqrt{nh}\int K_h(y_\alpha-y_\alpha^0)\,\bar s^*(x,g_\alpha^0)\,
\frac{1}{n}\sum_t K_{h_0}(y_{t-b}-y_b)\frac{p_2(\tilde y_{t-b})}{p(x_t)}
g_\alpha^{*0}(\tilde y_{t-b})\,p(x)\,dx.
$$
Let $S_{1n}^i$ and $S_{2n}^i$ be the $i$th elements of $S_{1n}$ and $S_{2n}$, respectively, with $\bar s_{ij}^*(\cdot)$ the $(i,j)$ element of $\bar s^*(\cdot)$. By the dominated convergence theorem and the integrability condition, we have
$$
S_{1n}^i = \frac{\sqrt h}{\sqrt n}\sum_t\Bigl[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i1}^*(x,g_\alpha^0)
\frac{p_2(\tilde y_{t-b})}{p(x_t)}\sum_{b=1,\ne\alpha}^d(K*K)_{h_0}(y_b-y_{t-b})\nabla G_m(y_b,\tilde y_{t-b})\,p(x)\,dx\Bigr]
v(x_t)^{1/2}\varepsilon_t
$$
$$
+ \frac{\sqrt h}{\sqrt n}\sum_t\Bigl[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i2}^*(x,g_\alpha^0)
\frac{p_2(\tilde y_{t-b})}{p(x_t)}\sum_{b=1,\ne\alpha}^d(K*K)_{h_0}(y_b-y_{t-b})\nabla G_v(y_b,\tilde y_{t-b})\,p(x)\,dx\Bigr]
v(x_t)(\varepsilon_t^2-1)
$$
$$
= \frac{\sqrt h}{\sqrt n}\sum_t\bigl[v(x_t)^{1/2}\Lambda_1^{i1}(x_t)\varepsilon_t
+ v(x_t)\Lambda_1^{i2}(x_t)(\varepsilon_t^2-1)\bigr] + o_p(1),
$$
where
$$
\Lambda_1^{ij}(x_t) = \nabla G_j(y_\alpha^0,\tilde y_{t-\alpha})
\sum_{b=1,\ne\alpha}^d\int\bar s_{ij}^*\bigl[(y_\alpha^0,y_{t-b},\tilde y_{(\alpha,b)}),g_\alpha^0\bigr]
\,p(y_\alpha^0,y_{t-b},\tilde y_{(\alpha,b)})\,d\tilde y_{(\alpha,b)}
$$
and $\nabla G_j(\cdot) = \nabla G_m(\cdot)$ for $j=1$ and $\nabla G_v(\cdot)$ for $j=2$. Because $p_2(\cdot)/p(\cdot)$ and $\Lambda^{ij}(\cdot)$ are bounded under the compact support condition, applying the law of large numbers for the i.i.d. errors $\xi_t = (\varepsilon_t,(\varepsilon_t^2-1))^\top$ leads to $S_{1n}^i = o_p(1)$ and consequently $S_{1n} = o_p(1)$.
Likewise,
$$
S_{2n}^i = \frac{\sqrt h}{\sqrt n}\sum_t\Bigl[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i1}^*(x,g_\alpha^0)
\frac{p_2(\tilde y_{t-b})}{p(x_t)}g_{1\alpha}^{*0}(\tilde y_{t-b})
\sum_{b=1,\ne\alpha}^d K_{h_0}(y_{t-b}-y_b)\,p(x)\,dx\Bigr]
$$
$$
+ \frac{\sqrt h}{\sqrt n}\sum_t\Bigl[\int K_h(y_\alpha-y_\alpha^0)\,\bar s_{i2}^*(x,g_\alpha^0)
\frac{p_2(\tilde y_{t-b})}{p(x_t)}g_{2\alpha}^{*0}(\tilde y_{t-b})
\sum_{b=1,\ne\alpha}^d K_{h_0}(y_{t-b}-y_b)\,p(x)\,dx\Bigr]
$$
$$
= \frac{\sqrt h}{\sqrt n}\sum_t\bigl[\Lambda_2^{i1}(x_t)m_{\alpha t}(\tilde y_{t-\alpha})
+ \Lambda_2^{i2}(x_t)v_{\alpha t}(\tilde y_{t-\alpha})\bigr] + o_p(1),
$$
where
$$
\Lambda_2^{ij}(x_t) = \sum_{b=1,\ne\alpha}^d\int\bar s_{ij}^*\bigl[(y_\alpha^0,y_{t-b},\tilde y_{(\alpha,b)}),g_\alpha^0\bigr]
\,p(y_\alpha^0,y_{t-b},\tilde y_{(\alpha,b)})\,d\tilde y_{(\alpha,b)},
$$
and, for the same reason as before, we get $S_{2n}^i = o_p(1)$ and $S_{2n} = o_p(1)$, because $E(m_{\alpha t}(\tilde y_{t-\alpha})) = E(v_{\alpha t}(\tilde y_{t-\alpha})) = 0$.
We finally show the negligibility of the last term,
$$
T_{3n} = \frac{\sqrt h}{\sqrt n}\sum_{t=d+1}^{n'}K_h(y_{t-\alpha}-y_\alpha^0)\,u_t
\bigl[\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})\bigr].
$$
Substituting the error decomposition for $\tilde g_\alpha(\tilde y_{t-\alpha}) - g_\alpha^0(\tilde y_{t-\alpha})$ and interchanging the summations gives
$$
T_{3n} = \sum_{b=1,\ne\alpha}^d\frac{\sqrt h}{n\sqrt n}\sum_t\sum_{s(\ne t)}
K_h(y_{t-\alpha}-y_\alpha^0)K_{h_0}(y_{s-b}-y_b)\frac{p_2(\tilde y_{s-b})}{p(x_s)}
u_t\,g_\alpha^{*0}(\tilde y_{s-b})
$$
$$
+ \sum_{b=1,\ne\alpha}^d\frac{\sqrt h}{n\sqrt n}\sum_t\sum_{s(\ne t)}
K_h(y_{t-\alpha}-y_\alpha^0)(K*K)_{h_0}(y_{s-b}-y_b)\frac{p_2(\tilde y_{s-b})}{p(x_s)}
\bigl(\nabla G(x_b,\tilde y_{s-b})\odot u_t\xi_s\bigr) + o_p(1),
$$
where the $o_p(1)$ errors for the remaining bias terms hold under the assumption that $\sqrt{nh}\,h_0^q = o(1)$. For
$$
p_n^{i,b}(z_t,z_s) = K_h(y_{t-\alpha}-y_\alpha^0)K_{h_0}(y_{s-b}-y_b)
\frac{p_2(\tilde y_{s-b})}{p(x_s)}u_t\,g_\alpha^{*0}(\tilde y_{s-b})\,\frac{\sqrt h}{n\sqrt n},
$$
we can easily check that $E(p_n^{i,b}(z_t,z_s)|z_t) = E(p_n^{i,b}(z_t,z_s)|z_s) = 0$ for $t\ne s$, implying that $\sum\sum_{t\ne s}p_n^{i,b}(z_t,z_s)$ is a degenerate second-order U-statistic. The same conclusion also holds for the second term. Hence, the two double sums are mean zero and have variance of the same order as
$$
n^2\times\bigl\{E\,p_n^{i,b}(z_t,z_s)^2 + E\,p_n^{i,b}(z_t,z_s)\,p_n^{i,b}(z_s,z_t)\bigr\},
$$
which is of order $n^{-1}h^{-1}$. Therefore, $T_{3n} = o_p(1)$. $\blacksquare$
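The variance gain from degeneracy can be seen in a Monte Carlo sketch with the toy kernel $p(z_t,z_s) = z_t z_s$ (our own illustration, not the paper's kernel): both conditional means vanish, and the normalized double sum has variance of order $n^{-2}$ rather than $n^{-1}$:

```python
import numpy as np

# Monte Carlo sketch of why a degenerate second-order U-statistic is
# negligible: for the toy kernel p(z_t, z_s) = z_t * z_s with mean-zero
# z's, E(p | z_t) = E(p | z_s) = 0, and the variance of the double sum
# normalized by n^2 decays like n^{-2}.
rng = np.random.default_rng(0)

def ustat_var(n, reps=1500):
    vals = np.empty(reps)
    for i in range(reps):
        z = rng.standard_normal(n)
        # (1/n^2) * sum over t != s of z_t z_s
        vals[i] = (z.sum()**2 - (z**2).sum()) / n**2
    return vals.var()

v100, v400 = ustat_var(100), ustat_var(400)
# quadrupling n should cut the variance by roughly a factor of 16
print(v100 / 40 < v400 < v100 / 8)
```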
LEMMA A.2 (Masry, 1996). Suppose that Assumption C holds. Then, for any vector $\mu = (\mu_1,\dots,\mu_d)^\top$ with $|\mu| = \sum_j\mu_j \le \nu$,

(a) $\sup_{x\in\mathcal X}|D_x^\mu\hat p(x) - D_x^\mu p(x)| = O_p\bigl(\sqrt{\log n/(ng^{2|\mu|+d})}\bigr) + O_p(g^{q-|\mu|})$,

(b) $\sup_{x\in\mathcal X}|D_x^\mu\breve m(x) - D_x^\mu m(x)| = O_p\bigl(\sqrt{\log n/(nh_0^{2|\mu|+d})}\bigr) + O_p(h_0^{q-|\mu|}) \equiv r_n(\mu)$,

(c) $\sup_{x\in\mathcal X}|\breve m(x) - m(x) - \tilde L(x)| = O_p(r_n^2)$, where
$$
\tilde L(x) = \frac{1}{n}\sum_{s\ne t}\frac{K_{h_0}(x_s-x)}{p(x_s)}\,v^{1/2}(x_s)\varepsilon_s + h_0^q\,b_n(x).
$$
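The two terms in the uniform rate of Lemma A.2 trade off against each other; for $d = 1$, $q = 2$, $|\mu| = 0$ they balance at $h_0 \sim (\log n/n)^{1/5}$. The balancing computation below is our own illustration, not a claim from the paper:

```python
import math

# Illustration of the trade-off in the uniform rate of Lemma A.2,
#   r_n(0) = sqrt(log n / (n h0^d)) + h0^q       (|mu| = 0 case),
# for d = 1, q = 2: the two terms balance at h0 ~ (log n / n)^(1/5).
def rate(n, h, d=1, q=2):
    return math.sqrt(math.log(n) / (n * h**d)) + h**q

n = 10**6
h_star = (math.log(n) / n) ** (1.0 / 5.0)
grid = [10**(-3 + 3 * i / 2000) for i in range(2001)]   # h in [1e-3, 1]
best = min(rate(n, h) for h in grid)
# the balanced bandwidth is within a small factor of the grid optimum
print(rate(n, h_star) < 2 * best)
```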
LEMMA A.3. Suppose that Assumption C holds. Then, for any vector $\mu = (\mu_1,\dots,\mu_d)^\top$ with $|\mu| = \sum_j\mu_j \le \nu$,

(a) $\sup_{x_\alpha\in\mathcal X_\alpha}|D^\mu\hat w_\alpha(y_\alpha) - D^\mu w_\alpha(y_\alpha)| = O_p\bigl(\sqrt{\log n/(nh^{2|\mu|+1})}\bigr) + O_p(h^{q-|\mu|}) + O_p(r_n^2(\mu))$,

(b) $\sup_{x_\alpha\in\mathcal X_\alpha}|\hat w_\alpha(y_\alpha) - w_\alpha(y_\alpha) - \hat L_w(y_\alpha)| = O_p(r_n^2) + o_p(n^{-1/2})$,

where
$$
\hat L_w(y_\alpha) = \frac{1}{n}\sum_t(K*K)_h(y_{t-\alpha}-y_\alpha)
\begin{bmatrix}
\dfrac{p_{\tilde\alpha}(\tilde y_{t-\alpha})}{p(x_t)}G_m'(m(y_\alpha,\tilde y_{t-\alpha}))v(x_t)^{1/2}\\[6pt]
\dfrac{p_2(\tilde y_{t-\alpha})}{p(x_t)}G_v'(v(y_\alpha,\tilde y_{t-\alpha}))v(x_t)
\end{bmatrix}
\odot
\begin{bmatrix}\varepsilon_t\\ \varepsilon_t^2-1\end{bmatrix}
+ \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)
\bigl[m_{\alpha t}(\tilde y_{t-\alpha}), v_{\alpha t}(\tilde y_{t-\alpha})\bigr]^\top
+ h^q\,\bar b_\alpha(y_\alpha).
$$
Proof. We first show (b). For notational simplicity, the bandwidth parameter $h$ (only in this proof) abbreviates $h_0$. From the decomposition results for the instrumental variable estimates,
$$
\hat w_\alpha(y_\alpha) - w_\alpha(y_\alpha) = [I_2\otimes e_1^\top Q_n^{-1}]\,t_n,
$$
where $Q_n = [\hat q_{n,i+j-2}(y_\alpha)]_{(i,j)=1,2}$, with $\hat q_{ni} = (1/n)\sum_{t=d}^n K_h(y_{t-\alpha}-y_\alpha)[\hat p_{\tilde\alpha}(\tilde y_{t-\alpha})/\hat p(x_t)][(y_{t-\alpha}-y_\alpha)/h]^i$ for $i = 0,1,2$, and $t_n = (1/n)\sum_t K_h(y_{t-\alpha}-y_\alpha)[\hat p_{\tilde\alpha}(\tilde y_{t-\alpha})/\hat p(x_t)][\tilde r_t - w_\alpha(y_\alpha) - (y_{t-\alpha}-y_\alpha)\nabla w_\alpha(y_\alpha)]\otimes(1,(y_{t-\alpha}-y_\alpha)/h)^\top$. By the Cauchy–Schwarz inequality and Lemma A.2 applied with a Taylor expansion, it holds that
$$
\sup_{x_\alpha\in\mathcal X_\alpha}\frac{1}{n}\sum_{t=d}^n K_h(y_{t-\alpha}-y_\alpha)
\Bigl|\frac{\hat p_{\tilde\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)} - \frac{p_{\tilde\alpha}(\tilde y_{t-\alpha})}{p(x_t)}\Bigr|
\,\Bigl|\frac{y_{t-\alpha}-y_\alpha}{h}\Bigr|^i
\le \sup_{x\in\mathcal X}\Bigl|\frac{\hat p_{\tilde\alpha}(\tilde y_\alpha)}{\hat p(x)} - \frac{p_{\tilde\alpha}(\tilde y_\alpha)}{p(x)}\Bigr|
\,\sup_{x_\alpha\in\mathcal X_\alpha}\frac{1}{n}\sum_{t=d}^n K_h(y_{t-\alpha}-y_\alpha)
\Bigl|\frac{y_{t-\alpha}-y_\alpha}{h}\Bigr|^i
= O_p\Bigl(\sup_{x\in\mathcal X}|\hat p(x)-p(x)|\Bigr) \equiv O_p(r_n),
$$
where the boundedness condition of C.2 is used for the last line. Hence, the standard argument of Masry (1996) implies that $\sup_{x_\alpha\in\mathcal X_\alpha}|\hat q_{ni} - q_i| = o_p(1)$, where $q_i = \int K(u_1)u_1^i\,du_1$. From $q_0 = 1$, $q_1 = 0$, and $q_2 = \mu_{2K}$, we get the following uniform convergence result for the denominator term: $e_1^\top Q_n^{-1} \xrightarrow{p} e_1^\top$, uniformly in $y_\alpha \in \mathcal X_\alpha$.

For the numerator, we show the uniform convergence rate of the first element of $t_n$, because the other terms can be treated in the same way. Let $t_{n1}$ denote the first element of $t_n$; that is,
$$
t_{n1} = \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\frac{\hat p_{\tilde\alpha}(\tilde y_{t-\alpha})}{\hat p(x_t)}
\bigl[G_m(\breve m(x_t)) - M_\alpha(y_\alpha) - (y_{t-\alpha}-y_\alpha)m_\alpha'(y_\alpha)\bigr],
$$
or, alternatively,
$$
t_{n1} = \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\,r(x_t;\hat g),
$$
where
$$
r(x_t;g) = \frac{g_2(\tilde y_{t-\alpha})}{g_3(x_t)}
\bigl[G_m(g_1(x_t)) - M_\alpha(y_\alpha) - (y_{t-\alpha}-y_\alpha)m_\alpha'(y_\alpha)\bigr],
$$
$$
g(x_t) = [g_1(x_t), g_2(\tilde y_{t-\alpha}), g_3(x_t)] = [m(x_t), p_{\tilde\alpha}(\tilde y_{t-\alpha}), p(x_t)],
\qquad
\hat g = \hat g(x_t) = [\breve m(x_t), \hat p_{\tilde\alpha}(\tilde y_{t-\alpha}), \hat p(x_t)].
$$
Because $p_{\tilde\alpha}(\cdot)/p(\cdot)$ is bounded away from zero and $G_m$ has a bounded second-order derivative, the functional $r(x_t;g)$ is Fréchet differentiable in $g$ with respect to the sup norm $\|\cdot\|_\infty$, with the (bounded) functional derivative $R(x_t;g) = [\partial r(x_t;g)/\partial g]|_{g=g(x_t)}$. This implies that, for all $g$ with $\|g-g^0\|_\infty$ small enough, there exists some bounded function $b(\cdot)$ such that
$$
\|r(x_t;g) - r(x_t;g^0) - R(x_t;g^0)(g-g^0)\| \le b(x_t)\,\|g-g^0\|_\infty^2.
$$
By Lemma A.2, $\|\hat g - g^0\|_\infty^2 = O_p(r_n^2)$, and consequently we can properly linearize $t_{n1}$ as
$$
t_{n1} = \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\,r(x_t;g^0)
+ \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\,R(x_t;g^0)(\hat g - g^0) + O_p(r_n^2),
$$
where the $O_p(r_n^2)$ error term is uniform in $x_\alpha$. After plugging $G_m(m(x_t)) = c_m + \sum_{1\le b\le d}m_b(y_{t-b})$ into $r(x_t;g^0)$, a straightforward calculation shows that
$$
t_{n1} = \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\,\varsigma_t[1+O_p(r_n)]
+ \frac{1}{n}\sum_t K_h(y_{t-\alpha}-y_\alpha)\frac{p_{\tilde\alpha}(\tilde y_{t-\alpha})}{p(x_t)}
G_m'(m(x_t))\bigl[\breve m(x_t) - m(x_t)\bigr]
+ \frac{h^q}{q!}\mu_q(k)b_{1\alpha}(y_\alpha) + o_p(h^q), \tag{A.13}
$$
where $\varsigma_t = [p_2(\tilde y_{t-\alpha})/p(x_t)]M_{\tilde\alpha}(\tilde y_{t-\alpha})$ and $M_{\tilde\alpha}(\tilde y_{t-\alpha}) = \sum_{1\le b\le d,(\ne\alpha)}m_b(y_{t-b})$. Note that, as a result of the identification condition, $E[\varsigma_t|y_{t-\alpha}] = 0$, and consequently the first term is a standard stochastic term appearing in kernel estimates. For a further asymptotic expansion of the second term of $t_{n1}$, we apply the stochastic equicontinuity argument to the empirical process $\{\nu_n(\cdot,\cdot)\}$, indexed by $(y_\alpha,\delta) \in \mathcal X_\alpha\times\mathcal T$, with $\mathcal T = \{\delta : \|\delta\|_{\nu,2,\mathcal X} \le C\}$, such that
$$
\nu_n(y_\alpha,\delta) = \frac{1}{\sqrt n}\sum_{t=d+1}^{n'}\bigl[f(x_t;y_\alpha,\delta) - E(f(x_t;y_\alpha,\delta))\bigr],
$$
where $f(x_t;y_\alpha,\delta) = K[(y_{t-\alpha}-y_\alpha)/h]\,h^\nu\,[p_{\tilde\alpha}(\tilde y_{t-\alpha})/p(x_t)]\,G_m'(m(x_t))\,\delta(x_t)$, for some positive integer $\nu > d/2$. Let $\hat\delta = h^{-\nu-1/2}[\breve m(\cdot) - m(\cdot)]$. From the uniform convergence rate in Lemma A.2 and the bandwidth condition in C.8(iii), it follows that $\|\hat\delta\|_{\nu,2,\mathcal X} = O_p\bigl(h^{-\nu-1/2}[\sqrt{\log n/(nh^{2\nu+d})} + h^{q-\nu}]\bigr) = o_p(1)$, leading to (i) $\Pr(\hat\delta\in\mathcal T)\to 1$ and (ii) $\rho((y_\alpha,\hat\delta),(y_\alpha,\delta^0)) \xrightarrow{p} 0$, where $\delta^0 = 0$. These conditions and stochastic equicontinuity of $\nu_n(\cdot,\cdot)$ at $(y_\alpha,\delta^0)$ yield $\sup_{y_\alpha\in\mathcal X_\alpha}|\nu_n(y_\alpha,\hat\delta) - \nu_n(y_\alpha,\delta^0)| = \sup_{x_\alpha\in\mathcal X_\alpha}|\nu_n(y_\alpha,\hat\delta)| = o_p(1)$. Thus, the second term of $t_{n1}$ is approximated, with an $o_p(1/\sqrt n)$ error uniform in $y_\alpha$, by
$$
\int K_h(y_{t-\alpha}-y_\alpha)\frac{p_{\tilde\alpha}(\tilde y_{t-\alpha})}{p(x_t)}
G_m'(m(x_t))\bigl[\breve m(x_t) - m(x_t)\bigr]p(x_t)\,dx_t,
$$
which, by substituting $\tilde L(x_t)$ for $\breve m(x_t) - m(x_t)$, is given by
$$
\frac{1}{n}\sum_s(K*K)_h(y_{s-\alpha}-y_\alpha)
\frac{p_{\tilde\alpha}(\tilde y_{s-\alpha})\,G_m'(m(y_\alpha,\tilde y_{s-\alpha}))}{p(x_s)}
\,v^{1/2}(x_s)\varepsilon_s
+ \frac{h^q}{q!}\mu_q(k)b_{2\alpha}(y_\alpha), \tag{A.14}
$$
where $(K*K)(\cdot)$ is the convolution kernel defined before. Hence, by letting $\bar b_\alpha(y_\alpha)$ summarize the two bias terms appearing in (A.13) and (A.14), Lemma A.3(b) is shown. The uniform convergence results in part (a) then follow by the standard arguments of Masry (1996), because the two stochastic terms in the asymptotic expansion of $\hat w_\alpha(y_\alpha) - w_\alpha(y_\alpha)$ consist only of univariate kernels. $\blacksquare$