Bandwidth Selection for Continuous-Time Markov Processes
Federico M. Bandi, Valentina Corradi, and Guillermo Moloche
University of Chicago and University of Warwick
(Preliminary draft)
July 2009
Abstract
We propose a theoretical approach to bandwidth choice for continuous-time Markov processes. We do so in the context of stationary and nonstationary processes of the recurrent kind. The procedure consists of two steps. In the first step, by invoking local Gaussianity, we suggest an automated bandwidth selection method which maximizes the probability that the standardized data are a collection of normal draws. In the case of diffusions, for instance, this procedure selects a bandwidth which only ensures consistency of the infinitesimal variance estimator, not of the drift estimator. Additionally, the procedure does not guarantee that the rate conditions for asymptotic normality of the infinitesimal variance estimator are satisfied. In the second step, we propose tests of the hypothesis that the bandwidth(s) are either "too small" or "too big" to satisfy all necessary rate conditions for consistency and asymptotic normality. The suggested statistics rely on a randomized procedure based on the idea of conditional inference. Importantly, if the null is rejected, then the first-stage bandwidths are kept. Otherwise, the outcomes of the tests indicate whether larger or smaller bandwidths should be selected. We study scalar and multivariate diffusion processes, jump-diffusion processes, as well as processes measured with error, as is the case, for instance, for stochastic volatility modelling by virtue of preliminary high-frequency spot variance estimates. The finite sample joint behavior of our proposed automated bandwidth selection method, as well as that of the associated (second-step) randomized procedure, is studied via Monte Carlo simulation.
Keywords: Bandwidth selection, recurrence, continuous-time Markov processes.
We are grateful to Christian-Yann Robert for his useful comments. We thank Marine Carrasco, Jean-Marie Dufour, Andrew Patton, Anders Rahbek, Eric Renault, and Olivier Scaillet for suggestions. We also thank the seminar participants at the Université de Montréal, the Oxford-Man Institute, the Cass Conference "What went wrong? Financial Engineering, Financial Econometrics, and the Current Stress" (December 5-6, 2008, London), the "Toulouse Financial Econometrics Conference" (May 15-16, 2009, Toulouse), the Stevanovich Center-CREATES conference "Financial Econometrics and Statistics: Current Themes and New Directions" (June 4-6, 2009, Skagen, Denmark), and the First European SoFiE Conference (June 10-12, 2009, Geneva, Switzerland) for discussions. Daniel Wilhelm provided outstanding research assistance. The codes will soon be available on our websites.
1 Introduction
Following influential, early work on fully nonparametric infinitesimal volatility estimation and testing for scalar diffusion processes (e.g., Brugière, 1991, Corradi and White, 1999, Florens-Zmirou, 1993, and Jacod, 1997), the recent nonparametric literature in continuous time has largely focused on the full system. Emphasis might, for instance, be placed on the estimation of the first infinitesimal moment (the drift) in the diffusion case (Stanton, 1997, among others) and, in the case of jump-diffusions, on the high-order infinitesimal moments (Johannes, 2004, inter alia).
Motivated by the need to completely characterize the system's dynamics, Bandi and Phillips (2003, BP henceforth) have established consistency and asymptotic (mixed) normality for Nadaraya-Watson kernel estimators of both the drift and the diffusion function of recurrent (and, hence, possibly nonstationary) scalar diffusion processes (see, also, Fan and Zhang, 2003, and Moloche, 2004, for local polynomial estimates under stationarity and recurrence, respectively). Their results rely on a double asymptotic design in which the interval between discretely-sampled observations approaches zero (in-fill asymptotics) and the time span diverges to infinity (long-span asymptotics). A significant difference between a stationary (or positive recurrent) diffusion and a nonstationary (or null recurrent) diffusion is that in the former case the local time grows linearly with the time span, while in the latter case it grows at a slower (and, generally, unknown) rate. Because the rate of divergence of local time affects the rate of convergence of the functional estimates of the process moments, this observation is theoretically, and empirically, important. Bandi and Moloche (2004, BM henceforth) have generalized the results in BP to the case of multidimensional diffusion processes. Importantly, in the multidimensional case a well-defined notion of local time no longer exists and one has to rely on the more general notion of occupation density. In both the scalar and the multidimensional case, consistency and (mixed) normality of the drift and variance estimators (and, hence, of the full system's dynamics) rely on the proper choice of the bandwidth parameters, i.e., on the rate at which the bandwidths approach zero as the interval between discretely-sampled observations goes to zero and the corresponding occupation densities (or local times, in the scalar case) diverge to infinity.
Admittedly, in the context of the functional estimation of continuous-time Markov models, the appropriate choice of window width is a largely unresolved issue. While it is recognized that infinitesimal conditional moment estimation in continuous time and conditional moment estimation in discrete time impose different requirements on the optimal window width for estimation accuracy (see, e.g., BP, 2003, and BM, 2004, for discussions), there is an overwhelming tendency in the continuous-time literature to employ bandwidth selection methods which can only be justified in more traditional set-ups of the regression type. Cross-validation procedures applied to the estimation of the drift and infinitesimal variance of scalar diffusion processes are typical examples. Yet, to the best of our knowledge, even in the stationary case, no theoretical discussion has been provided to automatically select the window width in continuous-time models of the types routinely used in the nonparametric finance literature. Furthermore, for both discrete and continuous-time processes, bandwidth selection is particularly delicate in the null recurrent (nonstationary) case since, as said, the bandwidth's vanishing rate ought to depend on the divergence rate of the number of visits to open sets in the range of the process but the latter is unknown, in general. In discrete time, important progress on the issue of bandwidth selection has been made by Karlsen and Tjøstheim (2001) for $\beta$-null recurrent processes and by Guerre (2004) for general recurrent processes. The continuous-time case poses additional complications in that one has to adapt not only to the level of recurrence in the estimation domain but, also, to the rate at which the interval between discretely-sampled observations vanishes asymptotically.
This paper attempts to fill this important gap in the continuous-time econometrics literature by proposing a theoretical approach to automated bandwidth choice. The approach is designed for widely-employed classes of continuous-time Markov processes, such as scalar and multivariate diffusion processes and jump-diffusion processes, and is justified under mild assumptions on their statistical properties, stationarity not being required. Our solution to the problem is novel and may also be applied to discrete-time models, as outlined in Section 8.
In the diffusion case, the intuition of our approach is as follows. Consider kernel estimates of the drift and diffusion function ($\hat\mu_{h^{dr}}$ and $\hat\sigma^2_{h^{dif}}$). Assume these estimates are obtained by selecting different smoothing sequences. Invoking the local Gaussianity property which diffusion models readily imply as a useful prior on the distributional feature of the standardized data, we maximize the probability that the standardized data
$$\frac{(X_{t+\Delta} - X_t) - \hat\mu_{h^{dr}}(X_t)\,\Delta}{\sqrt{\hat\sigma^2_{h^{dif}}(X_t)\,\Delta}}$$
are a collection of draws from a Gaussian distribution by choosing the relevant smoothing sequences ($h^{dr}$ and $h^{dif}$) accordingly. This procedure selects a bandwidth $h^{dif}$ which ensures the consistency of the infinitesimal variance estimator but, in spite of its sound empirical performance, does not select a bandwidth $h^{dr}$ which ensures the theoretical consistency of the drift function. Also, the automatically-chosen bandwidths do not necessarily satisfy the rate conditions required for (mean zero) asymptotic normality. To overcome this issue, for each infinitesimal moment, we propose a test of the null hypothesis that one or more rate conditions (for consistency and normality) are violated versus the alternative that all rate conditions are satisfied. The suggested statistics (separately specified for drift and diffusion) rely on a randomized procedure based on the idea of conditional inference, along the lines of Corradi and Swanson (2006). If the null is rejected, then the selected bandwidth is kept; otherwise the outcome of the procedure suggests whether we should select a larger or a smaller bandwidth. We proceed sequentially, until the null is rejected. Because the probability of rejecting the null when it is false is asymptotically one at each step, our approach does not suffer from a sequential bias problem.
Our emphasis on recurrence is empirically-motivated, theoretical generality being only a by-product. Under general recurrence properties, the bandwidth's rate conditions are not a function of $T$ (the time span or the number of observations) as in stationary time-series analysis. They are a function of the number of visits to each level at which functional estimation is conducted. Importantly, however, even for stationary processes (which are, as emphasized, a sub-case of the class of recurrent processes) choosing the bandwidth rate as a function of the empirical occupation times is bound to provide a more objective solution to the bandwidth selection problem than choosing it based on a theoretical (and, hence, purely hypothetical) divergence rate of the occupation times equal to $T$. This point is, of course, particularly compelling when dealing with highly dependent, but possibly stationary, time-series of the type routinely encountered in fields such as finance. These processes return to values in their range very slowly and, thus, even though they may be stationary, have occupation densities which hardly diverge at the "theoretical" $T$ rate.
We begin by considering the case of bandwidth selection for scalar diffusion models (Section 2). We then extend our analysis to scalar jump-diffusion processes (Section 3). The case of a diffusion observed with error is presented in Section 4. Stochastic variance processes filtered from high-frequency financial data may, of course, be regarded as processes observed with error. We evaluate the case of stochastic volatility explicitly and discuss bandwidth selection for diffusion models applied to market microstructure noise-contaminated spot variance estimates in Section 5. In Section 6 we study the multivariate diffusion case. Section 7 provides a Monte Carlo study. Section 8 contains final remarks. All proofs are collected in the Appendix.
2 Scalar diffusion processes

2.1 The framework
We consider the following class of one-factor models,
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t,$$
where $\{W_t : t \geq 0\}$ is a standard Brownian motion. Our objective is to provide suitable nonparametric estimates of the drift term $\mu(a)$ and of the infinitesimal variance $\sigma^2(a)$. To this extent, we assume availability of a sample of $N$ equidistant observations and denote the discrete interval between two successive observations by $\Delta_{N,T} = T/N$, where $T$ defines the time span. Specifically, we observe the diffusion skeleton $X_{\Delta_{N,T}}, X_{2\Delta_{N,T}}, \ldots, X_{N\Delta_{N,T}}$. In what follows, we require $N, T \to \infty$, $\Delta_{N,T} \to 0$ (in-fill asymptotics), and $T = \Delta_{N,T} N \to \infty$ (long-span asymptotics) for consistency of the moment estimates. As in Stanton (1997), BP (2003), and Johannes (2004), inter alia, we construct the following estimators of the drift and infinitesimal variance, respectively:
$$\hat\mu_{N,T}(a) = \frac{1}{\Delta_{N,T}}\,\frac{\sum_{j=1}^{N} K\!\left(\frac{X_{j\Delta_{N,T}} - a}{h^{dr}_{N,T}}\right)\left(X_{(j+1)\Delta_{N,T}} - X_{j\Delta_{N,T}}\right)}{\sum_{j=1}^{N} K\!\left(\frac{X_{j\Delta_{N,T}} - a}{h^{dr}_{N,T}}\right)} \quad (1)$$
and
$$\hat\sigma^2_{N,T}(a) = \frac{1}{\Delta_{N,T}}\,\frac{\sum_{j=1}^{N} K\!\left(\frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}}\right)\left(X_{(j+1)\Delta_{N,T}} - X_{j\Delta_{N,T}}\right)^2}{\sum_{j=1}^{N} K\!\left(\frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}}\right)}. \quad (2)$$
We denote by $h = \left(h^{dr}_{N,T}, h^{dif}_{N,T}\right) \in H$ a bivariate vector bandwidth belonging to the set $H$ contained in the positive quadrant $\mathbb{R}^2_+$. This vector is our object of econometric interest. Assumption 1 guarantees existence of a unique, recurrent solution to $X$. Assumption 2 outlines the conditions imposed on the kernel function $K(\cdot)$ in Eqs. (1) and (2). The same conditions on the kernel function are also employed in the following sections.
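As a concrete illustration of how (1) and (2) operate on a discretely-sampled path, the sketch below computes both estimators with a Gaussian kernel. The function and variable names, the Ornstein-Uhlenbeck data-generating process, and the bandwidth values are our own illustrative choices, not part of the paper.

```python
import numpy as np

def nw_estimates(x, delta, a, h_dr, h_dif):
    """Nadaraya-Watson drift and infinitesimal-variance estimates at level a.

    x            : equispaced observations X_{j Delta}, j = 1, ..., N
    delta        : sampling interval Delta_{N,T}
    a            : evaluation point
    h_dr, h_dif  : drift and diffusion bandwidths (they may differ)
    """
    kernel = lambda s: np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel
    dx = np.diff(x)                        # increments X_{(j+1)Delta} - X_{j Delta}
    w_dr = kernel((x[:-1] - a) / h_dr)     # kernel weights for the drift, Eq. (1)
    w_dif = kernel((x[:-1] - a) / h_dif)   # kernel weights for the variance, Eq. (2)
    mu_hat = (w_dr @ dx) / (delta * w_dr.sum())
    sigma2_hat = (w_dif @ dx**2) / (delta * w_dif.sum())
    return mu_hat, sigma2_hat

# Ornstein-Uhlenbeck path via Euler discretization: dX = -0.5 X dt + 0.3 dW
rng = np.random.default_rng(0)
delta, n = 0.01, 50_000
x = np.empty(n)
x[0] = 0.0
for j in range(n - 1):
    x[j + 1] = x[j] - 0.5 * x[j] * delta + 0.3 * np.sqrt(delta) * rng.standard_normal()

# at a = 0 the true values are mu(0) = 0 and sigma^2(0) = 0.3^2 = 0.09
mu_hat, sigma2_hat = nw_estimates(x, delta, a=0.0, h_dr=0.4, h_dif=0.05)
```

Consistent with Proposition 1 below, the variance estimate is sharp even over a moderate span, while the drift estimate is far noisier, which is precisely why the two moments tolerate very different bandwidths.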
Assumption 1.
(i) $\mu(\cdot)$ and $\sigma(\cdot)$ are time-homogeneous, $\mathcal{B}$-measurable functions on $D = (l, u)$ with $-\infty \leq l < u \leq \infty$, where $\mathcal{B}$ is the $\sigma$-field generated by Borel sets on $D$. Both functions are at least twice continuously differentiable. Hence, they satisfy local Lipschitz and growth conditions. Thus, for every compact subset $J$ of the range of the process, there exist constants $C^J_1$ and $C^J_2$ so that, for all $x$ and $y$ in $J$,
$$|\mu(x) - \mu(y)| + |\sigma(x) - \sigma(y)| \leq C^J_1 |x - y|,$$
and
$$|\mu(x)| + |\sigma(x)| \leq C^J_2 \{1 + |x|\}.$$
(ii) $\sigma^2(\cdot) > 0$ on $D$.
(iii) We define $S(\alpha)$, the natural scale function, as
$$S(\alpha) = \int_c^{\alpha} \exp\left(-\int_c^y \frac{2\mu(x)}{\sigma^2(x)}\,dx\right) dy,$$
where $c$ is a generic fixed number belonging to $D$. We require $S(\alpha)$ to satisfy
$$\lim_{\alpha \to l} S(\alpha) = -\infty \qquad \text{and} \qquad \lim_{\alpha \to u} S(\alpha) = \infty.$$
Assumption 2. The kernel $K(\cdot)$ is a continuously differentiable, symmetric, and nonnegative function whose derivative $K'(\cdot)$ is absolutely integrable and for which
$$\int_{-\infty}^{\infty} K(s)\,ds = 1, \qquad \mathbf{K}_2 = \int_{-\infty}^{\infty} K^2(s)\,ds < \infty, \qquad \sup_s K(s) < C_3,$$
and
$$\int_{-\infty}^{\infty} s^2 K(s)\,ds < \infty.$$
In what follows, the symbol $L_X(T,a)$ denotes the chronological local time of $X$ at $T$ and $a$, i.e., the number of calendar time units spent by the process around $a$ in the time interval $[0, T]$.
Proposition 1 (BP, 2003): Let Assumptions 1 and 2 hold.

(i) Let $\Delta_{N,T} = \overline{T}/N$ with $\overline{T}$ fixed. If $\lim_{N \to \infty} \frac{\left(\Delta_{N,T}\log(1/\Delta_{N,T})\right)^{1/2}}{h_{N,T}} = 0$, then
$$\widehat{L}_X(\overline{T}, a) - L_X(\overline{T}, a) = o_{a.s.}(1),$$
where
$$\widehat{L}_X(\overline{T}, a) = \frac{\Delta_{N,T}}{h_{N,T}} \sum_{j=1}^{N} K\!\left(\frac{X_{j\Delta_{N,T}} - a}{h_{N,T}}\right).$$

The drift estimator

Let (ii) $h^{dr}_{N,T}\,L_X(T,a) \to_{a.s.} \infty$ and (iii) $\frac{L_X(T,a)}{h^{dr}_{N,T}}\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T}) \to_{a.s.} 0$; then:
$$\hat\mu_{N,T}(a) - \mu(a) = o_{a.s.}(1).$$
Further, if (iv) $h^{dr,5}_{N,T}\,L_X(T,a) \to_{a.s.} 0$, then:
$$\sqrt{h^{dr}_{N,T}\,\widehat{L}_X(T,a)}\left(\hat\mu_{N,T}(a) - \mu(a)\right) \Rightarrow N\!\left(0,\ \mathbf{K}_2\,\sigma^2(a)\right).$$

The diffusion estimator

If (iii) holds with $h^{dr}_{N,T}$ replaced by $h^{dif}_{N,T}$, then:
$$\hat\sigma^2_{N,T}(a) - \sigma^2(a) = o_{a.s.}(1).$$
Further, if (iv') $\frac{h^{dif,5}_{N,T}\,L_X(T,a)}{\Delta_{N,T}} \to_{a.s.} 0$, then:
$$\sqrt{\frac{h^{dif}_{N,T}\,\widehat{L}_X(T,a)}{\Delta_{N,T}}}\left(\hat\sigma^2_{N,T}(a) - \sigma^2(a)\right) \Rightarrow N\!\left(0,\ 2\mathbf{K}_2\,\sigma^4(a)\right).$$
It is evident from the proposition above (as well as from classical logic based on nonparametric moment estimation in discrete time) that consistency and asymptotic normality of the drift and variance estimators crucially rely on the appropriate choice of the smoothing parameter(s). To this extent, two issues ought to be addressed. First, usual data-driven methods often employed in empirical work in continuous-time finance, such as cross-validation, are not theoretically justified and may not necessarily work in the presence of in-fill asymptotics and nonstationarity. Second, while in the positive recurrent case $L_X(T,a)/T \to_p f_X(a)$, where $f_X(a)$ denotes the stationary probability density at $a$ of the process $X$, in the null recurrent case $L_X(T,a)/T \to_p 0$. Under null recurrence, as emphasized earlier, $L_X(T,a)$ grows at a (generally unknown) rate which is slower than $T$.1 Since the bandwidth's vanishing rate depends on this unknown rate, appropriate bandwidth selection in the null recurrent case is particularly delicate.
We shall proceed in two steps. In the first step, we introduce an adaptive bandwidth selection method which ensures consistency of the diffusion estimator but only guarantees that $\hat\mu_{N,T}(a) - \mu(a) = o_p\!\left(\Delta^{-1/2}_{N,T}\right)$. In the second step, we employ a randomized procedure to test whether the bandwidths selected in the first stage violate any of the rate conditions (ii)-(iii)-(iv) for the drift and (iii)-(iv') for the diffusion. This second step is conducted separately for drift and diffusion. Should we reject the null, then we would rely on the previously-chosen bandwidth. Alternatively, because the outcome of the procedure gives us information about whether the selected bandwidth is too small or too large, we iterate until the null is rejected.
1 The Brownian motion case is an exception for which the rate is known and $L_X(T,a)/\sqrt{T} = O_p(1)$.
2.2 First step: A residual-based procedure
Consider the estimated residual series
$$\hat\varepsilon_{i\Delta_{N,T}} = \frac{X_{i\Delta_{N,T}} - X_{(i-1)\Delta_{N,T}} - \hat\mu_{N,T}\!\left(X_{(i-1)\Delta_{N,T}}\right)\Delta_{N,T}}{\hat\sigma_{N,T}\!\left(X_{(i-1)\Delta_{N,T}}\right)\sqrt{\Delta_{N,T}}}, \qquad i = 2, \ldots, \Delta^{-1}_{N,T}\overline{T},$$
assuming, for notational simplicity, that $\Delta^{-1}_{N,T}\overline{T}$ is an integer. In light of the normality of the driving Brownian motion, over small time intervals $\Delta_{N,T}$ the residual series is roughly standard normally distributed. Our minimization problem requires finding
$$\hat{h}_{N,T} = \left\{ h \in H \subset \mathbb{R}^2_+ : \phi\!\left(F^h_{\overline{N}}, \Phi\right) = \epsilon_{\overline{N}} \right\}, \quad (3)$$
with $\epsilon_{\overline{N}} \downarrow 0$ as $\overline{N} = \Delta^{-1}_{N,T}\overline{T} \to \infty$, where $F^h_{\overline{N}}$ denotes the empirical cumulative distribution of the estimated residuals $\hat\varepsilon_{i\Delta_{N,T}}$, $\Phi$ is the cumulative distribution of the standard normal random variable, and $\phi(\cdot, \cdot)$ is a distance metric.
It is noted that the criterion is defined over a fixed time span $\overline{T}$ whereas the estimators, mainly for consistency of the drift, are defined over an enlarging span of time $T$. We define the criterion over a fixed time span to avoid theoretical imbalances in the case of nonstationary diffusions. This point is discussed in Bandi and Phillips (2007). From an empirical standpoint, fixing the sample span over which the criterion is minimized and enlarging the time span over which the nonparametric estimators are computed is immaterial. It simply amounts to splitting the sample into two parts, i.e., $(0, \overline{T}]$ and $(\overline{T}, T]$. The entire sample (from $0$ to $T$) is used to compute $\hat\mu_{N,T}(\cdot)$ and $\hat\sigma_{N,T}(\cdot)$. The first part of the sample (from $0$ to $\overline{T}$) is used to define the minimization problem.2
We focus on the Kolmogorov-Smirnov distance, but a different distance measure may, of course, be employed. We define the target bandwidth sequence $\overline{h}_{N,T} = (\overline{h}^{dr}_{N,T}, \overline{h}^{dif}_{N,T})$ as the bandwidth sequence which guarantees that the empirical distribution function of the standardized data converges uniformly to the standard normal distribution function as $N, T \to \infty$ with $T/N \to 0$ (and, of course, with $\overline{N} = \overline{T}\Delta^{-1}_{N,T} \to \infty$). First, we show that this sequence exists and is optimal in a sense to be defined. Second, we show that $\hat{h}_{N,T}$ converges to it.

We start with optimality and assume, for the moment, that $\overline{h}_{N,T}$ exists. We show that the bandwidth vector $(\overline{h}^{dr}_{N,T}, \overline{h}^{dif}_{N,T})$ minimizing the distance between the residual empirical distribution function and the Gaussian cumulative distribution function is the optimal (in the sup norm) bandwidth vector up to an additional condition needed to identify the drift.
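A minimal numerical sketch of the first-step criterion (3): for each candidate bandwidth pair, standardize the increments with the corresponding drift and variance estimates and score the pair by the Kolmogorov-Smirnov distance between the empirical distribution of the residuals and $\Phi$. The grid, the Gaussian kernel, and the simulated Ornstein-Uhlenbeck path are illustrative assumptions; for brevity the sketch also ignores the paper's restriction of the criterion to a fixed initial span $\overline{T}$.

```python
import numpy as np
from math import erf, sqrt

def nw(x, delta, a, h, power):
    """Nadaraya-Watson estimate at a: drift (power=1) or infinitesimal variance (power=2)."""
    w = np.exp(-0.5 * ((x[:-1] - a) / h) ** 2)        # Gaussian kernel weights
    return (w @ np.diff(x) ** power) / (delta * w.sum())

def ks_score(x, delta, h_dr, h_dif):
    """Kolmogorov-Smirnov distance between standardized residuals and N(0,1)."""
    pts = x[:-1]
    mu = np.array([nw(x, delta, a, h_dr, 1) for a in pts])
    s2 = np.array([nw(x, delta, a, h_dif, 2) for a in pts])
    eps = np.sort((np.diff(x) - mu * delta) / np.sqrt(s2 * delta))
    n = eps.size
    Phi = np.array([0.5 * (1.0 + erf(e / sqrt(2.0))) for e in eps])
    # two-sided KS distance between the empirical CDF of eps and Phi
    return max((np.arange(1, n + 1) / n - Phi).max(), (Phi - np.arange(0, n) / n).max())

# illustrative Ornstein-Uhlenbeck path: dX = -0.5 X dt + 0.3 dW
rng = np.random.default_rng(1)
delta, n = 0.02, 1_000
x = np.empty(n); x[0] = 0.0
for j in range(n - 1):
    x[j + 1] = x[j] - 0.5 * x[j] * delta + 0.3 * np.sqrt(delta) * rng.standard_normal()

grid = [0.05, 0.1, 0.2, 0.4]
scores = {(hd, hv): ks_score(x, delta, hd, hv) for hd in grid for hv in grid}
h_hat = min(scores, key=scores.get)   # first-step bandwidth pair (h_dr, h_dif)
```

In line with the discussion above, the criterion pins down the variance bandwidth much more sharply than the drift bandwidth: over short intervals the residual distribution is dominated by $\hat\sigma$, which is exactly why the second step is needed for the drift.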
Theorem 1. A vector bandwidth $\overline{h}_{N,T} = (\overline{h}^{dr}_{N,T}, \overline{h}^{dif}_{N,T})$ satisfies
$$\overline{h}_{N,T} = \left\{ h \in H : \sup_x \left| F^h_{\overline{N}}(x) - \Phi(x) \right| \to_p 0 \text{ as } N, T \to \infty,\ \Delta_{N,T} \to 0 \right\} \quad (4)$$
if and only if
$$\sup_{a \in D} \left| \hat\mu_{N,T}\!\left(a; \overline{h}^{dr}_{N,T}\right) - \mu(a) \right| = o_p\!\left(\Delta^{-1/2}_{N,T}\right) \quad (5)$$
and
$$\sup_{a \in D} \left| \hat\sigma_{N,T}\!\left(a; \overline{h}^{dif}_{N,T}\right) - \sigma(a) \right| = o_p(1). \quad (6)$$

2 This statement can easily be reconciled with our theoretical framework. Assume $T = \sqrt{N}$, for instance. Then, the observations are equispaced at $\left\{ 1/\sqrt{N}, 2/\sqrt{N}, \ldots, 1, 1 + 1/\sqrt{N}, \ldots, \sqrt{N} \right\}$ since $\Delta_{N,T} = T/N = 1/\sqrt{N}$. We can now split the sample in two parts, namely observations in $(0, \overline{T}]$ and observations in $(\overline{T}, T]$. Assume, without loss of generality, that $\overline{T} = 1$. Also, assume that there are $\overline{N}$ equispaced observations in the first part of the sample. Then, $1/\overline{N} = 1/\sqrt{N}$. This implies that the number of observations in the first part of the sample, which is defined over a fixed time span $\overline{T}$, grows with $\sqrt{N}$, whereas the number of observations in the second part of the sample grows with $N$. In practice one can choose $\overline{T}$ relatively large.

We now show that the target bandwidth $\overline{h}_{N,T}$ exists and that $\hat{h}_{N,T}$ is asymptotically equivalent to it.
Theorem 2. (i) There exists a vector bandwidth $\overline{h}_{N,T} = (\overline{h}^{dr}_{N,T}, \overline{h}^{dif}_{N,T})$ so that
$$\overline{h}_{N,T} = \left\{ h \in H : \sup_x \left| F^h_{\overline{N}}(x) - \Phi(x) \right| \to_p 0 \text{ as } N, T \to \infty,\ \Delta_{N,T} \to 0 \right\} \quad (7)$$
and
$$\overline{h}_{N,T} = \left( \overline{h}^{dr}_{N,T}, \overline{h}^{dif}_{N,T} \right) \to 0 \text{ as } N, T \to \infty,\ \Delta_{N,T} \to 0.$$
(ii) If
$$\hat{h}_{N,T} = \left\{ h \in H : \sup_x \left| F^h_{\overline{N}}(x) - \Phi(x) \right| = \epsilon_{\overline{N}} \right\} \quad (8)$$
with $\epsilon_{\overline{N}} \downarrow 0$ as $\overline{N} \to \infty$, then
$$\hat{h}_{N,T} / \overline{h}_{N,T} \to_p 1 \text{ as } N, T \to \infty,\ \Delta_{N,T} \to 0.$$
Theorem 2 guarantees the existence of a bandwidth vector $\hat{h}_{N,T}$ ensuring that our proposed criterion has a solution. This solution guarantees uniform consistency (in probability) of the variance estimator but, despite being empirically sensible as we show below through simulations, fails to guarantee theoretical consistency of the drift estimator. In addition, the selected diffusion bandwidth does not ensure asymptotic normality of the diffusion estimator. A second procedure is therefore needed in order to verify whether the resulting bandwidths satisfy all rate conditions needed for consistency and asymptotic normality of both estimators.
Given Proposition 1, we now need to check whether $h^{dr}_{N,T}$ is small enough as to satisfy $h^{dr,5}_{N,T}\,L_X(T,a) \to_{a.s.} 0$ $\forall a \in D$ and large enough as to satisfy $\min\left\{ h^{dr}_{N,T}\,L_X(T,a),\ \frac{h^{dr}_{N,T}}{\left(\Delta_{N,T}\log(1/\Delta_{N,T})\right)^{1/2} L_X(T,a)} \right\} \to_{a.s.} \infty$ $\forall a \in D$. Similarly, we need to check whether $h^{dif}_{N,T}$ is small enough as to satisfy $\frac{h^{dif,5}_{N,T}\,L_X(T,a)}{\Delta_{N,T}} \to_{a.s.} 0$ $\forall a \in D$ and large enough as to satisfy $\frac{h^{dif}_{N,T}}{\left(\Delta_{N,T}\log(1/\Delta_{N,T})\right)^{1/2} L_X(T,a)} \to_{a.s.} \infty$ $\forall a \in D$.
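Every quantity entering these rate conditions is computable from data once the local time is replaced by its kernel estimate $\widehat{L}_X(T,a)$ from Proposition 1(i). The sketch below, with illustrative names and a simulated Brownian-motion path (a null recurrent case), evaluates the sample analogues whose limits the drift conditions restrict; the formal accept/reject decision is the job of the second-step randomized test, not of these raw magnitudes.

```python
import numpy as np

def local_time_hat(x, delta, a, h):
    """Kernel estimate of chronological local time, as in Proposition 1(i):
    (Delta / h) * sum_j K((X_{j Delta} - a) / h), Gaussian kernel."""
    k = np.exp(-0.5 * ((x - a) / h) ** 2) / np.sqrt(2.0 * np.pi)
    return (delta / h) * k.sum()

def drift_rate_quantities(L_hat, h_dr, delta):
    """Sample analogues of the drift rate conditions.
    'too_large' is the quantity that must vanish under (iv);
    'too_small' is the larger of the two quantities that must vanish
    under the binding condition between (ii) and (iii)."""
    too_large = h_dr ** 5 * L_hat
    too_small = max(1.0 / (h_dr * L_hat),
                    L_hat * np.sqrt(delta * np.log(1.0 / delta)) / h_dr)
    return too_large, too_small

# Brownian motion: null recurrent, so L_X(T, a) grows like sqrt(T), not T
rng = np.random.default_rng(2)
delta, n = 0.01, 20_000
x = np.concatenate(([0.0], np.cumsum(np.sqrt(delta) * rng.standard_normal(n - 1))))
L_hat = local_time_hat(x, delta, a=0.0, h=0.1)
too_large, too_small = drift_rate_quantities(L_hat, h_dr=0.1, delta=delta)
```

Because the procedure conditions on the *empirical* occupation of each level rather than on a hypothetical $T$-rate, the same two numbers are meaningful whether the underlying process is positive or null recurrent.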
2.3 Second step: A randomized procedure
Let $\hat{h}_{N,T} = \left( \hat{h}^{dr}_{N,T}, \hat{h}^{dif}_{N,T} \right)$ be defined as $\hat{h}_{N,T} = \arg\min_h \sup_x \left| F^h_{\overline{N}}(x) - \Phi(x) \right|$. We begin by verifying whether $\hat{h}^{dr}_{N,T}$ satisfies conditions (ii), (iii), and (iv) in Proposition 1. Next, we will turn to $\hat{h}^{dif}_{N,T}$, whose requirements are slightly different.

It is immediate to see that (ii) and (iii) require the bandwidth not to approach zero too fast; thus only one of the two is binding. Condition (iv) instead requires the bandwidth to approach zero fast enough. It is important to rule out the possibility that any bandwidth is too large to satisfy (iv) and too small to satisfy the most stringent between (ii) and (iii). To this extent, we only ought to provide primitive conditions on $N$ and $T$. If (iv) is violated, then $\hat{h}^{dr}_{N,T}$ goes to zero not faster than $L_X(T,a)^{-1/5}$. This ensures that (ii) is satisfied, but does not ensure that (iii) is satisfied. For (iii) to be satisfied when (iv) is not, we need $L_X(T,a)^{6/5}\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T}) \to 0$. Because $L_X(T,a)$ can grow at most at rate $T$, a sufficient condition is therefore $N/T^{17/5} \to \infty$.

Provided $N/T^{17/5} \to \infty$, there are three possibilities (see Figure 1). First, we have chosen the right bandwidth and thus $\hat{h}^{dr}_{N,T}$ satisfies (ii), (iii), and (iv). Second, we have chosen too large a bandwidth, so that (ii) and (iii) hold, but (iv) is violated. Third, we have chosen too small a bandwidth, so that either (ii) or (iii) is violated (or both) but (iv) holds. Hence, at most one set of conditions can be violated, namely either (iv) or the most stringent between (ii) and (iii). To this extent, we consider the following hypotheses:
$$H^{dr}_0:\ \hat{h}^{dr,5}_{N,T}\,\widehat{L}_X(T,a) \to_{a.s.} \infty \ \text{ or } \ \max\left\{ \frac{1}{\hat{h}^{dr}_{N,T}\,\widehat{L}_X(T,a)},\ \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr}_{N,T}} \right\} \to_{a.s.} \infty,$$

$$H^{dr}_A:\ \hat{h}^{dr,5}_{N,T}\,\widehat{L}_X(T,a) \to_{a.s.} 0 \ \text{ and } \ \max\left\{ \frac{1}{\hat{h}^{dr}_{N,T}\,\widehat{L}_X(T,a)},\ \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr}_{N,T}} \right\} \to_{a.s.} 0.$$

The null is that either $\hat{h}^{dr,5}_{N,T}\,\widehat{L}_X(T,a) \to_{a.s.} \infty$, so that (iv) is violated, or $\min\left\{ \hat{h}^{dr}_{N,T}\,\widehat{L}_X(T,a),\ \frac{\hat{h}^{dr}_{N,T}}{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})} \right\} \to_{a.s.} 0$, so that (ii)$\wedge$(iii) is violated. Since it is impossible that neither (ii)$\wedge$(iii) nor (iv) hold, the alternative is that both (ii)$\wedge$(iii) and (iv) hold. Thus, if we reject the null, we can rely on $\hat{h}^{dr}_{N,T}$ for drift estimation. If, instead, we fail to reject the null, depending on which condition we fail to reject, we know whether we have chosen a bandwidth which is too small or one which is too large. Suppose that the selected bandwidth is too large; we then proceed sequentially by choosing a smaller bandwidth until we reject the null. Because at all steps the probability of rejecting the null when it is wrong is asymptotically one, the procedure does not suffer from the well-known sequential bias issue.
Figure 1: Graphical representation of the drift bandwidth test

Importantly, rejection of the null, as stated above, does not rule out the possibility that $\hat{h}^{dr,5}_{N,T}\,\widehat{L}_X(T,a) = O_p(1)$ (if $\hat{h}^{dr}_{N,T} \propto \widehat{L}_X(T,a)^{-1/5}$) or $\min\left\{ \hat{h}^{dr}_{N,T}\,\widehat{L}_X(T,a),\ \frac{\hat{h}^{dr}_{N,T}}{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})} \right\} = O_p(1)$ (if $\hat{h}^{dr}_{N,T} \propto \widehat{L}_X(T,a)^{-(1/5 + \delta/2)}$ with $\delta > 0$ and $N \propto T^{17/5 + \delta}$, or if $\hat{h}^{dr}_{N,T} \propto \widehat{L}_X(T,a)^{-1}$). Also, it does not ensure that conditions (ii), (iii), and (iv) hold for all evaluation points $a \in D$. Hence, we re-formulate the hypotheses as follows:
$$H'^{dr}_0:\ \int_A \hat{h}^{dr,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da \to_{a.s.} \infty \ \text{ or } \ \max\left\{ \frac{1}{\int_A \hat{h}^{dr,(1+\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da},\ \int_A \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr,(1+\varepsilon)}_{N,T}}\,da \right\} \to_{a.s.} \infty,$$

for $A \subseteq D$ and $\varepsilon > 0$ arbitrarily small, versus

$$H'^{dr}_A:\ \text{negation of } H'^{dr}_0.$$
The role of the integral over $A$, and of $\varepsilon > 0$, is to ensure that rejection of the null implies $\min\left\{ \int_A \hat{h}^{dr}_{N,T}\,\widehat{L}_X(T,a)\,da,\ \frac{\hat{h}^{dr}_{N,T}}{\int_A \widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})\,da} \right\} \to_{a.s.} \infty$ and $\int_A \hat{h}^{dr,5}_{N,T}\,\widehat{L}_X(T,a)\,da \to_{a.s.} 0$. However, of course, if we choose an $\varepsilon$ which is not small enough, we run the risk of not having a bandwidth sequence for which $H'^{dr}_0$ is rejected. Hereafter, we consider the following statistic:
$$V_{R,N,T} = \min\left\{ \widetilde{V}_{1,R,N,T},\ \min\left\{ \widetilde{V}_{2,R,N,T},\ \widetilde{V}_{3,R,N,T} \right\} \right\},$$
where, for $i = 1, 2, 3$,
$$\widetilde{V}_{i,R,N,T} = \int_U V^2_{i,R,N,T}(u)\,\pi(u)\,du,$$
with $U = [\underline{u}, \overline{u}]$ being a compact set, $\int_U \pi(u)\,du = 1$, $\pi(u) \geq 0$ for all $u \in U$, and
$$V_{i,R,N,T}(u) = \frac{2}{\sqrt{R}} \sum_{j=1}^{R} \left( \mathbf{1}\{v_{i,j,N,T} \leq u\} - \frac{1}{2} \right)$$
and
$$v_{1,j,N,T} = \left( \exp\left( \int_A \hat{h}^{dr,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da \right) \right)^{1/2} \eta_{1,j},$$
$$v_{2,j,N,T} = \left( \exp\left( \frac{1}{\int_A \hat{h}^{dr,(1+\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da} \right) \right)^{1/2} \eta_{2,j},$$
$$v_{3,j,N,T} = \left( \exp\left( \int_A \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr,(1+\varepsilon)}_{N,T}}\,da \right) \right)^{1/2} \eta_{3,j}, \quad (9)$$
with $(\eta_1, \eta_2, \eta_3)' \sim \text{i.i.d. } N(0, I_{3R})$.
In what follows, let the symbols $\to_{P^*}$ and $\to_{d^*}$ denote convergence in probability and in distribution under $P^*$, which is the probability law governing the simulated random variables $\eta_1, \eta_2, \eta_3$, i.e., a standard normal, conditional on the sample. Also, let $E^*$ and $Var^*$ denote the mean and variance operators under $P^*$. Furthermore, with the notation a.s.-$P$ we mean: for all samples but a set of measure 0.
Suppose that $\int_A \hat{h}^{dr,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da \to_{a.s.} \infty$. Then, conditionally on the sample and a.s.-$P$, $v_{1,j,N,T}$ diverges to $+\infty$ with probability $1/2$ and to $-\infty$ with probability $1/2$. Thus, as $N, T \to \infty$, for any $u \in U$, $\mathbf{1}\{v_{1,j,N,T} \leq u\}$ will be distributed as a Bernoulli random variable with parameter $1/2$. Further note that as $N, T \to \infty$, for any $u \in U$, $\mathbf{1}\{v_{1,j,N,T} \leq u\}$ is equal to either 1 or 0 regardless of the evaluation point $u$, and so as $N, T, R \to \infty$, for all $u, u' \in U$, $\frac{2}{\sqrt{R}}\sum_{j=1}^{R}\left(\mathbf{1}\{v_{1,j,N,T} \leq u\} - \frac{1}{2}\right)$ and $\frac{2}{\sqrt{R}}\sum_{j=1}^{R}\left(\mathbf{1}\{v_{1,j,N,T} \leq u'\} - \frac{1}{2}\right)$ will converge in $d^*$-distribution to the same standard normal random variable. Hence, $\widetilde{V}_{1,R,N,T} \to_{d^*} \chi^2_1$ a.s.-$P$. It is immediate to notice that, for all $u \in U$, $V^2_{1,R,N,T}(u)$ and $\widetilde{V}_{1,R,N,T}$ have the same limiting distribution. The reason why we are averaging over $U$ is simply that the finite sample type I and type II errors may indeed depend on the particular evaluation point.

As for the alternative, if $\int_A \hat{h}^{dr,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da \to_{a.s.} 0$ (or, if $\int_A \hat{h}^{dr,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da = O_{a.s.}(1)$), then $v_{1,j,N,T}$, as $N, T \to \infty$, conditionally on the sample and a.s.-$P$, will converge to a (mixed) zero mean normal random variable. Thus, $\frac{2}{\sqrt{R}}\sum_{j=1}^{R}\left(\mathbf{1}\{v_{1,j,N,T} \leq u\} - \frac{1}{2}\right)$ will diverge to infinity at speed $\sqrt{R}$ if $u \neq 0$, a.s.-$P$.
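The mechanics of each component $V_{i,R,N,T}(u)$ can be mimicked in a few lines: scale $R$ standard normal draws by $\exp(s)^{1/2}$, where $s$ is the sample analogue of the quantity appearing in the null, and aggregate the centered indicators over a grid of evaluation points. All names, the grid $U = [-2, 2]$ with uniform weights $\pi(u)$, and the two values of $s$ are illustrative assumptions.

```python
import numpy as np

def randomized_component(s, R, u_grid, rng):
    """One component of a Corradi-Swanson-type randomized statistic.

    s : sample analogue of a quantity that diverges under the null
        (e.g. the integral of h^(5-eps) * L_hat over A).
    If s is large, v_j = exp(s)^(1/2) * eta_j diverges to +/- infinity with
    probability 1/2 each, so the indicators are Bernoulli(1/2) and the
    statistic behaves like a chi-squared(1) draw.  If s is near 0, the
    indicators concentrate around Phi(u) and the statistic grows with R.
    """
    eta = rng.standard_normal(R)
    v = np.sqrt(np.exp(min(s, 700.0))) * eta   # cap the exponent to avoid overflow
    V_u = np.array([(2.0 / np.sqrt(R)) * ((v <= u).sum() - R / 2.0) for u in u_grid])
    return float((V_u ** 2).mean())            # uniform pi(u) over the grid

rng = np.random.default_rng(3)
u_grid = np.linspace(-2.0, 2.0, 21)
v_null = randomized_component(s=50.0, R=2_000, u_grid=u_grid, rng=rng)  # null-like: s large
v_alt = randomized_component(s=1e-4, R=2_000, u_grid=u_grid, rng=rng)   # alternative-like: s near 0
```

Under the null-like input the component stays in the range of an ordinary $\chi^2_1$ draw, so a fixed critical value such as 3.84 applies; under the alternative-like input it explodes with $R$, which is what makes the sequential search consistent.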
Importantly, the two conditions stated in the null hypothesis are the negation of (iv) and of (ii)$\wedge$(iii) in Proposition 1, respectively.3 As mentioned, only one of the conditions stated under the null is false, simply because the criterion cannot select a bandwidth which is too small (for the most stringent between (ii) and (iii) to be satisfied) and, at the same time, too large (for (iv) to be satisfied). Hence, either $\widetilde{V}_{1,R,N,T}$ or $\min\left\{ \widetilde{V}_{2,R,N,T}, \widetilde{V}_{3,R,N,T} \right\}$ has to diverge under the null. Thus, $\min\left\{ \widetilde{V}_{1,R,N,T},\ \min\left\{ \widetilde{V}_{2,R,N,T}, \widetilde{V}_{3,R,N,T} \right\} \right\}$, conditional on the sample, and for all samples but a set of measure zero, is asymptotically $\chi^2_1$ under the null and diverges under the alternative. If we reject the null, then conditions (ii), (iii), and (iv) in Proposition 1 are satisfied. Otherwise, if, for instance, $\widetilde{V}_{1,R,N,T} = \min\left\{ \widetilde{V}_{1,R,N,T},\ \min\left\{ \widetilde{V}_{2,R,N,T}, \widetilde{V}_{3,R,N,T} \right\} \right\} \leq 3.84$ and we fail to reject the null, then $\hat{h}^{dr}_{N,T}$ is too large (and condition (iv) is violated). The same testing procedure should therefore be repeated until
$$\hat{h}^{dr}_{N,T} = \max\left\{ h^{dr}_0 < \hat{h}^{dr}_{N,T} : H'^{dr}_0 \text{ is rejected} \right\}.$$
In other words, the proposed procedure gives us a way to learn whether the conditions for consistency and (mean zero) mixed normality of the drift are satisfied. If they are not, it gives us a way to understand which condition is not satisfied and modify the bandwidth accordingly.

3 It should be noted that the rate conditions in Proposition 1 are stated in terms of $L_X(T,a)$ instead of $\widehat{L}_X(T,a)$. However, $\frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr}_{N,T}} \to_{a.s.} 0$ if, and only if, $\frac{L_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dr}_{N,T}} \to_{a.s.} 0$, since this condition ensures that $\widehat{L}_X(T,a) - L_X\left(\sup\{t : X_t = a\}, a\right) = o_{a.s.}(1)$ (BP, 2003, Corollary 1).
Theorem 3. Let Assumptions 1 and 2 hold. Assume $T, N, R \to \infty$, $N/T^{17/5} \to \infty$, and $R/T \to 0$.4
(i) Under $H'^{dr}_0$,
$$V_{R,N,T} \to_{d^*} \chi^2_1 \quad \text{a.s.-}P.$$
(ii) Under $H'^{dr}_A$, there are $\delta, \varepsilon > 0$ so that
$$P^*\left( R^{-1+\delta}\,V_{R,N,T} > \varepsilon \right) \to 1 \quad \text{a.s.-}P.$$
The test has some nice features. Specification tests generally assume correct specification under the null. In our case, the bandwidth is correctly specified under the alternative. This is helpful in that, in theory, rejection of the null at the 5% level gives us 95% confidence that the alternative is true and the assumed bandwidth is correctly specified. Further, the critical values (those of a chi-squared with 1 degree of freedom) are readily tabulated. Reliance on a classical distribution makes testing, as well as adaptation of the bandwidth in either direction should the null not be rejected, rather straightforward. It should be stressed that the limiting distribution in Theorem 3 is driven by the added randomness $\eta$, conditional on the sample and for all samples but a set of measure zero. Nevertheless, whenever we reject the null, for all samples and for 95% of the random draws $\eta$, the alternative is true, and so keeping the selected bandwidth is the right choice.
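The sequential logic just described (keep the bandwidth on rejection, otherwise move it in the direction the non-rejected condition indicates) reduces to a simple loop. The function below is an illustrative skeleton only: `reject_null` and `too_large` stand in for the randomized drift test and for the identity of the condition that failed to be rejected, and the shrink factor is an arbitrary choice.

```python
def adjust_bandwidth(h0, reject_null, too_large, factor=0.8, max_iter=50):
    """Second-step sequential adjustment (illustrative skeleton).

    h0          : first-step bandwidth
    reject_null : callable h -> bool; True when the randomized test rejects,
                  i.e. all rate conditions are judged satisfied
    too_large   : callable h -> bool; True when the non-rejected condition is
                  the 'bandwidth too large' one (the condition (iv) side)
    Because the probability of rejecting a false null is asymptotically one
    at each step, the iteration does not suffer from sequential bias.
    """
    h = h0
    for _ in range(max_iter):
        if reject_null(h):
            return h                              # keep the current bandwidth
        h = h * factor if too_large(h) else h / factor
    raise RuntimeError("no bandwidth accepted within max_iter steps")

# toy diagnostics: pretend any h in (0.05, 0.15) satisfies all rate conditions
h_final = adjust_bandwidth(
    h0=0.5,
    reject_null=lambda h: 0.05 < h < 0.15,
    too_large=lambda h: h >= 0.15,
)
```

Starting from a too-large first-step bandwidth of 0.5, the loop shrinks by the factor 0.8 until the toy test first rejects, then stops; replacing the toy callables with the statistics of Theorem 3 gives the drift version of the procedure.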
We now turn to $\hat{h}^{dif}_{N,T}$. We will ensure that $\hat{h}^{dif}_{N,T}$ is small enough as to satisfy $\frac{\hat{h}^{dif,5}_{N,T}\,L_X(T,a)}{\Delta_{N,T}} \to_{a.s.} 0$ $\forall a \in D$, and large enough as to satisfy $\frac{\hat{h}^{dif}_{N,T}}{\left(\Delta_{N,T}\log(1/\Delta_{N,T})\right)^{1/2} L_X(T,a)} \to_{a.s.} \infty$. In order to rule out the possibility that any bandwidth rate is either too slow to satisfy the former condition or too fast to satisfy the latter, it suffices to require that $N/T^5 \to \infty$.
We can now state the hypothesis of interest as:
$$H^{dif}_0:\ \int_A \frac{\hat{h}^{dif,(5-\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)}{\Delta_{N,T}}\,da \to_{a.s.} \infty \ \text{ or } \ \int_A \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dif,(1+\varepsilon)}_{N,T}}\,da \to_{a.s.} \infty,$$
for $A \subseteq D$ and $\varepsilon > 0$ arbitrarily small, versus
$$H^{dif}_A:\ \text{negation of } H^{dif}_0.$$
4 The condition $R/T \to 0$ is necessary only for the case in which the local time diverges at a logarithmic rate. If, on the other hand, the local time diverges at rate $T^{\alpha}$, $\alpha > 0$, then $R$ can grow as fast as, or faster than, $T$.

Figure 2: Graphical representation of the diffusion bandwidth test

Remark 1. We note that, contrary to the drift case, we are not writing the second condition in the null hypothesis as
$$\max\left\{ \frac{1}{\int_A \frac{\hat{h}^{dif,(1+\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)}{\Delta_{N,T}}\,da},\ \int_A \frac{\widehat{L}_X(T,a)\,\Delta^{1/2}_{N,T}\log^{1/2}(1/\Delta_{N,T})}{\hat{h}^{dif,(1+\varepsilon)}_{N,T}}\,da \right\} \to_{a.s.} \infty. \quad (10)$$
In fact, in spite of the fact that $\frac{\hat{h}^{dif}_{N,T}\,\widehat{L}_X(T,a)}{\Delta_{N,T}}$ is the rate of convergence of the diffusion estimator, we do not need to explicitly require its divergence (in Proposition 1, for example). If (iii) is satisfied for the diffusion estimator, then $\frac{\hat{h}^{dif}_{N,T}\,\widehat{L}_X(T,a)}{\Delta_{N,T}}$ is guaranteed to diverge. In other words, the maximum in Eq. (10) is always the second term and the first term can be dropped. The graphical manifestation of this result is the fact that, in Figure 2, $f(\alpha) < \frac{1}{2}$. In the case of the drift, the maximum may vary depending on $\alpha$ (see Figure 1). For instance, if $\alpha$ is larger than $5/8$, then the maximum is always $1\Big/\int_A \hat{h}^{dr,(1+\varepsilon)}_{N,T}\,\widehat{L}_X(T,a)\,da$.
Consider the following statistic:

$$VD_{R,N,T}=\min\left\{\widetilde{VD}_{1,R,N,T},\ \widetilde{VD}_{2,R,N,T}\right\},$$

where, for $i=1,2$,

$$\widetilde{VD}_{i,R,N,T}=\int_U VD_{i,R,N,T}^2(u)\,\pi(u)\,du,$$

with $U$ and $\pi$ defined as above, and

$$VD_{i,R,N,T}(u)=\frac{2}{\sqrt R}\sum_{j=1}^{R}\left(1\left\{vd_{i,j,N,T}\le u\right\}-\frac12\right),$$

with

$$vd_{1,j,N,T}=\left(\exp\left(\int_A \hat h_{N,T}^{dif,(5-\varepsilon)}\frac{\hat L_X(T,a)}{\Delta_{N,T}}\,da\right)\right)^{1/2}\varepsilon_{1,j},$$

$$vd_{2,j,N,T}=\left(\exp\left(\int_A\frac{\Delta_{N,T}^{1/2}\log^{1/2}(1/\Delta_{N,T})}{\hat h_{N,T}^{dif,(1+\varepsilon)}\hat L_X(T,a)}\,da\right)\right)^{1/2}\varepsilon_{2,j},$$

and $(\varepsilon_{1},\varepsilon_{2})^{\intercal}\sim$ iid $N(0,I_{2R})$.
Theorem 4. Let Assumptions 1 and 2 hold. Assume $T,N,R\to\infty$, $N/T^{5}\to\infty$, and $R/T\to 0$.

(i) Under $H_0^{dif}$,

$$VD_{R,N,T}\overset{d}{\to}\chi_1^2,\quad a.s.\text{-}P.$$

(ii) Under $H_A^{dif}$, there are $\eta,\delta>0$ such that

$$P^{*}\left(R^{-1+\delta}\,VD_{R,N,T}>\eta\right)\to 1,\quad a.s.\text{-}P.$$
Remark 2 (The local polynomial and local linear case). Our discussion has focused on classical Nadaraya-Watson kernel estimates. We will continue to do so throughout this paper. This said, the methods readily apply to alternative kernel estimators when appropriately modified, if needed. For example, they apply (unchanged) to the local linear estimates studied by Fan and Zhang (2003) and Moloche (2004).

3 Jump-diffusion processes
We now study the problem of bandwidth selection in the context of processes with discontinuous sample paths. Consider the class of jump-diffusion models

$$dX_t=\mu(X_t)dt+\sigma(X_t)dW_t+dJ_t,$$

where $\{J_t:t=1,\ldots,T\}$ is a Poisson jump process with infinitesimal intensity $\lambda(X_t)dt$ and jump size $c$. Let $c=c(X_t,y)$, where $y$ is a random variable with stationary distribution $f_y(\cdot)$.

We begin by assuming existence of consistent estimates of $\mu(\cdot)$ and $\sigma(\cdot)$ in the presence of jumps ($\hat\mu_{N,T}(\cdot)$ and $\hat\sigma_{N,T}^2(\cdot)$). Later we show how these estimates can be defined. Write, as earlier,

$$\hat\varepsilon_{i\Delta_{N,T}}=\frac{X_{i\Delta_{N,T}}-X_{(i-1)\Delta_{N,T}}-\hat\mu_{N,T}(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\hat\sigma_{N,T}(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}$$

for $i=2,\ldots,\Delta_{N,T}^{-1}T$. We note that

$$\begin{aligned}
\hat\varepsilon_{i\Delta_{N,T}}&=\frac{X_{i\Delta_{N,T}}-X_{(i-1)\Delta_{N,T}}-\mu(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+o_p(1)\\
&=\frac{\sigma(X_{(i-1)\Delta_{N,T}})\left(W_{i\Delta_{N,T}}-W_{(i-1)\Delta_{N,T}}\right)}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+\frac{J_{i\Delta_{N,T}}-J_{(i-1)\Delta_{N,T}}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+o_p(1)\\
&=N(0,1)+\frac{J_{i\Delta_{N,T}}-J_{(i-1)\Delta_{N,T}}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+o_p(1).
\end{aligned}\tag{11}$$

If there is a jump at $i\Delta_{N,T}$, then $J_{i\Delta_{N,T}}-J_{(i-1)\Delta_{N,T}}=O_p(1)$. However, over a finite time span $T$, there will only be a finite number of times in which $1\{\hat\varepsilon_{i\Delta_{N,T}}\le x\}$ is 1 instead of 0, or vice versa, because of jumps. Thus,

$$\frac{1}{N-1}\sum_{i=2}^{N}1\{\hat\varepsilon_{i\Delta_{N,T}}\le x\}=\frac{1}{N-1}\sum_{i=2}^{N}1\{\hat\varepsilon_{i\Delta_{N,T}}^c\le x\}+\frac{O_p(1)}{N},$$

where $\hat\varepsilon_{i\Delta_{N,T}}^c$ is the residual that would prevail in the continuous case. Hence, the same criterion as in Subsection 2.2 can be applied to the case with jumps.
It still remains to establish conditions under which we have consistent estimates of the infinitesimal moments in the presence of jumps. Hereafter, we rely on the following assumption:

Assumption 3.

(i) $\mu(\cdot)$, $\sigma(\cdot)$, $c(\cdot,y)$, and $\lambda(\cdot)$ are time-homogeneous, $\mathfrak B$-measurable functions on $D=(l,u)$ with $-\infty\le l<u\le\infty$, where $\mathfrak B$ is the $\sigma$-field generated by Borel sets on $D$. All functions are at least twice continuously differentiable. They satisfy local Lipschitz and growth conditions. Thus, for every compact subset $J$ of the range of the process, there exist constants $C_4^J$, $C_5^J$, and $C_6^J$ so that, for all $x$ and $z$ in $J$,

$$|\mu(x)-\mu(z)|+|\sigma(x)-\sigma(z)|+\lambda(x)\int_Y|c(x,y)-c(z,y)|\,\Pi(dy)\le C_4^J|x-z|$$

and

$$|\mu(x)|+|\sigma(x)|+\lambda(x)\int_Y|c(x,y)|\,\Pi(dy)\le C_5^J\{1+|x|\},$$

and, for $\alpha>2$,

$$\lambda(x)\int_Y|c(x,y)|^{\alpha}\,\Pi(dy)\le C_6^J\{1+|x|^{\alpha}\};$$

(ii) $\lambda(\cdot)>0$ and $\sigma^2(\cdot)>0$ on $D$;

(iii) $\mu(\cdot)$, $\sigma(\cdot)$, $c(\cdot,y)$, and $\lambda(\cdot)$ are such that the solution is recurrent.
In what follows, we consider two alternative scenarios. First, we establish the validity of our bandwidth selection procedure for all infinitesimal moments under parametric assumptions on the jump component. Second, without making parametric assumptions on the jump component, we discuss bandwidth selection for the purpose of consistent (and asymptotically normal) estimation of the system's drift and infinitesimal variance. In the former case, we incur the risk of incorrectly specifying the jump distribution but completely identify the system's dynamics. The procedure is, in spirit, semiparametric. In the latter case, we are agnostic about the jump distribution, but can only identify the process' drift (possibly inclusive of the first conditional jump moment) and the process' infinitesimal volatility, while remaining fully nonparametric. If interest is in the full system's dynamics, one should employ the procedure in Subsection 3.1. If interest is solely in the volatility of the continuous component of the process, then the methods in Subsection 3.2 are arguably preferable. As we will show, in fact, the diffusion's kernel estimator converges at a faster rate in this second case.
3.1 Consistent estimation of all infinitesimal moments

In order to separate the moments of the continuous component from those of the jump component, we ought to properly correct the kernel estimators considered in the previous section. Following Bandi and Nguyen (2003), BN hereafter, and Johannes (2004), define

$$\hat\mu_{N,T}(a)=\frac{1}{\Delta_{N,T}}\frac{\sum_{j=1}^{N-1}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,1}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)}{\sum_{j=1}^{N}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,1}}\right)}-\hat\lambda_{h_{n,T}}(a)\,\hat E_{y,h_{n,T}}\left(c(a,y)\right)\tag{12}$$

and

$$\hat\sigma_{N,T}^2(a)=\frac{1}{\Delta_{N,T}}\frac{\sum_{j=1}^{N-1}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,2}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)^2}{\sum_{j=1}^{N}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,2}}\right)}-\hat\lambda_{h_{n,T}}(a)\,\hat E_{y,h_{n,T}}\left(c(a,y)^2\right).\tag{13}$$

Since the intensity estimator $\hat\lambda(\cdot)$, as well as the jump size moment estimators $\hat E_y|c(\cdot,y)|^{j}$ with $j=1,2$, depend, in general, on higher-order infinitesimal moment estimates, we make explicit their dependence on a (vector-)bandwidth $h_{n,T}$ and write $\hat\lambda_{h_{n,T}}(\cdot)$ and $\hat E_{y,h_{n,T}}\left(c(\cdot,y)^2\right)$, as above.
We are now more specific. Identification of $\lambda(\cdot)$ and the moments of the jumps may hinge on parametric assumptions on $f_y(\cdot)$, i.e., the probability distribution of the jump size. Assume, for instance, $c(X_t,y)=y$ and $f_y(\cdot)=N(0,\sigma_y^2)$, but alternative specifications may, of course, be invoked along the lines of, e.g., Bandi and Renò (2008), BR henceforth. Then, from BN (2003) and Johannes (2004), one can write

$$\hat E_{y,h_{n,T}}\left(c(X_t,y)^2\right)=\hat\sigma_y^2=\frac{1}{N}\sum_{j=1}^{N}\frac{\hat M_{N,T,h_6}^{6}(X_{j\Delta_{n,T}})}{5\,\hat M_{N,T,h_4}^{4}(X_{j\Delta_{n,T}})},\qquad \hat\lambda_{h_{n,T}}(a)=\frac{\hat M_{N,T,h_4}^{4}(a)}{3\,\hat\sigma_y^{4}},$$

with

$$\hat M_{N,T,h_k}^{k}(a)=\frac{1}{\Delta_{N,T}}\frac{\sum_{j=1}^{N-1}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,k}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)^{k}}{\sum_{j=1}^{N}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T,k}}\right)},\qquad k=1,\ldots$$

Since the mean of the jump size is zero, Eq. (12) and Eq. (13) become, in this case:

$$\hat\mu_{N,T}(a)=\hat M_{N,T,h_1}^{1}(a),\tag{14}$$

$$\hat\sigma_{N,T}^2(a)=\hat M_{N,T,h_2}^{2}(a)-\frac{\hat M_{N,T,h_4}^{4}(a)}{3\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\hat M_{N,T,h_6}^{6}(X_{i\Delta_{n,T}})}{5\hat M_{N,T,h_4}^{4}(X_{i\Delta_{n,T}})}\right)^{2}}\left(\frac{1}{N}\sum_{i=1}^{N}\frac{\hat M_{N,T,h_6}^{6}(X_{i\Delta_{n,T}})}{5\hat M_{N,T,h_4}^{4}(X_{i\Delta_{n,T}})}\right),\tag{15}$$

with $h_{n,T}=(h_6,h_4)$. In other words, optimization of the criterion in Subsection 2.2 will now depend on four bandwidths whose properties are laid out below.
Proposition 2 (BN, 2003): Let Assumption 3 hold.

(i) Let $\Delta_{N,T}=T/N$ with $T$ fixed. If $\lim_{N\to\infty}\frac{1}{h_{N,T}}\left(\Delta_{N,T}\log\frac{1}{\Delta_{N,T}}\right)^{1/2}=0$, then

$$\hat L_X(T,a)-L_X(T,a)=o_{a.s.}(1),$$

where

$$\hat L_X(T,a)=\frac{\Delta_{N,T}}{h_{N,T}}\sum_{j=1}^{N}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}}\right).$$

The infinitesimal moments

If (ii) $h_{N,T,k}L_X(T,a)\overset{a.s.}{\to}\infty$ and (iii) $\frac{L_X(T,a)}{h_{N,T,k}}\left(\Delta_{N,T}\log\frac{1}{\Delta_{N,T}}\right)^{1/2}\overset{a.s.}{\to}0$, then:

$$\hat M_{N,T,h_k}^{k}(a)-M^{k}(a)=o_{a.s.}(1).$$

If, in addition, (iv) $h_{N,T,k}^{5}L_X(T,a)\overset{a.s.}{\to}0$, then:

$$\sqrt{h_{N,T,k}\hat L_X(T,a)}\left(\hat M_{N,T,h_k}^{k}(a)-M^{k}(a)\right)\Rightarrow N\left(0,K_2\,M^{2k}(a)\right).$$
From the proposition above, we note that all moment estimators converge to their limit at the same rate, $\sqrt{h_{N,T,k}\hat L_X(T,a)}$. Importantly, it is theoretically sound to employ the same rate condition for all moments. This is in sharp contrast with the continuous semimartingale case, in which the drift estimator converges at a slower rate than the infinitesimal variance estimator. In the continuous case, in fact, one ought to use different bandwidth rates since, from the conditions in Proposition 1, we require $h_{N,T}^{dr}L_X(T,a)\overset{a.s.}{\to}\infty$ (for the drift bandwidth) and $h_{N,T}^{dif,5}L_X(T,a)\overset{a.s.}{\to}0$ (for the diffusion bandwidth).[5]

Let $\hat\mu_{N,T,h}(a)$ and $\hat\sigma_{N,T,h}^2(a)$ be defined as in Eq. (14) and Eq. (15) with $h_1=h_2=h_4=h_6=h$. We can now select $h$ in such a way as to minimize $\sup_x|F_{N,T,h}(x)-\Phi(x)|$, where $F_{N,T,h}(x)$ is the empirical distribution of $\hat\varepsilon$ (as defined in (11)) evaluated at $x$. Given the nature of the bandwidth requirements from Proposition 2, the second-step procedure can be carried out as in the continuous drift case. Similarly, the asymptotic behavior of the second-step procedure is as established in Theorem 3.[6]
Needless to say, misspecification of the parametric distribution of the jump component will, in general, result in failure of the statement in Theorem 2 since, in this case, there might not exist a bandwidth for which $\sup_x|F_{N,T,h}(x)-\Phi(x)|=o_p(1)$. We now turn to a procedure which does not impose parametric assumptions on the process' discontinuities, at the cost of solely identifying the moments of the process' continuous component.
3.2 Consistent estimation of the drift and infinitesimal variance

Should we be unwilling to make parametric assumptions on the distribution of the jump component, we may still consistently estimate the infinitesimal variance term. The only maintained assumption about the jump component in this subsection is that $J_t$ is a process of finite activity. Define $\hat\sigma_{J,N,T}^2(a)$ as:

$$\hat\sigma_{J,N,T}^2(a)=\frac{1}{\mu_{2/p}^{p}\,\Delta_{N,T}}\frac{\sum_{j=1}^{N-p}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dif}}\right)\prod_{i=1}^{p}\left|X_{(j+i)\Delta_{N,T}}-X_{(j+i-1)\Delta_{N,T}}\right|^{2/p}}{\sum_{j=1}^{N}K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dif}}\right)},\tag{16}$$

where $\mu_k=E|Z|^{k}$, with $Z$ denoting a standard normal random variable, and $2<p<\bar p<\infty$. Corradi and Distaso (2008) have studied the properties of this class of estimators for the case $\Delta_{N,T}=\frac{T}{N}$ with $T$ finite. They have shown that, under mild conditions, $\hat\sigma_{J,N,T}^2(a)$ identifies $\sigma^2(a)$ consistently even in the presence of finite activity jumps. Since we are dealing with Poisson jumps, with probability one we can have at most a finite number of jumps over a finite time span. As the time span increases indefinitely, the number of jumps increases roughly at the same rate. Provided $p>2$, asymptotic mixed normality follows under the same rate conditions as in the continuous semimartingale case. In fact:
5. Consider conditions (iii) and (iv') in Proposition 1. From (iv'), we notice that $\Delta_{N,T}$ has to vanish at a slower rate than $h_{N,T}^{dif,5}L_X(T,a)$. Set $\Delta_{N,T}=O\left(h_{N,T}^{dif,5-\varepsilon}L_X(T,a)\right)$ with $\varepsilon>0$ arbitrarily small. Now, plugging this condition into (iii) and ignoring the logarithm, we obtain

$$\frac{L_X(T,a)}{h_{N,T}^{dif}}\sqrt{h_{N,T}^{dif,5-\varepsilon}L_X(T,a)}=L_X^{3/2}(T,a)\,h_{N,T}^{dif,3/2-\varepsilon/2}\overset{a.s.}{\to}0,$$

which implies $h_{N,T}^{dif}L_X(T,a)\overset{a.s.}{\to}0$; but, of course, this is in contradiction with (ii) in the drift case (see Proposition 1).
6. Simulations suggest that it is sometimes very beneficial to select a smaller bandwidth for the infinitesimal second moment than for the first and higher-order moments (see, e.g., BR, 2008). One may therefore set $h_1=h_4=h_6$ with $h_2$ left unrestricted. In this case our criterion results in the choice of two bandwidths, as in the continuous case.
Theorem 5. Let Assumption 3 hold and let $p>2$. If (i) $\frac{h_{N,T}^{dif,5}L_X(T,a)}{\Delta_{N,T}}\overset{a.s.}{\to}0$ and (ii) $\frac{L_X(T,a)}{h_{N,T}^{dif}}\left(\Delta_{N,T}\log\frac{1}{\Delta_{N,T}}\right)^{1/2}\overset{a.s.}{\to}0$, then

$$\sqrt{\frac{h_{N,T}^{dif}\hat L_X(T,a)}{\Delta_{N,T}}}\left(\hat\sigma_{J,N,T}^2(a)-\sigma^2(a)\right)\Rightarrow N\left(0,\theta_p\,K_2\,\sigma^4(a)\right),$$

where

$$\theta_p=\frac{\mu_{4/p}^{p}}{\mu_{2/p}^{2p}}-(2p-1)+2\left(\frac{\mu_{4/p}}{\mu_{2/p}^{2}}+\frac{\mu_{4/p}^{2}}{\mu_{2/p}^{4}}+\cdots+\frac{\mu_{4/p}^{p-1}}{\mu_{2/p}^{2(p-1)}}\right).$$
Let

$$\hat\varepsilon_{i\Delta_{N,T}}=\frac{X_{i\Delta_{N,T}}-X_{(i-1)\Delta_{N,T}}-\hat\mu_{N,T}(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\hat\sigma_{J,N,T}(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}},$$

where $\hat\mu_{N,T}$ is defined as in (14) and $\hat\sigma_{J,N,T}^2(a)$ is as in (16) above, with $p>4$. We may now select $h_{N,T}=(h_{N,T}^{dr},h_{N,T}^{dif})$ in order to minimize $\sup_x|F_{N,T,h}(x)-\Phi(x)|$, where $F_{N,T,h}(x)$ is the empirical distribution of $\hat\varepsilon$. Subsequently, we can verify the rate conditions as in the continuous semimartingale drift and diffusion case. In other words, Theorems 3 and 4 apply.
Of course, if the jump size does not have mean zero, the procedure only identifies the sum of the drift component and the compensator (see, e.g., Eq. 12) while remaining consistent for the diffusive volatility. Should this be the case, then one has to resort to parametric assumptions, as in the previous subsection, to identify the continuous drift component, if needed.
4 Diffusions observed with error (or microstructure noise)

We now assume that the process $X_t$ is contaminated by measurement error and write observations from the observable process $Y_t$ as

$$Y_{i\Delta_{N,T}}=X_{i\Delta_{N,T}}+\epsilon_{i\Delta_{N,T}},\tag{17}$$

where $a_{N,T}^{-1/2}\epsilon_{i\Delta_{N,T}}$ is an i.i.d. sequence with mean zero, variance 1, and so that $E\left|\epsilon_{i\Delta_{N,T}}\right|^{k}=O\left(a_{N,T}^{k/2}\right)$ ($k\ge 2$) for $a_{N,T}\to 0$ as $N,T\to\infty$.

We provide estimates of the first two infinitesimal moments which are robust to this type of measurement error. In this context, we establish conditions for consistency and asymptotic normality. We then turn to the issue of automatic bandwidth choice. Write
$$\hat\mu_{N,l,T}(a)=\frac{1}{\delta_{l,T}}\frac{\sum_{b=1}^{B}\sum_{j=1}^{l-1}K\left(\frac{Y_{((j-1)B+b)\Delta_{N,T}}-a}{h_{N,l,T}^{dr}}\right)\left(Y_{(jB+b)\Delta_{N,T}}-Y_{((j-1)B+b)\Delta_{N,T}}\right)}{\sum_{b=1}^{B}\sum_{j=1}^{l-1}K\left(\frac{Y_{((j-1)B+b)\Delta_{N,T}}-a}{h_{N,l,T}^{dr}}\right)},\tag{18}$$

where $Bl=N$ and $\delta_{l,T}=T/l$. As for the diffusion:

$$\hat\sigma_{N,l,T}^2(a)=\frac{1}{\delta_{l,T}}\frac{\sum_{b=1}^{B}\sum_{j=1}^{l-1}K\left(\frac{Y_{((j-1)B+b)\Delta_{N,T}}-a}{h_{N,l,T}^{dif}}\right)\left(Y_{(jB+b)\Delta_{N,T}}-Y_{((j-1)B+b)\Delta_{N,T}}\right)^2}{\sum_{b=1}^{B}\sum_{j=1}^{l-1}K\left(\frac{Y_{((j-1)B+b)\Delta_{N,T}}-a}{h_{N,l,T}^{dif}}\right)}-\frac{1}{\delta_{l,T}}RV_{T,N\Delta_{N,T}},\tag{19}$$

where $RV_{T,N\Delta_{N,T}}=\Delta_{N,T}\sum_{j=1}^{N}\left(Y_{j\Delta_{N,T}}-Y_{(j-1)\Delta_{N,T}}\right)^2$. In the case of a fixed time span, the estimator in Eq. (19) has been studied by Corradi and Distaso (2008). Here, we also consider estimation of the first infinitesimal moment as in Eq. (18). In both cases, letting the time span increase without bound (which is, as always, necessary in the drift case for consistency) raises additional technical issues which ought to be dealt with.
Remark 3 (market microstructure). When $Y_{i\Delta_{N,T}}$ is an observable logarithmic price process (i.e., a transaction price or a mid-quote, for example), $X_{i\Delta_{N,T}}$ generally denotes the underlying, unobservable equilibrium price and $\epsilon_{i\Delta_{N,T}}$ defines market microstructure noise. If econometric interest is placed on the drift and diffusion function of the equilibrium price process, as is generally the case, then $\hat\mu_{N,l,T}(a)$ and $\hat\sigma_{N,l,T}^2(a)$ will provide consistent and asymptotically normal estimates of its true infinitesimal moments (as we show below) even when contaminated price observations $Y_{i\Delta_{N,T}}$ are employed.

Remark 4. We note that the form of $\hat\mu_{N,l,T}(a)$ and $\hat\sigma_{N,l,T}^2(a)$ requires the use of an appropriately-chosen lower frequency $l$. In agreement with the two-scale estimator of Zhang, Mykland, and Aït-Sahalia (2005), ZMA henceforth, the diffusion case also requires a bias-correction term based on the higher frequency $N$.
Theorem 6.

The infinitesimal first moment

Let Assumption 1 hold and let $\epsilon$ be defined as in Eq. (17). Also assume that $l=O(BT)$. If (i) $h_{N,l,T}^{dr}L_X(T,a)\overset{a.s.}{\to}\infty$, (ii) $\frac{L_X(T,a)}{h_{N,l,T}^{dr}}\left(\delta_{l,T}\log\frac{1}{\delta_{l,T}}\right)^{1/2}\overset{a.s.}{\to}0$, (iii) $h_{N,l,T}^{dr,5}L_X(T,a)\overset{a.s.}{\to}0$, and (iv) $\frac{N^{1/k}a_{N,T}^{1/2}L_X(T,a)}{\sqrt{h_{N,l,T}^{dr}\,\delta_{l,T}}}\overset{a.s.}{\to}0$, then

$$\sqrt{h_{N,l,T}^{dr}\hat L_X(T,a)}\left(\hat\mu_{N,l,T}(a)-\mu(a)\right)\Rightarrow N\left(0,K_2\,\tfrac{2}{3}\,\sigma^2(a)\right).$$

The infinitesimal second moment

Let Assumption 1 hold and let $\epsilon$ be defined as in Eq. (17). Also assume that $l=O(BT)$. If (i) $\frac{L_X(T,a)}{h_{N,l,T}^{dif}}\left(\delta_{l,T}\log\frac{1}{\delta_{l,T}}\right)^{1/2}\overset{a.s.}{\to}0$, (ii) $\frac{h_{N,l,T}^{dif,5}L_X(T,a)}{\delta_{l,T}}\overset{a.s.}{\to}0$, (iii) $\frac{a_{N,T}^{2}l^{2}}{\delta_{l,T}^{2}}\to 0$, and (iv) $N^{1/k}a_{N,T}^{1/2}l^{1/2}\,\frac{\sqrt{h_{N,l,T}^{dif}L_X(T,a)/T}}{h_{N,l,T}^{dif}}\overset{a.s.}{\to}0$, then

$$\sqrt{\frac{h_{N,l,T}^{dif}\hat L_X(T,a)}{\delta_{l,T}}}\left(\hat\sigma_{N,l,T}^2(a)-\sigma^2(a)\right)\Rightarrow N\left(0,K_2\,\sigma^4(a)\right).$$
Both in the drift and in the diffusion case, the averaging over sub-grids reduces the constant of proportionality in the estimators' asymptotic variance (from 1 to $\frac23$ in the drift case, from 2 to 1 in the diffusion case). The rates of convergence are also affected. In the diffusion case, since $\frac{\Delta_{N,T}}{\delta_{l,T}}\to 0$, the rate is slower. Instead, in the drift case, the new bandwidth condition (ii) requires larger bandwidth choices and thus, compatibly with condition (iii), the actual rate may be faster. Since $N=Bl$, by choosing a smaller $l$, and hence a larger $B$, we may allow for a larger variance of the error term (see below for additional details). This choice will in general not come at a price (in terms of convergence rate) as far as the drift is concerned, but could come at a price in the case of diffusion estimation if $\frac{h_{N,l_1,T}^{dif}}{\delta_{l_1,T}}=o\left(\frac{h_{N,l_2,T}^{dif}}{\delta_{l_2,T}}\right)$ with $l_1<l_2$.
Turning to bandwidth selection, we note that our local Gaussian criterion ought to be re-adjusted in this new framework. Heuristically, let

$$\begin{aligned}
\hat u_{i\Delta_{N,T}}&=\frac{Y_{i\Delta_{N,T}}-Y_{(i-1)\Delta_{N,T}}-\hat\mu_{N,l,T}(Y_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\hat\sigma_{N,l,T}(Y_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}\\
&=\frac{Y_{i\Delta_{N,T}}-Y_{(i-1)\Delta_{N,T}}-\mu(Y_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\sigma(Y_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+o_p(1)\\
&=\frac{X_{i\Delta_{N,T}}-X_{(i-1)\Delta_{N,T}}-\mu(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}+\frac{\epsilon_{i\Delta_{N,T}}-\epsilon_{(i-1)\Delta_{N,T}}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}-\frac{\sigma'\!\left(X_{(i-1)\Delta_{N,T}}^{*}\right)\epsilon_{(i-1)\Delta_{N,T}}}{\sqrt{\Delta_{N,T}}}+o_p(1)\\
&=u_{i\Delta_{N,T}}+\frac{\epsilon_{i\Delta_{N,T}}-\epsilon_{(i-1)\Delta_{N,T}}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}-\frac{\sigma'\!\left(X_{(i-1)\Delta_{N,T}}^{*}\right)\epsilon_{(i-1)\Delta_{N,T}}}{\sqrt{\Delta_{N,T}}}+o_p(1),
\end{aligned}$$

where $X_{(i-1)\Delta_{N,T}}^{*}\in\left(X_{(i-1)\Delta_{N,T}},Y_{(i-1)\Delta_{N,T}}\right)$. In spite of the consistency of the drift and infinitesimal variance estimators, $\hat u_{i\Delta_{N,T}}$ is in general non-Gaussian since the presence of measurement error affects $Y_{i\Delta_{N,T}}-Y_{(i-1)\Delta_{N,T}}$ and, of course, the evaluation point. A natural solution to this issue is to use different frequencies for infinitesimal moment estimation and for bandwidth selection. For the latter, one may use a (lower) frequency at which the contamination error is expected to have little or no effect, say $\Delta_{H,T}$, with $H/N\to 0$. Provided $a_{N,T}=o(\Delta_{H,T})$, we define

$$\begin{aligned}
\hat u_{i\Delta_{H,T}}&=\frac{Y_{i\Delta_{H,T}}-Y_{(i-1)\Delta_{H,T}}-\hat\mu_{N,l,T}(Y_{(i-1)\Delta_{H,T}})\Delta_{H,T}}{\hat\sigma_{N,l,T}(Y_{(i-1)\Delta_{H,T}})\sqrt{\Delta_{H,T}}}\\
&=\frac{X_{i\Delta_{H,T}}-X_{(i-1)\Delta_{H,T}}-\mu(X_{(i-1)\Delta_{H,T}})\Delta_{H,T}}{\sigma(X_{(i-1)\Delta_{H,T}})\sqrt{\Delta_{H,T}}}+o_p(1)\\
&=u_{i\Delta_{H,T}}+o_p(1).
\end{aligned}$$

It is now clear that $u_{i\Delta_{H,T}}$ is approximately Gaussian. The criterion defined in Section 2.2 is therefore still valid and the statements in Theorems 1 and 2 continue to apply. In finite samples, of course, the smaller the interval $\Delta_{H,T}$, the better the approximation. From a practical standpoint, therefore, one has to balance the size of the implied measurement error with the accuracy of the Gaussian approximation. The highest frequency at which the measurement error appears negligible is therefore the preferable frequency.
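The frequency trade-off can be illustrated directly. In the toy below (constant volatility, assumed noise size, and the true $\sigma$ used in place of kernel estimates, for clarity), standardized increments at the highest frequency are visibly non-Gaussian because the noise dominates, while increments sampled every 100 ticks are approximately N(0,1).

```python
import numpy as np
from math import erf

rng = np.random.default_rng(9)
T, N = 100.0, 100_000
dt = T / N
sig, a_noise = 0.4, 5e-4                     # assumed diffusion and noise variance
x = np.concatenate([[0.0], np.cumsum(sig * np.sqrt(dt) * rng.standard_normal(N))])
y = x + np.sqrt(a_noise) * rng.standard_normal(N + 1)

grid = np.linspace(-3.0, 3.0, 121)
Phi = 0.5 * (1.0 + np.array([erf(g / np.sqrt(2.0)) for g in grid]))

def ks_residuals(step):
    """KS distance from N(0,1) of increments standardized at the coarser step."""
    ys = y[::step]
    d = dt * step
    u = np.diff(ys) / (sig * np.sqrt(d))
    F = (u[:, None] <= grid).mean(0)
    return np.abs(F - Phi).max()

ks_high = ks_residuals(1)     # noise dominates: variance inflated to 1 + 2a/(sig^2 dt)
ks_low = ks_residuals(100)    # noise negligible relative to sig^2 * (100 dt)
```

The selection frequency $\Delta_{H,T}$ would be chosen as the highest frequency for which the analogue of `ks_low` stops deteriorating.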
Remark 5. In the case of high-frequency logarithmic asset prices and market microstructure noise,
an appropriate frequency may be chosen in a data-driven manner, either by looking at signature plots
(Andersen, Bollerslev, Diebold, and Labys 2000) or via the statistics suggested by Awartani, Corradi
and Distaso (2009).
In the second stage one needs to verify whether the bandwidths selected by the procedure in Theorem 1, say $\hat h_{N,l,T}^{dr}$ and $\hat h_{N,l,T}^{dif}$, satisfy all of the rate conditions in Theorem 6. We begin with the drift. Notice that $N$, $T$, and the size of the measurement error $a_{N,T}$ are given. While $a_{N,T}$ is unknown in general, it may be estimated by using $\left(RV_{T,N\Delta_{N,T}}\right)/2$ as defined in Eq. (19). Given $N$, we fix $l$ and $B$, using the fact that $N=lB$. If we choose $l=O\left(a_{N,T}^{-1}T\right)$, it is immediate to see that (ii) implies (iv). Summarizing, if $T^{17/5}/l\to 0$ and $l=O\left(a_{N,T}^{-1}T\right)$, then there is a bandwidth satisfying (i)-(iv) and we can proceed along the lines of Theorem 2 by testing the hypothesis:
$$H_0^{dr}:\ \int_A\hat h_{N,l,T}^{dr,(5-\varepsilon)}\hat L_X(T,a)\,da\overset{a.s.}{\to}\infty\quad\text{or}\quad\max\left\{\frac{1}{R\int_A\hat h_{N,l,T}^{dr,(1+\varepsilon)}\hat L_X(T,a)\,da},\ \int_A\frac{\delta_{l,T}^{1/2}\log^{1/2}(1/\delta_{l,T})}{\hat h_{N,l,T}^{dr,(1+\varepsilon)}\hat L_X(T,a)}\,da\right\}\overset{a.s.}{\to}\infty$$

for $A\subseteq D$ and $\varepsilon>0$ arbitrarily small, versus its alternative.
We now turn to the variance estimator. If we set $l=O\left(a_{N,T}^{-2/3+\varepsilon}T^{2/3}\right)$, (iii) is always satisfied. Further, if $T^{5}/l\to 0$, then there is a bandwidth satisfying (i) and (ii). We now test the following hypothesis:

$$H_0^{dif}:\ \int_A\hat h_{N,l,T}^{dif,(5-\varepsilon)}\frac{\hat L_X(T,a)}{\delta_{l,T}}\,da\overset{a.s.}{\to}\infty\quad\text{or}$$

$$\max\left\{\int_A\frac{N^{1/k}a_{N,T}^{1/2}l^{1/2}\sqrt{\frac{\hat h_{N,l,T}^{dif}\hat L_X(T,a)}{T}}}{\hat h_{N,l,T}^{dif,(1+\varepsilon)}}\,da,\ \int_A\frac{\delta_{l,T}^{1/2}\log^{1/2}(1/\delta_{l,T})}{\hat h_{N,l,T}^{dif,(1+\varepsilon)}\hat L_X(T,a)}\,da\right\}\overset{a.s.}{\to}\infty$$

for $A\subseteq D$ and $\varepsilon>0$ arbitrarily small, versus its alternative.
5 Stochastic volatility

Consider now the model

$$dX_t=\mu_t^{X}\,dt+v_t\,dW_t^{X},$$
$$df(v_t^2)=\mu(v_t^2)\,dt+\sigma(v_t^2)\,dW_t^{\sigma},$$

where $\{W_t^{X}:t=1,\ldots,T\}$ and $\{W_t^{\sigma}:t=1,\ldots,T\}$ are potentially correlated Brownian motions. The function $f(x)$ may be equal to $\log(x)$ as in Jacquier, Polson, and Rossi (1994) or $x$ as in Eraker, Johannes, and Polson (2003), for instance. Our interest is in $\mu(v_t^2)$ and $\sigma^2(v_t^2)$, the drift and the diffusion function of the spot variance process.
Volatility is latent. However, it may be filtered from prices $X_t$ sampled at high frequency as suggested by Kristensen (2008) and BR (2008). To this extent, assume, as earlier, availability of $N$ equi-distant price observations with $\Delta_{N,T}=T/N$ denoting the time distance between successive data points and $T$ denoting the time span. We again observe the price skeleton $X_{\Delta_{N,T}},X_{2\Delta_{N,T}},\ldots,X_{N\Delta_{N,T}}$. These price observations may be employed to (1) filter spot volatility (or spot variance) nonparametrically for the purpose of (2) identifying $\mu(\cdot)$ and $\sigma^2(\cdot)$. Using preliminary spot variance estimates $\hat v_t^2$, the latter may be done by virtue of the functional estimators in Eq. (1) and (2) (BR, 2008, and Kanaya and Kristensen, 2008). Importantly, however, selection of the smoothing sequences $h^{dr}$ and $h^{dif}$ now also depends on the need to eliminate the impact of the estimation error induced by the first-step spot variance estimates.

To present the main ideas, consider spot variance estimates obtained by virtue of the classical realized variance estimator (Andersen et al., 2003, and Barndorff-Nielsen and Shephard, 2002). Specifically, write

$$\hat v_{\tau}^2=\frac{1}{2T_N}\sum_{i=(\tau-T_N)\Delta_{N,T}^{-1}}^{(\tau+T_N)\Delta_{N,T}^{-1}}\left(X_{(i+1)\Delta_{N,T}}-X_{i\Delta_{N,T}}\right)^2.\tag{20}$$

The estimator averages $2T_N\Delta_{N,T}^{-1}$ squared price differences in a local neighborhood of $\tau$ determined by the localizing factor $T_N$.
BR (2008) introduce four additional conditions (with respect to those in Proposition 1 above) which the drift bandwidth $h^{dr}$ and the diffusion bandwidth $h^{dif}$ ought to satisfy for asymptotic normality of the drift and the diffusion function estimates to hold. These conditions (two for each infinitesimal moment) are sufficient to eliminate, asymptotically, the influence of the estimation error induced by $\hat v^2$ (when used in place of the unobservable $v^2$). Intuitively, the conditions imply that one needs to use a larger discrete interval, say $\Delta_{M,T}=\frac{T}{M}$ with $M/N\to 0$, than is used for estimating the preliminary spot variance estimates. In other words, one needs to use high-frequency data to identify spot variance $\hat v^2$ and $M$ lower frequency observations (on $\hat v^2$) to identify the dynamics (through $\mu(\cdot)$ and $\sigma^2(\cdot)$). To this extent, call the relevant bandwidths $h_{M,T}^{dr}$ and $h_{M,T}^{dif}$.

In what follows, for conciseness, we will not discuss the origin and form of these four conditions. We refer the reader to BR (2008) and Appendix B to this paper for details. However, consistently with our stated goal, we discuss the implications of the four conditions for bandwidth choice. When dealing with this choice, the main technical issue is now that the rate of growth of $M$ depends on $h_{M,T}^{dr}$, which is what one needs to find optimally, as well as on $L_v(T,a)$, which is unknown and whose estimates depend on $h_{M,T}^{dr}$. This is an important difference from the observable case in which all $N$ observations are used. In the drift case, one may consider optimizing over both $M^{dr}$ and $h_{M,T}^{dr}$. Similarly, in the diffusion case one might wish to optimize over $M^{dif}$ and $h_{M,T}^{dif}$. We leave this issue for future work and take the following approach to the problem.

As said, it is natural for applied researchers to employ $N$ high-frequency observations to identify spot variance before using $M$ lower frequency data (on $\hat v^2$) to evaluate the dynamics. To this extent, assume $M$ and $N$ are fixed (with $M<N$). It can be shown (see Appendix B) that, for the drift, the implied
bandwidth condition becomes:

$$h_{M,T}^{dr}\,L_v(T,a)\,\underbrace{\left(N^{\frac{2\xi}{2\xi+1}}\,\frac{T^{\frac{2\xi}{1+2\xi}-1}}{M^{2}}\right)^{1/2}}_{\phi_{N,T,M}}\overset{a.s.}{\to}\infty,\tag{21}$$

where $\xi\le\frac12$.[7] As for the diffusion:

$$\frac{h_{M,T}^{dif}\,L_v(T,a)}{\delta_{T,M}}\,\underbrace{\left(N^{\frac{2\xi}{2\xi+1}}\,\frac{T^{\frac{2\xi}{1+2\xi}-1}}{M^{4}}\,T^{2}\right)^{1/2}}_{\psi_{N,T,M}}\overset{a.s.}{\to}\infty,\tag{22}$$

with $\delta_{T,M}=T/M$. In general (i.e., for empirically reasonable values of $N$, $M$, $T$), it is easy to see that $\phi_{N,T,M}<1$ and $\psi_{N,T,M}<1$. Hence, Eq. (21) and Eq. (22) are more stringent conditions than $h_{M,T}^{dr}L_v(T,a)\overset{a.s.}{\to}\infty$ and $\frac{h_{M,T}^{dif}L_v(T,a)}{\delta_{T,M}}\overset{a.s.}{\to}\infty$ with probability one. This observation leads to the following tests.
If $\frac{M}{T^{17/5}}\to\infty$ and $N$, $M$, $T$ are such that $\phi_{N,T,M}\to 0$ and $T^{\left[1-\alpha\left(\frac15+c\right)\right]}\phi_{N,T,M}\to\infty$ (with $c>0$, where $T^{\alpha}$ is the local time's divergence rate), then

$$H_0^{dr}:\ \hat h_{M,T}^{dr,(5-\varepsilon)}\hat L_v(T,a)\overset{a.s.}{\to}\infty\quad\text{or}\quad\max\left\{\frac{1}{\phi_{N,T,M}\,\hat h_{M,T}^{dr}\hat L_v(T,a)},\ \frac{\delta_{M,T}^{1/2}\log^{1/2}(1/\delta_{M,T})}{\hat h_{M,T}^{dr}\hat L_v(T,a)}\right\}\overset{a.s.}{\to}\infty.$$

If $\frac{M^{5}}{T}\to\infty$ and $N$, $M$, $T$ are such that $\psi_{N,T,M}\to 0$ and $M^{\left[1-\left(\frac15+c\right)\right]}T^{\left[1-\left(\frac15+c\right)\right](\alpha-1)}\psi_{N,T,M}\to\infty$ (with $c>0$, where $T^{\alpha}$ is the local time's divergence rate), then

$$H_0^{dif}:\ \frac{\hat h_{M,T}^{dif,(5-\varepsilon)}\hat L_v(T,a)}{\delta_{M,T}}\overset{a.s.}{\to}\infty\quad\text{or}\quad\max\left\{\frac{1}{\psi_{N,T,M}\,\hat h_{M,T}^{dif}\hat L_v(T,a)},\ \frac{\delta_{M,T}^{1/2}\log^{1/2}(1/\delta_{M,T})}{\hat h_{M,T}^{dif}\hat L_v(T,a)}\right\}\overset{a.s.}{\to}\infty.$$

6 Multivariate diffusion processes
We now turn to multidimensional diffusions. Let $X_t=(X_{1,t},\ldots,X_{d,t})^{\intercal}$ and consider the stochastic differential equation

$$dX_t=\mu(X_t)dt+\sigma(X_t)dW_t,$$

where $\mu(\cdot)$ and $\sigma(\cdot)$ are matrix functions satisfying the regularity conditions for the existence of a recurrent solution in BM (2004) and $\{W_t:t=1,\ldots,T\}$ is a (conformable) standard Brownian vector. Let the diffusion matrix $\Sigma(a)$ be defined as $\Sigma(a)=\sigma(a)\sigma(a)^{\intercal}$ for $a=(a_1,\ldots,a_d)$.

7. These conditions allow for the use of market microstructure noise-robust spot variance estimators. BR (2008) propose noise-robust spot variance estimators with a rate of convergence equal to $k_N=\left(T_N\Delta_{N,T}^{-1}\right)^{\xi}$ for some $\xi\le\frac12$. As in the case of realized variance (above), these estimators may be derived from robust integrated variance estimators (such as the two-scale estimator of Zhang et al., 2005, and the class of kernel estimates suggested by Barndorff-Nielsen et al., 2008b) by localizing the integrated estimates in time. Their asymptotic properties (studied in BR, 2008) reveal that $\xi$ is, for instance, equal to $1/10$ (in the case of the two-scale estimator) or $1/6$ in the case of flat-top kernel estimates obtained by virtue of kernels $g(\cdot)$ satisfying $g'(1)=0$ and $g'(0)=0$. For realized variance in Eq. (20), $\xi=\frac12$.
Suppose we observe $X_{\Delta_{N,T}},X_{2\Delta_{N,T}},\ldots,X_{N\Delta_{N,T}}$ with $\Delta_{N,T}=\frac{T}{N}$. Specifically, assume there is a frequency at which synchronized observations may be observed for all processes. This is standard for estimation methods relying on low-frequency observations. In principle, however, we could allow for observations recorded at random, asynchronous times and, therefore, use high-frequency data for estimation. This could be done, for example, by employing the refresh time approach advocated by Barndorff-Nielsen, Hansen, Lunde and Shephard (2008a). The use of refresh times, however, would require important, additional technicalities due to their randomness. In particular, it would require an extension of existing asymptotic (mixed) normal results for drift and infinitesimal variance estimators (in Proposition 3 below) to the case of random times.
We define Nadaraya-Watson estimators of the drift vector and covariance matrix by writing

$$\hat\mu_{N,T}(a)=\frac{1}{\Delta_{N,T}}\frac{\sum_{j=1}^{N-1}\mathbf K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dr}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)}{\sum_{j=1}^{N}\mathbf K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dr}}\right)}$$

and

$$\hat\Sigma_{N,T}(a)=\frac{1}{\Delta_{N,T}}\frac{\sum_{j=1}^{N-1}\mathbf K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dif}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)\left(X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}\right)^{\intercal}}{\sum_{j=1}^{N}\mathbf K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}^{dif}}\right)},$$

where the kernel $\mathbf K\left(\frac{X_{j\Delta_{N,T}}-x}{h_{N,T}}\right)=\prod_{i=1}^{d}K\left(\frac{X_{i,j\Delta_{N,T}}-x_i}{h_{i,N,T}}\right)$ is a product kernel and $K(\cdot)$ is defined in the same manner as in Assumption 2. We denote by $h_{N,T}$ the matrix bandwidth $\left(h_{1,N,T}^{dr},\ldots,h_{d,N,T}^{dr},h_{1,N,T}^{dif},\ldots,h_{d,N,T}^{dif}\right)$ belonging to the set $H\subseteq\mathbb R_{+}^{2d}$.
In the multivariate case, local time is not defined. However, the averaged kernel

$$\hat L_X(T,a)=\frac{\Delta_{N,T}}{\prod_{i=1}^{d}h_{i,N,T}}\sum_{j=1}^{N}\mathbf K\left(\frac{X_{j\Delta_{N,T}}-a}{h_{N,T}}\right)$$

will still provide an estimate of the occupation density of the process (while, at the same time, inheriting its divergence rate), as discussed in BM (2004). Naturally, the divergence rate of the occupation density plays a role in the characterization of the bandwidth conditions for both the drift and the diffusion matrix.
Proposition 3 (BM, 2004): Let Assumptions 1 and 2 in BM (2004) hold. Assume $T,N\to\infty$ and $\Delta_{N,T}\to 0$. Assume, for all $i$, $h_{i,N,T}\to 0$ and

$$\frac{\left(\Delta_{N,T}\log(1/\Delta_{N,T})\right)^{1/2}}{\prod_{i=1}^{d}h_{i,N,T}}\to 0.$$

Then,

$$\frac{\hat L_X(T,a)}{v(1/T)}\Rightarrow C_X\,e(a)\,g_{\beta},$$

where the function $v(1/T)$ is regularly-varying at infinity with process-specific parameter $\beta$ satisfying $0\le\beta\le 1$, $g_{\beta}$ is used here to denote the Mittag-Leffler random variable with the same process-specific parameter $\beta$, and $C_X$ is a process-specific constant.
The drift estimator

If, for all $i$, $h_{i,N,T}^{dr}\to 0$, $\left(\prod_{i=1}^{d}h_{i,N,T}^{dr}\right)v(1/T)\to\infty$, and

$$\frac{v(1/T)}{\prod_{i=1}^{d}h_{i,N,T}^{dr}}\left(\Delta_{N,T}\log\frac{1}{\Delta_{N,T}}\right)^{1/2}\to 0,$$

then

$$\hat\mu_{N,T}(a)-\mu(a)\overset{a.s.}{\to}0.$$

If, in addition, for all $j$, $h_{j,N,T}^{dr,5}\left(\prod_{i\ne j}^{d}h_{i,N,T}^{dr}\right)v(1/T)\to 0$, then

$$\left(\left(\prod_{i=1}^{d}h_{i,N,T}^{dr}\right)\hat L_X(T,a)\right)^{1/2}\left(\hat\mu_{N,T}(a)-\mu(a)\right)\Rightarrow\sigma(a)\,N\left(0,K_d^2\,I_d\right),$$

where $I_d$ is a $d\times d$ identity matrix.

The diffusion estimator

If, for all $i$, $h_{i,N,T}^{dif}\to 0$, $\frac{\left(\prod_{i=1}^{d}h_{i,N,T}^{dif}\right)v(1/T)}{\Delta_{N,T}}\to\infty$, and

$$\frac{v(1/T)}{\prod_{i=1}^{d}h_{i,N,T}^{dif}}\left(\Delta_{N,T}\log\frac{1}{\Delta_{N,T}}\right)^{1/2}\to 0,$$

then

$$\hat\Sigma_{N,T}(a)-\Sigma(a)\overset{a.s.}{\to}0.$$

If, in addition, for all $j$,

$$h_{j,N,T}^{dif,5}\,\frac{\left(\prod_{i\ne j}^{d}h_{i,N,T}^{dif}\right)v(1/T)}{\Delta_{N,T}}\to 0,$$

then

$$\sqrt{\frac{\left(\prod_{i=1}^{d}h_{i,N,T}^{dif}\right)\hat L_X(T,a)}{\Delta_{N,T}}}\,\mathrm{vech}\left(\hat\Sigma_{N,T}(a)-\Sigma(a)\right)\Rightarrow V(a)^{1/2}\,N\left(0,K_d^2\,I_d\right),$$

with $V(a)=P_D\left(2\,\Sigma(a)\otimes\Sigma(a)\right)P_D^{\prime}$, where $P_D$ is so that $\mathrm{vech}\,\Sigma(a)=P_D\,\mathrm{vec}\,\Sigma(a)$.
We now turn to the first step of our bandwidth selection procedure. For $i=2,\ldots,\Delta_{N,T}^{-1}T$, define the inner product of the residual process:

$$\hat\varepsilon_{i\Delta_{N,T}}^{\intercal}\hat\varepsilon_{i\Delta_{N,T}}=\frac{1}{\Delta_{N,T}}\left(X_{(j+1)\Delta_{N,T}}^{*}\right)^{\intercal}\hat\Sigma_{N,T}^{-1}\left(X_{(j+1)\Delta_{N,T}}\right)X_{(j+1)\Delta_{N,T}}^{*},$$

where $X_{(j+1)\Delta_{N,T}}^{*}=X_{(j+1)\Delta_{N,T}}-X_{j\Delta_{N,T}}-\hat\mu_{N,T}\left(X_{j\Delta_{N,T}}\right)\Delta_{N,T}$. Now write:

$$\hat h_{N,T}=\arg\min_{h\in\mathcal H_{N,T}}\ \sup_{u\in D^{+}}\left|\frac{1}{N-1}\sum_{i=2}^{N}1\left\{\hat\varepsilon_{i\Delta_{N,T}}^{\intercal}\hat\varepsilon_{i\Delta_{N,T}}\le u\right\}-\Phi(u)\right|$$

and

$$\mathcal H_{N,T}=\left\{h\in H\subseteq\mathbb R_{+}^{2d}:\ \sup_x\left|F_N^{h}(x)-\Phi(x)\right|\overset{p}{\to}0\ \text{as}\ N,T\to\infty,\ \Delta_{N,T}\to 0\right\},$$

where $\Phi(u)=\Pr\left(\chi_d^2\le u\right)$, i.e., the cumulative distribution function of a chi-squared random variable with $d$ degrees of freedom. Note that
$$\hat\varepsilon_{i\Delta_{N,T}}=\varepsilon_{i\Delta_{N,T}}+\hat\Sigma_{N,T}^{-1/2}\left(X_{(j+1)\Delta_{N,T}}\right)\left(\mu\left(X_{(j+1)\Delta_{N,T}}\right)-\hat\mu_{N,T}\left(X_{(j+1)\Delta_{N,T}}\right)\right)\sqrt{\Delta_{N,T}}+\left(\hat\Sigma_{N,T}^{-1/2}\left(X_{(j+1)\Delta_{N,T}}\right)\Sigma^{1/2}\left(X_{(j+1)\Delta_{N,T}}\right)-I_d\right)\varepsilon_{i\Delta_{N,T}},$$

where $\varepsilon_{i\Delta_{N,T}}=\Sigma^{-1/2}\left(X_{(j+1)\Delta_{N,T}}\right)\frac{X_{(j+1)\Delta_{N,T}}^{*}}{\sqrt{\Delta_{N,T}}}$ is i.i.d., and so $\varepsilon_{i\Delta_{N,T}}^{\intercal}\varepsilon_{i\Delta_{N,T}}\sim\chi_d^2$. Hence, as $N,T\to\infty$ and $\Delta_{N,T}\to 0$, $\hat h_{N,T}-h_{N,T}\overset{p}{\to}0$ if, and only if, by the same arguments as in Theorems 1 and 2,

$$\sup_{a\in D^{d}}\left|\hat\mu_{N,T}\left(a;h_{N,T}^{dr}\right)-\mu(a)\right|=o_p\left(\frac{1}{\sqrt{\Delta_{N,T}}}\right)\tag{23}$$

and

$$\sup_{a\in D^{d}}\left|\mathrm{vech}\left(\hat\Sigma_{N,T}\left(a;h_{N,T}^{dif}\right)-\Sigma(a)\right)\right|=o_p(1).\tag{24}$$
In the second step, we need to check whether $h^{dr}_{N,T}$ is small enough as to satisfy

(i) $\displaystyle \max_j\, h^{5,dr}_{j,N,T} \prod_{i \neq j} h^{dr}_{i,N,T}\; \widehat{L}^{dr}_X(T,a) \overset{a.s.}{\to} 0 \quad \forall a \in D^d$

and large enough as to satisfy

(ii) $\displaystyle \frac{\prod_{i=1}^d h^{dr}_{i,N,T}}{\left( \Delta_{N,T}\log(1/\Delta_{N,T}) \right)^{1/2}}\; \widehat{L}^{dr}_X(T,a) \overset{a.s.}{\to} \infty \quad \forall a \in D^d.$

Similarly, we need to check whether $h^{dif}_{N,T}$ is small enough as to satisfy $\max_j h^{5,dif}_{j,N,T} \prod_{i \neq j} h^{dif}_{i,N,T}\, \widehat{L}^{dr}_X(T,a)/\Delta_{N,T} \overset{a.s.}{\to} 0$ $\forall a \in D^d$ and large enough as to satisfy

(iii) $\displaystyle \frac{\prod_{i=1}^d h^{dif}_{i,N,T}}{\left( \Delta_{N,T}\log(1/\Delta_{N,T}) \right)^{1/2}}\; \widehat{L}^{dif}_X(T,a) \overset{a.s.}{\to} \infty \quad \forall a \in D^d.$

Let us begin with the drift estimator. Without any restriction on the relative (almost sure) order of the various bandwidths, we cannot ensure that there is a vector $h_{N,T}$ such that, whenever (i) is violated, (ii)-(iii) cannot be violated. This may happen when $\max_j h^{5,dr}_{j,N,T} \prod_{i \neq j} h^{dr}_{i,N,T}\, \widehat{L}^{dr}_X(T,a) \overset{a.s.}{\to} \infty$ but $\min_j h^{5,dr}_{j,N,T} \prod_{i \neq j} h^{dr}_{i,N,T}\, \widehat{L}^{dr}_X(T,a) \overset{a.s.}{\to} 0$. Broadly speaking, (ii)-(iii) only depend on the product of the bandwidths, while (i) depends both on the product and on the individual bandwidths. Therefore, in order to ensure the existence of bandwidths satisfying all conditions, we need to impose some restrictions on the degree of "heterogeneity" of their almost sure order. We require that, for all $j$, $h^{dr}_{j,N,T} = O_{a.s.}\big( \big( \prod_{i \neq j} h^{dr}_{i,N,T} \big)^{1/(d-1)} \big)$, so that the bandwidths can differ from each other but are of the same almost sure order. Given that, whenever (i) is violated, $\prod_{i=1}^d h^{dr}_{i,N,T}$ approaches zero almost surely at a rate equal to or slower than $\widehat{L}^{dr}_X(T,a)^{-\frac{d}{d+4}}$, and $\prod_{i \neq j} h^{dr}_{i,N,T}$ cannot approach zero at a rate faster than $\widehat{L}^{dr}_X(T,a)^{-\frac{d-1}{d+4}}$, it is immediate to see that (ii) is trivially satisfied, while (iii) writes as
$$\frac{1}{\left( \Delta_{N,T}\log(1/\Delta_{N,T}) \right)^{1/2}}\; \widehat{L}^{dr,-\frac{2d+4}{d+4}}_X(T,a)\; \widehat{L}^{dif}_X(T,a) \overset{a.s.}{\to} \infty,$$
which holds provided $N/T^{\frac{5d+12}{d+4}} \to \infty$. Imposing the restriction $h^{dif}_{j,N,T} = O_{a.s.}\big( \big( \prod_{i \neq j} h^{dif}_{i,N,T} \big)^{1/(d-1)} \big)$, by an analogous argument, we see that, whenever $\max_j h^{5,dif}_{j,N,T} \prod_{i \neq j} h^{dif}_{i,N,T}\, \widehat{L}^{dr}_X(T,a)/\Delta_{N,T} \overset{a.s.}{\to} 0$ is violated, (iii) is satisfied provided $N/T^{\frac{3d+12}{4-d}} \to 0$. Thus, if we wished to allow for $d > 3$, we would need to rely on higher-order kernels.
Testing can now be conducted as in the scalar case. However, should we reject, contrary to the scalar case, we would not have a clear-cut indication of which particular bandwidth should be made larger or smaller. In spite of this, we do have information about whether we need to increase or decrease $\prod_{i=1}^d h^{dif}_{i,N,T}$ and/or $\prod_{i=1}^d h^{dr}_{i,N,T}$. Future work should focus on methods to adjust individual bandwidths iteratively.
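For intuition only, the two drift conditions can be evaluated in-sample once an estimate of the occupation density is available. The sketch below is ours, not the paper's: the function name, inputs, and illustrative numbers are assumptions, and the quantities computed are finite-sample proxies for the limits restricted by conditions (i) and (ii).

```python
import numpy as np

def drift_bandwidth_checks(h, L_hat, delta):
    """In-sample proxies for the drift-bandwidth conditions at one evaluation point.

    h     : per-coordinate drift bandwidths h_j (length d)
    L_hat : estimated occupation density \\hat L_X(T, a) at the point
    delta : sampling interval Delta_{N,T}
    """
    h = np.asarray(h, dtype=float)
    prod_h = np.prod(h)
    # (i): max_j h_j^5 * prod_{i != j} h_i * L_hat -- should be near zero (undersmoothing)
    small = max(h[j] ** 4 for j in range(len(h))) * prod_h * L_hat
    # (ii): prod_j h_j * L_hat / sqrt(delta * log(1/delta)) -- should be large
    large = prod_h * L_hat / np.sqrt(delta * np.log(1.0 / delta))
    return small, large

s, l = drift_bandwidth_checks([0.1, 0.12], L_hat=250.0, delta=0.01)
```

A joint reading of the two outputs mimics the logic of the second step: a bandwidth vector is acceptable only when the first quantity is driven toward zero while the second diverges.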
7 Simulations
Two data-generating processes are considered:
$$(1)\quad dX_t = (0.1320 - 1.5918\,X_t)\,dt + 2X_t^{1.49}\,dW_t, \qquad X_0 = 0.08;$$
$$(2)\quad dX_t = (0.02 - 0.025\,X_t)\,dt + 0.14\,X_t^{1/2}\,dW_t, \qquad X_0 = 0.6.$$
The first process has been used to model short-term interest rates. The second process has been used for stochastic volatility modelling. Both are highly persistent.
The standard normal density $\phi(u)$ is chosen as the kernel function for the drift and diffusion estimates, $K(u) = \phi(u)$, $U = [-1.5, 2.5]$, $\varepsilon = 0.1$, $R \in \{40, 80, 130\}$, the sample size $N$ is set equal to $1{,}000$, and the time horizon $T \in \{50, 100, 150\}$. The simulations include a burn-in period of 200 observations. All tests are performed at the 95% level.
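To fix ideas, the two designs can be simulated with a standard Euler-Maruyama discretization. This is a sketch only: the paper does not spell out its simulation scheme, so the seed, the step construction, and the crude positivity/no-explosion clamp below are our assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler(mu, sigma, x0, T, n, burn=200):
    """Euler-Maruyama path of dX = mu(X)dt + sigma(X)dW, discarding `burn` initial steps."""
    dt = T / n

    def step(x):
        x = x + mu(x) * dt + sigma(x) * np.sqrt(dt) * rng.standard_normal()
        return min(max(x, 1e-8), 5.0)  # crude positivity / no-explosion clamp (our addition)

    x = x0
    for _ in range(burn):  # burn-in period, discarded
        x = step(x)
    path = np.empty(n + 1)
    path[0] = x
    for i in range(1, n + 1):
        x = step(x)
        path[i] = x
    return path

# DGP (1): short-rate model; DGP (2): stochastic-volatility model
p1 = euler(lambda x: 0.1320 - 1.5918 * x, lambda x: 2.0 * x ** 1.49, 0.08, T=50.0, n=1000)
p2 = euler(lambda x: 0.02 - 0.025 * x, lambda x: 0.14 * np.sqrt(x), 0.6, T=50.0, n=1000)
```

With $N = 1{,}000$ and $T = 50$ this reproduces one sampling configuration from the design above.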
To be completed.
8 Further discussions and conclusions
This paper provides an automated procedure to jointly select all bandwidths needed to identify the dynamics of popular classes of continuous-time Markov processes. It also proposes a randomized method designed to test whether the rate conditions for almost-sure consistency and (zero-mean) asymptotic normality of the moment estimates are satisfied in sample. Our approach is valid even in the presence of jumps or microstructure error.
Below, we outline how to apply our procedure to bandwidth selection for discrete-time Markov processes. Further, we discuss the usefulness of the second step in a variety of nonparametric estimation settings, without requiring Markovianity of the underlying process.
8.1 The discrete-time case
Our methods are applicable to the recurrent discrete-time kernel case. Although $\varepsilon_t = (y_t - \mu(X_t))/\sigma(X_t)$ is not necessarily locally Gaussian, it is immediate to see that, e.g., $E(\varepsilon_t) = 0$, $E(\varepsilon_t^2) = 1$, $E(\varepsilon_t g(X_t)) = 0$, and $E(\varepsilon_t^2 g(X_t)) = E(g(X_t))$ for any function $g$ which is $\mathcal{F}_X$-measurable. Hence, one can select the bandwidth(s) so as to minimize the distance between sample moments of the residuals and their theoretical counterparts. Indeed, the problem would be somewhat easier in discrete time. First, the initial criterion would yield uniform consistency of both conditional moments since, differently from our assumed continuous-time framework, the two moments would converge at the same rate (i.e., $\sqrt{h_n \widehat{L}_n(x)}$, where $\widehat{L}_n(x)$ is, as earlier, the empirical occupation density of the underlying discrete-time process). Second, the bandwidth conditions to be tested would closely resemble those for the drift (in Proposition 1). Importantly, however, the condition on the modulus of continuity of Brownian motion (which is crucial in our case to approximate the continuous sample path of the process with its discrete-time analogue and yield almost-sure consistency) would not be needed. Hence, the second-step procedure would simply amount to testing whether, in-sample, the selected bandwidths are proportional to $\widehat{L}_n(x)^{-\delta}$ with $1/5 < \delta < 1$.
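The moment-matching idea above can be sketched on toy discrete-time data. Everything below is illustrative and ours: the data-generating process, the Gaussian Nadaraya-Watson smoother, and the particular distance (matching $E(\varepsilon)=0$, $E(\varepsilon^2)=1$, $E(\varepsilon X)=0$) are assumptions, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete-time data (ours): y_t = m(X_t) + s(X_t) * eps_t
n = 500
x = rng.standard_normal(n)
y = np.sin(x) + 0.5 * np.exp(-x ** 2) * rng.standard_normal(n)

def nw(x0, xs, z, h):
    """Nadaraya-Watson estimate of E[z | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x0[:, None] - xs[None, :]) / h) ** 2)
    return (w * z).sum(axis=1) / w.sum(axis=1)

def moment_distance(h):
    m = nw(x, x, y, h)                      # conditional mean
    s2 = nw(x, x, (y - m) ** 2, h)          # conditional variance
    eps = (y - m) / np.sqrt(np.maximum(s2, 1e-12))
    # distance from E(eps)=0, E(eps^2)=1, E(eps * g(X))=0 with g(X)=X
    return abs(eps.mean()) + abs((eps ** 2).mean() - 1.0) + abs((eps * x).mean())

grid = [0.05, 0.1, 0.2, 0.4, 0.8]
h_star = min(grid, key=moment_distance)
```

The grid search stands in for the optimization over $\mathbf{H}$; any bandwidth driving the residual moments toward their theoretical values is a first-stage candidate.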
8.2 More on the second-step method
In both the stationary and the nonstationary case, irrespective of whether we operate in continuous or in discrete time, the bandwidth conditions needed for consistency and (zero-mean) asymptotic normality of kernel estimators can be expressed as functions of the process's occupation density (and its divergence rate). Even in cases for which the divergence rate of the occupation density can be quantified in closed form (the stationary case, for example, for which it is $T$), relying on an in-sample assessment of the process's occupation density, rather than on purely hypothetical divergence rates, is bound to provide a more objective evaluation of the accuracy of bandwidth choices (particularly for persistent processes). Our second-step procedure is designed to explicitly achieve this goal.
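The in-sample object on which this assessment rests is the kernel occupation-density (local-time) estimate of the form $\widehat{L}_X(T,a) = (\Delta_{N,T}/h)\sum_i K((X_{i\Delta_{N,T}} - a)/h)$ used in this literature. A minimal sketch, on toy data of our own making:

```python
import numpy as np

def occupation_density(path, a, T, h):
    """\\hat L_X(T, a) = (Delta_{N,T}/h) * sum_i K((X_i - a)/h), Gaussian kernel."""
    delta = T / len(path)
    k = np.exp(-0.5 * ((path - a) / h) ** 2) / np.sqrt(2.0 * np.pi)
    return (delta / h) * k.sum()

rng = np.random.default_rng(2)
x = np.cumsum(0.1 * rng.standard_normal(2000))   # discretized driftless path (toy, ours)
L0 = occupation_density(x, 0.0, T=20.0, h=0.3)
```

Since the kernel integrates to one, integrating the estimate over the state space recovers the time span $T$, which is a useful sanity check in applications.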
Importantly, however, our testing method may be disconnected from the first-stage method and applied to smoothing sequences selected by virtue of alternative, possibly more classical, methods of the kind routinely used in applied work. More generally, our test (and its logic) may, in principle, be extended to evaluate choices in functional econometrics requiring the balancing of an asymptotic (and finite-sample) trade-off between bias and variance. The number of sieves or the number of autocovariances in HAC estimation are possible examples. In these contexts, a test (like the one proposed in this paper) which, under the null, implies that the assumed number is either too low or too high and which provides, if the null is not rejected, an easy quantitative rule to adjust the initial selection in either direction appears to be appealing.
9 Appendix A
Proof of Theorem 1. Assume $h_{N,T} \in \mathbf{H}$ exists and satisfies
$$\sup_x \left| F^h_N(x) - \Phi(x) \right| \overset{p}{\longrightarrow} 0 \quad \text{as } N,T \to \infty,\ \Delta_{N,T} \to 0. \qquad (25)$$
Using the triangle inequality, write
$$\sup_x \left| F^h_N(x) - \Phi(x) \right| \leq \sup_x \left| F^h_N(x) - F_N(x) \right| + \sup_x \left| F_N(x) - \Phi(x) \right|,$$
where $F_N(x)$ is the empirical distribution function of
$$\varepsilon_{i\Delta_{N,T}} = \frac{X_{i\Delta_{N,T}} - X_{(i-1)\Delta_{N,T}} - \mu(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}}, \qquad i = 2, \ldots, N.$$
An application of the Law of the Iterated Logarithm implies $\sup_x |F_N(x) - \Phi(x)| = o_p(1)$. The result in Eq. (25) combined with $\sup_x |F_N(x) - \Phi(x)| = o_p(1)$ yields
$$\sup_x \left| F^h_N(x) - F_N(x) \right| = o_p(1).$$
But
$$\sup_x \left| F^h_N(x) - F_N(x) \right| = \sup_x \left| \frac{1}{N-1}\sum_{i=2}^N \mathbf{1}\!\left\{ \frac{X_{i\Delta_{N,T}} - X_{(i-1)\Delta_{N,T}} - \widehat{\mu}_{N,T}(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\widehat{\sigma}_{N,T}(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}} \leq x \right\} - \frac{1}{N-1}\sum_{i=2}^N \mathbf{1}\!\left\{ \frac{X_{i\Delta_{N,T}} - X_{(i-1)\Delta_{N,T}} - \mu(X_{(i-1)\Delta_{N,T}})\Delta_{N,T}}{\sigma(X_{(i-1)\Delta_{N,T}})\sqrt{\Delta_{N,T}}} \leq x \right\} \right| = o_p(1)$$
gives
$$\sup_{a \in D} \left| \widehat{\mu}_{N,T}(a) - \mu(a) \right| = o_p(1) \quad \text{and} \quad \sup_{a \in D} \left| \frac{\widehat{\sigma}_{N,T}(a)}{\sigma(a)} - 1 \right| = o_p(1).$$
The converse follows from
$$\sup_x \left| F^h_N(x) - \Phi(x) \right| \leq \sup_x \left| F^h_N(x) - F_N(x) \right| + \sup_x \left| F_N(x) - \Phi(x) \right|. \qquad \square$$
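The criterion at the heart of Theorem 1 — pick the bandwidth whose standardized increments are closest, in Kolmogorov distance, to a standard normal — can be illustrated on toy data. The sketch below is ours: for transparency it replaces the kernel drift/diffusion estimates with a pure scale candidate, so "bandwidth selection" reduces to recovering the scale that makes the increments $N(0,1)$.

```python
import numpy as np
from math import erf, sqrt

def ks_to_normal(eps):
    """sup_x |F_N(x) - Phi(x)|: Kolmogorov distance between the EDF of eps and N(0,1)."""
    e = np.sort(np.asarray(eps))
    n = len(e)
    phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in e])
    hi = np.arange(1, n + 1) / n   # EDF just after each order statistic
    lo = np.arange(0, n) / n       # EDF just before each order statistic
    return max(np.abs(hi - phi).max(), np.abs(lo - phi).max())

rng = np.random.default_rng(3)
T, n, sigma = 10.0, 4000, 0.3
dx = sigma * np.sqrt(T / n) * rng.standard_normal(n)   # increments of dX = sigma dW (toy, ours)
grid = [0.1, 0.2, 0.3, 0.5, 1.0]
# standardize the increments with each candidate scale and keep the most Gaussian one
best = min(grid, key=lambda h: ks_to_normal(dx / (h * np.sqrt(T / n))))
```

The selected candidate is the one for which the empirical distribution of the standardized increments is indistinguishable (up to sampling error) from $\Phi$.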
Proof of Theorem 2. Assume $h_{N,T} \in \mathbf{H}$ exists. Let $\eta(\cdot, \varepsilon) \subset \mathbf{H}$ be an open ball of radius $\varepsilon$. Then, from Eq. (8) and Eq. (7), $\forall \varepsilon > 0$, $\exists \delta > 0$:
$$P\left( \widehat{h}_{N,T} \notin \eta(h_{N,T}, \varepsilon) \right) \leq P\left( \sup_x \left| F^{h_{N,T}}_N(x) - \Phi(x) \right| > \delta \right) \longrightarrow 0 \quad \text{as } N,T \to \infty,\ \Delta_{N,T} \to 0.$$
This proves the second part of the theorem. Now we need to show that
$$\exists\, h_{N,T} = h \in \mathbf{H} : \sup_x \left| F^h_N(x) - \Phi(x) \right| \overset{p}{\to} 0.$$
As $\sup_x |F_N(x) - \Phi(x)| = o_p(1)$, and given the triangle inequality, it suffices to show that
$$\exists\, h_{N,T} = h \in \mathbf{H} : \sup_x \left| F^h_N(x) - F_N(x) \right| \overset{p}{\to} 0.$$
Recalling the definition of $F^h_N(x)$ in the proof of Theorem 1, note that
$$\sup_x \left| F^h_N(x) - F_N(x) \right| \leq \sup_x |Z_N(x,h)| + \sup_x |H_N(x,h)|,$$
where
$$Z_N(x,h) = \frac{1}{N-1}\sum_{i=2}^N \Big( \mathbf{1}\{ \varepsilon_{i\Delta_{N,T}} \leq x + \delta_{(i-1)\Delta_{N,T}}(x) \} - \Phi\big( x + \delta_{(i-1)\Delta_{N,T}}(x) \big) - \big( \mathbf{1}\{ \varepsilon_{i\Delta_{N,T}} \leq x \} - \Phi(x) \big) \Big)$$
and
$$H_N(x,h) = \frac{1}{N-1}\sum_{i=2}^N \Big( \Phi\big( x + \delta_{(i-1)\Delta_{N,T}}(x) \big) - \Phi(x) \Big),$$
with
$$\delta_{(i-1)\Delta_{N,T}}(x) = x\left( \frac{\sigma(X_{(i-1)\Delta_{N,T}})}{\widehat{\sigma}_{N,T}(X_{(i-1)\Delta_{N,T}})} - 1 \right) + \frac{\big( \mu(X_{(i-1)\Delta_{N,T}}) - \widehat{\mu}_{N,T}(X_{(i-1)\Delta_{N,T}}) \big)\sqrt{\Delta_{N,T}}}{\widehat{\sigma}_{N,T}(X_{(i-1)\Delta_{N,T}})}.$$
We start by bounding $\sup_x |H_N(x,h)|$. By the mean-value theorem, letting $\bar{x}(x)$ be a value on the line segment connecting $x$ and $x + \delta(x)$,
$$\sup_x |H_N(x,h)| \leq \sup_x \max_i \left| x\left( \frac{\sigma(X_{(i-1)\Delta_{N,T}})}{\widehat{\sigma}_{N,T}(X_{(i-1)\Delta_{N,T}})} - 1 \right) \right| \phi(\bar{x}(x)) + \sup_x \max_i \frac{\left| \mu(X_{(i-1)\Delta_{N,T}}) - \widehat{\mu}_{N,T}(X_{(i-1)\Delta_{N,T}}) \right| \sqrt{\Delta_{N,T}}}{\widehat{\sigma}_{N,T}(X_{(i-1)\Delta_{N,T}})}\, \phi(\bar{x}(x)) = I_{N,T} + II_{N,T}.$$
We begin by considering $II_{N,T}$. Replacing $\widehat{\mu}_{N,T}$ with its kernel definition and decomposing the drift estimation error into a time-discretization component and a martingale component yields
$$II_{N,T} = O_p\left( \sqrt{\Delta_{N,T}}\, \frac{\sqrt{\Delta_{N,T}\log(1/\Delta_{N,T})}}{h^{dr}_{N,T}} \right) + O_p\left( \sqrt{\Delta_{N,T}}\, \frac{1}{\sqrt{h^{dr}_{N,T}\, \sup_x \bar{L}_T(x)}} \right) = o_p(1).$$
Also, $\sup_x \max_i |x(\sigma/\widehat{\sigma} - 1)|\phi(\bar{x}(x))$ is bounded, so that $I_{N,T} = O_p\big( 1/\sqrt{h^{dr}_{N,T}\, \sup_x \bar{L}_T(x)} \big) = o_p(1)$, by a similar argument as that in the proof of Theorem 1 in Corradi and Distaso (2008). Finally, $\sup_x |Z_N(x,h)| = o_p(1)$ by a similar argument as in the proof of Theorem 1 in Bai (1994) and Theorem 2.1 in Lee and Wei (1999). $\square$
Proof of Theorem 3. We begin with (i). Suppose that $V_{R,N,T} = \widetilde{V}_{1,R,N,T}$. Without loss of generality, we assume that $\widehat{L}_X(T,a)$ diverges at least at rate $\log T$ $\forall a \in D$. (In fact, we could allow it to diverge at rate $\log(\log T)$ simply by using $\exp\big( \exp\big( \sup_{a \in D} \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a) \big) \big)$ instead of $\exp\big( \sup_{a \in D} \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a) \big)$ in (9).) First note that, for all $j$, conditional on the sample, $v_{1,j,N,T} \simeq N\Big( 0,\, \exp\Big( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \Big) \Big)$. Let
$$\Omega_{N,T} = \left\{ \omega : T^{-\gamma} \exp\left( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \right) > \delta \right\}$$
for $\gamma$ arbitrarily large, so that, under $H_0$, $P(\lim_{N,T\to\infty} \Omega_{N,T}) = 1$. We shall proceed conditional on $\omega \in \Omega_{N,T}$. For any $u \in U$, assuming, without loss of generality, $u > 0$, we obtain
$$V_{R,N,T}(u) = \frac{2}{\sqrt{R}}\sum_{j=1}^R \left( \mathbf{1}\{v_{1,j,N,T} \leq u\} - E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} \right) \right) + \frac{2}{\sqrt{R}}\sum_{j=1}^R \left( E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} \right) - \frac{1}{2} \right),$$
where $E(\mathbf{1}\{v_{1,j,N,T} \leq u\}) = 1/2 + P(0 \leq v_{1,j,N,T} \leq u)$. Now, uniformly in $u$,
$$P(0 \leq v_{1,j,N,T} \leq u) = \left( 2\pi \exp\left( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \right) \right)^{-1/2} \int_0^u \exp\left( -\frac{x^2}{2\exp\left( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \right)} \right) dx = O_p\left( T^{-1/2} \right). \qquad (26)$$
Thus,
$$V_{R,N,T}(u) = \frac{2}{\sqrt{R}}\sum_{j=1}^R \left( \mathbf{1}\{v_{1,j,N,T} \leq u\} - E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} \right) \right) + o_p(1)$$
as $R = o(T)$. Given (26), and recalling that $E(v_{1,s,N,T}\, v_{1,j,N,T}) = 0$ for $s \neq j$,
$$Var\left( \frac{1}{\sqrt{R}}\sum_{j=1}^R \left( \mathbf{1}\{v_{1,j,N,T} \leq u\} - E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} \right) \right) \right) = \frac{1}{R}\sum_{j=1}^R E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} - E\left( \mathbf{1}\{v_{1,j,N,T} \leq u\} \right) \right)^2 = 1/4 + O_P\left( T^{-1/2} \right),$$
where the $O_P(T^{-1/2})$ holds uniformly in $u$. Note that the asymptotic variance is equal to $1/4$ regardless of the evaluation point $u$. This is an immediate consequence of the fact that, as $N,T \to \infty$, $\mathbf{1}\{v_{1,j,N,T} \leq u\}$ takes the same value, either 0 or 1, irrespectively of the evaluation point $u$. Hence, $\int_U V^2_{1,R,N,T}(u)\,\pi(u)\,du \overset{d}{\to} \int_U \chi^2_1\, \pi(u)\,du = \chi^2_1$.

We now turn to (ii). Let
$$\Omega^+_{N,T} = \left\{ \omega : \exp\left( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \right) < \bar{\gamma} \right\}, \qquad \bar{\gamma} < \infty,$$
so that, under $H_A$, $P\big( \lim_{N,T\to\infty} \Omega^+_{N,T} \big) = 1$. For $\omega \in \Omega^+_{N,T}$, $\exp\big( \int_A \widehat{h}^{dr,5}_{N,T} \widehat{L}^{\varepsilon}_X(T,a)\, da \big) \overset{a.s.}{\to} M < \infty$. Hence, $v_{1,j,N,T} \overset{d}{\to} N(0, M)$. Let $F(u)$ be the cumulative distribution function of a zero-mean normal random variable with variance $M$. Now,
$$\frac{2}{\sqrt{R}}\sum_{i=1}^R \left( \mathbf{1}\{v_{1,i,N,T} \leq u\} - \frac{1}{2} \right) = \frac{2}{\sqrt{R}}\sum_{i=1}^R \left( \mathbf{1}\{v_{1,i,N,T} \leq u\} - F(u) \right) + 2\sqrt{R}\left( F(u) - \frac{1}{2} \right). \qquad (27)$$
Note that $F(u) = 1/2$ if, and only if, $u = 0$. Thus, $R\int_U \big( 2\big( F(u) - \frac{1}{2} \big) \big)^2 \pi(u)\,du$ diverges to infinity at rate $R$. As for the first term on the right-hand side of Eq. (27), $\int_U \big( \frac{2}{\sqrt{R}}\sum_{i=1}^R (\mathbf{1}\{v_{1,i,N,T} \leq u\} - F(u)) \big)^2 \pi(u)\,du = O_p(1)$ by the same argument used in the proof of Theorem 1(ii) in Corradi and Swanson (2006). $\square$
Proof of Theorem 4. By the same argument as in the proof of Theorem 3. $\square$
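The mechanics of the randomized statistic in the proof of Theorem 3 can be illustrated numerically. The sketch below is ours (toy variance scales, grid, and draw counts): conditional on the sample, $v_j \sim N(0,V)$; when $V$ explodes, the indicators behave like fair coin flips and the integrated squared statistic behaves like a $\chi^2_1$ draw, while a bounded $V$ shifts the indicator means away from $1/2$ and the statistic grows at rate $R$.

```python
import numpy as np

rng = np.random.default_rng(4)

def v_statistic(V, R, us):
    """V_R(u) = (2/sqrt(R)) * sum_j (1{v_j <= u} - 1/2), with v_j ~ N(0, V), on a grid of u."""
    v = rng.normal(0.0, np.sqrt(V), R)
    return np.array([(2.0 / np.sqrt(R)) * (np.where(v <= u, 1.0, 0.0) - 0.5).sum() for u in us])

us = np.linspace(0.1, 2.0, 20)
null_vals = v_statistic(1e8, 200, us)   # exploding conditional variance: indicators ~ Bernoulli(1/2)
alt_vals = v_statistic(1.0, 200, us)    # bounded conditional variance: means shifted to F(u) - 1/2
null_stat = np.mean(null_vals ** 2)     # behaves like a single chi^2_1 draw
alt_stat = np.mean(alt_vals ** 2)       # grows at rate R
```

The averaging over the grid of $u$ values stands in for the integral $\int_U V^2_{R}(u)\pi(u)\,du$ with a uniform weight.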
Proof of Theorem 5. Because the compensator $\lambda(X_t)\, E_y\big[ c(X_t, y) \big]$ can be treated as a component of the drift function, for notational simplicity we denote the jump component by $J_t$ and the jumpless component by $Y_t$. Thus, $X_t = Y_t + J_t$. We first show that
$$\widehat{\sigma}^2_{J,N,T}(a) - \frac{1}{\Delta_{N,T}\, \mu^p_{2/p}}\; \frac{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right) \prod_{i=1}^p \left| Y_{(j+i)\Delta_{N,T}} - Y_{(j+i-1)\Delta_{N,T}} \right|^{2/p}}{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right)} = o_p\left( \sqrt{\frac{\Delta_{N,T}}{h^{dif}_{N,T}\, \widehat{L}_X(T,a)}} \right), \qquad (28)$$
with $\widehat{\sigma}^2_{J,N,T}(a)$ defined as in Eq. (16). Hereafter, let $\Delta Y_{(j+i)\Delta_{N,T}} = Y_{(j+i)\Delta_{N,T}} - Y_{(j+i-1)\Delta_{N,T}}$, and let $\Delta X_{(j+i)\Delta_{N,T}}$ and $\Delta J_{(j+i)\Delta_{N,T}}$ be defined in an analogous manner. Since $|\Delta Y + \Delta J|^r \leq (|\Delta Y| + |\Delta J|)^r \leq |\Delta Y|^r + |\Delta J|^r$ by the triangle inequality, monotonicity, and concavity given $r \leq 1$, it follows that
$$\prod_{i=1}^p \left| \Delta X_{(j+i)\Delta_{N,T}} \right|^{2/p} \simeq_p \prod_{i=1}^p \left| \Delta Y_{(j+i)\Delta_{N,T}} \right|^{2/p} + \iota_{(j+1)\Delta_{N,T}} \prod_{i=2}^p \left| \Delta Y_{(j+i)\Delta_{N,T}} \right|^{2/p} + \ldots + \iota_{(j+p)\Delta_{N,T}} \prod_{i=1}^{p-1} \left| \Delta Y_{(j+i)\Delta_{N,T}} \right|^{2/p}, \qquad (29)$$
where $\iota_{(j+i)\Delta_{N,T}} = 1$ if $\Delta J_{(j+i)\Delta_{N,T}} \neq 0$, $\iota_{(j+i)\Delta_{N,T}} = 0$ if $\Delta J_{(j+i)\Delta_{N,T}} = 0$, and $\simeq_p$ means "of the same probability order". Because on every fixed time span there is at most a finite number of jumps, $\Pr\big( \sum_{i=1}^k \iota_{(j+i)\Delta_{N,T}} = 1 \big) = O(k\Delta_{N,T})$ for all $k \geq 1$. Both $E\big( \sum_{i=1}^p \iota_{(j+i)\Delta_{N,T}} \big)$ and $Var\big( \sum_{i=1}^p \iota_{(j+i)\Delta_{N,T}} \big)$ are $O(p\Delta_{N,T})$. Since $\prod_{i=1}^k \big| \Delta Y_{(j+i)\Delta_{N,T}} \big|^{2/p} = O_p\big( \Delta^{k/p}_{N,T} \big)$ for $k = 1, \ldots, p-1$, the relevant term is the last one. All other terms are of a smaller probability order. Write
$$\frac{1}{\Delta_{N,T}\, \mu^p_{2/p}}\; \frac{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right) \iota_{(j+p)\Delta_{N,T}} \prod_{i=1}^{p-1} \left| \Delta Y_{(j+i)\Delta_{N,T}} \right|^{2/p}}{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right)} = O_p\left( \Delta^{1/p}_{N,T} \right).$$
Hence, the statement in Eq. (28) follows as $\sqrt{h^{dif}_{N,T}\, \widehat{L}_X(T,a)/\Delta_{N,T}}\; \Delta^{1/p}_{N,T} \overset{a.s.}{\to} 0$ for $p > 2$ (see footnote 4). Finally, note that
$$\frac{1}{\Delta_{N,T}\, \mu^p_{2/p}}\; \frac{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right) \prod_{i=1}^p \left| \Delta Y_{(j+i)\Delta_{N,T}} \right|^{2/p}}{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right)} - \sigma^2(a) \simeq_p \frac{1}{\mu^p_{2/p}}\; \frac{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right) \sigma^2(X_{j\Delta_{N,T}}) \left( \prod_{i=1}^p \left| \frac{W_{(j+i)\Delta_{N,T}} - W_{(j+i-1)\Delta_{N,T}}}{\sqrt{\Delta_{N,T}}} \right|^{2/p} - \mu^p_{2/p} \right)}{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right)} + \frac{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right) \left( \sigma^2(X_{j\Delta_{N,T}}) - \sigma^2(a) \right)}{\sum_{j=1}^N K\!\left( \frac{X_{j\Delta_{N,T}} - a}{h^{dif}_{N,T}} \right)} = I_{N,T} + II_{N,T}.$$
Since, given the filtration $\mathcal{F}_{j\Delta_{N,T}} = \sigma(X_s,\ s \leq j\Delta_{N,T})$, $E_{j\Delta_{N,T}}\big( \prod_{i=1}^p \big| (W_{(j+i)\Delta_{N,T}} - W_{(j+i-1)\Delta_{N,T}})/\sqrt{\Delta_{N,T}} \big|^{2/p} \big) = \mu^p_{2/p}$, the term $I_{N,T}$ averages martingale differences. Using BP (2003), its distribution is normal and its rate of convergence is $\sqrt{\widehat{h}^{dif}_{N,T}\, \widehat{L}_X(T,a)/\Delta_{N,T}}$. The form of its asymptotic variance can be found along the lines of Corradi and Distaso (2008). The term $II_{N,T}$ is a bias term with order $O_p\big( h^{dif}_{N,T} \big)$. This proves the stated result. $\square$
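The device driving the proof — products of $p$ adjacent absolute increments raised to the power $2/p$, so that any single jump is damped by the neighboring diffusive increments — is the multipower-variation idea. A minimal unconditional (non-kernel) sketch, with simulation design and constants ours:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(5)

def mpv(dx, p):
    """Multipower variation: sum_j prod_{i=1}^p |dx_{j+i}|^(2/p), scaled by mu_{2/p}^p."""
    mu = 2.0 ** (1.0 / p) * gamma((2.0 / p + 1.0) / 2.0) / gamma(0.5)  # E|N(0,1)|^(2/p)
    n = len(dx)
    prods = np.ones(n - p + 1)
    for i in range(p):
        prods *= np.abs(dx[i:n - p + 1 + i]) ** (2.0 / p)
    return prods.sum() / mu ** p

n, T, sigma = 20000, 1.0, 0.2
dx = sigma * np.sqrt(T / n) * rng.standard_normal(n)   # Brownian increments
jumps = np.zeros(n)
jumps[rng.integers(0, n, 5)] += 1.0                    # five unit jumps (toy, ours)
rv = ((dx + jumps) ** 2).sum()                         # realized variance: inflated by squared jumps
tpv = mpv(dx + jumps, p=3)                             # tripower variation: jump terms vanish
```

Here `rv` is dominated by the sum of the squared jumps, while `tpv` stays near the diffusive quantity $\sigma^2 T$, which is exactly the separation the kernel-weighted version of the estimator exploits pointwise.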
Proof of Theorem 6. We begin with the drift. Let $Y$ denote the process constructed from the first-step (spot variance) estimates, so that $\epsilon_i = Y_{i\Delta_{N,T}} - X_{i\Delta_{N,T}}$ is the first-step estimation error, and write the estimation error decomposition as
$$\sqrt{\frac{\widehat{h}^{dr}_{N,l,T}\, \widehat{L}_X(T,x)}{\Delta_{N,T}}}\left( \widehat{\mu}_{l,T}(x) - \mu(x) \right) = I_{N,T,l} + II_{N,T,l}, \qquad (30)$$
where $I_{N,T,l}$ is the difference between the feasible estimator (constructed from the $Y$'s) and the infeasible estimator (constructed from the true states $X$), and $II_{N,T,l}$ is the estimation error of the infeasible estimator. Note that
$$K\!\left( \frac{Y_{((j-1)B+b)\Delta_{N,T}} - x}{h^{dr}_{N,l,T}} \right) \simeq_p K\!\left( \frac{X_{((j-1)B+b)\Delta_{N,T}} - x}{h^{dr}_{N,l,T}} \right) + \frac{2}{h^{dr}_{N,l,T}}\, K'\!\left( \frac{\bar{X}_{((j-1)B+b)\Delta_{N,T}} - x}{h^{dr}_{N,l,T}} \right) \epsilon_{((j-1)B+b)\Delta_{N,T}}, \qquad \bar{X}_{((j-1)B+b)\Delta_{N,T}} \in \left[ x - h^{dr}_{N,l,T},\ x + h^{dr}_{N,l,T} \right].$$
Recalling that $E|\epsilon_i|^k = O\big( a^{k/2}_{N,T} \big)$, with $a_{N,T} \to 0$ as $N,T \to \infty$, a straightforward application of the Markov inequality ensures that $\sup_{i=1,\ldots,N} |\epsilon_i| = O_{a.s.}\big( N^{1/k} a^{1/2}_{N,T} \big)$. Thus, replacing the smooth kernels with indicator kernels over the enlarged windows $\big[ x - h^{dr}_{N,l,T} - N^{1/k} a^{1/2}_{N,T},\ x + h^{dr}_{N,l,T} + N^{1/k} a^{1/2}_{N,T} \big]$, and noting that $\widehat{L}_X\big( T, x + h^{dr}_{N,l,T} \big) = O_p\big( \widehat{L}_X(T,x) \big)$, it is easy to see that
$$E(I_{N,T,l}) = O_p\left( \frac{N^{1/k} a^{1/2}_{N,T}}{h^{dr}_{N,l,T}}\, \sqrt{\widehat{h}^{dr}_{N,l,T}\, \widehat{L}_X(T,x)} \right).$$
As for the second moment,
$$E\left( I^2_{N,T,l} \right) = O\left( \frac{N^{1/k} a^{1/2}_{N,T}}{h^{dr}_{N,l,T}} + \frac{a^{3/2}_{N,T} N^{1/k}}{h^{dr}_{N,l,T}}\, \frac{\Delta_{N,T}}{\Delta^2_{l,T}} \right),$$
where the order of the first term can be derived as in the case of Eq. (31) and the order of the second term can be obtained as in the case of Eq. (34) below (in both cases with the indicator kernel in place of a smooth kernel). We now note that
$$\frac{N^{1/k} a^{1/2}_{N,T} \sqrt{L_X(T,x)}}{\sqrt{h^{dr}_{N,l,T}}} \overset{a.s.}{\to} 0 \quad \Leftarrow \quad \frac{N^{1/k} a^{1/2}_{N,T}}{h^{dr}_{N,l,T}} \to 0,$$
since $h^{dr}_{N,l,T}\, L_X(T,x) \overset{a.s.}{\to} \infty$ (see below). Now write
$$\sqrt{\frac{\widehat{h}^{dr}_{N,l,T}\, \widehat{L}_X(T,x)}{\Delta_{N,T}}}\left( \widehat{\mu}_{l,T}(x) - \mu(x) \right) = II_{N,T,l} + o_p(1) = A_{N,l,T} + B_{N,l,T} + o_p(1),$$
where $B_{N,l,T}$ collects the residual contribution of the first-step errors $\epsilon_{(jB+b)\Delta_{N,T}}$ to the numerator of the infeasible estimator. We first show that $B_{N,l,T}$ is $o_p(1)$. Because the denominator is bounded away from zero, it suffices to show that the numerator is $o_p(1)$. Write
$$var\left( \frac{\Delta_{N,T}}{\sqrt{\widehat{h}^{dr}_{N,l,T}\, \widehat{L}_X(T,x)}} \sum_{b=1}^B \sum_{j=1}^{l-1} K\!\left( \frac{X_{((j-1)B+b)\Delta_{N,T}} - x}{h^{dr}_{N,l,T}} \right) \frac{\epsilon_{(jB+b)\Delta_{N,T}} - \epsilon_{((j-1)B+b)\Delta_{N,T}}}{\Delta_{l,T}} \right) \simeq O\left( \frac{\Delta_{N,T}\, a_{N,T}}{\Delta^2_{l,T}} \right) = o(1), \qquad (31)$$
since $\Delta_{N,T}\, a_{N,T}/\Delta^2_{l,T} \to 0$ if $l = O(BT)$. Note also that $N^{1/k} a^{3/2}_{N,T}/h^{dr}_{N,l,T} \to 0$ implies $a_{N,T} N^{1/k} \Delta_{N,T}/\big( h^{dr}_{N,l,T}\, \Delta^2_{l,T} \big) \to 0$. As for $A_{N,l,T}$, split it into the local average of $\mu\big( X_{((j-1)B+b)\Delta_{N,T}} \big) - \mu(x)$ (a smoothing-bias component) and the local average of the increments centered at the drift (Eq. (32)). By the same argument used in the proof of Theorem 3 in BP (2003), the bias term on the right-hand side of Eq. (32) is $O_p\big( \sqrt{h^{dr,5}_{N,l,T}\, L_X(T,x)} \big)$ and, of course, $o_p(1)$ if $h^{dr,5}_{N,l,T}\, L_X(T,x) \overset{a.s.}{\to} 0$. As for the remaining term on the right-hand side of Eq. (32), write it as the sum of a time-discretization component, involving $\frac{1}{\Delta_{l,T}}\int_{((j-1)B+b)\Delta_{N,T}}^{(jB+b)\Delta_{N,T}} \big( \mu(X_s) - \mu(X_{((j-1)B+b)\Delta_{N,T}}) \big) ds$, and a martingale component, involving $\frac{1}{\Delta_{l,T}}\int_{((j-1)B+b)\Delta_{N,T}}^{(jB+b)\Delta_{N,T}} \sigma(X_s)\, dW_s$ (Eq. (33)). The first term in Eq. (33) is $O_p\big( \Delta^{1/2}_{l,T} \log^{1/2}(1/\Delta_{l,T}) \sqrt{h^{dr}_{N,l,T}\, L_X(T,x)} \big)$. Define the second term on the right-hand side of Eq. (33) as $\bar{A}_{N,T,l}(x)$ and express its quadratic variation as $\langle \bar{A}_{N,T,l}(x) \rangle$. Since the covariance terms across overlapping blocks ($i > b$) are asymptotically negligible, we obtain
$$\langle \bar{A}_{N,T,l}(x) \rangle = \sigma^2(x) \int K^2(s)\, ds + o_p(1). \qquad (34)$$
Finally, the limiting distribution in the statement derives from a similar argument as that in the proof of Theorem 3 in Corradi and Distaso (2008).

We now turn to the diffusion function estimator in (ii). Write the estimation error decomposition as
$$\sqrt{\frac{\widehat{h}^{dif}_{N,l,T}\, \widehat{L}_X(T,x)}{\Delta_{N,T} B^{-1}}}\left( \widehat{\sigma}^2_{N,l,T}(x) - \sigma^2(x) \right) = C_{N,T,l} + D_{N,T,l}, \qquad (35)$$
where, as in part (i), $C_{N,T,l}$ is the error induced by replacing the true states with the first-step estimates (and includes the realized-variance correction $RV_{T,N}$) and $D_{N,T,l}$ is the estimation error of the infeasible estimator. Expressing the kernel function as in part (i),
$$E(C_{N,T,l}) = O\left( \frac{N^{1/k} a^{1/2}_{N,T}\, l^{1/2}}{h^{dif}_{N,l,T}}\, \sqrt{\frac{\widehat{h}^{dif}_{N,l,T}\, \widehat{L}_X(T,x)}{T}} \right)$$
and
$$E\left( C^2_{N,T,l} \right) = O\left( \frac{a_{N,T} N^{1/k}}{h^{dif}_{N,l,T}} + \frac{a^{3/2}_{N,T} N^{1/k}}{B\, h^{dif}_{N,l,T}\, \Delta^{3/2}_{l,T}} + \frac{a^{5/2}_{N,T}}{h^{dif}_{N,l,T}\, \Delta^2_{l,T}} + \frac{a_{N,T} N^{1/k}\, l}{h^{dif}_{N,l,T}} \right),$$
where, again, the order of the first term is analogous to that of Eq. (38), the order of the third term is analogous to that of Eq. (36), and the order of the cross-product term is analogous to that of Eq. (37) below (in all cases with the indicator kernel in place of a smooth kernel). Note that
$$\frac{N^{1/k} a^{1/2}_{N,T}\, l^{1/2}}{h^{dif}_{N,l,T}}\, \sqrt{\frac{h^{dif}_{N,l,T}\, L_X(T,x)}{T}} \overset{a.s.}{\to} 0, \quad \text{and} \quad \sqrt{\frac{h^{dif}_{N,l,T}\, L_X(T,x)}{\Delta_{l,T}}} \overset{a.s.}{\to} \infty \ \Rightarrow\ \frac{a^{1/2}_{N,T} N^{1/k}}{h^{dif}_{N,l,T}} \to 0.$$
Similarly, $a^{1/2}_{N,T} N^{1/k}/\big( B\, h^{dif}_{N,l,T}\, \Delta^{3/2}_{l,T} \big) \to 0$ if $l = O(BT)$. Now write
$$\sqrt{\frac{\widehat{h}^{dif}_{N,l,T}\, \widehat{L}_X(T,x)}{\Delta_{N,T} B^{-1}}}\left( \widehat{\sigma}^2_{N,l,T}(x) - \sigma^2(x) \right) = D_{N,T,l} + o_p(1) = I_{N,l,T} + II_{N,l,T} + III_{N,l,T} + o_p(1), \qquad (36)$$
where $I_{N,l,T}$ is the martingale component, $II_{N,l,T}$ is the smoothing-bias component, and $III_{N,l,T}$ is the cross-product component (Eq. (37)). Now notice that $III_{N,l,T}$ is $O_p\big( \frac{a^{1/2}_{N,T} N^{1/k}}{h^{dif}_{N,l,T}}\, \frac{a_{N,T}}{B \Delta_{l,T}} \big)$ and $II_{N,l,T}$ is $O\big( \frac{a^2_{N,T}\, l}{\Delta^2_{l,T}} \big)$. If $\frac{a^2_{N,T}\, l}{\Delta^2_{l,T}} \to 0$, then $\frac{a_{N,T} N^{1/k}\, l}{h^{dif}_{N,l,T}\, \Delta^2_{l,T}} \to 0$ since $\frac{a_{N,T} N^{1/k}}{h^{dif}_{N,l,T}} \to 0$. Finally, $I_{N,l,T}$ averages the martingale components $\frac{2}{\Delta_{l,T}}\int_{((j-1)B+b)\Delta_{N,T}}^{(jB+b)\Delta_{N,T}} \big( X_s - X_{((j-1)B+b)\Delta_{N,T}} \big)\sigma(X_s)\, dW_s$. Noting that the normalized sum of the cross-block weights over $b = 1, \ldots, B$ and $i > b$ converges to $1/4$ as $B \to \infty$, which offsets the constant $4$ arising from the squared martingale terms, we obtain
$$\langle I_{N,l,T} \rangle = \sigma^4(x) \int K^2(s)\, ds + o_p(1). \qquad (38)$$
The stated result now follows. $\square$
10 Appendix B
Let $\delta_{N,T} = T/N$ and $\Delta_{M,T} = T/M$, with $M < N$, be the discrete intervals used in the estimation of spot volatility and in the estimation of the volatility drift and variance, respectively. Bandi and Renò (2008; BR08) have established additional rate conditions under which the estimation error at the first step is asymptotically negligible. More precisely, there are four additional conditions, two for the drift and two for the variance. As for the drift, from Section 4 in BR08, the first additional condition reads as
$$\frac{T\, L^{1/2}_v(T,a)}{\Delta_{M,T}\, h^{dr,1/2}_{M,T}\, T^{\phi_N}}\, \delta^{1/2}_{N,T} \overset{a.s.}{\to} 0,$$
with $\phi_N \leq 1/2$, in the case of estimators non-robust to microstructure noise. This requires
$$M < N\, h^{dr,1/2}_{M,T}\, T^{-(1+\phi_N)}\, L^{1/2}_v(T,a).$$
The second drift condition reads as
$$\frac{T\, L^{1/2}_T}{\Delta_{M,T}\, h^{dr,1/2}_{M,T}}\, \frac{\log\!\left( T^{\phi_N/2} \right)}{T^{\phi_N/2}} \to 0, \qquad (39)$$
which requires
$$M < L^{1/2}_v(T,a)\, T^{\phi_N/2}\, h^{dr,1/2}_{M,T}\, \log\!\left( T^{\phi_N/2} \right). \qquad (40)$$
By equating the right-hand sides of the inequalities in Eq. (39) and Eq. (40), as earlier, we set $\varrho$ in such a way as to guarantee that
$$T^{\frac{1+\phi_N}{\phi_N + \varrho/2}}\left( \log\!\left( T^{\phi_N/2} \right) \right)^{1/\varrho} = N. \qquad (41)$$
Ignoring now $\log\!\left( T^{\phi_N/2} \right)$, and plugging (41) into (39), one may write the implied upper bound on $M$ in terms of $N$, $h^{dr,1/2}_{M,T}$, $T$, and $L^{1/2}_T$, which is indeed condition (21) in Section 5.

We now turn to asymptotic normality of the spot volatility's diffusion. The first condition reads
$$\frac{T\, L^{1/2}_v(T,a)}{\Delta^{3/2}_{M,T}\, h^{dif,1/2}_{M,T}\, T^{\phi_N}}\, \delta^{1/2}_{N,T} \overset{a.s.}{\to} 0,$$
which requires
$$M < N^{2/3}\, h^{dif,1/3}_{M,T}\, T^{\frac{1}{3} - \left( \frac{2}{3} + \frac{2}{3}\phi_N \right)}\, L^{1/3}_v(T,a). \qquad (42)$$
The second condition reads
$$\frac{T\, L^{1/2}_v(T,a)}{\Delta^{3/2}_{M,T}\, h^{dif,1/2}_{M,T}}\, \frac{\log\!\left( T^{\phi_N/2} \right)}{T^{\phi_N/2}} \overset{a.s.}{\to} 0,$$
which requires
$$M < L^{1/3}_v(T,a)\, T^{\frac{1}{3}(\phi_N + 1)}\, h^{dif,1/3}_{M,T}\, \log\!\left( T^{\phi_N/3} \right). \qquad (43)$$
By equating the right-hand sides of Eq. (42) and Eq. (43), we can set $\varrho$ in such a way that
$$T^{\frac{\phi_N + 2\varrho + 2}{2\varrho}}\left( \log\!\left( T^{\phi_N/3} \right) \right)^{3/2} = N. \qquad (44)$$
Thus, plugging Eq. (44) into Eq. (42), and neglecting the logarithm, we obtain the implied upper bound on $M$ in terms of $L^{1/3}_v(T,a)$, $N$, $T$, and $h^{dif,1/3}_{M,T}$, which is indeed condition (22) in Section 5.
References
[1] AÏT-SAHALIA, Y., P.A. MYKLAND, and L. ZHANG (2006). Ultra High Frequency Volatility Estimation with Dependent Microstructure Noise. Working Paper, Princeton University.

[2] ANDERSEN, T.G., T. BOLLERSLEV, F.X. DIEBOLD, and P. LABYS (2000). Great Realizations. Risk, March, 105-108.

[3] AWARTANI, B., V. CORRADI, and W. DISTASO (2009). Assessing Market Microstructure Effects with an Application to the Dow Jones Industrial Average Stocks. Journal of Business and Economic Statistics, 27, 251-265.

[4] BAI, J. (1994). Weak Convergence of Sequential Empirical Processes of Residuals in ARMA Models. Annals of Statistics, 22, 2051-2061.

[5] BANDI, F.M., and G. MOLOCHE (2004). On the Functional Estimation of Multivariate Diffusion Processes. Working paper.

[6] BANDI, F.M., and T. NGUYEN (2003). On the Functional Estimation of Jump-Diffusion Models. Journal of Econometrics, 116, 293-328.

[7] BANDI, F.M., and P.C.B. PHILLIPS (2003). Fully Nonparametric Estimation of Scalar Diffusion Models. Econometrica, 71, 241-283.

[8] BANDI, F.M., and P.C.B. PHILLIPS (2007). A Simple Approach to the Parametric Estimation of Potentially Nonstationary Diffusions. Journal of Econometrics, 137, 354-395.

[9] BANDI, F.M., and R. RENÒ (2008). Nonparametric Stochastic Volatility. SSRN: http://ssrn.com/abstract=1158438.

[10] BARNDORFF-NIELSEN, O.E., N. SHEPHARD, and M. WINKEL (2006). Limit Theorems for Multipower Variation in the Presence of Jumps. Stochastic Processes and Their Applications, 116, 796-806.

[11] BARNDORFF-NIELSEN, O.E., A. LUNDE, P.R. HANSEN, and N. SHEPHARD (2008a). Multivariate Realised Kernels: Consistent Positive Semi-Definite Estimators of the Covariation of Equity Prices with Noise and Non-Synchronous Trading. Working Paper, Stanford University.

[12] BARNDORFF-NIELSEN, O.E., P.R. HANSEN, A. LUNDE, and N. SHEPHARD (2008b). Designing Realized Kernels to Measure the Ex-Post Variation of Equity Prices in the Presence of Noise. Econometrica, 76, 1481-1536.

[13] BRUGIÈRE, P. (1991). Estimation de la Variance d'un Processus de Diffusion dans le Cas Multidimensionnel. C. R. Acad. Sci., T. 312, 1, 999-1005.

[14] CORRADI, V., and W. DISTASO (2008). Diagnostic Tests for Volatility Models. Working paper, Imperial College.

[15] CORRADI, V., and N. SWANSON (2006). The Effects of Data Transformation on Common Cycle, Cointegration and Unit Root Tests: Monte Carlo and a Simple Test. Journal of Econometrics, 132, 195-229.

[16] CORRADI, V., and H. WHITE (1999). Specification Tests for the Variance of a Diffusion. Journal of Time Series Analysis, 20, 253-270.

[17] FAN, J., and C. ZHANG (2003). A Re-Examination of Diffusion Estimators with Applications to Financial Model Validation. Journal of the American Statistical Association, 98, 118-134.

[18] FLORENS-ZMIROU, D. (1993). On Estimating the Diffusion Coefficient from Discrete Observations. Journal of Applied Probability, 30, 790-804.

[19] ERAKER, B., M. JOHANNES, and N. POLSON (2003). The Impact of Jumps in Volatility and Returns. Journal of Finance, 58, 1269-1300.

[20] GUERRE, E. (2004). Design-Adaptive Pointwise Nonparametric Regression Estimation for Recurrent Markov Time Series. Working Paper, Queen Mary, University of London.

[21] JACQUIER, E., N.G. POLSON, and P.E. ROSSI (1994). Bayesian Analysis of Stochastic Volatility Models. Journal of Business and Economic Statistics, 12, 371-389.

[22] JACOD, J. (1997). Nonparametric Kernel Estimation of the Diffusion Coefficient of a Diffusion. Prépublication No. 405, Laboratoire de Probabilités de l'Université Paris VI.

[23] JOHANNES, M. (2004). The Statistical and Economic Role of Jumps in Continuous-Time Interest Rate Models. Journal of Finance, 59, 227-260.

[24] KANAYA, S., and D. KRISTENSEN (2008). Estimation of Stochastic Volatility Models by Nonparametric Filtering. Working Paper, Columbia University.

[25] KARLSEN, H.A., and D. TJØSTHEIM (2001). Nonparametric Estimation in Null Recurrent Time Series. Annals of Statistics, 29, 372-416.

[26] KRISTENSEN, D. (2008). Nonparametric Filtering of Realized Spot Volatility: A Kernel-Based Approach. Econometric Theory, forthcoming.

[27] LEE, S., and C.Z. WEI (1999). On Residual Empirical Processes of Stochastic Regression Models with Application to Time Series. Annals of Statistics, 27, 237-261.

[28] MOLOCHE, G. (2004). Local Nonparametric Estimation of Scalar Diffusions. Working paper.

[29] STANTON, R. (1997). A Nonparametric Model of Term Structure Dynamics and the Market Price of Interest Rate Risk. Journal of Finance, 52, 1973-2002.

[30] ZHANG, L., P.A. MYKLAND, and Y. AÏT-SAHALIA (2005). A Tale of Two Time Scales: Determining Integrated Volatility with Noisy High-Frequency Data. Journal of the American Statistical Association, 100, 1394-1411.