News - good or bad - and its impact on volatility
predictions over multiple horizons∗
Xilong Chen†
Eric Ghysels‡
First Draft: April 2006
This Draft: March 10, 2008
Abstract
We examine whether the sign and magnitude of discretely sampled high frequency returns have
an impact on future volatility predictions. We first let the ’data speak’, namely with minimal
interference we capture the mapping between returns over short horizons and future volatility
over longer horizons. Technically speaking, we introduce semi-parametric MIDAS regressions.
Compared to the semi-parametric infinite ARCH estimation in Linton and Mammen (2005) we
show that the asymptotic distribution of semi-parametric MIDAS regressions depends on the
mixed data sampling scheme. Also novel is the parametric specification we consider to deal with
intra-daily/daily lags. In the empirical work we revisit the concept of news impact curves
introduced by Engle and Ng (1993), in the current high frequency data environment of financial
market time series. We find that moderately good (intra-daily) news reduces volatility (the
next day), while both very good news (unusual high positive returns) and bad news (negative
returns) increase volatility, with the latter having a more severe impact. The asymmetries
we find have profound implications for current volatility prediction models that are based on
in-sample asymptotic analysis developed over recent years.
∗
We would especially like to thank Oliver Linton for comments and for sharing software with us, as well as Neil
Shephard for insightful comments during the early stages of the paper. In addition we would like to thank three
referees and the Editor for many helpful suggestions and comments on a previous version of the paper. We
would also like to thank Robert Engle, Benoît Perron and Eric Renault for providing helpful comments as well as
seminar participants at CIREQ-CIRANO conference, the New York Fed Research Department, MIT Sloan,
the University of Chicago Stevanovich Center for Financial Mathematics conference on volatility and UNC.
†
Department of Economics, University of North Carolina at Chapel Hill, Gardner Hall CB 3305, Chapel
Hill, NC 27599-3305, e-mail: [email protected]
‡
Department of Finance, Kenan-Flagler Business School, and Department of Economics, University of
North Carolina at Chapel Hill, email: [email protected]
1
Introduction
Let’s imagine a modern era Rip Van Winkle who, for some unknown reason, had a weakness
for watching and studying stock market volatility.1 Before his long sleep, he watched
ARCH-type models being developed and enriched to fit the stylized features of asset market
fluctuations. The raw data were daily returns, and the stylized facts were the phenomenon
of volatility clustering and various other regularities, including for equity markets the
observation that volatility showed asymmetries - i.e. the response to good and bad news
appeared different. The asymmetry was exploited, notably by Engle and Ng (1993), who
introduced the notion of news impact curve both as an object of economic interest and a
diagnostic tool for volatility modeling. Despite all the exciting developments since ARCH
models first saw the light of day, our story’s fictional protagonist falls asleep in the mid-90s.
When Rip Van Winkle wakes up a decade later he is bewildered by the fact that the data
is different, the models are different, the issues are different. Stylized facts used to drive
volatility model specifications, parametric models that is, before he fell asleep. Now it
seems that measurement has taken over most of the discussions. There is data of every
transaction and it enables one to measure so called realized volatility - a post mortem
sample realization of the increments in quadratic variation of an underlying continuous time
process. Measurement, as it turns out, is not easy, as transactions may be affected by
microstructure noise and quadratic variation increments may contain a jump component
which one would like to separate from the rest. Rip Van Winkle still recognizes one stylized
feature, the importance of volatility clustering. What happened to the other features of
(daily?) data that so many modelers had tried to capture with the next variation on ARCH?
For instance, what happened to asymmetries or so called leverage? The simple measures
of realized volatilities involve the intra-daily sum of high frequency squared returns. More
sophisticated measurements that separate jumps or account for microstructure noise, one
way or another, are also based on squared returns. Did the stylized facts, like asymmetries,
disappear or become irrelevant with high frequency data?
1
The story of Rip Van Winkle is about a villager of Dutch descent, who one autumn day settles down
under a shady tree and falls asleep. He wakes up twenty years later and returns to his village and discovers a
different world. Rip Van Winkle - still a loyal subject of King George III - wakes up not knowing that in the
meantime the American Revolution had taken place. It is a celebrated short story, written by the American
author Washington Irving and published in 1819.
With the focus shifted towards measurement, it is indeed the case that leverage has no impact
on the in-sample asymptotic analysis that was developed against the backdrop of increasingly
available high frequency financial data (see Jacod (1994), Jacod (1996) and Barndorff-Nielsen
and Shephard (2002) as well as the recent survey by Barndorff-Nielsen and Shephard (2007)).
Linear models are used to predict future volatility, instead of ARCH-type models involving
daily returns, and they rely on the most accurate measures of daily volatility such as realized
volatility. The observation that leverage does not affect measurement appears to have given
credence to the belief that asymmetries do not matter for forecasting. It is worth recalling
that originally, news impact curves were formulated within the context of daily ARCH type
models. Therefore, news was defined with respect to a particular choice of a daily volatility
model, and the impact curve measured how news, innovations in daily returns that is, affect
tomorrow’s expected volatility. One may therefore wonder whether leverage mattered only because
daily data was used, and whether the use of high frequency data has now nullified the
issue.
It is not obvious how we would go about answering the question whether the sign and
magnitude of discretely sampled high frequency returns have any impact on future volatility
predictions. First, the raw input is a return over a short interval and the prediction period
is not the next short interval, but rather some arbitrary future period - say the next day,
week, etc. The mismatch of observation frequency and prediction horizon brings about issues
that cannot be easily handled by simple linear models - let alone ARCH-type models. Then
there is also the pervasive intra-daily seasonality that prevents one from putting each high
frequency interval on equal footing.
In the paper we try to answer the question whether the sign and magnitude of discretely
sampled high frequency returns have an impact on future volatility predictions. We first let the
’data speak’, namely with minimal interference we capture the mapping between returns over
short horizons and future volatility over longer horizons. To cut straight to the main point,
consider an illustrative example of our findings. We take five minute returns on the Dow
Jones Industrial Average index as the primitive input, and the next day’s realized volatility
as the future outcome of interest - hence we are thinking along the lines of Engle and Ng
(1993) but without a daily volatility model. The typical picture for one day ahead (ignoring
the intra-day effects) that emerges from our analysis is as follows:
[Figure: estimated news impact curve with 5% lower bound and 95% upper bound. X-axis: 5-minute return (−0.2 to 0.2). Y-axis: impact on tomorrow’s RV (−2 to 10).]
The X-axis measures 5-minute returns in the Dow Jones Industrial Average index. The
Y-axis is tomorrow’s volatility.2 The pattern that emerges is interesting. Good news reduces
tomorrow’s volatility (recall this is the impact of a five minute return - up to a scaling factor),
but very good news tends to increase volatility, as does bad news. This asymmetric pattern
has been recognized in the past, notably by Engle and Ng (1993). Here, however, we can
carry this further across different horizons.
Technically speaking, we introduce semi-parametric MIDAS regressions. The analysis in
this paper is inspired by recent work on MIDAS regressions, in particular in the context of
volatility as in Ghysels, Santa-Clara, and Valkanov (2006) and Forsberg and Ghysels (2006).
Compared to the semi-parametric infinite ARCH estimation in Linton and Mammen (2005)
we show that the asymptotic distribution of semi-parametric MIDAS regressions depends on
the mixed data sampling scheme. The new asymptotic results - showing the impact of mixed
data sampling - are of general interest, since the semi-parametric MIDAS regression model
has applications beyond that of news impact curves. Also novel is the parametric specification
we consider to deal with intra-daily/daily lags. We introduce a multiplicative scheme that
seems to handle high frequency data well and extends the aforementioned existing papers.
We also introduce various new parametric models applicable to intra-daily returns, that are
inspired by the asymmetric (daily) GARCH models.
2
We also skip for the moment how the confidence band around the curve is constructed.
The paper is organized as follows. In section 2 we introduce semi-parametric MIDAS
regression models in the context of news impact curves and volatility prediction. We also
cover intra-daily seasonality issues. Next, in section 3 we discuss asymptotic properties.
Empirical results are reported in section 4, while in section 5 we introduce a new class of
parametric models - inspired by the original ARCH-type news impact models - that produce
low frequency predictions using high frequency data. Section 6 concludes the paper.
2
Volatility Measurement and Model Specification
Volatility is a prevailing feature of financial markets. Its presence implies risk and although
asset returns are often represented as a martingale difference series, volatility displays
strong persistence and should therefore be predictable. The recent vintage of volatility
models can be written as a linear autoregressive prediction based on so called Realized
Variance, the sum of the squared intra-daily returns. More specifically, we think of
return i on day t, and in each period there are M subperiods. To fix notation, let
$r_{t-1+i/M}$, $i = 1, \ldots, M$, denote the high-frequency return in subperiod i of period t, where
$r_{t-1+i/M} \equiv \log(P_{t-1+i/M}) - \log(P_{t-1+(i-1)/M})$ and P is the asset price. Such high frequency
intra-daily returns are used to compute Realized Variance, namely:
\[
RV_t \equiv \sum_{i=1}^{M} r_{t-1+i/M}^{2} \tag{2.1}
\]
yielding:
\[
RV_{t+1} = \sum_{j=0}^{\tau} \psi_j(\theta)\, RV_{t-j} + \varepsilon_t \tag{2.2}
\]
where the lag coefficients are parameterized by θ. Models based on daily RV have become
very popular; see e.g. Andersen, Bollerslev, and Diebold (2006) and Barndorff-Nielsen and
Shephard (2007) and references therein for numerous applications. Obviously, there are
many variations on this basic theme. It was noted that RV may include jumps and
that the separation between the continuous path part of integrated volatility (the population
counterpart of RV) and the jump component might be useful in formulating a prediction
model.3 Moreover, high frequency returns may be affected by microstructure noise that
masks the true price variation, and therefore various corrected measures of RV have been
suggested.4 In addition, linear models are not always autoregressive as in equation (2.2), and
come in various forms including the so called heterogeneous autoregressive (HAR) model of
Corsi (2003) and MIDAS regression models.5
3
On the subject of extracting jumps see for instance, Aït-Sahalia (2004), Aït-Sahalia and Jacod (2007b),
Aït-Sahalia and Jacod (2007a), Andersen, Bollerslev, and Diebold (2006), Barndorff-Nielsen, Graversen,
Jacod, and Shephard (2006), Barndorff-Nielsen and Shephard (2006), Barndorff-Nielsen and Shephard
(2004), Huang and Tauchen (2005), Tauchen and Zhou (2005), among others.
4
See for example, Aït-Sahalia and Mancini (2006), Aït-Sahalia, Mykland, and Zhang (2005), Andersen,
Bollerslev, and Meddahi (2006), Bandi and Russell (2006), Bandi and Russell (2005), Barndorff-Nielsen,
Hansen, Lunde, and Shephard (2006), Ghysels and Sinko (2006) and Hansen and Lunde (2006).
5
The Corsi model can be viewed as a special case of a MIDAS regression with so called step functions as
noted in Ghysels, Sinko, and Valkanov (2006) and Forsberg and Ghysels (2006).
2.1
A new class of regression models
We will not aggregate high frequency returns to daily RV measures; instead we will use
them directly as regressors, for the purpose of forecasting future daily, weekly or monthly
volatility, which will be measured by future realized variance appearing in equation (2.1). Note
two important issues, namely (1) we gain information since we do not aggregate intra-daily
returns and (2) we do not impose the quadratic variation transformation - that is, squaring
intra-daily returns - but instead let the regression fit decide which functional to take through
the semi-parametric setting.6
6
There are some notable exceptions in the recent literature that have tried to accommodate asymmetries,
including Barndorff-Nielsen, Kinnebrock, and Shephard (2008) and Engle and Gallo (2006). Both consider
some form of ’signed’ daily variances, i.e. variance measures multiplied by a sign indicator function. In
contrast, we use the sign of the intra-daily returns directly without aggregation.
To predict future volatility with past high-frequency returns, we propose the following
semi-parametric MIDAS regression model (again for simplicity restricting ourselves
to a single day):
\[
RV_{t+1} = \sum_{j=1}^{\tau} \sum_{i=1}^{M} \psi_{ij}(\theta)\, m(r_{t-j+(i-1)/M}) + \varepsilon_t \tag{2.3}
\]
where $\psi_{ij}(\theta)$ is a known lag coefficient function with unknown parameter vector θ and m(.)
is an unknown function. Our analysis is much inspired by the recent work of Linton and
Mammen (2005) who propose the semi-parametric ARCH(∞). The difference between the
semi-parametric ARCH(∞) and the above regression is the mixed data sampling scheme.
Moreover, the difference between the above setting and existing MIDAS regressions applied
to volatility prediction is the presence of the unknown function m(.).7 In the implementation
of equation (2.3) we will use a new parametric specification of the polynomial different from
prior applications.
7
The projection of RV on intra-daily data was considered in Ghysels, Santa-Clara, and Valkanov (2006).
They used intra-daily squared or absolute returns - hence the nonparametric estimation was absent. In
addition, the parametric specification of the polynomial considered in Ghysels, Santa-Clara, and Valkanov
(2006) did not take into account intra-daily seasonal patterns.
Positivity constraints need to be imposed, since we are dealing with volatility prediction
models. First of all we should note that we could consider log RV, and this can easily be
done in the present context as well.8 Given the similarity with Linton and Mammen (2005)
it is not surprising that we follow their approach.9
8
Most of the literature on news impact curves deals with the level of volatility, although the original
work by Engle and Ng (1993) considered both level and log specifications.
9
Linton and Mammen (2005) point out in footnote 5 of their paper (p. 782) that ”... in small samples we
can find m(y) < 0 for some y, ...”. To remedy the problem they introduce max(σ̂ 2 , > 0) to guarantee
positivity. We are grateful to Oliver Linton for sharing his code with us - the positivity constraint
max(σ̂ 2 , > 0) appears in both our code and that underlying Linton and Mammen (2005). As noted
later, we never encountered cases in our empirical work where the constraint was binding. It is perhaps also
worth noting that this is also a concern for parametric models - notably those discussed later in the paper.
For example on page 807 (footnote 10), Glosten, Jagannathan, and Runkle (1993) (henceforth GJR) note
that for the monthly frequency there is a monotonically decreasing news impact curve. Therefore, although
the news impact curve is above 0 for the data range they report, it may also be negative
for a broader range of returns.
2.2
Dealing with intra-daily seasonality
Intra-daily seasonality in financial markets is pervasive. Wood, McInish, and Ord (1985),
one of the earliest studies employing intra-daily data and documenting the well known U-shaped
pattern, finds that the trading day return series is non-stationary and is characterized
by a low-order autoregressive process. Much has been written on the topic of seasonality
in economic time series (see e.g. Ghysels and Osborn (2001) for a comprehensive survey).
Broadly speaking there are two approaches: (1) seasonally adjust series and construct non-seasonal
models subsequently, or (2) build seasonal features of the data into the model
specification. Most of the literature on the topic has dealt with linear time series models.
The topic of intra-daily seasonality, while less developed, has seen similar approaches.10
10
See e.g. Andersen and Bollerslev (1997), Andersen and Bollerslev (1998), Bollen and Inder (2002),
Dacorogna, Gençay, Müller, Olsen, and Pictet (2001), Martens, Chang, and Taylor (2002), among others.
We will deal with intra-daily seasonality along two different lines, one relatively standard,
the other being novel. To start with the standard one, let us reconsider equation (2.3), using
seasonally adjusted high frequency returns:
\[
RV_{t+1} = \sum_{j=1}^{\tau} \sum_{i=1}^{M} \psi_{ij}(\theta)\, m(r^{sa}_{t-j+(i-1)/M}) + \varepsilon_t \tag{2.4}
\]
where returns are adjusted as follows: $r^{sa}_{t-1+i/M} = (r_{t-1+i/M} - \bar{r}_i)/s_i$, $i = 1, \ldots, M$, $t = 1, \ldots, T$,
where $\bar{r}_i = \frac{1}{T}\sum_{t=1}^{T} r_{t-1+i/M}$ and $s_i = \sqrt{\frac{1}{T-1}\sum_{t=1}^{T} (r_{t-1+i/M} - \bar{r}_i)^2}$, which means that we
demean the high frequency returns by their intra-daily seasonal mean and normalize them
by the intra-daily seasonal standard deviation. Arguably, this does not take out seasonality
in higher moments, and since we deal with nonparametric models, this may be an issue.
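The adjustment just described, together with the realized variance of equation (2.1), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the nested-list data layout (`returns[t][i]` for interval i of day t) are hypothetical and not taken from the paper's code.

```python
import math


def realized_variance(day_returns):
    """Sum of squared intra-daily returns of one day, as in equation (2.1)."""
    return sum(r * r for r in day_returns)


def seasonally_adjust(returns):
    """Demean and scale each intra-daily interval across days.

    returns[t][i] is the return in interval i of day t (T days, M intervals).
    The interval mean uses 1/T and the standard deviation uses 1/(T - 1),
    matching the definitions given after equation (2.4).
    """
    T = len(returns)
    M = len(returns[0])
    means = [sum(day[i] for day in returns) / T for i in range(M)]
    stds = [
        math.sqrt(sum((day[i] - means[i]) ** 2 for day in returns) / (T - 1))
        for i in range(M)
    ]
    adjusted = [[(day[i] - means[i]) / stds[i] for i in range(M)] for day in returns]
    return adjusted, means, stds
```

By construction, each interval of the adjusted series has sample mean zero and sample variance one, which is exactly why seasonality in higher moments is left untouched.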
In section 5 we will consider some parametric specifications for the news impact curve m.
While they will be discussed in detail later, it is worth taking two simple examples for the
purpose of explaining the treatment of intra-daily seasonal effects. Namely, consider (1) a
simple symmetric news impact curve, i.e. $m(x) = ax^2$ and (2) the asymmetric specification of
Nelson (1991), i.e. $m(x) = ax + b|x|$. In both cases, making the reasonable assumption that
$\bar{r}_i = 0$, the above seasonal adjustment scheme amounts to (1) $m(x) = a_i x^2$ with $a_i \equiv a/s_i^2$
and (2) $m(x) = a_i x + b_i |x|$, with $a_i \equiv a/s_i$ and $b_i \equiv b/s_i$. This suggests we might want to look
at specifications of the type $\gamma_i m(x)$, or more generally $m_i(x)$, involving unadjusted returns.
The unappealing feature of this approach is that we would have to estimate M parameters
γi in addition to the function m in the former case or M nonparametric functions in the
latter. This is unappealing because it is not parsimonious. The scheme we propose amounts
to formulating a parsimonious parameterization of the intra-daily seasonal effects and above
all, it seems to work as the empirical results show.
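The scaling argument for the two token examples can be checked numerically. The sketch below is illustrative only; the function names are hypothetical, and it assumes a zero seasonal mean and a positive seasonal standard deviation s, as in the text.

```python
def symmetric_curve(x, a):
    """Symmetric news impact curve m(x) = a * x^2."""
    return a * x * x


def nelson_curve(x, a, b):
    """Asymmetric specification m(x) = a * x + b * |x|."""
    return a * x + b * abs(x)


def adjusted_equals_rescaled(x, s, a, b, tol=1e-12):
    """Check that evaluating m at x / s equals rescaling the coefficients.

    Returns True when m(x / s) matches (a / s^2) x^2 in the symmetric case
    and (a / s) x + (b / s) |x| in the asymmetric case.
    """
    sym_ok = abs(symmetric_curve(x / s, a) - symmetric_curve(x, a / s ** 2)) < tol
    asym_ok = abs(nelson_curve(x / s, a, b) - nelson_curve(x, a / s, b / s)) < tol
    return sym_ok and asym_ok
```

For instance, with x = -0.1, s = 0.05, a = 2 and b = 3, both sides of the symmetric case equal 8 and both sides of the asymmetric case equal 2, up to floating point error.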
The advantage of the second approach is that it builds the intra-daily period behavior directly
into the model specification and is to the best of our knowledge novel. The parametric
specification we consider will be multiplicative for intra-daily/daily lags. Namely, we define
$\psi_{ij}(\cdot)$ as:
\[
\psi_{ij}(\theta) = \psi_j(\theta)\psi_i(\theta) = \mathrm{Beta}(j, \tau, \theta_1, \theta_2) \times \mathrm{Beta}(i, M, \theta_3, \theta_4) \tag{2.5}
\]
where the beta polynomial specification has been used in prior work, notably in Ghysels,
Santa-Clara, and Valkanov (2002).11 Here we accommodate intra-daily patterns via
$\mathrm{Beta}(i, M, \theta_3, \theta_4)$ while the daily memory decay is patterned via $\mathrm{Beta}(j, \tau, \theta_1, \theta_2)$. Note that
we impose the restriction that the intra-daily patterns wash out across the entire day, i.e.
we impose the restriction that the weights of the intra-daily polynomial add up to one,
$\sum_i \mathrm{Beta}(i, M, \theta_3, \theta_4) = 1$. We could also impose a similar restriction on the daily polynomial
in order to separately identify a slope coefficient. The intra-daily seasonal pattern may not
be fully captured by the $\mathrm{Beta}(i, M, \theta_3, \theta_4)$ polynomial. Its virtue, compared to the estimation
of unconstrained $\gamma_i$, is that it appears to work very well empirically with only two parameters
to estimate. Other more complex specifications are conceivable.
11
More specifically: $\mathrm{Beta}(k, K, \alpha, \beta) = (k/(K+1))^{\alpha-1} (1 - k/(K+1))^{\beta-1}\, \Gamma(\alpha+\beta)/(\Gamma(\alpha)\Gamma(\beta))$ and
$\Gamma(\alpha) = \int_0^{+\infty} e^{-t} t^{\alpha-1}\, dt$.
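A minimal sketch of the multiplicative weights in (2.5) follows, using the Beta polynomial as defined in the footnote. The function names are hypothetical, and enforcing the sum-to-one restriction by dividing the intra-daily weights by their total is an assumption made here for illustration; the paper does not state how the restriction is imposed numerically.

```python
import math


def beta_weight(k, K, alpha, beta):
    """Beta(k, K, alpha, beta) evaluated at the grid point k / (K + 1)."""
    z = k / (K + 1)
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return z ** (alpha - 1) * (1 - z) ** (beta - 1) * norm


def midas_weights(tau, M, theta):
    """Daily weights, intra-daily weights and their products psi[j][i].

    theta = (theta1, theta2, theta3, theta4). The intra-daily weights are
    rescaled to sum to one (assumed normalization, see the lead-in).
    """
    t1, t2, t3, t4 = theta
    daily = [beta_weight(j, tau, t1, t2) for j in range(1, tau + 1)]
    raw = [beta_weight(i, M, t3, t4) for i in range(1, M + 1)]
    total = sum(raw)
    intra = [w / total for w in raw]
    psi = [[d * w for w in intra] for d in daily]
    return daily, intra, psi
```

With only four parameters the scheme produces all τ × M lag coefficients, which is the parsimony argument made in the text; for example, `midas_weights(3, 4, (1.0, 3.0, 2.0, 2.0))` yields declining daily weights and a symmetric hump-shaped intra-daily pattern.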
It should be noted that the multiplicative scheme is of course restrictive, as general
news impact curves would not scale like the two token parametric examples we used. There
are both empirical and theoretical issues that need to be raised and they will be discussed
later in the paper. For the purpose of theoretical development we will consider the general
case, i.e. where one estimates a news impact curve for every separate high frequency return
interval. This setup is reminiscent of the periodic ARCH model of Bollerslev and Ghysels
(1996) which has a periodic parametric news impact curve, i.e. a news impact curve for the
M high frequency intervals. While this setting is feasible in a parametric context, it is only
of theoretical interest in a nonparametric setting, i.e. it will be used for the development
of asymptotic properties in the next section. In periodic models, it is common to stack
all the observations pertaining to one period into a vector and treat the specification as
a multivariate stationary process (see e.g. Gladyshev (1961)). Assume that the M-dimensional
vector process $\{x_{t-1+1/M}, \ldots, x_{t-1+M/M}\}_t$ is stationary. Let $m_i(\cdot)$, $i = 1, \ldots, M$, denote the news impact
curve of $\{x_{t-1+i/M}\}_t$. We can rewrite model (3.1) as
\[
y_t = \sum_{j=1}^{\tau} B_j(\theta)\, m_{j\#M}(x_{t-1-(j-1)/M}) + \varepsilon_t \tag{2.6}
\]
where the subscript $j\#M$ is defined as $1 + (j - 1 - M \lfloor (j-1)/M \rfloor)$, the sum of one and
the remainder of $j - 1$ divided by $M$; $\lfloor x \rfloor$ is the largest integer not exceeding $x$. This framework
is the general one for the purpose of asymptotic analysis.
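The periodic subscript is simply a one-based modulo operation. A short sketch, with a hypothetical function name:

```python
def periodic_index(j, M):
    """Return j#M = 1 + remainder of (j - 1) divided by M, for j >= 1."""
    return 1 + (j - 1) % M
```

For M = 3 the lags j = 1, ..., 7 map to the curves 1, 2, 3, 1, 2, 3, 1, so every M-th lag shares the same news impact curve.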
3
Asymptotic analysis of semi-parametric MIDAS
regression models
It was noted before that our analysis is much inspired by the recent work of Linton and
Mammen (2005) who propose the semi-parametric ARCH(∞), henceforth called SPARCH.
The estimation approach in Linton and Mammen (2005) uses kernel smoothing methods
and solves a so called type II linear integral equation. While there are similarities between
semi-parametric MIDAS regressions and the work of Linton and Mammen (2005), it will
also become clear there are important and novel differences. Consider the following generic
setting where a regressor x is sampled M times more frequently (equally spaced) than $y_t$:
\[
y_t = \sum_{j=1}^{\tau} B_j(\theta)\, m(x_{t-1-(j-1)/M}) + \varepsilon_t \tag{3.1}
\]
where $\varepsilon_t$ is a martingale difference sequence with its mean independent of the past of the
regressors $x_{s/M}$. The lag coefficients $B_j(\cdot)$, $j = 1, \ldots, \tau$, are described by a finite dimensional
parameter $\theta \in \Theta \subset \mathbb{R}^p$ with $\sum_{j=1}^{\tau} B_j(\theta) = 1$ for identification. The true parameters $\theta_0$ and
the true function $m_0(\cdot)$ are unknown. Without loss of generality, we assume $\tau = nM$, $n \in \mathbb{N}$.
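To make the mixed sampling index in (3.1) concrete, the following sketch evaluates the regression function for one low frequency date. It is an illustration under assumptions: the storage convention `x[s]` for the observation at time s/M, the function name, and the choice of m passed in by the caller are all hypothetical.

```python
def midas_fitted_value(x, t, M, B, m):
    """Evaluate sum_j B_j * m(x_{t-1-(j-1)/M}) for low frequency date t.

    x[s] holds the regressor observed at time s / M, so x_{t-1} sits at
    index (t - 1) * M and each further lag steps back one index.
    B is the list of lag coefficients B_1, ..., B_tau and m is a callable.
    """
    last = (t - 1) * M
    if last - (len(B) - 1) < 0 or last >= len(x):
        raise ValueError("not enough high frequency observations for this t")
    return sum(B[j - 1] * m(x[last - (j - 1)]) for j in range(1, len(B) + 1))
```

The regressand is observed once per period while the lags step back in increments of 1/M, which is the feature that separates this setting from the semi-parametric ARCH(∞) case.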
We follow the approach of Linton and Mammen (2005) and (2006). For the moment we
ignore the presence of seasonality - an issue that will be discussed later. Suppose {yt } and
{xs/M } are stationary. Let θ0 and m0 be defined as the minimizers of the population least
squares criterion function
(
S(θ, m) = E  yt −
τ
X
Bj (θ)m(xt−1−(j−1)/M )
j=1
)2 

(3.2)
Define mθ as the minimizer of the criterion function for any given θ ∈ Θ. A necessary
condition for mθ to be the minimizer of (3.2) is that it satisfies the first order condition
\[
E\left[ \left\{ y_t - \sum_{j=1}^{\tau} B_j(\theta)\, m_\theta(x_{t-1-(j-1)/M}) \right\} \sum_{k=1}^{\tau} B_k(\theta)\, g(x_{t-1-(k-1)/M}) \right] = 0 \tag{3.3}
\]
for any measurable (and smooth) function g yielding a well-defined expectation. Moreover,
the second order condition is $-E\left[ \left( \sum_{k=1}^{\tau} B_k(\theta)\, g(x_{t-1-(k-1)/M}) \right)^2 \right]$. The fact that the latter
is negative implies that the solution of the first order condition does indeed (locally) minimize
the criterion. The first order condition (3.3) can be rewritten as
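\[
\sum_{k=1}^{\tau} B_k(\theta) E[y_t\, g(x_{t-1-(k-1)/M})] - \sum_{k=1}^{\tau} \sum_{j=1, j \neq k}^{\tau} B_k(\theta) B_j(\theta) E[m_\theta(x_{t-1-(j-1)/M})\, g(x_{t-1-(k-1)/M})]
= \sum_{k=1}^{\tau} B_k(\theta)^2 E[m_\theta(x_{t-1-(k-1)/M})\, g(x_{t-1-(k-1)/M})]
\]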
Taking g(.) to be the Dirac delta function, we have that
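\[
\sum_{k=1}^{\tau} B_k(\theta) E[y_t \mid x_{t-1-(k-1)/M} = x] - \sum_{k=1}^{\tau} \sum_{j=1, j \neq k}^{\tau} B_k(\theta) B_j(\theta) E[m_\theta(x_{t-1-(j-1)/M}) \mid x_{t-1-(k-1)/M} = x]
= \sum_{k=1}^{\tau} B_k(\theta)^2\, m_\theta(x)
\]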
for each x. This is an implicit equation for $m_\theta(\cdot)$ which can be re-expressed as a linear type
two integral equation in $L_2(f_0)$, where $f_0$ is the marginal density of $x_{s/M}$. Define
$B_k^*(\theta) = B_k(\theta) / \sum_{j=1}^{\tau} B_j(\theta)^2$, $k = 1, \ldots, \tau$, and
$B_i^+(\theta) = \sum_{k=1}^{\tau-|i|} B_k(\theta) B_{k+|i|}(\theta) / \sum_{j=1}^{\tau} B_j(\theta)^2$, $i = \pm 1, \ldots, \pm(\tau - 1)$.
Finally, let $f_{0,j}$ be the joint density of $(x_{s/M}, x_{(s-j)/M})$, then:
\[
m_\theta(x) = m^*_\theta(x) + \int H_\theta(x, y)\, m_\theta(y) f_0(y)\, dy, \quad \text{or } m_\theta = m^*_\theta + H_\theta m_\theta, \tag{3.4}
\]
\[
m^*_\theta(x) = \sum_{k=1}^{\tau} B_k^*(\theta) E[y_t \mid x_{t-1-(k-1)/M} = x] \tag{3.5}
\]
\[
H_\theta(x, y) = -\sum_{i=\pm 1}^{\pm(\tau-1)} B_i^+(\theta)\, \frac{f_{0,i}(y, x)}{f_0(y) f_0(x)}. \tag{3.6}
\]
Hence, $m_0 = m_{\theta_0}$. The general estimation strategy for a given sample $\{\{y_t\}_{t=1}^{T}, \{x_{s/M}\}_{s=1}^{MT}\}$
is (a) for each θ compute estimators m̂∗θ , Ĥθ of m∗θ , Hθ , (b) solve an empirical version of (3.4)
to obtain an estimator m̂θ of mθ and (c) choose θ̂ to minimize the profiled least squares
criterion with respect to θ and let m̂(x) = m̂θ̂ (x). In Appendix D we elaborate on the details
of the estimation procedure.
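Step (b) can be illustrated on a grid. The sketch below discretizes (3.4) with quadrature weights and solves it by successive approximation, m ← m* + Hm. This is an assumption-laden illustration, not the procedure of Appendix D: the iteration only converges when the discretized operator is a contraction, the function and argument names are hypothetical, and in practice the inputs would be kernel estimates rather than known functions.

```python
def solve_integral_equation(m_star, H, f0, weights, tol=1e-10, max_iter=1000):
    """Solve m = m_star + integral of H(x, y) m(y) f0(y) dy on a grid.

    m_star[a]  : value of m*_theta at grid point a
    H[a][b]    : kernel H_theta evaluated at grid points (x_a, y_b)
    f0[b]      : marginal density at grid point b
    weights[b] : quadrature weight attached to grid point b
    """
    n = len(m_star)
    m = list(m_star)
    for _ in range(max_iter):
        new = [
            m_star[a] + sum(H[a][b] * m[b] * f0[b] * weights[b] for b in range(n))
            for a in range(n)
        ]
        if max(abs(new[a] - m[a]) for a in range(n)) < tol:
            return new
        m = new
    raise RuntimeError("successive approximation did not converge")
```

A direct linear solve of the discretized system is an equally valid choice; iteration is used here only to keep the sketch free of external dependencies. Step (c) would then wrap this solver inside a numerical minimization over θ.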
3.1
Asymptotic theory
The following theorem establishes the asymptotic properties:
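Theorem 3.1 Suppose that assumptions appearing in Appendix A hold. Then for each
$\theta \in \Theta$ and $x \in (\underline{x}, \overline{x})$
\[
\sqrt{Th}\left( \hat{m}_\theta(x) - m_\theta(x) - h^2 b_\theta(x) \right) \Longrightarrow N(0, \omega_\theta(x))
\]
Moreover,
\[
\sqrt{T}(\hat{\theta} - \theta_0) \Longrightarrow N(0, \Sigma)
\]
Furthermore, for $x \in (\underline{x}, \overline{x})$
\[
\sqrt{Th}\left( \hat{m}(x) - m(x) - h^2 b(x) \right) \Longrightarrow N(0, \omega(x))
\]
where h denotes the bandwidth defined in Appendix A, Σ (eq. (B.5)) is the variance matrix, b
(eq. (B.4)) and $b_\theta$ (eq. (B.2)) are bias functions, and ω (given below) and $\omega_\theta$ (eq. (B.1)) are
variance functions defined in Appendix B.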
Proof: See Appendix B
It is shown in Appendix B that
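\[
\omega(x) = \frac{\|K\|_2^2 \sum_{j=1}^{\tau} B_j^2\, E\left[ \varepsilon_t^2 \mid x_{t-1-(j-1)/M} = x \right]}{f_0(x) \left( \sum_{j=1}^{\tau} B_j^2 \right)^2}
+ \frac{M-1}{M}\, \frac{\|K\|_2^2 \sum_{j=1}^{\tau} \sum_{k=1, k \neq j}^{\tau} B_j^2 B_k^2\, \mathrm{var}\left( m(x_{t+(j-k)/M}) \mid x_t = x \right)}{f_0(x) \left( \sum_{j=1}^{\tau} B_j^2 \right)^2} \tag{3.7}
\]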
The above expression shows that the mixed data sampling scheme in semi-parametric MIDAS
regressions adds an extra term, the last appearing in the above expression. When M = 1,
the asymptotic distribution collapses to the case covered in Linton and Mammen
(2006). When M > 1, the dependent variable is sampled less frequently than the regressor
which, compared to the case where all processes are sampled at the frequency 1/M, implies
that (M − 1)/M regression equations are missing. Intuitively speaking, there is therefore less
accuracy, due to the fact that the explained variable is not sampled at high enough frequency.
This introduces the extra term in the variance ω(.), the second term on the right-hand side of
equation (3.7).
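To calculate the confidence interval, we assume: (1) the sample size is large enough, so the
variance of $\hat{m}(x)$ is the same as the asymptotic variance; (2) $\mathrm{var}(m(x_{s/M}) \mid x_{k/M} = x) = \mathrm{var}(m(x_{s/M}))$, $\forall s \neq k$;
(3) $\mathrm{var}(\varepsilon_t \mid x_{t-j/M} = x) = \mathrm{var}(\varepsilon_t)$, $j = 1, \ldots, \tau$. The confidence interval
is then calculated for any x as,
\[
CI(x) = \left[ \hat{m}(x) + Z_\alpha \sqrt{\hat{s}(x)},\; \hat{m}(x) + Z_{1-\alpha} \sqrt{\hat{s}(x)} \right] \tag{3.8}
\]
\[
\hat{s}(x) = \frac{\|K\|_2^2}{T h \hat{f}_0(x) \sum_{j=1}^{\tau} B_j^2(\hat{\theta})} \left( \mathrm{var}(\hat{\varepsilon}_t) + \frac{M-1}{M}\, \frac{\sum_{j=1}^{\tau} \sum_{k=1, k \neq j}^{\tau} B_j^2(\hat{\theta}) B_k^2(\hat{\theta})}{\sum_{j=1}^{\tau} B_j^2(\hat{\theta})}\, \mathrm{var}(\hat{m}(x_t)) \right) \tag{3.9}
\]
where $Z_\alpha$ is the α-quantile of the standard normal distribution. In this paper, we set
α = 0.05, so $Z_{0.05} = -1.645$ and $Z_{0.95} = 1.645$.12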
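The pointwise band in (3.8) and (3.9) is easy to compute once the ingredients are estimated. The sketch below is illustrative: the function and argument names are hypothetical, all inputs are assumed to be supplied by the caller, and the double sum over j ≠ k is computed as the squared sum of the $B_j^2$ minus the sum of the $B_j^4$.

```python
import math


def pointwise_variance(kernel_l2_sq, T, h, f0_hat, B, var_eps, var_m, M):
    """Estimated variance s_hat(x) from equation (3.9).

    kernel_l2_sq : squared L2 norm of the kernel, ||K||_2^2
    f0_hat       : estimated marginal density at x
    B            : estimated lag coefficients B_1, ..., B_tau
    var_eps      : estimated variance of the residuals
    var_m        : estimated variance of m_hat(x_t)
    """
    sum_b2 = sum(b * b for b in B)
    cross = sum_b2 ** 2 - sum(b ** 4 for b in B)  # sum over j != k of B_j^2 B_k^2
    scale = kernel_l2_sq / (T * h * f0_hat * sum_b2)
    return scale * (var_eps + (M - 1) / M * cross / sum_b2 * var_m)


def confidence_interval(m_hat, s_hat, z=1.645):
    """Interval from equation (3.8) with Z_{0.95} = 1.645 and Z_{0.05} = -1.645."""
    half_width = z * math.sqrt(s_hat)
    return m_hat - half_width, m_hat + half_width
```

Setting M = 1 switches off the second term, so the band widens as soon as the regressand is sampled less often than the regressor.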
12
Note that we are computing the confidence interval of $\hat{m}(x)$ corresponding to $E(\hat{m}(x))$ instead of $m_0(x)$,
so we omit the discussion related to the bias part. See Wasserman (2006, p.89) for more details related to
the confidence interval of nonparametric estimation.
3.2
Extension to periodic data
We consider now the estimation of model (2.6), assuming that $\{y_t, x_{t-1+1/M}, \ldots, x_{t-1+M/M}\}_t$
is stationary, to obtain:
\[
m_{i,\theta}(x) = m^*_{i,\theta}(x) + \sum_{k=1}^{M} \int H_{i,k,\theta}(x, y)\, m_{k,\theta}(y) f_k(y)\, dy, \quad i = 1, \ldots, M \tag{3.10}
\]
\[
m^*_{i,\theta}(x) = \sum_{n=1}^{\tau/M} \frac{B_{i+(n-1)M}(\theta)}{\sum_{l=1}^{\tau/M} B^2_{i+(l-1)M}(\theta)}\, E[y_t \mid x_{t-n-1+i/M} = x] \tag{3.11}
\]
\[
H_{i,k,\theta}(x, y) = -\sum_{n=1}^{\tau/M} \sum_{l=1}^{\tau/M} \frac{B_{i+(n-1)M}(\theta) B_{k+(l-1)M}(\theta)}{\sum_{p=1}^{\tau/M} B^2_{i+(p-1)M}(\theta)}\, \frac{f_{i+(n-1)M,\,k+(l-1)M}(x, y)}{f_i(x) f_k(y)}, \quad k \neq i \tag{3.12}
\]
\[
H_{i,i,\theta}(x, y) = -\sum_{n=1}^{\tau/M} \sum_{l=1, l \neq n}^{\tau/M} \frac{B_{i+(n-1)M}(\theta) B_{i+(l-1)M}(\theta)}{\sum_{p=1}^{\tau/M} B^2_{i+(p-1)M}(\theta)}\, \frac{f_{i+(n-1)M,\,i+(l-1)M}(x, y)}{f_i(x) f_i(y)} \tag{3.13}
\]
where $f_i(\cdot)$ denotes the probability density function of $\{x_{t-1+i/M}\}_t$; $f_{s,q}(\cdot, \cdot)$ denotes the joint
density function of $\{x_{s/M}, x_{q/M}\}$. With the estimation method introduced in Appendix D, we
can estimate all M news impact curves $m_i(\cdot)$, $i = 1, \ldots, M$. Note that the proposed estimation
method is an extension of the semi-parametric MIDAS model to a special multivariate case.
The notation for the asymptotic properties is more complex, but keeps essentially a similar
form, which is why we omit the details.
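To proceed, consider the special case: $m_i(x) = \lambda_i m(x)$, $i = 1, \ldots, M$. Replacing the $m_i(\cdot)$ in
model (2.6), we obtain:
\[
y_t = \sum_{j=1}^{\tau} B_j(\theta) \lambda_{j\#M}\, m(x_{t-1-(j-1)/M}) + \varepsilon_t
= \sum_{j=1}^{\tau} C_j(\theta)\, m(x_{t-1-(j-1)/M}) + \varepsilon_t \tag{3.14}
\]
where $C_j(\theta) \equiv B_j(\theta) \lambda_{j\#M}$. Hence, the estimation solves the following equations:
\[
m_\theta(x) = m^*_\theta(x) + \sum_{i=1}^{M} \sum_{k=1}^{M} \int H_{i,k,\theta}(x, y)\, m_\theta(y) f_k(y)\, dy \tag{3.15}
\]
\[
m^*_\theta(x) = \sum_{j=1}^{\tau} \frac{C_j(\theta)}{\sum_{l=1}^{\tau} C_l^2(\theta)}\, E[y_t \mid x_{t-1-(j-1)/M} = x] \tag{3.16}
\]
\[
H_{i,k,\theta}(x, y) = -\sum_{n=1}^{\tau/M} \sum_{l=1}^{\tau/M} \frac{C_{i+(n-1)M}(\theta) C_{k+(l-1)M}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)}\, \frac{f_{i+(n-1)M,\,k+(l-1)M}(x, y)}{f_i(x) f_k(y)}, \quad k \neq i \tag{3.17}
\]
\[
H_{i,i,\theta}(x, y) = -\sum_{n=1}^{\tau/M} \sum_{l=1, l \neq n}^{\tau/M} \frac{C_{i+(n-1)M}(\theta) C_{i+(l-1)M}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)}\, \frac{f_{i+(n-1)M,\,i+(l-1)M}(x, y)}{f_i(x) f_i(y)} \tag{3.18}
\]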
We can estimate m∗θ , Hi,k,θ , i, k = 1, ..., M with the kernel smoothing method for any given
θ; then solve equation (3.15) to obtain an estimate of mθ, and finally take the minimizer
(θ̂, m̂θ̂ ) of the sample mean square error as the estimate of the parameters and news impact
curve. The asymptotic properties are discussed in Appendix C. In practice, to simplify the
calculation, we use the estimation method introduced in Appendix D, which is equivalent to
assuming that {xs/M } is stationary and that the seasonality is captured by the coefficients.
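In the special case where the periodic curves are proportional to a common curve, the seasonal scale factors are absorbed into the lag coefficients, $C_j = B_j \lambda_{j\#M}$. A short sketch of this step, with hypothetical names and with the scale factors passed in as a plain list:

```python
def seasonal_lag_coefficients(B, lam):
    """Return C_j = B_j * lambda_{j#M} for j = 1, ..., tau.

    B   : lag coefficients B_1, ..., B_tau
    lam : seasonal scale factors lambda_1, ..., lambda_M
    The periodic index j#M equals 1 + (j - 1) mod M, which is the
    zero-based position (j - 1) % M in the list lam.
    """
    M = len(lam)
    return [B[j - 1] * lam[(j - 1) % M] for j in range(1, len(B) + 1)]
```

With B = [0.4, 0.3, 0.2, 0.1] and λ = [1, 2], the combined coefficients are approximately [0.4, 0.6, 0.2, 0.2], so a single curve m with rescaled weights reproduces the periodic model.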
INCOMPLETE
4
Empirical Results
We analyze four datasets which consist of five-minute intra-day returns of respectively Dow
Jones and S&P500 cash and futures markets. The descriptive statistics pertaining to the
data sets appear in the top panel of Table 1. The samples start in 1993 or 1996 and hence
do not include the 1987 crash, and end in October 2003. Besides the five-minute data, we
also will consider coarser sampling of returns in our models to see how predictability and
asymmetries are affected by the sampling frequency. In the case of S&P 500 futures our
data sample includes that of Bollerslev, Litvinova, and Tauchen (2006), who document that
transactions in the futures market occur on average roughly every 9 seconds. This means
that, at least for the futures data, one may safely assume that microstructure effects are
negligible, an assumption also underlying the analysis in Bollerslev, Litvinova, and Tauchen
(2006).13 By considering coarser sampling frequencies we also avoid the possibility that
some microstructure still affects the five-minute data. Besides looking at different sampling
frequencies for the regressors, we also look at different prediction horizons for future volatility.
This will allow us to appraise how asymmetries play out at different horizons. So far we
wrote equations predicting RV only one day ahead, and we noted that longer horizons are
straightforward extensions. In the empirical work we consider three horizons (1) one day,
(2) one week and (3) one month. We discuss these cases separately. A major concern about
the semi-parametric model is over-fitting, which we guard against by examining out-of-sample
performance. We therefore consider both full sample estimates and out-of-sample
predictions. Table 1 lists the sample configurations; the data retained for out-of-sample
prediction are the final one and a half years of the sample. Finally, it should also
be noted that, as is typical in semi-parametric methods, there are many factors that affect the
estimation results, such as the initial parameters, the lag truncations, the number of grid
points and the weights of each grid point in numerically solving the integral equation, the
spline method to generate the news impact curve through the grid points, etc. The empirical
results we report are fairly robust to various choices of these attributes of semi-parametric
estimation.
Table 2 contains one day ahead forecasts for both parametric and semi-parametric model
specifications, with acronyms provided in the lower panel of Table 1. At this stage we have
not yet discussed any of the parametric specifications, nor have we provided a rationale
for them. This matter will be discussed later. For the moment, it is worth noting that
the semi-parametric MIDAS regression models (denoted SP and SP-SA, the latter involving
seasonally adjusted returns) provide the best out-of-sample fit for all the five-minute data
(the best models for out-of-sample predictions are bold faced in Table 2). It is also interesting
to note that the semi-parametric models typically have the best in-sample fit. A comparison
of SP and SP-SA indicates that using the raw five-minute data without adjustment is the
best, except for the Dow Jones cash series (i.e. one out of four).
13 We also computed signature plots (Andersen, Bollerslev, Diebold, and Labys (2000)) which indicate
that 5 minutes appears to be a reasonable sampling frequency.
While we have not yet discussed the parametric models, it may be worth noting that
among them figure the RV, RAV and BPVJ models using aggregate daily volatility measures
considered in the literature by Andersen, Bollerslev, and Diebold (2006) (RV and BPVJ),
Forsberg and Ghysels (2006) (RAV) and Ghysels, Santa-Clara, and Valkanov (2006) (RV and
RAV), among others. It is worth stressing the implication of this finding. It means that a
regression model involving non-parametric estimation of a response function applied to
high-frequency data outperforms a fully parametric model involving daily aggregate measures
such as RV, RAV and even the separation of jumps and continuous path volatility (identified
via test statistics involving daily measures discussed in Andersen, Bollerslev, and Diebold
(2006)). Typically, well specified parametric models outperform semi-parametric ones. Here,
however, the semi-parametric models de facto use more data and are not subject to the
pre-specified quadratic transformation of returns.
Since we have four volatility series, we consider four news impact curves in Figure 1 at two
horizons: next day and next week. Unlike the plot appearing in the Introduction, we consider
now four series instead of a single one. For the moment it suffices to look at the dotted lines
in each of the figures, as they represent the news impact curves obtained via the
semi-parametric estimation. It is remarkable how similar the shapes are for the SP model
across the four different series. The asymmetry of the news impact is obvious, that is,
negative and positive returns have a different impact. The finding that 'no news is good news',
extensively documented in the literature using daily returns, has the minimum of the news
impact curve at zero. Instead, with intra-daily data we find in Figure 3 that the intra-daily
news impact curves attain their minimum at some mildly positive return, meaning that such
returns result in decreased volatility the next day (since the impact is negative).14 As noted
in the Introduction, we also recognize the fact that extremely 'good news', the positive
returns (it turns out those larger than the 90% quantiles), cause increased future volatility.
Finally, as noted earlier, 'bad news' has a more acute impact than positive news. To give
specific numbers, for the DJ cash series the news impact curve achieves its minimum at a
12.10% annualized return and crosses into the volatility increasing region at a 24.20%
annualized return. The other series yield similar results.15 It is also worth noting from the
plots in Figure 3 that the 95% asymptotic confidence intervals around the news impact curves
tell us that the dips below zero are statistically significant in three out of the four
cases - the exception being the S&P 500 cash market series.16
Besides the news impact curves, we need to discuss the parametric part of the
semi-parametric MIDAS regression, or more specifically the Beta polynomials appearing in
equation (2.5). We plot only one of the four examples, namely the S&P 500 Futures example.
There are three plots that appear in Figure 2. The first plot displays the product of the
daily and intra-daily lags, hence it contains the profile of the coefficient $\psi_{ij}$. The second plot
displays only the daily coefficient $\psi_j$ and finally the intra-daily coefficient $\psi_i$ appears in the
third plot. The patterns are not surprising, given the abundant evidence documented in the
empirical volatility literature. The daily coefficients decrease monotonically and are close to
zero after 6 to 8 days. The intra-daily weights display a somewhat asymmetric U-shaped
pattern, perhaps best characterized as a smirk. It means that late afternoon returns, news
that is, carry relatively more weight than morning returns. The product of the two provides
a spiky decay pattern compounding the intra-daily and daily response.
Table 3 contains both parametric and semi-parametric model specifications for the one-week
horizon, whereas the four news impact curves appearing in Figure 1 also cover the weekly horizon.
We observe from the patterns that as the horizon increases, the news impact curves become
symmetric and centered around the zero return axis. Likewise Table 3 contains the monthly
horizon forecast results.
14 The shape of the news impact curve should bring us back to the issue of positivity constraints. It
is important to note that each and every day has 78 five-minute intervals, and for some m(.) is positive,
whereas for others the functional yields a negative value. As far as positivity is concerned, what matters is
the final model prediction which compounds all the five-minute intervals - so the fact that the function dips
below zero over a single interval is not of major concern, as long as the sum of all weighted functionals of
five-minute returns remains positive. In none of our empirical examples did it ever happen that predictions
yielded negative volatilities - that is, the constraint mentioned earlier and also appearing in the Linton and
Mammen code was never binding.
15 The annualized returns calculations, through the simple extrapolation, obviously assume that the
five-minute returns would be sustained for a year.
16 In the interest of space we do not report the curves involving the adjusted returns. It turns out that
with the exception of the DJ cash series, there is not such a clear asymmetric pattern that emerges. If we
look at the out-of-sample prediction performance in Table 2 it appears that the DJ cash example is the only
one where SP-SA dominates the raw data SP specification. All three other SP-SA models are out-performed
by the asymmetric SP regressions.
When we examine the results in Table 3 we observe that the semi-parametric models hold
up very well as far as forecasting out-of-sample goes. In fact the results in Table 3 reveal
that the semi-parametric MIDAS is the best across all four series in terms of out-of-sample
performance for the monthly horizon and three out of four series for the weekly. This is
remarkable considering the fact that it is partially based on non-parametric estimation.
The lesson we learned so far is that the news impact curve reported in the Introduction
is representative, as it appears similar across different series, and it also holds up
out-of-sample. The models we propose do not involve aggregation of returns to a daily volatility
measure - hence information in high frequency data is preserved. Moreover, the asymmetric
pattern is distinct from that implied by realized volatility measurement.
5 A comparison with Parametric Models - New and Old
The findings discussed in the previous section whet our appetite for considering parametric
models that apply directly to intra-daily data. There are at least three reasons for looking at
a new class of parametric models. First, the models we will consider relate to GARCH-type
models and hence bridge a new and old literature. Second, formal testing in the context of
semi-parametric models is quite challenging while it is not in the case of parametric models.
The most important and third reason is very practical. The estimation of semi-parametric
MIDAS regressions is computationally demanding. The estimation time is roughly equal to
Model M T × τ M × n2g where M T is the sample size, τ M is the number of lags and n2g is
the number of grid points. Hence, in our examples with M T ≈ 200,000, τ M ≈ 400 and n2g
= 41, estimation time is about 20 hours on a PC with a P4 2.4 GHz CPU and 1 GB of memory. The
parametric models introduced in this section take between 1 and 5 minutes to estimate with
the same data.
Some of the models we introduce are new, as they explore via parametric specifications
the patterns that we uncovered with the estimation of m(.) in the previous section. Not
surprisingly, these new parametric specifications are inspired by model specifications adopted
in the ARCH literature. Yet, the new class of models are within the context of MIDAS
regressions and replace the function m(.) by various parametric functional forms. This
new class of MIDAS regressions will be compared with more traditional MIDAS regression
models involving daily measures of volatility discussed earlier, that is RV, RAV and BPV(J).
It should also be noted that the new class of parametric models inherit the parametric
polynomial specifications appearing in equation (2.5). That includes, of course the treatment
of intra-daily seasonality via the product of beta polynomials. Alternatively, the parametric
models can also be formulated in terms of adjusted returns, hence the classical debate about
seasonal adjustment emerges here in the context of nonlinear time series regression models
with mixed frequency data sampling.
In a first subsection we introduce the new parametric models. The next subsection reports
the empirical results.
5.1 A New Class of Parametric High Frequency Data Volatility Models
The purpose is to introduce various parametric MIDAS regression models that are inspired
by our previously introduced semi-parametric setup. To facilitate the presentation, we use
the following indicator process: $1_A$, which equals one when A is true and zero otherwise.
All models involving discretely sampled high frequency data can be represented in a generic
parametric way:
$$RV_t = \sum_{j=1}^{\tau}\sum_{i=1}^{M} \psi_{ij}(\theta)\, NIC(r_{t-j-(i-1)/M}) + \varepsilon_t \tag{5.1}$$
where $\sum_{j=1}^{\tau}\sum_{i=1}^{M} \psi_{ij} = 1$ and the following news impact curves (NIC) are used:
• $NIC(r) = (a + b r^2)$, to which we attach the acronym SYMM. The SYMM model
can be regarded as a MIDAS extension of ARCH to the case of high-frequency data.
Obviously, the SYMM model cannot capture any asymmetries that appear in the data.
• Inspired by the GJR model proposed by Glosten, Jagannathan, and Runkle (1993),
we consider the ASYMGJR model with $NIC(r) = (a + b r^2 + c\,1_{r<0}\, r^2)$. Although in
the original GJR model there is the constraint that $b, c \geq 0$ to guarantee positivity of
volatility, this constraint is most likely redundant with high frequency data. So, the
constraint is not imposed in the ASYMGJR model.
• Another possible way to allow for asymmetric effects is via a location shift, as in
the Asymmetric GARCH model in Engle (1990), yielding the ASYMLS model with
$NIC(r) = (a + b(r - c)^2)$.17
• The last model with the intra-daily return considered in our study is the ABS model
with $NIC(r) = (a + b|r|)$, which is again a symmetric model.
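The four news impact curves and the generic regression (5.1) can be written down compactly. The sketch below is illustrative only: the function names, the parameter values and the layout of `lagged_returns` (entry `[j-1, i-1]` holding $r_{t-j-(i-1)/M}$) are assumptions made here, and the weights `psi` are simply taken as given rather than derived from the Beta polynomials of equation (2.5).

```python
import numpy as np


def nic_symm(r, a, b):
    """SYMM: a + b r^2."""
    r = np.asarray(r, dtype=float)
    return a + b * r**2


def nic_asymgjr(r, a, b, c):
    """ASYMGJR: a + b r^2 + c 1{r<0} r^2 (no sign constraint on b, c)."""
    r = np.asarray(r, dtype=float)
    return a + b * r**2 + c * (r < 0) * r**2


def nic_asymls(r, a, b, c):
    """ASYMLS: a + b (r - c)^2, a location-shifted parabola."""
    r = np.asarray(r, dtype=float)
    return a + b * (r - c) ** 2


def nic_abs(r, a, b):
    """ABS: a + b |r|."""
    r = np.asarray(r, dtype=float)
    return a + b * np.abs(r)


def midas_nic_prediction(psi, lagged_returns, nic, *params):
    """Fitted value of (5.1): sum_j sum_i psi[j, i] * NIC(r[j, i])."""
    psi = np.asarray(psi, dtype=float)
    return float(np.sum(psi * nic(lagged_returns, *params)))


# Hypothetical inputs: tau = 2 daily lags, M = 3 intra-daily intervals.
psi = np.full((2, 3), 1.0 / 6.0)          # weights sum to one
lagged_returns = np.full((2, 3), 0.01)    # every lagged return equals 1%
pred = midas_nic_prediction(psi, lagged_returns, nic_symm, 0.0, 1.0)
```

With a positive `c`, `nic_asymgjr` assigns a larger value to a negative return than to a positive return of the same size, while `nic_asymls` attains its minimum at `r = c` instead of at zero.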
All of the above models are compared to the more traditional daily volatility models. We
consider three cases of regressors: RV, RAV, BPV and Jumps yielding the RV model, RAV
model and BPVJ model, respectively. All these models are in the framework of MIDAS
regression, namely:
$$RV_t = a + b\sum_{j=1}^{\tau} \psi_j(\theta)\, RV_{t-j} + \varepsilon_t \tag{5.2}$$
$$RV_t = a + b\sum_{j=1}^{\tau} \psi_j(\theta)\, RAV_{t-j} + \varepsilon_t \tag{5.3}$$
$$RV_t = a + b\sum_{j=1}^{\tau} \psi_j(\theta)\, BPV_{t-j} + 1_{jump,t-1}\,\big(c + d(RV_{t-1} - BPV_{t-1})\big) + \varepsilon_t \tag{5.4}$$
where $\psi_j(\theta) = Beta(j, \tau, \theta_1, \theta_2)$; $1_{jump,t-1}$ indicates whether there is a jump on day $t-1$ and
$RV_{t-1} - BPV_{t-1}$ is the size of the jump on day $t-1$. We use the test suggested by Huang
and Tauchen (2005) to determine $1_{jump,t}$.
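As an illustration of the daily regressions (5.2) and (5.3), the following sketch builds a set of normalized Beta-type lag weights and evaluates the fitted value. The exact form of $Beta(j, \tau, \theta_1, \theta_2)$ is given earlier in the paper and is not reproduced in this section, so the weight function below (a Beta density kernel evaluated at $j/(\tau+1)$ and normalized to sum to one) is an assumption, as are the function names and the parameter values.

```python
import numpy as np


def beta_weights(tau, theta1, theta2):
    """Normalized Beta-type lag weights psi_1, ..., psi_tau (assumed form)."""
    # Evaluate on an interior grid so that the endpoints 0 and 1 are avoided.
    x = np.arange(1, tau + 1) / (tau + 1.0)
    raw = x ** (theta1 - 1.0) * (1.0 - x) ** (theta2 - 1.0)
    return raw / raw.sum()


def midas_daily_prediction(a, b, weights, lagged_measure):
    """Fitted value a + b * sum_j psi_j * X_{t-j}, with X = RV or RAV.

    lagged_measure[j-1] holds the daily measure at lag j.
    """
    return a + b * float(np.dot(weights, lagged_measure))


# Hypothetical parameter values: theta1 = 1 gives monotonically decaying weights.
w = beta_weights(tau=10, theta1=1.0, theta2=5.0)
pred = midas_daily_prediction(a=0.1, b=0.5, weights=w, lagged_measure=np.full(10, 2.0))
```

Because the weights sum to one, a constant lagged measure of 2.0 yields a fitted value of 0.1 + 0.5 × 2.0 = 1.1 regardless of the shape parameters.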
5.2 Empirical Results
The full-sample estimation and out-of-sample forecasts are shown in Tables 2 and 3,
respectively for one-day, one-week and one-month horizons. According to the $R^2$s in the
full-sample estimation, the models can be divided into two groups: the first group consists of
the asymmetric ASYMGJR and ASYMLS and SP(-SA) models; the second group consists of
the rest - i.e. the symmetric ones. The $R^2$s of the models in the first group are in general 5%
to 7% greater than those in the second group. The in-sample results also hold in the
out-of-sample forecast comparisons. Hence, the asymmetric effect is an important feature of the
news impact curve in the high-frequency data case. It is also worth noting that symmetric
models using intra-daily data still typically outperform the traditional models based on daily
volatility measures, i.e. the RV, RAV and BPVJ models. Hence, the information gain from
intra-daily squared or absolute returns is genuine.
17 We also considered two models which combine the GJR model and the Asymmetric GARCH model:
the ASYMC1 model, $NIC(r) = (a + b\,1_{r-d<0}(r - d)^2 + c\,1_{r-d\geq 0}(r - d)^2)$; and the ASYMC2 model,
$NIC(r; \theta) = (a + b\,1_{r<0}(r - d - e)^2 + c\,1_{r\geq 0}(r - d)^2)$. Due to space limitations we do not report
the results; they were roughly similar to the ASYMGJR specification.
The news impact curves of two asymmetric models at the one-day horizon, the ASYMLS
and ASYMGJR models, are shown in Figure 3. Comparing the news impact curves of the
SP model with the two asymmetric parametric models, we find that the curves are very
similar for negative returns. The main difference is that the news impact curve in the SP
model has a minimum at a positive return, but the other two asymmetric parametric models
do not. Hence, for extreme good news with the SP specification, there is an increase of the
future volatility; in the asymmetric parametric models, instead there is a decrease. This
feature indicates that the parametric models still do not fully capture the news impact as
recovered via semi-parametric estimation, and also explains why the latter features superior
forecasting performance. Finally, in Table 3 we note that the one month ahead forecasting
horizons reveal that the symmetric parametric models perform better than the asymmetric
ones. This was already suggested by the semi-parametric impact curves that looked more
quadratic at the one week horizon.
6 Conclusions
While semi-parametric MIDAS regressions potentially apply in a variety of settings, the main
focus of the paper is on a specific application, namely news impact curves. Returning to the
figure, one may wonder what we learn from the relatively model-free patterns. The writing
of Engle and Ng (1993) was in part motivated by the recognition that volatility models,
including the at the time very popular daily GARCH(1,1) model of Bollerslev (1986), imposed
a particular response function of shocks to volatility and that most often such response
functions were inherently misspecified. In particular, in the case of the GARCH(1,1) 'good'
and 'bad' news had the same impact on future volatility and that appeared counterfactual.
The most preferred model of Engle and Ng (1993), based on their empirical analysis, was
that of Glosten, Jagannathan, and Runkle (1993). The main findings of this literature still
remain very much part of our core beliefs today regarding the key stylized facts of volatility
dynamics. Namely, it is widely believed that “good” news and “bad” news do not have the
same impact on future volatility. This is a theme that resonates in many empirical asset
pricing papers, including Campbell and Hentschel (1992), Glosten, Jagannathan, and Runkle
(1993), among many others.
We introduced semi-parametric MIDAS regressions and studied their large sample behavior.
In addition, we focused on news impact curves as a specific application. A new parametric
specification dealing with intra-daily data is a by-product of this application. The regression
models also inspired a new class of parametric volatility models that apply directly to
high-frequency data. Our analysis relates to and extends recent work by Linton and Mammen
(2005). Our empirical findings suggest that moderately good (intra-daily) news reduces
volatility (the next day), while both very good news (unusually high positive returns) and
bad news (negative returns) increase volatility, with the latter having a more severe impact.
The asymmetry evaporates at longer horizons. Parametric specifications, which bridge the
new and old literature, confirm these findings of asymmetry at short and longer horizons
via simple hypotheses imposed on the parameters. We also examine the findings in the
context of diffusion models, albeit via a more modest setup, namely that of predictive linear
regression models that allow for asymmetry. Our findings have profound implications for
current volatility prediction models and for the recently developed high frequency data
in-sample asymptotic analysis.
References
Aït-Sahalia, Y., 2004, Disentangling diffusion from jumps, Journal of Financial Economics
74, 487–528.
Aït-Sahalia, Y., and J. Jacod, 2007a, Testing for jumps in a discretely observed process, Annals of
Statistics, forthcoming.
Aït-Sahalia, Y., and J. Jacod, 2007b, Volatility estimators for discretely sampled Lévy processes,
Annals of Statistics 35, 355–392.
Aït-Sahalia, Y., and Loriano Mancini, 2006, Out of sample forecasts of quadratic variation,
Work in progress.
Aït-Sahalia, Y., P. A. Mykland, and L. Zhang, 2005, How often to sample a continuous-time
process in the presence of market microstructure noise, Review of Financial Studies 18,
351–416.
Andersen, T.G., T. Bollerslev, F.X. Diebold, and P. Labys, 2000, Great realizations, Risk
Magazine 13, 105 – 108.
Andersen, T., T. Bollerslev, and N. Meddahi, 2006, Market microstructure noise and realized
volatility forecasting, Work in progress.
Andersen, T. G., and T. Bollerslev, 1997, Intraday periodicity and volatility persistence in
financial markets, Journal of Empirical Finance 4, 115–158.
Andersen, T. G., and T. Bollerslev, 1998, Deutsche Mark–Dollar Volatility: Intraday Activity Patterns,
Macroeconomic Announcements, and Longer Run Dependencies, Journal of Finance 53, 219–265.
Andersen, T. G., T. Bollerslev, and Francis X. Diebold, 2006, Roughing it up: Including jump components in
the measurement, modeling and forecasting of return volatility, Review of Economics and
Statistics (forthcoming).
Bandi, F. M., and J. R. Russell, 2005, Microstructure noise, realized variance, and optimal
sampling, Working paper, University of Chicago.
Bandi, F. M., and J. R. Russell, 2006, Separating microstructure noise from volatility, Journal of
Financial Economics 79, 655–692.
Barndorff-Nielsen, O., S. Graversen, J. Jacod, and N. Shephard, 2006, Limit theorems for
bipower variation in financial econometrics, Econometric Theory 22, 677–719.
Barndorff-Nielsen, O. E., P. R. Hansen, A. Lunde, and N. Shephard, 2006, Regular and
modified kernel-based estimators of integrated variance: The case with independent noise,
Discussion paper, Department of Mathematical Sciences, University of Aarhus.
Barndorff-Nielsen, O. E., S. Kinnebrock, and N. Shephard, 2008, Measuring downside risk
realised semivariance, Discussion Paper, Oxford.
Barndorff-Nielsen, O. E., and N. Shephard, 2002, Estimating quadratic variation using
realised variance, Journal of Applied Econometrics 17, 457–477.
Barndorff-Nielsen, O. E., and N. Shephard, 2006, Econometrics of testing for jumps in financial
economics using bipower variation, Journal of Financial Econometrics 4, 1–30.
Barndorff-Nielsen, O. E., and N. Shephard, 2007, Variation, jumps, market frictions and high frequency
data in financial econometrics, in Richard Blundell, Persson Torsten, and Whitney K Newey, ed.: Advances
in Economics and Econometrics. Theory and Applications, Ninth World Congress.
Econometric Society Monographs, Cambridge University Press.
Barndorff-Nielsen, O. E., and N. Shephard, 2004, Power and bipower variation with stochastic
volatility and jumps, Journal of Financial Econometrics 2, 1–48.
Bollen, B., and B. Inder, 2002, Estimating daily volatility in financial markets utilizing
intraday data, Journal of Empirical Finance 9, 551–562.
Bollerslev, T., 1986, Generalized autoregressive conditional heteroskedasticity, Journal of
Econometrics 31, 307–327.
Bollerslev, T., and E. Ghysels, 1996, On periodic autoregressive conditional heteroskedasticity,
Journal of Business and Economic Statistics 14, 139–151.
Bollerslev, T., J. Litvinova, and G. Tauchen, 2006, Leverage and volatility feedback effects
in high-frequency data, Journal of Financial Econometrics 4, 353–384.
Bosq, D., 1998, Nonparametric Statistics for Stochastic Processes: Estimation and Prediction
(Springer-Verlag: Berlin).
Campbell, J. Y., and Ludger Hentschel, 1992, No news is good news: An asymmetric model
of changing volatility in stock returns, Journal of Financial Economics 31, 281–318.
Corsi, Fulvio, 2003, A simple long memory model of realized volatility, Unpublished
manuscript, University of Southern Switzerland.
Dacorogna, M., R. Gençay, U. A. Müller, R. B. Olsen, and O. V. Pictet, 2001, An
Introduction to High-Frequency Finance (Academic Press: San Diego).
Engle, R., and G. Gallo, 2006, A multiple indicators model for volatility using intra-daily
data, Journal of Econometrics 131, 3 – 27.
Engle, R. F., 1990, Discussion: Stock market volatility and the crash of ’87, Review of
Financial Studies 3, 103–106.
Engle, R. F., and V. Ng, 1993, Measuring and testing the impact of news on volatility, Journal
of Finance 48, 1749–1778.
Forsberg, Lars, and E. Ghysels, 2006, Why do absolute returns predict volatility so well?,
Journal of Financial Econometrics 6, 31–67.
Ghysels, E., and D. Osborn, 2001, The Econometric Analysis of Seasonal Time Series
(Cambridge University Press, Cambridge).
Ghysels, E., P. Santa-Clara, and R. Valkanov, 2002, The MIDAS touch: Mixed data sampling
regression models, Working paper, UNC and UCLA.
Ghysels, E., P. Santa-Clara, and R. Valkanov, 2006, Predicting volatility: getting the most out of
return data sampled at different frequencies, Journal of Econometrics 131, 59–95.
Ghysels, E., and A. Sinko, 2006, Volatility prediction and microstructure noise, Work in
progress.
Ghysels, E., A. Sinko, and R. Valkanov, 2006, MIDAS Regressions: Further Results and New Directions,
Econometric Reviews 26, 53–90.
Gladyshev, E., 1961, Periodically correlated random sequences, Soviet Mathematics 2,
385–388.
Glosten, L. R., R. Jagannathan, and David E. Runkle, 1993, On the relation between the
expected value and the volatility of the nominal excess return on stocks, Journal of Finance
48, 1779–1801.
Hansen, P. R., and Asger Lunde, 2006, Realized variance and market microstructure noise,
Journal of Business and Economic Statistics 24, 127–161.
Huang, X., and G. Tauchen, 2005, The relative contribution of jumps to total price variation,
Journal of Financial Econometrics 3, 456–499.
Jacod, J., 1994, Limit of random measures associated with the increments of a brownian
semimartingale, Preprint number 120, Laboratoire de Probabilités, Université Pierre et
Marie Curie, Paris.
Jacod, J., 1996, La variation quadratique du brownian en presence d’erreurs d’arrondi,
Asterisque 236, 155–162.
Jones, M.C., O.B. Linton, and J.P. Nielsen, 1995, A simple bias reduction method for density
estimation, Biometrika 82, 327–338.
Linton, O., and E. Mammen, 2005, Estimating semiparametric ARCH(∞) models by kernel
smoothing methods, Econometrica 73, 771–836.
Linton, O., and E. Mammen, 2006, Nonparametric transformation to white noise, Working paper,
London School of Economics.
Martens, M., Y. C. Chang, and S. J. Taylor, 2002, A comparison of seasonal adjustment
methods when forecasting intraday volatility, Journal of Financial Research 25, 283–299.
Nelson, D. B., 1991, Conditional heteroscedasticity in asset returns: A new approach,
Econometrica 59, 347–370.
Tauchen, G., and H. Zhou, 2005, Identifying realized jumps on financial markets, Working
paper, Duke University.
Wasserman, L., 2006, All of Nonparametric Statistics (Springer: New York).
Wood, R. A., T. H. McInish, and J. K. Ord, 1985, An investigation of transaction data for
nyse stocks, Journal of Finance 40, 723–739.
Technical Appendices
A Regularity conditions
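To facilitate the asymptotic analysis, we make the following assumptions on the residuals and regressors, the
kernel function K(.), and the bandwidth parameter h. Define
$$\eta_{s,j} = \begin{cases} y_{(s+M+j-1)/M} - E[y_{(s+M+j-1)/M} \mid x_{s/M}], & \text{if } (s+j-1)/M \in \mathbb{Z} \\ 0, & \text{otherwise} \end{cases} \tag{A.1}$$
$$\zeta_{s,j}(\theta) = m_\theta(x_{(s-j)/M}) - E[m_\theta(x_{(s-j)/M}) \mid x_{s/M}] \tag{A.2}$$
$$\eta_{s,\theta} = M \sum_{j=1}^{\tau} B_j^*(\theta)\,\eta_{s,j} \tag{A.3}$$
$$\zeta_{s,\theta} = -\sum_{j=\pm 1}^{\pm(\tau-1)} B_j^*(\theta)\,\zeta_{s,j}(\theta) \tag{A.4}$$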
Moreover, we assume that:
• The process $\{x_{s/M}\}_{s=-\infty}^{+\infty}$ is stationary; and the processes $\{y_t, X_t\}_{t=-\infty}^{+\infty}$ are jointly stationary and
geometrically α-mixing, where $X_t = \{x_{t-(M-1)/M}, x_{t-(M-2)/M}, \ldots, x_t\}$, and $\alpha(k) \leq a s^k$ for some
constant $a$ and $0 \leq s < 1$ when $k$ is big enough.
• $E[|y_t|^{2\rho}] < \infty$ for some $\rho > 2$.
• The covariate process $\{x_{s/M}\}_{s=-\infty}^{\infty}$ has absolutely continuous density $f_0(.)$ supported on $[\underline{x}, \overline{x}]$ for
some $-\infty < \underline{x} < \overline{x} < \infty$ and the bivariate densities $f_{0,j}(.)$ are supported on $[\underline{x}, \overline{x}]^2$. The function
m(.) together with the densities $f_0(.)$ and $f_{0,j}(.)$ are continuous and twice continuously differentiable
over $(\underline{x}, \overline{x})$ and $(\underline{x}, \overline{x})^2$, and are uniformly bounded; $f_0(.)$ is bounded away from zero on $[\underline{x}, \overline{x}]$, i.e.,
$\inf_{\underline{x} \leq \omega \leq \overline{x}} f_0(\omega) > 0$.
• The parameter space Θ is a compact subset of $R^p$, and the value $\theta_0$ is an interior point of Θ. There exists
no measurable function m(.) with $\int m(x)^2 f_0(x)\,dx = 1$ such that $\sum_{j=1}^{\tau} B_j(\theta)\, m(x_{t-1-(j-1)/M}) = 0$
with probability one. For any $\epsilon > 0$,
$$\inf_{\|\theta - \theta_0\| > \epsilon} S(\theta, m_\theta) > S(\theta_0, m_{\theta_0}).$$
• The density function µ of $(\eta_{s,j}, \zeta_{s,j}(\theta))$ is Lipschitz continuous on its domain. The joint densities
$\mu_{0,j}$, $j = 1, 2, \ldots, \tau - 1$, of $((\eta_{t,0}, \zeta_{s,0}(\theta)), (\eta_{s,j}, \zeta_{s,j}(\theta)))$ are uniformly bounded.
• The bandwidth sequence h(T) satisfies $T^{1/5} h(T) \to \gamma$ as $T \to \infty$ with γ bounded away from zero and
infinity.
• For each $x \in [\underline{x}, \overline{x}]$ the kernel function K has support [−1, 1] and $\int K(u)\,du = 1$ and $\int K(u)\,u\,du = 0$,
such that for some constant C, $\sup_{x \in [\underline{x}, \overline{x}]} |K(u) - K(v)| \leq C|u - v|$ for all $u, v \in [-1, 1]$. Define
$\mu_j(K) = \int u^j K(u)\,du$ and $\|K\|_2^2 = \int K^2(u)\,du$.
• $\varepsilon_t$ satisfies $E\big[\varepsilon_t \mid \{x_{t-1-(s-1)/M}\}_{s=1}^{\infty}, \{\varepsilon_{t-j}\}_{j=1}^{\infty}\big] = 0$ a.s.
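The kernel and bandwidth conditions above can be checked numerically for a concrete choice. The sketch below uses the Epanechnikov kernel, which is an assumption made here for illustration (the conditions do not single out a particular kernel), and computes $\int K$, $\mu_1(K)$, $\mu_2(K)$ and $\|K\|_2^2$ by the trapezoid rule. The helper names and the constant `gamma` in the bandwidth rule are likewise hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def integrate(values, grid):
    """Trapezoid rule on a one-dimensional grid."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


u = np.linspace(-1.0, 1.0, 20001)
K = epanechnikov(u)
mass = integrate(K, u)            # should be close to 1
mu1 = integrate(u * K, u)         # should be close to 0
mu2 = integrate(u**2 * K, u)      # mu_2(K), equal to 1/5 for this kernel
norm_sq = integrate(K**2, u)      # ||K||_2^2, equal to 3/5 for this kernel


def bandwidth(T, gamma=1.0):
    """A bandwidth with T^(1/5) h(T) = gamma, consistent with the rate condition."""
    return gamma * T ** (-1.0 / 5.0)


h_example = bandwidth(32)         # 32^(-1/5) = 0.5
```

The two constants `mu2` and `norm_sq` are the kernel-dependent quantities that enter the asymptotic bias and variance expressions in Appendix B.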
B Proof of Theorem 3.1
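Define the functions $\beta_\theta^j(x)$, $j = 1, 2$, as solutions to the integral equations $\beta_\theta^j = \beta_\theta^{*,j} + H_\theta \beta_\theta^j$, in which:
$$\beta_\theta^{*,1}(x) = m_\theta^{*\prime\prime}(x),$$
$$\beta_\theta^{*,2}(x) = \sum_{i=\pm 1}^{\pm(\tau-1)} B_i^*(\theta) \left\{ E\big(m_\theta(x_{(s-i)/M}) \mid x_{s/M} = x\big)\,\frac{f_0''(x)}{f_0(x)} - \int [\nabla^2 f_{0,i}(x,y)]\,\frac{m_\theta(y)}{f_0(x)}\,dy \right\}$$
where the operator $\nabla^2$ is defined as $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$. Then define
$$\omega_\theta(x) = \frac{\|K\|_2^2}{f_0(x)}\,\mathrm{var}[\eta_{\theta,s} + \zeta_{\theta,s}] \tag{B.1}$$
$$b_\theta(x) = \frac{1}{2}\,\mu_2(K)\,\big[\beta_\theta^1(x) + \beta_\theta^2(x)\big] \tag{B.2}$$
Define:
$$\omega(x) = \frac{\|K\|_2^2 \sum_{j=1}^{\tau} B_j^2(\theta_0)\, E\big[\varepsilon_t^2 \mid x_{t-1-(j-1)/M} = x\big]}{f_0(x)\,\big[\sum_{j=1}^{\tau} B_j^2(\theta_0)\big]^2} + \frac{M-1}{M}\,\frac{\|K\|_2^2 \sum_{j=1}^{\tau}\sum_{k=1,\,k\neq j}^{\tau} B_j^2(\theta_0)\,B_k^2(\theta_0)\,\mathrm{var}\big(m(x_{t+(j-k)/M}) \mid x_t = x\big)}{f_0(x)\,\big[\sum_{j=1}^{\tau} B_j^2(\theta_0)\big]^2} \tag{B.3}$$
$$b(x) = \mu_2(K)\left\{ \frac{1}{2}\,m''(x) + (I - H_\theta)^{-1}\left[\frac{f_0'}{f_0}\,\frac{\partial}{\partial x}(H_\theta m)\right](x) \right\} \tag{B.4}$$
Let $\varepsilon_t(\theta) = y_t - \sum_{j=1}^{\tau} B_j(\theta)\, m_\theta(x_{t-1-(j-1)/M})$, and let
$$\Sigma = \left\{E\left[\frac{\partial^2 \varepsilon_t}{\partial\theta\,\partial\theta^\top}(\theta_0)\right]\right\}^{-1} E\left[\frac{\partial \varepsilon_t}{\partial\theta}\,\frac{\partial \varepsilon_t}{\partial\theta^\top}\,\varepsilon_t^2(\theta_0)\right] \left\{E\left[\frac{\partial^2 \varepsilon_t}{\partial\theta\,\partial\theta^\top}(\theta_0)\right]\right\}^{-1} \tag{B.5}$$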
The proof follows Linton and Mammen (2005) and Linton and Mammen (2006). First, for general θ we
apply Proposition 1, p. 815, of Linton and Mammen (2005). Thus, we write
$$\hat m_\theta^*(x) - m_\theta^*(x) = \hat m_\theta^{*,B}(x) + \hat m_\theta^{*,C}(x) + \hat m_\theta^{*,D}(x) \tag{B.6}$$
$$(\hat H_\theta - H_\theta)\,m_\theta(x) = \hat m_\theta^{*,E}(x) + \hat m_\theta^{*,F}(x) + \hat m_\theta^{*,G}(x) \tag{B.7}$$
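where $\hat m_\theta^{*,B}(x)$ and $\hat m_\theta^{*,E}(x)$ are deterministic and $O(T^{-2/5})$,
$$\hat m_\theta^{*,B}(x) = \frac{h^2}{2}\,\mu_2(K)\, m_\theta^{*\prime\prime}(x)$$
$$\hat m_\theta^{*,E}(x) = \frac{h^2}{2}\,\mu_2(K) \sum_{j=\pm 1}^{\pm(\tau-1)} B_j^+(\theta) \left\{ E\big(m_\theta(x_{t+j/M}) \mid x_t = x\big)\,\frac{f_0''(x)}{f_0(x)} - \int [\nabla^2 f_{0,j}(x,y)]\,\frac{m_\theta(y)}{f_0(x)}\,dy \right\}$$
while: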
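$$\hat m_\theta^{*,C}(x) = \frac{1}{T f_0(x)} \sum_{j=1}^{\tau}\sum_{t} B_j^*(\theta)\, K_h(x_{t-1-(j-1)/M} - x)\,\big(y_t - E(y_t \mid x_{t-1-(j-1)/M})\big)$$
$$= \frac{1}{T f_0(x)} \sum_{j=1}^{\tau}\sum_{s} B_j^*(\theta)\, K_h(x_{s/M} - x)\,\eta_{s,j}$$
$$= \frac{1}{M T f_0(x)} \sum_{s} K_h(x_{s/M} - x)\,\eta_{s,\theta}$$
$$\hat m_\theta^{*,F}(x) = -\frac{1}{M T f_0(x)} \sum_{j=\pm 1}^{\pm(\tau-1)}\sum_{s} B_j^+(\theta)\, K_h(x_{s/M} - x)\,\big(m(x_{(s-j)/M}) - E(m(x_{(s-j)/M}) \mid x_{s/M})\big)$$
$$= -\frac{1}{M T f_0(x)} \sum_{j=\pm 1}^{\pm(\tau-1)}\sum_{s} B_j^+(\theta)\, K_h(x_{s/M} - x)\,\zeta_{s,j}$$
$$= \frac{1}{M T f_0(x)} \sum_{s} K_h(x_{s/M} - x)\,\zeta_{s,\theta}$$
and the remainder terms $\hat m_\theta^{*,D}(x)$ and $\hat m_\theta^{*,G}(x)$ satisfy
$$\sup_{\theta \in \Theta}\ \sup_{x \in [\underline{x}, \overline{x}]} |\hat m_\theta^{*,J}(x)| = o_p(T^{-2/5}), \quad J = D, G$$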
From this one obtains an expansion
$$\hat m_\theta(x) - m_\theta(x) = \hat m_\theta^{B}(x) + \hat m_\theta^{*,C}(x) + \hat m_\theta^{*,F}(x) + \hat m_\theta^{E}(x) + o_p(T^{-2/5}) \tag{B.8}$$
where $\hat m_\theta^{B}(x) = (I - H_\theta)^{-1}\hat m_\theta^{*,B}(x)$ and $\hat m_\theta^{E}(x) = (I - H_\theta)^{-1}\hat m_\theta^{*,E}(x)$, and the error is $o_p(T^{-2/5})$ over x and
θ ∈ Θ. From this expansion we obtain the main result. Specifically, $\hat m_\theta^{*,C}(x) + \hat m_\theta^{*,F}(x)$ is asymptotically
normal with zero mean and the stated variance after applying a CLT for near epoch dependent functions
of mixing processes. The asymptotic bias comes from $\hat m_\theta^{B}(x) + \hat m_\theta^{E}(x)$. Note that because of the boundary
modification to the kernel we have $E \hat f_0(x) = f_0(x) + O(h^2)$ and $E \hat f_{0,j}(x, y) = f_{0,j}(x, y) + O(h^2)$ for all x, y.
Our proof below makes use of the following results. For $\delta_T = T^{-3/10+\zeta}$ with ζ > 0 small enough,
$$\max_{1 \leq |j| \leq \tau-1}\ \sup_{x, y \in [\underline{x}, \overline{x}]} |\hat f_{0,j}(x, y) - f_{0,j}(x, y)| = o_p(\delta_T) \tag{B.9}$$
$$\sup_{x \in [\underline{x}, \overline{x}]} |\hat f_0(x) - f_0(x)| = o_p(\delta_T) \tag{B.10}$$
This follows by the exponential inequality of Bosq (1998, Theorem 1.3); see p. 817 of Linton and Mammen
(2005).
PROOF OF (B.6). For each j,
$$\hat g_j(x) - g_j(x) = \frac{1}{T f_0(x)} \sum_{t} K_h(x_{t-1-(j-1)/M} - x)\,\tilde\eta_{t,j} + \frac{h^2}{2}\,\mu_2(K)\, b_j(x) + R_{Tj}(x)$$
where $\tilde\eta_{t,j} = y_t - E(y_t \mid x_{t-1-(j-1)/M}) = \eta_{(t-1)M-(j-1),\,j}$ as defined in (A.1); $b_j(x)$ is the bias function and
$R_{Tj}(x)$ is the remainder term, which is $o_p(T^{-2/5})$ uniformly over $j \leq \tau$ and $x \in [\underline{x}, \overline{x}]$. See p. 818 of Linton
and Mammen (2005) for details. Therefore,
$$\hat m_\theta^*(x) - m_\theta^*(x) = \sum_{j=1}^{\tau} B_j^*(\theta)\,[\hat g_j(x) - g_j(x)]$$
$$= \frac{1}{T f_0(x)} \sum_{j=1}^{\tau}\sum_{t} B_j^*(\theta)\, K_h(x_{t-1-(j-1)/M} - x)\,\tilde\eta_{t,j} + \frac{h^2}{2}\,\mu_2(K) \sum_{j=1}^{\tau} B_j^*(\theta)\, b_j(x) + o_p(T^{-2/5})$$
uniformly over $x \in [\underline{x}, \overline{x}]$. Then (B.6) follows.
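PROOF OF (B.7). We have
$$(\hat H_\theta - H_\theta)\,m_\theta(x) = \int \hat H_\theta(x, y)\, m_\theta(y)\, \hat f_0(y)\,dy - \int H_\theta(x, y)\, m_\theta(y)\, f_0(y)\,dy$$
$$= -\sum_{j=\pm 1}^{\pm(\tau-1)} B_j^+(\theta) \int \left[\frac{\hat f_{0,j}(x, y)}{\hat f_0(x)} - \frac{f_{0,j}(x, y)}{f_0(x)}\right] m_\theta(y)\,dy$$
Denote by
$$\int \frac{f_{0,j}(x, y)}{f_0(x)}\, m_\theta(y)\,dy = E\big[m(x_{(s-j)/M}) \mid x_{s/M} = x\big] \equiv r_j(x)$$
Then write
$$\int \frac{\hat f_{0,j}(x, y)}{\hat f_0(x)}\, m_\theta(y)\,dy = \frac{\int \hat f_{0,j}(x, y)\, m_\theta(y)\,dy}{\hat f_0(x)} = \frac{\frac{1}{MT}\sum_{s} K_h(x_{s/M} - x)\, m^*_{s-j}}{\frac{1}{MT}\sum_{s} K_h(x_{s/M} - x)} \tag{B.11}$$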
where
m∗s =
=
Z
Z
Kh (y − xs/M )mθ (y)dy
(B.12)
Kh (y − xs/M )(mθ (y) − mθ (xs/M ))dy + mθ (xs/M )
Z
0
= mθ (xs/M ) + mθ (xs/M ) Kh (y − xs/M )(y − xs/M )dy
Z
1
Kh (y − xs/M )(y − xs/M )2 m00θ (x∗s/M (y))dy
+
2
h2
= mθ (xs/M ) + µ2 (K)m00θ (xs/M ) + o(h2 )
2
by a second order Taylor expansion, a change of variables and property B7 of the kernels. The error is
uniformly o(h2 ) over s, θ. Note that (B.11) is just like a local constant smoother of m∗s−j on xs/M and can
be analyzed in the same way.
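The expansion of $m^*_s$ above can be checked numerically. The following is a minimal sketch, assuming an Epanechnikov kernel (for which $\mu_2(K) = 1/5$) and a hypothetical smooth function $m = \sin$; neither choice comes from the paper, and the helper names are illustrative only. It compares the kernel-smoothed value with $m(x_0) + \frac{h^2}{2}\mu_2(K)m''(x_0)$ for several bandwidths.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: a symmetric density on [-1, 1] with mu_2(K) = 1/5."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smoothed_value(m, x0, h, n_grid=20001):
    """Approximate int K_h(y - x0) m(y) dy by a Riemann sum on [x0 - h, x0 + h]."""
    y = np.linspace(x0 - h, x0 + h, n_grid)
    dy = y[1] - y[0]
    return np.sum(epanechnikov((y - x0) / h) / h * m(y)) * dy

def second_order_approx(m, m2, x0, h, mu2=0.2):
    """Second-order approximation m(x0) + (h^2 / 2) mu_2(K) m''(x0)."""
    return m(x0) + 0.5 * h**2 * mu2 * m2(x0)

m = np.sin                      # hypothetical regression function (assumption)
m2 = lambda x: -np.sin(x)       # its second derivative
x0 = 0.7

for h in (0.2, 0.1, 0.05):
    gap = smoothed_value(m, x0, h) - second_order_approx(m, m2, x0, h)
    # The gap should shrink faster than h^2, in line with the o(h^2) remainder.
    print(f"h={h:.2f}  gap={gap:.3e}  gap/h^2={gap / h**2:.3e}")
```

For a four-times differentiable $m$ the gap is in fact of order $h^4$, so the ratio `gap / h**2` falls as the bandwidth shrinks.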
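$$\begin{aligned}
\int \left[\frac{\hat f_{0,j}(x,y)}{\hat f_0(x)} - \frac{f_{0,j}(x,y)}{f_0(x)}\right] m_\theta(y)\,dy
&= \frac{\int \hat f_{0,j}(x,y) m_\theta(y)\,dy}{\hat f_0(x)} - r_j(x) \\
&= \frac{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)\left(m^*_{s-j} - r_j(x)\right)}{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)} \\
&= \frac{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)\left(m^*_{s-j} - m_\theta(x_{(s-j)/M})\right)}{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)} \\
&\quad + \frac{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)\left(m_\theta(x_{(s-j)/M}) - r_j(x_{s/M})\right)}{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)} \\
&\quad + \frac{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)\left(r_j(x_{s/M}) - r_j(x)\right)}{\frac{1}{MT}\sum_s K_h(x_{s/M} - x)} \\
&\simeq \frac{h^2}{2}\mu_2(K)\, E\left[m''_\theta(x_{(s-j)/M}) \mid x_{s/M} = x\right] \\
&\quad + \frac{1}{MT f_0(x)}\sum_s K_h(x_{s/M} - x)\,\zeta_{s,j} \\
&\quad + \frac{h^2}{2}\mu_2(K)\left[r''_j(x) + \frac{2 r'_j(x) f'_0(x)}{f_0(x)}\right]
\end{aligned} \tag{B.13}$$
by standard arguments for the Nadaraya-Watson smoother. The approximation is valid uniformly over $|j| \le \tau - 1$, $x \in [\underline{x},\overline{x}]$ and $\theta \in \Theta$.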
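The bias terms in (B.13) are
$$\frac{h^2}{2}\mu_2(K)\left[r''_j(x) + \frac{2 r'_j(x) f'_0(x)}{f_0(x)} + E\left[m''_\theta(x_{(s-j)/M}) \mid x_{s/M} = x\right]\right]$$
which can be rearranged as
$$\frac{h^2}{2}\mu_2(K)\left[-\frac{f''_0(x)}{f_0(x)}\, r_j(x) + \frac{1}{f_0(x)}\int\left(\frac{\partial^2 f_{0,j}(x,y)}{\partial x^2} + \frac{\partial^2 f_{0,j}(x,y)}{\partial y^2}\right) m_\theta(y)\,dy\right]$$
Refer to p. 23 of Linton and Mammen (2006) for details. In conclusion, we have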
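$$\begin{aligned}
&\int \hat H_\theta(x,y) m_\theta(y) \hat f_0(y)\,dy - \int H_\theta(x,y) m_\theta(y) f_0(y)\,dy \\
&\quad = -\sum_{j=\pm 1}^{\pm(\tau-1)} B^+_j(\theta)\, \frac{1}{MT f_0(x)}\sum_s K_h(x_{s/M} - x)\,\zeta_{s,j} \\
&\qquad + \frac{h^2}{2}\mu_2(K) \sum_{j=\pm 1}^{\pm(\tau-1)} B^+_j(\theta)\left[\frac{f''_0(x)}{f_0(x)}\, r_j(x) - \frac{1}{f_0(x)}\int\left(\frac{\partial^2 f_{0,j}(x,y)}{\partial x^2} + \frac{\partial^2 f_{0,j}(x,y)}{\partial y^2}\right) m_\theta(y)\,dy\right] \\
&\qquad + o_p(T^{-2/5})
\end{aligned}$$
uniformly over $x \in [\underline{x},\overline{x}]$ and $\theta \in \Theta$. This concludes the proof of (B.7).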
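The rearrangement of the bias terms in (B.13) used in this proof can be spelled out as follows. This is a sketch under assumptions not stated in this excerpt: that differentiation under the integral sign is permitted, and that the boundary terms from integrating by parts vanish.

```latex
% Step 1: r_j(x) f_0(x) = \int f_{0,j}(x,y) m_\theta(y) dy; differentiate twice in x.
r_j''(x) f_0(x) + 2 r_j'(x) f_0'(x) + r_j(x) f_0''(x)
  = \int \frac{\partial^2 f_{0,j}(x,y)}{\partial x^2}\, m_\theta(y)\,dy
\;\Longrightarrow\;
r_j''(x) + \frac{2 r_j'(x) f_0'(x)}{f_0(x)}
  = \frac{1}{f_0(x)} \int \frac{\partial^2 f_{0,j}(x,y)}{\partial x^2}\, m_\theta(y)\,dy
    - \frac{f_0''(x)}{f_0(x)}\, r_j(x).

% Step 2: integrate by parts twice in y (boundary terms assumed to vanish).
E\!\left[ m_\theta''(x_{(s-j)/M}) \mid x_{s/M} = x \right]
  = \frac{1}{f_0(x)} \int f_{0,j}(x,y)\, m_\theta''(y)\,dy
  = \frac{1}{f_0(x)} \int \frac{\partial^2 f_{0,j}(x,y)}{\partial y^2}\, m_\theta(y)\,dy.

% Adding the two steps gives the rearranged bias bracket:
-\frac{f_0''(x)}{f_0(x)}\, r_j(x)
  + \frac{1}{f_0(x)} \int \left( \frac{\partial^2 f_{0,j}(x,y)}{\partial x^2}
  + \frac{\partial^2 f_{0,j}(x,y)}{\partial y^2} \right) m_\theta(y)\,dy.
```

Multiplying the last line by $-B^+_j(\theta)$ and summing over $j$ yields the sign pattern of the bias term in the final expansion of (B.7).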
The consistency and root-n consistency of $\hat\theta$ are established in the same way as in Linton and Mammen (2005, 2006), so we omit the proofs here.
We can now effectively take $\theta = \theta_0$, and one obtains a simpler expansion for $\hat m_{\theta_0}(x) - m(x)$. We omit $\theta_0$ in $B_j(\theta_0)$ to simplify notation. In particular:
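$$\begin{aligned}
\hat m^{*,C}_\theta(x) &= \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t B^*_j K_h(x_{t-1-(j-1)/M} - x)\left(y_t - E(y_t \mid x_{t-1-(j-1)/M})\right) \\
&= \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t K_h(x_{t-1-(j-1)/M} - x) B^*_j \varepsilon_t \\
&\quad + \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t K_h(x_{t-1-(j-1)/M} - x) B^*_j \sum_{k=1,k\ne j}^{\tau} B_k\, \zeta_{(t-1)M-j+1,\,j-k} \\
&= \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t B^*_j K_h(x_{t-1-(j-1)/M} - x)\,\varepsilon_t \\
&\quad + \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau}\sum_t K_h(x_{t-1-(j-1)/M} - x) B^*_j B_k\, \zeta_{(t-1)M-j+1,\,j-k}
\end{aligned}$$
$$\begin{aligned}
\hat m^{*,F}_\theta(x) &= -\frac{1}{MT f_0(x)}\sum_{j=\pm 1}^{\pm(\tau-1)}\sum_s K_h(x_{s/M} - x) B^+_j \zeta_{s,j} \\
&= -\frac{1}{MT f_0(x)}\sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau}\sum_s K_h(x_{s/M} - x) B^*_j B_k\, \zeta_{s,j-k}
\end{aligned}$$
$$\begin{aligned}
\hat m^{*,C}_\theta(x) + \hat m^{*,F}_\theta(x) &= \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t B^*_j K_h(x_{t-1-(j-1)/M} - x)\,\varepsilon_t \\
&\quad + \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau}\sum_t K_h(x_{t-1-(j-1)/M} - x) B^*_j B_k\, \zeta_{(t-1)M-j+1,\,j-k} \\
&\quad - \frac{1}{MT f_0(x)}\sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau}\sum_s K_h(x_{s/M} - x) B^*_j B_k\, \zeta_{s,j-k} \\
&= \frac{1}{T f_0(x)}\sum_{j=1}^{\tau}\sum_t B^*_j K_h(x_{t-1-(j-1)/M} - x)\,\varepsilon_t + \sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau} B^*_j B_k\, A
\end{aligned}$$
where
$$\begin{aligned}
A &= \frac{1}{T f_0(x)}\sum_t K_h(x_{t-1-(j-1)/M} - x)\,\zeta_{(t-1)M-j+1,\,j-k} - \frac{1}{MT f_0(x)}\sum_s K_h(x_{s/M} - x)\,\zeta_{s,j-k} \\
&= \frac{M}{MT f_0(x)}\sum_t K_h(x_{t-1-(j-1)/M} - x)\,\zeta_{(t-1)M-j+1,\,j-k} \\
&\quad - \frac{1}{MT f_0(x)}\sum_{i=0}^{M}\sum_t K_h(x_{t-1-(j-1-i)/M} - x)\,\zeta_{(t-1)M-j+1+i,\,j-k} \\
&= \frac{M-1}{MT f_0(x)}\sum_t K_h(x_{t-1-(j-1)/M} - x)\,\zeta_{(t-1)M-j+1,\,j-k} \\
&\quad - \frac{1}{MT f_0(x)}\sum_{i=1}^{M}\sum_t K_h(x_{t-1-(j-1-i)/M} - x)\,\zeta_{(t-1)M-j+1+i,\,j-k}
\end{aligned}$$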
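Since
$$\sqrt{Th}\left((T f_0(x))^{-1}\sum_t K_h(x_{t-1-(j-1-i)/M} - x)\,\zeta_{(t-1)M-j+1+i,\,j-k}\right) \Longrightarrow N\left(0,\ \|K\|_2^2\, f_0(x)^{-1}\, \mathrm{var}(m(x_{t+(j-k)/M}) \mid x_t = x)\right), \quad \forall i, j, k$$
and, if the $Z_i \sim N(0, \sigma_i^2)$ are independent, $\sum_i Z_i \sim N(0, \sum_i \sigma_i^2)$, we have
$$\sqrt{Th}\,A \Longrightarrow N\left(0,\ [(M-1)^2 + (M-1)]\,M^{-2}\, \|K\|_2^2\, f_0(x)^{-1}\, \mathrm{var}(m(x_{t+(j-k)/M}) \mid x_t = x)\right)$$
Therefore,
$$\sqrt{Th}\left(\hat m^{*,C}_\theta(x) + \hat m^{*,F}_\theta(x)\right) \Longrightarrow N(0, \omega(x))$$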
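where
$$\omega(x) = \frac{\|K\|_2^2 \sum_{j=1}^{\tau} B_j^2\, E\left[\varepsilon_t^2 \mid x_{t-1-(j-1)/M} = x\right]}{f_0(x)\left(\sum_{j=1}^{\tau} B_j^2\right)^2} + \frac{M-1}{M}\, \frac{\|K\|_2^2 \sum_{j=1}^{\tau}\sum_{k=1,k\ne j}^{\tau} B_j^2 B_k^2\, \mathrm{var}(m(x_{t+(j-k)/M}) \mid x_t = x)}{f_0(x)\left(\sum_{j=1}^{\tau} B_j^2\right)^2}$$
Likewise, there is a simplification for the bias term $\hat m^{B}_\theta(x) + \hat m^{E}_\theta(x)$. If $\varepsilon_t$ is i.i.d. and independent of the process $\{x_{s/M}\}$, the first term of $\omega(x)$ simplifies to $\|K\|_2^2\, \sigma_\varepsilon^2 / \left(f_0(x) \sum_{j=1}^{\tau} B_j(\theta_0)^2\right)$, where $\sigma_\varepsilon^2$ is the variance of $\varepsilon_t$. When $M = 1$, the result is the same as that in Linton and Mammen (2006). If $\hat H^{mod}_\theta(x, y)$ is used, $b(x)$ simplifies to $\mu_2(K) m''(x)/2$.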
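The factor multiplying the second term of $\omega(x)$ follows from the variance of $A$ by simple arithmetic, shown below as a worked step. The numerical illustration uses $M = 79$, the value listed in Table 1 for the Dow Jones cash market; it is only an example.

```latex
\frac{(M-1)^2 + (M-1)}{M^2} = \frac{(M-1)\,\bigl[(M-1) + 1\bigr]}{M^2}
  = \frac{(M-1)\,M}{M^2} = \frac{M-1}{M},
\qquad
M = 1 \;\Rightarrow\; \frac{M-1}{M} = 0,
\qquad
M = 79 \;\Rightarrow\; \frac{78}{79} \approx 0.987 .
```

With $M = 1$ the second term vanishes, which is consistent with the statement that the result then coincides with Linton and Mammen (2006); for intra-daily sampling the factor is close to one.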
C Asymptotic Properties of the Seasonality Model
Note that most of the proof in Appendix B does not require $\{x_{s/M}\}$ to be stationary; it only requires $\{y_t, x_{t-1+1/M}, \ldots, x_{t-1+M/M}\}$ to be stationary. By replacing $f_0(\cdot)$ with the appropriate $f_i(\cdot)$, $f_{0,j}(\cdot,\cdot)$ with the appropriate $f_{i/M+n,\,k/M+l}(\cdot,\cdot)$, and $B_j$ with $C_j$, and rearranging the equations, we obtain the asymptotic properties of the seasonality model (3.14) in exactly the same form as in Theorem 3.1, albeit with different definitions of the bias and variance functions, which are described in the following paragraphs.
Define operator $H_\theta$ as follows:
$$H_\theta m = \int \sum_{i=1}^{M}\sum_{k=1}^{M} H_{i,k,\theta}(x,y)\, f_k(y)\, m(y)\,dy \tag{C.1}$$
If function $G$ is the solution to the integral equation $G = G^* + H_\theta G$, we can express $G$ as $(I - H_\theta)^{-1} G^*$. Then, for any given $\theta$, (1) the bias function is as follows:
$$b_\theta(x) = \frac{1}{2}\mu_2(K)\,(I - H_\theta)^{-1}\left[\beta^{*,1}_\theta(x) + \beta^{*,2}_\theta(x)\right] \tag{C.2}$$
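where
$$\beta^{*,1}_\theta(x) = m^{*\prime\prime}_\theta(x),$$
$$\beta^{*,2}_\theta(x) = \sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{n=1}^{\tau/M}\ \sum_{\substack{l=1 \\ l\ne n \text{ if } i=k}}^{\tau/M} \frac{C_{i+(n-1)M}(\theta)\, C_{k+(l-1)M}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)}\left[\alpha^1_\theta(x) + \alpha^2_\theta(x)\right]$$
$$\alpha^1_\theta(x) = E\left(m_\theta(x_{l+k/M}) \mid x_{n+i/M} = x\right) \frac{f''_i(x)}{f_i(x)}$$
$$\alpha^2_\theta(x) = -\int \left[\nabla^2 f_{n+i/M,\,l+k/M}(x,y)\right] \frac{m_\theta(y)}{f_i(x)}\,dy$$
$$\nabla^2 \equiv \partial^2/\partial x^2 + \partial^2/\partial y^2$$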
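(2) the variance function is as follows:
$$\omega_\theta(x) = \sum_{i=1}^{M} \frac{\|K\|_2^2}{f_i(x)}\, \mathrm{var}\left[\eta_{\theta,i} + \zeta_{\theta,i}\right] \tag{C.3}$$
where
$$\eta_{\theta,i} = \sum_{n=1}^{\tau/M} \frac{C_{(n-1)M+i}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)}\left(y_t - E(y_t \mid x_{t-1-n+i/M} = x)\right)$$
$$\zeta_{\theta,i} = \sum_{k=1}^{M}\sum_{n=1}^{\tau/M}\ \sum_{\substack{l=1 \\ l\ne n \text{ if } i=k}}^{\tau/M} \frac{C_{i+(n-1)M}(\theta)\, C_{k+(l-1)M}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)}\left(m_\theta(x_{l+k/M}) - E(m_\theta(x_{l+k/M}) \mid x_{n+i/M} = x)\right)$$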
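When the estimator of $\theta$ converges to the true value, the bias and variance functions are:
$$b(x) = \frac{1}{2}\mu_2(K)\left\{m''(x) + (I - H)^{-1}\left[\sum_{i=1}^{M} \frac{f'_i}{f_i}\, \frac{\partial}{\partial x}(H_i m)\right](x)\right\} \tag{C.4}$$
$$\omega(x) = \|K\|_2^2 \sum_{i=1}^{M}\sum_{n=1}^{\tau/M} \frac{C^2_{(n-1)M+i}(\theta)\, E\left[\varepsilon_t^2 \mid x_{t-1-n+i/M} = x\right]}{f_i(x)\left(\sum_{p=1}^{\tau} C_p^2(\theta)\right)^2} \tag{C.5}$$
where
$$H_i m = \sum_{k=1}^{M}\sum_{n=1}^{\tau/M}\ \sum_{\substack{l=1 \\ l\ne n \text{ if } i=k}}^{\tau/M} \frac{C_{i+(n-1)M}(\theta)\, C_{k+(l-1)M}(\theta)}{\sum_{p=1}^{\tau} C_p^2(\theta)} \int \frac{f_{i+(n-1)M,\,i+(l-1)M}(x,y)}{f_i(x)}\, m(y)\,dy$$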
The asymptotic properties of the estimator of $\theta$ are exactly the same as those in Theorem 3.1, so they are omitted here.
D Details of semi-parametric MIDAS regression estimator
There are many suitable estimators of the regression functions and density functions in our estimator; we
shall use local linear regression estimators for m∗ and a fairly standard density estimator for H but other
choices are possible. It is the purpose of this subsection to provide the details of the estimator.
For any sequence $\{y_t\}$ and any lag $j$, $j = 1, \ldots, \tau$, define the estimator $\hat g_j(x) = \hat c_0$, where $(\hat c_0, \hat c_1)$ are the minimizers of the weighted sum of squares criterion
$$\sum_{t=\tau/M+1}^{T} \left\{y_t - c_0 - c_1(x_{t-1-(j-1)/M} - x)\right\}^2 K_h(x_{t-1-(j-1)/M} - x)$$
with respect to $(c_0, c_1)$, where $K$ is a symmetric probability density function, $h$ is a positive bandwidth, and $K_h(\cdot) = K(\cdot/h)/h$. Further define
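$$\hat f_{0,i}(y, x) = \frac{1}{MT - 2\tau}\sum_{s=\tau+1}^{MT-\tau} K_h(x_{s/M} - y)\, K_h(x_{(s-i)/M} - x), \quad i = \pm 1, \ldots, \pm(\tau - 1)$$
$$\hat f_0(x) = \frac{1}{MT}\sum_{s=1}^{MT} K_h(x_{s/M} - x)$$
$$\hat m^*_\theta(x) = \sum_{j=1}^{\tau} B^*_j(\theta)\, \hat g_j(x)$$
$$\hat H_\theta(x, y) = -\sum_{i=\pm 1}^{\pm(\tau-1)} B^+_i(\theta)\, \frac{\hat f_{0,i}(y, x)}{\hat f_0(y)\, \hat f_0(x)}$$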
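The building blocks defined above (the local linear fit $\hat g_j$, the marginal density $\hat f_0$ and the joint density $\hat f_{0,i}$) can be sketched in a few lines of NumPy. This is an illustration only: the Gaussian kernel, the function names and the simulated data are assumptions, and the boundary-modified kernel mentioned in Appendix B is not implemented.

```python
import numpy as np

def kernel_h(u, h):
    """K_h(u) = K(u / h) / h with a Gaussian K (an assumed kernel choice)."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def local_linear(x_lag, y, x0, h):
    """c0_hat from minimizing sum_t {y_t - c0 - c1 (x_t - x0)}^2 K_h(x_t - x0)."""
    d = x_lag - x0
    w = kernel_h(d, h)
    design = np.column_stack([np.ones_like(d), d])
    xtw = design.T * w                      # weighted design, shape (2, T)
    c0, _c1 = np.linalg.solve(xtw @ design, xtw @ y)
    return c0

def density(x_all, x0, h):
    """f0_hat(x0): average of K_h(x_{s/M} - x0) over all MT observations."""
    return np.mean(kernel_h(x_all - x0, h))

def joint_density(x_all, i, y0, x0, h, tau):
    """f0i_hat(y0, x0): sum over s = tau+1..MT-tau, divided by MT - 2 tau."""
    n = len(x_all)                          # n plays the role of M * T
    s = np.arange(tau, n - tau)             # 0-based positions of s = tau+1..MT-tau
    prod = kernel_h(x_all[s] - y0, h) * kernel_h(x_all[s - i] - x0, h)
    return np.sum(prod) / (n - 2 * tau)

# Hypothetical usage on simulated data (not the paper's data).
rng = np.random.default_rng(0)
x_sim = rng.normal(scale=0.1, size=2000)
y_sim = 1.0 + 5.0 * x_sim**2 + rng.normal(scale=0.05, size=2000)
print(local_linear(x_sim, y_sim, 0.0, 0.05))
print(density(x_sim, 0.0, 0.05), joint_density(x_sim, 1, 0.0, 0.0, 0.05, tau=5))
```

In the paper's setting `x_lag` would hold the lagged regressor $x_{t-1-(j-1)/M}$ aligned with $y_t$; here a single simulated series stands in for both.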
Then define $\hat m_\theta$ as any solution to the equation
$$m = \hat m^*_\theta + \hat H_\theta m \tag{D.1}$$
in $L_2(\hat f_0)$. In practice, the equation can be solved as follows. Let $\{\alpha_{j,n},\ j = 1, \ldots, n\}$ be some equally spaced grid of points in $[0, 1]$, and let $q_{j,n} = \hat F_0^{-1}(\alpha_{j,n})$ be the empirical $\alpha_{j,n}$ quantiles of $x_{s/M}$. Now approximate (D.1) by
$$\hat m(q_{i,n}) = \hat m^*_\theta(q_{i,n}) + \sum_{j=1}^{n} \hat H_\theta(q_{i,n}, q_{j,n})\, \hat m(q_{j,n}), \quad i = 1, \ldots, n \tag{D.2}$$
The linear system (D.2) can be written in matrix notation
$$(I_n - \hat H_\theta)\, \hat m_\theta = \hat m^*_\theta$$
where $I_n$ is the $n \times n$ identity, $\hat m_\theta = (\hat m(q_{1,n}), \ldots, \hat m(q_{n,n}))^T$ and $\hat m^*_\theta = (\hat m^*_\theta(q_{1,n}), \ldots, \hat m^*_\theta(q_{n,n}))^T$, while
$$\hat H_\theta = \left[-\sum_{l=\pm 1}^{\pm(\tau-1)} B^+_l(\theta)\, \frac{\hat f_{0,l}(q_{i,n}, q_{j,n})}{\hat f_0(q_{i,n})\, \hat f_0(q_{j,n})}\right]_{i,j=1}^{n}$$
is an $n \times n$ matrix. When $n$ is not too big, e.g., $n < 2000$, we can find the solution values $\hat m_\theta = (I_n - \hat H_\theta)^{-1} \hat m^*_\theta$; otherwise, iterative methods are indispensable (see Linton and Mammen (2005) for more details).
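Both routes to the solution of $(I_n - \hat H_\theta)\hat m_\theta = \hat m^*_\theta$ can be sketched as follows. The direct solve uses a standard linear solver; the iterative variant shown is plain successive approximation, which is only one possible scheme and is an assumption here, not necessarily the method of Linton and Mammen (2005). The matrix `H` and vector `m_star` in the demonstration are synthetic stand-ins.

```python
import numpy as np

def solve_direct(H, m_star):
    """Solve (I_n - H) m = m_star with a dense linear solver."""
    n = H.shape[0]
    return np.linalg.solve(np.eye(n) - H, m_star)

def solve_iterative(H, m_star, tol=1e-10, max_iter=10_000):
    """Successive approximation m <- m_star + H m.

    Converges when the spectral radius of H is below one; stops once the
    largest change between iterates falls below tol.
    """
    m = m_star.copy()
    for _ in range(max_iter):
        m_new = m_star + H @ m
        if np.max(np.abs(m_new - m)) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")

# Synthetic demonstration: a random matrix rescaled to spectral norm 0.5.
rng = np.random.default_rng(1)
H = rng.uniform(-1.0, 1.0, size=(50, 50))
H *= 0.5 / np.linalg.norm(H, 2)
m_star = rng.normal(size=50)

m_direct = solve_direct(H, m_star)
m_iter = solve_iterative(H, m_star)
print(np.max(np.abs(m_direct - m_iter)))
```

The iterative version only needs matrix-vector products, which is why schemes of this kind become attractive when $n$ is large.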
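Let $\hat\theta = \arg\min_{\theta\in\Theta} \hat S_T(\theta)$, where
$$\hat S_T(\theta) = \frac{1}{T - \tau/M}\sum_{t=\tau/M+1}^{T}\left\{y_t - \sum_{j=1}^{\tau} B_j(\theta)\, \hat m_\theta(x_{t-1-(j-1)/M})\right\}^2$$
Finally, let $\hat m(x) = \hat m_{\hat\theta}(x)$.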
One can also replace $\hat H_\theta(x, y)$ by a local linearized version, as on p. 789 of Linton and Mammen (2005), as follows:
$$\hat f^{mod}_{0,i}(y, x) = \hat f_{0,i}(y, x) + \frac{\hat f'_0(x)}{\hat f_0(x)}\, \frac{1}{MT - 2\tau}\sum_{s=\tau+1}^{MT-\tau} (x_{s/M} - y)\, K_h(x_{s/M} - y)\, K_h(x_{(s-i)/M} - x)$$
$$\hat f^{mod}_0(x) = \hat f_0(x) + \frac{\hat f'_0(x)}{\hat f_0(x)}\, \frac{1}{MT}\sum_{s=1}^{MT} (x_{s/M} - x)\, K_h(x_{s/M} - x)$$
$$\hat H^{mod}_\theta(x, y) = -\sum_{i=\pm 1}^{\pm(\tau-1)} B^+_i(\theta)\, \frac{\hat f^{mod}_{0,i}(y, x)}{\hat f^{mod}_0(y)\, \hat f_0(x)}$$
where $\hat f'_0(x)$ can be replaced by $(MT h^2 \mu_2(K))^{-1}\sum_{s=1}^{MT} (x_{s/M} - x)\, K_h(x_{s/M} - x)$. Alternatively, one can replace the standard kernel density estimators by other suitable density estimators such as the Jones, Linton, and Nielsen (1995) procedure.
Table 1: Details of the Data Series and Model Acronyms
The top part of the table provides the details of the data used in our study. We analyze four series, which consist of intra-day returns of the Dow Jones and S&P 500 cash and futures markets. The lower part summarizes all models, showing the equation numbers, the models' acronyms and some details. The generic specification appears in equation (5.1), namely: $RV_t = \sum_{j=1}^{\tau}\sum_{i=1}^{M} \psi_{ij}(\theta)\, NIC(r_{t-j-(i-1)/M}) + \varepsilon_t$, where $\sum_{j=1}^{\tau}\sum_{i=1}^{M} \psi_{ij} = 1$ and $NIC(r)$ is the news impact curve used.

Series               Period                     Days   Trading Hours   M
Full Sample
Dow Jones Cash       4/1/1993 to 10/31/2003     2669   9:30-16:05      79
Dow Jones Futures    10/6/1997 to 10/31/2003    1529   7:25-15:20      96
S&P 500 Cash         4/1/1993 to 10/31/2003     2550   8:35-15:00      78
S&P 500 Futures      10/1/1997 to 10/31/2003    1531   8:35-15:30      84
Out-of-sample
Dow Jones Cash       4/1/2001 to 10/31/2003     649    9:30-16:05      79
Dow Jones Futures    11/1/2001 to 10/31/2003    504    7:25-15:20      96
S&P 500 Cash         1/2/2002 to 10/31/2003     456    8:35-15:00      78
S&P 500 Futures      1/2/2002 to 10/31/2003     463    8:35-15:30      84

News Impact                                                        Acronym    Explanation
Intra-daily returns - Parametric
$(a + b r^2)$                                                      SYMM       Symmetric
$(a + b r^2 + c 1_{r<0} r^2)$                                      ASYMGJR    Asymmetric GJR
$(a + b(r - c)^2)$                                                 ASYMLS     Asymmetric Location Shift
$(a + b 1_{r-d<0}(r - d)^2 + c 1_{r-d\ge 0}(r - d)^2)$             ASYMC1     1st Asymmetric Combination
$(a + b 1_{r<0}(r - d - e)^2 + c 1_{r\ge 0}(r - d)^2)$             ASYMC2     2nd Asymmetric Combination
$(a + b|r|)$                                                       ABS        Absolute Value
Intra-daily returns - Semi-parametric
Eq. (2.3)                                                          SP         Semi-parametric
Eq. (2.4)                                                          SP-SA      Semi-parametric with Seas. Adj. Returns
Models with daily volatility
Eq. (5.2)                                                          RV         RV
Eq. (5.3)                                                          RAV        RAV
Eq. (5.4)                                                          BPVJ       BPV and Jumps
Table 2: One-day ahead in-sample fit and out-of-sample forecast performance of models
The top panel of the table shows the $R^2$ of in-sample estimation for each model; the acronyms are defined in the lower panel of Table 1. The lower panel provides out-of-sample forecasting results, with the out-of-sample period specified in the top panel of Table 1. Regressors based on 5-, 10- and 30-minute returns are considered. In-sample results (lines labeled 'In') and out-of-sample results (lines labeled 'Out') are reported for each series and model. The best out-of-sample model appears boldfaced.
Dow Jones Cash
5min
10min
30min
Dow Jones Futures
5min
10min 30min
5min
S&P 500 Cash
10min
30min
S&P 500 Futures
5min
10min
30min
Full Sample Fit Performance
Semi-Parametric Intra-daily returns
SP
SP-SA
In
Out
In
Out
0.5710
0.5607
0.5585
0.6192
0.5490
0.4876
-
0.5422
0.4896
-
0.5297
0.6928
0.5092
0.6391
0.4926
0.6090
-
0.4755
0.5517
-
0.5848
0.7024
0.5844
0.6934
0.5725
0.5937
-
0.5443
0.5592
-
0.5551
0.6846
0.5106
0.5859
0.5152
0.5962
-
0.4840
0.5241
-
0.5392
0.6064
0.5419
0.5677
0.5835
0.6508
0.5910
0.6851
0.5248
0.5823
0.5257
0.5564
0.5813
0.6402
0.5848
0.6714
0.4744
0.5414
0.4898
0.5823
0.5176
0.6241
0.5290
0.6351
0.4806
0.5976
0.4963
0.5944
0.5262
0.5925
0.5373
0.6483
0.4670
0.5879
0.4657
0.5387
0.5226
0.6195
0.5305
0.6367
0.5176
0.5942
0.5353
0.5736
0.5265
0.6015
0.4920
0.5577
0.5148
0.5569
0.5265
0.5912
0.4687
0.6031
0.4850
0.5818
0.4740
0.6127
0.4671
0.6028
0.4872
0.5897
0.4773
0.6176
0.4490
0.5813
0.4541
0.5390
0.4609
0.5771
Parametric Intra-daily returns
SYMM
ABS
ASYMGJR
ASYMLS
In
Out
In
Out
In
Out
In
Out
0.5317
0.4823
0.5261
0.4548
0.5868
0.5224
0.5780
0.5331
0.5087
0.4675
0.5208
0.4655
0.5572
0.4819
0.5620
0.5251
0.5075
0.4602
0.5120
0.4483
0.5511
0.4889
0.5623
0.5254
0.4732
0.6096
0.4745
0.5915
0.5185
0.5998
0.5157
0.6483
0.4684
0.6226
0.4768
0.6101
0.5141
0.5547
0.5158
0.6560
0.4517
0.5269
0.4516
0.5390
0.4952
0.5475
0.5049
0.5800
0.5454
0.6021
0.5395
0.5607
0.6000
0.6786
0.5955
0.6890
Parametric Daily Volatility Measures
RV
RAV
BPVJ
In
Out
In
Out
In
Out
0.5143
0.5130
0.5250
0.4791
0.5238
0.5135
0.4882
0.4744
0.5163
0.4746
0.4939
0.4698
0.4829
0.4621
0.5064
0.4724
0.5044
0.4998
0.4551
0.5966
0.4672
0.5928
0.4511
0.5951
0.4462
0.5988
0.4636
0.5970
0.4446
0.5951
0.4305
0.5252
0.4415
0.5407
0.4167
0.4816
0.5284
0.6175
0.5358
0.5717
0.5389
0.6291
Table 3: One-week and one-month ahead in-sample fit and out-of-sample forecast performance of models
The top panel of the table shows the R2 of in-sample estimation for each model, acronyms appearing in lower panel of Table 1. The lower
panel provides out-of-sample forecasting, with the out-sample period specified in top panel of Table 1. The regressor in each model is 5-minute
returns. The best model, for each series, appears as boldfaced.
                 One-week Ahead                           One-month Ahead
                 DJ        DJ        S&P 500   S&P 500    DJ        DJ        S&P 500   S&P 500
                 Cash      Futures   Cash      Futures    Cash      Futures   Cash      Futures
Full sample estimation
SP               0.6004    0.5651    0.6704    0.5850     0.6058    0.4762    0.7126    0.5844
SYMM             0.6348    0.5619    0.6258    0.5272     0.5146    0.3898    0.7570    0.5483
ABS              0.6382    0.5705    0.6369    0.5407     0.6266    0.3967    0.6930    0.5157
ASYMGJR          0.6733    0.5683    0.6760    0.5530     0.5433    0.4977    0.7888    0.5166
ASYMLS           0.6624    0.5790    0.6742    0.5486     0.6254    0.5116    0.6179    0.5022
RV               0.6229    0.5527    0.6234    0.4707     0.4998    0.3566    0.6362    0.4413
PV               0.6372    0.5694    0.6373    0.5380     0.5876    0.3940    0.6909    0.5083
BPVJ             0.6314    0.5371    0.6099    0.4610     0.5193    0.3725    0.6610    0.4125
Out-of-sample performance
SP               0.5345    0.6404    0.6658    0.7628     0.5752    0.6292    0.4956    0.5450
SYMM             0.5700    0.6278    0.5383    -0.9259    0.5590    0.4564    0.3020    0.4016
ABS              0.5558    0.5955    0.5945    0.6855     0.5242    0.6154    0.4703    0.4116
ASYMGJR          0.6530    0.6268    0.6292    0.6807     0.4629    0.4593    0.1935    0.4718
ASYMLS           0.6457    0.6854    0.6436    0.6899     0.4441    0.4198    0.2814    0.4548
RV               0.5719    0.6527    0.5719    0.6951     0.4343    0.5579    0.1929    0.3573
PV               0.5543    0.6510    0.5854    0.6903     0.5194    0.5911    0.3806    0.4856
BPVJ             0.5682    0.6400    0.5485    0.7154     0.3997    0.5394    0.0661    0.3487
Figure 1: One-day ahead and one-week ahead news impact curves for SP models
(a) Dow Jones Cash Market; (b) Dow Jones Futures Market; (c) S&P 500 Cash Market; and (d) S&P 500
Futures Market
[Figure 1 image: four panels (a)-(d). Each panel plots four curves: Daily, Daily CI, Weekly and Weekly CI. The horizontal axis runs from -0.2 to 0.2.]
Figure 2: Parametric polynomial lag estimates of semi-parametric MIDAS
This figure shows the lag polynomials of the semi-parametric MIDAS regression using the S&P 500 futures data. The first plot provides the product of the daily and intra-daily Beta polynomials appearing in equation (2.5). The second contains only the daily polynomial, whereas the third contains only the intra-daily polynomial.
[Figure 2 image: three panels. Horizontal axes: Lags (0 to 20), Daily Lags (0 to 20) and Intradaily Lags (8:30 to 15:30).]
Figure 3: One-day ahead news impact curves for semi-parametric MIDAS and
two asymmetric parametric models
(a) Dow Jones Cash Market; (b) Dow Jones Futures Market; (c) S&P 500 Cash Market; and (d) S&P 500
Futures Market
[Figure 3 image: four panels (a)-(d). Each panel plots three curves labeled ASYMUC, ASYMS and SP. The horizontal axis runs from -0.2 to 0.2.]