Risk Measures & Risk Management for High-Frequency Data
EURANDOM, Eindhoven, March 6–8, 2006
Out of Sample Forecasts of Quadratic Variation
Yacine Aït-Sahalia
Princeton University
Loriano Mancini
University of Zurich
Introduction
High frequency data:
Observed log-price = latent efficient log-price + microstructure noise
Y = X + ε
where ε summarizes bid-ask bounces, price discreteness, differences in
trade sizes, gradual response of prices to a block trade, . . .
e.g., Roll (1984), Easley and O’Hara (1987)
Object of interest
Daily integrated variance
$$IV = \int_0^T \sigma_t^2 \, dt$$
of the latent price process
$$dX_t = \mu_t \, dt + \sigma_t \, dW_t$$
estimated using discretely sampled data on the observed price process Y
at times: $0 = t_0, t_1, \ldots, t_n = T$
Goal: forecast the integrated variance
Estimate of integrated variance
• realized volatility (RV)
e.g., Andersen, Bollerslev, Diebold and Labys (2001)
Barndorff-Nielsen and Shephard (2002)
• two scales realized volatility (TSRV)
Zhang, Mykland and Aït-Sahalia (2005)
Our objective
Investigate out-of-sample performance of RV and TSRV as estimators of
integrated variance
Does the better performance of TSRV extend to out-of-sample forecasts
when volatility has jumps, long memory, . . . ?
Outline
• RV and TSRV estimators
• Forecasts of integrated variance
– Monte Carlo experiments
– Empirical application
Realized volatility estimator
Let $H = \{t_0, t_h, t_{2h}, \ldots, t_n\}$
$$RV = \sum_{t_i \in H} \left( Y_{t_{i+h}} - Y_{t_i} \right)^2$$
Biased and inefficient estimator of integrated variance:
• Bias = $2 n_H E[\varepsilon^2]$
• Discards a large amount of data → inefficient estimator
Example: one NYSE day with a transaction every second: n = 23,400
RV at 5 minutes uses only $n_H = 78$ observations
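The sparse-sampling arithmetic above can be checked with a short sketch. The function names `realized_variance` and `noise_bias` are illustrative and not taken from the slides; the code assumes log-prices stored in a NumPy array observed once per second.

```python
import numpy as np

def realized_variance(log_prices, step):
    """Sum of squared log-price increments, keeping every `step`-th observation."""
    y = np.asarray(log_prices, dtype=float)[::step]
    return float(np.sum(np.diff(y) ** 2))

def noise_bias(n_h, noise_var):
    """Bias of sparse RV due to i.i.d. noise: 2 * n_H * E[eps^2]."""
    return 2.0 * n_h * noise_var

# One NYSE day at one observation per second: n = 23,400 returns.
n = 23_400
grid = np.arange(n + 1)
# Sampling every 300 seconds (5 minutes) leaves n_H returns.
n_h = len(grid[::300]) - 1
```

With these numbers `n_h` equals 78, so all but 78 of the 23,400 one-second returns are discarded by the 5-minute estimator.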
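RV asymptotic distribution
$$RV \overset{L}{\approx} \underbrace{\int_0^T \sigma_t^2 \, dt}_{\text{object of interest}} + \underbrace{2 n_H E[\varepsilon^2]}_{\text{bias due to noise}} + \Big[ \underbrace{ \underbrace{4 n_H E[\varepsilon^4]}_{\text{due to noise}} + \underbrace{\frac{2T}{n_H} \int_0^T \sigma_t^4 \, dt}_{\text{due to discretization}} }_{\text{total variance}} \Big]^{1/2} Z_{\text{total}}$$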
Two Scales Realized Volatility estimator
• Subsampling: Partition the original grid G into K subsamples, $G^{(k)}$, $k = 1, \ldots, K$, e.g.
  $G = \{ t_0, t_1, \ldots, t_K, t_{K+1}, \ldots, t_n \}$
  $G^{(1)} = \{ t_0, t_K, \ldots \}$
  $G^{(2)} = \{ t_1, t_{K+1}, \ldots \}$
  ...
  $G^{(K)} = \{ t_{K-1}, t_{2K-1}, \ldots \}$
  and compute $[Y, Y]^{(k)} = \sum_{t_i \in G^{(k)}} \left( Y_{t_{i+K}} - Y_{t_i} \right)^2$
• Averaging: $[Y, Y]^{(avg)} = \frac{1}{K} \sum_{k=1}^{K} [Y, Y]^{(k)}$
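• Bias-correction:
$$TSRV = \underbrace{[Y, Y]^{(avg)}}_{\text{slow time scale}} - \frac{\bar{n}}{n} \underbrace{[Y, Y]^{(all)}}_{\text{fast time scale}}$$
where $\bar{n} = (n - K + 1)/K$ is the average number of observations per subsample, as in Zhang, Mykland and Aït-Sahalia (2005).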
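The three steps (subsampling, averaging, bias-correction) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, the average subsample size is taken as (n − K + 1)/K following Zhang, Mykland and Aït-Sahalia (2005), and the demo path assumes constant volatility, a 252-day year and the noise level used later in the Monte Carlo setup.

```python
import numpy as np

def rv_all(y):
    """Fast time scale: realized variance from every observation."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(y) ** 2))

def rv_avg(y, K):
    """Slow time scale: average of the K subsampled realized variances.

    Summing squared K-lag increments over all start points and dividing
    by K is the same as averaging the K subgrid estimators.
    """
    y = np.asarray(y, dtype=float)
    lagged = y[K:] - y[:-K]
    return float(np.sum(lagged ** 2) / K)

def tsrv(y, K):
    """Two scales realized volatility: rv_avg minus (n_bar / n) * rv_all."""
    n = len(y) - 1
    n_bar = (n - K + 1) / K  # average number of returns per subsample
    return rv_avg(y, K) - (n_bar / n) * rv_all(y)

# Demo on one simulated day (assumed: constant annual variance 0.04,
# 252 trading days, noise standard deviation 0.10/100).
rng = np.random.default_rng(0)
n = 23_400
iv_true = 0.04 / 252
x = np.concatenate([[0.0], np.cumsum(np.sqrt(iv_true / n) * rng.standard_normal(n))])
y = x + 0.001 * rng.standard_normal(n + 1)
estimate_tsrv, estimate_rv_all = tsrv(y, K=300), rv_all(y)
```

On such a path the all-observations RV is dominated by the noise term 2 n E[ε²], while the TSRV estimate stays close to `iv_true`.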
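TSRV asymptotic distribution
Number of subsamples $K = c \, n^{2/3}$
$$TSRV \overset{L}{\approx} \underbrace{\int_0^T \sigma_t^2 \, dt}_{\text{object of interest}} + \frac{1}{n^{1/6}} \Big[ \underbrace{ \underbrace{\frac{8}{c^2} E[\varepsilon^2]^2}_{\text{due to noise}} + \underbrace{c \, \frac{4T}{3} \int_0^T \sigma_t^4 \, dt}_{\text{due to discretization}} }_{\text{total variance}} \Big]^{1/2} Z_{\text{total}}$$
$K^*$ minimizes Var[TSRV], $K^* = c^* n^{2/3}$,
$$c^* = \left( \frac{T}{12 \, E[\varepsilon^2]^2} \int_0^T \sigma_t^4 \, dt \right)^{-1/3}$$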
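As a rough numerical check of the formula for c*, the sketch below evaluates the asymptotic variance and its minimizer. The function names are illustrative, and `quarticity_term` stands for T ∫ σ⁴ dt, which equals IV² when volatility is constant over the day; the example values (noise variance (0.10/100)², daily IV = 0.04/252) are assumptions borrowed from the Monte Carlo setup.

```python
import numpy as np

def tsrv_asymptotic_variance(c, n, noise_var, quarticity_term):
    """n^(-1/3) * [ 8/c^2 * E[eps^2]^2 + c * (4/3) * T * int sigma^4 dt ]."""
    return n ** (-1.0 / 3.0) * (
        8.0 / c ** 2 * noise_var ** 2 + c * (4.0 / 3.0) * quarticity_term
    )

def optimal_subsamples(n, noise_var, quarticity_term):
    """Return (c*, K*) with c* = (T int sigma^4 dt / (12 E[eps^2]^2))^(-1/3)."""
    c_star = (quarticity_term / (12.0 * noise_var ** 2)) ** (-1.0 / 3.0)
    return c_star, c_star * n ** (2.0 / 3.0)

# Assumed example values: one-second data, constant volatility.
n = 23_400
noise_var = (0.10 / 100) ** 2
quarticity_term = (0.04 / 252) ** 2
c_star, k_star = optimal_subsamples(n, noise_var, quarticity_term)
```

Setting the derivative of 8 E[ε²]²/c² + c (4T/3) ∫ σ⁴ dt with respect to c to zero gives c³ = 12 E[ε²]² / (T ∫ σ⁴ dt), which is the expression used in `optimal_subsamples`.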
Monte Carlo simulation: Setup
Latent log-price: $dX_t = (0.05 - \sigma_t^2/2) \, dt + \sigma_t \, dW_t$
Four different dynamics for σt, time step = 1 second
Observed log-price: $Y_\tau = X_\tau + \varepsilon_\tau$, $\varepsilon_\tau \sim NID\left(0, (0.10/100)^2\right)$
Each sample path: 101 trading days −→ 100 estimates of daily IV
Each experiment: 10,000 simulated sample paths
Estimate of daily integrated variance
• RV:
· 5, 10, 15, 30 minutes, raw log-returns
· de-meaned MA(1) filtered 5 minutes log-returns
• TSRV: slow time scale 5, 10, 15, 30 minutes, and minimum variance
Experiment 1: Heston volatility model
Heston (1993) model: $d\sigma^2 = 5 \, (0.04 - \sigma^2) \, dt + 0.5 \, \sigma \, dW$
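One day of this setup could be simulated along the following lines. This is a sketch under stated assumptions rather than the scheme used for the reported results: it uses an Euler discretization with the variance truncated at zero, independent Brownian motions for price and variance (the slides do not state a correlation), a 252-day year, and 23,400 one-second steps per day. The function name and arguments are hypothetical.

```python
import numpy as np

def simulate_heston_day(rng, v0=0.04, x0=0.0, kappa=5.0, theta=0.04, xi=0.5,
                        mu=0.05, n=23_400, days_per_year=252,
                        noise_sd=0.10 / 100):
    """Return (observed log-prices y, latent log-prices x, integrated variance)."""
    dt = 1.0 / (days_per_year * n)           # one second, in years (assumed)
    x = np.empty(n + 1)
    v = np.empty(n + 1)
    x[0], v[0] = x0, v0
    dw_x = np.sqrt(dt) * rng.standard_normal(n)
    dw_v = np.sqrt(dt) * rng.standard_normal(n)  # independent of dw_x (assumed)
    for i in range(n):
        vp = max(v[i], 0.0)                   # truncate negative variance
        x[i + 1] = x[i] + (mu - vp / 2.0) * dt + np.sqrt(vp) * dw_x[i]
        v[i + 1] = v[i] + kappa * (theta - vp) * dt + xi * np.sqrt(vp) * dw_v[i]
    iv = float(np.sum(np.maximum(v[:-1], 0.0)) * dt)  # Riemann sum of sigma^2
    y = x + noise_sd * rng.standard_normal(n + 1)     # add microstructure noise
    return y, x, iv

y_day, x_day, iv_day = simulate_heston_day(np.random.default_rng(1))
```

The returned `iv_day` serves as the "true" daily IV against which RV and TSRV computed from `y_day` can be compared.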
In-sample estimates of daily IV
                                          Relative  Relative  Relative
                  Bias     Var    RMSE      Bias       Var      RMSE
RV 5mn           1.560   0.318   1.659     2.508    47.701     7.348
RV 5mn MA(1)     1.348   0.274   1.446     2.026    26.985     5.576
TSRV 5mn        −0.014   0.071   0.266    −0.011     0.022     0.149
TSRV min var    −0.001   0.020   0.140    −0.004     0.010     0.099
Out-of-sample forecasts of IV: Heston model
Mincer-Zarnowitz regression:
$$IV^{true} = b_0 + b_1 \, TSRV^{forecast} + b_2 \, RV^{forecast} + \text{error}$$
                         b0               b1               b2               R2
RV 5mn                  −1.467 (0.016)                     0.971 (0.005)    0.809
TSRV 5mn                 0.018 (0.006)    0.991 (0.003)                     0.928
TSRV 5mn vs. RV 5mn      0.146 (0.016)    1.057 (0.008)   −0.074 (0.009)    0.928
TSRV 5mn vs. RV 10mn     0.159 (0.010)    1.120 (0.008)   −0.146 (0.008)    0.930
TSRV 5mn vs. RV 15mn     0.119 (0.008)    1.107 (0.007)   −0.134 (0.007)    0.930
TSRV 5mn vs. RV 30mn     0.089 (0.007)    1.085 (0.006)   −0.118 (0.006)    0.930
TSRV min var             0.005 (0.004)    0.993 (0.002)                     0.961
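The Mincer-Zarnowitz regression above is an ordinary least squares fit of the true IV on a constant and one or two forecasts. A minimal sketch, assuming NumPy arrays of equal length and a hypothetical function name, is given below; it returns the coefficients and R² only and does not compute the standard errors shown in parentheses.

```python
import numpy as np

def mincer_zarnowitz(iv_true, *forecasts):
    """OLS of iv_true on a constant and the given forecast series.

    Returns (coefficients [b0, b1, ...], R^2).
    """
    y = np.asarray(iv_true, dtype=float)
    columns = [np.ones_like(y)] + [np.asarray(f, dtype=float) for f in forecasts]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

# Synthetic check: the target is an exact linear combination of two forecasts.
f_tsrv = np.linspace(0.0, 1.0, 50)
f_rv = np.sin(np.arange(50.0))
coef_demo, r2_demo = mincer_zarnowitz(0.1 + 0.9 * f_tsrv + 0.2 * f_rv, f_tsrv, f_rv)
```

An unbiased and efficient forecast corresponds to b0 = 0 and a slope of 1; in the encompassing regressions a b2 near zero indicates that the RV forecast adds little once the TSRV forecast is included.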
Experiment 2: Jump-diffusion volatility model
Heston Jump-diffusion model (e.g., Eraker, Johannes and Polson, 2003)
$d\sigma^2 = 5 \, (0.035 - \sigma^2) \, dt + 0.5 \, \sigma \, dW + J \, dq$
Poisson process q, jump size J ∼ Exp
number of jumps ≈ 2 per day; jump size ≈ 2% of unconditional variance
Out-of-sample forecasts: Heston jump-diffusion model
[Figure: forecast IV minus true IV (vertical axis) plotted against true IV (horizontal axis) for RV 5mn with MA(1) filter, TSRV 5mn and TSRV min variance.]
Model: $d\sigma^2 = 5 \, (0.035 - \sigma^2) \, dt + 0.5 \, \sigma \, dW + J \, dq$, number of jumps ≈ 2 per day; jump size ≈ 2% of unconditional variance
Experiment 3:
Heterogeneous Autoregressive Realized Volatility model
Corsi (2004): Cascade model for the integrated variance
$IV^{(m)} \leftarrow RV^{(m)}$
$IV^{(w)} \leftarrow RV^{(m)}, RV^{(w)}$
$IV^{(d)} \leftarrow RV^{(m)}, RV^{(w)}, RV^{(d)}$
Reduced form:
$$IV^{(d)}_{t+\Delta} = 0.002 + 0.45 \, RV^{(d)}_t + 0.30 \, RV^{(w)}_t + 0.20 \, RV^{(m)}_t + \text{error}$$
$$RV^{(d)}_t = \sum_{t_j \in \{\text{day } t\}} r_{t_j}^2$$
and similarly $RV^{(w)}_t$ and $RV^{(m)}_t$
Intraday latent log-return
$$r_t = X_t - X_{t-\Delta} = \sqrt{IV^{(d)}_t} \; z_t$$
and $z_t \sim NID(0, 1)$, $\Delta = 1$ second
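The reduced-form HAR-RV forecast can be sketched as follows. The slides do not spell out how the weekly and monthly components are aggregated, so this sketch assumes averages of the daily RV over the last 5 and 22 trading days, a common convention for HAR models; the function names are hypothetical.

```python
import numpy as np

def har_components(rv_daily, week=5, month=22):
    """Daily, weekly and monthly RV aligned on days with a full monthly window.

    Weekly and monthly values are trailing averages (assumed convention).
    """
    rv = np.asarray(rv_daily, dtype=float)
    idx = np.arange(month - 1, len(rv))
    daily = rv[idx]
    weekly = np.array([rv[i - week + 1:i + 1].mean() for i in idx])
    monthly = np.array([rv[i - month + 1:i + 1].mean() for i in idx])
    return daily, weekly, monthly

def har_forecast(rv_daily, coef=(0.002, 0.45, 0.30, 0.20)):
    """One-step-ahead IV forecast using the reduced-form coefficients."""
    b0, b_d, b_w, b_m = coef
    daily, weekly, monthly = har_components(rv_daily)
    return b0 + b_d * daily + b_w * weekly + b_m * monthly

flat_forecast = har_forecast(np.full(30, 0.04))
```

With the coefficients 0.002, 0.45, 0.30 and 0.20, a constant RV of 0.04 is a fixed point of the recursion, since 0.002 + 0.95 × 0.04 = 0.04.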
HAR-RV model
[Figure, two panels. Top: daily, weekly and monthly $IV^{1/2}$ against day (0 to 4000). Bottom: autocorrelation against lag in days (0 to 120).]
Simulated path of HAR-RV process, $IV^{(d)} = 0.004 + 0.35 \, RV^{(d)} + 0.30 \, RV^{(w)} + 0.25 \, RV^{(m)} + \omega^{(d)}$, $\omega^{(d)} \sim NID(0, 0.003^2)$. Efficient log-return $r_t = X_t - X_{t-\Delta}$, and $\Delta = 3$ hours
Out-of-sample forecasts of IV: HAR-RV model
Mincer-Zarnowitz regression:
$$IV^{true} = b_0 + b_1 \, TSRV^{forecast} + b_2 \, RV^{forecast} + \text{error}$$
                         b0               b1               b2               R2
RV 5mn                  −1.499 (0.017)                     0.981 (0.005)    0.762
TSRV 5mn                −0.005 (0.006)    1.017 (0.004)                     0.891
TSRV 5mn vs. RV 10mn     0.023 (0.009)    0.996 (0.009)   −0.027 (0.011)    0.891
TSRV 5mn vs. RV 15mn     0.009 (0.009)    1.009 (0.008)   −0.012 (0.009)    0.891
TSRV 5mn vs. RV 30mn     0.023 (0.007)    0.998 (0.007)   −0.017 (0.007)    0.891
TSRV min var             0.009 (0.005)    0.997 (0.003)                     0.918
Experiment 4: Fractional Ornstein-Uhlenbeck model
Long memory model: $W_H$ fractional Brownian motion, H = 0.7
$dZ = 20 \, (0.2 - Z) \, dt + 0.012 \, dW_H$
e.g., Comte and Renault (1998), Cheridito, Kawaguchi and Maejima (2003)
Set σ = Z and forecast
$$I\sigma_m = \int_{T_m}^{T_{m+1}} \sigma_t \, dt$$
Exactly specified forecast regression:
$$E[I\sigma_m \mid I\sigma_{m-1}] = E[I\sigma_m] + \frac{Cov(I\sigma_m, I\sigma_{m-1})}{Var(I\sigma_{m-1})} \, (I\sigma_{m-1} - E[I\sigma_{m-1}])$$
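The forecast regression above is a linear projection of the current value on the previous one. The sketch below replaces the population mean, covariance and variance with sample moments computed from a history of estimates, which is an assumption made for illustration; the slide's "exactly specified" version uses the model's own moments. The function name is hypothetical.

```python
import numpy as np

def linear_projection_forecast(history, last_value):
    """Forecast the next value as mean + Cov/Var * (last_value - mean).

    `history` is a sequence of past values; sample moments stand in for
    the population moments of the projection formula.
    """
    h = np.asarray(history, dtype=float)
    prev, curr = h[:-1], h[1:]
    cov = np.cov(prev, curr)          # 2x2 sample covariance matrix
    slope = cov[0, 1] / cov[0, 0]     # Cov(curr, prev) / Var(prev)
    return float(curr.mean() + slope * (last_value - prev.mean()))

# Check on a series with an exact linear link: x_{t+1} = 0.5 * x_t + 1.
series = [0.0]
for _ in range(10):
    series.append(0.5 * series[-1] + 1.0)
projected = linear_projection_forecast(series, series[-1])
```

When the series satisfies an exact linear relation, the projection recovers it, so `projected` equals 0.5 times the last value plus 1.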
Fractional Ornstein-Uhlenbeck
[Figure, two panels. Top: volatility paths of the classical OU and fractional OU processes against day (0 to 4000). Bottom: autocorrelation against lag in days (0 to 100).]
Simulated path of classical and fractional Ornstein-Uhlenbeck process; Comte and Renault (1998)
$dZ = 10 \, (0.2 - Z) \, dt + 0.2 \, dW_H$, FBM with Hurst exponent H = 0.7, dt = 1 day
In-sample estimates of IV: Fractional OU model
                                          Relative  Relative  Relative
                  Bias     Var    RMSE      Bias       Var      RMSE
RV 5mn           1.558   0.288   1.647     1.036     0.219     1.137
RV 5mn MA(1)     1.360   0.229   1.442     0.900     0.155     0.982
TSRV 5mn        −0.018   0.048   0.220    −0.011     0.017     0.132
TSRV min var    −0.006   0.017   0.129    −0.004     0.006     0.080
Distribution of TSRV: Fractional OU model
[Figure: small sample density of the minimum variance TSRV compared with the N(0,1) density, over the range −4 to 4.]
Asymptotic and small sample distributions of minimum variance TSRV
Out-of-sample forecasts of IV: Fractional OU model
Mincer-Zarnowitz regression:
$$IV^{true} = b_0 + b_1 \, TSRV^{forecast} + b_2 \, RV^{forecast} + \text{error}$$
                         b0               b1               b2               R2
RV 5mn                  −1.155 (0.032)                     0.873 (0.010)    0.438
TSRV 5mn                 0.006 (0.009)    1.007 (0.006)                     0.757
TSRV 5mn vs. RV 5mn      0.306 (0.024)    1.111 (0.010)   −0.147 (0.011)    0.762
TSRV 5mn vs. RV 10mn     0.263 (0.017)    1.150 (0.009)   −0.202 (0.011)    0.766
TSRV 5mn vs. RV 15mn     0.209 (0.014)    1.139 (0.009)   −0.194 (0.010)    0.766
TSRV 5mn vs. RV 30mn     0.159 (0.012)    1.118 (0.008)   −0.176 (0.009)    0.766
TSRV min var            −0.009 (0.006)    1.010 (0.004)                     0.877
Empirical application: IBM stock
• Transaction prices: TAQ database
• May 2, 2003 – May 2, 2005: from 9:30 until 16:00
• Pre-processing: eliminated
– obvious errors (zero prices, out of order transaction times, . . . )
– bounce back outliers larger than 0.03
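The pre-processing steps listed above might look like the following sketch. The slides do not define a bounce back precisely, so the rule used here is an assumption: a trade is dropped when the log-returns into it and out of it both exceed 0.03 in absolute value and have opposite signs. The function name and the treatment of out-of-order trades (dropping any trade stamped earlier than a preceding one) are also illustrative choices.

```python
import numpy as np

def clean_trades(times, prices, threshold=0.03):
    """Drop zero prices, out-of-order time stamps and bounce-back outliers."""
    t = np.asarray(times, dtype=float)
    p = np.asarray(prices, dtype=float)
    # Keep positive prices whose time stamp is not earlier than any previous one.
    keep = (p > 0) & (t >= np.maximum.accumulate(t))
    t, p = t[keep], p[keep]
    r = np.diff(np.log(p))
    big = np.abs(r) > threshold
    bounce = np.zeros(len(p), dtype=bool)
    # Interior trade i is a bounce back if the returns before and after it
    # are both large and of opposite sign (assumed definition).
    bounce[1:-1] = big[:-1] & big[1:] & (np.sign(r[:-1]) != np.sign(r[1:]))
    return t[~bounce], p[~bounce]

clean_t, clean_p = clean_trades(
    [0, 1, 2, 1.5, 3, 4, 5],
    [100.0, 100.1, 0.0, 100.2, 100.1, 110.0, 100.2],
)
```

In the toy input the zero price, the trade stamped 1.5 after a trade at 2, and the isolated spike to 110 are removed.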
IBM stock
[Figure, two panels, April 2003 to July 2005. Top: daily log-return in %. Bottom: daily √IV on annual base for RV 5mn, RV 5mn MA(1) and TSRV 5mn.]
IBM stock from May 2003 to May 2005: daily log-return in %; daily √IV on annual base
In-sample forecasts of √IV: IBM stock
Mincer-Zarnowitz regression:
$$\sqrt{IV^{true}} = b_0 + b_1 \, TSRV^{forecast} + b_2 \, RV^{forecast} + \text{error}$$
                         b0               b1               b2               R2
RV 5mn                  −0.245 (0.085)                     1.174 (0.088)    0.262
TSRV 5mn                −0.072 (0.056)    1.056 (0.061)                     0.371
TSRV 5mn vs. RV 5mn     −0.058 (0.081)    1.079 (0.116)   −0.036 (0.153)    0.371
One day ahead forecasts of √IV: AR(1) model for IV, estimated using the full sample (May 2003–May 2005). 503 in-sample forecasts.
Benchmark √IV: TSRV estimates
Out-of-sample forecasts of √IV: IBM stock
Mincer-Zarnowitz regression:
$$\sqrt{IV^{true}} = b_0 + b_1 \, TSRV^{forecast} + b_2 \, RV^{forecast} + \text{error}$$
                         b0               b1               b2               R2
RV 5mn                  −0.224 (0.085)                     1.156 (0.091)    0.350
TSRV 5mn                −0.107 (0.069)    1.082 (0.077)                     0.395
TSRV 5mn vs. RV 5mn     −0.131 (0.084)    0.984 (0.206)    0.121 (0.234)    0.396
One day ahead forecasts of √IV: AR(1) model for IV re-estimated every day using the past 200 estimates of IV. 303 out-of-sample forecasts.
Benchmark √IV: TSRV estimates
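The rolling scheme described in the note (an AR(1) re-estimated each day on the previous 200 estimates) can be sketched as below. The function name is hypothetical, and fitting the AR(1) by least squares with `np.polyfit` is an assumption; the slides do not say how the model is estimated.

```python
import numpy as np

def rolling_ar1_forecasts(series, window=200):
    """One-day-ahead AR(1) forecasts, refitted on the trailing `window` values."""
    x = np.asarray(series, dtype=float)
    forecasts = []
    for t in range(window, len(x)):
        past = x[t - window:t]
        # Least squares fit of past[1:] on past[:-1]: slope first, then intercept.
        slope, intercept = np.polyfit(past[:-1], past[1:], 1)
        forecasts.append(intercept + slope * past[-1])
    return np.array(forecasts)

# With 503 daily estimates and a 200-day window there are 303 forecasts.
ramp = np.arange(503.0)
ramp_forecasts = rolling_ar1_forecasts(ramp)
```

Applied separately to the RV and TSRV series, the resulting forecasts are the regressors in the Mincer-Zarnowitz table.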
Out-of-sample forecasts of √IV: IBM stock
[Figure: benchmark IV, TSRV forecast and RV forecast, November 2004 to June 2005; vertical axis from 0.08 to 0.24.]
One day ahead forecasts of √IV. AR(1) model for IV re-estimated every day
Conclusions
• Monte Carlo evidence: TSRV outperforms RV in forecasting
integrated variance even when volatility has jumps, long memory, . . .
• Empirical application confirms Monte Carlo results
Current research
• Dependent microstructure noise:
ε neither time-independent nor independent of latent price X
• Extend the empirical part