A Nonparametric Least-Absolute-Deviations
Estimator of Volatility Functions
Flavio A. Ziegelmann ∗
Universidade Federal do Rio Grande do Sul (Brazil)
[email protected]
July 2005
Abstract
An important empirical characteristic of financial time series is that the unconditional distribution of the returns tends to possess heavy tails. This motivates the particular local kernel volatility estimator proposed in this work. Whereas least-square-deviations (LSD) estimators are strongly affected by heavy-tailed distributions, the performance of least-absolute-deviations (LAD) estimators is not. This robustness to heavy tails is reflected in the more flexible assumptions made on the distributional moments of the observable variable. The simulation examples also highlight the superior performance of the LAD estimator compared to the LSD estimator under heavy-tail conditions. The full nonparametric model is described and the asymptotic properties of the LAD estimator are derived. Extensive Monte Carlo studies strongly suggest that the LAD estimator is asymptotically adaptive to the unknown conditional first moment. The LAD estimator is also used to estimate the volatility of the S&P500 and the BOVESPA returns.
1 Introduction
∗ Supported by a grant from CAPES (Brazil).

Empirical evidence has suggested that returns of financial time series possess heavy-tailed marginal distributions. Therefore, these returns may not have finite higher moments, strongly affecting or even preventing the use of least-square-deviations (LSD) based estimators.
In a nonparametric context, kernel smoothing techniques provide estimators with nice asymptotic properties. Silverman (1986), Wand and Jones
(1995) and Fan and Gijbels (1996) give a comprehensive coverage of these
techniques, while Simonoff (1996) and Bowman and Azzalini (1997) present
a more practical approach. In this context, the inadequacy of local LSD
methods has led researchers to search for more robust alternatives. Some
of these robust methods have their roots in the approaches suggested by
Härdle (1984) and Härdle and Gasser (1984). Truong (1989) and Fan and
Hall (1994) study local median smoothing for independent data. Truong and
Stone (1992) address robust nonparametric regression for time series data.
Wang and Scott (1994) develop L1 methods for robust nonparametric regression. Welsh (1996) and, more recently, Cai and Zhu (2004) and Koenker and Xiao (2004) describe robust quantile methods.
In this article, we present the residual local linear least-absolute-deviations (LAD) estimator of the conditional volatility (which here is not the conditional variance), designed to accommodate the presence of heavy-tailed distributions in financial time series. Hall, Peng and Yao (1999) use the LAD estimator to estimate the conditional median function of the response variable Yt in a time series setting. In terms of estimation, we follow a two-step residual method as in Fan and Yao (1998) and Ziegelmann (2002). After estimating the conditional mean (or median) in a first step, we estimate the conditional volatility in a second step. For that, the squared residuals obtained from fitting the conditional mean (median) in the first step are used as the new response variable.
This paper is organised as follows. In Section 2, we present our model and describe the LAD estimator of the conditional volatility function. In Section 3, its theoretical asymptotic properties are established. Section 4 implements simulation studies for three different models: a zero-mean ARCH(1) model, a non-zero-mean threshold (or asymmetric) ARCH(1) model, and a non-zero-mean nonlinear time series model analysed previously in Ziegelmann (2002). All examples show a superior performance of the LAD over the LSD estimator when the fourth conditional moment of Y given X does not exist. Moreover, extensive Monte Carlo analysis of the second and third numerical examples strongly suggests that the LAD volatility estimator is asymptotically adaptive to the unknown conditional location quantity (either mean or median) estimated at the first step. In the fourth and final section we apply the LAD estimator to S&P500 and BOVESPA index returns, comparing its performance to that of the LSD estimator.
2 The Model and the LAD Volatility Estimator
In this model the volatility function is the conditional median of the squared residuals, rather than the traditional conditional mean. In the first step we model the conditional mean of Y; the alternative modelling of its conditional median is nevertheless addressed later on.
Let {(Yt , Xt )} be a two-dimensional strictly stationary process, having
the same marginal distribution as (Y, X). Let m(x) = IE(Y |X = x) and
τ (x) = M{[Y − m(x)]2 |X = x} > 0, where M{Y |X} is the conditional
median of Y given X. Then, we write our model as
\[
Y_t = m(X_t) + \tau^{1/2}(X_t)\,\varepsilon_t\,, \tag{1}
\]
where IE(ε_t | X_t) = 0 and M(ε_t² | X_t) = 1.
The local linear LSD estimator of m(x) is denoted by m̂(x), with explicit form

\[
\hat m(x) = T^{-1}\sum_{t=1}^{T} \frac{[\hat s_2(x) - \hat s_1(x)(X_t - x)]\,K_h(X_t - x)\,Y_t}{\hat s_2(x)\hat s_0(x) - \hat s_1(x)^2}\,, \tag{2}
\]

where the kernel K(·) is a symmetric density function on the real line, K_h(u) = (1/h)K(u/h), h > 0 is the bandwidth, or smoothing parameter, and

\[
\hat s_r(x) = T^{-1}\sum_{t=1}^{T} (X_t - x)^r K_h(X_t - x)\,.
\]
Our main interest in this paper is to estimate the conditional volatility
function, τ (·), in a second step, using the residual estimator.
From model (1) we can see that

\[
r_t^2 = (Y_t - m(X_t))^2 = \tau(X_t)\varepsilon_t^2\,. \tag{3}
\]

Taking the conditional median of the squared residuals in equation (3), we obtain M(r_t² | X_t) = τ(X_t). This suggests estimating τ(x) from a LAD regression of the estimated squared residuals r̂_t² on X_t.
Therefore, the estimator of the conditional volatility (in this case, the conditional median of [Y_t − m(X_t)]²) is τ̂(x) = γ̂₁, the solution for γ₁ in the following weighted least-absolute-deviations problem:

\[
(\hat\gamma_1, \hat\gamma_2) = \arg\min_{\gamma_1,\gamma_2} \sum_{t=1}^{T} \bigl|\hat r_t^2 - \gamma_1 - \gamma_2(X_t - x)\bigr|\,K_{h_2}(X_t - x)\,, \tag{4}
\]

where h₂ > 0 is the second-step bandwidth and K is a symmetric probability density.
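The two-step procedure above can be sketched numerically. The following Python fragment is a minimal illustration, not the paper's implementation: it fits the first-stage local linear mean by weighted least squares (an equivalent formulation of (2)) and then solves the weighted LAD problem (4) with SciPy's Nelder-Mead (downhill simplex) routine. The Gaussian kernel, the fixed bandwidths and the toy data-generating process are assumptions made for brevity.

```python
import numpy as np
from scipy.optimize import minimize

def local_linear_mean(x0, X, Y, h):
    """First-stage local linear LSD estimate of m(x0), written as a
    kernel-weighted least-squares fit (equivalent to equation (2))."""
    u = X - x0
    w = np.exp(-0.5 * (u / h) ** 2) / h          # Gaussian kernel K_h
    Z = np.column_stack([np.ones_like(u), u])    # local linear design
    WZ = Z * w[:, None]
    beta = np.linalg.solve(Z.T @ WZ, WZ.T @ Y)
    return beta[0]                               # intercept = m_hat(x0)

def local_linear_lad_vol(x0, X, r2, h2):
    """Second-stage LAD fit (4): minimise the kernel-weighted sum of
    absolute deviations of the squared residuals; gamma_1 = tau_hat(x0)."""
    u = X - x0
    w = np.exp(-0.5 * (u / h2) ** 2) / h2
    def obj(g):
        return np.sum(np.abs(r2 - g[0] - g[1] * u) * w)
    g0 = np.array([np.median(r2), 0.0])          # crude starting value
    res = minimize(obj, g0, method="Nelder-Mead")  # downhill simplex
    return res.x[0]

# toy illustration on simulated data (assumed model, for demonstration only)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, 500)
Y = np.sin(X) + 0.5 * np.abs(X) * rng.standard_normal(500)
m_hat = np.array([local_linear_mean(x, X, Y, h=0.3) for x in X])
r2 = (Y - m_hat) ** 2
tau_hat = local_linear_lad_vol(0.5, X, r2, h2=0.4)
```

In practice the bandwidths h and h₂ would be chosen by a data-driven method such as cross-validation rather than fixed as here.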
3 Asymptotic Properties
To assess the theoretical asymptotic properties of the LAD volatility estimator, we first introduce some notation. Let f(·) denote the marginal density of X_t, let g(·|x) be the conditional density of Y_t given X_t = x, and set g_τ(x) = g{τ(x)|x}. Also, write τ̇(x) = ∂τ(x)/∂x and τ̈(x) = ∂²τ(x)/∂x². Moreover, let κ₁ = ∫K²(u)du, σ₀ = ∫u²K(u)du, and σ²(x) = κ₁/[4f(x)g_τ²(x)].
Some regularity conditions are now enumerated:
(C1) For fixed x, f(x) > 0 and g(y|x) > 0 is continuous at y = τ(x). In a neighbourhood of x, τ(·) has two continuous derivatives, both f(·) and g(y|·) have continuous first derivatives, and IE(|Y_t|^δ | X_t = ·) < ∞ for some δ ≥ 2.
(C2) The kernel K is a symmetric, compactly supported density, satisfying |K(x₁) − K(x₂)| ≤ C|x₁ − x₂| for all x₁, x₂.
(C3) The process {(X_t, Y_t)} is strongly mixing, i.e.,

\[
\alpha(v) \equiv \sup_{u}\;\sup_{A \in \mathcal{F}_{u+v}^{\infty},\, B \in \mathcal{F}_1^{u}} |P(A)P(B) - P(AB)| \to 0
\]

as v → ∞, where \(\mathcal{F}_u^v\) denotes the σ-field generated by {(X_t, Y_t) : u ≤ t ≤ v}. Further, \(\sum_{v\ge 1}\alpha(v)^{\delta_0/(1+\delta_0)} < \infty\) for some δ₀ ∈ [0, 1).
(C4) As T → ∞, h₂ → 0 and \(\liminf_{T\to\infty} T h_2^3 > 0\).
We are now ready to state the following result.

Theorem 1. Consider the conditional mean function m(·) known. Then, if regularity conditions (C1)-(C4) hold, as T → ∞,

\[
(Th_2)^{1/2}\Bigl\{\hat\tau(x) - \tau(x) - \tfrac{1}{2} h_2^2 \sigma_0 \ddot\tau(x)\Bigr\} \tag{5}
\]

is asymptotically normal with mean 0 and variance σ²(x).
Proof: We need to establish the asymptotic properties of τ̂(x) = γ̂₁, the solution for γ₁ in (4). The idea of the proof is as follows: first we construct a new quadratic function whose minimisers θ̂₁ and θ̂₂ are one-to-one functions of the minimisers γ̂₁ and γ̂₂ of (4). Then, we establish the asymptotic properties of θ̂₁ and θ̂₂. Finally, we prove that our original estimator γ̂₁ is close enough to the new minimiser θ̂₁, and therefore shares its asymptotic properties.

Let K_t = K{(X_t − x)/h}, Z_t = (1, (X_t − x)/h)′, r_t^{2∗} = r_t² − τ(x) − τ̇(x)(X_t − x) and θ̂ = (Th)^{1/2}(τ̂(x) − τ(x), h{τ̇̂(x) − τ̇(x)})′. For θ = (θ₁, θ₂)′ ∈ ℝ², define

\[
G(\theta) = \sum_{t=1}^{T}\Bigl\{\bigl|r_t^{2*} - \theta' Z_t/(Th)^{1/2}\bigr| - \bigl|r_t^{2*}\bigr|\Bigr\}K_t\,, \tag{6}
\]

so that θ̂ is the minimiser of G(θ). Hence, once we establish the asymptotic properties of θ̂₁, those of τ̂(x) = γ̂₁ follow.
Let now d_t = θ′Z_t/(Th)^{1/2} (assuming d_t ≥ 0, without loss of generality) and consider the following decomposition:

\[
\begin{aligned}
\{|r_t^{2*} - d_t| - |r_t^{2*}|\}K_t
&= \bigl[r_t^{2*}\{I(r_t^{2*} > d_t) - I(r_t^{2*} \le d_t) - I(r_t^{2*} > 0) + I(r_t^{2*} \le 0)\}\\
&\quad\; + d_t\{I(r_t^{2*} \le d_t) - I(r_t^{2*} > d_t)\}\bigr]K_t\\
&= -d_t K_t D(r_t^{2*}) + 2K_t(d_t - r_t^{2*})\,I(0 \le d_t - r_t^{2*} < d_t)\,,
\end{aligned} \tag{7}
\]

where D(y) = I(y > 0) − I(y < 0).

Then, writing

\[
S_t = \bigl[\,|r_t^{2*} - \theta' Z_t/(Th)^{1/2}| - |r_t^{2*}| + d_t D(r_t^{2*})\bigr]K_t\,, \tag{8}
\]

we can use result (7) to obtain (after some calculations, noting that (C4) implies d_t → 0)

\[
\mathrm{IE}(S_t) = T^{-1} f(x) g_\tau(x)\,(\theta_1^2 + \theta_2^2\sigma_0) + o(T^{-1})\,. \tag{9}
\]
Now, considering that

\[
\sum_{t=1}^{T} S_t = G(\theta) + \sum_{t=1}^{T} d_t K_t D(r_t^{2*})\,, \tag{10}
\]

we introduce the new function

\[
R(\theta) = \sum_{t=1}^{T} S_t - f(x)g_\tau(x)(\theta_1^2 + \theta_2^2\sigma_0)
= G(\theta) + \sum_{t=1}^{T} d_t K_t D(r_t^{2*}) - f(x)g_\tau(x)(\theta_1^2 + \theta_2^2\sigma_0)\,, \tag{11}
\]

which is quadratic in θ₁ and θ₂.
The next step is to prove that R(θ) converges to 0 in probability. Firstly, in view of equation (9), it is straightforward to see that

\[
\mathrm{IE}(R(\theta)) \to 0\,. \tag{12}
\]

Therefore, we only need to show that \(\sum_{t=1}^{T}[S_t - \mathrm{IE}(S_t)] \to 0\) in probability. Since \(\mathrm{IE}\bigl(\sum_{t=1}^{T}[S_t - \mathrm{IE}(S_t)]\bigr) = 0\), applying Chebyshev's inequality we have

\[
P\Bigl\{\Bigl|\sum_{t=1}^{T}[S_t - \mathrm{IE}(S_t)]\Bigr| > \varepsilon\Bigr\}
\le \frac{1}{\varepsilon^2}\,\mathrm{Var}\Bigl(\sum_{t=1}^{T} S_t\Bigr)
\le \frac{1}{\varepsilon^2}\Bigl\{\sum_{t=1}^{T}\mathrm{IE}(S_t^2) + 2\sum_{t=1}^{T-1}(T-t)\,\mathrm{Cov}(S_1, S_{t+1})\Bigr\}\,. \tag{13}
\]
In order to tackle the first sum on the right-hand side of inequality (13), note that

\[
\bigl|\,|a+b| - |a| - D(a)b\,\bigr| \le 2|b|\,I(|a| < |b|)\,,
\]

that K(·) has compact support, and that g_τ(·) > 0. Hence, it follows (after some algebra) that for any γ ≥ 0

\[
\mathrm{IE}\bigl(S_t^{2(1+\gamma)}\bigr)
\le \frac{2^{2(1+\gamma)}}{(Th)^{1+\gamma}}\,
\mathrm{IE}\Bigl\{(\theta' Z_t)^{2(1+\gamma)} K_t^{2(1+\gamma)}\, I\bigl(|r_t^{2*}| < C/(Th)^{1/2}\bigr)\Bigr\}
= O\bigl(T^{-(3/2+\gamma)} h^{-(1/2+\gamma)}\bigr)\,. \tag{14}
\]
Thus, using the previous relation and Corollary 1.1 of Bosq (1998, p. 21) for the second sum on the right-hand side of (13), we have that

\[
P\Bigl\{\Bigl|\sum_{t=1}^{T}[S_t - \mathrm{IE}(S_t)]\Bigr| > \varepsilon\Bigr\}
= O\bigl\{(Th)^{-1/2} + T^{-1/[2(1+\delta_0)]}\, h^{-[1+2\delta_0]/[2(1+\delta_0)]}\bigr\}\,, \tag{15}
\]

which converges to 0, since the second term on the right-hand side is of smaller order than \((Th^3)^{-1/[2(1+\delta_0)]}\) (see condition (C4)). From this, we can also conclude that

\[
G(\theta) + \sum_{t=1}^{T} d_t K_t D(r_t^{2*}) \xrightarrow{\;P\;} f(x)g_\tau(x)(\theta_1^2 + \theta_2^2\sigma_0)\,.
\]
In addition, using the same arguments as Pollard (1991, p. 193) and Yao and Tong (1996, p. 289), we may show that the difference between the θ that minimises G(θ) and the minimiser of

\[
-\sum_{t=1}^{T} d_t K_t D(r_t^{2*}) + f(x)g_\tau(x)(\theta_1^2 + \theta_2^2\sigma_0)
\]

converges to 0 in probability. Differentiating the previous expression with respect to θ₁, we obtain

\[
\sqrt{Th}\,\{\hat\tau(x) - \tau(x)\}
= \frac{1}{2\sqrt{Th}\,f(x)g_\tau(x)}\sum_{t=1}^{T} D(r_t^{2*})K_t + o_p(1)\,. \tag{16}
\]

Then, after some more algebra, we have

\[
\mathrm{IE}\{D(r_t^{2*})K_t\} = \sigma_0 h^3 f(x)g_\tau(x)\ddot\tau(x) + o(h^3) \tag{17}
\]

and

\[
\mathrm{IE}\bigl[\{D(r_t^{2*})K_t\}^2\bigr] = f(x)\,h\int K^2 + o(1)\,. \tag{18}
\]

The asymptotic normality follows from Theorem 1 of Doukhan (1994, p. 46). Finally, putting results (16), (17) and (18) together, the proof is complete.
Discussion of the regularity conditions: In (C1), in the case of first-stage LAD estimation, the condition δ ≥ 2 can be relaxed to δ > 0, consequently allowing IE(Y | X = x) = ∞. For this, we can substitute Y by a truncated variable Y⁻ (with finite expectation), and show that certain quantities involving Y are close enough to the same quantities with Y replaced by Y⁻, for 0 < δ < 1. (C2) can be relaxed in the sense that kernels with non-compact support but light tails are admitted, in particular the Gaussian kernel. Also note that the strong mixing condition in (C3) is a mild one.
Unknown Conditional Mean: Theorem 1 is proved for a known m(·), for technical simplicity. In our second and third numerical examples, though, the conditional mean is estimated from the data, and a subsequent analysis of the adaptiveness of the LAD estimator to the unknown conditional mean (or median) is carried out.
M{Yt|Xt} estimator: If we consider using LAD estimators in both stages of our estimation procedure, i.e., to estimate both the location and the volatility functions, a slight alteration of our model is necessary. As before, let {(Y_t, X_t)} be a two-dimensional strictly stationary process having the same marginal distribution as (Y, X). Now let m*(x) = M(Y | X = x) and τ(x) = M{[Y − m*(x)]² | X = x} > 0, where M{Y|X} is the conditional median of Y given X. Then, we can write our model as

\[
Y_t = m^*(X_t) + \tau^{1/2}(X_t)\,\varepsilon_t\,, \tag{19}
\]

where M(ε_t | X_t) = 0 and M(ε_t² | X_t) = 1. Hence, the properties of the LAD estimator of m*(·) are analogous to those of the LAD estimator of the τ(·) function discussed previously, although τ(·) itself differs from the previous setting.
4 Numerical Implementations
We present three simulation studies. The first example does not involve the two-stage estimation procedure, since it is a zero-mean ARCH(1) model where the true mean is assumed to be known. The second and third examples, on the other hand, do involve the two-stage procedure, leading to the residual volatility estimator. In all three examples we simulate from both non-heavy- and heavy-tailed distributions. In the latter case the error distributions are chosen so that the fourth conditional moment of the observable variable does not exist. To compute the LAD estimates in practice we use the downhill simplex algorithm (see Section 10.4 in Press, Teukolsky, Vetterling and Flannery, 1992), since there is no closed analytical expression for this estimator. The cross-validation method is used to automatically select the bandwidth, and the Gaussian kernel is adopted.
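Cross-validatory bandwidth choice can be sketched as follows. This is a simplified leave-one-out criterion for a Nadaraya-Watson fit, an illustrative stand-in for the selector actually used in the paper; the candidate bandwidth grid and the toy data are assumptions.

```python
import numpy as np

def loo_cv_score(h, X, Y):
    """Leave-one-out cross-validation score for a Gaussian-kernel
    Nadaraya-Watson fit: average squared prediction error when each
    observation is predicted from all the others."""
    n = len(X)
    err = 0.0
    for i in range(n):
        u = (np.delete(X, i) - X[i]) / h
        w = np.exp(-0.5 * u ** 2)
        err += (Y[i] - np.sum(w * np.delete(Y, i)) / np.sum(w)) ** 2
    return err / n

# toy data (assumed, for demonstration only)
rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, 200)
Y = X ** 2 + 0.3 * rng.standard_normal(200)

bandwidths = [0.05, 0.1, 0.2, 0.4, 0.8]         # assumed candidate grid
scores = [loo_cv_score(h, X, Y) for h in bandwidths]
h_cv = bandwidths[int(np.argmin(scores))]        # CV-selected bandwidth
```

In a full implementation the score would be minimised over a continuous range of h, and the same idea applies to the second-stage bandwidth h₂.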
Example 1: We simulate from the ARCH(1) model, specified as

\[
y_t = \sigma_t \varepsilon_t\,, \tag{20}
\]
\[
\sigma_t^2 = 0.2 + 0.5\,y_{t-1}^2\,. \tag{21}
\]
We examine two cases: εt ∼ N (0, 1) and εt ∼ t(3). For each case, each simulated series has 2000 observations and we replicate 100 of these series. For
the estimation procedures we consider the true conditional mean as known.
We estimate the conditional volatility at time t for each of the 41 equidistant
points Y (t − 1) = y in the grid [-2,2]. For each generated series we compute
the mean absolute deviations, given by
\[
\delta_i = \frac{1}{41}\sum_{j=1}^{41} \bigl|\hat\sigma^2(x_j) - \sigma^2(x_j)\bigr|\,, \qquad i = 1, 2, \ldots, 100\,,
\]
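A rough sketch of this simulation design follows; a simple Nadaraya-Watson smoother of the squared returns stands in for the local estimators, and the fixed bandwidth is an assumption made for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_arch1(T, dist="normal"):
    """Simulate y_t = sigma_t * eps_t with sigma_t^2 = 0.2 + 0.5 y_{t-1}^2,
    i.e. model (20)-(21), with normal or t(3) innovations."""
    y = np.zeros(T)
    for t in range(1, T):
        sigma2 = 0.2 + 0.5 * y[t - 1] ** 2
        eps = rng.standard_normal() if dist == "normal" else rng.standard_t(3)
        y[t] = np.sqrt(sigma2) * eps
    return y

grid = np.linspace(-2, 2, 41)                 # 41 equidistant points on [-2, 2]
true_var = 0.2 + 0.5 * grid ** 2              # true conditional variance

y = simulate_arch1(2000, dist="t")
# crude kernel-smoothed stand-in for the volatility estimator (bandwidth assumed)
h = 0.3
est = np.empty_like(grid)
for j, x in enumerate(grid):
    w = np.exp(-0.5 * ((y[:-1] - x) / h) ** 2)
    est[j] = np.sum(w * y[1:] ** 2) / np.sum(w)   # smoother of squared returns
delta = np.mean(np.abs(est - true_var))       # mean absolute deviation, as in the text
```

In the paper this is repeated over 100 replications, and the resulting δᵢ are summarised in box-plots.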
where x_j is the jth grid point. Figure 1 shows the true conditional variance function, its estimates and two-standard-deviation limits. It is clear from comparing part (a) to part (b) that in the case of Gaussian errors the LSD estimates have smaller variance and seem to have smaller bias as well, at least near the boundaries. It is also noticeable that in the case of heavy tails, shown in parts (c) and (d), the LAD estimates exhibit much smaller variance, with less bias in the centre but more in the tails (although the bias might be a result of extra smoothing). One should bear in mind that the variance of the LSD estimator is not defined in the t-distribution case.
[Figure 1 here. Panels: (a) LSD vol. est., normal error; (b) LAD vol. est., normal error; (c) LSD vol. est., t error; (d) LAD vol. est., t error. Variance(t) plotted against Y(t−1).]

Figure 1: Example 1. True conditional variance, its estimates and two standard-deviation confidence intervals.
In Figure 2, we can observe the box-plots of the mean absolute deviations, comparing LSD and LAD estimates in both cases, Gaussian errors in part (a) and Student errors in part (b). The insights from the previous figure are confirmed from a global point of view. Remarkably superior performance in terms of both bias and variance is obtained by the LAD estimates when heavy-tailed errors are present, whereas, as expected, LSD performs better under Gaussian errors.
[Figure 2 here. Panels: (a) box plots, normal error; (b) box plots, t error; estimators LS and LAD.]

Figure 2: Example 1. Mean absolute deviations of LSD and LAD volatility estimates.
Example 2: The second example is an extension of the preceding one, since it retains the ARCH structure of the conditional variance but adds some more complexity to the model. The model is now defined as

\[
y_t = 0.2\,y_{t-1} + \sigma_t\varepsilon_t\,, \tag{22}
\]
\[
\sigma_t^2 = 0.2 + [0.5 - 0.3\,I(y_{t-1} > 0)]\,y_{t-1}^2\,, \tag{23}
\]
which can be seen as a non-zero-mean threshold (or asymmetric) ARCH(1) model. The threshold in the variance equation incorporates into the model the asymmetry observed in financial data, i.e., negative past returns lead to more variability than positive past returns of the same absolute value. The grid is [−1.8, 1.8], with 36 equidistant points. We run 80 repetitions, each with 1000 observations. The same error distributions as in Example 1 are used.
Also the same mean-absolute deviations are investigated. Figure 3 shows the
true conditional mean (the same as the conditional median in this case) and
its respective estimates. LSD and LAD estimates are provided, for Gaussian
and Student errors.
[Figure 3 here. Panels: (a) LS estimates, normal error; (b) LAD estimates, normal error; (c) LS estimates, t error; (d) LAD estimates, t error. Mean(t)/Median(t) plotted against Y(t−1).]

Figure 3: Example 2. LSD and LAD estimates of the conditional mean for Gaussian and Student error distributions.
Figure 4 plots the true volatility function, as well as its estimates and two-standard-deviation confidence intervals. Numerical difficulties in obtaining LSD estimates for the Student error distribution were faced in this implementation, especially near the discontinuity point and on the boundaries. Whereas comparing part (a) to part (b) seems to reveal a slightly better performance of the LSD estimator, a comparison between parts (c) and (d) uncovers a huge superiority of the LAD estimator over the LSD. This underlines the robustness of the LAD volatility estimator against heavy-tailed distributions.
In a similar way to Example 1, the global picture given by the box-plots in Figure 5 indicates better results for the LAD estimator in the case of Student-distributed errors, while the LSD estimator is superior in the Gaussian case.
[Figure 4 here. Panels: (a) LS estimates, normal error; (b) LAD estimates, normal error; (c) LS estimates, t error; (d) LAD estimates, t error. Variance(t) plotted against Y(t−1).]

Figure 4: Example 2. True conditional variance, its estimates and two standard-deviation limits are represented.
[Figure 5 here. Panels: (a) box plots, normal error; (b) box plots, t error; estimators LS and LAD.]

Figure 5: Example 2. Mean absolute deviations of LSD and LAD volatility estimates.
An important question to be answered is whether the LAD volatility estimator is adaptive to the unknown conditional mean (or median). In order to address this issue numerically, we compare the oracle volatility estimates (those computed using the true residuals in the second stage) to those obtained from the residual estimator. We simulate observations for both error distributions, Gaussian and Student. We also carry out Monte Carlo simulations for two possible scenarios, LSD and LAD conditional mean (median) estimates. Only one repetition for each sample size is performed, and we use sample sizes of 100, 500, 1000 and 2000 observations. The scatter plots between the two series of estimates are reported in Figure 6 for the Normal case. The plots on the left-hand side are based on a LSD first-stage mean estimator, whereas those on the right-hand side are based on a LAD median estimator (median and mean are equal in this case). Both sides suggest that as the sample size increases so does the correlation between oracle and residual estimates.
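A minimal sketch of this oracle-versus-residual comparison under model (22)-(23) with Gaussian errors follows. The first-stage mean is fitted here with a simple kernel regression as an illustrative stand-in for the local linear estimators, and the correlation between the two second-stage response series serves as a crude adaptiveness measure; the bandwidth and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
y = np.zeros(T)
for t in range(1, T):
    # threshold ARCH variance, equation (23)
    s2 = 0.2 + (0.5 - 0.3 * (y[t - 1] > 0)) * y[t - 1] ** 2
    y[t] = 0.2 * y[t - 1] + np.sqrt(s2) * rng.standard_normal()

X, Y = y[:-1], y[1:]
r2_oracle = (Y - 0.2 * X) ** 2            # squared residuals from the true mean

# first-stage mean via a simple kernel regression (stand-in, bandwidth assumed)
h = 0.3
m_hat = np.empty_like(X)
for i, x in enumerate(X):
    w = np.exp(-0.5 * ((X - x) / h) ** 2)
    m_hat[i] = np.sum(w * Y) / np.sum(w)
r2_resid = (Y - m_hat) ** 2               # estimated squared residuals

# adaptiveness heuristic: the two second-stage response series should be close
corr = np.corrcoef(r2_oracle, r2_resid)[0, 1]
```

Feeding r2_oracle and r2_resid through the same second-stage LAD fit and scatter-plotting the two resulting estimate series reproduces the kind of comparison shown in Figures 6 and 7.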
[Figure 6 here. Panels (a)-(h): estimated vs. true, T = 100, 500, 1000, 2000; left column LSD mean, right column LAD median in the first stage.]

Figure 6: Example 2, Gaussian errors. Residual and oracle volatility estimates on the y and x axes, respectively.
This evidence is much stronger, however, in the Student errors case, represented in Figure 7, where an almost perfect line is reached at a sample size of 2000. There is also a suggestion that using a LAD median estimator in the first stage leads to more adaptiveness.
[Figure 7 here. Panels (a)-(h): estimated vs. true, T = 100, 500, 1000, 2000; left column LSD mean, right column LAD median in the first stage.]

Figure 7: Example 2, Student errors. Residual and oracle volatility estimates on the y and x axes, respectively.
Finally in this example, Figure 8 displays a scatter plot between oracle and residual volatility estimates, this time based on the average of 80 repetitions of 1000 observations each. The virtually perfect line indicates that, on average, the residual estimator is fully adaptive to the unknown conditional median, for a sample size of 1000 and the Student error distribution.
[Figure 8 here. LAD volatility estimates, t distribution; estimated residuals (y axis) against true residuals (x axis).]

Figure 8: Example 2. Residual and oracle volatility estimates based on an 80-repetition average, with LAD median estimator and sample size 1000.
Example 3: In this third example we consider the model

\[
X_{t+1} = 0.235\,X_t(16 - X_t) + 0.3\,\varepsilon_t\,, \tag{24}
\]
where εt ∼ IIDN (0, 1) for one set of simulations, and εt ∼ IIDt(5) for the
other set. Each simulated series has 500 observations, and we generate 200
of them. We estimate the conditional mean and volatility considering the
two-step-ahead prediction, because the conditional volatility is constant for
the one-step-ahead case. In this example, we have 131 equidistant grid points
in the interval [2,15]. Figure 9 shows the true conditional mean functions,
their LSD estimates and two standard-deviation confidence limits.
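A short sketch of the simulation set-up of model (24) follows, including the construction of the two-step-ahead pairs. The truncation of the path to the region studied in the paper is our own pragmatic safeguard against rare numerical escapes, not part of the model, and the starting value and bandwidth are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
X = np.empty(T)
X[0] = 8.0                                   # assumed starting value
for t in range(T - 1):
    nxt = 0.235 * X[t] * (16 - X[t]) + 0.3 * rng.standard_normal()  # model (24)
    # pragmatic safeguard (not part of the model): keep the simulated path
    # inside the region studied in the paper
    X[t + 1] = np.clip(nxt, 2.0, 15.5)

# two-step-ahead pairs: predictor X_t, response X_{t+2}; the one-step-ahead
# conditional volatility is constant, hence the two-step horizon
pred, resp = X[:-2], X[2:]

# illustrative kernel estimate of the two-step-ahead conditional mean at x0 = 8
x0, h = 8.0, 0.5
w = np.exp(-0.5 * ((pred - x0) / h) ** 2)
m2_hat = np.sum(w * resp) / np.sum(w)
```

The two-step-ahead pairs (pred, resp) are exactly the data fed to the first- and second-stage estimators in this example.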
Good results are obtained by the LAD volatility estimator in parts (b) and (d) of Figure 10, whereas in parts (a) and (c) (especially in (c)) the LSD estimator seems to fail to reproduce the central area of the curve. However, different volatility quantities are measured by these two estimators, and for further comparisons we refer to Figure 11, where the box-plots of the mean absolute deviations of the square root of the volatility are shown. There we can confirm the superior performance of the LAD estimator in the Student case, while the two estimators perform very similarly in the Gaussian case (at least the bias results look very similar, although the LSD estimator performs slightly better in terms of variance).
[Figure 9 here. Panels: (a) LS mean estimates, normal error; (b) LS mean estimates, t error. Mean(t) plotted against Y(t−2) on [2, 15].]

Figure 9: Example 3. LSD estimates of the conditional mean. Results for Gaussian and Student errors are displayed.
[Figure 10 here. Panels: (a) LS estimates, normal error; (b) LAD estimates, normal error; (c) LS estimates, t error; (d) LAD estimates, t error. Volatility(t) plotted against Y(t−2).]

Figure 10: Example 3. True conditional variance, its estimates and two standard-deviation limits are represented.
Figure 12 shows scatter plots between residual and oracle estimates, in the same way as in Example 2. Again, only one repetition is performed for sample sizes varying from 100 to 2000. The error distribution is Normal and, whether using LSD or LAD estimates in the first stage, the scatter plots show very good results in terms of adaptiveness, apart from the extreme right-hand side, i.e., large estimated volatility values. Even a sample size of 500 observations already performs well.
From Figure 13, we can see that almost the same conclusion can be drawn for the Student error case. However, unlike Example 2, there is a bit more variability present now, compared to the Normal error distribution. But again, the straight line starts taking shape as early as sample size 500 (apart from the extreme volatility estimates), suggesting that the residual and oracle estimators are estimating very similar conditional volatilities.
[Figure 11 here. Panels: (a) box plots, normal error; (b) box plots, t error; estimators LS and LAD.]

Figure 11: Example 3. Mean absolute deviations of the LSD and LAD volatility estimates.
As a last indication of adaptiveness, Figure 14 presents one more scatter plot between residual and oracle volatility estimates. These are based on an average of 200 repetitions with sample sizes equal to 500. The LAD estimator of the conditional median is applied in the first stage, and the error distribution is Student.
[Figure 12 here. Panels (a)-(h): estimated vs. true, T = 100, 500, 1000, 2000; left column LSD mean, right column LAD median in the first stage.]

Figure 12: Example 3, Normal errors. Residual and oracle volatility estimates on the y and x axes, respectively.
[Figure 13 here. Panels (a)-(h): estimated vs. true, T = 100, 500, 1000, 2000; left column LSD mean, right column LAD median in the first stage.]

Figure 13: Example 3, Student errors. Residual and oracle volatility estimates on the y and x axes, respectively.
[Figure 14 here. LAD volatility estimates, t distribution; estimated residuals (y axis) against true residuals (x axis).]

Figure 14: Example 3. Residual and oracle volatility estimates based on a 200-repetition average, with LAD median estimator and sample size 500.
5 Analysis of S&P500 and BOVESPA Returns
The LAD volatility estimator is now applied to the S&P500 and BOVESPA
index returns. The S&P500 index is evaluated over the period from October
25, 1988 to April 8, 1998, whereas the BOVESPA index runs from May 27,
1986 to October 16, 2000. The returns are defined as yt = log xt − log xt−1 ,
where xt is the index price at time t.
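Computing such log returns is a one-liner; the price values below are illustrative, not actual index data.

```python
import numpy as np

# hypothetical index levels x_t standing in for actual S&P500/BOVESPA data
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9])

# y_t = log x_t - log x_{t-1}
returns = np.diff(np.log(prices))
```

With n price observations this yields n − 1 returns, which form the series fed to the volatility estimators.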
In part (a) of Figure 15 the S&P500 returns are plotted, followed by their
historical conditional volatility estimates in parts (b)-(d). In part (b) the
LSD volatility estimates are reported, whereas in (c) and (d) LAD estimates
are shown (in the latter case based on the logarithms of the squared residuals).
It looks as though the LAD estimates are too low for extreme past returns
(outliers); the sparseness of points in that region seems to have a rather
strong effect on the LAD volatility estimates.
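The kind of LAD fit underlying these estimates can be sketched in a simplified local-constant form: at each grid point, the kernel-weighted median of the squared demeaned returns given the lagged return, back-transformed by a square root. The function names, the Gaussian kernel, and the direct back-transformation are illustrative choices here, not the paper's exact implementation:

```python
import numpy as np

def weighted_median(z, w):
    """Minimizer of sum_i w_i * |z_i - a| over a: a weighted median of z."""
    order = np.argsort(z)
    z, w = z[order], w[order]
    cum = np.cumsum(w)
    return z[np.searchsorted(cum, 0.5 * cum[-1])]

def lad_volatility(y, grid, h):
    """Local-constant LAD volatility sketch: at each grid point x, take the
    Gaussian-kernel-weighted median of the squared (crudely demeaned)
    returns whose lagged return is near x, then square-root it."""
    y = np.asarray(y, dtype=float)
    x_lag = y[:-1]                   # conditioning variable y_{t-1}
    z = (y[1:] - y[1:].mean()) ** 2  # squared demeaned returns
    est = []
    for x in grid:
        w = np.exp(-0.5 * ((x_lag - x) / h) ** 2)  # kernel weights K_h
        est.append(np.sqrt(weighted_median(z, w)))
    return np.array(est)
```

Note that under a given error distribution the conditional median of the squared residuals differs from the conditional variance by a constant factor, so in practice a distribution-dependent correction would be applied to the raw median.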
Figure 16 shows the conditional volatility estimates over the grid [-0.02, 0.02].
Part (a) represents the LAD estimates, (b) the LSD estimates, and (c) the
LAD estimates computed from the logarithms of the residuals. All three
estimators reflect the asymmetry of financial time series, often commented
on in this work. Whereas the LSD estimates are smoother than the LAD
estimates at the negative end of the grid, the LAD estimates seem slightly
more robust at the positive end. Also note that the LAD and log-based LAD
estimates capture similar levels of volatility.
Turning to the BOVESPA returns, they are plotted in Figure 17, along
with their historical volatility estimates for the same three estimators
as for the S&P500 returns. Interestingly, this time both the LAD and the
LSD estimates, shown in parts (b) to (d), seem to provide low values for
extreme returns across time. Also, apart from measuring volatility on
different scales, these estimates show similar results.
Figure 18 plots the conditional volatility estimates of the BOVESPA returns
over the grid [-0.15, 0.15]. All three estimators seem to do a good job of
reflecting the asymmetry of volatility, except at the extremes of the grid,
where the sparseness of data points strongly disturbs the resulting estimates.
As a final remark, recall that the conditional volatility measures in
Figures 15 and 17 can be used to construct confidence intervals in risk
analysis; that illustration is not pursued here, however.
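A symmetric interval for the next return can be formed directly from an estimated conditional standard deviation. The Gaussian quantile used below is purely illustrative: the heavy-tailed errors that motivate the LAD estimator would call for wider, distribution-specific quantiles.

```python
from statistics import NormalDist

def return_interval(sigma_t, mu_t=0.0, level=0.95):
    """Symmetric interval mu_t +/- z * sigma_t for the next return,
    assuming Gaussian standardized errors (illustration only)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mu_t - z * sigma_t, mu_t + z * sigma_t

# e.g. with an estimated conditional std. dev. of 0.01
lo, hi = return_interval(0.01)
```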
[Time-series panels omitted: (a) S&P500 returns; (b) LSD estimates of conditional std.dev.; (c) LAD estimates of conditional std.dev.; (d) LAD estimates of conditional std.dev. of logs; dates 26/10/88 to 26/10/97 on the x axes.]
Figure 15: S&P500 returns and historical volatility estimates.
[Curve panels omitted: (a) LAD est. of cond. volatility; (b) LSD est. of cond. volatility; (c) LAD est. of cond. volatility: logs; returns on the x axes over [-0.02, 0.02].]
Figure 16: S&P500 volatility estimates.
[Time-series panels omitted: (a) BOVESPA returns; (b) LSD estimates of conditional std.dev.; (c) LAD estimates of conditional std.dev.; (d) LAD estimates of conditional std.dev. of logs; dates from 28/05/86 onward on the x axes.]
Figure 17: BOVESPA returns and historical volatility estimates.
[Curve panels omitted: (a) LAD est. of cond. volatility; (b) LSD est. of cond. volatility; (c) LAD est. of cond. volatility: logs; returns on the x axes over [-0.15, 0.15].]
Figure 18: BOVESPA volatility estimates.
References
[1] D. Bosq. Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New York, 1998.
[2] A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data
Analysis. Oxford University Press, New York, 1997.
[3] Z. Cai and X. Zhu. Nonparametric quantile estimations for dynamic
smooth coefficient models. Technical report, Department of Mathematics and Statistics, University of North Carolina, 2004.
[4] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications.
Chapman and Hall, London, 1996.
[5] J. Fan and P. Hall. On curve estimation by minimizing mean absolute
deviation and its implications. Annals of Statistics, 22:867–885, 1994.
[6] J. Fan and Q. Yao. Efficient estimation of conditional variance functions
in stochastic regression. Biometrika, 85:645–660, 1998.
[7] P. Hall, L. Peng, and Q. Yao. Prediction and nonparametric estimation
for time series with heavy tails. Journal of Time Series Analysis, 23:313–
331, 2002.
[8] W. Härdle. Robust regression function estimation. Journal of Multivariate Analysis, 14:169–180, 1984.
[9] W. Härdle and T. Gasser. Robust nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 46:42–51, 1984.
[10] R. Koenker and Z. Xiao. Unit root quantile autoregression inference.
Journal of the American Statistical Association, 99:775–787, 2004.
[11] D. Pollard. Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7:186–199, 1991.
[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery.
Numerical Recipes in C. Cambridge University Press, Cambridge, 1992.
[13] B. W. Silverman. Density Estimation for Statistics and Data Analysis.
Chapman and Hall, London, 1986.
[14] J. S. Simonoff. Smoothing Methods in Statistics. Springer-Verlag, New
York, 1996.
[15] Y. K. Truong. Asymptotic properties of kernel estimators based on local
medians. Annals of Statistics, 17:606–617, 1989.
[16] Y. K. Truong and C. J. Stone. Nonparametric function estimation involving time series. Annals of Statistics, 20:77–97, 1992.
[17] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman and Hall,
London, 1995.
[18] F. T. Wang and D. W. Scott. The L1 method for robust nonparametric
regression. Journal of the American Statistical Association, 89:65–76,
1994.
[19] A. H. Welsh. Robust estimation of smooth regression and spread functions and their derivatives. Statistica Sinica, 6:347–366, 1996.
[20] Q. Yao and H. Tong. Asymmetric least squares regression estimation:
a nonparametric approach. Nonparametric Statistics, 6:273–292, 1996.
[21] F. A. Ziegelmann. Nonparametric estimation of volatility functions: the
local exponential estimator. Econometric Theory, 18:985–991, 2002.