A Nonparametric Least-Absolute-Deviations Estimator of Volatility Functions

Flavio A. Ziegelmann∗
Universidade Federal do Rio Grande do Sul (Brazil)
[email protected]

July 2005

Abstract

An important empirical characteristic of financial time series is that the unconditional distribution of the returns tends to possess heavy tails. This motivates the particular local kernel volatility estimator proposed in this work. Whereas least-square-deviations (LSD) estimators are strongly affected by heavy-tailed distributions, least-absolute-deviations (LAD) estimators are not. This robustness to heavy tails is reflected in the more flexible assumptions made on the distributional moments of the observable variable. The simulation examples also highlight the superior performance of the LAD estimator compared to the LSD estimator under heavy-tailed conditions. The full nonparametric model is described and the asymptotic properties of the LAD estimator are derived. Extensive Monte Carlo studies strongly suggest that the LAD estimator is asymptotically adaptive to the unknown conditional first moment. The LAD estimator is also used to estimate the volatility of the S&P500 and the BOVESPA returns.

∗ Supported by a grant from CAPES (Brazil)

1 Introduction

Empirical evidence has suggested that returns of financial time series possess heavy-tailed marginal distributions. Therefore, these returns may not have finite higher moments, strongly affecting or even preventing the use of least-square-deviations (LSD) based estimators. In a nonparametric context, kernel smoothing techniques provide estimators with nice asymptotic properties. Silverman (1986), Wand and Jones (1995) and Fan and Gijbels (1996) give a comprehensive coverage of these techniques, while Simonoff (1996) and Bowman and Azzalini (1997) present a more practical approach. In this context, the inadequacy of local LSD methods has led researchers to search for more robust alternatives.
Some of these robust methods have their roots in the approaches suggested by Härdle (1984) and Härdle and Gasser (1984). Truong (1989) and Fan and Hall (1994) study local median smoothing for independent data. Truong and Stone (1992) address robust nonparametric regression for time series data. Wang and Scott (1994) develop L1 methods for robust nonparametric regression. Welsh (1996) and, more recently, Cai and Zhu (2004) and Koenker and Xiao (2004) describe robust quantile methods. In this article, we present the residual local linear least-absolute-deviations (LAD) estimator of the conditional volatility (defined here as a conditional median of the squared residuals rather than the conditional variance) to accommodate the presence of heavy-tailed distributions in financial time series. Hall, Peng and Yao (1999) use the LAD estimator to estimate the conditional median function of the response variable Yt in a time series setting. In terms of estimation, we follow a two-step residual method as in Fan and Yao (1998) and Ziegelmann (2002): after estimating the conditional mean (or median) in a first step, we estimate the conditional volatility in a second step, using the squared residuals from the first-step fit as the new response variable. This paper is organised as follows. In Section 2, we present our model and describe the LAD estimator of the conditional volatility function. In Section 3, its theoretical asymptotic properties are established. Section 4 implements three simulation studies for three different models: a zero-mean ARCH(1) model, a non-zero-mean threshold (or asymmetric) ARCH(1) model, and a non-zero-mean nonlinear time series model analysed previously in Ziegelmann (2002). All examples show a superior performance of the LAD over the LSD estimator when the fourth conditional moment of Y given X does not exist.
Moreover, extensive Monte Carlo analysis of the second and third numerical examples strongly suggests that the LAD volatility estimator is asymptotically adaptive to the unknown conditional location quantity (either mean or median) estimated at the first step. In the fourth and final section we apply the LAD estimator to S&P500 and BOVESPA index returns, comparing its performance to the LSD estimator.

2 The Model and the LAD Volatility Estimator

In this model the volatility function is the conditional median of the squared residuals, rather than the traditional conditional mean. In the first step we model the conditional mean of Y; the alternative modelling of its conditional median is addressed later. Let {(Y_t, X_t)} be a two-dimensional strictly stationary process, having the same marginal distribution as (Y, X). Let m(x) = E(Y | X = x) and τ(x) = M{[Y − m(x)]² | X = x} > 0, where M{Y | X} denotes the conditional median of Y given X. Then, we write our model as

    Y_t = m(X_t) + τ^{1/2}(X_t) ε_t ,    (1)

where E(ε_t | X_t) = 0 and M(ε_t² | X_t) = 1. The local linear LSD estimator of m(x) is denoted by m̂(x), with the explicit form

    m̂(x) = T^{-1} \sum_{t=1}^{T} \frac{[ŝ_2(x) − ŝ_1(x)(X_t − x)] K_h(X_t − x) Y_t}{ŝ_2(x) ŝ_0(x) − ŝ_1(x)²} ,    (2)

where the kernel K(·) is a symmetric density function on ℝ, K_h(u) = (1/h) K(u/h), h > 0 is the bandwidth (or smoothing parameter), and

    ŝ_r(x) = T^{-1} \sum_{t=1}^{T} (X_t − x)^r K_h(X_t − x) .

Our main interest in this paper is to estimate the conditional volatility function τ(·) in a second step, using the residual estimator. From model (1) we see that

    r_t² = (Y_t − m(X_t))² = τ(X_t) ε_t² .    (3)

Taking the conditional median of the squared residuals in equation (3), we obtain M(r_t² | X_t) = τ(X_t). This suggests estimating τ(x) from a LAD regression of the estimated squared residuals r̂_t² against X_t.
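As a concrete illustration, the first-stage estimator (2) amounts to a kernel-weighted least-squares fit. The following is a minimal Python sketch under the assumption of a Gaussian kernel; the function name is ours, not the paper's:

```python
import numpy as np

def local_linear_mean(x0, X, Y, h):
    """Local linear (LSD) estimate of m(x0) = E[Y | X = x0], as in (2)."""
    u = X - x0
    w = np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2.0 * np.pi))  # K_h(X_t - x0)
    s0, s1, s2 = np.mean(w), np.mean(w * u), np.mean(w * u ** 2)  # s_r(x0)
    # numerator averages the local-linear weights [s2 - s1 u] K_h(u) times Y
    return np.mean((s2 - s1 * u) * w * Y) / (s2 * s0 - s1 ** 2)

# local linear fits reproduce an exactly linear mean function
X = np.linspace(-1.0, 1.0, 201)
Y = 1.0 + 2.0 * X
print(local_linear_mean(0.3, X, Y, 0.2))  # 1 + 2*0.3 = 1.6 up to rounding
```

Because the weights in (2) reproduce linear functions exactly, the printed value equals 1.6 regardless of the bandwidth.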
Therefore, the estimator of the conditional volatility (in this case, the conditional median of [Y_t − m(X_t)]²) is τ̂(x) = γ̂_1, the solution for γ_1 in the following weighted least-absolute-deviations problem:

    (γ̂_1, γ̂_2) = \arg\min_{γ_1, γ_2} \sum_{t=1}^{T} |r̂_t² − γ_1 − γ_2 (X_t − x)| K_{h_2}(X_t − x) ,    (4)

where h_2 > 0 is the second-step bandwidth and K is a symmetric probability density.

3 Asymptotic Properties

To assess the theoretical asymptotic properties of the LAD volatility estimator, we first introduce some notation. Let f(·) denote the marginal density of X_t, let g(·|x) be the conditional density function of Y_t given X_t = x, and let g_τ(x) = g{τ(x)|x}. Also, τ̇(x) = ∂τ(x)/∂x and τ̈(x) = ∂²τ(x)/∂x². Moreover, let κ_1 = \int K²(u) du, σ_0 = \int u² K(u) du, and σ²(x) = κ_1 / [4 f(x) g_τ²(x)]. Some regularity conditions are now enumerated:

(C1) For fixed x, f(x) > 0 and g(y|x) > 0 is continuous at y = τ(x). In a neighbourhood of x, τ(·) has two continuous derivatives, both f(·) and g(y|·) have continuous first derivatives, and E(|Y_t|^δ | X_t = ·) < ∞ for some δ ≥ 2.

(C2) The kernel K is a symmetric, compactly supported density satisfying |K(x_1) − K(x_2)| ≤ C|x_1 − x_2| for all x_1, x_2.

(C3) The process {(X_t, Y_t)} is strongly mixing, i.e.,

    α(v) ≡ \sup_u \sup_{A ∈ F_{u+v}^{∞}, B ∈ F_1^{u}} |P(A)P(B) − P(AB)| → 0  as v → ∞,

where F_u^v denotes the σ-field generated by {(X_t, Y_t) : u ≤ t ≤ v}. Further, \sum_{v ≥ 1} α(v)^{δ_0/(1+δ_0)} < ∞ for some δ_0 ∈ [0, 1).

(C4) As T → ∞, h_2 → 0 and \liminf_{T→∞} T h_2³ > 0.

We are now ready to state the following result:

Theorem 1. Assume the conditional mean function m(·) is known. Then, if the regularity conditions (C1)-(C4) hold, as T → ∞,

    (T h_2)^{1/2} {τ̂(x) − τ(x) − (1/2) h_2² σ_0 τ̈(x)}    (5)

is asymptotically normal with mean 0 and variance σ²(x).

Proof: We need to establish the asymptotic properties of τ̂(x) = γ̂_1, the solution for γ_1 in (4).
The idea of the proof is as follows: first we construct a new quadratic function whose minimisers θ̂_1 and θ̂_2 are one-to-one functions of the minimisers γ̂_1 and γ̂_2 in (4). Then, we establish the asymptotic properties of θ̂_1 and θ̂_2. Finally, we prove that our original estimator γ̂_1 is close enough to the new minimiser θ̂_1, so that it shares the same asymptotic properties.

Let K_t = K{(X_t − x)/h}, Z_t = (1, (X_t − x)/h)′, r_t^{2∗} = r_t² − τ(x) − τ̇(x)(X_t − x) and θ̂ = (Th)^{1/2} (τ̂(x) − τ(x), h{τ̇̂(x) − τ̇(x)})′. For θ = (θ_1, θ_2)′ ∈ ℝ², define

    G(θ) = \sum_{t=1}^{T} {|r_t^{2∗} − θ′Z_t/(Th)^{1/2}| − |r_t^{2∗}|} K_t ,    (6)

so that θ̂ is the minimiser of G(θ). Hence, once we establish the asymptotic properties of θ̂_1, those of τ̂(x) = γ̂_1 follow. Now let d_t = θ′Z_t/(Th)^{1/2} (assuming d_t ≥ 0, without loss of generality) and consider the following decomposition:

    {|r_t^{2∗} − d_t| − |r_t^{2∗}|} K_t
      = [r_t^{2∗} {I(r_t^{2∗} > d_t) − I(r_t^{2∗} ≤ d_t) − I(r_t^{2∗} > 0) + I(r_t^{2∗} ≤ 0)}
        + d_t {I(r_t^{2∗} ≤ d_t) − I(r_t^{2∗} > d_t)}] K_t
      = −d_t K_t D(r_t^{2∗}) + 2 K_t (d_t − r_t^{2∗}) I(0 ≤ d_t − r_t^{2∗} < d_t) ,    (7)

where D(y) = I(y > 0) − I(y < 0). Then, writing

    S_t = [|r_t^{2∗} − θ′Z_t/(Th)^{1/2}| − |r_t^{2∗}| + d_t D(r_t^{2∗})] K_t ,    (8)

we can use result (7) to obtain (after some calculations, and noting that (C4) implies d_t → 0)

    E(S_t) = T^{-1} f(x) g_τ(x) (θ_1² + θ_2² σ_0) + o(T^{-1}) .    (9)

Now, since

    \sum_{t=1}^{T} S_t = G(θ) + \sum_{t=1}^{T} d_t K_t D(r_t^{2∗}) ,    (10)

we introduce the new function

    R(θ) = \sum_{t=1}^{T} S_t − f(x) g_τ(x) (θ_1² + θ_2² σ_0)
         = G(θ) + \sum_{t=1}^{T} d_t K_t D(r_t^{2∗}) − f(x) g_τ(x) (θ_1² + θ_2² σ_0) ,    (11)

which is quadratic in θ_1 and θ_2. The next step is to prove that R(θ) → 0 in probability. Firstly, in view of equation (9), it is straightforward to see that

    E(R(θ)) → 0 .    (12)

Therefore, we only need to show that \sum_{t=1}^{T} [S_t − E(S_t)] → 0 in probability.
Since E(\sum_{t=1}^{T} [S_t − E(S_t)]) = 0, applying Chebyshev's inequality we have

    P{ |\sum_{t=1}^{T} [S_t − E(S_t)]| > ε } ≤ (1/ε²) Var(\sum_{t=1}^{T} S_t)
        ≤ (1/ε²) { \sum_{t=1}^{T} E(S_t²) + 2 \sum_{t=1}^{T−1} (T − t) Cov(S_1, S_{t+1}) } .    (13)

To tackle the first sum on the right-hand side of inequality (13), note that ||a + b| − |a| − D(a)b| ≤ 2|b| I(|a| < |b|), that K(·) has compact support, and that g_τ(·) > 0. Hence, it follows (after some algebra) that for any γ ≥ 0,

    E(S_t^{2(1+γ)}) ≤ [2 / (Th)^{1+γ}] E{ (θ′Z_t)^{2(1+γ)} K_t^{2(1+γ)} I(|r_t^{2∗}| < C/(Th)^{1/2}) }
                   = O(T^{−(3/2+γ)} h^{−(1/2+γ)}) .    (14)

Thus, using the previous relation and Corollary 1.1 of Bosq (1998, p. 21) for the second sum on the right-hand side of (13), we have that

    P{ |\sum_{t=1}^{T} [S_t − E(S_t)]| > ε } = O{ (Th)^{−1/2} + T^{−1/[2(1+δ_0)]} h^{−(1+2δ_0)/[2(1+δ_0)]} } ,    (15)

which converges to 0, since the second term on the right-hand side is of smaller order than (Th³)^{−1/[2(1+δ_0)]} (see condition (C4)). From this, we can also conclude that

    G(θ) + \sum_{t=1}^{T} d_t K_t D(r_t^{2∗}) → f(x) g_τ(x) (θ_1² + θ_2² σ_0)  in probability.

In addition, using the same arguments as Pollard (1991, p. 193) and Yao and Tong (1996, p. 289), we may show that the difference between the θ that minimises G(θ) and the minimiser of

    − \sum_{t=1}^{T} d_t K_t D(r_t^{2∗}) + f(x) g_τ(x) (θ_1² + θ_2² σ_0)

converges to 0 in probability. Differentiating the previous expression with respect to θ_1, we obtain

    \sqrt{Th} {τ̂(x) − τ(x)} = [1 / (2 \sqrt{Th} f(x) g_τ(x))] \sum_{t=1}^{T} D(r_t^{2∗}) K_t + o_p(1) .    (16)

Then, after some more algebra, we have

    E{D(r_t^{2∗}) K_t} = σ_0 h³ f(x) g_τ(x) τ̈(x) + o(h³)    (17)

and

    E{D(r_t^{2∗}) K_t}² = f(x) h \int K² + o(1) .    (18)

The asymptotic normality follows from Theorem 1 of Doukhan (1994, page 46). Finally, putting results (16), (17) and (18) together, the proof is finished.

Discussion of the regularity conditions: In (C1), in the case of first-stage LAD estimation, the condition δ ≥ 2 can be extended to δ > 0, consequently allowing E(Y | X = x) = ∞.
For this, we can substitute Y by a truncated variable Y⁻ (with finite expectation), and show that certain quantities involving Y are close enough to the same quantities with Y replaced by Y⁻, for 0 < δ < 1. (C2) can be relaxed in the sense that kernels with non-compact support but light tails are admitted, in particular the Gaussian kernel. Also note that the strong mixing condition in (C3) is a mild one.

Unknown conditional mean: Theorem 1 is proved for a known m(·), for technical simplicity. In our second and third numerical examples, though, the conditional mean is estimated from the data, and a subsequent analysis of the adaptiveness of the LAD estimator to the unknown conditional mean (or median) is carried out.

Estimator of M{Y_t | X_t}: If we consider using LAD estimators in both stages of our estimation procedure, i.e., to estimate both the location and the volatility functions, a slight alteration in our model is necessary. As before, let {(Y_t, X_t)} be a two-dimensional strictly stationary process, having the same marginal distribution as (Y, X). Now let m∗(x) = M(Y | X = x) and τ(x) = M{[Y − m∗(x)]² | X = x} > 0, where M{Y | X} is the conditional median of Y given X. Then, we can write our model as

    Y_t = m∗(X_t) + τ^{1/2}(X_t) ε_t ,    (19)

where M(ε_t | X_t) = 0 and M(ε_t² | X_t) = 1. Hence, the properties of the LAD estimator of m∗(·) are equivalent to those of the LAD estimator of τ(·) discussed previously, although τ(·) itself differs from the previous setting.

4 Numerical Implementations

We present three simulation studies. The first example does not involve the two-stage estimation procedure, since it is a zero-mean ARCH(1) model where the true mean is assumed to be known. On the other hand, the second and third examples do involve the two-stage procedure, leading to the residual volatility estimator. In all three examples we simulate from both non-heavy- and heavy-tailed distributions.
In the latter case the error distributions are chosen so that the fourth conditional moment of the observable variable does not exist. To perform the practical estimation of the LAD estimator we use the downhill simplex algorithm (see Section 10.4 in Press, Teukolsky, Vetterling and Flannery, 1992), since there is no closed-form analytical expression for this estimator. The cross-validation method is used to select the bandwidth automatically, and the Gaussian kernel is adopted.

Example 1: We simulate from the ARCH(1) model specified below:

    y_t = σ_t ε_t ,    (20)
    σ_t² = 0.2 + 0.5 y_{t−1}² .    (21)

We examine two cases: ε_t ∼ N(0, 1) and ε_t ∼ t(3). For each case, each simulated series has 2000 observations and we replicate 100 of these series. For the estimation procedures we consider the true conditional mean as known. We estimate the conditional volatility at time t for each of the 41 equidistant points Y(t−1) = y in the grid [−2, 2]. For each generated series we compute the mean absolute deviations, given by

    δ_i = (1/41) \sum_{j=1}^{41} |σ̂²(x_j) − σ²(x_j)| ,  i = 1, 2, …, 100 ,

where x_j is the jth grid point. Figure 1 shows the true conditional variance function, its estimates and two-standard-deviation limits. Comparing part (a) to (b), it is clear that in the case of Gaussian errors the LSD estimates have smaller variance and seem to have smaller bias as well, at least near the boundaries. It is also noticeable that in the case of heavy tails, shown in parts (c) and (d), the LAD estimates exhibit much smaller variance, with less bias in the centre but larger bias in the tails (although the latter might result from oversmoothing). Recall that the variance of the LSD estimator is not defined in the t-distribution case.
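The setup of Example 1 and the second-stage LAD fit (4) can be sketched as follows. This is our own minimal version: SciPy's Nelder-Mead routine stands in for the downhill simplex implementation of Press et al., the kernel is Gaussian, the bandwidth is fixed rather than cross-validated, and the seed and function names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def simulate_arch1(T, dist="normal", burn=100):
    """y_t = sigma_t eps_t with sigma_t^2 = 0.2 + 0.5 y_{t-1}^2, as in (20)-(21)."""
    y = np.zeros(T + burn)
    for t in range(1, T + burn):
        sig2 = 0.2 + 0.5 * y[t - 1] ** 2
        eps = rng.standard_t(3) if dist == "t" else rng.standard_normal()
        y[t] = np.sqrt(sig2) * eps
    return y[burn:]                                # drop burn-in

def lad_volatility(x0, X, r2, h2):
    """Solve the weighted LAD problem (4) at x0; returns gamma_1 = tau_hat(x0)."""
    u = X - x0
    w = np.exp(-0.5 * (u / h2) ** 2)               # Gaussian kernel weights
    def objective(g):                              # sum |r2 - g1 - g2 u| K_h2(u)
        return np.sum(np.abs(r2 - g[0] - g[1] * u) * w)
    start = np.array([np.median(r2), 0.0])         # robust starting point
    return minimize(objective, start, method="Nelder-Mead").x[0]

y = simulate_arch1(2000, dist="t")
X, r2 = y[:-1], y[1:] ** 2                         # true mean is zero here
tau_hat = lad_volatility(0.0, X, r2, 0.3)          # cond. median of y_t^2 at y_{t-1} = 0
```

Note that, as the paper stresses, tau_hat estimates the conditional median of the squared series, not the conditional variance; for non-normalised errors the two differ by the median of ε².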
Figure 1: Example 1. True conditional variance, its estimates and two-standard-deviation confidence intervals. (Panels: (a) LSD volatility estimates, normal error; (b) LAD volatility estimates, normal error; (c) LSD volatility estimates, t error; (d) LAD volatility estimates, t error; Variance(t) against Y(t−1).)

In Figure 2, we can observe the box plots of the mean absolute deviations, comparing LSD and LAD estimates in both cases: Gaussian errors in part (a) and Student errors in part (b). Our insights from the previous figure are confirmed from a global point of view. Remarkably superior performance in terms of both bias and variance is obtained by the LAD estimates when heavy-tailed errors are present, whereas, as expected, LSD performs better under Gaussian errors.

Figure 2: Example 1. Mean absolute deviations of LSD and LAD volatility estimates. (Panels: (a) box plots, normal error; (b) box plots, t error.)

Example 2: The second example is an extension of the preceding one, since it retains the ARCH structure of the conditional variance but adds more complexity to the model. The model is now defined as

    y_t = 0.2 y_{t−1} + σ_t ε_t ,    (22)
    σ_t² = 0.2 + [0.5 − 0.3 I(y_{t−1} > 0)] y_{t−1}² ,    (23)

which can be seen as a non-zero-mean threshold (or asymmetric) ARCH(1) model. The threshold in the variance equation incorporates into the model the asymmetry observed in financial data, i.e., negative past returns lead to more variability than positive ones of the same absolute value. The grid is [−1.8, 1.8], with 36 equidistant points. We run 80 repetitions, each with 1000 observations. The same error distributions as in Example 1 are used, and the same mean absolute deviations are investigated. Figure 3 shows the true conditional mean (equal to the conditional median in this case) and its respective estimates.
LSD and LAD estimates are provided, for Gaussian and Student errors.

Figure 3: Example 2. LSD and LAD estimates of the conditional mean for Gaussian and Student error distributions. (Panels: (a) LSD estimates, normal error; (b) LAD estimates, normal error; (c) LSD estimates, t error; (d) LAD estimates, t error.)

Figure 4 plots the true volatility function, as well as its estimates and two-standard-deviation confidence intervals. Numerical difficulties in obtaining LSD estimates for the Student error distribution were faced in this implementation, especially near the discontinuity point and on the boundaries. Whereas comparing part (a) to (b) seems to reveal a slightly better performance of the LSD estimator, a comparison between parts (c) and (d) uncovers a huge superiority of the LAD estimator over the LSD. This underscores the robustness of the LAD volatility estimator against heavy-tailed distributions. As in Example 1, the global picture given by the box plots in Figure 5 indicates better results for the LAD estimator in the case of Student-distributed errors, while the LSD estimator is superior in the Gaussian case.

Figure 4: Example 2. True conditional variance, its estimates and two-standard-deviation limits. (Panels: (a) LSD estimates, normal error; (b) LAD estimates, normal error; (c) LSD estimates, t error; (d) LAD estimates, t error.)

Figure 5: Example 2. Mean absolute deviations of LSD and LAD volatility estimates. (Panels: (a) box plots, normal error; (b) box plots, t error.)
An important question to be answered is whether the LAD volatility estimator is adaptive to the unknown conditional mean (or median). In order to address this issue numerically, we compare the oracle volatility estimates (those computed using the true residuals in the second stage) to those obtained from the residual estimator. We simulate observations for both error distributions, Gaussian and Student. We also carry out Monte Carlo simulations for two possible scenarios: LSD and LAD conditional mean (median) estimates at the first stage. Only one repetition for each sample size is performed, with sample sizes of 100, 500, 1000 and 2000 observations. The scatter plots between the series of estimates for the two estimators are reported in Figure 6 for the Normal case. The plots on the left-hand side are based on a LSD first-stage mean estimator, whereas those on the right-hand side are based on a LAD median estimator (median and mean are equal in this case). Both sides suggest that as the sample size increases so does the correlation between oracle and residual estimates.
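The core of this oracle-versus-residual comparison can be sketched in a few lines. The sketch below is our own reduced version under Gaussian errors: it simulates the threshold ARCH(1) model (22)-(23), fits the first-stage mean by local linear least squares, and then compares only the two second-stage response series (residual-based versus oracle squared residuals); the names, seed and bandwidth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def local_linear_fit(x0, X, Y, h):
    """First-stage local linear (LSD) estimate of E[Y | X = x0]."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)          # Gaussian kernel weights
    A = np.column_stack([np.ones_like(X), X - x0])  # local linear design
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], Y * sw, rcond=None)
    return beta[0]                                  # intercept = m_hat(x0)

# simulate the threshold ARCH(1) model (22)-(23) with Gaussian errors
T = 1000
y = np.zeros(T)
for t in range(1, T):
    sig2 = 0.2 + (0.5 - 0.3 * (y[t - 1] > 0)) * y[t - 1] ** 2
    y[t] = 0.2 * y[t - 1] + np.sqrt(sig2) * rng.standard_normal()

X, Y = y[:-1], y[1:]
m_hat = np.array([local_linear_fit(x, X, Y, 0.3) for x in X])
r2_residual = (Y - m_hat) ** 2      # second-stage response, mean estimated
r2_oracle = (Y - 0.2 * X) ** 2      # second-stage response, true mean known
corr = np.corrcoef(r2_residual, r2_oracle)[0, 1]
```

With the first-stage mean well estimated, the two response series are nearly identical, which is the mechanism behind the adaptiveness suggested by Figures 6 and 7.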
Figure 6: Example 2, Gaussian errors. Residual and oracle volatility estimates on the y and x axes, respectively. (Panels (a)-(h): scatter plots for T = 100, 500, 1000, 2000, with the LSD mean first stage on the left and the LAD median first stage on the right.)

This evidence is much stronger, however, for the Student errors case, represented in Figure 7, where an almost perfect line is reached at a sample size of 2000. There is also a suggestion that using a LAD median estimator in the first stage leads to more adaptiveness.
Figure 7: Example 2, Student errors. Residual and oracle volatility estimates on the y and x axes, respectively. (Panels (a)-(h): scatter plots for T = 100, 500, 1000, 2000, with the LSD mean first stage on the left and the LAD median first stage on the right.)

Finally in this example, Figure 8 displays a scatter plot between oracle and residual volatility estimates, this time based on the average of 80 repetitions of 1000 observations each. The virtually perfect line indicates that, on average, the residual estimator is fully adaptive to the unknown conditional median, for a sample size of 1000 and the Student error distribution.

Figure 8: Example 2. Residual and oracle volatility estimates based on an 80-repetition average, with LAD median estimator and sample size 1000.
Example 3: In this third example we consider the model

    X_{t+1} = 0.235 X_t (16 − X_t) + 0.3 ε_t ,    (24)

where ε_t ∼ i.i.d. N(0, 1) for one set of simulations and ε_t ∼ i.i.d. t(5) for the other. Each simulated series has 500 observations, and we generate 200 of them. We estimate the conditional mean and volatility for the two-step-ahead prediction, because the conditional volatility is constant in the one-step-ahead case. In this example, we have 131 equidistant grid points in the interval [2, 15]. Figure 9 shows the true conditional mean functions, their LSD estimates and two-standard-deviation confidence limits. Nice results are obtained for the LAD volatility estimator in parts (b) and (d) of Figure 10, whereas in parts (a) and (c) (especially in (c)) the LSD estimator seems to fail to reproduce the central area of the curve. However, different volatility quantities are measured by these two estimators, and for further comparisons we refer to Figure 11, where the box plots of the mean absolute deviations of the square root of the volatility are shown. There we can confirm the superior performance of the LAD estimator in the Student case, while the two estimators perform very similarly in the Gaussian case (at least the bias results look very similar, although the LSD estimator performs slightly better in terms of variance).

Figure 9: Example 3. LSD estimates of the conditional mean. Results for Gaussian and Student errors are displayed. (Panels: (a) normal error; (b) t error; Mean(t) against Y(t−2).)
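The data-generating process (24) can be simulated along the following lines (a short sketch with our own function name; the seed and starting value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_example3(T, dist="normal", x0=8.0):
    """X_{t+1} = 0.235 X_t (16 - X_t) + 0.3 eps_t, as in model (24)."""
    x = np.empty(T)
    x[0] = x0
    for t in range(T - 1):
        eps = rng.standard_t(5) if dist == "t" else rng.standard_normal()
        x[t + 1] = 0.235 * x[t] * (16.0 - x[t]) + 0.3 * eps
    return x

x = simulate_example3(500)
```

The deterministic part of (24) is a logistic-type map, so the simulated paths move over roughly the interval [2, 15] used for the estimation grid.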
Figure 10: Example 3. True conditional variance, its estimates and two-standard-deviation limits. (Panels: (a) LSD estimates, normal error; (b) LAD estimates, normal error; (c) LSD estimates, t error; (d) LAD estimates, t error; Volatility(t) against Y(t−2).)

Figure 12 shows scatter plots between residual and oracle estimates, in the same way as in Example 2. Again, only one repetition is performed for sample sizes varying from 100 to 2000. The error distribution is Normal and, whether LSD or LAD estimates are used in the first stage, the scatter plots show very good results in terms of adaptiveness, apart from the extreme right-hand side, i.e., large estimated volatility values. Even a sample size of 500 observations already performs well. From Figure 13, we can see that almost the same conclusion can be drawn for the Student error case. However, unlike Example 2, there is a bit more variability present now, when compared to the Normal error distribution. But again, the straight line starts taking shape as early as sample size 500 (apart from the extreme volatility estimates), suggesting that the residual and oracle estimators are estimating very similar conditional volatilities.

Figure 11: Example 3. Mean absolute deviations of the LSD and LAD volatility estimates. (Panels: (a) box plots, normal error; (b) box plots, t error.)

As a last indication of adaptiveness, Figure 14 presents one more scatter plot between residual and oracle volatility estimates. These are based on an average of 200 repetitions with sample sizes equal to 500. The LAD estimator of the conditional median is applied in the first stage, and the error distribution is Student.
Figure 12: Example 3, Normal errors.
Residual and oracle volatility estimates on the y and x axes, respectively. (Panels (a)-(h): T = 100, 500, 1000, 2000; LSD mean first stage on the left, LAD median first stage on the right.)

Figure 13: Example 3, Student errors.
Residual and oracle volatility estimates on the y and x axes, respectively. (Panels (a)-(h): T = 100, 500, 1000, 2000; LSD mean first stage on the left, LAD median first stage on the right.)

Figure 14: Example 3. Residual and oracle volatility estimates based on a 200-repetition average, with LAD median estimator and sample size 500.

5 Analysis of S&P500 and BOVESPA Returns

The LAD volatility estimator is now applied to the S&P500 and BOVESPA index returns. The S&P500 index is evaluated over the period from October 25, 1988 to April 8, 1998, whereas the BOVESPA index runs from May 27, 1986 to October 16, 2000. The returns are defined as y_t = log x_t − log x_{t−1}, where x_t is the index price at time t. In part (a) of Figure 15 the S&P500 returns are plotted, followed by their historical conditional volatility estimates in parts (b)-(d). In part (b), the LSD volatility estimates are reported, whereas in (c) and (d) LAD estimates are shown (in the latter case based on the logarithms of the squared residuals). It looks as though the LAD estimates are too low for extreme past returns (outliers). The sparseness of points in that area seems to have a rather strong effect on the LAD volatility estimates. Figure 16 shows the conditional volatility estimates over the grid [−0.02, 0.02]. Part (a) represents the LAD estimates, (b) the LSD estimates, and (c) the LAD estimates computed using the logarithms of the residuals. All three estimators reflect the asymmetry of financial time series, often commented on in this work. Whereas the LSD estimates are smoother than the LAD at the negative end of the grid, the LAD estimates seem slightly more robust at the positive end. Also note that the LAD and log-based LAD estimates capture similar levels of volatility.
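For concreteness, the return transformation used above is a one-liner (a sketch; `prices` stands for any positive price series such as the index levels x_t):

```python
import numpy as np

def log_returns(prices):
    """y_t = log x_t - log x_{t-1} for a positive price series x."""
    x = np.asarray(prices, dtype=float)
    return np.diff(np.log(x))

r = log_returns([100.0, 105.0, 102.9])  # two returns from three prices
```

The resulting series r is what feeds the first-stage mean fit and the second-stage residual volatility estimator.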
Turning to the BOVESPA returns, Figure 17 plots them along with their historical volatility estimates for the same three estimators as for the S&P500 returns. Interestingly, this time both the LAD and the LSD estimates, shown in parts (b) to (d), seem to provide low values for extreme returns across time. Moreover, apart from measuring different volatility scales, these estimates show similar results. Figure 18 plots the conditional volatility estimates of the BOVESPA returns over the grid [-0.15, 0.15]. All three estimators seem to do a good job of reflecting the asymmetry of volatility. The exception occurs at the extreme ends of the grid, where the sparseness of data points plays a central and disturbing role in the resulting estimates. As a final remark, the reader is reminded that the conditional volatility measures seen in Figures 15 and 17 can be used to construct confidence intervals in risk analysis. We do not pursue that illustration here, though.

Figure 15: S&P500 returns and historical volatility estimates. Panels: (a) S&P500 returns; (b) LSD estimates of the conditional std.dev.; (c) LAD estimates of the conditional std.dev.; (d) LAD estimates of the conditional std.dev. of logs.

Figure 16: S&P500 volatility estimates over the grid [-0.02, 0.02]. Panels: (a) LAD estimates of the conditional volatility; (b) LSD estimates; (c) LAD estimates based on logs.

Figure 17: BOVESPA returns and historical volatility estimates. Panels: (a) BOVESPA returns; (b) LSD estimates of the conditional std.dev.; (c) LAD estimates of the conditional std.dev.; (d) LAD estimates of the conditional std.dev. of logs.

Figure 18: BOVESPA volatility estimates over the grid [-0.15, 0.15]. Panels: (a) LAD estimates of the conditional volatility; (b) LSD estimates; (c) LAD estimates based on logs.

References

[1] D. Bosq. Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New York, 1998.

[2] A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis. Oxford University Press, New York, 1997.

[3] Z. Cai and X. Zhu. Nonparametric quantile estimations for dynamic smooth coefficient models. Technical report, Department of Mathematics and Statistics, University of North Carolina, 2004.

[4] J. Fan and I. Gijbels. Local Polynomial Modelling and Its Applications. Chapman and Hall, London, 1996.

[5] J. Fan and P. Hall. On curve estimation by minimizing mean absolute deviation and its implications. Annals of Statistics, 22:867–885, 1994.

[6] J. Fan and Q. Yao. Efficient estimation of conditional variance functions in stochastic regression. Biometrika, 85:645–660, 1998.

[7] P. Hall, L. Peng, and Q. Yao. Prediction and nonparametric estimation for time series with heavy tails. Journal of Time Series Analysis, 23:313–331, 2002.

[8] W. Härdle. Robust regression function estimation. Journal of Multivariate Analysis, 14:169–180, 1984.

[9] W. Härdle and T. Gasser. Robust nonparametric function fitting. Journal of the Royal Statistical Society, Series B, 46:42–51, 1984.

[10] R. Koenker and Z. Xiao. Unit root quantile autoregression inference. Journal of the American Statistical Association, 99:775–787, 2004.

[11] D. Pollard. Asymptotics for least absolute deviation regression estimators. Econometric Theory, 7:186–199, 1991.

[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. Cambridge University Press, Cambridge, 1992.

[13] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, London, 1986.

[14] J. S. Simonoff. Smoothing Methods in Statistics. Springer-Verlag, New York, 1996.

[15] Y. K. Truong. Asymptotic properties of kernel estimators based on local medians. Annals of Statistics, 17:606–617, 1989.

[16] Y. K. Truong and C. J. Stone. Nonparametric function estimation involving time series. Annals of Statistics, 20:77–97, 1992.

[17] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman and Hall, London, 1995.

[18] F. T. Wang and D. W. Scott. The L1 method for robust nonparametric regression. Journal of the American Statistical Association, 89:65–76, 1994.

[19] A. H. Welsh. Robust estimation of smooth regression and spread functions and their derivatives. Statistica Sinica, 6:347–366, 1996.

[20] Q. Yao and H. Tong. Asymmetric least squares regression estimation: a nonparametric approach. Nonparametric Statistics, 6:273–292, 1996.

[21] F. A. Ziegelmann. Nonparametric estimation of volatility functions: the local exponential estimator. Econometric Theory, 18:985–991, 2002.