
ESTIMATING CONDITIONAL VOLATILITY WITH NEAREST NEIGHBOR
PREDICTIONS
Acosta-González, Eduardo*
Universidad de Las Palmas de Gran Canaria
Fernández-Rodríguez, Fernando**
Universidad de Las Palmas de Gran Canaria
Pérez-Rodríguez, Jorge***
Universidad de Las Palmas de Gran Canaria
SUMMARY
We propose a new approach to measure volatility. This differs from GARCH family
models in that it is based on non-linear dynamical systems and non-parametric
regression. Volatility is defined as the risk of predictions in relation to a priori
information about the past Nearest Neighbors (NNs).
Our new measure of volatility is compared with GARCH in simulated and financial
time series. The out-of-sample forecasting results indicated that the GARCH-based
model forecasts, in most cases, were biased and exhibited no significant informational
content. In contrast, the NN-based model forecasts generally exhibited less bias, and in
most cases had significant information content.
JEL classification: C52; C53
Keywords: Nearest neighbor predictions; GARCH models
Universidad de Las Palmas de Gran Canaria
Campus de Tafira.
Facultad de CC. Económicas y Empresariales
35017 Las Palmas de Gran Canaria
Spain
(*) e-mail: [email protected]; Tel./Fax: +34 928 451 820 / +34 928 451 829
(**) e-mail: [email protected]; Tel./Fax: +34 928 451 802 / +34 928 451 829
(***) e-mail: [email protected]; Tel./Fax: +34 928 458 222 / +34 928 451 829
1. INTRODUCTION
Market risk has become one of the buzzwords of financial markets [1]. Two facts are
apparent: first, the role of uncertainty is central to much of modern finance theory, such
as the capital asset pricing model (CAPM), the consumption-based CAPM (C-CAPM), and
arbitrage pricing theory (APT), because there exists a feedback between risk and return.
For example, according to the CAPM, the risk premium is determined by the covariance
between the future return on the asset and the return on one or more benchmark portfolios. Theory
also suggests that the price of an asset is a function of its volatility, or risk. Second,
regulators, commercial and investment banks, and corporate and institutional investors
are increasingly focusing on measuring more precisely the level of market risk incurred
by their investors, and have long recognized that asset returns exhibit volatility
clustering [2]. Consequently, an understanding of how volatility evolves over time is
central to the decision making process.
In empirical finance, we tend to be less interested in the level of an asset price or
stock market index, since it is widely assumed that such time series can be described as
a random walk. However, recent work in finance has demonstrated that financial markets
are not perfectly characterized by random walk theory in the sense of the weak-form
efficient-market hypothesis (see Lo and MacKinlay, 1999), and that mean returns and
volatility can be estimated using non-linear models.
There is widespread agreement that the volatility of asset returns is, to some degree,
forecastable. Only in the last two decades, however, have measures and statistical models
been developed that can accommodate and account for this dependence and analyze
whether volatility is stable over time [3]. Traditionally, to estimate volatility using
historical time series analysis, most practitioners in the finance profession have relied
on a moving average with fixed weights for all observations across the measurement
sample. Volatility was defined as the standard deviation of changes over a specified
period. Most measurement samples also used the rule that the longer the forecast
horizon, the more historical data should be used. Typical studies of this nature are Black
(1976), French et al. (1987) and Schwert (1989, 1990); these estimate volatility by using the
sample standard deviation of stock price changes or moving averages of squared price
changes.
Traditional estimates are based on the assumption that volatility is constant over the
interval used to construct the series, although this is difficult to defend theoretically.
Moreover, the length of the interval greatly affects the measured persistence. Using a
simple moving average has not been very satisfactory, however, because of a unique
characteristic of the measure. Since all points in the sample have equal weights,
volatility tends to rise sharply when confronted with a shock but then declines just as
sharply once that particular observation falls out of the measurement sample. Using
simple moving averages thus creates measures of volatility which tend to look like
plateaus when charted.

[1] JP Morgan defines risk as the degree of uncertainty of future net returns. Many participants in the financial markets are subject to a wide variety of risks. A common classification of risks is based on the source of the underlying uncertainty: for example, credit risk (the potential loss due to the inability of the counterparty to meet its obligations), operational risk (resulting from errors that may be made in instructing payments or settling transactions), liquidity risk (reflected in the inability of a firm to fund its non-liquid assets) and market risk (the uncertainty of future earnings resulting from market conditions: asset prices, interest rates, volatility).
[2] In particular, volatility clustering implies that big surprises of either sign increase the probability of future volatility.
[3] The more stable the volatility, the more reliable the prediction of future volatility from past observations.
One way to avoid this problem is to use an exponential moving average where the
latest observations are assigned the greatest weight in estimating volatility. This
approach has two conceptual advantages, which are important to the practitioner. The
first is that the volatility estimate reacts faster to shocks to the markets, as recent data
carries more weight in the estimation. The second is that following a shock, volatility
declines gradually as the weight of the shock observation falls. In contrast, the use of a
simple moving average leads to changes in volatility once the shock observation falls
out of the measurement sample, which in some cases can be several months after it has
occurred.
However, research in finance has devoted significant effort in the last two decades to
coming up with better models for estimating volatility. Time series realizations of
returns often exhibit time-dependent volatility. These facts allow an alternative
volatility specification based on non-linear models. Several authors have fitted time
series models to obtain estimates of conditional or expected volatility from return data.
This idea was first formalized in Engle’s (1982) ARCH model, which is based on the
specification of conditional densities at successive periods of time with a time-dependent volatility process.
The ARCH model is based on the assumption that forecasts of the variance at some
future point in time can also be improved by using recent information. Since the
publication of the original ARCH paper in 1982, these methods have been used by
many researchers. Alternative formulations have been suggested and used and the range
of applications has continually widened (see Bollerslev et al., 1992a, and Bera and
Higgins, 1993, for surveys of these models). In the ARCH model (Engle, 1982) and its
extension as generalized ARCH (Bollerslev, 1986), or exponential GARCH (Nelson,
1991) approximations, time series volatility is measured by means of the conditional
variance of its unexpected component, that is, a distributed lag over squared
innovations. Fitting GARCH models to stock price data provides an alternative way to
estimate conditional volatility and has become standard in recent empirical applications.
However, as Pagan and Schwert (1990) have shown, the ARCH models present some
problems in the estimation of volatility because there are important non-linearities in
stock return behavior that are not captured by conventional ARCH or GARCH models.
Furthermore, evidence of non-linearity in financial time series has accumulated over the
years, and new prediction techniques, such as chaotic dynamics and artificial neural
networks, have been introduced.
Nearest-Neighbor predictions (NN hereafter) constitute a new non-parametric technique for
short-run forecasting inspired by the literature on forecasting chaotic dynamical
systems. The basic philosophy behind these predictors is that elements of a past time
series might have a resemblance to elements in the future. In order to generate
predictions, patterns with similar behavior are located in terms of nearest neighbors and
the time evolution of NNs is used to yield the prediction. The NN prediction procedure
makes no attempt to fit a global model to the whole time series, but uses only local
information about the points to be predicted.
NN methods are included in the framework of non-parametric methods (Härdle and
Linton, 1994). Original ideas on NN have been contributed by Stone (1977) (consistent
non-parametric regression) and Cleveland (1979) (robust locally weighted regression).
Farmer and Sidorowich (1987) gave an important impetus to this kind of prediction by
applying the NN method to predict chaotic time series.
NN predictors in financial time series suggest a mixture of technical analysis and
chaotic behavior. The chaos paradigm holds that non-linear behavior is capable of
producing deterministic, apparently random series that are short-term predictable;
chartism, on the other hand, holds that parts of financial series, in the past, might have a
resemblance to parts in the future. Clyde and Osler (1997) show that non-linear
forecasting techniques, based on the literature on complex dynamic systems, can be
viewed as a generalization of these chartist graphical methods; that is, the NN prediction
method may be considered a developed and sophisticated form of chartism inspired by
chaotic dynamics, in which, in order to yield predictions, present patterns of a time series
are compared with past patterns.
NN predictions have been applied several times to predicting financial time series;
for instance, Diebold and Nason (1990), Bajo-Rubio et al. (1992), Mizrach (1992),
Fernández-Rodríguez et al. (1999), Soofi and Cao (1999) and Lisi and Medio (1997).
Most of these studies show that NN predictors have a higher efficiency than a random
walk.
The purpose of the present paper is to propose a different approach to estimating
conditional volatility, based on ideas from non-linear dynamical systems such as NN
predictions. In the ARCH (Engle, 1982) and GARCH (Bollerslev, 1986)
approximations, time series volatility is measured by means of the conditional variance
of its unexpected component. We define the non-predictable component of the volatility
of an observation as the risk of its NN prediction, in relation to a priori information
about the past NNs.
There have been several attempts to measure the volatility of a financial time series
using concepts related to non-linear dynamical systems. Philippatos and Wilson (1972
and 1974) used entropy as a measure of uncertainty in the selection of efficient
portfolios. Bajo-Rubio et al. (1992) proposed an indicator of global volatility based on
the inverse of the maximum Lyapunov characteristic exponent in order to measure
exchange rate volatility. Finally, a first indicator of daily exchange rate volatility based
on NN forecasting was introduced in Sosvilla-Rivero et al. (1999).
In finance, volatility is associated with the risk of a specific prediction. For instance,
for the GARCH models, volatility is associated with the risk of an ARMA-GARCH
prediction. In this study the volatility of an observation in a series is associated with the
risk of a NN prediction.
This paper proceeds as follows. Section 2 introduces the GARCH models, Section 3
presents the nearest neighbor technique and Section 4 describes a new volatility
measure based on NNs. Section 5 gives empirical results for some GARCH-family
models and NN predictions and a forecast evaluation of the different models. The final
section provides a brief summary and conclusions.
2. GARCH MODEL
Consider a stock market index $I_t$ and its continuous rate of return $x_t$, constructed as
$x_t = \log(I_t / I_{t-1})$. The index $t$ denotes daily closing observations. The
conditional distribution of the series of disturbances, which follows a GARCH process,
can be written as $\varepsilon_t \mid \Omega_{t-1} \sim N(0, h_t)$, where $\Omega_{t-1}$
denotes all the information available at time $t-1$.
The regression model for the series $x_t$ can be written as
$$\phi_s(L)\, x_t = \mu + \theta_r(L)\, \varepsilon_t, \qquad \phi_s(L) = 1 - \phi_1 L - \cdots - \phi_s L^s, \qquad \theta_r(L) = 1 + \theta_1 L + \cdots + \theta_r L^r,$$
$$\varepsilon_t = z_t \sqrt{h_t}, \qquad z_t \sim N(0,1), \qquad \qquad (1)$$
where $L$ is the backward shift operator. The parameter $\mu$ is a constant term, which
in practice is typically estimated to be close or equal to zero. The orders $r$ and $s$ identify
the terms of the ARMA($s$,$r$) stochastic process, and we assume that the error term is
heteroskedastic.
In this sense, the conditional variance $h_t$ can be written as
$$h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j h_{t-j}, \qquad (2)$$
where $p \geq 0$, $q > 0$, $\omega > 0$, $\alpha_i \geq 0$ and $\beta_j \geq 0$ for a non-negative
GARCH($p$,$q$) process. The GARCH($p$,$q$) model reduces to the ARCH model when $p = 0$, and at
least one of the ARCH parameters must be non-zero ($q > 0$). The GARCH parameters are also
restricted by imposing a stationary unconditional variance. This occurs when
$\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j < 1$, which
implies that the mean, variance and autocovariances are finite and constant over time.
When $\sum_{i=1}^{q} \alpha_i + \sum_{j=1}^{p} \beta_j = 1$, the unconditional variance does not exist. However, it is
interesting that the integrated GARCH or IGARCH model can be strongly stationary.
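As an illustration of the recursion in (2), the following Python sketch (not part of the original paper; the parameter values are purely illustrative assumptions) simulates a GARCH(1,1) process together with its conditional variance:

```python
# Minimal sketch of a GARCH(1,1): h_t = omega + alpha*eps_{t-1}^2 + beta*h_{t-1},
# eps_t = z_t*sqrt(h_t).  Parameter values are illustrative assumptions only.
import numpy as np

def simulate_garch11(omega=1e-4, alpha=0.2, beta=0.7, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    h = np.empty(n)
    eps = np.empty(n)
    h[0] = omega / (1.0 - alpha - beta)        # start at the unconditional variance
    eps[0] = z[0] * np.sqrt(h[0])
    for t in range(1, n):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
        eps[t] = z[t] * np.sqrt(h[t])
    return eps, h

eps, h = simulate_garch11()
print(eps.var(), h.mean())   # both roughly omega/(1 - alpha - beta) = 0.001
```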
Another specification is the exponential GARCH proposed by Nelson (1991). A simple
specification of this model is the EGARCH(1,1):
$$\log h_t = \omega + \beta \log h_{t-1} + \gamma \frac{\varepsilon_{t-1}}{\sqrt{h_{t-1}}} + \alpha \left[ \frac{|\varepsilon_{t-1}|}{\sqrt{h_{t-1}}} - \sqrt{\frac{2}{\pi}} \right], \qquad (3)$$
where $\omega > 0$, $0 < \beta < 1$, $0 < \alpha < 1$ and $\gamma < 0$. Here $\beta$ measures
volatility persistence and $\gamma$ reflects an asymmetric or leverage effect: if $\gamma$ is
strictly negative, positive shocks to returns produce less volatility than negative shocks.
Note that the left-hand side is the log of the conditional variance. This implies that the
leverage effect is exponential rather than quadratic, and that forecasts of the conditional
variance are guaranteed to be non-negative. The presence of leverage effects can be tested
by the hypothesis $\gamma < 0$; the impact is asymmetric if $\gamma \neq 0$.
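A minimal Python sketch of the EGARCH(1,1) recursion in (3) (again an assumed implementation with purely illustrative parameter values, not the authors' code):

```python
# Minimal sketch of the EGARCH(1,1) recursion in (3): the log conditional variance is
# driven by the standardized lagged shock; gamma < 0 makes negative shocks raise
# volatility by more than positive shocks of the same size.
import numpy as np

def egarch11_variance(eps, omega=0.05, beta=0.90, alpha=0.10, gamma=-0.05):
    n = len(eps)
    log_h = np.empty(n)
    log_h[0] = omega / (1.0 - beta)                  # unconditional level of log h_t
    for t in range(1, n):
        z = eps[t - 1] / np.exp(0.5 * log_h[t - 1])  # standardized previous shock
        log_h[t] = (omega + beta * log_h[t - 1] + gamma * z
                    + alpha * (abs(z) - np.sqrt(2.0 / np.pi)))
    return np.exp(log_h)
```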
Other models that we employ are the GARCH-M and EGARCH-M formulations.
These are characterized by the introduction of the conditional standard deviation into the
mean equation (1). Reformulating (1), we have
$$\phi_s(L)\, x_t = \mu + \lambda \sqrt{h_t} + \theta_r(L)\, \varepsilon_t, \qquad \varepsilon_t = z_t \sqrt{h_t}, \quad z_t \sim N(0,1), \qquad (4)$$
and by substituting (2) or (3) into equation (4) we obtain the GARCH($p$,$q$)-M and
EGARCH($p$,$q$)-M specifications.
The estimation of (1) and (2), (1) and (3), or (4) is performed using the Berndt,
Hall, Hall and Hausman (1974) algorithm (hereafter BHHH) and the Bollerslev and
Wooldridge (1992b) procedure for a heteroskedasticity-consistent covariance matrix.
Assuming a conditionally normal error distribution, the log-likelihood is
$$L_T = \sum_{t=1}^{T} l_t = -\frac{1}{2}\sum_{t=1}^{T} \log h_t - \frac{1}{2}\sum_{t=1}^{T} \frac{\varepsilon_t^2}{h_t}.$$
However, Nelson’s log likelihood specification for the log conditional variances differs
slightly from the specification above, because in Nelson’s model it is assumed that the
error follows a generalized error distribution, while we assume normally distributed
errors. Bollerslev et al. (1992a) and Bera and Higgins (1993) provide excellent surveys
of GARCH family models.
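A minimal Python sketch of this Gaussian log-likelihood for the GARCH(1,1) case (an assumed implementation; a generic numerical optimizer such as scipy.optimize.minimize stands in here for the BHHH algorithm used in the paper):

```python
# Negative Gaussian log-likelihood of a GARCH(1,1), up to an additive constant:
# -L_T = 0.5 * sum(log h_t + eps_t^2 / h_t), with h_t built recursively as in (2).
import numpy as np

def garch11_neg_loglik(params, eps):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                      # enforce the positivity/stationarity restrictions
    h = np.empty_like(eps)
    h[0] = eps.var()                       # a common initialization for h_1
    for t in range(1, len(eps)):
        h[t] = omega + alpha * eps[t - 1] ** 2 + beta * h[t - 1]
    return 0.5 * np.sum(np.log(h) + eps ** 2 / h)

# e.g.:
# from scipy.optimize import minimize
# fit = minimize(garch11_neg_loglik, x0=[1e-5, 0.1, 0.8], args=(eps,), method="Nelder-Mead")
```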
3. NEAREST-NEIGHBOR PREDICTIONS
Nearest-Neighbor predictions are a short-run forecasting technique inspired by the
literature on forecasting non-linear dynamical systems.
The basic tool for NN predictions is the embedding of the series in a phase space.
Given a series $\{x_t\}$ ($t = 1, 2, \ldots, T$), in order to detect behavioral patterns in this
series, segments of equal length are considered as vectors $x_t^d$ of $d$ consecutive
observations sampled from the original time series, that is,
$$x_t^d = (x_t, x_{t-1}, \ldots, x_{t-(d-1)}), \qquad t = d, d+1, \ldots, T.$$
These $d$-dimensional vectors are often called $d$-histories; the parameter $d$ is referred
to as the embedding dimension, while the $d$-dimensional space $\mathbb{R}^d$ is the phase space
of the time series.
The Takens (1981) embedding theorem establishes that, for a large enough
embedding dimension $d$, if the original time series is sampled from a deterministic
(perhaps chaotic) dynamical system, the trajectories of the $d$-histories $x_t^d$ mimic the data
generation process. The proximity of two $d$-histories in the phase space $\mathbb{R}^d$ may be
interpreted as similar dynamic behavior and allows us to refer to the "nearest neighbors"
of a particular segment $x_t^d$ of the series.
Given the series $\{x_t\}$ ($t = 1, 2, \ldots, T$), the prediction of observation $x_{T+1}$ is generated by
analyzing the historical paths of the last available $d$-history
$$x_T^d = (x_T, x_{T-1}, \ldots, x_{T-(d-1)}).$$
To that end, segments
$$x_{t_1}^d, x_{t_2}^d, \ldots, x_{t_k}^d \qquad \qquad (5)$$
with dynamic behavior similar to the last one in the series, $x_T^d$, are detected by seeking the $k$
vectors in the phase space $\mathbb{R}^d$ which maximize the function
$$\rho(x_t^d, x_T^d),$$
as we see in Figure 1.
[Figure 1]
Therefore, the $k$ $d$-histories (5) chosen present the highest serial correlation with
respect to the last $d$-history $x_T^d$.
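A minimal Python sketch of these two steps (an assumed implementation, not the authors' code): build the matrix of $d$-histories and pick the $k$ of them that are most correlated with the last one.

```python
# Embed the series in R^d (rows are d-histories) and select the k nearest neighbours of
# the last d-history by maximizing the correlation with it.
import numpy as np

def d_histories(x, d):
    """Row for each t >= d-1 (0-based): (x_t, x_{t-1}, ..., x_{t-(d-1)})."""
    return np.array([x[t - d + 1:t + 1][::-1] for t in range(d - 1, len(x))])

def nearest_neighbours(x, d, k):
    """Time indices t_r (0-based) of the k d-histories most correlated with the last one."""
    H = d_histories(np.asarray(x, dtype=float), d)
    last = H[-1]
    rho = np.array([np.corrcoef(row, last)[0, 1] for row in H[:-1]])  # exclude the last itself
    order = np.argsort(rho)[::-1][:k]          # the k largest correlations
    return order + d - 1                       # convert row index back to a time index
```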
Once the NNs of $x_T^d$ have been established, the future short-term evolution of the
time series is obtained by estimating the next observation $x_{T+1}$. The prediction $\hat{x}_{T+1}$ of
$x_{T+1}$ can be obtained by extrapolating the observations
$$x_{t_1+1}, x_{t_2+1}, \ldots, x_{t_k+1}, \qquad \qquad (6)$$
subsequent to the $k$ NN $d$-histories (5), that is,
$$\hat{x}_{T+1} = F(x_{t_1+1}, x_{t_2+1}, \ldots, x_{t_k+1}).$$
The simplest determination of the function $F(\cdot)$ is a projection on each argument, that is,
$$\hat{x}_{T+1}^{elem} = x_{t_r+1}, \qquad r = 1, 2, \ldots, k. \qquad (7)$$
Henceforth, we term this kind of naive prediction an elementary prediction $\hat{x}_{T+1}^{elem}$. A
better determination of the function $F(\cdot)$ consists of using the mean of the elementary
predictions, that is,
$$\hat{x}_{T+1}^{bar} = \frac{1}{k}\sum_{r=1}^{k} x_{t_r+1}; \qquad (8)$$
for geometrical reasons, such a predictor $\hat{x}_{T+1}^{bar}$ is called the barycentric predictor
(Fernández-Rodríguez and Sosvilla-Rivero, 1998).
When generating NN predictions, in order to obtain greater accuracy than with the
elementary or barycentric predictors, locally adjusted linear autoregressive (LALA
hereafter) predictions are usually employed to fit the function $F(\cdot)$. This procedure
involves regressing, by ordinary least squares, the future evolution of the $k$ nearest
neighbors on their preceding $d$-histories, that is, regressing $x_{t_r+1}$ from (6) on
$x_{t_r}^d = (x_{t_r}, x_{t_r-1}, \ldots, x_{t_r-(d-1)})$ from (5), for $r = 1, \ldots, k$. The fitted coefficients are then used
to generate the prediction of $x_{T+1}$ as follows:
$$\hat{x}_{T+1}^{lala} = \hat{a}_0 x_T + \hat{a}_1 x_{T-1} + \cdots + \hat{a}_{d-1} x_{T-(d-1)} + \hat{a}_d, \qquad (9)$$
where the $\hat{a}_i$ are the values of $a_i$ that minimize the expression
$$\sum_{r=1}^{k} \left[ x_{t_r+1} - a_0 x_{t_r} - a_1 x_{t_r-1} - \cdots - a_{d-1} x_{t_r-(d-1)} - a_d \right]^2.$$
Sugihara and May (1990), Casdagli (1992) and Casdagli and Weigend (1994) offer a
detailed description of this kind of predictor. Other implementations of NN estimation
(especially that of Cleveland and Devlin, 1988) advocated local weighting schemes that
place greater weights on near observations in estimating the local linear regression.
Although such weighting schemes have certain theoretical attractions, there are practical
difficulties in their implementation. Wayland et al. (1994) showed that unweighted
algorithms tend to yield superior results.
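The following Python sketch (an assumed implementation, reusing nearest_neighbours from the sketch above) computes the elementary predictions (7), the barycentric predictor (8) and the LALA predictor (9); $k$ should exceed $d+1$ so that the OLS fit in (9) is determined.

```python
# Elementary, barycentric and LALA predictions of x_{T+1} from the k nearest neighbours.
import numpy as np

def nn_predictions(x, d, k):
    x = np.asarray(x, dtype=float)
    idx = nearest_neighbours(x, d, k)              # neighbour time indices t_r
    elementary = x[idx + 1]                        # x_{t_r+1}, equation (7)
    barycentric = elementary.mean()                # equation (8)
    # OLS of x_{t_r+1} on (x_{t_r}, x_{t_r-1}, ..., x_{t_r-(d-1)}, 1), as in (9)
    X = np.column_stack([x[idx - j] for j in range(d)] + [np.ones(len(idx))])
    a = np.linalg.lstsq(X, elementary, rcond=None)[0]
    last = np.array([x[len(x) - 1 - j] for j in range(d)] + [1.0])
    lala = float(last @ a)                         # equation (9) applied to the last d-history
    return elementary, barycentric, lala
```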
4. A NEW VOLATILITY MEASURE
In this study the volatility of an observation in a series is associated with
unpredictability in the sense of NN predictions.
In non-linear dynamical systems, the basic idea behind NN predictions is that parts
of a time series from the past might have a resemblance to other parts in the future. How
can we predict and measure the unpredictability of an observation of a series in the NN
sense?
Let us consider two kinds of NN predictions: on the one hand, the elementary
predictions $\hat{x}_{T+1}^{elem} = x_{t_r+1}$, $r = 1, 2, \ldots, k$, given in (7); on the other hand, the locally
adjusted linear autoregressive prediction $\hat{x}_{T+1}^{lala}$ given in (9). The simplest way of
predicting the volatility of $x_{T+1}$ is by considering the variance of its elementary
predictions (7), that is,
$$v_{T+1}^{bar} = \frac{1}{k}\sum_{r=1}^{k}\left( x_{t_r+1} - \frac{1}{k}\sum_{s=1}^{k} x_{t_s+1} \right)^2 = \frac{1}{k}\sum_{r=1}^{k}\left( x_{t_r+1} - \hat{x}_{T+1}^{bar} \right)^2. \qquad (10)$$
Note that $v_{T+1}^{bar}$ predicts the volatility of the observation $x_{T+1}$ by comparing different
predictions of it, namely the elementary predictions (6) and the barycentric prediction (8).
If the elementary predictions are similar, volatility is low; if they differ considerably,
volatility is high because the observation $x_{T+1}$ seems unpredictable.
Observe that $v_{T+1}^{bar}$ is a measure of the risk of the NN barycentric prediction (8).
If we wish to measure the risk of a more sophisticated prediction like $\hat{x}_{T+1}^{lala}$ in (9), then
following (10) we must consider the new volatility predictor given by
$$v_{T+1}^{lala} = \frac{1}{k}\sum_{r=1}^{k}\left( x_{t_r+1} - \hat{x}_{T+1}^{lala} \right)^2. \qquad (11)$$
Our new methodology for predicting volatility seeks patterns in the past of the series,
the segments in (5), which behave similarly to its recent past (the last $d$-history $x_T^d$).
Observe that in (10) and (11) volatility is predicted as the mean square of the differences
between NN predictors.
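A minimal Python sketch of the two measures (an assumed implementation, reusing nn_predictions from the earlier sketch):

```python
# NN volatility measures: dispersion of the elementary predictions around the
# barycentric predictor (equation 10) and around the LALA predictor (equation 11).
import numpy as np

def nn_volatility(x, d, k):
    elementary, barycentric, lala = nn_predictions(x, d, k)
    v_bar = np.mean((elementary - barycentric) ** 2)    # equation (10)
    v_lala = np.mean((elementary - lala) ** 2)          # equation (11)
    return v_bar, v_lala
```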
As a final intuitive explanation, Figure 1 shows that the NNs detected in the past, given
by (5), have a volatility similar to that of the last $d$-history $x_T^d$ of the series. If $x_T^d$ has low (high)
volatility, the NNs selected in the past have low (high) volatility.
Finally, observe that our new measure of volatility is conceptually similar to the
GARCH philosophy. In finance, volatility is associated with the risk of a specific
prediction. So, for a prediction $\hat{x}_{T+1}$ of the random variable $x_{T+1}$, volatility is defined as
$$E\left[ \left( x_{T+1} - \hat{x}_{T+1} \right)^2 \right].$$
For instance, for the GARCH models, volatility is associated with the risk of an
ARMA-GARCH prediction and defined as
$$h_{T+1} = E\left[ \left( x_{T+1} - \hat{x}_{T+1}^{arma\text{-}garch} \right)^2 \right],$$
while $h_t$ is estimated (in the GARCH(1,1) case) as
$$h_t = \omega + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}.$$
In this paper, the volatility of an observation of the series is associated with the risk of a NN
prediction, the locally adjusted linear autoregressive (LALA) predictor, that is,
$$v_{T+1}^{lala} = E\left[ \left( x_{T+1} - \hat{x}_{T+1}^{lala} \right)^2 \right].$$
In this case $v_{T+1}^{lala}$ is estimated by the expression
$$\frac{1}{k}\sum_{r=1}^{k} \left( x_{t_r+1} - \hat{x}_{T+1}^{lala} \right)^2.$$
5. FORECAST EVALUATION IN SIMULATED AND FINANCIAL TIME SERIES
5.1. Data and estimation period
The data used in this paper to compare GARCH and our NN volatility measure are
simulated and real financial time series. Simulated series are provided by Gaussian
white noise (GWN), GARCH, GARCH-M, EGARCH and EGARCH-M. Real series are
daily observed stock market indices in the New York Stock Exchange. Table 1 reports
header notations and a brief description of each data item. The data were collected from
2nd January 1962 to 31st January 1996, and provided by Data Disk Plus; these data are
copyrighted by Finance & Technology Publishing, where a purchaser is licensed only
for personal use of the data contained therein. In this study, we consider the daily
closing prices as the daily observations for all indices, but we also consider the higher
and lower index for S&P 500.
[Table 1]
Some characteristics of the rates of return, $x_t$, are given in Table 2. The number of
observations is 8579 for all seven indices. The means and variances are quite small. The
high kurtosis indicates that fat-tailed distributions are needed to describe these
variables. The estimated skewness is either positive or negative and is large in absolute
value.
[Table 2]
The volatility forecast period is the last 500 observations, that is, from 8th February
1994 to 31st January 1996. The predictions of the GARCH and EGARCH models and of
the LALA model are computed recursively one step ahead. We estimate models (1) and (2),
and (1) and (3), for the GARCH and EGARCH models, (4) for the GARCH-M and
EGARCH-M models, and (9) and (11) for the LALA model and its variants with mean effects,
over observations 1 to T (where T is 7th February 1994), and then obtain the forecast
for the next period, T+1 (8th February 1994). In the next step, we estimate the model
over observations 1 to T+1, including the realized value of the series at T+1; we thus
estimate the model on a sample that includes the real rate of return for 8th February 1994,
and then forecast the next observation. We repeat this process until 31st January 1996. In
this way, for each series we construct two predicted volatility series, $h_t$ for the GARCH-family
models and $v_t$ for the LALA model, and two error series, $\varepsilon_t^{arma\text{-}garch}$ and $\varepsilon_t^{lala}$, for the
forecast period ($t = 1, \ldots, 500$).
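A minimal Python sketch of this recursive one-step-ahead scheme (an assumed implementation; fit_and_forecast is a hypothetical placeholder for either the ARMA-GARCH estimation or the NN volatility procedure):

```python
# Expanding-window, one-step-ahead volatility forecasts for the last n_out observations.
def recursive_forecasts(x, n_out, fit_and_forecast):
    """x: full series; fit_and_forecast(sample) returns the volatility forecast for the
    observation that follows `sample`."""
    forecasts = []
    for t in range(len(x) - n_out, len(x)):
        forecasts.append(fit_and_forecast(x[:t]))   # estimate on observations up to t-1
    return forecasts

# e.g. (hypothetical): vols = recursive_forecasts(returns, 500,
#                                                 lambda s: nn_volatility(s, d=6, k=20)[1])
```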
5.2. Bias test
By definition, the predictable component of volatility in a series is the conditional
variance of that series. Like Pagan and Schwert (1990), we use the regression
$$\varepsilon_t^2 = \delta_0 + \delta_1 v_t + u_t \qquad (12)$$
to compare the forecasting performance of the stock volatility models. If the forecasts are
unbiased, $\delta_0 = 0$ and $\delta_1 = 1$; estimates of $\delta_0$ and $\delta_1$ that differ from these values
indicate bias in the model's predictions. The purpose of the bias test is to determine
whether the forecasts are unbiased estimates of the actual series (squared errors in our
case). In other words, the bias test determines whether the model forecasts are
systematically higher or lower than the actual squared errors.
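A minimal Python sketch of this bias regression (an assumed implementation using statsmodels, which the paper does not name; Newey-West standard errors as in the table notes):

```python
# Regression (12) of squared forecast errors on the volatility forecasts with HAC
# (Newey-West) standard errors, plus a Wald test of the unbiasedness restriction
# delta0 = 0, delta1 = 1.
import numpy as np
import statsmodels.api as sm

def bias_test(eps2, v, lags=5):
    X = sm.add_constant(np.asarray(v, dtype=float))
    res = sm.OLS(np.asarray(eps2, dtype=float), X).fit(cov_type="HAC",
                                                       cov_kwds={"maxlags": lags})
    wald = res.wald_test((np.eye(2), np.array([0.0, 1.0])), use_f=False)
    return res.params, wald
```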
A natural approach to assessing this forecasting performance is to estimate $\delta_1$ from
simulated series. This single experiment is replicated 1000 times by generating, in each
replication, 255 Gaussian random series with different variances $v_t^{(i)}$ ($t = 1, 2, \ldots, 255$
and $i = 1, 2, \ldots, 1000$) [4]. For each replication $i$, a random observation $\varepsilon_t^{(i)}$ is selected
from each series; we then compute $\delta_1^{(i)}$ from
$$\left( \varepsilon_t^{(i)} \right)^2 = \delta_0^{(i)} + \delta_1^{(i)} v_t^{(i)} + u_t^{(i)}, \qquad t = 1, 2, \ldots, 255. \qquad (13)$$

[4] These variances have been generated from a GARCH process ($\omega = 0.0001$, $\alpha_1 = 0.2$ and $\beta_1 = 0.7$).
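A minimal Python sketch of the experiment (an assumed implementation, reusing simulate_garch11 from the earlier sketch; the Gaussian case is shown, while the GARCH case of Figure 2(b) uses the GARCH innovations themselves in place of the independent Gaussian draws):

```python
# Monte Carlo for the slope in (13): draw 255 GARCH(1,1) variances per replication,
# draw one N(0, v_t) observation for each, and record the OLS slope of eps_t^2 on v_t.
import numpy as np

def slope_experiment(n_rep=1000, n_obs=255, seed=0):
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_rep)
    for i in range(n_rep):
        _, v = simulate_garch11(omega=1e-4, alpha=0.2, beta=0.7, n=n_obs,
                                seed=int(rng.integers(1 << 31)))
        eps = rng.standard_normal(n_obs) * np.sqrt(v)          # one Gaussian draw per variance
        X = np.column_stack([np.ones(n_obs), v])
        slopes[i] = np.linalg.lstsq(X, eps ** 2, rcond=None)[0][1]
    return slopes
```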
Figure 2(a) shows that the estimates of $\delta_1^{(i)}$ in (13) lie around one.
The same simulation, with GARCH series instead of the Gaussian random series,
produces Figure 2(b) [5]. In this case $\delta_1^{(i)}$ is not around one; in fact, its mean is 0.872225.
Similar results are obtained when different parameters of the conditional variance are used.
Table 3 summarizes the statistics of $\delta_1^{(i)}$ for both cases, revealing substantial
bias in the slope coefficient estimates for the GARCH case. It is apparent
that volatility is downward biased by the GARCH models.
[Figure 2]
[Table 3]
In Tables 4 and 5 we report the results of this test applied to the estimated errors
of the above models. The regression equation is (12), where $v_t$ and $\varepsilon_t^2$ are obtained
from an ARMA-GARCH model (these series are $h_t$ and the squares of $\varepsilon_t^{arma\text{-}garch}$) and from a
LALA model (these series are $v_t$ and the squares of $\varepsilon_t^{lala}$), respectively [6].
[Table 4]
Table 4 shows that, in the case of the GARCH-family models, the hypothesis of
unbiasedness is rejected for all cases. For the LALA model forecast (Table 5), the
hypothesis of absence of bias was not rejected at the 5% level for series DJBA, DJIA,
DJTA, DJUA, or SP500C, or for any of the simulated series, but was rejected at the 5%
level for series SP500H and SP500L. In terms of bias, the LALA model performance is
clearly superior. It is interesting to examine the shape of the bias in the cases where its
presence is significant. In the GARCH-family models, intercepts and slopes are
significantly different from zero and one, respectively. The slope coefficients are
significantly greater than one, which indicates that the GARCH-family models (Table
4) have a tendency to underestimate the magnitude of volatility. Finally, in the case of
the LALA model (Table 5), the SP500L intercept is significantly different from zero but
its slope is not significantly different from one. This indicates that the bias is constant,
and does not vary with the level of the forecast.
[Table 5]
[5] The GARCH series have been generated using the same parameters as those used to generate the variances of the Gaussian random series.
[6] Computational problems arose in estimating the ARMA-GARCH model for the series DJBA, GWN and the simulated EGARCH-M series. However, we had no problems with the NN-based procedures in these series, which provide good results.

5.3. Informational content test

To determine whether the forecasts generated from the alternative models (GARCH family
and LALA) contain additional information, we use the informational content test
developed by Fair and Shiller (1989, 1990), which involves regressing the squared errors
of each model on the paired volatility forecasts. The regression is
$$\varepsilon_t^2 = \delta_0 + \delta_1 v_t + \delta_2 h_t + u_t. \qquad (14)$$
We test the null hypothesis of no information in either forecast, $H_0: \delta_1 = 0$ and $\delta_2 = 0$,
against the alternative that both coefficients are non-zero or that at least one is non-zero.
If both $\delta_1$ and $\delta_2$ are zero, then neither forecast $v_t$ nor $h_t$ contains significant
information with which volatility can be estimated. If only the coefficient $\delta_1$ ($\delta_2$) is
non-zero, then $v_t$ ($h_t$) contains significant information for estimating volatility and no
additional independent information is found in $h_t$ ($v_t$). Finally, if both $\delta_1$ and $\delta_2$ are
non-zero, significant information is revealed in both series and the information sets are
independent.
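A minimal Python sketch of this encompassing regression (again assuming statsmodels; not the authors' code):

```python
# Regression (14) of squared errors on both volatility forecasts with Newey-West (HAC)
# standard errors, plus a joint Wald test of H0: delta1 = delta2 = 0.
import numpy as np
import statsmodels.api as sm

def informational_content_test(eps2, v, h, lags=5):
    X = sm.add_constant(np.column_stack([v, h]))
    res = sm.OLS(np.asarray(eps2, dtype=float), X).fit(cov_type="HAC",
                                                       cov_kwds={"maxlags": lags})
    R = np.array([[0.0, 1.0, 0.0],      # delta1 = 0
                  [0.0, 0.0, 1.0]])     # delta2 = 0
    wald = res.wald_test((R, np.zeros(2)), use_f=False)
    return res.params, wald
```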
Results from this test are presented in Tables 6 and 7. In Table 6, equation (14) is
estimated using the square of $\varepsilon_t^{arma\text{-}garch}$ as the endogenous variable, whereas in Table
7 it is estimated using the square of $\varepsilon_t^{lala}$ as the endogenous variable.
Table 6 shows that, in the case of DJIA, DJTA, DJUA, SP500C, SP500H and all the
simulated series, the coefficients associated with $v_t$ and $h_t$ are not significantly different
from zero over the forecast period from 8th February 1994 to 31st January 1996. Hence,
$v_t$ and $h_t$ do not contain information that explains the squared errors generated by the
GARCH-family models. However, in the case of SP500L, $\delta_2$ is significantly negative
over the 500-day forecast period, so the $h_t$ generated from the GARCH-family models does
contain information to explain $\varepsilon_t^2$; the negative coefficient implies that $h_t$ is
negatively correlated with $\varepsilon_t^2$, a perverse result in economic terms.
Table 7 shows that in all the real series the $\delta_1$ coefficient is significantly positive
whereas $\delta_2$ is not significantly different from zero. Hence, the LALA model dominates
the GARCH-family models over the 500-day forecast period. In the case of the
simulated series, $\delta_1$ and $\delta_2$ are both significantly positive, except for the EGARCH series,
where $\delta_2$ is not significantly different from zero. This result should be interpreted
carefully: if both coefficients are non-zero, significant information is revealed in $v_t$ and
$h_t$ and the information sets are independent. There is no statistical explanation for the
fact that $h_t$ contains information beyond that found in $v_t$.
[Table 6]
[Table 7]
6. CONCLUSION
The purpose of this paper has been to analyze the volatility forecasting accuracy of the
GARCH-family and NN models for indices obtained from the New York Stock
Exchange (DJIA, DJTA, DJBA, DJUA, SP500H, SP500L, SP500C) and for simulated
series. The NN models are based on the non-parametric methods developed originally by
Stone (1977) and Cleveland (1979) and applied by Farmer and Sidorowich (1987) to
predict chaotic time series. For the financial series, the in-sample estimation period for the
models was 2 Jan 1962 to 7 Feb 1994, and the out-of-sample period was 8 Feb 1994 to
31 Jan 1996. One-step-ahead forecasts were generated for both financial and simulated
series. Performance criteria included bias tests and informational content tests.
The out-of-sample forecasting results indicate a sharp difference between the
performance of the GARCH-family models and the NN model. Forecasts from the
GARCH-family models were, in most cases, biased and exhibited no significant
informational content, except for the GARCH, GARCH-M and EGARCH simulated
series in which the squared errors were generated from a NN model. In contrast, the NN
model forecasts were generally unbiased and in all cases presented significant
informational content when the squared errors were generated from a NN model. Our
results show that the NN model improves considerably on the out-of-sample volatility
forecasting performance of the GARCH-family models.
REFERENCES
Bajo-Rubio, O., Fernández-Rodríguez, F. and Sosvilla-Rivero, S. (1992), 'Chaotic
behaviour in exchange-rate series: First results for the Peseta-U.S. Dollar case',
Economics Letters 39, 207-211.
Bera, A. and Higgins, M.L. (1993), ‘ARCH Models: Properties, estimation and testing’,
Journal of Economic Surveys 7, 305-366.
Berndt, E., Hall, B., Hall, R. and Hausman, J. (1974), 'Estimation and inference in
nonlinear structural models', Annals of Economic and Social Measurement 4, 653-665.
Black, F. (1976), ‘Studies in Stock Price Volatility’, American Statistical Association,
Proceedings of the 1976 Business Meeting of the Business and Economic Statistics
Section, 177-181.
Bollerslev, T. (1986), ‘Generalized autoregressive conditional heteroskedasticity’,
Journal of Econometrics 31, 307-327.
Bollerslev, T., Chou, R. and Kroner, K. (1992a), 'ARCH modeling in finance: A review
of the theory and empirical evidence', Journal of Econometrics 52, 5-59.
Bollerslev, T. and Wooldridge, J. (1992b), ‘Quasi-maximum likelihood estimation and
inference in dynamic models with time varying covariances’, Econometric Reviews
11, 143-172.
Casdagli, M. (1992), 'Chaos and deterministic versus stochastic non-linear modeling',
Journal of the Royal Statistical Society, Series B, 54, 303-328.
Casdagli, M. and Weigend, A. S. (1994), ‘Exploring the continuum between
deterministic and stochastic modelling’, In Weigend, A.S. and Gershenfeld, N.A.
(Eds), Time Series Prediction: Forecasting the Future and Understanding the Past.
Reading, MA. Addison Wesley.
Cleveland, W.S. (1979), ‘Robust locally weighted regression and smoothing
scatterplots’, Journal of the American Statistical Association 74, 829-836.
Cleveland, W.S. and Devlin, S.J. (1988), 'Locally weighted regression: an approach to
regression analysis by local fitting'. Journal of the American Statistical Association,
83, 596-610.
Clyde, W.C. and Osler, C.L. (1997), 'Charting: Chaos theory in disguise?', Journal of
Futures Markets 17, 489-514.
Diebold, F. X. and Nason, J. A. (1990), ‘Nonparametric exchange rate predictions’,
Journal of International Economics 28, 315-332.
Engle, R.F. (1982), 'Autoregressive conditional heteroskedasticity with estimates of the
variance of United Kingdom inflation', Econometrica 50, 987-1007.
Fair, R.C. and Shiller, R.J. (1989), 'The informational content of ex ante forecasts', The
Review of Economics and Statistics 71, 325-331.
Fair, R.C. and Shiller, R.J. (1990), ‘Comparing information in forecasts from
econometrics models’, The American Economic Review 80, 375-389
Farmer, D. and Sidorowich, J. (1987), ‘Predicting chaotic time series’, Physical Review
Letters 59, 845-848.
French, K., Schwert, G. and Stambaugh, R. (1987), ‘Expected stock returns and
volatility’, Journal of Financial Economics 19, 3-29.
Fernández-Rodríguez, F. and Sosvilla-Rivero, S. (1998), 'Testing nonlinear forecastability
in time series: Theory and evidence from the EMS', Economics Letters 59, 49-63.
Fernández-Rodríguez, F., Sosvilla-Rivero, S. and Andrada-Félix, J. (1999), 'Exchange-rate
forecasts with simultaneous nearest-neighbor methods: Evidence from the EMS',
International Journal of Forecasting 15, 383-392.
Härdle, W. and Linton, O. (1994), ‘Applied nonparametric methods’. In Handbook of
Econometrics 4, Engle, R.F., MacFadden, D. (eds). Elsevier, Amsterdam.
Lisi, F. and Medio A. (1997), ‘Is a random walk the best exchange rate predictor?’,
International Journal of Forecasting 13, 255-267.
Lo, A.W. and MacKinlay, A. C. (1999), A non-random walk down wall street, Princeton
University Press. Princeton, New Jersey.
Mizrach, B. (1992), ‘Multivariate nearest-neighbor forecast of EMS exchange rates’,
Journal of Applied Econometrics 7, S151-S163.
Nelson, D.B. (1991), ‘Conditional heteroskedasticity in asset returns: A new approach’,
Econometrica 59, 347-370.
Newey, W.K. and West, K.D. (1987), 'A simple, positive semi-definite,
heteroskedasticity and autocorrelation consistent covariance matrix', Econometrica
55, 703-708.
Pagan, A.R. and Schwert, G.W. (1990), ‘Alternative models for conditional stock
volatility’, Journal of Econometrics 45, 267-290.
Philippatos, G. C. and Wilson C. J. (1972), ‘Entropy, market risk, and the selection of
efficient portfolios’, Applied Economics 4, 209-220.
Philippatos, G. C. and Wilson C. J. (1974), ‘Entropy, market risk, and the selection of
efficient portfolios: reply’, Applied Economics 6, 77-81.
Schwert, G. (1989), ‘Why does stock market volatility change over time?’, Journal of
Finance 44, 1115-1153.
Schwert, G. (1990), ‘Shock volatility and the crash of 87’, Review of Financial Studies
3, 77-102.
Soofi, A.S. and Cao, L. (1999), 'Nonlinear deterministic forecasting of daily peseta-dollar
exchange rate', Economics Letters 62, 175-180.
Sosvilla-Rivero, S., Fernández-Rodríguez, F. and Bajo-Rubio, O. (1999), ‘Exchange
rate volatility in the EMS before and after the fall’, Applied Economics Letters 6,
717-722.
Stone, C.J. (1977), 'Consistent nonparametric regression', Annals of Statistics 5, 595-620.
Sugihara, G. and May, R.M. (1990), ‘Nonlinear forecasting as a way of distinguishing
chaos from measurement error in time series’, Nature 344, 734-741.
Takens, F. (1981), 'Detecting strange attractors in turbulence', in Rand, D.A. and Young,
L.S. (eds), Lecture Notes in Mathematics 898, Springer-Verlag, New York.
Wayland, R., Pickett, D., Bromley, D. and Passamante, A. (1994), 'Measuring spatial
spreading in recurrent time series', Physica D 79, 320-334.
Table 1. Descriptions of the stock exchange index time series.

Index     Explanation
DJBA      Close for the Dow Jones 20 Bond Average.
DJIA      Close for the Dow Jones Industrial Average.
DJTA      Close for the Dow Jones Transportation Average.
DJUA      Close for the Dow Jones Utility Average.
SP500H    High for the S&P 500 index.
SP500L    Low for the S&P 500 index.
SP500C    Close for the S&P 500 index.
Table 2. Descriptive statistics.

Statistic        DJBA        DJIA        DJTA        DJUA        SP500H      SP500L      SP500C
Mean             0.000027    0.000234    0.000304    0.0000687   0.000254    0.000255    0.000256
Median           0.000000    0.000264    0.000180    0.0000000   0.000327    0.000432    0.000338
Maximum          0.164649    0.096662    0.072551    0.080016    0.057217    0.101400    0.087089
Minimum         -0.058005   -0.256509   -0.192361   -0.166481   -0.140601   -0.224859   -0.228997
Std. Deviation   0.003092    0.009276    0.010383    0.007067    0.008111    0.008701    0.008864
Skewness         16.91432   -2.390666   -0.759212   -1.362277   -0.667659   -1.899351   -2.101220
Kurtosis         961.9340    76.24175    20.81961    46.13572    17.61075    62.60098    60.60433
Jarque-Bera      3.29e+08    1925480.0   114317.6    667695.0    76936.44    1274801.0   1192313.0
Probability      0.000000    0.000000    0.000000    0.000000    0.000000    0.000000    0.000000
Observations     8578        8578        8578        8578        8578        8578        8578

Period: 2-jan-1962 to 31-jan-1996 for all series.
Table 3. Descriptive statistics for δ1(i).

Statistic       δ1(i) (Gaussian case)   δ1(i) (GARCH case)
Mean            1.003546                0.872225
Median          0.961091                0.871340
Maximum         2.738856                1.381036
Minimum         0.292862                0.438071
Std. Dev.       0.345485                0.137213
Skewness        0.705542                0.034237
Kurtosis        3.726891                2.891560
Observations    1000                    1000
Table 4. Pagan and Schwert (1990) bias test. GARCH models.

Panel (A): Financial series
Parameter   DJIA        DJTA        DJUA        SP500C      SP500H      SP500L
R²          0.17        0.22        0.25        0.24        0.20        0.25
δ0 (a)      -0.000039   -0.00010    -0.000025   -0.00039    -0.000045   -0.000048
            (-3.97)     (-4.93)     (-1.93)     (-3.86)     (-5.19)     (-3.55)
            [0.000]     [0.000]     [0.027]     [0.000]     [0.000]     [0.000]
δ1 (b)      1.6849      2.0301      1.3329      1.7162      2.1228      1.9412
            (2.19)      (3.93)      (1.25)      (2.53)      (3.75)      (2.53)
            [0.015]     [0.000]     [0.106]     [0.006]     [0.000]     [0.006]
Wald (c)    33.950      43.181      13.456      80.702      109.13      78.610
            [0.000]     [0.000]     [0.001]     [0.000]     [0.000]     [0.000]

Panel (B): Simulated series
Parameter   GARCH       GARCH-M     EGARCH
R²          0.53        0.54        0.0046
δ0 (a)      -0.00475    -0.00427    -0.000642
            (-4.31)     (-6.36)     (-0.06)
            [0.000]     [0.000]     [0.476]
δ1 (b)      1.9012      1.8568      0.02048
            (3.53)      (4.84)      (-64.09)
            [0.000]     [0.000]     [0.000]
Wald (c)    30.360      68.479      588769
            [0.000]     [0.000]     [0.000]

Note: To correct for heteroskedasticity and autocorrelation in the error term, a heteroskedasticity and autocorrelation consistent covariance matrix (Newey-West, 1987) is used to estimate the standard errors of the coefficients in equation (12). (a) t-values in parentheses for the null hypothesis δ0 = 0. (b) t-values in parentheses for the null hypothesis δ1 = 1. (c) Wald χ² statistics for the joint null hypothesis (δ0, δ1) = (0, 1). p-values are in brackets.
Table 5. Pagan and Schwert (1990) bias test. NN method.

Panel (A): Financial series
Parameter   DJBA      DJIA      DJTA      DJUA      SP500C    SP500H    SP500L
npp         6         6         10        8         5         6         10
R²          0.98      0.80      0.89      0.99      0.67      0.85      0.92
δ0 (a)      0.0000    0.0000    0.0000    0.0000    0.0000    0.0000    0.0000
            (-0.499)  (0.388)   (0.878)   (0.354)   (0.121)   (-2.403)  (-5.440)
            [0.618]   [0.700]   [0.380]   [0.724]   [0.904]   [0.017]   [0.000]
δ1 (b)      0.9843    0.8997    0.9740    1.0004    0.9284    0.9984    1.0126
            (0.756)   (1.741)   (0.573)   (0.131)   (0.398)   (0.024)   (0.343)
            [0.450]   [0.082]   [0.567]   [0.896]   [0.691]   [0.981]   [0.731]
Wald (c)    0.620     3.751     0.997     0.243     0.584     10.738    30.699
            [0.734]   [0.153]   [0.607]   [0.885]   [0.747]   [0.005]   [0.000]

Panel (B): Simulated series
Parameter   GWN       GARCH     GARCH-M   EGARCH    EGARCH-M
npp         8         11        12        12        11
R²          0.99      0.68      0.24      0.41      0.42
δ0 (a)      -7.4654   0.0004    0.0010    0.0025    0.0020
            (-1.666)  (0.361)   (0.734)   (0.581)   (0.340)
            [0.096]   [0.718]   [0.463]   [0.561]   [0.734]
δ1 (b)      1.0001    1.0204    0.9214    0.9974    1.0135
            (0.074)   (0.165)   (0.369)   (0.011)   (0.045)
            [0.941]   [0.869]   [0.712]   [0.991]   [0.964]
Wald (c)    3.413     0.879     0.859     4.742     3.129
            [0.181]   [0.645]   [0.651]   [0.093]   [0.209]

Note: To correct for heteroskedasticity and autocorrelation in the error term, a heteroskedasticity and autocorrelation consistent covariance matrix (Newey-West, 1987) is used to estimate the standard errors of the coefficients in equation (12). (a) t-values in parentheses for the null hypothesis δ0 = 0. (b) t-values in parentheses for the null hypothesis δ1 = 1. (c) Wald χ² statistics for the joint null hypothesis (δ0, δ1) = (0, 1). p-values are in brackets.
Table 6. Informational content test. GARCH models.

Panel (A): Financial series
Parameter   DJIA       DJTA      DJUA       SP500C     SP500H     SP500L
R²          0.00       0.01      0.01       0.00       0.00       0.02
δ1 (a)      0.1980     0.6746    0.0005     0.9011     -0.0984    0.7509
            (0.349)    (1.319)   (0.335)    (1.546)    (-0.253)   (1.588)
            [0.727]    [0.188]   [0.738]    [0.123]    [0.801]    [0.113]
δ2 (a)      -11.0102   18.588    -20.8763   -18.1718   -35.3970   -47.8027
            (-0.550)   (1.055)   (-1.305)   (-0.943)   (-1.029)   (-2.848)
            [0.583]    [0.292]   [0.193]    [0.346]    [0.304]    [0.005]
Wald (b)    0.470      3.616     1.793      2.780      1.071      8.790
            [0.790]    [0.164]   [0.408]    [0.249]    [0.585]    [0.012]

Panel (B): Simulated series
Parameter   GARCH     GARCH-M   EGARCH
R²          0.00      0.01      0.00
δ1 (a)      0.1795    -0.9142   -0.0781
            (0.562)   (-1.902)  (-0.370)
            [0.574]   [0.058]   [0.712]
δ2 (a)      0.2118    2.2362    0.0210
            (0.084)   (1.079)   (0.233)
            [0.933]   [0.281]   [0.816]
Wald (b)    1.629     3.629     0.228
            [0.443]   [0.163]   [0.892]

Note: To correct for heteroskedasticity and autocorrelation in the error term, a heteroskedasticity and autocorrelation consistent covariance matrix (Newey-West, 1987) is used to estimate the standard errors of the coefficients in equation (14). (a) t-values in parentheses for the null hypothesis δi = 0 (i = 1, 2). (b) Wald χ² statistics for the joint null hypothesis (δ1, δ2) = (0, 0). p-values are in brackets.
Table 7. Informational content test. NN method.

Panel (A): Financial series
Parameter   DJIA       DJTA       DJUA        SP500C     SP500H     SP500L
R²          0.81       0.89       0.99        0.68       0.85       0.92
δ1 (a)      0.8977     0.9728     1.0004      0.9196     0.9991     1.0150
            (15.153)   (21.469)   (341.509)   (5.102)    (15.182)   (28.984)
            [0.000]    [0.000]    [0.000]     [0.000]    [0.000]    [0.000]
δ2 (a)      0.4883     0.1992     0.1695      1.1619     0.2455     -0.2995
            (0.638)    (0.402)    (0.066)     (1.580)    (0.4061)   (-0.540)
            [0.524]    [0.688]    [0.947]     [0.115]    [0.685]    [0.590]
Wald (b)    273.739    460.931    116721.8    32.645     248.499    962.836
            [0.000]    [0.000]    [0.000]     [0.000]    [0.000]    [0.000]

Panel (B): Simulated series
Parameter   GARCH      GARCH-M    EGARCH
R²          0.69       0.31       0.41
δ1 (a)      0.9353     0.7036     0.9974
            (7.591)    (3.4338)   (4.100)
            [0.000]    [0.001]    [0.000]
δ2 (a)      1.0234     1.4244     -0.0126
            (3.1110)   (5.032)    (-0.582)
            [0.002]    [0.000]    [0.561]
Wald (b)    87.775     46.583     18.084
            [0.000]    [0.000]    [0.000]

Note: To correct for heteroskedasticity and autocorrelation in the error term, a heteroskedasticity and autocorrelation consistent covariance matrix (Newey-West, 1987) is used to estimate the standard errors of the coefficients in equation (14). (a) t-values in parentheses for the null hypothesis δi = 0 (i = 1, 2). (b) Wald χ² statistics for the joint null hypothesis (δ1, δ2) = (0, 0). p-values are in brackets.
[Figure 1. Graphical interpretation of NNs]
[Figure 2(a). Slopes for the Gaussian random series]
[Figure 2(b). Slopes for the GARCH series]