Econ201- Final Paper

Stochastic Volatility Model:
Bayesian Framework with High Frequency Data
Haolan Cai
Econ201- Spring 2009
Academic honesty pledge that the assignment is in compliance with the Duke
Community Standard as expressed on pp. 5-7 of "Academic Integrity at Duke: A Guide
for Teachers and Undergraduates"
____________________________________
1 Introduction
1.1 Motivation
Volatility, or the risk of an asset, is an important feature that needs to be well
understood for a variety of reasons, such as the need to account for risk while assembling
a basket of assets and in the pricing of options. While some models, like the basic Black-Scholes options pricing model, assume constant volatility over time, the data suggest that
allowing volatility to change over time is much more sensible. Just eyeballing the time series of any financial asset's returns will show clustering of periods of large and small
returns.
Stochastic volatility models have been introduced as a way of modeling this
changing variance over time. While stochastic volatility models have been around since
the 1980s, stochastic volatility models in a Bayesian framework are relatively newer and
less explored. Thus, I chose a model using Gibbs-sampling with an underlying
autoregressive process of order one to explore patterns in volatility.
The use of high frequency data is also relatively new. There are obvious
advantages to using such datasets. For instance, one could use a smaller and more
relevant time span of data but have the same number of data points to work with. In
particular, using within-day returns to train a stochastic volatility model allows within-day
predictions. Using within-day returns, however, poses its own problems in the
context of a stochastic volatility model.
1.2 Background
There has been significant work completed on assessing the validity of stochastic
volatility models in the frequentist framework. Among the issues raised by Gallant,
Hsieh, & Tauchen (1997) are the need for an asymmetric, thick-tailed distribution for
innovations and long-term dependence in the model for volatility. Stochastic volatility
models in the Bayesian framework solve some of these issues while creating a few of their
own. One disadvantage of the model is that the likelihood function is not tractable
and more careful estimation is needed.
Jacquier et al. (1994) developed a Bayesian method for estimating a simple,
univariate stochastic volatility model. This was extended to use a Gibbs sampler for
simpler and faster implementation by Carter and Kohn (1994). Aguilar & West (2000)
furthered this work with extensions to improve model fit. My work builds on the current
literature by applying Aguilar & West's model to a real high frequency data set and
assessing the model on two different rubrics: in-sample fit and out-of-sample
predictive power.
1.3 Data
The data comes from minute-by-minute prices of General Electric stock from
1997 to 2008. Using the last 2 years of data, I computed two-hourly returns on which to
assess in-sample fit while reserving the last month of data for out-of-sample forecasting
evaluation. This gives exactly 2000 time points on which to train the model and 60 time
points for out-of-sample prediction. Figure 1 shows the 2-hourly returns of GE for the
entire time period with the red lines marking the section used for training the model and
the remaining data used for predictive fit.
2 Methods
2.1 Model
The model that I chose is from Aguilar and West (2000). The model applies a
transformation to the returns so that the volatility can be modeled as an AR process with
some latent mean volatility. The canonical model is as follows:
r_t \sim N(0, \sigma_t^2)
\sigma_t = \exp(\mu + x_t)
x_t \sim AR(1; \phi)
Applying the transformation y_t = \log(r_t^2)/2 gives the following linearized model:
y_t = \mu + x_t + \gamma_t
x_t \sim AR(1; (\phi, \nu))
The parameters to be estimated are (mu, phi, nu). Thus, the model is interpreted as a
linear combination of some baseline volatility on the log scale and a latent AR(1)
process, with some added error. Gamma is the error term, which is distributed as
one-half the log of a chi-squared random variable with one degree of freedom.
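For concreteness, a minimal simulation of this model might look as follows (a Python sketch; the parameter values are illustrative placeholders rather than the estimates reported later):

import numpy as np

# Minimal simulation of the SV model described above.
rng = np.random.default_rng(0)
n, mu, phi, nu = 2000, 0.45, 0.95, 0.33                 # placeholder values

x = np.zeros(n)
x[0] = rng.normal(0.0, np.sqrt(nu / (1.0 - phi**2)))    # stationary start
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0.0, np.sqrt(nu))

sigma = np.exp(mu + x)          # volatility process sigma_t = exp(mu + x_t)
r = rng.normal(0.0, sigma)      # returns: r_t ~ N(0, sigma_t^2)

# Linearizing transformation used for estimation: y_t = mu + x_t + gamma_t
y = 0.5 * np.log(r**2)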
2.2 Differentiation from HMM
This model differs from the traditional Hidden Markov Model in two respects.
The first, a relatively minor detail, is the baseline volatility estimated by mu. The second
is that the gamma error replaces the usual Gaussian assumption for the observational noise. The
one-half log chi-squared distribution with one degree of freedom is non-Gaussian and
left-skewed, and it can be approximated with a discrete mixture of normal
distributions with known parameters, i.e.,
p(\gamma_t) = \sum_{j=1}^{J} q_j N(b_j, w_j)
where q_j is the weight of the j-th normal component with mean b_j and standard deviation
w_j.
Estimating the error with a normal distribution (or in this case a mixture of
normals) puts the estimation algorithm into the Gaussian hidden AR model framework.
The shape of this error distribution has been well studied and can be very
accurately approximated by a mixture of J = 7 normals whose component weights, means,
and variances are taken from Kim, Shephard, and Chib (1998). Minor changes to the error approximation such as
adjustments in tail size or finer approximation with more components can be made
without changing the Gibbs sampling structure.
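As a quick check on the shape of this error term, one can simulate one-half the log of a chi-squared(1) variate directly and inspect its moments (a sketch; the weights, means, and variances of the seven mixture components themselves are not reproduced here):

import numpy as np

# Sample the exact error term gamma_t = 0.5 * log(chi-squared with 1 df).
rng = np.random.default_rng(1)
gamma = 0.5 * np.log(rng.chisquare(df=1, size=1_000_000))

mean = gamma.mean()
var = gamma.var()
skew = ((gamma - mean)**3).mean() / gamma.std()**3

# The distribution is non-Gaussian and left-skewed (negative sample skewness);
# its exact mean is about -0.635 and its variance is pi^2 / 8 (about 1.234).
print(mean, var, skew)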
2.3 Sampling Framework
I chose to run this model in a Bayesian framework to allow uncertainty in the
parameters to carry through the model. This means running a Gibbs sampling algorithm
in which the parameters are iteratively sampled from a combination of the data and prior
to construct a posterior distribution for each parameter. The point estimate for each
parameter is the mode of its posterior distribution. The
following standard priors were specified: mu is taken to be normal,
phi is taken to be normal as well, and nu has an inverse gamma prior.
Sampling in this framework gives the following updating procedure (in no
particular order) for each of the conditional posterior distributions of the parameters:
1) p(\gamma_{1:n} \mid y_{1:n}, x_{0:n}, \mu, \phi, \nu)
It is easy to see that the gammas are conditionally independent, and
normalizing over j updates the probabilities q_j at each time point t. This
defines a posterior for gamma that can be sampled at each time point
to gain new mixture component indicators.
2) p(\phi \mid y_{1:n}, x_{0:n}, \mu, \nu)
This is the standard posterior for the AR(1) persistence coefficient under
the normal prior. It can be trivially sampled at each iteration to
generate new values of phi.
3) p(\nu \mid y_{1:n}, x_{0:n}, \mu, \phi)
This is the posterior for the AR(1) innovation variance under the inverse
gamma prior. It is also trivially sampled at each iteration
for new values of nu.
4) p(\mu \mid y_{1:n}, x_{0:n}, \gamma_{1:n}, \phi, \nu)
This is the posterior under the normal prior for mu and is based on
conditionally normal, independent observed values of y_t from the data.
Thus, the posterior is also normal.
5) p(x_{0:n} \mid y_{1:n}, \gamma_{1:n}, \mu, \phi, \nu)
Under the above conditioning the model is a linear, conditionally normal
AR(1) HMM, conditional on the error term. The error variances of the
distribution for y_t are known and differ over time (any one of the 7
normals in the mixture that approximates the log chi-squared distribution).
To construct a posterior distribution, a forward-filtering, backward-sampling (FFBS) algorithm is used.
Following this sampling algorithm iteratively gives the Gibbs sampling process for this
SV model.
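For illustration, the conjugate updates in steps 2) through 4) of one Gibbs sweep could be sketched as follows, assuming the latent path x_{0:n} and the mixture means b_t and variances w_t (the squares of the component standard deviations) selected in step 1) are available. The prior hyperparameters and function name are hypothetical choices of the sketch, not those used in the paper:

import numpy as np

rng = np.random.default_rng(2)

def sweep_static_params(y, x, b_t, w_t, nu,
                        m0=0.0, s0=1.0, a0=2.0, c0=0.01, mmu=0.0, smu=1.0):
    """One Gibbs update of (phi, nu, mu) given the latent AR path x (length
    n + 1), data y (length n), and the mixture means b_t / variances w_t
    chosen in step 1).  Hyperparameters are illustrative placeholders."""
    x_prev, x_curr = x[:-1], x[1:]

    # 2) phi | x, nu : normal prior N(m0, s0^2) combined with the AR(1)
    #    regression of x_t on x_{t-1}; in practice the draw is often
    #    constrained to |phi| < 1, omitted here for brevity.
    prec = 1.0 / s0**2 + np.sum(x_prev**2) / nu
    mean = (m0 / s0**2 + np.sum(x_prev * x_curr) / nu) / prec
    phi = rng.normal(mean, np.sqrt(1.0 / prec))

    # 3) nu | x, phi : inverse gamma prior IG(a0, c0) gives an inverse
    #    gamma posterior, drawn here as the reciprocal of a gamma variate.
    resid = x_curr - phi * x_prev
    nu = 1.0 / rng.gamma(a0 + len(resid) / 2.0,
                         1.0 / (c0 + 0.5 * np.sum(resid**2)))

    # 4) mu | y, x, indicators : y_t - x_t - b_t ~ N(mu, w_t), combined
    #    with a N(mmu, smu^2) prior, gives a normal posterior.
    e = y - x[1:] - b_t
    prec_mu = 1.0 / smu**2 + np.sum(1.0 / w_t)
    mean_mu = (mmu / smu**2 + np.sum(e / w_t)) / prec_mu
    mu = rng.normal(mean_mu, np.sqrt(1.0 / prec_mu))

    return phi, nu, mu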
2.4 Forward-Filtering Backward-Sampling
Since the error variance varies over time, a Kalman filter-like forward filtering pass,
from t = 0 to t = n, followed by backward sampling of the states x_n, x_{n-1}, ..., x_0, is needed to generate
the full sample x_{0:n}. It is possible (and simpler) to use a Gibbs sampler on each of the
complete conditionals to generate latent x_t values instead. However, that technique tends
to be less effective in practice, especially where AR dependence is high (phi close to 1),
as is typical of financial data. Due to the high degree of dependence between successive
iterations, such a Gibbs sampler moves around the state space very slowly and thus also
converges very slowly. The FFBS approach samples all x_t variates together, moves through
the state space quickly, and generally converges rapidly as well.
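A minimal sketch of the FFBS step for this conditionally Gaussian model is given below, treating y*_t = y_t - mu - b_t as the observation with known variance w_t and x_t = phi x_{t-1} + N(0, nu) as the state equation. Variable names and the prior moments assumed for x_0 are conventions of the sketch, not details taken from the original implementation:

import numpy as np

def ffbs(y_star, w_t, phi, nu, m0=0.0, C0=1.0, rng=None):
    """Forward-filtering backward-sampling for x_{0:n}.  y_star[t-1] is the
    de-meaned observation y_t - mu - b_t, w_t[t-1] its known (time-varying)
    variance, and (m0, C0) the assumed prior moments of x_0."""
    rng = rng or np.random.default_rng()
    n = len(y_star)
    m = np.zeros(n + 1)       # filtered means E[x_t | y_1:t]
    C = np.zeros(n + 1)       # filtered variances
    m[0], C[0] = m0, C0

    # Forward filter: Kalman recursions with time-varying observation variance.
    for t in range(1, n + 1):
        a = phi * m[t - 1]               # one-step-ahead mean of x_t
        R = phi**2 * C[t - 1] + nu       # one-step-ahead variance of x_t
        K = R / (R + w_t[t - 1])         # gain
        m[t] = a + K * (y_star[t - 1] - a)
        C[t] = (1.0 - K) * R

    # Backward sampler: draw x_n, then x_{n-1}, ..., x_0 in turn.
    x = np.zeros(n + 1)
    x[n] = rng.normal(m[n], np.sqrt(C[n]))
    for t in range(n - 1, -1, -1):
        R_next = phi**2 * C[t] + nu
        B = C[t] * phi / R_next
        mean = m[t] + B * (x[t + 1] - phi * m[t])
        var = C[t] - B**2 * R_next
        x[t] = rng.normal(mean, np.sqrt(var))
    return x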
2.5 Predictive Model
The predictive model was built into the sampling algorithm to allow the
uncertainty in the parameter estimates to carry through. For each iteration, the next 60
time steps were estimated using that iteration’s generated parameters. Each step of the
AR(1) process is estimated using the following model:
x_t = \phi x_{t-1} + \epsilon_t, \quad \epsilon_t \sim N\left(0, \frac{\nu}{1 - \phi^2}\right)
The baseline volatility for each iteration was added after all time step estimates were
generated.
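For each retained posterior draw, the 60-step forecast could be generated roughly as follows (a sketch; the function and argument names are hypothetical, and the innovation scale matches the forecasting equation above):

import numpy as np

def forecast_path(x_last, mu, phi, nu, steps=60, rng=None):
    """Simulate the next `steps` latent AR values from one posterior draw
    and return the implied volatilities exp(mu + x_t)."""
    rng = rng or np.random.default_rng()
    sd = np.sqrt(nu / (1.0 - phi**2))   # innovation scale as in the text
    x = np.empty(steps)
    prev = x_last
    for h in range(steps):
        prev = phi * prev + rng.normal(0.0, sd)
        x[h] = prev
    return np.exp(mu + x)               # baseline volatility added at the end

# Averaging forecast_path(...) across all posterior draws yields the
# predicted volatility used in the out-of-sample assessment below.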
2.6 Assessing Fit
Assessment of the model centers on two metrics. The in-sample fit is
computed as the coefficient of determination by regressing the absolute value of real
returns on the volatility estimates produced by the model for that period. While this R^2
value is expected to be low, it can still be considered a valid metric following Andersen
& Bollerslev (1998). The out-of-sample predictive value of the model is calculated as
the correlation between the average predicted volatility over the 60 forecast time steps and the actual realized
volatility during that time period. This is similar to a metric used in Andersen &
Bollerslev (1998).
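Both metrics are simple to compute once the fitted and predicted volatilities are in hand; a sketch, assuming arrays of in-sample absolute returns, fitted volatilities, predicted volatilities, and realized volatilities are available:

import numpy as np

def r_squared(abs_returns, fitted_vol):
    """In-sample R^2 from regressing |y_t| on the model's volatility estimates."""
    X = np.column_stack([np.ones_like(fitted_vol), fitted_vol])
    beta, *_ = np.linalg.lstsq(X, abs_returns, rcond=None)
    resid = abs_returns - X @ beta
    tss = np.sum((abs_returns - abs_returns.mean())**2)
    return 1.0 - np.sum(resid**2) / tss

def predictive_corr(predicted_vol, realized_vol):
    """Out-of-sample metric: correlation between predicted and realized volatility."""
    return np.corrcoef(predicted_vol, realized_vol)[0, 1]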
2.7 Normalization
As mentioned before, some problems arise when using high frequency data in
stochastic volatility models. The intraday volatility smile, with higher volatility near the
open and close of trading than at midday, is well documented in the literature on intraday
returns. To control for this pattern within the day, I normalized each of the returns by the
average realized volatility for its two-hour window. Figure 2 illustrates the normalized
returns over the given time period.
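A sketch of this normalization, assuming the two-hourly returns are stored in a table with a (hypothetical) column identifying the intraday two-hour window each return belongs to; for illustration the window-level volatility is proxied by the mean absolute return of that window across the sample, whereas the paper uses the average realized volatility of each window:

import pandas as pd

def normalize_intraday(df):
    """df has columns 'ret' (two-hourly return) and 'window' (intraday slot);
    both names are hypothetical.  Returns a copy with a normalized column."""
    window_vol = df.groupby('window')['ret'].transform(lambda r: r.abs().mean())
    out = df.copy()
    out['ret_norm'] = out['ret'] / window_vol
    return out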
2.8 Error Term
Initial implementation revealed problems with the model
specification, particularly the size of the tails of the error term. Analysis of the
conditional posterior probabilities showed the left-most tail component of the log chi-squared
approximation being selected more often than expected. The model was then trivially
modified to allow increased tail area by uniformly scaling the mixture component variances
by a constant greater than one. The optimal scaling factor was found to be close to 1.75.
2.9 Random Noise
In theory, price movements are continuous and there should be no zero returns. In
practice, due to events such as missing data and the discreteness of reported prices, we
see zero returns from time to time. Failure to account for these zero returns leads to a
breakdown in the model, as the log of zero is undefined. To deal with this issue, a small
amount of random white noise is added to each of the prices. The white noise has mean
zero and variance 10 \times 10^{-7}. This preserves small (discretely indistinguishable) changes in
price without affecting the return structure.
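In code, this adjustment could be as simple as the following sketch (the variance follows the value given in the text; the function name is hypothetical):

import numpy as np

def jitter_prices(prices, var=10e-7, rng=None):
    """Add mean-zero white noise to the price series so that flat price
    segments produce small but nonzero returns rather than exact zeros."""
    rng = rng or np.random.default_rng()
    return prices + rng.normal(0.0, np.sqrt(var), size=len(prices))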
3 Results
The stochastic volatility model was estimated using the Gibbs sampling algorithm
outlined above. A burn-in of 500 iterations was applied and then 5000 posterior samples
were generated from the model. The posterior distributions for the model parameters
calculated with the 2 years of 2-hourly normalized returns are shown in Figure 3. The
corresponding posterior modes are: mu = .4535, phi = .9458, and nu = .3290.
The high value of phi implies a high persistence in volatility, congruent with the volatility
clustering observed in the data.
The in-sample fit is shown in Figure 4. The model shows a reasonable
approximation of the movements in volatility throughout the time period. The importance
of using a time-varying volatility model is clear from this figure, as there are distinct
periods of high and low volatility.
The in-sample fit was calculated as the coefficient of determination from the
following regression: |y_t| = \beta_0 + \beta_1 \sigma_t + e_t, where \sigma_t is the estimated volatility from the
model. The estimated volatility is taken to be a biased estimator for the absolute value of
the returns. The stochastic volatility model explains 10.39% of the variation in the
absolute value of the returns. This value compares favorably to the range of values given
in Andersen & Bollerslev (1998).
The out-of-sample predictive value was calculated as the correlation between
realized volatility as seen in the data and the predicted volatility from the model.
Regressing realized volatility on predicted volatility, \sigma_t^{realized} = \beta_0 + \beta_1 \hat{\sigma}_t + e_t, yields a correlation of .0860. Figure 5 shows the value of the
true realized volatility on the x-axis and the predicted volatility on the y-axis.
The model produces a significantly lower range of volatility than the realized volatility. However,
volatility is notoriously difficult to predict. The R^2 value for the fit is .0074, which compares
favorably to the range given by Andersen & Bollerslev (1998).
4 Conclusions
The model provides a good in-sample fit for high frequency data that is
comparable to other assessments of stochastic volatility models with lower frequency
returns in the literature. The model correctly represents the heteroskedastic nature of
volatility and the high persistence of the volatility structure. The out-of-sample fit for
intraday volatility is also comparable to other models in the literature. However, the
model predicts much less volatility than is realized by the data.
Further work includes updating the AR process to a higher-order
model, allowing longer memory to better approximate the volatility structure. A higher-order
AR process or a mixture of AR(1) processes could potentially greatly enhance the predictive
power and fit of this model. Models incorporating more than the price movement itself,
perhaps additional macroeconomic factors, could also enhance the predictive quality of the
model. Further extensions include the development and application of new rubrics for
assessing the performance of stochastic volatility models for greater comparative strength.
Further work could be done to ascertain the optimal high frequency periods for predictive
and modeling purposes.
Figures
Figure 1: Daily GE returns from 1997 to 2008. Dashed lines indicate the start and finish of the time
period used for model training; the remaining data is used for predictive comparison.
Figure 2: Comparison of the normalized returns and original returns. Note the price structure has
not been changed.
Figure 3: The posterior distributions for mu, phi, and the innovation variance of the AR process.
Figure 4: Plot shows the mean estimated volatility from the model at each time point.
Figure 5: Predicted volatility values vs. real volatility values.
References
Aguilar, O. & West, M. (2000). Bayesian dynamic factor models and portfolio allocation.
Journal of Business and Economic Statistics 18, 338-357.
Andersen, T.G. & Bollerslev, T. (1998). Answering the skeptics: Yes, standard volatility
models do provide accurate forecasts. International Economic Review 39, 885-905.
Carter, C. & Kohn, R. (1994). On Gibbs sampling for state space models. Biometrika 81,
541-553.
Gallant, A.R., Hsieh, D., & Tauchen, G. (1997). Estimation of stochastic volatility
models with diagnostics. Journal of Econometrics 81, 159-192.
Jacquier, E., Polson, N.G., & Rossi, P.E. (1994). Bayesian analysis of stochastic volatility
models. Journal of Business and Economic Statistics 12, 69-87.
Kim, S., Shephard, N., & Chib, S. (1998). Stochastic volatility: likelihood inference and
comparison with ARCH models. Review of Economic Studies 65, 361-393.