Marginal distribution modeling and value at risk

Journal of Applied Operational Research (2014) 6(4), 207–221
© Tadbir Operational Research Group Ltd. All rights reserved. www.tadbir.ca
ISSN 1735-8523 (Print), ISSN 1927-0089 (Online)
Marginal distribution modeling and value at
risk estimation for stock index returns
Christos Katris * and Sophia Daskalaki
University of Patras, Greece
Abstract. The purpose of this study is to explore probability distributions for modeling the marginal distributions of stock
index returns and to further employ them for risk estimation. It is well accepted that stock returns follow heavy-tailed and
leptokurtic distributions. In order to describe sufficiently the empirical characteristics of the stock index returns of eight European
countries for the period 2006-2012, flexible models with varying number of parameters such as the Generalized Hyperbolic
distribution and Normal Mixtures have been employed. The fit of these distributions is evaluated using Kolmogorov-Smirnov
statistic and Euclidean distance. For risk estimation, on the other hand, a common tool is Value at Risk (VaR) which is estimated
with the help of the selected distribution models along with the non-parametric historical method. Evaluation and comparison of
the VaR models is performed using backtesting in conjunction with a binomial test, both in-sample to validate models and
out-of-sample to test forecasting performance. Lastly, rolling window strategies are employed in order to improve forecasts
and identify a successful strategy.
Keywords: financial modeling; generalized hyperbolic distributions; normal mixtures; probability distributions; marginal
value at risk; rolling estimation
* Received December 2013. Accepted April 2014
Introduction
Development of theoretical models for the marginal distributions of stock returns is important for many problems
in the field of Finance. Previous research indicates that there are several well established empirical characteristics
in financial data, i.e. heavy tails, leptokurtosis, long memory and volatility clustering. The first two concern their
marginal distributions, while the other two are dynamic characteristics affecting possible time series models.
Marginal distributions have a static nature; however they play an important role for constructing dynamic models.
Theories and models developed for stock returns implicitly or explicitly set an assumption for their marginal
distribution and for many years this had been the normal distribution (e.g. Black-Scholes Option Pricing model).
Although the normal distribution is an attractive mathematical model, empirical studies do not always support its
widespread usage (Peters 1991; Harvey and Siddique 2000; Alles and Kling 1994) and indicate alternative models
as more suitable. Since 1963, researchers have suggested Paretian Stable distributions (Mandelbrot 1963;
Fama 1965), a scaled t-distribution (Praetz 1972; Blattberg and Gonides 1974) and mixtures of Normal distributions
(Kon 1984). More recently, Aquino (2006) fitted the Laplace distribution to data from the Philippines Stock Exchange, while
Prause (1997) and Necula (2009) used the Generalized Hyperbolic (GH) family of distributions to model certain
financial assets. On the other hand, subclasses of the GH distributions, e.g. the Normal Inverse Gaussian, were
proposed in Barndorff-Nielsen (1995; 1997), and the plain Hyperbolic in Eberlein and Keller (1995) and Kuchler et al.
(1999). Lastly, Normal mixtures along with the Hyperbolic and LogF have been used in Behr and Poetter (2009).
* Correspondence: Christos Katris, University of Patras, Electrical
and Computer Engineering Department, Rio, Patras 26504 Greece.
E-mail: [email protected]
In addition to serving other needs, marginal distributions may also be used for estimating the risk of a portfolio
or a stock index. They provide parametric models for Value at Risk (VaR) estimation, a common approach for
estimating risk. Alternatively, VaR may be estimated using the nonparametric historical method, the RiskMetrics
or Delta Normal (Morgan 1995, McNeil et al. 2005) which is a widely used technique that assumes normality for
the returns, as well as models for the tails like the Generalized Pareto distribution (Embrechts et al. 1997; Rachev
et al. 2010). VaR can be static or time varying leading to the distinction of marginal or conditional according to
Gourieroux and Jasiac (2009). In this work, we adopt marginal distributions as static VaR models and test their
capability for risk identification by estimating the marginal VaR. As far as we know, Normal mixtures have been
employed previously as parametric VaR models in Venkatraman (1997), while from the GH family only certain
subclasses, i.e. the Normal Inverse Gaussian and the Hyperbolic distribution have been used (Önalan 2010; Bauer
2000; Doric and Doric 2011). Chen et al. (2008), on the other hand, developed a dynamic model which is based
on the GH distribution in a nonparametric framework.
In this paper, using returns of eight European stock indices we apply fitting techniques to the data in order to
achieve “good” marginal distributions for each one of them. The experimentation with marginal distributions goes
further into VaR estimation, where the 5% and 1% long position VaRs are estimated. The estimates are evaluated
using the Kupiec test, a Binomial test for the number of exceedances of the estimated VaRs. The suggested models
are tested both in-sample and out-of-sample, since a distribution model is of value only if it
proves to be effective with data that were not used for the fitting process. Furthermore, with our empirical study
we show that a parametric VaR estimation model has to be updated continuously in order to be successful with
out-of-sample data, and therefore rolling windows are a reasonable approach. Swanson and White (1997) point out
the importance of rolling estimation in econometric modeling and our results confirm it.
The next section is concerned with the distributions that will be considered as possible models for the marginals of
the returns and the methodology for evaluating and comparing their fit to data. The following section gives a
discussion on the marginal VaR and the different approaches for its estimation. Moreover, it outlines the
backtesting method, which will be used for evaluation and the rolling windows methodology to be used for improving
future VaR estimates. The following two sections present the empirical study performed to support our methodology,
the framework for comparison and evaluation of the models, and lastly the successful models that are recommended
from this work. The last section summarizes findings and draws conclusions.
Modeling methodology for marginal distributions
The marginal distribution function of a time series $\{X_t,\ t = 1, \dots, T\}$ can be thought of as
$$F_t(x) = P(X_t \le x) = F(\infty, \dots, \infty, x, \infty, \dots, \infty)$$
where $F(x_1, \dots, x_t, \dots, x_T)$ is the joint probability distribution function of the series. In practice, the marginal
distribution of a time series is approximated by neglecting the dependence structure that may exist in the data.
However, as it will be explained later with the empirical data of eight stock index returns, the assumption of
independence is not always invalid.
As indicated by many researchers the marginal distributions of stock and stock index returns exhibit heavy-tails
and leptokurtosis. Therefore in this section, we give a brief overview of the Normal mixtures and the GH distribution,
which are flexible enough and may approximate effectively leptokurtic data with heavy tails. Moreover, these
distributions depend on five or more parameters, thus satisfy a requirement posed in Cont (2001), according to
which a distribution model needs at least 4 parameters in order to describe satisfactorily stock returns.
The assumption that stock returns may follow mixtures of distributions rather than a single distribution has provided
alternative approaches to our problem (Kon, 1984; Behr and Poetter, 2009). To define the mixtures, let N be a set
of n univariate normal distributions with probability density functions $f_i(x)$ and a set of weights $w_i$, one for each
distribution, with $\sum_{i=1}^{n} w_i = 1$. The density function of the mixture of the n distributions is given by:
$$f(x) = \sum_{i=1}^{n} w_i f_i(x)$$
The vector $\theta$ of the parameters for the new distribution is of the form $\theta = (w_i; \mu_i; \sigma_i),\ i \in N$, where $\mu_i$ and
$\sigma_i^2$ are the location and dispersion parameters, respectively. In this paper we consider fitting mixtures of two and
three normal distributions. Since $\sum_{i=1}^{n} w_i = 1$, our models will carry five and eight parameters, respectively. For fitting a
mixture of normal distributions to data we employ the EM algorithm, which may provide Maximum Likelihood
estimates for the parameters through an iterative procedure (Dempster et al. 1977; McLachlan and Peel 2000).
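The EM iteration for a two-component normal mixture can be sketched as follows. This is only an illustration under stated assumptions, not the routine used in the study: the function names, the starting values (lower and upper halves of the sorted sample) and the fixed number of iterations are all hypothetical choices, and the data are synthetic.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normal_mixture(data, n_iter=100):
    """Run EM for a two-component normal mixture; return (weights, means, sds)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Assumed starting values: lower and upper halves of the sorted sample.
    weights = [0.5, 0.5]
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    centre = sum(xs) / n
    sd0 = math.sqrt(sum((x - centre) ** 2 for x in xs) / n)
    sds = [sd0, sd0]
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp0 = []
        for x in data:
            d0 = weights[0] * normal_pdf(x, means[0], sds[0])
            d1 = weights[1] * normal_pdf(x, means[1], sds[1])
            resp0.append(d0 / (d0 + d1) if d0 + d1 > 0.0 else 0.5)
        # M-step: weighted updates of the weight, mean and sd of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            nk = max(sum(resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r * x for r, x in zip(resp, data)) / nk
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = math.sqrt(max(var, 1e-12))
    return weights, means, sds


rng = random.Random(0)
# Synthetic "returns": a calm regime mixed with a more volatile one.
sample = [rng.gauss(0.0, 1.0) for _ in range(800)]
sample += [rng.gauss(0.0, 3.0) for _ in range(200)]
w, mu, sd = fit_two_normal_mixture(sample)
print("weights:", w, "means:", mu, "sds:", sd)
```

The fitted model has five free parameters (one weight, two means, two standard deviations), which matches the count given above for the two-component case. EM may stop at a local maximum, so in practice several starting points would normally be tried.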
The GH distribution, on the other hand, was introduced by Barndorff-Nielsen (1977). It is a normal variance-mean
mixture with a continuous mixing distribution and carries five parameters, μ for the location, α for the shape, β for
the skewness, λ for the kurtosis and δ for scaling. If the random variable X follows a GH distribution, then the
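probability density function takes the form:
$$f_{GH}(x; \lambda, \alpha, \beta, \delta, \mu) = k\,\{\delta^2 + (x-\mu)^2\}^{\frac{1}{2}(\lambda - \frac{1}{2})}\, K_{\lambda - \frac{1}{2}}\!\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right) e^{\beta(x-\mu)}$$
where
$$k = \frac{(\alpha^2 - \beta^2)^{\lambda/2}}{\sqrt{2\pi}\,\alpha^{\lambda - \frac{1}{2}}\,\delta^{\lambda}\,K_{\lambda}\!\left(\delta\sqrt{\alpha^2 - \beta^2}\right)}$$
and
$$K_{\lambda}(t) = \frac{1}{2}\int_0^{\infty} x^{\lambda - 1} e^{-\frac{1}{2}t(x + x^{-1})}\,dx, \quad t > 0,$$
is the modified Bessel function of the third kind with index λ.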
Certain subclasses of this very flexible family of distributions have been used extensively in Finance. These are the
Hyperbolic distribution, arising when λ = 1, and the Normal Inverse Gaussian distribution, when λ = -1/2. These distributions
have provided successful models for the marginal distribution of stock returns, as well as parametric risk models for
calculating VaR. For this work we consider the whole family of the GH distributions, which requires estimation of
all five parameters using the Maximum Likelihood procedure. Since there is no analytic solution for the resulting
system, the estimation of the parameters is performed numerically.
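To make the GH density concrete, the following sketch evaluates it using only the Python standard library. The Bessel function is computed by the trapezoid rule on the equivalent form K_λ(t) = ∫_0^∞ cosh(λu) exp(-t cosh u) du, obtained from the integral above by substituting x = exp(u). The function names, the number of quadrature steps and the parameter values are assumptions made for illustration; a production implementation would rely on a tested special-function library.

```python
import math


def bessel_k(lam, t, n_steps=500):
    """Approximate K_lam(t), t > 0, by the trapezoid rule in the variable u."""
    # Beyond u_max the factor exp(-t*cosh(u)) is negligible (about exp(-700)).
    u_max = math.acosh(max(700.0 / t, 2.0))
    h = u_max / n_steps
    total = 0.5 * math.exp(-t)  # endpoint u = 0, where cosh(0) = 1
    for i in range(1, n_steps + 1):
        u = i * h
        term = math.cosh(lam * u) * math.exp(-t * math.cosh(u))
        total += term if i < n_steps else 0.5 * term
    return h * total


def gh_pdf(x, lam, alpha, beta, delta, mu):
    """GH density; requires alpha > |beta| and delta > 0."""
    gamma = math.sqrt(alpha * alpha - beta * beta)
    q = math.sqrt(delta * delta + (x - mu) ** 2)
    k = gamma ** lam / (
        math.sqrt(2.0 * math.pi)
        * alpha ** (lam - 0.5)
        * delta ** lam
        * bessel_k(lam, delta * gamma)
    )
    return k * q ** (lam - 0.5) * bessel_k(lam - 0.5, alpha * q) * math.exp(beta * (x - mu))


# Hypothetical parameter values: lam = 1 gives the Hyperbolic subclass,
# lam = -0.5 the Normal Inverse Gaussian.
for x in (-2.0, 0.0, 2.0):
    print(x, gh_pdf(x, 1.0, 1.5, 0.3, 1.0, 0.0), gh_pdf(x, -0.5, 1.5, 0.3, 1.0, 0.0))
```

A quick sanity check on such a sketch is that the density integrates to approximately one over a wide grid, and that K with index 1/2 agrees with its closed form sqrt(π/(2t)) exp(-t).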
Evaluation and comparison of the distributions considered will be performed in two ways; first in terms of their
fit to data and second in terms of their capacity in estimating risk. The goodness of fit for all suggested distributions
will be measured by the maximum distance between analytic and empirical distribution over all points in the sample
(Kolmogorov-Smirnov statistic) and by the corresponding Euclidean distance:
$$KS = \max \left| p_{\text{empirical}} - p_{\text{analytic}} \right|$$
$$\text{Euclidean Distance} = \sqrt{\sum \left( p_{\text{empirical}} - p_{\text{analytic}} \right)^2}$$
For our study the statistics KS and Euclidean distance were used only as measures to compare and rank the
models and not as statistical test functions for testing hypotheses. The Euclidean distance, specifically, was employed
because it takes into account the distance between the empirical and the theoretical model at all points, and not only the
maximum distance as the KS statistic does. The comparison of the models on the basis of their risk estimation
capacity is discussed in the next section.
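Both distance measures can be computed in a few lines. The sketch below assumes that the empirical probability at the i-th smallest observation is i/n, which is one common convention and may differ from the one used in the study; the function name `fit_distances` and the synthetic data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def fit_distances(sample, cdf):
    """Return (KS, Euclidean distance) between the empirical and a fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Difference between empirical and analytic probabilities at each data point.
    diffs = [(i + 1) / n - cdf(x) for i, x in enumerate(xs)]
    ks = max(abs(d) for d in diffs)
    euclid = math.sqrt(sum(d * d for d in diffs))
    return ks, euclid


rng = random.Random(0)
data = [rng.gauss(0.0, 1.5) for _ in range(1000)]
# Compare a well-specified and a misspecified normal model on the same data.
print(fit_distances(data, NormalDist(0.0, 1.5).cdf))
print(fit_distances(data, NormalDist(0.0, 1.0).cdf))
```

Because the Euclidean distance sums over all points, it grows with the sample size, so it is meaningful mainly for ranking models fitted to the same sample.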
Value at Risk Estimation
In this section we suggest additional comparisons of the probability models in terms of their ability to forecast
rare and damaging events, which are described by their tail behavior, thus leading to risk estimation. In Economics
and Finance, the long position Value at Risk (VaR) is the maximum loss not to be exceeded with a given probability,
defined as risk level, over a given period of time. If $Y_t = p_t - p_{t-k}$ denotes the change of a portfolio value during a
k-day horizon, the marginal long position VaR at the 100α% risk level is then given by $P(Y_t \le VaR) = \alpha$, which
means that the VaR can be identified as the α-quantile of the marginal distribution of $Y_t$. According to Gourieroux
and Jasiac (2009), the marginal VaR can be estimated using either a parametric model by $VaR = F^{-1}(\alpha)$, where
$F(x)$ is the distribution function of the returns, or a non-parametric method, where estimation is based exclusively
on empirical quantiles using past data (historical simulation), or methods where the objective is to find extreme
quantiles using a parametric model for the tail only.
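For a fitted normal mixture the quantile has no closed form, so the parametric VaR has to be obtained by inverting the distribution function numerically. The sketch below does this by bisection; the function names, the bracketing rule (12 standard deviations around the component means) and the parameter values are assumptions for illustration only.

```python
from statistics import NormalDist


def mixture_cdf(x, weights, means, sds):
    """Distribution function of a normal mixture evaluated at x."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))


def mixture_var(alpha, weights, means, sds, n_iter=100):
    """Solve F(VaR) = alpha by bisection, i.e. VaR = F^{-1}(alpha)."""
    lo = min(m - 12.0 * s for m, s in zip(means, sds))
    hi = max(m + 12.0 * s for m, s in zip(means, sds))
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical two-component model: a calm and a volatile regime.
w, mu, sd = [0.8, 0.2], [0.0, 0.0], [1.0, 3.0]
print("5% VaR:", mixture_var(0.05, w, mu, sd))
print("1% VaR:", mixture_var(0.01, w, mu, sd))
```

With these illustrative parameters the 1% quantile of the mixture lies further in the tail than that of a single normal with the same overall variance, which is the heavy-tail effect the parametric models are meant to capture.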
The historical method is conceptually the simplest one for estimating VaR. It is based on the assumption that
from a risk perspective history will repeat itself. To apply the method, we need the actual historical returns and
calculate the corresponding empirical quantiles, which serve as VaR estimates at the desired levels. The most
widely used technique for this task is historical simulation, where the data are separated into time windows and an
overall VaR measure is finally obtained. The sample quantiles can be estimated by several different algorithms (Hyndman
and Fan 1996). In this paper, instead of the usual historical simulation, we applied bootstrap resampling with 1000
samples to obtain VaR estimates. For each sample we calculate the corresponding quantile using algorithm 8 in
(Hyndman and Fan 1996) and then obtain a mean quantile which serves as the static VaR estimate.
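The bootstrap version of the historical method can be sketched as follows. The quantile function implements the type 8 definition of Hyndman and Fan (position n·p + (p + 1)/3 with linear interpolation between order statistics); the function names, the seed and the synthetic returns are assumptions, and the resampling details of the study may differ.

```python
import math
import random


def quantile_type8(sample, p):
    """Sample quantile, Hyndman and Fan definition 8 (linear interpolation)."""
    xs = sorted(sample)
    n = len(xs)
    h = n * p + (p + 1.0) / 3.0  # fractional 1-based position in the sorted sample
    j = math.floor(h)
    g = h - j
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1.0 - g) * xs[j - 1] + g * xs[j]


def bootstrap_var(returns, alpha, n_boot=1000, seed=0):
    """Mean of the alpha-quantiles over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(returns)
    quantiles = []
    for _ in range(n_boot):
        resample = rng.choices(returns, k=n)
        quantiles.append(quantile_type8(resample, alpha))
    return sum(quantiles) / n_boot


rng = random.Random(42)
returns = [rng.gauss(0.0, 1.5) for _ in range(500)]  # synthetic daily returns
print("5% VaR:", bootstrap_var(returns, 0.05))
print("1% VaR:", bootstrap_var(returns, 0.01))
```

Fixing the seed makes the estimate reproducible; averaging over resamples smooths the dependence of a single empirical quantile on a few extreme observations.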
In order to evaluate and compare the resulting VaR models, the backtesting concept is adopted in conjunction
with the Kupiec Hypothesis Test. This approach suggests using past data to test effectiveness of the VaR estimates.
The test suggested by Kupiec (1995) applies a Binomial test to determine whether the observed frequency of
exceedances of a certain value is consistent with the frequency estimated from a model. If n is the number of VaR
exceedances in a sample of size T, then n ~ Binomial (T , p) where p is the theoretical probability of VaR exceedances,
i.e. the risk level. The proportion of exceedances in the sample ideally should be equal to the theoretical probability
p, so the Kupiec test carries the following hypotheses:
$$H_0: \frac{n}{T} = p, \qquad H_1: \frac{n}{T} \neq p$$
For the test we need the number of observations (T), the number of exceedances (n), the risk level (p) and the
level of significance (α). The corresponding statistic of the test is the likelihood ratio:
$$LR = -2 \ln\left[ \frac{p^{n}(1-p)^{T-n}}{\left(\frac{n}{T}\right)^{n}\left(1 - \frac{n}{T}\right)^{T-n}} \right]$$
Under the null hypothesis LR asymptotically converges to a chi squared distribution with one degree of freedom. If
the observed value for LR exceeds the critical value of $\chi^2_{(1)}$ for the preset level of significance α, then the null
hypothesis is rejected.
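The likelihood ratio and its asymptotic p-value are straightforward to compute. In the sketch below the function names are hypothetical; the p-value uses the identity that, for one degree of freedom, the chi squared survival function at x equals erfc(sqrt(x/2)), and the convention 0·log 0 = 0 handles samples with no exceedances.

```python
import math


def kupiec_lr(n_exceed, n_obs, p):
    """Kupiec likelihood ratio for n_exceed exceedances in n_obs observations."""

    def loglik(q):
        # log of q^n (1 - q)^(T - n), using the convention 0 * log(0) = 0
        ll = 0.0
        if n_exceed > 0:
            ll += n_exceed * math.log(q)
        if n_obs - n_exceed > 0:
            ll += (n_obs - n_exceed) * math.log(1.0 - q)
        return ll

    return -2.0 * (loglik(p) - loglik(n_exceed / n_obs))


def kupiec_pvalue(lr):
    """Asymptotic p-value from the chi squared distribution with one df."""
    return math.erfc(math.sqrt(max(lr, 0.0) / 2.0))


# Hypothetical backtest: 20 exceedances of the 5% VaR in 250 days.
lr = kupiec_lr(20, 250, 0.05)
print("LR =", lr, "p-value =", kupiec_pvalue(lr))
```

When the observed proportion n/T equals p exactly, the statistic is zero and the p-value is one; the further the proportion moves from p in either direction, the larger LR becomes.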
The backtesting procedure, discussed previously, is a method that can be used for the evaluation of VaR models.
Its application requires a sample of size T with past data. This may be the data used for the fitting process,
in which case we have in-sample evaluation, or a completely new data set for an out-of-sample evaluation. While
the former is the most common strategy, the latter is also crucial because it indicates whether the model is
appropriate for future real data as well.
When a static approach is followed, however, VaR estimates are projected to the future and this is not always
safe. The possibility of a failure in such projections has been criticized previously and rolling estimation has been
proposed by the Basel Committee. This procedure is more time consuming and computationally more intense; however,
as one may see from this study, it improves in most cases the out-of-sample performance of VaR estimates. For
implementing this step, rolling windows are used to update the parameters of the distributions. The
procedure starts with a window of size T, the estimation of the first VaR and its comparison to observation T+1.
Next, the oldest observation is deleted from the dataset, the newest is added and the step of the VaR model estimation is
repeated. The alternation of estimation and comparison is repeated for the whole period for which the out-of-sample
evaluation is performed. At the end of this procedure the Kupiec test is applied to test whether the number of exceedances of
the rolling VaRs is as expected. A critical question in planning such a procedure is the size of the window;
Basel Committee standards refer to a size of at least 200 observations (Gourieroux and Jasiac, 2009). In our
study we experimented with windows of 250, 500, 750 and 1000 observations.
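The rolling procedure described above can be sketched as a simple loop. For brevity the sketch re-estimates the VaR with an empirical quantile (Hyndman and Fan type 8) in each window; refitting a parametric model in each window would follow the same pattern. Function names, the seed and the synthetic data are assumptions.

```python
import math
import random


def quantile_type8(sample, p):
    """Sample quantile, Hyndman and Fan definition 8."""
    xs = sorted(sample)
    n = len(xs)
    h = n * p + (p + 1.0) / 3.0
    j = math.floor(h)
    g = h - j
    if j < 1:
        return xs[0]
    if j >= n:
        return xs[-1]
    return (1.0 - g) * xs[j - 1] + g * xs[j]


def rolling_var_backtest(returns, window, alpha):
    """Return (exceedances, comparisons) for one-step-ahead rolling VaR."""
    exceedances = 0
    comparisons = 0
    for t in range(window, len(returns)):
        # Estimate VaR from the most recent `window` observations only.
        var_t = quantile_type8(returns[t - window:t], alpha)
        # Compare with the next observation, then slide the window by one.
        if returns[t] < var_t:
            exceedances += 1
        comparisons += 1
    return exceedances, comparisons


rng = random.Random(7)
returns = [rng.gauss(0.0, 1.5) for _ in range(1500)]  # synthetic daily returns
for size in (250, 500, 750, 1000):
    print(size, rolling_var_backtest(returns, size, 0.05))
```

The resulting pair of counts is exactly the input (n, T) required by the Kupiec test, so the two steps can be chained for each window size.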
Empirical Study and Evaluation of Models
The dataset used for this study consists of daily logarithmic returns ($r_t = 100(\log p_t - \log p_{t-1})$) of eight stock indices
from countries that use the Euro as their official currency. The data were obtained from the official stock market
websites (for Finland, Montenegro and Slovakia), from the Wolfram Research database (for Austria, Germany
and France) or from the webpage www.stockwatch.cy.com (for Greece and Cyprus). The analysis, estimation of
parameters and model building were performed using a 5-year time frame, specifically from January 2006 to
December 2010. The evaluation and comparison of the extracted models were performed either in-sample with the
same data as those used for estimation or out-of-sample with a different set, from January 2011 to May 2012. Lastly,
for the analysis that is presented next the software packages R, Mathematica, Excel and STATA were used.
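As a small illustration of the return definition, the percentage logarithmic returns can be computed from a price series as below. The function name and the price values are hypothetical.

```python
import math


def log_returns(prices):
    """Daily percentage log returns: 100 * (log p_t - log p_{t-1})."""
    return [100.0 * (math.log(b) - math.log(a)) for a, b in zip(prices, prices[1:])]


closing_prices = [100.0, 110.0, 99.0, 99.0]  # made-up index levels
print(log_returns(closing_prices))
```

A series of T prices yields T - 1 returns, and log returns over consecutive days add up to the log return over the whole period.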
Table 1. European Stock Market Indices
Index              Country       S&P Rating   S&P Outlook
ATX                Austria       AAA          Stable
DAX                Germany       AAA          Stable
OMX Helsinki 25    Finland       AAA          Stable
CAC 40             France        AAA          Stable
SAX                Slovakia      A+           Positive
CSE                Cyprus        BBB+         Negative
ASE                Greece        CC           Negative
MONEX20            Montenegro    BB           Negative
Table 1 displays the stock market indices that were considered in this study and the country each represents. The
last two columns display the credit rating of each country and an assessment of its future prospects
for economic development, as announced by Standard & Poor's in their 2010 list. By
considering countries of different creditworthiness we can draw wider conclusions about the effectiveness of
our modeling schemes, since credit rating and future prospects may affect stock market behavior.
Table 2 presents the descriptive statistics for each index, specifically the number of observations, the sample
mean, standard deviation, range, skewness and kurtosis. We observe that the sample mean is very close to zero for
all indices; in fact none of the true means is significantly different from zero at a 5% significance level. On the
other hand, CSE displays the highest standard deviation, followed by ATX. In terms of skewness, SAX is the only
index with skewness clearly different from zero, so one may assume that neither symmetry nor asymmetry is
necessarily a characteristic of returns. Lastly, in terms of kurtosis all indices are clearly leptokurtic, with kurtosis
much larger than 3.
Table 2. Descriptive statistics for the eight stock market indices (period 2006-2010)
INDEX      Obs.   Mean       St. Dev   Min         Max        Skewness   Kurtosis
ATX        1239   -0.01882   2.01192   -10.25260   12.02104   -0.19325   7.59691
DAX        1269   0.01936    1.55623   -7.43346    10.79747   0.16424    10.46307
OMXH 25    1256   0.01059    1.68434   -8.90540    9.28556    0.07006    6.56828
CAC 40     1277   -0.01680   1.64270   -9.47154    10.59459   0.16273    9.90517
SAX        891    -0.06560   1.44360   -14.81009   11.88026   -1.71565   26.44706
CSE        1234   -0.03887   2.58490   -12.13529   12.12395   -0.00617   5.48760
ASE        1243   -0.07628   1.89193   -10.21404   9.11439    -0.20025   6.05895
MONEX20    1228   0.03218    1.90172   -9.70806    11.28601   0.68928    8.64852
Table 3 proceeds with a number of hypothesis tests in order to check normality, symmetry, kurtosis, stationarity
and independence. For this purpose the statistical tests Jarque-Bera, D’Agostino, Anscombe-Glynn, Augmented
Dickey-Fuller (ADF), and Runs test, respectively, were employed. For each test the value of the test statistic and
the corresponding p-value are shown, marking with bold the hypotheses that are rejected. The Jarque-Bera test
(Trapletti and Hornik 2009) indicates that all indices display significant departures from normal distribution.
According to the D’Agostino test (Komsta and Novomestky 2007) only two indices (SAX and MONEX20) suggest
skewed distributions, while for the others a symmetric distribution is not rejected. The one-sided Anscombe-Glynn test
indicates that all indices follow leptokurtic distributions and the ADF test that the returns of all indices can be
considered stationary. Lastly, the Runs test indicates that the returns from the stock markets with credit rating A+
or higher can be assumed independent, while the CSE index is on the borderline.
Table 3. Tests for normality, skewness, kurtosis, stationarity and randomness
Index      Jarque-Bera (p-value)   D’Agostino (p-value)   Anscombe-Glynn (p-value)   ADF (p-value)       Runs Test (p-value)
ATX        1098.630 (<0.001)       -1.8246 (0.068)        10.986 (<0.001)            -33.559 (<0.001)    -0.54 (0.589)
DAX        2950.711 (<0.001)       1.5730 (0.116)         13.128 (<0.001)            -36.805 (<0.001)    0.7583 (0.448)
OMXH 25    667.3668 (<0.001)       0.6712 (0.502)         9.972 (<0.001)             -35.031 (<0.001)    -1.5243 (0.127)
CAC 40     2542.682 (<0.001)       1.5635 (0.118)         12.846 (<0.001)            -38.542 (<0.001)    0.14 (0.889)
SAX        20847.11 (<0.001)       -9.8858 (<0.001)       14.887 (<0.001)            -31.094 (<0.001)    1.3073 (0.191)
CSE        318.1825 (<0.001)       -0.0586 (0.953)        8.377 (<0.001)             -31.711 (<0.001)    -1.6452 (0.099)
ASE        492.9298 (<0.001)       -1.8924 (0.058)        9.272 (<0.001)             -32.435 (<0.001)    -3.2064 (0.001)
MONEX20    1729.748 (<0.001)       5.9468 (<0.001)        11.804 (<0.001)            -24.692 (<0.001)    -5.2529 (<0.001)
As a last step of this exploratory analysis and in order to draw conclusions about the stability of moments with
respect to the sample size, samples of increasing size were used to calculate standard deviation, skewness and kurtosis.
Fig. 1 presents the fluctuation observed for each index in the values of these statistics as the sample size
changes. The graphs indicate that after approximately 1000 observations, or sometimes fewer, the moments converge to a
certain value, with the exception of MONEX20, for which the 1st and 3rd moments converge after 1200 observations.
This strengthens our belief in finite moments in the corresponding populations and at the same time indicates that
the samples we used in the analysis are large enough to study the marginal (long-run) distributions.
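The stability check can be reproduced by recomputing the moment statistics on the first N observations for increasing N. The sketch below uses plain moment estimators (divisor N, kurtosis not in excess form), which is an assumption since the exact estimators are not stated; function names and data are hypothetical.

```python
import math
import random


def sample_moments(xs):
    """Return (standard deviation, skewness, kurtosis) from central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def expanding_moments(returns, start=30):
    """Moments of the first N observations for N = start, ..., len(returns)."""
    return [(n,) + sample_moments(returns[:n]) for n in range(start, len(returns) + 1)]


rng = random.Random(3)
returns = [rng.gauss(0.0, 1.5) for _ in range(300)]  # synthetic daily returns
path = expanding_moments(returns, start=50)
print(path[0])
print(path[-1])
```

Plotting each statistic against N gives curves of the kind summarized in Fig. 1; a curve that keeps drifting as N grows would cast doubt on the finiteness of the corresponding population moment.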
Next, we proceed with the modeling of the marginal distributions for each index. For comparison reasons and
due to its prominent position in Finance, we fit the Normal distribution along with the others reviewed previously.
Given a set of data, the parameter estimation is performed using the recommended estimation method which is the
MLE for the GH distribution (Wuertz 2010), and the EM algorithm for the mixtures of Normal distributions
(Benaglia et al. 2009). Tables 11 and 12 in the Appendix give the estimated values for all parameters.
After the fitting process the resulting models were evaluated and compared against each other using the
Kolmogorov-Smirnov (KS) statistic and Euclidean distance. Table 4 summarizes the results of the two statistics in
two different ways. (a) It displays the average values of the two statistics over all indices, provides the ranking of
the alternative distributions according to the two distances, and records the number of indices for which each
distribution was the best model (best fit). (b) It gives the average values of the two statistics separately for the two
groups of countries, those with rating AAA or A+ and those with rating less than A.
Fig. 1 Sample moments as a function of the sample size (N = 1 to 1200)
Table 4(a). In-sample average performance across all eight indices
Distribution   Avg K-S statistic   Avg Eucl. Dist.   Rank: K-S (E.D.)   Best Fits
Normal         0.0879              1.8481            4 (4)              0
2- Normal      0.0254              0.3616            3 (3)              0
3- Normal      0.0199              0.2503            2 (2)              4
GH             0.0155              0.2034            1 (1)              4
Table 4(b). In-sample average performance across the two groups of countries
KS Statistic      Countries ≥ A+   Countries < A
Normal            0.091862         0.081257
2-Normal Mix      0.028365         0.020538
3-Normal Mix      0.019518         0.020732
GH                0.016945         0.012992
Euclidean Dist.   Countries ≥ A+   Countries < A
Normal            1.906828         1.7501
2-Normal Mix      0.404716         0.289828
3-Normal Mix      0.220114         0.300547
GH                0.220981         0.174137
Based on the results recorded in Table 4(a), the GH distribution is the best model according to both distances,
with the 3-Normals mixture second best. According to the number of best fits, GH gives the best model
for only four indices, while the 3-Normals mixture gives the best model for the other four. This suggests that
these two probability models are very close in performance. It is known, however, that the GH model is a normal
variance-mean mixture with the generalized Inverse Gaussian as mixing distribution, and this is potentially the
reason why, although it contains fewer parameters, it still outperforms the 3-Normals mixture model on average. Moreover,
since the tests for independence performed previously failed for the last three countries (countries rated less than
A, with negative outlook), we summarized the observed values for the two statistics according to this criterion in Table 4(b).
As one may observe, the fitting performance for the two groups of indices is quite comparable and the lack of
independence does not cause any observable differences.
Risk estimation using marginal distributions
In this subsection the models that resulted from the fitting procedure compete with each other as parametric VaR
models. Based on the definition of VaR given previously, the 5% and 1% levels of long position risk correspond
to the 5% and 1% quantiles of a parametric model for the returns. Therefore, for each one of the indices VaR
estimates at the 5% and 1% risk levels were calculated using all suggested distributions with the help of R
(Mächler, 2010; Wuertz, 2010). Then it is claimed that these return values will be exceeded with probability .05
(or .01 respectively). In order to test this claim, VaRs were evaluated in-sample and compared to each other using
the Kupiec binomial test. A summary of the results is displayed in Table 5, where the average LR statistic is
used to rank the models.
Table 5. In-sample comparison of VaR models using the Kupiec binomial test
Distributions            Average LR Statistic      Ranking                   Rejections at
                         5% level (1% level)       5% level (1% level)       5% level (1% level)
Normal                   0.989 (8.443)             3 (4)                     0 (8)
2-Normals Mixture        1.787 (1.093)             4 (3)                     1 (1)
3-Normals Mixture        0.230 (0.130)             1 (1)                     0 (0)
Generalized Hyperbolic   0.236 (0.474)             2 (2)                     0 (0)
The results in Table 5 reveal that VaR estimates from the 3-Normals mixture and GH are not rejected in any of
the 16 tests performed; estimates from the 2-Normals mixture were rejected once at each level, and those from the
Normal were rejected for all eight indices at the 1% risk level. Moreover, based on the average LR value the 3-Normals
mixture and the GH distributions are almost equivalent in performance, with a slight precedence of the former.
Next, the study continues with an out-of-sample evaluation and comparison, where the VaR estimates developed
previously were evaluated with a new dataset from January 2011 to May 2012. In addition to the previously discussed
parametric models, we also included the historical method, which is adopted as an alternative non-parametric
method for VaR estimation through the calculation of sample quantiles. Again, the measure was the LR statistic
from the Kupiec binomial test. The results are displayed in Table 6.
Table 6. Out-of-sample comparison of VaR models using the Kupiec binomial test
Distributions    Average LR Statistic   Ranking          Rejections at
                 5% (1%) level          5% (1%) level    5% (1%) level
Normal           8.434 (10.762)         3 (5)            4 (4)
2-Normals Mix    9.547 (3.315)          5 (4)            6 (2)
3-Normals Mix    8.396 (2.589)          2 (2)            4 (1)
GH               7.469 (3.159)          1 (3)            3 (2)
Historical       8.487 (2.565)          4 (1)            5 (1)
Table 6 displays the average LR statistic over all indices and the corresponding ranking of the models, as well as
the number of rejections observed according to the Kupiec test. Based on these summarized results, for the 5%
risk level the GH distribution gives the lowest average LR statistic, followed by the 3-normals mixture. For the
1% risk level, however, the historical method gives the lowest average LR statistic, followed closely by the 3-normals
mixture. In terms of the number of rejections none of the models prevails, even though one may notice that the GH
and the 3-normals mixture carry the lowest cumulative number of rejections over both levels.
As a last step for our study we experimented with rolling windows for estimating VaR (Zeileis and Grothendieck,
2005). Rolling windows have been selected here to capture the time dynamics of the economy, which
sometimes make past returns unrepresentative of the future. For this technique, however, the
window must be long enough to estimate the parameters of the models accurately. Four alternative
window lengths were adopted, 250, 500, 750, and 1000 observations, except for SAX, for which we used
windows of 250, 500, 750, and 800 observations. Our intention with this approach was to identify the best window size
for each VaR level.
Table 7. Out-of-sample evaluation of VaR models using Rolling Windows
Distributions    Average LR Statistic   Ranking          Rejections
                 5% (1%) level          5% (1%) level    5% (1%) level
Rolling Window of 250 Observations
Normal           4.823 (11.545)         3 (5)            2 (7)
2 Normal Mix     5.057 (5.383)          4 (3)            3 (3)
3 Normal Mix     4.616 (5.507)          2 (4)            4 (3)
GH               6.117 (5.322)          5 (2)            4 (3)
Historical       4.304 (3.851)          1 (1)            3 (2)
Rolling Window of 500 Observations
Normal           5.575 (8.989)          2 (5)            3 (5)
2 Normal Mix     6.507 (4.593)          4 (4)            4 (2)
3 Normal Mix     5.504 (2.992)          1 (2)            3 (1)
GH               7.243 (3.072)          5 (3)            4 (1)
Historical       5.650 (2.524)          3 (1)            3 (0)
Rolling Window of 750 Observations
Normal           2.934 (2.881)          2 (5)            2 (1)
2 Normal Mix     3.949 (1.969)          5 (2)            2 (1)
3 Normal Mix     3.071 (2.822)          3 (4)            1 (2)
GH               3.507 (2.696)          4 (3)            1 (2)
Historical       2.926 (1.748)          1 (1)            1 (1)
Rolling Window of 1000 Observations
Normal           3.856 (2.291)          2 (3)            3 (1)
2 Normal Mix     5.181 (3.176)          5 (5)            3 (2)
3 Normal Mix     3.792 (2.211)          1 (2)            2 (1)
GH               3.972 (2.344)          3 (4)            2 (2)
Historical       4.080 (0.974)          4 (1)            2 (0)
Table 7 presents the summarized results from the evaluation with rolling windows. Based on the average LR
statistic, for the 5% level a window size of 750 observations seems more appropriate, since all models attain
their minimum there. Similarly, for the 1% level, the size of 1000 observations seems more appropriate for all models
except for the 2 Normal Mixture. It is worth noting that although the Normal distribution was not suggested as
a well fitted model for the returns, using a suitable rolling window size may lead to competitive VaR estimates.
Specifically, using windows of 750 observations for the 5% VaR and 1000 for the 1% VaR, the Normal distribution
fails only in 2 and 1 indices, respectively. Lastly, the historical method appears to achieve quite accurate VaR values
and outperforms the parametric models according to the average LR statistic and the number of rejections.
Given the previous results, Tables 8 and 9 give in detail the LR statistic and p-value for each index, with the
intention of drawing conclusions about the performance in each separate market. For each model the number of rejections
of the Kupiec test at the significance level of 0.01 is recorded.
Table 8: Kupiec test for the 5% risk level VaRs - Rolling window 750 observations
Index                Normal            2 Normal Mix       3 Normal Mix       GH                 Historical
                     (p-value)         (p-value)          (p-value)          (p-value)          (p-value)
ATX                  2.0683 (0.150)    4.0202 (0.045)     2.9503 (0.086)     2.9503 (0.086)     4.0203 (0.045)
DAX                  0.4494 (0.503)    0.1924 (0.661)     0.8075 (0.369)     0.4494 (0.503)     0.8075 (0.369)
OMXH 25              1.4392 (0.230)    1.4392 (0.230)     0.9487 (0.330)     1.4392 (0.230)     0.5551 (0.456)
CAC 40               0.1924 (0.661)    0.1924 (0.661)     0.1924 (0.661)     0.1924 (0.661)     0.0413 (0.839)
SAX                  8.6247 (0.003)    8.6247 (0.003)     4.0652 (0.044)     5.3472 (0.021)     2.9893 (0.084)
CSE                  6.6979 (<0.01)    11.5983 (<0.01)    11.5983 (<0.01)    12.9955 (<0.01)    11.5983 (<0.01)
ASE                  0.3164 (0.574)    2.8693 (0.090)     2.1723 (0.141)     2.8693 (0.090)     1.5647 (0.211)
MONEX20              3.6675 (0.055)    2.6451 (0.104)     1.8108 (0.178)     1.8108 (0.178)     1.8108 (0.178)
Reject at α = 0.01   2                 2                  1                  1                  1
Table 9: Kupiec test for the 1% risk level VaRs - Rolling window 1000 observations
Index                 Normal            2-Normal Mix      3-Normal Mix      GH                Historical
                      (p-value)         (p-value)         (p-value)         (p-value)         (p-value)
ATX                   0.0661 (0.797)    7.0553 (<0.01)    0.7767 (0.378)    7.0553 (<0.01)    0.7767 (0.378)
DAX                   2.5041 (0.114)    2.6862 (0.101)    2.6862 (0.101)    0.0390 (0.843)    0.0390 (0.843)
OMXH 25               1.3871 (0.239)    0.8293 (0.362)    0.0972 (0.755)    0.0972 (0.755)    0.0972 (0.755)
CAC 40                1.3060 (0.253)    7.2965 (<0.01)    2.7007 (0.100)    0.8830 (0.347)    0.8830 (0.347)
SAX                   0.5561 (0.456)    0.7854 (0.376)    2.5413 (0.111)    0.0817 (0.775)    0.7854 (0.376)
CSE                   10.3564 (<0.01)   4.2856 (0.038)    8.1191 (<0.01)    8.1191 (<0.01)    2.7395 (0.098)
ASE                   1.4430 (0.229)    0.0606 (0.806)    0.0606 (0.806)    0.0606 (0.806)    0.0606 (0.806)
MONEX20               0.7084 (0.340)    2.4122 (0.120)    0.7084 (0.340)    2.4122 (0.120)    2.4122 (0.120)
Reject at α = 0.01    1                 2                 1                 2                 0
In Table 8 the 3-Normals mixture, the GH and the Historical method exhibit the best performance in terms of
number of rejections, while in Table 9 the Historical method outperforms all parametric models, since none of
the corresponding tests is significant. Among the parametric models in Table 9, the 3-Normals mixture and the
Normal perform best, each with a single rejection; the 3-Normals mixture also has a single rejection in Table 8.
Table 10. Mean LR and rejections for countries according to credit rating at 5% and 1% VaR

5% risk level VaRs
Model           Average LR for    Rejections    Average LR for    Rejections
                Countries ≥ A+                  Countries < A
Normal          2.5548            1             3.5606            1
2-Normal Mix    2.8938            1             5.7042            1
3-Normal Mix    1.7928            0             5.1938            1
GH              2.0757            0             5.8919            1
Historical      1.6827            0             4.9913            1

1% risk level VaRs
Model           Average LR for    Rejections    Average LR for    Rejections
                Countries ≥ A+                  Countries < A
Normal          1.1639            0             4.1693            1
2-Normal Mix    3.7305            2             2.2528            0
3-Normal Mix    1.7604            0             2.9627            1
GH              1.6312            1             3.5306            1
Historical      0.5162            0             1.7374            0
A last comment on the results just presented concerns the specific indices for which the rejections of the
Kupiec test occur. Interestingly, the models that perform well in VaR estimation perform even better for the
indices of countries with a credit rating of A+ or higher. Table 10 further summarizes the results of
Tables 8 and 9 according to the credit rating of the countries. One may observe that all models achieve a
smaller average LR statistic (more accurate VaR estimation) and cumulatively fewer rejections for the countries
with ratings greater than A. The only exception is the 2-Normals mixture at the 1% risk level, where this pattern
does not hold. However, since the data from countries with a rating lower than A showed significant
dependencies, the parameter estimates may be biased and the corresponding results not very reliable.
Moreover, for countries with a credit rating greater than A the historical method is the best at both risk levels,
and the Normal distribution follows as second best at the 1% risk level only, an observation suggesting that both
are good choices for countries in this category. Lastly, comparing the 3-Normals mixture with the GH, our study
suggests that the former prevails, since it achieves zero rejections for these countries at both risk levels.
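The aggregation behind Table 10 can be illustrated with the Normal model at the 5% risk level. The sketch below uses the LR values reported in Table 8; the split of the indices into the two rating groups is an assumption, inferred from the fact that it reproduces the Normal row of Table 10, and the rejection rule assumes a chi-square(1) critical value at the 0.01 level.

```python
# LR statistics of the Normal model, copied from Table 8 (5% level, window 750).
LR_NORMAL_5PCT = {
    "ATX": 2.0683, "DAX": 0.4494, "OMXH 25": 1.4392, "CAC 40": 0.1924,
    "SAX": 8.6247, "CSE": 6.6979, "ASE": 0.3164, "MONEX20": 3.6675,
}

# ASSUMPTION: grouping inferred from the arithmetic, not quoted from the paper.
HIGH_RATED = ("ATX", "DAX", "OMXH 25", "CAC 40", "SAX")
LOW_RATED = ("CSE", "ASE", "MONEX20")

# 99th percentile of the chi-square(1) distribution (test level 0.01).
CHI2_1_CRITICAL_001 = 6.6349


def summarize(lr_by_index, names):
    """Return (average LR, number of rejections) for the given indices."""
    values = [lr_by_index[name] for name in names]
    mean_lr = sum(values) / len(values)
    rejections = sum(v > CHI2_1_CRITICAL_001 for v in values)
    return mean_lr, rejections


high = summarize(LR_NORMAL_5PCT, HIGH_RATED)
low = summarize(LR_NORMAL_5PCT, LOW_RATED)
print(f"high rated: average LR = {high[0]:.4f}, rejections = {high[1]}")
print(f"low rated:  average LR = {low[0]:.4f}, rejections = {low[1]}")
# prints average LR 2.5548 with 1 rejection, then 3.5606 with 1 rejection
```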
Summary and conclusions
The aim of this paper was to identify probability models suitable for modeling the marginal distributions of stock
index returns and to employ them for risk estimation. We compared mixtures of two and three normal distributions
and the GH distribution with the plain Normal. The models were fitted to eight European stock indices, and
VaR was estimated using the chosen distributions as parametric models. The models were compared both for their
fit as marginal distributions and as VaR estimation models. The first comparison used the KS statistic
and the Euclidean distance, while the second used backtesting and the Kupiec binomial test.
According to our study, the GH distribution was the most successful model for the marginal distributions
of returns. The 3-Normals mixture, however, closely follows the GH in the fitting process and has the same
number of best fits. For the estimation of VaR the models were compared both in-sample and out-of-sample.
The in-sample comparison suggested that the 3-Normals mixture and the GH outperform all others and achieve
zero failures in the Kupiec test. In the out-of-sample comparisons, however, all models perform worse,
especially for the countries with low credit ratings, where the tests for independence were significant.
Even so, in the out-of-sample comparison the 3-Normals mixture and the GH give the smallest cumulative number
of rejections and a relatively small average LR statistic.
To improve VaR estimation, rolling window approaches were adopted with four different window lengths. This
increased the computational cost but improved on the static approach, especially when larger windows were
employed. For the 5% risk level a window of 750 observations was selected as best and indicated that the 3-Normals
mixture, the GH and the historical method prevail over the other two distributions. For the 1% risk level a
window of 1000 observations was selected as best and indicated that most parametric models improve their VaR
estimation, achieving a lower LR statistic and fewer rejections; this is true for the historical
method too. Again, for the countries with low credit ratings practically all models provide worse
VaR estimates, even with rolling windows.
Overall, we conclude that, among the distribution families examined in our study, the GH is the
best model for fitting the returns of stock indices, while the 3-Normals mixture is a better choice for VaR
estimation, since it combines a comparatively low LR statistic with a small number of rejections.
References
Alles L, Kling J (1994) Regularities in Variation of Skewness
in Asset Return. J Financ Res XVII(3): 427–438.
Aquino R (2006) Efficiency of the Philippine Stock Market.
Appl Econ Lett, 13: 463–470
Barndorff-Nielsen OE (1977) Exponentially decreasing
distributions for the logarithm of particle size. Proc. of
the Royal Society London, 353: 401–419
Barndorff-Nielsen OE, Cox DR (1995). Inference and
Asymptotics. Chapman and Hall: London
Barndorff-Nielsen OE (1997) Normal Inverse Gaussian
distributions and the modeling of stock returns, Scand J
Stat, 24: 1–13
Bauer C (2000) Value at Risk Using Hyperbolic Distributions,
J Econ Bus 52: 455–467
Behr A, Poetter U (2009) Modeling Marginal Distributions
of Ten European Stock Market Index Returns, Int Res J
Fin Econ, 28: 104–119
Benaglia T, Chauveau D, Hunter DR, Young D (2009)
mixtools: An R Package for Analyzing Finite Mixture
Models. J Stat Softw, 32(6): 1-29
Blattberg RC, Gonedes NJ (1974) A comparison of the stable
and Student distributions as statistical models for stock
prices. J Bus 47: 244–280
Chen Y, Härdle W, Jeong S (2008) Nonparametric Risk
Management with Generalized Hyperbolic Distributions,
J Am Stat Assoc 103: 910 – 923
Cont R (2001) Empirical properties of asset returns: stylized
facts and statistical issues. Quant Financ 1: 223–236.
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J
Roy Stat Soc B Met 39(1): 1-38
Doric D and Doric EN (2011) Return Distribution and Value
at Risk Estimation for BELEX15. YUJOR 21(1): 103-118
Eberlein E, Keller U (1995) Hyperbolic distributions in
finance. Bernoulli 1(3): 281–299
Embrechts P, Klueppelberg C, Mikosch T (1997) Modeling
Extremal Events. Springer-Verlag, Berlin Heidelberg
New York
Fama EF (1965) The behavior of stock market prices. J
Bus 38: 34–105
Gourieroux C, Jasiak J (2010) Value at Risk. In Ait-Sahalia
Y, Hansen LP (eds) Handbook of Financial Econometrics,
Vol. 1, Tools and Techniques, North-Holland UK pp
553-615.
Harvey CR, Siddique A (2000) Conditional Skewness in Asset
Pricing Tests. J Financ 55: 1263–1295
Kon SJ (1984) Models of stock returns - A comparison. J
Financ 39:147–165
Komsta L, Novomestky F (2007) moments: Moments,
cumulants, skewness, kurtosis and related tests. R
package version 0.11. http://www.r-project.org,
http://www.komsta.net/
Kupiec P, (1995) Techniques for Verifying the Accuracy of
Risk Management Models. J Deriv 3:73–84
Mächler M (2010) nor1mix: Normal (1-d) Mixture Models
(S3 Classes and Methods). R package version 1.1-2.
http://CRAN.R-project.org/package=nor1mix
Mandelbrot B (1963) The variation of certain speculative
prices. J Bus 36: 394–419
McLachlan GJ, Peel, D (2000) Finite Mixture Models. John
Wiley & Sons, Inc.
McNeil AJ, Frey R, Embrechts P (2005) Quantitative Risk
Management: Concepts, Techniques and Tools, Princeton
University Press, Princeton and Oxford.
Morgan JP (1995) RiskMetrics. Third Edition, Morgan
Necula C (2009) Modeling Heavy-Tailed Stock Index Returns
Using The Generalized Hyperbolic Distribution. Rom J
of Econ Forecast 10(2): 118–131
Onour I (2010) Extreme Risk and Fat-Tails Distribution
Model: Empirical Analysis. J Money Investment and
Banking, 13: 27–34
Önalan O (2010) Financial Risk Management with Normal
Inverse Gaussian Distribution, International Research
Journal of Finance and Economics, Issue 38: 104–115
Peters EE (1991) Chaos and Order in the Capital Markets.
John Wiley & Sons, New York
Praetz PD (1972) The distribution of share price changes. J
Bus 45: 49–55
Prause K (1997) Modeling financial data using generalized
hyperbolic distributions. FDM Preprint, 48, University
of Freiburg
Rachev Z, Racheva-Iotova B, Stoyanov S (2010) Capturing
fat tails, Risk magazine, May 2010, 76-80.
Swanson NR, White H (1997) Forecasting Economic Time
Series Using Flexible Versus Fixed Specification and
Linear versus Nonlinear Econometric Models, Int J
Forecasting 13: 439–461
Trapletti A, Hornik K (2009) tseries: Time Series Analysis
and Computational Finance, R package version 0.10-29
Venkataraman S (1997) Value at risk for a mixture of normal distributions: the use of quasi-Bayesian estimation
techniques, Economic Perspectives, Fed. Reserve Bank
of Chicago, March/April 1997, 2–13
Wuertz D (2010) Rmetrics core team members, uses code
builtin from the following R contributed packages:
gmm from Pierre Chaussé, gld from Robert King, gss
from Chong Gu, nortest from Juergen Gross,
HyperbolicDist from David Scott, sandwich from
Thomas Lumley, Achim Zeileis and fortran/C code
from Kersti Aas. fBasics: Rmetrics - Markets and
Basic Statistics. R package version 2110.79.
http://CRAN.R-project.org/package=fBasics
Zeileis A, Grothendieck G (2005) zoo: S3 Infrastructure for
Regular and Irregular Time Series. J Stat Softw, 14:1-27
Appendix
We provide tables with supplementary computations carried out during the analysis. Specifically, Tables 11 and 12
give the estimated parameters for the Normal mixtures and the Generalized Hyperbolic distribution, respectively.
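To indicate how estimated mixture parameters translate into a VaR figure, the following minimal sketch computes a quantile of a normal mixture by bisection on its distribution function. The parameter values are purely illustrative and are not the estimates of Table 11; the function names are hypothetical, and the paper's own computations may have used a different numerical routine.

```python
from statistics import NormalDist


def mixture_cdf(x, weights, means, sds):
    """Distribution function of a finite normal mixture at x."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))


def mixture_quantile(level, weights, means, sds, tol=1e-10):
    """Quantile of a normal mixture, found by bisection on the mixture CDF.

    The quantile lies between the smallest and the largest component
    quantile, which gives a valid starting bracket.
    """
    component_q = [NormalDist(m, s).inv_cdf(level) for m, s in zip(means, sds)]
    lo, hi = min(component_q), max(component_q)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative two-component parameters (NOT taken from Table 11).
    weights, means, sds = (0.3, 0.7), (-0.5, 0.2), (3.0, 1.2)
    for level in (0.05, 0.01):
        q = mixture_quantile(level, weights, means, sds)
        print(f"{level:.0%} VaR as a return quantile: {q:.4f}")
```

With a single component the routine reduces to the ordinary Normal quantile, which offers a quick sanity check.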
Table 11. Parameter estimation for the Normal Mixtures (2 and 3 components)

Normal Mixture (2-components)
Index    weights           mean              St. dev.
         w1      w2        μ1      μ2        σ1      σ2
ATX      0.275   0.725     0.534   0.177     3.270   1.177
DAX      0.886   0.114     0.08    0.482     1.079   3.445
OMXH     0.284   0.716     0.214   0.099     2.664   1.058
CAC      0.857   0.143     0.062   0.487     1.098   3.367
SAX      0.735   0.265     0.066   0.431     0.546   2.618
CSE      0.495   0.505     0.032   0.046     3.451   1.242
ASE      0.388   0.612     0.341   0.092     2.724   1.031
MONEX    0.667   0.333     0.068   0.233     0.863   3.050

Normal Mixture (3-components)
Index    weights                   mean                      St. dev.
         w1      w2      w3        μ1      μ2      μ3        σ1      σ2      σ3
ATX      0.383   0.555   0.061     0.396   0.266   0.243     2.231   1.033   4.857
DAX      0.082   0.752   0.166     0.521   0.043   0.177     3.836   1.249   0.319
OMXH     0.055   0.455   0.490     0.593   0.249   0.187     3.855   1.865   0.869
CAC      0.043   0.615   0.342     0.094   0.142   0.316     4.815   0.913   1.829
SAX      0.028   0.542   0.430     2.248   0.095   0.124     5.563   0.412   1.493
CSE      0.194   0.226   0.580     0.065   0.003   0.046     4.352   0.779   2.222
ASE      0.188   0.254   0.558     0.374   0.310   0.152     3.296   0.650   1.569
MONEX    0.151   0.307   0.542     0.591   0.062   0.071     3.870   0.505   1.498
Table 12. Generalized Hyperbolic parameter estimation
Index       alpha    beta      delta    mu        lambda
ATX         0.348    -0.074    1.719    0.274     -0.735
DAX         0.406    -0.065    1.178    0.173     -0.683
OMXH 25     0.607    -0.046    1.175    0.141     -0.098
CAC 40      0.211    -0.046    1.641    0.107     -1.260
SAX         0.509    -0.111    0.292    0.130     -0.037
CSE         0.543    -0.005    0.387    -0.003    0.945
ASE         0.654    -0.084    0.809    0.218     0.463
MONEX 20    0.452    0.034     0.531    -0.089    0.124