Journal of Applied Operational Research (2014) 6(4), 207–221
© Tadbir Operational Research Group Ltd. All rights reserved. www.tadbir.ca
ISSN 1735-8523 (Print), ISSN 1927-0089 (Online)

Marginal distribution modeling and value at risk estimation for stock index returns

Christos Katris * and Sophia Daskalaki
University of Patras, Greece

Abstract. The purpose of this study is to explore probability distributions for modeling the marginal distributions of stock index returns and to further employ them for risk estimation. It is well accepted that stock returns follow heavy-tailed and leptokurtic distributions. In order to describe adequately the empirical characteristics of the stock index returns of eight European countries for the period 2006-2012, flexible models with a varying number of parameters, such as the Generalized Hyperbolic distribution and Normal mixtures, have been employed. The fit of these distributions is evaluated using the Kolmogorov-Smirnov statistic and the Euclidean distance. For risk estimation, a common tool is Value at Risk (VaR), which is estimated with the help of the selected distribution models along with the non-parametric historical method. Evaluation and comparison of the VaR models is performed using backtesting in conjunction with a binomial test, both in-sample to validate the models and out-of-sample to test forecasting performance. Lastly, rolling window strategies are employed in order to improve forecasts and identify a successful strategy.

Keywords: financial modeling; generalized hyperbolic distributions; normal mixtures; probability distributions; marginal value at risk; rolling estimation

* Received December 2013. Accepted April 2014

Introduction

Development of theoretical models for the marginal distributions of stock returns is important for many problems in the field of Finance. Previous research indicates that there are several well established empirical characteristics in financial data, i.e.
heavy tails, leptokurtosis, long memory and volatility clustering. The first two concern the marginal distributions, while the other two are dynamic characteristics affecting possible time series models. Marginal distributions have a static nature; however, they play an important role in constructing dynamic models. Theories and models developed for stock returns implicitly or explicitly set an assumption for their marginal distribution, and for many years this had been the normal distribution (e.g. the Black-Scholes option pricing model). Although the normal distribution is an attractive mathematical model, empirical studies do not always support its widespread usage (Peters 1991; Harvey and Siddique 2000; Alles and Kling 1994) and indicate alternative models as more suitable. Since 1963, researchers have suggested Paretian Stable distributions (Mandelbrot 1963; Fama 1965), a scaled t-distribution (Praetz 1972; Blattberg and Gonides 1974) and mixtures of Normal distributions (Kon 1984). More recently, Aquino (2006) fitted the Laplace distribution to data from the Philippines Stock Exchange, while Prause (1997) and Necula (2009) used the Generalized Hyperbolic (GH) family of distributions to model certain financial assets. On the other hand, subclasses of the GH distributions, e.g. the Normal Inverse Gaussian, were proposed in Barndorff-Nielsen (1995; 1997), and the plain Hyperbolic in Eberlein and Keller (1995) and Kuchler et al. (1999). Lastly, Normal mixtures along with the Hyperbolic and LogF have been used in Behr and Poetter (2009).

* Correspondence: Christos Katris, University of Patras, Electrical and Computer Engineering Department, Rio, Patras 26504 Greece. E-mail: [email protected]

In addition to serving other needs, marginal distributions may also be used for estimating the risk of a portfolio or a stock index.
They provide parametric models for Value at Risk (VaR) estimation, a common approach for estimating risk. Alternatively, VaR may be estimated using the nonparametric historical method, the RiskMetrics or Delta-Normal method (Morgan 1995; McNeil et al. 2005), a widely used technique that assumes normality for the returns, or models for the tails such as the Generalized Pareto distribution (Embrechts et al. 1997; Rachev et al. 2010). VaR can be static or time varying, leading to the distinction between marginal and conditional VaR according to Gourieroux and Jasiac (2009). In this work, we adopt marginal distributions as static VaR models and test their capability for risk identification by estimating the marginal VaR. As far as we know, Normal mixtures have been employed previously as parametric VaR models in Venkatraman (1997), while from the GH family only certain subclasses, i.e. the Normal Inverse Gaussian and the Hyperbolic distribution, have been used (Önalan 2010; Bauer 2000; Doric and Doric 2011). Chen et al. (2008), on the other hand, developed a dynamic model which is based on the GH distribution in a nonparametric framework.

In this paper, using returns of eight European stock indices, we apply fitting techniques to the data in order to obtain "good" marginal distributions for each one of them. The experimentation with marginal distributions goes further into VaR estimation, where the 5% and 1% long position VaRs are estimated. The estimates are evaluated using the Kupiec test, a Binomial test for the number of exceedances of the estimated VaRs. The suggested models are tested both in-sample and out-of-sample, since a distribution model is of value only if it proves to be effective with data that were not used for the fitting process.
Furthermore, with our empirical study we show that a parametric VaR estimation model has to be updated continuously in order to be successful with out-of-sample data, and therefore rolling windows are a reasonable approach. Swanson and White (1997) point out the importance of rolling estimation in econometric modeling and our results confirm it.

The next section is concerned with the distributions that will be considered as possible models for the marginals of the returns and the methodology for evaluating and comparing their fit to data. The following section discusses the marginal VaR and the different approaches for its estimation. Moreover, it outlines the backtesting method, which will be used for evaluation, and the rolling windows methodology to be used for improving future VaR estimates. The following two sections present the empirical study performed to support our methodology, the framework for comparison and evaluation of the models, and lastly the successful models that are recommended from this work. The last section summarizes findings and draws conclusions.

Modeling methodology for marginal distributions

The marginal distribution function of a time series $\{X_t,\ t = 1, \dots, T\}$ can be thought of as

$$F_t(x) = P(X_t \le x) = F(\infty, \dots, \infty, x, \infty, \dots, \infty),$$

where $F(x_1, \dots, x_t, \dots, x_T)$ is the joint probability distribution function of the series. In practice, the marginal distribution of a time series is approximated by neglecting the dependence structure that may exist in the data. However, as will be explained later with the empirical data of eight stock index returns, the assumption of independence is not always invalid.

As indicated by many researchers, the marginal distributions of stock and stock index returns exhibit heavy tails and leptokurtosis. Therefore, in this section we give a brief overview of the Normal mixtures and the GH distribution, which are flexible enough to approximate effectively leptokurtic data with heavy tails.
Moreover, these distributions depend on five or more parameters and thus satisfy a requirement posed in Cont (2001), according to which a distribution model needs at least 4 parameters in order to describe stock returns satisfactorily.

The assumption that stock returns may follow mixtures of distributions rather than a single distribution has provided alternative approaches to our problem (Kon, 1984; Behr and Poetter, 2009). To define the mixtures, let $N$ be a set of $n$ univariate normal distributions with probability density functions $f_i(x)$ and a set of weights $w_i$, one for each distribution, with $\sum_{i=1}^{n} w_i = 1$. The density function of the mixture of the $n$ distributions is given by:

$$f(x) = \sum_{i=1}^{n} w_i f_i(x)$$

The vector of the parameters for the new distribution is of the form $(w_i; \mu_i; \sigma_i),\ i \in N$, where $\mu_i$ and $\sigma_i^2$ are the location and dispersion parameters, respectively. In this paper we consider fitting mixtures of two and three normal distributions. Since $\sum_{i=1}^{n} w_i = 1$, our models will carry five and eight parameters, respectively. For fitting a mixture of normal distributions to data we employ the EM algorithm, which may provide Maximum Likelihood estimates for the parameters through an iterative procedure (Dempster et al. 1977; McLachlan and Peel 2000).

The GH distribution, on the other hand, was introduced by Barndorff-Nielsen (1977). It is a normal variance-mean mixture with a continuous mixing distribution and carries five parameters: μ for the location, α for the shape, β for the skewness, λ for the kurtosis and δ for scaling. If the random variable X follows a GH distribution, then the probability density function takes the form:

$$f_{GH}(x; \lambda, \alpha, \beta, \delta, \mu) = k \left\{ \delta^2 + (x - \mu)^2 \right\}^{\frac{1}{2}\left(\lambda - \frac{1}{2}\right)} K_{\lambda - \frac{1}{2}}\!\left( \alpha \sqrt{\delta^2 + (x - \mu)^2} \right) e^{\beta (x - \mu)}$$

where

$$k = \frac{(\alpha^2 - \beta^2)^{\lambda / 2}}{\sqrt{2\pi}\, \alpha^{\lambda - \frac{1}{2}}\, \delta^{\lambda}\, K_{\lambda}\!\left( \delta \sqrt{\alpha^2 - \beta^2} \right)}$$

and

$$K_{\lambda}(t) = \frac{1}{2} \int_0^{\infty} x^{\lambda - 1} e^{-\frac{1}{2} t \left( x + x^{-1} \right)}\, dx, \quad t > 0,$$

is the modified Bessel function of the third kind with index λ.

Certain subclasses of this very flexible family of distributions have been used extensively in Finance.
These are the Hyperbolic distribution, arising when λ = 1, and the Normal Inverse Gaussian distribution, when λ = -1/2. These distributions have provided successful models for the marginal distribution of stock returns, as well as parametric risk models for calculating VaR. For this work we consider the whole family of the GH distributions, which requires estimation of all five parameters using the Maximum Likelihood procedure. Since there is no analytic solution for the resulting system, the estimation of the parameters is performed numerically.

Evaluation and comparison of the distributions considered will be performed in two ways: first in terms of their fit to data and second in terms of their capacity for estimating risk. The goodness of fit for all suggested distributions will be measured by the maximum distance between the analytic and empirical distribution over all points in the sample (Kolmogorov-Smirnov statistic) and by the corresponding Euclidean distance:

$$KS = \max \left| p_{empirical} - p_{analytic} \right|$$

$$\text{Euclidean Distance} = \sqrt{\sum \left( p_{empirical} - p_{analytic} \right)^2}$$

For our study the KS statistic and the Euclidean distance were used only as measures to compare and rank the models and not as statistical test functions for testing hypotheses. The Euclidean distance, specifically, was employed because it takes into account the distance between the empirical and the theoretical model at all points, and not only the maximum distance as the KS statistic does. The comparison of the models on the basis of their risk estimation capacity is discussed in detail in the next section.

Value at Risk Estimation

In this section we suggest additional comparisons of the probability models in terms of their ability to forecast rare and damaging events, which are described by their tail behavior, thus leading to risk estimation. In Economics and Finance, the long position Value at Risk (VaR) is the maximum loss not to be exceeded with a given probability, defined as the risk level, over a given period of time.
If $Y_t = p_t - p_{t-k}$ denotes the change of a portfolio value during a k-day horizon, the marginal long position VaR at the $100\alpha\%$ risk level is then given by $P(Y_t \le VaR) = \alpha$, which means that the VaR can be identified as the α-quantile of the marginal distribution of $Y_t$. According to Gourieroux and Jasiac (2009), the marginal VaR can be estimated using either a parametric model, by $VaR = F^{-1}(\alpha)$, where $F(x)$ is the distribution function of the returns, or a non-parametric method, where estimation is based exclusively on empirical quantiles using past data (historical simulation), or methods where the objective is to find extreme quantiles using a parametric model for the tail only.

The historical method is conceptually the simplest one for estimating VaR. It is based on the assumption that, from a risk perspective, history will repeat itself. To apply the method, we need the actual historical returns and calculate the corresponding empirical quantiles, which serve as VaR estimates at the desired levels. The most widely used technique for this task is historical simulation, where the data are split into time windows and an overall VaR measure is finally obtained. The sample quantiles can be estimated by several different algorithms (Hyndman and Fan 1996). In this paper, instead of the usual historical simulation, we applied bootstrap resampling with 1000 samples to obtain VaR estimates. For each sample we calculate the corresponding quantile using algorithm 8 in Hyndman and Fan (1996) and then obtain a mean quantile, which serves as the static VaR estimate.

In order to evaluate and compare the resulting VaR models, the backtesting concept is adopted in conjunction with the Kupiec hypothesis test. This approach suggests using past data to test the effectiveness of the VaR estimates.
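The bootstrap variant of the historical method described above can be sketched as follows. This is a minimal illustration, not the authors' code (the paper reports using R): the function names `quantile_type8` and `bootstrap_var`, the seed and the toy Gaussian returns are assumptions introduced here. The quantile rule follows definition 8 of Hyndman and Fan (1996), with order statistics clamped to the sample range.

```python
import math
import random


def quantile_type8(sample, p):
    """Sample quantile following Hyndman-Fan definition 8 (hypothetical helper)."""
    xs = sorted(sample)
    n = len(xs)
    h = p * (n + 1.0 / 3.0) + 1.0 / 3.0  # fractional 1-based position
    j = math.floor(h)
    g = h - j
    # Clamp the two neighbouring order statistics to the sample range.
    lo = xs[min(max(j, 1), n) - 1]
    hi = xs[min(max(j + 1, 1), n) - 1]
    return lo + g * (hi - lo)


def bootstrap_var(returns, alpha, n_boot=1000, seed=0):
    """Static VaR estimate: mean alpha-quantile over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    n = len(returns)
    total = 0.0
    for _ in range(n_boot):
        resample = rng.choices(returns, k=n)  # sampling with replacement
        total += quantile_type8(resample, alpha)
    return total / n_boot


if __name__ == "__main__":
    # Toy data only: simulated returns, not the indices studied in the paper.
    rng = random.Random(42)
    toy_returns = [rng.gauss(0.0, 1.5) for _ in range(500)]
    print("5% VaR:", bootstrap_var(toy_returns, 0.05))
    print("1% VaR:", bootstrap_var(toy_returns, 0.01))
```

Averaging the quantile over resamples smooths the estimate compared with a single empirical quantile, which matters most at the 1% level where few observations lie in the tail.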
The test suggested by Kupiec (1995) applies a Binomial test to determine whether the observed frequency of exceedances of a certain value is consistent with the frequency estimated from a model. If n is the number of VaR exceedances in a sample of size T, then $n \sim \text{Binomial}(T, p)$, where p is the theoretical probability of VaR exceedances, i.e. the risk level. The proportion of exceedances in the sample ideally should be equal to the theoretical probability p, so the Kupiec test carries the following hypotheses:

$$H_0: \frac{n}{T} = p, \qquad H_1: \frac{n}{T} \ne p$$

For the test we need the number of observations (T), the number of exceedances (n), the risk level (p) and the level of significance (α). The corresponding statistic of the test is the likelihood ratio:

$$LR = -2 \ln \left[ \frac{p^{n} (1 - p)^{T - n}}{\left( \frac{n}{T} \right)^{n} \left( 1 - \frac{n}{T} \right)^{T - n}} \right]$$

Under the null hypothesis, LR asymptotically converges to a chi-squared distribution with one degree of freedom. If the observed value of LR exceeds the critical value of $\chi^2(1)$ for the preset level of significance α, then the null hypothesis is rejected.

The backtesting procedure discussed previously is a method that can be used for the evaluation of VaR models. Its application requires a sample of size T with past data, and this may be the data used for the fitting process, in which case we have in-sample evaluation, or a completely new data set for an out-of-sample evaluation. While the former is the most common strategy, the latter is also crucial because it indicates whether the model is appropriate for future real data as well. When a static approach is followed, however, VaR estimates are projected into the future, and this is not always safe. The possibility of a failure in such projections has been criticized previously, and rolling estimation has been proposed by the Basel Committee. This procedure is more time consuming and computationally more intensive; however, as one may see from this study, in most cases it improves the out-of-sample performance of the VaR estimates.
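The Kupiec likelihood-ratio test described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names are hypothetical, the convention 0·ln 0 = 0 is used when there are no exceedances (or only exceedances), and the p-value uses the identity that the upper tail of a chi-squared variable with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def kupiec_lr(n_exceed, n_obs, p):
    """Kupiec likelihood-ratio statistic for n_exceed exceedances in n_obs days."""
    def xlogy(x, y):
        # Convention 0 * log(0) = 0, needed when n_exceed is 0 or n_obs.
        return 0.0 if x == 0 else x * math.log(y)

    p_hat = n_exceed / n_obs
    log_null = xlogy(n_exceed, p) + xlogy(n_obs - n_exceed, 1.0 - p)
    log_alt = xlogy(n_exceed, p_hat) + xlogy(n_obs - n_exceed, 1.0 - p_hat)
    return -2.0 * (log_null - log_alt)


def chi2_1_pvalue(lr):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(lr, 0.0) / 2.0))


def kupiec_test(n_exceed, n_obs, p, significance=0.05):
    """Return (LR, p-value, reject?) for the null hypothesis n/T = p."""
    lr = kupiec_lr(n_exceed, n_obs, p)
    p_value = chi2_1_pvalue(lr)
    return lr, p_value, p_value < significance


if __name__ == "__main__":
    # Hypothetical counts: 20 exceedances of a 5% VaR in 250 trading days.
    print(kupiec_test(20, 250, 0.05))
```

Because the statistic depends only on the exceedance count, the same function serves in-sample, out-of-sample and rolling-window evaluation.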
To implement rolling estimation, rolling windows are used to update the parameters of the distributions. The procedure starts with a window of size T, the estimation of the first VaR and its comparison to observation T+1. Next, the oldest observation is deleted from the dataset, the newest is added and the VaR model estimation step is repeated. The alternation of estimation and comparison is repeated for the whole period for which the out-of-sample evaluation is performed. At the end of this procedure the Kupiec test is applied to test whether the number of exceedances of the rolling VaRs is as expected. A critical question in planning such a procedure is the size of the window; the Basel Committee's standards refer to a size of at least 200 observations (Gourieroux and Jasiac, 2009). In our study we experimented with windows of 250, 500, 750 and 1000 observations.

Empirical Study and Evaluation of Models

The dataset used for this study consists of daily logarithmic returns, $r_t = 100(\log p_t - \log p_{t-1})$, of eight stock indices from countries that use the Euro as their official currency. The data were obtained either from the official stock market websites (for Finland, Montenegro and Slovakia), from the Wolfram Research database (for Austria, Germany and France) or from the webpage www.stockwatch.cy.com (for Greece and Cyprus). The analysis, estimation of parameters and model building were performed using a 5-year time frame, specifically from January 2006 to December 2010. The evaluation and comparison of the extracted models were performed either in-sample, with the same data as those used for estimation, or out-of-sample with a different set, from January 2011 to May 2012. Lastly, for the analysis that is presented next the software packages R, Mathematica, Excel and STATA were used.

Table 1.
European Stock Market Indices

| Index | Country | S&P Rating | Outlook |
|---|---|---|---|
| ATX | Austria | AAA | Stable |
| DAX | Germany | AAA | Stable |
| OMX Helsinki 25 | Finland | AAA | Stable |
| CAC 40 | France | AAA | Stable |
| SAX | Slovakia | A+ | Positive |
| CSE | Cyprus | BBB+ | Negative |
| ASE | Greece | CC | Negative |
| MONEX20 | Montenegro | BB | Negative |

Table 1 displays the stock market indices that were considered in this study and the country each represents. The last two columns display the credit rating of each country and an assessment of its future prospects for economic development, as announced by Standard & Poor's in their 2010 list. By considering countries of different creditworthiness we can draw wider conclusions about the effectiveness of our modeling schemes, since credit rating and future prospects may affect stock market behavior.

Table 2 presents the descriptive statistics for each index, specifically the number of observations, the sample mean, standard deviation, range, skewness and kurtosis. We observe that the sample mean is very close to zero for all indices; in fact, none of the means is significantly different from zero at the 5% significance level. On the other hand, CSE displays the highest standard deviation, followed by ATX. In terms of skewness, SAX is the only index with skewness clearly different from zero, so one may assume that neither symmetry nor asymmetry is necessarily a characteristic of returns. Lastly, in terms of kurtosis all indices are clearly leptokurtic, with kurtosis much larger than 3.

Table 2. Descriptive statistics for the eight stock market indices (period 2006-2010)

| Index | Obs. | Mean | St. Dev | Min | Max | Skewness | Kurtosis |
|---|---|---|---|---|---|---|---|
| ATX | 1239 | -0.01882 | 2.01192 | -10.25260 | 12.02104 | -0.19325 | 7.59691 |
| DAX | 1269 | 0.01936 | 1.55623 | -7.43346 | 10.79747 | 0.16424 | 10.46307 |
| OMXH 25 | 1256 | 0.01059 | 1.68434 | -8.90540 | 9.28556 | 0.07006 | 6.56828 |
| CAC 40 | 1277 | -0.01680 | 1.64270 | -9.47154 | 10.59459 | 0.16273 | 9.90517 |
| SAX | 891 | -0.06560 | 1.44360 | -14.81009 | 11.88026 | -1.71565 | 26.44706 |
| CSE | 1234 | -0.03887 | 2.58490 | -12.13529 | 12.12395 | -0.00617 | 5.48760 |
| ASE | 1243 | -0.07628 | 1.89193 | -10.21404 | 9.11439 | -0.20025 | 6.05895 |
| MONEX20 | 1228 | 0.03218 | 1.90172 | -9.70806 | 11.28601 | 0.68928 | 8.64852 |

Table 3 proceeds with a number of hypothesis tests in order to check normality, symmetry, kurtosis, stationarity and independence. For this purpose the Jarque-Bera, D’Agostino, Anscombe-Glynn, Augmented Dickey-Fuller (ADF) and Runs tests, respectively, were employed. For each test the value of the test statistic and the corresponding p-value are shown, marking with bold the hypotheses that are rejected. The Jarque-Bera test (Trapletti and Hornik 2009) indicates that all indices display significant departures from the normal distribution. According to the D’Agostino test (Komsta and Novomestky 2007) only two indices (SAX and MONEX20) suggest skewed distributions, while for the others a symmetric distribution is not rejected. The one-sided Anscombe-Glynn test indicates that all indices follow leptokurtic distributions, and the ADF test that the returns of all indices can be considered stationary. Lastly, the Runs test indicates that the returns from the stock markets with credit rating A+ or higher can be assumed independent, while the CSE index is on the borderline.

Table 3.
Tests for normality, skewness, kurtosis, stationarity and randomness

| Index | Jarque-Bera (p-value) | D’Agostino (p-value) | Anscombe-Glynn (p-value) | ADF (p-value) | Runs Test (p-value) |
|---|---|---|---|---|---|
| ATX | 1098.630 (<0.001) | -1.8246 (0.068) | 10.986 (<0.001) | -33.559 (<0.001) | -0.54 (0.589) |
| DAX | 2950.711 (<0.001) | 1.5730 (0.116) | 13.128 (<0.001) | -36.805 (<0.001) | 0.7583 (0.448) |
| OMXH 25 | 667.3668 (<0.001) | 0.6712 (0.502) | 9.972 (<0.001) | -35.031 (<0.001) | -1.5243 (0.127) |
| CAC 40 | 2542.682 (<0.001) | 1.5635 (0.118) | 12.846 (<0.001) | -38.542 (<0.001) | 0.14 (0.889) |
| SAX | 20847.11 (<0.001) | -9.8858 (<0.001) | 14.887 (<0.001) | -31.094 (<0.001) | 1.3073 (0.191) |
| CSE | 318.1825 (<0.001) | -0.0586 (0.953) | 8.377 (<0.001) | -31.711 (<0.001) | -1.6452 (0.099) |
| ASE | 492.9298 (<0.001) | -1.8924 (0.058) | 9.272 (<0.001) | -32.435 (<0.001) | -3.2064 (0.001) |
| MONEX20 | 1729.748 (<0.001) | 5.9468 (<0.001) | 11.804 (<0.001) | -24.692 (<0.001) | -5.2529 (<0.001) |

As a last step of this exploratory analysis, and in order to draw conclusions about the stability of moments with respect to the sample size, cumulative (up-to-date) samples were used to calculate the standard deviation, skewness and kurtosis. Fig. 1 presents the fluctuation observed for each index in the values of these parameters when the sample size changes. The graphs indicate that after approximately 1000 observations, or sometimes fewer, the moments converge to a certain value, with the exception of MONEX20, for which the 1st and 3rd moments converge after 1200 observations. This strengthens our belief in finite moments in the corresponding populations and at the same time indicates that the samples we used in the analysis are large enough to study the marginal (long-run) distributions.

Next, we proceed with the modeling of the marginal distributions for each index. For comparison reasons and due to its prominent position in Finance, we fit the Normal distribution along with the others reviewed previously.
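As an illustration of how a two-component normal mixture can be fitted by EM, the following sketch alternates the expectation and maximization steps in plain Python. It is not the routine used in the study (the paper cites an R package for this); the function name `fit_two_normal_mixture`, the starting values (quartiles as means, pooled standard deviation, equal weights), the stopping tolerance and the small numerical floors are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normal_mixture(data, n_iter=200, tol=1e-8):
    """EM for w*N(mu1, s1^2) + (1-w)*N(mu2, s2^2). Returns (params, loglik)."""
    xs = sorted(data)
    n = len(xs)
    mean = sum(xs) / n
    sd = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), 1e-6)
    # Illustrative starting values (an assumption, not from the paper).
    w, mu1, s1, mu2, s2 = 0.5, xs[n // 4], sd, xs[(3 * n) // 4], sd
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: responsibility of component 1 for each observation.
        dens1 = [w * normal_pdf(x, mu1, s1) for x in xs]
        dens2 = [(1.0 - w) * normal_pdf(x, mu2, s2) for x in xs]
        tot = [a + b + 1e-300 for a, b in zip(dens1, dens2)]
        ll = sum(math.log(t) for t in tot)  # log-likelihood at current params
        resp = [a / t for a, t in zip(dens1, tot)]
        # M-step: weighted updates of weight, means and standard deviations.
        n1 = max(sum(resp), 1e-12)
        n2 = max(n - n1, 1e-12)
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n2
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        var2 = sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / n2
        s1 = max(math.sqrt(var1), 1e-6)
        s2 = max(math.sqrt(var2), 1e-6)
        if ll - prev_ll < tol:  # stop when the likelihood no longer improves
            break
        prev_ll = ll
    return (w, mu1, s1, mu2, s2), ll


if __name__ == "__main__":
    # Toy data: two well-separated normal clusters, not actual index returns.
    rng = random.Random(1)
    toy = [rng.gauss(-2.0, 0.5) for _ in range(200)]
    toy += [rng.gauss(2.0, 0.5) for _ in range(200)]
    print(fit_two_normal_mixture(toy))
```

For return data, where the components typically share a similar location and differ mainly in dispersion, EM is sensitive to the starting values, so several initializations would normally be compared by their final log-likelihood.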
Given a set of data, the parameter estimation is performed using the recommended estimation method, which is MLE for the GH distribution (Wuertz 2010) and the EM algorithm for the mixtures of Normal distributions (Benaglia et al. 2009). Tables 11 and 12 in the Appendix give the estimated values for all parameters. After the fitting process the resulting models were evaluated and compared against each other using the Kolmogorov-Smirnov (KS) statistic and the Euclidean distance. Table 4 summarizes the results of the two statistics in two different ways. (a) It displays the average values of the two statistics over all indices, provides the ranking of the alternative distributions according to the two distances, and records the number of indices for which each distribution was the best model (best fit). (b) It gives the average values of the two statistics separately for the two groups of countries, those with rating AAA or A+ and those with rating less than A.

Fig. 1. Sample moments as a function of the sample size (N = 1 to 1200)

Table 4(a). In-sample average performance across all eight indices

| Distribution | Avg K-S statistic | Avg Eucl. Dist. | Rank: K-S (E.D.) | Best Fits |
|---|---|---|---|---|
| Normal | 0.0879 | 1.8481 | 4 (4) | 0 |
| 2-Normal | 0.0254 | 0.3616 | 3 (3) | 0 |
| 3-Normal | 0.0199 | 0.2503 | 2 (2) | 4 |
| GH | 0.0155 | 0.2034 | 1 (1) | 4 |

Table 4(b). In-sample average performance across the two groups of countries

| Distribution | KS Statistic, Countries ≥ A+ | KS Statistic, Countries < A | Euclidean Dist., Countries ≥ A+ | Euclidean Dist., Countries < A |
|---|---|---|---|---|
| Normal | 0.091862 | 0.081257 | 1.906828 | 1.7501 |
| 2-Normal Mix | 0.028365 | 0.020538 | 0.404716 | 0.289828 |
| 3-Normal Mix | 0.019518 | 0.020732 | 0.220114 | 0.300547 |
| GH | 0.016945 | 0.012992 | 0.220981 | 0.174137 |

Based on the results recorded in Table 4(a), the GH distribution is the best model according to both distances, with the 3-Normals mixture being the second best.
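The two goodness-of-fit measures reported in Table 4 can be computed along the following lines. This is a sketch under assumptions: the empirical distribution is evaluated as i/n at the i-th sorted observation, the function names are hypothetical, and a normal or normal-mixture CDF stands in for whichever fitted model is being assessed.

```python
import math
import random


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mixture_cdf(x, components):
    """CDF of a normal mixture given (weight, mu, sigma) triples."""
    return sum(w * normal_cdf(x, mu, s) for w, mu, s in components)


def fit_distances(sample, cdf):
    """Return (KS, Euclidean) distances between the empirical and fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Assumed convention: the empirical CDF equals (i + 1) / n at the
    # i-th sorted point (0-based index).
    diffs = [(i + 1) / n - cdf(x) for i, x in enumerate(xs)]
    ks = max(abs(d) for d in diffs)
    euclid = math.sqrt(sum(d * d for d in diffs))
    return ks, euclid


if __name__ == "__main__":
    # Toy data: a heavy-tailed sample built from two normals with equal mean.
    rng = random.Random(0)
    toy = [rng.gauss(0.0, 1.0) if rng.random() < 0.8 else rng.gauss(0.0, 3.0)
           for _ in range(1000)]
    print("Normal :", fit_distances(toy, lambda x: normal_cdf(x, 0.0, 1.0)))
    comps = [(0.8, 0.0, 1.0), (0.2, 0.0, 3.0)]
    print("Mixture:", fit_distances(toy, lambda x: mixture_cdf(x, comps)))
```

Since the Euclidean distance sums squared deviations over all sample points, it is never smaller than the KS statistic and grows with the sample size, so it is suited to ranking models on the same data rather than comparing across samples of different length.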
According to the number of best fits, GH gives the best model for only four indices, while the 3-Normals mixture gives the best model for the other four. This suggests that these two probability models are very close in performance. It is known, however, that the GH model is a normal variance-mean mixture with the generalized Inverse Gaussian as mixing distribution, and this is potentially the reason why, although it contains fewer parameters, it still outperforms the 3-Normals mixture model. Moreover, since independence was rejected in the tests performed previously for the last three countries (those rated below A, with negative outlook), we summarized the observed values of the two statistics according to this criterion in Table 4(b). As one may observe, the fitting performance for the two groups of indices is quite comparable and the lack of independence does not cause any observable differences.

Risk estimation using marginal distributions

In this subsection the models that resulted from the fitting procedure compete with each other as parametric VaR models. Based on the definition of VaR given previously, the 5% and 1% levels of long position risk correspond to the 5% and 1% quantiles of a parametric model for the returns. Therefore, for each one of the indices, VaR estimates at the 5% and 1% risk levels were calculated using all suggested distributions with the help of R (Mächler, 2010; Wuertz, 2010). The claim is then that returns will fall below these values with probability 0.05 (or 0.01, respectively). In order to test this claim, the VaRs were evaluated in-sample and compared to each other using the Kupiec binomial test. A summary of the results is displayed in Table 5, where the average LR statistic is used to rank the models.

Table 5.
In-sample comparison of VaR models using the Kupiec binomial test

| Distributions | Average LR Statistic, 5% level (1% level) | Ranking, 5% level (1% level) | Rejections at 5% level (1% level) |
|---|---|---|---|
| Normal | 0.989 (8.443) | 3 (4) | 0 (8) |
| 2-Normals Mixture | 1.787 (1.093) | 4 (3) | 1 (1) |
| 3-Normals Mixture | 0.230 (0.130) | 1 (1) | 0 (0) |
| Generalized Hyperbolic | 0.236 (0.474) | 2 (2) | 0 (0) |

The results in Table 5 reveal that the VaR estimates from the 3-Normals mixture and GH are not rejected in any of the 16 tests performed; those from the 2-Normals mixture were rejected once at each level, and those from the Normal were rejected for all eight indices at the 1% risk level. Moreover, based on the average LR value, the 3-Normals mixture and the GH distributions are almost equivalent in performance, with a slight precedence of the former.

Next, the study continues with an out-of-sample evaluation and comparison, where the VaR estimates developed previously were evaluated with a new dataset from January 2011 to May 2012. In addition to the previously discussed parametric models, we also included the historical method, which is adopted as an alternative non-parametric method for VaR estimation through the calculation of sample quantiles. Again, the measure was the LR statistic from the Kupiec binomial test. The results are displayed in Table 6.

Table 6. Out-of-sample comparison of VaR models using the Kupiec binomial test

| Distributions | Average LR Statistic, 5% (1%) level | Ranking, 5% (1%) level | Rejections at 5% (1%) level |
|---|---|---|---|
| Normal | 8.434 (10.762) | 3 (5) | 4 (4) |
| 2-Normals Mix | 9.547 (3.315) | 5 (4) | 6 (2) |
| 3-Normals Mix | 8.396 (2.589) | 2 (2) | 4 (1) |
| GH | 7.469 (3.159) | 1 (3) | 3 (2) |
| Historical | 8.487 (2.565) | 4 (1) | 5 (1) |

Table 6 displays the average LR statistic over all indices and the corresponding ranking of the models, as well as the number of rejections observed according to the Kupiec test.
Based on these summarized results, for the 5% risk level the GH distribution gives the lowest average LR statistic, followed by the 3-Normals mixture. For the 1% risk level, however, the historical method gives the lowest average LR statistic, followed closely by the 3-Normals mixture. In terms of the number of rejections none of the models prevails, even though one may notice that the GH and the 3-Normals mixture carry the lowest cumulative number of rejections over both levels.

As a last step of our study we experimented with rolling windows for estimating VaR (Zeileis and Grothendieck, 2005). Rolling windows were selected here to capture the time dynamics of the economy, which sometimes make past returns unrepresentative of the future. For this technique, however, the window must be long enough to estimate the parameters of the models accurately. Four alternative window lengths were adopted, 250, 500, 750 and 1000 observations, except for SAX, for which we used windows of 250, 500, 750 and 800 observations. Our intention with this approach was to specify the best window size for each VaR level.

Table 7.
Out-of-sample evaluation of VaR models using Rolling Windows

| Rolling window (observations) | Distribution | Average LR Statistic, 5% (1%) level | Ranking, 5% (1%) level | Rejections, 5% (1%) level |
|---|---|---|---|---|
| 250 | Normal | 4.823 (11.545) | 3 (5) | 2 (7) |
| 250 | 2 Normal Mix | 5.057 (5.383) | 4 (3) | 3 (3) |
| 250 | 3 Normal Mix | 4.616 (5.507) | 2 (4) | 4 (3) |
| 250 | GH | 6.117 (5.322) | 5 (2) | 4 (3) |
| 250 | Historical | 4.304 (3.851) | 1 (1) | 3 (2) |
| 500 | Normal | 5.575 (8.989) | 2 (5) | 3 (5) |
| 500 | 2 Normal Mix | 6.507 (4.593) | 4 (4) | 4 (2) |
| 500 | 3 Normal Mix | 5.504 (2.992) | 1 (2) | 3 (1) |
| 500 | GH | 7.243 (3.072) | 5 (3) | 4 (1) |
| 500 | Historical | 5.650 (2.524) | 3 (1) | 3 (0) |
| 750 | Normal | 2.934 (2.881) | 2 (5) | 2 (1) |
| 750 | 2 Normal Mix | 3.949 (1.969) | 5 (2) | 2 (1) |
| 750 | 3 Normal Mix | 3.071 (2.822) | 3 (4) | 1 (2) |
| 750 | GH | 3.507 (2.696) | 4 (3) | 1 (2) |
| 750 | Historical | 2.926 (1.748) | 1 (1) | 1 (1) |
| 1000 | Normal | 3.856 (2.291) | 2 (3) | 3 (1) |
| 1000 | 2 Normal Mix | 5.181 (3.176) | 5 (5) | 3 (2) |
| 1000 | 3 Normal Mix | 3.792 (2.211) | 1 (2) | 2 (1) |
| 1000 | GH | 3.972 (2.344) | 3 (4) | 2 (2) |
| 1000 | Historical | 4.080 (0.974) | 4 (1) | 2 (0) |

Table 7 presents the summarized results from the evaluation with rolling windows. Based on the average LR statistic, for the 5% level a window size of 750 observations seems more appropriate, since all models attain their minimum there. Similarly, for the 1% level, the size of 1000 observations seems more appropriate for all models except the 2-Normals mixture. It is worth noting that although the Normal distribution was not suggested as a well-fitting model for the returns, using a suitable rolling window size may lead to competitive VaR estimates. Specifically, using windows of 750 observations for the 5% VaR and 1000 for the 1% VaR, the Normal distribution fails in only 2 and 1 indices, respectively. Lastly, the historical method appears to achieve quite accurate VaR values and outperforms the parametric models according to the average LR statistic and the number of rejections.
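The rolling-window evaluation summarized above can be sketched as a simple loop that re-estimates the VaR on the most recent observations and checks the next return against it. The sketch below is illustrative only: `empirical_var` is a plain order-statistic quantile standing in for any of the fitted models (it is not the bootstrap or parametric estimator of the study), and the function names and toy series are assumptions.

```python
import math
import random


def empirical_var(window, alpha):
    """Lower alpha-quantile of the window (simple order-statistic rule)."""
    xs = sorted(window)
    k = max(math.ceil(alpha * len(xs)) - 1, 0)
    return xs[k]


def rolling_backtest(returns, window, alpha, var_fn=empirical_var):
    """Count one-step-ahead VaR exceedances with a rolling estimation window."""
    exceedances = 0
    forecasts = 0
    for t in range(window, len(returns)):
        # Estimate VaR on the latest `window` observations only, so the
        # oldest observation drops out as each new one enters.
        var_t = var_fn(returns[t - window:t], alpha)
        if returns[t] < var_t:  # long position: the return falls below the VaR
            exceedances += 1
        forecasts += 1
    return exceedances, forecasts


if __name__ == "__main__":
    # Toy series: simulated returns, not the indices studied in the paper.
    rng = random.Random(3)
    toy = [rng.gauss(0.0, 1.5) for _ in range(1100)]
    for size in (250, 500, 750, 1000):
        print(size, rolling_backtest(toy, size, 0.05))
```

The resulting pair (exceedances, forecasts) plays the role of (n, T) in the Kupiec test, so each window size and risk level yields one LR statistic per index.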
Given the previous results, Tables 8 and 9 give in detail the LR statistic and p-value for each index, with the intention of drawing conclusions about the performance in each separate market. For each case the number of rejections of the Kupiec test at the significance level of 0.01 is recorded.

Table 8: Kupiec test for the 5% risk level VaRs - Rolling window 750 observations

| Index | Normal (p-value) | 2 Normal Mix (p-value) | 3 Normal Mix (p-value) | GH (p-value) | Historical (p-value) |
|---|---|---|---|---|---|
| ATX | 2.0683 (0.150) | 4.0202 (0.045) | 2.9503 (0.086) | 2.9503 (0.086) | 4.0203 (0.045) |
| DAX | 0.4494 (0.503) | 0.1924 (0.661) | 0.8075 (0.369) | 0.4494 (0.503) | 0.8075 (0.369) |
| OMXH 25 | 1.4392 (0.230) | 1.4392 (0.230) | 0.9487 (0.330) | 1.4392 (0.230) | 0.5551 (0.456) |
| CAC 40 | 0.1924 (0.661) | 0.1924 (0.661) | 0.1924 (0.661) | 0.1924 (0.661) | 0.0413 (0.839) |
| SAX | 8.6247 (0.003) | 8.6247 (0.003) | 4.0652 (0.044) | 5.3472 (0.021) | 2.9893 (0.084) |
| CSE | 6.6979 (<0.01) | 11.5983 (<0.01) | 11.5983 (<0.01) | 12.9955 (<0.01) | 11.5983 (<0.01) |
| ASE | 0.3164 (0.574) | 2.8693 (0.090) | 2.1723 (0.141) | 2.8693 (0.090) | 1.5647 (0.211) |
| MONEX20 | 3.6675 (0.055) | 2.6451 (0.104) | 1.8108 (0.178) | 1.8108 (0.178) | 1.8108 (0.178) |
| Reject at α = 0.01 | 2 | 2 | 1 | 1 | 1 |

Table 9: Kupiec test for the 1% risk level VaRs - Rolling window 1000 observations

| Index | Normal (p-value) | 2-Normal Mix (p-value) | 3-Normal Mix (p-value) | GH (p-value) | Historical (p-value) |
|---|---|---|---|---|---|
| ATX | 0.0661 (0.797) | 7.0553 (<0.01) | 0.7767 (0.378) | 7.0553 (<0.01) | 0.7767 (0.378) |
| DAX | 2.5041 (0.114) | 2.6862 (0.101) | 2.6862 (0.101) | 0.0390 (0.843) | 0.0390 (0.843) |
| OMXH 25 | 1.3871 (0.239) | 0.8293 (0.362) | 0.0972 (0.755) | 0.0972 (0.755) | 0.0972 (0.755) |
| CAC 40 | 1.3060 (0.253) | 7.2965 (<0.01) | 2.7007 (0.100) | 0.8830 (0.347) | 0.8830 (0.347) |
| SAX | 0.5561 (0.456) | 0.7854 (0.376) | 2.5413 (0.111) | 0.0817 (0.775) | 0.7854 (0.376) |
| CSE | 10.3564 (<0.01) | 4.2856 (0.038) | 8.1191 (<0.01) | 8.1191 (<0.01) | 2.7395 (0.098) |
| ASE | 1.4430 (0.229) | 0.0606 (0.806) | 0.0606 (0.806) | 0.0606 (0.806) | 0.0606 (0.806) |
| MONEX20 | 0.7084 (0.34) | 2.4122 (0.120) | 0.7084 (0.340) | 2.4122 (0.120) | 2.4122 (0.120) |
| Reject at α = 0.01 | 1 | 2 | 1 | 2 | 0 |

In Table 8 the 3-Normals mixture, the GH and the Historical method exhibit the best performance according to the number of rejections, while in Table 9 the Historical method gives the best performance over all parametric models, since none of the corresponding tests is significant. Among the parametric models, the 3-Normals mixture and the Normal exhibit the best performance, with only one rejection each.

Table 10. Mean LR and rejections for countries according to credit rating at 5% and 1% VaR

5% risk level VaRs

| Distribution | Average LR for Countries ≥ A+ | Rejections | Average LR for Countries < A | Rejections |
|---|---|---|---|---|
| Normal | 2.5548 | 1 | 3.5606 | 1 |
| 2-Normal Mix | 2.8938 | 1 | 5.7042 | 1 |
| 3-Normal Mix | 1.7928 | 0 | 5.1938 | 1 |
| GH | 2.0757 | 0 | 5.8919 | 1 |
| Historical | 1.6827 | 0 | 4.9913 | 1 |

1% risk level VaRs

| Distribution | Average LR for Countries ≥ A+ | Rejections | Average LR for Countries < A | Rejections |
|---|---|---|---|---|
| Normal | 1.1639 | 0 | 4.1693 | 1 |
| 2-Normal Mix | 3.7305 | 2 | 2.2528 | 0 |
| 3-Normal Mix | 1.7604 | 0 | 2.9627 | 1 |
| GH | 1.6312 | 1 | 3.5306 | 1 |
| Historical | 0.5162 | 0 | 1.7374 | 0 |

A last comment on the results just presented would be to observe the specific indices where the rejections of the Kupiec test take place.
Interestingly enough, the models that perform well in VaR estimation perform even better for the indices of countries with a credit rating of A+ or higher. Table 10 further summarizes the results of Tables 8 and 9 according to the credit rating of the countries. All models achieve a smaller average LR statistic (more accurate VaR estimation) and cumulatively fewer rejections for the countries with ratings greater than A. The only exception is the 2-Normals mixture at the 1% risk level, where this pattern does not hold. However, since the data from countries with a rating lower than A showed significant dependencies, the parameter estimates may be biased and the corresponding results not very reliable. Moreover, for countries with a credit rating greater than A the historical method is the best at both risk levels, and the Normal distribution follows as second best at the 1% risk level only, an observation suggesting that both are good choices for countries in this category. Lastly, comparing the 3-Normals mixture and the GH, our study suggests that the former prevails over the latter, since in this category it achieves zero rejections at both risk levels.

Summary and conclusions

The scope of this paper was to identify probability models suitable for modeling the marginal distributions of stock index returns and to employ them for risk estimation. We compared mixtures of two and three normal distributions and the GH with the plain Normal. The fitting process was performed for eight European stock indices, and VaR was estimated using the chosen distributions as parametric models. The models were compared both for their fit as marginal distributions and as VaR estimation models. The first comparison was performed using the KS statistic and the Euclidean distance, the second using backtesting and the Kupiec binomial test.
According to our study, the GH distribution was the most successful model for the marginal distributions of returns. The 3-Normals mixture, however, closely follows the GH in the fitting process and has the same number of best fits. For the estimation of VaR the models were compared both in-sample and out-of-sample. The in-sample comparison suggested that the 3-Normals mixture and the GH outperform all others and achieve zero failures in the Kupiec test. In the out-of-sample comparisons, however, all models perform worse, especially for the countries with low credit rating, where the tests for independence were significant. Still, in the out-of-sample comparison the 3-Normals mixture and the GH give the smallest cumulative number of rejections and a relatively small average LR statistic. To improve VaR estimation, rolling window approaches were adopted with four different window lengths (250, 500, 750 and 1000 observations). This decision increased the computational cost but improved on the static approach, especially when larger windows were employed. For the 5% risk level a window of 750 observations was selected as best and indicated that the 3-Normals mixture, the GH and the historical method prevail over the other two distributions. For the 1% risk level a window of 1000 observations was selected as best; with it most parametric models, as well as the historical method, achieve a lower LR statistic and fewer rejections. Again, for the countries with low credit rating practically all models provide worse VaR estimates, even with rolling windows.

Overall, we conclude that, among the distribution families examined in our study, the GH is the best model for fitting the returns of stock indices, while the 3-Normals mixture is a better choice for VaR estimation, since it combines a comparatively low LR statistic with a small number of rejections.
References

Alles L, Kling J (1994) Regularities in Variation of Skewness in Asset Return. J Financ Res XVII(3): 427–438.
Aquino R (2006) Efficiency of the Philippine Stock Market. Appl Econ Lett 13: 463–470.
Barndorff-Nielsen OE (1977) Exponentially decreasing distributions for the logarithm of particle size. Proc. of the Royal Society London 353: 401–419.
Barndorff-Nielsen OE, Cox DR (1995) Inference and Asymptotics. Chapman and Hall, London.
Barndorff-Nielsen OE (1997) Normal Inverse Gaussian distributions and the modeling of stock returns. Scand J Stat 24: 1–13.
Bauer C (2000) Value at Risk Using Hyperbolic Distributions. J Econ Bus 52: 455–467.
Behr A, Poetter U (2009) Modeling Marginal Distributions of Ten European Stock Market Index Returns. Int Res J Fin Econ 28: 104–119.
Benaglia T, Chauveau D, Hunter DR, Young D (2009) mixtools: An R Package for Analyzing Finite Mixture Models. J Stat Softw 32(6): 1–29.
Blattberg RC, Gonedes NJ (1974) A comparison of the stable and Student distributions as statistical models for stock prices. J Bus 47: 244–280.
Chen Y, Härdle W, Jeong S (2008) Nonparametric Risk Management with Generalized Hyperbolic Distributions. J Am Stat Assoc 103: 910–923.
Cont R (2001) Empirical properties of asset returns: stylized facts and statistical issues. Quant Financ 1: 223–236.
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc B Met 39(1): 1–38.
Doric D, Doric EN (2011) Return Distribution and Value at Risk Estimation for BELEX15. YUJOR 21(1): 103–118.
Eberlein E, Keller U (1995) Hyperbolic distributions in finance. Bernoulli 1(3): 281–299.
Embrechts P, Klueppelberg C, Mikosch T (1997) Modeling Extremal Events. Springer-Verlag, Berlin Heidelberg New York.
Fama EF (1965) The behavior of stock market prices. J Bus 38: 34–105.
Gourieroux C, Jasiak J (2010) Value at Risk. In Ait-Sahalia Y, Hansen LP (eds) Handbook of Financial Econometrics, Vol. 1, Tools and Techniques, North-Holland UK, pp 553–615.
Harvey CR, Siddique A (2000) Conditional Skewness in Asset Pricing Tests. J Financ 55: 1263–1295.
Kon SJ (1984) Models of stock returns - A comparison. J Financ 39: 147–165.
Komsta L, Novomestky F (2007) moments: Moments, cumulants, skewness, kurtosis and related tests. R package version 0.11. http://www.r-project.org, http://www.komsta.net/
Kupiec P (1995) Techniques for Verifying the Accuracy of Risk Management Models. J Deriv 3: 73–84.
Mächler M (2010) nor1mix: Normal (1-d) Mixture Models (S3 Classes and Methods). R package version 1.1-2. http://CRAN.R-project.org/package=nor1mix
Mandelbrot B (1963) The variation of certain speculative prices. J Bus 36: 394–419.
McLachlan GJ, Peel D (2000) Finite Mixture Models. John Wiley & Sons, Inc.
McNeil AJ, Frey R, Embrechts P (2005) Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, Princeton and Oxford.
Morgan JP (1995) RiskMetrics. Third Edition, Morgan
Necula C (2009) Modeling Heavy-Tailed Stock Index Returns Using The Generalized Hyperbolic Distribution. Rom J of Econ Forecast 10(2): 118–131.
Onour I (2010) Extreme Risk and Fat-Tails Distribution Model: Empirical Analysis. J Money Investment and Banking 13: 27–34.
Önalan O (2010) Financial Risk Management with Normal Inverse Gaussian Distribution. International Research Journal of Finance and Economics, Issue 38: 104–115.
Peters EE (1991) Chaos and Order in the Capital Markets. John Wiley & Sons, New York.
Praetz PD (1972) The distribution of share price changes. J Bus 45: 49–55.
Prause K (1997) Modeling financial data using generalized hyperbolic distributions. FDM Preprint 48, University of Freiburg.
Rachev Z, Racheva-Iotova B, Stoyanov S (2010) Capturing fat tails. Risk magazine, May 2010, 76–80.
Swanson NR, White H (1997) Forecasting Economic Time Series Using Flexible Versus Fixed Specification and Linear versus Nonlinear Econometric Models. Int J Forecasting 13: 439–461.
Trapletti A, Hornik K (2009) tseries: Time Series Analysis and Computational Finance. R package version 0.10-29.
Venkataraman S (1997) Value at risk for a mixture of normal distributions: the use of quasi-Bayesian estimation techniques. Economic Perspectives, Fed. Reserve Bank of Chicago, March/April 1997, 2–13.
Wuertz D (2010) Rmetrics core team members, uses code builtin from the following R contributed packages: gmm from Pierre Chauss, gld from Robert King, gss from Chong Gu, nortest from Juergen Gross, HyperbolicDist from David Scott, sandwich from Thomas Lumley, Achim Zeileis and fortran/C code from Kersti Aas. fBasics: Rmetrics - Markets and Basic Statistics. R package version 2110.79. http://CRAN.R-project.org/package=fBasics
Zeileis A, Grothendieck G (2005) zoo: S3 Infrastructure for Regular and Irregular Time Series. J Stat Softw 14: 1–27.

Appendix

We provide tables with supplementary computations carried out during the analysis. Specifically, Tables 11 and 12 give the estimated parameters for the Normal mixtures and the Generalized Hyperbolic distribution, respectively.

Table 11. Parameter estimation for the Normal Mixtures (2 and 3 components)

Normal Mixture (2-components)
         weights         mean            St. dev.
Index    w1      w2      μ1      μ2      σ1      σ2
ATX      0.275   0.725   0.534   0.177   3.270   1.177
DAX      0.886   0.114   0.08    0.482   1.079   3.445
OMXH     0.284   0.716   0.214   0.099   2.664   1.058
CAC      0.857   0.143   0.062   0.487   1.098   3.367
SAX      0.735   0.265   0.066   0.431   0.546   2.618
CSE      0.495   0.505   0.032   0.046   3.451   1.242
ASE      0.388   0.612   0.341   0.092   2.724   1.031
MONEX    0.667   0.333   0.068   0.233   0.863   3.050

Normal Mixture (3-components)
         weights                 mean                    St. dev.
Index    w1      w2      w3      μ1      μ2      μ3      σ1      σ2      σ3
ATX      0.383   0.555   0.061   0.396   0.266   0.243   2.231   1.033   4.857
DAX      0.082   0.752   0.166   0.521   0.043   0.177   3.836   1.249   0.319
OMXH     0.055   0.455   0.490   0.593   0.249   0.187   3.855   1.865   0.869
CAC      0.043   0.615   0.342   0.094   0.142   0.316   4.815   0.913   1.829
SAX      0.028   0.542   0.430   2.248   0.095   0.124   5.563   0.412   1.493
CSE      0.194   0.226   0.580   0.065   0.003   0.046   4.352   0.779   2.222
ASE      0.188   0.254   0.558   0.374   0.310   0.152   3.296   0.650   1.569
MONEX    0.151   0.307   0.542   0.591   0.062   0.071   3.870   0.505   1.498

Table 12. Generalized Hyperbolic parameter estimation

Generalized Hyperbolic Distribution
Index       alpha   beta     delta   mu       lambda
ATX         0.348   -0.074   1.719   0.274    -0.735
DAX         0.406   -0.065   1.178   0.173    -0.683
OMXH 25     0.607   -0.046   1.175   0.141    -0.098
CAC 40      0.211   -0.046   1.641   0.107    -1.260
SAX         0.509   -0.111   0.292   0.130    -0.037
CSE         0.543   -0.005   0.387   -0.003   0.945
ASE         0.654   -0.084   0.809   0.218    0.463
MONEX 20    0.452   0.034    0.531   -0.089   0.124
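Mixture parameters of the kind reported in Table 11 translate into a VaR figure through a quantile of the mixture distribution, which has no closed form. The following is a minimal Python sketch under stated assumptions: the quantile is located by bisection on the mixture CDF, the function names `mixture_cdf` and `mixture_var` are hypothetical, and the parameter values in the usage lines are invented for illustration rather than taken from Table 11. It is not necessarily the procedure used in the paper.

```python
from statistics import NormalDist


def mixture_cdf(x, weights, means, sds):
    """CDF of a finite Normal mixture: weighted sum of component CDFs."""
    return sum(w * NormalDist(m, s).cdf(x) for w, m, s in zip(weights, means, sds))


def mixture_var(level, weights, means, sds, tol=1e-10):
    """Lower quantile of the mixture at the given tail probability, by bisection."""
    # Bracket wide enough that the CDF is essentially 0 at lo and 1 at hi.
    lo = min(m - 12.0 * s for m, s in zip(means, sds))
    hi = max(m + 12.0 * s for m, s in zip(means, sds))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, weights, means, sds) < level:
            lo = mid  # quantile lies to the right of mid
        else:
            hi = mid  # quantile lies at or to the left of mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical two-component mixture: a calm regime and a volatile regime.
    weights, means, sds = [0.7, 0.3], [0.2, -0.5], [1.2, 3.0]
    for level in (0.05, 0.01):
        print(level, round(mixture_var(level, weights, means, sds), 4))
```

Because the mixture CDF is continuous and strictly increasing, bisection always converges; a single-component mixture reproduces the ordinary Normal quantile, which gives a quick sanity check.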