Independent Factor Autoregressive Conditional Density Model

Alexios Ghalanos †, Cass Business School, UK ∗
Eduardo Rossi ‡, University of Pavia, Italy
Giovanni Urga §, Cass Business School, UK and Bergamo University, Italy

This version: September 22, 2010

PRELIMINARY VERSION. PLEASE DO NOT QUOTE.

Abstract

An Independent Factor Autoregressive Conditional Density (ACD) model using Independent Component Analysis is proposed, extending recent approaches in multivariate modelling with higher conditional moments and orthogonal factors. Using the Generalized Hyperbolic (GH) as conditional distribution, conditional co-moments and the weighted density for use in portfolio and risk management applications are derived. The empirical application, using MSCI equity index iShares data, shows that ACD modelling with a GH distribution yields tail-dependent risk measures, such as Value at Risk (VaR) estimates, which improve on those obtained from higher moment time-invariant conditional densities.

1 Introduction

Since Mandelbrot (1963), researchers have discovered numerous statistical properties in real market time series that contradict the theoretical results of their models. These so-called stylized facts, together with the paradigm shift away from the completely rational, representative agent to a boundedly rational, heterogeneous agent, have motivated researchers to model financial markets with a new set of tools, distributions and models. Among these, the pioneering work of Box and Cox (1964) in the area of autoregressive moving average models paved the way for related work in the area of volatility modelling with the introduction of ARCH and then GARCH models by Engle (1982) and Bollerslev (1986), respectively. In terms of the statistical framework, these models capture motion dynamics in the conditional time variation of the distributional parameters of the mean and variance. This allows one to capture autocorrelation in returns and squared returns.
There has also been a considerable body of evidence (see for example Wilhelmsson (2009), Harvey and Siddique (2009), and Rockinger and Jondeau (2002, 2003, 2009)) that financial markets not only exhibit skewness and thick tails, but that these need to be modelled in a time-varying context, leading to superior estimates of tail-related measures. While these time-varying models have received growing attention in the univariate conditional density modelling literature, very little has been achieved in incorporating such dynamics in a multivariate context. This is because of the difficulty, as well as the absence of a flexible model-distribution representation, in incorporating such complex interactions in both the marginal and joint distributional parameters. The independence factor framework provides a particularly appealing avenue for capturing time-varying higher moments in a multivariate setup, with estimation in univariate time.

∗ Part of this work was completed while the first author was visiting the CEA@CASS, whose hospitality he gratefully acknowledges.
† 106 Bunhill Row, London EC1Y 8TZ, UK. E-mail: [email protected]
‡ Dipartimento Economia politica e metodi quantitativi. Via San Felice 5, 27100 Pavia, Italy. Tel.: +39 0382/986207 Fax: +39 0382/304226. E-mail: [email protected].
§ 106 Bunhill Row, London EC1Y 8TZ, UK. E-mail: [email protected]

The main contribution of this paper is to extend the Generalized Orthogonal GARCH model of van der Weide (2002), using the Independent Components Analysis (ICA) estimation method used in Chen, Härdle, and Jeong (2008), Zhang and Chan (2009), and Broda and Paolella (2009), to include marginal dynamics for the whole density, as proposed by Hansen (1994). In somewhat the same way that the Conditional Correlation model of Bollerslev (1990) used a static dependence framework, the model we adopt uses a static independence framework.
Unlike other dependence models, however, independence offers a great deal of flexibility in modelling the full marginal dynamics within a multivariate affine factor framework, and enables the calculation of the density of the weighted constituents for use in portfolio applications.

Section 2 of our paper builds on the Independent Factor model by extending the motion dynamics to the skew and shape parameters of the full Generalized Hyperbolic 1-dimensional margins. Key features of the model, such as the conditional higher co-moment tensors and the weighted portfolio representation, are presented and discussed. To test the performance of our proposed model, we carry out a medium sample study in Section 3, using log returns of closing price data on 17 exchange traded MSCI index funds (iShares)¹, from 19/03/1996 to 30/11/2009, obtained from the Center for Research in Security Prices. Our modelling strategy is designed to tackle the evidence that, marginally, returns are characterized by strong conditional skewness and kurtosis which, like conditional variance, are not time invariant. The statistical analysis confirms that the adoption of marginal densities with time-varying conditional higher moments outperforms the standard GARCH models. The empirical application shows that the IFACD model with GH distribution yields tail-dependent risk measures, such as VaR estimates, which show an improvement over those obtained from the higher moment time-invariant conditional densities in Broda and Paolella (2009).

2 The Independent Factor Framework

Factor ARCH models, originally introduced by Engle, Ng, and Rothschild (1990), and with foundations in the Arbitrage Pricing Theory (APT) of Ross (1976)², are based on the premise that returns are generated by a set of unobserved underlying factors that are conditionally heteroscedastic.
Because of the constant linear mapping of the factors to the underlying data, the dependence framework is non-dynamic, which appears to be a necessary tradeoff for large scale estimation in a multivariate setting. The dependence structure of the unobserved factors then determines the type of factor model, with correlated factors making up the Factor ARCH (F-ARCH) type models, while uncorrelated and independent factors comprise the Orthogonal and Generalized Orthogonal (GO-GARCH) models respectively³. Because one can always recover uncorrelated or independent sources by a suitable statistical transformation, the correlated factor assumption of F-ARCH models does appear to be restrictive. GO-GARCH models, on the other hand, make use of those transformations to place the factors in an independence framework with unique benefits, such as separability and weighted density convolution, giving rise to truly large scale, real-time, feasible estimation.

¹ The MSCI tracked countries included were Australia, Canada, Sweden, Germany, Hong Kong, Italy, Japan, Belgium, Switzerland, Malaysia, Netherlands, Austria, Spain, France, Singapore, UK and Mexico.
² Although the APT does not imply a finite number of factors.
³ It should be noted that most of these factor models may be seen as special cases of the BEKK model, itself subsumed by the Generalized Dynamic Covariance Model of Kroner and Ng (1998).

2.1 The Generalized Orthogonal GARCH Model

Consider a set of $N$ assets whose returns $r_t$ are observed for $T$ periods, that is $\{r_t\}_{t=1}^{T}$. The conditional mean of the returns is $E[r_t | \mathcal{F}_{t-1}] = \mu_t$, where $\mathcal{F}_{t-1}$ is the sigma-field generated by the past realizations of $r_t$, i.e. $\mathcal{F}_{t-1} = \sigma(r_{t-1}, r_{t-2}, \ldots)$. The GO-GARCH model of van der Weide (2002) maps $r_t$ onto a set of linear combinations of unobserved independent factors $f_t$, which may then be estimated separately, and follows directly from the research in independent component analysis.
\[ r_t = \mu_t + \epsilon_t \tag{1} \]
\[ \epsilon_t = A f_t \tag{2} \]
where $A$ is invertible and constant over time and may be decomposed into a de-whitening matrix and an orthogonal matrix, $A = \Sigma^{1/2} U$, and $f_t = (f_{1t}, \ldots, f_{Nt})'$. The first step in obtaining independent factors is therefore to whiten the multivariate series via Principal Components Analysis, where the estimation of $\Sigma$ may be carried out separately on the dataset, allowing also for some dimensionality reduction. This is followed by a transformation of the whitened series to a set of independent signals, represented by the rotation, or orthogonal, matrix $U$. The different approaches to the calculation of the appropriate rotational transformation represent similar avenues to obtaining independent components, already well established in the statistical literature on ICA, and are all closely related. In the original paper of van der Weide (2002) this was estimated as a product of $N(N-1)/2$ rotation matrices⁴, via maximum likelihood,
\[ U = \prod_{i<j} R_{ij}(\theta_{ij}), \qquad -\pi \le \theta_{ij} \le \pi, \tag{3} \]
where $R_{ij}(\theta_{ij})$ is a rotation by the Euler angle $\theta_{ij}$ in the plane spanned by the pair of axes $(i, j)$ in $\mathbb{R}^N$. Note that when $U$ is restricted to be an identity matrix, the model reduces to the Orthogonal Factor model of Alexander (2001), which yields components that are uncorrelated but not necessarily independent, unless a multivariate normal distribution is assumed. In the van der Weide (2002) approach, the rotation matrix $U$ is estimated jointly with the univariate GARCH dynamics, making this a high dimensional problem. The Euler angle approach ties the calculation of $U$, and hence the underlying independent factors, to the model specification for the individual factors; although very efficient as an estimator, it is prone to errors arising from misspecification of the model and conditional distribution of the underlying factors.
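To make the rotation product in equation (3) concrete, the following is a minimal sketch in plain Python, not the authors' estimation code. The function names (`givens`, `rotation_from_angles`) and the angle values are hypothetical; the ordering of the pairs $(i, j)$ is an assumption, since the text does not fix one. It only builds $U$ from a given set of angles and checks that the result is orthogonal.

```python
import math
from itertools import combinations


def givens(n, i, j, theta):
    """n x n rotation by angle theta in the plane spanned by axes i and j."""
    r = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    r[i][i], r[j][j] = c, c
    r[i][j], r[j][i] = -s, s
    return r


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x[a][k] * y[k][b] for k in range(len(y)))
             for b in range(len(y[0]))] for a in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def rotation_from_angles(n, thetas):
    """Product of the n(n-1)/2 plane rotations R_ij(theta_ij), i < j.

    The pair ordering (lexicographic) is an assumption of this sketch.
    """
    pairs = list(combinations(range(n), 2))
    if len(thetas) != len(pairs):
        raise ValueError("expected n(n-1)/2 angles")
    u = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for (i, j), theta in zip(pairs, thetas):
        u = matmul(u, givens(n, i, j, theta))
    return u


# Three assets need 3 angles, each in [-pi, pi] (illustrative values).
u = rotation_from_angles(3, [0.3, -1.1, 2.0])
utu = matmul(transpose(u), u)  # should be the identity up to rounding
```

Because each plane rotation has determinant 1, so does the product, which matches the restriction on the determinant of $U$ noted for this parametrization.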
Subsequently, $U$ was estimated via a 3 step approach using non-linear least squares in van der Weide (2004), a 3 step method-of-moments in van der Weide and Boswijk (2008), and more recently by Broda and Paolella (2009) and Zhang and Chan (2009) in a 3 step approach using non-parametric ICA. These alternative approaches sacrifice some efficiency for computational feasibility and a model-free assumption in the calculation of $U$. They represent non-parametric approximations to independence, arising either from maximization of non-Gaussianity or from minimization of mutual information, with the former being the underlying method for the FASTICA method of Hyvärinen and Oja (1999), which uses approximations to negentropy to maximize non-Gaussianity and hence find the independent factors. Because of the assumption of independence, the likelihood function is greatly simplified and represents the sum of the individual log-likelihoods ($f_{ij}$) plus a term for the mixing matrix $A$,
\[ L(\varepsilon_x | \theta, A) = T \log\left|A^{-1}\right| + \sum_{i=1}^{T}\sum_{j=1}^{N} \log\left(f_{ij}(\varepsilon_{y,ij} | \theta_j)\right). \tag{4} \]
Since the standard spherical multivariate normal distribution is defined as the joint distribution of $n$ independent univariate standard normal variables, the representation of the multivariate normal density with diagonal $\Sigma$ is equivalent to equation (4).

⁴ The determinant of $U$ is restricted to be 1 in this case.

The model of Broda and Paolella (2009) extends the GO-GARCH model in a distributional direction by using a multivariate affine representation of the Normal Inverse Gaussian (maNIG), proposed by Schmidt, Hrycej, and Stützle (2006), as the underlying distribution of the return generating process. It follows Chen, Härdle, and Spokoiny (2007), who applied the FASTICA method with the NIG as conditional distribution to a set of German stocks using a local volatility process.
Both papers focused on the distribution of a weighted sum of underlying components in a risk and portfolio management context, for which numerical approximations to the weighted portfolio density were used to evaluate the distribution quantile for the purposes of VaR calculation: the Fast Fourier Transform (FFT) in the case of Chen, Härdle, and Spokoiny (2007), and a saddlepoint approximation (SPA) in the case of Broda and Paolella (2009). The latter paper focuses on a portfolio application, without considering the co-moment dynamics which result from the model, whilst the former do attempt to make some comparison to the Dynamic Conditional Correlation (DCC) model of Engle (2002), which generates time-varying conditional covariances. Because the GO-GARCH model only accommodates static (in)dependence dynamics, the DCC model should be considered a key benchmark to beat, and for this reason we include it in our empirical application. In another direction, Zhang and Chan (2009) focus on the multivariate normal distribution, investigating some additional algorithms for $U$ and proposing the intermediate use of DCC for the independent components to capture any residual correlation not eliminated by the GO-GARCH approach.

One of the key advantages offered by the Generalized Orthogonal approach is that, following the estimation of the independent factors, the dynamics of the marginal density parameters of those factors may be estimated separately and in parallel, without being restricted to any particular single model or dynamics. In this context, we propose to extend the dynamics to the full conditional density parameters, allowing us to model time-varying higher moments in a multivariate setting.
While any multivariate distribution admitting an affine representation may be used in this setup, we have chosen the Generalized Hyperbolic for its flexibility and rich parametrization, capturing some of the most important features of observed returns such as asymmetry and fat tails. To motivate the choice of the GH distribution in our model, we present some of its features in the next section, with a particular emphasis on the 1-dimensional representation which forms the basis of the independent factor margins in our model.

2.2 The Generalized Hyperbolic Distribution

The Generalized Hyperbolic Distribution (GH), introduced by Barndorff-Nielsen (1977) in the context of a sand project, is a variance-mean mixture of the normal and Generalized Inverse Gaussian (GIG) distributions. It is an extremely flexible distribution, allowing for skewness and fat tails. It nests a large number of other distributions which have proved popular in the empirical modelling of financial asset returns, such as the Hyperbolic, Normal Inverse Gaussian, Variance Gamma, (skew) Laplace, and, as limiting cases, the Normal and (skew) Student distributions. Tail flexibility is one particularly attractive feature of the GH model, which allows the upper and lower tails to be modelled asymmetrically. The skew-Student distribution, for example, analyzed in Aas and Haff (2006), allows for the modelling of one heavy (with polynomial behavior) and one semi-heavy (with exponential behavior) tail.⁵ Definition 2.1 provides the general family to which the GH distribution belongs, that of the Normal Mean-Variance Mixture distributions discussed in Barndorff-Nielsen, Kent, and Sörensen (1982), thus allowing the identification of key properties it inherits and highlighting the origin of its uni-dimensional shape parameters, which are given by the choice of the mixing distribution.
⁵ It is in fact the only distribution in the GH family to allow for one polynomial and one exponential tail.

Definition 2.1 The $n$-dimensional random variable $X$ is said to have a normal mean-variance mixture distribution if
\[ X \stackrel{d}{=} \mu + W\gamma + \sqrt{W} A Z, \tag{5} \]
where $Z \sim N_q(0, I_q)$, $W \in \mathbb{R}_{+}$, $A \in \mathbb{R}^{n \times q}$, and $\mu, \gamma \in \mathbb{R}^n$. From the definition it follows that
\[ X | W \sim N_n(\mu + W\gamma, W\Sigma), \qquad E(X) = \mu + E(W)\gamma, \qquad \mathrm{Cov}(X) = E(W)\Sigma + \mathrm{var}(W)\gamma\gamma', \tag{6} \]
where $\Sigma = AA'$, and the mixing variable $W$ is positive and has finite variance.

A very useful property is that if the distribution of $W$ is infinitely divisible, then the distribution of $X$ is also infinitely divisible. This implies that there exists a Lévy process with support over the entire real line, which is distributed at time $t = 1$ according to the law of $X$. Since the theoretical properties of Lévy processes are well established, this translates into the possibility of formulating financial models directly in terms of such processes. A very popular choice for the mixing variable is the GIG distribution, so that $W \sim GIG(\lambda, \chi, \psi)$⁶, in which case the multivariate GH distribution is obtained. It depends on the three real parameters of the GIG distribution, the location ($\mu$) and skewness ($\gamma$) vectors in $\mathbb{R}^n$, and a positive definite matrix $\Sigma \in \mathbb{R}^{n \times n}$. The kurtosis (tail behavior), described by the $\lambda$ and $\chi$ shape parameters, is driven by the univariate GIG mixing distribution and is therefore similar in all dimensions. The $n$-dimensional GH distribution allows for the modelling of multivariate data with some very desirable features, such as the ability to model skewness individually for each dimension.
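The moment identities that follow from the mixture construction in Definition 2.1 can be checked by simulation. The sketch below is illustrative only and works in one dimension. As a loudly stated assumption, it uses a Gamma mixing variable as a stand-in for the GIG, because the Python standard library has no GIG sampler; the identities $E(X) = \mu + E(W)\gamma$ and $\mathrm{Var}(X) = E(W)\sigma^2 + \mathrm{var}(W)\gamma^2$ hold for any positive mixing variable with finite variance. The function name and parameter values are hypothetical.

```python
import math
import random


def simulate_mixture(mu, gamma, sigma, shape, scale, n, seed=0):
    """Draw X = mu + W*gamma + sqrt(W)*sigma*Z, with Z ~ N(0, 1).

    W ~ Gamma(shape, scale) is an assumed stand-in for the GIG mixing law.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.gammavariate(shape, scale)
        z = rng.gauss(0.0, 1.0)
        draws.append(mu + w * gamma + math.sqrt(w) * sigma * z)
    return draws


mu, gamma, sigma = 0.1, -0.5, 1.2
shape, scale = 2.0, 0.5  # E(W) = shape*scale = 1.0, var(W) = shape*scale^2 = 0.5

x = simulate_mixture(mu, gamma, sigma, shape, scale, n=200_000)
sample_mean = sum(x) / len(x)
sample_var = sum((v - sample_mean) ** 2 for v in x) / (len(x) - 1)

# Theoretical values implied by the mixture representation.
theory_mean = mu + shape * scale * gamma
theory_var = shape * scale * sigma ** 2 + shape * scale ** 2 * gamma ** 2
```

With a negative $\gamma$ the simulated sample is left-skewed, which illustrates how the skewness vector enters through the mixing variable rather than through the Gaussian component.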
Additionally, the distribution has the property of infinite divisibility (inherited from the GIG mixing distribution), is closed under marginalization, conditioning and affine transformations, and, in the case of the NIG and Variance Gamma (VG) distributions, is also closed under convolution (for equal skew and shape parameters). To motivate the use of the affine representation of the GH distribution used in our model, Definition 2.2 provides the standard $n$-dimensional representation which Schmidt, Hrycej, and Stützle (2006) reformulate into a model for use in an independence factor framework.

⁶ The $\chi$ and $\psi$ parameters have also been represented as $\delta^2$ and $\alpha^2 - \beta^2$ respectively in the literature.

Definition 2.2 The $n$-dimensional Generalized Hyperbolic Distribution of the random vector $X \in \mathbb{R}^n$ is
\[ gh_n(x; \lambda, \alpha, \beta, \delta, \mu, \Sigma) = c_n \, \frac{K_{\lambda - n/2}\left(\alpha \sqrt{\delta^2 + (x-\mu)'\Sigma^{-1}(x-\mu)}\right)}{\left(\alpha^{-1}\sqrt{\delta^2 + (x-\mu)'\Sigma^{-1}(x-\mu)}\right)^{n/2 - \lambda}} \, e^{\beta'(x-\mu)}, \]
\[ c_n = \frac{\left(\sqrt{\alpha^2 - \beta'\Sigma\beta}\,/\,\delta\right)^{\lambda}}{(2\pi)^{n/2} K_{\lambda}\left(\delta\sqrt{\alpha^2 - \beta'\Sigma\beta}\right)}, \tag{7} \]
with parameter domain of variation $\lambda \in \mathbb{R}$, $\beta, \mu \in \mathbb{R}^n$, $\delta > 0$, $\alpha^2 > \beta'\Sigma\beta$, and $\Sigma \in \mathbb{R}^{n \times n}$ with determinant 1.

The fact that the domain of $\alpha$ is 1-dimensional is due to the univariate GIG mixing distribution, and means that kurtosis is the same for all dimensions. That is, there is one joint representation of extreme events, which may not be an adequate reflection of the multivariate data, especially when they come from sets that are not very highly correlated (at least in the tail sense). Schmidt, Hrycej, and Stützle (2006) point out that the margins of $X$ are not mutually independent for some choice of the scaling matrix $\Sigma$, which may be a key property of the problem (as we see in some factor models), and that the scaling matrix $\Sigma$ is hard to interpret in the presence of skewness, as it is in a complex relationship with the $\beta$ vector.
As a result, they propose an alternative, non-elliptical, multivariate affine Generalized Hyperbolic (maGH) model which is composed of independent margins allowed to take on separate values for skewness and shape (as well as different $\lambda$, therefore allowing sub-families of distributions to be mixed), presented in the following definition:⁷

Definition 2.3 The $n$-dimensional Affine Generalized Hyperbolic Distribution of the random vector $X \in \mathbb{R}^n$ has the following stochastic representation
\[ X \stackrel{d}{=} \mu + A'Y \in maGH_n(\mu_n, \Sigma, \lambda_n, \alpha_n, \beta_n), \tag{8} \]
where the $\stackrel{d}{=}$ notation is meant to convey equality in distribution and $A$ is a lower triangular matrix such that $A'A = \Sigma$ is positive definite, which may be sought via singular value decomposition or another appropriate methodology. The 1-dimensional independent margins of $X$ are given by $Y_i \in GH_1(0, 1, \lambda_i, \alpha_i, \beta_i)$, arrived at by first whitening $X$ using the estimated $\Sigma$ matrix, and then making the whitened components independent by applying an appropriate rotational matrix.

The ability to model the margins separately, with different parameters for each margin, increases not only the flexibility of the model but also its computational ease, as the multivariate estimation reduces to a univariate one, with the likelihood equal to the product of the marginal likelihoods plus a term for the mixing matrix, which is exactly the representation of the GO-GARCH model. Therefore, for the estimation of the GO-GARCH model within the context of such multivariate affine distributions, one need only focus on the marginal density for the model estimation and on the characteristic function for the weighted portfolio summation.
The following definition provides the conditional density of the independent factor margins of our proposed model:

Definition 2.4 The 1-dimensional Generalized Hyperbolic Distribution of the random variable $x \in \mathbb{R}$ is
\[ gh(x; \lambda, \alpha, \beta, \delta, \mu) = c(\lambda, \alpha, \beta, \delta, \mu) \left(\delta^2 + (x-\mu)^2\right)^{(\lambda - 1/2)/2} K_{\lambda - 1/2}\left(\alpha\sqrt{\delta^2 + (x-\mu)^2}\right) e^{\beta(x-\mu)}, \]
\[ c(\lambda, \alpha, \beta, \delta, \mu) = \frac{\left(\alpha^2 - \beta^2\right)^{\lambda/2}}{\sqrt{2\pi}\, \alpha^{\lambda - 1/2}\, \delta^{\lambda}\, K_{\lambda}\left(\delta\sqrt{\alpha^2 - \beta^2}\right)}, \tag{9} \]
with parameter domain of variation $0 \le |\beta| < \alpha$, $\mu, \lambda \in \mathbb{R}$ and $\delta > 0$, and $K_\lambda$ being the modified Bessel function of the third kind.

Special cases of the distribution are obtained by varying $\lambda$. For example, the NIG distribution is obtained by setting $\lambda$ to $-\frac{1}{2}$, the Hyperbolic by setting $\lambda$ to $\frac{n+1}{2}$⁸, and the skew-Student mentioned earlier by setting $\lambda$ to $-\frac{\nu}{2}$ (with $\nu$ representing the degrees of freedom) and $\alpha \to |\beta|$. The parameters may be interpreted as location ($\mu$), scale ($\delta$), skewness ($\beta$) and shape ($\alpha$), hence allowing the most important features in financial modelling to be represented, namely trend, 'risk' (by some measures), asymmetry and likelihood of extreme events.

⁷ A similar strategy was followed by Ferreira and Steel (2003) in constructing a multivariate skew student density with independent margins.
⁸ Thus in the 1-d case this is simply 1.

A number of location and scale invariant parametrizations of the GH distribution have been proposed in the literature,
\[ \zeta = \delta\sqrt{\alpha^2 - \beta^2}, \qquad \rho = \frac{\beta}{\alpha}, \]
\[ \xi = (1 + \zeta)^{-1/2}, \qquad \chi = \xi\rho, \tag{10} \]
\[ \bar{\alpha} = \alpha\delta, \qquad \bar{\beta} = \beta\delta. \]
Blæsild (1981) proved that a linear transformation of the form $aX + b$ of a variable $X$ distributed according to a GH distribution would again lead to a variable distributed with the same distribution and parameters $\lambda^* = \lambda$, $\alpha^* = \alpha/|a|$, $\beta^* = \beta/|a|$, $\delta^* = \delta|a|$, and $\mu^* = a\mu + b$.
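The relationship between the $(\zeta, \rho)$ parametrization in (10) and the $(\alpha, \beta)$ parameters, together with the rescaling rule attributed to Blæsild (1981), can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, and the rescaling helper is restricted to $a > 0$ as a simplifying assumption, so that the sign of the skewness parameter is not an issue.

```python
import math


def zeta_rho_to_alpha_beta(zeta, rho, delta):
    """Invert zeta = delta*sqrt(alpha^2 - beta^2), rho = beta/alpha."""
    alpha = zeta / (delta * math.sqrt(1.0 - rho ** 2))
    return alpha, alpha * rho


def alpha_beta_to_zeta_rho(alpha, beta, delta):
    """Location and scale invariant pair (zeta, rho) from (alpha, beta, delta)."""
    return delta * math.sqrt(alpha ** 2 - beta ** 2), beta / alpha


def rescale(lam, alpha, beta, delta, mu, a, b):
    """GH parameters of a*X + b for a > 0 (assumption: positive scaling only)."""
    if a <= 0:
        raise ValueError("this sketch only covers a > 0")
    return lam, alpha / a, beta / a, delta * a, a * mu + b


# Illustrative values: an NIG margin (lambda = -1/2) scaled by a = 4 and shifted by b = 1.
alpha, beta = zeta_rho_to_alpha_beta(zeta=2.0, rho=-0.3, delta=1.5)
lam2, alpha2, beta2, delta2, mu2 = rescale(-0.5, alpha, beta, 1.5, 0.0, a=4.0, b=1.0)
zeta2, rho2 = alpha_beta_to_zeta_rho(alpha2, beta2, delta2)
```

In this example `zeta2` and `rho2` reproduce the starting values of $\zeta$ and $\rho$, which is the sense in which the pair is invariant to location and scale changes.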
Therefore, for the modelling of (0,1) processes, such as we find in models which are centered and scaled by their mean and standard deviation, one can use any of these location and scale invariant parametrizations together with the following theoretical moment formulas for the Generalized Hyperbolic (needed to apply the centering and scaling):
\[ E(X) = \mu + \frac{\beta\delta}{\sqrt{\alpha^2 - \beta^2}} \, \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)}, \]
\[ Var(X) = \delta^2 \left( \frac{K_{\lambda+1}(\zeta)}{\zeta K_{\lambda}(\zeta)} + \frac{\beta^2}{\alpha^2 - \beta^2} \left[ \frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} - \left( \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \right)^2 \right] \right). \tag{11} \]
Prause (1999) suggests the use of the $(\bar{\alpha}, \bar{\beta})$ parametrization, which is adopted by Jensen and Lunde (2001) as well as Wilhelmsson (2009) in their GARCH and ACD-NIG models respectively. In our model we have found the $(\zeta, \rho)$ parametrization to be adequate for our purposes. In any case, moving between any of these parametrizations is a simple matter of applying the appropriate transformation, and Prause (1999) provides further directions for applying the rescaling property when dealing with the standard deviation. Appendix A provides the nontrivial method for scaling and centering the GH density in the $(\zeta, \rho)$ and $(\xi, \chi)$ parametrizations for use in GARCH type processes. In the model we propose, the parameters $(\zeta, \rho)$ are modelled as time varying, in addition to the variance, with the key dynamics and their properties described in the next section.

2.3 Conditional Factor Dynamics

Although models from the GARCH family are able, under certain assumptions and parameterizations, to produce thick-tailed and skewed unconditional distributions, they typically assume that the shape and skewness parameters are time invariant. This also leads to the assumption that the conditional distribution of the standardized innovations ($z_t$) is independent of the conditioning information, an assumption for which there is no good a priori justification.
A number of authors, including Lai (1991), Prakash, Chang, and Pactwa (2003), and Jondeau and Rockinger (2006), have found evidence suggesting that the incorporation of higher moments in portfolio allocation leads to superior approximations of expected utility. With regard to time variation in the full conditional density parameters, different motion dynamics and distributions have been considered in the literature, on instruments varying from real estate to foreign exchange returns. The results are mixed, with Harvey and Siddique (2009) finding significant evidence of time-varying skewness, Jondeau and Rockinger (2003) finding both time-varying skewness and kurtosis significant, while Premaratne and Bera (2000), Brooks, Burke, Heravi, and Persand (2005) and Rockinger and Jondeau (2002) find no evidence of either. In terms of observation frequency, Jondeau and Rockinger (2003) find the presence of time-varying skewness and kurtosis in daily but not weekly data, partly consistent with the observation that excess kurtosis diminishes with temporal aggregation, while others, including Hansen (1994), Bond and Patel (2003) and Harvey and Siddique (2009), do find evidence of time-varying skewness and kurtosis in weekly and even monthly data. The mixed results may be partly the result of the constraints required to limit the distribution parameters within certain bounds (leading to transformations such as the logistic) and partly because both skewness and kurtosis are driven by extreme events, making their identification with particular motion dynamics and (standardized) residuals very hard. Nevertheless, since ACD models subsume GARCH models, it is always possible to test down different specifications for time and non-time variation in the underlying conditional distribution parameters through the use of appropriate marginal cost to benefit tests, as provided by various information criteria and likelihood ratio tests.
We now consider the dynamics of the independent factors, in the context of an expanded GO-GARCH model with dynamics for the full conditional parameters. We call this the Independent Factor ACD (IFACD) to distinguish it from the GO-GARCH model by its use of a two-stage method using ICA and full conditional density dynamics. The unconditional distribution of the factors is characterized by:
\[ E[f_t] = 0, \qquad E[f_t f_t'] = I_N, \qquad t = 1, \ldots, T, \tag{12} \]
which, in turn, implies that
\[ E[\epsilon_t] = 0, \qquad E[\epsilon_t \epsilon_t'] = AA'. \tag{13} \]
The conditional covariance matrix of the returns, $\Sigma_t \equiv E[(r_t - m_t)(r_t - m_t)' | \mathcal{F}_{t-1}]$, where $m_t$ is the conditional mean (denoted $\mu_t$ in Section 2.1), is given by:
\[ \Sigma_t = E[\epsilon_t \epsilon_t' | \mathcal{F}_{t-1}] = A\,E[f_t f_t' | \mathcal{F}_{t-1}]\,A' = A H_t A', \tag{14} \]
where $H_t = E[f_t f_t' | \mathcal{F}_{t-1}]$ is a diagonal matrix with elements $(h_{1t}, \ldots, h_{Nt})$, which are the conditional variances of the factors. These conditional variances can be modelled as a GARCH-type process. We assume that the factors have the following specification:
\[ f_{it} = z_{it}\sqrt{h_{it}}, \tag{15} \]
from which it follows that the returns can be expressed as:
\[ r_t = m_t + A H_t^{1/2} z_t, \tag{16} \]
where $z_t = (z_{1t}, \ldots, z_{Nt})'$. The factor conditional variances are modelled as a GARCH(p,q) process:
\[ h_{it} = c_i + a_i(L) f_{it}^2 + b_i(L) h_{it}, \tag{17} \]
with $a_i(L) = a_{i,1}L + \ldots + a_{i,q}L^q$ and $b_i(L) = b_{i,1}L + \ldots + b_{i,p}L^p$. The random variables $z_{it}$ are independent across $i$ and $t$ with $E[z_{it}] = 0$ and $E[z_{it}^2] = 1$. We assume that the conditional distribution of $z_{it}$ is GH with zero mean and unit variance, i.e. $SGH_{\lambda_i}(\mu_i, \delta_i, \alpha_i, \beta_i)$. We expand on the model of Broda and Paolella (2009) by allowing for time variation in the higher moment parameters and by using the more general GH distribution rather than constraining the $\lambda$ parameter to a particular value. The representation is such as to give a location and scale invariant parametrization of the GH distribution, namely with zero mean and unit variance, and separate motion dynamics for the skewness and shape parameters, as in the ACD type models introduced by Hansen (1994).
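As an illustration of equations (14) and (17), the sketch below runs a GARCH(1,1) recursion for each factor and maps the factor variances into the asset covariance matrix through a constant mixing matrix. It is a minimal sketch under stated assumptions: the function names, the mixing matrix `A`, the factor realizations and the GARCH coefficients are all hypothetical, and the recursion is started at the unconditional variance $c/(1 - a - b)$.

```python
def garch11_path(f, c, a, b):
    """h[t] = c + a*f[t-1]^2 + b*h[t-1], started at the unconditional variance.

    Returns a list one element longer than f; h[t] is the variance for period t.
    """
    h = [c / (1.0 - a - b)]
    for t in range(1, len(f) + 1):
        h.append(c + a * f[t - 1] ** 2 + b * h[t - 1])
    return h


def conditional_cov(A, h_t):
    """Sigma_t = A diag(h_t) A' for independent factors."""
    n = len(A)
    return [[sum(A[i][m] * h_t[m] * A[j][m] for m in range(len(h_t)))
             for j in range(n)] for i in range(n)]


# Hypothetical two-asset, two-factor example.
A = [[1.0, 0.5],
     [0.2, 0.8]]
f1 = [0.5, -2.0, 0.1]   # realizations of factor 1
f2 = [1.5, 0.3, -0.7]   # realizations of factor 2

# Choosing c = 1 - a - b gives unit unconditional factor variance, as in (12).
h1 = garch11_path(f1, c=0.05, a=0.05, b=0.90)
h2 = garch11_path(f2, c=0.10, a=0.10, b=0.80)

# Conditional covariance of the assets for the period after the last observation.
sigma_next = conditional_cov(A, [h1[-1], h2[-1]])
```

Only the diagonal of $H_t$ moves through time here; the cross-asset dependence comes entirely from the constant matrix $A$, which is the static independence structure described in the text.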
The skew and shape parameters in the GH density of the $z_{it}$ are modelled with Quadratic(1,1,1) dynamics,
\[ \breve{\rho}_{it} = \gamma_{0i} + \gamma_{1i} z_{i,t-1} + \gamma_{2i} z_{i,t-1}^2 + \delta_1 \breve{\rho}_{i,t-1}, \tag{18} \]
\[ \breve{\zeta}_{it} = \theta_{0i} + \theta_{1i} z_{i,t-1} + \theta_{2i} z_{i,t-1}^2 + \psi_1 \breve{\zeta}_{i,t-1}, \qquad i = 1, \ldots, N, \tag{19} \]
with the logistic transform to map the unconstrained processes $\breve{\rho}_t$ and $\breve{\zeta}_t$ into $\rho_t$ and $\zeta_t$:
\[ \rho_{it} = -0.99 + \frac{1.98}{1 + e^{-\breve{\rho}_{it}}}, \tag{20} \]
\[ \zeta_{it} = 0.1 + \frac{20}{1 + e^{-\breve{\zeta}_{it}}}, \tag{21} \]
considering the bounds of the distributional parameters, which are $[-0.99, 0.99]$ and $[0.1, 20]$ for $\rho$ and $\zeta$⁹ respectively. Meanwhile, we also estimate the GIG shape parameter $\lambda_i$, allowing it to vary for each factor. It follows that the single factors $f_{it}$, $i = 1, \ldots, N$, are conditionally distributed as a $GH_{\lambda_i}\left(\mu_i\sqrt{h_{it}},\ \delta_i\sqrt{h_{it}},\ \alpha_i/\sqrt{h_{it}},\ \beta_i/\sqrt{h_{it}}\right)$. Finally, the vector of returns $r_t \in \mathbb{R}^N$, which can be expressed as a linear transformation of independent factors $f_t \in \mathbb{R}^N$, is conditionally distributed according to the multivariate affine GH (maGH) distribution of Schmidt, Hrycej, and Stützle (2006): $r_t | \mathcal{F}_{t-1} \sim maGH_N(m_t, \Sigma_t, \omega_t)$, where $\omega_t = (\omega_1, \ldots, \omega_N)$ and $\omega_i = (\lambda_i, \alpha_i, \beta_i)'$, representing the conditional shape and skew parameter vectors.

As noted by Hansen (1994), because the standardized innovations $z_t$ are no longer i.i.d., consistency of the MLE and asymptotic normality of the parameters are hard to establish in such a setting. However, we did run various simulations on different parameter values and recursive windows to obtain the simulated Root Mean Squared Errors (RMSE) of true versus expected values, observing in most cases $\sqrt{T}$ type consistency. Figure 1 shows one such simulation on a set of parameters, with the blue line indicating the expected RMSE value given $\sqrt{T}$ consistency. That is, the RMSE should decrease by a factor of $\sqrt{T^{+}/T}$ when increasing the sample size from $T$ to $T^{+}$.
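Returning to the Quadratic(1,1,1) recursion and the logistic mapping in equations (18) to (21), the following sketch shows how an unconstrained process can be propagated and then squeezed into the admissible parameter range. It is illustrative only: the function names, coefficients and innovations are hypothetical. The mapping is written in the generic form $L + (U - L)/(1 + e^{-x})$, which reproduces the $\rho$ transform exactly; for $\zeta$ with bounds $(0.1, 20)$ it gives a numerator of 19.9 rather than the 20 printed in equation (21), so that detail should be treated as an assumption of the sketch.

```python
import math


def logistic_map(x, lower, upper):
    """Map an unconstrained x into (lower, upper) with a logistic transform."""
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        e = math.exp(x)  # avoids overflow for large negative x
        s = e / (1.0 + e)
    return lower + (upper - lower) * s


def quadratic_dynamics(z, c0, c1, c2, phi, start=0.0):
    """x[t] = c0 + c1*z[t-1] + c2*z[t-1]^2 + phi*x[t-1], with x[0] = start."""
    x = [start]
    for z_prev in z:
        x.append(c0 + c1 * z_prev + c2 * z_prev ** 2 + phi * x[-1])
    return x


# Hypothetical standardized innovations, including one large shock.
z = [0.3, -2.5, 1.0, 0.0, 4.0]

rho_raw = quadratic_dynamics(z, c0=0.0, c1=0.2, c2=-0.05, phi=0.8)
zeta_raw = quadratic_dynamics(z, c0=-0.5, c1=0.0, c2=-0.1, phi=0.9)

rho = [logistic_map(v, -0.99, 0.99) for v in rho_raw]
zeta = [logistic_map(v, 0.1, 20.0) for v in zeta_raw]
```

Whatever the size of the shock, the transformed values stay strictly inside the bounds, which is the purpose of the transform; the price is that very large shocks are compressed near the boundaries.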
The model entertained for this exercise was an ARMA(1,1)-GARCH(1,1) model, with parameters $(\mu, ar_1, ma_1, \omega, \alpha_1, \beta_1)$, and Quadratic(1,1,1) dynamics for the skew and shape with parameters $(\gamma_0, \gamma_1, \gamma_2, \breve{\rho})$ and $(\theta_0, \theta_1, \theta_2, \breve{\zeta})$, respectively. Because of the size and complexity of the simulation, we chose the NIG subclass from within the GH distribution, and as such cannot make any inference on the simulated properties of the $\lambda$ parameter. However, we have observed that some combinations of $\lambda$, $\rho$ and $\zeta$ yield very similar results, leading us to hypothesize that this parameter will have a very different distribution and RMSE to the other estimated parameters, but we leave this question open for further research.

The extension of dynamics to all the parameters of the distribution presents the opportunity to go beyond the conditional time-varying covariance. In the next section, we present the time-varying higher moment tensor representation, which provides additional insights into our model's multivariate interactions. These tensors form the basis for obtaining the weighted moments resulting from the geometric transformation properties of our model, which are discussed in Section 2.6.

[Insert Figure 1 here]

⁹ We limit the upper bound of $\zeta$ to 20 for estimation ease, since values beyond this point lead to very little change in the skewness and kurtosis, with the range 0.1 to 20 representing most of the distribution.

2.4 Conditional Co-Moments

The linear affine representation of the IFACD model possesses certain geometric properties which allow for the extension beyond the concept of covariance to that of co-skewness and co-kurtosis, as described in de Athayde and Flôres Jr (2002). As mentioned previously, the covariance of $r_t$ is given by:
\[ \Sigma_t = A H_{f,t} A', \tag{22} \]
where, because of the independence of $f_t$, $H_{f,t} = \mathrm{diag}(h_{1,t}, \ldots, h_{n,t})$, which is obtained from the ACD motion dynamics of the variance.
The co-moments of order 3 and 4 (from which the co-skewness and co-kurtosis are then derived), represented as tensor matrices, are
\[ M^3_t = A M^3_{f,t} (A' \otimes A'), \qquad M^4_t = A M^4_{f,t} (A' \otimes A' \otimes A'), \tag{23} \]
where $M^3_{f,t} = \mathrm{tdiag}(c^3_{111,t}, \ldots, c^3_{nnn,t})$ and $M^4_{f,t} = \mathrm{tdiag}(c^4_{1111,t}, \ldots, c^4_{nnnn,t})$, with tdiag representing the position of the conditional 3rd and 4th independent factor moments in the diagonal entries of the $n \times n^2$ and $n \times n^3$ tensor matrices respectively.¹⁰ The ordering of $A$ and $A'$ in (23) follows from $\epsilon_t = A f_t$ and is consistent with the covariance in (22). Since we are estimating the skew and shape parameters in a time-varying setup, these matrices are also time-varying in their univariate dynamics, but the multivariate dependence is non-time varying and provided by the mixing matrix $A$. Finally, in order to standardize the co-moments to represent co-skewness and co-kurtosis of $r_t$, one needs to divide the entries of the tensors by the product of their standard deviations,
\[ S_{ijk,t} = \frac{M^3_{ijk,t}}{\sigma_{i,t}\,\sigma_{j,t}\,\sigma_{k,t}}, \qquad K_{ijkl,t} = \frac{M^4_{ijkl,t}}{\sigma_{i,t}\,\sigma_{j,t}\,\sigma_{k,t}\,\sigma_{l,t}}, \tag{24} \]
where $S_{ijk,t}$ represents the asset co-skewness between elements $i, j, k$ of $r$ at time $t$, $\sigma_{i,t}$ is the standard deviation of element $i$ of $r$ at time $t$, and the case $i = j = k$ represents the asset skewness of element $i$ at time $t$; the co-kurtosis tensor $K_{ijkl,t}$ is interpreted similarly. Further to the higher co-moment representation, the next section provides additional insights into the effect of factor shocks on the underlying asset co-moments, by applying the news impact surface method to our model.

2.5 News Impact Surface

A very revealing method for visualizing the multivariate dynamics in GARCH systems is through the news impact function. This was originally suggested in the univariate literature by Engle and Ng (1993), providing a visual representation of the impact of shocks on the time-varying variance. It was extended to a surface function by Kroner and Ng (1998), who compared a number of multivariate models and the type of surfaces they generate.
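Returning to the third co-moment tensor and its standardization in Section 2.4, the computation can be sketched elementwise. This is an illustration under stated assumptions, not the authors' code: it assumes $\epsilon_t = A f_t$ as in equation (2) with mutually independent factors, so that only the own third moments of the factors contribute, and it works with the unfolded three-index tensor rather than the $n \times n^2$ matrix layout. The function names and numerical values are hypothetical.

```python
import math


def third_comoment(A, m3):
    """M3[i][j][k] = sum_m A[i][m]*A[j][m]*A[k][m]*m3[m].

    m3[m] is the third central moment of independent factor m.
    """
    n = len(A)
    idx = range(n)
    return [[[sum(A[i][m] * A[j][m] * A[k][m] * m3[m] for m in range(len(m3)))
              for k in idx] for j in idx] for i in idx]


def coskewness(A, h, m3):
    """Standardize the third co-moments by the asset standard deviations."""
    n = len(A)
    idx = range(n)
    # Asset variances are the diagonal of A diag(h) A'.
    sd = [math.sqrt(sum(A[i][m] ** 2 * h[m] for m in range(len(h)))) for i in idx]
    M3 = third_comoment(A, m3)
    return [[[M3[i][j][k] / (sd[i] * sd[j] * sd[k]) for k in idx]
             for j in idx] for i in idx]


# Hypothetical mixing matrix, factor variances and factor third moments.
A = [[1.0, 0.5],
     [0.2, 0.8]]
h = [1.0, 4.0]
m3 = [-0.5, 2.4]   # factor skewness of -0.5 and 0.3 respectively

S = coskewness(A, h, m3)
own_skew_asset_0 = S[0][0][0]
co_skew_001 = S[0][0][1]   # asset 0 squared against asset 1
```

When $A$ is the identity matrix the assets coincide with the factors, so the diagonal entries reduce to the factor skewness values and all mixed entries are zero.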
This was further extended in a natural direction by Jondeau and Rockinger (2009) to include the impact of higher moment co-dependence. While the IFACD model is mainly one of univariate independent dynamics, we investigate the type of interactions generated by the model by constructing news impact surfaces for the covariance and third co-moment. Since shocks impact the factors independently, the news impact surface is a combination of the independent news impact curves of the factors which, when combined via the mixing matrix $A$, create the dynamics for the underlying asset-factor surface function. Specifically, let the vector $z_{t-1}$ denote those inputs known at time $t-1$ for the determination of $h_{i,t}$, $m^3_{i,t}$ and $m^4_{i,t}$, denoting the variance, third and fourth moments of the factors respectively, and denote as $Z$ the unconditional value of those inputs except for those to factors $i$ and $j$, $\hat{z}_{i,t-1}$ and $\hat{z}_{j,t-1}$ respectively. The news impact surfaces of $h_{ab,t}$, $m^3_{abc,t}$ and $m^4_{abcd,t}$, denoting the conditional covariance, third and fourth co-moments of the underlying assets $a$, $b$, $c$ and $d$, with respect to shocks from factors $i$ and $j$, are the three-dimensional graphs of the functions

\[
\begin{aligned}
h_{ab,t} &= f\left(A, H_{f,t-1} \mid (\hat{z}_{i,t-1}, \hat{z}_{j,t-1}, Z)\right), \\
m^3_{abc,t} &= f\left(A, M^3_{f,t-1} \mid (\hat{z}_{i,t-1}, \hat{z}_{j,t-1}, Z)\right), \\
m^4_{abcd,t} &= f\left(A, M^4_{f,t-1} \mid (\hat{z}_{i,t-1}, \hat{z}_{j,t-1}, Z)\right),
\end{aligned} \tag{25}
\]

where $H_{f,t-1}$, $M^3_{f,t-1}$ and $M^4_{f,t-1}$ denote the independent factor covariance matrix, and the third and fourth co-moment tensor matrices, as in equations (22) and (23).

^10 Because of independence, only indices representing the univariate parameters (i.e. 111) are non-zero.

[Insert Figure 2 here]

[Insert Figure 3 here]

Figure 2 displays the covariance news impact surface for the MSCI iShares of the commodity producing nations of Australia (EWA) and Canada (EWC), showing the impact of shocks from factors 12 and 17 on the variances of EWA and EWC, and their covariance.
Factor 17 is the dominating influence on these two assets, which have a sort of U-shaped curve as described by Kroner and Ng (1998), who noted that the news impact surface of F-ARCH models is always U-shaped, with the factors determining the direction in which the parabola points.^11 Figures 3a, 3b and 3c show how the third moment and co-moment for the MSCI iShares of the Netherlands (EWN) and Austria (EWO) react to shocks from factors 8 and 10. In this case, factor 10 dominates the impact on the third moment of EWO, factor 8 dominates the impact on the third moment of EWN, while jointly the picture is mixed, with factor 8 having the largest overall impact. The joint picture, namely that of the third co-moment of type iij (displayed as [EWN,EWN,EWO] in the figure), shows how good a hedge asset j (EWO) is against volatility changes in asset i (EWN) with respect to the factor shocks. A negative value indicates that asset j's return goes down when the volatility of asset i increases, and hence that asset j provides a poor hedge. As the figure shows, EWO does not provide a good hedge for EWN against large negative shocks from factors 8 or 10. In contrast, in Figure 3d, which shows the impact of factors 2 and 3 on the MSCI iShares for Switzerland (EWL) and France (EWQ), factor 2 dominates, but the U-shaped surface indicates that the assets provide a good hedge for each other against large negative shocks from this factor. The information from such visual diagnostics should be supplemented with more concrete analysis, as suggested by Jondeau and Rockinger (2009), in terms of simulations to determine the finite sample distribution of the reactions to shocks, as well as impulse response functions to track the decay of the reactions to the shocks over time. Finally, in the next section we present the weighted conditional density summation property of the model and a fast estimation method for use in portfolio applications.
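The construction of the covariance news impact surface can be sketched in a few lines, assuming plain GARCH(1,1) variance dynamics for every factor. Shocks to two chosen factors are varied over a grid, every other input is held at its unconditional level, and the factor variances are mixed through the loadings. The function names, the 3 x 3 mixing matrix and the GARCH parameters are hypothetical and serve only to show the mechanics; they are not the estimates behind Figure 2.

```python
def garch_variance(omega, alpha, beta, shock, h_prev):
    """One-step GARCH(1,1) variance given last period's shock and variance."""
    return omega + alpha * shock ** 2 + beta * h_prev


def covariance_news_impact(A, params, a, b, i, j, grid):
    """Covariance of assets a and b as shocks to factors i and j vary.

    params[k] = (omega, alpha, beta) for factor k. Lagged variances, and the
    squared shocks of every factor other than i and j, are held at the
    unconditional variance of the respective factor.
    """
    h_bar = [w / (1.0 - al - be) for (w, al, be) in params]
    surface = []
    for zi in grid:
        row = []
        for zj in grid:
            cov = 0.0
            for k, (w, al, be) in enumerate(params):
                if k == i:
                    shock = zi
                elif k == j:
                    shock = zj
                else:
                    shock = h_bar[k] ** 0.5  # shock^2 equals the unconditional variance
                h_k = garch_variance(w, al, be, shock, h_bar[k])
                # Independent factors: covariance is a loading-weighted sum
                cov += A[a][k] * A[b][k] * h_k
            row.append(cov)
        surface.append(row)
    return surface


# Hypothetical three-factor system.
A = [[0.8, 0.3, 0.1],
     [0.5, 0.6, 0.2],
     [0.1, 0.2, 0.9]]
params = [(0.01, 0.05, 0.90)] * 3
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

surface = covariance_news_impact(A, params, a=0, b=1, i=0, j=1, grid=grid)
```

With symmetric GARCH dynamics the surface depends on the shocks only through their squares, which is why it is U-shaped; the sign of the product of loadings on each factor decides whether the bowl opens upwards or downwards along that axis.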
2.6 Weighted Conditional Density Summation and Portfolio Representation

With its unique attribute, in the GH family, of being closed under convolution, the N-dimensional NIG distribution is uniquely suited to problems in portfolio and risk management where a certain weighted sum of assets is required. However, when the distributional parameters $\alpha$ and $\beta$, representing skew and shape, are allowed to vary, as is the case in the model, this property no longer holds and we need to use numerical methods, such as the Fast Fourier Transform (FFT), to derive the weighted density by inversion of the characteristic function of the scaled parameters. In the case of the NIG distribution, this is greatly simplified because of the representation of the modified Bessel function with fixed index of -0.5, which was derived in Barndorff-Nielsen and Bläesild (1981); otherwise, the characteristic function of the Generalized Hyperbolic involves the evaluation of the modified Bessel function with complex arguments, which, though not impossible, does complicate the inversion. Appendix B of the paper derives the characteristic functions used in the case of independent margins for both the NIG and full GH distributions.

^11 The GO-GARCH model is equivalent to an F-ARCH model with the number of factors equal to the number of assets, diagonal covariances and no idiosyncratic shocks (see van der Weide (2002) for details).

The portfolio return, $R_t$, i.e. the weighted sum of the returns vector $r_t$ through the portfolio strategy $w_t$, is distributed according to an N-dimensional GH distribution. Given the factor estimates, $y_t$, the portfolio return is given by

\[
\begin{aligned}
R_t = w_t' r_t &= w_t'(m_t + A y_t) \\
&= w_t' m_t + \left(w_t' A H_t^{1/2}\right) z_t, \qquad z_{i,t} \sim SGH_{\lambda_i}(\mu_{i,t}, \delta_{i,t}, \alpha_{i,t}, \beta_{i,t}),
\end{aligned} \tag{26}
\]

where $H_t^{1/2}$ is a diagonal matrix with the conditional standard deviations, estimated from the ACD dynamics of $y_t$, and $z_t$ are the N-dimensional innovations, each distributed as 1-dimensional standardized GH.
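The inversion route described in this section can be illustrated for the NIG case, where the characteristic function has the well-known closed form $\exp\{iu\mu + \delta(\sqrt{\alpha^2-\beta^2} - \sqrt{\alpha^2-(\beta+iu)^2})\}$. The sketch below multiplies the characteristic functions of independent, weight-scaled NIG components and inverts the product numerically. Note the assumptions: it uses a direct trapezoidal inversion instead of the FFT employed in the paper, it works in the $(\alpha, \beta, \delta, \mu)$ parametrization, it assumes positive scaling weights, and all function names and parameter values are hypothetical.

```python
import cmath
import math


def nig_cf(u, alpha, beta, delta, mu):
    """Characteristic function of NIG(alpha, beta, delta, mu) at real u."""
    g = math.sqrt(alpha ** 2 - beta ** 2)
    v = cmath.sqrt(alpha ** 2 - (beta + 1j * u) ** 2)
    return cmath.exp(1j * u * mu + delta * (g - v))


def scale_nig(params, w):
    """Parameters of w * X for X ~ NIG(params); assumes w > 0."""
    alpha, beta, delta, mu = params
    return (alpha / w, beta / w, delta * w, mu * w)


def portfolio_cf_grid(components, s, du):
    """Product of the component characteristic functions on u = 0, du, ..., s."""
    m = int(round(s / du))
    grid = []
    for k in range(m + 1):
        phi = complex(1.0, 0.0)
        for p in components:
            phi *= nig_cf(k * du, *p)
        grid.append(phi)
    return grid


def density(x, cf_grid, du):
    """f(x) = (1/pi) * int_0^s Re[exp(-iux) phi(u)] du by the trapezoidal rule."""
    m = len(cf_grid) - 1
    total = 0.0
    for k, phi in enumerate(cf_grid):
        weight = 0.5 if k in (0, m) else 1.0
        total += weight * (cmath.exp(-1j * k * du * x) * phi).real
    return total * du / math.pi


# Hypothetical margins with different skew and shape, so the sum is not NIG.
margins = [(2.0, -0.3, 1.0, 0.0), (3.0, 0.5, 0.8, 0.0)]
weights = [0.6, 0.4]
components = [scale_nig(p, w) for p, w in zip(margins, weights)]

du = 0.02
cf_grid = portfolio_cf_grid(components, s=40.0, du=du)
xs = [-4.0 + 0.05 * k for k in range(161)]
pdf = [density(x, cf_grid, du) for x in xs]
```

The truncation point `s` and the step `du` are tuning choices; an FFT evaluates the same sum on a regular grid of `x` values in one pass, which matters when the density has to be rebuilt for every time point. Quantiles for VaR can then be read off the cumulative sum of `pdf`.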
The weighted asset returns, $w_{i,t} r_{i,t}$, are distributed as a scaled 1-dimensional GH (see Bläesild (1981) for the GH scaling property),

\[ w_{i,t} r_{i,t} = \left(w_{i,t}\mu_{i,t} + \bar{w}_{i,t} z_{i,t}\right) \sim GH_{\lambda_i}\!\left(w_{i,t}\tilde{\mu}_{i,t} + \bar{w}_{i,t}\mu_{x,i,t},\; |\bar{w}_{i,t}|\,\delta_{i,t},\; \frac{\alpha_{i,t}}{|\bar{w}_{i,t}|},\; \frac{\beta_{i,t}}{|\bar{w}_{i,t}|}\right), \tag{27} \]

where $\bar{w}_t$ is equal to $w_t' A H_t^{1/2}$, $\bar{w}_{i,t}$ is the $i$-th element of $\bar{w}_t$, and $\mu_{i,t}$ is the mean of the $i$-th underlying asset. In order to obtain the density of the portfolio, we must sum the individual weighted densities of $x_{i,t}$, either by simulation or by FFT as in Chen, Härdle, and Spokoiny (2007). We choose the latter for its accuracy and speed. In order to approximate the density of the portfolio return, we work with the characteristic function of the GH distribution, inverting it via the FFT method. The characteristic function^12 of the portfolio return $R_t$ is

\[
\begin{aligned}
\varphi_R(u) &= \prod_{i=1}^{n} \varphi_{\bar{w} Z_i}(u) \\
&= \exp\left\{ iu \sum_{j=1}^{n} \bar{\mu}_j + \sum_{j=1}^{n} \left[ \frac{\lambda_j}{2}\log(\gamma_j) - \frac{\lambda_j}{2}\log(\upsilon_j) + \log K_{\lambda_j}\!\left(\bar{\delta}_j\sqrt{\upsilon_j}\right) - \log K_{\lambda_j}\!\left(\bar{\delta}_j\sqrt{\gamma_j}\right) \right] \right\},
\end{aligned} \tag{28}
\]

where $\gamma_j = \bar{\alpha}_j^2 - \bar{\beta}_j^2$, $\upsilon_j = \bar{\alpha}_j^2 - (\bar{\beta}_j + iu)^2$, and $(\bar{\alpha}, \bar{\beta}, \bar{\delta}, \bar{\mu})$ are the scaled versions of the parameters $(\hat{\alpha}, \hat{\beta}, \hat{\delta}, \hat{\mu})$ as shown in (27). The density may be accurately approximated by FFT^13 as

\[ f_p(r) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} e^{-iur}\,\varphi_R(u)\,du \approx \frac{1}{2\pi}\int_{-s}^{s} e^{-iur}\,\varphi_R(u)\,du. \tag{29} \]

The cumulative distribution and quantile functions can then be approximated using this density. Since we have a time-varying density, in-sample this procedure must be carried out for all data points (for forecasting one need only look at the n-ahead forecast horizon). Further properties also arise naturally, such as the portfolio variance, skewness and kurtosis from the multivariate stage estimation, and follow from the properties of tensor matrices (see for example de Athayde and Flôres Jr (2000)),

\[ \sigma^2_{p,t} = w'\Sigma_t w, \qquad s_{p,t} = \frac{w' M^3_t (w \otimes w)}{(w'\Sigma_t w)^{3/2}}, \qquad k_{p,t} = \frac{w' M^4_t (w \otimes w \otimes w)}{(w'\Sigma_t w)^{2}}, \tag{30} \]

where $\Sigma_t$, $M^3_t$ and $M^4_t$ are derived in equations (22) and (23).

^12 See Appendix B for derivation.
^13 For an applied overview of this methodology, see for example Paolella (2007) section 1.3.3.

3 Estimation and Empirical Application

In this section we report the results of a medium scale empirical exercise, with a view to highlighting the properties and performance of our proposed model and its practical implementation. The application covers two key events in the recent history of the equity markets, the 1998 and 2009 crashes, providing an ideal testing set for a risk management exercise. The first section investigates the in-sample fit of the model, with tests on the conditional distribution choice and parameter dynamics. The performance of the model in capturing time variation in higher moments is tested against the non-time-varying model and a DCC-Student model, in the context of a portfolio risk management application using VaR. The second section extends this exercise to a rolling out-of-sample testing environment to capture a realistic setting around the 2009 crash. Both sections report the results of the conditional coverage test of Christoffersen (1998) and the superior predictive ability test of Hansen (2005) at the 1% and 5% quantile levels.

3.1 Selection Strategy and In-Sample Fit

As ACD models subsume GARCH models,^14 one can always use some information criterion to choose the model that best describes the data. To evaluate the contribution of the ACD dynamics model under consideration, in the context of the current factor model extension, we considered the fit of the IFACD-GH model on the 17 iShares' log returns for the period 19/03/1996 - 29/05/2007. This was estimated assuming a constant mean, which may seem unrealistic for such a long period, but other methods for centering and removing any autocorrelation could be used, such as univariate AR or Vector AR (VAR). The results were little changed in a parallel exercise using a VAR(1) model for the mean.
A GARCH(1,1) model was used for the time-varying variance, although, as noted previously, many other parametrizations are possible; the more exotic the system, however, the more likely it is that simulation methods will be required post-estimation to generate certain properties of the model. For the distribution skew and shape parameters, $\rho_t$ and $\zeta_t$, a general Quadratic(1,1,1) model was chosen for in-sample evaluation, but the Hannan-Quinn information criterion (HQIC)^15 was used to choose the best model from those nested in the model,^16 including the non-time-varying parametrization. Since we have a highly nonlinear model, which depends critically on good starting values if there is to be any chance of obtaining anything other than a local solution to the optimization problem, we followed a strategy of generating thousands of starting values for the parameters from the uniform distribution, bounded by the parameters' lower and upper bounds, evaluating the log-likelihood at those starting values, and initiating 12 restarts of the solver from the 12 top-ranked values.^17 Some justification for, and confidence in, this strategy can be found in Hu, Shonkwiler, and Spruill (1994).

^14 Since they can be represented as intercept-only models in higher moment dynamics.
^15 The HQIC, defined as $-2LL_n(k)/n + 2k\ln(\ln(n))/n$, was chosen as it holds the middle ground between the Akaike criterion (AIC), which is known to underpenalize, and the Bayesian criterion (BIC), which is known to overpenalize the models.
^16 Specifically, the lowest HQIC was chosen among 16 estimated combinations of dynamics for the skew and shape parameters for each factor.
^17 We used an augmented Lagrange based solver with the SQP interior algorithm described in Ye (1997).

[Insert Table 1 here]

The results are in Table 1, which includes the parameter estimates and p-values of the best fitting model for each factor. Of the 17 factors, only 1 had neither time-varying skew nor time-varying shape; in total, 4 factors had non-time-varying skew dynamics and 3 had non-time-varying shape dynamics. In the joint case, neither the time-invariant skew nor the time-invariant shape was significant at the 5% level. GARCH persistence appeared to be quite high across all factors, averaging 0.99, indicating perhaps that there are structural breaks in the dataset, a not too unlikely phenomenon considering the long time period under consideration and the 1998 event in the set. Of the 165 total parameters characterizing the univariate ACD equation across the 17 factors, only 9 fell outside the 1% critical value of the individual Nyblom-Hansen statistic (of which 7 were in the variance dynamics), and of the 17 models, 2 had a joint statistic which also fell outside the 1% critical value. While the $\lambda$ parameter ranged from 4.5 to -5.5, with a mean around -3, we also tested the restriction of using the NIG subclass. The table reports the likelihood ratio test of the full GH fitted model against the restricted NIG model with $\lambda = -0.5$ for the factors, which we ran in order to justify our use of the more general distribution family. The p-value indicates that the restricted model can be rejected for 11 out of the 17 factors at the 5% significance level.

3.2 Portfolio Value at Risk

As the IFACD model aims to capture time variation in parameters affecting the whole shape of the distribution, it is natural to consider an application involving Value-at-Risk, which critically depends on such aspects of the distribution as influenced by tail events and skewness. Using an equally weighted portfolio, we tested the IFACD-GHYP against the GO-GARCH-GHYP and DCC-Student^18 models both in-sample and out-of-sample. The out-of-sample comparative study was performed using a period of 500 days covering 05/12/2007 to 30/11/2009. This period was chosen since it includes the peak (in late 2007) and trough (in early 2009) of the last decade.
In terms of our equally weighted iShares index, it represents a draw-down of more than 60% followed by a run-up of close to the same magnitude, and hence an ideal testing period for a VaR study. The models were estimated using a strategy of rolling 1-ahead forecasts, re-estimating the model every 25 trading days and using an increasing window size for the estimation.^19 The relevant tests which form the basis for the evaluation of the adequacy of the model are those of Kupiec (1995) and Christoffersen (1998) for VaR exceedances, and Hansen (2005) for superior predictive ability, discussed in the next sections.

^18 The DCC model was included to contrast time variation in dependence and to see whether our static model could do better based on its other attributes. To stay within a 2-stage estimation framework we chose the Student distribution, since the GH distribution does not admit a representation for 2-stage DCC estimation.
^19 That means the model parameters were re-estimated a total of 500/25 = 20 times, using an estimation window at time $t_0$ = 05/12/2007 of size 2451, increased by 25 at each re-estimation point.

3.2.1 VaR exceedances

Denote by $r_t$ the return of an asset at time $t$, and by $VaR_{t|t-1}(\alpha)$ the Value at Risk based on an expected $\alpha$% coverage rate at time $t$ given the information set $\Omega_{t-1}$. The ex-post coverage rate is then

\[ \Pr\left[ r_t < VaR_{t|t-1}(\alpha) \right] = \alpha, \tag{31} \]

and the indicator variable $I_t(\alpha)$, the ex-post realization of an exceedance, is defined as

\[ I_t(\alpha) = \begin{cases} 1, & \text{if } r_t < VaR_{t|t-1}(\alpha) \\ 0, & \text{otherwise.} \end{cases} \tag{32} \]

The original test of exceedances, called the unconditional coverage or proportion of failures test, by Kupiec (1995), aimed at testing whether the observed frequency of VaR exceedances was consistent with the expected exceedances given the quantile under consideration and a confidence level, that is, whether $\Pr[I_t(\alpha) = 1] = E[I_t(\alpha)] = \alpha$.
Under the null hypothesis of a correctly specified model, the number of exceedances follows a binomial distribution, with the probability of obtaining $X$ exceedances given a correct model given by

\[ \Pr(X \mid N, p) = \binom{N}{X} p^X (1 - p)^{N - X}, \tag{33} \]

where $p$ is the probability of an exceedance for the confidence level under consideration, $X$ the number of exceedances given the VaR quantile and $N$ the sample size. Therefore, levels of the probability below a given significance level lead to a rejection of the null hypothesis. The test is usually conducted as a likelihood ratio test, with the statistic taking the form

\[ LR_{uc} = -2\ln\left( \frac{(1 - p)^{N - X}\, p^X}{\left(1 - \frac{X}{N}\right)^{N - X} \left(\frac{X}{N}\right)^{X}} \right), \tag{34} \]

which, under the null that the model is correct, is asymptotically distributed as $\chi^2$ with 1 degree of freedom. The test does not consider any attribute of the exceedances other than their frequency, leading to a potential violation of the assumption of independence should those exceedances possess autocorrelation. The conditional coverage test of Christoffersen (1998) corrects this by jointly testing the frequency as well as the independence of exceedances, assuming that the VaR violation, $I_t(\alpha)$, is modelled as a first order Markov chain whose matrix of transition probabilities is defined by

\[ \Pi = \begin{pmatrix} \eta_{00} & \eta_{01} \\ \eta_{10} & \eta_{11} \end{pmatrix}, \tag{35} \]

where $\eta_{ij} = \Pr[I_t(\alpha) = j \mid I_{t-1}(\alpha) = i]$. The null of conditional independence is then defined by

\[ H_0 : \Pi = \Pi_\alpha = \begin{pmatrix} 1 - \alpha & \alpha \\ 1 - \alpha & \alpha \end{pmatrix}, \tag{36} \]

which means that the probability of having an exceedance at time $t$, irrespective of the state at time $t-1$, should be equal to the coverage rate $\alpha$. The test is a joint likelihood ratio of the conditional and unconditional coverage, expressed as

\[ LR_{cc} = -2\ln\left( \frac{(1 - \pi)^{\eta_{00} + \eta_{10}}\, \pi^{\eta_{01} + \eta_{11}}}{(1 - \pi_0)^{\eta_{00}}\, \pi_0^{\eta_{01}}\, (1 - \pi_1)^{\eta_{10}}\, \pi_1^{\eta_{11}}} \right) - 2\ln\left( \frac{(1 - p)^{N - E}\, p^{E}}{\left(1 - \frac{E}{N}\right)^{N - E} \left(\frac{E}{N}\right)^{E}} \right), \tag{37} \]

where $E$ is the number of exceedances, $\eta_{ij}$ in (37) and (38) denotes, with a slight abuse of notation, the observed number of transitions from state $i$ to state $j$, and

\[ \pi_0 = \frac{\eta_{01}}{\eta_{00} + \eta_{01}}, \qquad \pi_1 = \frac{\eta_{11}}{\eta_{10} + \eta_{11}}, \qquad \pi = \frac{\eta_{01} + \eta_{11}}{\eta_{00} + \eta_{01} + \eta_{10} + \eta_{11}}. \tag{38} \]

The statistic is asymptotically distributed as $\chi^2$ with 2 degrees of freedom.
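Both likelihood ratio statistics are simple to compute from a 0/1 exceedance series. The sketch below is a minimal implementation of equations (34) and (37) using only the standard library, relying on the closed forms of the $\chi^2$ survival function for 1 and 2 degrees of freedom. The function names and the example hit sequence (8 isolated exceedances in 500 observations) are hypothetical and chosen for illustration.

```python
import math


def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_uc(hits, p):
    """Unconditional coverage statistic of eq. (34) and its chi2(1) p-value."""
    n, x = len(hits), sum(hits)
    rate = x / n
    ll_null = _xlogy(n - x, 1.0 - p) + _xlogy(x, p)
    ll_alt = _xlogy(n - x, 1.0 - rate) + _xlogy(x, rate)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    return lr, math.erfc(math.sqrt(lr / 2.0))  # chi2(1) survival function


def christoffersen_cc(hits, p):
    """Conditional coverage statistic of eq. (37) and its chi2(2) p-value."""
    n = [[0, 0], [0, 0]]  # n[i][j]: number of transitions from state i to j
    for prev, cur in zip(hits[:-1], hits[1:]):
        n[prev][cur] += 1
    from0, from1 = n[0][0] + n[0][1], n[1][0] + n[1][1]
    pi0 = n[0][1] / from0 if from0 else 0.0
    pi1 = n[1][1] / from1 if from1 else 0.0
    pi = (n[0][1] + n[1][1]) / (from0 + from1)
    ll_indep = (_xlogy(n[0][0] + n[1][0], 1.0 - pi)
                + _xlogy(n[0][1] + n[1][1], pi))
    ll_markov = (_xlogy(n[0][0], 1.0 - pi0) + _xlogy(n[0][1], pi0)
                 + _xlogy(n[1][0], 1.0 - pi1) + _xlogy(n[1][1], pi1))
    lr_ind = -2.0 * (ll_indep - ll_markov)
    lr_uc, _ = kupiec_uc(hits, p)
    lr_cc = max(lr_uc + lr_ind, 0.0)
    return lr_cc, math.exp(-lr_cc / 2.0)  # chi2(2) survival function


# Hypothetical series: 500 days, 8 isolated exceedances, 1% VaR.
hits = [0] * 500
for day in range(50, 401, 50):
    hits[day] = 1

lr_uc, p_uc = kupiec_uc(hits, 0.01)
lr_cc, p_cc = christoffersen_cc(hits, 0.01)
```

For this particular series the unconditional statistic is about 1.54 with a p-value near 0.215, so a count of 8 exceedances against an expected 5 is not rejected at the 5% level. The independence part only adds a small amount here because no two exceedances are adjacent.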
It is important to report both the joint and the separate unconditional tests, since it is always possible that the joint test passes while either the independence or the unconditional coverage test fails. Other extensions include tests by Engle and Manganelli (2004) and Patton (2002), but for the current exercise only the conditional coverage test is used, as it is believed to be adequate and easy to replicate.

[Insert Table 2 here]

The marginal contribution of the higher moment dynamics is captured in the in-sample VaR exceedances test reported in Table 2. As would be expected from a model which aims to capture time variation in the tails, in-sample the IFACD model outperformed the GO-GARCH model at the 1% quantile, with the DCC model also performing particularly well. At the 5% quantile, both the IFACD and GO-GARCH models passed the unconditional coverage test but failed the conditional coverage test. We hypothesize that this may be an artifact of the time-invariant mixing matrix together with a constant mean in the specification. In the out-of-sample estimation, we observe that the IFACD and GO-GARCH models perform equally well at the 1% quantile and equally badly at the 5% quantile. The fact that the GO-GARCH model performs so well is likely due to the frequency of re-estimation (25 days), which starts to approximate a time-varying model. In fact, the frequent re-estimation allows changes in the independent factors to be captured on a lagged basis, which leads us to hypothesize that the DCC model, which implicitly models dynamic dependence, fails out-of-sample due to the use of the Student distribution, which does not provide the required tail and asymmetry variation.
3.2.2 Superior Predictive Ability

The VaR exceedances test may be considered a rather crude method for capturing the differences between such closely related models, as it distinguishes on the basis of integer exceedances and does not look at some measure of average exceedance which would capture finer differences. It also requires a large amount of out-of-sample exceedance data to avoid the possibility of data-snooping bias, which is where tests such as those of White (2000) and Hansen (2005), using an appropriate loss function, come into play. For the VaR based test, we follow Gonzalez-Rivera, Lee, and Mishra (2004) and define a statistical loss function used in quantile estimation, which for a given $\alpha$ is defined as

\[ Q_{loss} \equiv P^{-1} \sum_{t=R}^{T} \left( \alpha - \mathbf{1}\left( r_{t+1} < VaR^{\alpha}_{t+1} \right) \right) \left( r_{t+1} - VaR^{\alpha}_{t+1} \right), \tag{39} \]

where $P = T - R$ is the number of out-of-sample observations, $T$ the total horizon including estimation, and $R$ the start of the out-of-sample forecast. This is an asymmetric loss function, linearly penalizing exceedances more heavily, with weight $(1 - \alpha)$. Because of the non-differentiable nature of the indicator function $\mathbf{1}$, we again follow Gonzalez-Rivera, Lee, and Mishra (2004) and replace it with the approximation

\[ \mathbf{1}\left( r_{t+1} < VaR^{\alpha}_{t+1} \right) \approx \left[ 1 + \exp\left\{ \delta \left( r_{t+1} - VaR^{\alpha}_{t+1} \right) \right\} \right]^{-1}, \tag{40} \]

which is found to match the indicator function very closely for values of $\delta$ equal to 25.^20 Among the tests for choosing superior models while controlling for data-snooping bias, the reality check of White (2000) has proven popular, but the power of the test suffers from the inclusion of a poor model, a shortcoming addressed by the Superior Predictive Ability (SPA) test of Hansen (2005), which was used in a related exercise comparing univariate GARCH models by Hansen and Lunde (2005). The test considers the relative loss between the benchmark model loss $L_{0,t}$ and every other included model's loss $L_{k,t}$,

\[ X_{k,t} \equiv L_{0,t} - L_{k,t}, \qquad k = 1, \dots, l, \quad t = 1, \dots, n. \tag{41} \]

The null hypothesis under the test is that a chosen benchmark model is as good as any other model in terms of expected loss, which may be formulated as the hypothesis $H_0 : \mu_k \equiv E(X_{k,t}) \le 0$ for all $k$, since $\mu_k > 0$ is equivalent to model $k$ being better than the benchmark in relative loss terms. The test statistic considered to test the hypothesis is the maximum of the standardized relative losses,

\[ T_n = \max_k \frac{n^{1/2} \bar{X}_k}{\hat{\sigma}_k}, \tag{42} \]

where

\[ \bar{X}_k = n^{-1} \sum_{t=1}^{n} X_{k,t}, \tag{43} \]

and $\hat{\sigma}^2_k = \mathrm{var}(n^{1/2} \bar{X}_k)$ is estimated by stationary bootstrap. The distribution of the test statistic $T_n$ under the assumption of a true null hypothesis, and its power, are extensively discussed in Hansen (2005).

^20 That is, when dealing with percentages; otherwise, for decimals, use 2500.

[Insert Table 3 here]

The insights from the exceedances test discussed previously are reinforced here by the SPA test. Table 3 compares the weighted portfolio loss functions (reported as %) for the IFACD and GO-GARCH models at the 1% and 5% quantiles. As in the exceedances test, the in-sample results confirm the superior performance of the IFACD model, whilst out-of-sample both models perform equally well. The power of the test is implied by reversing the benchmark with the model, rerunning the test and observing whether the results hold.^21

[Insert Figure 4 here]

In an illustrative contrast, Figure 4 shows the iShares equally weighted portfolio index, volatility, skewness and kurtosis generated by the IFACD-NIG, IFGARCH-NIG and DCC-Student models for the 500 day period of the out-of-sample estimation. The frequent re-estimation of the GO-GARCH model resulted in an apparent time variation in the higher moments, which was not present in-sample, where a single large estimation period was used.
While this generated enough variation in the tails to pass the VaR tests at the 1% quantile level, the same could not be said of the DCC Student model, which generated very little in terms of excess kurtosis, with an average degree of freedom parameter across the re-estimations of 14.^22

^21 This is simple to confirm when there is 1 benchmark and 1 model, which is why we exclude the DCC model in this study in order to contrast with the directly competing model.
^22 The excess kurtosis of the Student distribution is $6/(\nu - 4)$ for $\nu > 4$.

4 Conclusion

Research into the importance of higher moments in portfolio and risk management has received a growing amount of attention in recent years. Not only is it important to consider skewed and heavy tailed distributions, but there also appears to be strong evidence of time variation in the higher moments, which as such cannot be ignored when considering conditional density dynamics. Modelling such dynamics in a multivariate setup has been almost impossible due to the form of most multivariate distributions, few of which admit a tractable representation, and for those which do, the dimensionality of any resulting formulation has been infeasible to estimate. In the Independent Factor Autoregressive Conditional Density Model, we have provided a first attempt to incorporate such time variation by adopting an independent factor framework. Some evidence shows that a dynamic higher moment model provides superior results versus an equivalent static representation, and we hypothesize that results will be mixed depending on the type of dataset, the time period under consideration and the context. We expect that a model incorporating some sort of dynamics or moving window for the factor independence would provide a better fit, but leave this as an extension for further research.
[Figure 1: ACD Model Simulation: True vs Estimated Root Mean Squared Error. Panels: (a) µ, (b) ar1, (c) ma1, (d) ω, (e) α1, (f) β1, (g) γ0, (h) γ1, (i) γ2, (j) δ1, (k) θ0, (l) θ1, (m) θ2, (n) ψ1. Each panel plots rmse against the sample size T, from 2000 to 10000.]

Table 1: MSCI iShares 17 (19/03/1996 - 05/12/2007) Independent Factor ACD-GHYP Model Fit.

ω α1 β1 λ γ0 Y1 0.0027 [0.073] 0.053 [0.000] 0.946 [0.000] 4.540 [0.000] 0.032 [0.258] γ1 γ2 ρ˘1 θ0 0.251 [0.271] Y2 0.0022 [0.067] 0.062 [0.000] 0.937 [0.000] -1.136 [0.484] -0.002 [0.163] -0.011 [0.223] 0.002 [0.198] 0.999 [0.000] 1.362 [0.000] θ1 Y3 0.003 [0.113] 0.052 [0.000] 0.947 [0.000] -3.967 [0.000] -0.470 [0.007] 1.617 [0.000] 0.260 [0.022] 0.856 [0.000] -0.164 [0.000] -0.718 [0.000] Y4 0.0071 [0.153] 0.048 [0.013] 0.945 [0.000] -3.765 [0.000] -0.247 [0.776] 2.000 [0.049] 0.390 [0.244] 0.355 [0.237] -1.053 [0.188] 1.534 [0.067] Y5 0.0042 [0.319] 0.106 [0.042] 0.893 [0.000] -3.038 [0.000] -0.057 [0.000] -0.172 [0.069] 0.017 [0.012] 0.964 [0.000] -4.475 [0.000] -1.567 [0.097] Y6 0.0126 [0.025] 0.062 [0.000] 0.932 [0.000] -1.521 [0.058] -0.436 [0.000] 0.583 [0.000] 0.203 [0.004] 0.314 [0.027] -0.296 [0.305] -0.339 [0.211] 0.985 [0.000] 0.893 [0.000] 0.063 [0.050] 0.879 [0.000] θ2 ζ˘1 Y7 0.0029 [0.057] 0.052 [0.000] 0.946 [0.000] -5.422 [0.000] -0.286 [0.000] 0.057 [0.656] 0.326 [0.000]
0.955 [0.000] -3.379 [0.000] 0.263 [0.703] -0.243 [0.529] Y8 0 .0026 [0.031] 0 .053 [0.000] 0 .946 [0.000] -5.134 [0.000] -0.570 [0.122] 0.835 [0.038] 0.260 [0.291] -8.022 [0.000] 1.054 [0.111] 2.000 [0.002] Y9 0 .0041 [0.099] 0 .046 [0.000] 0 .951 [0.000] -5.510 [0.000] -0.317 [0.770] 2.000 [0.017] -0.728 [0.385] 0.488 [0.024] Y1 0 0 .0219 [0.023] 0 .065 [0.000] 0 .914 [0.000] -3.451 [0.000] 0.002 [0.823] -0.064 [0.882] Y1 1 0 .001 [0.246] 0 .015 [0.000] 0 .984 [0.000] -4.971 [0.000] -1.741 [0.014] 0.961 [0.112] 0.766 [0.002] 0.999 [0.000] -5.276 [0.745] -2.000 [0.964] -0.932 [0.000] -2.000 [0.197] Y1 2 0 .0012 [0.119] 0 .030 [0.000] 0 .968 [0.000] -4.181 [0.001] -0.187 [0.145] -0.251 [0.225] 0.182 [0.131] 0.893 [0.000] -0.078 [0.287] -1.400 [0.110] 0.293 [0.877] 0.886 [0.000] 0.927 [0.000] Y1 3 0 .0015 [0.166] 0 .042 [0.000] 0 .957 [0.000] -4.629 [0.000] 0.193 [0.252] Y1 4 0 .001 [0.251] 0 .024 [0.001] 0 .975 [0.000] 2.358 [0.000] 0.055 [0.029] -3.768 [0.015] 2.000 [0.003] 0.728 [0.042] -0.985 [0.006] -2.000 [0.082] Y1 5 0 .0012 [0.359] 0 .020 [0.058] 0 .979 [0.000] -2.989 [0.000] -0.312 [0.060] 0.586 [0.001] Y1 6 0 .001 [0.225] 0 .024 [0.004] 0 .975 [0.000] -3.612 [0.000] -0.208 [0.051] -4.153 [0.000] -1.039 [0.300] 2.000 [0.097] 0.722 [0.000] 0.048 [0.930] -0.235 [0.294] 0.439 [0.116] -0.328 [0.010] 0.812 [0.000] Y1 7 0 .0044 [0.028] 0 .066 [0.000] 0 .933 [0.000] -4.836 [0.000] 0.809 [0.061] 1.359 [0.035] -1.486 [0.000] 0.368 [0.002] -2.772 [0.000] 1.903 [0.098] -2.000 [0.066] HQIC Log.Lik Nyblom Nyblom* 1% 2.55 -3746 1.61 [2.12] 2.11 -3094 1.74 [2.82] 2.53 -3703 3.06 [3.27] 2.60 -3807 2.28 [3.27] 1.87 -2732 4.40 [3.27] 2.68 -3932 2.90 [3.27] 2.51 -3677 2.12 [3.27] 2.56 -3755 4.01 [3.05] 2.64 -3877 1.57 [2.59] 2.67 -3913 2.15 [3.05] 2.69 -3951 2.19 [3.05] 2.61 -3821 2.41 [3.27] 2.60 -3816 1.70 [2.59] 2.60 -3821 2.48 [2.59] 2.64 -3872 1.70 [3.27] 2.60 -3817 2.20 [2.59] 2.58 -3783 1.74 [3.27] Log.Lik (λ = -0.5) LR Stat p-value -3746 -0.69 0.41 -3098 -7.39 0.01 
-3709 -12.20 0.00 -3955 -297.21 0.00 -2769 -74.34 0.00 -3932 -0.14 0.70 -3680 -5.90 0.02 -3757 -3.61 0.06 -3879 -3.65 0.06 -4036 -246.93 0.00 -4117 -332.13 0.00 -3824 -6.43 0.01 -3818 -3.66 0.06 -3825 -8.28 0.00 -3877 -8.81 0.00 -3819 -3.84 0.05 -3792 -18.27 0.00

Notes to Table 1: The table reports the coefficients, p-values (based on robust standard errors) and fit statistics for the factors in the Independent Factor ACD-GHYP model in-sample estimation. For the variance, GARCH(1,1) dynamics were chosen, while for the skew and shape parameters the Hannan-Quinn criterion was used to choose from 16 skew-shape model combinations nested in the Quadratic[1,1,1] model, including the non-time-varying combinations. Of the 17 factors, 4 had no time-varying skew parameter and 3 had no time-varying shape parameter, with one in the intersection set having neither. GARCH persistence appeared to be quite high across all factors, averaging 0.99, indicating perhaps that there are structural breaks in the dataset (which is not unusual considering that the period included the 1998 crash). The λ parameter ranged from 4.5 to -5.5 with a mean around -3. Of the 165 total parameters characterizing the univariate ACD equation across the 17 factors, only 9 fell outside the 1% critical value of the individual Nyblom-Hansen statistic (of which 7 were in the variance dynamics), and of the 17 models, 2 had a joint statistic which also fell outside the 1% critical value. The table also includes the likelihood ratio test of the full GH fitted model against the restricted NIG model with λ = −0.5 for the factors. The p-value indicates that the restricted model can be rejected for 11 out of the 17 factors at the 5% significance level.

Table 2: MSCI iShares 17 (19/03/1996 - 30/11/2009) Independent Factor ACD Model Portfolio: Portfolio VaR Exceedances Tests.
In-Sample 19/03/1996 - 05/12/2007 Q(α = 1%) Exceedances Expected Unconditional Coverage Test (Kupiec) Statistic Critical Value Conditional Coverage Test (Christoffersen) Statistic IFACD-GHYP 40 3.397 [0.065] 3.716 [0.156] Q(α = 5%) Exceedances IFACD-GHYP 134 Critical Value Conditional Coverage Test (Christoffersen) Statistic Critical Value 7.965 [0.005] 0.404 [0.525] 12.219 [0.002] 1.16 [0.56] 5.991 DCC-Student 13 1.538 [0.215] 1.538 [0.215] 8.973 [0.003] 1.799 [0.407] 1.799 [0.407] 9.669 [0.008] GO-GARCH-GHYP 45 DCC-Student 44 13.755 [0.000] 12.518 [0.000] 15.255 [0.000] 15.99 [0.000] 5.991 GO-GARCH-GHYP 136 DCC-Student 172 IFACD-GHYP 43 25 0.968 [0.325] 4.076 [0.043] 3.841 11.867 [0.003] GO-GARCH-GHYP 8 3.841 147.5 1.34 [0.247] IFACD-GHYP 8 5 3.841 5.991 Unconditional Coverage Test (Kupiec) Statistic DCC-Student 33 29.5 Critical Value Expected GO-GARCH-GHYP 46 Out-of-Sample 06/12/2007 - 30/11/2009 11.331 [0.001] 3.841 10.901 [0.004] 5.674 [0.059] 12.428 [0.002] 5.991

Notes to Table 2: The table reports the in-sample and out-of-sample Value at Risk exceedances and related tests, at the 1% and 5% quantile levels (α), given the weighted portfolio density for the IFACD-GHYP, GO-GARCH-GHYP and DCC-GARCH-Student models. The null hypothesis is that the model generates the correct number of exceedances and that those exceedances are independent, based on a likelihood ratio test devised by Christoffersen (1998). In-sample, the GO-GARCH model fails the unconditional exceedance test at the 1% quantile, while both the GO-GARCH and IFACD models fail the stronger test of independence at the 5% quantile. The DCC-GARCH-Student model, on the other hand, passes the test at the 1% but not at the 5% quantile. Out-of-sample, both the IFACD and GO-GARCH models pass the test at the 1% quantile while the DCC model fails. All models fail out-of-sample at the 5% quantile.

Table 3: MSCI iShares 17 (19/03/1996 - 30/11/2009) Independent Factor ACD Model Portfolio: Portfolio VaR SPA Test.
SPA Test In-Sample 19/03/1996 - 05/12/2007 IFACD-GHYP GO-GARCH-GHYP (benchmark) 0.0373 0.0378 Qloss (α = 1%) Loss p-values: Lower Consistent Upper Power Out-of-Sample 06/12/2007 - 30/11/2009 IFACD-GHYP GO-GARCH-GHYP (benchmark) 0.0728 0.0723 0.5133 0.5133 0.9795 YES IFACD-GHYP (benchmark) 0.1240 Qloss (α = 5%) Loss p-values: Lower Consistent Upper Power 0.1831 0.1831 0.1831 NO GO-GARCH-GHYP IFACD-GHYP (benchmark) 0.2543 0.1242 GO-GARCH-GHYP 0.2547 0.5088 0.9441 0.9441 YES 0.5127 0.7643 0.7643 NO

Notes to table 3: The table reports the in-sample and out-of-sample test of Superior Predictive Ability of Hansen (2005), based on the statistical loss function (reported as %) used in quantile estimation and defined in Gonzalez-Rivera, Lee, and Mishra (2004). The power of the test is assessed by reversing the position of the benchmark with the model and observing whether the results hold.

Figure 2: Covariance News Impact Surface. [Panels: (a) i, variance of [EWA]; (b) j, variance of [EWC]; (c) i,j, covariance of [EWA, EWC]. Horizontal axes: z12,t−1 and z17,t−1.]

Figure 3: Third Co-Moment News Impact Surface. [Panels: (a) i, third moment of [EWO]; (b) j, third moment of [EWN]; (c) i,i,j, third co-moment of [EWL,EWL,EWQ]; (d) i,i,j, third co-moment of [EWN,EWN,EWO]. Horizontal axes labelled z8,t−1, z10,t−1, z2,t−1 and z3,t−1.]

Figure 4: Out-of-Sample Models’ Conditional Portfolio Moments. [Panels: iShares (Equally Weighted) Index, Volatility, Skewness, Ex.Kurtosis, over 2008 to 2009; series shown for IFACD, GO-GARCH and DCC.]
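The unconditional coverage statistics in Table 2 can be checked with a few lines of code. The sketch below is a minimal illustration of the Kupiec (1995) likelihood ratio, not the code used for the paper; the function name `kupiec_lr` is hypothetical, and the sample sizes (2950 in-sample, 500 out-of-sample) are an assumption inferred from the expected exceedances of 29.5 and 5 at the 1% level.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def kupiec_lr(exceedances, n_obs, alpha):
    """Kupiec unconditional coverage LR statistic and chi-square(1) p-value.

    exceedances : number of VaR violations observed
    n_obs       : number of forecasts (assumed, not reported in the table)
    alpha       : nominal VaR coverage level, e.g. 0.01
    """
    x, n = exceedances, n_obs
    pi_hat = x / n  # empirical exceedance rate
    # Log-likelihood under the null (rate = alpha) and at the empirical rate.
    ll_null = _xlogy(n - x, 1.0 - alpha) + _xlogy(x, alpha)
    ll_alt = _xlogy(n - x, 1.0 - pi_hat) + _xlogy(x, pi_hat)
    lr = max(-2.0 * (ll_null - ll_alt), 0.0)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


if __name__ == "__main__":
    # IFACD-GHYP, in-sample, 1% quantile: 40 exceedances.
    print(kupiec_lr(40, 2950, 0.01))
    # DCC-Student, out-of-sample, 1% quantile: 13 exceedances.
    print(kupiec_lr(13, 500, 0.01))
```

Under these assumed sample sizes the two calls give statistics of about 3.397 and 8.973, in line with the corresponding Kupiec entries in Table 2. The conditional coverage test of Christoffersen (1998) additionally needs the sequence of exceedance indicators, which the table does not provide.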
A Standardized Generalized Hyperbolic Density

In order to model zero-mean, unit-variance processes, the distribution, which must possess the scaling property, needs to be properly standardized. In distributions where the expected moments are functions of all the parameters, it is not immediately obvious how to perform such a transformation. In the case of the GHYP distribution, location and scale invariant parametrizations exist, and the variance can be expressed in terms of one of those parametrizations, namely the (ζ, ρ). The task of standardizing and estimating the density can therefore be broken down into estimating those 2 parameters, which represent a combination of shape and skewness, followed by a series of transformation steps to demean, scale and then translate the parameters into the (α, β, δ, µ) parametrization, for which standard formulae exist for the likelihood function. The (ξ, χ) parametrization, which is a simple transformation of the (ζ, ρ), could also be used in the first step and then transformed into the latter before proceeding further. The only difference is the kind of 'immediate' inference one can make from the different parametrizations, each providing a different direct insight into the kind of dynamics produced and their place in the overall GHYP family, particularly with regard to the limit cases. When estimating the (ζ, ρ) parameters, it is important to place constraints on their bounds in order to achieve good convergence. As shown in 20, values for ρ are bounded in (−1, 1), while for ζ a reasonable range would be (0.1, 20). Having estimated the parameters, the next steps involve a transformation into the (α, β, δ, µ) parametrization while at the same time including the necessary recursive substitution of parameters in order to standardize the resulting distribution.

Proof 1 The Standardized Generalized Hyperbolic Distribution. Let $\varepsilon_t$ be a r.v.
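with mean 0 and variance $\sigma_t^2$ distributed as $GHYP(\zeta_t, \rho_t)$, and let $z_t$ be a scaled version of the r.v. $\varepsilon_t$ with variance 1 and also distributed as $GHYP(\zeta_t, \rho_t)$ (footnote 23). The density $f_t(\cdot)$ of $z_t$ can be expressed as

\[
f_t\!\left(\frac{\varepsilon_t}{\sigma_t}; \zeta_t, \rho_t\right) = \frac{1}{\sigma_t} f_t(z_t; \zeta_t, \rho_t) = \frac{1}{\sigma_t} f_t(z_t; \tilde\alpha_t, \tilde\beta_t, \tilde\delta_t, \tilde\mu_t), \tag{44}
\]

where we make use of the (α, β, δ, µ) parametrization since we can only naturally express the density in that parametrization. The steps to transforming from the (ζ, ρ) to the (α, β, δ, µ) parametrization, while at the same time standardizing for zero mean and unit variance, are given henceforth. Let

\[
\zeta = \delta\sqrt{\alpha^2 - \beta^2}, \tag{45}
\]
\[
\rho = \frac{\beta}{\alpha}, \tag{46}
\]

which after some substitution may be also written in terms of α and β as,

\[
\alpha = \frac{\zeta}{\delta\sqrt{1 - \rho^2}}, \tag{47}
\]
\[
\beta = \alpha\rho. \tag{48}
\]

Footnote 23: The parameters $\zeta_t$ and $\rho_t$ do not change as a result of being location and scale invariant.

For standardization we require that,

\[
E(X) = \mu + \frac{\beta\delta}{\sqrt{\alpha^2 - \beta^2}} \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} = \mu + \frac{\beta\delta^2}{\zeta} \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} = 0
\quad \therefore \quad
\mu = -\frac{\beta\delta^2}{\zeta} \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \tag{49}
\]

\[
Var(X) = \delta^2 \left( \frac{K_{\lambda+1}(\zeta)}{\zeta K_{\lambda}(\zeta)} + \frac{\beta^2}{\alpha^2 - \beta^2} \left( \frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} - \left( \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \right)^2 \right) \right) = 1
\]
\[
\therefore \quad \delta = \left( \frac{K_{\lambda+1}(\zeta)}{\zeta K_{\lambda}(\zeta)} + \frac{\beta^2}{\alpha^2 - \beta^2} \left( \frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} - \left( \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \right)^2 \right) \right)^{-0.5} \tag{50}
\]

Since we can express $\beta^2 / (\alpha^2 - \beta^2)$ as,

\[
\frac{\beta^2}{\alpha^2 - \beta^2} = \frac{\alpha^2\rho^2}{\alpha^2 - \alpha^2\rho^2} = \frac{\alpha^2\rho^2}{\alpha^2(1 - \rho^2)} = \frac{\rho^2}{1 - \rho^2}, \tag{51}
\]

then we can re-write the formula for δ in terms of the estimated parameters $\hat\zeta$ and $\hat\rho$ as,

\[
\delta = \left( \frac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_{\lambda}(\hat\zeta)} + \frac{\hat\rho^2}{1 - \hat\rho^2} \left( \frac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} - \left( \frac{K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} \right)^2 \right) \right)^{-0.5} \tag{52}
\]

Transforming into the $(\tilde\alpha, \tilde\beta, \tilde\delta, \tilde\mu)$ parametrization proceeds by first substituting (52) into (47) and simplifying,

\[
\begin{aligned}
\tilde\alpha &= \frac{\hat\zeta \left( \dfrac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_{\lambda}(\hat\zeta)} + \dfrac{\hat\rho^2}{1 - \hat\rho^2} \left( \dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} - \dfrac{(K_{\lambda+1}(\hat\zeta))^2}{(K_{\lambda}(\hat\zeta))^2} \right) \right)^{0.5}}{\sqrt{1 - \hat\rho^2}}, \\
&= \frac{\left( \dfrac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} + \dfrac{\hat\zeta^2\hat\rho^2}{1 - \hat\rho^2} \left( \dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} - \dfrac{(K_{\lambda+1}(\hat\zeta))^2}{(K_{\lambda}(\hat\zeta))^2} \right) \right)^{0.5}}{\sqrt{1 - \hat\rho^2}}, \\
&= \left( \frac{\dfrac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)}}{1 - \hat\rho^2} + \frac{\hat\zeta^2\hat\rho^2 \left( \dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda+1}(\hat\zeta)} \dfrac{K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} - \dfrac{(K_{\lambda+1}(\hat\zeta))^2}{(K_{\lambda}(\hat\zeta))^2} \right)}{(1 - \hat\rho^2)^2} \right)^{0.5}, \\
&= \left( \frac{\dfrac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)}}{1 - \hat\rho^2} \left( 1 + \frac{\hat\zeta\hat\rho^2 \left( \dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda+1}(\hat\zeta)} - \dfrac{K_{\lambda+1}(\hat\zeta)}{K_{\lambda}(\hat\zeta)} \right)}{1 - \hat\rho^2} \right) \right)^{0.5}.
\end{aligned} \tag{53}
\]

Finally, the rest of the parameters are derived recursively from $\tilde\alpha$ and the previous results,

\[
\tilde\beta = \tilde\alpha\hat\rho, \tag{54}
\]
\[
\tilde\delta = \frac{\hat\zeta}{\tilde\alpha\sqrt{1 - \hat\rho^2}}, \tag{55}
\]
\[
\tilde\mu = \frac{-\tilde\beta\tilde\delta^2 K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_{\lambda}(\hat\zeta)}. \tag{56}
\]

For the use of the (ξ, χ) parametrization in estimation, the additional preliminary steps of converting to the (ζ, ρ) are,

\[
\zeta = \frac{1}{\hat\xi^2} - 1, \tag{57}
\]
\[
\rho = \frac{\hat\chi}{\hat\xi}. \tag{58}
\]

B The Generalized Hyperbolic Characteristic Function

The moment generating function (MGF) of the Generalized Hyperbolic (GH) Distribution is,

\[
\begin{aligned}
M_{GH(\lambda,\alpha,\beta,\delta,\mu)}(u) &= e^{\mu u} M_{GIG\left(\lambda,\delta,\sqrt{\alpha^2 - \beta^2}\right)}\!\left( \frac{u^2}{2} + \beta u \right) \\
&= e^{\mu u} \left( \frac{\alpha^2 - \beta^2}{\alpha^2 - (\beta + u)^2} \right)^{\lambda/2} \frac{K_{\lambda}\!\left( \delta\sqrt{\alpha^2 - (\beta + u)^2} \right)}{K_{\lambda}\!\left( \delta\sqrt{\alpha^2 - \beta^2} \right)}
\end{aligned} \tag{59}
\]

where $M_{GIG}$ represents the moment generating function of the Generalized Inverse Gaussian, which forms the mixing distribution in this variance-mean mixture subclass. Powers of the MGF, $M_{GH}(u)^p$, only have the representation in (59) for p = 1, which means that GH distributions are not closed under convolution with the exception of the Normal Inverse Gaussian (NIG), and only in the case when the shape and skew parameters are the same.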
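As a numerical cross-check linking the two appendices, the sketch below maps (ζ, ρ, λ) to the standardized (α, β, δ, µ) using (52), (47), (54) and (56), and then differentiates the logarithm of the MGF in (59) numerically at u = 0 to confirm that the first two cumulants are 0 and 1. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the Bessel function is evaluated with a plain trapezoid rule on the integral representation K_ν(x) = ∫ exp(−x cosh t) cosh(νt) dt over t ≥ 0 (valid for real x > 0), standing in for a library routine.

```python
import math


def bessel_k(nu, x, step=0.01, t_max=12.0):
    """K_nu(x) for real x > 0 via a trapezoid rule on its integral form.

    A simple stand-in for a library Bessel routine (illustrative only).
    """
    total = 0.5 * math.exp(-x)  # integrand at t = 0, with half weight
    for k in range(1, int(t_max / step) + 1):
        t = k * step
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return step * total


def standardize_gh(zeta, rho, lam):
    """Map (zeta, rho, lambda) to (alpha, beta, delta, mu): mean 0, variance 1."""
    k0, k1, k2 = (bessel_k(lam + j, zeta) for j in range(3))
    ratio = rho ** 2 / (1.0 - rho ** 2)                      # equation (51)
    delta = (k1 / (zeta * k0)
             + ratio * (k2 / k0 - (k1 / k0) ** 2)) ** -0.5   # equation (52)
    alpha = zeta / (delta * math.sqrt(1.0 - rho ** 2))       # equation (47)
    beta = alpha * rho                                       # equation (54)
    mu = -beta * delta ** 2 * k1 / (zeta * k0)               # equation (56)
    return alpha, beta, delta, mu


def log_mgf_gh(u, lam, alpha, beta, delta, mu):
    """Logarithm of the GH moment generating function (59) for real u."""
    g0 = alpha ** 2 - beta ** 2
    gu = alpha ** 2 - (beta + u) ** 2  # must stay positive
    return (mu * u
            + 0.5 * lam * (math.log(g0) - math.log(gu))
            + math.log(bessel_k(lam, delta * math.sqrt(gu)))
            - math.log(bessel_k(lam, delta * math.sqrt(g0))))


def mean_and_variance(lam, alpha, beta, delta, mu, h=1e-3):
    """First two cumulants from central differences of the log-MGF at 0."""
    up = log_mgf_gh(h, lam, alpha, beta, delta, mu)
    down = log_mgf_gh(-h, lam, alpha, beta, delta, mu)
    # log M(0) = 0, so the second difference reduces to (up + down) / h^2.
    return (up - down) / (2.0 * h), (up + down) / h ** 2


if __name__ == "__main__":
    for lam in (-0.5, 1.0):
        params = standardize_gh(zeta=2.0, rho=-0.3, lam=lam)
        print(lam, params, mean_and_variance(lam, *params))
```

The check is independent of the moment formulas (49) and (50), since it only uses the MGF. The finite-difference step `h` and the trapezoid settings are illustrative choices and would need care for ζ close to the bounds mentioned in Appendix A.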
The MGF of the NIG is,

\[
M_{NIG(\alpha,\beta,\delta,\mu)}(u) = e^{\mu u} \frac{e^{\delta\sqrt{\alpha^2 - \beta^2}}}{e^{\delta\sqrt{\alpha^2 - (\beta + u)^2}}}. \tag{60}
\]

Powers of p are equivalent in this case to multiplication by p of δ and µ, so that,

\[
NIG(\alpha, \beta, \delta_1, \mu_1) * \dots * NIG(\alpha, \beta, \delta_n, \mu_n) = NIG(\alpha, \beta, \delta_1 + \dots + \delta_n, \mu_1 + \dots + \mu_n). \tag{61}
\]

In all other cases, when the distribution is not closed under convolution, numerical methods are required, such as the inversion of the characteristic function by Fast Fourier Transform (FFT). Because the MGF is a holomorphic function for complex z, with |z| < α − β, we can obtain the characteristic function of the GH distribution using the following representation,

\[
\phi_{GH}(u) = M_{GH}(iu), \tag{62}
\]

so that the characteristic function may be written as,

\[
\phi_{GH(\lambda,\alpha,\beta,\delta,\mu)}(u) = e^{\mu i u} \left( \frac{\alpha^2 - \beta^2}{\alpha^2 - (\beta + iu)^2} \right)^{\lambda/2} \frac{K_{\lambda}\!\left( \delta\sqrt{\alpha^2 - (\beta + iu)^2} \right)}{K_{\lambda}\!\left( \delta\sqrt{\alpha^2 - \beta^2} \right)}, \tag{63}
\]

and for the NIG this is simplified to,

\[
\phi_{NIG(\alpha,\beta,\delta,\mu)}(u) = e^{\mu i u} \frac{e^{\delta\sqrt{\alpha^2 - \beta^2}}}{e^{\delta\sqrt{\alpha^2 - (\beta + iu)^2}}}. \tag{64}
\]

In order to find the weighted summation of the portfolio density in the case of the IFACD model, the characteristic function required for the inversion of the NIG density was already used in Chen, Hardle, and Spokoiny (2007) and given below,

\[
\phi_{port}(u) = \exp\left\{ iu \sum_{j=1}^{d} \bar\mu_j + \sum_{j=1}^{d} \bar\delta_j \left( \sqrt{\bar\alpha_j^2 - \bar\beta_j^2} - \sqrt{\bar\alpha_j^2 - (\bar\beta_j + iu)^2} \right) \right\} \tag{65}
\]

where $\bar\alpha_j$, $\bar\beta_j$, $\bar\delta_j$ and $\bar\mu_j$ represent the parameters scaled as described in the main text of the thesis. In the case of the GH characteristic function, this is a little more complicated as it involves the evaluation of the modified Bessel function of the third kind with complex arguments (footnote 24). Taking logs and summing,

\[
\phi_{port}(u) = \exp\left\{ iu \sum_{j=1}^{d} \bar\mu_j + \sum_{j=1}^{d} \left( \frac{\lambda_j}{2} \log\left( \bar\alpha_j^2 - \bar\beta_j^2 \right) - \frac{\lambda_j}{2} \log\left( \bar\alpha_j^2 - (\bar\beta_j + iu)^2 \right) + \log K_{\lambda_j}\!\left( \bar\delta_j \sqrt{\bar\alpha_j^2 - (\bar\beta_j + iu)^2} \right) - \log K_{\lambda_j}\!\left( \bar\delta_j \sqrt{\bar\alpha_j^2 - \bar\beta_j^2} \right) \right) \right\} \tag{66}
\]

which is more than 30 times slower to evaluate than the equivalent NIG function because of the Bessel function evaluations.

Footnote 24: Routines for this exist for example on netlib, see http://www.netlib.org/amos/zbesk.f

References

Aas, K., and I. Haff (2006): “The generalized hyperbolic skew Student’s t-distribution,” Journal of Financial Econometrics, 4(2), 275–309.
Alexander, C. (2001): “Orthogonal garch,” Mastering risk, 2, 21–38.
Barndorff-Nielsen, O. (1977): “Exponentially decreasing distributions for the logarithm of particle size,” Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences, 353(1674), 401–419.
Barndorff-Nielsen, O., and P. Bläesild (1981): Hyperbolic distributions and ramifications: Contributions to theory and application. Matematisk Institut, Aarhus Universitet.
Barndorff-Nielsen, O., J. Kent, and M. Sörensen (1982): “Normal variance-mean mixtures and z distributions,” International Statistical Review/Revue Internationale de Statistique, pp. 145–159.
Bläesild, P. (1981): “The two-dimensional hyperbolic distribution and related distributions, with an application to Johannsen’s bean data,” Biometrika, 68(1), 251.
Bollerslev, T. (1986): “Generalized Autoregressive Conditional Heteroskedasticity,” Journal of Econometrics, 31, 307–327.
Bollerslev, T. (1990): “Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model,” The Review of Economics and Statistics, 72(3), 498–505.
Bond, S., and K. Patel (2003): “The conditional distribution of real estate returns: Are higher moments time varying?,” The Journal of Real Estate Finance and Economics, 26(2), 319–339.
Box, G., and D. Cox (1964): “An analysis of transformations,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 211–252.
Broda, S., and M. Paolella (2009): “CHICAGO: A Fast and Accurate Method for Portfolio Risk Calculation,” Journal of Financial Econometrics, 7(4), 412.
Brooks, C., S. Burke, S. Heravi, and G. Persand (2005): “Autoregressive conditional kurtosis,” Journal of Financial Econometrics, 3(3), 399–421.
Chen, Y., W. Hardle, and S. Jeong (2008): “Nonparametric risk management with generalized hyperbolic distributions,” Journal of the American Statistical Association, 103(483), 910–923.
Chen, Y., W. Hardle, and V. Spokoiny (2007): “Ghica-risk analysis with gh distributions and independent components,” WIAS Preprint, 1064.
Chen, Y., W. Härdle, and V. Spokoiny (2007): “Portfolio value at risk based on independent component analysis,” Journal of Computational and Applied Mathematics, 205(1), 594–607.
Christoffersen, P. (1998): “Evaluating interval forecasts,” International Economic Review, 39(4), 841–862.
de Athayde, G., and R. Flôres Jr (2000): “Introducing Higher Moments in the CAPM: Some Basic Ideas,” mimeo.
de Athayde, G., and R. Flôres Jr (2002): “On Certain Geometric Aspects of Portfolio Optimisation with Higher Moments,” mimeo.
Engle, R. (1982): “Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation,” Econometrica, pp. 987–1007.
Engle, R. (2002): “Dynamic conditional correlation,” Journal of Business and Economic Statistics, 20(3), 339–350.
Engle, R., and S. Manganelli (2004): “CAViaR: Conditional autoregressive value at risk by regression quantiles,” Journal of Business & Economic Statistics, 22(4), 367–382.
Engle, R., and V. Ng (1993): “Measuring and testing the impact of news on volatility,” Journal of Finance, pp. 1749–1778.
Engle, R., V. Ng, and M. Rothschild (1990): “Asset Pricing with a Factor ARCH Covariance Structure: Empirical Estimates for Treasury Bills,” Journal of Econometrics, 45, 213.
Ferreira, J., and M. Steel (2003): “Bayesian Multivariate Regression Analysis with a New Class of Skewed Distributions,” mimeo.
Gonzalez-Rivera, G., T. Lee, and S. Mishra (2004): “Forecasting volatility: A reality check based on option pricing, utility function, value-at-risk, and predictive likelihood,” International Journal of Forecasting, 20(4), 629–645.
Hansen, B. (1990): “Lagrange multiplier tests for parameter instability in non-linear models,” mimeo.
Hansen, B. (1994): “Autoregressive conditional density estimation,” International Economic Review, 35, 705–730.
Hansen, P. (2005): “A test for superior predictive ability,” Journal of Business and Economic Statistics, 23(4), 365–380.
Hansen, P., and A. Lunde (2005): “A forecast comparison of volatility models: Does anything beat a GARCH (1, 1)?,” Journal of Applied Econometrics, (20), 873–889.
Harvey, C., and A. Siddique (2009): “Autoregressive conditional skewness,” Journal of Financial and Quantitative Analysis, 34(04), 465–487.
Hu, X., R. Shonkwiler, and M. Spruill (1994): “Random restarts in global optimization,” Georgia Institute of technology, Atlanta.
Hyvärinen, A., and E. Oja (1999): “Independent component analysis: algorithms and applications,” Neural networks, 13(4-5), 411–430.
Jensen, M., and A. Lunde (2001): “The NIG-S model: a fat-tailed, stochastic, and autoregressive conditional heteroskedastic volatility model,” The Econometrics Journal, 4(2), 319–342.
Jondeau, E., and M. Rockinger (2003): “Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements,” Journal of Economic Dynamics and Control, 27(10), 1699–1737.
Jondeau, E., and M. Rockinger (2006): “Optimal Portfolio Allocation under Higher Moments,” European Financial Management, 12(1), 29–55.
Jondeau, E., and M. Rockinger (2009): “The Impact of Shocks on Higher Moments,” Journal of Financial Econometrics, 7(2), 77.
Kroner, K., and V. Ng (1998): “Modeling asymmetric comovements of asset returns,” Review of Financial Studies, 11(4), 817.
Kupiec, P. (1995): “Techniques for verifying the accuracy of risk measurement models,” The journal of Derivatives, 3(2), 73–84.
Lai, T. (1991): “Portfolio selection with skewness: a multiple-objective approach,” Review of Quantitative Finance and Accounting, 1(3), 293–305.
Mandelbrot, B. (1963): “The variation of certain speculative prices,” Journal of business, 36(4), 394–419.
Paolella, M. (2007): Intermediate Probability: A Computational Approach. Wiley-Interscience.
Patton, A. (2002): “Applications of Copula Theory in Financial Econometrics,” Ph.D. thesis, University of California, San Diego.
Prakash, A., C. Chang, and T. Pactwa (2003): “Selecting a portfolio with skewness: recent evidence from US, European, and Latin American equity markets,” Journal of Banking and Finance, 27(7), 1375–1390.
Prause, K. (1999): “The generalized hyperbolic model: Estimation, financial derivatives, and risk measures,” Ph.D. thesis, University of Freiburg.
Premaratne, G., and A. Bera (2000): “Modeling asymmetry and excess kurtosis in stock return data,” Illinois Research.
Rockinger, M., and E. Jondeau (2002): “Entropy densities with an application to autoregressive conditional skewness and kurtosis,” Journal of Econometrics, 106(1), 119–142.
Ross, S. (1976): “The arbitrage theory of capital asset pricing,” Journal of economic theory, 13(3), 341–360.
Schmidt, R., T. Hrycej, and E. Stützle (2006): “Multivariate distribution models with generalized hyperbolic margins,” Computational statistics and data analysis, 50(8), 2065–2096.
van der Weide, R. (2002): “GO-GARCH: a multivariate generalized orthogonal GARCH model,” Journal of Applied Econometrics, 17(5), 549–564.
van der Weide, R. (2004): “Wake me up before you GO-GARCH,” Computing in Economics and Finance 2004.
van der Weide, R., and P. Boswijk (2008): “Method of moments estimation of GO-GARCH models,” mimeo.
White, H. (2000): “A reality check for data snooping,” Econometrica, 68(5), 1097–1126.
Wilhelmsson, A. (2009): “Value at Risk with time varying variance, skewness and kurtosis–the NIG-ACD model,” Econometrics Journal, 12(1), 82–104.
Ye, Y. (1997): Interior Point Algorithms: Theory and Analysis. John Wiley and Sons, New York.
Zhang, K., and L. Chan (2009): “Efficient factor GARCH models and factor-DCC models,” Quantitative Finance, 9(1), 71–91.