Independent Factor Autoregressive Conditional Density Model ∗

Alexios Ghalanos †
Cass Business School, UK
Eduardo Rossi ‡
University of Pavia, Italy
Giovanni Urga §
Cass Business School, UK and Bergamo University, Italy
This version: September 22, 2010
PRELIMINARY VERSION. PLEASE DO NOT QUOTE.
Abstract
An Independent Factor Autoregressive Conditional Density (ACD) model using Independent
Component Analysis is proposed, extending recent approaches in multivariate modelling with
higher conditional moments and orthogonal factors. Using the Generalized Hyperbolic (GH)
as conditional distribution, conditional co-moments and weighted density for use in portfolio
and risk management applications are derived. The empirical application, using MSCI equity
index iShares data, shows that ACD modelling with a GH distribution yields tail-dependent risk measures, such as Value at Risk (VaR) estimates, which improve on those
obtained from higher moment time-invariant conditional densities.
1 Introduction
Since Mandelbrot (1963), researchers have discovered numerous statistical properties in real
market time series that contradict the theoretical results of their models. These so-called stylized facts, together with the paradigm shift away from the completely rational, representative
agent to a boundedly rational, heterogeneous agent, have motivated researchers to model financial markets with a new set of tools, distributions and models. Among these, the pioneering
work of Box and Cox (1964) in the area of autoregressive moving average models paved the way
for related work in the area of volatility modelling with the introduction of ARCH and then
GARCH models by Engle (1982) and Bollerslev (1986), respectively. In terms of the statistical framework, these models capture motion dynamics in the conditional time variation of the
distributional parameters of the mean and variance. This makes it possible to capture autocorrelation in
returns and squared returns. There is also a considerable body of evidence (see for example Wilhelmsson (2009), Harvey and Siddique (2009), and Rockinger and Jondeau (2002, 2003,
2009)) that financial markets not only exhibit skewness and thick tails, but that these need to be
modelled in a time-varying context, leading to superior estimates of tail-related measures. While
these time-varying models have received growing attention in the univariate conditional density
modelling literature, very little has been achieved in incorporating such dynamics in a multivariate context. This is because of the difficulty, as well as the absence of a flexible model-distribution
∗ Part of this work was completed while the first author was visiting the CEA@CASS, whose hospitality he gratefully acknowledges.
† 106 Bunhill Row, London EC1Y 8TZ, UK. E-mail: [email protected]
‡ Dipartimento Economia politica e metodi quantitativi. Via San Felice 5, 27100 Pavia, Italy. Tel.: +39 0382/986207 Fax: +39 0382/304226. E-mail: [email protected].
§ 106 Bunhill Row, London EC1Y 8TZ, UK. E-mail: [email protected]
representation, in incorporating such complex interactions in both the marginal and joint distributional parameters. The independence factor framework provides a particularly appealing
avenue for capturing time-varying higher moments in a multivariate setup, with estimation in
univariate time. The main contribution of this paper is to extend the Generalized Orthogonal
GARCH model of van der Weide (2002), using the Independent Component Analysis (ICA)
estimation method used in Chen, Härdle, and Jeong (2008), Zhang and Chan (2009), and Broda
and Paolella (2009), to include marginal dynamics for the whole density, as proposed by Hansen
(1994). In somewhat the same way that the Conditional Correlation model of Bollerslev (1990)
used a static dependence framework, the model we adopt uses a static independence framework. Unlike other dependence models, however, independence offers greater flexibility
in modelling the full marginal dynamics within a multivariate affine factor framework and enables the calculation of the density of the weighted constituents for use in portfolio applications.
Section 2 of our paper builds on the Independent Factor model by extending the motion dynamics to the skew and shape parameters of the full Generalized Hyperbolic 1-dimensional margins. Key features of the model such as the conditional higher co-moment tensors and weighted
portfolio representation are presented and discussed. To test the performance of our proposed
model, we carry out a medium sample study in Section 3, using log returns of closing price data
on 17 exchange traded MSCI index funds (iShares),1 from 19/03/1996 to 30/11/2009, obtained
from the Center for Research in Security Prices. Our modelling strategy is designed to tackle the
evidence that, marginally, returns are characterized by strong conditional skewness and kurtosis
which, like conditional variance, are not time invariant. The statistical analysis confirms that
the adoption of marginal densities with time-varying conditional higher moments outperforms the
standard GARCH models. The empirical application shows that the IFACD model with GH
distribution yields tail-dependent risk measures, such as VaR estimates, which show
an improvement with respect to those obtained from higher moment time-invariant conditional
densities in Broda and Paolella (2009).
2 The Independent Factor Framework
Factor ARCH models, originally introduced by Engle, Ng, and Rothschild (1990), and with
foundations in the Arbitrage Pricing Theory (APT) of Ross (1976),2 are based on the premise
that returns are generated by a set of unobserved underlying factors that are conditionally heteroscedastic. Because of the constant linear mapping of the factors to the underlying data,
the dependence framework is non-dynamic, which appears to be a necessary tradeoff for large
scale estimation in a multivariate setting. The dependence structure of the unobserved factors
then determines the type of factor model, with correlated factors making up the
Factor ARCH (F-ARCH) type models, while uncorrelated and independent factors comprise the
Orthogonal and Generalized Orthogonal (GO-GARCH) models, respectively.3 Because one can
always recover uncorrelated or independent sources by certain statistical transformations, the
correlated factor assumption of F-ARCH models does appear to be restrictive. GO-GARCH
models, on the other hand, make use of those transformations to place the factors in an independence framework with unique benefits such as separability and weighted density convolution,
giving rise to truly large scale, real-time, feasible estimation.
1 The MSCI tracked countries included were Australia, Canada, Sweden, Germany, Hong Kong, Italy, Japan, Belgium, Switzerland, Malaysia, Netherlands, Austria, Spain, France, Singapore, UK and Mexico.
2 Although the APT does not imply a finite number of factors.
3 It should be noted that most of these factor models may be seen as special cases of the BEKK model, itself subsumed by the Generalized Dynamic Covariance Model of Kroner and Ng (1998).
2.1 The Generalized Orthogonal GARCH Model
Consider a set of $N$ assets whose returns $r_t$ are observed for $T$ periods, that is $\{r_t\}_{t=1}^{T}$. The
conditional mean of the returns is $E[r_t \mid \mathcal{F}_{t-1}] = \mu_t$, where $\mathcal{F}_{t-1}$ is the sigma-field generated
by the past realizations of $r_t$, i.e. $\mathcal{F}_{t-1} = \sigma(r_{t-1}, r_{t-2}, \ldots)$. The GO-GARCH model of van der
Weide (2002) maps $r_t$ onto a set of linear combinations of unobserved independent factors $f_t$,
which may then be estimated separately, and follows directly from the research in independent
component analysis:
\[ r_t = \mu_t + \epsilon_t \tag{1} \]
\[ \epsilon_t = A f_t \tag{2} \]
where $A$ is invertible and constant over time and may be decomposed into a de-whitening and
orthogonal matrix, $A = \Sigma^{1/2} U$, and $f_t = (f_{1t}, \ldots, f_{Nt})$. The first step in obtaining independent factors is therefore to whiten the multivariate series via Principal Components Analysis, where
the estimation of $\Sigma$ may be carried out separately on the dataset, allowing also for some dimensionality reduction, followed by a transformation of the whitened series to a set of independent
signals represented by the rotation, or orthogonal, matrix $U$. The different approaches to the
calculation of the appropriate rotational transformation represent similar avenues in obtaining
independent components, already well established in the statistical literature on ICA, and are
all closely related. In the original paper of van der Weide (2002) this was estimated as a product
of $N(N-1)/2$ rotation matrices,4 via maximum likelihood,
\[ U = \prod_{i<j} R_{ij}(\theta_{ij}), \qquad -\pi \le \theta_{ij}, \tag{3} \]
where $R_{ij}(\theta_{ij})$ is the rotation by the Euler angle $\theta_{ij}$ in the plane spanned by the $(i,j)$ pair of axes in
$\mathbb{R}^N$. Note that when $U$ is restricted to be an identity matrix, the model reduces to the Orthogonal
Factor model of Alexander (2001), which leads to components that are uncorrelated but not necessarily
independent unless a multivariate normal distribution is assumed. In van der Weide (2002), the rotational matrix
$U$ is estimated jointly with the univariate GARCH dynamics, making this a high dimensional
problem. The use of the Euler angle approach ties the calculation of $U$, and hence the underlying
independent factors, to the underlying model specification for the individual factors, which
though very efficient as an estimator is prone to errors from model and conditional distribution
misspecification of the underlying factors. Subsequently, $U$ was estimated via a 3 step approach
using non-linear least squares in van der Weide (2004), a 3 step method-of-moments in van der
Weide and Boswijk (2008), and more recently by Broda and Paolella (2009) and Zhang and Chan
(2009) in a 3 step approach using non-parametric ICA. The alternative approaches sacrifice some
of the efficiency for computational feasibility and a model-free assumption in the calculation
of $U$. They represent non-parametric approximations to independence, arising either from
maximization of non-gaussianity or minimization of mutual information, with the former being the underlying
method for the FASTICA method of Hyvärinen and Oja (1999) which uses approximations to
negentropy to maximize non-gaussianity and hence find the independent factors.
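The Euler-angle construction of the rotation matrix in equation (3) can be sketched as follows. This is a minimal pure-Python illustration, not the estimation code of van der Weide (2002); the helper names (`givens`, `rotation_from_angles`, `det`) and the example angles are hypothetical.

```python
import math
from itertools import combinations

def givens(n, i, j, theta):
    """n x n rotation by angle theta in the plane spanned by axes i and j."""
    r = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    c, s = math.cos(theta), math.sin(theta)
    r[i][i], r[j][j] = c, c
    r[i][j], r[j][i] = -s, s
    return r

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rotation_from_angles(n, angles):
    """Product of the N(N-1)/2 plane rotations R_ij(theta_ij), i < j."""
    pairs = list(combinations(range(n), 2))
    assert len(angles) == len(pairs) == n * (n - 1) // 2
    u = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for (i, j), theta in zip(pairs, angles):
        u = matmul(u, givens(n, i, j, theta))
    return u

def det(m):
    """Determinant by Laplace expansion (adequate for small N only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

# Hypothetical angles for N = 3 (three rotation planes).
U = rotation_from_angles(3, [0.3, -1.1, 2.0])
UUt = matmul(U, [list(col) for col in zip(*U)])  # U U' should be the identity
```

Each factor of the product is orthogonal with determinant 1, so the resulting $U$ is a proper rotation for any choice of angles, which is what makes the angles a convenient unconstrained parametrization.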
Because of the assumption of independence, the likelihood function is greatly simplified and
represents the sum of the individual log-likelihoods ($f_{ij}$) plus a term for the mixing matrix $A$,
\[ L(\varepsilon_x \mid \theta, A) = T \log\left|A^{-1}\right| + \sum_{i=1}^{T} \sum_{j=1}^{N} \log\left(f_{ij}(\varepsilon_{y,ij} \mid \theta_j)\right) \tag{4} \]
Since the standard spherical multivariate normal distribution is defined as the joint distribution of n independent univariate standard normal variables, the representation of the multivariate normal density with diagonal Σ is equivalent to equation (4).
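The structure of equation (4) can be checked numerically in the Gaussian case. The sketch below assumes a hypothetical 2 x 2 mixing matrix and made-up residuals, and compares the factor-wise log-likelihood (Jacobian term plus the sum of univariate standard normal log-densities) with the bivariate normal log-likelihood with covariance $AA'$. The function names are illustrative only.

```python
import math

def normal_logpdf(x):
    """Log-density of the univariate standard normal."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * x * x

def factor_loglik(eps, A):
    """Equation (4) with standard normal margins and a 2 x 2 mixing matrix A:
    T log|det A^{-1}| + sum over observations and factors of log f_j."""
    (a, b), (c, d) = A
    det_a = a * d - b * c
    total = -len(eps) * math.log(abs(det_a))   # T log|A^{-1}|
    for e1, e2 in eps:
        f1 = (d * e1 - b * e2) / det_a          # factors f = A^{-1} eps
        f2 = (-c * e1 + a * e2) / det_a
        total += normal_logpdf(f1) + normal_logpdf(f2)
    return total

def mvn_loglik(eps, A):
    """Bivariate normal log-likelihood with covariance Sigma = A A'."""
    (a, b), (c, d) = A
    s11, s12, s22 = a * a + b * b, a * c + b * d, c * c + d * d
    det_s = s11 * s22 - s12 * s12
    total = 0.0
    for e1, e2 in eps:
        quad = (s22 * e1 * e1 - 2.0 * s12 * e1 * e2 + s11 * e2 * e2) / det_s
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det_s) - 0.5 * quad
    return total

A = [[1.0, 0.5], [0.2, 0.8]]                    # hypothetical mixing matrix
eps = [(0.3, -0.1), (-1.2, 0.4), (0.05, 0.9)]   # hypothetical residuals
```

The two functions agree up to floating-point error, since $\det(AA') = (\det A)^2$ and $\epsilon' (AA')^{-1} \epsilon = f'f$.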
4 The determinant of $U$ is restricted to be 1 in this case.
The model of Broda and Paolella (2009) extends the GO-GARCH model in a distributional
direction by using a multivariate affine representation of the Normal Inverse Gaussian (maNIG)
proposed by Schmidt, Hrycej, and Stützle (2006) as the underlying distribution of the return
generating process. It follows Chen, Härdle, and Spokoiny (2007) who applied the FASTICA
method with the NIG as conditional distribution to a set of German stocks using a local volatility
process. Both papers focused on the distribution of a weighted sum of underlying components in
a risk and portfolio management context, for which numerical approximations to the weighted
portfolio density were used to evaluate the distribution quantile for the purposes of VaR calculation: the Fast Fourier Transform (FFT) in the case of Chen, Härdle, and Spokoiny
(2007), and a saddlepoint approximation (SPA) in the case of Broda and Paolella (2009). The latter paper focuses on a
portfolio application, without considering the co-moment dynamics which result from the model,
whilst the former does attempt to make some comparison to the Dynamic Conditional Correlation
(DCC) model of Engle (2002), which generates time-varying conditional covariances. Because
the GO-GARCH model only accommodates static (in)dependence dynamics, the comparison with
the DCC model should be considered a key benchmark to beat, and for this reason we include
it in our empirical application. In another direction, Zhang and Chan (2009) focus on the multivariate normal distribution, investigating some additional algorithms for $U$ and proposing the
intermediate use of DCC for the independent components to capture any residual correlation
not eliminated by the GO-GARCH approach.
One of the key advantages offered by the Generalized Orthogonal approach is that following the
estimation of the independent factors, the dynamics of the marginal density parameters of those
factors may be estimated separately and in parallel, while not restricted to any particular single
model or dynamics. In this context, we propose to extend the dynamics to the full conditional
density parameters, allowing us to model time-varying higher moments in a multivariate setting.
While any multivariate distribution admitting an affine representation may be used in this setup,
we have chosen the Generalized Hyperbolic for its flexibility and rich parametrization, capturing
some of the most important features of observed returns such as asymmetry and fat tails. To
motivate the choice of the GH distribution in our model, we present some of its features in the
next section, with a particular emphasis on the 1-dimensional representation which forms the
basis of the independent factor margins in our model.
2.2 The Generalized Hyperbolic Distribution
The Generalized Hyperbolic Distribution (GH), introduced by Barndorff-Nielsen (1977) in the
context of a sand project, is a variance-mean mixture of the normal and Generalized Inverse
Gaussian (GIG) distributions. It is an extremely flexible distribution, allowing for skewness and
fat tails. It nests a large number of other distributions which have proved popular in the empirical
modelling of financial asset returns, such as the Hyperbolic, Normal Inverse Gaussian, Variance
Gamma, (skew) Laplace, and as limiting cases, the Normal and (skew) Student distributions.
Tail flexibility is one particularly attractive feature of the GH model, which allows the upper and lower tails to be modelled
asymmetrically. The skew-Student distribution for example, analyzed
in Aas and Haff (2006), allows for the modelling of one heavy (with polynomial behavior) and
one semi-heavy (with exponential behavior) tail.5
Definition 2.1 provides the general family to which the GH distribution belongs, that of the
Normal Mean-Variance Mixture distributions discussed in Barndorff-Nielsen, Kent, and Sörensen
(1982), thus allowing the identification of key properties it inherits and highlighting the origin of
its uni-dimensional shape parameters which are given by the choice of the mixing distribution.
Definition 2.1 The n-dimensional random variable X is said to have a normal mean-variance
mixture distribution of the following form:
5 It is in fact the only distribution in the GH family to allow for one polynomial and one exponential tail.
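\[ X \stackrel{d}{=} \mu + W\gamma + \sqrt{W} A Z, \tag{5} \]
where $Z \sim N_q(0, I_q)$, $W \in \mathbb{R}_+$, $A \in \mathbb{R}^{n \times q}$, and $\mu, \gamma \in \mathbb{R}^n$. From the definition it follows that,
\[ X \mid W \sim N_n(\mu + W\gamma, W\Sigma), \qquad E(X) = \mu + E(W)\gamma, \qquad \mathrm{Cov}(X) = E(W)\Sigma + \mathrm{var}(W)\gamma\gamma', \tag{6} \]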
where $\Sigma = AA'$, and the mixing variable $W$ is positive and has finite variance. A very useful
property is that if the distribution of $W$ is infinitely divisible, then the distribution of $X$ is also
infinitely divisible. This implies that there exists a Lévy process with support over the entire
real line, which is distributed at time $t = 1$ according to the law of $X$. Since the theoretical properties of Lévy processes are well established, this translates into the possibility of formulating
financial models directly in terms of such processes.
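The moment identities in equation (6) can be illustrated by simulation in the univariate case. The sketch below swaps in a Gamma mixing variable (which gives a Variance Gamma law) instead of the GIG, purely because the Python standard library provides a Gamma sampler; all parameter values, the seed and the function name are hypothetical.

```python
import math
import random

def simulate_mixture(n_draws, mu, gamma, sigma, shape, scale, seed=12345):
    """Draw X = mu + W*gamma + sqrt(W)*sigma*Z with W ~ Gamma(shape, scale)
    and Z standard normal, independent of W."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        w = rng.gammavariate(shape, scale)
        z = rng.gauss(0.0, 1.0)
        draws.append(mu + w * gamma + math.sqrt(w) * sigma * z)
    return draws

mu, gamma, sigma = 0.1, -0.3, 1.0
shape, scale = 2.0, 0.5                      # E(W) = 1, var(W) = 0.5
x = simulate_mixture(200_000, mu, gamma, sigma, shape, scale)

sample_mean = sum(x) / len(x)
sample_var = sum((v - sample_mean) ** 2 for v in x) / (len(x) - 1)

# Theoretical moments from equation (6) with Sigma = sigma^2.
mean_w, var_w = shape * scale, shape * scale ** 2
theory_mean = mu + mean_w * gamma
theory_var = mean_w * sigma ** 2 + var_w * gamma ** 2
```

The sample mean and variance should be close to `theory_mean` and `theory_var`. The skewness of $X$ enters only through $\gamma$, which shifts the mean by $E(W)\gamma$ and inflates the variance by $\mathrm{var}(W)\gamma^2$.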
A very popular choice for the mixing variable is the GIG distribution, so that $W \sim GIG(\lambda, \chi, \psi)$,6
in which case the multivariate GH distribution is obtained. It depends on the three real parameters of the GIG distribution, the location ($\mu$) and skewness ($\gamma$) vectors in $\mathbb{R}^n$, and a positive
definite matrix $\Sigma \in \mathbb{R}^{n \times n}$. The kurtosis (tail behavior), described by the $\lambda$ and $\chi$ shape parameters, is driven by the univariate GIG mixing distribution and is therefore similar in all
dimensions. The $n$-dimensional GH distribution allows for the modelling of multivariate data
with some very desirable features, such as the ability to model skewness individually for each
dimension. Additionally, the distribution has the property of infinite divisibility (inherited from
the GIG mixing distribution), and is closed under marginalization, conditioning and affine
transformations, while the NIG and Variance Gamma (VG) sub-families are also
closed under convolution (for equal skew and shape parameters). To motivate the use of the
affine representation of the GH distribution used in our model, Definition 2.2 provides the standard $n$-dimensional representation which Schmidt, Hrycej, and Stützle (2006) reformulate into
a model for use in an independence factor framework.
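Definition 2.2 The $n$-dimensional Generalized Hyperbolic Distribution of the random vector
$X \in \mathbb{R}^n$
\[ gh_n(x; \alpha, \beta, \delta, \mu, \Sigma) = c_n \frac{K_{\lambda - n/2}\left(\alpha \sqrt{\delta^2 + (x - \mu)' \Sigma^{-1} (x - \mu)}\right)}{\left(\alpha^{-1} \sqrt{\delta^2 + (x - \mu)' \Sigma^{-1} (x - \mu)}\right)^{n/2 - \lambda}} e^{\beta'(x - \mu)}, \]
\[ c_n = \frac{\left(\sqrt{\alpha^2 - \beta' \Sigma \beta} / \delta\right)^{\lambda}}{(2\pi)^{n/2} K_{\lambda}\left(\delta \sqrt{\alpha^2 - \beta' \Sigma \beta}\right)}, \tag{7} \]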
with parameter domain of variation $\lambda \in \mathbb{R}$, $\beta, \mu \in \mathbb{R}^n$, $\delta > 0$, $\alpha^2 > \beta' \Sigma \beta$, and $\Sigma \in \mathbb{R}^{n \times n}$
with determinant 1. The fact that the domain of $\alpha$ is 1-dimensional is due to the univariate
GIG mixing distribution, and means that kurtosis is the same for all dimensions. That is, there
is one joint representation of extreme events, which may not be an adequate reflection of the
multivariate data, especially when they come from sets that are not very highly correlated (at least in the
tail sense). Schmidt, Hrycej, and Stützle (2006) point out that the margins of $X$ are not
mutually independent for some choice of the scaling matrix $\Sigma$, which may be a key property of
the problem (as we see in some factor models), and that the scaling matrix $\Sigma$ is hard to interpret
6 The $\chi$ and $\psi$ parameters have also been represented as $\delta^2$ and $\alpha^2 - \beta^2$ respectively in the literature.
in the presence of skewness as it is in a complex relationship with the $\beta$ vector. As a result, they
propose an alternative, non-elliptical, multivariate affine Generalized Hyperbolic (maGH) model
which is composed of independent margins allowed to take on separate values for skewness and
shape (as well as different $\lambda$, therefore allowing sub-families of distributions to be mixed), presented
in the following definition:7
Definition 2.3 The $n$-dimensional Affine Generalized Hyperbolic Distribution of the random
vector $X \in \mathbb{R}^n$ has the following stochastic representation
\[ X \stackrel{d}{=} \mu + A' Y \in maGH_n(\mu_n, \Sigma, \lambda_n, \alpha_n, \beta_n), \tag{8} \]
where the $\stackrel{d}{=}$ notation is meant to convey equality in distribution and $A$ is a lower triangular
matrix such that $A'A = \Sigma$ is positive definite, and may be sought via singular value decomposition or other appropriate methodology. The 1-dimensional independent margins of $X$ are given
by $Y_i \in GH_1(0, 1, \lambda_i, \alpha_i, \beta_i)$, arrived at by first whitening $X$ using the estimated $\Sigma$ matrix, and
then making them independent by applying an appropriate rotational matrix. The ability to
model the margins separately with different parameters for each margin not only increases the
flexibility of the model but also its computational ease, as the multivariate estimation reduces
to a univariate one with the log-likelihood equal to the sum of the marginal log-likelihoods
plus a term for the mixing matrix, which is exactly the representation of the GO-GARCH model.
Therefore, for the estimation of the GO-GARCH model within the context of such multivariate affine distributions, one need only focus on the marginal density for the model estimation
and the characteristic function for the weighted portfolio summation. The following definition
provides the conditional density of the independent factor margins of our proposed model:
Definition 2.4 The 1-dimensional Generalized Hyperbolic Distribution of the random variable
$x \in \mathbb{R}$
\[ gh(x; \lambda, \alpha, \beta, \delta, \mu) = c(\lambda, \alpha, \beta, \delta, \mu) \left(\delta^2 + (x - \mu)^2\right)^{(\lambda - 1/2)/2} \times K_{\lambda - 1/2}\left(\alpha \sqrt{\delta^2 + (x - \mu)^2}\right) e^{\beta(x - \mu)}, \]
\[ c(\lambda, \alpha, \beta, \delta, \mu) = \frac{(\alpha^2 - \beta^2)^{\lambda/2}}{\sqrt{2\pi}\, \alpha^{\lambda - 1/2} \delta^{\lambda} K_{\lambda}\left(\delta \sqrt{\alpha^2 - \beta^2}\right)}, \tag{9} \]
with parameter domain of variation $0 \le |\beta| < \alpha$, $\mu, \lambda \in \mathbb{R}$ and $\delta > 0$, and $K_\lambda$ being the modified
Bessel function of the third kind. Special cases of the distribution are obtained by varying $\lambda$.
For example, the NIG distribution is obtained by setting $\lambda$ to $-\frac{1}{2}$, the Hyperbolic by setting $\lambda$
to $\frac{n+1}{2}$,8 and the skew-Student mentioned earlier by setting $\lambda$ to $-\frac{\nu}{2}$ (with $\nu$ representing the
degrees of freedom), and $\alpha \to |\beta|$. The parameters may be interpreted as location ($\mu$), scale ($\delta$),
skewness ($\beta$) and shape ($\alpha$), hence allowing the most important features in financial modelling
to be represented, namely trend, 'risk' (by some measures), asymmetry and likelihood
of extreme events.
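The density of Definition 2.4 can be evaluated with nothing more than the standard library, as in the sketch below. The Bessel function is computed here from its integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ with a trapezoid rule, which is an illustrative shortcut rather than a production routine (it assumes moderate $x$ and small $|\nu|$). The function names and parameter values are hypothetical.

```python
import math

def bessel_k(nu, x, upper=20.0, steps=400):
    """Modified Bessel function K_nu(x), x > 0, via the trapezoid rule on
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt."""
    h = upper / steps
    total = 0.5 * math.exp(-x)                # t = 0 term with half weight
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def make_gh_density(lam, alpha, beta, delta, mu):
    """Return x -> gh(x; lam, alpha, beta, delta, mu) as in equation (9)."""
    g = math.sqrt(alpha ** 2 - beta ** 2)
    const = g ** lam / (math.sqrt(2.0 * math.pi) * alpha ** (lam - 0.5)
                        * delta ** lam * bessel_k(lam, delta * g))
    def density(x):
        q = math.sqrt(delta ** 2 + (x - mu) ** 2)
        return (const * q ** (lam - 0.5) * bessel_k(lam - 0.5, alpha * q)
                * math.exp(beta * (x - mu)))
    return density

def integrate(f, lo, hi, step):
    """Plain trapezoid rule on [lo, hi]."""
    n = int(round((hi - lo) / step))
    inner = sum(f(lo + k * step) for k in range(1, n))
    return step * (0.5 * f(lo) + inner + 0.5 * f(hi))

# Hypothetical parameter values; lam = -1/2 is the NIG special case.
alpha, beta, delta, mu = 2.0, -0.5, 1.0, 0.0
nig = make_gh_density(-0.5, alpha, beta, delta, mu)
total_mass = integrate(nig, -40.0, 40.0, 0.05)

# Closed-form NIG density at one point, for comparison with the GH formula.
x0 = 0.3
q0 = math.sqrt(delta ** 2 + (x0 - mu) ** 2)
nig_closed = (alpha * delta / math.pi
              * math.exp(delta * math.sqrt(alpha ** 2 - beta ** 2) + beta * (x0 - mu))
              * bessel_k(1.0, alpha * q0) / q0)
```

Setting $\lambda = -\frac{1}{2}$ collapses the normalizing constant to $\alpha\delta e^{\delta\sqrt{\alpha^2-\beta^2}}/\pi$, so `nig(x0)` and `nig_closed` should agree, and `total_mass` should be numerically 1.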
A number of location and scale invariant parametrizations of the GH distribution have been
7 A similar strategy was followed by Ferreira and Steel (2003) in constructing a multivariate skew Student density with independent margins.
8 Thus in the 1-d case this is simply 1.
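proposed in the literature,
\[ \zeta = \delta\sqrt{\alpha^2 - \beta^2}, \quad \rho = \frac{\beta}{\alpha}, \qquad \xi = (1 + \zeta)^{-1/2}, \quad \chi = \xi\rho, \qquad \bar{\alpha} = \alpha\delta, \quad \bar{\beta} = \beta\delta. \tag{10} \]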
Blæsild (1981) proved that a linear transformation of the form $aX + b$ of a variable $X$ distributed
according to a GH distribution again leads to a GH-distributed variable, with parameters $\lambda^* = \lambda$, $\alpha^* = \alpha/|a|$, $\beta^* = \beta/|a|$, $\delta^* = \delta|a|$, and $\mu^* = a\mu + b$. Therefore,
for the modelling of (0,1) processes such as we find in models which are centered and scaled
by their mean and standard deviation, one can use any of these location and scale invariant
parametrizations plus the following theoretical moment formulas for the Generalized Hyperbolic
(needed to apply the centering and scaling):
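\[ E(X) = \mu + \frac{\beta\delta}{\sqrt{\alpha^2 - \beta^2}} \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)}, \]
\[ \mathrm{Var}(X) = \delta^2 \left( \frac{K_{\lambda+1}(\zeta)}{\zeta K_{\lambda}(\zeta)} + \frac{\beta^2}{\alpha^2 - \beta^2} \left[ \frac{K_{\lambda+2}(\zeta)}{K_{\lambda}(\zeta)} - \left( \frac{K_{\lambda+1}(\zeta)}{K_{\lambda}(\zeta)} \right)^2 \right] \right). \tag{11} \]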
Prause (1999) suggests the use of the (ᾱ, β̄) parametrization, which is adopted by Jensen and
Lunde (2001) as well as Wilhelmsson (2009) in their GARCH-NIG and ACD-NIG models respectively. In our model we have found the (ζ, ρ) parametrization to be adequate for our purposes.
In any case, moving between any of these parametrizations is a simple matter of applying
the appropriate transformation, and Prause (1999) provides further directions for applying the
rescaling property when dealing with the standard deviation. Appendix A provides the non-trivial method for scaling and centering the GH density in the (ζ, ρ) and (ξ, χ) parametrizations
for use in GARCH type processes. In the model we propose, the parameters (ζ, ρ) are modelled as time varying, in addition to the variance, with the key dynamics and their properties
described in the next section.
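One way to go from the invariant pair (ζ, ρ) to a zero mean, unit variance GH density is sketched below. It is an illustrative route based only on the moment formulas in equation (11) and the linear transformation rule quoted above, and is not claimed to reproduce the method of Appendix A. The Bessel function uses a trapezoid rule on its integral representation (an assumption suited to moderate arguments), and all names and values are hypothetical.

```python
import math

def bessel_k(nu, x, upper=20.0, steps=400):
    """K_nu(x), x > 0, from int_0^inf exp(-x cosh t) cosh(nu t) dt (trapezoid)."""
    h = upper / steps
    total = 0.5 * math.exp(-x)
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def gh_moments(lam, alpha, beta, delta, mu):
    """Mean and variance of the GH distribution, equation (11)."""
    g2 = alpha ** 2 - beta ** 2
    zeta = delta * math.sqrt(g2)
    k0, k1, k2 = (bessel_k(lam + j, zeta) for j in range(3))
    mean = mu + beta * delta / math.sqrt(g2) * k1 / k0
    var = delta ** 2 * (k1 / (zeta * k0)
                        + beta ** 2 / g2 * (k2 / k0 - (k1 / k0) ** 2))
    return mean, var

def standardized_params(lam, zeta, rho):
    """Map (zeta, rho) to (alpha, beta, delta, mu) with mean 0 and variance 1."""
    alpha = zeta / math.sqrt(1.0 - rho ** 2)   # start from delta = 1, mu = 0
    beta = alpha * rho
    mean, var = gh_moments(lam, alpha, beta, 1.0, 0.0)
    a = 1.0 / math.sqrt(var)                   # X* = a * (X - mean), a > 0
    return alpha / a, beta / a, a, -a * mean   # linear transformation rule

def make_gh_density(lam, alpha, beta, delta, mu):
    """Return x -> gh(x; lam, alpha, beta, delta, mu), equation (9)."""
    g = math.sqrt(alpha ** 2 - beta ** 2)
    const = g ** lam / (math.sqrt(2.0 * math.pi) * alpha ** (lam - 0.5)
                        * delta ** lam * bessel_k(lam, delta * g))
    def density(x):
        q = math.sqrt(delta ** 2 + (x - mu) ** 2)
        return (const * q ** (lam - 0.5) * bessel_k(lam - 0.5, alpha * q)
                * math.exp(beta * (x - mu)))
    return density

lam, zeta, rho = 1.0, 2.0, -0.4               # hypothetical values
alpha, beta, delta, mu = standardized_params(lam, zeta, rho)
pdf = make_gh_density(lam, alpha, beta, delta, mu)

# Numerical check of the first two moments by a Riemann sum on a wide grid.
grid = [-60.0 + 0.1 * k for k in range(1201)]
dens = [pdf(x) for x in grid]
num_mean = 0.1 * sum(x * d for x, d in zip(grid, dens))
num_var = 0.1 * sum((x - num_mean) ** 2 * d for x, d in zip(grid, dens))
```

Because ζ and ρ are invariant to location and scale, the rescaling changes only $(\alpha, \beta, \delta, \mu)$ and leaves the pair being modelled untouched, which is what makes this parametrization convenient for a standardized innovation density.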
2.3 Conditional Factor Dynamics
Although models from the GARCH family are able, under certain assumptions and parameterizations, to produce thick-tailed and skewed unconditional distributions, they typically assume
that the shape and skewness parameters are time invariant. This also leads to the assumption
that the conditional distribution of the standardized innovations ($z_t$) is independent of the conditioning information, which there is no good reason to believe a priori. A number of
authors, including Lai (1991), Prakash, Chang, and Pactwa (2003), and Jondeau and Rockinger
(2006), have found evidence suggesting that the incorporation of higher moments in portfolio
allocation leads to superior approximations of expected utility. With regards to time variation
in the full conditional density parameters, different motion dynamics and distributions have
been considered in the literature, on instruments varying from real estate to foreign exchange
returns. The results are mixed, with Harvey and Siddique (2009) finding significant evidence of
time varying skewness, Jondeau and Rockinger (2003) finding both time varying skewness and
kurtosis significant, while Premaratne and Bera (2000), Brooks, Burke, Heravi, and Persand
(2005) and Rockinger and Jondeau (2002) find no evidence of either. In terms of observation
frequency, Jondeau and Rockinger (2003) find the presence of time varying skewness and kurtosis in daily but not weekly data, partly consistent with the observation that excess kurtosis
diminishes with temporal aggregation, while others including Hansen (1994), Bond and Patel
(2003) and Harvey and Siddique (2009) do find evidence of time varying skewness and kurtosis
in weekly and even monthly data. The mixed results may be partly the result of the constraints
required to limit the distribution parameters within certain bounds (leading to transformations
such as the logistic) and partly because both skewness and kurtosis are driven by extreme events
making their identification with particular motion dynamics and (standardized) residuals very
hard. Nevertheless, since ACD models subsume GARCH models, it is always possible to test
down different specifications for time and non-time variation in the underlying conditional distribution parameters through the use of appropriate marginal cost to benefit tests as provided
by various information criteria and likelihood ratio tests.
We now consider the dynamics of the independent factors, in the context of an expanded GO-GARCH model with dynamics for the full conditional parameters. We call this the Independent
Factor ACD (IFACD) to distinguish it from the GO-GARCH model by its use of a two-stage
method using ICA and full conditional density dynamics. The unconditional distribution of the
factors is characterized by:
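\[ E[f_t] = 0, \qquad E[f_t f_t'] = I_N, \qquad t = 1, \ldots, T \tag{12} \]
which, in turn, implies that
\[ E[\epsilon_t] = 0, \qquad E[\epsilon_t \epsilon_t'] = A A'. \tag{13} \]
The conditional covariance matrix, $\Sigma_t \equiv E[(r_t - m_t)(r_t - m_t)' \mid \mathcal{F}_{t-1}]$, of the returns is given by:
\[ \Sigma_t = E[\epsilon_t \epsilon_t' \mid \mathcal{F}_{t-1}] = A E[f_t f_t' \mid \mathcal{F}_{t-1}] A' = A H_t A', \tag{14} \]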
where $H_t = E[f_t f_t' \mid \mathcal{F}_{t-1}]$ is a diagonal matrix with elements $(h_{1t}, \ldots, h_{Nt})$ which are the conditional variances of the factors. These conditional variances can be modeled as a GARCH-type
process. We assume that the factors have the following specification:
\[ f_{it} = z_{it} \sqrt{h_{it}}, \tag{15} \]
from which it follows that the returns can be expressed as:
\[ r_t = m_t + A H_t^{1/2} z_t \tag{16} \]
where $z_t = (z_{1t}, \ldots, z_{Nt})'$. The factor conditional variances are modelled as a GARCH(p,q)
process:
\[ h_{it} = c_i + a_i(L) f_{it}^2 + b_i(L) h_{it} \tag{17} \]
with
\[ a_i(L) = a_{i,1} L + \ldots + a_{i,q} L^q, \qquad b_i(L) = b_{i,1} L + \ldots + b_{i,p} L^p. \]
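The mapping from univariate factor variances to the conditional covariance in equation (14) can be sketched for the GARCH(1,1) case as follows. The recursion is assumed here to be driven by the lagged squared factor, the starting value is taken to be the unconditional variance, and the mixing matrix, factor paths and coefficients are all made up for illustration.

```python
def garch11_variances(factor, c, a, b):
    """h_t = c + a * f_{t-1}^2 + b * h_{t-1}, started at c / (1 - a - b)."""
    h = [c / (1.0 - a - b)]
    for f_prev in factor[:-1]:
        h.append(c + a * f_prev ** 2 + b * h[-1])
    return h

def conditional_covariance(A, h_t):
    """Sigma_t = A H_t A' with H_t = diag(h_t), as in equation (14)."""
    n = len(A)
    return [[sum(A[i][k] * h_t[k] * A[j][k] for k in range(len(h_t)))
             for j in range(n)] for i in range(n)]

A = [[1.0, 0.5], [0.2, 0.8]]                        # hypothetical mixing matrix
factors = [[0.4, -1.5, 0.2, 2.1],                   # hypothetical factor paths
           [-0.3, 0.1, 0.9, -0.7]]
params = [(0.05, 0.05, 0.90), (0.10, 0.10, 0.80)]   # (c, a, b), c = 1 - a - b

# Each factor is filtered on its own; only the final step is multivariate.
h = [garch11_variances(f, *p) for f, p in zip(factors, params)]
sigma = [conditional_covariance(A, [h[0][t], h[1][t]]) for t in range(4)]
```

With $c_i = 1 - a_i - b_i$ the unconditional factor variance is 1, so the first covariance matrix equals $AA'$, in line with equation (13). All time variation in $\Sigma_t$ then comes from the diagonal of $H_t$.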
The random variables $z_{it}$ are independent across $i$ and $t$ with $E[z_{it}] = 0$ and $E[z_{it}^2] = 1$. We
assume that the conditional distribution of $z_{it}$ is GH with zero mean and unit variance, i.e. $SGH_{\lambda,i}(\mu_i, \delta_i, \alpha_i, \beta_i)$. We expand on the model of Broda and Paolella (2009) by
allowing for time variation in the higher moment parameters and using the more general GH
distribution rather than constraining the $\lambda$ parameter to a particular value. The representation
is such as to give a location and scale invariant parametrization of the GH distribution, namely
one with zero mean and unit variance, and separate motion dynamics for the skewness and
shape parameters as in the ACD type models introduced by Hansen (1990). The skew and
shape parameters in the GH density of $z_{it}$ are modeled with Quadratic(1,1,1) dynamics,
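\[ \breve{\rho}_{it} = \gamma_{0i} + \gamma_{1i} z_{i,t-1} + \gamma_{2i} z_{i,t-1}^2 + \delta_1 \breve{\rho}_{i,t-1} \tag{18} \]
\[ \breve{\zeta}_{it} = \theta_{0i} + \theta_{1i} z_{i,t-1} + \theta_{2i} z_{i,t-1}^2 + \psi_1 \breve{\zeta}_{i,t-1}, \qquad i = 1, \ldots, N \tag{19} \]
with the logistic transform to map the unconstrained processes $\breve{\rho}_t$ and $\breve{\zeta}_t$ into $\rho_t$ and $\zeta_t$:
\[ \rho_{it} = -0.99 + \frac{1.98}{1 + e^{-\breve{\rho}_{it}}} \tag{20} \]
\[ \zeta_{it} = 0.1 + \frac{20}{1 + e^{-\breve{\zeta}_{it}}}, \tag{21} \]
considering the bounds of the distributional parameters which are $[-0.99, 0.99]$ and $[0.1, 20]$ for $\rho$
and $\zeta$,9 respectively. Meanwhile, we also estimate the GIG shape parameter $\lambda_i$, allowing it to vary
for each factor.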
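The quadratic recursions and logistic maps of equations (18) to (21) can be sketched as below. The helper names, coefficients and innovations are hypothetical. The map is written generically as lower + (upper - lower) / (1 + exp(-x)), so for ζ the numerator used here is 19.9 (keeping the path inside the stated bounds) rather than the 20 printed in equation (21).

```python
import math

def bounded_logistic(x, lower, upper):
    """Map an unconstrained value into the open interval (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-x))

def quadratic_path(z, intercept, linear, quadratic, persistence, start=0.0):
    """Unconstrained recursion
    x_t = intercept + linear * z_{t-1} + quadratic * z_{t-1}^2 + persistence * x_{t-1}."""
    path, x = [], start
    for z_prev in z:
        x = intercept + linear * z_prev + quadratic * z_prev ** 2 + persistence * x
        path.append(x)
    return path

# Hypothetical standardized innovations and coefficients for one factor.
z = [0.5, -2.0, 1.2, -0.3, 3.0]
rho_raw = quadratic_path(z, 0.0, 0.2, -0.05, 0.8)
zeta_raw = quadratic_path(z, 0.1, 0.0, 0.1, 0.9)

rho = [bounded_logistic(x, -0.99, 0.99) for x in rho_raw]
zeta = [bounded_logistic(x, 0.1, 20.0) for x in zeta_raw]
```

The unconstrained processes can wander freely while the transformed parameters stay inside their bounds, which is why the recursion is specified before the logistic map and not directly on ρ and ζ.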
It follows that the single factors, $f_{it}$, $i = 1, \ldots, N$, are conditionally distributed as
$GH_{\lambda_i}(\mu_i\sqrt{h_{it}}, \delta_i\sqrt{h_{it}}, \alpha_i/\sqrt{h_{it}}, \beta_i/\sqrt{h_{it}})$. Finally, the vector of returns $r_t \in \mathbb{R}^N$, which can be
expressed as a linear transformation of independent factors $f_t \in \mathbb{R}^N$, is conditionally distributed
according to the multivariate affine GH (maGH) distribution of Schmidt, Hrycej, and Stützle
(2006):
\[ r_t \mid \mathcal{F}_{t-1} \sim maGH_N(m_t, \Sigma_t, \omega_t), \]
where $\omega_t = (\omega_1, \ldots, \omega_N)$ and $\omega_i = (\lambda_i, \alpha_i, \beta_i)'$, representing the conditional shape and skew
parameter vectors. As noted by Hansen (1994), because the standardized innovations $z_t$ are
no longer i.i.d., consistency of the MLE and asymptotic normality of the parameters are hard to
establish in such a setting. However, we did run various simulations on different parameter values
and recursive windows to obtain the simulated Root Mean Squared Errors (RMSE) of true
versus expected values, observing in most cases $\sqrt{T}$ type consistency. Figure 1 shows one such
simulation on a set of parameters, with the blue line indicating the expected RMSE value given
$\sqrt{T}$ consistency. That is, the RMSE should decrease by a factor of $\sqrt{T^+/T}$ when increasing the sample
size from $T$ to $T^+$. The model entertained for this exercise was an ARMA(1,1)-GARCH(1,1)
model, with parameters $(\mu, ar_1, ma_1, \omega, \alpha_1, \beta_1)$, and Quadratic(1,1,1) dynamics for the skew
and shape with parameters $(\gamma_0, \gamma_1, \gamma_2, \breve{\rho})$ and $(\theta_0, \theta_1, \theta_2, \breve{\zeta})$, respectively. Because of the size
and complexity of the simulation, we chose the NIG subclass from within the GH distribution,
and as such cannot make any inference on the simulated properties of the $\lambda$ parameter. However,
we have observed that some combinations of $\lambda$, $\rho$ and $\zeta$ yield very similar results, leading us to
hypothesize that this parameter will have a very different distribution and RMSE to the other
estimated parameters, but leave this question open for further research.
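As a worked illustration of the $\sqrt{T}$ benchmark (the sample sizes are hypothetical and are not those of the simulation in Figure 1):

```latex
\[
\mathrm{RMSE}(T^+) \approx \mathrm{RMSE}(T)\,\sqrt{\frac{T}{T^+}},
\qquad \text{e.g. } T = 1000,\; T^+ = 4000
\;\Rightarrow\; \mathrm{RMSE}(T^+) \approx \tfrac{1}{2}\,\mathrm{RMSE}(T).
\]
```

A parameter whose simulated RMSE falls more slowly than this benchmark line would be a candidate for weaker identification.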
The extension of dynamics to all the parameters of the distribution presents the opportunity
to go beyond the conditional time-varying covariance. In the next section, we present the time-varying higher moment tensor representation, which provides additional insights into our model's
multivariate interactions. These form the basis for obtaining weighted moments resulting from
the geometric transformation properties of our model, which are discussed in Section 2.6.
[Insert Figure 1 here]
2.4 Conditional Co-Moments
The linear affine representation of the IFACD model possesses certain geometric properties
which allow for the extension beyond the concept of covariance to that of the co-skewness and
9 We limit the upper bound of $\zeta$ to 20 for estimation ease, since values beyond this point lead to very little change in the skewness and kurtosis, with the range 0.1 to 20 representing most of the distribution.
co-kurtosis as described in de Athayde and Flôres Jr (2002). As mentioned previously, the
covariance of rt is given by:
Σt = AH f,t A′ ,
(22)
where because of the independence of ft , Hf ,t = diag(h1,t , . . . , hn,t ), which is obtained from
the ACD motion dynamics of the variance. The co-moments of order 3 and 4 (from which the
co-skewness and co-kurtosis are then derived) represented as tensor matrices, are
M 3t = A′ M 3f,t (A ⊗ A),
M 4t = A′ M 4f,t (A ⊗ A ⊗ A),
(23)
where M 3f,t = tdiag(c3111,t , . . . , c3nnn,t ) and M 4f,t = tdiag(c4111,t , . . . , c4nnn,t ), with tdiag representing
the position of the conditional 3rd and 4th independent factor moments in the diagonal entries of
the n × n2 and n × n3 tensor matrices respectively.10 Since we are estimating the skew and shape
parameters in a time varying setup, these matrices are also time-varying in univariate dynamics,
but the multivariate dependence is non-time varying and provided by the mixing matrix A.
Finally in order to standardize the co-moments to represent co-skewness and co-kurtosis of rt ,
one needs to divide the entries of the tensors by the product of their standard deviations,
\[ S_{ijk,t} = \frac{M^3_{ijk,t}}{\sigma_{i,t}\,\sigma_{j,t}\,\sigma_{k,t}}, \qquad K_{ijkl,t} = \frac{M^4_{ijkl,t}}{\sigma_{i,t}\,\sigma_{j,t}\,\sigma_{k,t}\,\sigma_{l,t}}, \tag{24} \]
where S ijk,t represents the asset co-skewness between elements i, j, k of r at time t, σi,t the
standard deviation of the element i of r at time t, and in the case of i = j = k represents the
asset skewness of element i at time t, and similarly for the co-kurtosis tensor K ijkl,t.
Further to the higher co-moment representation, the next section provides additional insights
into the effect of factor shocks on the underlying asset co-moments, by applying the news impact
surface method to our model.
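As an illustration of the standardization in equation (24), the following minimal Python sketch computes a third co-moment and the corresponding co-skewness entry by entry. It assumes that row i of the mixing matrix A holds the loadings of asset i on the factors (so that r = m + A f) and that the factors are independent with zero mean, in which case only the factors' own third moments contribute. The function names and the numerical inputs are hypothetical and are not taken from the paper's implementation.

```python
import math

def third_comoment(A, m3, i, j, k):
    """Third co-moment of assets i, j, k when r = m + A f with independent factors.

    Only the factors' own third moments m3[q] = E[f_q^3] survive, so
    M3_ijk = sum_q A[i][q] * A[j][q] * A[k][q] * m3[q].
    """
    return sum(A[i][q] * A[j][q] * A[k][q] * m3[q] for q in range(len(m3)))

def asset_sigma(A, h, i):
    """Standard deviation of asset i: square root of the i-th diagonal entry of A H A'."""
    return math.sqrt(sum(A[i][q] ** 2 * h[q] for q in range(len(h))))

def coskewness(A, h, m3, i, j, k):
    """Standardized co-skewness S_ijk in the sense of equation (24)."""
    scale = asset_sigma(A, h, i) * asset_sigma(A, h, j) * asset_sigma(A, h, k)
    return third_comoment(A, m3, i, j, k) / scale

# Hypothetical two-asset, two-factor inputs (illustrative values only).
A = [[1.0, 1.0],
     [0.0, 1.0]]
h = [4.0, 1.0]     # conditional factor variances
m3 = [-8.0, 0.5]   # conditional factor third moments

s_000 = coskewness(A, h, m3, 0, 0, 0)  # skewness of asset 0
s_001 = coskewness(A, h, m3, 0, 0, 1)  # co-skewness of type iij
```

Because the factor inputs change at every t while A is fixed, repeating this calculation per period gives the time-varying co-skewness entries.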
2.5
News Impact Surface
A very revealing method for visualizing the multivariate dynamics in GARCH systems is through
the news impact function. This was originally suggested in the univariate literature by Engle
and Ng (1993), providing a visual representation of the impact of shocks on the time varying
variance. It was extended to a surface function by Kroner and Ng (1998) who compared a number
of multivariate models and the type of surfaces they generate. This was further extended in a
natural direction by Jondeau and Rockinger (2009) to include the impact of higher moment
co-dependence. While the IFACD model is mainly one of univariate independent dynamics, we
investigate the type of interactions generated by the model by constructing news impact surfaces
for the covariance and third co-moment. Since shocks impact the factors independently, the news
impact surface is a combination of the independent news impact curves of the factors which when
combined via the mixing matrix A, create the dynamics for the underlying asset-factor surface
function. Specifically, let the vector $z_{t-1}$ denote the inputs known at time t − 1 that
determine $h_{i,t}$, $m^3_{i,t}$ and $m^4_{i,t}$, the variance, third and fourth moments of the
factors respectively, and let Z denote the unconditional value of those inputs, except for those
to factors i and j, $\hat z_{i,t-1}$ and $\hat z_{j,t-1}$ respectively. The news impact surfaces
of $h_{ab,t}$, $m^3_{abc,t}$ and $m^4_{abcd,t}$, the conditional covariance, third and fourth co-moments of
the underlying assets a, b, c and d, with respect to shocks from factors i and j, are the
three-dimensional graphs of the functions:
\[
\begin{aligned}
h_{ab,t} &= f\left(A, H_{f,t-1} \mid (\hat z_{i,t-1}, \hat z_{j,t-1}, Z)\right),\\
m^3_{abc,t} &= f\left(A, M^3_{f,t-1} \mid (\hat z_{i,t-1}, \hat z_{j,t-1}, Z)\right),\\
m^4_{abcd,t} &= f\left(A, M^4_{f,t-1} \mid (\hat z_{i,t-1}, \hat z_{j,t-1}, Z)\right),
\end{aligned} \tag{25}
\]
where $H_{f,t-1}$, $M^3_{f,t-1}$ and $M^4_{f,t-1}$ denote the independent factor covariance matrix, and third
and fourth co-moment tensor matrices as in equations (22) and (23).
10
Because of independence, only indices representing the univariate parameters (i.e. 111) are non zero.
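The covariance surface can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each factor variance follows a plain GARCH(1,1) recursion, the lagged variance of every factor is held at its unconditional level, and row a of A holds the loadings of asset a. The function names and parameter values are hypothetical.

```python
def factor_variance(shock, omega, alpha, beta, h_prev):
    """GARCH(1,1) update: next-period variance given last period's shock."""
    return omega + alpha * shock ** 2 + beta * h_prev

def news_impact_covariance(A, a, b, i, j, z_i, z_j, garch):
    """Conditional covariance of assets a and b after shocks z_i, z_j to factors i, j.

    garch is a list of (omega, alpha, beta) tuples, one per factor. All other
    factors stay at their unconditional variance.
    """
    total = 0.0
    for q, (omega, alpha, beta) in enumerate(garch):
        h_bar = omega / (1.0 - alpha - beta)  # unconditional variance
        if q == i:
            h_q = factor_variance(z_i, omega, alpha, beta, h_bar)
        elif q == j:
            h_q = factor_variance(z_j, omega, alpha, beta, h_bar)
        else:
            h_q = h_bar
        total += A[a][q] * A[b][q] * h_q
    return total

# Hypothetical inputs: two assets, two factors.
A = [[1.0, 0.5],
     [0.3, 1.0]]
garch = [(0.05, 0.05, 0.90), (0.05, 0.05, 0.90)]

grid = [x / 2.0 for x in range(-10, 11)]  # shocks from -5 to 5
surface = [[news_impact_covariance(A, 0, 1, 0, 1, zi, zj, garch) for zj in grid]
           for zi in grid]
```

With symmetric GARCH(1,1) dynamics the surface is symmetric in the sign of each shock. Any asymmetry in the paper's figures would have to come from the dynamics actually estimated, which this sketch does not reproduce.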
[Insert Figure 2 here] [Insert Figure 3 here]
Figure 2 displays the covariance news impact surface for the MSCI iShares of the commodity
producing nations of Australia (EWA) and Canada (EWC), showing the impact of shocks from
factors 12 and 17 on the variances of EWA and EWC, and their covariance. Factor 17 is the
dominating influence on these two assets, which have a sort of U-shaped curve as described by
Kroner and Ng (1998) who noted that the news impact surface of F-ARCH models is always
U-shaped, with the factors determining the direction the parabola will point11 . Figures 3a, 3b and
3c show how the third moment and co-moment for the MSCI iShares of the Netherlands (EWN)
and Austria (EWO) react to shocks from factors 8 and 10. In this case, factor 10 dominates
the impact on the third moment of EWO, factor 8 dominates the impact on the third moment
of EWN, while jointly the picture is mixed with factor 8 having the overall largest impact. The
joint picture, namely that of the third co-moment of type iij (displayed as [EWN,EWN,EWO]
in the figure), shows how good a hedge asset j (EWO) is in terms of volatility changes in asset i
(EWN), with respect to the factor shocks, with a negative value indicating that asset j’s return
goes down with a positive increase in the volatility in country i, hence providing for a poor hedge.
As the figure shows, EWO does not provide a good hedge for EWN against large negative shocks
from factors 8 or 10. In contrast, in Figure 3d, which shows the impact of factors 2 and 3 on the
MSCI iShares for Switzerland (EWL) and France (EWQ), factor 2 dominates, but the U-shaped
surface denotes that the assets provide a good hedge for each other against large negative shocks
from this factor. The information from such visual diagnostics should be supplemented with
more concrete analysis, as suggested by Jondeau and Rockinger (2009), in terms of simulations
to determine the finite sample distribution of the reactions to shocks as well as Impulse Response
functions to track the decay of the reactions to the shocks over time.
Finally, in the next section we present the weighted conditional density summation property of
the model and a fast estimation method for use in portfolio applications.
2.6
Weighted Conditional Density Summation and Portfolio Representation
With its unique attribute, in the GH family, of being closed under convolution, the N-dimensional
NIG distribution is uniquely suited to problems in portfolio and risk management where a certain weighted sum of assets is required. However, when the distributional parameters α and
β, representing shape and skew, are allowed to vary, as is the case in the model, this property no longer holds and we need to use numerical methods such as the Fast Fourier
Transform (FFT) to derive the weighted density by inversion of the characteristic function of
the scaled parameters. In the case of the NIG distribution, this is greatly simplified because of
the representation of the modified Bessel function with fixed index of -0.5 which was derived in
Barndorff-Nielsen and Blæsild (1981); otherwise the characteristic function of the Generalized
Hyperbolic involves the evaluation of the modified Bessel function with complex arguments,
which, though not impossible, does complicate the inversion. Appendix B of the paper derives
the characteristic functions used in the case of independent margins for both the NIG and full
GH distributions.
11
The GO-GARCH model is equivalent to an F-ARCH model with the number of factors equal to the number
of assets, diagonal covariances and no idiosyncratic shocks (see van der Weide (2002) for details).
The portfolio return, Rt , i.e. the weighted sum of the returns vector r t through the portfolio
strategy wt , is distributed according to an N -dimensional GH distribution. Given the factor
estimates, y t , the portfolio return is given by:
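\[
\begin{aligned}
R_t &= w_t' r_t = w_t' (m_t + A y_t),\\
    &= w_t' m_t + (w_t' A H_t^{1/2}) z_t, \qquad z_{i,t} \sim SGH_{\lambda_i}(\mu_{i,t}, \delta_{i,t}, \alpha_{i,t}, \beta_{i,t}),
\end{aligned} \tag{26}
\]
where $H_t^{1/2}$ is a diagonal matrix with the conditional standard deviations, estimated from the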
ACD dynamics of y t , z t are the N -dimensional innovations each distributed as 1-dimensional
standardized GH. The weighted asset returns, wit rit , are distributed as a scaled 1-dimensional
GH (see Bläesild (1981) for GH scaling property),
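\[
w_{i,t} r_{i,t} = (w_{i,t}\mu_{i,t} + \bar w_{i,t} z_{i,t}) \sim GH_{\lambda_i}\left( \bar w_{i,t}\tilde\mu_{i,t} + w_{i,t}\mu_{x,i,t},\; |\bar w_{i,t}|\,\delta_{i,t},\; \frac{\alpha_{i,t}}{|\bar w_{i,t}|},\; \frac{\beta_{i,t}}{|\bar w_{i,t}|} \right), \tag{27}
\]
where $\bar w_t$ is equal to $w_t' A H_t^{1/2}$, and $\bar w_{i,t}$ is the i-th element of $\bar w_t$, $\mu_{i,t}$ the mean of the i-th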
underlying asset. In order to obtain the density of the portfolio, we must sum the individual
weighted densities of xi,t , either by simulation or FFT as in Chen, Härdle, and Spokoiny (2007).
We choose the latter for its accuracy and speed. In order to approximate the density of the
portfolio return, we work with the characteristic function of the GH distribution, inverting it
via the FFT method. The characteristic function12 of the portfolio return Rt is
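\[
\varphi_R(u) = \prod_{i=1}^{n} \varphi_{\bar w Z_i}(u)
= \exp\left( iu \sum_{j=1}^{d} \bar\mu_j + \sum_{j=1}^{d} \left[ \frac{\lambda_j}{2}\log(\gamma) - \frac{\lambda_j}{2}\log(\upsilon) + \log K_{\lambda_j}\!\left(\bar\delta_j \sqrt{\upsilon}\right) - \log K_{\lambda_j}\!\left(\bar\delta_j \sqrt{\gamma}\right) \right] \right), \tag{28}
\]
where $\gamma = \bar\alpha_j^2 - \bar\beta_j^2$, $\upsilon = \bar\alpha_j^2 - (\bar\beta_j + iu)^2$, and $(\bar\alpha, \bar\beta, \bar\delta, \bar\mu)$ are the scaled versions of the parameters
$(\hat\alpha, \hat\beta, \hat\delta, \hat\mu)$ as shown in (27). The density may be accurately approximated by FFT13 as,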
\[
f_p(r) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itr} \varphi_R(t)\, dt \approx \frac{1}{2\pi} \int_{-s}^{s} e^{-itr} \varphi_R(t)\, dt. \tag{29}
\]
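The truncated inversion integral can be illustrated with a short Python sketch. To stay dependency-free it uses direct trapezoidal quadrature rather than the FFT, and it works with the NIG special case, whose characteristic function has a closed form without Bessel functions. The portfolio characteristic function is built as a product of the margins' characteristic functions evaluated at w·u, which is the generic rule for a weighted sum of independent variables. All function names, parameter values, the truncation point s and the grid size n are illustrative assumptions.

```python
import cmath
import math

def nig_cf(u, mu, delta, alpha, beta):
    """Characteristic function of NIG(mu, delta, alpha, beta) at real u."""
    g = cmath.sqrt(alpha ** 2 - beta ** 2)
    v = cmath.sqrt(alpha ** 2 - (beta + 1j * u) ** 2)
    return cmath.exp(1j * u * mu + delta * (g - v))

def portfolio_cf(u, weights, params):
    """cf of sum_i w_i X_i for independent X_i: product of cf_i(w_i * u)."""
    out = 1.0 + 0.0j
    for w, p in zip(weights, params):
        out *= nig_cf(w * u, *p)
    return out

def density_from_cf(cf, r, s=50.0, n=5000):
    """Approximate f(r) = (1/2pi) int_{-s}^{s} exp(-i t r) cf(t) dt.

    Uses cf(-t) = conj(cf(t)), so only the half line [0, s] is integrated
    with the trapezoid rule and the real part is kept.
    """
    step = s / n
    total = 0.5 * cf(0.0).real
    for k in range(1, n):
        t = k * step
        total += (cmath.exp(-1j * t * r) * cf(t)).real
    total += 0.5 * (cmath.exp(-1j * s * r) * cf(s)).real
    return total * step / math.pi

# Hypothetical two-factor portfolio with different shape and skew per factor.
weights = [0.6, 0.4]
params = [(0.0, 1.0, 2.0, -0.3), (0.0, 1.2, 1.5, 0.2)]  # (mu, delta, alpha, beta)
f_at_zero = density_from_cf(lambda t: portfolio_cf(t, weights, params), 0.0)
```

When every margin shares the same α and β, the product of NIG characteristic functions collapses to a single NIG characteristic function with the location and scale parameters summed, which is the closure property mentioned at the start of this section.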
The cumulative distribution and quantile functions can then be approximated using this density. Since we have a time-varying density, in-sample this procedure must be carried out for
all data points (for forecasting one need only look at the n-ahead forecast horizon). Further
properties also arise naturally, such as that of portfolio variance, skewness and kurtosis from the
multivariate stage estimation, and follow from the properties of tensor matrices (see for example
de Athayde and Flôres Jr (2000)),
\[
\sigma^2_{p,t} = w' \Sigma_t w, \qquad
s_{p,t} = \frac{w' M^3_t (w \otimes w)}{(w' \Sigma_t w)^{3/2}}, \qquad
k_{p,t} = \frac{w' M^4_t (w \otimes w \otimes w)}{(w' \Sigma_t w)^{2}}, \tag{30}
\]
where $\Sigma_t$, $M^3_t$ and $M^4_t$ are derived in equations (22) and (23).
12
See Appendix B for derivation.
13
For an applied overview of this methodology, see for example Paolella (2007) section 1.3.3.
3
Estimation and Empirical Application
In this section we report the results of a medium scale empirical exercise, with a view to highlighting the properties and performance of our proposed model and its practical implementation.
The application covers 2 key events in the recent history of the equity markets, those of the 1998
and 2009 crashes providing an ideal testing set for a risk management exercise. The first section
investigates the in-sample fit of the model, with tests on the conditional distribution choice and
parameter dynamics. The performance of the model in capturing time variation in higher moments is tested against the non-time varying model and a DCC-Student model, in the context of
a portfolio risk management application using VaR . The second section extends this exercise to
a rolling out-of-sample testing environment to capture a realistic setting around the 2009 crash.
Both sections report the results of the conditional coverage test of Christoffersen (1998) and
superior predictive ability test of Hansen (2005) at the 1% and 5% quantile levels respectively.
3.1
Selection Strategy and In-Sample Fit
As ACD models subsume GARCH models14 , one can always use some information criterion
to choose the model that best describes the data. To evaluate the contribution of the ACD
dynamics model under consideration, in the context of the current Factor model extension, we
considered the fit on the 17 iShares’ log returns for the period 19/03/1996 - 29/05/2007 of the
IFACD-GH model. This was estimated assuming a constant mean, which may seem unrealistic
for such a long period, but other methods for centering and removing any autocorrelation
could be used, such as univariate AR or Vector AR (VAR). The results were little changed
in a parallel exercise using a VAR(1) model for the mean. A GARCH(1,1) model was used
for the time varying variance, although as noted previously, many other parametrizations are
possible, though the more exotic a system the more likely it is that simulation methods will
be required post-estimation to generate certain properties of the model. For the distribution
skew and shape parameters, ρt and ζt , a general Quadratic(1,1,1) model was chosen for in-sample evaluation, but the Hannan-Quinn information criterion (HQIC )15 was used to choose
the best model from those nested in the model16 , including the non-time varying parametrization.
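The selection rule can be written out as a small Python sketch using the HQIC definition given in the footnote, −2LL/n + 2k ln(ln(n))/n. The candidate names, log-likelihoods, parameter counts and the sample size n = 2950 below are illustrative assumptions, not values reported in the paper.

```python
import math

def hqic(loglik, k, n):
    """Hannan-Quinn criterion per observation: -2 LL / n + 2 k ln(ln(n)) / n."""
    return -2.0 * loglik / n + 2.0 * k * math.log(math.log(n)) / n

def select_by_hqic(fits, n):
    """Return the name of the candidate with the lowest HQIC.

    fits maps a candidate name to (log-likelihood, number of parameters).
    """
    return min(fits, key=lambda name: hqic(fits[name][0], fits[name][1], n))

# Hypothetical candidate fits for one factor.
candidates = {
    "constant skew and shape": (-3760.0, 5),
    "dynamic skew": (-3750.0, 8),
    "dynamic skew and shape": (-3746.0, 11),
}
best = select_by_hqic(candidates, n=2950)
```

In this made-up example the richest model has the highest log-likelihood, but the penalty term makes the intermediate specification the preferred one.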
Since we have a highly nonlinear model, whose estimation depends critically on good starting values if there
is to be any chance of obtaining anything other than a local solution to the optimization problem,
we followed a strategy of generating thousands of starting values for the parameters from the
uniform distribution, bounded by the parameters’ lower and upper bounds, evaluating the log-likelihood at those starting values, ranking them, and restarting the solver from each of the top 12.17
Some justification and confidence in this strategy can be found in Hu, Shonkwiler, and Spruill
(1994).
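The multi-start strategy described above can be sketched as follows. The sketch minimizes a generic objective (for example a negative log-likelihood) and takes the local solver as an argument. The compass search used here is only a dependency-free stand-in, not the augmented Lagrange solver used in the paper, and the test objective, bounds and function names are hypothetical.

```python
import random

def compass_search(objective, x0, bounds, step=0.5, tol=1e-8):
    """Simple derivative-free local search, used here as a stand-in local solver."""
    x, fx = list(x0), objective(x0)
    while step > tol:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for d in (step, -step):
                y = list(x)
                y[i] = min(hi, max(lo, x[i] + d))  # keep the trial point in bounds
                fy = objective(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step *= 0.5
    return x

def multistart(objective, bounds, local_solver, n_draws=2000, n_best=12, seed=0):
    """Draw uniform starting values, rank them, and restart the solver from the best."""
    rng = random.Random(seed)
    draws = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_draws)]
    draws.sort(key=objective)  # lowest objective first
    solutions = [local_solver(objective, x0, bounds) for x0 in draws[:n_best]]
    return min(solutions, key=objective)

# Hypothetical objective with a local minimum near x0 = +1 and the global one near x0 = -1.
def objective(x):
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0] + x[1] ** 2

best = multistart(objective, [(-2.0, 2.0), (-2.0, 2.0)], compass_search)
```

Ranking the random draws before any optimization is cheap relative to a solver run, which is why the expensive restarts are reserved for the 12 most promising starting points.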
[Insert Table 1 here]
The results are in Table 1, which includes the parameter estimates and p-values of the best
fitting model for each factor. Of the 17 factors, only 1 had neither skew nor shape as time varying, while, of the two, 4 factors had non-time-varying skew dynamics and 3 non-time-varying
shape dynamics. In the joint case, neither the time invariant skew nor the time invariant shape
was significant at the 5% level. GARCH persistence appeared to be quite high across all
factors, averaging 0.99, indicating perhaps that there are structural breaks in the dataset, a not
too unlikely phenomenon considering the long time period under consideration and the 1998
event in the set. Of the 165 total parameters characterizing the univariate ACD equation across
the 17 factors, only 9 fell outside the 1% critical value of the individual Nyblom-Hansen statistic (of which 7 were in the variance dynamics), and of the 17 models, 2 had a
joint statistic which also fell outside the 1% critical value. While the λ parameter ranged from
4.5 to -5.5, with a mean around -3, we also tested the restriction of using the NIG subclass.
The table reports the likelihood ratio test of the full GH fitted model against the restricted NIG
model with λ = −0.5 for the factors, which we run in order to justify our use of the more general
distribution family. The p-value indicates that the restricted model can be rejected for 11 out
of the 17 factors at the 5% significance level.
14
Since they can be represented as intercept only models in higher moment dynamics.
15
The HQIC, defined as −2LLn (k)/n + 2k ln(ln(n))/n, was chosen as it holds the middle ground between the
Akaike criterion (AIC), which is known to underpenalize, and the Bayesian criterion (BIC), which is known to
overpenalize the models.
16
Specifically, the lowest HQIC was chosen among 16 estimated combinations of dynamics for the skew and
shape parameters for each factor.
17
We used an augmented Lagrange based solver with SQP interior algorithm described in Ye (1997).
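For a single restriction such as λ = −0.5, the likelihood ratio statistic is compared with a χ² distribution with 1 degree of freedom, whose upper tail has the closed form erfc(√(x/2)). The following Python sketch uses that identity. The function names are hypothetical, and the inputs in the usage lines are statistic magnitudes of the size reported in Table 1, used purely to illustrate the arithmetic.

```python
import math

def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def lr_test(loglik_full, loglik_restricted):
    """LR statistic 2 (LL_full - LL_restricted) and its chi-square(1) p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_1df(stat)

p_small = chi2_sf_1df(0.69)  # restriction not rejected at 5%
p_edge = chi2_sf_1df(3.84)   # right at the 5% boundary
p_large = chi2_sf_1df(7.39)  # restriction rejected at 5%
```

Log-likelihoods rounded to integers, as printed in the table, are too coarse to reproduce the reported statistics exactly, so the p-values are best computed from the unrounded statistic.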
3.2
Portfolio Value at Risk
As the IFACD model aims to capture time-variation in parameters affecting the whole shape of
the distribution, it is natural to consider an application involving Value-at-Risk which critically
depends on such aspects of the distribution as influenced by tail events and skewness. Using
an equally weighted portfolio, we tested the IFACD-GHYP against the GO-GARCH-GHYP
and DCC-Student18 models both in-sample and out-of-sample. The out-of-sample comparative
study was performed using a period of 500 days covering 05/12/2007 to 30/11/2009. This
period was chosen since it includes the peak (in late 2007) and trough (in early 2009) of the
last decade. In terms of our equally weighted iShares index, it represents a draw-down of more
than 60% followed by an equivalent run-up close to the same magnitude, and hence an ideal
testing period for a VaR study. The models were estimated using a strategy of rolling 1-ahead
forecasts, re-estimating the model every 25 trading days and using an increasing window size
for the estimation.19 The relevant tests which form the basis for the evaluation of the adequacy
of the model are those of Kupiec (1995) and Christoffersen (1998) for VaR exceedances, and
Hansen (2005) for superior predictive ability, which are discussed in the next sections.
3.2.1
VaR exceedances
Denote by $r_t$ the return of an asset at time t, and by $VaR_{t|t-1}(\alpha)$ the Value at Risk based on an expected α%
coverage rate at time t given the information set $\Omega_{t-1}$. The ex-post coverage
rate is then,
\[ \Pr\left[ r_t < VaR_{t|t-1}(\alpha) \right] = \alpha, \tag{31} \]
and the indicator variable $I_t(\alpha)$, the ex-post realization of an exceedance, is defined such that,
\[
I_t(\alpha) =
\begin{cases}
1, & \text{if } r_t < VaR_{t|t-1}(\alpha) \\
0, & \text{otherwise.}
\end{cases} \tag{32}
\]
18
The DCC model was included to contrast time variation in dependence and whether our static model could
do better based on its other attributes. To stay within a 2-stage estimation framework we chose the Student
distribution, since the GH distribution does not admit a representation for 2-stage DCC estimation.
19
That means the model parameters were re-estimated a total of 500/25 = 20 times, using an estimation window
at time t0 = 05/12/2007 of size 2451, increasing that by 25 at each re-estimation point.
The original test of exceedances, called the unconditional coverage or proportion of failures test,
by Kupiec (1995), aimed at testing whether the observed frequency of VaR exceedances was
consistent with the expected exceedances given the quantile under consideration and a confidence
level, that is, whether $\Pr[I_t(\alpha) = 1] = E[I_t(\alpha)] = \alpha$. Under the null hypothesis of a correctly
specified model, the number of exceedances follows a binomial distribution with the probability
of obtaining X or more exceedances given a correct model given by
\[ \Pr(X \mid N, p) = \binom{N}{X} p^{X} (1 - p)^{N - X}, \tag{33} \]
where p is the probability of an exceedance for the confidence level under consideration, X the
number of exceedances given the VaR quantile and N the sample size. Therefore, levels of the
probability below a given significance level lead to a rejection of the null hypothesis. The test
is usually conducted as a likelihood ratio test, with the statistic taking the form,
\[
LR_{uc} = -2 \ln\left( \frac{(1 - p)^{N - X}\, p^{X}}{\left(1 - \frac{X}{N}\right)^{N - X} \left(\frac{X}{N}\right)^{X}} \right), \tag{34}
\]
which under the null that the model is correct, is asymptotically distributed as χ2 with 1 degree of
freedom. Obviously, the test does not consider any other attributes of the exceedances other than
their frequency leading to potential violation of the assumption of the independence of X should
those exceedances possess autocorrelation. The conditional coverage test of Christoffersen (1998)
corrects this by jointly testing the frequency as well as independence of exceedances, assuming
that the VaR violation, It (α) is modelled with a first order Markov chain whose matrix of
transition probabilities is defined by,
\[
\Pi = \begin{pmatrix} \eta_{00} & \eta_{01} \\ \eta_{10} & \eta_{11} \end{pmatrix}, \tag{35}
\]
where ηij = Pr [It (α) = j|It−1 (α) = i]. Therefore, the null of the conditional independence is
then defined by,
\[
H_0 : \Pi = \Pi_\alpha = \begin{pmatrix} 1 - \alpha & \alpha \\ 1 - \alpha & \alpha \end{pmatrix}, \tag{36}
\]
which means that the probability of having an exceedance, irrespective of the state at time t,
should be equal to the coverage rate α. The test is a joint likelihood ratio of the conditional
and unconditional coverage, expressed as,
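\[
LR_{cc} = -2 \ln\left( \frac{(1 - p)^{N - E}\, p^{E}}{\left(1 - \frac{E}{N}\right)^{N - E} \left(\frac{E}{N}\right)^{E}} \right)
- 2 \ln\left( \frac{(1 - \pi)^{\eta_{00} + \eta_{10}}\, \pi^{\eta_{01} + \eta_{11}}}{(1 - \pi_0)^{\eta_{00}}\, \pi_0^{\eta_{01}}\, (1 - \pi_1)^{\eta_{10}}\, \pi_1^{\eta_{11}}} \right), \tag{37}
\]
where, with a slight abuse of notation, $\eta_{ij}$ in (37) and (38) denotes the number of observed transitions from state i to state j, E the number of exceedances, and
\[
\pi_0 = \frac{\eta_{01}}{\eta_{00} + \eta_{01}}, \qquad
\pi_1 = \frac{\eta_{11}}{\eta_{10} + \eta_{11}}, \qquad \text{and} \qquad
\pi = \frac{\eta_{01} + \eta_{11}}{\eta_{00} + \eta_{01} + \eta_{10} + \eta_{11}}, \tag{38}
\]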
which is asymptotically distributed as χ2 with 2 degrees of freedom. It is important to report
both the joint and separate unconditional test since it is always possible that the joint test passes
while failing either the independence or unconditional coverage test. Other extensions include
tests by Engle and Manganelli (2004) and Patton (2002), but for the current exercise only the
conditional coverage test is used as it is believed to be adequate and easy to replicate.
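Both likelihood ratio statistics are straightforward to compute from a 0/1 exceedance sequence. The Python sketch below follows equations (34), (37) and (38), with the convention 0·ln(0) = 0 for empty cells. The function names are hypothetical. The usage line takes 40 exceedances in 2950 observations, which appears to correspond to the in-sample 1% IFACD-GHYP entry of Table 2 (an expected count of 29.5 at the 1% level implies 2950 observations).

```python
import math

def _xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

def kupiec_lr(x, n, p):
    """Unconditional coverage statistic LR_uc of equation (34)."""
    phat = x / n
    log_null = _xlogy(n - x, 1.0 - p) + _xlogy(x, p)
    log_alt = _xlogy(n - x, 1.0 - phat) + _xlogy(x, phat)
    return -2.0 * (log_null - log_alt)

def christoffersen_lr(hits, p):
    """Conditional coverage statistic LR_cc of equation (37) from a 0/1 sequence."""
    counts = [[0, 0], [0, 0]]
    for prev, cur in zip(hits[:-1], hits[1:]):
        counts[prev][cur] += 1
    n00, n01 = counts[0]
    n10, n11 = counts[1]
    pi0 = n01 / (n00 + n01) if n00 + n01 else 0.0
    pi1 = n11 / (n10 + n11) if n10 + n11 else 0.0
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)
    log_pooled = _xlogy(n00 + n10, 1.0 - pi) + _xlogy(n01 + n11, pi)
    log_markov = (_xlogy(n00, 1.0 - pi0) + _xlogy(n01, pi0)
                  + _xlogy(n10, 1.0 - pi1) + _xlogy(n11, pi1))
    lr_ind = -2.0 * (log_pooled - log_markov)
    return kupiec_lr(sum(hits), len(hits), p) + lr_ind

lr_uc = kupiec_lr(40, 2950, 0.01)
p_uc = chi2_sf(lr_uc, 1)
```

The independence part of the statistic is what separates evenly spread exceedances from clustered ones with the same total count, as the test cases for this block check.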
[Insert Table 2 here]
The marginal contribution of the higher moment dynamics was nicely captured in the in-sample VaR exceedances test reported in Table 2. As would be expected from a model which
aims to capture time variation in the tails, in-sample the IFACD model outperformed the GO-GARCH model at the 1% quantile, with the DCC model also performing particularly well. At
the 5% quantile, both IFACD and GO-GARCH models passed the unconditional coverage test
but failed the conditional coverage test. We hypothesize that this may be an artifact of the time-invariant mixing matrix together with a constant mean in the specification. In the out-of-sample
estimation, what we observe is that both IFACD and GO-GARCH models perform equally well
at the 1% and equally badly at the 5% quantiles. The fact that the GO-GARCH performs
so well is likely due to the frequency of re-estimation (25 days) which starts to approximate a
time varying model. In fact, the frequent re-estimation allows for changes in the independent
factors to be captured on a lagged basis, which leads us to hypothesize that the DCC model,
which implicitly models dynamic dependence, fails out of sample due to the use of the Student
distribution which does not provide the required tail and asymmetry variation.
3.2.2
Superior Predictive Ability
The VaR exceedances test may be considered a rather crude method for capturing the differences
in such closely related models, as it distinguishes on the basis of integer exceedances and does
not look at some measure of average exceedance which would capture finer differences. Also, it
requires a large amount of out-of-sample exceedance data to avoid the possibility of data-snooping bias, which is where tests such as those of White (2000) and Hansen (2005), using an
appropriate loss function, come into play. For the VaR based test, we follow Gonzalez-Rivera,
Lee, and Mishra (2004) and define a statistical loss function used in quantile estimation, which
for a given α is defined as,
\[
Q_{loss} \equiv P^{-1} \sum_{t=R}^{T} \left( \alpha - \mathbf{1}\left( r_{t+1} < VaR^{\alpha}_{t+1} \right) \right) \left( r_{t+1} - VaR^{\alpha}_{t+1} \right), \tag{39}
\]
where P = T − R is the number of out-of-sample observations, T the total horizon including
estimation, and R the start of the out-of-sample forecast. This is an asymmetric loss function,
linearly penalizing exceedances more heavily by (1−α). Because of the non-differentiable nature
of the indicator function 1, we again follow Gonzalez-Rivera, Lee, and Mishra (2004) and replace
it with the approximation:
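\[
\mathbf{1}\left( r_{t+1} < VaR^{\alpha}_{t+1} \right) \approx \left[ 1 + \exp\left\{ \delta \left( r_{t+1} - VaR^{\alpha}_{t+1} \right) \right\} \right]^{-1}, \tag{40}
\]
which is found to very closely match the indicator function for values of δ equal to 25.20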
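A minimal Python sketch of the loss in (39) with the smooth indicator of (40) is given below. It averages over the forecasts supplied, takes returns and VaR forecasts in percentage terms so that δ = 25 applies, and guards the exponential against overflow. The function names and the two-observation example are hypothetical.

```python
import math

def smooth_indicator(r, var, delta=25.0):
    """Logistic approximation to 1(r < var), as in equation (40)."""
    x = delta * (r - var)
    if x > 700.0:  # exp would overflow; the indicator is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def quantile_loss(returns, var_forecasts, alpha, delta=25.0):
    """Average asymmetric quantile loss over the supplied forecasts."""
    total = 0.0
    for r, var in zip(returns, var_forecasts):
        total += (alpha - smooth_indicator(r, var, delta)) * (r - var)
    return total / len(returns)

# Hypothetical example in percent: one exceedance, one non-exceedance.
loss = quantile_loss([-3.0, 1.0], [-2.0, -2.0], alpha=0.05)
```

In this example the exceedance of size 1 contributes (1 − α)·1 = 0.95, while the non-exceedance with a margin of 3 contributes α·3 = 0.15, which shows the heavier weight placed on exceedances.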
Among the tests for choosing superior models, while controlling for data snooping bias, the
reality check of White (2000) has proven popular but the power of the test suffers from the
inclusion of a poor model, a shortcoming addressed by the Superior Predictive Ability (SPA)
Test of Hansen (2005) which was used in a related exercise comparing univariate GARCH models
by Hansen and Lunde (2005). The test considers the relative loss, between the benchmark model
loss L0,t and every other included model’s loss Lk,t,
\[ X_{k,t} \equiv L_{0,t} - L_{k,t}, \qquad k = 1, \ldots, l, \quad t = 1, \ldots, n. \tag{41} \]
The null hypothesis of the test is that the chosen benchmark model is as good as any other
model in terms of expected loss, which may be formulated as $H_0 : \mu_k \equiv E(X_{k,t}) \le 0$ for all k, since $\mu_k > 0$ is equivalent to model k being better than the benchmark in
relative loss terms. The test statistic considered to test the hypothesis is the maximum of the
standardized relative losses,
\[ T_n = \max_k \frac{n^{1/2} \bar X_k}{\hat\sigma_k}, \tag{42} \]
where,
\[ \bar X_k = n^{-1} \sum_{t=1}^{n} X_{k,t}, \tag{43} \]
and $\hat\sigma_k^2 = \mathrm{var}(n^{1/2} \bar X_k)$ is estimated by stationary bootstrap. The distribution of the test statistic
$T_n$ under the assumption of a true null hypothesis, and its power, are extensively discussed in
Hansen (2005).
20
That is, when dealing with percentages, otherwise for decimals use 2500.
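The statistic in (42) can be sketched in Python as follows. The sketch covers only the test statistic, with the variance of n^{1/2} X̄_k estimated by a stationary bootstrap with geometric block lengths. It does not construct the bootstrap null distribution or p-values, for which Hansen (2005) should be consulted. The restart probability q, the number of bootstrap draws, the function names and the synthetic loss series are all assumptions made for illustration.

```python
import math
import random

def stationary_bootstrap_indices(n, q, rng):
    """Resampled indices with geometric block lengths (mean 1/q), wrapping around."""
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < q:
            idx.append(rng.randrange(n))    # start a new block
        else:
            idx.append((idx[-1] + 1) % n)   # continue the current block
    return idx

def spa_statistic(loss_bench, loss_models, q=0.1, n_boot=300, seed=0):
    """Maximum standardized relative loss, as in equation (42)."""
    rng = random.Random(seed)
    n = len(loss_bench)
    stats = []
    for losses in loss_models:
        x = [b - m for b, m in zip(loss_bench, losses)]  # relative loss, eq. (41)
        xbar = sum(x) / n
        boot_means = []
        for _ in range(n_boot):
            idx = stationary_bootstrap_indices(n, q, rng)
            boot_means.append(sum(x[i] for i in idx) / n)
        centre = sum(boot_means) / n_boot
        var_hat = n * sum((b - centre) ** 2 for b in boot_means) / n_boot
        stats.append(math.sqrt(n) * xbar / math.sqrt(var_hat))
    return max(stats)

# Synthetic losses: the alternative model has the lower average loss.
bench = [1.0 + 0.1 * ((7 * t) % 5) for t in range(200)]
model = [0.5 + 0.1 * ((3 * t) % 4) for t in range(200)]
t_n = spa_statistic(bench, [model])
```

A positive value indicates that at least one alternative has a lower average loss than the benchmark. Swapping the roles of benchmark and model flips the sign, which mirrors the reversal check described in the text.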
[Insert Table 3 here]
The insights from the exceedances test discussed previously are reinforced here by the SPA test.
Table 3 compares the weighted portfolio loss functions (reported as %) for the IFACD and GO-GARCH models at the 1% and 5% quantiles. As in the exceedances test, the in-sample results
confirm the superior performance of the IFACD model, whilst out-of-sample both models perform
equally well. The power of the test is implied by reversing the benchmark with the model, re-running the test and observing whether the results hold21 .
[Insert Figure 4 here]
In an illustrative contrast, Figure 4 shows the iShares equally weighted portfolio index, volatility, skewness and kurtosis generated by the IFACD-NIG, IFGARCH-NIG and DCC-Student
models for the 500 day period of the out-of-sample estimation. The frequent re-estimation of
the GO-GARCH model resulted in an apparent time variation in the higher moments, where
this obviously was not present in-sample which covered a large single estimation period. While
this generated enough variation in the tails to pass the VaR tests at the 1% quantile level, the
same could not be said of the DCC Student model which generated very little in terms of excess
kurtosis with an average degree of freedom parameter across the re-estimations of 14 (see footnote 22).
4
Conclusion
Giovanni/Eduardo -> your contributions here.
Research in the importance of higher moments in portfolio and risk management has received
a growing amount of attention in recent years. Not only is it important to consider skewed and
heavy tailed distributions, but there appears to be strong evidence that there is also time variation
in the higher moments, which as such cannot be ignored when considering conditional density
dynamics. Modelling such dynamics in a multivariate setup has been almost impossible due
to the form of most multivariate distributions, few admitting a tractable representation, and for
those which do, the dimensionality of any resulting formulation has made it infeasible to estimate.
In the Independent Factor Autoregressive Conditional Density Model, we have provided a first
attempt to incorporate such time variation by adopting an independent factor framework. Some
evidence shows that a dynamic higher moment model provides for superior results versus an
equivalent static representation, and we hypothesize that results will be mixed depending on
the type of dataset, time period under consideration as well as the context. We expect that
a model incorporating some sort of dynamics or moving window for the factor independence
would provide for a better fit, but leave this as an extension for further research.
21
This is simple to confirm when there is 1 benchmark and 1 model, which is why we exclude the DCC model
in this study in order to contrast with the directly competing model.
22
The excess kurtosis of the Student distribution is 6/(ν − 4) for ν > 4.
Figure 1: ACD Model Simulation: True vs Estimated Root Mean Squared Error
Panels plot the RMSE (vertical axis) against T (horizontal axis, 2000 to 10000) for: (a) µ, (b) ar1, (c) ma1, (d) ω, (e) α1, (f) β1, (g) γ0, (h) γ1, (i) γ2, (j) δ1, (k) θ0, (l) θ1, (m) θ2, (n) ψ1.
Table 1: MSCI iShares 17 (19/03/1996 - 05/12/2007) Independent Factor ACD-GHYP Model Fit.
ω
α1
β1
λ
γ0
Y1
0 .0027
[0.073]
0 .053
[0.000]
0 .946
[0.000]
4.540
[0.000]
0.032
[0.258]
γ1
γ2
ρ˘1
θ0
0.251
[0.271]
Y2
0 .0022
[0.067]
0 .062
[0.000]
0 .937
[0.000]
-1.136
[0.484]
-0.002
[0.163]
-0.011
[0.223]
0.002
[0.198]
0.999
[0.000]
1.362
[0.000]
θ1
Y3
0 .003
[0.113]
0 .052
[0.000]
0 .947
[0.000]
-3.967
[0.000]
-0.470
[0.007]
1.617
[0.000]
0.260
[0.022]
0.856
[0.000]
-0.164
[0.000]
-0.718
[0.000]
Y4
0 .0071
[0.153]
0 .048
[0.013]
0 .945
[0.000]
-3.765
[0.000]
-0.247
[0.776]
2.000
[0.049]
0.390
[0.244]
0.355
[0.237]
-1.053
[0.188]
1.534
[0.067]
Y5
0 .0042
[0.319]
0 .106
[0.042]
0 .893
[0.000]
-3.038
[0.000]
-0.057
[0.000]
-0.172
[0.069]
0.017
[0.012]
0.964
[0.000]
-4.475
[0.000]
-1.567
[0.097]
Y6
0 .0126
[0.025]
0 .062
[0.000]
0 .932
[0.000]
-1.521
[0.058]
-0.436
[0.000]
0.583
[0.000]
0.203
[0.004]
0.314
[0.027]
-0.296
[0.305]
-0.339
[0.211]
0.985
[0.000]
0.893
[0.000]
0.063
[0.050]
0.879
[0.000]
θ2
ζ˘1
Y7
0 .0029
[0.057]
0 .052
[0.000]
0 .946
[0.000]
-5.422
[0.000]
-0.286
[0.000]
0.057
[0.656]
0.326
[0.000]
0.955
[0.000]
-3.379
[0.000]
0.263
[0.703]
-0.243
[0.529]
Y8
0 .0026
[0.031]
0 .053
[0.000]
0 .946
[0.000]
-5.134
[0.000]
-0.570
[0.122]
0.835
[0.038]
0.260
[0.291]
-8.022
[0.000]
1.054
[0.111]
2.000
[0.002]
Y9
0 .0041
[0.099]
0 .046
[0.000]
0 .951
[0.000]
-5.510
[0.000]
-0.317
[0.770]
2.000
[0.017]
-0.728
[0.385]
0.488
[0.024]
Y1 0
0 .0219
[0.023]
0 .065
[0.000]
0 .914
[0.000]
-3.451
[0.000]
0.002
[0.823]
-0.064
[0.882]
Y1 1
0 .001
[0.246]
0 .015
[0.000]
0 .984
[0.000]
-4.971
[0.000]
-1.741
[0.014]
0.961
[0.112]
0.766
[0.002]
0.999
[0.000]
-5.276
[0.745]
-2.000
[0.964]
-0.932
[0.000]
-2.000
[0.197]
Y1 2
0 .0012
[0.119]
0 .030
[0.000]
0 .968
[0.000]
-4.181
[0.001]
-0.187
[0.145]
-0.251
[0.225]
0.182
[0.131]
0.893
[0.000]
-0.078
[0.287]
-1.400
[0.110]
0.293
[0.877]
0.886
[0.000]
0.927
[0.000]
Y1 3
0 .0015
[0.166]
0 .042
[0.000]
0 .957
[0.000]
-4.629
[0.000]
0.193
[0.252]
Y1 4
0 .001
[0.251]
0 .024
[0.001]
0 .975
[0.000]
2.358
[0.000]
0.055
[0.029]
-3.768
[0.015]
2.000
[0.003]
0.728
[0.042]
-0.985
[0.006]
-2.000
[0.082]
Y1 5
0 .0012
[0.359]
0 .020
[0.058]
0 .979
[0.000]
-2.989
[0.000]
-0.312
[0.060]
0.586
[0.001]
Y1 6
0 .001
[0.225]
0 .024
[0.004]
0 .975
[0.000]
-3.612
[0.000]
-0.208
[0.051]
-4.153
[0.000]
-1.039
[0.300]
2.000
[0.097]
0.722
[0.000]
0.048
[0.930]
-0.235
[0.294]
0.439
[0.116]
-0.328
[0.010]
0.812
[0.000]
Y1 7
0 .0044
[0.028]
0 .066
[0.000]
0 .933
[0.000]
-4.836
[0.000]
0.809
[0.061]
1.359
[0.035]
-1.486
[0.000]
0.368
[0.002]
-2.772
[0.000]
1.903
[0.098]
-2.000
[0.066]
Factor   HQIC   Log.Lik   Nyblom   Nyblom* 1%   Log.Lik (λ = -0.5)   LR Stat    p-value
Y1       2.55   -3746     1.61     [2.12]       -3746                -0.69      0.41
Y2       2.11   -3094     1.74     [2.82]       -3098                -7.39      0.01
Y3       2.53   -3703     3.06     [3.27]       -3709                -12.20     0.00
Y4       2.60   -3807     2.28     [3.27]       -3955                -297.21    0.00
Y5       1.87   -2732     4.40     [3.27]       -2769                -74.34     0.00
Y6       2.68   -3932     2.90     [3.27]       -3932                -0.14      0.70
Y7       2.51   -3677     2.12     [3.27]       -3680                -5.90      0.02
Y8       2.56   -3755     4.01     [3.05]       -3757                -3.61      0.06
Y9       2.64   -3877     1.57     [2.59]       -3879                -3.65      0.06
Y10      2.67   -3913     2.15     [3.05]       -4036                -246.93    0.00
Y11      2.69   -3951     2.19     [3.05]       -4117                -332.13    0.00
Y12      2.61   -3821     2.41     [3.27]       -3824                -6.43      0.01
Y13      2.60   -3816     1.70     [2.59]       -3818                -3.66      0.06
Y14      2.60   -3821     2.48     [2.59]       -3825                -8.28      0.00
Y15      2.64   -3872     1.70     [3.27]       -3877                -8.81      0.00
Y16      2.60   -3817     2.20     [2.59]       -3819                -3.84      0.05
Y17      2.58   -3783     1.74     [3.27]       -3792                -18.27     0.00
Notes to table 1: The table reports the coefficients, p-values (based on robust standard errors) and fit statistics for the factors in the Independent Factor ACD-GHYP model in-sample estimation. For the
variance, GARCH(1,1) dynamics were chosen, while for the skew and shape parameters the Hannan-Quinn Criterion was used to choose from a combination of 16 models consisting of skew-shape model
combinations nested in the Quadratic[1,1,1] model including the non time varying combinations. Of the 17 factors, 4 had no time varying skew parameter, and 3 no time varying shape parameter, with one
within the intersection set having neither. GARCH persistence appeared to be quite high across all factors, averaging 0.99, indicating perhaps that there are structural breaks in the dataset (which is not unusual
considering the period included the 1998 crash). The λ parameter ranged from 4.5 to -5.5 with a mean around -3. Of the 165 total parameters characterizing the univariate ACD equation across the 17 factors,
only 9 fell outside the 1% critical value of the individual Nyblom-Hansen statistic (of which 7 were in the variance dynamics), and of the 17 models, 2 had a joint statistic which also fell outside the
1% critical value. The table also includes the likelihood ratio test of the full GH fitted model against the restricted NIG model with λ = −0.5 for the factors. The p-value indicates that the restricted model can
be rejected for 11 out of the 17 factors at the 5% significance level.
Table 2: MSCI iShares 17 (19/03/1996 - 30/11/2009) Independent Factor ACD Model Portfolio: Portfolio VaR Exceedances Tests.
In-Sample
19/03/1996 - 05/12/2007
Q(α = 1%)
Exceedances
Expected
Unconditional Coverage Test (Kupiec)
Statistic
Critical Value
Conditional Coverage Test (Christoffersen)
Statistic
IFACD-GHYP
40
3.397
[0.065]
3.716
[0.156]
20
Q(α = 5%)
Exceedances
IFACD-GHYP
134
Critical Value
Conditional Coverage Test (Christoffersen)
Statistic
Critical Value
7.965
[0.005]
0.404
[0.525]
12.219
[0.002]
1.16
[0.56]
5.991
DCC-Student
13
1.538
[0.215]
1.538
[0.215]
8.973
[0.003]
1.799
[0.407]
1.799
[0.407]
9.669
[0.008]
GO-GARCH-GHYP
45
DCC-Student
44
13.755
[0.000]
12.518
[0.000]
15.255
[0.000]
15.99
[0.000]
5.991
GO-GARCH-GHYP
136
DCC-Student
172
IFACD-GHYP
43
25
0.968
[0.325]
4.076
[0.043]
3.841
11.867
[0.003]
GO-GARCH-GHYP
8
3.841
147.5
1.34
[0.247]
IFACD-GHYP
8
5
3.841
5.991
Unconditional Coverage Test (Kupiec)
Statistic
DCC-Student
33
29.5
Critical Value
Expected
GO-GARCH-GHYP
46
Out-of-Sample
06/12/2007 - 30/11/2009
11.331
[0.001]
3.841
10.901
[0.004]
5.674
[0.059]
12.428
[0.002]
5.991
Notes to table 2: The table reports the in-sample and out-of-sample Value at Risk exceedances and related tests, at the 1% and 5% quantile levels (α), given the weighted portfolio density for the IFACD-GHYP, GO-GARCH-GHYP and DCC-GARCH-Student models. The null hypothesis is that the model generates the correct number of exceedances and that those exceedances are independent, based on the likelihood ratio test devised by Christoffersen (1998). In-sample, the GO-GARCH model fails the unconditional exceedance test at the 1% quantile, while both the GO-GARCH and IFACD models fail the stronger test of independence at the 5% quantile. The DCC-GARCH-Student model, on the other hand, passes the test at the 1% but not at the 5% quantile. Out-of-sample, both the IFACD and GO-GARCH models pass the test at the 1% quantile while the DCC model fails. All models fail out-of-sample at the 5% quantile.
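The two coverage tests referred to in the notes can be sketched as follows. This is a minimal illustration of the standard Kupiec (1995) and Christoffersen (1998) likelihood ratio statistics, not the code used to produce Table 2. The function names are hypothetical, and the sample size of 2950 is an assumption inferred from the expected count of 29.5 exceedances at the 1% level.

```python
import math

def _xlogy(x, y):
    """Return x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_lr(n_exceed, n_obs, alpha):
    """Unconditional coverage LR statistic and its chi-square(1) p-value."""
    pi_hat = n_exceed / n_obs
    loglik_null = _xlogy(n_obs - n_exceed, 1 - alpha) + _xlogy(n_exceed, alpha)
    loglik_alt = _xlogy(n_obs - n_exceed, 1 - pi_hat) + _xlogy(n_exceed, pi_hat)
    stat = max(-2.0 * (loglik_null - loglik_alt), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) survival function
    return stat, p_value

def christoffersen_cc(hits, alpha):
    """Conditional coverage LR statistic (coverage and independence jointly)
    from a 0/1 exceedance sequence, with its chi-square(2) p-value."""
    n = [[0, 0], [0, 0]]  # n[i][j]: transitions from state i to state j
    for prev, curr in zip(hits[:-1], hits[1:]):
        n[prev][curr] += 1
    n0 = n[0][0] + n[1][0]  # transitions into "no exceedance"
    n1 = n[0][1] + n[1][1]  # transitions into "exceedance"
    pi01 = n[0][1] / max(n[0][0] + n[0][1], 1)
    pi11 = n[1][1] / max(n[1][0] + n[1][1], 1)
    loglik_null = _xlogy(n0, 1 - alpha) + _xlogy(n1, alpha)
    loglik_markov = (_xlogy(n[0][0], 1 - pi01) + _xlogy(n[0][1], pi01)
                     + _xlogy(n[1][0], 1 - pi11) + _xlogy(n[1][1], pi11))
    stat = max(-2.0 * (loglik_null - loglik_markov), 0.0)
    p_value = math.exp(-stat / 2.0)  # chi-square(2) survival function
    return stat, p_value

# In-sample 1% VaR with 40 exceedances (the IFACD-GHYP count in Table 2).
# ASSUMPTION: 2950 observations, inferred from the expected count of 29.5.
stat_uc, p_uc = kupiec_lr(40, 2950, 0.01)
print(f"Kupiec LR = {stat_uc:.3f}, p-value = {p_uc:.3f}")

# Hypothetical clustered exceedance sequence, for illustration only.
stat_cc, p_cc = christoffersen_cc([0] * 95 + [1] * 5, 0.05)
print(f"Christoffersen LR = {stat_cc:.3f}, p-value = {p_cc:.3g}")
```

Under the assumed sample size, the Kupiec statistic for 40 exceedances should land close to the value reported for the IFACD-GHYP model at the 1% level. The 5% critical values of the chi-square distribution with 1 and 2 degrees of freedom are the 3.841 and 5.991 shown in the table.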
Table 3: MSCI iShares 17 (19/03/1996 - 30/11/2009) Independent Factor ACD Model Portfolio:
Portfolio VaR SPA Test.
SPA Test
In-Sample
19/03/1996 - 05/12/2007
IFACD-GHYP
GO-GARCH-GHYP
(benchmark)
0.0373
0.0378
Qloss (α = 1%)
Loss
p-values:
Lower
Consistent
Upper
Power
Out-of-Sample
06/12/2007 - 30/11/2009
IFACD-GHYP
GO-GARCH-GHYP
(benchmark)
0.0728
0.0723
0.5133
0.5133
0.9795
YES
IFACD-GHYP
(benchmark)
0.1240
Qloss (α = 5%)
Loss
p-values:
Lower
Consistent
Upper
Power
0.1831
0.1831
0.1831
NO
GO-GARCH-GHYP
IFACD-GHYP
(benchmark)
0.2543
0.1242
GO-GARCH-GHYP
0.2547
0.5088
0.9441
0.9441
YES
0.5127
0.7643
0.7643
NO
Notes to table 3: The table reports the in-sample and out-of-sample test of Superior Predictive Ability of Hansen (2005), based on the statistical loss function (reported as %) used in quantile estimation and defined in Gonzalez-Rivera, Lee, and Mishra (2004). The power of the test is assessed by reversing the positions of the benchmark and the model and observing whether the results hold.
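As an illustration of the loss being compared, the following sketch computes the average check (tick) loss of a sequence of VaR forecasts. We assume here that the loss in Gonzalez-Rivera, Lee, and Mishra (2004) takes this standard quantile-regression form; the function name and the sample numbers are hypothetical, and the SPA bootstrap itself is not shown.

```python
def quantile_loss(returns, var_forecasts, alpha):
    """Average check loss of VaR forecasts at quantile level alpha.

    Each term is (alpha - 1{r_t < VaR_t}) * (r_t - VaR_t), which is
    non-negative and penalizes exceedances with weight (1 - alpha).
    """
    if len(returns) != len(var_forecasts):
        raise ValueError("returns and var_forecasts must have equal length")
    total = 0.0
    for r, q in zip(returns, var_forecasts):
        hit = 1.0 if r < q else 0.0
        total += (alpha - hit) * (r - q)
    return total / len(returns)

# Hypothetical returns and a constant 5% VaR forecast of -2%.
sample_returns = [0.01, -0.03, 0.00]
sample_var = [-0.02, -0.02, -0.02]
print(quantile_loss(sample_returns, sample_var, 0.05))
```

A lower average loss indicates a better quantile forecast, which is the sense in which the models are ranked against the benchmark.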
[Figure 2 panels: (a) variance surface for [EWA]; (b) variance surface for [EWC]; (c) covariance surface for [EWA, EWC]. Horizontal axes: z12,t−1 and z17,t−1. Vertical axes: Variance and Covariance (×10^−4).]
Figure 2: Covariance News Impact Surface
[Figure 3 panels: (a) third moment surface for [EWO]; (b) third moment surface for [EWN]; (c) third co-moment surface for [EWL, EWL, EWQ]; (d) third co-moment surface for [EWN, EWN, EWO]. Horizontal axes: lagged standardized factor innovations (z8,t−1, z10,t−1, z2,t−1, z3,t−1). Vertical axes: Third Moment and Third Co-Moment.]
Figure 3: Third Co-Moment News Impact Surface
[Figure 4 panels: Index (iShares, equally weighted), Volatility, Skewness and Ex.Kurtosis, plotted over 2008 to 2009. Legend entries: IFACD, GO-GARCH and DCC (IFACD and GO-GARCH only in the Skewness panel).]
Figure 4: Out-of-Sample Models’ Conditional Portfolio Moments.
A
Standardized Generalized Hyperbolic Density
In order to model zero-mean, unit-variance processes, the distribution, which must possess the scaling property, needs to be properly standardized. In distributions where the expected moments are functions of all the parameters, it is not immediately obvious how to perform such a transformation. In the case of the GHYP distribution, location and scale invariant parametrizations exist, and the variance can be expressed in terms of one of those parametrizations, namely the (ζ, ρ). The task of standardizing and estimating the density can therefore be broken down into estimating those 2 parameters, which represent a combination of shape and skewness, followed by a series of transformation steps to demean, scale and then translate the parameters into the (α, β, δ, µ) parametrization, for which standard formulae exist for the likelihood function. The (ξ, χ) parametrization, which is a simple transformation of the (ζ, ρ), could also be used in the first step and then transformed into the latter before proceeding further. The only difference is the kind of 'immediate' inference one can make from the different parametrizations, each providing a different direct insight into the kind of dynamics produced and their place in the overall GHYP family, particularly with regard to the limit cases.
When estimating the (ζ, ρ) parameters, it is important to place constraints on their bounds in order to achieve good convergence. As shown in 20, values for ρ are bounded in (−1, 1), while for ζ, a reasonable range would be (0.1, 20). Having estimated the parameters, the next steps involve a transformation into the (α, β, δ, µ) parametrization, while at the same time including the necessary recursive substitution of parameters in order to standardize the resulting distribution.
Proof 1 The Standardized Generalized Hyperbolic Distribution. Let εt be a r.v. with mean 0 and variance σt², distributed as GHYP(ζt, ρt), and let zt be a scaled version of the r.v. εt with variance 1, also distributed as GHYP(ζt, ρt).23 The density ft(.) of zt can be expressed as
$$
f_t\!\left(\frac{\varepsilon_t}{\sigma_t};\zeta_t,\rho_t\right) = \frac{1}{\sigma_t} f_t\left(z_t;\zeta_t,\rho_t\right) = \frac{1}{\sigma_t} f_t\left(z_t;\tilde\alpha_t,\tilde\beta_t,\tilde\delta_t,\tilde\mu_t\right), \tag{44}
$$
where we make use of the (α, β, δ, µ) parametrization since we can only naturally express the
density in that parametrization. The steps to transforming from the (ζ, ρ) to the (α, β, δ, µ)
parametrization, while at the same time standardizing for zero mean and unit variance are given
henceforth.
Let
$$
\zeta = \delta\sqrt{\alpha^2-\beta^2}, \tag{45}
$$
$$
\rho = \frac{\beta}{\alpha}, \tag{46}
$$
which after some substitution may also be written in terms of α and β as
$$
\alpha = \frac{\zeta}{\delta\sqrt{1-\rho^2}}, \tag{47}
$$
$$
\beta = \alpha\rho. \tag{48}
$$
23 The parameters ζt and ρt do not change, as a result of being location and scale invariant.
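For standardization we require that
$$
\begin{aligned}
E(X) &= \mu + \frac{\beta\delta}{\sqrt{\alpha^2-\beta^2}}\,\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)} = \mu + \frac{\beta\delta^2}{\zeta}\,\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)} = 0 \\
\therefore\ \mu &= -\frac{\beta\delta^2}{\zeta}\,\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)},
\end{aligned}
\tag{49}
$$
$$
\begin{aligned}
Var(X) &= \delta^2\left(\frac{K_{\lambda+1}(\zeta)}{\zeta K_\lambda(\zeta)} + \frac{\beta^2}{\alpha^2-\beta^2}\left(\frac{K_{\lambda+2}(\zeta)}{K_\lambda(\zeta)} - \left(\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)}\right)^2\right)\right) = 1 \\
\therefore\ \delta &= \left(\frac{K_{\lambda+1}(\zeta)}{\zeta K_\lambda(\zeta)} + \frac{\beta^2}{\alpha^2-\beta^2}\left(\frac{K_{\lambda+2}(\zeta)}{K_\lambda(\zeta)} - \left(\frac{K_{\lambda+1}(\zeta)}{K_\lambda(\zeta)}\right)^2\right)\right)^{-0.5}.
\end{aligned}
\tag{50}
$$
Since we can express β²/(α² − β²) as
$$
\frac{\beta^2}{\alpha^2-\beta^2} = \frac{\alpha^2\rho^2}{\alpha^2-\alpha^2\rho^2} = \frac{\alpha^2\rho^2}{\alpha^2\left(1-\rho^2\right)} = \frac{\rho^2}{1-\rho^2}, \tag{51}
$$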
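then we can re-write the formula for δ in terms of the estimated parameters ζ̂ and ρ̂ as
$$
\delta = \left(\frac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)} + \frac{\hat\rho^2}{1-\hat\rho^2}\left(\frac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \left(\frac{K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)}\right)^2\right)\right)^{-0.5}. \tag{52}
$$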
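Transforming into the (α̃, β̃, δ̃, µ̃) parametrization proceeds by first substituting 52 into 47 and simplifying,
$$
\begin{aligned}
\tilde\alpha &= \frac{\hat\zeta\left(\dfrac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)} + \dfrac{\hat\rho^2}{1-\hat\rho^2}\left(\dfrac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \dfrac{K_{\lambda+1}(\hat\zeta)^2}{K_\lambda(\hat\zeta)^2}\right)\right)^{0.5}}{\sqrt{1-\hat\rho^2}} \\
&= \frac{\left(\dfrac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)} + \dfrac{\hat\zeta^2\hat\rho^2}{1-\hat\rho^2}\left(\dfrac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \dfrac{K_{\lambda+1}(\hat\zeta)^2}{K_\lambda(\hat\zeta)^2}\right)\right)^{0.5}}{\sqrt{1-\hat\rho^2}} \\
&= \left(\frac{\hat\zeta K_{\lambda+1}(\hat\zeta)/K_\lambda(\hat\zeta)}{1-\hat\rho^2} + \frac{\hat\zeta^2\hat\rho^2\left(\dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda+1}(\hat\zeta)}\dfrac{K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \dfrac{K_{\lambda+1}(\hat\zeta)^2}{K_\lambda(\hat\zeta)^2}\right)}{\left(1-\hat\rho^2\right)^2}\right)^{0.5} \\
&= \left(\frac{\hat\zeta K_{\lambda+1}(\hat\zeta)/K_\lambda(\hat\zeta)}{1-\hat\rho^2}\left(1 + \frac{\hat\zeta\hat\rho^2\left(\dfrac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda+1}(\hat\zeta)} - \dfrac{K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)}\right)}{1-\hat\rho^2}\right)\right)^{0.5}.
\end{aligned}
\tag{53}
$$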
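Finally, the rest of the parameters are derived recursively from α̃ and the previous results,
$$
\tilde\beta = \tilde\alpha\hat\rho, \tag{54}
$$
$$
\tilde\delta = \frac{\hat\zeta}{\tilde\alpha\sqrt{1-\hat\rho^2}}, \tag{55}
$$
$$
\tilde\mu = \frac{-\tilde\beta\tilde\delta^2 K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)}. \tag{56}
$$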
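The transformation steps in equations 45 to 56 can be sketched numerically as follows. This is an illustrative sketch rather than the estimation code used in the paper: the function names are hypothetical, and, to keep the example dependency-free, the Bessel function is evaluated by a simple trapezoid rule on its integral representation (an assumption of this sketch; a library routine would normally be used instead).

```python
import math

def bessel_k(nu, x, n_steps=4000):
    """Modified Bessel function K_nu(x) for real x > 0, computed from
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt by the trapezoid rule."""
    if x <= 0:
        raise ValueError("x must be positive")
    upper = math.acosh(1.0 + 750.0 / x)  # exp(-x cosh t) underflows beyond this
    h = upper / n_steps
    total = 0.5 * math.exp(-x)           # half weight at t = 0
    for k in range(1, n_steps + 1):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def standardize_gh(zeta, rho, lam):
    """Map (zeta, rho) to the (alpha, beta, delta, mu) parameters of a
    zero-mean, unit-variance GH density for a given lambda."""
    k0 = bessel_k(lam, zeta)
    k1 = bessel_k(lam + 1.0, zeta)
    k2 = bessel_k(lam + 2.0, zeta)
    one_m_rho2 = 1.0 - rho * rho
    # Equation 52: the delta that yields unit variance.
    delta = (k1 / (zeta * k0)
             + rho * rho / one_m_rho2 * (k2 / k0 - (k1 / k0) ** 2)) ** -0.5
    alpha = zeta / (delta * math.sqrt(one_m_rho2))  # equation 47
    beta = alpha * rho                              # equation 54
    mu = -beta * delta ** 2 * k1 / (zeta * k0)      # equation 56
    return alpha, beta, delta, mu

def gh_mean_var(alpha, beta, delta, mu, lam):
    """Mean and variance of GH(lam, alpha, beta, delta, mu), using the
    moment expressions in equations 49 and 50."""
    gamma = math.sqrt(alpha ** 2 - beta ** 2)
    zeta = delta * gamma
    k0 = bessel_k(lam, zeta)
    k1 = bessel_k(lam + 1.0, zeta)
    k2 = bessel_k(lam + 2.0, zeta)
    mean = mu + beta * delta / gamma * k1 / k0
    var = delta ** 2 * (k1 / (zeta * k0)
                        + beta ** 2 / gamma ** 2 * (k2 / k0 - (k1 / k0) ** 2))
    return mean, var

# Hypothetical parameter values, chosen inside the bounds suggested above.
a, b, d, m = standardize_gh(zeta=2.0, rho=-0.3, lam=-0.5)
print("alpha, beta, delta, mu:", a, b, d, m)
print("mean, variance:", gh_mean_var(a, b, d, m, -0.5))
```

Since ζ and ρ are location and scale invariant, the returned parameters still satisfy δ̃√(α̃² − β̃²) = ζ̂ and β̃/α̃ = ρ̂, which gives a simple sanity check on the mapping.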
For the use of the (ξ, χ) parametrization in estimation, the additional preliminary steps of converting to the (ζ, ρ) are
$$
\zeta = \frac{1}{\hat\xi^2} - 1, \tag{57}
$$
$$
\rho = \frac{\hat\chi}{\hat\xi}. \tag{58}
$$
B
The Generalized Hyperbolic Characteristic Function
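The moment generating function (MGF) of the Generalized Hyperbolic (GH) Distribution is
$$
\begin{aligned}
M_{GH(\lambda,\alpha,\beta,\delta,\mu)}(u) &= e^{\mu u}\, M_{GIG\left(\lambda,\,\delta\sqrt{\alpha^2-\beta^2}\right)}\!\left(\frac{u^2}{2}+\beta u\right) \\
&= e^{\mu u}\left(\frac{\alpha^2-\beta^2}{\alpha^2-(\beta+u)^2}\right)^{\lambda/2}\frac{K_\lambda\!\left(\delta\sqrt{\alpha^2-(\beta+u)^2}\right)}{K_\lambda\!\left(\delta\sqrt{\alpha^2-\beta^2}\right)},
\end{aligned}
\tag{59}
$$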
where M_GIG represents the moment generating function of the Generalized Inverse Gaussian, which forms the mixing distribution in this variance-mean mixture subclass. Powers of the MGF, M_GH(u)^p, only have the representation in 59 for p = 1, which means that GH distributions are not closed under convolution, with the exception of the Normal Inverse Gaussian (NIG), and only in the case when the shape and skew parameters are the same. The MGF of the NIG is
$$
M_{NIG(\alpha,\beta,\delta,\mu)}(u) = e^{\mu u}\,\frac{e^{\delta\sqrt{\alpha^2-\beta^2}}}{e^{\delta\sqrt{\alpha^2-(\beta+u)^2}}}. \tag{60}
$$
Raising the MGF to the power p is equivalent in this case to multiplying δ and µ by p, so that
$$
NIG(\alpha,\beta,\delta_1,\mu_1) * \cdots * NIG(\alpha,\beta,\delta_n,\mu_n) = NIG(\alpha,\beta,\delta_1+\cdots+\delta_n,\ \mu_1+\cdots+\mu_n). \tag{61}
$$
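The closure property in 61 can be checked numerically: the product of the transforms of two NIG variables sharing α and β should equal the transform of a single NIG variable with summed δ and µ. The sketch below uses the characteristic function (the MGF evaluated at iu); the function name and parameter values are hypothetical.

```python
import cmath
import math

def nig_cf(u, alpha, beta, delta, mu):
    """Characteristic function of NIG(alpha, beta, delta, mu) at real u."""
    gamma_0 = math.sqrt(alpha ** 2 - beta ** 2)
    gamma_u = cmath.sqrt(alpha ** 2 - (beta + 1j * u) ** 2)
    return cmath.exp(1j * mu * u + delta * (gamma_0 - gamma_u))

# Two hypothetical NIG variables with common alpha and beta.
alpha, beta = 2.0, -0.5
delta_1, mu_1 = 1.0, 0.05
delta_2, mu_2 = 0.8, -0.02

for u in (0.0, 0.5, 2.0, 10.0):
    product = nig_cf(u, alpha, beta, delta_1, mu_1) * nig_cf(u, alpha, beta, delta_2, mu_2)
    combined = nig_cf(u, alpha, beta, delta_1 + delta_2, mu_1 + mu_2)
    print(u, abs(product - combined))
```

The printed differences should be at the level of floating-point rounding. Repeating the exercise with different α or β in the two factors breaks the equality, which is why the general case requires numerical inversion.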
In all other cases, when the distribution is not closed under convolution, numerical methods are required, such as the inversion of the characteristic function by Fast Fourier Transform (FFT). Because the MGF is a holomorphic function for complex z, with |z| < α − β, we can obtain the characteristic function of the GH distribution using the following representation,
$$
\phi_{GH}(u) = M_{GH}(iu), \tag{62}
$$
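so that the characteristic function may be written as
$$
\phi_{GH(\lambda,\alpha,\beta,\delta,\mu)}(u) = e^{\mu i u}\left(\frac{\alpha^2-\beta^2}{\alpha^2-(\beta+iu)^2}\right)^{\lambda/2}\frac{K_\lambda\!\left(\delta\sqrt{\alpha^2-(\beta+iu)^2}\right)}{K_\lambda\!\left(\delta\sqrt{\alpha^2-\beta^2}\right)}, \tag{63}
$$
and for the NIG this is simplified to
$$
\phi_{NIG(\alpha,\beta,\delta,\mu)}(u) = e^{\mu i u}\,\frac{e^{\delta\sqrt{\alpha^2-\beta^2}}}{e^{\delta\sqrt{\alpha^2-(\beta+iu)^2}}}. \tag{64}
$$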
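A minimal sketch of evaluating 63 is given below. The Bessel function with complex argument is computed here by a trapezoid rule on its integral representation, which is valid for arguments with positive real part; this quadrature is an assumption of the sketch and stands in for a dedicated library routine. The function names and parameter values are hypothetical.

```python
import cmath
import math

def bessel_k_complex(nu, z, n_steps=4000):
    """K_nu(z) for complex z with Re(z) > 0, computed from
    K_nu(z) = int_0^inf exp(-z cosh t) cosh(nu t) dt by the trapezoid rule."""
    z = complex(z)
    if z.real <= 0:
        raise ValueError("Re(z) must be positive")
    upper = math.acosh(1.0 + 750.0 / z.real)  # integrand underflows beyond this
    h = upper / n_steps
    total = 0.5 * cmath.exp(-z)               # half weight at t = 0
    for k in range(1, n_steps + 1):
        t = k * h
        total += cmath.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def gh_cf(u, lam, alpha, beta, delta, mu):
    """Characteristic function of GH(lam, alpha, beta, delta, mu), equation 63."""
    g0_sq = alpha ** 2 - beta ** 2
    gu_sq = alpha ** 2 - (beta + 1j * u) ** 2
    ratio = (g0_sq / gu_sq) ** (lam / 2.0)    # principal branch
    k_num = bessel_k_complex(lam, delta * cmath.sqrt(gu_sq))
    k_den = bessel_k_complex(lam, delta * math.sqrt(g0_sq))
    return cmath.exp(1j * mu * u) * ratio * k_num / k_den

# Hypothetical parameters; with lam = -0.5 the result should agree with 64.
print(gh_cf(0.7, -0.5, 2.0, -0.5, 1.2, 0.1))
```

Setting λ = −0.5 and comparing against the closed form in 64 provides a direct check of the Bessel evaluation, since the two expressions coincide in that case.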
To obtain the weighted portfolio density in the case of the IFACD model, the characteristic function of the weighted sum must be inverted. For the NIG density, the required characteristic function was already used in Chen, Hardle, and Spokoiny (2007) and is given below,
$$
\phi_{port}(u) = \exp\left(iu\sum_{j=1}^{d}\bar\mu_j + \sum_{j=1}^{d}\bar\delta_j\left(\sqrt{\bar\alpha_j^2-\bar\beta_j^2} - \sqrt{\bar\alpha_j^2-\left(\bar\beta_j+iu\right)^2}\right)\right), \tag{65}
$$
where ᾱj, β̄j, δ̄j and µ̄j represent the parameters scaled as described in the main text. In the case of the GH characteristic function, this is a little more complicated, as it involves the evaluation of the modified Bessel function of the third kind with complex arguments.24 Taking logs and summing,
$$
\phi_{port}(u) = \exp\left(iu\sum_{j=1}^{d}\bar\mu_j + \sum_{j=1}^{d}\left[\frac{\lambda_j}{2}\log\left(\bar\alpha_j^2-\bar\beta_j^2\right) - \frac{\lambda_j}{2}\log\left(\bar\alpha_j^2-\left(\bar\beta_j+iu\right)^2\right) + \log K_{\lambda_j}\!\left(\bar\delta_j\sqrt{\bar\alpha_j^2-\left(\bar\beta_j+iu\right)^2}\right) - \log K_{\lambda_j}\!\left(\bar\delta_j\sqrt{\bar\alpha_j^2-\bar\beta_j^2}\right)\right]\right), \tag{66}
$$
which is more than 30 times slower to evaluate than the equivalent NIG function because of the Bessel function evaluations.
24 Routines for this exist, for example, on netlib; see http://www.netlib.org/amos/zbesk.f
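The inversion step can be sketched for the NIG case in 65. The example below recovers the portfolio density by direct trapezoid quadrature of the inversion integral, used here in place of the FFT for brevity (a deliberate substitution). The function names, the truncation point and the parameter values are assumptions of this sketch, and the parameters are taken to be already scaled by the portfolio weights.

```python
import cmath
import math

def nig_portfolio_cf(u, params):
    """Equation 65. params is a list of scaled (alpha, beta, delta, mu) tuples."""
    log_phi = 0j
    for alpha, beta, delta, mu in params:
        gamma_0 = math.sqrt(alpha ** 2 - beta ** 2)
        gamma_u = cmath.sqrt(alpha ** 2 - (beta + 1j * u) ** 2)
        log_phi += 1j * u * mu + delta * (gamma_0 - gamma_u)
    return cmath.exp(log_phi)

def invert_cf(xs, cf, u_max=40.0, n_steps=2000):
    """Density at each x via f(x) = (1/pi) int_0^inf Re[exp(-iux) cf(u)] du,
    truncated at u_max and approximated by the trapezoid rule."""
    h = u_max / n_steps
    us = [k * h for k in range(n_steps + 1)]
    phis = [cf(u) for u in us]                 # evaluate the CF once per node
    weights = [0.5] + [1.0] * (n_steps - 1) + [0.5]
    densities = []
    for x in xs:
        total = sum(w * (cmath.exp(-1j * u * x) * p).real
                    for w, u, p in zip(weights, us, phis))
        densities.append(total * h / math.pi)
    return densities

# Two hypothetical scaled NIG factors.
factors = [(2.0, -0.5, 1.0, 0.05), (3.0, 0.4, 0.8, -0.02)]
grid = [-8.0 + 0.1 * k for k in range(161)]
density = invert_cf(grid, lambda u: nig_portfolio_cf(u, factors))
print("total mass on grid:", sum(density) * 0.1)
```

A VaR estimate at level α then follows by accumulating the density over the grid and locating the point where the cumulative mass reaches α. For the GH case in 66, only the characteristic function changes, at the cost of the complex Bessel evaluations noted above.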
References
Aas, K., and I. Haff (2006): “The generalized hyperbolic skew Student’s t-distribution,”
Journal of Financial Econometrics, 4(2), 275–309.
Alexander, C. (2001): “Orthogonal garch,” Mastering risk, 2, 21–38.
Barndorff-Nielsen, O. (1977): “Exponentially decreasing distributions for the logarithm
of particle size,” Proceedings of the Royal Society of London. Series A, Mathematical and
Physical Sciences, 353(1674), 401–419.
Barndorff-Nielsen, O., and P. Bläesild (1981): Hyperbolic distributions and ramifications: Contributions to theory and application. Matematisk Institut, Aarhus Universitet.
Barndorff-Nielsen, O., J. Kent, and M. Sörensen (1982): “Normal variance-mean mixtures and z distributions,” International Statistical Review/Revue Internationale de Statistique, pp. 145–159.
Bläesild, P. (1981): “The two-dimensional hyperbolic distribution and related distributions,
with an application to Johannsen’s bean data,” Biometrika, 68(1), 251.
Bollerslev, T. (1986): “Generalized Autoregressive Conditional Heteroskedasticity,” Journal
of Econometrics, 31, 307–327.
Bollerslev, T. (1990): “Modelling the coherence in short-run nominal exchange rates: a multivariate
generalized ARCH model,” The Review of Economics and Statistics, 72(3), 498–505.
Bond, S., and K. Patel (2003): “The conditional distribution of real estate returns: Are
higher moments time varying?,” The Journal of Real Estate Finance and Economics, 26(2),
319–339.
Box, G., and D. Cox (1964): “An analysis of transformations,” Journal of the Royal Statistical
Society. Series B (Methodological), pp. 211–252.
Broda, S., and M. Paolella (2009): “CHICAGO: A Fast and Accurate Method for Portfolio
Risk Calculation,” Journal of Financial Econometrics, 7(4), 412.
Brooks, C., S. Burke, S. Heravi, and G. Persand (2005): “Autoregressive conditional
kurtosis,” Journal of Financial Econometrics, 3(3), 399–421.
Chen, Y., W. Hardle, and S. Jeong (2008): “Nonparametric risk management with generalized hyperbolic distributions,” Journal of the American Statistical Association, 103(483),
910–923.
Chen, Y., W. Hardle, and V. Spokoiny (2007): “Ghica-risk analysis with gh distributions
and independent components,” WIAS Preprint, 1064.
Chen, Y., W. Härdle, and V. Spokoiny (2007): “Portfolio value at risk based on independent component analysis,” Journal of Computational and Applied Mathematics, 205(1),
594–607.
Christoffersen, P. (1998): “Evaluating interval forecasts,” International Economic Review,
39(4), 841–862.
de Athayde, G., and R. Flôres Jr (2000): “Introducing Higher Moments in the CAPM:
Some Basic Ideas,” mimeo.
de Athayde, G., and R. Flôres Jr (2002): “On Certain Geometric Aspects of Portfolio Optimisation with Higher Moments,” mimeo.
Engle, R. (1982): “Autoregressive conditional heteroscedasticity with estimates of the variance
of United Kingdom inflation,” Econometrica, pp. 987–1007.
Engle, R. (2002): “Dynamic conditional correlation,” Journal of Business and Economic Statistics, 20(3), 339–350.
Engle, R., and S. Manganelli (2004): “CAViaR: Conditional autoregressive value at risk
by regression quantiles,” Journal of Business & Economic Statistics, 22(4), 367–382.
Engle, R., and V. Ng (1993): “Measuring and testing the impact of news on volatility,”
Journal of Finance, pp. 1749–1778.
Engle, R., V. Ng, and M. Rothschild (1990): “Asset Pricing with a Factor ARCH Covariance Structure: Empirical Estimates for Treasury Bills,” Journal of Econometrics, 45,
213.
Ferreira, J., and M. Steel (2003): “Bayesian Multivariate Regression Analysis with a New
Class of Skewed Distributions,” mimeo.
Gonzalez-Rivera, G., T. Lee, and S. Mishra (2004): “Forecasting volatility: A reality
check based on option pricing, utility function, value-at-risk, and predictive likelihood,” International Journal of Forecasting, 20(4), 629–645.
Hansen, B. (1990): “Lagrange multiplier tests for parameter instability in non-linear models,”
mimeo.
Hansen, B. (1994): “Autoregressive conditional density estimation,” International Economic Review, 35, 705–730.
Hansen, P. (2005): “A test for superior predictive ability,” Journal of Business and Economic
Statistics, 23(4), 365–380.
Hansen, P., and A. Lunde (2005): “A forecast comparison of volatility models: Does anything
beat a GARCH (1, 1)?,” Journal of Applied Econometrics, (20), 873–889.
Harvey, C., and A. Siddique (2009): “Autoregressive conditional skewness,” Journal of
Financial and Quantitative Analysis, 34(04), 465–487.
Hu, X., R. Shonkwiler, and M. Spruill (1994): “Random restarts in global optimization,”
Georgia Institute of technology, Atlanta.
Hyvärinen, A., and E. Oja (1999): “Independent component analysis: algorithms and applications,” Neural networks, 13(4-5), 411–430.
Jensen, M., and A. Lunde (2001): “The NIG-S model: a fat-tailed, stochastic, and autoregressive conditional heteroskedastic volatility model,” The Econometrics Journal, 4(2),
319–342.
Jondeau, E., and M. Rockinger (2003): “Conditional volatility, skewness, and kurtosis: existence, persistence, and comovements,” Journal of Economic Dynamics and Control, 27(10),
1699–1737.
Jondeau, E., and M. Rockinger (2006): “Optimal Portfolio Allocation under Higher Moments,” European Financial
Management, 12(1), 29–55.
Jondeau, E., and M. Rockinger (2009): “The Impact of Shocks on Higher Moments,” Journal of Financial Econometrics, 7(2), 77.
Kroner, K., and V. Ng (1998): “Modeling asymmetric comovements of asset returns,” Review
of Financial Studies, 11(4), 817.
Kupiec, P. (1995): “Techniques for verifying the accuracy of risk measurement models,” The
journal of Derivatives, 3(2), 73–84.
Lai, T. (1991): “Portfolio selection with skewness: a multiple-objective approach,” Review of
Quantitative Finance and Accounting, 1(3), 293–305.
Mandelbrot, B. (1963): “The variation of certain speculative prices,” Journal of business,
36(4), 394–419.
Paolella, M. (2007): Intermediate Probability: A Computational Approach. Wiley-Interscience.
Patton, A. (2002): “Applications of Copula Theory in Financial Econometrics,” Ph.D. thesis,
University of California, San Diego.
Prakash, A., C. Chang, and T. Pactwa (2003): “Selecting a portfolio with skewness: recent
evidence from US, European, and Latin American equity markets,” Journal of Banking and
Finance, 27(7), 1375–1390.
Prause, K. (1999): “The generalized hyperbolic model: Estimation, financial derivatives, and
risk measures,” Ph.D. thesis, University of Freiburg.
Premaratne, G., and A. Bera (2000): “Modeling asymmetry and excess kurtosis in stock
return data,” Illinois Research.
Rockinger, M., and E. Jondeau (2002): “Entropy densities with an application to autoregressive conditional skewness and kurtosis,” Journal of Econometrics, 106(1), 119–142.
Ross, S. (1976): “The arbitrage theory of capital asset pricing,” Journal of economic theory,
13(3), 341–360.
Schmidt, R., T. Hrycej, and E. Stützle (2006): “Multivariate distribution models with
generalized hyperbolic margins,” Computational statistics and data analysis, 50(8), 2065–
2096.
van der Weide, R. (2002): “GO-GARCH: a multivariate generalized orthogonal GARCH
model,” Journal of Applied Econometrics, 17(5), 549–564.
van der Weide, R. (2004): “Wake me up before you GO-GARCH,” Computing in Economics and Finance
2004.
van der Weide, R., and P. Boswijk (2008): “Method of moments estimation of GO-GARCH
models,” mimeo.
White, H. (2000): “A reality check for data snooping,” Econometrica, 68(5), 1097–1126.
Wilhelmsson, A. (2009): “Value at Risk with time varying variance, skewness and kurtosis–the
NIG-ACD model,” Econometrics Journal, 12(1), 82–104.
Ye, Y. (1997): Interior Point Algorithms: Theory and Analysis. John Wiley and Sons, New
York.
Zhang, K., and L. Chan (2009): “Efficient factor GARCH models and factor-DCC models,”
Quantitative Finance, 9(1), 71–91.