Estimating Expected Shortfall Using a Conditional
Autoregressive Model: CARES
Yin Liao and Daniel Smith
March 23, 2014
Abstract
In financial risk management, expected shortfall (ES) has become an increasingly popular downside risk measure because it is sub-additive,
a property that Value at Risk (VaR) lacks. This paper proposes a new conditional
autoregressive model to estimate ES. The model specifies the evolution of ES
over time using an autoregressive process and estimates the model parameters by
jointly solving two minimization problems. We derive the asymptotic properties of
the model estimators and illustrate their attractive finite sample properties
through a simulation study. As an illustration, we apply the model to evaluate
the ES of a stock market index and individual stocks.
1 Introduction
In the past few years, financial markets have experienced an unprecedented crisis.
This turmoil has emphasized the need for accurate risk measures for financial institutions. Value at risk (VaR), a measure of how much an asset or a portfolio can
lose within a given time period at a given confidence level, has gained great popularity
among financial practitioners due to its conceptual simplicity. However, VaR has several
shortcomings that have long been criticized. First, it reports only a quantile of the
return distribution and disregards the expected loss beyond the quantile. In addition,
VaR is not a coherent risk measure because it fails to be subadditive. To deal
with these conceptual issues, Artzner, Delbaen, Eber, and Heath (1999)
introduced a new measure of financial risk referred to as the expected shortfall (ES).
ES is defined as the conditional expectation of the return given that it exceeds the VaR.
In more detail, let $X_t$, $t = 1, \ldots, n$, denote the price of an asset or a portfolio over $n$
periods, and let $y_t = -\log(X_t / X_{t-1})$ be the negative log return over the $t$th period. Suppose
$y_t$, $t = 1, \ldots, n$, is a stationary process with marginal distribution function $F$. The VaR at
a given probability $\tau$ is $VaR_\tau = \inf\{u : F(u) \ge \tau\}$, which is the $\tau$th quantile of the
distribution function $F$, and the ES associated with the probability $\tau$ is consequently
defined as $ES_\tau = E(y_t \mid y_t > VaR_\tau)$. ES is a risk measure that overcomes the weaknesses
of VaR and is increasingly widely used in the market.
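As a rough illustration of these two definitions, the following Python sketch computes the empirical counterparts of the VaR and ES from a sample of losses. The function name `empirical_var_es` and the toy data are assumptions made for this example only; they are not part of the model developed in this paper.

```python
import math

def empirical_var_es(losses, tau):
    """Empirical VaR = inf{u : F_n(u) >= tau} and ES = mean(loss | loss > VaR)."""
    ordered = sorted(losses)
    n = len(ordered)
    # Smallest order statistic whose empirical CDF reaches tau
    # (the small offset guards against floating-point error in tau * n).
    k = max(math.ceil(tau * n - 1e-9), 1)
    var = ordered[k - 1]
    tail = [y for y in ordered if y > var]
    # If no loss exceeds the VaR, fall back to the VaR itself.
    es = sum(tail) / len(tail) if tail else var
    return var, es

# Hypothetical losses (negative log returns): 1.0, 2.0, ..., 100.0
losses = [float(i) for i in range(1, 101)]
var95, es95 = empirical_var_es(losses, 0.95)
print(var95, es95)
```

With these toy losses the 95% VaR is the 95th order statistic, and the ES is the average of the five larger losses.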
Although ES is conceptually superior to VaR, how to model and measure it is still an open research problem on which no consensus has been reached. Because the ES is simply the
expected loss beyond an extreme value measured by VaR, its estimation cannot be independent of measuring VaR. Meanwhile, as the distribution of the returns typically
changes over time, the challenge of measuring ES is to find a suitable way to simultaneously model the time-varying conditional VaR and the time-varying conditional expectation of exceedances beyond the VaR. Therefore, any reasonable methodology should
provide formulas for calculating $VaR_t$ and $ES_t$ as functions of variables known at time
$t - 1$ and a set of parameters that need to be estimated, as well as a procedure to
estimate the set of unknown parameters.
Most of the existing models for calculating VaR and ES focus on modeling the whole
time-varying distribution of the return, then computing the corresponding quantile as
VaR and the expected value beyond the quantile as ES. A recent development in the
VaR literature is the conditional autoregressive value at risk (CAViaR) class of models
(see Engle and Manganelli (2004)). It specifies the evolution of the quantile over time
using a special type of autoregressive process, and estimates the parameters with regression quantiles. This approach has strong appeal in that it focuses directly on the tail of the return
distribution and does not rely on any distributional assumption.
We follow this line of work and propose a conditional autoregressive specification for ES, which we
call the conditional autoregressive expected shortfall (CARES) model. It specifies the evolution of the ES itself over time and estimates the unknown parameters by minimizing the
loss function $E\big(I(y_t < VaR_t(\tau)) \cdot (y_t - ES_t(\tau))^2\big)$ through jointly solving two minimization problems. The first is a quantile estimation problem with objective function

$$\min_{\beta} \frac{1}{T} \sum_{t=1}^{T} \big(\tau - I(y_t < f_t(\beta))\big)\big(y_t - f_t(\beta)\big),$$

where $f_t(\beta)$ is a dynamic specification for $VaR_t$ and $\beta$ is the parameter vector. The second is a least squares problem that focuses only
on the tail of the return distribution,

$$\min_{\gamma} \frac{1}{T} \sum_{t=1}^{T} I(y_t < f_t(\beta))\big(y_t - g_t(\gamma)\big)^2,$$

where $g_t(\gamma)$ is a dynamic specification for $ES_t$ and $\gamma$ is the parameter vector. The
first order conditions of the two minimization problems imply two moment conditions
for the unknown parameters $\beta$ and $\gamma$, so that the model parameter estimators can be
regarded as generalized method of moments (GMM) type estimators. Therefore, we extend the standard asymptotic theory of GMM estimators to provide consistency
and asymptotic normality results for our model parameter estimators. We also conduct
a simulation study to examine the finite sample properties of the new model. Compared
with several widely used ES estimation approaches, the CARES model is able to provide
better out-of-sample ES forecasts.
Lastly, it is worth noting that Taylor (2008) developed a conditional ES model (called
the CARE model) that is similar to our CARES model. It also specifies a conditional dynamic model for ES itself and estimates ES by estimating the model
parameters. However, the CARE model differs from our CARES model in essence and
has one more parameter, which brings in extra estimation uncertainty. Taylor (2008)
uses the expectile as an estimator of the quantile (and hence of VaR), given that there is a corresponding $\alpha$th expectile for each $\tau$th quantile, and then links the conditional ES to the conditional expectile
through

$$ES_t(\tau) = \left(1 + \frac{\alpha}{(1 - 2\alpha)\tau}\right) VaR_t(\tau)$$

to obtain the parameter estimators of the
CARE model. Therefore, the success of the CARE model for VaR and ES estimation
relies on the value of $\alpha$ that one selects to ensure that the proportion of observations lying
below the conditional $\alpha$th expectile is $\tau$. Meanwhile, as the distribution of the return
is time-varying, the value of $\alpha$ corresponding to a given $\tau$ varies over time. The need
to estimate $\alpha$ at each time point makes the CARE model not only more
computationally demanding, but also subject to additional estimation error from the
uncertainty in $\alpha$. We further illustrate how the extra estimation step caused
by $\alpha$ influences the forecasting performance of the model in Section 4.
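To make the role of $\alpha$ concrete, the sketch below evaluates the expectile link for a zero-mean standard Gaussian return, where the ES has a known closed form and the link can therefore be inverted to recover the matching $\alpha$. The function names are hypothetical and the Gaussian setting is an assumption chosen only because everything is available in closed form; this is not Taylor's estimation procedure.

```python
from statistics import NormalDist

def es_from_expectile_link(var, alpha, tau):
    """ES implied by the link (1 + alpha / ((1 - 2*alpha) * tau)) * VaR."""
    return (1.0 + alpha / ((1.0 - 2.0 * alpha) * tau)) * var

def alpha_for_standard_normal(tau):
    """Expectile level alpha whose expectile equals the tau-quantile of N(0, 1)."""
    std = NormalDist()
    q = std.inv_cdf(tau)        # tau-quantile (negative for small tau)
    es = -std.pdf(q) / tau      # E(r | r <= q) for a standard Gaussian
    c = tau * (es - q) / q      # equals alpha / (1 - 2*alpha) by the link
    return c / (1.0 + 2.0 * c)

tau = 0.05
alpha = alpha_for_standard_normal(tau)
q = NormalDist().inv_cdf(tau)
es = es_from_expectile_link(q, alpha, tau)
print(alpha, q, es)
```

In this setting $\alpha$ is noticeably smaller than $\tau$, and it would change if the return distribution changed, which is the source of the extra uncertainty discussed above.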
The rest of the article is structured as follows. Section 2 reviews the current approaches
to ES estimation, and Section 3 introduces the CARES model and establishes consistency
and asymptotic normality for the model estimator. Section 4 conducts a Monte Carlo
simulation to study the finite sample property of the CARES model, and compares its
out-of-sample forecasts with other commonly used ES estimation approaches. Section 5
presents an empirical application to real data. Section 6 concludes the article.
2 Expected Shortfall Models
The existing approaches for calculating expected shortfall mainly focus on modeling the
whole return distribution and then deriving the ES from that distribution. These approaches can be divided into three categories: parametric, semiparametric, and
nonparametric. Parametric approaches involve a parameterization of the time-varying
stochastic behavior of financial asset prices. Conditional VaR and ES are estimated
from a conditional volatility forecast together with an assumption about the asset return distribution. GARCH models are often used to forecast the volatility, and the distribution is
typically assumed to be Gaussian or Student-t.
Nonparametric methods estimate the distribution of asset returns
from the data without any distributional assumptions. The VaR and ES are then calculated as
the quantile of the estimated distribution and the corresponding expected loss. The most
widely used methods so far are historical simulation and kernel smoothing estimation.
Neither approach requires a distributional assumption. The former estimates the
VaR as the quantile of the empirical distribution of historical returns from a moving
window of the most recent observations, and the ES is estimated as the mean of
the returns that exceed the VaR estimate. The latter applies a kernel smoothing technique
with an optimal bandwidth to historical returns to estimate the conditional distribution
of returns, from which VaR and ES are subsequently estimated.
There are, however, two severe problems with the above approaches. On the one hand, the
parametric approach must invoke an assumption on the return distribution. Unfortunately, the assumption imposed is usually at odds with real data. On the other
hand, nonparametric methods are notoriously hard to apply when little data is available. Moreover, they assume that returns are independent and identically distributed, and hence
do not allow for time-varying volatility.
Semiparametric approaches have consequently emerged to address these problems. These approaches include those based on extreme value analysis and on quantile or expectile regression. A recent proposal for VaR using quantile regression is the class of CAViaR models
introduced by Engle and Manganelli (2004). Kuan, Yeh, and Hsu (2009) proposed
expectile-based VaR estimation (EVaR), which is more sensitive to the magnitude of extreme losses than quantile-based VaR (QVaR). However, an undesirable property
of these models is that it is not clear how to estimate the corresponding ES. Taylor
(2008) extended expectile theory to deliver estimates of ES. This method first
builds a conditional autoregressive expectile model for the estimation of VaR, and then
converts the estimated conditional expectile to the conditional ES through a specific
function. Although this model allows for time variation without
any distributional assumption, the extra parameter that links the expectile to the quantile
increases the estimation uncertainty.
3 CARES model
3.1 Model Description
In this section, we propose a new approach to ES estimation. In contrast to modeling the
whole distribution or modeling the quantile, we model the ES directly. Before presenting
the model, it is worth explaining the rationale behind it. Assume that an asset return $r_t$
follows a Gaussian distribution with mean $\mu_t$ and standard deviation $\sigma_t$, and let $\phi(r_t; \mu_t, \sigma_t^2)$
and $\Phi(r_t; \mu_t, \sigma_t^2)$ be the density and distribution functions of the return, respectively. Then
the $VaR_t$ for a given probability $\tau$ is
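$$VaR_t = \mu_t + \sigma_t \Phi^{-1}(\tau), \tag{1}$$

and the corresponding ES is

$$
\begin{aligned}
ES_t &= E(r_t \mid r_t \le VaR_t) \\
&= \int_{-\infty}^{VaR_t} r_t \, \frac{\phi(r_t; \mu_t, \sigma_t^2)}{\Phi(VaR_t; \mu_t, \sigma_t^2)} \, dr_t \\
&= \mu_t + \left[ -\sigma_t^2 \, \frac{\phi(r_t; \mu_t, \sigma_t^2)}{\Phi(VaR_t; \mu_t, \sigma_t^2)} \right]_{-\infty}^{VaR_t} \\
&= \mu_t - \sigma_t^2 \, \frac{\phi(VaR_t; \mu_t, \sigma_t^2)}{\Phi(VaR_t; \mu_t, \sigma_t^2)}.
\end{aligned}
$$

Since $(VaR_t - \mu_t)/\sigma_t = \Phi^{-1}(\tau)$, we have $\Phi(VaR_t; \mu_t, \sigma_t^2) = \tau$ and $\sigma_t \, \phi(VaR_t; \mu_t, \sigma_t^2) = \phi(\Phi^{-1}(\tau))$, where $\phi(\cdot)$ and $\Phi(\cdot)$ with a single argument denote the standard Gaussian density and distribution functions. The above equation can therefore be rearranged as

$$ES_t = E(r_t \mid r_t \le VaR_t) = \mu_t - \sigma_t \cdot \frac{\phi(\Phi^{-1}(\tau))}{\tau}. \tag{2}$$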
Both $VaR_t$ and $ES_t$ therefore equal $\mu_t$ plus a constant multiple of the standard deviation
$\sigma_t$, which suggests that the evolution of both $VaR_t$ and $ES_t$ is driven by the time-varying volatility, and that the same functional form is appropriate for VaR and ES.
Note that the Gaussian distribution is used only as an example to reveal the link between
volatility and ES (or VaR); a similar relationship should hold for
other distributions, with different functional forms. Consequently, we propose a conditional
autoregressive model for ES to formalize its dynamic characteristics, and the model is
referred to as CARES.
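The closed-form expressions (1) and (2) can be checked numerically. The following Python sketch evaluates both formulas and compares the ES with a direct midpoint-rule integration of the conditional expectation; the function names and the chosen values of $\mu_t$, $\sigma_t$ and $\tau$ are assumptions for illustration only.

```python
from statistics import NormalDist

def gaussian_var_es(mu, sigma, tau):
    """Closed-form VaR and ES of a N(mu, sigma^2) return at probability tau."""
    std = NormalDist()
    z = std.inv_cdf(tau)
    var = mu + sigma * z                   # equation (1)
    es = mu - sigma * std.pdf(z) / tau     # equation (2)
    return var, es

def es_by_quadrature(mu, sigma, tau, steps=20_000, width=12.0):
    """Midpoint-rule approximation of E(r | r <= VaR) for a N(mu, sigma^2) return."""
    dist = NormalDist(mu, sigma)
    upper = dist.inv_cdf(tau)
    lower = mu - width * sigma             # tail mass below this point is negligible
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        r = lower + (i + 0.5) * h
        total += r * dist.pdf(r)
    return total * h / tau

var, es = gaussian_var_es(mu=0.0, sigma=1.0, tau=0.05)
es_closed = gaussian_var_es(mu=0.5, sigma=2.0, tau=0.05)[1]
es_numeric = es_by_quadrature(mu=0.5, sigma=2.0, tau=0.05)
print(var, es, es_closed, es_numeric)
```

The second comparison uses a non-zero mean, so it also confirms that the ES shifts with $\mu_t$ and scales with $\sigma_t$.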
Recall the CAViaR model of Engle and Manganelli (2004), in which the conditional
quantile is specified as an autoregressive function $f_t(\beta)$ that depends on the parameter
vector $\beta$ as

$$f_t(\beta) = \beta_0 + \sum_{i=1}^{q} \beta_i \cdot f_{t-i}(\beta) + \sum_{i=1}^{r} \beta_{q+i} \cdot l(x_{t-i}), \tag{3}$$

where $\beta_i f_{t-i}(\beta)$, $i = 1, \ldots, q$, are the autoregressive terms, which ensure that the quantile
changes smoothly over time, and the role of $l(x_{t-i})$ is to link $f_t(\beta)$ to observable variables
that belong to the information set. Some examples of CAViaR models are as follows:
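Symmetric absolute value:
$$f_t(\beta) = \beta_1 + \beta_2 f_{t-1}(\beta) + \beta_3 |r_{t-1}|$$
Asymmetric slope:
$$f_t(\beta) = \beta_1 + \beta_2 f_{t-1}(\beta) + \beta_3 (r_{t-1})^{+} + \beta_4 (r_{t-1})^{-}$$
Indirect GARCH(1,1):
$$f_t(\beta) = \left(\beta_1 + \beta_2 f_{t-1}^2(\beta) + \beta_3 r_{t-1}^2\right)^{1/2}$$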
Therefore, we introduce a similar model for ES as

$$g_t(\gamma) = \gamma_0 + \sum_{i=1}^{q} \gamma_i \cdot g_{t-i}(\gamma) + \sum_{i=1}^{r} \gamma_{q+i} \cdot m(x_{t-i}), \tag{4}$$

where $\gamma_i g_{t-i}(\gamma)$, $i = 1, \ldots, q$, are the autoregressive terms, and the term $m(x_{t-i})$ is used to
link $g_t(\gamma)$ to observable variables that belong to the information set. Some examples
of the CARES model can be easily obtained from the above three CAViaR models by
using their ES analogues.
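The three specifications can be sketched as simple recursions. The Python functions below are a minimal illustration, not the authors' implementation: the function names, the `start` value used to initialize the recursion, and the `sign` argument of the indirect GARCH recursion are assumptions, and $(x)^{+} = \max(x, 0)$, $(x)^{-} = -\min(x, 0)$ is assumed as the notation for the asymmetric slope terms. The same code serves for $f_t(\beta)$ and $g_t(\gamma)$, since the two share a functional form.

```python
import math

def sav_path(params, returns, start):
    """Symmetric absolute value: g_t = c1 + c2 * g_{t-1} + c3 * |r_{t-1}|."""
    c1, c2, c3 = params
    path = [start]
    for r_prev in returns[:-1]:
        path.append(c1 + c2 * path[-1] + c3 * abs(r_prev))
    return path

def asymmetric_slope_path(params, returns, start):
    """Asymmetric slope: g_t = c1 + c2 * g_{t-1} + c3 * (r)+ + c4 * (r)-."""
    c1, c2, c3, c4 = params
    path = [start]
    for r_prev in returns[:-1]:
        pos = max(r_prev, 0.0)    # (r_{t-1})+
        neg = max(-r_prev, 0.0)   # (r_{t-1})-, assumed to mean -min(r, 0)
        path.append(c1 + c2 * path[-1] + c3 * pos + c4 * neg)
    return path

def indirect_garch_path(params, returns, start, sign=1.0):
    """Indirect GARCH(1,1): g_t = sign * (c1 + c2 * g_{t-1}^2 + c3 * r_{t-1}^2)^(1/2).

    sign=+1 reproduces the formula as printed; sign=-1 (an assumption) gives a
    left-tail value for zero-mean returns.
    """
    c1, c2, c3 = params
    path = [start]
    for r_prev in returns[:-1]:
        path.append(sign * math.sqrt(c1 + c2 * path[-1] ** 2 + c3 * r_prev ** 2))
    return path

# Hypothetical parameters and returns, for illustration only.
sav = sav_path((0.1, 0.8, 0.2), [1.0, -2.0, 0.5], start=1.0)
asym = asymmetric_slope_path((0.0, 0.5, 1.0, 2.0), [1.0, -1.0, 0.0], start=0.0)
ig = indirect_garch_path((0.5, 0.25, 0.25), [2.0, 0.0], start=2.0)
print(sav, asym, ig)
```

Each path has the same length as the return series, and element $t$ depends only on information available at $t - 1$.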
3.2 Model Estimation
Next, we estimate the parameters of the CARES model by jointly solving two problems.
Assume that the level-$\tau$ quantile of a sample of return observations $y_1, \ldots, y_T$ follows a
CAViaR model, that is,

$$VaR_t(\tau) = f_t(\beta_0(\tau)), \tag{5}$$

where $f$ is assumed known up to the vector of parameters $\beta_0$, and that the corresponding ES
depends on another vector of parameters $\gamma_0$ as

$$ES_t(\tau) = g_t(\gamma_0(\tau)). \tag{6}$$

Then both the $\tau$ quantile and the ES can be defined by $\theta_0 = (\theta_{01}, \theta_{02}) = (\beta_0(\tau)', \gamma_0(\tau)')$
that minimizes the loss function

$$E\big(I(y_t < VaR_t(\tau)) \cdot (y_t - ES_t(\tau))^2\big). \tag{7}$$

The estimator for $\theta_0$, denoted $\hat\theta$, can then be obtained by minimizing the sample
counterpart

$$T^{-1} \sum_{t=1}^{T} I(y_t < VaR_t(\tau)) \cdot (y_t - ES_t(\tau))^2 \tag{8}$$

through a two-stage procedure.
In the first stage we estimate equation (5) by solving

$$\min_{\beta} \frac{1}{T} \sum_{t=1}^{T} \big(\tau - I(y_t < VaR_t(\tau))\big) \cdot \big(y_t - VaR_t(\tau)\big) \tag{9}$$

to obtain $\hat\beta$ and $\widehat{VaR}_t(\tau)$. In the second stage, the estimated $VaR_t(\tau)$ is treated as
an observation and used to estimate the parameters of (6) by solving

$$\min_{\gamma} \frac{1}{T} \sum_{t=1}^{T} I(y_t < \widehat{VaR}_t(\tau)) \cdot \big(y_t - ES_t(\tau)\big)^2. \tag{10}$$
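The two-stage logic can be illustrated in its simplest form. The Python sketch below codes the sample objectives (9) and (10) and then fits constant paths $f_t = b$ and $g_t = c$ by grid search. The constant specification, the grid search, the function names and the toy data are all simplifying assumptions made for this example; a dynamic specification such as (3) and (4) would require a numerical optimizer over $\beta$ and $\gamma$ instead.

```python
def quantile_loss(y, var_path, tau):
    """Sample objective (9): mean of (tau - I(y_t < f_t)) * (y_t - f_t)."""
    return sum((tau - (yt < ft)) * (yt - ft)
               for yt, ft in zip(y, var_path)) / len(y)

def tail_squared_loss(y, var_path, es_path):
    """Sample objective (10): mean of I(y_t < f_t) * (y_t - g_t)^2."""
    return sum((yt < ft) * (yt - gt) ** 2
               for yt, ft, gt in zip(y, var_path, es_path)) / len(y)

def two_stage_constant_fit(y, tau, grid):
    """Stage 1 picks the constant VaR; stage 2 picks the constant ES given stage 1."""
    n = len(y)
    b_hat = min(grid, key=lambda b: quantile_loss(y, [b] * n, tau))
    c_hat = min(grid, key=lambda c: tail_squared_loss(y, [b_hat] * n, [c] * n))
    return b_hat, c_hat

# Hypothetical returns and search grid, for illustration only.
y = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 1.5, 2.0]
grid = [i / 100.0 for i in range(-400, 401)]
b_hat, c_hat = two_stage_constant_fit(y, tau=0.25, grid=grid)
print(b_hat, c_hat)
```

In this toy case the first stage returns the sample quantile and the second stage returns the mean of the observations below it, which is what the two objectives are designed to deliver.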
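Alternatively, the parameters in the two equations (5) and (6) can be jointly estimated by
solving the two problems (9) and (10) together. The two first order conditions involved
here are

$$
\begin{pmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\beta f_t(\beta(\tau)) \cdot \big(\tau - I(y_t < f_t(\beta(\tau)))\big) = 0 \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\gamma g_t(\gamma(\tau)) \cdot \big(y_t - g_t(\gamma(\tau))\big) \cdot I(y_t < f_t(\beta(\tau))) = 0
\end{pmatrix}, \tag{11}
$$

where

$$\nabla f_t(\beta) = \frac{d}{d\beta} f_t(\beta) \tag{12}$$

and

$$\nabla g_t(\gamma) = \frac{d}{d\gamma} g_t(\gamma). \tag{13}$$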
Therefore, $\hat\theta$ is a generalized method of moments (GMM) estimator
based on the two moment conditions implied by the above two first order conditions, and the
asymptotic distribution of $\hat\theta$ can be established within the GMM framework.
Theorem 3.1 and Theorem 3.2 show that the GMM estimator $\hat\theta$ is consistent and asymptotically normal. Theorem 3.3 provides a consistent estimator of the variance-covariance
matrix. The related assumptions and detailed proofs are provided in Appendix A.
Theorem 3.1. (Consistency) Under assumptions 6.1 and 6.2, we have

$$\hat\theta(\tau) \xrightarrow{P} \theta_0(\tau)$$

as $T \to \infty$.
Proof. See Appendix A.
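Theorem 3.2. (Asymptotic normality) Given assumptions 6.3-6.5, we have

$$\sqrt{T}\,(\hat\theta - \theta_0) \xrightarrow{D} N(0, \Sigma(\theta_0)) \tag{14}$$

as $T \to \infty$, where $\Sigma(\theta_0) = D(\theta_0)^{-1} S(\theta_0) (D(\theta_0)^{-1})'$ with

$$
D(\theta_0) = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}
= \begin{pmatrix}
-E\big(\nabla_\beta f_t(\beta_0(\tau)) \nabla_\beta f_t(\beta_0(\tau))' \, h(0)\big) & 0 \\
E\big(\nabla_\gamma g_t(\gamma_0(\tau)) \nabla_\beta f_t(\beta_0(\tau))' \, (f_t(\beta_0(\tau)) - g_t(\gamma_0(\tau))) \, h(0)\big) & -E\big(\nabla_\gamma g_t(\gamma_0(\tau)) \nabla_\gamma g_t(\gamma_0(\tau))' \, \tau\big)
\end{pmatrix} \tag{15}
$$

and

$$
S(\theta_0) = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}
= \begin{pmatrix}
\tau(1 - \tau) E\big(\nabla_\beta f_t(\beta_0(\tau)) \nabla_\beta f_t(\beta_0(\tau))'\big) & 0 \\
0 & E\big(\nabla_\gamma g_t(\gamma_0(\tau)) \nabla_\gamma g_t(\gamma_0(\tau))'\big) \cdot TV
\end{pmatrix}, \tag{16}
$$

where $h(\cdot)$ is the density function, and $TV = E\big((y_t - g_t(\gamma_0(\tau)))^2 \cdot I(y_t - f_t(\beta_0(\tau)) < 0)\big)$.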
Proof. See Appendix A. The basic idea is to approximate the (discontinuous)
gradient of the objective function by its continuously differentiable expectation, and then
to relate this approximation to the asymptotic first-order condition, which sets the approximation of the gradient asymptotically equal to zero. A standard Taylor expansion
can then be applied to derive the asymptotic theory of the parameter estimators. The
way to obtain such an approximation is provided by the theorem of Huber (1967).
This technique is widely used in quantile and expectile regression. See Engle and
Manganelli (2004) and Kuan et al. (2009) for some recent applications.
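Theorem 3.3. (Variance-covariance matrix estimation) Under the assumptions and
conditions of Theorem 3.1 and Theorem 3.2, the asymptotic variance-covariance matrix
$\Sigma(\theta)$ can be consistently estimated by $\Sigma(\hat\theta) = D(\hat\theta)^{-1} S(\hat\theta) (D(\hat\theta)^{-1})'$, where

$$
D(\hat\theta) = \begin{pmatrix} \hat D_{11}(\theta) & \hat D_{12}(\theta) \\ \hat D_{21}(\theta) & \hat D_{22}(\theta) \end{pmatrix}
= \begin{pmatrix}
-\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\beta f_t(\hat\beta(\tau)) \nabla_\beta f_t(\hat\beta(\tau))' \, h(0) & 0 \\[2ex]
\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau)) \nabla_\beta f_t(\hat\beta(\tau))' \, (f_t(\hat\beta(\tau)) - g_t(\hat\gamma(\tau))) \, h(0) & -\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau)) \nabla_\gamma g_t(\hat\gamma(\tau))' \, \tau
\end{pmatrix},
$$

with $h(0)$ estimated through the bandwidth $c_T$, that is,

$$
D(\hat\theta) = \begin{pmatrix}
-\dfrac{1}{2 T c_T} \sum_{t=1}^{T} \nabla_\beta f_t(\hat\beta(\tau)) \nabla_\beta f_t(\hat\beta(\tau))' \, I(|y_t - f_t(\hat\beta(\tau))| < c_T) & 0 \\[2ex]
\dfrac{1}{2 T c_T} \sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau)) \nabla_\beta f_t(\hat\beta(\tau))' \, (f_t(\hat\beta(\tau)) - g_t(\hat\gamma(\tau))) \, I(|y_t - f_t(\hat\beta(\tau))| < c_T) & -\dfrac{1}{T} \sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau)) \nabla_\gamma g_t(\hat\gamma(\tau))' \, \tau
\end{pmatrix}
\xrightarrow{P} D(\theta_0),
$$

and

$$
S(\hat\theta) = \begin{pmatrix} \hat S_{11}(\theta) & \hat S_{12}(\theta) \\ \hat S_{21}(\theta) & \hat S_{22}(\theta) \end{pmatrix}
= \begin{pmatrix}
\dfrac{1}{T} \sum_{t=1}^{T} \tau(1 - \tau) \nabla_\beta f_t(\hat\beta(\tau)) \nabla_\beta f_t(\hat\beta(\tau))' & 0 \\[2ex]
0 & \dfrac{1}{T} \sum_{t=1}^{T} \nabla_\gamma g_t(\hat\gamma(\tau)) \nabla_\gamma g_t(\hat\gamma(\tau))' \, (y_t - g_t(\hat\gamma(\tau)))^2 \cdot I(y_t - f_t(\hat\beta(\tau)) < 0)
\end{pmatrix}
\xrightarrow{P} S(\theta_0), \tag{17}
$$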
where $c_T$ is a bandwidth, which can be defined in two ways. The first is the k-nearest
neighbor estimator used in Engle and Manganelli (2004), with k = 40 for 1% VaR and
ES and k = 60 for 5% VaR and ES. For the second, we follow Koenker (2005) and define

$$c_T = \hat s \left(\Phi^{-1}(\tau + h_T) - \Phi^{-1}(\tau - h_T)\right), \tag{18}$$

where $\hat s = \min\big(SD(y_t - f_t(\hat\beta)),\, IQR(y_t - f_t(\hat\beta))\big)/1.34$, and

$$h_T = T^{-1/5} \left[\frac{4.5\,\phi^4(\Phi^{-1}(\tau))}{(2\Phi^{-1}(\tau)^2 + 1)^2}\right]^{1/5}$$

following Bofinger (1975), or

$$h_T = T^{-1/3} \left(\Phi^{-1}(1 - 0.025)\right)^{2/3} \left[\frac{1.5\,\phi^2(\Phi^{-1}(\tau))}{(2\Phi^{-1}(\tau)^2 + 1)^2}\right]^{1/3}$$

following Hall and Sheather (1988).
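A minimal Python sketch of the second bandwidth choice is given below, using the Bofinger form of $h_T$ and equation (18). The function names are hypothetical, the scale estimate $\hat s$ is taken as an input rather than computed, and the last function shows the scalar uniform-kernel average that appears in the estimated $D$ matrix of Theorem 3.3; all of this is an illustration under those assumptions rather than the code used for the results in this paper.

```python
from statistics import NormalDist

def bofinger_h(T, tau):
    """h_T = T^(-1/5) * [4.5 * phi(z)^4 / (2 z^2 + 1)^2]^(1/5), with z = Phi^{-1}(tau)."""
    std = NormalDist()
    z = std.inv_cdf(tau)
    return T ** (-0.2) * (4.5 * std.pdf(z) ** 4 / (2.0 * z * z + 1.0) ** 2) ** 0.2

def bandwidth_c(s_hat, T, tau):
    """c_T = s_hat * (Phi^{-1}(tau + h_T) - Phi^{-1}(tau - h_T)), equation (18)."""
    std = NormalDist()
    h = bofinger_h(T, tau)
    if not 0.0 < tau - h < tau + h < 1.0:
        raise ValueError("tau +/- h_T must stay inside (0, 1)")
    return s_hat * (std.inv_cdf(tau + h) - std.inv_cdf(tau - h))

def density_at_quantile(residuals, c):
    """Uniform-kernel estimate of h(0): share of |y_t - f_t| < c_T, divided by 2 c_T."""
    return sum(abs(u) < c for u in residuals) / (2.0 * c * len(residuals))

h = bofinger_h(T=1000, tau=0.05)
c = bandwidth_c(s_hat=1.0, T=1000, tau=0.05)
print(h, c)
```

The guard in `bandwidth_c` matters for small $\tau$ and short samples, where $\tau - h_T$ can approach zero.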
The proof of this theorem (including the assumptions) is quite similar to that of Theorem 3 of
Engle and Manganelli (2004), so we omit the details here.
We also undertake a small simulation study to investigate the finite sample properties of the model parameter estimators and to observe the behavior of these estimators as
the sample size increases.
To do this, we generate the return of an asset or portfolio from a GARCH(1,1) model

$$r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + a_1 r_{t-1}^2 + a_2 \sigma_{t-1}^2,$$

where the parameters are set to $a_0 = 0.025$, $a_1 = 0.0500$, $a_2 = 0.9250$, and the disturbance $z_t$ follows a standard Gaussian distribution. Based on the relationship between
the conditional VaR/ES and the standard deviation of the return, as shown in Section
3.1, we are able to derive the true values of the parameters of the indirect GARCH(1,1)
specification of the above model as $\beta_0 = a_0 (\Phi^{-1}(\tau))^2$, $\beta_1 = a_2$, $\beta_2 = a_1 (\Phi^{-1}(\tau))^2$,
$\gamma_0 = a_0 (-\phi(\Phi^{-1}(\tau))/\tau)^2$, $\gamma_1 = a_2$, $\gamma_2 = a_1 (-\phi(\Phi^{-1}(\tau))/\tau)^2$, where $\Phi$ and $\phi$ are the cumulative distribution function and the probability density function of the standard Gaussian
distribution, and $\tau$ is the coverage probability. See Appendix B for the derivation details.
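The mapping from the GARCH(1,1) parameters to the true indirect GARCH(1,1) parameters can be evaluated directly. The following Python sketch does so for the parameter values used in the simulation; the function name is an assumption made for this example.

```python
from statistics import NormalDist

def true_cares_parameters(a0, a1, a2, tau):
    """True (beta, gamma) of the indirect GARCH(1,1) form implied by a Gaussian GARCH(1,1)."""
    std = NormalDist()
    z = std.inv_cdf(tau)
    k_var = z ** 2                        # (Phi^{-1}(tau))^2
    k_es = (-std.pdf(z) / tau) ** 2       # (-phi(Phi^{-1}(tau)) / tau)^2
    beta = (a0 * k_var, a2, a1 * k_var)
    gamma = (a0 * k_es, a2, a1 * k_es)
    return beta, gamma

beta_05, gamma_05 = true_cares_parameters(0.025, 0.05, 0.925, tau=0.05)
beta_01, gamma_01 = true_cares_parameters(0.025, 0.05, 0.925, tau=0.01)
print(beta_05, gamma_05)
print(beta_01, gamma_01)
```

Rounded to four decimals, these values agree with the true parameters listed in Table 1.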
We generate 10000 samples of size 1000, 2000, 5000 and 10000 from the above GARCH(1,1)
model, with the initial values of the return and volatility drawn from the corresponding
unconditional distributions implied by the model. For each sample, we estimate the
parameters of the above model by the two-stage procedure when the coverage probability is 5% or 1%.
The mean and standard deviation of the estimator of each parameter,
computed from the 10000 replications for the different sample sizes, are reported in
Table 1, panel A and panel B, respectively.
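The data-generating step can be sketched as follows. This Python example simulates one GARCH(1,1)-Gaussian path and computes the true conditional VaR and ES along it. It is a simplified illustration: the function names are hypothetical, and the recursion is started at the unconditional variance $a_0/(1 - a_1 - a_2)$ rather than at a draw from the unconditional distribution of the volatility, which is an assumption made to keep the example short.

```python
import math
import random
from statistics import NormalDist

def simulate_garch(T, a0=0.025, a1=0.05, a2=0.925, seed=0):
    """Simulate r_t = sigma_t * z_t with sigma_t^2 = a0 + a1 r_{t-1}^2 + a2 sigma_{t-1}^2."""
    rng = random.Random(seed)
    sigma2 = a0 / (1.0 - a1 - a2)          # start at the unconditional variance (assumption)
    r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
    returns, sigmas = [r], [math.sqrt(sigma2)]
    for _ in range(T - 1):
        sigma2 = a0 + a1 * r * r + a2 * sigma2
        r = math.sqrt(sigma2) * rng.gauss(0.0, 1.0)
        returns.append(r)
        sigmas.append(math.sqrt(sigma2))
    return returns, sigmas

def true_var_es_paths(sigmas, tau):
    """True conditional VaR_t = sigma_t * z and ES_t = -sigma_t * phi(z) / tau (zero mean)."""
    std = NormalDist()
    z = std.inv_cdf(tau)
    scale_es = -std.pdf(z) / tau
    return [s * z for s in sigmas], [s * scale_es for s in sigmas]

returns, sigmas = simulate_garch(T=1000, seed=42)
var_path, es_path = true_var_es_paths(sigmas, tau=0.05)
hit_rate = sum(r < v for r, v in zip(returns, var_path)) / len(returns)
print(hit_rate)
```

The share of returns falling below the true VaR path should be close to $\tau$ in a long sample, which gives a quick sanity check on the simulated data.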
It is important to note that the performance of the estimator is quite good even when
the sample size is moderate (T = 1000), and that the bias and standard deviation of the
estimators decline with the sample size, as expected. Each parameter
estimator approaches the true value of the parameter as T increases, which is in line with the consistency of the estimators. We also calculate the average theoretical
standard error of each parameter estimator (the numbers reported in square brackets
in Table 1) by using the estimated parameter values from each simulation together with the
asymptotic theory provided above, and compare it with the standard deviation of each
parameter estimator across the replications of the simulation (the numbers reported in
parentheses in Table 1). The fact that the two standard errors are quite close supports
the validity of the asymptotic distribution derived above for the model parameter
estimators.
Moreover, in order to investigate the degree of efficiency loss in CARES model estimation, we alternatively compute the theoretical standard errors of the parameter estimators
in the CARES model from the asymptotic standard errors of the above GARCH(1,1)
model parameters ($a_0$, $a_1$, and $a_2$), with an appropriate scaling based on the relationship
between the GARCH(1,1) model parameters and the CARES model parameters. With 10000
samples and sample size T = 10000, the implied standard errors of the parameter estimators in the CARES model from the GARCH(1,1) model are 0.0124 for β0, 0.0081 for
β1, 0.0134 for β2, 0.0194 for γ0, 0.0081 for γ1 and 0.0211 for γ2 when τ = 0.05, and
0.0248 for β0, 0.0081 for β1, 0.0269 for β2, 0.0325 for γ0, 0.0081 for γ1 and 0.0353 for γ2
when τ = 0.01. The fact that the implied standard errors reported here are quite close
to the ones reported in Table 1 based on the simulation suggests that our procedure is able
to estimate the CARES model efficiently.

Table 1: Finite Sample Property of Each Parameter Estimator of CARES Model
Each entry reports the mean estimated parameter, followed by the standard deviation of the parameter estimators across the simulation in parentheses (a) and the average theoretical standard error in square brackets (b).

Panel A: τ = 0.05
True parameter                        T = 1000    T = 2000    T = 5000    T = 10000
β0 = a0 (Φ⁻¹(τ))² = 0.0676            0.0790      0.0743      0.0703      0.0685
                                      (0.0470)    (0.0329)    (0.0194)    (0.0140)
                                      [0.0356]    [0.0314]    [0.0192]    [0.0123]
β1 = a2 = 0.9250                      0.9219      0.9231      0.9246      0.9248
                                      (0.0248)    (0.0173)    (0.0108)    (0.0086)
                                      [0.0211]    [0.0111]    [0.0102]    [0.0082]
β2 = a1 (Φ⁻¹(τ))² = 0.1353            0.1310      0.1332      0.1333      0.1348
                                      (0.0561)    (0.0388)    (0.0248)    (0.0171)
                                      [0.0402]    [0.0285]    [0.0214]    [0.0141]
γ0 = a0 (−φ(Φ⁻¹(τ))/τ)² = 0.1064      0.2119      0.1555      0.1216      0.1139
                                      (0.1881)    (0.0914)    (0.0432)    (0.0283)
                                      [0.1054]    [0.0825]    [0.0394]    [0.0223]
γ1 = a2 = 0.9250                      0.8924      0.9093      0.9199      0.9223
                                      (0.0496)    (0.0264)    (0.0147)    (0.0103)
                                      [0.0309]    [0.0213]    [0.0134]    [0.0101]
γ2 = a1 (−φ(Φ⁻¹(τ))/τ)² = 0.2127      0.2382      0.2276      0.2186      0.2163
                                      (0.0938)    (0.0633)    (0.0401)    (0.0294)
                                      [0.0828]    [0.0529]    [0.0392]    [0.0261]

Panel B: τ = 0.01
True parameter                        T = 1000    T = 2000    T = 5000    T = 10000
β0 = a0 (Φ⁻¹(τ))² = 0.1353            0.1534      0.1531      0.1420      0.1398
                                      (0.0989)    (0.0815)    (0.0523)    (0.0366)
                                      [0.2360]    [0.1218]    [0.0685]    [0.0281]
β1 = a2 = 0.9250                      0.9248      0.9221      0.9239      0.9243
                                      (0.0279)    (0.0217)    (0.0144)    (0.0099)
                                      [0.0489]    [0.0250]    [0.0141]    [0.0099]
β2 = a1 (Φ⁻¹(τ))² = 0.2706            0.2492      0.2658      0.2684      0.2689
                                      (0.1395)    (0.0963)    (0.0611)    (0.0312)
                                      [0.1826]    [0.1175]    [0.0780]    [0.0302]
γ0 = a0 (−φ(Φ⁻¹(τ))/τ)² = 0.1776      0.2342      0.2139      0.1897      0.1852
                                      (0.2089)    (0.1382)    (0.0762)    (0.0406)
                                      [0.3049]    [0.2156]    [0.0950]    [0.0404]
γ1 = a2 = 0.9250                      0.9151      0.9191      0.9229      0.9238
                                      (0.0388)    (0.0144)    (0.0139)    (0.0096)
                                      [0.0711]    [0.0308]    [0.0217]    [0.0095]
γ2 = a1 (−φ(Φ⁻¹(τ))/τ)² = 0.3552      0.3564      0.3559      0.3555      0.3553
                                      (0.2177)    (0.1416)    (0.0860)    (0.0498)
                                      [0.4288]    [0.2618]    [0.1073]    [0.0470]
4 Simulation Study
To illustrate the finite sample properties of the CARES model, we conduct some simple
simulation studies to demonstrate its superiority over other popular models with respect to
ES forecasting. In all cases, performance is measured in terms of root mean squared error
(RMSE). The RMSE of an ES forecast $\widehat{ES}$ from an arbitrary model has the standard
definition $\sqrt{E\big((ES - \widehat{ES})^2\big)}$, where $ES$ is the true value of ES. In all the following
simulations, the RMSE is approximated by the square root of the average of $10^4$ realizations of
$(ES - \widehat{ES})^2$.
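The Monte Carlo approximation of the RMSE amounts to the following short Python sketch. The function names and the toy numbers are assumptions for illustration; in particular, the sign convention of the bias (forecast minus true value) is an assumption, since the text does not state which convention is used in the tables.

```python
import math

def rmse(true_values, forecasts):
    """Square root of the average squared forecast error over the replications."""
    n = len(true_values)
    return math.sqrt(sum((es - f) ** 2 for es, f in zip(true_values, forecasts)) / n)

def bias(true_values, forecasts):
    """Average forecast error, taken here as forecast minus true value (assumed convention)."""
    return sum(f - es for es, f in zip(true_values, forecasts)) / len(true_values)

# Hypothetical true ES values and forecasts from two replications.
true_es = [-2.0, -2.0]
forecast_es = [-1.0, -3.0]
print(rmse(true_es, forecast_es), bias(true_es, forecast_es))
```

The toy example shows why both measures are reported: the two forecast errors cancel in the bias but not in the RMSE.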
We begin by generating data from a simple model. Assume that the return of an asset or portfolio follows the GARCH(1,1) model described in Section 3.2. This model
allows for time-varying volatility, and thereby for time-varying VaR and ES. Figure 1
displays the empirical density of an asset return obtained from this model by setting the
parameters to $a_0 = 0.025$, $a_1 = 0.0500$, and $a_2 = 0.9250$. Compared with the standard
Gaussian density, the density of the GARCH(1,1)-GAUSSIAN model has fatter tails.
In order to study the out-of-sample ES forecasting properties of the CARES model, we
simulate data from the above model with total sample sizes T + 500 + 1 = 751,
1001, 1501 and 2501, that is, T = 250, 500, 1000 and 2000. For every sample, the first 500
observations are discarded in order to allow for a sufficiently long burn-in period. Then,
we use the next T observations to fit the three CARES models discussed in Section 3,
and leave the (T + 1)th observation for the one-step-ahead out-of-sample 1% ES and 5%
ES forecasting evaluation. The RMSE of the forecasts is computed by replicating
the simulation $10^4$ times. Figure 2 shows the 5% ES forecasts from the CARES
model (the indirect GARCH(1,1) specification) against the true value of the 5% ES for the
GARCH(1,1)-GAUSSIAN model when the sample size is 2000. We observe that the true
value of ES exhibits strong dynamic clustering, and the 5% ES forecasts from the two
specifications are able to capture this pattern and fit the true value of ES very well.
Figure 1: The density of an asset return obtained from a GARCH(1,1)-GAUSSIAN model (legend: GARCH(1,1)-GAUSSIAN density; Standard GAUSSIAN density)
For comparison, we also compute the RMSE of ES forecasts from two commonly used
nonparametric methods and from the CARE model of Taylor (2008). The first nonparametric
method is historical simulation (HS). Assuming that asset returns are independent and
identically distributed, HS obtains the empirical distribution of the return from past observations, and calculates a given percentile of the empirical distribution and the expected
loss beyond this percentile as the corresponding VaR and ES measures for the next period.
As the performance of HS largely depends on the length of the historical data used to form
the empirical distribution, we vary the length from the past 250 observations to the past 500
observations in our simulations. The second is the kernel-based nonparametric ES estimator
(KDE) (see Scaillet (2004)). The KDE uses the historical returns $r_1, r_2, \ldots, r_n$ as the sample
and takes the form

$$ES_{KDE} = (np)^{-1} \sum_{t=1}^{n} r_t \, G_h(VaR_{KDE} - r_t),$$

where $VaR_{KDE}$ is the kernel-based VaR estimator, $G_h(t) = G(t/h)$, and $G(t) = \int_{-\infty}^{t} K(u)\,du$. $K$ and $h$ are the
standard Gaussian kernel and the optimal bandwidth. As the standard KDE estimator
is usually biased¹, in the simulation we apply the jackknife technique to correct
the bias of this estimator. In the CARE model, the relationship between α and τ
(1% or 5%) is available in closed form because we know the return follows a GARCH(1,1)-GAUSSIAN
process. In other words, the true value of α is known. We therefore provide ES forecasts from
the CARE model under two scenarios: when α is estimated (that is, α̂) by grid
search², and when α takes its true value α0. The difference between the ES forecasts
from the two scenarios helps us to understand the extra uncertainty introduced
by the estimation of α.

¹ See Theorem 2 of Chen (2008) for more discussion about this and the detailed expression of the bias.

Figure 2: The ES forecasts of CARES models vs. the true ES for a GARCH(1,1)-GAUSSIAN model (upper panel: CAViaR-IGARCH-VaR and True VaR; lower panel: CARES-IGARCH-ES and True ES)
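The two nonparametric benchmarks can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the bandwidth `h` is passed in as a fixed number rather than chosen optimally, the kernel VaR is found by bisection on the smoothed distribution function, and no jackknife bias correction is applied.

```python
import math
import random
from statistics import NormalDist

def hs_var_es(returns, p):
    """Historical simulation: empirical p-quantile and the mean of the returns at or below it."""
    ordered = sorted(returns)
    k = max(int(math.floor(p * len(ordered) + 1e-9)), 1)
    tail = ordered[:k]
    return tail[-1], sum(tail) / k

def kde_var_es(returns, p, h):
    """Kernel VaR solves mean(G((q - r_t)/h)) = p; ES = (np)^-1 * sum r_t G((VaR - r_t)/h)."""
    G = NormalDist().cdf                   # integrated standard Gaussian kernel
    n = len(returns)

    def smoothed_cdf(q):
        return sum(G((q - r) / h) for r in returns) / n

    lo, hi = min(returns) - 10.0 * h, max(returns) + 10.0 * h
    for _ in range(60):                    # bisection on the monotone smoothed CDF
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    var = 0.5 * (lo + hi)
    es = sum(r * G((var - r) / h) for r in returns) / (n * p)
    return var, es

# Hypothetical i.i.d. standard Gaussian returns, for illustration only.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
var_hs, es_hs = hs_var_es(sample, p=0.05)
var_kde, es_kde = kde_var_es(sample, p=0.05, h=0.1)
print(var_hs, es_hs, var_kde, es_kde)
```

Both estimators use only the unconditional sample, so neither reacts to changes in volatility within the window, which is the limitation the dynamic specifications are meant to address.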
The results of the 5% and 1% ES forecasts are shown in Table 2³. When the
sample size is small (T = 250), the CARES models perform even worse than historical
simulation and the nonparametric estimators. This is not surprising, as we use tail observations to fit the CARES models, and there are very few tail observations when the sample size
and the coverage probability are both small. The data limitation outweighs the advantage
of the dynamic specification in the CARES models, which results in the inferiority of the
CARES models to the historical simulation and kernel-based nonparametric estimators
in these cases. This is further corroborated by the fact that the 1% ES forecasts from the
CARES models are even worse than the 5% ES forecasts.
However, the CARES models exhibit superior performance to the HS and KDE methods
when the sample size increases. The RMSEs of the 5% ES forecasts from the CARES models are smaller than those from the historical simulation and nonparametric estimators
when the sample size increases to 500, and the RMSEs of the 1% ES forecasts from the CARES
models are smaller than those of the two methods when the sample size increases to 1000. The
advantage of the CARES models becomes more obvious as the sample size increases to
2000. Compared with the CARE model when α takes its true value α0, the performance of our CARES models is quite similar. However, when we use the estimated
value (α̂) of α, every version of our CARES models performs better than the CARE
models regardless of the sample size. These results provide evidence that the need to
estimate α for a given coverage probability level introduces an extra estimation error
and deteriorates the forecasting performance of the CARE model.
We study another two examples to further investigate the advantage of our CARES
model, particularly when the data is from a comprehensive distribution. Consider a
2
We follow Taylor (2008) to find the optimal value of α by estimating models for different values of
α over a grid with step size of 0.0001. The final optimal value of α was derived by linearly interpolating
between grid values.
3
In consistent with Taylor (2008), we found that the asymmetric slope CARE model and CARES
model were outperformed by the symmetric versions of these models. So in the remainder of this paper,
we do not consider further the asymmetric slope version of these models.
16
17
5% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250) (a)          -0.0212   0.3332    -0.0297   0.3413    -0.0416   0.3595    -0.0215   0.3458
HS(500) (b)          NA        NA        -0.0438   0.3298    -0.0449   0.3441    -0.0212   0.3307
KDE (c)              0.1015    0.3466    0.0391    0.3387    0.0127    0.3361    0.0054    0.3327
KDE-JK (d)           -0.0301   0.3351    -0.0109   0.3347    -0.0121   0.3336    -0.0501   0.3317
CARE-SAV(α0) (e)     0.0205    0.3571    0.0067    0.2545    -0.0020   0.2095    0.1729    0.1849
CARE-IG(α0) (f)      0.0232    0.3079    0.0021    0.2534    0.0054    0.1942    0.0325    0.1603
CARE-SAV(α̂)          0.0234    0.4036    0.0077    0.2895    -0.0026   0.2360    0.1967    0.2088
CARE-IG(α̂)           0.0264    0.3481    0.0023    0.2857    0.0062    0.2209    0.0370    0.1824
CARES-SAV (g)        0.0207    0.3577    0.0068    0.2566    -0.0023   0.2092    0.1743    0.1851
CARES-IG (h)         0.0234    0.3085    0.0020    0.2532    0.0055    0.1958    0.0328    0.1616

1% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250)              -0.0616   0.5060    -0.0690   0.5083    -0.0675   0.5413    -0.0518   0.5253
HS(500)              NA        NA        -0.0866   0.4691    -0.0855   0.4961    -0.0506   0.4807
KDE                  0.1377    0.5142    0.1296    0.4682    -0.0428   0.4686    -0.0684   0.4629
KDE-JK               -0.1057   0.4905    -0.1216   0.4815    -0.0412   0.4873    -0.0647   0.4784
CARE-SAV(α0)         -1.5641   2.5858    -0.4956   0.7421    -0.0594   0.4075    -0.0020   0.3090
CARE-IG(α0)          -0.5295   1.3861    -0.3209   0.6170    -0.0134   0.3795    0.0049    0.3086
CARE-SAV(α̂)          -1.7797   2.6620    -0.5639   0.8356    -0.06759  0.4637    -0.0023   0.3516
CARE-IG(α̂)           -0.6025   1.5658    -0.3652   0.6975    -0.0152   0.4280    0.0055    0.3477
CARES-SAV            -1.5770   2.5867    -0.4997   0.7405    -0.0599   0.4109    -0.0020   0.3116
CARES-IG             -0.5339   1.3876    -0.3236   0.6181    -0.0135   0.3793    0.0049    0.3082

(a) Historical simulation with the recent 250 observations.
(b) Historical simulation with the recent 500 observations.
(c) Kernel based ES estimator.
(d) Kernel based ES estimator with Jackknife bias correction.
(e) CARE model with symmetric absolute value specification.
(f) CARE model with indirect GARCH(1,1) specification.
(g) CARES model with symmetric absolute value specification.
(h) CARES model with indirect GARCH(1,1) specification.
Table 2: ES forecasts when data is from the GARCH(1,1)-Gaussian model
normal mixture (NM)−stochastic volatility (SV) model as
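\[
r_t \sim NM(p_1, \dots, p_K;\ \mu_1, \dots, \mu_K;\ \sigma_{1t}^2, \dots, \sigma_{Kt}^2), \qquad \sum_{i=1}^{K} p_i = 1,
\]
\[
\sigma_{it}^2 = \omega_i + \alpha_i \sigma_{i,t-1}^2 + \epsilon_{it}, \qquad i = 1, \dots, K, \tag{19}
\]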
where K represents the number of components in the normal mixture distribution, and the disturbance of volatility follows a standard Gaussian distribution. For simplicity, we use an NM(2)−SV model in the simulation. Following the EUR exchange rate analysis of Alexander and Lazar (2006), the parameters are set to p = 0.6927, ω1 = 0.000046, α1 = 0.0248, ω2 = 0.0004, α2 = 0.1066, µ1 = −1, and µ2 = 1. The data generated from this model have a skewed, leptokurtic conditional density. Figure 3 displays the empirical density of an asset return obtained from this model.
We investigate the forecasting performance of the CARES models by comparing their RMSEs of 1% and 5% ES forecasts with those of historical simulation, the kernel-based nonparametric estimator and the CARE models when the total sample size 500 + T + 1 equals 751, 1001, 1501 and 2501 (that is, T = 250, 500, 1000 and 2000). The first 500 observations are discarded as a burn-in period, the next T observations are used to fit the CARES models, and the last observation is held out for the one-step-ahead out-of-sample evaluation of the 1% and 5% ES forecasts. Table 3 presents the simulation results. The RMSEs of the 5% and 1% ES from the CARES models are smaller than those of the two nonparametric methods when the sample size T is not less than 500. Moreover, the advantage of the CARES models over the two nonparametric methods becomes more pronounced as T increases. The reduction in RMSE from HS and KDE to the CARES models represents the benefit of exploiting the dynamic pattern of the tail observations in the ES forecasts. Again, the CARES models outperform the corresponding CARE models (footnote 4) in all the cases.
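The sample split and the historical simulation benchmark described above can be sketched as follows. This is only an illustration under stated assumptions: the function names `split_sample` and `hs_expected_shortfall` are hypothetical, the lower-tail convention follows the indicator I(yt < ft) used in Appendix A, and the Gaussian series merely stands in for data simulated from the model.

```python
import numpy as np

def split_sample(series, T, burn_in=500):
    """Drop the burn-in, keep the next T points for fitting, hold out the last point."""
    series = np.asarray(series, dtype=float)
    if series.size != burn_in + T + 1:
        raise ValueError("expected burn_in + T + 1 observations")
    fit = series[burn_in:burn_in + T]
    holdout = series[burn_in + T]
    return fit, holdout

def hs_expected_shortfall(returns, level):
    """Historical-simulation ES: mean of the returns at or below the empirical quantile."""
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, level)   # empirical VaR (lower-tail quantile)
    tail = returns[returns <= var]      # never empty: the sample minimum is always included
    return tail.mean()

# Placeholder data (ASSUMPTION): i.i.d. Gaussian draws instead of the simulated model.
rng = np.random.default_rng(0)
series = rng.standard_normal(500 + 250 + 1)
fit, holdout = split_sample(series, T=250)
es_5 = hs_expected_shortfall(fit[-250:], 0.05)   # HS(250) uses the most recent 250 points
```

By construction the estimate lies at or below the empirical quantile, which is a quick sanity check on the tail convention.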
As the last example, we consider a GARCH(1,1) model with time-varying skewness and kurtosis. This model is defined as follows:
\[
r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + b_0^{+} (r_{t-1}^{+})^2 + b_0^{-} (r_{t-1}^{-})^2 + c_0 \sigma_{t-1}^2 .
\]
The disturbance zt follows a generalized Student-t distribution (footnote 5)
4
The true value of α is unknown when the return follows the normal mixture (NM)−stochastic volatility (SV) model, so we use only the estimated value of α to implement the CARE models.
5
The density of the generalized t distribution (GT) is defined by
\[
gt(z|\eta, \lambda) =
\begin{cases}
bc \left( 1 + \dfrac{1}{\eta-2} \left( \dfrac{bz+a}{1-\lambda} \right)^2 \right)^{-(\eta+1)/2} & \text{if } z < -a/b, \\[2ex]
bc \left( 1 + \dfrac{1}{\eta-2} \left( \dfrac{bz+a}{1+\lambda} \right)^2 \right)^{-(\eta+1)/2} & \text{if } z \ge -a/b,
\end{cases}
\]
1% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250)              -0.8674   1.4642    -0.9180   1.5338    -0.8926   1.5268    -0.8898   1.5377
HS(500)              NA        NA        -0.8801   1.4517    -0.8431   1.4364    -0.8414   1.4475
KDE                  -1.0164   1.5433    -1.0292   1.5686    -0.9487   1.4946    -0.9126   1.4746
KDE-JK               -0.5730   1.5395    -0.7225   1.5435    -0.7427   1.4850    -0.7719   1.4567
CARE-SAV             -1.2134   2.0986    -0.9982   1.4987    -0.7896   1.4754    -0.8576   1.4621
CARE-IG              -1.3125   2.1321    -0.8765   1.4890    -0.8654   1.4896    -0.8976   1.4687
CARES-SAV            -1.0397   1.8914    -0.8267   1.4500    -0.8290   1.4591    -0.8453   1.4389
CARES-IG             -0.7957   1.6220    -0.8990   1.4541    -0.8314   1.4549    -0.8486   1.4447

5% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250)              -0.5968   1.9950    -0.6505   1.2530    -0.6156   1.2430    -0.6021   1.2431
HS(500)              NA        NA        -0.6351   1.2315    -0.6003   1.2231    -0.5953   1.2244
KDE                  -0.6286   1.2172    -0.6726   1.2677    -0.6298   1.2423    -0.6129   1.2272
KDE-JK               -0.4028   1.2160    -0.4988   1.2520    -0.4992   1.2334    -0.5152   1.2229
CARE-SAV             0.3125    1.2521    0.6086    1.2701    -0.6473   1.2323    0.6437    1.2196
CARE-IG              0.2987    1.2529    0.6167    1.2767    -0.7665   1.2333    0.6018    1.2212
CARES-SAV            -0.5627   1.2299    -0.6237   1.2500    -0.5899   1.2189    -0.5923   1.2154
CARES-IG             -0.5588   1.2222    -0.6259   1.2541    -0.5907   1.2242    -0.5924   1.2188

Table 3: ES forecasts when data is from a NM(2)−SV model
[Figure 3 plot omitted: density of the NM(2)−SV model compared with the standard Gaussian density]
Figure 3: The density of an asset return obtained from a NM(2)−SV model
with time-varying asymmetry parameter λt and tail-fatness parameter ηt as zt ∼ GT (zt |ηt , λt ),
where
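\[
\begin{aligned}
\tilde{\eta}_t &= a_1 + b_1^{+} y_{t-1}^{+} + b_1^{-} y_{t-1}^{-} + c_1 \tilde{\eta}_{t-1}, \\
\tilde{\lambda}_t &= a_2 + b_2 y_{t-1}^2 + c_2 \tilde{\lambda}_{t-1}, \\
\eta_t &= g_{[2,+30]}(\tilde{\eta}_t), \qquad \lambda_t = g_{[-1,1]}(\tilde{\lambda}_t),
\end{aligned}
\tag{20}
\]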
and g represents the logistic map.
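A minimal sketch of the shape-parameter updates in (20) is given below. Two points are assumptions rather than statements from the text: the logistic map is taken to be g_[L,U](x) = L + (U − L)/(1 + exp(−x)), and the positive and negative parts are taken to be max(y, 0) and max(−y, 0). The function names are hypothetical.

```python
import math

def logistic_map(x, lower, upper):
    """Map a real number into the open interval (lower, upper) (ASSUMED functional form)."""
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:                       # numerically stable branch for negative x
        e = math.exp(x)
        s = e / (1.0 + e)
    return lower + (upper - lower) * s

def update_shape(eta_tilde_prev, lam_tilde_prev, y_prev,
                 a1, b1_pos, b1_neg, c1, a2, b2, c2):
    """One step of the unrestricted recursions followed by the logistic constraints."""
    y_pos = max(y_prev, 0.0)    # ASSUMPTION: positive part of the lagged return
    y_neg = max(-y_prev, 0.0)   # ASSUMPTION: negative part of the lagged return
    eta_tilde = a1 + b1_pos * y_pos + b1_neg * y_neg + c1 * eta_tilde_prev
    lam_tilde = a2 + b2 * y_prev ** 2 + c2 * lam_tilde_prev
    eta = logistic_map(eta_tilde, 2.0, 30.0)    # tail-fatness parameter in (2, 30)
    lam = logistic_map(lam_tilde, -1.0, 1.0)    # asymmetry parameter in (-1, 1)
    return eta_tilde, lam_tilde, eta, lam

# With all coefficients set to zero the constrained values sit at the interval midpoints.
state = update_shape(0.0, 0.0, 0.5, 0, 0, 0, 0, 0, 0, 0)
```

The point of the constraint is that the unrestricted recursions can wander over the whole real line while the density parameters stay in their admissible ranges.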
Following the S&P500 stock index return analysis of Jondeau and Rockinger (2003), the parameters of the model in our simulation are set to a0 = 0.0074, b0+ = 0.0384, b0− = 0.0759, c0 = 0.9366, a1 = −0.5191, b1+ = −0.5615, b1− = −0.0653, c1 = 0.5999, a2 = −0.0062, b2 = 0.0626, and c2 = 0.6961. This model accommodates not only time-varying volatility, but also time-dependent higher-order moments (skewness and kurtosis). Figure 4 displays the empirical density of an asset return obtained from this model. The density exhibits a very unusual shape with strong skewness and kurtosis, which cannot be captured by a simple distribution.
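The variance recursion of this example can be simulated as in the sketch below, using the values of a0, b0+, b0− and c0 quoted in the text. It is not the simulation design of the paper: the innovations are standard Gaussian as a stand-in for the GT innovations (an assumption made to keep the example short), the positive and negative parts are assumed to be max(r, 0) and max(−r, 0), and the function name is hypothetical.

```python
import numpy as np

# Variance-equation parameters quoted in the text.
A0, B0_POS, B0_NEG, C0 = 0.0074, 0.0384, 0.0759, 0.9366

def simulate_asymmetric_garch(n, seed=0, burn_in=500):
    """Simulate r_t = sigma_t * z_t with an asymmetric GARCH(1,1) variance."""
    rng = np.random.default_rng(seed)
    total = n + burn_in
    z = rng.standard_normal(total)   # ASSUMPTION: Gaussian stand-in for GT innovations
    r = np.empty(total)
    sigma2 = np.empty(total)
    # Start at the unconditional variance implied by symmetric unit-variance innovations.
    sigma2[0] = A0 / (1.0 - C0 - 0.5 * (B0_POS + B0_NEG))
    r[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, total):
        r_pos = max(r[t - 1], 0.0)    # ASSUMPTION: positive part of the lagged return
        r_neg = max(-r[t - 1], 0.0)   # ASSUMPTION: negative part of the lagged return
        sigma2[t] = A0 + B0_POS * r_pos ** 2 + B0_NEG * r_neg ** 2 + C0 * sigma2[t - 1]
        r[t] = np.sqrt(sigma2[t]) * z[t]
    return r[burn_in:], sigma2[burn_in:]

r, s2 = simulate_asymmetric_garch(1000)
```

Because b0− exceeds b0+, a negative return raises next-period variance more than a positive return of the same size.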
To study the forecasting performance of the CARES models, we compare their RMSEs of
1% and 5% ES forecasts with those of historical simulation, kernel based nonparametric
where \( a \equiv 4\lambda c \, \frac{\eta-2}{\eta-1} \), \( b^2 \equiv 1 + 3\lambda^2 - a^2 \), and \( c \equiv \frac{\Gamma((\eta+1)/2)}{\sqrt{\pi(\eta-2)}\,\Gamma(\eta/2)} \).
[Figure 4 plot omitted: density of the GARCH model with time-varying skewness and kurtosis compared with the standard Gaussian density]
Figure 4: The density of an asset return obtained from a GARCH model with time-varying skewness and kurtosis
estimator and the CARE models when the total sample size 500 + T + 1 equals 751, 1001, 1501 and 2501 (that is, T = 250, 500, 1000 and 2000). In each simulation, the first 500 observations are discarded as a burn-in period, the next T observations are used to fit the CARES model, and the last observation is held out for the out-of-sample forecasting comparison. The RMSE is computed from 10^4 replications of the simulation. The simulation results are reported in Table 4. Due to the lack of data, the RMSEs of the 1% and 5% ES from the CARES models are larger than those from the two nonparametric methods when the sample size is small (T = 250). However, as the sample size increases, even to a moderate T = 500, the CARES models perform better than the two nonparametric methods by showing the smallest RMSE. Intuitively, the reason is that the CARES models specify a dynamic parametric structure for the tail observations, whereas HS and KDE do not account for the time-varying volatility of the returns. Meanwhile, the CARES models again outperform the corresponding CARE models (footnote 6) in all the situations.
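For reference, the two summary statistics reported in the simulation tables can be computed across replications as in the following sketch. It assumes the usual definitions (mean forecast error and root mean squared forecast error); the function name and the toy inputs are hypothetical.

```python
import numpy as np

def bias_and_rmse(forecasts, true_values):
    """Bias and RMSE of ES forecasts across simulation replications."""
    forecasts = np.asarray(forecasts, dtype=float)
    true_values = np.asarray(true_values, dtype=float)
    errors = forecasts - true_values
    bias = errors.mean()
    rmse = np.sqrt(np.mean(errors ** 2))
    return bias, rmse

# Toy example: errors of -1 and +1 cancel in the bias but not in the RMSE.
bias, rmse = bias_and_rmse([1.0, 3.0], [2.0, 2.0])
```

Under these definitions the squared RMSE equals the squared bias plus the variance of the errors, so the RMSE can never be smaller than the absolute bias.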
6
The true value of α is also unknown in this case. So we only use the estimated value of alpha to
implement the CARE models.
1% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250)              -0.2034   1.2972    0.0879    0.9296    0.1112    1.0771    -0.0565   1.0329
HS(500)              NA        NA        0.1314    0.8265    0.1453    0.8101    0.0265    0.7536
KDE                  -0.0188   1.1222    0.1386    0.8289    0.1178    0.7989    -0.1107   0.5472
KDE-JK               -0.0177   1.1184    0.1314    0.8264    0.1127    0.7970    -0.1069   0.5475
CARE-SAV             1.8780    1.4538    1.6347    1.3014    0.6698    0.7503    0.2324    0.5381
CARE-IG              1.9629    1.4079    1.7694    1.3231    0.7926    0.7761    0.1356    0.5247
CARES-SAV            -2.3325   1.3282    -1.0849   1.2801    0.0212    0.7129    -0.0872   0.5058
CARES-IG             -2.2489   1.3477    -1.4831   1.1887    0.0163    0.7085    -0.1073   0.5027

5% ES
                     T=250               T=500               T=1000              T=2000
                     Bias      RMSE      Bias      RMSE      Bias      RMSE      Bias      RMSE
HS(250)              -0.0238   0.4930    -0.0482   0.4598    -0.0015   0.3100    0.0525    0.2934
HS(500)              NA        NA        -0.0661   0.4575    -0.0245   0.3095    0.0220    0.2648
KDE                  0.0108    0.4863    -0.0685   0.4560    -0.0930   0.3277    -0.0241   0.2721
KDE-JK               -0.0077   0.4862    -0.0555   0.4579    -0.0844   0.3295    -0.0176   0.2724
CARE-SAV             0.3614    2.2076    0.0321    0.4507    0.4534    0.4621    0.3756    0.3702
CARE-IG              0.3457    2.1987    -0.0487   0.4514    0.5866    0.4637    0.3902    0.3678
CARES-SAV            -0.2914   0.7197    -0.0958   0.4300    -0.1002   0.3031    -0.0120   0.1683
CARES-IG             -0.3611   2.1917    -0.1031   0.4384    -0.1583   0.3074    0.0102    0.1764

Table 4: ES forecasts when data is from a GARCH(1,1) model with time-varying skewness and kurtosis
5
Empirical Analysis
To implement our CARES model on real data, we conduct a simple empirical study to
assess the expected shortfall of some stock indices and individual stocks. We shall try
different CARES model specifications for each index and stock, and then evaluate both
in-sample and out-of-sample forecasting performance of these specifications.
5.1
Data
We consider two individual stocks, General Motors (GM) and IBM, and one stock index, the S&P500, in the empirical study. Following Engle and Manganelli (2004), we first take a sample of 3,392 daily prices from Datastream for each of them, spanning April 7, 1986 to April 7, 1999, to see whether the ES estimates from our CARES model provide the same risk indication as the VaR estimates from the CAViaR model did. Second, we take a recent sample of daily prices from Wharton Research Data Services (WRDS) for the same two stocks and one index, ranging from Jan 1, 2005 to Dec 31, 2011. This sample period covers the recent global financial crisis, so it is useful for studying whether these stocks and the index were riskier during the crisis and whether our CARES model is able to capture this effect. The daily returns are computed as 100 times the difference of the log of the prices.
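The return definition above amounts to the following short computation; the function name and the three example prices are hypothetical.

```python
import numpy as np

def percent_log_returns(prices):
    """Daily returns as 100 times the first difference of log prices."""
    prices = np.asarray(prices, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive")
    return 100.0 * np.diff(np.log(prices))

# Three prices give two returns.
returns = percent_log_returns([100.0, 101.0, 99.0])
```

A series of n prices yields n − 1 returns, and the returns sum to 100 times the log of the last price over the first.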
5.2
Empirical Results
For the first sample, we use the first 2,892 observations to estimate the CARES models, and leave the last 500 observations for out-of-sample forecasting. We estimate 1% and 5% 1-day-ahead ESs using the CARES specifications discussed in Section 3.1. The 5% VaR and ES estimates for GM are plotted in Figure 5, and all of the estimation results are reported in Table 5.
The top panel of Figure 5 plots the 5% VaR and ES estimates from the CARES symmetric absolute value specification for GM (footnote 7), and the bottom panel plots the 5% VaR and ES estimates from the CARES indirect GARCH specification for GM. The ES plot has a pattern very similar to that of the VaR plot, with a spike at the beginning of the sample marking the 1987 crash and an increase toward the end of the sample, which reflects the increased volatility following the Russian and Asian crises.
7
The plot exhibits the same trend as Figure 1 in Engle and Manganelli (2004), and the only difference
is that VaR is reported as a negative number rather than positive one.
2.1648
(1.5962)
(1.4586)
(1.6313)
0.2143
(0.2300)
(0.2058)
(0.2391)
1.8901
(0.8261)
(0.8147)
(0.8416)
-0.0802
(1.2069)
(1.2582)
(1.1975)
0.9001
(0.9532)
(1.0297)
(1.0202)
0.6174
(4.7204)
(5.1286)
(5.1226)
b
k-nearest neighbour estimator
Koenker’s bandwidth with Bofinger’s hT
c
Koenker’s bandwidth with Hall and Sheather’s hT
a
1% ES
γ1
(Std1)a
(Std2)b
(Std3)c
γ2
(Std1)
(Std2)
(Std3)
γ3
(Std1)
(Std2)
(Std3)
-0.0440
(0.5396)
(0.6173)
(0.5791)
0.6981
(0.4026)
(0.4128)
(0.4085)
1.8698
(2.4376)
(2.4309)
(2.4376)
Symmetric Absolute Value
GM
IBM
S&P500
-0.9024
1.1747
-0.2470
(1.0177) (5.4666) (0.0703)
(1.0245) (9.1299) (0.0710)
(1.0161) (12.2810) (0.0705)
0.9171
0.9101
0.8919
(0.0878) (0.1034) (0.0214)
(0.0883) (0.1744) (0.0216)
(0.0876) (0.2327) (0.0215)
1.3730
0.9595
2.8273
(7.5860) (5.3782) (13.1030)
(7.5899) (5.4315) (13.0990)
(7.5832) (5.3252) (13.1010)
Indirect GARCH
GM
IBM
S&P500
Table 5: Estimation Results of the CARES models (Part A)
0.6092
(0.5707)
(0.9244)
(0.9737)
0.6488
(0.2449)
(0.3578)
(0.3710)
0.5755
(0.4205)
(0.5077)
(0.5125)
0.0717
(0.3806)
(0.3916)
(0.3896)
0.9190
(0.2156)
(0.2084)
(0.2095)
0.1924
(0.3890)
(0.3584)
(0.3629)
b
k-nearest neighbour estimator
Koenker’s bandwidth with Bofinger’s hT
c
Koenker’s bandwidth with Hall and Sheather’s hT
a
5% ES
γ1
(Std1)a
(Std2)b
(Std3)c
γ2
(Std1)
(Std2)
(Std3)
γ3
(Std1)
(Std2)
(Std3)
0.2367
(0.3854)
(0.4094)
(0.3812)
0.5610
(0.2775)
(0.2769)
(0.2588)
1.1267
(0.8126)
(0.8020)
(0.7828)
Symmetric Absolute Value
GM
IBM
S&P500
2.0529
0.6679
1.0458
(1.8022) (2.2157) (3.3992)
(2.1404) (1.0939) (3.1835)
(2.5239) (1.2997) (3.1485)
0.6886
0.8764
0.3302
(0.0993) (0.1320) (0.3156)
(0.1141) (0.0628) (0.2621)
(0.1336) (0.0756) (0.2628)
0.8747
0.3883
3.8413
(1.2039) (0.5361) (2.6250)
(1.2336) (0.4246) (2.6027)
(1.2024) (0.4433) (2.6138)
Indirect GARCH
GM
IBM
S&P500
Figure 5: 5% VaR and ES estimates from CARES models for GM
This shows that the ES estimates from the CARES model are able to produce the same risk indication as VaR, and can be regarded as an alternative risk measure to the VaR estimates from the CAViaR model.
For the second sample, we use the first 1,262 observations to estimate the CARES models, and again leave the last 500 observations for out-of-sample forecasting. We estimate 1% and 5% 1-day-ahead ESs using the two CARES specifications discussed in Section 3.1. The 5% VaR and ES estimates for IBM and S&P 500 are plotted in Figure 6 and Figure 7, respectively. The VaR and ES estimates are reported as negative numbers in these plots. The common spike in the middle of the sample (between the end of 2008 and 2009) corresponds to the global financial crisis, and the increased risk toward the end of the sample reflects the recent Euro zone crisis.
All of the estimation results are reported in Table 6. The table presents the values of the estimated parameters and the corresponding standard errors. The most striking result is that the coefficient of the autoregressive term in the CARES model is always very significant. This confirms that volatility clustering is also relevant in the tails.
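One way to read the size of an autoregressive coefficient is through the half-life of a shock. The sketch below assumes a first-order recursion in which a shock's contribution decays geometrically at the rate of the autoregressive coefficient; the exact CARES recursions are given in Section 3.1, so this is an interpretation aid rather than part of the estimation. The function name and the value 0.9 are illustrative.

```python
import math

def shock_half_life(ar_coefficient):
    """Periods needed for a shock's contribution to halve under geometric decay."""
    if not 0.0 < ar_coefficient < 1.0:
        raise ValueError("half-life is defined here only for coefficients in (0, 1)")
    return math.log(0.5) / math.log(ar_coefficient)

# Illustrative value only: a coefficient of 0.9 implies a half-life of about 6.6 periods.
half_life = shock_half_life(0.9)
```

Coefficients close to one therefore imply that a tail shock keeps influencing the ES forecast for many trading days.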
Figure 6: 5% VaR and ES estimates from CARES models for IBM
Figure 7: 5% VaR and ES estimates from CARES models for S&P 500
0.1538
(0.2907)
(0.1524)
(0.3747)
0.8731
(0.0563)
(0.0629)
(0.0967)
0.5127
(0.3464)
(0.2868)
(0.1213)
0.3703
(0.5324)
(0.5650)
(0.5801)
0.7000
(0.3401)
(0.3631)
(0.3708)
0.4603
(0.4445)
(0.4727)
(0.4749)
b
k-nearest neighbour estimator
Koenker’s bandwidth with Bofinger’s hT
c
Koenker’s bandwidth with Hall and Sheather’s hT
a
1% ES
γ1
(Std1)a
(Std2)b
(Std3)c
γ2
(Std1)
(Std2)
(Std3)
γ3
(Std1)
(Std2)
(Std3)
0.0304
(0.0626)
(0.0592)
(0.0592)
0.9250
(0.1532)
(0.1272)
(0.1293)
0.2768
(0.6114)
(0.5070)
(0.5157)
Symmetric Absolute Value
GM
IBM
S&P500
0.5621
0.8743
0.3303
(0.1690) (0.8722) (0.2458)
(0.1440) (0.8765) (0.2391)
(0.1440) (0.8877) (0.2374)
0.9456
0.6490
0.8902
(0.0975) (0.1930) (0.0201)
(0.0974) (0.1966) (0.0210)
(0.0974) (0.2000) (0.0211)
0.5515
1.0359
0.7826
(0.3690) (0.5886) (0.6683)
(0.3690) (0.6323) (0.6334)
(0.3690) (0.6338) (0.6341)
Indirect GARCH
GM
IBM
S&P500
Table 6: Estimation Results of the CARES models (Part B)
0.0139
(0.0560)
(0.0560)
(0.0560)
0.9192
(0.0604)
(0.0594)
(0.0594)
0.2480
(0.1490)
(0.1089)
(0.1156)
0.1123
(0.1458)
(0.2173)
(0.1956)
0.7875
(0.1871)
(0.2700)
(0.2424)
0.3686
(0.2720)
(0.3709)
(0.3344)
b
k-nearest neighbour estimator
Koenker’s bandwidth with Bofinger’s hT
c
Koenker’s bandwidth with Hall and Sheather’s hT
a
5% ES
γ1
(Std1)a
(Std2)b
(Std3)c
γ2
(Std1)
(Std2)
(Std3)
γ3
(Std1)
(Std2)
(Std3)
0.0442
(0.0483)
(0.0562)
(0.0550)
0.9081
(0.0507)
(0.0473)
(0.0502)
0.2297
(0.1266)
(0.1139)
(0.1241)
Symmetric Absolute Value
GM
IBM
S&P500
0.0736
0.2058
0.0757
(0.0320) (0.1246) (0.0465)
(0.0372) (0.1154) (0.0505)
(0.0330) (0.1168) (0.0522)
0.9322
0.7016
0.9219
(0.0065) (0.0521) (0.0061)
(0.0065) (0.0471) (0.0060)
(0.0065) (0.0469) (0.0059)
0.3837
0.8893
0.3533
(0.2158) (0.4671) (0.2101)
(0.2153) (0.4078) (0.2212)
(0.2156) (0.4231) (0.1954)
Indirect GARCH
GM
IBM
S&P500
6
Conclusion
We have proposed a new model for ES estimation. Most existing methods estimate the distribution of the returns and then recover its quantile, and the expected value of the exceedances beyond the quantile, in an indirect way. In contrast, we directly model the quantile and the expected value of the exceedances beyond the quantile. To do this, we introduce a new class of models, the CARES models, which use the CAViaR model for quantile estimation and specify the evolution over time of the expected value of the exceedances beyond the quantile using a special type of autoregressive process. We estimate the unknown parameters by a two-stage procedure, and derive the limiting theory of these parameter estimators within a GMM framework. A simulation study comparing the new model with some existing methods shows that the new model performs well with a moderate sample size. Applications to real data illustrate the ability of the new model to adapt to new risk environments.
Appendix A
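As the estimator θ̂ = (β̂, γ̂) can be asymptotically regarded as a GMM estimator, its asymptotic distribution can be established within a GMM framework.
In our particular problem, θ̂ can be identified as
\[
\hat{\theta} = \arg\min_{\theta} \{ Q_n(\theta) \},
\]
where
\[
\begin{aligned}
Q_n(\theta) &= m_n(\theta)' V_n^{-1} m_n(\theta), \\
m_n(\theta) &= \frac{1}{n} \sum_{t=1}^{n} \varphi_t(\theta)
= \begin{bmatrix}
\frac{1}{n} \sum_{t=1}^{n} \nabla_\beta f_t(\beta) \cdot (\tau - I(y_t < f_t(\beta))) \\
\frac{1}{n} \sum_{t=1}^{n} \nabla_\gamma g_t(\gamma) \cdot (y_t - g_t(\gamma)) \cdot I(y_t < f_t(\beta))
\end{bmatrix}, \\
E_0[\varphi_t(\theta)] &= 0, \qquad V_n \overset{P}{\longrightarrow} V,
\end{aligned}
\]
where E0 denotes expectation, and V is the weighting matrix.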
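The moment vector and the quadratic-form objective can be evaluated numerically as in the sketch below. It is a generic illustration, not the authors' estimation code: the function names are hypothetical, the gradients are passed in as arrays, and the usage example assumes the simplest possible specification (constant quantile and constant ES) with an identity weighting matrix.

```python
import numpy as np

def moment_contributions(y, f, g, grad_f, grad_g, tau):
    """Stack the quantile block and the ES block of phi_t into an (n, p + q) array."""
    y, f, g = (np.asarray(a, dtype=float) for a in (y, f, g))
    grad_f = np.asarray(grad_f, dtype=float)
    grad_g = np.asarray(grad_g, dtype=float)
    hit = (y < f).astype(float)                       # I(y_t < f_t)
    quantile_block = grad_f * (tau - hit)[:, None]    # grad f_t * (tau - I)
    es_block = grad_g * ((y - g) * hit)[:, None]      # grad g_t * (y_t - g_t) * I
    return np.hstack([quantile_block, es_block])

def gmm_objective(phi, V):
    """Q_n = m_n' V^{-1} m_n, with m_n the sample mean of the moment contributions."""
    m = phi.mean(axis=0)
    return float(m @ np.linalg.solve(V, m))

# ASSUMPTION for the example: constant models f_t = beta and g_t = gamma, so both gradients are 1.
rng = np.random.default_rng(1)
y = rng.standard_normal(1000)
tau = 0.05
beta = np.quantile(y, tau)
gamma = y[y < beta].mean()
ones = np.ones((y.size, 1))
phi = moment_contributions(y, np.full(y.size, beta), np.full(y.size, gamma), ones, ones, tau)
q = gmm_objective(phi, np.eye(2))
```

In this constant case the empirical quantile and the tail mean set both sample moments to (numerically) zero, so the objective is essentially zero at those values.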
Proof of Theorem 3.1 In order to establish the consistency of the estimator θ̂, we require the following assumptions:
Assumption 6.1. Denote m0(θ) = E0[ϕt(θ)]. Then \( \sup_{\theta \in \Theta} |m_n(\theta) - m_0(\theta)| \overset{P}{\longrightarrow} 0 \), where | · | is the Euclidean norm. This assumption ensures that mn(θ) converges uniformly to m0(θ) in probability.
Assumption 6.2. For all θ ∈ Θ such that ||θ − θ0|| > ε, we have Q0(θ) − Q0(θ0) > 0. This assumption ensures that the population objective function Q0(θ) has a unique minimum at θ0.
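Define the population objective function as Q0(θ) = E0[ϕt(θ)]′ V^{−1} E0[ϕt(θ)]. Then, under Assumption 6.1, we have
\[
\begin{aligned}
\sup_{\theta\in\Theta} |Q_n(\theta) - Q_0(\theta)|
&= \sup_{\theta\in\Theta} \big| m_n(\theta)' V_n^{-1} m_n(\theta) - E_0[\varphi(\omega_i,\theta)]' V^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&= \sup_{\theta\in\Theta} \big| m_n(\theta)' V_n^{-1} m_n(\theta) - m_n(\theta)' V_n^{-1} E_0[\varphi(\omega_i,\theta)] \\
&\qquad\quad + m_n(\theta)' V_n^{-1} E_0[\varphi(\omega_i,\theta)] - m_n(\theta)' V^{-1} E_0[\varphi(\omega_i,\theta)] \\
&\qquad\quad + m_n(\theta)' V^{-1} E_0[\varphi(\omega_i,\theta)] - E_0[\varphi(\omega_i,\theta)]' V^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&\le \sup_{\theta\in\Theta} \big| m_n(\theta)' V_n^{-1} m_n(\theta) - m_n(\theta)' V_n^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&\quad + \sup_{\theta\in\Theta} \big| m_n(\theta)' V_n^{-1} E_0[\varphi(\omega_i,\theta)] - m_n(\theta)' V^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&\quad + \sup_{\theta\in\Theta} \big| m_n(\theta)' V^{-1} E_0[\varphi(\omega_i,\theta)] - E_0[\varphi(\omega_i,\theta)]' V^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&= \sup_{\theta\in\Theta} \big| m_n(\theta)' V_n^{-1} (m_n(\theta) - E_0[\varphi(\omega_i,\theta)]) \big| \\
&\quad + \sup_{\theta\in\Theta} \big| m_n(\theta)' (V_n^{-1} - V^{-1}) E_0[\varphi(\omega_i,\theta)] \big| \\
&\quad + \sup_{\theta\in\Theta} \big| (m_n(\theta)' - E_0[\varphi(\omega_i,\theta)]') V^{-1} E_0[\varphi(\omega_i,\theta)] \big| \\
&\le \sup_{\theta\in\Theta} |m_n(\theta)' V_n^{-1}| \, \sup_{\theta\in\Theta} |m_n(\theta) - E_0[\varphi(\omega_i,\theta)]| \\
&\quad + \sup_{\theta\in\Theta} |m_n(\theta)'| \, \sup_{\theta\in\Theta} |V_n^{-1} - V^{-1}| \, \sup_{\theta\in\Theta} |E_0[\varphi(\omega_i,\theta)]| \\
&\quad + \sup_{\theta\in\Theta} |m_n(\theta)' - E_0[\varphi(\omega_i,\theta)]'| \, \sup_{\theta\in\Theta} |V^{-1} E_0[\varphi(\omega_i,\theta)]| \\
&\overset{P}{\longrightarrow} 0 .
\end{aligned}
\]
That is,
\[
\sup_{\theta\in\Theta} |Q_n(\theta) - Q_0(\theta)| \overset{P}{\longrightarrow} 0 .
\]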
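Then, let ε > 0 be an arbitrarily small real number. Suppose ||θ − θ0|| > ε. By Assumption 6.2, there exists a δ > 0 such that Q0(θ) − Q0(θ0) > δ. Then,
\[
\begin{aligned}
Pr(||\hat{\theta} - \theta_0|| \ge \varepsilon)
&\le Pr(Q_0(\hat{\theta}) - Q_0(\theta_0) \ge \delta) \\
&= Pr(Q_0(\hat{\theta}) - Q_n(\hat{\theta}) + Q_n(\hat{\theta}) - Q_0(\theta_0) \ge \delta) \\
&= Pr(Q_0(\hat{\theta}) - Q_n(\hat{\theta}) + Q_n(\theta_0) + o_p(1) - Q_0(\theta_0) \ge \delta) \\
&\le Pr[(|Q_0(\hat{\theta}) - Q_n(\hat{\theta})| \ge \delta) \cup (|Q_n(\theta_0) - Q_0(\theta_0)| \ge \delta)] \\
&\le Pr[2 \sup_{\theta\in\Theta} |Q_0(\theta) - Q_n(\theta)| \ge \delta].
\end{aligned}
\]
We have \( \sup_{\theta\in\Theta} |Q_n(\theta) - Q_0(\theta)| \overset{P}{\longrightarrow} 0 \) from the above proof, therefore Pr(||θ̂ − θ0|| ≥ ε) → 0, or equivalently
\[
\hat{\theta} \overset{P}{\longrightarrow} \theta_0 .
\]
So far, we have proved that θ̂ is a consistent estimator.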
Proof of Theorem 3.2 The proof builds on the asymptotic theory of GMM estimators along with Theorem 3 of Huber (1967). One of the assumptions in the asymptotic theory of GMM estimators is that the objective function is twice continuously differentiable with respect to the parameters, but this assumption does not hold for our objective function Qn(θ), as mn(θ) is not continuously differentiable. Hence, we approximate mn(θ) by m0(θ), which is continuously differentiable, and derive the asymptotic distribution of θ̂ from this approximation.
The assumptions required are as follows:
Assumption 6.3. The parameter space Θ is compact.
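Assumption 6.4. The matrix \( \frac{\partial m_0(\hat{\theta})'}{\partial \theta} V^{-1} \frac{\partial m_0(\tilde{\theta})}{\partial \theta'} \) is non-singular and has an inverse for all θ ∈ Θ.
Assumption 6.5. \( E_0\left( \left\| \frac{\partial \varphi(\omega_i, \theta)}{\partial \theta'} \right\| \right) \) is uniformly bounded above on the parameter space Θ, that is, \( E_0\left( \sup_{\theta\in\Theta} \left\| \frac{\partial \varphi(\omega_i, \theta)}{\partial \theta'} \right\| \right) < +\infty \).
Assumption 6.6. The variance of ϕ(ωi, θ) is finite, that is, \( Var_0(\varphi(\omega_i, \theta)) = E_0(\varphi(\omega_i, \theta) \varphi(\omega_i, \theta)') < +\infty \).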
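Let the expectation of the first order condition, m0(θ), and its derivative be
\[
m_0(\theta) = E_0[\varphi_t(\theta)] =
\begin{bmatrix}
E\big( \nabla f_t(\beta(\tau)) \cdot (\tau - I(y_t < f_t(\beta(\tau)))) \big) \\
E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot (y_t - g_t(\gamma(\tau))) \cdot I(y_t < f_t(\beta(\tau))) \big)
\end{bmatrix},
\qquad
\nabla_\theta m_0(\theta) = D_\theta =
\begin{bmatrix}
D_{11} & D_{12} \\
D_{21} & D_{22}
\end{bmatrix},
\]
where the four elements in the Dθ matrix are listed as follows: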
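\[
\begin{aligned}
D_{11} &= \frac{\partial}{\partial \beta(\tau)} E\big( \nabla_\beta f_t(\beta(\tau)) \cdot (\tau - I(y_t - f_t(\beta(\tau)) < 0)) \,\big|\, F_{t-1} \big) \\
&= \frac{\partial}{\partial \beta(\tau)} E\Big( \nabla_\beta f_t(\beta(\tau)) \cdot \tau - \nabla_\beta f_t(\beta(\tau)) \cdot \int_{-\infty}^{f_t(\beta(\tau))} h(y|F_{t-1}) \, dy \Big) \\
&= \frac{\partial^2}{\partial \beta(\tau) \partial \beta(\tau)'} E\big( f_t(\beta(\tau)) \cdot \tau \big)
 - \frac{\partial^2}{\partial \beta(\tau) \partial \beta(\tau)'} E\Big( f_t(\beta(\tau)) \cdot \int_{-\infty}^{f_t(\beta(\tau))} h(y|F_{t-1}) \, dy \Big) \\
&\quad - E\big( \nabla_\beta f_t(\beta(\tau)) \cdot \nabla_\beta f_t(\beta(\tau))' \, h(f_t(\beta(\tau))|F_{t-1}) \big) \\
&= -E\big( \nabla_\beta f_t(\beta(\tau)) \cdot \nabla_\beta f_t(\beta(\tau))' \, h(0|F_{t-1}) \big), \\[1ex]
D_{12} &= \frac{\partial}{\partial \gamma(\tau)} E\big( \nabla_\beta f_t(\beta(\tau)) \cdot (\tau - I(y_t - f_t(\beta(\tau)) < 0)) \,\big|\, F_{t-1} \big) = 0, \\[1ex]
D_{21} &= \frac{\partial}{\partial \beta(\tau)} E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot (y_t - g_t(\gamma(\tau))) \, I(y_t - f_t(\beta(\tau)) < 0) \big) \\
&= \frac{\partial}{\partial \beta(\tau)} E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot (y_t - g_t(\gamma(\tau))) \big)
 - \frac{\partial}{\partial \beta(\tau)} E\Big( \nabla_\gamma g_t(\gamma(\tau)) \cdot \int_{-\infty}^{f_t(\beta(\tau))} (y_t - g_t(\gamma(\tau))) \cdot h(y|F_{t-1}) \, dy \Big) \\
&= -E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot \nabla_\beta f_t(\beta(\tau))' \cdot (f_t(\beta(\tau)) - g_t(\gamma(\tau))) \cdot h(0|F_{t-1}) \big), \\[1ex]
D_{22} &= \frac{\partial}{\partial \gamma(\tau)} E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot (y_t - g_t(\gamma(\tau))) (\tau - I(y_t - f_t(\beta(\tau)) < 0)) \,\big|\, F_{t-1} \big) \\
&= \frac{\partial}{\partial \gamma(\tau)} E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot (y_t - g_t(\gamma(\tau))) \cdot \tau \big)
 - \frac{\partial}{\partial \gamma(\tau)} E\Big( \nabla_\gamma g_t(\gamma(\tau)) \cdot \int_{-\infty}^{f_t(\beta(\tau))} (y_t - g_t(\gamma(\tau))) \cdot h(y|F_{t-1}) \, dy \Big) \\
&= -E\big( \nabla_\gamma g_t(\gamma(\tau)) \cdot \nabla_\gamma g_t(\gamma(\tau))' \cdot \tau \big).
\end{aligned}
\]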
Clearly, m0(θ0) = 0. It can be verified that Assumptions 6.3-6.5 are sufficient for conditions (N-1)-(N-3) of Huber (1967). Lemma 3 of Huber (1967) and Assumption 6.6 together imply
\[
\sqrt{T} \, m_0(\hat{\theta}) +
\begin{bmatrix}
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \\
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot I(y_t - f_t(\beta_0(\tau)) < 0)
\end{bmatrix}
= o_P(1).
\]
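Assuming that there exists a θ̃ ∈ (θ0, θ̂), we now apply the mean value theorem to decompose m0(θ̂) as
\[
m_0(\hat{\theta}) = m_0(\theta_0) + \frac{\partial m_0(\tilde{\theta})}{\partial \theta} (\hat{\theta} - \theta_0),
\]
where m0(θ0) = 0 based on the first order condition. We re-arrange the above equation to have
\[
\sqrt{T} \, m_0(\hat{\theta}) = \frac{\partial m_0(\tilde{\theta})}{\partial \theta} \sqrt{T} (\hat{\theta} - \theta_0)
= -
\begin{bmatrix}
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \\
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot I(y_t - f_t(\beta_0(\tau)) < 0)
\end{bmatrix}
+ o_P(1).
\]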
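Hence,
\[
\sqrt{T} (\hat{\theta} - \theta_0) = - \left( \frac{\partial m_0(\tilde{\theta})}{\partial \theta} \right)^{-1}
\begin{bmatrix}
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \\
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot I(y_t - f_t(\beta_0(\tau)) < 0)
\end{bmatrix}
+ o_P(1).
\]
By the continuity of ∇θ m0(θ), we have
\[
\nabla_\theta m_0(\tilde{\theta}) = \frac{\partial m_0(\tilde{\theta})}{\partial \theta} \overset{P}{\longrightarrow} D(\theta_0).
\]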
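Meanwhile, the consistency of θ̂ implies that θ̃ also converges to θ0. Therefore, it follows that
\[
\sqrt{T} (\hat{\theta} - \theta_0) = - D(\theta_0)^{-1}
\begin{bmatrix}
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \\
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot I(y_t - f_t(\beta_0(\tau)) < 0)
\end{bmatrix}
+ o_P(1).
\]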
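Now, a central limit theorem is applied to yield:
\[
\begin{bmatrix}
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \\
\frac{1}{\sqrt{T}} \sum_{t=1}^{T} \nabla g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot I(y_t - f_t(\beta_0(\tau)) < 0)
\end{bmatrix}
\overset{D}{\longrightarrow} N(0, S(\theta_0)),
\]
where
\[
S(\theta_0) = E(m_0(\theta_0) \cdot m_0(\theta_0)') =
\begin{bmatrix}
S_{11} & S_{12} \\
S_{21} & S_{22}
\end{bmatrix},
\]
and the four elements in the S(θ) matrix are listed as follows: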
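\[
\begin{aligned}
S_{11} &= E\big( (\nabla_\beta f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0))) \cdot (\nabla_\beta f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)))' \big) \\
&= E\big( \nabla_\beta f_t(\beta_0(\tau)) \cdot \nabla_\beta f_t(\beta_0(\tau))' \big) \cdot E\big( (\tau - I(y_t - f_t(\beta_0(\tau)) < 0))^2 \big) \\
&= \tau \cdot (1 - \tau) \cdot E\big( \nabla_\beta f_t(\beta_0(\tau)) \cdot \nabla_\beta f_t(\beta_0(\tau))' \big), \\[1ex]
S_{12} = S_{21} &= E\big( (\nabla f_t(\beta_0(\tau)) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0))) \cdot (\nabla_\gamma g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)))' \big) \\
&= E\big( \nabla_\beta f_t(\beta_0(\tau)) \cdot \nabla_\gamma g_t(\gamma_0(\tau))' \big) \cdot E\big( (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)) \cdot ((y_t - g_t(\gamma_0(\tau))) (\tau - I(y_t - f_t(\beta_0(\tau)) < 0))) \big) \\
&= E\big( \nabla_\beta f_t(\beta_0(\tau)) \cdot \nabla_\gamma g_t(\gamma_0(\tau))' \big) \cdot E\big( \tau - I(y_t - f_t(\beta_0(\tau)) < 0) \big) \cdot 0 \\
&= 0, \\[1ex]
S_{22} &= E\big( (\nabla_\gamma g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0))) \cdot (\nabla_\gamma g_t(\gamma_0(\tau)) \cdot (y_t - g_t(\gamma_0(\tau))) \cdot (\tau - I(y_t - f_t(\beta_0(\tau)) < 0)))' \big) \\
&= E\big( \nabla_\gamma g_t(\gamma_0(\tau)) \cdot \nabla_\gamma g_t(\gamma_0(\tau))' \big) \cdot E\big( (y_t - g_t(\gamma_0(\tau)))^2 \,\big|\, y_t < f_t(\beta_0(\tau)) \big).
\end{aligned}
\]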
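The factor τ(1 − τ) in S11 comes from the second moment of the centered hit indicator. As a short worked step, write \( I_t = I(y_t - f_t(\beta_0(\tau)) < 0) \) and assume, as is standard for a correctly specified conditional quantile, that \( E(I_t | F_{t-1}) = \tau \). Since \( I_t^2 = I_t \),

\[
E\big( (\tau - I_t)^2 \,\big|\, F_{t-1} \big) = \tau^2 - 2\tau E(I_t | F_{t-1}) + E(I_t | F_{t-1}) = \tau^2 - 2\tau^2 + \tau = \tau (1 - \tau).
\]

For example, τ = 0.05 gives a factor of 0.0475 and τ = 0.01 gives 0.0099.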
Therefore, it follows that Theorem 3.2 holds as
\[
\sqrt{T} (\hat{\theta} - \theta_0) \overset{D}{\longrightarrow} N(0, \Sigma(\theta_0)),
\]
where \( \Sigma(\theta_0) = D(\theta_0)^{-1} S(\theta_0) (D(\theta_0)^{-1})' \).
As in Engle and Manganelli (2004) and Kuan et al. (2009), the asymptotic covariance matrix Σ(θ0) can be consistently estimated by its sample counterparts, as stated in Theorem 3.3.
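Given estimates of D and S, the sandwich form of Σ and the implied standard errors can be computed as in the sketch below. The function names and the numerical matrices are hypothetical placeholders; how D and S are estimated (for instance, the bandwidth choice for the density term) is the subject of Theorem 3.3 and is not reproduced here.

```python
import numpy as np

def sandwich_covariance(D, S):
    """Sigma = D^{-1} S (D^{-1})'."""
    D_inv = np.linalg.inv(np.asarray(D, dtype=float))
    return D_inv @ np.asarray(S, dtype=float) @ D_inv.T

def standard_errors(D, S, n_obs):
    """Standard errors of theta-hat implied by sqrt(T)(theta-hat - theta_0) -> N(0, Sigma)."""
    sigma = sandwich_covariance(D, S)
    return np.sqrt(np.diag(sigma) / n_obs)

# Hypothetical 2x2 inputs: D is block lower-triangular (D12 = 0) and S is block-diagonal (S12 = 0).
D = np.array([[-2.0, 0.0],
              [0.5, -1.0]])
S = np.array([[0.0475, 0.0],
              [0.0, 0.30]])
sigma = sandwich_covariance(D, S)
se = standard_errors(D, S, n_obs=2892)
```

Because D21 is generally non-zero, sampling error in the quantile parameters feeds into the variance of the ES parameters even though S12 = 0.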
Appendix B
Assuming that the asset return yt follows a GARCH(1,1) model
\[
r_t = \sigma_t z_t, \qquad \sigma_t^2 = a_0 + a_1 r_{t-1}^2 + a_2 \sigma_{t-1}^2,
\]
and that we specify an indirect GARCH model for the VaR or ES, we have
\[
f_t^2 = b_0 + b_1 f_{t-1}^2 + b_2 y_{t-1}^2, \tag{21}
\]
where ft denotes the VaR or ES at time t, and
\[
VaR = \sigma \Phi^{[-1]}(\pi), \qquad ES = -\sigma \cdot \frac{\phi(\Phi^{[-1]}(\pi))}{\pi} \tag{22}
\]
as described in Section 2. Then, we can equate the above equations to get
\[
\begin{aligned}
f_t^2 &= (a_0 + a_1 y_{t-1}^2 + a_2 \sigma_{t-1}^2) \cdot \xi^2 \\
&= b_0 + b_1 f_{t-1}^2 + b_2 y_{t-1}^2 \\
&= b_0 + b_1 \sigma_{t-1}^2 \xi^2 + b_2 y_{t-1}^2,
\end{aligned}
\tag{23}
\]
where Φ and φ are the cumulative distribution function and the probability density function of the standard Gaussian distribution, and π is the coverage probability. \( \xi = \Phi^{[-1]}(\pi) \) is for the conditional model of VaR, and \( \xi = \phi(\Phi^{[-1]}(\pi)) / \pi \) is for the conditional model of ES. Matching the constant, the coefficient on \( y_{t-1}^2 \) and the coefficient on \( \sigma_{t-1}^2 \), we have \( a_0 \xi^2 = b_0 \), \( a_1 \xi^2 = b_2 \), and \( a_2 = b_1 \).
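The coefficient mapping can be checked numerically with the standard library alone, as in the sketch below. The GARCH parameter values and the lagged state are hypothetical numbers chosen only for the check; the function names are likewise illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian distribution

def xi_var(pi):
    """Scaling constant for the VaR model: the Gaussian quantile at level pi."""
    return STD_NORMAL.inv_cdf(pi)

def xi_es(pi):
    """Scaling constant for the ES model: phi(quantile) / pi."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(pi)) / pi

def indirect_garch_coefficients(a0, a1, a2, xi):
    """Map GARCH(1,1) parameters into (b0, b1, b2) of the indirect GARCH recursion."""
    xi2 = xi ** 2
    return a0 * xi2, a2, a1 * xi2

# Hypothetical GARCH parameters and lagged state, used only to verify the identity.
a0, a1, a2 = 0.05, 0.10, 0.85
xi = xi_es(0.05)
b0, b1, b2 = indirect_garch_coefficients(a0, a1, a2, xi)
y_prev, sigma2_prev = 1.5, 1.2
sigma2 = a0 + a1 * y_prev ** 2 + a2 * sigma2_prev
f2_from_garch = sigma2 * xi ** 2                                         # first line of (23)
f2_from_indirect = b0 + b1 * (sigma2_prev * xi ** 2) + b2 * y_prev ** 2  # last line of (23)
```

Both routes give the same squared ES, and the same check goes through with `xi_var` in place of `xi_es` because only ξ² enters the mapping.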
References
C. Alexander and E. Lazar. Normal mixture GARCH(1,1): applications to foreign exchange markets. Journal of Applied Econometrics, 21:307–336, 2006.
P. Artzner, F. Delbaen, J. M. Eber, and D. Heath. Coherent measures of risk. Mathematical Finance, 9:203–228, 1999.
E. Bofinger. Estimation of a density function using order statistics. Australian Journal of Statistics, 17:1–7, 1975.
S.X. Chen. Nonparametric estimation of expected shortfall. Journal of Financial Econometrics, 2:87–107, 2008.
R. Engle and S. Manganelli. CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business and Economic Statistics, 22:367–381, 2004.
P. Hall and S. Sheather. On the distribution of a studentized quantile. Journal of the Royal Statistical Society, Series B, 50:381–391, 1988.
P.J. Huber. The behaviour of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium, 4:221–233, 1967.
E. Jondeau and M. Rockinger. Conditional volatility, skewness, and kurtosis: Existence, persistence, and comovements. Journal of Economic Dynamics and Control, 27:1699–1737, 2003.
R. Koenker. Quantile Regression (Econometric Society Monographs). Cambridge University Press, Cambridge, MA, 2005.
C.M. Kuan, J.H. Yeh, and Y.C. Hsu. Assessing value at risk with CARE, the conditional autoregressive expectile models. Journal of Econometrics, 150:261–270, 2009.
O. Scaillet. Nonparametric estimation and sensitivity analysis of expected shortfall. Mathematical Finance, 14:115–129, 2004.
J.W. Taylor. Estimating value at risk and expected shortfall using expectiles. Journal of Financial Econometrics, 6:231–252, 2008.