
The use of extreme value theory and
time series analysis to estimate risk
measures for extreme events
Sofia Rydell
Master’s Thesis in Engineering Physics, Department of Physics, Umeå University, 2013
Department of Physics
Linnaeus väg 20
901 87 Umeå
Sweden
www.physics.umu.se
Umeå University
Department of Physics
Master’s Thesis in Engineering Physics 30 hp
2013-01-27
The use of extreme value theory and
time series analysis to estimate risk
measures for extreme events.
Sofia Rydell, [email protected]
Examiner: Markus Ådahl, [email protected], Department of Mathematics and
Mathematical Statistics, Umeå University.
Supervisor: Magnus Lundin, [email protected], Svenska Handelsbanken AB
Sammanfattning
The purpose of this work is to use extreme value theory and time series analysis to find models for estimating the two risk measures value at risk and expected shortfall, which are measures of potential losses. Focus is on the time horizon needed to obtain predictions that agree with the actual returns of an asset or a portfolio of assets.
The methods based on extreme value theory that are used are the Hill estimator and the so-called peak over threshold method. The Hill estimator is also combined with a time series model; the model then used is an AR(1)-GARCH(1,1).
For the extreme value theory models the choice of threshold, which separates the extreme observations in the tail from the observations belonging to the central part of the distribution, is important. In this work a threshold of 10% of the total sample size is used, since this is a common choice. Some additional methods for choosing the threshold are also presented in this report.
For each model, different lengths of historical data are used when predictions of the risk measures are made, for different assets. The main result of this work is that the best model and the most suitable time horizon of the historical data for estimating the two risk measures differ from dataset to dataset. However, the methods combining extreme value theory and time series models are the most flexible of those examined, and they are the ones most likely to capture extreme events. The Hill method that includes a time series model and uses shorter time horizons is preferable when risk measures for indices are to be estimated, while the Hill estimator on its own, with a time horizon of three or four years, is preferable when risk measures for foreign exchange rates are estimated. In this work only models for a single asset have been studied, but the models can just as easily be applied to the time series of a portfolio. A multivariate version of extreme value theory exists, but its complexity makes it disadvantageous to implement. If the univariate extreme value models on their own are considered insufficient when it comes to capturing all the relationships in a portfolio, the models can be used as a complement to the simpler models commonly in use and thereby improve the risk analysis.
Abstract
In this thesis the main purpose is to use extreme value theory and time series analysis to find models
for estimating the two risk measures for potential losses, value at risk and expected shortfall. Focus is
on the time horizon needed to obtain predictions that are consistent with the actual outcome of an
asset or a portfolio of assets.
The extreme value based methods used are the Hill estimator and the peak over threshold method.
The Hill estimator is also combined with a time series model. The time series model used is an AR(1)-GARCH(1,1) model.
For extreme value theory based models the choice of threshold between the observations belonging
to the tail and the observations belonging to the center of the distribution is crucial. In this study the
threshold is set to be 10% of the sample size, by conventional choice. There are additional methods
of choosing the threshold and some of them are presented in this paper.
For each model, different lengths of historical data are used when predictions of the risk measures are made for different assets. The main result is that the best model and the appropriate time horizon of historical data to use for estimating value at risk and expected shortfall differ from dataset to dataset. However, the methods that combine extreme value theory and time series models are the
most flexible ones and those are the ones most likely to capture extreme events. The conditional Hill
methods with shorter time horizons seem preferable when estimating the risk measures for indices,
while the Hill estimator with time horizon of three or four years is preferable for foreign exchange
rates. In this study only models for single assets are evaluated, but the models could easily be
implemented on a time series of a portfolio. A multivariate case of extreme value theory exists, but its complexity makes it disadvantageous to implement. If, for example, the univariate extreme value models alone are considered inadequate to capture all the relations in a portfolio, the models could be used as a complement to the commonly used model based solely on historical simulation and thereby improve the risk analysis.
Contents
1. Background and introduction ..........................................................................................................1
2. Theory ............................................................................................................................................3
2.1 Return series .............................................................................................................................3
2.2 Risk measures ...........................................................................................................................3
2.3 Likelihood estimates..................................................................................................................4
2.4 Time series models ....................................................................................................................5
2.4.1 Autoregressive model .........................................................................................................5
2.4.2 Generalized autoregressive conditional heteroskedasticity model ......................................5
2.5 Extreme Value Theory ...............................................................................................................6
2.5.1 Defining the tail ..................................................................................................................7
2.5.2 Peak over Threshold method ..............................................................................................9
2.5.3 Hill estimator .................................................................................................................... 10
2.5.4 Conditional Hill estimator ................................................................................................. 11
2.6 Historical simulation ................................................................................................................ 12
3. Data .............................................................................................................................................. 12
4. Analysis of the dataset .................................................................................................................. 13
4.1 Time series plot and descriptive statistics ................................................................................ 13
4.2 Autocorrelation and Ljung Box test.......................................................................................... 14
4.3 Histogram and QQ plots .......................................................................................................... 15
5. Method ......................................................................................................................................... 16
5.1 Overview threshold sensitivity................................................................................................. 17
5.2 Threshold choice ..................................................................................................................... 17
5.3 Peak over Threshold ................................................................................................................ 17
5.4 Hill estimator........................................................................................................................... 18
5.5 Conditional Hill estimator ........................................................................................................ 18
5.6 Backtesting.............................................................................................................................. 20
5.6.1 Backtesting VAR ............................................................................................................... 20
5.6.2 Backtesting ES .................................................................................................................. 20
5.7 Higher significance levels......................................................................................................... 21
6. Results .......................................................................................................................................... 21
6.1 Overview threshold sensitivity................................................................................................. 22
6.2 Prediction plots ....................................................................................................................... 23
6.2.1 Hill and GPD simulation .................................................................................................... 24
6.2.2 Conditional Hill simulations .............................................................................................. 25
6.2.3 Comparisons..................................................................................................................... 26
6.3 Backtesting results, 10% threshold .......................................................................................... 30
6.3.1 OMXS30 ........................................................................................................................... 33
6.3.2 FTSE ................................................................................................................................. 34
6.3.3 GSPC................................................................................................................................. 35
6.3.4 N225 ................................................................................................................................ 36
6.3.5 EUR .................................................................................................................................. 37
6.3.6 GBP .................................................................................................................. 37
6.3.7 USD .................................................................................................................................. 38
6.4 Backtesting results, various thresholds .................................................................................... 39
6.5 Results higher significance levels ............................................................................................. 41
7. Discussion and conclusions ........................................................................................................... 43
8. References .................................................................................................................................... 46
Appendix A ....................................................................................................................................... 48
Appendix B ....................................................................................................................................... 57
Appendix C ....................................................................................................................................... 59
Appendix D ....................................................................................................................................... 64
1. Background and introduction
Due to the turbulence on the financial markets during the last decades it has become more important for banks and financial institutions to try to foresee and monitor different risks. Above all, it is the incidents of large losses that need to be prevented and avoided. For banks and other financial institutions, reliable methods and credible measures for estimating risk are essential and are becoming more important as new regulations and new market conditions develop. Considering today's market, and with banks and other institutions as large as several of them are today, the consequences of inaccurate evaluation and control of risks can be devastating. The crash of the banking system in Iceland and the downfall of Lehman Brothers in 2008, and what followed, are two examples.
A widely used risk measure for the potential loss of an asset or portfolio is the so-called value at risk. The pros of this risk measure are that it's intuitive, easy to understand, and that there exist easily implemented methods for estimating it. On the other hand, the cons lie in its inability to capture the most extreme losses and the fact that it's not a coherent risk measure. Another risk measure that is gaining ground in the financial sector, and is likely to be included in a future Basel framework, is the so-called expected shortfall. Like value at risk it considers the potential loss and is fairly intuitive, but additionally expected shortfall is based on the entire set of the most extreme observations and it's a coherent risk measure. There exist numerous methods for estimating value at risk. Many of those methods can be extended and used to estimate expected shortfall as well.
Value at risk is included in this study since it's universally accepted and used, and its pros in many ways outweigh its cons. Expected shortfall is chosen because of its close relationship to value at risk, which makes estimation methods relatively easy to implement, but mainly because the Basel Committee on Banking Supervision has indicated that it will be included in the forthcoming framework.
A theory that considers these types of extreme events is the so-called Extreme Value Theory. It aims to monitor and predict extreme events, like financial crises, in an efficient way. Therefore an investigation of how methods based on extreme value theory can be used to estimate risk measures is performed in this study. The extreme value theory methods that are included are the so-called Hill estimator and the peak over threshold method.
For both risk measures, the two major questions investigated in this study are which method to use for the estimations and what length of calibration data is needed to obtain reliable estimates with that method.
The aim of this study is to use Extreme Value Theory, time series analysis and Monte Carlo simulations to obtain reliable models for predicting the two risk measures, value at risk and expected shortfall. A reliable model is considered to be a model that behaves similarly to what is statistically expected. Statistically expected means that if a model generates a VAR estimate for a significance level of 99%, this estimate is expected to be accurate in 99 cases out of 100. Hence, when the model is evaluated the outcome should be roughly the same.
Focus in this study is on investigating which time horizon of historical data is needed in the calibration to obtain stable and reliable models. The time horizon needed may differ between methods. Hence, one goal is to identify which of all the models is preferable and gives the most
accurate estimate of value at risk and expected shortfall, if possible. The other is to provide guidelines on what time horizon is needed for the different methods to obtain stable models.
To meet the objectives of this study, MATLAB is used to build the models and a backtesting procedure is used to evaluate them. In the backtesting procedure the models are used to predict value at risk and expected shortfall over a certain time for which historical data of the actual outcome is available. The predictions are compared with the actual outcome of the data and a backtesting statistic for each of the risk measures is obtained. The statistic for expected shortfall depends on the value at risk estimate and is an average of the difference between the predicted and actual outcome. In this study a model is considered to be stable and reliable if the backtesting statistic for value at risk is within 10% of the statistically expected value. To obtain a reliable model for estimating expected shortfall two conditions need to be fulfilled: first, the above should hold for the value at risk estimate given by that model, and second, the backtesting statistic should not deviate more than 1% from zero. For an ideal model the backtesting statistic for expected shortfall would be zero.
2. Theory
In this study the loss distribution of the returns will be considered; hence, losses are given by positive values. In this section the definitions of the return series and the risk measures are first presented. After that, a description of the likelihood method for estimating parameters is presented, as well as the definitions of the time series models that will be used in this study. Finally, some basics of extreme value theory are outlined, as well as the methods that will be used in this study.
2.1 Return series
For a given daily price process $P_t$, the common returns $R_t$ are given by the percentage change from day $t-1$ to day $t$, i.e.

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$$   (1)

From that, the loss returns $L_t$ are then given by:

$$L_t = -R_t$$   (2)
2.2 Risk measures
Value at risk (VAR) and expected shortfall (ES) are two types of risk measures of the potential loss of a financial asset or a portfolio of assets. VAR is a measure that describes the minimal loss of a portfolio which can occur with a given probability, during a specific time. The mathematical definition is

$$\mathrm{VAR}_\alpha = \inf\{\, l \in \mathbb{R} : P(L > l) \le 1 - \alpha \,\}$$   (3)

where $\alpha$ is the confidence level and $L$ belongs to the loss distribution (Rocco, 2011). This means that $\mathrm{VAR}_\alpha$ is given by the smallest value $l$ such that the actual loss of the portfolio exceeds $l$ with probability $1-\alpha$, at most. For the discrete case, let $L_{(1)} \le L_{(2)} \le \dots \le L_{(n)}$, where $n$ is the sample size, be the ordered sample of the losses $L_1, \dots, L_n$; then the VAR for the probability $\alpha$ is given by

$$\mathrm{VAR}_\alpha = L_{(n\alpha)}$$   (4)

In the case where $n\alpha$ isn't an integer different approaches can be used. For instance, either $\lceil n\alpha \rceil$ or $\lfloor n\alpha \rfloor$ can be used for the index, or an interpolation between $L_{(\lfloor n\alpha \rfloor)}$ and $L_{(\lceil n\alpha \rceil)}$ can be made to obtain a satisfying estimate.

An example is a portfolio with a total value of 1 SEK, with a one-day $\mathrm{VAR}_{0.95} = 0.60$ SEK. This means that from today to tomorrow the value of the portfolio will drop by 60% or more with a probability of at most 5%, i.e. the loss will be at least 0.60 SEK. Let the historical data used to estimate the VAR be 200 days. Using equation (4), where $n\alpha = 200 \cdot 0.95 = 190$, the one-day $\mathrm{VAR}_{0.95}$ is given by the 190th observation in the ordered sample.
Figure 1. The loss distribution and the VAR value for the probability $\alpha$.
Note that the loss can be greater than the VAR; of the worst outcomes, the VAR is the least you lose. Therefore, the ES measure is an interesting complement. ES is the average loss of an asset or a portfolio of assets, for a certain probability, during a specific time, given that the VAR is exceeded. A simple description is that ES is the average of the $(1-\alpha)\cdot 100\%$ worst outcomes. The definition is

$$\mathrm{ES}_\alpha = E\left[\, L \mid L \ge \mathrm{VAR}_\alpha \,\right]$$   (5)

where $\mathrm{VAR}_\alpha$ is given by (3).
In the discrete case, where $L_{(1)} \le L_{(2)} \le \dots \le L_{(n)}$ is the ordered sample of $L_1, \dots, L_n$, the ES becomes a sum, given by

$$\mathrm{ES}_\alpha = \frac{1}{n - n\alpha + 1} \sum_{i = n\alpha}^{n} L_{(i)}$$   (6)

where $n$ is the number of observations in the entire sample, and in the case where $n\alpha$ isn't an integer the reasoning is the same as for $\mathrm{VAR}_\alpha$ given by (4).
2.3 Likelihood estimates
The maximum likelihood method for estimating the parameters of a statistical model maximizes the likelihood function, or the log-likelihood function, for the data and the specified model. In this method the probability density function can be unknown, but the joint density function for the data is assumed to come from a known family of distributions, for example the normal family. For an independent and identically distributed sample of size $n$ the joint density function looks like

$$f(x_1, x_2, \dots, x_n ; \theta) = \prod_{i=1}^{n} f(x_i ; \theta)$$   (7)

where $\theta$ denotes the parameters of the model and $x_1, \dots, x_n$ are the observed variables. Thus the observed variables are known, whereas the parameters given by $\theta$ are to be estimated. The likelihood function is then given by

$$L(\theta ; x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i ; \theta)$$   (8)

and the often used log-likelihood function is given by

$$\ln L(\theta ; x_1, \dots, x_n) = \sum_{i=1}^{n} \ln f(x_i ; \theta)$$   (9)

The estimated parameters are then given by the set $\hat{\theta}$ which maximizes the likelihood function, equation (8) or (9).

When working with real data the underlying distribution can be quite complicated and is often unknown. Then the pseudo-maximum likelihood method is used. The difference from maximum likelihood is that the estimate of the parameters, $\hat{\theta}$, is obtained by maximizing a function that is related to the log-likelihood. For example, maximum likelihood can be used to estimate the parameters of a normal distribution even though the sample distribution has fatter tails than a normal. For financial time series, such as the ones used in this study, the underlying distribution is commonly fat tailed, but maximum likelihood can still be used since it provides consistent estimators. (McNeil and Frey (2000), Gouriéroux (1997))
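As a small numerical illustration of equations (8)-(9), the MATLAB sketch below fits a normal distribution to a simulated sample by minimizing the negative Gaussian log-likelihood with the base-MATLAB optimizer fminsearch. The sample, the parameterization through log(sigma) and the starting values are assumptions made for the example only; this is not the model fitting used later in the thesis.

% Minimal sketch: maximum likelihood via the log-likelihood, eq. (9).
x = 0.01*randn(1000,1);                        % hypothetical i.i.d. sample

% Negative Gaussian log-likelihood; p = [mu, log(sigma)] keeps sigma positive.
negLogL = @(p) -sum(-0.5*log(2*pi) - p(2) - 0.5*((x - p(1))./exp(p(2))).^2);

p0   = [mean(x), log(std(x))];                 % starting values
pHat = fminsearch(negLogL, p0);                % maximize by minimizing -logL

muHat    = pHat(1);
sigmaHat = exp(pHat(2));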
2.4 Time series models
2.4.1 Autoregressive model
Given a strictly stationary time series $(X_t)$, where the conditional mean $\mu_t$ and conditional standard deviation $\sigma_t$ are measurable with respect to the information set up to time $t-1$, denoted $\mathcal{F}_{t-1}$, and where $(Z_t)$ is a white noise process from an unknown distribution with mean zero and unit variance, an autoregressive model, AR($p$), is given by

$$X_t = \phi_0 + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t$$   (10)

where $\phi_0$ is a constant, $\phi_1, \dots, \phi_p$ are the coefficients of the model and $\varepsilon_t = \sigma_t Z_t$ is a white noise process, hereafter denoted the innovations of the process.
2.4.2 Generalized autoregressive conditional heteroskedasticity model
The volatility process of a time series can be simulated via a generalized autoregressive conditional heteroskedasticity model, GARCH($p$,$q$). The GARCH model takes into consideration both previous observations in the time series and previous volatilities when predicting the coming ones. A GARCH($p$,$q$) is defined as

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2$$   (11)

where $\alpha_0$ is a constant, $\alpha_1, \dots, \alpha_p$ and $\beta_1, \dots, \beta_q$ are the coefficients of the model and $\varepsilon_t = \sigma_t Z_t$.

To estimate the coefficients of the models above, given by equations (10)-(11), pseudo maximum likelihood is used, see section 2.3.
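To make equations (10)-(11) concrete, the following MATLAB sketch simulates a path from an AR(1)-GARCH(1,1) model with normal innovations. All parameter values are purely illustrative and are not estimates from the data used in this thesis.

% Minimal sketch: simulate an AR(1)-GARCH(1,1) path, equations (10)-(11).
phi0 = 0;    phi1 = 0.05;             % AR(1) coefficients (illustrative)
a0 = 1e-6;   a1 = 0.08;   b1 = 0.90;  % GARCH(1,1) coefficients, a1 + b1 < 1

T     = 1000;
X     = zeros(T,1);                   % simulated returns
innov = zeros(T,1);                   % innovations eps_t = sigma_t * Z_t
s2    = zeros(T,1);                   % conditional variances sigma_t^2
s2(1) = a0/(1 - a1 - b1);             % start at the unconditional variance

for t = 2:T
    s2(t)    = a0 + a1*innov(t-1)^2 + b1*s2(t-1);   % eq. (11)
    innov(t) = sqrt(s2(t))*randn;                   % normal innovation
    X(t)     = phi0 + phi1*X(t-1) + innov(t);       % eq. (10)
end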
2.5 Extreme Value Theory
The extreme value theory (EVT) is based on the distribution of the extreme returns, the ones located in the tail of the distribution of a return series. Hence this theory can be used to estimate extreme events more accurately. In this study EVT is used to obtain estimates of VAR and ES. This section outlines some of the basics of EVT described in Alexander (2008), Coronado (2000), Kourouma et al. (2012), McNeil (1997), McNeil and Saladin (1997), McNeil and Frey (2000), Nyström and Skoglund (2002), and Rocco (2011).

According to the Fisher-Tippett Theorem a set of maxima, if normalized appropriately, converges in distribution to a non-degenerate limiting distribution. This limiting distribution belongs to either the Fréchet, Gumbel or Weibull family. This means that the tail, which consists of the maxima of a sample, for in principle every probability distribution converges to one of the following three families:

Fréchet:   $\Phi_\alpha(x) = \exp\left(-x^{-\alpha}\right)$, for $x > 0$ and $\alpha > 0$   (12)

Gumbel:   $\Lambda(x) = \exp\left(-e^{-x}\right)$, for $x \in \mathbb{R}$   (13)

Weibull:   $\Psi_\alpha(x) = \exp\left(-(-x)^{\alpha}\right)$, for $x \le 0$ and $\alpha > 0$   (14)

where $\alpha$ is the shape parameter and gives an indication of the fatness of the tail. Notice that these are the standardized distributions; for the non-standardized versions, insert $(x-\mu)/\sigma$ instead of $x$ in the formulas. The three distributions given by equations (12)-(14) are called extreme value distributions and can be combined into the generalized extreme value distribution (GEV), given by

$$H_{\xi,\mu,\sigma}(x) = \exp\left(-\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right)$$   (15)

where $1 + \xi(x-\mu)/\sigma > 0$; the case $\xi = 0$ is interpreted as the limit $\exp\left(-e^{-(x-\mu)/\sigma}\right)$. A shape parameter $\xi > 0$ corresponds to the Fréchet family, $\xi = 0$ to the Gumbel family and $\xi < 0$ to the Weibull family. For the standardized GEV, set $\mu = 0$ and $\sigma = 1$.
There are various ways to make use of EVT; there exist parametric and semi-parametric as well as non-parametric methods. The most frequently used parametric methods are the so-called block maxima method and the peak over threshold method. For the non-parametric methods there are several estimators that can be applied; the Hill estimator is the one most commonly used.

For all EVT methods it is the excesses over a certain threshold that are studied. The excess distribution, $F_u$, is used to denote the distribution of the observations above the threshold $u$. It can be expressed by

$$F_u(y) = P(X - u \le y \mid X > u) = \frac{F(u + y) - F(u)}{1 - F(u)}$$   (16)

where $0 \le y \le x_F - u$, $F$ is the loss distribution and $x_F \le \infty$ is the right endpoint of $F$. According to the Gnedenko-Pickands-Balkema-deHaan Theorem, if $F$ is in the maximum domain of attraction, MDA, of the extreme value distribution with shape parameter $\xi$, the excess distribution $F_u$ converges to the generalized Pareto distribution (GPD), denoted $G_{\xi,\beta}$. The convergence can be described by the following expression

$$\lim_{u \to x_F}\ \sup_{0 \le y \le x_F - u}\ \left| F_u(y) - G_{\xi,\beta(u)}(y) \right| = 0$$   (17)

for some positive measurable function $\beta(u)$, i.e. for a sufficiently high threshold, $u$, the distribution of the observations above this threshold may be approximated with a GPD. The class of distributions $F$ which are in the MDA of the extreme value distributions is large, and all commonly used distributions are included. The GPD is given by

$$G_{\xi,\beta}(y) = \begin{cases} 1 - \left(1 + \xi\,\dfrac{y}{\beta}\right)^{-1/\xi}, & \xi \ne 0 \\[1ex] 1 - e^{-y/\beta}, & \xi = 0 \end{cases}$$   (18)

where the scale parameter $\beta > 0$; $y \ge 0$ when the shape parameter $\xi \ge 0$, and $0 \le y \le -\beta/\xi$ when the shape parameter $\xi < 0$. Note that for $\xi = 0$ the GPD becomes an exponential distribution.
The main assumption in EVT is that the extreme observations are independent and identically distributed. This is not always fulfilled when working with real data. A way to improve the EVT models is to combine them with a GARCH model for the volatility and an AR or ARMA model of some sort for the returns, and to use EVT on the residuals obtained via the time series model. By combining EVT and time series models the assumption of i.i.d. observations is more likely to be fulfilled and hence more reliable estimates are obtained.

For a time series of returns an AR model is a reasonable choice for the conditional mean, since the percentage changes generally fluctuate around zero, and a GARCH model for the volatility is often used to capture the heteroskedasticity. Other suitable choices are an ARMA model for the mean or just a simple GARCH model on its own. The choice of model varies from article to article; common choices are low-order combinations such as the AR(1)-GARCH(1,1) model used in this study.
2.5.1 Defining the tail
When applying EVT, the choice of threshold is critical and not obvious. The threshold is the cut-off between observations belonging to the center of the distribution and those belonging to the tail. If it's set too low, observations belonging to the center of the distribution are classified as extreme events, and if it's set too high, extreme observations are excluded. There exist several methods for choosing the threshold and some are listed below. (Rocco (2011))
The conventional choice
How the decision on the choice of threshold is made is not specified by all authors. A generally accepted rule of thumb is that the tail should consist of 5-10% of the entire sample, and hence the threshold is set to be in that region. It should not be higher than 10-15%; 10% seems to be a frequently used limit (Rocco (2011), Nyström and Skoglund (2002), McNeil and Frey (2000)).
Graphical methods
Even though graphical methods are somewhat arbitrary they can still be a helpful tool in the selection process and are quite easily implemented. One useful plot is the mean excess plot. The mean excess function is defined as

$$e(u) = E\left[\, X - u \mid X > u \,\right]$$   (19)

i.e. the mean of the excesses over the threshold $u$. Plot the set $\{(X_{(i)}, e(X_{(i)})) : i = 1, \dots, n\}$, that is, each observation, $X_{(i)}$, against the sample mean excess with that observation used as threshold. If the tail distribution is a GPD the plot is approximately linear for the higher order statistics, since for a GPD with parameters $\xi$ and $\beta$ the mean excess function for some threshold $u$ becomes

$$e(u) = \frac{\beta + \xi u}{1 - \xi}$$   (20)

where $\xi < 1$ and $\beta + \xi u > 0$. The slope of the line indicates the sign of the shape parameter, $\xi$; hence a positive slope corresponds to the Fréchet family etc., see equations (12)-(14). (Embrechts, Klüppelberg and Mikosch (1997))
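A minimal MATLAB sketch of the sample mean excess plot in equation (19) is given below. The vector losses of loss returns is assumed to already be in memory; each observation is used in turn as a candidate threshold.

% Minimal sketch: sample mean excess plot, equation (19).
x = sort(losses);                        % ascending order
n = numel(x);
u = x(1:n-1);                            % candidate thresholds (all but the maximum)
meanExcess = zeros(n-1,1);
for i = 1:n-1
    exceed = x(x > u(i));                % observations above the threshold u(i)
    meanExcess(i) = mean(exceed - u(i)); % sample mean excess e(u(i))
end
plot(u, meanExcess, '.');
xlabel('threshold u'); ylabel('mean excess e(u)');
% An approximately linear right-hand part of the plot supports a GPD tail, eq. (20).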
If the Hill estimator is the EVT method used, Hill plots are another way to determine the threshold. The Hill plot is a plot of the estimated tail index, $\hat{\xi}_k$, as a function of the number of observations above the threshold, $k$. For further information about the tail index and how it is obtained, see section 2.5.3 below. Hence, for the Hill plots the set $\{(k, \hat{\xi}_k)\}$ is plotted, where $k$ represents the number of observations in the tail and $\hat{\xi}_k$ is the estimated tail index. For other non-parametric methods corresponding plots can be made. When interpreting the plot, approximately horizontal stretches indicate that for those values of the threshold the tail index estimate is essentially stable with respect to the choice of threshold (Rocco (2011)).
Monte Carlo simulation and minimizing mean squared error (MSE)
The optimal threshold in this method is given by the one that minimizes the MSE of an estimator. Monte Carlo simulations are used to determine for which $k$ (the tail size), conditional on sample size and EVT distribution, the MSE of the tail index is smallest. This method is useful since the tradeoff between bias and efficiency is optimized. (Halie and Pozo (2006))
Data driven algorithms
There are several data driven algorithms that are designed to give the optimal threshold for a given dataset. In an article by Lux (2000) an overview and comparison of five different methods are presented. Drees and Kaufmann (1998) present a method based on a stopping rule for the sequence of Hill estimators. Hall (1990) develops, and Danielsson and de Vries (1997) lay out an improvement of, a subsample bootstrap algorithm. Danielsson, de Haan, Peng, and de Vries (2001) present another subsample bootstrap algorithm based on subsamples of different magnitude. Beirlant, Vynckier and Teugels (1996a and 1996b) present two iterative regression approaches.
2.5.2 Peak over Threshold method
One of the most frequently used parametric methods is the peak over threshold method, POT. As described in section 2.5 the GPD can be used as an approximation of the distribution of the tail. In POT the probability distribution of the GPD, given by (18), is used to derive expressions for different risk measures. For a sample $X_1, \dots, X_n$ and a choice of threshold, $u$, a GPD is fitted to the exceedances above $u$ via maximum likelihood, see section 2.3. Hence estimates of the shape parameter $\xi$ and the scale parameter $\beta$ are obtained. By letting $y = x - u$ be the excesses over the threshold, define $F_u(y)$ by (16). Use the fact that $F(x) = (1 - F(u))\,F_u(y) + F(u)$ and an expression for the underlying loss distribution is

$$F(x) = (1 - F(u))\, G_{\xi,\beta}(x - u) + F(u)$$   (21)

when $x > u$, where $G_{\xi,\beta}$ is given by (18) and where $F(u)$ can empirically be approximated with $(n - N_u)/n$, where $N_u$ is the number of observations above the threshold $u$ and $n$ is the total sample size. Hence the underlying distribution can be estimated by

$$\hat{F}(x) = 1 - \frac{N_u}{n}\left(1 + \hat{\xi}\,\frac{x - u}{\hat{\beta}}\right)^{-1/\hat{\xi}}$$   (22)

Since (22) is the cumulative distribution function for the loss distribution, given that an observation is above the threshold, and since $\mathrm{VAR}_\alpha$ is defined as in equation (3), for the observation $x = \mathrm{VAR}_\alpha$ equation (22) can be written as

$$\alpha = \hat{F}(\mathrm{VAR}_\alpha)$$   (23)

From this an estimate of $\mathrm{VAR}_\alpha$ is obtained by using the inverse of $\hat{F}$, i.e. solve for $\mathrm{VAR}_\alpha$ in equation (23). Then the following expression for the estimate is obtained

$$\widehat{\mathrm{VAR}}_\alpha = u + \frac{\hat{\beta}}{\hat{\xi}}\left(\left(\frac{n}{N_u}\,(1-\alpha)\right)^{-\hat{\xi}} - 1\right)$$   (24)

where $\hat{\xi}$ and $\hat{\beta}$ are the estimated GPD parameters, $n$ is the total number of observations in the sample and $N_u$ is the number of excesses over a given threshold (Alexander (2008), McNeil (1999), McNeil and Saladin (1997)).

From the definition of $\mathrm{ES}_\alpha$ given by (5) the estimate of $\mathrm{ES}_\alpha$ can be derived:

$$\widehat{\mathrm{ES}}_\alpha = \frac{\widehat{\mathrm{VAR}}_\alpha}{1 - \hat{\xi}} + \frac{\hat{\beta} - \hat{\xi}\,u}{1 - \hat{\xi}}$$   (25)

where $\widehat{\mathrm{VAR}}_\alpha$ is given by (24), $0 < \hat{\xi} < 1$, and $\hat{\xi}$ and $\hat{\beta}$ are the estimated GPD parameters (McNeil (1999)).
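A minimal MATLAB sketch of the POT estimates in equations (24)-(25) is shown below. It assumes that the Statistics and Machine Learning Toolbox function gpfit is available, that losses is a vector of loss returns already in memory, and it uses the conventional 10% tail size of this study.

% Minimal POT sketch, equations (24)-(25).
alpha = 0.99;                          % significance level used in the thesis
n     = numel(losses);
Nu    = ceil(0.10*n);                  % 10% of the sample forms the tail
xs    = sort(losses, 'descend');
u     = xs(Nu + 1);                    % threshold just below the tail observations
y     = xs(1:Nu) - u;                  % exceedances over u

parm    = gpfit(y);                    % ML fit of the GPD to the exceedances
xiHat   = parm(1);                     % shape parameter estimate
betaHat = parm(2);                     % scale parameter estimate

VaR = u + betaHat/xiHat*(((n/Nu)*(1-alpha))^(-xiHat) - 1);   % eq. (24)
ES  = VaR/(1-xiHat) + (betaHat - xiHat*u)/(1-xiHat);         % eq. (25)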
2.5.3 Hill estimator
Nonparametric methods for estimating the tail or the tail index make no assumptions about the underlying distribution in the tail. The most commonly used estimator was initiated by Hill (1975), who derived it under the assumption of i.i.d. observations via maximum likelihood. The only restriction for the Hill estimator is that it's only valid for Fréchet type distributions, i.e. distributions with fat tails (Longin (2000)). To obtain the Hill estimate of the tail index, arrange the sample $X_1, \dots, X_n$ in descending order, $X_{(1)} \ge X_{(2)} \ge \dots \ge X_{(n)}$. The tail index can then be estimated by the Hill estimator:

$$\hat{\xi} = \frac{1}{k}\sum_{i=1}^{k}\left(\ln X_{(i)} - \ln X_{(k)}\right)$$   (26)

where $k$ is the number of exceedances over a given threshold, $u$, and the threshold value is given by the $k$-th largest observation, i.e. $u = X_{(k)}$. Depending on which tail you want to study, the indexing in the sum changes. The procedure presented above is the one used in this study. (Halie and Pozo (2006))

As mentioned above, one assumption when using the Hill estimator is that the data is heavy tailed, i.e. the distribution of the data corresponds to the Fréchet family. Therefore, as explained in section 2.5, the GPD, here denoted $G$, can be used as an approximation of the true excess distribution, $F_u$. An approximation of the Pareto-type tail is given by

$$1 - G(x) \approx C\,x^{-1/\xi}$$   (27)

for some constant $C$. So by using equation (21), but with this approximation of $G$ and the approximation $F(u) \approx (n-k)/n$, the estimate of the underlying distribution can be derived. The constant $C$ is found by requiring that $\hat{F}(u) = (n-k)/n$. Hence, the following expression for the estimated distribution is obtained:

$$\hat{F}(x) = 1 - \frac{k}{n}\left(\frac{x}{u}\right)^{-1/\hat{\xi}}, \quad x > u$$   (28)

where $n$ is the sample size, $k$ is the number of observations above the threshold $u$ and $\hat{\xi}$ is the shape parameter estimated by equation (26). (Rocco (2011), Danielsson and de Vries (1997))

From equation (28) an expression for $\mathrm{VAR}_\alpha$ can be derived in the same way as in section 2.5.2, since it is the $\alpha$-quantile of the tail distribution. Hence, when using the Hill estimator and letting the observation $x = \mathrm{VAR}_\alpha$ with $\alpha = \hat{F}(\mathrm{VAR}_\alpha)$, the VAR is estimated by:

$$\widehat{\mathrm{VAR}}_\alpha = u\left(\frac{n}{k}\,(1-\alpha)\right)^{-\hat{\xi}}$$   (29)

(Rocco (2011), Christoffersen and Goncalves (2005))

The derivation of the $\mathrm{ES}_\alpha$ estimate is done in a similar way as in section 2.5.2, starting from its definition given by (5). Thus, when using the Hill estimator the ES is estimated by:

$$\widehat{\mathrm{ES}}_\alpha = \frac{\widehat{\mathrm{VAR}}_\alpha}{1 - \hat{\xi}}$$   (30)

where $\widehat{\mathrm{VAR}}_\alpha$ is given by equation (29) for the confidence level $\alpha$ and $\hat{\xi} < 1$, and $\hat{\xi}$ is estimated by equation (26) (Christoffersen and Goncalves (2005)).
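A minimal MATLAB sketch of the Hill-based estimates in equations (26), (29) and (30) follows, again with the conventional 10% tail size. The vector losses is assumed to be in memory and its tail observations are assumed to be positive, so that the logarithms are defined.

% Minimal Hill sketch, equations (26), (29) and (30).
alpha = 0.99;
n  = numel(losses);
k  = ceil(0.10*n);                    % number of observations in the tail
xs = sort(losses, 'descend');         % X_(1) >= X_(2) >= ... >= X_(n)
u  = xs(k);                           % threshold: the k-th largest observation

xi  = mean(log(xs(1:k))) - log(u);    % Hill estimate of the tail index, eq. (26)
VaR = u*((n/k)*(1-alpha))^(-xi);      % eq. (29)
ES  = VaR/(1-xi);                     % eq. (30), requires xi < 1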
2.5.4 Conditional Hill estimator
Since EVT is based on the assumption that the data is i.i.d., a filtering of the data can be necessary. A procedure commonly used is to first fit a time series model to the data and thereby obtain the standardized residuals, and thereafter, in a second step, apply EVT methods to the residuals (McNeil and Frey (2000), Diebold et al (1998)).

The time series models that are considered in this study are presented in section 2.4. Both the normal and the Student's t distribution are used as the distribution for the innovations. In this study an AR(1)-GARCH(1,1) model is used to model the time series. For that model the assumption on the dynamics of the loss-return series is that it is a strictly stationary time series $X_t = \mu_t + \sigma_t Z_t$ with innovations $Z_t$. The innovations have the marginal distribution $F_Z(z)$, and $F_X(x)$ is the marginal distribution of $X_t$. The predictive distribution for the returns over the next $h$ days is given by $F_{X_{t+1} + \dots + X_{t+h} \mid \mathcal{F}_t}(x)$.
For time series models a calibration period is needed to obtain an estimate of the parameters of the model. Let the calibration data be of length $m$. The residuals are defined as

$$\varepsilon_t = X_t - \hat{X}_t$$   (31)

where $X_t$ is the observed return and $\hat{X}_t$ is the predicted one. From the fitted model the conditional mean, $\mu_{t+1}$, and standard deviation, $\sigma_{t+1}$, are obtained from

$$\mu_{t+1} = \hat{\phi}_0 + \sum_{i=1}^{p}\hat{\phi}_i\, X_{t+1-i}$$   (32)

$$\sigma_{t+1}^2 = \hat{\alpha}_0 + \sum_{i=1}^{p}\hat{\alpha}_i\,\varepsilon_{t+1-i}^2 + \sum_{j=1}^{q}\hat{\beta}_j\,\sigma_{t+1-j}^2$$   (33)

where $\varepsilon_t$ are the residuals, given by equation (31). From the equations above the standardized residuals for the calibration period can be obtained:

$$Z_t = \frac{X_t - \mu_t}{\sigma_t}, \quad t = 1, \dots, m$$   (34)

which by assumption are i.i.d. and do not depend on $t$ (McNeil AJ and Frey R (2000)).

In the second step an EVT method is applied to the standardized residuals to obtain the desired estimates. In this study the EVT method used is the Hill estimator, for details see section 2.5.3. The Hill estimator for the residuals is given by equation (26), where the ordered sample is given by the ordered residuals, $Z_{(1)} \ge Z_{(2)} \ge \dots \ge Z_{(m)}$, for a sample size of $m$. The unconditional VAR for the residuals is defined as in equation (3) and obtained via equation (29) when using the Hill estimator. The conditional VAR is defined as

$$\mathrm{VAR}_\alpha^t = \inf\left\{\, l \in \mathbb{R} : P\left(X_{t+1} + \dots + X_{t+h} > l \mid \mathcal{F}_t\right) \le 1-\alpha \,\right\}$$   (35)

i.e. the $\alpha$-quantile of the predictive distribution for the next $h$ days, given $\mathcal{F}_t$.

Similarly, the unconditional expected shortfall is defined as in equation (5) and given by equation (30) for the Hill estimator. The conditional ES is defined as

$$\mathrm{ES}_\alpha^t = E\left[\, X_{t+1} + \dots + X_{t+h} \mid X_{t+1} + \dots + X_{t+h} > \mathrm{VAR}_\alpha^t,\ \mathcal{F}_t \,\right]$$   (36)

i.e. the average of the exceedances over the VAR of the predictive distribution for the next $h$ days, given $\mathcal{F}_t$, the information set up to time $t$.

In this study one-day VAR predictions are of interest; therefore the one-step predictive distribution is used, $F_{X_{t+1} \mid \mathcal{F}_t}$, and the notation for the conditional predictions is simplified to $\mathrm{VAR}_\alpha^t$ and $\mathrm{ES}_\alpha^t$. After some rewriting, $F_{X_{t+1} \mid \mathcal{F}_t}(x) = F_Z\left((x - \mu_{t+1})/\sigma_{t+1}\right)$, so in the end the conditional VAR and ES can be given by

$$\mathrm{VAR}_\alpha^t = \mu_{t+1} + \sigma_{t+1}\,\mathrm{VAR}_\alpha(Z)$$   (37)

$$\mathrm{ES}_\alpha^t = \mu_{t+1} + \sigma_{t+1}\,\mathrm{ES}_\alpha(Z)$$   (38)

where $\mathrm{VAR}_\alpha(Z)$ and $\mathrm{ES}_\alpha(Z)$ are the unconditional VAR and ES for the residuals. (Rocco (2011), McNeil and Frey (2000))
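A minimal MATLAB sketch of the two-step procedure leading to equations (37)-(38) is given below. It assumes that the Econometrics Toolbox functions arima, garch, estimate, infer and forecast are available and that losses holds the calibration data; the exact model fitting used in the thesis is described in section 5.5.

% Minimal conditional-Hill sketch, equations (34) and (37)-(38).
alpha = 0.99;

% Step 1: fit an AR(1)-GARCH(1,1) model with normal innovations (no constant,
% as in the AR part of eq. (39) later in the thesis).
Mdl    = arima('Constant', 0, 'ARLags', 1, 'Variance', garch(1,1));
EstMdl = estimate(Mdl, losses, 'Display', 'off');

% Step 2: standardized residuals for the calibration period, eq. (34).
[res, v] = infer(EstMdl, losses);
z = res ./ sqrt(v);

% Step 3: Hill estimator on the residuals, eqs. (26), (29) and (30); the
% upper-tail residuals are assumed positive so the logarithm is defined.
m  = numel(z);
k  = ceil(0.10*m);
zs = sort(z, 'descend');
u  = zs(k);
xi   = mean(log(zs(1:k))) - log(u);
VaRz = u*((m/k)*(1-alpha))^(-xi);     % unconditional VAR of the residuals
ESz  = VaRz/(1-xi);                   % unconditional ES of the residuals

% Step 4: one-day-ahead conditional mean and variance, then eqs. (37)-(38).
[muF, ~, vF] = forecast(EstMdl, 1, 'Y0', losses);
VaR = muF + sqrt(vF)*VaRz;
ES  = muF + sqrt(vF)*ESz;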
2.6 Historical simulation
One commonly used approach when estimating risk measures is what in this study is called historical simulation. It's included in this study because it's easy to implement and widely used, and it therefore works as a reference against which the more advanced and complex EVT methods can be compared. The principle is that historical data for a time series, or a collection of several time series of different assets, is used as the empirical distribution of the future returns. Value at risk and expected shortfall are estimated from that empirical distribution via equations (4) and (6). What the empirical distribution looks like depends on the length of the historical data used as well as on how the returns have behaved during that period. If a period of low volatility is used, the most extreme events are likely to consist of quite small changes in the returns, while the most extreme events will most probably be quite large changes if a period of high volatility is used.
3. Data
In this study two different types of risk factors are included: foreign exchange rates and indices. The time series all have a length of 18 years, from 1994-01-03 to 2011-12-30. The main analysis and model evaluation is made with data from the OMX Stockholm 30 index (OMXS30), the index of the 30 most traded stocks, by volume, on the Stockholm stock exchange. Further evaluations are made to investigate whether different types of risk factors, or even just different risk factors, give different results. Here three more indices and three foreign exchange rates are used. The indices are the FTSE 100 index (FTSE), the index of the 100 companies on the London stock exchange with the highest capitalization, and the Nikkei 225 average (N225), the index of the 225 Japanese companies with the highest rating in the first section of the Tokyo stock exchange. Also included is the Standard and Poor's 500 (GSPC), an index of 500 stocks, representative of the major industries in the USA, from either NASDAQ or the New York stock exchange. The foreign exchange rates are all against the Swedish krona, and the three included in this study are the British pound (GBP), the United States dollar (USD) and the Euro (EUR). For all risk factors the time series are the percentage-change loss returns, see section 2.1.
4. Analysis of the dataset
EVT is based on the assumption that the data used is independent and identically distributed, i.i.d. Before applying EVT, an analysis of how well the data meets this criterion is essential, in order to be able to draw meaningful conclusions and interpret the results obtained. Most financial time series come from a distribution with fatter tails than the normal distribution, and hence their tails have a Fréchet type of distribution. This assumption should be verified since the Hill estimator is conditioned on it. In this study autoregressive time series models will be used, and for these the stationarity of the time series must be checked.
The analysis is made on the entire length of each time series, even though shorter time intervals will be used, for example as calibration data.
4.1 Time series plot and descriptive statistics
To investigate the i.i.d. condition the time series is plotted to get a first rough indication. Heteroskedasticity violates the assumption of identically distributed observations, and an upward or downward trend in the return series implies a non-stationary time series. To see whether the data comes from a distribution with heavy tails, and whether it is asymmetric, the kurtosis and skewness measures are used. For the definitions and equations used to estimate kurtosis and skewness, see appendix A.
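A minimal MATLAB sketch of these descriptive statistics is given below. The moment-based formulas are the common textbook definitions and may differ in detail (e.g. regarding bias correction) from the exact expressions in appendix A; losses is assumed to be a vector of loss returns.

% Minimal sketch: sample moments used as descriptive statistics (cf. Table 1).
m  = mean(losses);
s  = std(losses);
sk = mean((losses - m).^3)/s^3;       % sample skewness
ku = mean((losses - m).^4)/s^4;       % sample kurtosis (3 for a normal distribution)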
Figure 2. The time series for the loss returns for the OMXS30 index, note that the data starts at 1994-01-03 even though
the first date on the axis label is a later date.
The most extreme losses seem to be clustered, and the volatility also seems to be clustered, which makes the assumption of an i.i.d. time series doubtful. However, the time series seems to be stationary overall, but with a slightly shifted mean, see table 1 below. The time series plots of the other risk factors used are found in Appendix A, and all display similar tendencies. Since financial time series often look like this, an AR model for dealing with the shift in mean and a GARCH model for handling the clustering of the volatility are used in this study as tools to satisfy the conditions of EVT.
Table 1. Statistics for the data used; all time series consist of approximately 18 years of daily observations.

Statistics, loss distribution   OMXS30      FTSE        N225       GSPC*       GBP        USD        EUR
Mean                           -3.85e-4    -1.68e-4     4.48e-5   -2.95e-4     2.09e-5    2.20e-5   -3.01e-6
Standard deviation              0.0155      0.0113      0.0151     0.0119      0.0060     0.0072     0.0043
Kurtosis                        6.67        9.29        9.23       15.27       6.26       5.95       6.15
Skewness                       -0.205      -0.036       0.071     -0.213       0.075      0.126     -0.190

*The 123 most recent observations are excluded since they are all zero.
The standard deviations listed in the table above should be taken for what they are, approximations over the entire 18-year sample. As can be seen in figure 2, the standard deviation varies quite a lot throughout the period. The values of the kurtosis imply that every time series has fat tails. The skewness is close to zero for all assets; hence the distributions can be assumed to be evenly distributed around the mean, with no tail longer than the other.
4.2 Autocorrelation and Ljung Box test
The autocorrelation plot is used to investigate the assumption of independence a bit more thoroughly. It is also used to investigate whether heteroskedasticity occurs in the time series, by testing the squared returns. The Ljung-Box test is another tool to investigate the autocorrelation, and hence the assumption of independence and the presence of heteroskedasticity in the time series. For definitions of the autocorrelation and the Ljung-Box Q test, see appendix A. In this study a significance level of 95% is set, by conventional choice.
Figure 3. Autocorrelation plot for the OMXS30 time series (left) and for the squared observations (right).
The plot indicates autocorrelation since the autocorrelations for several lags exceed the 95% confidence interval for no autocorrelation. If more than 5% of the lags exceed the confidence bound it implies the presence of autocorrelation or heteroskedasticity. Autocorrelation plots for the other time series can be found in Appendix A.
In the table below a summary of the Ljung Box Q test is presented. A test outcome equal to one
implies rejection of the null hypothesis of no autocorrelation. The significance level here is 95% as
well.
Table 2. Summary of the Ljung-Box Q test.

                OMXS30       FTSE          N225     GSPC      GBP          USD          EUR
Test outcome    1            1             1        1         1            1            1
P-value         4.6672e-5    9.7791e-12    0        0.0293    9.1342e-5    1.2308e-5    0.0066
The Ljung-Box Q test rejected, for all the time series, the null hypothesis that no autocorrelation occurs. The test on the squared returns rejects the null with a p-value of zero for all the time series, i.e. heteroskedasticity occurs. As can be seen in table 2, it would not make a difference if the significance level for the test were slightly higher or slightly lower: autocorrelation occurs. The results presented above motivate the choice of time series models in combination with EVT to improve the estimates.
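A minimal MATLAB sketch of the tests summarized in table 2 is given below, assuming the Econometrics Toolbox function lbqtest is available and that losses is a vector of loss returns; the default number of lags is used here, which may differ from the settings behind the reported values.

% Minimal sketch: Ljung-Box Q tests behind Table 2.
r = losses - mean(losses);
[h1, p1] = lbqtest(r);                % autocorrelation in the returns
[h2, p2] = lbqtest(r.^2);             % heteroskedasticity via the squared returns
% h = 1 means the null hypothesis of no autocorrelation is rejected at the 5% level.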
4.3 Histogram and QQ plots
To see whether the data comes from a distribution with heavy tails a histogram against the normal
distribution is constructed as well as the QQ-plot and the mean excess plot.
Figure 4. OMXS30 histogram and QQ plot against normal distribution.
The histogram for OMXS30 in the figure above seems to be more peaked around the mean than a normal distribution. Since several observations seem to be located far out in the tails, it can be assumed that the loss distribution for OMXS30 has heavier tails than a normal. The QQ plot implies leptokurtic behavior and hence that the data comes from a non-normal distribution. This appearance is noted for the other time series as well, more or less distinctly; see the histograms and QQ plots in appendix A.
The above two conclusions, non-normality and fat tails, suggest that a Fréchet distribution can be a good approximation for the tail and that it can be estimated by the Hill estimator.
5. Method
In this section an overview of the method is presented. In the subsections 5.1 to 5.7 a more detailed
description of the methods is found. Remember, as mentioned in the previous section, that the
negative returns are used in this study. In the following sections “year” is often used as a unit to
describe the data length used. One year is set to be 252 observations, since there are 252 banking
days in one year on average.
First a threshold sensitivity analysis is made using the OMXS30 data and the POT method to get an
idea of how much the threshold choice affects the result. Some of the methods presented in section
2.5.1 are also investigated.
After that a simulation study of the different methods is performed. For different lengths of the calibration data, VAR and ES are estimated via historical simulation, simulation based on the Hill method, the conditional Hill method and the POT method. In this step the threshold is set to 10% of the total sample size, see section 2.5.1 under the subtitle The conventional choice for more information. The data used in this step is not only OMXS30 but also data from the other assets presented in section 3; however, a more extensive study is made for OMXS30. Focus is on investigating which time horizon gives stable models and which of the models gives the most reliable estimates.
To investigate where the models do not manage to predict the risk measures accurately, and how fast they adjust to changes in volatility, the OMXS30 time series is used. In a first step the entire prediction periods for the different models are illustrated. After that, to see how the models behave in periods of high volatility, a period from 2000 to 2002 is used, since this was a turbulent time in the financial world; for example it includes the IT crash, crises in Asia and South America, and the September 11 attacks. The financial crisis of 2008 and some other shorter time periods are used to see how the models adjust to fast and large volatility increases. Predictions for VAR and ES are illustrated: how the curves behave, where VAR exceedances occur, and what the difference between the predicted ES and the actual outcome looks like for these events.
To further investigate the models and determine which of them give reliable estimates, and for which time horizons this occurs, backtesting is used for all the time series included. To backtest the models, predictions are made for all sample sizes of calibration data. Since the data in this study span 18 years, the prediction period varies widely. When one year is used to calibrate the models, predictions can be made over 17 years, while when 17 years are used for calibration, only one year can be predicted.
In the last part of the study only the OMXS30 data is used. The threshold is set to 5% and 7% of the total sample size. For calibration periods of one to four years, VAR and ES are calculated. This is done in the same way as in the previous simulation study: via historical simulation, via simulation based on the Hill method, the POT method and the conditional Hill method. Since EVT should be efficient for high quantiles, a test is also made where estimates of VAR and ES for higher significance levels are generated and compared to a hypothetical dataset, for a threshold of 10%.
5.1 Overview threshold sensitivity
As mentioned in section 2.5.1 the choice of threshold is critical. Therefore an initial analysis of the sensitivity is made to get an idea of how influential the choice of threshold is on the results.
As presented in section 2.5, the generalized Pareto distribution (GPD) is a good approximation of the underlying distribution of the tail. To investigate the sensitivity four different thresholds are specified: one, five, ten and twenty percent of the total sample size. For these four thresholds a GPD is fitted to the tail via pseudo-maximum likelihood. After that, equations (24) and (25) are used to calculate VAR and ES for four different significance levels. As a reference, historical VAR and ES are calculated via equations (4) and (6), where a MATLAB interpolation method is used if $n\alpha$ is not an integer.
The above procedure is done for two different sample sizes to see if the smaller one still gives reliable estimates. The two sample sizes used are 18 years (the full sample size) and 4 years; the data used is OMXS30. These two lengths of the historical data used to estimate the GPD parameters, the VAR and the ES are arbitrarily chosen. The use of the full sample size generates the largest number of observations in the tails and can hence give an indication of the number of observations needed in the tail. A smaller sample size is also interesting to study in order to obtain a lower limit for the number of observations needed in the tail. By common sense, a parameter fit to a very small tail size is not possible. Four years is chosen since it gives at least some observations in the tail when the tail size is one and five percent of the total four-year sample size.
5.2 Threshold choice
An investigation of a few of the methods presented in section 2.5.1 is made to see if one or several of them could be useful. The conventional choice is the starting point, see section 2.5.1; therefore an upper bound of 10% is set when the investigation of a convenient threshold choice based on the other methods is made. First the graphical methods mentioned in section 2.5.1 are used: both Hill plots and mean excess plots are constructed. Then a Monte Carlo simulation minimizing the mean squared error (MSE) is performed. The graphical methods depend on subjective interpretations and the Monte Carlo method has no connection to the real data. Therefore the conventional choice of 10% will be used overall in this study. Some other choices will be tested, based on the results from this investigation, to see if and how they affect the results. For the algorithm and more detailed results see Appendix B.
5.3 Peak over Threshold
After the threshold is chosen the POT method can be used to obtain VAR and ES estimates. At first the threshold is set to 10% of the total sample size. Hence, the assumption is made that the 10% most extreme observations form the tail of the distribution and that the rest of the observations belong to the center of the distribution. For OMXS30, samples of one, two, three, four, ten and 17 years are used as calibration data.
For each calibration sample the following simulation procedure is used.
1. For the choice of tail size 10%, fit a GPD, given by (18), to the exceedances, $y$, above the threshold $u$, based on the observations in the tail, via pseudo maximum likelihood. Hence, the estimates of the GPD parameters $\hat{\xi}$ and $\hat{\beta}$ are obtained.
2. Use the estimated parameters, the tail size $N_u$ and the threshold that follows from it to estimate VAR and ES via equations (24) and (25) for a significance level of 99%.
This procedure is used for the OMXS30 data, for its respective calibration sample.
5.4 Hill estimator
As in section 5.3 the threshold is initially set to 10% of the total sample size. Different sample sizes are used to derive the shape parameter $\hat{\xi}$ via the Hill estimator. For OMXS30, samples of one, two, three, four, ten and 17 years are used as calibration data. For the other risk factors, calibration sample sizes of one, two, three and four years are used.
For each calibration sample the following simulation procedure is used.
1. For the choice of threshold at tail size 10%, derive the shape parameter $\hat{\xi}$ via equation (26).
2. Using the shape parameter $\hat{\xi}$ and the threshold $u$, obtain the VAR and ES estimates from equations (29) and (30) for a significance level of 99%.
This procedure is used for all risk factors, for their respective calibration samples.
5.5 Conditional Hill estimator
The condition of i.i.d. observations is violated when, for example, autocorrelation and heteroskedasticity occur in the return series, see figure 2 in section 4.1. This is often the problem with financial time series and can be handled by fitting a GARCH model to the data, from which the standardized residuals can be obtained. These residuals are in general i.i.d., so the conditions of EVT are fulfilled. With this tactic the dependency in the data is hopefully no longer a problem and hence more accurate estimates can be obtained.
In this study an AR(1)-GARCH(1,1) is used since it is one of the simpler time series models that provide a reasonable fit to the data used in this study. An AR(1)-GARCH(1,1) model is given by

$$X_t = \phi X_{t-1} + \varepsilon_t, \quad \varepsilon_t = \sigma_t Z_t$$   (39)

$$\sigma_t^2 = \alpha_0 + \alpha_1\,\varepsilon_{t-1}^2 + \beta_1\,\sigma_{t-1}^2$$   (40)

For more details on the time series model see section 2.5.4 Conditional Hill estimator and section 2.4 Time series models. Hence, based on equations (32)-(33), the conditional mean and conditional standard deviation predictions for day $t+1$ are given by

$$\hat{\mu}_{t+1} = \hat{\phi}\,X_t$$   (41)

$$\hat{\sigma}_{t+1}^2 = \hat{\alpha}_0 + \hat{\alpha}_1\,\varepsilon_t^2 + \hat{\beta}_1\,\sigma_t^2$$   (42)

where $\hat{\phi}$, $\hat{\alpha}_0$, $\hat{\alpha}_1$ and $\hat{\beta}_1$ are the estimated coefficients of the model, $X_t$ is the sample observation at time $t$, $\varepsilon_t$ is the residual and $\sigma_t$ is the sample standard deviation at time $t$. As for the Hill estimator, see section 5.4, different sample sizes are used to calibrate the model. For OMXS30, samples of one, two, three, four, ten and 17 years are studied. For the other risk factors, sample sizes of one, two, three and four years are considered for calibration of the model.
For each calibration sample the following procedure is used.
1. Fit the AR(1)-GARCH(1,1) model described by equations (39)-(40), via pseudo maximum likelihood with normal innovations, to the sample size in question.
2. Use the estimated model to calculate $\hat{\mu}_{t+1}$ and $\hat{\sigma}_{t+1}$ by equations (41)-(42) and compute the standardized residuals for the calibration period via equation (34). The standardized residuals should be i.i.d. in order for step 3 to be applicable.
3. For the residuals, use the Hill estimator to obtain the shape parameter $\hat{\xi}$ via equation (26).
4. Calculate VAR and ES for the residuals via equations (29) and (30), for a threshold of 10% and a significance level of 99%.
5. Use the residual VAR and the residual ES together with the calculated $\hat{\mu}_{t+1}$ and $\hat{\sigma}_{t+1}$ to obtain the estimates of the conditional VAR and conditional ES for the return series, which are given by equations (37) and (38). Remember that these are the 1-day predictions.
This procedure is used for all risk factors, for their respective calibration periods. Since the distributions of the data used are considered to be heavy tailed, the procedure is repeated with the modification that in step 1 Student's t innovations are used when fitting the model. The number of degrees of freedom used in the Student's t is three, because a Student's t distribution with three degrees of freedom is a heavy tailed distribution, and that is the type of distribution considered in this study.
In step 2 the residuals are obtained, and to fulfill the conditions of EVT they should be i.i.d. To test independence, autocorrelation plots and the Ljung-Box Q test are used, as presented in section 4.2, and stationarity of the models is verified by checking that $\hat{\alpha}_1 + \hat{\beta}_1 < 1$.
The analyses are done on four years of historical data; even though shorter time horizons will be used, this is the maximum length used for most of the time series when estimating the risk measures. The time horizon is from 2008 to 2011. The autocorrelation plots can be found in appendix C. As can be seen in the autocorrelation plots in appendix C, and in tables 3 and 4 below, the assumption of no autocorrelation between the residuals holds, both when normal innovations and when t innovations are used in the models, for all the time series. However, heteroskedasticity occurs when t innovations are used. But since a GARCH model is used to monitor the volatility, and it adjusts to the prevailing volatility when predicting the future, hopefully reasonable estimates for VAR and ES can still be obtained. Notice that the results in the table are for the specified time period; for other four-year time periods, or shorter time periods, the no-heteroskedasticity assumption can be verified.
Table 3. Summary of the Ljung-Box Q test on the residuals from an AR-GARCH fit with normal
innovations and with t innovations. The null hypothesis is that no autocorrelation occurs.

Residuals | Test outcome, normal innovations | P-value | Test outcome, t innovations | P-value
OMXS30    | 0 | 0.5481 | 0 | 0.3162
FTSE      | 0 | 0.8123 | 0 | 0.7836
N225      | 0 | 0.7373 | 0 | 0.7732
GSPC      | 0 | 0.7373 | 0 | 0.2473
GBP       | 0 | 0.5504 | 0 | 0.5956
USD       | 0 | 0.6685 | 0 | 0.6447
EUR       | 0 | 0.3958 | 0 | 0.3064
Table 4. Summary of the Ljung-Box Q test on the squared residuals from an AR-GARCH fit with normal
innovations and with t innovations. The null hypothesis is that no heteroskedasticity occurs.

Squared residuals | Test outcome, normal innovations | P-value | Test outcome, t innovations | P-value
OMXS30 | 0 | 0.03249 | 1 | 0.001341
FTSE   | 0 | 0.4442  | 1 | 0.01403
N225   | 0 | 0.06823 | 1 | 8.260e-6
GSPC   | 0 | 0.1142  | 1 | 7.671e-10
GBP    | 0 | 0.4617  | 0 | 0.3429
USD    | 0 | 0.1930  | 1 | 0.007740
EUR    | 0 | 0.9632  | 0 | 0.5420
5.6 Backtesting
To verify the models a backtesting procedure is used. To be able to backtest a model, both
calibration and prediction data are needed. In this study the full sample size is 18 years, so when 1
year is used to calibrate the model, predictions can be made over 17 years; when 2 years are used for
calibration, 16 years can be used for predictions, and so on. The predictions of the model are then used to
evaluate its reliability. For the risk measures considered in this study the backtesting statistics
are given in the sections below.
5.6.1 Backtesting VAR
To backtest the VAR-estimating models the predicted loss is compared to the actual loss observed on
the next day. When the actual loss is greater than the estimated VAR, an exceedance has occurred. For
VAR a statistically expected number of exceedances over the VAR curve is known: at the 99% level the
loss should not exceed the VAR 99 times out of 100, hence one exceedance for every 100
observations is statistically expected. These statistically expected numbers of exceedances are used
as the reference. A model is considered to be reliable if the number of exceedances when
backtesting the model is within 10% of the statistically expected number. To obtain
the backtesting statistic, calculate the number of occasions where the actual loss return, l_t, is greater
than the predicted VAR:

U = Σ_t 1{ l_t > VAR_t }      (43)

Desirably, U should be very close to the statistically expected number. Hence, in this study a model is
considered to be reliable when the backtesting statistic U is within the acceptable limits, which are
the following:

0.9 E ≤ U ≤ 1.1 E      (44)

where E is the statistically expected number of exceedances (McNeil (1999)).
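A minimal MATLAB sketch of this backtest, assuming l is the vector of realized loss returns over the prediction period and varPred the corresponding 1-day VAR predictions (both illustrative names), could look as follows.

% VAR backtest, eq. (43)-(44): count exceedances and check the 10% acceptance band
U        = sum(l > varPred);            % number of exceedances, eq. (43)
expected = 0.01*numel(l);               % 1% of the prediction days at the 99% level
accepted = abs(U - expected) <= 0.10*expected;   % acceptance limits, eq. (44)
relDiff  = (U - expected)/expected;     % signed difference reported in the result tables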
5.6.2 Backtesting ES
To obtain the backtest estimate, calculate the average difference between the actual loss return, l_t, and
the forecasted ES, conditional on having a loss return exceeding the corresponding VAR estimate:

V = (1/K) Σ_{t: l_t > VAR_t} ( l_t − ES_t )      (45)

where K is the number of such exceedance days. A positive value of the statistic indicates an
underestimation of the risk loss and a negative value indicates an overestimation. For ES estimations the
value of V should be as close to zero as possible for the model to be as reliable and accurate as possible
(Kourouma et al (2011), Embrechts, Kaufmann and Patie (2005)). Since the actual loss and the estimated
ES are both in percentage change, the backtesting results for ES are presented in percent. For example,
if V = 0.01 the result is presented as 1%, i.e. on average the model that estimates ES
underestimates the ES loss by 1%. Worth noticing is that the backtesting statistic depends on the
VAR estimate. This implies that the accuracy of the VAR estimation must be taken into consideration
when evaluating the ES backtesting statistic.
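Continuing the sketch above, the ES backtesting statistic of equation (45) can be computed from the same vectors, with esPred denoting the 1-day ES predictions (again an illustrative name).

% ES backtest, eq. (45): average difference between realized loss and predicted ES
% on the days where the loss exceeded the predicted VAR
idx = l > varPred;                      % exceedance days
V   = mean(l(idx) - esPred(idx));       % positive V: ES risk underestimated on average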
5.7 Higher significance levels
To test how the EVT models perform for VAR and ES predictions at higher significance levels, an
AR(1)-GARCH(1,1) model is used to generate a hypothetical distribution of the loss returns for the day
the predictions are made for. For the AR-GARCH model description see section 5.5. The test is made on
OMXS30 data. The lengths of historical data used to fit the AR-GARCH and to obtain the parameters for
the EVT models are the last one, two, three and four years in the sample, i.e. data from 2011,
2010-2011 and so on. The threshold used is 10% of the sample size by conventional choice, see sections
2.5.1 and 5.2.
The prediction of the loss distribution for the next day, in this case 2012-01-01, is given by equations
(39)-(40), where the innovations come from a Student's t distribution with three degrees of freedom.
The VAR for the residuals is obtained from the corresponding quantile of this distribution. The ES for the
residuals is obtained via Monte Carlo simulation and the definition of ES, as in equation (5).
To get sufficiently many observations in the tail for the ES estimations, 1 000 000 innovations are
generated, so a hypothetical distribution of 1 000 000 predictions is used to obtain the estimate. A
minimal sketch of the simulation is given at the end of this section.
The different models for estimating the risk measures are presented in sections 5.4 and 5.5. For each
model, one to four years of historical data are used respectively to obtain the estimates of VAR and ES
for the day 2012-01-01 at the two higher significance levels considered. The results from the models
are compared with the outcome from the hypothetical loss distribution, for which the "actual" VAR
and ES are obtained as described above.
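A hedged MATLAB sketch of this simulation follows, assuming the one-step conditional mean muNext and variance sigma2Next from the fitted AR(1)-GARCH(1,1) are available (see the earlier sketch) and using the Statistics Toolbox function trnd for the t innovations; scaling conventions for the innovations may differ from those used in the thesis.

% Hypothetical loss distribution for 2012-01-01 and the "actual" VAR and ES at level q
q  = 0.999;                             % one of the higher significance levels (assumed)
N  = 1e6;                               % number of simulated observations
nu = 3;                                 % degrees of freedom of the t innovations
z  = trnd(nu, N, 1);                    % Student's t random numbers
lossSim = muNext + sqrt(sigma2Next)*z;  % one-step-ahead losses from eq. (39)-(40)
lossSim = sort(lossSim);
varHyp  = lossSim(ceil(q*N));                   % empirical q-quantile
esHyp   = mean(lossSim(lossSim > varHyp));      % ES as in equation (5)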
6. Results
First the results from the sensitivity analysis of the threshold choice are presented. After that,
illustrations of the models are presented; prediction plots over the entire available time period as
well as over shorter periods such as 2008 are included. Then the results from the backtesting of the
models for all the different time series are outlined. In section 6.3 the backtesting results for
a threshold of 10% are found, while the results for varying thresholds are found in section 6.4.
The last section contains the results for when VAR and ES are estimated at higher significance levels
than 99%.
6.1 Overview threshold sensitivity
In tables 5 and 6 the results of the threshold sensitivity study are presented. The VAR and ES predictions
are all for the same day, i.e. the predictions are for the first day after the last day in the calibration
data. The total number of days in the sample is found in the top left cell; under parameters, n_u gives
the number of observations in the tail and u gives the threshold value. The GPD parameters are, as
mentioned under 5.1, fitted via maximum likelihood. Notice that they don't converge for all tail sizes.
This means that the conditions for the parameters, given in section 2.5, equation 18, aren't fulfilled
for the estimated parameters, or that the parameters didn't converge during the maximum likelihood
procedure. An example is that the parameters converge to some estimate that exceeds a predefined
tolerance level and can therefore not be considered reliable. The main reason for this is the low
number of observations in the tail used in the maximum likelihood estimation. Notice also that the
number of observations that are equal to or smaller than VAR is very small for the higher significance
levels, and hence the historical VAR and ES are just the single most extreme observation in the sample.
Table 5. Entire sample used to estimate the VAR for 2012-01-01. The sample consists of OMXS30 data that starts 1994-01-03 and ends 2011-12-30.

Sample size 4577 (18 years)

Parameters:
Threshold (% of total sample size) | n_u (obs in tail) | u      | GPD parameters
1%*  |  45.77 | 0.0416 | -0.0806, 0.0105
5%   | 228.85 | 0.0247 | -0.0538, 0.0110
10%  | 457.7  | 0.0172 | -0.0423, 0.0111
20%  | 915.4  | 0.0104 |  0.0045, 0.0102

VAR estimates for different significance levels:
Threshold       | 95%    | 99%    | 99.9%  | 99.97%
1%*             | -      | 0.0414 | 0.0636 | 0.0737
5%              | 0.0247 | 0.0417 | 0.0636 | 0.0740
10%             | 0.0248 | 0.0416 | 0.0637 | 0.0745
20%             | 0.0246 | 0.0412 | 0.0653 | 0.0779
Historical VAR¹ | 0.0247 | 0.0416 | 0.0663 | 0.0735

ES estimates for different significance levels:
Threshold                     | 95%    | 99%    | 99.9%  | 99.97%
1%*                           | -      | 0.0512 | 0.0717 | 0.0811
5%                            | 0.0351 | 0.0513 | 0.0721 | 0.0820
10%                           | 0.0351 | 0.0513 | 0.0725 | 0.0828
20%                           | 0.0349 | 0.0517 | 0.0758 | 0.0885
Historical expected shortfall | 0.0352 | 0.0514 | 0.0730 | 0.0817²

Number of observations beyond the VAR level: 228.8 (5%), 45.77 (1%), 4.577 (0.1%), 1.373 (0.03%)

* ML estimates did not converge
¹ Calculated with "prctile" in MATLAB
² The one most extreme observation
Table 6. A 4 year sample is used to estimate the VAR for 2012-01-01. The sample consists of OMXS30 data that starts 2007-12-28 and ends 2011-12-30.

Sample size 1008 (4 years)

Parameters:
Threshold (% of total sample size) | n_u (obs in tail) | u      | GPD parameters
1%*  |  10.08 | 0.0515 | -0.6501, 0.0147
5%*  |  50.4  | 0.0302 | -0.2492, 0.0155
10%  | 100.8  | 0.0214 | -0.1178, 0.0140
20%  | 201.6  | 0.0131 | -0.0114, 0.0122

VAR estimates for different significance levels:
Threshold       | 95%    | 99%    | 99.9%  | 99.97%
1%*             | -      | 0.0513 | 0.0690 | 0.0717
5%*             | 0.0300 | 0.0506 | 0.0687 | 0.0748
10%             | 0.0306 | 0.0495 | 0.0710 | 0.0801
20%             | 0.0299 | 0.0491 | 0.0759 | 0.0897
Historical VAR¹ | 0.0302 | 0.0516 | 0.0703 | 0.0723²

ES estimates for different significance levels:
Threshold                     | 95%    | 99%    | 99.9%  | 99.97%
1%*                           | -      | 0.0603 | 0.0710 | 0.0726
5%*                           | 0.0424 | 0.0589 | 0.0734 | 0.0783
10%                           | 0.0421 | 0.0590 | 0.0783 | 0.0864
20%                           | 0.0418 | 0.0607 | 0.0873 | 0.1009
Historical expected shortfall | 0.0424 | 0.0598 | -      | 0.0723²

Number of observations beyond the VAR level: 50.4 (5%), 10.08 (1%), 1.008 (0.1%), 0.3024 (0.03%)

* ML estimates did not converge
¹ Calculated with prctile in MATLAB
² The one most extreme observation
In the tables above it can be seen that the different choices of threshold generate quite different
results. Here only one value each of VAR and ES is estimated; to make a more thorough investigation,
several estimates for each confidence level can be made. The major problem seems to be the number
of observations in the tail. For both sample sizes the GPD parameter estimates don't converge for all
choices of threshold. In Alexander (2008) it is concluded that for a 10% tail a sample must have at
least 2000 observations to get a good estimate of the GPD parameters, i.e. there must be at least
200 observations in the tail. However, as can be seen in tables 5 and 6, this limit does not hold for the
OMXS30 data. Thus, the threshold choice is neither obvious nor arbitrary. A hedged sketch of the GPD
fitting over the different thresholds is given below.
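The convergence problem can be illustrated by refitting the GPD for each candidate threshold with the MATLAB Statistics Toolbox function gpfit and noting any failures; the vector losses and the exact handling of non-convergence are illustrative assumptions rather than the procedure used in the thesis.

% GPD fit for different thresholds (MATLAB sketch); losses is a vector of loss returns
sortedLosses = sort(losses, 'descend');
for frac = [0.01 0.05 0.10 0.20]
    k = floor(frac*numel(losses));              % number of observations in the tail
    u = sortedLosses(k+1);                      % threshold value
    excesses = sortedLosses(1:k) - u;           % excesses over the threshold
    try
        par = gpfit(excesses);                  % par(1) = shape, par(2) = scale
        fprintf('%5.1f%%: n_u = %d, u = %.4f, shape = %.4f, scale = %.4f\n', ...
                100*frac, k, u, par(1), par(2));
    catch
        fprintf('%5.1f%%: ML estimation did not converge\n', 100*frac);
    end
end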
6.2 Prediction plots
In the figures below, predictions of VAR and ES based on different lengths of calibration period are
illustrated. For example, when one year of calibration data is used, the first prediction is for the first
day of 1995, the second is for the second day of 1995 based on the preceding one year of calibration, and
so on. The calibration length is fixed and the period moves as the predictions are made. This is done
to get an idea of how the different methods behave, how they adjust to changes in volatility and how
they adjust over time. The threshold is set to 10%, and only data and results for OMXS30 are used in the
plots; the other time series give very similar appearances. For the conditional Hill, the model based
on normal innovations is used in the figures, since that model and the one based on t innovations
have very similar appearances.
6.2.1 Hill and GPD simulation
[Figure: four panels showing losses and predictions for calibration periods of 1, 4, 10 and 17 years; y-axis: Losses; legend: PF, VAR hist, VAR hill, ES hist, ES Hill.]
Figure 5. The figure shows the Hill predictions of VAR and ES and the actual negative return series of OMXS30 for
calibration periods of different length. The length of the calibration period for each window is found in the title of the
window. The threshold for extreme observations is set to 10 % of the sample.
[Figure: four panels showing losses and predictions for calibration periods of 1, 4, 10 and 17 years; y-axis: Losses; legend: PF, VAR hist, VAR GPD, ES hist, ES GPD.]
Figure 6. The figure shows the GPD predictions of VAR and ES and the actual negative return series of OMXS30 for
calibration periods of different length. The length of the calibration period for each window is found in the title of the
window. The threshold for extreme observations is set to 10 % of the sample.
In figure 5 the predictions based on historical simulation are included as a comparison. It can be seen
that the longer the calibration period, the more immovable the model gets. The same can be seen
in figure 6, where the GPD based predictions are illustrated. They follow the historical simulation
more and more closely the longer the calibration period gets. A model with a calibration period of ten
years or more is stable, but it greatly overestimates the risk as long as nothing happens. However,
when for example the volatility increases, it cannot adapt, and hence the actual return exceeds the
predicted VAR and ES; see the plots in the bottom half of figure 6, around year 2008 (left) and August
to October 2011 (right). In the top right plot the models also seem a bit inflexible, except for the
ES predictions of the Hill model. The one-year calibration period is the most flexible. In appendix D the
graphs for 2 and 3 years of calibration data can be seen for both the Hill estimator and the GPD approach.
The fact that flexibility decreases when the calibration period becomes longer is evident. However,
"the shorter the calibration period the better" is not entirely true; if it is too short the risk is constantly
underestimated, see section 6.3.
Since the GPD predictions follow the predictions based on historical simulation so closely, no further
investigation is made of the GPD models. The GPD model is a parametric model which requires
estimation of parameters and model assumptions which can be inaccurate. In the estimations based
on one and two years of historical data, the maximum likelihood estimates of the GPD parameters
don't converge, just as for some of the threshold choices described in section 6.1. Therefore the plot
of predictions based on one year of historical data is not reliable and should be ignored. Based on the
above mentioned shortcomings of the GPD approach, the historical simulation is preferable of the
two, since they in principle generate the same predictions; historical simulation will be included
in the further study while the GPD approach will be excluded.
6.2.2 Conditional Hill simulations
[Figure: four panels showing losses and predictions for calibration periods of 1, 4, 10 and 17 years; y-axis: Losses; legend: PF, VAR hist, VAR cond Hill N, ES hist, ES cond Hill N.]
Figure 7. The figure shows the predictions of VAR and ES by conditional Hill simulation and the actual negative return
series, for calibration periods of different length. The length of the calibration period for each window is found in the
title of the window. The threshold for extreme observations is set to 10 % of the sample.
As can be seen in figure 7, the conditional Hill predictions are much more flexible than those based on
historical simulation, and than the ones based on just the Hill estimator. The length of the calibration
period seems to have little influence on the predictions; all models adjust quickly to changes in volatility.
To see how the different calibration periods perform in terms of over- or underestimation of risk, see
section 6.3. Figures that show predictions based on two and three years of calibration data can be found in
appendix D, as well as one figure containing VAR predictions from both conditional Hill methods and
another containing the ES predictions. Those two are included to illustrate the similarities between them,
and the fact that the difference is small.
6.2.3 Comparisons
For all models the predictions of VAR and ES aren’t always as good as desired. In figure 8 the
exceedances over the predicted VAR are illustrated. The predictions are based on calibration periods
of one and four years.
[Figure: two panels marking VAR exceedances for calibration periods of 1 year (VAR hist: 70, VAR hill: 54, VAR cond Hill: 46 exceedances) and 4 years (VAR hist: 62, VAR hill: 44, VAR cond Hill: 32 exceedances); y-axis: Losses; legend: PF, VAR hist, VAR hill, VAR cond Hill.]
Figure 8. The figure shows where the negative return series exceeds the predicted VAR, for different methods. The
symbols are the predicted VAR value at that time. The legend specifies what symbol belongs to which simulation method
and the numbers inside the parenthesis are the total number of exceedances during the prediction period. Threshold is
10% of the sample.
[Figure: two panels marking VAR exceedances around May-June 2006 for calibration periods of 1 and 4 years; y-axis: Losses; legend: PF, VAR hist, VAR hill, VAR cond Hill N.]
Figure 9. The figure marks where the negative return series exceeds the VAR predicted by the different models, for a time period around May-June 2006. In the left figure a calibration period of one year is used, in the right figure four years. Threshold is 10% of the sample.
As can be seen in both the lower and the upper plot of figure 8, the exceedances often occur at the
same time. However, the total number of exceedances differs, and the main reason is that the
models react to volatility changes more or less quickly. In figure 9 and figure 10 below that fact is
clear. The conditional Hill models have exceedances more evenly distributed over the periods than
the historical and Hill models.
As can be seen in figure 9, all models fail to capture the great increase in volatility around May 2006.
When a longer calibration period is used, the exceedances are slightly fewer and not as large. However,
neither is able to capture the first big jump, and even the conditional Hill model isn't able to adjust
to the drastic change in volatility, so several exceedances occur close to one another.
[Figure: two panels marking VAR exceedances around the 2008 crisis for calibration periods of 1 and 4 years; y-axis: Losses; legend: PF, VAR hist, VAR hill, VAR cond Hill N.]
Figure 10. The figure marks where the negative return series exceeds the VAR predicted by the different models, for the time period around the crisis of 2008. In the left figure a calibration period of one year is used, in the right figure four years. Threshold is 10% of the sample.
In figure 10 the period around the 2008 crisis and some other turbulence is illustrated. The conditional
Hill model is able to adjust to the sudden volatility increase and hence avoids many exceedances. Both
the historical and the Hill models are too rigid, and hence several exceedances follow one
another when the models aren't able to adapt to the change in volatility.
In figures 11 and 12 below a section around the financial crisis of 2008 is shown, where one year of
calibration data is used. It is obvious that the conditional Hill estimator is the best when it comes to
adjusting for emerging volatility increases. The other methods are, as mentioned earlier, stiffer and
hence fail to capture fast and large changes.
[Figure: predictions of VAR and ES, June to December 2008, one-year calibration period; y-axis: Losses; legend: PF, VAR hist, VAR hill, VAR cond Hill N, ES hist, ES hill, ES cond Hill N.]
Figure 11. The figure shows the predictions of VAR and ES as well as the actual negative return series, for the time period
around the crisis 2008. Threshold is 10% of the sample.
[Figure: VAR exceedances and ES predictions, June 2008 to June 2009, one-year calibration period; y-axis: Losses; legend: PF, ES hist, ES hill, ES cond Hill N, VAR hist, VAR hill, VAR cond Hill N.]
Figure 12. The figure shows the predictions of ES as well as marks where the actual negative return series exceeds the predicted VAR. The period is from June 2008 to June 2009, to show both where the exceedances occur and the flexibility of the different models.
As shown in figure 12, the number of VAR exceedances for the conditional Hill models is lower than
for the others. However, it can be seen that the conditional Hill estimator for the most part generates
higher predictions of the ES risk during the period of high volatility. Since ES should be the average of
the observations exceeding VAR, the Hill estimator or even the historical simulation seems to be a
more accurate model. Notice however that the conditional Hill predictions are more flexible and
adjust to both increases and decreases in volatility rather quickly.
The same plots are made for the turbulent time from the middle of 1999 to 2002, including the Asian
crisis and the burst of the IT bubble among other events. In figure 13 below the flexibility of the
conditional Hill model versus the stiffness of the others is visible.
[Figure: predictions of VAR and ES, June 1999 to June 2002, one-year calibration period; y-axis: Losses; legend: PF, VAR hist, VAR hill, VAR cond Hill N, ES hist, ES hill, ES cond Hill N.]
Figure 13. The figure shows the predictions of VAR and ES as well as the actual negative return series, for the time period
around the IT crisis and nine eleven. Threshold is 10% of the sample.
[Figure: marks of VAR exceedances, June 1999 to June 2002, one-year calibration period; legend: PF, VAR hist, VAR hill, VAR cond Hill.]
Figure 14. Shows the marks where the exceedances occurred for the different models. Threshold is 10% of the sample.
[Figure: VAR exceedances and ES predictions, June 1999 to June 2002, one-year calibration period; y-axis: Losses; legend: PF, ES hist, ES hill, ES cond Hill N, VAR hist, VAR hill, VAR cond Hill T.]
Figure 15. The figure shows the predictions of ES as well as marks where the actual negative return series exceeds the
predicted VAR.
In figures 14 and 15 the conditional Hill method gives the fewest exceedances over the estimated VAR,
and the ES predictions seem to be more accurate than those shown in figure 12 above. However, no
method can foresee a single large price change; an example is between August and November 2001 in
figures 13 and 14. Both the historical, and hence the GPD, and the Hill estimator seem to remember
history for a longer time. When the volatility starts to increase, the memory is still of a calm period
and therefore the models don't react quickly. On the other hand, when the volatility decreases
after a more turbulent period, the memory is still of a turbulent period, and that is reflected in the
stiffness of the predictions, which don't adjust to the now prevailing calm period.
6.3 Backtesting results, 10% threshold
In this section the backtesting results for each of the time series are presented. For the different
simulation methods (historical, Hill, etc.) different lengths of calibration period have been used. For
each period a comparison between the statistically expected number of exceedances and the
number of exceedances from the simulations is done. The simulation method that is closest to the
statistically expected number is considered the best method for that length of calibration period. In each
simulation the threshold for the EVT based methods is set to 10% of the sample used, i.e. when
one year (252 days) is used as calibration period the tail consists of 25.2 observations. For the
conditional Hill approach, one model is based on the assumption of normal innovations (Conditional
Hill N in the tables) and the other on the assumption of Student's t innovations with three
degrees of freedom (Conditional Hill T in the tables).
For the backtesting of VAR, the number of exceedances for each model is presented together with, in
parentheses, the difference from the statistically expected number in percent. It needs to
be 10% or less in absolute value for the model to be considered acceptable. A positive value indicates
more exceedances than expected, and hence that the model underestimates the VAR risk; for a
negative value the opposite applies. For the backtesting of ES the average difference is presented,
in percent. It needs to be within 1% from zero for the model to be acceptable, conditional on
the acceptance of the VAR estimate. A positive value of the statistic indicates an underestimation of
the ES risk and a negative value indicates an overestimation. The outcomes that fall within the
acceptable limits are highlighted in bold. For ES the outcome closest to zero for every
calibration period is also marked, even though the VAR estimate may be unacceptable.
Summary
In tables 7 and 8 a short summary of the backtesting results is presented. For all the time series the
historical simulation gave the ES statistic closest to zero, but it always indicated an underestimation
of the risk (the statistic is positive) and the VAR estimate never fell within the acceptable limits.
Hence, one of the EVT methods is always preferable, but which one differs between the time series.
Table 7. A summary of the best VAR prediction model for the different time series. The difference from the statistically
expected is given in percent; a positive value indicates more exceedances than expected and a negative value
indicates the opposite.

Time series | Best VAR model within limits for each method | Difference from statistically expected (%) | Calibration period (in years)
OMXS30 | Hill               |  8% | 2
OMXS30 | Conditional Hill N |  6% | 1
OMXS30 | Conditional Hill T |  2% | 1
FTSE   | Conditional Hill N | -3% | 1
FTSE   | Conditional Hill T | -2% | 2
GSPC   | Hill               |  7% | 1
N225   | Hill               |  1% | 4
N225   | Conditional Hill N | -3% | 1
N225   | Conditional Hill T | -5% | 1
EUR    | Hill               | -2% | 4
GBP    | Hill               |  1% | 4
GBP    | Conditional Hill N | -5% | 1
GBP    | Conditional Hill T | -8% | 3 and 4
USD    | Hill               | -1% | 3
USD    | Conditional Hill N |  1% | 2
USD    | Conditional Hill T |  1% | 3 (+) and 4 (-)
It is clear that the preferable model differs among the time series, but common to the models is that
they are based on extreme value theory. The historical simulation approach never generates a model
that falls within the limits of acceptance for a reliable model, i.e. the number of exceedances is never
within 10% of the statistically expected. In sections 6.3.1 to 6.3.7 below it can be seen that the models
that fall within the limits of acceptance vary from time series to time series. The number of models
that are considered reliable also differs largely. Notice that for some time series not all methods
generate models that fall within the limits of acceptability; for more details see sections 6.3.1 to 6.3.7.
However, the Hill and the conditional Hill with t innovations are the two most prominent. Worth
mentioning is that the best models highlighted in the table above are closely followed by
others, as can be seen in table 7 as well as in the following sections. The length of calibration data
used to obtain the most accurate model varies, but it is generally longer periods for the Hill estimator
and shorter for the conditional Hill.
Table 8. A summary of the best ES prediction model for the different time series. The average difference is the difference
between the estimated ES and the actual outcome for those days where the actual loss exceeded the estimated VAR. A
negative value indicates an overestimation of the risk, i.e. the estimated ES is larger than the actual outcome, on
average.

Time series | Best ES model within limits for each method | Average difference (%) | Calibration period (in years)
OMXS30 | Conditional Hill N | -0.53% | 1
OMXS30 | Conditional Hill T | -0.84% | 1
FTSE   | Conditional Hill N | -0.71% | 1
FTSE   | Conditional Hill T | -0.86% | 1
GSPC   | Hill               | -0.54% | 1
N225   | Conditional Hill N | -0.70% | 1
N225   | Conditional Hill T | -0.82% | 1
EUR    | Hill               | -0.26% | 3
GBP    | Hill               | -0.50% | 4
GBP    | Conditional Hill N | -0.33% | 1
GBP    | Conditional Hill T | -0.37% | 4
USD    | Hill               | -0.39% | 3
USD    | Conditional Hill N | -0.56% | 4
USD    | Conditional Hill T | -0.52% | 2
The conditional Hill methods are on average superior. For the majority of the time series tested they
generated reliable models. Of the two, the one with normal innovations is preferable. However, for
both GSPC and EUR the Hill estimator is the only method that generates models which meet the
requirements for reliable models. On average, one year of calibration data seems to provide the most
reliable models, but no clear conclusion can be drawn from the summary table. For some time series
longer time horizons seem desirable.
In the following sections, as well as in tables 7 and 8, a general rule of thumb emerges: extreme value
theory based models for indices need shorter calibration periods to generate reliable models for VAR
and ES estimations, while models for foreign exchange rates need longer periods. However, exceptions exist.
6.3.1 OMXS30
Table 9. Simulation results for OMXS30, showing how many times the actual percentage change was greater than the VAR
prediction. Since the 99% level is studied, the expected number of exceedances is given by 1% of the total prediction period.
The ES statistic measures the average difference between the estimated ES and the observations exceeding VAR.

OMXS30, calibration period               | 1 year | 2 years | 3 years | 4 years | 10 years | 17 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568    | 2056     | 292
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68   | 20.56    | 2.92

Exceedances over estimated VAR, U:
Historical         | 70 (62%) | 60 (47%)  | 64 (68%) | 62 (74%)  | 23 (12%)  | 5 (71%)
Hill               | 54 (25%) | 44 (8%)   | 43 (13%) | 44 (23%)  | 16 (-22%) | 4 (37%)
Conditional Hill N | 46 (6%)  | 36 (-12%) | 35 (-8%) | 32 (-10%) | 17 (-17%) | 3 (3%)
Conditional Hill T | 44 (2%)  | 34 (-17%) | 35 (-8%) | 24 (-33%) | 14 (-32%) | 2 (-32%)
GPD estimate       | 76 (76%) | 64 (57%)  | 63 (65%) | 64 (79%)  | 23 (12%)  | 5 (71%)

Statistic for ES, V:
Historical         | 0.11%  | 0.29%  | 0.08%  | 0.04%  | 0.10%  | 0.06%
Hill               | -1.00% | -1.25% | -1.55% | -2.00% | -2.34% | -2.70%
Conditional Hill N | -0.53% | -0.81% | -0.73% | -1.05% | -0.87% | -1.15%
Conditional Hill T | -0.84% | -0.97% | -1.10% | -0.96% | -1.19% | -0.95%
GPD estimate       | 1.30%* | 0.23%* | 0.14%  | 0.06%  | 0.06%  | 0.06%

* ML estimates didn't converge
Remember, a positive value of the ES statistic, V, indicates an underestimation of the risk loss and a
negative value indicates an overestimation. Notice that the GPD parameters don't converge for all
tail sizes. As mentioned in section 6.1, this means that the conditions for the parameters, given in
section 2.5, equation 18, aren't fulfilled for the estimated parameters, or that the parameters didn't
converge during the maximum likelihood procedure. Because of that, the GPD models that use one
and two years of historical data to estimate VAR and ES are unreliable, and so are the results from
those models.
The calibration period of 17 years can only be backtested over one year, hence the low number
of statistically expected exceedances. Because of the low number, the limits for a reliable model end
up at around 2.6 to 3.2 exceedances, so unless the number of exceedances is exactly three the model is
rejected as unreliable. To properly investigate this length of calibration data a longer backtesting
period is needed. It is because of this that the best model for the conditional Hill with normal innovations
in table 9 is considered to be the one based on one year of calibration data.
For ES the conditional Hill methods perform best for almost every length of calibration period.
Since the backtesting statistic for ES depends on the estimation of VAR, only the conditional Hill
methods fulfill the conditions for a reliable model. As can be seen in table 9, only one and three years
of calibration data generate models that fall within the acceptable limits. Of those, the calibration
period of one year with the conditional Hill model with normal innovations gives the best predictions.
If the restriction that an accurate estimate of VAR is needed to obtain a good ES model is ignored,
several more models are within the 1% limit. Historical simulation always gives an underestimation of
the risk, since the statistic is positive for all calibration periods, but it is almost always the one
closest to zero. The exception is the GPD method when 10 years of historical data are used to calibrate
the model. The Hill estimator never generates an acceptable model for OMXS30; for the conditional
Hill methods the length of calibration data used affects the reliability, and overall shorter time horizons
seem preferable over longer.
Considering VAR models, the conditional Hill with t innovations based on one year of calibration
data gives the number of exceedances closest to the statistically expected for the OMXS30 loss series.
However, several other models fall within the limits for a reliable model. The conditional Hill
methods are preferable for most calibration periods. The conditional Hill with normal innovations
seems to be the most accurate for longer calibration periods. For a calibration period of two years the
Hill estimate seems to be slightly better, but the difference is small. Even though the model that
performs closest to the statistically expected is the conditional Hill model with t innovations, the one
with normal innovations generates models within the limits of acceptability for almost all lengths of
calibration data. Therefore, if one wants a model that is robust regardless of the length of calibration
period, that one is preferable.
For calibration periods of two years or longer the conditional Hill methods overestimate the risk
while the historical, Hill and GPD methods underestimate it. The conditional Hill with t innovations gives
fewer and fewer exceedances over the estimated VAR compared with the one with normal innovations
the longer the calibration period used.
To summarize the above, one of the conditional Hill methods should be used to obtain reasonable
estimates of VAR and ES, preferably with one year of historical data as the calibration period.
6.3.2 FTSE
Table 10. Simulation results for VAR and ES predictions based on the time series of FTSE.

FTSE, calibration period                 | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Historical         | 71 (64%)  | 71 (74%)  | 71 (86%)  | 67 (88%)
Hill               | 52 (20%)  | 60 (47%)  | 60 (57%)  | 56 (57%)
Conditional Hill N | 42 (-3%)  | 34 (-17%) | 33 (-14%) | 30 (-16%)
Conditional Hill T | 38 (-12%) | 40 (-2%)  | 37 (-3%)  | 33 (-8%)

Statistic for ES, V:
Historical         | 0.04%  | 0.21%  | 0.20%  | 0.27%
Hill               | -1.25% | -1.20% | -1.47% | -1.70%
Conditional Hill N | -0.71% | -0.83% | -0.90% | -1.07%
Conditional Hill T | -0.82% | -0.86% | -0.96% | -1.13%
For VAR predictions one of the conditional Hill methods is preferable; which one depends on the
length of the calibration period used. The one based on t innovations is, however, the only one that
generates models within the limits for a reliable model when two years or more are used. When one
year is used it gives 12% fewer exceedances than expected, which is close to the limit. Hence, that
method can be seen as the preferable one of the two if a choice has to be made. Noticeable is that
both conditional Hill methods always overestimate the risk.
As can be seen in table 10, the time horizon one should use doesn't really matter; reliable models can
be obtained for each length of calibration data tested. For two years or more the conditional Hill with
t innovations is a convenient choice, for one year the one based on normal innovations is slightly better.
For ES the historical simulation gives an underestimation of the risk, but it is the method with the
statistic closest to zero, as for the other time series. Since the conditions for an acceptable
model include a reasonable estimate of VAR, only the conditional Hill methods for estimating ES
should be considered. The length of calibration data can be between one and three years, depending
on which distribution the innovations are assumed to come from: one year if it is normal innovations,
two or three if it is t innovations.
6.3.3 GSPC
Note that the GSPC time series had zeros for the latest 123 days; these observations are therefore
excluded from the analysis.
Table 11. Simulation results for VAR and ES predictions based on the time series of GSPC.

GSPC, calibration period                 | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4201   | 3949    | 3697    | 3445
Expected number of exceedances (in days) | 42.01  | 39.49   | 36.96   | 34.45

Exceedances over estimated VAR, U:
Historical         | 58 (38%)  | 63 (60%)  | 59 (60%)  | 58 (68%)
Hill               | 45 (7%)   | 48 (22%)  | 46 (24%)  | 39 (13%)
Conditional Hill N | 36 (-14%) | 31 (-21%) | 24 (-35%) | 24 (-30%)
Conditional Hill T | 31 (-26%) | 27 (-32%) | 23 (-38%) | 17 (-51%)

Statistic for ES, V:
Historical         | 0.14%  | 0.17%  | 0.14%  | 0.11%
Hill               | -0.54% | -0.57% | -0.57% | -0.53%
Conditional Hill N | -0.50% | -0.68% | -0.66% | -0.88%
Conditional Hill T | -0.26% | -0.52% | -0.56% | -0.60%
The only VAR prediction model that falls within the limits of acceptability is obtained with the Hill
estimator based on a one-year calibration. All other models are quite far off. The historical and Hill
methods underestimate the risk quite substantially, while the conditional Hill methods overestimate it.
Since only one VAR estimating model is acceptable, that is the only candidate for ES estimation. As can
be seen in table 11, the backtesting statistic is smaller than 1% and hence the Hill estimator model
based on one year of calibration data is the model that gives trustworthy estimates of VAR and ES
for time series similar to GSPC. If the condition of an accurate VAR estimate is ignored, several more
models can be seen as appropriate. Hence, an incorrect estimation of VAR seems to be able to generate
an acceptable estimate of ES. If underestimation of the ES risk is acceptable, the historical simulation
based on four years of calibration data is the best model; if not, any of the EVT methods can be used.
6.3.4 N225
Table 12. Simulation results for VAR and ES predictions based on the time series of N225.

N225, calibration period                 | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Historical         | 60 (39%) | 56 (38%)  | 54 (41%)  | 44 (23%)
Hill               | 48 (11%) | 48 (18%)  | 45 (18%)  | 36 (1%)
Conditional Hill N | 42 (-3%) | 29 (-29%) | 28 (-27%) | 23 (-36%)
Conditional Hill T | 41 (-5%) | 32 (-21%) | 29 (-24%) | 23 (-36%)

Statistic for ES, V:
Historical         | 0.24%  | 0.30%  | 0.12%  | 0.25%
Hill               | -1.21% | -1.24% | -1.30% | -1.50%
Conditional Hill N | -0.70% | -0.69% | -0.76% | -0.69%
Conditional Hill T | -0.82% | -0.96% | -0.97% | -0.93%
The Hill model with four years of calibration is the preferable model for VAR estimation, followed
closely by the conditional Hill models with one-year calibration periods. These
are the three models that fall within the 10% limit from the expected number of exceedances. If
other calibration periods are used, the historical and Hill methods underestimate the risk, while the
conditional Hill methods overestimate it quite a lot.
For the time series of N225, one or four years of calibration data has to be used for reliable VAR
models that generate trustworthy predictions. For one year one of the conditional Hill models
generates the most reliable estimates, for four years the simpler Hill method is preferable.
For estimating ES the conditional Hill models based on one year of historical data are the only
acceptable ones. As for the other datasets included in this study, the historical simulation generates
the backtesting statistic closest to zero but underestimates the VAR risk quite substantially. So the ES
statistic closest to zero comes from an incorrect VAR estimate. To fulfill the conditions for a reliable
model for both VAR and ES estimations, one of the conditional Hill methods based on one year of
historical data should be used.
6.3.5 EUR
Table 13. Simulation results for VAR and ES predictions based on the time series of EUR.

EUR, calibration period                  | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Historical         | 59 (36%)  | 50 (23%)  | 44 (15%)  | 42 (18%)
Hill               | 50 (16%)  | 36 (-12%) | 37 (-3%)  | 35 (-2%)
Conditional Hill N | 37 (-14%) | 27 (-34%) | 28 (-27%) | 28 (-22%)
Conditional Hill T | 36 (-17%) | 25 (-39%) | 24 (-37%) | 22 (-38%)

Statistic for ES, V:
Historical         | 0.02%  | 0.03%  | 0.05%  | 0.05%
Hill               | -0.23% | -0.27% | -0.26% | -0.35%
Conditional Hill N | -0.19% | -0.21% | -0.26% | -0.27%
Conditional Hill T | -0.23% | -0.27% | -0.28% | -0.25%
For EUR the Hill estimator performs best for VAR estimation; the models with calibration periods of
three and four years perform closest to the statistically expected and are the only two which fall
within the limits for a reliable model. Both models overestimate the risk slightly. All other models are
considered unreliable since they either under- or overestimate the risk substantially. So for the
time series of EUR, longer calibration periods are needed and only the Hill estimator generates
reliable estimates. The same reasoning holds for the preferable model for estimating ES: the Hill
estimator based on three or four years of historical data for calibration should be used.
Different from the results for the indices in the sections above, the four-year period of
historical data seems to be preferable for obtaining good estimates of the risk measures when the time
series for the EUR foreign exchange rate is used. This can be seen in the two coming sections as well,
even though the models also perform well when shorter periods are used for those time series.
6.3.6 GBP
Table 14. Simulation results for VAR and ES predictions based on the time series of GBP.

GBP, calibration period                  | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Historical         | 50 (16%)  | 51 (25%)  | 56 (47%)  | 47 (32%)
Hill               | 37 (-14%) | 32 (-21%) | 34 (-11%) | 36 (1%)
Conditional Hill N | 41 (-5%)  | 30 (-26%) | 33 (-14%) | 33 (-8%)
Conditional Hill T | 36 (-17%) | 35 (-14%) | 35 (-8%)  | 33 (-8%)

Statistic for ES, V:
Historical         | 0.11%  | 0.07%  | 0.01%  | 0.11%
Hill               | -0.61% | -0.45% | -0.53% | -0.50%
Conditional Hill N | -0.33% | -0.42% | -0.39% | -0.44%
Conditional Hill T | -0.48% | -0.50% | -0.48% | -0.37%
Every method apart from the historical overestimates the VAR throughout, except for the Hill
estimator with four years of calibration data. That is the model which performs closest to the
expected. However, several conditional Hill models can also be considered reliable. All reliable
models give an overestimation of the ES, but within the acceptable limit.
The choice of length of the calibration data affects the choice of model, but four years of data seems
to be the suitable choice, since any of the models based on extreme value theory can then be used
to obtain dependable results. The conditional Hill model with normal innovations and one year of
calibration data is the model that seems to estimate ES most accurately.
6.3.7 USD
Table 15. Simulation results for VAR and ES predictions based on the time series of USD.

USD, calibration period                  | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Historical         | 60 (39%) | 56 (38%) | 46 (20%) | 52 (46%)
Hill               | 55 (27%) | 45 (11%) | 38 (-1%) | 40 (12%)
Conditional Hill N | 48 (11%) | 41 (1%)  | 35 (-8%) | 37 (4%)
Conditional Hill T | 46 (6%)  | 40 (-2%) | 38 (-1%) | 36 (1%)

Statistic for ES, V:
Historical         | 0.08%  | 0.11%  | 0.14%  | 0.07%
Hill               | -0.52% | -0.42% | -0.39% | -0.33%
Conditional Hill N | -0.57% | -0.57% | -0.61% | -0.56%
Conditional Hill T | -0.60% | -0.52% | -0.66% | -0.54%
As for all the other time series, the historical simulation provides the backtest statistic closest to zero,
but with a small underestimation of the risk. The statistic is by far the one closest to zero of all
the models tested. However, as mentioned before, the VAR estimations are not even close to being
reasonable and therefore these models cannot be seen as acceptable.
For the VAR estimation all extreme value theory based models generate reliable estimates. The
conditional Hill method with t innovations gives dependable models for all lengths of calibration data
tested. The model with normal innovations needs two years or more, while the Hill estimator only
gives reliable estimates when three years of calibration data are used. The results from the backtesting
of the ES predictions show that the models that are acceptable for VAR estimation are all within
the limits of acceptability for ES estimation as well. The EVT models that give more exceedances
than desired are not that bad either; as can be seen in table 15, the differences from the statistically
expected are only 11-12%, except for the Hill estimator based on one year of calibration data. This
indicates that all the EVT models are strong candidates when choosing a model for estimating risk
measures.
The time horizon used for the models is fairly arbitrary. Some restrictions exist depending on which model
is used, and the outcome of this backtesting procedure is that the conditional Hill method with t
innovations is preferable, since the choice of length of the calibration data is then arbitrary.
6.4 Backtesting results, various thresholds
OMXS30 is used to investigate threshold choices other than 10%. Calibration periods of one to
four years are used to estimate VAR and ES. The thresholds investigated are based on the results
presented in section 5.2 Threshold choice.
Table 16. Summary of Hill simulation for different thresholds, for the OMXS30 data.

Hill estimator, calibration period       | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
Hill T=0.1  | 54 (25%) | 44 (8%)  | 43 (13%) | 44 (23%)
Hill T=0.07 | 70 (62%) | 58 (42%) | 57 (49%) | 57 (60%)
Hill T=0.05 | 76 (76%) | 62 (52%) | 62 (62%) | 63 (77%)
Hill T=0.03 | 81 (87%) | 68 (67%) | 71 (86%) | 68 (91%)

Statistic for ES, V:
Hill T=0.1  | -1.00% | -1.25% | -1.55% | -2.00%
Hill T=0.07 | -0.70% | -0.81% | -1.08% | -1.48%
Hill T=0.05 | -0.48% | -0.30% | -0.56% | -0.95%
Hill T=0.03 | -0.25% | -0.17% | -0.35% | -0.44%
VAR predictions based on Hill become less and less accurate as the tail size diminishes, for all
calibration periods from one up to four years. So when using the Hill estimator the threshold should not
be less than 10% for data similar to the OMXS30 data, and the length of the calibration period has to
be two years to obtain a reliable model. This model slightly underestimates the risk based on VAR,
since it gives a few more exceedances than statistically expected.
For expected shortfall the backtesting statistic gets closer to zero the lower the threshold is set.
However, the statistic is then based on an inaccurate VAR estimate and is therefore somewhat
misleading. If the ES model should be combined with a VAR model, to avoid several different
simulations, a model based on two years of historical data and a threshold of 10% should be used.
This would lead to an overestimation of the ES risk of 1.25% on average, i.e. if the estimated ES is
10% the actual loss is around 8.75%.
Table 17. Summary of conditional Hill simulation, with normal innovations, for different thresholds, for the OMXS30 data.

Conditional Hill, normal innovations; calibration period | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
CH N T=0.1  | 46 (6%)  | 36 (-12%) | 35 (-8%)  | 32 (-10%)
CH N T=0.07 | 60 (39%) | 45 (11%)  | 46 (20%)  | 36 (1%)
CH N T=0.05 | 66 (53%) | 54 (33%)  | 50 (31%)  | 41 (15%)
CH N T=0.03 | 73 (69%) | 60 (47%)  | 54 (41%)  | 44 (23%)

Statistic for ES, V:
CH N T=0.1  | -0.53% | -0.81% | -0.73% | -1.05%
CH N T=0.07 | -0.45% | -0.44% | -0.52% | -0.57%
CH N T=0.05 | -0.34% | -0.36% | -0.35% | -0.35%
CH N T=0.03 | -0.22% | -0.21% | -0.15% | -0.15%
Table 18. Summary of conditional Hill simulation, with t innovations, for different thresholds, for the OMXS30 data.

Conditional Hill, t innovations; calibration period | 1 year | 2 years | 3 years | 4 years
Length prediction period (in days)       | 4324   | 4072    | 3820    | 3568
Expected number of exceedances (in days) | 43.24  | 40.72   | 38.20   | 35.68

Exceedances over estimated VAR, U:
CH T T=0.1  | 44 (2%)  | 34 (-17%) | 35 (-8%)  | 24 (-33%)
CH T T=0.07 | 55 (27%) | 46 (13%)  | 42 (10%)  | 36 (1%)
CH T T=0.05 | 63 (46%) | 53 (30%)  | 49 (28%)  | 43 (21%)

Statistic for ES, V:
CH T T=0.1  | -0.84% | -0.97% | -1.10% | -0.96%
CH T T=0.07 | -0.28% | -0.68% | -0.60% | -0.70%
CH T T=0.05 | -0.24% | -0.45% | -0.35% | -0.47%
For the VAR estimations the best models vary for the different calibration periods, but the threshold
shouldn't be lower than approximately 7%, see tables 17 and 18. The choice of threshold affects the
appropriate length of the calibration data. Overall, two years should be avoided; either a shorter period
with a 10% threshold or a longer period with a 7-10% threshold should be used. For both conditional Hill
methods a threshold of 7% and four years of calibration data seem to be the combination that performs
closest to the statistically expected.
As can be seen in tables 17 and 18, the ES backtesting statistic seems to approach zero with smaller tail
sizes. When the threshold for the tail is set to 5% of the sample, the statistic closest to zero is
obtained. Nevertheless, the same reasoning as for the Hill estimator holds for these methods: since
the ES backtesting statistic depends on the VAR estimate, this has to be considered when interpreting
the results of the backtesting. If the models that are preferable for VAR estimation are used, the
overestimation of the ES risk will be slightly larger than if a threshold of 5% is used.
6.5 Results higher significance levels
As described in section 5.7, an investigation of how the EVT models perform for higher significance
levels is made. A hypothetical loss distribution consisting of 1 000 000 observations for the day
2012-01-01 is generated via an AR(1)-GARCH(1,1) model with t innovations to obtain the ES; the VAR is
obtained from the quantiles of the same hypothetical distribution. The results are presented in tables
19 and 20 below.
Table 19. The table shows the predictions of VAR and ES for 2012-01-01 from the different models. The
outcome from the hypothetical data is also included as a reference; the closer to that outcome the better.

VAR                            | 1 year  | 2 years | 3 years | 4 years
Outcome from hypothetical data | 0.1818  | 0.1666  | 0.1792  | 0.1909
Historical                     | 0.0682* | 0.0682* | 0.0640  | 0.0703
Hill                           | 0.1351  | 0.1227  | 0.1136  | 0.1410
Conditional Hill N             | 0.0964  | 0.0761  | 0.0831  | 0.0781
Conditional Hill T             | 0.1090  | 0.0866  | 0.0833  | 0.0900

ES                             | 1 year  | 2 years | 3 years | 4 years
Outcome from hypothetical data | 0.2766  | 0.2538  | 0.2731  | 0.2906
Historical                     | -       | -       | -       | 0.0723*
Hill                           | 0.2280  | 0.2132  | 0.1862  | 0.2392
Conditional Hill N             | 0.1475  | 0.1103  | 0.1227  | 0.1106
Conditional Hill T             | 0.1771  | 0.1337  | 0.1242  | 0.1349

* The one most extreme observation in the historical data used.
Table 20. The table shows the predictions of VAR and ES for 2012-01-01 from the different models. The
outcome from the hypothetical data is also included as a reference; the closer to that outcome the better.

VAR                            | 1 year | 2 years | 3 years | 4 years
Outcome from hypothetical data | 0.2730 | 0.2504  | 0.2695  | 0.2868
Historical*                    | 0.0682 | 0.0682  | 0.0682  | 0.0723
Hill                           | 0.2207 | 0.2045  | 0.1817  | 0.2311
Conditional Hill N             | 0.1462 | 0.1105  | 0.1226  | 0.1112
Conditional Hill T             | 0.1731 | 0.1323  | 0.1239  | 0.1343

ES                             | 1 year | 2 years | 3 years | 4 years
Outcome from hypothetical data | 0.4117 | 0.2780  | 0.4068  | 0.4327
Historical                     | -      | -       | -       | -
Hill                           | 0.3724 | 0.3554  | 0.2978  | 0.3921
Conditional Hill N             | 0.2239 | 0.1602  | 0.1810  | 0.1576
Conditional Hill T             | 0.2814 | 0.2041  | 0.1848  | 0.2015

* The estimated VAR is just the one most extreme observation in the historical data used, for all four time periods.
For the VAR predictions the Hill estimator seems to be the most accurate, even though it gives a
lower VAR than the outcome from the hypothetical data and hence underestimates the risk. The
result is the same for the ES predictions: the Hill estimator provides the most accurate predictions,
but it consistently underestimates the ES risk.
Notice that the predictions in the tables above are only made for one day. To establish the accuracy
of the models, predictions must be made over a longer period of time and then used to analyze and
backtest the separate models.
7. Discussion and conclusions
In this study the time series used are just the returns for a few indices and foreign exchange rates
over an 18 year period. The time series consist only of the daily closing prices of the assets. In
reality, when a portfolio or an asset performs poorly, one makes changes to avoid large losses, for
example by selling off some of the asset or changing the positions in the portfolio. If this is taken into
consideration, the appearance of the time series may be different.
In the literature it is stated that EVT should be good for higher quantile estimation; that is the main
idea behind using the theory. The largest part of this study is used to investigate models for estimating
the risk measures at a significance level of 99%. A smaller analysis is made for higher levels of
significance. The outcome of that is that for higher significance levels the Hill estimator seems to
perform best. However, the estimates of both VAR and ES are quite far from the outcome of
the hypothetical data. One explanation for why the predictions from the models are so far off can be
that, when the hypothetical outcome is generated, the residuals are assumed to come from a
Student's t distribution with three degrees of freedom. In reality the actual residuals of the time
series may correspond to another number of degrees of freedom. So even if the AR-GARCH model is based
on the time series and the assumption on the residuals seems reasonable, it may not be optimal.
However, to properly investigate how the models perform for higher significance levels a more
extensive study should be made, with enough data material that proper backtesting procedures can
be carried out.
There seems to be no clear answer as to which method is preferable over the others when estimating
the two risk measures VAR and ES. The preferable model seems to vary not only between different
types of assets but between assets of the same type as well. In this study indices and foreign
exchange rates are considered; if other types of assets are studied, a clearer picture of which model
to prefer may be obtained.
In addition, other aspects than just a model's ability to give satisfying predictions may affect the choice
of model. The conditional Hill models demand both a fit to a time series model and an
estimation of a shape parameter, while the Hill estimator only demands an estimation of the shape
parameter before predicting VAR and ES. The Hill estimator has the advantage that it doesn't need to
estimate any additional parameters and is therefore simpler to implement. Also, the computational
burden of obtaining the desired estimates is not as heavy as for the methods that combine EVT and time
series analysis. The advantage of the conditional Hill methods is that they are more flexible than the
others, to both volatility increases and decreases. When combining EVT and time series models the
possible combinations are very many. In this study I chose the conditional Hill since the error from model
misspecification should be as small as possible. If the POT method is used, a GPD needs to be fitted to the
residuals as well, and hence the computational burden increases even more and the possibility of
error increases.
The one big drawback of the Hill estimator is that it only applies to Fréchet-type distributions, and even though financial data can most often be assumed to come from that family, that is not always the case. That is the main reason for considering the POT method. The advantage of the POT method is that it can handle all types of distributions: the GPD can be fitted no matter whether the data seem to correspond to a Gumbel, Fréchet or Weibull distribution. The fact that the GPD parameters have to be fitted to the data is one drawback of the method, as already mentioned. The main problem, as shown in this study, is that the time horizon needed for a stable model is not obvious. There need to be enough observations in the tail to obtain parameter estimates that converge, and the required number depends on how well the GPD fits the data. If the data fit the GPD very well, fewer observations are needed than if the fit is poor.
The threshold choice is an important part of EVT. In this study some different methods for finding an appropriate threshold are examined and a few threshold choices are investigated. Since the choices based on graphical methods and on minimizing the MSE give worse models than when 10% is used, those methods are obviously not optimal. If the conventional choice seems too diffuse, the data-driven algorithms presented in section 2.5.1 Defining the tail should, in my opinion, be considered. However, the conventional choice seems to be good enough as a first step, and a threshold choice around 10% seems reasonable.
As mentioned above, the preferable model depends on the data. For ES, historical simulation is the simplest alternative. It can easily be implemented where historical simulation of VAR already exists, it often gives a backtesting statistic very close to zero, and in comparison with the other methods tested in this study it always gives the smallest statistic (for a threshold of 10%). However, the statistic is obtained from an incorrect estimation of VAR, which makes the results debatable. Since the ES backtesting statistic depends on the estimated VAR, it seems reasonable that the preferable model should be the same for both. This can be seen in section 6.5, where the models are tested for higher significance levels, but it is not always the case, as seen in section 6.3. The main reason is that the ES backtesting statistic is then based on an incorrect VAR estimate. In most cases this means that the actual number of exceedances for the models is substantially larger than statistically expected; hence the ES backtesting statistic is based on several more observations than it would be if the VAR estimate were closer to the expected number. This is one important result of this study: an incorrect estimation of VAR may still generate an acceptable estimate of ES. But since VAR and ES are connected, one of the EVT methods is preferable when an adequate model for estimating ES is desired. Which one, as well as the length of the calibration period, depends on the time series in question.
As for ES, both the choice of model and the length of the calibration period seem to differ among the time series for the VAR estimations. In general, shorter time horizons seem to generate the most stable models if one of the conditional Hill methods is applied, while longer calibration periods should be used if one wants to implement the Hill estimator. Logically, a somewhat longer calibration period should be preferable so that the observations in the tail that the ES estimate is based on are not too few, but as can be seen in section 6.3 that is not always the case. Consequently, the time horizon needed is somewhat arbitrary and depends on the chosen model, but mainly on the data used.
No matter which model one uses, they all fail to capture large, quick changes in volatility. If the market goes from calm one day to very turbulent the next, no model can predict that; this is clear in the plots in section 6.2. When a change in volatility develops over some time, the conditional Hill models adjust quickly, both to increases and decreases. These are the models that, for the majority of the time series tested, stay within the limits of acceptability. The models with normal innovations seem to generate the most accurate ES estimates, while the models with t innovations generate the most accurate VAR estimates. The Hill estimator is the only method that generates acceptable models for GSPC and EUR, and for several other assets it gives the VAR estimates that are closest to the expected. The foreign exchange rates seem to be more adaptable, and all the EVT methods generate reliable estimates of both VAR and ES, except for EUR. The indices, on the other hand, seem to be more sensitive to the choice of method; either the Hill estimator is the only method that generates acceptable models, or one of the conditional Hill methods is.
A drawback of historical simulation is that you are restricted to the historical outcome of an asset; in a sense you assume that history will repeat itself. Since the financial market is constantly changing, EVT models or other types of semi-parametric or parametric methods can be useful in the field of risk analysis. The conditional Hill models in this study have performed quite well, and with methods like these you consider what has happened recently and adapt your model to that. If you are in a period of high volatility the model will take that into account, and if the volatility is low that will affect the model as well. Even though the first exceedance cannot be avoided, the model quickly adapts to the increase in volatility and hence avoids further exceedances. Even the Hill estimator on its own is more flexible and gives better backtesting results than historical simulation.
The big problem with the EVT models presented in this paper is that they can be hard to implement on large portfolios, and the computational burden of the estimations can be quite heavy. However, for ES there is a way around this problem. Since ES is a coherent risk measure, one of the properties it satisfies is sub-additivity. This means that the risk of a portfolio cannot be greater than the sum of the individual risks of the assets in the portfolio; this property is also known as the diversification effect. Hence, EVT can be applied to the individual assets, and an upper limit for the ES of the portfolio is given by the sum of the ES of the individual assets; a small sketch illustrating this bound is given at the end of this section. One way to make use of the advantages of the EVT models for VAR is either to use them on significantly smaller portfolios or just on the time series of the portfolio itself, since the EVT methods in the univariate case are quite easy to implement and fast to simulate; this can of course be done for ES as well. These estimates can then be used in combination with historical simulation, which is easier to implement on large portfolios. The conclusion of this study is thus that EVT methods can be useful in the field of risk analysis and contribute to improved predictions. Which model to use depends on the data as well as on the level of ambition. The models that combine EVT and time series analysis are harder to implement and their computational burden is higher; the Hill estimator is much simpler. The time horizon of the historical data used shifts between models and between assets. To summarize, for indices a shorter horizon is preferable, one to two years, and one of the conditional Hill methods should be used, with GSPC being the exception. The Hill estimator with longer horizons of up to four years should be used on time series that are similar to the foreign exchange rates included in this study.
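As a rough illustration of the sub-additivity bound mentioned above, the following MATLAB sketch estimates the 99% historical ES separately for two hypothetical loss series and sums the weighted estimates; by sub-additivity and positive homogeneity of ES, the portfolio ES cannot exceed this sum. The loss series, the weights and the use of quantile (Statistics Toolbox) are illustrative assumptions, not data or code from this study.

% Upper bound on portfolio ES from the individual assets (sub-additivity of ES)
loss1 = 0.010*randn(1000,1);                       % hypothetical loss series, asset 1
loss2 = 0.015*randn(1000,1);                       % hypothetical loss series, asset 2
w     = [0.6 0.4];                                 % portfolio weights
p     = 0.99;                                      % confidence level
es    = @(L) mean(L(L >= quantile(L,p)));          % historical ES estimator
esUpper = w(1)*es(loss1) + w(2)*es(loss2);         % sum of the weighted individual ES
esPort  = es(w(1)*loss1 + w(2)*loss2);             % direct estimate; the true portfolio ES is bounded by esUpper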
8. References
Books
Alexander, C (2008): Market Risk Analysis IV, Value-at-Risk Models. John Wiley & Sons Ltd., England
Embrechts, P, Klüppelberg, C, Mikosch, T (1997): Modelling Extremal Events for Insurance and Finance. Springer-Verlag, New York
Gouriéroux, C (1997): ARCH-Models and Financial Applications, Springer Series in Statistics. Springer,
New York.
Articles
Basel Committee on Banking Supervision (2012): Fundamental review of the trading book, Basel
Committee on Banking Supervision Consultative document
Christoffersen, P and Goncalves, S (2005): Estimation Risk in Financial Risk Management, Journal of
Risk, 7, p 1-28.
Danielsson, J and de Vries, C.G. (1997): Tail index and quantile estimation with very high frequency data. Journal of Empirical Finance 4, p. 241-257.
Danielsson, J, De Haan, L, Peng, L, and De Vries, C.G. (2001): Using a Bootstrap Method to Choose
the Sample Fraction in Tail Index Estimation. Journal of Multivariate Analysis 76 No. 2 p. 226–48.
Drees, H and Kaufmann, E (1998): Selecting the optimal sample fraction in univariate extreme value
estimation. Stochastic Processes and their Applications 75, p 149-172
Beirlant, J, Vynckier, P and Teugels, J.L. (1996a): Tail index estimation, Pareto quantile plots and regression diagnostics. Journal of the American Statistical Association 91, p. 1659-1667.
Beirlant, J, Vynckier P and Teugels J.L. (1996b): Excess function and estimation of the extreme value
index. Bernoulli 2, p 293-318.
Embrechts, P, Kaufmann, R, Patie, P (2005): Strategic long-term financial risks: single risk factors. Computational Optimization and Applications, Vol. 32, Issue 1-2, p. 61-90.
Haile, F.D. and Pozo, S. (2006): Exchange rate regimes and currency crises: an evaluation using extreme value theory. Review of International Economics, Volume 14, No. 4, p. 554-570.
Hall, P (1990): Using the bootstrap to estimate mean squared error and select smoothing parameter in nonparametric problems. Journal of Multivariate Analysis, Vol. 32, No. 2, p. 177-203.
Hill, B.M. (1975): A simple general approach to inference about the tail of a distribution. The Annals of Statistics, Vol. 3, p. 1163-1174.
Kourouma, L, Dupre, D, Sanfilippo, G and Taramasco, O (2011): Extreme Value at risk and Expected
Shortfall during Financial Crisis, Working paper, HAL : halshs-00658495, version 1
Longin, FM (2000): From Value at Risk to Stress Testing: The Extreme Value Approach. Journal of
Banking & Finance, Vol. 24, No. 7, p. 1097-1130.
Lux, T (2000): On moment condition failure in German stock returns: An application of recent advances in extreme value statistics. Empirical Economics 25, p. 641-652.
McNeil, AJ (1997): Estimating the tails of loss severity distributions using extreme value theory.
ASTIN Bulletin,27: p. 117-137.
McNeil, AJ (1999): Extreme value theory for risk managers. Internal Modelling and CAD II, published by RISK Books, p. 93-113.
McNeil, AJ and Frey, R (2000): Estimation of tail-related risk measures for heteroscedastic financial
time series: an extreme value approach. Journal of Empirical Finance, 7: p. 271-300.
McNeil, AJ and Saladin, T (1997): The peaks over thresholds method for estimating high quantiles of
loss distributions. Proceedings of 28th International ASTIN Colloquium.
Nyström, K and Skoglund, J (2002): Univariate extreme value theory, GARCH and measures of risk. Swedbank, Group Financial Risk Control.
Rocco, M (2011): Extreme value theory for finance: a survey. Bank of Italy Occasional Paper No. 99
Coronado, M (2000): Extreme value theory (EVT) for risk managers: Pitfalls and opportunities in the
use of EVT in measuring VaR. Proceedings of the VIII Foro de Finanzas
Fisher, R.A. and Tippett, L.H.C. (1928): Limiting Forms of the Frequency Distribution of the Largest or Smallest Member of a Sample. Proceedings of the Cambridge Philosophical Society, Vol. 24, p. 180-190.
Appendix A
Kurtosis
Kurtosis is defined as

$$ k = \frac{E\left[(x-\mu)^4\right]}{\left(E\left[(x-\mu)^2\right]\right)^{2}} $$

In this study MATLAB is used for the calculation of the kurtosis. MATLAB uses the following equation for the kurtosis, which is corrected for bias:

$$ k_1 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)k_0 - 3(n-1)\right] + 3 $$

where

$$ k_0 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^4}{\left(\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2\right)^{2}} $$

is the equation for calculating the kurtosis not corrected for bias.
The kurtosis gives an indication of how probable extreme events are for a distribution. The normal distribution has a kurtosis equal to 3, and a kurtosis larger than 3 indicates fat tails and a slimmer, more peaked center. Such distributions are called leptokurtic, and when the kurtosis is smaller than 3 the distribution is said to be platykurtic.
Skewness
The skewness is defined as

$$ s = \frac{E\left[(x-\mu)^3\right]}{\left(E\left[(x-\mu)^2\right]\right)^{3/2}} $$

In this study MATLAB is used for the calculation of the skewness. MATLAB uses the following equation for the skewness, which is corrected for bias:

$$ s_1 = \frac{\sqrt{n(n-1)}}{n-2}\, s_0 $$

where

$$ s_0 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^3}{\left(\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}\right)^{3}} $$

is the equation for calculating the skewness not corrected for bias.
Skewness is another measure; it indicates the asymmetry of the probability distribution. A zero value implies that the data are evenly spread around the sample mean, as for the normal distribution for example. A negative value indicates that the left tail is longer than the right, while a positive value implies the opposite. If the probability distribution is tilted, a time series model that takes this into consideration can be useful to obtain dependable results.
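As an illustration, the following MATLAB sketch computes both the uncorrected and the bias-corrected versions of the two measures for a return series x and compares them with the built-in functions kurtosis and skewness (Statistics Toolbox). The vector x is a hypothetical placeholder, not data from this study.

% Hypothetical return series used only to illustrate the formulas above
x  = randn(1000,1);
n  = numel(x);
z  = x - mean(x);
s2 = mean(z.^2);                                        % biased variance (divides by n)
k0 = mean(z.^4) / s2^2;                                 % kurtosis, not corrected for bias
k1 = (n-1)/((n-2)*(n-3)) * ((n+1)*k0 - 3*(n-1)) + 3;    % bias-corrected kurtosis
g0 = mean(z.^3) / s2^(3/2);                             % skewness, not corrected for bias
g1 = sqrt(n*(n-1))/(n-2) * g0;                          % bias-corrected skewness
% kurtosis(x,0) and skewness(x,0) apply the same bias corrections,
% so k1 and g1 should agree with the built-in values:
fprintf('kurtosis %.4f (%.4f), skewness %.4f (%.4f)\n', k1, kurtosis(x,0), g1, skewness(x,0));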
Autocorrelation
The lag-h autocorrelation estimate is obtained by

$$ \hat{\rho}_h = \frac{\sum_{t=1}^{T-h}\left(x_t-\bar{x}\right)\left(x_{t+h}-\bar{x}\right)}{\sum_{t=1}^{T}\left(x_t-\bar{x}\right)^{2}} $$

where $x_t$ is the observation at time t and $\bar{x}$ is the estimated mean of the sample. Often a 95% confidence interval is included in the plot; if no $\hat{\rho}_h$ crosses the bounds, the assumption of no autocorrelation holds and the data can be considered independent.
Ljung-Box Q test
For the Ljung-Box Q test the following statistic is used:

$$ Q = T(T+2)\sum_{k=1}^{L}\frac{\hat{\rho}_k^{\,2}}{T-k} $$

where T is the sample size, L is the number of lags at which autocorrelation is tested and $\hat{\rho}_k$ is the autocorrelation at lag k, defined under Autocorrelation above. Hence, to obtain the statistic Q, the squared autocorrelation at each lag is weighted and then summed, where the weight at lag k is determined by the difference between the total sample size and the current lag, T-k. The test examines whether the statistic Q is consistent with a $\chi^2$ distribution with L degrees of freedom at significance level $\alpha$, and the hypotheses are formulated as:
$H_0$: no autocorrelation
$H_1$: autocorrelation occurs
The test does not distinguish at which lags the autocorrelation occurs; it tests the overall autocorrelation.
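A minimal MATLAB sketch of the lag-h autocorrelation estimate and the Ljung-Box Q statistic is given below; x is again a hypothetical return series, and chi2inv requires the Statistics Toolbox.

% Sample autocorrelation up to lag L and the Ljung-Box Q test
x   = randn(1000,1);                       % hypothetical return series
T   = numel(x);
z   = x - mean(x);
L   = 20;
rho = zeros(L,1);
for h = 1:L
    rho(h) = sum(z(1:T-h).*z(1+h:T)) / sum(z.^2);   % lag-h autocorrelation
end
bound = 1.96/sqrt(T);                      % approximate 95% confidence bounds
noAutocorr = all(abs(rho) < bound);        % true if no estimate crosses the bounds
Q = T*(T+2) * sum(rho.^2 ./ (T - (1:L)')); % Ljung-Box statistic
rejectH0 = Q > chi2inv(0.95, L);           % true => autocorrelation at the 5% level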
Figure 16. The time series for the loss returns of FTSE. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 17. The time series for the loss returns of N225. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 18. The time series for the loss returns of GSPC. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 19. The time series for the loss returns of GBP. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 20. The time series for the loss returns of USD. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 21. The time series for the loss returns of EUR. Note that the data start at 1994-01-03 even though the first date on the axis label is a later date.
Figure 22. Autocorrelation plot for the residuals of the FTSE time series (left) and for the squared residuals (right).
Figure 23. Autocorrelation plot for the residuals of the GSPC time series (left) and for the squared residuals (right).
Figure 24. Autocorrelation plot for the residuals of the N225 time series (left) and for the squared residuals (right).
Figure 25. Autocorrelation plot for the residuals of the EUR time series (left) and for the squared residuals (right).
Figure 26. Autocorrelation plot for the residuals of the GBP time series (left) and for the squared residuals (right).
Figure 27. Autocorrelation plot for the residuals of the USD time series (left) and for the squared residuals (right).
Figure 28. Histogram of FTSE and GSPC with fitted normal distribution.
Figure 29. Histogram of N225 and EUR with fitted normal distribution.
Figure 30. Histogram of GBP and USD with fitted normal distribution.
Figure 31. QQ plot against the normal distribution for FTSE and GSPC.
Figure 32. QQ plot against the normal distribution for N225 and EUR.
Figure 33. QQ plot against the normal distribution for GBP and USD.
Appendix B
Threshold choice
To investigate different methods for setting the threshold, the OMXS30 data is used. First, Hill plots and mean excess plots are constructed; then a Monte Carlo simulation is implemented. For the algorithm and results, see below.
In the Hill plots the estimated tail index is plotted as a function of the threshold m, i.e. the number of tail observations expressed as a fraction of the total sample size. The plots are based on the entire data set as well as on just one year, since one year will be used when the prediction models are implemented. As mentioned in section 2.5.1 Defining the tail, approximately horizontal lines indicate that for those values of the threshold the tail index estimate is essentially stable with respect to the choice of threshold.
Figure 34. Hill plots of the estimated tail index as a function of the threshold m (the tail size), based on 18 years of data and on the average of 1 year data.
Figure 35. Hill plots based on 1 year of data, year 1994 (left figure) and year 2011 (right figure).
In figure 34, one of the plots is based on the average of all one-year estimates, i.e. the tail index is estimated for all non-overlapping one-year intervals (one interval is 1994, the next is 1995, and so on) and then the average estimate for each threshold is plotted. As can be seen in both figures 34 and 35, it is not clear where the tail index estimate is stable, and this can shift from sample to sample. A conclusion that can be made is that the threshold should be larger than 5%, which is in line with the conventional choice method presented in section 2.5.1 Defining the tail.
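A minimal MATLAB sketch of how a Hill plot of this kind can be generated is given below. The loss series x is a hypothetical placeholder (trnd requires the Statistics Toolbox), and the estimator is written in its standard form, which may differ in detail from the implementation used in this study.

% Hill plot: estimated tail index as a function of the threshold (tail fraction)
x     = abs(trnd(4, 2000, 1));           % hypothetical heavy-tailed loss series
xs    = sort(x, 'descend');              % descending order statistics
n     = numel(xs);
mMax  = floor(0.45*n);                   % thresholds up to 45% of the sample
alpha = zeros(mMax, 1);
for m = 2:mMax
    alpha(m) = 1 / mean(log(xs(1:m)) - log(xs(m+1)));   % Hill estimate of the tail index
end
plot((2:mMax)/n, alpha(2:mMax));
xlabel('Threshold as fraction of total sample size');
ylabel('Estimated tail index');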
In the mean excess function plot, each observation is used as a threshold and the corresponding sample mean excess is plotted against it. As mentioned in section 2.5.1 Defining the tail, an approximately linear plot for the higher order statistics implies that the tail can be assumed to come from a GPD with shape parameter $\xi$, and the direction of the line indicates the sign of the shape parameter. The plot is made based on the entire data as well as on just one year.
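The sample mean excess function behind these plots can be computed along the following lines; the sketch uses the same kind of hypothetical loss series as above and evaluates the mean excess at each upper order statistic.

% Sample mean excess e(u) = mean(x - u | x > u), evaluated at the upper order statistics
x    = abs(trnd(4, 2000, 1));             % hypothetical heavy-tailed loss series
xs   = sort(x, 'descend');
n    = numel(xs);
kMax = floor(0.45*n);
meanExcess = zeros(kMax, 1);
for k = 1:kMax
    u = xs(k+1);                          % threshold = the (k+1)-th largest observation
    meanExcess(k) = mean(xs(1:k) - u);    % average exceedance over u
end
plot((1:kMax)/n, meanExcess);
xlabel('Threshold as fraction of total sample size');
ylabel('Mean excess value');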
Figure 36. Mean excess plots based on 18 years of data (left) and on the average of 1 year periods (right).
How the data used affects the mean excess plots is clear in figure 36, but even when just one year is used to generate the plot there is no clear answer to where a suitable threshold should be set. The slope declines a little around 7-10%, but that is a very vague motivation.
The figures from both the Hill and mean excess methods show the difficulties and drawbacks of the graphical methods: even though the plots are easily generated, a subjective assessment is needed, which is neither practical nor efficient, since the threshold choice based on these methods has to be adjusted to the data at hand. A conventional choice of, say, 10% is hence more consistent and time effective.
To use a more robust method that does not rely on human judgment, a Monte Carlo simulation procedure was used, in which the threshold is chosen as the one for which the RMSE between the true and the estimated quantile is smallest. The procedure used in this study is thus a Monte Carlo simulation to find a suitable threshold by minimizing the mean squared error, described in the following algorithm:
- Generate n = 1000 samples from a Student's t distribution with 4 degrees of freedom; since the distribution is known, the true tail index and hence the true quantile can easily be calculated.
- For various values of the threshold m, restrict m so that the target quantile lies beyond the threshold.
- For the Hill estimator, calculate the quantile estimate and its MSE and bias by Monte Carlo simulation based on the 1000 independent samples.
- Plot the MSE and bias against m for the 99th percentile and choose the m that gives the smallest MSE and bias.
The procedure is repeated for 30 degrees of freedom as well, a distribution closer to the normal.
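A sketch of how this Monte Carlo procedure could be implemented in MATLAB is shown below. The Hill-based quantile estimator is written in the standard Weissman form, and trnd and tinv require the Statistics Toolbox; details such as the exact threshold grid are illustrative assumptions rather than the exact settings used in this study.

% Monte Carlo choice of the Hill threshold by minimizing RMSE of the 99% quantile
nSim  = 1000;  n = 1000;  nu = 4;  p = 0.01;           % p = tail probability (99% quantile)
qTrue = tinv(1-p, nu);                                 % true quantile of the t(nu) distribution
mGrid = round(n*(0.01:0.01:0.16));                     % thresholds from 1% to 16% of the sample
qHat  = zeros(nSim, numel(mGrid));
for s = 1:nSim
    xs = sort(trnd(nu, n, 1), 'descend');              % one simulated sample, ordered
    for j = 1:numel(mGrid)
        m     = mGrid(j);
        gamma = mean(log(xs(1:m)) - log(xs(m+1)));     % Hill estimate of 1/alpha
        qHat(s,j) = xs(m+1) * (m/(n*p))^gamma;         % Hill-based quantile estimate
    end
end
bias = mean(qHat - qTrue);                             % bias for each threshold
rmse = sqrt(mean((qHat - qTrue).^2));                  % RMSE for each threshold
[~, best] = min(rmse);
fprintf('Smallest RMSE at a threshold of %.1f%% of the sample\n', 100*mGrid(best)/n);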
The procedure generates the following plots:
Figure 37. Plot of RMSE and BIAS as a function of the threshold, based on a t distribution with four degrees of freedom (left) and 30 degrees of freedom (right).
For thresholds between 1% and 7-8% the RMSE seems to be stable, while the bias increases from a threshold of approximately 3%. The threshold that generated the minimal RMSE was 4.4% when four degrees of freedom were used and 5.1% for 30 degrees of freedom. However, if a short sample is used and the threshold is set too low, there will be too few observations in the tail to obtain reliable estimates. Note that the above procedure is based on a predefined distribution, not on the actual data, which is a major drawback; the only possible connection is if the OMXS30 data actually comes from a Student's t distribution with exactly four degrees of freedom. Another observation is that the plot looks much the same for a t distribution with higher degrees of freedom. This method takes a bit more time than the graphical ones but gives a specific threshold, the one that corresponds to the minimal RMSE. However, it has no connection to a specific data set, a return series for example, but is entirely based on a known probability distribution, a Student's t in this case. Therefore, as mentioned in section 4.2, the conventional choice of 10% will be used overall in this study, see section 2.5.1 Defining the tail. Some other choices will be tested, based on the results in this section, to see if and how they affect the results.
Appendix C
Autocorrelation plots
Figure 38. Autocorrelation plot for the residuals from the AR-GARCH fit of the OMXS30 time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 39. Autocorrelation plot for the residuals from the AR-GARCH fit of the FTSE time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 40. Autocorrelation plot for the residuals from the AR-GARCH fit of the GSPC time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 41. Autocorrelation plot for the residuals from the AR-GARCH fit of the N225 time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 42. Autocorrelation plot for the residuals from the AR-GARCH fit of the EUR time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 43. Autocorrelation plot for the residuals from the AR-GARCH fit of the GBP time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Figure 44. Autocorrelation plot for the residuals from the AR-GARCH fit of the USD time series, with normal innovations (left upper), t innovations (left lower), and for the squared residuals, based on normal innovations (right upper), t innovations (right lower).
Appendix D
Figure 45. The Hill predictions of VAR and ES and the actual negative return series of OMXS30 for calibration periods of different length (one to four years, one window per calibration period). The threshold for extreme observations is set to 10% of the sample.
Figure 46. The GPD predictions of VAR and ES and the actual negative return series of OMXS30 for calibration periods of three and four years. The threshold for extreme observations is set to 10% of the sample. Predictions based on one and two years of historical data are excluded since the parameter estimates did not converge for those time horizons, as described in section 6.2.1.
Figure 47. The predictions from conditional Hill with normal innovations of VAR and ES and the actual negative return series of OMXS30 for calibration periods of different length (one to four years, one window per calibration period). The threshold for extreme observations is set to 10% of the sample.
Figure 48. VAR predictions with both t and normal innovations (upper two windows, calibration periods of one and four years) and the corresponding ES predictions (lower two windows). The predictions from historical simulation are included to visualize the flexibility of both the conditional Hill models.