Centre for Risk & Insurance Studies
enhancing the understanding of risk and insurance

CRIS Discussion Paper Series – 2008.IV

Backtesting Stochastic Mortality Models:
An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts

Kevin Dowd,* Andrew J.G. Cairns,# David Blake,♣
Guy D. Coughlan,♦ David Epstein,♦ Marwa Khalaf-Allah♦

September 8, 2008
Abstract
This study sets out a backtesting framework applicable to the multi-period-ahead
forecasts from stochastic mortality models and uses it to evaluate the forecasting
performance of six different stochastic mortality models applied to English & Welsh
male mortality data. The models considered are: Lee-Carter’s 1992 one-factor model;
a version of Renshaw-Haberman’s 2006 extension of the Lee-Carter model to allow
for a cohort effect; the age-period-cohort model of Currie (2006), which is a
simplified version of Renshaw-Haberman; Cairns, Blake and Dowd’s 2006 two-factor
model; and two generalised versions of the latter with an added cohort effect. For the
data set used herein, the results from applying this methodology suggest that the
models perform adequately by most backtests, and that there is little difference
between the performances of five of the models. The remaining model, however,
shows forecast instability. The study also finds that density forecasts that allow for
uncertainty in the parameters of the mortality model are more plausible than forecasts
that do not allow for such uncertainty.
Key words: backtesting, forecasting performance, mortality models
* Centre for Risk & Insurance Studies, Nottingham University Business School, Jubilee Campus, Nottingham, NG8 1BB, United Kingdom. Corresponding author: email at [email protected]. The authors thank Lixia Loh and Liang Zhao for excellent research assistance.
# Maxwell Institute for Mathematical Sciences, and Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh, EH14 4AS, United Kingdom.
♣ Pensions Institute, Cass Business School, 106 Bunhill Row, London, EC1Y 8TZ, United Kingdom.
♦ Pension ALM Group, JPMorgan Chase Bank, 125 London Wall, London EC2Y 5AJ, United Kingdom.
1. Introduction
A recent paper, Cairns et al. (2007), examined the empirical fits of eight different
stochastic mortality models, variously labelled M1 to M8. Seven 1 of these models
were further analysed in a second study, Cairns et al. (2008), which evaluated the
ex-ante plausibility of the models’ probability density forecasts. Amongst other findings,
that study found that one of these models – M8, a version of the Cairns-Blake-Dowd
(CBD) model (Cairns et al., 2006) with an allowance for a cohort effect – generated
implausible forecasts on US data, and consequently this model was also dropped from
further consideration. A third study, Dowd et al. (2008), then examined the ‘goodness
of fit’ of the remaining six models by analysing the statistical properties of their
various residual series. The six models that were examined are: M1, the Lee-Carter
model (Lee and Carter, 1992); M2B, a version of Renshaw and Haberman’s cohort-effect
generalisation of the Lee-Carter model (Renshaw and Haberman, 2006); M3B,
a version of Currie’s age-period-cohort model (Currie, 2006), which is a simplified
version of M2B; 2 M5, the CBD model; and M6 and M7, two alternative cohort-effect
generalisations of the CBD model. Details of these models’ specifications are given in
Appendix A.
It is quite possible for a model to provide a good in-sample fit to historical data and
produce forecasts that appear to be ‘plausible’ ex ante, but still produce poor ex-post
forecasts, that is, forecasts that differ significantly from subsequently realised
outcomes. A ‘good’ model should therefore produce forecasts that perform well out-of-sample when evaluated using appropriate forecast evaluation or backtesting
methods, as well as provide good fits to the historical data and plausible forecasts ex
ante. The primary purpose of the present paper is, accordingly, to set out a backtesting
framework that can be used to evaluate the ex-post forecasting performance of a
mortality model.
1 The model omitted from this second study was the P-splines model of Currie et al. (2004) and Currie (2006). This model was dropped in part because of its poor performance relative to the other models when assessed by the Bayes Information Criterion, and in part because it cannot be used to project stochastic mortality outcomes.
2 M2B and M3B are the versions of M2 and M3 that assume an ARIMA(1,1,0) process for the cohort effect (see Cairns et al., 2008).
A secondary purpose of the paper is to illustrate this backtesting framework by
applying it to the six models listed above. The backtesting framework is applied to
each model over various forecast horizons using a particular data set, namely,
LifeMetrics data for the mortality rates of English & Welsh males 3 for ages varying
from 60 to 89 and spanning the years 1971 to 2006.

3 See Coughlan et al. (2007) and www.lifemetrics.com for the data and a description of LifeMetrics. The original source of the data was the UK Office for National Statistics.
The backtesting of mortality models is still in its infancy. Most studies that assess
mortality forecasts focus on ex-post forecast errors (see, e.g., Keilman (1997, 1998),
National Research Council (2000), or Koissi et al. (2006)). This is limited in so far as
it ignores all information contained in the probability density forecasts, except for the
information reflected in the mean forecast or ‘best estimate’. A more sophisticated
treatment – and a good indicator of the current state of best practice in the evaluation
of mortality models – is provided by Lee and Miller (2001). They evaluated the
performance of the Lee-Carter (1992) model by examining the behaviour of forecast
errors (comparing MAD, RMSE, etc.) and plots of ‘percentile error distributions’,
although they did not report any formal test results based on these latter plots.
However, they also reported plots showing mortality prediction intervals and
subsequently realised mortality rates, and the frequencies with which realised
mortality observations fell within the prediction intervals. These provide a more
formal (i.e., probabilistic) sense of model performance. More recently, CMI (2006)
included backtesting evaluations of the P-spline model and CMI (2007) included
backtesting evaluations of M1 and M2. Their evaluations were based on plots of
realised outcomes against forecasts, where the forecasts included both projections of
central values and projections of prediction intervals, which also give some
probabilistic sense of model performance.
The backtesting framework used in this paper can best be understood if we outline the
following key steps:
1. We begin by selecting the metric of interest, namely the forecasted variable
that is the focus of the backtest. Possible metrics include the mortality rate, life
expectancy, future survival rates, and the prices of annuities and other life-contingent
financial instruments. Different metrics are relevant for different purposes: for
example, in evaluating the effectiveness of a hedge of longevity or mortality risk, the
relevant metric is the monetary value of the underlying exposure. In this paper, we
focus on the mortality rate itself, but, in principle, backtests could be conducted on
any of these other metrics as well.
2. We select the historical ‘lookback’ window which is used to estimate the
parameters of each model for any given year: thus, if we wish to estimate the
parameters for year t and we use a lookback window of length n , then we are
estimating the parameters for year t , using observations from years t − n + 1 to
t . In this paper, we use a fixed-length 4 lookback window of 10 years 5.
3. We select the horizon (i.e., the ‘lookforward’ window) over which we will
make our forecasts, based on the estimated parameters of the model. In the
present study, we focus on relatively long-horizon forecasts, because it is the
accuracy of these forecasts that principally concerns pension plans, and it is
these forecasts that pose the greatest modelling challenges.
4. Given the above, we decide on the backtest to be implemented and specify
what constitutes a ‘pass’ or ‘fail’ result. Note that we use the term ‘backtest’
here to refer to any method of evaluating forecasts against subsequently
realised outcomes. ‘Backtests’ in this sense might involve the use of plots
whose goodness of fit is interpreted informally, as well as formal statistical
tests of predictions generated under the null hypothesis that a model produces
adequate forecasts.
The above framework for backtesting stochastic mortality models is a very general
one; a sketch of the estimate-and-forecast loop it implies is given after the footnotes below.
4 The use of a fixed-length lookback window simplifies the underlying statistics and means that our results are equally responsive to the ‘news’ in any given year’s observations. By contrast, an expanding lookback window is more difficult to handle and also means that, as time goes by, the ‘news’ contained in the most recent observations receives less weight, relative to the expanding number of older observations in the lookback sample.
5 We choose a 10-year lookback window because that is the window emerging as the standard amongst market practitioners. However, we acknowledge that a lookback window of 10 years is not necessarily optimal from a forecasting perspective. The choice of lookback window inevitably involves a tradeoff: a long window is less responsive than a short window to changes in the trend of mortality rates, but is less influenced by random fluctuations in mortality rates than a short window. The choice of lookback window is therefore to some extent subjective.
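To make these steps concrete, the following is a minimal sketch of the rolling estimate-and-forecast loop they describe, under simplifying assumptions: a single age, a hypothetical random-walk-with-drift model for the log mortality rate standing in for M1-M7, and illustrative function and variable names that are not taken from the paper.

```python
# A minimal sketch of the estimate-and-forecast loop in steps 1-4.
# The "model" is a hypothetical stand-in (random walk with drift on the
# log mortality rate for one age), not any of the paper's models M1-M7.
import numpy as np

def backtest_forecasts(log_m, years, lookback=10, n_sims=5000, seed=1):
    """For each stepping-off year, fit on a fixed-length lookback window and
    simulate the mortality rate in the final sample year, returning the
    5th/50th/95th percentiles of the simulated rates."""
    rng = np.random.default_rng(seed)
    results = {}
    final = len(years) - 1
    for i in range(lookback - 1, final):
        window = log_m[i - lookback + 1 : i + 1]      # fixed 10-year window
        steps = np.diff(window)
        drift, vol = steps.mean(), steps.std(ddof=1)  # fit the stand-in model
        horizon = final - i                           # years to the end date
        shocks = rng.normal(drift, vol, size=(n_sims, horizon)).sum(axis=1)
        sims = np.exp(window[-1] + shocks)            # simulated end-date rates
        results[years[i]] = np.percentile(sims, [5, 50, 95])
    return results
```

For a contracting horizon backtest of the kind in section 2, one would plot each stepping-off year's median and 90% bounds against the realised end-date rate.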
Within this broad framework, we implement the following four backtest procedures
for each model, noting that the metric (mortality rate) and lookback window (10
years) are the same in each case:
● Contracting horizon backtests: First, we evaluate the convergence properties
of the forecast mortality rate to the actual mortality rate at a specified future
date. We chose 2006 as the forecast date and we examine forecasts of that
year’s mortality rate, where the forecasts are made at different dates in the
past. The first forecast was made with a model estimated from 10 years of
historical observations up to 1980, the second with the model estimated from
10 years of historical observations up to 1981, and so forth, up to 2006. In
other words, we are examining forecasts with a fixed end date and a
contracting horizon from 26 years ahead down to one year ahead. The backtest
procedure then involves a graphical comparison of the evolution of the
forecasts towards the actual mortality rate for 2006; intuitively, ‘good’
forecasts should converge in a fairly steady manner towards the realised value
for 2006.
● Expanding horizon backtests: Second, we consider the accuracy of forecasts of
mortality rates over expanding horizons from a common fixed start date, or
‘stepping off’ year. For example, the start date might be 1980, and the
forecasts might be for one to 26 years ahead, that is, for 1981 up to 2006.
The backtest procedure again involves a graphical comparison of forecasts
against realised outcomes, and the ‘goodness’ of the forecasts can be assessed
in terms of their closeness to the realised outcomes.
● Rolling fixed-length horizon backtests: Third, we consider the accuracy of
forecasts over fixed-length horizons as the stepping off date moves
sequentially forward through time. This involves examining plots of mortality
prediction intervals for some fixed-length horizon (e.g., 15 years) rolling
forward over time – with subsequently realised outcomes superimposed on
them.
● Mortality probability density forecast tests: Fourth, we carry out formal
hypothesis tests in which we use each model as of one or more start dates to
simulate the forecasted mortality probability density at the end of some
horizon period, and we then compare the realised mortality rate(s) against this
forecasted probability density (or densities). The forecast ‘passes’ the test if
each realised rate lies within the more central region of the forecast probability
density and the forecast ‘fails’ if it lies too far out in either tail to be plausible
given the forecasted probability density. 6
For each of the four classes of test, we examine two types of mortality forecast. The
first of these are forecasts in which it is assumed that the parameters of the mortality
models are known with certainty. The second are forecasts in which we make
allowance for the fact that the parameters of the models are only estimated, i.e., their
‘true’ values are unknown. We would regard the latter as more plausible, and
evidence from other studies suggests that allowing for parameter uncertainty can
make a considerable difference to estimates of quantifiable uncertainty (see, e.g.,
Cairns et al. (2006), Dowd et al. (2006) or Blake et al. (2008)). For convenience we
label these the ‘parameter certain’ (PC) and ‘parameter uncertain’ (PU) cases
respectively. 7 More details of the latter case are given in Appendix B.
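The paper’s PU forecasts use a Bayesian approach (see Appendix B). Purely as a rough illustration of why parameter uncertainty widens prediction intervals, the sketch below bootstraps the fitted parameters of the hypothetical random-walk stand-in model from the earlier sketch; this is a stated substitute for, not a reproduction of, the authors’ Bayesian method.

```python
# A minimal sketch of a 'parameter uncertain' (PU) forecast for the
# hypothetical random-walk-with-drift stand-in model: instead of fixing the
# fitted drift and volatility (the PC case), each simulation path first
# resamples the window's one-year changes (a simple bootstrap) and re-fits
# them, so parameter estimation error widens the prediction interval.
import numpy as np

def pu_forecast(log_m_window, horizon, n_sims=5000, seed=1):
    rng = np.random.default_rng(seed)
    steps = np.diff(log_m_window)
    finals = np.empty(n_sims)
    for s in range(n_sims):
        boot = rng.choice(steps, size=steps.size, replace=True)  # bootstrap
        drift, vol = boot.mean(), boot.std(ddof=1)               # re-fit
        shocks = rng.normal(drift, vol, size=horizon).sum()
        finals[s] = np.exp(log_m_window[-1] + shocks)
    return np.percentile(finals, [5, 50, 95])  # median and 90% interval
```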
This paper is organised as follows. Section 2 considers the contracting horizon
backtests, section 3 considers the expanding horizon backtests, section 4 considers the
rolling fixed-length backtests for an illustrative horizon of 15 years, and section 5
considers the mortality probability density forecast tests. Section 6 concludes.

6 In principle, we could also perform further backtests based on the Probability Integral Transformation (PIT) series that are predicted to be standard uniform under the null of forecast adequacy, which we might do, e.g., by tests of the Kolmogorov-Smirnov family (the Kolmogorov-Smirnov test, Kuiper tests, Lilliefors tests, etc.). We could also put the PIT series through an inverse standard normal distribution function as suggested by Berkowitz (2001) and then test these series for the predictions of standard normality, e.g., a zero mean, a unit standard deviation, a Jarque-Bera test of the skewness and kurtosis predictions of normality, etc. We carried out such tests but do not report them here because the PIT and Berkowitz series are not predicted to be iid in a multi-period-ahead forecasting context (and, more to the point, in many cases iid is clearly rejected), and the absence of iid undermines the validity of standard tests such as the Kolmogorov-Smirnov test. A potential solution is offered by Dowd (2007), who suggests that the Berkowitz series should be a moving average of iid standard normal series whose order is determined by the properties of the year-ahead forecast, but this is only a conjecture. An alternative possibility is to fit some ad hoc process to the Berkowitz series (e.g., an ARMA-type process, as suggested by Dowd, 2008). We have not followed up these suggestions in the present study.
7 To be more precise, the PU versions of the models use a Bayesian approach to take account of the uncertainty in the parameters driving the period and, where applicable, the cohort effects. There is no need to take account of uncertainty in the age effects, because we can rely on the law of large numbers to give us fairly precise estimates of these effects, i.e., the estimation errors in the age effects will be negligible.
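As a side note to footnote 6, the PIT and Berkowitz transformations themselves are straightforward to compute. A minimal sketch follows, assuming a hypothetical PIT series; the paper computed such tests but did not report them, for the iid reasons given in the footnote.

```python
# A minimal sketch of the PIT/Berkowitz transformations mentioned in note 6,
# assuming 'pit' holds PIT values (forecast CDF evaluated at realised
# outcomes). The series shown is hypothetical, for illustration only.
import numpy as np
from scipy.stats import norm, kstest

pit = np.array([0.12, 0.35, 0.48, 0.07, 0.61])  # hypothetical PIT series
print(kstest(pit, "uniform"))        # KS test of standard uniformity
z = norm.ppf(pit)                    # Berkowitz (2001) inverse-normal transform
print(z.mean(), z.std(ddof=1))       # near 0 and 1 under the null
```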
2. Contracting Horizon Backtests: Examining the Convergence of Forecasts
through Time
The first kind of backtest in our framework examines the consistency of forecasts for
a fixed future year (in our example below, the year 2006) made in earlier years. For a
well-behaved model we would expect consecutive forecasts to converge towards the
realised outcome as the date the forecasts are made (the stepping-off date) approaches
the forecast year.
Figures 1, 2 and 3 show plots of forecasts of the 2006 mortality rate for 65-year-old,
75-year-old and 85-year-old males, respectively, made in years 1980, 1981, …, 2006.
The central lines in each plot are the relevant model’s median forecast of the 2006
mortality rate based on 5,000 simulation paths, and the dotted lines on either side are
estimates of the model’s 90% prediction interval or risk bounds (i.e., 5th and 95th
percentiles). The starred point in each chart is the realised mortality rate for that age in
2006.
These plots show the following patterns:
● For models M1, M3B, M5, M6 and M7, the forecasts tend to decline in a stable way over time towards a value close to the realised value, and the prediction intervals narrow over time. 8
● For these same models, the PU prediction intervals are considerably wider than the PC prediction intervals, but there is negligible difference between the PC and PU median plots.
8 It is noteworthy, however, that models M1 and M5 both produce projected mortality rates for ages in the mid-60s that are systematically below the observed mortality rate in 2006. This appears to be due to a significant cohort effect in the data that the other models are picking up, and this would explain why the realised mortality rates for these models in the age-65 charts in Figure 1 are a little out of line with forecasts from the periods running up to 2006.
● For M2B, the forecasts are sometimes unstable, exhibiting major spikes, and
there is also typically less difference between the PC and PU forecasts than is
the case with the other models.
Note that we have only evaluated the convergence of the models for one end date and
one data set. This is insufficient to draw general conclusions about the forecasting
capabilities of the various models. However, the instability observed in M2 (or at least
our variant of M2, M2B) is an issue.
Figure 1: Forecasts of the 2006 Mortality Rate from 1980 Onwards: Males aged 65

[Figure: six panels, one per model (M1, M2B, M3B, M5, M6, M7), plotting forecasts of the age-65 mortality rate for 2006 against stepping-off years 1980-2005, with median forecasts, 90% prediction intervals and the realised 2006 rate.]

Notes: Forecasts based on estimates using English & Welsh male mortality data for ages 60-89 and a rolling 10-year historical window. The stepping-off year is the final year in the rolling window. The fitted model is then used to estimate the median and 90% prediction interval for both parameter-certain forecasts (given by the dashed lines) and parameter-uncertain cases (given by the continuous lines). The realised mortality rate for 2006 is denoted by *. Based on 5,000 simulation trials.
Figure 2: Forecasts of the 2006 Mortality Rate from 1980 Onwards: Males aged 75

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) plotting forecasts of the age-75 mortality rate for 2006 against stepping-off years 1980-2005, with medians, 90% prediction intervals and the realised 2006 rate.]

Notes: As per Figure 1.
Figure 3: Forecasts of the 2006 Mortality Rate from 1980 Onwards: Males aged 85

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) plotting forecasts of the age-85 mortality rate for 2006 against stepping-off years 1980-2005, with medians, 90% prediction intervals and the realised 2006 rate.]

Notes: As per Figure 1.
3. Expanding Horizon Backtests
In the second class of backtest, we consider the accuracy of forecasts over increasing
horizons against the realised outcomes for those horizons. Accuracy is reflected in the
degree of consistency between the outcome and the prediction interval associated with
each forecast.
These backtests are best evaluated graphically using charts of mortality prediction
intervals. Figure 4 shows the mortality prediction intervals for age 65 for forecasts
starting in 1980 (with model parameters estimated using data from the preceding 10
years). The charts show the 90% prediction intervals (or ‘risk bounds’) as dashed lines
for the PC forecasts and continuous lines for the PU forecasts. Roughly speaking, if a
model is adequate, we can be 90% confident of any given outcome occurring between
the dashed risk bounds if we ‘believe’ the PC forecasts, and we can be 90% confident
of any given outcome occurring between the continuous risk bounds if we ‘believe’
the PU forecasts. The prediction intervals all show the same basic shape – they fan out
somewhat over time around a gradually decreasing trend and show a little more
uncertainty on the upper side than on the lower side. But the most striking finding is
that the PU risk bounds are considerably wider than their PC equivalents, indicating
that the PU forecasts are more plausible (in the sense of having higher likelihoods)
than the PC ones.
The charts also show the median mortality forecasts as dotted lines for the
PC forecasts and continuous lines for the PU forecasts, and we see the median
projections falling as the forecast horizon lengthens. As in the earlier Figures, there
are virtually no differences between the PC- and PU-based median projections.
Superimposed on the charts are the realised outcomes indicated by stars. For the most
part, the paths of realised outcomes tend to be below the median projection and move
downwards over time relative to the forecasts. This suggests that, viewed from 1980,
most forecasts are biased upwards (i.e., they under-estimate the downward trend of
future mortality rates) and the bias tends to increase with the length of the forecast
horizon. However, the size and significance of this bias varies considerably across
different models and between PC and PU forecasts.
Figure 4: Mortality Prediction-Interval Charts from 1980: Males aged 65

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-65 mortality rate over 1980-2006, with realised rates superimposed; each panel reports the exceedance quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: Forecasts based on estimates using English & Welsh male mortality data for ages 60-89 and years 1971-1980. The dashed lines refer to the forecast medians and bounds of the 90% prediction interval for the parameter-certain (PC) forecasts and continuous lines are their equivalents for the parameter-uncertain (PU) forecasts. The realised mortality rates are denoted by *. For each of these cases, xL and xM are the numbers of realised rates below the lower 5% and 50% prediction bounds, xU is the number of realised mortality rates above the upper 5% bound and n is the number of forecasts including that for the starting point of the forecasts. Based on 5,000 simulation trials.
Figure 4 also shows quadruplets of the form [xL, xM, xU, n], where n is the number
of mortality forecasts, xL is the number of lower exceedances, or realised outcomes
out of n falling below the lower risk bound, xM is the number of observations falling
below the projected median, and xU is the number of upper exceedances, or
observations falling above the upper risk bound. 9

9 Note though that the positions of the individual observations within the prediction intervals are not independent. If, for example, the 1990 observation is ‘low’, then the 1995 observation is also likely to be ‘low’.

These statistics provide useful indicators of the adequacy of model forecasts. If the
forecasts are adequate, for any given PC or PU case, we would expect xL and xU to
be about 5% of n and we would expect xM to be about 50% of n. Too many
exceedances of a given type would suggest that the relevant bound was incorrectly
forecasted: for example, too many realisations above the upper risk bound would
suggest that the upper risk bound is too low; too many observations below the median
bound would suggest that the forecasts are biased upwards; and too many
observations below the lower risk bound would suggest that the lower risk bound is
too high.
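The quadruplet itself is simple to compute. Below is a minimal sketch, assuming 'realised', 'lower', 'median' and 'upper' are equal-length arrays of the realised rates and the forecast 5%, 50% and 95% bounds for each year; these names are illustrative, not the paper's.

```python
# A minimal sketch of the exceedance quadruplet [xL, xM, xU, n].
import numpy as np

def exceedance_quadruplet(realised, lower, median, upper):
    realised = np.asarray(realised)
    xL = int((realised < np.asarray(lower)).sum())   # below lower 5% bound
    xM = int((realised < np.asarray(median)).sum())  # below projected median
    xU = int((realised > np.asarray(upper)).sum())   # above upper 5% bound
    return xL, xM, xU, realised.size

# Under forecast adequacy we expect xL/n and xU/n near 5% and xM/n near 50%.
```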
As a robustness check, Figure 5 shows the comparable chart for 65-year-old males
starting from year 1990 (with model parameters estimated using data for years
1981-1990), with horizons now going out 16 years ahead to 2006. Figures 6-9 show the
corresponding results for ages 75 and 85 for both PC and PU forecasts.
These Figures suggest the following:
• The models forecast decreases in median mortality rates, forecast risk bounds that fan out over the forecast horizon, and in most cases considered show a bias that increases with the forecast horizon.
• The PU forecasts perform better than the PC forecasts.
• Forecast performance tends to improve with higher ages.
• For the sample periods considered, the forecasts are fairly robust to the choice of sample period.
Figure 5: Mortality Prediction-Interval Charts from 1990: Males aged 65

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-65 mortality rate over 1990-2006, with realised rates and per-panel quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: As per Figure 4, but using data over years 1981-1990.
Figure 6: Mortality Prediction-Interval Charts from 1980: Males aged 75

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-75 mortality rate over 1980-2006, with realised rates and per-panel quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: As per Figure 4.
Figure 7: Mortality Prediction-Interval Charts from 1990: Males aged 75

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-75 mortality rate over 1990-2006, with realised rates and per-panel quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: As per Figure 5.
Figure 8: Mortality Prediction-Interval Charts from 1980: Males aged 85

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-85 mortality rate over 1980-2006, with realised rates and per-panel quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: As per Figure 4.
Figure 9: Mortality Prediction-Interval Charts from 1990: Males aged 85

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing median forecasts and 90% prediction intervals of the age-85 mortality rate over 1990-2006, with realised rates and per-panel quadruplets [xL, xM, xU, n] for the PU and PC forecasts.]

Notes: As per Figure 5.
It is also useful to investigate further the performance of the different models across
PC and PU forecasts. To help do so, we can examine Table 1, which summarises the
models’ average scores (i.e., xL, xM and xU) from Figures 4-9, where the averages
are taken for each model across all these Figures. The first six rows give the average
percentages of outcomes that are exceedances (i.e., xL/n, xM/n and xU/n). The next
six rows give the corresponding excess exceedances in absolute value form – that is,
|xL/n − 0.05|, |xM/n − 0.5| and |xU/n − 0.05| – and these are expected to be ‘close’ to
zero under the null hypothesis: they show how the actual exceedances compare against
expectations under the null. The final six rows rank the models by how closely the
actual exceedances match these expectations (a sketch of this scoring is given after
the Table).
Table 1: Exceedances Results for Expanding Horizon Backtests

(a) Exceedances
         Parameters Certain                Parameters Uncertain
Model    xL/n      xM/n      xU/n          xL/n      xM/n      xU/n
M1       27.3%     81.8%     1.5%          3.0%      81.8%     1.5%
M2B      33.3%     64.4%     1.5%          15.9%     65.9%     1.5%
M3B      24.2%     81.8%     2.3%          2.3%      81.8%     2.3%
M5       36.4%     81.8%     1.5%          13.6%     81.8%     1.5%
M6       30.3%     76.5%     2.3%          11.4%     76.5%     2.3%
M7       27.3%     78.0%     2.3%          3.0%      78.0%     2.3%

(b) Absolute values of excess exceedances
         Parameters Certain                        Parameters Uncertain
Model    |xL/n−0.05| |xM/n−0.5| |xU/n−0.05|        |xL/n−0.05| |xM/n−0.5| |xU/n−0.05|
M1       22.3%       31.8%      3.5%               2.0%        31.8%      3.5%
M2B      28.3%       14.4%      3.5%               10.9%       15.9%      3.5%
M3B      19.2%       31.8%      2.7%               2.7%        31.8%      2.7%
M5       31.4%       31.8%      3.5%               8.6%        31.8%      3.5%
M6       25.3%       26.5%      2.7%               6.4%        26.5%      2.7%
M7       22.3%       28.0%      2.7%               2.0%        28.0%      2.7%

(c) Ranking by absolute values of excess exceedances
         Parameters Certain    Parameters Uncertain
Model    xL      xM            xL      xM
M1       =2      =4            =1      =4
M2B      5       1             6       1
M3B      1       =4            3       =4
M5       6       =4            5       =4
M6       4       2             4       2
M7       =2      3             =1      3

Notes: Based on the exceedance results in Figures 4-9. xL, xM and xU are the numbers of observations below the lower risk bound, below the median bound and above the upper risk bound respectively, and n is the number of forecasts.
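The scoring behind panels (b) and (c) of Table 1 can be sketched as follows, assuming a hypothetical input mapping each model to its pooled (xL, xM, xU, n) counts; the layout and names are illustrative, not the authors’ code.

```python
# A minimal sketch of Table 1's scoring: absolute excess exceedance rates
# and an ordering of the models by closeness to expectations.
import numpy as np

EXPECTED = {"xL": 0.05, "xM": 0.50, "xU": 0.05}  # rates under the null

def score_and_order(counts):
    """counts maps model -> (xL, xM, xU, n) pooled over the Figures."""
    scores = {m: {k: abs(v / n - EXPECTED[k])
                  for k, v in zip(("xL", "xM", "xU"), (xL, xM, xU))}
              for m, (xL, xM, xU, n) in counts.items()}
    # order models from closest to farthest for each statistic (ties not marked)
    order = {k: sorted(scores, key=lambda m: scores[m][k]) for k in EXPECTED}
    return scores, order
```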
The main points to note are:
• Lower exceedances: The PC models in general have far too many lower exceedances. By contrast, the corresponding PU lower exceedances are much closer to expectations, especially so for models M1 and M7 (which at 2% rank equal first by this criterion) and M3B, which follows close behind at 3%. These findings confirm our claim that the PU forecasts are more plausible than the PC forecasts.
• Median exceedances: There are negligible differences between the PC and PU median exceedance results, and the proportions of median exceedances are much higher than expected. This confirms that there is, in general, a tendency for the forecasts to be biased upwards. The top performing models by this criterion are M2B (at 15.9% for the PU case), M6 (at 26.5%) and M7 (at 28%). The remaining models all score a little higher at 31.8%.
• Upper exceedances: Finally, there are very few upper exceedances – in every case there is either zero or one upper exceedance – so the upper-exceedance results are close to expectations and there are no notable differences across the models or across PC and PU forecasts. Accordingly, there is no point trying to rank the models’ upper-exceedances performance.
4. Rolling Fixed-Length Horizon Backtests
In the third class of backtests, we consider the accuracy of forecasts over fixed
horizon periods as these roll forward through time. Once again, accuracy is reflected
in the degree of consistency between the collective set of realised outcomes and the
prediction intervals associated with the forecasts. We examine the case of an
illustrative 15-year forecast horizon and, from this point on, we restrict ourselves to
backtesting the performance of the PU forecasts only. Figures 10-15 show the rolling
prediction intervals for each of the six models in turn. Each plot shows the rolling
prediction intervals and realised mortality outcomes for ages 65, 75 and 85 over the
years 1995 to 2006, where the forecasts are obtained using data from 15 years before.
Figure 10 gives the prediction intervals for model M1, Figure 11 gives the rolling
prediction intervals for M2B, and so forth. To facilitate comparison, the y-axes in all
the charts have the same range running from 0.01 to 0.2 on a logarithmic scale.
The most striking feature of these charts is the instability of the forecasts of M2B in
Figure 11. As with Figures 1-3, this model generates occasional sharp and seemingly
implausible ‘spikes’ in its forecasts. The other charts indicate that there is not much
to choose between the remaining five models. These models all show periods in
which the realised outcomes lie within the prediction intervals for all ages, but
especially for higher ages (i.e., age 85). They also show periods in which the realised
outcomes have drifted below the prediction intervals, particularly for age 65.
Figure 10: Rolling 15-Year Ahead Prediction Intervals: M1

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [1, 10, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 11, 0, 12]; Age 65: [xL, xM, xU, n] = [1, 12, 0, 12].]

Notes: Model estimates based on English & Welsh male mortality data for ages 60-89 and a rolling 10-year historical window starting from 1971-1980. The continuous, short-dashed and long-dashed lines are the parameter-uncertain (PU) forecast medians and bounds of the 90% prediction interval for ages 65, 75 and 85, respectively. The realised mortality rates are denoted by *. For each age, xL and xM are the numbers of realised rates below the lower 5% and 50% prediction bounds, xU is the number of realised mortality rates above the upper 5% bound and n is the number of forecasts including that for the starting point of the forecasts. Based on 5,000 simulation trials.
Figure 11: Rolling 15-Year Ahead Prediction Intervals: M2B

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [1, 5, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]; Age 65: [xL, xM, xU, n] = [8, 12, 0, 12].]

Notes: As per Figure 10.
Figure 12: Rolling 15-Year Ahead Prediction Intervals: M3B

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]; Age 65: [xL, xM, xU, n] = [2, 12, 0, 12].]

Notes: As per Figure 10.
Figure 13: Rolling 15-Year Ahead Prediction Intervals: M5

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]; Age 65: [xL, xM, xU, n] = [9, 12, 0, 12].]

Notes: As per Figure 10.
Figure 14: Rolling 15-Year Ahead Prediction Intervals: M6

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [0, 4, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]; Age 65: [xL, xM, xU, n] = [10, 12, 0, 12].]

Notes: As per Figure 10.
Figure 15: Rolling 15-Year Ahead Prediction Intervals: M7

[Figure: rolling 15-year-ahead prediction intervals and realised mortality rates for ages 65, 75 and 85, 1995-2006. Age 85: [xL, xM, xU, n] = [0, 8, 0, 12]; Age 75: [xL, xM, xU, n] = [0, 12, 0, 12]; Age 65: [xL, xM, xU, n] = [4, 12, 0, 12].]

Notes: As per Figure 10.
Table 2 presents a summary of the exceedance results reported in Figures 10-15. The
Table leaves out upper exceedances because none were observed in any of these
Figures. We find that:
• Lower exceedances: All models exhibit excess lower exceedances, but the degree of excess is very low (0.6%) for M1 and M3B, 6.1% for M7, and a little over 20% for the remaining models.
• Median exceedances: All models show high excess median exceedances. The best performing models by this criterion are M6 (27.8%) and M2B (30.6%). M3B, M5 and M7 then come in joint third at 38.9%, and M1 comes in last at 41.7%.
Table 2: Exceedances Results for Rolling Fixed-Horizon Backtests: Parameters Uncertain

(a) Exceedances
Model    xL/n      xM/n
M1       5.6%      91.7%
M2B      25.0%     80.6%
M3B      5.6%      88.9%
M5       25.0%     88.9%
M6       27.8%     77.8%
M7       11.1%     88.9%

(b) Excess exceedances
Model    xL/n − 0.05    xM/n − 0.5
M1       0.6%           41.7%
M2B      20.0%          30.6%
M3B      0.6%           38.9%
M5       20.0%          38.9%
M6       22.8%          27.8%
M7       6.1%           38.9%

(c) Ranking by excess exceedances
Model    xL      xM
M1       =1      6
M2B      =4      2
M3B      =1      =3
M5       =4      =3
M6       6       1
M7       3       =3

Notes: Based on the exceedance results in Figures 10-15. xL and xM are the numbers of observations below the lower risk bound and below the median bound, and n is the number of forecasts.
5. Mortality Probability Density Forecast Backtests: Backtests based on
Statistical Hypothesis Tests
The fourth and final class of backtests involves formal hypothesis tests based on
comparisons of realised outcomes against forecasts of the relevant probability
densities. 10
To elaborate, suppose we wish to use data up to and including 1980 to evaluate the
forecasted probability density function of the mortality rate of, say, 65-year-olds in
2006, involving a forecast horizon of 26 years ahead. A simple way to implement
such a test is to use each model to forecast the cumulative distribution function (CDF)
for the mortality rate in 2006, as of 1980, and to compare the realised mortality rate
for 2006 with its forecasted distribution.

10 Since we are testing forecasts of future mortality rates, we can also regard the tests considered here as similar to tests of q-forward forecasts using the different models; q-forwards are mortality rate forward contracts (see Coughlan et al., 2007).
To carry out this backtest for any given model, we use the data from 1971 to 1980 to
make a forecast of the CDF of the mortality rate of 65-year-olds in 2006. 11 The curves
shown in Figure 16 are the CDFs for each of the six models. The null hypothesis is
that the realised mortality rate for 65-year-old males in 2006 is consistent with the
forecasted CDF, so we superimpose on these CDF curves the realised mortality rate
for this age group in 2006. We then determine the p-value associated with the null
hypothesis, which is obtained by taking the value of the CDF where the vertical line
drawn up from the realised mortality rate on the x-axis crosses the CDF curve. A
model passes the test if the p-value is large enough 12 (i.e., if the realised rate is not
too far out into the tails of the forecasted distribution) and otherwise fails it.
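In simulation terms, this p-value is simply the forecast’s empirical CDF evaluated at the realised rate. A minimal sketch follows; the array names and the example figures in the comments are illustrative assumptions, not the paper’s code.

```python
# A minimal sketch of the CDF-based p-value test, assuming 'sims' holds
# simulated end-year mortality rates from a fitted model and 'realised' is
# the realised rate. The p-value is the empirical forecast CDF evaluated
# at the realised rate (a one-sided test for left-tail outcomes).
import numpy as np

def density_forecast_p_value(sims, realised):
    sims = np.sort(np.asarray(sims))
    return np.searchsorted(sims, realised, side="right") / sims.size

# e.g. (hypothetical simulated sample):
# p = density_forecast_p_value(np.random.default_rng(0).lognormal(-4.2, 0.15, 5000), 0.0149)
# 'fail' at the 5% level if p < 0.05 (left tail)
```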
For the backtests illustrated in Figure 16, we get estimated p-values of 15.9%, 2.1%,
7.4%, 4.9%, 5.2% and 16.5% for models M1, M2B, M3B, M5, M6 and M7,
respectively. 13 These results tell us that the null hypothesis that M1 generates
adequate forecasts over a horizon of 26 years is associated with a probability of
15.9%, and so would ‘pass’ a test based on a conventional significance level of 5%.
The corresponding p-values for M2B, M3B, M5, M6 and M7 are 2.1%, 7.4%, 4.9%,
5.2% and 16.5%, so by these test results the forecasts of M2B and M5 ‘fail’ at the 5%
but ‘pass’ at the 1% significance level, whereas those of the other models ‘pass’ at
both significance levels. 14
11 The reader will recall that we are now dealing with PU forecasts only.
12 This statement does however assume that we are dealing with CDF-values on the left-hand side of the Figure. Strictly speaking, the forecasts would also fail if the realised mortality rate were associated with a CDF value in the upper tail of the forecasted distribution. This raises the issue of 1-sided versus 2-sided tests, which is dealt with further in note 14 below.
13 We should be careful to resist the temptation to use the estimated p-values to rank models. The p-values from different models are not comparable because the alternative hypotheses are not comparable across the models. We can therefore only use the p-values to test each model on a stand-alone basis.
14 Where the realised values fall into the left-hand tails, the reported p-values are those associated with 1-sided tests of the null. Of course, we may wish to work with 2-sided tests instead, and we should in principle also take account of the possibility that we might get ‘fail’ results because realised mortality outcomes fall into the extreme right-hand side of the forecasted density. These extensions are fairly obvious and there is no need to dwell on them further here.
Figure 16: Bootstrapped P-Values of Realised Mortality Outcomes: Males aged 65, 1980 Start, Horizon = 26 Years Ahead

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing each model’s forecast CDF of the 2006 mortality rate under the null, with the realised rate q = 0.0149 marked; the associated p-values are 0.159, 0.021, 0.074, 0.049, 0.052 and 0.165 respectively.]

Note: Forecasts based on models estimated using English & Welsh male mortality data for ages 60-89 and years 1971-1980 assuming parameter uncertainty. Each figure shows the bootstrapped forecast CDF of the mortality rate in 2006, where the forecast is based on the density forecast made ‘as if’ in 1980. Each black vertical line gives the realised mortality rate in 2006 and its associated p-value in terms of the forecast CDF. Based on 5,000 simulation trials.
Now suppose we are interested in carrying out a test of the models’ 25-year-ahead
mortality density forecasts. There are now two cases to consider: we can start in 1980
and carry out the test for a forecast of the 2005 mortality density, or we can start in
1981 and carry out the test for a forecast of the 2006 mortality density. 15 These are
illustrated in Figures 17 and 18 below. Pulling the results from these three Figures
together, models M1, M3B and M7 always ‘pass’ tests at conventional significance
levels, and models M2B, M5 and M6 ‘fail’ at least once at the 5% significance level.
15 In principle, both these tests should give much the same answer under the null hypothesis and there is nothing to choose between them other than the somewhat extraneous consideration (for present purposes) that the latter test uses slightly later data.
Figure 17: Bootstrapped P-Values of Realised Mortality Outcomes: Males aged 65, 1980 Start, Horizon = 25 Years Ahead

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing each model’s forecast CDF of the 2005 mortality rate under the null, with the realised rate q = 0.0149 marked; the associated p-values are 0.1519, 0.020, 0.077, 0.054, 0.045 and 0.164 respectively.]

Note: Forecasts based on models estimated using English & Welsh male mortality data for ages 60-89 and years 1971-1980 assuming parameter uncertainty. Each figure shows the bootstrapped forecast CDF of the mortality rate in 2005, where the forecast is based on the density forecast made ‘as if’ in 1980. Each black vertical line gives the realised mortality rate in 2005 and its associated p-value in terms of the forecast CDF. Based on 5,000 simulation trials.
Figure 18: Bootstrapped P-Values of Realised Mortality Outcomes: Males aged 65, 1981 Start, Horizon = 25 Years Ahead

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) showing each model’s forecast CDF of the 2006 mortality rate under the null, with the realised rate q = 0.0149 marked; the six panels report p-values of 0.206, 0.111, 0.066, 0.105, 0.075 and 0.287, all above conventional significance levels.]

Note: Forecasts based on estimates using English & Welsh male mortality data for ages 60-89 and years 1972-1981 assuming parameter uncertainty. Each figure shows the bootstrapped forecast CDF of the mortality rate in 2006, where the forecast is based on the density forecast made ‘as if’ in 1981. Each black vertical line gives the realised mortality rate in 2006 and its associated p-value in terms of the forecast CDF. Based on 5,000 simulation trials.
Along the same lines, if we wish to test the models’ 24-year-ahead density forecasts,
we have three choices: start in 1980 and forecast the density for year 2004, start in
1981 and forecast the density for year 2005, and start in 1982 and forecast the density
for year 2006. 16
We can carry on in the same way for forecasts 23 years ahead, 22 years ahead, and so
on, down to forecasts 1 year ahead. By the time we get to 1-year-ahead forecasts, we
would have 26 different possibilities (i.e., start in 1980, start in 1981, etc., up to start
in 2005).
16 Again, in principle each of these tests should yield similar results under the null.
So, for any given model, we can construct a sample of 26 different estimates of the
p-value of the null associated with 1-year-ahead forecasts, a sample of 25 different
estimates of the p-value of the null associated with 2-year-ahead forecasts, and so
forth.
We can then construct plots of estimated p-values for any given horizon or set of
horizons. Some examples are given in Figure 19, which shows plots of the p-values
for 65-year-olds associated with forecast horizons of 5, 10 and 15 years. The p-values
are fairly variable, but our best estimate of each p-value is the mean: our best estimate
of the p-value associated with 5-year-ahead forecasts is the mean of the 22 sample
‘observations’ of the estimated 5-year-ahead p-value, our best estimate of the p-value
associated with 10-year-ahead forecasts is the mean of the 17 sample ‘observations’
of the estimated 10-year-ahead p-values, and so forth, and the Figure also reports
these mean estimates. 17
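A minimal sketch of this construction follows, assuming a hypothetical simulate_to(start_year, h) helper that returns simulated rates for year start_year + h; the names are illustrative, not the paper’s implementation.

```python
# A minimal sketch of collecting p-value estimates for a fixed horizon h
# across all feasible start years and taking their mean, the 'best
# estimate' reported in Figures 19-21.
import numpy as np

def mean_p_value(h, start_years, realised_by_year, simulate_to):
    ps = []
    for y in start_years:
        if y + h not in realised_by_year:
            continue                                  # no realised outcome yet
        sims = np.sort(simulate_to(y, h))             # simulated rates for y + h
        p = np.searchsorted(sims, realised_by_year[y + h],
                            side="right") / sims.size # empirical CDF p-value
        ps.append(p)
    return ps, float(np.mean(ps))                     # estimates and their mean
```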
Figures 20 and 21 give the corresponding results for 75-year-olds and 85-year-olds. In
most cases considered, the average p-values decline as the forecast horizon lengthens,
indicating that forecast performance in these cases usually deteriorates as the horizon
lengthens. For ages 75 and 85, we find that no model ‘fails’ even at the 5%
significance level; for age 65, however, we find three ‘failures’ (out of a possible 18)
at the 5% level: M2B, M5 and M6 ‘fail’ for h = 15. Thus, out of a total of 3 × 18 = 54
possible ‘failures’, we find none at the 1% significance level and only three at the
5% significance level. 18
17 We also repeat our earlier warning from section 3 just before Figure 4: the outcomes are not independent, so we cannot treat alternative estimates of the p-values for any given model and horizon h as independent. However, this does not invalidate our claim that the ‘best’ estimate of any of these p-values is the mean, as reported in Figures 19-21.
18 With so few ‘fails’ to report there is little point in summarising the results in a Table as we did with the results from the previous two sections.
Figure 19: Long-Horizon P-Values of Realised Future Mortality Rates: Males aged 65

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) plotting estimated p-values for males aged 65 against starting years 1985-2005 for horizons of h = 5, 10 and 15 years; each panel also reports the average p-value for each horizon.]

Notes: Forecasts based on estimates using English & Welsh male mortality data for ages 60-89 and a rolling 10-year window assuming parameter uncertainty. The three lines in each panel refer to p-values over horizons of h = 5, 10 and 15 years respectively. Based on 5,000 simulation trials.
Figure 20: Long-Horizon P-Values of Realised Future Mortality Rates: Males aged 75

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) plotting estimated p-values for males aged 75 against starting years 1985-2005 for horizons of h = 5, 10 and 15 years; each panel also reports the average p-value for each horizon.]

Notes: As per Figure 19.
Figure 21: Long-Horizon P-Values of Realised Future Mortality Rates: Males aged 85

[Figure: six panels (M1, M2B, M3B, M5, M6, M7) plotting estimated p-values for males aged 85 against starting years 1985-2005 for horizons of h = 5, 10 and 15 years; each panel also reports the average p-value for each horizon.]

Notes: As per Figure 19.
6. Conclusions
The purposes of this paper are: (i) to set out a backtesting framework that can be used
to evaluate the ex-post forecasting performance of stochastic mortality models; and
(ii) to illustrate this framework by evaluating the forecasting performance of a number
of mortality models calibrated on a particular data set.
The backtesting framework presented here is based on the idea that forecast
distributions should be compared against subsequently realised mortality outcomes: if
the realised outcomes are compatible with their forecasted distributions, then this
would suggest that the forecasts and the models that generated them are good ones;
and if the forecast distributions and realised outcomes are incompatible, this would
suggest that the forecasts and models are poor. We discussed four different classes of
backtest building on this general idea: (i) backtests based on the convergence of
forecasts through time towards the mortality rate(s) in a given year; (ii) backtests
based on the accuracy of forecasts over multiple horizons; (iii) backtests based on the
accuracy of forecasts over rolling fixed-length horizons; and (iv) backtests based on
formal hypothesis tests that involve comparisons of realised outcomes against
forecasts of the relevant densities over specified horizons.
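To make the class-(iv) backtest concrete, here is a minimal sketch, in Python, of how a simulated p-value for a single realised outcome can be computed against an empirical forecast density; the function name, the toy normal forecast density and all numerical values are illustrative assumptions rather than the paper's actual implementation.

```python
import numpy as np

def simulated_p_value(realised, simulated, two_sided=True):
    """P-value of a realised outcome under an empirical forecast density.

    `simulated` is a 1-D array of simulated outcomes for the target year.
    """
    p = np.mean(simulated <= realised)  # empirical CDF evaluated at the outcome
    return 2 * min(p, 1 - p) if two_sided else p

rng = np.random.default_rng(42)
# Hypothetical 5,000-trial forecast of log m(t, 65) and a realised value.
forecast_trials = rng.normal(loc=-3.9, scale=0.05, size=5000)
print(simulated_p_value(-3.82, forecast_trials))
```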
As far as the individual models are concerned, we find that models M1, M3B, M5,
M6 and M7 perform well most of the time and there is relatively little to choose
between these models. Of the Lee-Carter class of models examined here, it is difficult
to choose between M1 and M3B; of the CBD models (i.e., between M5, M6 and M7),
however, we would suggest that the evidence presented here points to M7 being the
best performer on this dataset, and to it being difficult to choose between M5 and M6.
Model M2B repeatedly shows evidence of considerable instability. However, we
should make clear that we have examined a particular version of M2 in this study and
so cannot rule out the possibility that other specifications of, or extensions to, M2
might resolve the stability problem identified both here and elsewhere (see, e.g.,
Cairns et al. (2008) and Dowd et al. (2008)).
We would emphasise that these results are obtained using one particular data set – that
is, data for English & Welsh males – over limited sample periods. Accordingly, we
make no claim for how these models might perform over other data sets or sample
periods.
We would also make two other comments of a more general nature. If one looks over
Figures 4-15, it is clear that, in most but not all cases, and in varying degrees
depending on the model and whether we take account of parameter uncertainty or not:
• There are fewer upper exceedances than predicted under the null;
• There are usually more median exceedances – realised outcomes below the median – than predicted; and
• There are usually many more lower exceedances than predicted, especially for the parameter-certain forecasts.
These findings suggest that all the bounds – especially the PC lower bounds – are
generally too high. In short, in most cases, the forecasted bounds are biased upwards.
Needless to say, this is useful information to potential users.
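As an illustration of the exceedance counts behind these observations, the following sketch counts lower, median and upper exceedances of realised outcomes against simulated forecast bounds; the arrays, the 90% interval and the function name are hypothetical.

```python
import numpy as np

def count_exceedances(realised, trials, lower_q=0.05, upper_q=0.95):
    """Count lower/median/upper exceedances of realised outcomes.

    `trials` has shape (n_years, n_simulations): one forecast density per year.
    """
    lower = np.quantile(trials, lower_q, axis=1)
    median = np.quantile(trials, 0.5, axis=1)
    upper = np.quantile(trials, upper_q, axis=1)
    return {
        "lower": int(np.sum(realised < lower)),    # below the lower bound
        "median": int(np.sum(realised < median)),  # below the median
        "upper": int(np.sum(realised > upper)),    # above the upper bound
    }
```

Under the null, the lower and upper counts should each be close to 5% of the years covered, and the median count close to 50%.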
Finally, the results presented strongly suggest that models that take account of
parameter uncertainty generate more plausible and better performing forecasts than
models that do not take account of parameter uncertainty. We obtained this finding
repeatedly and for every model considered, and the fact that we did so leads us to
expect similar findings for other models and datasets. Thus, regardless of the
particular model one might use, our results highlight the importance of taking
account of parameter uncertainty if one is to make plausible forecasts of future
mortality.
References
Berkowitz, J. (2001) “Testing density forecasts, with applications to risk
management.” Journal of Business and Economic Statistics 19: 465-474.
Blake, D., K. Dowd, and A. J. G. Cairns (2008) “Longevity risk and the grim reaper’s
toxic tail: The survivor fan charts.” Insurance: Mathematics and Economics 42: 1062-1068.
Cairns, A. J. G. (2000) “A discussion of parameter and model uncertainty in
insurance.” Insurance: Mathematics and Economics 27: 313-330.
Cairns, A. J. G., D. Blake and K. Dowd (2006) “A two-factor model for stochastic
mortality with parameter uncertainty: Theory and calibration.” Journal of Risk and
Insurance 73: 687-718.
Cairns, A. J. G., D. Blake, K. Dowd, G. D. Coughlan, D. Epstein, A. Ong, and I.
Balevich (2007) “A quantitative comparison of stochastic mortality models using data
from England & Wales and the United States.” Pensions Institute Discussion Paper
PI-0701.
Cairns, A. J. G., D. Blake, K. Dowd, G. D. Coughlan, D. Epstein, and M. Khalaf-Allah
(2008) “Mortality density forecasts: An analysis of six stochastic mortality
models.” Pensions Institute Discussion Paper PI-0801.
Continuous Mortality Investigation (2006) “Stochastic projection methodologies:
Further progress and P-Spline model features, example results and implications.”
Working Paper 20.
Continuous Mortality Investigation (2007) “Stochastic projection methodologies: Lee-Carter model features, example results and implications.” Working Paper 25.
Coughlan, G., D. Epstein, A. Sinha, and P. Honig (2007) “q-Forwards: Derivatives for
Transferring Longevity and Mortality Risks.” Available at www.jpmorgan.com/lifemetrics.
Currie, I. D., M. Durban, and P. H. C. Eilers (2004) “Smoothing and forecasting
mortality rates.” Statistical Modelling, 4: 279-298.
Currie, I. D. (2006) “Smoothing and forecasting mortality rates with P-splines.” Talk
given at the Institute of Actuaries, June 2006. See
http://www.ma.hw.ac.uk/~iain/research/talks.html.
Diebold, F. X., T. A. Gunther, and A. S. Tay (1998) “Evaluating density forecasts
with applications to financial risk management.” International Economic Review 39:
863-883.
Dowd, K. (2007) “Backtesting risk models within a standard normality framework.”
Journal of Risk, 9: 93-111.
Dowd, K. (2008) “The GDP fan charts: An empirical evaluation.” National Institute
Economic Review, 203(1): 59-67.
Dowd, K., A. J. G. Cairns and D. Blake (2006) “Mortality-dependent measures of
financial risk.” Insurance: Mathematics and Economics, 38: 427-440.
Dowd, K., A. J. G. Cairns, D. Blake, G. D. Coughlan, D. Epstein, and M. Khalaf-Allah
(2008) “Evaluating the goodness of fit of stochastic mortality models.”
Pensions Institute Discussion Paper PI-0802.
Embrechts, P., C. Klüppelberg, and T. Mikosch (1997) Modelling Extreme Events for
Insurance and Finance. Berlin: Springer Verlag.
Keilman, N. (1997) “Ex-post errors in official population forecasts in industrialized
countries.” Journal of Official Statistics (Statistics Sweden) 13: 245-247.
Keilman, N. (1998) “How accurate are the United Nations world population
projections?” Population and Development Review 24: 15-41.
Koissi, M.-C., A. Shapiro, and G. Högnäs (2006) “Evaluating and extending the Lee-Carter
model for mortality forecasting: Bootstrap confidence interval.” Insurance:
Mathematics and Economics 38: 1-20.
Lee, R. D., and L. R. Carter (1992) “Modeling and forecasting U.S. mortality.”
Journal of the American Statistical Association 87: 659-675.
Lee, R. D., and T. Miller (2001) “Evaluating the performance of the Lee-Carter
method for forecasting mortality.” Demography 38: 537-549.
Longstaff, F. A., and E. S. Schwartz (1992) “Interest rate volatility and the term
structure: A two-factor general equilibrium model.” Journal of Finance 47: 1259-1282.
National Research Council (2000) Beyond Six Billion: Forecasting the World’s
Population, edited by J. Bongaarts and R. A. Bulatao. Washington, DC: National
Academy Press.
Renshaw, A. E., and S. Haberman (2006) “A cohort-based extension to the Lee-Carter
model for mortality reduction factors.” Insurance: Mathematics and Economics 38:
556-570.
Vasicek, O. (1977) “An equilibrium characterization of the term structure.” Journal of
Financial Economics 5: 177-188.
Wilmoth, J.R. (1998) “Is the pace of Japanese mortality decline converging toward
international trends?” Population and Development Review 24:593-600.
Appendix A: The Stochastic Mortality Models Considered in this
Study
Model M1
Model M1, the Lee-Carter model, postulates that the true underlying death rate, \(m(t,x) = -\log(1 - q(t,x))\), satisfies:

\[\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)} \tag{A1}\]

where the state variable \(\kappa_t^{(2)}\) follows a one-dimensional random walk with drift:

\[\kappa_t^{(2)} = \kappa_{t-1}^{(2)} + \mu + C Z_t^{(2)} \tag{A2}\]

in which \(\mu\) is a constant drift term, \(C\) a constant volatility and \(Z_t^{(2)}\) a one-dimensional iid N(0,1) error. This model also satisfies the identifiability constraints:

\[\sum_t \kappa_t^{(2)} = 0 \quad\text{and}\quad \sum_x \beta_x^{(2)} = 1 \tag{A3}\]
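As a rough illustration of how forecasts are generated from (A1)-(A2), the sketch below simulates future paths of \(\kappa_t^{(2)}\) and maps them into death rates for a single age; all parameter values are placeholders, not fitted estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, C = -0.5, 0.6            # placeholder drift and volatility of kappa
kappa_0 = -10.0              # placeholder final fitted value of kappa
beta1, beta2 = -3.5, 0.04    # placeholder age effects for one age x

n_trials, horizon = 5000, 15
Z = rng.standard_normal((n_trials, horizon))
# Cumulate drift plus shocks to get kappa_{t+1}, ..., kappa_{t+h} per trial.
kappa_paths = kappa_0 + np.cumsum(mu + C * Z, axis=1)
m_paths = np.exp(beta1 + beta2 * kappa_paths)  # death rates m(t, x) via (A1)
```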
Model M2B
Model M2B is an extension of Lee-Carter that allows for a cohort effect. It postulates that \(m(t,x)\) satisfies:

\[\log m(t,x) = \beta_x^{(1)} + \beta_x^{(2)} \kappa_t^{(2)} + \beta_x^{(3)} \gamma_c^{(3)} \tag{A4}\]

where the state variable \(\kappa_t^{(2)}\) follows (A2) and \(\gamma_c^{(3)}\) is a cohort effect where \(c = t - x\) is the year of birth. We follow Cairns et al. (2008) and CMI (2007) and model \(\gamma_c^{(3)}\) as an ARIMA(1,1,0) process:

\[\Delta\gamma_c^{(3)} = \mu^{(\gamma)} + \alpha^{(\gamma)} \left( \Delta\gamma_{c-1}^{(3)} - \mu^{(\gamma)} \right) + \sigma^{(\gamma)} Z_c^{(\gamma)} \tag{A5}\]

with parameters \(\mu^{(\gamma)}\), \(\alpha^{(\gamma)}\) and \(\sigma^{(\gamma)}\), and \(Z_c^{(\gamma)}\) iid N(0,1). This model satisfies the identifiability constraints:

\[\sum_t \kappa_t^{(2)} = 0, \quad \sum_x \beta_x^{(2)} = 1, \quad \sum_c \gamma_c^{(3)} = 0 \quad\text{and}\quad \sum_x \beta_x^{(3)} = 1 \tag{A6}\]
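The following sketch illustrates one way to simulate the ARIMA(1,1,0) cohort process (A5): draw AR(1) first differences and cumulate them onto the last fitted cohort value. All parameter and starting values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_g, alpha_g, sigma_g = 0.01, 0.4, 0.03   # placeholder ARIMA(1,1,0) parameters
gamma_last, dgamma_last = 0.05, 0.005      # placeholder fitted terminal values

horizon = 15
dgamma = np.empty(horizon)
prev = dgamma_last
for h in range(horizon):
    # AR(1) in first differences, mean-reverting around mu_g, as in (A5).
    prev = mu_g + alpha_g * (prev - mu_g) + sigma_g * rng.standard_normal()
    dgamma[h] = prev
gamma_path = gamma_last + np.cumsum(dgamma)  # future cohort effects
```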
Model M3B
This model is a simplified version of M2B which postulates:

\[\log m(t,x) = \beta_x^{(1)} + \kappa_t^{(2)} + \gamma_c^{(3)} \tag{A7}\]

where the variables (including the cohort effect) are the same as for M2B. This model satisfies the constraints:

\[\sum_t \kappa_t^{(2)} = 0 \quad\text{and}\quad \sum_{x,t} \gamma_{t-x}^{(3)} = 0 \tag{A8}\]

Given these constraints, we let \(\beta_x^{(1)} = 10^{-1} \sum_t \log m(t,x)\) – bearing in mind that our lookback sample window has a length of 10 years – and obtain

\[\delta = -\frac{\sum_x (x - \bar{x}) \left( \beta_x^{(1)} - \bar{\beta}^{(1)} \right)}{\sum_x (x - \bar{x})^2} \tag{A9}\]

where \(\bar{\beta}^{(1)}\) is the average of the \(\beta_x^{(1)}\) over the ages in the sample, and thence obtain the following revised parameter estimates:

\[
\begin{aligned}
\kappa_t^{(2)} &= \kappa_t^{(2)} - \delta\,(t - \bar{t}) \\
\gamma_{t-x}^{(3)} &= \gamma_{t-x}^{(3)} + \delta\,\big( (t - \bar{t}) - (x - \bar{x}) \big) \\
\beta_x^{(1)} &= \beta_x^{(1)} + \delta\,(x - \bar{x})
\end{aligned}
\tag{A10}
\]

which are the ones on which the forecasts are based.
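A minimal sketch of this adjustment, with placeholder fitted estimates standing in for actual ones, might look as follows; note that the cohort adjustment in (A10) depends only on the cohort index, since \((t - \bar{t}) - (x - \bar{x}) = c - (\bar{t} - \bar{x})\) with \(c = t - x\).

```python
import numpy as np

ages = np.arange(60, 90)           # ages 60-89, as in the data used here
years = np.arange(1996, 2006)      # a 10-year lookback window (placeholder)
rng = np.random.default_rng(2)
# Placeholder fitted estimates before the adjustment.
beta1 = -3.5 + 0.09 * (ages - ages.mean()) + 0.01 * rng.standard_normal(ages.size)
kappa2 = rng.standard_normal(years.size)
cohorts = np.arange(years.min() - ages.max(), years.max() - ages.min() + 1)
gamma3 = 0.02 * rng.standard_normal(cohorts.size)

x_bar, t_bar = ages.mean(), years.mean()
# Equation (A9): minus the regression slope of beta1 on age.
delta = -np.sum((ages - x_bar) * (beta1 - beta1.mean())) / np.sum((ages - x_bar) ** 2)

# Equation (A10): revised estimates.
beta1_new = beta1 + delta * (ages - x_bar)
kappa2_new = kappa2 - delta * (years - t_bar)
gamma3_new = gamma3 + delta * (cohorts - (t_bar - x_bar))
```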
Model M5
Model M5 is a reparameterised version of the CBD two-factor mortality model (Cairns et al., 2006) and postulates that \(q(t,x)\) satisfies:

\[\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) \tag{A11}\]

where \(\bar{x}\) is the average of the ages used in the dataset and where the state variables follow a two-dimensional random walk with drift:

\[\kappa_t = \kappa_{t-1} + \mu + C Z_t \tag{A12}\]

in which \(\mu\) is a constant 2 × 1 drift vector, \(C\) is a constant 2 × 2 upper triangular ‘volatility’ matrix (to be precise, the Choleski ‘square root’ matrix of the variance-covariance matrix) and \(Z_t\) is a two-dimensional standard normal variable, each component of which is independent of the other.
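A minimal sketch of simulating (A12) and converting the simulated states into mortality probabilities via (A11) is given below; the drift vector, covariance matrix and initial state are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([-0.02, 0.0005])                 # placeholder drift vector
V = np.array([[4e-4, 1e-5], [1e-5, 4e-7]])     # placeholder covariance matrix
C = np.linalg.cholesky(V)                       # Choleski 'square root' of V
kappa = np.array([-3.0, 0.10])                  # placeholder final fitted state

ages = np.arange(60, 90)
x_bar = ages.mean()
horizon = 15
paths = np.empty((horizon, 2))
for h in range(horizon):
    # One step of the two-dimensional random walk with drift (A12).
    kappa = kappa + mu + C @ rng.standard_normal(2)
    paths[h] = kappa

# q(t, x) for every future year and age via the logistic transform of (A11).
logit_q = paths[:, [0]] + paths[:, [1]] * (ages - x_bar)
q = 1.0 / (1.0 + np.exp(-logit_q))
```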
Model M6
M6 is a generalised version of M5 with a cohort effect:

\[\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) + \gamma_c^{(3)} \tag{A13}\]

where the \(\kappa_t\) process follows (A12) and the \(\gamma_c^{(3)}\) process follows (A5). This model satisfies the identifiability constraints:

\[\sum_{c \in C} \gamma_c^{(3)} = 0 \quad\text{and}\quad \sum_{c \in C} c\,\gamma_c^{(3)} = 0 \tag{A14}\]
Model M7
M7 is a second generalised version of M5 with a cohort effect:

\[\operatorname{logit} q(t,x) = \kappa_t^{(1)} + \kappa_t^{(2)} (x - \bar{x}) + \kappa_t^{(3)} \left( (x - \bar{x})^2 - \sigma_x^2 \right) + \gamma_c^{(4)} \tag{A15}\]

where the state variables \(\kappa_t\) follow a three-dimensional random walk with drift, \(\sigma_x^2\) is the variance of the age range used in the dataset, and \(\gamma_c^{(4)}\) is a cohort effect that is modelled as an AR(1) process. This model satisfies the identifiability constraints:

\[\sum_{c \in C} \gamma_c^{(4)} = 0, \quad \sum_{c \in C} c\,\gamma_c^{(4)} = 0 \quad\text{and}\quad \sum_{c \in C} c^2 \gamma_c^{(4)} = 0 \tag{A16}\]
More details on the models can be found in Cairns et al. (2007).
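For concreteness, a sketch of evaluating the M7 predictor (A15) for one year, given already-simulated state variables and cohort effects, might look as follows; the function and all inputs are hypothetical.

```python
import numpy as np

ages = np.arange(60, 90)
x_bar = ages.mean()
sigma2_x = ages.var()          # variance of the age range used in the dataset

def m7_q(k1, k2, k3, gamma_by_cohort, year):
    """q(year, x) for all ages under (A15); gamma is indexed by c = year - x."""
    gamma = np.array([gamma_by_cohort.get(year - x, 0.0) for x in ages])
    logit_q = (k1 + k2 * (ages - x_bar)
               + k3 * ((ages - x_bar) ** 2 - sigma2_x) + gamma)
    return 1.0 / (1.0 + np.exp(-logit_q))

# Hypothetical simulated states and a partial cohort-effect dictionary.
q_2005 = m7_q(-3.1, 0.11, 5e-4, {1940: 0.01, 1941: 0.012}, 2005)
```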
Appendix B: Simulating Model Parameters under Conditions of
Parameter Uncertainty
This Appendix explains the procedure used to simulate the values of the parameters driving the \(\kappa_t\) process and, where the model involves a cohort effect \(\gamma_c\), the parameters of the process that drives \(\gamma_c\). The approach used is a Bayesian one based on non-informative Jeffreys prior distributions (see Cairns, 2000, and Cairns et al., 2006).
Simulating the values of the parameters driving the \(\kappa_t\) process

Recall from Appendix A that each of the models has an underlying random walk with drift:

\[\kappa_t = \kappa_{t-1} + \mu + C Z_t \tag{B1}\]

where, writing \(V\) for the variance-covariance matrix of the innovations \(C Z_t\): in the case of M1, M2B and M3B, \(\kappa_t\), \(\mu\) and \(V\) are scalars; in the case of M5 and M6, \(\kappa_t\) and \(\mu\) are 2 × 1 vectors and \(V\) is a 2 × 2 matrix; and in the case of M7, \(\kappa_t\) and \(\mu\) are 3 × 1 vectors and \(V\) is a 3 × 3 matrix. For the sake of generality, we refer below to \(\mu\) as a vector and \(V\) as a matrix.

The parameters of (B1) are estimated by MLE using a sample of size n = 10 for the results presented in this paper.
We first simulate \(V\) from its posterior distribution, which we do by drawing a matrix from the Wishart\((n-1,\, n^{-1}\hat{V}^{-1})\) distribution and inverting it. Specifically, we carry out the following steps:

● We simulate \(n-1\) i.i.d. vectors \(\alpha_1, \ldots, \alpha_{n-1}\) from a multivariate normal distribution with mean vector 0 and covariance matrix \(n^{-1}\hat{V}^{-1}\). These vectors will have dimensions 1 × 1 for M1, M2B and M3B, dimensions 2 × 1 for M5 and M6, and dimensions 3 × 1 for M7.

● We construct the matrix \(X = \sum_{i=1}^{n-1} \alpha_i \alpha_i^T\), which has the required Wishart distribution.

● We then invert \(X\) to obtain our simulated positive-definite covariance matrix \(V (= X^{-1})\).

Having obtained our simulated \(V\) matrix, we simulate the \(\mu\) vector from MVN\((\hat{\mu},\, n^{-1}V)\).
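A minimal sketch of these steps for a two-dimensional model such as M5, with placeholder MLE values \(\hat{\mu}\) and \(\hat{V}\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10                                          # lookback sample size
mu_hat = np.array([-0.02, 0.0005])              # placeholder MLE drift
V_hat = np.array([[4e-4, 1e-5], [1e-5, 4e-7]])  # placeholder MLE covariance

# Step 1: n-1 iid MVN(0, V_hat^{-1} / n) vectors.
scale = np.linalg.inv(V_hat) / n
alphas = rng.multivariate_normal(np.zeros(2), scale, size=n - 1)
# Step 2: the sum of outer products is Wishart(n-1, scale).
X = alphas.T @ alphas
# Step 3: invert to get the simulated covariance matrix V.
V_sim = np.linalg.inv(X)
# Finally, draw the drift vector from MVN(mu_hat, V_sim / n).
mu_sim = rng.multivariate_normal(mu_hat, V_sim / n)
```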
Simulating the values of the parameters driving the \(\gamma_c\) process

Recall that \(c\) is the year of birth. We first note that, for model M7, the cohort effect \(\gamma_c^{(4)}\) follows an AR(1) process, and for models M2B, M3B and M6, the cohort effect \(\gamma_c^{(3)}\) follows an ARIMA(1,1,0) process. This latter process implies that the first difference of \(\gamma_c^{(3)}\), \(\Delta\gamma_c^{(3)}\), follows an AR(1) process. Hence our task is to simulate the values of the parameters \(\mu^{(\gamma)}\), \(\alpha^{(\gamma)}\) and \(\sigma^{(\gamma)}\) in the following equation:

\[\gamma_c = \mu^{(\gamma)} + \alpha^{(\gamma)} \gamma_{c-1} + \sigma^{(\gamma)} \varepsilon_c \tag{B2}\]

where \(\gamma_c\) is, as appropriate, either \(\Delta\gamma_c^{(3)}\) or \(\gamma_c^{(4)}\), and \(\varepsilon_c\) is iid N(0,1). The method we use is taken from Cairns (2000, pp. 320-321) and is based on a Jeffreys prior distribution for \((\mu^{(\gamma)}, \alpha^{(\gamma)}, \sigma^{(\gamma)})\) and MLE estimates of these parameters \((\hat{\mu}^{(\gamma)}, \hat{\alpha}^{(\gamma)}, \hat{\sigma}^{(\gamma)})\). This involves the following steps:
• We simulate the value of \(\alpha^{(\gamma)}\), given \(\hat{\alpha}^{(\gamma)}\) and \(n\), from its posterior distribution \(f(\alpha^{(\gamma)}) \propto \left( (\alpha^{(\gamma)})^2 - 2\alpha^{(\gamma)}\hat{\alpha}^{(\gamma)} + 1 \right)^{-(n-1)/2}\) using a suitable algorithm (e.g., the rejection method).

• We then simulate \(X \sim \chi^2_{n-1}\) and obtain a simulated value of \((\sigma^{(\gamma)})^2\) from

\[(\sigma^{(\gamma)})^2 = \frac{(n-1)(\hat{\sigma}^{(\gamma)})^2 \left( 1 + (\alpha^{(\gamma)} - \hat{\alpha}^{(\gamma)})^2 / (1 - (\hat{\alpha}^{(\gamma)})^2) \right)}{X}.\]

• Finally, we simulate \(Z \sim N(0,1)\) and obtain a simulated value of \(\mu^{(\gamma)}\) from

\[\mu^{(\gamma)} = \hat{\mu}^{(\gamma)} + \sqrt{\frac{(\sigma^{(\gamma)})^2 / (n-1)}{(1 - \alpha^{(\gamma)})^2}} \times Z.\]
Further details are provided in Cairns (2000, pp. 320-321).
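A minimal sketch of these three steps, using simple rejection sampling for \(\alpha^{(\gamma)}\) and with placeholder MLE values; the restriction of \(\alpha^{(\gamma)}\) to (−1, 1) is an added assumption for stationarity:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10
mu_hat, alpha_hat, sigma_hat = 0.01, 0.4, 0.03   # placeholder MLEs

# Step 1: rejection-sample alpha from f(a) ∝ (a^2 - 2*a*alpha_hat + 1)^{-(n-1)/2}.
def sample_alpha():
    def f(a):
        return (a * a - 2.0 * a * alpha_hat + 1.0) ** (-(n - 1) / 2.0)
    f_max = f(alpha_hat)            # the kernel peaks at a = alpha_hat
    while True:
        a = rng.uniform(-1.0, 1.0)  # assumed stationary region
        if rng.uniform(0.0, f_max) < f(a):
            return a

alpha = sample_alpha()
# Step 2: a chi-square draw gives the simulated innovation variance.
X = rng.chisquare(n - 1)
sigma2 = (n - 1) * sigma_hat**2 * (1 + (alpha - alpha_hat) ** 2 / (1 - alpha_hat**2)) / X
# Step 3: a normal draw gives the simulated drift parameter.
mu = mu_hat + np.sqrt(sigma2 / (n - 1)) / (1 - alpha) * rng.standard_normal()
```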