Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts

Kevin Dowd (CRIS, NUBS)
Andrew J. G. Cairns (Heriot-Watt)
David Blake (Pensions Institute, Cass Business School)
Guy D. Coughlan (JPMorgan)
David Epstein (JPMorgan)
Marwa Khalaf-Allah (JPMorgan)

4th International Longevity Risk and Capital Market Solutions Conference
Amsterdam, September 2008

Purposes of Paper
• To set out a framework to backtest the forecast performance of mortality models
  – Backtesting = evaluation of forecasts against subsequently realised outcomes
• To apply this backtesting framework to a set of mortality models
  – How well do they actually perform?

Background
– This study is the fourth in a series of collaborations between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan
– Involves actuaries, economists and investment bankers
– Of course, it is very easy (and fun!) to attack the forecasting ‘abilities’ of actuaries (remember Equitable?), investment bankers (remember subprime?) and so on, but we should remember…

It’s not just actuaries and investment bankers who can’t forecast

Background
– Cairns et al. (2007) examine the empirical fits of 8 different mortality models applied to England & Wales (E&W) and US male mortality data
– They compare model performance
  • Using a range of qualitative criteria (e.g., biological reasonableness)
  • Using a range of quantitative criteria (e.g., the Bayes information criterion)

Models considered
– Model M1 = Lee-Carter, no cohort effect
– Model M2 = Renshaw-Haberman’s 2006 cohort-effect generalisation of M1
– Model M3 = Currie’s age-period-cohort model
– Model M4 = P-splines model, Currie (2004)
– Model M5 = CBD two-factor model, Cairns et al. (2006), no cohort effect
– Models M6, M7 and M8 = alternative cohort-effect generalisations of CBD

Second study: Cairns et al. (2008)
– Examines the ex-ante plausibility of the models’ density forecasts
– M4 (P-splines) not considered
– Amongst other conclusions, finds that M8 (which did very well in the first study) gives very implausible forecasts for US data
– Hence, M8 was dropped as well
– Thus, a model might fit past data well but still give unreliable forecasts
  • It is not enough just to look at past fits

Third study: Dowd et al. (2008a)
– Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically
  • M2B is a special case of M2 that uses an ARIMA(1,1,0) process for the cohort effect
  • M3B is a special case of M3 that uses the same ARIMA(1,1,0) process for the cohort effect
– The basic idea is to unravel the models’ testable implications and test them systematically
– Finds some problems with all models; M2B is unstable

Motivation for present study
– A model might
  • Give a good fit to past data and
  • Generate density forecasts that appear plausible ex ante
– And still produce poor forecasts
– Hence, it is essential to test the performance of models against subsequently realised outcomes
  • This is what backtesting is about
– In the end, it is forecast performance that really matters
– Would you want to drive a car that hadn’t been field-tested?

Backtesting framework
– Choose a metric of interest
  • Could choose mortality rates, survival rates, life expectancy, annuity prices, etc.
– Select the historical lookback window used to estimate model parameters
– Select the forecast horizon, or lookforward window, for the forecasts
– Implement tests of how well the forecasts subsequently performed

Backtesting framework
– We focus mainly on the mortality rate as the metric
– We choose a fixed 10-year lookback window
  • This seems to be emerging as the standard amongst practitioners
– We examine a range of backtests:
  • Over contracting horizons
  • Over expanding horizons
  • Over rolling fixed-length horizons
  • Future mortality density tests

Backtesting framework
– We consider forecasts both with and without parameter uncertainty
– Parameter-certain (PC) case: treat the parameter estimates as if they were known values
– Parameter-uncertain (PU) case: forecast using a Bayesian approach that allows for uncertainty in the parameter estimates
  • Allows for uncertainty in the parameters governing period and cohort effects
– Results indicate it is very important to allow for parameter uncertainty

Contracting-horizon backtest: age 65
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 65; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.01 to 0.04]

Contracting-horizon backtest: age 75
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 75; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.02 to 0.08]

Contracting-horizon backtest: age 85
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 85; x-axis: stepping-off year, 1980 to 2005; y-axis: mortality rate, 0.05 to 0.25]

Conclusions so far
• Big difference between PC and PU forecasts
• PU prediction intervals are usually considerably wider than PC ones
• M2B is sometimes unstable
• Now consider expanding-horizon predictions…

Prediction intervals from 1980: age 65
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 65; x-axis: year, 1980 to 2005; y-axis: mortality rate. Each panel reports PU and PC counts [xL, xM, xU, n] with n = 27; xL ranges from 0 to 8 under PU and from 7 to 18 under PC]

Prediction intervals from 1980: age 75
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 75; x-axis: year, 1980 to 2005; y-axis: mortality rate. Each panel reports PU and PC counts [xL, xM, xU, n] with n = 27; xL ranges from 0 to 1 under PU and from 7 to 13 under PC]

Prediction intervals from 1980: age 85
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 85; x-axis: year, 1980 to 2005; y-axis: mortality rate. Each panel reports PU and PC counts [xL, xM, xU, n] with n = 27; xL ranges from 0 to 1 under PU and from 0 to 5 under PC]

Expanding PI conclusions
• PC models have far too many lower exceedances
• PU models have exceedances that are much closer to expectations
  – Especially for M1, M7 and M3B
  – Suggests that PU forecasts are more plausible than PC ones
• Negligible differences between PC and PU median predictions
• Very few upper exceedances

Expanding PI conclusions
• Too few upper exceedances, and too many median and lower exceedances
• This suggests some upward bias in the forecasts, which is especially pronounced for PC forecasts
• Evidence of upward bias is less clear-cut for PU forecasts

Rolling Fixed Horizon Forecasts
• From now on, we work with PU forecasts only
• Assume an illustrative horizon of 15 years
• Now examine the performance of each model in turn…

Exceedance counts [xL, xM, xU, n] from the rolling 15-year-ahead PU charts (one chart per model showing ages 65, 75 and 85; x-axis: year, 1995 to 2006; y-axis: mortality rate, log scale):

Model   Age 65            Age 75            Age 85
M1      [1, 12, 0, 12]    [0, 11, 0, 12]    [1, 10, 0, 12]
M2B     [8, 12, 0, 12]    [0, 12, 0, 12]    [1, 5, 0, 12]
M3B     [2, 12, 0, 12]    [0, 12, 0, 12]    [0, 8, 0, 12]
M5      [9, 12, 0, 12]    [0, 12, 0, 12]    [0, 8, 0, 12]
M6      [10, 12, 0, 12]   [0, 12, 0, 12]    [0, 4, 0, 12]
M7      [4, 12, 0, 12]    [0, 12, 0, 12]    [0, 8, 0, 12]

Tentative conclusions so far
• Rolling PI charts are broadly consistent with earlier results
• Some evidence of upward bias, but it is not consistent across models or always especially compelling
• M2B again shows instability

Mortality density tests
• Choose an age (e.g., 65) and a horizon (e.g., 15 years ahead)
• Use the model to project the pdf (or cdf) of the mortality rate 15 years ahead
• Plot the realised q on to the pdf/cdf
• Obtain the associated p-value (or PIT value)
• Reject if p is too far out in either tail

Example: P-Values of Realised Mortality: Males 65, 1980 Start, Horizon = 26 Years Ahead
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) plotting the CDF under the null against the mortality rate (0 to 0.04) and marking the realised q = 0.0149; the reported p-values range from 0.021 to 0.165]
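The two statistics used in these backtests, the exceedance counts [xL, xM, xU, n] and the density-test p-value, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation. It assumes xL counts realised rates below the lower prediction bound, xM counts realised rates below the median forecast, and xU counts realised rates above the upper bound. It also assumes the forecast density is represented by simulated draws, so the p-value (PIT value) is the empirical CDF of those draws at the realised q. The function names and the two-sided 5% default level are hypothetical choices.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def exceedance_counts(
    realised: Sequence[float],
    lower: Sequence[float],
    median: Sequence[float],
    upper: Sequence[float],
) -> Tuple[int, int, int, int]:
    """Return (xL, xM, xU, n) for a run of interval forecasts.

    Assumed convention (not taken from the paper's code):
    xL = realised rates below the lower bound,
    xM = realised rates below the forecast median,
    xU = realised rates above the upper bound,
    n  = number of years compared.
    """
    n = len(realised)
    if not (len(lower) == len(median) == len(upper) == n):
        raise ValueError("all series must have the same length")
    x_l = sum(r < lo for r, lo in zip(realised, lower))
    x_m = sum(r < m for r, m in zip(realised, median))
    x_u = sum(r > up for r, up in zip(realised, upper))
    return x_l, x_m, x_u, n


def density_test_pvalue(simulated: Sequence[float], realised_q: float) -> float:
    """PIT value: share of simulated draws at or below the realised rate.

    Assumes the model's forecast density is represented by Monte Carlo draws,
    so the empirical CDF stands in for the model CDF.
    """
    if not simulated:
        raise ValueError("need at least one simulated draw")
    draws = sorted(simulated)
    return bisect_right(draws, realised_q) / len(draws)


def reject_two_sided(p: float, level: float = 0.05) -> bool:
    """Reject when p is too far out in either tail (hypothetical 5% default)."""
    return p < level / 2 or p > 1 - level / 2


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the study.
    print(exceedance_counts(
        realised=[0.015, 0.016, 0.014],
        lower=[0.016, 0.015, 0.015],
        median=[0.020, 0.019, 0.018],
        upper=[0.025, 0.024, 0.023],
    ))  # (2, 3, 0, 3)

    draws = [0.010 + 0.002 * i for i in range(10)]  # 0.010, 0.012, ..., 0.028
    p = density_test_pvalue(draws, realised_q=0.0149)
    print(p, reject_two_sided(p))  # 0.3 False
```

A realised rate that sits low in the forecast distribution gives a small p, which is how an upward bias in the forecasts would show up in this test.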
Many ways to do this
• For h = 25 years ahead: 1 way
  – 1980-2005 only
• For h = 24 years ahead: 2 ways
  – 1980-2004, 1981-2005
• For h = 23 years ahead: 3 ways
• …
• For h = 1 year ahead: 25 ways
  – 1980-1981, 1981-1982, …, 2004-2005

Lots of cases to consider
• There are 25 + 24 + 23 + … + 1 = 325 separate cases to consider, each equally ‘legitimate’
• We need some way to make use of all possibilities but consolidate the results
• We do so by computing p-values for each case and then working with the mean p-values from each test
• These are reported below for each age, for h = 5, 10 and 15 years ahead:

Age 65
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 65; x-axis: starting year, 1985 to 2005; y-axis: p-value. Average p-values range from 0.107 to 0.290 for forecasts 5 years ahead, from 0.063 to 0.188 for 10 years ahead and from 0.039 to 0.143 for 15 years ahead]

Age 75
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 75; x-axis: starting year, 1985 to 2005; y-axis: p-value. Average p-values range from 0.297 to 0.330 for forecasts 5 years ahead, from 0.258 to 0.326 for 10 years ahead and from 0.226 to 0.321 for 15 years ahead]

Age 85
[Figure: one panel per model (M1, M2B, M3B, M5, M6, M7) for males aged 85; x-axis: starting year, 1985 to 2005; y-axis: p-value. Average p-values range from 0.240 to 0.335 for forecasts 5 years ahead, from 0.326 to 0.386 for 10 years ahead and from 0.282 to 0.386 for 15 years ahead]

Conclusions from these tests
• All models perform well
• No rejections at the 1% significance level (SL)
• Only 3 rejections at the 5% SL

Overall conclusions
• The study outlines a framework for backtesting forecasts of mortality models
• As regards individual models and this dataset:
  – M1, M3B, M5 and M7 perform well most of the time, and there is little between them
  – M2B is unstable
  – Of the Lee-Carter family of models, it is hard to choose between M1 and M3B
  – Of the CBD family, M7 seems to perform best, although there is little to choose between M5 and M7

Two other points stand out
• In many but not all cases, and depending also on the model, there is evidence of an upward bias in forecasts
  – This bias is very pronounced for PC forecasts
  – It is less pronounced for PU forecasts
• Except perhaps for M2B, PU forecasts are more plausible than PC forecasts
• It is very important to take account of parameter uncertainty, more or less regardless of the model one uses

References
• Cairns et al. (2007) “A quantitative comparison of stochastic mortality models using data from England & Wales and the United States.” Pensions Institute Discussion Paper PI-0701, March.
• Cairns et al. (2008) “The plausibility of mortality density forecasts: An analysis of six stochastic mortality models.” Pensions Institute Discussion Paper PI-0801, April.
• Dowd et al. (2008a) “Evaluating the goodness of fit of stochastic mortality models.” Pensions Institute Discussion Paper PI-0802, September.
• Dowd et al. (2008b) “Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts.” Pensions Institute Discussion Paper PI-0803, September.
• These papers are also available at www.lifemetrics.com