models of default - Imperial College London

Challenges arising when including
macroeconomic variables in survival
models of default
Dr Tony Bellotti
Department of Mathematics, Imperial College London
[email protected]
Royal Statistical Society, 10 February 2016
Discrete Survival Models for Retail Credit Scoring
Outline of presentation
1. Background: credit scoring models
2. Including macroeconomic variables (MEVs) using
a discrete survival model.
3. Challenges
4. Results based on mortgage data including stress
test
Background - Credit Scoring
β€’
Typically, risk models for retail credit are models of default.
β€’
Hence this is a statistical classification problem.
β€’
Almost universally, logistic regression is the model of choice in
retail banks.
𝑃 π‘Œ = 0|𝐗 = 𝐱 = 𝐹 𝑠 𝐱 where 𝑠 𝐱 = 𝛽0 + 𝛃 βˆ™ 𝐱
where π‘Œ ∈ 0,1 is a default indicator, 𝐹 is the logit link function and
𝑠 is the scorecard.
β€’
Default models are used for various functions including application
decisions, behavioural scoring and loss provisioning.
For example, for application scoring, the bank sets a threshold 𝑐,
depending on risk appetite, then accepts new applications 𝐱 new iff
𝑠 𝐱 new > 𝒄.
Introducing macroeconomic variables (MEVs)
β€’
More accurate risk management, sensitive to changing economic
conditions.
β€’
Pressure from regulators (Basel Accord internationally and
Prudential Risk Authority specifically in UK) to develop models that
calibrate economic conditions against credit risk, enabling forecasts
of risk during recession periods (stress test).
β€’
Logistic regression does not naturally allow inclusion of
macroeconomic time series, but survival models do through timevarying covariates (TVCs).
β€’
Discrete survival model is good option since:
(1) The default (failure) event is observed at discrete intervals (ie
monthly accounting data).
(2) Computationally efficient.
List of Challenges
1.
2.
3.
4.
5.
6.
7.
8.
Selection of MEVs
Structural form of MEVs (ie lag and transformations)
Time trend in MEVs
Correlations amongst MEVs
Systematic effects not fully explained by MEVs
Change in MEV risk factors over time
Segmentation and interactions
Confounding economic effects with behavioural variables
We will illustrate these challenges in this presentation using an
example of US mortgage credit risk modelling.
Discrete survival model structure for credit risk
𝑃 π‘Œπ‘–π‘‘ = 1|π‘Œπ‘–π‘  = 0 for 𝑠 < 𝑑, 𝐰𝑖 , 𝐱 𝑖 π‘‘βˆ’π‘˜ , π³π‘Žπ‘–+π‘‘βˆ’π‘™
= 𝐹 𝛽0 + π›ƒπ‘‡πœ™ πœ™ 𝑑 + 𝛃𝑇𝐰 𝐰𝑖 + 𝛃𝑇𝐱 𝐱 𝑖 π‘‘βˆ’π‘˜ + 𝑠𝑖 + 𝛃𝑇𝐳 π³π‘Žπ‘–+π‘‘βˆ’π‘™ + π›Ύπ‘Žπ‘– +𝑑
where
π‘Œπ‘–π‘‘
πœ™ 𝑑
𝐰𝑖
𝐱𝑖
π‘‘βˆ’π‘˜
π‘Žπ‘–
𝑠𝑖
π³π‘Žπ‘– +π‘‘βˆ’π‘™
π›Ύπ‘Žπ‘– +𝑑
Outcome on account 𝑖 after some discrete duration 𝑑 > 0:
1 = default, 0 = non-default.
Typically, duration 𝑑 is age of the account.
Non-linear transformation of duration; Baseline hazard.
eg, πœ™ 𝑑 = 𝑑, 𝑑 2 , log 𝑑 , log 𝑑 2
Static variables; eg application variables and cohort effect.
Behavioural variables over time (with some lag π‘˜).
Date of origination of account 𝑖.
Frailty term on account 𝑖.
MEVs over calendar time π‘Žπ‘– + 𝑑 (with some lag 𝑙).
Unknown systematic (calendar time) effect.
Model estimation
β€’
This is a panel model structure over accounts 𝑖 and duration 𝑑.
β€’
Need to specify a link function 𝐹. This could be logit or probit.
β€’ Taking 𝐹 to be complementary log-log, ie
𝐹 𝑐 = 1 βˆ’ exp βˆ’exp 𝑐 , yields a discrete version of the Cox
proportional hazard model.
β€’
Most of the variables are included as fixed effect terms.
β€’
Frailty 𝑠𝑖 can be included as a random effect term to deal with
heterogeneity.
β€’
Maximum marginal likelihood can be used to estimate coefficients on
fixed effects (𝛽0 , π›ƒπœ™ , 𝛃𝐰 , 𝛃𝐱 , 𝛃𝐳 ) and variance of the random effects.
US Mortgage data
β€’ We will use a large data set of account-level mortgage data for
illustration.
β€’ Freddie Mac loan-level mortgage data set.
β€’ Origination: 1999 to 2012.
β€’ 181,000 loans (random sample, stratified by origination).
β€’ Default event: D180 (180 days delinquency), short sale or short
payoff prior to D180 or deed-in-lieu of foreclosure prior to D180.
Heat map:
Default rate by calendar month and account age
Account age (months)
150
100
50
0
Jan
1999
Jan
2003
Jan
2007
Jan
2009
Jan
2011
Jan
2013
Hazard probability
Age of mortgage effect = Hazard probability
0
50
100
Loan age (months)
150
Calendar time effect
Default
risk
β€’ This is the estimated risk over calendar time with (full line) and without
(dashed line) age, vintage and seasonality included in the model.
Jan
1999
Jan
2003
Jan
2007
Jan
2011
β€’ We see that not including other time components would lead to
inaccurate coefficient estimates for the MEV effects.
Challenge 1: Selection of MEVs
1.
2.
3.
Consider MEVs that we would expect to have a direct effect on default.
National versus local MEVs.
Consider MEVs that are required for stress testing, as specified by
regulators or the business.
°
For this exercise: US GDP, Unemployment rate (UR), House price
index (HPI) and interest rate (IR).
Challenge 2: Structural form of MEVs
How should we include the MEVs in our model?
Things to consider:β€’ Choice of lag structure (including possibly geometric lag);
β€’ Whether to use difference in MEV;
β€’ Whether to smooth the MEV time series prior to inclusion in the
model;
β€’ Whether to include seasonally adjusted or real values;
β€’ Cumulative effects (eg we may expect high unemployment to
have a greater effect, the longer it continues);
β€’ Whether the MEV need to be transformed prior to inclusion in
the model (eg log transform for price index variables).
Challenge 3: Time trends in MEVs
β€’ Time trends in the MEVs are problematic since this could lead to
spurious correlation with default risk.
β€’ Simulation studies in the survival model setting show time trends
in MEVs can lead to errors in coefficient estimates.
β€’ Therefore do not include MEVs with time trends. This effects
GDP and HPI, in particular.
β€’ Solution #1: First difference? But this will only fit default risk
against short changes in MEVs (is this just noise?).
β€’ Solution #2: Annual difference? This is better since this gives
information regarding change in economy over a period of time;
eg GDP growth. Also, do not need to worry whether or not to
seasonally adjust.
First attempt at modelling…
Variable
Lag *
(months)
Coefficient
estimate
SE
P-value
Expected
sign
IR
0
+1.80
0.029
<0.0001
+οƒΌ
IR (log)
0
-7.30
0.158
<0.0001
Ξ”HPI
21
-3.72
0.372
<0.0001
-οƒΌ
Ξ”GDP
15
-6.35
0.900
<0.0001
-οƒΌ
UR
3
+0.245
0.0093
<0.0001
+οƒΌ
Ξ”UR
12
-0.209
0.0149
<0.0001
+
Age effect…
Vintage effect…
Seasonality…
* Choice of lag by finding best fit in a series of univariate studies.
… try removing Ξ”UR
Variable
Lag
(months)
Coefficient
estimate
SE
P-value
Expected
sign
IR
0
+1.82
0.029
<0.0001
+οƒΌ
IR (log)
0
-7.37
0.157
<0.0001
Ξ”HPI
21
-3.68
0.369
<0.0001
-οƒΌ
Ξ”GDP
15
+4.65
0.438
<0.0001
-
UR
3
+0.186
0.0083
<0.0001
+οƒΌ
Age effect…
Vintage effect…
Seasonality…
… but now we have a problem with estimate for Ξ”GDP.
What is going on?
Challenge 4: Correlations between MEVs
Ξ”HPI
Ξ”GDP
UR
Ξ”UR
Ξ”HPI
1
0.564
-0.835
-0.431
Ξ”GDP
0.564
1
-0.728
-0.842
UR
-0.835
-0.728
1
0.543
Ξ”UR
-0.431
-0.842
0.543
1
β€’ This correlation matrix demonstrates some very high correlations
amongst the MEVs.
β€’ Solution #1: Variable selection – but this may remove some
variables that are required in stress testing.
β€’ Solution #2: Factor analysis to determine macroeconomic factors
(MFs) to include in the model.
Principal Component Analysis on MEVs
Variable
MF1
MF2
MF3
Ξ”HPI
+0.474
-0.596
-0.562
Ξ”GDP
+0.528
0.359
0.431
UR
-0.524
0.375
-0.512
Ξ”UR
-0.471
-0.613
0.486
Proportion
of variance
74.5%
18.5%
4.5%
β€’ The first component (MF1) represents much of the economic effect
among the MEVs.
β€’ MF1 also has an unambiguous interpretation as a measure of
economic health.
β€’ The remaining components do not account for much of the variance
and do not have a natural interpretation, hence only MF1 will be
included in the model.
Structure: Relationship between MF1 and default
Score
Bad
Good
Bad ----------------
MF1
--------------- Good
There is a distinct β€œbreakpoint” in the risk profile of MF1.
This can be modelled with an interaction term.
Model with MF1
Variable
Coefficient
estimate
SE
P-value
Expected
sign
IR
+1.83
0.030
<0.0001
+οƒΌ
IR (log)
-7.41
0.158
<0.0001
MF1
-0.059
0.0056
<0.0001
(High MF1)
-2.17
0.0801
<0.0001
(High MF1) × MF1
-0.331
0.0119
<0.0001
Age effect…
Vintage effect…
Seasonality…
(High MF1) is an indicator variable with value 0 or 1.
-οƒΌ
-οƒΌ
Challenge 6: Residual systematic effect
β€’ Allow a calendar fixed effect to model residual systematic risk, not
modelled by MEVs (or by seasonality).
β€’ Compute standard deviation (s.d) of this effect to estimate size of the
β€œunexplained” systematic effect.
Model
Unexplained
effect (s.d)
Age, vintage but no MEVs
0.705
Age, vintage and linear MF1
0.223
Age, vintage and nonlinear MF1
0.131
β€’ Including MF1 explains much of the systematic effect, but a residual
remains unexplained (19%).
β€’ Including the MF1 with the interaction term improves the fit.
β€’ The residual estimate is important to quantify conservatism if using
the model for forecasting or stress testing.
Challenge 7: Economic model breakpoints
How stable are MEV risk factors?
β€’ One question we may have is whether the effects of
MEVs on default risk are stable over time.
β€’ In particular, after an economic regime changes.
β€’ Use a breakpoint model to test for this…
Breakpoint model…
Variable
Coefficient
estimate
SE
P-value
Expected
sign
MF1
-0.076
0.0059
<0.0001
-οƒΌ
(High MF1)
-1.82
0.0984
<0.0001
D
+0.679
0.122
<0.0001
(High MF1) × MF1
-0.227
0.0152
<0.0001
D × MF1
+0.0511
0.0260
0.0498
Other time
effects…
D = indicator variable:
β€’
-οƒΌ
1 if calendar date before or during Feb 2006,
0 otherwise
D × MF1 effect is not significant, indicating no evident difference in
economic effect in the two different time periods.
Further challenges
7. Segmentations / interactions
β€’ It is plausible that different segments of the population will react to
different MEVs in different ways.
β€’ Eg high LTV accounts may be more sensitive to changes in HPI.
β€’ Therefore, explore different model segments or variable interactions.
8. Confounding economic effects with behavioural variables (BVs)
β€’ We may want to include BVs as TVCs; however, the MEVs may be
confounders for these effects.
β€’ Eg economic conditions may affect repayments, in general.
β€’ Therefore, test for confounding and build the model in stages (eg
include MEVs in first stage, then introduce BVs).
Default rate (monthly)
Validate model with back-testing
Jan
1999
Jan
2003
Jan
2007
Jan
2011
Jan
2013
β€’ Use Default Rate (DR) during that period to measure performance.
β€’ Use conservatism (as 2 × s.d of unknown systematic effect).
Result 2: Stress test results
Scenario
UR
GDP
HPI
IR
Baseline
-2% over 2 years
2.5% growth per
annum
7% increase
per annum
No change
Stress
Rise from 7.4% to
peak of 10.6%
over 2 years
Reduction from
+2% to -2%
growth per annum
Zero increase
No change
IR rise
-2% over 2 years
2.5% growth per
annum
7% increase
per annum
Average +2%
increase over 2
years
Projection of annual default rate:
Scenario Conservatism
Year 1
Year 2
Baseline
No
1.21%
0.83%
Stress
No
1.67%
2.34%
Stress
Yes
2.21%
3.20%
IR rise
No
1.73%
2.76%
Conclusion
1. Including MEVs in credit risk models is valuable to enable
more accurate risk management, sensitive to economic
changes, as well as stress testing.
2. We have illustrated several challenges when including MEVs
in credit risk models.
3. And suggested several approaches to handle these
challenges, demonstrating results on a large US mortgage
data set.