Challenges arising when including macroeconomic variables in survival models of default Dr Tony Bellotti Department of Mathematics, Imperial College London [email protected] Royal Statistical Society, 10 February 2016 Discrete Survival Models for Retail Credit Scoring Outline of presentation 1. Background: credit scoring models 2. Including macroeconomic variables (MEVs) using a discrete survival model. 3. Challenges 4. Results based on mortgage data including stress test Background - Credit Scoring β’ Typically, risk models for retail credit are models of default. β’ Hence this is a statistical classification problem. β’ Almost universally, logistic regression is the model of choice in retail banks. π π = 0|π = π± = πΉ π π± where π π± = π½0 + π β π± where π β 0,1 is a default indicator, πΉ is the logit link function and π is the scorecard. β’ Default models are used for various functions including application decisions, behavioural scoring and loss provisioning. For example, for application scoring, the bank sets a threshold π, depending on risk appetite, then accepts new applications π± new iff π π± new > π. Introducing macroeconomic variables (MEVs) β’ More accurate risk management, sensitive to changing economic conditions. β’ Pressure from regulators (Basel Accord internationally and Prudential Risk Authority specifically in UK) to develop models that calibrate economic conditions against credit risk, enabling forecasts of risk during recession periods (stress test). β’ Logistic regression does not naturally allow inclusion of macroeconomic time series, but survival models do through timevarying covariates (TVCs). β’ Discrete survival model is good option since: (1) The default (failure) event is observed at discrete intervals (ie monthly accounting data). (2) Computationally efficient. List of Challenges 1. 2. 3. 4. 5. 6. 7. 8. Selection of MEVs Structural form of MEVs (ie lag and transformations) Time trend in MEVs Correlations amongst MEVs Systematic effects not fully explained by MEVs Change in MEV risk factors over time Segmentation and interactions Confounding economic effects with behavioural variables We will illustrate these challenges in this presentation using an example of US mortgage credit risk modelling. Discrete survival model structure for credit risk π πππ‘ = 1|πππ = 0 for π < π‘, π°π , π± π π‘βπ , π³ππ+π‘βπ = πΉ π½0 + πππ π π‘ + πππ° π°π + πππ± π± π π‘βπ + π π + πππ³ π³ππ+π‘βπ + πΎππ +π‘ where πππ‘ π π‘ π°π π±π π‘βπ ππ π π π³ππ +π‘βπ πΎππ +π‘ Outcome on account π after some discrete duration π‘ > 0: 1 = default, 0 = non-default. Typically, duration π‘ is age of the account. Non-linear transformation of duration; Baseline hazard. eg, π π‘ = π‘, π‘ 2 , log π‘ , log π‘ 2 Static variables; eg application variables and cohort effect. Behavioural variables over time (with some lag π). Date of origination of account π. Frailty term on account π. MEVs over calendar time ππ + π‘ (with some lag π). Unknown systematic (calendar time) effect. Model estimation β’ This is a panel model structure over accounts π and duration π‘. β’ Need to specify a link function πΉ. This could be logit or probit. β’ Taking πΉ to be complementary log-log, ie πΉ π = 1 β exp βexp π , yields a discrete version of the Cox proportional hazard model. β’ Most of the variables are included as fixed effect terms. β’ Frailty π π can be included as a random effect term to deal with heterogeneity. β’ Maximum marginal likelihood can be used to estimate coefficients on fixed effects (π½0 , ππ , ππ° , ππ± , ππ³ ) and variance of the random effects. US Mortgage data β’ We will use a large data set of account-level mortgage data for illustration. β’ Freddie Mac loan-level mortgage data set. β’ Origination: 1999 to 2012. β’ 181,000 loans (random sample, stratified by origination). β’ Default event: D180 (180 days delinquency), short sale or short payoff prior to D180 or deed-in-lieu of foreclosure prior to D180. Heat map: Default rate by calendar month and account age Account age (months) 150 100 50 0 Jan 1999 Jan 2003 Jan 2007 Jan 2009 Jan 2011 Jan 2013 Hazard probability Age of mortgage effect = Hazard probability 0 50 100 Loan age (months) 150 Calendar time effect Default risk β’ This is the estimated risk over calendar time with (full line) and without (dashed line) age, vintage and seasonality included in the model. Jan 1999 Jan 2003 Jan 2007 Jan 2011 β’ We see that not including other time components would lead to inaccurate coefficient estimates for the MEV effects. Challenge 1: Selection of MEVs 1. 2. 3. Consider MEVs that we would expect to have a direct effect on default. National versus local MEVs. Consider MEVs that are required for stress testing, as specified by regulators or the business. ° For this exercise: US GDP, Unemployment rate (UR), House price index (HPI) and interest rate (IR). Challenge 2: Structural form of MEVs How should we include the MEVs in our model? Things to consider:β’ Choice of lag structure (including possibly geometric lag); β’ Whether to use difference in MEV; β’ Whether to smooth the MEV time series prior to inclusion in the model; β’ Whether to include seasonally adjusted or real values; β’ Cumulative effects (eg we may expect high unemployment to have a greater effect, the longer it continues); β’ Whether the MEV need to be transformed prior to inclusion in the model (eg log transform for price index variables). Challenge 3: Time trends in MEVs β’ Time trends in the MEVs are problematic since this could lead to spurious correlation with default risk. β’ Simulation studies in the survival model setting show time trends in MEVs can lead to errors in coefficient estimates. β’ Therefore do not include MEVs with time trends. This effects GDP and HPI, in particular. β’ Solution #1: First difference? But this will only fit default risk against short changes in MEVs (is this just noise?). β’ Solution #2: Annual difference? This is better since this gives information regarding change in economy over a period of time; eg GDP growth. Also, do not need to worry whether or not to seasonally adjust. First attempt at modellingβ¦ Variable Lag * (months) Coefficient estimate SE P-value Expected sign IR 0 +1.80 0.029 <0.0001 +οΌ IR (log) 0 -7.30 0.158 <0.0001 ΞHPI 21 -3.72 0.372 <0.0001 -οΌ ΞGDP 15 -6.35 0.900 <0.0001 -οΌ UR 3 +0.245 0.0093 <0.0001 +οΌ ΞUR 12 -0.209 0.0149 <0.0001 +ο» Age effectβ¦ Vintage effectβ¦ Seasonalityβ¦ * Choice of lag by finding best fit in a series of univariate studies. β¦ try removing ΞUR Variable Lag (months) Coefficient estimate SE P-value Expected sign IR 0 +1.82 0.029 <0.0001 +οΌ IR (log) 0 -7.37 0.157 <0.0001 ΞHPI 21 -3.68 0.369 <0.0001 -οΌ ΞGDP 15 +4.65 0.438 <0.0001 -ο» UR 3 +0.186 0.0083 <0.0001 +οΌ Age effectβ¦ Vintage effectβ¦ Seasonalityβ¦ β¦ but now we have a problem with estimate for ΞGDP. What is going on? Challenge 4: Correlations between MEVs ΞHPI ΞGDP UR ΞUR ΞHPI 1 0.564 -0.835 -0.431 ΞGDP 0.564 1 -0.728 -0.842 UR -0.835 -0.728 1 0.543 ΞUR -0.431 -0.842 0.543 1 β’ This correlation matrix demonstrates some very high correlations amongst the MEVs. β’ Solution #1: Variable selection β but this may remove some variables that are required in stress testing. β’ Solution #2: Factor analysis to determine macroeconomic factors (MFs) to include in the model. Principal Component Analysis on MEVs Variable MF1 MF2 MF3 ΞHPI +0.474 -0.596 -0.562 ΞGDP +0.528 0.359 0.431 UR -0.524 0.375 -0.512 ΞUR -0.471 -0.613 0.486 Proportion of variance 74.5% 18.5% 4.5% β’ The first component (MF1) represents much of the economic effect among the MEVs. β’ MF1 also has an unambiguous interpretation as a measure of economic health. β’ The remaining components do not account for much of the variance and do not have a natural interpretation, hence only MF1 will be included in the model. Structure: Relationship between MF1 and default Score Bad Good Bad ---------------- MF1 --------------- Good There is a distinct βbreakpointβ in the risk profile of MF1. This can be modelled with an interaction term. Model with MF1 Variable Coefficient estimate SE P-value Expected sign IR +1.83 0.030 <0.0001 +οΌ IR (log) -7.41 0.158 <0.0001 MF1 -0.059 0.0056 <0.0001 (High MF1) -2.17 0.0801 <0.0001 (High MF1) × MF1 -0.331 0.0119 <0.0001 Age effectβ¦ Vintage effectβ¦ Seasonalityβ¦ (High MF1) is an indicator variable with value 0 or 1. -οΌ -οΌ Challenge 6: Residual systematic effect β’ Allow a calendar fixed effect to model residual systematic risk, not modelled by MEVs (or by seasonality). β’ Compute standard deviation (s.d) of this effect to estimate size of the βunexplainedβ systematic effect. Model Unexplained effect (s.d) Age, vintage but no MEVs 0.705 Age, vintage and linear MF1 0.223 Age, vintage and nonlinear MF1 0.131 β’ Including MF1 explains much of the systematic effect, but a residual remains unexplained (19%). β’ Including the MF1 with the interaction term improves the fit. β’ The residual estimate is important to quantify conservatism if using the model for forecasting or stress testing. Challenge 7: Economic model breakpoints How stable are MEV risk factors? β’ One question we may have is whether the effects of MEVs on default risk are stable over time. β’ In particular, after an economic regime changes. β’ Use a breakpoint model to test for thisβ¦ Breakpoint modelβ¦ Variable Coefficient estimate SE P-value Expected sign MF1 -0.076 0.0059 <0.0001 -οΌ (High MF1) -1.82 0.0984 <0.0001 D +0.679 0.122 <0.0001 (High MF1) × MF1 -0.227 0.0152 <0.0001 D × MF1 +0.0511 0.0260 0.0498 Other time effectsβ¦ D = indicator variable: β’ -οΌ 1 if calendar date before or during Feb 2006, 0 otherwise D × MF1 effect is not significant, indicating no evident difference in economic effect in the two different time periods. Further challenges 7. Segmentations / interactions β’ It is plausible that different segments of the population will react to different MEVs in different ways. β’ Eg high LTV accounts may be more sensitive to changes in HPI. β’ Therefore, explore different model segments or variable interactions. 8. Confounding economic effects with behavioural variables (BVs) β’ We may want to include BVs as TVCs; however, the MEVs may be confounders for these effects. β’ Eg economic conditions may affect repayments, in general. β’ Therefore, test for confounding and build the model in stages (eg include MEVs in first stage, then introduce BVs). Default rate (monthly) Validate model with back-testing Jan 1999 Jan 2003 Jan 2007 Jan 2011 Jan 2013 β’ Use Default Rate (DR) during that period to measure performance. β’ Use conservatism (as 2 × s.d of unknown systematic effect). Result 2: Stress test results Scenario UR GDP HPI IR Baseline -2% over 2 years 2.5% growth per annum 7% increase per annum No change Stress Rise from 7.4% to peak of 10.6% over 2 years Reduction from +2% to -2% growth per annum Zero increase No change IR rise -2% over 2 years 2.5% growth per annum 7% increase per annum Average +2% increase over 2 years Projection of annual default rate: Scenario Conservatism Year 1 Year 2 Baseline No 1.21% 0.83% Stress No 1.67% 2.34% Stress Yes 2.21% 3.20% IR rise No 1.73% 2.76% Conclusion 1. Including MEVs in credit risk models is valuable to enable more accurate risk management, sensitive to economic changes, as well as stress testing. 2. We have illustrated several challenges when including MEVs in credit risk models. 3. And suggested several approaches to handle these challenges, demonstrating results on a large US mortgage data set.
© Copyright 2026 Paperzz