Webinar: Managing Model Uncertainty –
Sensitivity Analysis and Bootstrapping
Manage the Uncertainty Risk Inherent in Financial Models
July 15, 2015
©2015 FI Consulting. All rights reserved.
Agenda
1. Introductions
2. What is Model Uncertainty?
3. Sensitivity Analysis & Bootstrapping
4. Q&A
5. Wrap-up
Introductions
• FI Consulting (FI) specializes in providing financial institution clients with custom analytics, model validation and advisory services, and development support for financial and analytic software.
• Our experience spans commercial, government, non-profit, and international sector financial institutions.
• We focus on diverse sectors of the credit market including mortgages, consumer finance, and small business loans.
Presenters: Robert Chang and Vadim Bondarenko
What is Model Uncertainty?
Model Precision depends on:
• Data measurement
• Parameter values
• Model structure
• Model algorithms
Sources of Uncertainty:
• Model design uncertainty
• Parameter uncertainty
• Temporal variability
• Spatial variability
Ways to Test Uncertainty:
• Linear relation
• Monotonic relation
• Trends in central location
• Trends in variability
Sensitivity Analysis & Bootstrapping
Two R Packages for Sensitivity Analysis
• Package ‘pse’: parameter space exploration with Latin hypercubes
  Example: modeling the spread between deposit interest rates and the short-term market interest rate
• Package ‘perturb’: introduces small random changes to the data and recalculates the estimate
  Example: modeling the loan-level prepayment probability of single-family mortgage loans
Parameter space exploration with Latin Hypercubes
1. Generate samples from the parameter space
   • Which distribution function(s) do we select?
   • Is every parameter combination meaningful?
2. Pass the samples to your model
   • The model is the function or simulation to be run.
   • You must specify the hypercube size (number of samples generated).
3. Measure the association between the result and each input parameter.
Example: Modeling Deposit Rents
Our deposit rent model uses four parameters: g (growth rate), r (market interest rate), and two risk-premium factors, a and b.
𝑦 = 1 + 𝑔 ∗ (𝑟 − 𝑎 + 𝑏 ∗ 𝑟)
[Figure: worked example of the model; extracted labels and values, in slide order: 𝑦, 0.746, 𝑎, 𝑏, 1.96, 0.4, 𝑔, 7.3, 𝑟, 4.396]
Example: Modeling Deposit Rents
We are interested in studying the effects of the parameters g, a, b, and r on the
deposit rent. We assume that the growth rate and risk premium factors follow a
normal distribution, and the market interest rate follows a uniform distribution.
Parameter   Distribution   Arguments
𝑔           normal         𝛍 = 1.865, 𝛔 = 0.15
𝑎           normal         𝛍 = 0.4, 𝛔 = 0.05
𝑏           normal         𝛍 = 0.09, 𝛔 = 0.01
𝑟           uniform        min = 3, max = 4
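The sampling step can be sketched in a few lines. This is a minimal illustration in Python, not the ‘pse’ package itself. It assumes the textbook Latin hypercube construction: split each parameter's probability range into N equal strata, draw once per stratum, shuffle, and map through the inverse CDF. The function name `latin_hypercube`, the seed, and N = 200 are illustrative choices, not values from the webinar.

```python
import random
from statistics import NormalDist


def latin_hypercube(n, inverse_cdfs, rng):
    """Draw n samples per parameter, exactly one from each equal-probability stratum."""
    columns = {}
    for name, inv_cdf in inverse_cdfs.items():
        # One probability inside each of the n strata [i/n, (i+1)/n).
        probs = [(i + rng.uniform(1e-9, 1 - 1e-9)) / n for i in range(n)]
        rng.shuffle(probs)  # decouple the strata across parameters
        columns[name] = [inv_cdf(p) for p in probs]
    return columns


# Distributions and arguments taken from the table above.
inverse_cdfs = {
    "g": NormalDist(mu=1.865, sigma=0.15).inv_cdf,
    "a": NormalDist(mu=0.4, sigma=0.05).inv_cdf,
    "b": NormalDist(mu=0.09, sigma=0.01).inv_cdf,
    "r": lambda p: 3 + p * (4 - 3),  # uniform(min=3, max=4)
}

rng = random.Random(2015)  # assumed seed, for reproducibility only
samples = latin_hypercube(200, inverse_cdfs, rng)
print({name: round(sum(col) / len(col), 4) for name, col in samples.items()})
```

Because every stratum is visited exactly once, the sample means land very close to the specified means even at modest N, which is the practical appeal over plain random sampling.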
Example: Modeling Deposit Rents
We create three R objects: the names of the parameters, the distribution
functions, and a nested list containing the parameters to the distribution functions.
Example: Modeling Deposit Rents
We create a "wrapper" using the function mapply. Our wrapper function applies
the four parameters to the model, then returns a single output value.
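The slide's R code is not recoverable, so the following is only a Python analogue of the wrapper idea: take a table of sampled parameter values, apply the model row by row, and return one output per row. The formula is transcribed literally from the slide text (y = 1 + g * (r - a + b * r)); the names `deposit_rent` and `model_wrapper` and the two sample rows are hypothetical.

```python
def deposit_rent(g, a, b, r):
    """Model formula as it appears on the slide: y = 1 + g * (r - a + b * r)."""
    return 1 + g * (r - a + b * r)


def model_wrapper(samples):
    """Apply the model to each row of parameter values; one output per row."""
    return [
        deposit_rent(g, a, b, r)
        for g, a, b, r in zip(samples["g"], samples["a"], samples["b"], samples["r"])
    ]


# Two illustrative rows (not data from the webinar).
samples = {"g": [1.865, 2.0], "a": [0.4, 0.45], "b": [0.09, 0.1], "r": [3.5, 4.0]}
results = model_wrapper(samples)
print(results)
```

Keeping the model itself as a plain four-argument function and the row-wise application in a separate wrapper means the same sampling and analysis code can be reused with a different model.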
Example: Modeling Deposit Rents
We use the LHS function to generate a hypercube for the model. We pass the
model, the parameter names, and the parameter PDFs to the LHS function.
Example: Modeling Deposit Rents
We can view the generated samples and confirm that the LHS procedure produced a well-balanced sample according to the specified probability distributions.
Example: Modeling Deposit Rents
We can visually examine a scatterplot of the result as a function of each
parameter. This is a useful sensitivity analysis tool as all the parameters are being
slightly changed for each run of the model.
Example: Modeling Deposit Rents
The partial (rank) correlation coefficient measures the strength of the linear association between the result and each input parameter, after removing the linear effect of the other parameters.
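One way to compute a partial rank correlation is sketched below in Python. It is not the ‘pse’ implementation. It assumes the standard identity that the partial correlation between two variables, given all the others, equals minus the corresponding off-diagonal entry of the inverse correlation matrix divided by the square root of the two matching diagonal entries. The inputs are plain random draws from the distributions specified earlier, the model formula is the one shown on the slide, and all function names are illustrative.

```python
import math
import random


def ranks(values):
    """Ranks 1..n (assumes no ties, which holds for continuous draws)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = float(rank)
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def prcc(inputs, output):
    """Partial rank correlation of each input with the output, given the other inputs."""
    names = list(inputs)
    cols = [ranks(inputs[name]) for name in names] + [ranks(output)]
    corr = [[pearson(ci, cj) for cj in cols] for ci in cols]
    prec = invert(corr)
    k = len(names)  # index of the output column
    return {
        name: -prec[i][k] / math.sqrt(prec[i][i] * prec[k][k])
        for i, name in enumerate(names)
    }


rng = random.Random(1)  # assumed seed
n = 300
inputs = {
    "g": [rng.gauss(1.865, 0.15) for _ in range(n)],
    "a": [rng.gauss(0.4, 0.05) for _ in range(n)],
    "b": [rng.gauss(0.09, 0.01) for _ in range(n)],
    "r": [rng.uniform(3, 4) for _ in range(n)],
}
output = [
    1 + g * (r - a + b * r)
    for g, a, b, r in zip(inputs["g"], inputs["a"], inputs["b"], inputs["r"])
]
for name, value in prcc(inputs, output).items():
    print(f"{name}: {value:+.3f}")
```

Working on ranks rather than raw values makes the measure sensitive to any monotonic relationship, not only a strictly linear one.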
Example: Modeling Deposit Rents
The partial correlation coefficient and confidence intervals can also be displayed
graphically.
Perturbation Sensitivity Analysis
“Perturb” assesses the impact of small random changes to variables on
parameter estimates. If small amounts of noise do not alter the estimated
coefficients, you can publish them with greater confidence.
1. Specify the data, model, and model options
2. Introduce small random perturbations to the data
3. Recalculate the estimate using the perturbed data
4. Examine the sensitivity of the parameter estimates
When the estimates produced using this technique vary greatly, the model estimation is unstable.
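The four steps can be illustrated with a deliberately small stand-in. The sketch below is Python, not the ‘perturb’ package, and it refits a one-variable least-squares line rather than the logit model used in the webinar. The data are synthetic, and the noise level (standard deviation 500) and iteration count are assumptions chosen for illustration.

```python
import random
import statistics


def ols(x, y):
    """Intercept and slope of a one-variable least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def perturb_slopes(x, y, noise_sd, n_iter, rng, uniform=False):
    """Refit the model n_iter times, each time adding fresh noise to x."""
    slopes = []
    for _ in range(n_iter):
        if uniform:
            # Uniform noise on [-w, w] with the same standard deviation: w = sd * sqrt(3).
            half_width = noise_sd * 3 ** 0.5
            noisy = [v + rng.uniform(-half_width, half_width) for v in x]
        else:
            noisy = [v + rng.gauss(0, noise_sd) for v in x]
        slopes.append(ols(noisy, y)[1])
    return slopes


# Step 1: data and model (synthetic, hypothetical loan amounts and payments).
rng = random.Random(42)
loan_amt = [rng.uniform(50_000, 170_000) for _ in range(200)]
payment = [0.007 * v + rng.gauss(0, 40) for v in loan_amt]
_, base_slope = ols(loan_amt, payment)

# Steps 2 and 3: perturb the data and re-estimate.
slopes = perturb_slopes(loan_amt, payment, noise_sd=500, n_iter=100, rng=rng)

# Step 4: compare the spread of perturbed estimates with the original.
print(f"original slope:   {base_slope:.5f}")
print(f"perturbed slopes: mean={statistics.fmean(slopes):.5f}, sd={statistics.stdev(slopes):.6f}")
```

A spread that is tiny relative to the original estimate suggests a stable fit; a spread comparable to the estimate itself is the warning sign described above.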
Example: Econometric Model for Prepayment Probabilities
We will add random noise to variables such as LTV, mortgage rate, and income,
and then re-estimate our prepayment model.
orig_mrtg_amt  ratio_loan_to_vl  int_rt  unpd_bal  tot_ann_eff_incm  tot_mnthly_mtg_pymt  term_typ_cd
$58,383        97                6.5     $57,842   $48,096           $471                 1
$143,673       95                6.5     $142,883  $51,108           $1,066               1
$96,687        89.72             6       $94,454   $31,764           $720                 0
$130,740       94.02             5.125   $128,693  $56,604           $886                 0
$63,011        97                6       $62,104   $25,992           $471                 1
$56,078        85                5.875   $54,690   $29,640           $443                 1
$168,591       96.01             5.875   $164,225  $89,616           $1,253               1
$102,210       95                7.5     $101,104  $107,208          $851                 0
$90,830        97.37             6.25    $89,018   $32,760           $654                 1
Example: Econometric Model for Prepayment Probabilities
We will use a generalized logit model; however, the package “perturb” is compatible with many different model types.
Example: Econometric Model for Prepayment Probabilities
We then determine the magnitude of perturbations. The random perturbations
can be drawn from either a normal distribution or uniform distribution.
Example: Econometric Model for Prepayment Probabilities
We then check if the estimates based on small changes to the data are unstable
and vary greatly. In this example, the perturbed estimates are very similar to the
original estimates.
Managing Model Uncertainty
THE BOOTSTRAP APPROACH
The Bootstrap
1. The Intuition
2. Applications in Model Validation
   A. Stability of Regression Coefficients
   B. Predictive Accuracy
3. Key Takeaways
When to use Bootstrap?
Simulation-based procedure for estimating and validating
distributions when:
1. The theoretical distribution of a statistic is complicated or unknown.
2. The sample size is insufficient for straightforward statistical inference.
3. A distributional assumption is inherent to the model but needs validation.
Basic Example: Confidence Intervals of a Sample Median
The Data
• A sample x of size n = 9, which is simply the numbers 1 through 9.
• Assume that x is representative of a larger population X.
• The observed median = 5 (the middle observation, Obs 5).
• What is the 90% confidence interval around that estimate?
Sample Observations x
     Obs 1  Obs 2  Obs 3  Obs 4  Obs 5  Obs 6  Obs 7  Obs 8  Obs 9
x    1      2      3      4      5      6      7      8      9
Basic Bootstrap Setup: “Samples of Sample”
• Randomly draw R = 10 samples with replacement
• Each draw is of the same length (n = 9) as the original sample
Bootstrapped Samples
                     Obs 1  Obs 2  Obs 3  Obs 4  Obs 5  Obs 6  Obs 7  Obs 8  Obs 9
Original Sample x    1      2      3      4      5      6      7      8      9
Bootstrap Sample 1   1      4      4      4      4      5      5      6      7
Bootstrap Sample 2   1      1      1      2      2      5      6      6      9
Bootstrap Sample 3   3      3      3      4      4      4      6      6      9
Bootstrap Sample 4   1      3      4      5      5      5      6      7      8
Bootstrap Sample 5   2      3      6      6      6      6      8      8      9
Bootstrap Sample 6   1      3      4      4      6      6      7      8      8
Bootstrap Sample 7   1      1      2      5      5      5      6      8      9
Bootstrap Sample 8   1      1      3      3      3      3      6      8      9
Bootstrap Sample 9   1      1      2      3      4      5      6      8      9
Bootstrap Sample 10  1      2      4      4      6      7      7      7      8
Bootstrapping the Median at R=10
Min.   1st Qu.   Median   Mean   3rd Qu.   Max.
2      4         4.5      4.5    5.75      6
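As a check, this summary can be reproduced from the ten bootstrap samples listed on the "Bootstrapped Samples" slide. The sketch below is Python rather than R; it takes the median of each sample and summarizes those ten medians, using the "inclusive" quantile method, which matches the default interpolation of R's summary().

```python
import statistics

# The ten bootstrap samples from the "Bootstrapped Samples" slide.
bootstrap_samples = [
    [1, 4, 4, 4, 4, 5, 5, 6, 7],
    [1, 1, 1, 2, 2, 5, 6, 6, 9],
    [3, 3, 3, 4, 4, 4, 6, 6, 9],
    [1, 3, 4, 5, 5, 5, 6, 7, 8],
    [2, 3, 6, 6, 6, 6, 8, 8, 9],
    [1, 3, 4, 4, 6, 6, 7, 8, 8],
    [1, 1, 2, 5, 5, 5, 6, 8, 9],
    [1, 1, 3, 3, 3, 3, 6, 8, 9],
    [1, 1, 2, 3, 4, 5, 6, 8, 9],
    [1, 2, 4, 4, 6, 7, 7, 7, 8],
]

# One median per bootstrap sample.
medians = [statistics.median(s) for s in bootstrap_samples]

# Quartiles of the ten medians (linear interpolation between order statistics).
q1, q2, q3 = statistics.quantiles(medians, n=4, method="inclusive")
summary = {
    "Min.": min(medians),
    "1st Qu.": q1,
    "Median": q2,
    "Mean": statistics.fmean(medians),
    "3rd Qu.": q3,
    "Max.": max(medians),
}
print(summary)
```

With only ten resamples the distribution of the median is very coarse, which is why the next slide repeats the exercise at R = 1000.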
Bootstrapping the Median at R=1000
Quantile   0%   1%   5%   10%   50%   90%   95%   99%   100%
Median     1    2    3    3     5     7     7     8     8
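Reading the 5% and 95% quantiles off this table gives a 90% percentile interval of roughly 3 to 7 for the median. The Python sketch below repeats the procedure under stated assumptions: the seed is arbitrary, so the exact draws will differ from the webinar's, and the `percentile` helper simply picks the nearest order statistic rather than interpolating as R does.

```python
import random
import statistics


def bootstrap_medians(sample, n_boot, rng):
    """Median of each of n_boot resamples, drawn with replacement at the original size."""
    return [statistics.median(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]


def percentile(sorted_values, p):
    """Order statistic nearest to quantile p (a simple convention, no interpolation)."""
    idx = min(len(sorted_values) - 1, max(0, round(p * (len(sorted_values) - 1))))
    return sorted_values[idx]


rng = random.Random(2015)  # assumed seed
x = list(range(1, 10))     # the sample 1..9 from the slides
medians = sorted(bootstrap_medians(x, 1000, rng))

lower, upper = percentile(medians, 0.05), percentile(medians, 0.95)
print(f"90% percentile interval for the median: [{lower}, {upper}]")
```

The percentile interval is the simplest bootstrap interval; it needs no distributional formula for the median, which is exactly the situation the bootstrap is suited to.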
Stability of Regression Coefficients
The Data:
• 1,000 consumer credit profiles obtained from a German bank
• Binary response variable “credit” (Good / Bad)
• 20 covariates
Data and additional description may be found here:
https://onlinecourses.science.psu.edu/stat857/node/222
Stability of Regression Coefficients
Logistic Regression: Probability of ‘Good’ Credit Profile
                      Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)            0.7901   0.2841       2.7813   0.0054
Amount                -0.0001   0.0000      -1.7244   0.0846
Age                    0.0137   0.0069       1.9960   0.0459
Duration              -0.0325   0.0075      -4.3174  <0.0001
Personal.Male.Single   0.4640   0.1513       3.0680   0.0022
Purpose.UsedCar        1.2100   0.2912       4.1549  <0.0001
Property.RealEstate    0.4897   0.1767       2.7723   0.0056
Bootstrapped Regression Coefficients
             Intercept  Amount    Age      Duration  Personal.Male.Single  Purpose.UsedCar  Property.RealEstate
Full Sample  0.79008    -0.00006  0.01372  -0.03251  0.46405               1.20996          0.48974
Sample 1     0.50144    -0.00008  0.01665  -0.02573  0.59468               1.61972          0.34213
Sample 2     0.67629    -0.00009  0.02026  -0.03445  0.52069               1.28117          0.36390
Sample 3     0.62652    -0.00007  0.01803  -0.02821  0.42826               1.33093          0.52020
Sample 4     0.54848    -0.00009  0.01788  -0.02664  0.59014               1.59539          0.39089
Sample 5     0.89008    -0.00011  0.01960  -0.03864  0.60542               1.36213          0.37428
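The mechanics behind a table like this can be sketched as follows. This is a simplified Python illustration, not the webinar's code. It assumes a pairs bootstrap (whole rows resampled with replacement) and, to stay short, refits a one-variable least-squares line instead of the logistic regression. The data are synthetic and the variable names are hypothetical.

```python
import math
import random
import statistics


def ols(x, y):
    """Intercept and slope of a one-variable least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def bootstrap_coefficients(x, y, n_boot, rng):
    """Refit the model on n_boot resamples of the rows, drawn with replacement."""
    n = len(x)
    coefs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        coefs.append(ols([x[i] for i in idx], [y[i] for i in idx]))
    return coefs


# Synthetic data (hypothetical): a score that rises slowly with age.
rng = random.Random(7)
age = [rng.uniform(20, 70) for _ in range(300)]
score = [0.8 + 0.014 * a + rng.gauss(0, 0.5) for a in age]

full_intercept, full_slope = ols(age, score)
coefs = bootstrap_coefficients(age, score, 500, rng)
intercepts = [c[0] for c in coefs]
slopes = [c[1] for c in coefs]

print(f"full-sample slope: {full_slope:.5f}")
print(f"bootstrap slope:   mean={statistics.fmean(slopes):.5f}, sd={statistics.stdev(slopes):.5f}")
print(f"corr(intercept, slope) across resamples: {pearson(intercepts, slopes):+.3f}")
```

The standard deviation of each coefficient across resamples is a bootstrap standard error, and the correlation between two coefficients shows whether their estimates move together from sample to sample.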
[Figure: Univariate Distribution of Bootstrapped Coefficients]
[Figure: Correlation Between Two Model Coefficients]
Prediction Accuracy Rate
Apparent:
• Performance on the same data used to train the model (aka “in-sample” fit)
• Optimistic indication of performance
• Over-fitting: the model is fine-tuned to the training data. Testing on new data may show disappointing results.
Prediction Accuracy Rate
External:
• Performance on a related but slightly different population (aka “out-of-sample”)
• Sample from a nearby geography or from the next time period
Prediction Accuracy Rate
Internal:
• Performance on the same population underlying the sample, but with a twist
• Indicates expected performance in other settings
• Three common techniques:
  • Split Sample: Training / Testing
  • Cross-validation (repeated training/testing splits)
  • The Bootstrap (demonstrated below)
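The three techniques differ mainly in how row indices are assigned to training and testing. The Python sketch below shows only that index bookkeeping; the function names, the 30% test fraction, and the choice of 5 folds are illustrative assumptions, not settings from the webinar.

```python
import random


def split_sample(n, test_fraction, rng):
    """Split sample: one random partition into training and testing indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold(n, k, rng):
    """Cross-validation: each observation is used for testing exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def bootstrap_indices(n, rng):
    """Bootstrap: n indices drawn with replacement; some rows repeat, some are left out."""
    return [rng.randrange(n) for _ in range(n)]


rng = random.Random(2015)  # assumed seed
train, test = split_sample(1000, 0.3, rng)
print(len(train), len(test))
print([len(te) for _, te in k_fold(1000, 5, rng)])
print(len(set(bootstrap_indices(1000, rng))), "distinct rows in one bootstrap sample")
```

In a bootstrap sample of size n, only about 63% of the original rows appear on average, so the left-out rows or the full original sample can serve as the test set.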
Bootstrapped Prediction Accuracy
                     AUC on bootstrap sample  AUC on original sample  Optimism
Bootstrap Sample 1   0.689                    0.675                    0.014
Bootstrap Sample 2   0.685                    0.676                    0.009
Bootstrap Sample 3   0.691                    0.674                    0.017
Bootstrap Sample 4   0.707                    0.681                    0.026
Bootstrap Sample 5   0.680                    0.675                    0.005
Bootstrap Sample 6   0.671                    0.675                   -0.005
Bootstrap Sample 7   0.689                    0.671                    0.017
Bootstrap Sample 8   0.688                    0.677                    0.012
Bootstrap Sample 9   0.705                    0.682                    0.023
Bootstrap Sample 10  0.714                    0.674                    0.040
AVERAGE              0.692                    0.676                    0.016
Bootstrapped Estimate of Prediction Accuracy
Logistic Regression: Internally-Validated Prediction Accuracy
Apparent ‘in-sample’ AUC:   0.684
Optimism:                  -0.016
Bootstrapped AUC:           0.668
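The correction is simple arithmetic: average the per-sample optimism (AUC on the bootstrap sample minus AUC of the same fitted model on the original sample) and subtract it from the apparent AUC. The Python sketch below reproduces the slide's figures from the rounded values in the "Bootstrapped Prediction Accuracy" table. Individual differences can be off by 0.001 from the slide's Optimism column, presumably because that column was computed before rounding.

```python
import statistics

# AUC of each bootstrap-fitted model on its own bootstrap sample (from the table).
auc_boot = [0.689, 0.685, 0.691, 0.707, 0.680, 0.671, 0.689, 0.688, 0.705, 0.714]
# AUC of the same ten models evaluated on the original sample (from the table).
auc_orig = [0.675, 0.676, 0.674, 0.681, 0.675, 0.675, 0.671, 0.677, 0.682, 0.674]

# Optimism: how much better each model looks on the data it was trained on.
optimism_per_sample = [b - o for b, o in zip(auc_boot, auc_orig)]
optimism = statistics.fmean(optimism_per_sample)

apparent_auc = 0.684  # in-sample AUC of the model fitted on the full data
corrected_auc = apparent_auc - optimism

print(f"average optimism:   {optimism:.3f}")
print(f"bootstrapped AUC:   {corrected_auc:.3f}")
```

The corrected figure is the more honest estimate of how the model should perform on new data drawn from the same population.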
Bootstrap Basics: Key Points

• Estimator of (almost) any complexity
  • Goodness of fit (F-stat, R², AIC)
  • Statistical significance of parameters (t-test and p-values)
  • Residual diagnostics (normality, heteroscedasticity, outliers)
  • Prediction accuracy (Mean Squared Error, AUC, Sensitivity/Specificity, Kolmogorov-Smirnov)
• Sample is representative
• Observations are independent (no serial correlation)
• Replaces assumption-based theoretical calculation, but still requires thought, domain knowledge, and creativity
Bootstrap Implementation Notes
• Computer-intensive: run on a cluster in parallel
• Set a random seed for reproducibility
• Start small. Validate code carefully. Add complexity.
• SAS: a macro loop with the SURVEYSELECT procedure
• R:
  • Loop with the sample() function in the base package
  • Boot() function in the car package
  • boot() function in the boot package (parallel support)
  • bootstrap() function in the bootstrap package
• Full code is posted here: rpubs.com/vadimus/bootstrap
Q&A
Wrap-up
Thank you for attending!
Contact FI Consulting at www.ficonsulting.com
Larry Roadcap, Practice Leader
[email protected]
Vadim Bondarenko, Senior Analyst
[email protected]
Robert Chang, Senior Analyst
[email protected]