
Bootstrap, Jackknife and other resampling methods
Part III: Parametric Bootstrap
Rozenn Dahyot
Room 128, Department of Statistics
Trinity College Dublin, Ireland
[email protected]
2005
Introduction

1. Nonparametric bootstrap estimates
2. Example of failure of the nonparametric bootstrap estimate
3. Parametric Bootstrap
4. Resampling and Monte Carlo Sampling
5. The law school example
Non-Parametric Bootstrap
Real World:       F → x → θ̂ = s(x)
Bootstrap World:  F̂ → x∗ → θ̂∗ = s(x∗)

Figure: The unknown probability model F gives the observed data x, and we wish to know the accuracy of the statistic θ̂ = s(x) for estimating the parameter of interest θ = t(F). No prior information is available on F, so F̂ is estimated from x as the empirical distribution function. Accuracy is inferred from the observed variability of the bootstrap replications θ̂∗ = s(x∗).
Convergence of the bootstrap estimates
Example A
F(x) = 0.2 N(µ=1, σ=2) + 0.8 N(µ=6, σ=1), from which a sample x = (x1, · · · , x100) is drawn.
Figure: Bias (B̂ias, left panel) and standard error (ŝe_B, right panel) bootstrap estimates w.r.t. B, for B up to 1000 (4 experiments have been run).
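The slide does not say which statistic θ̂ these curves track, so the sketch below assumes the sample mean; under that assumption it reproduces the experiment, printing B̂ias and ŝe_B for growing B so their stabilisation can be observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample n = 100 values from the mixture 0.2*N(mu=1, sigma=2) + 0.8*N(mu=6, sigma=1).
n = 100
pick = rng.random(n) < 0.2
x = np.where(pick, rng.normal(1.0, 2.0, n), rng.normal(6.0, 1.0, n))

theta_hat = x.mean()      # statistic of interest (assumed here: the sample mean)

# Draw B = 1000 bootstrap replications by resampling x with replacement.
B = 1000
reps = np.array([rng.choice(x, size=n, replace=True).mean() for _ in range(B)])

# Bias and standard error estimates as functions of B: they stabilise as B grows.
for b in (100, 200, 500, 1000):
    r = reps[:b]
    print(f"B={b:4d}  se_hat={r.std(ddof=1):.4f}  bias_hat={r.mean() - theta_hat:+.4f}")
```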
Example of non-parametric bootstrap failure
Example B
Consider a sample x drawn from the uniform distribution F = U(0, θ = 1). The statistic of interest is θ̂ = max{x1, · · · , xn}, with
x = (0.5729, 0.1873, 0.5984, 0.2883, 0.8722, 0.4320, 0.4896, 0.7106, 0.2754, 0.7637).
Figure: Histogram of the nonparametric bootstrap replications θ̂∗ with n = 10, B = 1000, θ̂ = 0.8722. The highest peak is at θ̂ = 0.8722, reached with probability P(θ̂ ∈ x∗) = 0.6560 ≈ 1 − (1 − 1/n)^n = 0.6513.

Figure: Theoretical results on extreme values say that

p(θ̂∗) = n (θ̂∗)^(n−1) / θ̂^n
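A short sketch reproducing this failure with the ten observations above: a nonparametric bootstrap sample can never exceed the observed maximum, so the replications pile up on θ̂ itself with probability close to 1 − (1 − 1/n)^n.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([0.5729, 0.1873, 0.5984, 0.2883, 0.8722,
              0.4320, 0.4896, 0.7106, 0.2754, 0.7637])
n, B = len(x), 1000
theta_hat = x.max()                                          # 0.8722

# Nonparametric bootstrap replications of the maximum.
reps = np.array([rng.choice(x, size=n, replace=True).max() for _ in range(B)])

print("P(theta_hat in x*) ~", np.mean(reps == theta_hat))    # about 0.65
print("theory: 1-(1-1/n)^n =", 1 - (1 - 1/n) ** n)           # 0.6513...
```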
Example of non-parametric bootstrap failure
Example B
What went wrong in this example?
The empirical distribution function F̂ is not a good approximation of the true distribution F = U(0, θ).
Either parametric knowledge of F or some smoothing of F̂ is needed to rectify matters.
Convergence of the bootstrap estimates
With x = (x1, · · · , xn) a sample of n i.i.d. values, the following are required:

1. Convergence of F̂ to F as n → ∞ (Glivenko-Cantelli lemma).
2. The estimate θ̂ = t(F̂) is the plug-in estimate of θ = t(F).
3. A smoothness condition on the functional t, e.g.:
   - Smooth functionals: mean, variance, etc.
   - Not smooth: extreme order statistics (minimum, maximum).
Parametric Bootstrap
Real World:       prior F ≃ N(µ, σ) → x → θ̂ = s(x);   estimation from x gives (x̄, σ̂)
Bootstrap World:  F̂ = N(x̄, σ̂) → x∗ → θ̂∗ = s(x∗)

Figure: Example of parametric bootstrap. F is a normal distribution with unknown parameters (µ, σ). From the observed data x drawn from F, the parameters are estimated, giving (x̄, σ̂). F̂ is then modelled by a normal distribution N(x̄, σ̂), from which bootstrap samples x∗ can be drawn. Accuracy is inferred from the observed variability of the bootstrap replications θ̂∗ = s(x∗).
Example with extreme value
Example B

We draw B = 1000 bootstrap replications of θ̂∗ = max{x∗} using the parametric assumption U(0, θ̂). The extreme-value distribution is

p(θ̂∗) = n (θ̂∗)^(n−1) / θ̂^n

Figure: Histogram of the parametric bootstrap replications θ̂∗ with n = 10, B = 1000, θ̂ = 0.8722.
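The parametric counterpart as a sketch: each bootstrap sample is drawn afresh from the fitted model U(0, θ̂) instead of being resampled from x, so the replications spread out according to the extreme-value density above rather than piling up on θ̂.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.array([0.5729, 0.1873, 0.5984, 0.2883, 0.8722,
              0.4320, 0.4896, 0.7106, 0.2754, 0.7637])
n, B = len(x), 1000
theta_hat = x.max()

# Parametric bootstrap: B samples of size n from the fitted model U(0, theta_hat).
reps = rng.uniform(0.0, theta_hat, size=(B, n)).max(axis=1)

print("mean replication:", reps.mean())                      # near theta_hat * n/(n+1)
print("pile-up at theta_hat:", np.mean(reps == theta_hat))   # essentially 0
```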
The Law school example
School:    1     2     3     4     5     6     7     8
LSAT (X):  576   635   558   578   666   580   555   661
GPA (Y):   3.39  3.30  2.81  3.03  3.44  3.07  3.00  3.43

School:    9     10    11    12    13    14    15
LSAT (X):  651   605   653   575   545   572   594
GPA (Y):   3.36  3.13  3.12  2.74  2.76  2.88  2.96

Table: Results of law school admission practice for the LSAT and GPA tests. It is believed that these scores are highly correlated. Compute the correlation and its standard error.
Correlation
The correlation is defined as:

corr(X, Y) = E[(X − E(X)) · (Y − E(Y))] / ( E[(X − E(X))²] · E[(Y − E(Y))²] )^(1/2)

Its typical estimator is:

corr̂(x, y) = ( Σ_{i=1}^n xᵢyᵢ − n x̄ ȳ ) / ( [Σ_{i=1}^n xᵢ² − n x̄²]^(1/2) · [Σ_{i=1}^n yᵢ² − n ȳ²]^(1/2) )
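A direct transcription of this estimator as a sketch (numpy's np.corrcoef returns the same value); the arrays below hold the law school data from the previous slide.

```python
import numpy as np

def corr_hat(x, y):
    """Sample correlation, written exactly as in the estimator above."""
    n = len(x)
    num = np.sum(x * y) - n * x.mean() * y.mean()
    den = np.sqrt(np.sum(x**2) - n * x.mean()**2) * \
          np.sqrt(np.sum(y**2) - n * y.mean()**2)
    return num / den

lsat = np.array([576, 635, 558, 578, 666, 580, 555, 661,
                 651, 605, 653, 575, 545, 572, 594], dtype=float)
gpa = np.array([3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43,
                3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96])
print(corr_hat(lsat, gpa))    # ~0.7764
```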
The Law school example
Parametric Bootstrap approach
Assuming that F is a bivariate normal distribution, F̂par is estimated by computing the mean z̄ = (x̄, ȳ) and the covariance matrix Σ̂ from the data.

Then B samples (x, y)∗ can be drawn from F̂par and the bootstrap estimate of the correlation coefficient can be computed.
[Figure: scatter plot of the law school data, GPA (2 to 4) against LSAT (500 to 700).]
The Law school example: Parametric Approach
Prior model
Assumption. F is a bivariate normal density function of the form:

F(x, y) = exp[ −(z − µ_F)ᵀ Σ⁻¹ (z − µ_F) / 2 ] / ( 2π | det(Σ) |^(1/2) ),   with z = (x, y)ᵀ

Problem. The parameters, the mean µ_F = (µ_xF, µ_yF) and the covariance matrix Σ, are unknown.
The Law school example: Parametric Approach
Estimate of the parametric p.d.f.
The parametric p.d.f. is estimated by:

F̂par(x, y) = exp[ −(z − z̄)ᵀ Σ⁻¹ (z − z̄) / 2 ] / ( 2π | det(Σ) |^(1/2) )

The means are z̄ = (x̄ = (1/n) Σ_{i=1}^n xᵢ, ȳ = (1/n) Σ_{i=1}^n yᵢ). The covariance matrix is defined as:

Σ = 1/(n−1) · [ Σ_{i=1}^n (xᵢ − x̄)²            Σ_{i=1}^n (xᵢ − x̄)(yᵢ − ȳ) ]
              [ Σ_{i=1}^n (xᵢ − x̄)(yᵢ − ȳ)     Σ_{i=1}^n (yᵢ − ȳ)²         ]

For the law school data, the mean is z̄ = (x̄ = 600.3, ȳ = 3.1) and

Σ = [ 1747   7.9   ]
    [ 7.9    0.059 ]
Parametric Bootstrap estimate of standard error
1. Using the prior assumption and the available observations x = (x1, x2, · · · , xn), estimate F̂par.
2. Instead of sampling with replacement from the data x, draw B samples x∗(b) of size n from F̂par.
3. Evaluate the bootstrap replications:
   θ̂∗(b) = s(x∗(b)),   ∀b ∈ {1, · · · , B}
4. Estimate the standard error se_F(θ̂) by the standard deviation of the B replications:
"P
#1
B
∗
∗
2 2
b=1 [θ̂ (b) − θ̂ (·)]
se
ˆB =
B −1
where θ̂∗ (·) =
R. Dahyot (TCD)
PB
θ̂∗ (b)
.
B
b=1
453 Modern statistical methods
2005
15 / 23
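A sketch of these four steps for the law school correlation: fit the bivariate normal (z̄, Σ), draw B samples of size n from it, and take the standard deviation of the replications. The bias estimate of the next slide is the final line.

```python
import numpy as np

rng = np.random.default_rng(3)
lsat = np.array([576, 635, 558, 578, 666, 580, 555, 661,
                 651, 605, 653, 575, 545, 572, 594], dtype=float)
gpa = np.array([3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43,
                3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96])
z = np.column_stack([lsat, gpa])
n = len(z)

# Step 1: estimate F_par = N(z_bar, Sigma) from the data.
z_bar = z.mean(axis=0)
Sigma = np.cov(z, rowvar=False)             # divides by n-1, as on the slide

theta_hat = np.corrcoef(lsat, gpa)[0, 1]    # corr_hat(x, y) = 0.7764

# Steps 2-3: draw B samples of size n from F_par and evaluate the replications.
B = 3200
reps = np.empty(B)
for b in range(B):
    zs = rng.multivariate_normal(z_bar, Sigma, size=n)
    reps[b] = np.corrcoef(zs[:, 0], zs[:, 1])[0, 1]

# Step 4: the standard error estimate is the standard deviation of the replications.
print("se_B   =", reps.std(ddof=1))         # about 0.117
print("bias_B =", reps.mean() - theta_hat)  # bias estimate of the next slide
```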
Parametric Bootstrap estimate of the bias
1. Using the prior assumption and the available observations x = (x1, x2, · · · , xn), estimate F̂par.
2. Instead of sampling with replacement from the data x, draw B samples x∗(b) of size n from F̂par.
3. Evaluate the bootstrap replications:
   θ̂∗(b) = s(x∗(b)),   ∀b ∈ {1, · · · , B}
4. Estimate the bias:
   B̂ias_B = θ̂∗(·) − θ̂

   where θ̂∗(·) = (1/B) Σ_{b=1}^B θ̂∗(b).
Resampling and Monte Carlo Simulation
In resampling, one could enumerate all possible bootstrap samples, but this would be too time-consuming and computing-intensive.

The alternative is Monte Carlo sampling, which restricts the resampling to a certain number B. It is used to compute the bootstrap samples in the nonparametric case: a random index (i1, · · · , in) is simulated from the uniform distribution on {1, · · · , n}, giving x∗ = (x_{i1}, · · · , x_{in}). A sketch of this mechanism follows below.

The fundamental difference is that the data can be totally hypothetical in Monte Carlo simulation, while in resampling the simulation must be based on real observations x = (x1, · · · , xn). In the parametric case, the bootstrap samples from F̂par are computed by Monte Carlo methods and are no longer resamples from x.
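A minimal sketch of the random-index mechanism for the nonparametric case (the five observed values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.array([0.5729, 0.1873, 0.5984, 0.2883, 0.8722])   # any observed sample
n = len(x)

# Simulate a random index (i1, ..., in), each entry uniform on {1, ..., n}
# (0-based in code), and read the bootstrap sample off the real observations.
idx = rng.integers(0, n, size=n)
x_star = x[idx]
print(idx + 1, x_star)
```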
The Law school example: Parametric Bootstrap
For B = 3200 bootstrap replications, we compute corr̂∗(·) = 0.7661 and the parametric bootstrap standard error ŝe_B = 0.1169.

Figure: Histogram of 3200 parametric bootstrap replications corr̂(x∗, y∗).
The Law school example: Conclusion
The textbook formula for the standard error of the correlation coefficient is:

se_F = (1 − corr̂²) / √(n − 3)

With corr̂(x, y) = 0.7764, the standard error is se_F = 0.1147.
The non-parametric bootstrap standard error for B = 3200 is 0.132.
The parametric bootstrap standard error for B = 3200 is 0.1169.
Parametric and nonparametric bootstrap estimates
The nonparametric approach leads to a finite number of possible replications θ̂∗(b). In fact, considering n distinct values in x = (x1, · · · , xn), the maximum number of different bootstrap samples (and replications) is¹:

Bmax = C(2n − 1, n − 1) = (2n − 1)! / ( n! (n − 1)! )

The parametric approach has an unlimited number of different bootstrap samples and replications.

¹ For n = 11, Bmax = 352716: big enough to minimize the effect of the discreteness in the nonparametric approach.
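The count can be checked directly; a sketch using Python's math.comb (3.8+):

```python
from math import comb, factorial

n = 11
b_max = comb(2 * n - 1, n - 1)     # number of distinct bootstrap samples, C(21, 10)
assert b_max == factorial(2 * n - 1) // (factorial(n) * factorial(n - 1))
print(b_max)                       # 352716
```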
When might the (parametric and non-parametric) bootstrap fail?

Bootstrap might fail with:

- incomplete data (missing data): incomplete observations x
- dependent data (e.g. correlated time series) x = (x1, · · · , xn)
- dirty data (outliers): noisy observations x

For a critical view on bootstrap, see Exploring the Limits of Bootstrap, edited by LePage and Billard (Wiley, ISBN: 0-471-53631-8).
Conclusion
In the parametric bootstrap, F̂par is no longer the empirical distribution function.

If the prior information on F is accurate, then F̂par estimates F better than the empirical distribution does, and the parametric bootstrap then gives better estimates of the standard errors.

Most of the time, the point of making assumptions is to derive the textbook formulas. "All models are wrong, but some are useful" (G.E.P. Box, 1979).

On the other hand, the non-parametric bootstrap allows the computation of accurate standard errors (in many cases) without making any prior assumption.
Conclusion
In non-parametric mode, the bootstrap method relieves the analyst of choosing a parametric assumption about the form of the underlying distribution F.

In both cases, the bootstrap can provide answers to problems for which no textbook formula exists.