Bootstrap, Jackknife and other resampling methods
Part III: Parametric Bootstrap

Rozenn Dahyot
Room 128, Department of Statistics
Trinity College Dublin, Ireland
[email protected]
2005

Introduction

1 Nonparametric bootstrap estimates
2 Example of failure of the nonparametric bootstrap estimate
3 Parametric Bootstrap
4 Resampling and Monte Carlo Sampling
5 The law school example

Non-Parametric Bootstrap

Real World: F → x, θ̂ = s(x)        Bootstrap World: F̂ → x*, θ̂* = s(x*)

Figure: The unknown probability model F gives the observed data x, and we wish to know the accuracy of the statistic θ̂ = s(x) for estimating the parameter of interest θ = t(F). No prior information is available on F, so F̂ is estimated from x as the empirical distribution function. Accuracy is inferred from the observed variability of the bootstrap replications θ̂* = s(x*).

Convergence of the bootstrap estimates

Example A
F(x) = 0.2 N(µ = 1, σ = 2) + 0.8 N(µ = 6, σ = 1), with x = (x_1, ..., x_100).

Figure: Bootstrap estimates of the bias (B̂ias_B) and of the standard error (ŝe_B) plotted against B (4 experiments have been run).

Example of non-parametric bootstrap failure

Example B
Consider a sample x drawn from the uniform distribution F = U(0, θ = 1). The statistic of interest is θ̂ = max{x_1, ..., x_n}, with

x = (0.5729, 0.1873, 0.5984, 0.2883, 0.8722, 0.4320, 0.4896, 0.7106, 0.2754, 0.7637).

Figure: Histogram of the nonparametric bootstrap replications θ̂* with n = 10, B = 1000, θ̂ = 0.8722. The highest peak is at θ̂ = 0.8722, which occurs with probability P(θ̂ ∈ x*) = 0.6560 ≈ 1 − (1 − 1/n)^n = 0.6513.

Figure: The theoretical result for extreme values says that the density of the maximum of n i.i.d. values from U(0, θ̂) is

$$P(\hat\theta^*) = \frac{n\,(\hat\theta^*)^{n-1}}{\hat\theta^{\,n}}.$$

Example of non-parametric bootstrap failure

Example B
What went wrong in this example? The empirical distribution function F̂ is not a good approximation of the true distribution F = U(0, θ): it is discrete, so the bootstrap replications of the maximum can only take a few distinct values. Either parametric knowledge of F or some smoothing of F̂ is needed to rectify matters (see the code sketch after the parametric bootstrap diagram below).

Convergence of the bootstrap estimates

With x = (x_1, ..., x_n), n i.i.d. values, the following are required:
1 Convergence of F̂ to F as n → ∞ (Glivenko-Cantelli lemma)
2 The estimate θ̂ = t(F̂) is the plug-in estimate of θ = t(F)
3 A smoothness condition on the functional t, e.g.:
  - Smooth functionals: means, variances, etc.
  - Not smooth: extreme order statistics (minimum, maximum)

Parametric Bootstrap

Real World: prior F ≃ N(µ, σ) → x; estimation gives (x̄, σ̂)        Bootstrap World: F̂ = N(x̄, σ̂) → x*, θ̂* = s(x*)

Figure: Example of parametric bootstrap. F is a normal distribution with unknown parameters (µ, σ). From the observed data x drawn from F, the parameters are estimated, giving (x̄, σ̂). F̂ is then modelled as the normal distribution N(x̄, σ̂), from which bootstrap samples x* can be drawn. Accuracy is inferred from the observed variability of the bootstrap replications θ̂* = s(x*).
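To make the failure in Example B concrete, here is a minimal sketch of the nonparametric bootstrap for the sample maximum, assuming Python with NumPy (the slides themselves contain no code; the names rng and boot_max are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed sample from Example B (n = 10); theta_hat = max(x) = 0.8722.
x = np.array([0.5729, 0.1873, 0.5984, 0.2883, 0.8722,
              0.4320, 0.4896, 0.7106, 0.2754, 0.7637])
n, B = len(x), 1000
theta_hat = x.max()

# Nonparametric bootstrap: resample x with replacement and replicate the max.
boot_max = np.array([rng.choice(x, size=n, replace=True).max()
                     for _ in range(B)])

# Fraction of replications equal to theta_hat itself; it should be close to
# the theoretical value 1 - (1 - 1/n)^n = 0.6513 quoted on the slide.
print(np.mean(boot_max == theta_hat), 1 - (1 - 1/n) ** n)
```

Roughly 65% of the replications collapse onto the observed maximum: this is the discreteness problem that the parametric model below corrects.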
Example with extreme value

Example B
We draw B = 1000 bootstrap replications of θ̂* = max{x*} using the parametric assumption U(0, θ̂). The extreme value distribution is

$$P(\hat\theta^*) = \frac{n\,(\hat\theta^*)^{n-1}}{\hat\theta^{\,n}}.$$

Figure: Histogram of the parametric bootstrap replications θ̂* with n = 10, B = 1000, θ̂ = 0.8722.

The Law school example

School  LSAT (X)  GPA (Y)
1       576       3.39
2       635       3.30
3       558       2.81
4       578       3.03
5       666       3.44
6       580       3.07
7       555       3.00
8       661       3.43
9       651       3.36
10      605       3.13
11      653       3.12
12      575       2.74
13      545       2.76
14      572       2.88
15      594       2.96

Table: Results of law school admission practice for the LSAT and GPA tests. It is believed that these scores are highly correlated. Compute the correlation and its standard error.

Correlation

The correlation is defined as:

$$\mathrm{corr}(X,Y) = \frac{E[(X - E(X))\,(Y - E(Y))]}{\left(E[(X - E(X))^2]\cdot E[(Y - E(Y))^2]\right)^{1/2}}$$

Its typical estimator is:

$$\widehat{\mathrm{corr}}(x,y) = \frac{\sum_{i=1}^n x_i y_i - n\,\bar x\,\bar y}{\left[\sum_{i=1}^n x_i^2 - n\bar x^2\right]^{1/2}\cdot\left[\sum_{i=1}^n y_i^2 - n\bar y^2\right]^{1/2}}$$

The Law school example

Parametric Bootstrap approach
Assuming that F is a bivariate normal distribution, F̂_par is estimated by computing the mean z̄ = (x̄, ȳ) and the covariance matrix Σ̂ from the data. Then B samples (x, y)* can be drawn from F̂_par, and the bootstrap estimate of the correlation coefficient can be computed (a code sketch of this computation is given after the conclusion).

Figure: GPA (vertical axis) versus LSAT (horizontal axis) for the law school data.

The Law school example: Parametric Approach

Prior model
Assumption. F is a bivariate normal density function of the form:

$$F(x,y) = \frac{\exp\left[-\tfrac{1}{2}\,(z-\mu_F)^T\,\Sigma^{-1}\,(z-\mu_F)\right]}{2\pi\,|\det(\Sigma)|^{1/2}},\qquad z = \begin{pmatrix}x\\ y\end{pmatrix}$$

Problem. The parameters, the mean µ_F = (µ_xF, µ_yF) and the covariance matrix Σ, are unknown.

The Law school example: Parametric Approach

Estimate of the parametric p.d.f.
The parametric p.d.f. is estimated by:

$$\hat F_{par}(x,y) = \frac{\exp\left[-\tfrac{1}{2}\,(z-\bar z)^T\,\hat\Sigma^{-1}\,(z-\bar z)\right]}{2\pi\,|\det(\hat\Sigma)|^{1/2}}$$

The means are z̄ = (x̄, ȳ) with x̄ = (1/n) Σ_{i=1..n} x_i and ȳ = (1/n) Σ_{i=1..n} y_i. The covariance matrix is defined as:

$$\hat\Sigma = \frac{1}{n-1}\begin{pmatrix}\sum_{i=1}^n (x_i-\bar x)^2 & \sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)\\ \sum_{i=1}^n (x_i-\bar x)(y_i-\bar y) & \sum_{i=1}^n (y_i-\bar y)^2\end{pmatrix}$$

For the law school data, the mean is z̄ = (x̄ = 600.3, ȳ = 3.09) and

$$\hat\Sigma = \begin{pmatrix}1747 & 7.90\\ 7.90 & 0.0593\end{pmatrix}$$

Parametric Bootstrap estimate of standard error

1 Using the prior assumption and the available observations x = (x_1, x_2, ..., x_n), estimate F̂_par.
2 Instead of sampling with replacement from the data x, draw B samples x*(b) of size n from F̂_par.
3 Evaluate the bootstrap replications: θ̂*(b) = s(x*(b)), for all b ∈ {1, ..., B}.
4 Estimate the standard error se_F(θ̂) by the standard deviation of the B replications:

$$\hat{se}_B = \left[\frac{\sum_{b=1}^B\left[\hat\theta^*(b) - \hat\theta^*(\cdot)\right]^2}{B-1}\right]^{1/2},\qquad \hat\theta^*(\cdot) = \frac{1}{B}\sum_{b=1}^B \hat\theta^*(b)$$

Parametric Bootstrap estimate of the bias

1 Using the prior assumption and the available observations x = (x_1, x_2, ..., x_n), estimate F̂_par.
2 Instead of sampling with replacement from the data x, draw B samples x*(b) of size n from F̂_par.
3 Evaluate the bootstrap replications: θ̂*(b) = s(x*(b)), for all b ∈ {1, ..., B}.
4 Estimate the bias:

$$\widehat{\mathrm{Bias}}_B = \hat\theta^*(\cdot) - \hat\theta,\qquad \hat\theta^*(\cdot) = \frac{1}{B}\sum_{b=1}^B \hat\theta^*(b)$$
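The two procedures above can be written down directly. Here is a minimal sketch for Example B, again assuming Python with NumPy (the function name parametric_boot_max and the seed are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def parametric_boot_max(x, B=1000):
    """Steps 1-4 of the slides for Example B: fit F_par = U(0, theta_hat),
    draw B samples of size n from it, and replicate the maximum."""
    n = len(x)
    theta_hat = np.max(x)                               # step 1: fit U(0, theta_hat)
    samples = rng.uniform(0.0, theta_hat, size=(B, n))  # step 2: x*(1), ..., x*(B)
    reps = samples.max(axis=1)                          # step 3: theta_hat*(b) = s(x*(b))
    se_B = reps.std(ddof=1)                             # step 4: SE with (B - 1) denominator
    bias_B = reps.mean() - theta_hat                    # step 4: bias estimate
    return se_B, bias_B

x = np.array([0.5729, 0.1873, 0.5984, 0.2883, 0.8722,
              0.4320, 0.4896, 0.7106, 0.2754, 0.7637])
print(parametric_boot_max(x))
```

Unlike the nonparametric version, the replications now spread over the whole interval (0, θ̂), matching the extreme value density shown earlier.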
Resampling and Monte Carlo Simulation

In resampling, one could enumerate all possible bootstrap samples, but this would be too time-consuming and computationally intensive. The alternative is Monte Carlo sampling, which restricts the resampling to a certain number B. It is used in the computation of the bootstrap samples in the nonparametric case, where random indices (i_1, ..., i_n) are simulated from the discrete uniform distribution on {1, ..., n}, giving x* = (x_{i_1}, ..., x_{i_n}).

The fundamental difference is that the data can be totally hypothetical in a Monte Carlo simulation, while in resampling the simulation must be based on some real observations x = (x_1, ..., x_n). In the parametric case, the bootstrap samples from F̂_par are computed by Monte Carlo methods and are no longer resamples from x.

The Law school example: Parametric Bootstrap

For B = 3200 bootstrap replications, we compute corr̂*(·) = 0.7661 and a parametric bootstrap standard error of 0.1169.

Figure: Histogram of 3200 parametric bootstrap replications of corr̂(x*, y*).

The Law school example: Conclusion

The textbook formula for the standard error of the correlation coefficient is:

$$se_F = \frac{1 - \widehat{\mathrm{corr}}^2}{\sqrt{n-3}}$$

With corr̂(x, y) = 0.7764, the standard error is se_F = 0.1147. The non-parametric bootstrap standard error for B = 3200 is 0.132. The parametric bootstrap standard error for B = 3200 is 0.1169.

Parametric and nonparametric bootstrap estimates

The nonparametric approach leads to a finite number of possible replications θ̂*(b). In fact, considering n distinct values in x = (x_1, ..., x_n), the maximum number of different bootstrap samples (and replications) is:

$$B_{max} = \binom{2n-1}{n-1} = \frac{(2n-1)!}{n!\,(n-1)!}$$

(For n = 11, B_max = 352716: big enough to minimize the effect of the discreteness of the nonparametric approach.) The parametric approach has an unlimited number of different bootstrap samples and replications.

When might the (parametric and non-parametric) bootstrap fail?

The bootstrap might fail with:
- incomplete data (missing data): incomplete observation x
- dependent data (e.g. a correlated time series x = (x_1, ..., x_n))
- dirty data (outliers): noisy observation x

For a critical view of the bootstrap, see the book Exploring the Limits of Bootstrap, edited by LePage and Billard, 1990 (ISBN: 0-471-53631-8).

Conclusion

In the parametric bootstrap, F̂_par is no longer the empirical distribution function. If the prior information on F is accurate, then F̂_par estimates F better than the empirical p.d.f. does, and in this case the parametric bootstrap gives better estimates of the standard errors. Most of the time, the point of making parametric assumptions is to derive the textbook formulas. "All models are wrong, but some are useful" (G.E.P. Box, 1979).

On the other hand, the non-parametric bootstrap allows the computation of accurate standard errors (in many cases) without making any prior assumption. In non-parametric mode, the bootstrap method relieves the analyst from choosing a parametric form for the underlying density function F. In both cases, the bootstrap can provide answers to problems for which no textbook formula exists.
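To close, here is a minimal sketch of the law school computation described above: the parametric bootstrap estimate of the correlation's standard error, assuming Python with NumPy (B = 3200 follows the slides, while the seed and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Law school data from the table above.
lsat = np.array([576, 635, 558, 578, 666, 580, 555, 661,
                 651, 605, 653, 575, 545, 572, 594], dtype=float)
gpa = np.array([3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43,
                3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96])
n, B = len(lsat), 3200

# Fit F_par: a bivariate normal with the sample mean and covariance.
mean = np.array([lsat.mean(), gpa.mean()])
cov = np.cov(lsat, gpa, ddof=1)

# Draw B samples of size n from F_par and replicate the correlation.
reps = np.empty(B)
for b in range(B):
    z = rng.multivariate_normal(mean, cov, size=n)
    reps[b] = np.corrcoef(z[:, 0], z[:, 1])[0, 1]

corr_hat = np.corrcoef(lsat, gpa)[0, 1]            # ~0.776
se_boot = reps.std(ddof=1)                         # parametric bootstrap SE, ~0.117
se_textbook = (1 - corr_hat**2) / np.sqrt(n - 3)   # textbook formula, ~0.115
print(corr_hat, se_boot, se_textbook)
```

The bootstrap and textbook standard errors should agree closely here, as the slides report (0.1169 versus 0.1147).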