The bootstrap

Mathematics 243
February 20, 2014
1. The logical sequence of ideas:
(a) We estimate a parameter of a population by using a statistic computed from a sample.
(b) The problem: how big is the sampling error likely to be?
(c) The sampling distribution tells us the complete story of the sampling error.
(d) We are not omniscient – so we’ll never know the sampling distribution.
(e) We don’t need to know everything about the sampling distribution, only something about its variation.
(f) We need a model for the population.
2. Approach 1 (BC) – make a strong assumption about the nature of the population.
3. Approach 2 (AC) – the population looks like the sample.
4. The all-purpose bootstrap recipe:
do(boatloads) * statistic(model, data = resample(original_data))
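library(mosaic)  # provides do(), resample(), and the formula interface used below
r <- do(1000) * mean(~Mass, data = resample(dimes))  # 1000 bootstrap means of Mass
histogram(~result, data = r)
densityplot(~result, data = r)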
[Figure: histogram and density plot of the 1000 bootstrap means; x-axis: result (about 2.245 to 2.275), y-axis: Density.]
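For readers without the mosaic package, the recipe can be sketched in base R. This is only an illustration under stated assumptions: `masses` is a simulated stand-in for the dimes data (the real measurements are not reproduced here, and the sample size of 30 is arbitrary), and `boot_stat` is a hypothetical helper name, not a mosaic function.

```r
# Simulated stand-in for the dimes masses (NOT the real data)
set.seed(243)
masses <- rnorm(30, mean = 2.26, sd = 0.02)

# Hypothetical helper: recompute `statistic` on B resamples, each drawn
# with replacement and the same size as the original sample
boot_stat <- function(x, statistic, B = 1000) {
  replicate(B, statistic(sample(x, size = length(x), replace = TRUE)))
}

r_base <- boot_stat(masses, mean)   # one bootstrap mean per resample
hist(r_base, xlab = "result", main = "Bootstrap means")
```

Resampling with replacement at the original sample size is what puts Approach 2 into practice: the sample stands in for the population.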
5. The standard error of a statistic: the standard deviation of its bootstrap distribution
sd(~result, data = r)
[1] 0.003949
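As a rough sanity check, the bootstrap standard error of a sample mean should land close to the textbook formula s/sqrt(n). The sketch below uses base R on simulated stand-in data (an assumption, not the real dimes measurements), so its numbers will not match the output above.

```r
set.seed(243)
masses <- rnorm(30, mean = 2.26, sd = 0.02)  # simulated stand-in, not the dimes data

# Bootstrap: 1000 means, each from a resample drawn with replacement
boot_means <- replicate(1000, mean(sample(masses, replace = TRUE)))

se_boot    <- sd(boot_means)                     # bootstrap standard error
se_formula <- sd(masses) / sqrt(length(masses))  # s / sqrt(n)

c(bootstrap = se_boot, formula = se_formula)
```

The two values should agree to within a few percent. The advantage of the bootstrap version is that it carries over unchanged to statistics with no simple formula, such as the median.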
Chapel: Lectio Divina, S. Bytwerk and R. Crow