THE STATISTICIAN'S PAGE

Bootstrap Resampling Methods: Something for Nothing?

Gary L. Grunkemeier, PhD, and YingXing Wu, MD
Providence Health System, Portland, Oregon

The paper by Brunelli and colleagues [1] in this issue of The Annals of Thoracic Surgery used bootstrap resampling to select the final variables for a logistic regression model to predict air leak after pulmonary lobectomy. Bootstrapping is a generic methodology whose implementation involves a simple yet powerful principle: creating many repeated data samples from the single one in hand, and making inferences from those samples. The name derives from the expression "pulling oneself up by one's bootstraps," meaning that no outside help (additional data, parametric assumptions) is involved. It seems almost magical, like getting something for nothing. Bootstrapping can be used for many statistical purposes besides variable selection. One of the most common is to compute a confidence interval (CI). As a demonstration of the technique, we will compute a bootstrap CI for the O/E (observed-to-expected) ratio of mortality after myocardial infarction.

See page 1205

Background and Rationale

Bootstrap methods are more than 20 years old [2], but they are computer-intensive and have only recently become widely available in statistical programs. Powerful and widely available computing capability is transforming statistics. Originally, computers just speeded up data handling and statistical computations. Now they enable exploratory analysis of gigantic data sets (data mining), the creation of neural network algorithms, the analysis of complex genomic data, and the supplementing of traditional statistical methods with large numbers of data simulations.

Simulation

Typically, we collect a data sample and compute a statistic of interest, say the mean value of the individual data points.
We realize there is variability associated with this estimate, such that if we, or others, repeated the experiment or data collection, the estimate would be different. We need a measure of the variability of this statistic, say the standard deviation (SD) or a CI. The conventional method is to assume we know the distribution of the statistic of interest, often the familiar normal (bell-shaped) distribution. Even if the underlying population's distribution is not normal, the distribution of the statistic itself may be approximately normal if the sample is large enough. (Often, however, we do not have a large enough sample for this asymptotic distribution to obtain.) Given the distribution, we can use the formula for its SD and apply it to our sample. But given the distribution of the statistic, we do not actually need any formulas for its properties. We can simply use the computer to determine them by repeated simulation, randomly making many selections from the distribution in question. Figure 1 confirms this assertion. The histograms represent four computer simulations with increasing sample sizes from the standard normal distribution (with mean = 0 and SD = 1). As the number of simulations increases, the distribution is approximated to as close a degree as desired. The simulation in the last panel produced a distribution with a mean of 0.003 and SD of 1.002. The formulas tell us that for the standard normal distribution, 95% of the values lie between the critical values of −1.960 and +1.960 (this is where the "plus and minus 2 SD" for 95% limits comes from). The corresponding values from the last panel in Figure 1 are −1.959 and +1.959.

(Address reprint requests to Dr Grunkemeier, Providence St. Vincent Hospital and Medical Center, 9205 SW Barnes Rd, Suite 33, Portland, OR 97225; e-mail: [email protected]. © 2004 by The Society of Thoracic Surgeons. Published by Elsevier Inc.)
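The simulation behind Figure 1 is easy to reproduce. A minimal sketch in Python (numpy assumed; the seed and the sample size are arbitrary choices, so the digits will differ slightly from those quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Simulate 1,000,000 draws from the standard normal (mean = 0, SD = 1)
draws = rng.standard_normal(1_000_000)

# The simulated mean and SD approximate the theoretical 0 and 1
print(round(draws.mean(), 3), round(draws.std(), 3))

# Empirical critical values: the middle 95% of the draws lies between
# these percentiles, approximating the theoretical -1.960 and +1.960
lo, hi = np.percentile(draws, [2.5, 97.5])
print(round(lo, 3), round(hi, 3))
```

With more draws, the empirical percentiles can be pushed as close to the theoretical critical values as desired, which is exactly the point made in the text.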
We could make these even closer to the theoretical values by adding more simulations (on a 0.8-GHz laptop it took less than 0.5 seconds to generate 100,000 values and less than 4 seconds to generate 1,000,000), but this gives more precision than needed.

Fig 1. Simulation (random draws) from a population can reproduce the original distribution exactly, if enough draws are made. With increasing sample size, simulations from a standard normal distribution can approximate the underlying distribution to whatever accuracy is desired.

Resampling

This exercise confirms the fact that simulations can recreate critical values for any known distribution. But there are other situations in which we do not know, or do not want to assume, the distribution. In such cases we can use a simulation technique, similar to that demonstrated in Figure 1, called bootstrap resampling. Instead of generating observations from a known theoretical distribution as before, we generate observations from the distribution of the sample itself, the empirical distribution. Each simulation results in a new sample, typically of the same size as the original, by randomly selecting (with replacement) individuals from the original sample. "With replacement" means that at each step in the selection process, every individual from the original sample is again eligible to be selected, whether or not that individual has already been selected. Thus, in each bootstrap sample, some of the original individuals may not be represented and others may be represented more than once.

Ann Thorac Surg 2004;77:1142–4 • 0003-4975/04/$30.00 • doi:10.1016/j.athoracsur.2004.01.005
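Resampling with replacement takes only a line of code. A sketch using a small hypothetical sample (the data, and the choice of the mean as the statistic of interest, are illustrative and not from the article):

```python
import numpy as np

rng = np.random.default_rng(1)

# A small hypothetical sample (n = 10); the statistic of interest is its mean
sample = np.array([4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.8, 6.7, 4.4])

# One bootstrap sample: n draws WITH replacement, so some individuals
# can appear more than once and others not at all
one_boot = rng.choice(sample, size=sample.size, replace=True)

# Repeat B times, computing the statistic for each bootstrap sample
B = 10_000
boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                       for _ in range(B)])

# The simplest ("percentile") bootstrap 95% CI is the range holding the
# middle 95% of the bootstrapped statistics
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 2), "to", round(hi, 2))
```

No distributional assumption is made anywhere: the empirical distribution of the sample itself supplies the variability.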
Example: Confidence Interval for the Observed-to-Expected Ratio

A common risk-adjusted measure of mortality is the O/E ratio, the observed mortality (or number of deaths) divided by the expected mortality (or number of deaths) predicted by a risk model. If the O/E ratio is greater than 1, the observed mortality is higher than expected; if the O/E ratio is less than 1, the observed mortality is lower than expected. We will demonstrate the bootstrap method by deriving a 95% CI for the O/E ratio for mortality after myocardial infarction with ST elevation (STEMI). Several Providence Health System hospitals participate in the National Registry of Myocardial Infarction (NRMI), sponsored by Genentech Inc (South San Francisco, CA). From October 2002 through September 2003, nine Providence hospitals treated 913 STEMI patients, of whom 105 died, for an observed in-hospital mortality of 11.50%. The expected mortality was derived from the NRMI national registry of 36,214 myocardial infarction patients from 1,288 participating hospitals. Table 1 contains the data stratified by TIMI (thrombolysis in myocardial infarction) risk scores [3, 4]. For each level of risk (TIMI score), the expected number of deaths equals the number of patients in that level times the expected mortality (expressed as a decimal). The overall Providence Health System mortality was 11.50%, and the expected mortality was 11.04%, for an O/E ratio of 1.04 (11.50 divided by 11.04). Like all statistics, a point estimate, such as the O/E ratio, is of little value unless it is accompanied by an interval estimate, which measures its precision. If the 95% CI includes 1, then there is insufficient evidence to say that the ratio is statistically different from 1.
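The expected-deaths arithmetic just described can be checked directly from the Table 1 strata (numpy assumed; the computed total differs from the published 101.03 only in the second decimal, because the published per-stratum expected deaths are rounded):

```python
import numpy as np

# Patients, NRMI mortality (%), and observed deaths per TIMI stratum (Table 1)
patients = np.array([34, 75, 88, 73, 91, 110, 94, 70, 51, 47, 31, 21, 128])
mortality = np.array([0.17, 0.67, 1.70, 2.88, 5.32, 9.43, 14.12, 18.86,
                      19.93, 24.30, 26.22, 33.42, 14.42])
observed = np.array([0, 0, 1, 1, 4, 11, 19, 19, 8, 12, 6, 7, 17])

# Expected deaths per stratum = patients x expected mortality (as a decimal)
expected = patients * mortality / 100

# O/E ratio: total observed deaths over total expected deaths
oe = observed.sum() / expected.sum()
print(round(expected.sum(), 2), round(oe, 2))  # → 101.04 1.04
```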
In general, only if the lower confidence limit is greater than 1 should one conclude that the mortality is worse than expected, and only if the upper confidence limit is less than 1 should one conclude that the mortality is better than expected. (And even then, for many technical reasons, including the multiple-comparison problem, one should be cautious in drawing such conclusions.) The CI protects us from overreacting to the observed data.

Table 1. Observed and Expected Deaths of Patients With Myocardial Infarction With ST Elevation by Thrombolysis in Myocardial Infarction Risk Scores

TIMI Score    Patients    Observed Deaths    NRMI Mortality (%)    Expected Deaths
0                   34              0                  0.17                 0.06
1                   75              0                  0.67                 0.51
2                   88              1                  1.70                 1.49
3                   73              1                  2.88                 2.10
4                   91              4                  5.32                 4.85
5                  110             11                  9.43                10.37
6                   94             19                 14.12                13.27
7                   70             19                 18.86                13.20
8                   51              8                 19.93                10.16
9                   47             12                 24.30                11.42
10                  31              6                 26.22                 8.13
11–14               21              7                 33.42                 7.02
NA                 128             17                 14.42                18.45
Total              913            105                                     101.03
Percent                        11.50%                                     11.04%

NA = not available; NRMI = National Registry of Myocardial Infarction; STEMI = myocardial infarction with ST elevation; TIMI = thrombolysis in myocardial infarction.

Conventional Method

The usual way to compute a CI uses a mathematical expression derived from assumptions about the underlying statistical distribution. For example, assume the statistic has a normal distribution, compute the SD of the statistic, and then use the fact that 95% of the values of a normal distribution are within ±1.96 SD of the mean. For the O/E ratio in our example, this method [5] gives a 95% CI of 0.86 to 1.22. This method has two shortcomings: the lower limit can become negative, and the CI is symmetric about the point estimate. But an O/E ratio cannot be negative, and its distribution is not symmetric. The smallest it can be is 0 (only if no deaths are observed), although it can range to an arbitrarily high value (when many more deaths than expected are observed). Thus, an alternative with better theoretical properties is based on a normal approximation to the logarithm of the O/E ratio [6], which produces an asymmetric, always-positive interval: 0.88 to 1.23 for the present example.
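Both conventional intervals can be reproduced from the Table 1 strata, together with a percentile bootstrap for comparison. The binomial variance form var(O) = Σ nᵢpᵢ(1 − pᵢ) is my reading of the cited methods [5, 6], not a formula given in the text, so treat this as a sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Table 1 strata: patients, NRMI mortality (as decimals), observed deaths
patients = np.array([34, 75, 88, 73, 91, 110, 94, 70, 51, 47, 31, 21, 128])
p_exp = np.array([0.17, 0.67, 1.70, 2.88, 5.32, 9.43, 14.12, 18.86,
                  19.93, 24.30, 26.22, 33.42, 14.42]) / 100
observed = np.array([0, 0, 1, 1, 4, 11, 19, 19, 8, 12, 6, 7, 17])

O = observed.sum()            # 105 observed deaths
E = (patients * p_exp).sum()  # about 101.04 expected deaths
oe = O / E

# Assumed binomial variance of the observed count: var(O) = sum n_i p_i (1 - p_i)
sd_O = np.sqrt((patients * p_exp * (1 - p_exp)).sum())

# 1) Symmetric normal-approximation CI for the O/E ratio
lo1, hi1 = oe - 1.96 * sd_O / E, oe + 1.96 * sd_O / E

# 2) Asymmetric CI from a normal approximation on the log scale:
#    SD of ln(O/E) is approximately SD(O)/O, then back-transform
sd_log = sd_O / O
lo2, hi2 = oe * np.exp(-1.96 * sd_log), oe * np.exp(1.96 * sd_log)

# 3) Percentile bootstrap: rebuild the 913 patient records (death flag plus
#    expected risk), resample them with replacement B times, recompute O/E
died = np.concatenate([np.repeat([1, 0], [d, n - d])
                       for n, d in zip(patients, observed)])
risk = np.repeat(p_exp, patients)
B = 1000
idx = rng.integers(0, died.size, size=(B, died.size))
boot_oe = died[idx].sum(axis=1) / risk[idx].sum(axis=1)
lo3, hi3 = np.percentile(boot_oe, [2.5, 97.5])

print(round(lo1, 2), round(hi1, 2))  # close to the published 0.86 to 1.22
print(round(lo2, 2), round(hi2, 2))  # close to the published 0.88 to 1.23
print(round(lo3, 2), round(hi3, 2))  # bootstrap interval, similar to the above
```

All three intervals include 1, matching the article's conclusion that the observed mortality is not statistically different from expected.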
Bootstrap Method

To produce a bootstrap CI, the number of samples (B) to be generated from the original data set is specified, and for each sample the statistic of interest is computed. The range of values of the statistic is determined by the distribution of the observations in the original sample. The simplest bootstrap CI is simply the range within which 95% of these bootstrapped statistics fall. We used four increasing values of B, and for each value repeated the exercise five times, to determine the consistency of the resulting intervals (Fig 2). With B = 1,000, the endpoints of the interval were quite stable. These intervals were produced using the percentile method. Several transformations have been proposed for improving various properties of the bootstrap CI [7]; some of these were tried but gave similar results.

Fig 2. Bootstrap confidence intervals for the observed-to-expected (O/E) ratio. Five intervals were produced for each of four increasing numbers of bootstrap samples (B). As B increases, the different iterations converge to the same size intervals, within as close a tolerance as desired. The solid horizontal line represents O/E = 1, or observed = expected. All of the confidence intervals include this line, so there is no evidence of mortality different than expected. The dashed horizontal lines represent the interval computed from the usual normal approximation formula.

Comparison

In our example, the bootstrap intervals, after only 1,000 resamplings, produced CIs similar to the normal approximation method. The advantage is that no distributional assumptions are needed, which is particularly important for complex statistics for which such assumptions may be difficult. Interestingly, of the two conventional intervals, the more usual method (indicated by dashed horizontal lines in Fig 2), not the one involving a logarithmic transformation and possessing theoretical advantages, corresponded better to the bootstrap intervals.

Comment

The birth of probability theory is usually taken to be 1654, when a French gambler sought the help of mathematicians to determine the probabilities of a dice game [8]. He wanted to learn, using equations derived from the properties of the statistical distribution, what the long-term results of his random throwing of the dice would be. It is ironic that we have now come full circle, using random simulation methods to derive properties of the statistical distribution. In reference to their close association with gambling, techniques using this simulation approach are called Monte Carlo methods. Two widely referenced books that provide a thorough treatment of bootstrapping methods are suggested for further reading [9, 10].

Patient data were supplied by the following hospitals: Providence Anchorage Medical Center (Anchorage, AK), Providence Everett Medical Center (Everett, WA), Providence Portland Medical Center (Portland, OR), Providence St. Vincent Medical Center (Portland, OR), Providence Milwaukie Hospital (Milwaukie, OR), Providence Newberg Hospital (Newberg, OR), Providence Seaside Hospital (Seaside, OR), Providence Medford Medical Center (Medford, OR), and Little Company of Mary Hospital (Torrance, CA). Data management for NRMI is provided by STATPROBE, Inc (Ann Arbor, MI), which kindly provided the stratified data shown in Table 1.

References

1. Brunelli A, Monteverde M, Borri A, Salati M, Marasco RD, Fianchini A. Predictors of prolonged air leak after pulmonary lobectomy. Ann Thorac Surg 2004;77:1205–10.
2. Efron B. Bootstrap methods: another look at the jackknife. Ann Statist 1979;7:1–26.
3. Morrow DA, Antman EM, Charlesworth A, et al. TIMI risk score for ST-elevation myocardial infarction. A convenient, bedside, clinical score for risk assessment at presentation: an intravenous nPA for treatment of infarcting myocardium early II trial substudy.
Circulation 2000;102:2031–7.
4. Morrow DA, Antman EM, Parsons L, et al. Application of the TIMI risk score for ST-elevation MI in the National Registry of Myocardial Infarction 3. JAMA 2001;286:1356–9.
5. Shwartz M, Ash AS, Iezzoni LI. Comparing outcomes across providers. In: Iezzoni LI, ed. Risk adjustment for measuring healthcare outcomes. Chicago: Health Administration Press, 1997:471–516.
6. Hosmer DW, Lemeshow S. Confidence interval estimates of an index of quality performance based on logistic regression models. Stat Med 1995;14:2161–72.
7. Carpenter J, Bithell J, Swift MB. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000;19:1141–64.
8. Weaver W. Lady luck: the theory of probability. New York: Anchor Books, Doubleday and Co, 1963.
9. Efron B, Tibshirani R. An introduction to the bootstrap. New York: Chapman and Hall, 1993.
10. Davison AC, Hinkley DV. Bootstrap methods and their application. New York: Cambridge University Press, 1997.