THE STATISTICIAN'S PAGE

Bootstrap Resampling Methods: Something for Nothing?

Gary L. Grunkemeier, PhD, and YingXing Wu, MD
Providence Health System, Portland, Oregon

The paper by Brunelli and colleagues [1] in this issue of The Annals of Thoracic Surgery used bootstrap resampling to select the final variables for a logistic regression model to predict air leak after pulmonary lobectomy. Bootstrapping is a generic methodology whose implementation involves a simple yet powerful principle: creating many repeated data samples from the single one in hand, and making inferences from those samples. The name derives from the expression "pulling oneself up by one's bootstraps," meaning that no outside help (additional data, parametric assumptions) is involved. It seems almost magical, like getting something for nothing. Bootstrapping can be used for many statistical purposes besides variable selection. One of the most common is to compute a confidence interval (CI). As a demonstration of the technique, we will compute a bootstrap CI for the O/E (observed-to-expected) ratio of mortality after myocardial infarction.

See page 1205

Background and Rationale

Bootstrap methods are more than 20 years old [2], but they are computer-intensive and have only recently become widely available in statistical programs. Powerful and widely available computing capability is transforming statistics. Originally, computers just speeded up data handling and statistical computations. Now they enable exploratory analysis of gigantic data sets (data mining), the creation of neural network algorithms, the analysis of complex genomic data, and the supplementing of traditional statistical methods with large numbers of data simulations.

Simulation

Typically, we collect a data sample and compute a statistic of interest, say the mean value of the individual data points.
We realize there is variability associated with this estimate, such that if we, or others, repeated the experiment or data collection, the estimate would be different. We need a measure of the variability of this statistic, say the standard deviation (SD) or a CI. The conventional method is to assume we know the distribution of the statistic of interest, often the familiar normal (bell-shaped) distribution. Even if the underlying population's distribution is not normal, the distribution of the statistic itself may be approximately normal if the sample is large enough. (Often, however, we do not have a large enough sample for this asymptotic distribution to obtain.) Given the distribution, we can use the formula for its SD and apply it to our sample. But given the distribution of the statistic, we do not actually need any formulas for its properties. We can simply use the computer to determine them by repeated simulation, randomly making many selections from the distribution in question. Figure 1 confirms this assertion. The histograms represent four computer simulations with increasing sample sizes from the standard normal distribution (with mean = 0 and SD = 1). As the number of simulations increases, the distribution is approximated to as close a degree as desired. The simulation in the last panel produced a distribution with a mean of 0.003 and SD of 1.002. The formulas tell us that for the standard normal distribution, 95% of the values lie between the critical values of −1.960 and +1.960 (this is where the "plus and minus 2 SD" for 95% limits comes from). The corresponding values from the last panel in Figure 1 are −1.959 and +1.959.

(Address reprint requests to Dr Grunkemeier, Providence St. Vincent Hospital and Medical Center, 9205 SW Barnes Rd, Suite 33, Portland, OR 97225; e-mail: [email protected]. © 2004 by The Society of Thoracic Surgeons. Published by Elsevier Inc.)
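The simulation behind Figure 1 is easy to reproduce. A minimal sketch in Python (numpy assumed; the seed and the sample size are arbitrary choices, so the digits will differ slightly from those quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Simulate 1,000,000 draws from the standard normal (mean = 0, SD = 1)
draws = rng.standard_normal(1_000_000)

# The simulated mean and SD approximate the theoretical 0 and 1
print(round(draws.mean(), 3), round(draws.std(), 3))

# Empirical critical values: the middle 95% of the draws lies between
# these percentiles, approximating the theoretical -1.960 and +1.960
lo, hi = np.percentile(draws, [2.5, 97.5])
print(round(lo, 3), round(hi, 3))
```

With more draws, the empirical percentiles can be pushed as close to the theoretical critical values as desired, which is exactly the point made in the text.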
We could make these even closer to the theoretical values by adding more simulations (on a 0.8-GHz laptop it took less than 0.5 seconds to generate 100,000 values and less than 4 seconds to generate 1,000,000), but this gives more precision than needed.

Fig 1. Simulation (random draws) from a population can reproduce the original distribution exactly, if enough draws are made. With increasing sample size, simulations from a standard normal distribution can approximate the underlying distribution to whatever accuracy is desired.

Resampling

This exercise confirms the fact that simulations can recreate critical values for any known distribution. But there are other situations in which we do not know, or do not want to assume, the distribution. In such cases we can use a simulation technique, similar to that demonstrated in Figure 1, called bootstrap resampling. Instead of generating observations from a known theoretical distribution as before, we generate observations from the distribution of the sample itself, the empirical distribution. Each simulation results in a new sample, typically of the same size as the original, by randomly selecting (with replacement) individuals from the original sample. "With replacement" means that at each step in the selection process, every individual from the original sample is again eligible to be selected, whether or not that individual has already been selected. Thus, in each bootstrap sample, some of the original individuals may not be represented and others may be represented more than once.

Ann Thorac Surg 2004;77:1142–4 • 0003-4975/04/$30.00 • doi:10.1016/j.athoracsur.2004.01.005
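Resampling with replacement takes only a line of code. A sketch using a small hypothetical sample (the data, and the choice of the mean as the statistic of interest, are illustrative and not from the article):

```python
import numpy as np

rng = np.random.default_rng(1)

# A small hypothetical sample (n = 10); the statistic of interest is its mean
sample = np.array([4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.8, 6.7, 4.4])

# One bootstrap sample: n draws WITH replacement, so some individuals
# can appear more than once and others not at all
one_boot = rng.choice(sample, size=sample.size, replace=True)

# Repeat B times, computing the statistic for each bootstrap sample
B = 10_000
boot_means = np.array([rng.choice(sample, size=sample.size, replace=True).mean()
                       for _ in range(B)])

# The simplest ("percentile") bootstrap 95% CI is the range holding the
# middle 95% of the bootstrapped statistics
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 2), "to", round(hi, 2))
```

No distributional assumption is made anywhere: the empirical distribution of the sample itself supplies the variability.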
Example: Confidence Interval for the Observed-to-Expected Ratio

A common risk-adjusted measure of mortality is the O/E ratio, the observed mortality (or number of deaths) divided by the expected mortality (or number of deaths) predicted by a risk model. If the O/E ratio is greater than 1, the observed mortality is higher than expected; if the O/E ratio is less than 1, the observed mortality is lower than expected. We will demonstrate the bootstrap method by deriving a 95% CI for the O/E ratio for mortality after myocardial infarction with ST elevation (STEMI). Several Providence Health System hospitals participate in the National Registry of Myocardial Infarction (NRMI), sponsored by Genentech Inc (South San Francisco, CA). From October 2002 through September 2003, nine Providence hospitals treated 913 STEMI patients, of whom 105 died, for an observed in-hospital mortality of 11.50%. The expected mortality was derived from the NRMI national registry of 36,214 myocardial infarction patients from 1,288 participating hospitals. Table 1 contains the data stratified by TIMI (thrombolysis in myocardial infarction) risk scores [3, 4]. For each level of risk (TIMI score), the expected number of deaths equals the number of patients in that level times the expected mortality (expressed as a decimal). The overall Providence Health System mortality was 11.50%, and the expected mortality was 11.04%, for an O/E ratio of 1.04 (11.50 divided by 11.04). Like all statistics, a point estimate, such as the O/E ratio, is of little value unless it is accompanied by an interval estimate, which measures its precision. If the 95% CI includes 1, then there is insufficient evidence to say that the ratio is statistically different from 1.
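The expected-deaths arithmetic just described can be checked directly from the Table 1 strata (numpy assumed; the computed total differs from the published 101.03 only in the second decimal, because the published per-stratum expected deaths are rounded):

```python
import numpy as np

# Patients, NRMI mortality (%), and observed deaths per TIMI stratum (Table 1)
patients = np.array([34, 75, 88, 73, 91, 110, 94, 70, 51, 47, 31, 21, 128])
mortality = np.array([0.17, 0.67, 1.70, 2.88, 5.32, 9.43, 14.12, 18.86,
                      19.93, 24.30, 26.22, 33.42, 14.42])
observed = np.array([0, 0, 1, 1, 4, 11, 19, 19, 8, 12, 6, 7, 17])

# Expected deaths per stratum = patients x expected mortality (as a decimal)
expected = patients * mortality / 100

# O/E ratio: total observed deaths over total expected deaths
oe = observed.sum() / expected.sum()
print(round(expected.sum(), 2), round(oe, 2))  # → 101.04 1.04
```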
In general, only if the lower confidence limit is greater than 1 should one conclude that the mortality is worse than expected, and only if the upper confidence limit is less than 1 should one conclude that the mortality is better than expected. (And even then, for many technical reasons, including the multiple-comparison problem, one should be cautious in drawing such conclusions.) The CI protects us from overreacting to the observed data.

Table 1. Observed and Expected Deaths of Patients With Myocardial Infarction With ST Elevation by Thrombolysis in Myocardial Infarction Risk Scores

TIMI Score    Patients    Observed Deaths    NRMI Mortality (%)    Expected Deaths
0                   34              0                  0.17                 0.06
1                   75              0                  0.67                 0.51
2                   88              1                  1.70                 1.49
3                   73              1                  2.88                 2.10
4                   91              4                  5.32                 4.85
5                  110             11                  9.43                10.37
6                   94             19                 14.12                13.27
7                   70             19                 18.86                13.20
8                   51              8                 19.93                10.16
9                   47             12                 24.30                11.42
10                  31              6                 26.22                 8.13
11–14               21              7                 33.42                 7.02
NA                 128             17                 14.42                18.45
Total              913            105                                     101.03
Percent                        11.50%                                     11.04%

NA = not available; NRMI = National Registry of Myocardial Infarction; STEMI = myocardial infarction with ST elevation; TIMI = thrombolysis in myocardial infarction.

Conventional Method

The usual way to compute a CI uses a mathematical expression derived from assumptions about the underlying statistical distribution. For example, assume the statistic has a normal distribution, compute the SD of the statistic, and then use the fact that 95% of the values of a normal distribution are within ±1.96 SD of the mean. For the O/E ratio in our example, this method [5] gives a 95% CI of 0.86 to 1.22. This method has two shortcomings: the lower limit can become negative, and the CI is symmetric about the point estimate. But an O/E ratio cannot be negative, and its distribution is not symmetric. The smallest it can be is 0 (only if no deaths are observed), although it can range to an arbitrarily high value (when many more deaths than expected are observed). Thus, an alternative with better theoretical properties is based on a normal approximation to the logarithm of the O/E ratio [6], which produces an asymmetric, always-positive interval: 0.88 to 1.23 for the present example.
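Both conventional intervals can be reproduced from the Table 1 strata, together with a percentile bootstrap for comparison. The binomial variance form var(O) = Σ nᵢpᵢ(1 − pᵢ) is my reading of the cited methods [5, 6], not a formula given in the text, so treat this as a sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Table 1 strata: patients, NRMI mortality (as decimals), observed deaths
patients = np.array([34, 75, 88, 73, 91, 110, 94, 70, 51, 47, 31, 21, 128])
p_exp = np.array([0.17, 0.67, 1.70, 2.88, 5.32, 9.43, 14.12, 18.86,
                  19.93, 24.30, 26.22, 33.42, 14.42]) / 100
observed = np.array([0, 0, 1, 1, 4, 11, 19, 19, 8, 12, 6, 7, 17])

O = observed.sum()            # 105 observed deaths
E = (patients * p_exp).sum()  # about 101.04 expected deaths
oe = O / E

# Assumed binomial variance of the observed count: var(O) = sum n_i p_i (1 - p_i)
sd_O = np.sqrt((patients * p_exp * (1 - p_exp)).sum())

# 1) Symmetric normal-approximation CI for the O/E ratio
lo1, hi1 = oe - 1.96 * sd_O / E, oe + 1.96 * sd_O / E

# 2) Asymmetric CI from a normal approximation on the log scale:
#    SD of ln(O/E) is approximately SD(O)/O, then back-transform
sd_log = sd_O / O
lo2, hi2 = oe * np.exp(-1.96 * sd_log), oe * np.exp(1.96 * sd_log)

# 3) Percentile bootstrap: rebuild the 913 patient records (death flag plus
#    expected risk), resample them with replacement B times, recompute O/E
died = np.concatenate([np.repeat([1, 0], [d, n - d])
                       for n, d in zip(patients, observed)])
risk = np.repeat(p_exp, patients)
B = 1000
idx = rng.integers(0, died.size, size=(B, died.size))
boot_oe = died[idx].sum(axis=1) / risk[idx].sum(axis=1)
lo3, hi3 = np.percentile(boot_oe, [2.5, 97.5])

print(round(lo1, 2), round(hi1, 2))  # close to the published 0.86 to 1.22
print(round(lo2, 2), round(hi2, 2))  # close to the published 0.88 to 1.23
print(round(lo3, 2), round(hi3, 2))  # bootstrap interval, similar to the above
```

All three intervals include 1, matching the article's conclusion that the observed mortality is not statistically different from expected.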
Bootstrap Method

To produce a bootstrap CI, the number of samples (B) to be generated from the original data set is specified, and for each sample the statistic of interest is computed. The range of values of the statistic is determined by the distribution of the observations in the original sample. The simplest bootstrap CI is simply the range within which 95% of these bootstrapped statistics fall. We used four increasing values of B, and for each value repeated the exercise five times, to determine the consistency of the resulting intervals (Fig 2). With B = 1,000, the endpoints of the interval were quite stable. These intervals were produced using the percentile method. Several transformations have been proposed for improving various properties of the bootstrap CI [7]; some of these were tried but gave similar results.

Fig 2. Bootstrap confidence intervals for the observed-to-expected (O/E) ratio. Five intervals were produced for each of four increasing numbers of bootstrap samples (B). As B increases, the different iterations converge to the same size intervals, within as close a tolerance as desired. The solid horizontal line represents O/E = 1, or observed = expected. All of the confidence intervals include this line, so there is no evidence of mortality different than expected. The dashed horizontal lines represent the interval computed from the usual normal approximation formula.

Comparison

In our example, the bootstrap intervals, after only 1,000 resamplings, produced CIs similar to the normal approximation method. The advantage is that no distributional assumptions are needed, which is particularly important for complex statistics for which such assumptions may be difficult. Interestingly, of the two conventional intervals, the more usual method (indicated by dashed horizontal lines in Fig 2), not the one involving a logarithmic transformation and possessing theoretical advantages, corresponded better to the bootstrap intervals.

Comment

The birth of probability theory is usually taken to be 1654, when a French gambler sought the help of mathematicians to determine the probabilities of a dice game [8]. He wanted to learn, using equations derived from the properties of the statistical distribution, what the long-term results of his random throwing of the dice would be. It is ironic that we have now come full circle, using random simulation methods to derive properties of the statistical distribution. In reference to their close association with gambling, techniques using this simulation approach are called Monte Carlo methods. Two widely referenced books that provide a thorough treatment of bootstrapping methods are suggested for further reading [9, 10].

Patient data were supplied by the following hospitals: Providence Anchorage Medical Center (Anchorage, AK), Providence Everett Medical Center (Everett, WA), Providence Portland Medical Center (Portland, OR), Providence St. Vincent Medical Center (Portland, OR), Providence Milwaukie Hospital (Milwaukie, OR), Providence Newberg Hospital (Newberg, OR), Providence Seaside Hospital (Seaside, OR), Providence Medford Medical Center (Medford, OR), and Little Company of Mary Hospital (Torrance, CA). Data management for NRMI is provided by STATPROBE, Inc (Ann Arbor, MI), which kindly provided the stratified data shown in Table 1.

References

1. Brunelli A, Monteverde M, Borri A, Salati M, Marasco RD, Fianchini A. Predictors of prolonged air leak after pulmonary lobectomy. Ann Thorac Surg 2004;77:1205–10.
2. Efron B. Bootstrap methods: another look at the jackknife. Ann Statist 1979;7:1–26.
3. Morrow DA, Antman EM, Charlesworth A, et al. TIMI risk score for ST-elevation myocardial infarction. A convenient, bedside, clinical score for risk assessment at presentation: an intravenous nPA for treatment of infarcting myocardium early II trial substudy.
Circulation 2000;102:2031–7.
4. Morrow DA, Antman EM, Parsons L, et al. Application of the TIMI risk score for ST-elevation MI in the National Registry of Myocardial Infarction 3. JAMA 2001;286:1356–9.
5. Shwartz M, Ash AS, Iezzoni LI. Comparing outcomes across providers. In: Iezzoni LI, ed. Risk adjustment for measuring healthcare outcomes. Chicago: Health Administration Press, 1997:471–516.
6. Hosmer DW, Lemeshow S. Confidence interval estimates of an index of quality performance based on logistic regression models. Stat Med 1995;14:2161–72.
7. Carpenter J, Bithell J, Swift MB. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Stat Med 2000;19:1141–64.
8. Weaver W. Lady luck: the theory of probability. New York: Anchor Books, Doubleday and Co, 1963.
9. Efron B, Tibshirani R. An introduction to the bootstrap. New York: Chapman and Hall, 1993.
10. Davison AC, Hinkley DV. Bootstrap methods and their application. New York: Cambridge University Press, 1997.