Peer Reviewed: Statistics

Why Xbar ± 3S is not a Universal Solution

Lynn Torbeck

The one fact that most people remember incorrectly from that long-ago statistics class is the rule of thumb Xbar ± 3S. Thus, we have the incorrect but common practice of using the sample average plus and minus three times the sample standard deviation as a solution for finding confidence intervals, setting alert criteria, identifying outliers, judging statistical significance, and answering other complex statistical questions. While applied statistics needs to be pragmatic, it cannot be incorrect. Good science and current good manufacturing practice (CGMP) regulations demand that correct tools be used for a given problem. Using Xbar ± 3S for the above purposes is statistically incorrect, particularly so for small sample sizes. Given the widespread use and abuse of Xbar ± 3S, it seems the topic needs to be clarified. This paper addresses the misuse of Xbar ± 3S and compares it to confidence intervals, tolerance intervals, control charts, and Cpk.

INTRODUCTION

Picture yourself sitting in the office of Robert, the vice president (VP) for production, with Roger, the VP for quality assurance (QA), during an investigation for a potential recall due to a near potency failure. The conversation comes around to the specification criterion and how it was set. Tom, a staff member, explains to the VPs that the specification criterion was set according to the GMPs, specifically:

21 CFR 211.110(b) “Valid in-process specifications for such characteristics shall be consistent with drug product final specifications and shall be derived from previous acceptable process average and process variability (standard deviation) estimates where possible and determined by the application of suitable statistical procedures [Emphasis added] when appropriate” (1).
Tom goes on to say that they had the average and standard deviation from the 12 validation lots and that they set the limits using the average plus and minus three times the standard deviation.

At this point David, the QA statistician, joins the conversation, asking, “Why did you use that?”

Tom replies, “Well, in the statistics course I took, the professor said that the mean plus and minus three standard deviations brackets 99.73% of the values.”

David says, “Yes, that is true in theory, but only if you know the true population mean, mu (μ), and the true population standard deviation, sigma (σ), which we almost never do. Here we must estimate the population mean and population standard deviation from the small sample of 12 values using the average (Xbar) and the sample standard deviation (S). There will be variability in both of these estimates; what we get in one sample of 12 is just one of the many estimates that are possible. Other samples of size 12 will give different estimates.

“The multiplier of three is correct only if we have an infinite sample size, but here we have only 12 values. We must take into account the variability of both Xbar and S due to the small sample size. The multiplier must be different from three to accommodate the uncertainty in our two estimates.”

David goes on to explain that the limits set in this case, using the sample average plus and minus three times the sample standard deviation, are too narrow, and thus the lot is on the verge of rejection.

Spring 2012 Volume 16 Number 2

WHY THE CONFUSION?

It is clear how this confusion occurs. The statistics professor says μ ± 3σ brackets 99.73% of the area under the curve, or 99.73% of the population of values. This gets translated into the mean plus and minus three standard deviations, which then quickly morphs into the average of the sample plus and minus three standard deviations of the sample, or Xbar ± 3S.
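The gap between μ ± 3σ and Xbar ± 3S can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a procedure from the article: it assumes a normal population with μ = 100 and σ = 1 (arbitrary illustrative values), draws repeated samples of 12, and computes how much of the true population each Xbar ± 3S interval actually brackets. The function name, seed, and number of trials are hypothetical choices.

```python
import random
import statistics
from statistics import NormalDist


def coverage_of_xbar_3s(n, mu=100.0, sigma=1.0, trials=5000, seed=1):
    """Return the true population coverage of Xbar ± 3S for many samples of size n.

    mu, sigma, trials, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    population = NormalDist(mu, sigma)
    coverages = []
    for _ in range(trials):
        # Draw one sample of size n from the assumed normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        lower, upper = xbar - 3.0 * s, xbar + 3.0 * s
        # Fraction of the true population that lies inside this interval.
        coverages.append(population.cdf(upper) - population.cdf(lower))
    return coverages


coverages = coverage_of_xbar_3s(n=12)
mean_coverage = statistics.fmean(coverages)
share_meeting_target = sum(c >= 0.9973 for c in coverages) / len(coverages)

print(f"Average coverage of Xbar ± 3S (n=12): {mean_coverage:.4f}")
print(f"Share of samples covering at least 99.73%: {share_meeting_target:.3f}")
```

Under these assumptions the average coverage falls short of 99.73%, and fewer than half of the simulated samples reach it, which is consistent with David's point that three is too small a multiplier for 12 values.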
This substitution is perfectly understandable, but it is not acceptable: it violates the CGMPs because it is not an ‘application of suitable statistical procedures.’

CORRECT CALCULATION

The correct way to estimate the natural limits for the data set of 12 values is to calculate the statistical tolerance interval using Xbar ± KS, where K is a function of the sample size, a given percentage of reportable values, and a stated confidence (2). The percentage of values can be whatever we wish it to be: 95%, 99%, or even 99.73%. We also need to specify how confident we wish to be in our statement. This can be 95%, 99%, or another value. The idea of being incorrect 5% of the time is not appealing, so many people will choose a 99%/99% tolerance interval. This allows them to state that they are 99% confident that 99% of the future reportable values (if nothing changes and the future looks like the past) will lie within the calculated interval.

Note that these are the natural limits of normally distributed data, but they can serve as warning or alert limits. Or, as Hahn and Meeker state, “Such an interval would be of particular interest in setting limits on the process capability for product manufactured in large quantities” (3). To set action or investigation limits, the limits should be expanded to account for other sources of variability. Accept or reject limits are wider still and take into account the known product stability profile.

Journal of GXP Compliance

At this point Robert, the VP for production, jumps in to exclaim, “You mean to say that we almost recalled this batch because the limits were set using a rule of thumb and not the most correct technique available?”

David replies, “Well, in defense of Tom, the sample average plus and minus three times the sample standard deviation is unfortunately widely used and abused in the pharmaceutical industry. It has become a universal monkey wrench that people use for a wide variety of statistical issues.
“This is, unfortunately, a common practice that must be changed. This lot is only the tip of the iceberg of potential financial losses due to using a rule of thumb in place of a best approach. I have seen this used not only to set specification criteria but also in place of tolerance intervals, confidence intervals, and significance tests, and as an outlier test. A European agency has, reportedly, required US companies to set accept and reject specification criteria using it. Not only is this incorrect, it presents a high degree of risk for the company and contradicts the CGMPs.”

CORRECT APPLICATION

There are only two applications where the multiplier of three is accepted by the statistical community: control charts and process capability indices such as Cpk and Ppk.

Control charts were developed in 1924 by Dr. Walter Shewhart while he was working at the Hawthorne Works of the Western Electric Company in Cicero, IL. They were intended as a pragmatic tool and never as an exact probability statement. Thus, the control limits on a control chart are by definition (and not by theory) the average plus and minus three times the standard deviation. The ASTM book states clearly, “The choice of the factor 3 in these limits is an economic choice based on experience that covers a wide range of industrial applications of the control chart, rather than on any exact value of probability” (4).

The same situation exists for Cpk. The three in the denominator is only a rough rule of thumb. Cpk was never intended to be anything other than a crude indicator of possible process incapability. Given that, control charts and Cpk should not be used to make accept/reject decisions; they are inexact prompts for further investigation, data collection, and a rigorously correct statistical analysis.

Every introductory statistics class eventually addresses the several characteristics of the normal population or distribution. See the smooth curve in Figure 1.
The mean of the distribution is defined as mu, μ, and is 100.0 here. The standard deviation of the distribution is defined as sigma, σ, and is 1.0 here. It is always pointed out that μ ± σ will bracket about 68.3% of the area under the curve, that μ ± 2σ (more precisely, 1.96σ) will bracket 95.0% of the area under the curve, and that μ ± 3σ will bracket 99.73% of the area under the curve. The last is the one that is most remembered. These are true statements, but as was noted, they apply in theory, not in practice, where we almost never know the population mean and standard deviation.

Figure 1: Histogram of n=30.

As can be seen in the histogram of 30 values in Figure 1, the average (i.e., the estimate of the mean) is 99.78 and the sample standard deviation is 0.82. These are close to 100 and 1 but not exact. If we were to take another sample, the estimated values would be slightly different. Small samples give poor estimates and large samples give better estimates. In practice, we are nearly always working with small samples such as 5, 10, 20, and 30. In this context, even samples of 100 may not be large enough.

As an illustration of this, Figures 2-4 show histograms of sample standard deviations, S, for sample sizes of 5, 30, and 100. Note that for n=5, the estimates of S range from about 0.2 to 2.2 even when the true value is 1.0. For n=30, they range from 0.6 to 1.4. For n=100, they still range from 0.8 to 1.2, which is plus and minus 20%. There is a lot of variability in our estimates of variability. All of this comes together to support the need for a multiplier different from three for specific applications.

Figure 2: Histogram of SD n=5.
Figure 3: Histogram of SD n=30.
Figure 4: Histogram of SD n=100.
Figure 5: Calculated 99% confidence interval vs. Xbar ± 3S.
Figure 6: 99%/99% tolerance interval vs. Xbar ± 3S.

Figure 6 shows a 99%/99% tolerance interval vs. Xbar ± 3S.
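The dependence of the tolerance multiplier K on sample size can also be sketched numerically. The code below is a minimal illustration under stated assumptions, not the article's method: it uses Howe's approximation for the two-sided normal tolerance factor, with the chi-square quantile found by bisection on a series expansion of the chi-square distribution function. The function names are hypothetical, and published tolerance interval tables remain the reference for setting actual limits.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    # Sum half_x**k / (a * (a + 1) * ... * (a + k)) until terms are negligible.
    while term > 1e-15 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    return total * exp(a * log(half_x) - half_x - lgamma(a))


def chi2_quantile(prob, df):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < prob:  # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def tolerance_k(n, coverage=0.99, confidence=0.99):
    """Approximate two-sided normal tolerance factor K (Howe's approximation)."""
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile(1.0 - confidence, df)
    return z * sqrt(df * (1.0 + 1.0 / n) / chi2_low)


for n in (5, 12, 30, 100):
    print(f"n={n:4d}  approximate 99%/99% K = {tolerance_k(n):.2f}")
```

Under this approximation, K for n = 12 comes out at roughly 5 rather than 3, and it falls close to 3 only at about n = 100.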
As can be seen, the tolerance limits become almost the same as Xbar ± 3S for sample sizes greater than about a hundred. For sample sizes below a hundred, however, the Xbar ± 3S limits are far too narrow, leading to incorrect estimates.

CONFIDENCE INTERVALS

A statistical confidence interval on the mean is a statement that we believe the true but unknown mean will lie within an interval with a given confidence, say 99%. In other words, if we were to do this many times, 99% of the time the interval would contain the true value. This interval is found using Xbar ± tS/√n, where t is found from a table as a function of the sample size and the desired level of confidence (5).

Figure 5 shows a calculated 99% confidence interval vs. Xbar ± 3S. As can be seen, Xbar ± 3S is too narrow for small samples and too wide for large samples. In fact, the confidence interval width goes to zero for an infinite sample size because n is in the denominator. This makes a confidence interval on the mean totally incorrect for setting alert limits.

SUMMARY

A multiplier of three is only accepted for control charts and Cpk, and even there, it is an inexact and crude guide. Xbar ± 3S is not appropriate for any other applications and is a possible violation of the CGMPs. In the pharmaceutical industry, inexact rules of thumb are not an acceptable way to protect the public health.

REFERENCES

1. FDA, 21 CFR 211.110(b).
2. A search of the Internet for “Tolerance Interval Tables” will find the tables needed.
3. G. J. Hahn and W. Q. Meeker, Statistical Intervals, John Wiley, p. 34, 1991.
4. ASTM, ASTM Manual on Presentation of Data and Control Chart Analysis, p. 78, 1976.
5. The Student t tables are published in every statistics text.

ABOUT THE AUTHOR

Lynn Torbeck is a consultant specializing in applied statistics and designed experiments for quality assurance, quality control, validation, and manufacturing under the CGMPs.
He has spent his entire career in the pharmaceutical industry and has been president of Torbeck and Associates since 1988. He may be reached by email at [email protected].