HEMATOPATHOLOGY
Original Article

Exponentially Adjusted Moving Mean Procedure for Quality Control
An Optimized Patient Sample Control Procedure

FREDERICK A. SMITH, MD, AND STEVEN H. KROFT, MD

From the Department of Pathology, Northwestern University Medical School, Chicago, Illinois.

Manuscript received June 15, 1995; revision accepted August 30, 1995.

Address reprint requests to Dr. Kroft: Department of Pathology, Passavant Pavilion Rm. 316, 303 E. Superior, Chicago, IL 60611.
The idea of using patient samples as the basis for control procedures
elicits a continuing fascination among laboratorians, particularly in the
current environment of cost restriction. Average of normals (AON) procedures, although little used, have been carefully investigated at the
theoretical level. The performance characteristics of Bull's algorithm
have not been thoroughly delineated, however, despite its widespread
use. The authors have generalized Bull's algorithm to use variably sized
batches of patient samples and a range of exponential factors. For any
given batch size, there is an optimal exponential factor to maximize the
overall power of error detection. The optimized exponentially adjusted
moving mean (EAMM) procedure, a variant of AON and Bull's algorithm, outperforms both parent procedures. As with any AON procedure, EAMM is most useful when the ratio of population variability to
analytical variability (standard deviation ratio, SDR) is low. (Key
words: Quality control; Patient samples; Bull's algorithm; Average of
normals) Am J Clin Pathol 1996; 105:44-51.
In 1974, Bull and colleagues1 published a procedure designed to assess quality control (QC) of erythrocyte indices measured by automated hematology analyzers using
patient data rather than control materials. This has since
become known as "Bull's algorithm." Quality control indices using patient data are particularly desirable in hematology because of the expense and instability of available manufactured control materials.2 In addition,
commercial controls may possess very different physical
properties than the patient samples that they should ideally mimic within the analytical environment. This limits their effectiveness in the detection of analytical errors
that may significantly affect patient results.2 The simplest quality control procedure using patient samples is
the average of normals procedure (AON), as described
by Hoffman and Waid in 1965.3 This procedure signals
an error condition if the average of a selected number
of patient samples falls beyond predetermined control
limits set around the mean of a designated "normal" patient population. Because abnormal samples may be
present within any given patient population, it is generally necessary to set truncation limits, outside of which
results are excluded from analysis. The appropriate position of the truncation limits depends on the desired sensitivity of the procedure as well as the percentage of abnormals within the population.4 In addition, it has been
demonstrated that AON performs best when the standard deviation of the patient population (s_pop) is not much larger than the analytical standard deviation (s_meas).4 For example, AON is much more sensitive to detection of analytical bias for an assay with a standard deviation ratio (SDR = s_pop/s_meas) of two than for an assay with an SDR of eight (Table 1).5 Note that the s_pop results from the combination of the underlying between-patient biologic standard deviation (s_bio) and the s_meas. This relationship is expressed as:
$$s_{pop} = \sqrt{s_{bio}^{2} + s_{meas}^{2}} \qquad (1)$$
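For example (with illustrative numbers, not drawn from Table 1), an assay with s_bio = 1.7 units and s_meas = 1.0 unit yields

$$s_{pop} = \sqrt{1.7^{2} + 1.0^{2}} \approx 1.97, \qquad SDR = s_{pop}/s_{meas} \approx 2.0,$$

so analytical variation contributes substantially to the observed spread and an AON-type procedure would be expected to perform well.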
Bull's algorithm is a variation of the AON procedure developed specifically for application to red cell indices on automated analyzers. Although it is widely incorporated into commercial hematology analyzers, it has historically resisted clear explanation of how and why it works. This may be partially due to its considerable notational complexity, which in fact has led to misprinting by reputable authors.2 The algorithm as described by Bull and colleagues1 is as follows:

$$\bar{X}_{B,i} = \bar{X}_{B,i-1} + d \qquad (2)$$

where X_B,i is the current Bull's mean and X_B,i-1 is the previous Bull's mean. Thus, the new Bull's mean after a
batch or run of N samples of analyte X is computed by
adding a calculated "signed function" d to the previous
Bull's mean. The function d is defined as:

$$d = \operatorname{sgn}\!\left(\sum_{j=1}^{n} \operatorname{sgn}(X_j - \bar{X}_{B,i-1})\,\lvert X_j - \bar{X}_{B,i-1}\rvert^{P}\right) \cdot \left\lvert \frac{1}{n}\sum_{j=1}^{n} \operatorname{sgn}(X_j - \bar{X}_{B,i-1})\,\lvert X_j - \bar{X}_{B,i-1}\rvert^{P} \right\rvert^{1/P} \qquad (3)$$
Traditionally, the exponential factor (P) = 1/2 and n = 20. Thus, d is calculated for a batch of 20 values of analyte X by taking the square, with sign maintained, of the average of the square roots, with sign maintained, of the differences between X_1 through X_20 and X_B. It is important to realize that the bulk of the formula is present simply to maintain the sign of the function d, so that X_B is incremented in the same direction as the predominant deviation of the twenty values of X in the batch. A critical feature of the formula is that it averages the square roots of the deviations of individual points from the previous Bull's mean and then squares that average. Because larger numbers are "reduced" more by taking their square root than smaller numbers, this formulation serves to reduce the effect of points that are far from the previous Bull's mean compared to points that are close to it. This effectively narrows the distribution of the X_B signal compared to AON by dampening both the random variation of the normal population and the effect of abnormal patient outliers. Finally, the effect of calculating d relative to the previous Bull's mean is to maintain maximum sensitivity to deviations around the current operating position of the method, thus making the measure
particularly responsive to stable bias or progressive increases in bias in one direction. This imparts a trend-like behavior to the procedure. However, because of the dampening inherent in the process, Bull's algorithm would be expected to respond slowly to analytical bias compared to an AON procedure, as on a given run the Bull's mean does not completely shift to the position of the true mean of that run. However, the error detection would be expected to "pick up speed" over multiple runs because of the trend effect. Note that Bull's algorithm will not detect increases in random error.

TABLE 1. TYPICAL STANDARD DEVIATION RATIO (SDR) VALUES FOR REPRESENTATIVE CLINICAL ASSAYS

Assay                SDR
MCHC                 1.7
Calcium              2.6
Sodium               2.7
Chloride             2.9
Prothrombin time     3.0
Creatinine           4.0
Potassium            5.4
Glucose              7.0
MCH                  7.4
CO2                  7.8
Cholesterol          9.1
MCV                 11.0
Urea nitrogen       13.0
Hemoglobin          23.0

MCHC = mean corpuscular hemoglobin concentration; MCH = mean corpuscular hemoglobin; MCV = mean corpuscular volume.
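To make the generalized procedure concrete, equations 2 and 3 can be expressed as a short routine. The sketch below is written in Python rather than the Quick BASIC of our simulator, and the function and variable names are illustrative only:

```python
import math

def eamm_update(prev_mean, batch, p):
    """One generalized EAMM update (equations 2 and 3).

    prev_mean -- previous Bull's mean, X_B,i-1
    batch     -- the n patient results of the current run
    p         -- exponential factor (p = 0.5 gives the traditional
                 Bull's algorithm; p = 1.0 gives untruncated AON)
    """
    n = len(batch)
    # Inner sum of equation 3: average of the signed p-th powers of
    # the deviations of each result from the previous Bull's mean.
    s = sum(math.copysign(abs(x - prev_mean) ** p, x - prev_mean)
            for x in batch) / n
    # Outer step of equation 3: raise the average back to the 1/p
    # power while preserving its sign, yielding the increment d.
    d = math.copysign(abs(s) ** (1.0 / p), s)
    return prev_mean + d  # equation 2: X_B,i = X_B,i-1 + d
```

With p = 1, the increment d reduces to the mean deviation of the batch, so the updated mean is simply the batch mean, consistent with the equivalence to untruncated AON noted below.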
To date, Bull's algorithm has only been evaluated for
the quality control of red cell indices for which it was
originally designed,1,6-9 whereas we believe it has potential applications beyond hematology analyzers. Furthermore, although in their original article Bull and colleagues1 state that a value of P = 1/2 performed best, no data on the effect of varying P were presented. Also, the
use of n other than 20 has never been investigated. We
view Bull's algorithm, as currently used, as a specific example of a more generalized quality control procedure,
which we call the Exponentially Adjusted Moving Mean
(EAMM), which in turn is a modified AON procedure.
In fact, when P = 1, EAMM is identical to an untruncated AON. Our goal in this study was to describe the
general performance characteristics and behavior of the
EAMM over a range of values of n, P, and SDR over
multiple runs. We used a QC simulation software program written by one of the authors (F.A.S.) to determine
the rates of rejection (error detection) of simulated analytical runs with varying degrees of bias imposed on a
normally distributed population.
To ensure valid comparisons of different values of P,
it was necessary to normalize the probability of falsely
rejecting a run. We chose a fixed rate of 0.3% per run of
n patient samples, inasmuch as the EAMM test statistic
is not calculated until n patient samples have accumulated. The sample run thus constitutes the unit of decision (to accept or reject an analytical run) of the EAMM
procedure. We deemed this very low rate of false rejection to be a reasonable target for the EAMM procedure
in order to make the procedure acceptable for routine
use with high volume analyses. Although the EAMM
procedure can be adjusted to yield false rejection at any
level, a rate of .003 per run, regardless of run length, is
analogous to setting the limit of an AON rule at three
times the standard error of the mean as has typically been
reported.2-4
As with AON, we expected the EAMM to be most
effective at low SDR. We also expected the optimal value
of P and n to be dependent on several factors, including
the SDR, the desired performance characteristics of the
procedure, and the percentage of abnormals in the population. The last factor will be evaluated in a future study; the current analysis is limited to normally distributed populations.
MATERIALS AND METHODS
The quality control simulation program is written in
Quick BASIC 4.0 (Microsoft, Redmond, WA) for use on
MS-DOS compatible microcomputers. The simulations
were run on a variety of personal computers powered by
i486 and Pentium microprocessors (Intel, Santa Clara,
CA). The simulator represents an extension of a previously described simulation system10-12 to include a variety of patient sample-based control procedures as well
as classical, known-value control algorithms. In the
EAMM procedure simulation, the following parameters
were under our control: the number of patient samples
per run (n); the exponential factor (P); the truncation/
exclusion limit for outliers; the flagging limit for the
EAMM signal; and the analytical and population variances.
The simulator generates simulated patient data points using a file of pregenerated gaussian integers with a mean of 0 and a standard deviation of 1,000. These are scaled to the user-defined s_bio and added to the patient population mean to yield the "true" patient sample values. To each sample value is added analytical bias, if any, and analytical variation generated analogously from a separate file of gaussian integers. The simulator allows generation of abnormal samples distributed randomly among normal patient samples, but for this study, only a normal population was used. The use of pregenerated random numbers to simulate both biological and analytical variations ensures that each QC algorithm is exposed to identical datasets, and thus comparisons between different algorithms are strictly fair, if arguably not strictly random.
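In outline, each simulated result is assembled as follows. This Python sketch is a re-expression of the Quick BASIC simulator logic; for brevity it uses unit-variance deviates directly rather than integer deviates with a standard deviation of 1,000, and all names are illustrative:

```python
def simulate_result(z_bio, z_meas, pop_mean, s_bio, s_meas, bias=0.0):
    """Assemble one simulated patient result.

    z_bio and z_meas are pregenerated gaussian deviates (mean 0,
    SD 1) read from two separate files, so that every QC algorithm
    is exposed to exactly the same data.
    """
    true_value = pop_mean + z_bio * s_bio       # biologic variation
    return true_value + bias + z_meas * s_meas  # analytical bias + noise
```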
Flagging limits that set a false rejection rate of 0.003
were determined by a simple simulation of the EAMM
procedure using 200,000 sequential gaussian data points
at different n and P. After each batch of n points, the
magnitude of the EAMM signal was tabulated and the
99.7th percentile determined.
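A minimal sketch of this calibration step, reusing the illustrative routines above (taking the magnitude of the EAMM signal as its absolute deviation from the population mean is our reading of the procedure):

```python
def flagging_limit(z_bio_file, z_meas_file, pop_mean, s_bio, s_meas, n, p):
    """Estimate the flagging limit giving a false rejection rate of
    0.003 per run, from error-free simulated data (200,000 points)."""
    results = [simulate_result(zb, zm, pop_mean, s_bio, s_meas)
               for zb, zm in zip(z_bio_file, z_meas_file)]
    mean, magnitudes = pop_mean, []
    for start in range(0, len(results) - n + 1, n):
        mean = eamm_update(mean, results[start:start + n], p)
        magnitudes.append(abs(mean - pop_mean))  # EAMM signal magnitude
    magnitudes.sort()
    # Approximate 99.7th percentile of the error-free distribution.
    return magnitudes[int(0.997 * (len(magnitudes) - 1))]
```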
Output from the simulation program was incorporated into a Lotus 1-2-3 spreadsheet (Lotus Development, Cambridge, MA) for analysis and graphic display.
Each statistical power curve was fitted empirically to the equation:

$$y = 1 - e^{-bx^{m}} \qquad (4)$$

The parameters b and m were determined by a least squares fit of y on x using the Lotus 1-2-3 solver function. This equation is useful to produce smoothed curves that allow for interpolation at any point, as well as for economical transfer of the curves for use (eg, in the design of control systems for routine use).
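For routine reuse, the fit can be reproduced with any least squares optimizer. The sketch below substitutes scipy's curve_fit for the Lotus 1-2-3 solver and runs on made-up illustrative data, not published results:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_curve(x, b, m):
    # Equation 4: probability of rejection versus systematic error x
    # (expressed in multiples of s_pop).
    return 1.0 - np.exp(-b * np.power(x, m))

# Illustrative simulation output: error levels and observed
# first-run rejection probabilities.
x = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 1.1])
y = np.array([0.01, 0.10, 0.38, 0.75, 0.94, 0.99])
(b, m), _ = curve_fit(power_curve, x, y, p0=(1.0, 2.0))
```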
Simulations were performed for n = 20, 30, 40, 60, 80, and 90 for P of 0.3 through 0.7 at intervals of 0.05, with error conditions ranging from zero to at least 4.5 times the s_meas. For each n, P, and error level, a total of 1,000 series of 9 runs each were simulated to calculate probabilities of rejection.
After this preliminary analysis, for each n the range of P values within which the procedure was most sensitive was reanalyzed at intervals of 0.01 for P. The optimal value of P for each n was determined by summing the area over the power function curves for runs 1 through 9 over values of analytical bias ranging from 0 to the value that yielded 100% rejection on the first run. Greater sensitivity to persistent analytical bias is demonstrated by a smaller summed area over the power curve set.
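Schematically, the criterion can be computed as follows; the table of rejection probabilities is assumed to come from the simulations described above, and all names (including the commented-out helper) are illustrative:

```python
import numpy as np

def area_over_power_curves(bias_levels, rejection_probs):
    """Summed area over the power curves for runs 1 through 9.

    rejection_probs[r][k] is the cumulative probability of rejection
    by run r + 1 at analytical bias bias_levels[k].  A smaller summed
    area means the curves rise toward 1.0 sooner, ie, greater
    sensitivity to persistent bias.
    """
    return sum(np.trapz(1.0 - np.asarray(run), bias_levels)
               for run in rejection_probs)

# The optimal P for a given n minimizes this criterion:
# best_p = min(p_candidates, key=lambda p: area_over_power_curves(
#     bias_levels, simulate_power_curves(n, p)))
```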
RESULTS
A representative set of power curves for n = 60 and P = 0.69 is demonstrated in Figure 1. These show the probability of rejection on the first run over a range of error conditions (expressed as multiples of s_meas) for various values of SDR. As predicted, sensitivity for a given level of systematic error increases rapidly as SDR is reduced. This was true for all values of n and P, as well as over multiple runs. When the families of curves demonstrated in Figure 1 are normalized for SDR, they superimpose exactly (see below). Note that when the systematic error expressed as multiples of s_meas (systematic error/s_meas) is divided by SDR (s_pop/s_meas), the s_meas term cancels out and the resultant is systematic error in terms of s_pop (systematic error/s_pop). Therefore, the remainder of the analyses will be presented in terms of composite curves normalized for SDR, with error expressed as multiples of s_pop (see below). These composite curves may be closely approximated by a least squares fitted curve according to equation 4 above (see below).
The effect of increasing n is as predicted, with dramatic increases in sensitivity with increases in n over the range we investigated (Fig. 2). It should be noted that the increase in sensitivity does not scale simply with the total number of samples. For instance, 3 runs of 30 is not equivalent to 1 run of 90. In fact, 1 run of 90 is more sensitive to error than 3 runs of 30, even though it has a lower false rejection rate (0.003 compared with approximately 0.009).
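The quoted rates follow directly: a single run of 90 is one decision at a false rejection rate of 0.003, whereas 3 independent runs of 30 compound to

$$1 - (1 - 0.003)^{3} = 1 - 0.997^{3} \approx 0.009.$$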
The effect of varying P is demonstrated cumulatively over multiple runs in Figure 3 for n = 60 and a fixed level of systematic error.
FIG. 1. Probability of rejection as a function of systematic error (in multiples of s_meas) for n = 60, P = 0.69, over a range of SDR: (1) first run rejection; (2) cumulative third run rejection.
Note that the best performance is obtained on the first run for P = 1 (equivalent to untruncated AON). However, this advantage is quickly lost over multiple runs, and by run nine all P values shown except 0.3 are performing better than AON. This was seen for all values of n except 20. Interestingly, at this level of n, P values of <1 were almost exactly as sensitive as AON, even on the first run, as demonstrated in Figure 4. The optimal P levels for various values of n are shown in Table 2. These vary between 0.63 and 0.70. Families of curves for optimized P for different levels of n are shown in Figure 5. Curves for P = 1 and P = 0.5 are given for comparison. Fitted curves are superimposed on observed curves in these figures.
DISCUSSION
As expected, the EAMM, like the closely related AON,
is most sensitive to systematic error (bias) for assays in
which the ratio of s_pop to s_meas is small. The EAMM is highly responsive to quite small biases for SDR = 3, but this sensitivity falls off quickly as SDR increases toward 7. However, it should be pointed out that for diagnostic use, higher degrees of bias, expressed as multiples of s_meas, may be more tolerable for assays with larger SDR. This is because when s_pop/s_meas is large, small biases may be insignificant compared to the intrinsic variability present within the population, and may therefore not affect diagnostic decisions. However, the relative importance of various amounts of analytical error depends on
both the particular characteristics of the assay and analyte as well as the specific application of the test. For example, consider an analyte with a large between-patient
variability, but which is under very tight physiologic control in an individual patient. Small analytical biases
would be unimportant if the assay were to be used as a
screening procedure, but may assume profound importance if used to follow an important parameter in an individual patient over time in a critical care setting. Examples of SDR values for some common analyses are
shown in Table 1.5
An analysis of the curves for a range of SDR at a given n and P revealed that when the magnitude of error is corrected for differences in SDR (expressed as fractions or multiples of s_pop), the curves are exactly superimposable. These composite curves can be closely approximated by an equation of the form given in equation 4. Thus, the ability of the EAMM to detect a given level of error can be predicted directly from the ratio of the error to the s_pop. This represents the mathematical basis for the observed increased sensitivity at lower SDR. It can be explained intuitively in the following way: consider that we are monitoring differences in the mean of our normal population to detect drift in our analytical process. In effect, the s_pop represents the "noise" against which we are attempting to detect the "signal" (ie, the analytical error): the smaller the signal-to-noise ratio, the lower the power to detect the signal. This signal-to-noise ratio is the sole determinant of the sensitivity of the control procedure, regardless of the value of SDR.
The effect of increasing n, the number of samples analyzed per run, was not surprising in that larger values of
n markedly increased the sensitivity to bias. Also, as
pointed out earlier, large values of n provide better error
detection, with less false rejection, than equal numbers
of samples analyzed in smaller batches. However, the
trade-off of using large values of n is fewer QC data points
at less frequent intervals. In addition, at very large n (eg,
n > 100), the procedure may actually be too sensitive, as
medically insignificant analytical error might signal error conditions at an unacceptably high rate, thus increasing the functional false rejection rate. Thus, the ideal n for the EAMM depends on the desired sensitivity for detecting various amounts of error as well as the number of samples analyzed daily in a given laboratory. An advantage of patient data QC procedures over control material procedures is that they allow the accrual of large numbers of QC data points per day for minimal cost. For low volume assays, high values of n may prohibit the generation of sufficient data points to effectively monitor an analyzer's operation. However, a high volume assay may easily accommodate large values of n for the EAMM calculation.
FIG. 2. Probability of rejection (versus systematic error/s_pop) for n = 20, 30, 40, 60, and 80 at a fixed P (0.7): (A) first run rejection; (B) cumulative third run rejection.

FIG. 3. Probability of rejection as a function of P for runs 1, 2, 3, 4, 5, 7, and 9 for n = 60 at a fixed level of systematic error (0.3 × s_pop).

FIG. 4. Comparison of probability of rejection versus error (systematic error/s_pop) for optimized EAMM (P = 0.66) and AON (P = 1.0) at n = 20 over runs 1, 2, and 5 (right to left).

FIG. 5. Probability of rejection versus systematic error (experimental and fitted) for P = 0.5 (Bull's algorithm), P = optimal level, and P = 1 (AON) over runs 1, 2, 3, 5, and 9 (right to left): (A) n = 20; (B) n = 30; (C) n = 60; (D) n = 80.
Varying the value of P in the EAMM had profound effects on the sensitivity of the analysis, as predicted (Fig. 3). High values of P (eg, 0.9 or 1.0) showed the best early run performance. Intermediate values of P (eg, 0.5-0.7) performed less well in early runs, but showed better performance over multiple runs than higher values. At very low values of P (eg, 0.3), performance remained poor both in early runs and over multiple runs. Thus, it appears that two opposing factors influence the performance of the EAMM: as P is lowered, there is a progressive increase in the dampening of errors, resulting in a loss of information on early runs; however, there is a simultaneous increase in the trend effect, producing an advantage over multiple runs. It turns out that for each value of n there is a point at which these two effects balance and produce a maximal sensitivity to error, as given in Table 2. These values always fell between 0.63 and 0.70. Thus, the optimized EAMM detected error better than either the traditional Bull's algorithm or the AON.

TABLE 2. OPTIMAL VALUES OF P FOR VARIOUS n

n     Optimal P
20    0.66
30    0.64
40    0.63
60    0.69
80    0.70
90    0.63
An interesting exception to the previously described behavior was that the optimized EAMM had essentially equal first-run sensitivity to AON for low n (n = 20). This may be explained by the fact that the dampening effect on random error imparted by the exponential function allows narrower error limits than AON for the same level of false rejection, thus improving sensitivity. However, this first run equivalence is lost for higher values of n, such that AON is superior to EAMM in sensitivity on the first run after error is introduced. This reflects the fact that when n is high (eg, >30), the mean of a given run will very accurately and reliably reflect the mean of the entire population. This will occur regardless of the standard deviation of the population values, such that the advantage of symmetrical dampening of random error, with resultant narrower error limits, imparted by the EAMM is lost. To reiterate, the cumulative sensitivity of EAMM over multiple runs will be superior to AON even for high values of n because of the trend effect in EAMM.
It must be realized that our current analysis is limited
to normally distributed populations lacking abnormal
components. Therefore, our results as described are
strictly applicable only to assays and patient populations
that have a very small percentage of abnormals. We expect that as the percentage of abnormals within a population increases, the sensitivity for a given level of false
rejection will decrease. The AON procedure handles outliers by applying truncation limits outside of which data
are excluded from analysis. For populations without outliers, the procedure performs best without any truncation; the optimal truncation limits progressively narrow as the percentage of outliers increases. Analogously, we would expect the optimal P value for the EAMM to decrease (increased "trim") as the percentage of outliers increases. Alternatively, one could apply truncation limits to the EAMM procedure itself. The effects of outliers on performance and the optimal methods of correcting for them
are currently under investigation and will be the subject
of a subsequent report.
Ultimately, we imagine that a customized EAMM
procedure could be easily incorporated into computerized laboratory quality control systems. Once installed,
it would provide essentially cost-free quality control, although it will likely not replace traditional control materials. In fact, perhaps the most promising use for these
procedures is as an event gauge, or signal, to run a known
value control. For example, the sensitivity of the procedure could be set at a more stringent level than clinically
necessary. An error signal, then, would not precipitate
rejection of a run, but would rather initiate the running
of traditional controls. This could serve to decrease the
use of expensive control materials while maintaining
good control of the system. For instance, using an n of
60 with false rejection set at 0.3%, one false positive result would occur every 20,000 specimens, thus necessitating the use of traditional control materials only at very
infrequent intervals in the absence of true bias. At an
SDR of 5, a systematic error of magnitude two times s_meas
would be detected on the first run after introduction of
error 54% of the time. This increases to 88% after two
runs and 98% after three runs. However, the acceptable
false rejection rate in such a system could be set much
higher, as the cost of a false positive result would not be
the rejection of an entire analytical run, but simply the
cost of a sample of control material. Therefore, the sensitivity of the assay could be markedly increased. In addition, a multi-rule version of EAMM could be used to
improve performance, as described for Bull's algorithm.7
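The quoted false positive interval follows from the run length and the per-run false rejection rate:

$$\frac{60 \text{ specimens/run}}{0.003 \text{ false signals/run}} = 20{,}000 \text{ specimens per false signal.}$$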
REFERENCES
1. Bull BS, Elashoff RM, Heilbron DC, Couperus J. A study of various estimators for the derivation of quality control procedures from patient erythrocytic indices. Am J Clin Pathol 1974;61:473-481.
2. Cembrowski GS, Carey RN. Quality control in hematology. In: Laboratory Quality Management. Chicago: ASCP Press, 1989, pp 186-212.
3. Hoffman RG, Waid NE. The "average of normals" method of quality control. Am J Clin Pathol 1965;43:134-141.
4. Cembrowski GS, Chandler EP, Westgard JO. Assessment of "average of normals" quality control procedures and guidelines for implementation. Am J Clin Pathol 1984;81:492-499.
5. Cembrowski GS. Use of patient data for quality control. Clin Lab Med 1986;6:715-733.
6. Cembrowski GS, Westgard JO. Quality control of multichannel hematology analyzers: Evaluation of Bull's algorithm. Am J Clin Pathol 1985;83:337-345.
7. Levy WC, Hay KL, Bull BS. Preserved blood versus patient data for quality control: Bull's algorithm revisited. Am J Clin Pathol 1986;85:719-721.
8. Lunetzky ES, Cembrowski GS. Performance characteristics of Bull's multirule algorithm for the quality control of multichannel hematology analyzers. Am J Clin Pathol 1987;88:634-638.
9. Tramacere P, Marocchi A, Gerthoux P, et al. Inefficacy of moving average algorithm as principal quality control procedure on Technicon system H6000. Am J Clin Pathol 1991;95:218-221.
10. Smith FA. The effects of long-term components of variance on the performance of rules for statistical quality control (Abstr). Clin Chem 1987;33:210.
11. Smith FA, Cossitt NL. Simulated comparison of multi-rule protocols for statistical quality control (Abstr). Clin Chem 1987;33:909.
12. Smith FA. Statistical power functions of multi-point rules (Abstr). Clin Chem 1986;32:1183-1184.