Empirical Bayes Combination of
Estimated Areas under ROC Curves
Using Estimating Equations
XIAO H. ZHOU, PhD
A synthesis of the empirical Bayes method and the method of estimating equations is used
to combine individual receiver operating characteristic (ROC) area estimates from different
studies of the same diagnostic test into a single estimate. This single estimate represents
the population mean from which individual areas under the ROC curves were sampled. The
only data needed to carry out the method are estimated areas under the ROC curves and
the corresponding standard errors. Key words: ROC curves; empirical Bayes method. (Med
Decis Making 1996;16:24-28)
Evaluating the accuracy of diagnostic tests in detecting the presence of disease is very important for both quality of care and cost containment. One way to evaluate the accuracy of a diagnostic test is to estimate the sensitivity and specificity of the test by dichotomizing the test results into a positive result and a negative result. Both of these quantities, however, depend on the confidence threshold used by a specific test reader for calling a positive test, and this dependence confounds the assessment of accuracy. To overcome this problem, we plot sensitivity against 1 - specificity for all confidence thresholds, resulting in the receiver operating characteristic (ROC) curve. Thus, a ROC curve allows us to study the inherent discrimination capability of a diagnostic test.
The area under a ROC curve is the most widely used index for summarizing information contained in a ROC curve. The area under a ROC curve has been shown to be equal to the probability of correctly ranking a (diseased, nondiseased) pair,1 and can be estimated either by the Wilcoxon statistic2 or by the parametric binormal model method.3
Several separate clinical studies, using independent case samples, are often conducted to estimate the accuracy of the same diagnostic test. ROC area estimates of the test results and the corresponding standard errors are reported in the studies. Then, the question is how to arrive at a better estimate of the true area under the ROC curve of the diagnostic test results using data from all the studies. McClish4 presented a method to combine estimated areas under the ROC curves across different studies. Her method assumes that the studies are homogeneous and that all studies provide estimates for the same true ROC area. In other words, her method assumes that differences between the ROC areas estimated in individual studies are due only to within-study variability (experimental error). However, the true ROC area for the ith study might be affected, for example, by the design and execution of the study and the characteristics of the patients enrolled. Thus, differences in the ROC areas estimated in individual studies might come both from within-study variability and from actual differences between studies.5
The empirical Bayes (EB) method does not assume that individual studies all have the same true ROC area, and provides a simple way to express study-level heterogeneity with a two-stage model.6-8 Applications of the EB method to meta-analysis in clinical trials have become very popular.9-11 However, the EB method has not been applied to ROC studies. In this paper, we apply the EB method to combine ROC area estimates across studies without assuming that different studies have the same true ROC area. Also, we apply the method of estimating equations (a version of the method of moments) to estimate the unknown population parameters. The next section describes the Bayes framework in the context of the areas under ROC curves. The method of estimating equations is then used to estimate unknown population parameters. The following section applies the method to one example. The data set in the example is taken from McClish's paper4 concerning the dexamethasone suppression test.
Received May 4, 1994, from the Division of Biostatistics, Department of Medicine, Indiana University School of Medicine, and the Regenstrief Institute for Health Care, Indianapolis, Indiana. Revision accepted for publication January 17, 1995. Supported in part by grant number R29HS08559 from the Agency for Health Care Policy and Research and by PHS NO1-LM-4-3510.
Address correspondence and reprint requests to Dr. Zhou: Division of Biostatistics, Indiana University School of Medicine, Riley Research Wing, RR 135, 702 Barnhill Drive, Indianapolis, IN 46202-5200.
Method
We apply a two-stage model to account for the within-study and the between-study variability. In the first stage, the model assumes that area estimates under the ROC curves for individual studies are conditionally independent, given their true areas. In the second stage, the model assumes that the unobserved true areas for individual studies are a random sample from a distribution with mean $A$ and variance $\tau^2$.
Suppose that $K$ studies investigated the accuracy of the same diagnostic test. All studies had different patient samples. Let $A_i$ denote the true area under the ROC curve for the ith study. An estimate $\hat{A}_i$ for $A_i$ is derived either by a nonparametric method2,12 or by a parametric method.3
The two-stage Bayes model is defined as follows:
• Stage one says the area estimate $\hat{A}_i$ for the ith study has the unknown mean $A_i$ and the known variance $V_i$, given $A_i$:
$$E(\hat{A}_i \mid A_i) = A_i, \qquad \mathrm{var}(\hat{A}_i \mid A_i) = V_i$$
• Stage two assumes that the mean and variance of an unobserved true area $A_i$ are $A$ and $\tau^2$, respectively.
The parameter $A$ represents the population mean of the ROC area of a diagnostic test, and the parameter $\tau^2$ represents the between-study variability.
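The two-stage model can be illustrated with a small simulation. The sketch below assumes normal distributions at both stages, which the model itself does not require (and which ignores the fact that areas lie between 0 and 1), and it uses hypothetical values for the population mean, the between-study variance, and the within-study variance. By the law of total variance, the simulated area estimates should have mean close to $A$ and variance close to $V_i + \tau^2$.

```python
import random

def simulate_area_estimates(pop_mean, tau2, within_var, n_draws, seed=0):
    """Draw area estimates for one study design under the two-stage model.

    Assumption (not part of the model): both stages are normal.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        true_area = rng.gauss(pop_mean, tau2 ** 0.5)        # stage two: true area A_i
        estimate = rng.gauss(true_area, within_var ** 0.5)  # stage one: estimate given A_i
        draws.append(estimate)
    return draws

# Hypothetical values chosen only for illustration.
POP_MEAN, TAU2, WITHIN_VAR = 0.77, 0.005, 0.001

draws = simulate_area_estimates(POP_MEAN, TAU2, WITHIN_VAR, n_draws=100_000)
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

# The two printed pairs should be close to each other.
print(sample_mean, POP_MEAN)
print(sample_var, WITHIN_VAR + TAU2)
```

The second printed pair shows why the within-study variance alone understates the spread of the reported areas when the studies are heterogeneous.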
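Next, we estimate the parameters of interest, $A$ and $\tau^2$. Observe that the marginal mean and variance of $\hat{A}_i$ are
$$E(\hat{A}_i) = A, \qquad \mathrm{var}(\hat{A}_i) = V_i + \tau^2$$
Based on the theory of generalized estimating functions,13,14 the optimal estimating function for the parameter $A$ is
$$U_{1K}(A, \tau^2) = \sum_{i=1}^{K} \frac{\hat{A}_i - A}{V_i + \tau^2} = 0 \qquad (1)$$
Since $\tau^2$ is an unknown parameter, we need an additional estimating equation for $\tau^2$. We propose the following quadratic estimating function:
$$U_{2K}(A, \tau^2) = \sum_{i=1}^{K} \frac{(\hat{A}_i - A)^2 - (V_i + \tau^2)}{(V_i + \tau^2)^2} = 0 \qquad (2)$$
Let $\hat{A}$ and $\hat{\tau}^2$ be the solutions to equations 1 and 2. Gilbert and colleagues15 used similar estimating equations to estimate $A$ and $\tau^2$. However, they did not give variance estimates for $\hat{A}$ and $\hat{\tau}^2$. Godambe and Heyde14 have shown that, given $\tau^2$, the estimate $\hat{A}$ is unbiased and has the smallest variance among all possible estimates of the form $\sum_{i=1}^{K} \hat{A}_i \omega_i / \sum_{i=1}^{K} \omega_i$, where the $\omega_i$ are arbitrary constants. Proposition 1 in the appendix shows that the estimates $\hat{A}$ and $\hat{\tau}^2$ are consistent estimates of $A$ and $\tau^2$ and have an asymptotically joint normal distribution.
From Proposition 1, we see that the variance of $\hat{A}$ can be consistently estimated by
$$\left( \sum_{i=1}^{K} \frac{1}{V_i + \hat{\tau}^2} \right)^{-1} \qquad (3)$$
and that the variance of $\hat{\tau}^2$ can be consistently estimated by
[equation 4: expression not recoverable from the source]
Two computational methods are available to solve equations 1 and 2. The first method is given by the following steps:
• Choose an initial value of $\tau^2$, say $\hat{\tau}^2_0$.
• Estimate $A$ by
$$\hat{A} = \sum_{i=1}^{K} \frac{\hat{A}_i}{V_i + \hat{\tau}^2_0} \Bigg/ \sum_{i=1}^{K} \frac{1}{V_i + \hat{\tau}^2_0}$$
• Find the updated estimate of $\tau^2$, say $\hat{\tau}^2_1$, by solving equation 2 with $A$ fixed at $\hat{A}$:
$$\sum_{i=1}^{K} \left[ \frac{\hat{A}_i - \hat{A}}{V_i + \tau^2} \right]^2 - \sum_{i=1}^{K} \frac{1}{V_i + \tau^2} = 0$$
• Continue this process until convergence.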
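The iterative steps above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function names are hypothetical, the starting value is taken to be zero, and the update of the between-study variance is done by bisection on the quadratic estimating function (equation 2), which is only one of several ways to solve it. The bisection bracket of [0, 1] assumes the inputs are ROC areas, so squared deviations cannot exceed 1. The usage lines apply the sketch to the seven area estimates and standard errors of Table 1.

```python
from math import sqrt

def weighted_mean(areas, variances, tau2):
    """Solution of equation 1 for the mean, given the between-study variance."""
    weights = [1.0 / (v + tau2) for v in variances]
    return sum(w * a for w, a in zip(weights, areas)) / sum(weights)

def u2(areas, variances, mean, tau2):
    """Quadratic estimating function of equation 2."""
    return sum(((a - mean) ** 2 - (v + tau2)) / (v + tau2) ** 2
               for a, v in zip(areas, variances))

def solve_tau2(areas, variances, mean, upper=1.0, n_bisect=200):
    """Root of equation 2 in tau2 for a fixed mean (bisection is an assumption)."""
    if u2(areas, variances, mean, 0.0) <= 0.0:
        return 0.0  # no evidence of between-study variability; truncate at zero
    lo, hi = 0.0, upper
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if u2(areas, variances, mean, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def eb_combine(areas, std_errors, tol=1e-10, max_iter=500):
    """Alternate between equation 1 and equation 2 until tau2 stabilizes."""
    variances = [s ** 2 for s in std_errors]
    tau2 = 0.0  # assumed starting value
    for _ in range(max_iter):
        mean = weighted_mean(areas, variances, tau2)
        new_tau2 = solve_tau2(areas, variances, mean)
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    mean = weighted_mean(areas, variances, tau2)
    std_error = sqrt(1.0 / sum(1.0 / (v + tau2) for v in variances))  # equation 3
    return mean, tau2, std_error

# Area estimates and standard errors of the seven studies in Table 1.
AREAS = [0.789, 0.724, 0.851, 0.876, 0.782, 0.702, 0.652]
STD_ERRORS = [0.057, 0.025, 0.028, 0.029, 0.102, 0.056, 0.038]

mean, tau2, std_error = eb_combine(AREAS, STD_ERRORS)
print(round(mean, 3), round(tau2, 4), round(std_error, 3))
```

For the Table 1 data the paper reports a combined area of 0.77, a between-study variance of 0.0051, and a standard error of 0.03, so the printed values should be close to those figures.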
The second method is for use with an existing statistical package. Note that if the marginal distribution of $\hat{A}_i$ is normal with mean $A$ and variance $V_i + \tau^2$, then the score functions for $A$ and $\tau^2$ from the normal random variables are, up to a constant factor, identical to equations 1 and 2. Thus, equations 1 and 2 can be solved with software designed for maximum likelihood estimation under a normal model. The LE program in BMDP16 is an example. The LE program estimates the parameters that maximize a given likelihood function, using the iterative Newton-Raphson algorithm.
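The link between the estimating equations and the normal score functions can be checked numerically. The sketch below uses hypothetical inputs and function names; it differentiates the normal log-likelihood by central differences and compares the result with the two estimating functions. The score for the mean matches the first estimating function exactly, and the score for the between-study variance matches one half of the second, so both pairs have the same roots.

```python
from math import log, pi

def log_likelihood(areas, variances, mean, tau2):
    """Log-likelihood when each estimate is N(mean, V_i + tau2)."""
    return sum(-0.5 * log(2.0 * pi * (v + tau2))
               - 0.5 * (a - mean) ** 2 / (v + tau2)
               for a, v in zip(areas, variances))

def u1(areas, variances, mean, tau2):
    """Estimating function of equation 1."""
    return sum((a - mean) / (v + tau2) for a, v in zip(areas, variances))

def u2(areas, variances, mean, tau2):
    """Estimating function of equation 2."""
    return sum(((a - mean) ** 2 - (v + tau2)) / (v + tau2) ** 2
               for a, v in zip(areas, variances))

def central_difference(f, x, step):
    """Numerical derivative of f at x."""
    return (f(x + step) - f(x - step)) / (2.0 * step)

# Hypothetical inputs, chosen only for illustration.
areas = [0.80, 0.70, 0.85]
variances = [0.002, 0.001, 0.003]
mean, tau2 = 0.75, 0.004

score_mean = central_difference(
    lambda m: log_likelihood(areas, variances, m, tau2), mean, 1e-6)
score_tau2 = central_difference(
    lambda t: log_likelihood(areas, variances, mean, t), tau2, 1e-7)

print(score_mean, u1(areas, variances, mean, tau2))
print(score_tau2, 0.5 * u2(areas, variances, mean, tau2))
```

Because the scores and the estimating functions vanish at the same parameter values, any general-purpose likelihood maximizer applied to this log-likelihood should return the solutions of the estimating equations.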
If the parameter $\tau^2 = 0$, then the true areas under the ROC curves for individual studies are the same. In other words, the individual studies are homogeneous. In this case, McClish4 suggested using a weighted average of estimated ROC areas from individual studies, $\left(\sum_{i=1}^{K} \hat{A}_i / V_i\right) / \left(\sum_{i=1}^{K} 1 / V_i\right)$, to estimate the common true ROC area. This weighted average is the same as the solution to equation 1 when $\tau^2 = 0$. The classical method to test the hypothesis $H_0: \tau^2 = 0$ is based on the test statistic $Q = \sum_{i=1}^{K} (\hat{A}_i - \hat{A})^2 / V_i$, where $\hat{A}$ denotes that weighted average and the distribution of $Q$ is approximated by a chi-square with $K - 1$ degrees of freedom.10
If we reject the hypothesis that $\tau^2 = 0$, then we have to account for the between-study variability to estimate $A$, the true ROC area of a diagnostic test across all studies. The solution for $A$ in equations 1 and 2 gives a consistent estimate of $A$ after accounting for between-study variability.

Table 1  Data from McClish's Paper4

Study      Area     SE       Negative Cases   Positive Cases
Study 1    0.789    0.057    33               34
Study 2    0.724    0.025    152              215
Study 3    0.851    0.028    79               119
Study 4    0.876    0.029    41               54
Study 5    0.782    0.102    49               52
Study 6    0.702    0.056    31               65
Study 7    0.652    0.038    77               111

An Extension
Sometimes, we can have more information about an individual study than just the estimated area under the ROC curve and corresponding standard error. For example, this information can be the mean age of the patients in the ith study. We then extend our method to incorporate this information in our hierarchical model. Let $X_i$ be the $p$ covariates associated with the ith study, $i = 1, \ldots, K$. Then, the two-stage model is defined as follows:
• Stage one says that the estimated area under the ROC curve for each study has the unknown mean $A_i$ and the known variance $V_i$:
$$E(\hat{A}_i \mid A_i) = A_i, \qquad \mathrm{var}(\hat{A}_i \mid A_i) = V_i$$
• Stage two assumes that the mean of an unobserved area $A_i$ from the ith study is a function of the covariates $X_i$ specific to the ith study:
$$h[E(A_i)] = X_i'\beta$$
where $h$ is a known link function. Also, the variance of $A_i$ is $\tau^2$.
The population parameters $\beta$ and the between-study variability $\tau^2$ can be estimated by the solutions to equations 5 and 6:
$$\sum_{i=1}^{K} X_i \left. \frac{\partial h^{-1}(\eta)}{\partial \eta} \right|_{\eta = X_i'\beta} \frac{\hat{A}_i - h^{-1}(X_i'\beta)}{V_i + \tau^2} = 0 \qquad (5)$$
$$\frac{1}{K - p} \sum_{i=1}^{K} \frac{[\hat{A}_i - h^{-1}(X_i'\beta)]^2}{V_i + \tau^2} - 1 = 0 \qquad (6)$$
Then, the mean ROC area among the population of studies with the characteristics $X_i$ is estimated by $h^{-1}(X_i'\hat{\beta})$, where the parameters $\hat{\beta}$ are the solutions to equations 5 and 6.
Application
As an illustration, we apply the method to an example. The example summarizes the seven studies of the dexamethasone suppression test and was used by McClish as an illustration of her method.4
EXAMPLE
The dexamethasone suppression test (DST) is a simple laboratory assessment of pituitary-adrenal dysregulation that can be used to distinguish various psychiatric disorders, including psychotic depression, schizophrenia, mania, and major depression. Mossman and Somoza17 summarized the accuracy results of seven studies of the DST, including area estimates and corresponding standard errors. The area estimates and standard errors from the seven studies are summarized in table 1.
First, we test the hypothesis that all seven studies are homogeneous ($\tau^2 = 0$). The Pearson chi-square method rejects this hypothesis (p-value < 0.00003). Thus, we cannot use McClish's method to combine the estimated ROC areas from these seven studies. We need to estimate the between-study variability $\tau^2$ to combine the individual area estimates.
The newly developed method described above does not require the assumption that $\tau^2 = 0$, and can be used to estimate $\tau^2$. Thus, we can apply this method to combine the estimated ROC areas from the seven studies. Using the BMDP LE program, we obtain the estimates for $A$ and $\tau^2$: $\hat{A} = 0.77$ and $\hat{\tau}^2 = 0.0051$. Equation 3 gives the corresponding standard error of $\hat{A}$ as 0.03. Thus, the estimate for the population mean of the ROC area is 0.77, with the standard error 0.03.
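The homogeneity test and the inverse-variance weighted average used for the DST data of Table 1 can be sketched as follows. This is an illustration under stated assumptions rather than the computation actually run for the paper: the function names are hypothetical, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom, which happens to hold here (seven studies, six degrees of freedom).

```python
from math import exp

# Area estimates and standard errors of the seven DST studies (Table 1).
AREAS = [0.789, 0.724, 0.851, 0.876, 0.782, 0.702, 0.652]
STD_ERRORS = [0.057, 0.025, 0.028, 0.029, 0.102, 0.056, 0.038]

def fixed_effect_mean(areas, std_errors):
    """Inverse-variance weighted average (the homogeneous-studies estimate)."""
    weights = [1.0 / s ** 2 for s in std_errors]
    return sum(w * a for w, a in zip(weights, areas)) / sum(weights)

def q_statistic(areas, std_errors):
    """Homogeneity statistic: weighted squared deviations from the weighted average."""
    mean = fixed_effect_mean(areas, std_errors)
    return sum((a - mean) ** 2 / s ** 2 for a, s in zip(areas, std_errors))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k   # builds (x/2)^k / k!
        total += term
    return exp(-half) * total

pooled = fixed_effect_mean(AREAS, STD_ERRORS)
q = q_statistic(AREAS, STD_ERRORS)
p_value = chi2_upper_tail_even_df(q, len(AREAS) - 1)
print(round(pooled, 3), round(q, 1), p_value)
```

The weighted average should agree with the value of 0.781 that McClish reported under homogeneity, and the tail probability should fall below the 0.00003 bound quoted for the test.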
Discussion
In this paper, we propose using an empirical Bayes
method to combine estimated ROC areas across studies, accounting for both the within-study variability
and the between-study variability.
An attractive feature of the empirical Bayes method
is that it allows one to borrow strength from all studies
to estimate the population mean of the ROC area without assuming the studies are homogeneous. The
method of estimating equations allows us to estimate
the population parameters of interest without fully
specifying distributions for estimated ROC areas.
By establishing the relationship between our estimating equations and the score functions from a normal random variable, we can use a widely available statistical package, such as BMDP, to solve for $A$ and $\tau^2$ in our estimating equations. Thus, the calculations for carrying out the proposed procedure are easy.
Our estimating equations are similar to the ones proposed by Williams.18 The estimating equations proposed by Williams are equation 1 plus the following equation:
$$\sum_{i=1}^{K} \frac{(\hat{A}_i - A)^2}{V_i + \tau^2} - K = 0$$
However, our estimates are more efficient than the ones proposed by Williams in the sense that our estimates are maximum likelihood estimates if $\hat{A}_i \sim N(A, V_i + \tau^2)$.
In this paper, we assume that the variance $V_i$ is known or can be estimated from the data in the ith study. If $V_i$ is unknown, then we will have an overparameterization problem because the number of unknown parameters is greater than the number of studies. In this case, we can try two possible approaches to solve the overparameterization problem. One is to put a prior distribution on the $V_i$s; the other is to assume that the $V_i$s are the same.
For the data described in the example section, McClish4 gave an estimate of 0.781 for the population mean of the ROC area under the assumption $\tau^2 = 0$. Without this assumption, we gave an estimate of 0.770 with $\hat{\tau}^2 = 0.0051$. It would be interesting to do a simulation study to see how big the between-study variability $\tau^2$ needs to be before significant differences appear between McClish's estimate and our estimate.
The author thanks Dr. Siu Hui for her useful suggestions.
References
1. Bamber D. The area above the ordinal dominance graph and
the area below the receiver operating graph. J Math Psychol.
1975;12:387-415.
2. Hanley JA, McNeil BJ. The meaning and use of the area under
a receiver operating characteristic (ROC) curve. Radiology.
1982;143:29-36.
3. Dorfman D, Alf E. Maximum likelihood estimation of parameters
of signal detection theory and determination of confidence intervals: rating method data. J Math Psychol. 1969;6:487-96.
4. McClish DK. Combining and comparing area estimates across
studies or strata. Med Decis Making. 1992;12:274-9.
5. National Research Council. Combining Information: Statistical
Issues and Opportunities for Research. Washington, DC: National Academy Press, 1992.
6. Efron B, Morris CN. Stein’s paradox in statistics. Sci Am.
1977;236:119-27.
7. Maritz JS, Lwin T. Empirical Bayes Methods. Second edition.
New York: Chapman and Hall, 1989.
8. Morris CN. Parametric empirical Bayes inference: theory and
application. JASA. 1983;78:47-65.
9. Carlin JB. Meta-analysis for 2 X 2 tables: a Bayesian approach.
Stat Med. 1992;11:141-58.
10. DerSimonian R, Laird N. Meta-analysis in clinical trials. Controlled Clin Trials. 1986;7:177-88.
11. DuMouchel WH, Harris J. Bayes methods for combining the
results of cancer studies in humans and other species. JASA.
1983;78:293-315.
12. Hanley JA, McNeil BJ. A method of comparing the areas under
receiver operating characteristic curves derived from the same
cases. Radiology. 1983;148:839-43.
13. Crowder M. On linear and quadratic estimating functions. Biometrika. 1987;74:591-7.
14. Godambe VP, Heyde CC. Quasi-likelihood and optimal estimation. Int Stat Rev. 1987;55:231-44.
15. Gilbert JP, McPeek B, Mosteller F. Progress in surgery and anesthesia: benefits and risks of innovative therapy. In: Bunker JP,
Barnes BA, Mosteller F, eds. Costs, Risks, and Benefits of Surgery.
1977, pp 124-169.
16. Dixon WJ, et al. BMDP Statistical Software. Berkeley: University
of California Press, 1990.
17. Mossman D, Somoza E. Maximizing diagnostic information from
the dexamethasone suppression test. Arch Gen Psychiat.
1989;46:653-60.
18. Williams DA. Extra-binomial variation in logistic linear models.
Appl Statist. 1982;31:144-8.
19. DeLong ER, DeLong DM, Clarke-Pearson D. Comparing the areas
under two or more correlated receiver operating characteristic
curves: a nonparametric approach. Biometrics. 1988;44:837-45.
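APPENDIX
Proposition 1. Assume that the fourth moment of $\hat{A}_i$ exists, and that the following limits exist:
$$\sigma_{11} = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} \frac{1}{V_i + \tau^2}, \qquad \sigma_{12} = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} \frac{E[\hat{A}_i(\hat{A}_i - A)^2] - A(V_i + \tau^2)}{(V_i + \tau^2)^3}$$
$$\sigma_{21} = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} \frac{1}{(V_i + \tau^2)^2}, \qquad \sigma_{22} = \lim_{K \to \infty} \frac{1}{K} \sum_{i=1}^{K} \frac{E(\hat{A}_i - A)^4 - (V_i + \tau^2)^2}{(V_i + \tau^2)^4}$$
Then, $\hat{A}$ and $\hat{\tau}^2$ are consistent estimates of $A$ and $\tau^2$ and have an asymptotically joint normal distribution.
[display of the limiting covariance matrix not recoverable from the source]
The partial derivatives of $U_{1K}(A, \tau^2)$ and $U_{2K}(A, \tau^2)$ with respect to $A$ and $\tau^2$ are
$$\frac{\partial U_{1K}}{\partial A} = -\sum_{i=1}^{K} \frac{1}{V_i + \tau^2}, \qquad \frac{\partial U_{1K}}{\partial \tau^2} = -\sum_{i=1}^{K} \frac{\hat{A}_i - A}{(V_i + \tau^2)^2}$$
$$\frac{\partial U_{2K}}{\partial A} = -2\sum_{i=1}^{K} \frac{\hat{A}_i - A}{(V_i + \tau^2)^2}, \qquad \frac{\partial U_{2K}}{\partial \tau^2} = -\sum_{i=1}^{K} \frac{2(\hat{A}_i - A)^2 - (V_i + \tau^2)}{(V_i + \tau^2)^3}$$
From the strong law of large numbers, we know that as $K \to \infty$
[display not recoverable from the source]
Therefore, for large $K$, applying the Taylor expansion to our estimating equations yields:
[display not recoverable from the source]
This completes the proof of Proposition 1.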