ANATOMIC PATHOLOGY
Original Article
Pathology and Probability
Likelihood Ratios and Receiver Operating Characteristic
Curves in the Interpretation of Bronchial Brush Specimens
STEPHEN S. RAAB, MD, PATRICIA A. THOMAS, MD, JULIA C. LENEL, PHD,
KENT BOTTLES, MD, KAREN M. FITZSIMMONS, MD, M. SUE ZALESKI, SCT(ASCP),
RICHARD J. WITTCHOW, MD, LARON W. McPHAUL, MD, DANIEL D. SLAGEL, MD,
AND MICHAEL B. COHEN, MD
Diagnoses in pathology often are qualitative, such as atypical or suspicious, and consequently are thought to have limited clinical value. To investigate the utility of a qualitative diagnostic system, seven pathologists retrospectively evaluated 100 bronchial brush specimens using the following categories: definitely benign, probably benign, possibly malignant, probably malignant, and definitely malignant. The likelihood ratio (LR) and receiver operating characteristic (ROC) curve, two statistical probabilistic measurements, were used to calculate diagnostic accuracy among individuals and groups. The results show: (1) the LR for individual diagnostic categories varied among observers, resulting in different clinically malignant probabilities; (2) observer experience did not appear to play a role in overall diagnostic accuracy, except in the diagnosis of small cell carcinoma; (3) observers operate at higher levels of diagnostic accuracy with, rather than without, clinical history. The authors conclude that qualitative diagnoses contain important information and can be interpreted effectively with LR and ROC. (Key words: Statistics; Anatomic pathology; Cytology; Diagnosis; Quality assurance/control) Am J Clin Pathol 1995;103:588-593.

Diagnoses in anatomic pathology, being dependent on human judgment, are embodied with a certain element of subjectivity. If the pathologist thinks that the biopsy findings are normal or are pathognomonic of certain disease processes, the pathology report is unequivocal and straightforward. If the biopsy findings are abnormal, but not pathognomonic of any particular disease, the pathology report often is couched in qualitative terms such as suspicious, suggestive or most likely. Many have decried the use of qualitative diagnoses because of their ambiguous meaning and limited clinical value.1-5 The argument is well taken. How can a clinician initiate protocol chemotherapy on a diagnosis of probable lymphoma?

In addition to this language problem, there is the difficulty of incorporating clinical facts into the pathology diagnosis. Pathologists often want to know the clinical history or the radiologic findings before making a final diagnosis. An example of this is the conviction that perusal of the roentgenographic findings is essential before making the pathology diagnosis of any bone tumor. This inclination results in the "double counting" of clinical facts;1 the pathology report represents more than a diagnosis made on morphologic findings. If unaware of this procedure, the clinician will be biased and overestimate the clinical findings in the decision making process.
To provide a solution for these problems, in 1981 Schwartz
and colleagues proposed a new approach to the interpretation
of biopsy specimens.1 These authors advocated the use of a conditional probabilistic technique in the reporting of diagnoses.
For example, if the pathology findings were not pathognomonic and the differential diagnosis included malignant tumor, benign tumor, and infection, the pathologist, without
clinical history, should issue a pathology report expressing the
conditional probability of each of these conditions. This article
evoked a great deal of controversy, and little advancement has
been made in addressing these issues.
We believe that in certain areas in pathology, such as cytopathology, a qualitative diagnostic reporting system already expresses these probabilistic concepts. This view requires a shift
in focus, from the idea that the pathology diagnosis provides a
correct answer to one in which the pathology diagnosis is simply a laboratory test that necessarily exhibits uncertainties and
errors.6 This conception places the focus on measuring the diagnostic accuracy of the pathology diagnosis. Instead of standard sensitivity and specificity measurements, diagnostic accuracy can be measured with the likelihood ratio (LR) and
receiver operating characteristic (ROC) curve, which incorporate a probabilistic technique.7-14 The LR and ROC have been applied to clinical pathology and radiology data, but seldom have been used in the anatomic pathology domain.15-26
In this report, using the bronchial brush (BB) specimen as an example, we show the clinical utility of using the LR and ROC curve in the evaluation of qualitative diagnoses.

From the Department of Pathology, University of Iowa, Iowa City, Iowa.

Manuscript received June 6, 1994; revision accepted September 30, 1994.

Address correspondence to Dr. Raab: University of Iowa Hospitals and Clinics, Department of Pathology, 200 Hawkins Drive 5216 RCP, Iowa City, IA 52242-1009.

MATERIALS AND METHODS

One hundred bronchial brush cases retrospectively were selected from the cytology files from the years 1991-1993 at the
University of Iowa Hospitals and Clinics. Each of the BBs consisted of two slides. All cases had histologic follow-up and 6 to
18 months (mean 13 months) of clinical follow-up. Fifty cases
had malignant histologic confirmation and fifty cases had benign histologic and clinical confirmation. The original cytologic diagnoses were placed in four categories: benign (35
cases), atypical (24), suspicious (14), and malignant (27).
Some of the cases were diagnostically difficult, whereas others were straightforward. The malignant cases consisted of 41
non-small cell carcinomas, seven small cell carcinomas, one
carcinoid, and one lymphoma. The benign cases exhibited a
spectrum of cytologic findings ranging from no diagnostic abnormality to viral inclusions to acute inflammation.
Each case was randomly re-labeled a number from 1 to 100.
The slides then were screened by an experienced cytotechnologist who was unaware of the previous cytologic diagnoses and
clinical histories. The cytotechnologist placed five dots per
slide, marking the areas that were diagnostic or most worrisome for malignancy. In the benign cases, these areas often exhibited reactive or degenerative changes.
The cases were divided into three groups and passed among
the observers. Clinical histories were not provided. The observers were instructed to first concentrate on the dotted regions,
and then, only if uncertain, to look elsewhere on the slide. Each
observer scored the cases in one, and only one of the following
qualitative categories: definitely benign, probably benign, possibly malignant, probably malignant, and definitely malignant.
These categories spanned the spectrum of certainty of malignancy. Tumor classification was not requested. Standardized
answer forms were given to each observer with instructions as
to how to complete the forms. The observers did not consult
other study participants.
The observers had different levels of experience in the interpretation of BBs. Observers 3 and 5 were considered the more
experienced cytopathologists; these individuals were senior faculty members with formal cytopathology training and had been
signing out for more than 5 years. The other observers were
considered the less experienced cytopathologists. Observers 1,
2, 4, 6, and 7 consisted of a junior faculty member, two cytopathology fellows and two residents who viewed the slides at the
end of a 3-month block of cytology training.
The LR for malignancy for each observer for each diagnostic category was calculated according to previously described methods.7-10 To briefly summarize, the LR of a diagnostic category is the quotient of the proportion of individuals with disease who have a particular diagnosis to the proportion of individuals without disease who have that particular diagnosis. Given the pre-test clinical probability of disease, the LR can be used to calculate the post-test probability of disease. The LR is related to the odds of disease by the equation:

Post-test odds = Pre-test odds × LR

The odds of disease and probability of disease are related by the following equations:

Odds = Probability/(1 − Probability)

Probability = Odds/(1 + Odds)

LRs can range from 0 to ∞. A LR < 1.0 lowers the post-test probability of disease from the pre-test probability of disease; a LR = 1.0 does not alter the post-test probability of disease; and a LR > 1.0 raises the post-test probability of disease from the pre-test probability of disease.
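These relationships can be sketched directly in code. The following Python functions are a minimal illustration of the formulas above; the counts in the usage example are hypothetical and do not come from this study.

```python
def category_lr(diseased_with_dx, total_diseased,
                nondiseased_with_dx, total_nondiseased):
    """LR of a diagnostic category: the proportion of diseased patients
    given this diagnosis divided by the proportion of nondiseased
    patients given the same diagnosis."""
    p_dx_given_disease = diseased_with_dx / total_diseased
    p_dx_given_no_disease = nondiseased_with_dx / total_nondiseased
    return p_dx_given_disease / p_dx_given_no_disease

def post_test_probability(pre_test_prob, lr):
    """Convert probability to odds, multiply by the LR, convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)   # Odds = P/(1 - P)
    post_odds = pre_odds * lr                        # Post-test odds = pre-test odds x LR
    return post_odds / (1 + post_odds)               # P = Odds/(1 + Odds)

# Hypothetical example: 10 of 50 diseased and 2 of 50 nondiseased
# patients received a given diagnosis, so LR = 0.20/0.04 = 5.0.
lr = category_lr(10, 50, 2, 50)
print(post_test_probability(0.5, lr))  # a 50% pre-test probability rises to ~0.83
```

Because the LR multiplies odds rather than probabilities, the same diagnosis shifts the post-test probability by different amounts at different pre-test probabilities.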
The ROC curves were constructed as described by Dorfman.11,12 A ROC curve is a graphic plot fitted to pairs of true positive (TP) rates (sensitivity) and false positive (FP) rates (100% minus specificity) for a given observer as the criterion for making a diagnosis is varied. Each criterion gives rise to one point on the curve. Sensitivity is a measure of the percentage of patients with positive test results among all diseased patients who were evaluated. Specificity is a measure of the percentage of patients with negative test results among all nondiseased patients who were evaluated. Sensitivity and specificity can be expressed as follows:

Sensitivity = True positives/(True positives + False negatives)

Specificity = True negatives/(True negatives + False positives)

An ideal test would have a sensitivity and specificity of 100%, although in practice these parameters tend to be inversely related.
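In code, the two definitions reduce to ratios of counts; this small Python sketch uses hypothetical counts for illustration.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of diseased patients who test positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of nondiseased patients who test negative."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical example: 45 of 50 cancers detected, and 40 of 50
# benign cases correctly called negative.
print(sensitivity(45, 5))    # 0.9
print(specificity(40, 10))   # 0.8
```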
Conceptually, a ROC curve can be computed along the following lines. First, all five of the diagnostic categories are assumed to correspond to the presence of disease. This corresponds to a sensitivity of 1.0 (all of the patients with cancer are correctly diagnosed) and a specificity of 0 (all of the patients without cancer are incorrectly diagnosed). Next, all categories except definitely benign are assumed to correspond to the presence of cancer and the proportions of TPs and FPs are calculated. This yields a point of decreased sensitivity and increased specificity. Next, the combined categories of definitely malignant, probably malignant, and possibly malignant are assumed to correspond to the presence of cancer and the TPs and FPs are calculated. This process is continued until all the diagnostic categories are combined and assumed to correspond to the absence of disease. This yields a point corresponding to a sensitivity of 0 and a specificity of 1.0. In total, six points are calculated.
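The procedure above can be sketched as follows; the per-category counts are hypothetical and merely stand in for one observer's tallies over the 50 malignant and 50 benign cases.

```python
# Hypothetical counts per diagnostic category, ordered from
# definitely benign to definitely malignant.
malignant_cases = [2, 3, 5, 10, 30]   # histologically malignant follow-up
benign_cases    = [30, 10, 6, 3, 1]   # benign follow-up

def roc_points(diseased, nondiseased):
    """Six (FP rate, TP rate) pairs obtained by successively treating
    fewer of the ordered categories as positive for malignancy."""
    n_dis, n_non = sum(diseased), sum(nondiseased)
    points = []
    for k in range(len(diseased) + 1):     # k = first category still called positive
        tp = sum(diseased[k:])             # diseased patients called positive
        fp = sum(nondiseased[k:])          # nondiseased patients called positive
        points.append((fp / n_non, tp / n_dis))
    return points

for fp_rate, tp_rate in roc_points(malignant_cases, benign_cases):
    print(fp_rate, tp_rate)   # runs from (1.0, 1.0) down to (0.0, 0.0)
```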
By convention, a ROC curve is plotted with the true positive
rate along the vertical axis and the false positive rate along the
horizontal axis. A ROC curve for an optimal observer rises steeply along the vertical axis and then travels along the upper border of the graph. A ROC curve for an observer making random guesses is represented by a straight, 45° line. The ROC curves of observers who exhibit less than 100% diagnostic accuracy and who do not randomly guess generally fall somewhere between these two curves.
ROC curves were calculated and plotted with either RSCORE III or RSCORES, ROC curve analysis programs.11,12
With RSCORE III, individual observer curves were calculated;
information was generalized over cases. With RSCORES, data
across observers were pooled to calculate a ROC curve; information was generalized over observers. Parametric ROC curve
calculating programs were used, because the data consisted of
five discrete points and were not a continuous function.
With RSCORE III, the diagnostic accuracy of each observer
was represented by Az, which corresponds to the proportion of the total area of the ROC graph that lies under the binormal
ROC curve. Az values generally range from 0.5 to 1.0; an Az
value of 0.5 corresponds to the area under a straight 45° line
(random guessing), whereas an Az value of 1.0 corresponds to
the area under the curve of an optimal observer. The standard
error and 95% confidence interval also were calculated for each
observer.
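The RSCORE programs fit a binormal model to estimate Az; as a rough, purely illustrative stand-in, the area under an empirical ROC curve can be approximated nonparametrically with the trapezoidal rule:

```python
def trapezoidal_auc(points):
    """Trapezoidal-rule area under an empirical ROC curve, given
    (FP rate, TP rate) pairs. This is a nonparametric approximation,
    not the binormal Az fitted by RSCORE III or RSCORES."""
    pts = sorted(points)   # order by increasing FP rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# The 45-degree line of a random guesser encloses an area of 0.5,
# and an optimal observer's curve encloses an area of 1.0.
print(trapezoidal_auc([(0, 0), (0.5, 0.5), (1, 1)]))   # 0.5
print(trapezoidal_auc([(0, 0), (0, 1), (1, 1)]))       # 1.0
```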
TABLE 1. LIKELIHOOD RATIOS FOR MALIGNANCY FOR THE SEVEN OBSERVERS

Observer   Definitely Benign   Probably Benign   Possibly Malignant   Probably Malignant   Definitely Malignant
1          0.19                0.49              1.47                 1.72                 9.80
2          0.39                1.06              1.33                 3.19                 12.77
3          0.24                0.67              ∞                    2.00                 9.00
4          0.11                0.45              1.63                 ∞                    21.42
5          0.27                1.25              0.67                 ∞                    5.17
6          0.32                1.75              5.00                 ∞                    19.00
7          0.45                2.67              ∞                    ∞                    ∞
Three ROC curves were calculated for each observer: for the
combined malignant cell types, for small cell carcinoma and
for non-small cell carcinoma. Mean Az values were calculated
for the group of experienced observers and for the group of less
experienced observers. The significance of the difference between the mean diagnostic accuracy for the combined malignant cell types of the more and of the less experienced observers
was determined by a one-tailed t-test. Analysis of variance
(ANOVA) was used to determine the effect of cell type (small
cell carcinoma and non-small cell carcinoma) for the more and
less experienced observers. Finally, the data for all observers
were pooled, and a single ROC curve was calculated using
RSCORES and represented the ROC curve without clinical
history. A ROC curve also was calculated from the original cytologic diagnoses and represented the ROC curve with clinical
history.
RESULTS
The LRs for malignancy for each of the seven observers for
the five diagnostic categories are shown in Table 1. Several of
the cells contain a LR of ∞, which usually indicated few diagnoses were placed in these categories. Thus, an observer, like
observer 7, who made few possibly malignant, probably malignant and definitely malignant diagnoses can be considered to
be "conservative" because of the preponderance of benign diagnoses. The other observers were more definitive and placed a
greater number of diagnoses in the definitely benign and definitely malignant categories. The number of cases placed in the
same diagnostic category by different observers varied considerably.
An example of how the LR can be applied to a clinical scenario follows. Suppose a patient had a lung mass and, clinically,
the suspicion of malignancy was 65%. A BB was performed and
interpreted by observer 2. If the cytologic diagnosis was definitely malignant, using the formulas in the materials and methods, the post-BB probability of malignancy would be 96%. Similarly, if the BB diagnosis was definitely benign, probably benign, possibly malignant, or probably malignant, the post-BB probability of malignancy would be 42%, 66%, 71% and 86%, respectively. For observer 1, the post-BB probability of malignancy for the five diagnostic categories in order from benign to malignant given the same pre-BB probability of malignancy (65%) would be 26%, 48%, 73%, 76%, and 95%, respectively.
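The figures for observer 2 can be reproduced directly from that observer's likelihood ratios in Table 1:

```python
def post_test_probability(pre_test_prob, lr):
    """Post-test odds = pre-test odds x LR, converted back to a probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

# Observer 2's LRs from Table 1, definitely benign through definitely malignant
observer_2_lrs = [0.39, 1.06, 1.33, 3.19, 12.77]
pre_bb = 0.65   # clinical suspicion of malignancy before the bronchial brush

post_bb = [round(100 * post_test_probability(pre_bb, lr)) for lr in observer_2_lrs]
print(post_bb)   # [42, 66, 71, 86, 96], matching the text
```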
A ROC curve measuring the diagnostic accuracy for the diagnosis of malignancy for each of the seven observers is plotted in Figure 1. The Az values, standard error and 95% confidence intervals are shown in Table 2. The mean Az values for the group of more experienced observers and for the group of less experienced observers are shown in Table 3. A t-test for the difference between the two means (ie, mean accuracy for the more experienced observers vs mean accuracy for the less experienced observers) was not significant at α = 0.05 (t = 0.203, P = .849).
The Az values measuring the diagnostic accuracy of the malignant categories small cell carcinoma and non-small cell carcinoma of the seven observers and for the groups of more experienced and less experienced observers are shown in Table 4.
For this sample, it appears there is an interaction between level
of experience and cell type. The more experienced observers
performed at a higher level of diagnostic accuracy in the diagnosis of small cell carcinoma, and both groups performed at an
equal level of diagnostic accuracy in the diagnosis of non-small
cell carcinoma. A two-way mixed ANOVA, with cell type and experience being fixed and cases and observers being random, was run on the data. The effects of cell type and experience were not significant (P = .16 and P = .35, respectively), and the interaction of cell type and experience also was not significant at the α = 0.05 level.
A ROC curve measuring the diagnostic accuracy of the original cytologic diagnoses, when clinical history was provided, is
plotted in Figure 2 (Az = 0.974). This is contrasted to the ROC
curve calculated from the pooled data from the seven observers, who operated without clinical history (Az = 0.841).
DISCUSSION
Both clinicians and pathologists often fail to realize that morphologic observations are a laboratory test and are an estimation of the probability of occurrence of a particular disease.6
Ambiguities in the pathology reporting of diagnoses are a manifest expression of this likelihood. These ambiguities can be
dealt with by either altering the approach in the reporting of
diagnoses to reflect diagnostic probabilities or by using statistical methods, such as the LR and ROC curve, which effectively
convey probabilities.
Schwartz and coworkers chose the first approach, using a tabular form of Bayes' rule.1 They advocated a diagnostic reporting schema that lists the individual conditional probabilities of
selected disease conditions given the observed morphologic
findings. For example, the morphologic findings in a fictitious
lung biopsy might be interpreted as 5% conditional probability
of inflammation, 20% conditional probability of benign tumor,
and 80% conditional probability of malignant tumor. Based on
these conditional probabilities, the clinical probability of any
of these entities could then be calculated. This approach by
Schwartz and colleagues is quite useful, although it is unappealing to most pathologists because it is not thought to accurately reflect the normal thinking process.

We propose the use of statistical methods that measure the accuracy of pathology diagnoses in probabilistic terms. This requires a switch from the commonly utilized measurements of sensitivity and specificity, which are limited in several aspects. Sensitivity and specificity apply only to binary data (ie, the presence or absence of disease) and not to qualitative probabilistic data. A second problem with sensitivity and specificity is that these measurements fail to convey the clinical probability of disease in an individual patient given a particular test result. The LR and ROC curve analysis effectively handle both of these difficulties and represent an extension of Bayes' rule.

TABLE 2. DIAGNOSTIC ACCURACY (Az) VALUES FOR MALIGNANCY FOR SEVEN OBSERVERS

Observer   Area (Az)   Standard Error   95% Confidence Interval
1          0.823       0.043            0.74 ≤ Az ≤ 0.91
2          0.747       0.055            0.64 ≤ Az ≤ 0.85
3          0.878       0.039            0.80 ≤ Az ≤ 0.95
4          0.912       0.030            0.85 ≤ Az ≤ 0.97
5          0.853       0.043            0.77 ≤ Az ≤ 0.94
6          0.888       0.050            0.79 ≤ Az ≤ 0.99
7          0.921       0.020            0.88 ≤ Az ≤ 0.96

TABLE 3. DIAGNOSTIC ACCURACY (Az) VALUES FOR MALIGNANCY FOR THE POOLED MORE EXPERIENCED OBSERVERS AND THE POOLED LESS EXPERIENCED OBSERVERS

Level of Experience   Area (Az)   Standard Deviation
Less                  0.858       0.073
More                  0.866       0.018

TABLE 4. DIAGNOSTIC ACCURACY (Az) VALUES FOR SMALL CELL CARCINOMA AND FOR NON-SMALL CELL CARCINOMA FOR SEVEN OBSERVERS, POOLED MORE EXPERIENCED OBSERVERS AND POOLED LESS EXPERIENCED OBSERVERS

Observer(s)   Small Cell Carcinoma Az   SE      Non-Small Cell Carcinoma Az   SE
1             0.752                     0.124   0.835                         0.043
2             0.454                     0.182   0.787                         0.052
3             0.888                     0.057   0.877                         0.042
4             0.847                     0.089   0.916                         0.030
5             0.847                     0.057   0.848                         0.048
6             0.536                     0.326   0.922                         0.036
7             0.830                     0.130   0.928                         0.019
Less          0.684                     0.092   0.878                         0.032
More          0.868                     0.029   0.863                         0.015

SE = standard error.

FIG. 1. ROC curves measuring the diagnostic accuracy for the diagnosis of malignancy for seven observers. [Figure not reproduced; axes: p(TP), the true positive rate, vs p(FP), the false positive rate.]

FIG. 2. ROC curves measuring the diagnostic accuracy of the seven observers without clinical history (pooled data, Az = 0.841) and of the original cytologic diagnoses with clinical history (Az = 0.974). [Figure not reproduced; axes as in Fig. 1.]

In this study, we investigated the utility of the LR and ROC curve in the interpretation of the prototypical BB specimen. Several conclusions can be drawn:

1. Individual observers use probabilistic categories differently. Thus what some observers mean by qualitative diagnoses such as atypical may be different from what others mean.
This can be seen with the LR. In this study, the post-BB
effect of the diagnosis of probably benign varied; for observers 1, 3 and 4, the post-BB probability of malignancy is lowered (LR <1.0), whereas for observers 2, 5, 6, and 7, it is
increased (LR > 1.0). There also can be large differences in
degrees of effect; for the diagnosis definitely malignant, the
post-BB probability of malignancy is much higher for observer 6 (LR = 19.00) than for observer 5 (LR = 5.17).
2. Diagnoses express probability of disease. The pathology diagnosis should be used in conjunction with the clinical likelihood of disease to predict the post-test probability of disease. For example, the cytologic diagnosis of definitely
malignant does not indicate that a lesion has to be malignant. In fact, if the clinical likelihood of disease is less than
100%, for all observers in this study, a definitely malignant
diagnosis does not imply absolute certainty of malignancy;
a diagnosis of definitely malignant only indicates that the
post-BB probability of malignancy increases above the pre-BB clinical probability. Likewise the cytologic diagnosis of
definitely benign does not imply that there is no possibility
of malignancy.
3. Some observers are more conservative in diagnosis (observer 7) than others. Reasons include lack of experience
and natural inclinations. Although observer 7 exhibited a
similar level of diagnostic accuracy as the other observers,
the predictive power of certain diagnostic categories, such as
definitely malignant, is less meaningful. Although "accurate," a diagnosis of observer 7 may not be clinically useful.
4. For many types of specimens, the currently used diagnostic
format of expressing probabilities through qualitative terms
is adequate. Likelihood ratios also can be calculated for any
other qualitative diagnoses such as suspicious or most likely;
in addition, LRs can be calculated for the probability of occurrence of any disease process in any organ, such as Pneumocystis pneumonia in bronchoalveolar lavage specimens,
metastatic disease in liver biopsies, or lymphocytic thyroiditis in thyroid fine-needle aspiration biopsies (FNABs).
Confidence intervals (CIs) can be calculated for each LR for each observer. The CIs for observer 1 are: definitely benign, 0.07, 0.49; probably benign, 0.08, 0.80; possibly malignant, 0.68, 3.20; probably malignant, 0.63, 4.72; and definitely malignant, 2.80, 34.31. Because of CI overlap, these data collapse into three categories; the categories definitely benign and probably benign can be combined and the categories possibly malignant and probably malignant can be combined. The category definitely malignant remains unchanged. This collapse into three categories may be a more realistic representation of how cytologists really think (benign, atypical or suspicious, and malignant). In this schema, a benign (definitely benign/probably benign) diagnosis would still lower the post-BB probability of malignancy; an atypical or suspicious (possibly malignant/probably malignant) diagnosis, being centered around a LR of 1.0, would not affect the post-BB probability of malignancy; and a definitely malignant diagnosis would increase the post-BB probability of malignancy. This collapse of categories removes the LR of ∞ in cells that contain few data points.

Other non-binary diagnoses, such as probable, consistent with or most likely are used with relative frequency in diagnostic pathology. The probability of disease associated with these categories usually lies between the probabilities associated with the categories of definitely benign and definitely malignant. For example, the category consistent with malignancy implies less certainty of malignancy than a diagnosis of definitely malignant, but more certainty than a diagnosis of suspicious. Further studies are needed to investigate the probabilities associated with these terms.

In actual practice, many clinicians, especially surgeons and oncologists, want binary diagnoses, their argument being that treatment is dependent on the presence or absence of disease and not on the probability of disease. Clinicians need to act and the pathologic diagnosis provides information on which this action is based. It often is easiest to make a clinical decision if the pathologic diagnosis lies at either end of the spectrum (ie, definitely malignant or definitely benign). A definitely malignant diagnosis is acted on as if there really is cancer, regardless of the probability of malignancy. However, in truth, this action is based on a number of factors, such as clinical history and physical findings, and the pathologic diagnosis is just one piece in the puzzle.

These findings indicate that cytopathologists cannot and perhaps should not always render "black-and-white" diagnoses and that non-binary results have an important role in pathology. At the University of Iowa as elsewhere, non-binary diagnoses effect different clinical responses, depending on the clinical scenario. Following a non-binary BB diagnosis, a clinician may do nothing, repeat the test, order another test or start treatment. For example, if the BB diagnosis is atypical and the patient is young and without risk factors for carcinoma, the clinician may do no further work-up. With the same BB diagnosis in an older patient who has a lung mass and is presumably operable, the clinician may repeat the BB or move on to another test, such as fine-needle aspiration. In this case, the inability to issue a malignant diagnosis may be attributed to a sampling error. In an inoperable patient with brain metastases, the same BB diagnosis may be enough to initiate radiotherapy. For proper patient work-up or treatment, detailed communication of the pathologic findings to the clinical staff is key.

The overall diagnostic accuracy of individuals or groups can be expressed with ROC curves, which also can be used to evaluate a number of variables including the effects of observer experience or clinical history. Using ROC curve analysis, Cohen and coworkers previously showed that in breast FNAB, experienced observers performed at a higher level of diagnostic accuracy than less experienced observers.17 In this study, a similar conclusion cannot be made in the interpretation of BB specimens for the diagnosis of malignancy. Possible explanations are that the sample size was not large enough to show an effect of observer experience, too few "difficult" cases were included or that experience does not play a key role.

Interestingly, although not statistically significant, the more experienced observers performed at a higher level of diagnostic accuracy than the less experienced observers in the interpretation of BBs with small cell carcinoma. In fact, some less experienced observers exhibited Az values of less than 0.5. This trend was not observed for non-small cell carcinomas. This finding has an important clinical impact, because the diagnosis of small cell carcinoma elicits a different treatment protocol than does the diagnosis of non-small cell carcinoma. Additional studies are needed to further characterize this trend.

ROC curve analysis showed that the absence of clinical information appears to lower diagnostic accuracy in the interpretation of BBs. In practice, because of the occurrence of atypical cells in both benign and malignant conditions, cytopathologists generally are reluctant to make a diagnosis of malignancy without knowing the clinical facts, such as the age
of the patient. With clinical information, findings that are suspicious for malignancy in an elderly patient may be reactive in
a young patient with a history of AIDS and presumed pneumonia. Consequently, without clinical history, observers generally
are more conservative; diagnoses called definitely malignant
with a clinical history may be called probably malignant or possibly malignant without history. An important point shown in
this study is that despite the absence of clinical information,
observers still operate at high levels of diagnostic accuracy,
even though fewer cases are called definitely benign or definitely
malignant. For the diagnostic accuracy of a malignant diagnosis, the Az values ranged from 0.747 to 0.921. The absence of
clinical information eliminates the problem of "double counting" clinical data. However, if not providing clinical information, clinicians must be willing to eschew more definitive diagnoses in favor of more qualitative diagnoses that can be
interpreted effectively with the LR.
In summary, the LR and ROC curve are two statistical techniques that express the probability of disease. Together with a rethinking of what anatomic pathologists really do, these two techniques can help resolve the communication problems between clinicians and pathologists.
REFERENCES
1. Schwartz WB, Wolfe HJ, Pauker SG. Pathology and probabilities:
A new approach to interpreting and reporting biopsies. N Engl
J Med 1981;305:917-923.
2. Bryant GD, Norman GR. Expressions of probability: Words and numbers. N Engl J Med 1980;302:411.
3. Toogood JH. What do we mean by "usually"? Lancet 1980;1:
1094.
4. Selvidge J. Assigning probabilities to rare events (PhD dissertation). Cambridge, MA: Harvard University, 1972.
5. Tversky A, Kahneman D. Judgment under uncertainty: Heuristics
and biases. Science 1974; 185:1124-1131.
6. Valenstein P. Technology assessment for the diagnostic laboratory.
American Society of Clinical Pathologists National Meeting.
ASCP Special Topics Council Commission on Continuing Education: Should this test be done? Lecture notes 1992; 1-16.
7. Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical epidemiology: A basic science for clinical medicine, 2nd ed. Boston: Little, Brown, 1985.
8. Radack KL, Rouan G, Hedges J. The likelihood ratio: An improved measure for reporting and evaluating diagnostic test results. Arch Pathol Lab Med 1986; 110:689-693.
9. Giard RW, Hermans J. Interpretation of diagnostic cytology with
likelihood ratios. Arch Pathol Lab Med 1990; 114:852-854.
10. Raab SS. Diagnostic accuracy in cytopathology. Diagn Cytopathol
1994;10:68-75.
11. Dorfman DD, Berbaum KS. RSCORE-J: Pooled rating method
data—a computer program for analyzing pooled ROC curves.
Behav Res Methods Instruments Comput 1986; 18:452-462.
12. Dorfman DD. RSCORE II. In: JA Swets, RM Pickett, eds. Evaluation of diagnostic systems: Methods from signal detection theory. New York: Academic Press, 1982.
13. Godfrey K. Statistics in practice: Comparing the means of several groups. N Engl J Med 1985;313:1450-1456.
14. Lusted LB. Introduction to Medical Decision Making. Springfield,
IL: Charles C. Thomas Publishers, 1968.
15. Robertson EA, Zweig MH, Van Steirteghem AC. Evaluating the
clinical efficacy of laboratory tests. Am J Clin Pathol 1983; 79:
78-86.
16. Langley FA, Buckley CH, Taster M. The use of ROC curves in histopathologic decision making. Anal Quant Cytol 1985;7:167-173.
17. Cohen MB, Rodgers RPC, Hales MS, et al. Influence of training and experience in fine-needle aspiration biopsy of breast. Arch Pathol Lab Med 1987;111:518-520.
18. Giard RWM, Hermans J. The value of aspiration cytologic examination of the breast. Cancer 1992; 69:2104-2110.
19. Beck JR, Shultz EK. The use of relative operating characteristic (ROC) curves in test performance evaluation. Arch Pathol Lab Med 1986;110:13-20.
20. Kim I, Pollitt E, Leibel RL, et al. Application of receiver-operator
analysis to diagnostic tests of iron deficiency in man. Pediatr
Res 1984;18:916-920.
21. Hanley JA, McNeil BJ. The meaning and use of the area under a
receiver operating characteristic (ROC) curve. Radiology
1982;143:29-36.
22. McNeil BJ, Hanley JA. Statistical approaches to the analysis of
receiver operating characteristic (ROC) curves. Med Decis Making 1984;4:137-150.
23. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-298.
24. Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986;21:720-733.
25. Metz CE, Goodenough DJ, Rossmann K. Evaluation of receiver operating characteristic curve data in terms of information theory, with applications in radiography. Radiology 1973;109:297-303.
26. Hanley JA. Receiver operating characteristic (ROC) methodology:
The state of the art. Crit Rev Diagn Imaging 1989;29:307-335.