[Reprinted from RADIOLOGY, Vol. 148, No. 3, Pages 839-843, September, 1983.]
Copyright 1983 by the Radiological Society of North America, Incorporated
James A. Hanley, Ph.D.
Barbara J. McNeil, M.D., Ph.D.
A Method of Comparing the Areas under Receiver Operating Characteristic Curves Derived from the Same Cases¹
Receiver operating characteristic (ROC) curves are used to describe and compare the performance of diagnostic technology and diagnostic algorithms. This paper refines the statistical comparison of the areas under two ROC curves derived from the same set of patients by taking into account the correlation between the areas that is induced by the paired nature of the data. The correspondence between the area under an ROC curve and the Wilcoxon statistic is used and underlying Gaussian distributions (binormal) are assumed to provide a table that converts the observed correlations in paired ratings of images into a correlation between the two ROC areas. This between-area correlation can be used to reduce the standard error (uncertainty) about the observed difference in areas. This correction for pairing, analogous to that used in the paired t-test, can produce a considerable increase in the statistical sensitivity (power) of the comparison. For studies involving multiple readers, this method provides a measure of a component of the sampling variation that is otherwise difficult to obtain.
Index term: Receiver operating characteristic curve (ROC)
Radiology 148: 839-843, September 1983
Several questions dealing with comparative benefits for alternative diagnostic algorithms, diagnostic tests, or therapeutic regimens have recently emerged in medicine. For example, how do we know whether one diagnostic algorithm is better than another in sorting patients into diseased and nondiseased groups? Whether the addition of a new test or procedure to an established algorithm improves its performance? Whether it matters which of several available readers interprets a mammogram? Whether one type of hard-copy unit in radiology is better than another? Whether reading a CT scan in conjunction with the patient's history allows a more accurate diagnosis than reading it without the history? The analyses of such problems have started with construction of receiver operating characteristic (ROC) curves (1-3). Generally these analyses have used as cutoff points either different posterior probabilities on a continuous scale or different thresholds on a discrete rating scale. The latter approach has been particularly popular in radiology.
Major gaps in the understanding of statistical properties of ROC curves have limited their usefulness, especially for questions involving comparisons of curves based on the same sample of subjects or objects. These comparative situations contrast with those involving a single dataset and a single ROC curve. In such cases, the investigator generally only needs to know that a single modality or diagnostic approach has "poor", "moderate", or "good" accuracy, and the location of the ROC curve gives a rough assessment. However, when a comparison of two algorithms or modalities is relevant, more formal statistical criteria are needed in order to judge whether observed differences in accuracy are more likely to be random than real. Thus far these criteria have not been fully developed for ROC curves.
In a recent paper (4) we dealt with one popular accuracy index that can be derived from and used as a summary of the ROC curve. We showed that the relationship of the area under the ROC curve to the Wilcoxon statistic could be used to derive its statistical properties, such as its standard error (SE) and the sample sizes required to measure the area with a prespecified degree of precision (reliability) and to provide a desired level of statistical power (low type II error) in comparative experiments. This paper extends our statistical analysis to another large class of situations, where the two or more ROC curves are generated using the same set of patients. In these situations, it is inappropriate to calculate the standard error of the difference between two areas (Area1 and Area2) as

$$ \mathrm{SE}(\mathrm{Area}_1 - \mathrm{Area}_2) = \sqrt{\mathrm{SE}^2(\mathrm{Area}_1) + \mathrm{SE}^2(\mathrm{Area}_2)} \qquad (1) $$

since Area1 and Area2 are likely to be correlated. This correlation is likely to be positive; if the vagaries of random sampling of cases produce a higher/lower than expected accuracy index for one modality (e.g., if the sample consisted of a larger than usual number of easy/difficult cases), then the accuracy of the second modality will probably also be correspondingly higher/lower than one would expect. In other words, while the two indices may fluctuate independently by amounts SE1 and SE2 in separate samples, they will tend to fluctuate in tandem when derived from a single sample.

1 From the Department of Epidemiology and Health, McGill University, Montreal, Canada (J.A.H.) and the Department of Radiology, Harvard Medical School and Brigham and Women's Hospital, Boston, MA, USA (B.J.M.). Received June 3, 1981; revision requested July 21, 1981; final revision received and accepted Feb. 15, 1983.
Supported in part by the Hartford Foundation and the National Center for Health Care Technology.
In this paper we have developed an approach to take account of this correlation. In brief, we indicate that the relevant standard error for such comparisons is not that shown in Equation 1 but rather

$$ \mathrm{SE}(\mathrm{Area}_1 - \mathrm{Area}_2) = \sqrt{\mathrm{SE}^2(\mathrm{Area}_1) + \mathrm{SE}^2(\mathrm{Area}_2) - 2r\,\mathrm{SE}(\mathrm{Area}_1)\,\mathrm{SE}(\mathrm{Area}_2)} \qquad (2) $$

where r is a quantity representing the correlation introduced between the two areas by studying the same sample of patients. This paper reviews the calculations for comparing the ROC curves of two modalities and illustrates this new approach using data from a series of experiments involving phantoms.
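
The contrast between Equations 1 and 2 can be sketched numerically. The snippet below is only an illustration: the function name `se_difference` is hypothetical, and the inputs (standard errors of 0.030 and 0.026, between-area correlation of 0.40) are illustrative values of the same magnitude as those in the CT phantom example.

```python
import math


def se_difference(se1, se2, r=0.0):
    """Standard error of (Area1 - Area2).

    r = 0 reproduces Equation 1 (areas treated as independent);
    r > 0 applies the pairing correction of Equation 2.
    """
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * r * se1 * se2)


# Illustrative inputs (assumed for this sketch).
se_unpaired = se_difference(0.030, 0.026)        # Equation 1
se_paired = se_difference(0.030, 0.026, r=0.40)  # Equation 2

print(f"unpaired SE = {se_unpaired:.4f}, paired SE = {se_paired:.4f}")
```

With a positive r the paired standard error is always the smaller of the two, which is what makes the paired comparison more sensitive.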
METHODS
The general approach to assessing whether the difference in the areas under two ROC curves derived from the same set of patients is random or real is to calculate a critical ratio z, defined as

$$ z = \frac{A_1 - A_2}{\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2 - 2r\,\mathrm{SE}_1\,\mathrm{SE}_2}} \qquad (3) $$

where A1 and SE1 refer to the observed area and estimated standard error of the ROC area associated with modality 1; where A2 and SE2 refer to corresponding quantities for modality 2; and where r represents the estimated correlation between A1 and A2.² This quantity z is then referred to tables of the normal distribution and values of z above some cutoff, e.g., z > 1.96, are taken as evidence that the "true" ROC areas are different. The importance of introducing the 2rSE1SE2 term in the above equation is obvious: failure to subtract out from the sampling variability those fluctuations that the paired design has already eliminated will leave the denominator of Equation 3 too large and z too small, thereby reducing the chance of detecting a difference between two modalities.

Calculating Areas
Areas under ROC curves can be obtained in three ways: (i) by the trapezoidal rule; (ii) as output from the Dorfman and Alf maximum likelihood estimation program (5); or (iii) from the slope and intercept of the original data when plotted on binormal graph paper (3). As indicated in our companion paper (4) the trapezoidal approach systematically underestimates areas. Because the Dorfman and Alf approach is becoming readily accessible to those interested in this area, we will calculate areas using this approach. (For those limited to graphical methods, the area can be derived from the slope and intercept according to the rule Area = percentage of Gaussian distribution to left of z1, where z1 = Intercept/$\sqrt{1 + \mathrm{Slope}^2}$.)

Calculating Standard Errors
The standard errors associated with areas can be obtained in three ways: (i) as output directly from the Dorfman and Alf maximum likelihood estimation program; (ii) from the variance of the Wilcoxon statistic as illustrated in detail in Reference 4; or (iii) from an approximation to the Wilcoxon statistic by making an assumption, shown to be conservative (compared with assuming a Gaussian-based ROC curve), that the underlying signal (diseased) and noise (nondiseased) distributions are exponential in type (4). We will use the standard errors estimated from the Dorfman and Alf program.

Calculating the Correlation Coefficient, r, Between Areas
Two intermediate correlation coefficients are required, which are then converted into a correlation between A1 and A2 via a table that we supply below. The first is r_N, the correlation coefficient for the ratings given to images from nondiseased patients by the two modalities. The second is r_A, the correlation coefficient for the ratings of diseased patients imaged by the two modalities. Each of these can be calculated in traditional ways using either the Pearson product-moment correlation method or the Kendall tau. The former approach is usually used for results derived from an interval scale whereas the latter is more appropriate for results obtained from an ordinal scale. ROC curves in radiology are derived from ordinal scale data and therefore we have used the Kendall tau for calculating r_N and r_A. Standard statistical packages (e.g., SPSS, SAS) provide tau; when the number of rating categories is small, however, say four or less, the calculation can also be performed manually.

Once the correlations between the ratings (r_N among the normals, r_A among the abnormals) are obtained, it is necessary to calculate the correlation that they induce between the two areas A1 and A2; for ease of notation we have called this r (without any subscript). This is the coefficient present in Equations 2 and 3. Tabulation of r (TABLE I) is the fundamental contribution of this paper³; therefore, in our subsequent example we will illustrate its use.

2 As we will see later, the SE of an estimated area depends on the magnitude of the underlying "true" area. When calculating z to test the null hypothesis that this underlying area is the same for both modalities, one should equate SE1 and SE2, calculating them both from a common estimate of the area. In this case the denominator becomes $\sqrt{2\,\mathrm{SE}^2 - 2r\,\mathrm{SE}^2}$ or $\mathrm{SE}\sqrt{2(1 - r)}$.
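
The manual calculation of the rating correlations can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the paper does not say which variant of Kendall's tau was used, so the tie-corrected tau-b is assumed here, and the function names and the short rating lists are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(ratings_1, ratings_2):
    """Kendall tau-b (tie-corrected) between two paired lists of ordinal ratings."""
    if len(ratings_1) != len(ratings_2):
        raise ValueError("rating lists must have the same length")
    concordant = discordant = tied_1_only = tied_2_only = 0
    # Compare every pair of cases once.
    for (a1, b1), (a2, b2) in combinations(zip(ratings_1, ratings_2), 2):
        d1, d2 = a1 - a2, b1 - b2
        if d1 == 0 and d2 == 0:
            continue                  # tied on both modalities: ignored
        if d1 == 0:
            tied_1_only += 1          # tied on modality 1 only
        elif d2 == 0:
            tied_2_only += 1          # tied on modality 2 only
        elif d1 * d2 > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_1_only) * (untied + tied_2_only))
    if denom == 0:
        raise ValueError("tau is undefined when all ratings are tied")
    return (concordant - discordant) / denom


def average_rating_correlation(r_n, r_a):
    """Average of the normal-case and abnormal-case correlations (row entry for TABLE I)."""
    return (r_n + r_a) / 2.0


# Hypothetical ratings of four normal cases under the two modalities.
r_n_demo = kendall_tau_b([1, 1, 2, 3], [1, 2, 2, 3])
print(r_n_demo)
```

The quadratic pairwise loop is adequate for samples of the size discussed here (about a hundred cases); a statistical package would normally be used for larger data sets.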
Experimental Data for Illustrative Examples
We studied 112 phantoms that were specially constructed to evaluate the accuracy of two different computer algorithms used in image reconstruction for CT. Fifty-eight of these phantoms were of uniform density and were designated "normal"; the remaining 54 contained an area of reduced density to simulate a lesion and were designated "abnormal". Two images of each phantom were reconstructed using the two different algorithms, which we will refer to as modality 1 and modality 2. A single reader read each image and rated it on a 6-point scale: 1 = Definitely Normal; 2 = Probably Normal; 3 = Possibly Normal; 4 = Possibly Abnormal; 5 = Probably Abnormal; 6 = Definitely Abnormal. From the resulting data, we constructed two ROC curves. The data were submitted to the Dorfman and Alf maximum likelihood program to produce areas under the ROC curves and standard errors.
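
For readers limited to graphical estimates rather than the maximum likelihood program, the slope-and-intercept rule quoted under Calculating Areas can be sketched as follows. The function name `binormal_area` and the example intercept and slope are assumptions made for illustration; they are not values from the phantom study.

```python
import math
from statistics import NormalDist


def binormal_area(intercept, slope):
    """Area under a binormal ROC curve from its intercept and slope.

    Implements Area = Phi(z1) with z1 = intercept / sqrt(1 + slope**2),
    where Phi is the standard Gaussian cumulative distribution function.
    """
    z1 = intercept / math.sqrt(1.0 + slope ** 2)
    return NormalDist().cdf(z1)


# Hypothetical fitted line on binormal graph paper.
print(f"area = {binormal_area(1.5, 1.0):.3f}")
```

An intercept of zero gives an area of 0.5 (a curve on the chance diagonal), which is a quick sanity check on the rule.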
RESULTS
Our results will be divided into two parts. First, the analysis of the example involving CT phantoms will be illustrated. Then, in order to verify that the z statistic performs correctly, results of several simulations will be summarized.
CT Phantom Example
The basic data are presented in the Appendix, along with the calculations produced from them. The areas under the ROC curves were 89.45% (SE 3.0%) and 93.82% (SE 2.6%). The (Kendall tau) correlations between the paired ratings were r_N = 0.39 (nondiseased patients) and r_A = 0.60 (diseased patients), giving an "average" correlation between the ratings of 0.50. With this average correlation of 0.50 and with an average area of (89.45 + 93.82)/2 = 91.64, TABLE I indicates that the correlation r between areas is approximately 0.40. Equation 3 was then used to calculate the critical ratio z in order to test the null hypothesis that the observed difference between observed areas was merely a result of random sampling. Using the above data

$$ z = \frac{0.9382 - 0.8945}{\sqrt{0.03^2 + 0.026^2 - 2(0.4)(0.03)(0.026)}} = \frac{0.0437}{0.0309} = 1.41 $$

As mentioned earlier, one might average the two areas to obtain a common area of 0.9164, and use the formula in Reference 4 to predict each of the standard errors as 0.0281; using $0.0281\sqrt{2(1 - 0.4)} = 0.0308$ in the denominator of Equation 3 yields an almost identical z value of 1.42.

If we have reason to believe a priori that modality 2 is likely to be better than modality 1 and are only interested in improvements, then a one-tailed test is appropriate. The Gaussian distribution indicates that a value of 1.41 or higher should occur roughly once in every 13 samples (p = 0.079): this evidence suggests that the observed difference may not be random. This contrasts with the weaker inference that would be drawn from a critical ratio of 1.10 (or a p value of 0.136 or 1 in 7) that would have been calculated had the correlation between areas been assumed to be equal to zero (in other words, had we failed to take into account the increased sensitivity induced by studying the same set of patients with both modalities). If we had no a priori interest in one particular direction, then a two-tailed test would have been appropriate.

3 Mathematical derivation available upon request.

TABLE I: Correlation Coefficients*
[The body of the table is not legible in this copy. Rows: average correlation between ratings†, 0.02 to 0.90. Columns: average area‡, 0.700 to 0.975 in steps of 0.025. Entries: correlation r between the two ROC areas.]
* Correlation coefficient r between two ROC areas A1 and A2 as a function of average correlation between ratings (rows) and average area (columns).
† (r_N + r_A)/2.
‡ (A1 + A2)/2.

General Performance of the Paired Test
A good statistical test should indicate a difference when one is really present, but it should minimize in a predictable way the number of instances in which a difference is said to exist when, in fact, none does exist (high sensitivity and specificity). To determine these characteristics for this new statistical test, we examined its diagnostic performance over a range of simulated situations, using methods analogous to those used by Pollack and Hsieh (6) and Metz and Kronman (7).

In order to calculate the specificity, 400 simulated analyses were performed for each of several combinations of underlying ROC areas and correlations. The tabulated distributions of the test statistic z obtained from these various simulations were for all practical purposes indistinguishable from Gaussian ones and had standard deviations acceptably close to 1. The false-positive rates were low, and close to what one should expect. Specifically, among the 4,800 trials (12 combinations each run 400 times) the average proportion of z values above 2.0 or below -2.0 (values often taken as indicating a statistically significant difference) was 4.4%, i.e., a specificity of 95.6%. In a perfectly Gaussian distribution, this would have been 4.6%, i.e., a specificity of 95.4%.

Evaluation of the sensitivity (power) for this statistic required comparison of two modalities with different ROC areas. For this purpose, sets of 200 experiments were simulated from varying correlation coefficients. The performance was evaluated by tabulating the percentage of paired and unpaired tests indicating a significant difference. Over four combinations of correlations and baseline accuracies, one could project that the paired test would raise a 60% sensitivity (expected from an unpaired analysis) to [percentage illegible in this copy].

DISCUSSION
In this investigation we have described a method of comparing the areas under two ROC curves derived from the same sample of patients. Two immediate results are apparent. We have shown that the comparison can be made more sensitive if the investigator takes into account the smaller sampling variability of the difference in areas induced by studying each patient twice. Second, our data can be extrapolated to indicate the statistical economy that emerges from this kind of experimental design and analysis. We discuss these points in turn.

The larger the correlation between the areas, the more sensitive the paired z test will be. This observation may explain why a number of studies using an unpaired z test that assumed the two areas were statistically independent failed to find a significant difference between the modalities. The degree of correlation expected between ROC areas obtained with different modalities varies considerably depending upon the types of modalities involved. For example, if the two images are obtained from the same machine with two different settings or if a radiologist reads a CT scan with and without extensive clinical history, high correlation can be expected. In this study involving different reconstruction algorithms with CT, the correlation between the paired ratings of abnormal phantoms was 0.60 and between paired ratings of normal phantoms was 0.39. We have observed similar results in a study of ours (8) involving the interpretation of CT studies of the head with and without extensive clinical history. On the other hand, when the only common denominator in the comparison is the patient, the correlations are likely to be weaker. For example, a study by Alderson et al. (9) comparing CT, ultrasound, and nuclear medicine imaging in the diagnosis of liver metastases found considerably lower rating-pair correlations (0.36 in abnormal patients and 0.28 in normal patients). Obviously, in the latter situation the gains from using a paired rather than an unpaired analysis are smaller.
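
The effect of pairing on the CT phantom comparison can be checked with a short calculation. The sketch below assumes the reported areas (0.8945 and 0.9382), standard errors (0.030 and 0.026) and between-area correlation (0.40); the function names are hypothetical, and the one-tailed p value uses the Gaussian distribution from the Python standard library.

```python
import math
from statistics import NormalDist


def critical_ratio(area_a, se_a, area_b, se_b, r=0.0):
    """Critical ratio z of Equation 3 for (area_a - area_b); r = 0 ignores pairing."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)
    return (area_a - area_b) / se_diff


def one_tailed_p(z):
    """Probability of a standard Gaussian value of z or higher."""
    return 1.0 - NormalDist().cdf(z)


# Modality 2 is entered first so that an improvement gives a positive z.
z_paired = critical_ratio(0.9382, 0.026, 0.8945, 0.030, r=0.40)
z_unpaired = critical_ratio(0.9382, 0.026, 0.8945, 0.030, r=0.0)

print(f"paired:   z = {z_paired:.2f}, one-tailed p = {one_tailed_p(z_paired):.3f}")
print(f"unpaired: z = {z_unpaired:.2f}, one-tailed p = {one_tailed_p(z_unpaired):.3f}")
```

The paired ratio comes out near 1.4 and the unpaired ratio near 1.1. Small differences in the second decimal place relative to the printed figures arise because the text rounds the denominator before dividing.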
Two other points must be made about correlation coefficients. First, in general we have noted that whatever the modalities under study, the ratings tend to be less correlated in the nondiseased patients than in the diseased patients. This suggests that in diagnostic imaging agreement tends to be greater if there is in fact underlying disease, and less if there is not. Second, if an investigator knew a priori that the correlations between the modalities under study were small, then an experimental design that did not involve pairing could be used, provided that it was no more difficult to separate (diagnose) the patients studied by one modality than it was to diagnose those studied by the other modality.
The statistical economy resulting from this new statistical test is large. Statistical economy relates to the question of how many more patients are required in an unpaired design than in a paired design to achieve the same sensitivity or statistical power. A comparison of Equations 1 and 2 provides an answer to this question. Each of the standard errors is inversely proportional to the square root of the sample size n. Also, the equations can be simplified by assuming that the standard errors of the two areas are equal; in this case, Equation 2 differs from Equation 1 only in the presence of the factor (1 - r). When the sample sizes associated with the two techniques are arranged so that the paired and unpaired tests produce the same z value, then a simple algebraic identity emerges:

$$ n_u = \frac{n_p}{1 - r} \quad \text{or} \quad n_p = (1 - r)\,n_u $$

where n_u and n_p are the numbers of patients per modality in the respective unpaired and paired designs⁴. For example, if r is anticipated to be roughly 0.3 and an unpaired design called for 100 patients per modality, then a paired design should require only 70 per modality. Thus the total number of images read would be 140 rather than 200. This efficiency is even more important if the limiting factor is the number of available patients with a proved outcome (rather than the number of images a reader can be expected to read), since the total of 140 paired images is obtained from just 70 patients, rather than from 200 patients in the unpaired design. The investigator must weigh very carefully the practical and statistical issues, keeping in mind that if one uses an unpaired design, one must establish (through case matching and/or random allocation) that the method of constructing two independent samples of subjects does not give one modality an inbuilt advantage.
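
The sample-size identity can be turned into a small planning aid. This is a sketch only: the function name `paired_sample_size` is hypothetical, and rounding up to a whole patient is an assumption added here rather than something the text specifies.

```python
import math


def paired_sample_size(n_unpaired, r):
    """Patients per modality in a paired design giving the same z as an
    unpaired design with n_unpaired patients per modality: n_p = (1 - r) * n_u."""
    if not 0.0 <= r < 1.0:
        raise ValueError("r must satisfy 0 <= r < 1")
    # Round away floating-point noise before rounding up to a whole patient.
    return math.ceil(round((1.0 - r) * n_unpaired, 9))


n_u = 100                              # unpaired design: patients per modality
n_p = paired_sample_size(n_u, r=0.3)   # paired design: patients per modality

print("images read: ", 2 * n_p, "paired vs", 2 * n_u, "unpaired")
print("patients:    ", n_p, "paired vs", 2 * n_u, "unpaired")
```

In the paired design each patient contributes one image per modality, so the number of patients equals the per-modality count, whereas the unpaired design needs a separate group of patients for each modality.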
The discussion thus far has centered on a rather restricted design where just one reader read the images generated by the two modalities being compared. The statistical test simply asked the question: if this one reader read an infinite rather than a finite number of images, would his/her accuracy be comparable in both modalities?⁵ Clearly, a more general question is relevant: how do the modalities compare over many readers?
For the sake of completeness, we refer briefly to this problem of multiple readers and readings in each modality. This situation has been discussed extensively by Swets and Pickett (10); our main reasons for mentioning it here are to draw readers' attention to a very extensive treatment of the design and analysis of imaging experiments, and to point out that our method of obtaining r now allows the methods therein to be used with greater sensitivity. This is best appreciated by reproducing the formula that the authors give (Equation 2, Chapter 3) for the standard error of a difference between the value of an accuracy index (such as the area under an ROC curve) for one modality (averaged over l readers, each reading each image m times) and the value of the same accuracy index (again averaged over readers and readings) for a second modality. The expression involves three sources of variation: S_c^2, the variation in the index due to differences in mean difficulty of cases from case sample to case sample; S_br^2, between-reader variance due to differences in diagnostic capability from reader to reader; and S_wr^2, within-reader variance due to differences in an individual reader's diagnoses of the same case on repeated occasions. It also involves two correlation coefficients: r_c to denote the correlations introduced by using similar (or even the same) cases with both modalities and r_br to denote correlations between the accuracy index obtained by using matched (or possibly the same) readers. With this notation, the formula becomes

$$ \mathrm{SE}(\text{difference}) = \sqrt{2}\,\left[ S_c^2 (1 - r_c) + \frac{S_{br}^2 (1 - r_{br})}{l} + \frac{S_{wr}^2}{l\,m} \right]^{1/2} $$

The authors describe fully in several worked examples how to evaluate each of these terms. They point out, however, that the estimation of the two components r_c and S_wr creates problems. First, if m = 1, i.e., if each image is read just once, then S_c^2 and S_wr^2 are not separable, and one is forced to overestimate the SE. The second, and more serious, problem is that if m = 1 and if one does not have a large number of cases, enough (for example) to split them into a number of subsamples and fit an ROC curve to each, one is unable to estimate r_c. In such cases, the authors explain that one has no alternative but to assume r_c = 0, thereby giving up any benefits attainable from case matching.

4 This simple relation allows the user to multiply the sample sizes in TABLE III of our first publication (4) by the appropriate (1 - r) and use them for paired designs.
5 One could also use the z test to compare two specific readers on one modality.
The method we have presented here means that if one uses the area under the ROC curve as an index of accuracy, one is not forced to assume r_c = 0. The quantity we have called r, which is obtainable via TABLE I from the area and from the correlations between ratings, is the same quantity r_c-wr mentioned in Equation 5, Chapter 4 of Swets and Pickett (10)⁶. The interested reader can consult the reference for full details on how it is used.

In summary, then, we have provided a method for estimating the correlation between the areas under two ROC curves derived from the same sample of patients and have shown how to use this correlation to perform a more sensitive comparison of the areas. Moreover, this provides an item that was previously only guessed at, or underestimated, in studies of multiple readers.

6 If m > 1, one can correct the quantity r_c-wr (obtained from TABLE I) for the "attenuation" produced by S_wr, and estimate the "true" correlation r_c introduced by using similar (or the same) cases.

Acknowledgments: We are indebted to Philip Judy and Richard Swensson for providing the CT phantoms and readings, and to Charles Metz for helpful discussions during the course of this investigation. [A final sentence of acknowledgment is not legible in this copy.]

Department of Epidemiology and Health
McGill University
Montreal, Quebec, Canada

References
1. Green DM, Swets JA. Signal detection theory and psychophysics. New York: John Wiley and Sons, 1966.
2. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978; 8:283-298.
3. Swets JA. ROC analysis applied to the evaluation of medical imaging techniques. Invest Radiol 1979; 14:109-121.
4. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143:29-36.
5. Dorfman DD, Alf E. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating-method data. J Math Psych 1969; 6:487-496.
6. Pollack I, Hsieh R. Sampling variability of the area under the ROC curve and of d'e. Psych Bull 1969; 71:161-173.
7. Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psych 1980; 22:218-243.
8. McNeil BJ, Hanley JA, Funkenstein HH, Wallman J. Paired receiver operating characteristic curves and the effect of history on radiographic interpretation: CT of the head as a case study. Radiology 1983; in press.
9. Alderson PO, Adams DF, McNeil BJ, et al. Computed tomography, ultrasound, and scintigraphy of the liver in patients with colon or breast carcinoma: a prospective comparison. Radiology (in press).
10. Swets JA, Pickett RM. Evaluation of diagnostic systems. New York: Academic Press, 1982.

APPENDIX
Ratings given to two images of each of 112 phantoms, together with calculations of z test to see whether one modality subtends a greater area than the other:

(A) Basic data: cross-tabulation of the rating* with modality 1 against the rating with modality 2, given separately for the normal and the abnormal phantoms. [The cell counts are not legible in this copy.]
* Ratings: 1 = Definitely Normal to 6 = Definitely Abnormal.

(B) ROC analysis
Modality 1: area = 0.8945, standard error = 0.030
Modality 2: area = 0.9382, standard error = 0.026
Difference in areas = 0.0437; average area = 0.9164

(C) Correlation between ratings (Kendall tau)
Normal phantoms, modality 1 vs. modality 2: r_N = 0.39
Abnormal phantoms, modality 1 vs. modality 2: r_A = 0.60
Average correlation = 0.50
Correlation between areas = 0.40

(D) Test statistic

$$ z = \frac{0.0437}{\sqrt{0.030^2 + 0.026^2 - 2(0.40)(0.030)(0.026)}} = 1.41 $$