Individual Matching with Multiple Controls in the Case of All-or

Individual Matching with Multiple Controls in the Case of All-or-None Responses
Author(s): Olli S. Miettinen
Source: Biometrics, Vol. 25, No. 2 (Jun., 1969), pp. 339-355
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2528794 .
Accessed: 18/02/2015 09:50
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp
.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact [email protected].
.
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to
Biometrics.
http://www.jstor.org
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
IN THE CASE OF ALL-OR-NONE RESPONSES
OLLI
S. MIETTINEN
Departmentsof Epidemiologyand Biostatistics,Harvard School of Public Health, and
Division,Department
HospitalMedicalCenter,
ofMedicine,Children's
Cardiology
U. S. A.
Boston,Massachusetts,
SUMMARY
individualmatchingprincipleof the matchedpairsdesignis generalized
The one-to-one
responsesand fixedsample size
to R-to-oneindividualmatchingin the case of all-or-none
procedures.A test is given;its asymptoticpowerfunctionis derived;the selectionof the
in relationto the unitcostsin the two comparisongroups;
matchingratio(R) is considered
are described.
forsamplesize determination
and finally,
procedures
1. INTRODUCTION
Matching is a common featurein the design of nonexperimentalstudies
concernedwith the evaluation of causal propositions(such as hypotheseson
disease etiology). Its main purposetypicallyis the attainmentof validityfor
but it has implicationsfordesignefficiency
as well.
the inferences,
studies with matchedcomparisonseries are frequently
As nonexperimental
quite expensive,it is importantto understandthe propertiesofmatchingdesigns
so as to be able to make the best use of them. The matchedpairs designin the
case of all-or-noneresponsesand fixedsample size has recentlybeen studied
ratherextensively(Worcester[1964], Billewicz [1964, 1965], Miettinen [1966,
1968a, b], Bennett[1967],Chase [1968]). The presentpaper deals withthe extensionofthis designto the case wherethe numberof controlsubjectsobtained
foreach propositusis not necessarilyone but some generalnumberR. We will
use the term'R-to-oneindividualmatchingdesign.' This generalizationand an
intelligentchoice of R are importantwheneverseveral controlsubjects can be
obtained at a unit cost substantiallylowerthan that of the propositi.
2. BASIC CONCEPTS, TERMINOLOGY, AND NOTATION
In a studywithR-to-oneindividualmatchingone obtains J sets of 1 + R
subjects. Of each ofthese (1 + R)-tuples,one unitbelongsto one and R unitsto
the otherof the two comparisongroups. The firstgroup,consistingof J subjects (propositi),is heretermedSeries 1, and the lattergroup,withRJ subjects
(controls)is called Series 2.
The matchingis based on some matchingvariate M (whichneed not be unidimensional). The jth (1 + R)-tuple showssome realizationmi forM, and this
impliesa realization(p,i, p2,) = [p,(mi), p2(mj)] forthe randompair (P1 , P2)
of underlyingresponseprobabilities,the numeralsin the subscriptsreferring
to Series 1 and Series 2, respectively.
339
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1969
340
Each subject is characterizedby a code value of either0 or 1 fora (dichotomous) responsevariate Y. Thus, in the jth matchedgroupa realization(y,i1
*
Y2j) is obtained forthe randomresponsevector (Yl, , y2i
Y2jI,
*...*
Y2;iR) -
3. THE MODEL, THE HYPOTHESES, AND THE TEST
The model underwhichthe resultsare derivedin the presentworkis completelyanalogousto that consideredby Miettinen[1966; 1968a, b] in the special
case of the matchedpairs design. It consistsof the assumptionsthat
(1) the distributionof M is independentlyidentical for all the J sample
positions,and that
(2) conditionallyon the realizations{mi} forM the observation(response)
variatescorresponding
to the (1 + R)J unitsare independentwithinas
well as amongthe J sample positions.
To state this modelin anotherway, it is assumed that
, * , Y2Y) are independently
and identically
(1) the J vectors(Y1i,
i
that
and
distributed,
conditionallyon (P1, P2)
(2) Y1, Y2iI *
Y2YR are mutuallyindependent
= (Pli,P2;,j=
, J.
1,P2
As before(Miettinen[1968a, b]), the object of statisticalinferenceis taken
to be
a
=
01 -
02,
whereOi = E(YJ) = EM[E(Yi I M)] = EM[pi(M)] = E(PJ), i = 1, 2, and the
expectationis taken withrespectto the distributionof the matchingvariate in
the sourcepopulationof the propositi. The null hypothesisis taken as
Ho: a = 0,
and the alternativesconsideredare a < 0, a > 0 and a 7 0.
for
Informationin termsof a two-by-twotable of frequenciesis sufficient
each matchedgroup. For the jth groupthe table is a realizationfor
Series 1
Series 2
Total
1
X1;
X2;
xi
0
1-X1,
R-X2j
1 + R-X;
1
R
1+ R
y
Total
whereX1; = Yli and X2j =
ak Y2ik .
In termsof the above hypothesesthe interestlies in the contrast
E
(RX,
-X2)
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULITIPLE CONTROLS
341
On thenullhypothesisits expectationis zero. Conditionallyon {Xi } itsvariance
is
var [
i
(RXjj
X23) I {Xi}]
-
=
var [(1 + R)Xj
i
X
E(1+ R)2
-
-
Xi
Xj
1 + R-X,
Xj(1 + R-XX).
E
in thecase oflargeJ can be takenas
Thus,theteststatistic
T = [E
i
(RX1j
-
X2i)]/[
i
Xj(1 +
R
-
Xi)]',
(3.1)
(The Cenand itsvaluereferred
to tablesofthestandardnormaldistribution.
variatesappliesherebytheLindetralLimitTheorem
forunequallydistributed
If a continuity
correction
wereto be appliedto thisstatistic,
bergcriterion.)
by (1 + R).
it wouldmeanreducingthe absolutevalue of the numerator
The squareof the above teststatisticcan be thoughtof as representing
in the chi squarestatisticwhichCochran[1950]gave
one degreeof freedom
matchedseries,each
forcomparing
amongseveralindividually
proportions
variate.
by oneunitat eachoftheobservedlevelsofthematching
represented
Moreover,
it is a specialcase ofthemethodwhichManteland Haenszel[1959]
information
froma numberof hypergeometric
suggestedfor accumulating
experiments.
testimoftheMantel-Haenszel
The resultsofBirch[1964]ontheproperties
plythatthestatisticin (3.1) is optimalifPj(1 - P2)/(1 - P1)P2 is constant.
It may be notedthat the Mantel-Haenszeltest appliesalso in the case
analysesforthatcase
whereR variesamongthematchedgroups. Alternative
by Cox [1966].
wereconsidered
in sections4.1 and 8.
to (3.1) is considered
The exacttest corresponding
4. THE TWO-TO-ONE INDIVIDUAL MATCHING DESIGN
4.1. Thetest
among
designmeritsspecialattention
individualmatching
The two-to-one
theR-to-onematching
designswithR > 1. It is to be regardedas the most
used matchedpairsdesign
forthe commonly
applicablealternative
frequently
in the categoryof individualmatchingdesignswitha fixedmatchingratio.
designit serveswell
R-to-onematching
Also,as the simplestnonsymmetrical
case.
of the generalasymmetrical
the principles
the purposeof introducing
In thecase oftwo-to-one
theteststatistic(3.1) takes
individualmatching,
theform
T
=
(2X
-X3)]
X2
[zEX(3
-
Xi)]
(4.1)
A usefulalternative
forthisstatisticmaybe obtainedby conexpression
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1969
342
sideringthe multinomialdistributionof the responsevector (X1, , X2i). There
are six possible realizations,and theirrandomfrequenciesare here denoted as
follows:
2
x1
x2
1
0
1
Z12
Zll
0
ZJ2
Z(O1 ZOO
Total
ZIO
J
Total
In termsof this layout and notation,the statisticin (4.1) may be recastin the
form
T
=
[2Zio - Zol + Z1l
-
+ Zol + Z1l + Z02)]1.
2ZO2]/[2(Z1o
(4.2)
The square of this latterstatistic,withpossible continuitycorrectionof 3/2, is
completelyanalogous to the McNemar [19471statisticfor the matched pairs
design.
The above multinomnial
formulationalso permitsready constructionof an
exact test. Regarding {X} as fixedin (3.1) and (4.1) impliesthat the sums
(say) are taken to be fixedin (4.2),
Z,0 + Z0i = S1 (say) and Z1l + Z02 =-2
and that ZAoand Z1l have independentbinomialdistributions
whoseparameters
underthe nullhypothesisare (Si , -) and (S2 , 2),respectively.The corresponding joint probabilityfunction,Pr (Z0o z1O Zll = Zll I S1 = sl , S2 = S2), iS
1:0)3
()
()
(11)3
This permitsthe computationof the p-value forhypothesistesting. E.g., with
a > 0 as H1 , p = Pr (Zl0 + Zl 2 zio + z11= v), i.e.,
(si
2 ("
1 2)
)8
2 8-)>'.1k,
4.2. The asymptotic
unconditional
powerfunction
From the above exact test it is apparent that any assessmentof power at
the conclusionof a study with two-to-oneindividualmatchingwould best be
made conditionallyon S1 and S2 . That conditionalpower functioninvolves
nuisanceparameters(see Birch [1964]) whoseevaluationin practicewould tend
to be quite problematic. However, in the presentcontextwe are concerned
withdesignproblems(cf. sections4.3, 5.2, and 7) and theycall forthe unconditionalpowerfunction. It is seen below that this functioncan be approximated
in termsof rathersimpleexpressions.
One approachto the asymptoticunconditionalpowerfunction,II(a), is that
oftaking
II(6) = EsH(6 I S),
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
343
where S = S1 + S2,i.e.,
S = ZIO + Zo1 +
Zll +
Z02
It is shownin Proofs1 and 2 ofAppendixA that,forthe statisticin (4.2),
I S)
= (2S)'a/(0p +
+
O2)(2if-_ 2h2) -
E(T
2142)
and
I S)
var (T
-[(i
+
282]/(
2)
where
2P1P2) = 01 + 02 - 20102 - 2 cov (P1 , P2)
= E(P1 + P2-
(4.4)
and
'P2
= E(P2 + P2
-
2P2P2) = 202(1
02) -
-
2 var (P2).
(4.5)
Therefore,fora one-sidedtest,
11( i S) =
(4.6)
(2S)-2 6]
P+ !162)(2q -
whereb denotesthe distributionfunctionof a standardnormalvariate,and ua
is the 100(1 - a) percentileof the standardnormaldistribution.
A firstapproximationto 11(a) may now be obtainedby evaluatingII(8 I S)
at S = E(S)
JE[Pi(I - P2)2 + (1 - P1)2P2(1 - P2) + P12P2(1 - P2) +
(1 -P)P2]
= J(VI + 42'2).
Thus, fora one-sidedtest,
(.7
+ [2J(t'+ .2'J2)] 1
-FUa(IP + 124/2)
11(a)
+
[NI
b
112 )(2'1 -
-
21/2)
284
.(7
of 'P and 'P2 in (4.4) and (4.5), it is apparentthat in the
From the definitions
vicinityofthenullstateone maytake 'P
'P2
of
Thus,in the assessment
.
local power,(4.7) may be furtherapproximatedas
11()
[U
('P2
talI.
(4.8)
+ 2(J8//3)88 2/9)1.(48
-
An alternative approach may be based on findingthe parameters of
the asymptoticnormaldistributionof T. Using the above conditionalresults
we obtain
E(T) = Es(T
S)
-
2E(SI)8/(4P +
21V2)3
and
var (T)
= Es[var (T I S)] + var[E(T I S)]
E(4' +
?2 IN)(22
(VI
+
282]/['
--12)
12)(2 VI -
2)
2[12
+
-
2 CV2)2
var
+
[215/(o + .A1)]2 var (Si)
(&-W)}]/['P
+
IP2)2
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
344
BIOMETRICS, JUNE 1969
It followsthat the asymptoticallyexact unconditionalpower functionforthe
case of a one-sidedtest is
11(8)
=
L={G' +
-UM(/'+
1 12)(241 -
21I'2) +
1412) -
1
2iE(S) 1
2 2[1 -var (Si)]}
(9
]
(
A first-order
approximationto this functionis obtained by replacingE(SI)
and var (SI) by theirfirst-order
approximationsfromTaylor seriesexpansions
about E(S):
E(SI)
[J(GP+
and
ht2)]I
var (Si) =
1(1- 2+'12)
Thus,
]
II(6) - [{{ (+ + 'U)(2'+ _12'2)
11
2 +
1 + -[2J(G
ja
8(3 + 12)]
(lk+
42)(241412)
J/+ 14/'2)/2}1
The furtherapproximationcorresponding
to t -t2
II1(8) -
[Up
2
2 +
_ 52(2
(4.10)
iS
)/3])
1
(4.11)
The numericalevaluationsin AppendixC indicatethat the approximations
fromthis latterapproach, (4.10) and (4.11), are moreaccurate than (4.7) and
(4.8), respectively.
Second-orderapproximationsare given in Appendix B and evaluated in
AppendixC. They affordonly slightrefinements
of the above first-order
approximations.
Regardingthe case ofa two-sidedalternativehypothesis,
each ofthe additive
powercontributionsfromthe two tails may be evaluated by using II(a), with
U ,a in place of ua , and with - I6 in place of I8j forthe lowertail contribution.
4.3. Sample size determination
The sample size which,with a level of significance,
yieldsthe power 1against 8 is, from(4.7),
J
Wt2) +
{U.(VI +
U9[(Vt +
t'V2)(2t' -
21/2) -
282]I}2/2(
+
2 2)8.
The corresponding
approximationfrom(4.8) is
J
3[ual'
+ ug(42
_
81p8/9)2]/41,2.
The alternativefirst-order
approximationto the power function,(4.10),
yields
J
{Ua.(VI +
2 12) +
U4[(VI +
2-V2)(2
-
-
2
2
82(3
} /2 (
+ V/+ 2V12)/2]4
+
21 2) 82.
The approximationto this expressionfrom(4.11) is
J
3{u '
+
u1[2
82(2 +
iP)/3]i}2/4052*
The second-orderapproximationsto the asymptoticpower function(see
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIIPLE CONTROLS
345
AppendixB) are not readilyapplicable to sample size determination.
As to the case of a two-sidedalternativehypothesis,a and ,Bare usually
smallenoughso that all the poweris derivedessentiallyfromone ofits two components,and in such situationsthe above sample-sizeformulasapply upon substituting
u4a
for ua, .
.5.THE GENERAL CASE
5.1. The asymptotic
unconditionalpowerfunction
In section4.2 as wellas in earlierworkon thematchedpairsdesign(Miettinen
[1968a, b]) resultsforunconditionalpowerwere derivedthroughresultson the
conditionaldistributionof T. In the generalcase of R: 1 individualmatching
designs this approach does not seem to be readily applicable (cf. Miettinen
[1968b]). Therefore,in the presentcontexta resultforthe unconditionalpower
will be derivedwithoutthe intermediaryof the conditionaldistributionof T.
Due to the natureof the approach,the immediateresultapplies well onlywhen
81is small relativeto At.
For a derivationof the resultforlocal unconditionalpower,it is coinvenient
to writethe test statistic(3.1) as
T = UJ/VJ,
where
U.,= J-
?
(RX1;
-
and
X2,)
V.,
X,(1 + R-Xi)].
J-1
The expectationof U.Tis
E(UJ) = J-E
Jr
(RO
-
R02)
=R16,
It
and by Proof 3 of Appendix A, var (U,) = R2(l - 62) _ 'R(R - 1)>2
followsthat if J -* o and a -* 0 in such a mannerthat Jl8 remainsfixed,then
the distributionfunctionFJ(u) of U, tends to
F(u)
[R2(
L[R(t' 62)211
8)R(R
-
- 1)4'12]1-
On the otherhand, the sequence V, convergesstochasticallyto the constant
[(R + 1)E(X) - E(X2)]I, and by Proof 4 of Appendix A this limit equals
I'R% +
2(R -1)V2]3
From the above resultsit follows-e.g., by Theorem20.6, Cram6r[1946]that as J --+ o and a -O0 in such a mannerthatNib remainsfixed,Pr (Uj/Vj <
ua)
= Pr (T < ua) tends to
The cop
bU
[4, +
roia
1
R - 1) V/2]' - (RJ)'6
62) t 12(Rc-1) nc2]-
The corresponldingapproximation to the asymptotic unconditional power func-
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 19f9
346
tion fora one-sidedtest is
11(8)
0{-u~E~t'? 2(R -1)[R(/-
82)
21 ? (RJ)l
R
-
_-41
1
(5.1)
As this resultwas derivedin a mannerwhichis adequate forsmall I16only,
it is interesting
and pertinentto compareit to earlierresultswithoutsuchlimitation forthe cases of 1:1 and 2:1 individualmatchingdesign.For this purposeit
is expedientto rewrite(5.1) as II(a) -=
1
-ua[\6
+2(R-
1)J2]
+ 2(R -1) V2][RVIVI[
?{RJ[B,t ?+ (R - 1>^62]}2181
(- 1412 - R[V + 12(R-1)2
622
If in this resultthe latter termin the denominator,-R[RI ?
2,
+ (R- 1)-2]
is replaced by -R82, it gives exactly the earlier first-order
approximation
(4.7) as well as its equivalent for the matched pairs design (cf. Miettinen
[1968a,b]). This modification
is, as was to be expected,immaterialwithsmall
81,but forrelativelylarge values of 161it may be important.
Making the above modification
in (5.2) yields
11(^)
? {iJ[D ?+ (R - 1>J2]11
+ 2(R - 1> RJ21
(53)
2
L{[VI + 3(R -1) 2][RV - 2(R - 1)121 -R2
It was seenabove that thisformulaapplieswellin the cases ofR = 1 and R = 2'
withoutthe limitationthat I16be small. One may surmisethat this is the case
also forlargerR, particularlyforpracticalpurposes,as the frequencywithwhich
a particularR : 1 individualmatchingdesignis optimaldecreasessharplywith
increasingR so that designswith,say, R > 4 are of verylittleinterest.
In the vicinityof the null state one may again set Vt it2 , and the result
in (5.3) becomes
r[Ua[1
? R) ? [2R(1 + R)J]
811
(5.4)
i(54
[(1 ?~j?
R2ifr2
But setting A VI,,2
presupposesalso that 181
is small relativeto At,so that it is
of deletingthe
reasonableforsome purposesto make the furthersimplification
lattertermin the denominator. This yields
11I(8)_
,[ Ua(1
L
H1(8) -{I-ua
-
+ [2RJ/(1 + R)i/4
VI
1}.
(5.5)
5.2. Selectionof thedesignconstants
Suppose again that the desireddegreeof information
is specifiedin termsof
the level a of the test,whetherthe test is to be one-sidedor two-sided,and the
powerlevel 1 - f to be attainedagainstsomealternative8. The designproblem
thenis to choosethe constantsR and J in such a mannerthat this information
is obtainedat a minimumcost.
Only the simple (and useful) cost model is consideredwhere
Co = 'set-up' cost,
C= = unit cost in Series 1, and
C5 = unit cost in Series 2,
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
347
so that the total cost of the studyis
C = Co + (cl + RC2)J.
(5.6)
The firstproblemis to choose R in such a way that C is minimizedforthe
particularcombinationof a, , and the parameters. From (5.3),
J
{Ua[lt+
?2L(R-
+ u[Rf~-
1)>b211
t-a2/(R + 2(R-1)2)]1}2/Ra2.
22-(R -
(5.7)
Substitutingthis expressionforJ in (5.6) permitsa trial-and-error
solutionfor
the integervalue ofR whichminimizesC forthe case of a one-sidedalternative.
The above approachto the selectionofR has the drawbackthat the solution
dependson the nuisanceparameters,and typicallygood estimatesof them are
not available in the planningstage ofthe study. These problemsare avoided by
using the somewhatless accurate power functionin (5.5). The corresponding
expressionforthe total cost (5.6) is
C -Co + (cl + Rc2)(1 + R)VI(ua+ ug)2/2Rft2,
(5.8)
and this cost functionis minimizedby taking
R=
(5.9)
(Cl/C2)1
R is, of course,an integer,and in practiceone could,thus,choosethat matching
ratio whichbest correspondsto the square root of the inverseof the unit cost
ratio. In case of ambivalence,the integervalue of R which minimizesC in
of(Rf+ C1/c2) (1 + R)/tR.
minimization
(5.8) maybe obtainedbytrial-and-error
WithR selected,the corresponding
J forthe case of a one-sidedtest is specifiedby (5.7). This formulahas the obvious drawbackthat it involvestwo nuisance parameters,VIand 4'2 . However,in the vicinityof the null state, with
small relativeto 4, it is appropriateto use the powerexpressionsin (5.4) and
I51
(5.5). They yield
J
{u<[(1 + R)A]2 +
up[(1
+ R)I - 4Rf2/(1
+
R)VI]}2/2fta2 (5.10)
(1 + R)VI(u,+ up)2/2Rf2.
6. EVALUATION OF THE NUISANCE PARAMETERS
It is seen fromthe powerfunctionsin (5.1) and (5.3) that withR = 1 there
is onlyone nuisanceparameter,VI. Moreover,it was concludedon the basis of
in (4.4) and (4.5) that in the vicinityof the null state one may
the definitions
take 412= 4' and therebyhave a singlenuisanceparametereven when R > 1.
The evaluation of 4' has been discussedin the contextof the matchedpairs
design(Miettinen[1968a,b]), whereit has the interpretation
of beingthe probabilitythat a pair will show discordantresponses. It is usually reasonableto
assume that cov (P1 , P2) > 0. The corresponding
rangeof 4' is
151? 4' ?
01 +
02 -
201021
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
348
BIOMETRICS, JUNE 1969
and in a priorievaluationit wouldgenerallybe advisableto choosea value which
is relativelyclose to the above upper bound. Furthermore,
the maximumlikelihood estimateof Akbased on matchedpairs data has been shown (Miettinen
[1968a]) to be
=
[Z1o + Zo0 + a(Z1o -ZoJ)]2J
+ {[Z10 + Zol + a(Zlo
-
Z0)]2/4J2 b-Zlo
(Z1l + Zoo)]/J},
-Zol
(6.1)
whereZf0is the frequencyof pairs with Y, = f and Y2i g
If data froman R:1 individualmatchingdesignwithR > 1 are at hand, the
estimateof V/may be taken as
R
=
Q
Ak/R
E
(6.2)
where iS computedaccordingto (6.1), takingZf0as the frequencyofmatched
groupswith Y = f and Y2ik = g.
The definitionof 12 in (4.5) may be put to the form 12 = E[P2(1 - P2)].
Thus, it is theprobabilitythat a pair ofcontrolsubjectswithinthe same matched
groupshow discordantresponses. Its estimate,therefore,
may be taken as the
proportionof such pairs,i.e.,
Ak
J
p2
=
E
ij1
IR
i
k-1
IR
E
k'=1
(Y2ik
-
Y2jk) 2/R(R -
1)J.
(6.3)
7. THE CHOICE BETWEEN R-TO-ONE INDIVIDUAL MATCHING
DESIGNS AND THE USE OF TWO INDEPENDENT SERIES
If matchingis not needed forvalidity(comparabilityof the two series),it is
pertinentto knowunderwhat conditionsindividualmatchingwiththe optimal
fixedmatchingratio would be indicatedfromthe efficiency
point of view. In
the case of retrospective(case history) studies matchingtends to decrease
efficiency(cf. Miettinen [1968c]) and thereforethe presentdiscussionrelates
only to prospective(cohort) designsand experiments.
The answerdependson the costmodel. As an example,considerthe common
situationwherea set of propositiis obtainedat a unit cost c,
nonexperimental
whichdoes not dependon whetheror not the controlseriesis matched. Suppose
also that the cost model in (5.6) is adequate forthe matchingdesigns. Then,
by (5.9) and (5.10) the cost of the optimalR:1 individualmatchingdesignwith
a one-sidedalternativehypothesisis approximately
C
as
C0 + (ci + c2iI (ua + un)2/2a2.
The corresponding
resultforthe case oftwo independentseriesmay be taken
(e'
00c+
[4l? (cl)]2it"(u.,+ u,0)2/262,
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
349
where
01 +
=
02 -
20102
and cl is the unit cost of controlswhenno matchingis attempted.
In termsof the above cost model and notation,matchingis justifiedby an
efficiency
gain if C < C', i.e., if
(+/A')
<
[1
+
(c2,/cJ)'/[1
+
(C21CJ1)i]
It is to be understoodhere that in the unmatchedcase the allocationratio R'
can be essentiallyany positiverationalnumberand its optimalvalue can, thus,
well be approximatedby (cl/cl)l. On the otherhand, however,substitutionof
(Cl/c2)1
forthe optimalvalue ofR in the derivation
ofthe above resultsmay
not be veryaccuratebecause R is an integer.
8. EXAMPLE OF APPLICATION
Trichopouloset al. [1969] tested the hypothesisthat induced abortionsincreasethe riskofectopicimplantationin subsequentpregnancies. Utilizingthe
case history (retrospective)approach, 18 propositi with ectopic pregnancy
followingat least one earlierpregnancywereidentifiedin a largeseriescollected
fora breastcancerstudy. For each propositus,fourcontrolsubjectsweredrawn
fromthe rest of the available series,matchingfororderof pregnancy,age, and
husband'seducation. E.g., fora womanwhosethirdpregnancywas ectopicand
occurredin the age intervalof30-34 years,and whosehusbandhad betweenone
and seven yearsof education,the controlsubjectshad to have theirthirdpregnancyin that same age intervalin additionto the requirement
that the husband
have 1-7 yearsof education. The historyof inducedabortionsterminating
any
of the precedingpregnancieswas recordedforthe propositiand the controls.
The essentialdata are givenin Table 1.
For testingwhetherthe frequencyof positivehistoriesof induced abortion
is significantly
higheramong the propositi,let us firstapply the statisticin
(3.1). In the numerator,the quantityE, RX,i = REi X1i is fourtimesthe
numberof '+' signsin the proposituscolumnof Table 1, while ; X,i is the
total numberof controlswith a positivehistory. Thus, E (RX1i - X2j) =
4(12) - 16 = 32. As to the sum in the denominatorof T, the contribution
from
each of the fivematchedgroupswith all historiesnegativeis zero, as forthese
Xi = 0; thereare fourmatchedgroupswithone subjectgivinga positivehistory,
and theirtotal contribution
to the sum is 4(1) (1 + 4 - 1) = 16; etc. The value
of the denominatoris thus readilyseen to be (64)1 = 8. The resultingvalue
for T is 32/8 = 4.0, and this correspondsto the one-sidedp-value of 0.00003.
Applyinga continuitycorrection,
the absolutevalue ofthe numeratoris reduced
by 2(1 + R) = 2.5,and theresultbecomesT = 3.7 withp = 0.00011.
An exact test analogous to that in section4.1 is also easy to apply here.
Amongthe fourmatchedgroupswith Xi = 1 therewere threewith Xi = 1.
The binomial probabilityfor this under the null hypothesisis 4 1 4
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1969
350
TABLE 1
PREVIOUS
HISTORY
OF INDUCED
AND MATCHED
ABORTION
IN PROPOSITI
CONTROLS.
WITH ECTOPIC
PREGNANCY
et al.
TRICHOPOULOS
il'istoiryof induced abortion
Control number
Index
Propos-
iiurnbl-lloer
_
1
itiis
_
2
2
3
__
4
+
+
+
-_
+-
-
_
I
-
-
)
+
-
_
+
6
+
tS-
+
J
10
II
+
I--
+-
I
-
-
+
+
_
_
+
_
+-
+
+
+
_
_
+
-
_
_
--
13
+
14
-++
:15.
17
18
-
+
-
.12--
16
_
!
_
4_ _
+
_
+
+
+
_
-
-
+
Similarlyforothervalues of Xi *Thus,the null probabilityof the observed
outcomeconditionallyon the Xi's is
[(4)(5)
j)](5)()
(53) ][(3)()()
]
=
2533
Thereare two otherequally extremeoutcomes(each witha total of 11 'successes'
in the three binomials) and one more extremeoutcome (with a total of 12
'successes'). Their probabilities,togetherwith the one shown above, add up
to the one-sidedp-value of 0.00009. As this falls between the two values
derivedfromthe asymptotictest, the size of the series seems to be sufficient
forthe applicationof large-sampleprocedureswithreasonablyaccurate results.
One mightnextwishto estimatethe sensitivityofthisstudy. E.g., it might
be askedwhatpowerit could have been expectedto have againstthe alternative
wherethefrequencyofpositivehistoriesofinducedabortionsamongwomenwith
ectopic pregnancyis 20 percentagepoints higher than among 'comparable'
womenwithoutectopic pregnancy(8 = 0.2), using a one-sidedtest at the 5%
level of significance.The firstproblemhereis to estimatethe nuisanceparameters. Using (6.1) and the data forthe propositiand the firstset of controls
in Table 1, the firstestimateof A1becomes
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
=
[8 + 1 + 0.2(8
-
1)]/2(18)
+ {[8 + 1 + 0.2(8
=
351
-
1)]2/4(18)2
0.2[8
-
-
1 - 0.2(9)]/18}i
0.44914.
The estimatesobtainableby usingthe second,third,and fourthsets of controls
are 0.33333, 0.33333, and 0.40000, respectively. Thus, upon averagingas in
(6.2), the 'best' estimateof q/becomes
== 0.37895.
As to the estimationof V12accordingto (6.3), the contributions
to the numerator
fromthose matchedgroupswhere one controlhas positive historyis 6, while
those withtwo '+' signscontribute8. Quite immediately,therefore,
'P2
Applyingthe above t and
yieldsthe powerestimate
60/216 = 0.27778.
=
f2,
togetherwith ua
1fI(0.2)
=
1.645 and R = 4, to (5.3)
4(0.242) = 0.60.
(The formulaforlocal power,(5.1), gives here 17(0.2) -1I(0.237) = 0.59, and
the close agreementwiththe above resultindicatesratherwide applicabilityfor
this formulaclespitethe theoreticallimitationto the vicinityof the null state.)
It is of interestto considerhere the sensitivitygain fromthe utilizationof
multiplecontrols,instead of just one, for each of the 18 available propositi.
WithR 1 and otherspecifications
as in the above, the formulain (5.3) yields
WU(0.2) 1I(-0.314) = 0.38 substantiallyless than the above 0.60 forR = 4.
On the otherhand, R = 10 correspondsto II(0.2) - (0.386) = 0.65, so that
withincreasingR the returnsare rapidlydiminishing.
ACKNOWLEDGMENTS
The authoris indebtedto Miss Linda Rosensteinforcarryingout the computer work forAppendix C, and to the Editor and a refereeforhelpfulsuggestions.
This workwas supportedby Grants5 Ti GM 7 and HE 10436-02fromthe
National Institutesof Health. It formeda part of the author's Ph.D. thesis
submittedto the Universityof Minnesotain March, 1968.
APPARIEMENT INDIVIDUEL A DES TEMOINS MULTIPLES DANS LE CAS
DES REPONSES EN TOUT-OU-RIEN
RESUME
Le principed'appariement
individuel,'un A un,' des plans A couplesde sujets appari6s
est g6n6ralis6
de 'R sujetsa un' dans le cas de reponsesen tout-ou-rien
et de
a l'appariement
procedures
fixee.Un testest etabli; sa fonctionde puissanceasympA taillede l'6chantillon
est etudieen relationavec les cotLts
totique6galement;le choixdu rapportR d'appariement
dansles deuxgroupes
et finalemenit
unitaires
les procedures
A comparer
la
pourde'terminer
taillede l'echantillon
sontdecritas.
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1969
352
REFERENCES
J. Inst. Actuar.73,
Beard,R. E. [1947].Some noteson approximateproduct-integration.
356-416.
concerning
matchedsamples.J. R. Statist.Soc. B
Bennett,B. M. [19671.Tests ofhypotheses
29, 468-74.
Brit.J. Prev.Soc. Med.
Billewicz,W. Z. [19641.Matchedsamplesin medicalinvestigations.
18, 167-73.
of matchedsamples:aniempiricalinvestigation.
BioBillewicz,W. Z. [1965].The efficiency
metrics
21, 623-44.
Birch,N. W. [1964].The detectionofpartialassociation,I: the2 X 2 case. J. R. Statist.Soc.
B 26, 313-24.
55,
of matchedpairs in Bernoullitrials.Biometrika
Chase, G. R. [1968].On the efficiency
365-9.
in matchedsamples.Biometrika
of percentages
37,
Cochran,W. G. [1950].The comparison
256-66.
53,215involving
quantaldata. Biometrika
Cox,D. R. [1966].A simpleexampleofcomparison
20.
PrincetonUniversity
MethodsofStatistics.
Press,Princeton,
Cram6r,H. [1946].Mathematical
N. J.
Vol.1. 2ndEdn. Hafner
Theory
ofStatistics.
Kendall,M. G. andStuart,A. [1963].TheAdvanced
Publishing
Company,New York.
Mantel,N. and Haenszel,W. [1959].Statisticalaspectsof the analysisof data fromretrospectivestudiesof disease.J. Nat. CancerInst.22, 719-48.
betweencorrectedproMcNemar,Q. [1947].Note on the samplingerrorof the difference
or percentages.
Psychometrika
12, 153-7.
portions
seriesin the case of all-or-nonie
0. S. [1966].Matchedpairsvs. two independent
Miettinen,
M. Se. Thesis,Univ.ofMinn. (Unpublished.)
responses.
0. S. [1968a].The matchedpairsdesignin the case of all-or-none
responses.BioMiettinen,
metrics
24, 339-52.
research
0. S. [1968b].Somebasic theoryformatchingdesignsin nonexperimental
Miettinen,
Inc., Ann Arbor,
on causation.Ph. D. Thesis,Univ. of Minn. UniversityMicrofilms,
Mich.
in epidemiologic
studies.Atti5? Cong.
Miettinen,0. S. [1968c].Under-and overmatching
Internaz.Igi6neMed. Prev.1, 49-61.
forthe testforthe difference
betweentwo proportions
Patnaik,P. B. [1948].Powerfunction
in a 2 X 2 table.Biometrika
35, 157-75.
A. [1969].Inducedabortionsand
D., Miettinen,0. S., and Polychronopoulou,
Trichopoulos,
(Unpublished.)
ectopicpregnancy.
studies.Biometrics
20, 841-8.
Worcester,
J. [19641.Matchedsamplesin epidemiologic
APPENDIX A
PROOFS
Proof1
LettingXf = Pr (X1 = f,X2
E(T I S = s)
(2s)-Is(2X1o-
=
g) we have
=
Xol
(2s)-IsE[2P(l+
P1)2P2(l
-(2s)-IsE[2(P1
-(2s)
61/(4+
- 2X02)/(X10
P2)2 -(1
+ P12P2(1 - P2)
(1 -
+ X1
-
-2(1-P2)
-
+
+
01
P1)2P2(1
-
Xl
+
X02)
P2)
P 2P]/E[P(1-P2)
+ P12P2(1
P2)]/E[(P, + P2
-
P2)
+
(1 -P1)P2]
2P1P2) + (P2
'202)-
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
P2)]
INDIVIDUAL MATCHING WITH MULTIPLE CO(NTROLS
353
Proof2
With X'sdefinedas in Proof1, writingX10+ X01+ Xll + X02=
and realizing that conditionallyon S = s the Z's are frequenciesin a multinomialdistribution (s, X10/A,
Xoi/u, X11/41 X02/A),we may write
var (T I S = s)
=
(2s)'sr2[4xo10(1-
X10)+
Xol) + X11(
01(j-
X11)+ 4X02(04- X02)
-
- 4X10X11
+ 4X10X01
- 4X01X02+
+ 8X10Xo2
+ 2X0\/1X
){,[M
= (2
+ 3(X10+ X02)]
= (2A2>1{,2
+
E((1
-
3,E[P(1
-
P2))
=
(2,u) {,u + 3,AE[(Pl + P2
=
(22)1{,2
= [(i/'
3A(VI -
+
+
IVtI2)(2
12 2)]
-12 212)
X01 +
-
X11
-
2X02)2}
P2)2 + (1-P1)P]-[2E(Pj(1
-
P1)2P2(1
-
(2X10
-
+ E(P12P2(1
-
2P1P2)
2)2)
-
P2))
-
2E((1 -P1)P2)]}
[2E(P1 -P2)]2
(P2 -P2)]-
4 62}
22]/(
-
-
-
4X11X02]
+
2).
12
Proof3
By the independenceassumptionsof the model,
var (UJ)
=
-1
=
J1
[R2var (Xli) + var (X2j)
J
{R2[E(var(X1 I P1)) +
-
2R cov (X,j X2,)]
var (E(X1 I P1))]
+ E[var (X2 I P2)] + var [E(X2 IP2)]
=
-R2
2R[E(cov (Xi
{E[Pl(1
,
+
02 -
-
20602 -
P2)] +
R2
var (P2)
2 cov (Pl , P2)
-
=
RI2(
-
2)
P1), E(X2 IP2))]
PF)] + var (P1)}
-
+ RE[P2(1
=I2[01
X2 IP1 , P2)) + cov (E(Xl
-
R(R
-
2R2[0 +
02 _
-
COV
(PF , P2)]
02 + 201021
1)[02
-
2
-
var (P2)]
!-R(R-12
-
Proof4
[(R + 1)E(X) -E(X2)]
-
-
{(R + 1)E(Xl + X2)
{(R + 1)(01 + R62)
-
-
[E(X1 + X2)]2
(6, +
02)
-
var (Xi + X2)}k
2
-E[var (Xi + X2 P1 FP2)] -var [E(Xl + X2 I P1 I P2)]}I
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
BIOMETRICS, JUNE 1969
354
=
{(R + 1)(01 +
+ 1)(01 +
-[(1
(01 + R602)2
R02)
R2)
-
E[Pl(l
-(01
+
=
{R[01
2 -
+
(Rik + 1R2IP2 -2R
RI[V + !(R -1)
-
var (P1 + RP2)}*
+ var (P1)
+
02
var (Pl)
-
R var (P2)
cov (P1 , P2)] + RI[6202
-2
20102
-
P2)]
-
61
-
R02)2
-!RV12
-
P1) + RP2(1
-
2R cov (P1, P2)],
-
var (P2)]
-
-2
2)
2]1
APPENDIX B
SECOND
ORDER APPROXIMATIONS
TO THE ASYMPTOTIC
2:1 INDIVIDUAL
UNCONDITIONAL
POWER FUNCTION
OF THE
MATCHING DESIGN
As in the matchedpairs design (see Miettinen,1968a, b), various secondorderapproximationsto the asymptoticunconditionalpower functioncan be
derived.Utilizingthe relation11(6) = E-11(6 I S) and expandingII(8 JS) in
a Taylor seriesabout E(S) = J(V + /k2)givesthe second-order
approximation
I S J(
+
12)] +
2
var(S) dS2(
=(+
D
S)I
The corresponding
explicitexpressionforthe case of a one-sidedalternativeis
obtainedby usingthe powerresultin (4.6) and carryingout the algebra. The
result,a refinement
of (4.7), is
11(8)
4(y)
-
(y)(l
if -2
-
~4'2){[(GI + 12'2)(2VI + 2y/(
+
1f2)
-
262]
262], (B.1)
wherespdenotesthe densityfunctionof the standardnormaldistributionand y
is the argumentof 1 in (4.7).
Anothersecond-orderapproximationis obtained by applyingsecond-order
approximationsto E(S") and var (SI) in (4.9). S has a binomialdistribution
withparametersJ and VVI+ 2&2 . Thus, writingSI = f(S), the secondorderapproximationto E(S ) froma Taylor series expansion about E(S) =
Jj/*is
E(S')
f(JVI*)+ - var (S)f"(JVI*)
*[262/J(Q+
+
22)
V(Jq*)2
-[(8J
+
+
12_J,*(1-
1)i+*
-
2)(2q/ - 12 2) -
)[1/4(J
*p
1]/8(Ji'*)1.
Similarly,
var (S2)
[ff(Jf*)]2
_
var (S)-
var (S)]2/4
[f"(JVIf)
+ f'(J,jf*)ff"(Jif
*)E(S-J f*)3 + [f"(4J /4*)].E(S4
J,*(1
=
JV*)I
2i6*)
and
E(S
J/*)4
SubstitutingE(S
,*)(1
*)l + J*(1 - 41*)[1- 6j/*(1 - 0*)] (cf.Kendalland Stuart
3J2/**(1
-
-
(1 - *) { (8J +
[1963]pp. 58-9) and carryingout the algebrayieldsvar (Si)
3 + [1 6G*(1
7)*0*)]/2Jj1*J/32Jj1*.Applyingthese resultsto (4.9)
one obtains,fora one-sidedtest,
-
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions
355
INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS
L*(2--2
-uFu4*
-
;2)
+ {[(8J + l1);*
_[(8J + 7)i* -
-
+ 2(l-2
262
3 + (1
-
-
l
1]/4(2JV*}
)
)
6VI*+ 6
(B.2)
*2)12J*]/32JR*l_
Finally, followingPatnaik [1948] one may use Beard's [1947] approximate
=
withthe furtherapproximationof settingE(S -j*)3
product-integration
E(S - Jj*)5 = 0. Patnaik's result,as adapted to the presentproblem,is
H(3)
-=
6H(3
I Sj)
+
2H(3
8
2)
+
6H(3
I
(B.3)
S3),
where,witha one-sidedalternative,II(8 *) is the conditionalpowerfunctionin
[3J,*(l (4.6), si J*s2 = JJ*,andS3 = J/* + [3JzP*(1The variousapproximationsto the asymptoticunconditionalpowerfunction
are comparednumericallyin AppendixC.
APPENDIX C
COMPARISON
OF APPROXIMATIONS
FUNCTION
Param-
.10
.10
.10
.10
.10
.10
.05
.05
.05
.05
.05
.05
.10
.10
.10
.10
.10
.10
.05
.05
.05
.05
.05
.05
=
.20
.20
.20
.40
.40
.40
.20
.20
.20
.40
.40
.40
.20
.20
.20
.40
.40
.40
.20
.20
.20
.40
.40
.40
TO TIIE ASYMPTOTIC
2:1 INDIVIDUAL
Design
eters
P
OF THE
.05
.05
.05
.05
.05
.05
.05
.05
.05
.05
.05
.05
.10
.10
.10
.10
.10
.10
.10
.10
.10
.10
.10
.10
85.47
115.52
143.73
181.96
250.61
315.58
363.91
501.23
631.16
738.41
1021.40
1289.63
61.44
87.25
111.97
132.23
191.57
248.84
264.46
383.14
497.69
537.96
782.86
1019.63
POWER
Power approximations
constants*
2J
UNCONDITIONAL
MATCHING DESIGN
(4.7)
(4.10)
(B.1)
(B.2)
(B.3)
.800
.900
.950
.800
.900
.950
.800
.900
.950
.800
.900
.950
.800
.900
.950
.800
.900
.950
.800
.900
.950
.800
.900
.950
.794
.894
.946
.799
.899
.949
.799
.899
.949
.800
.900
.950
.794
.894
.946
.799
.899
.949
.799
.899
.949
.800
.900
.950
.789
.891
.943
.798
.898
.949
.798
.898
.949
.800
.900
.950
.789
.890
.943
.798
.898
.949
.798
.898
.949
.799
.900
.950
.792
.893
.945
.799
.899
.949
.798
.899
.949
.800
.900
.950
.791
.893
.945
.799
.899
.949
.798
.898
.949
.800
.900
.950
.792
.893
.945
.799
.899
.949
.798
.899
.949
.800
.900
.950
.791
.893
.945
.799
.899
.949
.798
.898
.949
.800
.900
.950
* The values of J werechosento yieldpreselected
powervalues whenthe simplestap(4.7), is used.
proximatepowerfunction,
December1968
ReceivedApril 1968, R?evised
This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM
All use subject to JSTOR Terms and Conditions