Individual Matching with Multiple Controls in the Case of All-or-None Responses Author(s): Olli S. Miettinen Source: Biometrics, Vol. 25, No. 2 (Jun., 1969), pp. 339-355 Published by: International Biometric Society Stable URL: http://www.jstor.org/stable/2528794 . Accessed: 18/02/2015 09:50 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics. http://www.jstor.org This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS IN THE CASE OF ALL-OR-NONE RESPONSES OLLI S. MIETTINEN Departmentsof Epidemiologyand Biostatistics,Harvard School of Public Health, and Division,Department HospitalMedicalCenter, ofMedicine,Children's Cardiology U. S. A. Boston,Massachusetts, SUMMARY individualmatchingprincipleof the matchedpairsdesignis generalized The one-to-one responsesand fixedsample size to R-to-oneindividualmatchingin the case of all-or-none procedures.A test is given;its asymptoticpowerfunctionis derived;the selectionof the in relationto the unitcostsin the two comparisongroups; matchingratio(R) is considered are described. forsamplesize determination and finally, procedures 1. INTRODUCTION Matching is a common featurein the design of nonexperimentalstudies concernedwith the evaluation of causal propositions(such as hypotheseson disease etiology). Its main purposetypicallyis the attainmentof validityfor but it has implicationsfordesignefficiency as well. the inferences, studies with matchedcomparisonseries are frequently As nonexperimental quite expensive,it is importantto understandthe propertiesofmatchingdesigns so as to be able to make the best use of them. The matchedpairs designin the case of all-or-noneresponsesand fixedsample size has recentlybeen studied ratherextensively(Worcester[1964], Billewicz [1964, 1965], Miettinen [1966, 1968a, b], Bennett[1967],Chase [1968]). The presentpaper deals withthe extensionofthis designto the case wherethe numberof controlsubjectsobtained foreach propositusis not necessarilyone but some generalnumberR. We will use the term'R-to-oneindividualmatchingdesign.' This generalizationand an intelligentchoice of R are importantwheneverseveral controlsubjects can be obtained at a unit cost substantiallylowerthan that of the propositi. 2. BASIC CONCEPTS, TERMINOLOGY, AND NOTATION In a studywithR-to-oneindividualmatchingone obtains J sets of 1 + R subjects. Of each ofthese (1 + R)-tuples,one unitbelongsto one and R unitsto the otherof the two comparisongroups. The firstgroup,consistingof J subjects (propositi),is heretermedSeries 1, and the lattergroup,withRJ subjects (controls)is called Series 2. The matchingis based on some matchingvariate M (whichneed not be unidimensional). The jth (1 + R)-tuple showssome realizationmi forM, and this impliesa realization(p,i, p2,) = [p,(mi), p2(mj)] forthe randompair (P1 , P2) of underlyingresponseprobabilities,the numeralsin the subscriptsreferring to Series 1 and Series 2, respectively. 339 This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 1969 340 Each subject is characterizedby a code value of either0 or 1 fora (dichotomous) responsevariate Y. Thus, in the jth matchedgroupa realization(y,i1 * Y2j) is obtained forthe randomresponsevector (Yl, , y2i Y2jI, *...* Y2;iR) - 3. THE MODEL, THE HYPOTHESES, AND THE TEST The model underwhichthe resultsare derivedin the presentworkis completelyanalogousto that consideredby Miettinen[1966; 1968a, b] in the special case of the matchedpairs design. It consistsof the assumptionsthat (1) the distributionof M is independentlyidentical for all the J sample positions,and that (2) conditionallyon the realizations{mi} forM the observation(response) variatescorresponding to the (1 + R)J unitsare independentwithinas well as amongthe J sample positions. To state this modelin anotherway, it is assumed that , * , Y2Y) are independently and identically (1) the J vectors(Y1i, i that and distributed, conditionallyon (P1, P2) (2) Y1, Y2iI * Y2YR are mutuallyindependent = (Pli,P2;,j= , J. 1,P2 As before(Miettinen[1968a, b]), the object of statisticalinferenceis taken to be a = 01 - 02, whereOi = E(YJ) = EM[E(Yi I M)] = EM[pi(M)] = E(PJ), i = 1, 2, and the expectationis taken withrespectto the distributionof the matchingvariate in the sourcepopulationof the propositi. The null hypothesisis taken as Ho: a = 0, and the alternativesconsideredare a < 0, a > 0 and a 7 0. for Informationin termsof a two-by-twotable of frequenciesis sufficient each matchedgroup. For the jth groupthe table is a realizationfor Series 1 Series 2 Total 1 X1; X2; xi 0 1-X1, R-X2j 1 + R-X; 1 R 1+ R y Total whereX1; = Yli and X2j = ak Y2ik . In termsof the above hypothesesthe interestlies in the contrast E (RX, -X2) This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULITIPLE CONTROLS 341 On thenullhypothesisits expectationis zero. Conditionallyon {Xi } itsvariance is var [ i (RXjj X23) I {Xi}] - = var [(1 + R)Xj i X E(1+ R)2 - - Xi Xj 1 + R-X, Xj(1 + R-XX). E in thecase oflargeJ can be takenas Thus,theteststatistic T = [E i (RX1j - X2i)]/[ i Xj(1 + R - Xi)]', (3.1) (The Cenand itsvaluereferred to tablesofthestandardnormaldistribution. variatesappliesherebytheLindetralLimitTheorem forunequallydistributed If a continuity correction wereto be appliedto thisstatistic, bergcriterion.) by (1 + R). it wouldmeanreducingthe absolutevalue of the numerator The squareof the above teststatisticcan be thoughtof as representing in the chi squarestatisticwhichCochran[1950]gave one degreeof freedom matchedseries,each forcomparing amongseveralindividually proportions variate. by oneunitat eachoftheobservedlevelsofthematching represented Moreover, it is a specialcase ofthemethodwhichManteland Haenszel[1959] information froma numberof hypergeometric suggestedfor accumulating experiments. testimoftheMantel-Haenszel The resultsofBirch[1964]ontheproperties plythatthestatisticin (3.1) is optimalifPj(1 - P2)/(1 - P1)P2 is constant. It may be notedthat the Mantel-Haenszeltest appliesalso in the case analysesforthatcase whereR variesamongthematchedgroups. Alternative by Cox [1966]. wereconsidered in sections4.1 and 8. to (3.1) is considered The exacttest corresponding 4. THE TWO-TO-ONE INDIVIDUAL MATCHING DESIGN 4.1. Thetest among designmeritsspecialattention individualmatching The two-to-one theR-to-onematching designswithR > 1. It is to be regardedas the most used matchedpairsdesign forthe commonly applicablealternative frequently in the categoryof individualmatchingdesignswitha fixedmatchingratio. designit serveswell R-to-onematching Also,as the simplestnonsymmetrical case. of the generalasymmetrical the principles the purposeof introducing In thecase oftwo-to-one theteststatistic(3.1) takes individualmatching, theform T = (2X -X3)] X2 [zEX(3 - Xi)] (4.1) A usefulalternative forthisstatisticmaybe obtainedby conexpression This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 1969 342 sideringthe multinomialdistributionof the responsevector (X1, , X2i). There are six possible realizations,and theirrandomfrequenciesare here denoted as follows: 2 x1 x2 1 0 1 Z12 Zll 0 ZJ2 Z(O1 ZOO Total ZIO J Total In termsof this layout and notation,the statisticin (4.1) may be recastin the form T = [2Zio - Zol + Z1l - + Zol + Z1l + Z02)]1. 2ZO2]/[2(Z1o (4.2) The square of this latterstatistic,withpossible continuitycorrectionof 3/2, is completelyanalogous to the McNemar [19471statisticfor the matched pairs design. The above multinomnial formulationalso permitsready constructionof an exact test. Regarding {X} as fixedin (3.1) and (4.1) impliesthat the sums (say) are taken to be fixedin (4.2), Z,0 + Z0i = S1 (say) and Z1l + Z02 =-2 and that ZAoand Z1l have independentbinomialdistributions whoseparameters underthe nullhypothesisare (Si , -) and (S2 , 2),respectively.The corresponding joint probabilityfunction,Pr (Z0o z1O Zll = Zll I S1 = sl , S2 = S2), iS 1:0)3 () () (11)3 This permitsthe computationof the p-value forhypothesistesting. E.g., with a > 0 as H1 , p = Pr (Zl0 + Zl 2 zio + z11= v), i.e., (si 2 (" 1 2) )8 2 8-)>'.1k, 4.2. The asymptotic unconditional powerfunction From the above exact test it is apparent that any assessmentof power at the conclusionof a study with two-to-oneindividualmatchingwould best be made conditionallyon S1 and S2 . That conditionalpower functioninvolves nuisanceparameters(see Birch [1964]) whoseevaluationin practicewould tend to be quite problematic. However, in the presentcontextwe are concerned withdesignproblems(cf. sections4.3, 5.2, and 7) and theycall forthe unconditionalpowerfunction. It is seen below that this functioncan be approximated in termsof rathersimpleexpressions. One approachto the asymptoticunconditionalpowerfunction,II(a), is that oftaking II(6) = EsH(6 I S), This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS 343 where S = S1 + S2,i.e., S = ZIO + Zo1 + Zll + Z02 It is shownin Proofs1 and 2 ofAppendixA that,forthe statisticin (4.2), I S) = (2S)'a/(0p + + O2)(2if-_ 2h2) - E(T 2142) and I S) var (T -[(i + 282]/( 2) where 2P1P2) = 01 + 02 - 20102 - 2 cov (P1 , P2) = E(P1 + P2- (4.4) and 'P2 = E(P2 + P2 - 2P2P2) = 202(1 02) - - 2 var (P2). (4.5) Therefore,fora one-sidedtest, 11( i S) = (4.6) (2S)-2 6] P+ !162)(2q - whereb denotesthe distributionfunctionof a standardnormalvariate,and ua is the 100(1 - a) percentileof the standardnormaldistribution. A firstapproximationto 11(a) may now be obtainedby evaluatingII(8 I S) at S = E(S) JE[Pi(I - P2)2 + (1 - P1)2P2(1 - P2) + P12P2(1 - P2) + (1 -P)P2] = J(VI + 42'2). Thus, fora one-sidedtest, (.7 + [2J(t'+ .2'J2)] 1 -FUa(IP + 124/2) 11(a) + [NI b 112 )(2'1 - - 21/2) 284 .(7 of 'P and 'P2 in (4.4) and (4.5), it is apparentthat in the From the definitions vicinityofthenullstateone maytake 'P 'P2 of Thus,in the assessment . local power,(4.7) may be furtherapproximatedas 11() [U ('P2 talI. (4.8) + 2(J8//3)88 2/9)1.(48 - An alternative approach may be based on findingthe parameters of the asymptoticnormaldistributionof T. Using the above conditionalresults we obtain E(T) = Es(T S) - 2E(SI)8/(4P + 21V2)3 and var (T) = Es[var (T I S)] + var[E(T I S)] E(4' + ?2 IN)(22 (VI + 282]/[' --12) 12)(2 VI - 2) 2[12 + - 2 CV2)2 var + [215/(o + .A1)]2 var (Si) (&-W)}]/['P + IP2)2 This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions 344 BIOMETRICS, JUNE 1969 It followsthat the asymptoticallyexact unconditionalpower functionforthe case of a one-sidedtest is 11(8) = L={G' + -UM(/'+ 1 12)(241 - 21I'2) + 1412) - 1 2iE(S) 1 2 2[1 -var (Si)]} (9 ] ( A first-order approximationto this functionis obtained by replacingE(SI) and var (SI) by theirfirst-order approximationsfromTaylor seriesexpansions about E(S): E(SI) [J(GP+ and ht2)]I var (Si) = 1(1- 2+'12) Thus, ] II(6) - [{{ (+ + 'U)(2'+ _12'2) 11 2 + 1 + -[2J(G ja 8(3 + 12)] (lk+ 42)(241412) J/+ 14/'2)/2}1 The furtherapproximationcorresponding to t -t2 II1(8) - [Up 2 2 + _ 52(2 (4.10) iS )/3]) 1 (4.11) The numericalevaluationsin AppendixC indicatethat the approximations fromthis latterapproach, (4.10) and (4.11), are moreaccurate than (4.7) and (4.8), respectively. Second-orderapproximationsare given in Appendix B and evaluated in AppendixC. They affordonly slightrefinements of the above first-order approximations. Regardingthe case ofa two-sidedalternativehypothesis, each ofthe additive powercontributionsfromthe two tails may be evaluated by using II(a), with U ,a in place of ua , and with - I6 in place of I8j forthe lowertail contribution. 4.3. Sample size determination The sample size which,with a level of significance, yieldsthe power 1against 8 is, from(4.7), J Wt2) + {U.(VI + U9[(Vt + t'V2)(2t' - 21/2) - 282]I}2/2( + 2 2)8. The corresponding approximationfrom(4.8) is J 3[ual' + ug(42 _ 81p8/9)2]/41,2. The alternativefirst-order approximationto the power function,(4.10), yields J {Ua.(VI + 2 12) + U4[(VI + 2-V2)(2 - - 2 2 82(3 } /2 ( + V/+ 2V12)/2]4 + 21 2) 82. The approximationto this expressionfrom(4.11) is J 3{u ' + u1[2 82(2 + iP)/3]i}2/4052* The second-orderapproximationsto the asymptoticpower function(see This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIIPLE CONTROLS 345 AppendixB) are not readilyapplicable to sample size determination. As to the case of a two-sidedalternativehypothesis,a and ,Bare usually smallenoughso that all the poweris derivedessentiallyfromone ofits two components,and in such situationsthe above sample-sizeformulasapply upon substituting u4a for ua, . .5.THE GENERAL CASE 5.1. The asymptotic unconditionalpowerfunction In section4.2 as wellas in earlierworkon thematchedpairsdesign(Miettinen [1968a, b]) resultsforunconditionalpowerwere derivedthroughresultson the conditionaldistributionof T. In the generalcase of R: 1 individualmatching designs this approach does not seem to be readily applicable (cf. Miettinen [1968b]). Therefore,in the presentcontexta resultforthe unconditionalpower will be derivedwithoutthe intermediaryof the conditionaldistributionof T. Due to the natureof the approach,the immediateresultapplies well onlywhen 81is small relativeto At. For a derivationof the resultforlocal unconditionalpower,it is coinvenient to writethe test statistic(3.1) as T = UJ/VJ, where U.,= J- ? (RX1; - and X2,) V., X,(1 + R-Xi)]. J-1 The expectationof U.Tis E(UJ) = J-E Jr (RO - R02) =R16, It and by Proof 3 of Appendix A, var (U,) = R2(l - 62) _ 'R(R - 1)>2 followsthat if J -* o and a -* 0 in such a mannerthat Jl8 remainsfixed,then the distributionfunctionFJ(u) of U, tends to F(u) [R2( L[R(t' 62)211 8)R(R - - 1)4'12]1- On the otherhand, the sequence V, convergesstochasticallyto the constant [(R + 1)E(X) - E(X2)]I, and by Proof 4 of Appendix A this limit equals I'R% + 2(R -1)V2]3 From the above resultsit follows-e.g., by Theorem20.6, Cram6r[1946]that as J --+ o and a -O0 in such a mannerthatNib remainsfixed,Pr (Uj/Vj < ua) = Pr (T < ua) tends to The cop bU [4, + roia 1 R - 1) V/2]' - (RJ)'6 62) t 12(Rc-1) nc2]- The corresponldingapproximation to the asymptotic unconditional power func- This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 19f9 346 tion fora one-sidedtest is 11(8) 0{-u~E~t'? 2(R -1)[R(/- 82) 21 ? (RJ)l R - _-41 1 (5.1) As this resultwas derivedin a mannerwhichis adequate forsmall I16only, it is interesting and pertinentto compareit to earlierresultswithoutsuchlimitation forthe cases of 1:1 and 2:1 individualmatchingdesign.For this purposeit is expedientto rewrite(5.1) as II(a) -= 1 -ua[\6 +2(R- 1)J2] + 2(R -1) V2][RVIVI[ ?{RJ[B,t ?+ (R - 1>^62]}2181 (- 1412 - R[V + 12(R-1)2 622 If in this resultthe latter termin the denominator,-R[RI ? 2, + (R- 1)-2] is replaced by -R82, it gives exactly the earlier first-order approximation (4.7) as well as its equivalent for the matched pairs design (cf. Miettinen [1968a,b]). This modification is, as was to be expected,immaterialwithsmall 81,but forrelativelylarge values of 161it may be important. Making the above modification in (5.2) yields 11(^) ? {iJ[D ?+ (R - 1>J2]11 + 2(R - 1> RJ21 (53) 2 L{[VI + 3(R -1) 2][RV - 2(R - 1)121 -R2 It was seenabove that thisformulaapplieswellin the cases ofR = 1 and R = 2' withoutthe limitationthat I16be small. One may surmisethat this is the case also forlargerR, particularlyforpracticalpurposes,as the frequencywithwhich a particularR : 1 individualmatchingdesignis optimaldecreasessharplywith increasingR so that designswith,say, R > 4 are of verylittleinterest. In the vicinityof the null state one may again set Vt it2 , and the result in (5.3) becomes r[Ua[1 ? R) ? [2R(1 + R)J] 811 (5.4) i(54 [(1 ?~j? R2ifr2 But setting A VI,,2 presupposesalso that 181 is small relativeto At,so that it is of deletingthe reasonableforsome purposesto make the furthersimplification lattertermin the denominator. This yields 11I(8)_ ,[ Ua(1 L H1(8) -{I-ua - + [2RJ/(1 + R)i/4 VI 1}. (5.5) 5.2. Selectionof thedesignconstants Suppose again that the desireddegreeof information is specifiedin termsof the level a of the test,whetherthe test is to be one-sidedor two-sided,and the powerlevel 1 - f to be attainedagainstsomealternative8. The designproblem thenis to choosethe constantsR and J in such a mannerthat this information is obtainedat a minimumcost. Only the simple (and useful) cost model is consideredwhere Co = 'set-up' cost, C= = unit cost in Series 1, and C5 = unit cost in Series 2, This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS 347 so that the total cost of the studyis C = Co + (cl + RC2)J. (5.6) The firstproblemis to choose R in such a way that C is minimizedforthe particularcombinationof a, , and the parameters. From (5.3), J {Ua[lt+ ?2L(R- + u[Rf~- 1)>b211 t-a2/(R + 2(R-1)2)]1}2/Ra2. 22-(R - (5.7) Substitutingthis expressionforJ in (5.6) permitsa trial-and-error solutionfor the integervalue ofR whichminimizesC forthe case of a one-sidedalternative. The above approachto the selectionofR has the drawbackthat the solution dependson the nuisanceparameters,and typicallygood estimatesof them are not available in the planningstage ofthe study. These problemsare avoided by using the somewhatless accurate power functionin (5.5). The corresponding expressionforthe total cost (5.6) is C -Co + (cl + Rc2)(1 + R)VI(ua+ ug)2/2Rft2, (5.8) and this cost functionis minimizedby taking R= (5.9) (Cl/C2)1 R is, of course,an integer,and in practiceone could,thus,choosethat matching ratio whichbest correspondsto the square root of the inverseof the unit cost ratio. In case of ambivalence,the integervalue of R which minimizesC in of(Rf+ C1/c2) (1 + R)/tR. minimization (5.8) maybe obtainedbytrial-and-error WithR selected,the corresponding J forthe case of a one-sidedtest is specifiedby (5.7). This formulahas the obvious drawbackthat it involvestwo nuisance parameters,VIand 4'2 . However,in the vicinityof the null state, with small relativeto 4, it is appropriateto use the powerexpressionsin (5.4) and I51 (5.5). They yield J {u<[(1 + R)A]2 + up[(1 + R)I - 4Rf2/(1 + R)VI]}2/2fta2 (5.10) (1 + R)VI(u,+ up)2/2Rf2. 6. EVALUATION OF THE NUISANCE PARAMETERS It is seen fromthe powerfunctionsin (5.1) and (5.3) that withR = 1 there is onlyone nuisanceparameter,VI. Moreover,it was concludedon the basis of in (4.4) and (4.5) that in the vicinityof the null state one may the definitions take 412= 4' and therebyhave a singlenuisanceparametereven when R > 1. The evaluation of 4' has been discussedin the contextof the matchedpairs design(Miettinen[1968a,b]), whereit has the interpretation of beingthe probabilitythat a pair will show discordantresponses. It is usually reasonableto assume that cov (P1 , P2) > 0. The corresponding rangeof 4' is 151? 4' ? 01 + 02 - 201021 This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions 348 BIOMETRICS, JUNE 1969 and in a priorievaluationit wouldgenerallybe advisableto choosea value which is relativelyclose to the above upper bound. Furthermore, the maximumlikelihood estimateof Akbased on matchedpairs data has been shown (Miettinen [1968a]) to be = [Z1o + Zo0 + a(Z1o -ZoJ)]2J + {[Z10 + Zol + a(Zlo - Z0)]2/4J2 b-Zlo (Z1l + Zoo)]/J}, -Zol (6.1) whereZf0is the frequencyof pairs with Y, = f and Y2i g If data froman R:1 individualmatchingdesignwithR > 1 are at hand, the estimateof V/may be taken as R = Q Ak/R E (6.2) where iS computedaccordingto (6.1), takingZf0as the frequencyofmatched groupswith Y = f and Y2ik = g. The definitionof 12 in (4.5) may be put to the form 12 = E[P2(1 - P2)]. Thus, it is theprobabilitythat a pair ofcontrolsubjectswithinthe same matched groupshow discordantresponses. Its estimate,therefore, may be taken as the proportionof such pairs,i.e., Ak J p2 = E ij1 IR i k-1 IR E k'=1 (Y2ik - Y2jk) 2/R(R - 1)J. (6.3) 7. THE CHOICE BETWEEN R-TO-ONE INDIVIDUAL MATCHING DESIGNS AND THE USE OF TWO INDEPENDENT SERIES If matchingis not needed forvalidity(comparabilityof the two series),it is pertinentto knowunderwhat conditionsindividualmatchingwiththe optimal fixedmatchingratio would be indicatedfromthe efficiency point of view. In the case of retrospective(case history) studies matchingtends to decrease efficiency(cf. Miettinen [1968c]) and thereforethe presentdiscussionrelates only to prospective(cohort) designsand experiments. The answerdependson the costmodel. As an example,considerthe common situationwherea set of propositiis obtainedat a unit cost c, nonexperimental whichdoes not dependon whetheror not the controlseriesis matched. Suppose also that the cost model in (5.6) is adequate forthe matchingdesigns. Then, by (5.9) and (5.10) the cost of the optimalR:1 individualmatchingdesignwith a one-sidedalternativehypothesisis approximately C as C0 + (ci + c2iI (ua + un)2/2a2. The corresponding resultforthe case oftwo independentseriesmay be taken (e' 00c+ [4l? (cl)]2it"(u.,+ u,0)2/262, This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS 349 where 01 + = 02 - 20102 and cl is the unit cost of controlswhenno matchingis attempted. In termsof the above cost model and notation,matchingis justifiedby an efficiency gain if C < C', i.e., if (+/A') < [1 + (c2,/cJ)'/[1 + (C21CJ1)i] It is to be understoodhere that in the unmatchedcase the allocationratio R' can be essentiallyany positiverationalnumberand its optimalvalue can, thus, well be approximatedby (cl/cl)l. On the otherhand, however,substitutionof (Cl/c2)1 forthe optimalvalue ofR in the derivation ofthe above resultsmay not be veryaccuratebecause R is an integer. 8. EXAMPLE OF APPLICATION Trichopouloset al. [1969] tested the hypothesisthat induced abortionsincreasethe riskofectopicimplantationin subsequentpregnancies. Utilizingthe case history (retrospective)approach, 18 propositi with ectopic pregnancy followingat least one earlierpregnancywereidentifiedin a largeseriescollected fora breastcancerstudy. For each propositus,fourcontrolsubjectsweredrawn fromthe rest of the available series,matchingfororderof pregnancy,age, and husband'seducation. E.g., fora womanwhosethirdpregnancywas ectopicand occurredin the age intervalof30-34 years,and whosehusbandhad betweenone and seven yearsof education,the controlsubjectshad to have theirthirdpregnancyin that same age intervalin additionto the requirement that the husband have 1-7 yearsof education. The historyof inducedabortionsterminating any of the precedingpregnancieswas recordedforthe propositiand the controls. The essentialdata are givenin Table 1. For testingwhetherthe frequencyof positivehistoriesof induced abortion is significantly higheramong the propositi,let us firstapply the statisticin (3.1). In the numerator,the quantityE, RX,i = REi X1i is fourtimesthe numberof '+' signsin the proposituscolumnof Table 1, while ; X,i is the total numberof controlswith a positivehistory. Thus, E (RX1i - X2j) = 4(12) - 16 = 32. As to the sum in the denominatorof T, the contribution from each of the fivematchedgroupswith all historiesnegativeis zero, as forthese Xi = 0; thereare fourmatchedgroupswithone subjectgivinga positivehistory, and theirtotal contribution to the sum is 4(1) (1 + 4 - 1) = 16; etc. The value of the denominatoris thus readilyseen to be (64)1 = 8. The resultingvalue for T is 32/8 = 4.0, and this correspondsto the one-sidedp-value of 0.00003. Applyinga continuitycorrection, the absolutevalue ofthe numeratoris reduced by 2(1 + R) = 2.5,and theresultbecomesT = 3.7 withp = 0.00011. An exact test analogous to that in section4.1 is also easy to apply here. Amongthe fourmatchedgroupswith Xi = 1 therewere threewith Xi = 1. The binomial probabilityfor this under the null hypothesisis 4 1 4 This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 1969 350 TABLE 1 PREVIOUS HISTORY OF INDUCED AND MATCHED ABORTION IN PROPOSITI CONTROLS. WITH ECTOPIC PREGNANCY et al. TRICHOPOULOS il'istoiryof induced abortion Control number Index Propos- iiurnbl-lloer _ 1 itiis _ 2 2 3 __ 4 + + + -_ +- - _ I - - ) + - _ + 6 + tS- + J 10 II + I-- +- I - - + + _ _ + _ +- + + + _ _ + - _ _ -- 13 + 14 -++ :15. 17 18 - + - .12-- 16 _ ! _ 4_ _ + _ + + + _ - - + Similarlyforothervalues of Xi *Thus,the null probabilityof the observed outcomeconditionallyon the Xi's is [(4)(5) j)](5)() (53) ][(3)()() ] = 2533 Thereare two otherequally extremeoutcomes(each witha total of 11 'successes' in the three binomials) and one more extremeoutcome (with a total of 12 'successes'). Their probabilities,togetherwith the one shown above, add up to the one-sidedp-value of 0.00009. As this falls between the two values derivedfromthe asymptotictest, the size of the series seems to be sufficient forthe applicationof large-sampleprocedureswithreasonablyaccurate results. One mightnextwishto estimatethe sensitivityofthisstudy. E.g., it might be askedwhatpowerit could have been expectedto have againstthe alternative wherethefrequencyofpositivehistoriesofinducedabortionsamongwomenwith ectopic pregnancyis 20 percentagepoints higher than among 'comparable' womenwithoutectopic pregnancy(8 = 0.2), using a one-sidedtest at the 5% level of significance.The firstproblemhereis to estimatethe nuisanceparameters. Using (6.1) and the data forthe propositiand the firstset of controls in Table 1, the firstestimateof A1becomes This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS = [8 + 1 + 0.2(8 - 1)]/2(18) + {[8 + 1 + 0.2(8 = 351 - 1)]2/4(18)2 0.2[8 - - 1 - 0.2(9)]/18}i 0.44914. The estimatesobtainableby usingthe second,third,and fourthsets of controls are 0.33333, 0.33333, and 0.40000, respectively. Thus, upon averagingas in (6.2), the 'best' estimateof q/becomes == 0.37895. As to the estimationof V12accordingto (6.3), the contributions to the numerator fromthose matchedgroupswhere one controlhas positive historyis 6, while those withtwo '+' signscontribute8. Quite immediately,therefore, 'P2 Applyingthe above t and yieldsthe powerestimate 60/216 = 0.27778. = f2, togetherwith ua 1fI(0.2) = 1.645 and R = 4, to (5.3) 4(0.242) = 0.60. (The formulaforlocal power,(5.1), gives here 17(0.2) -1I(0.237) = 0.59, and the close agreementwiththe above resultindicatesratherwide applicabilityfor this formulaclespitethe theoreticallimitationto the vicinityof the null state.) It is of interestto considerhere the sensitivitygain fromthe utilizationof multiplecontrols,instead of just one, for each of the 18 available propositi. WithR 1 and otherspecifications as in the above, the formulain (5.3) yields WU(0.2) 1I(-0.314) = 0.38 substantiallyless than the above 0.60 forR = 4. On the otherhand, R = 10 correspondsto II(0.2) - (0.386) = 0.65, so that withincreasingR the returnsare rapidlydiminishing. ACKNOWLEDGMENTS The authoris indebtedto Miss Linda Rosensteinforcarryingout the computer work forAppendix C, and to the Editor and a refereeforhelpfulsuggestions. This workwas supportedby Grants5 Ti GM 7 and HE 10436-02fromthe National Institutesof Health. It formeda part of the author's Ph.D. thesis submittedto the Universityof Minnesotain March, 1968. APPARIEMENT INDIVIDUEL A DES TEMOINS MULTIPLES DANS LE CAS DES REPONSES EN TOUT-OU-RIEN RESUME Le principed'appariement individuel,'un A un,' des plans A couplesde sujets appari6s est g6n6ralis6 de 'R sujetsa un' dans le cas de reponsesen tout-ou-rien et de a l'appariement procedures fixee.Un testest etabli; sa fonctionde puissanceasympA taillede l'6chantillon est etudieen relationavec les cotLts totique6galement;le choixdu rapportR d'appariement dansles deuxgroupes et finalemenit unitaires les procedures A comparer la pourde'terminer taillede l'echantillon sontdecritas. This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 1969 352 REFERENCES J. Inst. Actuar.73, Beard,R. E. [1947].Some noteson approximateproduct-integration. 356-416. concerning matchedsamples.J. R. Statist.Soc. B Bennett,B. M. [19671.Tests ofhypotheses 29, 468-74. Brit.J. Prev.Soc. Med. Billewicz,W. Z. [19641.Matchedsamplesin medicalinvestigations. 18, 167-73. of matchedsamples:aniempiricalinvestigation. BioBillewicz,W. Z. [1965].The efficiency metrics 21, 623-44. Birch,N. W. [1964].The detectionofpartialassociation,I: the2 X 2 case. J. R. Statist.Soc. B 26, 313-24. 55, of matchedpairs in Bernoullitrials.Biometrika Chase, G. R. [1968].On the efficiency 365-9. in matchedsamples.Biometrika of percentages 37, Cochran,W. G. [1950].The comparison 256-66. 53,215involving quantaldata. Biometrika Cox,D. R. [1966].A simpleexampleofcomparison 20. PrincetonUniversity MethodsofStatistics. Press,Princeton, Cram6r,H. [1946].Mathematical N. J. Vol.1. 2ndEdn. Hafner Theory ofStatistics. Kendall,M. G. andStuart,A. [1963].TheAdvanced Publishing Company,New York. Mantel,N. and Haenszel,W. [1959].Statisticalaspectsof the analysisof data fromretrospectivestudiesof disease.J. Nat. CancerInst.22, 719-48. betweencorrectedproMcNemar,Q. [1947].Note on the samplingerrorof the difference or percentages. Psychometrika 12, 153-7. portions seriesin the case of all-or-nonie 0. S. [1966].Matchedpairsvs. two independent Miettinen, M. Se. Thesis,Univ.ofMinn. (Unpublished.) responses. 0. S. [1968a].The matchedpairsdesignin the case of all-or-none responses.BioMiettinen, metrics 24, 339-52. research 0. S. [1968b].Somebasic theoryformatchingdesignsin nonexperimental Miettinen, Inc., Ann Arbor, on causation.Ph. D. Thesis,Univ. of Minn. UniversityMicrofilms, Mich. in epidemiologic studies.Atti5? Cong. Miettinen,0. S. [1968c].Under-and overmatching Internaz.Igi6neMed. Prev.1, 49-61. forthe testforthe difference betweentwo proportions Patnaik,P. B. [1948].Powerfunction in a 2 X 2 table.Biometrika 35, 157-75. A. [1969].Inducedabortionsand D., Miettinen,0. S., and Polychronopoulou, Trichopoulos, (Unpublished.) ectopicpregnancy. studies.Biometrics 20, 841-8. Worcester, J. [19641.Matchedsamplesin epidemiologic APPENDIX A PROOFS Proof1 LettingXf = Pr (X1 = f,X2 E(T I S = s) (2s)-Is(2X1o- = g) we have = Xol (2s)-IsE[2P(l+ P1)2P2(l -(2s)-IsE[2(P1 -(2s) 61/(4+ - 2X02)/(X10 P2)2 -(1 + P12P2(1 - P2) (1 - + X1 - -2(1-P2) - + + 01 P1)2P2(1 - Xl + X02) P2) P 2P]/E[P(1-P2) + P12P2(1 P2)]/E[(P, + P2 - P2) + (1 -P1)P2] 2P1P2) + (P2 '202)- This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions P2)] INDIVIDUAL MATCHING WITH MULTIPLE CO(NTROLS 353 Proof2 With X'sdefinedas in Proof1, writingX10+ X01+ Xll + X02= and realizing that conditionallyon S = s the Z's are frequenciesin a multinomialdistribution (s, X10/A, Xoi/u, X11/41 X02/A),we may write var (T I S = s) = (2s)'sr2[4xo10(1- X10)+ Xol) + X11( 01(j- X11)+ 4X02(04- X02) - - 4X10X11 + 4X10X01 - 4X01X02+ + 8X10Xo2 + 2X0\/1X ){,[M = (2 + 3(X10+ X02)] = (2A2>1{,2 + E((1 - 3,E[P(1 - P2)) = (2,u) {,u + 3,AE[(Pl + P2 = (22)1{,2 = [(i/' 3A(VI - + + IVtI2)(2 12 2)] -12 212) X01 + - X11 - 2X02)2} P2)2 + (1-P1)P]-[2E(Pj(1 - P1)2P2(1 - (2X10 - + E(P12P2(1 - 2P1P2) 2)2) - P2)) - 2E((1 -P1)P2)]} [2E(P1 -P2)]2 (P2 -P2)]- 4 62} 22]/( - - - 4X11X02] + 2). 12 Proof3 By the independenceassumptionsof the model, var (UJ) = -1 = J1 [R2var (Xli) + var (X2j) J {R2[E(var(X1 I P1)) + - 2R cov (X,j X2,)] var (E(X1 I P1))] + E[var (X2 I P2)] + var [E(X2 IP2)] = -R2 2R[E(cov (Xi {E[Pl(1 , + 02 - - 20602 - P2)] + R2 var (P2) 2 cov (Pl , P2) - = RI2( - 2) P1), E(X2 IP2))] PF)] + var (P1)} - + RE[P2(1 =I2[01 X2 IP1 , P2)) + cov (E(Xl - R(R - 2R2[0 + 02 _ - COV (PF , P2)] 02 + 201021 1)[02 - 2 - var (P2)] !-R(R-12 - Proof4 [(R + 1)E(X) -E(X2)] - - {(R + 1)E(Xl + X2) {(R + 1)(01 + R62) - - [E(X1 + X2)]2 (6, + 02) - var (Xi + X2)}k 2 -E[var (Xi + X2 P1 FP2)] -var [E(Xl + X2 I P1 I P2)]}I This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions BIOMETRICS, JUNE 1969 354 = {(R + 1)(01 + + 1)(01 + -[(1 (01 + R602)2 R02) R2) - E[Pl(l -(01 + = {R[01 2 - + (Rik + 1R2IP2 -2R RI[V + !(R -1) - var (P1 + RP2)}* + var (P1) + 02 var (Pl) - R var (P2) cov (P1 , P2)] + RI[6202 -2 20102 - P2)] - 61 - R02)2 -!RV12 - P1) + RP2(1 - 2R cov (P1, P2)], - var (P2)] - -2 2) 2]1 APPENDIX B SECOND ORDER APPROXIMATIONS TO THE ASYMPTOTIC 2:1 INDIVIDUAL UNCONDITIONAL POWER FUNCTION OF THE MATCHING DESIGN As in the matchedpairs design (see Miettinen,1968a, b), various secondorderapproximationsto the asymptoticunconditionalpower functioncan be derived.Utilizingthe relation11(6) = E-11(6 I S) and expandingII(8 JS) in a Taylor seriesabout E(S) = J(V + /k2)givesthe second-order approximation I S J( + 12)] + 2 var(S) dS2( =(+ D S)I The corresponding explicitexpressionforthe case of a one-sidedalternativeis obtainedby usingthe powerresultin (4.6) and carryingout the algebra. The result,a refinement of (4.7), is 11(8) 4(y) - (y)(l if -2 - ~4'2){[(GI + 12'2)(2VI + 2y/( + 1f2) - 262] 262], (B.1) wherespdenotesthe densityfunctionof the standardnormaldistributionand y is the argumentof 1 in (4.7). Anothersecond-orderapproximationis obtained by applyingsecond-order approximationsto E(S") and var (SI) in (4.9). S has a binomialdistribution withparametersJ and VVI+ 2&2 . Thus, writingSI = f(S), the secondorderapproximationto E(S ) froma Taylor series expansion about E(S) = Jj/*is E(S') f(JVI*)+ - var (S)f"(JVI*) *[262/J(Q+ + 22) V(Jq*)2 -[(8J + + 12_J,*(1- 1)i+* - 2)(2q/ - 12 2) - )[1/4(J *p 1]/8(Ji'*)1. Similarly, var (S2) [ff(Jf*)]2 _ var (S)- var (S)]2/4 [f"(JVIf) + f'(J,jf*)ff"(Jif *)E(S-J f*)3 + [f"(4J /4*)].E(S4 J,*(1 = JV*)I 2i6*) and E(S J/*)4 SubstitutingE(S ,*)(1 *)l + J*(1 - 41*)[1- 6j/*(1 - 0*)] (cf.Kendalland Stuart 3J2/**(1 - - (1 - *) { (8J + [1963]pp. 58-9) and carryingout the algebrayieldsvar (Si) 3 + [1 6G*(1 7)*0*)]/2Jj1*J/32Jj1*.Applyingthese resultsto (4.9) one obtains,fora one-sidedtest, - This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions 355 INDIVIDUAL MATCHING WITH MULTIPLE CONTROLS L*(2--2 -uFu4* - ;2) + {[(8J + l1);* _[(8J + 7)i* - - + 2(l-2 262 3 + (1 - - l 1]/4(2JV*} ) ) 6VI*+ 6 (B.2) *2)12J*]/32JR*l_ Finally, followingPatnaik [1948] one may use Beard's [1947] approximate = withthe furtherapproximationof settingE(S -j*)3 product-integration E(S - Jj*)5 = 0. Patnaik's result,as adapted to the presentproblem,is H(3) -= 6H(3 I Sj) + 2H(3 8 2) + 6H(3 I (B.3) S3), where,witha one-sidedalternative,II(8 *) is the conditionalpowerfunctionin [3J,*(l (4.6), si J*s2 = JJ*,andS3 = J/* + [3JzP*(1The variousapproximationsto the asymptoticunconditionalpowerfunction are comparednumericallyin AppendixC. APPENDIX C COMPARISON OF APPROXIMATIONS FUNCTION Param- .10 .10 .10 .10 .10 .10 .05 .05 .05 .05 .05 .05 .10 .10 .10 .10 .10 .10 .05 .05 .05 .05 .05 .05 = .20 .20 .20 .40 .40 .40 .20 .20 .20 .40 .40 .40 .20 .20 .20 .40 .40 .40 .20 .20 .20 .40 .40 .40 TO TIIE ASYMPTOTIC 2:1 INDIVIDUAL Design eters P OF THE .05 .05 .05 .05 .05 .05 .05 .05 .05 .05 .05 .05 .10 .10 .10 .10 .10 .10 .10 .10 .10 .10 .10 .10 85.47 115.52 143.73 181.96 250.61 315.58 363.91 501.23 631.16 738.41 1021.40 1289.63 61.44 87.25 111.97 132.23 191.57 248.84 264.46 383.14 497.69 537.96 782.86 1019.63 POWER Power approximations constants* 2J UNCONDITIONAL MATCHING DESIGN (4.7) (4.10) (B.1) (B.2) (B.3) .800 .900 .950 .800 .900 .950 .800 .900 .950 .800 .900 .950 .800 .900 .950 .800 .900 .950 .800 .900 .950 .800 .900 .950 .794 .894 .946 .799 .899 .949 .799 .899 .949 .800 .900 .950 .794 .894 .946 .799 .899 .949 .799 .899 .949 .800 .900 .950 .789 .891 .943 .798 .898 .949 .798 .898 .949 .800 .900 .950 .789 .890 .943 .798 .898 .949 .798 .898 .949 .799 .900 .950 .792 .893 .945 .799 .899 .949 .798 .899 .949 .800 .900 .950 .791 .893 .945 .799 .899 .949 .798 .898 .949 .800 .900 .950 .792 .893 .945 .799 .899 .949 .798 .899 .949 .800 .900 .950 .791 .893 .945 .799 .899 .949 .798 .898 .949 .800 .900 .950 * The values of J werechosento yieldpreselected powervalues whenthe simplestap(4.7), is used. proximatepowerfunction, December1968 ReceivedApril 1968, R?evised This content downloaded from 132.216.239.61 on Wed, 18 Feb 2015 09:50:20 AM All use subject to JSTOR Terms and Conditions
© Copyright 2026 Paperzz