I .~ I I, I I I I I II I I I I I I I {' I ON TESTING HYPOTHESES CONCERNING STANDARDIZED MORTALITY RATIOS AND/OR INDIRECT ADJUSTED RATES* By Lawrence L. Kupper and David G. K1einbaum Department of Biostatistics University of North Carolina at Chapel Hill Institute of Statistics Mimeo Series No. 709 September 1970 I ON TESTING HYPOTHESES CONCERNING STANDARDIZED MORTALITY RATIOS AND/OR INDIRECT ADJUSTED RATES* I I I I I I II I I I I I I I .. I Lawrence L. Kupper and David G. Kleinbaum University of North Carolina at Chapel Hill 1. ~. Suppose that there are p test populations ~, ulation o ~l'~2' ••• '~p' a standard pop- and c categories, where the term "test population" denotes a collection of individuals under study. We shall regard a test population as a sample from some universe, for only then is there any point in discussing statistical estimation and hypothesis testing (see Section I of [2] for comments along this line). Let nij = the number of individuals in category j of ~ ., 1 Pij = the true probability of death in category j of d .. 1J = the observed number of deaths in category j of i=O,l, ••• ,p, j=l,2, ••• ,c. for population ~. 1 ~ ., 1 ~. 1 , Then, the standardized mortality ratio SMR(i) is defined to be *Work supported by National Institutes of Health Grant No. GM-l2868. I (1.1) I I I I I I It I I I I I I I e' I I C SMR(i} = L _ nJ..jP . . j-1 J.J (Note that SMR(O) = 1.) IL7J=1 n .. p j' i=O,l, ••• ,p. J.J 0 In this paper we shall be concerned with testing the following general hypothesis, namely (1.2) where 1<i1<i2< ••• <i~ and 2<k~. The indirect adjusted death rate P(i) for ~. J. \ is where Wij = (n. - J. j /L7J= 1 .IL7J= 1 n J.J .. p0 j)(L7J= 1 nOJ.p OJ n OJ.). ~c Since P ( i ) = a o SMR ( i ) , where a o = Lj=l nojPoj I~c Lj=l noj is independent of i, it is clear that testing the hypothesis H given by (1.2) is equivalent to o testing the hypothesis Hence, in what follows we can, without loss of generality, restrict ourselves to considering tests of H • o Finally, although we have presented the problem in a mortality-type setting; the techniques developed. below can validly be used in any situation I -3- where the response is dichotomous (e.g., was or was not a migrant, did or did not have cancer). I I I I I I 81 I I I I I I I .. I 2. Cboice.of~ .and.itseffect.on.tests.ofH. When the standard population TI o is chosen independently of the test pop- ulations TI ,TI , ••• ,TIp ' then it is reasonable to assume, for all practical l 2 purposes, that the quantities p = d ./n j are non-stochastic (c.f.,[2],[4],[5]). oj oJ 0 An example of such a situation would be·one in which the test populations are all the different counties in England in 1963 and the standard is the U. S. population in 1963 or the 1945 population of England. In this case the best estimate of SMR(i) is (2.1) where P1.'j = di./n .. is approximately normally distributed for large n .. , has J 1.J . 1.J A mean Pi' and variance p .. (l-p . . )/n .. , and is uncorrelated with p.,., unless J 1.J 1.J 1.J 1. J \,C A both i=i' and j=j'. The corresponding best estimate of P(i) is P(i) = [.. lW"P ..• J= 1.J 1.J ~ A However, the situation is quite different when the standard population is formed by pooling the test populations together so that p . has the structure oJ (2.2) In this case not only the numerator but also the denominator of SMR(i) must A be estimated using the Pij'S obtained from the test populations; in particular, the best estimate of SMR(i) is then given by I -4- (2.3) I I I I I I 81 I I I I I I I .. I A _ ~p ~p A where Poj - Li=l nijPij/Li=l n ij • - given by A, P(~) ~c' A '" '" = L' 1 wi,Pi" J= J The corresponding best estimate of P(i) is J '" where wi' is obtained by substituting p , for J . oJ Poj and n. j = Ii=l n ij for noj in the expression for wij • The use of a "pooled standard" of the form (2.2) is not without precedent. For example, researchers associated with the Evans County Coronary Heart Disease Study [1] have compared the SMR of white males having low systolic blood pressure with that of white males having high systolic blood pressure, the categories being age groups and the standard population consisting of all white males in Evans County, Georgia. Similar pooled standards were used to study the effects of other physiological factors as well. In general, the use of such a standard is dictated when the response under study (e.g., -the incidence of coronary heart disease) has not been the subject of a previous investigation conducted under comparable conditions. It is important to emphasize the distinction between the above two choices of the standard population because the different estimates. (2.1) and (2.3) necessitate the use of different procedures for testing H. o These approaches are described in the next two sections. 3. TestingHwhen~ is cbosenindependently of the test populations. If we define the vector A a' A A A = (SMR(l),SMR(2), ••• ,SMR(p», '" where SMR(i) is given by (2.1), then it follows that A '" E(a) = a and Var(~) = diag(cr~,cr~, ••• ,cr~) = D, I ~ I I I I I I -5- where a' = (SMR(1),SMR(2), ••• ,SMR(p» In this framework, the hypothesis H can equivalently be written as o H: o .e I a=0 '. (j~l)-th row the vector c' -j-l = (0, .•• ,0,1,0, ••• ,0,-1,0, ••• ,0) having +1 in the il-th position, -1 in the i,-th position, and zeros elsewhere, j=2,3, ••• ,k. From the fact that J T1 '" "'-1 = a' c' (CDC') is approximately distributed as a central X2 variable with (k-l) degrees of '" freedom when H is true, where D = o • "'2 "'2 d~ag(crl,cr2' .' "'2 . ••• 'cr ) is the matrix obtained p from D by replacing p .. by p'" .. everywhere it appears, we would reject H0 at , ~J ~J the a level of significance if T > X2 where X2 is the upper 1 - k-l;l-a' k-l;l-a 100 (I-a) % point of the central X~_l distribution. For the special case of H when k=p (so that we are testing the equality o of all p SMR's or all p indirect 'I I C where the (k-l) x p matrix C has as its aI I I I I and rates~ C is simply the (p-l) x p matrix 1 -1 . 0 o 1 o -1 o o o 1 o o 0 o o o .(3.1) -1 I ~ I I I I I I -6- When k=2, we may alternatively use as the test which is approximately distributed as a standard normal random variable when H is true. In this case we would reject H at the a level of significance 0 0 · when Izi ~ zl-a/2' where zl-a/2 is the 100(1-a/2)% point of the standard • normal distribution. A The use of T to test hypotheses which involve an SMR(i) for which Pij 1 is either 0 or 1 for every j requires special attention since the estimated A A variance cr.2 of SMR(i) is zero. ~ ~ I I I I I I I •I sta~istic choo~ing Such a situation can be handled by either the estimate of Pij according to some criterion other than maximum likelihood (e.g., (d .. +1)/(n. +2) is the Bayes estimate for a uniform prior ~j ~J on Pij) or by deciding not to test hypotheses about SMR(i). 4. Testing.H whenTI.isformedfrom the. test populations. When Poj is of the form (2.2) so that the estimate of SMR(i) is given by (2.3), the procedure for testing H depends on the relationships expressed o in the following lemmas. Lemma 4.1. When P has the form (2.2), then Ii=l a i SMR(i) = 1, where oj a i = I;=l nijPoj/li=l Proof. I~=l nijPoj· From (1.1), ~~ 1 a.~ SMR(i) = L~= ~~ 1(~: 1 n ~J .. p~J .. /~~ 1 ~: 1 n ~J .. p OJ.) L~= LJ = L~= L J = = r;=l(Ii=l nijPij)/I~=l Ii=l Lemma 4.2. When POj nijPoj' and the result then follows by using (2.2) • has the form (2.2), then SMR(1)=SMR(2)= ••• =SMR(p) if and only if there is a subset of (p-1) SMR's all equal to one. I ~ I I I I I I Ie I I I I I I I .I -7- Proof. If SMR(1)=SMR(2)= ••• =SMR(p)=~, say, then, from Lemma ·4.1, ~~~ 1 a.=l=~ t..~= since Ii=l ai=l. ~ If there is a subset of (p-l) SMR's all equal to one, then, taking this subset to be the first (p-l) SMR's, it follows from Lemma 4.1 that Ii:i a i + ap SMR(p)=l, or SMR(p) = (l-li:i ai)jap=ap!ap=l. This completes the proof. Lemma 4.3. Let Wii ' = I;=l nijni'j(Pij-Pi'j)!n.j and define Wi = Ii'=l Wii " i'=1,2, ••• ,p. Then, (i) (ii) (iii) W.. , = -W.,. and W.. , = 0 if i=i', ~~ ~~ 1 W.~ t..~= ~ ~ ~~ = 0, when p . has the form (2.2), SMR(i)=l if and OJ only if W. = 0, i=1,2, ••• ,p. ~ Proof. (i) Trivial. (iii) Using (1.1) and (2.2), we have i, I -8- ~ I I I I I I I SMR{i)-l = So, when p . is of the form (Z.Z), it .fo110ws from Lemma 4.2 and from OJ part (iii) of Lemma 4.3 that testing the hypothesis that SMR(1)=SMR{2)= ••• =SMR{p) (=1) or that P(1)=P{2)= ••• =P{p){=a ) is equivalent to testing the hypothesis o that W.=O for every i, i=1,2, ••• ,p. 1 ~ I I I I I I I .I (I:J=1 n 1J.. p1J. . /I:J=1 n1J.. pOJ.)--jL The appropriate test statistic for. this latter hypothesis is A where ~' and A I are the estimates of ~' = (W1~W2'· •• 'Wp) and L= A A P {(cov(Wi ,Wi '»)i,i'=l A obtained by putting p .. for p .. , and where C is given,by (3.1). (From part 1J 1J (ii) of Lemma 4.3, it follows that one can alternatively use for C the matrix obtained by deleting anyone row of the identity matrix of order p.) The statistic T is approximately distributed as a central X2 variate with (p-1) Z degrees of freedom when the hypothesis that W=O is true. The determination of the elements of forward. I is somewhat tedious but straight- In particular, A A A A cov{Wi,wi ,) = Li=l Li'=l cov{Wii,Wi'i') I ~ I I I I I I -9- can be evaluated directly from the following formulae (see part (i) of Lemma 4.3): A I I I I .I = 0 if either i=~, i'=~', or both, or if i, ~, i' and ~, are all different; A A "o• W " o ,.,); = cov(W cov(Wi~,Wi'~') = - "" . N~, ~ N c cov(Wi~,Wi~') = L j =l nijn~jn~'jPij(l-Pij)!n~j if i, ~ and ~, are all different; and, for i:/:~, " var(Wi~) II I I I A COV(Wi~,Wi'~') ,,2 ,For the special case p=2, the test statistic takes the formw which is approximately distributed as X~ when SMR(1)=SMR(2). " " !var(W ), 12 12 This case has been studied by Cochran [3] and Quade (unpublished). Unfortunately, ·the above procedure cannot be used to test the general hypothesis H given by (1.2) when k is to be strictly 1ess.than p. o the difficulty, consider the case where k=2 < p and =SMR(i ). 2 w~ To illustrate wish to test H : o SMR(i ) 1 Then, because p exceeds k, we cannot claim that this hypothesis is equivalent to the hypothesis that W. =W. =0, which is necessary for the test ~l statistic T to be appropriate. 2 ~2 A reasonable solution to this problem is suggested by the findings presented in the next section. 5. Computer programs have been prepared which calculate SMR's and indirect rates for p 2 10 populations and c 2 20 categories. One of these programs is written in Fortran IV for the IBM 360 computer at the Triangle University I '- I I I I I I I Ie I I I I I I I •e I -10- Computation Center (TUCC) and the other in Time-Sharing Fortran for the Ca11-A-Computer system. These programs allow the user to specify the standard population or to instruct the computer to form the standard by pooling the test populations. The user of either program may test hypotheses of the form (1.2) as described in Sections 3 and 4. These programs were used to compare the two test procedures presented in this paper in the situation where the the test populations. stand~rd population is formed by pooling The results obtained should be of considerable interest to those researchers who have incorrectly used the test procedure described in Section 3 when they should have employed that of Section 4, for it is conceivable that inferences drawn from using the "incorrect" test statistic could be reversed when the "correct" one is used. 'The data used in this computer study was obtained from -the Evans County Coronary Heart Disease Study mentioned in [1]. sets, each pa~titioned There were 38 different data into the same c=7 age groups; 25 of the sets were based on two populations, -7 on three and 6 on four. The hypothesis (1.2) for k=p was tested and the values of the test statistics T and T ,for each data set 1 2 are presented in Table I. None of the data sets gave. estimated SMR's of zero (see the last paragraph of Section 3). It can be seen from Table I that, despite the rationale for preferring the test procedure of Section 4, both test statistics give almost exactly the same numerical values in each case. This leads us to feel that a test of the hypothesis (1.2) when the standard is pooled from the test populations and the "incorrect" test procedure of Section 3 is used will yield fairly reliable results not only for the case k=p but also when k < p • I '- I I I I I I I -11- 6. ~. This paper describes· tests of hypot?eses of the form H : o =••• =SMR(i ) or k and 2 I I I I I I I .e I k ~ population. ~ P(i )=P(i )= ••• =P(i ), where 1 1 2 k I p, when there are p test popu1ation~ i 1 < i 2 < ••• < ~ ~ P and c catagories for each The distinction is made between a standard population which is chosen independently of the test populations and one which is formed by pooling the test populations. two situations. Different test procedures are appropriate for these When the pooled standard is used, the appropriate test.pro- cedure is applicable only when k=p. • Experimental evidence is givell showing that when the pooled standard is used both the "correct" and "incorrect" test procedures lead to the same conclusion concerning H o k=p. Ie ~ H~: SMR(i )=SMR(i ) 1 2 (or H') for the case 0 The recommendation is made, therefore, to use the "incorrect" test pro- cedure when the standard is formed by pooling the test populations even whe. k < p. • I -12- lI I I I I I II I I I I I I I ,ilI TABLE I VALUES OF T AND T OBTAINED BY TESTING Ho 1 2 FOR k=p WHEN THE STANDARD IS FORMED BY POOLING THE TEST POPULATIONS NUMBER OF POPULATIONS P 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 "INCORRECT" T1 28.2136 16.9346 21.2930 10.2976 10.4231 1. 7373 3.1895 7.6659 0.1513 0.0475 1.2320 0.1386 .1.4684 1.9818 0.1872 1.3005 1.4704 0.2089 0.4434 0.5522 2.2685 0.0042 0.0361 2.6843 0.0016 3.5495 8.1122 10.4558 2.5145 10.6485 17.0558 21.6299 23.9671 20.3859 16.6307 7.1490 9.7703 3.6639 "CORRECT" T 2 28.7157 17.4983 21.6004 10.1884 10.8275 1.7946 3.2292 7.3038 0.1533 0.0476 .1.2488 0.1450 1. 4437 1.9965 0.1947 1.2828 1.4462 0.2131 0.4296 0.5302 2.1455 0.0043 0.0466 2.7078 0.0016 3.5685 7.816.2 10.8670· 2.4496 10.5651 17.6127 21. 9669 24.6090 20.9092 16.4557 7.0270 9.6688 3.6656 I -13- ~ I I I I I I I ~ I I I I I I I ~ I REFERENCES [1] CASSEL, J. C., HEYDEN, S., TYROLER,'H. A., CORNONI J., BARTEL, A. and HAMES, C. G. (1970). Differences in probability of death related to physiological parameters in a biracial total community study (Evans County, Georgia). To be published. [2] CHIANG, C. L. (1961). Standard error of the age-adjusted death rate. U. S. Department of Health, Education and Welfare, Vital Statistics Special Reports, i[, 271-285. ' [3] COCHRAN, W. G. (1954). Some methods of strengthening the common X2 tests. Biometrics, 10, 417-451. [4] HILL, A. B. (1956). University Press. [5] KEYFITZ, N. (1966). Sampling variance of standardized mortality rates. Human Biology, 38, 309-317. Principles of Medical Statistics. New York, Oxford
© Copyright 2025 Paperzz