DETECTION OF CONFOUNDING AND TESTING FOR CROSS-OVER
EFFECT IN EPIDEMIOLOGICAL STUDIES
by
Regina C. Elandt-Johnson
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1459
April 1984
Regina C. Elandt-Johnson *)
Department of Biostatistics, University of
North Carolina, Chapel Hill, N.C. 27514
ABSTRACT
This is a semi-expository paper on the theoretical bases, applicability,
and applications of $\chi^2$-ANOVA-like analysis of incidence rates or prevalence
proportions, where the data are presented in the form of a sequence of 2 x 2
tables corresponding to levels (strata) of a specified variable (risk factor) X.
The distributional assumptions for incidence rates (prospective studies) and
prevalence proportions (retrospective studies) are reviewed, with the emphasis
on conditional inference in each situation.
A novel feature of this paper is the finding that the so-called
$\chi^2_{\text{homog}}$ (denoted here by $X^2_{\text{Diff}}$) is not a test
of "homogeneity" in the sense that rate ratios are constant over all strata,
but a test for crossing-over, at least once, of the rate functions in two
defined populations.
Several examples are given to illustrate the techniques of constructing the
tests, as well as to emphasize the applicability of stratified
$\chi^2$-analysis to various problems in epidemiological studies.
Key Words:
Incidence rate; Poisson distribution; Binomial distribution;
Confounding factor; Multiplicative model; Interaction; Effect modifying factor;
Cross-over effect; Homogeneity.
*)This work was supported by U.S. National Heart, Lung, and Blood
Institute contract NIH-NHLI-7l2243 from the National Institutes of Health.
1. INTRODUCTION

1.1. Consider an epidemiological or medical comparative study, in
which the effect of a specified factor (X) on the incidence or prevalence of
a certain event ($\mathcal{D}$) is investigated in two populations, $P_1$ and $P_2$.
Thus, formally, three random variables, P, D, and X, can be observed on each
individual.
Population variable P takes values 1 or 0 and determines two study
groups, $P_1$ and $P_2$, respectively.
They might represent two demographic
populations, individuals exposed and non-exposed to occupational hazard,
two groups of people with different life styles, diet habits, treatment and
control groups in medical investigation, etc.
Response variable D also takes values 1 or 0, which refer to occurrence
or non-occurrence of event ($\mathcal{D}$).
In epidemiological studies, event $\mathcal{D}$ may
represent a certain disease or health disorder; in mortality studies, it
represents death, or death from a specific cause.
Concomitant variable X is a characteristic which may be, in some way,
associated with P or D or both, and therefore is a reasonable subject for
investigation.
It can be a discrete variable (e.g. smoking or non-smoking),
or a continuous variable such as age, blood pressure, level of serum
cholesterol, etc.
By association of random variables we understand their stochastic
(statistical) dependence.
We now examine possible associations among P,
D, and X.
(i) If X and D are associated, then X may be considered as a
predictive or prognostic factor for event $\mathcal{D}$.
In particular, if $\mathcal{D}$ is a harmful event and an increase in X
increases (decreases) the probability of occurrence of $\mathcal{D}$,
then X is a risk (beneficial) factor.
It is, however, customary in
epidemiological literature to use the term risk factor in a broader sense
to mean: "X is associated with D."

(ii) Association between X and P implies that the distributions of X
in $P_1$ and $P_2$ are different.
Care should be taken to recognize how this may
happen; X might be differently distributed in $P_1$ and $P_2$ because genetic and
environmental factors affecting X are differently distributed in $P_1$ and $P_2$,
or the association between X and P may arise from the design of the experiment
in selecting $P_1$ and $P_2$.

(iii) Of special interest in epidemiological and medical experiments
is the association between X and both D and P.
In this case, X is a confounding factor.
Or, in other words, if the response (D) depends on factor
X, and the distributions of X in $P_1$ and $P_2$ are different, then X is a
confounding factor.
1.2.
Comparison of incidence rates or prevalence proportions in two
populations almost always encounters the problem of adjusting for confounding.
Often data are stratified by a confounding factor and are represented in
the form of a series of 2 x 2 contingency tables.
First, one may be interested in a sort of 'global test' for comparison
of X-adjusted rates (Armitage (1966), Gart (1978)) or X-adjusted proportions
(Cochran (1954), Mantel and Haenszel (1959)).
In Section 2, some of the
issues in constructing such tests will be reviewed, emphasizing various
assumptions and interpretations of the results.
1.3. Our interest, however, might also be in the analysis of behavior
(patterns) of rates in two populations over individual strata.
One way
of approaching this problem would be by fitting multiplicative (log-linear)
models.
If the data fit a multiplicative model, then there is no interaction
between X and P.
(See Section 3.2 (iii).)
In Section 4, we present a special
approach, a $\chi^2$-ANOVA-like method, to detect whether cross-over of the rate
functions in two populations occurs.
2. COMPARISON OF TWO SURVIVAL FUNCTIONS:
SOME TESTS BASED ON POISSON MODELS

2.1. For convenience, and without loss of generality, we assume
that the event ($\mathcal{D}$) of interest is death, and the factor X is age, which is
clearly associated with mortality.
We consider mortality experiences of two
study populations, $P_1$ and $P_2$, with the
data grouped in fixed age intervals.
In age determined cohort studies, grouping might be according to follow-up
time rather than age.
For convenience we will occasionally refer to the
ith grouping unit as the ith stratum.
For the gth experience (briefly, sample from the gth population,
g = 1,2) and ith stratum, let $A_{gi}$ denote the amount of person-years (or
person-time units) exposed to risk, $d_{gi}$ the observed number of deaths, and
$\hat{\lambda}_{gi} = d_{gi}/A_{gi}$ the observed death rate.
For the ith stratum we display the
data in the following form
$$\begin{array}{lccc}
 & \text{Person-years} & \text{Deaths} & \text{Death rate} \\
\text{Sample 1} & A_{1i} & d_{1i} & \hat{\lambda}_{1i} \\
\text{Sample 2} & A_{2i} & d_{2i} & \hat{\lambda}_{2i}
\end{array} \tag{2.1}$$
Let the random variable $D_{gi}$ denote the number of deaths in the gth
sample and the ith stratum.
If the $D_{gi}$'s are small and the $A_{gi}$'s are
large, then (conditional on the $A_{gi}$'s), the $D_{gi}$'s can be treated as
independent Poisson variables with means
$$\mu_{gi} = A_{gi}\lambda_{gi} \qquad (g = 1,2;\ i = 1,2,\ldots,I), \tag{2.2}$$
where the $\lambda_{gi}$'s are the true death rates.
g~
Z.Z.
Conditional test in the ith stratum.
(itn) stratum as shown in (Z.l).
Consider first the single
Assume that D and D are independent
li
Zi
The sum D.
Poisson variates with means given in (Z.Z).
has a Poisson distribution, with mean
~.i
= ~li
+
~Zi.
i
= Dli + DZi also
Conditional on
D14 + DZ " = d ., the distribution of D . is binomiaZ with parameters d .
l
~
.L
and TI.~
=
.~
~l'/~
~
.; that is, Dg
.~b(d .,TI.).
~. ~
~
~. ~
.~
This is a well-known result, first
obtained--to the best of my knowledge--by Przyborowski and Wilenski (1939).
-e
Such distributions will be used in constructing tests for various hypotheses
concerning comparison of two rate functions.
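
As a brief reminder of why conditioning on the stratum total yields a binomial variate, the standard argument can be sketched as follows. The shorthand $\mu_1$, $\mu_2$, $d$, $\pi$ (stratum index dropped) is introduced here only for readability and is not notation from the paper.

```latex
% D_1 ~ Poisson(mu_1), D_2 ~ Poisson(mu_2), independent; pi = mu_1/(mu_1+mu_2).
\begin{align*}
\Pr\{D_1 = r \mid D_1 + D_2 = d\}
  &= \frac{\dfrac{e^{-\mu_1}\mu_1^{\,r}}{r!}\cdot
           \dfrac{e^{-\mu_2}\mu_2^{\,d-r}}{(d-r)!}}
          {\dfrac{e^{-(\mu_1+\mu_2)}(\mu_1+\mu_2)^{d}}{d!}} \\
  &= \binom{d}{r}
     \left(\frac{\mu_1}{\mu_1+\mu_2}\right)^{r}
     \left(\frac{\mu_2}{\mu_1+\mu_2}\right)^{d-r}
   = \binom{d}{r}\,\pi^{r}(1-\pi)^{d-r},
  \qquad r = 0,1,\ldots,d .
\end{align*}
```

The exponential factors cancel, so the conditional distribution depends on the two Poisson means only through their ratio.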
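Consider first the single (ith) stratum alone and a null hypothesis
$$H_{i0}:\ \lambda_{1i} = \lambda_{2i} \tag{2.3}$$
against one-sided alternative
$$H_{iA}:\ \lambda_{1i} > \lambda_{2i}. \tag{2.4}$$
Let
$\lambda_{i0}$ denote the (hypothetical) death rate when $H_{i0}$ is valid.
Then under $H_{i0}$,
$\mu_{gi0} = A_{gi}\lambda_{i0}$ (g = 1,2), and
$$\pi_{i0} = \frac{A_{1i}\lambda_{i0}}{A_{1i}\lambda_{i0} + A_{2i}\lambda_{i0}} = \frac{A_{1i}}{A_{\cdot i}}. \tag{2.5}$$
Note that
$\lambda_{i0}$ (which is a nuisance parameter) is here not relevant;
$\pi_{i0}$ is expressed entirely in terms of amounts of person-years
exposed to risk.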
The null hypothesis (2.3) is then equivalent to
$$H_{i0}:\ \pi_i = \pi_{i0} \tag{2.6}$$
against the alternative
$$H_{iA}:\ \pi_i > \pi_{i0}. \tag{2.7}$$
Three tests can be constructed for testing $H_{i0}$.

(a) Exact test based on the binomial distribution $b(d_{\cdot i}, \pi_{i0})$.
For a given significance level $\alpha$, let $k_i = k_i(\alpha)$ denote the
smallest $k_i$ such that
$$\Pr\{D_{1i} \ge k_i \mid d_{\cdot i}, \pi_{i0}\} = \sum_{r=k_i}^{d_{\cdot i}} \binom{d_{\cdot i}}{r} \pi_{i0}^{r} (1-\pi_{i0})^{d_{\cdot i}-r} \le \alpha. \tag{2.8}$$
The null hypothesis, $H_{i0}$, is rejected if $d_{1i} \ge k_i$.

(b) Approximate z-test.
For sufficiently large $d_{\cdot i}$, and $\pi_{i0}$ not too
small, a normal approximation to the binomial can be used.
The expected value and variance of $D_{1i}$,
conditional on $d_{\cdot i}$ and under $H_{i0}$, are
$$E(D_{1i} \mid d_{\cdot i}, \pi_{i0}) = d_{\cdot i}\pi_{i0} = E_{1i} \tag{2.9}$$
and
$$\mathrm{Var}(D_{1i} \mid d_{\cdot i}, \pi_{i0}) = d_{\cdot i}\pi_{i0}(1-\pi_{i0}) = V_i, \tag{2.10}$$
respectively.
The statistic
$$Z_i = (D_{1i} - E_{1i})/\sqrt{V_i} \tag{2.11}$$
is approximately distributed as unit normal
when $H_{i0}$ is true.
(c) Approximate $\chi^2$-test. Equivalently, the statistic
$$X_i^2 = (D_{1i} - E_{1i})^2 / V_i \tag{2.12}$$
is approximately distributed as $\chi^2$ with 1 d.f.
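
The three single-stratum tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and argument names are hypothetical, the exact test is expressed through the upper-tail probability appearing in (2.8) rather than through the critical value $k_i$, and the stratum is assumed to contain at least one death with both exposures positive.

```python
from math import comb, sqrt


def single_stratum_tests(d1, d2, A1, A2):
    """One-sided conditional tests of H_i0 in a single stratum.

    d1, d2: observed deaths in samples 1 and 2 (hypothetical argument names)
    A1, A2: person-years exposed to risk in samples 1 and 2
    Assumes d1 + d2 > 0 and A1, A2 > 0.
    """
    d = d1 + d2                    # stratum total of deaths
    pi0 = A1 / (A1 + A2)           # null binomial proportion, cf. (2.5)
    E1 = d * pi0                   # conditional mean, cf. (2.9)
    V = d * pi0 * (1.0 - pi0)      # conditional variance, cf. (2.10)
    # Exact upper-tail probability Pr{D1 >= d1 | d, pi0}, cf. (2.8)
    p_exact = sum(comb(d, r) * pi0**r * (1.0 - pi0)**(d - r)
                  for r in range(d1, d + 1))
    z = (d1 - E1) / sqrt(V)        # approximate z statistic, cf. (2.11)
    return {"pi0": pi0, "E1": E1, "V": V,
            "p_exact": p_exact, "z": z, "x2": z * z}  # x2: cf. (2.12)
```

Rejecting when `p_exact` does not exceed the chosen significance level is the same decision as rejecting when the observed count reaches the critical value of (2.8), because the upper-tail probability decreases as the cut-off increases.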
2.3. Stratified analysis.
We consider two experiences, $P_1$ and $P_2$,
in which mortality data are grouped in I age intervals (strata).
We restrict
ourselves here to the situation in which the age specific rates in $P_1$ are not
smaller than the corresponding rates in $P_2$; that is, we assume a model
$$M:\ \lambda_{1i}/\lambda_{2i} = \gamma_i \ge 1 \quad \text{for all } i. \tag{2.13}$$
Under this assumption, we wish to test the hypothesis
$$H_0:\ \gamma_i = 1 \quad \text{for all } i, \tag{2.14}$$
against the alternative
$$H_A:\ \gamma_i > 1 \quad \text{for at least some } i. \tag{2.15}$$
Assuming that (conditionally on the $A_{gi}$'s) the $D_{gi}$'s are independent
Poisson variates, we construct a test based on the statistic
$D_{1\cdot} = \sum_{i=1}^{I} D_{1i}$.
In fact, we propose two versions of this test.
2.3.1. Version 1 (Statistic $X^2_{\text{Comb}}$).
We recall that, conditionally
on $d_{\cdot i}$, the variate $D_{1i}$ is binomial with mean
$E_{1i} = d_{\cdot i}\pi_{i0}$ and variance
$V_i = d_{\cdot i}\pi_{i0}(1-\pi_{i0})$, where $\pi_{i0} = A_{1i}/A_{\cdot i}$.
Thus, conditionally on the sets
$\{d_{\cdot i}\}$ and $\{\pi_{i0}\}$, the random variable $D_{1\cdot}$ is
approximately normal with mean
$$E_{1\cdot} = \sum_{i=1}^{I} E_{1i} = \sum_{i=1}^{I} d_{\cdot i}\pi_{i0} \tag{2.16}$$
and variance
$$V = \sum_{i=1}^{I} V_i = \sum_{i=1}^{I} d_{\cdot i}\pi_{i0}(1-\pi_{i0}). \tag{2.17}$$
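The statistic
$$Z_{\text{Comb}} = (D_{1\cdot} - E_{1\cdot})/\sqrt{V} \tag{2.18}$$
("Comb" for "combined") is approximately distributed as unit normal, and
the statistic
$$X^2_{\text{Comb}} = (D_{1\cdot} - E_{1\cdot})^2 / V \tag{2.19}$$
is approximately distributed as $\chi^2$ with 1 d.f., when $H_0$ is valid.
Either
of these can be used in testing $H_0$.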
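
A minimal sketch of the Version 1 computation, assuming the data arrive as four equal-length sequences (deaths and person-years per stratum for each sample). The function and argument names are hypothetical and chosen only for this illustration.

```python
from math import sqrt


def comb_version1(d1, d2, A1, A2):
    """Combined statistics conditional on the stratum totals of deaths.

    d1, d2: per-stratum deaths in samples 1 and 2 (hypothetical names)
    A1, A2: per-stratum person-years in samples 1 and 2
    Returns (E1, V, Z_Comb, X2_Comb), cf. (2.16)-(2.19).
    """
    E1 = 0.0
    V = 0.0
    for d1i, d2i, A1i, A2i in zip(d1, d2, A1, A2):
        d_i = d1i + d2i                  # stratum total of deaths
        pi_i0 = A1i / (A1i + A2i)        # null proportion in stratum i
        E1 += d_i * pi_i0                # accumulates (2.16)
        V += d_i * pi_i0 * (1.0 - pi_i0) # accumulates (2.17)
    z = (sum(d1) - E1) / sqrt(V)         # (2.18)
    return E1, V, z, z * z               # last entry is (2.19)
```

Applied to the six strata of Table 1, this sketch gives values close to the tabulated $E_{1\cdot} = 21.58$ and $V = 13.4099$; small differences in the last digit can arise because the table entries are rounded.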
2.3.2. Version 2 (Statistic $X^{*2}_{\text{Comb}}$).
We propose a test based, again,
on the statistic $D_{1\cdot}$, but conditionally on the total number of
deaths, $d_{\cdot\cdot} = d = \sum_g \sum_i d_{gi}$.
As in Section 2.2, let $\lambda_{i0}$ (i = 1,2,...,I) denote the true death
rates, when $H_0$ is valid.
Since the $D_{gi}$'s are independent Poisson variates,
so are $D_{1\cdot}$ and $D_{2\cdot}$.
Under $H_0$ (see (2.14)), we have
$$\mu_{g\cdot 0} = \sum_{i=1}^{I} A_{gi}\lambda_{i0} \qquad (g = 1,2), \tag{2.20}$$
and
$$\mu_{\cdot\cdot 0} = \mu_{1\cdot 0} + \mu_{2\cdot 0} = \sum_{i=1}^{I} A_{\cdot i}\lambda_{i0}. \tag{2.21}$$
Thus, conditional on $D_{1\cdot} + D_{2\cdot} = d$, the random variable
$D_{1\cdot}$ has a
binomial distribution with parameters d and $\pi_0$, where
$$\pi_0 = \Big(\sum_{i=1}^{I} A_{1i}\lambda_{i0}\Big) \Big/ \Big(\sum_{i=1}^{I} A_{\cdot i}\lambda_{i0}\Big). \tag{2.22}$$
Now, $\pi_0$
does depend on the nuisance parameters $\lambda_{i0}$'s so that, in
principle, the test can be affected by the choice of the $\lambda_{i0}$'s.
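If there
is no rationale for using pre-specified $\lambda_{i0}$'s, a common and convenient
procedure is to use the maximum likelihood estimates for the $\lambda_{i0}$'s.
In our
case, we have
$$\hat{\lambda}_{i0} = d_{\cdot i}/A_{\cdot i}, \tag{2.23}$$
so that
$$\hat{\mu}_{1i0} = A_{1i}\hat{\lambda}_{i0} = d_{\cdot i}\pi_{i0} = E_{1i}, \tag{2.24a}$$
and similarly,
$$\hat{\mu}_{2i0} = A_{2i}\hat{\lambda}_{i0} = d_{\cdot i}(1-\pi_{i0}) = E_{2i}. \tag{2.24b}$$
Hence, under $H_0$,
$$\hat{\mu}_{1\cdot 0} = E_{1\cdot} = \sum_{i=1}^{I} d_{\cdot i}\pi_{i0} \tag{2.25a}$$
(see also (2.16)), and
$$\hat{\mu}_{2\cdot 0} = E_{2\cdot} = \sum_{i=1}^{I} d_{\cdot i}(1-\pi_{i0}), \tag{2.25b}$$
so that
$$\hat{\mu}_{\cdot\cdot 0} = E_{1\cdot} + E_{2\cdot} = d. \tag{2.26}$$
Substituting into (2.22), we obtain
$$\hat{\pi}_0 = \frac{1}{d}\sum_{i=1}^{I} d_{\cdot i}\pi_{i0} = E_{1\cdot}/d. \tag{2.27}$$
Therefore, under $H_0$ and conditionally on $D_{1\cdot} + D_{2\cdot} = d$, the
distribution
of $D_{1\cdot}$ is approximately binomial, $b(d, \hat{\pi}_0)$.
Using this distribution, an
'exact' critical region can be constructed in a similar manner as for the
ith stratum (cf. (2.8)).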
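We notice that
$$E(D_{1\cdot} \mid d, \hat{\pi}_0) = d\hat{\pi}_0 = E_{1\cdot} \tag{2.28}$$
and
$$\mathrm{Var}(D_{1\cdot} \mid d, \hat{\pi}_0) = d\hat{\pi}_0(1-\hat{\pi}_0) = E_{1\cdot}E_{2\cdot}/d = V^*. \tag{2.29}$$
If $d\hat{\pi}_0$
is sufficiently large, normal approximation can be used.
The statistic
$$Z^*_{\text{Comb}} = (D_{1\cdot} - E_{1\cdot})/\sqrt{V^*}, \tag{2.30}$$
or equivalently,
$$X^{*2}_{\text{Comb}} = (D_{1\cdot} - E_{1\cdot})^2/V^*, \tag{2.31}$$
can be used in testing $H_0$.
We also notice that
$$X^{*2}_{\text{Comb}} = \frac{(D_{1\cdot} - E_{1\cdot})^2}{E_{1\cdot}} + \frac{(D_{2\cdot} - E_{2\cdot})^2}{E_{2\cdot}}. \tag{2.32}$$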
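
The Version 2 statistic can be sketched in the same style. The code below assumes the internal (maximum likelihood) choice of the nuisance rates, so that the expected counts are $E_{1\cdot}$ and $E_{2\cdot} = d - E_{1\cdot}$ and the variance is $E_{1\cdot}E_{2\cdot}/d$; the function name and the returned tuple layout are hypothetical.

```python
def comb_version2(d1, d2, A1, A2):
    """Combined statistic conditional on the grand total of deaths.

    d1, d2: per-stratum deaths; A1, A2: per-stratum person-years
    (hypothetical argument names).
    Returns (V_star, X2_star, X2_star_two_term), where the last entry
    is the same statistic written as a sum over the two samples.
    """
    D1, D2 = sum(d1), sum(d2)
    d = D1 + D2
    # E_1. = sum_i d_.i * pi_i0 with pi_i0 = A_1i / A_.i
    E1 = sum((a + b) * A1i / (A1i + A2i)
             for a, b, A1i, A2i in zip(d1, d2, A1, A2))
    E2 = d - E1
    V_star = E1 * E2 / d                     # d * pi_hat * (1 - pi_hat)
    x2_star = (D1 - E1) ** 2 / V_star
    x2_two_term = (D1 - E1) ** 2 / E1 + (D2 - E2) ** 2 / E2
    return V_star, x2_star, x2_two_term
```

The two returned forms of the statistic agree up to floating-point rounding, since $D_{2\cdot} - E_{2\cdot} = -(D_{1\cdot} - E_{1\cdot})$ and $1/E_{1\cdot} + 1/E_{2\cdot} = d/(E_{1\cdot}E_{2\cdot})$.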
The following remarks might be useful.

(i) The form of $X^{*2}_{\text{Comb}}$ in (2.32) resembles the Peto and Peto (1972)
logrank statistic, which was derived for ungrouped data, by using nonparametric techniques.

(ii) Statistic (2.31) was also considered by Armitage (1966).
He
showed that conditional on the set $\{d_{\cdot i}\}$ (though this is not clearly spelled
out in his paper), the statistic (2.31) has to be modified (corrected) to
have an approximate $\chi^2$-distribution.
In the present context, (2.31) is
derived conditionally on the total number of deaths, d.

(iii) The techniques presented here are essentially equivalent to
indirect standardization.
If the $\lambda_{i0}$'s are pre-specified, this corresponds
to external standardization; when the $\hat{\lambda}_{i0}$'s given by (2.23) are used, this
corresponds to internal standardization.
Note that our hypothesis $H_0$ is
concerned with comparison of two (non-crossing over) rate functions, and
not with the estimation of population effects, so it is legitimate to use
standardization.

(iv) It can be shown (the algebra is a bit tedious) that
$$V \le V^*, \tag{2.33}$$
so that tests constructed in "Version 1" reject $H_0$ more often than those
in "Version 2."
If the numbers of events (the $d_{\cdot i}$'s) are small, the statistics
in the two versions differ but little.
If this is not the case, the discrepancies
might be too big to be neglected; in such cases Version 1 is the
more appropriate (Haybittle and Freedman (1979)).

(v) Clearly, similar analyses can be used with the person-years
exposed to risk, $A_{gi}$, replaced by the numbers of individuals, $n_{gi}$, in the
midyear population (samples).

(vi) Of course, the situation
$$\begin{array}{l}
M:\ \lambda_{1i} \le \lambda_{2i} \ \text{for all } i, \\
H_0:\ \lambda_{1i} = \lambda_{2i} \ \text{or}\ \lambda_{1i}/\lambda_{2i} = 1 \ \text{for all } i, \\
H_A:\ \lambda_{1i} < \lambda_{2i} \ \text{or}\ \lambda_{1i}/\lambda_{2i} = \gamma_i < 1 \ \text{for some } i
\end{array} \tag{2.34}$$
can be treated in the same fashion.
3. DETECTING CONFOUNDING AND INTERACTION

3.1. Confounding.
We first consider the problem whether X is a confounding
factor; that is, whether the distributions of X in $P_1$ and $P_2$ are different.

3.1(i) Analysis of the $\pi_{i0}$'s.
Suppose that
$$A_{1i}/A_{2i} = c \quad \text{for all } i, \tag{3.1}$$
where c is a constant.
It follows that
$$\pi_{i0} = A_{1i}/A_{\cdot i} = c/(1+c) = \pi_0 \tag{3.2}$$
is also constant.
If the data represent two independent random samples
from two populations, one may construct a test of the null hypothesis:
$\pi_{i0} = \pi_0$ for all i.
However, in this article inferences are made conditional
on sets $\{A_{1i}\}$, $\{A_{2i}\}$, so that the $\pi_{i0}$'s are true binomial proportions.
Of
course, it is also necessary to judge what sizes of differences among the
$\pi_{i0}$'s have a practical significance.
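3.1(ii) Pooling the strata.
Another way of detecting confounding
is to pool the strata, and calculate the test statistic as if there were a
single stratum (cf. Section 2.2).
In this case, we calculate
$$\pi'_0 = A_{1\cdot}/A_{\cdot\cdot}, \tag{3.3}$$
$$E(D_{1\cdot} \mid d, \pi'_0) = d\pi'_0 = E'_{1\cdot}, \tag{3.4}$$
and
$$\mathrm{Var}(D_{1\cdot} \mid d, \pi'_0) = d\pi'_0(1-\pi'_0) = V'. \tag{3.5}$$
The statistics
$$Z' = (D_{1\cdot} - E'_{1\cdot})/\sqrt{V'} \tag{3.6}$$
or
$$X'^2 = (D_{1\cdot} - E'_{1\cdot})^2/V' \tag{3.7}$$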
can be used to test the hypothesis $H'_0$: $\lambda_1 = \lambda_2$, that is, that the crude
(overall) rates are the same.
On the other hand, the statistics $Z_{\text{Comb}}$ ($Z^*_{\text{Comb}}$)
or equivalently $X^2_{\text{Comb}}$ ($X^{*2}_{\text{Comb}}$) can
be used in testing the hypothesis that the age adjusted
rates are equal.
Notice that, with this formulation of the null hypothesis,
the assumption that $\lambda_{1i}/\lambda_{2i} \ge 1$ for all i is not required.
If, however, we have proportionate distributions in $P_1$ and $P_2$ as
defined in (3.1), then
$$\pi'_0 = \pi_{i0} = c/(1+c), \tag{3.8}$$
so that
$$E'_{1\cdot} = E_{1\cdot} \quad \text{and} \quad V' = V = V^*. \tag{3.9}$$
Clearly, if (3.8) holds, then the observed statistics are identical,
that is,
$$Z' = Z_{\text{Comb}} = Z^*_{\text{Comb}} \tag{3.10}$$
and
$$X'^2 = X^2_{\text{Comb}} = X^{*2}_{\text{Comb}}. \tag{3.11}$$
Thus, a sufficiently large discrepancy between $X'^2$ and $X^2_{\text{Comb}}$
(or $X^{*2}_{\text{Comb}}$) indicates that confounding exists.
No formal test involving size of this
discrepancy is here suggested; the bigger the discrepancy the more support
for using stratification (see Example 1).
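
The pooled statistic is simple enough to sketch directly; comparing its value with the stratified statistic is the informal confounding check described above. The function name is hypothetical, and the inputs are assumed to be per-stratum sequences of deaths and person-years.

```python
def pooled_chi2(d1, d2, A1, A2):
    """Pooled ('crude') statistic treating all strata as a single stratum.

    d1, d2: per-stratum deaths; A1, A2: per-stratum person-years
    (hypothetical argument names). Returns (pi0_pooled, E1_pooled, V_pooled, X2_pooled).
    """
    D1 = sum(d1)
    d = D1 + sum(d2)                      # grand total of deaths
    pi0 = sum(A1) / (sum(A1) + sum(A2))   # pooled null proportion
    E1 = d * pi0                          # pooled expected deaths in sample 1
    V = d * pi0 * (1.0 - pi0)             # pooled binomial variance
    return pi0, E1, V, (D1 - E1) ** 2 / V
```

For the Table 1 data this sketch returns a pooled value near 0.60, far from the stratified 6.62, which is the kind of discrepancy the text takes as support for stratifying by age.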
3.2. Interaction and effect modification.
In our model (M), we only
assumed that $\lambda_{1i}/\lambda_{2i} = \gamma_i \ge 1$ for all i.
If, however, additionally,
$\lambda_{1i}/\lambda_{2i} = \gamma \ge 1$ for all i, where $\gamma$ is a constant, then we conclude that
there is no interaction between X (age) and D (mortality) (see Section 3.2(iii)
and also Elandt-Johnson (1984)).
If the $\gamma_i$'s are not the same (even if the model
$\lambda_{1i}/\lambda_{2i} = \gamma_i \ge 1$
for all i holds), then there is an interaction and X is considered as an
effect modifying factor for response D.
How do we test the hypothesis
$H'_0$: $\lambda_{1i}/\lambda_{2i} = \gamma$ for all i, against the alternative
$H'_A$: $\lambda_{1i}/\lambda_{2i} = \gamma_i \ne \gamma$ for
at least some i?

(i) A heuristic approach would be to examine the observed ratios
$\hat{\lambda}_{1i}/\hat{\lambda}_{2i} = \hat{\gamma}_i$,
to obtain some idea from the data whether the hypothesis that
X is not a modifying factor for D might not be rejected.

(ii) Another way of looking at this problem would be to analyze the
individual $X_i^2$'s.
Of course, their values depend also on the total number
of deaths, $d_{\cdot i}$, in each stratum.
A better idea can be obtained by comparing
the "phi coefficients"
(3.12)
where $0 \le \phi_i \le 1$ is a kind of correlation coefficient for the ith 2 x 2
contingency table (see, for example, Fleiss (1981), Section 5.2).
(iii) A formal method would be to fit a Poisson multiplicative model
$$\lambda_{gi} = \theta_g \varepsilon_i, \qquad g = 1,2;\ i = 1,2,\ldots,I \tag{3.13}$$
(with $\theta_1 + \theta_2 = 1$), where $\theta_g$
is the effect of the gth population and $\varepsilon_i$ is
the effect of the ith stratum.
If model (3.13) is valid, then there is
no interaction between P and X (Elandt-Johnson (1984)). Model (3.13) can be
equivalently represented in the form
$$\lambda_{gi} = \exp(\alpha_g + \beta_i). \tag{3.14}$$
Several authors (Bishop et al. (1975), Breslow and Day (1975),
Gail (1978), Gart (1971, 1978), Osborn (1975), among others) have considered
such models, usually for more than two populations.
An elegant (theoretical
and practical) treatment of Poisson multiplicative models and of inference
based on these models for G ($\ge 2$) populations is given by Anderson (1977).
If the data fit the model (3.13), then we have
$$\lambda_{1i}/\lambda_{2i} = \theta_1/\theta_2 \quad \text{for all } i. \tag{3.15}$$
Also, under the assumption that the data fit a multiplicative model, our
hypothesis $H_0$ in (2.14) is equivalent to $H_0$: $\theta_1 = \theta_2 = \tfrac{1}{2}$.

4. CROSSING-OVER OF RATE FUNCTIONS.
$\chi^2$-ANOVA-LIKE ANALYSIS

4.1. Incidence data.
So far, we have assumed that the model M: $\lambda_{1i} \ge \lambda_{2i}$
for all i holds.
Suppose, however, that this assumption cannot be made, nor
do the observed $\hat{\lambda}_{gi}$'s support this assumption, that is, we observe
$\hat{\lambda}_{1i} > \hat{\lambda}_{2i}$
for some i, and $\hat{\lambda}_{1i'} < \hat{\lambda}_{2i'}$ for some $i' \ne i$.
The analysis may be the following.
We consider again mortality data
in two populations, grouped in fixed age intervals.
We first recall a well-known algebraic relationship.
Let $y_i$ be an
observed value of a variable y, and $w_i$ be a 'weight' attached to $y_i$.
Then
$$\sum_{i=1}^{I} w_i y_i^2 - \Big(\sum_{i=1}^{I} w_i y_i\Big)^2 \Big/ \Big(\sum_{i=1}^{I} w_i\Big) = \sum_{i=1}^{I} w_i (y_i - \bar{y})^2, \tag{4.1}$$
where $\bar{y} = \big(\sum_{i=1}^{I} w_i y_i\big) / \big(\sum_{i=1}^{I} w_i\big)$.
For our data, let
$$y_i = (d_{1i} - E_{1i})/V_i \tag{4.2}$$
be a 'score,' and
$$w_i = V_i \tag{4.3}$$
be the 'weight,' so that we have
$$\sum_{i=1}^{I} w_i y_i^2 = \sum_{i=1}^{I} (d_{1i} - E_{1i})^2 / V_i = \sum_{i=1}^{I} X_i^2 = X^2_{\text{Total}}. \tag{4.4}$$
Note that conditional on the set $\{d_{\cdot i}\}$, the statistic $X^2_{\text{Total}}$ is
approximately distributed as $\chi^2$ with I d.f. The $X^2_{\text{Total}}$ defined in (4.4)
reflects, to some extent, the variation from stratum to stratum, irrespective
of the signs (+ or -) of individual scores.
Also the statistic
$$\Big(\sum_{i=1}^{I} w_i y_i\Big)^2 \Big/ \Big(\sum_{i=1}^{I} w_i\Big) = (d_{1\cdot} - E_{1\cdot})^2 / V = X^2_{\text{Comb}} \tag{4.5}$$
(cf. (2.19)), conditional on the set $\{d_{\cdot i}\}$, is approximately distributed
as $\chi^2$ with 1 d.f. This is a test statistic for overall effect of population
$P_1$ relative to $P_2$ (without assuming that the model M is correct).
The difference
$$X^2_{\text{Diff}} = X^2_{\text{Total}} - X^2_{\text{Comb}} \tag{4.6}$$
is approximately distributed as
$\chi^2$ with (I-1) d.f. and can be used in
testing the hypothesis
$$H_0^{(S)}:\ (\lambda_{1i}/\lambda_{2i} - 1) \ \text{does not change sign as } i \text{ varies}, \tag{4.7}$$
against the alternative
$$H_A^{(S)}:\ \lambda_{1i} > \lambda_{2i} \ \text{for some } i, \quad \lambda_{1i'} < \lambda_{2i'} \ \text{for some } i'. \tag{4.8}$$
In other words, $X^2_{\text{Diff}}$ is a test statistic for detecting whether the
rate functions cross over at least once over the strata; it provides a
test for a special kind of heterogeneity.
(See discussion in Section 6.)
Also notice that $X^2_{\text{Comb}}$ and $X^2_{\text{Diff}}$ are approximately independent.
If the rates are small, we often have
$$X^{*2}_{\text{Comb}} \doteq X^2_{\text{Comb}}, \tag{4.8}$$
where $X^{*2}_{\text{Comb}}$ is the logrank test statistic for grouped data, defined in
(2.31).
Although algebraically
$$X^{*2}_{\text{Diff}} = X^2_{\text{Total}} - X^{*2}_{\text{Comb}} \tag{4.9}$$
is correct, we notice that $X^2_{\text{Total}}$ is a relevant test statistic if inference
is conditional on the set $\{d_{\cdot i}\}$, while $X^{*2}_{\text{Comb}}$ is relevant for inference
conditional on $d = \sum_i d_{\cdot i}$.
The three statistics $X^2_{\text{Total}}$, $X^2_{\text{Comb}}$ and $X^2_{\text{Diff}}$
should be considered and
interpreted jointly, as will be shown in Examples (Section 5); some questions
of joint behavior of these statistics will be discussed in more detail in
Section 6.
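
The decomposition of the total statistic into a combined part and a difference part can be sketched as follows. The function name is hypothetical, and skipping strata with no deaths is an assumption made here (such strata carry no information and would otherwise cause a division by zero), not a rule stated in the paper.

```python
def chi2_decomposition(d1, d2, A1, A2):
    """Total, combined, and difference statistics over strata.

    d1, d2: per-stratum deaths; A1, A2: per-stratum person-years
    (hypothetical argument names).
    Returns (X2_Total, X2_Comb, X2_Diff, n_strata_used).
    """
    x2_total = 0.0
    dev_sum = 0.0      # sum of (d_1i - E_1i), i.e. sum of w_i * y_i
    V = 0.0            # sum of V_i, i.e. sum of the weights w_i
    used = 0
    for d1i, d2i, A1i, A2i in zip(d1, d2, A1, A2):
        d_i = d1i + d2i
        if d_i == 0:
            continue   # assumption: empty strata are skipped
        pi_i0 = A1i / (A1i + A2i)
        E1i = d_i * pi_i0
        Vi = d_i * pi_i0 * (1.0 - pi_i0)
        x2_total += (d1i - E1i) ** 2 / Vi   # adds the stratum statistic
        dev_sum += d1i - E1i
        V += Vi
        used += 1
    x2_comb = dev_sum ** 2 / V
    return x2_total, x2_comb, x2_total - x2_comb, used
```

The degrees of freedom to attach to the three values would then be `used`, 1, and `used - 1`, respectively; the difference is non-negative by the algebraic identity recalled at the start of this section.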
4.2. Prevalence data.
The tests we have discussed in Sections 2, 3,
and 4.1 are appropriate for comparisons of incidence rates of an event $\mathcal{D}$ in
prospective studies.
Similar methods can also be used in analysis of prevalence
proportions in retrospective epidemiological experiments.
Let $n_{gi}$ be the number of individuals and $d_{gi}$ the number who experienced
an event $\mathcal{D}$ in the gth population and the ith stratum.
Let $\hat{q}_{gi} = d_{gi}/n_{gi}$ be
the estimated prevalence (probability, proportion) of an event $\mathcal{D}$ in the
(gi)th class, and $q_{gi}$ be the corresponding true (but unknown) probability
of this event.
If the $q_{gi}$'s are small, then the binomial distribution,
$b(n_{gi}, q_{gi})$, can be approximated by the Poisson distribution with mean
$\mu_{gi} = n_{gi}q_{gi}$.
Thus, the methods described in this paper are approximately
applicable to prevalence data.
If, however, the $q_{gi}$'s are rather large, these
methods might be inappropriate.
In such cases, the conditional distribution
of $D_{1i}$ given $D_{1i} + D_{2i} = d_{\cdot i}$ is
hypergeometric, and the test criteria for
$H_0$: $q_{1i} = q_{2i}$ for all
i, should be based on this distribution.
Analysis analogous to those used in deriving
$X^2_{\text{Comb}}$ (Section 2.3.1)
leads to the Mantel-Haenszel (1959) (briefly, M-H) procedure; the only formal
difference is the formula for variance in each stratum.
It turns out,
however, that the variance in the M-H procedure, $V_{i(M-H)}$, can be expressed
in terms of the variance $V_i$ defined in (2.10) by the relation
$$V_{i(M-H)} = \frac{n_{\cdot i} - d_{\cdot i}}{n_{\cdot i}}\, V_i. \tag{4.10}$$
The M-H test statistic for the ith stratum is
$$X^2_{i(M-H)} = (d_{1i} - E_{1i})^2 / V_{i(M-H)}, \tag{4.11}$$
and for all strata combined
$$X^2_{\text{Comb}(M-H)} = (d_{1\cdot} - E_{1\cdot})^2 / V_{(M-H)}, \tag{4.12}$$
where
$$V_{(M-H)} = \sum_{i=1}^{I} V_{i(M-H)}. \tag{4.13}$$
Of course,
$$V_{(M-H)} < V < V^*. \tag{4.14}$$
(For application, see Example 4.)
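
A sketch of the M-H-type computation for prevalence data, using the variance relation of this section: the binomial-type stratum variance is scaled by the fraction of the stratum that did not experience the event. The function name is hypothetical. The scaling uses the stratum size itself as denominator because that is what reproduces the $V_{i(M-H)}$ column of Table 4; other presentations of the Mantel-Haenszel variance divide by the stratum size minus one, so this detail should be treated as an assumption tied to the present paper.

```python
def mh_chi2(n1, d1, n2, d2):
    """M-H-type total, combined and difference statistics for prevalence data.

    n1, n2: per-stratum numbers of individuals in groups 1 and 2
    d1, d2: per-stratum numbers who experienced the event
    (hypothetical argument names; every stratum is assumed to contain
    at least one event and at least one non-event).
    Returns (V_MH, X2_Total, X2_Comb, X2_Diff).
    """
    V_mh = 0.0
    dev_sum = 0.0
    x2_total = 0.0
    for n1i, d1i, n2i, d2i in zip(n1, d1, n2, d2):
        n_i = n1i + n2i
        d_i = d1i + d2i
        pi_i0 = n1i / n_i
        E1i = d_i * pi_i0
        Vi = d_i * pi_i0 * (1.0 - pi_i0)      # binomial-type variance
        Vi_mh = Vi * (n_i - d_i) / n_i        # variance relation of this section
        x2_total += (d1i - E1i) ** 2 / Vi_mh  # stratum M-H statistic
        dev_sum += d1i - E1i
        V_mh += Vi_mh
    x2_comb = dev_sum ** 2 / V_mh
    return V_mh, x2_total, x2_comb, x2_total - x2_comb
```

Run on the eleven strata of Table 4, this sketch gives values close to the tabulated 62.2931, 31.36, 2.27 and 29.09.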
4.3. In our analysis of rates, we have assumed that these are very
small, so that the assumption that the $D_{gi}$'s are Poisson variables is
approximately valid.
Suppose, however, that this is not the case.
What
should we do in such situations?
We may 'convert' the rates $\hat{\lambda}_{gi}$'s into
proportions $\hat{q}_{gi}$'s, by calculating the 'effective number of initial exposed
to risk,' $n'_{gi}$ (Elandt-Johnson and Johnson (1980), Chapter 8).
If $n_{gi}$ is the size of the gth midperiod population (sample) in the ith
age interval (stratum), then
$$n'_{gi} \doteq n_{gi} + \tfrac{1}{2} d_{gi}. \tag{4.15}$$
If the person-years exposed to risk, $A_{gi}$, over an age (or time)
interval of length $h_i$ is given, then
$$n'_{gi} \doteq \frac{1}{h_i}\big(A_{gi} + \tfrac{1}{2} h_i d_{gi}\big). \tag{4.16}$$
Using the $n'_{gi}$'s as
if they were integers in binomial distributions, $b(n'_{gi}, q_{gi})$
for $D_{gi}$, the M-H procedure can be applied as discussed in Section 4.2.
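
The two conversions to an 'effective number initially exposed to risk' amount to one line each. The helper names below are hypothetical, and the numbers in the usage note are purely illustrative.

```python
def n_eff_from_midperiod(n_mid, deaths):
    """Effective initial number from a midperiod population size:
    midperiod size plus half the deaths in the interval."""
    return n_mid + 0.5 * deaths


def n_eff_from_person_years(person_years, deaths, h):
    """Effective initial number from person-years over an interval of
    length h: (person-years + h * deaths / 2) / h."""
    return (person_years + 0.5 * h * deaths) / h
```

For instance, 500 person-years with 4 deaths over a 5-year interval would correspond to an effective initial number of (500 + 10)/5 = 102; the resulting values need not be integers, in line with the remark that they are used as if they were.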
5. EXAMPLES

Three examples are given in this section, mainly for the purpose of
illustrating various aspects of inferences and conclusions from the $\chi^2$-analysis.
The significance level $\alpha$ = 0.05 will be used in these analyses.

EXAMPLE 1.
The data in Table 1 represent the mortality experience
of white males selected at random from two locations (called here, briefly,
"cities") and followed, on the average, for 5 years.
City 1 is an
industrial city with a greater proportion of younger people, while City 2
is a kind of retirement community, with a greater proportion of older people.
TABLE 1
COMPARISON OF MORTALITY FROM ALL CAUSES IN TWO USA CITIES

City 1
Stratum (i)  Age group   $A_{1i}$   $d_{1i}$   $\hat{\lambda}_{1i}$   $E_{1i}$
     1        30-55       875.06       4        .00457       2.54
     2        55-60       196.24       4        .02038       2.87
     3        60-65       184.49       5        .02710       3.47
     4        65-70       156.87       6        .03825       4.64
     5        70-75       118.49       6        .05064       2.91
     6        75+          58.61       6        .10237       5.15
   TOTAL                 1589.76      31                    21.58

City 2
Stratum (i)  Age group   $A_{2i}$   $d_{2i}$   $\hat{\lambda}_{2i}$   $E_{2i}$
     1        30-55       503.65       0        0            1.46
     2        55-60       145.73       1        .00686       2.13
     3        60-65       187.49       2        .01067       3.53
     4        65-70       282.45       7        .02478       8.36
     5        70-75       369.52       6        .01624       9.09
     6        75+         305.78      26        .08503      26.85
   TOTAL                 1794.62      42                    51.42

Total
Stratum (i)  Age group   $A_{\cdot i}$   $d_{\cdot i}$   $\hat{\lambda}_{i0}$   $\hat{\gamma}_i$   $\pi_{i0}$    $V_i$    $X_i^2$
     1        30-55      1378.71       4      .00290     -      .63469    0.9274    2.30
     2        55-60       341.97       5      .01462    2.97    .57385    1.2227    1.04
     3        60-65       371.98       7      .01882    2.54    .49597    1.7499    1.34
     4        65-70       439.32      13      .02959    1.54    .35707    2.9845    0.62
     5        70-75       488.01      12      .02459    3.12    .24280    2.2062    4.33
     6        75+         364.39      32      .08782    1.20    .16084    4.3192    0.17
   TOTAL                 3384.38      73                                 13.4099    9.80

$X^2_{\text{Total}}$ = 9.80 (NS);  $X^2_{\text{Comb}}$ = 6.62 (S);  $X^2_{\text{Diff}}$ = 3.18 (NS);  $X'^2$ = 0.60
(For details, see Lipid Research Clinics Program (1974).) Clearly (as also
can be seen from the $\pi_{i0}$ column), age is a confounding factor.
The resulting $\chi^2$-tests are:

Version 1: $X^2_{\text{Total}}$ = 9.80 (NS); $X^2_{\text{Comb}}$ = 6.62 (S); $X^2_{\text{Diff}}$ = 3.18 (NS);
Version 2: $X^2_{\text{Total}}$ = 9.80 (NS); $X^{*2}_{\text{Comb}}$ = 5.84 (S); $X^{*2}_{\text{Diff}}$ = 3.96 (NS).
(S-significant, NS-not significant)

In this example, the results $X^2_{\text{Comb}}$ and $X^{*2}_{\text{Comb}}$ are similar, each
indicating that the age specific mortality rates over the age range 30 to
75+ are higher in City 1 than in City 2.
The observed values of the $\hat{\gamma}_i$'s
are greater than 1 for all i, which indicates that the model M:
$\lambda_{1i} \ge \lambda_{2i}$
for all i, is not contradicted by the data.
It is also worthwhile noting
that pooling the data in a single stratum, we obtain $X'^2$ = 0.60,
which is far different from $X^2_{\text{Comb}}$, and indicates (together with the
$\pi_{i0}$'s) that the
age (X) here is, indeed, a confounding factor.
EXAMPLE 2.
The data used in this example are also from the Lipid Research
Clinics Program Follow-Up Study, but the samples were here purposive, with
fairly large proportions of individuals with high cholesterol and/or high
triglycerides (for details, see LRCP (1974)).
The average follow-up time was about 7 years.
Our interest here is in the effect of higher vs. lower level
of cholesterol on mortality from Coronary Heart Disease (CHD) (see
Table 2) and mortality from Cancer (Table 3) in white males of age x $\ge$ 30.
The higher level of cholesterol was defined as being the third tertile
(approximately, Chol $\ge$ 67th percentile), and the lower level as corresponding
to the remaining two first tertiles (approximately, Chol < 67th percentile).
TABLE 2
CHD MORTALITY IN WHITE MALES WITH HIGH AND LOW LEVELS OF CHOLESTEROL

Cholesterol $\ge$ 67th PRC
Stratum (i)  Age group   $A_{1i}$    $d_{1i}$   $\hat{\lambda}_{1i}$   $E_{1i}$
     1        30-55       6568.98     16       .002436      7.66
     2        55-70       2470.80     23       .009309     13.56
     3        70+          615.54     10       .016246      7.72
   Total                  9655.32     49                   28.94

Cholesterol < 67th PRC
Stratum (i)  Age group   $A_{2i}$    $d_{2i}$   $\hat{\lambda}_{2i}$   $E_{2i}$
     1        30-55      13149.86      7       .000532     15.34
     2        55-70       4997.76     18       .003602     27.44
     3        70+         1219.34     13       .010662     15.28
   Total                 19366.96     38                   58.06

Total
Stratum (i)  Age group   $A_{\cdot i}$    $d_{\cdot i}$   $\hat{\lambda}_{i0}$   $\hat{\gamma}_i$   $\pi_{i0}$    $V_i$     $X_i^2$
     1        30-55      19718.84     23     .001166    4.58    .33313    5.1096    13.61
     2        55-70       7468.56     41     .005490    2.58    .33083    9.0766     9.82
     3        70+         1834.88     23     .012535    1.52    .33547    5.1273     1.01
   Total                 29022.28     87                                 19.3135    24.44

$X^2_{\text{Total}}$ = 24.44 (S);  $X^2_{\text{Comb}}$ = 20.84 (S);  $X^2_{\text{Diff}}$ = 3.60 (NS);  $X'^2$ = 20.83
TABLE 3
CANCER MORTALITY IN WHITE MALES WITH HIGH AND LOW LEVELS OF CHOLESTEROL

Cholesterol $\ge$ 67th PRC
Stratum (i)  Age group   $A_{1i}$    $d_{1i}$   $\hat{\lambda}_{1i}$   $E_{1i}$
     1        30-55       6568.98      4       .000609      4.33
     2        55-70       2470.80      4       .001619      9.59
     3        70+          615.54      7       .011372      9.06
   Total                  9655.32     15                   22.98

Cholesterol < 67th PRC
Stratum (i)  Age group   $A_{2i}$    $d_{2i}$   $\hat{\lambda}_{2i}$   $E_{2i}$
     1        30-55      13149.86      9       .000684      8.67
     2        55-70       4997.76     25       .005002     19.41
     3        70+         1219.34     20       .016402     17.94
   Total                 19366.96     54                   46.02

Total
Stratum (i)  Age group   $A_{\cdot i}$    $d_{\cdot i}$   $\hat{\lambda}_{i0}$   $\hat{\gamma}_i$   $\pi_{i0}$    $V_i$     $X_i^2$
     1        30-55      19718.84     13     .000659    0.89    .33313    2.8880    0.04
     2        55-70       7468.56     29     .003883    0.32    .33083    6.4200    4.86
     3        70+         1834.88     27     .014715    0.69    .33547    6.0190    0.70
   Total                 29022.28     69                                 15.3270    5.61

$X^2_{\text{Total}}$ = 5.61 (NS);  $X^2_{\text{Comb}}$ = 4.15 (S);  $X^2_{\text{Diff}}$ = 1.46 (NS);  $X'^2$ = 4.14
It would be desirable to have age specific population tertiles; lacking
these, we estimate them from the data.
Also, since there were no repeated
measurements of cholesterol during the follow-up time, the tertiles were
estimated from the frequency distribution at entry.
This is not quite
correct, but lacking proper data, it seems reasonable
at this stage.
First we notice that for these data age does not appear to be a confounding
factor; the $\pi_{i0}$'s are almost the same for the two groups, and
the $X^2_{\text{Comb}}$ and $X'^2$ are practically the same.
However, the effects of
cholesterol level on the CHD and on cancer mortality are quite different.

(a) CHD mortality.
The mortality rates are greater for the higher
level of cholesterol for all ages ($\hat{\gamma}_i > 1$ for all i); the
$X^2_{\text{Comb}}$ = 20.84 is highly significant.
However, the $\hat{\gamma}_i$'s (and the $X_i^2$'s) decrease substantially
with age, indicating that high level of cholesterol as a potential risk
factor in CHD mortality is more important in younger ages.
It implies that
there might be a chol x age interaction.

(b) Cancer mortality.
Here we observe an opposite effect.
Lower
level of cholesterol is a potential risk factor in cancer mortality
($\hat{\gamma}_i < 1$ for all i).
It seems that this may be more apparent in age group
55-70, but since mortality data are rather sparse, they do not strongly support
this view.
EXAMPLE 3.
The data in Table 4 represent prevalence of cardiac event
(a manifestation of possible ischemic heart disease) in white males and white
females selected for the Follow-Up Study in the Lipid Research Clinics Program.
The $\pi_{i0}$'s indicate that age x is, in these data, a confounding factor.
For ages 30-59, $\hat{\gamma}_i > 1$, while for ages $\ge$ 60, $\hat{\gamma}_i < 1$,
which suggests that there is a cross-over effect.
The M-H $\chi^2$-statistics calculated for these data
confirm this hypothesis;
$X^2_{\text{Comb}}$ = 2.27 is not significant, while $X^2_{\text{Diff}}$ = 29.09
(with 10 d.f.) is highly significant.
The cardiac event in white males
aged 30-59 is observed less often (and in white females, more often) than
expected, while the situation is reversed for ages $\ge$ 60.

TABLE 4
PREVALENCE OF CARDIAC EVENT IN WHITE FEMALES AND WHITE MALES

White Females (g = 1) and White Males (g = 2)
Stratum (i)  Age group   $n_{1i}$  $d_{1i}$  $\hat{q}_{1i}$  $E_{1i}$    $n_{2i}$  $d_{2i}$  $\hat{q}_{2i}$  $E_{2i}$
     1        30-35       316     12    .03797    8.58      384      7    .01823   10.42
     2        35-40       284     23    .08099   14.22      355      9    .02535   17.78
     3        40-45       305     12    .03934    9.00      373      8    .02145   11.00
     4        45-50       319     16    .05016   12.96      321     10    .03115   13.04
     5        50-55       248     22    .08871   18.35      333     21    .06306   24.65
     6        55-60       243     24    .09877   21.56      253     20    .07905   22.44
   ---------------------------------------------------------------------------------
     7        60-65       146     12    .08219   15.03      126     16    .12698   12.97
     8        65-70       121     13    .10744   15.88      100     16    .16000   13.12
     9        70-75        81      3    .03704    7.95       82     13    .15854    8.05
    10        75-80        28      6    .21429    8.52       18      8    .44444    5.48
    11        80+          19      4    .21053    3.06       12      1    .08333    1.94
   TOTAL                 2110    147             135.11     2357    129             140.89

Total
Stratum (i)  Age group   $n_{\cdot i}$  $d_{\cdot i}$  $\hat{q}_{i0}$  $\hat{\gamma}_i$  $\pi_{i0}$    $V_i$    $V_{i(M-H)}$  $X^2_{i(M-H)}$
     1        30-35       700     19    .02714   2.08   .45143    4.7052    4.5775     2.56
     2        35-40       639     32    .05008   3.19   .44444    7.9013    7.5056    10.27
     3        40-45       678     20    .02950   1.83   .44985    4.9497    4.8037     1.87
     4        45-50       640     26    .04062   1.61   .49844    6.5000    6.2358     1.48
     5        50-55       581     43    .07401   1.41   .42685   10.5199    9.7413     1.37
     6        55-60       496     44    .08871   1.25   .48992   10.9955   10.0201     0.59
   ---------------------------------------------------------------------------------
     7        60-65       272     28    .10294   0.65   .53676    6.9622    6.2455     1.47
     8        65-70       221     29    .13122   0.67   .54751    7.1846    6.2418     1.33
     9        70-75       163     16    .09816   0.23   .49693    3.9999    3.6072     6.79
    10        75-80        46     14    .30435   0.48   .60870    3.3346    2.3197     2.74
    11        80+          31      5    .16129   2.53   .61290    1.1863    0.9949     0.89
   TOTAL                 4467    276                             68.2392   62.2931    31.36

$X^2_{\text{Total}}$ = 31.36 (S);  $X^2_{\text{Comb}}$ = 2.27 (NS);  $X^2_{\text{Diff}}$ = 29.09 (S);  $X'^2$ = 4.29

The $\chi^2$-analysis restricted to ages 30-59 gives the following M-H
$X^2$-statistics:
$X^2_{\text{Total}(M-H)}$ = 18.14 (NS); $X^2_{\text{Comb}(M-H)}$ = 13.80 (S);
$X^2_{\text{Diff}(M-H)}$ = 4.34 (NS), and $X'^2$ = 13.84.
The interpretation is straightforward: there is an excess (statistically
significant) of cardiac events in females, for ages 30-59.
Similar results, but in the opposite direction, are obtained from
analysis of these data for ages $\ge$ 60.

6. DISCUSSION

6.1.
Our main concern in this paper is with comparison of rates (or
more precisely, rate functions) in two populations, but most of the remarks
below apply also to proportions in stratified analysis of categorical data.
First, the $X^2_{\text{Comb}}$ (or $X^{*2}_{\text{Comb}}$) statistic is often termed
$\chi^2_{\text{assoc}}$ ("Chi-square for association").
In this context, this means association between D and
P.
If the model M ($\lambda_{1i} \ge \lambda_{2i}$ for all i) is valid, and
$X^2_{\text{Comb}}$ is significant, then
this, indeed, indicates that the incidence rates in
$P_1$ are higher than in $P_2$.
If, however, model M does not hold, and
$X^2_{\text{Comb}}$ is not significant, it does not
imply that there is lack of association; in fact, there is often crossing-over
of rate functions, as can be seen from Example 3.
Therefore, the term
"association" seems to be not quite relevant; in this paper, the term
"combined" indicates only cumulative effect over strata as opposed to $X'^2$
(Chi-square 'pooled') in which the strata are pooled into a single stratum.
6.2. A second, even more important problem, arises with the interpretation of $X^2_{\mathrm{Diff}}$, which is also commonly termed $X^2_{\mathrm{homog}}$ ("of homogeneity"), sometimes without even defining what "homogeneity" means.
Consider the model (3.13)

$$\lambda_{gi} = \theta_g \varepsilon_i, \qquad g = 1,2;\ i = 1,2,\ldots,I. \tag{6.1}$$

This model implies that there is a constant (multiplicative) population effect, $\theta_g$, over all strata, and there is no interaction between P and D. It follows that

$$\frac{\lambda_{1i}}{\lambda_{2i}} = \frac{\theta_1}{\theta_2} = \theta \quad \text{for all } i, \tag{6.2}$$

so that the relative risk over all strata is constant. In this sense, the model may be regarded as homogeneous. An appropriate goodness of fit test of a multiplicative model would then be the test of homogeneity (or heterogeneity; cf. Section 3.2 (iii)).
The $X^2_{\mathrm{Diff}}$ is not a suitable test statistic for detecting heterogeneity (Examples 1 and 2(a)); it is constructed to detect cross-over effect of rate functions.
Also, some authors use the term "homogeneity" in a narrower sense: that there is no population effect, that is, $\theta_1 = \theta_2$ in (6.1) (e.g. Armitage (1966)). The above remarks apply also to proportions and to the use of the M-H procedure.
6.3. Mantel et al. (1977) choose to define "homogeneity" in terms of odds ratios. Using the notation of our Section 4.2, the odds ratio for the ith stratum is

$$o_i = \frac{q_{1i}}{q_{2i}} \cdot \frac{1 - q_{2i}}{1 - q_{1i}}. \tag{6.3}$$
If $q_{1i}$ and $q_{2i}$ are very small, then $o_i \approx q_{1i}/q_{2i}$, that is, the odds ratio approximates the relative risk. Otherwise, $o_i$ has no sensible meaning, in my opinion, and should be avoided, even though it is a mathematically "handy" index.
In the examples given by Mantel et al. (1977), the $q_{1i}$ and $q_{2i}$ are large (of order 0.50 to 0.99) and so the discussion about use of Zelen's (1971) test (which appears to be based on our $X^2_{\mathrm{Diff}}(\text{M-H})$) is confusing and misleading. If Zelen intended to use this test for equality of odds ratios over all strata, it would, indeed, not be appropriate for this purpose; but it is not "an invalid test for any purpose," as Mantel et al. (1977) conclude at the end of their discussion.
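The contrast between small and large proportions can be seen numerically. The sketch below uses hypothetical proportions chosen only for illustration (they are not taken from Mantel et al. or from this paper), and the function names are likewise assumptions.

```python
def relative_risk(q1, q2):
    """Ratio of the two proportions."""
    return q1 / q2

def odds_ratio(q1, q2):
    """Odds ratio q1 (1 - q2) / (q2 (1 - q1)) for a single stratum."""
    return (q1 * (1 - q2)) / (q2 * (1 - q1))

# Hypothetical proportions: one rare-event pair, one pair in the 0.50-0.99 range.
for q1, q2 in [(0.02, 0.01), (0.90, 0.60)]:
    rr = relative_risk(q1, q2)
    orat = odds_ratio(q1, q2)
    print(f"q1={q1:.2f}, q2={q2:.2f}: relative risk={rr:.2f}, odds ratio={orat:.2f}")
```

For the rare-event pair the two indices nearly coincide, while for the large proportions the odds ratio is several times the relative risk, so equality of odds ratios across strata says little about equality of relative risks.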
6.4. In the $X^2$-analysis, commonly encountered patterns are the following:

Pattern   X^2_Total   X^2_Comb   X^2_Diff   Interpretation
(1)       NS          NS         NS         No crossing-over of rate functions
(2)       NS          S          NS         No crossing-over of rate functions
(3)       S           S          NS         No crossing-over of rate functions
(4)       S           NS         S          Crossing-over of rate functions
(5)       S           S          S          Crossing-over of rate functions

The question arises: Does the pattern (5) always indicate substantial crossing-over? There are two situations where this might not be so.
(i) If the mortality data (the $d_{\cdot i}$'s) are large, any small difference between observed and expected values would lead to a big $X^2$-value, even if such a difference is not practically important.
(ii) A more complicated situation can arise, even when the $d_{\cdot i}$'s are rather small. To investigate this question formally, consider, for simplicity, only two strata, 1 and 2. It is easy to show that the right-hand side of (4.1) can be written in the form

$$\sum_{i=1}^{2} w_i (y_i - \bar{y})^2. \tag{6.4}$$

Expressing this in terms of our variables defined in (4.2) and (4.3), we obtain
(6.5)
Suppose that $D_{11} - E_{11} > 0$ and $D_{12} - E_{12} < 0$, that is, crossing over is observed in the sample, and suppose that $X^2_{\mathrm{Diff}}$ is significant. If $|D_{11} - E_{11}|/V_1$ is relatively large as compared to $|D_{12} - E_{12}|/V_2$, then changing the sign of $(D_{12} - E_{12})$ would have a minimal effect on $X^2_{\mathrm{Diff}}$, that is, $X^2_{\mathrm{Diff}}$ still could be significant even if $D_{12} - E_{12} > 0$. From my experience, however, $|D_{11} - E_{11}|/V_1$ must be very large as compared to $|D_{12} - E_{12}|/V_2$ for this to occur. In practice such situations are very rare, so we have to use an artificial example to produce such a situation.
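The sensitivity argument above can be made explicit with a standard identity for a weighted sum of squares over two groups. The display below is a sketch under assumptions: $\bar{y}$ is taken to be the weighted mean $(w_1 y_1 + w_2 y_2)/(w_1 + w_2)$, and the substitution $w_i = V_i$, $y_i = (D_{1i} - E_{1i})/V_i$ is an assumed reading of the variables in (4.2) and (4.3), which are not reproduced in this section.

```latex
% Identity for two strata (\bar{y} = weighted mean of y_1 and y_2):
\sum_{i=1}^{2} w_i (y_i - \bar{y})^2
  = \frac{w_1 w_2}{w_1 + w_2}\,(y_1 - y_2)^2 .

% Under the assumed substitution w_i = V_i, y_i = (D_{1i} - E_{1i}) / V_i:
X^2_{\mathrm{Diff}}
  = \frac{V_1 V_2}{V_1 + V_2}
    \left( \frac{D_{11} - E_{11}}{V_1} - \frac{D_{12} - E_{12}}{V_2} \right)^{2} .
```

Written this way, the statistic depends on the two strata only through the difference of the standardized deviations, so when the first term dominates, the sign of the second has little influence on the result.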
EXAMPLE 4. Consider the following data (below, in Table 5).

TABLE 5

                      Sample 1                   Sample 2
Stratum (i)    n_1i    d_1i    q_1i       n_2i    d_2i    q_2i
1              100     20      0.20       100     2       0.02
2              100     5       0.05       100     3       0.03
Note that the observed proportion in sample 1 and stratum 1, $q_{11} = 0.20$, is much bigger than the remaining $q$'s, so that the interaction populations × strata is almost obvious.

(i) The resulting $X^2$'s are as follows: $X^2_{\mathrm{Total}}(\text{M-H}) = 17.05$ (S), $X^2_{\mathrm{Comb}}(\text{M-H}) = 14.66$ (S) and $X^2_{\mathrm{Diff}}(\text{M-H}) = 2.39$ (NS); this means that the observed model M: $q_{11} > q_{21}$, $q_{12} > q_{22}$, is not contradicted by the analysis, even with this obvious interaction effect.
(ii) Suppose, however, that $d_{11} = 70$, while the remaining $d_{gi}$'s are the same as in Table 5, so that there is still no observed crossing over. In this case, $X^2_{\mathrm{Total}}(\text{M-H}) = 100.35$ (S), $X^2_{\mathrm{Comb}}(\text{M-H}) = 91.15$ (S) and $X^2_{\mathrm{Diff}}(\text{M-H}) = 9.20$ (S). Now, $X^2_{\mathrm{Diff}}$ is significant because of the unusual structure in stratum 1 ($q_{11} = 0.70$) as opposed to stratum 2 ($q_{12} = 0.05$). Of course, such data are unlikely to occur in a real situation, and if they were to occur, no statistical test would be required.
(iii) Also note that if in Table 5 we would have $d_{11} = 20$, $d_{21} = 2$, but $d_{12} = 3$ and $d_{22} = 5$ (that is, $d_{12}$ and $d_{22}$ have exchanged their positions, so that $q_{11} > q_{21}$ but $q_{12} < q_{22}$), then $X^2_{\mathrm{Total}}(\text{M-H}) = 17.05$ (S), $X^2_{\mathrm{Comb}}(\text{M-H}) = 9.38$ (S), and $X^2_{\mathrm{Diff}}(\text{M-H}) = 7.67$ (S). It appears that $X^2_{\mathrm{Diff}}$ is rather sensitive for detecting even small cross-over phenomena.
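The decomposition used in this example can be sketched in a few lines. The code below is an illustration under assumptions, not the author's program: the function name `chi2_decomposition` is hypothetical, and the stratum variance is taken as $d\,p_1 p_2 (1 - q)$, the form consistent with the variance column of the earlier table (the classical Mantel-Haenszel variance has $n - 1$ in place of one factor $n$ in the denominator). With unrounded intermediate values it gives figures within about 0.02 of the Total and Diff statistics quoted for Table 5 and of all three statistics quoted in case (iii).

```python
def chi2_decomposition(strata):
    """Return (total, combined, difference) X^2 for a list of 2 x 2 strata.

    strata: list of (n1, d1, n2, d2) tuples, one per stratum.
    Assumption: variance of d1 in a stratum is d * p1 * p2 * (1 - q).
    """
    total = sum_dev = sum_var = 0.0
    for n1, d1, n2, d2 in strata:
        n, d = n1 + n2, d1 + d2
        e1 = d * n1 / n                              # expected events, sample 1
        var = d * (n1 / n) * (n2 / n) * (1 - d / n)  # assumed variance
        total += (d1 - e1) ** 2 / var                # stratum-level X^2
        sum_dev += d1 - e1
        sum_var += var
    combined = sum_dev ** 2 / sum_var                # 1 d.f.
    return total, combined, total - combined

# Table 5 as printed, and case (iii) with the stratum-2 counts exchanged.
table5 = [(100, 20, 100, 2), (100, 5, 100, 3)]
case_iii = [(100, 20, 100, 2), (100, 3, 100, 5)]

for label, data in [("Table 5", table5), ("Case (iii)", case_iii)]:
    tot, comb, diff = chi2_decomposition(data)
    print(f"{label}: Total={tot:.2f}, Comb={comb:.2f}, Diff={diff:.2f}")
```

Exchanging the two counts in stratum 2 leaves the total unchanged but moves weight from the combined component to the difference component, which is the sensitivity to cross-over noted in (iii).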
REFERENCES

1. Andersen, E.B. (1977). Multiplicative Poisson models with unequal cell rates. Scand. J. Statist. 4, 153-158.
2. Armitage, P. (1966). The $X^2$ test for heterogeneity of proportions after adjustment for stratification. J. Roy. Statist. Soc. Ser. B, 28, 150-163.
3. Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W. (1975). Discrete Multivariate Analysis. Cambridge, Mass., MIT Press.
4. Breslow, N. and Day, N.E. (1975). Indirect standardization and multiplicative models for rates with reference to the age adjustment of cancer incidence and relative frequency data. J. Chron. Dis. 28, 289-303.
5. Cochran, W.G. (1954). Some methods of strengthening the common $X^2$-tests. Biometrics 10, 417-451.
6. Elandt-Johnson, Regina C. (1984). Statistical interaction revisited. Inst. of Statistics Mimeo Series, No. 1457, March 1984, Dept. of Biostatistics, UNC, Chapel Hill, N.C.
7. Elandt-Johnson, Regina C. and Johnson, Norman L. (1980). Survival Models and Data Analysis (Chapter 8). J. Wiley and Sons, New York.
8. Fleiss, J.L. (1981). Statistical Methods for Rates and Proportions (Chapter 10). J. Wiley and Sons, New York.
9. Gail, M. (1978). The analysis of heterogeneity for indirect standardized mortality ratios. J. R. Statist. Soc. Ser. A, 141, 224-234.
10. Gart, J.J. (1971). The comparison of proportions: a review of significance tests, confidence intervals and adjustments for stratification. Rev. Inter. Statist. Inst. 39, 148-169.
11. Gart, J.J. (1978). The analysis of ratios and cross-product ratios of Poisson variates with application to incidence rates. Commun. Statist. (Theory and Methods) A7 (10), 917-937.
12. Haybittle, J.L. and Freedman, L.S. (1979). Some comments on the logrank test statistic in clinical trials applications. The Statistician, London 28, 199-208.
13. Lipid Research Clinics Program (LRCP) (1974). Protocol of the Lipid Research Clinics Prevalence Study. Central Patient Registry and Coordinating Center, Dept. of Biostatistics, UNC, Chapel Hill, N.C.
14. Mantel, N., Brown, Ch., and Byar, D.P. (1977). Test for homogeneity of effect in an epidemiologic investigation. Amer. J. Epid. 106, 125-129.
15. Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 22, 719-748.
16. Osborn, J. (1975). A multiplicative model for the analysis of vital statistics rates. Appl. Statist. 24, 75-84.
17. Peto, R. and Peto, J. (1972). Asymptotically efficient rank invariant test procedures (with discussion). J. R. Statist. Soc. Ser. A, 135, 185-198.
18. Przyborowski, J. and Wilenski, H. (1939). Homogeneity of results in testing samples from Poisson series with application for testing clover seeds for dodder. Biometrika 31, 313-323.
19. Zelen, M. (1971). The analysis of several 2 x 2 contingency tables. Biometrika 58, 129-137.