Roy, S. N. and M. A. Kastenbaum (1955). "A generalization of analysis of variance and multivariate analysis to data based on frequencies in qualitative categories or class intervals." (Air Research and Dev. Command)

A GENERALIZATION OF ANALYSIS OF VARIANCE AND MULTIVARIATE
ANALYSIS TO DATA BASED ON FREQUENCIES IN QUALITATIVE
CATEGORIES OR CLASS INTERVALS
by
S. N. Roy and Marvin A. Kastenbaum
This research was supported in part by the United
States Air Force, through the Office of Scientific
Research of the Air Research and Development Command,
and in part by the United States Public Health Service RTS 5065 (c).
Institute of Statistics
Mimeograph Series No. 131
June 1, 1955
UNCLASSIFIED
Security Information
Bibliographical Control Sheet
1. O.A.: Institute of Statistics, North Carolina State College of the University of North Carolina
   M.A.: Office of Scientific Research of the Air Research and Development Command
2. O.A.: CIT Report No. 19
   M.A.: OSR-TN-55-167
3. A GENERALIZATION OF ANALYSIS OF VARIANCE AND MULTIVARIATE ANALYSIS TO DATA BASED ON FREQUENCIES IN QUALITATIVE CATEGORIES OR CLASS INTERVALS
4. S. N. Roy and Marvin A. Kastenbaum
5. June, 1955
6. 27
7. None
8. AF 18(600)-83
9. ROO No. 670-088
10. UNCLASSIFIED
11. None
12. In the situation indicated by the title a p-variate body of data arranged in a q-way classification will formally look like a body of data arranged in a (p + q)-way table, but the fundamental distinction between a so-called "variate" and a so-called "way of classification" is that along the direction of a "variate" the marginal frequencies are stochastic variates, while along a "way of classification" the marginal frequencies are fixed or prescribed. The hypotheses of "no total correlation," "no multiple correlation," "no partial correlation," "no canonical correlation," "no main effect," "no interaction," etc., are translated into hypotheses on the structure of the probabilities over the different cells or categories, and, with large sample assumptions, these hypotheses are tested by X^2 with appropriate degrees of freedom. No exact test in terms of the original multinomial distribution is attempted in this paper.
A GENERALIZATION OF ANALYSIS OF VARIANCE AND MULTIVARIATE
ANALYSIS TO DATA BASED ON FREQUENCIES IN QUALITATIVE
CATEGORIES OR CLASS INTERVALS (1)
by
S. N. Roy and Marvin A. Kastenbaum (2)
1. Summary. In the situation indicated by the title a p-variate body of data arranged in a q-way classification will formally look like a body of data arranged in a (p + q)-way table, but the fundamental distinction between a so-called "variate" and a so-called "way of classification" is that along the direction of a "variate" the marginal frequencies are stochastic variates, while along a "way of classification" the marginal frequencies are fixed or prescribed. The hypotheses of "no total correlation," "no multiple correlation," "no partial correlation," "no canonical correlation," "no main effect," "no interaction," etc., are translated into hypotheses on the structure of the probabilities over the different cells or categories, and, with large sample assumptions, these hypotheses are tested by X^2 with appropriate degrees of freedom. No exact test in terms of the original multinomial distribution is attempted in this paper.
2. Notation and Preliminaries. To fix our ideas, consider a sample of size n, distributed over a three-way table in terms of, let us assume for the moment, three variates. Let n_ijk denote the observed frequency, and p_ijk the probability under any given hypothesis of having an observation in the (ijk)-th cell, where i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t. Also let the marginals
1. This research was supported in part by the United States Air Force, through the
Office of Scientific Research of the Air Research and Development Command, and in
part by the United States Public Health Service RTS 5065 (c).
2. Marvin A. Kastenbaum: Research Fellow in the Department of Biostatistics,
University of North Carolina.
be denoted by Σ_i n_ijk = n_ojk, Σ_j n_ijk = n_iok, Σ_k n_ijk = n_ijo, Σ_{i,j} n_ijk = n_ook, Σ_{i,k} n_ijk = n_ojo, Σ_{j,k} n_ijk = n_ioo, and Σ_{i,j,k} n_ijk = n (say). Let the corresponding summations over p_ijk be denoted by p_ojk, p_iok, p_ijo, p_ook, p_ojo, p_ioo, p_ooo. Since the categories are mutually exclusive, it is easy to see that these are, in fact, the corresponding marginal probabilities, so that p_ooo = 1. The generalization to more than three variates would be obvious. The total number n is, in any case, supposed to be fixed. The likelihood function, which in this case is also the probability of the n_ijk's, is given by
(2.1)  φ = n! Π_{i,j,k} [p_ijk^{n_ijk} / n_ijk!] ∝ Π_{i,j,k} p_ijk^{n_ijk}.

The last expression on the right side of (2.1) is the one we shall need when we are interested in finding the maximum likelihood estimates of the p's. We have E(n_ijk) = n p_ijk, from which it is easy to see, by taking summation, that E(n_ijo) = n p_ijo, E(n_ioo) = n p_ioo, etc., and, in general, for any linear function of the n's the same type of relationship will hold for the p's.

Starting from (2.1), we next observe that the conditional probabilities of the n_ijk's, given, say, the n_ioo's (i = 1, 2, ..., r), or say the n_ijo's (i = 1, 2, ..., r; j = 1, 2, ..., s), will be given respectively by

(2.2)  Π_i [n_ioo! Π_{j,k} (p_ijk / p_ioo)^{n_ijk} / n_ijk!]

and

(2.3)  Π_{i,j} [n_ijo! Π_k (p_ijk / p_ijo)^{n_ijk} / n_ijk!].

If now the n_ioo's are held fixed, or say the n_ijo's are held fixed, we shall have a self-consistent set-up if we put p_ioo = n_ioo / n, or, in the second case, p_ijo = n_ijo / n, and also take the right sides of (2.2) and (2.3) to be the actual
probabilities of the n_ijk's in the two different situations. The generalization to more general types of linear constraints on the n's is obvious. Notice that if, for example, the n_ijo's are held fixed, and if we want to estimate the p's by, say, the maximum likelihood method, we do not have to estimate the p_ijo's, since they are already given by p_ijo = n_ijo / n. For this purpose, therefore, it is enough to replace (2.3) by

(2.4)  φ ∝ Π_{i,j,k} p_ijk^{n_ijk},

and obtain the maximum likelihood estimates of the p_ijk's subject to Σ_k p_ijk being fixed (with i = 1, 2, ..., r; j = 1, 2, ..., s). Likewise, when the n_ioo's are held fixed, it is enough to replace (2.2) by (2.4) and obtain the maximum likelihood estimates of the p_ijk's subject to Σ_{j,k} p_ijk being fixed (with i = 1, 2, ..., r). The generalization of this to other linear constraints on the n's is also obvious. Notice that any linear constraint on the n's will imply a similar linear constraint on the p's, but it is not the other way around.
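The marginal notation of this section can be illustrated mechanically in code; the following sketch (the array and the helper name are ours, not the paper's) shows the correspondence between an "o" in a subscript and summation over that index.

```python
import numpy as np

# Illustrative sketch of the marginal notation: an "o" in a subscript
# means summation over that index, e.g. n_ijo = sum over k of n_ijk.
def margins(n):
    """One- and two-index marginal totals of an r x s x t table n[i, j, k]."""
    return {
        "n_ojk": n.sum(axis=0),        # summed over i
        "n_iok": n.sum(axis=1),        # summed over j
        "n_ijo": n.sum(axis=2),        # summed over k
        "n_ioo": n.sum(axis=(1, 2)),
        "n_ojo": n.sum(axis=(0, 2)),
        "n_ook": n.sum(axis=(0, 1)),
        "n_ooo": n.sum(),              # the fixed total n
    }

n = np.arange(1, 13).reshape(2, 2, 3)  # a made-up 2 x 2 x 3 table
m = margins(n)
```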
3. Hypotheses of Independence between "i" and "j" in a Two-Way Table. [2, 4, 5]  Consider

(3.1)  H_0: p_ij = p_io p_oj  (for i = 1, 2, ..., r and j = 1, 2, ..., s).

Case I: Both "i" and "j" are "variates"; or in other words, the hypothesis is a composite one, where the p_io's and p_oj's are the free or nuisance parameters, subject to Σ_i p_io = Σ_j p_oj = 1. We shall see that in this situation (r + s - 2) independent and free parameters have to be eventually estimated from the observations. There is also one linear constraint on the n's, which is Σ_{i,j} n_ij = n. (3.1) is the analogue of "no correlation" for a bivariate normal population.
Case II: Either "i" or "j", say "i", is a "way of classification", while "j" is a "variate". In other words the n_io's are fixed but the n_oj's are stochastic variates. In this case p̂_io = n_io / n, but, of course, p̂_oj ≠ n_oj / n. Thus the only free and nuisance parameters are the p_oj's subject to Σ_j p_oj = 1, so that (s - 1) free parameters have to be eventually estimated from the data, and we have the n_io's (i = 1, 2, ..., r) all fixed; that is, we have r linear constraints on the n_ij's. Physically, H_0 of (3.1) in this case means testing the hypothesis that r observed frequency distributions with fixed marginal totals, the n_io's, have come from the same parent frequency distribution. This is easily seen to be one natural generalization of the hypothesis of the equality of means for a one-way classification in the analysis of variance, when we remember that H_0: ξ_1 = ξ_2 = ... = ξ_r for N(ξ_i, σ^2) (i = 1, 2, ..., r) would imply that the r distributions are the same. For the normal distribution the class of alternatives is supposed to be H ≠ H_0 under the model N(ξ_i, σ^2), but here the class of alternatives is, of course, much more general. The case of "i" being a "variate" and "j" a "way of classification" is exactly similar, and need not be considered separately.
Case III: Both "i" and "j" are "ways of classification". Here the sets of n_io's and n_oj's are both fixed, so that there are (r - 1) + (s - 1) + 1 independent linear constraints on the n_ij's, while p_io = n_io / n and p_oj = n_oj / n, so that no parameter has to be estimated, all of them being prescribed. This is the case usually given in the textbooks, and this is exactly the case which is most difficult to visualize, unless we think in terms of a hypothetical sub-population having the same fixed marginals as the ones we have observed, which is anyway a highly artificial concept. However, for an r x r case, an extension of Fisher's "tea-tasting" experiment would provide a realistic example of fixed marginals both ways. But it will be shown later that, in all three cases, we end up with the same test of H_0 of (3.1). This is a highly interesting result.
4. Hypotheses Associated with a Three-Way Table. [6]  Features of the three-way table which, by considering the marginals, can be easily seen to be identical with those of a two-way classification, are not of so much interest. We shall, therefore, restrict ourselves mainly to those hypotheses which have no analogue in a two-way table. Also, out of the possible cases (I) "i", "j", and "k" all being "variates"; (II) any two of them being "ways of classification" and the remaining one a "variate"; (III) any one of them being a "way of classification" and the remaining two "variates"; (IV) all of them being "ways of classification", we shall discuss, in the present paper, (I) and (II), these being of greater physical interest than the others. Mathematically, however, it will turn out that each hypothesis will have the same test for all the different cases.
4.1 Hypotheses of Conditional Independence between "i and j" | "k". It is easy to see that the conditional probability of "i and j" | "k" is p_ijk / p_ook, and the conditional probabilities of "i" | "k" and "j" | "k" are respectively p_iok / p_ook and p_ojk / p_ook. Thus the hypothesis of conditional independence between "i and j" | "k" is

(4.1.1)  H_0: p_ijk / p_ook = (p_iok / p_ook)(p_ojk / p_ook),  or  p_ijk = p_iok p_ojk / p_ook

(for i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t). The alternative class is, of course, H ≠ H_0. In (4.1.1), adding up over i and j respectively, we have

(4.1.2)  Σ_i p_ijk = p_ojk  and  Σ_j p_ijk = p_iok,

which are merely two consistency conditions. Adding up over k, we have

(4.1.3)  p_ijo = Σ_k p_iok p_ojk / p_ook.

If on (4.1.1) we superimpose the conditions of independence between "i" and "k" and "j" and "k", i.e.,

(4.1.4)  p_iok = p_ioo p_ook  and  p_ojk = p_ojo p_ook,

we have

(4.1.5)  p_ijk = p_ioo p_ojo p_ook,

which is the condition of complete independence of "i", "j", and "k". Notice that (4.1.1) will not imply (4.1.4), and (4.1.4) by itself will not imply (4.1.5).
Now consider for (4.1.1) the Case I, where "i", "j", and "k" are all variates. The H_0 of (4.1.1) is now easily seen to be the analogue of "no partial correlation" between the first two variates, in the case of a three-variate normal population. Now if we want to test (4.1.1) we must eventually estimate the nuisance parameters p_iok's, p_ojk's, and p_ook's, subject to Σ_i p_iok = Σ_j p_ojk = p_ook (k = 1, 2, ..., t) and Σ_k p_ook = 1. It is easy to check that this leaves us with rt + st + t - t - t - 1, that is, t(r + s - 1) - 1 free parameters to estimate. We have just one linear constraint on the n's, namely Σ_{i,j,k} n_ijk = n (fixed).

If we start out to test (4.1.5), we would have to estimate, eventually, the nuisance parameters p_ioo's, p_ojo's, and p_ook's, subject to Σ_i p_ioo = Σ_j p_ojo = Σ_k p_ook = 1, i.e., (r + s + t - 3) independent parameters. There is also the same one linear constraint on the n's, as stated above in the case of (4.1.1).

It will be seen in sections 5, 7, and 8 that it is unrealistic to try to test (4.1.1) for the Case II, where "i" and "j" are "ways of classification" and "k" is a "variate".
4.2 Hypothesis of Independence between "(i,j)" and "k", that is, the hypothesis of multiple independence. Consider

(4.2.1)  H_0: p_ijk = p_ijo p_ook  (for i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t),

the alternative being, of course, H ≠ H_0. It is easy to check, by summing over i and j respectively, that (4.2.1) implies

(4.2.2)  p_ojk = p_ojo p_ook  and  p_iok = p_ioo p_ook.

Summing over k, we have merely the consistency condition

(4.2.2.1)  Σ_k p_ijk = p_ijo.

Notice that although (4.2.1) implies (4.2.2), the condition (4.2.2) will not, in general, imply (4.2.1). However, for a normal population (4.2.2) implies (4.2.1). Let us ask ourselves what set of conditions there is which, when superimposed on (4.2.2), will, together, be exactly equivalent to (4.2.1). One possible set might appear to be

(4.2.3)  p_ijk = p_ijo p_iok p_ojk / (p_ioo p_ojo p_ook)  (for i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t).

Check that (4.2.3) does not imply (4.2.2), but that if on (4.2.3) we superimpose (4.2.2), we have (4.2.1) all right. But (4.2.3) would be mathematically most difficult to handle, in that the parameters on the right side of this equation are subject to sets of side conditions, typical among them being

(4.2.4)  p_ijo = Σ_k p_ijo p_iok p_ojk / (p_ioo p_ojo p_ook),

and other such sets obtained by permuting the subscripts. In fact, (4.2.3) was tried, and was found to be intractable.

Physically a less natural and more abstract, but mathematically a much easier, set of conditions seems to be

(4.2.5)  p_ijk = q_ijo q_iok q_ojk / (q_ioo q_ojo q_ook)  (i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t),

where we do not assume that q_ijo = p_ijo, etc., nor even that q_ioo = Σ_j q_ijo, etc. Equation (4.2.5), after elimination of the q's, leads to a number of constraints on the p's themselves, and it is easier to try to estimate the p's subject to these constraints and to Σ_{i,j,k} p_ijk = 1, rather than to try to estimate the q's. The only role of the q's and of the hypothesis (4.2.5) is one of yielding certain constraints on the p's themselves. It will be shown in sections 8 and 9 that (4.2.5) is equivalent to just (r - 1)(s - 1)(t - 1) constraints on the p_ijk's, which, together with Σ_{i,j,k} p_ijk = 1, make just (r - 1)(s - 1)(t - 1) + 1 constraints. Notice that in this case we do not have constraints like (4.2.4) which, in practice, turn out to be quite awkward. Now we lay down the rule, which is physically rather abstract but mathematically quite straightforward, that if (4.2.2) is true, that is, if the hypothesis (4.2.2) is tested and accepted, we shall substitute in (4.2.5) p_ijo, p_iok, etc. for q_ijo, q_iok, etc., and p_ioo for q_ioo and so on, and superimpose (4.2.2), and end up with (4.2.1). Notice that if in (4.2.5) we were to replace (i,j,k) by (x,y,z), then (4.2.5) would be found to imply

(4.2.6)  f(x,y,z) = f_1(x,y) f_2(x,z) f_3(y,z) / [F_1(x) F_2(y) F_3(z)],

with nothing else connecting f_1, f_2, f_3, F_1, F_2, F_3 among themselves or with f.

We shall now consider Cases I and II, each in relation to (4.2.1) and (4.2.5).
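The elimination of the q's can also be checked numerically. The sketch below (with arbitrary made-up positive q's) builds the p_ijk of a 2 x 2 x 2 table according to (4.2.5) and verifies that the cross-ratio p_11k p_22k / (p_21k p_12k) comes out the same for both levels of k, which is exactly the single "no interaction" constraint (8.2) obtained in section 8.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up positive q's for a 2 x 2 x 2 table; the names follow the
# paper's subscripts (q_ijo, q_iok, q_ojk, q_ioo, q_ojo, q_ook).
q_ijo = rng.uniform(0.5, 2.0, (2, 2))
q_iok = rng.uniform(0.5, 2.0, (2, 2))
q_ojk = rng.uniform(0.5, 2.0, (2, 2))
q_ioo = rng.uniform(0.5, 2.0, 2)
q_ojo = rng.uniform(0.5, 2.0, 2)
q_ook = rng.uniform(0.5, 2.0, 2)

# Build p_ijk according to (4.2.5).
p = np.empty((2, 2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            p[i, j, k] = (q_ijo[i, j] * q_iok[i, k] * q_ojk[j, k]
                          / (q_ioo[i] * q_ojo[j] * q_ook[k]))

def cross_ratio(p, k):
    """The odds ratio of the 2 x 2 layer at level k."""
    return p[0, 0, k] * p[1, 1, k] / (p[1, 0, k] * p[0, 1, k])

r0, r1 = cross_ratio(p, 0), cross_ratio(p, 1)
```

The q's attached to the single indices and to (i,k) and (j,k) cancel out of the cross-ratio, leaving a value that depends only on the q_ijo's and hence not on k.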
4.3 Case I. "i", "j", and "k" are all "variates". In this case, (4.2.1) is the natural analogue of the hypothesis of "no multiple correlation" between "(i,j)" and "k". We have to estimate the nuisance parameters p_ijo's and p_ook's subject to Σ_{i,j} p_ijo = Σ_k p_ook = 1, which leaves us with (rs - 1) + (t - 1) free parameters to be estimated. There is, of course, the linear constraint on the n's, namely Σ_{i,j,k} n_ijk = n (fixed).

Turning now to (4.2.5) as applied to Case I, we notice that this does not have any analogue in multivariate analysis based on the normal population. We shall find, in the next subsection on Case II, where "i" and "j" are "ways of classification" and "k" is a "variate", that (4.2.5) is really the hypothesis of "no interaction". For Case I, we can thus regard (4.2.5) as a contribution to multivariate analysis made by analysis of variance. From the remarks on the constraints on the p_ijk's following from (4.2.5), we note that the number of free parameters to be estimated is rst - [(r - 1)(s - 1)(t - 1) + 1], and we have the usual linear constraint on the n's, namely Σ_{i,j,k} n_ijk = n (fixed).
4.4 Case II. "i" and "j" are "ways of classification" and "k" is a "variate". In this case the n_ijo's are fixed by the conditions of the experiment, which means that p_ijo = n_ijo / n, and thus the p_ijo's do not have to be estimated from the data. (4.2.1), in this case, will thus be the hypothesis of equal (population) frequency distributions, in terms of the "variate" "k", over the r x s categories. To test (4.2.1), there are the p_ook's to be estimated subject to Σ_k p_ook = 1, which means that we have t - 1 free parameters to estimate. We have also r x s linear constraints on the n's, by virtue of the n_ijo's being given.

Turning now to (4.2.5) as applied to Case II, we note that if

(4.4.1)  p_iok = p_ioo p_ook  and  p_ojk = p_ojo p_ook

are tested and accepted, that is, if in the (ij) classification the marginal i's (i = 1, 2, ..., r) have the same frequency distribution in terms of "k", and so also the marginal j's (j = 1, 2, ..., s), then, substituting p_ijo's, etc. for q_ijo's, etc. in (4.2.5) and then superimposing (4.4.1) on (4.2.5), we have

(4.4.2)  p_ijk = p_ijo p_ook  (over all i, j, k).

This means that in every (ij)-cell there is the same frequency distribution in terms of "k".
Going back to the usual model of analysis of variance for a two-way classification, and assuming, for simplicity, equal frequencies, say u, over the different (ij) cells, we recall that if x_ijλ is the λ-th observation in the (ij)-th cell (λ = 1, 2, ..., u), we assume that x_ijλ is N[E(x_ijλ), σ^2], and

(4.4.3)  E(x_ijλ) = μ + μ_io + μ_oj + μ_ij,

where Σ_i μ_io = Σ_j μ_oj = Σ_i μ_ij = Σ_j μ_ij = 0. The condition of "no interaction" is usually expressed as

(4.4.4)  μ_ij = 0  (i = 1, 2, ..., r; j = 1, 2, ..., s),

and the condition for "no main i-effect" as

(4.4.5)  μ_io = 0  (i = 1, 2, ..., r),

and that for "no main j-effect" as

(4.4.6)  μ_oj = 0  (j = 1, 2, ..., s).

If all these hold, then we have

(4.4.7)  E(x_ijλ) = μ,

for all i, j, and λ, which means that every (ij) cell has the same frequency distribution in terms of the variate x, the distribution in this case being normal. It is easy to see that (4.4.2) is a proper generalization of (4.4.7). It is now easy to check that none of the three conditions (4.4.4), (4.4.5), and (4.4.6) implies the other two, and no two of these imply the third, and also that none or no two of them, separately, will imply (4.4.7). All of these have to be true in order to lead to (4.4.7). Assuming now any "k" interval to be the interval between x and x + dx, and remembering that p_ijo = u/n = u/urs = 1/rs, we have in general

(4.4.8)  p_ijk = (1/rs) (1/(σ√(2π))) exp{-[x - E(x)]^2 / (2σ^2)} dx,

where E(x) is given by the right side of (4.4.3). If now (4.4.4) and (4.4.5) hold, but not necessarily (4.4.6), then, substituting from (4.4.4) and (4.4.5) in (4.4.3), and summing the two sides of (4.4.8) over j, we have

p_iok = (1/rs) Σ_j (1/(σ√(2π))) exp{-[x - μ - μ_oj]^2 / (2σ^2)} dx.

Notice that this is independent of "i", which means that in every "i"-cell there is a distribution in terms of "k" or x which is the same for all "i". It is obvious that there is a similar result for "j" after summing over "i". Notice now that (4.4.1) is a generalization of these. Thus we can regard (4.4.1) as one analogue of "no main effects" and (4.2.5) as one analogue of "no interaction". The reader will easily perceive that this generalization does not retain all the detailed features of analysis of variance as we know it in the case of the highly structured normal populations. But we believe that some important features are retained.
5. Large Sample Tests in terms of X^2. [4, 5]  It is well known that if (i) there is a total of n observations distributed over s cells such that the number of observations in the j-th cell is n_j (j = 1, 2, ..., s), and if (ii) the n_j's are subject to the linear constraints

(5.1)  Σ_j a_ij n_j = c_i  (i = 1, 2, ..., r < s),

where A(r x s) is of rank r, and if (iii) the probability p_j (j = 1, 2, ..., s) of an observation in the j-th cell be of the form p_j(Q_1, Q_2, ..., Q_t), where r + t < s, and if (iv) the Q_k's are estimated by the modified minimum X^2 method (which has been shown to be the same as that of maximum likelihood), and if (v) this leads to a unique solution in the p_j's, to be called, say, the p̂_j's, then for reasonably large n_j's,

Σ_{j=1}^{s} (n_j - n p̂_j)^2 / (n p̂_j)

is approximately distributed as a X^2 with degrees of freedom equal to s - r - t. That is to say, the degrees of freedom of X^2 are equal to the number of cells, minus the number of independent linear constraints on the n_j's, minus the number of free parameters to be estimated from the data. Notice that (5.1) includes the condition Σ_j n_j = n. Notice also that every linear constraint in (5.1) will imply a similar constraint on the p_j's (although the reverse is not true), and will thus reduce the number of free parameters Q_k. Notice further that customarily the role of any hypothesis (that we test by X^2) is to give a structure of the p_j's in terms of the Q_k's, which then have to be estimated from the data.
6.
Applications of the X -test to Section 3.
Starting from (3.1) and taking
int0 account the remarks of section 2 , we write the likelihood function as
(6.1)
Now let us consider the three cases separately.
c~~~.
We must estimate both the PiO' s and the Poj ' s, subject to
rntroducing the usual Lagrangian multipliers A and
I-l,
~
Pio =
~
PoJ = 1.
and taking the logarithm of the
right side of (6.1), we have as the maximum likelihood equations for the Pio's and
the POj's
13
e
(6.2)
(J
= 1,
2, ••• , 8).
Multiplying both sides of (6.2) by Pio and summing over i, and using the side condition that ~ PiO
= 1,
and also multiplying both sides of (6.3) by Poj and summing over
j, and using the side condition E P J
J
(6.4)
A
= 1,
it is
~ediately
seen that
0
=
~
= - nj
so that we have the maximum likelihood estimates of Pio and P given by
oJ
and
Sp~etituting
2
2
in the usual expression for X , we have the modified X given by
(6Ji)
~e~al1ing
section 5 and the observations under case I of section 3, we note that
2
(6.h) has a X -distribution with d.f.
rs - 1 - (r + s - 2)
=
(r - 1)(8 - 1).
Here Pio = nio/n, and the PoJ'S alone have to be estimated under the side
co~cltion E P J = 1.
Proceeding as in the previous case, we observe that here we
J 0
Ca~~:~.
shall have only the equation (6.3), and we end up with
and
e
~oti.ce
p.
-~o
the difference between (6.5) and (6.7).
In (6.5) there are carats on both
and p0 J' while in (6.7) the carat appears only on P0 J.
2
IGual expression for X , we get
Now substituting in the
14
2
Recalling the observations under Case II of section 3,we have a X with d.f.
rs - r - (8 - 1)
Case III.
(r - 1)(8 - 1) •
c
Here we have
(6.10)
and
Substituting in the usual expression for
x2 ,
=
we get
,
(6.11)
and recalling the observations made under case III of section 3, we have X2 with d.f.
rs - (r + s - 1)
(r - 1) (8 - 1).
=
The familiar text book example of "vacc inated" against "not vaccinated" one way,
anl'l. "attacked" against "not attacked" the other way is really an example of Case II;
it is neither Case I nor Case III.
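A small numerical sketch (the table is made up) of the fact just established: the statistic in (6.6), (6.8), and (6.11) is the same expression in all three cases, since the estimated and the prescribed marginal proportions coincide numerically, and the three d.f. computations all reduce to (r - 1)(s - 1).

```python
import numpy as np

# A made-up 2 x 3 table; the expected frequency n * (n_io/n) * (n_oj/n)
# = n_io * n_oj / n is the same number whether the margins are estimated
# (Cases I, II) or prescribed (Case III).
n_ij = np.array([[20., 30., 10.],
                 [15., 25., 20.]])
n = n_ij.sum()
expected = np.outer(n_ij.sum(axis=1), n_ij.sum(axis=0)) / n
x2 = ((n_ij - expected) ** 2 / expected).sum()

r, s = n_ij.shape
df_case1 = r * s - 1 - (r + s - 2)   # both sets of margins estimated
df_case2 = r * s - r - (s - 1)       # row margins fixed, columns estimated
df_case3 = r * s - (r + s - 1)       # both sets of margins fixed
```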
7. Application of the X^2-test to section 4. In this section we shall use the X^2 to test all the different hypotheses in section 4 except the hypothesis (4.2.5), which, as we have already observed in section 4, plays the role of "no interaction" when "i" and "j" are "ways of classification" and "k" is a "variate", and a certain role, which has no analogue in multivariate analysis, when all three are "variates". The hypothesis (4.2.5) will be considered in detail in section 8 for the special case of r = s = t = 2, and in section 9, in less detail, for the general r x s x t table.

Conditional Independence. Starting from (4.1.1), and taking into account the remarks in section 2, we can write down the likelihood function as

(7.1)  φ ∝ Π_{i,j,k} p_ijk^{n_ijk},  that is,

(7.2)  φ ∝ Π_{i,k} p_iok^{n_iok} Π_{j,k} p_ojk^{n_ojk} Π_k p_ook^{-n_ook}.

Case I. "i", "j", and "k" all are "variates". Notice that we have to estimate the p_iok's, p_ojk's, and p_ook's subject to Σ_i p_iok = Σ_j p_ojk = p_ook (k = 1, 2, ..., t) and Σ_k p_ook = 1. Now using the same method as in section 6, and calling the associated Lagrangian multipliers λ_k, μ_k (k = 1, 2, ..., t), and ν, we have the maximum likelihood equations

(7.3)  n_iok / p_iok + λ_k = 0;  n_ojk / p_ojk + μ_k = 0;  -n_ook / p_ook - λ_k - μ_k + ν = 0,

with i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t. Multiply the first equation of (7.3) by p_iok and sum over i, using the side condition Σ_i p_iok = p_ook; multiply the second equation of (7.3) by p_ojk and sum over j, using the side condition Σ_j p_ojk = p_ook. Now using the third equation of (7.3) we have

(7.4)  λ_k = μ_k = -n_ook / p_ook.

Multiplying the third equation by p_ook and summing over k, using the side condition Σ_k p_ook = 1, we have

(7.5)  ν = -n.

Hence we have the following maximum likelihood estimates:

(7.6)  p̂_iok = n_iok / n,  p̂_ojk = n_ojk / n,  p̂_ook = n_ook / n.

Substituting in the usual expression for X^2, we get

(7.7)  X^2 = Σ_{i,j,k} (n_ijk - n p̂_iok p̂_ojk / p̂_ook)^2 / (n p̂_iok p̂_ojk / p̂_ook).

Recalling the observations under Case I of sub-section 4.1, we have a X^2 with d.f. rst - 1 - [t(r + s - 1) - 1] = t(r - 1)(s - 1).

The question as to how far it is meaningful to investigate this "conditional independence" for the other cases, namely, when not all are "variates", is now under examination. However, there are additional mathematical difficulties in these situations.
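The test (7.7) can be sketched numerically on a made-up 2 x 2 x 2 table: within each level of k the expected frequency is n_iok n_ojk / n_ook, and the d.f. work out to t(r - 1)(s - 1).

```python
import numpy as np

# A made-up 2 x 2 x 2 table tested for conditional independence (7.7):
# within each level k the expected frequency is n_iok * n_ojk / n_ook.
n = np.array([[[10., 14.], [12., 6.]],
              [[8., 10.], [20., 12.]]])
n_iok = n.sum(axis=1)        # r x t
n_ojk = n.sum(axis=0)        # s x t
n_ook = n.sum(axis=(0, 1))   # t

expected = np.einsum('ik,jk->ijk', n_iok, n_ojk) / n_ook
x2 = ((n - expected) ** 2 / expected).sum()

r, s, t = n.shape
df = r * s * t - 1 - (t * (r + s - 1) - 1)   # = t(r - 1)(s - 1)
```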
Complete Independence. Starting from (4.1.5) and recalling section 2, we write the likelihood function as

(7.8)  φ ∝ Π_{i,j,k} p_ijk^{n_ijk},  that is,  φ ∝ Π_i p_ioo^{n_ioo} Π_j p_ojo^{n_ojo} Π_k p_ook^{n_ook}.

Case I. "i", "j", and "k" all are variates. Here we have to estimate the p_ioo's, p_ojo's, and p_ook's under the side conditions Σ_i p_ioo = Σ_j p_ojo = Σ_k p_ook = 1. Introducing the usual Lagrangian multipliers, we have as our maximum likelihood equations

(7.9)  n_ioo / p_ioo + λ = 0,  n_ojo / p_ojo + μ = 0,  n_ook / p_ook + ν = 0,

with i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t. Solving, we have

(7.10)  λ = μ = ν = -n,

and thus

(7.11)  p̂_ioo = n_ioo / n,  p̂_ojo = n_ojo / n,  p̂_ook = n_ook / n.

Substituting this in the usual X^2 expression we have

(7.12)  X^2 = Σ_{i,j,k} (n_ijk - n p̂_ioo p̂_ojo p̂_ook)^2 / (n p̂_ioo p̂_ojo p̂_ook),

and recalling the observations under Case I of sub-section 4.1 we see that our X^2 has d.f. rst - 1 - (r + s + t - 3) = rst - r - s - t + 2.

In this case, as in section 6, it can be shown that we should have the same X^2 with the same d.f. for (i) any two of "i", "j", and "k" as "variates", say "i" and "j", and "k" as a "way of classification", (ii) any one of "i", "j", and "k", say "i", as a "variate" and "j" and "k" as "ways of classification", or (iii) all three as "ways of classification". In case (i) we should have p̂_ioo = n_ioo / n, p̂_ojo = n_ojo / n, and p_ook = n_ook / n; in case (ii) p̂_ioo = n_ioo / n, p_ojo = n_ojo / n, and p_ook = n_ook / n; in case (iii) p_ioo = n_ioo / n, p_ojo = n_ojo / n, and p_ook = n_ook / n.
Multiple independence between "(i,j)" and "k". Starting from (4.2.1) we write down the likelihood function as

(7.13)  φ ∝ Π_{i,j,k} (p_ijo p_ook)^{n_ijk} = Π_{i,j} p_ijo^{n_ijo} Π_k p_ook^{n_ook}.

Case I. "i", "j", and "k" all are "variates". We have to estimate the p_ijo's and the p_ook's subject to the constraints Σ_{i,j} p_ijo = Σ_k p_ook = 1. Introducing the usual Lagrangian multipliers, we have as our maximum likelihood equations

(7.14)  n_ijo / p_ijo + λ = 0  (i = 1, 2, ..., r; j = 1, 2, ..., s)

and

(7.15)  n_ook / p_ook + μ = 0  (k = 1, 2, ..., t).

Solving, we have p̂_ijo = n_ijo / n and p̂_ook = n_ook / n, and substituting in the usual X^2 expression we get

(7.16)  X^2 = Σ_{i,j,k} (n_ijk - n p̂_ijo p̂_ook)^2 / (n p̂_ijo p̂_ook),

which, recalling the remarks under Case I of subsection 4.3, is a X^2 with d.f. rst - 1 - [(rs - 1) + (t - 1)] = (rs - 1)(t - 1). We shall end up with the same X^2 with the same d.f. in the cases where (i) "i" and "j" are "variates" and "k" is a "way of classification", or vice versa, or (ii) "i", "j", and "k" are all "ways of classification". The obvious modifications in cases (i) and (ii) lie in the fact that the appropriate p's will have carats over them and the rest will not.
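Similarly, the multiple-independence test (7.16) can be sketched on a made-up 2 x 2 x 2 table: the expected frequencies are n_ijo n_ook / n, and the d.f. are (rs - 1)(t - 1).

```python
import numpy as np

# A made-up 2 x 2 x 2 table tested for multiple independence (7.16):
# the expected frequency in cell (i, j, k) is n_ijo * n_ook / n.
n = np.array([[[11., 9.], [7., 13.]],
              [[10., 10.], [14., 6.]]])
n_ijo = n.sum(axis=2)          # r x s
n_ook = n.sum(axis=(0, 1))     # t
total = n.sum()

expected = n_ijo[:, :, None] * n_ook[None, None, :] / total
x2 = ((n - expected) ** 2 / expected).sum()

r, s, t = n.shape
df = r * s * t - 1 - ((r * s - 1) + (t - 1))   # = (rs - 1)(t - 1)
```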
8. "No interaction" in a 2 x 2 x 2 table. Consider in this case the hypothesis (4.2.5), and write it out in full as follows:

(8.1)  H_0:
p_111 = q_110 q_101 q_011 / (q_100 q_010 q_001),   p_112 = q_110 q_102 q_012 / (q_100 q_010 q_002),
p_211 = q_210 q_201 q_011 / (q_200 q_010 q_001),   p_212 = q_210 q_202 q_012 / (q_200 q_010 q_002),
p_121 = q_120 q_101 q_021 / (q_100 q_020 q_001),   p_122 = q_120 q_102 q_022 / (q_100 q_020 q_002),
p_221 = q_220 q_201 q_021 / (q_200 q_020 q_001),   p_222 = q_220 q_202 q_022 / (q_200 q_020 q_002).

It is easy to check that by eliminating the q's, we have, what we will call, the "no interaction" constraints, which, in this case, represent just one relation among the p's, namely

(8.2)  p_111 p_221 / (p_211 p_121) = p_112 p_222 / (p_212 p_122).

There is, of course, the other side condition on the p's:

(8.3)  Σ_{i,j,k} p_ijk = 1.

Recalling again section 2, the likelihood function can be written as

(8.4)  φ ∝ Π_{i,j,k} p_ijk^{n_ijk}.
Now consider

Case I. "i", "j", and "k" all are "variates". The problem is to estimate the p_ijk's by maximizing φ subject to the constraints (8.2) and (8.3). Introducing the usual Lagrangian multipliers on (8.2) and (8.3) we have the maximum likelihood equations

(8.5)  n_ijk / p_ijk + λ / p_ijk + μ = 0  (ijk = 111, 221, 212, 122),
       n_ijk / p_ijk - λ / p_ijk + μ = 0  (ijk = 112, 222, 211, 121).

Now multiplying by p_ijk, summing over i, j, k, and using (8.3), we have μ = -n, and

(8.6)  p̂_ijk = (n_ijk + λ) / n  (ijk = 111, 221, 212, 122),
       p̂_ijk = (n_ijk - λ) / n  (ijk = 112, 222, 211, 121).

Substituting in (8.2), we have for λ the cubic equation

(8.7)  (n_111 + λ)(n_221 + λ) / [(n_211 - λ)(n_121 - λ)] = (n_112 - λ)(n_222 - λ) / [(n_212 + λ)(n_122 + λ)].

Solving for λ and substituting in (8.6), we have the estimated p̂_ijk's occurring in the usual X^2. Since

(8.8)  n_ijk - n p̂_ijk = -λ  (ijk = 111, 221, 212, 122),
       n_ijk - n p̂_ijk = +λ  (ijk = 112, 222, 211, 121),

the final X^2 is given by

(8.9)  X^2 = λ^2 Σ_{i,j,k} 1 / (n p̂_ijk).

This will be a X^2 with d.f. = the total number of cells (8 here) - [the apparent number of parameters (8 here) - the number of "no interaction" constraints (1 here) - the number of linear relations on the p's coming from the linear constraints on the n's (1 here)] - [the number of linear relations on the n's (1 here)] = the number of "no interaction" constraints = 1, in this case. It is easy to see from this that, in all cases, no matter whether "i", "j", and "k" are all "variates", or some are "variates" and some "ways of classification", or all are "ways of classification", we are going to end up with a X^2 with d.f. exactly equal to the number of "no interaction" constraints like (8.2). Notice that in (8.5) the Lagrangian μ goes with the constraint Σ_{i,j,k} p_ijk = 1, which stems from Σ_{i,j,k} n_ijk = n, and the Lagrangian λ goes with the "no interaction" constraints (8.2).
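The computation of Case I can be sketched numerically; the counts below are made up, and we locate the admissible root of the cubic (8.7) with a numerical root-finder rather than in closed form.

```python
import numpy as np

# Made-up cell counts for a 2 x 2 x 2 table, keyed by (i, j, k).
n = {(1, 1, 1): 20, (2, 2, 1): 14, (2, 1, 2): 18, (1, 2, 2): 12,  # "+lambda" cells
     (1, 1, 2): 16, (2, 2, 2): 10, (2, 1, 1): 22, (1, 2, 1): 8}   # "-lambda" cells
plus = [(1, 1, 1), (2, 2, 1), (2, 1, 2), (1, 2, 2)]
minus = [(1, 1, 2), (2, 2, 2), (2, 1, 1), (1, 2, 1)]
total = sum(n.values())

# (8.7) cross-multiplied: the product over "+" cells of (n_ijk + lambda)
# equals the product over "-" cells of (n_ijk - lambda); the lambda^4
# terms cancel, leaving a cubic.
lhs = np.poly1d([1.0])
rhs = np.poly1d([1.0])
for c in plus:
    lhs *= np.poly1d([1.0, n[c]])     # (n_ijk + lambda)
for c in minus:
    rhs *= np.poly1d([-1.0, n[c]])    # (n_ijk - lambda)
cubic = lhs - rhs

# Take the real root that keeps every estimated cell probability positive.
lam = next(root.real for root in cubic.roots
           if abs(root.imag) < 1e-9
           and all(n[c] + root.real > 0 for c in plus)
           and all(n[c] - root.real > 0 for c in minus))

# (8.6) and (8.8): the estimates and the final X^2.
p_hat = {c: (n[c] + lam) / total for c in plus}
p_hat.update({c: (n[c] - lam) / total for c in minus})
x2 = lam ** 2 * sum(1.0 / (total * p) for p in p_hat.values())
```

By construction the p̂'s sum to one, since the +λ and -λ corrections cancel over the eight cells.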
Case II. "i" and "j" are "ways of classification" and "k" is a "variate". This is the case of a two-way analysis of variance, and it is clear from the remarks after (8.9) that we shall end up with a X^2 with the same d.f. = 1. Here

(8.10)  p_ijo = n_ijo / n  (fixed),  with i, j = 1, 2.

The maximum likelihood equations for the p's subject to (8.2) and (8.10) will now be

(8.11)  n_ijk / p_ijk + λ / p_ijk + μ_ij = 0  (ijk = 111, 221, 212, 122),
        n_ijk / p_ijk - λ / p_ijk + μ_ij = 0  (ijk = 112, 222, 211, 121).

Consider any (ij). Take, say, (11), and notice that +λ goes with k = 1 and -λ with k = 2, so that multiplying by p_ijk and summing over k = 1 and 2, we have

(8.12)  n_ijo + μ_ij p_ijo = 0,  or  μ_ij = -n,  using (8.10).

Hence, substituting from (8.11) in (8.2), we have the same equation in λ, and finally the same p̂'s, and thus the same X^2 as in the previous Case I.
21
e
"i" 1s a "way of classification" land "j" and "k" are "variates". Again
Case III.
2
from the remarks after (8.9) we observe that we will get a X with the same d.f. ; 1.
Here
(8.13)
= nioo /n
P100
(fixed),
with i == 1, 2.
The maximum likelihood equations for the p's, subject to (8.2) and (8.13), will now be

(8.14)  n_{ijk}/p_{ijk} + λ/p_{ijk} + μ_i = 0  (ijk = 111, 221, 212, 122),
        n_{ijk}/p_{ijk} - λ/p_{ijk} + μ_i = 0  (ijk = 112, 222, 211, 121).
Notice that for a given i, say 1, we have λ with jk = 11, 22, and -λ with jk = 12, 21, so that, if we multiply by p_{ijk} and sum over jk, we will have

(8.15)  n_{ioo} + μ_i p_{ioo} = 0, or μ_i = -n, using (8.13).

Hence, substituting from (8.14) in (8.2), we have the same equations in λ, and finally the same p̂'s, and thus the same χ² as in the previous Cases I and II.
Case IV. "i", "j", and "k" all are "ways of classification". We shall get a χ² with the same d.f. = 1. Here

(8.16)  p_{ijo} = n_{ijo}/n, p_{iok} = n_{iok}/n, p_{ojk} = n_{ojk}/n (fixed), with i, j, k = 1, 2.

Notice that the relations in (8.16) are not independent. In fact, from one angle it will be seen that, if we put p_{111} = x (say), then all the other p's will be completely given in terms of x and the fixed marginals of (8.16). Then, substituting in (8.2), we can find x. From another angle (which should, of course, finally give the same result), we notice, by putting n_{111} - np_{111} = -x and using (8.16), that
(8.17)  n_{ijk} - np_{ijk} = -x  for (ijk = 111, 221, 212, 122),
        n_{ijk} - np_{ijk} = +x  for (ijk = 112, 222, 211, 121).

This means that x is exactly the λ of (8.5) or (8.11) or (8.14), so that we have the same equations in x as we had in λ in the previous cases. Hence we have the same expressions for p̂_{ijk} in terms of n_{ijk} as in the previous cases, and hence, finally, the same χ² in terms of the n_{ijk}'s as in the previous cases.
9. "No interactions" in an r x s x t table. Let us consider here the hypothesis of "no interaction", and try to eliminate the q's. To fix our ideas, consider first the case of a 2 x 2 x t table. Looking into the mechanics by which (8.2) is obtained from (8.1), it is easy to see that, corresponding to (8.2), we are going to have

(9.1)  (p_{11t} p_{22t})/(p_{21t} p_{12t}) = (p_{11,t-1} p_{22,t-1})/(p_{21,t-1} p_{12,t-1}) = (p_{11,t-2} p_{22,t-2})/(p_{21,t-2} p_{12,t-2}) = ... = (p_{111} p_{221})/(p_{211} p_{121}).
For a general r x s x t table we can figure out that we are going to have the following "no interaction" constraints:

(9.2)  (p_{rst} p_{ijt})/(p_{ist} p_{rjt}) = (p_{rsk} p_{ijk})/(p_{isk} p_{rjk}),

for i = 1, 2, ..., (r-1); j = 1, 2, ..., (s-1); k = 1, 2, ..., (t-1). This gives us (r-1)(s-1)(t-1) constraints on the p_{ijk}'s. Checking the mechanics of the derivation of (8.2) from (8.1), it will be seen that (9.2) yields a set of independent and exhaustive relations among the p's by eliminating the q's from (4.2.5). Here p_{rst} is, as it were, a pivotal element, and r, s, and t the pivotal subscripts. We can make any other three subscripts the pivotal ones, and thus obtain another set of independent and exhaustive relations like (9.2) which would be exactly equivalent to (9.2), and so on.
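The count of (r-1)(s-1)(t-1) constraints in (9.2) is easy to verify mechanically by enumerating the index triples. A small sketch (the function name and the tuple representation of each cross ratio are ours):

```python
from itertools import product

def no_interaction_constraints(r, s, t):
    """Enumerate the 'no interaction' constraints (9.2) of an r x s x t
    table, taking (r, s, t) as the pivotal subscripts.  Each constraint
        (p_rst p_ijt) / (p_ist p_rjt) = (p_rsk p_ijk) / (p_isk p_rjk)
    is represented as a pair of index quadruples
    (numerator-1, numerator-2, denominator-1, denominator-2)."""
    constraints = []
    for i, j, k in product(range(1, r), range(1, s), range(1, t)):
        left = ((r, s, t), (i, j, t), (i, s, t), (r, j, t))
        right = ((r, s, k), (i, j, k), (i, s, k), (r, j, k))
        constraints.append((left, right))
    return constraints
```

`len(no_interaction_constraints(r, s, t))` is (r-1)(s-1)(t-1), and for r = s = t = 2 the single constraint it lists is (8.2) in cross-ratio form.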
Our likelihood function is, up to a constant factor,

(9.3)  φ = ∏_{i,j,k} p_{ijk}^{n_{ijk}}.

Looking into the mechanics by which, in the case of a 2 x 2 x 2 table, the four cases, namely (i) "i", "j", and "k" all being "variates", (ii) any one being a "variate" and the other two "ways of classification", (iii) any two being "variates" and the remaining one a "way of classification", and (iv) all being "ways of classification", were shown to be mathematically equivalent (in terms of testing by χ²), we can verify that this will hold for the general r x s x t table also. In all the cases we have the same form of χ², distributed with d.f. (r-1)(s-1)(t-1). We discuss, therefore, only
e
Case I.
All are "variates".
Here we have to maximize (9.3)
subject to the "no
interaction" constraints (9.2), and the further constraint
Introducing for (9.1) the Lagrangian multipliers Aijk L-i
j
~,
= 1,
and
2, ••• , (a-l); k
max~lzing
= 1,
2,
.00,
= 1,
2, ••• , (r-l);
(t-l)_7, and for (9.4) the Lagrangian multiplier
A
(9.3), we have for Pijk the typical equations
/..-Il
(9.5)

n_{rst}/p_{rst} + (1/p_{rst}) Σ_{i=1}^{r-1} Σ_{j=1}^{s-1} Σ_{k=1}^{t-1} λ_{ijk} + μ = 0,

n_{ist}/p_{ist} - (1/p_{ist}) Σ_{j=1}^{s-1} Σ_{k=1}^{t-1} λ_{ijk} + μ = 0,

n_{rjt}/p_{rjt} - (1/p_{rjt}) Σ_{i=1}^{r-1} Σ_{k=1}^{t-1} λ_{ijk} + μ = 0,

n_{rsk}/p_{rsk} - (1/p_{rsk}) Σ_{i=1}^{r-1} Σ_{j=1}^{s-1} λ_{ijk} + μ = 0,

n_{ijt}/p_{ijt} + (1/p_{ijt}) Σ_{k=1}^{t-1} λ_{ijk} + μ = 0,

n_{isk}/p_{isk} + (1/p_{isk}) Σ_{j=1}^{s-1} λ_{ijk} + μ = 0,

n_{rjk}/p_{rjk} + (1/p_{rjk}) Σ_{i=1}^{r-1} λ_{ijk} + μ = 0,

n_{ijk}/p_{ijk} - λ_{ijk}/p_{ijk} + μ = 0,

with, of course, i = 1, 2, ..., (r-1); j = 1, 2, ..., (s-1); k = 1, 2, ..., (t-1).
Notice that with the pivotal subscripts (rst) goes a triple summation over the λ's and a positive sign before that expression; with just one subscript changed goes a double summation over the λ's and a negative sign before that expression; with two of the subscripts changed goes a single summation over the λ's and a positive sign before that expression; and finally, with all the subscripts changed, we have a single λ_{ijk} with a negative sign before it.
As in the case of the 2 x 2 x 2, it is easy to see, by multiplying both sides of (9.5) by p_{ijk} and summing over i, j, k, that μ = -n. Thus, solving for the p_{ijk}'s in terms of the n_{ijk}'s and λ_{ijk}'s, and substituting in the "no interaction" constraints (9.2), we have for the λ_{ijk}'s the following equations [for i = 1, 2, ..., (r-1); j = 1, 2, ..., (s-1); k = 1, 2, ..., (t-1)]:

(9.10)  [(n_{rst} + λ̃_{rst})(n_{ijt} + λ̃_{ijt})] / [(n_{ist} - λ̃_{ist})(n_{rjt} - λ̃_{rjt})] = [(n_{rsk} - λ̃_{rsk})(n_{ijk} - λ̃_{ijk})] / [(n_{isk} + λ̃_{isk})(n_{rjk} + λ̃_{rjk})],

where λ̃_{rst} stands for the triple summation expression in (9.5); λ̃_{ist}, λ̃_{rjt}, λ̃_{rsk} for the double summation expressions in (9.5); λ̃_{ijt}, λ̃_{isk}, λ̃_{rjk} for the single summation expressions in (9.5); and λ̃_{ijk} is simply λ_{ijk}. Solving equations (9.10) for the λ̃_{ijk}'s, and ultimately for the λ_{ijk}'s, in terms of the n_{ijk}'s, we can find the p̂_{ijk}'s. Substituting these values in the usual expression for χ², we have
χ² = Σ_{i,j,k} (n_{ijk} - n p̂_{ijk})² / (n p̂_{ijk}) = Σ_{i,j,k} λ̃_{ijk}² / (n_{ijk} + η_{ijk} λ̃_{ijk}),

where i = 1, 2, ..., r; j = 1, 2, ..., s; k = 1, 2, ..., t; and where

η_{ijk} = +1 if ijk = rst (the pivotal subscripts);
η_{ijk} = -1 if any one subscript differs from the corresponding pivotal subscript;
η_{ijk} = +1 if any two subscripts differ from the corresponding pivotal subscripts;
η_{ijk} = -1 if all subscripts differ from the corresponding pivotal subscripts.
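This sign rule is simply a parity rule: η_{ijk} is +1 or -1 according as an even or an odd number of the subscripts differs from the pivotal ones. A one-function sketch (the helper name is ours):

```python
def eta(cell, pivot):
    """Sign eta_ijk of section 9: +1 when an even number (0 or 2) of the
    subscripts differs from the pivotal subscripts (r, s, t), -1 when an
    odd number (1 or 3) differs."""
    mismatches = sum(a != b for a, b in zip(cell, pivot))
    return -1 if mismatches % 2 else 1
```

With pivot (2, 2, 2), for instance, η is +1 on cells 222, 112, 121, 211 and -1 on the other four, which is the sign pattern of (9.5) specialized to the 2 x 2 x 2 case.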
The method of solving equations (9.10) for the λ̃_{ijk}'s, and finally for the λ_{ijk}'s, in terms of the n_{ijk}'s on modern high speed computers will be discussed in a later paper.
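For a modern reader it may help to note (this is our substitution, not the authors' deferred procedure) that the hypothesis (9.2) is what is now called the log-linear model of no three-factor interaction, and that its maximum likelihood fit can be obtained by iterative proportional fitting of the three two-way margins of the observed table. A sketch, assuming strictly positive cell counts:

```python
def fit_no_interaction(n, cycles=200):
    """Expected counts m_ijk under the 'no interaction' hypothesis (9.2)
    for an r x s x t table n of positive counts (nested lists), via
    iterative proportional fitting: cycle through the three two-way
    margins of n, rescaling m to match each in turn."""
    r, s, t = len(n), len(n[0]), len(n[0][0])
    m = [[[1.0] * t for _ in range(s)] for _ in range(r)]
    for _ in range(cycles):
        for i in range(r):              # match the (i, j) margins
            for j in range(s):
                ratio = sum(n[i][j]) / sum(m[i][j])
                for k in range(t):
                    m[i][j][k] *= ratio
        for i in range(r):              # match the (i, k) margins
            for k in range(t):
                ratio = (sum(n[i][j][k] for j in range(s))
                         / sum(m[i][j][k] for j in range(s)))
                for j in range(s):
                    m[i][j][k] *= ratio
        for j in range(s):              # match the (j, k) margins
            for k in range(t):
                ratio = (sum(n[i][j][k] for i in range(r))
                         / sum(m[i][j][k] for i in range(r)))
                for i in range(r):
                    m[i][j][k] *= ratio
    return m

def chi2_no_interaction(n):
    """Chi-square statistic with (r-1)(s-1)(t-1) d.f."""
    m = fit_no_interaction(n)
    return sum((n[i][j][k] - m[i][j][k]) ** 2 / m[i][j][k]
               for i in range(len(n))
               for j in range(len(n[0]))
               for k in range(len(n[0][0])))
```

A table whose cell counts are multiplicative (n_{ijk} = a_i b_j c_k) satisfies (9.2) exactly, so the fit reproduces the table and the statistic is zero.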
10. Concluding remarks. The extension from tables of three to more than three dimensions does not bring up any new problems so far as concepts like "no multiple correlation", "no partial correlation", etc., are concerned. A new feature with, for example, a four-way table would be the concept of "no correlation" between "(ij)" and "(kl)", which can be expressed as H₀: p_{ijkl} = p_{ijoo} p_{ookl}, and tested in a straightforward manner. The generalization to any number of dimensions of the concept of independence between two sets of "variates" is obvious. Replacement of some of the "variates" by "ways of classification" will only make the final interpretation a little different. The concept of "no interaction", however, is not one of trivial generalization. In a four-way table the hypothesis analogous to (4.2.5), that is, the hypothesis of "no second order interaction", seems to be

(10.1)  q_{ijkl} = (q_{ijko} q_{ijol} q_{iokl} q_{ojkl} q_{iooo} q_{ojoo} q_{ooko} q_{oool}) / (q_{ijoo} q_{ioko} q_{iool} q_{ojko} q_{ojol} q_{ookl}).
In this case the hypotheses of four separate "no first-order interactions" follow exactly the same pattern as in section 9, and need not be separately considered. The extension of (10.1) to higher order "no interactions", in the case of tables of higher dimensions, forms a certain pattern which has been worked out and which will be discussed in a later paper. The technique of testing (10.1) and "no interaction" hypotheses of higher order is essentially similar, in principle, to what has been discussed in section 9. The details alone are more complicated. For higher order "no interactions" there are, however, various intermediate cases of considerable interest which will be discussed later.
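Computationally, the four-way hypothesis H₀: p_{ijkl} = p_{ijoo} p_{ookl} mentioned above is just an ordinary two-way independence test after collapsing the pair (ij) into one compound subscript and (kl) into another. A short sketch of that "straightforward manner" (the flattening and the function name are ours):

```python
def chi2_compound_independence(n):
    """Test H0: p_ijkl = p_ijoo * p_ookl in an r x s x t x u table n
    (nested lists) by flattening it to an (r*s) x (t*u) two-way table
    and applying the ordinary independence chi-square.
    Returns the statistic; d.f. = (r*s - 1) * (t*u - 1)."""
    r, s = len(n), len(n[0])
    t, u = len(n[0][0]), len(n[0][0][0])
    # rows are compound subscripts (ij), columns compound subscripts (kl)
    table = [[n[i][j][k][l] for k in range(t) for l in range(u)]
             for i in range(r) for j in range(s)]
    grand = sum(map(sum, table))
    rows = [sum(row) for row in table]
    cols = [sum(row[c] for row in table) for c in range(t * u)]
    return sum((table[a][b] - rows[a] * cols[b] / grand) ** 2
               / (rows[a] * cols[b] / grand)
               for a in range(r * s) for b in range(t * u))
```

When the table factors exactly as an outer product of an (ij) array and a (kl) array, the statistic is zero, as it should be under H₀.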
We have discussed the hypotheses of "no multiple correlation", "no partial correlation", etc., but have not introduced any measures of "multiple correlation", "partial correlation", "interaction", etc. This is now under investigation. One measure might be the expected value of χ² when the hypothesis is not true minus the expected value when the hypothesis is true (which is a simple and well-known expression). No exact tests, with some reasonably good properties over the class of relevant alternatives (to the hypotheses), have been discussed here, nor have the powers of the χ²-test against permissible alternatives been examined. In several of the cases the power functions would be easily available from previous work by others, but in other cases they would have to be worked out. These will be discussed later.

We give below only six references. The sources we have drawn upon most are [1], [4], [5], [6]. For a critical review of much of the previous work on the subject, and a reasonably exhaustive bibliography, we would recommend [2] and [3]. The reader will perceive that this paper has some (but not much) overlap with [2] in the general sector of "independence" in a two-way table.
REFERENCES

[1] Bartlett, M. S., "Contingency table interactions," Jour. Roy. Stat. Soc. Suppl., Vol. 2 (1935), pp. 248-252.

[2] Cochran, W. G., "The χ²-test of goodness of fit," Annals of Math. Stat., Vol. 23 (1952), pp. 315-345.

[3] Cochran, W. G., "Some methods for strengthening the common χ²-tests," Biometrics, Vol. 10 (1954), pp. 417-451.

[4] Cramér, H., Mathematical Methods of Statistics, Princeton University Press, 1946, Chap. 30.

[5] Fisher, R. A., "On the interpretation of chi-square from contingency tables, and the calculation of P," Jour. Roy. Stat. Soc., Vol. 85 (1922), pp. 87-94.

[6] Yule, G. Udny and Kendall, M. G., An Introduction to the Theory of Statistics, Hafner Publishing Company, 1950, Chapters 1-4.