BIOMATHEMATICS TRAINING PROGRAM

NOTES ON
ANALYSIS OF CATEGORICAL DATA

by

V. P. Bhapkar
University of North Carolina
University of Poona

Institute of Statistics Mimeo Series No. 477
May 1966

This work was supported by the National Institutes of Health, Institute of General Medical Sciences Grant No. GM-12868-01.

DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
CONTENTS

                                                              Page
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . .   1
1. General Introduction . . . . . . . . . . . . . . . . . . .   2
2. Some Basic Results . . . . . . . . . . . . . . . . . . . .   8
3. Some Problems of Association and Symmetry
   for a Single-Multinomial Distribution . . . . . . . . . .   39
4. Some Univariate Problems for a
   Product-Multinomial Distribution . . . . . . . . . . . . .  72
5. Some Multivariate Problems for a
   Product-Multinomial Distribution . . . . . . . . . . . . . 101
6. Some Other Problems . . . . . . . . . . . . . . . . . . .  126
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . .  150
References . . . . . . . . . . . . . . . . . . . . . . . . .  157
PREFACE

Professor S. N. Roy had planned for quite some time a monograph on Analysis of Multi-Factor Multi-Response Experiments embodying a part of the work in this field by several students under his guidance. A rough sketch was prepared by him based on the lectures he had delivered at the Institute of Statistics, University of Paris, during 1962-63. The project was left incomplete because of his sudden death on July 23, 1964.

The present work is a detailed development of a part of that sketch pertaining to analysis of categorical data. This includes a part of the contributions made to this topic by Kastenbaum, Mitra, Diamond, the present author, and Sathe under the guidance of Prof. Roy, and some further developments due to the present author. The emphasis is, naturally, on that part of the work with which the author is most familiar.

Thus, this work does not profess to be a complete treatment of the subject. It does include, wherever necessary, some basic contributions to this subject by others and, thus, attempts to give a unified treatment of the problems faced in this analysis.
1. Factors and Responses

In most experiments, we have data of two particular types from each subject or experimental unit. These are

(i) a description of the sub-population of units to which the subject belongs or of the experimental conditions which he undergoes,

(ii) a description of what happens to the experimental unit during and/or after the experiment.

We shall use the term "factor" to denote a characteristic of type (i) that can be controlled and the term "response" to denote a characteristic of type (ii) which is in the nature of an outcome as far as the experiment is concerned. According as we study each subject with respect to one or more responses, we have a uniresponse or a multiresponse problem. Similarly, we have a unifactor or a multifactor situation according as each subject is classified with respect to one or more criteria describing sub-populations and/or experimental conditions.

In the case of contingency tables, we usually have certain dimensions or ways of classification corresponding to factors, the remaining ways corresponding to responses. The data in any contingency table then are simply the frequencies with which subjects belonging to the same combination of factor categories give a specific combination of response categories. Thus, the marginal frequencies (summed over response categories) for any factor-combination are fixed numbers known prior to the actual performance of the experiment. These ideas will be further clarified in the introduction of the next chapter.
2. Structured and Unstructured Variables

A response (or a factor) could be one of four types, namely, (i) purely categorical, (ii) categorical with an implied ranking, (iii) discrete, and finally, (iv) continuous. In the first case we shall say that the variable (either a response or a factor) is unstructured and, in the last three cases, we shall say that it is structured. In the unstructured case any numbering of categories is purely nominal. There is no natural system of weights or scores to go with these categories. In the structured case (ii) of implied ranking (e.g. good, fair, poor, etc.), the categories can be numbered in a natural way and scores can also be assigned to these categories, although the system is flexible and thus non-unique. In the structured case (iii) the set of possible discrete values for the variable provides a natural system of scores. Finally, for the structured case (iv) categories would be class intervals for a continuous variable, and the distances from an arbitrary origin of, say, the midpoints of such intervals form a natural system of scores. However, in the case of responses, because of economic or other considerations, one might assign a system of scores to these categories which would thus convert an initially unstructured response to a structured one, or which would induce a system of weights different from the 'natural' weights even for a genuinely structured response.

The main distinction between the structured and the unstructured response lies not in any difference between the probability models but in the kind of problems of interest. For the study of associations among several responses or of dependence of responses on factors there are certain types of questions that are relevant in the unstructured case. The same questions could be asked in the structured case, but, in this case, some other questions that could not
be posed at all in the unstructured case may very well be far more relevant
and meaningful.
It will be shown in later chapters how it is possible to test,
in the structured case, certain 'weak' hypotheses which might be of greater
interest to the experimenter rather than their 'strong' versions which are
testable in any case, structured or unstructured.
Thus it may very well happen
that such a weak hypothesis is sustained even though its stronger version was
not.
On the other hand, it may turn out that the given data are not sensitive enough to reveal the suspected differences, as far as an overall hypothesis (or test) is concerned, but nevertheless they are brought out by a more sensitive criterion for a specific feature of the original hypothesis; this aspect of strengthening the usual overall tests has already been stressed by Yates (1948) and Cochran (1954).
As far as the present work is concerned, we shall assume
that the system of scores, in the structured case, is preassigned and we shall
not discuss the problem of choosing an 'optimum' system of scores for structured
variables.
3. A Brief Description of the Scope of this Work
In this work all responses and factors are assumed to be in a cross-classification. We point out here that, so far as inference is concerned, we restrict ourselves mainly to point estimation and testing of hypotheses. The probability model assumed throughout is that of a single or a product multinomial distribution. We concern ourselves only with methods of analyzing such categorical data and we do not consider the problem of designing experiments.

One remark may be in order here. As explained earlier in this chapter, the categorical method of analyzing data is capable of handling the case of continuous responses also, if we break down the domain of each response into suitable class intervals, then form categories by combining the different responses (one interval from each) and associate with each category an unknown probability. As against this, the traditional nonparametric method of handling continuous responses is in terms of quantiles or ranks. Compared to this traditional method, the categorical method has two distinct disadvantages. First, the large sample chi-square tests for the categorical method (and in many realistic and somewhat complicated problems these are all that are available) require a much larger sample size than would be needed under the traditional nonparametric method, even when the latter method itself ends up by using chi-square tests of its own. Second, the system of class intervals that goes with each response type is non-unique. If we are in a position to disregard these disadvantages, then the categorical method, as will be seen in subsequent chapters, has one great advantage over the traditional in that it can handle far more general types of questions under far more general kinds of models.
The basic mathematical results and theorems are stated without proof
especially when the proofs are lengthy, fairly complicated and are available
5
I
I
I
I
I
I
I
_I
I
I
I
I
I
I
I
-.
I
I
I.
elsewhere. It is then indicated how they provide general methods for handling certain large classes of problems. The actual applications to special sub-classes of problems are considered in some detail. These are given just for illustration and by no means exhaust all the possibilities.
Chapter 1 is the general introduction setting forth the underlying philosophy so far as this work is concerned. This has been very ably expounded by Barnard (1947) for the simple 2 x 2 table, and the present work lays a considerable stress on this aspect of the whole development.

In Chapter 2, the categorical set-up is explained in detail using a single-multinomial or a product-multinomial probability model. The basic results on estimation and testing of hypotheses on this model are brought together.

Chapter 3, Some problems of association and symmetry for a single-multinomial distribution, discusses problems which are natural analogues, appropriate to the categorical arrangement, of various problems of association and symmetry concerning a single multivariate normal distribution. It defines various kinds of independence between different response types and indicates tests of relevant hypotheses. The non-centrality parameter that occurs in the asymptotic power function of the test in each case could no doubt be regarded as a measure of departure from the null hypothesis and hence, in some sense, as a measure of association between the response types. However, for unstructured responses, such a measure of association could seldom serve any useful purpose, while for structured responses it is not adequate and something is called for that is more in the spirit of regression than in the spirit of correlation.

Chapter 4, Some univariate problems for a product-multinomial distribution, poses certain questions that are analogous to the traditional questions in classical univariate analysis of variance. Methods available from Chapter 2 are given for the study of these questions. As in the previous chapter, a distinction is made between questions that are appropriate to unstructured responses and to structured responses vis-a-vis unstructured and structured factors.

Chapter 5, Some multivariate problems for a product-multinomial distribution, tackles similarly the appropriate analogues of the traditional problems concerning several multivariate normal distributions.

Chapter 6, Some other problems, discusses briefly some topics not treated in the earlier chapters. A short Appendix follows.
Some Basic Results

1. Introduction

Suppose s independent random samples of experimental units are taken from s populations, n_{oj} is the preassigned sample size from the j-th population and n_{ij} is the observed frequency in the i-th category of the j-th sample, i = 1, ..., r and j = 1, ..., s. If we assume that p_{ij} is the probability that the experimental unit drawn at random from the j-th population belongs to the i-th category, and sampling is with replacement, then the probability distribution, \phi, of the observed frequencies is given by

(2.1.1)    \phi = \prod_{j=1}^{s} \frac{n_{oj}!}{\prod_{i=1}^{r} n_{ij}!} \prod_{i=1}^{r} p_{ij}^{n_{ij}},

which is a product-multinomial distribution of the n_{ij}'s such that \sum_i n_{ij} = n_{oj} is fixed and \sum_i p_{ij} = p_{oj} = 1; the zero in the subscript denotes the sum over that subscript. It may be seen that even if the sampling were without replacement, (2.1.1) would still represent the probability distribution of the n_{ij}'s if the n_{oj}'s are small compared to the N_{oj}'s, the numbers of experimental units (or individuals) in the populations, and the N_{oj}'s are sufficiently large. Otherwise, the probability distribution would be the product-hypergeometric distribution given by

(2.1.2)    \phi' = \prod_{j=1}^{s} \binom{N_{oj}}{n_{oj}}^{-1} \prod_{i=1}^{r} \binom{N_{oj} p_{ij}}{n_{ij}},

where N_{oj} p_{ij} is, of course, the number of units in the j-th population belonging to the i-th category.
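The product-multinomial probability (2.1.1) is straightforward to evaluate numerically. The following is a minimal sketch, not from the original text; the frequencies and cell probabilities below are hypothetical.

```python
# Evaluate the product-multinomial probability (2.1.1): s independent
# multinomial samples, one per population, with fixed sample sizes n_oj
# and cell probabilities p_ij.
from math import factorial, prod

def product_multinomial_prob(n, p):
    """n[j][i]: observed frequency in category i of sample j;
       p[j][i]: probability of category i in population j."""
    phi = 1.0
    for n_j, p_j in zip(n, p):
        n_oj = sum(n_j)                          # preassigned sample size n_oj
        coef = factorial(n_oj) / prod(factorial(k) for k in n_j)
        phi *= coef * prod(pk ** nk for pk, nk in zip(p_j, n_j))
    return phi

# Two populations (s = 2), three categories (r = 3), hypothetical data:
n = [[3, 1, 1], [2, 2, 1]]
p = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]
print(product_multinomial_prob(n, p))
```

Because the samples are independent, the probability factors over j; summing over all possible frequency vectors for a given n_{oj} returns 1.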
It should be noticed that i may be a multiple subscript, say i_1 i_2 ... i_k, with i_a = 1, 2, ..., r_a, a = 1, 2, ..., k, with all combinations allowed. Similarly j also might be a multiple subscript, say j_1 j_2 ... j_J, with j_\alpha = 1, 2, ..., s_\alpha, \alpha = 1, 2, ..., J, but with the important distinction that all combinations may not be allowed. In other words, the sampling design may be an 'incomplete design' such that, for a given combination (j_1 j_2 ... j_\alpha), j_{\alpha+1} can take a value only from a subset of the integers (1, 2, ..., s_{\alpha+1}), depending on (j_1 j_2 ... j_\alpha), for \alpha = 1, 2, ..., J-1; of course, for some (j_1 j_2 ... j_\alpha), j_{\alpha+1} can take any value from the complete set (1, 2, ..., s_{\alpha+1}). Thus s is not necessarily s_1 s_2 ... s_J but can be computed from the nature of the design used; it is s_1 s_2 ... s_J only if the design used is a 'complete design'.

This setup will be called a k-response (or k-variate) and J-factor experiment; if the experimental unit belongs to the response-category (i_1 i_2 ... i_k), then i_1, i_2, ..., i_k will be called the responses of the experimental unit, and if the unit comes from the population (j_1 j_2 ... j_J), then it will be said to belong to the j_\alpha-th category of the \alpha-th factor, \alpha = 1, 2, ..., J.

Summation over a subscript will be indicated by replacing that subscript by 0; such a sum for n's will indicate a marginal frequency. We note here that n_{(i_1 0...0)(j_1...j_J)} denotes the number of units from the factor-combination (j_1...j_J) with the first response i_1, n_{(i_1 0...0)(0...0)} denotes the total number of units with first response i_1, and n_{(0...0)(j_1 0...0)} denotes the number of units from the j_1-th category of the first factor, and so on. Thus the n_{(0...0)(j_1...j_J)}'s are preassigned integers. Similarly, we have p_{(0...0)(j_1...j_J)} = 1, while
p_{(i_1 0...0)(j_1...j_J)} denotes the probability that the experimental unit from the factor-combination (j_1...j_J) has the first response belonging to the i_1-th category (i.e., has the first response equal to i_1), and so on. A typical quantity (other than n or p) depending on the response-categories i_1, ..., i_k and the factor-categories j_1, ..., j_J will be denoted by writing these subscripts underneath. A star in place of a subscript will indicate that the quantity under consideration is independent of that subscript. Thus, for example, t_{i_1 j_1 j_2} = t_{i_1 * j_2} implies that the quantity is independent of the category of the first factor.

The product-multinomial distribution (2.1.1) reduces to the multinomial distribution when s = 1; in this case we need only drop the suffix j (together with the associated product symbol). The same remark applies to the product-hypergeometric distribution (2.1.2). In this work, we shall restrict ourselves to the model (2.1.1), i.e., we assume that either sampling is with replacement or, if without replacement, the sampling fractions are small relative to the large populations. As already mentioned in the first chapter, in this monograph we shall confine ourselves to the problems of inference that come under point estimation, tests of hypotheses and, to a lesser extent, confidence interval estimation. Some more general problems will be briefly touched upon in the later chapters. We shall also not go into the problem of 'designing' the experiments, i.e., the problem of selecting the factor-combinations j_1 j_2 ... j_J from which samples are to be drawn and also that of determining a suitable sample size for each such combination selected.
2. Some Basic Theorems on Estimation

With regard to point estimation and testing of hypotheses we shall make a few remarks before stating the various theorems without proof. Under (2.1.1) as the model, suppose that the probabilities p_{ij} are expressed as specified functions of t unknown parameters \theta_1, \theta_2, ..., \theta_t; thus

(2.2.1)    p_{ij} = p_{ij}(\theta),    p_{ij}(\theta) > 0,    \sum_{i=1}^{r} p_{ij}(\theta) = 1,

for all \theta belonging to the parametric space \Omega, which is a non-degenerate interval in the t-dimensional Euclidean space. The first problem is to estimate \theta. The maximum likelihood estimate of \theta is that value of \theta in \Omega for which the likelihood function \phi, given by (2.1.1), attains its maximum (or, rather, its supremum). Note that such an estimate may not exist. It can be shown, however, that under suitable regularity conditions such an estimate exists with probability approaching one as the total sample size is increased indefinitely and, asymptotically, this estimator has certain desirable properties. It turns out that several other estimators possess these desirable properties asymptotically under such regularity conditions. Two of these are obtained by minimizing the following measures of discrepancy between the observed frequencies n_{ij} and their expected values n_{oj} p_{ij}(\theta):

(2.2.2)    Pearson:  X^2 = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{(n_{ij} - n_{oj} p_{ij}(\theta))^2}{n_{oj} p_{ij}(\theta)},

(2.2.3)    Neyman:  X_1^2 = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{(n_{ij} - n_{oj} p_{ij}(\theta))^2}{n_{ij}},
with respect to variation of \theta in \Omega; in defining the latter it is assumed that no observed frequency is zero.
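As a numerical illustration (the counts below are hypothetical, not from the text), the Pearson and Neyman discrepancy measures just defined can be computed directly; note that the Neyman form divides by the observed frequencies, so every observed cell must be positive.

```python
# Pearson X^2 divides by expected frequencies n_oj * p_ij; the Neyman
# modified statistic X_1^2 divides by observed frequencies n_ij.
def pearson_x2(n, p):
    x2 = 0.0
    for n_j, p_j in zip(n, p):
        n_oj = sum(n_j)
        x2 += sum((nij - n_oj * pij) ** 2 / (n_oj * pij)
                  for nij, pij in zip(n_j, p_j))
    return x2

def neyman_x1sq(n, p):
    x1 = 0.0
    for n_j, p_j in zip(n, p):
        n_oj = sum(n_j)
        x1 += sum((nij - n_oj * pij) ** 2 / nij    # requires every n_ij > 0
                  for nij, pij in zip(n_j, p_j))
    return x1

n = [[18, 22, 10]]                 # s = 1, r = 3, n_o1 = 50 (hypothetical)
p = [[0.4, 0.4, 0.2]]              # evaluated cell probabilities
print(pearson_x2(n, p), neyman_x1sq(n, p))   # X^2 ≈ 0.40, X_1^2 ≈ 0.404
```

With expected frequencies (20, 20, 10), the two measures differ only in the denominators, which is why they agree asymptotically when the model holds.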
The minimum values (or, more generally, values obtained by using any 'efficient' estimate of \theta) of these measures have been proposed as criteria to test the hypothesis H_{01} specified by (2.2.1). These will be called X^2 and X_1^2 respectively. X^2 was proposed by Karl Pearson (1900) for the special case s = 1, t = 0, and he proved that this statistic has a limiting chi-square distribution with r - 1 degrees of freedom; Fisher (1928) proved that for the case s = 1 the statistic has a limiting chi-square distribution with r - t - 1 degrees of freedom. Neyman (1929) proposed the modified version X_1^2 above and proved a number of basic theorems in an important contribution (1949) to this subject. Various proofs of these theorems have been offered so far under regularity conditions of differing generality and the reader is referred to Cramér (1946), Rao (1958) and Birch (1964), among others, for example, for the case s = 1. A proof along Cramér's lines for the X_1^2-statistic, using linearization (to be described later in this chapter) if necessary, for the case s = 1 is given by Bhapkar (1959), while a proof for the X^2-statistic with general s is given by Mitra (1955).
Consider then the probability distribution (2.1.1) with the p's expressed as the specified functions (2.2.1) of the unknown parameter \theta. Let N = \sum_{j=1}^{s} n_{oj} be the total sample size, q_{ij} = n_{ij}/n_{oj} the observed sample proportions, and Q_j = n_{oj}/N.
In addition to the basic assumptions (2.2.1), we assume the following:

(i) The 'true' parameter point \theta_0 is an interior point of \Omega.

(ii) The functions p_{ij}(\theta) possess continuous partial derivatives up to the second order with respect to the \theta's for all \theta in \Omega.

(iii) The parameters \theta_k's are independent, that is, the rank of the rs x t matrix

[\partial p_{ij}/\partial \theta_k],    i = 1, ..., r;  j = 1, ..., s;  k = 1, ..., t,

is t for all \theta in \Omega, and t < rs - s.
0, and
We confine ourselves to those estimators only which are functions of
~j
and do not depend directly on N.
normal (BAN) estimator of
(b)
Ok
a,s
N ....
eo
~fN(~k - 90k )
a positive number
is called a best asymptotically
eok is it satisfies the following properties:
(a) ~k is a consistent estimator of
to e
ek
with Q' s
eok '
that if,
"ek
converges in probability
is asymptotically normal with zero mean, that is, there exists
rr , independent of N, such that
k
1.. . J2JI L
< z
(c)
y
place of
(d)
1
N .... IlO with Q' s he ld fixed for every real
If
is any function of
rrk , then
q's
z.
satisfying (a) and (b) with
rr taking the
rr? rrk •
~k has continuous partial derivatives with respect to each ~j'
An estimator
BAN estimator of
e
-0
~ with components ~, k "" 1, •.. , t,
with components
.
are satisfied for each k
I
I
I
I
I
I
I
_I
held fixed,
z
as
_.
and,
eok
~oreover,
13
will be called a
if the conditions (a) through (d)
if
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I-
I
(e)
.IN (~,.. -
e)
-0
is asymptotically non-singular normal with a null mean-vector,
i.e., there exists a positive
~
~
as
N
~co
with
Q's
held
de~inite
-co
x
,..
N, such that
o~
dx
-00
~ixed ~or
~or
every real
zk' k
= 1,
•.. , t.
has diagonal elements
~
and, moreover,
rr
k
k by (e); however, (b) ~or all
each
'?lk
The property (e) implies that in the limit
t
.IN (~,.. -
~ollows i~
e)
-0
.IN
normal distribution (possibly degenerate)
THEOREM 2.2.1.
....
_
a' (~e)
,..
""0
,..
~or
every
has a limiting univariate
~ixed
real vector
Under the conditions mentioned above, the system
;J
obtained by equating to zero the derivatives
o~
the likelihood
e's possesses a solution Which is a BAN estimate
respect to
THEOREM 2.2.2.
Under the same conditions, the system
......,
"'...".~~--
by equating to ze~o the derivatives o~
a solution Which is a BAN estimate
~~~.2...:.?3...
the system
2
Xl
o~
do
are linearly
S
a.
,..
~ollowing:
The basic theorems, due to Neyman(1949), on estimation are the
___
k
A weaker property implying asymptotic joint normality (possibly
o~
singular)
independent
-1
(b) is automatically implied
independent.
~
,..
e --21 x,.,' I:
vie note that in this case
not imply (e).
matrix
o~
(2.2.3)
Assume that no
t
o~
x2
e
""0
nij is zero.
o~
t
t
o~
,
o~
(2.1.1) with
e.
""0
equations obtained
(2.2.3) with respect to
e's possesses
Then under the same conditions,
equations obtained by equating to zero the derivati ves
with respect to
equations
o~
e's possesses a solution which is a BAN estimate
e.
""0
Remarks. (1) The condition that the number of categories, r_j, for the j-th population is the same, r, for all j is not necessary. We have assumed this condition only because all the applications in this work satisfy it.

(2) The conditions p_{ij}(\theta) > 0, (ii) and (iii) can be weakened, presumably, by requiring them only of \theta in some neighborhood of \theta_0.

(3) The solution mentioned in Theorem 2.2.1 may be called a maximum likelihood equation (m.l.e.) estimate. Is it necessarily a maximum likelihood estimate, i.e., does it maximize \phi? To what extent can the conditions (ii) and (iii) above be relaxed? In this connection we may refer the reader to some results due to Rao (1965) and Birch (1964) for the case s = 1, which, presumably under some additional conditions, hold for the general case as well. They impose an identifiability condition, viz.,

(iv) given an \epsilon > 0, there exists a \delta > 0 such that |p_i(\theta) - p_i(\theta_0)| > \delta whenever |\theta - \theta_0| > \epsilon.

Rao replaces (ii) by a weaker condition assuming existence of the first-order derivatives of p_i(\theta) and their continuity at \theta_0, and proves that a maximum likelihood estimator is some m.l.e. estimator which is asymptotically normal and 'efficient'. Birch assumed a still weaker condition of total differentiability of p_i(\theta) at \theta_0, i.e., of existence of first-order derivatives at \theta_0 such that

p_i(\theta) = p_i(\theta_0) + (\theta - \theta_0)' [\partial p_i/\partial \theta]_{\theta_0} + o(|\theta - \theta_0|)    as    \theta \to \theta_0,

to prove that the maximum likelihood estimator is asymptotically normal and 'efficient'.

(4) Under the original conditions it can be shown that the solution mentioned in Theorem 2.2.1 maximizes \phi with probability tending to one as N \to \infty; henceforth this solution will be called the maximum likelihood estimate. Similarly, the solutions in Theorems 2.2.2 and 2.2.3 will be called henceforth the minimum-X^2 and minimum-X_1^2 estimates.
(5) Note that, in view of the condition p_{ij}(\theta_0) > 0, all n_{ij} > 0 for all i, j with probability approaching one; thus the minimum-X_1^2 estimator is defined with probability approaching one.
In general, the equations giving the maximum-likelihood or minimum-X^2 (or X_1^2) estimates are fairly complicated to solve and we have to resort to iterative methods. The minimum-X_1^2 method, though, has particular advantages in those problems where the functions p_{ij} in (2.2.1) are linear; in this case the equations are linear and, hence, easier to solve. If the functions are not linear, Neyman's (1949) 'linearization' technique is quite useful to generate BAN estimates using the minimum-X_1^2 method subject to 'linearized' constraints.

The condition that the p_{ij}'s are expressed in terms of t independent \theta_k's by (2.2.1) is technically equivalent to a set of rs - s - t independent restrictions

F_\ell(p) = 0,    \ell = 1, 2, ..., rs - s - t,

other than \sum_i p_{ij} = 1, obtained by eliminating the \theta's. We shall assume that the functions F_\ell, which may be regarded as functions of either rs - s independent p's or of the rs p's subject to \sum_i p_{ij} = 1, possess continuous partial derivatives up to the second order with respect to the p_{ij}'s. Then we have the following result due to Neyman:

THEOREM 2.2.4. If a system of equations (mentioned in the theorems above) subject to the constraints (2.2.1) leads to BAN estimates of the p_{ij}'s, then a similar system of equations obtained subject to the 'linearized' constraints

(2.2.4)    F^*_\ell(p) \equiv F_\ell(q) + \sum_{j=1}^{s} \sum_{i=1}^{r-1} \left( \frac{\partial F_\ell}{\partial p_{ij}} \right)_{p=q} (p_{ij} - q_{ij}) = 0,    \ell = 1, 2, ..., rs - s - t,

also leads to BAN estimates of the p_{ij}'s with probability tending to one as N \to \infty.

In particular, minimum-X_1^2 equations obtained subject to 'linearized' constraints lead to BAN estimates of the p_{ij}'s; as these equations are linear in the p's they are, in general, easier to solve. This technique is illustrated for a certain problem in Chapter 4.
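When the constraint functions are already linear, the minimum-X_1^2 stationarity conditions form a linear system and no iteration is needed. The following sketch illustrates this for a hypothetical single sample (s = 1) under the single linear constraint p_1 = p_2; it solves the Lagrangian system in one step and is an illustration of why linear constraints are convenient, not a reproduction of Neyman's general procedure.

```python
import numpy as np

def min_modified_chisq_linear(n, A, b):
    """Minimize X_1^2 = sum_i (n_i - N p_i)^2 / n_i subject to the linear
    constraints A p = b (the first row of A, b should encode sum(p) = 1).
    Since X_1^2 is quadratic in p and the constraints are linear, the
    stationarity (Lagrangian) conditions are one linear system."""
    n = np.asarray(n, float)
    N = n.sum()
    H = np.diag(2 * N ** 2 / n)          # Hessian of X_1^2 in p
    g = np.full(len(n), 2 * N)           # so that grad X_1^2 = H p - g
    k = A.shape[0]
    kkt = np.block([[H, A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([g, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:len(n)]                  # constrained estimate of p

# Hypothetical counts and the linear constraint F(p) = p_1 - p_2 = 0:
n = [30, 20, 50]
A = np.array([[1.0, 1.0, 1.0],           # probabilities sum to one
              [1.0, -1.0, 0.0]])         # p_1 = p_2
b = np.array([1.0, 0.0])
print(min_modified_chisq_linear(n, A, b))   # p_1 = p_2 ≈ 0.2449, p_3 ≈ 0.5102
```

For these counts a short hand calculation gives p_1 = p_2 = 12/49 and p_3 = 25/49, which the linear solve reproduces.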
3. Significance Tests

Under the model (2.1.1) suppose that the hypothesis H_{01} specifies that the p_{ij}'s can be expressed as given functions of \theta; thus

(2.3.1)    p_{ij} = p_{ij}(\theta),    p_{ij}(\theta) > 0,    \sum_{i=1}^{r} p_{ij}(\theta) = 1,

for all \theta in \Omega, and at the 'true' point \theta_0, which is an interior point of \Omega, p_{ij}(\theta_0) > 0 for all i, j. We assume that these functions satisfy also the conditions (ii) and (iii) in section 2. Neyman (1949) proves the following two theorems, stated here without proof:
THEOREM 2.3.1. Let \hat\theta be the maximum likelihood estimate mentioned in Theorem 2.2.1, \hat{p}_{ij} = p_{ij}(\hat\theta), and assume that no n_{ij} is zero. If

(2.3.2)    \lambda = \prod_{j=1}^{s} \prod_{i=1}^{r} \left( \frac{n_{oj} \hat{p}_{ij}}{n_{ij}} \right)^{n_{ij}},

then, under the conditions (i)-(iii), -2 \log \lambda has a limiting chi-square distribution with rs - s - t degrees of freedom as N \to \infty with the Q's held fixed, if H_{01} is true.
THEOREM 2.3.2. Let \bar\theta be any BAN estimate of \theta_0 mentioned in Theorems 2.2.1 through 2.2.4, and \bar{p}_{ij} = p_{ij}(\bar\theta). If

(2.3.3)    X^2 = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{(n_{ij} - n_{oj} \bar{p}_{ij})^2}{n_{oj} \bar{p}_{ij}}

and

(2.3.4)    Y^2 = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{(n_{ij} - n_{oj} \bar{p}_{ij})^2}{n_{ij}},

where in defining Y^2 it is assumed that no n_{ij} is zero, then, under the conditions (i)-(iii), both X^2 and Y^2 defined above have limiting chi-square distributions with rs - s - t degrees of freedom as N \to \infty with the Q's held fixed, if H_{01} is true.
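The three criteria -2 log lambda, X^2 and Y^2 are easy to compare numerically in the simplest setting. The sketch below (with hypothetical counts, not data from the text) takes s = 1 and an H_01 that specifies the cell probabilities completely (t = 0), so the estimate under H_01 is the hypothesized vector itself and each statistic is referred to the chi-square table with r - 1 degrees of freedom.

```python
# Compute -2 log(lambda), X^2 and Y^2 for a single multinomial sample
# when H_01 fully specifies the cell probabilities p0 (so t = 0).
from math import log

def chisq_statistics(n, p0):
    """Every n_i must be positive for -2 log(lambda) and Y^2 to be defined."""
    N = sum(n)
    lr = 2 * sum(ni * log(ni / (N * pi)) for ni, pi in zip(n, p0))    # -2 log(lambda)
    x2 = sum((ni - N * pi) ** 2 / (N * pi) for ni, pi in zip(n, p0))  # Pearson form
    y2 = sum((ni - N * pi) ** 2 / ni for ni, pi in zip(n, p0))        # modified form
    return lr, x2, y2

# Hypothetical data: 60 units over r = 3 categories, H_01: p = (1/3, 1/3, 1/3);
# here each statistic has r - 1 = 2 degrees of freedom in the limit.
print(chisq_statistics([25, 15, 20], [1 / 3, 1 / 3, 1 / 3]))
```

For these counts the three values come out close to one another (roughly 2.5 to 2.7), illustrating their asymptotic equivalence under the null hypothesis.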
-2 \log \lambda, X^2 and Y^2 defined in the above two theorems will be called \chi^2-statistics for testing H_{01}. Y^2 would be used, in general, in conjunction with the minimum-X_1^2 estimate if the functions p_{ij}(\theta) are linear, and with the minimum-X_1^2 estimate using linearization otherwise; it would then be called hereafter the X_1^2-statistic. Similarly, X^2 would be used, in general, in conjunction with the maximum likelihood estimate; this would be hereafter called the (Pearson) X^2-statistic. \lambda is the well-known Neyman-Pearson likelihood ratio statistic, and in defining \lambda it is implicitly assumed that q_{ij} = n_{ij}/n_{oj} is the maximum likelihood estimate of p_{ij} if the p's are not subject to any constraints other than p_{ij} > 0, \sum_i p_{ij} = 1; note that trivially q_{ij} is also the minimum-X^2 (or X_1^2) estimate of p_{ij} under these conditions. Since p_{ij} > 0, all n_{ij} are positive with probability tending to one as N \to \infty with the Q's fixed, so that the statistics \lambda and Y^2 above are also defined with probability tending to one as N \to \infty. The theorems above remain valid for the case where there are r_j categories for the j-th population, with rs replaced by \sum_j r_j.
It may be noted at this stage that the hypothesis H_{01} may be specified in two equivalent forms; the restrictions on the p's (other than \sum_i p_{ij} = 1) may be specified either by the 'freedom equations' (2.3.1), where \theta is a free parameter in \Omega, or by rs - s - t independent constraints on the p_{ij}'s (other than \sum_i p_{ij} = 1) after eliminating the t unknown parameters \theta_k's. Suppose these 'constraint equations' are

(2.3.5)    F_\ell(p) = 0,    \ell = 1, 2, ..., rs - s - t,

where p denotes the vector of the p_{ij}'s arranged in a suitable order. It is always theoretically possible to go from (2.3.1) to (2.3.5) and vice versa, though the actual process may not be easy in practice. If we decide to work with (2.3.5), rather than with (2.3.1), then the equations for the \hat{p}_{ij}'s (either maximum-likelihood or minimum-X^2 (X_1^2)) will have to be solved subject to the constraints (2.3.5). Whether the hypothesis H_{01} is specified by the 'freedom equations' (2.3.1) or by the 'constraint equations' (2.3.5) will depend on the particular situation and the nature of the hypothesis. In some cases the specification (2.3.1) is more meaningful, in some other situations the specification (2.3.5) is more appropriate, while in the remaining cases either could be used. In any case, the degrees of freedom for the limiting chi-square distribution are given by the number of independent constraints (2.3.5) defining the hypothesis, or equivalently, by the number of independent p_{ij}'s (i.e. rs - s) minus the number of independent parameters (i.e. t) to be estimated in (2.3.1). Since any of the statistics (2.3.3), (2.3.4) and -2 \log \lambda, with \lambda given by (2.3.2), is, in some sense, a measure of deviation from H_{01}, the hypothesis is rejected if the observed \chi^2-statistic exceeds the tabled value of the chi-square variable for the desired level of significance.
Now suppose it is assumed that the p_{ij}'s are given by the relations (2.3.1), where the functions p_{ij} satisfy the regularity conditions already mentioned, and we want to test the hypothesis H_{02} that \theta \in \omega, a subset of \Omega, where the parameters \theta_k's are subject to u given independent restrictions

(2.3.6)    h_m(\theta) = 0,    m = 1, 2, ..., u \le t,

in addition to the original restrictions (2.3.1). The functions h_m are assumed to possess continuous derivatives up to the second order with respect to the \theta_k's; the independence of these restrictions then means that the u x t matrix [\partial h_m/\partial \theta_k] is of rank u for all \theta in \Omega.

If (\theta^*_1, ..., \theta^*_t) is any set of BAN estimates of (\theta_1, ..., \theta_t) respectively, mentioned in Theorems 2.2.1 through 2.2.4, but subject to the conditions (2.3.6), and p^*_{ij} = p_{ij}(\theta^*), then it follows from Theorems 2.3.1, 2.3.2 that each of the statistics (2.3.3), (2.3.4) (with \bar{p}_{ij} replaced by p^*_{ij}) and -2 \log \lambda (with \hat{p} in (2.3.2) replaced by p^* = p(\theta^*), \theta^* here being the maximum likelihood estimate) has a limiting chi-square distribution with rs - s - t + u degrees of freedom as N \to \infty with the Q's fixed, on the assumption that (2.3.6) holds. Thus, each of these statistics provides a test of the combined hypothesis H_{01} \cap H_{02} (i.e. both H_{01} and H_{02}) specified by

(2.3.7)    p_{ij} = p_{ij}(\theta),    h_m(\theta) = 0,    m = 1, 2, ..., u,

starting from the model (2.1.1). The test of the remaining hypothesis H_{02}, specified by

H_{02}:    h_m(\theta) = 0,    m = 1, 2, ..., u,

starting from the model (2.1.1) and (2.3.1), is provided by the following theorems due to Neyman (1949):
THEOREM 2.3.3.  Let θ̂ and θ* be the maximum likelihood estimates of θ
under the conditions (2.3.1) and (2.3.7) respectively, p̂_ij = p_ij(θ̂), p*_ij = p_ij(θ*), and

(2.3.8)    λ = Π_{j=1}^{s} Π_{i=1}^{r} (p*_ij / p̂_ij)^{n_ij} .

If H02 holds, −2 log λ has a limiting chi-square distribution with u degrees of freedom as N → ∞ with the α's remaining fixed.
THEOREM 2.3.4.  Let θ̂ and θ* be any BAN estimates of θ, mentioned in Theorems 2.2.1 through 2.2.4, under the conditions (2.3.1) and (2.3.7) respectively.  If we set p̂_ij = p_ij(θ̂) and p*_ij = p_ij(θ*), each of the two statistics

(2.3.9)    Σ_{j=1}^{s} Σ_{i=1}^{r} (n_ij − n_0j p*_ij)² / (n_0j p*_ij)  −  Σ_{j=1}^{s} Σ_{i=1}^{r} (n_ij − n_0j p̂_ij)² / (n_0j p̂_ij),

(2.3.10)   Σ_{j=1}^{s} Σ_{i=1}^{r} (n_ij − n_0j p*_ij)² / n_ij  −  Σ_{j=1}^{s} Σ_{i=1}^{r} (n_ij − n_0j p̂_ij)² / n_ij

has a limiting chi-square distribution with u degrees of freedom as N → ∞ with the α's held fixed if H02 is true; in defining (2.3.10) it is assumed that no n_ij is zero.
Just as H01, given by the 'freedom equations' (2.3.1), can be expressed equivalently in terms of the 'constraint equations' (2.3.5), H02, given by the 'freedom equations' (2.3.6), can be expressed equivalently in terms of the 'constraint equations'

(2.3.11)    f_ℓ(p) = 0,    ℓ = rs − s − t + 1, …, rs − s − t + u,

which are independent of the basic constraints Σ_i p_ij = 1 and of those given by (2.3.5) which define H01.  The equations for the p*_ij's (either maximum-likelihood or minimum-χ² (χ₁²)) then have to be solved subject to the constraints

f_ℓ(p) = 0,    ℓ = 1, 2, …, rs − s − t + u,

in addition to the basic constraints.  The test of H01 and H02 jointly, i.e., of (2.3.7), is provided by any of the statistics (2.3.3), (2.3.4) and −2 log λ, with λ given by (2.3.2) and p̂_ij replaced by p*_ij, with rs − s − t + u degrees of freedom; the test of H02 only, given by (2.3.6) with H01 (i.e., (2.3.1)) as the assumed model, is carried out by using any of the statistics (2.3.9), (2.3.10) and −2 log λ, with λ given by (2.3.8), on u degrees of freedom.
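As a minimal numerical sketch of Theorem 2.3.4 (not from the monograph: the Hardy–Weinberg-type trinomial model, the counts, and all names below are hypothetical), the following computes the difference-of-chi-squares statistic (2.3.9) for a single sample (s = 1) with one model parameter (t = 1) and one restriction (u = 1):

```python
# Sketch of Theorem 2.3.4 with s = 1: model (2.3.1) is the trinomial
# p(theta) = (theta^2, 2*theta*(1-theta), (1-theta)^2), t = 1 parameter;
# the hypothesis H02 is h(theta) = theta - 1/2 = 0, so u = 1.

def pearson(n, p):
    """Pearson chi-square of counts n against cell probabilities p."""
    N = sum(n)
    return sum((n_i - N * p_i) ** 2 / (N * p_i) for n_i, p_i in zip(n, p))

def model(theta):
    return (theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2)

n = (30, 50, 20)                          # hypothetical counts, N = 100
N = sum(n)
theta_hat = (2 * n[0] + n[1]) / (2 * N)   # MLE under the model (a BAN estimate)
theta_star = 0.5                          # estimate under the model AND H02

# (2.3.9): chi-square at the restricted estimate minus chi-square at the
# unrestricted estimate; limiting chi-square with u = 1 d.f. under H02.
stat = pearson(n, model(theta_star)) - pearson(n, model(theta_hat))
print(round(stat, 4))                     # -> 1.9898
```

Here the large value of the difference relative to a chi-square with 1 d.f. would lead to rejection of H02 at the usual levels.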
4.  Wald's Test-Criterion

For the test criteria described in the earlier section one crucial step, from the computational standpoint, is to obtain the estimates θ̂ or θ*.  In many situations it is quite difficult to solve the relevant equations.  In such cases Neyman's 'linearization' technique (1949), described earlier, and Wald's theorem (1943) come in quite handy.  In the next section it will be shown that, whenever all n_ij are positive, the χ₁²-statistic (using minimum-χ₁² estimates) is algebraically identical to Wald's criterion for testing a 'linear hypothesis', that is, a hypothesis which is expressed either by linear constraints on the p's or by a specification of linear functions of the p's in terms of linear functions of unknown parameters θ's.  Moreover, this algebraic identity of the two test criteria is preserved even if the hypothesis to be tested is nonlinear, provided for the χ₁²-statistic we use minimum-χ₁² estimates obtained after using the 'linearization' technique.

Wald's theorem, stated in its general form under a number of regularity conditions not stated here, is as follows.
Let W be the probability function of independent and identically distributed random variables X_m, m = 1, 2, …, N, involving t unknown parameters (θ_1, …, θ_t), where the parametric point θ belongs to some space Ω.  It is assumed that W possesses continuous partial derivatives with respect to the θ's up to the second order, and that the matrix

(2.4.1)    B(θ) = [ −N⁻¹ E( ∂² log W / ∂θ_α ∂θ_β ) ],    α, β = 1, 2, …, t,

which is assumed to exist, is positive definite.
Let θ̂ be the maximum likelihood estimate of θ.  If ω is a subset of Ω defined by

(2.4.2)    F_k(θ) = 0,    k = 1, 2, …, u    (u < t),

where the restrictions (2.4.2) are independent,
and

(2.4.3)    H(θ) = [ ∂F_k/∂θ_α ],    k = 1, 2, …, u;  α = 1, 2, …, t,

then as N → ∞, if θ belongs to ω, the statistic

(2.4.4)    N F'(θ̂) [ H(θ̂) B⁻¹(θ̂) H'(θ̂) ]⁻¹ F(θ̂)

has a limiting chi-square distribution with u degrees of freedom (under some regularity conditions).
Remark:  By the probability function we mean either the probability distribution if the variables are discrete, or the probability density function if they are absolutely continuous.  Independence of the restrictions (2.4.2) implies that H(θ) is of rank u.
Note that Wald's theorem is concerned with a random sample from a single population; however, we shall refer to (2.4.4) as Wald's statistic even for the case of independent random samples from several populations, provided N = Σ_j n_0j, n_0j being the size of the sample from the j-th population.

We now propose to show that Wald's statistic (2.4.4), when adapted to the categorical set-up, is exactly the same as the χ₁²-statistic (using linearization if the constraints F's are not necessarily linear) whenever the latter is defined.  The assertion in Theorem 2.4.1 remains valid for sampling from several populations as well, in view of Theorem 2.3.2, provided the (n_0j/N)'s remain fixed as N → ∞.  For the case of one population Wald has shown that his statistic and the likelihood-ratio statistic possess some asymptotic optimality properties;
25
el
I
I
I
I
I
I
I
-.
I
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
Ie
I
we conjecture that this remains so even for the general case of sampling from several populations, provided the n_0j/N remain fixed (away from 0 and 1) as N → ∞, under some further regularity conditions.  It then follows that the same asymptotic optimality properties are possessed by the χ₁²-statistic for the case s = 1, and possibly also for the case s ≥ 2 if the conjecture is true.
For the categorical set-up, then, let

X^(i)_mj = 1 if the m-th observation from the j-th sample belongs to the i-th category, and 0 otherwise,

i = 1, 2, …, r;  m = 1, 2, …, n_0j;  j = 1, 2, …, s, and X'_mj = [X^(1)_mj, …, X^(r)_mj].  Then the probability distribution of the X_mj's is given by

W = Π_{j=1}^{s} Π_{i=1}^{r} p_ij^{n_ij} ,

since Σ_m X^(i)_mj = n_ij; it is of course assumed that each X^(i)_mj is either 1 or 0.  Taking θ = p with t = rs − s, it is then easy to verify from (2.4.1) that

B(p) = N⁻¹ diagonal( n_01 [P_1⁻¹ + p_r1⁻¹ L], …, n_0s [P_s⁻¹ + p_rs⁻¹ L] ),

where P_j = diagonal(p_1j, …, p_{r−1,j}) and L = [1]_{(r−1)×(r−1)}, so that

(2.4.6)    B⁻¹(p) = N diagonal( n_01⁻¹ [P_1 − p_1 p'_1], …, n_0s⁻¹ [P_s − p_s p'_s] ),

where p'_j = [p_1j, …, p_{r−1,j}].
If the hypothesis H_o is then given by (2.4.2), with θ replaced by p, the large-sample chi-square statistic (2.4.4) to test H_o becomes

(2.4.7)    F'(q) [ H(q) C(q) H'(q) ]⁻¹ F(q),

where q is the vector of sample proportions q_ij = n_ij/n_0j and C(q) = N⁻¹ B⁻¹(q) = diagonal( n_01⁻¹ [Q_1 − q_1 q'_1], …, n_0s⁻¹ [Q_s − q_s q'_s] ), with Q_j = diagonal(q_1j, …, q_{r−1,j}) and q'_j = [q_1j, …, q_{r−1,j}].
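The following minimal sketch (hypothetical numbers, not from the monograph) evaluates the Wald statistic (2.4.7) for one trinomial sample (s = 1) and the single linear constraint F(p) = p_1 − p_3 = 0, working in the reduced coordinates (p_1, p_2) with p_3 = 1 − p_1 − p_2:

```python
# Wald statistic (2.4.7), s = 1 trinomial sample, constraint F(p) = p1 - p3 = 0.
# In the reduced coordinates, F(p) = 2*p1 + p2 - 1, so H = [2, 1].

n_obs = [30, 50, 20]                   # hypothetical counts, n_01 = 100
n01 = sum(n_obs)
q1, q2 = n_obs[0] / n01, n_obs[1] / n01

F = 2 * q1 + q2 - 1                    # F(q) = q1 - q3

# C(q) = n_01^{-1} [Q - q q'] : estimated covariance matrix of (q1, q2)
c11 = q1 * (1 - q1) / n01
c22 = q2 * (1 - q2) / n01
c12 = -q1 * q2 / n01

# Quadratic form (2.4.7) with H = [2, 1]:  F' [H C H']^{-1} F
hch = 4 * c11 + 4 * c12 + c22
wald = F ** 2 / hch
print(round(wald, 4))                  # -> 2.0408, 1 d.f.
```

The value would be compared with the chi-square distribution with u = 1 degree of freedom.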
5.  Explicit Expressions for the χ₁² Test-Criterion in Some Special Classes of Problems
Suppose the hypothesis H_o is a 'linear' hypothesis given by

(2.5.1)    F_k(p) ≡ Σ_{j=1}^{s} Σ_{i=1}^{r} f_kij p_ij + f_k = 0,    k = 1, 2, …, u,

where the f_kij's and f_k's are known constants such that

(i)  the u × rs matrix [f_kij] is of rank u < rs − s, and
(ii) the equations (2.5.1), together with Σ_i p_ij = 1, have at least one set of solutions (p_ij) satisfying the condition that p_ij > 0 for all i = 1, …, r and j = 1, …, s.

It is, of course, assumed that the constraints (2.5.1) are linearly independent of the basic constraints Σ_i p_ij = 1.

Since the event (n_ij > 0, all i, j) has probability approaching one when p_ij > 0 for all (i, j), we may for asymptotic purposes assume that all the n_ij's are positive.  Let

b_kj = Σ_{i=1}^{r} f_kij q_ij,    c_k = Σ_{j=1}^{s} b_kj + f_k,    c' = [c_1, …, c_u],

and

(2.5.2)    G = [g_kl],    g_kl = Σ_{j=1}^{s} g_klj,    g_klj = n_0j⁻¹ ( Σ_{i=1}^{r} f_kij f_lij q_ij − b_kj b_lj ).

We notice that b_kj is in the nature of a 'sample mean' of a random variable F_k taking the value f_kij if the corresponding observation falls in the i-th category for the j-th sample, while g_klj is in the nature of a 'sample covariance' of F_k and F_l for the j-th sample.  Since the F_k's are linearly independent, G is positive-definite with probability one.
THEOREM 2.5.1.  The χ₁² statistic to test H_o, given by (2.5.1) under the conditions mentioned above, is

(2.5.3)    χ*² = c' G⁻¹ c,

with u degrees of freedom.

PROOF:  To minimize

χ₁² = Σ_{j=1}^{s} n_0j Σ_{i=1}^{r} (p_ij − q_ij)²/q_ij

subject to the constraints, we introduce Lagrangian multipliers λ_j and μ_k and write

ψ = Σ_{j=1}^{s} n_0j Σ_{i=1}^{r} (p_ij − q_ij)²/q_ij − 2 Σ_{j=1}^{s} λ_j ( Σ_{i=1}^{r} p_ij − 1 ) − 2 Σ_{k=1}^{u} μ_k F_k(p).

Differentiating with respect to p_ij and equating this to zero, we get the minimizing equations

n_0j (p_ij − q_ij)/q_ij = λ_j + Σ_k μ_k f_kij .

Multiplying by q_ij and summing over i, we get λ_j = −Σ_k μ_k b_kj; eliminating the λ's we get

p_ij = q_ij [ 1 + n_0j⁻¹ Σ_k μ_k (f_kij − b_kj) ],

where the μ's are to be determined from (2.5.1).  Substituting in (2.5.1) gives c + G μ = 0, that is, μ = −G⁻¹ c, where μ' = [μ_1, …, μ_u].  Hence

χ*² = min χ₁² = Σ_{j=1}^{s} n_0j Σ_{i=1}^{r} q_ij { n_0j⁻¹ Σ_k μ_k (f_kij − b_kj) }² = μ' G μ = c' G⁻¹ c .    Q. E. D.
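A minimal numerical sketch of Theorem 2.5.1 (hypothetical counts, not from the monograph): with u = 1, s = 2, r = 2, the single constraint asserts that the first-cell probability is the same in both samples, and χ*² = c' G⁻¹ c reduces to a scalar ratio:

```python
# Theorem 2.5.1 with u = 1, s = 2, r = 2:  H0: p_11 - p_12 = 0, i.e.
# f_111 = 1, f_112 = -1, all other f's and the constant f_1 equal to 0.

n = {1: [20, 30], 2: [30, 20]}         # hypothetical counts n_ij, samples j = 1, 2
n0 = {j: sum(col) for j, col in n.items()}
q = {j: [x / n0[j] for x in col] for j, col in n.items()}

f = {1: [1.0, 0.0], 2: [-1.0, 0.0]}    # f_kij for the single constraint k = 1

# b_kj = sum_i f_kij q_ij ;  c = sum_j b_kj  (f_1 = 0)
b = {j: sum(fi * qi for fi, qi in zip(f[j], q[j])) for j in (1, 2)}
c = b[1] + b[2]

# g = sum_j n_0j^{-1} ( sum_i f_kij^2 q_ij - b_kj^2 )  -- the 1x1 matrix G
g = sum((sum(fi * fi * qi for fi, qi in zip(f[j], q[j])) - b[j] ** 2) / n0[j]
        for j in (1, 2))

chi_star = c * c / g                   # chi*2 = c' G^{-1} c, u = 1 d.f.
print(round(chi_star, 4))              # -> 4.1667
```

This is the familiar two-proportion chi-square with unpooled ('sample') variances, in agreement with Remark (1) below Theorem 2.5.2.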
Similarly we can show that Wald's statistic (2.4.7), to test the hypothesis H_o given by (2.5.1), also reduces to the same expression, χ*², given by (2.5.3).

PROOF:  Here the F_k's are linear, so that F(q) = c and, in view of the conditions Σ_i p_ij = 1, the k-th row of H is

f'_k(q) = [ f'_k1 − f_kr1 1', …, f'_ks − f_krs 1' ],

where f'_kj = [f_k1j, …, f_k(r−1)j] and 1 = [1]_{(r−1)×1}.  Therefore, from (2.4.6),

H C(q) H' = [ Σ_j n_0j⁻¹ (f_kj − f_krj 1)' (Q_j − q_j q'_j)(f_lj − f_lrj 1) ],

where Q_j = diagonal(q_1j, …, q_{r−1,j}) and q'_j = [q_1j, …, q_{r−1,j}].  After simplification, this matrix is seen to be G and the result follows.

Remarks:  (1)
The χ*² statistic (2.5.3), obtained thus in this case either by the minimum-χ₁² method or by Wald's method, is seen to be the same as the statistic we would obtain if we were to test the hypothesis (2.5.1) by considering the natural unbiased estimates b_kj of the Σ_i f_kij p_ij's and using asymptotic normality.  We have

E(c_k) = F_k(p),    cov(c_k, c_l) = Σ_j n_0j⁻¹ ( Σ_i f_kij f_lij p_ij − (Σ_i f_kij p_ij)(Σ_i f_lij p_ij) ).

Hence, when H_o is true, c is asymptotically N(0, Σ) (to be more precise, √N c is asymptotically normal (0, T), where T = [τ_kl] and τ_kl = lim N cov(c_k, c_l)), so that c' Σ⁻¹ c is, in the limit, distributed as a chi-square with u degrees of freedom.  If we replace p_ij in Σ by q_ij, we get G.  Thus the minimum-χ₁² method (or Wald's method) to test the linear hypothesis (2.5.1) is equivalent to the 'large sample test' based on the asymptotic normality of the unbiased estimates of the F_k(p), whose covariance matrix is estimated by the 'sample covariance matrix'.
(2)  If the hypothesis H_o is not linear, say

F_k(p) = 0,    k = 1, 2, …, u,

then Neyman's linearization technique considers the linearized functions

F'_k(p) = F_k(q) + Σ_{j=1}^{s} Σ_{i=1}^{r−1} f_kij (p_ij − q_ij),    f_kij = ( ∂F_k/∂p_ij )_{p=q},

i = 1, 2, …, r−1; k = 1, 2, …, u, and tests the 'linearized' hypothesis H'_o: F'_k(p) = 0, k = 1, 2, …, u.  The χ₁² to test H'_o is then c' G⁻¹ c, with c_k = F'_k(q) = F_k(q) and G given by (2.5.2) with f_krj = 0.  On the other hand, Wald's statistic (2.4.7) is seen to be the same expression c' G⁻¹ c, where f_krj = 0; thus, as mentioned in the last section, Wald's method and Neyman's linearization technique give the same statistic.
In fact we have proved (see Bhapkar (1966)):

THEOREM 2.5.2.  Wald's statistic and the χ₁² statistic, whenever the latter is defined, are algebraically identical.
In the other class of problems, a linear hypothesis is defined by expressing linear functions of the p's as linear functions of unknown parameters.  Theoretically, of course, this can be reduced to the case already considered above, where the hypothesis is defined by linear constraints on the p's.  But, in many cases, this equivalent expression in terms of linearly independent constraints on the p's may be tedious to work out.  We now state theorems which show that the problem can be handled by the generalized least squares technique.
THEOREM 2.5.3.  Let a linear hypothesis be defined by

(2.5.4)    H_o:    Σ_{i=1}^{r} a_i p_ij = d_j1 θ_1 + d_j2 θ_2 + … + d_jt θ_t,    j = 1, 2, …, s,

where the d's and a's are known constants and the θ's are unknown parameters.  Then the χ₁² to test H_o is the same as the minimum sum of squares of residuals obtained by the general least squares technique on the a_j = Σ_i a_i q_ij, with the variances estimated by 'sample variances'; moreover, this asymptotic chi-square statistic has s − v degrees of freedom, where v is the rank of the s × t matrix D = [d_jk].
INDICATION OF THE PROOF:  It can be easily shown that H_o, given by (2.5.4), is equivalent to

(2.5.5)    Σ_{j=1}^{s} m_kj Σ_{i=1}^{r} a_i p_ij = 0,    k = 1, 2, …, s − v,

where M = [m_kj] is a (s − v) × s matrix of rank s − v such that M D = 0.  Let a' = [a_1, …, a_s] and Λ = diagonal [λ_j], with

λ_j = n_0j⁻¹ ( Σ_i a_i² q_ij − (Σ_i a_i q_ij)² ).

Then to test the hypothesis (2.5.5) we have, from (2.5.3), the χ₁² statistic

(2.5.6)    χ*² = a' M' ( M Λ M' )⁻¹ M a .

On the other hand, the a_j's are independent with variances estimated by the 'sample variances' λ_j, j = 1, 2, …, s.  If we minimize the sum of squares

(2.5.7)    S² = Σ_j ( a_j − d_j1 θ_1 − d_j2 θ_2 − … − d_jt θ_t )² / λ_j

with respect to the θ's, the minimized S² can be shown to be (2.5.6).  Thus the result follows.

The above theorem is a special case of the following more general theorem:

THEOREM 2.5.4.  Let a linear hypothesis be defined by

(2.5.8)    Σ_{i=1}^{r} a_ℓi p_ij = d_j1 θ_ℓ1 + d_j2 θ_ℓ2 + … + d_jt θ_ℓt,    ℓ = 1, 2, …, m;  j = 1, 2, …, s,
where D = [d_jk] is s × t with rank D = v < s, the a's and d's are known constants, the θ's are unknown parameters, and the linear functions on the left in (2.5.8) are linearly independent and also linearly independent of the Σ_i p_ij (= 1, of course).  Suppose

a_ℓj = Σ_{i=1}^{r} a_ℓi q_ij,    a'_j = [a_1j, …, a_mj],    G_j = [g_ℓℓ'j],    ℓ, ℓ' = 1, 2, …, m,

with

g_ℓℓ'j = n_0j⁻¹ ( Σ_i a_ℓi a_ℓ'i q_ij − a_ℓj a_ℓ'j ).

Then the χ₁² statistic to test the hypothesis (2.5.8) is equal to the minimum value of the generalized sum of squares of residuals

(2.5.10)    S² = Σ_{j=1}^{s} ( a_j − Θ d_j )' G_j⁻¹ ( a_j − Θ d_j ),    Θ = [θ_ℓk],  d'_j = [d_j1, …, d_jt],

with respect to the θ's, and has m(s − v) degrees of freedom.

The proof of this theorem is similar to, but more complicated than, that of the earlier theorem and, hence, is omitted.  We note here that a_ℓj is a natural unbiased estimate of the linear function on the left in (2.5.8), while G_j is the 'sample covariance matrix' of a_j obtained after replacing the p's by the q's.  S² in (2.5.10) is thus the 'generalized sum of squares of residuals'.  The last two theorems will be seen to be quite useful in later chapters in giving immediately the appropriate χ₁²-statistics.  The first one is useful in the case of structured response, while the latter is applicable even to the unstructured case.
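As a minimal sketch of the generalized least squares technique of Theorem 2.5.3 (all numbers hypothetical, not from the monograph), take s = 3 samples and a single parameter with d_j1 = 1 for all j, so v = 1 and the statistic has s − v = 2 degrees of freedom; the a_j and the 'sample variances' λ_j are taken as given rather than computed from a table:

```python
# Theorem 2.5.3: chi-square as the minimum weighted sum of squares of residuals.

a = [1.0, 1.2, 1.6]                    # a_j = sum_i a_i q_ij (assumed values)
lam = [0.04, 0.05, 0.08]               # lambda_j (assumed 'sample variances')

# Weighted least squares fit of the common mean theta:
w = [1.0 / l for l in lam]
theta_hat = sum(wj * aj for wj, aj in zip(w, a)) / sum(w)

# Minimum S^2 = the chi-square statistic with s - v = 2 d.f.
s2 = sum(wj * (aj - theta_hat) ** 2 for wj, aj in zip(w, a))
print(round(theta_hat, 4), round(s2, 4))   # -> 1.2 3.0
```

The same pattern, with G_j⁻¹ in place of the scalar weights, gives the generalized sum of squares of residuals (2.5.10) of Theorem 2.5.4.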
We shall also need the following more general theorem, which can be established in a similar way.
THEOREM 2.5.5.  Let a hypothesis be defined by

g_η(p) = Σ_{α=1}^{t} d_ηα θ_α,    η = 1, 2, …, P,

where t < P < (r−1)s, the d's are known constants and the g's are independent functions with continuous partial derivatives up to the second order.  Suppose Ĝ is the matrix obtained after substituting q for p in the covariance matrix of g(q), regarding the coefficients of the p's in the linearized forms of g(p) as fixed quantities.  Then the Wald criterion (or the χ₁²-statistic if it exists) to test the hypothesis is the same as the minimum value of

( g(q) − D θ )' Ĝ⁻¹ ( g(q) − D θ ),    D = [d_ηα],

with respect to θ.
This Ĝ will be referred to as the 'asymptotic sample covariance matrix' of g(q) if the g's are not necessarily linear functions, and as the 'sample covariance matrix' if the g's are all linear.
A test criterion is said to be consistent for the hypothesis H_o if the probability of rejecting the null-hypothesis H_o, by using the criterion, tends to one as N (the total sample size on which the criterion is based) tends to infinity, for every fixed level of significance and for every admissible hypothesis inconsistent with H_o.  Test criteria are said to be asymptotically equivalent if the probability of the respective tests contradicting each other tends to zero as N tends to infinity for every admissible hypothesis (irrespective of whether it is consistent or inconsistent with H_o).
From the work of Neyman (1949) and Wald (1943) we have the following theorem:

THEOREM 2.6.1.  All the test criteria mentioned in sections 3 and 4 are consistent for the respective null-hypotheses; moreover, any two of these criteria to test the same null-hypothesis are asymptotically equivalent.
As any of these test criteria is consistent for the respective null-hypothesis H_o, the power of the test tends to one as N tends to infinity for every fixed alternative hypothesis, that is, a hypothesis which is admissible but inconsistent with H_o.  The limiting power can be derived, however, by considering a sequence of alternative admissible hypotheses converging to the null-hypothesis H_o; this asymptotic power, in a way, reflects the performance of the test in the immediate neighborhood of the null-hypothesis.  The results of Mitra (1958) and Diamond (1963) show that under the conditions mentioned in section 3, and under suitable sequences of alternatives (tending to H_o at a suitable rate), the χ²-statistics have limiting noncentral chi-square distributions.  For these details, refer to the appendix.
The noncentrality parameters of these limiting distributions are, in some sense, measures of departure from the hypotheses tested.  We note here that these results have been proved only for the χ²-statistic based on maximum likelihood estimates.  But in view of the fact that these estimates belong to the class of BAN estimates, properties of which are essentially used in these theorems, and that the χ²-statistics belong to the class of asymptotic chi-square statistics, discussed in section 3, which are asymptotically equivalent (at least for every fixed admissible hypothesis), it is conjectured that these theorems hold for the general class of χ²-statistics (discussed in section 3) using any BAN estimates.
Finally, we discuss briefly the concept of asymptotic mutual independence of several tests, that is, of the corresponding test-criteria.  Let {X_N, Y_N, …, Z_N} be a sequence of k-variate random variables such that the sequence of their cumulative distribution functions, {F_N(x, y, …, z)}, converges to a distribution function F(x, y, …, z) as N → ∞.  X_N, Y_N, …, and Z_N are said to be asymptotically mutually independent if

F(x, y, …, z) = F(x, ∞, …, ∞) F(∞, y, …, ∞) … F(∞, ∞, …, z)

for every vector (x, y, …, z) of real numbers.
For some theorems due to Diamond (1963) concerning asymptotic independence of two test criteria, refer to the appendix.  In particular, it has been shown by Neyman (1949) that the test-statistics mentioned in Theorems 2.3.3 and 2.3.4 are asymptotically independent of any of the test-statistics mentioned in the earlier Theorems 2.3.1 and 2.3.2 for testing a hypothesis H01 which, then, becomes a part of the assumed model for testing a subsequent hypothesis H02.
In some situations, a particular hypothesis H_o may be considered as an intersection of several component hypotheses H01, H02, … with respective statistics χ₁², χ₂², … .  It is only if these statistics are asymptotically mutually independent, at least under H_o, that we can consider χ₁² + χ₂² + … as a reasonable test-criterion for H_o.  If χ² is any test-statistic for H_o, then, in general, χ² and χ₁² + χ₂² + … may not be asymptotically equivalent
(to test H_o) unless the component statistics are asymptotically mutually independent.
CHAPTER 3

Some Problems of Association and Symmetry for a Single-Multinomial Distribution
In this chapter we shall consider, in addition to problems of symmetry, some problems of association for categorical data arising from a single-multinomial distribution.  These will be seen to be analogues, appropriate for such a set-up, of the usual problems of association between variables having a joint normal distribution.  We shall consider bivariate and trivariate problems in some detail; the methods can be immediately carried over to general multivariate problems.  First we shall consider the case of unstructured variables and, later, the case where some variables are structured.  Here the letters i, j, k, … will denote categories of different variables.

We start with the simplest case of two variables denoted by the respective categories i and j, i = 1, 2, …, r and j = 1, 2, …, s.
The frequencies n_ij have the probability distribution

(3.1.1)    N! Π_{i,j} p_ij^{n_ij} / Π_{i,j} n_ij! ,

with N = Σ_{i,j} n_ij and Σ_{i,j} p_ij = 1.  The hypothesis of independence of the two variables i and j is expressed by
(3.1.2)    H_o:    p_ij = p_i0 p_0j,    subject to Σ_i p_i0 = Σ_j p_0j = 1,

and the well-known χ²-statistic is given by

(3.1.3)    N Σ_{i,j} ( n_ij − n_i0 n_0j/N )² / ( n_i0 n_0j ),    with (r−1)(s−1) d.f.
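The statistic (3.1.3) can be sketched as a short computation (the 2×2 table below is hypothetical, chosen only for illustration):

```python
# Pearson chi-square for independence, statistic (3.1.3).

def independence_chi2(table):
    """N * sum_(i,j) (n_ij - n_i0 n_0j / N)^2 / (n_i0 n_0j)."""
    N = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return N * sum((table[i][j] - row_tot[i] * col_tot[j] / N) ** 2
                   / (row_tot[i] * col_tot[j])
                   for i in range(len(table)) for j in range(len(table[0])))

chi2 = independence_chi2([[20, 30], [30, 20]])   # (r-1)(s-1) = 1 d.f.
print(chi2)                                       # -> 4.0 (up to rounding)
```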
The hypothesis of symmetry is given by

(3.1.4)    H_1:    p_ij = p_ji,

where the p's are parameters in the model (3.1.1) with r = s.  The χ² statistic is easily seen to be

(3.1.5)    Σ_{i<j} ( n_ij − n_ji )² / ( n_ij + n_ji ),    with r(r−1)/2 d.f.

This statistic has already been suggested by Bowker (1948), while the particular case r = 2 had been considered earlier by McNemar (1947).
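A minimal sketch of the symmetry statistic (3.1.5), for a hypothetical 3×3 table:

```python
# Bowker's symmetry statistic (3.1.5); r = 2 reduces to McNemar's test.

def bowker(table):
    """sum_(i<j) (n_ij - n_ji)^2 / (n_ij + n_ji); r(r-1)/2 d.f."""
    r = len(table)
    return sum((table[i][j] - table[j][i]) ** 2 / (table[i][j] + table[j][i])
               for i in range(r) for j in range(i + 1, r))

n = [[10, 5, 3],
     [2, 10, 4],
     [1, 6, 10]]
stat = bowker(n)                       # r = 3, so 3 d.f. here
print(round(stat, 4))                  # -> 2.6857
```

Note that the diagonal cells n_ii drop out of the statistic, as they carry no information about symmetry.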
The hypothesis of equality of the two marginal distributions is given by

(3.1.6)    H_2:    p_i0 = p_0i,    i = 1, 2, …, r,

where again we have the model (3.1.1) with r = s.  We note here that H_1 implies H_2 but the converse does not hold unless r = 2; that is, H_1 and H_2 are equivalent for r = 2, and for r > 2 H_2 is a weaker hypothesis than H_1.

Let h_i(p) = p_i0 − p_0i, i = 1, 2, …, r−1.  Note that H_2 is equivalently expressed as h_i(p) = 0, i = 1, 2, …, r−1.  By applying Wald's technique Sathe (1962) has derived an asymptotic χ²-statistic
(3.1.7)    N h'(q) Ĝ⁻¹ h(q),    d.f. = r − 1,

where h'(q) = [h_1(q), …, h_{r−1}(q)] and Ĝ = [ĝ_ij] with

ĝ_ij = δ_ij ( q_i0 + q_0i ) − ( q_ij + q_ji ) − ( q_i0 − q_0i )( q_j0 − q_0j ),    i, j = 1, 2, …, r−1,

with δ_ij = 1 if i = j and 0 otherwise.  The statistic is readily seen to be the χ₁² statistic (as it must be, according to Th. 2.5.2), N⁻¹Ĝ being the sample covariance matrix of h(q).  This statistic differs from the one suggested by Stuart (1955), who deletes the last term in the rectangular bracket for ĝ_ij; we prefer the statistic (3.1.7), since Ĝ is a consistent estimator of the covariance matrix of √N h(q) even when H_2 is false, while the one used by Stuart is consistent only if H_2 holds.
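The statistic (3.1.7) can be sketched directly for r = 3, where Ĝ is 2×2 and the quadratic form can be written out by hand (the table below is hypothetical):

```python
# Sathe's marginal-homogeneity statistic (3.1.7) for a hypothetical 3x3 table.

n = [[20, 12, 8],
     [6, 20, 9],
     [10, 7, 8]]
N = sum(sum(row) for row in n)
q = [[x / N for x in row] for row in n]
q_i0 = [sum(row) for row in q]                     # row marginals
q_0i = [sum(col) for col in zip(*q)]               # column marginals

h = [q_i0[i] - q_0i[i] for i in range(2)]          # h_i(q), i = 1, 2

def ghat(i, j):
    """Element (i, j) of G-hat (0-based indices for categories 1, 2)."""
    delta = 1.0 if i == j else 0.0
    return (delta * (q_i0[i] + q_0i[i]) - (q[i][j] + q[j][i])
            - (q_i0[i] - q_0i[i]) * (q_i0[j] - q_0i[j]))

g11, g12, g22 = ghat(0, 0), ghat(0, 1), ghat(1, 1)
det = g11 * g22 - g12 * g12
# chi2 = N * h' G^{-1} h, with r - 1 = 2 d.f.
chi2 = N * (g22 * h[0] ** 2 - 2 * g12 * h[0] * h[1] + g11 * h[1] ** 2) / det
print(round(chi2, 4))                              # -> 0.6081
```

Dropping the products (q_i0 − q_0i)(q_j0 − q_0j) inside ghat gives Stuart's version of the statistic.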
Three-dimensional Table

Let us now consider the case of three variables, denoted by the respective categories i, j and k, with the corresponding probability distribution

(3.1.8)    N! Π_{i,j,k} p_ijk^{n_ijk} / Π_{i,j,k} n_ijk! ,

with N = Σ_{i,j,k} n_ijk and Σ_{i,j,k} p_ijk = 1;  i = 1, 2, …, r;  j = 1, 2, …, s;  k = 1, 2, …, t.
The hypothesis of no partial association between the variables "i" and "j", given the third variable "k", is expressed as
(3.1.9)    H01:    p_ijk = p_i0k p_0jk / p_00k,

subject to Σ_i p_i0k = Σ_j p_0jk = p_00k and Σ_k p_00k = 1.  This can also be described as the hypothesis that i and j are independent in their conditional distribution given k.  There are two equivalent ways to get (3.1.9).  One is to note that the conditional probability of i and j, given k, is p_ijk/p_00k, while the conditional marginal probability of i, given k, is p_i0k/p_00k and that of j is p_0jk/p_00k; this leads to (3.1.9) as the condition that there is no partial association between i and j, given k.  Another way is to start from the statement

(3.1.10)    p_ijk = t_i0k p_0jk,

where the t_i0k's do not depend on j, which means that the conditional distribution of i, given j and k, is independent of j.  Summing the relation p_ijk = t_i0k p_0jk over j, we find that (3.1.10) is equivalent to (3.1.9).

The maximum-likelihood estimates of the parameters on the right side of (3.1.9) are given by p̂_i0k = n_i0k/N, p̂_0jk = n_0jk/N and p̂_00k = n_00k/N.  Using these, we have the χ² test-statistic for H01

(3.1.11)    Σ_{i,j,k} ( n_ijk − n_i0k n_0jk/n_00k )² / ( n_i0k n_0jk/n_00k ).

We observe that the number of unknown independent parameters in the model (3.1.8) is rst − 1 and that in the hypothesis (3.1.9) is rt + st − t − 1; hence the asymptotic χ²-statistic (3.1.11) has d.f. equal to (rst − 1) − (rt + st − t − 1) = (r−1)(s−1)t.

H01 is seen to be the analog of the usual hypothesis of no "partial correlation" in the study of association of normally distributed variables, just as H_o (3.1.2) is of the hypothesis of no "total correlation."
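The stratified statistic (3.1.11) amounts to a Pearson chi-square computed within each layer k and summed; a minimal sketch for a hypothetical 2×2×2 table:

```python
# Statistic (3.1.11): no partial association between i and j, given k.
# The table is stored as a list of r x s layers, one per category k.

def partial_assoc_chi2(layers):
    """sum_(i,j,k) (n_ijk - n_i0k n_0jk / n_00k)^2 / (n_i0k n_0jk / n_00k)."""
    total = 0.0
    for layer in layers:
        n00k = sum(sum(row) for row in layer)
        ri = [sum(row) for row in layer]           # n_i0k
        cj = [sum(col) for col in zip(*layer)]     # n_0jk
        for i, row in enumerate(layer):
            for j, nijk in enumerate(row):
                m = ri[i] * cj[j] / n00k
                total += (nijk - m) ** 2 / m
    return total

chi2 = partial_assoc_chi2([[[10, 5], [5, 10]],
                           [[8, 12], [12, 8]]])    # (r-1)(s-1)t = 2 d.f.
print(round(chi2, 4))                              # -> 4.9333
```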
The hypothesis of independence of the first two variables "i, j" and the third variable "k" is expressed by

(3.1.12)    H02:    p_ijk = p_ij0 p_00k,

where Σ_{i,j} p_ij0 = Σ_k p_00k = 1.  It may also be looked upon as the hypothesis of no multiple association between (i, j) and k or, in other words, as the analog of the hypothesis of no "multiple correlation" in the study of normal variables; (3.1.12) is arrived at starting from either of the relations analogous to those leading to (3.1.9).  The χ²-statistic is seen to be

(3.1.13)    N Σ_{i,j,k} ( n_ijk − n_ij0 n_00k/N )² / ( n_ij0 n_00k ),    d.f. = (rs − 1)(t − 1).

The hypothesis of complete independence of the three variables "i", "j" and "k" is given by

(3.1.14)    H03:    p_ijk = p_i00 p_0j0 p_00k.

The χ²-statistic is seen to be

N² Σ_{i,j,k} ( n_ijk − n_i00 n_0j0 n_00k/N² )² / ( n_i00 n_0j0 n_00k ),    d.f. = rst − r − s − t + 2.
We shall next consider the following three hypotheses that have interesting relations to the ones discussed above.  The hypothesis of pairwise independence of i, j and k is

H04:    p_ij0 = p_i00 p_0j0,    p_i0k = p_i00 p_00k,    p_0jk = p_0j0 p_00k;

the hypothesis of pairwise independence of i and k and also of j and k is

H05:    p_i0k = p_i00 p_00k,    p_0jk = p_0j0 p_00k;

and finally,

H06:    p_ijk = p_ij0 p_i0k p_0jk / ( p_i00 p_0j0 p_00k ),

which, as explained in section 4 later, can be regarded as one possible formulation of the hypothesis of no interaction (second order) among the three variables "i", "j" and "k".

It is easy to check that H02 implies H05, but not conversely, and similarly H03 implies H04, but not conversely.  But we observe that H05 ∩ H06 = H02 (i.e., not only H02 ⟹ H05 and H06, but also H05 and H06 together ⟹ H02), and also H04 ∩ H06 = H03.  Thus, H06 can be regarded as a bridge over the gap between H05 and H02, or between H04 and H03; this is one of the reasons why H06 may be regarded as one formulation of the hypothesis of no interaction among three variables.  Refer to section 4 for further remarks in connection with H06 and, in general, the hypothesis of no interaction of second order among three variables.
I
In principle, for each of the three hypotheses H ' H and H ' J,2_
o4
o6
o5
statistics can be derived, but in practice necessary BAN estimates have to be
obtained by solving some complicated equations.
Alternatively we can employ
Wald's method, or in other words, Neyman's linearization technique.
Examples of asymptotic independence

(i)  Consider the hypothesis H01 together with

H*02:    p_i0k = p_i00 p_00k,
H*03:    p_0jk = p_0j0 p_00k.

It is easily seen that H03 = H01 ∩ H*02 ∩ H*03, and it can be shown (e.g., refer to Mitra (1955)) that, if H03 holds, the χ²-statistics, say χ₁², χ₂*² and χ₃*², with respective d.f. (r−1)(s−1)t, (r−1)(t−1) and (s−1)(t−1), to test the hypotheses H01, H*02 and H*03 respectively, are asymptotically mutually independent.  Hence, if χ̄₃² is a χ²-statistic for H03 with d.f. rst − r − s − t + 2, then χ₁² + χ₂*² + χ₃*² → χ̄₃² in probability under H03.  Thus, though χ₁² + χ₂*² + χ₃*² is a valid statistic for H03, we prefer using χ̄₃², since it arises as a 'natural' statistic for H03 and, moreover, can be computed easily.
(ii)  Consider the hypotheses H*02, H*03, H06 and

H02:    p_ijk = p_ij0 p_00k.

It is seen that H02 = H*02 ∩ H*03 ∩ H06, and it can be shown (Mitra (1955)) that, if H02 holds, the χ²-statistics χ₂*², χ₃*² and χ₆², with the respective d.f. (r−1)(t−1), (s−1)(t−1) and (r−1)(s−1)(t−1), to test H*02, H*03 and H06 respectively, are asymptotically mutually independent.  Hence, if χ̄₂² is a χ²-statistic (say (3.1.13)) for H02, with (rs−1)(t−1) d.f., then χ₂*² + χ₃*² + χ₆² → χ̄₂² in probability if H02 holds.  χ̄₂², χ₂*² and χ₃*² can be computed easily, but a natural χ²-statistic, χ₆², for H06 is not easy to compute.  In the limit as N → ∞, with probability tending to one, χ₆² is equivalent to χ̄₂² − χ₂*² − χ₃*² under H02, and hence in a sense can be 'estimated' by the latter quantity; but the latter 'estimate' is not a 'natural' statistic for H06 and may very well turn out to be even negative.  Moreover, χ̄₂² − χ₂*² − χ₃*² will have an asymptotic chi-square distribution under H02, and not necessarily under H06, and hence is not a completely valid statistic for H06.  In this connection note also Plackett's (1962) remarks quoted in section 4 of this chapter.
The hypothesis of symmetry of "i" and "j" for each k is given by

H07:    p_ijk = p_jik,

where it is assumed, of course, that r = s in the model (3.1.8).  Note here that H07 implies p_ij0 = p_ji0, and this weaker hypothesis can be tested by using the statistic (3.1.5) (with n_ij replaced by n_ij0, etc.) with r(r−1)/2 d.f.  A proper statistic for the stronger hypothesis H07 is

Σ_{k=1}^{t} Σ_{i<j} ( n_ijk − n_jik )² / ( n_ijk + n_jik ),    d.f. = rt(r−1)/2.
The hypothesis of complete symmetry of "i", "j" and "k" is given by

H08:    p_ijk = p_ikj = p_jik = p_jki = p_kij = p_kji,

with r = s = t in the model (3.1.8).  The χ² statistic is seen to be

6 Σ [ n_ijk − ( n_ijk + n_ikj + n_jik + n_jki + n_kij + n_kji )/6 ]² / ( n_ijk + n_ikj + n_jik + n_jki + n_kij + n_kji ),

the sum extending over all (i, j, k) with i, j, k distinct, plus

3 Σ_{i≠k} { [ n_iik − T_ik/3 ]² + [ n_iki − T_ik/3 ]² + [ n_kii − T_ik/3 ]² } / T_ik,    T_ik = n_iik + n_iki + n_kii,

with r(r−1)(5r+2)/6 degrees of freedom.
The hypothesis of equality of the three marginal distributions is

H09:    p_i00 = p_0i0 = p_00i,    i = 1, 2, …, r,

assuming, again, r = s = t in the model (3.1.8).  It is equivalently expressed as

p_i00 = p_0i0 = p_00i = θ_i,    i = 1, 2, …, r−1,

where the θ's are unspecified, and the χ₁² statistic can be derived easily by the generalized least squares technique.
Let

q'_1 = [q_100, q_200, …, q_{r−1,0,0}],    q'_2 = [q_010, q_020, …, q_{0,r−1,0}],    q'_3 = [q_001, q_002, …, q_{0,0,r−1}],

and q' = [q'_1, q'_2, q'_3].  Let

Λ_11 = diagonal ( q_α00, α = 1, 2, …, r−1 ),    Λ_12 = [q_ij0],  i, j = 1, 2, …, r−1,  etc.,

and finally

N A = [ Λ_αβ − q_α q'_β ],    α, β = 1, 2, 3.

Note that A is the 'sample covariance matrix' of q and, hence, is nonsingular almost everywhere, excluding, of course, the degenerate case where some of the variables (or rather the associated probabilities) are linear functions of the remaining ones.  The χ₁² statistic is then seen to be the minimum, with respect to θ' = [θ_1, …, θ_{r−1}], of the generalized sum of squares of residuals

( q − D θ )' A⁻¹ ( q − D θ ),    D' = [ I_{r−1}  I_{r−1}  I_{r−1} ],

with d.f. = 2(r−1).  The method can be immediately extended to the case of k variables; the statistic then has (k−1)(r−1) degrees of freedom.  Cochran (1950) has offered a statistic for the general k-variate problem only for the special case r = 2; even for this case our statistic, though slightly complicated from the computational point of view, is expected to be asymptotically more efficient.
2.  One Structured Variable

Let us suppose that the first variable "i" is structured, with the a_i's as the corresponding scores.  As mentioned already, it is possible to ask the same questions and handle them the same way in the structured case as in the unstructured case.  However, some other kinds of questions, which can be posed only in the structured case, may be more relevant from the point of view of the experimenter.

Consider the case of two variables, represented by (3.1.1), with scores a_i for the first variable "i".  The hypothesis of independence of "average i" and j is expressed by
L: a.p . . /p
Ie
I
I
I
I
I
I
I
••I
i
l
J..J
0
j
is independent of j, say t**{ unlmown)
which states that the conditional mean of the first variable (taking value a.
l
over the
1
th category), given j, is independent of j.
It is easy to see that
. p .) = O. Thus, the hypothesis of
(3.2.1) is equivalent to,L:.J.. a.(Pi'l
J PlO oJ
independence
(3.1.2) implies 'mean-independence' characterized by (3.2.1) but
not vice-versa.
(3.2.1) even
if
The experimenter may very well be interested in testing just
().1.2) is not true. Test statistic for (3.2.1) can be worked
out by the usual methods; a statistic based on the conditional distribution
method, discussed later in this chapter, is offered by
(4.1.6) in the next
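As a small numerical sketch (with hypothetical probabilities and scores, not taken from the text), the check of 'mean-independence' can be carried out directly from the defining contrasts Σ_i a_i (p_ij − p_io p_oj): the table below satisfies (3.2.1) without satisfying full independence (3.1.2).

```python
import numpy as np

def mean_independence_contrasts(p, a):
    """For an r x s probability table p and scores a, return the
    quantities sum_i a_i (p_ij - p_io * p_oj), one per column j.
    All zero <=> the conditional mean of the scored variable,
    given j, does not depend on j (hypothesis (3.2.1))."""
    p = np.asarray(p, dtype=float)
    a = np.asarray(a, dtype=float)
    p_io = p.sum(axis=1)                  # row marginals p_io
    p_oj = p.sum(axis=0)                  # column marginals p_oj
    return a @ (p - np.outer(p_io, p_oj))

# Hypothetical 3x2 table: both columns have conditional mean 1.0 under
# scores a = (0, 1, 2), yet their conditional distributions differ,
# so (3.2.1) holds while independence (3.1.2) fails.
a = [0.0, 1.0, 2.0]
p = [[0.10, 0.15],
     [0.30, 0.20],
     [0.10, 0.15]]
contrasts = mean_independence_contrasts(p, a)   # both entries vanish
deviation = np.asarray(p) - np.outer(np.sum(p, axis=1), np.sum(p, axis=0))
```

Here the contrasts vanish although `deviation` (the departure from independence) does not, illustrating that (3.2.1) is strictly weaker than (3.1.2).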
Three-dimensional Table

Consider the case of three variables, with the distribution given by (3.1.8). Hypothesis of no partial association between "average i" and j, given k, is expressed by

(3.2.2)    Σ_i a_i p_ijk / p_ojk is independent of j, say t_**k (unknown),

which states that the conditional mean of the first variable, given j and k, is independent of j (the category of the second variable). It is easy to see that (3.2.2) is equivalent to Σ_i a_i (p_ijk − p_iok p_ojk / p_ook) = 0; thus no partial association between i and j, given k, represented by hypothesis H_o1 (3.1.9), implies (3.2.2), but not vice versa. The experimenter may be interested in testing only (3.2.2) and not necessarily (3.1.9). The statistic for testing (3.2.2) can be worked out; a statistic based on the conditional distribution method is given by (4.3.4) with q_ijk = n_ijk / n_ojk.
Hypothesis of independence of "average i" and (j,k) is expressed by

(3.2.3)    Σ_i a_i p_ijk / p_ojk is independent of j and k, say t_** (unknown).

It is equivalent to Σ_i a_i (p_ijk − p_ioo p_ojk) = 0, and thus independence of i and (j,k) (given by H_o2 with a change of variables) implies (3.2.3) but not vice versa. A conditional-distribution statistic to test (3.2.3) is given by (4.3.2), with q_ijk = n_ijk / n_ojk.
Hypothesis of no interaction between variables "j" and "k" with respect to the "average i" is expressed by

(3.2.4)    Σ_i a_i p_ijk / p_ojk = t_*j* + t_**k ,

where the t's are unknown functions; a statistic based on the conditional distribution method is given by (4.3.6) with q_ijk = n_ijk / n_ojk. The hypothesis is in the spirit of the hypothesis of no interaction in ANOVA and states that the conditional mean of the first variable, given the second and third variables, depends on their categories only through the additive relation given by the right-hand side of (3.2.4).
3. Two or More Structured Variables
In this case, we can not only pose questions as in Sections 1 and 2
above, but we can ask questions of a more general nature and analyze the data
more meaningfully.
Two-dimensional Table
Consider again the set-up (3.1.1) with both variables "i" and "j" structured and with {a_i} and {b_j} as the respective scores. If the hypothesis (3.2.1) of independence of "average i" and j is disproved, in this case we can go ahead and ask the question whether the regression of the first variable on the other is linear and, if so, we can estimate the regression coefficient.

Hypothesis of linearity of regression of i on j:

(3.3.1)    Σ_i a_i p_ij / p_oj = λ + μ b_j .

A conditional statistic for testing (3.3.1) is offered by (4.1.8) with q_ij = n_ij / n_oj.
Hypothesis of equality of means of the two variables is expressed by

(3.3.2)    Σ_i a_i p_io = Σ_j b_j p_oj ,

and the χ₁² statistic is c²/g with 1 degree of freedom.
Consider the 'symmetrical' case with r=s and a_i = b_i; the hypothesis of equality of means is then seen to be much weaker than H_2, the hypothesis of equality of the two marginal distributions given by (3.1.6), so that the hypothesis of equality of means might be of interest when the stronger assertion H_2 might not hold.
Consider again the set-up (3.1.8) with variables "i" and "j" structured and {a_i}, {b_j} as the corresponding scores. In this case, we can not only answer questions of type (3.2.2), (3.2.3) and (3.2.4), but, if these hypotheses are rejected, we can even explore the kind of dependence that exists between the mean "i" and the level of j. We can test, for example, the following hypotheses:

Hypothesis of linearity of regression of i on j:

(3.3.3)    Σ_i a_i p_ijk / p_ojk = λ_k + μ_k b_j ;

Hypothesis of linearity of regression of i on j and of equality of regression coefficients:

(3.3.4)    Σ_i a_i p_ijk / p_ojk = λ_k + μ b_j ;

Hypothesis of linearity of regression of i on j and of equality of regressions:

(3.3.5)    Σ_i a_i p_ijk / p_ojk = λ + μ b_j .
We can test (3.3.4) or (3.3.5), starting from either (3.1.8) or (3.3.3) (of course together with (3.1.8)) as a model, by the methods discussed in Chapter 2. Thus, if χ₁², χ₂² and χ₃² are any valid χ²-statistics to test (3.3.3), (3.3.4) and (3.3.5) respectively, χ₂² − χ₁² can be used as an appropriate test-statistic for the hypothesis μ_1 = μ_2 = … = μ_t, starting from (3.3.3) as the model; similarly, χ₃² − χ₁² can be used for the hypothesis μ_1 = μ_2 = … = μ_t, λ_1 = λ_2 = … = λ_t, with (3.3.3) as the model, and χ₃² − χ₂² can be used to test the hypothesis λ_1 = λ_2 = … = λ_t with (3.3.4) as the model.

Statistics based on conditional distributions are offered by (4.4.2), (4.4.4) and (4.4.8) for testing (3.3.3), (3.3.4) and (3.3.5) respectively, with q_ijk = n_ijk / n_ojk.
Let us now consider the case where all three variables "i", "j" and "k" are structured. Suppose the scores are {a_i}, {b_j} and {c_k} respectively. Here, in addition to the hypotheses considered above, we can test the

Hypothesis of linearity of regression of i on j and k, expressed by

(3.3.6)    Σ_i a_i p_ijk / p_ojk = λ + μ b_j + ν c_k .
A statistic based on the conditional method is given by (4.5.2) with q_ijk = n_ijk / n_ojk.
In the study of association among different variables, when at least two are structured, we have been working in the spirit of regression (i.e., prediction) rather than in the spirit of correlation. In other words, as mentioned in Chapter 1, we have not been trying to use a single measure for any of the various types of association. The tests of various kinds of independence have corresponding non-centrality parameters which can be interpreted as a single measure of association for each case. However, for the structured case we can do much better and investigate, at least partly, the kind of association. For various possible measures of association between different variables, the reader is referred to the series of papers by Goodman and Kruskal ([31], [32] and [33]).
Hypothesis of equality of means of three structured variables is

(3.3.7)    Σ_i a_i p_ioo = Σ_j b_j p_ojo = Σ_k c_k p_ook ,

with {a_i}, {b_j} and {c_k} as the corresponding scores for the categories of the three variables. This is a linear hypothesis and the χ₁² statistic is immediately available by the generalized least squares technique. Let A = [A_αβ] with, for example,

N A_12 = (Σ_{i,j} a_i b_j q_ijo) − y_1 y_2 , etc.,

α, β = 1, 2, 3. Then the statistic, obtained by the generalized least squares technique using A⁻¹, has d.f. = 2.
Remarks similar to the ones made in connection with H_o9 in Section 1 can be made here also. This method, too, can be immediately extended to the case of k structured variables; the statistic then has k−1 degrees of freedom. For the 'symmetrical' case with r=s=t and a_i = b_i = c_i, this hypothesis (3.3.7) is weaker than H_o9 given by (3.1.23) and may be of interest when the latter does not hold.
4. Interaction among Three Variables
H_o6 in Section 1 implies

H*_o6 :    p_ijk = t_ij* t_*jk t_i*k .

A hypothesis of no interaction for three variables essentially means that p_ijk (with three subscripts) can be meaningfully expressed in terms of quantities each depending on at most two subscripts. One possible way of doing this is offered by H*_o6; the other might be, e.g.,

H**_o6 :    p_ijk = t_ij* + t_*jk + t_i*k .

H*_o6 may be looked upon as a hypothesis of no interaction in the multiplicative set-up, while the other is one in the additive set-up. The meaningful interpretation behind H_o6, given by (3.1.18), is already explained in Section 1 and affords at least a partial justification for preferring either H_o6 or H*_o6 (which is implied by H_o6) to H**_o6. The multiplicative set-up H*_o6 of the no-interaction hypothesis seems to be more meaningful for the probabilities p's, just as the additive set-up seems meaningful for the 'average' values of response in the traditional normal ANOVA as well as in the categorical set-up, as proposed by (3.2.4) in this chapter and also in the later chapters.

The hypothesis of no interaction in the form H_o6 given by (3.1.18), however, is not mathematically very convenient to handle because of the conditions that follow from (3.1.18). As noted earlier, H_o6 implies that p_ijk be of the form t_ij* t_*jk t_i*k, with t's defined suitably. Roy and Kastenbaum [1956], therefore, decided to work with

(3.4.1)    H*_o6 :    p_ijk = t_ij* t_*jk t_i*k

as the hypothesis of no interaction among three variables. They showed that H*_o6 also can be looked upon as a bridge over the gap between the hypothesis H_o3 of complete independence and the hypothesis H_o4 of pairwise independence, or over the gap between the hypothesis H_o2 of independence of (i,j), k and the hypothesis H_o5 of pairwise independence of i, k and of j, k; hence H*_o6 also can be regarded as a reasonable hypothesis of no interaction. It can be
shown that H*_o6 is equivalent to the (r−1)(s−1)(t−1) independent constraints

(3.4.2)    (p_ijk p_rsk) / (p_isk p_rjk) = (p_ijt p_rst) / (p_ist p_rjt) ,
           i = 1, 2, …, r−1;  j = 1, 2, …, s−1;  k = 1, 2, …, t−1.

So any χ²-statistic to test H*_o6 will have (r−1)(s−1)(t−1) d.f.
To obtain the maximum-likelihood estimates we use the method of Lagrangian multipliers; let

f(p) = Σ_{i,j,k} n_ijk log p_ijk + λ (Σ_{i,j,k} p_ijk − 1)
       + Σ_{i=1}^{r−1} Σ_{j=1}^{s−1} Σ_{k=1}^{t−1} μ_ijk (log p_isk + log p_ijt + log p_rjk + log p_rst − log p_ijk − log p_rsk − log p_ist − log p_rjt),

where λ and the μ's are the multipliers corresponding to the constraints Σ_{i,j,k} p_ijk = 1 and (3.4.2) respectively.
Equating the derivatives of f(p) to zero, we get the equations

(3.4.3)    (n_ijk − μ_ijk)(n_rsk − μ_rsk)(n_ist − μ_ist)(n_rjt − μ_rjt)
         = (n_isk + μ_isk)(n_ijt + μ_ijt)(n_rjk + μ_rjk)(n_rst + μ_rst) ,
where

μ_ijt = Σ_{k=1}^{t−1} μ_ijk ,   μ_isk = Σ_{j=1}^{s−1} μ_ijk ,   μ_ist = Σ_{j=1}^{s−1} μ_ijt ,
μ_rjk = Σ_{i=1}^{r−1} μ_ijk ,   μ_rjt = Σ_{i=1}^{r−1} μ_ijt ,   μ_rsk = Σ_{i=1}^{r−1} μ_isk ,   μ_rst = Σ_{i=1}^{r−1} μ_ist .
An iterative method for solving (3.4.3), and the programming for a computer, is discussed in [38]. Substituting these, the χ²-statistic is seen to be (Roy and Kastenbaum [1956])

(3.4.4)    Σ_{i=1}^r Σ_{j=1}^s Σ_{k=1}^t μ̂²_ijk / (n_ijk + δ_ijk μ̂_ijk) ,

where δ_ijk = +1 if ijk = rst (the pivotal subscripts), and = −1, +1, −1 according as just one, two, or three subscripts differ from the corresponding pivotal subscripts, respectively.
For the special case r=s=t=2, (3.4.3) reduces to just one cubic equation

(3.4.5)    (n_112 + Δ)(n_121 + Δ)(n_211 + Δ)(n_222 + Δ) = (n_111 − Δ)(n_122 − Δ)(n_212 − Δ)(n_221 − Δ) ,

having, therefore, at least one real root. Even when all the roots are real, it can be shown that the numerically smallest root leads to the maximum-likelihood estimate.
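As a numerical sketch of this 2×2×2 computation (with hypothetical counts; the sign pattern δ_ijk is the one defined under (3.4.4), with pivot cell 222), one can expand both sides of (3.4.5) as polynomials in Δ, take the numerically smallest real root, and form the fitted counts n_ijk + δ_ijk Δ̂, which then satisfy Bartlett's condition exactly:

```python
import numpy as np

def fit_no_interaction_2x2x2(n):
    """Solve the cubic (3.4.5) for Delta and return (Delta, fitted counts).

    n: 2x2x2 array of cell counts, 0-based index (i-1, j-1, k-1).
    delta is +1 on cells differing from the pivot 222 in zero or two
    subscripts, -1 otherwise, so n + delta*Delta keeps every two-way
    margin fixed."""
    n = np.asarray(n, dtype=float)
    cells = [(i, j, k) for i in range(2) for j in range(2) for k in range(2)]
    delta = np.array([[[1 if (i + j + k) % 2 == 1 else -1
                        for k in range(2)] for j in range(2)] for i in range(2)])
    # product over +cells of (Delta + n) minus product over -cells of (n - Delta);
    # the Delta^4 terms cancel, leaving a cubic polynomial in Delta.
    lhs = np.poly1d([1.0])
    rhs = np.poly1d([1.0])
    for c in cells:
        if delta[c] == 1:
            lhs = lhs * np.poly1d([1.0, n[c]])
        else:
            rhs = rhs * np.poly1d([-1.0, n[c]])
    g = lhs - rhs
    roots = np.atleast_1d(g.roots)
    real = roots[np.abs(roots.imag) < 1e-9].real
    d = real[np.argmin(np.abs(real))]        # numerically smallest real root
    return d, n + d * delta

n = np.array([[[10., 20.], [30., 40.]],      # hypothetical counts n_ijk
              [[15., 25.], [35., 100.]]])
d_hat, m = fit_no_interaction_2x2x2(n)
chi2 = (d_hat ** 2 / m).sum()                # (3.4.4): all mu-hats equal Delta-hat
```

The fitted table `m` reproduces the observed two-way margins and has equal conditional odds ratios in the two k-slices; `chi2` is the value of (3.4.4) for this table.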
The hypothesis of no interaction in categorical data has been discussed quite a lot in the literature (refer to, for example, [28], [48], [54], [60], [63]). The hypothesis of zero interaction (of the second order) has been framed in various ways and different test procedures have been suggested. Consider a 2×2×2 table with 2 categories for each of the three variables. In a two-dimensional table for two variables, the first-order interaction may be said to be zero when the variables are independent. But in defining zero second-order interaction in a three-dimensional table (for three variables "i", "j" and "k") we can allow "i" and "j" to be associated, provided the degree of association is the same for each category k of the third variable. Various measures of association between two variables have been proposed for data arranged in a 2×2 table, and the use of different measures will naturally lead to different definitions of interaction for three variables in a 2×2×2 table. Accordingly, Simpson [1951] introduced a function W such that W(p_111, p_121, p_211, p_221) measured the degree of association between "i" and "j" with k=1. The condition for zero second-order interaction is then

W(p_111, p_121, p_211, p_221) = W(p_112, p_122, p_212, p_222) .

For consistency, this equation must be equivalent to

W(p_111, p_112, p_211, p_212) = W(p_121, p_122, p_221, p_222)

and to the analogous equation with the roles of the first two variables interchanged.
Bartlett [1935] suggested the function

W(p_11k, p_12k, p_21k, p_22k) = (p_11k p_22k) / (p_12k p_21k) .

This leads to consistency, and his condition of zero second-order interaction is

(3.4.6)    (p_111 p_221) / (p_121 p_211) = (p_112 p_222) / (p_122 p_212) .

It is seen that the hypothesis of zero second-order interaction (for three variables), offered in the form H*_o6, or equivalently in the form of the constraints (3.4.2), by Roy and Kastenbaum [1956] for an r×s×t table, is a straightforward extension of Bartlett's criterion (3.4.6). One may define the function Φ analogously as another measure of association; the condition of no second-order interaction will then become the equality of Φ over the categories of the third variable, and is seen to be consistent. The natural extension for three variables arranged in an r×s×t table would give the corresponding equalities for i = 1,2,…,r−1, j = 1,2,…,s−1 and k = 1,2,…,t−1 as the conditions for no second-order interaction.
These conditions can be seen to be equivalent to the hypothesis mentioned already that has been called the hypothesis of no interaction in the additive sense. The χ₁² statistic to test this linear hypothesis can be obtained by the methods already discussed; but we prefer the formulation H*_o6 as the hypothesis of no second-order interaction, in view of the physical interpretation behind it, as already pointed out in the second paragraph of this section, and also for reasons discussed immediately later in this section.
As noted by Simpson, there cannot be a unique way of defining either interaction or the hypothesis of no interaction in a contingency table with three or more responses. Corresponding to every reasonable measure of association between the variables (i,j), given k, there will exist a measure of interaction between the three variables i, j, k, giving rise to a corresponding formulation of the hypothesis of zero interaction. Denoting by p*_ijk the conditional probability p_ijk/p_ook of (i,j), given k, Bartlett's condition corresponds to

(p*_11k p*_22k) / (p*_12k p*_21k) = (p_11k p_22k) / (p_12k p_21k) ,   k = 1, 2,

as the relevant measure of association between the first two responses in a 2×2 table, given k. For a general r×s table, given k, one can think of

(p*_ijk p*_i'j'k) / (p*_ij'k p*_i'jk) = (p_ijk p_i'j'k) / (p_ij'k p_i'jk)
as a measure of association between the first two responses with respect to the pairs of categories (i,i') and (j,j'). The condition that this be the same for all k leads to the constraints suggested by Roy and Kastenbaum [1956]. An alternative measure, for example, could be

(p_ijk p_ook) / (p_iok p_ojk)

(or rather its logarithm); if the conditional distributions of i and j are independent, given k, then the above ratio is one. The corresponding hypothesis of no interaction would be

(3.4.7)    (p_ijk p_ook) / (p_iok p_ojk) is independent of k .

It will be noted that H_o6 satisfies this condition; on the other hand, (3.4.7) is also seen to be of the form

p_ijk = t_ij* t_*jk t_i*k .

(3.4.7) also suffers from a drawback, similar to that of H_o6, in that mathematically it is not very convenient to handle because of the existence of side conditions. Moreover, (3.4.7) is not symmetric in i, j and k, as one would require of the hypothesis of no interaction among three variables. Thus H*_o6 (or, equivalently, the constraints (3.4.2) that follow) seems to offer the most suitable way of formulating the hypothesis.
If χ²_(12), χ²_(23), χ²_(31) and χ²_(123) denote the χ²-statistics for testing the independence of i and j, of j and k, of k and i, and finally of i, j and k, then Lancaster [1951] proposed χ*² = χ²_(123) − χ²_(12) − χ²_(23) − χ²_(31) to test the hypothesis of zero second-order interaction and showed that the statistic has asymptotically a chi-square distribution if i, j and k are independent. However, Plackett [1962] shows that the statistic is not valid (i.e., not necessarily asymptotically chi-square distributed) under the conditions (3.4.2), and also that χ²_(31) + χ²_(23) + χ*² is not a valid criterion to test the independence of k and (i,j). As pointed out earlier in example (ii) of Section 1, a χ²-statistic (say χ²_0) obtained to test (3.4.1), i.e., the hypothesis of zero second-order interaction as formulated by Roy and Kastenbaum, will have the reasonable property that χ²_(31) + χ²_(23) + χ²_0 is asymptotically valid for testing that k is independent of (i,j).
It may be noted here that H_o2 <=> H_o3 ∩ H*_o1 in the notation of example (ii) of Section 1, where H*_o1 : p_ijo = p_ioo p_ojo. Moreover, it can be shown that, under H_o3, the component statistics are asymptotically independent, so that, from example (ii) in Section 1, it follows that χ²_(123) → χ²_(12) + χ²_(23) + χ²_(31) + χ²_6 in probability if H_o3 holds, i.e., if the three variables are mutually independent. Thus χ²_6 − [χ²_(123) − χ²_(12) − χ²_(23) − χ²_(31)] → 0 in probability if the three variables are independent. Thus Lancaster's statistic for the no-interaction hypothesis and any valid χ²-statistic for H*_o6 are asymptotically equivalent only under the stronger condition that the three variables be independent, and not necessarily so under H*_o6; moreover, it requires that H*_o1, H*_o2 and H*_o4 jointly (i.e., H_o3) hold, in addition to H*_o6, in order that it have an asymptotic chi-square distribution.
Another method has been suggested by Woolf [1955], Plackett [1962] and Goodman [1963a] to test the hypothesis of no interaction. For the case of
For the case of
three variables, the hypothesis of no interaction is essentially proposed in
the form (3.4.2) or, eCluivalently, in the form
b rst + b rJ'k + b,1S k + b,1J't - (b,1S t + b,1J'k + b rs k + b rJ't)
i=1,2,
j=1,2,
k=1,2,
where b ijk = log Pijk and so on.
maximum likelihood estimates
=0
·.. , r-l
·.. , s-l
·.. , t-l
An asymptotic test is then based on the
~,1J'k = log (n,1J'kiN). The asymptotic covariance
matrix of the b̂_ijk's is estimated by using the sample proportions (n_ijk/N) for the probabilities p_ijk, and thus also the sample variance (or covariance matrix) of the m.l.e. of an 'interaction contrast' of the type (3.4.8) (or of a set of u given interaction contrasts, u ≤ (r−1)(s−1)(t−1)). A limiting χ²-statistic is then obtained by making use of the asymptotic normality of these interaction contrast estimates. For the special case r=s=t=2, the statistic comes out to be (3.4.9), with 1 d.f. The advantage of the expression (3.4.9) is that no equations have to be solved to calculate a χ²-statistic, while with the earlier method one cubic equation, viz. (3.4.5), has to be solved to obtain the numerically smallest root. To determine which test is more accurately approximated by the chi-square probabilities when the sample size N is moderate, further research is needed. (This is also true of most of the alternate χ²-statistics proposed for various problems in categorical data.)
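As an illustrative sketch of this route for the 2×2×2 case (with hypothetical counts, and presented only as the Woolf-type computation the paragraph describes, not as the text's exact expression (3.4.9)): the estimated interaction contrast Σ δ_ijk log n_ijk has estimated variance Σ 1/n_ijk, so no equations need be solved.

```python
import math

def interaction_contrast_chi2(n):
    """Woolf/Plackett-type statistic for a 2x2x2 table n (nested lists,
    0-based indices): L = signed sum of log n_ijk over the eight cells
    (the difference of the two conditional log odds ratios, up to sign),
    var = sum of 1/n_ijk, and chi2 = L^2 / var with 1 d.f."""
    L, var = 0.0, 0.0
    for i in range(2):
        for j in range(2):
            for k in range(2):
                sign = 1 if (i + j + k) % 2 == 1 else -1
                L += sign * math.log(n[i][j][k])
                var += 1.0 / n[i][j][k]
    return L * L / var

# A hypothetical table with zero second-order interaction: n_ijk = r_ij * s_k,
# so both conditional odds ratios equal (r_11 r_22)/(r_12 r_21).
r = [[2.0, 3.0], [4.0, 5.0]]
s = [5.0, 7.0]
n0 = [[[r[i][j] * s[k] for k in range(2)] for j in range(2)] for i in range(2)]
chi2_null = interaction_contrast_chi2(n0)        # essentially zero

n1 = [[[n0[i][j][k] for k in range(2)] for j in range(2)] for i in range(2)]
n1[0][0][0] *= 3.0                               # introduce interaction
chi2_alt = interaction_contrast_chi2(n1)
```

The statistic vanishes on the no-interaction table and becomes sizable once one cell is perturbed, mirroring the large-sample behavior described above.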
Reference may be made to some other statistics proposed by Goodman [1964b]; these are based on the 'conditional method' described in Section 5.

For the case of more than three variables, the method of Bartlett [1935] and Roy and Kastenbaum [1956] has been further developed by Good [1963], while the method of Woolf [1955] and Plackett [1962] has been extended by Goodman [1964a]. The reader is referred to these for the case of more than three variables. It may be mentioned in passing, as noted by Goodman [1964a], that the earlier method is essentially the Lagrange-multiplier method (based on restricted maximum likelihood estimation of the p's), while the latter is, essentially, Wald's method (given in Theorem 4.1 of Chapter 2) based on unrestricted maximum likelihood estimation of the p's (see Aitchison and Silvey [1960]).

For further discussion regarding the hypotheses of no interactions of various orders, refer to Bhapkar and Koch [1965 a,b].
5. Conditional Distribution Approach

Some of these association problems can be tackled by using conditional distributions and thus, in effect, by reducing them to the 'analysis of variance' problems discussed in the next chapter. This point had already been noted by Bartlett [1937] in connection with two-dimensional contingency tables. We shall illustrate this method by considering the case of three variables. The probability distribution

φ = [ N! / Π_{i,j,k} n_ijk! ] Π_{i,j,k} p_ijk^{n_ijk}    (Σ_{i,j,k} p_ijk = 1)

can be written as

φ = φ₁ φ₂ ,

where

φ₁ = [ N! / Π_{j,k} n_ojk! ] Π_{j,k} p_ojk^{n_ojk} ,   φ₂ = Π_{j,k} [ (n_ojk! / Π_i n_ijk!) Π_i (p_ijk / p_ojk)^{n_ijk} ] ,

say. Let p_ijk / p_ojk = p*_ijk, so that Σ_i p*_ijk = 1. Then φ₂ denotes the conditional probability distribution of the n_ijk's, given the n_ojk's. Note that the number of independent parameters in φ, φ₁ and φ₂ is rst−1, st−1 and st(r−1) respectively. We may consider the parameters p*_ijk's and p_ojk's instead of the p_ijk's. Then it is quite reasonable to require that the hypotheses which are expressed in terms of the p*_ijk's only should be tested by criteria based on φ₂ only. If this principle is followed, the test on the p*_ijk's only will be the same as that on the p_ijk's if "j" and "k" are factors. We shall illustrate by considering three simple examples.
(i) Hypothesis of independence of two variables,

H_o :    p_ij = p_io p_oj ,

is equivalent to the hypothesis

H*_o :    p*_ij is independent of j ,

that is, the hypothesis of homogeneity when "j" is a factor (see 4.1). It is well known that

N Σ_{i,j} (n_ij − n_io n_oj / N)² / (n_io n_oj) ,   d.f. = (r−1)(s−1),

can be used to test both H_o and H*_o.
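A quick sketch of this familiar computation (with hypothetical counts):

```python
import numpy as np

def pearson_chi2(n):
    """N * sum_ij (n_ij - n_io n_oj / N)^2 / (n_io n_oj): the chi-square
    statistic with (r-1)(s-1) d.f. for testing independence (H_o) or,
    equivalently, homogeneity (H*_o) when "j" is a factor."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    e = np.outer(n.sum(axis=1), n.sum(axis=0)) / N   # expected counts n_io n_oj / N
    return ((n - e) ** 2 / e).sum()

stat = pearson_chi2([[20, 30], [40, 10]])            # equals 50/3 here
zero = pearson_chi2([[10, 20], [20, 40]])            # proportional rows give 0
```

The second call shows that the statistic vanishes exactly when the observed table already has the product form n_io n_oj / N.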
(ii) Hypothesis of no partial association between the variables "i" and "j", given the third variable "k", as given by (3.1.9), is already seen to be equivalent to (3.1.10): p*_ijk is independent of j, where "j" and "k" are factors (see 4.2.4). As a result, it has been observed (Mitra [1955]) that (3.1.11) may be used as a χ²-statistic with (r−1)(s−1)t degrees of freedom to test both H_o1 and H*_o1.
(iii) Hypothesis of independence of the variable "i" and the variables "j,k" is given by

H_o2 :    p_ijk = p_ioo p_ojk ,

i.e., (3.1.12) with a slightly different notation. It is equivalent to H*_o2 : p*_ijk is independent of j and k, with "j" and "k" factors (see (4.2.2)); then

N Σ_{i,j,k} (n_ijk − n_ioo n_ojk / N)² / (n_ioo n_ojk)

can be used as a χ²-statistic with (r−1)(st−1) degrees of freedom to test both H_o2 and H*_o2.
The use of such a conditioning principle is perfectly valid in the sense that the unconditional probability of rejection of the null hypothesis, say H_o, remains less than or equal to α when H_o is true, if the conditional probability of rejecting H_o, when H_o is true, is less than or equal to α for every set of values of the conditioning variables. Thus, in the context discussed above,

P{ H_o is rejected | H_o , [n_ojk] } ≤ α

for every set of marginals n_ojk implies

P{ H_o is rejected | H_o } ≤ α .

Hence any conditional 'exact' test of size α for every set of values of the conditioning variables remains an 'exact' test of size α even unconditionally for the hypothesis under consideration; obviously, the conditional test is only approximately of size α for the hypothesis H_o, unconditionally, if it is so in the conditional set-up. In general, we cannot say whether such a conditional test is optimal, in some sense, unless the conditioning variables happen to be sufficient statistics for the 'nuisance' parameters and the conditional distribution involves only the parameters of interest. These requirements are satisfied in the categorical problems being discussed under the product-multinomial probability model. It can then be shown that for some simple problems optimal conditional tests are optimal even unconditionally; for example, refer to Lehmann [(1959), p. 134]. That it is so for the simple 2×2 table was already shown by Tocher [1950], when he pointed out that the one-sided conditional Fisher's exact test is optimal against one-sided alternatives for testing the equality of two binomial probabilities and also for testing the independence of two responses; refer to 6.2 for further details regarding this case.
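A stdlib-only sketch of the conditional test itself (with hypothetical counts): the one-sided Fisher exact p-value for a 2×2 table conditions on both sets of marginals and sums hypergeometric probabilities for tables at least as extreme as the one observed.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    conditional on all margins, the first cell is hypergeometric, and the
    p-value is P(first cell >= a)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        if row1 - x <= n - col1:              # remaining cells must be feasible
            p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

p_obs = fisher_exact_one_sided(8, 2, 1, 5)
```

Summing from the smallest feasible first cell gives total probability one, and the p-value shrinks as the observed table becomes more extreme in the direction tested.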
CHAPTER 4

Some Univariate Problems for a Product-Multinomial Distribution
In this chapter we shall consider problems analogous to those of analysis of variance and regression in univariate normal analysis. We shall study, in some detail, two-dimensional contingency tables with one factor and one response, and three-dimensional contingency tables with two factors and one response; the methods can be immediately carried over to the case of higher-dimensional contingency tables with one response and more than two factors. Here the letter i will denote categories of the responses, while the letters j, k will denote categories of the factors.
1. Two-dimensional Table

Here we have the probability distribution

(4.1.1)    φ = Π_{j=1}^s [ (n_oj! / Π_i n_ij!) Π_{i=1}^r p_ij^{n_ij} ] ,

with the random variables n_ij subject to the constraints Σ_i n_ij = n_oj (fixed).

Hypothesis of homogeneity is expressed by the statement that

(4.1.2)    p_ij is independent of j,
and the well-known χ²-statistic to test this hypothesis, as mentioned in 3.5, is

(4.1.3)    N Σ_{i,j} (n_ij − n_io n_oj / N)² / (n_io n_oj) ,

with (r−1)(s−1) degrees of freedom.
The above hypothesis is linear and the χ₁² statistic is easily seen to be

Σ_{i,j} (n_ij − n_oj p̂_i)² / n_oj ,

with p̂_i a suitably weighted average of the sample proportions q_ij = n_ij / n_oj, the weights involving a weighted harmonic mean of the n_oj's; the above statistic can be more simply expressed as (4.1.4), and this has, of course, (r−1)(s−1) degrees of freedom. This χ₁² statistic is not as well known as the χ²-statistic (4.1.3), but should be preferred to the latter in some situations; for example, it has been pointed out by Goodman [1964c] that the hypothesis of homogeneity is rejected by the χ₁² statistic (4.1.4) if and only if at least one estimated contrast is significantly different from zero and, hence, the χ₁² statistic as a test criterion is naturally related to the corresponding simultaneous confidence intervals for contrasts among multinomial populations.
If the variable is structured, with a_i as the score for the i-th category, we can certainly test the hypothesis of homogeneity (4.1.2) as before; but in this case the experimenter may not be interested in the hypothesis (4.1.2) and may want to test a weaker hypothesis, which we may call the

Hypothesis of equality of "means", expressed by the statement that

(4.1.5)    Σ_i a_i p_ij is independent of j.

By using the results in 2.5, it can be shown that the χ₁² statistic to test the hypothesis (4.1.5) is

(4.1.6)    Σ_{j=1}^s y_j ā_j² − (Σ_{j=1}^s y_j ā_j)² / (Σ_{j=1}^s y_j) ,   d.f. = s−1,

where ā_j = Σ_i a_i q_ij, σ̂_j² = Σ_i (a_i − ā_j)² q_ij and y_j = n_oj / σ̂_j².
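A sketch of the computation in (4.1.6) (with hypothetical counts and scores): per-column score means ā_j and variances σ̂_j² give weights y_j = n_oj/σ̂_j², and the statistic is the weighted between-column sum of squares of the means.

```python
import numpy as np

def equality_of_means_chi2(n, a):
    """Statistic (4.1.6) for an r x s count table n with scores a:
    abar_j = sum_i a_i q_ij,  s2_j = sum_i (a_i - abar_j)^2 q_ij,
    y_j = n_oj / s2_j;  returns
    sum_j y_j abar_j^2 - (sum_j y_j abar_j)^2 / sum_j y_j  (s-1 d.f.)."""
    n = np.asarray(n, dtype=float)
    a = np.asarray(a, dtype=float)
    n_oj = n.sum(axis=0)
    q = n / n_oj                        # column-conditional proportions q_ij
    abar = a @ q                        # per-column score means
    s2 = (((a[:, None] - abar) ** 2) * q).sum(axis=0)
    y = n_oj / s2
    return (y * abar ** 2).sum() - (y * abar).sum() ** 2 / y.sum()

stat_equal = equality_of_means_chi2([[10, 20], [30, 60], [10, 20]], [0, 1, 2])
stat_diff = equality_of_means_chi2([[10, 40], [30, 30], [10, 30]], [0, 1, 2])
```

When the column-conditional means coincide the statistic vanishes, and it is positive as soon as they differ, in line with the s−1 degrees of freedom of the hypothesis.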
If now, in addition, the factor is also structured, with the score b_j associated with the j-th category of the factor (i.e., the j-th level of the factor in the language of design of experiments), we can not only test the hypotheses (4.1.2) and (4.1.5) (if necessary), but we can test a weaker, and possibly more meaningful, hypothesis that may be called the

Hypothesis of linearity of regression:

(4.1.7)    Σ_i a_i p_ij = λ + μ b_j .
By the methods in 2.5, the χ₁² statistic to test the hypothesis (4.1.7) is seen to be (4.1.8). The estimates λ̂, μ̂ are given by μ̂ = (γd − δc)/(γe − δ²), etc.

Testing significance of the regression coefficient μ in (4.1.7): assuming the model (4.1.7) now, we can test the hypothesis μ = 0 by the methods in Chapter 2, i.e., by considering the statistic (4.1.6) − (4.1.8), which is (γd − δc)² / [γ(γe − δ²)] with 1 degree of freedom.
It may be pointed out here that the statistic (4.1.6), and the one given above for testing the significance of the regression coefficient μ, are different from those offered by Yates [1948]. Even though the respective statistics are asymptotically equivalent under the corresponding null hypothesis, they are not so otherwise. Our statistics are based on consistent estimators of the variances involved, while Yates' statistics use estimators which are computed on the assumption that the populations are homogeneous, i.e., assuming that p_ij is independent of j. This drawback has been pointed out by Yates himself for his statistics; our statistics do not suffer from this shortcoming and, hence, should be preferred.
This situation is very similar to the one pointed out in connection with our statistic (3.1.7) and Stuart's [1955] statistic for the hypothesis of equality of two marginal distributions.
2. Three-dimensional Table

With i indicating categories of a response, and j, k denoting those of two factors, say treatments and blocks, respectively, in the language of design of experiments, we have the probability distribution

(4.2.1)    φ = Π_{j,k} [ (n_ojk! / Π_i n_ijk!) Π_{i=1}^r p_ijk^{n_ijk} ]

for the n_ijk's, subject to the constraints Σ_i n_ijk = n_ojk (fixed), j = 1, 2, …, s and k = 1, 2, …, t; but here all combinations (j,k) may not necessarily be present. Let M be the number of (j,k) combinations. If M = st, i.e., all combinations are allowed, then the design will be said to be complete. Otherwise, it will be said to be an incomplete design. Let q_ijk = n_ijk / n_ojk.
Hypothesis of no treatment and block effects:

(4.2.2)    H_o :    p_ijk = t_i** .

The χ²-statistic to test H_o is

(4.2.3)    N Σ_{j,k} Σ_{i=1}^r (n_ijk − n_ioo n_ojk / N)² / (n_ioo n_ojk) ,
           d.f. = (r−1)(M−1) = (r−1)(st−1) for a complete design.

Hypothesis of no treatment effects:

(4.2.4)    H_1 :    p_ijk = t_i*k .
The χ²-statistic to test H_1 is

(4.2.5)    Σ_{j,k} Σ_{i=1}^r (n_ijk − n_ojk n_iok / n_ook)² / (n_ojk n_iok / n_ook) ,
           d.f. = (r−1)(M−t) = (r−1)(s−1)t for a complete design.
Suppose there are v_k treatments in the k-th block. The hypothesis H_1 is then seen to be equivalent to the hypothesis that, within each block k, p_ijk is the same for the v_k treatments occurring in it, i = 1, 2, …, r. The number of linearly independent constraints is thus (r−1) Σ_k (v_k − 1), i.e., (r−1)(M−t), since Σ_k v_k = M and the constraints for i = 1, 2, …, (r−1) imply the constraint with i = r. This accounts for the (r−1)(M−t) degrees of freedom for the statistic (4.2.5); alternatively, one sees that the number of independent p's is M(r−1) under the model and t(r−1) under H_1.
Hypothesis of no interaction between the two factors and the response "i" is offered in two alternate forms:

(4.2.6)    H_2 :    p_ijk = t_ij* + t_i*k .

This will be referred to as the hypothesis of no interaction in the additive sense; the physical interpretation behind H_2 is that p_ijk − p_ij'k is independent of k (for any two treatments j and j' occurring in the k-th block) or, equivalently, p_ijk − p_ijk' is independent of j (for any two blocks k and k' containing the j-th treatment). It will be noticed that this hypothesis is offered in the spirit of the hypothesis of no interaction in the traditional ANOVA.

For the purpose of identifiability of the parameters, (4.2.6) may be expressed in the equivalent form

(4.2.7)    p_ijk = t_i** + t_ij* + t_i*k

for the allowable M (j,k) combinations, with Σ_j t_ij* = Σ_k t_i*k = 0.
We notice that the relation for i = r follows from these relations for i = 1, 2, …, r−1, involving (r−1)(s+t−1) independent t's. Thus any asymptotic χ² statistic has (r−1)(M−s−t+1) degrees of freedom. If the design is complete, the hypothesis will be seen to be equivalent to the (r−1)(s−1)(t−1) linearly independent constraints

(4.2.8)    p_ijk − p_ijt − p_isk + p_ist = 0 ,   i = 1, 2, …, r−1;  j = 1, 2, …, s−1;  k = 1, 2, …, t−1.

In any case, even if the design is not complete, the hypothesis H_2 is seen to be a linear hypothesis and, theoretically, the χ₁² statistic can be computed by solving only linear equations. On the other hand, even if the design is complete, there is no essentially simpler expression in the general case; for the special case r=s=t=2, though, we have the χ₁² statistic (in view of 2.5) given by
I
I
-.
I
I
I
I
I
I
el
I
I
I
I
I
I
I
-.
I
I
I
.e
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
,e
I
with
v
qlll ~ll
=----
,
nOll
and for the case r=s=2 we have the χ₁² statistic

(4.2.10)    Σ_{k=1}^{t} d_k²/λ_k - ( Σ_{k=1}^{t} d_k/λ_k )² / ( Σ_{k=1}^{t} 1/λ_k ) ,    d.f. = t-1 ,

with λ_k as below.  This is because with r=s=2, H_2 is equivalent to the statement
that p_11k - p_12k is the same for k=1,2,..., t; letting d_k = q_11k - q_12k,
note that the d's are independently distributed with variances
n_o1k^{-1} p_11k p_21k + n_o2k^{-1} p_12k p_22k , so that the "sample variances"
are λ_k , k=1,2,..., t.  According to the generalized least squares technique
discussed in Chapter 2 (see Theorem 2.5.5), the minimization of
Σ_k (d_k - Q)²/λ_k with respect to Q leads to (4.2.10); t=2 takes (4.2.10)
back to (4.2.9).
These statistics have already been proposed by Goodman [1962b] and we have shown
here that these are the relevant χ₁² statistics for the corresponding situations.
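To make the computation concrete, here is a minimal numerical sketch of the statistic (4.2.10), under the assumptions that λ_k is estimated by q_11k q_21k / n_o1k + q_12k q_22k / n_o2k and that the statistic has the weighted least-squares form Σ_k d_k²/λ_k - (Σ_k d_k/λ_k)²/(Σ_k 1/λ_k); the function name and array layout are illustrative, not from the text.

```python
import numpy as np

def chi2_additive_no_interaction(n1, n2):
    """Weighted least-squares test that p_11k - p_12k is constant in k
    (the r = s = 2 case of H_2).  n1 and n2 hold the cell counts for
    treatments j = 1 and j = 2: each has shape (2, t), rows indexing the
    response category i = 1, 2 and columns the blocks k = 1, ..., t."""
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    N1, N2 = n1.sum(axis=0), n2.sum(axis=0)   # column totals n_o1k, n_o2k
    q11, q21 = n1[0] / N1, n1[1] / N1         # sample proportions, j = 1
    q12, q22 = n2[0] / N2, n2[1] / N2         # sample proportions, j = 2
    d = q11 - q12                             # d_k = q_11k - q_12k
    lam = q11 * q21 / N1 + q12 * q22 / N2     # 'sample variance' of d_k
    w = 1.0 / lam
    # GLS: minimize sum_k (d_k - Q)^2 / lam_k over Q, leaving (4.2.10)
    return float((w * d**2).sum() - (w * d).sum()**2 / w.sum())
```

When the observed differences d_k are identical across blocks the statistic is exactly zero, in line with the interpretation of H_2.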
The hypothesis of no interaction between the two factors and the response
"i" in the multiplicative sense is offered by

(4.2.11)    H_3 :  p_ijk = t_ij* t_i*k .

The physical interpretation behind (4.2.11) is that for any two treatments
j and j' occurring in the kth block p_ijk / p_ij'k is independent of k and,
similarly, for any two blocks k and k' containing the jth treatment
p_ijk / p_ijk' is independent of j.  Note that H_3 is offered, in terms of
logarithms of p's, in the spirit of the hypothesis of no interaction in the
traditional ANOVA, since we have

(4.2.12)    log p_ijk = t_ij*^(1) + t_i*k^(1) ,

where, for identifiability, Σ_j t_ij*^(1) = Σ_k t_i*k^(1) = 0.
The t's in (4.2.11) are subject to the admissibility constraint

    Σ_{i=1}^{r} t_ij* t_i*k = 1 ;

it is seen that (4.2.11) or (4.2.12) for i=1,2,..., r-1 imply the relation
for i=r as well, since

    p_rjk = 1 - Σ_{i=1}^{r-1} p_ijk = 1 - Σ_{i=1}^{r-1} t_ij* t_i*k = t_rj* t_r*k .

Thus we have (r-1)M independent p's (under the basic model) expressed in
terms of (r-1)(s+t-1) independent parameters t's under the hypothesis H_3, so
that any valid χ² statistic for testing H_3 will have (r-1)(M-s-t+1) degrees
of freedom.  For the complete case H_3 is equivalent to the (r-1)(s-1)(t-1)
independent constraints
(4.2.13)    p_ijk / p_isk = p_ijt / p_ist ,
            i=1,2,..., r-1 ;  j=1,2,..., s-1 ;  k=1,2,..., t-1 ,

or, in other words,

(4.2.14)    log p_ijk + log p_ist - log p_ijt - log p_isk = 0 .
One may attempt to solve maximum likelihood equations by a method similar to
that of Roy and Kastenbaum [1956] for the case of three responses or, alternatively, one may 'linearize' (4.2.13) or (4.2.14) and use χ₁² statistics which
then, as proved in Chapter 2, turn out to be Wald's statistics as well.  These
χ₁² statistics may then be obtained directly by computing the 'asymptotic sample
covariance matrix' of the sample analogs of (4.2.13) or (4.2.14), i.e., the
"sample covariance matrix" of sample analogs of the linearized versions, or can
be obtained indirectly by applying a suitable generalized least squares technique as pointed out in Theorem 2.5.5.

Special cases:

We shall illustrate the above discussion by working out the χ₁² statistics
for special cases for which these expressions can be derived in a fairly
simple manner.

(i) r=s=t=2

If we work with (4.2.13), we have the constraint
the 'natural' estimate is
with its 'asymptotic sample variance'
so that Wald's statistic with 1 d.f. is given by

(4.2.16)    F²(q) / λ̂ .

Note that the linearized version of (4.2.15) is
(4.2.17)
with its sample analog F*(q) = F(q), as it should be.  But for finding the
'sample variance' of this sample analog, we have to look at this as
where an asterisk denotes that the particular quantity is to be regarded as
fixed for the purpose of finding the variance; thus its variance is

    + q*_112² q*_122^{-4} n_022^{-1} p_122 p_222 ,

and on replacing p's by q's and dropping the asterisks we get again λ̂ as the
'sample variance' of the sample analog of (4.2.17).  We shall apply the indirect method to the more general case r=s=2, t ≥ 2; the expression for t=2
will be shown to reduce to (4.2.16) obtained by the direct method.
On the other hand, if we work with (4.2.14), we have the constraint

    log p_111 + log p_122 - log p_112 - log p_121 = 0 ,

with the 'natural' estimate given by

    c = log q_111 + log q_122 - log q_112 - log q_121 ;

the 'asymptotic sample variance' of c is

    v = q_211/(n_011 q_111) + q_212/(n_012 q_112) + q_221/(n_021 q_121) + q_222/(n_022 q_122) ,
so that Wald's statistic with 1 d.f. is given by

(4.2.18)    c² / v .

Note that (4.2.18) is, in general, different from (4.2.16), but these two
statistics are asymptotically 'equivalent'.

(ii) r=s=2, t ≥ 2

If we first work with (4.2.13), H_3 is equivalent to the statement that

    p_11k / p_12k = Q ,    k=1,2,..., t ,

where Q is an unknown parameter.
Let
It can be easily shown that λ_k is the 'asymptotic sample variance' of e_k.  The
sum of squares to be minimized (since the e's are independent) is

    Σ_{k=1}^{t} (e_k - Q)² / λ_k ,

and the minimization with respect to Q leads to the χ₁² statistic (using
'linearization')

(4.2.20)    Σ_{k=1}^{t} e_k²/λ_k - ( Σ_{k=1}^{t} e_k/λ_k )² / ( Σ_{k=1}^{t} 1/λ_k ) ,    d.f. = t-1 ;
note the resemblance between the expressions (4.2.20) and (4.2.10).
It can
be verified that t=2 reduces (4.2.20) to (4.2.16).
On the other hand, if we work with the logarithmic formulation (4.2.14),
H_3 is equivalent to the statement

    log p_11k - log p_12k = η ,    k=1,2,..., t ,

where η is an unknown parameter.  Let g_k = log q_11k - log q_12k and

(4.2.21)    v_k = q_21k/(n_o1k q_11k) + q_22k/(n_o2k q_12k) ;

it may be seen that v_k is the 'asymptotic sample variance' of g_k.  Minimizing
the sum of squares

    Σ_k (g_k - η)² / v_k ,

we get the χ₁² statistic

(4.2.22)    Σ_{k=1}^{t} g_k²/v_k - ( Σ_{k=1}^{t} g_k/v_k )² / ( Σ_{k=1}^{t} 1/v_k ) ,    d.f. = t-1 ;

as before, note the resemblance of this expression to (4.2.16) and (4.2.20),
and verify that t=2 reduces (4.2.22) to (4.2.18).  The expression (4.2.22) is,
in general, different from (4.2.20) but these two statistics are asymptotically
'equivalent'.
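A parallel numerical sketch for the multiplicative (logarithmic) case can be given, assuming (4.2.22) has the analogous weighted form in g_k = log q_11k - log q_12k; the delta-method variance v_k used below is an assumption where the source expression is not fully legible, and the function name is illustrative.

```python
import numpy as np

def chi2_multiplicative_no_interaction(n1, n2):
    """Test that log p_11k - log p_12k is constant in k (r = s = 2),
    in the spirit of (4.2.22).  n1, n2: (2, t) count arrays for
    treatments j = 1, 2, rows indexing the response category i."""
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    N1, N2 = n1.sum(axis=0), n2.sum(axis=0)   # column totals n_o1k, n_o2k
    q11, q21 = n1[0] / N1, n1[1] / N1
    q12, q22 = n2[0] / N2, n2[1] / N2
    g = np.log(q11) - np.log(q12)             # g_k
    v = q21 / (N1 * q11) + q22 / (N2 * q12)   # assumed delta-method variance
    w = 1.0 / v
    # weighted least squares over the common value of g_k
    return float((w * g**2).sum() - (w * g).sum()**2 / w.sum())
```

As with the additive version, a constant observed ratio q_11k/q_12k across blocks makes the statistic exactly zero.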
Remarks on the hypothesis of no interaction in three-dimensional
contingency tables.
We have made a careful distinction between the hypothesis of no interaction
among three responses, discussed in the last chapter, the hypothesis of no
interaction between two factors with respect to the response under consideration, being discussed in this chapter, and the hypothesis of no interaction
between two responses and one factor, to be discussed in the next chapter.  The
experimental set up is different in the three situations and, naturally, the
hypotheses have different interpretations.  When we are dealing with two or
more responses, we are interested in the nature of association between two
responses and we want to study whether some reasonable measure of association
between the two responses remains constant over categories of the factor (in
the 2-response 1-factor set up) or of the third response (in the 3-response
set up).  But the interpretation is quite different in the 1-response 2-factor
set up.  Here we are mainly interested in studying the effects of the factors
on the distribution of the response; this distribution may be characterized
either by the cell-probabilities p_ijk, or their logarithms, or the mean
Σ_i a_i p_ijk if the response is structured with a_i as the score attached to
the ith category, or any other reasonable feature.
If the factors affect this feature of the
distribution of the response, the question naturally arises whether the effect
due to one factor, on this feature of the response distribution, remains constant over the categories of the second factor.
This is very much in the
spirit of the hypothesis of no interaction between two factors in the ANOVA
and the hypothesis in the same spirit but in the 1-response 2-factor categorical
set up may be called the hypothesis of no interaction between two factors and
the response (or rather the specific feature of the response distribution).
Note that the specific feature at the background of H_2 is the set of probabilities p_ijk, i=1,2,..., r, for each factor-combination (jk), while for H_3 it is
the set of log p_ijk, and for H_6, to be discussed in the next section, it is the
'mean', Σ_i a_i p_ijk, for each factor-combination (jk).  Refer to Bhapkar and
Koch [1965a] for further discussion on the hypothesis of no interaction in the
three-dimensional contingency tables.  The formulations H_2, H_3 and H_6 were
proposed by Roy and Bhapkar [1960].
The hypothesis, H_1, of no treatment effects can be tested either by
itself on the original model by using statistic (4.2.5) with (r-1)(M-t) degrees
of freedom, or by starting from the hypothesis of no interaction (either the
form H_2 or H_3) as the basic model (of course along with (4.2.1)) to test the
hypothesis that t_ij* (in H_2 or H_3) is independent of j.  As pointed out in
section 3 of Chapter 2, a valid statistic is provided by the difference in the
χ²-statistics for H_1 and for H_2 (or H_3), with degrees of freedom obtained by
subtraction, i.e., (r-1)(s-1).
3.  Three-dimensional Table with the response "i" structured.

Suppose now a_i is the score attached to the ith category of the response;
j and k denote the categories of two factors, say treatments and blocks, respectively, as before.
Hypothesis of no treatment and block (mean) effects:

    H_4 :  Σ_i a_i p_ijk is independent of j and k.

By using the results in 2.5, it can be easily seen that the χ₁² statistic to
test H_4 is

    d.f. = M-1 ,
where

    a_jk = Σ_{i=1}^{r} a_i q_ijk ,   β_jk = ... ,

and

Hypothesis of no treatment (mean) effects:

    H_5 :  Σ_{i=1}^{r} a_i p_ijk = t_**k .
Again using the results in 2.5 (Bhapkar [1961]), the χ₁² statistic to test
H_5 is

    d.f. = M-t ,

where B_k = Σ_j a_jk h_jk , h_ok = Σ_j h_jk , and M is the number of allowable (jk)
combinations; the summations are, of course, over allowable combinations only.
It may be noticed that H_0 => H_4 and H_1 => H_5, but not vice-versa.  Thus,
if the response is structured, the experimenter can not only test the previous
hypotheses H_0, H_1, if necessary, but may be satisfied with testing the weaker
hypotheses H_4 and H_5 instead.  The same comment can be made with respect to
H_2 and H_6, to be stated now.
6
Hypothesis of no interaction between (mean) response and factors "j"
and "k".
Again using the results in (2.5) (Bhapkar [1961J), the r~ statistic to test
H6 is
t
2
s
2
L: L: Ct 0k h
- L: (Bk/hok ) - L: Qo t.,
jk k=l
j=l J J
j k J
d.f. = M-(s+t-l) ,
where to' s satisfy equations
J
s
QJo = L:
j' =1
cjJo, t j "
90
j=1,2, ••• , s
and

    Q_j = T_j - Σ_k (B_k/h_ok) h_jk ,
    c_jj' = - Σ_k (h_jk h_j'k / h_ok) ,   j ≠ j' ,
    c_jj = h_jo - Σ_k (h_jk² / h_ok) .

It may be noted that h_jo = Σ_k h_jk.  (4.3.6) and (4.3.7) are similar to the
expressions for "error sum of squares" and "normal equations", respectively,
in ANOVA, T_j and B_k playing the roles of "treatment total" and "block total"
respectively.  The fundamental difference, however, is that the c's here depend
not only on the design but also on the observed proportions.
In normal ANOVA, designs can be chosen suitably so that the normal equations
have neat closed solutions.  This approach fails here for the corresponding
equations (4.3.7).  For example, even for a complete design (which may be
called a "randomized block design"), there is no essential simplification in
the equations (4.3.7); the degrees of freedom in this case are, of course,
(s-1)(t-1).
If the hypothesis of no interaction, H_6, is accepted, the estimates of the
treatment effects, t_*j*, are provided by the t_j's obtained by solving equations
(4.3.7).  We can then go ahead and test the hypothesis of no treatment effects,
H_5, either by using the statistic (4.3.4) with (M-t) d.f. on the basis of the
original model (4.2.1), or by using the statistic

(4.3.8)    d.f. = s-1 ,

on the basis of the model (4.3.5) (of course, along with (4.2.1)).  We note
that (4.3.8) = (4.3.4) - (4.3.6) according to Theorem 3.4 in Chapter 2.
4.  Three-dimensional Table with structured response "i" and structured factor "j".

Let us now suppose that a_i, b_j are the scores that go along with the ith
category of the response and the jth category of the first factor, respectively,
while k denotes the category of the second factor.  As before, let us say that
j refers to treatments and k to blocks; in the language of design of experiments,
then, b_j denotes the jth level of the treatment-factor under study.
Hypothesis of linearity of regression:

(4.4.1)    H_7 :  Σ_{i=1}^{r} a_i p_ijk = λ_k + μ_k b_j .

The χ₁² statistic (Bhapkar [1961]) is seen to be

(4.4.2)    Σ_j Σ_k a²_jk h_jk - Σ_{k=1}^{t} (B_k²/h_ok) - Σ_{k=1}^{t} (γ_k - δ_k)²/(ℓ_k - w_k) ,    d.f. = M-2t ,
where
    ℓ_k = Σ_j b_j² h_jk ,   m_k = Σ_j b_j h_jk ,   w_k = m_k²/h_ok

(others defined as in Section 3).  If H_7 is not rejected, the estimates are
given by
Hypothesis of linearity of regression with the same regression coefficient
for all k (i.e., blocks):

(4.4.3)    H_8 :  Σ_{i=1}^{r} a_i p_ijk = λ_k + μ b_j .

The χ₁² statistic is seen (Bhapkar [1961]) to be

(4.4.4)    Σ_j Σ_k a²_jk h_jk - Σ_{k=1}^{t} (B_k²/h_ok) - (γ - δ)²/(ℓ - w) ,    d.f. = M-t-1 ,
where

    γ = Σ_{k=1}^{t} γ_k ,   δ = Σ_{k=1}^{t} δ_k ,   ℓ = Σ_{k=1}^{t} ℓ_k ,   and   w = Σ_{k=1}^{t} w_k .
On the other hand, if we want to test the hypothesis H_8, starting from H_7 as
the model, i.e., in effect we test

(4.4.5)    μ_1 = μ_2 = ... = μ_t ,

where the μ's are given by (4.4.1), then we have the statistic

(4.4.6)    Σ_{k=1}^{t} (γ_k - δ_k)²/(ℓ_k - w_k) - (γ - δ)²/(ℓ - w) ,

with μ̂ = (γ-δ)/(ℓ-w) as the estimate of μ if (4.4.5) (i.e., (4.4.3)) holds.
We note that the statistic (4.4.6) is (4.4.4) - (4.4.2) according to Theorem
3.4 in Chapter 2.

Hypothesis of linearity of regression with the same regression for all
k (i.e., blocks):

(4.4.7)    H_10 :  Σ_{i=1}^{r} a_i p_ijk = λ + μ b_j .
The χ₁² statistic is seen (Bhapkar [1961]) to be

(4.4.8)    Σ_j Σ_k a²_jk h_jk - (ℓG² - 2γmG + γ²h)/(ℓh - m²) ,

where

    G = Σ_{j=1}^{s} T_j = Σ_{k=1}^{t} B_k ,   m = Σ_{k=1}^{t} m_k ,   h = Σ_{k=1}^{t} h_ok ,   T_j = Σ_k a_jk h_jk ,

and other quantities are defined as before.
On the other hand, if we want to test H_10, starting from H_8 as the
model, i.e., in effect we test

(4.4.9)    λ_1 = λ_2 = ... = λ_t ,

where the λ's are given by (4.4.3), then we have the statistic (4.4.8) - (4.4.4),
i.e.,

(4.4.10)    Σ_{k=1}^{t} (B_k²/h_ok) + (γ-δ)²/(ℓ-w) - (ℓG² - 2γmG + γ²h)/(ℓh-m²) ,    d.f. = t-1 .
Similarly, if we test H_10, starting from H_7 as the model, i.e., in effect we
test

(4.4.11)    λ_1 = λ_2 = ... = λ_t ,   μ_1 = μ_2 = ... = μ_t ,

where the λ's and μ's are given by (4.4.1), then the proper statistic is
(4.4.8) - (4.4.2) with 2t - 2 degrees of freedom.
5.  Three-dimensional Table with structured response "i" and structured factors "j" and "k".

Suppose now that a_i, b_j and c_k are the scores associated with the categories
of the response and the two factors respectively.  b_j then can be looked upon
as the jth level of the first factor while c_k as the kth level of the second
factor; the experiment is then a simple one, without any blocks, involving
treatments which are combinations of various levels of the two factors.
In this case we can, of course, test hypotheses mentioned in sections 2, 3, 4
as before, but we also have some additional possibilities.

Hypothesis of linearity of regression:

(4.5.1)    H_13 :  Σ_{i=1}^{r} a_i p_ijk = λ + μ b_j + ν c_k .

The χ₁² statistic is seen (Bhapkar [1961]) to be
    Σ_j Σ_k a²_jk h_jk - λ̂G - μ̂γ - ν̂γ* ,    d.f. = M-3 ,

where λ̂, μ̂ and ν̂ satisfy the equations

    G  = λ̂h  + μ̂m  + ν̂m* ,
    γ  = λ̂m  + μ̂ℓ  + ν̂x ,
    γ* = λ̂m* + μ̂x  + ν̂ℓ* ,

and

    m* = Σ_{k=1}^{t} c_k h_ok ,   ℓ* = Σ_{k=1}^{t} c_k² h_ok ,

with γ*, x and other quantities defined as before.
If H_13 is accepted, we can test further hypotheses obtained by putting μ
and/or ν equal to zero.  These can be tested either by the methods in section
4, starting from (4.2.1) as the model, or by using appropriate statistics of
the type (2.3.10) if we start from (4.5.1) (along with (4.2.1)) as the model.
These details can be worked out very easily and are omitted.
6.  Higher-dimensional Tables

Let, as before, i denote the categories of the response and j, k, ℓ, ... denote
the categories of the first, second, third factors and so on.  The methods of
previous sections can be immediately carried over and we shall illustrate only
a few of these.  Let us consider, for illustration, a four-dimensional table
with one response and three factors.
The analogues of H_0 (4.2.2) and H_1 (4.2.4) will be

(4.6.1)    H_14 :  p_ijkℓ = t_i*** ,
           H_15 :  p_ijkℓ = t_ij** ,
           H_16 :  p_ijkℓ = t_ijk* .

H_16 means that the probabilities do not depend on the category of the third
factor, while for H_15 they do not depend on the categories of the second and
third factors, and so on.
The χ² statistics to test these hypotheses can be seen to be

    N Σ_{j,k,ℓ} Σ_{i=1}^{r} ( n_ijkℓ - n_iooo n_ojkℓ / N )² / ( n_iooo n_ojkℓ ) ,    d.f. = (r-1)(M-1) ,

(4.6.2)    Σ_{j,k,ℓ} Σ_{i=1}^{r} n_ojoo ( n_ijkℓ - n_ijoo n_ojkℓ / n_ojoo )² / ( n_ijoo n_ojkℓ ) ,    d.f. = (r-1)(M-s) ,
    Σ_{j,k,ℓ} Σ_{i=1}^{r} n_ojko ( n_ijkℓ - n_ijko n_ojkℓ / n_ojko )² / ( n_ijko n_ojkℓ ) ,    d.f. = (r-1)(M-M_12) ,

respectively, where M is the number of (jkℓ) combinations, M_12 is the number
of (jk) combinations, i=1,..., r, j=1,2,..., s, k=1,2,..., t, ℓ=1,2,..., u,
and so on.  The degrees of freedom become (r-1)(stu-1), (r-1)s(tu-1)
and (r-1)st(u-1), respectively, for a complete design.
The hypothesis of no three-factor interaction (with respect to response
"i"), referred to, in Bhapkar and Koch [1965b], as the hypothesis of no third-order interaction between the response and the three factors, is, again, offered
in two alternative forms:

these will be called the hypotheses of no three-factor interaction in the
additive sense and in the multiplicative sense respectively.  These will be
seen to be generalizations of H_2 (4.2.7) and H_3 (4.2.11) respectively and
can be handled by similar methods.  For reasons similar to the ones mentioned
in section 2, it is for the experimenter to choose the form he feels more
meaningful for his problem.
Refer to Bhapkar and Koch [1965b] for further discussion.
The case where response "i" is structured also presents no difficulty.
Hypotheses similar to H_4, H_5, H_6, H_7, ... can be obtained by the least
squares method mentioned in Chapter 2.
These details are straightforward and, hence, are omitted.
CHAPTER 5

Some Multivariate Problems for a
Product-Multinomial Distribution
In this chapter we shall consider problems analogous to those of multivariate analysis of variance (MANOVA) and those of association in normal multivariate analysis.
We shall study, in some detail, a three-dimensional contingency table with two
variables and one factor and, in brief, a four-dimensional table with two
responses and two factors; the methods can be immediately carried over to the
case of higher-dimensional contingency tables with two or more responses and
one or more factors.
Suppose there are k responses, categories of which are denoted by
i_1, i_2, ..., i_k, respectively, and a factor with categories denoted by j,
j = 1, 2, ..., s, and i_α = 1, 2, ..., r_α, α = 1, 2, ..., k.  We have then the
frequencies n_{i_1 i_2 ... i_k j} obeying the probability law

(5.1.1)    Π_{j=1}^{s} n_{oo...oj}! Π_{α=1}^{k} Π_{i_α=1}^{r_α} ( p_{i_1...i_k j}^{n_{i_1...i_k j}} / n_{i_1...i_k j}! ) ,

where

    Σ_{i_1,...,i_k} p_{i_1...i_k j} = 1
and  j = 1, 2, ..., s.
With these k responses, a linear hypothesis analogous to that in MANOVA will
be, in general, specified by constraints of the type

(5.1.2)    H_0 :  Σ_{j=1}^{s} Σ_{i_1=1}^{r_1} f_{t_1 i_1 *...* j} p_{i_1 o...o j} + h^{(1)}_{t_1} = 0 ,
                  ...
                  Σ_{j=1}^{s} Σ_{i_k=1}^{r_k} f_{t_k *...* i_k j} p_{oo...i_k j} + h^{(k)}_{t_k} = 0 ,
           t_α = 1,2,..., m_α ,  α = 1,2,..., k,

where the f's and h's are known constants and the linear functions are linearly
independent.  These linear functions are assumed to be independent of the
functions
We can write these as

(5.1.3)    Σ_{j=1}^{s} Σ_{i} b^{(α)}_{t_α i j} p_{ij} + h^{(α)}_{t_α} = 0 ,    α = 1, 2, ..., k ,

with i = (i_1, i_2, ..., i_k) and Σ_i standing for Σ_{i_1} ... Σ_{i_k},
i_α = 1, ..., r_α, α = 1, 2, ..., k; (5.1.3) is seen thus to be a special case
of (2.5.1), with
(5.1.4)    b^{(α)}_{t_α i j} = f_{t_α *...* i_α *...* j} ,    g^{(α,β)}_{t_α t_β} = Σ_{j=1}^{s} ... ,

and

(5.1.5)    G = [ Q_11  Q_12  ...  Q_1k
                 ...
                 Q_k1  Q_k2  ...  Q_kk ] ,

with Q_αβ = [ g^{(α,β)}_{t_α t_β} ], t_α = 1, 2, ..., m_α, t_β = 1, 2, ..., m_β.
From (2.5.3) the χ₁²-statistic is c' G^{-1} c with m = Σ_α m_α d.f., with c and
G given by (5.1.4) and (5.1.5) respectively.  The matrix G will be non-singular
with probability one except in the degenerate case where some variable
(or rather its marginal probabilities) is functionally determined uniquely
from the other variables (i.e., from their marginal probabilities).
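The quadratic form c' G⁻¹ c that recurs throughout this chapter is easy to compute numerically once c and G are in hand; a generic sketch (the function name is illustrative):

```python
import numpy as np

def wald_statistic(c, G):
    """Wald quadratic form c' G^{-1} c, where c holds the 'natural'
    estimates of the linear constraint functions and G their estimated
    ('sample') covariance matrix; degrees of freedom = len(c)."""
    c = np.asarray(c, dtype=float)
    G = np.asarray(G, dtype=float)
    # solve G z = c rather than forming G^{-1} explicitly
    return float(c @ np.linalg.solve(G, c))
```

Solving the linear system instead of inverting G is the standard numerically stable way of evaluating the form, and it fails loudly (a LinAlgError) in exactly the degenerate singular case noted above.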
Linear hypothesis for several structured responses (MANOVA analogue):
If these k responses are structured, a linear hypothesis will be, in
general, of the type

where { a^{(α)}_{i_α} , i_α = 1, ..., r_α } are known weights associated with the
αth structured response, d's are known constants and Q's are unknown parameters.
It can be easily shown that (5.1.6) is equivalent to constraints of the type

(5.1.7)    Σ_{j=1}^{s} ... = 0 ,    t_α = 1, ..., m_α ,  α = 1, ..., k ,

where
in a manner similar to the one employed in the univariate case (refer to
section 5 of Chapter 2).  (5.1.7) is of course of the type (5.1.3) and hence
the χ₁² statistic is immediately available from the inversion of the matrix G.
As noted in section 5 of Chapter 2, this test is equivalent to the "large
sample test" based on the asymptotically normal 'natural' estimates of the
left hand sides of (5.1.7) with their covariance matrix estimated by the
"sample covariance matrix".  The number of linearly independent constraints
of the type (5.1.7) is Σ_α m_α , where m_α = s - v_α , v_α being the rank of the
s × u_α matrix [ d^{(α)}_{jj'} ].  In many problems of this type, the d's are
independent of α so that m_α = s - v ≡ m, where v is the rank of the s × u
matrix [ d_{jj'} ].  The degrees of freedom for the χ₁²-statistic are then
equal to mk.
Linear hypothesis for several unstructured responses:

If the k responses above are unstructured, a linear hypothesis will be, in
general, specified by constraints of the type

(5.1.8)    H_1 :  Σ_{j=1}^{s} f_{t_α *...* i_α *...* j} p_{o...o i_α o...o j} + h^{(α)}_{t_α i_α} = 0 ,
           i_α = 1, 2, ..., r_α - 1 ,  t_α = 1, 2, ..., m_α ,  α = 1, 2, ..., k;

the relation for i_α = r_α follows from the relations for i_α = 1, 2, ..., r_α - 1,
provided H_1 is consistent.  The linear functions are assumed to be linearly
independent and they are, then, Σ_α m_α (r_α - 1) in number, which is the number
of degrees of freedom for any asymptotic χ²-statistic to test H_1.  For many
problems of this type, f_{t_α *...* i_α *...* j} is independent of α and m_α = m for
each α; the number of degrees of freedom is then m{(Σ_α r_α) - k}.  We note that
H_1 is a special case of (5.1.2) and hence of (2.5.1).  We need only define, for
γ_α = 1, 2, ..., r_α - 1,
    f^{(γ_α)}_{t_α *...* i_α *...* j} = ... .

Let

    c = Σ_{j=1}^{s} f_{t_α *...* i_α *...* j} q_{o...o i_α o...o j} + h^{(α)}_{t_α i_α} .

Then
it will follow that the χ₁² test to test H_1 is equivalent to the 'large
sample test' based on the asymptotic normality of the c's (the unbiased estimates
relevant to H_1), whose covariance matrix is estimated by the 'sample covariance
matrix'.  The sample covariance matrix can be obtained from (5.1.5) taking
Linear hypothesis for several responses (some structured, some unstructured):

If out of the k responses above, some are structured (say the first k_1)
and some are unstructured (say the last k - k_1), a linear hypothesis will be,
in general, specified by constraints of the type

    Σ_{j=1}^{s} Σ_{i_α=1}^{r_α} f_{t_α *...* i_α *...* j} p_{o...o i_α o...o j} + h^{(α)}_{t_α} = 0 ,
    t_α = 1, 2, ..., m_α ,  α = 1, 2, ..., k_1 ,

and
where the f's and h's are known constants and the linear functions are linearly
independent; the constraints for the structured responses will be of the type

    Σ_{i_α=1}^{r_α} ...

with the same notation as before.  Then it can be shown by a similar argument
that the χ₁² test for H_2 is equivalent to the 'large sample test' based on the
asymptotic normality of the c's (the unbiased estimates relevant to H_2), whose
covariance matrix is estimated by the 'sample covariance matrix'.  The degrees
of freedom are Σ_α m_α + Σ_β m_β (r_β - 1).  In many problems of this type
m_α = m_β = m, and then the degrees of freedom reduce to m { k_1 + Σ_β (r_β - 1) }.
It may be pointed out here that this is the most likely situation to
be encountered in practice, with some responses structured (or say, quantitative) and others purely qualitative (i.e., unstructured).  As pointed out in
section 5 of Chapter 2, in many cases it may be simpler to use the generalized
least squares approach to derive the χ₁² statistic than the inversion of the
matrix G relevant to either of H_0, H_1 or H_2.
Let us now, for simplicity of notation, denote by i and j the categories
of the two responses and by k those of a factor, say treatments.  The
frequencies, n_ijk, obey the probability law

(5.2.1)    Π_{k=1}^{t} n_ook! Π_{i=1}^{r} Π_{j=1}^{s} ( p_ijk^{n_ijk} / n_ijk! ) ,

with

    Σ_{i,j} p_ijk = 1 ,   n_ook (fixed) ,   k = 1, ..., t.

Let  q_ijk = n_ijk / n_ook .

Hypothesis of no (marginal) treatment effects:

(5.2.2)    H_3 :  p_iok is independent of k, i = 1, 2, ..., r ;
                  p_ojk is independent of k, j = 1, 2, ..., s.

Note that this hypothesis H_3 is weaker than the hypothesis

    H_4 :  p_ijk is independent of k ,

i.e., the hypothesis of no treatment effects (for all (ij) combinations), in
that H_4 => H_3 but not vice versa.  The experimenter may be interested in the
weaker hypothesis H_3 and not in H_4.  The test-statistic for H_4 is, of
course,
    N Σ_{k=1}^{t} Σ_{i=1}^{r} Σ_{j=1}^{s} ( n_ijk - n_ijo n_ook / N )² / ( n_ijo n_ook ) ,    d.f. = (rs-1)(t-1) ,

(obtained from (4.1.3) with suitable modifications).
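This statistic is just the Pearson homogeneity chi-square obtained by treating the rs joint response categories (ij) as a single classification against k. A minimal sketch, with array shape (r, s, t) and the function name assumed for illustration:

```python
import numpy as np

def chi2_joint_homogeneity(n):
    """Pearson chi-square for the hypothesis that the joint (i, j)
    distribution is the same in every factor category k.  n has shape
    (r, s, t); the rs joint categories are treated as one classification
    against k, so d.f. = (rs - 1)(t - 1)."""
    n = np.asarray(n, dtype=float)
    r, s, t = n.shape
    flat = n.reshape(r * s, t)        # rows: joint (i, j); columns: k
    N = flat.sum()
    expected = np.outer(flat.sum(axis=1), flat.sum(axis=0)) / N
    return float(((flat - expected) ** 2 / expected).sum())
```

When the t columns of counts are exactly proportional, the statistic vanishes, as it should under complete homogeneity.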
The hypothesis H_3 is seen to be a special case of H_1 given by (5.1.8).  Let

    u_k' = [ q_1ok , ..., q_(r-1)ok , ... ] ,
    S_11k = diagonal ( q_iok , i = 1, 2, ..., r-1 )  etc.,
    S_12k = [ q_ijk ] ,   i = 1, ..., r-1 ;  j = 1, ..., s-1 ,
Note that S_k is the 'sample covariance matrix' of u_k and is nonsingular
almost everywhere (excluding the degenerate case).  It then follows that the
χ₁²-statistic is

    u' M^{-1} u ,    d.f. = (r+s-2)(t-1).

One may be interested in the marginals of, say, the first variable
only and may want to test the hypothesis:

    p_iok is independent of k.
The hypothesis can be looked upon as the one of homogeneity for one response
"i" and a factor "k" discussed already in Chapter 4, and the χ² statistic
(modified suitably from (4.1.3)) is

    N Σ_i Σ_k ( n_iok - n_ioo n_ook / N )² / ( n_ioo n_ook ) ,

with (r-1)(t-1) degrees of freedom.
Hypothesis of independence between the two variables "i", "j" for each
category k of the factor is given by

(5.2.7)    H_5 :  p_ijk = p_iok p_ojk ,    k = 1, 2, ..., t.

Note that (5.2.7) for i = 1, 2, ..., r-1 implies the relation for i = r also,
and the same thing holds for j.  Hence the number of independent constraints
is (r-1)(s-1)t.  The χ²-statistic to test H_5 is seen (Diamond, etc. [1960])
to be
    Σ_{k=1}^{t} n_ook Σ_{i=1}^{r} Σ_{j=1}^{s} ( q_ijk - q_iok q_ojk )² / ( q_iok q_ojk ) ,    d.f. = (r-1)(s-1)t.
This statistic is seen to be exactly the same as the statistic (4.2.5) to test
the hypothesis (4.2.4) when "i" is a response and "j", "k" are factors, or
the statistic (3.1.11) to test the hypothesis (3.1.9) when "i", "j", "k" are
all responses.  This is not surprising and is a consequence of the situation
discussed in 3.5 in connection with the use of conditional distributions to
arrive at suitable statistics.
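The statistic for H_5 amounts to summing the ordinary Pearson chi-square for independence over the t factor categories. A minimal sketch under that reading (the name and the (r, s, t) array layout are illustrative):

```python
import numpy as np

def chi2_conditional_independence(n):
    """Statistic for H_5: responses i and j independent within every
    factor category k.  Sums the ordinary Pearson independence chi-square
    over the t strata of an (r, s, t) count array; d.f. = (r-1)(s-1)t."""
    n = np.asarray(n, dtype=float)
    total = 0.0
    for k in range(n.shape[2]):
        tab = n[:, :, k]                       # the r x s table for stratum k
        expected = np.outer(tab.sum(axis=1), tab.sum(axis=0)) / tab.sum()
        total += ((tab - expected) ** 2 / expected).sum()
    return float(total)
```

Each stratum contributes its own (r-1)(s-1) degrees of freedom, which is why the total is (r-1)(s-1)t.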
The relations between these three hypotheses can be seen as follows:
the hypothesis of conditional independence between responses i, j given the
response k, (3.1.9), is

    H :  p_ijk p_ook = p_iok p_ojk ,

with Σ_i Σ_j Σ_k p_ijk = 1; but if we work with the conditional distribution of
the n_ijk's regarding the marginals n_ook as fixed, and the corresponding
conditional probabilities  p*_ijk = p_ijk / p_ook  with  Σ_i Σ_j p*_ijk = 1,
H is equivalent to

    H* :  p*_ijk = p*_iok p*_ojk ,

i.e., the hypothesis (5.2.7) of independence between two responses i, j for
every factor-category k.  On the other hand, if we work with the conditional
distribution of the n_ijk's regarding the marginals n_ojk as fixed, and the
corresponding conditional probabilities  p**_ijk = p_ijk / p_ojk , then we have
(for H)

    H** :  p**_ijk is independent of j ,

and both H and H* are seen to be equivalent to H**, i.e., the hypothesis
(4.2.4) of no j-effect for one response (i) and two factors
(j,k) situation.
We note that these three different hypotheses have different
physical and meaningful interpretations depending on which ways of classification are factors and which are responses, i.e., which marginal frequencies
are fixed (or controlled) by the experimenter and which are purely random
variables.
Hypothesis of no interaction between variables "i", "j" and a factor "k"
essentially means that the nature of association between variables i and j be
independent of k, i.e., the factor-category, and any reasonable measure of this
association will result in the formulation of a corresponding hypothesis of
no interaction.  The situation here is very much similar to the one already
observed in the case of three responses, and the remarks made in section 4 of
Chapter 3 are more or less applicable here also.  We can immediately think of
two such possible measures, viz.

    p_ijk - p_iok p_ojk   and   p_ijk / (p_iok p_ojk)

(or rather the logarithm of the latter), and these will be referred to as an
additive and a multiplicative measure of association respectively.
The additive form of the hypothesis, then, may be proposed in the form

    H_6 :  p_ijk - p_iok p_ojk is independent of k.

The physical interpretation is that the nature of association between variables
"i" and "j", as measured by p_ijk - p_iok p_ojk, does not depend on the category
k of the factor.  This hypothesis is thus seen to be weaker than the hypothesis
H_4, which is a special case of H_6.  H_6 is seen to be equivalent to the
hypothesis of the (r-1)(s-1)(t-1) independent constraints
    i = 1, 2, ..., r-1 ;  j = 1, 2, ..., s-1 ;  k = 1, 2, ..., t-1.

We note that the constraint for i=r follows from the constraints for
i = 1, 2, ..., r-1, and similarly for j.  It can also be seen that not only
H_4 => H_3 and H_6, but also H_3 and H_6 => H_4.  Thus any statistic to test
H_6 will have (r-1)(s-1)(t-1) degrees of freedom.  Thus H_6, so to say, covers
the gap between 'complete independence' of (ij) with respect to k and the
'marginal independence' of i and j, separately, with respect to k.
Now to test H_6 it is seen that none of the maximum likelihood or minimum
χ² or χ₁² equations, subject to constraints (5.2.10), is very convenient
to solve.  For this problem, then, we should use Neyman's linearization
technique, or equivalently, Wald's technique.

The multiplicative form of the hypothesis of no interaction between
variables "i", "j" and factor "k" may be proposed in the form

(5.2.11)    p_ijk = t_ij p_iok p_ojk ,

the physical interpretation being that the association between variables i
and j, as measured by p_ijk/(p_iok p_ojk), does not depend on the factor-category
k.  Here also we note that not only H_4 => H_3 and H_6, but also H_3 and
H_6 => H_4; i.e., H_6 also bridges the gap between 'complete independence'
of (ij) with respect to k and the 'marginal independence' of i and j,
separately, with respect to k.
In view of the admissibility condition Σ_i t_ij p_iok = 1 obtained from
(5.2.11), the relation for i=r follows from the relations for i=1,2,...,r-1
and the same holds true for j.  The number of independent constraints of the
form

    ... ,    i=1,2,...,r-1 ;  j=1,2,...,s-1 ;  k=1,2,...,t-1 ,

is thus (r-1)(s-1)(t-1), which is then the number of degrees of freedom for
any valid χ²-statistic to test H_6.  For further discussion on this, refer to
Bhapkar and Koch [1965a], and Goodman [1964b].
Suppose now that we have scores a_i and b_j corresponding to the ith and jth categories of the two responses respectively.

Hypothesis of no (marginal) treatment mean-effects:

(5.3.3)    H7: Σ_i a_i p_iok and Σ_j b_j p_ojk are independent of k.

Note that H3, given by (5.2.2), => H7 but not vice-versa. Thus H7 is a weaker hypothesis than H3 and may be valid even if H3 is not. The X₁²-statistic to test H7 can be obtained easily by the generalized least squares method.
Let

x_k^(1) = Σ_i a_i q_iok,    x_k^(2) = Σ_j b_j q_ojk,    x_k' = [x_k^(1), x_k^(2)],

and let Σ̂_k denote the estimated covariance matrix of x_k, of the form n_ook⁻¹ times the sample covariance matrix of the scores computed from the q's; its off-diagonal element involves

g_k = Σ_i Σ_j a_i b_j q_ijk − x_k^(1) x_k^(2).

Write x' = [x_1', x_2', …, x_t']
and finally

Σ̂ = diag(Σ̂_1, Σ̂_2, …, Σ̂_t).

The generalized sum of squares of residuals is then

S² = (x − E(x))' Σ̂⁻¹ (x − E(x)) = Σ_{k=1}^t (x_k − λ)' Σ̂_k⁻¹ (x_k − λ),

in view of the hypothesis H7.
Equating the derivative of S² with respect to λ to 0, we get the 'least squares' estimate λ̂ from

Σ_{k=1}^t Σ̂_k⁻¹ (x_k − λ̂) = 0,

i.e., λ̂ = M⁻¹ y, where M = Σ_k Σ̂_k⁻¹ and y = Σ_k Σ̂_k⁻¹ x_k. The X₁²-statistic to test H7 is then S² with λ = λ̂, i.e.

Σ_{k=1}^t (x_k − λ̂)' Σ̂_k⁻¹ (x_k − λ̂),

with d.f. = 2t − 2.
Note that Σ̂_k is nonsingular with probability one, excluding, of course, the degenerate case mentioned earlier.
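As a numerical illustration of the computation just described, the following sketch (with invented data; the function name and the particular 2-component score vectors are ours, not the author's) forms the estimate λ̂ = M⁻¹y and the resulting statistic:

```python
import numpy as np

def gls_common_mean(xs, covs):
    """Generalized least squares under a common-mean hypothesis:
    lambda_hat = M^{-1} y with M = sum_k Sigma_k^{-1} and
    y = sum_k Sigma_k^{-1} x_k; S2 = sum_k (x_k - lambda_hat)'
    Sigma_k^{-1} (x_k - lambda_hat) is the test statistic,
    here with 2t - 2 degrees of freedom."""
    invs = [np.linalg.inv(c) for c in covs]
    M = sum(invs)
    y = sum(ci @ xk for ci, xk in zip(invs, xs))
    lam = np.linalg.solve(M, y)
    s2 = sum(float((xk - lam) @ ci @ (xk - lam)) for ci, xk in zip(invs, xs))
    return lam, s2

# t = 3 factor categories; x_k = (mean a-score, mean b-score), invented data
xs = [np.array([1.2, 0.8]), np.array([1.0, 1.1]), np.array([1.4, 0.9])]
covs = [np.array([[0.04, 0.01], [0.01, 0.05]])] * 3
lam_hat, s2 = gls_common_mean(xs, covs)
print(lam_hat, s2)
```

With equal covariance matrices the estimate reduces to the ordinary mean of the x_k, which is a quick sanity check on the algebra.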
The case where the factor is also structured. If c_k is the score corresponding to the kth category (or, say, level) of the factor, we can not only test H7, but (in case H7 is rejected, or even otherwise) we can also test the

Hypothesis of linearity of regression:

(5.3.8)    H8: Σ_i a_i p_iok = λ^(1) + μ^(1) c_k,    Σ_j b_j p_ojk = λ^(2) + μ^(2) c_k.
In this case we have

S² = Σ_{k=1}^t (x_k − λ − μ c_k)' Σ̂_k⁻¹ (x_k − λ − μ c_k),

and equating the derivatives to 0 we obtain the 'least squares' estimates

(5.3.9)    [λ̂', μ̂']' = M⁻¹ [y', z']',  say,

where M is the matrix of coefficients of the resulting linear equations. The X₁²-statistic to test H8 is then S² with λ̂ and μ̂ given by (5.3.9), i.e., with
d.f. = 2t − 4. These inverses do exist (with probability one).
Covariance Analysis: With structured responses, even if the factor is unstructured, we can test the hypothesis of no mean-effect of the treatments on the first response, after eliminating the covariance-effect of the second response, in the form

(5.3.11)    H9: Σ_i a_i p_ijk / p_ojk = ξ + η b_j.

This hypothesis, thus, can be considered an analogue of the normal covariance-analysis hypothesis of no treatment effects. Using the method of conditional distributions, as explained in section 5 of Chapter 3, H9 is seen to be equivalent to the hypothesis given by (4.4.7) in the case of one response and two factors, and hence the statistic (4.4.8), with q_ijk = n_ijk/n_ojk, and st − 2 degrees of freedom, can be used to test H9 given by (5.3.11).
If, in addition, the factor also is structured, we can not only test H9, but can also test (in case H9 is rejected, or even otherwise) the hypothesis of linearity of regression of the 'corrected' mean-effect on the level of the treatment-factor, in the form

(5.3.12)    H10: Σ_i a_i p_ijk / p_ojk = ξ + η b_j + ν c_k,

where c_k is the score associated with the kth level. Again following the method of conditional distributions, the statistic (4.5.2), with q_ijk = n_ijk/n_ojk and st − 3 degrees of freedom, can be used to test H10 given by (5.3.12).
Note that the methods of this section can be immediately extended to the multi-response case with one factor. The general formulations in section 1 enable us to test hypotheses of different types for different responses; for example, one can also test a hypothesis of type (5.3.8) with μ^(2) = 0.
Let us now consider briefly the mixed case where one response, say the first, is structured, with a_i as the score associated with the ith category, and the other response is unstructured. In this case we can not only test the hypothesis H3, given by (5.2.2), but can also test a weaker hypothesis, viz.,

Hypothesis of no (marginal) treatment effect on the second response and no mean-effect on the first:

H11: Σ_i a_i p_iok is independent of k, and p_ojk is independent of k.

This is seen to be a special case of H2 given by (5.1.9). H11 is seen to be equivalent to s(t−1) linearly independent linear constraints (k = 1, 2, …, t−1; j = 1, 2, …, s−1). The X₁²-statistic will then have s(t−1) degrees of freedom and can be obtained after inverting the sample covariance matrix of the q's. One can also test hypotheses of the 'mixed' type; for example, if the factor is structured with c_k as the score for the kth category of the factor, one may test

Σ_i a_i p_iok = λ + μ c_k.
5. Two Responses and Two Factors

Let i, j denote categories of two responses and k, ℓ those of two factors, say, treatments and blocks respectively.
The frequencies, n_ijkℓ, have the probability distribution

(5.5.1)    Π_{k,ℓ} [ n_ookℓ! Π_{i=1}^r Π_{j=1}^s p_ijkℓ^{n_ijkℓ} / n_ijkℓ! ]

with Σ_{ij} p_ijkℓ = 1, Σ_{ij} n_ijkℓ = n_ookℓ (fixed), k = 1, 2, …, t and ℓ = 1, 2, …, u, with the proviso that all (kℓ) combinations may not have been selected. Let m denote the number of (kℓ) combinations, so that m = tu if the design is complete, and let q_ijkℓ = n_ijkℓ/n_ookℓ.
Suppose there are v_ℓ treatments, say k_1, k_2, …, k_{v_ℓ}, in the ℓth block.

Hypothesis of no (marginal) treatment effects:

H12: p_iokℓ and p_ojkℓ are independent of k.

H12 is then equivalent to a set of linear constraints indexed by i = 1, 2, …, r−1; j = 1, 2, …, s−1; m = 1, 2, …, v_ℓ − 1; ℓ = 1, 2, …, u,
and the number of these linearly independent contrasts is seen to be (r+s−2) Σ_ℓ (v_ℓ − 1), i.e., (r+s−2)(m−u). H12 can be tested by a method similar to the one employed for testing H3 in section 2, and the details are omitted. Let us now assume that both the responses are structured, with a_i, b_j as the scores associated with the ith and jth categories of the two responses respectively.
Hypothesis of no (marginal) treatment mean-effects:

H13: Σ_i a_i p_iokℓ = λ_ℓ^(1) and Σ_j b_j p_ojkℓ = λ_ℓ^(2), independently of the treatment k within each block ℓ.

We note that H13 is weaker than H12. The X₁²-statistic for H13 can be obtained easily by the generalized least squares technique.
Let

x_kℓ^(1) = Σ_i a_i q_iokℓ,    x_kℓ^(2) = Σ_j b_j q_ojkℓ,    x_kℓ' = [x_kℓ^(1), x_kℓ^(2)],

with Σ̂_kℓ defined from the q's as before, with n_ookℓ in place of n_ook. Then

S² = Σ_{k,ℓ} (x_kℓ − λ_ℓ)' Σ̂_kℓ⁻¹ (x_kℓ − λ_ℓ)

is to be minimized with respect to the λ_ℓ.
We have then, equating the derivatives to zero,

λ̂_ℓ = M_ℓ⁻¹ y_ℓ,

where M_ℓ = Σ_k Σ̂_kℓ⁻¹ and y_ℓ = Σ_k Σ̂_kℓ⁻¹ x_kℓ, the sums extending over the treatments occurring in the ℓth block. The X₁²-statistic to test H13 is then S² with λ_ℓ = λ̂_ℓ, d.f. = 2(m−u).
Now if the treatment-factor (k) is structured, with c_k as the score associated with the kth level of the treatment-factor, and H13 is rejected, we can test the

Hypothesis of linearity of regression:

H14: Σ_i a_i p_iokℓ = λ_ℓ^(1) + μ_ℓ^(1) c_k,    Σ_j b_j p_ojkℓ = λ_ℓ^(2) + μ_ℓ^(2) c_k.
Here we have

S² = Σ_{k,ℓ} (x_kℓ − λ_ℓ − μ_ℓ c_k)' Σ̂_kℓ⁻¹ (x_kℓ − λ_ℓ − μ_ℓ c_k),

and the minimizing equations, obtained by equating the derivatives with respect to λ_ℓ and μ_ℓ to zero, yield the estimates λ̂_ℓ and μ̂_ℓ in the manner of (5.3.9).
The X₁²-statistic to test H14 is thus seen to be S² with λ_ℓ, μ_ℓ replaced by λ̂_ℓ, μ̂_ℓ.

Note that the methods of this section can be immediately extended to the multi-response situation with two factors. One can also test various 'mixed' hypotheses (for the mixed case, if necessary).

Hypotheses of no interaction of various orders:

No-interaction hypotheses for a four-dimensional table with two responses and two factors arise in problems of two different types. First we may want to test whether there is any interaction between factor-effects with respect to the two responses separately; one may, of course, think of a similar hypothesis with the additive formulation. If the responses are structured with weights {a_i} and {b_j} respectively, the hypothesis can be posed in terms of the mean scores, which is seen to be in the spirit of the hypothesis of no interaction in the MANOVA situation.
On the other hand, we may also be interested in the study of association between the two responses for various factor combinations. If γ_kℓ denotes any suitable measure of association between the two responses for the factor-combination (k, ℓ), the problems that may be of interest are:

(i) Is γ_kℓ independent of both k and ℓ?
(ii) Is γ_kℓ independent of ℓ?
(iii) Is the dependence of γ_kℓ on k the same for all ℓ?

For discussion and further details the reader is referred to Bhapkar and Koch (1965b).
6. Some Other Problems
In this chapter we shall discuss briefly some problems which have not been discussed in the earlier chapters. This discussion will be, of course, sketchy, and the reader is referred to the original papers, mentioned at the appropriate places later on, for a fuller treatment.

1. Closeness of Approximation

The various test criteria described in Chapter 2 and the specific test statistics offered in the later chapters are valid only on asymptotic considerations; the criteria have chi-square distributions with the relevant degrees of freedom under the appropriate null hypotheses only in the limit as the total sample size, N, tends to infinity (with some further conditions, if necessary, as described in Chapter 2). The criteria are thus only approximately valid for finite samples, and the approximation is expected to be close only for large samples. The question then naturally arises as to how large is large enough in order that the approximation be sufficiently close.
It is commonly felt that all the expected frequencies should be greater than 5 for the X²-tests (especially the Pearson X²) to be reliable. However, Cochran (1954) and others feel that this rule of thumb is too stringent; Cochran suggests that if relatively few expectations are less than 5 (say, one cell out of five or more, or two cells out of ten or more) a minimum expectation of 1 is allowable in computing X². In the next section we shall consider some exact tests which are available in some simple situations, especially in two-dimensional contingency tables.
Computation of the complete multinomial distribution, in some cases, suggests that these asymptotic X²-tests are fairly reliable, when the number of degrees of freedom is moderately large, even when some of the expected frequencies are as low as 1 or 2; Maxwell (1961) feels that 15 or thereabouts degrees of freedom might be sufficient for the Pearson X²-test to be reliable even when most of the expected frequencies are as low as 1 or 2.
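These working rules are easy to mechanize; a minimal sketch (the function name and the 20% cut-off, our encoding of "one cell out of five or more", are our own reading of Cochran's suggestion):

```python
def expectations_ok(expected, min_allowed=1.0, small_cut=5.0, small_frac=0.2):
    """Screen a table of expected frequencies along Cochran's (1954)
    lines: every expectation at least 1, and at most a small fraction
    of the cells (here 20%) below 5."""
    cells = [e for row in expected for e in row]
    if min(cells) < min_allowed:
        return False
    n_small = sum(e < small_cut for e in cells)
    return n_small <= small_frac * len(cells)

print(expectations_ok([[12.0, 8.0, 4.5], [9.0, 7.0, 6.0]]))  # one cell in six below 5 -> True
print(expectations_ok([[0.5, 8.0, 9.0], [9.0, 7.0, 6.0]]))   # an expectation below 1 -> False
```

Such a screen decides only whether the chi-square approximation is trusted, not the test itself.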
In general, the closeness of approximation in a particular situation can be studied by computing P(w_α | H₀), the probability of rejecting the null hypothesis H₀, when it is true, when we use w_α as the critical region corresponding to the stated level of significance α; with the X²-criteria discussed in this monograph the hypothesis is rejected whenever Y ≥ χ²_{n,α}, where Y is a relevant statistic with a limiting chi-square distribution with n degrees of freedom under H₀, and χ²_{n,α} is the upper α-significance point of this limiting chi-square distribution. It is necessary, then, to compute the probabilities of all outcomes resulting in Y ≥ χ²_{n,α}, the probabilities given by either the multinomial (the product-multinomial, if necessary) or the hypergeometric distribution, as the case may be. Such computations are quite cumbersome, and the reader is referred to Welch (1938) for an illustration in a relatively simple case; refer to Wise (1964) for another illustration, where he also suggests an improved X²-statistic for the case with equal expected frequencies. In his earlier paper (1963) he points out that the standard approximation to a multinomial probability, derived from Stirling's expansion for a factorial, is misleading, and he offers a better approximation which seems to justify the X² goodness-of-fit test under much wider conditions.
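The computation of P(w_α | H₀) just described can be carried out by brute force for small N; the following sketch (our own illustration, not Welch's or Wise's computation) enumerates the complete multinomial distribution for a goodness-of-fit test of three equiprobable categories and sums the probabilities of all outcomes falling in the critical region; 5.991 is the upper 5% point of chi-square with 2 degrees of freedom:

```python
from itertools import product
from math import factorial

def multinomial_prob(counts, probs):
    """Multinomial probability of the outcome (n_1, ..., n_r)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    p = float(coef)
    for c, pi in zip(counts, probs):
        p *= pi ** c
    return p

def exact_size(probs, n, crit):
    """P(X^2 >= crit | H0), the exact size of the nominal test,
    by complete enumeration of the multinomial outcomes."""
    r = len(probs)
    size = 0.0
    for head in product(range(n + 1), repeat=r - 1):
        last = n - sum(head)
        if last < 0:
            continue
        counts = head + (last,)
        x2 = sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, probs))
        if x2 >= crit:
            size += multinomial_prob(counts, probs)
    return size

print(exact_size([1/3, 1/3, 1/3], 12, 5.991))  # compare with the nominal 0.05
```

The gap between the printed exact size and the nominal 0.05 is precisely the closeness of approximation under discussion.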
Apart from exact tests, some other methods can be used in some situations when some expected frequencies are small; refer, for example, to Haldane (1939). Yates (1934) suggested a continuity correction to give a closer approximation
in a 2 × 2 contingency table:

(6.1.1)        a    c    m
               b    d    n
               r    s    N

The well-known Pearson X²-statistic is given by

(6.1.2)    N(ad − bc)² / (r s m n),    d.f. = 1,

to test either the hypothesis of independence in a two-response situation, or the hypothesis of homogeneity in a one-factor one-response situation, or the 'hypothesis of random allocation' in the case with all marginals fixed (refer to the next section). For the third case Yates suggested that the statistic should be computed after increasing the smaller frequencies (i.e., b and c if bc < ad) and decreasing the larger ones by 1/2 each, so that the modified statistic becomes

N(|ad − bc| − N/2)² / (r s m n).

This is mainly to give a closer approximation in using a continuous distribution to approximate a true distribution which is, in fact, discontinuous. The effect of the discontinuous nature of the distribution on the X²-test was not considered serious for contingency tables with more than one degree of freedom.
Plackett (1964) points out that such a correction is appropriate for the case with all marginals fixed, and cautions against its uncritical use in the first two cases; this confirmed the conclusion reached empirically by Pearson (1947).
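In the notation of (6.1.1) the uncorrected and Yates-corrected statistics are easily computed side by side; a minimal sketch with an invented table (the function names are ours):

```python
def pearson_2x2(a, b, c, d):
    """Pearson X^2 of (6.1.2): N(ad - bc)^2 / (r s m n), d.f. = 1,
    in the notation of (6.1.1): rows (a, c) and (b, d), row totals
    m and n, column totals r and s."""
    N = a + b + c + d
    m, n = a + c, b + d
    r, s = a + b, c + d
    return N * (a * d - b * c) ** 2 / (r * s * m * n)

def yates_2x2(a, b, c, d):
    """Yates-corrected statistic: N(|ad - bc| - N/2)^2 / (r s m n);
    as in the text, |ad - bc| is taken to exceed N/2."""
    N = a + b + c + d
    m, n = a + c, b + d
    r, s = a + b, c + d
    return N * (abs(a * d - b * c) - N / 2) ** 2 / (r * s * m * n)

print(pearson_2x2(10, 5, 3, 12))  # uncorrected
print(yates_2x2(10, 5, 3, 12))    # corrected, smaller here
```

The correction always pulls the statistic toward zero when |ad − bc| exceeds N/2, so the corrected test is the more conservative of the two.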
Zero frequencies. Much of the above discussion concerning the Pearson X² applies to Neyman's X₁²-statistic and the Neyman-Pearson likelihood-ratio statistic as well; the latter two, however, have a further disadvantage in that they are not even defined when some frequencies are zero. Even though the probability of such an event tends to zero with increasing sample size under the initial assumption that all p_ij be positive, such an event might occur with finite samples. In the light of the earlier discussion concerning approximation to a discontinuous distribution by a continuous one, one may suggest that while computing the statistics one should replace zero frequencies by 1/2; Rao (1965) recommends replacement of 0 by 1. In the case of the X₁²-statistics discussed in this monograph, such a continuity correction may not be necessary in situations where the explicit expression obtained by the least squares technique, especially when dealing with structured responses, does not involve individual frequencies themselves but rather some function of a number of class frequencies as a denominator; we are then using the Wald criterion for such cases with X₁² not defined, as pointed out in Chapter 2. One has to be careful, of course, when a number of zero frequencies arise, for one cannot then be sure of the closeness of approximation; but in such cases the assumption that all p_ij are positive is itself questionable.
2. Exact Tests
Consider an urn-experiment with N balls, n_io of which are marked A_i, i = 1, 2, …, r, and suppose that these N balls are drawn in random order and placed, in order, in a row of N receptacles, n_oj of which have been marked B_j, j = 1, 2, …, s; then Σ_i n_io = Σ_j n_oj = N. The outcome of such an experiment can be represented by a two-dimensional contingency table

(6.2.1)           B_1   …   B_s    Total
           A_1   n_11   …   n_1s   n_1o
            ⋮
           A_r   n_r1   …   n_rs   n_ro
          Total  n_o1   …   n_os     N

where n_ij is the number of balls, marked A_i, found in receptacles marked B_j; the n's are then random variables subject to the constraints that both the marginal totals, the n_io's and n_oj's, are fixed. The probability of such an outcome is
Π_j ( n_oj! / Π_i n_ij! ) · ( Π_i n_io! / N! ),

or alternatively

Π_i ( n_io! / Π_j n_ij! ) · ( Π_j n_oj! / N! ),

which is

(6.2.2)    ( Π_i n_io! ) ( Π_j n_oj! ) / ( N! Π_i Π_j n_ij! ).
In order to test the hypothesis that the allocation of these balls to the receptacles is genuinely random, we sum the probabilities of the type (6.2.2) of the observed event and of those events which are less probable on the above hypothesis; the hypothesis is rejected if this total probability is less than the desired level of significance. This is the basis of Fisher's exact test, which is quite appropriate for experiments of the above type, i.e., of which the urn-model above is a suitable, abstract picture. Various tables for carrying out these exact tests in the special case r = s = 2 are available; refer, for example, to Finney (1948), Fisher and Yates (1957). A large-sample approximation to the above exact test procedure is provided by using the X²-statistic

N Σ_i Σ_j (n_ij − n_io n_oj / N)² / (n_io n_oj)

with (r−1)(s−1) degrees of freedom; in the special case r = s = 2, the above statistic reduces to (6.1.2) in the notation of (6.1.1). Cochran (1954) recommends that the exact test should be used when the total sample size (for
a 2 × 2 table) is less than 20, or when it is less than 40 and some expected frequency is less than 5.
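The exact test just described can be sketched directly from (6.2.2): for a 2 × 2 table with both margins fixed, the table is determined by the (1,1) cell alone, whose conditional distribution is hypergeometric, and we sum the probabilities of the observed table and of every table with the same margins that is no more probable (the function name and the sample table are ours):

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Fisher's exact test in the notation of (6.1.1): rows (a, c)
    and (b, d), row totals m and n, first column total r; the
    probability of a table is (6.2.2), here hypergeometric in the
    (1,1) cell x."""
    m, n = a + c, b + d
    r = a + b
    N = m + n

    def prob(x):
        return comb(m, x) * comb(n, r - x) / comb(N, r)

    p_obs = prob(a)
    lo, hi = max(0, r - n), min(r, m)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs + 1e-12)

print(fisher_exact_2x2(8, 1, 2, 5))  # total probability of the observed tail
```

The small tolerance guards against ties being missed through floating-point rounding.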
It may be pointed out here that such an experiment (with both marginals fixed) can be looked upon as a 2-factor 1-response experiment where the factors A and B, however, are not arranged in the cross classification assumed throughout this monograph.
Exact test for homogeneity of several populations. To give an abstract picture of a 1-factor (say B) and 1-response (say A) experiment, consider s urns, each containing a large number of balls, such that in the jth urn B_j the proportion of balls marked A_i is p_ij. Suppose n_oj balls are drawn at random from urn B_j and n_ij of these happen to be marked A_i; the outcome can again be represented by the contingency table (6.2.1), but now only the n_oj's are fixed beforehand.
The probability of the outcome (6.2.1) is very nearly

(6.2.4)    Π_j ( n_oj! Π_i p_ij^{n_ij} / Π_i n_ij! ),

which, on the hypothesis of homogeneity, p_ij = p_i, becomes

Π_j ( n_oj! / Π_i n_ij! ) Π_i p_i^{n_io},

with the p_i's as nuisance parameters. We can now invoke the method of conditional distributions mentioned in Chapter 3 and write the probability (6.2.4) as

( N! Π_i p_i^{n_io} / Π_i n_io! ) ( Π_i n_io! Π_j n_oj! / ( N! Π_i Π_j n_ij! ) ),
I
the first term being the
probab~lity
of the observed marginal totals n.
~o
and
the second being the conditional probability of the outcome (6.2.1), given the
marginals n
(in addition to the marginals noj fixed by the experimental
io
The latter is free of the nuisance parameters and, thus, the exact
scheme).
test of the hypothesis, in the framework of this conditional set up, is provided by the exact test mentioned earlier for the case with both marginals
fixed.
Exact test for independence of two responses. An abstract picture for the 2-response experiment is provided by considering an urn containing a large number of balls, each marked A_i B_j with i = 1, …, r and j = 1, …, s, with p_ij as the proportion of balls marked A_i B_j, and noting the number, n_ij, of balls marked A_i B_j in a random sample of N balls drawn from the urn. Again the outcome is represented by the contingency table (6.2.1) with only N fixed beforehand.
The probability of such an outcome is very nearly

N! Π_i Π_j p_ij^{n_ij} / ( Π_i Π_j n_ij! ),

which, under the hypothesis of independence, p_ij = p_io p_oj, becomes

(6.2.5)    N! Π_i p_io^{n_io} Π_j p_oj^{n_oj} / ( Π_i Π_j n_ij! ),

with the p_io's and p_oj's as nuisance parameters. Here again we can use the method of conditional distributions, writing (6.2.5) as

( N! Π_i p_io^{n_io} / Π_i n_io! ) ( N! Π_j p_oj^{n_oj} / Π_j n_oj! ) ( Π_i n_io! Π_j n_oj! / ( N! Π_i Π_j n_ij! ) ),
the probability of marginals n_io, that of marginals n_oj, and the conditional probability given both these marginals, respectively.
Thus on the hypothesis under consideration the conditional probability is again given by (6.2.2), and we are led back to the same exact test if we work in this conditional framework. For a straightforward extension in the general case, the reader is referred to Freeman and Halton (1947). The case of a 2 × 2 contingency table is discussed in detail by Barnard (1947) and Pearson (1947). We may remark here that the exact test discussed above, which arises 'naturally' in the first case, is not necessarily the 'most natural' or the 'best' in the last two situations. Barnard and Pearson discuss some alternative criteria in these situations. It was proved by Tocher (1950) that the exact one-tailed test for the first case is a uniformly most powerful unbiased test against one-sided alternatives with any of the three types of data, provided it is used with the necessary randomization to attain exactly the desired level of significance. This has been already pointed out in Chapter 3.
3. Classification Procedures
Let us consider now s populations, P_j, such that an individual selected at random from P_j would belong to the ith response-category, i.e., would have response i, with probability p_ij, i = 1, 2, …, r and j = 1, 2, …, s, so that Σ_i p_ij = 1. We have to devise a classification rule whereby an individual, which is known to have come from one of these s populations, is assigned to one of them on the basis of the response-category x to which it belongs. Suppose first that we know the a priori probabilities, π_j, of these populations, with, of course, π_j ≥ 0 and Σ_j π_j = 1. These could be, for instance, the proportions of individuals from P_j in the mixture of these. Noting that p_xj is the conditional probability that the individual to be classified has the response x, assuming that it has come from P_j, the posterior probability of P_j, given the response x, is

p(j | x) = π_j p_xj / Σ_j π_j p_xj .
Consider a classification procedure, say d(x), such that for each x = 1, …, r, d(x) takes some value j from (1, 2, …, s), whereby the individual is assigned to P_j on the basis of x. Let ℓ_jk be the loss, or the seriousness of the error committed, due to assigning the individual to P_k when it really comes from P_j. Then the conditional average loss, given the response x, due to assigning the individual to P_k is

L(k | x) = Σ_j p(j | x) ℓ_jk .
Then a procedure δ(x) such that

(6.3.2)    δ(x) = k  if  L(k | x) ≤ L(m | x),  i.e., if  Σ_j π_j p_xj ℓ_jk ≤ Σ_j π_j p_xj ℓ_jm,  m = 1, …, s,

minimizing the (posterior) average loss for each x, is a Bayes procedure with respect to the prior distribution π = (π_1, π_2, …, π_s); in other words, this procedure δ minimizes the average loss, with respect to π, amongst all procedures of the type d described above. This can be easily verified. For the simple loss function ℓ_jj = 0, ℓ_jk = 1 whenever j ≠ k, such a procedure becomes

(6.3.3)    δ(x) = k  if  π_k p_xk ≥ π_m p_xm,  m = 1, …, s;

in other words, it is a procedure that selects the population P_k which has the maximum posterior probability given x. It may be verified that this Bayes procedure with respect to π minimizes the total probability of misclassification with respect to the prior probability distribution π. For the particular case of π_j = 1/s for all j, a Bayes procedure is then a maximum likelihood procedure.

In the case where the prior distribution π is unknown, it can be shown that we may reasonably confine ourselves to the consideration of only Bayes procedures of the type (6.3.2) with arbitrary π (or of the type (6.3.3) in case we use the simple loss function). This is because it can be shown that for such a classification problem Bayes procedures (with respect to π, with π_j > 0 for all j, which may be assumed without loss of generality) are admissible, in the terminology of the Statistical Decision Functions (refer to Wald (1950)), and conversely all admissible procedures are Bayes with respect to some π. A selection from this class then would have to be made on some additional considerations.
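The procedures (6.3.2) and (6.3.3) amount to a few lines of arithmetic; a minimal sketch (the function name, the loss-matrix convention and the numerical p_xj are all invented for the illustration):

```python
def bayes_classify(x, p, prior, loss=None):
    """Bayes procedure (6.3.2): assign an individual with response x to
    the population k minimizing sum_j prior[j] * p[x][j] * loss[j][k];
    with the simple 0-1 loss this reduces to (6.3.3), i.e. choosing
    the maximum of prior[j] * p[x][j]."""
    s = len(prior)
    if loss is None:  # simple loss: l_jj = 0, l_jk = 1 for j != k
        loss = [[0 if j == k else 1 for k in range(s)] for j in range(s)]
    risk = [sum(prior[j] * p[x][j] * loss[j][k] for j in range(s))
            for k in range(s)]
    return min(range(s), key=risk.__getitem__)

# p[x][j] = P(response x | population j); r = 3 categories, s = 2 populations
p = [[0.6, 0.1],
     [0.3, 0.3],
     [0.1, 0.6]]
print(bayes_classify(0, p, prior=[0.5, 0.5]))  # -> 0
print(bayes_classify(2, p, prior=[0.9, 0.1]))  # -> 0: the prior dominates
```

The second call illustrates the role of the prior: population 1 is far more likely to produce response 2, yet the heavy prior on population 0 carries the decision.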
So far we have assumed a previous knowledge of all the probabilities p_ij in order to arrive at a classification decision. The position is not very satisfactory in the case where these probabilities are not known and have to be estimated from random samples known to have come from these populations. Suppose we know that, in a random sample of n_oj from P_j, n_ij belong to the response-category i. Then p_ij may be estimated by q_ij = n_ij/n_oj and we may restrict ourselves to Bayes procedures with the p's substituted by the q's. For a detailed discussion of such procedures reference may be made to Birnbaum and Maxwell (1960) and Maxwell (1961). They discuss a method of selecting a suitable procedure of this type by considering the performance of some arbitrary Bayes procedure (with respect to some π) and then adjusting π suitably to give more desirable results in some particular direction (where, for example, the errors due to misclassification are more serious) at the cost of greater chances of wrong decisions in other directions. Maxwell (1961) claims that such a method described in his book 'ensures that the number of misclassifications is a minimum'. This refers to the number of misclassifications made in applying the procedure to the individuals in the given samples, known to have come from specific populations. This is so only if the prior probabilities π_j are estimated by the sample proportions n_oj/N with N = Σ_j n_oj; apparently this assumption is made implicitly. In any case, one may feel that the relative performance of a procedure should be judged by the consideration of its performance in classifying an individual known to have come from some one of these populations, i.e., by its performance on the problem for which it is designed, rather than by its performance in classifying individuals which are already known to have come from specific populations.
4. Simultaneous Confidence Intervals
In this section we shall consider briefly simultaneous confidence intervals for contrasts among multinomial populations, for a detailed discussion of which the reader is referred to Goodman (1964c). It will be shown that the X₁²-test for homogeneity of several populations discussed in Chapter 4 has a natural tie-up with these confidence intervals, in that the hypothesis of homogeneity is accepted by the X₁²-statistic at a level of significance α if and only if all these confidence intervals, with a joint confidence coefficient 1−α, contain zero. This desirable property is not possessed by the well-known Pearson X²-statistic.
Consider then s populations such that a random observation from the jth population belongs to the ith response-category with probability p_ij, i = 1, …, r, j = 1, …, s, so that Σ_i p_ij = 1. A contrast between these populations with respect to the ith response-category is P_i = Σ_j c_ij p_ij with Σ_j c_ij = 0, and the number of such linearly independent contrasts is s−1 for each i. A linear function of contrasts within each response-category will be called a contrast, P, between the s populations. Thus

P = Σ_{i=1}^r d_i P_i = Σ_i Σ_j e_ij p_ij,  with e_ij = d_i c_ij,

so that Σ_j c_ij = 0 implies Σ_j e_ij = 0. Notice that any such contrast can be written as

Σ_{i=1}^{r−1} Σ_{j=1}^{s} e_ij p_ij + Σ_{j=1}^{s} e_rj (1 − p_1j − … − p_{(r−1)j}),  that is,  Σ_{i=1}^{r−1} Σ_{j=1}^{s} f_ij p_ij,

where f_ij = e_ij − e_rj, so that Σ_j f_ij = 0.
The number of such linearly independent contrasts is thus (r−1)(s−1). The contrasts δ_ij = p_ij − p_is, i = 1, …, r−1 and j = 1, …, s−1, will be called elementary contrasts; these are linearly independent, and any contrast can be expressed as a linear combination of these (r−1)(s−1) elementary contrasts. The hypothesis of homogeneity of the s populations,

(6.4.1)    H_o: p_ij is independent of j,  i = 1, …, r; j = 1, …, s,

holds if and only if all the δ_ij are zero.

Suppose now that we have a random sample of size n_oj from the jth population and that, of these, n_ij belong to the ith response-category. A 'natural' estimate of p_ij is q_ij = n_ij/n_oj, and that of P = Σ_i Σ_j e_ij p_ij is P̂ = Σ_i Σ_j e_ij q_ij. Moreover, the variance of P̂ is

v(P̂) = Σ_{j=1}^s n_oj⁻¹ { Σ_{i=1}^r e_ij² p_ij − ( Σ_{i=1}^r e_ij p_ij )² },

so that the 'sample variance' obtained after substituting the p's by the q's is, say,

(6.4.2)    s²(P̂) = Σ_{j=1}^s n_oj⁻¹ { Σ_{i=1}^r e_ij² q_ij − ( Σ_{i=1}^r e_ij q_ij )² }.

Then (P̂ − P)/s(P̂) has a limiting standard normal distribution as N → ∞ with α_j = n_oj/N fixed (or, more generally, with α_j → a_j such that 0 < a_j < 1). Thus we get a confidence interval for P, viz.

P̂ − s(P̂) z_α ≤ P ≤ P̂ + s(P̂) z_α,

with a limiting confidence coefficient 1−α, where z_α is the two-sided α-significance point of the standard normal distribution.
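The interval for a single contrast can be sketched directly from the sample variance (6.4.2); the data below are invented, and replacing the normal point z_α by the chi-square point χ_α turns the same computation into a member of the simultaneous family discussed next:

```python
from math import sqrt

def contrast_ci(n, e, crit=1.96):
    """Confidence interval for P = sum_i sum_j e_ij p_ij:
    P_hat +/- crit * s(P_hat), with P_hat = sum e_ij q_ij and, as
    in (6.4.2), s^2(P_hat) = sum_j n_oj^{-1} { sum_i e_ij^2 q_ij
    - (sum_i e_ij q_ij)^2 }; n[j] is the sample (n_1j, ..., n_rj)
    from the jth population."""
    p_hat, var = 0.0, 0.0
    for j, col in enumerate(n):
        noj = sum(col)
        q = [x / noj for x in col]
        lin = sum(e[i][j] * q[i] for i in range(len(col)))
        sq = sum(e[i][j] ** 2 * q[i] for i in range(len(col)))
        p_hat += lin
        var += (sq - lin ** 2) / noj
    half = crit * sqrt(var)
    return p_hat - half, p_hat + half

# elementary contrast p_11 - p_12 between two populations, r = 2 categories
n = [[30, 70], [45, 55]]          # samples of size 100 from each population
e = [[1, -1], [0, 0]]
print(contrast_ci(n, e))          # does not cover zero for these data
```

Passing the appropriate chi-square point as `crit` yields one of the simultaneous intervals with joint coefficient 1−α.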
We want to show now that

(6.4.4)    P̂ − s(P̂) χ_α ≤ P ≤ P̂ + s(P̂) χ_α,

where χ_α² is the upper α-point of the chi-square distribution with (r−1)(s−1) degrees of freedom, are simultaneous confidence intervals for all contrasts P with a limiting joint confidence coefficient 1−α.
PROOF: Consider the estimates d_ij = q_ij − q_is of the (r−1)(s−1) elementary contrasts δ_ij = p_ij − p_is, i = 1, …, r−1 and j = 1, …, s−1. Let B̂ be the 'sample covariance matrix' of the vector d with elements d_ij; in other words, B̂ is obtained after replacing the p's by the q's in the covariance matrix of d. Let δ be the vector of the δ_ij. Then (d − δ)' B̂⁻¹ (d − δ) has a limiting chi-square distribution with (r−1)(s−1) degrees of freedom, so that

P[ (d − δ)' B̂⁻¹ (d − δ) ≤ χ_α² ] → 1 − α.

Now

|a'(d − δ)| ≤ √(a' B̂ a) √( (d − δ)' B̂⁻¹ (d − δ) )  for all a;

this is a consequence of the generalized Cauchy-Schwarz inequality |x'y| ≤ √(x' A x) √(y' A⁻¹ y) for any positive definite matrix A, with equality when x is proportional to A⁻¹ y. Thus the probability that

(6.4.5)    |a'(d − δ)| ≤ √(a' B̂ a) χ_α  simultaneously for all a

tends to 1−α as N → ∞. But each contrast, P, is of the form a'δ and, conversely, every a'δ is a contrast; a'd is then P̂, while a' B̂ a is the 'sample variance' of a'd, i.e., s²(P̂). Thus the statements (6.4.5) are precisely the statements (6.4.4) for all P, and the assertion follows.
From the results in Chapter 2, it follows that the X₁²-statistic (4.1.4) to test H_o (6.4.1) is d' B̂⁻¹ d. The X₁²-test rejects H_o if d' B̂⁻¹ d > χ_α², i.e., if and only if there is at least one a for which |a'd| > √(a' B̂ a) χ_α; in other words, at least one of the intervals (6.4.5) does not include zero. Conversely, the X₁²-test accepts H_o if and only if all the confidence intervals (6.4.5) include zero. Thus, from the point of view of the existence of such a correspondence between a test-criterion and an associated confidence-interval procedure, the X₁²-statistic (4.1.4) seems more desirable than the Pearson X²-statistic (4.1.3) for testing the hypothesis of homogeneity.
Similar results can be established regarding the X₁²-test (4.1.6) for the hypothesis of equality of 'means', viz.,

H: Σ_i a_i p_ij is independent of j,

and the associated confidence intervals

(6.4.6)    P̂* − s(P̂*) χ*_α < P* < P̂* + s(P̂*) χ*_α,

where now χ*_α² is the upper α-point of the chi-square distribution with s−1 degrees of freedom and P* is any arbitrary contrast between the 'means' Σ_i a_i p_ij, in the notation of 4.1.
If we are interested only in some specific contrasts, for example only the δ's rather than the complete set, it is sometimes possible to obtain shorter simultaneous confidence intervals for the specified set compared to the relevant ones given by (6.4.4). The reader is referred to Goodman (1964c) for details.
5. Relative Merits of Different Methods of Estimation and Testing
For the problems of estimation arising from a single or product-multinomial distribution, we have mentioned a number of methods in Chapter 2. The interest may be in estimating the probabilities, the p_ij's themselves, or a smaller number of parameters in terms of which all the probabilities can be expressed. All these methods provide reasonably good estimates, at least for large samples, in the sense that under suitable regularity conditions these estimates are consistent, asymptotically normal and asymptotically 'efficient', as shown by Neyman (1949). However, some differences between these methods have been brought out by considering 'second order efficiency', on the basis of which it has been shown that the method of maximum likelihood is, in a sense, superior to the rest; the reader is referred to Rao (1961, 1965) for details. But this still leaves open the basic problem regarding optimality for small or moderate-size samples, and nothing is reliably known providing an answer. Among these methods, which are asymptotically equivalent in some sense, we may also introduce the criterion of computational simplicity. From this point of view we have seen that, in general, the minimum-X₁² method due to Neyman (sometimes also referred to as 'modified X²'), using the technique of linearization if necessary, has the advantage that the estimates can be obtained by solving only one set of linear equations, while the other methods may require a complicated iterative procedure or, possibly, solving one set of linear equations after another until the solutions converge. It is only in some relatively simple situations that the other methods lead to explicit solutions immediately. Thus, from the computational point of view, the minimum-X₁² estimator has
a considerable advantage; this is, of course, offset to some extent these days by the advent of electronic computers. There is another advantage in using the minimum-$\chi^2$ estimator: in general, the other methods do not lead to unique solutions, and one has to find a consistent solution. But how, in practice, does one decide that a given solution is consistent?

There are some conjectural opinions in favor of the maximum likelihood estimate even for small samples, but these do not provide any reliable definite answer to the problem. Berkson (1955) reports some interesting findings concerning estimation of the parameters $\alpha$ and $\beta$ in the logistic function with binomial variation of the dependent variable, viz.,

$P_i = \left[\,1 + e^{-(\alpha + \beta x_i)}\,\right]^{-1}$,

arising in a bio-assay with quantal response. Both the maximum likelihood and the minimum $\chi^2$ (Pearson) equations are difficult to solve. On the other hand, by considering the logit of $p_i$, viz.,

$z_i = \log \frac{p_i}{1 - p_i} = \alpha + \beta x_i$,

the logit-$\chi^2$ estimate is easy to obtain. Moreover, after computing the complete sampling distribution with 10 observations each at three equally spaced values of $x_i$ (with, thus, a total sample size of 30), it turns out that the variance and mean square error are smallest for the logit-$\chi^2$ estimate, larger for the Pearson-$\chi^2$ estimate, and largest for the maximum likelihood estimate.
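Berkson's minimum logit-$\chi^2$ estimate reduces to a single weighted least-squares solve on the empirical logits. A minimal sketch, with the usual weights $n_i \hat p_i (1 - \hat p_i)$ evaluated at the observed proportions; the counts below are illustrative, not Berkson's data, and `logit_chi2_fit` is our own name:

```python
import math

def logit_chi2_fit(x, r, n):
    """Minimum logit-chi-square (Berkson) estimate of (alpha, beta) in
    logit p_i = alpha + beta * x_i: one weighted least-squares solve on the
    empirical logits, weights n_i * p_i * (1 - p_i) at observed proportions."""
    z = [math.log(ri / (ni - ri)) for ri, ni in zip(r, n)]   # empirical logits
    w = [ni * (ri / ni) * (1 - ri / ni) for ri, ni in zip(r, n)]
    Sw = sum(w)
    Swx = sum(wi * xi for wi, xi in zip(w, x))
    Swz = sum(wi * zi for wi, zi in zip(w, z))
    Swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
    beta = (Swxz - Swx * Swz / Sw) / (Swxx - Swx * Swx / Sw)
    alpha = (Swz - beta * Swx) / Sw
    return alpha, beta

# Three equally spaced doses, 10 subjects each (illustrative responder counts):
x = [-1.0, 0.0, 1.0]
r = [2, 5, 8]
n = [10, 10, 10]
alpha, beta = logit_chi2_fit(x, r, n)
print(alpha, beta)
```

Note the computational point from the text: no iteration is needed, in contrast with the likelihood or Pearson minimum-$\chi^2$ equations. (Empirical logits require $0 < r_i < n_i$; zero or full response groups need special handling.)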
A more or less similar situation exists concerning the problem of testing hypotheses. Neyman (1949) has shown that all these $\chi^2$-tests, viz., those based on the Pearson $\chi^2$, the Neyman $\chi_1^2$ and the likelihood ratio criterion, are consistent and asymptotically 'equivalent' in the sense that the probability of any two contradicting each other tends to zero as the total sample size tends to infinity for every fixed parametric point $\theta$. Since the tests are consistent, the power of any of these tends to one for every fixed alternative (i.e., for $\theta$ in the space $\Omega - \omega$) and for every fixed level of significance $\alpha$. One approach that has mainly been followed so far for asymptotic relative efficiency studies is that of computing the limiting power for a sequence of alternatives tending to the null hypothesis ($\omega$) at a suitable speed, so that this limit is bounded away from both zero and one. This general Pitman (1948) method fails here to distinguish, as far as efficiency is concerned, between these three $\chi^2$-tests, as they turn out to have the same asymptotic relative efficiency.
Another approach for the asymptotic comparison of these tests has recently been discussed by Hoeffding (1965). Here the size of a test is not kept fixed but tends to zero as the sample size tends to infinity; this enables consideration of alternatives not very close to the hypothesis. It is shown that "if a given test of size $\alpha_N$ is 'sufficiently different' from a likelihood ratio test, then there is a likelihood ratio test of size $\le \alpha_N$ which is considerably more powerful than the given test at 'most' points $p$ in the set of alternatives when $N$ is large enough, provided that $\alpha_N \to 0$ at a suitable rate." This result has been established mainly for tests of simple hypotheses, but is shown to hold for some situations with composite hypotheses as well.
So the comparative position regarding the various test criteria appears to be very similar to that regarding the corresponding estimators. The criteria are asymptotically equivalent, in some sense, while in the finer analysis it appears that, under some conditions, the likelihood ratio test has some advantage over the others. Nothing can be said, as yet, for small samples. The $\chi_1^2$-statistic, though, has a computational advantage, in general, over the other criteria.
We might then propose a very rough working rule, in the light of the above findings: one may prefer maximum likelihood estimates of parameters and the likelihood ratio test criterion, for the purposes of estimation and testing of hypotheses respectively, in situations where these estimates are easy to obtain. For several simple illustrations of such situations, the reader is referred to Kullback (1959). But for more complex situations, where the maximum likelihood estimates are difficult to obtain and especially when one does not know which solution is the right one, we recommend the $\chi_1^2$-statistic or, more generally, Wald's statistic. Of course, $\chi_1^2$ is not an appropriate statistic in some situations, for example where there are several zero frequencies; but neither is the likelihood ratio criterion quite appropriate for such a case nor, for that matter, any statistic discussed in this monograph; special methods should be devised for such cases. It would be worthwhile carrying out an empirical study involving computation of the exact distributions (and the exact significance points) of the various test criteria for several sample sizes, both under the null hypothesis and under some alternatives; such a study would throw some light on the approximations involved and also on the relative powers of these criteria. Some such finding could, perhaps, be far more relevant for judging the relative merits than purely theoretical studies based on asymptotic considerations alone.
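Such an exact small-sample computation is quite feasible for small tables by complete enumeration. A sketch, assuming a simple trinomial hypothesis with $n = 12$ and the nominal 5% chi-square point 5.991 with 2 degrees of freedom (`exact_sizes` is our own name); it returns the exact null rejection probabilities of the Pearson and likelihood-ratio criteria:

```python
import math

def exact_sizes(n, p, crit):
    """Exact null rejection probabilities of the Pearson chi-square and the
    likelihood-ratio (G^2) criteria for a simple multinomial hypothesis,
    by complete enumeration of the sample space."""
    def comps(total, parts):
        # all count vectors (n_1, ..., n_parts) summing to total
        if parts == 1:
            yield (total,)
            return
        for first in range(total + 1):
            for rest in comps(total - first, parts - 1):
                yield (first,) + rest

    size_pearson = size_lr = 0.0
    for counts in comps(n, len(p)):
        prob = math.factorial(n) * math.prod(
            pi ** ci / math.factorial(ci) for pi, ci in zip(p, counts))
        pearson = sum((ci - n * pi) ** 2 / (n * pi) for ci, pi in zip(counts, p))
        lr = 2 * sum(ci * math.log(ci / (n * pi))
                     for ci, pi in zip(counts, p) if ci > 0)
        if pearson >= crit:
            size_pearson += prob
        if lr >= crit:
            size_lr += prob
    return size_pearson, size_lr

# Nominal 5% point of chi-square with 2 d.f. is about 5.991:
sp, slr = exact_sizes(12, [1/3, 1/3, 1/3], 5.991)
print(round(sp, 4), round(slr, 4))
```

Comparing these exact sizes (and the analogous exact powers under chosen alternatives) across criteria is precisely the kind of empirical study the text suggests.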
6. Construction of Categories
When a way of classification corresponding to the response is unstructured or, if structured, is such that the variable under consideration takes only a finite number of values, the corresponding categories are naturally well-defined. But if the variable is continuous, or if it takes a very large number (possibly infinite) of values, then the first step is to group the observations into classes. The question then arises as to how to form these categories in some optimum manner. Mann and Wald (1942) and Gumbel (1943) suggest that the categories be chosen so that the expected number is the same in all classes. Mann and Wald also consider the problem of choosing, in some optimum manner, the number of categories. The reader is referred to that paper, and also to a paper by Williams (1950), for further details.

The problem of designing the experiment is not discussed at all in this monograph. Suitable adaptations of the 'response surface' techniques, due to Box and others, appropriate to the categorical arrangement are called for.
7. Partitions of $\chi^2$-statistics

Suppose we are testing some hypothesis $H_0$ regarding $p$ on the basis of some $\chi^2$-statistic, say $\chi^2$, from the ones described in Chapter 2. If that statistic has a limiting chi-square distribution with $t$ degrees of freedom if $H_0$ holds, roughly speaking it essentially means that $\chi^2$ can be expressed as

$\chi^2 \doteq y_1^2 + y_2^2 + \cdots + y_t^2$,

where $\doteq$ means that the difference converges to zero in probability as $N \to \infty$, and the $y$'s are asymptotically mutually independent standard normal variables.
Suppose now that $z^2$, $w^2$ are some $\chi^2$-statistics for hypotheses $H_{01}$ and $H_{02}$ such that $H_0 = H_{01} \cap H_{02}$ and

$z^2 \doteq y_1^2 + \cdots + y_u^2$,  $w^2 \doteq y_{u+1}^2 + \cdots + y_t^2$;

then we have $\chi^2 \doteq z^2 + w^2$. In such a case we shall say that $(z^2, w^2)$ is an asymptotic partition of $\chi^2$ into two asymptotically independent components, each of which is a 'natural' $\chi^2$-statistic for a component hypothesis. In this case $\chi^2 - z^2$ is also a valid statistic (in the asymptotic sense) for testing the hypothesis $H_{02}$. More generally, if there exist $\chi^2$-statistics $\chi_i^2$ for testing hypotheses $H_{0i}$, $i = 1, 2, \ldots, t$, respectively, such that

$H_0 = \bigcap_{i=1}^{t} H_{0i}$  and  $\chi_i^2 \doteq y_i^2$,

then we shall say that $(\chi_1^2, \ldots, \chi_t^2)$ is an asymptotic partition of $\chi^2$ into $t$ asymptotically mutually independent one-degree-of-freedom components. Here we have only asymptotic additivity, in the sense that

$\chi^2 \doteq \chi_1^2 + \chi_2^2 + \cdots + \chi_t^2$.
Some partitions which are exactly additive have been given; for example, refer to Irwin (1949) and Lancaster (1949). We feel that the requirement of exact additivity of the components, as in the analysis of variance, is desirable only if the components satisfy the other conditions mentioned above, where each component $\chi_i^2$ is a perfectly valid statistic for some meaningful component hypothesis $H_{0i}$. If such a division with exact additivity is not possible, this requirement may be discarded, and we should not insist on an exactly additive partition, in blind analogy with the analysis of variance, if the component statistics are not valid criteria for testing some meaningful sub-hypothesis. We have already discussed in Chapter 3 some of these points in connection with the hypothesis of no interaction (of second order) among three responses.
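For a simple multinomial hypothesis, one classical exactly additive construction (in the spirit of the Irwin and Lancaster partitions, not a statistic taken from this text) projects the standardized residuals, which are exactly orthogonal to $\sqrt{p}$, onto an orthonormal basis of the complement; the $k-1$ single-degree-of-freedom pieces then add exactly to the Pearson $\chi^2$. A sketch (`partition_pearson` is our own name, and the counts are illustrative):

```python
import numpy as np

def partition_pearson(counts, p):
    """Split the Pearson chi-square for a simple multinomial hypothesis into
    k-1 single-degree-of-freedom components that add exactly. The standardized
    residuals r satisfy sqrt(p)' r = 0 identically, so projecting r on any
    orthonormal basis of the orthogonal complement of sqrt(p) gives the pieces."""
    n = counts.sum()
    r = (counts - n * p) / np.sqrt(n * p)          # standardized residuals
    # complete sqrt(p) to an orthonormal basis of R^k via QR, drop first column
    q, _ = np.linalg.qr(np.column_stack([np.sqrt(p), np.eye(len(p))[:, :-1]]))
    components = (q[:, 1:].T @ r) ** 2             # k-1 one-d.f. pieces
    return components, float(r @ r)                # pieces and total chi-square

counts = np.array([18.0, 30.0, 27.0, 25.0])
p = np.array([0.25, 0.25, 0.25, 0.25])
comp, total = partition_pearson(counts, p)
print(comp.sum(), total)   # the components add exactly to the total
```

Whether each such component tests a *meaningful* sub-hypothesis depends on the choice of contrasts, which is exactly the caveat raised in the text.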
APPENDIX
Let the probabilities be specified by the hypothesis

(A.1)  $H_{01}:\ p_{ij} = p_{ij}(\theta)$,

where the $p_{ij}(\theta)$ are given functions of $t$ unknown parameters $\theta_k$ satisfying the basic conditions

$p_{ij}(\theta) \ge 0$,  $\sum_{i=1}^{r} p_{ij}(\theta) = 1$,

and the regularity conditions (i)-(iii) in 2.2.

THEOREM A.1.  Let

(A.2)  $H_{1N}:\ p_{ijN} = p_{ij}(\theta_0) + N^{-1/2} c_{ij}$

be a sequence of alternative hypotheses such that $\sum_i c_{ij} = 0$, with not all $c$'s zero. If $\hat\theta$ is the maximum likelihood estimate of $\theta_0$ mentioned in Theorem 2.2.1, then the statistic

(A.3)  $\chi^2 = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{(n_{ij} - n_{0j}\, p_{ij}(\hat\theta))^2}{n_{0j}\, p_{ij}(\hat\theta)}$

has a limiting non-central chi-square distribution with $rs - s - t$ degrees of freedom under the sequence $H_{1N}$ of distributions as $N \to \infty$ with the ratios $n_{0j}/N$ kept fixed; the non-centrality parameter of this distribution is

(A.4)  $\lambda = \delta' \left[\, I - B(B'B)^{-1}B' \,\right] \delta$,

where the vector $\delta$ ($rs \times 1$) and the matrix $B$ ($rs \times t$) are evaluated at $\theta = \theta_0$, it being assumed that the $rs$ $p_{ij}$'s are arranged in the lexicographic order.
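The quadratic form in (A.4) is the squared length of $\delta$ after projecting out the column space of $B$; numerically it can be evaluated without forming the projection matrix explicitly. A sketch with arbitrary illustrative matrices (`noncentrality` is our own name):

```python
import numpy as np

def noncentrality(delta, B):
    """lambda = delta' [I - B (B'B)^{-1} B'] delta: the squared norm of the
    component of delta orthogonal to the column space of B."""
    proj = B @ np.linalg.solve(B.T @ B, B.T @ delta)   # projection onto col(B)
    resid = delta - proj
    return float(resid @ resid)

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 2))       # illustrative rs x t derivative matrix
delta = rng.standard_normal(6)        # illustrative drift vector
lam = noncentrality(delta, B)
print(lam >= 0.0, lam <= float(delta @ delta))
```

In particular $\lambda = 0$ whenever $\delta$ lies in the column space of $B$, i.e., when the drift is absorbed by the parametric model.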
With reference to a test of the hypothesis

(A.5)  $H_{02}:\ h_m(\theta) = 0$,  $m = 1, 2, \ldots, u$,

with $H_{01}$ as the assumed model, we have the following

THEOREM A.2.  Assume the regularity conditions in 2.3. Let

(A.6)  $H_{2N}:\ h_m(\theta) = N^{-1/2} d_m$,  $m = 1, 2, \ldots, u$,

be a sequence of alternative hypotheses, where not all $d$'s are zero, and let $\theta_0$ be the 'true' point. If $\hat\theta$ and $\theta^*$ are the maximum likelihood estimates (mentioned in Theorem 2.2.1) of $\theta_0$ under the conditions (A.1) and (A.5) respectively, then, under the sequence of distributions specified by $H_{2N}$, the statistic
(A.7)  $\chi^2 = \sum_{j} \sum_{i} \frac{(n_{ij} - n_{0j}\, p_{ij}(\theta^*))^2}{n_{0j}\, p_{ij}(\theta^*)} \;-\; \sum_{j} \sum_{i} \frac{(n_{ij} - n_{0j}\, p_{ij}(\hat\theta))^2}{n_{0j}\, p_{ij}(\hat\theta)}$
is, in the limit as $N \to \infty$, distributed as a non-central chi-square with $u$ degrees of freedom and non-centrality parameter

(A.8)  $\lambda = d' \left[\, H_1 \left( \tilde B_1^{*\prime} \tilde B_1^{*} - \tilde B_1^{*\prime} \tilde B_2^{*} (\tilde B_2^{*\prime} \tilde B_2^{*})^{-1} \tilde B_2^{*\prime} \tilde B_1^{*} \right)^{-1} H_1' \,\right]^{-1} d$,

where

$d = [d_m]$ ($u \times 1$),  $H_1 = \left[\, \dfrac{\partial h_m(\theta)}{\partial \theta_{k_1}} \,\right]_{\theta = \theta_0}$ ($u \times u$),

$\tilde B_1^{*} = \left[\, \dfrac{1}{\sqrt{p_{ij}(\theta_0)}} \dfrac{\partial p_{ij}(\theta)}{\partial \theta_{k_1}} \,\right]_{\theta = \theta_0}$ ($rs \times u$)  and  $\tilde B_2^{*} = \left[\, \dfrac{1}{\sqrt{p_{ij}(\theta_0)}} \dfrac{\partial p^*_{ij}(\theta)}{\partial \theta_{k_2}} \,\right]_{\theta = \theta_0}$ ($rs \times (t-u)$),

it being assumed, without loss of generality, that the $\theta_{k_1}$ ($k_1 = 1, 2, \ldots, u$) denote the $u$ parameters made dependent under $H_{02}$, the $\theta_{k_2}$ ($k_2 = u+1, \ldots, t$) denote the $t-u$ parameters remaining independent, and $p^*_{ij}$ is $p_{ij}$ expressed in terms of the $t-u$ independent parameters under $H_{02}$.
The reader is referred to Mitra (1958) for a proof of Theorem A.1 and to Diamond (1963) for that of Theorem A.2. We shall mention below a few applications of the above theorems given in Diamond, Mitra and Roy (1960).
(1) Hypothesis of independence of two responses:

$H_0:\ p_{ij} = p_{i0}\, p_{0j}$;  $H_{1N}:\ p_{ijN} = p^0_{i0}\, p^0_{0j} + N^{-1/2} d_{ij}$,  $\sum_{i,j} d_{ij} = 0$,  $\sum_i p^0_{i0} = \sum_j p^0_{0j} = 1$;

$\lambda = \sum_i \sum_j \frac{(d_{ij} - p^0_{0j}\, d_{i0} - p^0_{i0}\, d_{0j})^2}{p^0_{i0}\, p^0_{0j}} = \sum_i \sum_j \frac{d_{ij}^2}{p^0_{i0}\, p^0_{0j}} \quad \text{if } d_{i0} = d_{0j} = 0$,

for the statistic (3.1.3).
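The limiting power implied by Theorem A.1 can be evaluated numerically from the non-centrality. The sketch below does this for application (1) in the reduced case $d_{i0} = d_{0j} = 0$, using the non-central chi-square distribution; the drift matrix is illustrative and `limiting_power_independence` is our own name:

```python
import numpy as np
from scipy.stats import ncx2, chi2

def limiting_power_independence(p_row, p_col, d, alpha=0.05):
    """Non-centrality and limiting power for the chi-square test of
    independence under local alternatives p_ijN = p_i0 p_0j + N^{-1/2} d_ij
    with all row and column sums of d equal to zero (reduced case)."""
    assert np.allclose(d.sum(axis=0), 0) and np.allclose(d.sum(axis=1), 0)
    lam = float((d ** 2 / np.outer(p_row, p_col)).sum())
    df = (len(p_row) - 1) * (len(p_col) - 1)
    crit = chi2.ppf(1 - alpha, df)
    return lam, 1 - ncx2.cdf(crit, df, lam)

p_row = np.array([0.5, 0.5])            # null row marginals
p_col = np.array([0.4, 0.6])            # null column marginals
d = np.array([[0.3, -0.3],              # illustrative drift, rows/cols sum to 0
              [-0.3, 0.3]])
lam, power = limiting_power_independence(p_row, p_col, d)
print(round(lam, 3), round(power, 3))
```

Varying the drift $d$ (or the nominal level) in this way is a cheap complement to the exact small-sample computations suggested in section 5.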
(2) Hypothesis of independence of three responses:

$H_0:\ p_{ijk} = p_{i00}\, p_{0j0}\, p_{00k}$;  $H_{1N}:\ p_{ijkN} = p^0_{i00}\, p^0_{0j0}\, p^0_{00k} + N^{-1/2} d_{ijk}$,  $\sum_{i,j,k} d_{ijk} = 0$;

$\lambda = \sum_i \sum_j \sum_k \frac{(d_{ijk} - p^0_{0j0}\, p^0_{00k}\, d_{i00} - p^0_{i00}\, p^0_{00k}\, d_{0j0} - p^0_{i00}\, p^0_{0j0}\, d_{00k})^2}{p^0_{i00}\, p^0_{0j0}\, p^0_{00k}} = \sum_i \sum_j \sum_k \frac{d_{ijk}^2}{p^0_{i00}\, p^0_{0j0}\, p^0_{00k}} \quad \text{if } d_{i00} = d_{0j0} = d_{00k} = 0$,

for the statistic (3.1.15).
(3) Hypothesis of no partial association between the first two responses given the third response:

$H_0:\ p_{ijk} = p_{i0k}\, p_{0jk} / p_{00k}$,  $\sum_k p_{00k} = 1$;  $H_{1N}:\ p_{ijkN} = p^0_{i0k}\, p^0_{0jk} / p^0_{00k} + N^{-1/2} d_{ijk}$,  $\sum_{i,j,k} d_{ijk} = 0$,

for the statistic (3.1.11).
(4) Hypothesis of homogeneity of several populations:

$H_0:\ p_{ij}$ is independent of $j$;  $H_{1N}:\ p_{ijN} = p^0_i + N^{-1/2} d_{ij}$,  $\sum_i d_{ij} = 0$;

$\lambda = \sum_j \sum_i \frac{n_{0j}}{N} \frac{(d_{ij} - \bar d_i)^2}{p^0_i}$,  where  $\bar d_i = \sum_j \frac{n_{0j}}{N}\, d_{ij}$,

for the statistic (4.1.3).
(5) Hypothesis of no 'treatment effects', i.e., hypothesis of no effect of one factor in the presence of another factor: here the sequence of alternative hypotheses $H_{1N}$ satisfies $\sum_i d_{ijk} = 0$, for the statistic (4.2.5).
We state below a theorem on the asymptotic independence of two test criteria; this is due to Diamond (1958), and some applications of this theorem from Diamond, Mitra and Roy (1960) have been mentioned earlier in Chapter 2.

THEOREM A.3.  Suppose we have to test, on the basis of the same data, two hypotheses

$H_{01}:\ h_{1m_1}(\theta) = 0$,  $m_1 = 1, 2, \ldots, u_1$;  $H_{02}:\ h_{2m_2}(\theta) = 0$,  $m_2 = 1, 2, \ldots, u_2$,

assuming the model (A.1), the regularity conditions (i)-(iii) in 2.2, and those in 2.3 regarding the functions specified by $H_{01}$ and $H_{02}$, with $u_1 + u_2 \le t$.
Let $\chi^2_{(1)}$ and $\chi^2_{(2)}$ be the test statistics of the type (A.7) for $H_{01}$ and $H_{02}$ respectively. Then a necessary and sufficient condition for the asymptotic independence of $\chi^2_{(1)}$ and $\chi^2_{(2)}$, when both $H_{01}$ and $H_{02}$ are true, is that

$\tilde B_1^{*\prime} \left[\, I - B(B'B)^{-1}B' \,\right] \tilde B_2^{*} = 0$,

where $B$ is defined in Theorem A.1, and $\tilde B_1^{*}$, $\tilde B_2^{*}$ are defined as $\tilde B_1^{*}$ in Theorem A.2 for $H_{01}$, $H_{02}$ respectively.
REFERENCES
[1] Aitchison, J. and S. D. Silvey (1960), Maximum likelihood estimation procedures and associated tests of significance, Jour. Roy. Stat. Soc. B 22, 154-171.
[2] Barnard, G. A. (1947), Significance tests for 2×2 tables, Biometrika 34, 123-138.
[3] Bartlett, M. S. (1935), Contingency table interactions, Jour. Roy. Stat. Soc. Supp. 2, 248-252.
[4] Bartlett, M. S. (1937), Properties of sufficiency and statistical tests, Proc. Roy. Soc. of London A 160, 268-282.
[5] Berkson, Joseph (1955), Maximum likelihood and minimum $\chi^2$ estimates of the logistic function, Jour. Amer. Stat. Asso. 50, 130-162.
[6] Bhapkar, V. P. (1959), Contributions to the statistical analysis of experiments with one or more responses (not necessarily normal), Institute of Statistics Mimeo Series, Univ. of North Carolina.
[7] Bhapkar, V. P. (1961), Some tests for categorical data, Ann. Math. Stat. 32, 72-83.
[8] Bhapkar, V. P. (1966), A note on the equivalence of two test criteria for hypotheses in categorical data, Jour. Amer. Stat. Asso. 61, 228-235.
[9] Bhapkar, V. P. and Gary G. Koch (1965a), On the hypothesis of "no interaction" in three-dimensional contingency tables, Institute of Statistics Mimeo Series, Univ. of North Carolina.
[10] Bhapkar, V. P. and Gary G. Koch (1965b), Hypotheses of no interaction in four-dimensional contingency tables, Institute of Statistics Mimeo Series, Univ. of North Carolina.
[11] Birch, M. W. (1964), A new proof of the Fisher-Pearson theorem, Ann. Math. Stat. 35, 817-824.
[12] Birnbaum, A. and A. E. Maxwell (1961), Classification procedures based on Bayes's formula, Applied Statistics, 152-169.
[13] Bowker, A. H. (1948), A test for symmetry in contingency tables, Jour. Amer. Stat. Asso. 43, 572-574.
[14] Cochran, W. G. (1950), The comparison of percentages in matched samples, Biometrika 37, 256-266.
[15] Cochran, W. G. (1952), The $\chi^2$ test of goodness of fit, Ann. Math. Stat. 23, 315-345.
[16] Cochran, W. G. (1954), Some methods of strengthening the common $\chi^2$ tests, Biometrics 10, 417-451.
[17] Cramér, H. (1946), Mathematical Methods of Statistics, Princeton Univ. Press, Princeton.
[18] Diamond, E. L. (1958), Asymptotic power and independence of certain classes of tests on categorical data, Institute of Statistics Mimeo Series, Univ. of North Carolina.
[19] Diamond, E. L. (1963), The limiting power of categorical data chi-square tests analogous to normal analysis of variance, Ann. Math. Stat. 34, 1432-1441.
[20] Diamond, E. L., S. K. Mitra and S. N. Roy (1960), Asymptotic power and asymptotic independence in the statistical analysis of categorical data, Bull. Inst. Inter. Stat. 37 III, 309-329.
[21] Finney, D. J. (1948), The Fisher-Yates test of significance in 2×2 contingency tables, Biometrika 35, 145-156.
[22] Fisher, R. A. (1928), On a property connecting the $\chi^2$ measure of discrepancy with the method of maximum likelihood, Atti, Congresso Internat. Mat., Bologna 6, 95-100; reprinted in Contributions to Mathematical Statistics by R. A. Fisher (1950), Wiley, New York.
[23] Fisher, R. A. and F. Yates (1957), Statistical Tables for Biological, Agricultural and Medical Research (5th Ed.), Oliver and Boyd, Edinburgh.
[24] Freeman, G. H. and J. H. Halton (1951), Note on an exact treatment of contingency, goodness of fit and other problems of significance, Biometrika 38, 141-149.
[25] Good, I. J. (1963), Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables, Ann. Math. Stat. 34, 911-934.
[26] Goodman, L. A. (1963a), On Plackett's test for contingency table interactions, Jour. Roy. Stat. Soc. B 25, 179-188.
[27] Goodman, L. A. (1963b), On methods for comparing contingency tables, Jour. Roy. Stat. Soc. A 126, 94-108.
[28] Goodman, L. A. (1964a), Interactions in multidimensional contingency tables, Ann. Math. Stat. 35, 632-646.
[29] Goodman, L. A. (1964b), Simple methods for analysing three-factor interaction in contingency tables, Jour. Amer. Stat. Asso. 59, 319-352.
[30] Goodman, L. A. (1964c), Simultaneous confidence intervals for contrasts among multinomial populations, Ann. Math. Stat. 35, 716-725.
[31] Goodman, L. A. and W. H. Kruskal (1954), Measures of association for cross classifications, Jour. Amer. Stat. Asso. 49, 732-764.
[32] Goodman, L. A. and W. H. Kruskal (1959), Measures of association for cross classifications. II: Further discussion and references, Jour. Amer. Stat. Asso. 54, 123-163.
[33] Goodman, L. A. and W. H. Kruskal (1963), Measures of association for cross classifications. III: Approximate sampling theory, Jour. Amer. Stat. Asso. 58, 310-364.
[34] Gumbel, E. J. (1943), On the reliability of the classical $\chi^2$ test, Ann. Math. Stat. 14, 253-263.
[35] Haldane, J. B. S. (1939), The mean and variance of $\chi^2$ when used as a test of homogeneity, when expectations are small, Biometrika 31, 346-355.
[36] Hoeffding, Wassily (1965), Asymptotically optimal tests for multinomial distributions, Ann. Math. Stat. 36, 369-401.
[37] Irwin, J. O. (1949), A note on the subdivision of $\chi^2$ in certain discrete distributions, Biometrika 36, 130-134.
[38] Kastenbaum, M. A. and D. E. Lamphiear (1959), Calculation of chi-square to test the no three-factor interaction hypothesis, Biometrics 15, 107-115.
[39] Kullback, S. (1959), Information Theory and Statistics, John Wiley and Sons, New York.
[40] Lancaster, H. O. (1949), The derivation and partition of $\chi^2$ in certain discrete distributions, Biometrika 36, 117-129.
[41] Lancaster, H. O. (1951), Complex contingency tables treated by the partition of $\chi^2$, Jour. Roy. Stat. Soc. B 13, 242-249.
[42] Lehmann, E. L. (1959), Testing Statistical Hypotheses, John Wiley and Sons, New York.
[43] Lewis, B. N. (1962), On the analysis of interaction in multidimensional contingency tables, Jour. Roy. Stat. Soc. A 125, 88-117.
[44] Mann, H. B. and A. Wald (1942), On the choice of the number of class intervals in the application of the chi-square test, Ann. Math. Stat. 13, 306-317.
[45] Maxwell, A. E. (1961), Analysing Qualitative Data, Methuen and Co., London.
[46] McNemar, Q. (1947), Note on the sampling error of the difference between correlated proportions or percentages, Psychometrika 12, 153-157.
[47] Mitra, S. K. (1955), Contributions to the statistical analysis of categorical data, Institute of Statistics Mimeo Series No. 142, Univ. of North Carolina.
[48] Mitra, S. K. (1958), On the limiting power function of the frequency chi-square test, Ann. Math. Stat. 29, 1221-1233.
[49] Neyman, J. (1929), Contribution to the theory of certain test criteria, Bull. Inst. Inter. Stat. 18, 1-48.
[50] Neyman, J. (1949), Contribution to the theory of the $\chi^2$ test, Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 239-273.
[51] Pearson, E. S. (1947), The choice of statistical tests illustrated on the interpretation of data classed in a 2×2 table, Biometrika 34, 139-167.
[52] Pearson, K. (1900), On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, Philosophical Magazine Series 5, 50, 157-175.
[53] Pitman, E. J. G. (1948), Nonparametric statistical inference, unpublished lecture notes given at Columbia University, New York.
[54] Plackett, R. L. (1962), A note on interactions in contingency tables, Jour. Roy. Stat. Soc. B 24, 162-166.
[55] Plackett, R. L. (1964), The continuity correction in 2×2 tables, Biometrika 51, 327-337.
[56] Rao, C. R. (1958), Maximum likelihood estimation for the multinomial distribution with an infinite number of cells, Sankhyā 20, 211-218.
[57] Rao, C. R. (1961), A study of large sample test criteria through properties of efficient estimates, Sankhyā 23, 25-40.
[58] Rao, C. R. (1965), Linear Statistical Inference and Its Applications, John Wiley and Sons, New York.
[59] Roy, S. N. and V. P. Bhapkar (1960), Some nonparametric analogs of normal ANOVA, MANOVA, and of studies in normal association, Contributions to Probability and Statistics, Stanford University Press, Stanford, 371-387.
[60] Roy, S. N. and M. A. Kastenbaum (1956), On the hypothesis of no interaction in a multi-way contingency table, Ann. Math. Stat. 27, 749-757.
[61] Roy, S. N. and S. K. Mitra (1956), An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis, Biometrika 43, 361-376.
[62] Sathe, Y. S. (1962), Studies of some problems in nonparametric inference, Institute of Statistics Mimeo Series No. 325, Univ. of North Carolina.
[63] Simpson, E. H. (1951), The interpretation of interaction in contingency tables, Jour. Roy. Stat. Soc. B 13, 238-241.
[64] Stuart, A. (1955), A test for homogeneity of the marginal distributions in a two-way classification, Biometrika 42, 412-416.
[65] Tocher, K. D. (1950), Extension of the Neyman-Pearson theory of tests to discontinuous variates, Biometrika 37, 130-148.
[66] Wald, A. (1943), Tests of statistical hypotheses concerning several parameters when the number of observations is large, Trans. Amer. Math. Soc. 54, 426-482.
[67] Wald, A. (1950), Statistical Decision Functions, John Wiley and Sons, New York.
[68] Welch, B. L. (1938), On tests for homogeneity, Biometrika 30, 149-158.
[69] Wise, M. E. (1963), Multinomial probabilities and the $\chi^2$ and $X^2$ distributions, Biometrika 50, 145-154.
[70] Wise, M. E. (1964), A complete multinomial distribution compared with the $\chi^2$ approximation and an improvement to it, Biometrika 51, 277-281.
[71] Williams, C. A. (1950), On the choice of the number and width of classes for the chi-square test of goodness of fit, Jour. Amer. Stat. Asso. 45, 77-86.
[72] Woolf, B. (1955), On estimating the relation between blood groups and disease, Ann. Human Genetics 19, 251-253.
[73] Yates, F. (1934), Contingency tables involving small numbers and the $\chi^2$ test, Jour. Roy. Stat. Soc. Supp. 1, 217-235.
[74] Yates, F. (1948), The analysis of contingency tables with groupings based on quantitative characters, Biometrika 35, 176-181.