Kim, Byung Soo (1984). "Studies of Multinomial Mixture Models."

STUDIES OF MULTINOMIAL
MIXTURE MODELS
by
Byung Soo Kim
A Dissertation presented to the faculty of The
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements of the
degree of Doctor of Philosophy in the Department
of Statistics
Chapel Hill
April 1984
Approved by:
Reader
BYUNG SOO KIM.
Studies of Multinomial Mixture Models
(Under the direction of Barry H. Margolin)
We investigate certain inferential aspects of mixtures of multinomial distributions, both in nonparametric and parametric contexts.
As a nonparametric mixture model we propose a k-population finite mixture
of binomial distributions, which can be applied to the analysis of non-iid data generated from a series of toxicological experiments. A necessary
and sufficient identifiability condition for the k-population finite mixture of binomials is obtained. The maximum likelihood estimates (MLE's)
of the k-population finite mixture of binomials are computed via the EM
algorithm (Dempster, Laird and Rubin, 1977), and the asymptotic properties
of the MLE's are discussed. The identifiability condition is equivalent
to the positive definiteness of the information matrix for the parameters.
The MLE's and their sampling distributions, together with the data
mentioned above, provide an empirical check of the statistical procedures
proposed by Margolin, Kaplan and Zeiger (1981).
The Dirichlet-multinomial distribution, a parametric mixture of
multinomials, is discussed as a random group effects model for a one-way
layout contingency table. Interest focuses on testing the hypothesis of
no random effects.  For this testing problem, Neyman's C(\alpha) procedure
yields a new test statistic, which is asymptotically superior to Pearson's
chi-square test. This superiority is further evidenced by a Monte Carlo
simulation study. A duality between the C(a) statistic and the Catanova
statistic proposed by Light and Margolin (1971) is demonstrated.
The random effects model for the one-way layout contingency table is
extended within the framework of the Dirichlet-multinomial distribution
to a balanced nested mixed effects model, and two hypothesis testing
problems are investigated.
ACKNOWLEDGEMENTS
I wish to express my deepest gratitude to my research advisor,
Dr. Barry H. Margolin, for his suggestion of this topic and for his
guidance and encouragement throughout the duration of this research.
I also would like to thank my committee members, Dr. Norman
Johnson, Dr. Gordon Simons, Dr. Doug Kelly, and Dr. David Ruppert,
for their careful reading of the manuscript and many valuable comments.
The financial support from the Statistics Department has been
indispensable; without it my stay in Chapel Hill would not have been
possible. Thanks are also extended to Dr. David Hoel of the National
Institute of Environmental Health Sciences for providing computer
facilities. Credit is also due to Mrs. Judy Harrelson and Mr. K. Doug
Vass for their excellent typing job.
I am especially indebted to my parents at home in Korea, who have
been praying for the successful completion of my study in Chapel Hill.
Finally, I would like to thank my wife Myung Sook, son Stephen, and
mother-in-law for their understanding and support.
TABLE OF CONTENTS

CHAPTER I     INTRODUCTION AND SUMMARY
     1.1   The Binomial Distribution
     1.2   Mixture Models of Count Data
     1.3   Scope of the Thesis
           1.3.1   The Finite Mixture of Binomial Distributions
           1.3.2   The Dirichlet-Multinomial Distribution
     1.4   Further Research

CHAPTER II    FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
     2.1   Identifiability Problem
           2.1.1   Preliminaries
           2.1.2   1-Population Finite Mixture of Multinomials
           2.1.3   k-Population Finite Mixture of Multinomials
     2.2   Estimation of the Mixing Distribution
           2.2.1   Preliminaries
           2.2.2   Maximum Likelihood Equations
           2.2.3   Asymptotic Distribution of the ML Estimator
     2.3   k-Population Finite Mixture of Binomials - Application
           2.3.1   Description of the Ames Test
           2.3.2   Statistical Analysis of the Experimental Data
           2.3.3   Further Analysis of the Derived Data: Mixture Model
                   2.3.3.1   k-Population Mixture of Two Binomials
                   2.3.3.2   Results of the Analysis
                   2.3.3.3   Discussion of the Results

CHAPTER III   MIXTURE OF MULTINOMIAL DISTRIBUTIONS
     3.1   Binomial Case
     3.2   Random Effects Model of One-Way Layout
           3.2.1   Dirichlet-Multinomial Model
           3.2.2   Test of the Random Effects
                   3.2.2.1   Case of P Known
                   3.2.2.2   Case of P Unknown
           3.2.3   Approximate Null and Alternative Distributions
                   3.2.3.1   Approximate Null Distributions
                   3.2.3.2   Approximate Alternative Distributions
           3.2.4   ARE of X_P^2 Relative to T_k
           3.2.5   Monte Carlo Simulation: Power Comparison
           3.2.6   Duality between C and T_k
     Appendix I:   Wisniewski-type Alternatives
     Appendix II:  The Dirichlet-Multinomial Alternatives

CHAPTER IV    BALANCED NESTED MIXED EFFECTS MODEL
     4.1   Introduction
     4.2   Test of the Nested Random Effects
           4.2.1   C(\alpha) Test
           4.2.2   ARE of X(3) Relative to T(3)
     4.3   Test of Equality of the Fixed Row Effects
           4.3.1   Wald Statistic and Chi-Square Statistic
           4.3.2   ARE of a Test F Relative to a Test C
           4.3.3   Wald Statistic for Testing the Equality of Fixed Effects

CHAPTER V     FURTHER RESEARCH
     5.1   The Finite Mixture of Binomial Distributions
     5.2   T_k Statistic as a Measure of Association
     5.3   The Nested Random Group Effects Model of Count Data

BIBLIOGRAPHY
CHAPTER I
INTRODUCTION AND SUMMARY
1.1
The Binomial Distribution
Among the class of discrete distributions the binomial distribution
is by far the most widely used for bounded count data, while the Poisson
distribution enjoys the same role for unbounded count data.
Since the
binomial and Poisson distributions share certain important properties
and there exist several interesting relations between these two distributions, we include the Poisson distribution in the context of our
discussion of the binomial distribution.
To introduce notation, these
two distributions are formally defined as follows:
Definition 1.1:
A random variable X has a binomial distribution with
parameters n and p, denoted by X ~ B(n,p), if

    Pr(X = x) = \binom{n}{x} p^x (1-p)^{n-x}                           (1.1)

for x = 0,1,...,n, 0 < p < 1, and n \in I^+,
where I^+ is the set of positive integers.
For notational convenience
we write b(x;n,p) for the binomial mass function (1.1) and B(x;n,p) for
the corresponding distribution function.
Definition 1.2:
A random variable X has a Poisson distribution with
parameter \lambda, denoted by X ~ P(\lambda), if

    Pr(X = x) = e^{-\lambda} \lambda^x / x!                            (1.2)

for x = 0,1,2,..., and \lambda > 0.
     These two distributions possess a variety of desirable properties,
which, in part, account for their popularity.  Both the Poisson family
{P(\lambda); \lambda > 0} and the binomial family {B(n,p); n \in I^+ and known,
0 < p < 1} are one-parameter exponential families.  Hence the ML
estimator for the single parameter is a sufficient statistic and achieves
the Cramer-Rao lower bound.  If X_1, X_2,...,X_k are independent Poisson
random variables such that X_i ~ P(\lambda_i) for i=1,...,k, then the
conditional distribution of X_i given \sum_{j=1}^{k} X_j = n is
B(n, \lambda_i / \sum_{j=1}^{k} \lambda_j) for i=1,2,...,k.  For large n,
the DeMoivre-Laplace limit theorem admits a normal approximation to the
binomial distribution.
There also exist several interesting relations between Poisson and
binomial distributions.
Among them we may note a result due to Chatterji
(1963): if X and Y are independent nonnegative integer-valued random
variables such that Pr(X=x) > 0 and Pr(Y=x) > 0 for x = 0,1,2,..., and
the conditional distribution of X given X + Y is binomial, then both X
and Y are Poisson random variables.  Another relation, usually referred
to as the 'Poisson approximation of a binomial', is cited as a lemma
from Feller (1968, p. 153).
Lemma 1.1:
Suppose that {X_n} is a sequence of random variables such that
X_n ~ B(n, p_n) and n p_n \to \lambda as n \to \infty, where 0 < \lambda < \infty; then

    Pr(X_n = x) \to e^{-\lambda} \lambda^x / x!                        (1.3)

for x = 0,1,2,... as n \to \infty.
From the standpoint of this thesis, the most important property
shared by the binomial and Poisson distributions is that these two
distributions belong to a class of discrete probability distributions
that admit mathematically tractable mixture generalizations.
1.2 Mixture Models for Count Data
Definition 1.3:
Let F(X;8) be a d-dimensional distribution function
'" '"
indexed by an m-dimensional parameter vector
and let
G(~)
~
in a parameter space 8
be an m-dimensional distribution function.
Then
is called a G mixture of F or simply a mixture, and F and G are referred
to as the kernel and the mixing distribution, respectively.
kernel in (1.4) is expressed in terms of a density function
If the
f(~;~),
then
(1 .5)
is called a mixture density when (1.5) exists.
If the mixing distribution G is discrete with finite support, then
we call the resulting mixture a finite mixture.
-e
Johnson and Kotz (1969) we may write
h(~)
H(~)
Using the notation in
= F(~;~)
i
G(~)
for (1.4) and
= f(~;~) @G(~) for (1.5). The following definitions are included
for notational purposes.
Definition 1.4:  A random variable X has a gamma distribution with
parameters \lambda > 0 and r > 0, denoted by X ~ G(\lambda,r), if it has a
probability density function (p.d.f.) f_G(x) given by

    f_G(x) = \frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x}         (1.6)

for x > 0, where

    \Gamma(t) = \int_0^\infty x^{t-1} e^{-x} dx   for t > 0.            (1.7)

Definition 1.5:
A random variable X has a negative binomial distribution
with parameters m and c, denoted by X ~ NB(m,c), if

    Pr(X = x) = \frac{\Gamma(x + c^{-1})}{x! \, \Gamma(c^{-1})}
                \left(\frac{cm}{1+cm}\right)^x
                \left(\frac{1}{1+cm}\right)^{c^{-1}}                    (1.8)

for x = 0,1,2,..., 0 < m < \infty, and 0 \le c < \infty; values for c = 0
are understood to be evaluated in the limit as c \to 0.
It is an easy exercise to show that a negative binomial distribution
can be obtained as a gamma mixture of Poissons, that is,
    NB(m,c) = P(\theta) \wedge_\theta G((mc)^{-1}, c^{-1}).              (1.9)
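The relation (1.9) is easy to check by simulation.  The following sketch is
an illustration added here, not part of the dissertation: it assumes NumPy
and SciPy, and the parameter values m = 4 and c = 0.5 are arbitrary.  It
draws Poisson counts whose rates are gamma distributed as in (1.9) and
compares the empirical frequencies with the negative binomial pmf (1.8).

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(0)

m, c = 4.0, 0.5          # NB mean m and dispersion c, as in (1.8); illustrative values
n_draws = 200_000

# Gamma mixture of Poissons, matching (1.9):
#   theta ~ G(lambda = (mc)^{-1}, r = c^{-1}) in the rate/shape form of (1.6),
#   i.e. shape = 1/c and scale = 1/lambda = mc; then X | theta ~ P(theta).
theta = rng.gamma(shape=1.0 / c, scale=m * c, size=n_draws)
x = rng.poisson(theta)

# Negative binomial pmf (1.8) via scipy's (n, p) parametrization:
#   n = 1/c, p = 1/(1 + cm).
support = np.arange(0, 15)
nb_pmf = nbinom.pmf(support, 1.0 / c, 1.0 / (1.0 + c * m))
emp_pmf = np.bincount(x, minlength=support.size)[:support.size] / n_draws

for k in support:
    print(f"x={k:2d}  empirical={emp_pmf[k]:.4f}  NB(m,c) pmf={nb_pmf[k]:.4f}")
```

The two columns should agree to within Monte Carlo error, which is the sense
in which (1.9) identifies the gamma mixture of Poissons with NB(m,c).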
Mixture models for count data have been formulated ever since the
limitations of the binomial and Poisson distributions to fit data were
first noted.
In the binomial distribution the independence of n
Bernoulli trials and the constancy of the success probability p throughout the n trials may be suspect in specific applications.
For instance,
as is observed in Haseman and Kupper (1978), in certain animal studies
to investigate the toxicological effect of a compound there is a
tendency for implants from the same litter to respond more alike than
implants from different litters.
That is, the litter-specific success
probability varies from litter to litter.
Therefore, the independence
among implants in the same litter may not be maintained.
For the Poisson distribution the equality of mean and variance
places an important restriction on the applicability of the model in
practice.
Thus, for example, Paul and Plackett (1978), and Margolin,
Kaplan and Zeiger (1981) study the effects of non-Poisson distributed
random variables, specifically negative binomial random variables, on
certain aspects of inference for the Poisson based model.
These limitations of the binomial or Poisson distributions are most
evident when there is clear 'heterogeneity' of count data; this in turn
has led to various considerations of alternatives or generalizations of
those two important count data distributions, foremost among which have
been mixture formulations.
As early as 1915, K. Pearson (1915), having noted the 'heterogeneity' of the data, considered a mixture of two binomial distributions
as a model for the counts on yeast cells analyzed by Student (1907).
Greenwood and Yule (1920) considered accident data and found that a
gamma mixture of Poissons, which as noted is a negative binomial
distribution, gave a closer fit to the data than a single Poisson
distribution.
In the same vein Skellam (1948) introduced a beta-bino-
mial distribution in the form of a beta mixture of binomial distributions, after observing that the association probabilities varied from
nucleus to nucleus in the analysis of the secondary association of
chromosomes in Brassica.
As the above historical examples of mixture distributions indicate,
there have been two distinct subclasses of mixture models.
A parametric
mixture is defined to be a mixture in which the mixing distribution has
a specific functional form, whereas in a nonparametric mixture the
mixing distribution does not have a specific functional form.
The
mixture of two binomials is an example of a nonparametric mixture and
the negative binomial and the beta-binomial are examples of a parametric mixture (Johnson and Kotz, 1969).
The flexibility and generality of a mixture model for count data
are gained at the expense of simplicity and the attractive properties
of the binomial or the Poisson model.
This is well illustrated in the
search for maximum likelihood estimates (MLE) of parameters from a mixture model of count data; in most cases this involves iterative solution of a set of equations rather than a closed form solution.
1.3 SCOPE OF THE THESIS
This thesis investigates certain inferential aspects of mixtures of
multinomial distributions, both in nonparametric and parametric mixture
models.
In Chapter 2 a class of finite mixtures of binomial distribu-
tions is proposed to model non-iid data generated from certain important
toxicological experiments, and the resultant implications are investigated.
Chapter 3 combines studies of the goodness of fit test of the
binomial distribution against parametric mixture alternatives and the
development of a random effects model for count data in a one-way layout
contingency table by employing a Dirichlet mixture of multinomial
distributions.
In Chapter 4 we discuss a balanced nested mixed effects
model based on a Dirichlet mixture of multinomial distributions.
Finally in Chapter 5 we suggest several problems for further research.
1.3.1
The Finite Mixture of Binomial Distributions
Besides K. Pearson's early attempt to fit Student's counts with a
mixture of two binomials (Pearson, 1915), one other application of this
approach is noteworthy.
Neyman (1947) developed a finite mixture of
binomial distributions for the analysis of roentgenographic reading
results of tuberculosis tests.
A set of four chest films of different
sizes was taken for each subject and each film was interpreted
independently by five expert radiologists.
Neyman decomposed the patient population into three categories:
(i) entirely free from the disease, (ii) moderately affected, and
(iii) heavily affected; and associated 1-\tau, p, and 1 as the
probabilities of correct diagnosis for the components (i), (ii), and
(iii), respectively.
Hence the number of positive results among five independent reader
diagnoses for a particular film follows a mixture of three binomial
distributions, or a mixture of two binomials if the component (iii) is
entirely dropped from the model.
     However, as Neyman (1947) indicated, "in reality we may expect that
the subdivision of human population is much finer than is postulated
here and that the category of 'moderately affected' splits into a
continuous graduation of the intensity of the illness, from very slight
to very heavy ..."  Thus a 'finite' mixture may be regarded as a
'simplification', which appears to have been the primary motivation for
introducing the finite mixture of binomial distributions.
     There is, however, truly a need for a finite mixture of two
binomials in the context of imperfect testing of various materials for
positive or negative evidence of a specified characteristic that is
either present or absent.  Let \Phi be a set of r materials that have
been tested, and let the i-th subset of \Phi consist of r_i materials,
indexed by (i,j), j=1,...,r_i, each of which has been tested n_i times,
for i = 1,2,...,k.  Clearly \sum_{i=1}^{k} r_i = r.  Let \tau and (1-p)
denote the probabilities of a false positive and a false negative,
respectively, and let \pi be the prevalence rate of the characteristic
in question.  We assume independence among all \sum_{i=1}^{k} n_i r_i
tests.  For the i-th subset of r_i materials, each of which has been
tested n_i times, we can summarize the observed data in a random vector
X_i = (X_{i1}, X_{i2},...,X_{ir_i}), where X_{ij} is the number of
positive findings in the n_i tests with material (i,j).  Under our
assumptions, the i-th observation vector X_i is now a random sample from
a mixture of two binomial distributions, that is,

    X_{i1}, X_{i2},...,X_{ir_i}  iid~  \pi B(n_i, p) + (1-\pi) B(n_i, \tau)     (1.10)

for i=1,...,k.  We note that X_1, X_2,...,X_k, the totality of
observations, do not constitute an independent and identically
distributed (iid) data set unless the n_i's are all equal.
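A minimal simulation sketch of model (1.10) may help fix ideas.  It is not
from the dissertation: NumPy is assumed, and the parameter values, group
sizes, and function name are illustrative only.  Each material in group i
carries the characteristic with probability \pi, and its n_i test results are
then binomial with success probability p or \tau accordingly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (hypothetical) parameter values:
pi_, p, tau = 0.3, 0.8, 0.05   # prevalence, 1 - false-negative rate, false-positive rate
n = [1, 2, 3, 4]               # n_i: number of tests per material in group i
r = [50, 40, 30, 20]           # r_i: number of materials in group i

def simulate_group(n_i, r_i):
    """Draw X_{i1},...,X_{ir_i} iid from pi*B(n_i,p) + (1-pi)*B(n_i,tau), as in (1.10)."""
    positive = rng.random(r_i) < pi_          # latent state of each material
    probs = np.where(positive, p, tau)        # per-material success probability
    return rng.binomial(n_i, probs)

data = [simulate_group(n_i, r_i) for n_i, r_i in zip(n, r)]
for i, x in enumerate(data, start=1):
    print(f"group {i}: n_i = {n[i-1]}, first counts = {x[:10]}")
```

Because the n_i differ across groups, the pooled observations are independent
but not identically distributed, which is exactly the non-iid feature noted
above.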
     Even though the mixture model (1.10) is the primary motivation for
the research to be discussed, we note the generalization of (1.10) to
mixtures of c binomial distributions, where c \ge 2 is known a priori.
Define G_c to be a class of all discrete distribution functions with c
atoms.  Then we have

    X_{11},...,X_{1r_1}  iid~  B(n_1, p) \wedge_p G(p)
    X_{21},...,X_{2r_2}  iid~  B(n_2, p) \wedge_p G(p)
         .                                                              (1.11)
         .
    X_{k1},...,X_{kr_k}  iid~  B(n_k, p) \wedge_p G(p),

where G \in G_c.  Equation (1.10) is then the special case of (1.11)
where c = 2.  We call the mixture model (1.11) a k-population finite
mixture of binomial distributions and define the parameter space of
interest to be

    \Omega = \{(\pi_1,...,\pi_{c-1}, p_1,...,p_c):\ 0 < \pi_i < 1,\ i=1,...,c-1,
              \ \sum_{i=1}^{c-1} \pi_i < 1,\ 0 < p_1 < p_2 < ... < p_c < 1\}.

When k=1, the formulation (1.11) reduces to iid data from a finite
mixture of binomial distributions, which is referred to as a
1-population finite mixture of binomials.  Thus the i-th finite mixture
of binomials in (1.11) has a mixture density

    h_i(x;\theta) = \sum_{j=1}^{c} \pi_j \binom{n_i}{x} p_j^x (1-p_j)^{n_i - x}   (1.12)

for x = 0,1,...,n_i, where i=1,...,k,
\theta = (\pi_1,...,\pi_{c-1}, p_1,...,p_c), and \pi_c = 1 - \sum_{j=1}^{c-1} \pi_j.
     Our primary interest lies in the MLE of the mixing distribution G
in (1.11), which is equivalent to finding an MLE \hat\theta of \theta in
(1.12) because of the nonparametric nature of the mixing distribution G.
Before estimation is attempted, however, we need to verify that the
k-population finite mixture of binomials model in (1.11) is
'identifiable'.  Teicher (1963) showed that the 1-population finite
mixture of binomial distributions in (1.12), i.e., k=1, is identifiable
if and only if n_1 \ge 2c-1.  In Chapter 2 we extend Teicher's result to
the multinomial case and find the necessary and sufficient condition for
identifiability of a k-population finite mixture of multinomial
distributions.
     The maximum likelihood approach to estimation of the mixing
distribution in the finite mixture problem has been discussed only since
the late 1960's, when access to fast electronic digital computers made
it feasible.  Hasselblad (1969) developed a set of iterative equations
for obtaining MLE's of parameters in the 1-population mixture of members
of the exponential family, which was later recognized as a special case
of the EM algorithm (Dempster, Laird and Rubin, 1977).  By extending
Hasselblad's iterative equations to the case of the k-population
mixture of certain members of the exponential family, we obtain as a
special case an algorithm for the MLE \hat\theta of \theta for the
k-population finite mixture of binomials model (1.11).
     This theoretical framework of the k-population mixture of
binomials model is applied in Chapter 2 to the analysis of a sizable
database of Ames test data, gathered at considerable cost to the
federal government.  The Ames test is a bacterial test that detects
evidence of mutagenicity for chemical compounds.  The observed Ames test
data in a given experiment, which are counts, can be transformed into
a 0 or 1 indicating a non-mutagen or mutagen, respectively, through
analysis via a family of mutation models described in Margolin, Kaplan
and Zeiger (1981).  Analysis of the derived 0-1 experimental results
based on the k-population mixture of two binomials provides estimates
of (i) the prevalence rate of mutagens among the test compounds, (ii)
the false positive rate and (iii) the false negative rate, as well as
the standard errors of these three estimates.  By providing these
estimates of various parameters of interest, the k-population finite
mixture model provides an empirical check of the operational properties
of the mutation models and statistical procedures proposed by Margolin,
Kaplan and Zeiger (1981).
1.3.2 The Dirichlet-Multinomial Distribution
Using an analogy to fixed effects and random effects in linear
models, it appears that almost all the methods for the analysis of
multi-dimensional contingency tables have focused on fixed-effects
models.
Fienberg (1975) points this out and lists the development of a
discrete analog to the nested and random effects (Model II) ANOVA models
among the unsolved problems in the analysis of multi-dimensional contingency tables.
In section 3.2 we develop a random effects model for the one-way
layout contingency table.
Define

    S_p^o = \{(p_1,...,p_{I-1});\ 0 < p_i < 1,\ \sum_{i=1}^{I-1} p_i < 1\}             (1.13)

    S_x  = \{(x_1,...,x_{I-1});\ x_i \in I^+ \cup \{0\} for all i,\ \sum_{i=1}^{I-1} x_i \le n\}.   (1.14)
For our development, we need the following definitions.
Definition 1.6:  A random vector X = (X_1,...,X_{I-1}) has a multinomial
distribution with n and p = (p_1,...,p_{I-1}), denoted by X ~ M(n,p), if

    Pr(X = x) = \binom{n}{x_1,...,x_I} \prod_{i=1}^{I} p_i^{x_i}          (1.15)

for x \in S_x and p \in S_p^o, where x_I = n - \sum_{i=1}^{I-1} x_i and
p_I = 1 - \sum_{i=1}^{I-1} p_i.  Notationally, m(x;n,p) and M(x;n,p)
denote the multinomial mass function (1.15) and the corresponding
distribution function, respectively.
Definition 1.7:  A random vector U = (U_1,...,U_{I-1}) has a Dirichlet
distribution with \theta = (\theta_1,...,\theta_I), denoted by U ~ D(\theta),
if it has a p.d.f. given by

    f_D(u) = \frac{\Gamma(B)}{\prod_{i=1}^{I} \Gamma(\theta_i)}
             \left( \prod_{i=1}^{I-1} u_i^{\theta_i - 1} \right)
             \left( 1 - \sum_{i=1}^{I-1} u_i \right)^{\theta_I - 1}      (1.16)

for u \in S_u^o, where \theta_i > 0 for i=1,2,...,I, and B = \sum_{i=1}^{I} \theta_i.
The Dirichlet distribution D(\theta) can be reparametrized so that it can be
denoted by D(\pi,\theta), where \pi_i = \theta_i/B and \theta = 1/B for
\pi = (\pi_1,...,\pi_{I-1}) \in S_\pi^o.  A Dirichlet mixture of
multinomial distributions is called the Dirichlet-multinomial
distribution, and denoted by DM(n,\pi,\theta), that is,

    DM(n,\pi,\theta) = M(n,p) \wedge_p D(\pi,\theta).                    (1.17)
Mosimann (1962) provides an extensive study of the Dirichlet-multinomial
distribution, thereby extending Skellam's work on the beta-binomial
distribution.
Brier (1980) investigates the effect of the Dirichlet-
multinomial distribution on the chi-square test of a general hypothesis
in the one-way layout contingency table and shows that Pearson's
chi-square statistic is in fact asymptotically a constant multiple of a
chi-square random variable when the hypothesis is true.
     Thus it follows that in a contingency table with I response
categories and G groups the Dirichlet-multinomial distribution
DM(n_{+j},\pi,\theta) can introduce random group effects, since the j-th
group probability vector, say p_j, is now randomly generated from
D(\pi,\theta) and, conditional on the observed p_j, the j-th group
response vector, say n_j, has a multinomial distribution M(n_{+j}, p_j),
where n_{+j} is the j-th group size.  In a handy notation this is
described as

    p_j  iid~  D(\pi,\theta)
                                                                        (1.18)
    (n_j | n_{+j}, p_j)  ~  M(n_{+j}, p_j)

for j=1,2,...,G.
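To make the hierarchy (1.18) concrete, the following sketch (my own
illustration, not part of the dissertation; NumPy assumed, parameter values
hypothetical) simulates G group count vectors from the Dirichlet-multinomial
model with I = 3 response categories.  The Dirichlet parameters are recovered
from (\pi, \theta) by \theta_i = \pi_i/\theta, matching the reparametrization
below Definition 1.7.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative values: I = 3 categories, G = 5 groups.
pi_ = np.array([0.5, 0.3, 0.2])   # pi_1,...,pi_I (category probabilities)
theta = 0.1                       # dispersion; small theta means near-multinomial groups
n_plus = [30, 25, 40, 35, 20]     # group sizes n_{+j}

# Dirichlet parameters theta_i = pi_i / theta, since pi_i = theta_i/B and theta = 1/B.
alpha = pi_ / theta

counts = []
for n_j in n_plus:
    p_j = rng.dirichlet(alpha)                  # p_j ~ D(pi, theta), as in (1.18)
    counts.append(rng.multinomial(n_j, p_j))    # (n_j | n_{+j}, p_j) ~ M(n_{+j}, p_j)

print(np.array(counts))
```

Setting theta closer to 0 makes the group probability vectors concentrate at
\pi, which is the null hypothesis of no random group effects tested next.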
     The primary concern of section 3.2 is hypothesis testing of the
presence of random group effects, which can be formulated as

    H_0: \theta = 0   vs.   H_a: \theta > 0.                            (1.19)

For testing (1.19) we find that Neyman's C(\alpha) procedure yields a new
test statistic, denoted by T_k.  The asymptotic relative efficiency of
the classical chi-square statistic X_P^2 satisfies

    e(X_P^2 | T_k) \le 1,                                               (1.20)

where the equality holds iff the group sizes {n_{+j}}_{j=1}^{G} are
asymptotically balanced or G = 2.  The superiority of T_k to X_P^2 based
on (1.20) is further evidenced by a Monte Carlo simulation that compares
the actual performances of those two statistics in terms of their sizes
and powers.
     The formulation of the random effects model in the IxG contingency
table (1.18) is extended to a balanced nested mixed effects model in
Chapter 4.  Using the conditioning arguments employed in (1.18), nested
mixed effects can be represented as

    (p_{jk} | \pi_j)  ind~  D(\pi_j, \theta)
                                                                        (1.21)
    (n_{jk} | n_{+jk}, p_{jk})  ind~  M(n_{+jk}, p_{jk})

for j=1,2,...,R and k=1,2,...,C, where \pi_1, \pi_2,...,\pi_R are fixed
and correspond to the R levels of the row variable, and p_{jk}
corresponds to the k-th replication within the j-th level of the row
variable.
     In the model (1.21) interest centers on the hypotheses of no nested
random effects and the equality of the fixed row effects, which are
respectively formulated as

    H_0: \theta = 0   vs.   H_a: \theta > 0                             (1.22)

and

    H_0: \pi_1 = \pi_2 = ... = \pi_R   vs.   H_a: not all \pi_j equal.  (1.23)

The C(\alpha) procedure can be readily extended for problem (1.22);
however, for testing (1.23) two side questions can be raised.
     (i)  Are the Wald statistic and the Pearson's chi-square statistic
          asymptotically equivalent in the presence of nested random
          effects?
     (ii) What is the cost of analyzing the balanced nested mixed
          effects model as if it were a crossed mixed effects model?
Complete answers to those two questions are not available for general I,
R, and C; however, based on the results for I = R = 2 and general C in
section 4.3, the answer to (i) appears to be yes.  The answer to (ii) is
that the cost appears to be sizable when the group sizes {n_{+jk}}
exhibit 'reasonable' departures from balance.  Finally, in section 4.3 a
Wald test is constructed for testing (1.23).
1.4  FURTHER RESEARCH
In Chapter 5 four problems for further research are discussed.
     (i)   Study of the uniqueness of the MLE \hat\theta of a finite
           mixture of binomial distributions.
     (ii)  Development of the likelihood ratio test in the finite
           mixture problem for testing H_0: c=1 versus H_a: c=2, where c
           refers to the number of components of a population.
     (iii) Use of the T_k statistic as a measure of association.
     (iv)  Development of a random effects model in a two-way layout
           contingency table.
CHAPTER II
FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
This chapter focuses on problems relating to the k-population
finite mixture of binomial distributions, such as identifiability,
estimation of the mixing distribution and the asymptotic covariance
matrix, and the asymptotic distribution of the ML estimator.
Finally
an example will be presented together with extensive numerical
analyses.
2.1  IDENTIFIABILITY PROBLEM

2.1.1  Preliminaries
Estimation of the mixing distribution in any mixture problem is
meaningful only if the mixture distribution is 'identifiable'.
Early on
K. Pearson (1894) treated this problem for the case of a mixture of two
normal distributions; later Feller (1943) observed that any mixture of
Poisson distributions was always identifiable due to the uniqueness
property of the Laplace transform.
Teicher (1963) pursued the study of identifiability in the case of
finite mixtures, including the finite mixture of binomials.
A portion
of his development of this topic is summarized below.
Let G*c be the class of all discrete distribution functions with at
most c atoms, and let HF be the class of all finite mixtures of F given
by
    H_F = \{H(x);\ H(x) = \int_\Theta F(x;\theta) dG(\theta),\ G \in G_c^*\}
        = \{H(x);\ H(x) = F(x;\theta) \wedge_\theta G(\theta)\}.

Definition 2.1:  If H is considered as the image of the map of G,
then H_F is said to be identifiable if and only if this map defines a
one-to-one map of G_c^* onto H_F.
Teicher (1963) found a necessary and sufficient identifiability
condition for the class of all finite mixtures of binomial distributions
with n fixed, which is stated as a lemma.
Lemma 2.1 (Teicher, 1963):  Let B = {B(x;n,p); 0 < p < 1} be a
one-parameter family of binomial distribution functions, n being fixed.
A necessary and sufficient condition that the class

    H_B = \{H(x);\ H(x) = B(x;n,p) \wedge_p G(p),\ G \in G_c^*\}

is identifiable is that n \ge 2c - 1.

2.1.2  1-Population Finite Mixture of Multinomials
In exploring another dimension of the identifiability problem,
Chandra (1977) related the identifiability of the class of mixtures of
multivariate distributions to the identifiability of the class of
mixtures of the corresponding marginals.
In what follows, G is defined
to be a class of arbitrary distribution functions.
Let X_i ~ F_i(\cdot;\theta_i)
for i=1,...,k and let X = (X_1,...,X_k) ~ F(\cdot;\theta).  Then Chandra
(1977) in his theorem 2.1 showed that the identifiability of the class

    H_{F_i} = \{H_i(x);\ H_i(x) = F_i(x;\theta_i) \wedge_{\theta_i} G_i(\theta_i),\ G_i \in G\}

for all i=1,2,...,k implied the identifiability of the class

    H_F = \{H(x);\ H(x) = F(x;\theta) \wedge_\theta G(\theta),\ G \in G\}.
Chandra's theorem permits an immediate extension of Teicher's
results to yield a new identifiability condition for the class of finite
mixtures of multinomial distributions.
Lemma 2.2:  Let M(x;n,p) be the distribution function of a multinomial
distribution with parameters (n,p), where p = (p_1,...,p_r), p_i > 0, and
\sum_{i=1}^{r} p_i = 1.  Then the class

    H_M = \{H(x);\ H(x) = M(x;n,p) \wedge_p G(p),\ G \in G_c^*\}         (2.1)

is identifiable if and only if n \ge 2c - 1.

Proof.
Let G_i(p_i) be the marginal distribution function (d.f.) of
G(p_1,...,p_r) with respect to p_i.  Then the marginal d.f. H_1 of X_1
can be obtained as

    H_1(x) = \int_{(-\infty,x] \times \mathcal{X}_2 \times ... \times \mathcal{X}_r} dH(x_1,...,x_r)
           = \int_0^1 ... \int_0^1 \left[ \int_{(-\infty,x] \times \mathcal{X}_2 \times ... \times \mathcal{X}_r} dM(x;n,p) \right] dG(p_1,...,p_r)
           = \int_0^1 ... \int_0^1 B(x;n,p_1) dG(p_1,...,p_r)
           = B(x;n,p_1) \wedge_{p_1} G_1(p_1),

where the interchange of integrations in the second step can be
justified using the result in Neveu (1965, p. 77).  Similarly
X_i ~ B(x;n,p_i) \wedge_{p_i} G_i(p_i) for i=1,...,r.  Since G_i is a
marginal distribution of p_i, the number of atoms, say c_i, is less than
or equal to c.  Thus n \ge 2c - 1 implies n \ge 2c_i - 1 for i=1,...,r.
Hence by lemma 2.1 each class of mixtures of binomial distributions is
identifiable.  Thus by theorem 2.1 of Chandra (1977), H_M is
identifiable if n \ge 2c - 1.
     For the necessary condition we prove the contrapositive.  Suppose
n < 2c - 1.  Thus it suffices to show that there exist two different
mixing distributions, say G_1 and G_2 in G_c^*, giving rise to a common
mixture distribution.  Consider G_1 and G_2 whose c atoms are
p_i = (p,...,p, q_i, 1-(r-2)p-q_i) for i=1,...,c with corresponding
probabilities \eta = (\eta_1,...,\eta_c), and
p_{c+i} = (p,...,p, q_{c+i}, 1-(r-2)p-q_{c+i}) for i=1,...,c with
corresponding probabilities \eta^* = (\eta_{c+1},...,\eta_{2c}),
respectively, where q_1,...,q_c, q_{c+1},...,q_{2c} are all distinct.
To prove the result we need to demonstrate the existence of \eta and
\eta^* such that

    M(x;n,p) \wedge_p G_1(p) = M(x;n,p) \wedge_p G_2(p)   for all x.     (2.2)

But (2.2) is equivalent to

    \sum_{i=1}^{2c} \delta_i M(x;n,p_i) = 0   for all x,                 (2.3)

for suitable choices of the \delta_i's.  Since M(n,p) has the probability
generating function \{t_1 p_1 + ... + t_{r-1} p_{r-1} + (1 - \sum_{i=1}^{r-1} p_i)\}^n
when p = (p_1,...,p_r), (2.3) is equivalent to

    \sum_{i=1}^{2c} \delta_i \{t_1 p + ... + t_{r-2} p + t_{r-1} q_i + [1-(r-2)p-q_i]\}^n = 0   (2.4)

for all t = (t_1,...,t_{r-1}).  Since (2.4) holds identically in t,
(2.4) is equivalent to

    \sum_{i=1}^{2c} \delta_i (1 + up + wq_i)^n = 0                       (2.5)

for all (u,w), where u = \sum_{i=1}^{r-2} t_i - (r-2) and w = t_{r-1} - 1.
Now, (2.5) holds if the following homogeneous linear equations have a
nontrivial solution:
Equating to zero the coefficient of each monomial u^a w^b, a+b \le n, in
the expansion of (2.5) gives one homogeneous linear equation in
\delta_1,...,\delta_{2c} whose coefficients are p^a q_i^b, i=1,...,2c;
in matrix form,

    Q\delta = 0,                                                         (2.6)

where Q is an \binom{n+2}{2} x 2c matrix and \delta is a 2c x 1 vector.
After deleting the linearly dependent rows in Q, (2.6) can be reduced to

    \begin{pmatrix} 1      & 1      & \cdots & 1       \\
                    q_1    & q_2    & \cdots & q_{2c}  \\
                    q_1^2  & q_2^2  & \cdots & q_{2c}^2 \\
                    \vdots &        &        & \vdots  \\
                    q_1^n  & q_2^n  & \cdots & q_{2c}^n \end{pmatrix}
    \begin{pmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_{2c} \end{pmatrix} = 0,   (2.7)

which we denote by \tilde{Q}\delta = 0.  In order to have a nontrivial
solution in (2.7), rank(\tilde{Q}) < 2c.  But this is guaranteed since
n+1 < 2c, and hence rank(\tilde{Q}) \le min(n+1, 2c) = n+1 < 2c.  Thus
nontrivial values of \delta, and hence of \eta and \eta^*, can be found
to satisfy (2.2).  \Box
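As a numerical illustration of the necessity argument (a sketch I am adding,
not part of the original; it uses the binomial case r = 2 with c = 2
components, so that 2c - 1 = 3 > n = 2, and the particular atoms and weights
were chosen by matching the first two moments), the two mixing distributions
below are distinct yet produce exactly the same mixture of binomials with
n = 2 trials.

```python
from math import comb

def mixture_pmf(n, atoms):
    """pmf of a finite mixture of binomials; atoms = [(weight, p), ...]."""
    return [sum(w * comb(n, x) * p**x * (1 - p)**(n - x) for w, p in atoms)
            for x in range(n + 1)]

n = 2                                  # n < 2c - 1 = 3, so the mixture is not identifiable
G1 = [(0.5, 0.2), (0.5, 0.6)]          # mixing distribution G1
G2 = [(9/13, 4/15), (4/13, 0.7)]       # a different mixing distribution G2

print(mixture_pmf(n, G1))   # approximately [0.4, 0.4, 0.2]
print(mixture_pmf(n, G2))   # the same values, although G1 != G2
```

With n = 3 trials the same two mixing distributions would give different
mixtures, in line with Lemma 2.2.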
2.1.3  k-Population Finite Mixture of Multinomials
     Suppose we observe k sets of independent random variables
X_{i1}, X_{i2},...,X_{ir_i} generated from M(n_i,p) \wedge_p G(p) for
i=1,...,k and G \in G_c^*.  In the following discussion we first define
the identifiability problem of the k-population finite mixture model in
general and then specialize it to the multinomial context.  Let

    F^{(k)}(x;\theta) = (F_1(x;\theta), F_2(x;\theta),..., F_k(x;\theta))

be a class of k-vectors, each of whose elements is a d-dimensional
distribution function indexed by a point \theta \in R_1^m, a Borel subset
of Euclidean m-space R^m, such that each element F_i(x;\theta) of the
vector F^{(k)}(x;\theta) is measurable in R^d \times R_1^m.  Then a
vector of mixtures

    H_1(x) = F_1(x;\theta) \wedge_\theta G(\theta)
    H_2(x) = F_2(x;\theta) \wedge_\theta G(\theta)
        .                                                               (2.8)
        .
    H_k(x) = F_k(x;\theta) \wedge_\theta G(\theta)

is the image of the map of G \in G_c^*.
Definition 2.2:  Let H_F^{(k)} be the class of k-population mixtures of
F^{(k)} induced by the above mapping.  Then H_F^{(k)} is said to be
identifiable if and only if this map is one to one from G_c^* onto
H_F^{(k)}.  We call F^{(k)}(x;\theta) \wedge_\theta G(\theta) a
k-population finite mixture and denote with corresponding small letters
the set of marginal probability density functions if they exist.
     We now specialize the argument to the multinomial context.  Let

    M^{(k)} = \{(M(x;n_1,p), M(x;n_2,p),..., M(x;n_k,p));\ p = (p_1,...,p_{r-1}),
               \ 0 < p_i < 1 for i=1,...,r-1,\ \sum_{i=1}^{r-1} p_i < 1\}

be a class of vectors of k multinomial distribution functions,
n_1, n_2,...,n_k being fixed, and let

    H_j^* = \{M(x;n_j,p) \wedge_p G(p),\ G \in G_c^*\},   j=1,...,k,

and

    H_M^{(k)} = \{(M(x;n_1,p) \wedge_p G(p),..., M(x;n_k,p) \wedge_p G(p));\ G \in G_c^*\}.

Then we prove
Lemma 2.4:  Suppose H_j^* is not identifiable.  Then any H_i^* such that
n_i \le n_j is not identifiable, and there exists at least one common
pair (G_1,G_2), G_1 \ne G_2, that are mapped to a given mixture for all
i such that n_i \le n_j.
Proof.  Non-identifiability of H_j^* implies that for G_1, G_2 \in G_c^*
with G_1 \ne G_2 we have a common h_j(x) such that

    h_j(x) = \int_0^1 ... \int_0^1 m(x;n_j,p) dG_1(p)
           = \int_0^1 ... \int_0^1 m(x;n_j,p) dG_2(p).                   (2.9)

However, by lemma 2.2, (2.9) is possible if and only if n_j < 2c-1.
Now, we define

    S(n,r) = \{(t_1,...,t_r);\ t_i \in I^+ \cup \{0\} for all i,\ \sum_{i=1}^{r} t_i = n\}

and

    S^o(n,r) = \{(t_1,...,t_r);\ t_i \in I^+ \cup \{0\} for all i,\ \sum_{i=1}^{r} t_i \le n\},

where I^+ is the set of positive integers.  Then, for (x_1,...,x_r) \in S(n_j,r),

    h_j(x) = \int_0^1 ... \int_0^1 \binom{n_j}{x_1,...,x_r} p_1^{x_1} \cdots p_r^{x_r}\, dG(p),

and expanding p_r = 1 - \sum_{i=1}^{r-1} p_i by the binomial theorem
shows that h_j is determined by the mixed moments of (p_1,...,p_{r-1})
under G of total order at most n_j.  Consequently (2.9) is true if and
only if

    \mu^{(1)}_{(v_1,...,v_{r-1})} = \mu^{(2)}_{(v_1,...,v_{r-1})}        (2.10)

for all (v_1,...,v_{r-1}) \in S^o(n_j, r-1), where
\mu^{(i)}_{(v_1,...,v_{r-1})} denotes the mixed moment of order
(v_1,...,v_{r-1}) of (p_1,...,p_{r-1}) under G_i, i=1,2.  Since
S^o(n_i, r-1) \subset S^o(n_j, r-1) whenever n_i \le n_j, the same result
will follow for n_i \le n_j with the same (G_1,G_2).  \Box

Lemma 2.5:  If there exists j, 1 \le j \le k, such that H_j^* is
identifiable, then H_M^{(k)} is identifiable.
Proof.  Suppose H_M^{(k)} is not identifiable.  Then there exist two
different mixing distributions G_1, G_2 \in G_c^* such that

    M(x;n_i,p) \wedge_p G_1(p) = M(x;n_i,p) \wedge_p G_2(p)   for all x

for i=1,...,k, with the same (G_1,G_2).  Consequently no H_j^* is
identifiable.  \Box
Theorem 2.1:  A necessary and sufficient condition that the class
H_M^{(k)} of all k-population finite mixtures of multinomial
distributions be identifiable is

    max_{1 \le i \le k} n_i \ge 2c-1.

Proof:  Without loss of generality we assume that n_1 \ge n_2 \ge ... \ge n_k.
Suppose n_1 < 2c-1.  Then by lemma 2.3 we have

    H_1(x) = M(x;n_1,p) \wedge_p G_1(p) = M(x;n_1,p) \wedge_p G_2(p)     (2.11)

for G_1 \ne G_2, where G_1, G_2 \in G_c^*.  Hence by lemma 2.4 we still
have

    H_i(x) = M(x;n_i,p) \wedge_p G_1(p) = M(x;n_i,p) \wedge_p G_2(p)     (2.12)

for i = 1,...,k.  Thus H_M^{(k)} is not identifiable.  The other
direction is a direct application of lemma 2.5.  \Box
2.2  ESTIMATION OF THE MIXING DISTRIBUTION

2.2.1  Preliminaries
     Many methods have been suggested for the estimation of the mixing
distribution in the 1-population mixture model, ranging from the method
of moments to a formal maximum likelihood (ML) approach to methods based
on numerical analysis techniques and minimum distance methods.  See
Pearson (1894, 1915), Rider (1961a, 1961b), and Blischke (1962, 1964)
for the method of moments, Kabir (1968) for the numerical analysis
technique, Choi and Bulgren (1968) and Deely and Kruse (1968) for the
distance methods, and Hasselblad (1969), Sundberg (1974), and Dempster,
Laird and Rubin (1977) for the ML approach.
The rationale for concentrating on the moment estimator, or on the
distance method, rather than the ML approach had been that the latter
method yielded 'highly' intractable equations.
However, it was not until
the late 1960's, with the emergence of fast electronic digital computers,
that the ML approach was suggested in various forms for incomplete data.
Observations from a finite mixture are considered incomplete, because the
component in the population from which each observation originates is
unknown.
Thus Hasselblad (1969) developed a set of iterative equations
for obtaining estimates for the 1-population finite mixture of members
of the exponential family, and Orchard and Woodbury (1972) proposed
the missing information principle (MIP) for a problem that originated
from estimating genotypic frequencies from phenotypic frequency data.
Sundberg (1974) considered a ML approach for incomplete data when the
iid data came from an exponential family member and suggested a
fundamental set of formulae for the current iterative computational
approach to obtaining the ML estimator.
Under the assumption of
positive definiteness for the information matrix, he also obtained the
consistency, asymptotic efficiency and asymptotic normality of the ML
estimator.
The statistical analysis tools for incomplete data were finally
unified when Dempster, Laird and Rubin (1977) (henceforth abbreviated
DLR) suggested an EM algorithm, which included as special cases Orchard
and Woodbury's MIP and Hasselblad's iteration equations for the finite
mixture problem.
The 'E' in EM stands for the expectation step,
which consists of estimating the complete data sufficient statistic by
constructing the conditional expectation of the complete data given
the observed incomplete data and the current fit of the parameter.
The
'M' implies a maximization step, which takes the estimated complete data
and estimates the parameters by the ML methods as though the estimated
complete data were the observed data.
The EM algorithm is defined by
cycling back and forth between these two steps.
In their paper DLR showed that the likelihood is nondecreasing at
each iteration of the EM algorithm.
Recently Wu (1983) presented an
elegant study on the convergence of the EM algorithm, which was not
clear in the original work of DLR.
One of Wu's primary results that
is relevant to our mixture model is that if the unobserved complete data
specification can be described by a curved exponential family or satisfies a mild regularity condition (condition (10) in his paper), then
all the limit points of any EM sequence are stationary points of the
likelihood function.
Also it has been recommended by various authors
(Hasselblad (1969), Wolf (1970), Laird (1978) and Wu (1983), among
others) that several EM iterations be tried with different initial
values to minimize the chance of possible entrapment at a stationary
point that is not a local maximum.
Recently Louis (1982) aided applicability of the EM algorithm by
developing an implementation based on the complete data gradient and
the second derivative matrix to find the observed information matrix.
In many cases Efron and Hinkley (1978) have shown this to be a more
appropriate measure of the covariance matrix than the traditional
approximation I(\hat\theta), where \hat\theta is a maximum likelihood
estimator and I is the Fisher information matrix.
2.2.2 Maximum Likelihood Equations
In this subsection we extend Hasselblad's iterative equations for
obtaining the ML estimator of the 1-population finite mixture of an
exponential family to the case of a k-population finite mixture.
Hence we can obtain the ML estimator of the k-population finite mixture
of binomial distributions as a special case of the resulting algorithm.
Let (X, 1 ,X'2""'X, ) be a random sample from the i-th population
"
'~i
distribution hi(x), which is a mixture of c component distributions,
that is,
    h_i(x) = \sum_{j=1}^{c} \pi_j f_{ij}(x;\theta_j),                    (2.13)

where f_{ij}(x;\theta_j) is the j-th component distribution in the i-th
population and is assumed to belong to an s-parameter exponential
family, \pi_j is the mixing proportion for the j-th component, and
\theta_j = (\theta_{1j},...,\theta_{sj}) is the parameter vector.  Thus

    f_{ij}(x;\theta_j) = A_i(x) C_i(\theta_{1j},...,\theta_{sj})
                         \exp[\theta_{1j} T_1(x) + ... + \theta_{sj} T_s(x)]    (2.14)

for i=1,...,k and j=1,...,c.
     Define x = ((x_{11},...,x_{1r_1}), (x_{21},...,x_{2r_2}),...,(x_{k1},...,x_{kr_k})),
\pi = (\pi_1,...,\pi_{c-1}), \pi_c = 1 - \sum_{j=1}^{c-1} \pi_j, and
\psi = (\pi, \theta_1,...,\theta_c).  Then the log-likelihood L^*(x;\psi)
of the k-population data becomes

    L^*(x;\psi) = \sum_{\ell=1}^{r_1} \log\{ \sum_{j=1}^{c} \pi_j A_1(x_{1\ell})
                      C_1(\theta_{1j},...,\theta_{sj}) \exp[\theta_{1j}T_1(x_{1\ell})+...+\theta_{sj}T_s(x_{1\ell})] \}
                + \sum_{\ell=1}^{r_2} \log\{ \sum_{j=1}^{c} \pi_j A_2(x_{2\ell})
                      C_2(\theta_{1j},...,\theta_{sj}) \exp[\theta_{1j}T_1(x_{2\ell})+...+\theta_{sj}T_s(x_{2\ell})] \}
                + ...
                + \sum_{\ell=1}^{r_k} \log\{ \sum_{j=1}^{c} \pi_j A_k(x_{k\ell})
                      C_k(\theta_{1j},...,\theta_{sj}) \exp[\theta_{1j}T_1(x_{k\ell})+...+\theta_{sj}T_s(x_{k\ell})] \}.   (2.15)
If C_1, C_2,...,C_k are differentiable, then the ML equations can be
derived as follows:

    \frac{\partial L^*}{\partial \theta_{pj}}
      = \sum_{\ell=1}^{r_1} \frac{\pi_j f_{1j}(x_{1\ell})}{h_1(x_{1\ell})}
          \left[ \frac{\partial \log C_1}{\partial \theta_{pj}} + T_p(x_{1\ell}) \right]
      + \sum_{\ell=1}^{r_2} \frac{\pi_j f_{2j}(x_{2\ell})}{h_2(x_{2\ell})}
          \left[ \frac{\partial \log C_2}{\partial \theta_{pj}} + T_p(x_{2\ell}) \right]
      + ...
      + \sum_{\ell=1}^{r_k} \frac{\pi_j f_{kj}(x_{k\ell})}{h_k(x_{k\ell})}
          \left[ \frac{\partial \log C_k}{\partial \theta_{pj}} + T_p(x_{k\ell}) \right]        (2.16)

for p=1,...,s and j=1,...,c, and

    \frac{\partial L^*}{\partial \pi_j}
      = \sum_{i=1}^{k} \sum_{\ell=1}^{r_i}
        \frac{f_{ij}(x_{i\ell}) - f_{ic}(x_{i\ell})}{h_i(x_{i\ell})}                             (2.17)

for j=1,...,c-1.
     We assume there exists a real valued function C(\theta_j) such that

    \frac{\partial \log C_i(\theta_j)}{\partial \theta_{pj}}
      = n_i \frac{\partial \log C(\theta_j)}{\partial \theta_{pj}}                               (2.18)

for p=1,...,s, j=1,...,c, where the n_i's are constants.  It can be
easily checked that the binomial, Poisson, normal, and gamma
distribution G(\lambda, r_i) with known r_i satisfy condition (2.18).
Setting equations (2.16) and (2.17) equal to zero yields the
following ML equations for p=l, ... ,s and j=l, ... ,c;
    - \frac{\partial \log C(\theta_j)}{\partial \theta_{pj}}
      = \frac{\sum_{i=1}^{k} \sum_{\ell=1}^{r_i} [\pi_j f_{ij}(x_{i\ell})/h_i(x_{i\ell})]\, T_p(x_{i\ell})}
             {\sum_{i=1}^{k} \sum_{\ell=1}^{r_i} n_i [\pi_j f_{ij}(x_{i\ell})/h_i(x_{i\ell})]}   (2.19)

and

    \pi_j = \frac{\pi_j}{r_+} \left[ \sum_{\ell=1}^{r_1} \frac{f_{1j}(x_{1\ell})}{h_1(x_{1\ell})}
            + \sum_{\ell=1}^{r_2} \frac{f_{2j}(x_{2\ell})}{h_2(x_{2\ell})} + ...
            + \sum_{\ell=1}^{r_k} \frac{f_{kj}(x_{k\ell})}{h_k(x_{k\ell})} \right],              (2.20)

where r_+ = \sum_{i=1}^{k} r_i.
     Next, consider the case where each of the k populations has a
single (component) distribution instead of a mixture distribution.  If
the corresponding ML equations have a closed-form solution for \theta_p,
we may use that closed-form solution of \theta_p for the solution of
equation (2.19) and achieve a major simplification in computation.  Thus
let L_s^*(x;\theta) be the log-likelihood of \theta when the number of
components c is equal to 1.  Then

    \frac{\partial L_s^*(x;\theta)}{\partial \theta_p}
      = \sum_{i=1}^{k} \sum_{\ell=1}^{r_i}
        \left[ n_i \frac{\partial \log C(\theta)}{\partial \theta_p} + T_p(x_{i\ell}) \right]    (2.21)

when \theta_1 = ... = \theta_c = \theta.  Under assumption (2.18) a set of
ML equations is given by

    - \frac{\partial \log C(\theta)}{\partial \theta_p}
      = \frac{\sum_{\ell=1}^{r_1} T_p(x_{1\ell}) + \sum_{\ell=1}^{r_2} T_p(x_{2\ell}) + ... + \sum_{\ell=1}^{r_k} T_p(x_{k\ell})}
             {n_1 r_1 + n_2 r_2 + ... + n_k r_k}                                                 (2.22)

for p=1,...,s.  If equation (2.22) has a closed-form solution for
\theta_p, say

    \theta_p = g_p(t_1,...,t_s),   p=1,...,s,                                                    (2.23)

where

    t_p = \frac{\sum_{i=1}^{k} \sum_{\ell=1}^{r_i} T_p(x_{i\ell})}{n_1 r_1 + n_2 r_2 + ... + n_k r_k},   (2.24)

then equations (2.19) can be written as

    \theta_{pj} = g_p(t_{1j},...,t_{sj}),                                                        (2.25)

where

    t_{pj} = \frac{\sum_{i=1}^{k} \sum_{\ell=1}^{r_i} [\pi_j f_{ij}(x_{i\ell})/h_i(x_{i\ell})]\, T_p(x_{i\ell})}
                  {\sum_{i=1}^{k} \sum_{\ell=1}^{r_i} n_i [\pi_j f_{ij}(x_{i\ell})/h_i(x_{i\ell})]}      (2.26)

for p=1,...,s, j=1,...,c.
     If we denote the estimate of the parameter \psi at the v-th
iteration by \psi^{(v)}, then the t_{pj}'s in (2.26) can be evaluated by
using \psi^{(v)}.  The new estimates are given by (2.25) as

    \theta_{pj}^{(v+1)} = g_p(t_{1j}^{(v)}, t_{2j}^{(v)},..., t_{sj}^{(v)}).                     (2.27)

Similarly (2.20) can be updated as

    \pi_j^{(v+1)} = \frac{\pi_j^{(v)}}{r_+}
                    \sum_{i=1}^{k} \sum_{\ell=1}^{r_i}
                    \left[ \frac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})} \right]^{(v)},
                    j=1,...,c,                                                                   (2.28)

where the superscript (v) implies that the reference quantities are
evaluated at \psi^{(v)}.  Thus we can use equations (2.27) and (2.28) as
a basis for the iterative algorithm.  In the language of DLR, (2.26)
corresponds to the E-step and (2.27)-(2.28) correspond to the M-step in
the EM algorithm.
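To see the E-step (2.26) and M-step (2.27)-(2.28) concretely for the binomial
kernel, where T(x) = x and g reduces to a weighted proportion, the following
sketch may help.  It is an illustration added here, not part of the
dissertation: it assumes NumPy and SciPy, the function name and simulated
parameter values are hypothetical, and it uses a single random start rather
than the several starting values recommended above.

```python
import numpy as np
from scipy.stats import binom

def em_binomial_mixture(data, n_sizes, c, n_iter=500, seed=0):
    """EM iteration for the k-population finite mixture of c binomials (1.11).

    data:    list of k arrays; data[i] holds x_{i1},...,x_{ir_i}
    n_sizes: list of k binomial indices n_1,...,n_k
    Returns (pi, p): mixing proportions and component success probabilities.
    """
    rng = np.random.default_rng(seed)
    pi = np.full(c, 1.0 / c)
    p = np.sort(rng.uniform(0.05, 0.95, size=c))
    for _ in range(n_iter):
        num_p = np.zeros(c)    # sum over all observations of gamma * x
        den_p = np.zeros(c)    # sum over all observations of gamma * n_i
        num_pi = np.zeros(c)   # sum over all observations of gamma
        r_plus = 0
        for x_i, n_i in zip(data, n_sizes):
            x_i = np.asarray(x_i)
            # E-step: responsibilities gamma = pi_j b(x;n_i,p_j) / h_i(x), as in (2.26)
            comp = pi * binom.pmf(x_i[:, None], n_i, p[None, :])
            gamma = comp / comp.sum(axis=1, keepdims=True)
            num_p += gamma.T @ x_i
            den_p += gamma.sum(axis=0) * n_i
            num_pi += gamma.sum(axis=0)
            r_plus += x_i.size
        # M-step: weighted proportions, matching (2.27)-(2.28) for the binomial kernel
        p = num_p / den_p
        pi = num_pi / r_plus
    return pi, p

# Tiny illustration on simulated data (hypothetical parameter values).
rng = np.random.default_rng(3)
true_pi, true_p = [0.4, 0.6], [0.1, 0.7]
n_sizes = [3, 4, 5]
data = []
for n_i in n_sizes:
    z = rng.choice(2, size=200, p=true_pi)
    data.append(rng.binomial(n_i, np.take(true_p, z)))
print(em_binomial_mixture(data, n_sizes, c=2))
```

Each pass through the data leaves the likelihood nondecreasing, in line with
the DLR result cited earlier; convergence diagnostics and multiple restarts
would be added in any serious application.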
We note that in the k-population finite mixture of multinomials
case only the largest ni determines the identifiability; hence there may
be elements in the k-population finite mixture that, marginally, lack
identifiability.
For estimation purposes, however, data from all
k-populations are used even though some of them may not be marginally
identifiable.
2.2.3 Asymptotic Distribution of the ML Estimator
     For the asymptotic normality of the ML estimator \hat\theta for a
k-population finite mixture of binomial distributions we rely on the
usual maximum likelihood asymptotic theory.  Cramer (1946) showed that
under certain regularity conditions the likelihood solution is
consistent, asymptotically normal, and asymptotically efficient.
Cramer's proof was extended by Chanda (1954) to the multivariate case.
Chanda's proof of the uniqueness of the consistent root of the
likelihood solution is not correct.  A correct version is provided by
Tarone and Gruenhage (1975).  In a more general setting Sundberg (1974) provides a
In a more general setting Sundberg (1974) provides a
maximum likelihood asymptotic theory for incomplete data from an
exponential family member, which employs Chanda's extension of Cramer's
conditions.
However, it is not clear in his proof that Sundberg checked
Tarone and Gruenhage's conditions that need to be added to Chanda's
conditions.
Lemma 2.6 (Chanda (1954)):  Suppose f(y;\theta) is a probability density;
\theta = (\theta_1,...,\theta_k) is the unknown parameter vector and
y_1, y_2,...,y_n are n independent observations on y.  The likelihood
equations for estimating \theta are given by

    \frac{\partial \log L}{\partial \theta_r} = 0,   r = 1,2,...,k,

where \log L = \sum_{i=1}^{n} \log f(y_i;\theta).  Let \theta_0 be the
unknown true value of the parameter vector \theta, which exists at some
point in the region \Omega.  Then if the conditions (i)-(iii) below hold,
there exists a unique consistent estimator \hat\theta_n corresponding to
a solution of the likelihood equations.  Further,
\sqrt{n}(\hat\theta_n - \theta_0) is asymptotically normal with mean 0
and covariance matrix I(\theta_0)^{-1}, where I(\theta_0) is the Fisher
information matrix.

Condition (i):  For almost all y and for all \theta \in \Omega,

    \frac{\partial \log f}{\partial \theta_r},
    \frac{\partial^2 \log f}{\partial \theta_r \partial \theta_s},  and
    \frac{\partial^3 \log f}{\partial \theta_r \partial \theta_s \partial \theta_t}

exist for all r,s,t = 1,2,...,k.

Condition (ii):  For almost all y and for all \theta \in \Omega,

    \left| \frac{\partial \log f}{\partial \theta_r} \right| < F_r(y),
    \left| \frac{\partial^2 \log f}{\partial \theta_r \partial \theta_s} \right| < F_{rs}(y),
    \left| \frac{\partial^3 \log f}{\partial \theta_r \partial \theta_s \partial \theta_t} \right| < H_{rst}(y),

where

    \int_{-\infty}^{\infty} H_{rst}(y) f \, dy < M < \infty

and F_r(y) and F_{rs}(y) are bounded for all r,s,t = 1,...,k.

Condition (iii):  For all \theta \in \Omega the matrix

    I(\theta) = \int_{-\infty}^{\infty}
        \left( \frac{\partial \log f}{\partial \theta_r} \right)
        \left( \frac{\partial \log f}{\partial \theta_s} \right) f \, dy

is positive definite.
     For the positive definiteness of the information matrix of the
k-population mixture of c binomials when c is known a priori, we prove
the following.  Let \theta = (\pi_1, \pi_2,...,\pi_{c-1}, p_1, p_2,...,p_c)
and let the parameter space \Omega be given by

    \Omega = \{(\pi_1,...,\pi_{c-1}, p_1,...,p_c):\ 0 < \pi_i < 1,\ i=1,...,c-1,
              \ \sum_{i=1}^{c-1} \pi_i < 1,\ 0 < p_1 < p_2 < ... < p_c < 1\}.

Lemma 2.7:  Let \theta_0, the true parameter value, be contained in some
closed region \Lambda_1 which does not contain the boundary values of
\Omega.  (Such a region always exists, because by the definition of the
mixture of c components the boundary values of \Omega are not true
parameter values.)  If the random variable Y = (Y_1,...,Y_k), where
Y_i = (Y_{i1},...,Y_{ir_i}) for i=1,2,...,k, is distributed following
the probability mass function

    h(y;\theta) = \prod_{i=1}^{k} \prod_{j=1}^{r_i}
        [\pi_1 b(y_{ij};n_i,p_1) + \pi_2 b(y_{ij};n_i,p_2) + ... + \pi_c b(y_{ij};n_i,p_c)],    (2.29)

where \pi_c = 1 - \sum_{i=1}^{c-1} \pi_i, then the information matrix

    I(\theta_0) = E_{\theta_0}\left[
        \left( \sum_{i=1}^{k} \sum_{j=1}^{r_i}
            \frac{\partial \log h_i(y_{ij};\theta)}{\partial \theta} \right)
        \left( \sum_{i=1}^{k} \sum_{j=1}^{r_i}
            \frac{\partial \log h_i(y_{ij};\theta)}{\partial \theta} \right)' \right],          (2.30)

where h_i(y_{ij};\theta) = \sum_{t=1}^{c} \pi_t b(y_{ij};n_i,p_t) for
j=1,...,r_i, i=1,...,k, is positive definite if and only if the
identifiability condition max_{1 \le i \le k} n_i \ge 2c-1 holds.

Proof.  We first prove the positive definiteness of the information
matrix, say I_i(\theta), contained in a single observation Y from
h_i(y;\theta).
For convenience we write

    h_i(y) = \pi_1 b(y;n_i,p_1) + ... + \pi_c b(y;n_i,p_c)                (2.31)

and

    b_t(y) = b(y;n_i,p_t),   t=1,...,c.                                   (2.32)

The first partial derivatives of h_i(y;\theta) are given by

    \frac{\partial h_i}{\partial \pi_t} = b_t(y) - b_c(y),   t=1,...,c-1,
                                                                          (2.33)
    \frac{\partial h_i}{\partial p_t}
        = \pi_t b_t(y) \frac{y - n_i p_t}{p_t(1-p_t)},   t=1,...,c.

Since I_i(\theta) is a dispersion matrix, it is positive definite unless
there exists \gamma = (\gamma_1,...,\gamma_{2c-1}) \ne 0 such that

    \sum_{\ell=1}^{2c-1} \gamma_\ell
        \frac{\partial \log h_i(y;\theta)}{\partial \theta_\ell} = 0      (2.34)

for y = 0,...,n_i.  The set of equations (2.34) constitutes a set of
homogeneous linear equations

    D_{h_i}^{-1} A \gamma = 0   <=>   A \gamma = 0,                       (2.35)

where D_{h_i} = diag(h_i(0),...,h_i(n_i)), and A is an (n_i+1) x (2c-1)
matrix whose y-th row appears in (2.33).  Since the sum of each column
of A is equal to zero, we can delete any one row from A in finding the
solution of (2.35); let A* denote the resulting matrix.  When
n_i = 2c-1, the matrix A* is nonsingular for \theta \in \Omega, which can
be shown by elementary, but tedious, column operations that transform A*
into an Echelon form.  Consequently only a trivial solution exists for
\gamma in (2.35) if and only if n_i \ge 2c-1.  Hence I_i(\theta) is
positive definite if and only if n_i \ge 2c-1.
     Now, for the information matrix I(\theta) contained in
Y = (Y_1,...,Y_k) we can decompose I(\theta) as

    I(\theta) = \sum_{i=1}^{k} r_i I_i(\theta).                           (2.36)

Since Y_1,...,Y_k are independent and, within the i-th population,
Y_{i1},...,Y_{ir_i} are i.i.d. for i=1,...,k, (2.36) readily follows.
Suppose max_{1 \le i \le k} n_i = n_1 \ge 2c-1.  Then I_1(\theta) is
positive definite.  Hence by (2.36), I(\theta) is positive definite.
     To prove the necessary condition we assume max_{1 \le i \le k} n_i < 2c-1,
and solve the following homogeneous linear equations:

    \sum_{\ell=1}^{2c-1} \gamma_\ell
        \frac{\partial \log h_i(y_{ij};\theta)}{\partial \theta_\ell} = 0   (2.37)

for y_{ij} = 0,1,...,n_i and i=1,...,k.  Since (2.37) admits fewer than
2c-1 equations, a nontrivial solution for \gamma exists.  Hence I(\theta)
is not positive definite.  \Box
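A quick numerical check of Lemma 2.7 in the binomial case (my own sketch, not
part of the dissertation; NumPy/SciPy assumed, parameter values arbitrary)
computes the single-observation information matrix I_i(\theta) exactly for
c = 2 components by enumerating y = 0,...,n_i, and compares n_i = 2 (below
2c-1, so singular) with n_i = 3 (equal to 2c-1, so positive definite).

```python
import numpy as np
from scipy.stats import binom

def info_matrix(n, pi1, p1, p2):
    """Exact single-observation information for a 2-component binomial mixture,
    theta = (pi_1, p_1, p_2), obtained by summing over y = 0..n."""
    y = np.arange(n + 1)
    b1, b2 = binom.pmf(y, n, p1), binom.pmf(y, n, p2)
    h = pi1 * b1 + (1 - pi1) * b2
    # partial derivatives of h, as in (2.33)
    dh = np.vstack([
        b1 - b2,                                        # d h / d pi_1
        pi1 * b1 * (y - n * p1) / (p1 * (1 - p1)),      # d h / d p_1
        (1 - pi1) * b2 * (y - n * p2) / (p2 * (1 - p2)) # d h / d p_2
    ])
    score = dh / h                                      # d log h / d theta, one column per y
    return (score * h) @ score.T                        # sum_y h(y) s(y) s(y)'

theta = (0.4, 0.2, 0.7)
for n in (2, 3):   # 2c - 1 = 3 for c = 2
    smallest = np.linalg.eigvalsh(info_matrix(n, *theta)).min()
    print(f"n = {n}: smallest eigenvalue of I_i(theta) = {smallest:.6f}")
```

The smallest eigenvalue is numerically zero for n = 2 and strictly positive
for n = 3, mirroring the equivalence between positive definiteness and the
identifiability condition.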
     Now, we prove the large sample property of the ML estimator
\hat\theta = (\pi_1,...,\pi_{c-1}, p_1,...,p_c) based on
\sum_{i=1}^{k} r_i = r_+ observations.  (Similar arguments can be found
in N. Kiefer (1978), who considered a mixture of two normal
distributions in the switching regression.)

Theorem 2.2:  Let \Omega, \Lambda_1 and \theta be defined as in lemma 2.7,
and let \theta_0, the true parameter value, be contained in \Lambda_1.
If the random variable Y = (Y_1,...,Y_k), where Y_i = (Y_{i1},...,Y_{ir_i})
for i=1,...,k, is distributed according to the probability mass function
h(y;\theta_0) in (2.29) with max_{1 \le i \le k} n_i \ge 2c-1, and if
r_+ \to \infty in such a way that each r_i/r_+ converges to a positive
limit, then for large enough r_+ there exists a unique consistent root
\hat\theta_{r_+} of the likelihood equations, and
\sqrt{r_+}(\hat\theta_{r_+} - \theta_0) is asymptotically normally
distributed with mean zero and covariance matrix
(\frac{1}{r_+} I(\theta_0))^{-1}.
Proof.  The proof consists of verification of Chanda's conditions
(i)-(iii), modified for the one-way layout nature of our data, together
with two extra conditions of Tarone and Gruenhage (1975).  Condition
(iii) is readily verified by the use of lemma 2.7.  Verifying conditions
(i) and (ii) involves straightforward differentiation.  It can easily be
seen that \partial h_i/\partial\theta and
\partial^2 h_i/\partial\theta\partial\theta' are all continuous functions
on \Lambda_1, hence they are bounded.  Now using the relations

    \frac{\partial \ln h_i}{\partial \theta_s}
        = \frac{1}{h_i} \frac{\partial h_i}{\partial \theta_s},

    \frac{\partial^2 \ln h_i}{\partial \theta_s \partial \theta_t}
        = \frac{1}{h_i} \frac{\partial^2 h_i}{\partial \theta_s \partial \theta_t}
        - \frac{1}{h_i^2} \frac{\partial h_i}{\partial \theta_s} \frac{\partial h_i}{\partial \theta_t},

    \frac{\partial^3 \ln h_i}{\partial \theta_s \partial \theta_t \partial \theta_u}
        = \frac{2}{h_i^3} \frac{\partial h_i}{\partial \theta_s}
              \frac{\partial h_i}{\partial \theta_t} \frac{\partial h_i}{\partial \theta_u}
        - \frac{1}{h_i^2} \left[ \frac{\partial h_i}{\partial \theta_u}
              \frac{\partial^2 h_i}{\partial \theta_s \partial \theta_t}
            + \frac{\partial h_i}{\partial \theta_t}
              \frac{\partial^2 h_i}{\partial \theta_s \partial \theta_u}
            + \frac{\partial h_i}{\partial \theta_s}
              \frac{\partial^2 h_i}{\partial \theta_t \partial \theta_u} \right]
        + \frac{1}{h_i} \frac{\partial^3 h_i}{\partial \theta_s \partial \theta_t \partial \theta_u},

Chanda's conditions (i) and (ii) are easily verified.  The two extra
conditions, that \Lambda_1 is a convex subset of R^{2c-1} and that
\partial \ln h_i/\partial\theta_s and
\partial^2 \ln h_i/\partial\theta_s\partial\theta_t are continuous for
all \theta \in \Lambda_1, are readily verified.  \Box

2.3  k-POPULATION FINITE MIXTURE OF BINOMIALS - APPLICATION

2.3.1  Description of the Ames Test
     Since publication of the paper by Ames et al. (1975), the Ames test
(1975), the Ames test
has gained worldwide use for investigation of a chemical's mutagenicity.
Its extensive use in studies of genetic toxicology is due to the test's
sensitivity in detecting mutagens, its economy both with respect to
time and material, and the well-documented link between carcinogenicity
and mutagenicity.  The Ames test is based on a very sensitive
bacterial test.
The bacterial test uses several genetically constructed
histidine-dependent (auxotrophic) Salmonella typhimurium strains that
can be reverted to histidine independence (prototrophy) by a wide
variety of mutagens.
This bacterial test is adapted for use in detect-
ing chemicals that are potential human carcinogens or mutagens by adding homogenates of mammalian liver, which is a convenient source of the
activating enzymes that are an important aspect of mammalian metabolism.
     Ames et al. (1975) reported that about 85% of known animal carcino-
gens had been detected as bacterial mutagens and among 106 known
non-carcinogens few were mutagenic in the test.
Many Salmonella tester
strains have been developed by Ames and his colleagues; among them
TA 98, TA 100, TA 1535, and TA 1537 are most commonly used.
As indi-
cated earlier, if a tester strain is hit by a mutagen, then it may be
reverted to prototrophy.
Since prototrophic strains are capable of
synthesizing histidine, an essential amino acid, they continue growing
and dividing without an external supply of histidine, whereas auxotrophic strains, being entirely dependent on an external supply of
histidine, cannot sustain growth.
Thus if at least one auxotrophic
bacterium reverts to its prototrophic state through mutation, there
will be continuous growth of its descendants after exhaustion of the
external supply of histidine.
Thus mutagen-induced and spontaneous
revertants ultimately yield colonies that are visible to the human
eye.
It has been observed frequently that the toxicity effects of the
chemical increasingly outweigh its mutagenicity effects beyond certain
dose levels.
Thus, toxicity must be considered as a competing risk
vis a vis the mutation process.
2.3.2 Statistical Analysis of the Experimental Data
The experimental data consist of the results of 763 compounds,
where the experiments followed the standard protocol of the Ames
test.
Four tester strains, TA 98, TA 100, TA 1535, and TA 1537, were
employed, and three levels of metabolic activations were considered by
adding (i) no enzyme, (ii) liver homogenate from a hamster, and
(iii) liver homogenate from a rat, respectively, to each of a set of
three petri dishes.
Each compound for each of the 12 combinations of
four strains and three metabolic activations was tested n_{at} times, for
a = 1,...,763, t = 1,...,12.  For each of the n_{at} times the experiment
should have consisted of 18 petri dishes, i.e., 3 replicates at
control and 3 replicates at each of 5 dose levels, but there was
occasional loss of dishes due to breakage, extreme toxicity, etc.
     For the analysis of the observed numbers of revertant colonies
in a single experiment, Margolin et al. (1981) suggested a family of
mechanistic models based on the biological formulation of the Ames
test.  They also noted the existence of hyper-Poisson variability
among the replicated plate counts and advocated the use of a negative
binomial distribution.  Their negative binomial model for the number
X_E of revertant colonies observed on a plate with environment E was
denoted by

    X_E ~ NB(\mu(E), c),                                                 (2.38)

where \mu is shorthand for \mu(E), \mu > 0 and c > 0.  The variability
in replicated plates is reflected in c; when c \to 0, (2.38) reduces to
the Poisson distribution through a standard limit argument.
     In order to disentangle the competing risks of mutagenicity and
toxicity, and hence to draw inferences regarding the mutagenicity,
Margolin et al. (1981) modeled \mu as N_0 P_D, where N_0 is the known
average number of microbes placed on a plate, which is large, e.g.,
10^8, and P_D is the probability that any plated microbe yields a
revertant colony when the plate is exposed to dose D of a test
chemical.  Among those models for P_D considered, they suggested that
two models were of primary interest:

    P_D = \{1 - \exp[-(\alpha+\beta D)]\} \cdot \exp(-\gamma D)           (2.39)

    P_D = \{1 - \exp[-(\alpha+\beta D)]\} \cdot [2 - \exp(\gamma D)]^+    (2.40)

where [x]^+ = max(0,x), \alpha \ge 0 is related to a spontaneous rate of
mutation, \beta \ge 0 is related to the induced mutation, and
\gamma \ge 0 is related to the induced toxicity.
     From (2.39) or (2.40) it can be seen that P_D is a product of two
probabilities, one for mutagenicity and one for survival from toxicity;
hence P_D represents the competing risks of the two.  Moreover, a
chemical under study is mutagenic if and only if \beta > 0.  Thus one may
formulate the mutagenicity testing problem into a statistical hypothesis
test by setting up the hypothesis as

    H_0: \beta = 0   (<=> not mutagenic, or for brevity -)
    H_a: \beta > 0   (<=> mutagenic, or +)

with a significance level a.  This significance level a is by definition

    a = Pr(judged + | truly -),                                           (2.41)

the false positive probability, assumed constant for each compound in
each experiment.
     In each experiment a chemical is determined to be local-positive
iff [\hat\beta/SE(\hat\beta)] > c*, where \hat\beta and SE(\hat\beta) are
obtained by the ML methods based on 18 petri dish data under the
negative binomial model (2.38) and either (2.39) or (2.40).  Under H_0
we may claim that

    [\hat\beta/SE(\hat\beta)] ~ N(0,1).                                   (2.42)

Thus the critical value c* is determined by the given level of a.  For
a = 0.05 and compound i we may obtain x_{it}, the number of
local-positives among n_{it} experiments for each of the 12 combinations
of strain-activation sets.  For example, for chemical 1 (identification
number) we observe the following table:
    Strain       None     Hamster     Rat
    TA 98        0/1      2/3         3/3
    TA 100       0/2      0/2         0/2
    TA 1535      0/1      0/1         0/1
    TA 1537      0/2      1/3         2/3

    Table 2.1  The number of local-positives out of {n_{1t}}_{t=1}^{12}
               experiments for chemical 1,

where notationally y/n implies y local-positives out of n experiments.
Lastly, Margolin and his colleagues (personal communication) combine
data from different experiments and reach a single conclusion and
determine a chemical to be positive if and only if there is at least
one repeated local-positive in at least one strain-activation set.
Thus in Table 2.1 or in any other such table for another chemical if
any cell contains the number of local-positives
chemical is considered to be positive.
~
2, then that
In what follows we refer to the
summary data in Table 2.1 obtained through the statistical procedures
described above as the derived data.
2.3.3  Further Analysis of the Derived Data:  Mixture Model

2.3.3.1  k-Population Mixture of Two Binomials

The derived data in Table 2.1 admit further statistical analysis
that may be focused on the following three problems:

Problem (i):  How to perform an empirical check on the operating
properties of Margolin et al.'s statistical procedures?
Problem (ii):  What is the proportion of mutagens in the population of compounds tested?

Problem (iii):  What is the power of detecting a true positive
chemical in this procedure?

This last problem reflects both the sensitivity of the Ames test and
the sensitivity of the statistical analysis.
The derived data
can be arranged to yield a lower triangular two-way layout for each
strain-activation set by counting the numbers of chemicals that have
0, 1, 2, ..., n_i positive results, respectively, out of n_i experiments,
where n_i = 1, 2, ..., \max_{a,t}(n_{at}) for i = 1, 2, ..., k.
To develop a suitable statistical model for problems (i)-(iii),
we may note several characteristics of the experiments and the derived
data:

(i)  There are S compounds that have been tested from a
hypothetical set \Phi of compounds that have been or will be tested.
Note, this is not the universe of chemical compounds.
Scientific judgement enters the selection procedure, so that,
for example, H2O would not be tested nor included in \Phi.

(ii)  The tests adopted have a probability \tau of yielding
false positives, and a probability 1-p of false negatives.
The latter is somewhat of a simplifying assumption, similar
to Neyman's (1947) diagnostic simplification, so as to permit
some analytic progress.

(iii)  The set \Phi of compounds has a proportion \pi of positive
chemicals.
(iv)  For each strain-activation set, the chemicals can be grouped
into sets such that the i-th set of r_i chemicals has been tested
n_i times for positive or negative evidence of mutagenicity, where
n_i = 1, 2, ..., \max_{a,t}(n_{at}), i = 1,...,k.
The probabilities p and \tau can be described in the usual table:

                            Test Result
    State of Nature     positive    negative
    positive            p           1-p
    negative            \tau        1-\tau

    Table 2.2
For each strain-activation set the vector Y_i = (Y_{i1}, Y_{i2}, ..., Y_{ir_i})
of positive results of the i-th set of chemicals can be considered
as an observation from a mixture of two binomial distributions, i.e.,

    \Pr(Y_i = y_i) = \prod_{j=1}^{r_i} \left[ \pi \binom{n_i}{y_{ij}} p^{y_{ij}} (1-p)^{n_i - y_{ij}} + (1-\pi) \binom{n_i}{y_{ij}} \tau^{y_{ij}} (1-\tau)^{n_i - y_{ij}} \right]    (2.43)

for i = 1,...,k.
Using the simple notation, we denote (2.43) as

    \{Y_{ij}\} \sim \{b(y_{ij}; n_i, p)\} \wedge_p G_2(p)    (2.44)

for j = 1,...,r_i, i = 1,...,k,
where G_2 is a discrete distribution function with two atoms and
\theta = (\pi, p, \tau).
Now, (Y_1, Y_2, ..., Y_k) constitutes an independent, non-identically
distributed set of data with joint likelihood

    \prod_{i=1}^{k} h_i(y_i; \theta) = \prod_{i=1}^{k} \left\{ \prod_{j=1}^{r_i} \left[ b(y_{ij}; n_i, p) \wedge_p G_2(p) \right] \right\} .    (2.45)
Specifically, we have

    Y_{11}, ..., Y_{1r_1}  iid ~  h_1(\theta) = B(n_1, p) \wedge_p G_2(p)
    Y_{21}, ..., Y_{2r_2}  iid ~  h_2(\theta) = B(n_2, p) \wedge_p G_2(p)
      .
      .
      .
    Y_{k1}, ..., Y_{kr_k}  iid ~  h_k(\theta) = B(n_k, p) \wedge_p G_2(p) .    (2.46)

The reader will recognize this formulation to be a k-population mixture of two binomials.
Problems (i)-(iii) above are tied to estimation of the parameters \pi, p and \tau of the k-population mixture of two binomials model
(2.43) and to obtaining their sampling distributions.
In particular,
Problem (i) affords an empirical check of the a priori assumption on
the size of the false positive probability, which was set to be 0.05
based on the large sample behavior (2.42).
Problem (ii) is identified
with estimation of the prevalence rate \pi, and Problem (iii) may be
partially answered by studying the sampling distribution of the estimator \hat{p}.
The identifiability condition of the class of joint distributions
of (Y_1, ..., Y_k) is given by Theorem 2.1, which says the class is
identifiable if and only if

    \max_{1 \le i \le k} n_i \ge 3 .    (2.47)

In the derived data, \max_{1 \le i \le k} n_i \ge 4 for each of the 12 combinations of
strain-activation sets.
In order to find the MLE of \theta = (\pi, p, \tau) we use the iterative
equations (2.27) and (2.28).
We first define

    Z_{ij} = 1  if Y_{ij} is from b(y_{ij}; n_i, p)
           = 0  if Y_{ij} is from b(y_{ij}; n_i, \tau)    (2.48)

for j = 1,...,r_i, i = 1,...,k.
Then (2.27) and (2.28) admit the following EM implementation.
Let

    w_{ij} = w(y_{ij}; \theta) = E(Z_{ij} \mid Y_{ij}, \theta)
           = \frac{\pi\, b(y_{ij}; n_i, p)}{\pi\, b(y_{ij}; n_i, p) + (1-\pi)\, b(y_{ij}; n_i, \tau)}    (2.49)

and

    w_{ij}^{(\nu)} = w_{ij}(y_{ij}; \theta^{(\nu)}) ,    (2.50)

where the superscript (\nu) indicates that the estimated quantity is
evaluated at the \nu-th iteration step.
Then the parameter vector \theta = (\pi, p, \tau) is updated by

    p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i} w_{ij}^{(\nu)} y_{ij} \bigg/ \sum_{i=1}^{k}\sum_{j=1}^{r_i} n_i w_{ij}^{(\nu)}    (2.51)

    \tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i} (1 - w_{ij}^{(\nu)}) y_{ij} \bigg/ \sum_{i=1}^{k}\sum_{j=1}^{r_i} n_i (1 - w_{ij}^{(\nu)})    (2.52)

    \pi^{(\nu+1)} = \frac{1}{r_+} \sum_{i=1}^{k}\sum_{j=1}^{r_i} w_{ij}^{(\nu)} ,    (2.53)

where r_+ = \sum_{i=1}^{k} r_i.
The equations (2.51)-(2.53) can be simplified by noting that the i-th
set of r_i chemicals can assume values 0, 1, ..., n_i for the numbers of positives. Hence frequencies, say {f_{it}}, can be assigned to {Y_{ij} = t},
t = 0, 1, ..., n_i. For all those Y_{ij}'s, the w_{ij}'s are equal, say to w_{it}.
Hence we have the following simplified equations:

    p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i} f_{it}\, t\, w_{it}^{(\nu)} \bigg/ \sum_{i=1}^{k}\sum_{t=0}^{n_i} f_{it}\, n_i\, w_{it}^{(\nu)}    (2.54)

    \tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i} f_{it}\, t\, (1 - w_{it}^{(\nu)}) \bigg/ \sum_{i=1}^{k}\sum_{t=0}^{n_i} f_{it}\, n_i\, (1 - w_{it}^{(\nu)})    (2.55)

    \pi^{(\nu+1)} = \frac{1}{r_+} \sum_{i=1}^{k}\sum_{t=0}^{n_i} f_{it}\, w_{it}^{(\nu)} .    (2.56)

Thus cycling back and forth between equation (2.49) and
equations (2.54)-(2.56) defines the EM algorithm for the k-population
mixture of two binomials model.
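A minimal computational sketch of these EM iterations, assuming scipy is available, is given below; the starting values supplied through theta0 are arbitrary and are not the ones used in the analysis reported later.

    import numpy as np
    from scipy.stats import binom

    def em_mixture_two_binomials(y, n, theta0=(0.2, 0.8, 0.05), max_iter=500, tol=1e-8):
        """EM iterations (2.49)-(2.53) for the k-population mixture of two binomials.
        y : list of arrays; y[i] holds the r_i observed counts for population i
        n : sequence of the k numbers of experiments n_i
        Returns the estimate (pi, p, tau)."""
        pi, p, tau = theta0
        for _ in range(max_iter):
            num_p = den_p = num_t = den_t = w_sum = r_sum = 0.0
            for yi, ni in zip(y, n):
                yi = np.asarray(yi, dtype=float)
                a = pi * binom.pmf(yi, ni, p)            # pi * b(y_ij; n_i, p)
                b = (1 - pi) * binom.pmf(yi, ni, tau)    # (1-pi) * b(y_ij; n_i, tau)
                w = a / (a + b)                          # E-step, equation (2.49)
                num_p += np.sum(w * yi);       den_p += ni * np.sum(w)
                num_t += np.sum((1 - w) * yi); den_t += ni * np.sum(1 - w)
                w_sum += np.sum(w);            r_sum += len(yi)
            # M-step, equations (2.51)-(2.53)
            pi_new, p_new, tau_new = w_sum / r_sum, num_p / den_p, num_t / den_t
            if max(abs(pi_new - pi), abs(p_new - p), abs(tau_new - tau)) < tol:
                return pi_new, p_new, tau_new
            pi, p, tau = pi_new, p_new, tau_new
        return pi, p, tau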
The observed information matrix for our model can be obtained
following Louis' EM implementation (Louis, 1982).
Using EM terminology,
the complete data can be defined by specifying the component distribution from which each observation is drawn.
Let

    Z_i = (Z_{i1}, Z_{i2}, ..., Z_{ir_i}) ,    (2.57)

where Z_{ij} is defined in (2.48).
Then the complete data X = (X_1, X_2, ..., X_k) is defined by

    X_i = (Z_i, Y_i) ,    (2.58)

where X_i = (X_{i1}, ..., X_{ir_i}), i = 1,...,k.
The likelihood of the complete data X suggests a two-stage
experiment, where first a component is picked by a Bernoulli experiment, and then a binomial variate is generated.
Therefore the log-likelihood h^*_{X_{ij}}(\cdot) for X_{ij} is given by
    h^*_{X_{ij}}(x_{ij}; \theta) = z_{ij}\left[\log \pi + \log b(y_{ij}; n_i, p)\right] + (1 - z_{ij})\left[\log(1-\pi) + \log b(y_{ij}; n_i, \tau)\right]    (2.59)

for j = 1,...,r_i, i = 1,...,k.
Let S(x_{ij}; \theta), S(X_i; \theta), and S(X; \theta) be the gradient vectors of
h^*_{X_{ij}}(x_{ij}; \theta), \sum_{j=1}^{r_i} h^*_{X_{ij}}(x_{ij}; \theta), and \sum_{i=1}^{k}\sum_{j=1}^{r_i} h^*_{X_{ij}}(x_{ij}; \theta), respectively.
Let B(x_{ij}; \theta), B(X_i; \theta) and B(X; \theta) be the negatives of the associated
second derivative matrices.
Since X_1, X_2, ..., X_k are independent,

    S(X; \theta) = \sum_{i=1}^{k} S(X_i; \theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i} S(x_{ij}; \theta)

and

    B(X; \theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i} B(x_{ij}; \theta) .
Thus we obtain, with the parameters ordered as (p, \tau, \pi),

    S(x_{ij}; \theta) = \left( z_{ij}\,\frac{y_{ij} - n_i p}{p(1-p)},\;\; (1 - z_{ij})\,\frac{y_{ij} - n_i \tau}{\tau(1-\tau)},\;\; \frac{z_{ij} - \pi}{\pi(1-\pi)} \right)'    (2.60)
and

    B(x_{ij}; \theta) = \mathrm{diag}\!\left( z_{ij}\,\frac{n_i p^2 - y_{ij}(2p-1)}{(p - p^2)^2},\;\; (1 - z_{ij})\,\frac{n_i \tau^2 - y_{ij}(2\tau-1)}{(\tau - \tau^2)^2},\;\; \frac{z_{ij} - 2\pi z_{ij} + \pi^2}{\pi^2(1-\pi)^2} \right) .    (2.61)
Hence the conditional complete data observed information matrix
becomes

    I_X = \sum_{i=1}^{k}\sum_{j=1}^{r_i} \mathrm{diag}\!\left( \hat{w}_{ij}\,\frac{n_i \hat{p}^2 - y_{ij}(2\hat{p}-1)}{(\hat{p} - \hat{p}^2)^2},\;\; (1 - \hat{w}_{ij})\,\frac{n_i \hat{\tau}^2 - y_{ij}(2\hat{\tau}-1)}{(\hat{\tau} - \hat{\tau}^2)^2},\;\; \frac{\hat{w}_{ij} - 2\hat{\pi}\hat{w}_{ij} + \hat{\pi}^2}{\hat{\pi}^2(1-\hat{\pi})^2} \right) .    (2.62)
Following Louis' development, the lost information due to the
unobservable Z, denoted by I_{X|Y}, is obtained as

    I_{X|Y} = E_{\hat{\theta}}\left[ S(X; \hat{\theta}) S'(X; \hat{\theta}) \mid Y \right] - S^*(Y; \hat{\theta}) S^{*\prime}(Y; \hat{\theta}) ,    (2.63)

where S'(X; \theta) is the transpose of S(X; \theta) and S^*(Y; \theta) is the gradient of the
incomplete-data log-likelihood.
Also, by the definition of the MLE \hat{\theta}, which maximizes the incomplete
data likelihood \prod_{i=1}^{k}\prod_{j=1}^{r_i} h_i(y_{ij}; \theta), we have

    \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij}; \hat{\theta}) = 0 .    (2.64)

Hence

    \left[ \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij}; \hat{\theta}) \right]\left[ \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij}; \hat{\theta}) \right]' = 0 .    (2.65)

Thus by using equations (2.63) and (2.65), and the independence of
X_1, X_2, ..., X_k,

    I_{X|Y} = \sum_{i=1}^{k}\sum_{j=1}^{r_i} E_{\hat{\theta}}\left[ S(x_{ij}; \hat{\theta}) S'(x_{ij}; \hat{\theta}) \mid y_{ij} \right] - \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij}; \hat{\theta}) S^{*\prime}(y_{ij}; \hat{\theta}) .^4    (2.66)

After simple algebra it can be shown that the two terms in (2.66)
become

4 Louis (1982) has a different expression for I_{X|Y}, which is algebraically equivalent to (2.66). However, we found (2.66) simpler to program.
    E_{\hat{\theta}}\left[ S(x_{ij}; \hat{\theta}) S'(x_{ij}; \hat{\theta}) \mid y_{ij} \right] ,    (2.67)

a symmetric 3 x 3 matrix whose diagonal entries are

    \hat{w}_{ij}\left[\frac{y_{ij} - n_i \hat{p}}{\hat{p}(1-\hat{p})}\right]^2 ,\qquad (1 - \hat{w}_{ij})\left[\frac{y_{ij} - n_i \hat{\tau}}{\hat{\tau}(1-\hat{\tau})}\right]^2 ,\qquad \frac{\hat{w}_{ij} - 2\hat{\pi}\hat{w}_{ij} + \hat{\pi}^2}{[\hat{\pi}(1-\hat{\pi})]^2} ,

and whose (1,2) entry is zero, and

    S^*(y_{ij}; \hat{\theta}) S^{*\prime}(y_{ij}; \hat{\theta}) ,    (2.68)

the outer product of the incomplete-data score
S^*(y_{ij}; \hat{\theta}) = \left( \hat{w}_{ij}\frac{y_{ij} - n_i \hat{p}}{\hat{p}(1-\hat{p})},\; (1 - \hat{w}_{ij})\frac{y_{ij} - n_i \hat{\tau}}{\hat{\tau}(1-\hat{\tau})},\; \frac{\hat{w}_{ij} - \hat{\pi}}{\hat{\pi}(1-\hat{\pi})} \right)'.
In equation (2.67) the entries in the (1,3) and (2,3) positions are
equal to zero because of the EM equations (2.54) and (2.55), respectively.
Finally the observed information matrix I_Y is obtained as

    I_Y = I_X - I_{X|Y} ,    (2.69)

where I_X is obtained in (2.62), and I_{X|Y} is obtained by (2.66)-(2.68).
2.3.3.2  Results of the Analysis

Two sets of derived data were obtained for further statistical
analysis.
The first set of output data was based on the statistical
procedures of Margolin, Kaplan and Zeiger (1981) using the usual
significance level \alpha = 0.05. This is referred to below as the stat-call.
For the second set of derived data a senior toxicologist's^5 subjective
judgment based on his past experience yielded the decisions of whether
a compound being tested was local-positive or local-negative.
Hence
there is no formal statistical procedure in the generation of the
second set of derived data.
The second set of derived data is called
the Zeiger-call.

The stat-call and Zeiger-call data are presented in Table 2.3 in
lower triangular arrays for each strain-activation set.
Tables 2.4.A
and 2.4.B display the corresponding MLE's and the inverses of the
observed information matrices I_Y obtained by the EM algorithm^6
procedure described in 2.3.3.1.
The total number of compounds tested in each strain-activation set
varies slightly (at most by 1) because some compounds were not tested in
certain strain-activation sets.
In Table 2.4.B some MLE's were obtained
at the boundary of the parameter space, i.e., \hat{p} = 1 for TA 100-N and for
TA 1537-R.
This yielded singular information matrices (see (2.62) and
(2.66)-(2.69) for the singularity of an information matrix at \hat{p} = 1).
Since these estimated values do not belong to the interior of the parameter space,
they must be interpreted without benefit of a corresponding
standard error.

For the overall probability of a false positive for a compound
when all 12 strain-activations are employed, we note the following
immediate but useful results.

5 Dr. Errol Zeiger of the National Institute of Environmental Health
Sciences, who actually supervised all the biological experiments that
yielded the experimental data.

6 Several sets of initial values were tried with the EM algorithm. It
turned out that the EM algorithm leads to fairly stable stationary values
of the estimators with respect to the various sets of initial values.
Lemma 2.8:  Let \tau_{ij} be the probability of a false local-positive for
a compound in the i-th strain and j-th activation set, for i = 1,...,4,
j = 1,2,3, and let \tau_{over} be the overall probability of a false positive
for a compound. Then, under the independence of the 12 combinations
of strain-activations,

    \tau_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \tau_{ij}^2) .    (2.70)

Proof:

    1 - \tau_{over} = \Pr(\text{judged negative} \mid \text{truly negative})
                    = \Pr(\text{no repeated local-positives in any of the 12 combinations of strain-activation sets} \mid \text{truly negative})
                    = \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \tau_{ij}^2) .
Thus by Lemma 2.8 the MLE \hat{\tau}_{over} of \tau_{over} becomes

    \hat{\tau}_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \hat{\tau}_{ij}^2) ,    (2.71)

where \hat{\tau}_{ij}, by Theorem 2.2, has an approximate normal distribution
with mean \tau_{ij} and variance equal to the (3,3) entry of I_Y(\hat{\theta})^{-1}. The
distribution of \hat{\tau}_{over} can be obtained under the independence of the
\hat{\tau}_{ij}'s when the compound is not mutagenic.
Lemma 2.9:  Let {\tau_{ij}}, j = 1,2,3, i = 1,...,4, be reindexed as {\tau_t}, t = 1,...,12, and assume
\hat{\tau}_1, ..., \hat{\tau}_{12} are independent.
Then

    \sqrt{r_+}\,(\hat{\tau}_{over} - \tau_{over}) \xrightarrow{\;d\;} N\!\left(0,\; r_+ \sum_{t=1}^{12}\Big[ 2\tau_t \prod_{s \ne t}(1 - \tau_s^2) \Big]^2 \sigma_t^2 \right) ,    (2.72)

where \tau_{over} is defined in (2.70) and \sigma_t^2 = Var(\hat{\tau}_t).

Proof.  Let \tau = (\tau_1, ..., \tau_{12}) and let the MLE \hat{\tau} be defined accordingly.
Then as a result of Theorem 2.2 and the independence assumption

    \sqrt{r_+}\,(\hat{\tau} - \tau) \xrightarrow{\;d\;} N\!\left(0,\; r_+\,\mathrm{diag}(\sigma_1^2, ..., \sigma_{12}^2)\right) ,    (2.73)

where diag(\sigma_1^2, ..., \sigma_{12}^2) is a diagonal matrix with diagonal entries
\sigma_1^2, ..., \sigma_{12}^2. Then by using the multivariate \delta-method (Bishop,
Fienberg and Holland, 1975, Section 14.6.3) the result follows.
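The delta-method computation of Lemma 2.9 is easy to carry out numerically. The sketch below is an illustration only; the inputs would be the twelve \hat{\tau} values and the corresponding (3,3) entries of I_Y(\hat{\theta})^{-1} from Tables 2.4.A or 2.4.B.

    import numpy as np

    def tau_over_and_sd(tau_hat, var_tau_hat):
        """tau_over = 1 - prod(1 - tau_t^2), equations (2.70)-(2.71), with the
        delta-method standard deviation of Lemma 2.9, assuming the 12 tau_t
        estimates are independent."""
        tau_hat = np.asarray(tau_hat, dtype=float)
        var_tau_hat = np.asarray(var_tau_hat, dtype=float)
        one_minus = 1.0 - tau_hat ** 2
        tau_over = 1.0 - np.prod(one_minus)
        # gradient of tau_over with respect to tau_t: 2*tau_t * prod_{s != t}(1 - tau_s^2)
        grad = 2.0 * tau_hat * np.prod(one_minus) / one_minus
        sd = np.sqrt(np.sum(grad ** 2 * var_tau_hat))
        return tau_over, sd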
Calculations of \hat{\tau}_{over} and SD(\hat{\tau}_{over}) for the stat-call and
Zeiger-call data are given in the table below:

                   \hat{\tau}_{over}    SD(\hat{\tau}_{over})
    Stat-Call      0.0560              0.0010
    Zeiger-Call    0.0013              0.0004

    Table 2.5:  \hat{\tau}_{over} and its standard deviation for the
                Stat-Call and Zeiger-Call data
2.3.3.3  Discussion of the Results

Margolin et al.'s statistical procedures described in subsection
2.3.2 assumed that for each set of 18 petri dishes \hat{\beta}/SE(\hat{\beta}) was
distributed as N(0,1), based on large sample theory.
Based on
this normal assumption the cut-off value was determined for the given
level of significance \alpha = 0.05 for each experiment, to test the
local-positiveness of the compound.
By noting that the significance level \alpha is equivalent to the
probability of a false positive in each experiment (see (2.41)), the
operating property of Margolin et al.'s statistical procedures can be
checked against the sampling distributions of \hat{\tau}_{ij} for i = 1,...,4,
j = 1,2,3. From Table 2.4.A we extract the entries corresponding to
\hat{\tau}_{ij} and Var(\hat{\tau}_{ij}) and present them in the table below:
    Strain-Activation    \hat{\tau}_{ij}    SD(\hat{\tau}_{ij})    (\hat{\tau}_{ij} - 0.05)/SD(\hat{\tau}_{ij})
    TA 98-N              0.0441             0.0290                 -0.203
    TA 98-H              0.0493             0.0235                 -0.030
    TA 98-R              0.0659             0.0187                  0.850
    TA 100-N             0.0536             0.0289                  0.125
    TA 100-H             0.0969             0.0231                  2.030
    TA 100-R             0.1026             0.0293                  1.795
    TA 1535-N*           0.0788             0.0107                  2.692
    TA 1535-H            0.0762             0.0170                  1.541
    TA 1535-R            0.0607             0.0154                  0.695
    TA 1537-N            0.0584             0.0207                  0.406
    TA 1537-H            0.0533             0.0143                  0.231
    TA 1537-R            0.0659             0.0111                  1.432

    Table 2.6:  \hat{\tau} and SD(\hat{\tau}) for each combination of
                strain-activation set.
In Table 2.6 we see that among the 12 \hat{\tau}_{ij}'s, 10 have 0.95 confidence
intervals containing the value 0.05.
Thus we may conclude that the
stat-call data provide evidence that N(0,1) is a good approximation
of the tail distribution of \hat{\beta}/SE(\hat{\beta}).

Using biochemical techniques, TA 98 and TA 100 were engineered
from TA 1535 and TA 1537, respectively, to have greater sensitivity
to mutagens.
This fact is reflected in the dominance of the \hat{\pi}'s in
TA 98 over the \hat{\pi}'s in TA 1535, uniformly with respect to the activation
sets.
The same is true for TA 100 and TA 1537.

Investigation of the Zeiger-call data indicates that his
decision making is quite conservative relative to the conventional
range of statistical significance levels commonly employed in
scientific research.
Table 2.3:  The Number of j Positive Results in i Experiments in
Each Strain-Activation Set; Stat-Call and Zeiger-Call.
(Lower-triangular frequency arrays of the counts {f_{it}}, together with the
row totals r_i, are given separately for the Stat-Call and the Zeiger-Call
for each of the twelve strain-activation sets: TA 98, TA 100, TA 1535 and
TA 1537, each with no activation (N), hamster activation (H) and rat
activation (R).)
Table 2.4.A:  The ML Estimator \hat{\theta} = (\hat{\pi}, \hat{p}, \hat{\tau}) and the Covariance
Matrix I_Y(\hat{\theta})^{-1} via the EM Algorithm in Each Strain-Activation Set,
Based on the Stat-Call.  For each set the three estimates are followed by the
six distinct entries of the symmetric matrix I_Y(\hat{\theta})^{-1}, in the order
Var(\hat{\pi}), Cov(\hat{\pi},\hat{p}), Cov(\hat{\pi},\hat{\tau}), Var(\hat{p}), Cov(\hat{p},\hat{\tau}), Var(\hat{\tau}).

    Set          \hat{\pi}   \hat{p}    \hat{\tau}   Var(\hat{\pi})  Cov(\hat{\pi},\hat{p})  Cov(\hat{\pi},\hat{\tau})  Var(\hat{p})  Cov(\hat{p},\hat{\tau})  Var(\hat{\tau})
    TA 98-N      0.1595  0.7870  0.0441   0.00423   -0.00972   -0.00181   0.02415   0.00430    0.00084
    TA 98-H      0.2282  0.7973  0.0493   0.00233   -0.00336   -0.00103   0.00594   0.00162    0.00055
    TA 98-R      0.1837  0.8550  0.0659   0.00143   -0.00247   -0.00061   0.00548   0.00120    0.00035
    TA 100-N     0.3489  0.6887  0.0536   0.00368   -0.00310   -0.00159   0.00331   0.00141    0.00083
    TA 100-H     0.3273  0.8541  0.0969   0.00165   -0.00150   -0.00077   0.00193   0.00083    0.00053
    TA 100-R     0.3428  0.8070  0.1026   0.00291   -0.00254   -0.00140   0.00282   0.00133    0.00086
    TA 1535-N    0.0827  0.9376  0.0788   0.00032   -0.00100   -0.00011   0.00538   0.00049    0.00011
    TA 1535-H    0.1471  0.8953  0.0762   0.00103   -0.00228   -0.00044   0.00657   0.00115    0.00029
    TA 1535-R    0.1751  0.7846  0.0607   0.00115   -0.00187   -0.00042   0.00444   0.00078    0.00024
    TA 1537-N    0.1034  0.7027  0.0584   0.00303   -0.00890   -0.00105   0.02942   0.00313    0.00043
    TA 1537-H    0.1034  0.8472  0.0533   0.00093   -0.00289   -0.00036   0.01162   0.00127    0.00021
    TA 1537-R    0.0855  0.9382  0.0659   0.00038   -0.00129   -0.00014   0.00673   0.00064    0.00012
Table 2.4.B:  The ML Estimator \hat{\theta} = (\hat{\pi}, \hat{p}, \hat{\tau}) and the Covariance
Matrix I_Y(\hat{\theta})^{-1} via the EM Algorithm in Each Strain-Activation Set,
Based on the Zeiger-Call.  Entries are arranged as in Table 2.4.A.

    Set           \hat{\pi}   \hat{p}    \hat{\tau}   Var(\hat{\pi})  Cov(\hat{\pi},\hat{p})  Cov(\hat{\pi},\hat{\tau})  Var(\hat{p})  Cov(\hat{p},\hat{\tau})  Var(\hat{\tau})
    TA 98-N       0.0962  0.8274  0.0073   0.00047   -0.00170   -0.00016   0.00917   0.00074    0.00008
    TA 98-H       0.1321  0.9365  0.0101   0.00018   -0.00010   -0.00001   0.00063   0.00005    0.00002
    TA 98-R       0.1390  0.8484  0.0010   0.00022   -0.00024   -0.00003   0.00145   0.00009    0.00001
    TA 100-N^7    0.1150  1.0000  0.0348   (covariance matrix not available; see footnote 7)
    TA 100-H      0.2139  0.9338  0.0174   0.00026   -0.00009   -0.00002   0.00038   0.00005    0.00003
    TA 100-R      0.2018  0.9153  0.0123   0.00026   -0.00010   -0.00002   0.00048   0.00005    0.00002
    TA 1535-N     0.0721  0.8518  0.0071   0.00013   -0.00026   -0.00002   0.00272   0.00010    0.00001
    TA 1535-H     0.0905  0.8922  0.0203   0.00019   -0.00046   -0.00004   0.00339   0.00023    0.00004
    TA 1535-R     0.1254  0.7748  0.0014   0.00031   -0.00061   -0.00006   0.00316   0.00022    0.00003
    TA 1537-N     0.0418  0.8622  0.0048   0.00010   -0.00051   -0.00002   0.00728   0.00021    0.00001
    TA 1537-H     0.0554  0.9637  0.0136   0.00009   -0.00014   -0.00001   0.00151   0.00007    0.00001
    TA 1537-R^8   0.0534  1.0000  0.0122   (covariance matrix not available; see footnote 8)

7 When the ML estimator is obtained at the boundary of the parameter
space, the usual asymptotic results fail to hold (see Chernoff, 1954,
and Feder, 1968, 1978).

8 The ML estimator at the boundary of the parameter space does not
have the usual asymptotic properties; see footnote 7.
CHAPTER III
MIXTURE OF THE MULTINOMIAL DISTRIBUTIONS
In section 1 of this chapter discussion focuses on the goodness
of fit test for a binomial distribution against alternative parametric
mixtures of binomials.
We show that the locally optimal test
for detecting extra-binomial variability from a mixture alternative
can exist in the one-way layout.
In section 2, where the beta-binomial
discussed in section 1 is generalized, we develop a random
effects model for a one-way layout contingency table by employing a
Dirichlet mixture of multinomial distributions.
We then discuss
three tests of whether the random effects are negligible.
3.1
Binomial Case
In the binomial distribution of James Bernoulli the n Bernoulli
trials are assumed to be independent and the success probability is
constant throughout these n trials.
It has been frequently observed
in practice, for example in biological experiments, that either one
or both assumptions are not satisfied.
Sometimes these observations
can be explained or understood by investigating the underlying
mechanism.
For instance, for the number of cavities among children
the 'success' probability may differ from child to child due to
differences in nutrition and other factors; hence, independence among
those cavities in a child may not be maintained if the probability is
viewed as a random variable itself.
For other examples of deviations
from the binomial conditions see Chatfield and Goodhardt (1970),
Griffiths (1973), Haseman and Soares (1976) and Haseman and Kupper
(1978).
The inconstancy of the success probability caused heterogeneity
of Student's yeast cell count data (Student, 1907), which led K.
Pearson (1915) to consider a mixture of two binomials.
A mixture of
two binomials has three parameters, which may not be sufficiently
parsimonious in a small data set as an alternative to the one-parameter binomial distribution.
In what follows we focus on a well-known
two-parameter generalization of the binomial, the beta-binomial
distribution, which can be described as follows:
Let X have a binomial distribution B(n, U), 0 < U < 1, and let U itself
be a random variable that has a beta distribution Beta(\alpha, \beta) with parameters \alpha > 0 and \beta > 0, i.e.,

    g(u) = u^{\alpha - 1}(1-u)^{\beta - 1} / Be(\alpha, \beta) ,\qquad 0 < u < 1,\ \alpha > 0,\ \beta > 0 ,    (3.1)

where

    Be(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx ,\qquad a > 0,\ b > 0 .    (3.2)
Since the mean and variance of U are

    E(U) = \alpha/(\alpha + \beta) = p \quad (q = 1 - p)
    Var(U) = pq/(\alpha + \beta + 1) ,    (3.3)

respectively, it is useful to reparametrize \alpha and \beta into

    p = \alpha/(\alpha + \beta)
    \theta = 1/(\alpha + \beta) .    (3.4)

Then \theta = 0 implies that Var(U) = 0, which reduces a beta-binomial
to an ordinary binomial distribution.
With the new parametrization
(3.4) it can be shown that the marginal distribution of X can be
represented by

    h(x) = \binom{n}{x} Be\!\left(\frac{p}{\theta} + x,\ \frac{q}{\theta} + n - x\right) \Big/ Be\!\left(\frac{p}{\theta},\ \frac{q}{\theta}\right)
         = \binom{n}{x} \prod_{t=0}^{x-1}(p + t\theta)\ \prod_{t=0}^{n-x-1}(q + t\theta)\ \Big/ \prod_{t=0}^{n-1}(1 + t\theta) .    (3.5)

Using previous notation, (3.5) can be expressed as

    X \sim B(n, U) \wedge_U Beta(p, \theta) .    (3.6)
The beta-binomial distribution, in the form of the beta mixture
of binomials in (3.6), appears to have been first introduced by
Skellam (1948).
It may be noted, however, that the probability mass
function (3.5) of the beta-binomial distribution has been recognized
since 1923 as the Polya-Eggenberger urn model with stochastic replacements, which includes the binomial and hypergeometric as special cases.
We refer to Johnson and Kotz (1977, Ch. 4; 1969) for a detailed
description of the Polya-Eggenberger distribution, and for the various
other nomenclatures of the beta-binomial distribution.

Since Skellam (1948) introduced the beta-binomial distribution,
many authors have employed it in the analysis of biological data, most
noteworthy among them being:
Kemp and Kemp (1956), Williams (1975),
Crowder (1978), Feder (1978) and Segreti and Munson (1981).

We may illustrate the difference between the ordinary binomial,
the beta-binomial and other proposed binomial generalizations using
the following example.

In certain toxicological experiments with animals the outcome of
interest is the occurrence of a dead fetus in a litter that receives
a certain treatment.
A typical example would be the dominant lethal
In certain toxicological experiments with animals the outcome of
interest is the occurrence of a dead fetus in a litter that receives
a certain treatment.
A typical example would be the dominant lethal
test (Haseman and Soares, 1976) to determine the mutagenicity of a
compound.
Let

    Y_{ijk} = 1  if the k-th fetus in the j-th litter receiving
                 the i-th treatment is dead,
            = 0  otherwise,    (3.7)

and

    X_{ij} = \sum_{k=1}^{n_{ij}} Y_{ijk}    (3.8)

for k = 1,...,n_{ij}, j = 1,...,\ell_i, i = 1,...,I.
Then by assuming that the litter-specific probability of fetal death
has a beta distribution we have

    X_{ij} \sim B(n_{ij}, U_i) \wedge_{U_i} Beta(p_i, \theta_i) .    (3.9)
Now the beta-binomial model in (3.9) is partially analogous to the
random effects model under normal theory

    Y^*_{ijk} = \mu + \eta_{ij} + \epsilon_{ijk} ,    (3.10)

where Y^*_{ijk} denotes the observed response, the \eta_{ij}'s are random
effects that are independent N(0, \sigma_1^2), and the \epsilon_{ijk}'s are independent
N(0, \sigma_2^2) errors of observation, independent of the \eta_{ij}'s.
In the random effects model (3.10), Y_{ijk} and Y_{ijk'} for k \ne k' are
conditionally independent given \eta_{ij}, but Y_{ijk} and Y_{ijk'} are unconditionally dependent with correlation \sigma_1^2/(\sigma_1^2 + \sigma_2^2). Hence with the beta-binomial model one introduces intra-litter correlation, but it is
always positive.
The correlated binomial model developed by Kupper and Haseman
(1978) is more flexible than the beta-binomial in the sense that it
can allow negative intra-litter correlation.
Using Bahadur's technique (Bahadur, 1961) they 'corrected' the ordinary binomial to
incorporate the intra-litter dependence and arrived at a probability
mass function (p.m.f.)

    h_c(x_{ij}) = \binom{n_{ij}}{x_{ij}} p_i^{x_{ij}} q_i^{n_{ij} - x_{ij}} \left\{ 1 + \frac{\tau_i}{2 p_i^2 q_i^2}\left[ (x_{ij} - n_{ij} p_i)^2 + x_{ij}(2p_i - 1) - n_{ij} p_i^2 \right] \right\} ,    (3.11)

where

    \tau_i = Cov(Y_{ijk}, Y_{ijk'})  for k \ne k' ,  and  q_i = 1 - p_i .

The possible range of \rho_i = \tau_i/(p_i q_i) is calculated in Kupper and
Haseman (1978) for several choices of p_i and n_{ij}.
In the same vein Altham (1978) obtained a multiplicative
generalization of the binomial using the multiplicative definition of
interaction for count data.^1
Even though the multiplicative generalization has the remarkable property that it belongs to the two-parameter
exponential family, it may not be as easy to work with as the other
two-parameter generalizations.

For a goodness of fit test of a binomial against a mixture
alternative we may classify the data types into three categories
under H_0:

1 Altham (1978) actually obtained two generalizations of the binomial.
However, the additive generalization is equivalent to the correlated
binomial discussed above.
(A)  A random sample:  X_1, ..., X_k iid ~ B(n, p).

(B)  Non-iid case, no replication:  X_1, ..., X_k such that X_i ind ~ B(n_i, p)
for i = 1,...,k, and all n_i distinct.

(C)  One-way layout:  X_{i1}, ..., X_{ir_i} iid ~ B(n_i, p)
for all i = 1,...,k.
For case (A), Wisniewski (1968) showed that the test based on the
classical chi-square statistic

    S_A = \sum_{i=1}^{k} (X_i - n\hat{p})^2 / (n\hat{p}\hat{q}) ,

where \hat{p} = \sum_{i=1}^{k} X_i / (nk) and \hat{q} = 1 - \hat{p},
is the locally most powerful (LMP) test having Neyman structure against
a wide class of mixture alternatives.
It can be further shown that
the test based on S_A is locally most powerful unbiased (LMPU) against
the same class of mixture alternatives.
(See Appendix I for the
proof.)
Potthoff and Whittinghill (1966,a) derived the LMP test against
the beta-binomial alternative in case (B) when p is known; their test
rejects the null hypothesis for large values of

    S_B = \frac{1}{p}\sum_{i=1}^{k} X_i(X_i - 1) + \frac{1}{q}\sum_{i=1}^{k}(n_i - X_i)(n_i - X_i - 1) .    (3.12)

When p is unknown in case (B), Wisniewski (1968) proposed a test based
on

    S_B^* = \sum_{i=1}^{k} (X_i - n_i\hat{p})^2 / (\hat{p}\hat{q}) ,    (3.13)

where \hat{p} = \sum_{i=1}^{k} X_i / \sum_{i=1}^{k} n_i and \hat{q} = 1 - \hat{p}.
Recently Tarone (1979) showed that the test based on S_B^* is the corresponding C(\alpha) test against the class of general mixture alternatives,
and hence it is asymptotically locally most powerful.
As will be shown below, the general mixture alternative suggested
by Wisniewski is broad enough to include both the beta-binomial and
the correlated binomial when the correlation is positive.
Hence further discussion of the detection of extra-binomial variability from
mixture alternatives can be focused on the general mixture alternative
suggested by Wisniewski.

Definition 3.1:  Suppose a random variable X has a mixture distribution of the following form:

    \Pr(X = x) = \int_0^1 \binom{n}{x} u^x (1-u)^{n-x} g(u)\,du ,    (3.14)

where U is a random variable having a p.d.f. g(\cdot)
with mean p and finite variance \sigma^2.  Then the mixture distribution
(3.14) is called the Wisniewski-type general mixture.

Even though Kupper and Haseman (1978) derived the correlated
binomial in an attempt to 'correct' the ordinary binomial, it is useful to derive the p.m.f. (3.11) of the correlated binomial, when the intra-litter correlation is positive,
from the Wisniewski-type general mixture of binomials. It is obvious
that the beta-binomial distribution belongs to the class of
Wisniewski-type general mixtures.

Lemma 3.1:  Up to the order of \sigma^2, the p.m.f. of the Wisniewski-type
general mixture corresponding to (3.14) is equivalent to the correlated binomial distribution when the correlation is positive.
Proof.
If we make the change of variable U = p(1 + cV),
where c = \sigma/p, (3.14) becomes

    \Pr(X = x) = \binom{n}{x} p^x q^{n-x} \int (1 + cv)^x (1 - cpv/q)^{n-x} g^*(v)\,dv ,    (3.15)

where g^*(\cdot) is the p.d.f. of the standardized random variable V, and
q = 1 - p.
Let h(x) denote the marginal p.m.f. of X.  Then

    h(x) = \binom{n}{x} p^x q^{n-x} \left\{ 1 + \frac{\sigma^2}{2p^2 q^2}\left[ q^2 x(x-1) + p^2(n-x)(n-x-1) - 2pq\,x(n-x) \right] + O(\sigma^3) \right\}
         = \binom{n}{x} p^x q^{n-x} \left\{ 1 + \frac{\sigma^2}{2p^2 q^2}\left[ (x - np)^2 + x(2p-1) - np^2 \right] + O(\sigma^3) \right\} .    (3.16)

Thus after deleting the O(\sigma^3) terms, (3.16) is equivalent to the correlated binomial when the correlation is positive.
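Lemma 3.1 is easy to check numerically: for a beta mixing distribution with small variance, the exact beta-binomial p.m.f. and the correlated-binomial form (3.11)/(3.16) nearly coincide. The sketch below assumes the beta_binomial_pmf function given after (3.6); the numerical values of n, p and \theta are hypothetical.

    from math import comb

    def correlated_binomial_pmf(x, n, p, sigma2):
        # p.m.f. (3.11)/(3.16), valid to order sigma^2 when the correlation is positive
        q = 1.0 - p
        base = comb(n, x) * p ** x * q ** (n - x)
        corr = sigma2 / (2 * p * p * q * q) * ((x - n * p) ** 2 + x * (2 * p - 1) - n * p * p)
        return base * (1.0 + corr)

    n, p, theta = 10, 0.3, 0.005                  # hypothetical values
    sigma2 = p * (1 - p) * theta / (1 + theta)    # Var(U) for Beta(p, theta), from (3.3)-(3.4)
    max_diff = max(abs(beta_binomial_pmf(x, n, p, theta) - correlated_binomial_pmf(x, n, p, sigma2))
                   for x in range(n + 1))
    # by Lemma 3.1 the two p.m.f.'s agree up to terms of order sigma^3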
Recently, using the C(\alpha) procedure, Tarone (1979) was led to the
same test statistic S_B^* in (3.13) against the correlated binomial,
beta-binomial and Wisniewski-type general mixture alternatives.
Tarone's result of having the same C(\alpha) test statistic S_B^* against those
three alternatives is seen to be an immediate consequence of Lemma 3.1.

In detecting extra-binomial variability from mixture alternatives
the locally optimal test has been derived only for the iid case.
This
is not, however, the situation for the closely related Poisson case.
Collings and Margolin (1983) derive an LMPU test that detects negative
binomial departure from the Poisson in the one-way layout case by
extending Potthoff and Whittinghill's result in the iid case (Potthoff and Whittinghill, 1966,b).
In the following lemma we provide a
necessary and sufficient condition on the mixing distribution for the
existence of an LMPU test for extra-binomial variability from a mixture departure in a one-way layout.
Lemma 3.2:  Suppose the mean success probabilities are unknown.  Let
the null hypothesis H_0 and the alternative hypothesis H_a be represented
as follows:

    H_0 :  X_{ij} \sim B(n_i, p_i)
    H_a :  X_{ij} \sim \int_0^1 B(n_i, u_i)\, g_i(u_i)\,du_i ,\qquad j = 1,...,r_i,\ i = 1,...,k ,    (3.17)

where U_1, U_2, ..., U_k are independent random variables and U_i has a
p.d.f. g_i(\cdot) with mean p_i and finite variance \xi\sigma_i^2 for i = 1,...,k.
Then the LMPU test of H_0 against H_a exists if and only if \sigma_i^2 is a
constant multiple of p_i^2 q_i^2 for i = 1,...,k.

Proof.  Using the transformations U_i = p_i(1 + c_i V_i), where c_i = \sqrt{\xi}\,\sigma_i/p_i,
(3.17) becomes

    X_{ij} \sim \binom{n_i}{x_{ij}} p_i^{x_{ij}} q_i^{n_i - x_{ij}} \int (1 + c_i v_i)^{x_{ij}} (1 - c_i p_i v_i/q_i)^{n_i - x_{ij}} g_i^*(v_i)\,dv_i ,    (3.18)

where g_i^*(\cdot) is the p.d.f. of V_i.

Under the null hypothesis H_0 : \xi = 0, S(X) = (X_{1+}, ..., X_{k+}),
where X_{i+} = \sum_{j=1}^{r_i} X_{ij} for i = 1,...,k, is complete and sufficient for the
unknown parameter p = (p_1, ..., p_k).  We attempt to construct a test
having Neyman structure.
The conditional likelihood of X = (X_1, ..., X_k),
where X_i = (X_{i1}, ..., X_{ir_i}) for i = 1,...,k, given the
sufficient statistic S(X), is

    \prod_{i=1}^{k}\prod_{j=1}^{r_i} \binom{n_i}{x_{ij}} \Big/ \prod_{i=1}^{k} \binom{n_i r_i}{x_{i+}}    (3.19)

under H_0, and

    A\left\{ 1 + \xi \sum_{i=1}^{k} \frac{\sigma_i^2}{2 p_i^2 q_i^2} \sum_{j=1}^{r_i}\left[ (x_{ij} - n_i p_i)^2 + x_{ij}(2p_i - 1) - n_i p_i^2 \right] + O(\xi^{3/2}) \right\} \prod_{i=1}^{k}\prod_{j=1}^{r_i} \binom{n_i}{x_{ij}}    (3.20)

under H_a, where A is a quantity depending on the data only through
S(X).
Since the conditional likelihood ratio, i.e., the ratio of (3.20) to
(3.19), depends on the unspecified parameters, no uniformly most
powerful test of Neyman structure exists.

By the Neyman-Pearson fundamental lemma the locally most powerful test criterion having Neyman structure becomes the limit of the ratio of
(3.20) to (3.19) as \xi \to 0, which is

    S_c = \sum_{i=1}^{k} \frac{\sigma_i^2}{2 p_i^2 q_i^2} \sum_{j=1}^{r_i}\left[ (x_{ij} - n_i p_i)^2 + x_{ij}(2p_i - 1) - n_i p_i^2 \right] .    (3.21)

However, S_c cannot be a test statistic unless the dependence on
\sigma_i^2/(p_i^2 q_i^2) is removed.  This dependence is removed if and only if

    \sigma_i^2 = a\, p_i^2 q_i^2 ,\qquad i = 1, 2, ..., k ,    (3.22)

where a is a constant.
Now under (3.22) the LMP test of Neyman structure based on S_c is
equivalent to a test based on

    S_c^* = \sum_{i=1}^{k}\sum_{j=1}^{r_i} x_{ij}^2 .

The argument in Appendix I can be extended to the one-way layout
data to show that the LMP test of Neyman structure based on S_c^* is
LMPU.

Remark:  It is difficult to find a class of mixing distributions that
satisfies (3.22); the beta does not.

3.2  Random Effects Model of One-Way Layout
In the last section we discussed mixture models for binomial
distributions, so as to allow random variation of the success probability.
We now extend the discussion to multinomial distributions and
focus on a Dirichlet mixture of multinomials alternative.
This section
consists of a model specification to accommodate random effects in a
multinomial model, the development of an asymptotically optimal test
statistic for detecting these random effects by using Neyman's C(\alpha)
procedure (Neyman, 1959), the null and alternative distributions of the
test statistic, and the large-sample comparison of the C(\alpha) statistic
with the classical chi-square statistic.
Because of Lemma 3.2, which
includes the case of beta-binomial distributions, it is difficult to
find a locally optimal test for detecting a Dirichlet mixture departure
from a multinomial in the non-iid case.
Even in the iid case the local
optimality of a test that detects Wisniewski-type general mixture departures from a binomial is not preserved in the multinomial case.
This is
shown in Appendix II in the case of a Dirichlet mixture of multinomials.
A Monte Carlo simulation of the power comparison of the C(\alpha) statistic
and the chi-square statistic is presented.
Finally a duality between
the C(\alpha) statistic and Light and Margolin's Catanova statistic is
discussed (Light and Margolin, 1971; Margolin and Light, 1974).
3.2.1  Dirichlet-Multinomial Model

We consider the multinomial as a generalization of the binomial and
consider a product of multinomials likelihood.
Experimentally this can
arise from a situation in which we have G unordered experimental groups
and I unordered response categories, with n_{+j} observations taken in group
j, for j = 1,...,G.
Data from such sampling can be represented in the
following contingency table:
                              Group
    Response       1       2      ...     G       Response Total
    1              n_11    n_12   ...     n_1G    n_1+
    2              n_21    n_22   ...     n_2G    n_2+
    .              .       .              .       .
    .              .       .              .       .
    I              n_I1    n_I2   ...     n_IG    n_I+
    Group Total    n_+1    n_+2   ...     n_+G    n_++

    Table 3.1:  I x G Contingency Table
Let the j-th group response vector, given the group total, be denoted by
n_j' = (n_{1j}, ..., n_{I-1,j}).
One natural way of imposing random group effects
on the j-th group response vector is to generalize the multinomial
distribution by allowing the group probability vector itself to have a
Dirichlet distribution.
Thus we have

    (n_j \mid U_j, n_{+j}) \stackrel{ind}{\sim} M(n_{+j}, U_j)    (3.23)

for j = 1,...,G, and

    U_1, U_2, ..., U_G \stackrel{iid}{\sim} D(\beta) .    (3.24)
From (3.24) it can be easily shown that the means, variances, and
covariances of U_j = (U_1, ..., U_{I-1})' are

    E(U_i) = \beta_i / B
    Var(U_i) = \beta_i (B - \beta_i) / [B^2 (B + 1)]
    Cov(U_i, U_{i'}) = -\beta_i \beta_{i'} / [B^2 (B + 1)] ,\qquad i \ne i' ,    (3.25)

for i, i' = 1,...,I-1, where B = \sum_{i=1}^{I} \beta_i.
It is useful to change the parameters by putting

    P_i = \beta_i / B   for i = 1,...,I-1
    \theta = 1/B .    (3.26)
Then it is an easy exercise to show that the marginal distribution of
n_j, which is a Dirichlet mixture of multinomial distributions, has
p.m.f.

    h(n_j; P, \theta) = \binom{n_{+j}}{n_{1j}, ..., n_{Ij}} \prod_{i=1}^{I}\prod_{r=0}^{n_{ij}-1}(P_i + r\theta) \Big/ \prod_{r=0}^{n_{+j}-1}(1 + r\theta) ,    (3.27)

where n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij} for j = 1,...,G.
We refer to a Dirichlet mixture of multinomials (3.27) as a Dirichlet-multinomial distribution and denote it by

    n_j \sim DM(n_{+j}, P, \theta) ,    (3.28)

where P = (P_1, ..., P_{I-1})', which is symbolically described as
n_j \sim M(n_{+j}, U_j) \wedge_{U_j} D(P, \theta).
Thus, as an extension of a product of multinomial distributions, the joint
distribution of (n_1, ..., n_G) becomes a product of Dirichlet-multinomial
distributions, i.e.,

    n_j \stackrel{ind}{\sim} DM(n_{+j}, P, \theta) ,\qquad j = 1,...,G .    (3.29)
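For reference, a minimal sketch of the Dirichlet-multinomial p.m.f. (3.27), using only the Python standard library, is given below; the data passed to it are hypothetical.

    from math import factorial, prod

    def dirichlet_multinomial_pmf(counts, P, theta):
        """p.m.f. (3.27); counts = (n_1j, ..., n_Ij), P = (P_1, ..., P_I), theta >= 0."""
        n = sum(counts)
        coef = factorial(n) / prod(factorial(c) for c in counts)   # multinomial coefficient
        num = prod(prod(P_i + r * theta for r in range(c)) for P_i, c in zip(P, counts))
        den = prod(1.0 + r * theta for r in range(n))
        return coef * num / den

    # theta = 0 recovers the ordinary multinomial probability, e.g.
    # dirichlet_multinomial_pmf((3, 2, 5), (0.2, 0.3, 0.5), 0.0)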
Since a Dirichlet-multinomial distribution is a multi-dimensional extension of a beta-binomial distribution, there are several other terminologies (Johnson and Kotz, 1969, 1977).

3.2.2  Test of the Random Effects

In the product Dirichlet-multinomial model (3.29), \theta becomes the
parameter of interest for testing the existence of random group effects,
because if \theta = 0 the model reduces to a product of multinomials; this
is a device we and others have employed to allow a single parameter to
introduce random effects.
Thus the null hypothesis H_0 of no random
effects and the alternative hypothesis H_a of the existence of random
effects can be expressed as

    H_0 :  \theta = 0
    H_a :  \theta > 0 .    (3.30)
Based on the one-way layout contingency table in Table 3.1, the log-likelihood function of \theta, apart from an additive constant, is given by

    \ell(\theta) = \sum_{j=1}^{G}\left\{ \sum_{i=1}^{I}\sum_{r=0}^{n_{ij}-1} \log(P_i + r\theta) - \sum_{r=0}^{n_{+j}-1} \log(1 + r\theta) \right\} .    (3.31)
3.2.2.1  Case of P Known

It is easy to show that the uniformly most powerful (UMP) test for
H_0 versus H_a does not exist in this case.  However, the LMP test of
Potthoff and Whittinghill (1966,a) rejects H_0 for large values of

    \left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\theta=0} = \frac{1}{2}\sum_{j=1}^{G}\left\{ \sum_{i=1}^{I}\frac{n_{ij}(n_{ij}-1)}{P_i} - n_{+j}(n_{+j}-1) \right\}    (3.32)

    \propto \sum_{j=1}^{G}\sum_{i=1}^{I}\frac{n_{ij}(n_{ij}-1)}{P_i} \equiv T_1 ,
where n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij} and P_I = 1 - \sum_{i=1}^{I-1} P_i.

Potthoff and Whittinghill (1966,a) proposed a method of moments approximation to the null distribution of T_1 by finding constants e, f, and g
that satisfy

    e T_1 + f \approx \chi^2(g) ,    (3.33)

where \chi^2(g) refers to a chi-square random variable with g degrees of
freedom; however, by expressing T_1 in (3.32) in terms of a quadratic
form we can suggest another approximation to the null distribution of
T_1.  To aid in the development, we introduce some useful results without proofs.  Proofs can be found in Ronning (1982).
Lemma 3.3:  Under H_0 the covariance matrix of n_j = (n_{1j}, ..., n_{I-1,j})'
is given by

    Cov(n_j) = n_{+j}\left(D_{P_i} - P P'\right) = n_{+j} V ,    (3.34)

where D_{P_i} = \mathrm{diag}(P_1, ..., P_{I-1}) and V = D_{P_i} - P P'.

Lemma 3.4:  Let V and D_{P_i} be defined as in (3.34).  Then

    V^{-1} = D_{P_i}^{-1} + (1/P_I) E ,    (3.35)

where E is an (I-1) x (I-1) matrix consisting of ones only.
Lemma 3.5:  Let Z_j be the (I-1)-dimensional vector with elements

    z_{ij} = \sqrt{n_{+j}}\left( \frac{n_{ij}}{n_{+j}} - P_i \right)   for i = 1,...,I-1,\ j = 1,...,G ;

then

    Z_j' V^{-1} Z_j = \sum_{i=1}^{I} (n_{ij} - n_{+j} P_i)^2 / (n_{+j} P_i)

is Pearson's chi-square statistic for goodness of fit in the j-th group.
Hence Z_j' V^{-1} Z_j has an asymptotic chi-square distribution with I-1 degrees
of freedom under H_0.
Simple calculation can show that the test based on T_1 in (3.32) is
equivalent to the test based on

    T_1^* = \sum_{j=1}^{G}\left[ n_j - n_{+j} P - \tfrac{1}{2}(P_I^{-1} - 1)\mathbf{1} \right]' V^{-1} \left[ n_j - n_{+j} P - \tfrac{1}{2}(P_I^{-1} - 1)\mathbf{1} \right] ,    (3.36)

where \mathbf{1} is an (I-1) x 1 vector of ones and I \ge 2.  Thus by use of
Lemma 3.5 we can derive the following results:

(1)  When the n_{+j}'s are all equal and P = (1/I)\mathbf{1}, T_1^* is equivalent to
Pearson's chi-square statistic.

(2)  If we assume that there exist a_j, 0 < a_j < 1 for j = 1,...,G, such
that a_j = \lim_{n_{++}\to\infty} n_{+j}/n_{++}, then the limiting distribution of (1/n_{++})T_1^*
is of the form

    \frac{1}{n_{++}} T_1^* \xrightarrow[H_0]{\;d\;} \sum_{j=1}^{G} a_j\, \chi_j^2(I-1, \delta) ,    (3.37)

where {\chi_j^2(I-1, \delta); j = 1,...,G} is a set of independent noncentral
chi-square random variables with I-1 degrees of freedom and noncentrality parameter \delta = \tfrac{1}{4}\sum_{i=1}^{I-1}(P_i - I^{-1})^2.

(3)  In the special case of equal n_{+j}'s, (3.37) reduces to a single
scaled noncentral chi-square,

    \frac{1}{n_{++}} T_1^* \xrightarrow[H_0]{\;d\;} \frac{1}{G}\,\chi^2\!\left(G(I-1),\ G\delta\right) ,

where \delta is defined in (3.37).
3.2.2.2  Case of P Unknown

The case of unknown P is far more interesting, especially in terms
of applicability to real problems.
It is shown in Appendix II that a
locally optimal test of H_0 : \theta = 0 versus H_a : \theta > 0 does not
exist.
However, a C(\alpha) test is readily available.
In order to derive the C(\alpha) test statistic we need the following
partial derivatives of the log-likelihood \ell(\theta) of (3.31), evaluated at
\theta = 0:

    \phi_1(P) = \left.\frac{\partial \ell(\theta)}{\partial \theta}\right|_{\theta=0} = \frac{1}{2}\sum_{j=1}^{G}\left\{ \sum_{i=1}^{I}\frac{n_{ij}(n_{ij}-1)}{P_i} - n_{+j}(n_{+j}-1) \right\} ,    (3.38)

    \phi_{2i}(P) = \left.\frac{\partial^2 \ell(\theta)}{\partial \theta\,\partial P_i}\right|_{\theta=0} = \frac{1}{2P_I^2}\sum_{j=1}^{G} n_{Ij}(n_{Ij}-1) - \frac{1}{2P_i^2}\sum_{j=1}^{G} n_{ij}(n_{ij}-1)    (3.39)

for i = 1, 2, ..., I-1, where n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij}, and

    \phi_3(P) = \left.\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\right|_{\theta=0} = -\frac{1}{6}\sum_{j=1}^{G}\left\{ \sum_{i=1}^{I}\frac{n_{ij}(n_{ij}-1)(2n_{ij}-1)}{P_i^2} - n_{+j}(n_{+j}-1)(2n_{+j}-1) \right\} ,    (3.40)

where E_0 implies that the expectation is taken under \theta = 0.  Neyman
(1959) (see also Moran, 1970) has shown that when E_0[\phi_{2i}(P)] = 0, the
null hypothesis can be tested using the statistic \phi_1(\hat{P}), where \hat{P} is a
root-n_{++} consistent estimator of P.  An obvious choice for \hat{P} is the
MLE under H_0, \hat{P}_i = n_{i+}/n_{++}.  Substituting the MLE \hat{P} in (3.38) we
obtain
~.-
=
I J.G l[n.-n+.PJ'V
A
=. ~J
J~
A-l
A
[n.-n+oP]
- (1-1)n++
~J
J~
(3.41)
n+j[n;j
ni +
-_ n++L.j=lL.i=l
\ G \ I ----- - --n. r.=-n
1+ vn+ j
++
J2 - (1-1)n
~
+j
++'
85
where
v=
, £ £1
0"P. -
Hence we see that the C(a) test is based on
(3.42)
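Both T_k in (3.42) and Pearson's chi-square are simple functions of the I x G table of counts. A minimal sketch, with rows taken as the response categories and columns as the groups, is given below; the simplified expression for T_k used in the last line follows from Lemma 3.5 and (3.42).

    import numpy as np

    def ca_statistic_and_pearson(table):
        """C(alpha) statistic T_k of (3.42) and Pearson's chi-square for an I x G table."""
        N = np.asarray(table, dtype=float)
        n_plus_j = N.sum(axis=0)              # group totals n_+j
        n_i_plus = N.sum(axis=1)              # response totals n_i+
        n = N.sum()                           # n_++
        P_hat = n_i_plus / n                  # MLE of the response probabilities under H_0
        expected = np.outer(n_i_plus, n_plus_j) / n
        pearson = np.sum((N - expected) ** 2 / expected)
        # T_k = (1/n_++) * sum_j (n_j - n_+j P_hat)' V_hat^{-1} (n_j - n_+j P_hat)
        #     = sum_i sum_j (n_ij - n_+j P_hat_i)^2 / (n_++ P_hat_i)
        t_k = np.sum((N - np.outer(P_hat, n_plus_j)) ** 2 / (n * P_hat[:, None]))
        return t_k, pearson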
In determining the approximate null distribution of T_k, two
limiting results are available.  One uses the central limit theorem
(CLT) on the iid multinomial random vectors as the number of groups G tends
to infinity.
In this limiting argument, T_k, properly normalized, has
an asymptotic N(0,1) distribution by the result of Neyman's C(\alpha)
procedure (Neyman, 1959).
Since E_0[\phi_{2i}(\hat{P})] = 0 for i = 1,...,I-1,
the variance of \phi_1(\hat{P}) is estimated by -E_0[\phi_3(\hat{P})].  From (3.40) it
follows that

    -E_0[\phi_3(\hat{P})] = \frac{1}{2}(I-1)\sum_{j=1}^{G} n_{+j}(n_{+j}-1) .    (3.43)

Since T_k = 2\phi_1(\hat{P})/n_{++} + (I-1), by normalizing T_k we find that under
H_0 : \theta = 0 the statistic

    \frac{\phi_1(\hat{P})^2}{-E_0[\phi_3(\hat{P})]} = \frac{\left[ n_{++}\{T_k - (I-1)\} \right]^2}{2(I-1)\sum_{j=1}^{G} n_{+j}(n_{+j}-1)}    (3.44)

has an asymptotic chi-square distribution with 1 degree of freedom.
We may consider another limiting argument that uses the multivariate normal approximation of the multinomial distribution when the
number of groups, G, is held fixed and the group sizes {n_{+j}} tend
to infinity in such a manner that n_{+j}/n_{++} \to a_j, 0 < a_j < 1, for
j = 1,...,G.
In the following discussion, the approximate null and
alternative distributions are based on this limiting argument, which
may better reflect practical experimental considerations where the
number of groups is fixed; we conjecture that these results will
provide a better sampling approximation for finite sample sizes.
The hypothesis test H_0 : \theta = 0 versus H_a : \theta > 0 has been described
as detection of a Dirichlet-multinomial departure from the multinomial
distribution.
For this purpose two other test statistics that have
been proposed for fixed effects problems are worthy of consideration:

    X_p^2 = \sum_{i=1}^{I}\sum_{j=1}^{G} \frac{(n_{ij} - n_{i+} n_{+j}/n_{++})^2}{n_{i+} n_{+j}/n_{++}}    (3.45)

and

    C = (n_{++} - 1)(I - 1)\, BSS_{i\cdot j} / TSS_{i\cdot j} ,    (3.46)

where X_p^2 is Pearson's chi-square statistic and C is the Catanova
statistic suggested by Light and Margolin (1971; also Margolin and
Light, 1974), whose components BSS_{i\cdot j} and TSS_{i\cdot j} are defined in subsection 3.2.6.
For the relations among these three statistics, T_k, C and X_p^2,
we observe that

(i)  When n_{+j} = n for j = 1,...,G, the test based on T_k is identical to
the test based on X_p^2.

(ii)  When I = 2, C is equivalent to X_p^2 (Light and Margolin, 1971).

Hence when n_{+j} = n for all j and I = 2, these three statistics are
equivalent.

For the comparison of the three statistics in terms of large sample
behavior, we obtain the asymptotic relative efficiency (ARE) of X_p^2
relative to T_k.
The ARE of C relative to T_k does not turn out to be of
tractable form.
Later we discuss a duality between T_k and C.
3.2.3  Approximate Null and Alternative Distributions

3.2.3.1  Approximate Null Distributions

We define the following notation for j = 1, 2, ..., G:

    Z_j = Z_j(P) = \frac{1}{\sqrt{n_{+j}}}\left( n_{1j} - n_{+j}P_1, ..., n_{I-1,j} - n_{+j}P_{I-1} \right)'    (3.47)

    \hat{P} = (\hat{P}_1, ..., \hat{P}_{I-1})' = \left( \frac{n_{1+}}{n_{++}}, ..., \frac{n_{I-1,+}}{n_{++}} \right)'    (3.48)

    \hat{Z}_j = Z_j(\hat{P})    (3.49)

    Z = (Z_1', Z_2', ..., Z_G')'    (3.50)

    \hat{Z} = (\hat{Z}_1', \hat{Z}_2', ..., \hat{Z}_G')'    (3.51)

    H = \left( \sqrt{n_{+1}/n_{++}}, \sqrt{n_{+2}/n_{++}}, ..., \sqrt{n_{+G}/n_{++}} \right)'    (3.52)

    M = \left( n_{+1}/n_{++}, n_{+2}/n_{++}, ..., n_{+G}/n_{++} \right)'    (3.53)

    a_j = \lim \frac{n_{+j}}{n_{++}}  as n_{+j} and n_{++} tend to infinity    (3.54)

    \tilde{a} = (\sqrt{a_1}, ..., \sqrt{a_G})' = \lim_{n_{++}\to\infty} H    (3.55)

    A = (a_1, ..., a_G)' = \lim_{n_{++}\to\infty} M .    (3.56)

It is well known that, as n_{+j} \to \infty,

    Z_j \xrightarrow[H_0]{\;d\;} N(0, V)    (3.57)

for j = 1,...,G, where V = V(P) = D_{P_i} - P P'.
Also we note, for j = 1,...,G,

    \hat{Z}_j = Z_j - \sqrt{n_{+j}}\,(\hat{P} - P)
              = Z_j - \left(\frac{n_{+j}}{n_{++}}\right)^{1/2} \sum_{k=1}^{G}\left(\frac{n_{+k}}{n_{++}}\right)^{1/2} Z_k .    (3.58)

By using the above we can express \hat{Z} in terms of Z as

    \hat{Z} = \left[ (I_G - H H') \otimes I_{I-1} \right] Z ,    (3.59)

where I_k is a k x k identity matrix and \otimes stands for the Kronecker
product.  The asymptotic distribution of Z can be obtained by using
(3.57) and the independence of Z_1, ..., Z_G:

    Z \xrightarrow[H_0]{\;d\;} N(0,\ I_G \otimes V) .    (3.60)

Hence by using (3.59) and the idempotency of (I_G - \tilde{a}\tilde{a}') \otimes I_{I-1} we obtain

    \hat{Z} \xrightarrow[H_0]{\;d\;} N\!\left(0,\ (I_G - \tilde{a}\tilde{a}') \otimes V\right) .    (3.61)

Now, Pearson's chi-square statistic X_p^2 can be expressed as

    X_p^2 = \sum_{j=1}^{G} \hat{Z}_j' \hat{V}^{-1} \hat{Z}_j = \hat{Z}'(I_G \otimes \hat{V}^{-1})\hat{Z} .    (3.62)

For further discussion, the following lemma is useful.
Lemma 3.6:  Under H_0, \hat{V} = V + o_p(1).

Proof.  Using the maximum absolute column sum norm ||\cdot||_1 for matrices,
we have

    ||\hat{V} - V||_1 = \max_{1 \le i \le I-1} \sum_{l=1}^{I-1} \left| (\hat{D}_{\hat{P}_i} - \hat{P}\hat{P}')_{il} - (D_{P_i} - P P')_{il} \right| ,

where P_I = 1 - \sum_{i=1}^{I-1} P_i and \hat{P}_I is accordingly defined.
Since n_{i+} \sim B(n_{++}, P_i) under H_0,
\hat{P}_i = P_i + o_p(1) as n_{++} \to \infty for i = 1,...,I-1.  Thus by
continuity the result follows.
Thus by Lemma 3.6, (3.62) can be written as

    X_p^2 = \hat{Z}'(I_G \otimes V^{-1})\hat{Z}\,(1 + o_p(1)) .    (3.63)

By invoking a theorem on quadratic forms it can be seen that X_p^2 is
asymptotically distributed as

    X_p^2 \xrightarrow[H_0]{\;d\;} \sum_{i=1}^{(I-1)G} \lambda_i^* \chi_i^2(1) ,    (3.64)

where {\lambda_i^*; i = 1,...,(I-1)G} is the set of eigenvalues of

    (I_G \otimes V^{-1})\left[ (I_G - \tilde{a}\tilde{a}') \otimes V \right] = (I_G - \tilde{a}\tilde{a}') \otimes I_{I-1}    (3.65)

and {\chi_i^2(1); i = 1,...,(I-1)G} are iid chi-square random variables with 1
degree of freedom.  The eigenvalues of (I_G - \tilde{a}\tilde{a}') \otimes I_{I-1} are the cross
products of the eigenvalues of (I_G - \tilde{a}\tilde{a}') and those of I_{I-1}.  Since I_{I-1}
has eigenvalue 1 with multiplicity I-1, (3.64) is equivalent to

    X_p^2 \xrightarrow[H_0]{\;d\;} \sum_{i=1}^{G} \rho_i \chi_i^2(I-1) ,    (3.66)

where the \rho_i's are eigenvalues of (I_G - \tilde{a}\tilde{a}').
Since I_G - \tilde{a}\tilde{a}' is idempotent of rank G-1, we have (G-1) ones and one
zero for its eigenvalues.
Thus (3.66) becomes

    X_p^2 \xrightarrow[H_0]{\;d\;} \chi^2\!\left((I-1)(G-1)\right) ,    (3.67)

a well known result.
We now consider the null distribution of the C(\alpha) statistic T_k,
which can be expressed as

    T_k = \sum_{j=1}^{G} \left[ \left(\frac{n_{+j}}{n_{++}}\right)^{1/2} \hat{Z}_j \right]' \hat{V}^{-1} \left[ \left(\frac{n_{+j}}{n_{++}}\right)^{1/2} \hat{Z}_j \right] .    (3.68)
For notational convenience we define

    \hat{Z}_j^* = \left(\frac{n_{+j}}{n_{++}}\right)^{1/2} \hat{Z}_j    (3.69)

and

    \hat{Z}^* = (\hat{Z}_1^{*\prime}, ..., \hat{Z}_G^{*\prime})' .    (3.70)

By the same arguments used for obtaining the distribution of \hat{Z} in (3.59)
we obtain

    \hat{Z}^* \xrightarrow[H_0]{\;d\;} N\!\left(0,\ (D_{a_j} - A A') \otimes V\right) ,    (3.71)

where D_{a_j} = \mathrm{diag}(a_1, ..., a_G).
Now using \hat{Z}^* and Lemma 3.6, we can express T_k as

    T_k = \hat{Z}^{*\prime}(I_G \otimes V^{-1})\hat{Z}^*\,(1 + o_p(1)) .    (3.72)
Thus, using the same arguments employed in (3.64)-(3.66), the asymptotic
distribution of T_k under H_0 is obtained as

    T_k \xrightarrow[H_0]{\;d\;} \sum_{j=1}^{G} \lambda_j \chi_j^2(I-1) ,    (3.73)

where {\lambda_j; j = 1,...,G} is the set of eigenvalues of (D_{a_j} - A A').  We may
note here that n(D_{a_j} - A A') is the singular covariance matrix of a
multinomial distribution M(n, A).

Even though some computer subroutines can readily provide the
eigenvalues of (D_{a_j} - A A'), the determination of the eigenvalues appears
to be an algebraically unsolved problem, except that one of the eigenvalues is known to be zero
(Roy et al., 1960, Light and Margolin, 1971,
and Ronning, 1982).
Since the a_j's are known, however, we may approximate the distribution of T_k by g\chi^2(h), where the constants g and h are
chosen so that g\chi^2(h) has the same first two moments as T_k.  In
doing this we use the following results on D_{a_j} - A A':
    trace(D_{a_j} - A A') = \lambda_1 + \cdots + \lambda_{G-1} = 1 - \sum_{j=1}^{G} a_j^2    (3.74)

    trace\!\left[(D_{a_j} - A A')^2\right] = \lambda_1^2 + \cdots + \lambda_{G-1}^2 = \sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + \left(\sum_{j=1}^{G} a_j^2\right)^2 .    (3.75)

Thus the asymptotic distribution of T_k can be approximated as

    g^{-1} T_k \xrightarrow[H_0]{\;\approx\;} \chi^2(h) ,

where

    g = \frac{\sum_j a_j^2 - 2\sum_j a_j^3 + (\sum_j a_j^2)^2}{1 - \sum_j a_j^2}
and
    h = \frac{(I-1)\left(1 - \sum_j a_j^2\right)^2}{\sum_j a_j^2 - 2\sum_j a_j^3 + (\sum_j a_j^2)^2} .
3.2.3.2  Approximate Alternative Distributions

We next derive the asymptotic distributions of X_p^2 and T_k under H_a.
Here we use the remarkable resemblance of the mean and covariance
matrix of the Dirichlet-multinomial to those of the multinomial distribution (Mosimann, 1962):

    E_\theta(n_j) = n_{+j} P ,\qquad j = 1,...,G ,    (3.76)

    Cov_\theta(n_j) = \left[\frac{n_{+j}\theta + 1}{\theta + 1}\right] n_{+j} V ,\qquad j = 1,...,G ,    (3.77)

where the subscript \theta indicates that the underlying distribution is the
Dirichlet-multinomial.

It has been observed that there are four different asymptotic forms
of the Dirichlet-multinomial distribution (Paul and Plackett, 1978).
Among them, one is of particular relevance to our development.
Theorem 3.1:  (Paul and Plackett, 1978).  Let

    n_j \sim M(n_{+j}, U) \wedge_U D(\beta) = M(n_{+j}, U) \wedge_U D(P, \theta) ,    (3.78)

where \beta = (\beta_1, ..., \beta_I) and the P_i's and \theta are defined in (3.26).
Write \beta_i = n_{++}\phi_i for all i, where the \phi_i's are fixed quantities, and let
n_{++} \to \infty.  Then

    n_{+j}^{-1/2}\,(n_j - n_{+j} P) \xrightarrow[H_a]{\;d\;} N\!\left(0,\ \gamma_j(\theta) V\right) ,    (3.79)

where

    \gamma_j(\theta) = \lim_{n_{+j}, n_{++}\to\infty} (n_{+j}\theta + 1) .

We may note that by the construction \beta_i = n_{++}\phi_i for all i we have
\theta = \left(\sum_{i=1}^{I}\beta_i\right)^{-1}, so that \theta n_{++} = \left(\sum_{i=1}^{I}\phi_i\right)^{-1} and \theta = O(1/n_{++}).

Hence, using this result of Paul and Plackett, it is easy to see that

    Z_j \xrightarrow[H_a]{\;d\;} N\!\left(0,\ \gamma_j(\theta) V\right)    (3.80)

and

    Z \xrightarrow[H_a]{\;d\;} N\!\left(0,\ D_{\gamma_j} \otimes V\right) ,    (3.81)

where

    D_{\gamma_j} = \mathrm{diag}\!\left(\gamma_1(\theta), ..., \gamma_G(\theta)\right) .
Thus by using (3.59) and (3.81) we obtain

    \hat{Z} \xrightarrow[H_a]{\;d\;} N(0,\ Q \otimes V) ,    (3.82)

where

    Q = D_{\gamma_j} - \tilde{a}\tilde{a}' D_{\gamma_j} - D_{\gamma_j} \tilde{a}\tilde{a}' + \tilde{a}\tilde{a}' D_{\gamma_j} \tilde{a}\tilde{a}' ,

i.e., the (j,k) entry of the symmetric matrix Q is

    Q_{jk} = \delta_{jk}\gamma_j - \sqrt{a_j a_k}\left(\gamma_j + \gamma_k - \sum_{l=1}^{G} a_l \gamma_l\right) .

Now it becomes straightforward to show that

    X_p^2 \xrightarrow[H_a]{\;d\;} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1) ,    (3.83)

where {\delta_i; i = 1,...,G} is the set of eigenvalues of Q, and

    T_k \xrightarrow[H_a]{\;d\;} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1) ,    (3.84)

where {\delta_i^*; i = 1,...,G} is the set of eigenvalues of D_{\sqrt{a_j}}\, Q\, D_{\sqrt{a_j}}.
3.2.4  ARE of X_p^2 Relative to T_k

To summarize the relevant distribution results, we have derived
the following:

(a)  X_p^2 \xrightarrow[H_0]{d} \chi^2[(I-1)(G-1)] ;

(b)  T_k \xrightarrow[H_0]{d} \sum_{i=1}^{G} \lambda_i \chi_i^2(I-1), where the \lambda_i's are eigenvalues of D_{a_j} - A A' ;

(c)  X_p^2 \xrightarrow[H_a]{d} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1), where the \delta_i's are eigenvalues of Q ;

(d)  T_k \xrightarrow[H_a]{d} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1), where the \delta_i^*'s are eigenvalues of D_{\sqrt{a_j}}\,Q\,D_{\sqrt{a_j}}.

Thus it can be shown that

    Var(X_p^2 \mid H_0) \longrightarrow 2(I-1)(G-1)    (3.85)

    Var(T_k \mid H_0) \longrightarrow 2(I-1)\sum_{i=1}^{G-1}\lambda_i^2 = 2(I-1)\,trace\!\left[(D_{a_j} - A A')^2\right]
                        = 2(I-1)\left[\sum_j a_j^2 - 2\sum_j a_j^3 + \left(\sum_j a_j^2\right)^2\right]    (3.86)

    \frac{d}{d\theta} E_\theta[X_p^2 \mid H_a]\Big|_{\theta=0} \longrightarrow (I-1)\left(1 - \sum_j a_j^2\right) = (I-1)\,trace(D_{a_j} - A A')    (3.87)

    \frac{d}{d\theta} E_\theta[T_k \mid H_a]\Big|_{\theta=0} \longrightarrow (I-1)\left[\sum_j a_j^2 - 2\sum_j a_j^3 + \left(\sum_j a_j^2\right)^2\right] = (I-1)\,trace\!\left[(D_{a_j} - A A')^2\right] .    (3.88)

Hence the asymptotic relative efficiency (ARE) of the chi-square
statistic X_p^2 relative to the C(\alpha) statistic T_k is given by

    e_{P|C} = \frac{\left[trace(D_{a_j} - A A')\right]^2}{(G-1)\,trace\!\left[(D_{a_j} - A A')^2\right]} = \frac{\left(1 - \sum_j a_j^2\right)^2}{(G-1)\left[\sum_j a_j^2 - 2\sum_j a_j^3 + \left(\sum_j a_j^2\right)^2\right]} ,    (3.89)

where under H_a : \theta = \theta_{n_{++}} = O(1/n_{++}).
Interestingly, Collings and Margolin (1983) obtained the same
expression for an ARE as (3.89) when they compared a C(\alpha) test with
another test for detecting a negative binomial departure from a Poisson
in the regression through the origin case.
They proved the following:

Theorem 3.1:  (Collings and Margolin, 1983)

    \frac{1}{G-1} \le e_{P|C} \le 1 ,

where the left equality holds if and only if G = 2 and the right equality
holds if and only if the group sizes {n_{+j}}, j = 1,...,G, are asymptotically
balanced.
Using the expression for the ARE e_{P|C} in (3.89), we can prove

Lemma 3.7:  The C(\alpha) test is asymptotically equivalent to Pearson's
chi-square test if and only if G = 2 or all the group sizes {n_{+j}}, j = 1,...,G, are
asymptotically balanced.

Proof.  We may express e_{P|C} as

    e_{P|C} = \bar{\lambda}^2 / (s_\lambda^2 + \bar{\lambda}^2) ,

where

    \bar{\lambda} = (G-1)^{-1}\sum_{i=1}^{G-1}\lambda_i   and   s_\lambda^2 = (G-1)^{-1}\sum_{i=1}^{G-1}(\lambda_i - \bar{\lambda})^2 .

Thus e_{P|C} = 1 if and only if s_\lambda^2 = 0.  But s_\lambda^2 = 0 if and only if G = 2 or
\lambda_1 = \cdots = \lambda_{G-1}, which is equivalent to a_1 = \cdots = a_G
(Light and Margolin, 1971, and Ronning, 1982).

3.2.5  Monte Carlo Simulation:  Power Comparison
Power Comparison
As shown above, the test based on Tk is superior to Pearson's
chi-square test based on considerations of asymptotic relative efficiency; however, the large sample properties do not necessarily hold for
small samples, nor are the local properties of the asymptotic relative
efficiency readily transferable to practical situations.
Therefore, a
Monte Carlo simulation was conducted to compare the performance of the
two tests in terms of their sizes and powers.
96
The data for the Monte Carlo simulation were generated on the VAX
780 computer system at the National Institute of Environmental Health
Sciences.
The program was written in Fortran and used two IMSL sub-
routines:
GGAMR and GGMTN.
The following lemma is useful to generate random observations from
a Dirichlet distribution, say 0(£,8).
Lemma 3.8
(Wilks, 1962):
Let Xl ,X 2 , ... ,X k+l be independent variables
having gamma distributions G(l,Sl)' G(1,S2),···,G(1 ,Sk+l). Define
k+l
y., = X./(
" lX.)
, I J= J
for i=l, ... ,k.
Then
~
·e
r = (Yl,··.,Y k)
has a Dirichlet distribution
D(~),
where
= (6 1 , ... ,6 k+l ), and D(~) is defined in (1.16).
The Dirichlet distribution D(~) can be reparametrized as 0(£,8) by
(3.26).
The Fortran program of the Monte Carlo simulation is outlined
as follows;
S
and 8 = 8 0 , and initialize {n+j}j=l and the
upper bound (upbound) of 8.
(i)
(ii)
Set
£ = £0'
Generate a set of S independent probability vectors
Ql ,···,QS from a Dirichlet distribution 0(£,8)
~sing
IMSL subroutine
GGAMR and lemma 3.8.
(iii)
Generate a contingency table from a product multinomial
distributions n~J= lM(n+",u.)
using IMSL subroutine GGMTN.
J ~J
Calculate Tk and X2p.
(v) Count the number of Tk and X~ values exceeding their cut
off values corresponding to a = O.OS.
(iv)
(vi)
Go to the step (ii) and repeat for 2,000 times.
97
Set e = 80 + 6 and go to the step (ii) until 8 ~ upbound.
For the calculation of sizes of Tk test and x~ test a subset consisting
(vii)
of (iii)-(vi) of the above program was employed, because putting 8
=
0
in the step (i) involved division by zero in the step (ii).
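A present-day sketch of steps (ii)-(vi), with numpy's Dirichlet and multinomial generators standing in for the IMSL routines GGAMR and GGMTN, is given below. It is an illustration only: the cut-off for T_k is taken from the g\chi^2(h) approximation of Section 3.2.3.1 and the cut-off for X_p^2 from \chi^2((I-1)(G-1)), which may differ from the exact cut-offs used in the original program.

    import numpy as np
    from scipy.stats import chi2

    def simulate_power(P0, theta, group_sizes, n_rep=2000, alpha=0.05, seed=0):
        """Monte Carlo rejection rates of the T_k and Pearson tests under a
        Dirichlet-multinomial alternative D(P0, theta)."""
        rng = np.random.default_rng(seed)
        P0 = np.asarray(P0, dtype=float)
        I, G = len(P0), len(group_sizes)
        beta = P0 / theta                              # Dirichlet parameters, from (3.26)
        # cut-offs
        a = np.asarray(group_sizes, dtype=float); a = a / a.sum()
        s2, s3 = np.sum(a ** 2), np.sum(a ** 3)
        g = (s2 - 2 * s3 + s2 ** 2) / (1 - s2)
        h = (I - 1) * (1 - s2) ** 2 / (s2 - 2 * s3 + s2 ** 2)
        crit_tk = g * chi2.ppf(1 - alpha, h)
        crit_xp = chi2.ppf(1 - alpha, (I - 1) * (G - 1))
        hits_tk = hits_xp = 0
        for _ in range(n_rep):
            U = rng.dirichlet(beta, size=G)            # group probability vectors (Lemma 3.8)
            table = np.column_stack([rng.multinomial(n, u) for n, u in zip(group_sizes, U)])
            n_plus_j = table.sum(axis=0); n_i_plus = table.sum(axis=1); n = table.sum()
            P_hat = n_i_plus / n
            expected = np.outer(n_i_plus, n_plus_j) / n
            x_p = np.sum((table - expected) ** 2 / expected)
            t_k = np.sum((table - np.outer(P_hat, n_plus_j)) ** 2 / (n * P_hat[:, None]))
            hits_tk += t_k > crit_tk
            hits_xp += x_p > crit_xp
        return hits_tk / n_rep, hits_xp / n_rep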
The actual program was run for two sets of P_0 values and \theta ranges with the
same group sizes, which are listed in Table 3.2.

                     First Set                       Second Set
    P_0              (0.05, 0.1, 0.4, 0.45)          (0.15, 0.2, 0.3, 0.35)
    \theta_0         0.001                           0.001
    \Delta           0.002                           0.003
    Upbound          0.031                           0.025
    Group Sizes      20, 20, 20, 200, 400            same

    Table 3.2:  Two sets of input values for the program.

The asymptotic relative efficiency of X_p^2 to T_k is 0.415 for these group
sizes.  Tables 3.3 and 3.4, respectively, display approximate power
functions of the T_k and X_p^2 tests at the 0.05 level, based on the first and
the second sets of input values.  Over the ranges of \theta values considered
the difference in powers can be as large as 0.086 for the first set of
input values and 0.115 for the second set.
The ratio of the power of
the X_p^2 test to that of the T_k test falls as low as 0.76 in both cases
considered.  Clearly, the T_k test can perform better than the X_p^2 test.
Table 3.3: Approximate Powers of T_k Test and X_P² Test for 0.05 Size and π_0 = (.05, .1, .40, .45)' and {n_{+j}}_{j=1}^{5} = {20, 20, 20, 200, 400}

   θ        T_k       X_P²      Difference
 0.000     0.0525    0.0505     0.0020
 0.001     0.1060    0.0885     0.0175
 0.003     0.2445    0.1700     0.0745
 0.005     0.3545    0.2685     0.0860
 0.007     0.4535    0.3680     0.0855
 0.009     0.5255    0.4470     0.0785
 0.011     0.5860    0.5175     0.0685
 0.013     0.6385    0.5855     0.0530
 0.015     0.7030    0.6480     0.0550
 0.017     0.7165    0.6645     0.0520
 0.019     0.7555    0.7270     0.0285
 0.021     0.7805    0.7600     0.0205
 0.023     0.7950    0.7925     0.0025
 0.025     0.8110    0.8100     0.0010
 0.027     0.8250    0.8175     0.0075
 0.029     0.8465    0.8410     0.0055
 0.031     0.8640    0.8610     0.0030
Table 3.4: Approximate Powers of T_k Test and X_P² Test for 0.05 Size and π_0 = (.15, .2, .3, .35)' and {n_{+j}}_{j=1}^{5} = {20, 20, 20, 200, 400}

   θ        T_k       X_P²      Difference
 0.000     0.0535    0.0480     0.0055
 0.001     0.1050    0.0780     0.0270
 0.004     0.2900    0.2095     0.0805
 0.007     0.4790    0.3640     0.1150
 0.010     0.5825    0.4885     0.0940
 0.013     0.6460    0.5840     0.0620
 0.016     0.7265    0.6865     0.0400
 0.019     0.7640    0.7390     0.0250
 0.022     0.7870    0.7815     0.0055
 0.025     0.8440    0.8335     0.0105
3.2.6 Duality Between C and T_k

Light and Margolin (1971) developed a categorical analysis of variance (Catanova) procedure for data in an IxG contingency table. They demonstrated that the measure of variation due to Gini could be used to develop a measure R² of explained variation, which in turn could be viewed as a qualitative analog to the coefficient of determination for continuous data.

Following Gini's definition of the total variation, Light and Margolin defined the total sum of squares (TSS), the within-group sum of squares (WSS) and the between-group sum of squares (BSS) for the replicated one-way classification under study as follows:
TSS_{i·j} = n_{++}/2 − (2n_{++})^{-1} Σ_{i=1}^{I} n_{i+}²,   (3.90)

WSS_{i·j} = Σ_{j=1}^{G} [ n_{+j}/2 − (2n_{+j})^{-1} Σ_{i=1}^{I} n_{ij}² ],   (3.91)

BSS_{i·j} = TSS_{i·j} − WSS_{i·j},   (3.92)

where the index i·j indicates that the row variable is random and is being predicted from the fixed column variable. Based on these components a measure R²_{i·j} of the proportion of variation in the row variable attributable to the column variable was proposed:

R²_{i·j} = BSS_{i·j} / TSS_{i·j}.   (3.93)
Later Margolin and Light (1974) observed that the R²_{i·j} measure of association and t_a, the sample version of Goodman-Kruskal's τ_a, were computationally identical. This observation led them to provide a means of testing, in the product multinomial model, the hypothesis that τ_a was equal to zero, a test for which Goodman-Kruskal's asymptotic distribution result (Goodman and Kruskal, 1959) was not applicable. The test statistic was

C_{i·j} = (n_{++} − 1)(I − 1) R²_{i·j}  ~  χ²((I−1)(G−1))  under H_0,   (3.94)

where C_{i·j} is Light and Margolin's Catanova statistic and '~' is for 'is approximately distributed as'.
The C(α) statistic T_k obtained in (3.42) can be rewritten as

T_k = Σ_{i=1}^{I} n_{i+}^{-1} Σ_{j=1}^{G} n_{+j}² [ n_{ij}/n_{+j} − n_{i+}/n_{++} ]².   (3.95)

From (3.92) and (3.95) we observe that

T_k = 2 BSS_{j·i},   (3.96)

where BSS_{j·i} is obtained from BSS_{i·j} by systematic interchange of columns and rows. As a corollary to the relationship (3.96) we have
Lemma 3.9: When we have only two grouping variables, i.e., G = 2, the C(α) test based on T_k is equivalent to the chi-square test based on X_P².

Proof. The X_P² and T_k statistics can be rewritten, respectively, as

X_P² = [ n_{++}² / (n_{+1} n_{+2}) ] Σ_{i=1}^{I} (n_{i1} − n_{i+} n_{+1}/n_{++})² / n_{i+},   (3.97)

T_k = 2 Σ_{i=1}^{I} (n_{i1} − n_{i+} n_{+1}/n_{++})² / n_{i+}.   (3.98)

If i and j are interchanged, the argument provided by Light and Margolin (1971) yields the result that

X_P² = [ n_{++}² / (2 n_{+1} n_{+2}) ] T_k,

where G = 2. □

Remark: This is stronger than one part of Lemma 3.7, i.e., the asymptotic equivalence is now proven for all ratios of sample sizes.
Since

TSS_{j·i} = n_{++}/2 − (2n_{++})^{-1} Σ_{j=1}^{G} n_{+j}²   (3.99)

is nonrandom, a test based on T_k is equivalent to a test based on C_{j·i}, i.e.,

T_k ∝ BSS_{j·i}/TSS_{j·i} ∝ C_{j·i}.   (3.100)

It can be shown that R²_{j·i} = BSS_{j·i}/TSS_{j·i} is computationally equivalent to t_b, an estimate of Goodman-Kruskal's τ_b (Goodman and Kruskal, 1954). However, R²_{j·i} has a different operational interpretation from t_b (or τ_b). R²_{j·i}, or equivalently T_k, is based on the column-wise product multinomial model. A possible case in which R²_{j·i} can be interpreted as a measure of association is discussed in Chapter 5.
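To make the duality concrete, a small numerical check (not part of the original text, reusing t_k and pearson_chisq from the simulation sketch of section 3.2.5 and arbitrary illustrative counts) can compare T_k from (3.95) with 2·BSS_{j·i} and, for G = 2, with Pearson's chi-square as in Lemma 3.9.

    import numpy as np

    def bss_ji(table):
        """BSS_{j.i}: the Light-Margolin BSS with rows and columns interchanged."""
        n = table.sum()
        n_row = table.sum(axis=1)                  # n_{i+}
        n_col = table.sum(axis=0)                  # n_{+j}
        return ((table ** 2).sum(axis=1) / (2 * n_row)).sum() - (n_col ** 2).sum() / (2 * n)

    # Arbitrary illustrative I x 2 table (G = 2)
    tab = np.array([[12, 30],
                    [25, 14],
                    [ 8, 21]])

    tk = t_k(tab)                                          # form (3.95)
    print(np.isclose(tk, 2 * bss_ji(tab)))                 # relationship (3.96)
    n, n1, n2 = tab.sum(), tab.sum(axis=0)[0], tab.sum(axis=0)[1]
    print(np.isclose(pearson_chisq(tab), n**2 / (2 * n1 * n2) * tk))  # Lemma 3.9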
Appendix I: Wisniewski-type Alternatives

The proof follows the arguments in Lehmann (1959) and Fraser (1957). Let φ(x) denote any test for detecting mixture departures from the binomial distribution.

1. Under the Wisniewski-type general mixture alternatives (3.16), the power function β_φ(a) of any test is, for fixed p, continuous at a = 0. Hence any unbiased size-α test is similar of size α.

2. The sum Σ_i X_i is the complete sufficient statistic under H_0. Thus any similar test of size α has Neyman structure with respect to Σ_i X_i.

3. As Potthoff and Whittinghill (1966a) noted, a most powerful test of Neyman structure is necessarily unbiased. Hence the locally most powerful test of Neyman structure is necessarily LMPU.
Appendix II: The Dirichlet-Multinomial Alternatives

Under H_0: θ = 0, (n_{1+}, ..., n_{I-1,+}) is the complete sufficient statistic for the unknown probability vector p. By conditioning on the sufficient statistic we attempt to find the locally optimal test of Neyman structure as θ → 0. Under H_0 the conditional likelihood of the data n = (n_1, ..., n_G) given the sufficient statistic (n_{1+}, ..., n_{I-1,+}) has a 'generalized multivariate hypergeometric' distribution, a generalization of the multivariate hypergeometric distribution discussed in the context of urn models in Johnson and Kotz (1977); it is given by (A1).

For the development of the conditional likelihood under H_a, the following lemma is useful.

Lemma A.1: Under H_a,

P_{H_a}{ (n_{1+}, ..., n_{I+}) } = ( n_{++} choose n_{1+}, ..., n_{I+} ) [ Π_{i=1}^{I} Π_{r=0}^{n_{i+}-1} (p_i + rθ) ] / [ Π_{r=0}^{n_{++}-1} (1 + rθ) ].   (A2)

Proof. Using the multi-urn (G urns) extension of the urn models with stochastic replacements that generates a multivariate Pólya-Eggenberger distribution, and noting the equivalence of the multivariate Pólya-Eggenberger distribution to the Dirichlet-multinomial distribution, the result readily follows. □

Hence under H_a, using Lemma A.1, we obtain the conditional likelihood of n given (n_{1+}, ..., n_{I+}) as
P_{H_a}{ n | (n_{1+}, ..., n_{I+}) } = A [ Π_{j=1}^{G} ( n_{+j} choose n_{1j}, ..., n_{Ij} ) Π_{i=1}^{I} Π_{r=0}^{n_{ij}-1} (p_i + rθ) / Π_{r=0}^{n_{+j}-1} (1 + rθ) ],   (A3)

where A is a quantity depending on the data only through the sufficient statistic.
Now, with some algebra, (A3) can be rewritten as

A { 1 + (θ/2) [ Σ_{i=1}^{I} Σ_{j=1}^{G} n_{ij}(n_{ij}−1)/p_i − Σ_{j=1}^{G} n_{+j}(n_{+j}−1) ] + O(θ²) }.   (A4)

Thus the LMP test of Neyman structure, if it exists, has critical region based on large values of the ratio of (A4) to (A1), which is

A { 1 + 2θ T_1 + o(θ²) + constant },   (A5)

where T_1 is given by (3.32). The test criterion T_1, which is equivalent to T_1* in (3.36), involves the unspecified parameters V^{-1}. □
Remark: Even in the iid case, because of the multiparameter p = (p_1, ..., p_{I-1})', the dependence on V^{-1} in (A4) cannot be removed. Thus the result of Wisniewski that there is a LMP test of Neyman structure for Wisniewski-type general mixture alternatives does not extend to the multi-dimensional generalization.
CHAPTER IV
BALANCED NESTED MIXED EFFECTS MODEL
4.1
Introduction
The one-way layout random effects model of Chapter III can be extended within the framework of the Dirichlet-multinomial distribution to a balanced nested mixed effects model in which the row variable has fixed effects and the replications within each level of the row variable have random effects.

An example of a balanced nested mixed effects model for discrete data may be obtained by modifying an example concerning anneals and tinplates in Scheffé (1959, p. 178). While the tinplates are regarded as a random sample from a large population, the anneals are not, the interest being in the individual performance of anneal treatments on a common number of tinplates in terms of various levels of corrosion resistance. Now, however, we consider a qualitative response with I levels instead of a quantitative one.
Let n_{ijk} be the number of observations classified into the i-th level of response for the k-th replication within the j-th level of the row variable, for i=1, ..., I, j=1, ..., R and k=1, ..., C. Because the random effect is nested within the fixed effects, the {n_{ijk}} do not constitute a true three-dimensional contingency table. Nevertheless, the data would probably be reported in the form of such a table and might be analyzed via a Pearson chi-square test by an unthinking statistician. The data might also be viewed as a three-dimensional table if there were an attempt at blocking of experimental units, but in fact the experimental units were actually a source of random effects. We loosely refer to these data as a three-dimensional contingency table.
Denote the probability vectors corresponding to the R levels of the row variable by π_1, π_2, ..., π_R, where π_j = (π_{1j}, π_{2j}, ..., π_{Ij})' ∈ S_π for j=1, ..., R and S_π is defined in (1.13). Then, by assuming that given π_j the k-th replication of the j-th layer is determined by a Dirichlet-multinomial distribution DM(n_{+jk}, π_j, θ), the joint distribution of {n_{ijk}} is given by

Pr({n_{ijk}}) = Π_{j=1}^{R} Π_{k=1}^{C} DM(n_{+jk}, π_j, θ),   (4.1)

where n_{+jk} = Σ_{i=1}^{I} n_{ijk} for j=1, ..., R and k=1, ..., C.
lJ
The full model (4.1) is specified by parameters (TIl"'"
TIR,e).
Using this parametrization we consider the following hypotheses of interest;
(i)
No nested random effects
H:
r
e= a
(ii) No fixed row effects
Hf : TIl = TI2 = ... = !R .
Discussions in the following sections consist of finding a suitable test
statistics and its null distribution in each of these hypothesis testing
problems.
4.2 Test of the Nested Random Effects

We test the existence of the nested random effects in the presence of the fixed row effects, which are represented by distinct π_j's. If, however, there were no fixed effects, the arguments of section 3.2 could be employed for this problem.
4.2.1 C(α) Test

We define the following notation for j=1, ..., R, k=1, ..., C:

D_j = diag(π_{1j}, π_{2j}, ..., π_{I-1,j}),   (4.2)
V_j = V_j(π_j) = D_j − π_j π_j',   (4.3)
V̂_j = V_j(π̂_j),   (4.4)
Z_jk = Z_jk(π_j) = (n_{+jk})^{-1/2} (n_jk − n_{+jk} π_j),   (4.5)
π̂_j = (n_{+j+})^{-1} Σ_{k=1}^{C} n_jk,   (4.6)
Ẑ_jk = Z_jk(π̂_j),   (4.7)
√â_j = (√n_{+j1}, √n_{+j2}, ..., √n_{+jC})' / √n_{+j+},   (4.8)
a_jk = lim n_{+jk}/n_{+j+} as n_{+j+} tends to ∞,   (4.9)
â_jk = n_{+jk}/n_{+j+},   (4.10)
√A_j = (√a_{j1}, √a_{j2}, ..., √a_{jC})',   (4.11)
A_j = (a_{j1}, a_{j2}, ..., a_{jC})',   (4.12)
b_j = lim n_{+j+}/n_T as n_T tends to ∞,   (4.13)
b̂_j = n_{+j+}/n_T,   (4.14)

where n_T = n_{+++}.
Under the full model the joint probability of {n_{ijk}} is given by

Pr({n_{ijk}}) = Π_{j=1}^{R} Π_{k=1}^{C} ( n_{+jk} choose n_{1jk}, ..., n_{Ijk} ) Π_{i=1}^{I} π_{ij}(π_{ij}+θ)···[π_{ij}+(n_{ijk}−1)θ] / { (1+θ)(1+2θ)···[1+(n_{+jk}−1)θ] }.   (4.15)

Define ℓ(θ) = log Pr({n_{ijk}}). In order to obtain a C(α) test statistic we need the following derivatives evaluated at θ = 0:
Ψ^{(1)}({π_j}) = dℓ(θ)/dθ |_{θ=0} = Σ_{j=1}^{R} Σ_{k=1}^{C} [ Σ_{i=1}^{I} n_{ijk}(n_{ijk}−1)/(2π_{ij}) − n_{+jk}(n_{+jk}−1)/2 ],   (4.16)

Ψ^{(2)}_{ij}({π_j}) = ∂/∂π_{ij} [ dℓ(θ)/dθ ] |_{θ=0} = Σ_{k=1}^{C} [ −n_{ijk}(n_{ijk}−1)/(2π_{ij}²) + n_{Ijk}(n_{Ijk}−1)/(2π_{Ij}²) ],   (4.17)

for i=1, ..., I−1 and j=1, ..., R, where n_{Ijk} = n_{+jk} − Σ_{i=1}^{I−1} n_{ijk}. Since E[Ψ^{(2)}_{ij}] = 0 for all (i, j) under H_r: θ = 0, following Neyman (1959) the hypothesis H_r: θ = 0 can be tested based on the statistic Ψ^{(1)}({π̂_j}), where the MLE π̂_j is a root-n_{+j+} consistent estimator of π_j. Substituting the MLE π̂_j, we obtain a C(α) test statistic
T^{(3)} = n_T^{-1} Σ_{j=1}^{R} Σ_{k=1}^{C} (n_jk − n_{+jk} π̂_j)' V̂_j^{-1} (n_jk − n_{+jk} π̂_j).   (4.18)

Here we note the following representation of T^{(3)}:

T^{(3)} = Σ_{j=1}^{R} (n_{+j+}/n_T) T_j^{(2)},   (4.19)

where

T_j^{(2)} = n_{+j+}^{-1} Σ_{k=1}^{C} (n_jk − n_{+jk} π̂_j)' V̂_j^{-1} (n_jk − n_{+jk} π̂_j)   (4.20)

is a C(α) statistic based on the j-th IxC contingency table for testing H_r: θ = 0.
Denote Pearson's chi-square statistic based on the three-dimensional IxRxC contingency table by X^{(3)}. Then by the additivity of chi-square random variables

X^{(3)} = Σ_{j=1}^{R} X_j^{(2)},   (4.21)

where X_j^{(2)} is the Pearson chi-square statistic based on the j-th IxC contingency table. We may note that the C(α) statistic T^{(3)} based on a three-dimensional contingency table is a weighted sum of corresponding C(α) statistics based on its two-dimensional contingency sub-tables, whereas the chi-square statistic is a simple sum of chi-square statistics based on lower-dimensional contingency tables. The representation (4.19) of the C(α) statistic T^{(3)} will be used to obtain the asymptotic relative efficiency of X^{(3)} to T^{(3)} later in this section.
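For illustration only, the Dirichlet-multinomial joint probability (4.15) can be evaluated directly in its ascending-factorial form. The Python sketch below is a plain re-expression of (4.15), not the original computation; it assumes the counts are stored as an I x R x C array and the probabilities as an I x R matrix whose columns are the π_j.

    import numpy as np
    from math import lgamma

    def dm_loglik(n, pi, theta):
        """log of (4.15) for counts n[i, j, k] and probabilities pi[i, j]."""
        I, R, C = n.shape
        total = 0.0
        for j in range(R):
            for k in range(C):
                njk = n[:, j, k]
                m = njk.sum()
                # multinomial coefficient
                total += lgamma(m + 1) - sum(lgamma(c + 1) for c in njk)
                # numerator: prod_i pi_ij (pi_ij + theta) ... (pi_ij + (n_ijk - 1) theta)
                for i in range(I):
                    total += sum(np.log(pi[i, j] + r * theta) for r in range(njk[i]))
                # denominator: (1 + theta)(1 + 2 theta) ... (1 + (n_{+jk} - 1) theta)
                total -= sum(np.log(1 + r * theta) for r in range(1, m))
        return total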
For notational convenience we define, for j=1, ..., R and k=1, ..., C, the quantities in (4.22)-(4.28), and

Ŵ = W(V̂_1, ..., V̂_R).   (4.29)
Then after some algebra it can be shown that the C(α) statistic T^{(3)} can be expressed as

T^{(3)} = Σ_{j=1}^{R} Σ_{k=1}^{C} Ẑ*_jk' V̂_j^{-1} Ẑ*_jk = Ẑ*_(3)' Ŵ^{-1} Ẑ*_(3).   (4.30)

Let U denote the block-diagonal matrix whose j-th diagonal block is (D_{√A_j} − √A_j √A_j') ⊗ I_{I-1}.   (4.31)

Then by (4.9) and (4.13), and using π̂_j = π_j + o_p(1), it can be verified that

Ẑ*_(3) = U' Z_(3) + o_p(1).   (4.32)
Since a multinomial random vector converges in distribution to a multivariate normal distribution when π_j is constant and n_{+jk} tends to infinity for k=1, ..., C and j=1, ..., R, the vector Z_(3) has the limiting distribution given by (4.33) as {n_{+jk}} tends to infinity. Thus by (4.32) and (4.33) we obtain the limiting distribution of Ẑ*_(3) in (4.34). Let Σ = U'WU. Then, using properties of Kronecker products and √A_j'√A_j = Σ_{k=1}^{C} a_jk = 1, we can simplify Σ to the block-diagonal form in (4.35). Hence from (4.30), (4.34) and (4.35) we can derive that

T^{(3)} → Σ_{i=1}^{(I-1)RC} λ_i χ²_i(1)  in distribution under H_r,   (4.36)

where {λ_i; i=1, ..., (I-1)RC} are the eigenvalues of W^{-1}Σ. Let G denote the block-diagonal matrix in (4.37).
Since W^{-1}Σ = G ⊗ I_{I-1}, (4.36) reduces to

T^{(3)} → Σ_{i=1}^{RC} λ*_i χ²_i(I-1)  in distribution under H_r,   (4.38)

where {λ*_i; i=1, ..., RC} are the eigenvalues of G in (4.37). Proceeding as before, we may approximate the asymptotic distribution of T^{(3)} by equating the first two moments of g*^{-1} T^{(3)} with those of a chi-square random variable with h* degrees of freedom. First we note the mean (4.39) and variance (4.40) of the limiting distribution in (4.38). Thus the null distribution of T^{(3)} can be approximated as

g*^{-1} T^{(3)} ≈ χ²(h*)  under H_r,   (4.41)

where g* and h* are determined by the moment equations.
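The moment matching behind (4.41) is the standard Satterthwaite-type calculation for a weighted sum of independent chi-squares. Since the explicit displays (4.39)-(4.40) are not reproduced here, the sketch below shows only the generic computation of g* and h* from a hypothetical set of eigenvalues λ*_i; it is an illustration, not a transcription of the original derivation.

    import numpy as np

    def satterthwaite(lambdas, df):
        """Match mean and variance of sum_i lambda_i * chisq(df) to g * chisq(h).

        E = df * sum(lambda), Var = 2 * df * sum(lambda^2), giving
        g = sum(lambda^2) / sum(lambda), h = df * (sum lambda)^2 / sum(lambda^2).
        """
        lam = np.asarray(lambdas, dtype=float)
        g = (lam ** 2).sum() / lam.sum()
        h = df * lam.sum() ** 2 / (lam ** 2).sum()
        return g, h

    # Hypothetical eigenvalues of G with R*C = 6 blocks and I = 4 (df = I - 1 = 3)
    g_star, h_star = satterthwaite([0.8, 0.9, 1.0, 1.1, 0.7, 1.2], df=3)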
4.2.2 ARE of X^{(3)} Relative to T^{(3)}

Employing the previous arguments of section 3.2 and noting the representations (4.19) and (4.21), the alternative distributions of X^{(3)} and T^{(3)} can be obtained. Thus, omitting much of the algebraic detail, we can derive the following:

(a) Var(X^{(3)} | H_r) → 2(I−1)(C−1)R;

(b) the corresponding variance of T^{(3)} under H_r, which involves the matrix G defined in (4.37);

(c) (d/dθ) E_θ[X^{(3)} | K_r] |_{θ=0} → (I−1)(1 − Σ_j Σ_k b_j a_jk²)  and  (d/dθ) E_θ[T^{(3)} | K_r] |_{θ=0} → (I−1) trace(G);

(d) the asymptotic relative efficiency of X^{(3)} relative to T^{(3)},

e^{(3)}_{P|C} = ( Σ_{ℓ=1}^{(C−1)R} λ*_ℓ )² / [ (C−1)R Σ_{ℓ=1}^{(C−1)R} λ*_ℓ² ],   (4.42)

under K_r: θ = θ_{n_T} = O(1/n_T), where {λ*_ℓ; ℓ=1, ..., (C−1)R} is a set of eigenvalues of G in (4.37); (4.42) can also be written explicitly in terms of the limiting proportions {b_j} and {a_jk}.
Now, as a straightforward extension of Theorem 3.1, we can obtain the following theorem, given here without proof.

Theorem 4.1:

where the left equality holds if and only if C = 2 and R = 1, and the right equality holds if and only if the group sizes {n_{+jk}; j=1, ..., R, k=1, ..., C} are asymptotically balanced.
4.3 Test of Equality of the Fixed Row Effects

In testing the equality of the fixed row effects H_f: π_1 = ... = π_R in the pseudo IxRxC contingency table, two statistics deserve consideration: the Wald statistic, say W, and Pearson's chi-square statistic X_P². It can be seen that the generalized Wald statistic has a simple reference distribution due to its construction, whereas Pearson's chi-square statistic has a complicated reference distribution, as shown in (3.83) for the one-way layout contingency table; this is because of the underlying Dirichlet-multinomial distribution.

The comparison of these two statistics, W and X_P², in small samples is not practicable. Even in a large-sample comparison, as Puri and Sen (1971) indicate, a unique answer regarding the relative efficiency of W relative to X_P² may not be possible, since the alternative distributions of the two statistics depend on more than one parameter. Hence we consider the simpler problem of testing H*_f: π_1 = π_2 in a 2x2xC contingency table and calculate the asymptotic relative efficiency of W relative to X_P² in an attempt to gain some insight into the original problem of testing H_f: π_1 = π_2 = ... = π_R.
Our product Dirichlet-multinomial model reduces to the product multinomial model with a common probability vector when θ = 0. Hence, when θ = 0, by aggregating the data along the random dimension we obtain a two-dimensional contingency sub-table of sufficient statistics, and a test of H_f: π_1 = ... = π_R should be based on this collapsed two-dimensional sub-table. In a product Dirichlet-multinomial model, however, collapsing a three-dimensional contingency table along the random dimension does not yield sufficient statistics for (π_1, ..., π_R). Thus a statistician may want to base his test on the full three-dimensional contingency table in two possible cases:

(i) He decides that collapsing the pseudo three-dimensional contingency table along the random dimension may incur loss of information on the random effects, because collapsing does not yield sufficient statistics when θ > 0.

(ii) He mistakenly treats the balanced nested mixed effects model as a crossed mixed effects model and bases his test on the pseudo three-dimensional contingency table.

The effect of employing a test procedure based on the full three-dimensional contingency table in a product Dirichlet-multinomial model can be investigated by comparing test procedures based on collapsed and uncollapsed tables. In doing this we may suggest conditions under which the test based on the collapsed table is asymptotically more efficient than the test based on the uncollapsed table. In the remaining discussion we refer to the tests based on the collapsed two-dimensional table and on the full uncollapsed three-dimensional table as a test C and a test F, respectively.
4.3.1 Wald Statistic and Chi-Square Statistic

Let n = n_T and let π_n = (π_{1n}, π_{2n})' be a sequence of points in the two-dimensional Euclidean space R² of the form π_n = π_0 + δ/√n, where lim_{n→∞} π_n = π_0, and π_0 and δ are fixed points. In order to have only one extra parameter under the alternative hypothesis we set δ' = (ξ, 0). Define U(π) = π_1 − π_2, where π = (π_1, π_2) is a point in R². Under this formulation the null hypothesis is understood as √n U(π_n) → 0 as n → ∞, and the alternative hypothesis, say K*_f, is formulated as

K*_f: √n U(π_n) = √n (π_{1n} − π_{2n}) → ξ,

where n = n_T. (See Stroud, 1971, and Shuster and Downing, 1976, for recent developments of the generalized Wald statistic and its applications.)
We use the following notation throughout this section. For j = 1, 2 and k = 1, ..., C,

(4.43)

γ_jk = lim n_{+jk}/n_{+j+} as n_{+j+} → ∞,   (4.44)

Λ_j = Λ_j(θ) = Σ_{k=1}^{C} γ_jk η_jk(θ),   (4.45)

with η_jk(θ) as defined in (4.43), and

β = lim_{n_T→∞} n_{+1+}/n_T,  1 − β = lim_{n_T→∞} n_{+2+}/n_T,   (4.46)

(4.47)

(β, 1−β),   (4.48)

(4.49)

For simplicity we sometimes write β and γ_jk for n_{+1+}/n_T and n_{+jk}/n_{+j+}, respectively, where there is no confusion.
Let W_c and X_c² be the generalized Wald statistic and Pearson's chi-square statistic, respectively, based on the table collapsed along the random dimension. We consider the asymptotic relative efficiency (ARE) of W_c to X_c². Denote

π̂_n = (n_{11+}/n_{+1+}, n_{12+}/n_{+2+}) = (π̂_{1n}, π̂_{2n}),   (4.50)

π̂_0 = n_{1++}/n_T.   (4.51)
Then, based on the asymptotic normality of the beta-binomial random variable as indicated in Paul and Plackett (1978), we have under the null hypothesis H*_f

√n_T (π̂_n − π_0 1) → N( 0, π_0(1−π_0) diag( β^{-1} Λ_1, (1−β)^{-1} Λ_2 ) )  in distribution.   (4.52)

Thus

√n_T U(π̂_n) → N( 0, [ β^{-1} Λ_1 + (1−β)^{-1} Λ_2 ] π_0(1−π_0) )  in distribution under H*_f,   (4.53)

and

π̂_0 − π_0 = O_p(n^{-1/2}),   (4.54)

where n = n_T. Hence

W_c = n_T (π̂_{1n} − π̂_{2n})² / { [ β^{-1} Λ_1 + (1−β)^{-1} Λ_2 ] π̂_0(1−π̂_0) }.   (4.55)

Similarly, under the alternative hypothesis K*_f, (4.56) holds, because

lim_{n→∞} π_n(1−π_n) = π_0(1−π_0).   (4.57)

Thus it follows that

W_c → χ²( 1, ξ² / { [ β^{-1} Λ_1 + (1−β)^{-1} Λ_2 ] π_0(1−π_0) } )  in distribution under K*_f,   (4.58)

where χ²(ν, δ) is a noncentral chi-square random variable with ν degrees of freedom and noncentrality parameter δ.
Now, by using previous arguments it can be shown that under the null hypothesis H*_f

X_c² → φ_1 χ_1²(1, 0) + φ_2 χ_2²(1, 0)  in distribution,   (4.59)

where φ_1 and φ_2 are the eigenvalues of the matrix Σ_2,

Σ_2 = (I_2 − √π √π') Λ (I_2 − √π √π'),   (4.60)

and χ_1²(1, 0) and χ_2²(1, 0) are independent. Since the matrix Σ_2 in (4.60) is singular, one eigenvalue is readily found to equal zero and the other to equal Λ_1(1−β) + Λ_2 β. Thus (4.61) follows from (4.59).

To find the asymptotic distribution of X_c² under the alternative K*_f we proceed as follows. We can derive (4.62) and the singular transformation (4.63), from which it can be seen that

Z_{2+} = −√( β/(1−β) ) Z_{1+}.   (4.64)

Hence Pearson's chi-square statistic becomes (4.65), where

Ẑ_{1+} → N( (1−β)√β ξ, π_0(1−π_0)[ Λ_1(1−β)² + Λ_2 β(1−β) ] )  in distribution under K*_f.   (4.66)

Hence

X_c² → [ Λ_1(1−β) + Λ_2 β ] χ²( 1, β(1−β) ξ² / { π_0(1−π_0)[ Λ_1(1−β) + Λ_2 β ] } )  in distribution under K*_f.   (4.67)
Based on the above results (4.55), (4.58), (4.61) and (4.67), comparison of the variances of W_c and X_c² and of the derivatives of their expectations with respect to ξ² under K*_f at ξ = 0, given in (4.68)-(4.71), shows that the asymptotic relative efficiency of X_c² relative to W_c is equal to 1. This may be considered an extension, to a product Dirichlet-multinomial model, of the equivalence of these two test procedures in a product multinomial model.
4.3.2 ARE of a Test F Relative to a Test C

Since the generalized Wald statistic is asymptotically equivalent to the Pearson chi-square statistic for testing H*_f: π_1 = π_2, the Wald statistic, due to its simpler reference distribution, may be chosen for the discussion of the ARE of a test F relative to a test C. We compare the large-sample behavior of the generalized Wald statistic based on the collapsed table and on the full uncollapsed table. Let W_F be the generalized Wald statistic based on the full uncollapsed table. Then W_F is the sum of the generalized Wald statistics on each of the 2x2 tables along the random dimension. We define, for j=1, 2 and k=1, ..., C,

π̂_jk = n_{1jk}/n_{+jk},   (4.72)

π̂_k = (π̂_{1k}, π̂_{2k}),   (4.73)

β_k = lim_{n_T→∞} n_{++k}/n_T,   (4.74)

(4.75)

Then under the alternative hypothesis K*_f we can derive (4.76). Hence the generalized Wald statistic W_k based on the k-th 2x2 table is given by (4.77) and (4.78). Thus, by the standard method, the asymptotic relative efficiency of W_F relative to W_c can be calculated as a function e_{W_F|W_c}(θ) in (4.80), and we can prove
Theorem 4.3: When θ = 0, i.e., under the product multinomial model,

e_{W_F|W_c}(0) ≤ 1,   (4.81)

where equality holds if and only if {n_{+1k}} is proportional to {n_{+2k}}.

Proof. This can be readily proved by noting a theorem in Hardy, Littlewood and Polya (1964, p. 61, Theorem 67), which states that

Σ_i (a_i + b_i) · Σ_i [ a_i b_i / (a_i + b_i) ] ≤ ( Σ_i a_i )( Σ_i b_i ),

with equality only if {a_i} and {b_i} are proportional. □
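A quick numerical check of this inequality (often called Milne's inequality) with arbitrary illustrative group sizes shows both the strict case and the equality case under proportionality; it is offered only as an illustration of the proof's key step.

    import numpy as np

    a = np.array([10., 20., 75.])            # e.g. {n_{+1k}}
    b = np.array([80., 5., 40.])             # e.g. {n_{+2k}}, not proportional to a
    lhs = (a + b).sum() * (a * b / (a + b)).sum()
    print(lhs <= a.sum() * b.sum())          # True; strict since a, b not proportional

    b_prop = 2 * a                           # proportional case gives equality
    lhs_p = (a + b_prop).sum() * (a * b_prop / (a + b_prop)).sum()
    print(np.isclose(lhs_p, a.sum() * b_prop.sum()))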
When θ > 0, we may reparametrize by noting that θ = θ_n = O(1/n) in the passage to the limit (Paul and Plackett, 1978). Thus we define

φ_n = n θ_n   (4.82)

and

lim_{n→∞} n θ_n = lim_{n→∞} φ_n = φ,   (4.83)

where φ is some positive number. Then, by using (4.82) and (4.83), the ARE can be expressed as a function of φ in (4.84). Investigation of the formula (4.84) shows that the ARE depends on φ and the group size ratios unless {n_{+1k}} is proportional to {n_{+2k}}; when {n_{+1k}} is proportional to {n_{+2k}} it can readily be seen that the ARE of W_F relative to W_c is equal to 1.

The formula (4.84) for the ARE as a function of φ is based on the assumption θ = θ_n = O(1/n). In practice, however, θ is determined by nature, and hence is fixed. Thus, in order to provide some guidance to the practical statistician for the choice between W_F and W_c in terms of the ARE, we may calculate the ARE of W_F to W_c as a function of θ for different group sizes with the same group size ratios.

An indication of likely practical values of θ may be obtained from past empirical studies of the beta-binomial distribution by Skellam (1948), Kemp and Kemp (1956), Chatfield and Goodhardt (1970), Williams (1975), Feder (1978), and Segreti and Munson (1981), among others. Their estimated values of θ and the total numbers of observations are shown in Table 4.1.
Table 4.1: Approximate Range of θ̂, n_T θ̂ and the Type of Data

Author                    Total number of    θ̂       n_T θ̂    Type of Data
                          observations
Skellam                   337                0.095    27.37    Number of associations in chromosomes
Kemp and Kemp             200                0.171    34.25    Number of contacts with pins in 200 frames
                          200                0.126    25.35      (in the analysis of point quadrat data)
                          200                0.129    25.77
                          200                0.058    11.62
Chatfield and Goodhardt   200                0.482    96.48    Number of r weeks on which purchases of a
                          50                 0.320    16.02      certain item are made out of n weeks (r ≤ n)
                          474                1.279    606.14
Williams                  145                0.465    67.43    Number of pups surviving per pregnant female rat
Feder                     524                0.073    38.25    Number of fetal deaths among total fetuses per litter
Segreti and Munson        40                 0.681    27.24    Number of fetal deaths among total fetuses per litter
We consider a simple case with C = 3 for the calculation of the ARE's of (4.84) as a function of θ. Three sets of hypothetical group sizes, say D_1, D_2 and D_3, are considered, where

D_1 = [n_{+11} n_{+12} n_{+13}; n_{+21} n_{+22} n_{+23}] = [10 20 75; 80 5 40]

is intended to represent seriously unbalanced group sizes,

D_3 = [30 30 40; 25 35 45]

represents reasonable proximity to balanced group sizes, and

D_2 = [20 15 50; 35 60 20]

is considered to represent imbalance somewhere between D_1 and D_3. Group sizes are varied but group size ratios are maintained by multiplying constants k ≥ 1 to D_1, D_2 and D_3, respectively. ARE's based on D_1, D_2 and D_3 are presented in Figures 4.1, 4.2 and 4.3, respectively.

From the calculation of the ARE's we may note that when the group sizes do not exhibit 'serious' unbalance, the total sample size barely affects the ARE, which is substantially below 1 (see Figures 4.2 and 4.3). If, however, the group sizes show 'serious' unbalance, the total group sizes can affect the ARE (see Figure 4.1). It may be concluded that for group sizes that do not exhibit 'serious' unbalance the ARE of W_F relative to W_c is less than 1 for a practical range of θ values. Thus, based on this conclusion, we may point out that loss of efficiency is the effect of using a test F procedure based on the full three-dimensional contingency table in a product Dirichlet-multinomial model with practical θ values. However, this conclusion must be used with caution in practice. The effect of an unknown θ on the size of the test needs further study.
Figure 4.1: ARE of W_F to W_c based on kD_1 = k(10 20 75; 80 5 40) for k = 1, 5, 10. [ARE plotted against θ from 0 to 1.7.]

Figure 4.2: ARE of W_F to W_c based on kD_2 = k(20 15 50; 35 60 20) for k = 1, 5, 10. [ARE plotted against θ from 0 to 1.7.]

Figure 4.3: ARE of W_F to W_c based on kD_3 = k(30 30 40; 25 35 45) for k = 1, 5, 10. [ARE plotted against θ from 0 to 1.6.]
In the remaining section, we derive the form of W_c in the general pseudo IxRxC table.
4.3.3 Wald Statistic for Testing the Equality of Fixed Effects

From the results of the previous discussion it may be concluded that, for the most practical cases of group size ratios and the practical range of θ values, the generalized Wald statistic W_c based on the collapsed table appears asymptotically more efficient than the generalized Wald statistic W_F based on the full uncollapsed table. Hence, in what follows we employ W_c to test the equality of the fixed row effects in the general pseudo IxRxC contingency table. We construct a test statistic together with its asymptotic null distribution. We use the following notation for j=1, ..., R, k=1, ..., C:
(4.85)

(4.86)

π̂_j = (n_{+j+})^{-1} (n_{1j+}, n_{2j+}, ..., n_{I-1,j+})',   (4.87)

π_G = (π_1', π_2', ..., π_R')',   (4.88)

π̂_G = (π̂_1', π̂_2', ..., π̂_R')',   (4.89)

H = the (R−1)×R matrix of Helmert-type contrasts with rows (1, −1, 0, ..., 0), (1, 1, −2, 0, ..., 0), ..., (1, 1, ..., 1, −(R−1)),   (4.90)-(4.91)

U* = H ⊗ I_{I−1},   (4.92)

together with the quantities in (4.93)-(4.97). The null hypothesis H_f: π_1 = π_2 = ... = π_R (= π_0, say) can then be restated as

H_f: U* π_G = 0.   (4.98)
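The contrast matrix H in (4.90)-(4.91) can be generated mechanically. The short Python sketch below (an illustration, not part of the original text) builds H and the Kronecker product U* = H ⊗ I_{I-1} for given R and I.

    import numpy as np

    def helmert_contrasts(R):
        """(R-1) x R matrix with rows (1,-1,0,...), (1,1,-2,0,...), ..., (1,...,1,-(R-1))."""
        H = np.zeros((R - 1, R))
        for r in range(1, R):
            H[r - 1, :r] = 1
            H[r - 1, r] = -r
        return H

    def u_star(R, I):
        """U* = H kron I_{I-1}, the contrast matrix of (4.92) used in (4.98)."""
        return np.kron(helmert_contrasts(R), np.eye(I - 1))

    H = helmert_contrasts(4)      # rows sum to zero; rank R - 1 = 3
    U = u_star(4, 3)              # shape ((R-1)(I-1), R(I-1)) = (6, 8)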
Now, based on the asymptotic normality of Ẑ_jk = (n_{+jk})^{-1/2}(n_jk − n_{+jk} π_j), we can derive

√n_T (π̂_G − π_G) → N( 0, D^{-1}_{β_j Λ_j(θ)} ⊗ V_0 )  in distribution under H_f,   (4.99)

where D_{β_j Λ_j(θ)} = diag(β_1 Λ_1(θ), ..., β_R Λ_R(θ)); thus

√n_T U* π̂_G → N( 0, U* [ D^{-1}_{β_j Λ_j(θ)} ⊗ V_0 ] U*' )  in distribution under H_f.   (4.100)

Using (4.92), the covariance matrix of √n_T U* π̂_G in (4.100) can be simplified as

U* [ D^{-1}_{β_j Λ_j(θ)} ⊗ V_0 ] U*' = [ H D^{-1}_{β_j Λ_j(θ)} H' ] ⊗ V_0   (4.101)

by the property of Kronecker products. Since U* = H ⊗ I_{I-1} is of rank (R−1)(I−1), by the theorem in Shuster and Downing (1976), and noting V̂_0 = V_0 + o_p(1), where V_0 = V_0(π_0), we obtain

W_c → χ²((I−1)(R−1))  in distribution under H_f.   (4.102)
Chapter V
FURTHER RESEARCH

In this chapter we list four topics that are related to the previous chapters and deserve further research consideration:

(1) The uniqueness of the MLE of a finite mixture of binomial distributions.
(2) The likelihood ratio test of H_0: c=1 vs. H_a: c=2, where c is the number of components of a finite mixture of binomial distributions.
(3) The development of the T_K statistic as a measure of association.
(4) The development of a nested pure random effects model for count data.

5.1 THE FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS

Aside from the earlier development of some numerical algorithms that provide an ML estimator of the mixing distribution in a finite mixture of members of the exponential family, fundamental properties such as the existence, uniqueness and consistency of the ML estimator of the mixing distribution were not discussed until quite recently in the literature. Simar (1976) presented an extensive examination of these properties of the MLE in the case of a finite mixture of Poisson distributions, and Jewell (1982) applied Simar's arguments to a finite mixture of exponential distributions. Hill et al. (1980) considered these problems in an infinite mixture of the form h(t) = Σ_k p_k f_k(t) for known densities f_k(t) that can be found in the mixture of Poisson distributions, where f_k(t) = e^{-λt}(λt)^k/k! for λ > 0, k = 0, 1, 2, ....
Lindsay (1983) provided a convex geometric approach to the solution of these problems for a finite mixture in general when identifiability is not an issue. In fact, all the families of mixture models that have been considered for investigations of these fundamental properties of the MLE were either always identifiable or assumed to be identifiable.

In this section we prove the existence of an MLE of the mixing distribution in a finite mixture of binomial distributions and indicate that the MLE may not be unique.

Let X_1, X_2, ..., X_t be a random sample from a finite mixture h(x) of binomial distributions with mixing distribution G, i.e.,

h(x) = ∫_0^1 (n choose x) p^x (1−p)^{n−x} dG(p),   (5.1)

where G ∈ G*_c, the class of all discrete distribution functions with at most c atoms. Suppose the observation vector (X_1, ..., X_t) has k distinct points 0 ≤ y_1 < y_2 < ... < y_k ≤ n. Let n_i be the number of X's equal to y_i, i=1, ..., k. The mixture model (5.1) is poorly specified unless it is identifiable; hence we assume n ≥ 2c−1 to make the mixture model identifiable. The log-likelihood function of X_1, ..., X_t can be written as

L = Σ_{j=1}^{t} log { ∫_0^1 (n choose x_j) p^{x_j} (1−p)^{n−x_j} dG(p) } = Σ_{i=1}^{k} n_i log B_i,   (5.2)

where

B_i = ∫_0^1 (n choose y_i) p^{y_i} (1−p)^{n−y_i} dG(p)   (5.3)

for i = 1, 2, ..., k.
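For a discrete mixing distribution G with at most c atoms, the integral in (5.2) reduces to a finite sum, and the log-likelihood can be evaluated directly. The Python sketch below is an illustration only; the atom locations, weights and data are hypothetical.

    import numpy as np
    from scipy.stats import binom

    def mixture_loglik(xs, n, support, weights):
        """Log-likelihood (5.2) when G places mass weights[m] on the atom support[m]."""
        xs = np.asarray(xs)
        support = np.asarray(support)
        weights = np.asarray(weights)
        # h(x_j) = sum_m w_m * C(n, x_j) p_m^{x_j} (1 - p_m)^{n - x_j}
        h = binom.pmf(xs[:, None], n, support[None, :]) @ weights
        return np.log(h).sum()

    # Illustrative two-atom mixing distribution (c = 2), n = 10
    ll = mixture_loglik([2, 3, 7, 8, 8], n=10, support=[0.2, 0.8], weights=[0.5, 0.5])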
Equation (5.2) defines a many-to-one map φ from G*_c to a set B of k-tuples (B_1, ..., B_k) in R^k unless k ≥ c. If k ≥ c then, due to the identifiability condition, one and only one G ∈ G*_c is associated with a single point in B; hence φ becomes a one-to-one map.

Let {G_ℓ} be a sequence of distribution functions in G*_c. Since each G_ℓ has finite support in [0,1], the sequence {G_ℓ} is tight. Also, by the Helly-Bray lemma there is a subsequence {G_{ℓ_k}} that converges to a distribution function G*.

Lemma 5.1: G* ∈ G*_c.

Proof. It suffices to show that G* does not have c+1 atoms. Suppose on the contrary that G* has c+1 atoms. Let the support points of G_{ℓ_k} and G* be denoted by x_1(k), x_2(k), ..., x_c(k), and x_1, x_2, ..., x_c, x_{c+1}, respectively. Since G_{ℓ_k}(x) converges to G*(x) as k goes to ∞ at all continuity points of G*, for sufficiently large k we can choose ε > 0 such that x_j ∈ N_ε(x_j(k)) for j=1, ..., c, and an extra support point x_{c+1} ∉ ∪_{j=1}^{c} N_ε(x_j(k)), where N_ε(y) denotes the ε-neighborhood of y. Without loss of generality we assume x_1(k) + ε < x_{c+1} < x_2(k) − ε for large k. Set G*({x_{c+1}}) = a. Hence, for a continuity point x_0 of G* with x_1(k) + ε < x_0 < x_2(k) − ε, G_{ℓ_k}(x_0) cannot converge to G*(x_0) due to the jump of size a. Thus a contradiction is obtained. □
Lemma 5.2: B is compact.

Proof. The binomial mass function is a bounded and continuous function of p. Hence, by the Helly-Bray lemma, Lemma 5.1 and Theorem 4.4.2 of Chung (1968), every sequence of points of B contains a subsequence converging to a point of B. Consequently B is compact. □

However, we note that B is not convex due to the identifiability condition n ≥ 2c−1, which gives an upper bound of c to G*_c. The likelihood function (5.2) is strictly concave on the compact set B. Hence it attains its maximum at some point(s) in B, but due to the non-convexity of B the point at which the likelihood function attains its maximum may not be unique. The investigation of sufficient conditions under which the likelihood function attains its maximum at a unique point in B is proposed as further research.

Another important problem in the finite mixture of binomials is that the majority of estimation techniques assume that the number of components of the mixture, which is c in our notation, is known a priori. However, no really adequate test has been suggested for testing hypotheses concerning c, even for the simple case of testing H_0: c=1 versus H_a: c=2. Everitt and Hand (1981) noted that "this may be a consequence of the problem rather than any lack of ingenuity."
We briefly describe the problems involved in the likelihood ratio test of H_0: c=1 versus H_a: c=2 in the case of a mixture of two binomial distributions. A mixture h(x; θ) of two binomial distributions is represented as

h(x; θ) = π (m choose x) p^x (1−p)^{m−x} + (1−π) (m choose x) q^x (1−q)^{m−x},   (5.4)

where m ≥ 3, and θ = (π, p, q) is a parameter in the parameter space

Ω = ω_0 ∪ ω_1 ∪ ω_2 ∪ ω_3,   (5.5)

where

ω_0 = {(π, p, q) : 0 < π < 1, 0 < p < q < 1},
ω_1 = {(π, p, q) : 0 < π < 1, 0 < p = q < 1},
ω_2 = {(1, p, q) : 0 < p < q < 1},
ω_3 = {(0, p, q) : 0 < p < q < 1}.

The null hypothesis of no mixture, H_0: c=1, is now equivalent to H_1: θ ∈ ω_1, H_2: θ ∈ ω_2, or H_3: θ ∈ ω_3. Here we may note that two non-standard conditions exist in the parameter space Ω. First, the parameter θ under the null hypothesis falls on the boundary of Ω; hence the standard chi-square distribution result for the likelihood ratio statistic −2 log λ does not hold. Second, the null hypothesis region ω_1 ∪ ω_2 ∪ ω_3 consists of a union of hyperplanes of different dimensions.

Wilks' original result (Wilks, 1938) on the asymptotic distribution of −2 log λ was generalized by Chernoff (1954) to the case where the parameter falls on the boundary between the null and alternative hypothesis regions. Feder (1968) also investigated the asymptotic distribution of −2 log λ when the parameter is near the boundary between the null and alternative hypothesis regions.
Feder (1968) related Chernoff's results and his own to obtain the null and alternative distributions of −2 log λ for testing H_0: θ = 0 vs. H_a: θ > 0 in the context of the beta-binomial distribution described in (3.5), and observed that the asymptotic null distribution of −2 log λ has a jump of 1/2 at the origin and follows a chi-square distribution with 1 degree of freedom on the positive axis, i.e., the limit is the mixture (1/2)χ²_0 + (1/2)χ²(1).

Quite recently, Symons et al. (1983) provided a Monte Carlo simulation study of the distribution of −2 log λ for testing H_0: c=1 versus H_a: c=2 in a mixture of two Poisson distributions, and observed that the distribution function of −2 log λ has a jump of 0.4 at the origin.

Even though Symons et al. (1983) consider a mixture of two Poissons, they still have non-standard conditions in their parameter space analogous to those stated earlier. The fact that their simulation study supports certain aspects of Feder's results (i.e., the jump at the origin) suggests the need for further research on the asymptotic distribution of −2 log λ under the two non-standard conditions in the parameter space mentioned earlier.
5.2 T_K STATISTIC AS A MEASURE OF ASSOCIATION

We discussed in section 3.2 that a measure of variation R²_{j·i} could be constructed from the C(α) statistic T_K by noting the duality of the C(α) statistic T_K and Light and Margolin's Catanova statistic. Since R²_{j·i} is computationally equivalent to Goodman-Kruskal's t_b, an estimate of their τ_b, the following properties of R²_{j·i}, or equivalently t_b, are known (Goodman and Kruskal, 1954, 1963; Margolin and Light, 1974):

i) If there exists a j such that n_{+j} = n_{++}, then TSS_{j·i} is equal to zero; hence R²_{j·i} is undefined.

ii) If there does not exist a j such that n_{+j} = n_{++} and if n_{ij} = n_{i+} n_{+j}/n_{++} for all pairs (i, j), then R²_{j·i} = 0.

iii) If there does not exist a j such that n_{+j} = n_{++} and if for each i there exists a j such that n_{ij} = n_{i+}, then R²_{j·i} = 1.

iv) If none of (i), (ii) or (iii) occurs, then 0 < R²_{j·i} < 1.

v) R²_{j·i} is unchanged if all counts {n_{ij}} are multiplied by the same positive constant.

vi) R²_{j·i} is asymmetric in its treatment of the rows and columns of a contingency table.

vii) R²_{j·i} is invariant under permutation of the rows or columns of a contingency table.

Even though R²_{j·i} is computationally equivalent to Goodman-Kruskal's t_b, the two are derived under different sampling models. For Goodman and Kruskal (1954), and Light and Margolin (1971), the row margins are fixed group sample sizes and the columns represent the response, i.e., a fixed effects or product-multinomial model. Here the column margins are fixed group sample sizes and the rows represent the response, i.e., a random effects or product Dirichlet-multinomial model.
A hypothetical example of a fixed group effects model in which R²_{j·i} can be used as a measure of association can be envisaged in the following situation: suppose n_A, n_B and n_C represent the numbers of patients with final diagnostic records A, B and C, respectively, who were initially classified into primary diagnostic records A', B' and C', as in the following contingency table.

Table 5.1

                     Final
Primary        A        B        C
A'            n11      n12      n13
B'            n21      n22      n23
C'            n31      n32      n33
Total         n_A      n_B      n_C

One of the primary interests in this situation may lie in how much the primary diagnostic records can account for the final diagnostic records. The causal relation of interest goes from the row to the column, whereas the data can be collected so that the probability model of the contingency table is based on the column-wise product multinomial model, i.e., n_A, n_B and n_C are fixed.
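For concreteness, a small sketch (with entirely hypothetical counts, reusing bss_ji from the section 3.2.6 sketch and the TSS form given in (3.99)) shows how R²_{j·i} would be computed for a table of this shape; it is an illustration, not a prescription.

    import numpy as np

    # Hypothetical counts for Table 5.1: rows = primary records A', B', C',
    # columns = final records A, B, C with fixed totals n_A, n_B, n_C.
    counts = np.array([[40, 10,  5],
                       [15, 30, 10],
                       [ 5, 10, 35]])

    n = counts.sum()
    tss_ji = n / 2 - (counts.sum(axis=0) ** 2).sum() / (2 * n)   # nonrandom when column totals are fixed
    r2_ji = bss_ji(counts) / tss_ji                              # explained-variation measure R^2_{j.i}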
We feel that further research needs to be done to study R²_{j·i} as a measure of association in the fixed and random group effects models.
5.3 THE NESTED RANDOM GROUP EFFECTS MODEL OF COUNT DATA

In Chapter 4 we discussed a nested mixed effects model for count data in which random effects are nested within fixed effects. A natural extension of the nested mixed effects model would be the corresponding nested pure random effects model within the framework of a Dirichlet-multinomial distribution. By drawing an analogy to nested random effects models in ANOVA we may explicitly specify the nested random effects model for count data. Only the balanced case is discussed; here the data can be presented in the form of an IxRxC contingency table, where I, R and C represent the numbers of response categories, row categories, and replications (nested within each row category), respectively. Let n_{ijk} be the number of subjects classified in the (i, j, k) cell, and let n_jk = (n_{1jk}, ..., n_{I-1,jk})' denote a response vector. The '+' notation will be used for denoting sums of the n_{ijk}'s over the corresponding indices.

Now, we may imagine that there is a Dirichlet population of row categories, labeled by the parameters (π, θ_1), from which the R levels p_1, p_2, ..., p_R of the row category are sampled. We next suppose that for each p_j there exists another Dirichlet population distribution of the C levels p_{j1}, p_{j2}, ..., p_{jC} of the column category, and that D(p_j, θ_2) is the population distribution of p_{j1}, p_{j2}, ..., p_{jC} given p_j. Similarly, given (p_j, p_{jk}), the response vector n_jk can be conceived as a single observation from the multinomial distribution M(n_{+jk}, p_{jk}).
Using conditioning arguments we may express the hierarchy of the nesting as follows:

(p_j | π)  ~ iid  D(π, θ_1),
(p_jk | p_j)  ~ iid  D(p_j, θ_2),
(n_jk | n_{+jk}, p_j, p_jk)  ~  M(n_{+jk}, p_jk),

for j=1, 2, ..., R and k=1, 2, ..., C, where (x | a, B) ~ F is understood to mean that the conditional distribution of x given a and B follows the distribution F.
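A short simulation sketch of this hierarchy (reusing dirichlet_via_gammas from section 3.2.5 and therefore inheriting its assumed β = p/θ reparametrization) may clarify how the two levels of Dirichlet sampling nest; it is an illustration only.

    import numpy as np

    def simulate_nested(pi, theta1, theta2, group_sizes, rng=None):
        """p_j ~ D(pi, theta1); p_jk | p_j ~ D(p_j, theta2); n_jk ~ M(n_{+jk}, p_jk).

        group_sizes is an R x C array of totals n_{+jk}; returns an I x R x C count array.
        """
        rng = np.random.default_rng(rng)
        group_sizes = np.asarray(group_sizes)
        R, C = group_sizes.shape
        I = len(pi)
        counts = np.zeros((I, R, C), dtype=int)
        for j in range(R):
            p_j = dirichlet_via_gammas(pi, theta1, size=1, rng=rng)[0]      # row-level draw
            for k in range(C):
                p_jk = dirichlet_via_gammas(p_j, theta2, size=1, rng=rng)[0]
                counts[:, j, k] = rng.multinomial(group_sizes[j, k], p_jk)
        return counts

    # Example: I = 3 response categories, R = 2 rows, C = 4 replications of size 50
    tab = simulate_nested([0.2, 0.3, 0.5], theta1=0.1, theta2=0.05,
                          group_sizes=np.full((2, 4), 50), rng=0)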
We define the following notation for convenience:

π* = (π_1, ..., π_{I-1})',
p_j* = (p_{1j}, ..., p_{I-1,j})',
p_jk* = (p_{1jk}, ..., p_{I-1,jk})',
n_jk* = (n_{1jk}, ..., n_{I-1,jk})'.

Using the analogy to the nested random effects model of continuous data, we may represent the mean response vector n_{+jk} p_jk* in the k-th column category within the j-th row category as

n_{+jk} p_jk* = n_{+jk} [ π* + R_j + C_jk ],   (5.6)

where

R_j = p_j* − π*  and  C_jk = p_jk* − p_j*.   (5.7)

The two random vectors R_j and C_jk have zero means by their construction and variances given by

Var(R_j) = [ θ_1/(θ_1+1) ] [ D_π − π* π*' ],  Var(C_jk | p_j) = [ θ_2/(θ_2+1) ] [ D_{p_j} − p_j* p_j*' ],   (5.8)

where D_π = diag(π_1, ..., π_{I-1}) and D_{p_j} is similarly defined. We may note further that the two random vectors R_j and C_jk are uncorrelated, since

E[ p_jk* | π*, p_j* ] = p_j*,  so that  E[ C_jk | π*, p_j* ] = 0;   (5.9)

hence Cov(R_j, C_jk) = 0.
Equation (5.6) resolves the mean response vector into parts which may be regarded as the overall mean, row effects and column effects (within each row category). Also, by noting (5.8), the hypothesis of no row random effects (R_j = 0 for all j) is equivalent to H_R: θ_1 = 0, and similarly the hypothesis of no column random effects (within each row category) is equivalent to H_C: θ_2 = 0.

Since the response vector n_jk* is a random observation from M(n_{+jk}, p_jk), we may specify the nested random effects model of interest as

n_jk* = n_{+jk} [ π* + R_j + C_jk ] + ε_jk,   (5.10)

where ε_jk has mean vector 0 and covariance n_{+jk} [ D_{p_jk} − p_jk* p_jk*' ], and ε_jk, R_j and C_jk are independent, the latter two defined in (5.7) with zero mean vectors and variances as in (5.8). The model (5.10) is in complete analogy to the nested random effects model of continuous data, except for the normality assumptions, which are not valid here.

The joint distribution of {n_{ijk}} in a nested random effects model is not of tractable form. Nevertheless, it is mildly encouraging that one can specify the nested random effects model of count data in the explicit form (5.10). Even though we feel that the arguments of Chapter 4 may be similarly employed for the hypothesis tests H_R: θ_1 = 0 and H_C: θ_2 = 0, no results have been obtained at this time.
BIBLIOGRAPHY
Altham, Patricia M. E. (1978). Two generalizations of the binomial distribution. Applied Statistics 27, 162-7.
Ames, Bruce N., McCann, Joyce and Yamasaki, Edith
(1975). Methods for
detecting carcinogens and mutagens with the salmonella/mammalian-microsome mutagenecity test. Mutation Research 31, 347-64.
Bahadur, R. R. (1961). A representation of the joint distribution of
responses to n dichotomous items. In Studies on Item Analysis and
Prediction, H. Solomon (ed.), Stanford University, Stanford, California.
Blischke, W. R. (1962). Moment estimators for the parameters ofa mixture of two binomial distributions. Annals of Mathematical Statistics
33, 444-54.
Blischke, W. R. (1964). Estimating the parameters of mixtures of binomial distributions. Journal of the American Statistical Association
59, 510-28.
Brier, Stephen S. (1980). Analysis of contingency table under cluster
sampling. Biometrika 67, 591-6.
Chanda, K. C. (1954). A note on the consistency and maxima of the roots
of likelihood equations. Biometrika 41, 56-61.
Chandra, S. (1977). On the mixture of probability distributions. Scandinavian Journal of Statistics 4, 105-12.
Chatfield, C. and Goodhardt, G. J. (1970). The beta-binomial model for
consumer purchasing behavior. Applied Statistics 19, 240-50.
Chatterji, S. D. (1963). Some elementary characterizations of the Poisson distribution. American Mathematical Monthly 70, 958-64.
Chernoff, Herman (1954). On the distribution of the likelihood ratio.
Annals of Mathematical Statistics 25, 573-8.
Choi, K. and Bulgren, W. G. (1968). An estimation procedure for mixtures
of distributions. Journal of the Royal Statistical Society B 30, 44460.
Chung, Kai Lai (1974). A Course in Probability Theory. Academic Press, New York.
Collings, Bruce J. and Margolin, Barry H. (1983). Testing of fit for
the Poisson assumption when observations are not identically distributed. Submitted for Journal of the American Statistical Association.
Cramer, Harald (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton.

Crowder, Martin J. (1978). Beta-binomial anova for proportions. Applied Statistics 27, 34-7.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal
Statistical Society B 39, 1-38.
Deely, J. J. and Kruse, R. L. (1968). Construction of sequences estimating the mixing distribution. Annals of Mathematical Statistics 39,
286-88.
Efron, B. and Hinkley, D. V. (1978). The observed versus expected information. Biometrika 65, 581-90.

Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. Chapman and Hall, London.
Feder, Paul I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is "near" the boundaries of the hypothesis regions. Annals of Mathematical Statistics 39, 2044-55.

Feder, Paul I. (1978). The beta binomial likelihood ratio test --- with application to the analysis of toxicological data. Unpublished manuscript, National Institute of Environmental Health Sciences, Research Triangle Park.

Feller, W. (1943). On a general class of "contagious" distributions. Annals of Mathematical Statistics 14, 389-400.
Feller, William (1968). An Introduction to Probability Theory and Its Applications. John Wiley and Sons, New York.

Fienberg, Stephen E. (1975). Comment on "The observational study --- a review" by Sonya McKinlay. Journal of the American Statistical Association 70, 521-3.

Fraser, D. A. S. (1957). Nonparametric Methods in Statistics. John Wiley and Sons, New York.
Greenwood, M. and Yule, G. U. (1920). An inquiry into the nature of
frequency distributions representative of multiple happenings with
particular reference to the occurrence of multiple attacks of disease
or of repeated accidents. Journal of the Royal Statistical Society
A 83, 255-79.
Griffiths, D. A. (1973). Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29, 637-48.
Goodman, Leo A. and Kruskal, William H. (1954). Measures of association for cross classifications. Journal of the American Statistical
Association 49, 732-64.
Goodman, Leo A. and Kruskal, William H. (1959). Measures of association for cross classifications. II: Further discussion and reference. Journal of the American Statistical Association 54, 123-63.

Hardy, G. H., Littlewood, J. E. and Polya, G. (1964). Inequalities. Cambridge University Press, Cambridge.
Hasselblad, V. (1969). Estimation of finite mixtures of distributions
from the exponential family. Journal of the American Statistical
Association 64, 1459-71.
Haseman, J. K. and Kupper, L. L. (1979). An analysis of dichotomous
response data from certain toxicological experiments. Biometrics 35,
281-93.
Haseman, J. K. and Soares, E. R. (1976). The distribution of fetal
death in control mice and its implications on statistical tests for
dominant lethal effects. Mutation Research 41, 277-88.
Hill, David L., Saunders, Roy and Laud, Purushottam W. (1980). Maximum likelihood estimation for mixtures. Canadian Journal of Statistics 8, 87-93.

Jewell, Nicholas P. (1982). Mixtures of exponential distributions. Annals of Statistics 10, 479-84.

Johnson, Norman L. and Kotz, Samuel (1969). Discrete Distributions. John Wiley and Sons, New York.

Johnson, Norman L. and Kotz, Samuel (1977). Urn Models and Their Application. John Wiley and Sons, New York.
Kabir, A. B. M. L. (1968). Estimation of parameters of a finite mixture of distributions. Journal of the Royal Statistical Society B 30, 472-82.
Kemp, C. D. and Kemp, Adrienne W. (1956). The analysis of point quadrat data. Australian Journal of Botany 4, 167-74.
Kiefer, Nicholas M. (1978). Discrete parameter variation: Efficient
estimation of a switching regression model. Econometrica 46,427-34.
Kupper, L. L. and Haseman, J. K. (1978). The use of a correlated binomial model for the analysis of certain toxicological experiment.
Biometrics 34, 69-76.
Laird, N. (1978). Nonparametric maximum likelihood estimation of a
mixing distribution. Journal of the American Statistical Association 73, 805-11.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley and Sons, New York.
Light, Richard J. and Margolin, Barry H. (1971). An analysis of variance for categorical data. Journal of the American Statistical Association 66, 534-44.
Lindsay, Bruce G. (1983). The geometry of mixture likelihoods: A general theory. Annals of Statistics 11, 86-94.
Louis, Thomas A. (1982). Finding the observed information matrix when
using the EM algorithm. Journal of the Royal Statistical Society B
44, 226-33.
Margolin, Barry H., Kaplan, Norman and Zeiger, Errol
(1981). Statistical analysis of the Ames salmonella/microsome test. Proceedings of
the National Academy of Sciences 78, 3779-83.
Margolin, Barry H. and Light, Richard J. (1974). An analysis of variance for categorical data, II: Small sample comparisons with chi square and other competitors. Journal of the American Statistical Association 69, 755-64.

Mosimann, James E. (1962). On the compound multinomial distribution, the multivariate beta-distribution, and correlations among proportions. Biometrika 49, 65-82.
Moran, P. A. P. (1970). On asymptotically optimal tests of composite hypotheses. Biometrika 57, 47-55.

Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.

Neyman, J. (1947). Outline of statistical treatment of the problem of diagnosis. Public Health Reports 62, 1449-56.

Neyman, Jerzy (1959). Optimal asymptotic tests of composite hypotheses. In Probability and Statistics: The Harald Cramér Volume, ed. Ulf Grenander. John Wiley and Sons, New York.
Orchard, T. and Woodbury, M. A. (1972). A missing information principle:
Theory and application. Proceedings of Sixth Berkeley Symposium on
Mathematical Statistics and Probability 1, 697-715.
Paul, S. R. and Plackett, R. L. (1978). Inference sensitivity for Poisson mixtures. Biometrika 65, 591-602.
Pearson, K. (1894). Contributions to the Mathematical Theory of Evolution. Philosophical Transactions of Royal Society of London A 185
71-110.
Pearson, K. (1915). On certain types of compound frequency distributions
in which the components can be individually described by binomial series. Biometrika 11, 139-44.
Potthoff, Richard F. and Whittinghill, Maurice (1966 a). Testing for homogeneity I: The binomial and multinomial distributions. Biometrika 53, 167-82.

Potthoff, Richard F. and Whittinghill, Maurice (1966 b). Testing for homogeneity II: The Poisson distribution. Biometrika 53, 183-90.
Puri, Madan Lal and Sen, Pranab Kumar (1971). Nonparametric Methods
in Multivariate Analysis. John Wiley and Sons, New York.
Rider, Paul
R. (1961 a). The method of moments applied to a mixture
of two exponential distributions. Annals of Mathematical Statistics
32, 143-7.
Rider, Paul R. (1961 b). Estimating the parameters of mixed Poisson,
binomial and Weibull distributions by the method of moments. Bulletin
of the International Statistical Institute 39 Part 2, 225-32.
Ronning, Gerd. (1982). Characteristic values and triangular factorization of the covariance Matrix for multinomial, Dirichlet and multivariate hypergeometric distributions and some related results. Statistische Hefte 23, 152-76.
Roy, S. N., Greenberg, B. G. and Sarhan, A. E. (1960). Evaluation of
determinants, characteristic equations, and their roots for a class
of patterned matrices. Journal of the Royal Statistical Society B
22, 348-59.
Segreti, Anthony C. and Munson, Albert E. (1981). Estimation of the
median lethal dose when responses within a litter are correlated.
Biometrics 37, 153-6.
Scheffé, Henry (1959). The Analysis of Variance. John Wiley and Sons, New York.
Shuster, J. J. and Downing, D. J. (1976). Two-way contingency tables
for complex sampling schemes. Biometrika 63, 271-6.
Simar, Leopold (1976). Maximum likelihood estimation of a compound
Poisson process. Annals of Statistics 4, 1200-9.
Skellam, J. G. (1948). A probability distribution derived from the
binomial distribution by regarding the probability of success as
variable between the sets of trials. Journal of the Royal Statistical Society B 10, 257-61.
Stroud, T. W. F. (1971). On obtaining large-sample tests from asymptotically normal estimators. Annals of Mathematical Statistics 42,
1412-24.
'Student' (1907). On the error of counting with a hemacytometer.
metrika 5, 351-60.
Bio-
Sundberg, R. (1974). Maximum likelihood theory for incomplete data
from an exponential family. Scandinavian Journal of Statistics 1,
49-58.
Symons, M. J., Grimson, R. C. and Yuan, Y. C. (1983). Clustering of rare events. Biometrics 39, 193-205.

Tarone, R. E. (1979). Testing the goodness of fit of the binomial distribution. Biometrika 66, 585-90.
Tarone, Robert E. and Gruenhage, Gary (1975). A note on the uniqueness
of roots of the likelihood equations for vector-valued parameters.
Journal of the American Statistical Association 70, 903-4.
Teicher, H. (1963). Identifiability of finite mixtures. Annals of Mathematical Statistics 34, 1265-9.
Wilks, S. S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9, 60-2.

Wilks, Samuel S. (1962). Mathematical Statistics. John Wiley and Sons, New York.
Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-52.
Wisniewski, T. K. M. (1968). Testing for homogeneity of a binomial series. Biometrika 55, 426-8.
Wolf, J. H. (1970). Pattern clustering by multivariate mixture analysis.
Multivariate Behavioral Research 5, 329-50.
Wu, C. F. Jeff (1983). On the convergence property of the EM algorithm.
Annals of Statistics 11, 95-103 .