Tolley, H. D. (1974). A two-stage approach to the analysis of longitudinal type categorical data.

This research was partially supported under Research Grants
GM-70004-04 and GM-00038-20 from the National Institute of
General Medical Sciences.
A TWO-STAGE APPROACH TO THE ANALYSIS OF
LONGITUDINAL TYPE CATEGORICAL DATA
by
H. Dennis Tolley and Gary G. Koch
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 962
OCTOBER 1974
ABSTRACT
HAROLD DENNIS TOLLEY. A Two-Stage Approach to the Analysis of
Longitudinal Type Categorical Data.
(Under the direction of
GARY G. KOCH.)
This research deals with the analysis of experiments in which categorical data are collected longitudinally in time or space, using a two-stage procedure of analysis.
In the first stage one selects a module, or subset of the experiment which, when considered alone, can be modeled reasonably with a likelihood model containing very few parameters. Parameters of modules are then estimated with maximum likelihood techniques by considering each "partial" likelihood individually. In the second stage, the parameters of the entire experiment are estimated and any relevant tests are made by combining the parameter estimates of each module by weighted least squares. Examples illustrating this procedure are given.
The maximum likelihood estimates for a module can be considered as implicitly defined functions of the observed proportions of the experiment. Such well-defined estimates can be used to form Neyman minimum-$\chi^2$ or linearized Wald statistics under certain regularity conditions. Such statistics resemble a generalization of the formulation given by Grizzle, Starmer, and Koch in 1969, and retain the same unity in the construction of test statistics. Thus, in this setting, calculations for investigating various relationships of the module parameters are kept simple.
From the hierarchy of possible module levels, the investigator must make an appropriate choice. Criteria for making this selection are also investigated.
... O the vainness, and the frailties, and
the foolishness of men! When they are learned
they think they are wise, and they hearken not
unto the counsel of God, for they set it aside,
supposing they know of themselves, wherefore,
their wisdom is foolishness and it profiteth
them not. And they shall perish.
But to be learned is good if they hearken
unto the counsels of God.
(2 Nephi 9:28,29)
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER
I. INTRODUCTION AND LITERATURE REVIEW
   1.1 Introduction
   1.2 Dilution Series and Survival
   1.3 A Clumped Binomial Model
   1.4 A Negative Binomial Problem
   1.5 Modules
   1.6 Philosophy Underlying Analysis
   1.7 Categorical Data
      1.7.1 Introduction
      1.7.2 Notation
      1.7.3 Equivalence of Two Test Criteria
      1.7.4 Weighted Least Squares
      1.7.5 Examples
   1.8 Proposal
II. SOME EXTENSIONS TO CATEGORICAL DATA METHODOLOGY
   2.1 Introduction
   2.2 Implicitly Defined Functions
   2.3 A Note on the Variance Estimate
   2.4 Non-Linear Models
   2.5 Correlated Multinomial Models
   2.6 Discussion
III. SECOND ORDER VARIANCE CONSIDERATIONS
   3.1 Introduction
   3.2 The Model
   3.3 First and Second Order Variances
      3.3.1 Preliminary Remarks
      3.3.2 Comparison of ${}_1C^{(1)}_{jkm}(\pi_0)$ and ${}_2C^{(1)}_{jkm}(\pi_0)$
      3.3.3 Calculation of ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$
      3.3.4 Discussion
   3.4 Extensions to the Model $h(\phi) = X\theta$ for Level One
   3.5 Discussion
IV. SOME EXAMPLES
   4.1 Introduction
   4.2 A Dilution Experiment
      4.2.1 Introduction
      4.2.2 The Experiment
      4.2.3 Module Level One
      4.2.4 Module Level Two
      4.2.5 Module Level Three
      4.2.6 Module Level Four
      4.2.7 Discussion
   4.3 The Clumped Binomial
      4.3.1 Introduction
      4.3.2 Methodology
      4.3.3 An Example
   4.4 Animal Dispersion and the Negative Binomial Distribution: A Multivariate Extension
      4.4.1 Introduction
      4.4.2 The Model
      4.4.3 An Example
      4.4.4 Discussion
V. DISCUSSION
LIST OF TABLES
TABLE
1.1 Survival of …
1.2 Depletion Data
4.1 Biases of MLE and WLS for Dilution Experiment
4.2 Estimated Parameters and Goodness-of-Fit Statistics for Within Experiment Exponential Decay Models
4.3 Tests of Hypotheses for Comparisons of Intercept and Slope
4.4 Estimated Parameters and Standard Errors for Final Model
4.5 Test of Hypotheses for Final Model
4.6 Estimated Intercept and Slope Parameters from Maximum Likelihood Analysis
4.7 Estimated Parameters, Standard Errors, and Tests of Significance for Final Model Fit of Maximum Likelihood Intercept and Slope Parameters
4.8 Estimated Parameters and Standard Errors for Pure Maximum Likelihood Fit of "Final Model"
4.9 Depletion Data
4.10 Depletion Data Estimates
4.11 Parameter Estimates for Depletion Data
4.12 Core Sample Counts of Benthic Invertebrates in Three Zones
4.13 Fits to the Negative Binomial
4.14 Tests of Hypotheses for Comparison of 'k'
CHAPTER I
INTRODUCTION AND REVIEW OF LITERATURE
1.1 Introduction
Often studies are undertaken which involve data collected longitudinally in time or space. The underlying model for such data is some process (possibly stochastic). Interest often centers on estimating the parameters of this process. In areas such as biological and medical studies, the interest is not only in the estimate associated with this process, but also in comparing such estimates with those from other processes. One must then estimate the parameters corresponding to each process relevant to the study and use them to generate meaningful tests.
Much work has been done in investigating the mathematical properties of stochastic processes. Recently, however, the area of time series analysis (i.e., the methodology for estimation and statistical analysis of a stochastic process) has received increasing attention. Basic results in this area may be found in books such as Anderson [1971], Cox and Lewis [1966], and Billingsley [1961].
These results use data collected in one of three ways. Observations on a continuous or discrete process at discrete time points are used in analyzing processes such as moving average or autoregressive processes. Analyses of renewal processes or failure time processes use continuous observations of the process. The third type of data are observations (continuous or discrete) from a discrete outcome space, such as an observed Markov chain. From these data, tests related to the process are developed.
In many studies the nature of the experiment makes it impossible to follow one of the above sampling plans. Instead, a cluster of observations is taken at discrete points in time. For example, the n-th data point may be the number of failures of a homogeneous set of individuals between times $t_{n-1}$ and $t_n$. The sampling procedure may be further complicated by an inability to observe the entire available set of outcomes even at discrete points in time. For example, in studying bacterial decay or growth one could not hope to enumerate all the bacteria at any given time point. In this type of problem one usually samples a subset of the bacteria alive at time t, say. In such studies the (discrete) distribution of the sampled data is a function of both the underlying stochastic process and the sampling process. Parameters of this distribution may admit a relevant analytic study of the underlying process.
We consider hereafter populations whose underlying processes (either stochastic or deterministic) are sampled by this last procedure. Hence, data for these populations are subject to error both because of the stochastic nature of the process (if any) and because of the sampling procedure.
In many problems (see the example sections below) one is primarily interested in comparing processes from different populations. Experiments of this type include studies in which knowledge is desired about the different effects of the known conditions giving rise to the different processes. In such a case one hopes that the information available in the sample distribution can be used to explain variation among the underlying processes. For example, if one wishes to determine the effect of a treatment over a control by a difference in decay rates of the corresponding populations, the sample distributions must contain a parameter analogous to the decay rate parameter of the underlying processes. Very little research has been done on tests of hypotheses among several processes; the relevant work in this area is referenced below.
The purpose of this thesis is to present and illustrate a procedure useful in comparing differences in underlying processes. Conditions under which the process must be sampled as described above are seen in the examples given below. These examples will be used to illustrate how meaningful inferences on the populations may be made from the parameters of the sample distribution.
1.2 Dilution Series and Survival
Many situations in public health studies involve the estimation of bacterial density in a solution. Moreover, in certain applications, such statistical procedures are further complicated because the estimation of the density at a given time point is but one module of a larger experiment concerned with decay rates or extinction times of bacteria. These experiments require one to consider models involving density estimates at several points in time.
As indicated by Finney [1964], the two major procedures for bacteria enumeration are the colony count method and the quantal response method. The colony count method assumes that the progeny of each bacterium grow in discernible colonies, which are counted after an incubation time; estimates of the density are formed from these counts. Because not all bacteria are suitable for colony count methods, quantal response methods have been used to form a variety of estimates. Although the procedure illustrated in the sequel is potentially applicable to colony count methods, we will consider only the quantal response method. Data of this second type are generated by inoculating several sterile tubes (or plates) with each aliquot taken from a sequence of serial dilutions of the original solution. Density estimates are derived from the number of fertile (positive) tubes (i.e., tubes showing growth after incubation). Cornell and Speckman [1967] review this statistical problem in detail; their conclusions indicate that the maximum likelihood estimate has satisfactory properties for both large and small sample sizes in such experiments.
In the enumeration of bacteria by the quantal response method three assumptions are made. First, one assumes that the bacteria are uniformly distributed throughout the solution. The second assumption concerns the probability of growth of a bacterium inoculated into a tube. Worcester [1954], for example, has considered several different models for the probability of response. For simplicity, however, we will assume that growth in a medium will ensue upon inoculation of even a single bacterium. The third assumption is that $\lambda$, the mean number of bacteria per unit volume, is constant throughout the population.
Because the exact value of $\lambda$ is unknown, experimenters often use a series of dilutions of the original solution to prepare inoculants spanning a predetermined range within which $\lambda$ should lie. If there are q dilutions, $z_1, \ldots, z_q$, of the original solution ($z_i$ being the volume of original solution contained in each tube inoculated at dilution i), and $n_i$ tubes are inoculated with the i-th dilution, the likelihood function for the vector $\mathbf r = (r_1, \ldots, r_q)$ of fertile tubes is given in (1.1) according to the assumptions above.
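$$L(\lambda; \mathbf r) = \prod_{i=1}^{q} \binom{n_i}{r_i} \left(e^{-\lambda z_i}\right)^{n_i - r_i} \left(1 - e^{-\lambda z_i}\right)^{r_i} \qquad (1.1)$$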
(For a discussion of the design aspects of a dilution series, the reader is referred to Cochran [1973].)
The estimate of $\lambda$ we will use is the maximum likelihood estimate, or "most probable number" (MPN) as named by McCrady [1915]. Maximization of (1.1) is not trivial; in fact, iterative methods must be used to reach a solution. Algorithms for doing this are given by Peto [1953] and Finney [1964]. For $q \ge 2$, since the derivative of the logarithm of (1.1) with respect to $\lambda$ is a monotonic decreasing function of $\lambda$, the MPN is easily found by successive approximation on a computer.
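As a minimal sketch of the successive-approximation idea (not the algorithms of Peto [1953] or Finney [1964]), the Python function below finds the MPN by bisecting on the sign of the score of (1.1). The names `mpn_score` and `mpn`, the search bracket $[10^{-9}, 10^{9}]$, and the reading of $z_i$ as the volume of original solution per tube are assumptions of this illustration.

```python
import math

def mpn_score(lam, z, n, r):
    """Derivative of log (1.1) with respect to lambda.

    Equals sum_i [ r_i z_i / (1 - exp(-lam z_i)) - n_i z_i ], which decreases
    in lam, so it has at most one root.
    """
    return sum(ri * zi / -math.expm1(-lam * zi) - ni * zi
               for zi, ni, ri in zip(z, n, r))

def mpn(z, n, r, lo=1e-9, hi=1e9, rel_tol=1e-12):
    """Most probable number: the root of mpn_score, found by bisection.

    z[i]: volume of original solution per tube at dilution i (assumed meaning)
    n[i]: tubes inoculated at dilution i; r[i]: fertile tubes at dilution i
    The root is assumed to lie inside [lo, hi].
    """
    if sum(r) == 0:
        return 0.0                      # no growth anywhere: the MLE is 0
    if all(ri == ni for ri, ni in zip(r, n)):
        return math.inf                 # every tube fertile: no finite MLE
    for _ in range(200):
        mid = math.sqrt(lo * hi)        # bisect on a log scale
        if mpn_score(mid, z, n, r) > 0:
            lo = mid                    # score still positive: root lies above
        else:
            hi = mid
        if hi / lo < 1 + rel_tol:
            break
    return math.sqrt(lo * hi)

# Hypothetical three-dilution series with five tubes per dilution.
lam_hat = mpn([0.1, 0.01, 0.001], [5, 5, 5], [5, 3, 1])
```

For a single dilution the root reduces to $\hat\lambda = -\log(1 - r_1/n_1)/z_1$, which gives a quick check of the routine.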
Survival curves from data collected at different points in time have been examined by past researchers for the case q = 1. Mather [1949] used simple density estimates of bacteria exposed to a bactericide for x = 12(2)36 minutes (i.e., from 12 to 36 minutes in steps of 2). At the end of x minutes the proportion $\pi_x$ of sterile tubes was estimated. Mather then applied the exponential decay model $\log(-\log \pi_x) = \mu + \beta x$ to the observed results. Epstein [1967] gave a theoretical justification for this heuristic analysis by considering extinction time as an extreme value problem.
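Under the assumptions behind (1.1), a single dilution with volume $z$ gives $\pi_x = \exp(-z\lambda_x)$, so if $\lambda_x = e^{\mu + \beta x}$ then $\log(-\log \pi_x) = \log z + \mu + \beta x$ is linear in x, with the intercept absorbing $\log z$. The sketch below fits this line by ordinary least squares; the function name and the use of unweighted least squares are illustrative assumptions, not Mather's exact procedure.

```python
import numpy as np

def fit_log_log(x, pi_hat):
    """Least-squares fit of log(-log pi_x) = mu + beta * x (one dilution per time).

    x: exposure times; pi_hat: observed proportions of sterile tubes, each
    strictly between 0 and 1 so that log(-log pi_hat) is finite.
    """
    x = np.asarray(x, dtype=float)
    pi_hat = np.asarray(pi_hat, dtype=float)
    if np.any((pi_hat <= 0) | (pi_hat >= 1)):
        raise ValueError("proportions must lie strictly between 0 and 1")
    y = np.log(-np.log(pi_hat))
    beta, mu = np.polyfit(x, y, 1)      # polyfit returns [slope, intercept]
    return mu, beta

# Hypothetical noise-free proportions on the grid x = 12(2)36.
x = np.arange(12, 37, 2)
pi = np.exp(-np.exp(-3.0 + 0.1 * x))
mu_hat, beta_hat = fit_log_log(x, pi)   # approximately -3.0 and 0.1
```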
When $\lambda$ is assumed to have a distribution throughout the population, an assumption made in other areas of bioassay, the parameters of this distribution may be functions of time. For example, Harris [1958] assumes that $\lambda_x$ at time x is distributed over the population according to the gamma distribution, with parameters that are functions of time. Both of these researchers base their estimates of survival curves on one observed dilution per time point.
In the following we illustrate the use of the exponential decay model for more than one observed dilution per time point. In an experiment undertaken by Schiemann [1972], three dilutions were used to estimate $\lambda$ at each time point (see Table 1.1). Schiemann wished to determine the effect of pH and temperature on decay rate. He thus collected data from five independent decay processes, each at different pH/temperature conditions.
In light of the introduction we may set Schiemann's problem as follows. In the j-th population the death process is Poisson with parameter $\beta_j$. Hence the time until death, or survival time, is the translated exponential $e^{\mu_j + \beta_j x}$. Due to the sampling process described above, the likelihood of $\mathbf r$ for a fixed time x is given by

$$\prod_{i=1}^{3} \binom{n_i}{r_i} \left[\exp\{-z_i \exp(\mu_j + \beta_j x)\}\right]^{n_i - r_i} \left[1 - \exp\{-z_i \exp(\mu_j + \beta_j x)\}\right]^{r_i}, \qquad (1.2)$$

that is, (1.1) with $\lambda = \exp(\mu_j + \beta_j x)$. Although this looks quite different from the Poisson process assumed, the parameter of interest, $\beta_j$, appears in (1.2). More about this problem, including estimation of survival curves and comparisons of the $\beta_j$, will be given in Chapter 4.
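Model (1.2) is a binomial model with complementary log-log link, since $\log\{-\log(1 - P(\text{fertile}))\} = \log z_i + \mu_j + \beta_j x$, with $\log z_i$ entering as a known offset. The sketch below uses this reading to maximize the product of (1.2) over the time points of one population by Fisher scoring. It is an illustration under that reading, not the estimation procedure of Chapter 4; the function names and the starting-value rule are assumptions.

```python
import numpy as np

def _scoring_pieces(coef, X, offset, n, r):
    """Score vector and Fisher information for the cloglog binomial model."""
    eta = offset + X @ coef
    p = np.clip(-np.expm1(-np.exp(eta)), 1e-12, 1 - 1e-12)   # P(tube fertile)
    dp = np.exp(eta - np.exp(eta))                            # dp / d(eta)
    score = X.T @ ((r - n * p) * dp / (p * (1 - p)))
    info = X.T @ (X * (n * dp**2 / (p * (1 - p)))[:, None])
    return score, info

def fit_decay(x, z, n, r, max_iter=100, tol=1e-10):
    """Estimate (mu, beta) in (1.2); each row is one (time, dilution) pair.

    x: elapsed time, z: inoculum volume, n: tubes inoculated, r: fertile tubes.
    Returns the estimates and the inverse Fisher information at the estimate.
    """
    x, z, n, r = (np.asarray(a, dtype=float) for a in (x, z, n, r))
    X = np.column_stack([np.ones_like(x), x])
    offset = np.log(z)
    # Starting values: least squares on the empirical cloglog scale.
    p0 = (r + 0.5) / (n + 1.0)
    coef = np.linalg.lstsq(X, np.log(-np.log1p(-p0)) - offset, rcond=None)[0]
    for _ in range(max_iter):
        score, info = _scoring_pieces(coef, X, offset, n, r)
        step = np.linalg.solve(info, score)
        coef = coef + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = _scoring_pieces(coef, X, offset, n, r)
    return coef, np.linalg.inv(info)
```

Because the log-likelihood of this model is concave in $(\mu_j, \beta_j)$, scoring from a reasonable start is expected to converge; the inverse information serves as an approximate covariance matrix for the two estimates.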
TABLE 1.1
SURVIVAL OF …
Experiment P2: pH 8.0, T = 20°C
Experiment T1: pH 7.4, T = 20°C
Experiment T2: pH 7.4, T = 25°C
Experiment T3: pH 7.4, T = 30°C
A fifth experiment at T = 20°C
Columns: elapsed time (hrs.); no. of fertile tubes for each dilution factor volume; estimated density $\hat\lambda$; estimated s.e. for $\hat\lambda$; $\hat y = \log(\hat\lambda)$; estimated s.e. for $\hat y$

1.3 A Clumped Binomial Model
As a second example we consider the data in Table 1.2, first presented by Kastenbaum and Lamphiear [1959]. In this data set the number of deaths in a litter of mice before weaning was observed for two different treatments across several different litter sizes. For each treatment-litter size combination, three possible outcomes, zero, one, or two or more depletions, were considered.

TABLE 1.2
DEPLETION DATA
(see Kastenbaum and Lamphiear [1959])

Litter                Number of Depletions
Size    Treatment      0      1     2+
 7          A         58     11      5
            B         75     19      7
 8          A         49     14     10
            B         58     17      8
 9          A         33     18     15
            B         45     22     10
10          A         15     13     15
            B         39     22     18
11          A          4     12     17
            B          5     15      8

Interest was in the interaction of treatment and litter size. Grizzle et al. [1969] also considered these data, in the light of their linear models framework, and found no significant interaction, a result agreeing with previous researchers.
Suppose we assume that the number of depletions for the j-th treatment-litter size combination follows a binomial distribution with parameter $\pi_j$. Then, because only the categories zero, one, and two or more depletions are recorded, we have a clumped binomial distribution for the sampling distribution characterizing the observations of each treatment-litter size combination. When this assumption is correct, the estimate of $\pi_j$, say $p_j$, should account for most of the variation in the j-th margin. An analysis based on these estimates will admit a test for interaction, as was initially intended.
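For a litter of size m, the clumped binomial assigns probabilities $(1-\pi)^m$, $m\pi(1-\pi)^{m-1}$, and the remainder to the outcomes zero, one, and two or more depletions. A minimal sketch of the maximum likelihood estimate $p_j$ for one treatment-litter size combination follows; the golden-section search and the function names are illustrative choices, and unimodality of the likelihood on (0, 1) is assumed rather than proved.

```python
import math

def clumped_probs(pi, m):
    """P(0), P(1), P(2 or more) depletions when depletions ~ Binomial(m, pi)."""
    p0 = (1 - pi) ** m
    p1 = m * pi * (1 - pi) ** (m - 1)
    # Sum the upper tail directly so P(2+) never goes negative through rounding.
    p2 = sum(math.comb(m, d) * pi ** d * (1 - pi) ** (m - d) for d in range(2, m + 1))
    return p0, p1, p2

def clumped_loglik(pi, m, counts):
    """Log-likelihood kernel for counts (c0, c1, c2+) of litters of size m."""
    total = 0.0
    for c, p in zip(counts, clumped_probs(pi, m)):
        if c > 0:
            total += c * math.log(p) if p > 0 else -math.inf
    return total

def clumped_mle(m, counts, tol=1e-10):
    """Golden-section search for the MLE of pi, assuming a unimodal likelihood."""
    a, b = 1e-9, 1 - 1e-9
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if clumped_loglik(c, m, counts) > clumped_loglik(d, m, counts):
            b = d                       # maximum lies in [a, d]
        else:
            a = c                       # maximum lies in [c, b]
    return (a + b) / 2

# Treatment A, litter size 7 in Table 1.2: 58, 11, and 5 litters.
pi_hat = clumped_mle(7, (58, 11, 5))
```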
1.4 A Negative Binomial Problem
Often experiments and studies in biology, especially ecology, are evaluated by counts of individuals per unit of space. More complex experiments involve comparisons of these count rates in different populations. In this case, estimation of the parameters of an assumed random count model for a particular population is only the beginning of the statistical analysis. In studies of this kind one must consider models involving the parameter estimates of the count processes for each of the different populations.

In many situations the alternative to the notion of uniformly distributed individuals on a plot is that of overdispersion. Overdispersed populations are characterized by a variance of counts exceeding the mean. Typically such a population has both 'clumps' of individuals and areas of sparsity. When overdispersion is suspected, one usually specifies a model from one of the "contagious" distributions (Neyman [1939]). The importance of these distributions in modeling overdispersed populations is discussed by several researchers (see Evans [1953], Beall and Rescia [1953], Beall [1940], Williams [1964], Holgate [1966], and Elliott [1971]).
One very important overdispersion model is the negative binomial distribution. This distribution, given in (1.3) with parameters p and k (see Johnson and Kotz [1969]), can arise from a variety of situations (see Patil [1970]).

$$f(X = r) = \frac{\Gamma(k + r)}{r!\,\Gamma(k)}\, p^{r} (1 - p)^{k}, \qquad r = 0, 1, 2, \ldots \qquad (1.3)$$
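As a quick numerical check on (1.3), read with $p^{r}(1-p)^{k}$ as written above, the sketch below evaluates the probabilities on the log scale and confirms that the variance exceeds the mean, which is the overdispersion described above. Under this reading the mean is $kp/(1-p)$ and the variance $kp/(1-p)^2$; the parameter values are hypothetical.

```python
import math

def nb_pmf(r, k, p):
    """f(X = r) from (1.3): Gamma(k+r) / (r! Gamma(k)) * p**r * (1-p)**k."""
    log_f = (math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
             + r * math.log(p) + k * math.log1p(-p))
    return math.exp(log_f)

k, p = 2.5, 0.4                         # hypothetical parameter values
probs = [nb_pmf(r, k, p) for r in range(200)]   # tail beyond r = 199 is negligible
mean = sum(r * f for r, f in enumerate(probs))
var = sum((r - mean) ** 2 * f for r, f in enumerate(probs))
# Under this reading: mean = k p / (1 - p), variance = k p / (1 - p)**2 > mean.
```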
When sampling individuals confined to discrete habitable sites, or when sampling quadrats from a continuum, the count of individuals per sampling unit may be distributed as a negative binomial (Pielou [1967]). Estimation of the parameters and fitting of the data to this model have been discussed by several authors (Bliss and Fisher [1953], Katti and Gurland [1962], Martin and Katti [1965], and Pahl [1969]). We do not discuss the merits of the different procedures here but adhere to the maximum likelihood method. Although this requires an iterative procedure for solution, simple programs for a two-dimensional search are usually available. We use a program adapted for this type of problem by Gillings [1972].
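Gillings's program is not reproduced here. As a hedged alternative, the two-dimensional search can be reduced to one dimension: for fixed k, the likelihood from (1.3) is maximized at $p = \bar r/(k + \bar r)$, which makes the fitted mean $kp/(1-p)$ equal the sample mean $\bar r$. The sketch below then maximizes this profile log-likelihood over $\log k$ by golden-section search. The counts are hypothetical, the bracket $10^{-3} \le k \le 10^{3}$ is an assumption, and a finite maximum exists only when the data are overdispersed.

```python
import math

def nb_profile_loglik(k, counts):
    """Log-likelihood from (1.3) with p set to its MLE for this k."""
    n = len(counts)
    xbar = sum(counts) / n              # requires at least one nonzero count
    p = xbar / (k + xbar)
    return (sum(math.lgamma(k + r) - math.lgamma(r + 1) for r in counts)
            - n * math.lgamma(k)
            + sum(counts) * math.log(p) + n * k * math.log1p(-p))

def nb_mle(counts, lo=1e-3, hi=1e3, tol=1e-9):
    """Golden-section search on log k; returns (k_hat, p_hat)."""
    def f(t):
        return nb_profile_loglik(math.exp(t), counts)
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5) - 1) / 2
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    k = math.exp((a + b) / 2)
    xbar = sum(counts) / len(counts)
    return k, xbar / (k + xbar)

counts = [0, 0, 0, 1, 1, 2, 3, 5, 8, 0, 0, 4, 12, 1, 0, 2]   # hypothetical quadrat counts
k_hat, p_hat = nb_mle(counts)
```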
One hypothesis of interest when comparing populations deals with the difference in dispersion. The use of dispersion indexes has been discussed in the literature (Bateman [1950], Thomas [1951], Skellam [1952], and Shelby [1965]). As quoted by Williams [1964], Hunter and Quenouille [1952] suggest that the parameter k in the negative binomial distribution can be used as an index of dispersion of the population: the larger the value of k, the more nearly uniformly distributed the population. Elliott [1971] discusses the use of k as an index along with other commonly used indexes of dispersion. His conclusion is that, if a sampled population is indeed distributed as a negative binomial, use of the estimate of k as an estimate of a dispersion index is justified. Hunter and Quenouille [1952] use the results of their fitted distributions to conclude that there is a difference in the dispersion of parasites between sheep grazing on "heather hill" and those grazing on "pasture." Statistics for such hypotheses can be generated by transformations of the data that attempt to yield a normal distribution (see Johnson and Kotz [1969] and Anscombe [1948]).
Statistical procedures based on untransformed data are our area of application. Work in this area has been done by Bliss and Owen [1958] and Hinz and Gurland [1968]. Bliss and Owen [1958] give a procedure for estimating the parameter k when it is common to several populations, and give a test for the validity of this assumption.
Let

$$x_i = \bar u_i^{\,2} - \frac{s_i^{2}}{N_i}, \qquad y_i = s_i^{2} - \bar u_i, \qquad (1.4)$$

where $\bar u_i$ is the sample mean and $s_i^2$ is the sample variance of the i-th population based on a sample of size $N_i$. Then, according to these researchers, if we linearly regress y on x, constraining the intercept to be zero, the regression coefficient of x is an estimate of $1/k$. (Since the negative binomial variance is $\mu_i + \mu_i^2/k$, $y_i$ estimates $\mu_i^2/k$ while $x_i$ estimates $\mu_i^2$.)
Bliss and Owen use weighted least squares regression with weights $w_i$ estimated by an iterative formula given in their paper. From these results, procedures for estimating the common parameter k are given. Two tests of the common-k assumption are also given. The first is an initial test of homogeneity: if the populations have a common k, $X^2$ in (1.5) is approximately chi-square in large samples with $g - 1$ degrees of freedom ($g$ is the number of populations).

$$X^2 = \sum_i w_i y_i^2 - \frac{\left(\sum_i w_i x_i y_i\right)^2}{\sum_i w_i x_i^2} \qquad (1.5)$$
A second test of validity, with 1 degree of freedom, may be split off from this approximate chi-square. This test corresponds to testing whether the intercept of the regression line is zero. A significantly non-zero intercept when (1.5) is nonsignificant indicates a progressive change in k.
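A sketch of the moment regression (1.4) through the origin and the statistic (1.5) follows. Bliss and Owen's iterative weight formula is not reproduced; the weights $w_i$ are left as an input (equal by default), so with placeholder weights $X^2$ should not be referred to the chi-square distribution. The function name `bliss_owen` is hypothetical.

```python
def bliss_owen(means, variances, sizes, weights=None):
    """Moment regression (1.4) through the origin and statistic (1.5).

    means, variances, sizes: sample mean, sample variance, and sample size
    N_i for each population. weights: the w_i (equal weights by default,
    a placeholder for Bliss and Owen's iterative weights).
    Returns (k_hat, X2).
    """
    x = [m * m - v / n for m, v, n in zip(means, variances, sizes)]
    y = [v - m for m, v in zip(means, variances)]
    w = weights if weights is not None else [1.0] * len(x)
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    syy = sum(wi * yi * yi for wi, yi in zip(w, y))
    slope = sxy / sxx                   # estimate of 1/k
    x2 = syy - sxy ** 2 / sxx           # weighted residual sum of squares, (1.5)
    return 1.0 / slope, x2
```

A negative slope would signal underdispersion, for which this estimate of k is not meaningful.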
Hinz and Gurland [1968] give a more general linear model approach to the comparison of the parameters of several negative binomial models. They start by forming a vector $U_j$ of functions of the sample factorial cumulants and observed zero outcomes of the j-th population. The vectors $\xi_j$, corresponding to the population counterparts of $U_j$, are estimated by the minimum modified chi-square method. Explicitly, the estimate of $\xi$ minimizes

$$(U - \xi)'\, V^{-1} (U - \xi), \qquad (1.6)$$

where $U' = (U_1', \ldots, U_g')$, $\xi' = (\xi_1', \ldots, \xi_g')$, and V is the unconstrained estimate of the covariance of U. Hypotheses of the form $H_0: C\xi = 0$ may be tested by minimizing (1.6) subject to $H_0$. The difference in residual sums of squares of the unconstrained and constrained minimizations is approximately chi-square when $H_0$ is true. Hinz and Gurland show how to choose C to make tests on $p_j$ and $m_j = k_j p_j$ across populations.
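In the special case where $\xi$ is otherwise unrestricted, the unconstrained minimum of (1.6) is $\hat\xi = U$ with residual zero, and the constrained minimum has the closed form $\hat\xi = U - VC'(CVC')^{-1}CU$, so the difference in residuals is $(CU)'(CVC')^{-1}(CU)$. The sketch below computes both. In Hinz and Gurland's setting $\xi$ is a function of the negative binomial parameters, so this is a simplification, and the function name is an assumption.

```python
import numpy as np

def constrained_min_chisq(U, V, C):
    """Minimise (U - xi)' V^{-1} (U - xi) subject to C xi = 0.

    Assumes xi is otherwise unrestricted, so the unconstrained minimum is
    xi = U with residual 0 and the returned statistic equals the difference
    in residual sums of squares.
    """
    U, V, C = (np.asarray(a, dtype=float) for a in (U, V, C))
    CU = C @ U
    lagr = np.linalg.solve(C @ V @ C.T, CU)   # (C V C')^{-1} C U
    xi_hat = U - V @ C.T @ lagr               # constrained estimate
    stat = float(CU @ lagr)                   # (CU)' (C V C')^{-1} (CU)
    return xi_hat, stat

# Hypothetical two-component example testing xi_1 = xi_2.
xi_hat, stat = constrained_min_chisq([1.0, 2.0], np.eye(2), [[1.0, -1.0]])
```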
To set these problems of population distribution in the framework of the introduction, recall that several different underlying processes will give rise to a negative binomial process (Patil [1970]). The two methodologies reviewed assume that the underlying processes are uncorrelated. In addition, we assume that the parameter k in each sampling distribution may be used to characterize population dispersion. Thus, we may implement the methodologies above to compare the "clumpedness" of several uncorrelated populations when the underlying biological processes give rise to the negative binomial distribution.
Situations sometimes arise where the underlying processes are highly correlated. In addition, the sampling procedure may increase or even create some correlation in sample distributions. For example, if we wish to compare dispersion characteristics of several species of animals in the same area, each sample may consist of counts of each species. By assuming only the form of the marginal distributions, we will show that the proposed procedure can be used to analyze such multivariate samples. Application of the proposed procedure to both uncorrelated negative binomial processes and other correlated processes will be apparent.

1.5 Modules
Suppose that the set of observations of the experiment may be divided into a set of disjoint subsets of observations such that the expectations in a subset may be modeled, usually by a model containing few parameters. If these subsets are such that we may make meaningful inferences on the experiment by considering only the parameter estimates for these models, then we call each subset a "module unit". The set of such "modules" for the whole experiment corresponding to a particular division of the observations is called a "module level".

A "module" of an experiment may be considered as a basic unit of the experiment upon which an analysis will be based. Hence, an obvious module level is the complete set of observations with each module containing one element. We use larger "modules" from "higher module levels" when we feel that the analysis should be based on certain functions of the observations.
A special class of modules is generated by using factors of the likelihood as models. Often the likelihood may be factored in such a way that the sets of observations corresponding to different factors are disjoint. The model for each of these modules corresponds to the appropriate partial likelihood, or likelihood factor. Module levels generated by different factorizations of the likelihood form a hierarchy of modelings of the experiment. In the dilution experiment, for example, a hierarchy of four module levels exists. The first corresponds to the individual observed proportions as modules. In level two, the module is the set of three observed proportions per time point. Module level three consists of all observations for all time points for a particular level of the experiment (i.e., one pH/temperature condition). In module level four, the entire set of observations, corresponding to the complete likelihood model, is used. In this thesis, we consider only modules generated by various meaningful factorizations of the likelihood.
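The four module levels of the dilution experiment can be pictured as nested groupings of the same observations. The sketch below uses a hypothetical record layout `(experiment, time, dilution, tubes, fertile)` and groups records by progressively coarser keys; it illustrates the bookkeeping only, not the estimation within each module.

```python
from collections import defaultdict

# Hypothetical record layout: (experiment, time, dilution, tubes, fertile).
MODULE_KEYS = {
    1: lambda rec: (rec[0], rec[1], rec[2]),  # one observed proportion
    2: lambda rec: (rec[0], rec[1]),          # all dilutions at one time point
    3: lambda rec: (rec[0],),                 # all time points of one experiment
    4: lambda rec: (),                        # the entire experiment
}

def module_level(records, level):
    """Partition records into the disjoint modules of the given level."""
    modules = defaultdict(list)
    for rec in records:
        modules[MODULE_KEYS[level](rec)].append(rec)
    return dict(modules)

# Two hypothetical experiments, three time points, three dilutions each.
recs = [(e, t, d, 10, 5)
        for e in ("P2", "T1") for t in (0, 10, 20) for d in (0.1, 0.01, 0.001)]
```

Each level is a partition of the same records, so moving up a level only merges modules; it never splits one.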
1.6 Philosophy Underlying Analysis
Due to the nature of the sampling process, the data we are considering are discrete. Hence, when one considers observations from several populations, the problem resembles a categorical data problem. The two methodologies for the analysis of this type of data, maximum likelihood and weighted least squares, stem from two different philosophies. Although these two methods yield asymptotically equivalent statistics, the computational and inferential aspects of any particular problem may be quite different.

When the hypothesized model is correct, we feel that inferences based upon likelihood procedures make effective use of distributional information. These procedures are less sensitive to observed zero proportions and tend to smooth ill-conditioned data. Solutions of the likelihood equations, however, often involve quite sophisticated computer techniques; more complex problems may even require a software expert. Inferences given by the resulting estimates have attractive large-sample features which, in certain problems, are felt to offset the computational difficulties. For categorical data problems this approach is favored by Goodman [1968, 1970], Bishop [1969], and others.
A major advantage of the weighted least squares procedure over maximum likelihood is its simplicity and unification for a wide range of applicable models. For categorical data problems, Grizzle, Starmer, and Koch [1969] illustrate the use of linear model techniques for a broad class of problems. Since the necessary software for this linear least squares approach is usually available, extraction of estimates and related test statistics for these problems requires a minimum of computational effort. Expansion of this unified technique to certain nonlinear models (see Grizzle, Starmer, and Koch [1969] and Forthofer and Koch [1973]) has widened the class of problems to which this approach is applicable.
The main results of this thesis are in the implementation of categorical data techniques at more involved levels of an experiment. Hence, we now review the basic results underlying this methodology.

1.7 Categorical Data
1.7.1 Introduction
Data which can be modeled in a (complex) contingency table framework are often called categorical data. In practice, a model of this form usually arises from experiments in which data points are classifiable a priori into one of several groups called "factor" groups, and a posteriori into one of several groups called "response" groups. In an experimental design context, one may think of "factors" as a generalization of "treatments" to include blocking variables (Bhapkar and Koch [1968a, b] and Imrey and Koch [1973]). Usually one is interested in determining how an experimental "factor" affects a "response."

In the "factor-response" setting, hypotheses of interest are usually of the form used in multivariate analysis of variance for continuous variables. Estimation and corresponding tests for a wide class of hypotheses, including the above, have been placed in a simple algorithmic framework. This approach, illustrated by Grizzle, Starmer, and Koch [1969] (abbreviated GSK hereafter), is based on the theories of Wald [1943] and Neyman [1949]. The same theory can be used in the analysis of the problems considered in Chapter I by generalizing the GSK methodology. In this section we review this methodology and indicate an extension of this procedure useful in our examples. In addition, the method of Hinz and Gurland will be shown to be essentially one variant of this methodology.
1.7.2 Notation
Let us consider T independent multinomial populations, each with s possible outcome states. For the j-th population we observe a fixed number, $n_j$, of individuals. Denote by $r_{ij}$ the number of the $n_j$ individuals from population j whose response is in outcome state i, and let $\pi_{ij}$ denote the probability of this event. Then

$$\sum_{i=1}^{s} \pi_{ij} = 1, \qquad j = 1, \dots, T. \tag{1.7}$$
The likelihood for $r = (r_{11}, r_{21}, \dots, r_{s1}, r_{12}, \dots, r_{sT})'$ is then the product multinomial

$$L(r) = \prod_{j=1}^{T} n_j! \prod_{i=1}^{s} \frac{\pi_{ij}^{r_{ij}}}{r_{ij}!}. \tag{1.8}$$
It is known that in such a formulation the unrestricted maximum likelihood estimates of the $\pi_{ij}$ are $p_{ij} = r_{ij}/n_j$, with

$$\operatorname{Cov}(p_{ij}, p_{k\ell}) = \begin{cases} \pi_{ij}(1-\pi_{ij})/n_j, & i = k,\ j = \ell,\\ -\pi_{ij}\pi_{kj}/n_j, & i \ne k,\ j = \ell,\\ 0, & j \ne \ell. \end{cases} \tag{1.9}$$

Let $\pi = (\pi_{11}, \pi_{21}, \dots, \pi_{s-1,1}, \pi_{12}, \dots, \pi_{s-1,T})'$ be estimated by $p = (p_{11}, p_{21}, \dots, p_{s-1,1}, p_{12}, \dots, p_{s-1,T})'$.
Denote the covariance of p by

$$V(\pi) = \{\operatorname{Cov}(p_{ij}, p_{k\ell})\}. \tag{1.10}$$

$V(\pi)$ is estimated by $V(p)$, the matrix resulting from replacing all $\pi_{ij}$ in (1.10) by the corresponding estimates from p.
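As a concrete illustration of (1.9) and (1.10), the following Python sketch builds p and the block-diagonal estimate V(p) from a T × s table of counts, using NumPy. The function name `product_multinomial_cov` and the example counts are our own and do not come from the thesis. Following the definition of p, the sketch drops the redundant s-th cell of each population.

```python
import numpy as np

def product_multinomial_cov(counts):
    """Return (p, V) for T independent multinomials.

    counts: array of shape (T, s); row j holds r_1j, ..., r_sj.
    p stacks p_1j, ..., p_{s-1,j} for j = 1, ..., T (the s-th cell is dropped),
    and V is V(p): formula (1.9) with each pi_ij replaced by p_ij.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)                        # n_j, fixed in advance
    props = counts / n[:, None]                   # p_ij = r_ij / n_j
    T, s = props.shape
    m = s - 1
    V = np.zeros((T * m, T * m))
    for j in range(T):
        pj = props[j, :m]
        block = (np.diag(pj) - np.outer(pj, pj)) / n[j]   # within-population covariances
        V[j * m:(j + 1) * m, j * m:(j + 1) * m] = block   # zero across populations
    return props[:, :m].ravel(), V

p, V = product_multinomial_cov([[30, 50, 20], [10, 60, 30]])
```

The off-diagonal blocks stay zero because the populations are sampled independently. This block-diagonal structure is exactly what the correlated case of Section 2.5 relaxes.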
One way of forming hypotheses on π is in terms of constraints of the form

$$H_0: g_k(\pi) = 0, \qquad k = 1, \dots, u, \tag{1.11}$$

where the $g_k(\cdot)$ are known functions. We assume that there is a $\pi_0$ such that, when $H_0$ obtains and π is in some neighborhood of $\pi_0$:

(i) $g(\pi)$ has continuous second partial derivatives with respect to the $\pi_{ij}$;
(ii) $G(p) = \{\partial g_k(\pi)/\partial \pi_{ij}\}_{\pi = p}$, of order $u \times (s-1)T$, has rank $u < (s-1)T$;
(iii) $g(\pi)$ is functionally independent of the constraints (1.7).
When the above assumptions are satisfied, a consistent estimate of the covariance of $g(p)$ is given by

$$S(p) = G(p)\,V(p)\,G'(p), \tag{1.12}$$

which comes from the first order Taylor series expansion of $g(p)$ about π. To test the hypothesis (1.11), Wald [1943] proposed the statistic

$$X_W^2 = g'(p)\,S^{-1}(p)\,g(p), \tag{1.13}$$

which is asymptotically distributed as chi-square with u degrees of freedom when (1.11) is correct.
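For a linear constraint $g(\pi) = A\pi$, the matrix G(p) is simply A, and (1.12) and (1.13) reduce to a few lines. The sketch below is illustrative only. The helper names and the hypothesis $\pi_{11} = \pi_{12}$ are our own choices. It re-derives V(p) so that the block runs on its own.

```python
import numpy as np

def prop_cov(counts):
    """p and V(p) as in (1.9)-(1.10); the s-th cell of each population is dropped."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)
    P = counts / n[:, None]
    m = P.shape[1] - 1
    V = np.zeros((P.shape[0] * m,) * 2)
    for j, pj in enumerate(P[:, :m]):
        V[j * m:(j + 1) * m, j * m:(j + 1) * m] = (np.diag(pj) - np.outer(pj, pj)) / n[j]
    return P[:, :m].ravel(), V

def wald_statistic(g_p, G_p, V_p):
    """X_W^2 = g(p)' S^{-1}(p) g(p) with S(p) = G(p) V(p) G'(p), from (1.12)-(1.13)."""
    S = G_p @ V_p @ G_p.T
    return float(g_p @ np.linalg.solve(S, g_p)), len(g_p)   # statistic, degrees of freedom u

# Hypothetical example: H0: pi_11 - pi_12 = 0 for two trinomial populations.
p, V = prop_cov([[30, 50, 20], [10, 60, 30]])
A = np.array([[1.0, 0.0, -1.0, 0.0]])             # linear g(pi) = A pi, so G = A
stat, u = wald_statistic(A @ p, A, V)
```

Here $g(p) = 0.3 - 0.1 = 0.2$ and $S(p) = 0.0021 + 0.0009 = 0.003$, so $X_W^2 = 0.04/0.003 \approx 13.3$ on one degree of freedom.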
Another way of testing (1.11) is by applying Neyman's linearization technique and forming the minimum-$X_1^2$ statistic (Neyman [1949]). Explicitly, define

$$g^*(\pi) = g(p) + G(p)(\pi - p) = 0. \tag{1.14}$$

Neyman has shown that in large samples one may consider (1.14) instead of (1.11). The minimum value of

$$X_1^2 = \sum_{j=1}^{T}\sum_{i=1}^{s} \frac{(r_{ij} - n_j\pi_{ij})^2}{r_{ij}}, \tag{1.15}$$

subject to the constraints in (1.14), defines the minimum-$X_1^2$ statistic of Neyman. When (1.14) (or (1.11)) is true, this statistic is also asymptotically distributed as a chi-square variable with u degrees of freedom.
Bhapkar [1966] has shown that the test statistics $X_W^2$ and minimum-$X_1^2$ for testing (1.14) are identical.
Suppose the hypothesis concerning π can be put in the form

$$H_0: h_\ell(\pi) = \sum_{k=1}^{t} d_{\ell k}\theta_k, \qquad \ell = 1, \dots, q, \tag{1.16}$$

where $t < q \le (s-1)T$. We assume that the $h_\ell(\cdot)$ are known functions and satisfy the same assumptions (i), (ii), and (iii) as the $g_\ell$ above. Also assume that the known constants $d_{\ell k}$ are such that the $q \times t$ matrix $D = \{d_{\ell k}\}$ has rank t. The covariance of $h(p) = (h_1(p), \dots, h_q(p))'$ is estimated as before by

$$S = S(p) = H(p)\,V(p)\,H'(p), \tag{1.17}$$

where $H(p) = \{\partial h_\ell/\partial \pi_{jk}\}_{\pi = p}$, of order $q \times (s-1)T$, is assumed to have rank q in a neighborhood of $\pi_0$.
A statistic used to test the validity of hypothesis (1.16) for large samples is

$$X_2^2(\hat\theta) = \min_{\theta}\,(h(p) - D\theta)'\,S^{-1}\,(h(p) - D\theta). \tag{1.18}$$

The value of θ, say $\hat\theta$, which solves (1.18) is called the weighted least squares estimate of θ and is given by

$$\hat\theta = (D'S^{-1}D)^{-1}D'S^{-1}h(p). \tag{1.19}$$

It is known that $X_2^2(\hat\theta)$ is asymptotically distributed as chi-square with q - t degrees of freedom when (1.16) is true.
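The estimator (1.19) and the statistic (1.18) can be sketched directly with NumPy. The example below is hypothetical. It uses three binomials (s = 2), takes $h(p) = p$ so that $H(p) = I$, and fits a straight-line model $D\theta$. The function name `gsk_wls` is ours.

```python
import numpy as np

def gsk_wls(h, S, D):
    """Weighted least squares fit of h(p) = D theta, following (1.18)-(1.19).

    Returns theta_hat, its estimated covariance (D' S^{-1} D)^{-1},
    the goodness-of-fit statistic X_2^2, and its degrees of freedom q - t.
    """
    S_inv = np.linalg.inv(S)
    cov_theta = np.linalg.inv(D.T @ S_inv @ D)
    theta = cov_theta @ D.T @ S_inv @ h
    resid = h - D @ theta
    return theta, cov_theta, float(resid @ S_inv @ resid), D.shape[0] - D.shape[1]

n = np.array([100.0, 100.0, 100.0])
p = np.array([0.20, 0.30, 0.40])
S = np.diag(p * (1 - p) / n)                       # S(p) = H V H' with H = I
D = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # intercept and linear trend
theta, cov_theta, X2, df = gsk_wls(p, S, D)
```

These proportions lie exactly on a line, so $X_2^2 = 0$ and $\hat\theta = (0.2, 0.1)'$. With real data, $X_2^2$ would be referred to a chi-square distribution with q - t = 1 degree of freedom.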
One may also test (1.16) by either the Neyman procedure or the Wald procedure by solving (1.16) for a set of constraints $g(\pi)$. If one linearizes h in (1.16) by

$$h^*(\pi) = h(p) + H(p)(\pi - p) = D\theta, \tag{1.20}$$

then the Neyman minimum-$X_1^2$ statistic and the Wald $X_W^2$ statistic for the linearized hypothesis (1.20) are, as has been mentioned, identical. In addition, Bhapkar [1965] has shown that the $X_2^2(\hat\theta)$ statistic above is identical to both of these statistics. Hence, linearized Wald statistics or Neyman minimum-$X_1^2$ statistics for hypotheses of the form (1.16) may be calculated simply by applying the linear models technique of weighted least squares to the data in $h(p)$. In addition to a test of fit of (1.16), this procedure will furnish estimates of θ useful in estimating variance components of the overall experiment. GSK have illustrated the use of this procedure to determine sources of variation in their examples.
1.7.5 Examples
The application of the weighted least squares procedure has been illustrated for several particular functions. Grizzle, Starmer, and Koch [1969] show methods for $h_\ell(\pi) = \pi_\ell$ and $h_\ell(\pi) = \log \pi_\ell$ in their work. Forthofer and Koch [1973] have given the steps for analysis when $h(\pi)$ belongs to a type of log-exponential family. Also, the method of Hinz and Gurland [1968] can be put in the formulation (1.16), as we will see below.
For the j-th population let $p_{ij}$ ($\pi_{ij}$) be the observed (expected) proportion of samples having i counts for $i = 0, \dots, r-1$, and let $p_{rj}$ ($\pi_{rj}$) be the observed (expected) proportion having r or more counts, where r is prespecified. Form

$$\pi_j = (\pi_{1j}, \pi_{2j}, \dots, \pi_{rj})', \qquad j = 1, \dots, T, \tag{1.21}$$

and the $s \times r$ matrix

$$C = \begin{pmatrix} 1 & 2 & \cdots & r\\ 1 & 4 & \cdots & r^2\\ \vdots & \vdots & & \vdots\\ 1 & 2^s & \cdots & r^s \end{pmatrix}, \tag{1.22}$$

where the determination of s will be mentioned later. Then

$$\mu_j = (\mu'_{1j}, \mu'_{2j}, \dots, \mu'_{sj})' = C\pi_j, \qquad j = 1, \dots, T. \tag{1.23}$$
Hinz and Gurland form the functions

$$h_{0j}(\pi) = \kappa^{(j)}_{[1]}, \qquad h_{ij}(\pi) = \kappa^{(j)}_{[i+1]}\big/\kappa^{(j)}_{[i]}, \quad i = 1, \dots, s-1, \tag{1.24}$$

for each j, where $\kappa^{(j)}_{[i]}$ is the i-th factorial cumulant. The functions $h_{ij}(\pi)$ are functions of $\pi_j$, since

$$\kappa^{(j)}_{[1]} = \mu'_{1j}, \qquad \kappa^{(j)}_{[2]} = \mu'_{2j} - \mu'_{1j} - (\mu'_{1j})^2, \quad \text{etc.} \tag{1.25}$$

Hinz and Gurland give a matrix, analogous to D in (1.16) above, which is appropriate to their analysis. They also show how to estimate $\operatorname{Cov}(h(p))$. No clear procedure is given for determining the value of s, but the authors suggest small values.
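The chain from proportions to the Hinz-Gurland functions, (1.21) through (1.25), can be sketched as follows. This is our own illustration, not Hinz and Gurland's code. It assumes s = 2 or 3, and, as in (1.22), it scores the "r or more" cell as r. It uses the standard relations between raw moments, factorial moments, and factorial cumulants.

```python
import numpy as np
from math import comb

def hinz_gurland_h(props, s=3):
    """Functions (1.24) for one population.

    props = (p_0, p_1, ..., p_{r-1}, p_r), where p_r is the proportion with r or more
    counts; as in (1.22) that cell is scored as r. Only s = 2 or 3 is handled here.
    """
    if s not in (2, 3):
        raise ValueError("this sketch only handles s = 2 or 3")
    props = np.asarray(props, dtype=float)
    r = props.size - 1
    values = np.arange(1, r + 1, dtype=float)
    C = np.vstack([values ** k for k in range(1, s + 1)])   # (1.22): s x r
    mu = C @ props[1:]                                      # (1.23): mu'_1, ..., mu'_s
    fact = [mu[0], mu[1] - mu[0]]                           # factorial moments
    if s == 3:
        fact.append(mu[2] - 3 * mu[1] + 2 * mu[0])
    kappa = [fact[0], fact[1] - fact[0] ** 2]               # factorial cumulants, cf. (1.25)
    if s == 3:
        kappa.append(fact[2] - 3 * fact[1] * fact[0] + 2 * fact[0] ** 3)
    return np.array([kappa[0]] + [kappa[i] / kappa[i - 1] for i in range(1, s)])

# Hypothetical population: negative binomial with size 3 and success probability 0.6.
size, prob, r = 3, 0.6, 200
pmf = [comb(x + size - 1, x) * prob**size * (1 - prob)**x for x in range(r)]
h = hinz_gurland_h(pmf + [1.0 - sum(pmf)], s=3)
```

For a negative binomial, the factorial cumulants are $\kappa_{[i]} = \text{size}\,(i-1)!\,(q/p)^i$. With $q/p = 2/3$, the functions should therefore be $(2, 2/3, 4/3)$, which makes the example a convenient check. A large r keeps the lumped tail cell negligible.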
Maximum likelihood estimates of a model can be considered as sophisticated averages of the sample proportions. As such, these estimates more closely approximate their large-sample properties than the proportions themselves do. The purpose of this thesis is to combine some of these properties with the simplicity and unity of the weighted least squares method by applying the latter method on a higher hierarchical plane. This will be done by applying maximum likelihood to natural "modules" of the experiment and using weighted least squares on the estimated parameters of these modules.
Often the cell probabilities of the underlying contingency table, denoted $\pi_{ij}$, may be modeled in natural subsets which may be assumed to explain the relevant variation in the experiment. When the experimental situation is as described in the introduction, both the underlying process and the resulting sample distributions may be used to model these natural "modules" of the experiment. In fact, this will give a hierarchy of possible modules. As we move up this hierarchy of resulting models by taking larger subsets of the observations to estimate the parameters of more complex modules, we approach the complete likelihood model. One is motivated to take advantage of the likelihood properties by working with more complex modules. However, one is inhibited in this by the increasing complexity of the assumptions and computations necessary for such complex models.
One may compromise between computational complexity and gain in "efficiency" (see Robertson [1972] or Rao [1962]) by analyzing suitably chosen modules by one of two slightly different procedures. In the first procedure, one treats the maximum likelihood estimate of a module, with its corresponding covariance estimate, as a generalization of the unrestricted maximum likelihood estimates p as used in categorical data problems by Grizzle et al. [1969]. In the second procedure, the likelihood equations of the modules are considered as defining the set of parameters as implicit functions of p. Comparisons analogous to multivariate analysis of variance hypotheses may be constructed from either method. The major difference between the two methods is that the first (which is more "efficient" in a certain sense) requires the modules to be uncorrelated, whereas the second does not.
CHAPTER II
SOME EXTENSIONS TO CATEGORICAL DATA METHODOLOGY

2.1 Introduction
In Chapter I we proposed a two-stage method for analyzing data arising in certain experimental situations. The first stage is to estimate parameters of separate modules of the experiment by maximum likelihood techniques. The second stage is the implementation of the weighted least squares techniques used in categorical data methodology to form relevant statistics. As was alluded to previously, the likelihood equations implicitly define functions of the observed proportions, from which the second stage extracts its statistics.
This chapter extends the categorical data methodology to include both implicit functions and non-linear functions for use in this two-stage procedure. The necessity for implicit function results is immediate from the preceding remarks. The utility of non-linear results will be realized in more complex problems. Although none of the examples will illustrate this extension, its use in more involved time series problems is apparent. The reader is referred to Section 1.7 for notation and preliminary remarks.
Let h(π) be a (t-variate) function defined by

$$f_\ell(h(\pi), \pi) = 0, \qquad \ell = 1, \dots, t, \tag{2.1}$$

where the form of f(·) is known. Assume that h and f have continuous second partial derivatives with respect to the $\pi_i$, and that f has continuous second partial derivatives with respect to the $h_i$. Assume also that, when $\pi_0$ obtains, the following hold for all π in a neighborhood of $\pi_0$:

a. $F(p) = \{\partial f_\ell/\partial h_i\}_{\pi = p}$, of order $t \times t$, has rank t;
b. $\{\partial f_\ell/\partial \pi_{jk}\}_{\pi = p}$, of order $t \times (s-1)T$, has rank t.

Then we may solve for $H(p)$ by differentiating (2.1) with respect to π and evaluating at π = p. Explicitly,

$$F(p)\,H(p) + \Big\{\frac{\partial f_\ell}{\partial \pi_{jk}}\Big\}_{\pi=p} = 0,$$

whence

$$H(p) = -F^{-1}(p)\Big\{\frac{\partial f_\ell}{\partial \pi_{jk}}\Big\}_{\pi=p}. \tag{2.2}$$

For a particular observed set p, one may solve (2.1) for h(p) by some iterative procedure. One may use this and (2.2) to find an estimate of Cov h(p) as above. According to how one models h(π), one can now choose either linear or non-linear weighted least squares to generate statistics for hypotheses about h(π).
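A one-parameter example makes (2.1) and (2.2) concrete. Consider a hypothetical zero-truncated Poisson module in which every sample has between 1 and r counts. Its maximum likelihood estimate h solves $h/(1-e^{-h}) = \bar{x}(\pi)$, which is an equation of the form (2.1) with t = 1. The sketch below solves this equation by Newton's method and obtains H(p) from (2.2). The function names are ours. The test compares H(p) with finite differences.

```python
import numpy as np

def f_and_derivatives(h, pi):
    """f(h, pi) of (2.1) for a zero-truncated Poisson module, with its derivatives.

    pi = (pi_1, ..., pi_{r-1}) are the proportions with counts 1..r-1; the proportion
    with count r is 1 - sum(pi), mirroring the (s-1)-dimensional parametrization of p.
    """
    r = pi.size + 1
    values = np.arange(1, r)
    mean = values @ pi + r * (1.0 - pi.sum())
    e = np.exp(-h)
    f = h / (1.0 - e) - mean                       # likelihood equation for the MLE h
    F = ((1.0 - e) - h * e) / (1.0 - e) ** 2       # df/dh (1 x 1 here, t = 1)
    Phi = (r - values).astype(float)               # df/dpi_i = -(i - r)
    return f, F, Phi

def solve_h(pi, tol=1e-13):
    """Newton iterations for h(p); starting at the mean places us right of the root."""
    pi = np.asarray(pi, dtype=float)
    h = np.arange(1, pi.size + 1) @ pi + (pi.size + 1) * (1.0 - pi.sum())
    for _ in range(100):
        f, F, _ = f_and_derivatives(h, pi)
        step = f / F
        h -= step
        if abs(step) < tol:
            return h
    raise RuntimeError("Newton iteration did not converge")

def implicit_jacobian(pi):
    """H(p) = -F^{-1}(p) {df/dpi}, as in (2.2)."""
    pi = np.asarray(pi, dtype=float)
    h = solve_h(pi)
    _, F, Phi = f_and_derivatives(h, pi)
    return h, -Phi / F

h, H = implicit_jacobian([0.4, 0.3, 0.2])          # r = 4, so the count-4 proportion is 0.1
```

Combined with V(p), the vector H(p) gives $S(p) = H(p)V(p)H'(p)$ for the second stage. The iterative solution is needed only for h(p) itself, not for its derivative.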
2.3 A Note on the Variance Estimate
In practice, h(p) is an estimate of θ, the vector of parameters in some model. If the vector of probabilities π is modeled in terms of θ, one may estimate π from h(p), say by $\hat\pi$, in addition to the unrestricted maximum likelihood estimates p. Since, in our case, h(p) will arise as solutions to partial likelihood equations, one may use either $\hat\pi$ or p in the formulation of the test statistics. In the general formulation of Wald statistics, and also of weighted least squares statistics, calculations are made using a consistent estimate of variance. Wald's original procedure was for quite general parameter sets, using an estimate of the Fisher information matrix to form the statistics. If the functions $h_j$ are solutions to likelihood equations, Wald statistics may be formed in either of two ways.

i. In the first case, one uses the consistent estimate of covariance given by $\hat\pi$, i.e.,

$$S(\hat\pi) = H(\hat\pi)\,V(\hat\pi)\,H'(\hat\pi). \tag{2.3}$$

ii. Alternatively, one may use the consistent estimate based on the raw data, namely

$$S(p) = H(p)\,V(p)\,H'(p). \tag{2.4}$$
Weighted least squares may also use either estimate as the weight matrix, with resulting statistics identical to the corresponding Wald statistic. The problem, therefore, becomes one of choosing which estimate of covariance to use.
If the model generating the functions $h_i$ as maximum likelihood estimates is correct, then use of (2.3) is more efficient in the sense of Rao [1962]. However, if one suspects this model but feels the generated functions h will reflect true differences, one is advised to use the estimate in (2.4). Use of S(p) has less chance of concealing or blowing up differences in the experiment when a lack of fit is observed. Similar remarks may be made when the functions are compound functions whose inner function is the likelihood estimate (e.g., q(f(p))).
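The choice between (2.3) and (2.4) can be seen in a deliberately simple, hypothetical setting. Each of T binomials is its own trivial module, $h_j$ is the sample logit, and the fitted model is homogeneity, so $\hat\pi$ is the pooled proportion. The sketch fits the common-logit model by weighted least squares with each weight choice. The function and argument names are ours.

```python
import numpy as np

def logit_homogeneity(successes, n, weights="raw"):
    """WLS test that T binomial logits are equal, with h_j = logit(p_j).

    weights="raw":   S(p), the analogue of (2.4), built from the observed p_j.
    weights="model": S(pi_hat), the analogue of (2.3), built from the pooled
                     estimate fitted under the (hypothetical) homogeneity model.
    """
    successes, n = np.asarray(successes, float), np.asarray(n, float)
    p = successes / n
    h = np.log(p / (1 - p))
    q = p if weights == "raw" else np.full_like(p, successes.sum() / n.sum())
    S_inv = np.diag(n * q * (1 - q))          # Var(logit p_j) is approx. 1 / (n_j q_j (1 - q_j))
    D = np.ones((p.size, 1))                  # common-logit model h = D theta
    theta = np.linalg.solve(D.T @ S_inv @ D, D.T @ S_inv @ h)
    resid = h - D @ theta
    return float(resid @ S_inv @ resid)

raw = logit_homogeneity([20, 30, 45], [100, 100, 100], "raw")
model = logit_homogeneity([20, 30, 45], [100, 100, 100], "model")
```

Both statistics vanish when the observed proportions agree. Under heterogeneity they differ, because the weights in the second version are computed under the very model being tested.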
2.4 Non-Linear Models
It will be noted that the hypothesis (1.16) is linear in θ, the parameters giving the constraints. Hence, the constraints will be linear in h(p) (Bhapkar and Koch [1965]). This linearity does not have to apply to the proportions p for the procedure of Section 1.7 to be applicable. As was illustrated, some well-behaved non-linear functions of p have already appeared in the literature. It may happen, however, that hypotheses of interest in some studies are non-linear in both p and θ. In this case, putting the hypothesized model (2.5) in constraint form is difficult, if it is possible at all. Calculation of the weighted least squares statistics is also more complicated, and these statistics are not equal to those formed by a constraint hypothesis as above. However, one can approximate the test procedures above by appropriately linearizing the hypothesized model.
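Suppose the hypothesis concerning π may be put in the form

$$h_\ell(\pi) = k_\ell(\theta), \qquad \ell = 1, \dots, q, \tag{2.5}$$

where $\theta = (\theta_1, \dots, \theta_t)'$ as above. We assume that the $h_\ell(\cdot)$ satisfy assumptions (i)-(iii) of Section 1.7.3. Suppose that, for all θ in a neighborhood of $\theta_0$, $k(\theta) = (k_1(\theta), \dots, k_q(\theta))'$ has continuous second partial derivatives with respect to θ, and that

$$K(\theta) = \Big\{\frac{\partial k_\ell(\theta)}{\partial \theta_m}\Big\}, \qquad q \times t,$$

has rank t < q when $\theta_0$ obtains. Define

$$X_3^2(\theta) = (h(p) - k(\theta))'\,S^{-1}\,(h(p) - k(\theta)), \tag{2.6}$$

where S is as defined in (1.12).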
For a sample of size n, let $\theta^*_n$ minimize $X_3^2(\theta)$. It can be shown (see Mitra [1958]) that, for a class of functions g and k, the sequence of random vectors $\{\theta^*_n\}_{n=1}^{\infty}$ converges in probability to $\theta_0$ as $n \to \infty$, provided $n_j/n = c_j + o_p(1/n)$, where $c_j \ne 0$ is a constant. The convergence rate is

$$|\theta_0 - \theta^*_n| = O_p(n^{-q}), \qquad q \in (1/4, 1/2). \tag{2.7}$$
Thus, using the above results, we may rewrite (2.5) as

$$h(\pi) = k(\theta^*_n) + K(\theta^*_n)(\theta_0 - \theta^*_n) + R(\theta_0, \theta^*_n), \tag{2.8}$$

where $R(\theta_0, \theta^*_n) = O_p(n^{-2q})$. Thus, for large n, the hypothesized model (2.5) is approximated to first order by

$$K(\theta^*_n)\,\theta = h(\pi) - k(\theta^*_n) + K(\theta^*_n)\,\theta^*_n = h^*(\pi), \text{ say.} \tag{2.9}$$

A goodness of fit statistic for (2.5) has been calculated from (2.6) as $X_3^2(\theta^*_n)$.
If one wishes to make comparisons among the parameters in θ, however, one may use the formulation (2.9) and the linear weighted least squares procedure above. This is simpler than reformulating each hypothesis as in (2.5). One must, of course, initially solve (2.6) for $\theta^*_n$ to give a starting point.
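The two steps just described, minimizing (2.6) and then fitting the linearized model (2.9) by linear weighted least squares, can be sketched as follows. The exponential model $k_\ell(\theta) = \theta_1 e^{\theta_2 x_\ell}$, the covariate values, and the fixed weight matrix are hypothetical choices for illustration only. Gauss-Newton is one convenient minimizer for (2.6); the thesis does not prescribe it.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])                 # hypothetical covariate, q = 4
def k(t):  return t[0] * np.exp(t[1] * x)          # hypothetical non-linear model k(theta)
def K(t):  return np.column_stack([np.exp(t[1] * x), t[0] * x * np.exp(t[1] * x)])

def minimize_X3(h, S_inv, theta0, iters=200, tol=1e-13):
    """Gauss-Newton minimization of X_3^2(theta) = (h - k)' S^{-1} (h - k), as in (2.6)."""
    t = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        J = K(t)
        step = np.linalg.solve(J.T @ S_inv @ J, J.T @ S_inv @ (h - k(t)))
        t = t + step
        if np.max(np.abs(step)) < tol:
            break
    r = h - k(t)
    return t, float(r @ S_inv @ r)

h = k(np.array([0.5, -0.3])) + np.array([0.01, -0.02, 0.015, -0.005])
S_inv = np.diag([4.0, 3.0, 2.0, 1.0])
theta_star, X3 = minimize_X3(h, S_inv, theta0=[0.4, -0.2])

# Linearized model (2.9): K(theta*) theta = h*(p), fitted by linear WLS.
Ks = K(theta_star)
h_star = h - k(theta_star) + Ks @ theta_star
theta_lin = np.linalg.solve(Ks.T @ S_inv @ Ks, Ks.T @ S_inv @ h_star)
```

At the minimum, $K'S^{-1}(h - k(\theta^*_n)) = 0$, so the linear fit to $h^*$ returns $\theta^*_n$ itself. Contrasts among the parameters can then be tested with the linear machinery of Section 1.7.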
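To apply the linear weighted least squares procedure to (2.9), we need to estimate the covariance of $h^*(p)$. Since $\theta^*_n$ is also a function of p, the calculation of $\operatorname{Cov} h^*(p)$ depends upon $\partial\theta^*_{n,i}/\partial p_{jk}$. Explicitly, the matrix of derivatives of $h^*$ is

$$H^*(p) = H(p) + A(p)\,Z(p), \tag{2.10}$$

where

$$A(p) = \{a_{\ell i}\} = \Big\{\sum_{m}\frac{\partial^2 k_\ell(\theta^*_n)}{\partial\theta_i\,\partial\theta_m}\,\theta^*_{n,m}\Big\} \quad\text{and}\quad Z(p) = \Big\{\frac{\partial\theta^*_{n,i}}{\partial p_{jk}}\Big\}, \quad t \times (s-1)T. \tag{2.11}$$

To solve for Z(p), differentiate (2.6) with respect to both θ and p. Note that the differential of (2.6) with respect to θ is zero when $\theta = \theta^*_n$, by definition of $\theta^*_n$. The resulting equation is

$$(U - W)\,Z(p) = C - R, \tag{2.12}$$

where, with $s^{ij}$ denoting the (i, j)-th element of $S^{-1}$,

$$U = \Big\{\sum_{i,j}\frac{\partial k_i(\theta^*_n)}{\partial\theta_a}\,s^{ij}\,\frac{\partial k_j(\theta^*_n)}{\partial\theta_b}\Big\}, \qquad W = \Big\{\sum_{i,j}\frac{\partial^2 k_i(\theta^*_n)}{\partial\theta_a\,\partial\theta_b}\,s^{ij}\,\big(h_j(p) - k_j(\theta^*_n)\big)\Big\},$$

$$C = \Big\{\sum_{i,j}\frac{\partial k_i(\theta^*_n)}{\partial\theta_a}\,s^{ij}\,\frac{\partial h_j(p)}{\partial p_{\ell m}}\Big\}, \qquad R = \Big\{\sum_{i,j}\frac{\partial k_i(\theta^*_n)}{\partial\theta_a}\,\frac{\partial s^{ij}}{\partial p_{\ell m}}\,\big(k_j(\theta^*_n) - h_j(p)\big)\Big\}.$$

Since R tends to zero as n grows, we will assume R ≡ 0. Under this assumption, and provided the appropriate matrix is of full rank, we may solve for Z(p):

$$Z(p) = (U - W)^{-1}\,C. \tag{2.13}$$

Since hypotheses (2.9) and (2.5) are asymptotically equivalent, for large n one may make relevant tests on θ by using (2.9). The equivalence of test criteria pointed out above holds for this model, and hence the asymptotic optimality of the Wald statistic and the Neyman statistic holds for the proposed procedure.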
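Formulas (2.10) through (2.13) can be checked numerically in a simplified, hypothetical setting. The setting reuses the exponential model from the earlier sketch and takes $h(p) = p$, so that $H(p) = I$. It also holds S fixed, so that R is exactly zero. Under those assumptions, $Z = (U - W)^{-1}C$ and $H^* = H + AZ$ should match finite-difference derivatives of $\theta^*_n$ and $h^*$.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
S_inv = np.diag([4.0, 3.0, 2.0, 1.0])        # S^{-1}, held fixed so that R = 0 exactly

def k(t):  return t[0] * np.exp(t[1] * x)
def K(t):  return np.column_stack([np.exp(t[1] * x), t[0] * x * np.exp(t[1] * x)])
def K2(t):
    """Second derivatives: [i, a, b] = d^2 k_i / (d theta_a d theta_b)."""
    e = np.exp(t[1] * x)
    out = np.zeros((x.size, 2, 2))
    out[:, 0, 1] = out[:, 1, 0] = x * e
    out[:, 1, 1] = t[0] * x**2 * e
    return out

def theta_star(h, start=(0.5, -0.3)):
    """Gauss-Newton minimizer of (2.6) for this model."""
    t = np.array(start, dtype=float)
    for _ in range(200):
        J = K(t)
        step = np.linalg.solve(J.T @ S_inv @ J, J.T @ S_inv @ (h - k(t)))
        t = t + step
        if np.max(np.abs(step)) < 1e-13:
            break
    return t

def h_star(h):
    t = theta_star(h)
    return h - k(t) + K(t) @ t                   # h*(p) of (2.9)

def derivatives(h):
    """Z(p) from (2.13) and H*(p) from (2.10), for h(p) = p so that H(p) = I."""
    t = theta_star(h)
    J = K(t)
    U = J.T @ S_inv @ J
    W = np.einsum('iab,ij,j->ab', K2(t), S_inv, h - k(t))
    C = J.T @ S_inv                              # K' S^{-1} H with H = I
    Z = np.linalg.solve(U - W, C)
    A = np.einsum('iab,b->ia', K2(t), t)         # a_{l i} = sum_m d2 k_l / (d theta_i d theta_m) theta_m
    return Z, np.eye(h.size) + A @ Z

h = k(np.array([0.5, -0.3])) + np.array([0.01, -0.02, 0.015, -0.005])
Z, H_star = derivatives(h)
```

Because S is held fixed, this check isolates (2.12) with R = 0. When S depends on p, R is only asymptotically negligible, which is why the text assumes it away.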
2.5 Correlated Multinomial Models
In some cases, such as stationary Markov chains or multivariate sampling procedures, the different multinomial variables are correlated. The work reviewed and extended thus far has been for the case of independent (or at least uncorrelated) multinomial variables. One problem in the correlated case is that the Neyman minimum-$X_1^2$ statistic is not defined. When the Neyman linearization technique is extended in a natural way, the results for weighted least squares and Wald statistics for linear, non-linear, and implicit functions are the same as those given above, with V(p) replaced by $V^*(p)$, the Fisher information estimate of covariance in the correlated case. The extension for the linear case follows immediately from the proof given in Bhapkar and Koch [1965]. One may see this by noting that Koch's lemma, used to prove the equivalence of criteria, does not depend on whether V(p) is block diagonal or not. The non-linear case will follow the linear case provided the consistency result given above can be shown to hold. The proof of consistency for correlated multinomial variables is straightforward and follows Cramér [1945]. Since I have not been able to find this result in the literature, we give the lemma here with an indication of the proof.
Let $\pi_{ij}(\theta)$ denote the probability of an individual in population j falling into cell i. Let $r_{ij}$ denote the observed number of individuals in population j actually falling into cell i, and estimate the covariance of p with the Fisher information estimate $V^*(p)$. Suppose $n_j$ observations are taken on the j-th multinomial, and let $g_{ij}(\theta) = n_j\pi_{ij}(\theta)$, with $g'(\theta) = (g_{11}(\theta), \dots, g_{s-1,T}(\theta))$. We assume that $n_j/N \to k_j \ne 0$ as $N \to \infty$, where $N = \sum_{j=1}^{T} n_j$. When $\theta_0$ obtains, we assume that there exists an open neighborhood Ω of $\theta_0$ such that the following hold for all θ ∈ Ω:

(i) $\partial^2 g_{ij}(\theta)/\partial\theta_k\,\partial\theta_\ell$ exists and is continuous for every combination of k and ℓ;
(ii) $G(\theta) = \{\partial g_{ij}(\theta)/\partial\theta_k\}$, of order $q \times (s-1)T$, has rank q;
(iii) $\pi_{ij}(\theta) > c > 0$ for all i and j;
(iv) $V^*(p)$ is such that every element of $[V^*(p)]^{-1}$ converges in probability to a finite constant.
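If $\hat\theta$ minimizes the expression

$$Q(\theta) = (r - g(\theta))'\,[V^*(p)]^{-1}\,(r - g(\theta)), \tag{2.14}$$

then, when $\theta_0$ obtains, the following holds.

Lemma 2.1: Under the above assumptions, $|\hat\theta - \theta_0| = O_p(N^{-q})$, $q \in (1/4, 1/2)$.

Proof: The vector $\hat\theta$ which minimizes (2.14) also satisfies

$$\sum_{m=1}^{T}\sum_{i=1}^{s-1}\sum_{h=1}^{T}\sum_{k=1}^{s-1}\frac{\partial g_{mi}(\hat\theta)}{\partial\theta_j}\,v_{mi,hk}\,\big(r_{hk} - g_{hk}(\hat\theta)\big) = 0, \qquad j = 1, 2, \dots, q, \tag{2.15}$$

where $v_{mi,hk}$ is the $((m-1)(s-1)+i,\ (h-1)(s-1)+k)$-th element of $(V^*(p))^{-1}$.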
Due to assumption (iv), we may write

$$v_{mi,hk} = t_{mi,hk} + o_p(1), \tag{2.16}$$

where the $t_{mi,hk}$ are constants. We may now add

$$\sum_{\ell=1}^{q}(\hat\theta_\ell - \theta_{0\ell})\sum_{m,i}\sum_{h,k}\Big[\frac{\partial g_{mi}}{\partial\theta_j}\Big]^0 t_{mi,hk}\Big[\frac{\partial g_{hk}}{\partial\theta_\ell}\Big]^0$$

to both sides of (2.15), where the superscript "0" indicates that the function is evaluated at $\theta_0$. Thus (2.15) becomes

$$\sum_{\ell=1}^{q}(\hat\theta_\ell - \theta_{0\ell})\sum_{m,i}\sum_{h,k}\Big[\frac{\partial g_{mi}}{\partial\theta_j}\Big]^0 t_{mi,hk}\Big[\frac{\partial g_{hk}}{\partial\theta_\ell}\Big]^0 = \sum_{m,i}\sum_{h,k}\Big[\frac{\partial g_{mi}}{\partial\theta_j}\Big]^0 t_{mi,hk}\,\big(r_{hk} - g_{hk}(\theta_0)\big) + w_j(\hat\theta), \quad j = 1, \dots, q, \tag{2.17}$$

where $w_j(\hat\theta)$ collects the remaining terms. We can rewrite (2.17) in matrix form as

$$(\hat\theta - \theta_0)'\,G(\theta_0)\,T^{-1}G'(\theta_0) = (r - g(\theta_0))'\,T^{-1}G'(\theta_0) + w'(\hat\theta), \tag{2.18}$$

where

$$T^{-1} = \{t_{mi,hk}\}, \qquad w'(\theta) = (w_1(\theta), \dots, w_q(\theta)). \tag{2.19}$$
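We now assume that the functions $g_{ij}(\theta)$ are functionally independent, such that $G(\theta_0)T^{-1}G'(\theta_0)$ has rank q. If we denote $F(\theta_0) = G(\theta_0)T^{-1}G'(\theta_0)$, then we may solve for $\hat\theta$ in (2.18). Explicitly,

$$\hat\theta = \theta_0 + F^{-1}(\theta_0)\big[G(\theta_0)T^{-1}(r - g(\theta_0)) + w(\hat\theta)\big]. \tag{2.20}$$

Note that the set of equations (2.20) is identical to the set in (2.15). Consider now the sequence $\{\theta_k\}$ generated by

$$\theta_{k+1} = \theta_0 + F^{-1}(\theta_0)\big[G(\theta_0)T^{-1}(r - g(\theta_0)) + w(\theta_k)\big], \qquad k = 0, 1, 2, \dots \tag{2.21}$$

Note that $w(\theta_0) = 0$. If this sequence converges to, say, $\theta^*$, then $\theta^*$ is a solution of (2.15).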
Following Cramér, one can show, using the Bienaymé-Chebyshev inequality together with a first order Taylor series expansion of w(θ), that with probability greater than 1 - ε we have

$$|\theta_k - \theta_{k-1}| < K_1\Big[\frac{K_2N^{\lambda}}{\sqrt N}\Big]^{k}, \tag{2.22}$$

where $K_1$ and $K_2$ are constants independent of θ and N, and λ ∈ (0, 1/4). This shows that, for large N, the sequence $\{\theta_k\}$ does in fact converge. Therefore we may write $\theta^*$ as

$$\theta^* = \theta_0 + (\theta_1 - \theta_0) + (\theta_2 - \theta_1) + \cdots. \tag{2.23}$$

Then, using (2.22),

$$|\theta^* - \theta_0| \le \sum_{\ell=1}^{\infty}|\theta_\ell - \theta_{\ell-1}| < K_1\sum_{\ell=1}^{\infty}b^{\ell}, \tag{2.24}$$

where $b = K_2N^{\lambda}/\sqrt N$. From (2.24) one sees that

$$|\theta^* - \theta_0| < \frac{K_1\,b}{1 - b}, \tag{2.25}$$

which proves $|\theta^* - \theta_0| = O_p(N^{-q})$ with q = 1/2 - λ ∈ (1/4, 1/2), since λ ∈ (0, 1/4). One may see that $\theta^*$ is unique with probability going to one since, for any $\tilde\theta$ solving (2.15), one may determine (2.26). The result is immediate upon applying (2.26) to (2.20). Hence, for N large, $\hat\theta = \theta^*$, and the proof is complete. Q.E.D.
Note that the lemma is for models of the form

$$\pi = k(\theta). \tag{2.27}$$

An immediate corollary is the extension to models of the form h(π) = k(θ), where h and k satisfy conditions similar to those in Section 2.4. Since the results for both linear and non-linear models carry through to the correlated multinomial case, the implicitly defined function models follow in like manner.
2.6 Discussion

In summary, one may define the module method for cases in which the module sets of data are not independent. The analysis, in such a case, follows the same procedure, with the exception that V(p) is replaced by $V^*(p)$. In a dependent case, the "partial likelihood" models giving the constraints become "marginal likelihood" models. We will illustrate this further by example in a subsequent chapter.
CHAPTER III
SECOND ORDER VARIANCE CONSIDERATIONS

3.1 Introduction
When implementing the two-stage procedure, one evaluates, in some sense, the merits and drawbacks of the particular module level used. In Chapter I we made some informal remarks about the hierarchy of possible models. The purpose here is to formalize a procedure for evaluating different possible modules in this model hierarchy, to enable the experimenter to choose the level of module most appropriate for his particular experiment.

Essentially, the criteria developed in this chapter are the differences between the first and second order variance approximations of estimators at different levels. These criteria are closely related to the definitions of first and second order efficiency given in Rao [1962]. For the single multinomial case, such efficiency results for a variety of estimators are summarized by Robertson [1972]. For other work in related areas, one may consult Rao [1960], [1961], and Shenton and Bowman [1963].
3.2 The Model

Let us assume that the experiment consists of $T = \ell uv$ independent multinomials, arranged so that a module may consist either of uv multinomials at the higher level or of a subset of v multinomials at the lower level. We assume, of course, that different modules consist of disjoint sets of multinomials. In practice we always have at least two levels when the experiment consists of more than one multinomial, since ℓ = 1 and v = 1 give us the complete likelihood and the traditional least squares models, respectively.
Assume that $n_{ij}$ observations are made at the j-th multinomial of the i-th module, with the number of outcomes in cell k (of the s possible cells) denoted $r_{ijk}$. Let the vector of parameters for the i-th module at the lower level be denoted by $\phi^{(i)\prime} = (\phi_{i1}, \dots, \phi_{iy})$, $i = 1, \dots, \ell u$, and at the higher level denote such a vector by $\theta^{(i)\prime} = (\theta_{i1}, \dots, \theta_{iq})$, $i = 1, \dots, \ell$. For the i-th module, let the k-th cell probability of the j-th multinomial be denoted by $\pi_{ijk}(\phi^{(i)})$ in the low level case and by $\pi_{ijk}(\theta^{(i)})$ in the high level case. We assume that the $\pi_{ijk}$ have continuous third partial derivatives with respect to $\phi_{ia}$ (or $\theta_{ia}$) in a neighborhood of $\phi^{(i)}_0$ (or $\theta^{(i)}_0$). Implicitly assumed is that $\sum_{k=1}^{s}\pi_{ijk} = 1$ for all i and j in the same neighborhood.
Obviously, there exists some functional relationship between $\phi' = (\phi^{(1)\prime}, \dots, \phi^{(\ell u)\prime})$ and $\theta' = (\theta^{(1)\prime}, \dots, \theta^{(\ell)\prime})$. Since a low level module incorporates fewer parameters per module than a higher level module, we know that y < q. We now assume that the functional relationship is of the form

$$\phi = X\theta \quad\text{and}\quad \theta = D\beta, \tag{3.1}$$

where X and D are known matrices of maximal rank, and $\beta' = (\beta_1, \dots, \beta_z)$ is the vector of unknown parameters which we wish to estimate.
Following the proposed module method, one estimates β by first estimating the parameters of each module, at the module level used, by maximum likelihood. Then, treating these estimates as implicitly defined functions of the vector of observed proportions p, one uses weighted least squares to combine the results. In our case, the vector $\hat\phi^{(i)}$ which maximizes

$$L_1(\phi^{(i)}) = \sum_{j=1}^{v}\sum_{k=1}^{s} r_{ijk}\log\pi_{ijk}(\phi^{(i)}) \tag{3.2}$$

and the vector $\hat\theta^{(i)}$ which maximizes

$$L_2(\theta^{(i)}) = \sum_{j=1}^{uv}\sum_{k=1}^{s} r_{ijk}\log\pi_{ijk}(\theta^{(i)}) \tag{3.3}$$

are the maximum likelihood estimates at the low level (subscripted 1) and at the high level (subscripted 2), respectively. (Subsequently, we suppress the parameters and use subscripts instead.)
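To estimate the inverse of the covariance matrix, we define

$${}_2T^{(i)}(\theta, p) = \Big\{\sum_{j}\sum_{k} n_{ij}p_{ijk}\Big[\frac{1}{\pi_{ijk}^2}\frac{\partial\pi_{ijk}}{\partial\theta_{ia}}\frac{\partial\pi_{ijk}}{\partial\theta_{ib}} - \frac{1}{\pi_{ijk}}\frac{\partial^2\pi_{ijk}}{\partial\theta_{ia}\,\partial\theta_{ib}}\Big]\Big\}, \qquad i = 1, \dots, \ell. \tag{3.4}$$

Similarly, we define

$${}_1T^{(i)}(\phi, p) = \Big\{\sum_{j}\sum_{k} n_{ij}p_{ijk}\Big[\frac{1}{\pi_{ijk}^2}\frac{\partial\pi_{ijk}}{\partial\phi_{ia}}\frac{\partial\pi_{ijk}}{\partial\phi_{ib}} - \frac{1}{\pi_{ijk}}\frac{\partial^2\pi_{ijk}}{\partial\phi_{ia}\,\partial\phi_{ib}}\Big]\Big\}, \qquad i = 1, \dots, \ell u. \tag{3.5}$$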
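One may see that the Fisher information matrix for the low level case is ${}_1F^{(i)}(\phi_0) = {}_1T^{(i)}(\phi_0, \pi(\phi_0))$ for all i, and similarly ${}_2F^{(i)}(\theta_0) = {}_2T^{(i)}(\theta_0, \pi(\theta_0))$. If we apply the chain rule to (3.4), we get

$${}_2F(\theta_0) = \Big\{\sum_{c}\sum_{d}\frac{\partial\phi_c}{\partial\theta_a}\Big[\sum_{i,j,k}\frac{n_{ij}}{\pi_{ijk}}\frac{\partial\pi_{ijk}}{\partial\phi_c}\frac{\partial\pi_{ijk}}{\partial\phi_d}\Big]\frac{\partial\phi_d}{\partial\theta_b}\Big\}, \tag{3.6}$$

and, by (3.1),

$$\Big\{\frac{\partial\phi_c}{\partial\theta_a}\Big\} = X. \tag{3.7}$$

Hence

$${}_2F(\theta_0) = X'\,{}_1F(\phi_0)\,X. \tag{3.8}$$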
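The second stage of the estimating procedure for the level one model is to estimate β by ${}_1\hat\beta$, the vector minimizing

$${}_1Q(\beta) = (\hat\phi - XD\beta)'\,{}_1T(\hat\phi, p)\,(\hat\phi - XD\beta), \tag{3.9}$$

where $\hat\phi' = (\hat\phi^{(1)\prime}, \dots, \hat\phi^{(\ell u)\prime})$ and ${}_1T$ is block diagonal with blocks (3.5). Similarly, level two gives the estimate ${}_2\hat\beta$, which minimizes

$${}_2Q(\beta) = (\hat\theta - D\beta)'\,{}_2T(\hat\theta, p)\,(\hat\theta - D\beta), \tag{3.10}$$

where $\hat\theta' = (\hat\theta^{(1)\prime}, \dots, \hat\theta^{(\ell)\prime})$.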
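The two stages of (3.2) and (3.9) can be sketched for a hypothetical design. Each low level module is a set of v = 3 binomials with a logistic line, $\text{logit}\,\pi_{ij} = \phi_{i1} + \phi_{i2}x_j$. Stage 1 fits each module by Newton-Raphson. Stage 2 combines the stacked $\hat\phi$ by weighted least squares with block-diagonal weights. For the logistic model, observed and expected information coincide, so the weight equals ${}_1T(\hat\phi, p)$. The dose levels, the counts, and the matrix R (standing in for XD) are illustrative assumptions.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])                   # hypothetical dose levels within every module
Xm = np.column_stack([np.ones_like(x), x])      # logit pi_ij = phi_i1 + phi_i2 * x_j

def module_mle(r, n, iters=100, tol=1e-12):
    """Stage 1: Newton-Raphson MLE of phi^(i) for one module of v = 3 binomials."""
    phi = np.zeros(2)
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-(Xm @ phi)))
        score = Xm.T @ (r - n * pi)
        info = Xm.T @ (Xm * (n * pi * (1 - pi))[:, None])
        step = np.linalg.solve(info, score)
        phi = phi + step
        if np.max(np.abs(step)) < tol:
            break
    pi = 1.0 / (1.0 + np.exp(-(Xm @ phi)))
    info = Xm.T @ (Xm * (n * pi * (1 - pi))[:, None])   # information at phi_hat
    return phi, info

def two_stage(modules, R):
    """Stage 2: weighted least squares fit of phi_hat = R beta with block-diagonal weights."""
    fits = [module_mle(np.asarray(r, float), np.asarray(n, float)) for r, n in modules]
    phi_hat = np.concatenate([phi for phi, _ in fits])
    F = np.zeros((phi_hat.size, phi_hat.size))
    for i, (_, info) in enumerate(fits):
        F[2 * i:2 * i + 2, 2 * i:2 * i + 2] = info    # modules are independent
    beta = np.linalg.solve(R.T @ F @ R, R.T @ F @ phi_hat)
    resid = phi_hat - R @ beta
    return beta, phi_hat, F, float(resid @ F @ resid)

modules = [([10, 20, 32], [50, 50, 50]), ([12, 18, 35], [50, 50, 50])]
R = np.vstack([np.eye(2), np.eye(2)])           # hypothetical R = XD: both modules share one line
beta, phi_hat, F, X2 = two_stage(modules, R)
```

The residual quadratic form X2 plays the role of a goodness of fit statistic for the second-stage model. When R is the identity, the second stage simply returns the module estimates.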
First, second, and higher order variance approximations, as well as various orders of bias, are derived from the stochastic Taylor series expansion of the particular statistic. Explicitly, we may expand the estimator $\hat\beta_a$ of $\beta_a$ as

$$\hat\beta_a = \beta_a + \sum_{i,j,k}\overline{\frac{\partial\hat\beta_a}{\partial p_{ijk}}}\,(p_{ijk} - \pi_{ijk}) + \frac{1}{2}\sum_{i,j,k}\sum_{r,t,w}\overline{\frac{\partial^2\hat\beta_a}{\partial p_{ijk}\,\partial p_{rtw}}}\,(p_{ijk} - \pi_{ijk})(p_{rtw} - \pi_{rtw}) + \cdots, \qquad a = 1, \dots, z, \tag{3.11}$$

where $p' = \{p_{ijk}\}$ is the vector of observed proportions. A bar over a derivative indicates that the derivative is evaluated at $p = \pi_0$, when $\pi_0 = \pi(\beta_0)$ obtains. In (3.11) we assume that all necessary derivatives exist and that the regularity conditions hold.

The first order relative efficiency of the estimator $\hat\beta_a$ is then calculated by comparing the first order variance of $\hat\beta_a$, based on the first term of the expansion (3.11), with the first order variance of some standard estimator, say the maximum likelihood estimator. Owing to the properties of the remaining terms of (3.11), the second order variance of $\hat\beta_a$, based on the first and second terms, measures the rate at which an estimator converges to its first order properties. Hence, large differences between the second order variances of two estimators indicate a large difference in the information contained in their first order variance approximations (see Rao [1962]).
Upon examining (3.11), one notices that the differences between the first and second order approximations of different estimators lie in the differences between their first and second partial derivatives. Hence, these derivatives for the two-stage estimates and for the likelihood estimates become the focus of this chapter.
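For convenience, denote

$${}_zC^{(1)}_{jkm}(\pi) = \frac{\partial\,{}_z\hat\beta}{\partial p_{jkm}} \quad\text{and}\quad {}_zC^{(2)}_{jkm,rtw}(\pi) = \frac{\partial^2\,{}_z\hat\beta}{\partial p_{jkm}\,\partial p_{rtw}}, \qquad z = 1, 2. \tag{3.12}$$

3.3.2 Comparison of ${}_1C^{(1)}_{jkm}(\pi_0)$ and ${}_2C^{(1)}_{jkm}(\pi_0)$

To determine ${}_2C^{(1)}_{jkm}(\pi_0)$, we first differentiate (3.10) with respect to $\beta_a$, a = 1, ..., z, to get the estimating equations

$$D'\,{}_2T(\hat\theta, p)\,(\hat\theta - D\,{}_2\hat\beta) = 0. \tag{3.13}$$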
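Thus, differentiating (3.13) with respect to $p_{jkm}$ and evaluating at $\pi = \pi_0$, where $\hat\theta - D\,{}_2\hat\beta = 0$ and ${}_2T(\theta_0, \pi_0) = {}_2F(\theta_0)$, one gets

$${}_2C^{(1)}_{jkm}(\pi_0) = \big(D'\,{}_2F(\theta_0)\,D\big)^{-1}D'\,{}_2F(\theta_0)\,{}_2H^{(1)}_{jkm}(\theta_0), \tag{3.14}$$

where ${}_2H^{(1)}_{jkm}(\theta) = \partial\hat\theta/\partial p_{jkm}$, for all j, k, and m.

From (3.3) we note that $\hat\theta^{(i)}$ satisfies

$$\sum_{j}\sum_{k}\frac{n_{ij}p_{ijk}}{\pi_{ijk}}\,\frac{\partial\pi_{ijk}}{\partial\theta_{ia}} = 0, \qquad a = 1, \dots, q, \quad i = 1, \dots, \ell. \tag{3.15}$$

Differentiating (3.15) with respect to $p_{jkm}$, we have

$${}_2T(\hat\theta, p)\,{}_2H^{(1)}_{jkm}(\hat\theta) = -\,{}_2G_{jkm}(\hat\theta), \tag{3.16}$$

where ${}_2G_{jkm}(\theta) = -\{(n_{jk}/\pi_{jkm})\,\partial\pi_{jkm}/\partial\theta_{ja}\}$. Applying the chain rule to ${}_2G_{jkm}$, we see that

$${}_2G_{jkm}(\theta) = -\Big\{\sum_{c}\frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\phi_{c}}\frac{\partial\phi_{c}}{\partial\theta_{ja}}\Big\}. \tag{3.17}$$

Recalling formula (3.1), we may rewrite (3.17) as

$${}_2G_{jkm}(\theta_0) = X'\,{}_1G_{jkm}(\phi_0), \tag{3.18}$$

where

$${}_1G_{jkm}(\phi) = -\frac{n_{jk}}{\pi_{jkm}}\Big\{\frac{\partial\pi_{jkm}}{\partial\phi_{c}}\Big\}.$$

Substituting the results of (3.8), (3.16), and (3.18) into (3.14), and assuming Fisher consistency, i.e., $\hat\theta(\pi(\theta_0)) = \theta_0$, we have

$${}_2C^{(1)}_{jkm}(\pi_0) = -\big(D'X'\,{}_1F(\phi_0)\,XD\big)^{-1}D'X'\,{}_1G_{jkm}(\phi_0). \tag{3.19}$$

Analogously to (3.14), we may write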
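$${}_1C^{(1)}_{jkm}(\pi_0) = \big(R'\,{}_1F(\phi_0)\,R\big)^{-1}R'\,{}_1F(\phi_0)\,{}_1H^{(1)}_{jkm}(\phi_0), \tag{3.20}$$

where R = XD and ${}_1H^{(1)}_{jkm}(\phi) = \partial\hat\phi/\partial p_{jkm}$. As before, the term involving $\hat\phi - R\,{}_1\hat\beta$ vanishes at $\pi_0$. From (3.2), exactly as in (3.16), we have

$${}_1F(\phi_0)\,{}_1H^{(1)}_{jkm}(\phi_0) = -\,{}_1G_{jkm}(\phi_0). \tag{3.21}$$

Substituting (3.21) into (3.20), we get

$${}_1C^{(1)}_{jkm}(\pi_0) = -\big(D'X'\,{}_1F(\phi_0)\,XD\big)^{-1}D'X'\,{}_1G_{jkm}(\phi_0).$$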
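The algebra behind Lemma 3.1 can be checked numerically. The sketch below uses random stand-ins: a symmetric positive definite matrix for ${}_1F(\phi_0)$ and random matrices for X, D, and ${}_1G_{jkm}(\phi_0)$. These are placeholders with the right shapes, not quantities from any model in the thesis. It computes ${}_1C^{(1)}$ through (3.20) and (3.21), and ${}_2C^{(1)}$ through (3.8), (3.14), (3.16), and (3.18).

```python
import numpy as np

rng = np.random.default_rng(0)
dim_phi, dim_theta, z = 6, 4, 2                  # hypothetical dimensions of phi, theta, beta
M = rng.normal(size=(dim_phi, dim_phi))
F1 = M @ M.T + dim_phi * np.eye(dim_phi)         # stand-in for 1F(phi_0): symmetric positive definite
X = rng.normal(size=(dim_phi, dim_theta))        # phi = X theta
D = rng.normal(size=(dim_theta, z))              # theta = D beta
G1 = rng.normal(size=dim_phi)                    # stand-in for 1G_jkm(phi_0)

# Level one, (3.20)-(3.21).
R = X @ D
H1 = -np.linalg.solve(F1, G1)                    # 1H = -1F^{-1} 1G
C1 = np.linalg.solve(R.T @ F1 @ R, R.T @ F1 @ H1)

# Level two, (3.8), (3.18), (3.16) and (3.14).
F2 = X.T @ F1 @ X
G2 = X.T @ G1
H2 = -np.linalg.solve(F2, G2)
C2 = np.linalg.solve(D.T @ F2 @ D, D.T @ F2 @ H2)

closed_form = -np.linalg.solve(D.T @ X.T @ F1 @ X @ D, D.T @ X.T @ G1)   # (3.19)
```

Both routes collapse to $-(D'X'\,{}_1F\,XD)^{-1}D'X'\,{}_1G$. The check therefore concerns only this algebraic step; it does not test the regularity conditions of the lemma.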
To summarize, we have the following

Lemma 3.1: If the $\pi_{ijk}$ are functions of φ (or θ) such that the third partial derivatives with respect to the $\phi_a$ (or $\theta_a$) exist and are continuous for all a, and if the functions (3.2) and (3.3) have unique maximizing vectors $\hat\phi^{(i)}$ and $\hat\theta^{(i)}$, then

$${}_1C^{(1)}_{jkm}(\pi_0) = {}_2C^{(1)}_{jkm}(\pi_0) \tag{3.22}$$

when model (3.1) holds and $\pi_0$ obtains.

3.3.3 Calculation of ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$

To determine ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$, differentiate (3.13) with respect to $p_{jkm}$ and then with respect to $p_{rtw}$, and evaluate at $\pi_0$, to get
$$\big(D'\,{}_2F(\theta_0)\,D\big)\,{}_2C^{(2)}_{jkm,rtw}(\pi_0) + \big(D'\,{}_2F^{(1)}_{rtw}(\theta_0)\,D\big)\,{}_2C^{(1)}_{jkm}(\pi_0) = D'\,{}_2G^{(1)}_{jkm,rtw}(\theta_0) - D'\,{}_2T^{(1)}_{rtw}(\theta_0, \pi_0)\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0), \tag{3.23}$$

where ${}_2F^{(1)}_{rtw}(\theta_0)$, ${}_2G^{(1)}_{jkm,rtw}(\theta_0)$, and ${}_2T^{(1)}_{rtw}(\theta_0, \pi_0)$ denote the derivatives of ${}_2F$, ${}_2G_{jkm}$, and ${}_2T$ with respect to $p_{rtw}$, evaluated at $\theta_0$ and $\pi_0$. We may write ${}_2T^{(1)}_{rtw}(\theta_0, \pi_0)$ as the block diagonal matrix with blocks

$${}_2T^{(1)}_{irtw}(\theta_0, \pi_0) = \frac{d\,{}_2T^{(i)}(\theta, p)}{d\,p_{rtw}}\Big|_{\theta = \theta_0,\ p = \pi_0}, \qquad i = 1, \dots, \ell, \tag{3.24}$$

whose elements are obtained by differentiating (3.4) with respect to $p_{rtw}$. The matrix ${}_1T^{(1)}_{irtw}(\phi_0, \pi_0)$ is derived similarly, replacing $\theta_{ia}$ and $\theta_{ib}$ by $\phi_{ia}$ and $\phi_{ib}$. Applying the chain rule, we see that

$${}_2T^{(1)}_{rtw}(\theta_0, \pi_0) = X'\,{}_1T^{(1)}_{rtw}(\phi_0, \pi_0)\,X \tag{3.25}$$

by Lemma 3.1.
To simplify (3.23), we rewrite ${}_1T^{(1)}_{rtw}(\phi_0, \pi_0)$ and ${}_2G^{(1)}_{jkm,rtw}(\theta_0)$. From (3.24) we see that we may write

$${}_1T^{(1)}_{rtw}(\phi_0, \pi_0) = W_{rtw}(\phi_0) + \Gamma_{rtw}(\phi_0), \tag{3.26}$$

where $W_{rtw}(\phi_0)$ collects the terms of the low level analogue of (3.24) that involve the derivatives $\partial\hat\beta_c/\partial p_{rtw}$, and $\Gamma_{rtw}(\phi_0)$ collects the terms arising from the direct dependence of (3.5) on $p_{rtw}$, which involve $\partial\pi_{rtw}/\partial\phi$. To rewrite ${}_2G^{(1)}_{jkm,rtw}(\theta_0)$, note that

$${}_2G^{(1)}_{jkm,rtw}(\theta_0) = -\,n_{jk}\Big\{\sum_{c}\Big[\frac{1}{\pi_{jkm}}\frac{\partial^2\pi_{jkm}}{\partial\theta_{ja}\,\partial\theta_{jc}} - \frac{1}{\pi_{jkm}^2}\frac{\partial\pi_{jkm}}{\partial\theta_{ja}}\frac{\partial\pi_{jkm}}{\partial\theta_{jc}}\Big]\frac{\partial\theta_{jc}}{\partial p_{rtw}}\Big\}\Big|_{\theta = \theta_0,\ p = \pi_0}. \tag{3.27}$$

Applying the chain rule and using (3.1), we have

$${}_2G^{(1)}_{jkm,rtw}(\theta_0) = X'\,\Gamma_{jkm}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0), \tag{3.28}$$

where $\Gamma_{jkm}(\phi_0) = \{\partial[{}_1G_{jkm}(\phi)]_a/\partial\phi_c\}$ evaluated at $\phi_0$.
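Hence, we may rewrite (3.23) as
$$\begin{aligned}(D'\,{}_2F(\theta_0)\,D)\,{}_2C^{(2)}_{jkm,rtw}(\pi_0) &+ (D'\,{}_2F^{(1)}_{rtw}(\theta_0)\,D)\,{}_2C^{(1)}_{jkm}(\pi_0)\\ ={}& D'X'\,\Gamma_{jkm}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0) - D'X'\,W_{rtw}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)\\ &- D'X'\,\Gamma_{rtw}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0). \end{aligned}\qquad (3.29)$$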
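To calculate ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$, we substitute $R$ for $D$ in (3.23), with the level one quantities in place of the level two quantities, to get
$$\begin{aligned}(R'\,{}_1F(\phi_0)\,R)\,{}_1C^{(2)}_{jkm,rtw}(\pi_0) &+ (R'\,{}_1F^{(1)}_{rtw}(\phi_0)\,R)\,{}_1C^{(1)}_{jkm}(\pi_0)\\ ={}& R'\,{}_1C^{(1)}_{jkm,rtw}(\phi_0) - R'\,{}_1F^{(1)}_{rtw}(\phi_0)\,{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0)\\ &- R'\,{}_1T^{(1)}_{rtw}(\phi_0,\pi_0)\,{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0). \end{aligned}\qquad (3.30)$$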
Following an argument similar to the preceding one, and putting $XD$ for $R$, we have
$$\begin{aligned}(D'X'\,{}_1F(\phi_0)\,XD)\,{}_1C^{(2)}_{jkm,rtw}(\pi_0) &+ (D'X'\,{}_1F^{(1)}_{rtw}(\phi_0)\,XD)\,{}_1C^{(1)}_{jkm}(\pi_0)\\ ={}& D'X'\,\Gamma_{jkm}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0) - D'X'\,W_{rtw}(\phi_0)\,{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0)\\ &- D'X'\,\Gamma_{rtw}(\phi_0)\,{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0). \end{aligned}\qquad (3.31)$$
One notes that the only differences between (3.29) and (3.31) appear in three places:
- the pair ${}_2F^{-1}(\theta_0) = (X'\,{}_1F(\phi_0)\,X)^{-1}$ and ${}_1F^{-1}(\phi_0)$;
- the pair ${}_2G_{jkm}(\theta_0) = X'\,{}_1G_{jkm}(\phi_0)$ and ${}_1G_{jkm}(\phi_0)$;
- a matrix $X$ after $W_{rtw}$ and $\Gamma_{rtw}$ in (3.29).

Obviously, if $X$ is the identity, i.e., the same level was used both times, the second derivatives, $C^{(2)}$, are the same. Secondly, one notes that the complete likelihood model corresponds to the higher level two with $D$ equal to the identity matrix and $t = 1$.
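We may summarize the results of this section with the following

Lemma 3.2: Under the same conditions as Lemma 3.1,
$$\begin{aligned}{}_1C^{(2)}_{jkm,rtw}(\pi_0) - {}_2C^{(2)}_{jkm,rtw}(\pi_0) ={}& (D'X'\,{}_1F(\phi_0)\,XD)^{-1}\Big\{D'X'\big(W_{rtw}(\phi_0) + \Gamma_{rtw}(\phi_0)\big)\\ &\times\big[X(X'\,{}_1F(\phi_0)\,X)^{-1}X'\,{}_1G_{jkm}(\phi_0) - {}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0)\big]\Big\}. \end{aligned}\qquad (3.32)$$

3.3.4 Discussion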
One will note that the last terms in (3.32) look like an expression for the residuals after a weighted regression has been fit to $-{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0)$. (The minus sign cancels the minus sign in the definition of ${}_1G_{jkm}(\phi_0)$; see (3.18).) One will recall from regression theory that increasing the dimension of $\phi$ (and $X$) makes the "fit" better, whether the added parameters are appropriate or not. Hence, in this last expression each vector entry is monotone with respect to the dimension of $\phi$. The degree of positiveness or negativeness depends upon how closely the components of ${}_1G_{jkm}(\phi_0)$ approach a linear model of the form $X\gamma$, for $\gamma$ some set of unknown parameters.
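The regression fact recalled above can be checked numerically: adding columns to $X$ can only improve a weighted least squares fit. The sketch below is a toy illustration and is not part of the original analysis. The response `y`, the diagonal weights `w` and the design `X_full` are random, hypothetical stand-ins for $-{}_1F^{-1}(\phi_0)\,{}_1G_{jkm}(\phi_0)$, ${}_1F(\phi_0)$ and $X$. A diagonal weight matrix is assumed for simplicity.

```python
import numpy as np

def weighted_rss(y, X, w):
    """Weighted residual sum of squares of y regressed on the columns of X.

    w holds positive weights (the diagonal of the weight matrix); the fit is
    weighted least squares, gamma = (X'WX)^{-1} X'W y.
    """
    W = np.diag(w)
    gamma = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    resid = y - X @ gamma
    return float(resid @ W @ resid)

rng = np.random.default_rng(0)
n = 12
y = rng.normal(size=n)                 # hypothetical response vector
w = rng.uniform(0.5, 2.0, size=n)      # hypothetical diagonal weights
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])

# Fit nested models using the first 1, 2, ..., 5 columns; the weighted RSS
# never increases as columns are added.
rss = [weighted_rss(y, X_full[:, :k], w) for k in range(1, X_full.shape[1] + 1)]
print(rss)
```

The monotone decrease holds whatever the added columns are. This is the point made above: a better fit does not mean the added parameters are appropriate.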
The first term on the right hand side of (3.32) is the first order variance approximation of $\hat\beta$, and hence is positive definite. In fact, if $D$ is the identity matrix, corresponding to fitting a complete likelihood model to $\theta$, then this term is the inverse of the Fisher information matrix (see (3.8)).

From the above one sees that the sign of the components of the difference vector (3.32) depends on two things: the form of ${}_1G_{jkm}(\phi_0)$, and the two matrices $W_{rtw}(\phi_0)$ and $\Gamma_{rtw}(\phi_0)$. Further investigation of this point, to determine the possibility of 'super second order efficiency', is being undertaken.
3.4 Extensions to the Model $\phi^{(i)} = X\theta$ for Level One

It may happen that the parameters $\phi$ in level one cannot be represented as linear in $\theta$ or $\beta$. However, suppose that some suitable transformation can be. Explicitly, suppose we modify model (3.1) to
$$h_i(\phi^{(i)}) = X^{(i)}\theta, \qquad \theta = D\beta, \qquad (3.33)$$
where $h_i(\phi^{(i)}) = (h_{i1}(\phi^{(i)}),\dots,h_{iy}(\phi^{(i)}))'$. Suppose further that each $h_{ia}$, $a = 1,\dots,y$, satisfies the regularity conditions of Section 1.7, and that, in addition,
$$H^{(i)}(\phi) = \Big\{\frac{\partial h_{ia}}{\partial\phi_j}\Big\}$$
is non-singular in a neighborhood of $\phi^{(i)}_0$. Let $\hat S^{(i)} = S(\hat\phi^{(i)})$ denote the estimate of $S^{(i)}$ calculated by replacing $\phi^{(i)}$ with $\hat\phi^{(i)}$. The estimating procedure is as before. Define
$$\tilde h^{(1)}_{jkm}(\phi) = \Big\{\frac{\partial h_{ia}}{\partial p_{jkm}}\Big\}.$$
Hence, for model (3.33) we have
$${}_1C^{(1)}_{jkm}(\pi_0) = (D'X'S(\phi_0)XD)^{-1}D'X'S(\phi_0)\,\tilde h^{(1)}_{jkm}(\phi_0) \qquad (3.34)$$
for all $j$, $k$, and $m$, by an argument similar to that given in Section 3.3.2. The vector ${}_2C^{(1)}_{jkm}(\pi_0)$ remains unchanged from (3.14), except that the relationship to (3.19) does not hold.
Let us now obtain a similar relationship by evaluating $\tilde h^{(1)}_{jkm}(\phi_0)$ and rewriting terms. To evaluate $\tilde h^{(1)}_{jkm}(\phi_0)$, note that differentiating $h_{ia}$ with respect to $p_{jkm}$ gives
$$\frac{\partial h_{ia}}{\partial p_{jkm}} = \sum_c \frac{\partial h_{ia}}{\partial\phi_c}\,\frac{\partial\hat\phi_c}{\partial p_{jkm}}, \qquad a = 1,\dots,y,\quad i = 1,\dots,v. \qquad (3.35)$$
Hence, we have
$$\tilde h^{(1)}_{jkm}(\phi) = H(\phi)\,\phi^{(1)}_{jkm}(\phi). \qquad (3.36)$$
Substituting in the value of $\phi^{(1)}_{jkm}(\phi)$ given in (3.16), we get

(3.37)

Substituting in (3.37) and the value of $S(\phi_0)$, we get
$${}_1C^{(1)}_{jkm}(\pi_0) = \cdots \qquad (3.38)$$
On the other hand, the relationship of ${}_1G_{jkm}(\phi_0)$ and ${}_2G_{jkm}(\theta_0)$ under assumption (3.33) may be derived by
$${}_2G_{jkm}(\theta) = \Big\{-\frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\theta_c}\Big\} = \Big\{-\sum_a \frac{n_{jk}}{\pi_{jkm}}\frac{\partial\pi_{jkm}}{\partial\phi_a}\frac{\partial\phi_a}{\partial\theta_c}\Big\}. \qquad (3.39)$$
From (2.2)

(3.40)

whence,

(3.41)

One may also use (3.40) to show that

(3.42)

To summarize, we have

Lemma 3.3: If $H$ exists, is continuous and of full rank, and $\pi_{ijk}$ and $\phi^{(i)}$ satisfy the conditions of Lemma 3.1, then

(3.43)
In deriving a relationship between ${}_1C^{(2)}_{jkm,rtw}(\pi_0)$ and ${}_2C^{(2)}_{jkm,rtw}(\pi_0)$, one first notes that from (3.40) we have

(3.44)

where
$$H^{*}_{rtw}(\phi) = \frac{\partial}{\partial p_{rtw}}\,H^{-1}(\hat\phi).$$
From (3.44) we have
$$\begin{aligned}\cdots ={}& D'X'H^{-1\prime}(\phi_0)\,\Gamma_{jkm}(\phi_0)\,H^{-1}(\phi_0)\,XD\,{}_1C^{(1)}_{rtw}(\pi_0)\\ &- D'X'H^{-1\prime}(\phi_0)\,W_{rtw}(\phi_0)\,H^{-1}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)\\ &- D'X'H^{-1\prime}(\phi_0)\,\Gamma_{rtw}(\phi_0)\,H^{-1}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0)\\ &- 2D'X'H^{*\prime}_{rtw}(\phi_0)\,{}_1T(\phi_0,\pi_0)\,H^{-1}(\phi_0)\,X\,{}_2F^{-1}(\theta_0)\,{}_2G_{jkm}(\theta_0). \end{aligned}\qquad (3.45)$$
Similarly, we have
$$\cdots + (D'\,{}_2F^{(1)}_{rtw}(\theta_0)\,D)\,{}_2C^{(1)}_{jkm}(\pi_0) = \cdots \qquad (3.46)$$
From (3.45) and (3.46) we may show a lemma similar to Lemma 3.2.
3.5 Discussion
The formulation of the criterion has been with respect to the expansion (3.11). One will also note that, using the above results, one can estimate first order bias. Thus, if one wishes to base module selection on second order variance differences, or on differences in first order bias, all necessary results are given here. In addition, if one feels that the estimates of $\phi$ are stable, one may estimate these criteria without calculating estimates for $\theta$.
CHAPTER IV
SOME EXAMPLES
4.1 Introduction
In this chapter we give three examples in which the two-stage module approach is illustrated. The first example illustrates the different module levels and their resulting analyses. The second example shows the use of a different variance estimate. The third example involves the extension of the approach to a multivariate problem.
4.2.1 Introduction
In Chapter I we discussed a hierarchy of possible models upon which a module method of analysis may be based. Each of these models is used to analyze a different subset of the data as a module of the experiment. The parameters of a module are estimated by maximum likelihood procedures. Consequently, more complex models, which give modules involving larger subsets of the data, use the data more efficiently for the ultimate inferences. However, numerical calculations become very complicated in these "better" models. In most experimental situations in which this two-stage module method may be applied, one must weigh the pros and cons of any selected model. It is this point that the present section treats.

The purpose of this section is to illustrate the choice of various module levels in a particular experimental situation. Explicitly, we base the analyses on different modules of the dilution experiment introduced in Section 1.2. In this experiment, it is not as clear which modules are the natural or most obvious ones to use as it is in the examples given in subsequent sections. Thus, the choice among the four possible module levels will be examined by looking at each module level individually.
4.2.2 The Experiment
To review, consider the dilution experiment of Schiemann [1972] described in Section 1.2. In this experiment there were 5 populations of bacteria, subscripted by $h$, corresponding to the different pH-temperature conditions. In the $h$-th population, observations were taken at the $n_h$ time points $t_{h1}, t_{h2}, \dots, t_{hn_h}$. For the $j$-th time point of the $h$-th population, $q = 3$ dilutions ($z_1$, $z_2$, and $z_3$) of the bacteria solution were made. (The subscripts $h$ and $j$ are suppressed in the $z$'s, although the dilution factor used depends on the population and time point.) From each of these dilutions, 10 tubes of growth medium were inoculated with a portion of the dilution. After incubation, the number of tubes showing bacterial growth, say $r_{ijh}$ for the $i$-th dilution, was recorded (see Table 1.1). Inferences on the different decay rates will be made from these data.
One may consider the responses of the different inoculated tubes as independent. Thus, if we follow the assumptions of Section 1.2 and, following Mather [1949], assume an exponential decay rate, the likelihood for the entire experiment is
$$L(r,\theta) = \prod_{h=1}^{5}\prod_{j=1}^{n_h}\prod_{i=1}^{3}\binom{10}{r_{ijh}}\Big(1-\exp\big(-z_i\exp(\mu_h+\beta_h t_{hj})\big)\Big)^{r_{ijh}}\Big(\exp\big(-z_i\exp(\mu_h+\beta_h t_{hj})\big)\Big)^{10-r_{ijh}}. \qquad (4.1)$$
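As a minimal sketch, the logarithm of (4.1) can be evaluated as shown below. It assumes the data are stored in a hypothetical nested dictionary; the layout of Table 1.1 is not reproduced here, and the function name is illustrative.

```python
import math

def dilution_loglik(data, mu, beta, n_tubes=10):
    """Log of the likelihood (4.1) for the dilution experiment.

    data maps each population h to a list of (t_hj, [(z_i, r_ijh), ...]);
    mu and beta map h to mu_h and beta_h.  Tubes are treated as independent.
    """
    total = 0.0
    for h, time_points in data.items():
        for t, dilutions in time_points:
            lam = math.exp(mu[h] + beta[h] * t)           # lambda_jh
            for z, r in dilutions:
                total += math.log(math.comb(n_tubes, r))
                if r > 0:                                # r * log(1 - exp(-z lambda))
                    total += r * math.log(-math.expm1(-z * lam))
                total += (n_tubes - r) * (-z * lam)      # (10 - r) * log(exp(-z lambda))
    return total
```

Using `expm1` keeps $1 - \exp(-z_i\lambda_{jh})$ accurate when $z_i\lambda_{jh}$ is very small. Maximizing this function over all ten parameters at once corresponds to module level four below.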
From this formulation one sees four possible module levels to choose from.
1. Module level one treats the observed proportion of fertile tubes at a particular dilution, time point, and population as the module unit.
2. In module level two, a module is the set of three proportions resulting from the three dilutions of a particular time point and population.
3. Level three modules are made up of a complete set of proportions for all dilutions of all time points in a population.
4. The fourth level corresponds to the complete likelihood model, and the module consists of all observed proportions of the experiment.
4.2.3 Module Level One
If we use the observed proportions as modules, then the procedure is as follows. First, the maximum likelihood estimate of $\pi_{ijh} = 1 - \exp(-z_i\lambda_{jh})$ is
$$\hat\pi_{ijh} = p_{ijh} = r_{ijh}/10, \qquad (4.2)$$
where $\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj})$. To estimate $\lambda_{jh}$ from (4.2), note that
$$1 - \pi_{ijh} = \exp(-z_i\lambda_{jh}).$$
Hence an estimate is
$$\hat\lambda^{(i)}_{jh} = -\frac{\log(1 - p_{ijh})}{z_i}. \qquad (4.3)$$
Notice that there are three estimates for $\lambda_{jh}$, one for each dilution. The superscript in parentheses in (4.3) denotes which dilution was used.
Following the development of Chapter II, we may estimate the variance of $\hat\lambda^{(i)}_{jh}$ by
$$w_i = \frac{p_{ijh}}{(10 - r_{ijh})\,z_i^2}. \qquad (4.4)$$
One may use these variance estimates as weights, $w_i$, to form the weighted least squares estimate $\hat\lambda^{(w)}_{jh}$ of $\lambda_{jh}$ as
$$\hat\lambda^{(w)}_{jh} = \Big(\sum_{i=1}^{3}\hat\lambda^{(i)}_{jh}/w_i\Big)\Big/\sum_{i=1}^{3}1/w_i. \qquad (4.5)$$
To make inferences on the decay rates, one could model $\log\hat\lambda^{(w)}_{jh}$ as a linear model and proceed with a weighted least squares analysis, using variance estimates derived from (4.4) (see Chapter II). Thus, the "two-stage" module procedure bases inferences essentially on the estimate in (4.5). One may therefore determine the merit of module level one by investigating the properties of $\hat\lambda^{(w)}_{jh}$.
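The level one computations (4.3)-(4.5) for a single time point and population can be sketched as below. The function name and argument layout are illustrative assumptions. The guard reflects a restriction taken up in the next paragraph: (4.4) and (4.5) break down for zero and ten counts.

```python
import math

def level_one_estimates(r_counts, z, n_tubes=10):
    """Per-dilution estimates (4.3), variance estimates (4.4) and the
    weighted combination (4.5) for one time point of one population.

    Requires 0 < r < n_tubes for every count: (4.4) fails at r = n_tubes
    and the weights 1/w_i in (4.5) fail at r = 0.
    """
    lam_i, w = [], []
    for r, zi in zip(r_counts, z):
        if not 0 < r < n_tubes:
            raise ValueError("zero or ten count: adjust the counts first")
        p = r / n_tubes
        lam_i.append(-math.log(1.0 - p) / zi)            # (4.3)
        w.append(p / ((n_tubes - r) * zi ** 2))          # (4.4)
    inv_w = [1.0 / wi for wi in w]
    lam_w = sum(l * iw for l, iw in zip(lam_i, inv_w)) / sum(inv_w)   # (4.5)
    return lam_i, w, lam_w
```

For example, counts (5, 5) at dilution factors (1, 2) give per-dilution estimates $\log 2$ and $\log 2/2$. The combined estimate is $0.6\log 2$, pulled toward the second dilution because it has the smaller variance estimate.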
Unfortunately, the estimate $\hat\lambda^{(w)}_{jh}$ has two significant drawbacks which affect its use.

The first of these is that (4.4) assumes $r_{ijh} \neq 10$ and (4.5) assumes $r_{ijh} \neq 0$. Referring to the data in Table 1.1, we see many zero and ten counts. A common method to adjust for this problem is to replace each zero count with a small number, say $c_0$, and each ten count with $10 - c_1$, where $c_1$ is also a small number. A difficulty remains as to what values to choose. It appears that the larger $c_0$ and $c_1$ are, the less nearly correct the resulting estimates and statistics tend to be.

The second drawback is the bias of $\hat\lambda^{(w)}_{jh}$ when the $\hat\lambda^{(i)}_{jh}$ are based on a small number of samples (in this case 10). Cornell and Speckman [1968] indicate that the weighted least squares (WLS) estimate (4.5) can indeed be critically biased. This bias can be seen for a select set of $\lambda$ values in Table 4.1. This table contains expected values for various values of $c_0 = c_1$, calculated by enumerating the complete sample space for this experiment. As the table shows, the WLS estimates are biased low, especially for large $\lambda$ values. The maximum likelihood estimates (MLE), by contrast, are biased high by about 10% over the range of $\lambda$ values.

Because of these two possible sources of problems in a module level one analysis, no further analysis is given here.
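Table 4.1 was obtained by enumerating the complete sample space. The sketch below shows that kind of calculation for the adjusted WLS estimate. It assumes $c_0 = c_1 = c$ and uses hypothetical dilution factors. The actual $z_i$ of the experiment are not given in this section, so the numbers it produces will not reproduce Table 4.1.

```python
import math
from itertools import product

def adjusted_lambda_w(r_counts, z, c, n_tubes=10):
    """Weighted estimate (4.5) after replacing zero counts by c and ten
    counts by n_tubes - c (the c0 = c1 = c adjustment)."""
    num = den = 0.0
    for r, zi in zip(r_counts, z):
        r = c if r == 0 else (n_tubes - c if r == n_tubes else r)
        p = r / n_tubes
        lam = -math.log(1.0 - p) / zi                    # (4.3)
        inv_w = (n_tubes - r) * zi ** 2 / p              # 1 / (4.4)
        num += lam * inv_w
        den += inv_w
    return num / den

def expected_wls(lam, z, c, n_tubes=10):
    """Exact expectation of the adjusted WLS estimate, found by enumerating
    all (n_tubes + 1) ** len(z) possible outcomes."""
    probs = [1.0 - math.exp(-zi * lam) for zi in z]      # pi_i = 1 - exp(-z_i lambda)
    expect = 0.0
    for counts in product(range(n_tubes + 1), repeat=len(z)):
        weight = 1.0                                     # probability of this outcome
        for r, p in zip(counts, probs):
            weight *= math.comb(n_tubes, r) * p ** r * (1.0 - p) ** (n_tubes - r)
        expect += weight * adjusted_lambda_w(counts, z, c, n_tubes)
    return expect
```

With three dilutions there are only $11^3 = 1331$ outcomes, so exact enumeration is cheap. The bias of the estimator at a given $\lambda$ is then `expected_wls(lam, z, c) - lam`.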
4.2.4 Module Level Two
Now consider the module of the experiment to be the set of observed proportions for all three dilutions at each time point and population. The likelihood for the module corresponding to the $j$-th time point of the $h$-th population is
$$L_{jh}(r,\lambda_{jh}) = \prod_{i=1}^{3}\binom{10}{r_{ijh}}\big(1-\exp(-z_i\lambda_{jh})\big)^{r_{ijh}}\big(\exp(-z_i\lambda_{jh})\big)^{10-r_{ijh}}. \qquad (4.6)$$
TABLE 4.1
BIASES OF MLE AND WLS FOR DILUTION EXPERIMENT
(expected value, with s.e. in parentheses)

Actual                      WLS estimates with s.e. for various values of c = c0 = c1
λ       MLE with s.e.       c = .5           c = .25          c = .1           c = .05          c = .025
25      26.03 (17.15)       27.74 (18.00)    26.06 (16.19)    22.48 (13.46)    18.93 (12.10)    15.20 (11.83)
50      52.78 (26.04)       53.82 (25.42)    49.52 (22.94)    41.99 (22.29)    35.91 (23.92)    30.50 (25.82)
75      79.84 (34.91)       78.60 (31.79)    72.02 (30.96)    62.14 (34.25)    55.15 (37.67)    49.19 (39.70)
100     107.44 (44.59)      102.44 (38.52)   94.43 (40.68)    83.65 (46.47)    76.37 (49.77)    69.76 (50.59)
150     164.63 (66.90)      147.92 (52.92)   139.88 (61.54)   128.53 (66.98)   119.32 (66.02)   108.89 (62.74)
200     223.95 (91.56)      189.77 (64.17)   185.36 (79.86)   172.65 (83.66)   157.79 (77.39)   139.78 (72.06)
400     453.57 (184.33)     300.88 (63.51)   335.37 (98.75)   332.04 (126.24)  292.59 (144.96)  245.21 (173.86)
600     660.33 (271.13)     345.61 (41.12)   414.03 (73.81)   444.69 (131.76)  414.80 (202.39)  373.60 (275.79)
800     867.46 (357.90)     362.72 (24.47)   451.51 (51.29)   518.30 (131.45)  516.67 (239.92)  502.98 (345.74)
1000    1083.23 (446.66)    368.56 (15.39)   469.52 (37.51)   567.85 (130.55)  600.09 (260.64)  622.84 (387.40)
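From (4.6) we get the likelihood equation for the maximum likelihood estimate $\hat\lambda_{jh}$. Explicitly, $\hat\lambda_{jh}$ solves
$$\sum_{i=1}^{3}\frac{r_{ijh}\,z_i}{\exp(z_i\hat\lambda_{jh}) - 1} = \sum_{i=1}^{3}(10 - r_{ijh})\,z_i. \qquad (4.7)$$
Using the Fisher information number, one can approximate the variance of $\hat\lambda_{jh}$ by
$$V(\hat\lambda_{jh}) \approx \Big\{\sum_{i=1}^{3}\frac{10\,z_i^2}{\exp(z_i\lambda_{jh}) - 1}\Big\}^{-1}. \qquad (4.8)$$
This in turn has a consistent estimate, using $\hat\lambda_{jh}$, given by
$$\hat V(\hat\lambda_{jh}) = \Big\{\sum_{i=1}^{3}\frac{10\,z_i^2}{\exp(z_i\hat\lambda_{jh}) - 1}\Big\}^{-1}. \qquad (4.9)$$
This estimate is the explicit result of the general formulation given in Chapter II.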
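The level two estimate requires only a one-dimensional search. The sketch below solves (4.7) by bisection on $\log\lambda$ and evaluates the variance estimate (4.9). Bisection is an assumed choice of root finder; any one-dimensional method would do. The left side of (4.7) decreases strictly in $\lambda$, so the root is unique whenever some count lies above 0 and some count lies below 10.

```python
import math

def _inv_expm1(x):
    """1 / (exp(x) - 1), returning 0 when exp(x) would overflow."""
    return 0.0 if x > 700.0 else 1.0 / math.expm1(x)

def score(lam, r_counts, z, n_tubes=10):
    """Left side minus right side of the likelihood equation (4.7)."""
    return sum(r * zi * _inv_expm1(zi * lam) - (n_tubes - r) * zi
               for r, zi in zip(r_counts, z))

def mle_lambda(r_counts, z, n_tubes=10, lo=1e-12, hi=1e12, tol=1e-13):
    """Solve (4.7) by bisection on log(lambda)."""
    a, b = math.log(lo), math.log(hi)
    while b - a > tol:
        mid = 0.5 * (a + b)
        if score(math.exp(mid), r_counts, z, n_tubes) > 0:
            a = mid          # score still positive: the root lies to the right
        else:
            b = mid
    return math.exp(0.5 * (a + b))

def var_lambda(lam, z, n_tubes=10):
    """Variance estimate (4.9): inverse of the estimated Fisher information."""
    return 1.0 / sum(n_tubes * zi ** 2 * _inv_expm1(zi * lam) for zi in z)
```

With a single dilution the root of (4.7) coincides with the level one estimate (4.3). For instance, $r = 3$ at $z = 2$ gives $-\log(0.7)/2$.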
The stated purpose of the experiment is to compare decay rates of the populations. Thus, following Mather [1949], we assume that an exponential decay model of the form
$$\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj}) \qquad (4.10)$$
characterizes the variation of $\lambda_{jh}$ over time. With this assumption, the (independent) functions of the proportions, namely $\log\hat\lambda_{jh}$, may be modeled as linear in these unknown parameters. One may now estimate the parameters for each of the 5 populations by the weighted least squares procedure described in Chapter I, where the resulting variance estimates are
$$\hat V(\log\hat\lambda_{jh}) = \hat V(\hat\lambda_{jh})/\hat\lambda_{jh}^2. \qquad (4.11)$$
The results of these individual fits (see Table 4.2) give goodness-of-fit statistics which indicate that the assumption of an exponential decay model for each population may be valid.
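The within-population fits summarized in Table 4.2 are weighted least squares regressions of $\log\hat\lambda_{jh}$ on $t_{hj}$. The sketch below assumes the delta-method variances (4.11) and uses plain NumPy. Function and argument names are illustrative.

```python
import numpy as np

def fit_decay(t, lam_hat, var_lam):
    """Weighted least squares fit of log(lambda_hat) = mu + beta * t.

    Uses var(log lambda_hat) ~= var(lambda_hat) / lambda_hat**2, as in (4.11).
    Returns (mu, beta), their covariance matrix, the residual chi-square,
    and its degrees of freedom, len(t) - 2.
    """
    t = np.asarray(t, float)
    lam_hat = np.asarray(lam_hat, float)
    y = np.log(lam_hat)
    v = np.asarray(var_lam, float) / lam_hat ** 2
    X = np.column_stack([np.ones_like(t), t])
    W = np.diag(1.0 / v)
    cov = np.linalg.inv(X.T @ W @ X)
    coef = cov @ X.T @ W @ y
    resid = y - X @ coef
    chi2 = float(resid @ W @ resid)
    return coef, cov, chi2, len(t) - 2
```

The residual chi-square and its degrees of freedom correspond to the last two columns of Table 4.2.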
TABLE 4.2
ESTIMATED PARAMETERS AND GOODNESS-OF-FIT STATISTICS
FOR WITHIN EXPERIMENT EXPONENTIAL DECAY MODELS

Experiment   Estimate of   Estimated   Estimate    Estimated   Residual        D.F.
             intercept     s.e.        of slope    s.e.        X²-statistic
P1           7.99          0.42        -0.0254     0.0074       3.26            7
P2           8.79          0.46        -0.0601     0.0082       5.54            8
T1           7.65          0.25        -0.0291     0.0028      11.44           13
T2           7.90          0.21        -0.0502     0.0036       6.19           12
T3           7.97          0.22        -0.0852     0.0059       7.38           10
Under the assumption that (4.10) suitably accounts for the variation in $\hat\lambda_{jh}$, tests of linear hypotheses may be undertaken with respect to the parameters comprising $\theta = (\mu_1,\beta_1,\dots,\mu_5,\beta_5)'$. Since the parameters in each independent population may be modeled as in (4.10), one may estimate the set of ten parameters in $\theta$ by combining all data of the experiment (i.e., all $\hat\lambda_{jh}$) into one linear model formulation. Then, one can test a general hypothesis of the form $H_0\colon C\theta = 0$, where $C$ is some known matrix of maximal rank, by using standard linear model procedures as illustrated by Grizzle, Starmer, and Koch [1969].
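A general hypothesis $H_0\colon C\theta = 0$ leads to a Wald statistic of the familiar form $X^2 = (C\hat\theta)'(C\hat V C')^{-1}(C\hat\theta)$, with degrees of freedom equal to the rank of $C$. The sketch below assumes the estimated covariance matrix $\hat V$ is already available. As a rough check, take the rounded slope estimates of Table 4.2 and treat the populations as independent. The sketch then gives about 0.22 for $\beta_{P1} = \beta_{T1}$ and about 9.87 for $\beta_{P1} = \beta_{P2}$. These are close to the values 0.22 and 9.85 reported in Table 4.3.

```python
import numpy as np

def wald_test(theta_hat, cov, C):
    """Wald chi-square for H0: C theta = 0 and its degrees of freedom."""
    C = np.atleast_2d(np.asarray(C, float))
    d = C @ np.asarray(theta_hat, float)
    stat = float(d @ np.linalg.solve(C @ np.asarray(cov, float) @ C.T, d))
    return stat, int(np.linalg.matrix_rank(C))

# beta_P1 = beta_T1, using slopes and standard errors from Table 4.2
stat, df = wald_test([-0.0254, -0.0291], np.diag([0.0074**2, 0.0028**2]), [[1, -1]])
print(round(stat, 3), df)
```

Because only slopes enter these two contrasts, the unreported covariance between each intercept and slope does not affect them.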
In reviewing Table 1.1 we note two types of experiments. In the first type, labeled P1 and P2, the pH level is varied for a fixed temperature. In the second type, labeled T1, T2 and T3, temperature is varied for a fixed pH. By the nature of the laboratory procedure in beginning each experiment, the initial concentrations of bacteria will probably be equal within experiment types, that is, among P1 and P2, and among T1, T2 and T3. A second look will reveal that experiments P1 and T1 were actually done under the same pH and temperature conditions. Hence one may expect these two to have the same rate of decay. These and other hypotheses of interest are given in Tables 4.3, 4.4, and 4.5.

In the context of the exponential decay process, the equality of initial concentrations is the same as equality of "μ" parameters, and equal decay rates correspond to equal "β" parameters. Thus these hypotheses are linear in the estimated parameters, and tests are given in Table 4.3. Also of interest are hypotheses comparing decay rates (or "β" parameters) for different populations. As shown in Table 4.4, these comparisons give significant differences. The final model for this experiment is fitted in Table 4.4, where parameters appearing equal in the results of Table 4.3 have been combined into one. Corresponding tests are in Table 4.5.
4.2.5 Module Level Three
In this case the modules are made up of the observed proportions for all dilutions and time points for each population. The likelihood for the module corresponding to the $h$-th population is given in (4.12).
TABLE 4.3
TESTS OF HYPOTHESES FOR COMPARISONS OF INTERCEPT AND SLOPE PARAMETERS

Hypothesis                     X²       D.F.
μ_P1 = μ_P2                     1.68     1
μ_T1 = μ_T2 = μ_T3              1.05     2
β_P1 = β_T1                     0.22     1
β_P1 = β_P2                     9.85     1
β_T1 = β_T2 = β_T3             79.07     2
TABLE 4.4
ESTIMATED PARAMETERS AND STANDARD ERRORS FOR FINAL MODEL

Parameter   Estimate    Estimated s.e.
μ_P          8.34        0.15
μ_T          7.86        0.12
β_1         -0.0313      0.0017
β_P2        -0.0524      0.0035
β_T2        -0.0495      0.0027
β_T3        -0.0828      0.0046
TABLE 4.5
TESTS OF HYPOTHESES FOR FINAL MODEL
D.F.
Hypothesis
1
PI' == P T
13 1
==
13 1 ==
131'2
13 T2
t:I
43.91
181.'45
4
251. 30
1
== 131'3
BT2
== 131'3
P T > 131'2 == 13
T2
13 T3
PI' == P T > 131'2
PI'
2
<=
B1
==
13 1
== 0
5
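$$L_h(r,\mu_h,\beta_h) = \prod_{j=1}^{n_h}\prod_{i=1}^{3}\binom{10}{r_{ijh}}\big(1-\exp(-z_i\lambda_{jh})\big)^{r_{ijh}}\big(\exp(-z_i\lambda_{jh})\big)^{10-r_{ijh}}, \qquad \lambda_{jh} = \exp(\mu_h + \beta_h t_{hj}). \qquad (4.12)$$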
Of interest here is a study of how estimates and tests based on (4.12) compare with the level two analysis. In this brief section we give the estimates of $\mu_h$ and $\beta_h$ obtained directly by maximizing (4.12), and we compare these and the resulting tests with the results in Section 4.2.4.

Although the estimation at this level is more difficult, the resulting estimates are more efficient (Rao [1962]). Thus, any change in conclusions based on analyses using (4.12) will reflect inadequacies in a level two analysis. Table 4.6 gives the estimates of $\mu_h$ and $\beta_h$ for each population. One will notice a high degree of agreement, in both estimates and goodness-of-fit statistics, between Tables 4.2 and 4.6.
To test hypotheses on the parameters in $\theta$ by least squares, recall that one needs the covariance estimate of $\hat\theta$. Estimates of this covariance may be derived from the Fisher information matrix
$$\hat I_h = \sum_{j=1}^{n_h}\sum_{i=1}^{3}\frac{r_{ijh}\,(1-\pi_{ijh})\,(\lambda_{jh}z_i)^2}{\pi_{ijh}^2}\begin{pmatrix}1 & t_{hj}\\ t_{hj} & t_{hj}^2\end{pmatrix}, \qquad (4.13)$$
as described in Chapter II. Here $\pi_{ijh} = 1 - \exp(-z_i\lambda_{jh})$ and $\lambda_{jh} = \exp(\mu_h + \beta_h t_{hj})$, evaluated at the estimates. Tests for linear hypotheses on $\theta$ follow as before. The results of these tests, given in Table 4.7, compare favorably with previous results (see Tables 4.4 and 4.5).

TABLE 4.6
ESTIMATED INTERCEPT AND SLOPE PARAMETERS
FROM MAXIMUM LIKELIHOOD ANALYSIS

Experiment   Estimate of   Estimated   Estimate    Estimated   MPN likelihood        D.F.
             intercept     s.e.        of slope    s.e.        ratio X²-statistic
P1           7.97          0.42        -0.0253     0.0074       3.18                  7
P2           8.94          0.43        -0.0635     0.0076                             8
T1           7.64          0.24        -0.0293     0.0027      11.03                 13
T2           7.93          0.21        -0.0516     0.0036       6.92                 12
T3           7.93          0.21        -0.0855     0.0056       7.19                 10

TABLE 4.7
ESTIMATED PARAMETERS, STANDARD ERRORS, AND TESTS OF SIGNIFICANCE
FOR FINAL MODEL FIT OF MAXIMUM LIKELIHOOD INTERCEPT AND SLOPE PARAMETERS

Parameter   Estimate    Estimated s.e.
μ_P          8.36        0.15
μ_T          7.86        0.12
β_1         -0.0316      0.0016
β_P2        -0.0539      0.0034
β_T2        -0.0505      0.0026
β_T3        -0.0841      0.0043

Hypothesis                X²        D.F.
μ_P = μ_T                11.46       1
β_1 = β_P2               52.22       1
β_1 = β_T2 = β_T3       211.22       2
Residual                  3.69       4
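The information matrix (4.13) can be assembled directly from the counts. The sketch below assumes the form of (4.13) given above. It takes `t`, `z` and `r` as arrays for a single population, which is a hypothetical layout. Inverting the result gives the covariance estimate for $(\hat\mu_h, \hat\beta_h)$.

```python
import numpy as np

def level_three_information(mu, beta, t, z, r):
    """Estimated information matrix (4.13) for (mu_h, beta_h) in one population.

    t: time points t_hj, shape (n_h,); z: dilution factors, shape (n_h, 3);
    r: counts r_ijh of fertile tubes, shape (n_h, 3).
    """
    t = np.asarray(t, float)
    z = np.asarray(z, float)
    r = np.asarray(r, float)
    lam = np.exp(mu + beta * t)[:, None]        # lambda_jh, broadcast over dilutions
    pi = -np.expm1(-z * lam)                    # pi_ijh = 1 - exp(-z_i lambda_jh)
    base = r * (1.0 - pi) * (lam * z) ** 2 / pi ** 2
    tt = t[:, None]
    off = (base * tt).sum()
    return np.array([[base.sum(), off],
                     [off, (base * tt ** 2).sum()]])
```

A call such as `np.linalg.inv(level_three_information(...))` then gives the covariance matrix whose square-root diagonal corresponds to the standard errors reported in Table 4.6.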
4.2.6 Module Level Four
In a manner similar to that of Section 4.2.5, we wish to compare results from the full likelihood model, module level four, with those of module level two. In using this model, we must estimate all ten parameters simultaneously. As noted above, we expect equality of certain parameters a priori. Thus we initially fit the "reduced model" of six parameters. These results are given in Table 4.8 and compare well with previous results.
TABLE 4.8
ESTIMATED PARAMETERS AND STANDARD ERRORS
FOR PURE MAXIMUM LIKELIHOOD FIT OF "FINAL MODEL"

Parameter   Estimate    Estimated s.e.
μ_P          8.36        .15
μ_T          7.86        .13
β_1         -0.0317      .0017
β_P2        -0.0540      .0032
β_T2        -0.0505      .0027
β_T3        -0.0840      .0043

4.2.7 Discussion
In summary, we have investigated the results of combining the maximum likelihood and weighted least squares procedures at the four different levels of the dilution experiment. This investigation indicates that module level two, where the basic module is the set of observed proportions for the series of three dilutions, gives very good results. The level two analysis involved a series of one-dimensional computer searches in conjunction with standard matrix calculations, and it was simple to employ. If the assumption (4.10) had appeared unsatisfactory, one might have investigated other models for $\lambda_{jh}$ quite easily by using linear models techniques.

When the assumptions are correct, module levels three and four are considered preferable. However, these analyses require complicated computer procedures. In our example, estimates of the parameters in these last two levels were obtained using a program prepared by Kaplan and Elston [1972]. If the assumption (4.10) had been shown to be unreasonable, the entire analysis would have had to be redone to investigate more complex models of $\lambda_{jh}$. Because of these complications, and because of the similarity of the results for levels two, three, and four, module level two seems reasonable for such experiments.
4.3.1 Introduction
The formulation and tests of hypotheses of "no interaction" in multidimensional contingency tables have been given by Bhapkar and Koch [1965]. These researchers consider the categories of a (complex) contingency table as factor groups (fixed marginal totals) or response groups (random marginal totals). On this basis they give test criteria for a variety of experimental situations. Relevant test statistics are calculated using linear models methodology and are very easy to implement. Grizzle, Starmer, and Koch [1969] indicate that a special case is only a slight modification of the formulation of Bhapkar and Koch. That special case is the "no second order interaction" hypothesis on a set of t 2x3 contingency tables considered by Berkson [1968] and Plackett [1962]. Calculation of test statistics in this case follows the linear models technique of weighted least squares on special log functions of the data. In other situations, the modification and generalization of the Bhapkar-Koch formulation for higher order interaction hypotheses in contingency tables is recognized to be of interest.

In certain experimental situations one may wish to use particular functions of the data in a more general formulation of the hypothesis of "no interaction." If the response group categories of Bhapkar and Koch are structured (e.g., the response categories may be numerically related), one may wish to appeal to this structure to gain efficiency in the tests. For example, in an experiment giving a multi-factor single-response situation, we can form a hypothesis of "no interaction" with respect to relevant functions of the response proportions (e.g., mean scores). Choice of relevant functions is left to the researcher and his understanding of the experiment and its sources of error and variability.

Besides the possibility of increased applicability of this functional formulation, one may also increase the convergence rates of test statistics, in the sense of Rao [1962], by using particular functions.

In this chapter we examine the "no second order interaction" hypothesis in the data of Kastenbaum and Lamphiear [1959] by formulating the hypothesis in terms of functions of the data. The functions used are the maximum likelihood estimates derived from assumed partial likelihood equations. In the no-interaction setting we model these implicitly defined functions as linear. Estimation and tests follow the module procedure previously described.
Consider an experiment consisting of two factor groups, subscripted $i$ and $j$ respectively, and one response group subscripted $h$, where $h = 1,\dots,r$, $i = 1,\dots,s$, and $j = 1,\dots,t$. Let $\pi_{hij}$ be the probability that an experimental unit from the $(i,j)$-th combination of factors belongs in the $h$-th category of response. Then
$$\sum_{h=1}^{r}\pi_{hij} = 1, \qquad i = 1,\dots,s;\ j = 1,\dots,t.$$
In the notation of Chapter I, $i$ and $j$ are subscripts for the $m = s\cdot t$ multinomials of $r$ categories each.
Following Bhapkar and Koch, we say there is no second order interaction between the response and the two factors, with respect to a certain set of functions, if the following holds: the dependence of the set of functions of the distribution of the response on one factor is constant over levels of the other factor. If we choose $v\ (< r)$ functions $\phi_1,\dots,\phi_v$, a general formulation for such a hypothesis, with respect to this set of functions, is
$$\phi_{\ell ij} = \phi_{\ell i*} + \phi_{\ell *j}, \qquad i \le s,\ j \le t,\ \ell = 1,\dots,v, \qquad (4.14)$$
where the subscript $*$ denotes that the subscripted quantity is independent of the corresponding suffix. This is equivalent to saying
$$\phi_{\ell ij} - \phi_{\ell it} - \phi_{\ell sj} + \phi_{\ell st} = 0, \qquad i < s,\ j < t,\ \ell = 1,\dots,v. \qquad (4.15)$$
For the additive model of Bhapkar and Koch, set $v = r - 1$ and define $\phi_{\ell ij}$ as in their additive formulation. For the multiplicative model, define $\phi_{\ell ij}$ as in their multiplicative formulation.
Actually, as discussed in Chapter II, any other set of functions $\phi_\ell$ felt to reflect differences of interest may be used, provided the second partial derivatives with respect to $\pi_{hij}$ exist and are continuous.
The test criterion for the "no interaction" hypothesis (4.14) is based on the results in Chapter II and the constraints in (4.15). Explicitly, let $z' = (z_{111}, z_{211}, \dots, z_{vst})$ be the estimate of $\phi' = (\phi_{111}, \phi_{211}, \dots, \phi_{vst})$. (4.16)

In our case the $z_{\ell ij}$ are maximum likelihood estimates based on the $(i,j)$-th module (factor group combination). In matrix notation, choose $B$ such that the constraints (4.15) may be written as
$$B\phi = 0. \qquad (4.17)$$
If we define
$$\Phi(p) = \Big\{\frac{\partial\phi_{\ell ij}}{\partial\pi_{hij}}\Big\},$$
then, in accordance with (1.17), the statistic

(4.18)

is distributed asymptotically as a chi-square variable with $v(s-1)(t-1)$ degrees of freedom when (4.15) holds.
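The contrast matrix $B$ of (4.17) can be built mechanically from (4.15). The text fixes only the first and last elements of $z$, namely $z_{111}$ and $z_{vst}$. The sketch therefore assumes that $\ell$ varies fastest, then $i$, then $j$. The rank of $B$ equals the $v(s-1)(t-1)$ degrees of freedom of the test.

```python
import numpy as np

def interaction_contrasts(v, s, t):
    """Rows of B are the contrasts (4.15),
    phi_lij - phi_lit - phi_lsj + phi_lst for i < s, j < t, l = 1, ..., v.

    Assumed ordering of phi (and z): l varies fastest, then i, then j.
    """
    def index(ell, i, j):                    # zero-based subscripts
        return ell + v * (i + s * j)

    rows = []
    for ell in range(v):
        for i in range(s - 1):
            for j in range(t - 1):
                row = np.zeros(v * s * t)
                row[index(ell, i, j)] += 1.0
                row[index(ell, i, t - 1)] -= 1.0
                row[index(ell, s - 1, j)] -= 1.0
                row[index(ell, s - 1, t - 1)] += 1.0
                rows.append(row)
    return np.array(rows)
```

For the depletion example below, with one function per module ($v = 1$), $s = 2$ treatments and $t = 5$ litter sizes, `interaction_contrasts(1, 2, 5)` has 4 rows. Combined with an estimated covariance matrix for $z$, it can be passed to a Wald statistic of the usual quadratic form.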
In some cases the functions $\phi_{\ell ij}$ may be linearly related. Suppose the $\phi_{\ell ij}$ are defined by
$$\phi_{\ell ij} = \alpha_{\ell i} + \beta_{\ell i}\,f(\ell,j), \qquad (4.19)$$
where the $f(\ell,j)$ are known functions and $\alpha_{\ell i}$ and $\beta_{\ell i}$ are unknown constants. In this case, (4.15) becomes
$$\big(f(\ell,j) - f(\ell,t)\big)\,(\beta_{\ell i} - \beta_{\ell s}) = 0.$$
If the functions $f$ are such that
$$f(\ell,j) - f(\ell,t) \neq 0, \qquad j < t,\ \ell = 1,\dots,v,$$
then the hypothesis of no second order interaction is equivalent to the constraints
$$\beta_{\ell i} - \beta_{\ell s} = 0, \qquad i < s,\ \ell = 1,\dots,v. \qquad (4.20)$$
The "Wald" statistic in this case is the same as in (4.18), except that $B$ is the matrix used to give $Bz$ as the (weighted) least squares estimate of $\beta = (\beta_1 - \beta_s, \beta_2 - \beta_s, \dots, \beta_{s-1} - \beta_s)$.
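The step from (4.19) to (4.20) is a direct substitution of (4.19) into the contrast (4.15). The $\alpha$ terms cancel:

$$\begin{aligned}
\phi_{\ell ij}-\phi_{\ell it}-\phi_{\ell sj}+\phi_{\ell st}
&=\big[\alpha_{\ell i}+\beta_{\ell i}f(\ell,j)\big]-\big[\alpha_{\ell i}+\beta_{\ell i}f(\ell,t)\big]\\
&\quad-\big[\alpha_{\ell s}+\beta_{\ell s}f(\ell,j)\big]+\big[\alpha_{\ell s}+\beta_{\ell s}f(\ell,t)\big]\\
&=(\beta_{\ell i}-\beta_{\ell s})\,\big(f(\ell,j)-f(\ell,t)\big).
\end{aligned}$$

Hence, whenever $f(\ell,j) \neq f(\ell,t)$, each contrast vanishes exactly when $\beta_{\ell i} = \beta_{\ell s}$.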
4.3.3 An Example
The data in Table 4.9 (originally given in Kastenbaum and Lamphiear [1959]) represent the number of deaths of baby mice prior to weaning, by litter size and treatment. Each litter in the study was observed to have zero, one, or more than one such depletion. Since we can determine the number of experimental units for each treatment-litter size combination a priori, these variables are treated as factor groups, where $i$ indexes treatment and $j$ indexes litter size. Interest in this study lies in determining any second order interaction between the number of depletions and a particular treatment-litter size combination.
TABLE 4.9
DEPLETION DATA

Litter                    Number of Depletions
Size     Treatment      0      1     2+    Total
7        A             58     11      5      74
         B             75     19      7     101
8        A             49     14     10      73
         B             58     17      8      83
9        A             33     18     15      66
         B             45     22     10      77
10       A             15     13     15      43
         B             39     22     18      79
11       A              4     12     17      33
         B              5     15      8      28
To form functions of interest, note that the various categories of response may be given numerical values which carry relative information. For example, we may give the value $w_1 = 0$ for zero depletions, $w_2 = 1$ for one depletion, and $w_3 = 2$ for more than one depletion. One type of function of the observed proportions, using these weights, might be the mean score, i.e.,
$$\phi_{ij} = \sum_{h=1}^{3} w_h\,p_{hij}. \qquad (4.21)$$
An analysis of interaction based on such a function may be formed by the procedure described above.

A major drawback of the functions defined in (4.21) is that they do not account for differences in litter size. Two or more depletions in a litter of size 7 may have a different meaning than in a litter of size 11. One solution might be to make the weights $w_h$ functions of the litter size. An alternative solution would be to model the responses. The estimated parameter(s) of a simple model may serve as a more sophisticated mean score and could be used as the functions $\phi_{\ell ij}$ of the data defining interaction.
Assume that the number of depletions for the $(i,j)$-th factor group combination follows a binomial distribution with parameter $\theta_{ij}$. If $m_j$ represents the litter size of factor group $(i,j)$, then we have the following probabilities:
$$\begin{aligned}
\gamma^{(0)}_{ij} &= \mathrm{Prob}_{ij}(\text{zero depletions}) = (1-\theta_{ij})^{m_j},\\
\gamma^{(1)}_{ij} &= \mathrm{Prob}_{ij}(\text{one depletion}) = m_j\,\theta_{ij}\,(1-\theta_{ij})^{m_j-1},\\
\gamma^{(2)}_{ij} &= \mathrm{Prob}_{ij}(\text{more than one depletion}) = 1 - \gamma^{(0)}_{ij} - \gamma^{(1)}_{ij}.
\end{aligned}\qquad (4.22)$$
Let $n_{\cdot ij}$ be the number of litters of size $m_j$ receiving treatment $i$. Then the likelihood model for the responses $n_{1ij}, n_{2ij}, n_{3ij}$ is
$$L_{ij} = \frac{n_{\cdot ij}!}{n_{1ij}!\,n_{2ij}!\,n_{3ij}!}\,\big(\gamma^{(0)}_{ij}\big)^{n_{1ij}}\big(\gamma^{(1)}_{ij}\big)^{n_{2ij}}\big(\gamma^{(2)}_{ij}\big)^{n_{3ij}}. \qquad (4.23)$$
The maximum likelihood estimate, $\hat\theta_{ij}$, of $\theta_{ij}$ may be considered as a type of mean score accounting for the different litter sizes. Solutions maximizing (4.23) are given in Table 4.10. Since the functions $\phi$ are only implicitly defined by $\hat\theta_{ij} = \phi_{ij}$, these functions are evaluated by iterative methods.
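The estimates in Table 4.10 come from an iterative solution of the likelihood equation of (4.23). The sketch below solves that equation by bisection on the score. This is an assumed method; the text does not say which iteration was used. Here $n_0$, $n_1$, $n_2$ denote the numbers of litters with zero, one, and more than one depletion.

```python
import math

def depletion_loglik(theta, n0, n1, n2, m):
    """Log-likelihood (4.23), up to the multinomial constant, for one
    treatment-litter size group of litter size m."""
    q = 1.0 - theta
    g0 = q ** m                                   # gamma^(0)
    g1 = m * theta * q ** (m - 1)                 # gamma^(1)
    g2 = 1.0 - g0 - g1                            # gamma^(2)
    return n0 * math.log(g0) + n1 * math.log(g1) + n2 * math.log(g2)

def depletion_score(theta, n0, n1, n2, m):
    """Derivative of depletion_loglik with respect to theta."""
    q = 1.0 - theta
    g2 = 1.0 - q ** m - m * theta * q ** (m - 1)
    d_g2 = m * (m - 1) * theta * q ** (m - 2)     # d gamma^(2) / d theta
    return -n0 * m / q + n1 * (1.0 / theta - (m - 1) / q) + n2 * d_g2 / g2

def theta_mle(n0, n1, n2, m, tol=1e-12):
    """Bisection on the score over (0, 1); the score is positive below the
    root and negative above it for data such as Table 4.9."""
    a, b = 1e-9, 1.0 - 1e-9
    while b - a > tol:
        mid = 0.5 * (a + b)
        if depletion_score(mid, n0, n1, n2, m) > 0:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)
```

With the litter size 7 counts of Table 4.9, this gives approximately .0412 for treatment A (58, 11, 5) and .0475 for treatment B (75, 19, 7). These agree with Table 4.10.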
TABLE 4.10
DEPLETION DATA ESTIMATES

Litter                  Estimated
Size     Treatment      θ_ij       S.E.       Predicted θ_ij
7        A              .0412      .009985    .0446
         B              .0475      .008843    .0370
8        A              .0605      .011423    .0676
         B              .0510      .009545    .0525
9        A              .0867      .012472    .0906
         B              .0630      .009917    .0679
10       A              .1130      .017339    .1136
         B              .0790      .010780    .0833
11       A              .1570      .021356    .1366
         B              .1100      .016764    .0988
One will notice that the estimates $\hat\theta_{ij}$ for each treatment appear to be linearly related to litter size. In Table 4.11, weighted least squares estimates of an intercept and slope for each treatment are given.

TABLE 4.11
PARAMETER ESTIMATES FOR DEPLETION DATA

Parameter   Estimate   S.E.
α           .0216      .0079
β_A         .0230      .0034
β_B         .0154      .0030

Hypothesis     X²
β_A = 0        44.71
β_B = 0        26.99
β_A = β_B       7.57

Residual X² = 3.80 with D.F. = 7
Notice, however, that a common intercept for the treatments was estimated. The residual $X^2$ ($X^2 = 3.8$ with D.F. = 7) indicates that both the assumption of a linear regression model and the assumption of a common intercept are reasonable. Assuming that the variation in $\theta$ is accounted for by a linear regression line, we may predict the value of $\theta$ for any particular treatment-litter size combination using the estimates in Table 4.11. The last column in Table 4.10 gives these predictions for comparison with the individually estimated $\hat\theta$'s.
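The "Predicted" column of Table 4.10 can be recomputed from Table 4.11. The text does not state how litter size is coded in the regression. The sketch assumes the regressor is litter size minus 6, a coding inferred because it reproduces that column to within rounding.

```python
ALPHA = 0.0216                       # common intercept, Table 4.11
SLOPE = {"A": 0.0230, "B": 0.0154}   # treatment slopes, Table 4.11

def predicted_theta(treatment, litter_size):
    """Straight-line prediction of theta_ij from Table 4.11.

    Assumed coding: regressor = litter size - 6 (7 -> 1, ..., 11 -> 5);
    this is an inference from Table 4.10, not stated in the text.
    """
    return ALPHA + SLOPE[treatment] * (litter_size - 6)

print(round(predicted_theta("A", 7), 4), round(predicted_theta("B", 11), 4))
```

Under this coding the treatment A predictions match Table 4.10 exactly. The treatment B predictions differ by at most .0002, which is consistent with rounding of the reported estimates.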
Because of the good fit of the $\hat\theta_{ij}$ to straight lines, one is inclined to form the "no interaction" hypothesis by using (4.19) and (4.20). In this case, the test of equality of $\beta_A$ and $\beta_B$ is identical to a test of no second order interaction. We assume that the "straight line formulation" is correct, and we find that a significant second order interaction is indicated (see Table 4.11).

In summary, we note that such a formulation of the interaction hypothesis reveals an apparent second order interaction in these data. Such an interaction was not discovered in simpler formulations (see Berkson [1968] or Kastenbaum and Lamphiear [1959]). Of course, it is possible that, in many experiments, some functions of the data will indicate an interaction and others will not. Therefore, the researcher must be able to determine sensible functions which reflect true differences of interest.
4.3.4 Discussion
When estimating the parameter $\theta_{ij}$ from the likelihood model (4.23), one can also compute a goodness-of-fit statistic. In some of the factor combinations of the above example, this goodness-of-fit statistic was significant at the $\alpha = .05$ level, although not at the $\alpha = .01$ level. This indicates a possible lack of fit to the assumed model (4.22). However, the difference between observed and expected outcomes was uniform enough to give us some degree of confidence in the relevance of the "mean score" $\hat\theta_{ij}$. Since the $\hat\theta_{ij}$ are felt to reflect the differences of interest, these functions were used, with the variance estimate precaution of Section 2.3 being observed.
4.4 [...] Dispersion and the Multivariate Extension
4.4.1 Negative Binomial Distribution: Introduction
As mentioned in Chapter I, the parameter $k$ of fitted negative binomial models has been used to study dispersion characteristics. When several independent populations are to be studied, researchers have avoided using a product negative binomial model by applying some transformation or set of summary statistics. The two-stage procedure avoids this problem by treating the individually fitted negative binomials as natural modules. However, in the study of several interacting populations, samples may contain data on many different species, all of which are of interest. How current techniques should handle such multivariate samples is unclear. The purpose of this section is to illustrate the two-stage procedure for this multivariate data problem, assuming that the form of the marginal distribution of each species is known.
In a more general sense, the assumed marginal distributions take the role played previously by the "partial" likelihood functions. Explicitly, we have used the "partial" likelihood equations to define implicit constraint functions of the observed proportions. In like manner, fitted marginal models define such constraint functions, upon which one may base an analysis. The data used to fit these different marginal models may, however, be highly correlated, and such interdependencies must be accounted for in any analysis.
4.4.2 The Model
For the $i$-th negative binomial, corresponding to, say, a fixed species-location combination, define

$\pi_{ij}$ = probability of observing a count of $j$ individuals of module unit $i$ in a sample, $j = 0, \ldots, m-1$,

$\pi_{im}$ = probability of observing a count of $m$ or more individuals of the $i$-th type in a sample.
Then the marginal likelihood model for the counts $r_{ij}$, $j = 0, 1, \ldots, m$, in the $i$-th module unit is

$$L_i(\mathbf{r}_i, k_i, p_i) = n_i! \prod_{j=0}^{m} \frac{\pi_{ij}^{\,r_{ij}}}{r_{ij}!}, \tag{4.24}$$

where $\mathbf{r}_i = (r_{i0}, \ldots, r_{im})$, $n_i = \sum_{j=0}^{m} r_{ij}$,

$$\pi_{ij} = \frac{\Gamma(j + k_i)}{j!\,\Gamma(k_i)}\, p_i^{\,j} (1 - p_i)^{k_i}, \qquad j < m,$$

and

$$\pi_{im} = 1 - \sum_{j=0}^{m-1} \pi_{ij}. \tag{4.25}$$
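The cell probabilities (4.25) and the log of (4.24) are easy to evaluate on the log scale. A minimal sketch using only the Python standard library; the function names are illustrative, and cells with $r_{ij} = 0$ are skipped so that a numerically zero tail probability cannot produce $\log 0$.

```python
import math


def nb_cell_probs(k, p, m):
    """Cell probabilities of (4.25) for one module.

    pi_j = Gamma(j + k) / (j! Gamma(k)) * p**j * (1 - p)**k, j = 0, ..., m - 1,
    followed by the tail pi_m = 1 - sum_{j < m} pi_j (m or more individuals).
    """
    if not (k > 0 and 0 < p < 1 and m >= 1):
        raise ValueError("need k > 0, 0 < p < 1 and m >= 1")
    probs = []
    for j in range(m):
        log_pj = (math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k)
                  + j * math.log(p) + k * math.log1p(-p))
        probs.append(math.exp(log_pj))
    probs.append(1.0 - sum(probs))
    return probs


def module_log_likelihood(counts, k, p):
    """log L_i of (4.24); counts = (r_0, ..., r_m), r_m = samples with >= m individuals."""
    m = len(counts) - 1
    probs = nb_cell_probs(k, p, m)
    n = sum(counts)
    ll = math.lgamma(n + 1)  # log n_i!
    for r, pi in zip(counts, probs):
        if r == 0:
            continue  # an empty cell contributes r log pi - log r! = 0
        ll += r * math.log(pi) - math.lgamma(r + 1)
    return ll
```

If SciPy is available, $\pi_{ij}$ for $j < m$ also equals `scipy.stats.nbinom.pmf(j, k, 1 - p)`: SciPy's success probability corresponds to $1 - p_i$ in this parametrization.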
The maximum likelihood estimates $\hat{k}_i$ and $\hat{p}_i$ maximize (4.24). To estimate the covariance of $\hat{k}_i$ and $\hat{p}_i$, $i = 1, \ldots, q$, say, we apply formula (2.3). This formulation requires the derivatives of the likelihood equations

$$\frac{\partial L_i}{\partial k_i}(\mathbf{r}_i, k_i, p_i) = 0, \qquad \frac{\partial L_i}{\partial p_i}(\mathbf{r}_i, k_i, p_i) = 0 \tag{4.26}$$

with respect to $k_j$ and $p_j$, $j = 1, \ldots, q$. Because $L_i$ involves only $k_i$ and $p_i$, the derivatives of (4.26) with respect to $k_j$ and $p_j$, $j \neq i$, are zero. Gillings [1972] gives the following recursive relationships for the non-zero differentials:
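$$\frac{\partial \pi_{i0}}{\partial k_i} = \pi_{i0} \log(1 - p_i), \tag{4.28}$$

$$\frac{\partial \pi_{ij}}{\partial k_i} = \left( \log(1 - p_i) + \sum_{\ell=1}^{j} \frac{1}{k_i + \ell - 1} \right) \pi_{ij}, \qquad j = 1, \ldots, m-1. \tag{4.29}$$

From the fact that $\sum_{j=0}^{m} \pi_{ij} = 1$ we determine

$$\frac{\partial \pi_{im}}{\partial k_i} = -\sum_{j=0}^{m-1} \frac{\partial \pi_{ij}}{\partial k_i}, \qquad \frac{\partial \pi_{im}}{\partial p_i} = -\sum_{j=0}^{m-1} \frac{\partial \pi_{ij}}{\partial p_i}. \tag{4.30}$$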
Equations (4.30) represent the functions that define the various functions $h_i(\theta)$ of Section 2.2. Thus (4.28), (4.29), and (4.30) can be used to determine $H(\mathbf{k}, \mathbf{p})$.
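The recursions (4.28) to (4.30) can be checked numerically before they are used to build $H(\mathbf{k}, \mathbf{p})$. A minimal sketch in plain Python with illustrative function names. The $p$-derivative is not among the equations reproduced here; it is obtained by differentiating (4.25) directly, which gives $\partial \pi_{ij}/\partial p_i = (j/p_i - k_i/(1 - p_i))\,\pi_{ij}$ for $j < m$.

```python
import math


def nb_probs(k, p, m):
    """pi_j, j < m, of (4.25) and the tail pi_m = 1 - sum of the others."""
    head = [math.exp(math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k)
                     + j * math.log(p) + k * math.log1p(-p)) for j in range(m)]
    return head + [1.0 - sum(head)]


def dprobs_dk(k, p, m):
    """d pi_j / d k: (4.28) for j = 0, (4.29) for 1 <= j < m, (4.30) for the tail."""
    probs = nb_probs(k, p, m)
    log_q = math.log1p(-p)
    out = []
    harmonic = 0.0  # running sum_{l=1}^{j} 1 / (k + l - 1)
    for j in range(m):
        if j > 0:
            harmonic += 1.0 / (k + j - 1)
        out.append((log_q + harmonic) * probs[j])
    out.append(-sum(out))  # (4.30): the m + 1 probabilities sum to one
    return out


def dprobs_dp(k, p, m):
    """d pi_j / d p = (j / p - k / (1 - p)) pi_j for j < m; tail via (4.30)."""
    probs = nb_probs(k, p, m)
    out = [(j / p - k / (1.0 - p)) * probs[j] for j in range(m)]
    out.append(-sum(out))
    return out
```

Comparing these with central finite differences of `nb_probs` is a cheap check that the recursion has been coded correctly.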
If the different modules contained independent data, we could use $V(\boldsymbol{\pi})$ as given in Chapter II. Since, however, the module sets are not independent, we may estimate $V(\boldsymbol{\pi})$ by the standard method

$$\hat{V} = \frac{1}{N} \sum_{i=1}^{N} (\mathbf{X}_i - \bar{\mathbf{X}})(\mathbf{X}_i - \bar{\mathbf{X}})', \tag{4.31}$$

where $\mathbf{X}_i$ represents a vector containing "1" in the slot corresponding to a realized count of an individual in sample $i$ and zeroes elsewhere, and $N$ is the total number of samples. In the case of independent sets of multinomials, (4.31) gives the standard estimate for a block of the block diagonal covariance matrix.
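Estimate (4.31) is the sample covariance, with divisor $N$, of indicator vectors. A minimal NumPy sketch, assuming every species is truncated at the same $m$ (so each species contributes $m + 1$ slots) and that `counts[i, s]` holds the count of species $s$ in core $i$; both helper names are illustrative. For a single species the result reduces to $\mathrm{diag}(\bar{\mathbf{p}}) - \bar{\mathbf{p}}\bar{\mathbf{p}}'$, the familiar multinomial form, consistent with the remark about independent multinomials.

```python
import numpy as np


def indicator_vectors(counts, m):
    """Rows X_i of (4.31): one block of m + 1 indicator slots per species.

    counts[i, s] is the number of individuals of species s in sample (core) i;
    slots are 0, 1, ..., m - 1 and "m or more". Assumes a common m for all species.
    """
    counts = np.asarray(counts)
    n_samples, n_species = counts.shape
    X = np.zeros((n_samples, n_species * (m + 1)))
    rows = np.arange(n_samples)
    for s in range(n_species):
        cells = np.minimum(counts[:, s], m).astype(int)  # counts >= m go to the tail slot
        X[rows, s * (m + 1) + cells] = 1.0
    return X


def covariance_431(X):
    """(4.31): (1/N) sum_i (X_i - Xbar)(X_i - Xbar)'."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    return centered.T @ centered / X.shape[0]


# Hypothetical counts for 4 cores and 2 species, truncated at m = 3.
cores = [[0, 2], [1, 5], [4, 0], [1, 1]]
V_hat = covariance_431(indicator_vectors(cores, 3))
```

The covariance of the mean vector $\bar{\mathbf{X}}$ (the observed proportions) is this matrix divided by $N$.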
We discussed, in Chapter II, the use of the smoothed proportions obtained by taking the maximum likelihood estimates of the parameters as the true values of the modeled proportions. In a correlated multinomial model, one cannot do this unless the functional form of the correlations between multinomials is known.
4.4.3 An Example
To illustrate the use of the extension described in Section 4.4.2, we shall consider data based on an investigation by Henry Lee, Curriculum of Marine Sciences, University of North Carolina. This study involved taking core samples of sand on the beach at the mouth of the White Oak River. From each core, counts of individuals were made for three different species of benthic invertebrates: Gemma gemma, Scolopus, and Heteromastus. Counts of several other minor species were also recorded, but we will not make use of this information.
The experiment involved, among other things, determining the dispersion characteristics of the three species at different exposure levels of the tidal zone. The portion of the beach sampled may be divided into three exposure zones: zone 1, long exposure to air and sunlight; zone 2, moderate exposure to both water and sunlight; and zone 3, long exposure to water. Data resulting from the core samples of each zone are given in Table 4.12. Table 4.13 gives the parameters and goodness-of-fit statistic for the individual fits of each species-zone combination to the negative binomial model. These fits are by maximum likelihood estimation of $k$ and $p$ from equation (4.26).
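The module fits solve the likelihood equations (4.26); an equivalent route, sketched here, maximizes the log of (4.24) numerically. This assumes NumPy and SciPy are available, searches over $(\log k, \mathrm{logit}\, p)$ so that the problem is unconstrained, and drops the constant $\log(n_i!/\prod_j r_{ij}!)$. It is not necessarily the routine used in the original analysis; the function names and the starting point $k = 1$, $p = 0.5$ are arbitrary choices.

```python
import math

import numpy as np
from scipy.optimize import minimize


def nb_log_probs(k, p, m):
    """log pi_j, j < m, and log of the tail pi_m, as in (4.25)."""
    head = np.array([math.lgamma(j + k) - math.lgamma(j + 1) - math.lgamma(k)
                     + j * math.log(p) + k * math.log1p(-p) for j in range(m)])
    tail = 1.0 - np.exp(head).sum()
    return np.append(head, math.log(max(tail, 1e-300)))  # guard against log(0)


def fit_module(counts):
    """Maximize log (4.24) over k > 0, 0 < p < 1 for counts (r_0, ..., r_m).

    The constant log(n! / prod r_j!) is dropped; the search runs over
    (log k, logit p) so that Nelder-Mead is unconstrained.
    """
    counts = np.asarray(counts, dtype=float)
    m = len(counts) - 1

    def neg_loglik(theta):
        k = math.exp(theta[0])
        p = 1.0 / (1.0 + math.exp(-theta[1]))
        return -float(counts @ nb_log_probs(k, p, m))

    # Arbitrary starting point: k = 1, p = 0.5.
    res = minimize(neg_loglik, x0=np.zeros(2), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    return math.exp(res.x[0]), 1.0 / (1.0 + math.exp(-res.x[1]))


# Hypothetical frequency table: r_j cores with j individuals, last cell ">= 8".
k_hat, p_hat = fit_module([36, 29, 17, 9, 5, 2, 1, 0, 1])
```

A derivative-based solver of (4.26) would converge faster; Nelder-Mead is used here only because it needs nothing beyond the log-likelihood itself.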
For simplicity, let us double subscript the parameters, the first subscript representing the zone and the second the species ("1" corresponding to Gemma gemma, "2" to Scolopus, and "3" to Heteromastus). Because samples are independent across zones, the covariance matrix will be block diagonal, with each block representing the covariance of the three species within a zone. Because of the sampling procedure, this within-zone covariance will be estimated by (4.31) for each block. Once the covariance has been estimated, we use the methodology of Chapter II to construct Table 4.14.
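Because zones are sampled independently, the covariance of the nine $\hat{k}$'s is block diagonal, and each 2-D.F. hypothesis of the form $k_{13} = k_{23} = k_{33}$ is a Wald test of two equality contrasts. A minimal NumPy sketch, assuming the $\hat{k}$'s are ordered $(k_{11}, k_{12}, k_{13}, k_{21}, \ldots, k_{33})$ and that each zone's $3 \times 3$ covariance block for its $\hat{k}$'s has already been obtained; all function names are illustrative.

```python
import numpy as np


def block_diag(*blocks):
    """Block-diagonal matrix: zones are sampled independently."""
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    start = 0
    for b in blocks:
        n = b.shape[0]
        out[start:start + n, start:start + n] = b
        start += n
    return out


def equality_contrast(indices, n_params):
    """Rows k_a - k_b for consecutive listed indices: H0 says all listed k's are equal."""
    C = np.zeros((len(indices) - 1, n_params))
    for row, (a, b) in enumerate(zip(indices[:-1], indices[1:])):
        C[row, a], C[row, b] = 1.0, -1.0
    return C


def wald_chi2(est, cov, C):
    """(C est)' [C cov C']^-1 (C est); approximately chi-square with rank(C) D.F."""
    d = C @ np.asarray(est, dtype=float)
    return float(d @ np.linalg.solve(C @ cov @ C.T, d))


def k_index(zone, species):
    """Position of k_{zone,species} in (k_11, k_12, k_13, k_21, ..., k_33)."""
    return 3 * (zone - 1) + (species - 1)


# H0: k_13 = k_23 = k_33 (species 3 equally dispersed in all zones), 2 D.F.
C_species3 = equality_contrast([k_index(z, 3) for z in (1, 2, 3)], 9)
```

Within-zone hypotheses such as $k_{11} = k_{12} = k_{13}$ use `equality_contrast([k_index(1, s) for s in (1, 2, 3)], 9)` in the same way.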
From Table 4.14 one may determine a reduced model to fit, much as in the dilution experiment. The resulting parameter estimates and tests of equality of $k$ may give rise to some interesting questions. For example, what conditions in the life of Heteromastus cause it to be dispersed the same way in all zones? Secondly, since Scolopus seems to favor less water exposure and Gemma gemma favors more, what conditions cause "pockets" or clusters of individuals in less favored areas? Certainly not least in the set of possible questions is that concerning
TABLE 4.12
CORE SAMPLE COUNTS OF BENTHIC INVERTEBRATES IN THE THREE ZONES
Core
Gemma gemma
Scolopus
0
6
1
0
0
0
0
0
4
13
24
4
11
Zone 1
1
2
3
4
5
6
7
8
9
10
11
Zone 2
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
10
3
5
13
6
0
6
3
1
1
2
9
2
8
17
5
13
1
15
8
2
5
7
6
0
1
5
0
4
3
0
1
1
6
3
5
1
6
0
7
3
2
0
0
0
23
10
6
12
0
0
111
1
0
1
1
8
2
2
1
2
3
4
5
6
7
8
3
5
3
4
1
4
0
0
0
2
2
5
4
1
1
0
0
Zone 3
9
10
11
12
13
14
8
6
7
0
0
5
9
17
12
4
1
5
0
0
0
1
0
0
0
0
Heteromastus
0
1
2
3
0
0
0
0
1
2
3
3
2
6
1
3
0
2
5
10
3
25
5
7
3
11
3
4
6
3
8
7
6
11
6
5
7
1
8
5
6
6
0
1
7
3
1
0
1
8
0
3
5
4
0
TABLE 4.14
TESTS OF HYPOTHESES FOR COMPARISON OF k
Hypothesis             D.F.    χ²
k_11 = k_12 = k_13     2       103.6
k_21 = k_22 = k_23     2       .8
k_31 = k_32 = k_33     2       36.0
k_11 = k_21 = k_31     2       67.6
k_12 = k_22 = k_32     2       1.6
k_13 = k_23 = k_33     2       2.0
interaction and correlation of the observed species making life more (or less) favorable for others.
4.4.4 Discussion
In this problem, the degree or type of correlation existing between species was not known. Due mainly to the results of previous researchers, the marginal distribution of each species was suspected to follow the negative binomial model. Since this model appears to fit, the analysis was carried forward. Naturally, any assumed marginal model that explains a subset of the data can be used. The important point is that we can deal with complex situations in which a total likelihood model cannot be ventured but piecemeal models can.
CHAPTER V
DISCUSSION
The purpose of this thesis has been to illustrate how some observed processes may be more conveniently handled by a so-called two-stage, or module, procedure. In studies in which the complete likelihood function can be written down, one uses some criterion to determine at what level to employ a least squares procedure to simplify model investigation. In other studies, one is willing to assume only the form of the marginal distributions. Treating these marginal "likelihoods" as module models, one applies the two-stage procedure to combine the margins. Simple examples illustrate the use of the procedure in biological studies.
Naturally, more traditional models of processes may be considered. For example, consider an experiment in which the individuals are observed $T$ times. If we assume that the different states of response may be modeled as a Markov chain with (non-stationary) transition probabilities $\pi_{ij}(t, \boldsymbol{\theta})$, where $\boldsymbol{\theta}$ is some unknown vector of parameters, then the likelihood for the observed transitions $r_{ij}(t)$ is

$$L(\mathbf{r}, \boldsymbol{\theta}) = C_{\mathbf{r}} \prod_{t=1}^{T} \prod_{i=1}^{s} r_{i\cdot}(t-1)! \prod_{j=1}^{s} \frac{\pi_{ij}(t, \boldsymbol{\theta})^{r_{ij}(t)}}{r_{ij}(t)!}, \tag{5.1}$$

where $C_{\mathbf{r}}$ is a function of $\mathbf{r}$ independent of $\pi_{ij}$ or $\boldsymbol{\theta}$, $r_{i\cdot}(t-1) = \sum_{j} r_{ij}(t-1)$, $s$ is the number of states, and the $r_{i\cdot}(0)$ are assumed known and fixed. Maximization of (5.1) resembles the maximization of $T \cdot s$ multinomials, and the maximum likelihood estimates of $\boldsymbol{\theta}$ may be obtained in much the same way.
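Only the transition counts weighted by $\log \pi_{ij}(t, \boldsymbol{\theta})$ in (5.1) depend on $\boldsymbol{\theta}$, so for maximization the log-likelihood reduces to a kernel. A minimal NumPy sketch, assuming `r[t-1, i, j]` holds $r_{ij}(t)$ and that a user-supplied `trans_prob(t)` returns the $s \times s$ matrix $\pi(t, \boldsymbol{\theta})$; names are illustrative. In the special case of a stationary chain with otherwise unrestricted $\pi_{ij}$, maximizing the kernel row by row gives the pooled proportions $\hat{\pi}_{ij} = \sum_t r_{ij}(t) / \sum_t \sum_j r_{ij}(t)$.

```python
import numpy as np


def loglik_kernel(r, trans_prob):
    """Kernel of log (5.1): sum over t, i, j of r_ij(t) * log pi_ij(t, theta).

    r[t - 1, i, j] holds r_ij(t); trans_prob(t) returns the s x s matrix of
    pi_ij(t, theta). C_r and the factorials do not involve theta and are dropped.
    """
    r = np.asarray(r, dtype=float)
    total = 0.0
    for t in range(1, r.shape[0] + 1):
        P = np.asarray(trans_prob(t), dtype=float)
        used = r[t - 1] > 0  # cells with zero count contribute 0 (0 * log 0 := 0)
        total += float(np.sum(r[t - 1][used] * np.log(P[used])))
    return total


def stationary_mle(r):
    """MLE when pi_ij(t, theta) = pi_ij for all t, otherwise unrestricted.

    Pools transitions over t and normalizes each row; assumes every state
    is left at least once (no zero row totals).
    """
    pooled = np.asarray(r, dtype=float).sum(axis=0)
    return pooled / pooled.sum(axis=1, keepdims=True)


# Hypothetical two-state chain observed over T = 2 steps.
r = np.array([[[5, 1], [2, 4]],
              [[3, 3], [1, 6]]])
P_hat = stationary_mle(r)
best = loglik_kernel(r, lambda t: P_hat)
```

For a module made of a contiguous piece of the chain, the same kernel is simply summed over that piece's time points.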
If the chain is long (large $T$), or if the functions $\pi_{ij}(t, \boldsymbol{\theta})$ become very complicated, one may apply the two-stage procedure to this model also. The modules consist of contiguous pieces of the observed chain. Maximum likelihood estimation of the parameters of a piece of a chain should be simpler. When combining modules, however, procedures similar to those of the dispersion example of Chapter IV must be used to account for the dependence of modules.
Further research is needed to determine which module level is best to use. Explicitly, the results of Chapter III are given only for independent modules, yet two of the examples violate this assumption. Extension to the dependent module case is necessary. In addition, other criteria based on convergence rates may be investigated; for example, the work of Hoeffding [1965] could be considered in this light.
REFERENCES
Anderson, T. W. [1971]. The Statistical Analysis of Time Series. John Wiley and Sons, Inc., New York.
Anscombe, F. J. [1948]. The transformation of Poisson, binomial, and negative binomial data. Biometrika 35, 246-254.
Bateman, G. I. [1950]. The power of the χ² index of dispersion test when Neyman's contagious distribution is the alternate hypothesis. Biometrika 37, 59-63.
Beall, G. [1940]. The fit and significance of contagious distributions when applied to observations on larval insects. Ecology 21, 460-474.
Beall, Geoffrey and Rescia, Richard R. [1953]. A generalization of Neyman's contagious distributions. Biometrics 9, 354-386.
Berkson, Joseph [1968]. Application of minimum logit χ² estimate to a problem of Grizzle with a notation on the problem of no interaction. Biometrics 24, 75-95.
Bhapkar, V. P. [1966]. A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association 61, 228-235.
Bhapkar, V. P. and Koch, G. G. [1965]. On the hypothesis of "no interaction" in three dimensional contingency tables. University of North Carolina Institute of Statistics Mimeo Series No. 440, 23-28.
Bhapkar, V. P. and Koch, G. G. [1968a]. Hypotheses of 'no interaction' in multi-dimensional contingency tables. Technometrics 10, 107-123.
Bhapkar, V. P. and Koch, G. G. [1968b]. On the hypotheses of 'no interaction' in contingency tables. Biometrics 24, 567-594.
Billingsley, Patrick [1961]. Statistical Inference for Markov Processes. The University of Chicago Press, Chicago.
Bishop, Y. M. M. [1969]. Full contingency tables, logits, and split contingency tables. Biometrics 25, 383-399.
Bliss, C. I. and Fisher, R. A. [1953]. Fitting the negative binomial distribution to biological data: note on the efficient fitting of the negative binomial. Biometrics 9, 176-200.
Bliss, C. I. and Owen, A. R. G. [1958]. Negative binomial distributions with a common k. Biometrika 45, 37-58.
Cochran, W. G. [1973]. Experiments for nonlinear functions. Journal of the American Statistical Association 68, 771-787.
Cornell, R. G. and Speckman, J. A. [1967]. Estimation for a simple exponential model. Biometrics 23, 717-737.
Cox, D. R. and Lewis, P. A. W. [1966]. The Statistical Analysis of Series of Events. Methuen and Co. Ltd., London.
Elliott, J. M. [1971]. Statistical Analysis of Samples of Benthic Invertebrates. Freshwater Biological Association, Scientific Publication No. 25, The Ferry House, Ambleside, Westmoreland.
Epstein, B. [1967]. Bacterial extinction time as an extreme value phenomenon. Biometrics 23, 835-839.
Evans, D. A. [1953]. Experimental evidence concerning contagious distributions in ecology. Biometrika 40, 186-211.
Finney, D. J. [1964]. Statistical Methods in Biological Assay, 2nd Edn., Section 21.5. Hafner Publishing Company, New York.
Forthofer, R. N. and Koch, G. G. [1973]. An analysis for compounded functions of categorical data. Biometrics 29, 143-157.
Gillings, Dennis [1972]. Some statistical methods in health services research. Ph.D. Dissertation, Exeter University.
Gillings, Dennis [1974]. Some further results for bivariate generalizations of the Neyman type A distribution. To appear in Biometrics.
Goodman, L. A. [1964]. Simple methods for analyzing three-factor interaction in contingency tables. Journal of the American Statistical Association 59, 319-352.
Goodman, L. A. [1968]. The analysis of cross-classified data: independence, quasi-independence, and interactions in contingency tables with or without missing entries. Journal of the American Statistical Association 63, 1091-1131.
Goodman, L. A. [1970]. The multivariate analysis of qualitative data: interactions among multiple classifications. Journal of the American Statistical Association 65, 226-256.
Grizzle, J. E., Starmer, C. F. and Koch, G. G. [1969]. Analysis of categorical data by linear models. Biometrics 25, 489-504.
Harris, Eugene K. [1958]. On the probability of survival in sea water. Biometrics 14, 195-206.
Hinz, Paul and Gurland, John [1968]. A method of analyzing untransformed data from the negative binomial and other contagious distributions. Biometrika 55, 163-170.
Holgate, P. [1966]. Bivariate generalizations of Neyman's type A distribution. Biometrika 53, 241-245.
Hoeffding, Wassily [1965]. Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369-401.
Hunter, G. C. and Quenouille, M. H. [1952]. A statistical examination of the worm egg count sampling technique for sheep. Journal of Helminthology 26, 157-170.
Johnson, N. L. and Kotz, Samuel [1969]. Distributions in Statistics: Discrete Distributions. Houghton Mifflin Company, Boston.
Kaplan, Ellen B. and Elston, R. C. [1972]. A subroutine package for maximum likelihood estimation (MAXLIK). University of North Carolina Institute of Statistics Mimeo Series No. 823.
Kastenbaum, M. A. and Lamphiear, D. E. [1959]. Calculation of chi-square to test the no three-factor interaction hypothesis. Biometrics 15, 107-115.
Katti, S. K. and Gurland, John [1962]. Efficiency of certain methods of estimation for the negative binomial and the Neyman type A distributions. Biometrika 49, 215-226.
Martin, D. C. and Katti, S. K. [1965]. Fitting of certain contagious distributions to some available data by the maximum likelihood method. Biometrics 21, 34-48.
Mather, K. [1949]. The analysis of extinction time data in bioassay. Biometrics 5, 127-143.
McCrady, M. H. [1915]. The numerical interpretation of fermentation-tube results. Journal of Infectious Diseases 17, 183-212.
Mitra, Sujit Kumar [1958]. On the limiting power functions of the frequency chi-square test. Annals of Mathematical Statistics 29, 1221-1233.
Neyman, J. [1939]. On a new class of 'contagious' distributions, applicable in entomology and bacteriology. Annals of Mathematical Statistics 10, 35-57.
Neyman, J. [1949]. Contributions to the theory of the χ² test. Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 239-273.
Pahl, P. J. [1969]. On testing the goodness-of-fit of the negative binomial distribution when expectations are small. Biometrics 25, 143-151.
Patil, G. P., Ed. [1970]. Random Counts in Models and Structures. The Pennsylvania State University Press, University Park, Pennsylvania.
Peto, S. [1953]. A dose response equation for the invasion of microorganisms. Biometrics 9, 320-335.
Pielou, E. C. [1969]. An Introduction to Mathematical Ecology. John Wiley and Sons, Inc., New York.
Plackett, R. L. [1962]. A note on interactions in contingency tables. Journal of the Royal Statistical Society B 24, 162-166.
Rao, C. R. [1960]. A study of large sample test criteria through properties of efficient estimates. Sankhya 23, 25-40.
Rao, C. R. [1961]. Asymptotic efficiency and limiting information. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, 531-546.
Rao, C. R. [1962]. Efficient estimators and optimum inference procedures in large samples. Journal of the Royal Statistical Society B 24, 46-63.
Robertson, C. A. [1972]. On minimum discrepancy estimators. Sankhya Series A 34, 133-144.
Schiemann, D. A. [1972]. Viability maintenance in Leptospira autumnalis Akiyama A. Ph.D. Dissertation, Department of Environmental Sciences and Engineering, University of North Carolina.
Selby, B. [1965]. The index of dispersion as a test statistic. Biometrika 52, 627-629.
Shenton, L. R. and Bowman, K. [1963]. Higher moments of a maximum likelihood estimate. Journal of the Royal Statistical Society B 25, 305-317.
Skellam, J. G. [1952]. Studies in statistical ecology. Biometrika 39, 346-362.
Thomas, Marjorie [1951]. Some tests for randomness in plant populations. Biometrika 38, 102-111.
Wald, A. [1943]. Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54, 426-482.
Williams, C. B. [1964]. Patterns in the Balance of Nature and Related Problems in Quantitative Ecology. Academic Press, New York.
Worcester, Jane [1954]. How many organisms? Biometrics 10, 227-234.