Grizzle, J.E.; (1960)Application of the logistic model to analyzing categorical data."

APPLICATION OF THE LOGISTIC MODEL
TO ANALYZING CATEGORICAL DATA
by
James E. Grizzle
Department of Biostatistics
School of Public Health
University of North Carolina.
This research was supported by Cancer· Chemotherapy National Service Center of the National
Institutes of Health under Contract No.
SA-43-PH-1932.
Technical Report No. II
Contrac~ No. SA-43-PH-1932
Institute of Statistics Mimeograph Series. No. 251
April 1960
iii
ACKNOWLEDGMENTS
I am indebted to the members of my committee, H. L. Lucas,R. J •
•. l'.
Monroe, D. B. Duncan, J. E. Legates and E. R. Barrick for their assistance in preparation of this dissertation.
I am especially grate:f'ul to
Dr. Duncan who initiated this 'Work and to Dr. Lucas for .supervision and
help:f'ul suggestions during its preparation.
To the Department of Biostatistics of the University of North Care-
lina and National Institutes of Health Cancer Chemotherapy Service Center I am tha.nk:t'ul for financial support provided by Contract No. SA;.43PH-1932.
I wish to e:x;press my appreciation to B. G. Greenberg for his
e~couragement
.e
-
..
.
and for the opportunity of 'Working with him •
For her understanding and constant encouragement I am particularly
grs.te:f'ul to my wife, Barbara•
e
iv
TABLE OF CONTENTS
,
Page
w
LIST OF TABLES
·
LIST OF FIGURES
vi
•
•
•
•
•
•
•
THE PROBLEM AND REVIEW' OF THE LITERATURE
The Problem • •
•
Review of the Literature •
Analogues to Correlation
Analogues to Regression
SummariZing Remarks
• •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Conditions on Probability Density and HYPotheses •
Tests of the Fit of the Model
•
'....
Tests on the Parameters' in the Model. • •
•
Remarks
d
ll.
• • • • • • • • •
•
•
•
BASIC THEORY
.e
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
THEORY FOR THE LOGISTIC MODEL •
•
•
•
•
•
•
•
•
•
'. .'
•
•
•
•
•
•
•
•
•
--
'oJ
1
•
1
2
2
•
3
8
•
II
•
•
•
12
15
16
•
17
II
•
•
•
•
•
•
The Model. • • •
•••
••••
•••
Conventional Methods of Testing Hypotheses. • • • • •
A New'Approach to Testing HYPotheses for the Logistic Model
Choice of Root to Complete the Estimation • • • • • •
Remarks.
••••
••••
'• • •
COMPUTATIONS FOR THE NEW' METHOD AND EXAMPLES •
•
•
•
•
vi
•
•
•
•
17
19
20
23
25
•
•
•
27
Choice of Computational Procedure..
•••••
Computations for the One d.f. Test.
••••
Computation for the Two d.f. Test and General Case
•
I.n:\Portant Special case Which does not Require Matrix
Inversion • • • • • • ..~ • • • • • • • •
Shorter MethOds for Computing X"'" and Fit~ing the Model •
•
•
27
27
29
•
•
•
•
•
•
•
•
30
34
NON-CENTRALITY PARAMETERS FOR OBTAINING THE ASYMPTOTIC POWER OF
TESTS OF HYPOTHESES FOR THE LOGISTIC MODEL. • • •
.'.
46
The Utility of Non-Centrality Parameters
• • ••
•
Non-Centrality Parameters for Tests of the Fit of the Model
Non-Centrality Parameters for Power of Tests of Parameters
in the Model • • •
•
•
.'••
•
•
50
•
•
e
v
TABLE OF CONTENTS (continued)
Page
STUDIES ON THE SMALL SAMPLE PROPERITES OF THE TESTS •
Choice of Sampling Populations • • •
Studies-on the Exact Distribution of
Methods • • • • • • • • • •
Results • •
• • • • •
Random Sampling Investigations
•
Methods • • • •
•
• •
Results • • • •
• •
•
Alternative Approach for Small Samples
xi
LIST OF REFERENCES •
'.
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
55
•
•
•
•
•
•
•
•
•
•
•
•
55
56"
56
57
•
•
61
61
66
•
•
•
•
•
•
70
•
•
•
74
•
vi
LIST OF TABLES
Page
1.0.
4.0.
The Incidence of Leukoplakia among Cottonmill ~loyees
Who Use Unburnt Tobacco, Smoke Only, or do Neither. •
Linear Restraints for Testings Hypotheses for Data in
Table 1.0
•••
•••
•
4.1.
Data on Number of Mothers with Previous Infant Losses
6.0.
P Configurations Used in Sampling
6.1.
"Distribution of
6.2.
Distribution of
6.3.
Observed and
6.4.
Parameters Tested, Sample Sizes and Number·of Times Each
~eriment Was Repeated
••
•
• •
.e
•••
xi for n = 1 in Each Cell. •
xi for n = in Each Cell •
Cell Probabilities
•
•
112
•
• 56
58
•
2"
~ected
• 10
59
•
62
•
•
62
•
65
Probability of a Type I Error and Power Observed· for an
.05 Test
••
••
• • • •
•
6.6.
Probability of a Type I Error and Power Observed for an
• •
• •
.01 Test
Mean and Variance of
xi
when H Is True •
o
•
•
66
•
LIST OF FIGURES
6.0.
Observed and
6.1.
95% Confidence Region for ex and 13.
E~ected
Power •
•
•
•
•
•
•
•
69
•
•
71
CHAPTER I
THE PROBLEM AND REVIEW OF THE LITERATURE
1.0 The Problem
In some research areas an experiment may consist of taking samples
from several binomial distributions" each with parameters
and ni "
is the probability of a success in the i-th distribution and
where Pi
ni
is the sample size.
Pi
The eJq>erimental situation envisioned is anal-
ogous to the conventional eJq>erimental designs for which data are
analyzed by the analysis of variance" except that here the data in each
cell are obtained by sampling from a binomial distribution.
A more
general case would be the same ty.pe of eJq>eriment with a multinomial
distribution in each cell.
However" this will not be considered here.
Usually the experimenter will want to test hY.Potheses about the
Pi themselves or about parameters in a mathematical model.
For exam-
ple, in the bioassay of a toxic substance a commonly assumed model is
probit (or logit) Pi
where a
and
the log dose.
~
=a
+ ~xi
,
i = l" ••• "r
,
are unknown parameters to be estimated" and Xi
is
The investigator may want to test
or if two lines are being tested for parallelism" the hY.Pothesis can
be written
where
~l
and
~2
are the slopes of the respective lines.
2
In e:>q;>eriments more complex than the example just given, the treat-
ments may be applied at several levels in several combinations as in,
for exan:q>le, a factorial
e~eriment.
Historically, data of this type have been analyzed by testing hypotheses of independence between row and column ways of classification.
In many respects these tests are analogous to those tested in correla-
tion models.
In recent years the theory underlying this approach has
been greatly extended and methods of analysis that are analogous to regression have been developed.
It is the objective of this thesis to
develop further the regression approach for a particular model.
Before
proceeding to do this, the literature relevant to both approaches will
be reviewed briefly.
1.1 Review of the Literature
l.l.a Analogues to Correlation.
Lancaster and Irwin (1949) show-
ed how to make a separation of the total chi square into single degrees
of' freedom by dividing an r x s
table into
(r-l)( s-l)
2 x 2 tables.
These component chi squares do not add to the total as does the orthogonal partitioning of' Lancaster (1951).
In Lancaster's
(1951) orthogonal partitioning, the total chi square
is divided into several cOIqponents, each cOIqponent being supposed to
test some hypothesis.
After examining Lancaster's (1951) test of inter-
action in detail, Mitra (1955) and Corsten (1957) concluded that it requires main eff'ects to be nonexistent f'or its validity.
e
Ea~lier,
Bartlett (1935), f'ollowing a suggestion from Fisher, had
.~
given an unusual test of interaction that uses the root of' a polynomial
e
3
equation in the test statistic.
Mitra and Corsten concluded that this
test does not suffer from the defect of Lancaster's.
Norton
(1945)
extended Bartlett's test and outlined a computational method for obtaining the proper roots of the equations involved.
Roy and Kastenbaum
(1955), (1956) extended Bartlett's and Norton's results and supplied
a more logical basis for their derivations.
1.1.b
Analogues to Regression.
The regression approaches differ
among themselves in two important respects, the model and the method
of estimation.
Usually it is assumed that treatment effects are linear-
ly related to the response when the response is measured on an appropriate scale.
Choosing the correct scale is a critical issue in this
approach.
The common methods of estimation are maximum likelihood (ML) and
best asymptotic normal (BAN), (see Neyman ["1949J and Bhapkar L195'i!).
Both methods have the same asymptotic properties, but in general BAN
estimates are easier to obtain.
The small-sample properties of esti-
mates and the resulting test statistics obtained by the two methods
have not been extensively investigated.
Berkson
(1955) conducted a
sampling investigation comparing the two methods of estimation for the
Pi
parameters in the model lo~ Q ::: a + I3xi ' and concluded that BAN esi
timates have a smaller mean square error than do ML estimates.
The first approach made to testing linear hypotheses was to make
a variance stabilizing transformation.
Perhaps the oldest and most
Widely used transformation is the arCsin or the arccos.
formation
y:::
att'ccos(l-2.P)
, where
p
The trans-
is the observed proportion,
4
was used by Fisher (1922) in analyzing data from genetics experiments.
Bartlett, Yates, and Bliss used the equivalent transformation
arcsin~
s~ling
y
=
in analyzing a multifactor experiment with data arising by
from several binomial distributions.
This transformation not
only stabilizes the variance for adequate size n, but also, for a large
class of data, provides a scale of measurement on which the treatment
effects are linearly related to the transformed variable, except at
extreme values of
siderably
P
si~lified
•
The problem of testing hypotheses is thus con-
by permitting straightforward application of the
analysis of variance and related techniques.
Eisenhart (1947) gi?es an excellent examination of the arcsin
_e
transformation.
He concludes,
" ••• that while in large samp.:j.es the variance of a transformed proportion in independent of the true proportion
for most practical purposes, the variance of the transformed value, y = 2 arcsinvp" , is still proportional
to lin where n is the number of observations upon which
the observed proportion is based. Consequently, ifPl'
P2' P3'.·. are proportions based on different numbers
of observations ~, n 2 , n , ••• , and if the n's differ
3
widely, then the variance of the corresponding angular
values may be so unequal, because of variation in the n's
that the advantages of the transformation are lost almost entirely•••• II
Eisenhart's investigation shows that for
variance of y
still depends markedly on P
n
as large as 10 the
Therefore, the arcsin
transformation cannot be used validly in an analysis of variance unless
the sample sizes are large and equal for each treatment combination.
e
.~
In his book, Probit Analysis, Finney (1952) gives a method of ana-
lyzing factorial experiments using the probit transformation.
The
main reason for using the probit transformation for data of the type
5
being discussed is to provide a scale on which the treatment effects
are additive • A long history of use in bioassay indicates that the
probit transformation is usually successful in
jective.
acco~lishing this
ob-
The methods given by Finney present no new theoretical pro-
blems, but the
co~utations
for obtaining maximum likelihood estimates
are lengthy.
Dyke and Patterson (1952) assumed that the treatment effects were
P
linearly related to
loge I-P
, or logit P as this transformation is
called in bioassay.
They obtained maximum likelihood estimates of
contrasts and their variances an iterative technique that is very similar to a weighted multiple regression analysis.
A numerical example of
the estimation of main effects, some interactions and their variances
in a 2 4 factorial arrangement is given in their paper.
Both the logis-
tic and probit models lead to test statistics that are asymptotically
distributed as chi square when the null hy:pothesis is true.
Reiers¢l (1954) used the theory of BAN estimation to test hypotheses when the treatment effects are assumed to be linearly related on
the
Pi
or untransformed scale.
The hypotheses for which Reiers¢l
developed tests are parallel to those tested in the analysis of variance, and the test statistics are asymptotically distributed as chi
square when the null hypothesis is true.
Mitra (1955) extended.
Reiers¢l's work to the multinomial distribution.
Bhapkar (1959) ex\:;end-
ed this w0rk still further to a variety of experimental situations,
including the case where the responses have a natural ranldng.
Corsten
(1957) applied a model equivalent to Reiers¢l's to a certain problem
and obtained maximum likelihood estimates of the cell probabilities
6
subject to the null hypothesis being true.
Mitra (1955) assumed that the investigator is interested in studyn
ing variation on the ratio l~P and developed an analysis for 2
factorial eJq)eriments.
For convenience he eJq)ressed hypotheses as,
functions of log l~P and then obtained maximum likelihood estimates
of the Pi
subject to the null hypothesis being true.
have the same form as Bartlett's test for interaction.
His tests all
Previously
Dyke' and Patterson had observed that Bartlett's test could be derived
by assuming that treatment e:f:f'ects are additive on the logit scale" but
they did not demonstrate the procedure.
Mitra and Reiers¢l introduced a new way of obtaining the test
statistic.
To test
Ho :
$1 - $2 = Co
"where $1
meters in the model" they construct functions
fl(P)
and
$2
are para-
and f (P)
2
such
that if Ho is true" f l (p) - f 2 (P) = Co
Estimates of the Pi are
then obtained subject to conditions implied by these fUnctions. In
this way hypotheses are tested about parameters Without estimating them.
In some cases this procedure is more flexible, than conventional methods.
In practice" the methods reviewed in this section and the previous
one have encountered difficulties not implicit in their theoretical development.
Often in the correlation approach" little or no attention
has been paid to distinguishing between ways of classification that are
random and those that are controlled by the eJq)erimenter in constructing the hypotheses to test" and little or no attention has been given
to alternatives to the null hypothesis.
e
.
In the regression approach some
researchers have not stipulated the model underlying the proposed test
statistics •
7
As a case in point, Corsten examined Bartlett's test as applied to
the data in :Bartlett's paper.
He concludes that, since one of the mar-
ginal totals is obviously fixed by the design of the
test statistic is incorrect.
e~eriment,
the
As a result he proposes a new test.
of the tests are correct, but for different models.
Both
The test Bartlett
gives is proper when treatment effects are additive on the logit scale
regardless of whether the grand total or one set of marginals is fixed
and the alternative one proposed by Corsten is correct when the treatment effects are additive on the P or untransformed scale.
Roy and his co-workers, Kastenbaum (1955), (1956), Mitra (1955),
Diamond (1958) and Bhapkar (1959), proceeding along the lines of Cram€r
(1945), ob'tained general results for large samples that encOll1J?ass both
_e
the approaches given above.
Their approach is characterized by (1)
abandoning conditional probabilities in deriving the tests, (2) care:f'ul
differentiation between variates, i.e., success and failure, and fac:.
tors, i.e.j ways of classification that are controlled by the
menter, and (3) specification of the model.
e~eri­
They point out that the
distinction made in (2) may not alter the test statistics in some cases,
but it must be made in studying the power of the tests and in setting
.) up meaning:f'ul hypotheses to test.
To ell1J?hasize this essential concept, .
they divide their results into two general classes, those for single
multinomial samples and those for product multinomial samples. A single
multinomial sample is one divided into
rs
categories where the proba-
bility of falling into any category is
Pij
ZP
= 1. A proij ij
samples are classified into
duct multinomial sample is one in which
s
r
with
categories, and the probability of falling into a category is
8
s
with
E
j=l
=1
P..
J.J
Aitchison and Silvey (1958), and Silvey (1959), using much the same
aJ?proach as Roy and his students, obtained many of the same results for
more general probability density functions.
1.2 SUmmarizing Remarks
It is doubtful that a model exists which is correct for all binomial
e~eriments
with multiway classifications.
However, it should be
possible to ascertain models that can be used for broad classes of experiments.
P
A large volume of bioassay data has shown that both logit
and probit
P
are effective in transforming the observed proportions
to a scale on which the treatment
data.
effects are additive for biological
There has been considerable discussion about the relative merits
of the probit and logit transformations, but we do not Wish to enter
into this controversy here.
Suffice it to say that no important practi-
cal differences in the two models have been demonstrated, see Finney
(1952).
In the sim;pler binomial
e~eriments,
it is found almost invariably
that the relationship between a stimulus and a response is described
by a sigmoid curve.
For this reason the validity of tests made by as-
surning a sim;ple linear model, especially when the stimulus is applied
at several levels, seems questionable.
However, if we measure the re-
sponse ona scale which linearizes the relationship between the stimulus
and treatment effects, the problem will be simplified considerably.
The arcsin" logit and probit transfoJ;'Jllations have been used to accomplish this.
9
The arcsin transformation will not be considered because it may
not provide a linear relationship over the entire range of P, and, if
it is desired to obtain maximum likelihood estimates of parameters, it
does not appreciably shorten the computations.
This transformation does
provide, however, an adequate approximate analysis for many purposes
when the sample sizes are equal or, with suitable weightang, when the
s~le
sizes are unequal.
To present the problem more clearly, the data in Table 1.0 , for
which I am indebted to Drs. Peacock, Brawley and Greenberg (1959), are
given as an example of a type frequently encountered in some research
areas, for example medicine, where conventional "methods do not test the
most meaningful hypotheses.
We will propose a new approach which com-
bines the Dyke-Patterson (1. e., logistic) model with the methods of obtaining estimates of the
P.
work of Mitra and Reiersp1.
J.
subject to restraints suggested by the
Within this framework some new methods will
be developed for testing hypotheses, and the non-centrality parameters
for these tests will be obtained.
Using these results the asymptotic
power of these tests against specified alternatives can be computed if
values can be assumed for the true parameter points.
As a by-product, we Will obtain shorter, more flexible methods of
testing hypotheses for the logistic model and also a shorter method of
fitting the model to data.
Lastly, the sma.ll sample properties of the
tests Will be examined with respect to power and agreement of the distribution of the test statistic with the chi square distribution.
10
Table 1\.0.
The Incidence of Leukoplakia among Cottonmill
EmPloyees Who Use Unburnt Tobacco, Smoke Only,
or Do Neither
Erwin Mills (Durham Branch)
1958
White Males
White Females
Affected Unaffected Total Affected Unaffected Total
.-
Age 25-34
Unburnt Tobacco
smokers Only
Neither
4
4
1
16
89
33
20
93
34
4
1
1
5
54
9
55
41
Age 35-44
Unburnt Tobacco
smokers Only
Neither
13
17
5
28
78
25
41
95
30
11
5
7
34
55
108
60
Age 45-54
Unburnt Tobacco
Smokers Only
Neither
26
23
4
47
66
27
73
2l
89
35
32
90
56
31
8
3
Age 55-64
Unburnt Tobacco
Smokers Only
Neither
26
9
6
47
73
49
27
21
3
5
35
1
31
56
4
36
40
2l
40
45
115
40
93
11
CHAPTER II
BASIC THEORY
In this chapter some of the basic results of Roy and his students
will be presented.
The notation and sty;le will follow closely that of
Diamond (1958) •
All of the theorems Diamond proves require that both
the probability density and the mathematical model satisfy certain analytic conditions.
We will now give these conditions and' follow with a
statement of the theorems.
2.0
Conditions on Probability Density and HYPotheses
Suppose we are given rs
••• , r
0t
and j
= 1,
••• , s"
of
functions
t
e'
= (°1 "
E POj(O)
~
° 1
J=
-
=1
for i
2
(b)
o < c < Pij (Q.) < 1
(c)
Every
Pi/~)
(d)
P12 " ••• j Pls ;
= 1"
••• " t
the
Pij (~) ,
= 1"
••• " r
for all i and j
has continuous first and second order deriva-
tives with respect to the
Note that the
k
i-.L
••• " 0t)" satisfy the following conditions:
s
(a)
Ok for
i = 1"
°1 " ••• "
such that" for all points of a non degenerate open interval
where
e
where
< rs-r unknown parameters
. in the t-dimensional space of the
.
Pij ( 91 " .... " 0t)
Ok
•
of rank t.
Pij
are to be arranged lexicographically (1.e." P "
ll
P21 " P22 " ••• " P2s ;
••• ;
Prl " Pr2 " ••• , Prs)
and the
.matrix (OFcd.(~))
i
.
12
has
rs
rows and t
columns.
k
Suppose we are given u
such that for every point of
<t
i-1
f l (~), ... ,fu(~)'
ek
functions of the
the following conditions are satis-
fied:
(e)
EverYfm(~)
has continuous first and second order deriva-
ek
tives with respect to
(1')
The matrix
:i s of rank
And let ~, where
•
(araO:~») where
u
m = 1,
000,
8Zld k
U
= 1, u,
0
t
•
~ = (e~, ••• , e~),
be a point of
i-1 .
Assuming these analytic requirements are f'ulfilled, Diamond proves
.e
the following.
2.1 Tests of the Fit of the Model
Consider the hypothesis
for i = 1, •.• , rand
H'
o'
j = 1, ••• ,
subject to
f
(e) = 0
m-
,
m = 1, ••• , u
and the aJ. ternative
subject to
f (e) = 0
m-
where
,
<t
,
s
13
s
E
j=l d ij
=0
but not all
dij
=0
,
and where
Then the equations
(2.1.1)
r
s
E
E
i=l
j=l
o,
k
= 1,
••• , t
m = 1, ••• , u
for min1m1zing chi square in the modified sense, which for this case is
the same as maximum likelihood (see Cramer
1:1 945J) ,
subject to
H ,
o
have exactly one system of solutions
such that
~ ~eo in probability as n-.oo
subject to the ratios
n.
-n
~.
being held fixed and the statistic
r
s
E
E
i=l j=l
is in the limit, distributed under
H
With rs-r-t+u d.f.
as a non-central chi 3quare variate
and under
~
o
as a central chi square variate
With a non-centrality parameter
6* = _d' Lr-r'. - B*"*
(B'B* )-lB'J
d
*' _
14
where
,
(2.1.4)
i
... , r
= 1,
.
-
being
Il
P..
p? (e)
~J
ek ; (OP!j(~»)
d6 " 0
ent
-
k"
.
0
P .. ( e) are evaluated at
~J
-
t-u
e~ressed
as
(2.1.6)
denotes the
ek
B2
where
(2.l.8)
ek
= 1,
••• , u
made dependent under
rsxt-u
denotes any
k'
= [.
Ho
'
~ ff- .(~OP~~'----:!!.»)
v P": .
= u+l,
]
0
.
~J
k"
independ-
indicate that the derivative and
With
where
,
0
e0 •
Alternatively B* may be
••• , s
]
e~ressed in terms of the
and Pij (~)
k
= 1,
j
~J
~J
j
=[,..;;;0
2- ~ r;: (OPr (~»)
V
de
~*
rsxt-u
With P~ . (e)
and
••• , t
remaining independent under
u~ -[(Of~~~)J
H
o
,
15
and
F2
uxt-u
=[(~:») ]
0
An important special case occurs where there are no restrictions
on the parameters of the model, i.e.,
=0
set u
dom are
. in B so that
2
= 1,
k"
u
••• , t
= o.
Then B*
= B2
' if we
, and the degrees of free-
rs-s-t
2.2
Tests on the Parameters in the Model
= PoJj .(9)
-
Suppose we are given Pio
J
••• , s
i
= 1,
••• , r
and
J = 1,
Consider the hypothesis
f (9)
m-
,
=0
m = 1, ••• , u <
,
t
and the alternative
~:
where not all the
c
fitting the model
Po
=0
m
0
(
9)
J.J -
Let
Jf-
denote the statistic obtained by
without restraints, and
~ and .~
-
-
be the
unique consistent solution of the equations given by (2.1.1), and
X;
be the corresponding test statistic defined by (2.1.2), then
is, in the limit, independent of
under
H
o
..J
in probability and is distributed
as a central chi square variate with u d. f. and under
~
as a non-central chi square variate with u d. f. and a non-centrality
parameter,
16
where
c'
and B*, Blj B2 and Fl
= (cl ,
••• , cu )
are defined by (2.1.5), (2.1.6), (2.1.7) and
(2.1.8) •
2.3 Remarks on
~.l
and 2.2
Because of their generality, the results given in sections 2.1
and 2.2 constitute a powerful set of tools for the solution of many
problems in the statistical analysis of categorical data.
Both the mo-
del and the hypotheses are com;pletely unspecified except for certain
analytic conditions that must be satisfied.
Thus. the e;x;perimenter or
statistician, as the case may be, has Wide latitude in selecting arealistic model and in formulating meaningful hypotheses to test.
The results presented in section 2.1 give the theory necessary to
test the goodness of fit of a proposed model and also give the noncentrality parameter of the test.
To
test hypotheses about parameters
in a model and to obtain the non-centrality parameters of the tests,
section 2.2 should be applied.
17
CHAPTER III
THEORY FOR THE LOGISTIC MODEL
3.0 The Model
"The product multinomial sample 11., as Roy, Mitra and Diamond have
referred to it, will be assumed to be the probability sampling model.
However, as noted in Chapter I, we shall be concerned only with the
product binomial sample.
a
Hence, the number of events "R" denoted by
and "not R" denoted by b
i
i
' will be the variates and a
is fixed for each cell by the design of the eJg)eriment.
+ bi
i
= ni
Diamond IS no-
tation can be simplified since we are concerned only with the binomial
distribution.
= Q.
=2
In this special cases
, and no double subscript is needed.
~
will be from
i
= 1,
••• , r
, therefore. P
i2
=1
- P
il
Herea.iter all summations
unless otherwise noted.
The relationship assumed to hold between treatments and
[. = loge
i
~
= 1,
Pi
is
••• , r
or written more conu;>actly,
L = X f)
ril
rxt til
where
X
is a matrix of real numbers of rank
t
determined by the de-
sign of the eJg)eriment.
Equation (3.0.1) implies
(3.0.2)
Pi <.~)
1
= ----tr----1 + eJg)(-
E x.kf)k)
k=l
~
which is the most convenient form for applying the results given in
Chapter II.
18
First we must verify that the logistic model given by (3.0.1)
satisfies the analytic requirements set forth in section 2.0.
,
by definition
(b)
2
0 < c ~ Pi (~)
<i
for all real values of xik and
ek by inspection of (3.0.2) ,
dP.(e)
(c)
J. -
=
de k
o2p . (e)
J. -
for kand k'
dekde kl
, and these are continuous •
••• , t
(d)
oP (e)
i
. de-
The matrix
k
D
X
rxr
rxt
Pi (~)Qi(~)
,
=!Ji (~)Qi (~)XikJ
can be w.ritten
where D is a diagonal matrix of rank r
with
on the diagonal and zeros elsewhere, and X is
non-singular and of rank t
DX
= 1,
by hypothesis.
Since t < r
,
is. of rank t •
Suppose we wish to test linear hypotheses,
F
wet
e =0
where
F is of rank u < t
,
tXl
about parameters of the logit model using the results in sections 2.1 or 2.2.
Taking matrix derivatives,
o(F~)
de
= F
~
0
1
and
e
o2(F~)
o2e
=
.
19
these are constants and hence continuous.
oFe
(f)
The rank of de
= F
is
u< t
by assumption.
Therefore; we conclude that sections 2.1 and 2.2 can be applied
to the estimation of parameters and testing linear hypotheses about
parameters of the 199it model.
3.1 Conventional Methods of Testing HYPotheses
Cramer I S modified chi square minimum method of estimation is identical to maximum likelihood (ML) estimation for the case being considered.
The folloWing method of obtaining ML estimates of .the parameters
of the logistic model is similar to the method given by Dyke and
Patterson.
This method, though relatively laborious ca:il be used to ob-
tain the proper test statistic for the tests given in sections 2.1 and
2.2.
Some alternative methods, which in many cases Will be shorter,
for testing the same hypotheses Will be presented later.
TO obtain estimates of the parameters in the model given by (3.0.1)
we observe that the likelihood for a product binomial
s~le
r
IT
is written
•
i=l
Then
and the maximizing equations are
k
= 1,
••• , t •
20
These equations cannot be solved
mate of
e
e~licitly
for
A
e
,the esti-
They are usually solved by the Newton-Raphson method.
•
That is, (3.1.2) is e~anded in a Taylor series about a trial value
!!..• . , non-linear terms are neglected, and the resulting equations are
solved by iteration.
To test the hypothesis
,
i
= 1,
••• , r
,
we use the fact that
,
where
Pi (~)
is
Pi (~)
~
evaluated at
ted as a central chi square variate with
Ho is true.
Given P.
J.
= Pi(e)
-
theses about any SUb-set of the
,is asymptotically distribur-t
degrees of freedom if
, this method can be used to test hypoets
•
The details are much the same
as in normal regression theory except that the regression equation must
be fitted by iteration.
Berkson (1.957) has published a table which is
help:f'u1 in the computations.
3.2 A New Approach to Testing HYPotheses for the Logistic Mode1.
The methods derived here will apply to designs for which linear
functions of
Pi
log ~
= Ii
can be easily constructed such that
i
where the
the
f
i
are real numbers dictated by the model and the design of
e~eriment.
TO test
21
or
subject to
.
we obtain maximum likelihood estimates of Pi (Pi <.~) 'Will henceforth
be written Pi
for convenience.), subject to the restraint
The log likelihood 'With one restraint is
where
A
is a Lagrangian multiplier.
The rn.axi.mizing equations are
giving
ai-~A
n
i
A
=
Pi
,
which still have the Lagrangian multiplier,
ted.
The restraint
(3.2.2)
= 1,
i
A
A
E1 10g AQ
= hk
'
i
or alternatively,
f}
n( Ai)
Q
f1k
i
SUbstituting for
f}i
' we have
lL
= e -k
= gk
,
, that must be elimina-
implies
Pi
... , r
•
22
or
,p*,.. )f~J.
a i - .&.illo
II ( b + f"!fX
:::
i
i
gk
Either of_these equations may be solved for
•
~
•
The extension to the general case is easily seen from the 2 d. f.
test.
To test
H.
o'
or
..
. . ,~;
Ho :
°1 = hl
°2
= h2
Pi
= P.J(0)
.-
,
subject to
the two restraints are
Efrlli = .~
Efr/i = h 2
•
The log likelihood with two restraints is
and
=
which gives
ai - frl~l - f!2A. 2
n
To . eliminate
A.l
i
and A.2 from (3.2.7) either
•
0
23
or
must be solved simultaneously.
3.3 Choice of Root to Complete the Estimation
The question of which root to use now arises.
factorial
Mitra (1955) proved
that for a
2x2
po1ynom~al,
the smallest real root is the proper one to use.
e~eriment,
which generates a third degree
Lancaster
(1951) said that Bartlett's equation had only one real root in the
aSYm.Ptotic case.
prove this.
However, Mitra constructs a counter example to dis-
For the data used in Bartlett t s paper (1935), the equation
has only. one real root.
For larger e:x;periments which give rise to high-
er degree polynomials, Mitra recommended taking the root that minimized
-i?-
as being a safe procedure.
In general the root which maximizes the likelihood in the numeri-
cal sense should be used.
This may be difficult to determine for large
e:x;periments which generate high degree polynomials.
It seems reasonable
to restrict ourselves to solutions of (3.2.5) which yield positive estimates of Pi
and Q
i
when substituted into ~ 3.2.4) •
The folloWing con-
siderations show that (3.2.5) has at most one root satisfying this
24
criterion.
Using the log form of (3.2.5) we have
U
= E~'LIog(ai
~
- ~~)
- 10g(b~ + ~~)-7
~
~
o
-h=
0
.
,
and
1\
1\
and Q are to be positive for i = 1, ••• , r , then,
i
o
dU mus t b e negato~ve, ~.e.,
°
f or any·accept abl e so1ut ~on,
~
U is monoIf Pi
tone decreasing.
Therefore, (3.2.5) can have at most one root satisfy-
ing the criterion
w~
root
~
*
have set u;p for
acc~ptability•
Let us call this
, then
where
and
zi
X; - r-
=
are distributed as central chi squares with 1 d. f.
zi
X;
if H is true.
and
should be used depending on whether the
o
hypotheses discussed in section 2.1 or 2.2 are being tested •.
By an argument similar to the one given for the 1 d. f. case, it
can be shown there is not more than one solution of (3.2.8) such that
1\
Pi
1\
andQi
are positive.
The first partial derivatives always have the same sign in the
25
region of interest.
Therefore, they can both equ.al zero simultaneously
at most, once.
Then
X; =" L:(ffl
AI "+ ff2A~)2 (ai -f*il~*-f*
A* + b +f* i*+f* A*)
""
1 i2 2
1 U 1 12 2
(3.3.2)
xi = X; ~ Jf
and
if H
o
is true.
have central chi square distributions 'With 2 d. f.
x;.
xi
or "
is used as the test statistic depending on
which section of Chapter II,is being applied.
3.4 Remarks
The preceding results can be applied to the solution of other prob,lams encountered in the analysis of data.
The problem of combining
2x2 tables which is often encountered in the literature can be approached as the analysis of a single
of small
e~eriments.
e~eriment
The model assumed for the analysis should include
parameters which measure the differences in
one table to another.
OJ:'
e~erimental
material from
If the logistic model can be assumed, the methods
presented here are appropriate.
(1954)
rather than a series
In other cases the results of Reierspl
Bhapker IS (1959) extension of them may be appropriate.
Occasionally data are encountered from a single multinomial sample in which it is desired to study how the occurrence of events A and
B, where B is in some sense the complement of A, are influenced by the
other classifications.
If we let
t
P
= loge
PA ,
Where
PA and P
B
are the probabilities of the occurrence of events A and B respecB
tively, then the techniques given in this chapter and Chapter IV may be
used, even though they were not developed with this kind of a sample in
26
mind.
However, the non-centrality parameters given in Chapter V are no
longer appropriate.
The methods of testing hypotheses and the computational techniques
in the next chapter may be usefUl in bioassay when two or more compounds
and mixtures of these compounds are administered.
The new methods presented in this chapter have the added advantage
of ease of interpretation as compared to the conventional methods discussed in section 3.1.
The effects of assuming the nu11hypothesis is
true are conveniently displayed on the observed scale even though a
different scale is used in formulating the hypothesis.
27
CHAPTER IV
COMPUTATIONS FOR THE NEW METHOD AND EXAMPLES
4.0 Choice of Computational Procedure
In the examination of
(3.2.5) and (3.2.8) given in section 3.3,
it was shown that not more than one solution of these equations satisA
.
tying the criterion, 0 < P.
<1
, J.
~
exists.
Hence, we do not need to
be concerned with the method of com:putation chosen converging to the
wrong roots, as long as the solution gives
A
o < P.J. < 1
Of course,
in practice, speed of convergence is im:portant.
Scarborough (1958) examines the Newton-Raphson method of computing
the roots of equations.
In the case at hand, it is not possible to
examine the method, generally, for speed of convergence because it varies according to the observations.
However, experience has shown that
the Newton-Raphson method works in practice and is adaptable to either
desk or high speed computers.
In fact, in the well known method of
fitting the probitor the logistic model, the Newton-Raphson method is
used to com:pute the solution of the ML equations, as in example 4.3.
4.1 Computations for the One d. f. Test
To proceed to obtain the solution of (3.2.5) by the Newton-Raphson
method we expand it in a Taylor series about an initial value of A*i'
Aio say, neglect non-linear terms, and solve for (Ai - Aio ) • AiO
may be chosen anywhere in the interval which will make the P's positive,
or it may be taken to be zero if a
the closer the initial guess is to
i
r0
A*
f 0
In practice,
i
, the faster the convergence.
and b
28
Before e2Q?anding (3.2.5), let us write it as
!:fi€-log(a. - f~X) = !:f~log(b. + ~X) + h
~
~
~
~
~
~
•
E2Q?anding this form about the imtial value we obtain
(4.1.1)
f~)- ax!:f~2
U~)
~ lai-xoJ.i
!:fi*log(ai - X
0 ~
,
=
h + !:f~log(bi
+ A0 f~)
~
~
and
!:fr1og ( a i - Xof!> - h - !:fr1og(bi +, Xof!,
S
~t1here
S=I::
and ~ is
~ evaluated at Xo
•
.Aa?Plying the same technique to the alternative form of (3.2.5),
we obtain
where
Rl
= n(ai
R =
2
nCbi
f'!f'
,
- xoft) J.
+ X0 :f'~>
J.
f'!f'
,
J.
f*2
81 = I:: (a
' i
i
_ X f*)
0 i
and
A +
O
ax
i.s the first approximation to
x*
•
Xo
+ aA
isthen
substituted into (3.2.5) and the equation is solved again to obtain a
second correction,
This process is continued until
5~.
5~
is as
small as desired.
4.2.Computation for the Two d. f. Test and General Case
For the 2 d. f. test the algebra is more straightforward if we do
not take the terms of (3.2.8 ) involving bi
to the right hand side.
A solution can be obtained from (3.2.5) if desired.
about the trial points
~lO
and
~20
E.x;panding (3.2.8)
we obtain
•
!!~ - 5~ l!*1 D!!' - 5~ 2!tD~' = ~
(4.2.1)
•
~~- 5~ l!~D!y' - 5~ 2~D!~t=
h2
4-
where
L is a vector of elements
Ii
log
=
~ ,
Q
,
!~
= (fY2' f~2'
••• , t;2)
,
•
D is a diagonal matrix of elements
4-
Pi
A
is .pi
evaluated at
~o
' and 5~i = (~i - ~iO)
pact way of writing (4.2.1) is
( 4.2.2)
.
.
F*~ - ~~F*DF*I=!:
,
where
F*I
giving
= I:
!It:~!.J
,
• A more com-
30
~
The solution to the general case is identical with (4.2.2) except
P
is now defined as
and the elements of
~A.
1
.
D and
.
L
are redefined accordingly.
To
:fUrther bring out the analogy of the form of the solution with mul tiple regression, it might be noted that for most practical problems
h =
°
,.
therefore
( 4.2.4)
As in the other cases we have considered, the first approximation to
This
>'* is ~ = ~ + ~A.
~A.
is obtained.
now becomes
A.
~
and a second correction
The process is continued until the elements of
~).
are sufficiently small.
4.3. Important Special Case which does not Require Matrix Inversion
There is one type of test having more than 1 d. f. that occurs frequently in practice and deserves special mention because matrices do not
have to be inverted in computing the roots.
It is analogous to the test
of homogeneity of a set of treatment effects in the analysis of variance.
The notation will be clearer if we let
ij
where
i
= 1, ••• , u and
j
i
become a double subscript
= 1 , ••• , v
use:f'ul there must be linear :f'unctions of the {ij
,
For the method to be
such that
31
zf*.f
· = OJ'
i J.J i J
(4.3.1) cont.
To test the hypotheses given that
,
Pij
= Pij(O)
H.
o·
or
,
subject tOOl =
timates of the
~2 = •••
P..
J.J
0v
j
= 1,
••• , v
,
' we must obtain maximum likelihood es-
subject to
or
,
,
Zf*
I
i i,v-1 ,i,v-l
The log likelihood with v-l
~
= log$
Zf* (I
- 0
i ivAiv -
M
restraints is
- ~l(Zf~ll·l
'i ) - ••• - ~v- l(Z~
lli , v-l-Z~i
'i )
i . vA v
i J.:I vi vA v
i J. J. - Zf*i'
and
01Jr
dP:-:= a ij
ij
'. n .. P .. - ~J.frJ' = 0
J.J J.J
for i=l, ... ,u and j=l, ••• ,v~l ,
32
and
do/
~
o.t"iv
= a.~v
v-I
+ f'l!'
- n. P.
J.V J.V
~
j=1
J.V
A.
j
There1'ore
=1,
i
and
a
A
P
iv
=
i
V
,
v-I
+' 1'~
~
.-1
J.V
n
••• v-I
A.
J-
iv
j
a.
-:t:7!"~vA.
v
=----D
~v
iv
where
v-I
A.
v
=
-~
j=1
Substituting the estimates 01' P
ij
A.
j
into (4.3.2)
we
have
v
Note that now
t A.
"" 0
j=l..,e1
v
EJ!;panding about the points
we have·
where
and
, where
~
A.
j=l j 0
=0,
33
7.\
15\
nijPijQij
Let US choose the j-th and k-th members of
(4.3.4)
and solve for
5
A
•
k
Then
•
v
Since
A
j=l j
L
v
=0
and any solution must satisfy
L A*
j=l
j
=o
then
v
L 5
k=l Ak
~
0
•
Therefore,
v
o ,= L 5 A
k=l k
=
v
k=L_
l
Tk
Sk - Tj
v 1
v 1
L S + 5A Sj k=L Sk
k=l k
j
l
or
and
where
and
Note that the subscript
be replaced by
j
•
k
has the same range as
As in preVious cases
first approximation to
j
and, in fact,. may
Aj = A'jO + 8Aj
A*.
J
Ii'deSired, the antilog can be taken of each term of
a solution derived for this form.
,
,~:
(4.3.6)
is only the
The solution is
(4.3.2)
and
where
=
Rj
Sj
and ~ are as in
"\
f
I(-
~ (aij - ~jo 1j
J.. 1 b .. + A.. fi<".
= J.J
JO J.J
fJ.t<"J.
)
'
(4.3.4) and (4.3.5) , and
v
1
h= Z k=l RkS k
v. 2
E A.*jSj
j=l.
are distributed asymptotically as central chi square
_2
For tests of hypotheses using sections 2.2 or 2.3, X;
and
xi = ~ -:l
=
variates with v-I d.f. if H is true.
o
. Often tests of interaction can be put into this form as will be
illustrated by
ex~les
4.1 and 4.3. Norton's extension
test can be derived as a special case of
ulas can be obtained from
4.4 Shorter Methods
~f
Bartlett's
4.3.2 and his computing form-
(~.3.6).
for Computing
:l and
Fitting the Model
The method of setting up hypotheses and the computing formulas
previously given can also be utilized in fitting the logistic model to
,data and in co:m;puting the
:l
of section 2.2.
Obtaining
:l
and fit-
ting the model using these methods are not as obvious as testing hypotheses about parameters in the model, but they are easily obtained because of analogies with the analysis of variance.
Recall from section 2.2 that
:l
with no restraints on the parameters.
:l = Z
(a.J.
is computed by fitting the model
A
- niP. ( e) )
J. -
A
:l
Then
A
niPi(~) Qi(~)
2
is given by
35
In the analysis of variance, the error often can be thought of as
co~osed
being
of block x
treatment interactions or in some instances
as high order treatment interactions.
puting
Jt
A more convenient way of com-
is .to set up linear functions of
t.
~
that would compose the
error in an analysis of variance, set these functions equal to zero and
obtain ML estimates of the
P.
J.
subject to these restraints.
Then
The advantage of this method over conventional methods lies in the fact
that
t-u
is usually less than u
Thus the order of the matrices
that must be inverted has been reduced.
prs
Furthermore, m.a.ny times the
can be estimated without inverting matrices, under the hypothesis
of no interaction, by; applying section 4.3.
these
})r s
It should be stressed that.
have been obtained under the hypothesis of no interaction
among effects included in the model.
Although never explicitly stated,
this is really what we are doing in the conventional fitting methods.
Hence, the estimates of the
Pi
used in computing
Jt
can be used to
obtain the provisional logits and weights needed to fit the model and
no :f'u.rther iteration will be required.
This technique is illustrated
and convenient computing formulas are given in example 4.3.
In addition to providing greater flexibility in testing hypotheses
for the logistic model and giving computing formula.s which can effect
some saving in computation" several bits of apparently unconnected sta~istic~
theory have been shown to be related closely; namely"
Bartlett's test of interaction and Norton's extension of it, Mitra's
approach to testing !JY.potheses and the work of many people on the
logistic model.
Example 4.1
Illustration of testing hypotheses in a complex experiment Without inverting matrices
An exam,ple of a type of data that the preceding techniques might
be applied to is given in Table 1.0.
We will apply the techniques
developed in this chapter to test some of the hypotheses that would be
tested by an analysis of variance.
The model assumed is
where
a.~ = effect of i-th age,
T.
J
4
!:
i=l
a. = 0
~
= effect of j-th type of tobacco use,
3
!:
Tj=O
,
j=l
ek = effect
2
of sex,
!:
k=l
e =0
j
•
For this type of data we have no knowledge of what parameters are necessary to include in the model to fit the data.
Hence the complete
model is assumed and, section 2.1 is applied to get estimates of the
Pijk subject to certain restraints, the restraints being chosen so
that the desired hypothesis is tested.
As in the analysis of norroa.J.ly distributed data, the tests of main
effects
~e
not very help:f.'ul in interpreting the data when interactions
are significant.
For this reason it is desirable to start With the
37
tests of high order interactions and be guided by the results of these
tests in setting up the most meaningful hypotheses to test about main
effects.
In the analysis that follows the highest order interaction
is significant.
When this is the case it is difficult to construct
meaningful tests of lower order interactions and main effects.
It might be argued that the highest order interaction is really
error, as it is often considered to be in normal analysis of an unreplicated factorial experiment.
ever, the
.fls
Then one proceeds using F tests.
are not independent.
How-
It may be that whenever the
sample size is large enough for the test statistic to follow chi square
when the null hypothesis is true, independence has been attained.
At
present this is not known.
For the sake of example, we will proceed to set up the linear fUnctions to test the hypotheses and show how to compute some of the test
statistics even though they Will not be particularily helpful in interpreting these particular data.
We Will use the notation,
,
ML estimates of the
in Table
Ziijk
ij
= i •• k
' etc ••
P. Ok are obtained subject to the restraints shown
l.J
4.0.
For all the tests having more than 1 d.f. the roots were computed
by the methods in section
4.3. The roots for the 1 d.f. tests were
computed by (4.1.1) or its alternative form.
shown in the table on page 39.
The roots and
X;' s
are
e
38
Table 4.0.
Main
Effects
Linear Restraints for Testing Hypotheses for Data in Table
1.0.
Hypotheses
Homogeneity of {1..
Age Effects
Linear Restraints
12•• = 13•• = 14-I1-··- 14•. = 0
or
=
{ 2· - - I 4· - --
0
{ 3· - - { 4-· -- 0
Homogeneity of
{ ··1 - { ·-2 = 0
Sex Effects
Unburnt Tobacco
Users vs_
2{ ·1· - { ·2· - { .3·
Smokers and
Non-Users (A)
Smokers vs.
I ·2- . . { .3· -- 0
Non-Users (B)
11 •1 - {1-2 = 12•1 First Order
Interactions
{l.l -
[2-1 [3-1 Age x (A)
Second
Order
Interactions
e
-
/3.2
or
11 _2 - 14-1 + 14•2 = 0
12•2 - 14-1 + [4.2 = 0
13 •2 - [4.1 + 14•2 = 0
2[11· - 112 - - I 13 - = 21 21· - / 22.
- 123· -- 2[31· - 132· - 133· -- 21 41-
- 142 •
Age x (B)
12-2 = 13 •1
= /4.1 - {4-2
Age x Sex
e
=0
- 1 43 -
112 • - 113 • = [22-- 123 • = 132 - - 133 •
= l42. - 143 •
Sex x (A)
2l. 11 - l.21 - l-31
-- 21 .12 -
Sex x (B)
1. 31 = 1. 22 - 1. 32
2/111 - 2/112 - 1121 - 1131 + 1122
'+ 1132 = 2/ 211 - 2[212 - f 221 - 1231
+ 1222 + 1232 = 21311 - 21312 ... /321
mfl331 + l322 + 1332 = 21 411 - 21 412
- l421 - l431 + 1422 + 1 432
1121 - l122 - l131 + 1132 = 1221 - 1222
- 1231 + 1.232 = 1321 - 1322 ':' 1331
Age x Sex
x (A)
. Age x Sex
x (B)
1·22 - 1·32
[.21 -
+ 1332
= 1421 - 1422 - 1431 + l432
To illustrate the computations we will show one iteration of the
Age x Sex x (B) interaction.
several different forms.
Choose
The equations may be rewritten in any of
We will use the form,
Alo =A20 = A = A40 = 0 to start.
30
Then,
40
e
R1
= 2.002247
81
R2
=
.776947
82 --
.681942
R =
3
.313636
8
.846361
=
.042339
R4
h
~I
8
= 18.130559
= (2.002247
= 1.10,
~3
3
4
=
-R18 = 1.887387
1
RS = 3.767198
R~
= 1.915988
lj; 4
= 12.326220
.191373
= -1.83
These values of the
.149754
3 3
1
8h =
~4
=
2 2
- .191373) .1'+9754
= .46,
1
R 8
1 1
= 3.335 058
~'s
=
.27 and by similar calculation ~2
•
are substituted intotlle equations and
they are solved for a second approximation.
For this set of equations
the solutions are accurate for most purposes after four iterations.
by sufficient insight, we had chosen
and
~~
= -1.25
iterations.
~lo
= .5,
~20
= 1.0"
~30
If,
= -.25"
, convergence to the roots would have come in fewer
It might further be noted that these
co~utations
are
easily· programed for high speed computers.
Example 4.2
Illustration of the use of a linear restraint in fitting
a sinq)le model
This example is given because it illustrates fitting models by
setting up restraints that are considered to belong to error and estimating the
F's
subject to these restraints being zero.
Hodges (1958) gives a new method of fitting the logistic model in
a simple linear regression case--by MI..
gives the following example:
To illustrate the procedure" he
41
a
Xi
n
0
10
3
1
10
8
2
10
6
i
i
A
His estimates of the pt s
from the logistic model are
and
A
To obtain estimates of the pt s
line
f. =
~
ex + f3x.~
, we obtain estimates subject to the restraint
( 73- A)~\ (64 _~\~)
+
must be solved for
A
mates of PI
+
A*.
= .414,
A
P2
=
(82- 2~
+
In this case
= .571,
estimates to three places) and the
the line is 3.33.
assuming that they fall on the
A
P
3
2~J
A*
2
= -1.145
which gives esti-
= .714 (agreeing with HOdge t s
x2
for the test of lack of fit of
It is believed that with some practice.this could
.
become a very fast method of fitting 3 point assays when the
equally spaced.
xts
are
It would not be advantageous for assays with more than
three points because each cycle of iteration would require solving two
or more equations depending on the number of points.
Example 4.3
Illustration of the use of the
pts
obtained under the
hypotheses of no interaction in fitting models
Cochran (1954) gives the following data which he obtained from
Dr. Martha Rogers.
42
e
Table 4.1-
Data on Number of Mothers with Previous Infant Losses
Birth
Order
No. of Mothers with
Losses
None
Total
2
Problems
Controls
20
10
82
54
102
64
3-4
Problems
Controls
26
16
41
30
67
5+
Problems
Controls
27
14
22
23
46
49
37
It is desired to compare the mothers of children in Baltimore
schools who have been referred by their teachers as presenting behavioral problems to mothers of a comparable group of control children.
For each mother 1 it is recorded whether she had suffered any infant
losses previous to the child in the study.
Suppose it is desired to fit the model
where (Xi
i
=1
or 2
is the effect of problem or control depending on whether
1
ap.d f3 j
is the effect of the
j-th birthorder ,
j = 1,
Fitting the model will use 4 degrees of freedom 1 leaving 2 for
error.
These two may be thought of as (Problem vs. Control) x (Birth
order) interaction.
To estimate the
P's
I f there is no interaction
subject to these restrictions we must solve
43
for
is
),,*1
),,*1
' ),,*2
and
= -.5 057,
A.-*3
),,*2
where
= -1.2126
),,*1 + ),,*2 + ),,*3
and
),,*3
=0
= 1.7183
•
and the estimates
1\
of P..
1\
J.J
under the restriction of no interaction are
1\
1\
P12 = .11K33,
Pa = • .4062,
P22 = .3215,
= .421K3. ~
with 2 dS. is .85.
The solution
Pu
=
1\
.2010,
1\
P = .5160 and P
31
32
The logits corresponding to these
1\
pIS
are
-1.38, -1.75, -.38, -.75, .06 and -.31 •
1\
Earlier in this chapter it was suggested that estimates of the
pIS
subject to restraints which ordinarily are called error could be used
to get ML estimates of the parameters in a model without iteration.
Let us derive a form convenient for this purpose.
(3.1.2) may be written
where
~P
is a
rxl
X'
~P
txr
rxl
=
0
vector of elements
n.J.Lr p. J.. - P.J .(e
)J.
-
i
1
= 1,
Expanding this equation about a triaJ. point . ~
X'
txr
rxl
X'
D
l
X
txr
rxr
rxt
.
, we obtain
~
txl
.
=
0
txl
has elements Lni(Pi - Pi(~»J 1 Dl is a matrix with
•
•
n.p.(e)Q.(e) on the diagonaJ. and zeros elsewhere; :::.eo is the txl
J.J.- J . • •
•
vector (~-~); and Pi (~) is Pi (~) evaJ.uated at e • Add
•
•
(X' Dl X)~. to the left and X'Dl {* , where l.* is the vector of prowhere
~:P
0· -p
••• 1 r
visionaJ. logits, to the right hand side of the above equation.
Thus
44
we obtain
.
.
l*
(X'D1X)~ = X'2p + X' D1
Now let
D1
= f*
+ 1- -p
a·
K* = _"lEL
·-1
- D1 -p
a·
L*
-
then
(X'D x)e = xla·
1 -
-p
+
XID
1
(L
-*
_ ~-la·)
1 -p
and
1\
Let us use the
obtained under the no interaction
pIS
for the elements of D and a·
1
-p
multiple regression form so that
Let ex
new parameters.
,
X
hy'po~hesis
reparameterize the model into a
is of rank 4, and solve for the
= OJ.. - Ct2 and
T1
= 131
- 132 and
T2
=
131 - 13 •
Then
X=
1
1
1
1
1
1
-1
1
-1
1
-1
1
1 1
1 1
-1 0
-1 0
0 -1
0 -1
a.nd the equations
[
71.93772
-17.62.000
- 1.72980
3.i8720
-17.62000
71.93772
- 2.17140
- 5.10068
- 1.72980
- 2.17140
'50.65988
24.46504
-52.4868
3.5318
=
-23.0773
[ -34.6770
3
have the solution
1\
I.l. =
-.752
,
1\
<X=
-.185
,
1\
T
l = -.187
,
1\
T
2 = -.627
The predicted logits are
ill
=
-1.38
, i 12 = -1.75 , i a = -.38 ,
K22
=
-.75
which are exactly those used to obtain D and £p for this solution.
l
Any further cycles of iteration would give the same estimates unless a
more accurate table of logits was used.
If it is desired to fit models
to data, this technique may be used to shorten appreciably the computations in many cases.
46
CHAPTER V
NON-CENTRALITY PARAMETERS FOR OBTAINING THE ASYMPTOTIC POWER
OF TESTS OF HYPOTHESES FOR THE LOGISTIC MODEL
5.0 The utility of Non-Centrality Parameters
In Chapter II the non-centrality parameters derived by Diamond
(1958) were given as general forms with
Pij(~)
except for certain analytic conditions.
Since the asymptotic power
being unspecified
is a function of the non-centrality parameters and the d.f. only" the
non-centrality parameter provides the reference point needed for
empirical investigations of the small sample power of tests.
Without
such a reference point" the way in which various parameters" viz. sample size, cell probabilities" hypotheses and alternatives" affect the
power of a test is difficult if not impossible to determine and empirical stuqies become sterile exercises in computation.
Also the non-
centrality parameters may have other important uses such as" for example" designing experiments which maximize the power of the tests.
This
particular aspect may be of interest in bioassay.
In this chapter we will obtain the non-centrality parameters for
tests on the logistic model from Diamond's general forms.
Studies of
the small sample power of the tests will be deferred until the next
ohapter.
5.1 Non-Centrality Parameters for Tests of the Fit of the Model
To test the hypothesis
Pi
:=
p.(e)
~
-
47
subject to
f (e)
=0
f (e)
=0
,
magainst the alternative
m = 1., ••• , u <
,
t
subject to
m-
and
d
i
+ dt = 0
i
where
,
and
for "any €I in
1-1 ,
we note that
where
d and B* are defined by (2.1.3) and (2.1..4).
B*
:2
Recall that
-1.
B2 - B1.F1. F2
B1 , B2 , F1. and F are the general forms defined by (2.1..6),
2
(2.1..7) and (2.1..8). Let
where
f
(e)
m-
= 0
,
be the linear functions of the
m = 1., ••• , u
8k
'
•
•
where the matrix of coefficients,
F , is of rank u
• We may write
48
(5.1.1) more compactly as
Fe:: 0
til
uxt
u
of the
ek
may be arbitrarily chosen as the dependent
as the matrix of coefficients is of rank u
•
eI s
as long
This corresponds to the
partitions
F2.l
u
t-u
and
e
t
~l
=
••
til
t-u
~
Therefore,
,
giving
dFG
~l
dFe
and
= Fl
~
On substituting the logit model for
P. ( e) and taking partial. de~-
rivatives" we obtain the matrices
•
•
•••
~
oo
n -...:t'rru
~
oo
- . P g-x
n
= F2
r-...:t' ru
••• _ - g-p. x
When multiplied together
d'B.
-
i
= d*'X
i
for
1 and j
=
1, 2,
* ,
or no subscript, and
d'd ,.
where
d*' D -ld*
w
'
:Jl n. ,
d* = r i d , ••• , nr d
1
n
- Ln
w is an rxr diagonal matrix with elements .2:.
n P~Qo1
J.
D
on the diagonal,
(5.1.6)
where
i
j
= 1,
= 1,
••• , r
••• , u
and
~
rxt-u
= LXijJ
1
j
= 1, ••• , r
= u+l, ••• ,
t
•
50
Then
b.*
= -d*"D
-1_
Lw
For the special case u
B =
X (X'D X )-lX'J d*
* *w·X*-
= 0 , we obtain
•
•
2rxt
•
...
...
and
b.*
= -d*' L'w
'D -1
_ X(X'D xrlxtJ d*
w
-
where
5.2 Non-Centrality Parameters for Power of Tests of Parameters
in the Model
TO test the hypothesis
H·
o·
F
uxt
e =0
til
against. the alternative
~:
where not all
c =0
I
F
uxt
e = !
til
-vn
c
uil
the general non-centrality parameter is
51
as given by (2.2.1).
-1
On substituting for
Fl
,Bl , B* and their pro-
ducts we find
•
A
hY.Pothesis that occurs frequently is
against the alternative
k
= 1,
••• , u
•
The non-centrality parameter is given by
,
c = [Cl' ••• , cuJ, Bl is as in (2.1.6) and B2 is as in
(2.1.7)~ This is'a special case of (5.2.2) in which F = I . and F
l
2
= 0 • The non-centrality parameter for the logistic model then be-
where
comes
•
Exanu;>le 5.1
To obtain the non-centrality parameters for the test in bioassay
against the alternative
c
J3=J3 + -
0yJii
we set
52
,
and
X2 = (1,
Mu1tiplying
1, .. 0' 1)"
the matrices together and simplifying, we
obtain
~
=c
2
(~
. 1
~=
r
2
Z w.x. i=l ~ J.
w.x.
~ ~
where
n.
= .2:.
pOQo
n
i i
w-t
...
Let us now define
, then
Since the power is a monotone increasing function o:f
~
, the above
result shows that if an e:x;periment is planned to minimize the variance
of b
, the estimate of
~
, the power is
~zed.
Example 5.2
To obtain the non-centraJ..ity parameter for the test of paraJ.lelism
of two lines, i.e.,
against the alternative
H •
n·
in (5.4.2) we let
~
1
c
2-vn
- 13
= -
,
53
,
and
o
1
o
1
o
o
•
•
•
•
..~
=
~k
0
~
::
x21 0
x22 0
•
1
1
•
and multiplying the matrices together and simplifying we obtain
where
54
,
and
n
o
i
=_JpoQo
wij
n ij ij
•
, we have
Hence we see that the power of the test is maximized when the sum of
the variances of b
1
and b2 is minimized.
55
CHAPTER VI
STUDIES ON THE SMALL SAMPLE PROPERITES OF THE TESTS
6.0 Choice of Sampling Populations
In order to find the sample size at which the distribution of the
test statistics is adequately approximated by the chi square distr1bution and to check on the agreement between the observed power and the
power predicted from non-central chi square distribution using the
non-centrality parameters given in Chapter
V~
some sampling eJq)eriments
were conducted for a 2x2 faGtorial eJq)eriment.
The four cell probabilities were obtained by assuming
logistic
that~
in the
model~
P ..
lO~~Q =}4 + a. + 13. + I iJ.
:ij
1.
J
where
i (and j)
a=
= l~
2
01 - a2 = -.3
13 = 131 - 132
=
.6
and
o
in all the configurations sampled.
obtained by letting
~
= 0 ~
configurations respectively.
ample~
by
~
= -1
The individ ual configurations were
and
p. = -2
in the
A~ B~
and C
Configuration C was cOq>uted ~ for ex-
56
e
Pll
log - = -2 + (-.3) + .6 = -1.7
,
P12
log - = -2 - (-.3) + .6 = -1.1
e ~2
,
e~
log
e
P21
21
Q = -2
+ (-.3) - .6 = -2.9
P
log -22
= -2 - (-.3)
e Q22
The
piS
J
- .6 = -2.3
corresponding to these logits were obtained from Berkson's
tables (1957).
Using this technique J the
P con:figurations
shown in
Table 6.0 were obtained.
Table 6.0.
e
P Conf'igurations Used in Sampling
Con:figurations
.
A
B
C
Pll
P12
P21
P22
.57
.33
.15
.71
.29
.13
.052
.43
.21
.47
.25
.091
These cell probabilities were used in investigations of' the exact sampIing distributions f'or
e~irical
n = 1
and n = 2 in each cell and in an
investigation of' signif'icance level and power f'or larger sam-
ple sizes.
6.1
6.1.a
Studies on the Exact Distribution of'
Methods.
·xi
The distribution of' the test statistic was found
in two cases by enumeration of' all possible outcomes f'or
n = 2 in each cell.
n
=1
There are 16 and 81 outcomes respectively.
and
The
57
data were analyzed by assuming they had been generated by configurations
A, B, and C.
Only the null cases were investigated.
hypotheses tested were a0
= -.3,
To test these hypotheses,
ML
~
=
.6, and I
That is, the
0 0
estimates of the
= 0
P
ij
were obtained
sUbject to the following restraints:
( 6.1.1)
1.12 +
12l - 122
=
4(ao )
,
III
+ f12 -
12l - 122
=
4(~
,
fll
- 1 12 -
f 2l
~:
III -
H~ :
0
0
HI :
0
+ 122 =
0
)
4(I )
0
The root of a third degree polynomial which is required to complete
the estimation was computed by methods given in section 4.1.
Since we are assuming existence of both main effects and interaction, there is no
X2
for lack of fit and the test statistic is
58
e
Table 6.1.
Distribution of
xi for n = 1
in Each Cell
Configuration
A
Test
xi
I
fCxi)
C
B
Vari..
Mean ance fCxi)
VariM3an ance fCxi)
0.00
2.89
5.52
.88086
.09157
.0275 6
.42 1.43 .92304 .27
0.00
2.15
7.43
.81807
.16679
.01513
.47 1.39
0.00
4.00
.89953
.10046
VariM.ean a.nce
.96
.97627
.01823
.00550
.08
.31
.30
.95
.96375
.03323
.00302
.09
.31
.40 1.44 .93508 .26
.97
.97997
.02003
.08
.31
.05914
.01718
•882lJ8
.10773
.00978
.06491
=2
The frequency distributions for the case n
are given in Table 6.2.
The frequencies for A, B, and C models followed the same general trend,.
except that they became more skewed, with the frequency of
.xi
= 0
increasing and the frequency of the larger values decreasing correspondingly, as the
P's were shifted more toward zero.
It will be observed in Tables 6.1 and 6.2 that the
not have as many possible values of
.xi
I
as the other tests.
a symmetry present in this test that is not in the others.
thesis tested was
tests.
I
= 0 and not I = I o
~
'
I0
~
'f'
0
There is
The hypo-
• as in the other
,
This gives rise to more configurations having the same
in fact twice as many, as the other tests.
and
test does
If we had tested a
.xi '
=0
= 0 , they too would have had fewer possible values of Xl2 ,
but the results would not have been comparable with the random sampling
studies.
60
The exact distributions of
xi
for these cases show several things
that would be difficult to prove conclusively with sampling experiments
unless an enormous amount of sampling was done.
The distribution is
not the same for all the tests in a single configuration.
n
=1
xi
the means of
For the case
for the various tests are to be related to the
size of the treatment effect and the variances are all almost equal.
In the case
n
=2
the effect of the size of the parameter being test-
ed is even more pronounced with both the means and variances of
varying directly as the size of the treatment effect.
xi
pected, the moments of
as the assumed pI s
tion of
xi
when
As would be ex-
depart more from the moments of chi square
are shifted more toward zero.
n = 2
xi
Also the distribu-
is not unimodal.
From this investigation it is apparent that the distribution of
xi
is not closely approximated by chi square in experiments with small
numbers in the
more cells.
cells..
This mayor may not be true for experiments with
Considering the assumptions made in deriving the asymptotic
distribution of
xi ' these results are not surprising.
However, one
always hopes that the large sample theory may hold reasonably well for
small samples or that some modification can be made which will improve
the approximation.
We have not been successful in discovering such a
modification.
Since increasing
n
in each cell rapidly increases the amount of
enumeration and computation, no further work was done toward obtaining
exact distributions.
pIing investigations.
The remainder of the work was done by random
sam~
61
6.2
6.2.a
tained.
Random Sampling Investigations
Methods.
The original configurations of
COnfigurations~, A
2
time to time.
p's
were re-
, and C and C will be referred to from.
l
2
The sUbscripts refer to different investigations on the
A and C configurations.
The difference between
and A , and C and
2
l
~
C is shown in Table 6.4.
2
In the sampling studies that follow, random binomial variates were
generated by a program obtained from the Royal McBee Company for use
with their computer, the LGP-30.
The program generates variates by
squaring a random number, a, multiplying
a
2
by
c
where
c
is
so~e
large number, and, if there is any overflow, rounding off the most significant part of
2
a c
and then adding the result to
a
2
The result
of these operations nmv becomes the number operated upon and the whole
process is repeated once more.
middle of this generated number.
Several digits are extracted from the
These are presumed to be a variate
from a uniform distribution with limits
maximum. size of the extracted number.
0
< x < R , where R is the
The uniform variate is transform-
ed to a binomial variate by comparing the generated number,
a number
If
0
<
r
x
recorded.
is chosen so that
~
a failure is recorded or if
r
"where
<r
r
x
, with
is the cell probability.
<
x
<R
a success is
By the ,use of this binomial variate generator, the predeter-
mined size sample was taken for each of the
4 cells of a 2x2. factor-
ial eJq>eriment.
The eJq>ected cell probabilities and those observed from sampling
are give'n in Table 6.3.
The hypotheses tested, sample sizes in each cell and the number of
times each experiment was repeated are given in Table 6.4.
e
Table 6.4.
Parameters Tested, Sample Sizes and Number
of Times Each E;<;periment Was Repeated
Sample
No.
Size in Times
Each
Exp.
Cell
Rep't'd.
Configuration
(J.
CXo
01.
13 Test
130
131
A
l
0
-.3
.1
.6
.5 .
0
.3
10
A
2
0
-.3
0
.6
.5
0
.9/. 8
20
50/5 0
B
-1
-.3
0
.6
.5
0
.8
30
100
Cl
C2
-2
-.3
0
.6
.5
0
.8
45
100
-2
-.3
.5
.6
0
0
-.3
45
100
cx Test
I Test
Io
Il
50
TO .check on the significance level of the test I the parameter values shown under (Xo' 13o and
I0
in Table 6.4 which were used in
generating the samples were tested by obtaining ML estimates of the
P
ij
sUbject to the. restraints given by (6.1.1), computing
xi
and
~ exceeded the tabular values of chi
noting the percent of the time
square.
The power of the tests was estimated by obtaining estimates
of Pij
subject to
(6.2.1)
~:
111
-
112
+ 121 -
H :
131
lil
+ 112 -
HI :
1
lil - 112 - 121
121
-
122
=
1Ja.L
122 =
4131
+ 122
These estimates were used in computing'
by noting the percent of the time
xi
=
4I
l
•
~ and the power was estimated
exceeded the tabular value of
chi square.
Configuration
1\
was performed as a pilot trial to observe the
random variable generator and the computations required to perform the
test.
If these appeared to work properly an idea could be obtained
about the sample sizes and alternative parameters to use in a more extensive sampling program.
It appeared from the results of A
l
that
the sample size of 10 was much too small and the alternative parameters
C1..:I 131 , and Il:l were too close to the parameters used in gen-
erating the sample.
The sample size was increased to
alternatives were changed as shown in A •
2
a:f'1ier the fiftieth
e~eriment
pI S
from.9
and the
However, a check of A
2
revealed that the observed power was high-
er than anticipated against the alternative
decreas~d
20
to.8 for the remaining 50
I
l
•
I
l
was, therefore:l
e~eriments.
Because the
were shifted more toward zer0:l the sample size was increased to
64
30 for the
B configuration.
Before decreasing the cell probabilities still further to obtain
C configurations it was decided to try to relate the agreement
l
with chi square to some parameter which is a :function of both the samthe
ple size and the cell probabilities.
The parameter chosen was
•
This was chosen partly from intuition and partly because functions of
~Q occur in the moments of the goodness of fit test, see Irwin (1950).
For
A.J,
Z
= 1.79;
for A2 , Z = .89, and for
B, Z
= .78 •
thumb for determining sample size it was decided to tryZ
the
Cl
which
As a rule of
< 1 . For
C cases, 45 observations per cell is the number for
2
Z = 1 . Applying this rule to A
and B would have given samand
ple sizes of
2
18 and 23 in each cell.
The change of parameters from
C
l
to
C
2
was to observe the
effect of switching the alternatives on the power of the test and to
get a point on the long middle section of the power curve that had not
previously been examined.
The observed probability of a Type I error for .05 and .01 tests,
and the power of these tests against the alternatives shown
tained by observing the percent of the time
xi
exceeded 3.84 and 6.64,
the .05 and .01 tabular values of chi square respectively.
The non-centrality parameters were computed by letting
Xl
.e
and
= (1, 1, -1, -1)
were ob-
'-
•
t
TabJ.e 6.5.
Ie
Probability of' a Type I Error and Power Observed f'or an .05 Test
ex Test
Obs.
Type I
Error
~
Obs.
Power
Obs.
Type I
Error
~ect-
ed
Power
~l
e
I Test
Test
ed
Power
.Obs.
Power
Obs.
Type I
Error
~ect-
61
~ect-
61
ed
Power
Obs.
Power
15%
98'/0
93'/0
24'f,
94'/0
9CY/o
--
~
4%
.37
9.5'/0
l~
0
.09
5.5'/0
2'f,
0
A
110
1.61
25.0%
15'10
10%
.18
7.2%
10%
2'f,
.80
14.116
ll.43
B
9'/0
1.85
29.0'/0
31'/0
4%
.21
7.5'/0
7'/0
4'f,
13.18
9610
8810
Cl
6%
1.43
23.0%
32'/0
510
.16
7.CY/o
9'/0
3'/0
10.17
89%
76%
C2
4%
10.17
90.0%"
89%
8'f,
5.72
67·0tf0
(f:;'/o
4%
1.43
23'/0
24%
2
Table 6.6.
Probability of' a Type I Error and
~
ex Test
Obs.
Type I
Error
ed
Power
.,..
Obs.
Power
Obs.
Type I
Error
~ect-
Po~r
Observed f'or an .01 Test
Test
I Test
ed
Power
Obs.
Power
Obs.
Type I
Error
~ect-
~ect-
ed
Power
Obs.
Power
-
~
0
2.5'/0
0
0
1.210
0
0
4.7%
610
A2
0
9.5'/0
2'/0
4%
2.(YJj,
2%
0
89.0%
79.0'/0
86%
82'/0
B
0
U.O%
16'/0
2'fo
1.5%
?!/o
0
85. 0 '/0
82'/0
Cl
0
8.5'1;
15'/0
0
1.7%
0
0
73.0 'fo
7ff1,
C2
2!fo
73. r:Yf,
79'/0
0
43. r:Yf,
55%
0
8.5%
4'/0
0'\
VI
- 66
~ = [1,
1,
l~
1, 1,
-1, 1,-1
1, -1, -1, 1
for the test of
f3
, for example, and applying (5.2.3).
The expected
power was obtained by referring the non-centrality parameters,
.6:t '
to Fix's tables (1949).
6.2.b
Results.
The non-centrality parameters, observed probabil-
i ty of a TYPe I error and the power for an .05 test and an .01 chi
square test are shown in Tables 6.5 and 6.6.
ple means and variances of
xi
(See page 65).
are given in Table 6.7.
The sam-
The discussion
of the results will be confined to the .05 test since the .01 test results seem to be similar to those for the .05 test.
Mean and Variance of
Table 6.7.
e
ex
Configuration
Mean
Variance
Mean
xi when flo Is True
f3
Variance
Mean
I
variance
~
.79
1.02
.86
.78
.93
.98
A
2
.57
.61
1.30
3.27
.95
1.27
B
1.18
2.10
.89
2.10
.85
1.18
C·
1
1.40
1.84
1.26
1.56
1.15
1.39
C
2
1.29
1.99
.98
1.73
1.16
1.36
Even though. the experiment was repeated only 50 times, it is clear
from the results of
e
.
~
that a sample size of 10 in each cell is not ade-
quate for the large sample theory to hold for this configuration of P's.
Instead. of getting an .05 test, the test is probably about an .01 test •
67
This is understandable when we look at the first two moments.
The mean
is not so far from the e:x;pected value of 1 as to be outside the range
of sam,pling variation, but the second moment is about the same size as
the first, rather than twice as large, which would e:x;plain the very conservative test obtained.
That is, the mass of the distribution of
for this sanwle size and P
xi
configuration is more closely concentrated
about the mean than chi square.
The observed power would seem to be at
least as high as the e:x;pected power obtained from the computed noncentrality parameters.
There is considerable variation in the performances of the tests
of the
A
2
configuration when they are looked at . individually.
stead of the e:x;pected
1%, 10% and
2.0~
5%
Type I errors ,for the three tests, they were
for the a , f3 , and I tests respectively.
average Type I error rate for the e:x;periment is 4P/o.
as a whole,
xi
In-
for a sample size of 20
However, the
For the e:x;periment
follows chi square adequately,
but when the tests are considered indiVidually the number of TYPe I
errors may deviate considerably from the e:x;pected
The performance of the test in the
isfactory.
5%.
B configuration was very sat-
The null hypothesis was rejected 4P/o of the time for the f3
and I test, and
9%
of the time for the a
Type I error for the e:x;periment
test.
5.7% which
This would make the
is certainly adequate agree-
mente
The number of TYPe I errors for C and C fall in a range accept2
l
able by most standards.
From these results of C and C the rule of
l
2
thumb for a one d.f. test,
.E
iJ'
P1 Q
nij ij ij
<1
' does not seem too
68
conservative.
Further work needs to be done for larger experiments and
tests having more than 1 d.!.
In the meantime it appears that the
rules given by Cochran (1954) would not lead one astray even though
they were not formulated with these tests in mind.
When the sample size is large enough for distribution of the test
statistic to be adequately approximated by the chi square distribution
when the null hypothesis is true, the non-centrality parameters predict the power of the experiment with adequate accuracy.
The agree-
ment between predicted and observed power is shown graphically in
Figure 6.0.
It is generally good except that the test for interaction
has lower power than predicted.
However, even for this test the dis-
agreement is not enough to negate the usefulness of the non-centrality
parameters and, according to these data, they should predict the power
to within 7% for a 5% test and to within 3% for a 1% test, on the
average.
The number of Type I errors, power of the test and the moments are
consistent in the sense that the tests with large variances tended to
have more Type I errors and those with small variances had fewer.
The
number of Type I errors for both the .05 and the .01 test seems to be
related to the absolute size of the parameter being tested.
This is in
accord with the results of the exact studies described in section 6.1
where it was observed that the size of the treatment effects seemed 'to
influence the mean and variance of the tests.
The agreement between the distribution of
xi
observed in sam-
pling and chi square is not as close as one would expect intuitively
by looking at the .05 points and the first two moments of the exact
'e
e
'e
1.0
.9
~
.8
.7
a:
LaJ
~
3= .5
.4
Q(I
.3
.2
.I
o
,;
~/
'r;pfJ
,..'
.~
,
~./
0
[I
~
LEGEND
Expected Power Curve for
I d.f. Test, d=.05.
---Expected Power Curve for
I df. Test,d =.01.
00 Points from A2 Configuration
./
~
(1[1 Points from B Configuration
./
Q ~Points from C I Configuration
.//
• • Points from C2 Configuration
ttl'
0'S Indicate observed values for
/~
et= .05. and
/
D'S indicate observed values for
d. =.01.
~/ 110
2
,..-0
~,..
,/
/
~/
/
(I ......... ~- .
....... .......
""' ......
ttI'/
./
.... ---IS,.... .
J'
/"
V •
/
~
V
V
/
?
.6
~
........ ........
........
0
""".".: ...
3
4
5
6
7
tq
8
9
10
II
12
Figure 6.0. OBSERVED AND EXPECTED .POWER
13
14
15
$
70
distributions for
n = 1
and 2
The reason for this lack of agree-
ment, if it is other than sampling variation, is not apparent at this
time.
6.3
Alternative Approach for small Samples
As an alternative to a conventional test of significance for small
samples the following proposal is offered which would
high speed com;puters.
feasible for
Let us assume the simplest case, viz., 2x2 fac-
torial with no interaction in the model.
dence region for the two main effects.
We will find a joint confiAssume hypothetical values for
f3 effects; using Berkson's tables we can find the cell
the a and
probabilities which correspond to the assumed a
.e
b~
and
f3
The proba-
bility of observing this sample with the cell probabilities implied by
f3
the a and
chosen can be com;puted.
In this way we find a region
inside which the probability of observing the sa.m:ple obtained is greater than .05.
The process is sped
tal axis and
let
f3
=0
tq)
if we imagine a is assigned the horizon-
f3 the vertical axis as illustrated in Figure 6.1.
Then
and vary a until the .05 point is found; then let a and
f3 both be positive and equal, and find the .05 point.
Continue to
select pairs of points until a region is inscribed in the a
plane.
If the inscribed region cuts the a
and if it cuts the
f3 axis we .rejectH :
o
axis we reject
H : a ... 0,
o
f3 .... 0 •
For an illustration, let us consider a sim;ple example with a sample size of 2 and with 1 success in each cell.
95% confidence region for a and f3
We then find the joint
1............/
....
V
-~
.4
'" \
~
.3
V
I
.2
.1
1
I
-.4
\
-:3
-:2
0
-:1
.I
Ot.
-:1
.
\
-;2
\
.2
"~
f3
-
-;4
~
I-
~
.3
J
,,:,3
Figure 6.1.
~
~
l-/
/
/
V
95 % CONFIDENCE REGION FOR
V
ex AND
f:l
72
ex
13
.00
.18
.45
.41
.32
.32
.41
.45
-.18
Probability of Observing
.05
.05
.05
.05
.05
.05
.05
.05
.05
.18
.00
.41
-.32
.32
-.41
-.45
.18
.00
The second half which is symmetric with the above computations inscribes·
a circular region in the
ex
,
13 plane.
In general the confidence
region for two effects will be an ellipse.
the treatment effects are equal.
This one is circular because
Extensions of this idea could be pro-
grammed for high speed computers.
It is obvious from the preceding discussion in this chapter that
the' small sample problem still remains troublesome.
At the present
time ali that can be done is to make recommendations of a sample size
at which the large sample theory holds.
This we have done for the spe-
cial case of a 2:&:2 factorial arrangement with equal numbers of observations in each cell.
There still remains the problem of determining an
adequate sample size for larger eJq>eriments.
This has yet to be cover-
ed adequately for the more conventional chi square tests, not to mention
the tests newly proposed here and elsewhere.
An enormous amount of cOIJ:q)uting time could be eJq>ended in investigati~
the effect of varying the parameters 'Which affect the closeness
of the chi square approxLmation.
to vary .the
n ij
and Pij
To do this thoroughly one would need
configurations, and the degrees of freedom.
Also various combinations of n ij
and Pij
need to be examined.
73
If the problem is- approached empirically, it seems that the best
approach is to develop some critical parameter such that when this.
parameter is of the desired maguitUde, the· chi square. approximation is
adequate regardless of n ij
relations.
or
Pij
configurations or of their inter-
This we have attempted to do for the case described.
Perhaps with the increasing utilization of high speed computing
equipment, small sanwle techniques such as that proposed above or variations of it, will result in more acceptible tests for small samples.
This remains to be seen.
74
LIST OF REFERENCES
J.,
Aitchison,
and S. D• Silvey. 1958. Ma:x:i.mum likelihood estimation
of parameters subject to restraints. Ann. Math. Stat. 29: 813-
828.
Bartlett, M. S. 1935. Contingency table interactions.
the Jour. of the Royal Stat. Soc. 11: 2.48-252.
Supplement to
Berkson, J. 1955. . Maximum likelihood and minimum chi square estimates
of the logistic function. Jour. of the Amer. Stat. Assoc. 50:
--
130-162.
Berkson, J. 1957. Tables for the maximum likelihood estimates of the
logistic function. Biometrics. 13: 28-34.
Bhapkar, v. P. 1959. Contribution to the statistical analysis of
e:x;periments with one or more responses (not necessarily normal).
North Carolina Institute of Stati.stics Mimeograph Series No. 229.
Cochran, H. G. 1940. The analysis of variance when e:x;perimental errors
follow the poisson or binomial laws. Ann. of Math. Stat. 11:
335-347.
Cochran, W. G. 1943.
unequal numbers.
Analysis of variance for percentages based on
Jour. of Amer. Stat. Assoc. 38: 287-301.
Cochran, W. G. 1952.
Math. Stat. 23:
The chi square test of goodness of fit.
Ann. of
315-345.
Cochran.1 w. G. 1954. Some methods for strengthening the connnon chi
square tests. Biometrics. 10: 417-451.
Corsten, L. C. A. 1957. Partitions of e:x;perimental vectors connected
. with multinomial distributions. Biometrics. 13: 451-.484.
Cram~r,
H. 1945. Mathematical methods of statistics.
versity Press, Princeton.
PriJ:!.ceton Uni-
Diamond.1 E. L. 1958. Asym;ptotic power ahd independence of certain
classes of tests on categorical data. North Carolina Institute of
Statistics Mimeograph Series No. 196.
Dyke, G. V., and H. D. Patterson. 1952. Analysis of factorial arrangements when the data are proportions. Biometrics. 8: ~-12.
Eisenhart, C., M. W·. Hastay, and W. A. Wallis. 1947. Techniques of
, sta.tistical analysis. McGraw-Hill Book Co. Inc., New York.
15
LIST OF REFERENCES (continued)
Finney, D. J. 1952. Statistical methods in biological assay.
Publishing Company, New York.
Finney, D. J. 1952.
Cambridge.
Pr6bit analysis.
Hafner
Cambridge University Press,
Fisher, R. A. 1922. On the dominance ratio.
Society of Edinburgh. 42: 321-341.
Proceedings of the Royal
Fisher, R. A. 1954. The analysis of variance with various binomial
transformations. Biometrics. 10: 130-139.
fiX, E. 1949. Tables of noncentral chi square.
nia Publications in Statistics 1: 15-19.
Hodges, J. L.
metrics.
University of Califor-
1958. Fitting the logistic by maximum likelihood.
14: 453-561.
Bio-
Haldane, J. B. S. 1939. The mean and varianc~ of chi square when used
as~a t~st of homogeneity when expectations are small.
Biometrika.
31:
•
346-355.
Irwin, J. O• 1949. A note on the sUbdi~ision of chi square into components. Biometrika. 36: 130-134.
.
Irwin, J. o. 1950. Biological analysis with special reference to biological standards. Jour. of Hygiene. ·48: 215-238.
'.
Kastenbaum, M. A. and D. E. Lam;phiear. 1959. Calculation of chi square
to test the no three factor interaction b.y:pothesis. Biometrics.
15 : 101-115 •
Landcaster, H. o. 1949. The derivation and partition of chi square in
certain discrete distributions. Biometrika. 36: 111-219.
Landcaster, H. o. 1951. COlIq)lex contingency tables treated by the
partition of chi square. Jour. of the Royal stat. Soc. Series B.
13:
242-249.
Mitra, S. K. 1955. Contributions to the statistical analysis of categorical data. North Carolina Institute of Statistics Mimeograph
Series No. 142.
Neyman, J. 1949. Contributions to the theory of chi square tests.
Proceedings of the Berkeley S~osium on Mathematical Statistics
and Probability. 239-213.
LIST OF REFERENCES (continued)
Norton, H. W'. 1945. Calculation of chi squared for cozqplex contigency
tables. Jour. of Amer. Stat. Assoc. 40: 251-258.
peacock, E. E. Jr., B. Brawley and B. G. Greenberg. 1959. Incidence
of leukoplakia among cottonmill ezqployees. Unpublished, Departments of Surgery and Biostatistics, Chapel Hill.
Reiers¢l, Olav.
1954. Tests of linear hypotheses concerning binomial
Skandinavisk Aktuaritidskrif't.· 37: 38-59.
e~eriments.
Roy, S. N., and M. A. Kastenbaum. 1955. A generalization of analysis
of variance and multivariate analysis to data based on frequencies
in qualitative categories or class intervals. North Carolina
Institute of Statisti,cs Mimeograph Series No. 131.
Roy, S. N., and M. A. Kastenbaum. 1956. On the :hypothesis of no
"interaction" in a multiway contingency table. Ann. of Math •. stat.
27:
749-757.
Scarborough, James B. 1958. Numerical mathematical analysis.
Johns Hopkins Press, BaJ.timore.
•
,.,
•
Silvey, S. D. 1959. The lagrangian multiplier test.
Stat. 30: 389-407•
The
Ann. of Math•
Winsor" C. P. 19!J8. Factorial analysis of a multiple dichotomy.
Human Biology. 20: 195-204.