Salama, I.A.; (1974).Contributions to the theory of statistical inference from two-dimensional contingency tables."

CONTRIBUTIONS TO THE THEORY OF STATISTICAL INFERENCE
FROM TWO-DIMENSIONAL CONTINGENCY TABLES
By
Ibrahim A. Salama
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 924
MAY 1974
ABSTRACT
IBRAHIM A. SALAMA. Contributions to the Theory of Statistical Inference
from Two-Dimensional Contingency Tables. (Under the direction
of GARY KOCH and DANA QUADE.)
1) A procedure for constructing exact tests in two-way contingency tables is introduced.
the unknown parameters.
Two methods are suggested to deal with
For the large sample case, an attempt is made
to differentiate between some competing statistics to test for homogeneity by using the theory of Bahadur efficiency.
2) Let X be a random variable with distribution function G(e).
Based on a sample of size n from G, let
~.
e be
an (unbiased) estimator of
Suppose we are interested in estimating f(a).
A procedure to reduce
.~.
the order of the bias of f(a) is introduced.
We show under certain
conditions, that there exists a sequence of functions {f } such that
i
E[fi(S)] = ~(a) + O(n-(i+l»,i - 1,2, •••• The second order efficiency
is used to differentiate between the members of {f } and f(a).
i
applications, in different contexts, are considered.
3) Let X ~ B(n,a),
Y~
Some
B(n,a), where X,Y are independent.
methods to estimate a are considered.
Three
Their behavior in the small
sample case is considered using expected values, variances, mean square
A
error, and Pitman's measure of closeness (p{la- al ~ £}).
numerical results ate shown.
Some exact
ACKNOWLEDGMENTS
I wish to express my .incer. aratitude to Dr. Dana Qusde and Dr.
Gary Koch (my co-advisers) for their immense help in the preparation of
this study.
The patient guidance, encouragement and support of Dr. Koch
during my study and research were invaluable.
I would like to thank him
for directing my attention to some research areas.
The applications
carried out in Chapter IV were suggested by Dr. Koch (among some other
applications which do not appear here).
My deepest gratitude is due to
Dr. Quade for the amount of time and thought he gave to this work.
critical reading and helpful suggestions were invaluable to me.
His
It was
through his stimulating discussions and critical comments, most parts
of this dissertation took its present shape.
I am greatly indebted to Dr. P. K. Sen and Dr. R. R. Kuebler for
·their help in many ways.
I wish to thank Dr. Sen for his helpful comments
in establishing some of the results in this work and for the knowledge
I gained simply by working close to him.
I would like to thank Dr.
Kuebler for his encouragement during the early stages of my study and
for his helpful comments during the research.
I would like further to
extend my thanks to Dr. R. J. Cannon for the encouragement I received
during my study and for his participation in my doctoral committee.
lowe a very special debt of gratitude to Dr. A. E. Sarhan, Dean of
the Institute of Statistical Studies and Research at Cairo University for
his support and encouragement.
His personal advice, help and confidence
were the most valuable and encouraging things to me during my study at
Cairo University and here.
Finally, thankful appreciation is due to Mrs. Gay Hinnant for her
excellent work in typing the manuscript.
•
CONTRIBUTIONS TO THE THEORY OF STATISTICAL INFERENCE
FROH TI.JO-DU1ENSIONAL CONTINGENCY TABLES
by
Ibrahim A. Salama
A dissertation submitted to the faculty
of the Univer1'lity of North Caroli'18 at
Chapel Hill in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy in the Department
of Biostatistics
Chapel Hill
1974
Approved by:
Co-adviser
•
TABLE OF CONTENTS
Page
........................
LIST OF TABLES AND GRAPHS. • . . . . . . . . . . . . . . . . . . .
ACKNOWLEDGMENTS. •
ii
v
Chapter
I. INTRODUCTION.
· .. ......... .........
1. Estimation and hypothesis testing for
categorical data • • • •
• • • • • • • • • • • • •
2. Testing for homogeneity in two dimensional
contingency tables • • • ~ • • • • • • • • • • • •
3. The minimum chi-square statistic (TM) to test
for homogeneity in rxc tables. • • • • '. • • • • • • •
4. Some relationships among the statistics • • • • • • • • •
5. Summary and outline. • • • • • • • • • • •
• • • • •
·.....
1. Introduction • • • • • • • • • • • • • • • • • · . . . .
exact test for homogeneity in 2x2 tables •• · . . . .
2.
II. SOME EXACT TESTS IN TWO-WAY CONTINGENCY TABLES. ".
An
2.1 Ordering the sample space
•••••••• • •
2.2 Assigning a level of significance (the
problem of the unknown parameter) • • • • • • •
2.3
Example. •
e·
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
.. .. .. .. · . . . · .
• • • • · .
4.1 Tables. • • •
. . .. . . . . . . . . .
5. Summary. • •• • • . . . . . . . . • • • • • • · . • • •
III. ON THE BAHADUR EFFICIENCY OF CERTAIN TESTS. . . . · . . . · .
3. An exact test for rxc tables • • • •
4. Numerical comparison • • • • • • • •
1.
2.
3.
4.
Introduction • • • • • • • • • •
The theory of BAHADUR efficiency
Applications • • • • • • • • • •
Rem.arks. • • • • • • •.• • • • •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
1
1·
8
12
17
21
23
23
26
27
34
38
43
47
54
61
62
62
63
67
80
•
d.J
Page
IV. A PROCEDURE FOR REDUCING THE ORDER OF BIAS OF ESTIMATORS.
1. Introduction • • • • • • • • • • • • • • • • • • • • ••
2. On reducing the order of the bias. • • • • • • • • • ••
3. Applications • • • • • • • • • • • • • • • • • • • • ••
83
83
92
3.1 On estimating the density of bacteria
in a solution • • • • • • • • • • • • • • • • •
3.2 A note on the logit x2-statistic • • • •
92
104
·....
..• • • • •
1. Introduction • •
..
· ....... ....
..........
2. Theorem (5.1) ••• . . . . . .
3.
3.1 The normal case •••
·· .. .. .. .. .. .. .. .. .. .. ..
3.2 Other distributions •
NUMERICAL RESULTS FOR SMALL SAMPLES . . . . . . . • • • . . .
V. GENERALIZATION OF THE BIAS-REDUCTION PROCEDURE.
VI.
83
109
109
109
115
123
126
1. Introduction • • • • • • • • • • • A • A • • .A. • • •.
2. On some aspects of the behavior of aRt aM an4 eN
in the small sample case • • • • • • • •.• • • • • ••
126
2.1 The expected values • • • • • • • • • • • • • • •
2.2 The variances and mean square errors •••
2.3 Pitman's measure of closeness • • • • • •
127
133
143
.. . ...........•
• • .. • • .. . • • • • • • •
·· .. .. ..
..........
VII. SUMMARY •
•
REFERENCES •
'0 • • • • • • • • • •
iv
127
146
151
.......
•
LIST OF TABLES AND GRAPHS
Page
Table
2.1
2.2
2.3
The order induced by the statistics T and T
M
p
on the set F • • • • •
8
·.............
The exact distribution of the statistics Tp ' TN
and TM for n=m-7 • • • •
·...... .......
2.5
40
The level of significance assigned to tables in
41
F using different methods. • • • • • • • •
7
2.4
33
Fifteen successive functions Lp(xi,yi,P) for
n-m-17. . . .' . . • . • . • • • . . . . • .
·...·.
Four successive functions Lp(xi,yi,P) with the
correspondin"g" Up' ~ and ~, for n=m=7.
• • •
·
·• • ·
Some functions L (xi,Yi,P) with the correspondirig
.
p
H and ~ for n-m-17. •
• •
• • •
P
2.7a The first moment of Tp and T for different
M
.sample sizes.
• • •
• • • •
•
• • •
42
49
2.6
··
·····
··
2.7b
2.8
The variance of T and
p
sample sizes'. • •
~
·· ···
50
·
52
•
53
for different
..·• • ·• • ···• • ··• ·•
T and its level of significance using the
p
maximum likelihood estimate of
2.9
·
~
·.
56
·..
59
for n - 2,3, ••• ,16
Critical values of T using the maximum likelihood
p
estimate of ~ • • • • • • • • • • • • • • • • • •
2.10 The indefinite set I and its maximum probability
3.1
using T assuming ~ £ [.05,.95] • • • • • • • • • • • •
P
.Some values of the BAHADUR asymptotic efficiency
function for T and TN. • • •
•
• •
• • • • •
•
p
· · ·
·
60
82
•
)'
\
Table
4.1
Expected values of
Bias(A )
,.1
~, ~1
and
~2
101
Variances of A, 1. and 1. for different
1
2
values of A • • • • • • • • • • • • • • • • • • • • • •
102
M~an
103
,..
A
,..
,..
4.4
x 100 for
..............
different values of A • • •
4.3
with
Bias (2)
,..
Bias(A)
x 100 and
Bias(A)
4.2
Page
,.
,.,.
,..
square error of A, 1.
and 1. • • • • • • • • • • ••
1
2
Examples of the effect of using the modified
logit x2~statistic. • • • • • • • • • • • •
4.5
Further examples of the effect of using the
modified1ogit·x 2-statistic • • • • • • •
......
.. .. •
106
• •
108
6.1
Expected values of eM and eN' and the maximum bias. • ••
131
6.2
Variance x 2n of eM and eN. • • • • • • • • • • • • • ••
136
6.3
Mean square error x 2n of eM and eN • • • • • • • • • ••
138
6.4
The intervals in which the minimum
,.
,.
,. .
,.
,.
,.
chi~square
estimate has a smaller variance and mean square
error than the maximum likelihood estimate. • • • • • •
6.5
140
The intervals in which the maximum likelihood
estimate has a smaller variance and mean square
error than the minimum Neyman chi-square estimate • • •
6.6
141
The intervals in which the minimum chi-square estimate
has a.amaller variance and mean square error than
the minimum Neyman
chi-squa~e
vi
estimate. • • • • • • ••
142
CHAPTER 1
INTRODUCTION
1. Estimation and hypothesis testing for· categorical data
Suppose that we have r populations each divided into c
categories.
Let
~ij
be the probability that an observation drawn at
random from the i-th population belongs to the j-th category.
sample of size n
i
is drawn from each population.
A
Let X be the
ij
observed frequency in the j-th category of the i-th sample, i • l, ••• ,r;
The frequencies x ij can be repr~sented formally in a
contingency table. An appropriate probability model ~, for the
j - l, ••• ,c.
observed frequencies x ij ' is given by:
r
nil
n i nx
i-l j ij
c
I
n
j-l
xij
~ij };
Note that i,j may be multiple subscripts.
(1.1.1)
Barnard (1947) pointed out
that the same contingency table (two dimensional) could arise under
three different models.
For further information about types of models
in general contingency tables, see
Imr~y
and Koch (1972, pp. 6-10).
Hypothesis testing concerning contingency tables is normally
carried out in two steps:
1) providing point estimates for the unknown
parameters involved in the underlying probability model, 2) producing a
test statistic for the given null hypothesis.
section maintains this logical order.
The discussion of this
•
2
Estimation:
For the probability model given by (1.1.1) suppose that the
following (regularity) conditions hold:
(i) ~ij
= fij(~);
~' • (al,.··,e t ) where ~ £ 0 is a non-
degenerate interval in t-dimensional Euclidean space •
. (ii) f ij (~)
~ 0, L
j
f ij (a) • 1.
(iii) The true parameter point
(iv) The functions
fij(~)
~O £
int (0).
possess continuous partial deriva-
tives up to the second order with respect to ei,s
for all 8 £ n.
-
(v) The parameters 8k 's are independent, i.e., the rank of
the rcXt matrix
~
f
.
(e)j
i •
k
l, ••• ,c
k • l, ••• ,t
ai1e -
. -
is t for all a
1,.~.,r
j •
£
0, and t
~
rc - r.
x
ij
Let N be· the total sample size, i.e., N • t n , qij • n
and Qi
i
i
i
Several methods of estimation could be used to provide
estimates for the unknown parameters, 8 's, involved in (1.1.1).
i
Three
of them are presented here.
l~
A maximum likelihood estimate,
the function
~,
~R'
of
~
is that value
~ £
0 at which
attains its supremum.
2. A minimum X2 estimate, ~M' of ~ is that value ~
2
function X attains its ~nfimum. where
£
n'at which the
(1.1.2)
3
3. A minimum Neyman X2 estimate, ~N' of ~ is that value of ~ e
that the function
n such
X; attains its infimum. where
nifij(~»2
(x ij -
xi = i.jL
•
x
(1.1.3)
ij
It turns out that these three methods lead to estimates of ...6
which are members of one class:
the best asymptotic normal (BAN)
estimates, defined by Neyman (1949) as follows.
Definition
A function 6 of the random variable qij which does not depend
i
directly on N is called a BAN estimate of the parameter 6i if it
•
satisfies the following conditions.
A
(a) 6i is a consistent estimate of 6 • Le •• 6i
i
P
-+
6i •
A
(b). As N -+
ClO.
the distribution of 6 tends to be normal.
i
N(6i.ai/~).
where a i is a fixed constant
independent of
N.
A
(c)
aSk
aexists and
qij
is continuous for every i.j and k.
A
(d} If 6{ is any other function
s~tisfying
a-c with a taking
the place of ai' then
ai
A
The fact that
A
~R' ~M
~
a.
A
and
~N
are BAN estimates for
~O
is stated
in the following theorem by Neyman (1949).
Theorem· (1.1)
Under the conditions i-v. the systems of equations
aa108
ek
~ • O. k • 1
t
••••• •
(1.1.4)
•
4
ax2
~.
aSk
0, k • l, •••• t.
(l.l.S)
aoo
a~
---. 0, k
=
aSk
(where it is assumed that all x
is a BAN estimate of
l, •••• t,
~
ij
(1.1.6)
0) each possess a solution which
~O.
At this point we see that the aforementioned three methods of
estimation are asymptotically equivalent in the sense that they lead
to BAN estimates of the
unknown parameter 8. Rao (1963 1965)
.
~,
~onsidered
another criterion, the second order efficiency, to distinguish between
estimation procedures.
According to that, he showed that the ,maximum
likelihood estimates are superior.
As far as computations are con-
cerned, the equations giving the maximum likelihood or the minimum
X2 estimates are sometimes fairly complicated and an iterative procedure must be used to obtain the solution.
On the other hand, the
minimum Neyman X2 method has an advantage in this respect.
functions fij's are linear
in~,
a system of linear equa'tions.
(1949) introduced
If the
then the system (1.1.6) is reduced to
Even if the fij's are not linear, Neyman
a linearization
technique to generate BAN estimates
using the third method mentioned earlier.
The result is stated 'in the
following theorem (Neyman (1949».
Theorem (1.2)
If a system of equations «1.1.4) - (1.1.6»
subject to
Wij • fij(~) leads to BAN estimates of Wij's, then the corresponding
system obtained subject to the linearized constraints
s
•
(1.1.7)
1 • 1,2, ••• ,rc-r-t
leads to BAN estimates of
~ij's.
So, the minimum Neyman X2 equations obtained subject to
(1.1.7) lead to BAN estimates of
~ij'S,
they are linear in
~ij'S
and
in general easier to solve.
Before we conclude this part about estimation we note:
1. The aforementioned three methods of estimations are not the only
methods which lead to BAN estimates.
For example see Taylor (1953).
2. Almost nothing is known about the properties of these methods in
the small sample case, except for some numerical studies (see Berkson
(1955) and Odoroff (1970».
Testing hypotheses.
Consider the probability model
(1.~.1).
Let H
o
be a hypothesis concerning the parameters involved in that model,
defined by:
(1.1.8)
The following theorem prOVides us with three test
s~atistics
for
Ho (see Neyman (1949».
Theorem (1.3)
,.
Let
e be
,.
a BAN estimate for
~O
and let
~ij
,.
•
fij(~).
Define
(l.1.9)
•
6
"
(xii - n i
x
n
1T
i
2
1Tij )
,
(1.1.10)
ij
and
~.
1:
i,j
(x ij - n i
x
"
1T
i
,)
2
(assuming x
ij
ij
~
0)
(1.1.11)
If Ho is true, then as N + ~ and Q's are held fixed, the random
2
2
variables -2 log A, X , and
have.x 2 (rc-r-t) as their asymptotic
XN
distribution.
In some situations it is of interest to test for H 1 c H or
o
for H02 • Ho n Ho1 '
0
For the test statistics for Ho1 and H02 ' cor-
responding to the statistics given by Theorem (1.3), see Neyman (1949).
Another approach to testing hypotheses concerning contingency
tables is to adapt to the categorical data set-up the statistic given
in the following theorem of Wa1d.
Theorem (1.4) (Wa1d. 1943)
Let Xi' i • 1,.~.,N, be a set of independent and identically
..
distributed random variables with joint probability function
W(x,a),
.
.
"
a' - (el, ••• ,e~) and ~ € O. Let e be the maximum li~elihood estimate
of
~
and Ho be the hypothesis that
~ €
Wc
l - l , ••• ,u
0 which is defined by
(u < t) •
Write
Assume that:
(i) Wpossesses continuous partial derivatives with respect
to
~
up to the second order.
7
t E~ [~2e~O! :jJ] i,j·
(ii) The matrix
!(~)
•
i
•
1, ... ,t
is positive definite.
(iii) The matrix
B(a)...
- -
~
FaR,a-(8)~
i
R,
= l, ... ,u
i
= l, ••• ,t
is of rank u.
Then under B as N +
o
w=
~
the statistic
+.(e~[H(~) B-1(~) H,(~)C1[h(~)J
has X2 (u) as its limiting distribution.
In the case of testing a "linear hypothesis," of the form
F(~)
... 0, it has been shown that
IN2
and Wa1d's statistic are alge-
braically identical (see Bhapkar (1966».
This algebraic identity is
preserved when the hypothesis to be tested is nonlinear but Neyman
linearization is used to obtain the statistic ~.
If·the hypothesis is such that linear functions of
defined in terms of linear functions of
-a,
~ij'S
are
-
where 8 is a vector of
unknown parameters, then the generalized least squares technique provides us a handy method for obtaining the
IN2
statistic (or Wa1d's
statisti'c).
Theorem (1.5) (Bhapkar. 1961)
Let a linear hypothesis be defined by
c
Bo :
j:1 a j ~ij • d il 81 + d i2 82 + ... + dit
·a t , i • l, ••• ,r
•
8
where the di's and ai's are known constants and the a's are unknown
parameters.
Then the
XN2
statistic for testing Ho is the same as the
minimum sum of squares of residuals obtained by the general least
c
square technique on
variances."
L a qij with the variance estimated by "samplei
j=l
2
The statistic
has x2 (r-v) as its limiting distribution,
IN
where v is the rank of the rXt matrix
~
= [d ik ].
For more general cases, where functions of
linear are defined in terms of linear functions of
squares technique can be used to. obtain
Bhapkar and Koch (1965).
2
IN,
~
not necessarily
-a,
and the least
see Bhapkar (1966) and
This technique is used extensively by
Grizzle, Starmer and Koch (1969).
The test criteria described in this section
~re:
1) Consistent
for testing Ho and 2) Asymptotically equivalent in the sense that the
probability of contradicting each other tends to zcro as N ~
~
for
every admissible hypothesis (see Neyman (1949) and Wald (1943».
2. Testing for homogeneity in two dimensional contingency tables·
Given an rxc contingency table with the probability model given
by (1.1.1), let Ho be the hypothesis of homogeneity of the r-populations, where
i •
l, ••• ,r
Several statistics could be used to test for H.
o
(1.2.1)
First, the
well known Pearson's statistic (denoted here by T) which ishistori.
p
.
.
cally the oldest, having been introduced first by Karl Pearson (1900).
The form of this statistic is given by
9
;..
T •
L
P
i,j
(x ij -XNi
n
;..
where the
~j'S
~j
parameters
~j)
2
(1.2.2)
~j
i
are the maximum likelihood estimates of the unknown
(obtained under Ho ).
x2 «r-l)(c-l»,
see Cramer (1946).
•
The asymptotic distribution of T is
p
It has been shown (under certain
conditions) that Tp has a noncentral chi-square distribution (see
Mitra (1958) and Diamond (1963», under certain local alternatives.
For the case of 2x2 tables we adopt the following notations:
nand m are the (fixed) sample sizes from the first and second population, respectively.
x and yare the observed frequencies in the first
category of the first and second population, respectively.
y
~ B(m'~2)
and x,y are independent.
o is
H
~l
-
x
~ B(n'~l)'
~2.
For this case the statistic T assumes the form
p
Tp(X,y) •
(n+m)(mx-ny)2
om [ (n-ku) (x+y) -, 2xy- (x
(1.2.3)
2 2
+y )]
Another approach for testing Ho is to use the likelihood ratio
test.
Consider the probability
mode~.~,given by
the parameter space and let w(cO) satisfy (1.2.1).
~w
- sup
w
~
and
~o
•
(1.1.1).
Let 0 be
Let
sup ~.•
n
Then the likelihood ratio for testing H is
o
(1.2.4)
such that mj • L x ij •
c
•
10
Wilks (1935) shOWed that the limiting distribution of' -2 log
(denoted here by TR) is
~
x2 «r-1) (c-1».
Hoeffding (1965) established "if a given test of size
~
is
sufficiently different from the likelihood ratio test, then there is a
likelihood ratio test of size
~ ~
which is considerably more powerful
than the given test at 'most' points p in the set of alternatives when
N is large enough, provided that
~ ~
0 at a suitable rate.
In particu-
lar, it is shown that chi-square tests of simple and of some composite
hypotheses are inferior, in the sense described, to the corresponding
likelihood ratio test."
On the other hand, Hoeffding (1965, p. 394)
showed that this theory does not apply if we are testing for independence in contingency tables.
Generally speaking, if nothing is definitely known about which
procedure is better, the common practice (in a very general context) is
to use the likelihood ratio test or the maximum likelihood method for
estimation.
(In contingency tables, the common
practice is to use Tp ).
.
For the general properties of the likelihood ratio test, see Wilks
(1938) and Wald (194la, 1941b, 1943).
In the case of 2x2 tables, we have
x-y)n m
nm
n+m x y
n-x
m-y
(n+m)
x y (n-x)
, (m-y)
x+y
(m)
TR(x,y) • -2 log
(n+m (n+m - x-y)
(1.2.5)
The third possible statistic to use in testing for Ho is the
Neyman chi-square statistic (denoted here by TN)' which is defined as
~ 2
'II'j)
TN •
,
(1.2.6)
11
~
where nj's are the minimum Neyman chi-square estimates (under Ho ) of the
unknown parameters nj's.
This statistic was introduced by Neyman (1929), its limiting
properties are shown by Neyman (1949) and it has
x2 «r-l) (c-l»
•
as its
limiting distribution.
An interesting property possessed by TN is stated in the
following theorem.
Theorem (1.6) (Goodman, 1964)
The test of homogeneity based on TN
will reject H0 if at least
.
"
""
one estimated contrast (i.e. ~i,i',j • nij - ni'j' i
cantly different from zero.
+ i')
is signifi-
In the case of 2x2 tables, we have
2
nm(mx-ny)
TN(x,y). 3
2
3
2'
m (nx-x ) + n (my-y )
(1.2.7)
Theorem (1. 6) could be utilized to produce another statistic to
"
test for Ho ' Let the nj's be the maximum likelihood estimate 'of the
unknown parameters nj's.
Define the statistic Tc by
" 2
T = L (xij-ni ~j)
c
i,j
x ij
(1.2.8)
From the results of Section 1, we can easily see that Tc is
consistent, asymptotically eqUivalent to the other test criteria (mentioned earlier) and it has
x2 «r-l)(c-l»
as its limiting distribution.
The reason we introduced this statistic here is to use it in the stochastic comparison between test statistics for Ho ' carried out in the
third chapter.
In the case of 2x2 tables we have
••
12
T (x,y) _ jmx(n-x)2+ ny(m-y)] (mx-ny)
c
(n+m) xy(n-x)(~y)
2
(1.2.9)
An interesting example in which the proposed test criteria give
inconsistent results is given here, that obtained by adapting the logit
chi-square statistic presented by Woolf (1955) and extended by P1ackett
(1962) and Goodman (1963) (mainly for testing interaction in higher
dimensional tables) to test for homogeneity in 2x2 contingency tables.
The form of that statistic (denoted here by T ) for 2x2 tables is given
L
by
_
TL(x,y) -
xy (n-x) (m-y) [log ...L - lo~]
n-x
[(mx(n-x) + ny(m-y)]
m-y
2
;0
,x,y •
(1. 2 .10)
2
The statistic hasX (1) as its limiting distribution.
Finally we derive the minimum chi-square statistic to test for
homogeneity in rxc tables.
3. The minimum chi-square statistic (TM) to test for homogeneity in
rxc tables
Consider the familiar x2 expression
(1.3.1)
Under H , the expression (1.3.1) becomes
0
2
Xo -
r
c (xij - ni
1:
1:
n 'lr
i j
i-I j-l
'lT )
j
2
,
c
1: 'lr j - 1.
j-l
. (1.3.2)
The minimum chi-square estimates 'lrj'S of the unknown parameters
...
'lrj'S are defined to be those values 'lrj'S which minimize (1.3.2).
A straightforward well known technique is used to derive them.
Consider the derivatives of X2 with respect to 'lr ,
o
j
j - 1,2, ••• ,c. We have
13
•
(1.3.3)
It is clear that the values
XO2 are
~j'
j • 1,2, ••• ,c, which minimize
the solutions of the system:
a~
a~
j . l, ••• ,c
0,
c
j
(1.3.4)
c
1::
j-l
~
j
• 1
To solve (1.3.4), set
Then we get
r
1::
i-I
(1.3.5)
Simplifying (1.3.5) we get
r
2
~j+l
r
1::
i-I
2,
2
Xij - ~j+l
~
I.-
k~l llc
'i-I' k"i
•
14
r
2
11"
=
n
r
L
j 1=1
nk
k-1
k"1
(1.3.6)
Noting that
2
r
1I"j+1
L
1=1
... 11"
2 r
L
j i-I
Then, from (1.3.6) we get"
"'2
~2
1I"j+1
-
( n 01)
1=1
"
k;1
[~ [k~1 Uk] X~,j+l]
1
r
( n n1)
i-I
i-I
-
[~1 k~1 ~ x~]
r
1
r
k;1
r
2
L Xii
i-I n
1
(1.3.7)
r
2
L X1 ,j+l
i-I
n
1
Hence
~
'JI'j+l
- ~~1
~!1
2 T/2
:u..
n
1
2 1/2
x 1 ,1+1~
n
1
(1.3.8)
15
... ...
We note that (1.3.8) holds for
This implies that
;j
~j/~1
such that 1 • 1,2, ••• ,c and 1
~ j.
takes the following form:
~ j
r
...
1
~j - G
x2
l/2
r.:!L
i=l n
where G is independent of j.
•
(1.3.9)
,
i
Let G be defined as
j
(1.3.10)
Gj -
Then,
But since the
~j'S
must satisfy the condition
c
r
j-l
~j
- 1, it follows that
(1.3.11)
Thus we have the following results.
Lemma (1.1)
Consider rxc contingency table with entries xij ' i - l, ••• ,r;
j - l, ••• ,c, where ni is the sample size from the i-th population. Let
H be the hypothesis of homogeneity of the r populations.
o
"
...
Then, under
Ho ' the minimum chi-square estimate, ~j'S of the unknown parameter ~j ,
is given by
where
G
j
-
2 1/2
ij
x .] and G
i-I n i
[~
~ ~
j-l
G •
j
We note that the preceding lemma corrects the following statement of
~
r n i Pij
x ij
Goodman (1964, p.' 720): "The values of Pj • ( r
N ' Pij ~ n ~
i-I
i
•
16
2
are obtained by minimizing X
K
(
. . (x ij - n i Pj)
~
i,j
-
2
-)
n i Pj
subject to the
c
condition that
~
j=l
P
- 1."
j
Lemma (1. 2)
With the conditions and notations of Lemma (1.1), the minimum
chi-square statistic, T , to test for homogeneity in rxc contingency
M
tables is given by
2
TM • G - N,
where
Proof:
Substituting n for n in expression (1.3.2) we get
j
j
Lemma (1.3)
1. The estimates defined by Lemma (1.1) are BAN estimates
for the nj's.
2. The statistic T has x2 «r-1) (c-1» as its limiting distriM
bution. It is co~sistent (for Ho ) and asymptotically equivalent to Tp '
TH, TN and Tc •
Lemma (1.3) is
~
direct consequence of the results of Neyman
and Wald mentioned in Section 1.
17
The statistic TM has the following nonstochastic properties:
1. By the definition of T we have
M
2. The 'statistic T
M
•
is well defined for every point in the sample space,
so that the existence of zero frequencies or zero expectations does not
disturb the computations.
That gives a good advantage to T especially
M
in the small sample case where zero expectations and zero frequencies
are more likely to occur.
Also it is clear that T is easier in
M
computations than the corresponding T or the T •
R
p
4. Some relationships among the 'statistics
Tn this section, some relationships among the statistics
mentioned earlier to test .;"or homogeneity are introduced.
'Relation (1.1)
For rxc tables, we have
(1) T < T ,
Mp
(i1) ~ ~ TN'
(iii) TM ~ Tc •
Proof:
Holds by the definition of T •
M
Relation (1. 2)
Consider a 2xc contingency table with nl • n 2 • n.
< T < (2 log 2) T
P - RP
T
Then
•
18
Proof:
See Margolin and Light (1973).
Relation
(1. 3)
To test for homogeneity in 2x2 tables with n
= m,
we have
Hence
(ii) 1) TM(x,y)
D
Tp(x,y) if x = y or x + y
~
n,
2) Tp(X,y) • TN(x,y) iff x • y,
and
3) TN(x,y) • Tc(x,y) iff x
D
y or x + y • n.
Proof:
(i) Write
2
•
2n(x-y)
TN (x,y) 2[x(n-x) + y(n-y)]
2
. . :::;;2n;';'(lo,;;x:;"-""YI-)~_ _--=-
•
.
2n(x+y) - (x+y)
2
- (x-y)
2
that gives
111
.--TN T
2n
p
and (i) holds.
(ii) 1) by direct .substitution and 2)
dir~ct1y
2
• n(x-y) [x(n-x) + y(n-y)1
4xy(n-x) (n-y)
• t
(x ) Jx(n-x) + y{n_y)l2
N ,y 4xy(n-x) (n-y)
from (i).
19
Consider
•
[x (n-x)· + y(n_y)]2 - 4xy(n-x) (n-y)
• [x(n-x) _ y(n_y)]2
• (x_y)2 [n _ (x+y)]2,
which equals
zero iff x .. y or x + y • n.
Another type of result concerning estimators is presented here.
Consider 2x2 contingency table with n
= m.
Let e , eM and eN be the
R
maximum likelihood, the minimum chi-square and the minimum Neyman chisquare estimates (under the hypothesis of homogeneity) of the unknown
parameter n, respectively.
6R a~
2n
Thus we have
'
and
6•
men-x) + (n-v)]
n[x(n-x) + y(n-y)] •
N
Under these conditions we have
Lemma (1.4)
A-
A-
A-
(i) eN • e R • eM·· ~ if x ... y or x+y
A-
.A-
A-
(ii) eN < ea < eM < 1/2 if 0 < x+y < n, x
,..
A
n,
=
A
~
y,
.
(iii) eN > e a > eM > 1/2 if n < x+y < 2n x
~
y.
Proof:
(i) By direct substitution.
(ii) and (i1i):
•
20
Consider
eR _e.
(X_V)2 [n-x-y]
N 2n[x(n-x) + y(n-y)]
that gives
,..
,..
eN < eR
,..
,..
eN > eR
,
if
0 < x+y < n, x ; y,
if
n < x+y < 2n, x ; y.
Now, consider
2 2 1/2
.
2
2 1/2
(2n-x-y){x ty )
- (xty) [{n-x} + {n-y} ]
(~}{x2+y2}1/2
Let
222
A2 • (2n-x-y) (x ty )
{x+y}2 [(n_x}2 + (n_y)2]
222
222
.
• (x +y ) [{n-x} + (n-y) ] + (x +y ) [2(n-x)(n-y)]
•
(x2+y2) [(n-x) 2 + {n_y)2] + 2xy[(n_x)2 + (n_y)2]
Let
B
= (x 2+y 2) [2 (n-x)(n-y)]
2
2
- 2xy[(n-x). + (n-y) ]
2
• 2n(x-y) [n-x-y].
Now, if x
~
y then
o< ~
< n ... B > 0 -.A > 1
and
n <
~
< 2n -+ B < 0 ... A < 1
That proves the lemma.
21
This result is utilized later in a study about the behavior of
5. Summary and Outline
•
In Sections 2 and 3, several chi-square statistics (Tp ' TN' TR,
T and T ) were introduced to test for homogeneity in rxc tables.
M
c
of these statistics have the same limiting distribution.
are consistent (for H ) and asymptotically equivalent.
o
arise here:
All
Also, they
Two
problems
First, what shall we do when the sample' size is small, and
hence the asymptotic theory is questionable?
In Chapter II, this
problem is treated from a special point of view.
An exact test for
homogeneity in 2x2 tables is introduced with a generalization to the
rxc case.
The approach used in constructing this exact test makes uS
able to compare the behavior of the exact distribution of some statistics with the corresponding theoretical X2 distribution.
Second, if
the sample size is large enough,which one of these statistics shall
we 'use?' In Chapter III, an attempt is made to answer that by applying
the theory of Bahadur efficiency (Bahadur (1960»
to these statistics
in some cases.
The second part of this work deals with some problems in estimation.
In Chapter IV, a general method to reduce the order of the
bias is introduced.
problems:
An application of that is carried out in two
I) adjusting the statistic T , 2) the problem of estimating
L
the "most probable number" discussed by Koch and Tolley (1973).
Further
extension of this procedure is discussed in Chapter V.
For 2x2 tables, a comparison between three methods of estimation
(to estimate the unknown parameter
~
under the homogeneity hypothesis)
•
22
is carried out in Chapter VI.
entire thesis.
Chapter VII contains a summary of the
•
CHAPTER II
SOME EXACT TESTS IN TWO-WAY CONTINGENCY TABLES
1. Introduction
To test hypotheses concerning contingency tables, we usually
use a test statistic T whose asymptotic distribution is X2 with suitable
degrees of freedom.
The test is approximately valid for finite samples
and the approximation is expected to be close for large samples.
The
question that arises naturally is what to do when the sample sizes
and/or expectations are small.
Several authors have treated this problem in different contexts:
see Fisher (1934), Yates (1934), Cochran (1936,52,54), Haldane· (1937,
39), Welch (1938), Wise (1963,64), Freeman and Halton (1951), Barnard
(1947), Pearson (1947), Tocher (1950), Maxwell (1961), Odoroff (1970),
MOhberg (1972), and Margolin and Light (1973).
The work that has been done on this point can be outlined as
follows:
1. Suggesting rules of thumb concerning the minimum expected
frequencies, or the minimum expected frequencies in conjunction with
the number of degrees of freedom.
2. Presenting a modified version of T on the basis that the
distribution of this modified version is better approximated by the'
corresponding X2 distribution.
3. Comparing the exact distribution of the statistic T (in
•
?
24
some special cases) with the appropriate X2 distribution.
This method
is also used to differentiate between two competitive statistics.
4. Establishing an exact test.
One class of exact tests (the
conditional exact tests) is based on the assumption that all the margins
of the contingency table are fixed.
The other class does not impose
this restriction.
The purpose of this chapter is to present an exact test for
homogeneity in 2x2 contingency tables.
considered too.
The extension to rxc tables is
As far as the physical concept of a contingency table
is concerned, we adopt the philosophy presented by Barnard (1947),
Pearson (1947), and independently hy Roy and Mitra (1956).
Barnard (1947) pointed out that a 2x2 contingency table could
be the outcome of three different sampling schemes:
1. A random sample of size N is drawn from one population and
the individuals of this sample are classified into one
classes of the 2x2 table.
~fthe
four
(Here the total sample size, N, is fixed).
2. Two random samples of size n and m are drawn from the first
and second populations, respectively.
The members of the sample drawn
from each population are classified into one of the two categories of
response.
(Bere one margin of the table is fixed).
3. The third conceivable situation is to consider both margins
of the contingency table to be fixed in advance.
is Fisher's tea-tasting
experimen~
An example for that
(1935).
In the course of our analysis of the experimental results, we
should keep clearly in mind the conditions under which the ,experiment
has been done.
Any hypothetical change in the real situation leads
to a different probability space for which the results of the analysis
25
will not be accurate.
So, the conditional approach to analyze the
results of experiments arising from the first and the second sampling
schemes mentioned above is not considered here, simply because we do
•
not want to impose restrictions which were not part of the original
experiment.
Thus, Fisher's exact test (1935) is not used or recommended
in our situation.
Roy and Mitra mentioned that they abandoned the con-
ditional approach for mathematical reasons [see Roy and Mitra (1956,
p. 362)].
The tests presented here are to some extent based on the ideas
explored earlier by Barnard (1947).
The approach we are using will
give two kinds of results:
1. An exact test for homogeneity in 2x2 tables.
2. A method for comparing the exact distribution of some of
the statistics used to test for homogeneity in 2x2 tables with X2 (1).
Now we formulate the
probl~~
and establish the notations.
Consider the 2x2 contingency table in the homogeneity situation,
i.e., we have two populations and two categories of response.
Let n
and m be the sample sizes from the first and second populations,
respectively; we assume that nand m are fixed.
Let
~l
and
~2
be the
probability that an observation from the first and second population, .
Let x,y be the observed
respectively, falls into the first category.
frequency in the first category of the first and the second population,
respectively.
So, the 2x2 table is given by:
\ cl
c2
pop. 1
x
~x
n
pop. 2
y
~y
m
~
n+m-x-y
•
26
B(nln ) and y ~ B(m n ) and they are independent,
l
l 2
the probability of observing the above table is given by:
Since x
~
The hypothesis of homogeneity is given by:
B :
o
Under Bo ' the probability of observing the above table becomes:
P ( x,y,n ) .. ( nx) (my) nx+y (l~n ) n-kD-x-y •
Note that:
P(x,y,n) • P(n-x,m-y,l-n).
Now we want to test R •
o
The following notations are used here:
T • any . test statistic for B0 (larger values are more
significant).
W~
used this notation already in
Chapter I.
L(x,y,n).
1:
R(x,y)
L*(x,,) • max L(x,y,n).
n
2. An exact test for homogeneity in 2x2 tables
Two main steps are
(or any other test).
i~vo1ved
in defining the proposed exact test
First, the set of all possible results,
totally ordered in a suitable way.
Fn,m,is
This ordering reflects our idea
about H (the alternative hypothesis), ,or in Barnard's terminology,
1
27
the ordering device is a reflection of our idea about what we mean by
the "width of difference" (between 11"1 and 11"2)' see Barnard (1947,
pp. 129-130).
Second, we assign a level of significance to each
element in Fn,m , which is. now a totally ordered set.
•
2.1. Ordering the sample space
Consider the set F
• We are looking for a test statistic T
n,m
such that the order induced by T on F
will reflect our idea about
n,m.
HI. For our problem, it is reasonable to require that T must satisfy
the following two conditions:
1. The symmetry condition:
T(x,y)
= T(n-x,m-y).
2. The convexity condition:
T(x,y) > T(x',y) if Ix - ~I
m > lx' - ~I,
m
T(x,y) < T(x,y') if
I!!n x-yl
>
l!!x
n
y'l •
These two conditions were mentioned first by Barnard (1947); we state·
them here as a general requirement which must be satisfied by any test
statistic for H •
o
Now, any T which satisfies these two conditions induces a
partial order on
Fn,m . But what we need is a total ordering of Fn,m •
To do this, Barnard (1947) introduced what he called the maximum
condition, which can be stated as follows:
Consider the set {(x,y): . x > ~}.
n
If the first (k-l) points
(xl'Yl)' (x 2 'Y2)' ••• , (~-l'Yk-l)' have been chosen with T(xi,yi )
> T(xi_l'Yi_l)' then the k-th point (~'Yk) is that point of all points
(x,y) permitted by the convexity condition for which
•
2S
max[P(x,y,~)
+
L(xk_l'Yk_l'~)]
~
is the least.
To clarify that, we consider an example given by Barnard
(1947, p. 133).
Consider the set F ,6.
deal only with {(x,y) E Fs ,6; x <
s
By the symmetry condition we
iY}.
The convexity condition tells
us that the highest table in ranking is the table given by (0,6).
For
this table we have:
Now, the tables allowed by the convexity condition to come next in
order to (0,6) are (1,6) and (0,5).
select one of them.
The maximum condition is used to
If (1,6) is to be chosen next to (0,6) then we
have:
and
"It
-4
L. (1,6) • 10.97 x 10
On the
oth~r
•
hand, if (0,5) is to be chosen next to (0,6) then we
have:
and
~*
-4
L (0,5) • S.58 x 10 •
~* (0,5) < L* (1,6), we choose the table (0,5) to come in order
Since L
next to (0,6).
Now the set of tables allowed by the convexity condition are
(1,6) and (0,4).
(Note that (1,5) is not allowed by the convexity
29
In the previous process we note that the maximum condition is
an arbitrary condition, i.e., some other functionals of n like
f 1o L(x,y,n)
*
dn could be used instead of L (x,y,n) with the symmetry
and the convexity conditions to induce the required total ordering of
F
(see Barnard (1947, p. 131».
n,m
Barnard's test uniquely.
But the maximum condition defines
We see that this procedure would be very
complicated in cases other than the 2x2 case.
Even in the 2x2 case we
can see that the amount of calculation is heavy.
What we suggest here is to use one of the statistics presented
in Chapter I to order the set F
, with the requirement that this
n,m
statistic must satisfy the symmetry and convexity conditions.
Since
,.,t'",
testing for homogeneity in contingency
tables.f~-,omostly handled
using
the X2 theory, which implies that we accept implicitly the nature of
order induced by the corresponding test statistic on the sample space,
it would appear that any of the statistics presented in Chapter I
(provided that the symmetry and the convexity conditions are satisfied)
would be a good device to use in ordering the elements of F
• We can
n,m
easily see (by direct substitution) °that all the statistics mentioned
in sections 2 and 3 of Chapter I satisfy the symmetry condition.
*'
In
To show that T satisfies the convexity condition, it is sufficient to show that for mx > ny,
>
o.
~o
30
Now
. 2
. 2
.4
..
2
2nm 2 (mx-ny)[m3
(nx-x
) + n.3 (my-y
)] - nm
(n-2x)(mx~rty)
32322
[m (nx-x ) + n (my-y )]
= A + B,
where after simplification the quantity A reduces to
A
= n2m2 (mx-ny) [m 2 (x(m-y) + y(n-x»
+2ny(m-y)]
and
aT
mx > ny ...) A > 0 ~--!! > 0; ~ < x < n.
ax
m
For T we have:
p
aT
a::-...
2
2 2
[2nm (n+m)(mx-ny)( (n+m)(x+y) - 2xy - (x +y »
- nm(n+m) (mx-ny)2«n+m)-2x-2y)]
. 22
222
.. n m t (n+m) (x-+y)-2xy-(x +y )]
... A + B.
After simplification the quantity A reduces to
2
.
A· nm(n+m) (mx-ny) [(m-y) (x+2y) + yen-x)]
and
mx > ny ~A > 0
aT
wil;af > 0; iY < x < n.
Consider the statistic Ta .
aTa
ax
• 2 10
Then
x (n+m-x-y) • 2 log A.
(n-x)(x-+y)
.
aT
.
n
a
n
Clearly, for =y
< x < n, A > 1. Hence ~ > 0; =y < x < n.
m
aX
m
g
31
For the statistic T we have:
M
~.[.!..-2 +. I....]
2 1/2
+
aT
_ M ... 2
ax
n
m
[(n-x)
n
2
+ (m~y) 2]~
•
x
... 2 x A x B.
Now
x[(n-x)
B•
2
n
2
2 1/2
2
+ (m-y) ]
(n-x) [.!..-
2 1/2
+ I....]
m
n
m
2 1/2
2
(m_y)2 1/2·
n[.!..- + I....]
[(n-x) +
m ]
n
m
n
.. C f D.
The claim is C > 0 iff mx > ny. That C > 0 ~
2 ... 2 1/2
[1L I....]
-lL > -..;n....
· ·~m=--_ _~-:-~
n-x
2
2 1/2
[en-x) + (m-y) ]
n
m
+
x
2
(n-x) 2
>
2
2
mx + ny
m(n_x)2 + n(m_y)2
x2 (m_y)2 > y2(n_x)2
mx > ny.
So, mx > ny ... C. > 0 .... B > 0 .....
aTM
n
ax> 0; i1 < x < n.
For T we have
c
aT
axc • {(n+m)2 xy(n-x) (m-y) [2m(mx-ny)(mx(n-x) + ny(m-y»
+ m(n-2x) (mx-ny) 2]
- (n+m)2 y(m-y) (n-2x) (mx-ny) 2 (mx(n-x) + ny(m-y»}
•
32
... A .;. B.
After simplification the quantity A reduces to
+ ny(n-x) (m-y) (mx+ny)
+ nxy(m-y) (mx-ny)]
n
and A > 0 for nl < x < n.
Thus
aTc
ax
n
> 0; -::y < x < n.
m
Finally we show that T does not satisfy that condition.
L
that if mx
= ny
then TL - 0. also lim TL (x.y)
x+n
tinuous and so is TLI
y~o
- TL
0
= O.
Note
Now. TL is con-
(x).' Since TL ~ 0 and is not a
O·
constant function. then T assumes a maximum at x such that
Lo
o<
X
condition.
o ...
n
nlO)'
So. T does not satisfy the convexity
L
For example we have for n ... m • 100. T (99.1) ... 41.81 and
L
x < n (and
X
TL(98.2) • 59.77.
Hence the candidates to be used in ordering F a r e
ntm
TM• TR and Tc '
Let T be any statistic.
Let D be the set of ordered values
T
taken by T. i.e •• DT - {aO' a 1 • ""
(If there is some point s £ F'
n.m
Tp"-N'
• ~-t
a~} such that a O < a1 < ••• < ~.
such thatT is not defined at
. T(s) is defined to be the limiting value taken by T at 8).
St
then
In our
case. the exact distribution of any statistic T. for Bo • is given by:
33
Lemma (2.1)
•
For the case of 2x2 tables with n = m we have:
R (x,y)
P
=RN (x,y)
hence
Lp(X,y,~)
•
~(x,y,~) --=;>
*
*
Lp(X,y)
• ~(x,y).
This result holds by the fact that
T
T • 2n N
(in the 2x2 case with nam).
p
2n + TN
We have noticed empirically that the relation mentioned in the
lemma holds between Tp and TM if n = m ~ 7; it does not generally hold
for other values of n. Table (2.1) shows that:
TABLE 2.1
THE ORDER INDUCED BY THE STATISTICS TP AND T
M
*
ON THE SET F8
Order
Tp
.1
2
3
4
5
6
7
8
9
10
(4,3)
(3,2)
(2,1)
(5,3)
(4,2)
(1,0)
(3,1)
(5,2)
(2,0)
(4,1)
*If
1M
(4,3)
(3,2)
'(2,1)
(1,0)
(5,3)
(4,2)
(3,1)
(2,0)
(5,2)
(4,1)
Order
T
~
11
12
13
14
15
'16
17
18
19
20
(3,0)
(6,2)
(5,1)
(4,0)
(6,1)
(5,0)
(7,1)
(6,0)
(7,0)
(8,0)
(3,0)
(6,2)
(5,1)
(4,0)
(6,1)
(5,0)
(7,1)
(6,0)
(7,0)
(8,0)
P
x-y then Tp .• TM • O.
•
34
2.2. Assigning a level of significance (the problem of the unknown
parameter)
We come to the point of drawing inferences about H.
o
Given any
statistic T (from among Tp ' TN' TR, TM and T )' for fixed nand m, let
c
. {ail be the set of the ordered values taken by T, such that
a O < a 1 < ••• < ako
For each value ai' there is a set of cumulative
probabilities Fi(n) , n E [0,1].
Of course, we are interested in the
upper tail, and hence the rule for a test of size a is:
Reject Ho for all a such that
i
So far, the test depends on the unknown parameter of which
nothing in the statement of the problem can provide us with a value.
We should try to make the test independent of the value of the unknown
parameter.
This problem has been treated in different contexts.
For the
general approach see Wa1d (1939), and for the case of testing homogeneity in 2x2 tables see Barnard (1947) and Tocher (1950).
Considering the background provided by the previous references,
we can see that introducing an a priori distribution for n could solve
this problem.
Some considerations argue against that.
rather a strong assumption.
First, it is
Second, if we agree that we are ready to
accept this strong assumption, we have in general no possibility of
determining the distribution of n and any assumptions regarding this
distribution are of hypotheticai character [Wa1d (1939)].
Even if we
accept that, we can see that the type of the assumed distribution may
vary from one problem to another.
That adds a lot to the computations.
Third, the objection can be made against it, as Neyman has pointed out,
35
that n is merely an unknown constant and not a variate, and hence it
•
makes no sense to speak of the probability distribution of n [Wald
(1939)].
A comment is in order about this last point.
If, for example,
the statistical analysis is concerned with one point of time, then n is
a constant.
On the other hand, if we consider a case of a sample taken
from a machine every hour to test the proportion n of defective units
in the production, then it is logical to consider n as a random variable.
A weaker assumption concerning the range of n is introduced
later.
This assumption will not solve the problem completely, but it
will make a significant contribution.
The following two methods are suggested here to deal with this
problem.
,..
1. The first method:
Let n(xo'YO) be any BAN
Let (xO'YO) be the observed table.
es~imate
of n.
The approximate level of
significance assigned to this table is given by:
,..
For example,if the maximum likelihood method is used to estimate n,
then the level of significance assigned to each element of Fn,m is .
given by:
L(x,y,n • ~
n+m)'
We note that the same level of significance is assigned to all the
members of . { T-1
p (ai)}.At this point we see that the assignment of a
level of significance based on a BAN estimate for n is more convenient
than the method used by Barnard (1947) in which the level of signif-
•
icance would be L (x,y,n).
•
36
2. The second method:
The set F
is partitioned into the
n,m
following three sets (for a test of size a):
(a) The acceptance set (A):
A • {(x,y) E Fn,m :
L(x,y,~) >
A • {(x,y) E Fn,m :
T(x,y)·
~ E
a,
[O,l]}.
Note that
ol.
(b) The indefinite set (1):
1 .' {(x,y)
E
L(x,y,~) > a, ~
Fn,m :
E
01 and 01 ~ [O,l]}.
(c) The rejection set (R):
R • {(x,y)
E
Fn,m :
L(x,y,~) <
a ~
E
[O,l]}
i.e., '
R • {(x,y)
If (x,y)
E
A
U
E
F • L*(x,y)
n,m·
~
al.
R, the test is independent of the unknown value
of the parameter in the sense that H is accepted or rejected for all
0
values
of~.
If (x,Y) E I, the level of significance
table is given by L
(x,y,~)
where
~
is a
BAN
estimate
assigne~
of~.
to the
We note
tbat if we decided to use the first method then the question arises as
to which method of estimation we shall use.
By using the second
method, we are limiting this question. to the set I.
The question is
irrelevant if the observed table is an element' of the setA
U
R.
As the calculations will show later, the size of A could be
-
reasonably increased by imposing the restriction that . E <
for some E > O.
~
< 1,- E
Unless we have a good reason to believe that
~
lies
in a certain subset of the closed unit interval, we feel that E should
37
In our calculations we considered E - .05.
be as small as possible.
Remarks
1. If the same BAN estimate is used by the first method and by
•
the second method (on the set I), then the resulting two tests are
equivalent in the sense that Ho is rejected (or accepted) by the first
method iff it is rejected (or accepted) by the second method (given
that the same statistic is used to order the set Fn ,m)·
2. The set Fn,m is partitioned among three sets by the. second.
This part! t!on is carried naturally to the set {ail defined
method.
Thus, if ~ E [0,1] then A = {a • O}. On the other hand if
O
E < ~ < 1 - E, E > 0, the set A is obviously non-empty and A {a = O}
O
earlier.
=
iff
a >1 ~
max
P{T ""O}.
E [E,l-E]
To show that, note A - {aOi ~f a < 1 and a > min P{T, > O} •
1T
min[l - P{T ""O}] .. 1 - max p{T.-oh E < ~ < 1 - E.
~
1T
3. The set I is non-empty.
Note that L
So, if
°
(x,y,~)
It contains at least two elements.
is not a constant function of
< a < 1, then I is non-empty.
write the table (x,y) as (Xi'Yi).
1T
iff T (x,y)
~
0.
If T (x,y) - a 1 then we shall
If (xi,Y i ) E I then either
(xi-l'Yi-l) E I or (xi+l'Yi+l) E I.
4. The set R is the empty set iff:
a < L.*(n,O,1T).
If n-m then R is empty iff a/2 < (1/4)n.
T (n,O).
The rest is obvious.
Note that max T (x,y) ..
•
38
2.3. Example
In this example, we construct the proposed exact test for the
case n
= m = 7. The set of all 2x2 tables, F7 , is represented in the
following diagram:
By the symmetry property, and by the fact·that T(x,y) - T(y,x)
when n - m, it is sufficient to use the set E in defining the set of
all values taken by T.
A probability is associated with each ai.
For
example, considering the statistic T (although for this case TN and
p
T are equivalent to Tp ), we have.Tp (4,2) • 1.167 and
M
P{T • 1.167} R
2(r)(~) n6 (1_n)8 + (~)(~) n8(1_n)6]~
(a) To construct the test by the first method, we use the maxi.A
*
mum likelihood estimate n, of n.According .to that, the possible
values which can be obtained for n are 0, 1/7, 2/7, ••• , 1.
L (x,y,n)
A
p
~
Since
L (x,y,l-n), it is sufficient to consider only
p
n • 0, 1/7, ••• , 1/2.
The exact distribution of T is calculated for
those values of n(table (2.2».
p
A summary of the results is given in
table (2.3).
*Naturally, one could equally well use some other BAN estimate.
We used the maximum likelihood estimate here because it is more convenient.
39
(b) To use the second method, the exact distribution of T is
p
calculated for
~
= .05,
.10, ••• , .50 (and naturally for
~
• .55, .60,'
..., .95) •
•
For a test of size a = .05, F is partitioned among the three
7
sets mentioned earlier and given by:
{(x,y); T (x,y) < 2.333},
A
c'
I
= {(x,y); 2.333
R
= {(x,y);
p
-
< T (x,y) < 5.600},
Tp(x,y)
p
~
5.600}.
Remarks
it
We saw that the calculation of LB(x,y) is complicated
(TB'~(x,y)
and LB(x,y,~)' are associated with Barnard's test (1947».
But the information of columns 4 and 5 in table (2.3) suggests that we
*
could use L (x,y,l/2) as an approximation to LB(x,y),
even though in
p
this case R (x,y)
p .
~ ~(x,y).
T (5,2) < T (2,0».
B
B
(Note T (2,0) < T (5,2) while
p
p
The question Is, can we do that?
For very small n this could be reasonable, the example for that
is given by table (2.3).
Unfortunately we do riot expect this to be
true for larger values of n for two reasons.
expect to find Rp(x,y) •
~(x,y),
First, we do not generally
as we saw for n • 7.
it
L (x,y,1!2) is not a good approximation of L (x,y).
p
.
sider the case n
functions
p
=m •
Lp(xi'Yi'~)
17.
Second,
For example, con-
Graph (2.4) provides fifteen successive
as indicated in the graph).
These functions are.
bounded above by .08, but for some of them we have:
it
IL p (x,y,1!2) - Lp (x,y) I > .01.
We see that this error is relatively large.
Some other functions
•
TABLE 2.2
THE EXACT DISTRIBUTION OF THE STATISTICS T ' TN AND
p
~
FOR n=m=7
1T
Tables
1/14
2/14
o-tab1es
l.00
l.0
3/14
l.0
4/14
l.0
(4,3)
(3,2)
.5374
.5374
.6848
.6837
.7393
.7300
.7660
.7299
(2,1)
(1,0)
(4,2)
(3,1)
(2,0)
.5360
.4885
.1069
.1068
.1008
•1269X10-1
•1269x10-1
.6619
.5046
.2350
.2313
.1876
•5279X10-1
•5242X10-1
•4514X10-1
• 7691x10- 2
• 6942xlO- 2
.70l5xlO-3
•66l0xlO-3
•3646X10-4
•8255X10- 6
.6537
.4499
.3194
.2972
.5910
.4215
.3712
.3083
.1949
(5,2)
(4,1)
(3,0)
(5,1)
(4,0)
(6,1)
(5,0)
(6,0)
(7,0)
•1222X10--l
•9318X10- 3
.910lxlO- 3
.4l68xlO-4
.4ll3xlO-4
.1045xlO- 5
.1129xlO-7
.2045
•9777x10-1
•9439x10-1
•6898X10- 1
.2046x10-1
.1601xlO-l
•2773XIO- 2
•2397xlO- 2
.2192X10~3
•767lxlO- 5
.1345
.1215
•7515x10-1
•3481x10-1
•2224 x10-l
.6043 xlO- 2
•4598xlO- 2
•628lxlO- 3
.-2949xlO- 4
5/14
l.0
.7807
.6983
.5228
.4181
.4021
.2857
.1867
.1600
.1303
•7184X10-1
•4690x10-1
•2362X10- l
•9484X10- 2
.6l88 xlO- 2
.1176xlO- 2
.6726 xlO- 4
6/14
l.0
1/2
l.0
.7882
.6587
.7905
.6410
.4770
.4230
.4187
.2569
.1846
.1747
.1281
•6756x10-1
•5470x10-1
• 2234 xlO- l
.1202xlO- l
•6840 xlO- 2
.1647 xlO- 2
.1057 xlO- 3
.4615
.4257
.4240
.2445
•.1847
.1796
.1257
• 6592x10-1
•5737x10-1
•2148 xlO- l
.1294xlO-l
•6958 xlO- 2
.183lxlO- 2
.122lxlO- 3
.r:--
o
41
TABLE 2.3
•
THE LEVEL OF SIGNIFICANCE ASSIGNED TO TABLES IN F7
USING DIFFERENT METHO~S
.
(x,y)
Tp (x,y)
(0,0)
0.00
1& 1st Method
L~S.;
L(X,y,1T" • .!±I:
2n )
L(x,y,1/2)
1.000
1T=1/2
L.S. (Barnard)
1.00
(4,3)
.286
.7905
.7905
(3,2)
.311
.6983
.6410
(2,1)
.424
.6537
.4615
(1,0)
1.077
.4885
.4257
(4,2)
1.167
.4187
.4240
(3,1)
1.400
.3083
.2445
(2,0)
2.333
.1876
.1847
.200
(5,2)
2.571
.1796
.1796
.210
(4,1)
2.800
.1303
.1257
.130
(3,0)
3.818
.0689
.0659
.075
(5,1)
4.667
.0547
.0574
.057
(4,0)
5.600
.0222
.0215
.024
(6,1)
7.143
.0129
.0129
.013
(5,0)
7.778
.0062
.0069
.007
(6,0)
10.500
.0016
.0018
.0018
(7,0)
14.000
.00012
.00012
.00012
•
42
GRAPH
~.4
Fifteen successive functions Lp(xi,yi,P)
for n = m = 17
()
·
CD
..
..
·
<1'
·
CI
CD
,..
.
.
o
M
- .'"
"'0'"'
x
0
.~
o .......~Ill:::.._--r-...:...._~-t------t----
·
q.OO
2.00
6.00
.......--~~ ....
10.0
8.00
P )C 100
1
The functions (from top to bottom) correspond to the tables
(xi,Y i ) • (8,3), (7,2), (11,5) , (10,4), (6,1), (4,0) , (9,3), (8,2) ,
(12,5) , (11,4), (5,0),
(7,1) ,
(10,3), (9,2) , and (6,0).
Tp (8,3) < Tp (7,2) < ••• < Tp (6,0).
Note that
43
possess a larger error in magnitude, for example
IL (7,6,1/2)
p
•
- L p*(7,6)/ > .14.
3. An exact test for rxc tables
The method used to construct an exact test for homogeneity in
2x2 tables is here extended to include more general situations con-
cerning rxc tables.
Suppose that we have the rxc contingency table" {x
ij
},
i • l, ••• ,r, j • l, ••• ,c.
This table could arise under different
experimental conditions.
Denote the experimental conditions by y.
Let F be the set of all rxc tables satisfying y and let ~y be the
probability model
associa~ed
with the experiment.
depends on a vector of unknown
parameters~'.
satisfy certain relations, which depend on y.
Ho:
G(~)
~y
The components of n'
If we consider
= 0, where G is a vector valued function, then Ho imposes
another restriction on the components of
admissible values of
by Ho •
The function
Let
~
~'
~'.
Let
n be
the set of all
satisfying the restrictions imposed by y and
be the form taken by
yO.
~y
under H .
0
Let T be any given statistic to test H. The statistic T
o
induces the partition {F ,F , ••• ,F } on F by means of the relation:
k
l 2
E F and t E F iff T(t ) • T(t ); t , t E F. The set of ordered
l
2
l
l
i
2
2
i
values taken by T is then given by {ail such that a • T(Fj ) for some
i
i and some j.
t
Let
i.e.,
~Oi(~)
be the probability associated with a
i
(under H )'
o
•
44
The exact distribution of T, under H , is then given by:
o
i-I
F " • P{T < ail~}· L ~Oj(~)' ~ E O.
i ,~
j-l
To tackle the problem of the unknown parameters involved in this
expression, two methods are suggested.
1. Let t E F be the observed table with T(t) • ai. Let
to the table t is given by:
BAN estimate of 'IT based on t.
i-I
"-
-
be a
Then, the level of significance assigned
i,'IT = 1
1 - F
~
-
L
j-l
A
~Oj('IT).
-
2. For a given test of size a, it is possible to partition
among three sets.
A .' {t E Fj 1 - Fi,'IT > a, ~ EO},
-
and
R· {t E Fj 1 - ~i,! ~ a, ~'e O}.
Clearly, the test is independent of 'IT, in the sense mentioned earlier,
-
if t E (A U R).
If tEl, then the first method is used.
As we saw earlier, it is possible to increase the size of
imposing a condition on O.
all 'IT E int (0).
assumption:
~
E0
This condition is written here as:
A by
Consider
This restriction practically implies the following
0
c
int (0) and 00 is closed.
Remarks
In our discussion we considered a aiven statistic T.. to test
for Ho ' as an ordering device for the results of the experiments. It
i. understood that the asymptotic properties of T are known. The exact
45
test could also be carried out if we consider any other reasonable
ordering device.
For example, the approach used by Barnard (1947)
could be utilized here to test for homogeneity, but that leads to a
very complicated situation for the rxc tables.
•
Since we are looking
for a test to be applicable for a wide class of problems, and since
hypotheses concerning rxc tables are mostly handled using the X2 theory,
which implies that we accept implicitly the nature of order induced by
the corresponding statjstic on F, we see that the aforementioned procedure is a good device to use in ordering the elements of
F.
Examples
1. Suppose that we have two populations and three categories
of response with probabilities
~ij'
i
size n is drawn from each population.
= 1,2,;
= 1,2,3.
j
A sample of
We want to test
j • 1,2,3.
In this case (2 X 3 table) we have:
such that
I n - {(xtY) E Bn ~ B:
n
x+
y ~
n},'
where
Bn - {O,l, ••• ,n}
and
2. Consider a 3x2 table (i.e., we have three populations each
with two categories of response).
Let n be the sample size from each
population, and let Ho be the hypothesis
of homogeneity, then:
.
•
46
F c· {(x,y,z) £ B x B x B },
n
n
n
{1 •
[0,1].
3. Suppose a sample of size n is classified according to two
characteristics each having two response levels.
Suppose that the
numbers in the four resulting categories are x,y,z,w and the probabilities associated with these categories are
~ll' ~12' ~2l
and
~22'
We
want to test the independence of the two characteristics.
Let
~l - ~ll
+ ~12 and ~2 -~ll + ~2l'
Let
~ • ~ll - ~1~2 •
Then we have
and the hypothesis of independence is, H:
o
~.
O.
Under H , the probability of observing the entr.ies x,y,z,v in
o
the table is given by
In this case we have
F. {(X,1,Z,W);
(x,y,z) £ Bn x Bn x Bn , x+y+z~n, w • n-x-y-z},
and
n.
[0,1] x [0,1].
47
4. Numerical comparison
We consider again the case of 2x2 tables and nam with the
hypothesis of homogeneity.
The exact test for this case has been intro-
•
duced earlier; the tables associated with the test are given in
Section 4.1 for values of n up to n
= 16. For n ~ 17 we may use one
of the x2-statistics, where its exact distribution is approximated by
X2 (1).
It is useful to give some idea about which one of these
statistics is better approximated by X2 (1).
Let (x,y) be the observed table.
Define H(x,y) by
B(x,y) - P{X 2 (1) > T(x,y)}.
For given (x,y) and
~ £
(0,1), we say that T is more closely approxil
mated by X2 than T2 ~ff
For practical reasons, we are interested in investigating this
relation for those tables (x,y) such that Hl(x,y) or B2 (x,y) is in the
neighborhoo~ of
a, where a is somewhere around .05 or .01.
The statistics discussed here are Tp ' TN and T • We argue that
H
for a.given test of size a, no single statistic is better than the
others, in the previous sense, for all values of
~
£ (0,1).
So, each
one of these statistics is better than the others for some values of
~.
This is indicated by the following numerical examples.
1. For n • m ~ 7 we noticed that Lp(X,y,~) • ~(x,Y,~).
In
Figure (2.5), for n • 7, we consider four successive functions
Lp(Xi'Yi'~) with the corresponding Hp(xi,y i ), ~(xi'Yi) and ~(xi'Yi).
According to that and to the numerical calculation cif the exact
bution we communicate two points.
~istri-
First, we use TN if we believe that
•
48
n is near the boundary of the closed unit interval (n £ B) (otherwise
T is recommended).
M
Second, the tize of
B
increases as a decreases.
2. In this example we consider Tp and TN for n • m • 17.
Figure (2.6) shows five successive functions Lp(xi,yi,n) such that
*
Lp(Xi,y ) is in the neighborhood of .05 and two successive functions
i
*
with Lp(xi,y
i ) in the neighborhood of .01, with the corresponding
Hp(xi,y i ) and
~(xi'Yi).
According to that and to the calculations
of the exact distribution we draw the same conclusion as in the previous
example with TM replaced by T •
.
P
3. Finally we consider a comparison between Tp and TM in terms
of their expected values and variances with respect to the expected
value and variance of X2 (1), (see Berkson (1972, pp. 464-466»).
According to the results in Table (2.7) we see that
and
We note that this is consistent with what we get in the·first example.
So, in the small and moderate sample sizes we do not expect
that one statistic will be more closely approximated by X2 than the
others for all values of n.
Second, we see that
TM
is most closely
approximated for n in the neighborhood of 1/2, and TN for n near 0
or 1.
49
GRAPH 2.5
•
Four successive functions L (X 'Y1'P) with the corresponding
p i
Hp '
~
and
~,
for n • m = 7
.,.
.
·
0'
CD
0'
..
..
,.. .
,.. ·
.w
.
--
CIt
..J
CIt
G
.....
--
0-
-
."
X
~
....J
."
f\J
·f\J
X
~~
(CJ
Ci'
· D.
.f\J
·
0
'I. DO
8.00
D.
P )11 Ir'
C&J
·
....J
tt'
...-
f\J
,.. ·
.
f \J
r-
(CJ
~
!"-
?
-- ·
-
---·
-...
~
."
."
x
~
x
Ci'
~
CIt
..J
o ....- . ._--'f==========F=~
o
· o.
8.00
--
..
-
'1.00
, )110-1
8.00
· o.
't.DO
,
8.00
)r
10-
1
•
50
GRAPH 2.6
Some functions Lp(xi,yi,P) with the corresponding
H and ~ forn • m • 17
p
t:I'
·
r-
..
·...J
-.
C I'
..
.
CI'
r- ...J
....,
..wco
.
-...."
_N
...."
.... 1\.1
·
~~
)C
X
~
o-r'------......------I.--~
8.00
D.
'LOO
P
.o..,.'-------t------t--~
8.00
o.
'LDO
P )t 10-1
~ler'
.
t:I'
......
..r-
..
..
.
....,
~
...
".CI'
J=:. ...J
..-~
0
-... .
... "
-.
VI
"
....X
~
X I\J
~~
I\,)
I\,)
.~ CO
~-t"-----
D.
.....-----+--~
'LDO
P )c 1era
8.00
~
~-1'-----~----~t--~
D.
'f.DO
8.00
P :.10'"'
GRAPH 2.6 (Continued)
51
•
.
DI
..
,..
-- ""
.
(,A
--.I
0'
10
t-'
"V
X
~
.
""<0
_
.o-t'r....------t----_
D.
't.oo
+-_~
8.00
p )tl0:o'
.t
......
-.J
.....
,.. ........
-.
,..
~
\0
•N
.
0
-.
"V
~
...
~
.
....X
.....
.....
"V
0
X
.....
...
(,A
0
<.-
--.I
~ .....
--.I
D.
't.DO
P
8.00
)C
10-'
D.
q.OO
f xl V'
"8.00 .
•
TABLE 2.7a
THE FIRST MOMENT OF Tp AND
~
FOR DIFFERENT SAMPLE SIZES
'II'
n
10
.10
.05
.6753(Tp ) .9247
.15
1.012
.9189
.20
1.040
.9699
.25
1.049
.9994
1.039
.45
1.053
1.044
.50
1.053
1.045
1.048
1.028
1.048
1.048
1.048
1.036
1.040
1.042
1.016
1.043
1.026
1.043
1.033
1.043
1.037
1.043
1.038
1.040
1.015
1.040
1.024
1.040
1.031
1.040
1.035
1.040
1.036
1.037
1.014
1.037
1.023
1.037
1.029
1.037
1.032
1.037
1.033
1.034
1.034
1.034
1.034
1.034
.30
1.052
1.018
.35
1.052
.40
1.053
1.031
1.047
1.017
1.043
.5811(~)
.8163
.7087
1.018
.6094
.9445
.8352
.7388
.6351
.9602
.8509
1.022
12
.9351
.9758
.7659
.6584
.9728
.8640
1.025
.9407
1.037
.9778
1.039
13
.7904
.6796
.9828
.8751
1.026
1.035
.9793
1.037
14
.7863
.9781
1.022
1.032
1.034
.6769
.8740
.9455
.9796
.9997
1.013
1.021
1.027
1.030
1.031
.8298
.7144
.9946
.8907
1.026
.9513
1.031
.9814
1.032
.9998
1.032
1.012
1.032
1.020
1.032
1.025
1.032
1.028
1.032
1.029
11
15
16
.9280
.9453
1.040
.9733
1.039
1.046
.9997
1.042
.9998
.9999
.9999
U1
N
TABLE 2.7b
THE VARIANCE OF Tp AND TM FOR DIFFERENT SAMPLE SIZES
11"
n
10
11
12
.05
.10
.6140(Tp ) 1. 030
.8061
.4620(~)
.15
1.392
1.133
.6472
1.453
·1.249
14
.7450
.5520
.7738
.5721
1.294
15
16
.8069
.5950
1.339
1.034
.9651
.9995
2.075
2.021
1.908
1.981
2.024
1.884
1.968
2.020
2.050
1.676
1.817
1.918
1.986
2.067
2.026
2.072
1.224
1.743
1.483·
1.556
1.264
1.774
1.515
1.901
2.021
2.049
1.699
1.976
1.832
1.926
1.990
2.063
2.026
2.068
2.038
1.598
1.300
1.801
1.544
1.915
1.718
1.982
1.844
2.022
2.047
1.933
1.993
2.060
2.027
2.064
2.038
1.634
1.822
2.045
1.939
1.995
2.056
2.027
2.060
1.568
1.986
1.855
2.023
1.333
1.926
1.735
1.667
1.363
1.841
1.935
2.023
2.043
2.053
1.591
1·.751
1.989
1.864
1.944
1.997
2.027
2.057
2.037
1.508
.9275
1.896
2.053
1.975
.50
2.082
2.036
1.649
1.145
1.198
.45
2.071
.6800
.7126
.5296
1.943
1.779
.40
.35
2.012
2.052
1.180
13
.30
2.017
.4846
.8884
1.657
1.405
.25
1.832
1.618
1.958
1.800
1.089
.8478
.5072
.20
1.704
1.447
1.861
2.077
2.038
2.038
2.037
\.11
W
•
•
54
4.1. Tables
Here we give the tables associated with the test described in
Section 2 for the case when n • m and for values of n up to n • 16.
1. Table (2.8) gives
where
is the maximum likelihood estimate of
~
given by T-p 1 (x).
~
based on the tables
Note that
{
}
T-1
p (x)· (xo,yo) , (yo'x o), (n-x O' n-yO)' (n-yO' n-x O) ,
for some
X
o and YO.
Also we have
"
"
~(xO,yO) • ~(yO'xO)
=1
"
- ~(n-xo,n-yO)
and
piTp-> xl~} • piTp-> xI1-~}.
Thus, Table (2.8) associatps a unique value with each of the members
of the set· {T-1 (x)}.
p
Now for fixed n, let a and b be the
fi~st
and the last value,
respectively, given for Tp in the table. Let c be any value assumed by
Tp and not given in the table. Let a 1 , a 2. be the level of si$nificanc~
associated with a and b respectively.
"
1
> cl~ • ~(T- (c»}
p{Tp-
p
Then
> a1
{
if
c <a
if
c
.
<a
2
> b.
2. A summary of results in Table (2.8) is given by Table (2.9).
For a given n and a, a value of Tp (x- say) ·is given by the table such
that
55
and for any other value y we have
if
y < x
if
y >
•
x •
3. In Section 2 we mentioned that it is possible to divide the
set of all 2x2 tables among three sets. A. I and R.
~ £
(.05 •• 95]. Table (2.10) gives the set I for a
Assuming
= .05 and .01. Two
values of T «(a.b]- say) are given for a fixed nand a.
p
The set I is
given by
I
= {xla
~ x ~ b and x is assumed by T }.
P
Note that
p{T > x} > a for ~
p-
£
(.05 •• 95] if x < a
and
p{T
> x}< a for all ~ if x > b.
p-
If ~(1-~) ~ 0. then as ~ T ~ X2 (1). i.e ••
p
p{T
> xl~} ~
p -
-
p{X 2 (1) >
-
x}.
Thus by the definition of the set I we get la-bl ~ 0 as n ~~.
also implies that p{t £ I}
max p{t
£
~O
as n
~~.
This
Table (2.10) also shows
I} for different values of n and for a • .05 •• 01 (assuming
~
~ £
(.05 •• 95]).
•
56
TABLE 2.8
T AND ITS LEVEL OF SIGNIFICANCE USING THE MAXIMUM
P LIKELIHOOD ESTIMA+EOF ~ ~Rn ~ 2.3 ••••• 16 .
T
p
Level of
Significance
1.00
.5391
.1250
T
p
Level of
Si8n1ficance
.1836
.1399
.0638
.0386
n-2
0.0
1.333
4.000
n=3
0.0
.667
1.200
3.00
6.00
1.00
.6875
.4915
.1866
.0313
n=7
0.0
.533
1.143
2.00
2.667
4.800
8.00
1.00
.7162
.4899
.2891
.P06
.0608
.0078
2.800
3.818
4.667
5.600
7.143
7.778
10.500
.1303
.0689
.0547
.0222
.0129
.0062
.• 0016
n-8
.400
.476
1.111
1. 667
2.500
3.600
4.286
6.667
10.00
.7539
.6468
.4893
.3326
.1778
.1094
.0581
.0188
.0019
2.618
3.692
4.000
4.267
5.333
6.349
7.273
9.00
9.600
.1363
.0729
.0768
.0529
.0253
.0202
.0075
.0042
.0019
n-9
.343
.444
1.091
1.333
1.500
.7709
.6482
.4889
.3877
.3021
2.492
3.600
4.00
5.143
5.556
.5.844
.1441
.0964
.0569
.0278
n-4
n-5
n-6
2.400
3.086
4.00
5.333
6.000
8.571
12.00
.0192
.0056
.00048
.0309
.0205
57
TABLE 2.8 (Continued)
T
P
6.923
8.100
9.00
10.889
Level of
Significance·
.0090
.0072
.0025
.0013
n-10 3.200
3.333
3.529
3.810
5.000
5.051
5.495
6.667
7.200
7.500
8.571
9.899
.1158
.0920
.0778
.0621
.0299
.0403
.0227
.0104
.0118
.0076
.0031
.0025
2.933
3.143
3.474
3.667
4.545
4.701
4.889
5.238
6.471
6.600
7.071
8.250
8.909
9.214
10.267
.1324
.0957
.0796
.0666
.0528
.0410
.0311
.0256
.0115
.0164
.0087
.0038
.0043
.0027
.0011
u-11
.T ..
p
Level of
.Significance
n-12
3.00
3.429
3.556
4.196
4.444
4.800
5.042
6.000
6.171
6.316
6.750
8.000
8.224
8.711
9.882
10.667
.1020
.0812
.0704
.0632
.0435
.0324
.0283
.0228
.0174
.0122
.0101
.0044
.0064
.0032
.0013
.0015
n=13
2.889
3.391
3.467
3.846
.1080
.0826
.0737
.0756
.0611
.0474
.0336
.0306
.0286
.0189
.0130
.0116
.009S
.0071
.0046
3.939
4.248
4.727
4.887
5.S71
S.850
6.190
6.500
7.538
7.7~1
7.800
•
•
58
TABLE 2.8. (Continued)
T
p
n-14
0-15
Level of
Significance· .
8.327
9.579
9.905
10.400
.0039
.0016
.0024
.0012
2.800
3.360
3.394
3.590
3.743
4.094
4.667
4.762
5.143
5.250
5.600
6.087
6.300
7.036
7.337
7.636
8.023
9.143
9.333
9.956
.1132
.0837
.0764
.0864
.0630
.0513
.0346
.0327
.0357
.0284
.0211
.0137
.0129
.0124
.0019
.0051
.0046
.0038
.0028
.0015
2.727
3.333
3.394
3.589
3.968
4.615
4.658
4.821
.1177
.0989
.0829
.0672
.0548
.0355
.0345
.0423
.Tp
5.000
5.400
6.000
6.136
6.533
6.652
7.033
7.500
7.778
8.571
8.889
9.130
9.600
10.800
0-16 3.137
3.239
3.282
3.310
3.463
3.865
4.500
4.571
4.800
5.236
5.926
6.000
6.149
6.348
6.788
7.385
7.575
8.000
Level of
Significance
.0298
.0233
.0143
.0141
.0161
.0126
.0090
.0055
.0052
.0052
.0032
.0019
.0018
.0015
.1096
.0849
.0986
.0781
.0716 .
.0579
.0502
.0419
.0324
.0234
.0149
.0151
.0198
.0135
.0102
.0058
.0059
.0070
59
TABLE 2.8 (Continued)
T
p
8.127
8.533
8.960
9.309
10.165
10.494
Level of
Significance
Level of
Significance
.0054
.0037
.0021
.0021
.0021
.0012
•
TABLE 2.9
CRITICAL VALUES OF Tp USING THE MAXIMUM
LIKELIHOOD ESTIMATE OF n
a
.1
.05
3
6.00'
6.00
4
4.800
8.00
5
4.286
6.667
6
4.00
5.333
8.571
12.00
7
3.818
5.600
7.778
. 10.500
14.00
8
3.692
5.333
7.273
9.00
12.444
9
3.600
5.143
6.923
9.00
11.455
10
3.333
5.000
7.500
8.571
10.769
11
3.143
4.701
7.071
8.250
11. 733
12
3.429
4.444
8.000
.
8.711
13
3.391
4.248
7.538
7.800
11.556
14
3.360
4.667
7.337
8.073
11.200
15
3.333
4.615
7.03j
8.889
10.909
16
3.239
4.571
7.385
8.533
10.667
n
.01
.• 001
.005
10.00
"The value associated with 8.0 is
.0044.
"
.10.971
•
60
TARLE 2.10
mE INDEFINITE SET I AND ITS MAXIMUM PROBABILITY
USING T,p ASSUMING n E [.05,.95]
.
a
.05
•01
m~xp{tEI}
I
D
,.
.
3
3.0
4
[2.0
5
[1. 667 , 4.286]
6
[1.50
I
m:XP{tEI}
6.0
, 4.80 ]
4.80
.107
.322
[3.60
]
.279
[3.086 • 6.0
]
.140
7
[2.571 , 4.667]
.158
[2.571 • 7.143]
.173
8
[2.618 • 4.267]
.108
[2.618 • 6.349]
.129
9
[2.492 , 4.0
]
.111
[4.0
• 6.923]
.052
, 4.0
• 6.667]
10
[2.40
3.810]
.116
[3.810 • 7.20 ]
.057
11
[2.329 , 4.54.5]
.118
[3.667 • 6.60 ]
.060
12
[2.274 , 4.196]
.120
[3.556 • 6.750]
.067·
13
. [2.229 , 3.939]
.131 .
[3.467 • 6.50 ]
.070
14
[2.191 , 4.094]
.149
[3.394 • 7.036]
.080
15
[2.160 • 3.968]
.131
[3.394 • 6.652]
.075
16
[2.327 • 4.50 ]
.105
[3.463 • 6.788]
.064
61
5. Summary
For a contingency table, assume that we have a problem of testing a null hypothesis H which has an asymptotic solution.
o
4It
If the
sample size is relatively small, it is desirable to consider an exact
procedure.
One way to do that is to consider an axiomatic approach,
like the one developed by Barnard (1947) to test for homogeneity in 2x2
tables.
We see that this is a reasonable approach to follow.
On the
other hand, for each problem and for each experimental situation, a
different set of axioms must be developed, so that the situation gets
very complicated.
What we have tried to develop here is a procedure to
follow in constructing an exact test by taking advantage of the existence of an asymptotic solution to the problem.
We proceed as follows.
There is an asymptotic solution to the problem at hand; i.e.,
there is a statistic T with known
solve the problem on an
as~mptotic
asymptotic properties.
Using T to
basis implies that we accept the
nature of the ordering induced by T on the sample space.
Hence, it is
completely legitimate to use T for ordering the sample space when the
sampie size is small.
(The statistic T is replacing the set of axioms
mentioned earlier).
Now, by using T, we can arrange the sample space as a totally
ordered set.
If the exact distribution of T depends on a set of unknown
parameters, two methods were suggested to deal with this problem.
method is to consider BAN estimates of these unknown parameters.
One
The
second method divides the sample space into three sets, two of them
independent of the values of the unknown parameters in the sense that
H is accepted or rejected for all these values, and the third set
o
handled by
th~
first method,considering an estimate for
~.
•
CHAPTER III
ON THE BAHADUR EFFICIENCY OF CERTAIN TESTS
1. Introduction
In the first chapter, several consistent and asymptotically
equivalent statistics were introduced to test for homogeneity in twodimensional contingency tables.
It is of interest to have a comparison
between these competing test statistics.
One method for comparing two tests of the same hypothesis is to
compute the (asymptotic) efficiency of one test relative to the other.
The concept of efficiency is well known in point estimation.
If 8 is a
'" , '"8 are two unbiased estimates for 8 then the effiparameter and e
l
2
A
A
A
A
ciency of 8 with respect to 8 is defined as V(8 )/V(8 ). This measure
l
2
1
2
allows us, at least for large samples, to compute how many more observations are
~eeded
in order to get as accurate an estimate from the less
efficient estimator.
This concept is usually taken also as a basis
for defining the efficiency of one test relative to another.
Methods and theorems for the efficiency of tests were given by
Pitman (1948), Chernoff (1952), Hoeffding and Rosenblatt (1955),
Noether (1955), Hodges and Lehmann (1956), Bahadur (1960), Hoeffding
(1965) and Abrahamson (1967).
For a summary of the work in this area
and for more references see Puri and Sen (1971, p. 112), Bao (1965,
p. 390) and Noether (1967, p. 84).
63
The theory of Bahadur efficiency (Bahadur (1960»
is considered
here as a basis for the comparison between the competing statistics for
two reasons.
First, it is computationally easier than the other methods
of comparison.
•
Second, it is consistent with the other methods (based
on the power function), see Bahadur (1960, pp. 290-293).
2. The theory of BAHADUR efficiency
Let S be an abstract sample space.
Consider {Pa;a
£
O} to be a
collection of probability measures defined on S, where n is an abstract
parameter space.
For nO
c
n, let Ho denote the hypothesis that
Suppose {Tn } is a sequence of real valued functions (statistics) on S.
Definition 1
. {Tn} is said to be a standard sequence (for testing H ) if the
o
following three conditions are satisfied:
1. There exists a continuous probability distribution function
F such that, for each a £ 0 ;
0
lim Pe{T < x} - F(x)
n
n-+co
2. There exists a constant a, 0 < a <
log[l - F(x)] - - ~
(3.2.1)
for all x.
~,
such that
2
[1 + 0(1)],
3. There exists a function b on 0 with 0
as
~
~.
b <
~,
(3.2.2)
such that
for each e £ 0 we have
T
lim Pa{l-E - b(e)1 > x} • 0
(3.2.3)
for all x.
Iii'
From (1) and (2), we see that b(a) - 0 when a
n-+co
£
no
p
and Tn ~ ~
when e £ 0 - 0 , So, if T is used to test for H , then large values
n
0
0
of Tn may be judged as giving evidence against Ho '
•
64
Definition 2
For S E S, the level attained by T is defined by
n
r{Tn -> Tn (s)IH0 }.
If the exact distribution of Tn (under H0 ) is not at hand for one
reason or
another~
then the limiting distribution function, F,·of Tn
could be used to obtain approximate levels.
So, we have:
Defini tion 2'.
The ap.proximate level attained by Tn is defined by
The behavior of the random variable Ln as
~
could be described in
terms of the transformation
Kn(s)
Clearly, as
n~ L
D
-2 log
L~ •
-2 log[l - F(Tn(s»].
n is approximately uniform on (0,1) if 8 E 00.
-2 log Ln is X2 (2) for e
E
00.
(3.2.4)
Hence
This transformation serves as a
standardization of the limiting distribution function F.
Definition 3
The function C(e) is called the asymptotic slope of the test
based on {Tn} when 8 obtains, where C(8) is defined by:
0
,
(3.2.5)
C(8) • {
a[b(8)]2,
it can be shown (Bahadur (1960, p. 277» that:
It
P
~+ C(8),
n
8 EO.
(3.2.6)
65
K a.s
-a
n
C(6),
+
for all 6 £ O.
(3.2.8)
In the following, assume that" {T } is strongly consistent.
n
let a be a chosen level of significance and 8
£
0 - 0 •
0
Now,
How large
does n have to be for the level a to be attained in the sequence {T }?
n
Clearly n
= n(s,a), and n
Kn(s,a} a.s ()
+ C 6 as n
N(s,a}
+
+
= as a
+
O.
=; i.e., as a
So, we have
+
O.
From the definition of
Kn we get:
l~
a+O n/2 10g(1/a)
i.e. as a
+
= l/C(S)
a.s.
0, we have:
n ~ 2 10g(1/a)/C(8}
a.s., 6 £ 0 - 00.
(3.2.9)
Let" {T(l}} and" {T(2)} be two standard sequences for testing H •
n
For 8 £ 0 -
n
0
00~
let Cl (8) and C (e} be the approximate slopes of both
2
sequences respectively. Relation (3.2.9) implies that as a + 0 we
have:
a •••• }
(3.2.10)
a.s.
•
66
Relation (3.2.l0) indicates that Ci{S), i
= 1,2, could be used
as a measure of performance of the sequence {T(i)}.
For arbitrarily
n
small a, if Cl{S) > C2 (S) then n < n and hence {T~l)} is faster in
2
l
rejecting H by the approximate test based on the limiting distribution
o
function F.
So, the performances of {T(l)}
n
and {T(2)} may be compared by
n
considering the ratio of the sample sizes needed by each to attain the
Accordingly, as a
same level using the same data.
+
0 and
e£
0 - 0 ,
0
we have:
a.s
(3.2.ll)
+
Definition 5 (Asymptotic efficiency)
Let {T(i)}, i • 1,2, be two standard sequences defined on S,
n
i
and let F (x), a i and bi(S) be the corresponding funccions and constants described in Definition 1.
S
Consider an arbitrary but fixed
0 - 0 , Then the asymptotic efficiency of sequence 1 relative to
0
sequence 2 is defined by:
£
,
So far we have considered the comparison when S
let 0 be a metric space and let 0 - 0 0 be dense in O.
sequence in 0 - 0 0 such that lim ev •
V
eO' eo
£
0 - 0 • Now,
0
Let {~} be a
£ 0 •
0
Definition 6 (Asymptotic limiting efficiency)
The asymptotic limiting efficiency of sequence 1 relative to
sequence 2 is defined by:
67
Clearly
~
is a function of the limiting value 80 as well as the
path to 8 • We note that, in practice, large sample sizes occur in the
0
non-null case only if 8 is in the neighborhood of some 8 E QO' hence
0
it is of interest that other methods of comparison often lead to the
•
same limiting efficiency function.
Finally, we state the following.
Proposition:
(Bahadur (1960»
The function
~1,2(80)
a
lim
~l
8+8 o
2(8) is the asymptotic effici'
ency of sequence 1 relative to sequence 2 in Pitman's sense.
3. Applications
An application of the Bahadur efficiency to the x2-statistic
for testing homogeneity in 2x2 tables is considered.
section consider a 2x2 table with nam.
namial random variables, x
[0,1]
X
[0,1] and.n
O
Ho is 81 = 82 ; i.e.,
Lemma p.l)
a
~
~ E
Let x,y be two 1ndependent bi-
B(n 8 ), y
I 1
{(8 ,8 ) E n;
l 2
Throughout this
~
B(n 8 ). 'Let Q
I l
8 • 8 }.
1
2
a
The null hypothesis
nO.
. {TR}," {Tp }' {TN}' {TM} and" {Tc } are strongly consistent
sequences.
Proof
~
~
(2n-x-y)
2n x yc
,fn-'/)
2 x y n-x ) (n-x) C.n-yr.
(~y) x+y (2n-x-y)
•
68
This can be written as:
Ta
a
2
X
2x + 2 Y log ~
log ---+
+
X
Y
X
Y
+ 2(n-x) log ~(n-x) + 2(n-y) 1
2 (n-y)
n-x-y
og (2n-x-y) •
a.s 8 and (by the continuity of the log
Note that as n ~ ~. ~
n ~
1
2x a.s
28 1
function) log x+y ~ log 8 +8
; hence
1/2
1 2
TaJ
a.s b (8)
~
a ...
[2n
281
28 2
= 81 log 8 +8 + 82 log 8 +8
1 2
1 2
[
2. By direct calculations we get:
2
Iii l!:2!
.
2
--'""2~~/~
[2n
1:f1D[(nx-x ) + (ny-y .)]1 2
TN] 1/
and
•
_ _ _-.=I........
~~
3.
[~:]
1/2=
G
[(G+H~> 2n
69
~ 1/2
J
T
/2
2
.
2 1/2] 2
Y
Dl)2 1/2 + ({n-xl 22:2{nl )
-
• [[ (X
2
lJ
•
a.s b (S)
.... M ...
=
T
4.
]
C
[ 2n
~[
S2+S2 1/2
(1-8 )2 + (1-8 )2 1/2]2
[1 2J
+ [1
2 J
2
2
~1/2
_1
1/2
1/2
=
a.s b (8)
..... c ...
Lemma (3.2)
Le~ F(x) = p{X
k ~ x} where X~ is a chi-square random variable
with k degrees of freedom; then F satisfies condition 2 of definition 1.
Proof
If k 1s even then
and (3.2.2) 1s immediate.
If k is odd then
2
2
p{X:_ l > x } < 1 - rex) < P{X~l > x }
and the same result holds.
•
70
The two preceding lemmas give:
Lemma (3.3)
The five previously mentioned statistics in lemma (3.1) are
standard sequences (for testing H ).
o
Lemma (3.4) (Margolin and Light (1973»
2i
E
(x-y)
< (x log x + Y log Y - (x+y) log x±Y )
2
i-1 (x+y)2i-1 2i(2i-1) r
~ ~:+;~
2
log 2,
x,y > O.
Proof
By the Taylor expansion of x log x about
x log x
= x0
log x + (x-x )(1
00
~
xo
> 0 we have:
+ log x )
0
(x_x)1 (_l)i
+ E
0
x i - 1 1(i-1)'
o
i-2
0<x<2x.
0
Let xo - y0 - x±Y
2 ,then
x log x + y log y - 2x0
log 0
x + 0
(x-x ) (l+log x0 )
Note that
(x-x )i + (y_y )1 • 0
o
0
for odd 1; hence
x log x + Y log y - 2 xo log x0
+
~
(x-x )21 + (y_y )21
tOO
1-1
x 21- 1 21(2i-1)
o
71
But
(x-xo ) 2i • (Y-Yo) 2i
•
(
))2i ,thus
.X;Y
x log x + Y log y - (x+y) log (x'7)
00
(x-v)
1:
=
2i
i=l
Since
2i
(x-Y)
2i-1
(x+y)
2i(2i-1)
part of the inequality.
~
On the other hand,
;
(x-v) 2i
i=l (x+y)2i 2i(2i-1)
where the last factor is
.
0 for all i, that establishes the first
I; ~~1~-:-
(x-v) 2
(x-V) 2i(x+y) li=l 2i(2i-1) x+y
I:
~
1J
2
1; but the remaining sum is just log 2,
which completes the proof.
Lemma (3.5)
1. If 8 E 0 - 0 ,
0
1 <
2. lim
8.....8
~R
~hen
~R
(8)
,p -
~
2 log 2.
(8). 1.
,p -
- -0
Proof
(8 -8 )2i
1 2
•
72
00
2
• b (8) + E
p
i-2
Ai;
ow
if 81
~
82 then Ai > 0 for all i, hence we get 1 <
WR,p(~).
Hence we get
W
(8)
R,p
ow
< 2 log 2.
-
2. Note that
converges uniformly to b:(~) for, 8 £ O.
That is obtained from the
following theorem (see Rudin (1964. p. 136). theorem 7.13).
Theorem
Let E be compact.
Let' {f } be a aequenceof functional
n
continuous on E. which converges to a continuous function f on E.
If
73
fn(x)
~
fn+l(x) for n· 1.2.3 •••• ; and for every x £ E, then f n
uniformly on E. Now
(81 -8 2 )21-2 .
~ f
•
]
... .',
~
By the uniform convergence
Hence
lim
tP
(6)· 1.
R.p ...
6 ~6
1 2
Lemma (3.6)
1. For 6
£
0 - OQ'
1
and
~N
(6) 1s not bounded above •
• p ...
2.
Proof
lim tPN (8)· 1.
-.e
.p ..
8
...
...0
•
74
So,
b2(~)
bN2(~) • _-...P~, and then
2
1 - b (a)
p -
1
Since
2
bp(~)
a1
> 0 for
~
e2 ,
the inequality follows.
~N
To show that
(a) is unbounded, assume the converse;
,p i.e. there exists M < ~ such that:
1 2
< M-) b 2 (e) < MM-1 < 1.
1 - b (a)
p-
.p -
2
2
Note that 0 < b (a) < 1 with b (a)
-
le 1 - a21
= 1,
Hence, W
M,p
p -
-
p -
=0
iff a
1
= a
2
(*)
and b 2 (_a)
p
=1
iff
and (*) contradicts that.
is not bounded above •.
2
.
2. Clearly, lim b (e) • 0, so
6+8
- -0
r-
Lemma (3.7)
For
a £ n we
have:
1
< W (8) < _~1_ _--=2 log 2
N,R - - 1 _ (a ~8 )2 '
1
~
~N,R(~)· 1.
~N,R is not bounded above.
- -0
Proof
For
e £ n,
Lemma (3.5) gives
2
especially
75
(l-b;(~»bi(~)
2
2
< (1 - b (a»
-
b (8)
•
2 log 2
p
p ...
that gives
2
Since bp (a)
..."
2
= 0 iff bR
(8)
= b.2 (8)
• 0, and we stated in the
. . " .N
"
lemma that lim ~N R(a)
1, that implies ~N R(a) > 2 11 2'
8 -+8
,...
,'"
og
a
1
2
To show that
~N,P
Then there exists M <
~
is unbounded above, assume the converse.
b;(8)
such that
2
b (a)
•
2'" < M.
bR(~)
2
b (8)
< M~ p ...
p...
bi(~)(l-b;(~» -
Now
< M(1-b 2 (8».
bi(~) -
p ...
b 2 (8) .
Since 2 11
og
2
...
( lemma (
)
~ ...P,
2'-3.5)
b (£)
R
2
2 11
2
<
M(1-b
og p (8»
...
then
2
(8) -< 1 - M(2
_)b p...
iog
2) < 1,
2
.
(note M is large) and that contradicts the fact tha.t b (1,0)
p
thus
~N
.
= 1,
R is not bounded above.
"
Finally we have
lim
8-+8
- -0
~R
(8)
,p -
'
• 11m .
~~O
(l+b~(£»
•
76
Lemma (3.8)
1. For 8 E
°- 00'
1 ~ $P,M(~) ~ M, M is a finite number.
2. lim $ M(a) = 1.
~
p,
-
- -0
Proof
1. By the definition of TM we have:
T < T , which implies
M- p
T ]
lim [ M
n-+co 2n
1/2 < lim [3?Jl/2
2
'
- n-+co
n
i.e.
b~(~) ~ b;(~).
Note that 0
iff 181 - 821
~(~)
2
~ bM(~)
2
< 1 with bM(~)
2
= 0 iff a1 = 82 and bM(~)
=1
= 1. That gives for 8
E
°- 00' $
p,
M(a) > 1.
-
-
Since
2
.. 0 iff 8 E 00' b (8) is a bounded function, that gives $ M(8)
P .
p, -
is a bounded function too.
2. For the second part, let:
x .. 81 - 82
e1 .. .!±Y.
2
2
2 x 2+y2
81 + e2 ..
2
,
Y ..
81 + 82
then
,
,
8
Y-x
and
2" 2
.
2
2
(1-8 )2 + (1-8 )2 .. x + (2-y)
•
2
1
2
Under this transformation define f(x,y) and g(x,y) as follows:
2
2
x
b (8 ,8 ) .. f(x,y) .. y(2-y)
p 1 2
2
b (8 ,8 ) .. g(x,y) •
M 1 2
~["Z..i
and
il
_Zy+fx2ofy2 Ix 2
+(2-y) J'
77
Now:
af
2x
ax = y(2-y)
... 0
as
x'" 0 ;
2
2
-a f = --:-::"-~
ax2
y(2-y)'
and
... 0 as x ... OJ
So,
11m i& = 1.2 [2 + ..L
+ 2-11
2-y
y j
x+0 dX 2
D
2y(2-y) + ,2 + (2_y)2 • (y+(2=y»2
2y(2-y)
2y(2-y)
2
• y(2-y) •
A.
Hence by L'Hopita1's rule we get:
lim f (x.y) • 1,
x+O g(x,y)
which proves 2.
Lemma (3.9)
1. For
e£
0 - 00' 1 ~ ~c,N(~) and ~c,N(~) is not bounded above.
~
78
2. lim W N(e). 1.
8 +e e, ...
1 2
Proof
...
Note that
1
A(~) ~
~ Wc,N(~)
for
~
b~(~) • A(~).
1 with equality iff 81
£ 0 - 00.
= 82
or 81 + 8
2
= 1.
Now fix 81 £(0.1) and let 82
~
Thus
1 or 0;
then A(~) ... Wc.N(~) ~ ~.
By direct substitution, we get the second part of the lemma.
Lemma (3.10)
For
e£
0 'we have
W
(8)
1
< W (8) < c,N and
2 log 2
e,R - - 1-(8 -8 )2 '
1 2
lim < W R(e). 1.
8 +8
c,1 2
.
Wc R. 1s not bounded above.
,
Proof
For e £ 0, lemma (3.7) gives:
<
1
2 log
.
2
2
note that b (8) ... b (8).
N
c -
2
b~(~)
2
1
~ -"';;;"'---:::-2 '
b (e)
1-(e -8 )
l 2
RW N(8), thus
C,-
Wc,N(~) < b~(~)
2 log 2
< Wc,N(~)
•
2
b (8) - 1-(8 _8)2 .
R...
1
2
The inequality follows on noting that W N(8)
c,
-
~
-
1.
79
To show that
there exists M <
~
c.
such that
00
2
< M(l - b (8»
-
R is unbounded assume the opposite.
p ...
~c,R(~) ~
< K<
00
M.
So,
Now:
•
•
But this contradicts the fact that
~c,N(~?
is an unbounded function.
Finally,
lim ~ R(8)
8 ~e
c,'"
1 2
= elim
~8
1
[~
2
c,
N(8) • ~N,R(~)]
...
= 1.
The aforementioned results are summarized in the following
two theorems.
Theorem (3.11)
Among the statistics mentioned in Lemma (3.1), the asymptotic
limiting efficiency, and hence Pitman's efficiency, of any· statistic
with respect to any other statistic is 1.
Theorem (3.12)
For
~ E
n - nO
we have:
1
1. 2 log 2 < ~N,R(~) <
1
1
2. 2 log 2 < ~c,R(~)
3. 1 < ~ N(e),
- c,'"
~
c,
N is not bounded above,
•
80
1
4. 1 < W
~
2 '
N ,p (0)
1-(0 -0 )
1
2
5. 1 < WR (6) < 2 log 2,
,p -
Consider the case of 2 xc contingency tables, i.e., we have two
populations.
Suppose n l
= n 2 = n,
and let
~
be a vector of unknown
Let Ho be the hypothesis of homogeneity of the two popula-
parameters.
tions and TR ' (T ) be the likelihood ratio statistic (the Pearson
1
PI
X2 statistic) for testing H:
e = 0 , Along the same lines as the
0 _ _0
proof of Lemma (3.5), we can show the following result.
Lemma (3.13)
In the case of 2xc tables with n l
1.
For e € n -
= n 2 - n we have:
0 , 1 < W
(6) < 2 log 2.
0
Rl·Pl -
2. lim
&+e
.... -0
4. Remarks
1. As a measure of relative efficiency of {T(l)} with respect
to' {T(2)} we used l/J1.2 • e fc
where Ci is the approximate slope of
We note that whether or not C and C are exact slopes, l/J 2
1
2
1,
1
{T(i)}.
2
is an exact relative efficiency in the sense that it is based on an
accurate description of what happens in the limit when the prescribed
methods of computing levels are used [Bahadur (1960)].
2. If the prescribed methods of computing levels are inexact,
the plausibility and usefulness of the present procedure is diminished
by the following considerations [Bahadur (1960)].
81
(a) The usual inexact methods are not intended for computations
of very small probabilities.
Consequently, if a non-null
obtains, and n is sufficiently large, the
e
~
prescribed
methods may better be abandoned, or at least the levels
computed thereby not be taken seriously.
(b) A related consideration is that the inexact slope C of
{T } can scarcely be said to describe the actual performance
n
of {Tn }, since C incorporates computational errors of
unknown magnitude and direction.
3. There is, however, some reason to think that the numerical
value of an inexact
~
can be very misleading only if
In particular the limiting efficiency
~
e is.
far from
derived from an inexact
~
nO.
often·
coincides with the limiting efficiency function derived by exact methods
of comparison.
It is perhaps fair to say that such value as a given
method of asymptotic comparison may have stems mainly
efficiency function obtainable by that method.
f~·om
the limiting
If so, the comparison
of inexact slopes affords, or at least promises, a short cut to the
main conclusion of exact analysis [Bahadur (1960)].
4. For 2x2 tables with n • m we saw that T is a monotone
p
function of TN.
~ €
for
n.
So, the (exact) efficiency of T relative to TN is 1
p
On the other hand we get 1 <
~N,p(~)
difference is explained by the second remark.
for
~ €
g -
nO.
This
We can also see that the
first remark is reflected here by the fact that T < TN.
p5. Consider a numerical example. Table (3.1) provides some
values of ~N,p(~). Let el - e 2 • ~, then ~N,p(el,e2) can be written as
'"
'"
'"
~N,P(W1'~). (Note that ~N,p(el'~) • ~N,p(1-e1'-~». We can see, for
81
€
[.2,.8] and I~I ~ 0.1 that ~N,P(~) ~ 1.02.
From a practical point
~
82
of view this difference is negligible.
we consider some other
~'s
The conclusion is still true if
studied here.
TABLE 3.1
SOME VALUES OF THE BAHADUR ASYMPTOTIC EFFICIENCY
FUNCTION FOR T AND TN
p
8
1
.1
.2
.3
.4
.5
.1
1.0200
1.0135
1.0111
1.0102
1.0102
.05
1.0057
1.0036
1.0029
1.0026
1.0025
-.05
1.0091
1.0043
1.0031
1.0027
1.0025
-.1
1.0405
1.0200
1.0135
1.0111
1.0102
l:1
'Ie.
l:1
= -.09.
'Ie
•
CHAPTER IV
A PROCEDURE FOR REDUCING THE ORDER OF BIAS OF ESTIMATORS
1. Introduction
Let x be a binomial random variable with parameters (n,n).
Suppose we are interested in estimating fen).
fen) is given by f(p), where p
estimator.
= x/no
A natural estimator for
Generally this is a biased
The purpose of this chapter is to present a procedure for
reducing the order of the bias of f(p).
The theory of unbiased estimation for the exponential family
was developed by Kitagawa (1956) and applied by Washia, Morimota and
Ikeda (1956).
For a good summary of the work done in the general
theory of unbiased estimation see Zacks (1971).
If it is not possible to obtain an unbiased estimator of f(n),
then the second best thing to do (as far as bias is concerned) is to
reduce the bias of f(p).
Quenoui11e (1956) introduced what is now
known as the Jackknife method for reducing bias.
For a complete
coverage of that subject and further references see Gray and Schucany
(1972).
2. On reducing the order of the bias
Let f(p) be an estimator of fen).
-1
If E[f(p)] • fen) + O(n
),
then we can generate another estimator g(p) such that E[g(p)] •
fen) + O(n- 2).
•
84
Lemma (4.1)
Let x
~
B(n,TI).
If f(p) is an estimator of f(TI) and f is
four times differentiable for all TI E (0,1), then the function
g(p)
= f(p) _ f" (p) p(l-p)
2n
satisfies
E[g(p)]
= f(TI) + O(n-2 ).
Proof:
Consider the expansion of g(p) about TI:
g(p)
1 "
2
= g(TI) + g , (TI)(p-TI) + ~
(TI)(p-TI) + .•..
Now
but by definition,
,,
g
=f
(TI)
,,
(TI) + O(n
-1
).
Thus
E[g(p)]
= (f(TI) - TI(l-TI) f" (TI»
2n
.
-2
.. f(TI) + O(n
).
Lemma (4.1) is generalized as follows.
Theorem (4.2)
Let x be a random variable such that x
Let f(p) be an estimator of f(TI).
~
B(n,TI),
~ E
(0,1).
If f is totally differentiable on
the interval (0,1), then there is a sequence of functions· {f } such that
i
85
•
Proof:
Consider the expansion of f(p) about n:
,
1 "
f(p) = fen) + f (n)(p-n) + 2f
(n)(p-n)
2
+ ••••
Then,
1 " (n) E(p-n) 2 + 6f
1 '" (n)(p-n) 3 + •••
E[f(p)] • fen) + 2f
write
E[f(p~]
as
-2
E[f(p)] • fen) + H1 (n) + O(n ),
where H (n) = O(n- 1).
1
For example, in this case we have
H (n) • f" (n) n(l-n)
2n
1
Define f 1 (p) as
Then by Theorem (4.1) we have
E[f 1 (p)]
= fen)
+ O(n-2 ).
Now write E[f (p)] as
1
where
~2(n)
.
-3
E[f (p)] • fen) + H2 (n) + O(n ),
1
-2
• O(n ).
•
86
Note that
thus
E[f 2 (p)] =
f(~)
+ O(n-3 ).
Continuing the process we reach f _ such that
i 1
E[f i _1 (P)]
a
fen) + O(n-i ).
Write E[fi_l(P)] as
where Hi(n)
a
E[f _ (P)]
i 1
O(n- i ).
a
fen) + Hi(n) +.o(n-(i+1»,
Define fi(p) as
Noting that
we get
-(1+1)
E[fi(p)] • fen) + O(n
).
Example
Let x
fen) • n 2 •
~ B(n,~).
Suppose we are interested in estimating
A natural estimator is f(p) • p2, where p • x/n, and
E[f(p)] • n 2 +~(l-~) •
n
Let H1(~) • ~(~-~) , and define
87
so that
E{f (p)]
1
•
!
= n2 + n(l;n)
n
= n(l;n) and define
Let H {n)
2
n
f 2 {p) .. f (p) - H (p)
1
2
.. f
1
(p) - p(l;p)
n
Note that
E[f (p)] .. n2 + n(l;n) •
2
n
Generally we have
2
fi(p) .. p - p(l-p)
• p
2
- p{l-p)
and
..
n
2
+ n(l-n)
n
i+1
that this method fails to give the unbiased estimator of·
Not~
n2 , namely np
2
n-
iP ,
in a finite number of steps.
2
f 00 (p) - p - p(l-p)[
00
But consider
2
np -1'
r n-j ] .
..
n-1 '
j-1
we see that f 00 (p) is the unbiased
estimator of fen).
.
•
88
Theorem (4.3)
~,
II:
'"
Let x be a multinomially distributed random variable (n,~),
'"
xl
'" x
t
(~l' ••• '~t). Let f(p) be an estimator of f(~), p' a ( - - , • • • , - - ) .
'"
'"
If f is totally differentiable at
~,
'"
n
n
then there is a sequence of func-
",.
tions {f } such that
i
Proof:
Consider the expansion of f(f) about
~:
where
DI(~)"
IXt
'"
and
[
af
(-a
af]
)~'···'(-a)~,
PI '"
P '"
t
2
2
[3aPl2£ J~
D2(~)
.
[3ap l£ ap 2J
~
2
[3api£ P J
t
•
•
,
•
•
2
tXt
[3ap £ 2] ~
t
D2(!) is symmetric.
Write E[f(p)] as
'"
-2
E[f(f)] • f(!) + HI (!) + O(n ),
-1
where Hl (!) • O(n
).
~.
89
f 1 (f) =
Noting that E[HI(f)]
= HI(~)
•
f(f) - H1 (f)'
+ O(n-2 ), we see that
Suppose we have fi_I(E) such that
Note that
whence
Theorem (4.4)
Let x' • (x 1 , ••• ,x t ) be a vector of independent random variables
such that xi
~
B(n,n i (8», where n i (8) is a known totally differentiable
function of 8, and 8 is an unknown parameter. Let f(p) be an estimator
xl
x
of 8 such that fen)
•
8,
where
p'
•
(--,
•••
,~
and
_
n
n
-
If f is totally differentiable in p at nand
...
•
90
af does not depend directly on n for
a-Pi
i • l, ••• ,t, then (under some
conditions mentioned in the proof) there is a sequence of functions
Proof:
-
Consider the expansion of f(p} about
~:
-
where
Dl(~}
D2(~}
and
are as defined earlier.
So,
e+ 12 i=l~ [aa 2f]
2
E[f(p}]
-
=
Pi ~
Wrjte f[f(p}] as
-
E[f(e}]
where
Hl(~}
. -1
• O(n
a
-
e + Hl(~} + O(n-2J,
}.
"
.
Suppose that we can find a function H (£} such that
l
"
"
Hl(~} = Hl(~} and Hl(e) .is differentiabl~ in f. Define fl(f} as
"
flee} = fee} - Hl(f}·
Note that
Thus
The rest of the proof follows by induction.
satisfies
Suppose that fi_l(f)
91
4It
Write E[f i _1 (f)] as
E[f _ (£)]
i 1
where
Hi(~)
= 8 + Hi(~) + O(n-(i+1»,
= O(n-i ).
Suppose that we can find a function H (£) such that
i
= Hi(~) and Hi(f) is differentiable in f.
"
= f i _ 1 (£) - H
(£)·
i
Noting
we find
Theorem (4.5)
Let
= (x1 •••• ,x t )
~'
such that xi
be a vector of independent random variables
~
B(n,TI (8», where TI (8) is a known totally differentiable
i
i
"
function of 8 and 8 is an unknown parameter. Let 8(p)
be an estimator
-
of 8 defined by F(£,8(£»
Xl
(-,
n
x
... ,-).
n
t
= 0, where F is a known function and
If
(i) F(~,8)
"
= 0, that is 8(£)
ITI
c
8 where
TI' • (TI,(e) ••••• TI t (8»;
(1i) F is totally differentiable in P1 ••••• Pt and
"
8 at TI • TI(S);
...
(iii)
a!
as TI...
~ OJ
-
4It
92
(iv)
aa does
a--
not depend directly on n and is totally dif-
Pi
ferentiab1e in p at W. for i • 1 ••••• t.
~
~
A
then there is a sequence of estimators'
f[a ] •
i
a+
{a i }
such that
o(n-(i+1».
Proof:
Direct from Theorem (4.4) assuming that the regularity conditions mentioned in the proof of Theorem (4.4) hold here also.
3. Applications
3.1. On estimating the density of bacteria in a solution.
Consider estimating the density of bacteria in a solution by
the quanta1 response method.
Data of this type are generated by the
inoculation of several sterile tubes (or plates) for each aliquot
taken from a sequence of serial dilutions of the original solution.
From the number of fertile tubes (i.e •• tubes showing growth after
incubation). density estimates are derived.
.
Cornell and Speckman (1967)
.
. reviewed
this'
statistical problem in detail; and their conclusions
.
.
. indicated that the maximum likelihood
e~timate
has satisfactory proper-.
ties for both large and small sample sizes in such experiments.
Enumeration of "bacteria by the maximum likelihood method (or
what is essentially the same as the 'most probable number' (MPH) procedure as named by McCrady (1915»
is based on two assumptions.
1. The distribution of the individual bacteria cells is random
without aggregation of any kind. and hence the number of bacteria
in a small unit of the solution follows a Poisson distribution.
2. Growth will ensue in a sterile tube with the introduction of one
or more bacteria.
93
Under these assumptions the probability that an inoculated tube
.
-AZ
is sterile (shows no growth after incubation) is e
t where A is the
density per inoculator unit in the original solution and
centration of the inoculator used.
tube is 1-e
-AZ
Z
is the con-
Hence the probability of a fertile
,
•
•
At the time point for which the density estimate is desired t
we create k dilutions zlt ••• tZk from a portion of original solution.
At the i-th dilution we inoculate n
i
tubes.
The dilution and inocula-
tion of aliquot processes are undertaken so that individual tube
responses are independent.
From these assumptions t it follows that
the number Xi of fertile tubes for the i-th dilution has the binomial
distribution
(4.3.1)
Thus t the k-tup1e
=
~'
=
L(XtA)
(X1' •••
'~)
has the likelihood
k ni
-AZ i Xi -AZ i ni-x i
II ( ) (l-e
) (e
)
•
i=l Xi
(4.3.2)
....
The 'most probable number' of bacteria is the value of A, say A, which
maximizes (4.3.1) or, equivalently, the value A which solves the
equation (see Koch and Tolley (1973»:
....
-AZ
k
I:
xizie
x
i ..l
-AZ
l-e
....
If k=l, then A ..
i
(4.3.3)
i
n-x
1
1
log
-•
n
zl
--
.
" to
However, for k>2, the solution A
(4.3.3) must be obtained by iterative methods.
Algorithms ·for doing
this have been given by Finney (1952) and Peto (1953).
•
94
Now, our purpose is to reduce the order of the bias of A by
applying Theorem (4.5).
Reducing the order of the bias of
Note that
Clearly we have
A
'"A is the solution of the equation
F(~,A) = 0,
where u =
~
Also it is easy to check that
E(~).
'"
F(~,A)
That is,
~(X)I
= o.
=
A.
u
o.
2'"
~
i l l b e use d i n th e nex t s t eps.
~ ~
whi chw
Now we want to find
ax!
1. Differentiating both sides of equation (4.3.3) with respect to Xi
we get
k·
r
j=l
j,i
'"
AZ
+
zi(e
i
2
-1) - xiz i e
'"
AZ
i
x
(e
AZ
That gives
,..
aA
• rx:-i
i_I) 2
,..
k
r
2
XjZ j e
AZ j
x
j-1
(e
AZ j
,..
-1)
,..
(aA )
aX i
2
-
z1 e
AZ i
•
x
(e
AZ1
-1)
(4.3.4)
95
Hence
(4.3.5)
where
Zi
---';;"",-
•
,
-AZ
(1-e
i)
and
2
k
'"
D(~,A)
XjZ j e
= E
'"
AZ
j
k
= E
x
j=1
(e
AZ j
2
XjZ j
x
j=1 2[Cosh (AZ ) - 1]
j
2
- 1)
2. We have
(4.3.6)
(4.3.7)
'"
AZ
(e
k
...
E
j
-
1)2
XjZ 3j e
'"
AZ
j-l
j"i
'"
2AZ
2x j z j e
j ~
i
4
j - 1)
(e
3
'"
'"
AZ (aA )
'"
j
AZ
(e
'"
j-1) •
(~)
x
AZ
(e
4
j - 1)
•
96
(e
'"
AZ
i
4
- 1)
After simplifications we get
'"
Fi(A)
'"
aD(X,A)
aX
=
k
[ Lx e
j=l j
x
-
D(X,A)
i
+e
'"
-Az
i
'"
-AZ
.",
'" 2
j (F ()J)
(Zj - 2F j (A»]
j
'" 2
(F i (A» ,
(4.3.8)
or equivalently,
'"
_aD_(:--~_,A_) = __
-.....;zi~_
aXi
.'"
-AZ
(l-e
i)
2
+
s.
z.l.
x
2 [Cosh(Az ) - 1]
i
By substitution in (4.3.6) we get (after simplification):
(4.3.9)
97
or, equivalently
•
k
r
j=l
(4.3.11)
Lemma (4.6)
The estimate "Al of A given by
"
"
1 k
A "" A - - r
1
2 i=l
satisfies
Proof:
Consider the expansion of "A
"
A "" A +
Since for
i~j
k
r
about~:
a~
(]Xi)U (xi-u i )
i-I
Xi and x j are independent, we get
"
1 k a2~
E(A) .. A + 2' r (2)
i-I
aXi
n
~
i
e
-AZ i
-AZ
-2
(l-e i ) + O(n i ).
•
98
Now we are looking for a function
1
-AZ
2~
k
(~)
=- r
aXi
2 i=l
H(~,A)
n
~
i
e
such that
-AZ
i(l - e
i)
(4.3.12)
~
G1(~,A)
Clearly, the function
satisfies this property; hence by
Theorem (4.4) the conclusion follows.
A
We note that
fying (4.3.12).
~
n
i
e
-AZ i
G1(~,A)
~
is not the only choice for
Let G2 (A) be the same as
G1(~,A)
H(~,A)
satis-
with Xi replaced by
~
(1 - e
-Az i
).
Then we claim
Thus we have:
Lemma (4.7)
A
"
A
A
The estimate A2 of A given by A • A - G2 (A), where G2 (A) is
2
as defined previous1Y,satisfies
Proof;
The same as Lemma (4.6).
Lemma (4.8)
Let
and
99
(a) gi(~)
= Fi(~)'
K(u)
...
= D(~,A)
"
lu
....
and
Hence
2
(8i (x»
nj-x
k
k
j
(b) W(x) • - L
[L x ( - -)
....
2 i=l { (K(x»3 j=l j n
j
...
1
. 2
(gj(x»
(Zj - 2g j (x»]
satisfies
So
2"
-AZ
-AZ
1 k
i (1 - e
i) •
W(u...) = - L (~)
~~.
ni e
2 i=l d1C u
i
...
" =A
" - W(x) satisfies
(c) A
3
Proof:
(a) and (b) can be shown by direct substitution, and (c)
follows directly from Theorem (4.4).
Thus, we have produced three
"
-2
esti~ates
for A, all
in the sense that E(A j ) • A + O(n i ), j • 1,2,3.
equiva~ent
One may prefer one
estimate over the others according to the computational aspects and
the behavior of the variance.
Generally, A is not recommended because
3
the computations are disturbed when xi • 0 for any i.
•
100
A numerical example
For the model described earlier, we consider the special case
where we have three dilutions with n l
Z
2
I:
.001, and z3
= n 2 • n3
I:
10 and zl • .01,
= .0001.
A.
A.
A.
The expected values of A, Al and A2 are shown in Table (4.1)
for different values of A.
Also shown are
Bias(A )
i
I---x-r=;..-I x
100, i • 1,2.
Bias(A)
That shows the procedure introduced here has significant effect
in reducing the bias of A.
A
For the
A.
V(A ) as well as
2
sam~
A
values of A, Table (4.2) shows V(A), V(A l ) , and
V(A )
A
A
i
x x 100, i • 1,2; which shows V(A i ) < V(A),
V(A)
1 • 1,2, for all values of A considered here.
Similar information concerning the mean square error is given
in Table (4.3).
101
•
TABLE 4.1
A
A
A
A
EXPECTED VALUES OF A, Al AND A WIre
2
A
Bias(A 1 )
A
X
100
Bias (A)
Bias(A2)
A
X
100 FOR DIfFERENT VALUES OF A
Bias (A)
A
Bias(A 2)
A
"A
"
Al
"
A
2
Bias(A)
X 100
A
B1as(A 2)
"
Bias(A)
X 100
25
26.03
24.77
24.78
22.3
21.4
50
52.78
49.86
49.90
5.0
3.6
75
79.84
74.74
74.88
5.4
2.5
100
107.44
99.45
99.82
7.4
2.4
150
164.63
148.18
149.67
12.4
2.3
200
223.95
196.13
199.53
16.2
2.0
300
342.41
292.14
299.09
18.5
2.1
400
453.57
389.70
397.67
19.2
4.3
500
558.18
488.16
495.39
20.4
7.9
600
660.33
587.26
593.24
21.1
11.2
700
762.97
686.96
691. 91
20.7
12.8
800
867.46
787.06
791.46
19.2
12.7
900
974.21
887.23
891.66
17.2
11.2
1000
1083.23
987.14
992.18
15.5
9.4
1100
. 1194.33
1086.54
1092.76
14.3
7.7
1200
1307.32
1185.28
1193.25
13.7
6.3
•
102
TABLE 4.2
" AND "A FOR DIFFERENT VALUES OF A
VARIANCES OF "A, Al
2
"A
"
Al
"A
2
V(A" 1)
x
A
X
V(A)
100
V(A )
2
x
V(A)
X
25
294
262
262
89.1
89.1
50
678
582
586
85.8
86.4
75
1,219
990
1,007
81.2
82.6
100
1,988
1,499
1,556
75.4
78.3
150
4,475
2,949
3,191
65.9
71.3
200
8,384
5,418
5,855
64.6
69.8
300
19,511
15,157
15,292
77.7
78.4
400
33,976
30,685
29,870
90.3
87.9
500
51,902
49,383
47,901
95.1
92.3
600
73,510
69,474
68,001
94.5
92.5
700
98,869
90,191
89,420
91~2
90.4
800
128,094
111,334
111,905
86.9
87.4
900
161,471
133,045
135,583
82.4
84.0
1000
199,503
155,714
160,860
78.1
80.6
1100
242,676
.179,787
188,206
74.1
77 .6
1200
291,545
205,792
218,140
70.6
74.8
100
103
TABLE 4.3
•
"" "MEAN SQUARE ERROR OF A,
Al AND A2
"-
A
"A
"
Al
"
1.
2
"-
MSE( )
1
"
x 100
MSE{>.)
MSE{A )
2 x 100
"
MSEO)
25
295
262
262
88.8
88.8
50
686
582
586
84.8
85.4
75
1,242
990
1,007
79.7
81.1
100
2,043
1,499
1,556
73.4
76.2
150
4,689
2,952
3,191
63.0
68.1
200
8,958
5,433
5,855
60.6
65.4
300
21,310
15,219
15,293
71.4
71.8
400
36,846
30,791
29,875
83.6
81.1
500
55,287
49,523
47,922
89.6
86.7
600
77 ,150
69,636
68,047
90.3
88.2
700
102,834
90,361
89,485
87.9
87.0
800
132,645,
111,501
111,978
84.1
84.4
900
166,978
133,208
135,653
79.8
81.2
1000
206,430
155,879
160,921
75.5
78.0
1100
251,574
179,968
188,258
. 71.5
74.8
1200
303,063
206,009
218,186
68.0
72.0
•
104
3.2. A note on the 10git
x2-statistic
To test homogeneity in contingency tables we may use the 10git
x2-statistic TL mentioned in Chapter I.
The version of this statistic
for 2x2 tables with n=m is given by
xy(n-x) (n-y) [log ~ - log~]
_
n-x
n-y
TL(x,y) 2
2
n[(nx-x) + (ny-y )]
2
There are two problems with TL(x,y).
1. First, the statistic TL(x,y) does not satisfy the convexity
condition mentioned in Chapter II (given that xy(n-x) (n-y)
~
0).
= 16, and consider TL(13,1) = 11.799,
T (13,2) = 11.861, TL (14,1) = 13.222 and T (14,2) = 13.253. We treat
L
L
As an example for that, let n
Let X ~ B(n,n ), Y ~ B(m,n )
2
1
Consider the problem of testing the
this problem intuitively as follows.
where x,y are independent.
hypothesis:
n
1
log ---l-nl
H:
o
n
2
= log ---1-n
2
A natural estimator for' f(n)' is given by f(p),
Let fen) = log ---1
-n •
where p = x/no Theorem (4.1) gives for fen) the reduced order bias
n
estimator
f ()
1 -E- + (1-2p)
2np(1-p)·
1 p • og 1-p
N~
~
f (n) + f'(n) (p-n)
1
and the approximate variance of f 1 (p) based on that expansion is
f 1 (p)
given by:
105
V1 [f (p)]
1
= [f'(n)]2 V(p)
•
2
2 2
• J2nn(1-n) - «l-n) + n )]
33
4n3
n (l-n)
= v 1 (n,n).
where P1
= x/n and P2 • y/m.
,
Now for n=m=16 we have:
TL (13,1)
M
= 21.346
T~(14,1)
= 24.530
T~ (13,~)
T~
and
Table (4.4) shows some values of TL and
T~
= 14,579,
(14,2).= 17.393.
for some values of n.
•
106
TABLE 4.4
Examples of the effect of using the modified logit x2-statistic
Table
n=16
n=17
n=18
TL
Table
T~
(13,1)
11. 799
(13,2)
14.579
(13,2)
11.861
(14,2)
17.393
(14,1)
13.222
(13,1)
21.346
(14,2)
13.253
(14,1)
24.530
(13,1)
11. 237
(13,2)
13.857
(13,2)
11.413
(14,2)
16.136
(14,1)
12.678
.(15,2)
18.978
(14,2)
13.012
(13,1)
20.674
(15,1)
14.069
(14,1)
23.268
(15,2)
14.329
(15,1)
26.436
(13,1)
10.746
(13,2)
13.309
(13,2)
10.973
(14,2)
15,380
(14,1)
12.096
(15,2)
17.672
(14,2)
12.562
(13,1)
20.162
(15,1)
13.529
(16,2)
20.532
(15,2)
14.138
(14,1)
22.586
(16,1)
14.825
(15,1)
25.151
(17,1)
15.162
(16,1)
28.296
(16,2)
15.374
(17,1)
3'8.088
107
2. The second problem with T is the existence of O-cells (i.e.,
t
when xy(n-x) (n-y)
=
0).
A common rule to use in this situation is to
replace any observed zero in the table by 0 > 0 (see Berkson (1955»,
usually 0
1
2.
D
This rule is not appropriate to be used in our situa-
tion because it leads to a violation of the convexity condition.
example, let n
= 16;
T (8,0) < T (8,1).
t
t
then T (8,1)
t
= 5.570
1
and T (8'2)
t
= 5.095,
•
For
i.e.
On the other hand we may get better results if
we use this rule with
T~.
Table (4.5) shows that for n • 16.
The
corresponding values of T and T are shown in the table also.
p
n
•
108
TABLE 4.5
Further examples of the effect of using the modified logit x2-statistic
T
L
T~
(7,0)
4.369
22.201
8.960
12.444
(7,1)
4.570
7.724
6.000
7.385
(8,0)
5.095
27.446
10.667
16.000
(8,1)
5.570
9.691
7.575
9.924
(9.0)
5.858
32.562
12.522
20.571
(9.1)
6.632
11. 776
9.309
13.128
(10,0)
6.675
37.333
14,545
26.667
(10,1)
7.771
13.967
11.221
17,280
(11,0)
7.569
41.540
16.762
35.200
(11,1)
9.005
16.293
13.333
22.85;
(12,0)
8.568
44.994
19.200
48.000
(12,1)
10.350
18.732
15.676
30.730
Table
T
P
TN
CHAPTER V
GENERALIZATION OF THE BIAS-REDUCTION PROCEDURE
1. Introduction
In this chapter we consider the extension of Theorem (4.2) to a
more general case.
Let x be a random variable with distribution function G(8),
where 8 E 0 is an unknown parameter.
Based on a sample of size n from
G, let 8 be an estimate of 8(satisfying some nonrestrictive conditions
to be mentioned in Theorem (5.1».
"
A natural estimator is f(6),
but this is usually
estimating f(6).
biased.
Suppose that we are interested in
The purposes of this chapter are to:
1) Generalize the procedure introduced
i~Theorem
(4.2) to
reduce the order of the bias of a biased estimator.
2) Find the U.M.V.U. estimator of f(8) in some cases.
2~:'Theorem
(5.1)
Let x be a random variable with distribution function G(8),
" be an estimator of f(8), based on a sample of size n
e E O. Let f(8)
"
from that distribution, such that E(8)
a
8.
Assume that f is totally
differentiable and
(i)
(11)
(iii)
E(e _ 8)k
< ~
E(e _8)2k-l • O(n~k), E(e _ 8)2k • O(n- k)
Ey(e" - e) k • [E(e" - 8) k Je-y is differentiable for all
y
E
(8±o), 0 > O.
•
110
i
(iv) f (1) (e) = (2.,J)
is independent of n for i - 1,2, •••
i
ae "
e-e
(v) e
E
int(n).
Then there is a sequence of functions {f i } such that
Proof:
" about
From the expansion of fee)
f[f(8)] = f (e) +
¥"
(e) f(e-e)2
e,
we have
+ ~"'(e) f(e-e)3 +
6
(5.2.1)
Assumptions (i) and (ii) imply that (5.2.1) can be written as
"
f[f(e)]
= fee) +
g11(e)
g12(e)
n
+
2 + ...
(5.2.2)
n
where the 8lj'S do not depend on n (we note that gll(e) =
"2·
811 (e)
~"(a) f(8-e». Let H11 (a) = n
and write (5.2.2) as
"
f[f(8)] • f(a) + H1l (a) + H12 (a)
-1
-2
Clearly H11 (a) = O(n ) and H12 (a) = O(n ).
(5.2.3)
~
Define f 1 (e) as
"
"
"
f 1 (a) = fee) - H11 (a)
(5.2.4)
The claim is
(5.2.5)
" about 8:
To show that, consider the expansion of fl(a)
f 1 (8) • fee)
+ £'(a)(8-e) + ~"(e)(8-e)2
+ ~"'(e)(8-e)3 + •••
- H11 <e) - Hi1<e) (e-e) -
~ii(e)(8-e)2
- •••
(5.2.6)
111
whence
E[f 1 (8)]
- H11 (8) -
- ~ii(a)
Hii(8)
1uii(a) E(8-a)2 - ~ii'(8)
E(8-8)3 - •••
+ ~"'(a) E(8-8)3 + ••.
= f(a)
Noting
•
= f(8) + ~"(8) E(a-8)2 + it"'(8) E(a-8)3 + ••.
"2
E(8-8)
E(8_a)2 =
O(n
-2
~ii'(8) E(6-8)3 - •••
(5.2.7)
), we see that (5.2.5) is established.
Now write (5.2.7) as
"
E[f (8)]
1
g
(a)
= f(8) + 21 2
n
+ •.• ,
(5.2.8)
where g2j'S do not depend on n.
. Let H (8)
21
=
g21(e)
2 and write (5.2.8) as
n
" • fee) + H (a) + H (8)
E[f 1 (8)]
21
22
Clearly, H (8)
21
= O(n-2 )
and H (8)
22
"
f 2 (8)
"
= f 1 (8)
= O(n-3 ).
"
- H21 (8)
(5.2.9)
" as
Define f (8)
2
(5.2.10)
Since
and
it follows that
•
112
Continuing the process, we reach £i-1 such that
-i
A
E[fi_l(S)] = f(S) + O(n
).
A
Write E[fi_l(S)] as
(5.2.11)
where the gij'S do not depend on n.
Let Hil(S) =
gil (e) .
i
and write (5.2.11) as
n
A
E[f 1_l (S)] = f(S) + Hil(e) + Hi2 (e)
(5.2.12)
-1
-(1+1)
Clearly, Hil(S) = O(n ) and HiZ(S) = O(n
).
A
Define fi(S) as
A
A
A
fi(e) = f i _l (8) - Hil(S).
(5.2.13)
Noting that
and
we obtain
which proves the theorem.
~:
We did not say any thing about the variance of the members of
the sequence {f i } with respect to the variance of f.
We shall consider
113
that later 1n terms of Rao's second order eff1c1ency (Rao (1963».
Now conS~der the function fk(e) as defined by Theorem (5.1).
~
Suppose that f k (8) can be written as
where the 8 1 'S do not depend on n.
Then we have
Lemma (5.2)
If 8 1 (8) 1s bounded almost everywhere, then as n ~ ~
Proof:
,.
Let h (8)
n
k
= i=l
1:
8 (8)
i
n
,.
Since 8 (8) is bounded (a.• e) there is
i
i
,.
181 (8)/
a pos1tive constant k 1 < ~ such that
K - max{ki }.
1
Now for every
€
>
0
~ k
i
(a.e).
Let
we have
m
...
1im{lim p( U [/hi(S)1 ~ €])}
n-+oo nr+oo ian
m'
k
-t
< lim{lim p( U [K 1:
n-+oo nr+oo ian
j=1 i
m
~ €])}
k
• lim{lim p( U [K' ik- I ~ €] )}
n-+oo nr+oo i-n
i
(i-I)
k
• lim{p(K n - 1 ~ E)}
n-+oo
n k (n-l)
• lim{ (K >
p
n-+oo
k
€
n en-I»} •
k
n - 1
o.
A-
fee) •
•
114
"-
We have shown that fk(e) has the same asymptotic distribution
as fee).
Since comparing the exact variances of fee) and fk(e) is not
feasible, in almost all cases, we shall use the concept of second order
"-
efficiency (see Rao (1961»
"-
to differentiate betweenfk(e) and fee).
"
Let V[fi(e)]
be written as:
•••
n
.•
= 0,1,2, •••
i
and
"-
Then we say that fi(e) possesses the second order efficiency
" iff
property over fee)
~i2(e)
< ~02(e).
The following lemma could be used to verify this property for
all members of the· sequence {file
Lemma (5.3)
~i,i+1(e)
= ~~,i+1(e)
for ~ ~ i + 1, i
= 0,1,2, •••
Proof:
"-
.Consider the function fi(e) as defined by Theorem (5.1); then
"-
fi(e) can be written as:
Clearly,
,.
,.
V[f i +1 (e)] • V[fi(e)] +
1
+
n
2
,.
"-
i+1 Cov[fi(e), gi+1<e)]
n
,.
21+2 V[g1+1(e)]
• V[f (8)] + O(n-(1+2».
i
115
3. We consider the application of Theorem (5.1) to some special cases.
Throughout this section consider a sequence of independent and identin
cally distributed random variables xl' ••• ,x •
n
Let x
= r xi!n and
i=l
let f be a function satisfying the conditions of Theorem (5.1).
3.1. The Normal Case
Let x be a random variable distributed as
known.
2
2
) where a is
N(~,a
Suppose that we are interested in estimating
f(~).
Lemma (5.4)
Let Xi
~ N(~,a
2
) where a
in Theorem (5.1) for estimating
fk(x)
2
is known.
f(~)
The sequence {f } mentioned
i
is given by:
(2i)
= f(x) + ~~ (_l)i
- f
(x)
i=l (2i) 1
E(x-~)
2i
,
or, in another form,
Proof:
Direct application of Theorem (5.1).
It is reasonable to consider the following.
Definition
f(~) is estimable iff f~(x)
= lim
fk(x) exists for every
~
admissible value in the domain of f.
Lemma (5.5)
If f~(x) exists, then it is the unique unbiased estimator of
f(~) which is a function of
i and it has uniformly smallest variance.
•
116
Proof:
We recall the following theorem:
Theorem (5.6) (Lehmann (1949»
Let x have density Ge(x)j let T be a sufficient statistic
for ej and suppose that T is complete.
Then every estimable function
f(8) possesses an unbiased estimator with uniformly smallest variance,
and this estimator is the unique unbiased estimator of f(8) which is a
[See Lehmann (1949, Chapter III, p. 6)].
function of T.
Since
x is a
complete sufficient statistic for ~, the con-
elusion follows.
Having the previous definition, the estimability of
f(~)
can
be studied in terms of the convergence or the divergence of the
infinite series representing f CXl •
Naturally, if f exists everywhere and f (2i) , ~. • 1,2, ••• is
uniformly bounded on the real line then
f(~)
is estimabie.
Lemma (5.7)
Suppose that f exists everywhere on the real line.
If there is
a function g which exists everywhere on the real line such that
If(2i)1 ~ g for all i (everywhere), then f(~) is estimable.
Proof:
CXl
Consider the series f(x) + E
i-I
1
iT
_ (12i
g(x)
i ' for which we
(2n)
have
(12i
_
1f(x)
+ E iT
g(x) ~- - f(x)
(2n)1
1-1
00
+
(12
_
2n
s(x)(e
- 1) <
But, for every i,
1(-1)
11
1
21
21
f(21)(x) (1
I < 1- g(x) (1
•
(2n)1 - 11
(2n)i
00.
117
This implies that the series
f~(x)
= f(x) +
converges absolutely.
L -->-(_....l=)_i f (2i) (-x) a
ial
if
2i
(2n)i
.:
Hence f~(x) exists and f(~) is estimable.
Lemma (5.8)
If
~ [k;l
then
f(~)
f (2k+2) (x)]
f(2k) (x)
J
2
Q.... > 1
n
is not estimable.
Proof:
Direct application of testing the convergence of the series
representing f~(i).
Example 1:
f(~)
- ~
a
is estimable iff a is a positive
inte~er.
Proof:
Suppose a is nonintegral; then
f(k~x) = a(a-l) ••• (a-k+l) xa-k
and
_1_
lim
k-+oo
[
• ;: [
Hence
~
a
k+l
f (2k+2) (x)' ] a2
f(2k) (x)
n
~l (a-2k)(a-2k-l)] 2:~n • ~ •
is not estimable.
Similarly it can be shown that if a is a negative integer then
~a is not estimable.
From that we draw the conclusion that g{~) - log ~
•
118
is not estimable even if
~
is restricted to be positive.
It is of interest to obtain a general formula for the variance
of f •
k
The following lemma gives an expression for the variance of the
U.M.V.U. estimator of
f(~)
(if it exists).
Lemma (5.9)
ClO
L
i=l
Proof:
Clearly
2 2
V[f(x)] • [f'(~)] £+ O(n- 2).
n
Now
f
1
(x)
=
f
(~) +
1
(f'
(~)
-
-k"
'(~)
2
2
+ l(f' '(~) _ ~iv (~) £-) (x-~)
2
+
1
-(f' ,
6
2
2
Q...)(x-~)
n
2
n
1 v
'(~) -.;£
2
02
_
(~) - ) (x-~)
n
3
+
...
and
222
V[f (x)].
1
[f'(~) -~"'(~)
£-] Q...
2
n
n
...
119
22'
... [ft(ll)]
Now
~ + ~[flt(ll)]
2
4
0"2
n
+
0(n- 3 ).
•
+ •..
•
120
,
6
+ ~'(ll) fV(ll) °3 + ¥"(ll)
f v (ll)
n
n
3
2 4
2 2
.. [f' (ll) ]
6
£..-]
£..- + l[f" (ll) ] £..2
n
+ ~[f"'(ll)]
3
n
2
6
4
°3 + O(n- ).
n
Continuing the process and using lemma (5.3) we reach the conclusion.
Corollary (5.10)
V[fk(x)]
=
k+l 1
(
L -- [f i)(ll)]
i=l il
2 2i
a + O(n-(k+2».
ni
Proof:
Use lemma (5.3).
Lemma (5.11)
The second order efficiency of {f } over f is established iff
i
> 0 over the admissible range of f.
f'f'"
Proof:
Note that
V[f (x)]
=
2 a2
[f' (ll) ]
2
4
-- + [l(f" (ll»" + f' (ll) f'" (ll)]
n
2
.
2.-2
n
and that
2
V[fk(x)] .. [f'(ll)]
2
2
4
~ + t[f"(ll)] ~ + O(n-3).
n
The conclusion follows.
Example 2:

1. Let f(μ) = sin μ. The U.M.V.U. estimator of sin μ is f_∞(x̄) = e^{σ²/(2n)} sin x̄ (Lemma (5.4)). Lemma (5.9) gives

V[f_∞(x̄)] = cos²μ [σ²/n + (1/3!)(σ²/n)³ + (1/5!)(σ²/n)⁵ + ⋯] + sin²μ [(1/2!)(σ²/n)² + (1/4!)(σ²/n)⁴ + (1/6!)(σ²/n)⁶ + ⋯]

= cos²μ sinh(σ²/n) + sin²μ [cosh(σ²/n) − 1]

= (1/2) e^{σ²/n} + (1/2) e^{−σ²/n} (sin²μ − cos²μ) − sin²μ.

From that we get

V(sin x̄) = e^{−σ²/n} V[f_∞(x̄)] = 1/2 + (1/2) e^{−2σ²/n} (sin²μ − cos²μ) − e^{−σ²/n} sin²μ.
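As a check on the closed form, the following sketch (ours; the values of μ, σ², and n are arbitrary) simulates f_∞(x̄) = e^{σ²/(2n)} sin x̄ and compares its sample variance with the expression just derived.

```python
import math, random

random.seed(1)
mu, sigma2, n, reps = 0.7, 1.0, 5, 500_000
s = sigma2 / n                              # Var(xbar)
vals = [math.exp(s / 2) * math.sin(random.gauss(mu, math.sqrt(s)))
        for _ in range(reps)]
m = sum(vals) / reps
var_mc = sum((v - m)**2 for v in vals) / reps
var_exact = (0.5 * math.exp(s)
             + 0.5 * math.exp(-s) * (math.sin(mu)**2 - math.cos(mu)**2)
             - math.sin(mu)**2)
print(var_mc, var_exact)                    # agree to Monte Carlo accuracy
```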
2. Let g(μ) = cos μ. The U.M.V.U. estimator of cos μ is g_∞(x̄) = e^{σ²/(2n)} cos x̄, and

V[g_∞(x̄)] = (1/2) e^{σ²/n} + (1/2) e^{−σ²/n} (cos²μ − sin²μ) − cos²μ.

Hence

V(cos x̄) = 1/2 + (1/2) e^{−2σ²/n} (cos²μ − sin²μ) − e^{−σ²/n} cos²μ.
3. From 1 and 2 we get Cov(cos x̄, sin x̄). Consider

E(e^{i2x̄}) = E(cos x̄ + i sin x̄)² = E(cos²x̄ − sin²x̄) + i2 E(cos x̄ sin x̄).

Thus

i2 E(cos x̄ sin x̄) = E(e^{i2x̄}) + E(sin²x̄ − cos²x̄).

But

E(sin²x̄ − cos²x̄) = e^{−2σ²/n} (sin²μ − cos²μ).

Hence

i2 E(cos x̄ sin x̄) = e^{−2σ²/n} e^{i2μ} + e^{−2σ²/n} (sin²μ − cos²μ)

= e^{−2σ²/n} (cos²μ − sin²μ + i2 cos μ sin μ) + e^{−2σ²/n} (sin²μ − cos²μ)

= i2 e^{−2σ²/n} cos μ sin μ.

Thus

E(cos x̄ sin x̄) = e^{−2σ²/n} cos μ sin μ;

hence

Cov(cos x̄, sin x̄) = e^{−2σ²/n} cos μ sin μ − e^{−σ²/n} cos μ sin μ = e^{−σ²/n} (e^{−σ²/n} − 1) cos μ sin μ.
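The covariance identity can be verified the same way; the values below are illustrative, not from the dissertation.

```python
import math, random

random.seed(2)
mu, s, reps = 0.7, 0.2, 500_000             # s = sigma^2/n
xs = [random.gauss(mu, math.sqrt(s)) for _ in range(reps)]
cs = [math.cos(x) for x in xs]
sn = [math.sin(x) for x in xs]
mc, ms = sum(cs) / reps, sum(sn) / reps
cov_mc = sum(c * v for c, v in zip(cs, sn)) / reps - mc * ms
cov_exact = math.exp(-s) * (math.exp(-s) - 1) * math.cos(mu) * math.sin(mu)
print(cov_mc, cov_exact)                    # agree to Monte Carlo accuracy
```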
4. Let h(μ) = e^μ; then the U.M.V.U. estimator of h(μ) is h_∞(x̄) = e^{x̄ − σ²/(2n)}, and

V[h_∞(x̄)] = e^{2μ} (e^{σ²/n} − 1).

5. Let f(μ) = μ^k, where k is a positive integer. The U.M.V.U. estimator of f(μ) is

f_∞(x̄) = x̄^k + Σ_{i=1}^{m} ((−1)^i/i!) [k!/(k−2i)!] x̄^{k−2i} σ^{2i}/(2n)^i,

where m = k/2 if k is even and m = (k−1)/2 if k is odd. Also

V[f_∞(x̄)] = Σ_{i=1}^{k} (1/i!) [k!/(k−i)!]² μ^{2(k−i)} σ^{2i}/n^i.
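For item 5 the series terminates, so the estimator can be written out directly. A sketch (ours; the values μ = 1.3, σ² = 1, n = 4, k = 3 are arbitrary) with a Monte Carlo check of unbiasedness:

```python
import math, random

def umvu_mu_pow_k(xbar, sigma2, n, k):
    # Terminating series: sum over i = 0,...,m of
    # ((-1)^i / i!) * k!/(k-2i)! * xbar^(k-2i) * (sigma^2/(2n))^i.
    t = sigma2 / (2.0 * n)
    m = k // 2                       # m = k/2 (k even) or (k-1)/2 (k odd)
    return sum((-1)**i / math.factorial(i)
               * math.factorial(k) / math.factorial(k - 2 * i)
               * xbar**(k - 2 * i) * t**i
               for i in range(m + 1))

random.seed(3)
mu, sigma2, n, k, reps = 1.3, 1.0, 4, 3, 400_000
est = [umvu_mu_pow_k(random.gauss(mu, math.sqrt(sigma2 / n)), sigma2, n, k)
       for _ in range(reps)]
print(sum(est) / reps, mu**k)        # agree up to Monte Carlo error
```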
3.2. Other distributions

Lemma (5.12) (The binomial distribution)

Let x ~ B(n,π), let p = x/n, and let f be a differentiable function; then:

(i) f₂(p) = f(p) − (1/(2n)) f''(p) p(1−p) − (p(1−p)/(24n²)) [4f'''(p)(1−2p) − 3f^(iv)(p) p(1−p)]

(where f₂ is the second element of the sequence {f_i} described by Theorem (4.2) to estimate f(π));

(ii) σ₁²(π) < σ₀²(π) iff f'(π)[π(1−π)f'''(π) + (1−2π)f''(π)] > 0 for all π.

Proof:

(i) Direct application of Theorem (5.1).

(ii) Note that

V[f(p)] = (f'(π))² π(1−π)/n + (1/n²)[(1/2)(f''(π))² π²(1−π)² + f'(π)f''(π) π(1−π)(1−2π) + f'(π)f'''(π) π²(1−π)²] + O(n⁻³)

and

V[f₁(p)] = (f'(π))² π(1−π)/n + (1/n²)[(1/2)(f''(π))² π²(1−π)² + f'(π)f''(π) π(1−π)(1−2π) + f'(π)f'''(π) π²(1−π)² − f'(π)f''(π) π(1−π)(1−2π) − π²(1−π)² f'(π)f'''(π)] + O(n⁻³).

So

V[f(p)] − V[f₁(p)] = (π(1−π)/n²) f'(π)[f''(π)(1−2π) + f'''(π) π(1−π)] + O(n⁻³).

The result follows.
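A sketch (ours) of the first member f₁(p) = f(p) − p(1−p)f''(p)/(2n) of the sequence in action, for the concrete choice f(π) = π² (so f'' = 2); since the sample space is finite, the bias is computed exactly by summing over the binomial distribution rather than by simulation.

```python
from math import comb

def exact_bias(est, n, pi, target):
    # E[est(x/n)] - target, with x ~ B(n, pi), by direct enumeration.
    return sum(comb(n, x) * pi**x * (1 - pi)**(n - x) * est(x / n, n)
               for x in range(n + 1)) - target

pi = 0.3
f  = lambda p, n: p * p                    # raw estimator of pi^2
f1 = lambda p, n: p * p - p * (1 - p) / n  # f - p(1-p) f''(p)/(2n), f'' = 2

for n in (5, 10, 20, 40):
    print(n, exact_bias(f, n, pi, pi**2), exact_bias(f1, n, pi, pi**2))
# bias of f  equals pi(1-pi)/n   (order 1/n);
# bias of f1 equals pi(1-pi)/n^2 (order 1/n^2), as the construction predicts.
```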
Lemma (5.13) (The Poisson distribution)

Let x ~ P(λ) and let f be a differentiable function. Let x̄ be the mean of a sample of size n from this distribution; then to estimate f(λ) we have:

(i) f₂(x̄) = f(x̄) − (x̄/(2n)) f''(x̄) + (x̄/(24n²)) [8f'''(x̄) + 3x̄ f^(iv)(x̄)];

(ii) σ₁²(λ) < σ₀²(λ) iff f'(λ)(f''(λ) + λf'''(λ)) > 0 for all λ.

Proof:

(i) Direct application of Theorem (5.1).

(ii) We have

V[f(x̄)] = (f'(λ))² λ/n + (1/n²)[(1/2) λ² (f''(λ))² + λ f'(λ)f''(λ) + λ² f'(λ)f'''(λ)] + O(n⁻³)

and

V[f₁(x̄)] = (f'(λ))² λ/n + (λ²/(2n²)) (f''(λ))² + O(n⁻³).

The result follows.
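A numerical check of part (i) (our sketch, with the arbitrary choice f(λ) = e^{−λ}, so f'' = e^{−λ}, f''' = −e^{−λ}, f^(iv) = e^{−λ}): the biases of f, f₁, and f₂ are computed exactly by summing over S = n x̄ ~ P(nλ), and shrink like n⁻¹, n⁻², and n⁻³ respectively.

```python
import math

def expect(est, n, lam, tol=1e-15):
    # E[est(S/n, n)] for S ~ Poisson(n*lam), summing until the pmf is negligible.
    total, p, s = 0.0, math.exp(-n * lam), 0
    while True:
        total += p * est(s / n, n)
        s += 1
        p *= n * lam / s
        if p < tol and s > n * lam:
            return total

lam = 1.5
f  = lambda x, n: math.exp(-x)
f1 = lambda x, n: math.exp(-x) - (x / (2 * n)) * math.exp(-x)
f2 = lambda x, n: (math.exp(-x) - (x / (2 * n)) * math.exp(-x)
                   + (x / (24 * n * n))
                   * (8 * (-math.exp(-x)) + 3 * x * math.exp(-x)))

for n in (5, 10, 20):
    print(n, [expect(g, n, lam) - math.exp(-lam) for g in (f, f1, f2)])
```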
Lemma (5.14) (The exponential distribution)

Let x be a random variable distributed exponentially with E(x) = θ. Let x̄ be the mean of a sample of size n from this distribution. Suppose f is a differentiable function; then to estimate f(θ) we have:

(i) f₂(x̄) = f(x̄) − (x̄²/(2n)) f''(x̄) + (x̄²/n²)[(1/2) f''(x̄) + (2/3) x̄ f'''(x̄) + (x̄²/8) f^(iv)(x̄)];

(ii) σ₁²(θ) < σ₀²(θ) iff f'(θ)(2f''(θ) + θf'''(θ)) > 0 for all θ.

Proof:

(i) Direct application of Theorem (5.1).

(ii) We have

V[f(x̄)] = (f'(θ))² θ²/n + (1/n²)[(1/2)(f''(θ))² θ⁴ + 2θ³ f'(θ)f''(θ) + θ⁴ f'(θ)f'''(θ)] + O(n⁻³)

and

V[f₁(x̄)] = (f'(θ))² θ²/n + (θ⁴/(2n²)) (f''(θ))² + O(n⁻³).

The result follows.
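Part (ii) can be checked exactly through the Gamma moments of x̄. The sketch below (ours) takes f(θ) = θ³, for which f'(2f'' + θf''') = 54θ³ > 0, so f₁(x̄) = x̄³ − (x̄²/(2n))(6x̄) = (1 − 3/n) x̄³, and verifies that V[f(x̄)] − V[f₁(x̄)] approaches the predicted gap (θ³/n²) f'(2f'' + θf''') = 54θ⁶/n² as n grows.

```python
theta = 2.0

def moment(k, n):
    # E[xbar^k] for xbar the mean of n Exp(theta) draws: xbar ~ Gamma(n, theta/n),
    # so E[xbar^k] = theta^k * n(n+1)...(n+k-1) / n^k.
    m = 1.0
    for j in range(k):
        m *= theta * (n + j) / n
    return m

for n in (25, 100, 400, 1600):
    var_f = moment(6, n) - moment(3, n) ** 2          # Var(xbar^3)
    var_f1 = (1 - 3 / n) ** 2 * var_f                 # Var((1-3/n) xbar^3)
    print(n, (var_f - var_f1) * n**2 / (54 * theta**6))   # -> 1 as n grows
```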
CHAPTER VI

NUMERICAL RESULTS FOR SMALL SAMPLES

1. Introduction

Suppose that we have two populations, each divided into two categories. Let θ_i be the probability that an observation drawn at random from the i-th population belongs to the first category, for i = 1,2. Samples of size n and m are drawn from the first and the second populations, respectively. Let x and y be the observed frequencies in the first category of the first and the second populations, respectively. Thus x ~ B(n,θ₁), y ~ B(m,θ₂), and x, y are independent. Assume that θ₁ = θ₂ = θ, and suppose that we are interested in estimating the unknown parameter θ.

Several methods of estimation could be used to find an estimate of θ. Three methods were mentioned in Chapter I, namely, the maximum likelihood, the minimum chi-square, and the minimum Neyman chi-square. These three methods are asymptotically equivalent in the sense that they lead to BAN estimates of the unknown parameter θ. On the other hand, almost nothing is known about the properties of these methods in the small sample case, except for some numerical studies (see Berkson (1955) and Odoroff (1970)).

Assume n = m, and denote the maximum likelihood estimate, the minimum chi-square estimate, and the minimum Neyman chi-square estimate of θ, as defined in Chapter I, by θ̂_R, θ̂_M, and θ̂_N respectively.
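For readers wishing to reproduce the computations of this chapter, a sketch follows (ours, in Python). It assumes the usual forms of the three estimates, since the Chapter I definitions are not repeated here: θ̂_R = (x+y)/(2n) in closed form, and the two chi-square estimates obtained by minimizing the pooled Pearson and Neyman chi-square criteria over the common θ. A fine grid search stands in for the exact minimization, and the Neyman criterion requires 0 < x < n and 0 < y < n.

```python
def estimates(x, y, n, grid=20_000):
    # Pearson X^2(t)  = [(x-nt)^2 + (y-nt)^2] / (n t (1-t))
    # Neyman  X_N^2(t) = n(x-nt)^2/(x(n-x)) + n(y-nt)^2/(y(n-y))
    pearson = lambda t: ((x - n*t)**2 + (y - n*t)**2) / (n * t * (1 - t))
    neyman  = lambda t: (n * (x - n*t)**2 / (x * (n - x))
                         + n * (y - n*t)**2 / (y * (n - y)))
    ts = [(i + 0.5) / grid for i in range(grid)]   # interior grid on (0, 1)
    t_R = (x + y) / (2 * n)                        # maximum likelihood
    t_M = min(ts, key=pearson)                     # minimum chi-square
    t_N = min(ts, key=neyman)                      # minimum Neyman chi-square
    return t_R, t_M, t_N

print(estimates(2, 4, 10))   # e.g. x = 2, y = 4, n = m = 10
```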
We now study some aspects of the behavior of θ̂_R, θ̂_M, and θ̂_N when the sample size is small: (1) the expected values of θ̂_R, θ̂_M, and θ̂_N, and (2) the efficiency of θ̂_R, θ̂_M, and θ̂_N in terms of their variances and mean square errors. Also, we shall consider the measure proposed by Pitman (1939, p. 401) to differentiate between those estimators. That measure is denoted here by

F(θ̂,θ,ε) = P{|θ̂ − θ| ≤ ε},

where θ̂ is any estimator of θ and ε > 0. Further discussion of this follows later.
2.1. The expected values

Lemma (6.1)

(i) If θ = 0, 1/2, or 1, then E(θ̂_R) = E(θ̂_M) = E(θ̂_N) = θ.

(ii) If 0 < θ < 1/2, then 0 < E(θ̂_N) < E(θ̂_R) = θ < E(θ̂_M) < 1/2.

(iii) If 1/2 < θ < 1, then 1/2 < E(θ̂_M) < θ = E(θ̂_R) < E(θ̂_N) < 1.
Proof:

That E(θ̂_R) = θ for all θ is well-known, and part (i) of the lemma is trivial.

For a given n let S denote the set of all possible results for x and y, i.e.,

S = {(x,y): x = 0,1,...,n and y = 0,1,...,n}.

Let P(x,y,θ) be the probability of observing an element of S, that is,

P(x,y,θ) = C(n,x) C(n,y) θ^{x+y} (1−θ)^{2n−x−y}.

Now

E(θ̂_M) = Σ_S θ̂_M(x,y) P(x,y,θ)    (6.2.1)

and

E(θ̂_R) = Σ_S θ̂_R(x,y) P(x,y,θ).    (6.2.2)

Let

S₁ = {(x,y) ∈ S: 0 < x+y < n and x ≠ y},
S₂ = {(x,y) ∈ S: x+y = n or x = y},
S₃ = {(x,y) ∈ S: n < x+y < 2n and x ≠ y}.

Then

E(θ̂_M) − E(θ̂_R) = Σ_{S₁} (θ̂_M(x,y) − θ̂_R(x,y)) P(x,y,θ) + Σ_{S₂} (θ̂_M(x,y) − θ̂_R(x,y)) P(x,y,θ) + Σ_{S₃} (θ̂_M(x,y) − θ̂_R(x,y)) P(x,y,θ).

The second of these three summations is clearly 0 by Lemma (1.4). In the first summation write u for x and v for y; but in the last write u for (n−x) and v for (n−y), and note that with u and v as indices this summation is now over the set S₁. Hence we have

E(θ̂_M) − E(θ̂_R) = Σ_{S₁} [θ̂_M(u,v) − θ̂_R(u,v)] P(u,v,θ) + Σ_{S₁} [θ̂_M(n−u,n−v) − θ̂_R(n−u,n−v)] P(n−u,n−v,θ).

Now recall that

θ̂_M(n−u,n−v) = 1 − θ̂_M(u,v),  θ̂_R(n−u,n−v) = 1 − θ̂_R(u,v),

and

P(n−u,n−v,θ) = P(u,v,1−θ);

hence

E(θ̂_M) − E(θ̂_R) = Σ_{S₁} [θ̂_M(u,v) − θ̂_R(u,v)] [P(u,v,θ) − P(u,v,1−θ)].

By Lemma (1.4) the first factor in each summand is positive. Since the second factor is positive if θ < 1/2 and negative if θ > 1/2, we find E(θ̂_M) − E(θ̂_R) is positive or negative according as θ < 1/2 or θ > 1/2. A similar proof will show that E(θ̂_N) − E(θ̂_R) has the opposite signs.

Since obviously 0 < E(θ̂_N) < 1, it remains only to show that E(θ̂_M) < 1/2 or > 1/2 according as θ < 1/2 or θ > 1/2. Let

S₄ = {(x,y) ∈ S: x+y < n},
S₅ = {(x,y) ∈ S: x+y = n},
S₆ = {(x,y) ∈ S: x+y > n}.

Consider

E(θ̂_M) − 1/2 = Σ_{S₄} (θ̂_M(x,y) − 1/2) P(x,y,θ) + Σ_{S₅} (θ̂_M(x,y) − 1/2) P(x,y,θ) + Σ_{S₆} (θ̂_M(x,y) − 1/2) P(x,y,θ)

= Σ_{S₄} (θ̂_M(x,y) − 1/2) [P(x,y,θ) − P(x,y,1−θ)],

the summation over S₅ vanishing and the summation over S₆ transforming into one over S₄ as before. Lemma (1.4) implies θ̂_M(x,y) − 1/2 < 0 on S₄, and [P(x,y,θ) − P(x,y,1−θ)] is positive if θ < 1/2 and negative if θ > 1/2. Hence E(θ̂_M) < 1/2 if θ < 1/2 and E(θ̂_M) > 1/2 if θ > 1/2.
Table (6.1) shows the expected values of θ̂_M and θ̂_N for θ = .05(.05).45 and n = 5(1)15,30,50. The table also shows approximate values of the maximum bias

MB = sup_{0<θ<1} |E(θ̂) − θ|.

These have actually been calculated as the maximum of |E(θ̂) − θ| over θ = i/100, i = 1,2,...,99, but from the smoothness of the calculated values we claim that the error is less than .0001.
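The expected values themselves require no simulation: the sample space is finite, so E(θ̂) can be computed by direct enumeration, as in the sketch below (ours; est is any of the three estimates, e.g. as computed in the sketch of Section 1).

```python
from math import comb

def expected_value(est, n, theta):
    # E[est(x, y, n)] over S = {0..n} x {0..n} with independent B(n, theta) counts.
    total = 0.0
    for x in range(n + 1):
        for y in range(n + 1):
            p = (comb(n, x) * comb(n, y)
                 * theta**(x + y) * (1 - theta)**(2*n - x - y))
            total += p * est(x, y, n)
    return total

# For the maximum likelihood estimate this returns theta exactly:
print(expected_value(lambda x, y, n: (x + y) / (2 * n), 5, 0.05))
```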
TABLE 6.1

EXPECTED VALUES OF θ̂_M AND θ̂_N, AND THE MAXIMUM BIAS

                                         θ
 n         .05    .10    .15    .20    .25    .30    .35    .40    .45     MB

 5   θ̂_M  .0631  .1194  .1715  .2210  .2689  .3158  .3622  .4082  .4542  .0216
     θ̂_N  .0111  .0398  .0806  .1298  .1848  .2439  .3057  .3695  .4345  .0707
 6   θ̂_M  .0629  .1185  .1698  .2189  .2667  .3138  .3605  .4071  .4536  .0198
     θ̂_N  .0129  .0450  .0888  .1400  .1957  .2514  .3144  .3757  .4377  .0614
 7   θ̂_M  .0626  .1174  .1682  .2170  .2648  .3122  .3592  .4062  .4531  .0182
     θ̂_N  .0146  .0495  .0957  .1481  .2040  .2617  .3206  .3801  .4400  .0543
 8   θ̂_M  .0622  .1164  .1667  .2154  .2633  .3108  .3582  .4055  .4528  .0169
     θ̂_N  .0162  .0535  .1015  .1546  .2104  .2674  .3252  .3833  .4416  .0486
 9   θ̂_M  .0619  .1154  .1654  .2140  .2620  .3098  .3574  .4050  .4525  .0156
     θ̂_N  .0177  .0571  .1063  .1599  .2154  .2718  .3286  .3856  .4428  .0441
10   θ̂_M  .0615  .1145  .1642  .2128  .2610  .3089  .3567  .4045  .4523  .0146
     θ̂_N  .0191  .0603  .1104  .1642  .2194  .2752  .3312  .3874  .4437  .0403
2.2. The variances and mean square errors

Lemma (6.2)

(i) If θ = 1/2, then for every n, V(θ̂_M) < V(θ̂_R) < V(θ̂_N).

(ii) There is an open interval B(n), symmetric about 1/2, such that for all θ ∈ B(n) we have V_θ(θ̂_M) < V_θ(θ̂_R) < V_θ(θ̂_N).

Proof:

We consider the left-hand inequalities; the right-hand inequalities are proved similarly. If θ = 1/2, then E(θ̂_M) = E(θ̂_R) = 1/2, and clearly

V(θ̂_M) = Σ_S (θ̂_M(x,y) − 1/2)² P(x,y,1/2),  V(θ̂_R) = Σ_S (θ̂_R(x,y) − 1/2)² P(x,y,1/2).

Thus the two variances differ only through the summands on S₁ and S₃, since the two estimates coincide on S₂ (Lemma (1.4)). But on S₁ (and, by symmetry, on S₃) we have

|θ̂_M(x,y) − 1/2| < |θ̂_R(x,y) − 1/2|

by Lemma (1.4). Hence if θ = 1/2 then for every n (≥ 2) we have V(θ̂_M) < V(θ̂_R), which is (i). Noting that V(θ̂_R) and V(θ̂_M) are both continuous functions of θ and that V_θ(θ̂_M) = V_{1−θ}(θ̂_M), (ii) follows also.

Corollary (6.3)

For every n there is an open interval C(n), symmetric around 1/2, such that for every θ ∈ C(n) we have MSE(θ̂_M) < MSE(θ̂_R) < MSE(θ̂_N).

Table (6.2) gives 2n V(θ̂_M) and 2n V(θ̂_N), and Table (6.3) gives 2n MSE(θ̂_M) and 2n MSE(θ̂_N), for θ = .05(.05).50 and the same values of n as in Table (6.1). The factor 2n has been included in order to facilitate comparison with θ̂_R:

V(θ̂_R) = MSE(θ̂_R) = θ(1−θ)/(2n).
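The scaled variances and mean square errors are obtained by the same enumeration; a sketch (ours):

```python
from math import comb

def scaled_var_mse(est, n, theta):
    # Returns (2n*V, 2n*MSE) of est over S = {0..n} x {0..n}.
    pmf, vals = [], []
    for x in range(n + 1):
        for y in range(n + 1):
            pmf.append(comb(n, x) * comb(n, y)
                       * theta**(x + y) * (1 - theta)**(2*n - x - y))
            vals.append(est(x, y, n))
    mean = sum(p * v for p, v in zip(pmf, vals))
    var = sum(p * (v - mean)**2 for p, v in zip(pmf, vals))
    mse = sum(p * (v - theta)**2 for p, v in zip(pmf, vals))
    return 2 * n * var, 2 * n * mse

t_R = lambda x, y, n: (x + y) / (2 * n)
print(scaled_var_mse(t_R, 5, 0.3))   # both equal 0.3 * 0.7 = 0.21 for theta_R
```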
Table (6.4) gives the interval B(n) (respectively C(n)) on which V(θ̂_M) < V(θ̂_R) (respectively E(θ̂_M − θ)² < E(θ̂_R − θ)²), correct to three decimal places, for n = 5(1)15,30,50. Also shown are

MD = max_θ |V(θ̂_R) − V(θ̂_M)|,

the largest difference between the two variances, and

MR = max_θ V(θ̂_R)/V(θ̂_M),

the largest ratio. Similar information is shown in Tables (6.5) and (6.6), for comparing θ̂_R with θ̂_N and θ̂_M with θ̂_N.
TABLE 6.4

THE INTERVALS IN WHICH THE MINIMUM CHI-SQUARE ESTIMATE HAS A SMALLER
VARIANCE AND MEAN SQUARE ERROR THAN THE MAXIMUM LIKELIHOOD ESTIMATE

 n      Variance        MD        MR      Mean Square Error
 5   [.180, .820]    .0038     1.1794     [.198, .802]
 6   [.170, .830]    .0027     1.1518     [.190, .810]
 7   [.162, .838]    .0021     1.1313     [.185, .815]
 8   [.155, .845]    .0016     1.1158     [.180, .820]
 9   [.150, .850]    .0013     1.1037     [.177, .823]
10   [.146, .854]    .0011     1.0938     [.175, .825]
11   [.143, .857]    .0009     1.0858     [.173, .827]
12   [.140, .860]    .0007     1.0789     [.172, .828]
13   [.138, .862]    .0006     1.0731     [.172, .828]
14   [.136, .864]    .0005     1.0682     [.171, .829]
15   [.136, .864]    .0005     1.0637     [.171, .829]
30   [.135, .865]    .0001     1.0327     [.177, .823]
50   [.140, .860]    .00005    1.0196     [.180, .820]
TABLE 6.5

THE INTERVALS IN WHICH THE MAXIMUM LIKELIHOOD ESTIMATE HAS A SMALLER
VARIANCE AND MEAN SQUARE ERROR THAN THE MINIMUM NEYMAN CHI-SQUARE ESTIMATE

 n      Variance        MD        MR      Mean Square Error
 5   [.108, .892]    .0244     1.9765     [.065, .935]
 6   [.095, .905]    .0152     1.7334     [.056, .944]
 7   [.084, .916]    .0100     1.6151     [.049, .951]
 8   [.076, .924]    .0071     1.5531     [.043, .957]
 9   [.069, .931]    .0054     1.5132     [.039, .961]
10   [.063, .937]    .0043     1.4847     [.035, .965]
11   [.058, .942]    .0035     1.4631     [.032, .968]
12   [.054, .946]    .0029     1.4460     [.030, .970]
13   [.050, .950]    .0025     1.4319     [.028, .972]
14   [.047, .953]    .0021     1.4205     [.026, .974]
15   [.045, .955]    .0018     1.4134     [.024, .976]
30   [.024, .976]    .0005     1.3489     [.013, .987]
50   [.015, .985]    .0002     1.3277     [.008, .992]
TABLE 6.6

THE INTERVALS IN WHICH THE MINIMUM CHI-SQUARE ESTIMATE HAS A
SMALLER VARIANCE AND MEAN SQUARE ERROR THAN THE
MINIMUM NEYMAN CHI-SQUARE ESTIMATE

 n      Variance        MD        MR      Mean Square Error
 5   [.125, .875]    .0282     2.3312     [.093, .907]
 6   [.111, .889]    .0180     1.9926     [.082, .918]
 7   [.101, .899]    .0121     1.7890     [.073, .927]
 8   [.092, .908]    .0086     1.6778     [.066, .934]
 9   [.085, .915]    .0065     1.6049     [.060, .940]
10   [.079, .921]    .0051     1.5524     [.055, .945]
11   [.073, .927]    .0041     1.5123     [.051, .949]
12   [.068, .932]    .0034     1.4802     [.048, .952]
13   [.064, .936]    .0028     1.4545     [.045, .955]
14   [.061, .939]    .0024     1.4330     [.042, .958]
15   [.058, .942]    .0021     1.4159     [.039, .961]
30   [.032, .968]    .0005     1.2973     [.022, .978]
50   [.020, .980]    .0002     1.2542     [.014, .986]
2.3. Pitman's measure of closeness

Definition (6.1) (Pitman, 1939)

Let θ̂₁ and θ̂₂ be two estimators of θ. Let F(θ̂_i,θ,ε) = P{|θ̂_i − θ| ≤ ε} for i = 1,2 and ε > 0. If

F(θ̂₁,θ,ε) ≥ F(θ̂₂,θ,ε) for all ε > 0,

with strict inequality for some ε, we say that θ̂₁ is better (or closer) than θ̂₂ for estimating θ.
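F itself is a finite sum, so Pitman's measure can be evaluated exactly; a sketch (ours, with an illustrative choice of n, θ, and ε):

```python
from math import comb

def closeness(est, n, theta, eps):
    # F(est, theta, eps) = P{|est - theta| <= eps} by enumeration over S.
    return sum(comb(n, x) * comb(n, y)
               * theta**(x + y) * (1 - theta)**(2*n - x - y)
               for x in range(n + 1) for y in range(n + 1)
               if abs(est(x, y, n) - theta) <= eps)

t_R = lambda x, y, n: (x + y) / (2 * n)
print(closeness(t_R, 5, 0.5, 0.1))
```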
Lemma (6.4)

Let A(n) = [1/2 − 1/(4n), 1/2 + 1/(4n)]; then for all θ ∈ A(n) and all ε > 0 we have

F(θ̂_M,θ,ε) ≥ F(θ̂_R,θ,ε) ≥ F(θ̂_N,θ,ε),

and for some ε₁, ε₂ we have

F(θ̂_M,θ,ε₁) > F(θ̂_R,θ,ε₁)

and

F(θ̂_R,θ,ε₂) > F(θ̂_N,θ,ε₂).

Proof:

For some θ ∈ [0,1] and ε > 0 let

R_ℓ = {(x,y) ∈ S: x+y < n, |θ̂_R(x,y) − θ| ≤ ε},
R_r = {(x,y) ∈ S: x+y > n, |θ̂_R(x,y) − θ| ≤ ε},
M_ℓ = {(x,y) ∈ S: x+y < n, |θ̂_M(x,y) − θ| ≤ ε},
M_r = {(x,y) ∈ S: x+y > n, |θ̂_M(x,y) − θ| ≤ ε}.

If θ ∈ A(n), Lemma (1.4) implies R_ℓ ⊆ M_ℓ and R_r ⊆ M_r for every ε > 0. Thus

(R_ℓ ∪ R_r) ⊆ (M_ℓ ∪ M_r).

But

F(θ̂_R,θ,ε) = P{R_ℓ ∪ R_r} and F(θ̂_M,θ,ε) = P{M_ℓ ∪ M_r}.

Hence for θ ∈ A(n) and ε > 0 we have F(θ̂_M,θ,ε) ≥ F(θ̂_R,θ,ε).

Let x₀ be the nearest value assumed by θ̂_M to θ ∈ A(n). Clearly (n−1)/(2n) < x₀ < 1/2 if θ ≤ 1/2, and 1/2 < x₀ < (n+1)/(2n) if θ ≥ 1/2. Taking ε₁ = |θ − x₀|, it follows that

F(θ̂_M,θ,ε₁) > F(θ̂_R,θ,ε₁).

The rest of the lemma can be shown similarly.
Lemma (6.5)

There are θ_i ∈ (0,1) and δ_i > 0, i = 1,...,4, such that

(i) F(θ̂_N,θ₁,ε) > F(θ̂_R,θ₁,ε), ε < δ₁;

(ii) F(θ̂_N,θ₂,ε) > F(θ̂_M,θ₂,ε), ε < δ₂;

(iii) F(θ̂_R,θ₃,ε) > F(θ̂_N,θ₃,ε), ε < δ₃;

(iv) F(θ̂_R,θ₄,ε) > F(θ̂_M,θ₄,ε), ε < δ₄;

hence

(v) none of θ̂_N, θ̂_R, and θ̂_M possesses the property mentioned by Definition (6.1) for all θ ∈ [0,1].

Proof:

(i) and (ii): Let A₁(n) = (0, 1/(4n)) ∪ (1 − 1/(4n), 1). Then Lemma (1.4) implies (i) and (ii) with θ₁ = θ₂ = θ ∈ A₁(n) and suitable δ₁ = δ₂ > 0.

(iii) Follows from Lemma (6.4).

(iv) Let θ₄ = 1/(2n). Let {x₁,...,x_k} be the set of all values assumed by θ̂_M, and let x₀ be the element in this set nearest to θ₄. Let δ₄ = |x₀ − θ₄|/2; then (iv) follows from Lemma (1.4) (clearly δ₄ > 0).

(v) Follows at once from Lemma (6.4) and the previous parts of Lemma (6.5).
CHAPTER VII

SUMMARY

In this dissertation we have considered the following:

1. A procedure for constructing exact tests for contingency tables

In Chapter II we introduced a procedure for constructing exact tests for contingency tables by taking advantage of the existence of an asymptotic solution to the problem using a test statistic T. Using T to solve the problem on an asymptotic basis implies that we accept the nature of the ordering induced by T on the sample space. Hence, it is completely legitimate to use T for ordering the sample space when the sample size is small.

Now, by using T, we can arrange the sample space as a totally ordered set. Where the exact distribution of T depends on unknown parameters, two methods were suggested to deal with this problem. One method is to consider BAN estimates of these unknown parameters. The second method is to divide the sample space into three sets, two of them independent of the values of the unknown parameters in the sense that H₀ is accepted or rejected for all these values, and the third set handled by the first method, considering an estimate of the unknown parameters.

When applying this procedure we face two problems. 1) If there is more than one test statistic to solve the given problem on an asymptotic basis, then which one should we use? 2) If we decide to consider an estimate for the unknown parameters, then which method of estimation shall we use?

This procedure was discussed in detail for testing homogeneity in 2x2 tables. Six competing statistics were mentioned to test for homogeneity on an asymptotic basis. One of them was eliminated because it does not satisfy some logical requirements. For the other five statistics more study is needed to choose one of them, perhaps based on their power functions. In assigning a level of significance for each table we considered the maximum likelihood estimate of the unknown parameter, because it is more convenient in calculations. In the light of the results of Chapter VI that might not be the most efficient way to estimate the unknown parameter in our situation.

In Chapter III an attempt is made to differentiate between these five competing statistics on an asymptotic basis. The theory of Bahadur efficiency (1960) is used. The Neyman chi-square statistic and the likelihood ratio statistic were favored in this comparison. But in Chapter II, in some special cases, we found that the Neyman chi-square statistic is a monotone function of Pearson's chi-square statistic; hence they have the same exact efficiency (in this special case). That leads us to require that a method of stochastic comparison between tests satisfy: if T₂ is a monotone function of T₁, then the asymptotic efficiency must be equal to 1 over all the parameter space.
2. A general theorem on bias reduction

In Chapters IV and V a general procedure for bias reduction is introduced. In Chapter IV we were generally concerned with the binomial case. An application of this procedure was carried out for: 1) estimating the density of bacteria in a serial dilution experiment (or what is called estimating the "most probable number," MPN); for the case studied here, calculations showed that this procedure was able to reduce the bias of the original estimator by up to 98%, and that the newly generated estimator has a smaller variance and mean square error than the original estimator for all the true values of the parameter considered in the study; 2) using this procedure we introduced a modified form of the logit chi-square statistic to test for homogeneity in 2x2 tables.

In Chapter V the procedure was extended to a general distribution function G(θ). Under certain conditions, we showed the existence of a sequence {f_i(θ̂)} of estimators for f(θ) satisfying E[f_i(θ̂)] = f(θ) + O(n^{−(i+1)}), i = 1,2,..., where θ̂ is an unbiased estimate of θ (this could be relaxed) and f is a differentiable function (Theorem (5.1)). It was shown that the newly generated estimators have the same asymptotic distribution as the original one, f(θ̂). Since it is not feasible to compare the exact variance of {f_i(θ̂)} with that of f(θ̂), we considered the second order efficiency (see Rao (1963)) as a basis for comparison.

This procedure was studied in detail for the normal distribution. Assuming σ² is known, formulas for the U.M.V.U. estimator of f(μ) and its exact variance were derived. The estimability of a certain class of functions was discussed, and the second order efficiency was claimed for others. In future work we intend to consider the case where σ² is unknown.

For the binomial, the Poisson, and the exponential distributions, the first two members of the sequence {f_i} were derived. In each case the second order efficiency is claimed for some class of functions. In these cases it would be of interest to obtain a formula for the U.M.V.U. estimator of the unknown parameter and its exact variance.

We have not included the version of Theorem (5.1) which deals with non-closed-form estimators (estimators reached by iterative procedures), but it is clear how to proceed in this case. It would be of interest to carry out a comparison between this procedure and the procedure introduced by Quenouille (1956), which is now known as the jackknife procedure.
3. Some numerical results concerning estimation from small samples

Let x ~ B(n,θ) and y ~ B(m,θ), where x, y are independent. In Chapter VI we considered the problem of estimating θ when n = m. The maximum likelihood, the minimum chi-square, and the minimum Neyman chi-square methods were considered. The results showed that the minimum chi-square estimator (the minimum Neyman chi-square estimator) has desirable properties if θ is in the neighborhood of 1/2 (of 0 or 1).

This study shed some light on the following:

1) Let θ̂₁ and θ̂₂ be two estimators for θ; then (in general) what criterion should we consider to differentiate between them in the small sample case? Intuitively, we see that we want θ̂ to bring us as close as possible to θ; hence Pitman's measure of closeness (Pitman (1939)) may be used. Lemma (6.5) showed that none of the three estimators considered for θ satisfies the closeness property of Definition (6.1) for all values of θ. In fact a stronger result could be shown: if θ̂₁ and θ̂₂ are two point functions (estimators of θ) acting on a discrete sample space and the parameter space is continuous, then neither of them satisfies Definition (6.1) for all the points of the parameter space. That leads us to consider some modifications of Pitman's measure as stated in Definition (6.1).

2) If the estimation procedure is carried out over a discrete sample space by a point function θ̂ and if the parameter space is continuous, then the range of θ̂ does not equal the parameter space. Thus for a given sample size we can find δ > 0 and a set of points A(δ) such that for every ε ≤ δ we have

P{|θ̂ − θ| ≤ ε} = 0, θ ∈ A(δ).

Clearly this is not true if the sample space is continuous. That brings up a point: if on the same discrete sample space one statistic assumes a larger number of points than another, then we expect that this statistic will be better (in some sense) than the other over some subset of the parameter space. That leads us to consider the possibility of modifying some of the methods used now. For example, the maximum likelihood estimator (for the situation mentioned earlier) maps the set {(x,y): x+y = a, a < n} into the single point {a/2n}. Under some modifications, this set could be mapped into a set containing more than one point, and the newly generated estimator will behave better than the original one with respect to Pitman's measure.
LIST OF REFERENCES

Abrahamson, I. G. (1967). Exact Bahadur efficiencies for the Kolmogorov-Smirnov and Kuiper one- and two-sample statistics. Annals of Mathematical Statistics 38, 1475-1490.

Bahadur, R. R. (1960). Stochastic comparison of tests. Annals of Mathematical Statistics 31, 276-295.

Barnard, G. A. (1947). Significance tests for 2x2 tables. Biometrika 34, 123-138.

Berkson, J. (1955). Maximum likelihood and minimum X2 estimates of the logistic function. Journal of the American Statistical Association 50, 130-162.

________. (1972). Minimum discrimination information, the 'no interaction' problem, and the logistic function. Biometrics 28, 443-468.

Bhapkar, V. P. (1961). Some tests of categorical data. Annals of Mathematical Statistics 32, 72-83.

________. (1966a). A note on the equivalence of two test criteria for hypotheses in categorical data. Journal of the American Statistical Association 61, 228-235.

________. (1966b). Notes on analysis of categorical data. University of North Carolina Institute of Statistics Mimeo Series No. 477.

Bhapkar, V. P. and Koch, G. G. (1965). On the hypothesis of 'no interaction' in three dimensional contingency tables. University of North Carolina Institute of Statistics Mimeo Series No. 440.

Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23, 493-507.

Cochran, W. G. (1936). The X2 distribution for the binomial and Poisson series, with small expectations. Annals of Eugenics 7, 207-217.

________. (1952). The X2 test of goodness of fit. Annals of Mathematical Statistics 23, 315-345.

________. (1954). Some methods for strengthening the common X2 tests. Biometrics 10, 417-451.

Cornell, R. G. and Speckman, J. A. (1967). Estimation for a simple exponential model. Biometrics 23, 717-737.

Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press, Princeton.

Diamond, E. L. (1963). The limiting power of categorical data chi-square tests analogous to normal analysis of variance. Annals of Mathematical Statistics 34, 1432-1441.

Finney, D. J. (1952). Statistical methods in biological assay. Hafner Publishing Company, New York.

Fisher, R. A. (1934). Statistical methods for research workers. Oliver and Boyd Ltd., Edinburgh.

________. (1935). Design of experiments. Oliver and Boyd Ltd., Edinburgh.

Freeman, G. H. and Halton, J. H. (1951). Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika 38, 141-149.

Goodman, L. A. (1963). On Plackett's test for contingency tables. Journal of the Royal Statistical Society, Series A, 126, 94-108.

________. (1964). Simultaneous confidence intervals for contrasts among multinomial populations. Annals of Mathematical Statistics 35, 716-725.

Gray, H. L. and Schucany, W. R. (1972). The generalized jackknife statistic. Marcel Dekker, Inc., New York.

Grizzle, J. E., Starmer, C. F., and Koch, G. G. (1969). Analysis of categorical data by linear models. Biometrics 25, 489-504.

Haldane, J. B. S. (1937). The exact value of the moments of the distribution of X2, used as a test of goodness of fit, when expectations are small. Biometrika 29, 133-143.

________. (1939). The mean and variance of X2, when used as a test of homogeneity, when expectations are small. Biometrika 31, 346-355.

Hodges, J. L. and Lehmann, E. L. (1956). The efficiency of some nonparametric competitors of the t test. Annals of Mathematical Statistics 27, 324-335.

Hoeffding, W. and Rosenblatt, J. R. (1955). The efficiency of tests. Annals of Mathematical Statistics 26, 52-63.

Hoeffding, W. (1965). Asymptotically optimal tests for multinomial distributions. Annals of Mathematical Statistics 36, 369-401.

Imrey, P. and Koch, G. G. (1972). Linear models analysis of incomplete multivariate categorical data. University of North Carolina Institute of Statistics Mimeo Series No. 820.

Kitagawa, T. (1956). The operational calculus and the estimation of functions of parameters admitting sufficient statistics. Bulletin of Mathematical Statistics 6, 95-108.

Koch, G. G. and Tolley, D. (1973). A generalized modified X2 analysis of categorical data from a complex dilution experiment. University of North Carolina Institute of Statistics Mimeo Series No. 864.

Lehmann, E. L. (1949). Notes on theory of estimation. Unpublished notes.

Margolin, B. H. and Light, R. J. (1973). An analysis of variance for categorical data: small sample comparison with chi-square and other competitors. Unpublished paper.

Maxwell, A. E. (1961). Analysing qualitative data. Methuen and Co., London.

McCrady, M. H. (1915). The numerical interpretation of fermentation-tube results. Journal of Infectious Diseases 17, 183-212.

Mitra, S. K. (1958). On the limiting power function of the frequency chi-square test. Annals of Mathematical Statistics 29, 122-133.

Mohberg, N. (1972). A study of the small sample properties of tests of linear hypotheses for categorized ordinal response data. University of North Carolina Institute of Statistics Mimeo Series No. 815.

Neyman, J. (1929). Contribution to the theory of certain test criteria. Bulletin of the International Statistical Institute 18, 1-48.

________. (1949). Contribution to the theory of the X2 test. Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, 239-273.

Noether, G. E. (1955). A theorem of Pitman. Annals of Mathematical Statistics 26, 64-68.

________. (1967). Nonparametric statistics. John Wiley and Sons, New York.

Odoroff, C. L. (1970). A comparison of minimum logit chi-square estimation and maximum likelihood estimation in 2x2x2 and 3x2x2 contingency tables: tests for interaction. Journal of the American Statistical Association 65, 1617-1631.

Pearson, E. S. (1947). The choice of statistical tests illustrated on the interpretation of data classed in a 2x2 table. Biometrika 34, 139-167.

Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50, 157-175.

Peto, S. (1953). A dose response equation for the invasion of microorganisms. Biometrics 9, 320-335.

Pitman, E. J. G. (1939). The estimation of the location and scale parameters of a continuous population of any given form. Biometrika 31, 391-421.

________. (1948). Notes on nonparametric statistical inference. Columbia University, New York. Unpublished notes.

Plackett, R. L. (1962). A note on interactions in contingency tables. Biometrika 51, 327-337.

Puri, M. L. and Sen, P. K. (1971). Nonparametric methods in multivariate analysis. John Wiley and Sons, New York.

Quenouille, M. (1956). Notes on bias in estimation. Biometrika 43, 353-360.

Rao, C. R. (1963). Criteria of estimation in large samples. Sankhya 25, 129-206.

________. (1965). Linear statistical inference and its applications. John Wiley and Sons, New York.

Roy, S. N. and Mitra, S. K. (1956). An introduction to some nonparametric generalizations of analysis of variance and multivariate analysis. Biometrika 43, 361-376.

Taylor, W. F. (1953). Distance functions and regular best asymptotically normal estimates. Annals of Mathematical Statistics 24, 85-92.

Tocher, K. D. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika 37, 130-148.

Wald, A. (1939). Contributions to the theory of statistical estimation and testing hypotheses. Annals of Mathematical Statistics 10, 299-326.

________. (1941a). Asymptotically most powerful tests of statistical hypotheses. Annals of Mathematical Statistics 12, 1-19.

________. (1941b). Some examples of asymptotically most powerful tests. Annals of Mathematical Statistics 12, 396-408.

________. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society 54, 426-482.

Washio, Y., Morimoto, H. and Ikeda, N. (1956). Unbiased estimation based on sufficient statistics. Bulletin of Mathematical Statistics 6, 69-94.

Welch, B. L. (1938). On tests for homogeneity. Biometrika 30, 149-158.

Wilks, S. S. (1935). The likelihood test of independence in contingency tables. Annals of Mathematical Statistics 6, 190-196.

________. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9, 60-62.

Wise, M. E. (1963). Multinomial probabilities and the χ2 and X2 distributions. Biometrika 50, 145-154.

________. (1964). A complete multinomial distribution compared with the X2 approximation and an improvement to it. Biometrika 51, 277-281.

Woolf, B. (1955). On estimating the relation between blood groups and disease. Annals of Human Genetics 19, 251-253.

Yates, F. (1934). Contingency tables involving small numbers and the X2 test. Journal of the Royal Statistical Society, Supplement 1, 217-235.

Zacks, S. (1971). The theory of statistical inference. John Wiley and Sons, New York.