ASYMPTOTICALLY OPTIMAL TESTS FOR MULTINOMIAL DISTRIBUTIONS

by

Wassily Hoeffding

University of North Carolina
Department of Statistics, Chapel Hill, N. C.

Institute of Statistics
Mimeograph Series No. 396
July 1964

This research was supported by the Mathematics Division of the Air Force
Office of Scientific Research.
Corrections to
"Asymptotically optimal tests for multinomial distributions"
(University of North Carolina Institute of Statistics Mimeo Series No. 396)

On p. 5.4 delete the line following (5.18).

On p. 5.5 replace the three lines following (5.22) by: The following lemma
will be used in the next section.

On p. 5.5 delete the three-line sentence preceding (5.24).

On pp. 7.1-7.9 the letter ℓ is blurred in many places.

On p. A1, line 10, "replaced" should read "replaced by".

On p. A3, line 8 from bottom, "similar" should read "similar to".
ASYMPTOTICALLY OPTIMAL TESTS FOR MULTINOMIAL DISTRIBUTIONS¹

by

Wassily Hoeffding

University of North Carolina

Summary. Tests of simple and composite hypotheses for multinomial
distributions are considered. It is assumed that the size α_N of a test tends
to 0 at a suitably rapid rate as the sample size N increases. The main concern
of this paper is to substantiate the following proposition: If a given test of
size α_N is "sufficiently different" from a likelihood ratio test then there is
a likelihood ratio test of size ≤ α_N which is considerably more powerful than
the given test at "most" points in the set of alternatives when N is large
enough. In particular, it is shown that chi-square tests of simple and of some
composite hypotheses are inferior, in the sense described, to the corresponding
likelihood ratio tests. Certain Bayes tests are shown to share the
above-mentioned property of a likelihood ratio test.

¹This research was supported by the Mathematics Division of the Air Force
Office of Scientific Research.
1.1

1. Introduction. This paper is concerned with asymptotic properties of tests
of simple and composite hypotheses concerning the parameter vector
p = (p₁, ..., p_k) of a multinomial distribution as the sample size N tends to
infinity. In traditional asymptotic test theory the size α of the test is held
fixed and its power is investigated at alternatives p = p^(N) which approach
the hypothesis set as N → ∞, in such a way as to keep the error probability
away from 0. These restrictions make it possible to apply the central limit
theorem and its extensions. However, it seems reasonable to let the size α_N
of a test tend to 0 as the number N of observations increases. It is also of
interest to consider alternatives not very close to the hypothesis, at which,
typically, the error probabilities will tend to zero. To attack these problems,
the theory of probabilities of large deviations is needed. For the case of sums
of independent random variables this theory is by now well developed. It has
been used by Chernoff [1] to compare the performance of tests based on sums of
independent, identically distributed random variables when the error
probabilities tend to zero. Sanov [2] made an interesting contribution toward a
general theory of probabilities of large deviations. He studied the asymptotic
behavior of the probability that the empirical distribution function is
contained in a given set A of distribution functions when the true distribution
function is not in A. For the special case of a multinomial distribution a
slight elaboration of one of Sanov's results implies the following.
1.2

Let the random vector Z^(N) take the values z^(N) = (n₁/N, ..., n_k/N), where
n₁, ..., n_k are nonnegative integers whose sum is N, and let the probability
of Z^(N) = z^(N) be N! ∏_{i=1}^k (p_i^{n_i} / n_i!), where p = (p₁, ..., p_k)
∈ Ω, the set of points p with p_i ≥ 0, p₁ + ... + p_k = 1. Let A be any subset
of Ω, and let A^(N) denote the set of points z^(N) which are in A. Then for the
probability P_N(A|p) of Z^(N) ∈ A we have (see theorem 2.1)

(1.1)    P_N(A|p) = exp{-N I(A^(N), p) + O(log N)},

uniformly for A ⊂ Ω and p ∈ Ω, where

(1.2)    I(x, p) = Σ_{i=1}^k x_i log(x_i/p_i),

(1.3)    I(A, p) = inf{I(x, p) | x ∈ A}.
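The quantities in (1.2) and (1.3) are easy to compute directly. The following
minimal Python sketch (not part of the original paper; the set A, the sample
size N and the point p are hypothetical choices) evaluates I(x, p) with the
convention 0 log 0 = 0 and the lattice infimum I(A^(N), p) over the empirical
distributions z^(N).

```python
import math
from itertools import product

def info(x, p):
    """I(x, p) = sum_i x_i log(x_i / p_i), with 0 log 0 = 0 and +inf if x_i > 0 = p_i."""
    total = 0.0
    for xi, pi in zip(x, p):
        if xi == 0.0:
            continue
        if pi == 0.0:
            return math.inf
        total += xi * math.log(xi / pi)
    return total

def lattice(N, k):
    """All empirical distributions z^(N) = (n_1/N, ..., n_k/N) with sum n_i = N."""
    for n in product(range(N + 1), repeat=k - 1):
        if sum(n) <= N:
            yield tuple(ni / N for ni in n) + ((N - sum(n)) / N,)

# hypothetical example: A = {x : x_1 >= 0.7}, k = 3
N, p = 20, (0.5, 0.3, 0.2)
A_N = [z for z in lattice(N, 3) if z[0] >= 0.7]
I_AN_p = min(info(z, p) for z in A_N)   # the lattice infimum I(A^(N), p) of (1.3)
```

The infimum is positive here because p itself does not lie in the chosen set A.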
1.3

This elementary and crude estimate of the probability P_N(A|p) makes it
possible to study, to a first approximation, the asymptotic behavior of the
error probabilities of an arbitrary (non-randomized) test of a hypothesis
concerning p if these probabilities tend to 0 at a sufficiently rapid rate.

In section 3 the special role of the likelihood ratio test is brought out. Let
H be the hypothesis that p ∈ Λ (Λ ⊂ Ω). The likelihood ratio test, based on an
observation z^(N) of Z^(N), for testing H against the alternatives p ∈ Ω - Λ
rejects H when

    I(z^(N), Λ) > const,

where

    I(x, Λ) = inf{I(x, p) | p ∈ Λ}.

For the size of an arbitrary test which rejects H when z^(N) ∈ A we have from
(1.1)

(1.4)    sup{P_N(A|p) | p ∈ Λ} = exp{-N I(A^(N), Λ) + O(log N)},

uniformly for A ⊂ Ω and Λ ⊂ Ω, where I(A, Λ) = inf{I(x, p) | x ∈ A, p ∈ Λ}.
This easily implies the following: The union of the critical regions of all
tests of size ≤ α_N for testing H is contained in the critical region B^(N) of
a likelihood ratio test for testing H against p ∈ Ω - Λ whose size α_N'
satisfies log α_N' = log α_N + O(log N). Thus if α_N tends to 0 faster than any
power of N, the size of the B^(N) test is, to a first approximation, equal to
α_N. It is trivial that the B^(N) test is uniformly at least as powerful as any
test of size ≤ α_N. We can also define a likelihood ratio test of size ≤ α_N
whose critical region does not differ much from B^(N), in the sense that both
critical regions are of the form

    N I(z^(N), Λ) ≥ -log α_N + O(log N).
The main concern of this paper is to substantiate the following proposition:
If a given test of size α_N is "sufficiently different" from a likelihood
ratio test, then there is a likelihood ratio test of size ≤ α_N which is
considerably more powerful than the given test at "most" points p in the set
of alternatives when N is large enough, provided that α_N → 0 at a suitable
rate. The meaning of the words in quotation marks will have to be made
precise. By "considerably more powerful" we mean that the ratio of the error
probabilities at p of the two tests tends to 0 more rapidly than any power
of N.

A general characterization of the set Γ_N of alternatives p at which a given
test is considerably less powerful than a comparable likelihood ratio test is
contained in theorem 3.1.
Sections 4 and 5 are preparatory to what follows and deal with properties of
the function I(x, p) and its infima. In section 6 we restrict ourselves to
tests whose critical regions are regular in a sense which implies that the
expression (1.4) for the size of a test remains true with

1.4

I(A^(N), Λ) replaced by I(A, Λ), the infimum of I(x, Λ) with respect to all
x ∈ A (not only with respect to the lattice points z^(N) contained in A), and
an analogous replacement may be made in the expressions for the error
probabilities P_N(A'|p), where A' = Ω - A and p ∈ Λ' = Ω - Λ. (Sufficient
conditions for regularity are given in the Appendix.) Consider a test which
rejects H when z^(N) ∈ A (where A may depend on N). Let
B = {x | I(x, Λ) ≥ I(A, Λ)}, so that B^(N) is the critical region of a
likelihood ratio test. Note that A ⊂ B. It is shown that the set Γ_N
essentially depends on the set of common boundary points of the sets A and B.
In particular, if the A test differs sufficiently from a likelihood ratio test
in the sense that the sets A and B have only finitely many boundary points in
common then, under certain additional conditions, a likelihood ratio test
whose size does not exceed the size of the A test is considerably more
powerful than the latter at all alternatives except those points p which lie
on certain curves in the (k-1)-dimensional simplex Ω and those at which both
tests have zero error probabilities.

In section 7 the result just described is shown to be true for a chi-square
test of a simple hypothesis whose size tends to 0 at a suitable rate
(theorem 7.4). This is of special interest in view of the fact that if the
size of the chi-square test tends to a positive limit, its critical region and
power differ little from those of a likelihood ratio test. In section 8
chi-square tests of composite hypotheses are briefly discussed. An example
shows that at least in some cases the situation is similar to that in the case
of a simple hypothesis. It is noted that one common version of the chi-square
test may have the property that its size cannot be smaller than some power of
N, which makes the theory of this paper inapplicable. Certain competitors of
the chi-square test are considered in section 9.
1.5

It is pointed out that certain Bayes tests have the same asymptotic power
properties as the corresponding likelihood ratio test (section 10).

The likelihood ratio test was introduced by J. Neyman and E. S. Pearson in
1928 [3]. It is known that the likelihood ratio test has certain
asymptotically optimal properties when the error probabilities are bounded
away from 0 (Wald [4]). The present results are of a different nature and
appear to be of a novel type.

An extension of the results of this paper to certain classes of distributions
other than the multinomial class should be possible. (The extension to the
case of several independent multinomial random vectors is quite
straightforward.)
2.1

2. Probabilities of large deviations in multinomial distributions: crude
estimates. Let Z^(N) = (Z₁^(N), ..., Z_k^(N)) be a random vector whose values
are

(2.1)    z^(N) = (n₁/N, ..., n_k/N),

where n₁, ..., n_k are any nonnegative integers such that n₁ + ... + n_k = N,
and whose distribution is given by

(2.2)    Pr{Z^(N) = z^(N)} = P_N(z^(N)|p) = (N!/(n₁! ... n_k!)) p₁^{n₁} ... p_k^{n_k}.

Here p = (p₁, ..., p_k) is any point in the simplex

(2.3)    Ω = {(x₁, ..., x_k) | x_i ≥ 0, x₁ + ... + x_k = 1}.

By convention, p_i^{n_i} = 1 if p_i = n_i = 0. We can write

(2.4)    P_N(z^(N)|p) = P_N(z^(N)|z^(N)) exp{-N I(z^(N), p)},

where, for any two points x and p in Ω,

(2.5)    I(x, p) = Σ_{i=1}^k x_i log(x_i/p_i).

Here it is understood that x_i log(x_i/p_i) = 0 if x_i = 0. We note that
I(x, p) = 0 for x = p and I(x, p) > 0 unless x = p (since log u > 1 - u⁻¹ for
u > 0, u ≠ 1). Also, I(x, p) = +∞ if x_i > 0 and p_i = 0 for some i.

For any subset A of Ω, the set of lattice points z^(N) contained in A will be
denoted by A^(N). We define

(2.6)    P_N(A|p) = Σ_{z^(N) ∈ A} P_N(z^(N)|p),

(2.7)    I(A, p) = inf{I(x, p) | x ∈ A},

where I(A, p) = +∞ if A is empty.
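The identity (2.4) can be checked numerically. The sketch below (the counts n
and the point p are hypothetical choices) compares log P_N(z^(N)|p) with
log P_N(z^(N)|z^(N)) - N I(z^(N), p); the two agree up to rounding error.

```python
import math

def log_pmf(n, q):
    """log of the multinomial probability (2.2) at counts n under the point q."""
    N = sum(n)
    s = math.lgamma(N + 1)
    for ni, qi in zip(n, q):
        s -= math.lgamma(ni + 1)
        if ni:
            s += ni * math.log(qi)
    return s

n, p = (11, 6, 3), (0.5, 0.3, 0.2)     # hypothetical counts, N = 20
N = sum(n)
z = tuple(ni / N for ni in n)          # the empirical distribution z^(N)
I_zp = sum(zi * math.log(zi / pi) for zi, pi in zip(z, p) if zi > 0)
lhs = log_pmf(n, p)                    # log P_N(z^(N) | p)
rhs = log_pmf(n, z) - N * I_zp         # log of the right-hand side of (2.4)
```

The identity is exact; only floating-point rounding separates the two sides.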
2.2

The following theorem is a slight elaboration of a result due to Sanov [2].

Theorem 2.1. For any set A ⊂ Ω and any point p ∈ Ω we have

(2.8)    C₀ N^{-(k-1)/2} exp{-N I(A^(N), p)} ≤ P_N(A|p)
                 ≤ (N+k-1 choose k-1) exp{-N I(A^(N), p)},

where C₀ is a positive absolute constant. Hence

(2.9)    P_N(A|p) = exp{-N I(A^(N), p) + O(log N)}

uniformly for A ⊂ Ω and p ∈ Ω. Also,

(2.10)   P_N(A|p) ≤ exp{-N I(A, p) + O(log N)}

uniformly for A ⊂ Ω and p ∈ Ω.

Proof. Clearly (2.8) implies (2.9), and since A^(N) ⊂ A, (2.8) implies
(2.10). It is sufficient to prove (2.8). If A^(N) is empty, P_N(A|p) = 0 and
(2.8) is trivially true. Assume that A^(N) is not empty. The number of points
z^(N) in Ω is easily found to be (N+k-1 choose k-1). By (2.4), z^(N) ∈ A
implies P_N(z^(N)|p) ≤ exp{-N I(A^(N), p)}. Hence the second inequality (2.8)
follows from (2.6). By Stirling's formula, for m ≥ 1,

    m! = m^m (2πm)^{1/2} exp(-m + θ/(12m)),    0 < θ < 1.

Hence it easily follows that if n_i ≥ 1 for all i, then

(2.11)   P_N(z^(N)|z^(N)) = (N!/(n₁! ... n_k!)) ∏_{i=1}^k (n_i/N)^{n_i}
                 ≥ C₀ N^{-(k-1)/2},

where C₀ is a positive absolute constant. If n_i = 0 for some i, (2.11) is a
fortiori true. The first inequality (2.8) follows from (2.4), (2.6) and
(2.11).

Theorem 2.1 is nontrivial only if the set A contains no points z^(N) which are
too close to p. In this sense the theorem is concerned with probabilities of
large deviations of Z^(N) from its mean p.

It should be noted that (2.9) gives an asymptotic expression for the logarithm
of the probability on the left but not for the probability itself. This crude
result is sufficient to study asymptotically the main features of any test
whose size tends to 0 fast enough as N increases.
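Theorem 2.1 can be illustrated numerically for small k and moderate N. In the
sketch below (the set A, the sample size N and the point p are hypothetical
choices), the exact probability P_N(A|p) is summed over the lattice and its
logarithm is compared with -N I(A^(N), p); by (2.8) the difference is only of
order log N.

```python
import math

def info(x, p):
    return sum(xi * math.log(xi / pi) for xi, pi in zip(x, p) if xi > 0)

def log_pmf(n, p):
    s = math.lgamma(sum(n) + 1)
    for ni, pi in zip(n, p):
        s += ni * math.log(pi) - math.lgamma(ni + 1)
    return s

N, p = 60, (0.5, 0.3, 0.2)
# A = {x : x_1 >= 0.8}, i.e. counts with n_1 >= 48
prob = 0.0          # exact P_N(A | p) as in (2.6)
I_AN = math.inf     # lattice infimum I(A^(N), p)
for n1 in range(48, N + 1):
    for n2 in range(N - n1 + 1):
        n = (n1, n2, N - n1 - n2)
        prob += math.exp(log_pmf(n, p))
        I_AN = min(I_AN, info(tuple(ni / N for ni in n), p))
gap = -math.log(prob) - N * I_AN   # O(log N) by theorem 2.1
```

The gap stays bounded by a small multiple of log N even though the probability
itself is astronomically small.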
3. The role of the likelihood ratio test. Consider the problem of testing, on
the basis of an observation z^(N) of the random vector Z^(N), the hypothesis H
that the parameter vector p is contained in a subset Λ of Ω. The likelihood
ratio test for testing H against the alternative p ∈ Λ' = Ω - Λ is based on
the statistic

(3.1)    sup{P_N(z^(N)|p) | p ∈ Λ} / sup{P_N(z^(N)|p) | p ∈ Ω}
                 = exp{-N I(z^(N), Λ)},

where

(3.2)    I(x, Λ) = inf{I(x, p) | p ∈ Λ}.

The equality in (3.1) follows from (2.4). Thus the likelihood ratio test
rejects H if I(z^(N), Λ) exceeds a constant.

Now consider an arbitrary test which rejects H if z^(N) ∈ A_N, where A_N is
any subset of Ω. This test will be referred to as test A_N, and the set A_N
will be called its critical region. (We could have assumed that the critical
region contains no other points than the lattice points z^(N), but it is often
convenient to define the critical region in terms of a more inclusive set.)
For the size of the test (the supremum of its error probability for p ∈ Λ) we
have from theorem 2.1

(3.3)    sup{P_N(A_N|p) | p ∈ Λ} = exp{-N I(A_N^(N), Λ) + O(log N)},

uniformly for A_N ⊂ Ω and Λ ⊂ Ω, where

(3.4)    I(A, Λ) = inf{I(x, p) | x ∈ A, p ∈ Λ} = inf{I(x, Λ) | x ∈ A}
                 = inf{I(A, p) | p ∈ Λ}.
We may compare the test A_N with the likelihood ratio test which rejects H if
z^(N) ∈ B_N, where

(3.5)    B_N = {x | I(x, Λ) ≥ c_N},    c_N = I(A_N^(N), Λ).

Its critical region contains the critical region A_N^(N) of the test A_N; in
fact, B_N contains the union of the critical regions of all tests whose
critical regions A satisfy I(A^(N), Λ) ≥ c_N. Moreover, the size of the test
B_N is exp{-N c_N + O(log N)}, since I(B_N, Λ) = c_N. Thus if N c_N / log N
tends to infinity with N, which means that the size α_N of the test A_N tends
to 0 faster than any power of N, then the size α_N' of the test B_N is
approximately equal to α_N in the sense that log α_N' = log α_N + O(log N).

In a similar way we can obtain the following conclusion: The union of the
critical regions of all tests of size ≤ α_N for testing the hypothesis p ∈ Λ
is contained in the critical region of a likelihood ratio test for testing
p ∈ Λ against p ∉ Λ whose size α_N' satisfies log α_N' = log α_N + O(log N).
The simple proof is left to the reader.
Since A_N^(N) ⊂ B_N^(N), the probability that test B_N rejects H is never
smaller than the probability that test A_N rejects H. Thus B_N is uniformly at
least as powerful as A_N, but the size of B_N is in general somewhat larger
than the size of A_N. It may be more appropriate to compare a given test with
a likelihood ratio test whose size does not exceed the size of the former. Now
it easily follows from (3.3) that we can choose numbers

(3.6)    δ_N = O(N⁻¹ log N)

such that the size of the likelihood ratio test

(3.7)    B*_N = {x | I(x, Λ) ≥ c_N + δ_N}

is not larger than the size of test A_N. If N c_N / log N tends to infinity
fast enough, we may expect that the power of the test B*_N will not be much
smaller than the power of B_N.

For any p ∈ Λ' (Λ' = Ω - Λ) the probabilities that the tests A_N and B_N
falsely accept the hypothesis are given by

(3.8)    P_N(A_N'|p) = exp{-N I(A_N'^(N), p) + O(log N)},

(3.9)    P_N(B_N'|p) = exp{-N I(B_N'^(N), p) + O(log N)}.

Always P_N(B_N'|p) ≤ P_N(A_N'|p) and I(B_N'^(N), p) ≥ I(A_N'^(N), p). At
those points p ∈ Λ' for which P_N(A_N'|p) ≠ 0 and

(3.10)   lim_{N→∞} N{I(B_N'^(N), p) - I(A_N'^(N), p)} / log N = +∞,

the test B_N is considerably more powerful than A_N, in the sense that the
ratio of the error probabilities at p, P_N(B_N'|p)/P_N(A_N'|p), tends to 0
more rapidly than any power of N.

For the test B*_N, whose size does not exceed the size of A_N, we have a
similar conclusion. Note that B*_N is not necessarily more powerful than A_N,
and the difference in (3.10) with B_N replaced by B*_N may be negative.
However, if (3.10) is satisfied and

(3.11)   lim_{N→∞} N{I(B_N'^(N), p) - I(B*_N'^(N), p)} / log N = 0,

then the ratio P_N(B*_N'|p)/P_N(A_N'|p) tends to 0 more rapidly than any
power of N.
The main conclusions of the preceding discussion are summarized in the
following theorem.

Theorem 3.1. (a) Let Λ and A_N be non-empty subsets of Ω. Then the size of the
test A_N for testing the hypothesis p ∈ Λ is given by (3.3) and its error
probability at p ∈ Λ' is given by (3.8).

(b) There exist positive numbers δ_N = O(N⁻¹ log N) such that the size of the
likelihood ratio test B*_N = {x | I(x, Λ) ≥ I(A_N^(N), Λ) + δ_N} does not
exceed the size of the test A_N.

(c) For each p ∈ Λ' such that P_N(A_N'|p) ≠ 0 and conditions (3.10) and
(3.11) are satisfied, the ratio P_N(B*_N'|p)/P_N(A_N'|p) of the error
probabilities at p tends to 0 faster than any power of N.

In section 6 we shall continue in more detail the study of the set of
alternatives at which a given test is less powerful than a comparable
likelihood ratio test, assuming that the sets A_N are regular in a certain
sense. The following two sections are preparatory to what follows.
4.1

4. The function I(x, p) and its infima. In this section properties of the
function

    I(x, p) = Σ_{i=1}^k x_i log(x_i/p_i)

and its infima for x ∈ A or p ∈ Λ are studied. The function I(x, p) is defined
for x and p in the simplex Ω given by (2.3). Throughout, Ω is considered as
the space of the points x and p, with the Euclidean metric. Thus the
complement of a subset A of Ω is A' = Ω - A. The closure of A is denoted by
Ā. We define the subsets Ω₀ and Ω(p) of Ω by

(4.1)    Ω₀ = {x | x_i > 0, i = 1, ..., k},

(4.2)    Ω(p) = {x | x_i = 0 if p_i = 0}.

Thus if p ∈ Ω₀, Ω(p) = Ω. For p ∉ Ω₀, Ω(p) is the intersection of those faces
{x | x_i = 0} of the simplex Ω for which p_i = 0.

Lemma 4.1. (a) 0 ≤ I(x, p) ≤ ∞; I(x, p) = 0 if and only if x = p; and
I(x, p) < ∞ if and only if x ∈ Ω(p).

(b) For each p ∈ Ω₀, I(·, p) is continuous and bounded in Ω. For each
p ∉ Ω₀, I(·, p) is continuous and bounded in Ω(p).

(c) For each x ∈ Ω, I(x, ·) is continuous in Ω. That is, p^j → p implies
I(x, p^j) → I(x, p), even when I(x, p) = ∞.

(d) For each p ∈ Ω, I(·, p) is convex in Ω, and for each x ∈ Ω, I(x, ·) is
convex in Ω.

Proof. (a) See section 2 after (2.5).

(b) For p ∈ Ω₀, I(·, p) is bounded since I(x, p) ≤ max_i log(1/p_i). The
proof of continuity is obvious. For p ∉ Ω₀ the proof is similar.
4.2

(c) If x ∈ Ω(p) then I(x, p) < ∞ and the continuity at p is obvious. If
x ∉ Ω(p) then I(x, p) = ∞, and if p^t → p then p_i^t → 0 for some i with
x_i > 0. Hence I(x, p^t) → ∞ = I(x, p).

(d) The convexity of I(·, p) and I(x, ·) follows from the convexity of
u log u and -log u for u > 0.
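The properties in lemma 4.1 are easy to exercise numerically; the following
small sketch (the points are arbitrary illustrative choices) checks parts (a)
and (d).

```python
import math

def info(x, p):
    total = 0.0
    for xi, pi in zip(x, p):
        if xi == 0.0:
            continue
        if pi == 0.0:
            return math.inf          # x outside Omega(p), lemma 4.1(a)
        total += xi * math.log(xi / pi)
    return total

p = (0.2, 0.5, 0.3)
x = (0.6, 0.1, 0.3)
# lemma 4.1(d): convexity of I(., p) along the segment from x to p
mid = tuple(0.5 * (a + b) for a, b in zip(x, p))
convex_ok = info(mid, p) <= 0.5 * info(x, p) + 0.5 * info(p, p)
```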
The next lemma is concerned with I(A, p), the infimum of I(x, p) for x ∈ A.
The relevance of I(A^(N), p) for the approximation of P_N(A|p) is clear from
theorem 2.1. If the set A is sufficiently regular, the approximation (2.9) is
true with I(A^(N), p) replaced by I(A, p) (see section 6 and the Appendix).

Lemma 4.2. (a) Let A be a non-empty subset of Ω and let p ∈ Ω. Then there is
at least one point y such that

(4.3)    y ∈ Ā,    I(y, p) = I(A, p).

If p ∈ Ā then I(A, p) = 0 and (4.3) is satisfied only with y = p. If p ∉ Ā
then I(A, p) > 0 and any y which satisfies (4.3) is in the boundary of A.

(b) Let p ∉ Ω₀. Then I(A, p) < ∞ if and only if the intersection A ∩ Ω(p) is
not empty. If this is the case, then I(A, p) = I(A ∩ Ω(p), p) and the
statements of part (a) are true with A replaced by A ∩ Ω(p).

Proof. The lemma follows easily from lemma 4.1. We prove only the last
assertion of part (a). Let p ∈ Ω₀, p ∉ Ā. Then (since I(·, p) is continuous)
I(A, p) > 0. Suppose that I(y, p) = I(A, p) for some y in the interior of A.
Then the point z = (1-t)y + tp is in A for some positive t < 1. Since I(·, p)
is convex,

    I(z, p) ≤ (1-t)I(y, p) + t I(p, p) = (1-t)I(A, p) < I(A, p).

This contradicts the definition of I(A, p). Hence any y which satisfies (4.3)
is in the boundary of A.
4.3
A maximum likelihood estimate of
£
point
= ;(z(N»
p under the assumption
which maximizes
I p)
PN(z(N)
for
I(x, p(x»=
for any
x e n
p)
as a point in
is a
A). From
peA (or
(2.4) we see that ~ minimizes I(z(N), p), so that I(z(N),
extension, we may define ~(x)
peA
= I(z(N),
By
A).
A for which
The next lemma asserts the existence of at least one
~(x)
A be a non-empty subset of
n
I(x, A).
for each x.
4.2.
Lemma
Let
1\
the re is at least one point
/\
(4.4)
p(x)
(b)
If
x
(c)
If
x
-
+ A then
A.
I(x, p(x»
I(x,A) =
(d)
x
e
= I(x, A) •
° and (4.4) is satisfied only for
°
l(x, A) >
I(x,
For each
such that
A
A,
€
e Athen
boundary of
p(x)
n. (a)
and any p(x) which satisfies
A) is bounded in
n
if and only if
1(x) = x.
(4.4)
is in the
A () n
o
is not
empty.
The lemma follows easily from lemma
Remark.
A
I(x, A) may be bounded and not coninuous in
consists of two points,
is discontinuous at
4.4

Lemma 4.4. Let A and Λ be non-empty subsets of Ω. Suppose that I(x, Λ) is
continuous in Ω and I(A, Λ) > 0. Then there is at least one point y such that

(4.5)    y ∈ Ā,    I(y, Λ) = I(A, Λ),

and any point y which satisfies (4.5) is in the boundary of A.

Proof. The existence of a point y which satisfies (4.5) follows from the
assumed continuity of I(·, Λ). Suppose that I(y, Λ) = I(A, Λ) for some y in
the interior of A. By lemma 4.3 there is a p̂ ∈ Λ̄ such that I(y, p̂) = I(y, Λ).
The point z = (1-t)y + t p̂ is in A for some positive t < 1. Since I(·, p̂) is
convex,

(4.6)    I(z, p̂) ≤ (1-t)I(y, p̂) = (1-t)I(A, Λ) < I(A, Λ),

due to I(A, Λ) > 0. But since z ∈ A, we have I(z, p̂) ≥ I(z, Λ) ≥ I(A, Λ),
which contradicts (4.6). This implies the lemma.
We conclude this section with some remarks on the determination of the infimum
I(A, p) and on the set of points in A at which the infimum is attained. We
restrict ourselves to the case p ∈ Ω₀. (Lemma 4.2 implies that the general
case can be reduced to this case.) The set A is contained in the set
{x | I(x, p) ≥ I(A, p)}, whose complement C = {x | I(x, p) < I(A, p)} is
convex. The following lemma gives information about the boundary of C. A
hyperplane (briefly: plane) in Ω is a non-empty set
{x | Σ a_i x_i = c} ∩ {x | Σ x_i = 1}, where a₁, ..., a_k are not all equal.
The dimension of a hyperplane is at most k-2; in degenerate cases the
dimension may be less than k-2.

4.5

Lemma 4.5. Let p ∈ Ω₀, 0 < c < max_x I(x, p), and let

(4.7)    C = {x | I(x, p) < c}.

(a) Let y be a boundary point of C. If y ∈ Ω₀ then

(4.8)    I(x, p) - c = Σ_{i=1}^k (log(y_i/p_i))(x_i - y_i) + I(x, y),

and the unique tangent plane of C at y is

    T = {x | Σ_{i=1}^k (log(y_i/p_i))(x_i - y_i) = 0}.

(b) If y ∉ Ω₀ then, for each j with y_j = 0, T_j = {x | x_j = 0} is a tangent
plane of C at y, and C has no tangent planes at y other than these T_j and
their intersections.

(c) All boundary points of C are in Ω₀ if and only if

(4.9)    c < -log(1 - p_min),    p_min = min_i p_i.

Proof. (a) The identity (4.8) follows immediately from the fact that
I(y, p) = c. Hence T is a tangent plane of C at y. It is unique since the
derivatives ∂I(x, p)/∂x_i = log(x_i/p_i) + 1 are continuous in Ω₀.

(b) Clearly, if y_j = 0, then T_j = {x | x_j = 0} is a tangent plane of C at
y. It is sufficient to prove that no hyperplane containing points in Ω₀ is a
tangent plane at y. This will follow if we show that every straight line
containing y and some point in Ω₀ intersects the open convex set C. Let x⁰ be
a point in Ω₀, let z(t) = (1-t)y + t x⁰, and put F(t) = I(z(t), p). We have
for t ∈ (0, 1)

    F'(t) = Σ_{i=1}^k (x_i⁰ - y_i) log(z_i(t)/p_i)
          = (Σ_{y_i=0} x_i⁰) log t + O(1)    as t → 0.

Since F''(t) > 0, we have by Taylor's formula, for t positive,
F(t) - t F'(t) ≤ F(0) = I(y, p) = c. It follows that F(t) < c for t ∈ (0, 1)
and sufficiently small, as was to be proved.

(c) Let x be a point with x_j = 0 and let p̄ be defined by p̄_j = 0,
p̄_i = p_i/(1 - p_j) for i ≠ j. Then

    I(x, p) = Σ_i x_i log(x_i/p̄_i) - log(1 - p_j) ≥ -log(1 - p_j),

with equality only for x = p̄. Hence the infimum of I(x, p) over the points x
not in Ω₀ is -log(1 - p_min). Part (c) follows.
4.6

Lemma 4.6. Let

(4.10)   A = {x | f(x) > 0},

where the function f(x) is continuous in Ω and max f(x) > 0. Let p be a point
in Ω₀ such that f(p) < 0. Suppose further that the derivatives
f'_i(x) = ∂f(x)/∂x_i, i = 1, ..., k, exist and are continuous at all x in Ω
for which f(x) = 0. Let y be any point in Ω₀ such that I(y, p) = I(A, p).
Then it is necessary that f(y) = 0 and

(4.11)   log(y_i/p_i) = a f'_i(y) + b,    i = 1, ..., k,

where a > 0 and b are constants.

Proof. By lemma 4.2, any point y for which I(y, p) = I(A, p) is in the
boundary of A. Since f(x) is continuous, this means that f(y) = 0. The method
of Lagrange multipliers yields the necessary condition (4.11) with some
constants a and b. That a must be positive follows from

    f(x) = f(x) - f(y) = Σ_i f'_i(y)(x_i - y_i) + o(|x - y|)

and (4.8), since f(x) > 0 implies I(x, p) ≥ I(y, p).

Lemma 4.7. If A is convex, A ∩ Ω₀ is not empty, p ∈ Ω₀ and p ∉ Ā, then there
is exactly one point y such that

    y ∈ Ā,    I(y, p) = I(A, p).

Proof. Such a point y is a common boundary point of the disjoint convex sets
A and B' = {x | I(x, p) < I(A, p)}. Since A and B' contain points in Ω₀, it
follows from lemma 4.5(b) that y is in Ω₀. Lemma 4.5(a) implies that the
separating hyperplane of the sets A and B' is unique, and y is the unique
point in Ā which is in that hyperplane.
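Lemma 4.6 can be illustrated with a linear boundary. In the sketch below (the
half-space A and all numbers are hypothetical choices), f(x) = x₁ - 0.7, so
f' = (1, 0, 0); the minimizing point keeps x₁ on the boundary f = 0 and
distributes the remaining mass proportionally to p, and the Lagrange condition
(4.11) then holds with a > 0. A brute-force lattice search confirms that this
point attains I(A, p).

```python
import math

def info(x, p):
    return sum(xi * math.log(xi / pi) for xi, pi in zip(x, p) if xi > 0)

p, c = (0.5, 0.3, 0.2), 0.7
# A = {x : f(x) > 0} with f(x) = x_1 - c; the infimum is attained on f = 0
scale = (1.0 - c) / (p[1] + p[2])
y = (c, p[1] * scale, p[2] * scale)       # candidate minimizing point
# Lagrange condition (4.11) with f' = (1, 0, 0): a enters the first coordinate only
b = math.log(scale)
a = math.log(c / p[0]) - b
# brute-force check over a fine lattice inside the closure of A
N = 200
best = min(info((n1 / N, n2 / N, (N - n1 - n2) / N), p)
           for n1 in range(int(round(c * N)), N + 1)
           for n2 in range(N - n1 + 1))
```

The lattice minimum agrees with I(y, p), and a > 0 because p lies strictly
inside {f < 0}.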
5.1

5. The infimum of I(x, p) subject to the condition I(x, Λ) < c. The infimum
I(B', p), where

    B' = {x | I(x, Λ) < c},    c > 0,

is needed for the approximation of the power of a likelihood ratio test for
testing the hypothesis p ∈ Λ. For the case of a simple hypothesis, where Λ
consists of a single point p⁰, the problem is solved explicitly (theorem 5.1)
and an asymptotic expression for the infimum is obtained (theorem 5.2). The
case of an arbitrary Λ is then briefly discussed.

Theorem 5.1. Let p⁰ and p be points in Ω, and let

(5.1)    B' = {x | I(x, p⁰) < c},    c > 0.

(I) We have I(B', p) < ∞ if and only if

(5.2)    -log Σ_{p_i > 0} p_i⁰ < c.

(II) Suppose that condition (5.2) is satisfied, and define p̄ by
p̄_i = p_i / Σ_{p_j⁰ > 0} p_j if p_i⁰ > 0 and p̄_i = 0 if p_i⁰ = 0. Then there
is a unique point y such that I(y, p⁰) ≤ c and I(y, p) = I(B', p).

(5.3)    If I(p̄, p⁰) ≤ c then y = p̄ and
         I(B', p) = I(p̄, p) = -log Σ_{p_j⁰ > 0} p_j.

(5.4)    If c < I(p̄, p⁰) then y is given by (5.5) and I(B', p) by (5.6):

(5.5)    y_i = (p_i⁰)^{1-s} p̄_i^s / M(s),    i = 1, ..., k,

(5.6)    I(B', p) = c - M'(s)/M(s) - log Σ_{p_j⁰ > 0} p_j,

where, for 0 < t < 1,

(5.7)    M(t) = Σ_{i=1}^k (p_i⁰)^{1-t} p̄_i^t,    M'(t) = dM(t)/dt,

and the number s (0 < s < 1) is uniquely determined by

(5.8)    s M'(s)/M(s) - log M(s) = c.
Proof. First assume that p⁰ and p are in Ω₀, so that p̄ = p. The functions
I(·, p⁰) and I(·, p) are continuous and bounded in Ω. If I(p, p⁰) ≤ c then
p ∈ B̄' and I(B', p) = 0, so that (5.3) holds with y = p. Now suppose that
c < I(p, p⁰). The set B' is convex, so the uniqueness of y follows from
lemma 4.7. The point y is a common boundary point of the disjoint convex sets
B' and C = {x | I(x, p) < I(B', p)}. Since B' contains points in Ω₀,
lemma 4.5(b) implies that y must be in Ω₀, and by lemma 4.6, applied with
f(x) = c - I(x, p⁰), the point y must satisfy the conditions I(y, p⁰) = c and

    log(y_i/p_i) = -a log(y_i/p_i⁰) + b,    i = 1, ..., k,

where a > 0. This is equivalent to (5.5) with s = 1/(1+a), so that s > 0. One
easily calculates that I(y, p⁰) = F(s), where

    F(t) = t L'(t) - L(t),    L(t) = log M(t).

Thus s is a positive root of the equation F(t) = c, which is condition (5.8).
Now F'(t) = t L''(t) and L''(t) > 0, so that F is strictly increasing for
t > 0; hence s is uniquely determined by (5.8). Also
F(1) = L'(1) - L(1) = I(p, p⁰) > c, which implies 0 < s < 1. Finally, one
easily calculates that I(y, p) is equal to the right-hand side of (5.6). This
completes the proof for the case where p⁰ and p are in Ω₀.

Now consider the general case. In order that I(x, p⁰) < c and I(x, p) < ∞ for
some x, it is necessary that x ∈ Ω(p⁰) ∩ Ω(p). For x ∈ Ω(p⁰) ∩ Ω(p) we have
the identities

    I(x, p) = I(x, p̄) + I(p̄, p),    I(p̄, p) = -log Σ_{p_i⁰ > 0} p_i,

where p̄ is as defined in part (II). These facts imply part (I) of the
theorem.

If condition (5.2) is satisfied, it follows from the preceding paragraph and
the identity

    I(x, p⁰) = I(x, p̄⁰) - log Σ_{p_i > 0} p_i⁰    for x ∈ Ω(p⁰) ∩ Ω(p),

where p̄⁰_i = p_i⁰ / Σ_{p_j > 0} p_j⁰ if p_i > 0 and p̄⁰_i = 0 if p_i = 0,
that I(B', p) - I(p̄, p) is the infimum of I(x, p̄) subject to the conditions
x ∈ Ω(p̄⁰) ∩ Ω(p̄) and I(x, p̄⁰) < c + log Σ_{p_i > 0} p_i⁰ = C, say. The
solution of this problem follows immediately from the first part of the
proof, with Ω, p⁰, p, c replaced by Ω(p̄⁰), p̄⁰, p̄, C. It can be verified that
the result is equivalent to that stated in the theorem.
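Theorem 5.1 can be checked numerically in the interior case p⁰, p ∈ Ω₀ (so
that p̄ = p). The sketch below (the points and the level c are hypothetical
choices) solves F(s) = c by bisection, forms the minimizing point (5.5), and
evaluates (5.6); the derivative L'(t) is approximated by a central difference.

```python
import math

p0, p = (1/3, 1/3, 1/3), (0.5, 0.3, 0.2)

def info(x, q):
    return sum(xi * math.log(xi / qi) for xi, qi in zip(x, q) if xi > 0)

def L(t):
    """L(t) = log M(t) with M(t) = sum_i (p0_i)^(1-t) p_i^t, as in (5.7)."""
    return math.log(sum(a ** (1 - t) * b ** t for a, b in zip(p0, p)))

def Lp(t, h=1e-6):
    return (L(t + h) - L(t - h)) / (2 * h)   # numerical L'(t)

def F(t):
    return t * Lp(t) - L(t)       # F(t) = t L'(t) - L(t); F(s) = c is (5.8)

c = 0.05                          # must satisfy 0 < c < I(p, p0) = F(1)
lo, hi = 1e-9, 1.0
for _ in range(100):              # bisection: F is strictly increasing
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if F(mid) < c else (lo, mid)
s = 0.5 * (lo + hi)
M_s = math.exp(L(s))
y = tuple(a ** (1 - s) * b ** s / M_s for a, b in zip(p0, p))   # (5.5)
I_B_p = c - Lp(s)                 # (5.6) with p0 and p interior
```

The point y lies exactly on the boundary I(y, p⁰) = c, and no lattice point of
B' gives a smaller value of I(·, p).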
We now derive an asymptotic expression for the infimum I(B', p) as c → 0. To
emphasize the dependence on c we write B'(c) for B':

    B'(c) = {x | I(x, p⁰) < c}.

We confine ourselves to the case p⁰ ∈ Ω₀. In this case, by theorem 5.1,
I(B'(c), p) is finite for all small values of c only if p ∈ Ω₀.

Theorem 5.2. Let p⁰ and p be points in Ω₀, p ≠ p⁰. Then as c → 0,

(5.9)    I(B'(c), p) = I(p⁰, p) - (2 m₂ c)^{1/2} + (1 + m₃/(3 m₂)) c + O(c^{3/2}),

where

(5.10)   m_j = Σ_{i=1}^k p_i⁰ (log(p_i⁰/p_i) - I(p⁰, p))^j.

Proof. By theorem 5.1,

(5.11)   I(B'(c), p) = c - L'(s_c),

where L(t) = log M(t), M(t) = Σ_{i=1}^k (p_i⁰)^{1-t} p_i^t, and s_c > 0 is
determined by

(5.12)   F(s_c) = c,    F(t) = t L'(t) - L(t).

All derivatives of L(t) exist for all real t, and we have

    F'(t) = t L''(t),    F''(t) = L''(t) + t L'''(t),
    F'''(t) = 2 L'''(t) + t L⁗(t).

Since F(0) = 0 and F(t) is strictly increasing for t > 0, we have s_c → 0 as
c → 0. As t → 0,

    F(t) = (1/2) L''(0) t² + (1/3) L'''(0) t³ + O(t⁴).

It is easy to calculate that

    L(0) = 0,    L'(0) = -I(p⁰, p),    L''(0) = m₂,    L'''(0) = -m₃,

where m_j is defined by (5.10). Hence as c → 0,

    c = (1/2) m₂ s_c² - (1/3) m₃ s_c³ + O(s_c⁴),

and this implies

(5.16)   s_c = (2c/m₂)^{1/2} + (2 m₃/(3 m₂²)) c + O(c^{3/2}).

Now

    L'(s_c) = -I(p⁰, p) + m₂ s_c - (1/2) m₃ s_c² + O(s_c³).

With (5.16) this yields

(5.17)   L'(s_c) = -I(p⁰, p) + (2 m₂ c)^{1/2} - (m₃/(3 m₂)) c + O(c^{3/2}).

The expansion (5.9) follows from (5.11) and (5.17).
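The expansion (5.9) can be compared with the exact value from theorem 5.1. In
the sketch below (the points p⁰ and p are hypothetical choices), the exact
infimum is obtained by solving F(s) = c numerically with a central-difference
derivative, and the error of (5.9) is checked to be small relative to c for
two small levels.

```python
import math

p0, p = (0.25, 0.25, 0.25, 0.25), (0.4, 0.3, 0.2, 0.1)

def L(t):
    return math.log(sum(a ** (1 - t) * b ** t for a, b in zip(p0, p)))

def Lp(t, h=1e-6):
    return (L(t + h) - L(t - h)) / (2 * h)

def exact_inf(c):
    """I(B'(c), p) = c - L'(s), where s solves F(s) = s L'(s) - L(s) = c."""
    lo, hi = 1e-12, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * Lp(mid) - L(mid) < c:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return c - Lp(s)

I0 = sum(a * math.log(a / b) for a, b in zip(p0, p))              # I(p0, p)
m2 = sum(a * (math.log(a / b) - I0) ** 2 for a, b in zip(p0, p))  # (5.10), j = 2
m3 = sum(a * (math.log(a / b) - I0) ** 3 for a, b in zip(p0, p))  # (5.10), j = 3

def approx_inf(c):
    """Right-hand side of the expansion (5.9), without the O(c^(3/2)) term."""
    return I0 - math.sqrt(2 * m2 * c) + (1 + m3 / (3 * m2)) * c
```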
Now let
A be a non-empty subset of
(5.l8)
=
B'
nand
(x I I(x, A) < c ) •
We briefly discuss two methods for evaluating
Suppose for simplicity
o<
that p
E
no
and y
that
y
E Bl ,
I(y, p)
must be in the boundary of
I(y, A)
= c.
and assume
c < I(p, A) •
By lemma 4.2 there is at least one point y
(5.20)
I(B', p).
Bt.
such that
= I(B',
If
p)
I(· , A) is continuous, this means
Note that in general the set
Bt
is not convex and there may be
5.5
more than one minimizing point
By lemma 4.3, for each
A
(5.21)
p(y)
y
there is at least one point
I(y, ~(y»
A ,
E
y.
=
p(y)
such that
I(y, A) •
Let

(5.22)  B'_y = {x | I(x, p^(y)) < I(y, p^(y))}.

The following lemma shows that I(B'_y, p) = I(B', p). Thus the knowledge of some pair (y, p^(y)) enables us to evaluate the infimum I(B', p), for I(B'_y, p) is obtained from theorem 5.1.

Lemma 5.1. Let B' be defined by (5.18). Suppose that p is in Omega_0 and condition (5.19) is fulfilled. Let y and p^(y) be points which satisfy (5.20) and (5.21). Then the set B'_y defined by (5.22) is a subset of B', and

(5.23)  I(B'_y, p) = I(B', p).

Proof. Since p^(y) is in the closure of Lambda and, by lemma 4.3, I(x, Lambda) equals the infimum over the closure of Lambda, we have I(x, Lambda) <= I(x, p^(y)) for all x. Since y is in the closure of B', I(y, p^(y)) = I(y, Lambda) <= c. Hence x in B'_y implies I(x, Lambda) <= I(x, p^(y)) < I(y, p^(y)) <= c, so that B'_y is a subset of B'. This implies I(B'_y, p) >= I(B', p). On the other hand, since y is in the closure of B'_y and I(., p^(y)) is continuous in Omega(p^(y)), we have I(B'_y, p) <= I(y, p) = I(B', p), and (5.23) follows.
The following alternative procedure for evaluating I(B', p) requires no knowledge of the minimizing points y and p^(y), but involves another minimizing problem. We have

B' = union over p0 in the closure of Lambda of B'(p0),  B'(p0) = {x | I(x, p0) < c}.

This implies, at least for p in Omega_0,

I(B', p) = inf over p0 in the closure of Lambda of I(B'(p0), p).

For each p0, I(B'(p0), p) can be obtained from theorem 5.1. Thus the problem of evaluating I(B', p) is reduced to that of minimizing I(B'(p0), p) for p0 in the closure of Lambda.
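The reduction just described is easy to exercise numerically. The sketch below is an editorial illustration, not part of the original text: the hypothesis set Lambda, the alternative p, the threshold c, and the grids are all chosen arbitrarily for k = 2. On a common grid the direct evaluation of I(B', p) and the evaluation via inf over p0 of I(B'(p0), p) coincide exactly, because both minimize I(., p) over the same set of grid points.

```python
import numpy as np

def kl(x, p):
    # I(x, p) = sum x_i log(x_i / p_i), with 0 log 0 = 0
    x, p = np.asarray(x, float), np.asarray(p, float)
    m = x > 0
    return float(np.sum(x[m] * np.log(x[m] / p[m])))

# k = 2: simplex points (t, 1 - t); Lambda, p, c assumed for illustration
ts = np.linspace(0.001, 0.999, 999)
thetas = np.linspace(0.4, 0.6, 41)      # hypothesis set Lambda
p = (0.9, 0.1)
c = 0.01

def I_Lambda(t):
    return min(kl((t, 1 - t), (th, 1 - th)) for th in thetas)

# direct evaluation: I(B', p) with B' = {x : I(x, Lambda) < c}
direct = min(kl((t, 1 - t), p) for t in ts if I_Lambda(t) < c)

# reduction: inf over p0 in Lambda of I(B'(p0), p), B'(p0) = {x : I(x, p0) < c}
via = min(min(kl((t, 1 - t), p)
              for t in ts if kl((t, 1 - t), (th, 1 - th)) < c)
          for th in thetas)
print(direct, via)
```

Both quantities are the minimum of I(., p) over the same union of grid sets, so they agree to machine precision.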
6. The set of alternatives at which a likelihood ratio test is better than a given test. In this section we shall consider tests which satisfy certain regularity conditions and shall investigate the set of alternatives at which a likelihood ratio test of approximately the same size has a smaller error probability than the given test when N is sufficiently large.

Definition 6.1. A sequence {A_N} of subsets of Omega is said to be regular relative to a point p in Omega if

(6.1)  I(A_N^(N), p) = I(A_N, p) + O(N^{-1} log N).

A sequence {A_N} is said to be regular relative to a subset Lambda of Omega if

(6.2)  I(A_N^(N), Lambda) = I(A_N, Lambda) + O(N^{-1} log N).

A subset A of Omega is said to be regular relative to p (or Lambda) if (6.1) (or (6.2)) holds with A_N = A.

Sufficient conditions for a sequence of sets to be regular relative to p or Lambda are derived in the Appendix.
From theorem 2.1 and definition 6.1 we immediately obtain

Theorem 6.1. If the sequence {A_N} is regular relative to p then

(6.3)  P_N(A_N | p) = exp{-N I(A_N, p) + O(log N)}.

If {A_N} is regular relative to Lambda then

(6.4)  sup over p in Lambda of P_N(A_N | p) = exp{-N I(A_N, Lambda) + O(log N)}.
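The content of (6.3) can be checked exactly in the binomial case k = 2, where P_N(A | p) is a binomial tail. The sketch below is an editorial illustration with an assumed half-space critical region; it compares the per-observation exponent of the exact size with I(A, p).

```python
import math

# k = 2, hypothesis point p0 = (1/2, 1/2); critical region A = {x : x_1 >= 0.7}
N = 400
n0 = 280                     # = 0.7 N

def I(x1, q1):
    # I(x, p) for k = 2 with x = (x1, 1 - x1), p = (q1, 1 - q1)
    return x1 * math.log(x1 / q1) + (1 - x1) * math.log((1 - x1) / (1 - q1))

# exact P_N(A | p0): binomial upper tail
P = sum(math.comb(N, n) * 0.5 ** N for n in range(n0, N + 1))

lhs = -math.log(P) / N       # per-observation exponent of the exact size
rhs = I(0.7, 0.5)            # I(A, p0): the infimum of I(., p0) over A
print(lhs, rhs)
```

The gap between the two quantities is of order (log N)/N, as (6.3) asserts.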
We now state another version of theorem 3.1 which compares a test for testing the hypothesis p in Lambda with a likelihood ratio test. Let

(6.5)  B(c) = {x | I(x, Lambda) >= c}.

Theorem 6.2. Let {A_N} be a sequence of sets regular relative to Lambda and let

(6.6)  c_N = I(A_N, Lambda).

There exist positive numbers eps_N = O(N^{-1} log N) such that

(6.7)  sup over p in Lambda of P_N(B(c_N + eps_N) | p) <= sup over p in Lambda of P_N(A_N | p),

and such that for any p in Omega for which the sequence {A'_N} is regular relative to p,

(6.8)  P_N(B'(c_N + eps_N) | p) <= exp{-N d_N(p) + N e_N(p) + O(log N)} P_N(A'_N | p),

provided that the two probabilities in (6.8) are different from 0, where

(6.9)  d_N(p) = I(B'(c_N), p) - I(A'_N, p),
(6.10) e_N(p) = I(B'(c_N), p) - I(B'(c_N + eps_N), p).

The proof is clear if we note that relations (6.3) and (6.4), with the equality sign replaced by <=, are true for arbitrary sets A_N. Hence to obtain the inequalities (6.7) and (6.8) it is not necessary to assume that the sets B(c_N + eps_N) and their complements are regular.
The assumption that the probabilities in (6.8) are positive implies that I(A'_N, p) and I(B'(c_N + eps_N), p) are finite, so that the differences d_N(p) and e_N(p) are defined. By (6.8), the ratio P_N(B'(c_N + eps_N) | p) / P_N(A'_N | p) tends to 0 faster than any power of N if (i) N d_N(p)/log N -> infinity and (ii) e_N(p)/d_N(p) -> 0. The latter is true if d_N(p) > 0 and e_N(p) -> 0. If A_N = A is independent of N, so are c_N = c and d_N(p) = d(p), and conditions (i) and (ii) reduce to d(p) > 0 and e_N(p) -> 0. If I(B'(c), p) is a continuous function of c then e_N(p) -> 0, and then we need only determine the set of points p for which d(p) is positive. In the general case the set where d(p) > 0 is also of primary importance, as will be seen in the sequel. The following theorem gives a characterization of this set. To simplify the notation
we omit the subscripts N.

Theorem 6.3. Let Lambda and A be non-empty subsets of Omega such that 0 < I(A, Lambda) < infinity. Let

(6.11)  B = {x | I(x, Lambda) >= I(A, Lambda)}

and for any p in Omega let

(6.12)  d(p) = I(B', p) - I(A', p).

(I) Always d(p) >= 0; if I(B', p) < infinity, then d(p) = 0 if and only if

(6.13)  {x | I(x, p) < I(B', p)} is a subset of A.

(II) If d(p) = 0 and 0 < I(B', p) < infinity, then I(B', p) = I(y, p) for some common boundary point y of A and B.

(III) Suppose that d(p) = 0 and 0 < I(B', p) < infinity. Let y be a common boundary point of A and B such that I(y, p) = I(B', p), and let p^(y) be a point in the closure of Lambda such that I(y, p^(y)) = I(y, Lambda). If p and y are in Omega_0, then

(6.14)  p_i = C y_i (y_i / p^_i(y))^a,  i = 1,...,k,

for some a > 0 and a norming constant C; that is, p is on the curve p = p(t), -infinity < t < 0, where

(6.15)  p_i(t) = y_i^{1-t} p^_i(y)^t / sum_{j=1}^k y_j^{1-t} p^_j(y)^t,  i = 1,...,k.
Proof. (I) Since x in A implies I(x, Lambda) >= I(A, Lambda), we have A contained in B, hence A' contains B' and d(p) >= 0. If d(p) = 0 then x in A' implies I(x, p) >= I(A', p) = I(B', p), which is equivalent to (6.13). If (6.13) is satisfied, then x in A' implies I(x, p) >= I(B', p), hence I(A', p) >= I(B', p), and therefore d(p) = 0.

(II) Suppose that d(p) = 0 and 0 < I(B', p) < infinity. First assume p in Omega_0. By lemma 4.2, I(B', p) = I(y, p), where y is in the boundary of B'. Since d(p) = 0, I(y, p) = I(A', p), and y is in the closure of A'. Again by lemma 4.2, y is in the boundary of A'. Thus y is a common boundary point of A and B. If p is not in Omega_0, the proof is analogous, with reference to lemma 4.2(b).

(III) Under the assumptions of part (III) the conditions of lemma 5.1 with c = I(A, Lambda) are satisfied. (This is true without the assumption that y is in Omega_0.) Hence the set

B'_y = {x | I(x, p^(y)) < I(y, p^(y))}

is a subset of B' and I(B'_y, p) = I(B', p). Since y is in the closure of B'_y, we have I(y, p^(y)) = I(y, Lambda) <= I(A, Lambda) < infinity, so that y is in Omega(p^(y)). In particular, if y is in Omega_0 then p^(y) is in Omega_0. It follows from theorem 5.1 with p0 = p^(y) (or, more directly, by an argument used in the proof of that theorem) that (6.14) holds with a > 0. This is equivalent to (6.15) with t = -a < 0. The proof is complete.
Remarks on theorem 6.3. We have excluded the case I(B', p) = 0, that is, p in the closure of B'. If I(B', p) = 0, then clearly I(A', p) = 0 and d(p) = 0. (In this case the set on the left of (6.13) is empty.) At such alternatives p the error probabilities P_N(B' | p) and P_N(A' | p) can not be very small. The alternatives of interest to us are those for which I(B', p) > 0. If I(., Lambda) is continuous, then I(B', p) > 0 if and only if I(p, Lambda) > I(A, Lambda). (Note that if I(B', p) = infinity, then P_N(B' | p) = 0 for every N.)

Theorem 6.3 shows that the set of points p for which I(B', p) > 0 and d(p) = 0 essentially depends on the set of common boundary points of A and B. Suppose, in particular, that the test with critical region A^(N) differs sufficiently from a likelihood ratio test in the sense that the sets A and B have only finitely many common boundary points y. Under some additional conditions theorem 6.3 implies that the set of points p with d(p) = 0 is small in a specified sense; this is made precise in the corollary stated below.
To simplify the statement of the theorem we have assumed in part (III) that p as well as y are in Omega_0. For the general case theorem 5.1 implies a similar result, except that, for given points y and p^(y), the set where d(p) = 0 may be of more than one dimension. (Compare example 8.2 in section 8.)

Under the assumptions of part (III) the condition (6.14) is necessary but not in general sufficient for d(p) = 0. It is sufficient if, for instance, the complement A' of A is convex.
We state the following implication of theorem 6.3.

Corollary 6.3.1. Let B and d(p) be defined by (6.11) and (6.12). Suppose that 0 < I(A, Lambda) < infinity; that the number of common boundary points y of A and B is finite; that all these points y are in Omega_0; and that for each y there are only finitely many points p^(y) in the closure of Lambda such that I(y, p^(y)) = I(y, Lambda). Then if I(B', p) > 0 and p is in Omega_0, we have d(p) > 0 except perhaps when p is on one of the finitely many curves (one for each pair (y, p^(y))) defined by (6.14) and (6.15).
In the special case of a simple hypothesis, where Lambda consists of a single point p0 in Omega_0, we have p^(y) = p0 for all y. Here

B = {x | I(x, p0) >= I(A, p0)}.

Then, by lemma 4.5, the condition

I(A, p0) < - log (1 - p0_min)

is sufficient for all common boundary points of A and B to be in Omega_0.

We conclude this section with a lemma concerning the behavior of e_N(p), as defined in (6.10), for the case where Lambda consists of a single point p0.
Lemma 6.1. Let p0 be in Omega_0, p in Omega, and B'(c) = {x | I(x, p0) < c}. Then as delta -> 0+,

(6.16)  I(B'(c), p) - I(B'(c + delta), p) = O(delta c^{-1/2}),

uniformly for 0 < c < I(p, p0) - eta, eta > 0.

Proof. Let J(c) = I(B'(c), p). By theorem 5.1, J(c) = c - L'(s_c) for 0 < c < I(p, p0), where 0 < s_c < 1 is determined by F(s_c) = c, F(t) = t L'(t) - L(t), L(t) = log M(t), and M(t) is defined in (5.7). Since F'(t) = t L''(t), the derivative s'_c = ds_c/dc satisfies L''(s_c) s_c s'_c = 1, hence s'_c = s_c^{-1} L''(s_c)^{-1} > 0. For the derivatives of J we obtain

J'(c) = 1 - s_c^{-1} < 0,  J''(c) = s'_c s_c^{-2} = s_c^{-3} L''(s_c)^{-1} > 0.

Therefore, by the convexity of J, for delta > 0

J(c) - J(c + delta) <= -delta J'(c) = delta (1 - s_c) s_c^{-1} <= delta s_c^{-1}.

As c -> 0, s_c ~ (2/m_2)^{1/2} c^{1/2} by (5.16), where m_2 > 0; for c bounded away from 0 and I(p, p0), s_c is bounded away from 0 and 1. This implies the lemma.
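The bound (6.16) can be observed directly in the binomial case, where for k = 2 the set B'(c) is an interval and I(B'(c), p) is determined by the boundary root of I(x, p0) = c nearest to p. The sketch below is an editorial illustration; the points p0, p and the values of c are assumed.

```python
import math

p0, p = 0.5, 0.8          # k = 2: p0 = (1/2, 1/2), p = (0.8, 0.2); assumed

def I2(x, q):
    return x * math.log(x / q) + (1 - x) * math.log((1 - x) / (1 - q))

def J(c):
    # I(B'(c), p) for B'(c) = {x : I(x, p0) < c}: the infimum of I(., p) over
    # this interval is attained at the root of I(x, p0) = c nearest to p
    lo, hi = p0, 1 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if I2(mid, p0) < c:
            lo = mid
        else:
            hi = mid
    return I2(lo, p)

delta = 1e-4
ratios = [(J(c) - J(c + delta)) / (delta / math.sqrt(c))
          for c in (0.001, 0.005, 0.02, 0.08, 0.15)]
print(ratios)   # stays bounded as c -> 0+, in line with O(delta c^(-1/2))
```

The normalized differences stay bounded over the whole range of c, which is the content of (6.16).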
7. Chi-square and likelihood ratio tests of a simple hypothesis. Let p0 be a point in Omega_0. The chi-square test for testing the simple hypothesis H: p = p0 rejects H if Q^2(z^(N), p0) >= eps_N^2, where

(7.1)  Q^2(x, p) = sum_{i=1}^k (x_i - p_i)^2 / p_i

and eps_N is a positive number. We shall compare this test with the likelihood ratio test which rejects H if I(z^(N), p0) >= c_N, where c_N is so chosen that the two tests have approximately the same size.

It is well known that if p = p0, then the random variables N Q^2(z^(N), p0) and 2N I(z^(N), p0) have the same limiting chi-square distribution with k-1 degrees of freedom. Hence if eps_N^2 = 2 c_N = c/N, where c is a positive constant, the sizes of the two tests converge to the same positive limit.
In fact, in this case the critical regions of the two tests differ very little from each other when N is large. Indeed, we have

(7.2)  I(x, p0) = (1/2) Q^2(x, p0) + O(|x - p0|^3),

where |x - p0| denotes the Euclidean distance between x and p0. This implies that the set {x | Q^2(x, p0) < 2c/N} both contains and is contained in a set of the form {x | I(x, p0) < c/N + O(N^{-3/2})}. It can be shown that at any point p != p0 which is in Omega_0 the ratio of the error probabilities of the two tests is bounded away from 0 and infinity as N -> infinity. (If p is not in Omega_0, the error probabilities at p of both tests are zero for N sufficiently large.)

In this section it will be shown that if eps_N tends to 0, but not too fast, then at "most" points p the error probability of the likelihood ratio test is much smaller than that of the chi-square test when N is large enough.
We first observe that

(7.3)  Q^2(x, p0) <= (1 - p0_min)/p0_min,  p0_min = min_i p0_i,

where the upper bound is attained if and only if x_j = 1 and x_i = 0 for i != j, for some j such that p0_j = p0_min. Hence when we consider the test defined by Q^2(x, p0) >= eps^2, we may assume that eps^2 <= (1 - p0_min)/p0_min. The case of equality is trivial, and we shall restrict ourselves to the case of strict inequality. (Note that p0 in Omega_0 implies p0_min <= 1/k, and p0_min < 1/2 unless k = 2 and p0 = (1/2, 1/2).)
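The bound (7.3) is quickly confirmed numerically; the point p0 below is assumed for illustration.

```python
import numpy as np

p0 = np.array([0.2, 0.3, 0.5])           # assumed hypothesis point
pmin = float(p0.min())
bound = (1 - pmin) / pmin                # claimed maximum of Q2(x, p0): here 4

def Q2(x):
    return float(np.sum((x - p0) ** 2 / p0))

# random points of the simplex never exceed the bound ...
rng = np.random.default_rng(0)
maxQ = max(Q2(x) for x in rng.dirichlet(np.ones(3), size=20000))

# ... and the bound is attained at a vertex opposite a smallest component
vertex = np.array([1.0, 0.0, 0.0])
print(maxQ, Q2(vertex), bound)
```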
Theorem 7.1. Suppose that p0 is in Omega_0 and let

(7.4)  A(eps) = {x | Q^2(x, p0) >= eps^2},
(7.5)  0 < eps^2 < (1 - p0_min)/p0_min.

If r is the number of components p0_j such that p0_j = p0_min, there are exactly r points y such that

(7.6)  y is in the closure of A(eps),  I(y, p0) = I(A(eps), p0).

These points are defined by

(7.7)  y_j = b p0_j,  y_i = a p0_i if i != j,  where p0_j = p0_min,
(7.8)  a = 1 - (p0_min/(1 - p0_min))^{1/2} eps,  b = 1 + ((1 - p0_min)/p0_min)^{1/2} eps,

and we have

(7.9)  I(A(eps), p0) = p0_min b log b + (1 - p0_min) a log a.

Furthermore,

(7.10)  2 p0_min (1 - p0_min) eps^2 <= g(p0_min) p0_min (1 - p0_min) eps^2 <= I(A(eps), p0) <= eps^2,

where g(q) = (1 - 2q)^{-1} log((1 - q)/q), and, as eps -> 0,

(7.11)  I(A(eps), p0) = (1/2) eps^2 + O(eps^3).
Proof. Let y denote any point which satisfies (7.6); such a point exists by lemma 4.2. It can be shown that necessarily Q^2(y, p0) = eps^2 and y is in Omega_0. (In general this can be deduced from lemma 4.5(b); the details are left to the reader.) By lemma 4.6 we must have

log(y_i/p0_i) = s y_i/p0_i + t,  i = 1,...,k,

where s > 0. Hence y_i/p0_i can take at most two different values, say

(7.12)  y_i = a p0_i if i in M,  y_i = b p0_i if i not in M,

where M is a non-empty proper subset of {1,...,k}; we may assume a < b. Let h = sum over i in M of p0_i. The conditions sum y_i = 1 and Q^2(y, p0) = eps^2 imply

(7.13)  a h + b(1 - h) = 1,
(7.14)  (a - 1)^2 h + (b - 1)^2 (1 - h) = eps^2.

Then

(7.15)  a = 1 - ((1 - h)/h)^{1/2} eps,  b = 1 + (h/(1 - h))^{1/2} eps.

To satisfy y_i > 0 we must have a > 0, that is, eps^2 < h/(1 - h). (If eps^2 is close to its upper bound (1 - p0_min)/p0_min, this condition is satisfied only when h takes its largest possible value, h = 1 - p0_min.) For y defined by (7.12) we have

I(y, p0) = h a log a + (1 - h) b log b = f(h),

say, where a = a(h) and b = b(h) are given by (7.15). A straightforward calculation shows that the derivative f'(h) is negative, so that, as h ranges over its admissible values, f(h) attains its minimum at h = 1 - p0_min. This implies that condition (7.6) is satisfied if and only if y is one of the points defined by (7.7) and (7.8), and that I(A(eps), p0) is given by (7.9).

The inequality I(A(eps), p0) <= eps^2 in (7.10) follows from the general inequality

(7.16)  I(x, p) = sum x_i log(x_i/p_i) <= sum x_i ((x_i/p_i) - 1) = Q^2(x, p).

The first two inequalities in (7.10) are contained in theorem 1 of Hoeffding [5]. (Note that the closer lower bound in (7.10) is attained for eps = (1 - 2 p0_min)/(p0_min(1 - p0_min))^{1/2}.) The expansion (7.11) is easily verified.
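The closed form (7.9) can be checked against brute-force minimization over a grid of the simplex. The sketch below is an editorial illustration; p0, eps, and the grid are assumed.

```python
import math

p0 = (0.2, 0.3, 0.5)          # assumed hypothesis point
eps = 0.2
pmin = min(p0)

# theorem 7.1: a, b and I(A(eps), p0) in closed form, per (7.8) and (7.9)
a = 1 - math.sqrt(pmin / (1 - pmin)) * eps
b = 1 + math.sqrt((1 - pmin) / pmin) * eps
formula = pmin * b * math.log(b) + (1 - pmin) * a * math.log(a)

def I(x):
    return sum(xi * math.log(xi / pi) for xi, pi in zip(x, p0) if xi > 0)

def Q2(x):
    return sum((xi - pi) ** 2 / pi for xi, pi in zip(x, p0))

# brute force: minimize I(x, p0) over grid points with Q2(x, p0) >= eps^2
h = 1.0 / 400
best = float("inf")
for i in range(401):
    for j in range(401 - i):
        x = (i * h, j * h, 1 - (i + j) * h)
        if Q2(x) >= eps * eps:
            best = min(best, I(x))
print(formula, best)
```

The grid minimum can only exceed the exact infimum, and it does so by at most the grid resolution effect.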
The next theorem gives the infimum of I(x, p) subject to the condition x in A'(eps), that is, Q^2(x, p0) < eps^2.

Theorem 7.2. Let p0 be in Omega_0, p in Omega_0, and 0 < eps^2 < Q^2(p, p0). Then there is a unique point z* such that

(7.17)  Q^2(z*, p0) = eps^2,  I(z*, p) = I(A'(eps), p).

The point z* is determined by the conditions

(7.18)  log(z*_i/p_i) = -s (z*_i - p0_i)/p0_i + t,  i = 1,...,k,  s > 0,

and

(7.19)  sum_{i=1}^k z*_i = 1,  Q^2(z*, p0) = eps^2.

As eps -> 0,

(7.20)  z*_i = p0_i - p0_i (log(p0_i/p_i) - I(p0, p)) m_2(p)^{-1/2} eps + o(eps),  i = 1,...,k,
(7.21)  I(A'(eps), p) = I(p0, p) - m_2(p)^{1/2} eps + (1/2) eps^2 + O(eps^3),

where

(7.22)  m_j(p) = sum_{i=1}^k p0_i (log(p0_i/p_i) - I(p0, p))^j.
Proof. By lemma 4.7 there is exactly one point z* which satisfies (7.17). By lemma 4.6, z* must satisfy (7.18) with s > 0; the constants s and t are determined by (7.19).

Now let eps -> 0. The condition Q^2(z*, p0) = eps^2 implies z*_i - p0_i = O(eps). Hence

(7.23)  log(z*_i/p_i) = log(p0_i/p_i) + (z*_i - p0_i)/p0_i + O(eps^2).

With (7.18) this gives

(7.24)  (1 + s)(z*_i - p0_i)/p0_i = t - log(p0_i/p_i) + O(eps^2).

Multiplication of both sides with p0_i and summation yields t = I(p0, p) + O(eps^2). From (7.23) and (7.24) we obtain

(7.25)  (z*_i - p0_i)/p0_i = -(1 + s)^{-1} (log(p0_i/p_i) - I(p0, p) + O(eps^2)).

If we square both sides of (7.25), multiply with p0_i, and sum with respect to i, we find that

(7.26)  eps^2 = (1 + s)^{-2} (m_2(p) + O(eps^2)).

Hence (1 + s)^{-1} = m_2(p)^{-1/2} eps (1 + O(eps^2)). From (7.25) and (7.26) we obtain relation (7.20). With (7.17) and (7.20) this implies (7.21). This completes the proof.
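For k = 2 the set A'(eps) is an interval and I(A'(eps), p) can be computed exactly, which makes the expansion (7.21) easy to check. The sketch below is an editorial illustration; the points p0, p and the values of eps are assumed.

```python
import math

p0, p = 0.4, 0.8          # k = 2 with p0 = (0.4, 0.6), p = (0.8, 0.2); assumed

def I2(x, q):
    return x * math.log(x / q) + (1 - x) * math.log((1 - x) / (1 - q))

Ip = I2(p0, p)            # I(p0, p)
w1 = math.log(p0 / p) - Ip
w2 = math.log((1 - p0) / (1 - p)) - Ip
m2 = p0 * w1 ** 2 + (1 - p0) * w2 ** 2       # m_2(p) of (7.22)

errs = []
for eps in (0.1, 0.05, 0.025):
    # A'(eps) = {x : (x - p0)^2 / (p0 (1 - p0)) < eps^2}; the infimum of
    # I(., p) over it is attained at the endpoint nearest to p
    x_star = p0 + eps * math.sqrt(p0 * (1 - p0))
    exact = I2(x_star, p)
    approx = Ip - math.sqrt(m2) * eps + 0.5 * eps ** 2     # expansion (7.21)
    errs.append(abs(exact - approx))
print(errs)   # shrinks rapidly as eps decreases
```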
Let A(eps) be defined by (7.4) and let

(7.28)  B(eps) = {x | I(x, p0) >= I(A(eps), p0)},
(7.29)  d(p, eps) = I(B'(eps), p) - I(A'(eps), p).

Theorem 7.3. Let p0 be in Omega_0 and 0 < eps < ((1 - p0_min)/p0_min)^{1/2}.

(I) If p is in Omega_0 and Q^2(p, p0) > eps^2, then d(p, eps) > 0 unless for some j

(7.30)  p0_j = p0_min,
(7.31)  p_i = a p0_i for i != j,  0 < a < 1.

(II) As eps -> 0,

(7.32)  d(p, eps) = (1/6) m_2(p)^{1/2} lambda(p) eps^2 + O(eps^3),  p in Omega_0,

where m_j(p) is defined in (7.22) and

(7.33)  lambda(p) = m_3(p)/m_2(p)^{3/2} + (1 - 2 p0_min)/(p0_min(1 - p0_min))^{1/2}.

(III) We have lambda(p) >= 0, and lambda(p) > 0 unless p satisfies (7.31) with a != 1, 0 < a < (1 - p0_min)^{-1}, for some j such that p0_j = p0_min.
Proof. Part (I) follows from theorems 6.3 and 7.1; the parameter a replaces the parameter t of theorem 6.3.

(II) By theorem 7.1, as eps -> 0,

(7.34)  c = I(A(eps), p0) = (1/2) eps^2 - (1/6)(1 - 2 p0_min)(p0_min(1 - p0_min))^{-1/2} eps^3 + O(eps^4).

By theorem 5.2,

(7.35)  I(B'(eps), p) = I(p0, p) - (2 m_2(p))^{1/2} c^{1/2} + (1 + m_3(p)/(3 m_2(p))) c + O(c^{3/2}).

From (7.34) and (7.35) we obtain after simplification

(7.36)  I(B'(eps), p) = I(p0, p) - m_2(p)^{1/2} eps + (1/2) eps^2 + (1/6) m_2(p)^{1/2} lambda(p) eps^2 + O(eps^3).

By theorem 7.2,

(7.37)  I(A'(eps), p) = I(p0, p) - m_2(p)^{1/2} eps + (1/2) eps^2 + O(eps^3).

The expression (7.32) for d(p, eps) follows from (7.36) and (7.37).

(III) Let

u_i = (log(p0_i/p_i) - I(p0, p))/m_2(p)^{1/2},  i = 1,...,k,  u = (u_1,...,u_k),

and

(7.38)  psi_j(u) = sum_{i=1}^k p0_i u_i^j.

Then psi_1(u) = 0, psi_2(u) = 1, and psi_3(u) = m_3(p)/m_2(p)^{3/2}. Part (III) of theorem 7.3 is an immediate consequence of the following lemma. (Note that lambda(p) >= 0 is implied by d(p, eps) >= 0; the lemma gives the conditions for equality.)
Lemma 7.1. Let psi_j(u) be defined by (7.38), where p0 is in Omega_0 and u_1,...,u_k are any real numbers such that psi_1(u) = 0 and psi_2(u) = 1. Then

(7.39)  psi_3(u) >= (2 p0_min - 1)/(p0_min(1 - p0_min))^{1/2}.

The sign of equality holds if and only if, for some j such that p0_j = p0_min,

(7.40)  u_j = -((1 - p0_min)/p0_min)^{1/2},  u_i = (p0_min/(1 - p0_min))^{1/2} for i != j.

Proof. Since p0 is in Omega_0, the set of points u defined by psi_1(u) = 0 and psi_2(u) = 1 is bounded and closed, so that psi_3(u) has a finite minimum in this set. An application of the method of Lagrange multipliers shows that at a minimizing point the u_i can take only two values, say u_i = a if i in M and u_i = b if i not in M, with a > b. The conditions psi_1(u) = 0 and psi_2(u) = 1 then imply

psi_3(u) = (1 - 2h)/(h(1 - h))^{1/2},  h = sum over i in M of p0_i.

The minimum with respect to h of this ratio is attained at h = 1 - p0_min, and the lemma follows.
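Lemma 7.1 can be exercised directly: random vectors u normalized to satisfy psi_1(u) = 0 and psi_2(u) = 1 never fall below the bound (7.39), and the configuration (7.40) attains it exactly. The point p0 below is assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p0 = np.array([0.2, 0.3, 0.5])           # assumed hypothesis point
pmin = float(p0.min())
bound = (2 * pmin - 1) / np.sqrt(pmin * (1 - pmin))   # (7.39); here -1.5

def psi(u, j):
    return float(np.sum(p0 * u ** j))

worst = np.inf
for _ in range(5000):
    v = rng.normal(size=3)
    u = v - psi(v, 1)                    # enforce psi_1(u) = 0
    u = u / np.sqrt(psi(u, 2))           # enforce psi_2(u) = 1
    worst = min(worst, psi(u, 3))

# equality configuration (7.40), at a cell with the minimal p0_i
u_eq = np.array([-np.sqrt((1 - pmin) / pmin),
                 np.sqrt(pmin / (1 - pmin)),
                 np.sqrt(pmin / (1 - pmin))])
print(worst, psi(u_eq, 3), bound)
```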
The following lemma establishes the regularity of the sequences of sets {A(eps_N)} and {A'(eps_N)} under general conditions.

Lemma 7.2. Let p0 be in Omega_0 and suppose that

(7.41)  N eps_N^2 -> infinity as N -> infinity.

Then the sequence {A(eps_N)} is regular relative to p0 and relative to any p in Omega_0 with lim sup eps_N^2 < Q^2(p, p0); the sequence {A'(eps_N)} is regular relative to any p in Omega.

Proof. Since the set A'(eps) is convex, theorem A.1 of the Appendix implies that the sequence {A'(eps_N)} is regular relative to any p in Omega. The set A(eps) = {x | Q^2(x, p0) >= eps^2} is of the form considered in theorem A.2 of the Appendix. By theorem 7.1 (for p = p0) and theorem 7.2 (for p in Omega_0 with eps_N^2 < Q^2(p, p0)) there exist minimizing points y^(N) as required in (A.9). Condition (A.10) of theorem A.2 is satisfied if eps_N is bounded away from 0 or, by (7.20), if (7.41) holds. The lemma follows.

Remark. The case where condition (7.41) is not satisfied is of no statistical interest, since if N eps_N^2 is bounded, the size of the test does not tend to 0. If eps_N^2 <= a^2/N^2 and a is sufficiently small, the set A'(eps_N)^(N) is empty for infinitely many N, and hence the sequence {A'(eps_N)} is not regular relative to any p in Omega_0.
We now can state the following result about the relative performance of a chi-square test and a likelihood ratio test of a simple hypothesis.

Theorem 7.4. Let p0 be in Omega_0, 0 < eps_N < ((1 - p0_min)/p0_min)^{1/2}, and suppose that condition (7.41) holds.

(I) For the error probabilities of the chi-square test which rejects the hypothesis p = p0 if z^(N) is in A(eps_N) = {x | Q^2(x, p0) >= eps_N^2} we have

(7.43)  P_N(A(eps_N) | p0) = exp{-N I(A(eps_N), p0) + O(log N)},

where I(A(eps_N), p0) is given explicitly in theorem 7.1; and if

(7.44)  p in Omega_0,  lim sup eps_N^2 < Q^2(p, p0) as N -> infinity,

then

(7.45)  P_N(A'(eps_N) | p) = exp{-N I(A'(eps_N), p) + O(log N)},

where I(A'(eps_N), p) is given in theorem 7.2.

(II) There exist positive constants eta_N = O(N^{-1} log N) such that for the likelihood ratio test which rejects p = p0 if z^(N) is in B_N = {x | I(x, p0) >= I(A(eps_N), p0) + eta_N} we have

(7.46)  P_N(B_N | p0) <= P_N(A(eps_N) | p0),

and, if conditions (7.44) are satisfied,

(7.47)  P_N(B'_N | p) <= exp{-N d(p, eps_N) + N e_N(p) + O(log N)} P_N(A'(eps_N) | p),

where d(p, eps) is defined in (7.29) and has the properties stated in theorem 7.3. In particular, if

(7.48)  eps_N -> 0 and N eps_N^2 / log N -> infinity as N -> infinity,

then at each point p in Omega_0, p != p0, which does not lie on one of the line segments

(7.49)  p_i = a p0_i, i != j;  0 < a < 1;  p0_j = p0_min,

the likelihood ratio test B_N is more powerful than the chi-square test A(eps_N) when N is sufficiently large.

Proof. Part (I) follows from theorem 6.1 and lemma 7.2. Part (II) follows from theorem 6.2, lemma 6.1, and theorems 7.1 and 7.3.

Remark. The line segments (7.49) connect the point p0 with some of the vertices of the simplex Omega. It would be of interest to investigate whether for moderately large values of N the likelihood ratio test is more powerful than the chi-square test except in a small neighborhood of these line segments when the size of the two tests is not unreasonably small.
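The construction in part (II) can be carried out exactly for small N by enumerating the trinomial distribution. The sketch below is an editorial illustration: N, p0, the threshold eps^2, and the alternative p are all assumed, and the likelihood ratio threshold is chosen as the smallest attained level whose size does not exceed that of the chi-square test.

```python
import math

N = 40
p0 = (1/3, 1/3, 1/3)
eps2 = 0.25                 # chi-square test: reject H if Q2(z, p0) >= eps2

def pmf(n, p):
    # P_N(z | p) for counts n = (n1, n2, n3)
    out = math.lgamma(N + 1)
    for ni, pi in zip(n, p):
        out += ni * math.log(pi) - math.lgamma(ni + 1)
    return math.exp(out)

def Q2(n):
    return sum((ni / N - pi) ** 2 / pi for ni, pi in zip(n, p0))

def I(n):
    return sum((ni / N) * math.log(ni / (N * pi)) for ni, pi in zip(n, p0) if ni)

lattice = [(i, j, N - i - j) for i in range(N + 1) for j in range(N + 1 - i)]
size_chi = sum(pmf(n, p0) for n in lattice if Q2(n) >= eps2)

# likelihood ratio test: smallest threshold whose size does not exceed size_chi
pairs = [(I(n), pmf(n, p0)) for n in lattice]
levels = sorted({Iv for Iv, _ in pairs})
c = next(c for c in levels
         if sum(pr for Iv, pr in pairs if Iv >= c) <= size_chi)
size_lr = sum(pr for Iv, pr in pairs if Iv >= c)

# error probabilities at an assumed alternative off the segments (7.49)
p = (0.6, 0.25, 0.15)
err_chi = sum(pmf(n, p) for n in lattice if Q2(n) < eps2)
err_lr = sum(pmf(n, p) for n in lattice if I(n) < c)
print(size_chi, size_lr, err_chi, err_lr)
```

By construction the likelihood ratio test has size no larger than the chi-square test; the printed error probabilities illustrate the comparison of theorem 7.4 at one alternative.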
8. Chi-square and likelihood ratio tests of a composite hypothesis. There is reason to believe that in the case of a composite hypothesis the relation between a chi-square test and a likelihood ratio test is in general analogous to that in the case of a simple hypothesis (see section 7), with a notable exception mentioned below. For a chi-square test of a composite hypothesis the determination of the common boundary points of the sets A and B (in the notation of theorem 6.3) is somewhat cumbersome. We therefore present no general results. We first shall show by an example that for one common version of the chi-square test it may happen that the size of the test is never smaller than some power of N; if this is the case, our theory is not applicable. We then give a simple example where the situation is analogous to the case of a simple hypothesis.

There are several versions of the chi-square test for testing a composite hypothesis H: p in Lambda. One is the minimum chi-square test, based on the statistic Q^2(z^(N), Lambda), where

(8.1)  Q^2(x, Lambda) = inf over p in Lambda of sum_{i=1}^k (x_i - p_i)^2 / p_i,

with the convention that (x_i - p_i)^2 / p_i = 0 if x_i = p_i = 0. The calculation of Q^2(x, Lambda) is cumbersome for some of the common hypotheses. When a maximum likelihood estimator p^(x) of p under the assumption p in Lambda is available (as defined in lemma 4.3), one often resorts to the test based on

(8.2)  Q^^2(x) = Q^2(x, p^(x)).

If the size of the test is held fixed as N increases and the set Lambda is sufficiently regular, the tests based on Q^2(x, Lambda) or Q^^2(x) differ little from a likelihood ratio test based on I(x, Lambda), just as in the case of a simple hypothesis. However, if we require that the size of the test tend to 0 more rapidly than a certain power of N, it turns out that this requirement can not in general be satisfied with a Q^^2 test.
The Q^^2 test of smallest positive size for testing the hypothesis H: p in Lambda rejects H if and only if Q^^2(z^(N)) > 0. Suppose that this critical region contains a point z^(N) which is close to Lambda in the sense that I(z^(N), Lambda) = I(z^(N), p^(z^(N))) is of the order N^{-1} log N. Then, by (2.3), the smallest positive size of a Q^^2 test is not smaller than some power of N. The following example serves to illustrate this phenomenon.
8.1:
Example
k = rs
where
!;!YPothesis of independence in a contingency table.
components of
r
(8.;)
~
E
n
be denoted by
> 2. Define x.(1)
s
2,
X
J.
=
A
Then P'j(x)
(p
=
J.
'"
I Pij
=
Z xi'
j
=
J
Z
i
= l, ••• ,r;
X. •
c
Let
J.J
j = 1, ••• ,
J.J
s }
-
1
,
j=l
A
Q2 test is proportional to
-r+l
N
•
In contrast, the size of the likelihood ratio test which rejects
~
= l, ••• ,s ,
X~.
s
E
Thus the smallest positive size of a
I(z(N), A)
j
and
r
Z
i=l
J
= l, ••• ,r;
(2)
x'
,
i
=
x~l)
x~2)
J.
J
Q2(X)
=
Xij ' i
Let the
does not exceed
exp (-Nc + 0 (log
N)}
•
H if
8•.3
In the following example the situation is similar to the case of a simple
hypothesis.
Exaruwl~ 8.2.
Let
x
= (~""'~)'
k ~ .3
,
i
>2
Hence
•
1
2
,
-2 )
max Q2(x)
= 1.
Let A
sets A and B =
points,
yl
(x
= «1-F.)/2,
=
(x
I Q2(X)
I I(x,
d(p)
that
>
(1+')/2, 0, ... ,0), y2
= «1+&)/2,
I(B', p) >
Pl
=
° or
° and
P2
d(p) =
the present hypothesis set
(1-tv)/2, 0, ••• , 0) •
no , part (III) of theorem 6• .3 is not di rectly
It is not difficult to show that if
° unless
It can be shown that the
A) ~ I(A, A )} have exactly two common boundary
Since these points are not in
applicable.
~ £.2) ,0 < l < 1.
= 0.
Thus if
I(p, A)
k ~
4,
> I(A, A) then
the set of points
° is of more than one dimension.
A is such that we have effectively
P
such
Since, however,
k =.3
components,
the result may be said to be analogous to that in the case of a simple hypothesis.
9.
Some competitors of the chi-square test.
There are a number of test
statistics for testing a simple hypothesis which have the same asymptotic distribution as the chi-square statistic when the hypothesis is true.
As an example
consider the test which rejects the hypothesis p = po if Di(z(N), po)
exceeds
a constant, where
k
E
=
2
D (x, p)
1
(x
i
i=l
(see Matusita [6]).
For po
E
no we have
Di(x, po)
ThuS
=
~ Q2(X, po)
+
o(
Ix
_ po I 3).
if the size of the test is bounded away from zero, the test behaves asympto-
tically as the chi-square test and differs little from the likelihood ratio test.
Let A =
o<
,1.\:2
< 2.
{x
1 D~
(x, pO) ~ e'2} ,
It is easily seen that there are only finitely many points y in A
for which I(y, po) = I(A, po).
Just as in the case of the Chi-square test they
are such that the ratio y]../p~].
takes only two different values.
If the size of
the test tends to 0 at an appropriate rate, the test compares with the likelihood
ratio test in a similar way as the chi-square test.
A
For testing a composite hypothesis we may use the test based on Di(x, p(x».
It can be shown that in the case of example 8.1 the size of this test may decrease
at an exponential rate, in contrast to the analogous chi-square test.
Another interesting class of'tests is defined in terms of the distances
(9.2)
D(x,p)= ~6,A{, Zi6M(xi - Pi)
where ~..
is a family of subsets of
subsets then D(x, p) =
and
(i, ••• ,k}
for
~ .. zlxi
.. Pi
i = 1, ••• , k,
Kolmogorov statistic (discrete case).
,
{l, ••• , k} • If jl-l.., contains all tre se
I.
If j1( consists of the sets £1, ... , :i)
then D(z(N), po) may be identified with the
9.2
e
Let po €
00'
0
< t:, < maxx
D(x, po).
The set A
I D(x,
= (x
is the union of the half-spaces ~ = { x I 1:i£M(xi - p~)
2: ~
,
po)
M
€
> C)
-
A.
Hence
we obtain
=
l(A, po)
where
~ =
J (h)
1: i
=
€
mi~ €~,({ l(~, po)
°
M Pi
( h +,~ ) log
The function
1/2 if e, is small.
P~
= l!k, i = 1, ... ,k,
points.
e
A J(~) ,
and
h
Again the minimizing points
vaJ.ues.
= mi~
J(h)
h+
t
+ (.
1 - h - £..r) log
1 - h -
1 _ h
y are such that the ratio Yi/P~ takes only two
has a unique minimum at a point
This implies that if D(x, po) ==
then, for
t.
I,.
~ 1:1
h
Xi -
small, there are close to
For the chi-square test this number if only
k.
°
which is close to
P~ 1
and
(k/2) minimizing
10.1
10.
Bayes tests and likelihood ratio tests.
In this section it will be
shown that certain Bayes tests differ little from the corresponding likelihood ratio
test if N is large, not only when the size
o
CXN of the test is bounded away from
(in which case a chi-square test has a similar property) but also when
ON
tends
to zero.
Let
PN{z(N) I G)
(10.1)
=
J
PN(Z(N)I p)
dG(p)
The Bayes test for testing the hypothesis H:
p
n and let
G be a distribution function on the simplex
is distributed according to
exceeds a constant.
'.
p = po
against the alternative that
G rejects
H if the ratio PN(z(N) I G)/PN(z(N) Ipo)
(N) ~ .
exp ( - N I(z
,P' J} •
This ratio is ~
Let U denote the uniform distribution on
has a constant probability density.
n , so that the vector (Pl'··· ,Pk-l)
We have
(10.2)
for all
z
(N) ,
•
Hence
PN(z (N) I U)
PN{z
eM)
Ip
0
)
=
N+k-l
(N)
(N)
log ( k-l ) + log PN(z
I z
).
Here the left side is the difference between the test statistics for the likelihood
ratio test and the Bayes test.
term in (10.3) (see (2.11»
An application of Stirling's formula
shows that if the components of
away from 0, the right side of (10.3) is of the form
not depend on
they are
IOints in
z{N).
This implies
tha~the
of approximately the same size)
c
N
zeN)
to the last
are bounded
+ 0(1), where
~ does
critical regions of the two tests (When
and their error probabilities at the
n o differ little from each other.
The uniform distribution U has been chosen for simplicity.
We obtain a
similar result if U is replaced by a distribution G such that, for example,
the probability density of
(Pl'''. ,Pk - l )
is positive and bounded.
10.2
Now consider a composite hypothesis,
on n
such that the set
choices of Go
on I(z(N), A).
(10.4)
Then
=
A
I(x, A )
Let U
o
of
Binomial hypothesis.
= (xo ' XI' .•• 'xm).
x
8 on
will differ 1i ttle from the likelihood ratio ta:! t based
This is here illustrated by two examples.
Example 10.1:
by
A has probability one.
(p(e)
I0~ e
= I(x,
p(x», where
n
.s
l},
1\
1'1
=
Pi(e)
= p(
p(x)
(~)
if(l - 8)m-i,
e(x) =
A
A
8 (x»,
= 0,
i
l, ... ,m.
/
1: i xi m.
denote the distribution on A induced by the uniform distribution
(0,1).
Then
= mN
s = E i n.
~
n.
and denote the points of
Let
1
(mN+l)
~
on
k = m+l
Let
N:
n n.:
where
Let G be a distribution
o
We may expect that for suitable
and G the Bayes test based on the ratio
I G)/PN(z(N) I Go)
PN(z(N)
H: pEA.
e
(z(N».
Let, as before,
U be the uniform distribution
After simplification we obtain
(10.5)
p (z(N)
N
=
log (N:m)
_ log
I U0 )
(mN+l) + log
PN(z(N)
I zeN»~
- log
~
,
where
(10.6)
~ =
s
(m:) ( :N)
mN-a
(1 -
m: )
•
Relation (10.5) is analogous to (10.3) and implies a similar conclusion.
Exam,ple 10.2:
!1YPothesis of independence in a contingency table.
defined as in example 8.1.
random vectors
(1) '
(Pl
Let U
(1) )
••• , Pr
o
be the distribution on
and
(2) '
(Pl
(2) )
••• , P s
Let
A be
A such that the
.
are J.ndependent and each
10.3
is uniformly distributed on the respective probability simplex.
nil) = E n ,
j ij
n~2)
t
=
i
nijO
N I(z(N) ,A ) - log
Let
z(N)=n . .IN,
J.J
We obtain
p (z(N)
N
I
u)
(10.7)
,
+ log
=
(1)
n.
where
J.
pel)
=
N
and
N1
T
r
n
i=l
(nil»
(1) t
n.
•
J.
is defined in an analogous way in terms of the
The result is quite similar to that of example 10.1.
The hyPOthesis sets
class of subsets of
true.
A of examples 10.1 and 10.2
n for which relations analogous to
are special cases of a
(10.5)
and (10.7)
hold
Al
Appendix:
Regular sequences of sets.
In this appendix sufficient conditions
n to be regular relative to a point in
are derived for a sequence of subsets of
n
•
We recall that, by definiti·:m
to
p
if
(Sanov [2]
in
considered the weaker regularity condition where the
(A.l) is replaced by
Since
=
I(AN' p)
to
if condition
for which I(~ ,p)
Lemma. A.l.
constants
No
point
€
y
(A.l), with
<
and
and a point
z
I zi
(A.3)
Proof.
(A.4)
c
2: I(~,
is satisfied.
:S,
replaced
p).
Hence for those
N for
(~) is regular relative
Thus
is fulfilled for those sets
~
is "regular re~&tive to
N
> No
> 0,
zi
wi th
p
if there exist
I(~I p)
<
00
there is a
for which
€
I(~,
~N)
- Yi
I
p)
for which
< c N-
l
if Pi
=0
if Pi
=0
for
>
•
It is sufficient to show that
The assumptions imply that
I(z, p) - I(y, p)
z.~
p)
such that for each
I(z, p) - I(y, p)
If
=
(Awl
The sequence
n
I(A~N),
00.
I(Y, p) :s
(A.2)
have
condition A.l
00
remainder term
o(l).)
~N) C~, ~
which
p
6.1, the sequence {~} is regular relative
=0
then
:s O(N- l
Y and
log N)
z are in
=
di
= zi
d. = -Y
~
n(p)
i
N
N ~ No.
l
Hence
log (Zi/Pi) - Yi log (Yi/Pi)
log (y./p.) = O(N~
~
l
log N) •
A2
d.1
= (z.1
+ y.1
log1
(Z./y.)
1
- Yi ) log (z./P
1
i)
< (Zi - Yi ) log (zi/Pi)
< Iz.1 - y.1
Ilog N-li
1
d.1 <
O(N- l log N)
-
Hence
For any real numbers
Yi «Zi/Yi) - 1)
+
O( Iz. - y.1 )
+
1
1
= O(N-llog
with Pi ~ 0,
for all i
al, ••• ,ak,c the subset of
and
N) •
(A.4) follows.
0 defined by
E a.x.
1
1
>c
Eaixi ? c will be called a ,half-space. (It is convenient here not to exclude
the case where all a. are equal.
Thus the proof of the next lemma for the case
or
1
P
+
o
is strictly analogous to the proof for P £ 0 . )
0
o
.
Lemma A.2. If p is any point in 0 and A is any half-space such that
I(A, p) < = then there is a point
for each N
is a point y
'i'le
Define
if p.1
t
=
p
£
0
o
and
(x
=,
A is not empty, so that max a. > c.
1
such that I(y, p):;: I(A, p)
£
:s
= O.
First assume that
I(A, p) <
smw that y
=0
z.1
A
Since
£
2:: k(k-l) there is a point
if P1' > 0 and
Proof.
0 for which I(y, p) = I(A, p), and
N
l
) such that I Zi-yi I
z
£
(k_l)N-
y
and
E a.y.
By lemma 4.2 there
> c. It is easy to
J. J. -
O.
o
have y.1 >
k
-
-1
Z = (zl' ••• , Zk)
for some
i.
as follows.
-1
For definiteness assume that
For
i
= l, ••• ,k-l
let
Yk.2: k •
Nz.
1
be an integer
such that
yo{ I <
Izo{•- •
N-
l
,
(a
)(
)
i - ~c zi - Yi
>
0 •
A3
These conditions can be satisfied since
y
en.
Let
o
~ = 1 - zl - ••• - ~-l •
Then
k-l
1 -
1
E (y. + N- )
. 1
1=
1
Hence if N> k(k-l)
-
IZi-Yi l <
then
(k-l) N- l
for all
k
E a.z
- c
.1 1 i
1=
(A.6)
If the
a.
->
Moreover,
Now
i.
k
E
i=l
n(N) •
e
Z
and
~<;: ~ 0
k
E a.y. =
i=l 1 1
aiz i -
k-l
E (a - ~)(z. - Y.).
i
. 1
1
1
1=
are not all equal, the last sum is strictly positive by (A.5).
1
wise the inequality in
(A.6) is strict.
Hence
z
€
A(N)
for
Other-
N ~ k(k-l).
The
lemma is proved for the present case.
Ifpen
o
and
A
=
{x
I Ea. x. > c} , the conc1'l~.sion of the lemma
1
1-
follows from the first part of the proof provided that the set
is not empty.
max
a
i
We have
wise.
= c
If it is empty then, since
A must be non-empty, we have
and A is the set of all points
I(A, p)
= I(y,
p) where
y.1
= p.1
1
x
such that
Ea.=C Pj
if
J
xi = 0
a.1 = C,
if
a. < c.
1
Y1. = 0
other-
It is trivial to show that the conclusion of the lemma is true in this case.
If p ~ n, the assumption
o
I(A, p) <
implies
co
and the proof is similar that for the case p e
Lemmas A.1 and A.2 imply that any sequence of half-spaces is regular relative to any point in Ω. More generally we have

Theorem A.1. Any sequence of subsets of Ω whose complements are convex is regular relative to any point in Ω.

Proof. Let A be a set whose complement A′ is convex and let p be a point such that I(A, p) < ∞. We restrict ourselves to the case p ∈ Ω₀, since the case p ∉ Ω₀ is treated in an analogous way, as in the proof of Lemma A.2.
Since A′ is convex, A is the union of the half-spaces bounded by the supporting hyperplanes of A′. Let y be a point in the closure of A such that I(y, p) = I(A, p). If y ∈ Ω₀ then y is in the closure of one of the non-empty half-spaces H whose union is A, and I(A, p) = I(H, p). By Lemma A.2, for each N ≥ k(k−1) there is a point z in H^(N), hence in A^(N), with the properties there stated.

If y ∉ Ω₀ then the set A ∩ Ω(y) is not empty; this follows from the convexity of A′ and Lemma 4.5. The set A′ ∩ Ω(y) is a convex subset of Ω(y), and I(A, p) = I(A ∩ Ω(y), p). For x ∈ Ω(y) we have

I(x, p) = I(x, p̄) − log Σ′ p_j ,

where Σ′ denotes the sum over the j with y_j ≠ 0, and p̄_i = p_i / Σ′ p_j if y_i ≠ 0, p̄_i = 0 otherwise. The argument of the preceding paragraph, with Ω replaced by the subspace Ω(y), leads to the conclusion of that paragraph.

Thus the conclusion of Lemma A.2 is true for any subset A of Ω whose complement is convex. With Lemma A.1 this implies the theorem.
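The theorem's setting can be illustrated numerically. In the sketch below (Python; the set A, the distribution p, and the tolerances are illustrative choices, and regularity is read here informally as the infimum of I(·, p) over the lattice points of A approaching its infimum over A), k = 3 and the complement of A is a cube, which is convex, so the theorem applies; each lattice is contained in its refinement when N doubles, so the lattice infima decrease and settle toward the infimum over A.

```python
import math

def info(x, p):
    # Kullback-Leibler information number I(x, p) = sum x_i log(x_i / p_i)
    return sum(xi * math.log(xi / pi) for xi, pi in zip(x, p) if xi > 0)

def lattice_inf(in_A, p, N):
    # Infimum of I(x, p) over points of the simplex whose coordinates
    # are multiples of 1/N and which lie in A.
    best = math.inf
    for i in range(N + 1):
        for j in range(N + 1 - i):
            x = (i / N, j / N, (N - i - j) / N)
            if in_A(x):
                best = min(best, info(x, p))
    return best

p = (0.4, 0.35, 0.25)
# A has convex complement: the cube {x : |x_i - p_i| <= 0.1 for all i}.
in_A = lambda x: max(abs(xi - pi) for xi, pi in zip(x, p)) >= 0.1
vals = [lattice_inf(in_A, p, N) for N in (20, 40, 80, 160, 320)]
```

Since p lies in the (convex) complement, every lattice infimum is strictly positive, and refining N along the doubling chain can only lower it.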
Define the subset Ω_δ of Ω by

(A.7)    Ω_δ = {x | x_i ≥ δ, i = 1, ..., k}.
Theorem A.2. Let

(A.8)    A_N = {x | f(x) ≥ c_N},

where the c_N are real numbers and f is a function defined on Ω whose derivatives f_i′(x) = ∂f(x)/∂x_i and f_ij″(x) = ∂²f(x)/∂x_i ∂x_j exist and are continuous in Ω₀. Let p ∈ Ω₀. Suppose that there exist positive numbers N₀ and δ such that for each N > N₀ there is a point y^(N) such that

(A.9)    I(y^(N), p) = I(A_N, p),    y^(N) ∈ Ω_δ ,

and

(A.10)    lim_{N→∞} N max_i |f_i′(y^(N)) − f_k′(y^(N))| = ∞.

Then the sequence {A_N} is regular relative to p.

Proof. The assumptions imply that for N > N₀

f(x) − f(y^(N)) = Σ_{i=1}^{k−1} a_i^(N) (x_i − y_i^(N)) + O(|x − y^(N)|²),

uniformly for x in Ω_{δ/2}, where

a_i^(N) = f_i′(y^(N)) − f_k′(y^(N)),    i = 1, ..., k−1.

Define the point z^(N) as follows. For i = 1, ..., k−1 let m_i^(N) denote the largest integer ≤ N y_i^(N), and let

N z_i^(N) = m_i^(N) − 1  if  a_i^(N) < 0,    N z_i^(N) = m_i^(N) + 2  if  a_i^(N) ≥ 0;

let

N z_k^(N) = N − N z_1^(N) − ... − N z_{k−1}^(N).

Then

|z_i^(N) − y_i^(N)| ≤ 2k/N,    i = 1, ..., k,

and z^(N) is in Ω_{δ/2} for N > N₁ ≥ N₀. Moreover,

a_i^(N) (z_i^(N) − y_i^(N)) ≥ N^{-1} |a_i^(N)|,    i = 1, ..., k−1.

Hence, since f(y^(N)) ≥ c_N,

(A.11)    f(z^(N)) − c_N ≥ Σ_{i=1}^{k−1} a_i^(N) (z_i^(N) − y_i^(N)) + O(N^{-2}) ≥ N^{-1} Σ_{i=1}^{k−1} |a_i^(N)| + O(N^{-2}).

Condition (A.10) implies that for N large enough we have f(z^(N)) > c_N, that is, z^(N) ∈ A_N^(N). Thus the conditions of Lemma A.1 are satisfied. The proof is complete.
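The rounding step of this proof can be sketched numerically. In the snippet below (Python; the statistic f(x) = Σ x_i², the point y, and N are illustrative choices, not from the text), each of the first k−1 coordinates of y is moved to a multiple of 1/N in the direction of the sign of a_i = f_i′(y) − f_k′(y), by between 1/N and 2/N, so that the linear gain Σ a_i (z_i − y_i) is at least N^{-1} Σ |a_i| and dominates the quadratic remainder, forcing f(z) > f(y).

```python
import math

def round_toward_gradient(y, a, N):
    # Construction from the proof (sketch): for i < k let m_i be the
    # largest integer <= N y_i and set N z_i = m_i - 1 if a_i < 0,
    # N z_i = m_i + 2 if a_i >= 0; then a_i (z_i - y_i) >= |a_i| / N
    # and |z_i - y_i| <= 2/N. The last coordinate absorbs the rest.
    z = []
    for yi, ai in zip(y[:-1], a):
        m = math.floor(N * yi)
        z.append((m - 1) / N if ai < 0 else (m + 2) / N)
    z.append(1.0 - sum(z))
    return z

# Illustrative smooth statistic: f(x) = sum x_i^2, with f_i'(x) = 2 x_i.
f = lambda x: sum(t * t for t in x)
y = [0.3, 0.25, 0.45]
N = 1000
a = [2 * y[i] - 2 * y[-1] for i in range(len(y) - 1)]  # a_i = f_i' - f_k'
z = round_toward_gradient(y, a, N)
```

With these choices every a_i is negative, so z_1 and z_2 are rounded down and z_3 up; the first-order gain N^{-1} Σ |a_i| = 7·10^{-4} outweighs the quadratic term, and f(z) indeed exceeds f(y).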
REFERENCES