Ghosh, Jayanta Kumar and Sen, Pranab Kumar (1984). "On the Asymptotic Performance of the Log Likelihood Ratio Statistic for the Mixture Model and Related Results."

ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO
STATISTIC FOR THE MIXTURE MODEL AND RELATED RESULTS
by
Jayanta Kumar Ghosh
Indian Statistical Institute, Calcutta
and
Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1467
September 1984
ON THE ASYMPTOTIC PERFORMANCE OF THE LOG LIKELIHOOD RATIO STATISTIC
FOR THE MIXTURE MODEL AND RELATED RESULTS 1)
JAYANTA KUMAR GHOSH 2)
Indian Statistical Institute, Calcutta
PRANAB KUMAR SEN 3)
University of North Carolina, Chapel Hill
Summary.
The classical distribution theory of the log likelihood ratio test statistic does not hold for testing homogeneity (i.e., no mixture) against mixture alternatives. Asymptotic theory for this problem is developed. For some special cases, asymptotically locally minimax tests are also found. It is pointed out that the main problem is lack of identifiability of the usual parameterization even when the mixtures are identifiable; if one chooses an identifiable parameterisation, then there is a problem of differentiability of the density.
AMS Subject Classification Numbers: 62E20, 62F05.
Key Words & Phrases: Asymptotic distribution; asymptotic local minimaxity; identifiability; likelihood ratio test statistic; mixture model.
1) This is one of the three examples presented by the first author at the Neyman-Kiefer Conference.
2) Work done partly at the University of California, Berkeley, supported by the ONR Grant N00014-80-C-0163.
3) Work partially supported by the National Heart, Lung and Blood Institute, Contract NIH-NHLBI-71-2243-L from the National Institutes of Health.
1. Introduction.
Consider a family of probability densities $g(x,\theta)$ and mixtures $(1-\pi)g(x,\theta^{(1)}) + \pi g(x,\theta^{(2)})$, $0 \le \pi \le 1$. We assume the mixtures are identifiable in the sense that if $\pi \ne 0$, $\pi \ne 1$ and $\theta^{(1)} \ne \theta^{(2)}$, then the equality
$$(1-\pi)g(x,\theta^{(1)}) + \pi g(x,\theta^{(2)}) = (1-\pi')g(x,\theta^{(3)}) + \pi' g(x,\theta^{(4)}) \tag{1.1}$$
implies $\pi = \pi'$, $\theta^{(1)} = \theta^{(3)}$, $\theta^{(2)} = \theta^{(4)}$ or $\pi = 1-\pi'$, $\theta^{(1)} = \theta^{(4)}$, $\theta^{(2)} = \theta^{(3)}$. Note that because of this, $g(x,\theta) = g(x,\theta')$ implies $\theta = \theta'$. (Both here and in (1.1) the relations between two densities hold almost everywhere with respect to the dominating $\sigma$-finite measure $\mu$.)
Typically in cluster analysis one models data exhibiting two clusters by postulating a mixture of two densities. In this context it is important to test whether the observed clusters are real or merely a matter of appearance caused by random sampling from a homogeneous population. Formally, denoting the true density by $f$, one wishes to test
$$H_0: f = g(x,\theta), \quad \theta \in \Theta \tag{1.2}$$
against the mixture alternatives $H_1$ considered above with $\pi \ne 0$, $\pi \ne 1$ and $\theta^{(1)} \ne \theta^{(2)}$. The identifiability assumption (1.1) ensures that $H_0$ and $H_1$ have no overlap.
But nonetheless the classical asymptotic theory for likelihood ratio tests is not applicable. Of course as pointed out in the literature, the null hypothesis is in some sense on the boundary of the parameter space of this problem, rather than its interior as assumed in classical theory. However, Chernoff (1954) has shown how to handle this kind of departure from standard assumptions; see also Feder (1968). The real problem is that though the mixtures are identifiable, the parameters $\pi, \theta^{(1)}, \theta^{(2)}$ are not so. If the alternative hypothesis $H_1$ is true and the true density is written as $f(x,\pi,\theta^{(1)},\theta^{(2)})$ then there is exactly another set of parameter values, namely $(1-\pi,\theta^{(2)},\theta^{(1)})$, which will give exactly the same density; it will be seen in Section 5 that this kind of nonuniqueness is not hard to take care of. However, if $H_0$ is true and the true density $f$ is $g(x,\theta^{(0)})$, $\theta^{(0)}$ fixed, then the same density is represented by three curves: $\pi = 0$ and $\theta^{(1)} = \theta^{(0)}$, or $\theta^{(1)} = \theta^{(2)} = \theta^{(0)}$, or $\pi = 1$ and $\theta^{(2)} = \theta^{(0)}$. Another way of expressing this fact is to observe that we can pass to the one dimensional space of $H_0$ by specifying only one co-ordinate at a time -- and not two -- in the three dimensional space of $H_1$. Of course one can try a parametrisation which is identifiable, i.e., one which sets up Euclidean parameters in one to one correspondence with the mixing distribution. Then the problem becomes one of lack of differentiability of the density with respect to these parameters, at points in the space of $H_0$. For example, we may try $\lambda_1 = (1-\pi)\theta^{(0)} + \pi\theta^{(1)}$, $\lambda_2$ = the $\theta$ corresponding to $\mathrm{Min}(\pi,1-\pi)$ (with a suitable convention for $\pi = \tfrac12$), and $\lambda_3 = \{\mathrm{Min}(\pi,1-\pi)\}\{2\lambda_2 - \theta^{(0)} - \theta^{(1)}\}$.
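The lack of identifiability under the null hypothesis can be checked numerically. The following is a minimal sketch that assumes, purely for illustration, that g(x, theta) is the N(theta, 1) density; the function names and the numerical values are hypothetical and do not come from the paper. It evaluates the mixture at one point on each of the three curves, and also at a label-swapped pair under the alternative.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1); an illustrative choice of g(x, theta)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def mixture_pdf(x, pi, theta1, theta2):
    """Two-component mixture (1 - pi) g(x, theta1) + pi g(x, theta2)."""
    return (1.0 - pi) * normal_pdf(x, theta1) + pi * normal_pdf(x, theta2)

theta0 = 0.4                       # hypothetical true value under H_0
xs = [-2.0, -0.5, 0.0, 1.3, 3.0]   # arbitrary evaluation points

# One point from each of the three curves that represent g(x, theta0).
null_points = [
    (0.0, theta0, 5.0),     # pi = 0, theta1 = theta0, theta2 arbitrary
    (0.3, theta0, theta0),  # theta1 = theta2 = theta0, pi arbitrary
    (1.0, -2.0, theta0),    # pi = 1, theta2 = theta0, theta1 arbitrary
]
max_gap_null = max(
    abs(mixture_pdf(x, *p) - normal_pdf(x, theta0))
    for p in null_points
    for x in xs
)

# Under H_1 the label swap (pi, t1, t2) -> (1 - pi, t2, t1) gives the same density.
max_gap_swap = max(
    abs(mixture_pdf(x, 0.3, -1.0, 2.0) - mixture_pdf(x, 0.7, 2.0, -1.0))
    for x in xs
)
```

Both gaps vanish up to rounding error, so no amount of data can separate the parameter points within either group.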
We shall return to this problem after considering in detail a similar but simpler one which may be called the case of strongly identifiable mixtures. Here one considers two families of probability densities $g_1(x,\theta_1)$ and $g_2(x,\theta_2)$, $\theta_1 \in \Theta_1 \subset R^{p_1}$ and $\theta_2 \in \Theta_2 \subset R^{p_2}$. It is notationally convenient to replace $\pi$ by $\theta_0$ in the mixture $f(x,\theta_0,\theta_1,\theta_2) = (1-\theta_0)g_1(x,\theta_1) + \theta_0 g_2(x,\theta_2)$. If $\theta_0^{(1)} \ne 0$ or $1$, we assume that $f(x,\theta_0^{(1)},\theta_1^{(1)},\theta_2^{(1)}) = f(x,\theta_0^{(2)},\theta_1^{(2)},\theta_2^{(2)})$ implies $\theta^{(1)} = \theta^{(2)}$ where $\theta = (\theta_0,\theta_1,\theta_2)$. Such mixtures may arise as models of partial slippage, contamination or cluster analysis with some information about the direction of additional clustering. We wish to consider the null hypothesis
$$H_0: \theta_0 = 0 \tag{1.3}$$
against the strongly identifiable mixture alternatives. Note that if $H_0$ is true the parameters are still not identifiable. Here is an example of this sort which will be worked out in detail in Section 4.
Example 1.
Here $N(\theta,1)$ stands for a normal density with mean $\theta$ and variance one.
To motivate our main result in the strongly identifiable case, we must make a few general remarks about the asymptotic behaviour of the mle (maximum likelihood estimate) when the parameter is not identifiable. Suppose, in fact, all of Wald's (1949) conditions for the consistency of mle hold except for identifiability of $\theta$. Suppose also, to fix ideas, that the parameter space is the three dimensional Euclidean space and all points non-identifiable from the true value $\theta^0$ lie on a curve $\Gamma$. The best that one can hope for is that the maximum of the likelihood will eventually be attained in a neighbourhood of this curve. Actually, Redner (1981) has observed that essentially Wald's proof under Wald's conditions (sans identifiability) guarantees this; Redner calls it convergence of the mle in the topology of the quotient space obtained by collapsing $\Gamma$ into a single point. This general fact has the following implication in the strongly identifiable case. When the true density is $g_1(x,\theta_1^0)$, the first two components of the mle $\hat\theta$, namely $\hat\theta_0$ and $\hat\theta_1$, will converge almost surely to their true values $\theta_0 = 0$ and $\theta_1^0$. Of course there is no true value of $\theta_2$ to which $\hat\theta_2$ can converge. (In fact under the assumptions made for Theorem 2.1 in Section 3, it can be shown that $\hat\theta_2$ cannot converge almost surely to a constant.)
The preceding facts will not be used explicitly in the sequel but they motivate what is done here. Among other things one sees that when $H_0$ is true, one cannot confine attention to a neighbourhood of a single point in order to maximise the likelihood. This means that the usual quadratic approximation to the likelihood is available only with respect to the first two components $\theta_0$ and $\theta_1$ for which the mle is consistent. However, under certain assumptions, we can still utilise these partial quadratic approximations to show that the likelihood ratio statistic is asymptotically distributed as a certain functional $W$ of $\sup\{T(\eta_2)\}$, where $T(\cdot)$ is a Gaussian process with zero mean and covariance kernel depending on the true value $\theta_1^0$ under $H_0$. In Remark 2.1 we propose a family of other tests with simpler limiting distribution. Note that our treatment does not follow from Chernoff (1954) or Feder (1968), because they were able to exploit the existence of a consistent solution of the likelihood equation in the identifiable case. To follow their approach, one would have to develop first results on solutions of the likelihood equation in the non-identifiable case. This can be done but use of techniques similar to those of Redner (1981) seemed aesthetically more satisfactory.
The fact that the likelihood is not locally approximable by a quadratic has another repercussion. The proof of asymptotic local minimaxity of the likelihood ratio test via approximation by Bayes tests also breaks down. In view of this it seems natural to introduce a prior $G$ on $\Theta_2$ and then work with the integrated likelihood ratio statistic
$$\int \prod_{i=1}^{n} f(x_i,\theta_0,\theta_1,\theta_2)\,G(d\theta_2) \Big/ \sup_{\theta_1} \prod_{i=1}^{n} g_1(x_i,\theta_1)$$
or some other functional on the integrated likelihood. One should probably choose $G$ so that the associated test is asymptotically locally minimax. It is plausible that such a $G$ and an associated test always exist under reasonable conditions. As a first step, it is proved in Section 4 that for Example 1 the prior degenerate at $b$ and the corresponding likelihood ratio test assuming $\theta_2 = b$ works in this sense. A similar result is proved for general exponentials. On the other hand it is not hard to show (though we do not prove it here) that the likelihood ratio test is not asymptotically locally minimax for these examples -- these are thus new instances of the failure of the principle of maximising the likelihood.
For the case of (not strongly) identified mixtures, a result analogous to Theorem 2.1 is obtained for the likelihood ratio test of the hypothesis in (1.2). To do this we assume a separation condition $\|\theta^{(1)} - \theta^{(2)}\| \ge \epsilon$ where $\epsilon > 0$ is a fixed quantity. In a subsequent communication we shall try to remove this condition.
A review of the literature on this topic is available in Bock (1981) and Gupta and Huang (1981) and the final chapter of Everitt and Hand (1981). Even though there is no overlap with our results, a paper of Moran (1973) deserves to be mentioned. Moran derives the asymptotic distribution of the likelihood ratio test of homogeneity against special mixture alternatives in two cases, namely, Poisson and Gamma. For his alternative Moran considers mixing distributions $G\{(\theta - \lambda_0)/\sigma^{1/2}\}$ where $G$ is a fixed known distribution of which one needs only that the third moment about mean is zero. In our set-up of two point mixtures, this would correspond to assuming $\theta_0 = \pi = \tfrac12$, so that $H_0$ is equivalent to the scale parameter $|\theta^{(1)} - \theta^{(0)}| = 0$.
We conclude by making a few remarks about mixtures of $N(\mu,\sigma^2)$. The theorem in Section 5 applies directly only to the case of mixtures of means with known $\sigma$ and $\mu$ in a compact set. However, we feel a similar result would hold if $\sigma$ is unknown but $(\mu,\sigma)$ lies in a compact set, $\sigma \ne 0$ and only mixtures of mean are allowed. If one also allows scale mixtures, substantial changes are needed in the treatment since the derivative of the likelihood with respect to $\pi$ ceases to be square integrable. Note that a similar phenomenon occurs in Moran's case if $G$ has non-zero third moment about mean. It should be possible to handle such cases by suitable truncation of the derivatives. Finally, it may be noted that though we have confined ourselves to the case of mixtures our main conclusions hold for other cases of non-identifiable parameters.
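The remark on scale mixtures can be made concrete in a simple case. The sketch below assumes, for illustration only, a null density g1 = N(0, 1) and a contaminating density g2 = N(0, sigma^2); the score with respect to the mixing proportion at zero is then g2/g1 - 1, and its second moment under g1 is finite exactly when sigma^2 < 2. The function names are hypothetical, and the closed form is a direct Gaussian integral rather than a formula taken from the paper.

```python
import math

def score_second_moment(sigma):
    """E[(g2/g1)^2] under g1 = N(0,1) with g2 = N(0, sigma^2).

    Equals 1 / (sigma * sqrt(2 - sigma^2)) when sigma^2 < 2 and is infinite otherwise.
    """
    if sigma * sigma >= 2.0:
        return math.inf
    return 1.0 / (sigma * math.sqrt(2.0 - sigma * sigma))

def numeric_second_moment(sigma, half_width=30.0, step=0.001):
    """Trapezoidal approximation of the integral of g2(x)^2 / g1(x) (sigma^2 < 2 only)."""
    def integrand(x):
        # g2^2 / g1 collapsed into a single exponential
        return math.exp(-x * x / sigma ** 2 + 0.5 * x * x) / (
            sigma ** 2 * math.sqrt(2.0 * math.pi)
        )
    n = int(round(2.0 * half_width / step))
    total = 0.5 * (integrand(-half_width) + integrand(half_width))
    for k in range(1, n):
        total += integrand(-half_width + k * step)
    return total * step
```

For example, `score_second_moment(1.2)` agrees with `numeric_second_moment(1.2)` to high accuracy, while any sigma with sigma^2 >= 2 gives an infinite second moment, which is the failure of square integrability referred to above.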
2. The Main Result for Strongly Identifiable Mixtures.
Let $\theta = (\theta_0,\theta_1,\theta_2)$ and $f(x,\theta)$ be as in the introduction and $L_n(\theta)$ be the log likelihood based on $n$ i.i.d. observations. Suppose $H_0$ of (1.3) is true and the true density is $g_1(x,\theta_1^0)$. All expectations and probabilities will be computed in this and the next section under this assumption but this will not be displayed in the notation.
We now sketch an argument leading to Theorem 2.1, introducing notations as we go along. The details for handling several remainder terms as well as the necessary assumptions (A1 through A5) are collected in the next section. Here we only remark briefly on the nature of the assumptions. The assumptions are similar to those in the classical case but have to be strengthened suitably to ensure uniformity in $\theta_2$ at various places. The latter as well as tightness of the Gaussian process $T(\cdot)$ introduced below makes it convenient to work with $\Theta_2$ as a closed bounded interval. However all we really need is compactness of $\Theta_2$ or its closure. The restriction to dimension one for $\Theta_2$ is made so that the tightness Assumption A5 (vide Section 3) is easy to write down; it may be extended by making use of analogous conditions in Bickel and Wichura (1971). As usual we take $\Theta_1$ as an open rectangle in $R^p$.
Among other things the assumptions of the next section guarantee that all quantities introduced below are well-defined.
We now begin by rescaling the parameters through
$$\theta_0 = \theta_0^0 + \eta_0/\sqrt{n}, \qquad \theta_1 = \theta_1^0 + \eta_1/\sqrt{n}, \qquad \theta_2 = \eta_2,$$
where $\theta_0^0 = 0$. Let $\eta = (\eta_0,\eta_1,\eta_2)$. Let $L_n(\theta)$ when regarded as a function of $\eta$ be denoted as $V_n(\eta)$. Note that $V_n(0,0,\eta_2)$ is free of $\eta_2$ and hence may be written simply as $V_n(0)$.
Let $U_{n0}(\eta_2)$ be the normalised derivative with respect to $\theta_0$ and let $U_{n1}$ be the $1 \times p$ (row) vector of normalised derivatives with respect to the components of $\theta_1$. The presence or absence of $\eta_2$ in $U_{n0}(\eta_2)$, $U_{n1}$ indicates dependence on $\eta_2$ or lack of it. The same convention is followed below. Let
$$I(\eta_2) = \begin{pmatrix} I_{00}(\eta_2) & I_{01}(\eta_2) \\ I_{01}(\eta_2)^T & I_{11} \end{pmatrix}$$
be the $(p+1)\times(p+1)$ dispersion matrix of $U_{n0}(\eta_2)$ and $U_{n1}$. By Assumption A1, the $I_{ij}$'s are related to the second order derivatives of $\log f(X_1,\theta)$ in the usual way.
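Expanding $V_n(\eta)$ with respect to the first two co-ordinates, by A1,
$$V_n(\eta) = V_n(0) + \eta_0 U_{n0}(\eta_2) + \eta_1 U_{n1}^T - \tfrac12\,\eta_{01} I(\eta_2)\,\eta_{01}^T + R_{n1}, \qquad \eta_{01} = (\eta_0,\eta_1), \tag{2.1}$$
where the remainder term $R_{n1}$ is $o_p(1)$ uniformly in $\eta_2$ on bounded sets of $\eta_0$ and $\eta_1$. We prove in the next section that uniformly in $\eta_2$,
$$L_n(\eta_2) \stackrel{def}{=} \sup_{0 \le \theta_0 \le 1,\ \theta_1 \in \Theta_1} L_n(\theta) = V_n(0) + \sup_{\eta_0 \ge 0,\ \eta_1 \in R^p} A_n(\eta) + o_p(1), \tag{2.2}$$
where $A_n(\eta) = \eta_0 U_{n0}(\eta_2) + \eta_1 U_{n1}^T - \tfrac12\,\eta_{01} I(\eta_2)\,\eta_{01}^T$ denotes the quadratic part of (2.1). (The proof is similar to the classical case but one has to ensure uniformity in $\eta_2$.) By the well-known Kuhn-Tucker-Lagrange theorem [viz., McCormick (1967)] the supremum of $A_n(\eta)$ is
$$\tfrac12\,U_{n1} I_{11}^{-1} U_{n1}^T + \tfrac12\,T_n^2(\eta_2) \tag{2.3}$$
if
$$T_n(\eta_2) \stackrel{def}{=} \frac{U_{n0}(\eta_2) - I_{01}(\eta_2) I_{11}^{-1} U_{n1}^T}{\{I_{00}(\eta_2) - I_{01}(\eta_2) I_{11}^{-1} I_{01}(\eta_2)^T\}^{1/2}} \ge 0, \tag{2.4}$$
and the supremum is
$$\tfrac12\,U_{n1} I_{11}^{-1} U_{n1}^T \quad \text{if } T_n(\eta_2) < 0. \tag{2.5}$$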
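The Kuhn-Tucker-Lagrange step can be illustrated for a scalar $\theta_1$ ($p = 1$). The sketch below is not the paper's computation: it uses the standard closed form for maximising a concave quadratic subject to a non-negativity constraint on the first co-ordinate, and compares it with a brute-force grid search. All function names and numerical inputs are hypothetical.

```python
def quadratic(e0, e1, u0, u1, i00, i01, i11):
    """Quadratic e U^T - (1/2) e I e^T with e = (e0, e1), U = (u0, u1)."""
    return e0 * u0 + e1 * u1 - 0.5 * (i00 * e0 * e0 + 2.0 * i01 * e0 * e1 + i11 * e1 * e1)

def constrained_sup(u0, u1, i00, i01, i11):
    """Supremum of the quadratic over e0 >= 0 and e1 real (standard closed form)."""
    residual = u0 - i01 * u1 / i11          # score for e0 adjusted for e1
    residual_var = i00 - i01 * i01 / i11    # its variance
    base = 0.5 * u1 * u1 / i11              # value attained on the boundary e0 = 0
    if residual >= 0.0:
        # Unconstrained maximiser already satisfies e0 >= 0.
        return base + 0.5 * residual * residual / residual_var
    return base

def grid_sup(u0, u1, i00, i01, i11, width=3.0, step=0.01):
    """Brute-force maximum over a grid with 0 <= e0 <= width and |e1| <= width."""
    n = int(round(width / step))
    best = float("-inf")
    for a in range(0, n + 1):
        e0 = a * step
        for b in range(-n, n + 1):
            value = quadratic(e0, b * step, u0, u1, i00, i01, i11)
            if value > best:
                best = value
    return best
```

When the adjusted score is negative, the maximum sits on the boundary $\eta_0 = 0$ and only the $\theta_1$ term survives, which is what produces the truncation at zero in the limiting distribution.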
Similarly
$$L_n(H_0) \stackrel{def}{=} \sup_{\theta_0 = 0,\ \theta_1 \in \Theta_1} L_n(\theta) = V_n(0) + \tfrac12\,U_{n1} I_{11}^{-1} U_{n1}^T + o_p(1). \tag{2.6}$$
Hence,
$$\Lambda_n(\eta_2) \stackrel{def}{=} 2\{L_n(\eta_2) - L_n(H_0)\} = T_n^2(\eta_2) + o_p(1) \quad \text{if } T_n(\eta_2) \ge 0, \tag{2.7}$$
$$\Lambda_n(\eta_2) = o_p(1) \quad \text{if } T_n(\eta_2) < 0. \tag{2.8}$$
So the likelihood ratio statistic is by definition
$$\Lambda_n = 2\{\sup_{\eta_2} L_n(\eta_2) - L_n(H_0)\} = \sup_{\eta_2} \Lambda_n(\eta_2). \tag{2.9}$$
Assume $\Theta_2 = [b,c]$. By A4 and A5 of Section 3, the stochastic process $T_n(\cdot)$ taking values in $C[b,c]$ converges weakly to a Gaussian process $T(\cdot)$ on $C[b,c]$ whose mean is zero and the covariance kernel is the same as that of $T_1(\cdot)$ (under $\theta_1^0$) and easy to write down. The covariance of $T_1(\eta_{21})$ and $T_1(\eta_{22})$ is given below assuming $\theta_1$ scalar:
$$\frac{J(\eta_{21},\eta_{22}) - I_{01}(\eta_{21}) I_{01}(\eta_{22})/I_{11}}{[\{I_{00}(\eta_{21}) - I_{01}^2(\eta_{21})/I_{11}\}\{I_{00}(\eta_{22}) - I_{01}^2(\eta_{22})/I_{11}\}]^{1/2}}$$
where $J(\eta_{21},\eta_{22})$ is the covariance of $U_{10}(\eta_{21})$ and $U_{10}(\eta_{22})$ under $\theta_1^0$. Note that $\mathrm{Var}(T_1(\eta_2)) = 1\ \forall\,\eta_2$. Since $\Lambda_n = \phi \circ T_n(\cdot) + o_p(1)$ where $\phi$ is a continuous functional, we have
Theorem 2.1. Under the assumptions A1 through A5 of Section 3, $\Lambda_n$ converges in distribution to $\phi \circ T(\cdot)$.
Remark 2.1 (a) The limiting distribution of $\Lambda_n$ simplifies a little when $X_i$ assumes only $k$ distinct values. Identifiability of the mixtures would require that $k - 1 \ge \dim \Theta_1 + \dim \Theta_2 + 1$.
(b) The limiting distributions of $T_n(\theta_2)$ and $\Lambda_n(\theta_2)$ are given explicitly and applied to Example 1 in Section 4.
(c) To get alternative test statistics whose limiting distributions are easier to compute, one may approximate $\Theta_2$ by a finite set $\theta_2^1,\ldots,\theta_2^m$ and then consider the statistics $T_n(\theta_2^i)$, $i = 1,\ldots,m$, which are asymptotically multivariate normal with zero mean under $H_0$; the dispersion matrix can be consistently estimated since the mle is consistent for $\theta_1$ when $H_0$ is true. One can use as a test statistic any suitable function of the $T_n(\theta_2^i)$'s; for example, choosing a suitably positive definite quadratic form in the $T_n$'s and then estimating the coefficients, one would get a limiting $\chi^2$-distribution.
3. Assumptions and Details of Proof.
As in Section 2 all expectations are computed under a fixed $\theta_1^0$. The assumptions are marked A1 through A5. Instead of collecting them at one place we shall present them as the need arises, in course of supplying some of the details left out in Section 2.
Let $\theta_{01} = (\theta_0,\theta_1)$, $\theta_{01}^0 = (0,\theta_1^0)$. A similar convention is followed for $\eta_{01}$. Let $D_0 = \partial/\partial\theta_0$, $D_j = \partial/\partial\theta_{1j}$, $j = 1,\ldots,p$, and unless otherwise stated the derivatives are evaluated at $(\theta_{01}^0,\theta_2)$.
A1. (i) $\Theta_1$ is an open set of $R^p$, $\Theta_2$ a closed bounded interval $[b,c]$ of $R^1$.
(ii) $f(x,\theta)$ is continuous in $\theta$ and twice continuously differentiable with respect to $\theta_{01}$.
(iii) $E(D_j \log f) = 0$, $E(D_j D_{j'} \log f) = -E(D_j \log f \times D_{j'} \log f)$.
(iv) $E\{\sup_{\|\theta_{01}-\theta_{01}^0\| < \delta,\ \theta_2 \in \Theta_2} |D_j D_{j'} \log f(X,\theta_{01},\theta_2) - D_j D_{j'} \log f(X,\theta_{01}^0,\theta_2)|\} \to 0$ as $\delta \to 0$, $j, j' = 0, 1, \ldots, p$.
To handle the remainder in (2.2) we proceed in three stages. Assumption A2 will allow us to restrict attention to a compact subset of $\Theta_1$ while calculating the supremum of $L_n(\theta)$. Then with the help of A3 we work with arbitrary but fixed neighbourhoods of $\theta_{01}^0$, which is replaced at the third and final stage by neighbourhoods that shrink like $n^{-1/2}$, i.e., bounded neighbourhoods in the $\eta_{01}$-plane. The fact that $R_{n1}$ in (2.1) is $o_p(1)$ uniformly on bounded $\eta_{01}$-sets now completes the proof.
A2. There exists a compact neighbourhood $N$ of $\theta_1^0$ such that [...]. Moreover, $W(X_1,\theta_2)$ is continuous on $\Theta_2$, $|W(X_1,\theta_2)| \le H(X_1)\ \forall\,\theta_2$ and $E(H(X_1)) < \infty$.
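By the uniform strong law of large numbers (USLLN) applied to [...] and the fact that $\sup_{\theta_{01} \in [0,1]\times N^c} L_n(\theta) - V_n(0)$ [...], we get
$$\sup_{\theta_{01} \in [0,1]\times\Theta_1} L_n(\theta) - \sup_{\theta_{01} \in [0,1]\times N} L_n(\theta) = o_p(1) \tag{3.1}$$
uniformly in $\theta_2$.
A3. For each $\theta_{01} \ne \theta_{01}^0$ there exists an open ball with centre $\theta_{01}$ and radius $\delta$, $U = U(\theta_{01},\delta)$, such that if $U_0$ is its intersection with $[0,1]\times\Theta_1$ then [...].
Let $U_\delta = \{\theta_{01} : \|\theta_1 - \theta_1^0\| < \delta,\ 0 \le \theta_0 < \delta\}$. By A3 and continuity of [...], consider an open cover of $U_\delta^c \cap [0,1]\times N$ by sets $U(\theta_{01},\delta_1)$ and choose a finite subcover $U_1,\ldots,U_m$. Now apply the USLLN to $n^{-1}\sum_i \psi(X_i,U_j,\theta_2)$, $j = 1,\ldots,m$, $\theta_2 \in \Theta_2$, to conclude that
$$\sup_{\theta_{01} \in [0,1]\times N} L_n(\theta) - \sup_{\theta_{01} \in U_\delta} L_n(\theta) = o_p(1)$$
uniformly in $\theta_2$. This completes the second stage of the proof.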
At the third and final stage, note that, by Taylor's theorem and A1, $L_n(\theta)$ admits an expansion of the form (2.1) in which $-I_{ij}(\eta_2)$ is replaced by $J_{ij}(\eta)$, where $|J_{ij}(\eta) + I_{ij}(\eta_2)| = \delta \cdot O_p(1)$ uniformly in $U_\delta \times \Theta_2$. We now use
A4. The $(p+1)\times(p+1)$ matrix $I(\theta_2)$ is continuous in $\theta_2$ and its minimum eigen value is greater than $\epsilon_0 > 0\ \forall\,\theta_2$; and
A5. $E|D_0 \log f(X_1,\theta_{01}^0,\theta_2) - D_0 \log f(X_1,\theta_{01}^0,\theta_2')|^{\alpha} \le K|\theta_2 - \theta_2'|^{1+\gamma}$ for some $\alpha, \gamma > 0$.
A5 ensures tightness of $U_{n0}(\cdot)$. (To see this one has to use the theorem of Dharmadhikari et al. (1968).) Hence, $\sup_{\eta_2} U_{n0}(\eta_2)$ is $O_p(1)$. Also by A1, $U_{n1}$ is also $O_p(1)$. Hence (by A4) for given $\epsilon$, we can find $\epsilon' < \epsilon$, $K$ and $n_0$ such that for $n > n_0$,
$$P\{\sup_{\eta_2} U_{n0}(\eta_2) + \|U_{n1}\| < K,\ \text{the smallest eigen value of the } (p+1)\times(p+1) \text{ matrix } [-J_{ij}(\eta)] > \epsilon'\ \forall\,\eta \in U_\delta \times \Theta_2\} > 1 - \epsilon.$$
Then by first making a suitable orthogonal transformation, one can find $M$ such that for $n > n_0$,
$$P\{V_n(\eta) < V_n(0) \text{ and } A_n(\eta) < A_n(0) \text{ if } \eta \in U_\delta \times \Theta_2 \text{ and } \|\eta_{01}\| > M\} > 1 - \epsilon,$$
where $M$ depends only on $K$ and $\epsilon'$. Thus with probability $> 1 - \epsilon$ the supremum of $L_n(\theta)$, i.e., $V_n(\eta)$ (over $U_\delta$) and that of $A_n(\eta)$ (over $[0,\infty)\times R^p$) is attained in $\|\eta_{01}\| \le M$. Since on this set $R_{n1}$ of (2.1) is $o_p(1)$ by A1, the proof of (2.2) is complete.
The proof of the similar result (2.6) follows along similar lines from A1 through A4 which are of course much stronger than what we need for (2.6).
Remark 3.1 (a) The tightness assumption A5 holds if
$$|D_0 \log f(X,\theta_{01}^0,\theta_2) - D_0 \log f(X,\theta_{01}^0,\theta_2')| \le \psi(X)\,|\theta_2 - \theta_2'| \tag{3.2}$$
and $E(\psi(X))^{\delta} < \infty$ for some $\delta > 1$.
(b) The limiting distribution of $T_n(\cdot)$ depends on the true value of $\theta_1$. It is weakly continuous in $\theta_1$ provided (i) the covariance kernel of $T(\cdot)$ is continuous in $\theta_1$; and (ii) the Lipschitz condition (3.2) is suitably strengthened to be uniform over $\theta_1$-neighbourhoods. (i) guarantees convergence of finite dimensional distributions of $T(\cdot)$ under $\theta_1'$ as $\theta_1' \to \theta_1$ and (ii) guarantees tightness. Under these conditions the limiting distribution of $\Lambda$ is also weakly continuous in $\theta_1$. Suppose this is so and let $t(\theta_1,\alpha)$ be such that $\lim P_{\theta_1}\{\Lambda \ge t(\theta_1,\alpha)\} = \alpha$. If for each $\theta_1$, $t$ exists, is unique and a point of continuity of the limiting distribution of $\Lambda$, then $t$ is continuous in $\theta_1$. By Redner's (1981) result, $\hat\theta_1$ is consistent for $\theta_1$ under $H_0$. Hence, as pointed out to us by Peter Bickel, $\lim P_{\theta_1}\{\Lambda \ge t(\hat\theta_1,\alpha)\} = \alpha$. Thus the test which rejects $H_0$ if $\Lambda \ge t(\hat\theta_1,\alpha)$ would be asymptotically similar provided the conditions assumed here hold. They are easy to check under A1 to A5 if $\Theta_2$ is a finite set.
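For a finite $\Theta_2$ the critical value can be approximated by Monte Carlo. The sketch below is only an illustration under a simplifying assumption that is not in the paper: the model is $(1-\theta_0)N(0,1) + \theta_0 N(\theta_2,1)$ with the null component fully known, so no nuisance parameter $\theta_1$ is estimated and the standardised score reduces to a normalised sum of likelihood ratios. The function names, grid, sample size and seed are all hypothetical, and the grid must not contain zero.

```python
import math
import random

def t_n(xs, theta2):
    """Standardised score in theta0 at theta0 = 0 for the assumed model."""
    n = len(xs)
    # g2/g1 - 1 = exp(theta2 * x - theta2^2 / 2) - 1 for N(theta2, 1) against N(0, 1)
    u = sum(math.exp(theta2 * x - 0.5 * theta2 ** 2) - 1.0 for x in xs) / math.sqrt(n)
    i00 = math.exp(theta2 ** 2) - 1.0   # variance of the score under N(0, 1)
    return u / math.sqrt(i00)

def kernel(a, b):
    """Covariance of T(a) and T(b) under the null in the assumed model."""
    return (math.exp(a * b) - 1.0) / math.sqrt(
        (math.exp(a * a) - 1.0) * (math.exp(b * b) - 1.0)
    )

def lr_statistic(xs, grid):
    """Approximate statistic: square of the positive part of the largest T_n on the grid."""
    largest = max(t_n(xs, theta2) for theta2 in grid)
    return max(largest, 0.0) ** 2

def mc_critical_value(n, grid, alpha, reps, seed=0):
    """Empirical (1 - alpha) quantile of the statistic under the null."""
    rng = random.Random(seed)
    stats = sorted(
        lr_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], grid)
        for _ in range(reps)
    )
    index = min(reps - 1, int(math.ceil((1.0 - alpha) * reps)) - 1)
    return stats[index]
```

A call such as `mc_critical_value(50, [0.5, 1.0, 1.5], 0.05, 200)` gives a rough critical value for this toy grid; with an unknown $\theta_1$ one would instead plug in the estimate as described in the remark.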
4. Asymptotically Locally Minimax Tests in some Examples.
We shall calculate the asymptotic properties of tests based on $\Lambda_n(\theta_2)$, where $\Lambda_n(\theta_2)$ is defined in (2.2) and (2.7), and show that it is asymptotically locally minimax for problems like Example 1.
Fix $\theta_1^0$ as in Section 2 and a sequence of alternatives $K_n$ corresponding to a fixed $\eta = (\eta_0,\eta_1,\eta_2)$. We fix also a value $b$ of $\theta_2$ and consider the limiting distribution of $T_n(b)$ and $\Lambda_n(b)$ under $\theta_1^0$ and $K_n$, where $T_n(b)$ is defined in (2.4). Let $Z_n^* = V_n(\eta) - V_n(0)$. Then by (2.1), under $\theta_1^0$, $Z_n^*$ is asymptotically $N(-\tfrac12\,\eta_{01} I(\eta_2)\eta_{01}^T,\ \eta_{01} I(\eta_2)\eta_{01}^T)$. By a well-known result of LeCam, namely his first lemma on contiguity [cf. Hajek and Sidak (1967, p. 204)], this shows $K_n$ is contiguous to $\theta_1^0$. Since $T_n(b)$ and $Z_n^*$ are asymptotically bivariate normal, by another well-known result of LeCam, namely, his third lemma on contiguity [vide Hajek and Sidak (1967)], $T_n(b)$ is asymptotically normal under $K_n$ with same asymptotic variance as under $\theta_1^0$ and mean under $K_n$ equal to mean under $\theta_1^0$ plus the asymptotic covariance under $\theta_1^0$. Moreover, $\Lambda_n(b) = T_n^2(b)\,I\{T_n(b) > 0\} + o_p(1)$ under $K_n$, since the same relation holds under $\theta_1^0$ and $K_n$ is contiguous. Under $\theta_1^0$, $T_n(b)$ is asymptotically normal with mean zero and variance unity. Also the asymptotic covariance of $Z_n^*$ and $T_n(b)$ under $\theta_1^0$ is
$$\rho = \eta_0\,\frac{J(b,\eta_2) - I_{01}(b) I_{11}^{-1} I_{01}(\eta_2)^T}{\{I_{00}(b) - I_{01}(b) I_{11}^{-1} I_{01}(b)^T\}^{1/2}},$$
where $J(b,\eta_2) = \mathrm{Cov}(U_{n0}(b),U_{n0}(\eta_2))$. Note $\rho$ depends only on $\eta_0$ and $\eta_2$ and so may be written $\rho(\eta_0,\eta_2)$. By the remarks in the preceding paragraph, the following result is true.
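Theorem 4.1. Assume the conditions of Section 3. Then
$$\lim P_{\theta_1^0}\{\Lambda_n(b) \ge x\} = \begin{cases} 1 - \Phi(\sqrt{x}) & \text{if } x > 0 \\ 1 & \text{if } x = 0 \end{cases}$$
$$\lim P_{K_n}\{\Lambda_n(b) \ge x\} = \begin{cases} 1 - \Phi(\sqrt{x} - \rho(\eta_0,\eta_2)) & \text{if } x > 0 \\ 1 & \text{if } x = 0 \end{cases}$$
where $\Phi$ is the standard normal distribution function.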
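The limits in Theorem 4.1 are easy to evaluate numerically. The following sketch computes the limiting tail probability, the asymptotic critical value for a size $\alpha < 0.5$ test, and the limiting power as a function of the shift $\rho$. The function names are illustrative, and $\rho$ is treated as a given number rather than computed from a model.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def limiting_tail(x, rho=0.0):
    """Limit of P{Lambda_n(b) >= x}; rho = 0 gives the null case."""
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if x == 0.0:
        return 1.0
    return 1.0 - STD_NORMAL.cdf(math.sqrt(x) - rho)

def critical_value(alpha):
    """Asymptotic size-alpha critical value; requires 0 < alpha < 0.5."""
    if not 0.0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 0.5)")
    return STD_NORMAL.inv_cdf(1.0 - alpha) ** 2

def limiting_power(alpha, rho):
    """Limiting power of the size-alpha test at shift rho."""
    return limiting_tail(critical_value(alpha), rho)
```

Because the limiting power is increasing in $\rho$, comparing tests over $\eta_2$ reduces to comparing the values of $\rho(\eta_0,\eta_2)$, which is the route taken in the remainder of this section.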
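Consider now Example 1 given in the introduction and fix $\theta_1^0$. Let the limiting power of a sequence of tests $\phi_n$ under $K_n$ be denoted by $\beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2)$. Let us say $\{\phi_n^0\}$ of size $\alpha$ is asymptotically locally minimax if for all sequences $\{\phi_n\}$ of size $\alpha$ which have limiting power,
$$\inf_{\eta_1 \in R^1,\ \eta_2 \in \Theta_2} \beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,\eta_2) \ge \inf_{\eta_1 \in R^1,\ \eta_2 \in \Theta_2} \beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,\eta_2)$$
for every $\eta_0 > 0$ and $\alpha < .5$. By the classical theory for the likelihood ratio test $\phi_n^0$ based on $\Lambda_n(b)$,
$$\inf_{\eta_1 \in R^1} \beta(\{\phi_n\},\theta_1^0,\eta_0,\eta_1,b) \le \inf_{\eta_1 \in R^1} \beta(\{\phi_n^0\},\theta_1^0,\eta_0,\eta_1,b)$$
for every $\eta_0 > 0$ and $\phi_n$. Hence asymptotic local minimaxity of $\phi_n^0$ will follow from Theorem 4.1 if we show $\rho(\eta_0,\eta_2) \ge \rho(\eta_0,b)\ \forall\,\eta_2 \ge b$. For Example 1 this follows easily by direct calculation.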
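However the following lemma shows this property is true for general exponentials. Before stating the lemma we introduce some notations. Let $g_\theta = g(x,\theta) = A(\theta)\exp\{\theta x\}h(x)$, $\theta \in J$, some open interval, be a family of probability densities. Let $a < b$ be fixed elements of $J$. Let
$$\psi(\theta) = I_{11}\,\mathrm{Cov}\Big(\frac{g_\theta}{g_a},\frac{g_b}{g_a}\Big) - \mathrm{Cov}\Big(\frac{g_a'}{g_a},\frac{g_\theta}{g_a}\Big)\,\mathrm{Cov}\Big(\frac{g_a'}{g_a},\frac{g_b}{g_a}\Big), \qquad I_{11} = \mathrm{Var}\Big(\frac{g_a'}{g_a}\Big), \quad I_{01}(\theta) = \mathrm{Cov}\Big(\frac{g_a'}{g_a},\frac{g_\theta}{g_a}\Big),$$
where $g_a'$ is the derivative of $g_\theta$ with respect to $\theta$ at $\theta = a$ and the covariances are computed under $g_a$. To relate to Example 1 (and similar problems) note that, for $\eta_0 > 0$, $\rho(\eta_0,\theta)$ is a positive multiple of $\psi(\theta)$ with $\theta_1^0 = a$. We assume $\psi(\theta)$ is finite on $J$.
Lemma 4.1. $\psi(\theta) \ge \psi(b)$ if $\theta > b$.
Proof. Note that $\psi(a) = 0$. Also $\psi(\theta)$ can be expressed as
$$\int \Big\{ I_{11}\frac{g_b}{g_a} - I_{01}(b)\frac{g_a'}{g_a} - I_{11} \Big\}\, g_\theta\, d\mu = \int \phi(x)\, g_\theta\, d\mu, \text{ say.} \tag{4.1}$$
Since $\phi$ is convex, for any constant $K$, $\phi(x) - K$ can have at most two sign changes and if there are two, they must be from positive to negative and negative to positive. Hence by Karlin's well-known result on sign diminishing properties of the exponential densities (see, e.g., Karlin (1968)), $\psi(\theta) - K$ has similar sign change properties. If there exists $\theta' > b$ such that $\psi(\theta') < \psi(b)$ then this sign change property would be contradicted at the points $a, b, \theta'$ provided we choose $K$ such that $\mathrm{Max}\{0,\psi(\theta')\} < K < \psi(b)$. This proves the lemma.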
5. The Case of Identifiable Mixtures.
Let $\Theta$ be an open bounded real interval and $\bar\Theta$ its closure. Let $g(x,\theta)$, $\theta \in \bar\Theta$, be a family of densities and consider the mixtures $(1-\theta_0)g(x,\theta_1) + \theta_0 g(x,\theta_2)$, where $\theta_i \in \bar\Theta$, $0 < \theta_0 < 1$ and $|\theta_2 - \theta_1| \ge \epsilon$, $\epsilon$ being a fixed positive number. Without loss of generality we may take $0 < \theta_0 \le \tfrac12$. We wish to test homogeneity, i.e., $H_0: \theta_0 = 0$ against the above mixture alternatives. In the sequel, suppose $H_0$ is true and the true density is $g(x,\theta_1^0)$. We make the following blanket assumption
B1. Let $\Theta_1$ be any open set containing $\theta_1^0$ and $\Theta_2$ any closed set such that $\Theta_1 \cap \Theta_2 = \emptyset$. Then A1, A3, A4, A5 hold with this $\Theta_1, \Theta_2$. (Since $\bar\Theta$ is compact by assumption, A2 is dropped.)
One can now imitate the arguments in Sections 2 and 3. As in Section 3, we can show that in order to maximise the log likelihood $L_n(\theta)$ we may restrict to $0 \le \theta_0 < \delta$, $|\theta_1 - \theta_1^0| < \delta$ and $\theta_2 \le \theta_1^0 - (\epsilon - \delta)$ or $\theta_2 \ge \theta_1^0 + (\epsilon - \delta)$ ($\delta < \epsilon$). Call this set $U_\delta$. Define $\eta, V_n(\eta), A_n(\eta)$ etc. as in Section 2. Then one can prove as in Section 3, that with probability tending to one both $L_n(\theta)$ and $A_n(\eta)$ attain their maximum in a bounded neighbourhood of $\eta_{01} = (0,0)$. Hence
$$\sup L_n(\theta) = V_n(0) + \sup A_n(\eta) + o_p(1)$$
uniformly in $\theta_2$, where the maximisation of $A_n$ may be restricted to the set $U_\delta$. Because of the nature of $U_\delta$, for given $\epsilon'$ one can find $K$ and $n_0$ such that with probability $> 1 - \epsilon'$, the maximum of $A_n$ (over $U_\delta$) is attained at $\eta_0, \eta_{01}$ with $\theta_2 \le \theta_1^0 - \epsilon + Kn^{-1/2}$ or $\theta_2 \ge \theta_1^0 + \epsilon - Kn^{-1/2}$. An easy calculation now shows
$$\sup_\theta L_n(\theta) = V_n(0) + \sup_{\eta_0 \ge 0,\ \eta_1,\ \theta_2 \in \Theta_2^{(2)}} A_n(\eta) + o_p(1)$$
where $\Theta_2^{(2)} = \{\theta_2 : \theta_2 \le \theta_1^0 - \epsilon \text{ or } \theta_2 \ge \theta_1^0 + \epsilon\}$. The supremum of $L_n(\theta)$ has therefore the same expression as in Section 2 with $\Theta_2 = \Theta_2^{(2)}$. Since the expression for the supremum of $L_n(\theta)$ under $H_0$ remains unaltered, the conclusion of Theorem 2.1 is valid, i.e., the following is true.
Theorem 5.1. Assume B1. Then the limiting distribution of the likelihood ratio test under $\theta_1^0$ is the same as that in Theorem 2.1 with $\Theta_2 = \Theta_2^{(2)}$.
ACKNOWLEDGEMENT
Thanks are due to the referee whose comments clarified many issues
and led to a better presentation.
REFERENCES
Bickel, P.J. and Wichura, M.J. (1971). Convergence criteria for multiparameter stochastic processes and some applications. Ann. Math. Statist., 42, 1656-1670.
Bock, H.H. (1981). Statistical testing and evaluation methods in cluster analysis. Paper presented at the ISI Golden Jubilee Conference, to appear in the Proceedings.
Chernoff, H. (1954). On the distribution of the likelihood ratio. Ann. Math. Statist., 25, 573-578.
Dharmadhikari, S.W., Fabian, V. and Jogdeo, K. (1968). Bounds on moments of martingales. Ann. Math. Statist., 39, 1719-1723.
Everitt, B.S. and Hand, D.J. (1981). Finite Mixture Distributions. Chapman and Hall, London.
Feder, P.I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is near the boundaries of the hypothesis region. Ann. Math. Statist., 39, 2044-2055.
Gupta, S.S. and Huang, Wen-Tao (1981). On mixtures of distributions: A survey and some new results on ranking and selection. Sankhya, 43, 245-290.
Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York.
Karlin, S. (1968). Total Positivity, Vol. 1. Stanford University Press, Stanford, California.
McCormick, S.P. (1967). Nonlinear Programming. McGraw Hill, New York.
Moran, P.A.P. (1973). Asymptotic properties of homogeneity tests. Biometrika, 60, 79-85.
Redner, R. (1981). Note on the consistency of the maximum likelihood estimate for non-identifiable distributions. Ann. Statist., 9, 224-227.
Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist., 20, 595-601.