Chatterjee, S. K. (1972). "Rank approach to the multivariate two-population mixture problem."

RANK APPROACH TO THE MULTIVARIATE
TWO-POPULATION MIXTURE PROBLEM*

by

Shoutir Kishore Chatterjee
University of Calcutta, India
and
University of North Carolina at Chapel Hill, U.S.A.

Institute of Statistics Mimeo Series No. 817

May 1972

*Work carried out at the Department of Biostatistics, U.N.C., under the Multivariate Nonparametric Project sponsored by the Aerospace Research Laboratories, Air Force Systems Command, U.S. Air Force Contract No. F 33615-71-C-1927. Reproduction in whole or in part is permitted for any purpose of the U.S. Government.
1. INTRODUCTION

We consider the following problem. Three independent random samples

    $X_\alpha^{(k)} = (X_{1\alpha}^{(k)}, X_{2\alpha}^{(k)}, \ldots, X_{p\alpha}^{(k)})'$,  $\alpha = 1, 2, \ldots, n_k$;  $k = 0, 1, 2$,    (1.1)

from three unknown p-variate populations with continuous cumulative distribution functions (cdf's) $F^{(0)}(x)$, $F^{(1)}(x)$, and $F^{(2)}(x)$ are given. It is known that $F^{(1)}(x)$ and $F^{(2)}(x)$ are distinct and that $F^{(0)}(x)$ is a mixture of $F^{(1)}(x)$ and $F^{(2)}(x)$, i.e.,

    $F^{(0)}(x) = \theta F^{(1)}(x) + (1-\theta) F^{(2)}(x)$,  $0 < \theta < 1$.    (1.2)

The 'mixture rate' $\theta$ is unknown. The problem is to estimate $\theta$.

For this problem we consider procedures which utilize only the observational ranks, and hence can be used when only ranks are available. For the i-th variate, let us rank all the $N = n_0 + n_1 + n_2$ observations from the three samples together, and let the rank of $X_{i\alpha}^{(k)}$ so obtained be $R_{i\alpha}^{(k)}$. In this way, from (1.1) we derive the rank vectors

    $R_\alpha^{(k)} = (R_{1\alpha}^{(k)}, \ldots, R_{p\alpha}^{(k)})'$,  $\alpha = 1, 2, \ldots, n_k$;  $k = 0, 1, 2$.    (1.3)
Now suppose a $p \times N$ score matrix (depending on N)

    $A_N = (a_{Ni}(\alpha))$,  $i = 1, \ldots, p$;  $\alpha = 1, \ldots, N$,    (1.4)

is given. With its help, we convert the ranks into rank scores

    $a_{Ni}(R_{i\alpha}^{(k)}) = a_{i\alpha}^{(k)}$ (say),  $\alpha = 1, \ldots, n_k$;  $k = 0, 1, 2$;  $i = 1, \ldots, p$.    (1.5)

Here $a_{i\alpha}^{(k)}$ represents the random rank score corresponding to the observation $X_{i\alpha}^{(k)}$.
We write

    $a_\alpha^{(k)} = (a_{1\alpha}^{(k)}, a_{2\alpha}^{(k)}, \ldots, a_{p\alpha}^{(k)})'$,  $\alpha = 1, 2, \ldots, n_k$;  $k = 0, 1, 2$,

    $\bar a^{(k)} = (\bar a_1^{(k)}, \bar a_2^{(k)}, \ldots, \bar a_p^{(k)})' = n_k^{-1} \sum_{\alpha=1}^{n_k} a_\alpha^{(k)}$.    (1.6)
The relation (1.2) means that we can regard the $n_0$ observations from $F^{(0)}$ as having been taken in the following two steps. First, $n_0$ Bernoulli trials with success probability $\theta$ are performed. Then, for each trial, an observation from $F^{(1)}$ or $F^{(2)}$ is taken according as the trial shows a success or a failure. With this interpretation, it immediately follows that if the number r of successes were observable, we could disregard the rest of the data and take $t = r/n_0$ as our estimate of $\theta$. Since r is not observable, we use the data to find some sort of 'estimate' of t.

If r, as well as the serial numbers $\alpha_1, \alpha_2, \ldots, \alpha_r$ ($1 \le \alpha_1 < \alpha_2 < \cdots < \alpha_r \le n_0$) of the trials resulting in success, are given, then, conditionally, $X_\alpha^{(0)}$, $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_r$, would have the same distribution as $X_\alpha^{(1)}$, $\alpha = 1, 2, \ldots, n_1$, and the remaining $X_\alpha^{(0)}$ the same distribution as $X_\alpha^{(2)}$, $\alpha = 1, 2, \ldots, n_2$. Hence, conditionally, the random score vectors $a_\alpha^{(0)}$, $\alpha = \alpha_1, \alpha_2, \ldots, \alpha_r$, would be interchangeable with $a_\alpha^{(1)}$, $\alpha = 1, 2, \ldots, n_1$, and the remaining $a_\alpha^{(0)}$ would be interchangeable with $a_\alpha^{(2)}$, $\alpha = 1, 2, \ldots, n_2$. Thus, we would have, conditionally, $E(\bar a^{(0)} \mid t) = t\, E(\bar a^{(1)} \mid t) + (1-t)\, E(\bar a^{(2)} \mid t)$, so that unconditionally also,

    $E(\bar a^{(0)}) = \theta\, E(\bar a^{(1)}) + (1-\theta)\, E(\bar a^{(2)})$.    (1.7)
For any fixed non-null p-vector $\ell$, we have, therefore,

    $E[\ell'(\bar a^{(2)} - \bar a^{(0)})] - \theta\, E[\ell'(\bar a^{(2)} - \bar a^{(1)})] = 0$.    (1.8)

This suggests that we take as a determination of t, and hence as an estimate of $\theta$,

    $\hat\theta(\ell) = \frac{\ell'(\bar a^{(2)} - \bar a^{(0)})}{\ell'(\bar a^{(2)} - \bar a^{(1)})}$,    (1.9)

provided the denominator is non-zero and the ratio lies in [0,1]. As $F^{(1)}$ and $F^{(2)}$ are distinct, for a suitable score matrix $A_N$ and a suitably chosen $\ell$, these conditions are expected to be realised with high probability, at least in large samples. We call $\hat\theta(\ell)$ the 'fixed-$\ell$ linear rank-score estimate' of $\theta$. In the sequel, we shall allow $\ell$ itself to be determined by the data so as to achieve maximum asymptotic efficiency. The corresponding estimate will be called the 'optimised linear rank-score estimate'.
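The whole pipeline up to (1.9) is easy to sketch numerically. The following is a minimal illustration, not from the paper: the bivariate normal populations, the sample sizes, the mixture rate, and the Wilcoxon-type score choice $\varphi(u) = u$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n0, n1, n2 = 2, 300, 200, 200
theta = 0.4

# Simulated data: F^(1) and F^(2) are bivariate normals differing in location;
# the 0-th sample is a theta-mixture of the two, as in (1.2).
x1 = rng.normal(0.0, 1.0, (n1, p))
x2 = rng.normal(1.0, 1.0, (n2, p))
mix = rng.random(n0) < theta
x0 = np.where(mix[:, None], rng.normal(0.0, 1.0, (n0, p)), rng.normal(1.0, 1.0, (n0, p)))

# Coordinatewise ranks over the combined sample of size N, as in (1.3),
# converted to Wilcoxon-type scores a_N(alpha) = alpha/(N+1), as in (1.5).
pooled = np.vstack([x0, x1, x2])
N = pooled.shape[0]
ranks = pooled.argsort(axis=0).argsort(axis=0) + 1   # ranks 1..N per coordinate
scores = ranks / (N + 1.0)

a0 = scores[:n0].mean(axis=0)            # \bar a^{(0)}
a1 = scores[n0:n0 + n1].mean(axis=0)     # \bar a^{(1)}
a2 = scores[n0 + n1:].mean(axis=0)       # \bar a^{(2)}

# Fixed-ell linear rank-score estimate (1.9) with ell = (1,...,1)'.
ell = np.ones(p)
theta_hat = ell @ (a2 - a0) / (ell @ (a2 - a1))
print(float(theta_hat))
```

With these sample sizes the estimate lands near the true mixture rate, as the consistency results of section 3 lead one to expect.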
In the remainder of this section we consider some general results on random
sequences that will be used repeatedly later.
LEMMA 1.1  If $g(x)$ is a real-valued function continuous over an open p-dimensional interval I, $x_N$ is a p-vector sequence such that, for sufficiently large N, $x_N \in J \subset I$, where J is a bounded closed interval, and $X_N$ is a sequence of random p-vectors such that $X_N - x_N \overset{P}{\to} 0$, then $g(X_N) - g(x_N) \overset{P}{\to} 0$.

Proof. Clearly, we can find a closed bounded interval J' such that $J \subset J' \subset I$ and such that, for sufficiently large N, $x_N \in J'$ and $X_N \in J'$ with probability arbitrarily close to 1. As $g(x)$ is uniformly continuous in J', the lemma follows.
Given a sequence of random p-vectors $Y_N$ and a sequence of positive definite matrices $\Sigma_N$, we say $Y_N$ is asymptotically $N(0, \Sigma_N)$ if, for every non-null p-vector $\ell$, $\ell' Y_N / [\ell' \Sigma_N \ell]^{1/2}$ converges in law to N(0,1).

For a symmetric matrix A we use the notations m(A) and M(A) to denote the minimum and maximum characteristic roots respectively. $\Phi(z)$ denotes the standard normal cdf.
LEMMA 1.2  Let $Y_N$ be a sequence of random p-vectors and $\Sigma_N$ ($p \times p$) be a sequence of positive definite matrices such that $Y_N$ is asymptotically $N(0, \Sigma_N)$. Then, provided

    $0 < \liminf_{N \to \infty} \{ m(\Sigma_N) / M(\Sigma_N) \}$,    (1.10)

we have, for every z,

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_N / [\ell' \Sigma_N \ell]^{1/2} \le z \} - \Phi(z) \big| \to 0$.    (1.11)

Proof. Let us write $m(\Sigma_N) = m_N$, $M(\Sigma_N) = M_N$. Since replacement of $Y_N$ and $\Sigma_N$ by $M_N^{-1/2} Y_N$ and $M_N^{-1} \Sigma_N$ does not change $\ell' Y_N / [\ell' \Sigma_N \ell]^{1/2}$, without any loss of generality we may assume $M_N = 1$ for all N. Then (1.10) can be restated as

    $0 < \liminf_{N \to \infty} m_N \le 1$.    (1.12)

The implication of (1.12) is that, given any sequence $\{N_k\}$, we can find a subsequence $\{N_{k'}\}$ such that $\lim_{k' \to \infty} \Sigma_{N_{k'}}$ exists and is positive definite.

Suppose (1.11) does not hold, i.e., there is a z' such that

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_N / [\ell' \Sigma_N \ell]^{1/2} \le z' \} - \Phi(z') \big| \not\to 0$.    (1.13)

Then we can find a number $\epsilon > 0$ and a subsequence $\{N_k\}$ such that

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_{N_k} / [\ell' \Sigma_{N_k} \ell]^{1/2} \le z' \} - \Phi(z') \big| \ge \epsilon$,    (1.14)

and by (1.12) we can choose $\{N_k\}$ so that

    $\lim_{k \to \infty} \Sigma_{N_k} = $ some positive definite matrix, say, $\Sigma_0$.    (1.15)

Now, (1.15), together with the fact that $\ell' Y_{N_k} / [\ell' \Sigma_{N_k} \ell]^{1/2} \overset{L}{\to} N(0,1)$ for all $\ell \ne 0$, implies that $Y_{N_k} \overset{L}{\to} Y_0$, where we write $Y_0$ for a random vector distributed as $N(0, \Sigma_0)$. Here we apply a general result on weak convergence due to Ranga Rao ([4], Theorem 4.2), from which we get that

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_{N_k} \le z' \} - P\{ \ell' Y_0 \le z' \} \big| \to 0$.    (1.16)

Now let us denote by $Z_{N_k}$ a random vector following exactly the distribution $N(0, \Sigma_{N_k})$. Then (1.15) implies $Z_{N_k} \overset{L}{\to} Y_0$, and again by Ranga Rao's result we get

    $\sup_{\ell \ne 0} \big| P\{ \ell' Z_{N_k} \le z' \} - P\{ \ell' Y_0 \le z' \} \big| \to 0$.    (1.17)

From (1.16) and (1.17) we deduce

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_{N_k} \le z' \} - P\{ \ell' Z_{N_k} \le z' \} \big| \to 0$,

and hence, as $\ell' Z_{N_k} / [\ell' \Sigma_{N_k} \ell]^{1/2}$ is exactly N(0,1),

    $\sup_{\ell \ne 0} \big| P\{ \ell' Y_{N_k} / [\ell' \Sigma_{N_k} \ell]^{1/2} \le z' \} - \Phi(z') \big| \to 0$.    (1.18)

But this contradicts (1.14). Hence the lemma.

Note. Under the conditions of the lemma, if $\ell_N$ is a sequence of non-null vectors, then $\ell_N' Y_N$ is asymptotically distributed as $N(0, \ell_N' \Sigma_N \ell_N)$.
2. SOME PRELIMINARY RESULTS

In the following sections we shall assume that for every N there is a triplet

    $(n_0, n_1, n_2)$,  $n_k = n_k(N)$,  $k = 0, 1, 2$, with $n_0 + n_1 + n_2 = N$.    (2.1)

Whenever $N \to \infty$, we assume the following holds.
ASSUMPTION I  There is a number $\lambda^*$ ($0 < \lambda^* \le \tfrac{1}{3}$) such that

    $n_k / N \ge \lambda^*$,  $k = 0, 1, 2$, for all N.    (2.2)

We write

    $\lambda_k = n_k / N$,  $k = 0, 1, 2$;  $H(x) = \sum_{k=0}^{2} \lambda_k F^{(k)}(x)$.    (2.3)

By (1.2), this gives

    $H(x) = (\lambda_0 \theta + \lambda_1) F^{(1)}(x) + (\lambda_0 (1-\theta) + \lambda_2) F^{(2)}(x)$.    (2.4)

As $\lambda_0, \lambda_1, \lambda_2$ vary with N, so does $H(x)$. (2.2) entails

    $F^{(k)}(x) \le H(x)/\lambda^*$,  $k = 0, 1, 2$.    (2.5)

We write $F_{[i]}^{(k)}(x)$, $H_{[i]}(x)$ for the marginal cdf's of $F^{(k)}(x)$ and $H(x)$ corresponding to the i-th coordinate, and $F_{[i,j]}^{(k)}(x,y)$, $H_{[i,j]}(x,y)$ for the bivariate marginal cdf's of the same corresponding to the i-th and j-th coordinates.

Let the empirical cdf based on the k-th sample be $\hat F^{(k)}(x)$. Then the same based on the combined sample is

    $\hat H(x) = \sum_{k=0}^{2} \lambda_k \hat F^{(k)}(x)$.    (2.6)

The corresponding marginal cdf's are, as before, denoted by $\hat F_{[i]}^{(k)}(x)$, $\hat H_{[i]}(x)$, etc. Writing

    $u(x) = 0$ if $x < 0$;  $u(x) = 1$ if $x \ge 0$,    (2.7)

we have, clearly,

    $\hat H_{[i]}(x) = \frac{1}{N} \sum_{k=0}^{2} \sum_{\alpha=1}^{n_k} u(x - X_{i\alpha}^{(k)})$.    (2.8)
Following Hajek [1], we suppose that there are p 'score functions' $\varphi_i(u)$, defined over $0 < u < 1$, $i = 1, \ldots, p$, and that, for each i, $\varphi_i(u)$ determines the i-th row of the score matrix (1.4) by either of the following two relations:

    $a_{Ni}(\alpha) = \varphi_i\!\left(\frac{\alpha}{N+1}\right)$,  $\alpha = 1, 2, \ldots, N$,    (2.9)

    $a_{Ni}(\alpha) = E\, \varphi_i(U_N^{(\alpha)})$,  $\alpha = 1, 2, \ldots, N$,    (2.10)

(where $U_N^{(1)} < U_N^{(2)} < \cdots < U_N^{(N)}$ are the order statistics of a sample of size N from the uniform distribution over (0,1)).
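The two score conventions (2.9) and (2.10) can be compared numerically. A sketch for one score function, with the normal-scores choice $\varphi(u) = \Phi^{-1}(u)$ as an illustrative assumption (the paper leaves the $\varphi_i$ general):

```python
import numpy as np
from statistics import NormalDist

N = 20
phi = NormalDist().inv_cdf            # phi(u) = Phi^{-1}(u)

# (2.9): approximate scores phi(alpha/(N+1)) -- the van der Waerden scores.
scores_29 = np.array([phi(a / (N + 1)) for a in range(1, N + 1)])

# (2.10): exact scores E phi(U_N^(alpha)).  For this phi, phi(U_N^(alpha)) is
# distributed as the alpha-th order statistic of N standard normals, so the
# expectation can be approximated by Monte Carlo on sorted normal samples.
rng = np.random.default_rng(1)
scores_210 = np.sort(rng.standard_normal((100_000, N)), axis=1).mean(axis=0)

# The two versions agree closely in the middle and differ most in the tails.
# For phi(u) = u (Wilcoxon scores) they would coincide exactly, since
# E U_N^(alpha) = alpha/(N+1).
print(np.max(np.abs(scores_29 - scores_210)))
```

Either convention yields a nondecreasing row of the score matrix when $\varphi$ is nondecreasing, which is what the variance inequality used in Theorem 2.1 requires.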
As in [1], we assume that the following condition is satisfied by the $\varphi_i$'s.

ASSUMPTION II  For each $i = 1, 2, \ldots, p$,

    $\varphi_i(u) = \varphi_{i1}(u) - \varphi_{i2}(u)$,    (2.11)

where $\varphi_{i1}(u)$, $\varphi_{i2}(u)$ are both nondecreasing, square integrable and absolutely continuous inside (0,1).

This implies that $\varphi_i$ is square integrable over (0,1) and also that, for any $0 < a < b < 1$,

    $\varphi_i(b) - \varphi_i(a) = \int_a^b \varphi_i'(u)\, du$,    (2.12)

where the derivative $\varphi_i'(u)$ exists almost everywhere in (0,1).

Let us write

    $\mu_i^{(k)} = \int_{-\infty}^{\infty} \varphi_i(H_{[i]}(x))\, dF_{[i]}^{(k)}(x)$,    (2.13)

    $\sigma_{ij}^{(k)} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \varphi_i(H_{[i]}(x))\, \varphi_j(H_{[j]}(y))\, dF_{[i,j]}^{(k)}(x,y) - \mu_i^{(k)} \mu_j^{(k)}$,  $i, j = 1, 2, \ldots, p$;  $k = 0, 1, 2$.    (2.14)
These depend on N through $H(x)$. However, (2.5), together with the square integrability of the $\varphi_i$'s, implies that the $\mu_i^{(k)}$, $\sigma_{ij}^{(k)}$ are all uniformly bounded. Writing

    $\mu^{(k)} = (\mu_1^{(k)}, \ldots, \mu_p^{(k)})'$,  $\Sigma^{(k)} = (\sigma_{ij}^{(k)})_{i,j=1,2,\ldots,p}$,  $k = 0, 1, 2$,    (2.15)

from (1.2) we get

    $\mu^{(0)} = \theta \mu^{(1)} + (1-\theta) \mu^{(2)}$,
    $\Sigma^{(0)} = \theta \Sigma^{(1)} + (1-\theta) \Sigma^{(2)} + \theta(1-\theta) [\mu^{(2)} - \mu^{(1)}][\mu^{(2)} - \mu^{(1)}]'$.    (2.16)

Finally, we assume that the following conditions hold as $N \to \infty$.

ASSUMPTION III
(a) For at least one i,

    $\liminf_{N \to \infty} (\mu_i^{(2)} - \mu_i^{(1)}) > 0$  or  $\limsup_{N \to \infty} (\mu_i^{(2)} - \mu_i^{(1)}) < 0$.    (2.17)

(b) $\min_{k=1,2} \{ \liminf_{N \to \infty} m(\Sigma^{(k)}) \} > 0$.    (2.18)

The implication of III(a) is that, for at least one value of i, the mean value of $\varphi_i(H_{[i]}(X_{i1}^{(1)}))$ is larger than that of $\varphi_i(H_{[i]}(X_{i1}^{(2)}))$ (or vice versa) for all but a finite number of values of N. Since $F^{(1)}$ and $F^{(2)}$ are supposed to be distinct, for proper choices of the score functions $\varphi_i$ it would be possible to get this difference reflected in the mean value of $\varphi_i(H_{[i]}(X_i))$ for at least one i. Thus III(a) is not unduly restrictive. Assumption III(b) is a sort of 'nonsingularity assumption', which implies that, for neither of the two populations $F^{(1)}$, $F^{(2)}$, does any of the p variables ever become 'useless' in the sense that its value is predictable from those of the other variables.

With the above notations and assumptions, we now state and prove some results to be used in the subsequent sections.
THEOREM 2.1  Under Assumptions I and II,

    $\bar a_i^{(k)} - \mu_i^{(k)} \overset{P}{\to} 0$,  $i = 1, 2, \ldots, p$;  $k = 0, 1, 2$.    (2.19)

Proof. To prove (2.19), it will be sufficient to show that

    $\operatorname{Var}(\bar a_i^{(k)}) \to 0$  and  $E(\bar a_i^{(k)}) - \mu_i^{(k)} \to 0$.    (2.20)

From (2.11), writing $a_{Ni}(\alpha \mid \varphi_{iq})$, $a_{i\alpha}^{(k)}(\varphi_{iq})$, $\bar a_i^{(k)}(\varphi_{iq})$, $\mu_i^{(k)}(\varphi_{iq})$ to denote quantities obtained from the function $\varphi_{iq}(u)$ exactly as $a_{Ni}(\alpha)$, $a_{i\alpha}^{(k)}$, $\bar a_i^{(k)}$, $\mu_i^{(k)}$ are obtained from $\varphi_i(u)$, $q = 1, 2$, and letting c stand for a generic constant, we have

    $\operatorname{Var}(\bar a_i^{(k)}) = \operatorname{Var}(\bar a_i^{(k)}(\varphi_{i1}) - \bar a_i^{(k)}(\varphi_{i2})) \le 2 \{ \operatorname{Var}(\bar a_i^{(k)}(\varphi_{i1})) + \operatorname{Var}(\bar a_i^{(k)}(\varphi_{i2})) \} \le \frac{c}{N} \sum_{q=1}^{2} \frac{1}{N} \sum_{\alpha=1}^{N} [a_{Ni}(\alpha \mid \varphi_{iq}) - \bar a_{Ni}(\varphi_{iq})]^2$,    (2.21)

where $\bar a_{Ni}(\varphi_{iq}) = \frac{1}{N} \sum_{\alpha=1}^{N} a_{Ni}(\alpha \mid \varphi_{iq})$, $q = 1, 2$. The last inequality follows directly from Hajek's Variance Inequality ([1], Theorem 3.1), as $a_{Ni}(\alpha \mid \varphi_{iq})$ is nondecreasing in $\alpha = 1, 2, \ldots, N$. Now, as $\varphi_{iq}$ is square integrable and $a_{Ni}(\alpha \mid \varphi_{iq})$ is as in (2.9) or (2.10), we have, as $N \to \infty$,

    $\frac{1}{N} \sum_{\alpha=1}^{N} [a_{Ni}(\alpha \mid \varphi_{iq}) - \bar a_{Ni}(\varphi_{iq})]^2 \to \int_0^1 [\varphi_{iq}(u) - \bar\varphi_{iq}]^2\, du$,  $q = 1, 2$,    (2.22)

where $\bar\varphi_{iq} = \int_0^1 \varphi_{iq}(u)\, du$ (see [2], pp. 158, 164). From (2.21) and (2.22) we conclude that

    $\operatorname{Var}(\bar a_i^{(k)}) \to 0$.    (2.23)

Next consider the second term in (2.20). Under our Assumption I, by Lemma 5.1 of Hajek [1], given any $\epsilon > 0$ we can find a decomposition

    $\varphi_i = \psi_i + \psi_{i1} - \psi_{i2}$    (2.24)

such that $\psi_i$ is a polynomial, $\psi_{i1}$ and $\psi_{i2}$ are nondecreasing, and

    $\int_0^1 \psi_{i1}^2(u)\, du + \int_0^1 \psi_{i2}^2(u)\, du < \epsilon$.    (2.25)

Hence, using notations similar to the above,

    $[E(\bar a_i^{(k)}) - \mu_i^{(k)}]^2 \le c \Big\{ [E\, \bar a_i^{(k)}(\psi_i) - \mu_i^{(k)}(\psi_i)]^2 + [E\, \bar a_i^{(k)}(\psi_{i1})]^2 + \Big[ \int_{-\infty}^{\infty} \psi_{i1}(H_{[i]}(x))\, dF_{[i]}^{(k)}(x) \Big]^2 + [E\, \bar a_i^{(k)}(\psi_{i2})]^2 + \Big[ \int_{-\infty}^{\infty} \psi_{i2}(H_{[i]}(x))\, dF_{[i]}^{(k)}(x) \Big]^2 \Big\}$.    (2.26)

Now $\psi_i$, being a polynomial, has a bounded second derivative in (0,1), and hence, by a result of Hajek (see (4.27) in [1]), the first term on the right of (2.26) has limit 0 as $N \to \infty$. Further, by the Schwarz inequality,

    $\Big[ \int_{-\infty}^{\infty} \psi_{iq}(H_{[i]}(x))\, dF_{[i]}^{(k)}(x) \Big]^2 \le \int_{-\infty}^{\infty} \psi_{iq}^2(H_{[i]}(x))\, dF_{[i]}^{(k)}(x) \le \frac{1}{\lambda^*} \int_0^1 \psi_{iq}^2(u)\, du < \frac{\epsilon}{\lambda^*}$,  $q = 1, 2$,    (2.27)

by (2.2) and (2.5). Now, just as in (2.22), we have

    $\frac{1}{N} \sum_{\alpha=1}^{N} a_{Ni}^2(\alpha \mid \psi_{iq}) \to \int_0^1 \psi_{iq}^2(u)\, du$.    (2.28)

From (2.25), (2.27) and (2.28), it follows that the other terms on the right of (2.26) can be made less than an arbitrary quantity by choosing N large enough. Hence

    $E(\bar a_i^{(k)}) - \mu_i^{(k)} \to 0$.    (2.29)

(2.23) and (2.29) imply (2.20). Q.E.D.
We define

    $\hat\sigma_{ij}^{(k)} = \frac{1}{n_k} \sum_{\alpha=1}^{n_k} a_{i\alpha}^{(k)} a_{j\alpha}^{(k)} - \bar a_i^{(k)} \bar a_j^{(k)}$,    (2.30)

where $a_{i\alpha}^{(k)}$ is as in (1.5).

THEOREM 2.2  Under Assumptions I and II,

    $\hat\sigma_{ij}^{(k)} - \sigma_{ij}^{(k)} \overset{P}{\to} 0$,  $i, j = 1, 2, \ldots, p$;  $k = 0, 1, 2$,    (2.31)

where $\sigma_{ij}^{(k)}$ is defined as in (2.14).

Proof. As the $\mu_i^{(k)}$, $i = 1, \ldots, p$, are uniformly bounded for all N, Theorem 2.1 together with Lemma 1.1 implies that $\bar a_i^{(k)} \bar a_j^{(k)} - \mu_i^{(k)} \mu_j^{(k)} \overset{P}{\to} 0$. Hence, to prove (2.31), it will be sufficient to show that

    $\frac{1}{n_k} \sum_{\alpha=1}^{n_k} a_{i\alpha}^{(k)} a_{j\alpha}^{(k)} - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \varphi_i(H_{[i]}(x))\, \varphi_j(H_{[j]}(y))\, dF_{[i,j]}^{(k)}(x,y) \overset{P}{\to} 0$.    (2.32)

The proof will be similar to that of Theorem 3.1 of Puri and Sen [3]. We only sketch the outline. First, using the decomposition (2.24), we can show that the left-hand side of (2.32) can be written as

    $\frac{1}{n_k} \sum_{\alpha=1}^{n_k} a_{i\alpha}^{(k)}(\psi_i)\, a_{j\alpha}^{(k)}(\psi_j) - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \psi_i(H_{[i]}(x))\, \psi_j(H_{[j]}(y))\, dF_{[i,j]}^{(k)}(x,y) + R$,    (2.33)

where, by an application of the Schwarz inequality and use of (2.2), (2.5) and (2.28), it can be shown that $|R| < c\,\epsilon^{1/2}$ for sufficiently large N (c is a generic constant). Hence, to prove (2.32), it will be sufficient to show that the difference between the first two terms in (2.33) converges in probability to zero. Since $\psi_i$ is a polynomial, it is known that $\max_\alpha |a_{Ni}(\alpha \mid \psi_i) - \psi_i(\alpha/(N+1))| \to 0$ (see (2.9) and (2.10)). Hence it will be sufficient to take $a_{Ni}(\alpha \mid \psi_i) = \psi_i(\alpha/(N+1))$. Thus, recalling (1.5), our problem reduces to showing that

    $\frac{1}{n_k} \sum_{\alpha=1}^{n_k} \psi_i\!\left(\frac{R_{i\alpha}^{(k)}}{N+1}\right) \psi_j\!\left(\frac{R_{j\alpha}^{(k)}}{N+1}\right) - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \psi_i(H_{[i]}(x))\, \psi_j(H_{[j]}(y))\, dF_{[i,j]}^{(k)}(x,y)$    (2.34)

converges to zero. As $\psi_i$ has a bounded second derivative in (0,1), by Taylor expansion we may write

    $\psi_i\!\left(\frac{R_{i\alpha}^{(k)}}{N+1}\right) = \psi_i(H_{[i]}(X_{i\alpha}^{(k)})) + \left[\frac{R_{i\alpha}^{(k)}}{N+1} - H_{[i]}(X_{i\alpha}^{(k)})\right] \psi_i'(H_{[i]}(X_{i\alpha}^{(k)})) + \cdots$,  $i = 1, 2, \ldots, p$.    (2.35)

Substituting (2.35) in the first term of (2.34), we get that (2.34) is equal to

    $\frac{1}{n_k} \sum_{\alpha=1}^{n_k} \psi_i(H_{[i]}(X_{i\alpha}^{(k)}))\, \psi_j(H_{[j]}(X_{j\alpha}^{(k)})) - \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \psi_i(H_{[i]}(x))\, \psi_j(H_{[j]}(y))\, dF_{[i,j]}^{(k)}(x,y) + R^*$ (say).    (2.36)

Using the notation (2.7),

    $R_{i\alpha}^{(k)} = \sum_{k'} \sum_{\alpha'} u(X_{i\alpha}^{(k)} - X_{i\alpha'}^{(k')})$,

and hence, given $X_{i\alpha}^{(k)}$, it can be considered as the sum of N Bernoulli variables. Using this fact and the Schwarz inequality, it may be shown that in (2.36) $R^*$ is the sum of terms each of which converges in mean square, and hence in probability, to zero. As, in (2.36), the difference between the first two terms converges in probability to zero (by the Khinchin Law of Large Numbers), the required result follows. Q.E.D.
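The estimator (2.30) is straightforward to compute. A minimal sketch, in which the three simulated Gaussian samples and the Wilcoxon score choice are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 500
samples = [rng.normal(0, 1, (n, p)), rng.normal(0.5, 1, (n, p)), rng.normal(1, 1, (n, p))]

# Pooled coordinatewise ranks converted to Wilcoxon scores a_N(alpha) = alpha/(N+1).
pooled = np.vstack(samples)
N = pooled.shape[0]
scores = (pooled.argsort(axis=0).argsort(axis=0) + 1) / (N + 1.0)

# sigma-hat^(k) of (2.30): the sample second-moment matrix of the score
# vectors of the k-th sample minus the outer product of their mean.
sigma_hat = []
for blk in np.split(scores, [n, 2 * n]):
    abar = blk.mean(axis=0)                           # \bar a^{(k)}
    sigma_hat.append(blk.T @ blk / len(blk) - np.outer(abar, abar))

# Each sigma-hat^(k) is symmetric positive semi-definite by construction.
print([float(np.linalg.eigvalsh(S).min()) for S in sigma_hat])
```

These matrices are exactly the $\hat\Sigma^{(k)}$ that enter the data-dependent matrix $\hat G(s)$ of section 4.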
The following lemma will be required in proving the next theorem.

LEMMA 2.1  Under Assumptions I and II, as $N \to \infty$,

    $\sqrt{N}\, E\{ \bar a_i^{(0)} - \theta \bar a_i^{(1)} - (1-\theta) \bar a_i^{(2)} \} \to 0$,  $i = 1, \ldots, p$.    (2.37)

Proof. We recall that, if t denotes the unobservable proportion of observations from $F^{(1)}$ among the $X_\alpha^{(0)}$, $\alpha = 1, 2, \ldots, n_0$, then (1.7) holds. Hence (2.37) will follow if we can show

    $\sqrt{N}\, E[(t - \theta)(\bar a_i^{(1)} - \bar a_i^{(2)})] \to 0$,  $i = 1, \ldots, p$,

or equivalently, as $E(t) = \theta$,

    $\sqrt{N}\, E[(t - \theta)\{ (\bar a_i^{(1)} - \mu_i^{(1)}) - (\bar a_i^{(2)} - \mu_i^{(2)}) \}] \to 0$,  $i = 1, \ldots, p$.    (2.38)

As t is a binomial proportion, by the Schwarz inequality the modulus of the left-hand member of (2.38) is dominated by

    $[\theta(1-\theta) N / n_0]^{1/2} \{ [\operatorname{Var}(\bar a_i^{(1)})]^{1/2} + [\operatorname{Var}(\bar a_i^{(2)})]^{1/2} \}$,

which tends to zero by (2.23). Q.E.D.
THEOREM 2.3  Under Assumptions I, II, and III(b), as $N \to \infty$, $\sqrt{N} \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \}$ is asymptotically distributed as

    $N\!\left(0,\; \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)}\right)$,

with the $\Sigma^{(k)}$ given by (2.14) and (2.15).

Proof. We have to show that, for any non-null p-vector $\ell = (\ell_1, \ldots, \ell_p)'$,

    $\sqrt{N}\, \ell' \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \} \Big/ \Big[ \ell' \Big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell \Big]^{1/2} \overset{L}{\to} N(0,1)$.    (2.39)

Now, by Assumption III(b), the denominator in

    $\sqrt{N}\, \ell' E\{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \} \Big/ \Big[ \ell' \Big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell \Big]^{1/2}$    (2.40)

is bounded away from zero. So, by Lemma 2.1, the expression (2.40) converges to zero as $N \to \infty$. Hence (2.39) will follow if we can show that the statistic in (2.39), centered at its expectation, converges in law to N(0,1).    (2.41)

(2.41) follows from the results of Hajek [1] in the same way as Theorem 4.1 of [3]. Specifically, one has to show first that, when the $\varphi_i$, $i = 1, 2, \ldots, p$, have bounded second derivatives in (0,1), the left-hand member of (2.41) has asymptotically the same distribution as

    $\Big[ \ell' \Big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell \Big]^{-1/2} N^{-1/2} \Big\{ \frac{N}{n_0} \sum_{\alpha=1}^{n_0} \Big[ \sum_{i=1}^{p} \ell_i \varphi_i(H_{[i]}(X_{i\alpha}^{(0)})) - \ell'\mu^{(0)} \Big] - \theta \frac{N}{n_1} \sum_{\alpha=1}^{n_1} \Big[ \sum_{i=1}^{p} \ell_i \varphi_i(H_{[i]}(X_{i\alpha}^{(1)})) - \ell'\mu^{(1)} \Big] - (1-\theta) \frac{N}{n_2} \sum_{\alpha=1}^{n_2} \Big[ \sum_{i=1}^{p} \ell_i \varphi_i(H_{[i]}(X_{i\alpha}^{(2)})) - \ell'\mu^{(2)} \Big] \Big\}$.    (2.42)

This follows by an application of Hajek's [1] inequality (4.26) (the inequality actually holds under both (2.9) and (2.10)) and Assumption III(b). As, by the Lindeberg-Feller Central Limit Theorem, (2.42) is asymptotically distributed as N(0,1) (see (2.14)), we get that, when the $\varphi_i'$, $i = 1, \ldots, p$, exist and are bounded, (2.41) holds. Generally, under Assumptions I, II and III(b), we use the decomposition (2.24), as in [1], to show that we can approximate the left-hand member of (2.41) arbitrarily closely by another expression of exactly the same form but based on polynomial score-generating functions $\psi_i$. Hence the theorem follows. Q.E.D. (Note that, because of the particular nature of our problem, we do not require a 'centering assumption' as in [3].)
3. FIXED-$\ell$ LINEAR RANK SCORE ESTIMATE

Given a non-null vector $\ell$ ($p \times 1$), in (1.9) we have proposed

    $\hat\theta = \hat\theta(\ell) = \frac{\ell'(\bar a^{(2)} - \bar a^{(0)})}{\ell'(\bar a^{(2)} - \bar a^{(1)})}$    (3.1)

as an estimate of $\theta$, provided the denominator is non-zero and the ratio lies in [0,1]. Let $\ell$ be such that

    $\liminf_{N \to \infty} \ell'(\mu^{(2)} - \mu^{(1)}) > 0$.    (3.2)

Under Assumption III(a), it is always possible to choose $\ell$ so that (3.2) holds. (For instance, we may take $\ell_i > 0$ if $\liminf (\mu_i^{(2)} - \mu_i^{(1)}) > 0$, $\ell_i < 0$ if $\limsup (\mu_i^{(2)} - \mu_i^{(1)}) < 0$, and $\ell_i = 0$ otherwise.)
By Theorem 2.1 and (2.16), we have

    $\ell'(\bar a^{(2)} - \bar a^{(1)}) - \ell'(\mu^{(2)} - \mu^{(1)}) \overset{P}{\to} 0$,
    $\ell'(\bar a^{(2)} - \bar a^{(0)}) - \theta\, \ell'(\mu^{(2)} - \mu^{(1)}) \overset{P}{\to} 0$.    (3.3)

(3.2) and (3.3) imply that, with probability approaching 1 as $N \to \infty$, (3.1) gives a well-defined estimate of $\theta$. In fact, as by our assumptions the elements of $\bar a^{(k)}$ remain bounded, by Lemma 1.1 we immediately get $\hat\theta - \theta \overset{P}{\to} 0$, so that $\hat\theta$ is a consistent estimate of $\theta$. Further, by the same argument, from (3.1),

    $\hat\theta - \theta = -\frac{\ell' \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \}}{\ell'(\bar a^{(2)} - \bar a^{(1)})}$,    (3.4)

and by Theorem 2.3, $\sqrt{N}\, \ell' \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \}$ is asymptotically normal with mean 0 and variance

    $\ell' \Big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell$.    (3.5)

As, by our assumptions, (3.5) is bounded, $\sqrt{N}\, \ell' \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \}$ is bounded in probability. Hence, by well-known results, $\sqrt{N}(\hat\theta - \theta)$ has asymptotically the same distribution as the right-hand side of (3.4), multiplied by $\sqrt{N}$, with the denominator replaced by $\ell'(\mu^{(2)} - \mu^{(1)})$. So $\sqrt{N}(\hat\theta - \theta)$ is asymptotically normal with mean 0 and variance

    $\ell' \Big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell \Big/ [\ell'(\mu^{(2)} - \mu^{(1)})]^2$.    (3.6)

By the second relation of (2.16), (3.6) can be written as

    $\theta(1-\theta) \frac{N}{n_0} + \ell' \Big( \frac{N}{n_0} \{ \theta \Sigma^{(1)} + (1-\theta) \Sigma^{(2)} \} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \Big) \ell \Big/ [\ell'(\mu^{(2)} - \mu^{(1)})]^2$.    (3.7)

It is to be noted that the first term in (3.7) represents the asymptotic variance of $\sqrt{N}(t - \theta)$.
So far we have considered $\ell$ as a fixed non-null vector. Before concluding this section, we note that if in (3.1) we replace $\ell$ by $\ell_N$, where $\ell_N$ is a sequence of non-null vectors, then, provided (i) the elements of $\ell_N$ are uniformly bounded and (ii) $\ell_N$ satisfies (3.2), the above conclusions will hold true for $\hat\theta(\ell_N)$ as well. This is because (3.3) and (3.4) obviously apply to $\ell_N$, and the asymptotic normality of $\sqrt{N}\, \ell_N' \{ \bar a^{(0)} - \theta \bar a^{(1)} - (1-\theta) \bar a^{(2)} \}$, with mean 0 and variance $\ell_N' \big( \frac{N}{n_0} \Sigma^{(0)} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)} \big) \ell_N$, follows by Lemma 1.2, since by our assumptions the latent roots of $\sum_{k=0}^{2} \lambda_k^{-1} \Sigma^{(k)}$ are bounded away from both zero and $\infty$.

4. OPTIMISED LINEAR RANK SCORE ESTIMATE
In the previous section we have seen that the second term in the expression (3.7) for the asymptotic variance of $\sqrt{N}(\hat\theta(\ell) - \theta)$ depends on $\ell$. In this section we investigate what its minimum value will be and whether it is possible, by any means, to attain this minimum. The problem is involved because the matrix in the second term of (3.7) involves the unknown $\theta$, but it is still tractable provided we are prepared to solve higher-degree polynomial equations. We first state the following well-known algebraic lemma.

LEMMA 4.1  For a positive definite matrix G ($p \times p$) and a non-null vector $\delta$ ($p \times 1$),

    $\min_{\ell} \frac{\ell' G \ell}{(\ell' \delta)^2} = [\delta' G^{-1} \delta]^{-1}$,

and the minimum is attained for $G\ell = g\delta$, where g is an arbitrary non-zero number.
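Lemma 4.1 is easy to verify numerically. A sketch with an arbitrary positive definite G and vector $\delta$ (both purely illustrative):

```python
import numpy as np

# Numerical check of Lemma 4.1: the minimum of l'Gl / (l'delta)^2 over
# non-null l equals 1 / (delta' G^{-1} delta), attained at l* = G^{-1} delta
# (or any non-zero scalar multiple of it).
rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
G = B @ B.T + 4 * np.eye(4)              # positive definite
delta = np.array([1.0, -2.0, 0.5, 3.0])

bound = 1.0 / (delta @ np.linalg.solve(G, delta))
l_star = np.linalg.solve(G, delta)       # minimizing direction

ratio = lambda l: (l @ G @ l) / (l @ delta) ** 2
print(abs(ratio(l_star) - bound))        # the bound is attained at l*

# Random directions never beat the bound.
worst = min(ratio(rng.normal(size=4)) for _ in range(1000))
print(worst >= bound - 1e-12)
```

The minimizing direction $G^{-1}\delta$ is exactly the form used for $\ell_N$ in (4.3) below.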
By our Assumptions III(a) and (b), for sufficiently large N, $\mu^{(2)} - \mu^{(1)} = \delta$ (say) $\ne 0$, and

    $G(\theta)$ (say) $= \frac{N}{n_0} \{ \theta \Sigma^{(1)} + (1-\theta) \Sigma^{(2)} \} + \theta^2 \frac{N}{n_1} \Sigma^{(1)} + (1-\theta)^2 \frac{N}{n_2} \Sigma^{(2)}$    (4.1)

is positive definite. Hence, by the above lemma, the minimum value of the asymptotic variance (3.7) of $\sqrt{N}(\hat\theta(\ell) - \theta)$, over all possible choices of $\ell$, is

    $\theta(1-\theta) \frac{N}{n_0} + [\delta' \{G(\theta)\}^{-1} \delta]^{-1}$.    (4.2)

If $\theta$ were known, to attain this minimum in $\hat\theta(\ell)$ we could take $\ell$ so that

    $G(\theta)\ell - \delta = 0$.    (4.3)

Of course, the solution of (4.3), which we denote by $\ell_N$, would depend on N. But, as noted at the end of section 3, the results of that section would still remain true ($G(\theta)$ and $\delta$ being bounded, so are the elements of $\ell_N$).

These considerations suggest that, to estimate $\theta$ in an optimal manner, we may proceed as follows. For any s ($0 < s < 1$), we set up

    $\hat G(s) = \frac{N}{n_0} \{ s \hat\Sigma^{(1)} + (1-s) \hat\Sigma^{(2)} \} + s^2 \frac{N}{n_1} \hat\Sigma^{(1)} + (1-s)^2 \frac{N}{n_2} \hat\Sigma^{(2)}$,    (4.4)

where $\hat\Sigma^{(k)} = (\hat\sigma_{ij}^{(k)})$ and $\hat\sigma_{ij}^{(k)}$ is given by (2.30). Consider the set of (p+1) simultaneous equations
    $\ell'(\bar a^{(2)} - \bar a^{(0)}) - s\, \ell'(\bar a^{(2)} - \bar a^{(1)}) = 0$,    (4.5)

    $\hat G(s)\ell - (\bar a^{(2)} - \bar a^{(1)}) = 0$,    (4.6)

in the (p+1) unknowns s and $\ell = (\ell_1, \ldots, \ell_p)'$. By Theorem 2.1 we have

    $\bar a^{(2)} - \bar a^{(1)} - \delta \overset{P}{\to} 0$.    (4.7)

Also, by Assumption I, Theorem 2.2, and the expression (4.4),

    $\hat G(s) - G(s) \overset{P}{\to} 0$, uniformly in $0 < s < 1$.    (4.8)

Hence the equations (4.5) and (4.6) are suggested naturally by (3.1) and (4.3). (We write s instead of $\theta$ to avoid confusion with the true value.) If $s = \hat\theta_N$, $\ell = \hat\ell_N$ is a real solution of (4.5) and (4.6) such that $0 \le \hat\theta_N \le 1$, we propose $\hat\theta_N$ as an estimate of $\theta$. The following two theorems describe its properties.
THEOREM 4.1  With probability approaching 1 as $N \to \infty$, the equations (4.5)-(4.6) possess a real solution. If $\hat\theta_N$, $\hat\ell_N$ is any such solution, and $\ell_N$ is the unique solution of (4.3), then, as $N \to \infty$,

    $\hat\theta_N \overset{P}{\to} \theta$,  $\hat\ell_N - \ell_N \overset{P}{\to} 0$.    (4.9)

Proof. Because of our Assumptions I, II, and III(b), we can find numbers $m_0$ and $M_0$ such that

    $0 < m_0 \le \min_{0 \le s \le 1} m[G(s)] \le \max_{0 \le s \le 1} M[G(s)] \le M_0 < \infty$    (4.10)

for all sufficiently large values of N. Hence, by (4.8), we can find numbers $m_0'$ and $M_0'$ such that, as $N \to \infty$,

    $\operatorname{Prob}\{ 0 < m_0' \le \min_{0 \le s \le 1} m[\hat G(s)] \le \max_{0 \le s \le 1} M[\hat G(s)] \le M_0' < \infty \} \to 1$.    (4.11)
If $\hat G(s)$ is positive definite, from (4.6),

    $\ell = [\hat G(s)]^{-1} (\bar a^{(2)} - \bar a^{(1)})$,    (4.12)

which, on substitution in (4.5), gives

    $\{ \bar a^{(2)} - \bar a^{(0)} - s(\bar a^{(2)} - \bar a^{(1)}) \}' [\hat G(s)]^{-1} (\bar a^{(2)} - \bar a^{(1)}) = 0$.    (4.13)

Since each cofactor of $\hat G(s)$ is a (2p-2)-th degree polynomial in s, (4.13) represents a (2p-1)-th degree polynomial equation in s. Clearly, if with probability approaching 1 (4.13) has a real solution for s, then (4.5)-(4.6) will also have a real solution for s, $\ell$.

Now, by Theorem 2.1 and the first relation in (2.16),

    $\bar a^{(2)} - \bar a^{(0)} - \theta\delta \overset{P}{\to} 0$.    (4.14)

By (4.7), (4.8), (4.14) and Lemma 1.1, in view of (4.10) and the uniform boundedness of $\delta$, the polynomial in (4.13) has a stochastically vanishing difference with

    $(\theta - s)\, \delta' [G(s)]^{-1} \delta$.    (4.15)

By Assumption III(a), $\delta$ is non-null for all sufficiently large N. Hence the values of (4.15), and therefore, with probability approaching 1, those of the left-hand member of (4.13), at $s = \theta \pm \epsilon$ are of opposite signs. Thus, with probability approaching 1 as $N \to \infty$, (4.13) would have a root between $\theta \pm \epsilon$, whatever $\epsilon$. This proves the first part of the theorem.

Again, if $\hat\theta_N$, $\hat\ell_N$ is any solution of (4.5)-(4.6), we can write

    $\hat\ell_N = [\hat G(\hat\theta_N)]^{-1} (\bar a^{(2)} - \bar a^{(1)})$,    (4.16)
    $\hat\theta_N = \frac{\hat\ell_N'(\bar a^{(2)} - \bar a^{(0)})}{\hat\ell_N'(\bar a^{(2)} - \bar a^{(1)})}$,    (4.17)

provided $\hat G(\hat\theta_N)$ is positive definite and the denominator in (4.17) is positive. By (4.11), $\hat G(\hat\theta_N)$ is positive definite in probability. Also, by (4.7) and (4.11), as by Assumption III(a) $\delta$ is bounded away from the null vector, we can find an $\epsilon > 0$ such that

    $\operatorname{Prob}\{ \hat\ell_N'(\bar a^{(2)} - \bar a^{(1)}) > \epsilon \} \to 1$.    (4.18)

Now, from (4.17),

    $\hat\theta_N - \theta = \frac{\hat\ell_N' \{ \bar a^{(2)} - \bar a^{(0)} - \theta(\bar a^{(2)} - \bar a^{(1)}) \}}{\hat\ell_N'(\bar a^{(2)} - \bar a^{(1)})}$.    (4.19)

By (4.7) and (4.14), $\bar a^{(2)} - \bar a^{(0)} - \theta(\bar a^{(2)} - \bar a^{(1)}) \overset{P}{\to} 0$. By (4.7) and (4.11), the elements of $\hat\ell_N$ are bounded in probability. These facts, together with (4.18), imply from (4.19) that

    $\hat\theta_N - \theta \overset{P}{\to} 0$.    (4.20)

Hence, (4.7), (4.16), and (4.20) imply $\hat\ell_N - \ell_N \overset{P}{\to} 0$ by Lemma 1.1. Q.E.D.

Note: The above theorem shows that $\hat\theta_N$ is a consistent estimate of $\theta$. If $0 < \theta < 1$, then, for large N, with high probability $\hat\theta_N$ lies in [0,1]. Trouble arises if $\theta = 0$ or 1. If we conventionally take $\hat\theta_N$ to be 0 (1) whenever the s-solution of (4.5)-(4.6) is negative (exceeds 1), then, whatever $\theta$, $\hat\theta_N$ always lies in [0,1] and is a consistent estimate of $\theta$. The cases $\theta = 0$ or 1 are of importance in the context of the problem of classification, where the 0-th sample may have come exclusively from $F^{(1)}$ or $F^{(2)}$. It is intended to discuss this problem in a subsequent communication.
THEOREM 4.2  If $\hat\theta_N$, $\hat\ell_N$ is any real solution of (4.5)-(4.6), then, as $N \to \infty$, $\sqrt{N}(\hat\theta_N - \theta)$ is asymptotically normally distributed with mean 0 and variance

    $\theta(1-\theta) \frac{N}{n_0} + [\delta' \{G(\theta)\}^{-1} \delta]^{-1}$.    (4.21)

Proof. As earlier, with probability approaching 1, we can represent $\hat\theta_N$ by (4.17), so that

    $\sqrt{N}(\hat\theta_N - \theta) = \frac{\sqrt{N}\, \ell_N' \{ (1-\theta)\bar a^{(2)} + \theta \bar a^{(1)} - \bar a^{(0)} \}}{\ell_N' \delta} + $ a remainder term,    (4.22)

where $\ell_N = [G(\theta)]^{-1}\delta$ is the solution of (4.3). Now, by Theorem 2.3, $\sqrt{N} \{ (1-\theta)\bar a^{(2)} + \theta \bar a^{(1)} - \bar a^{(0)} \}$ is asymptotically normal with null mean vector and a dispersion matrix which, by (2.16) and (4.1), can be written as

    $\theta(1-\theta) \frac{N}{n_0}\, \delta\delta' + G(\theta)$.    (4.23)

By our assumptions, (4.23) is uniformly bounded. Hence the elements of $\sqrt{N} \{ (1-\theta)\bar a^{(2)} + \theta \bar a^{(1)} - \bar a^{(0)} \}$ are stochastically bounded. Therefore, the second relation in (4.9), and (4.19), imply that the remainder term on the right of (4.22) converges in probability to zero.

Again, by our assumptions, the roots of (4.23) are bounded away from both 0 and $\infty$. Hence, as noted at the end of section 3, by Lemma 1.2, $\sqrt{N}\, \ell_N' \{ (1-\theta)\bar a^{(2)} + \theta \bar a^{(1)} - \bar a^{(0)} \}$ is asymptotically normally distributed with mean 0 and variance

    $\ell_N' \Big\{ \theta(1-\theta) \frac{N}{n_0}\, \delta\delta' + G(\theta) \Big\} \ell_N$.

Also, by (4.7) and (4.9), $\hat\ell_N'(\bar a^{(2)} - \bar a^{(1)}) - \ell_N'\delta \overset{P}{\to} 0$. Combining these facts, we get that the first term on the right of (4.22) is asymptotically normal with mean 0 and variance given by (4.21). This completes the proof of the theorem. Q.E.D.
The following reduction shows the structure of the equations (4.5)-(4.6) more clearly. With probability approaching 1, in large samples, $\hat\Sigma^{(1)}$ and $\hat\Sigma^{(2)}$ are both positive definite, and therefore it is possible to find a non-singular matrix C ($p \times p$) such that

    $C' \hat\Sigma^{(1)} C = I_p$,  $C' \hat\Sigma^{(2)} C = \operatorname{diag}(\gamma_1, \ldots, \gamma_p)$.    (4.24)

Then, putting $\ell = Ch$, $h = (h_1, \ldots, h_p)'$, $d = (d_1, \ldots, d_p)' = C'(\bar a^{(2)} - \bar a^{(0)})$, $b = (b_1, \ldots, b_p)' = C'(\bar a^{(2)} - \bar a^{(1)})$, and premultiplying (4.6) by C', from (4.5)-(4.6) we derive the (p+1) equations in s and h:

    $h'd - s\, h'b = 0$,  $C' \hat G(s) C\, h - b = 0$.

By (4.4) and (4.24), these can be explicitly written as

    $\sum_{i=1}^{p} h_i d_i - s \sum_{i=1}^{p} h_i b_i = 0$,    (4.25)

    $h_i \Big[ \frac{N}{n_0} \{ s + (1-s)\gamma_i \} + s^2 \frac{N}{n_1} + (1-s)^2 \frac{N}{n_2} \gamma_i \Big] = b_i$,  $i = 1, \ldots, p$.    (4.26)

Here $d_i$, $b_i$, $\gamma_i$, $i = 1, \ldots, p$, are all known. To solve these, we may substitute for $h_i$ in (4.25), using (4.26), and then solve the resulting polynomial equation in s. Alternatively, we may put arbitrary values of $h_1, \ldots, h_p$ in (4.25) to get a rough estimate of s, use (4.26) to solve for $h_1, \ldots, h_p$, and then proceed by iteration.
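The iterative route above can be sketched directly in terms of $\hat G(s)$, without the diagonalization (the alternating steps (4.12) and (4.5) are equivalent to iterating (4.26) and (4.25)). The simulated Gaussian data, sample sizes, and Wilcoxon scores below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n0, n1, n2 = 2, 400, 300, 300
theta = 0.3
x1 = rng.normal(0, 1, (n1, p))
x2 = rng.normal(1, 1, (n2, p))
mix = rng.random(n0) < theta
x0 = np.where(mix[:, None], rng.normal(0, 1, (n0, p)), rng.normal(1, 1, (n0, p)))

# Pooled ranks -> Wilcoxon scores, score means and covariance estimates (2.30).
pooled = np.vstack([x0, x1, x2])
N = len(pooled)
scores = (pooled.argsort(axis=0).argsort(axis=0) + 1) / (N + 1.0)
s0, s1, s2 = np.split(scores, [n0, n0 + n1])
abar = [g.mean(axis=0) for g in (s0, s1, s2)]
Sig = [g.T @ g / len(g) - np.outer(g.mean(axis=0), g.mean(axis=0)) for g in (s0, s1, s2)]

def G_hat(s):
    """G-hat(s) of (4.4)."""
    return (N / n0) * (s * Sig[1] + (1 - s) * Sig[2]) \
        + s**2 * (N / n1) * Sig[1] + (1 - s)**2 * (N / n2) * Sig[2]

d, b = abar[2] - abar[0], abar[2] - abar[1]
s = 0.5                                    # arbitrary starting value
for _ in range(50):
    ell = np.linalg.solve(G_hat(s), b)     # (4.6), i.e., (4.12)
    s = float(np.clip(ell @ d / (ell @ b), 0.0, 1.0))   # (4.5), clipped to [0,1]
print(s)
```

The clipping to [0,1] implements the convention suggested in the Note after Theorem 4.1.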
As for the sampling variance of the estimate $\hat\theta_N$, in large samples we may use

    $\frac{1}{N} \Big[ \hat\theta_N (1 - \hat\theta_N) \frac{N}{n_0} + [(\bar a^{(2)} - \bar a^{(1)})' \{\hat G(\hat\theta_N)\}^{-1} (\bar a^{(2)} - \bar a^{(1)})]^{-1} \Big]$,    (4.27)

where $\hat G(\hat\theta_N)$ is obtained by putting $\hat\theta_N$ for s in (4.4). As $\hat\theta_N$ is a consistent estimator of $\theta$, (4.27) also is consistent in the sense that the ratio of (4.27) to the true asymptotic variance of $\hat\theta_N$ converges in probability to 1.
5. DISCUSSION

From Theorem 4.2, we get that the asymptotic variance of $\sqrt{N}$ times the optimised linear rank score estimate $\hat\theta_N$ is given by

    $\theta(1-\theta) \frac{N}{n_0} + [\delta' \{G(\theta)\}^{-1} \delta]^{-1}$,    (5.1)

where $\delta = \mu^{(2)} - \mu^{(1)}$ and $G(\theta)$ is given by (4.1). Of this, the first term is the asymptotic variance of $\sqrt{N}\, t$, where $t = r/n_0$ is the standard estimate that we would use if r (the number of 'first-population observations' in the 0-th sample) were observable. The second term represents the inflation in the variance due to the unobservability of r. Clearly, the larger the value of $\delta' \{G(\theta)\}^{-1} \delta$, the more accurate the estimate.
We know that if G ($p \times p$) is positive definite, then for the partitioning

    $G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$,  $\delta = (\delta_1', \delta_2')'$,

where $\delta_1$ is a q-vector and $G_{11}$ is of order $q \times q$, we have

    $\delta' G^{-1} \delta \ge \delta_1' G_{11}^{-1} \delta_1$.

The equality holds if and only if $\delta_2 = G_{21} G_{11}^{-1} \delta_1$. Hence, so long as this special condition is not met and the matrix $G(\theta)$ remains positive definite, consideration of a larger number of variables would increase the accuracy of $\hat\theta_N$.

The choice of the score functions $\varphi_1(u), \ldots, \varphi_p(u)$ is, of course, of great importance. In choosing these, we should use any knowledge that we may possess about the way $F^{(1)}$ and $F^{(2)}$ differ from each other. Again, the general principle would be to choose the scores so that $\delta' \{G(\theta)\}^{-1} \delta$ is as large as possible. We propose to deal with these aspects of the problem in a later communication.
Incidentally, we remark that the application of the procedures developed in this paper requires only knowledge of the observational ranks. However, when the observational values (1.1) are themselves available, we may use these in the same way as rank scores, to get an estimate of $\theta$ based on the sample means, provided (i) all second order moments exist for $F^{(1)}$, $F^{(2)}$ and (ii) $F^{(1)}$, $F^{(2)}$ differ in location (i.e., in mean vector). Under appropriate conditions, the asymptotic variance of this estimate would have the same form as (5.1), with $\mu^{(k)}$, $\Sigma^{(k)}$ now standing for the mean vector and dispersion matrix of $F^{(k)}$, $k = 1, 2$. The procedures considered in this paper encompass a wider variety of problems, since the score functions are free to be chosen to take account of any kind of divergence between $F^{(1)}$ and $F^{(2)}$.

Another point of practical importance is the effect of the relative values of $n_0$, $n_1$, and $n_2$ on the accuracy of the estimate. While specific recommendations here must depend on the choice of score functions, some general observations can be made. Generally, it would be profitable to have a large $n_0$, not only because that reduces the first term in the asymptotic variance (5.1), but also because in (4.1) (at least for $\theta$ near $\tfrac{1}{2}$) the dominant terms are thereby made small, which reduces the second term in (5.1) as well. Similarly, it would be preferable to have $n_1$ large relative to $n_2$, or vice versa, according as $\theta$ is expected to be close to 1 or 0.
Finally, we remark that, although we have followed the 'best linear combination' approach to obtain the estimate of section 4, we could have obtained an estimate by an alternative approach. Thus, starting from the vector $\bar a^{(0)} - s \bar a^{(1)} - (1-s) \bar a^{(2)}$, instead of equating a linear combination of its elements to zero, we could minimize a positive definite quadratic form in its elements to get the estimate. Minimization of

    $\{ \bar a^{(0)} - s \bar a^{(1)} - (1-s) \bar a^{(2)} \}' A \{ \bar a^{(0)} - s \bar a^{(1)} - (1-s) \bar a^{(2)} \}$

(where A is a positive definite matrix) with respect to s gives us an estimate in the form

    $(\bar a^{(2)} - \bar a^{(0)})' A (\bar a^{(2)} - \bar a^{(1)}) \big/ (\bar a^{(2)} - \bar a^{(1)})' A (\bar a^{(2)} - \bar a^{(1)})$.

Optimisation with respect to the choice of A then leads us practically to the same solution as before.
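The closed form of this quadratic-form estimate follows from setting the derivative of the quadratic in s to zero; it can be checked against direct minimization. The score means and the weight matrix A below are illustrative stand-ins, not values from the paper:

```python
import numpy as np

a0 = np.array([0.55, 0.52])              # \bar a^{(0)}
a1 = np.array([0.40, 0.45])              # \bar a^{(1)}
a2 = np.array([0.62, 0.58])              # \bar a^{(2)}
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite weights

# Closed-form minimizer of the quadratic form in s.
s_closed = (a2 - a0) @ A @ (a2 - a1) / ((a2 - a1) @ A @ (a2 - a1))

# Direct minimization of the quadratic form over a fine grid of s values.
grid = np.linspace(-1, 2, 30001)
q = lambda s: (v := a0 - s * a1 - (1 - s) * a2) @ A @ v
s_grid = grid[np.argmin([q(s) for s in grid])]
print(float(s_closed), float(s_grid))
```

With $A = \{\hat G(s)\}^{-1}$ evaluated at the solution, this route recovers essentially the optimised estimate of section 4, which is what the closing remark asserts.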
ACKNOWLEDGMENT

The author wishes to thank Professor P. K. Sen for helpful discussions.
REFERENCES

[1] Hajek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives. Ann. Math. Statist. 39, 325-346.

[2] Hajek, J. and Sidak, Z. (1967). Theory of Rank Tests. Academic Press, New York and London; Academia Publishing House, Prague.

[3] Puri, M. L. and Sen, P. K. (1969). A class of rank order tests for a general linear hypothesis. Ann. Math. Statist. 40, 1325-1343.

[4] Ranga Rao, R. (1962). Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist. 33, 659-680.