RANK PROCEDURES FOR SOME TWO-POPULATIONS
MULTIVARIATE EXTENDED CLASSIFICATION PROBLEMS
By
Shoutir Kishore Chatterjee
University of North Carolina at Chapel Hill, N. C.
and
University of Calcutta, India
Institute of Statistics Mimeo Series No. 851
NOVEMBER 1972
*Work carried out at the Department of Biostatistics, University of North
Carolina under the Multivariate Nonparametric Project sponsored by the
Aerospace Research Laboratories, Air Force Systems Command, U. S. Air
Force Contract No. F33615-71-C-1927. Reproduction in whole or in part
is permitted for any purpose of the U. S. Government
Multivariate Extended Classification
Mailing address:
Dr. S. K. Chatterjee
Up to Dec. 31, 1972:
Department of Biostatistics
University of North Carolina
Chapel Hill, N. C. 27514, USA
From Jan. 1, 1973:
Department of Statistics
Calcutta University
35, Ballygunge Circular Road
Calcutta - 19. INDIA
Abstract
Given independent samples from three multivariate populations with cumulative distribution functions $F^{(1)}(\mathbf{x})$, $F^{(2)}(\mathbf{x})$, and $F^{(0)}(\mathbf{x}) = \theta F^{(1)}(\mathbf{x}) + (1-\theta)F^{(2)}(\mathbf{x})$, where $0 \le \theta \le 1$ is unknown, the three-action problem of deciding whether the value of $\theta$ is high, low, or intermediate is considered. A class of consistent procedures based on the relative spacing of three sample averages of linearly compounded rank scores is formulated. The asymptotic operating characteristics of the procedures when $F^{(1)}$ and $F^{(2)}$ come close together are studied, and the best choice of the compounding coefficients in terms of these is considered. The consequence of using estimates of the best coefficients on the asymptotic operating characteristics is also examined.
AMS 1970 subject classifications: Primary 62H30; Secondary 62G99.
Key words and phrases: Multivariate two-population mixture, three-decision problem, rank scores, linear compound, consistency, asymptotic local operating characteristic, best choice of coefficients, procedure with estimated coefficients.
1. INTRODUCTION
Suppose there are three p-variate populations with continuous cumulative distribution functions (cdf) $F^{(0)}(\mathbf{x})$, $F^{(1)}(\mathbf{x})$, and $F^{(2)}(\mathbf{x})$, $\mathbf{x} = (x_1, \ldots, x_p)'$. Of these, $F^{(1)}$ and $F^{(2)}$ are known to be distinct and $F^{(0)}$ is known to be an unknown mixture of $F^{(1)}$ and $F^{(2)}$. That is,
$$F^{(0)}(\mathbf{x}) = \theta F^{(1)}(\mathbf{x}) + (1-\theta) F^{(2)}(\mathbf{x}) \tag{1.1}$$
for some unknown $\theta$ $(0 \le \theta \le 1)$. There are three possible decisions or actions $d_0, d_1, d_2$: $d_1$ is preferred when the value of $\theta$ is high, $d_2$ is preferred when the value of $\theta$ is low, and $d_0$ is preferred when the value of $\theta$ is intermediate. Independent random samples of sizes $n_0, n_1, n_2$ are available from $F^{(0)}$, $F^{(1)}$, and $F^{(2)}$ respectively. Let these be
$$\mathbf{X}^{(k)}_{\alpha} = (X^{(k)}_{1\alpha}, \ldots, X^{(k)}_{p\alpha})', \quad \alpha = 1, \ldots, n_k, \quad k = 0, 1, 2. \tag{1.2}$$
Let the $N = n_0 + n_1 + n_2$ observations corresponding to the i-th variable be ranked together and let the rank obtained by $X^{(k)}_{i\alpha}$ be $R^{(k)}_{i\alpha}$. Thus the samples (1.2) give rise to the rank vectors
$$\mathbf{R}^{(k)}_{\alpha} = (R^{(k)}_{1\alpha}, \ldots, R^{(k)}_{p\alpha})', \quad \alpha = 1, \ldots, n_k, \quad k = 0, 1, 2. \tag{1.3}$$
In this paper we shall formulate and study certain decision rules for choosing one of $d_0$, $d_1$, and $d_2$ on the basis of the rank vectors (1.3). The above decision problem can obviously be considered as an extended version of the nonparametric classification problem. (For the standard classification problem $\theta$ can have only 1 or 0 as its possible values.) In many practical situations it would be realistic to assume that the 0-th sample contains observations from the two basic populations in unknown numbers, and our interest would be to determine which, if any, of the two populations is preponderantly represented. There, a formulation such as the above would be natural.
Further, we shall make the preference pattern more specific as follows. We shall suppose that there are two pairs of numbers $(L_1, U_1)$, $(L_2, U_2)$, $0 \le L_2 < U_2 < L_1 < U_1 \le 1$, such that (i) $d_1$ is preferred to both $d_0$, $d_2$ for $\theta \ge U_1$ and only to $d_2$ for $L_1 < \theta < U_1$, (ii) $d_2$ is preferred to both $d_0$, $d_1$ for $\theta \le L_2$ and only to $d_1$ for $L_2 < \theta < U_2$, (iii) $d_0$ is preferred to both $d_1$, $d_2$ for $U_2 \le \theta \le L_1$, only to $d_1$ for $\theta < U_2$, and only to $d_2$ for $\theta > L_1$. There is no preference between $d_1$ and $d_0$ for $L_1 < \theta < U_1$, and similarly no preference between $d_2$ and $d_0$ for $L_2 < \theta < U_2$. With such specification the above problem can be considered as a monotone three-decision problem (see, for instance, Ferguson [3], chapter 6).
The results of the present paper are closely related to those of Chatterjee [1], where rank methods for estimation of $\theta$ under the above set-up are considered, and we shall have many occasions to refer to [1].
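2. A CLASS OF DECISION RULES
We start with a compounding vector $\boldsymbol{\ell}\,(p \times 1) \ne \mathbf{0}$, two numbers $\theta_1, \theta_2$ $(0 < \theta_2 < \theta_1 < 1)$, and a score matrix
$$\mathbf{A}\,(p \times N) = (a_i(\alpha))_{i=1,\ldots,p;\ \alpha=1,\ldots,N}. \tag{2.1}$$
With the help of the score matrix we derive from the rank vectors the (random) rank score vectors
$$\mathbf{a}^{(k)}_{\alpha} = (a_1(R^{(k)}_{1\alpha}), \ldots, a_p(R^{(k)}_{p\alpha}))' = (a^{(k)}_{1\alpha}, \ldots, a^{(k)}_{p\alpha})' \ \text{(say)}, \quad \alpha = 1, \ldots, n_k;\ k = 0, 1, 2, \tag{2.2}$$
and hence their means for the three samples
$$\bar{\mathbf{a}}^{(k)} = (\bar a^{(k)}_1, \ldots, \bar a^{(k)}_p)' = \frac{1}{n_k} \sum_{\alpha=1}^{n_k} \mathbf{a}^{(k)}_{\alpha}, \quad k = 0, 1, 2. \tag{2.3}$$
Whenever $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} \ne \boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$, we write
$$T(\boldsymbol{\ell}) = \frac{\boldsymbol{\ell}'(\bar{\mathbf{a}}^{(0)} - \bar{\mathbf{a}}^{(2)})}{\boldsymbol{\ell}'(\bar{\mathbf{a}}^{(1)} - \bar{\mathbf{a}}^{(2)})}. \tag{2.4}$$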
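The decision rule is formulated as follows.
(a) When $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} \ne \boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$,
    (i) choose $d_1$ if $T(\boldsymbol{\ell}) \ge \theta_1$
    (ii) choose $d_2$ if $T(\boldsymbol{\ell}) \le \theta_2$
    (iii) choose $d_0$ if $\theta_2 < T(\boldsymbol{\ell}) < \theta_1$
(b) When $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} = \boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$, choose one of $d_1$, $d_2$, $d_0$ at random. (2.5)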
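An examination of (2.5) shows that what is really suggested here is: choose $d_1$ when $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)}$ is close to $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)}$ and away from $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$; choose $d_2$ when $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)}$ is close to $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$ and away from $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)}$; choose $d_0$ otherwise. In fact, for any real number $q$ if we write
$$[q]_{0,1} = \begin{cases} q & \text{when } 0 \le q \le 1 \\ 0 & \text{when } q < 0 \\ 1 & \text{when } q > 1, \end{cases} \tag{2.6}$$
then $\hat\theta(\boldsymbol{\ell}) = [T(\boldsymbol{\ell})]_{0,1}$ can be considered as a reasonable estimate of $\theta$ (see [1]). Since $0 < \theta_2 < \theta_1 < 1$, the above decision rule can be described equivalently by replacing $T(\boldsymbol{\ell})$ by $\hat\theta(\boldsymbol{\ell})$ in (2.5), and this explains the motivation directly.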
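As an illustration of (2.2)-(2.5), the following is a minimal sketch and not part of the original development. It assumes no ties within a coordinate and uses the Wilcoxon-type scores $a_i(\alpha) = \alpha/(N+1)$ for every coordinate; the function names `coordinate_ranks`, `mean_scores`, and `decide` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def coordinate_ranks(samples: Sequence[Sequence[Vector]]) -> List[List[List[int]]]:
    """Rank the N pooled observations coordinate by coordinate (ranks 1..N).

    Assumes no ties within a coordinate (continuous cdf's).
    Returns the rank vectors split back into the original samples.
    """
    pooled = [obs for sample in samples for obs in sample]
    n_total, p = len(pooled), len(pooled[0])
    ranks = [[0] * p for _ in range(n_total)]
    for i in range(p):
        order = sorted(range(n_total), key=lambda idx: pooled[idx][i])
        for rank, idx in enumerate(order, start=1):
            ranks[idx][i] = rank
    split, start = [], 0
    for sample in samples:
        split.append(ranks[start:start + len(sample)])
        start += len(sample)
    return split


def mean_scores(rank_vectors: List[List[int]], n_total: int) -> List[float]:
    """Sample mean of the rank score vectors, with assumed scores alpha/(N+1)."""
    n, p = len(rank_vectors), len(rank_vectors[0])
    return [sum(r[i] / (n_total + 1) for r in rank_vectors) / n for i in range(p)]


def decide(samples: Sequence[Sequence[Vector]], ell: Vector,
           theta1: float, theta2: float) -> Tuple[str, Optional[float]]:
    """Apply rule (2.5); `samples` is ordered (sample 0, sample 1, sample 2)."""
    n_total = sum(len(s) for s in samples)
    a0, a1, a2 = (mean_scores(r, n_total) for r in coordinate_ranks(samples))

    def dot(u: Vector, v: Vector) -> float:
        return sum(x * y for x, y in zip(u, v))

    denom = dot(ell, a1) - dot(ell, a2)
    if denom == 0:
        return "random", None          # case (b): randomize among d1, d2, d0
    t = (dot(ell, a0) - dot(ell, a2)) / denom   # T(l) of (2.4)
    if t >= theta1:
        return "d1", t
    if t <= theta2:
        return "d2", t
    return "d0", t
```

For instance, with $p = 1$, a sample from $F^{(1)}$ at 10, 11, 12, a sample from $F^{(2)}$ at 1, 2, 3, and a 0-th sample interleaved with the first, the statistic exceeds $\theta_1$ and the sketch returns $d_1$.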
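For future use we note that the above decision rule can be reformulated as follows.
(a) When $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} \ne \boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$,
    (i) choose $d_1$ if $(\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)} - \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)})(\theta_1 \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} + (1-\theta_1)\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)} - \boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)}) \ge 0$
    (ii) choose $d_2$ if $(\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)} - \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)})(\boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)} - \theta_2 \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} - (1-\theta_2)\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}) \ge 0$
    (iii) choose $d_0$ if $(\theta_1 \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} + (1-\theta_1)\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)} - \boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)}) \times (\boldsymbol{\ell}'\bar{\mathbf{a}}^{(0)} - \theta_2 \boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} - (1-\theta_2)\boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}) > 0$
(b) When $\boldsymbol{\ell}'\bar{\mathbf{a}}^{(1)} = \boldsymbol{\ell}'\bar{\mathbf{a}}^{(2)}$, choose one of $d_1$, $d_2$, $d_0$ at random. (2.7)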
In case (a), writing
and noting that
it may be verified that the rules (2.5) and (2.7) are equivalent.
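The equivalence of (2.5) and (2.7) can also be checked numerically. The sketch below is illustrative only: it works with the scalars $s_k = \boldsymbol{\ell}'\bar{\mathbf{a}}^{(k)}$, uses exact rational arithmetic to avoid boundary rounding, and assumes that part (i) of (2.7) is the product condition analogous to parts (ii) and (iii). The names `rule_25`, `rule_27`, and `rules_agree` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rule_25(s0, s1, s2, th1, th2):
    """Rule (2.5) via the ratio T = (s0 - s2) / (s1 - s2); requires s1 != s2."""
    t = (s0 - s2) / (s1 - s2)
    if t >= th1:
        return "d1"
    if t <= th2:
        return "d2"
    return "d0"


def rule_27(s0, s1, s2, th1, th2):
    """Rule (2.7) via signs of products of linear forms; requires s1 != s2."""
    z1 = s2 - s1
    z2 = th1 * s1 + (1 - th1) * s2 - s0
    z3 = s0 - th2 * s1 - (1 - th2) * s2
    if z1 * z2 >= 0:
        return "d1"
    if z1 * z3 >= 0:
        return "d2"
    # Here z2 and z3 necessarily share a strict sign, so z2 * z3 > 0.
    return "d0"


def rules_agree(th1=Fraction(3, 4), th2=Fraction(1, 4), grid=8):
    """Compare both rules on a grid of rational values of (s0, s1, s2)."""
    values = [Fraction(j, grid) for j in range(grid + 1)]
    return all(
        rule_25(s0, s1, s2, th1, th2) == rule_27(s0, s1, s2, th1, th2)
        for s0, s1, s2 in product(values, repeat=3)
        if s1 != s2
    )
```

The check relies on the identity $z_2 + z_3 = -(\theta_1 - \theta_2) z_1$, which rules out the first two product conditions holding simultaneously when $z_1 \ne 0$.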
In practice, before using a procedure such as the above, one will have to specify, apart from the score matrix $\mathbf{A}$, the numbers $\theta_1$, $\theta_2$ and the vector $\boldsymbol{\ell}$. As regards $\theta_1$, $\theta_2$, it is obvious that by increasing the value of $\theta_1$ and decreasing the value of $\theta_2$, one can increase the probability of choosing $d_0$ at the cost respectively of the probabilities of choosing $d_1$ and $d_2$. In the next section we shall see how $\theta_1$, $\theta_2$ should be chosen in relation to the preference scale so as to achieve consistency. The best choice of $\boldsymbol{\ell}$ for the above rule is not immediately apparent. In [1], for the problem of estimation of $\theta$ under the above set-up, the best choice of $\boldsymbol{\ell}$ in the estimate $\hat\theta(\boldsymbol{\ell})$ defined above was considered from the point of view of minimisation of asymptotic variance when $0 < \theta < 1$. This led to the simultaneous determination of $\boldsymbol{\ell}$ and the estimate of $\theta$ by solving $(p+1)$ (nonlinear) equations. In the present context, we shall take up this problem after we formulate and study the local operating characteristics for procedures of the above form.
Before concluding this section we note here an interpretation of the above decision rule. Under (1.1) the process of taking a sample of size $n_0$ from $F^{(0)}$ could be interpreted as that of performing $n_0$ Bernoulli trials with success probability $\theta$ and taking $r$ and $n_0 - r$ observations from $F^{(1)}$ and $F^{(2)}$ respectively if the trials show just $r$ successes. If $r$ were observable it would be a sufficient statistic for $\theta$ and it is known that here the class of monotone procedures based on $r$ is essentially complete (see e.g., [3], Chapter 6). The above procedure can be looked upon as a (behavioral) randomized procedure based on $r$, which we have to adopt due to the unobservability of $r$.
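The Bernoulli-trial interpretation of sampling from the mixture (1.1) can be sketched as follows. This is an illustration only: the function name `draw_mixture_sample` and the two sampler arguments are hypothetical, and in the actual problem the count $r$ returned here would not be observed.

```python
import random
from typing import Callable, List, Tuple


def draw_mixture_sample(
    n0: int,
    theta: float,
    draw_f1: Callable[[random.Random], float],
    draw_f2: Callable[[random.Random], float],
    rng: random.Random,
) -> Tuple[List[float], int]:
    """Draw n0 observations from theta*F1 + (1-theta)*F2.

    Each observation is preceded by a Bernoulli trial with success
    probability theta: on success it is drawn from F1, otherwise from F2.
    Returns the sample together with the (normally unobservable) count r.
    """
    sample: List[float] = []
    r = 0
    for _ in range(n0):
        if rng.random() < theta:
            sample.append(draw_f1(rng))
            r += 1
        else:
            sample.append(draw_f2(rng))
    return sample, r
```

A call such as `draw_mixture_sample(50, 0.3, lambda g: g.gauss(2, 1), lambda g: g.gauss(0, 1), random.Random(0))` would produce a univariate sample of size 50 in which the number of draws from the first component is binomial with parameters 50 and 0.3.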
3. CONSISTENCY OF THE DECISION RULES
For the problem of section 1, consider a sequence of triplets of sample sizes $(n_{0\nu}, n_{1\nu}, n_{2\nu})$, $\nu = 1, 2, \ldots$, $n_{0\nu} + n_{1\nu} + n_{2\nu} = N_\nu$, $N_\nu \to \infty$ as $\nu \to \infty$, and a corresponding sequence of decision rules of the form (2.4)-(2.5), based on fixed $\theta_1$, $\theta_2$, and $\boldsymbol{\ell}$ and a sequence of score matrices $\mathbf{A}_\nu\,(p \times N_\nu)$. As in [1], we make the following assumptions.
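ASSUMPTION 3.1 There is a number $\lambda^*$ $(0 < \lambda^* < \tfrac{1}{3})$ such that
$$\frac{n_{k\nu}}{N_\nu} \ge \lambda^*, \quad k = 0, 1, 2, \ \text{for all } \nu. \tag{3.1}$$
ASSUMPTION 3.2 $\mathbf{A}_\nu = (a_{\nu i}(\alpha))_{i=1,\ldots,p;\ \alpha=1,\ldots,N_\nu}$ is given by
$$a_{\nu i}(\alpha) = \text{either } \varphi_i\!\left(\frac{\alpha}{N_\nu + 1}\right) \text{ or } E\,\varphi_i\big(U^{(\alpha)}_{N_\nu}\big), \quad \alpha = 1, \ldots, N_\nu,\ i = 1, \ldots, p,$$
where $U^{(1)}_{N_\nu} < \cdots < U^{(N_\nu)}_{N_\nu}$ are the order statistics of a sample of size $N_\nu$ from the uniform distribution over (0,1), and, for each $i$, inside (0,1) $\varphi_i(u)$ is expressible as the difference of two square integrable absolutely continuous functions.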
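Let
$$H_\nu(\mathbf{x}) = \frac{1}{N_\nu} \sum_{k=0}^{2} n_{k\nu} F^{(k)}(\mathbf{x}), \tag{3.2}$$
and further, let the i-th coordinate marginal cdf's of $F^{(k)}(\mathbf{x})$, $H_\nu(\mathbf{x})$ be $F^{(k)}_{[i]}(x)$, $H_{\nu[i]}(x)$. Write
$$\mu^{(k)}_{\nu i} = \int_{-\infty}^{\infty} \varphi_i(H_{\nu[i]}(x))\, dF^{(k)}_{[i]}(x), \quad i = 1, \ldots, p, \qquad \boldsymbol{\mu}^{(k)}_\nu = (\mu^{(k)}_{\nu 1}, \ldots, \mu^{(k)}_{\nu p})'. \tag{3.3}$$
ASSUMPTION 3.3 Suppose $\boldsymbol{\ell}$ is so chosen that the following holds:
$$\liminf_{\nu \to \infty} \big|\boldsymbol{\ell}'(\boldsymbol{\mu}^{(1)}_\nu - \boldsymbol{\mu}^{(2)}_\nu)\big| > 0.$$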
If we have some knowledge about how $F^{(1)}$ and $F^{(2)}$ differ, then by suitably choosing the score functions $\varphi_i(u)$, $i = 1, \ldots, p$, and $\boldsymbol{\ell}$, it would be possible to ensure that Assumption 3.3 holds (see [1], section 3).
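Let $\bar{\mathbf{a}}^{(k)}_\nu$ and $T_\nu(\boldsymbol{\ell})$ be defined as in (2.3) and (2.4) corresponding to $(n_{0\nu}, n_{1\nu}, n_{2\nu})$ and $\mathbf{A}_\nu$. From Theorem 2.1 of [1], under Assumptions 3.1 and 3.2, as $\nu \to \infty$,
$$\bar{\mathbf{a}}^{(k)}_\nu - \boldsymbol{\mu}^{(k)}_\nu \xrightarrow{P} \mathbf{0}, \quad k = 0, 1, 2. \tag{3.4}$$
Hence, by Assumption 3.3,
(3.5)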
From (2.5) and (3.5), it follows that, to study the limiting probabilities of choosing $d_1$, $d_2$, and $d_0$ for the sequence of decision rules, we have only to study the limits of the probabilities of the three events (i) $T_\nu(\boldsymbol{\ell}) \ge \theta_1$, (ii) $T_\nu(\boldsymbol{\ell}) \le \theta_2$, (iii) $\theta_2 < T_\nu(\boldsymbol{\ell}) < \theta_1$ respectively. As shown in [1], under the assumptions made, $T_\nu(\boldsymbol{\ell}) \xrightarrow{P} \theta$. Hence, we conclude that the probabilities of choosing $d_1$, $d_2$, and $d_0$ tend to 1 for $\theta_1 < \theta \le 1$, $0 \le \theta < \theta_2$, and $\theta_2 < \theta < \theta_1$ respectively. Further, at $\theta = \theta_1$ the probability of choosing either $d_1$ or $d_0$, and at $\theta = \theta_2$ the probability of choosing either $d_2$ or $d_0$, tends to 1.
Thus if
$$L_1 < \theta_1 < U_1 \quad \text{and} \quad L_2 < \theta_2 < U_2, \tag{3.6}$$
where $0 \le L_2 < U_2 < L_1 < U_1 \le 1$ are as defined in section 1, then the sequence of decision rules is consistent in the sense that for each $\theta$ the probability of choosing a preferred decision tends to 1.
The conclusions above remain true if $\boldsymbol{\ell}$ is replaced by $\boldsymbol{\ell}_\nu$, where $\boldsymbol{\ell}_\nu$ is a sequence of non-null vectors varying with $\nu$, provided (i) the elements of $\boldsymbol{\ell}_\nu$ are uniformly bounded, and (ii) Assumption 3.3 holds when $\boldsymbol{\ell}_\nu$ replaces $\boldsymbol{\ell}$. (See the remark at the end of section 3 in [1].) Further, even when $\theta_1$, $\theta_2$ are replaced by $\theta_{1\nu}$, $\theta_{2\nu}$, the sequence of decision rules is consistent in the above sense provided
(3.7)
whatever $\nu$.
4. ASYMPTOTIC LOCAL OPERATING CHARACTERISTICS
We first consider some general results about sequences of random vectors. Let $\boldsymbol{\mu}_\nu$ be a sequence of p-vectors, $\boldsymbol{\Sigma}_\nu\,(p \times p)$ be a sequence of positive definite matrices, and $\mathbf{X}_\nu$ be a sequence of random p-vectors asymptotically distributed as $N(\boldsymbol{\mu}_\nu, \boldsymbol{\Sigma}_\nu)$ in the sense that, for every p-vector $\mathbf{t} \ne \mathbf{0}$, $\mathbf{t}'\mathbf{X}_\nu$ is asymptotically distributed as $N(\mathbf{t}'\boldsymbol{\mu}_\nu, \mathbf{t}'\boldsymbol{\Sigma}_\nu \mathbf{t})$. We denote the minimum and maximum characteristic roots of any positive definite matrix $\mathbf{B}$ by $m(\mathbf{B})$ and $M(\mathbf{B})$ respectively. The following lemma is implied by lemma 1.2 in [1].
LEMMA 4.1 If
$$\liminf_{\nu \to \infty} \frac{m(\boldsymbol{\Sigma}_\nu)}{M(\boldsymbol{\Sigma}_\nu)} > 0 \tag{4.1}$$
then for any sequence of non-null vectors $\mathbf{t}_\nu$, $\mathbf{t}_\nu'\mathbf{X}_\nu$ is asymptotically distributed as $N(\mathbf{t}_\nu'\boldsymbol{\mu}_\nu, \mathbf{t}_\nu'\boldsymbol{\Sigma}_\nu \mathbf{t}_\nu)$.
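LEMMA 4.2 If $p = p_1 + p_2$,
$$\mathbf{X}_\nu = \begin{pmatrix} \mathbf{X}_{\nu 1}\,(p_1 \times 1) \\ \mathbf{X}_{\nu 2}\,(p_2 \times 1) \end{pmatrix}, \quad \boldsymbol{\mu}_\nu = \begin{pmatrix} \boldsymbol{\mu}_{\nu 1} \\ \boldsymbol{\mu}_{\nu 2} \end{pmatrix}, \quad \boldsymbol{\Sigma}_\nu = \begin{pmatrix} \boldsymbol{\Sigma}_{\nu 11}\,(p_1 \times p_1) & \boldsymbol{\Sigma}_{\nu 12}\,(p_1 \times p_2) \\ \boldsymbol{\Sigma}_{\nu 21}\,(p_2 \times p_1) & \boldsymbol{\Sigma}_{\nu 22}\,(p_2 \times p_2) \end{pmatrix},$$
then for any two sequences of non-null vectors $\mathbf{t}_{\nu 1}\,(p_1 \times 1)$, $\mathbf{t}_{\nu 2}\,(p_2 \times 1)$, under condition (4.1), $(\mathbf{t}_{\nu 1}'\mathbf{X}_{\nu 1}, \mathbf{t}_{\nu 2}'\mathbf{X}_{\nu 2})$ has asymptotically a bivariate normal distribution with means $\mathbf{t}_{\nu 1}'\boldsymbol{\mu}_{\nu 1}$, $\mathbf{t}_{\nu 2}'\boldsymbol{\mu}_{\nu 2}$ and dispersion matrix
$$\begin{pmatrix} \mathbf{t}_{\nu 1}'\boldsymbol{\Sigma}_{\nu 11}\mathbf{t}_{\nu 1} & \mathbf{t}_{\nu 1}'\boldsymbol{\Sigma}_{\nu 12}\mathbf{t}_{\nu 2} \\ \mathbf{t}_{\nu 2}'\boldsymbol{\Sigma}_{\nu 21}\mathbf{t}_{\nu 1} & \mathbf{t}_{\nu 2}'\boldsymbol{\Sigma}_{\nu 22}\mathbf{t}_{\nu 2} \end{pmatrix}.$$
Proof: Follows from Lemma 4.1.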
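Let
$$\psi(\mathbf{x}) = \sum_{r_1 + \cdots + r_p \le g} c_{r_1 r_2 \cdots r_p}\, x_1^{r_1} x_2^{r_2} \cdots x_p^{r_p}$$
be a g-th degree polynomial in the elements of $\mathbf{x} = (x_1, \ldots, x_p)'$ and let $\Psi_g$ denote the class of all such g-th degree polynomials. Let $\mathbf{X}_\nu$ be as above asymptotically distributed as $N(\boldsymbol{\mu}_\nu, \boldsymbol{\Sigma}_\nu)$. Further, for each $\nu$, let $\boldsymbol{\xi}_\nu$ be exactly distributed as $N(\boldsymbol{\mu}_\nu, \boldsymbol{\Sigma}_\nu)$.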
LEMMA 4.3
P
Under condition (4.1),
~ \)-+00,
•~.
Proof:
If 9 (pXp) is a matrix such that gv ~V C~
v
tributed as N(2,!).
Writing
~(pxl)
tion N(2,!) and denoting 9v(~-~v)
= I,
then 9v(~v-~v) is dis-
for a random vector following the distribu-
= ~v'
we have to show
(4.2)
as~.
Now,as (4.1) holds,by Lemma 4.1, Y is asymptotically distributed as
v
N(~,!).
From a result of Ranga Rao ([11] Theorem 4.1) we get that if the random
vector (Z l' .•• 'Z
V
W
(~l' .•• '~
) converges in law to a random vector
~l'
cdf of every linear combination of
•.•
'~m
) and i f the
m
is continuous, then for any numbers
Here ljJ(Y ) can be considered as a linear combination of the power products
-V
of the elements of Y •
-V
Since
~v
converges in law to
n,
the set of power products
of the elements of Y converges in law to the corresponding set of power products
-V
of the elements of
n.
Since
condition obviously holds.
n is
-
distributed as
Hence, (4.2) follows.
N(~,!),
the required continuity
Q.E.D.
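Now consider, as in section 3, a sequence of triplets of sample sizes $(n_{0\nu}, n_{1\nu}, n_{2\nu})$, $n_{0\nu} + n_{1\nu} + n_{2\nu} = N_\nu \to \infty$, and a sequence of score matrices $\mathbf{A}_\nu\,(p \times N_\nu)$ subject to Assumptions 3.1 and 3.2. Here, we suppose there is a corresponding sequence of triplets of p-variate cdf's $(F^{(0)}_\nu, F^{(1)}_\nu, F^{(2)}_\nu)$ such that
$$F^{(0)}_\nu(\mathbf{x}) = \theta F^{(1)}_\nu(\mathbf{x}) + (1-\theta) F^{(2)}_\nu(\mathbf{x}) \tag{4.3}$$
for some fixed $\theta$, for all $\nu$.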
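Let independent samples $\mathbf{X}^{(k)}_{\nu\alpha}$, $\alpha = 1, \ldots, n_{k\nu}$, $k = 0, 1, 2$, be taken from $F^{(k)}_\nu$, $k = 0, 1, 2$, and on the basis of these and $\mathbf{A}_\nu$, let $\mathbf{a}^{(k)}_{\nu\alpha}$ and $\bar{\mathbf{a}}^{(k)}_\nu$ be defined as in (2.2) and (2.3). We consider the decision rule (2.5) (where $\boldsymbol{\ell}$, $\theta_1$, $\theta_2$ are given) based on $\bar{\mathbf{a}}^{(k)}_\nu$, $k = 0, 1, 2$, and denote the corresponding probability of choosing $d_k$ given $\theta$ by $L_{k\nu}(\theta)$, $k = 0, 1, 2$, $\sum_{k=0}^{2} L_{k\nu}(\theta) = 1$. As seen in section 3, if $F^{(0)}_\nu$, $F^{(1)}_\nu$, $F^{(2)}_\nu$ remain fixed, then as $\nu \to \infty$, $L_{0\nu}(\theta)$, $L_{1\nu}(\theta)$, and $L_{2\nu}(\theta)$ tend to 1 for $\theta_2 < \theta < \theta_1$, $\theta_1 < \theta$, and $\theta < \theta_2$ respectively. Therefore, to keep $L_{k\nu}(\theta)$, $k = 0, 1, 2$, informative, as $\nu \to \infty$ we impose the condition
$$\sup_{\mathbf{x}} \big|F^{(1)}_\nu(\mathbf{x}) - F^{(2)}_\nu(\mathbf{x})\big| \to 0. \tag{4.4}$$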
We shall show that under some further assumptions, it will be possible to find certain mathematically tractable, meaningful functions $L^*_{k\nu}(\theta)$, such that for every $\theta$, as $\nu \to \infty$, $|L_{k\nu}(\theta) - L^*_{k\nu}(\theta)| \to 0$, $k = 0, 1, 2$. We shall call $L^*_{k\nu}(\theta)$, $k = 0, 1, 2$, the asymptotic local operating characteristic (ALOC) functions for the sequence of decision rules.
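Let us write
$$\begin{aligned} \mathbf{Z}_{\nu 1} &= \sqrt{N_\nu}\,(\bar{\mathbf{a}}^{(2)}_\nu - \bar{\mathbf{a}}^{(1)}_\nu), \\ \mathbf{Z}_{\nu 2} &= \sqrt{N_\nu}\,\{\theta_1 \bar{\mathbf{a}}^{(1)}_\nu + (1-\theta_1)\bar{\mathbf{a}}^{(2)}_\nu - \bar{\mathbf{a}}^{(0)}_\nu\}, \\ \mathbf{Z}_{\nu 3} &= \sqrt{N_\nu}\,\{\bar{\mathbf{a}}^{(0)}_\nu - \theta_2 \bar{\mathbf{a}}^{(1)}_\nu - (1-\theta_2)\bar{\mathbf{a}}^{(2)}_\nu\}. \end{aligned} \tag{4.5}$$
Clearly, $\mathbf{Z}_{\nu 1}$, $\mathbf{Z}_{\nu 2}$, $\mathbf{Z}_{\nu 3}$ are related by
$$(\theta_1 - \theta_2)\mathbf{Z}_{\nu 1} + \mathbf{Z}_{\nu 2} + \mathbf{Z}_{\nu 3} = \mathbf{0}. \tag{4.6}$$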
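From (2.7) and (4.5), our sequence of rules is given by
(a) When $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1} \ne 0$,
    (i) choose $d_1$ if $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 1})(\boldsymbol{\ell}'\mathbf{Z}_{\nu 2}) \ge 0$
    (ii) choose $d_2$ if $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 1})(\boldsymbol{\ell}'\mathbf{Z}_{\nu 3}) \ge 0$
    (iii) choose $d_0$ if $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 2})(\boldsymbol{\ell}'\mathbf{Z}_{\nu 3}) > 0$
(b) When $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1} = 0$, choose one of $d_1$, $d_2$, $d_0$ at random. (4.7)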
Thus the asymptotic behaviour of $L_{k\nu}(\theta)$, $k = 0, 1, 2$, will be known if we can find the asymptotic form of the three pairwise marginal distributions of $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1}$, $\boldsymbol{\ell}'\mathbf{Z}_{\nu 2}$, $\boldsymbol{\ell}'\mathbf{Z}_{\nu 3}$. For this, we shall make use of the following result, which is a straightforward multivariate extension of Hajek's [4] theorem 2.4. (The result is essentially contained in Puri and Sen [10]. In proving it one should first consider the case of score functions with bounded second derivatives and then use the polynomial approximation technique of Hajek [4].)
, ••• ,Y
)' a=l, ••. ,N~" be independently
-va = (Y"l
v a
vpa
v
. For each v=1,2, ... , let Y
distributed random p-vectors, Y
-va
vector obtained from Y
-va
having a continuous cdf F
(y).
va -
Let the rank
by replacing the variate values by their coordinate-wise
ranks be
For each v, let
~v
= (aVi(a»
be a score matrix subject to Assumption 3.2, and
let
-va = (av 1(1v 1a ), ••• ,avp (I v pa
a
Define
a=l, ••• ,N .
N
s(r)
-v
=
V
L
a=l
c (r). a
va
-va
, r=l, ••• ,h,
s' = (s(l)', ... ,s(h)')
-v
where for each
»,
v,
-v
-v
v
is a matrix such that in at least one row all the elements are not equal.
Write
B
-v
= (b)
v.rs r,s=l, ••• ,h
•
_ such that
Suppose there is a sequence of p-variate cdf's FV (y)
Max
l<a.<N
Sup IFva.(~)-Fv(~)
--v
y
I~
(4.8)
O.
Write
1
CPi =
f
0
00
=
q>.(u)du, A
v.ij
1
~v = (AV.ij)i,j=l, .•• ,p
(4.9)
where FV[i]' and FV[i,j] denote the marginal cdf's of F corresponding to the
v
i-th coordinate and i-th and j-th coordinates.
THEOREM 4.1
If
(i) b
v.rr
,r=l, •.• ,h are uniformly bounded for all v
(ii) lim inf
~
m(B ) > 0
-v
m(B )
-v
(iii) lim inf
~
(iv) lim inf
~
> 0
N max ( c (r) -c-(r»2
v r,a. va. v
m(A) > O.
-v
(4.10)
then
~,,-E
-- v
-v is asymptotically
- distributed as N(O,
- B
-v 0 A
-v ) (0 stands for
S
Kronecker product).
Note that condition (i) can always be realized by suitably normalizing
Because of (i), in the statement of conditions (ii) and
the elements of S .
-v
(iii) we can equivalently replace m(~v) by I~vl.
Similarly, as by Assumption
3.2, the elements of ~v are all bounded, in (iv) we can replace m(~v) by I~vl.
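Now, going back to the problem of finding the asymptotic behaviour of the pairwise marginals of $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1}$, $\boldsymbol{\ell}'\mathbf{Z}_{\nu 2}$ and $\boldsymbol{\ell}'\mathbf{Z}_{\nu 3}$, let us set up
(4.11)
From (4.3) and (4.4) we then have, as $\nu \to \infty$,
$$\max_{k=0,1,2}\ \sup_{\mathbf{x}} \big|F^{(k)}_\nu(\mathbf{x}) - F_\nu(\mathbf{x})\big| \to 0.$$
Let $\boldsymbol{\Lambda}_\nu$ be defined with respect to (4.11) as in (4.9). We impose the following.
ASSUMPTION 4.1
$$\liminf_{\nu \to \infty} m(\boldsymbol{\Lambda}_\nu) > 0.$$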
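Now if we write $\mathbf{Z}_{\nu r}$ given by (4.5) in the form
$$\mathbf{Z}_{\nu r} = \sum_{k=0}^{2} \sum_{\alpha=1}^{n_{k\nu}} c^{(r)}_{\nu k\alpha}\, \mathbf{a}^{(k)}_{\nu\alpha}, \quad r = 1, 2, 3,$$
it is easily checked that
$$\sum_{k=0}^{2} \sum_{\alpha=1}^{n_{k\nu}} c^{(r)}_{\nu k\alpha} = 0, \quad r = 1, 2, 3,$$
and that $b_{\nu.rs}$ are given by
$$b_{\nu.rs} = \sum_{k=0}^{2} \sum_{\alpha=1}^{n_{k\nu}} c^{(r)}_{\nu k\alpha}\, c^{(s)}_{\nu k\alpha}, \quad r, s = 1, 2, 3. \tag{4.12}$$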
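$$\begin{aligned} b_{\nu.11} &= N_\nu\Big(\frac{1}{n_{1\nu}} + \frac{1}{n_{2\nu}}\Big), \\ b_{\nu.12} &= -N_\nu\Big\{\frac{\theta_1}{n_{1\nu}} - \frac{1-\theta_1}{n_{2\nu}}\Big\}, \qquad b_{\nu.13} = -N_\nu\Big\{\frac{1-\theta_2}{n_{2\nu}} - \frac{\theta_2}{n_{1\nu}}\Big\}, \\ b_{\nu.22} &= N_\nu\Big\{\frac{1}{n_{0\nu}} + \frac{\theta_1^2}{n_{1\nu}} + \frac{(1-\theta_1)^2}{n_{2\nu}}\Big\}, \qquad b_{\nu.33} = N_\nu\Big\{\frac{1}{n_{0\nu}} + \frac{\theta_2^2}{n_{1\nu}} + \frac{(1-\theta_2)^2}{n_{2\nu}}\Big\}, \\ b_{\nu.23} &= -N_\nu\Big\{\frac{1}{n_{0\nu}} + \frac{\theta_1\theta_2}{n_{1\nu}} + \frac{(1-\theta_1)(1-\theta_2)}{n_{2\nu}}\Big\}. \end{aligned} \tag{4.13}$$
Because of (4.6), the $3 \times 3$ matrix $(b_{\nu.rs})$ is singular. However, it is easily checked that the 3 principal submatrices
$$(b_{\nu.rs})_{1,2}, \quad (b_{\nu.rs})_{1,3}, \quad (b_{\nu.rs})_{2,3} \tag{4.14}$$
are positive definite.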
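The following minimal sketch, which is an illustration and not part of the original argument, builds the matrix $(b_{\nu.rs})$ from the coefficients of (4.12) and checks its singularity and the positive definiteness of its $2 \times 2$ principal submatrices. The coefficient values are those implied by writing the three linear combinations of sample means in (4.5) in the form (4.12); the names `b_matrix`, `det3`, and `principal_minor` are hypothetical.

```python
import math
from typing import List


def b_matrix(n0: int, n1: int, n2: int, theta1: float, theta2: float) -> List[List[float]]:
    """Matrix (b_rs), r, s = 1, 2, 3, computed as in (4.12).

    Within sample k the coefficient c^(r) is constant, so the double sum
    reduces to sum_k n_k * c_k^(r) * c_k^(s).
    """
    root = math.sqrt(n0 + n1 + n2)
    coeffs = [
        (0.0, -root / n1, root / n2),                                 # Z_1
        (-root / n0, theta1 * root / n1, (1 - theta1) * root / n2),   # Z_2
        (root / n0, -theta2 * root / n1, -(1 - theta2) * root / n2),  # Z_3
    ]
    sizes = (n0, n1, n2)
    return [
        [sum(sizes[k] * coeffs[r][k] * coeffs[s][k] for k in range(3)) for s in range(3)]
        for r in range(3)
    ]


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def principal_minor(m: List[List[float]], r: int, s: int) -> float:
    """Determinant of the 2 x 2 principal submatrix on rows/columns r, s (0-based)."""
    return m[r][r] * m[s][s] - m[r][s] * m[s][r]
```

With, say, $(n_0, n_1, n_2) = (30, 40, 50)$ and $\theta_1 = 0.8$, $\theta_2 = 0.2$, the determinant of the full matrix vanishes up to rounding while the three $2 \times 2$ principal minors are positive.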
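Further, it may be seen that by virtue of Assumption 3.1 and the fact $\theta_1 > \theta_2$, each of the three matrices (4.14) satisfies conditions (i)-(iii) of Theorem 4.1. By our Assumption 4.1, we can hence conclude that each of the three 2p-vectors
$$\begin{pmatrix} \mathbf{Z}_{\nu 1} - E\mathbf{Z}_{\nu 1} \\ \mathbf{Z}_{\nu 2} - E\mathbf{Z}_{\nu 2} \end{pmatrix}, \quad \begin{pmatrix} \mathbf{Z}_{\nu 1} - E\mathbf{Z}_{\nu 1} \\ \mathbf{Z}_{\nu 3} - E\mathbf{Z}_{\nu 3} \end{pmatrix}, \quad \begin{pmatrix} \mathbf{Z}_{\nu 2} - E\mathbf{Z}_{\nu 2} \\ \mathbf{Z}_{\nu 3} - E\mathbf{Z}_{\nu 3} \end{pmatrix}$$
is asymptotically normal.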
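Now let us write
$$\boldsymbol{\delta}_\nu = E\,\mathbf{Z}_{\nu 1}. \tag{4.15}$$
From Lemma 2.1 in [1] it follows that under (4.3) and Assumption 3.2, as $\nu \to \infty$,
$$\sqrt{N_\nu}\, E\{\bar{\mathbf{a}}^{(0)}_\nu - \theta\, \bar{\mathbf{a}}^{(1)}_\nu - (1-\theta)\bar{\mathbf{a}}^{(2)}_\nu\} \to \mathbf{0}. \tag{4.16}$$
(In [1], the result is actually proved for fixed $F^{(1)}$, $F^{(2)}$, and $F^{(0)} = \theta F^{(1)} + (1-\theta)F^{(2)}$. But a close examination of the proof shows that the result remains true even when $F^{(k)}$ is replaced by $F^{(k)}_\nu$, $k = 0, 1, 2$.) From (4.15)-(4.16),
$$E\,\mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\delta}_\nu \to \mathbf{0}, \qquad E\,\mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\delta}_\nu \to \mathbf{0}. \tag{4.17}$$
Hence by Theorem 4.1 each of the 2p-vectors
$$\begin{pmatrix} \mathbf{Z}_{\nu 1} - \boldsymbol{\delta}_\nu \\ \mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\delta}_\nu \end{pmatrix}, \quad \begin{pmatrix} \mathbf{Z}_{\nu 1} - \boldsymbol{\delta}_\nu \\ \mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\delta}_\nu \end{pmatrix}, \quad \begin{pmatrix} \mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\delta}_\nu \\ \mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\delta}_\nu \end{pmatrix} \tag{4.18}$$
is asymptotically normal with null mean vectors and dispersion matrices
$$(b_{\nu.rs})_{1,2} \otimes \boldsymbol{\Lambda}_\nu, \quad (b_{\nu.rs})_{1,3} \otimes \boldsymbol{\Lambda}_\nu, \quad (b_{\nu.rs})_{2,3} \otimes \boldsymbol{\Lambda}_\nu \tag{4.19}$$
respectively. So for any $\boldsymbol{\ell} \ne \mathbf{0}$, each of the three pairs $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 1} - \boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 1} - \boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}'\mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$ is asymptotically bivariate normal with means 0 and dispersion matrices
$$(b_{\nu.rs})_{1,2}\ \boldsymbol{\ell}'\boldsymbol{\Lambda}_\nu\boldsymbol{\ell}, \quad (b_{\nu.rs})_{1,3}\ \boldsymbol{\ell}'\boldsymbol{\Lambda}_\nu\boldsymbol{\ell}, \quad (b_{\nu.rs})_{2,3}\ \boldsymbol{\ell}'\boldsymbol{\Lambda}_\nu\boldsymbol{\ell} \tag{4.20}$$
respectively.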
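Since $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1}$ is asymptotically normal, for finding the ALOC functions of the sequence of rules (4.7), we can neglect the possibility $\boldsymbol{\ell}'\mathbf{Z}_{\nu 1} = 0$. Let us denote by
$$\boldsymbol{\zeta}_\nu' = (\boldsymbol{\zeta}_{\nu 1}'\,(1 \times p),\ \boldsymbol{\zeta}_{\nu 2}'\,(1 \times p),\ \boldsymbol{\zeta}_{\nu 3}'\,(1 \times p)) \tag{4.21}$$
a random 3p-vector such that for each $\nu$, $\boldsymbol{\zeta}_\nu$ follows a normal distribution with mean vector
$$(\boldsymbol{\delta}_\nu',\ (\theta - \theta_1)\boldsymbol{\delta}_\nu',\ (\theta_2 - \theta)\boldsymbol{\delta}_\nu')' \tag{4.22}$$
and dispersion matrix
$$(b_{\nu.rs})_{1,2,3} \otimes \boldsymbol{\Lambda}_\nu. \tag{4.23}$$
As $(b_{\nu.rs})_{1,2,3}$ is of rank 2, this is a singular normal distribution of rank 2p. In fact, it may be checked that just as in (4.6) we have
$$(\theta_1 - \theta_2)\boldsymbol{\zeta}_{\nu 1} + \boldsymbol{\zeta}_{\nu 2} + \boldsymbol{\zeta}_{\nu 3} = \mathbf{0}. \tag{4.24}$$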
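For any $\boldsymbol{\ell} \ne \mathbf{0}$, the pairs $(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 1} - \boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 1} - \boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}'\boldsymbol{\delta}_\nu)$ follow bivariate normal distributions with zero means and dispersions given by the matrices (4.20) respectively. From (4.13) and Assumption 3.1 it follows that each of the three matrices (4.20) satisfies condition (4.1). Therefore, using Lemma 4.3, we get that for the rules (4.7) we have $L_{k\nu}(\theta) \approx L^*_{k\nu}(\theta)$, $k = 0, 1, 2$ ($\approx$ means the two sides have a vanishing difference), where
$$\begin{aligned} L^*_{1\nu}(\theta) &= P\{(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 1})(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 2}) \ge 0\}, \\ L^*_{2\nu}(\theta) &= P\{(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 1})(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 3}) \ge 0\}, \\ L^*_{0\nu}(\theta) &= P\{(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 2})(\boldsymbol{\ell}'\boldsymbol{\zeta}_{\nu 3}) \ge 0\}. \end{aligned} \tag{4.25}$$
Using (4.24) it may be checked that
$$L^*_{0\nu}(\theta) + L^*_{1\nu}(\theta) + L^*_{2\nu}(\theta) = 1. \tag{4.26}$$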
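For any two numbers $h$, $k$ and $|\rho| < 1$, let us write
$$H(h, k; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \int_{-\infty}^{h} \int_{-\infty}^{k} e^{-[x^2 - 2\rho x y + y^2]/2(1-\rho^2)}\, dy\, dx, \tag{4.27}$$
$$J(h, k; \rho) = H(h, k; \rho) + H(-h, -k; \rho) = J(-h, -k; \rho). \tag{4.28}$$
If $X$, $Y$ have a bivariate normal distribution with means $\mu_1$, $\mu_2$, variances $\sigma_1^2$, $\sigma_2^2$, and correlation coefficient $\rho$ $(-1 < \rho < 1)$, we have
$$P\{XY \ge 0\} = J\Big(\frac{\mu_1}{\sigma_1}, \frac{\mu_2}{\sigma_2}; \rho\Big). \tag{4.29}$$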
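For numerical work, the functions $H$ and $J$ of (4.27)-(4.28) can be evaluated by reducing the double integral to a single one, $H(h,k;\rho) = \int_{-\infty}^{h} \phi(x)\,\Phi\big((k - \rho x)/\sqrt{1-\rho^2}\big)\,dx$, where $\phi$ and $\Phi$ are the standard normal density and cdf. The sketch below is only one possible implementation: the composite Simpson rule, the truncation of the lower limit at $-10$, and the function names are assumptions made for illustration.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def H(h: float, k: float, rho: float, steps: int = 4000) -> float:
    """Bivariate normal cdf (4.27) by composite Simpson integration.

    The lower limit -infinity is truncated at -10 (assumed negligible mass);
    `steps` must be even.
    """
    lower = -10.0
    if h <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(x: float) -> float:
        return std_normal_pdf(x) * std_normal_cdf((k - rho * x) / scale)

    width = (h - lower) / steps
    total = integrand(lower) + integrand(h)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * width)
    return total * width / 3.0


def J(h: float, k: float, rho: float) -> float:
    """J(h, k; rho) = H(h, k; rho) + H(-h, -k; rho), as in (4.28)."""
    return H(h, k, rho) + H(-h, -k, rho)


def prob_product_nonnegative(mu1: float, mu2: float,
                             sigma1: float, sigma2: float, rho: float) -> float:
    """P{XY >= 0} for a bivariate normal pair, via (4.29)."""
    return J(mu1 / sigma1, mu2 / sigma2, rho)
```

Two checks are available in closed form: for $\rho = 0$ the function $H$ factorizes as $\Phi(h)\Phi(k)$, and for zero means $J(0,0;\rho) = \tfrac{1}{2} + \arcsin(\rho)/\pi$.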
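So if we write
$$b_{\nu.rr} = b_{\nu.r}^2, \quad b_{\nu.rs}/(b_{\nu.r}\, b_{\nu.s}) = \rho_{\nu.rs}, \quad r \ne s = 1, 2, 3, \tag{4.30}$$
$$\frac{\boldsymbol{\ell}'\boldsymbol{\delta}_\nu}{(\boldsymbol{\ell}'\boldsymbol{\Lambda}_\nu\boldsymbol{\ell})^{1/2}} = \gamma_\nu(\boldsymbol{\ell}), \tag{4.31}$$
from (4.25) and (4.29) we get the expressions
$$\begin{aligned} L^*_{1\nu}(\theta) &= J\Big(\frac{1}{b_{\nu.1}}|\gamma_\nu(\boldsymbol{\ell})|,\ \frac{(\theta - \theta_1)}{b_{\nu.2}}|\gamma_\nu(\boldsymbol{\ell})|;\ \rho_{\nu.12}\Big), \\ L^*_{2\nu}(\theta) &= J\Big(\frac{1}{b_{\nu.1}}|\gamma_\nu(\boldsymbol{\ell})|,\ \frac{(\theta_2 - \theta)}{b_{\nu.3}}|\gamma_\nu(\boldsymbol{\ell})|;\ \rho_{\nu.13}\Big), \\ L^*_{0\nu}(\theta) &= J\Big(\frac{(\theta - \theta_1)}{b_{\nu.2}}|\gamma_\nu(\boldsymbol{\ell})|,\ \frac{(\theta_2 - \theta)}{b_{\nu.3}}|\gamma_\nu(\boldsymbol{\ell})|;\ \rho_{\nu.23}\Big). \end{aligned} \tag{4.32}$$
In order that $L^*_{k\nu}(\theta)$, $k = 0, 1, 2$, may remain meaningful as $\nu \to \infty$, we require that $|\gamma_\nu(\boldsymbol{\ell})|/b_{\nu.r}$, $r = 1, 2, 3$, remain bounded away from zero and $\infty$. From (4.13) and (4.30) it is seen that by Assumption 3.1, $b_{\nu.r} > 1$, $r = 1, 2, 3$, remain bounded as $\nu \to \infty$. By Assumptions 3.2 and 4.1, $\boldsymbol{\ell}'\boldsymbol{\Lambda}_\nu\boldsymbol{\ell}$ is bounded away from both 0 and $\infty$. So the following assumption would ensure the meaningfulness of $L^*_{k\nu}$.
ASSUMPTION 4.2 (i) $0 < \liminf_{\nu \to \infty} |\boldsymbol{\ell}'\boldsymbol{\delta}_\nu|$, (ii) $\limsup_{\nu \to \infty} |\boldsymbol{\ell}'\boldsymbol{\delta}_\nu| < \infty$.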
For future reference we note here the modified assumptions
ASSUMPTION 4.2A (i) $0 < \liminf_{\nu \to \infty} \boldsymbol{\delta}_\nu'\boldsymbol{\delta}_\nu$, (ii) $\limsup_{\nu \to \infty} \boldsymbol{\delta}_\nu'\boldsymbol{\delta}_\nu < \infty$.
For Assumption 4.2(i) to hold, 4.2A(i) is necessary, and unless some of the elements of $\boldsymbol{\ell}$ are zero, for 4.2(ii) to hold, 4.2A(ii) is necessary. If $\varphi_i(u)$, $i = 1, 2, \ldots, p$, and $F^{(1)}_\nu(\mathbf{x})$, $F^{(2)}_\nu(\mathbf{x})$ are so chosen that, while (4.4) holds, Assumption 4.2A is realised, then we can always find $\boldsymbol{\ell}$ appropriately to satisfy Assumption 4.2. $\boldsymbol{\delta}_\nu$ given by (4.15) can be simply expressed in terms of $\varphi_i$ and $F^{(k)}_\nu$ if Assumption 3.2 is slightly strengthened (see Hoeffding [5]).
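So far we have assumed that, for the sequence of decision rules, $\boldsymbol{\ell}$ remains fixed. If instead, in the rule corresponding to sample sizes $(n_{0\nu}, n_{1\nu}, n_{2\nu})$, i.e., in (4.7), $\boldsymbol{\ell}$ is replaced by $\boldsymbol{\ell}_\nu$, where $\boldsymbol{\ell}_\nu$ is a sequence of non-null vectors, the ALOC functions would still be given by (4.25) (or equivalently (4.31)-(4.32)) with $\boldsymbol{\ell}$ replaced by $\boldsymbol{\ell}_\nu$. To see this note that by Assumptions 3.1, 3.2 and 4.1 and (4.13), each of the three matrices (4.19) satisfies condition (4.1). (The latent roots of a Kronecker product are obtained by multiplying the roots of one factor matrix by those of the other.) Hence, applying Lemma 4.2 we get that $(\boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 1} - \boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 1} - \boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu)$, $(\boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 2} - (\theta - \theta_1)\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu,\ \boldsymbol{\ell}_\nu'\mathbf{Z}_{\nu 3} - (\theta_2 - \theta)\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu)$ each have asymptotically bivariate normal distributions with means 0 and dispersion matrices obtained by replacing $\boldsymbol{\ell}$ in (4.20) by $\boldsymbol{\ell}_\nu$. The rest of the argument is as before. Of course, here, in order that the ALOC functions may be meaningful, $\boldsymbol{\ell}_\nu$ should be such that $|\gamma_\nu(\boldsymbol{\ell}_\nu)|$ remains bounded away from both 0 and $\infty$. In view of Assumptions 3.2 and 4.1 this means we require the following.
ASSUMPTION 4.2B (i) $0 < \liminf_{\nu \to \infty} \dfrac{|\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu|}{(\boldsymbol{\ell}_\nu'\boldsymbol{\ell}_\nu)^{1/2}}$, (ii) $\limsup_{\nu \to \infty} \dfrac{|\boldsymbol{\ell}_\nu'\boldsymbol{\delta}_\nu|}{(\boldsymbol{\ell}_\nu'\boldsymbol{\ell}_\nu)^{1/2}} < \infty$.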
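Before concluding this section we prove a lemma which throws light on the forms of the ALOC functions.
LEMMA 4.4. For fixed $h$ and $\rho$, $J(h, k; \rho)$ is increasing in $k$ for $\rho k < h$ and decreasing in $k$ for $\rho k > h$.
Proof: From (4.27), (4.28)
$$J(h, k; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}} \left[ \int_{-\infty}^{k} e^{-\frac{1}{2}y^2} \left\{ \int_{-\infty}^{h} e^{-\frac{(x - \rho y)^2}{2(1-\rho^2)}}\, dx \right\} dy + \int_{k}^{\infty} e^{-\frac{1}{2}y^2} \left\{ \int_{h}^{\infty} e^{-\frac{(x - \rho y)^2}{2(1-\rho^2)}}\, dx \right\} dy \right].$$
Hence,
$$\frac{\partial J}{\partial k}(h, k; \rho) = \frac{e^{-k^2/2}}{2\pi\sqrt{1-\rho^2}} \left[ \int_{-\infty}^{h} e^{-\frac{(x - \rho k)^2}{2(1-\rho^2)}}\, dx - \int_{h}^{\infty} e^{-\frac{(x - \rho k)^2}{2(1-\rho^2)}}\, dx \right].$$
Hence $\partial J/\partial k >$, $=$ or $< 0$ according as $\rho k < h$, $\rho k = h$ or $\rho k > h$. Q.E.D.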
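The sign pattern in Lemma 4.4 can be read off from a closed form of the derivative. Evaluating the two inner integrals in the proof in terms of the standard normal density $\phi$ and cdf $\Phi$ gives $\partial J/\partial k = \phi(k)\,\{2\Phi((h - \rho k)/\sqrt{1-\rho^2}) - 1\}$; this simplification is our own restatement of the proof's expression, and the function name `dJ_dk` below is hypothetical.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dJ_dk(h: float, k: float, rho: float) -> float:
    """Partial derivative of J(h, k; rho) with respect to k.

    Positive when rho*k < h, zero when rho*k == h, negative when rho*k > h.
    """
    arg = (h - rho * k) / math.sqrt(1.0 - rho * rho)
    return std_normal_pdf(k) * (2.0 * std_normal_cdf(arg) - 1.0)
```

For example, with $h = 1$ and $\rho = 0.5$ the derivative is positive at $k = 0.5$, vanishes at $k = 2$, and is negative at $k = 3$.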
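From (4.32) and Lemma 4.4, it follows that for any $\nu$, $L^*_{1\nu}(\theta)$ would be increasing in $\theta$ provided $b_{\nu.2} > b_{\nu.1}\,\rho_{\nu.12}(\theta - \theta_1)$, or by (4.30), provided
$$b_{\nu.22} > (\theta - \theta_1)\, b_{\nu.12}. \tag{4.33}$$
From (4.13) it is seen that (4.33) holds for all $\theta$. Similarly for any $\nu$, $L^*_{2\nu}(\theta)$ would be decreasing in $\theta$ provided $b_{\nu.3} > b_{\nu.1}\,\rho_{\nu.13}(\theta_2 - \theta)$, which always holds. The behaviour of $L^*_{0\nu}(\theta)$ follows from those of $L^*_{1\nu}(\theta)$ and $L^*_{2\nu}(\theta)$ and relation (4.26).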
5. THE BEST CHOICE OF COEFFICIENTS
In this section, under the set-up of Section 4, we consider the best choice of the compounding vector $\boldsymbol{\ell}$ from the point of view of ALOC functions.
We first prove a lemma involving the function $J(h, k; \rho)$ defined by (4.27)-(4.28).
LEMMA 5.1 For fixed $h \ge 0$, $k \ge 0$, $(h, k) \ne (0, 0)$, and $\rho$, $J(uh, uk; \rho)$ is monotonically increasing in $u > 0$ provided
$$\rho \le \min\Big(\frac{k}{h}, \frac{h}{k}\Big). \tag{5.1}$$
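Proof: Let us define for any two numbers $b$, $a$
$$T(b, a) = \frac{1}{2\pi} \int_{0}^{a} \frac{e^{-\frac{1}{2}b^2(1+x^2)}}{1+x^2}\, dx. \tag{5.2}$$
Clearly then
$$T(b, a) = T(-b, a), \quad T(b, -a) = -T(b, a). \tag{5.3}$$
Let $\Phi(x)$ stand for the standard normal cdf. Owen [8] has shown that whatever the numbers $h$, $k$, $H(h, k; \rho)$ given by (4.27) can be expressed as
$$H(h, k; \rho) = \tfrac{1}{2}[\Phi(h) + \Phi(k)] - T\Big(h, \Big(\frac{k}{h} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - T\Big(k, \Big(\frac{h}{k} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - R(h, k), \tag{5.4}$$
where
$R(h, k) = \tfrac{1}{2}$, if $hk < 0$, or $hk = 0$ but $h + k < 0$
$\phantom{R(h, k)} = 0$, otherwise.
From (4.28), (5.3) and (5.4) we get that for $h \ge 0$, $k \ge 0$, $(h, k) \ne (0, 0)$
$$J(h, k; \rho) = 1 - 2T\Big(h, \Big(\frac{k}{h} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - 2T\Big(k, \Big(\frac{h}{k} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - R^*(h, k), \tag{5.5}$$
where
$R^*(h, k) = 0$ if $h > 0$, $k > 0$
$\phantom{R^*(h, k)} = \tfrac{1}{2}$ if $hk = 0$.
Hence for $h \ge 0$, $k \ge 0$, $(h, k) \ne (0, 0)$, $u > 0$
$$J(uh, uk; \rho) = 1 - 2T\Big(uh, \Big(\frac{k}{h} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - 2T\Big(uk, \Big(\frac{h}{k} - \rho\Big)(1-\rho^2)^{-1/2}\Big) - R^*(h, k), \tag{5.6}$$
where $R^*(h, k)$ is as in (5.5). From (5.2) we see that for $a > 0$, $b > 0$, $T(ub, a)$ decreases monotonically with $u$. Since under (5.1), $\frac{k}{h} - \rho \ge 0$, $\frac{h}{k} - \rho \ge 0$ with at least one inequality strict, from (5.6) the lemma follows. Q.E.D.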
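Note: Given $h \ge 0$, $k \ge 0$, $(h, k) \ne (0, 0)$, whatever $\rho$, $J(uh, uk; \rho)$ tends to the limit 1 (when both $h, k > 0$) or $\tfrac{1}{2}$ (when $h$ or $k = 0$) as $u \to \infty$. As proved above, for $\rho \le \min(\frac{h}{k}, \frac{k}{h})$ this approach to the limit is monotonic. However, for $\rho > \min(\frac{h}{k}, \frac{k}{h})$ the approach may not be monotonic. To see this suppose $0 < k < h$ and $\rho > \frac{k}{h}$. Then from (5.5), (5.6) and (5.2), we can write
$$J(uh, uk; \rho) = 1 + \frac{1}{\pi} \int_{0}^{(\rho - \frac{k}{h})(1-\rho^2)^{-1/2}} \frac{e^{-\frac{1}{2}u^2h^2(1+x^2)}}{1+x^2}\, dx - \frac{1}{\pi} \int_{0}^{(\frac{h}{k} - \rho)(1-\rho^2)^{-1/2}} \frac{e^{-\frac{1}{2}u^2k^2(1+x^2)}}{1+x^2}\, dx. \tag{5.7}$$
Differentiating (5.7) with respect to $u$, we get
$$\pi u \frac{d}{du} J(uh, uk; \rho) = uk\, e^{-\frac{1}{2}u^2k^2} \int_{0}^{u(h - \rho k)(1-\rho^2)^{-1/2}} e^{-x^2/2}\, dx - uh\, e^{-\frac{1}{2}u^2h^2} \int_{0}^{u(\rho h - k)(1-\rho^2)^{-1/2}} e^{-x^2/2}\, dx. \tag{5.8}$$
Now we can choose $u > 0$ small enough to make $uk\, e^{-\frac{1}{2}u^2k^2} < uh\, e^{-\frac{1}{2}u^2h^2}$. Then we can choose $\rho > \frac{k}{h}$ close to 1 so as to make the two integrals in (5.8) sufficiently close, making (5.8) negative.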
We now make the following assumption.
ASSUMPTION 5.1 For sufficiently large $\nu$,
$$\frac{\theta_2}{1-\theta_2} < \frac{n_{1\nu}}{n_{2\nu}} < \frac{\theta_1}{1-\theta_1}.$$
As $\theta_1$ and $\theta_2$ are close to 1 and 0 respectively, in view of Assumption 3.1, this is not very restrictive. By (4.13) and (4.30), in (4.32) we have $\rho_{\nu.23} < 0$, and Assumption 5.1 ensures that for large $\nu$, $\rho_{\nu.12} \le 0$, $\rho_{\nu.13} \le 0$. Thus $L^*_{1\nu}(\theta)$ (for $\theta \ge \theta_1$), $L^*_{2\nu}(\theta)$ (for $\theta \le \theta_2$), and $L^*_{0\nu}(\theta)$ (for $\theta_2 < \theta < \theta_1$) all satisfy the conditions of Lemma 5.1. (It is easily checked that without Assumption 5.1 this is not true.)
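The role of Assumption 5.1 can be illustrated by evaluating the off-diagonal elements of (4.13), whose signs coincide with those of the correlations $\rho_{\nu.rs}$ in (4.30). The sketch below is illustrative only; the function names `cross_terms` and `assumption_5_1_holds` are hypothetical.

```python
from typing import Tuple


def cross_terms(n0: int, n1: int, n2: int,
                theta1: float, theta2: float) -> Tuple[float, float, float]:
    """Off-diagonal elements (b_12, b_13, b_23) of (4.13).

    Their signs are the signs of rho_12, rho_13, rho_23, because the
    diagonal elements b_rr are positive.
    """
    n_total = n0 + n1 + n2
    b12 = -n_total * (theta1 / n1 - (1 - theta1) / n2)
    b13 = -n_total * ((1 - theta2) / n2 - theta2 / n1)
    b23 = -n_total * (1 / n0 + theta1 * theta2 / n1
                      + (1 - theta1) * (1 - theta2) / n2)
    return b12, b13, b23


def assumption_5_1_holds(n1: int, n2: int, theta1: float, theta2: float) -> bool:
    """Check theta2/(1-theta2) < n1/n2 < theta1/(1-theta1)."""
    return theta2 / (1 - theta2) < n1 / n2 < theta1 / (1 - theta1)
```

With $\theta_1 = 0.8$, $\theta_2 = 0.2$ and $(n_0, n_1, n_2) = (30, 40, 50)$ the assumption holds and all three cross terms are negative, whereas with $(30, 10, 100)$ it fails and $b_{\nu.13}$ becomes positive.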
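Hence, it follows that, for each $\nu$, if we choose $\boldsymbol{\ell}$ so that $|\gamma_\nu(\boldsymbol{\ell})|$ is maximized, $L^*_{1\nu}(\theta)$, $L^*_{2\nu}(\theta)$, and $L^*_{0\nu}(\theta)$ given by (4.32) are respectively maximized in the domains $\theta_1 \le \theta \le 1$, $0 \le \theta \le \theta_2$, and $\theta_2 < \theta < \theta_1$. It is known that
$$\max_{\boldsymbol{\ell} \ne \mathbf{0}} |\gamma_\nu(\boldsymbol{\ell})| = (\boldsymbol{\delta}_\nu'\boldsymbol{\Lambda}_\nu^{-1}\boldsymbol{\delta}_\nu)^{1/2} = \Delta_\nu \ \text{(say)} \tag{5.9}$$
and this is attained for
$$\boldsymbol{\ell} = \boldsymbol{\ell}_\nu = g_\nu \boldsymbol{\Lambda}_\nu^{-1}\boldsymbol{\delta}_\nu, \tag{5.10}$$
where $g_\nu \ne 0$ are arbitrary numbers.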
The ALOe functions of the sequence of decision rules corresponding to this
R,
-v
are obtained from (4.25) and (4.32) as
ePv.i3)
!:!"v
-b--;
v.3
where!:!"
(4.30).
V
is given by (5.9) and b
V.r
, P
, r,s=1,2,3 are given by (4.13) and
v.rs
In order that these may remain meaningful as
v~
we require Assumption
4.2A which together with Assumptions 3.2 and 4.1 ensures that!:!,.
away from both 0 and
00.
Assumption 3.1 ensures that b
We call (5.11) as the ideal ALOe functions.
v.r
V
remains bounded
> 1 remains bounded.
As (5.11) is obtained from (4.32)
by replacing IYv(~)1 by!:!,.v' from the remarks at the end of section 4,
is monotonic increasing and
L~v(8)
is monotinic decreasing in e.
Lfv(8)
Further, from
Lemma 5.1 it follows that under Assumption 5.1 we can increase the value of
Lfv(8) for e1~8~1, the value of L~v(8) for 02828 2 and the value of L~(8) for
8 2<8<8
1
by making 6
v larger.
In practice by enlarging the set of observed
~V
variables, generally, the value of
would increase (see the remarks in Section
5 of [1]), and hence, the performance of the procedure would improve.
~v
Clearly for 81<8~1, L!v(8) ~L!v(8l) = J(~, 0; P" 12) and as ~v becomes
vl
v.
large this lower bound approaches~. (For P • =0, the lower bound is exactly
V 12
whatever
~v).
For any 8>8 1 , however L!v(8) would approach 1 as
L~v(8)
Similar observations apply to
that as
~v
over
0<8~82'
As regards
increases its value at both 82 and 8 approaches
1
~
~
v
becomes large.
L~v(8)
we observe
'and at any inter-
mediate point approaches 1.
6. DECISION RULE USING ESTIMATED COEFFICIENTS
In this section we consider decision rules that would be obtained if in (2.4)-(2.5) we substitute for the coefficient vector $\boldsymbol{\ell}$ some 'sample estimate' of the best choice $\boldsymbol{\ell}_\nu$ given by (5.10).
Given a sequence of samples
~~~),
(6.1)
a=l, •.• ,nkV ' k=0,1,2,
Let
(k)
-va =
a
-(k)
=
a
-v
(a
"(k)
v.ij
1
=~v
=
Let
(k),
v. 1a , ... , a v.pa )
-(k)
(a
be defined as in (2.2) and (2.3).
(J
(k)
~v
2
a=l
(o(k) ,)
-(k)
v. l, ... ,av.p )
,
Further, let
a
(k)
,
v.~a
'_
a
(k)
v.ja
v.iJ i,J-l, ••• ,p
,
-(k) -(k)
a
,a
v.~
,
v.J
k=O,1,2
(6.2)
q~k), k=0,1,2 stand, in general, for some nonnegative valued random
variables (possibly depending on the samples) such that
bounded away from
° and
00,
say
~
2k: O q~k)
is uniformly
(6.3)
" by
Define the sequence of vectors £v
(6.4)
some arbitrary non-null vector, when E q(k) f(k) is singular.
V
k
-V
Set up
(6.5)
whenever the denominator is nonzero.
Consider the following sequence of
decision rules obtained by replacing £ in (2.5) by "£ :
-v
" -(2) -(1)
(a) When £'(a
-a
)fO,
-v -v
" > 8
(i) choose d , if T
l
v- 1
(ii) choose d , i f "T < 8
-v
v- 2
2
(iii) choose dO' i f 8 < "T < 8 .
2
1
v
( b) When t'(a(2)_a(1»=0,
-v -v -v
c hoose one
0
f d l' d 2' d 0 at random.
(6.6)
We shall see below that when the samples (6.1) are taken from sequences of
~v
populations subject to (4.3) and (4.4) and
so that (6.4) seems a natural 'estimate' of
~v
given by (5.10).
choice (6.4) includes the following special case.
s
simultaneous equations in the elements
£'(a(2)_a(0»
- -v
-v
and £ = (£1' .•. '£ )' given by
- s £'(a(2)_a(1»
N
~\
s ~)r(l) + (l-s) (_V_
n lV -v
nOv
Further, the
Consider the set of (p+l)
-
- -v
~~k)_~v ~ g,
is defined as in (4.9),
-v
N
p
= 0
+ (1-s)~)f(2)]£ =
n2v -v -
-(2) -(1)
~v
-~v
.
(6.7)
Let us consider an interval (-£, 1+£) where £>0 is a number to be chosen
suitably small.
Let 8"
v stand for a solution of s of (6.7) in this interval,
whenever such a solution exists, and otherwise, be arbitrarily defined in (0,1).
Then taking
(0)
= 0,
qv
A
NV
A
NV
(1)
= 8 (-.- + 8 - )
qv
V n
V nOv
Iv '
N
A
V
(2)
= (1-8 ) ( - + (1-8 )
qv
V
V nOV
A
N
V
-;;-)
,
(6.8)
2v
by Assumption 3.1, for a suitably small £, q(k) would be nonnegative and (6.3)
V
would hold.
Further, ~
given by (6.4) would be equal to the
-v
(6.7) whenever ~q(k)f(k) is positive definite.
V
-V
A
by (6.5) would equal 8 •
V
~-solution
of
A
Hence, in this case T
V
given
From the results of [1] it follows that under mild assumptions, for a suitably small $\epsilon$, with probability approaching 1 the equations (6.7) have a solution in $(-\epsilon, 1+\epsilon)$ and $\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}$ is positive definite, whatever be $0\le\theta\le1$. Also, as shown in [1], within a particular class of rank based estimates of $\theta$, $\hat\theta_\nu$ has the minimum asymptotic variance. Thus, for $q_\nu^{(k)}$ given by (6.8), (6.6) can be interpreted as the rule based on a 'best estimate.' (In [1], it was implicitly assumed that $0<\theta<1$ and accordingly $s$ was allowed to lie in [0,1]. Here, as $0\le\theta\le1$, we allow $s$ to lie in $(-\epsilon, 1+\epsilon)$, and for a suitably small $\epsilon$ the results carry over.)
Consistency
First suppose that the samples (6.1) are taken from fixed populations $F^{(0)}$, $F^{(1)}$, $F^{(2)}$ subject to (1.1).
Set up
(k) =
aV.1
.j
~(k)
-v
where
= (a(k~.)
V.1J i,j=l, ••. ,p
Hv(~), ~~~) are as in (3.2) and (3.3).
(6.9)
From Theorem 2.1 and 2.2 in [1]
we have, as v-+oo
-(k)
(k) ~
a -11
-V
J:V
o,
(6.10)
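Now, if the characteristic roots of $\Gamma_\nu^{(k)}$, $k=0,1,2$ are bounded away from 0 and $\infty$ and $\liminf |\mu_{\nu i}^{(2)}-\mu_{\nu i}^{(1)}|$ is positive for at least one $i$ (see Assumption III in [1]), from (6.3) and (6.10) it follows that $\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}$ is p.d. in probability and $\hat\ell_\nu'(\bar a_\nu^{(2)}-\bar a_\nu^{(1)})$ is bounded away from 0 in probability. Further $\hat\ell_\nu'\{\theta\bar a_\nu^{(1)}+(1-\theta)\bar a_\nu^{(2)}-\bar a_\nu^{(0)}\}$ converges to 0 in probability (see the argument in Section 4 of [1]). As, when $\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}$ is p.d. and $\hat\ell_\nu'(\bar a_\nu^{(2)}-\bar a_\nu^{(1)}) \neq 0$,

$$\hat T_\nu - \theta = \frac{\hat\ell_\nu'\{\theta\bar a_\nu^{(1)}+(1-\theta)\bar a_\nu^{(2)}-\bar a_\nu^{(0)}\}}{\hat\ell_\nu'(\bar a_\nu^{(2)}-\bar a_\nu^{(1)})},$$

it follows that $\hat T_\nu \xrightarrow{P} \theta$. Hence the rules (6.6) are consistent in the sense of Section 3, provided (3.6) holds.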
ALOC functions
Now suppose as in Section 4 that the samples (6.1) are taken from sequences of populations $F_\nu^{(k)}$, $k=0,1,2$ subject to (4.3) and (4.4). Then proceeding as in the proof of Theorem 2.2 of [1] and making use of (4.12) it may be shown that for $\Lambda_\nu$ given by (4.9) and (4.11),

$$\hat\Gamma_\nu^{(k)} - \Lambda_\nu \xrightarrow{P} 0,\qquad k=0,1,2,$$    (6.11)

and hence by (6.3),

$$\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)} - \Big(\sum_k q_\nu^{(k)}\Big)\Lambda_\nu \xrightarrow{P} 0.$$

By Assumption 4.1, this implies that $\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}$ is p.d. in probability. Hence, by Assumption 3.2 and Lemma 1.1 of [1] we get

(6.12)

From the above it follows that, for finding the ALOC functions of the rule (6.6), we may proceed as if $\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}$ is p.d. Also, as proved in Section 4, $\sqrt{N_\nu}\,(\bar a_\nu^{(2)}-\bar a_\nu^{(1)})$ is asymptotically normal so that for asymptotic purposes we may neglect the possibility (b) in (6.6).
Thus, defining $Z_{\nu r}$, $r=1,2,3$, as in (4.5), as regards ALOC functions, the rule (6.6) is equivalent to:

choose $d_1$, if $Z_{\nu1}'\big(\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}\big)^{-1}Z_{\nu2} \ge 0$,
choose $d_2$, if $Z_{\nu1}'\big(\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}\big)^{-1}Z_{\nu3} \ge 0$,
choose $d_0$, if $Z_{\nu1}'\big(\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}\big)^{-1}Z_{\nu2} < 0$ and $Z_{\nu1}'\big(\sum_k q_\nu^{(k)}\hat\Gamma_\nu^{(k)}\big)^{-1}Z_{\nu3} < 0$.    (6.13)

From the results of Section 4, we get that $Z_{\nu r}$, $r=1,2,3$ are bounded in probability. Hence, by (6.3) and (6.12),

(6.14)

From this we can deduce that

(6.15)
(This follows by noting that $(Z_{\nu1}', Z_{\nu r}')$ has asymptotically a $2p$-variate normal distribution with uniformly bounded means and variances [Assumptions 3.1, 3.2, 4.1 and 4.2A]. Hence given any subsequence of $(Z_{\nu1}', Z_{\nu r}')$ we can pick a sub-subsequence which converges in law to a fixed $2p$-dimensional normal distribution, and hence, for which $Z_{\nu1}'\Lambda_\nu^{-1}Z_{\nu r}$ has a limiting cdf continuous at 0. In view of this and (6.14) there cannot be a subsequence for which the absolute value of the difference in (6.15) is bounded away from 0.)
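Now let $\zeta_{\nu1}$, $\zeta_{\nu2}$, $\zeta_{\nu3}$ be defined as in (4.21)-(4.23). The dispersion matrices (4.19) of the $2p$-vectors satisfy the condition (4.1). Hence from (6.15) and Lemma 4.3 we get that the ALOC functions of the rules (6.6) are given by

$$\hat L_{1\nu}^*(\theta) = P\{\zeta_{\nu1}'\Lambda_\nu^{-1}\zeta_{\nu2} \ge 0\},$$
$$\hat L_{2\nu}^*(\theta) = P\{\zeta_{\nu1}'\Lambda_\nu^{-1}\zeta_{\nu3} \ge 0\},$$
$$\hat L_{0\nu}^*(\theta) = 1 - \hat L_{1\nu}^*(\theta) - \hat L_{2\nu}^*(\theta).$$    (6.16)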
For convenience, hereafter we keep the subscript $\nu$ understood. Let $C$ ($p\times p$) be such that

$$C\Lambda C' = I.$$    (6.17)

For $\delta$ and $\Delta^2$ defined as in (4.15) and (5.9), $\Delta^{-1}C\delta$ is then a vector of unit length. Let $P$ ($p\times p$) be an orthogonal matrix whose first row is $\Delta^{-1}(C\delta)'$. Denote

$$\eta_r = PC\zeta_r = (\eta_{r1},\ldots,\eta_{rp})' \ \text{(say)},\qquad r=1,2,3.$$    (6.18)

From (4.21)-(4.23), it may be checked that $(\eta_{1i},\eta_{2i},\eta_{3i})'$, $i=1,2,\ldots,p$ are independently normally distributed with means given by

$$E\eta_{1i} = E\eta_{2i} = E\eta_{3i} = 0,\qquad i=2,3,\ldots,p$$    (6.19)

and with $(b_{rs})_{r,s=1,2,3}$ defined by (4.13) as the common dispersion matrix. From (6.16) and (6.18) we have

$$\hat L_1^*(\theta) = P\{\eta_1'\eta_2 \ge 0\},\qquad \hat L_2^*(\theta) = P\{\eta_1'\eta_3 \ge 0\}.$$    (6.20)
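Or using notations as in (4.30) and writing

$$\xi_r = (\xi_{r1},\ldots,\xi_{rp})' = b_{rr}^{-1/2}\,\eta_r,\qquad r=1,2,3,$$    (6.21)

$$\hat L_1^*(\theta) = P\{\xi_1'\xi_2 \ge 0\} = P\{(\xi_1+\xi_2)'(\xi_1+\xi_2)-(\xi_1-\xi_2)'(\xi_1-\xi_2) \ge 0\},$$
$$\hat L_2^*(\theta) = P\{\xi_1'\xi_3 \ge 0\} = P\{(\xi_1+\xi_3)'(\xi_1+\xi_3)-(\xi_1-\xi_3)'(\xi_1-\xi_3) \ge 0\},$$    (6.22)

where $(\xi_{1i},\xi_{2i},\xi_{3i})'$, $i=1,2,\ldots,p$ are independently normally distributed with means given by

(6.23)

(in particular $E\xi_{1i}=E\xi_{2i}=E\xi_{3i}=0$ for $i=2,3,\ldots,p$) and with common dispersion $(\rho_{rs})_{r,s=1,2,3}$. Hence, it is easy to see that, for $r=2,3$,

$$U_{r-1} = \frac{(\xi_1+\xi_r)'(\xi_1+\xi_r)}{2(1+\rho_{1r})},\qquad V_{r-1} = \frac{(\xi_1-\xi_r)'(\xi_1-\xi_r)}{2(1-\rho_{1r})}$$

are independently distributed as noncentral $\chi^2$'s with the same d.f. $p$. The noncentrality parameters, for $r=2$, are respectively

(6.24)

and for $r=3$, respectively

(6.25)

So, from (6.22), we get

$$\hat L_1^*(\theta) = P\{(1+\rho_{12})U_1 - (1-\rho_{12})V_1 \ge 0\},$$
$$\hat L_2^*(\theta) = P\{(1+\rho_{13})U_2 - (1-\rho_{13})V_2 \ge 0\}.$$    (6.26)

Or writing

$$\frac{V_r}{U_r+V_r} = W_r,\qquad r=1,2,$$    (6.27)

$$\hat L_1^*(\theta) = P\{W_1 \le \tfrac12(1+\rho_{12})\},\qquad \hat L_2^*(\theta) = P\{W_2 \le \tfrac12(1+\rho_{13})\}.$$    (6.28)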
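The passage from the quadratic-form event in (6.26) to the Beta-type event in (6.28) can be checked by simulation. The following Python sketch is an illustration only: the function names are hypothetical, the noncentral chi-square draws are built from squared normal variates with the whole noncentrality placed on the first coordinate, and the numerical noncentralities used in the example are illustrative rather than values computed from (6.24).

```python
import math
import random


def noncentral_chi2(p, nc, rng):
    """One draw of a noncentral chi-square with p d.f. and noncentrality nc."""
    # The whole noncentrality is put on the first coordinate.
    total = (rng.gauss(0.0, 1.0) + math.sqrt(nc)) ** 2
    for _ in range(p - 1):
        total += rng.gauss(0.0, 1.0) ** 2
    return total


def simulate_aloc(p, rho, nc_u, nc_v, draws=100_000, seed=0):
    """Estimate P{(1+rho)U - (1-rho)V >= 0} and P{V/(U+V) <= (1+rho)/2}."""
    rng = random.Random(seed)
    cutoff = 0.5 * (1.0 + rho)
    hits_quadratic = 0
    hits_beta = 0
    for _ in range(draws):
        u = noncentral_chi2(p, nc_u, rng)
        v = noncentral_chi2(p, nc_v, rng)
        if (1.0 + rho) * u - (1.0 - rho) * v >= 0.0:
            hits_quadratic += 1
        if v / (u + v) <= cutoff:
            hits_beta += 1
    return hits_quadratic / draws, hits_beta / draws
```

With `p = 1`, `rho = 0`, `nc_u = 0.20` and `nc_v = 0.10`, both estimates should agree with each other and lie near the entry .515 given in Table 1 for x = .50, up to simulation error.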
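From the above, it follows that $W_r$ ($r=1,2$) has the doubly noncentral Beta distribution (see, for instance [7] pp. 197-198) with shape parameters $\tfrac p2$, $\tfrac p2$ and noncentrality parameters $\pi_r^*$, $\pi_r$, i.e., with density function

$$e^{-\frac12(\pi_r^*+\pi_r)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi_r^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi_r}{2}\Big)^{t}\frac{1}{B\big(\frac{p+2s}{2},\frac{p+2t}{2}\big)}\,w^{\frac{p+2s}{2}-1}(1-w)^{\frac{p+2t}{2}-1},\qquad 0<w<1,\ r=1,2.$$    (6.29)

Hence, using the incomplete Beta-function ratio notation

$$I_x(p,q) = \frac{1}{B(p,q)}\int_0^x w^{p-1}(1-w)^{q-1}\,dw,\qquad 0<x<1,$$    (6.30)

we have, from (6.28)

$$\hat L_1^*(\theta) = e^{-\frac12(\pi_1^*+\pi_1)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi_1^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi_1}{2}\Big)^{t}\,I_{\frac12(1+\rho_{12})}\Big(\frac p2+s,\ \frac p2+t\Big),$$    (6.31)

$$\hat L_2^*(\theta) = e^{-\frac12(\pi_2^*+\pi_2)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi_2^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi_2}{2}\Big)^{t}\,I_{\frac12(1+\rho_{13})}\Big(\frac p2+s,\ \frac p2+t\Big),$$    (6.32)

where $\pi_1^*$, $\pi_1$ and $\pi_2^*$, $\pi_2$ are given by (6.24) and (6.25).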
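A double series of the form (6.31)-(6.32) is straightforward to evaluate numerically. The sketch below is a minimal pure-Python illustration, not the method used for Table 1: the function names are hypothetical, the incomplete Beta ratio is obtained by Simpson's rule after the substitution w = sin²φ (an assumption made here to avoid the endpoint singularity when p = 1), and the Poisson weights are truncated once they fall below a small tolerance.

```python
import math


def reg_inc_beta(x, a, b, n=400):
    """Incomplete Beta ratio I_x(a, b) by Simpson's rule (n even).

    Uses w = sin(phi)**2; assumes 0 < x < 1, a >= 1/2 and b >= 1/2.
    """
    phi0 = math.asin(math.sqrt(x))
    h = phi0 / n

    def f(phi):
        return math.sin(phi) ** (2 * a - 1) * math.cos(phi) ** (2 * b - 1)

    total = f(0.0) + f(phi0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return 2.0 * (h / 3.0) * total / math.exp(log_beta)


def poisson_weights(mean, tol=1e-12):
    """Poisson probabilities e^{-mean} mean^k / k!, truncated at tol."""
    weights = [math.exp(-mean)]
    k = 0
    while weights[-1] > tol or k < mean:
        k += 1
        weights.append(weights[-1] * mean / k)
    return weights


def doubly_noncentral_beta_cdf(x, p, pi_star, pi):
    """Double series with I_x(p/2 + s, p/2 + t), as in (6.31)."""
    total = 0.0
    for s, w_s in enumerate(poisson_weights(pi_star / 2.0)):
        for t, w_t in enumerate(poisson_weights(pi / 2.0)):
            total += w_s * w_t * reg_inc_beta(x, p / 2.0 + s, p / 2.0 + t)
    return total
```

As a usage example, `doubly_noncentral_beta_cdf(0.5, 1, 0.10, 0.20)` should reproduce the Table 1 entry .515 to about three decimals, and with both noncentralities zero and `p = 2` the function reduces to the uniform cdf.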
It is interesting to note that for the classical problem of classification between two multinormal populations with unknown means but a known common dispersion matrix, John [6] obtained similar series expressions for the unconditional probabilities of misclassification.
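To study the shapes of these ALOC functions, differentiating (6.31) with respect to $\theta$ and writing $x=\tfrac12(1+\rho_{12})$ we get

$$\frac{d\hat L_1^*(\theta)}{d\theta} = \frac12 e^{-\frac12(\pi_1^*+\pi_1)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi_1^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi_1}{2}\Big)^{t}\Big[\frac{d\pi_1^*}{d\theta}\Big\{I_x\Big(\frac p2+s+1,\frac p2+t\Big)-I_x\Big(\frac p2+s,\frac p2+t\Big)\Big\} + \frac{d\pi_1}{d\theta}\Big\{I_x\Big(\frac p2+s,\frac p2+t+1\Big)-I_x\Big(\frac p2+s,\frac p2+t\Big)\Big\}\Big].$$    (6.33)

Or using the identity

$$I_x(p,q) = \frac{p}{p+q}\,I_x(p+1,q) + \frac{q}{p+q}\,I_x(p,q+1),$$

$$\frac{d\hat L_1^*(\theta)}{d\theta} = \frac12 e^{-\frac12(\pi_1^*+\pi_1)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi_1^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi_1}{2}\Big)^{t}\frac{1}{p+s+t}\Big\{\Big(\frac p2+s\Big)\frac{d\pi_1}{d\theta}-\Big(\frac p2+t\Big)\frac{d\pi_1^*}{d\theta}\Big\}\Big\{I_x\Big(\frac p2+s,\frac p2+t+1\Big)-I_x\Big(\frac p2+s+1,\frac p2+t\Big)\Big\}.$$    (6.34)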
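As for all $s$, $t$

$$I_x\Big(\frac p2+s,\frac p2+t+1\Big) \ge I_x\Big(\frac p2+s+1,\frac p2+t\Big),$$

a sufficient condition for (6.34) to be non-negative is

$$\Big(\frac p2+s\Big)\frac{d\pi_1}{d\theta} - \Big(\frac p2+t\Big)\frac{d\pi_1^*}{d\theta} \ge 0\qquad\text{for all } s, t,$$

or equivalently

$$\frac{d\pi_1^*}{d\theta} \le 0 \le \frac{d\pi_1}{d\theta}.$$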
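The incomplete Beta identity used to pass from (6.33) to (6.34), and the sign of the difference appearing in (6.34), can both be checked numerically. The Python sketch below is only an illustration: the helper names are hypothetical, and the incomplete Beta ratio is computed by Simpson's rule after the substitution w = sin²φ, which is a choice made for this example and assumes both arguments are at least 1/2.

```python
import math


def reg_inc_beta(x, a, b, n=400):
    """Incomplete Beta ratio I_x(a, b) by Simpson's rule (n even)."""
    phi0 = math.asin(math.sqrt(x))
    h = phi0 / n

    def f(phi):
        return math.sin(phi) ** (2 * a - 1) * math.cos(phi) ** (2 * b - 1)

    total = f(0.0) + f(phi0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return 2.0 * (h / 3.0) * total / math.exp(log_beta)


def identity_gap(x, a, b):
    """I_x(a,b) minus [a/(a+b) I_x(a+1,b) + b/(a+b) I_x(a,b+1)]."""
    rhs = (a * reg_inc_beta(x, a + 1, b) + b * reg_inc_beta(x, a, b + 1)) / (a + b)
    return reg_inc_beta(x, a, b) - rhs


def sign_term(x, a, b):
    """I_x(a, b+1) - I_x(a+1, b), the last factor in (6.34)."""
    return reg_inc_beta(x, a, b + 1) - reg_inc_beta(x, a + 1, b)
```

For arguments such as `(x, a, b) = (0.3, 0.5, 0.5)` the gap should vanish up to integration error while `sign_term` stays positive, which is the property the sufficient condition relies on.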
Using (6.24), conditions (6.35) reduce to
b
- -
b
2
1
<
-
e-8 1
b
2
<1
- b
(6.35)
By (4.13) and (4.30) this can be written as
(6.36)
As $\theta_1 > \tfrac12$, (6.36) is always satisfied for $\theta_1 \le \theta \le 1$. Generally, there is a number $\theta_1^*$ such that

(6.37)

for which (6.36) holds for all $\theta_1^* \le \theta \le 1$.
From (6.37) we get
1
1 1 1
1-28 1 1
8* = 8 - [8 2 + ( - + - ) - ( - +
) ]~
1
1
1
nl
n2
nO
n2
(6.38)
If
eor equivalently,
n
-
n
we have 8!
~
al18e[0,1].
2
> 28 -1
1 '
o-
(6.39)
So we have proved that if (6.39) holds, then $\dfrac{d\hat L_1^*(\theta)}{d\theta} \ge 0$ for all $\theta\in[0,1]$. In general, we have established that $\dfrac{d\hat L_1^*(\theta)}{d\theta} \ge 0$ for all $\theta\in[\theta_1^*,1]$. It may be noted that (6.35) is only a sufficient condition for (6.34) to be nonnegative and it is possible that when $\theta_1^* > 0$, $\dfrac{d\hat L_1^*(\theta)}{d\theta}$ may remain nonnegative to the left of $\theta_1^*$ as well. The author has been unable to get a full answer to the question whether $\hat L_1^*(\theta)$ is nondecreasing in the entire interval [0,1] in general. In any case, the above shows that, quite generally, to the value of $\hat L_1^*(\theta)$, $\hat L_1^*(\theta_1)$ gives a lower bound in $[\theta_1,1]$, and if $L_1<\theta_1<U_1$, $\hat L_1^*(U_1)$ gives a lower bound in $[U_1,1]$.
As regards $\hat L_2^*(\theta)$, we can similarly show that a sufficient condition for $\dfrac{d}{d\theta}\hat L_2^*(\theta) \le 0$ is

and as $\theta_2 < \tfrac12$, this is always satisfied for $0\le\theta\le\theta_2$. Generally, writing

we can show that $\dfrac{d\hat L_2^*(\theta)}{d\theta} \le 0$ in $[0,\theta_2^*]$. If

we have $\theta_2^* \ge 1$, so that $\hat L_2^*(\theta)$ is non-increasing in [0,1]. Here also it is possible $\hat L_2^*(\theta)$ is nonincreasing in [0,1] more generally, but this could not be proved. Quite generally, to the value of $\hat L_2^*(\theta)$, $\hat L_2^*(\theta_2)$ gives a lower bound in $[0,\theta_2]$ and, if $L_2<\theta_2<U_2$, $\hat L_2^*(L_2)$ gives a lower bound in $[0,L_2]$.
To see how the ALOC functions of the procedure of this section are related to the ideal ALOC functions, we note that from (5.11), (6.18) and (6.21), we can write the latter as

(6.43)

whereas from (6.20)

$$\hat L_1^*(\theta) = P\Big\{\xi_{11}\xi_{21} + \sum_{i=2}^{p}\xi_{1i}\xi_{2i} \ge 0\Big\},$$
$$\hat L_2^*(\theta) = P\Big\{\xi_{11}\xi_{31} + \sum_{i=2}^{p}\xi_{1i}\xi_{3i} \ge 0\Big\},$$    (6.44)

where $(\xi_{1i},\xi_{2i},\xi_{3i})'$, $i=1,2,\ldots,p$ are independently normally distributed with means given by (6.23) and with common dispersion $(\rho_{rs})$. Exact analytical comparison of (6.43) and (6.44) seems difficult.
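If we write $B_x(p;\pi^*,\pi)$ for the tail probability to the left of $x$ of the noncentral Beta distribution with shape parameters $\tfrac p2$, $\tfrac p2$ and noncentrality parameters $\pi^*$, $\pi$, i.e., if

$$B_x(p;\pi^*,\pi) = e^{-\frac12(\pi^*+\pi)}\sum_{s=0}^{\infty}\sum_{t=0}^{\infty}\frac{1}{s!}\Big(\frac{\pi^*}{2}\Big)^{s}\frac{1}{t!}\Big(\frac{\pi}{2}\Big)^{t}\,I_x\Big(\frac p2+s,\ \frac p2+t\Big),$$

then, from (6.31) and (6.32), we have

$$\hat L_1^*(\theta) = B_{\frac12(1+\rho_{12})}(p;\pi_1^*,\pi_1),\qquad \hat L_2^*(\theta) = B_{\frac12(1+\rho_{13})}(p;\pi_2^*,\pi_2).$$    (6.45)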
Proceeding just as in (6.22)-(6.26), from (6.43) we can similarly show

(6.46)

where $\pi_1^*$, $\pi_1$, $\pi_2^*$, $\pi_2$ are as before given by (6.24) and (6.25). Table 1 gives the values of the function $B_x(p;\pi^*,\pi)$ for a few selected values of $x$, $\pi^*$, $\pi$ and $p=1,2,3$. As by our Assumption 5.1, $\tfrac12(1+\rho_{12})$ and $\tfrac12(1+\rho_{13})$ are both $\le \tfrac12$, we have taken only values $x \le 0.5$. Also, from (6.24)-(6.25), we have for $\theta\ge\theta_1$, $(1+\rho_{12})\pi_1 \ge (1-\rho_{12})\pi_1^*$, and for $\theta\le\theta_2$, $(1+\rho_{13})\pi_2 \ge (1-\rho_{13})\pi_2^*$. Hence, we have given the values of $B_x(p;\pi^*,\pi)$ only for $x\pi \ge (1-x)\pi^*$. The values of $B_x(1;\pi^*,\pi)$ were computed by using the relation

$$B_x(1;\pi^*,\pi) = J\big(\sqrt{x\pi}+\sqrt{(1-x)\pi^*},\ \sqrt{x\pi}-\sqrt{(1-x)\pi^*};\ 2x-1\big)$$

where $J$ is given by (4.27) and (4.28) and then using Table 8.5 in [9]. The values of $B_x(p;\pi^*,\pi)$, $p\ge2$ were computed by using the fact that if $X$ is a random variable following the ordinary Beta distribution with parameters $\tfrac12 f_1$, $\tfrac12 f_2$, where $f_1 = (p+\pi^*)^2(p+2\pi^*)^{-1}$, $f_2 = (p+\pi)^2(p+2\pi)^{-1}$, then the transform $Xg/[Xg+1-X]$, where $g = (p+2\pi^*)(p+\pi)/(p+\pi^*)(p+2\pi)$, is approximately distributed as a non-central Beta variable with parameters $\tfrac12 p$, $\tfrac12 p$, $\pi^*$, $\pi$. DasGupta [2] has found this approximation adequate and in our case it was found to give surprisingly close values for p=1.
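The quantities entering this approximation are simple to compute. The Python sketch below evaluates $f_1$, $f_2$ and $g$ as stated in the text and converts a cut-off $x$ for the transformed variable into the equivalent cut-off for the ordinary Beta variable $X$. The function names are hypothetical, and the description of the scheme as a two-moment (Patnaik-type) fit to each noncentral chi-square is an interpretation offered here, not a statement from the paper.

```python
def beta_approximation_parameters(p, pi_star, pi):
    """Return (f1, f2, g) of the ordinary-Beta approximation in the text."""
    f1 = (p + pi_star) ** 2 / (p + 2.0 * pi_star)
    f2 = (p + pi) ** 2 / (p + 2.0 * pi)
    g = (p + 2.0 * pi_star) * (p + pi) / ((p + pi_star) * (p + 2.0 * pi))
    return f1, f2, g


def transformed_cutoff(x, g):
    """Value x0 such that {X*g/(X*g + 1 - X) <= x} is the event {X <= x0}."""
    # The map X -> X*g/(X*g + 1 - X) is increasing on (0, 1) for g > 0,
    # so the inequality can be inverted directly.
    return x / (x + g * (1.0 - x))


def forward_transform(x0, g):
    """Apply the transform X*g/(X*g + 1 - X) at X = x0."""
    return x0 * g / (x0 * g + 1.0 - x0)
```

Under this sketch, $B_x(p;\pi^*,\pi)$ would be approximated by the ordinary incomplete Beta ratio with parameters `f1/2`, `f2/2` evaluated at `transformed_cutoff(x, g)`. When both noncentralities vanish, `f1 = f2 = p` and `g = 1`, so the approximation is exact.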
From an examination of the values in the table, it seems legitimate to conclude that in $[\theta_1,1]$ generally a high value for $L_1^*$ is attended by a high value of $\hat L_1^*$, the value of $\hat L_1^*$ is less than that of $L_1^*$, and the difference increases with $p$. Similar observations hold for $[0,\theta_2]$, and $L_2^*$ and $\hat L_2^*$. The falling away of the values of $\hat L_1^*$ and $\hat L_2^*$ with increasing $p$ is understandable, since the number of estimated parameters increases with $p$.
7. A SPECIAL CASE
In this section, we consider the situation where the preference scale of Section 1 holds with $L_2=0$, $U_1=1$. Such a situation would arise in practice when the values $\theta=1$ and $\theta=0$ are of special interest and $d_1$ and $d_2$ are respectively the most preferred decisions only in these cases. All that has been said in the earlier sections still holds. But here we can also approach the problem of best determination of $\ell$ from another angle.
As the sample sizes increase, here it would be realistic to take $\theta_1$ and $\theta_2$ closer and closer to 1 and 0 respectively so as to make the procedure more and more discriminating as regards the choice of $d_1$ and $d_2$. So, to judge the performance of a particular compounding vector $\ell$, and hence, to find the best choice of $\ell$, we can alternatively proceed as follows. Given fixed $F^{(1)}$, $F^{(2)}$, and $F^{(0)}$ subject to (1.1) and sequences of sample sizes $n_{1\nu}$, $n_{2\nu}$, $n_{0\nu}$ as in Section 3, we can consider the sequence of decision rules based on $\ell$, $\theta_{2\nu}\ (\to 0)$, and $\theta_{1\nu}\ (\to 1)$. The limiting behaviour of the corresponding probabilities of choosing $d_1$, when $\theta=1$, and $d_2$, when $\theta=0$, can then be studied. Of course $\theta_{1\nu}$, $\theta_{2\nu}$ are to be taken so that these probabilities remain bounded away from 1 in large samples.
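We write $P_{1\nu}$ and $P_{2\nu}$ respectively for the probabilities of choosing $d_1$ when $\theta=1$ and $d_2$ when $\theta=0$ for the decision rule (2.7) based on $n_{0\nu}$, $n_{1\nu}$, $n_{2\nu}$, $\ell$, $\theta_{1\nu}$ and $\theta_{2\nu}$. Using notations as in Section 3 under Assumption 3.3, we have from (2.7) and (3.5)

$$P_{1\nu} \simeq P\{\theta_{1\nu}\ell'\bar a_\nu^{(1)} + (1-\theta_{1\nu})\ell'\bar a_\nu^{(2)} - \ell'\bar a_\nu^{(0)} \ge 0 \mid \theta=1\},$$
$$P_{2\nu} \simeq P\{\ell'\bar a_\nu^{(0)} - \theta_{2\nu}\ell'\bar a_\nu^{(1)} - (1-\theta_{2\nu})\ell'\bar a_\nu^{(2)} \ge 0 \mid \theta=0\}.$$    (7.1)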
2v - -v
Now, by Theorem 2.3 of [1 ],
~~k), k=0,1,2 are defined by (6.9), converges to the standard normal cdf
where
¢(x), and as is well known, this convergence is uniform with respect to x.
K
1
in (7.1), taking e lv= 1 - ---
~
SO,
K
e
2
= ~ , K , K > 0, from (3.4) we get that
2V
I
2
V
(7.2)
(7.3)
As in Section 3, the expressions (7.2)-(7.3) remain valid even when we replace $\ell$ by $\ell_\nu$, where $\ell_\nu$ is a sequence of uniformly bounded vectors subject to Assumption 3.3.
From the above expressions it is clear that when we are interested only in the values $\theta=1,0$, for large sample sizes the best choice of $\ell$ in terms of $P_{1\nu}$, $P_{2\nu}$ would be that for which, subject to Assumption 3.3

is maximized, $\theta$ being taken 1 or 0 according to the situation. The problem then becomes the same as that considered in [1] in connection with the estimation of $\theta$ with the restriction that $\theta=0$ or 1. In practice, we may estimate this best $\ell$ by solving the equations (6.7) simultaneously. The corresponding rule (see the discussion after (6.7)) is the rule based on the 'best' estimate of $\theta$ proposed in [1]. Just as in the proof of Theorem 4.2 [1], it can be shown that for this rule $P_{1\nu}$ when $\theta=1$ and $P_{2\nu}$ when $\theta=0$ are maximized in large samples.

8. CONCLUDING REMARKS
In the foregoing sections we have considered solutions to certain classification-type problems specific to the mixture set-up (1.1). In practice, these would be appropriate where we have a priori knowledge of the validity of (1.1). The following questions naturally arise: (i) How is one to test the validity of (1.1)? (ii) What happens if one tries to apply the proposed procedures even though (1.1) does not hold? So far as the author is aware, for ungrouped data, no satisfactory answer to the first question has yet been found even in the univariate case. If the three samples are grouped to give contingency tables based on the same system of cells, (1.1) implies a composite hypothesis involving the cell probabilities, and this can be tested by standard methods. To consider the second question, for simplicity, suppose, as $\nu\to\infty$, subject to Assumption 3.1 $n_{k\nu}/N_\nu$, $k=0,1,2$, converge to some limits. Then write $H(x)$ for the limit of (3.2) and define $\mu^{(k)}$ with respect to it as in (3.3). For a given $\ell$, if $F^{(k)}$, $k=0,1,2$ are such that $\ell'\mu^{(1)} < \ell'\mu^{(0)} < \ell'\mu^{(2)}$, and if we are prepared to treat $|\ell'\mu^{(k)}-\ell'\mu^{(0)}|/(\ell'\mu^{(2)}-\ell'\mu^{(1)})$ as a measure of the distance between $F^{(k)}$ and $F^{(0)}$, $k=1,2$, then clearly the rule (2.5) has some meaning. The asymptotic theory of Sections 4 and 5, however, no longer applies. If we still try to apply the rule (6.6) with $\ell$ determined from (6.7), trouble may arise in that (6.7) may not have real solutions. A real solution would exist with probability approaching 1, if $\mu^{(0)}$ is a convex linear combination of $\mu^{(1)}$ and $\mu^{(2)}$.
We have seen that performance as judged by the ideal ALOC functions (5.11) has an all-round improvement as $\Delta_\nu$ increases. The author has not so far been able to prove the corresponding result for the actual ALOC functions (6.16) of the procedure of Section 6. The difficulty in the proof arises from the fact that no compact expressions for probabilities of the type (6.44) (such as that used for (6.43) in the proof of Lemma 5.1) could be found and the series expressions (6.31)-(6.32) are not of much help. However, as the figures in Table 1 suggest, an increase in the value of the ideal ALOC function is attended by an increase in the value of the corresponding actual ALOC function and thus it seems that we can improve the performance of the procedure of Section 6 by making $\Delta$ larger.
In this paper we have considered rules based on the variate-wise ranks. These may be useful in situations where ranks only are available or dependable. Of course, the model (1.1) is invariant under a transformation of the set of p variates into another set containing any number of variates. So, when the original variate values are at hand we can first apply a suitable transformation and then use the procedures of this paper on the transformed variates. This opens up a wide range of possibilities. Further, it may be noted that, although we have based our rules on rank scores, when the original values are available similar rules based on the values and their transforms can be formulated. In that case, to develop the asymptotic theory we would require conditions (like existence of moments up to a certain order) which would guarantee asymptotic normality, convergence in probability, etc. Investigations into some of these various possibilities have been undertaken in the context of the estimation problem and it is hoped to publish the results later.
ACKNOWLEDGMENT
The author wishes to thank Professor P. K. Sen for helpful discussions.
REFERENCES
[1] Chatterjee, S. K. (1972). Rank approach to the multivariate two-population mixture problem. J. Multivariate Analysis, 261-281.
[2] DasGupta, P. (1968). Two approximations for the distribution of double non-central beta. Sankhya Ser. B, 83-88.
[3] Ferguson, T. S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
[4] Hajek, J. (1968). Asymptotic normality of simple linear rank statistics under alternatives. Ann. Math. Statist., 325-346.
[5] Hoeffding, W. (1968). On the centering of a simple linear rank statistic. University of North Carolina Institute of Statistics Mimeo Series No. 585.
[6] John, S. (1961). Errors in discrimination. Ann. Math. Statist., 1125-1144.
[7] Johnson, N. L., and Kotz, S. (1970). Continuous Univariate Distributions. Houghton Mifflin, Boston.
[8] Owen, D. B. (1956). Tables for computing bivariate normal probabilities. Ann. Math. Statist., 1075-1090.
[9] Owen, D. B. (1962). Handbook of Statistical Tables. Addison Wesley, Reading, Mass.
[10] Puri, M. L., and Sen, P. K. (1969). A class of rank order tests for a general linear hypothesis. Ann. Math. Statist., 659-680.
[11] Ranga Rao, R. (1962). Relations between weak and uniform convergence of measures with applications. Ann. Math. Statist., 659-680.
Table 1
Values of B_x(p; π*, π) (a)

        π*=0.10, π=0.20    π*=10.00, π=20.00   π*=0.60, π=0.80    π*=6.00, π=8.00    π*=0.10, π=8.00
  x     p=1   p=2   p=3    p=1   p=2   p=3     p=1   p=2   p=3    p=1   p=2   p=3    p=1   p=2   p=3
 .10     -     -     -      -     -     -       -     -     -      -     -     -     .608  .383  .238
 .40    .451  .411  .383   .641  .634  .601     -     -     -      -     -     -     .919  .877  .834
 .45    .477  .462  .447   .744  .736  .713    .492  .470  .454   .532  .521  .509   .936  .911  .880
 .50    .515  .512  .510   .823  .821  .807    .526  .521  .518   .606  .602  .599   .950  .936  .916

(a) For the empty cells xπ < (1-x)π*. See Section 6.