Deshpande, J. V. (1967). "A class of tests based on U-statistics for the several sample problem."

A CLASS OF TESTS BASED ON U-STATISTICS
FOR THE SEVERAL SAMPLE PROBLEM

by

Jayant V. Deshpande
University of North Carolina

Institute of Statistics Mimeo Series No. 506
January 1967
This research was supported by the U. S. Air Force Office of Scientific Research, Grant No. AF-AFOSR-760-65.

DEPARTMENT OF STATISTICS
University of North Carolina
Chapel Hill, N. C.
1. Summary and Introduction:
Let $x_{i1}, \ldots, x_{in_i}$ be a sample of real observations from the $i$-th population, with cumulative distribution function (c.d.f.) $F_i(x)$, $i = 1, 2, \ldots, c$. Let the $c$ samples be independent and the $F$'s be continuous. We shall consider tests for the null hypothesis

$$ H_0 : F_1(x) = F_2(x) = \cdots = F_c(x). $$

In this paper we shall, in particular, be interested in tests for the situations when one of the following two assumptions is considered reasonable.

(i) The $F$'s have identical functional form and, if at all, the differences are only in their location parameters, i.e. $F_i(x) = F(x - \mu_i)$, $\mu_i$ being any real number. In such a situation, a test for $H_0$ is, in effect, a test for the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_c$.

(ii) Again assuming identical functional form and assuming the differences, if any, to be among the scale parameters, i.e. $F_i(x) = F(x\theta_i)$, $\theta_i$ being real and positive, a test of $H_0$ reduces to a test of the hypothesis $\theta_1 = \theta_2 = \cdots = \theta_c$.

Let $c$-plets be formed by selecting one observation from each of the $c$ samples. The total number of $c$-plets that can be formed in this way is $\prod_{i=1}^{c} n_i$. Let $v_{ij}$ be the number of $c$-plets in which the observation selected from the $i$-th sample is larger than exactly $(j-1)$ observations (and smaller than the other $(c-j)$ observations). Since the $F$'s are assumed to be continuous, the probability of the existence of ties is zero. Let us define $u_{ij} = v_{ij} / \prod_{i=1}^{c} n_i$. It is seen that these $u$'s are the generalized version [13], to $c$ samples, of Hoeffding's U-statistics [8].
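The definition of $u_{ij}$ can be made concrete with a small computational sketch. The following Python function (the name u_statistics and the brute-force enumeration are illustrative choices, not from the paper) assumes samples small enough that all $\prod n_i$ $c$-plets can be enumerated directly.

```python
import itertools
import numpy as np

def u_statistics(samples):
    """Compute u_ij by direct enumeration of all c-plets.

    samples: list of 1-D arrays, one per population (assumed tie-free).
    Returns a c x c array u with u[i, j] = v_ij / prod(n_i), where v_ij
    counts the c-plets in which the observation drawn from sample i is
    larger than exactly j of the other c - 1 observations.
    """
    c = len(samples)
    counts = np.zeros((c, c))
    for cplet in itertools.product(*samples):      # one observation per sample
        ranks = np.argsort(np.argsort(cplet))      # 0 = smallest, c-1 = largest
        for i, r in enumerate(ranks):
            counts[i, r] += 1
    total = np.prod([len(s) for s in samples])
    return counts / total

# Example with three small samples drawn under H_0
rng = np.random.default_rng(0)
samples = [rng.normal(loc=m, size=5) for m in (0.0, 0.0, 0.0)]
u = u_statistics(samples)
print(u)            # each row sums to 1; under H_0 every entry is near 1/c
```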
Let us have $N = \sum_{i=1}^{c} n_i$, $p_i = n_i/N$, $L_i = \sum_{j=1}^{c} a_j u_{ij}$, where the $a$'s are real constants (not all equal), and

(1.1)  $\displaystyle A = \sum_{j=1}^{c} \sum_{\ell=1}^{c} a_j a_\ell \left[ \frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2} \right].$

Then we define a class of statistics

$$ \mathcal{L} = \frac{N(c-1)^2}{A c^2} \left[ \sum_{i=1}^{c} p_i L_i^2 - \Big( \sum_{i=1}^{c} p_i L_i \Big)^2 \right]. $$

With each of the members of this class we associate a test: reject $H_0$ at a significance level $\alpha$ if $\mathcal{L}$ exceeds a predetermined constant $\mathcal{L}_\alpha$. We, later in this paper, show that under $H_0$, $\mathcal{L}$ is distributed as $\chi^2$ with $c-1$ degrees of freedom in the limit as $N \to \infty$. Hence, for sufficiently large $N$, $\mathcal{L}_\alpha$ may be approximated by the corresponding significance point of the relevant $\chi^2$ distribution.
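Assuming the brute-force u_statistics sketch above, the statistic $\mathcal{L}$ and its limiting $\chi^2_{c-1}$ significance point can be illustrated as follows; deshpande_L is an illustrative name, and the weights in the example are only one admissible choice.

```python
import numpy as np
from scipy.special import comb
from scipy import stats

def deshpande_L(samples, a):
    """Compute the statistic of (1.1)/(3.5) for given weights a (a sketch).

    a: sequence of c real constants, not all equal.
    Returns (value, p_value) using the limiting chi-square(c-1) law.
    Relies on u_statistics() from the previous sketch.
    """
    c = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    N, p = n.sum(), n / n.sum()
    a = np.asarray(a, dtype=float)

    u = u_statistics(samples)
    L = u @ a                            # L_i = sum_j a_j u_ij

    # A = sum_{j,l} a_j a_l [ C(c-1,j-1)C(c-1,l-1)/(C(2c-2,j+l-2)(2c-1)) - 1/c^2 ]
    j = np.arange(1, c + 1)
    Q = (comb(c - 1, j[:, None] - 1) * comb(c - 1, j[None, :] - 1)
         / (comb(2 * c - 2, j[:, None] + j[None, :] - 2) * (2 * c - 1)))
    A = a @ (Q - 1.0 / c**2) @ a

    value = N * (c - 1)**2 / (c**2 * A) * (np.sum(p * L**2) - np.sum(p * L)**2)
    return value, stats.chi2.sf(value, df=c - 1)

# e.g. the weights (1, 0, -1) that section 6 associates with the author's L test for c = 3
stat, pval = deshpande_L(samples, a=[1, 0, -1])
```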
Tests proposed by Bhapkar [2], [3], Sugiura [12], and the author [5], [6] may be seen to be members of this class. These tests have been proposed on heuristic grounds and on considerations of mathematical simplicity (at least, so far as the tests proposed by the present author are concerned). In this paper it is attempted to give a unified treatment of the asymptotic distribution of all the $u_{ij}$ ($i, j = 1, 2, \ldots, c$) and to explore tests based on them for the null hypothesis $H_0$ against alternative hypotheses of interest. The earlier test statistics, mentioned above, were constructed taking into account the magnitudes of the $u$'s under the null hypothesis and under the alternative hypothesis of interest; the $a$'s were selected in such a way as to emphasize the difference between the two. In this paper we shall obtain $a$'s which maximize the noncentrality parameter of the noncentral $\chi^2$ distribution which $\mathcal{L}$ has under alternative hypotheses. Under the null hypothesis the noncentrality parameter is, of course, zero. Since the maximum of the noncentrality parameter depends on $F$, this approach will lead us to the test which will have maximum asymptotic relative efficiency (in the Pitman sense) amongst this class of tests against a specified alternative for a particular $F$. This test may be used if the particular $F$ is suspected as likely.

We find a condition involving the $a$'s and the $F$'s for the consistency of tests in this class against specified alternatives. $\mathcal{L}$ is shown to have, asymptotically, a noncentral $\chi^2$ distribution with $c-1$ degrees of freedom under certain specified alternative hypotheses. The $a$'s which give the maximum of the noncentrality parameter for a given alternative hypothesis are found. These $a$'s depend on the original distribution $F$. The tests with these $a$'s defining $L_i$ will have maximum asymptotic relative efficiency (ARE) amongst this class of tests, against the specified alternative and for the particular distribution. Some ARE's of members of this class with respect to Kruskal's H test [9] are computed.
2. Distribution of $u_{ij}$ under $H_0$:

In this section we prove

Theorem 2.1: Let $X_{ij}$, $i = 1, 2, \ldots, c$, $j = 1, 2, \ldots, n_i$, be independent random variables with continuous c.d.f.'s $F_i(x)$. Then if $F_1(x) = \cdots = F_c(x) = F(x)$, the distribution of $w_{ij} = N^{1/2}(u_{ij} - \tfrac{1}{c})$, in the limit as $N \to \infty$ in such a way that the $p_i$ ($i = 1, 2, \ldots, c$) remain fixed, is normal with mean zero and the elements of the covariance matrix $\Sigma$ given by

(2.1)
$$ \sigma_{ij,i\ell} = \frac{1}{(c-1)^2}\left[\frac{(c-1)^2}{p_i} + \sum_{r \neq i} \frac{1}{p_r}\right]\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right] $$

and

$$ \sigma_{ij,k\ell} = \frac{1}{(c-1)^2}\left[\sum_{r=1}^{c} \frac{1}{p_r} - \frac{c}{p_i} - \frac{c}{p_k}\right]\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right], \qquad i \neq k, $$

$u_{ij}$, $N$ and $p_i$ being the same as defined in section 1.
Proof: Let us define

$$ \varphi_{ij}(x_{1t_1}, x_{2t_2}, \ldots, x_{ct_c}) = \begin{cases} 1 & \text{if } x_{it_i} \text{ is larger than exactly } (j-1) \text{ of the other } x\text{'s}, \\ 0 & \text{otherwise.} \end{cases} $$

Then it is seen that

(2.2)  $\displaystyle u_{ij} = \Big(\prod_{r=1}^{c} n_r\Big)^{-1} \sum_{t_1=1}^{n_1} \cdots \sum_{t_c=1}^{n_c} \varphi_{ij}(x_{1t_1}, \ldots, x_{ct_c}) \qquad \text{for } i, j = 1, 2, \ldots, c.$

Obviously the $u_{ij}$ are U-statistics generalized to the $c$-sample case. They, therefore, under $H_0$, have asymptotically, as $N \to \infty$ in such a way that the $p_i$ remain constant, a multivariate normal distribution with

(2.3)  $\displaystyle \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell}) = \sum_{r=1}^{c} \frac{1}{p_r}\, \xi^{(r)}_{ij,k\ell}, \qquad \text{where}$

$$ \xi^{(r)}_{ij,k\ell} = E\big[\varphi_{ij}(X_1, \ldots, X_r, \ldots, X_c)\,\varphi_{k\ell}(X_1', \ldots, X_r, \ldots, X_c')\big] - \theta_{ij}\theta_{k\ell}, $$

the two $c$-plets of arguments having only the observation $X_r$ in common, and $\theta_{ij} = E[\varphi_{ij}]$.
We evaluate these quantities. First,

$$ \theta_{ij} = E\big[\varphi_{ij}(X_{1t_1}, X_{2t_2}, \ldots, X_{ct_c})\big] = \operatorname{Prob}\big[X_{it_i} \text{ is larger than } (j-1)\ X\text{'s and smaller than } (c-j)\ X\text{'s}\big] = \frac{1}{c}. $$

Next,

$$ \xi^{(i)}_{ij,i\ell} = E\big[\varphi_{ij}(X_1, \ldots, X_i, \ldots, X_c)\,\varphi_{i\ell}(X_1', \ldots, X_i, \ldots, X_c')\big] - \frac{1}{c^2} = \operatorname{Prob}\big[X_i \text{ is larger than } (j-1)\ X\text{'s and } (\ell-1)\ X'\text{'s and smaller than } (c-j)\ X\text{'s and } (c-\ell)\ X'\text{'s}\big] - \frac{1}{c^2}, $$

so that, evaluating the Beta integral $\int_0^1 u^{j+\ell-2}(1-u)^{2c-j-\ell}\,du$,

(2.4)  $\displaystyle \xi^{(i)}_{ij,i\ell} = \frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}.$

For $r \neq i, k$,

$$ \xi^{(r)}_{ij,k\ell} = E\big[\varphi_{ij}(X_1, \ldots, X_r, \ldots, X_c)\,\varphi_{k\ell}(X_1', \ldots, X_r, \ldots, X_c')\big] - \frac{1}{c^2} = E\big[(I_1 + I_2)(I_3 + I_4)\big] - \frac{1}{c^2}, \text{ say,} $$

where

$$ I_1 = \int_{-\infty}^{X_r} \binom{c-2}{j-1}[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dF(x), \qquad I_2 = \int_{X_r}^{\infty} \binom{c-2}{j-2}[F(x)]^{j-2}[1-F(x)]^{c-j}\,dF(x), $$

and $I_3$, $I_4$ are the corresponding integrals with $\ell$ in place of $j$. It is seen that

$$ I_1 + I_2 = \frac{1}{c-1}\Big[1 - \binom{c-1}{j-1}[F(X_r)]^{j-1}[1-F(X_r)]^{c-j}\Big], \qquad I_3 + I_4 = \frac{1}{c-1}\Big[1 - \binom{c-1}{\ell-1}[F(X_r)]^{\ell-1}[1-F(X_r)]^{c-\ell}\Big], $$

and hence

(2.5)  $\displaystyle \xi^{(r)}_{ij,k\ell} = \frac{1}{(c-1)^2}\int_{-\infty}^{\infty}\Big[1 - \binom{c-1}{j-1}F^{j-1}(1-F)^{c-j}\Big]\Big[1 - \binom{c-1}{\ell-1}F^{\ell-1}(1-F)^{c-\ell}\Big]\,dF(x) - \frac{1}{c^2} = \frac{1}{(c-1)^2}\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right], \qquad r \neq i, k.$
And lastly,

(2.6)  $\displaystyle \xi^{(i)}_{ij,k\ell} = E\left[\binom{c-1}{j-1}[F(X_i)]^{j-1}[1-F(X_i)]^{c-j}\left\{\int_{-\infty}^{X_i}\binom{c-2}{\ell-1}[F(x)]^{\ell-1}[1-F(x)]^{c-\ell-1}\,dF(x) + \int_{X_i}^{\infty}\binom{c-2}{\ell-2}[F(x)]^{\ell-2}[1-F(x)]^{c-\ell}\,dF(x)\right\}\right] - \frac{1}{c^2}$

$$ = \frac{1}{c-1}\int_{-\infty}^{\infty}\left\{\binom{c-1}{j-1}[F(x)]^{j-1}[1-F(x)]^{c-j} - \binom{c-1}{j-1}\binom{c-1}{\ell-1}[F(x)]^{j+\ell-2}[1-F(x)]^{2c-j-\ell}\right\}dF(x) - \frac{1}{c^2} $$

$$ = \frac{1}{c-1}\left[\frac{1}{c} - \frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)}\right] - \frac{1}{c^2}, \qquad i \neq k. $$

It is seen that $\xi^{(i)}_{ij,k\ell} = \xi^{(k)}_{ij,k\ell}$.
From (2.3) we write

$$ \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{i\ell}) = \frac{1}{p_i}\,\xi^{(i)}_{ij,i\ell} + \sum_{r \neq i} \frac{1}{p_r}\,\xi^{(r)}_{ij,i\ell}. $$

Substituting from (2.4) and (2.5) we obtain

(2.7)  $\displaystyle \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{i\ell}) = \frac{1}{(c-1)^2}\left[\frac{(c-1)^2}{p_i} + \sum_{r \neq i}\frac{1}{p_r}\right]\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right].$

Similarly we write, using (2.5) and (2.6),

(2.8)  $\displaystyle \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell}) = \frac{1}{(c-1)^2}\left[\sum_{r=1}^{c}\frac{1}{p_r} - \frac{c}{p_i} - \frac{c}{p_k}\right]\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right], \qquad i \neq k.$

Hence we conclude that $w_{ij}$ is, in the limit, distributed normally with zero means and the elements of the covariance matrix given by (2.1).
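Theorem 2.1 lends itself to a rough Monte Carlo check. The sketch below (illustrative names; it reuses u_statistics from the section 1 sketch) compares an empirical $N\,\operatorname{Cov}(u_{ij}, u_{k\ell})$ under $H_0$ with the corresponding element of (2.1); the agreement is only approximate at the small sample sizes used.

```python
import numpy as np
from scipy.special import comb

def sigma_2_1(c, p, i, j, k, l):
    """sigma_{ij,kl} of (2.1); i, k index samples (1-based), j, l index ranks (1-based)."""
    Q = comb(c-1, j-1) * comb(c-1, l-1) / (comb(2*c-2, j+l-2) * (2*c-1)) - 1.0 / c**2
    if i == k:
        fac = ((c-1)**2 / p[i-1] + (1.0/p).sum() - 1.0/p[i-1]) / (c-1)**2
    else:
        fac = ((1.0/p).sum() - c/p[i-1] - c/p[k-1]) / (c-1)**2
    return fac * Q

# Rough check under H_0 (small sizes keep the brute-force enumeration cheap)
c, n_per, reps = 3, 6, 2000
rng = np.random.default_rng(1)
us = np.array([u_statistics([rng.normal(size=n_per) for _ in range(c)]) for _ in range(reps)])
N, p = c * n_per, np.full(c, 1.0 / c)
print(N * np.cov(us[:, 0, 0], us[:, 0, 1])[0, 1], sigma_2_1(c, p, 1, 1, 1, 2))  # same sample
print(N * np.cov(us[:, 0, 0], us[:, 1, 0])[0, 1], sigma_2_1(c, p, 1, 1, 2, 1))  # different samples
```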
3. The statistic $\mathcal{L}$ and the class of tests.

Let us consider linear forms $L_i$ of the $u_{ij}$ for $i = 1, 2, \ldots, c$. We define

(3.1)  $\displaystyle L_i = \sum_{j=1}^{c} a_j u_{ij}.$

It is assumed that the $a_j$ are real constants and are not all equal. Under $H_0$,

$$ E(L_i) = \sum_{j=1}^{c} a_j E(u_{ij}) = \frac{1}{c}\sum_{j=1}^{c} a_j = \bar{a}, \text{ say,} $$

for $i = 1, 2, \ldots, c$. Let $\Lambda_{ik} = \lim_{N \to \infty} N \operatorname{Cov}(L_i, L_k)$ for $i, k = 1, 2, \ldots, c$. Then

(3.2)  $\displaystyle \Lambda_{ii} = \sum_{j=1}^{c}\sum_{\ell=1}^{c} a_j a_\ell\, \sigma_{ij,i\ell} = \frac{1}{(c-1)^2}\left[\frac{(c-1)^2}{p_i} + \sum_{r \neq i}\frac{1}{p_r}\right]\sum_{j=1}^{c}\sum_{\ell=1}^{c} a_j a_\ell\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right]$

and

(3.3)  $\displaystyle \Lambda_{ik} = \sum_{j=1}^{c}\sum_{\ell=1}^{c} a_j a_\ell\, \sigma_{ij,k\ell} = \frac{1}{(c-1)^2}\left[\sum_{r=1}^{c}\frac{1}{p_r} - \frac{c}{p_i} - \frac{c}{p_k}\right]\sum_{j=1}^{c}\sum_{\ell=1}^{c} a_j a_\ell\left[\frac{\binom{c-1}{j-1}\binom{c-1}{\ell-1}}{\binom{2c-2}{j+\ell-2}(2c-1)} - \frac{1}{c^2}\right], \qquad i \neq k.$

(3.2) and (3.3) can be rewritten, using the definition of $A$ given in (1.1), as

(3.4)  $\displaystyle \Lambda_{ii} = \frac{A}{(c-1)^2}\left[\frac{(c-1)^2}{p_i} + \sum_{r \neq i}\frac{1}{p_r}\right] \qquad \text{and} \qquad \Lambda_{ik} = \frac{A}{(c-1)^2}\left[\sum_{r=1}^{c}\frac{1}{p_r} - \frac{c}{p_i} - \frac{c}{p_k}\right], \quad i \neq k.$
Hence we conclude that $N^{1/2}(\underline{L} - \bar{a}\,\underline{J})$ has, in the limit as $N \to \infty$, a multivariate normal distribution with mean vector $\underline{0}$ and covariance matrix $\Lambda$. This multivariate distribution is singular; in fact it may, trivially, be observed that $\sum_{i=1}^{c} L_i = \sum_{j=1}^{c} a_j = c\bar{a}$. We consider the distribution of $N^{1/2}(\underline{L}_0 - \bar{a}\,\underline{J}_0)$. It is nonsingular, with mean $\underline{0}$ and covariance matrix $\Lambda_0$. Here $\underline{L}_0' = (L_1, \ldots, L_{c-1})_{1 \times (c-1)}$, $\underline{J}_0' = (1, 1, \ldots, 1)_{1 \times (c-1)}$, $\underline{0}' = (0, \ldots, 0)_{1 \times (c-1)}$ and $\Lambda_0 = (\Lambda_{ik})$, $i, k = 1, 2, \ldots, c-1$.

Therefore $\mathcal{L} = N(\underline{L}_0 - \bar{a}\,\underline{J}_0)'\,\Lambda_0^{-1}\,(\underline{L}_0 - \bar{a}\,\underline{J}_0)$ is distributed, under the null hypothesis as $N \to \infty$, as a $\chi^2$ variate with $c-1$ degrees of freedom. Following Bhapkar [2], we simplify and obtain

(3.5)  $\displaystyle \mathcal{L} = \frac{(c-1)^2 N}{c^2 A}\left[\sum_{i=1}^{c} p_i L_i^2 - \Big(\sum_{i=1}^{c} p_i L_i\Big)^2\right].$

We have proved the following theorem:

Theorem 3.1: If $F_1 = F_2 = \cdots = F_c$ and $n_i = N p_i$, where the $p_i$ are fixed numbers such that $\sum p_i = 1$, then the statistic $\mathcal{L}$, as defined in (3.5) above, for any real $a_j$ such that they are not all equal, has a limiting $\chi^2$ distribution with $c-1$ degrees of freedom.

It may be noted that Bhapkar's V [2] and W [3] statistics, Sugiura's $V_{rs}$ and $D_{rs}$ [12] statistics and the L [5] and D [6] statistics proposed by the author are obtained from $\mathcal{L}$ by choosing appropriate $a_j$.
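The equality of the quadratic form $N(\underline{L}_0 - \bar a\,\underline{J}_0)'\Lambda_0^{-1}(\underline{L}_0 - \bar a\,\underline{J}_0)$ and the simplified expression (3.5) can be checked numerically. The sketch below builds $\Lambda_0$ from (3.4) and again relies on the earlier u_statistics helper; all names are illustrative.

```python
import numpy as np
from scipy.special import comb

def quadratic_form_identity(samples, a):
    """Return (quadratic form, closed form (3.5)); the two agree up to rounding."""
    c = len(samples)
    n = np.array([len(s) for s in samples], dtype=float)
    N, p = n.sum(), n / n.sum()
    a = np.asarray(a, dtype=float)
    abar = a.mean()

    u = u_statistics(samples)
    L = u @ a

    j = np.arange(1, c + 1)
    Q = (comb(c-1, j[:, None]-1) * comb(c-1, j[None, :]-1)
         / (comb(2*c-2, j[:, None]+j[None, :]-2) * (2*c-1)))
    A = a @ (Q - 1.0 / c**2) @ a

    # Lambda0 from (3.4), for i, k = 1, ..., c-1
    S = (1.0 / p).sum()
    Lam0 = np.empty((c-1, c-1))
    for i in range(c-1):
        for k in range(c-1):
            if i == k:
                Lam0[i, k] = A / (c-1)**2 * ((c-1)**2 / p[i] + S - 1.0 / p[i])
            else:
                Lam0[i, k] = A / (c-1)**2 * (S - c / p[i] - c / p[k])
    x = L[:c-1] - abar
    qform = N * x @ np.linalg.solve(Lam0, x)
    closed = (c-1)**2 * N / (c**2 * A) * (np.sum(p * L**2) - np.sum(p * L)**2)
    return qform, closed

print(quadratic_form_identity(samples, a=[1, 0, -1]))
```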
4. Consistency of tests based on $\mathcal{L}$:

In this section we give a condition for the consistency of tests based on $\mathcal{L}$, using a lemma of Bhapkar [2] which we quote below.

Lemma (Bhapkar): Let $\eta^{(i)} = f^{(i)}(F_1, F_2, \ldots, F_c)$, $i = 1, 2, \ldots, g$, be real valued functions such that $f^{(i)}(F, F, \ldots, F) = \eta_0^{(i)}$ for all $(F, F, \ldots, F)$ in a class $C_0$. Let $T^{(i)}_{n_1, n_2, \ldots, n_c} = t^{(i)}(X_{11}, \ldots, X_{1n_1}; \ldots; X_{c1}, \ldots, X_{cn_c})$, $i = 1, 2, \ldots, g$, be a sequence of real valued statistics such that $T^{(i)}_{n_1, n_2, \ldots, n_c}$ tends to $\eta^{(i)}$ in probability as $\min(n_1, n_2, \ldots, n_c) \to \infty$. Suppose that at least one $f^{(i)}(F_1, F_2, \ldots, F_c) \neq \eta_0^{(i)}$ for all $F_1, F_2, \ldots, F_c$ in a class $C_1$. Further let $W_{n_1, n_2, \ldots, n_c} = w(T^{(1)}_{n_1, \ldots, n_c}, \ldots, T^{(g)}_{n_1, \ldots, n_c})$ be a non negative function which is zero if and only if $T^{(i)}_{n_1, \ldots, n_c} = \eta_0^{(i)}$ for all $i$. Then the sequence of tests which rejects when $W_{n_1, n_2, \ldots, n_c} > d_{n_1, n_2, \ldots, n_c}$ is consistent for testing $H: C_0$ at every fixed level of significance against the alternative $H_1: C_1$.

Let us take

$$ \eta^{(i)} = \sum_{j=1}^{c} a_j\, P\big[X_i \text{ is greater than } (j-1)\ X\text{'s and smaller than } (c-j)\ X\text{'s}\big], $$

where the $X_i$ are independent random variables with distribution functions $F_i(x)$, $i = 1, 2, \ldots, c$. Let $T^{(i)}_{n_1, n_2, \ldots, n_c} = L_i$ for $i = 1, 2, \ldots, c$. Then the convergence in probability of $L_i$ to $\eta^{(i)}$ as $\min(n_1, n_2, \ldots, n_c) \to \infty$ follows, since $N^{1/2}(L_i - \eta^{(i)})$ has, in the limit, a normal distribution with mean $0$ and finite variance. We have

(4.1)  $\displaystyle \eta_0^{(i)} = \sum_{j=1}^{c} a_j\, E(\varphi_{ij} \mid H_0) = \frac{1}{c}\sum_{j=1}^{c} a_j = \bar{a}. $

For the class $C_1$ we have

(4.2)  $\displaystyle \eta^{(i)} = \sum_{j=1}^{c} a_j \sum{}^{*} \int_{-\infty}^{\infty} \Big\{\prod_{(j-1)\ \text{terms}} F_r(x) \prod_{(c-j)\ \text{terms}} [1 - F_s(x)]\Big\}\, dF_i(x). $

Here $\sum^{*}$ indicates summation over all possible choices of $(j-1)$ $F$'s out of the $(c-1)$ $F$'s (all except $F_i$).

Hence we conclude that tests of the type which reject $H: F_1 = F_2 = \cdots = F_c$ if $\mathcal{L} > \mathcal{L}_\alpha$ are consistent for all $F_i$, $i = 1, 2, \ldots, c$, such that $\eta^{(i)}$ defined by (4.2) is different from $\bar{a}$ for at least one $i$. It may be noted that $\mathcal{L}$ is a nonnegative function of the $L_i$ and is equal to zero only when $L_i = \bar{a}$ for each $i$. This condition does not appear to be too restrictive.
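Since $\eta^{(i)}$ is simply the expectation of $a_{R_i}$, where $R_i$ is the rank of $X_i$ in an independent $c$-tuple drawn from $F_1, \ldots, F_c$, the consistency condition can be examined by simulation. A minimal sketch, with illustrative names and an assumed normal-shift example:

```python
import numpy as np

def eta_monte_carlo(samplers, a, reps=50_000, seed=0):
    """Monte Carlo estimate of eta^(i) of (4.2).

    samplers: list of c functions; samplers[i](rng, n) returns n draws from F_i.
    Returns (eta^(1), ..., eta^(c)); under H_0 every entry equals a_bar = mean(a),
    and the tests based on L are consistent whenever some entry differs from a_bar.
    """
    rng = np.random.default_rng(seed)
    c = len(samplers)
    a = np.asarray(a, dtype=float)
    x = np.column_stack([s(rng, reps) for s in samplers])
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)   # 0 .. c-1 within each row
    return np.array([a[ranks[:, i]].mean() for i in range(c)])

# Example: two identical populations and one shifted one
samplers = [lambda g, n, m=m: g.normal(m, 1.0, n) for m in (0.0, 0.0, 1.0)]
print(eta_monte_carlo(samplers, a=[1, 0, -1]))   # entries move away from a_bar = 0
```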
5. Distribution of $u_{ij}$ and $\mathcal{L}$ under alternative hypotheses:

In this section we derive the limiting distribution of $u_{ij}$ and $\mathcal{L}$ under the following two sequences of alternative hypotheses:

(5.1)  $H_{L_n}: F_i(x) = F(x - n^{-1/2}\theta_i)$   and

(5.2)  $H_{S_n}: F_i(x) = F\big(x(1 + n^{-1/2}\delta_i)\big)$.

Here $n$ is given by the relation $n_i = n s_i$, where the $s_i$ are fixed integers, all the $\theta_i$ are not equal, $\delta_i > 0$ for each $i$, and all the $\delta_i$ are not equal.
Theorem 5.1: (a) $w_{ij}$, as defined in Theorem 2.1, has jointly, in the limit as $n \to \infty$ under $H_{L_n}$, a multivariate normal distribution with means

$$ \mu_{ij} = \Big(\sum_{i=1}^{c} s_i\Big)^{1/2} \sum_{r=1}^{c} (\theta_r - \theta_i)\left\{\binom{c-2}{j-1}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx - \binom{c-2}{j-2}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx\right\} $$

and elements of the covariance matrix given by (2.1), under the following two conditions:

(i) $F$ is absolutely continuous with derivative $f$, and

(ii) there exists a function $g$ such that $\big|[f(y+h) - f(y)]/h\big| \leq g(y)$ for small $h$ and $\int_{-\infty}^{\infty} g(y) f(y)\,dy < \infty$.

(b) Under $H_{L_n}$ and the above two conditions, $\mathcal{L}$ has, in the limit as $n \to \infty$, a noncentral $\chi^2$ distribution with $c-1$ degrees of freedom and noncentrality parameter given by

(5.4)  $\displaystyle \mu_{L_n} = \frac{(\underline{a}'\underline{b})^2}{A}\sum_{i=1}^{c} s_i(\theta_i - \bar{\theta})^2, $

where $\bar{\theta} = \sum_i s_i\theta_i / \sum_i s_i$, $\underline{b}' = (b_1, \ldots, b_c)$ with

$$ b_j = \binom{c-1}{j-1}\left\{(c-j)\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx - (j-1)\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx\right\}, $$

and $\underline{a}' = (a_1, \ldots, a_c)$ and $A$ are as defined in section 1.
Proof. (a) Under $H_{L_n}$,

$$ E(u_{ij}) = \sum{}' \int_{-\infty}^{\infty} \prod_{(j-1)\ \text{terms}} F(x - n^{-1/2}\theta_r) \prod_{(c-j)\ \text{terms}} \big[1 - F(x - n^{-1/2}\theta_k)\big]\, dF(x - n^{-1/2}\theta_i) $$

(in the above expression $\sum'$ indicates summation over all possible choices of $(j-1)$ $F$'s out of the $(c-1)$ $F$'s, all except $F_i$)

$$ = \sum{}' \int_{-\infty}^{\infty} \prod_{(j-1)\ \text{terms}} F(x - n^{-1/2}\theta_r + n^{-1/2}\theta_i) \prod_{(c-j)\ \text{terms}} \big[1 - F(x - n^{-1/2}\theta_k + n^{-1/2}\theta_i)\big]\, dF(x). $$

We expand $F$ in the above expression in a Taylor series around $x$ up to the first term and the remainder. Then, using condition (ii) of the theorem, we may write

$$ E(u_{ij}) = \sum{}' \int_{-\infty}^{\infty} \Big\{[F(x)]^{j-1}[1-F(x)]^{c-j} - n^{-1/2} f(x)[F(x)]^{j-2}[1-F(x)]^{c-j} \sum_{(j-1)\ \text{terms}} (\theta_r - \theta_i) + n^{-1/2} f(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1} \sum_{(c-j)\ \text{terms}} (\theta_k - \theta_i)\Big\}\, dF(x) + O(n^{-1}) $$

$$ = \frac{1}{c} + n^{-1/2} \sum_{r=1}^{c} (\theta_r - \theta_i)\left\{\binom{c-2}{j-1}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx - \binom{c-2}{j-2}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx\right\} + O(n^{-1}). $$

Proceeding on exactly similar lines and using conditions (i) and (ii) of the theorem, we see that

$$ \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell} \mid H_{L_n}) = \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell} \mid H_0). $$

Hence part (a) of the theorem follows.

(b) It follows that $\mathcal{L}$, in the limit as $n \to \infty$, is distributed as noncentral $\chi^2$ with $c-1$ degrees of freedom and noncentrality parameter

$$ \frac{(c-1)^2}{A} \sum_{i=1}^{c} s_i(\theta_i - \bar{\theta})^2 \left[\sum_{j=1}^{c} a_j \left\{\binom{c-2}{j-1}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx - \binom{c-2}{j-2}\int_{-\infty}^{\infty} f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx\right\}\right]^2, $$

which further simplifies to (5.4).
Theorem 5.2: (a) $w_{ij}$, as defined in Theorem 2.1, has jointly, in the limit as $n \to \infty$ under $H_{S_n}$, a multivariate normal distribution with means

$$ \mu_{ij} = \Big(\sum_{i=1}^{c} s_i\Big)^{1/2} \sum_{r=1}^{c} (\delta_r - \delta_i)\left\{\binom{c-2}{j-2}\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx - \binom{c-2}{j-1}\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx\right\} $$

and the elements of the covariance matrix given by (2.1), under the following conditions:

(i) $F$ is absolutely continuous with derivative $f$.

(ii) There exists a function $g$ such that $\big|[f(x) - f(x+hx)]/h\big| \leq g(x)$ for small $h$ and $\int_{-\infty}^{\infty} [x\, g(x)]^i f(x)\,dx < \infty$ for $i = 1, 2, \ldots, 2c-1$.

(iii) There exists a constant $M < \infty$ such that $P_F\big[\,|x f(x)| < M\,\big] = 1$.

(b) Under $H_{S_n}$ and the above three conditions, $\mathcal{L}$ has, in the limit as $n \to \infty$, a noncentral $\chi^2$ distribution with $c-1$ degrees of freedom and noncentrality parameter given by

$$ \mu_{S_n} = \frac{(\underline{a}'\underline{d})^2}{A}\sum_{i=1}^{c} s_i(\delta_i - \bar{\delta})^2, $$

where $\underline{d}' = (d_1, \ldots, d_c)$ with

$$ d_j = \binom{c-1}{j-1}\left\{(j-1)\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx - (c-j)\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx\right\}, $$

$\bar{\delta} = \sum_i s_i\delta_i / \sum_i s_i$, and $\underline{a}$ and $A$ are the same as in Theorem 5.1.
Proof. (a) Under $H_{S_n}$,

$$ E(u_{ij}) = \sum{}' \int_{-\infty}^{\infty} \prod_{(j-1)\ \text{terms}} F_r(x) \prod_{(c-j)\ \text{terms}} [1 - F_k(x)]\, dF_i(x) $$

($\sum'$ indicates summation as in Theorem 5.1)

$$ = \sum{}' \int_{-\infty}^{\infty} \prod_{(j-1)\ \text{terms}} F\big(x(1 + n^{-1/2}\delta_r)(1 + n^{-1/2}\delta_i)^{-1}\big) \prod_{(c-j)\ \text{terms}} \Big[1 - F\big(x(1 + n^{-1/2}\delta_k)(1 + n^{-1/2}\delta_i)^{-1}\big)\Big]\, dF(x). $$

We now expand $F$ in a Taylor series around $x$ up to the first term and the remainder. This is permissible in view of condition (i) of the theorem. Then, using conditions (ii) and (iii), we write

$$ E(u_{ij}) = \sum{}' \int_{-\infty}^{\infty} \Big\{[F(x)]^{j-1}[1-F(x)]^{c-j} + n^{-1/2} x f(x)[F(x)]^{j-2}[1-F(x)]^{c-j} \sum_{(j-1)\ \text{terms}} (\delta_r - \delta_i)(1 + n^{-1/2}\delta_i)^{-1} - n^{-1/2} x f(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1} \sum_{(c-j)\ \text{terms}} (\delta_k - \delta_i)(1 + n^{-1/2}\delta_i)^{-1}\Big\}\, dF(x) + O(n^{-1}). $$

After a lengthy derivation on similar lines we obtain that

$$ \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell} \mid H_{S_n}) = \lim_{N \to \infty} N \operatorname{Cov}(u_{ij}, u_{k\ell} \mid H_0). $$

Hence part (a) of the theorem is proved.

(b) It follows easily that $\mathcal{L}$, in the limit as $n \to \infty$, is distributed as noncentral $\chi^2$ with $c-1$ degrees of freedom and a noncentrality parameter which simplifies to

$$ \frac{(c-1)^2}{A} \sum_{i=1}^{c} s_i(\delta_i - \bar{\delta})^2 \left[\sum_{j=1}^{c} a_j \left\{\binom{c-2}{j-2}\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-2}[1-F(x)]^{c-j}\,dx - \binom{c-2}{j-1}\int_{-\infty}^{\infty} x f^2(x)[F(x)]^{j-1}[1-F(x)]^{c-j-1}\,dx\right\}\right]^2, $$

which can be seen to be equal to

$$ \mu_{S_n} = \frac{(\underline{a}'\underline{d})^2}{A}\sum_{i=1}^{c} s_i(\delta_i - \bar{\delta})^2. $$
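The vectors $\underline{b}$ and $\underline{d}$ entering the two noncentrality parameters are one-dimensional integrals and can be evaluated by quadrature. A sketch, assuming a density f with c.d.f. F supplied by the user (names are illustrative):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb
from scipy.stats import norm

def b_and_d(f, F, c):
    """Numerical b_j and d_j of Theorems 5.1 and 5.2 for a density f with c.d.f. F."""
    b, d = np.zeros(c), np.zeros(c)
    for j in range(1, c + 1):
        if c - j > 0:    # these integrals enter only with weight (c - j)
            i1 = quad(lambda x: f(x)**2 * F(x)**(j-1) * (1-F(x))**(c-j-1), -np.inf, np.inf)[0]
            k1 = quad(lambda x: x * f(x)**2 * F(x)**(j-1) * (1-F(x))**(c-j-1), -np.inf, np.inf)[0]
        else:
            i1 = k1 = 0.0
        if j - 1 > 0:    # these integrals enter only with weight (j - 1)
            i2 = quad(lambda x: f(x)**2 * F(x)**(j-2) * (1-F(x))**(c-j), -np.inf, np.inf)[0]
            k2 = quad(lambda x: x * f(x)**2 * F(x)**(j-2) * (1-F(x))**(c-j), -np.inf, np.inf)[0]
        else:
            i2 = k2 = 0.0
        b[j-1] = comb(c-1, j-1) * ((c-j) * i1 - (j-1) * i2)
        d[j-1] = comb(c-1, j-1) * ((j-1) * k2 - (c-j) * k1)
    return b, d

# Example: standard normal, c = 3; b is proportional to (1, 0, -1) and d to (1, -2, 1)
b, d = b_and_d(norm.pdf, norm.cdf, c=3)
print(np.round(b, 4), np.round(d, 4))

# (a'b)^2 / A is the factor multiplying sum_i s_i (theta_i - theta_bar)^2 in (5.4)
a = np.array([1.0, 0.0, -1.0])
jj = np.arange(1, 4)
Q = comb(2, jj[:, None]-1) * comb(2, jj[None, :]-1) / (comb(4, jj[:, None]+jj[None, :]-2) * 5)
A = a @ (Q - 1.0/9.0) @ a
print((a @ b)**2 / A)
```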
6. Asymptotic Relative Efficiency:

We know from Hannan's [7] and Andrews's [1] work that the asymptotic relative efficiency (ARE), in the Pitman sense, of one test with respect to another is equal to the ratio of the noncentrality parameters of the two test statistics, provided that they are asymptotically distributed as noncentral $\chi^2$ variates with the same degrees of freedom under the given sequence of alternative hypotheses (e.g. $H_{L_n}$ or $H_{S_n}$). Hence, to obtain a test statistic from the class of statistics $\mathcal{L}$ which will have maximum ARE, we need to maximize the noncentrality parameter over all real $\underline{a}$, for any given sequence of hypotheses.

We know that under $H_{L_n}$ and $H_{S_n}$, $\mathcal{L}$ is, in the limit, distributed as noncentral $\chi^2$ with $c-1$ degrees of freedom and noncentrality parameters given by $\mu_{L_n}$ and $\mu_{S_n}$ respectively.
Let us first take

$$ \mu_{L_n} = \frac{(\underline{a}'\underline{b})^2}{A}\sum_{i=1}^{c} s_i(\theta_i - \bar{\theta})^2. $$

We need to maximize only $(\underline{a}'\underline{b})^2 / A$, since the other factor does not involve $\underline{a}$. Let us define $\underline{D} = (d_{ij})_{c \times c}$, $i, j = 1, 2, \ldots, c$, where

$$ d_{ij} = \frac{\binom{c-1}{i-1}\binom{c-1}{j-1}}{\binom{2c-2}{i+j-2}(2c-1)} - \frac{1}{c^2}, $$

and $\underline{D}_0 = (d_{ij})_{(c-1) \times (c-1)}$, $i, j = 1, 2, \ldots, c-1$. It is then obvious that $A = \underline{a}'\underline{D}\,\underline{a}$.

We see that $K_i \underline{D}$, where $K_i$ is a constant depending on $i$, is the limiting covariance matrix of $u_{ij}$, $j = 1, 2, \ldots, c$. As we know that $\sum_{j=1}^{c} u_{ij} = 1$, $\underline{D}$ is singular. Since this is the only restriction on the $u_{ij}$ ($j = 1, 2, \ldots, c$), $\underline{D}_0$ is nonsingular and, of course, positive definite. We note that $\sum_{i=1}^{c} d_{ij} = \sum_{j=1}^{c} d_{ij} = \sum_{j=1}^{c} b_j = 0$. In view of these we may assume, without loss of generality, that $\sum_{i=1}^{c} a_i = 0$, since the value of $\underline{a}'\underline{b}$ or $\underline{a}'\underline{D}\,\underline{a}$ remains unchanged even if $\underline{a}$ is replaced by $\underline{a} - \bar{a}\,\underline{J}$.

It may be further seen that $\underline{a}'\underline{b} = \underline{a}_0'\underline{b}_0$ and $\underline{a}'\underline{D}\,\underline{a} = \underline{a}_0'\tilde{\Sigma}\,\underline{a}_0$, where $\underline{a}_0' = (a_1, \ldots, a_{c-1})$, $\underline{b}_0$ is the $(c-1)$-vector with elements $b_i - b_c$, $i = 1, 2, \ldots, c-1$ (and likewise $\underline{d}_0$ has elements $d_i - d_c$), and $\tilde{\Sigma} = (\tilde{\sigma}_{ij})$, $i, j = 1, 2, \ldots, c-1$, with $\tilde{\sigma}_{ij} = d_{ij} - d_{ic} - d_{cj} + d_{cc}$. Using the facts that $\sum_i d_{ij} = \sum_j d_{ij} = 0$, it is seen that $\tilde{\Sigma} = T'\underline{D}_0 T$, where $T = (t_{ij})$, $i, j = 1, 2, \ldots, c-1$, with $t_{ii} = 2$ and $t_{ij} = 1$ if $i \neq j$. Therefore $\tilde{\Sigma}$ is positive definite.

Since $\tilde{\Sigma}$ is positive definite, using Cauchy's inequality it may be seen that

$$ \frac{(\underline{a}_0'\underline{b}_0)^2}{\underline{a}_0'\tilde{\Sigma}\,\underline{a}_0} \leq \underline{b}_0'\tilde{\Sigma}^{-1}\underline{b}_0 $$

for all real $\underline{a}$, and the equality is attained whenever $\underline{a}_0 \propto \tilde{\Sigma}^{-1}\underline{b}_0$. On similar lines it can be proved that the corresponding result holds with $\underline{d}_0$ in place of $\underline{b}_0$. Hence we have proved the following theorem.

Theorem 6.1: (a) The maximum of $\mu_{L_n}$ for all real $\underline{a}$ is $\big(\underline{b}_0'\tilde{\Sigma}^{-1}\underline{b}_0\big)\sum_{i=1}^{c} s_i(\theta_i - \bar{\theta})^2$, and it is obtained when $\underline{a}_0 \propto \tilde{\Sigma}^{-1}\underline{b}_0$ and $a_c = -\sum_{i=1}^{c-1} a_i$.

(b) The maximum of $\mu_{S_n}$ for all real $\underline{a}$ is $\big(\underline{d}_0'\tilde{\Sigma}^{-1}\underline{d}_0\big)\sum_{i=1}^{c} s_i(\delta_i - \bar{\delta})^2$, and it is obtained when $\underline{a}_0 \propto \tilde{\Sigma}^{-1}\underline{d}_0$ and $a_c = -\sum_{i=1}^{c-1} a_i$.

Here $\underline{a}_0$, $\underline{b}_0$, $\underline{d}_0$ and $\tilde{\Sigma}$ are the same as defined earlier in this section.

Though $\tilde{\Sigma}^{-1}$ may be computed for any specified $c$, it has not been possible to compute it for general $c$ in a simple and attractive form. Below it is given for $c = 2, 3$.
For $c = 2$, $\tilde{\Sigma}$ is a scalar and $\tilde{\Sigma}^{-1} = 3$. In this case $b_1 = \int_{-\infty}^{\infty} f^2(x)\,dx$ and $b_2 = -\int_{-\infty}^{\infty} f^2(x)\,dx$. Hence for $c = 2$,

(6.2)  $\displaystyle 12\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2 \sum_{i=1}^{2} s_i(\theta_i - \bar{\theta})^2 $

is the maximum of $\mu_{L_n}$ under the sequence $H_{L_n}$ of alternative hypotheses. The maximum ARE amongst tests belonging to this class, with respect to the Wilcoxon-Mann-Whitney [14], [11] U test (which is Kruskal's H test [9] restricted to two samples), is therefore equal to one. This ARE is attained for any $a_1$ and $a_2$ so long as they are not equal.

For $c = 2$, under $H_{S_n}$, $d_1 = -\int_{-\infty}^{\infty} x f^2(x)\,dx$ and $d_2 = \int_{-\infty}^{\infty} x f^2(x)\,dx$. Hence the maximum of $\mu_{S_n}$ is

$$ 12\left[\int_{-\infty}^{\infty} x f^2(x)\,dx\right]^2 \sum_{i=1}^{2} s_i(\delta_i - \bar{\delta})^2. $$

This too is the same as the corresponding expression given by the above mentioned two tests. The maximum is attained for any $a_1$ and $a_2$ as long as they are distinct.
For $c = 3$ we have

$$ \tilde{\Sigma} = \begin{pmatrix} 1/3 & 1/6 \\ 1/6 & 2/15 \end{pmatrix} \qquad \text{and} \qquad \tilde{\Sigma}^{-1} = \begin{pmatrix} 8 & -10 \\ -10 & 20 \end{pmatrix}. $$

Under $H_{L_n}$ we can write the maximum of $\mu_{L_n}$ as

(6.4)  $\displaystyle \big(8b_1^2 + 20b_2^2 + 8b_3^2 - 20b_1b_2 + 4b_1b_3 - 20b_2b_3\big)\sum_{i=1}^{3} s_i(\theta_i - \bar{\theta})^2, $

which is attained when

(6.5)  $a_1 = (4b_1 - 5b_2 + b_3)k$, $a_2 = (-5b_1 + 10b_2 - 5b_3)k$ and $a_3 = (b_1 - 5b_2 + 4b_3)k$, for any real $k$.

For all distributions symmetric about zero we have $b_2 = 0$ and $b_3 = -b_1$. Hence, in such a case, the maximum of $\mu_{L_n}$ is $12\, b_1^2 \sum_{i=1}^{3} s_i(\theta_i - \bar{\theta})^2$; $b_1^2$ may further be seen to be equal to $\big[\int_{-\infty}^{\infty} f^2(x)\,dx\big]^2$ for symmetric distributions. This maximum is attained for $a_1 = k$, $a_2 = 0$ and $a_3 = -k$ for any real $k$; this test, therefore, coincides, for three samples, with the L test [5] proposed by the author.

For the exponential distribution, we have the maximum of $\mu_{L_n}$ as $8\sum_{i=1}^{3} s_i(\theta_i - \bar{\theta})^2$, which is attained when $a_1 = 4k$, $a_2 = -5k$ and $a_3 = k$ for any real $k$. The ARE of this test with respect to Kruskal's H test, for $c = 3$ and for the exponential distribution, is 2.66 under $H_{L_n}$.

For $H_{S_n}$ we can obtain the $a$'s which maximize $\mu_{S_n}$, and its maximum value, by replacing $b_j$ by $d_j$ in (6.5) and (6.4). We know that, for distributions symmetric about zero, $d_1 = d_3$; hence for such distributions the maximum of $\mu_{S_n}$ is

(6.6)  $\displaystyle 720\left[\int_{-\infty}^{\infty} x f^2(x) F(x)\,dx\right]^2 \sum_{i=1}^{3} s_i(\delta_i - \bar{\delta})^2, $

which is attained for $a_1 = k$, $a_2 = -2k$ and $a_3 = k$ for any real $k$. In particular, for the normal distribution we have the above expression equal to $1.524 \sum_{i=1}^{3} s_i(\delta_i - \bar{\delta})^2$. The ARE of this test w.r.t. a test proposed by Lehmann [10] (pp. 273-275) is 0.76, which is the same as the ARE of the D test proposed by the author [6].

The maximum of $\mu_{S_n}$ for the exponential distribution is seen to be $(8/9)\sum_{i=1}^{3} s_i(\delta_i - \bar{\delta})^2$ and is attained for $a_1 = -2k$, $a_2 = -5k$ and $a_3 = 7k$, for any real $k$. The ARE of this test w.r.t. Bhapkar's V test [2] (which is more efficient against $H_{S_n}$ for the exponential distribution than the D test) is 1.60.
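The $c = 3$ computations above can be reproduced numerically. The sketch below takes the exponential location case and obtains the optimal direction through the Moore-Penrose pseudo-inverse of $\underline{D}$, which, for vectors orthogonal to $(1, \ldots, 1)$ such as $\underline{b}$, gives the same maximizer as the $\tilde{\Sigma}^{-1}$ route of Theorem 6.1; all function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import comb

def D_matrix(c):
    """The c x c matrix D with d_ij = C(c-1,i-1)C(c-1,j-1)/(C(2c-2,i+j-2)(2c-1)) - 1/c^2."""
    i = np.arange(1, c + 1)
    return (comb(c-1, i[:, None]-1) * comb(c-1, i[None, :]-1)
            / (comb(2*c-2, i[:, None]+i[None, :]-2) * (2*c-1)) - 1.0 / c**2)

# Location-optimal weights for the exponential distribution, c = 3
c = 3
f = lambda x: np.exp(-x)
F = lambda x: 1.0 - np.exp(-x)
b = np.array([comb(c-1, j-1) * (
        (c-j) * (quad(lambda x: f(x)**2 * F(x)**(j-1) * (1-F(x))**(c-j-1), 0, np.inf)[0] if c-j else 0.0)
      - (j-1) * (quad(lambda x: f(x)**2 * F(x)**(j-2) * (1-F(x))**(c-j), 0, np.inf)[0] if j-1 else 0.0))
      for j in range(1, c+1)])
Dplus = np.linalg.pinv(D_matrix(c))
a_opt = Dplus @ b                  # optimal direction; any multiple plus a constant also works
print(a_opt / a_opt[2])            # approximately (4, -5, 1), as stated for the exponential case
print(b @ Dplus @ b)               # approximately 8, the factor multiplying sum s_i(theta_i - theta_bar)^2
```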
7. Remarks:

It is noticed that for $c = 3$, the 'best' tests that we obtain out of this class for the normal (or, more generally, any symmetric continuous) distribution are those which have already been proposed on heuristic grounds. It is conjectured that for $c > 3$ the 'best' tests may not coincide with existing tests. For nonsymmetric distributions (e.g. exponential) the 'best' test, even for $c = 3$, is different from any of the previously proposed tests.

It is possible to take a different approach to construct tests. Bhapkar [3] has constructed a test for the $c$ sample problem based on pairs $(X_i, X_j)$ of observations, where $X_i$ and $X_j$ are from different samples. Chatterjee [4] has proposed a test based on triplets $(X_i, X_j, X_k)$ of observations for the same problem. Let us consider $t$-plets formed by $t$ ($< c$) observations such that each of them represents a distinct sample. Let us define a function

(7.1)  $\varphi^{t}_{\alpha_i j}(x_{\alpha_1 s_{\alpha_1}}, \ldots, x_{\alpha_t s_{\alpha_t}}) = m_j$ whenever $x_{\alpha_i s_{\alpha_i}}$ is larger than exactly $(j-1)$ of the $x$'s and smaller than the rest.

Here $(\alpha_1, \alpha_2, \ldots, \alpha_t)$ are $t$ members of $(1, 2, \ldots, c)$. To be symmetric between all the $c$ samples, we will have to consider a function based collectively on all the $t$-plets in which an observation from the $\alpha_i$-th sample occurs. Let us consider $\sum' \varphi^{t}_{\alpha_i j}(x_{\alpha_1 s_{\alpha_1}}, \ldots, x_{\alpha_t s_{\alpha_t}})$, where $\sum'$ indicates summation over all the $t$-plets in which $x_{\alpha_i s_{\alpha_i}}$ occurs and $(x_{\alpha_1 s_{\alpha_1}}, \ldots, x_{\alpha_t s_{\alpha_t}})$ are taken from a fixed $c$-plet containing one observation from each sample. If $x_{\alpha_i s_{\alpha_i}}$ is larger than exactly $k-1$ other $x$'s in the $c$-plet, we shall have

$$ \sum{}' \varphi^{t}_{\alpha_i j}(x_{\alpha_1 s_{\alpha_1}}, \ldots, x_{\alpha_t s_{\alpha_t}}) = \sum_{j=1}^{t} \binom{k-1}{j-1}\binom{c-k}{t-j}\, m_j, $$

with the conventions that $\binom{0}{0} = 1$ and $\binom{x}{y} = 0$ whenever $y > x$.
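The counting identity used here, that an observation of overall rank $k$ among the $c$ values appears with rank $j$ in exactly $\binom{k-1}{j-1}\binom{c-k}{t-j}$ of the $t$-subsets containing it, is easy to verify by enumeration; the short sketch below (illustrative names) does so for one choice of $c$, $t$, $k$.

```python
from itertools import combinations
from math import comb

def subset_rank_counts(c, t, k):
    """For an observation of overall rank k among c values, count, over all t-subsets
    containing it, how often it has rank j inside the subset (j = 1, ..., t)."""
    counts = [0] * t
    rest = [r for r in range(1, c + 1) if r != k]     # ranks of the other c - 1 values
    for subset in combinations(rest, t - 1):
        j = 1 + sum(1 for r in subset if r < k)       # its rank within the t-subset
        counts[j - 1] += 1
    return counts

c, t, k = 6, 3, 4
print(subset_rank_counts(c, t, k))
print([comb(k - 1, j - 1) * comb(c - k, t - j) for j in range(1, t + 1)])   # identical
```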
Hence it seems likely that we shall develop identical tests if we base them on $t$-plets using the function defined in (7.1), or if we base them on $c$-plets using the function (7.2) defined below:

(7.2)  $\displaystyle \varphi_{ik}(x_{1t_1}, x_{2t_2}, \ldots, x_{ct_c}) = \sum_{j=1}^{t} \binom{k-1}{j-1}\binom{c-k}{t-j}\, m_j$  whenever $x_{it_i}$ is larger than exactly $k-1$ of the $x$'s.

8. Acknowledgement:

I am grateful to Professor Wassily Hoeffding for useful discussions and constructive comments during this work.
References

[1] Andrews, Fred C. (1954). "Asymptotic behavior of some rank tests for analysis of variance," Ann. Math. Stat., 25, pp. 724-736.

[2] Bhapkar, V. P. (1961). "A nonparametric test for the problem of several samples," Ann. Math. Stat., 32, pp. 1108-1117.

[3] Bhapkar, V. P. (1964). "A nonparametric test for the several sample location problem," Univ. of North Carolina Institute of Statistics, Mimeo Series No. 411.

[4] Chatterjee, S. K. (1966). "A multisample nonparametric test based on U-statistics," Calcutta Stat. Ass. Bull., 15, pp. 109-119.

[5] Deshpande, Jayant V. (1965). "A nonparametric test based on U-statistics for the problem of several samples," J. Indian Stat. Ass., 3, pp. 20-29.

[6] Deshpande, Jayant V. (1965). "Some nonparametric tests of statistical hypotheses," Dissertation submitted for the Ph.D. degree of the University of Poona.

[7] Hannan, E. J. (1956). "The asymptotic powers of certain tests based on multiple correlation," J. Roy. Stat. Soc., Ser. B, 18, pp. 227-233.

[8] Hoeffding, Wassily (1948). "A class of statistics with asymptotically normal distributions," Ann. Math. Stat., 19, pp. 293-325.

[9] Kruskal, W. H. (1952). "A nonparametric test for the several sample problem," Ann. Math. Stat., 23, pp. 525-540.

[10] Lehmann, E. L. (1959). "Testing Statistical Hypotheses," John Wiley and Sons, New York.

[11] Mann, H. B. and Whitney, D. R. (1947). "On a test of whether one of two random variables is stochastically larger than the other," Ann. Math. Stat., 18, pp. 50-60.

[12] Sugiura, Nariaki (1965). "Multisample and multivariate nonparametric tests based on U-statistics and their asymptotic efficiencies," Osaka J. Math., 2, pp. 385-426.

[13] Sukhatme, B. V. (1958). "Testing the hypothesis that two populations differ only in location," Ann. Math. Stat., 29, pp. 60-78.

[14] Wilcoxon, F. (1945). "Individual comparisons by ranking methods," Biometrics Bulletin (now Biometrics), 1, pp. 80-83.