Bhapkar, V.P.; (1964)A non-parametric test for the several sample location problem."

I
..I
I
I
I
I
I
I
A NONPARAMETRIC TEST FOR '!liE SEVERAL
SAMPLE LOCATION PROBLEM
by
V. P. Bhapkar
University of North Carolina and
University of Poona
Institute of Statistics Mimeo Series No. 411
October 1964
--I
I
I
I
I
I
..I
I
This research was supported by Air Force Office
of Scientific Research Grant No. 84-63.
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTIf CAROLINA
Chapel Hill, N. C.
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
A NONPARAMETRIC TEST FOR THE SEVERAL
SAMPIE LOCATION PROBLEM
by
V. P. Bhapkar
University of North Carolina and
University of Poona
This paper offers a new nonparametric test of the null
hypothesis
Fi(X)
= F(x
F1 = F 2 = ••• = F c
(i
- ei )
against alternatives of the form
= i,2, ••• ,c),
where
the
e. 's
~
are not all equal and F.
~
is ~1e unknown (continuous) cumulative distribution function of the univariate
!
population from which the
th random sample comes.
It is based on c-plets
that can be formed by choosing one observation from each sample.
The asymptotic
distribution of the new test statistic, W, is shown to be the chi-square distribution with
c-l
degrees of freedom, under quite general conditions, when
the null hypothesis holds.
The asymptotic power of the test is computed far
translation-type alternatives and it is shown that the test is asymptotically
as efficient, in the Pitman-sense, as the Kruskal-Wallis H-test.
i th population with continuous
= 1,2,.,. ,n. ) be a
c.d.f. F., i = 1,2, ... ,
J.
these samples are independent.
We consider a nonparametric test of the
?...~J.R~2ucm~.
Let (x .. , j
~J
~
random sample from the
c , and suppose that
hypothesis
Ho :
F1
= F 2 = ••• = F c
against alternatives of the form
F.(x)
~
= F(x-e.)
J.
vlith the
e. 's
~
not all equal.
Reference to prior work and some of the recent work may be found in [7], [6],
[4J, [2J and [3].
I
The observations, when regarded as random variables, 'Iilll be represented
by the corresponding capital letters.
where
i
r
= 1,2,
is the rank of
... , c •
xi
~'X2"",xc
arranged in increasing order,
Since the distributions are assumed to be continuous, the
probability that any two
n
are equal in zero.
XiS
n
Let
n
2
I::
t =1
2
1
v.J. = I::
t l =l
(2.2)
in
Let
... t I::=1c
c
It is then seen that
n.
c
I::
j=l
I::
r=l
J.
v. =
J.
where
n (r)
..
J.J
is the number of c-plets that can be formed by taking one observa-
tion from each sample,
has rank
that
and N = I::.J. n J..•
x
r
ij
It is seen that
in each of these c-plets.
c
12
2
and random variables
consists in rejecting
determined number
[
c
2
_
L: n.u.
i=l J. J.
since
H
o
U.
J.
I
S
H
o
w
=
t n.u)2
(i=l
N J.
J
J.
(12/c2 )I:: n.(u.-u)2 ,vD1ere
i J. J.
U=
are expected to be equal when H holds.
0
at a significance level a
if
I::.n.u./N
J. J. J.
The test
w exceeds some pre-
In the next section it is shown that, when
W
a
H
a
is
Thus a large sample approximation for
a-point of the ~
distribution with
It can be seen that
= c(c-l)/2.
c-l
W is provided by the upper
a
degrees of freedom.
I::.V.
= (n n ••• n )c(c-l)/2, so that
J. J.
c
1 2
Then for c = 2, i.e., for the two-sample problem the statistic
2
I
I
I
I
I
I
I
--I
u.J. = v./n
n ••• n c
J. l 2
W is asymptotically distributed as a chi-square variable with c-l degrees
of freedom.
i
Let
w may be regarded as a suitable measure of deviation from
the null hypothesis
L:iU
being the observation from the !th sample, such
Then the statistic now being proposed is
w=
true,
(r)
n ij
(r-l)
J
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
W is seen to be equivalent to the Mann-Whitney
where
also known that the Mann-Whitney
Rl
based on
Ri s •
the case of c
}
It is
U-test is equivalent to the Wilcoxon [10] test
, the mean ranI;: of the first sample.
the Wilcoxon statistic
I u-nl n2/21
(JS.o:) X2 /3 ) with} say} XlO: > X2 /3.
U is the number of pairs
based on
[9] statistic
The rnultisample analogue of
is provided by the Kruskal-Wallis [7] H-statistic
The motivation behind the tests based on c-plets is to use} for
samples} Mann-Wl1itney-type test statistics.
In [2] a test-
statistic , V , has been offered; it is based on the number of c-plet.s chat can
be formed by choosing one observation from each sample such that the observation
from the ,!th sample is the least (i=1,2, ••• , c).
It was shOi'm to be consistent
for the class of translation alternatives and asymptotically mar e efficient} in
the Pitman sense, tl1&n the H-statistic for some distributions.
aSy111Ptotically less efficient for normal distribution.
But it was
Deshpande [3] proposed
a statistic based on the numbers of c-plets such that the observation from the
,!th
sample is (i) the least or (ii) the largest.
from a similar drawback.
Tl~t
statistic also suffers
The statistic being proposed nov; extracts} presumably}
more information with the result that it is asymptotically as efficient, as
will be shown later, al! the H-statistic.
(2.4)
Where
(2.5)
$ •• (X.,X )
~J
~
j
=1
if x. > x j
= othe:n·tlse
~
Then, from (2.2),
Vi
= n l n2 .. ·nc
c
j~ u ij
,
so that
3
In fact, it can be seen that
I
c
(2.6)
u. =
~
E
u
j=l
J
,
ij
"There
n.
n.
~
u
(2.7)
ij =
t.=l
~
[R.l -
n
J
E
E
ell •• (x.
~J
t.=l
(n+l)/2] , where
J
R.
t
~.,
~
X jt
.
)/n.n.
~
J
,
i 1= j
J
is the mean rank of the _ith
~
this case, W statistic is equivalent to the H-statistic.
relation does not exist if the
n.
~
IS
sample; thus, in
Such a simple
are not all equal.
Ho •
From (2.2) it is seen that
ing to
U.
~
is a generalized U-statistic correspond-
From the c-sample version
ell..
~
(e.g. see [2]) of Hoeffding's theorem
1
[5J on U-statistics, it then follows that N2 [~ - II ] is, in the limit as
n.
~
~CXl
in such a way that
n. = Np. , the piS being fixed positive numbers
~
~
E .p. = 1, nornally distributed with zero mean and covariance matrix
such that
J:
~
given by
(j
rs
=
c
1
i=l
Pi
E
r,s = l,2, ••• ,c,
where
are independent random variables with c.d.f. F. (j=1,2, ••• ,c).
J
4
I
I
I
I
I
I
I
--I
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
Now, when
H
o
Fl = F
2
holds,
= ••• = Fc = F,
say.
Then
[~i' (Xi,X.)] = (c-l)/2. Here, and hereafter in this section,
j=l
J
J
.
XIS are independent random variables each vdth c.d.f. F. Also
£ 1t
n· =
l
~ (l:j
S(i)(i,i)=
eG
= 1;J L
It
0.4)
l > .. (X.,X,») (l: 4>'k(X"x-'»)
lJ l J
k l
l --k
s(j) (i,i) =
(ll>lJ
.. (X.,X.)ll>'k(X.,JC'»)
l J l
l -11:
- (c-1F'/4
- (c-l)2/4
~.lS (X!,X')
l S
e
(l: ll>. (X.,X »)( l:
~ r lr l r
s~j
+ $.. (X!,X,») - (c-l)'",/ll.
lJ l J
= l: l: (1/4) + l:
(1/4) + 1/3 - (c-1F/4
r S~j
r~j
= 1/12
(i ~ j)
,
S(j)(i,j) = 12 (l: 4>. (X.,X»)( l: $j (X"X'»)
~ r
lr l r
s
S J S
=
- (c-l)2/4
L (1/4) + l: (1/6) - (c-l)2/4 = -(c-l)/12 ,
l:
r/:j
S
S
and finally
0.6)
S(k)(i,j)
=e
~
=
(L
r
E E
r
=
Tims, I'lhen
(3.7)
H
o
0', •
II
0'••
lJ
=
ll>.
lr
(X~,X»)(
~
r
(1/4) +
s/:k
E
L ll>. (X~,X') +
s/:k JS J S
r/:k
holds, we have
1
L
= 12
k/:i
+
1
12p .
J
L
j/:i
1
Pk
-
c-1
l2:P i
/:j
5
J
(1/4) + 1/3 - (c-l)2/4
1/12.
(c_l)2
l2pi
<l>'k(X~,JC»)
-
c-1
12P j
J-K
-
(c_l)2/L~
I
J
Thus
3.8 J2 I: = c 2 P-1 - c
q j
()
~
""",
N
-
f
+ a J
c j qt
P = diagonal ( p., i=1,2, ••• ,c) ,
a
J.
=
J
N
(1) CXC ,
j' = (l)lx
N
and
C
=
c
I: (l/p.),
i=l
J.
q' = (l/Pl,···,l/pc) •
....
~
I: UiN = c(c-l)/2, the distribution of
i
Since
the asymptotic normal distribution of
that I: j = O.
it can be verified
NN
,
....
.... ....
~
.IN (!4r -
is singular and, hence,
II ) is also singular.
In fact,
Then arguing exactly as in [2] it follO"TS
N
that
(3.9)
l2N
w=
c
=
l2N
c
1~
Ho •
l
l
2
2
c
i:l
Pi (UiN
c-l
-
2
;;>
)
{ c
J.=l
c
2
~ ~C PiUiN
Pi UiU i=l
'- 1=1
I:
} 2
asymptotically chi-square distribution with
Thus suppressing
Pi(UiN -
.I:
N in the subscript of
}
2
J
J
c-l
U,
c-l
2)
'\ITe
degrees of freedom tmder
have the statistic (2.3)
proposed earlier.
of a lemma due to Lehmann [8]:
(i)
~i = f
Let
such that
,,,
(F ,F , ••• ,F c ) ' i=1,2, ••• , g, be real-valued functions
l 2
f(i) (F, ... ,F)
(i) . ,
T
nl,···,n
= t
c
(i)
.
,
=~.W .for
all (F,F, ... ,F) in a class
(x_-' ••• 'Xln ; ••• ; X l' ••• 'X
-~
1
c
sequences of real-valued statistics such that
in probability as min(nl, ••• ,n ) -+
c
for some
i
W
for all (F 1, •••
nl' ••• ,n c
,F c)
00
•
c
6
)
c
0
, i=1,2, ••• ,g, be
(.)
T 1 .
tends to
~_,
Suppose
in a class
(l): .
= w (Tnl,···,n , .•• ,
cn
-e.
'el •
••• , n
t~at
Let
TJi
Cr(i)(Fl:F2, ••• ,Fc)fTJiO
Further let
I
I
I
I
I
I
I
ttl
I
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
be a nonnegative function lThicl1 is zero if, and only if,
for all
T(i)
nl,···,n c
TlleJ.1 the sequence of tests which reject "lihen
i=l, 2, ••• , g.
H
is consistent for testing H: l~;
n1, ••• , n c
'e l
level of significance against the alternatives
I f we take
T}i =
~
l
<lli(lS.,X2 ,·",xc )
ent random variables with continuous
T(i)
l1
l ,n2,···,nc
=
UiN
J,
it may be easily seen tbat '\-
e
one of the) largest among
•
Fl, ••• ,F ' respectively, and
c
c.d.f.
alternatives for which
- .!l).
F. (x) • F(x ~
e.),
~
IS.
The
to T}.
For the class
I'd tl1
e
e1
l
's not all eQual,
> (c-l)/2 , i.e. '\-0 ' vThere er
is the (or
W-test, thus, is seen to be consistent
tr~lslation-type
generally, the
~~re
at every fixed
, then the convergence in probability of
of translation-type alternatives
'tlio
where the Xi's are independ-
follows from the asyJtg?totic normality of .IN(~
against the class of
o
=
alternatives.
W-test is consistent against the wider class of
.~ [<lli (xl ,X2 , ... ,Xc)] ~ (c-l)/2 for at least one
i.
As the H-test is consistent for a fixed translation-type
alternatives:
........ ~-
alternative
F. (x)
l
min (nl, ••• ,n ) -)
c
= F(x-e.),
~
00.
~le
with not all e's erlual, the power -) 1 as
asyJtg?totic power is then defined as the limiting
pOlrer under a sequence of alternatives
tending to
H
n
provided that this limit is different from both
ficance
ex.
1
H , as
a
and the level of signi-
It can be seen that, for our purpose, the asymptotic power can
be computed if we take the sequence of alternatives
H:
n
1'litIl
not all
aS~lmptotic
e
rS
F. (x)
In
equal a.nd
= F(x
1
- n -"2
n.l = ns.,
"lith
l
e .)
~
s.
l
}
a positive integer.
The
power can then be computed in a manner similar to the one employed
7
I
in [2].
THEOREH 5.1.
a function
If F
g
possesses a continuous derivative
f
and there exists
such tl1at
I[f(y+h)
~
- f(y) ]/11
g(y)
for all
y
and
11,
and
00
f
g(y) f(y)dy <
00
,
-00
n . = n s., ivitIl s. a fixed positive integer, and. under the
~
of ~
~
sequence/distributions Fin' i=l, 2, ••• , c, as n ~ 00 the statistic W has
then ivi th
a limiting noncentral chi-square distribution ivith
c-l
degrees of freedom
and the noncentrality pararaeter
!L.T
'h
'\'There
= 12/\2 E.
( e. - -)2
e ,
~
s ~.
~
e = E is. e./E. S4....
~
~
-
and
~
Then it can be easily shmffi that
~
T)
in
J
=
~ l<l>. . (x.. , XJ.) I Hn
~J
~
(c-l)/2 + n- t A E
j
"
-
(ei
-
ej )
as
n ~ 00
l
+ o(n- )
Similarly it can be shown that
I Hn
N cov (~
) ~
k
,
,
and
.fN [ ~ - (c-l/2) ~]
is asymptotically normal with mean
1
( EiS ) -2" A 2
i
and covariance matrix
0. = c
~
ei
-
r: e
j
I
I
I
I
I
I
I
--I
00
PROOF:
= E.
J
j
8
~, where
2' = (°1 ,°2 ,."
.,oc)
I
I
I
I
I
.-I
I
I
..I
I
I
I
I
I
I
--I
I
I
I
I
I
I.
I
I
and
~
is given by
(3.8). The theorem then follows as in [2].
The asymptotic power is thus seen to be equal to the probability that
a noncentral X2
parameter
X;2
A_
'W
variable \"ith
exceeds
variable with
6.
c-l
Remarks.
,..~~""'U'
of the H-statistic.
2
X
' c-l,a
c-l degrees of freedom and the noncentrality
,
a-point of the central
degrees of freedom.
Andrew's [1] has obtained the asymptotic distribution
Under the same sequence of alternatives it is the
as the asymptotic distribution of
W relative to
the usual upper
s~~e
W, so that the asymptotic efficiency of
H is one and, hence, relative to the F-statistic is
3/1C
if the underlying distributions are normal.
Comparing the efficiency figures in [2] it appears that the V-statistic,
i.e., the test based on the number of c-plets such that the observations from
the lth population are the least, is much more efficient for populations
bounded below (e.g. exponential distribution
f(y,a)
= e-(y-a)
, y
~ a );
the statistic based on the number of c-plets with respect to the largest
observation is similarly much more efficient for populations bounded above
(e.g. reversed exponential distribution
f(y,a)
= e(y-a)
, y <
a).
Both of
them are fairly efficient (and the statistic based on c-plets with respect to
both the smallest and the largest observations is even much more so) for
distributions bounded on both sides (e.g. uniform distribution
f(x,a,/3)
= 1/ (/3-a)
a
~
x
~
/3) while the W-statistic (based on c-plets
with respect to all the positions) appears to be more efficient for unbounded
distributions.
9
I
REF ERE N C E S
[ 1]
FRED C. ANDREWS, "Asymptotic behavior of some rank tests for analysis
of variance,
[ 2]
Ann. Math. Stat., Vo. 25(1954), pp. 724-736.
V. P. BHAPKAR, lIA nonparametric test for the problem of several
Ann. Math. Stat., Vol. 32 (1961), pp. 1108-1117.
samples,"
[ 3]
II
JP::lANT V. DESHPANDE, "A nonparametric test based on U-statistics
for severa.1. samples," (Abstract), Ann. Math. Stat., Vol. 34 (1963)
p. 1624.
[ 4]
MEYER DWASS,
II
Some k-sample rank-order tests,
II
Contributions to
Probability and Statistics, Stanford University Press, Stanford,
California (1960), PP. 198-202.
[ 5
WASSILY HOEFFDING, "A class of statistics ",ith as;'lmptotic normal
distributions, II
[ 6]
Ann.~.~. Vol.
19 (1948), pp. 293-325.
J. KIEFER, "K-sample analogues of the Kolroogoror-Smirnov and
Cramer-v. Mises tests,"
Ann.~.~., Vol.
30(1959),
J
I
I
I
I
I
I
I
--I
pp. 420-447.
[ 7]
WILLIAM H. KRUSKAL, "A nonparametric test for the several sample
problem, II
[ 8]
E. L. LEHMANN, "Consistency and unbiasedness of certain nonparametric
tests,
[ 9]
Ann. Math. Stat. Vol. 23(1952), pp. 525-540.
II
~.~. Stat., Vol.
22(1951), pp. 165-179.
H. B. MANN and D. R. 'VlHITNEY, liOn a test of '\iI1etl1er one of t\-lO
random variables is stochastically larger tJ:lan the other,"
~. Math. ~., Vol.
[ JD ]
18(1947), pp. 50-60.
F. WILCOXON, "Individual comparison by ranking metll0ds," Biometrics
~., Vol.
(1945), pp. 80-83.
10
I
I
I
I
I
.-I
I