UNIVERSITY OF NORTH CAROLINA
Depa rtment of Statistics
Chapel Hill, N. C•
•
USE OF THE WILCOXON STATISTIC FOR A GENERALIZED
BEHRENS-FISHER PROBLEN
by
Richard F. Potthoff
February, 1962
Contract No. AF 49( 638)-213
The ordinary Vlilcoxon test is used to test the null hypothe..
sis that two populations are identical. This paper presents
a technique for utilizing the Vlilcoxon statistic to test a
null hypothesis of a broader type, such as is encountered in
the Behrens-Fisher problem where it is required to test the
equality of the means of two normal populations having (possibly) unequal variances. The dis cussion in this paper is on
a rather technical level; for a less technical discussion,
see Mimeograph Series No. 316.
This research 'Was supported partially by the Mathematics Division of the A:i.r
Force Office of Scientific Research and partially by Educational Testing Sendee.
Institute of Statistics
Mimeo Series No. 315
USE OF THE UILCOXOH STATISTIC FOR A
1
GENERALIZED BEHRENS-FISImR PROBLEI>1
by
Richard F. Potthoff
University of North Carolina
1.
Introduction and summary.
Let
F( "i'T)
be any continuous
c. d. f. (cumula-
tive distribution function) which is symmetrical about the origin:
(1.1)
F(-w) + F(w)
=
1,
-co
<w<
co
Suppose that we have a saIll]?le of m independent observations
from a population with c .d.f.
G(x), and a sample of n
Yl , Y2 , ... , Yn from a population ,dth c.d.f.
Xl' X , "" Xm
2
independent observations
H(y), where
(1. 2a)
and
(1.2b)
The scale parameters
Q
l
and
Q
2
b
l
are assumed
and b (b > 0, b > 0) and the location parameters
2 l
2
unlmOWIlj the function
F mayor may not be Imown. \'lith
no loss of generality we may assume that m5 n.
We suppose that it is desired to test the null hypothesis
(1.3)
against alternatives
Ha :
Let us define
m
(1.4)
!:
m,n = (limn)
.
i=l
1'1
n
L:
,
j=l
~hiS research uas supported partially by the Air Force Office of Scientific
Research and partially by Educational Testing Service.
-2-
where the function
U(d)
is
0
or
1 according to whether
d
< 0 or d > 0
In Section 3 it will be shotffi that
respectively.
E(vlm,n )
1:2
=
if and only if
Q
l
=
Q
2
(except that the "only if" part is subject to a trivial further restriction on F),
and that
(1.6)
Since
var(Hm,n )
l'l
m,n
= 1/4m
is (approximately) normal for sUfficiently large samples, tve "Till have
(approximately) a size -a test of
if i'Te use as the critical region
H
o
1
(1.7)
where
2m'2I Hm,n -1:I>z/
2
a 2
za/2
is defined by
dz
=
1
-
(a/2)
-co
This test (1.7) will actually be a conservative test; that is, the probability of
rejecting
Ho when
than exactly equal to
The statistic
Hilcoxon test ~5,
is true "Till generally be somewhat less than
H
o
ct.
H
m,n
27.
(1.4)
is of course the basic element of the "Tell-lmovm
Heretofore, however, this Hilcoxon statistic
been used only for testing the null hypothesis
the
Xi's
form H(y)
and
=
H being the
G(y + 0)
of a form such that
p
ct, rather
where
f ~,
c.d.f.
0
f
of the
Wm,n has
G = H (G being the c.d.f.
Yj'S)
of
against alternatives of the
0 (or, more generally, against alternatives
where we are defining
p
= p (X < Y)),
and has not
been used for testing a hypothesis of the form specified by (1.1, 1.2, 1.3).
The
-3variance of U
when G = H is (m + n + 1)/12mn, which is of course smaller
m,n
than (1.6). The asymptotic normality of H ,alluded to in the preceding param,n
graph, has already been proved; regardless of whether G = H or G ~ H, and regardless of what G and H are (so long as
°< P < 1), the distribution of
(1.8)
approaches a normal distribution with zero mean and unit variance as
provided that m, n ->
00
in such a 'Hay that the ratio
min ->
m, n ->
00
,
some constant c.
(For proof, refer to ~4, Theorem 3.g7 or ~2, Theorem 6.~7; alternatively, the
proof follows easily from Theorem (8.1) of
In the special case where
F in
~L7.)
(1.2) is known to be the normal
problem we are considering reduces to the Behrens-Fisher problem.
c.d.f., the
Since originally
it was the Behrens-Fisher problem which motivated this paper, and since this problem is of special interest, we will first give, in Section 2, some brief remarks
pertaining to it.
2.
N(Ol'
The special case of the Behrens-Fisher
~roblem.
The m Xi's
are
~i), and the n Yj's are N(02' ~~), where ~l = l/b l , ~2 = 1/b2 · F is the
N(O, 1)
c.d.f.
Note first that
1
(2.1)
t
(m + n _ 2)2 (x - y)
=
2
1/2
(Y. - Y)8:-!
b ) Z(X - 'X) 2 +
r(_1 + ;..
L m
n
i
where
b
X and Yare the means of the
= ~2/~ 1 = bl /b 2 ,
follows the
of freedom; but we cannot use
,
J
Xi's and Yj ' s
t-distribution (under
-
respectively and where
Ho ) with (m+n-2) degrees
as a statistic for testing H (1.3) due
o
to the fact that (2.1) is not free of the nuisance parameter b. Now we might be
(2.1)
-4inclined to try to approach the Behrens-Fisher problem by maximizing the denominator of
(2.1) with respect to b, since this would lead us to the minimum possi-
ble (absolute) value of t
value of the
unI~nown
this approach doesn't
comes infinite as
and would give us a conservative test valid for any
parameter b.
worl~,
b --> 0
It
quicl~ly
becomes apparent, however, that
due to the fact that
or as b -->
the denominator of
(2.1) be-
00.
Consider what happens, though, if we use this same approach with respect to
the statistic (1.8).
of
Under Ho the denominator of
(2.1), depends on b £see (3.1°17.
nominator of
(1.8), lilee the denominator
But when we try to maximize this de-
(1.8) with respect to b, the maximum turns out to be finite ["'see
(1. 617.
Thus, under Ho' the denominator of
(2.1) is unbounded with respect to b,
whereas the maximum value of the denominator of (1.8) with respect to b
finite.
is
It is this fact which provided the key to the basic approach used in this
paper.
The general proofs of both (1.5) and (1.6) given in Section 3 are of course
fully adequate to cover the special case in which F is normal.
we rely strictly on the proof of Section 3.
a quick proof when F is normal.
p = p [X < y J) .
Now the variate
By (1.4),
(Yj
(~2 - Ql)' and hence has median (Q
-
Xi)
2 - Ql)'
As for (1. 6),
lJe may note, though, that (1.5) has
=p
E(Wm,n )
(recall that we defined
is normally distributed with mean
Thus
p
= '21
if and only if
~l
= Q2'
which establishes (1.5).
It appears from rough preliminary investigation that Scheff~'s test £17 for
the Behrens-Fisher problem (which is a t-test with m-l degrees of freedom) generally has better pm.Ter than the test proposed here (1.7).
HOvTever, the advantage of
the test (1.7) lies in the fact that it is still valid for testing the equality of
the Q's
even when F is non-normal (our test of course even worles if F is the
Cauchy distribution), whereas the validity of Scheff~fs test would be questionable
-5if F were not normal.
Thus, if data have been obtained under a model of the form
(1.1 - 1.2), then Scheffe1s test appears to be better if the experimenter is sure
that
F is normal; but if the experimenter is not certain about
then it might be safer to use the test
3.
The general case.
for general F.
(1.7).
In this section we prove the relations
To establish
J
F being normal,
(1.5), we first
(1.5) and (1.6)
v~ite
00
(3.1)
E(Wm,n ) = P =
G(Y) dH(Y)
-00
J
00
=
F(blY - b l Ql) dF(b 2y .. b 2 Q2)
-00
J
00
=
F(bu + b18) dF(u)
-00
o
o
J
F(bu + b1 8) dF(u) =
J
F(-bv + b18) dF(-v)
00
-00
Jo
00
=
where we first set
into
u
= -v
~
-
F(bv .. b 1 8) dF(v) ,
and then applied
(1.1) thrice. Substituting (3.2)
(3.1), we obtain
J
00
(3.3)
If 8
p =
= 0, (3.3)
~
+
o
becomes
LF(bu + b18) - F(bu - b18L7 dF(u)
p
1
= 2'
If 8 > 0,
(3.3) tells us that p
1
~2' and if
-6p 5~
8 < 0, then
j
if the integ~l
in each case, the equality sign holds only
on the right-hand side of
(3.3)
is
Thus, for
O.
8 ~ 0
to imply
must impose the condition that this integral be non-vanishing.
p ~
only a rather odd-shaped F
fy it.
F(V):; 0, w 5 -1; F(w) :; 1 + ", - 1
F(w)
=~, ~ ~ ~ w 5~; pew) :; W, { 5 w 5 1; F(vl) :; I,
condi'tion if
b:; lD, b
l
:; 1,
'W
?
1
we
This is a some-
what trivial condition, however;
fE.g., the distribution
"21 '
would fail to satis-
5 w5 -
%;
fails to satisfy the
8:; 1.:.7
Having proved (1.5), we now turn to (1.6).
We will start by establishing a
result somewhat more general than what we actually need for proving (1.6).
For
the time being we are going to assume that
£ not
necessarily satisfying (1.1, 1.2, 1.;>-7.
(see
f§J
or
(;.4) var(W
from
Eij
H may be any
c .d.f. r s
Then we have the well-lmow formula
f§7)
m,n
Now let
G and
):; (1/mn)["p+(1-m-ri)p2 + (m-l)P(X < Y,x...< Y)+(n-l)P(X <Y ,X<Y )7.
a
~
r
v-
denote the event
G and two observations
Xi < Y , and consider two observations
j
Y , Y
l
2
from
H.
Xl' X
2
Then
< P (E ) + P [E E )
11
21 12
Also,
:;
p
:;
P + P
2
-7Recalling that
n-m > 0 and substituting (}.5) and (}.6) into (}.4), we obtain
var(wm,n) ~ (limn)
+ (l-m-n) p 2 + (m-l)(p + p2 ) + (n-m) P_7 '
£p
or
(}.7) var(Wm,n ) -< pel - p)/m
We return now to the model specified by (1.1, 1.2) 'With null hypothesis (l.}).
Under H (l.}), we have p
o
(}.8)
1
=2'
(}.7) tells us that
and so
var(lv
) < 114m
m,n -
["Obviously var(W
) cannot exceed 114m if H is false, e:f.ther._7 We will
m,n
0
next prove that, regardless of what F in (1.2) is,
(}.9)
lim
b
->
0
var(Vlm,n )
= 114m
whenever Ql
Since we have no knowledge of what b
= ~2
•
(}.9) tells us that the bound 114m in
is,
(3.8) cannot be improved upon (i.e., reduced) for any F. The relations (}.8)
and
(}.9) together 'Will be sufficient to establish (1.6).
To prove
(}.9), we first use (}.4) and (1.2) (setting p =~ and Ql = Q2)
to obtain
JF2(blY-blQl)clF(b2Y~b2Ql)
00
::l~ £
}-W-
n
+(m-l)
-00
+ (n - 1)
J
00
2
F (b2X
-
b2Ql)clF(b l X
-
b l Q1 L7
-00
::l~ f}-W-
IF
0
00
n
+(m-l)
2
(bv)dF(v) +
(n-l~1
F
2
(u/b)dF(u) +
.8co
+
J ~(U/b)dF(u)t7,
o
= v,
where b y - b Q
2 l
2
blx - blQ
l
= u.
Recalling that
F is continuous and re-
ferring to £1, p. 67, (IL7, we obtain from (,.10)
1 Lr 3-m-n
1 + (n-l) (0 + '2
1t 7
var ( Wm,n ) = mn
4
+ «
m-l)"Ii)
lim
->
b
0
This proves (3.9).
Now (1.5) and (1.6) are sufficient to establish that the test (1.7) is
(approximately) of size a.
expectation
"21
For, since
W
is (approximately) nornal) and has
m,n
under H ' we have (approximately)
o
1
P (IVIm,n -
~c:: I > -r var(vlm,n) - 72
z 1"\1/""
..... G )
=a ,
1. e.,
1
p(2m'21
(3.11)
under
H '
o
I Wm,n
-
~ I >[
G
var(Wm,n
)J2
1/4m
za/2
} = a
,
Comparing (1.7) with (3.11), and noting that £bY (1.6L7 the supremum
of the coefficient of
za/2
in (3.11) is
1, we conclude that (under
supremum of the probability of the event (1.7) is (approximately)
4.
(i)
Concluding remarks.
Sundrum
£§l
Ho ) the
a.
We make some final observations:
presents a result which is of the same form as (3.5) £ and
also gives a relation of the form (3.7) for the speci.a.l cs,se
:gl
= p},
but his
rather unusual proof of (3.5) seems to assume a special type of discrete distribution.
~
Although the idea behind his proof would appear intuitively to apply to
other distributions also, it would not seem that such an ertension, to a class of
-9distributions wider than the class tor which his
was originally intended ,
Thus the proof in (.;.5) 1 in additi on to being short,
would be completely rigorous.
makes no restrictions on G or
If m,n ->
proo~
H.
in such a 'Way that
mIn -> c, the test (1. 7) 1'Till be
consistent against all alternatives such that
p ~ ~ ~recall that (3.3) showed
(ii)
co
~ "2 implies p ~ ~]. The cona
sistency property can be proved (e.g.) by using an argument similar to that given
that, with certain triVial exceptions,
in
£5,
H : '\
58-52.7.
pp.
(iii)
It does not appear that the test (1. 7) is unbiased.
This of course is
(indirectly) a consequence of the fact that the probability of rejection when
H
o
is true will not be constant, but rather will vary with b.
(iv)
Confidence bounds on 5
(5 = "2 - "1) associated with the test (1.7)
are easily obtained, using exactly the same technique as is commonly used with the
ordinary Hilcoxon test.
fThe technique consists of finding that value of
which, when subtracted from each
0
(Yf Xi) in (1.4), will cause the resulting new
to be on the threshhold of being significantly small or large. 7
-
W
m,n
(v)
The d.iscussion in this paper 'Was in terms of a two-tailed test.
Obvious-
ly, it could just as easily have been in tems of a one-tailed test.
(Vi)
The ordinary Wi1coxon test (tor tes¥ng
l
(1. 7) except that the factor
F(m + n + 1)/12
IDE7- '2 •
zJ f
By
= (1/4m) -
"2
_7
G
= H)
is the same as the test
in (1.7) is replaced by
comparing these two factors, we can get some
inkling as to the price we have to pay for eliminating the nuisance parameter
when we use the test (1.7).
larger
For example, if
than (m + n + 1)/12mn;
m = n, 1/4m is not qUite
50
0/0
if m < n, the cecparison of course is worse
0
(Le. , ~ 50 /0).
(Vii)
We mention now the possibility of using the test (1.7) tor testing
null hypotheses broader than the null hypothesis indicated by (1.1, 1.2, 1.3).
b
-10-
For example, suppose that the specifications
(1.2) are replaced by the less re-
strictive assumptions
(4.la)
and
,
(4.lb)
where both Fl (w) and F (W) are assumed to be symmetrical about 1-1 = 0 L see
2
(1.117. Then the proof given in the first paragraph of Section 3 goes through
much the same as before, and in place of (3.3) we end up with
00
J
(4.2)
LFl{U + 5) - Fl(U - 5L7 d F2 (U)
o
From
(4.2) we see that 5 = 0 implies p = ~, and (With trivial exceptions)
5 ~ 0
implies
p ~
21 .
) cannot exceed 114m even under the
Now obviously ~see ('.7)7 var(1rl
m,n
broader null hypothesis that we are now considering. Hence (1.7) will be a
satisfactory test of
modification
Q
l
= Q2
(4.1),
= Q2 even under the broader model associated with the
l
in the sense that (a) the probability of rejecting when
Q
will never exceed a (we are disregarding inaccuracies due to the normal
approximation), and
(b) the test will be consistent against all alternatives
l ~ Q2 (with certain trivial exceptions).
Thus, if G and H are any two (continuous) symmetrical distributions, not
Q
necessarily of the same form, then (1.7) can be used to test the hypothesis that
their medians are equal.
(viii)
If we Ydah, we can broaden the null hypothesis still further:
use (1. 7) to test the null hypothesis
p
ever, such a broad null hypothesis as
p
="21
="21
against alternatives
1
p ~ '2'
we can
HOYT-
might not be very meaningful in most
-11practical situations.
( iX)
An upper bound for
var( W )
m,n
an interesting lower bound for
m
= n,
p
f4,
last line of
(4.;)
and
He might note that
var(vl
) can be obtained for the particular case
m,n
=~. For this case, (;.4) becomes
(We assume that
Lehmann
is given by <:3.7).
G and
H are continuous.)
We can write
pp. 172-17"d.7 proves that the non-negative integral appearing in the
(4.4)
(4.4),
is equal to
0 if and only if G
= H.
Hence, after combining
we conclude that
2
var(W
) > (2n + 1)/12n
n,n with the equality sign holding if and only if
One implication of
(4.5)
if l'
G
1
="2 '
= H.
is that, if an experimenter (erroneously) uses the
ordinary Wilcoxon test rather than the test (1. 7) for testing a null hypothesis
of the form (1.1, 1.2, 1.3) for of the form associated with the modification
then (if m = n) his Type I. error will al'WB.Ys exceed the intended value
G = H.
(4.117
a, unless
-12...
(x)
The line of thought on "Thich this paper 'Was based, expressed in the
first three paragraphs of Section 2, has been extended to a problem involving
comparisons of two regression lines having unequal variances.
This problem "Till
be discussed in other papers.
5. Acltnowledgments.
The idea as to the possibility of using the Wilcoxon
statistic for the Behrens-Fisher problem arose in a conversation with Frederic
M. Lord of Dducational Testing Service.
The author also wishes to acknowledge
helpful discussions "lith Y. S. Sathe and with Professor Wassily Hoeffding.
,
.
-13-
REFERENCES
.r.~7
Harald Cramer, Hathematica1 r,1ethods of Statistics, Princeton University
Press, Princeton, New Jersey, 1951.
15.7
Heyer Dvre.ss, "The large-sample power of rank-order tests in the tvTO-
sample problem," Ann. Hath. stat., vol. 27 (1956), pp. 352-374.
1 L7
W. Hoeffding, '~ class of statistics with asymptotically normal distri-
butions," Ann. Math. stat., vol.19 (1948), pp. 293-325.
L!!:.7
E. L. Lehmann, "Consistency and unbiasedness of certain nonparametric
tests," Ann. Math. stat., vol. 22 (1951), pp. 165-179.
LL7
H. B. Mann and D. R. Whitney, "On a test of whether one of two random
variables is stocbastlteal1y larger than the other," Ann. Math. stat., vol. 18 (1947),
pp. 50-60.
1f!
E. J. G. Pitman, "Lecture notes on non-parametric inference"(mimeograp1B:l),
Institute of Statistics, University of North Carolina, 1948.
£17
Henry Scheffe, "On solutions of the Behrens-Fisher problem, based on the
t-distribution," Ann. Math. stat., vol. 14 (1943), pp. 35-44.
£§J R. M. Sundrum, liThe power of Vlilcoxon's 2-sample test," J. Roy. stat.
~, Ser. B, vol. 15 (1953), pp. 246-252.
£27
F. Wilcoxon, "Individual comparison by ranlting methoo.s, Ii ,Biometrics,
vol. 1 (1945), pp. 80-83.
© Copyright 2026 Paperzz