ON A CLASS OF PERMUTATION TESTS FOR
STOCHASTIC INDEPENDENCE, II
by
Pranab Kumar Sen
University of North Carolina
Institute of Statistics Mimeo Series No. 533
July 1967
Work supported by the National Institute of Health,
Public Health Service, Grant GM-12868
DEPARTMENT OF BIOSTATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
ON A CLASS OF PERMUTATION TESTS FOR STOCHASTIC INDEPENDENCE, II*
By
Pranab Kumar Sen
University of North Carolina at Chapel Hill
SUMMARY.

Extending the ideas of Hájek (1962) and Sen (1967a, 1967b), a class of asymptotically most powerful permutation (rank order) tests is offered for testing independence of two stochastic variates when the observable random variables correspond to a finite or countable set of contiguous cells having an underlying continuous distribution. The theory is illustrated by some examples.

1. INTRODUCTION

Let $\mathbf{X}_i = (X_{1i}, X_{2i})$, $i = 1, \ldots, N$, be $N$ independent and identically distributed random vectors (i.i.d.r.v.) having a continuous (bivariate) cumulative distribution function (c.d.f.) $F(\mathbf{x})$, $\mathbf{x} \in R^2$, the real plane. On $R^2$, a finite or countable set of contiguous cells is defined by

$$I_{jk} = \{(x_1, x_2):\ a_{j-1} < x_1 \le a_j,\ b_{k-1} < x_2 \le b_k\}, \quad j, k = 1, 2, \ldots, \tag{1.1}$$

where $\{-\infty = a_0 < a_1 < a_2 < \cdots < \infty\}$ and $\{-\infty = b_0 < b_1 < b_2 < \cdots < \infty\}$ are any two sets of ordered points on the real line $(-\infty, \infty)$. Let then

$$Z_{ijk} = \begin{cases} 1, & \mathbf{X}_i \in I_{jk}, \\ 0, & \text{otherwise;} \end{cases} \qquad \mathbf{Z}_i = (Z_{ijk};\ j, k = 1, 2, \ldots), \tag{1.2}$$

for all $i = 1, \ldots, N$. Our observable random variables are $\mathbf{Z}_1, \ldots, \mathbf{Z}_N$, and having observed them, we want to test the null hypothesis

(1.3)

* Work supported by the National Institute of Health, Public Health Service, Grant GM-12868.
A class of permutation tests based on appropriate rank order statistics is offered for testing $H_0$ in (1.3). This generalizes some earlier results of Sen (1967a, Section 5, (5.23) through (5.26)), and the proposed tests are shown to be asymptotically most powerful for certain specific alternative hypotheses which are generalizations of the regression alternative of Hájek (1962) to the case when (i) both the variables are stochastic and (ii) the regression is not necessarily linear.

Assuming $X_{2i}$, $i = 1, \ldots, N$, to be non-stochastic, Hájek (1962) has considered the model specified by

(1.4)

where $\alpha$, $\beta$ and $\sigma$ ($> 0$) are unknown parameters and we desire to test $H_0^*: \beta = 0$. Sen (1967b) has considered the same model but for the case when $X_{1i}$, $i = 1, \ldots, N$, are not observable; the observable random variables correspond to a finite or countable set of contiguous class intervals on $(-\infty, \infty)$. In the present paper, the general case of both the variates being stochastic is considered, and the tests are based on the observable random variables $\mathbf{Z}_1, \ldots, \mathbf{Z}_N$, defined by (1.2).
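The passage from the unobservable pairs $(X_{1i}, X_{2i})$ to the observable cell indicators can be sketched in a few lines of Python. This is only an illustration: the function names `cell_index` and `cell_counts` are hypothetical, and the half-open convention $a_{j-1} < x \le a_j$ is an assumption (for a continuous $F$ the treatment of boundary points is immaterial). The table returned holds the cell frequencies $\sum_i Z_{ijk}$.

```python
from bisect import bisect_left


def cell_index(x, cuts):
    """1-based index j of the interval (cuts[j-2], cuts[j-1]] containing x.

    `cuts` holds the finite cut points in increasing order; the end points
    -infinity and +infinity are implied.
    """
    return bisect_left(cuts, x) + 1


def cell_counts(pairs, a_cuts, b_cuts):
    """Frequency table of the pairs over the cells defined by the cut points."""
    n_rows, n_cols = len(a_cuts) + 1, len(b_cuts) + 1
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x1, x2 in pairs:
        j = cell_index(x1, a_cuts)  # row cell of the first variate
        k = cell_index(x2, b_cuts)  # column cell of the second variate
        counts[j - 1][k - 1] += 1   # Z_ijk = 1 for exactly one (j, k)
    return counts


if __name__ == "__main__":
    sample = [(-1.0, 0.5), (0.2, -0.3), (2.0, 2.0)]
    print(cell_counts(sample, a_cuts=[0.0, 1.0], b_cuts=[0.0]))
```

Only the resulting table (or, equivalently, the pair of cell labels of each observation) is used by the tests that follow; the original measurements are not needed.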
Recalling that $\sigma$ in (1.4), in general, depends on $\beta$, we may consider the slightly more general model

(1.5)

where $\sigma(\beta)$ is continuous (in $\beta$) in the neighbourhood of $\beta = 0$ and attains a maximum at $\beta = 0$ (e.g., bivariate normal distribution). In this paper, we consider the following more general model. Let $F(\infty, x_2) = H(x_2)$ and $F(x_1 | x_2) = G(x_1 | x_2)$. We assume that there is some regression or association parameter $\beta$, such that

(1.6)
where $\beta$ is such that $G_0(x_1 | x_2) = G_0(x_1)$ is independent of $x_2$, i.e., $G_0(x_1)$ is the marginal c.d.f. of $x_1$. Then, we shall make the following assumptions:

(I) $H(x)$ and $G_\beta(x_1 | x_2)$ have continuous density functions $h(x)$ and $g_\beta(x_1 | x_2)$, respectively.

(II) There exists a $\delta > 0$, such that for all $|\beta| \le \delta$

(1.7)

where

(1.8)

(III)

(1.9)

where $C(F)$ is a constant (which may depend on $F$) and $\ell_i(x_i)$ is a function of $x_i$ alone, $i = 1, 2$. By virtue of (1.7) and (1.8), we obtain that

$$\gamma^2 = \Big[\int_{-\infty}^{\infty} \ell_1^2(x)\,dG_0(x)\Big]\Big[\int_{-\infty}^{\infty} \ell_2^2(x)\,dH(x)\Big] < \infty. \tag{1.10}$$

Our proposed rank order tests will be asymptotically most powerful for testing $H_0: \beta = 0$ against $H: \beta > 0$ when the assumptions in (1.6) through (1.10) hold.

2. ASYMPTOTICALLY MOST POWERFUL PARAMETRIC TEST
Let us denote by

$$P_{jk} = P\{\mathbf{X} \in I_{jk}\}, \quad P_{j0} = \sum_k P_{jk}, \quad P_{0k} = \sum_j P_{jk}; \tag{2.1}$$

$$\mu_j = \int_{a_{j-1}}^{a_j} \ell_1(x)\,dG_0(x) \Big/ P_{j0}, \qquad \nu_k = \int_{b_{k-1}}^{b_k} \ell_2(x)\,dH(x) \Big/ P_{0k}, \tag{2.2}$$
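for all $j, k = 1, 2, \ldots$;

$$A^2 = \sum_j \mu_j^2 P_{j0}, \qquad B^2 = \sum_k \nu_k^2 P_{0k}. \tag{2.3}$$

As in Sen (1967b), it is seen that

$$A^2 \le \int_{-\infty}^{\infty} \ell_1^2(x)\,dG_0(x), \qquad B^2 \le \int_{-\infty}^{\infty} \ell_2^2(x)\,dH(x), \tag{2.4}$$

uniformly in $\{a_0, a_1, \ldots\}$ and $\{b_0, b_1, \ldots\}$. Thus, by (1.10), both $A^2$ and $B^2$ are finite; they are also assumed to be positive. Now, in the sequel, for testing $H_0: \beta = 0$ against $\beta > 0$, we shall conceive of a sequence $\{H_N\}$ of alternative hypotheses, where

$$H_N:\ \beta = \beta_N = N^{-1/2}\lambda; \quad \lambda \text{ is real and finite.} \tag{2.5}$$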
Choosing $N$ sufficiently large (so that $|\beta_N| \le \delta$ for all $N \ge N_0$), writing $f(x_1, x_2) = h(x_2)\,g_\beta(x_1 | x_2)$ and making use of (1.7), (1.8) and (1.9), we obtain that under $H_N$:

(2.6)

for all $j, k = 1, 2, \ldots$. Let us denote by $L(\mathbf{Z}_1, \ldots, \mathbf{Z}_N; \lambda)$ the likelihood function of the sample observations for $\beta_N = N^{-1/2}\lambda$. Then from (2.6), we obtain

$$\cdots = C(F)\,\lambda\Big[N^{-1/2} \sum_{i=1}^{N} \sum_j \sum_k Z_{ijk}\,\mu_j \nu_k\Big] + O(\lambda). \tag{2.7}$$

Thus, on writing

$$T_N = N^{-1/2} \sum_{i=1}^{N} \sum_j \sum_k Z_{ijk}\,\mu_j \nu_k, \tag{2.8}$$

it follows from Neyman-Pearson's fundamental lemma [cf. Rao (1965, pp. 375-377)] that for testing $H_0: \lambda = 0$ against $\lambda > 0$ (referred to (2.5)), the asymptotically most powerful size $\alpha$ ($0 < \alpha < 1$) test is given by

$$\psi_1(\mathbf{Z}_1, \ldots, \mathbf{Z}_N) = \begin{cases} 1, & T_N > T_{N,\alpha}, \\ a_{N,\alpha}, & T_N = T_{N,\alpha}, \\ 0, & T_N < T_{N,\alpha}, \end{cases} \tag{2.9}$$

where $T_{N,\alpha}$ and $a_{N,\alpha}$ ($0 < a_{N,\alpha} < 1$) are so chosen that

(2.10)

Now, on making use of the fact that $\frac{\partial}{\partial\beta} \iint dF(x_1, x_2)\big|_{\beta=0} = 0$, we obtain from (1.9) that

$$\Big(\sum_j \mu_j P_{j0}\Big)\Big(\sum_k \nu_k P_{0k}\Big) = 0. \tag{2.11}$$

Let then $Y_i = \sum_j \sum_k Z_{ijk}\,\mu_j \nu_k$ for $i = 1, \ldots, N$, so that $Y_1, \ldots, Y_N$ are i.i.d.r.v. Using (2.6), (2.11) and following a few simple steps, we obtain that

(2.12)

(2.13)

Hence, making use of (2.6), we obtain from (2.13) that

(2.14)

Thus, using the classical central limit theorem for i.i.d.r.v. [cf. Rao (1965, p. 107)], we obtain the following.
THEOREM 2.1. Under (1.6) through (1.10) and $\{H_N\}$ in (2.5), $[T_N/AB - C(F)\lambda AB]$ converges in law to a standard normal distribution.

Consequent on theorem 2.1, we have from (2.5) and (2.9),

(2.15)

where $\Phi(x)$ is the standard normal c.d.f. and $\Phi(\tau_\alpha) = 1 - \alpha$.

3. A PROPOSED CLASS OF RANK ORDER TESTS
Let us denote by

$$\sum_{i=1}^{N} Z_{ijk} = N_{jk}, \qquad \sum_k N_{jk} = N_{j0}, \qquad \sum_j N_{jk} = N_{0k}, \tag{3.1}$$

$$G_{N,j} = (1/N) \sum_{\ell=1}^{j} N_{\ell 0}, \quad G_{N,0} = 0; \qquad H_{N,k} = (1/N) \sum_{q=1}^{k} N_{0q}, \quad H_{N,0} = 0, \tag{3.2}$$

for all $j, k = 1, 2, \ldots$. Let then

$$\mu_{Nj} = \int_{G_{N,j-1}}^{G_{N,j}} \ell_1\big(G_0^{-1}(u)\big)\,du \Big/ [G_{N,j} - G_{N,j-1}], \quad j = 1, 2, \ldots; \tag{3.3}$$

$$\nu_{Nk} = \int_{H_{N,k-1}}^{H_{N,k}} \ell_2\big(H^{-1}(u)\big)\,du \Big/ [H_{N,k} - H_{N,k-1}], \quad k = 1, 2, \ldots; \tag{3.4}$$

where $\ell_1(x)$ and $\ell_2(x)$ are defined by (1.9). Our proposed test is then based on the statistic
$$S_N = N^{-1/2} \sum_{i=1}^{N} \sum_j \sum_k Z_{ijk}\,\mu_{Nj} \nu_{Nk}. \tag{3.5}$$

To formulate the test function, we let

$$p_{Nj0} = G_{N,j} - G_{N,j-1}, \qquad p_{N0k} = H_{N,k} - H_{N,k-1}, \qquad j, k = 1, 2, \ldots; \tag{3.6}$$

(3.7)

It may be noted that as we are dealing with grouped data, the distribution of $S_N$ will depend on the parent c.d.f. $F(\mathbf{x})$ even under the null hypothesis (1.3). However, we may apply the same logic of permutation test theory as in Sen (1967a) to formulate a permutationally distribution free test based on $S_N$. Avoiding therefore the details of arguments, denoting by $\mathcal{P}_N$ the permutation (conditional) distribution generated by the matching invariant transformations and proceeding as in Sen (1967a), we obtain

$$E(S_N \mid \mathcal{P}_N) = 0, \qquad E(S_N^2 \mid \mathcal{P}_N) = \frac{N}{N-1} A_N^2 B_N^2; \tag{3.8}$$

$$\mathcal{L}(S_N / A_N B_N \mid \mathcal{P}_N) \to N(0, 1). \tag{3.9}$$

By analogy with (2.9), we may consider the test function

$$\psi_2(\mathbf{Z}_1, \ldots, \mathbf{Z}_N) = \begin{cases} 1, & S_N > S_{N,\alpha}(\mathcal{P}_N), \\ b_\alpha(\mathcal{P}_N), & S_N = S_{N,\alpha}(\mathcal{P}_N), \\ 0, & S_N < S_{N,\alpha}(\mathcal{P}_N), \end{cases} \tag{3.10}$$

where $S_{N,\alpha}(\mathcal{P}_N)$ and $b_\alpha(\mathcal{P}_N)$ ($0 < b_\alpha < 1$) are so chosen that

(3.11)

(3.11) implies that the unconditional level of significance is also equal to $\alpha$.
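The permutation argument can be made concrete with a short Python sketch. Several things here are assumptions made for illustration and are not taken from the paper: each observation is represented by the score of its row cell and the score of its column cell; the statistic is taken as $N^{-1/2}\sum_i(\text{row score}_i)(\text{column score}_i)$; the permutation distribution is generated by re-matching column scores to row scores with all $N!$ matchings equally likely; and $A_N^2$, $B_N^2$ are read as the mean squares of the (centred) row and column scores. All function names are hypothetical. Under these assumptions the permutation mean is $0$ and the permutation second moment equals $\frac{N}{N-1}A_N^2B_N^2$ exactly, which the enumeration below reproduces for a toy sample.

```python
import itertools
import math
import random


def s_statistic(row_scores, col_scores):
    """N^(-1/2) times the sum of products of matched row and column scores."""
    n = len(row_scores)
    return sum(a * b for a, b in zip(row_scores, col_scores)) / math.sqrt(n)


def exact_permutation_moments(row_scores, col_scores):
    """First and second moments of the statistic over all N! matchings."""
    values = [
        s_statistic(row_scores, perm)
        for perm in itertools.permutations(col_scores)
    ]
    mean = sum(values) / len(values)
    second = sum(v * v for v in values) / len(values)
    return mean, second


def monte_carlo_pvalue(row_scores, col_scores, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation p-value (large values are extreme).

    This is a non-randomized approximation; it does not implement the
    randomization constant of the exact size-alpha test function.
    """
    rng = random.Random(seed)
    observed = s_statistic(row_scores, col_scores)
    shuffled = list(col_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if s_statistic(row_scores, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rows = [-1.0, -1.0, 0.0, 1.0, 1.0]  # centred row scores, N = 5
    cols = [-2.0, 0.0, 0.0, 1.0, 1.0]   # centred column scores
    n = len(rows)
    a2 = sum(a * a for a in rows) / n
    b2 = sum(b * b for b in cols) / n
    print(exact_permutation_moments(rows, cols), n / (n - 1) * a2 * b2)
    print(monte_carlo_pvalue(rows, cols))
```

Full enumeration is feasible only for very small $N$; for realistic sample sizes either the Monte Carlo version or the normal approximation to the permutation distribution would be used.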
We intend to prove that

$$\lim_{N \to \infty} E\{\psi_2 \mid H_N\} = 1 - \Phi(\tau_\alpha - \lambda C(F) AB), \tag{3.12}$$

where $\{H_N\}$ is specified by (2.5) and $(\Phi, \tau_\alpha)$ by (2.15). The proof of this statement rests on the following

THEOREM 3.1. Under (1.6) through (1.10) and (2.5)

$$\mathcal{L}(S_N/AB - \lambda C(F) AB) \to N(0, 1); \tag{3.13}$$

$$A_N^2 \xrightarrow{P} A^2, \qquad B_N^2 \xrightarrow{P} B^2. \tag{3.14}$$

PROOF. The proof that $A_N^2 \xrightarrow{P} A^2$ follows precisely on the same line as in lemma 3.1 of Sen (1967b) and the same proof also applies to the stochastic convergence of $B_N^2$ to $B^2$; the details are therefore omitted. The proof of (3.13) rests on the following.

LEMMA 3.2. Under the conditions (1.6) through (1.10)

PROOF. It follows from (2.8) and (3.5) that

(3.15)

where $E(\,\cdot \mid \mathcal{P}_N)$ and $E_N^0$ denote the expectation over the permutation distribution $\mathcal{P}_N$ (of $Z_{ijk}$'s given $N_{j0}, N_{0k}$, $j, k = 1, 2, \ldots$) and the distribution of $N_{j0}, N_{0k}$, $j, k = 1, 2, \ldots$, respectively. We note that

$$E(Z_{ijk} \mid \mathcal{P}_N) = E(Z_{ijk}^2 \mid \mathcal{P}_N) = N_{j0} N_{0k} / N^2, \tag{3.16}$$

$$E(Z_{ijk} Z_{i'\ell q} \mid \mathcal{P}_N) = 0 \quad \text{if } i = i' \text{ but } (j, k) \ne (\ell, q), \tag{3.17}$$

$$E(Z_{ijk} Z_{i'\ell q} \mid \mathcal{P}_N) = N_{j0} N_{0k} (N_{\ell 0} - \delta_{j\ell})(N_{0q} - \delta_{kq}) / N^2 (N-1)^2 \quad \text{if } i \ne i', \tag{3.18}$$
for $j, \ell, k, q = 1, 2, \ldots$; where $\delta_{j\ell}$ and $\delta_{kq}$ are the Kronecker deltas. Also, it is easy to verify that under $H_0$, $\{N_{j0},\ j = 1, 2, \ldots\}$ and $\{N_{0k},\ k = 1, 2, \ldots\}$ are stochastically independent, and

(3.19)

Hence, from (3.15) through (3.19), we obtain after some simplifications that

(3.20)

Now

(3.21)

and hence, proceeding as in Sen (1967b, (3.33)), we get that (3.21) converges to

$$E_N^0 \Big(\sum_j \mu_j \mu_{Nj} \frac{N_{j0}}{N}\Big)\Big(\sum_k \nu_k \nu_{Nk} \frac{N_{0k}}{N}\Big), \tag{3.22}$$

and hence, proceeding as in Sen (1967b, (3.34)), we get that (3.22) converges to $A^2 B^2$ as $N \to \infty$. Thus, (3.20) converges to zero as $N \to \infty$. Hence the lemma.

An immediate consequence of lemma 3.2 and theorem 2.1 is the following.

LEMMA 3.3. Under $H_0$ in (1.3), $(T_N, S_N)$ converges in law to a bivariate normal distribution concentrated on the line $T_N = S_N$.

The rest of the proof of theorem 3.1 follows from lemma 4.2 of Hájek (1962) and our theorem 2.1, lemmas 3.2 and 3.3; for brevity the details are omitted.

Comparing now (2.15) and (3.12) we observe that the permutation test in (3.10) based on the rank order statistic $S_N$ is asymptotically most powerful for $H_0: \beta = 0$ against $\beta > 0$; a similar result also holds for $H_0: \beta = 0$ against $\beta < 0$.
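The limiting power $1 - \Phi(\tau_\alpha - \lambda C(F)AB)$, with $\Phi(\tau_\alpha) = 1 - \alpha$, is easy to tabulate. The following Python sketch evaluates it with the standard library; the function name and the numerical values of $\lambda$, $C(F)$, $A$ and $B$ in the usage lines are illustrative assumptions only.

```python
from statistics import NormalDist


def asymptotic_power(lam, c_f, a, b, alpha=0.05):
    """Evaluate 1 - Phi(tau_alpha - lam * C(F) * A * B), Phi(tau_alpha) = 1 - alpha."""
    std = NormalDist()
    tau_alpha = std.inv_cdf(1.0 - alpha)  # upper 100*alpha% point of N(0, 1)
    return 1.0 - std.cdf(tau_alpha - lam * c_f * a * b)


if __name__ == "__main__":
    # At lam = 0 the expression reduces to the size alpha.
    print(asymptotic_power(0.0, 1.0, 0.8, 0.8))
    # The power increases with lam when C(F) * A * B > 0.
    for lam in (1.0, 2.0, 3.0):
        print(lam, round(asymptotic_power(lam, 1.0, 0.8, 0.8), 4))
```

Replacing the product $AB$ by a larger value (finer grouping) shifts the whole power curve upward, which is the effect quantified in the efficiency comparison of the next section.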
4. ASYMPTOTIC EFFICIENCY CONSIDERATIONS

Under the conditions in (1.6) through (1.10), for ungrouped data, the asymptotically most powerful (parametric) test (based on the likelihood ratio test criterion) can be shown to have the asymptotic power (for $\{H_N\}$ specified by (2.5))

(4.1)

where $\Phi(x)$ and $\tau_\alpha$ are defined by (2.15) and $\gamma$ by (1.10). Thus it follows from (2.15) and (3.12) that the loss in asymptotic efficiency due to grouping of data is equal to

$$L(\{I_{jk}\}) = 1 - A^2 B^2 / \gamma^2, \tag{4.2}$$

where $A^2$ and $B^2$ are defined by (2.3). Let us now define

$$L(\{a_0, a_1, \ldots\}) = \sum_j \int_{a_{j-1}}^{a_j} [\ell_1(x) - \mu_j]^2\,dG_0(x) \Big/ \int_{-\infty}^{\infty} \ell_1^2(x)\,dG_0(x), \tag{4.3}$$

$$L(\{b_0, b_1, \ldots\}) = \sum_k \int_{b_{k-1}}^{b_k} [\ell_2(x) - \nu_k]^2\,dH(x) \Big/ \int_{-\infty}^{\infty} \ell_2^2(x)\,dH(x), \tag{4.4}$$

where $\ell_1(x), \ell_2(x)$ are defined by (1.9) and $\mu_j, \nu_k$, $j, k = 1, 2, \ldots$, by (2.2).
Using (1.10), (2.3), (4.2), (4.3) and (4.4), we obtain

(4.5)

i.e.

$$L(\{I_{jk}\}) \le L(\{a_0, a_1, \ldots\}) + L(\{b_0, b_1, \ldots\}). \tag{4.6}$$

If $P_{j0}$ and $P_{0k}$, $j, k = 1, 2, \ldots$, are all sufficiently small, it follows from (4.3) through (4.6) that the loss in efficiency due to grouping can be made arbitrarily small.
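A small numerical illustration may help to see how quickly the loss shrinks as the grouping becomes finer. The Python sketch below works under explicit assumptions that go beyond the text: the marginal is standard normal with score function $\ell(x) = x$ (so that $\int \ell^2 = 1$); the within-cell score is the cell mean of $\ell$, i.e. $[\phi(\text{lower}) - \phi(\text{upper})]/P$; and the one-way loss is computed as $1 - \sum (\text{cell score})^2 P$, which is the relative within-cell variation of $\ell$. The combination rule $1 - (1 - L_a)(1 - L_b)$ is what $1 - A^2B^2/\gamma^2$ gives when $\gamma^2$ factorizes as in (1.10) and $A^2$, $B^2$ are the corresponding between-cell sums of squares; it is never larger than $L_a + L_b$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def _pdf(x):
    """Standard normal density, taken as 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else _STD.pdf(x)


def _cdf(x):
    """Standard normal c.d.f., taken as 0 or 1 at -/+ infinity."""
    if math.isinf(x):
        return 0.0 if x < 0 else 1.0
    return _STD.cdf(x)


def grouping_loss_normal(cuts):
    """Relative loss 1 - sum(score^2 * P) for a standard normal margin, l(x) = x."""
    edges = [-math.inf] + sorted(cuts) + [math.inf]
    between = 0.0
    for lo, hi in zip(edges, edges[1:]):
        prob = _cdf(hi) - _cdf(lo)
        if prob <= 0.0:
            continue  # an empty cell contributes nothing
        score = (_pdf(lo) - _pdf(hi)) / prob  # cell mean of x under N(0, 1)
        between += score * score * prob
    return 1.0 - between


def combined_grouping_loss(loss_a, loss_b):
    """Two-way loss 1 - (1 - L_a)(1 - L_b), bounded above by L_a + L_b."""
    return 1.0 - (1.0 - loss_a) * (1.0 - loss_b)


if __name__ == "__main__":
    coarse = grouping_loss_normal([0.0])  # two cells split at the median
    fine = grouping_loss_normal([i / 10 for i in range(-30, 31)])
    print(coarse, fine, combined_grouping_loss(coarse, coarse))
```

With a single cut at the median the one-way loss is $1 - 2/\pi$, while a grid of width $0.1$ between $-3$ and $3$ already brings it down to a small fraction of one per cent, in line with the remark that the loss vanishes as the cell probabilities become small.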
Now, instead of the rank order statistic $S_N$, we may consider another rank order statistic $S_N^*$ which leads to an asymptotically most powerful test for $H_0: \beta = 0$ against $\beta > 0$ when the true c.d.f. is $F^*(\mathbf{x})$, $\mathbf{x} \in R^2$, i.e.,

$$S_N^* = N^{-1/2} \sum_{i=1}^{N} \sum_j \sum_k Z_{ijk}\,\mu_{Nj}^* \nu_{Nk}^*, \tag{4.7}$$

where $\mu_{Nj}^*, \nu_{Nk}^*, \mu_j^*, \nu_k^*$, $j, k = 1, \ldots$, are defined as before, with the c.d.f. $F(\mathbf{x})$ being replaced by $F^*(\mathbf{x})$. Thus, replacing $F(\mathbf{x})$ by $F^*(\mathbf{x})$ in the definition of $\ell_1(x)$, $\ell_2(x)$, $\mu_{Nj}$, $\nu_{Nk}$, $\mu_j$, $\nu_k$ and $P_{jk}$'s, we obtain analogous expressions which are denoted by $\ell_1^*(x)$, $\ell_2^*(x)$, $\mu_{Nj}^*$, $\nu_{Nk}^*$, $\mu_j^*$, $\nu_k^*$ and $P_{jk}^*$ respectively. We want to study the asymptotic relative efficiency of the test based on $S_N^*$ with respect to the one based on $S_N$ when $F(\mathbf{x})$ is the true c.d.f. For this, let us define

$$\rho_1(\{a_0, a_1, \ldots\}) = \sum_j \mu_j \mu_j^{**} P_{j0} \Big/ \Big\{\Big[\sum_j \mu_j^2 P_{j0}\Big]\Big[\sum_j (\mu_j^{**})^2 P_{j0}\Big]\Big\}^{1/2}, \tag{4.8}$$

where $\mu_j^{**}$, $j = 1, 2, \ldots$, is defined by

(4.9)

$$\rho_2(\{b_0, b_1, \ldots\}) = \sum_k \nu_k \nu_k^{**} P_{0k} \Big/ \Big\{\Big[\sum_k \nu_k^2 P_{0k}\Big]\Big[\sum_k (\nu_k^{**})^2 P_{0k}\Big]\Big\}^{1/2}, \tag{4.10}$$

where $\nu_k^{**}$, $k = 1, 2, \ldots$, is defined by

(4.11)

Thus, proceeding as in theorem 4.1 of Sen (1967b) and generalizing it along the lines of theorem 3.1 of this paper, we obtain after some simplifications that the asymptotic relative efficiency of the test based on $S_N^*$ with respect to the one based on $S_N$, when $F(\mathbf{x})$ is the true c.d.f., is given by
(4.12)

If $F(\mathbf{x})$ and $F^*(\mathbf{x})$ are specified, (4.12) may be evaluated for specified $\{a_0, a_1, \ldots\}$ and $\{b_0, b_1, \ldots\}$.

5. SOME ILLUSTRATIVE EXAMPLES
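Let us first consider the case when $F(\mathbf{x})$ is a bivariate normal c.d.f. In this case, from (1.9), we have on taking $\rho$, the correlation coefficient, as a measure of association (i.e., $\rho = \beta$)

$$C(F) = 1, \qquad \ell_1(x) = \frac{x - E(X_1)}{\sqrt{V(X_1)}}, \qquad \ell_2(x) = \frac{x - E(X_2)}{\sqrt{V(X_2)}}. \tag{5.1}$$

Thus, as in section 2, on denoting by $\Phi(x)$ the standard normal c.d.f., we have from (3.3) and (3.4)

$$\mu_{Nj} = \int_{\Phi^{-1}(G_{N,j-1})}^{\Phi^{-1}(G_{N,j})} x\,d\Phi(x) \Big/ [G_{N,j} - G_{N,j-1}], \quad j = 1, \ldots; \tag{5.2}$$

$$\nu_{Nk} = \int_{\Phi^{-1}(H_{N,k-1})}^{\Phi^{-1}(H_{N,k})} x\,d\Phi(x) \Big/ [H_{N,k} - H_{N,k-1}], \quad k = 1, \ldots. \tag{5.3}$$

On denoting by $\phi(x)$ the standard normal density function, we may write

$$\mu_{Nj} = \big[\phi(\Phi^{-1}(G_{N,j-1})) - \phi(\Phi^{-1}(G_{N,j}))\big] \big/ [G_{N,j} - G_{N,j-1}], \tag{5.4}$$

$$\nu_{Nk} = \big[\phi(\Phi^{-1}(H_{N,k-1})) - \phi(\Phi^{-1}(H_{N,k}))\big] \big/ [H_{N,k} - H_{N,k-1}], \tag{5.5}$$

for all $j, k = 1, 2, \ldots$. The corresponding $S_N$, defined by (4.7), may be termed the normal scores statistic for doubly-grouped data. The test based on this statistic is therefore asymptotically most efficient for testing $H_0: \rho = 0$ against $\rho > 0$ (or $< 0$)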
when $F(\mathbf{x})$ is a bivariate normal c.d.f.
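For a two-way frequency table, the normal scores statistic can be computed directly from the marginal totals. The Python sketch below is an illustration under stated assumptions: the cell scores are taken in the closed form $[\phi(\Phi^{-1}(\text{lower})) - \phi(\Phi^{-1}(\text{upper}))]/(\text{upper} - \text{lower})$, where lower and upper are the cumulative marginal proportions bounding the cell, with $\phi(\Phi^{-1}(0)) = \phi(\Phi^{-1}(1)) = 0$; the statistic is taken as $N^{-1/2}\sum_j\sum_k N_{jk}\,\mu_{Nj}\nu_{Nk}$, with $N_{jk}$ the cell frequencies; and an empty row or column is given score $0$, a convention introduced here only to avoid division by zero. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def _phi_at_quantile(p):
    """phi(Phi^{-1}(p)), with the limiting value 0 at p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return _STD.pdf(_STD.inv_cdf(p))


def normal_scores(margin_counts):
    """Normal scores for grouped data from one set of marginal counts."""
    n = sum(margin_counts)
    scores, cum = [], 0
    for count in margin_counts:
        lower = cum / n
        cum += count
        upper = cum / n
        if count == 0:
            scores.append(0.0)  # convention for an empty cell (assumption)
        else:
            diff = _phi_at_quantile(lower) - _phi_at_quantile(upper)
            scores.append(diff / (upper - lower))
    return scores


def normal_scores_statistic(counts):
    """N^(-1/2) * sum_j sum_k N_jk * mu_Nj * nu_Nk for a table of counts."""
    n = sum(sum(row) for row in counts)
    mu = normal_scores([sum(row) for row in counts])
    nu = normal_scores([sum(col) for col in zip(*counts)])
    total = sum(
        counts[j][k] * mu[j] * nu[k]
        for j in range(len(mu))
        for k in range(len(nu))
    )
    return total / sqrt(n)


if __name__ == "__main__":
    print(normal_scores_statistic([[10, 0], [0, 10]]))  # strong positive association
    print(normal_scores_statistic([[5, 5], [5, 5]]))    # no association in the table
```

The scores depend on the data only through the marginal totals, so they stay fixed under re-matching of rows and columns; only the cell frequencies change from one permutation to the next.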
A second example may be as follows. Suppose that the second variate $X_{2i}$ has marginally a normal distribution while the conditional distribution of $X_{1i}$ given $X_{2i}$ is logistic with a location parameter $\alpha + \beta x_{2i}$. In this case, it is easily seen that for the optimal $S_N$, $\nu_{Nk}$, $k = 1, 2, \ldots$, are defined by (5.3) and (5.5), while

$$\mu_{Nj} = \tfrac{1}{2}(G_{N,j-1} + G_{N,j}) \quad \text{for } j = 1, 2, \ldots. \tag{5.6}$$

Many other examples may be considered.
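The scores of the second example are simple enough to compute by hand, but a short Python sketch makes the recipe explicit. It assumes only the formula $\mu_{Nj} = \frac{1}{2}(G_{N,j-1} + G_{N,j})$, with $G_{N,j}$ the cumulative proportion of observations in the first $j$ cells of the first variate; the function name is hypothetical.

```python
def logistic_location_scores(margin_counts):
    """Scores (G_{N,j-1} + G_{N,j}) / 2 from the marginal counts of one variate."""
    n = sum(margin_counts)
    scores, cum = [], 0
    for count in margin_counts:
        lower = cum / n  # cumulative proportion below the cell
        cum += count
        upper = cum / n  # cumulative proportion up to and including the cell
        scores.append(0.5 * (lower + upper))
    return scores


if __name__ == "__main__":
    print(logistic_location_scores([1, 2, 1]))
```

Each score is the midpoint of the cumulative proportions bounding the cell, so a cell's score equals (average rank of its observations minus one half) divided by $N$; this gives a Wilcoxon-type scoring of the grouped first variate.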
REFERENCES

HÁJEK, J. (1962): Asymptotically most powerful rank order tests. Ann. Math. Statist. 33, 1124-1147.

RAO, C. R. (1965): Linear Statistical Inference and its Applications. John Wiley: New York.

SEN, P. K. (1967a): On a class of permutation tests for stochastic independence, I. Sankhya, Ser. A, 11, in press.

SEN, P. K. (1967b): Asymptotically most powerful rank order tests for grouped data. Ann. Math. Statist. 38, No. 4 (in press).