AN ALIGNED GOODNESS OF FIT TEST FOR THE
MULTIVARIATE TWO-SAMPLE MODEL: LOCATIONS UNKNOWN
by
Pranab Kumar Sen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1463
June 1984
AN ALIGNED GOODNESS OF FIT TEST FOR THE MULTIVARIATE TWO-SAMPLE MODEL: LOCATIONS UNKNOWN*
Pranab Kumar SEN
1. INTRODUCTION
Let $\mathbf{X}_1, \ldots, \mathbf{X}_m$ be $m$ independent and identically distributed (i.i.d.) random vectors (r.v.) with a continuous distribution function (d.f.) $F$, defined on the Euclidean space $E^p$, for some $p \ge 1$. Also, let $\mathbf{Y}_1, \ldots, \mathbf{Y}_n$ be $n$ i.i.d.r.v. with a continuous d.f. $G$, defined on $E^p$. We consider the model:
$$F(\mathbf{x}) = F_0(\mathbf{x} - \boldsymbol{\theta}_1) \quad\text{and}\quad G(\mathbf{x}) = G_0(\mathbf{x} - \boldsymbol{\theta}_2), \quad \mathbf{x} \in E^p, \tag{1.1}$$
where $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ are unknown (location) parameters and the forms of $F_0$ and $G_0$ are also not known. The problem is to test for the hypotheses
$$H_0 : F_0 = G_0 \quad\text{against}\quad H_1 : F_0 \ne G_0, \tag{1.2}$$
treating $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$ as nuisance parameters. For $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_2$, under $H_0$, $F$ and $G$ are the same, so that one may use the usual two-sample Kolmogorov-Smirnov test statistic for testing $H_0$ against $H_1$. If $F_m(\mathbf{x})$ and $G_n(\mathbf{x})$ are respectively the first and second sample empirical d.f., then this statistic can conveniently be written as
$$K^{+}_{mn} = \sup\{ (mn/N)^{1/2} [ F_m(\mathbf{x}) - G_n(\mathbf{x}) ] : \mathbf{x} \in E^p \} \quad \text{(one-sided case)}, \tag{1.3}$$
$$K_{mn} = \sup\{ (mn/N)^{1/2} | F_m(\mathbf{x}) - G_n(\mathbf{x}) | : \mathbf{x} \in E^p \} \quad \text{(two-sided case)}, \tag{1.4}$$
where $N = m + n$ is the total sample size. For the univariate case (i.e., $p = 1$), under $H_0$, the distribution of $K^{+}_{mn}$ or $K_{mn}$ does not depend on $F$ (continuous), and hence, they are genuinely distribution-free when $F$ and $G$ are the same.
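As an illustration of (1.3)-(1.4) in the univariate case, the two statistics may be computed along the following lines. This is only a minimal sketch in Python; the function name `ks_two_sample` and the toy data are assumptions introduced here for illustration and are not part of the paper.

```python
import math


def ks_two_sample(xs, ys):
    """Return (K+, K) of (1.3)-(1.4) for two univariate samples (p = 1)."""
    m, n = len(xs), len(ys)
    scale = math.sqrt(m * n / (m + n))  # (mn/N)^{1/2} with N = m + n
    # Both empirical d.f.'s vanish far to the left, so each supremum is >= 0.
    one_sided = 0.0
    two_sided = 0.0
    # F_m - G_n is a step function, so the pooled sample points suffice.
    for x in sorted(set(xs) | set(ys)):
        f_m = sum(v <= x for v in xs) / m
        g_n = sum(v <= x for v in ys) / n
        one_sided = max(one_sided, f_m - g_n)
        two_sided = max(two_sided, abs(f_m - g_n))
    return scale * one_sided, scale * two_sided


# Hypothetical toy data: the first sample lies entirely to the left of the second.
k_plus, k_two = ks_two_sample([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Interchanging the two samples leaves the two-sided value unchanged, while the one-sided value drops to zero, which reflects the directional nature of (1.3).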
* Work partially supported by the (U.S.) Office of Naval Research, Contract N00014-83-K-0387.
For small values of $m$ and $n$, the exact distribution of $K^{+}_{mn}$ or $K_{mn}$ (when $F = G$) can be obtained by direct enumeration; we may refer to Hodges (1957) for a survey of computational methods and some numerical studies. Vincze (1960) has also presented a survey of some of these results. For large sample sizes, we have for every $t \ge 0$, under $F = G$,
$$\lim_{m,n \to \infty} P\{ K^{+}_{mn} > t \} = \exp( -2t^2 ), \tag{1.5}$$
$$\lim_{m,n \to \infty} P\{ K_{mn} > t \} = 2 \sum_{k=1}^{\infty} (-1)^{k-1} \exp( -2k^2 t^2 ), \tag{1.6}$$
and, actually, these tails dominate the exact distributions for the finite sample case. In the multivariate case (i.e., $p \ge 2$), this exact distribution-freeness of $K^{+}_{mn}$ or $K_{mn}$ is not generally true (excepting when all the $p$ coordinates of the $\mathbf{X}_i$ (and $\mathbf{Y}_j$) are mutually independent). However, when $F = G$, even in the multivariate case, all the $N$ vectors $\mathbf{X}_1, \ldots, \mathbf{X}_m, \mathbf{Y}_1, \ldots, \mathbf{Y}_n$ are i.i.d.r.v. (with the d.f. $F$), so that the Chatterjee-Sen (1964) rank-permutation principle can be adapted to generate the permutational distribution of $K^{+}_{mn}$ or $K_{mn}$ over the set of $N!$ (conditionally) equally likely permutations of these $N$ vectors, and this yields a conditionally (permutationally) distribution-free test for $F = G$. This task becomes prohibitively laborious when $N$ increases, so that one is inclined to use suitable approximations to this permutation distribution. This has been studied in detail in Bickel (1969), and that reveals the asymptotic distribution-freeness of $K^{+}_{mn}$ and $K_{mn}$ when $F = G$.
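The limiting tails in (1.5)-(1.6) are simple to evaluate numerically. The following is a hedged sketch; the function names and the truncation of the series after `terms` summands are choices made here for illustration only.

```python
import math


def one_sided_tail(t):
    """Limiting P{K+ > t} from (1.5)."""
    return math.exp(-2.0 * t * t) if t > 0 else 1.0


def two_sided_tail(t, terms=100):
    """Limiting P{K > t} from (1.6), truncated after `terms` summands."""
    if t <= 0:
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                for k in range(1, terms + 1))
    # The alternating series converges quickly for moderate t; the clipping
    # only guards against truncation error when t is very close to 0.
    return min(1.0, max(0.0, 2.0 * total))


def one_sided_critical_value(alpha):
    """Solve exp(-2 t^2) = alpha for t, i.e. invert (1.5)."""
    return math.sqrt(-math.log(alpha) / 2.0)
```

For the one-sided statistic the asymptotic critical value is available in closed form, whereas for the two-sided statistic one would solve `two_sided_tail(t) = alpha` numerically.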
With respect to the model (1.1), when $\boldsymbol{\theta}_1 \ne \boldsymbol{\theta}_2$, under $H_0$ in (1.2), $F$ and $G$ need not be the same, and hence, the test based on $K^{+}_{mn}$ or $K_{mn}$ may not be valid (may even be inconsistent). One possibility to overcome this problem is to consider suitable estimators $\hat{\boldsymbol{\theta}}_{1,m}$ and $\hat{\boldsymbol{\theta}}_{2,n}$ of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, respectively, form the residuals (vectors) $\mathbf{X}_i - \hat{\boldsymbol{\theta}}_{1,m}$, $i = 1, \ldots, m$, and $\mathbf{Y}_j - \hat{\boldsymbol{\theta}}_{2,n}$, $j = 1, \ldots, n$, and to base a test on these aligned observations. Within each sample, the residual vectors are generally not independent (they are symmetrically dependent (or exchangeable), with their marginal distribution and the dependence pattern both depending on the underlying d.f. and the sample size $m$ or $n$), but residuals from different samples are independent of each other. This distorts the exchangeability of all the $N$ residual vectors (even when $F_0 = G_0$ and/or $m = n$), so that the rank-permutation principle may not be applicable. From the asymptotic distributional point of view, for the two-sample Kolmogorov-Smirnov statistics based on such aligned observations, even if one chooses some efficient estimators of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, for the related (aligned) empirical distributional processes, the covariance functions may become quite involved and no simple distributional theory may therefore be available. This problem has been studied in the univariate case in Sen (1984), where a variant form of the Kolmogorov-Smirnov statistic has been used (under an additional assumption of symmetry of $F_0$ and $G_0$), and it has been shown that this alternative one is asymptotically distribution-free under quite general regularity conditions.
The primary objective of the current study is to consider some multivariate generalizations of aligned Kolmogorov-Smirnov type tests, to validly apply them for the testing problem in (1.2), and to present their asymptotic properties too. Along with the preliminary notions, the proposed tests are presented in Section 2. Asymptotic properties of the tests are then considered in Section 3. Some general comments are made in the concluding section.
2. THE PROPOSED TESTS
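To eliminate the nuisance parameters from the testing situation, first, we consider some suitable estimators of the locations $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$. Let $\hat{\boldsymbol{\theta}}_{1,m}$ and $\hat{\boldsymbol{\theta}}_{2,n}$ be two arbitrary translation-invariant estimators of $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, respectively. We assume them to be 'square-root $n$' consistent, that is,
$$m^{1/2} \| \hat{\boldsymbol{\theta}}_{1,m} - \boldsymbol{\theta}_1 \| = O_p(1) \quad\text{and}\quad n^{1/2} \| \hat{\boldsymbol{\theta}}_{2,n} - \boldsymbol{\theta}_2 \| = O_p(1), \tag{2.1}$$
where $\| \cdot \|$ stands for the Euclidean norm, and we assume further that there exists a positive $\lambda_0$ ($0 < \lambda_0 \le \tfrac{1}{2}$), such that
$$0 < \lambda_0 \le m/N \le 1 - \lambda_0 < 1, \quad N \ge N_0. \tag{2.2}$$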
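Consider then the residuals (vectors)
$$\hat{\mathbf{X}}_i = \mathbf{X}_i - \hat{\boldsymbol{\theta}}_{1,m}, \ i = 1, \ldots, m; \qquad \hat{\mathbf{Y}}_j = \mathbf{Y}_j - \hat{\boldsymbol{\theta}}_{2,n}, \ j = 1, \ldots, n. \tag{2.3}$$
For these aligned observations, we consider the empirical d.f.'s
$$\hat{F}_m(\mathbf{x}) = m^{-1} \sum_{i=1}^{m} I( \hat{\mathbf{X}}_i \le \mathbf{x} ), \qquad \hat{G}_n(\mathbf{x}) = n^{-1} \sum_{j=1}^{n} I( \hat{\mathbf{Y}}_j \le \mathbf{x} ), \qquad \mathbf{x} \in E^p, \tag{2.4}$$
where $\mathbf{a} \le \mathbf{b}$ means the coordinatewise inequality $a_i \le b_i$, $i = 1, \ldots, p$. Further, let us define
$$\hat{\mathbf{X}}_i^{*} = ( |\hat{X}_{i1}|, \ldots, |\hat{X}_{ip}| )' \quad\text{and}\quad \hat{\mathbf{Y}}_j^{*} = ( |\hat{Y}_{j1}|, \ldots, |\hat{Y}_{jp}| )', \tag{2.5}$$
and denote the empirical d.f.'s for these observations by
$$\hat{F}_m^{*}(\mathbf{x}) = m^{-1} \sum_{i=1}^{m} I( \hat{\mathbf{X}}_i^{*} \le \mathbf{x} ) \quad\text{and}\quad \hat{G}_n^{*}(\mathbf{x}) = n^{-1} \sum_{j=1}^{n} I( \hat{\mathbf{Y}}_j^{*} \le \mathbf{x} ), \qquad \mathbf{x} \in E_p^{*}, \tag{2.6}$$
where $E_p^{*}$, the positive quadrant of $E^p$, is defined by
$$E_p^{*} = \{ (x_1, \ldots, x_p) : 0 \le x_j < \infty, \ j = 1, \ldots, p \} \quad ( \subset E^p ). \tag{2.7}$$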
Our proposed tests are based on the empirical d.f.'s in (2.6). Specifically, we consider Kolmogorov-Smirnov type test statistics, though others may be used in the same manner. Define
$$\hat{K}^{*+}_{mn} = \sup\{ (mn/N)^{1/2} [ \hat{F}_m^{*}(\mathbf{x}) - \hat{G}_n^{*}(\mathbf{x}) ] : \mathbf{x} \in E_p^{*} \}, \tag{2.8}$$
$$\hat{K}^{*}_{mn} = \sup\{ (mn/N)^{1/2} | \hat{F}_m^{*}(\mathbf{x}) - \hat{G}_n^{*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \}. \tag{2.9}$$
Our primary concern is to study the properties of the test statistics $\hat{K}^{*+}_{mn}$ and $\hat{K}^{*}_{mn}$ with a view to testing for the hypotheses in (1.2). In this context, we define the true residuals by
$$\mathbf{X}_i^{0} = \mathbf{X}_i - \boldsymbol{\theta}_1, \ i = 1, \ldots, m \quad\text{and}\quad \mathbf{Y}_j^{0} = \mathbf{Y}_j - \boldsymbol{\theta}_2, \ j = 1, \ldots, n. \tag{2.10}$$
Then, parallel to (2.4)-(2.6), we define
$$F_m^{0}(\mathbf{x}) = m^{-1} \sum_{i=1}^{m} I( \mathbf{X}_i^{0} \le \mathbf{x} ), \qquad G_n^{0}(\mathbf{x}) = n^{-1} \sum_{j=1}^{n} I( \mathbf{Y}_j^{0} \le \mathbf{x} ), \qquad \mathbf{x} \in E^p, \tag{2.11}$$
$$\mathbf{X}_i^{0*} = ( |X_{i1}^{0}|, \ldots, |X_{ip}^{0}| )' \quad\text{and}\quad \mathbf{Y}_j^{0*} = ( |Y_{j1}^{0}|, \ldots, |Y_{jp}^{0}| )', \tag{2.12}$$
$$F_m^{0*}(\mathbf{x}) = m^{-1} \sum_{i=1}^{m} I( \mathbf{X}_i^{0*} \le \mathbf{x} ), \qquad G_n^{0*}(\mathbf{x}) = n^{-1} \sum_{j=1}^{n} I( \mathbf{Y}_j^{0*} \le \mathbf{x} ), \qquad \mathbf{x} \in E_p^{*}. \tag{2.13}$$
Note that under $H_0$ in (1.2), both $\mathbf{X}_i^{0}$ and $\mathbf{Y}_j^{0}$ have the common (unknown) d.f. $F_0$, so that $\mathbf{X}_i^{0*}$ and $\mathbf{Y}_j^{0*}$ have also a common d.f., which we denote by $F_0^{*}$. Parallel to (2.8) and (2.9), we define
$$K^{*+}_{mn} = \sup\{ (mn/N)^{1/2} [ F_m^{0*}(\mathbf{x}) - G_n^{0*}(\mathbf{x}) ] : \mathbf{x} \in E_p^{*} \}, \tag{2.14}$$
$$K^{*}_{mn} = \sup\{ (mn/N)^{1/2} | F_m^{0*}(\mathbf{x}) - G_n^{0*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \}. \tag{2.15}$$
Our basic goal is to establish a general asymptotic (stochastic) equivalence result for $\hat{K}^{*+}_{mn}$ and $K^{*+}_{mn}$ (as well as $\hat{K}^{*}_{mn}$ and $K^{*}_{mn}$), and this will enable us to study the asymptotic properties of the proposed test statistics through those of $K^{*+}_{mn}$ and $K^{*}_{mn}$.
We may recall that the estimators $\hat{\boldsymbol{\theta}}_{1,m}$ and $\hat{\boldsymbol{\theta}}_{2,n}$ are assumed to be translation-invariant, so that the residuals in (2.3) are translation-invariant too. Thus, without any loss of generality, we may assume that
$$\boldsymbol{\theta}_1 = \boldsymbol{\theta}_2 = \mathbf{0}. \tag{2.16}$$
We assume further that the d.f. $F_0$ and $G_0$ are both diagonally symmetric about the origin, that is, $\mathbf{X}_i^{0}$ and $(-1)\mathbf{X}_i^{0}$ (and $\mathbf{Y}_j^{0}$ and $(-1)\mathbf{Y}_j^{0}$) both have the common d.f. $F_0$ (and $G_0$). We also assume that the d.f. $F_0$ and $G_0$ both have uniformly continuous probability density functions (p.d.f.) $f_0$ and $g_0$, respectively, almost everywhere (a.e.). With these preliminary notions and basic assumptions, we proceed to study the general properties of the proposed tests.
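The construction in (2.3)-(2.9) may be sketched in code as follows. The paper leaves the location estimators arbitrary (subject to translation-invariance and (2.1)); the coordinatewise sample median used below is only one possible choice, adopted here as an assumption. All function names are hypothetical, and the exhaustive grid search is practical only for small $N$ and $p$.

```python
import itertools
import math
import statistics


def align_abs(sample):
    """Residuals (2.3) about the coordinatewise median (an assumed choice of
    estimator), followed by coordinatewise absolute values as in (2.5)."""
    p = len(sample[0])
    centre = [statistics.median(row[j] for row in sample) for j in range(p)]
    return [tuple(abs(row[j] - centre[j]) for j in range(p)) for row in sample]


def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x, as in (2.6)."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def ks_statistics(xs, ys):
    """(one-sided, two-sided) statistics of the form (2.8)-(2.9)."""
    m, n = len(xs), len(ys)
    p = len(xs[0])
    # Both e.d.f.'s are constant between pooled coordinate values, so the grid
    # spanned by those values is enough to locate the supremum.
    axes = [sorted({pt[j] for pt in xs + ys}) for j in range(p)]
    one_sided = two_sided = 0.0
    for x in itertools.product(*axes):
        d = ecdf(xs, x) - ecdf(ys, x)
        one_sided = max(one_sided, d)
        two_sided = max(two_sided, abs(d))
    scale = math.sqrt(m * n / (m + n))
    return scale * one_sided, scale * two_sided


def aligned_ks(x_sample, y_sample):
    """Aligned statistics computed from the raw samples (lists of tuples)."""
    return ks_statistics(align_abs(x_sample), align_abs(y_sample))
```

Because each sample is centred at its own location estimate, shifting either sample by a constant vector leaves both statistics unchanged, which is the property used to justify (2.16).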
3. ASYMPTOTIC PROPERTIES OF THE PROPOSED TESTS
First, we consider the following asymptotic equivalence result which provides the access to our subsequent analysis.
Theorem 1. Under the regularity conditions of Section 2, as $N \to \infty$,
$$\sup\{ m^{1/2} | \hat{F}_m^{*}(\mathbf{x}) - F_m^{0*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \} \xrightarrow{p} 0, \tag{3.1}$$
$$\sup\{ n^{1/2} | \hat{G}_n^{*}(\mathbf{x}) - G_n^{0*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \} \xrightarrow{p} 0. \tag{3.2}$$
Proof. We shall prove the first assertion in (3.1); the proof of (3.2) follows precisely on the same line. By virtue of (2.1), we may write
$$\mathbf{t}_m = m^{1/2} ( \hat{\boldsymbol{\theta}}_{1,m} - \boldsymbol{\theta}_1 ), \quad\text{so that}\quad \| \mathbf{t}_m \| = O_p(1). \tag{3.3}$$
Moreover, note that by definition,
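$$\hat{F}_m^{*}(\mathbf{x}) = \hat{F}_m^{*}(x_1, \ldots, x_p) = \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, \hat{F}_m( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ), \tag{3.4}$$
$$F_m^{0*}(\mathbf{x}) = F_m^{0*}(x_1, \ldots, x_p) = \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, F_m^{0}( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ), \tag{3.5}$$
for every $\mathbf{x} \in E_p^{*}$, where $(-x_j-)$ denotes the limit from the left at $-x_j$. Therefore, for every $\mathbf{x} \in E_p^{*}$,
$$m^{1/2} [ \hat{F}_m^{*}(\mathbf{x}) - F_m^{0*}(\mathbf{x}) ] = \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, m^{1/2} [ \hat{F}_m( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) - F_m^{0}( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) ]. \tag{3.6}$$
Recall further that
$$\hat{F}_m(\mathbf{x}) = F_m^{0}( \mathbf{x} + m^{-1/2} \mathbf{t}_m ), \quad\text{for every } \mathbf{x} \in E^p. \tag{3.7}$$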
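The inclusion-exclusion representation in (3.4)-(3.5) can be checked numerically: the empirical d.f. of the absolute values equals a signed sum of $2^p$ evaluations of the empirical d.f. of the signed observations, with left limits at the negative arguments. The sketch below (hypothetical function names, simulated data) is merely a sanity check of this identity, not a part of the proof.

```python
import itertools
import random


def edf_abs_direct(points, x):
    """Proportion of points with |pt_j| <= x_j for every coordinate j."""
    return sum(all(abs(a) <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def edf_abs_signed_sum(points, x):
    """Right-hand side of (3.4): signed sum over i in {0, 1}^p."""
    p = len(x)
    total = 0
    for i in itertools.product((0, 1), repeat=p):
        count = 0
        for pt in points:
            # i_j = 1: argument x_j (closed inequality);
            # i_j = 0: argument (-x_j)-, i.e. strictly below -x_j.
            if all((pt[j] <= x[j]) if i[j] else (pt[j] < -x[j]) for j in range(p)):
                count += 1
        total += (-1) ** (p - sum(i)) * count
    return total / len(points)


rng = random.Random(0)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(25)]
point = (0.8, 1.1)
lhs = edf_abs_direct(sample, point)
rhs = edf_abs_signed_sum(sample, point)  # agrees with lhs
```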
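Therefore, by (3.6) and (3.7), we obtain that, writing $\mathbf{t}_m = ( t_{m1}, \ldots, t_{mp} )'$,
$$m^{1/2} [ \hat{F}_m^{*}(\mathbf{x}) - F_m^{0*}(\mathbf{x}) ]$$
$$= \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, m^{1/2} \{ F_m^{0}( i_j x_j + (1 - i_j)(-x_j-) + m^{-1/2} t_{mj}, \ 1 \le j \le p ) - F_m^{0}( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) \}$$
$$= \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, m^{1/2} \{ [ F_0( i_j x_j + (1 - i_j)(-x_j-) + m^{-1/2} t_{mj}, \ 1 \le j \le p ) - F_0( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) ]$$
$$\quad + [ F_m^{0}( i_j x_j + (1 - i_j)(-x_j-) + m^{-1/2} t_{mj}, \ 1 \le j \le p ) - F_0( i_j x_j + (1 - i_j)(-x_j-) + m^{-1/2} t_{mj}, \ 1 \le j \le p )$$
$$\quad - F_m^{0}( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) + F_0( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) ] \}. \tag{3.8}$$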
Now, by virtue of the assumed diagonal symmetry of $F_0$, $f_0(\mathbf{x}) = f_0((-1)\mathbf{x})$, for every $\mathbf{x} \in E^p$, so that using (3.3) along with the a.e. uniform continuity of $f_0$, we obtain by the Taylor expansion that
$$\sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, m^{1/2} [ F_0( i_j x_j + (1 - i_j)(-x_j-) + m^{-1/2} t_{mj}, \ 1 \le j \le p ) - F_0( i_j x_j + (1 - i_j)(-x_j-), \ 1 \le j \le p ) ]$$
$$= \sum_{k=1}^{p} t_{mk} \Big\{ \sum_{i_1=0,1} \cdots \sum_{i_p=0,1} (-1)^{p - \sum_j i_j} \, (\partial / \partial y_k) F_0( i_j x_j + (1 - i_j)(-x_j) + y_j, \ 1 \le j \le p ) \Big|_{\mathbf{y} = \mathbf{0}} \Big\} + o( \| \mathbf{t}_m \| ). \tag{3.9}$$
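We like to show that the first term on the right hand side of (3.9) is identically equal to 0. Towards this, for simplicity, we take $p = 2$; a very similar treatment holds for general $p \ge 2$. Note that for every $x_1 \ge 0$ and $x_2 \ge 0$,
$$(\partial / \partial y_1) \{ F_0( x_1 + y_1, x_2 + y_2 ) - F_0( x_1 + y_1, -x_2 + y_2 ) - F_0( -x_1 + y_1, x_2 + y_2 ) + F_0( -x_1 + y_1, -x_2 + y_2 ) \} \big|_{\mathbf{y} = \mathbf{0}}$$
$$= \int_{-\infty}^{x_2} f_0( x_1, u ) \, du - \int_{-\infty}^{-x_2} f_0( x_1, u ) \, du - \int_{-\infty}^{x_2} f_0( -x_1, u ) \, du + \int_{-\infty}^{-x_2} f_0( -x_1, u ) \, du$$
$$= \int_{-\infty}^{x_2} f_0( x_1, u ) \, du - \int_{-\infty}^{-x_2} f_0( x_1, u ) \, du - \int_{-x_2}^{\infty} f_0( x_1, u ) \, du + \int_{x_2}^{\infty} f_0( x_1, u ) \, du \quad \text{(by the diagonal symmetry of } f_0 \text{)}$$
$$= \int_{-\infty}^{\infty} f_0( x_1, u ) \, du - \int_{-\infty}^{\infty} f_0( x_1, u ) \, du = 0. \tag{3.10}$$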
The same identity holds with the partial derivative with respect to $y_2$ at $\mathbf{y} = \mathbf{0}$. Also, by (3.3), uniformly in $\mathbf{x} \in E_p^{*}$, the second term on the right hand side of (3.9) is $o_p(1)$. Hence, uniformly in $\mathbf{x} \in E_p^{*}$, (3.9) is $o_p(1)$. For the treatment of the last term on the right hand side of (3.8), we make use of the weak convergence of the sample distributional process in the multi-dimensional case. For every $m \ge 1$, we define $W_m^{0} = \{ W_m^{0}(\mathbf{x}), \ \mathbf{x} \in E^p \}$, by letting
$$W_m^{0}(\mathbf{x}) = m^{1/2} [ F_m^{0}(\mathbf{x}) - F_0(\mathbf{x}) ], \quad \mathbf{x} \in E^p. \tag{3.11}$$
Then the weak convergence of the empirical process $W_m^{0}$ to a tied-down Gaussian function follows by an appeal to the standard results on weak convergence in the multi-parameter case [see, for example, Neuhaus (1971)]. If we define the modulus of continuity $\omega_\delta(g)$ as $\sup\{ | g(\mathbf{x}) - g(\mathbf{y}) | : \| \mathbf{x} - \mathbf{y} \| < \delta \}$, $\delta > 0$, then by the results in Section 5 of Neuhaus (1971), we conclude that
$$\lim_{\delta \to 0} \limsup_{m \to \infty} P\{ \omega_\delta( W_m^{0} ) > \varepsilon \} = 0, \quad \text{for every } \varepsilon > 0. \tag{3.12}$$
[Actually, we may rewrite $W_m^{0}$ as a process on $D[0,1]^p$ (by using the marginal probability integral transformations) and then verify (3.12) by reference to Neuhaus (1971). However, by the uniform continuity of $f_0$ (as has been assumed), we may bypass these details and state the result as in (3.12).] Since $\| \mathbf{t}_m \| = O_p(1)$ (as assumed in (3.3)), $m^{-1/2} \| \mathbf{t}_m \| \xrightarrow{p} 0$ as $m \to \infty$, so that by (3.12), we conclude that the last term on the right hand side of (3.8), being bounded by $2^p \, \omega_{\delta_m}( W_m^{0} )$, where $\delta_m = m^{-1/2} \| \mathbf{t}_m \| \xrightarrow{p} 0$, converges in probability to 0 (a.e.), as $m \to \infty$. Consequently, we obtain that
$$\sup\{ m^{1/2} | \hat{F}_m^{*}(\mathbf{x}) - F_m^{0*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \} \xrightarrow{p} 0, \quad \text{as } m \to \infty. \tag{3.13}$$
This proves (3.1). Q.E.D.
Now, by virtue of Theorem 1, we conclude that under the regularity conditions of Section 2, as $N \to \infty$,
$$\sup\{ N^{1/2} | \hat{F}_m^{*}(\mathbf{x}) + G_n^{0*}(\mathbf{x}) - F_m^{0*}(\mathbf{x}) - \hat{G}_n^{*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \} \xrightarrow{p} 0. \tag{3.14}$$
By (2.8), (2.9), (2.14), (2.15) and (3.14), we conclude that under the assumed regularity conditions, as $N \to \infty$,
$$| \hat{K}^{*+}_{mn} - K^{*+}_{mn} | \xrightarrow{p} 0 \quad\text{and}\quad | \hat{K}^{*}_{mn} - K^{*}_{mn} | \xrightarrow{p} 0. \tag{3.15}$$
Note that the stochastic equivalence result in (3.15) is of quite general nature, and it holds irrespective of the null hypothesis $H_0$ in (1.2) being true or not. For the univariate case, under $H_0$, $K^{*+}_{mn}$ and $K^{*}_{mn}$ are genuinely distribution-free and the asymptotic distributions in (1.5) and (1.6) pertain to these statistics. Thus, $\hat{K}^{*+}_{mn}$ and $\hat{K}^{*}_{mn}$ are asymptotically distribution-free and their asymptotic distributions are given by (1.5) and (1.6). The situation is different for the multivariate case. $K^{*+}_{mn}$ and $K^{*}_{mn}$ are generally not distribution-free under $H_0$ (unless the $p$ coordinates of $\mathbf{X}_i^{0}$ (or $\mathbf{Y}_j^{0}$) are stochastically independent). Keeping the asymptotic equivalence result in (3.15) in mind, we may proceed as follows:
(i) Permutational distributional approach: As the Chatterjee-Sen (1964) rank-permutation principle can be adapted to generate the permutational (conditional) distribution of $K^{*+}_{mn}$ or $K^{*}_{mn}$ (if the $\mathbf{X}_i^{0*}$ and $\mathbf{Y}_j^{0*}$ were observable), we use (3.15) and consider the following approximation to this permutation distribution. Define the $\hat{\mathbf{X}}_i^{*}$ and $\hat{\mathbf{Y}}_j^{*}$ as in (2.5). Pool these $N$ aligned vectors into a combined set and then consider all possible $\binom{N}{m}$ partitionings of these into two subsets of sizes $m$ and $n$ respectively. For each such partitioning, consider the two-sample Kolmogorov-Smirnov type statistics $K^{*+}_{mn}$ and $K^{*}_{mn}$, defined as in (2.8)-(2.9). Use the (discrete) distribution over the $\binom{N}{m}$ realizations (with equal probability assigned to each of these mass points), and by the usual ordering of these $\binom{N}{m}$ realizations of $K^{*+}_{mn}$ or $K^{*}_{mn}$, the cut-off point (critical value) can be obtained as in a usual permutation test. The essential difference here is that the critical value determined from this permutation distribution is not (generally) the exact (conditional) one (i.e., it may not correspond to the exact (conditional) significance level). But these critical values are good approximations to the exact ones. This procedure, though deterministic, may become quite computationally laborious as $N$ increases, and hence, for large sample sizes, one may like to use the following weak convergence approach which is more adaptable in practice.
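For very small samples, the enumeration described in (i) can be carried out directly. The sketch below is illustrative only: it takes the already aligned vectors of (2.5) as input, treats the two-sided statistic, and uses hypothetical function names; the rule used to read off the critical value (the smallest realization exceeded with relative frequency at most $\alpha$) is one common convention and is an assumption here.

```python
import itertools
import math


def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def two_sided_ks(xs, ys):
    """Two-sided statistic of the form (2.9), via the pooled coordinate grid."""
    m, n = len(xs), len(ys)
    p = len(xs[0])
    axes = [sorted({pt[j] for pt in xs + ys}) for j in range(p)]
    sup = max(abs(ecdf(xs, x) - ecdf(ys, x)) for x in itertools.product(*axes))
    return math.sqrt(m * n / (m + n)) * sup


def exhaustive_permutation_critical_value(x_star, y_star, alpha):
    """Enumerate all C(N, m) partitions of the pooled aligned vectors."""
    pooled = list(x_star) + list(y_star)
    m, big_n = len(x_star), len(pooled)
    stats = []
    for first in itertools.combinations(range(big_n), m):
        chosen = set(first)
        xs = [pooled[i] for i in first]
        ys = [pooled[i] for i in range(big_n) if i not in chosen]
        stats.append(two_sided_ks(xs, ys))
    ordered = sorted(stats)
    # Smallest realization c with (number of realizations > c) <= alpha * total.
    for c in ordered:
        if sum(s > c for s in ordered) <= alpha * len(ordered):
            return c, len(ordered)
```

With $m = n = 4$ there are only $\binom{8}{4} = 70$ partitions, so the loop is instantaneous; the count grows very quickly with $N$, which is the computational difficulty noted in the text.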
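(ii) Weak convergence approach: As in Section 2, we denote the d.f. of $\mathbf{X}_i^{0*}$ by $F_0^{*}$, and let $F_{0j}^{*}$ be the $j$th marginal d.f., for $j = 1, \ldots, p$. Let $I_p = [0,1]^p$ be the unit $p$-cube, and for every $\mathbf{t} = ( t_1, \ldots, t_p ) \in I_p$ and $\mathbf{s} = ( s_1, \ldots, s_p ) \in I_p$, let
$$\gamma(\mathbf{s}, \mathbf{t}) = F_0^{*}( F_{01}^{*-1}( s_1 \wedge t_1 ), \ldots, F_{0p}^{*-1}( s_p \wedge t_p ) ) - F_0^{*}( F_{01}^{*-1}( s_1 ), \ldots, F_{0p}^{*-1}( s_p ) ) \, F_0^{*}( F_{01}^{*-1}( t_1 ), \ldots, F_{0p}^{*-1}( t_p ) ). \tag{3.16}$$
Consider then a Gaussian process $W^{0} = \{ W^{0}(\mathbf{t}), \ \mathbf{t} \in I_p \}$, where $E W^{0}(\mathbf{t}) = 0$ for every $\mathbf{t} \in I_p$, and
$$E W^{0}(\mathbf{s}) W^{0}(\mathbf{t}) = \gamma(\mathbf{s}, \mathbf{t}), \quad \text{for every } \mathbf{s}, \mathbf{t} \in I_p. \tag{3.17}$$
Let
$$K^{+} = \sup_{\mathbf{t} \in I_p} W^{0}(\mathbf{t}) \quad\text{and}\quad K = \sup_{\mathbf{t} \in I_p} | W^{0}(\mathbf{t}) |. \tag{3.18}$$
Then, by reference to the general weak convergence results for the multi-dimensional empirical processes [viz., Neuhaus (1971)], we conclude that under $H_0$ in (1.2),
$$K^{*+}_{mn} \xrightarrow{\mathcal{D}} K^{+} \quad\text{and}\quad K^{*}_{mn} \xrightarrow{\mathcal{D}} K, \quad \text{as } N \to \infty. \tag{3.19}$$
Thus, by virtue of (3.15) and (3.19), we may use the distribution of $K^{+}$ or $K$ to provide asymptotic approximations to the null distribution of $\hat{K}^{*+}_{mn}$ or $\hat{K}^{*}_{mn}$.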
In the univariate case, the covariance function $\gamma(s, t)$ reduces to $s \wedge t - st$, so that $W^{0}$ is a standard Brownian bridge on $[0,1]$, and hence, the distributions of $K^{+}$ and $K$ are given by (1.5)-(1.6). In the multivariate case, i.e., for $p \ge 2$, the distribution of $K^{+}$ or $K$ depends on the covariance function $\gamma(\mathbf{s}, \mathbf{t})$. For some particular form of this covariance function, Dugué (1969) has studied the characteristic functions of certain functionals of $W^{0}$, and his results may be extended to our case as well. However, for our case, the covariance function $\gamma(\mathbf{s}, \mathbf{t})$ is not specified (even under $H_0$), and hence, the Fourier coefficients appearing in the characteristic function are not known. This makes it difficult to use the inversion of the characteristic function to obtain the distribution function of $K^{+}$ or $K$ in a series form. Actually, using the basic results of Kiefer (1961), it can be shown that for every $p$ ($\ge 1$) and $\varepsilon > 0$, there exist positive constants $c_1(p, \varepsilon)$ and $c_2(p, \varepsilon)$, such that for every $t \ge 0$,
$$P\{ K^{+} > t \} < c_1(p, \varepsilon) \exp\{ -(2 - \varepsilon) t^2 \}, \tag{3.20}$$
$$P\{ K > t \} < c_2(p, \varepsilon) \exp\{ -(2 - \varepsilon) t^2 \}, \tag{3.21}$$
and this could have been used to provide suitable bounds for the critical values of $K^{+}$ and $K$. However, the constants $c_1(p, \varepsilon)$ and $c_2(p, \varepsilon)$ depend very much on the underlying covariance function $\gamma(\mathbf{s}, \mathbf{t})$ (besides depending on $p$ and $\varepsilon$), and hence, even for a given $\varepsilon$, we may not be able to have their values computed when $\gamma(\mathbf{s}, \mathbf{t})$ is not known. For this reason, we may take recourse to a modified bootstrap method to estimate the critical levels of $\hat{K}^{*+}_{mn}$ and $\hat{K}^{*}_{mn}$.
Define the aligned vectors $\hat{\mathbf{X}}_i^{*}$ and $\hat{\mathbf{Y}}_j^{*}$ as in (2.5) and pool all these $N$ vectors into a combined sample. Let $\hat{H}_N^{*}$ be the empirical d.f. of these $N$ aligned vectors. From the distribution $\hat{H}_N^{*}$, we draw $K$ independent samples, each of size $N$. We denote the observations in the $r$th sample by $\mathbf{Z}_{r1}, \ldots, \mathbf{Z}_{rN}$, for $r = 1, \ldots, K$. Note that the $\mathbf{Z}_{rk}$ are drawn with replacement from the finite population of the $N$ units (related to the vectors $\hat{\mathbf{X}}_1^{*}, \ldots, \hat{\mathbf{X}}_m^{*}, \hat{\mathbf{Y}}_1^{*}, \ldots, \hat{\mathbf{Y}}_n^{*}$). For the $r$th sample, we partition the observations into two subsets $( \mathbf{Z}_{r1}, \ldots, \mathbf{Z}_{rm} )$ and $( \mathbf{Z}_{r,m+1}, \ldots, \mathbf{Z}_{rN} )$, and then compute the statistics $K^{*+}_{mn}$ and $K^{*}_{mn}$ as in Section 2 [see (2.14)-(2.15)]; we denote these statistics by $T_r^{+}$ and $T_r$, respectively, for $r = 1, \ldots, K$. Let $T_{K,1}^{+} \le \cdots \le T_{K,K}^{+}$ (and $T_{K,1} \le \cdots \le T_{K,K}$) be the ordered values of the $T_r^{+}$ (and $T_r$) from the $K$ pseudo-samples, and we choose $K$ (large) in such a way that corresponding to the chosen level of significance $\alpha$ ($0 < \alpha < 1$), we have
$$M = K( 1 - \alpha ) \quad \text{a positive integer.} \tag{3.22}$$
Then, our estimates of the $\alpha$-level critical values of $\hat{K}^{*+}_{mn}$ and $\hat{K}^{*}_{mn}$ are
$$T_{K,M}^{+} \quad\text{and}\quad T_{K,M}, \quad \text{respectively.} \tag{3.23}$$
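The modified bootstrap of (3.22)-(3.23) may be sketched as follows. The inputs are assumed to be the aligned vectors of (2.5); the function names, the default number of pseudo-samples and the use of a seeded generator are illustrative assumptions. Rounding $K(1 - \alpha)$ merely guards against floating-point error, since the text chooses $K$ so that this product is an integer.

```python
import itertools
import math
import random


def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def ks_statistics(xs, ys):
    """(one-sided, two-sided) statistics, computed on the pooled coordinate grid."""
    m, n = len(xs), len(ys)
    p = len(xs[0])
    axes = [sorted({pt[j] for pt in xs + ys}) for j in range(p)]
    one_sided = two_sided = 0.0
    for x in itertools.product(*axes):
        d = ecdf(xs, x) - ecdf(ys, x)
        one_sided = max(one_sided, d)
        two_sided = max(two_sided, abs(d))
    scale = math.sqrt(m * n / (m + n))
    return scale * one_sided, scale * two_sided


def bootstrap_critical_values(x_star, y_star, alpha=0.05, n_boot=200, seed=0):
    """Return (T+_{K,M}, T_{K,M}) of (3.23) with K = n_boot pseudo-samples."""
    rng = random.Random(seed)
    pooled = list(x_star) + list(y_star)
    m, big_n = len(x_star), len(pooled)
    t_plus, t_two = [], []
    for _ in range(n_boot):
        z = rng.choices(pooled, k=big_n)  # N draws with replacement from the pooled e.d.f.
        one_sided, two_sided = ks_statistics(z[:m], z[m:])
        t_plus.append(one_sided)
        t_two.append(two_sided)
    t_plus.sort()
    t_two.sort()
    big_m = min(n_boot, max(1, round(n_boot * (1 - alpha))))  # M of (3.22)
    return t_plus[big_m - 1], t_two[big_m - 1]
```

The observed $\hat{K}^{*+}_{mn}$ or $\hat{K}^{*}_{mn}$ would then be compared with the corresponding returned value.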
Note that the above procedure is conceptually very similar to the permutational approach outlined in (i). The basic difference is that here we are sampling with replacement, while in the earlier case, it was without replacement; but both are based on the aligned observations ($\hat{\mathbf{X}}_i^{*}$ and $\hat{\mathbf{Y}}_j^{*}$). While in the earlier case we do not have to choose $K$ (which is rather arbitrary), we have in all $\binom{N}{m}$ possible partitionings, and as $N$ increases this may become prohibitively large. For example, when $m = n = 25$, $\binom{50}{25}$ is an enormously large number, while for the modified bootstrap procedure, we may choose $K$ as 500 or more to have good estimators of the critical values. Actually, with respect to the permutational distributional approach too, one may choose a subset of $K$ permutations (at random) from the set of $\binom{N}{m}$ possible ones, and based on this subset, one may consider the estimates of the critical values. These will have essentially the same properties, but sampling without replacement may make the variability even somewhat smaller. On this ground, we would recommend the use of the modified permutational approach where a subset of $K$ (random) permutations is used in the estimation of the critical levels.
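A sketch of the recommended modified permutational approach is given below, together with the count of partitions for $m = n = 25$. The function names and defaults are hypothetical; the $K$ random permutations are drawn independently here, so repeats are possible in principle (an assumption of this sketch, though with $\binom{50}{25}$ partitions a repeat is extremely unlikely).

```python
import itertools
import math
import random

# Number of partitions that full enumeration would require when m = n = 25.
n_partitions = math.comb(50, 25)


def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def two_sided_ks(xs, ys):
    """Two-sided statistic of the form (2.9), via the pooled coordinate grid."""
    m, n = len(xs), len(ys)
    p = len(xs[0])
    axes = [sorted({pt[j] for pt in xs + ys}) for j in range(p)]
    sup = max(abs(ecdf(xs, x) - ecdf(ys, x)) for x in itertools.product(*axes))
    return math.sqrt(m * n / (m + n)) * sup


def random_permutation_critical_value(x_star, y_star, alpha=0.05, n_perm=500, seed=0):
    """Estimate the critical value from n_perm random partitions of the pooled set."""
    rng = random.Random(seed)
    pooled = list(x_star) + list(y_star)
    m = len(x_star)
    stats = []
    for _ in range(n_perm):
        shuffled = rng.sample(pooled, len(pooled))  # a random permutation (no replacement)
        stats.append(two_sided_ks(shuffled[:m], shuffled[m:]))
    stats.sort()
    big_m = min(n_perm, max(1, round(n_perm * (1 - alpha))))
    return stats[big_m - 1]
```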
We may comment briefly on the consistency of the tests based on the statistics in (2.8)-(2.9). By virtue of (3.15), (3.19) and (3.20)-(3.21), the tests are consistent against the same class of alternatives for which the tests based on $K^{*+}_{mn}$ and $K^{*}_{mn}$ are consistent. Note that if $F_0^{*}$ and $G_0^{*}$ stand respectively for the d.f. of $\mathbf{X}_i^{0*}$ and $\mathbf{Y}_j^{0*}$, i.e., they are the population counterparts of $F_m^{0*}$ and $G_n^{0*}$ defined in (2.13), then the test based on $K^{*+}_{mn}$ is consistent against the class of alternatives that $\sup\{ F_0^{*}(\mathbf{x}) - G_0^{*}(\mathbf{x}) : \mathbf{x} \in E_p^{*} \} > 0$, and the test based on $K^{*}_{mn}$ is consistent against a larger class for which $\sup\{ | F_0^{*}(\mathbf{x}) - G_0^{*}(\mathbf{x}) | : \mathbf{x} \in E_p^{*} \} > 0$. Note that ideally we would have expected our tests to be consistent against the class of alternatives for which $\sup\{ F_0(\mathbf{x}) - G_0(\mathbf{x}) : \mathbf{x} \in E^p \} > 0$ and $\sup\{ | F_0(\mathbf{x}) - G_0(\mathbf{x}) | : \mathbf{x} \in E^p \} > 0$. It is easy to verify that these two sets of classes are not necessarily the same. In the univariate case, $F_0^{*}(x) = 2F_0(x) - 1$, $\forall \, x \ge 0$, so that these two classes are the same. However, in the bivariate or general multivariate case, they need not be the same. As an example, consider the case of $p = 2$. Let $\mathbf{X}^{0}$ have the diagonally symmetric d.f. $F_0$. Also, let $\mathbf{Y}^{0}$ have the diagonally symmetric d.f. $G_0$. We denote $\mathbf{Y}^{0} = ( Y_1^{0}, Y_2^{0} )'$ and assume that $( Y_1^{0}, -Y_2^{0} )'$ has the same d.f. $F_0$. Then, in general, $F_0$ and $G_0$ are not the same (the exception being when $Y_1^{0}$ and $Y_2^{0}$ are independent). On the other hand, if we define $\mathbf{X}^{0*}$ and $\mathbf{Y}^{0*}$ as before, then, for them, the corresponding d.f. $F_0^{*}$ and $G_0^{*}$ will be the same.
This apparent difference is mainly due to the fact that in the multivariate case, the symmetry of the marginal distributions fails to ensure the same for the joint distributions, and hence, using particular constructions for these joint distributions, one may obtain the same joint distributions for the absolute values of the coordinates, which may not have unique inversions. In view of this fact, we have to keep in mind the class of alternatives for which the proposed tests are consistent. The pathological example cited above should not be over-emphasized; in practice, this works out well.
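The bivariate example above can be mimicked with a small deterministic data set. In the sketch below, the second sample is the mirror image of the first in the second coordinate; this is a finite-sample stand-in for the population construction (the data and function names are hypothetical, and the locations are taken as known and equal to zero, so no alignment is needed). The statistic on the signed vectors is clearly positive, while the statistic on the absolute values vanishes.

```python
import itertools
import math


def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def two_sided_ks(xs, ys):
    """Two-sided Kolmogorov-Smirnov type statistic on the pooled coordinate grid."""
    m, n = len(xs), len(ys)
    p = len(xs[0])
    axes = [sorted({pt[j] for pt in xs + ys}) for j in range(p)]
    sup = max(abs(ecdf(xs, x) - ecdf(ys, x)) for x in itertools.product(*axes))
    return math.sqrt(m * n / (m + n)) * sup


def absolute(sample):
    """Coordinatewise absolute values, as in (2.12)."""
    return [tuple(abs(v) for v in pt) for pt in sample]


# A point set symmetric about the origin, with positively associated coordinates.
x0 = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0), (3.0, 2.5), (-3.0, -2.5)]
# Mirror image in the second coordinate: negatively associated, still symmetric.
y0 = [(a, -b) for a, b in x0]

raw = two_sided_ks(x0, y0)                         # strictly positive
folded = two_sided_ks(absolute(x0), absolute(y0))  # exactly zero
```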
For local alternatives, the weak convergence of the two empirical distribution processes can be employed to obtain the asymptotic distributions of $\hat{K}^{*+}_{mn}$ and $\hat{K}^{*}_{mn}$ in terms of the boundary crossing probabilities of some drifted Gaussian process (in the multi-parameter case), and local asymptotic efficiency results may then be obtained as in Hájek and Šidák (1967, p. 272).
4. SOME GENERAL REMARKS
The primary reason for working with the absolute values of the variables in (2.5) is to eliminate the effect of the location estimators (possible under the assumption of diagonal symmetry of the underlying $F_0$ and $G_0$). If we would have worked with the residuals in (2.3), even under the assumed diagonal symmetry of $F_0$ and $G_0$, the equivalence result in (3.1) and (3.2) may not hold when $\hat{F}_m^{*}$ and $\hat{G}_n^{*}$ (and $F_m^{0*}$ and $G_n^{0*}$) are replaced by $\hat{F}_m$ and $\hat{G}_n$ (and $F_m^{0}$ and $G_n^{0}$), respectively. The weak convergence of the empirical distributional processes when some parameters are estimated has been studied by a host of workers [viz., Durbin (1973)], and this may be employed to find out the asymptotic distribution of the Kolmogorov-Smirnov type statistics based on the residuals in (2.3). However, this involves a multidimensional Gaussian process whose covariance function depends on some other functionals of the density functions $f_0$ and $g_0$, and such tests are thereby more difficult to adopt in practice (even when $f_0$ and $g_0$ are diagonally symmetric). The proposed tests are operationally simpler and share good asymptotic properties too.
In the multivariate nonparametric problems, often, dimension-reduction is employed, and on the reduced data, some simpler tests are used. For example, Anderson (1966) considered some 'cutting function' (a scalar valued function of the vector $\mathbf{X}$ or $\mathbf{Y}$), by which one can transform the $p$-variate data to the univariate case, and then use the classical univariate nonparametric tests to test for the multivariate situation. We may also refer to Friedman and Rafsky (1979) where this technique has been employed at a more sophisticated level. In either case, these authors considered the basic hypothesis of equality of the two multivariate distributions, and the choice of the cutting function has been justified on some natural grounds. In general, apart from the arbitrariness of this choice of cutting functions, there is also a loss of efficiency due to this dimension-reduction. If one is interested in testing for a hypothesis similar to (1.2), one may use a cutting function, estimate its location in a convenient way, work with the residuals, and under an assumption of symmetry of the distribution of this cutting function (around its median), one can use the aligned test based on $\hat{K}^{*+}_{mn}$ or $\hat{K}^{*}_{mn}$ (defined on the residual cutting functions). For this univariate case, one may even use the asymptotic results in (1.5)-(1.6). However, in general, this will entail some loss of efficiency due to dimension-reduction, and also the class of alternatives for which such a test will be consistent will be a subset of the class considered in Section 3.
In this paper, we have considered an aligned Kolmogorov-Smirnov type test (for both one-sided and two-sided alternatives). It is also possible to consider some related test statistics. For example, we may define the aligned empirical d.f.'s $\hat{F}_m^{*}$ and $\hat{G}_n^{*}$ as in (2.6), let $\hat{H}_N^{*} = N^{-1} \{ m\hat{F}_m^{*} + n\hat{G}_n^{*} \}$, and consider an aligned Cramér-von Mises type test statistic
$$\hat{C}_N = \frac{mn}{N} \int_{E_p^{*}} [ \hat{F}_m^{*}(\mathbf{x}) - \hat{G}_n^{*}(\mathbf{x}) ]^2 \, d\hat{H}_N^{*}(\mathbf{x}). \tag{4.1}$$
Again, by virtue of Theorem 1, we may establish the asymptotic equivalence of the statistic in (4.1) and the parallel version based on the d.f.'s $F_m^{0*}$, $G_n^{0*}$ and $H_N^{0*}$ ( $= N^{-1} \{ mF_m^{0*} + nG_n^{0*} \}$ ). For the distribution theory, we may proceed as in Section 3. One advantage of the statistic $\hat{C}_N$ in (4.1) is that its permutational moments can be computed with relatively more ease, though its asymptotic distribution is no simpler than that of $K^{*+}_{mn}$ or $K^{*}_{mn}$. In a different context [viz., Majumdar and Sen (1978)], the study of the local (approximate) Bahadur-efficiency of Kolmogorov-Smirnov and Cramér-von Mises tests reveals that generally the Kolmogorov-Smirnov test may perform better than the Cramér-von Mises test. A very similar conclusion holds here.
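Since $\hat{H}_N^{*}$ places mass $1/N$ on each pooled aligned vector, the integral in (4.1) reduces to an average over the pooled sample. The following sketch assumes that the normalizing factor in (4.1) is $mn/N$ and that the inputs are the aligned vectors of (2.5); the function names are hypothetical.

```python
def ecdf(points, x):
    """Proportion of points that are coordinatewise <= x."""
    return sum(all(a <= b for a, b in zip(pt, x)) for pt in points) / len(points)


def aligned_cvm(x_star, y_star):
    """Cramer-von Mises type statistic of the form (4.1), with factor mn/N assumed."""
    m, n = len(x_star), len(y_star)
    big_n = m + n
    pooled = list(x_star) + list(y_star)
    # Integration with respect to the pooled e.d.f. is an average over pooled points.
    total = sum((ecdf(x_star, z) - ecdf(y_star, z)) ** 2 for z in pooled)
    return (m * n / big_n) * total / big_n
```

Unlike the supremum statistics, no grid search is needed here, which is one practical attraction of (4.1) when $p$ is not small.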
REFERENCES
[1] T.W. Anderson, Some nonparametric procedures based on statistically equivalent blocks, Multivariate Analysis (ed. P.R. Krishnaiah), pp. 5-27 (1966).
[2] P.J. Bickel, A distribution-free version of the Smirnov two-sample test in the p-variate case, Ann. Math. Statist. 40, 1-23 (1969).
[3] S.K. Chatterjee - P.K. Sen, Nonparametric tests for the bivariate two-sample location problem, Calcutta Statist. Assoc. Bull. 13, 18-58 (1964).
[4] D. Dugué, Characteristic functions of random variables connected with Brownian motion and of the von Mises' multidimensional $\omega_n^2$, Multivariate Analysis, II (ed. P.R. Krishnaiah), pp. 289-301 (1969).
[5] J. Durbin, Some results for the bivariate goodness of fit problem, in Nonparametric Tech. in Statist. Infer. (ed. M.L. Puri), pp. 435-449 (1970).
[6] J. Durbin, Weak convergence of sample distribution function when parameters are estimated, Ann. Statist. 1, 279-290 (1973).
[7] J.H. Friedman - L.C. Rafsky, Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests, Ann. Statist. 7, 697-717 (1979).
[8] J. Hájek - Z. Šidák, Theory of Rank Tests, Academia, Prague (1967).
[9] J.L. Hodges, Jr., The significance probability of the Smirnov two-sample test, Arkiv for Math. 3, 469-486 (1957).
[10] J. Kiefer, On large deviations of the empiric d.f. of vector chance variables and a law of the iterated logarithm, Pacific J. Math. 11, 649-660 (1961).
[11] H. Majumdar - P.K. Sen, Nonparametric tests for multiple regression under progressive censoring, J. Multivar. Anal. 8, 73-95 (1978).
[12] G. Neuhaus, On weak convergence of stochastic processes with multidimensional time parameter, Ann. Math. Statist. 42, 1285-1295 (1971).
[13] P.K. Sen, On a Kolmogorov-Smirnov type aligned test, Statist. Probability Lett. 2, (1984).
[14] I. Vincze, On two-sample tests based on order statistics, Publication. Math. 10, 82-87.