"
RANK TESTS INVARIANT ONLY UNDER LINEAR
TRANSFORMATIONS
.by
Robert L. Obenchain
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Himeo Series No. 617
APRIL 1969
--
This research was supported in part by the National Institutes of Health,
Institute of General Medical Sciences, Grant No. GM 12868-05.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

ABSTRACT

NOTATION

CHAPTER

I. INTRODUCTION
   1.1. The Data
   1.2. Invariance Under Monotonic Transformations
   1.3. Invariance Under Linear Transformations
   1.4. Organization of the Study

II. THE MAXIMAL INVARIANT UNDER L(p)
   2.1. The Projection Matrix, E
   2.2. The Co-ordinate Representations, W
   2.3. The Squared Distance Matrix, Q²
   2.4. Two Numerical Examples (Table 2.1; Figures 2.1-2.4)
   2.5. Further Properties of the Representation
   2.6. Distinctions Between the Univariate and Multivariate Cases
   2.7. Other Models For Invariance
   2.8. The Distribution-Free Content of E

III. DISTRIBUTION-FREE TESTS FOR GENERALIZED DISPERSION
   3.1. Expressing E in Polar Co-ordinates
   3.2. The Univariate Case
      3.2.1. The Asymptotic Efficiency of the V Test
      3.2.2. An Asymptotically Nonparametric Test for Scale (Table 3.1)
   3.3. The Multivariate Case

IV. CONDITIONAL RANK TESTS ON ALL DISTANCES
   4.1. The Conditionally Nonparametric Distribution
   4.2. The Standard Form of the Difference Rank Matrix (Figure 4.1)
   4.3. Rank Tests Which Ignore Dimensionality (Figure 4.2)
      4.3.1. The Two-Sample U-statistics Approach (Table 4.1)
      4.3.2. L(F) versus A(F) (Tables 4.2, 4.3.1, 4.3.2)
      4.3.3. Comments on the General Multivariate Problem (Table 4.4; Figure 4.3)
   4.4. The Assignment of General Rank Scores
      4.4.1. Principal Co-ordinate Analysis of π(E) (Table 4.5)
      4.4.2. Proximity Analysis of π(E) (Table 4.6)
   4.5. Tests of Structure S(Π(N), a) Invariant Under L(p)

REFERENCES
ACKNOWLEDGMENTS

I gratefully acknowledge the guidance of my adviser, Professor P. K. Sen, during the progress of this research. He suggested the topic of this dissertation, and it has been an invaluable experience to attend his courses.

I wish also to thank the other members of my doctoral committee: Professor N. L. Johnson, Professor M. R. Leadbetter, Professor J. A. Pfaltzgraff, and Professor D. E. A. Quade. I thank Dr. Nariaki Sugiura for his helpful criticisms of some of my work. My graduate experience has been greatly enhanced by the creative lecturing of the members of the faculty.

For financial assistance, I acknowledge a National Aeronautics and Space Administration Fellowship and support as a research assistant in the Department of Biostatistics. The computer time needed to implement this research was provided by the Computation Center, University of North Carolina, following application to the Dean of Research Administration. This research was supported in part by the National Institutes of Health, Institute of General Medical Sciences, Grant No. GM 12868-05.

I wish to thank Professor Meyer Dwass, an excellent teacher, for stimulating my interest in statistics while I was an undergraduate at Northwestern University.
ABSTRACT

Nonparametric procedures appropriate for data measured on at least an ordinal scale utilize ranks invariant with respect to monotonic transformations. By restricting attention to data measured on at least an interval scale and to procedures invariant only with respect to translation and nonsingular linear transformation, the univariate concept of rank order is generalized to several dimensions. Univariate and multivariate, distribution-free and conditional rank tests of this type are introduced and studied.
NOTATION

$x$ is a $p \times 1$ column vector, and $x' = (x_1 \, \cdots \, x_p)$ is the corresponding $1 \times p$ row vector.

$1$ is the $p \times 1$ column vector of all unities, and $1\,1'$ is a $p \times q$ matrix of all unities.

$X = ((x_{ij}))$ is a $p \times q$ matrix with $x_{ij}$ as the element in the $i$-th row and $j$-th column.

$I_p$ is the identity matrix of order $p$.

$\otimes$ denotes the Kronecker or Right Direct Product of the matrices $A$ and $B$.

$Q(n)$, of order $n \times (n-1)$, is the negative of the semi-orthogonal matrix associated with the Helmert transformation; its element in the $i$-th row and $j$-th column is

$$q_{ij} = \begin{cases} -\dfrac{1}{\sqrt{j(j+1)}} & \text{if } i \le j, \\[1ex] \dfrac{j}{\sqrt{j(j+1)}} & \text{if } i = j+1, \\[1ex] 0 & \text{if } i > j+1. \end{cases}$$

Then

$$1'\,Q(n) = 0' \quad \text{and} \quad Q(n)\,Q(n)' = I_n - \tfrac{1}{n}\,1\,1'.$$
CHAPTER I

INTRODUCTION

1.1. The Data

Suppose $p \ge 1$ variables, $X_1, \ldots, X_p$, are to be measured on individuals from $c \ge 1$ populations, $P_1, \ldots, P_c$. A random sample of size $n_j$ is to be taken from the $j$-th population. The data to be collected can thus be formed into a $p \times N$ matrix of random variables

(1.1)  $X = [X_1 \cdots X_N]$, where $N = \sum_{j=1}^{c} n_j$ and $X_k = (X_{1k}, \ldots, X_{pk})'$,

where the columns $X_k$ are mutually independent for $1 \le k \le N$, the first $n_1$ are random vectors identically distributed as in $P_1$, and the $j$-th group of $n_j$ columns of $X$ are identically distributed as in $P_j$. Thus, if $c > 1$, only certain of the columns of $X$ need be interchangeable.
1.2. Invariance Under Monotonic Transformations

Distribution-free procedures for analyzing data are available even when the variables $(X_1, \ldots, X_p)$ are measured on a nominal (qualitative) scale. If, however, the data is measured on an ordinal or an interval scale, nonparametric procedures which use the "ranks" or order properties of the observations are appropriate. The random matrix $X$ is thus replaced by the $p \times N$ matrix

(1.2)  $R^X = [R^X_1 \cdots R^X_N]$

of discrete, dependent (within rows as well as within columns) random variables. The element in the $i$-th row and $k$-th column of $R^X$ is the "rank" of $X_{ik}$ among $X_{i1}, X_{i2}, \ldots, X_{iN}$ and is defined by

(1.3)  $R_{ik} = \tfrac{1}{2} + \sum_{m=1}^{N} c(X_{ik} - X_{im})$,

where

(1.4)  $c(u) = \begin{cases} 1 & \text{if } u > 0, \\ \tfrac{1}{2} & \text{if } u = 0, \\ 0 & \text{if } u < 0. \end{cases}$

Each row of $R^X$ is thus a permutation of the integers $1, 2, \ldots, N$ if there are no ties. Procedures of this type are invariant with respect to coordinate-wise monotonically increasing transformations of the data and are of great interest because no assumptions need be made concerning relationships between the $p$ variables being measured.

The procedure of replacing $X$ by $R^X$ can be easily justified in the univariate case ($p = 1$). Distribution-free tests can be characterized as tests invariant with respect to monotonic transformations because any continuous distribution can be transformed into the uniform distribution on $[0, 1]$ by the transformation $X_1 \to F(X_1)$, where $F$ is the cumulative distribution function of the random variable $X_1$. One then notes that $R^X$ is the maximal invariant form of $X$ under the group of all monotonically increasing transformations of the data.

The multivariate case ($p > 1$) is not so simple. Coordinate-wise monotonic transformations can make all the marginal distributions uniform, but the joint distribution will not necessarily be uniform on the unit "cube" in $p$-space. Thus multivariate distribution-free tests cannot be characterized in the above way. For example, Anderson [1], Bhapkar [4], and Mardia [28] have proposed multivariate procedures which do not utilize the rank matrix (1.2).
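As an illustration (a small sketch, not part of the original text), the rank matrix (1.2)-(1.4) can be computed row by row as follows; the function names are ours and NumPy is assumed available.

```python
import numpy as np

def c(u):
    # The score function of (1.4): 1 for u > 0, 1/2 for u = 0, 0 for u < 0.
    return 1.0 if u > 0 else 0.5 if u == 0 else 0.0

def rank_matrix(X):
    # R_ik = 1/2 + sum_m c(X_ik - X_im), the row-wise ranks of (1.3).
    p, N = X.shape
    R = np.empty((p, N))
    for i in range(p):
        for k in range(N):
            R[i, k] = 0.5 + sum(c(X[i, k] - X[i, m]) for m in range(N))
    return R

X = np.array([[-2., -3., 1., 2., 3.],
              [ 1., -1., 1., -1., 2.]])
print(rank_matrix(X))            # each row is a permutation of 1..5
print(rank_matrix(np.exp(X)))    # unchanged under a monotone transformation
```

The second call illustrates the invariance claimed in the text: a coordinate-wise increasing transformation such as the exponential leaves every row of ranks unchanged.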
1.3. Invariance Under Linear Transformations

Let us consider distribution-free procedures appropriate for data measured on an interval scale. If $p > 1$ variables are measured, let us assume that linear combinations of variables are of interest. There are certainly experimental situations in which these assumptions are not realistic, but these assumptions and more (e.g., multinormality of distribution) are made when using "standard" multivariate procedures. Let us consider procedures invariant under the "general nonsingular linear transformation group," $L(p)$, of the form

(1.5)  $X \to a\,1' + B\,X$,

where $a$ is an arbitrary $p \times 1$ vector, and $B$ is an arbitrary, nonsingular $p \times p$ matrix. That $L(p)$, with elements $(a, B)$, is a group is verified as follows. If $(a_{(1)}, B_{(1)}) \in L(p)$ and $(a_{(2)}, B_{(2)}) \in L(p)$, their "product" is defined to be $(a_{(1)} + B_{(1)} a_{(2)},\; B_{(1)} B_{(2)}) \in L(p)$. Then $(0, I_p) \in L(p)$ is the "identity" transformation, and $(-B^{-1}a,\; B^{-1}) \in L(p)$ is the "inverse" of $(a, B) \in L(p)$. Note that the matrix $B$ in (1.5) combines the rows (variables) of $X$ when $p > 1$, but that transformations of the type (1.5) do not combine the columns (individuals) of $X$.

Let us consider, first of all, procedures invariant under (1.5) when $p = 1$, the univariate case. In this special case, the vector $a$ becomes a scalar, $a$, which translates the data; the matrix $B$ becomes a scalar, $b$, which changes the scale of the data (by $|b|$) and changes the sign of the data if $b < 0$. The procedures we are considering thus form a proper subset of all univariate "two sided" distribution-free tests -- i.e., those which are invariant with respect to all monotonically increasing or decreasing transformations of the data.

Let us consider an example which will be developed later. Suppose we wish to test the null hypothesis that two univariate populations have the same distribution against the alternative that they differ only in variance. Then $c = 2$, $N = n_1 + n_2$, and we write

(1.6)  $Z_{1k} = |X_{1k} - \bar{X}_1|$ for $1 \le k \le N$,

where $N \bar{X}_1 = \sum_{k=1}^{N} X_{1k}$. Since the variable $X_1$ was measured on an interval scale, the ranks of the $Z_{1k}$ observations are meaningful. Note further that these ranks are invariant under (1.5) but not invariant under general monotonic transformations of the data $X$. Finally, note that all rankings of the $Z_{1k}$ observations are equally likely under the null hypothesis because the $Z_{1k}$'s are "interchangeable random variables" in the sense of Chernoff and Teicher [9]. In Section 3.2, we will return to this problem and display a nonparametric "V" test sensitive to difference in scale which has the same null distribution as the Mann-Whitney test for difference in location.
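A short numerical sketch (ours, under the example's assumptions) makes the invariance claim concrete: a linear transformation leaves the ranks of the $Z_{1k}$ unchanged, while a nonlinear increasing transformation generally does not.

```python
import numpy as np
rng = np.random.default_rng(0)

x = rng.normal(size=10)                    # a combined univariate sample
z = np.abs(x - x.mean())                   # the Z_1k of (1.6)

a, b = 3.0, -2.5                           # an arbitrary element of L(1)
xt = a + b * x
zt = np.abs(xt - xt.mean())                # equals |b| * z, so ranks agree
print(np.array_equal(np.argsort(z), np.argsort(zt)))   # True

xm = np.exp(x)                             # a nonlinear increasing transform
zm = np.abs(xm - xm.mean())
print(np.array_equal(np.argsort(z), np.argsort(zm)))   # typically False
```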
The implicit difference between the univariate ($p = 1$) and multivariate ($p > 1$) cases in assuming invariance under transformations (1.5) is the subject of Section 2.6. We are, however, already equipped to consider the most obvious difference. Rather than consider the group $L(p)$ defined by (1.5), we could consider invariance under the group, $L_b(p) \subset L(p)$, defined by

(1.7)  $X \to a\,1' + \mathrm{diag}(b_{11}, b_{22}, \ldots, b_{pp})\,X$, with $b_{ii} \ne 0$ for $1 \le i \le p$,

-- i.e., the nonsingular matrix of (1.5) is now restricted to be diagonal. Note that considering invariance under $L_b(p)$ rather than $L(p)$ when $p > 1$ removes the assumption that the variables can be combined linearly. This is the same type of multivariate extension of univariate techniques which is employed in forming the rank matrix (1.2).
1.4. Organization of the Study

In Chapter II, we will derive the maximal invariant form of $X$ under $L(p)$ defined by (1.5) and study its geometrical properties. It will be shown that a concept of distance between individuals is preserved here which is lost if one considers transformations more general than linear. We will see that only conditional tests can use all the information contained in the maximal invariant.

In Chapter III, we consider some simple distribution-free rank tests for detecting difference in "generalized dispersion" among $c$ populations. A population $P_1$ is said to be relatively more disperse than $P_2$ if the probability is greater than $\tfrac{1}{2}$ that a variate from $P_1$ will be "further" from a point, $\mu$, than a variate from $P_2$. If $P_1$ and $P_2$ have the same known location, $\mu$ is taken to be their common expected value if it exists. Sample measures of generalized dispersion take $\mu$ equal to $\bar{X}$ so that the respective locations need not be known, equal, or expected values. In the univariate case ($p = 1$), distance is easily measured and the proposed statistic is seen to be a rank order measure of the relative magnitude of variance or mean deviation about the mean. In the multivariate case ($p > 1$), generalized dispersion is a univariate, relative measure of variation not to be confused with "generalized variance," the determinant of the variance-covariance matrix of a single population. The two sample version ($c = 2$) of the test will be seen to be of interest because its null distribution is well known and tabulated and because it can be applied to multivariate ($p > 1$) as well as univariate ($p = 1$) data.

In Chapter IV, we consider some conditional rank tests sensitive to differences in location and/or dispersion. The assignment of general "linearly invariant rank scores" to possibly multivariate data is considered. It is shown that there is a way to construct these scores so that they preserve the dimensionality of the data and, as a result, contain all of the information in the maximal invariant in the asymptotic case where the sample size approaches infinity. Rank tests which use these scores can thus be asymptotically most powerful invariant.
CHAPTER II

THE MAXIMAL INVARIANT UNDER L(p)

2.1. The Projection Matrix, E

Write

(2.1)  $\bar{X} = \dfrac{1}{N} \sum_{k=1}^{N} X_k$

for the arithmetic mean of the $N$ random vectors to be observed in random samples from a total of $c \ge 1$ populations. Note that the transformations in $L(p)$ specified by (1.5) can be rewritten as

(2.2)  $X (I_N - \tfrac{1}{N}\,1\,1') \to B\, X (I_N - \tfrac{1}{N}\,1\,1')$,

where the $N \times N$ matrix $(I_N - \tfrac{1}{N}\,1\,1')$ is idempotent, and the $p \times p$ matrix $B$ is nonsingular -- we return to this point in Section 2.6. Now write

(2.3)  $S = X (I_N - \tfrac{1}{N}\,1\,1') X'$

for the $p \times p$ matrix of pooled sums of squares and products within and among the $c$ samples. Finally, write

(2.4)  $E = (I_N - \tfrac{1}{N}\,1\,1')\, X' S^{-} X\, (I_N - \tfrac{1}{N}\,1\,1')$,

where $S^{-}$ denotes any generalized inverse (= g-inverse) for the $p \times p$ matrix $S$. Then we have
Theorem 2.1. The statistic $E$, defined as a function of $X$ by (2.4), is maximal invariant under the group of transformations, $L(p)$, defined by (1.5) or by (2.2). $E$ is symmetric, idempotent, and uniquely determined regardless of the choice of the g-inverse involved in its definition. $E$ is the projection matrix associated with the vector space generated by the rows of $X - \bar{X}\,1'$.

Proof. Rao [37], pages 10 and 23, discusses projection in Euclidean space. That $E$ defined by (2.4) is a projection matrix and is uniquely determined are well known facts; see, for example, Rao [38], Lemma 5. Since it is obvious that $E$ is invariant under transformations (2.2) on $X$, all that remains to be shown is that $E$ is maximal. That is, if $E_{(1)}$ is calculated from $X_{(1)}$ and $E_{(2)}$ from $X_{(2)}$, we must show that $E_{(1)} = E_{(2)} = E$ implies that there exists a transformation $(a, B) \in L(p)$ which takes $X_{(1)}$ into $X_{(2)}$. Write $Y_{(i)} = X_{(i)} - \bar{X}_{(i)}\,1'$ and let $r = \operatorname{rank} E = \operatorname{rank} Y_{(1)} = \operatorname{rank} Y_{(2)}$, so that $r \le \min(p, N-1)$. Now there exist nonsingular $p \times p$ matrices $G_{(i)}$ such that

(2.5)  $G_{(i)}\, Y_{(i)} = \begin{bmatrix} Y^{0}_{(i)} \\ 0 \end{bmatrix}$, with $Y^{0}_{(i)}$ of order $r \times N$, for $i = 1, 2$;

the choice of $G_{(i)}$ should be taken to be the identity matrix when $r = p$; $G_{(i)}$ is clearly not unique when $r < p$. Now one notes that the rows of $Y^{0}_{(1)}$ and the rows of $Y^{0}_{(2)}$ are two sets of basis vectors for the vector space associated with $E$. Thus there exists a unique $r \times r$ matrix $B^{*}$ such that $Y^{0}_{(2)} = B^{*}\, Y^{0}_{(1)}$. It follows that an $(a, B) \in L(p)$ which transforms $X_{(1)}$ into $X_{(2)}$ is

(2.6)  $B = G_{(2)}^{-1} \begin{bmatrix} B^{*} & 0 \\ 0 & B^{**} \end{bmatrix} G_{(1)}$

and

(2.7)  $a = \bar{X}_{(2)} - B\,\bar{X}_{(1)}$,

where $B^{**}$ is any $(p-r) \times (p-r)$ nonsingular matrix when $r < p$. The choice of $(a, B)$ is unique if and only if $r = p$. This proves the theorem.

Let us emphasize here that we are not assuming that population expected values and variance matrices exist.
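A minimal computational sketch of (2.4) and of the invariance asserted by Theorem 2.1 follows; it is our illustration, using the Moore-Penrose pseudoinverse as one valid choice of g-inverse.

```python
import numpy as np
rng = np.random.default_rng(1)

def projection_E(X):
    # E of (2.4): C X' S^- X C with C = I - (1/N)11' and S = X C X'.
    p, N = X.shape
    C = np.eye(N) - np.ones((N, N)) / N
    S = X @ C @ X.T
    return C @ X.T @ np.linalg.pinv(S) @ X @ C   # pinv is one valid g-inverse

p, N = 2, 6
X = rng.normal(size=(p, N))
E = projection_E(X)

# Invariance under an arbitrary (a, B) in L(p):
a = rng.normal(size=(p, 1))
B = rng.normal(size=(p, p))                  # almost surely nonsingular
E2 = projection_E(a @ np.ones((1, N)) + B @ X)
print(np.allclose(E, E2))                    # True
print(np.allclose(E @ E, E), round(np.trace(E)))  # idempotent; trace = r = p
```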
2.2. The Co-ordinate Representations, W

Gower [12] has discussed characteristic root (= c-root) and vector techniques in which a geometrical interpretation is given to the information contained in $N \times N$ association matrices -- the element in the $i$-th row and $m$-th column of an association matrix such as $E$ compares the $i$-th and $m$-th individuals in the sample taking all $p$ variables measured into consideration. The result of this sort of analysis is an $r \times N$ matrix of the form

(2.8)  $W = [w_1 \cdots w_N]$,

where $w_j$, the $j$-th column of $W$, gives the co-ordinates of a point, to be plotted using $r$ orthogonal axes in Euclidean space, which represents the $j$-th individual. Remember that $r = \operatorname{rank} E$ is an integer valued random variable.

Now $E$ is idempotent. $E$ has $r$ c-roots of $+1$ and $N - r$ c-roots of $0$. Thus the work of Gower implies that the rows of $W$ should be taken to be $r$ orthonormal c-vectors of $E$ associated with c-roots of $+1$. The case $r = 0$ is trivial because we then have $X = \bar{X}\,1'$; all $N$ individuals are then represented at the same point. The case $r = 1$ is simple because the c-vector of length one corresponding to the single non-zero c-root is uniquely determined up to a multiplicative factor of $\pm 1$ which can change the sign of all its elements. If, however, $r > 1$, there is no automatically unique way to pick the $r$ orthonormal c-vectors because the $r$ non-zero c-roots are all equal.
Lemma 2.2. If

(2.9)  $W'W = E$

and

(2.10)  $W W' = I_r$,

then

(2.11)  $W\,1 = 0$.

If $W^{*}$ satisfies (2.9) and (2.10), then

(2.12)  $W^{*} = P\,W$ for some $r \times r$ orthogonal matrix $P$.

Furthermore, the sample mean of the projections of the representation on any direction in $r$-space is zero. The sample sum of squares of projections on any direction is one. The sample covariance between the projections on any two orthogonal directions is zero.

Proof. Relationships (2.9) and (2.10) imply that the rows of $W$ are orthonormal c-vectors of $E$ associated with c-roots of $+1$. Since $E\,1 = 0$ is obvious from (2.4), (2.11) follows because c-vectors associated with unequal c-roots are orthogonal. The orthogonal matrix of (2.12) is $P = W^{*} W'$. Now let $e$ be any $r \times 1$ vector such that $e'e = 1$. Then $e'W$ is the row vector of projections of the representation onto some direction; its sample mean is zero because $e'W1 = 0$ by (2.11), and its sample sum of squares is $e'WW'e = e'e = 1$ by (2.10). Finally, if $e^{*}$ is an $r \times 1$ vector such that $e^{*\prime}e^{*} = 1$, then $e'WW'e^{*} = e'e^{*}$ is zero if and only if $e$ and $e^{*}$ represent orthogonal directions. This proves the lemma.
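The following sketch (ours) extracts a representation $W$ from $E$ by c-vector analysis and checks (2.9)-(2.12) numerically.

```python
import numpy as np
rng = np.random.default_rng(2)

p, N = 2, 6
X = rng.normal(size=(p, N))
C = np.eye(N) - np.ones((N, N)) / N
E = C @ X.T @ np.linalg.pinv(X @ C @ X.T) @ X @ C

vals, vecs = np.linalg.eigh(E)               # c-roots are 0 or +1
W = vecs[:, vals > 0.5].T                    # rows = c-vectors for c-root +1
print(np.allclose(W.T @ W, E))               # (2.9)
print(np.allclose(W @ W.T, np.eye(len(W))))  # (2.10)
print(np.allclose(W.sum(axis=1), 0))         # (2.11): W 1 = 0

# (2.12): any rotation P of the rows is an equally valid representation.
P = np.array([[0., -1.], [1., 0.]])
print(np.allclose((P @ W).T @ (P @ W), E))   # True
```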
2.3. The Squared Distance Matrix, Q²

The matrix $W$ is an aid in the geometrical interpretation of $E$ because its columns give the co-ordinates of $N$ points in $r$-space. However, the non-uniqueness of $W$ implied by (2.12) suggests that the geometrical interpretation of $E$ might best be approached by analysis of the distances between points rather than by reference to a non-unique co-ordinate system. Write

(2.13)  $Q^2 = ((d^2_{\ell m}))$,

where $d^2_{\ell m}$ denotes the squared distance between the representations of the $\ell$-th and $m$-th individuals in the sample. The matrix $Q^2$ is symmetric, and its diagonal is null -- expressing the fact that each individual is of distance zero from itself. Let $p_{\ell m}$ denote the element in the $\ell$-th row and $m$-th column of $E$. Then we have

Lemma 2.3.

(2.14)  $d^2_{\ell m} = (w_\ell - w_m)'(w_\ell - w_m)$

(2.15)  $\phantom{d^2_{\ell m}} = p_{\ell\ell} + p_{mm} - p_{m\ell} - p_{\ell m}$

(2.16)  $\phantom{d^2_{\ell m}} = (X_\ell - X_m)' S^{-} (X_\ell - X_m)$,

where $S^{-}$ is as in (2.4), and

(2.17)  $0 \le d^2_{\ell m} \le 2$.

Proof. The expression (2.15) follows immediately from (2.9) and (2.14) as pointed out by Gower [12]. Expression (2.16) follows by noting that $p_{\ell m} = p_{m\ell} = (X_\ell - \bar{X})' S^{-} (X_m - \bar{X})$. Indeed, a particular choice for $S^{-}$ may not even be symmetric, but the uniqueness of $d^2_{\ell m}$ follows from that of the entire $E$ matrix. Next one notes that $I_N - E$ is symmetric, idempotent, of rank $N - r$, and orthogonal to $E$. The proof of the lemma is concluded by noting that the squared distance between the $\ell$-th and $m$-th individuals in the space orthogonal to the row space of $X - \bar{X}\,1'$ is $2 - d^2_{\ell m} \ge 0$.

We are now in a position to state and prove

Corollary 2.1'. The statistics $E$, $Q^2$, and $W$ are equivalent in the sense that they are all maximal invariants under $L(p)$.

Proof. $W$ is calculated from $E$ by c-vector analysis and is not unique unless further restrictions than those implied by (2.10) are applied; see Obenchain [33]. $E$ is determined from $W$ by (2.9). Similarly, $Q^2$ is calculated from $E$ using (2.15). Finally, one can determine $E$ from $Q^2$ by using the formula

(2.15')  $E = (I_N - \tfrac{1}{N}\,1\,1') (-\tfrac{1}{2} Q^2) (I_N - \tfrac{1}{N}\,1\,1')$,

which follows from the fact that $E\,1 = 0$.
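A numerical check of the equivalence asserted by Corollary 2.1' (our sketch): $Q^2$ is built from $E$ by (2.15), and $E$ is recovered from $Q^2$ by the double-centering formula (2.15').

```python
import numpy as np
rng = np.random.default_rng(3)

p, N = 2, 6
X = rng.normal(size=(p, N))
C = np.eye(N) - np.ones((N, N)) / N
E = C @ X.T @ np.linalg.pinv(X @ C @ X.T) @ X @ C

d = np.diag(E)
Q2 = d[:, None] + d[None, :] - 2 * E        # (2.15): p_ll + p_mm - 2 p_lm
print(np.allclose(C @ (-0.5 * Q2) @ C, E))  # (2.15'): E recovered from Q^2
print(Q2.min() >= -1e-12, Q2.max() <= 2)    # (2.17)
```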
2.4. Two Numerical Examples

The geometrical properties of $E$, $W$, and $Q^2$ will now be illustrated by a numerical example using artificial data with $p = r = 2$ and $N = 5$. The results of the calculation are presented in Table 2.1. The raw data $X$ is plotted in Figure 2.1, and the representation is displayed in Figure 2.2. Note that the linear relationship between individuals 2, 3, and 5 is preserved in the representation. Note also that, as was not obvious in the raw data, our model implies that individual 2 is evidently closer to, or more like, individual 1 than individual 5 is like individual 4.

Before proceeding to the next lemma, it should be noted that the specific measure of distance (2.16) was considered by Gower [12] in §4.2. Gower states that use of this measure of distance "reflects the attitude that the group of individuals is homogeneous," essentially because of the properties of the representation that we demonstrated in Lemma 2.2. There is, however, reason to question this assertion. First of all, one notes that the distances between points will almost surely be greater than zero and relatively different.
TABLE 2.1. Numerical Example for $p = r = 2$ and $N = 5$

$X = \begin{bmatrix} -2 & -3 & 1 & 2 & 3 \\ 1 & -1 & 1 & -1 & 2 \end{bmatrix}$,  $\bar{X} = \begin{bmatrix} 0.2 \\ 0.4 \end{bmatrix}$,  $S = \begin{bmatrix} 26.8 & 5.6 \\ 5.6 & 7.2 \end{bmatrix}$

$E = \begin{bmatrix} 0.367 & 0.134 & 0.010 & -0.460 & -0.051 \\ 0.134 & 0.471 & -0.148 & 0.000^{+} & -0.457 \\ 0.010 & -0.148 & 0.055 & -0.074 & 0.156 \\ -0.460 & 0.000^{+} & -0.074 & 0.644 & -0.111 \\ -0.051 & -0.457 & 0.156 & -0.111 & 0.463 \end{bmatrix}$

$W = \begin{bmatrix} -0.076 & -0.672 & 0.230 & -0.163 & 0.681 \\ -0.601 & -0.139 & -0.046 & 0.786 & 0.000 \end{bmatrix}$

$Q^2 = \begin{bmatrix} 0 & 0.569 & 0.401 & 1.931 & 0.933 \\ 0.569 & 0 & 0.822 & 1.114 & 1.849 \\ 0.401 & 0.822 & 0 & 0.846 & 0.205 \\ 1.931 & 1.114 & 0.846 & 0 & 1.329 \\ 0.933 & 1.849 & 0.205 & 1.329 & 0 \end{bmatrix}$
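Table 2.1 can be reproduced directly from the definitions; the following sketch (ours) computes $\bar{X}$, $S$, $E$, and $Q^2$ from the artificial data.

```python
import numpy as np

X = np.array([[-2., -3., 1., 2., 3.],
              [ 1., -1., 1., -1., 2.]])
p, N = X.shape
C = np.eye(N) - np.ones((N, N)) / N

Xbar = X.mean(axis=1)                        # (0.2, 0.4)'
S = X @ C @ X.T                              # [[26.8, 5.6], [5.6, 7.2]]
E = C @ X.T @ np.linalg.inv(S) @ X @ C       # r = p = 2, so S is invertible
d = np.diag(E)
Q2 = d[:, None] + d[None, :] - 2 * E

np.set_printoptions(precision=3, suppress=True)
print(Xbar, S, E, Q2, sep="\n")              # matches Table 2.1
```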
FIGURE 2.1. The Raw Data, $X$ ($2 \times 5$). [Scatter plot of the five individuals on the $(x_1, x_2)$ axes; not reproduced.]

FIGURE 2.2. The Representation, $W$ ($2 \times 5$). [Scatter plot of the five representing points, which lie within a circle of radius $\sqrt{4/5}$ about the origin; not reproduced.]
Suppose two characteristics are measured on a population such that the bivariate density of the characteristics has unusual equal probability density contours. These contours would be circles or ellipses for the Bivariate Normal distribution, so suppose the contours are squares or crescents or are not even closed or connected. Then data from this population and its corresponding representation would display the same behavior in probability. For example, the artificial raw data of Figure 2.3 has the canonical form shown in Figure 2.4, in which the original circles become ellipses. If actual data were to split like this into two clusters, there would be empirical evidence for arguing that the population consists of two types of individuals (e.g., men and women), so that the characteristics are distributed as a mixture of two distinct distributions. The linearly invariant canonical form can thus display considerable heterogeneity among the individuals in the sample.
FIGURE 2.3. The Raw Data, $X$ ($2 \times 15$). [Scatter plot of fifteen individuals falling in two roughly circular clusters; not reproduced.]

FIGURE 2.4. The Representation, $W$ ($2 \times 15$). [The corresponding canonical form, in which the two circular clusters appear as ellipses; not reproduced.]

2.5. Further Properties of the Representation

The following lemma is useful in the stepwise construction of projection matrices. Let $y'_\ell$ denote the $\ell$-th row of $X$, and write

(2.18)  $X^{(\ell'_k)} = \begin{bmatrix} y'_{\ell_1} \\ \vdots \\ y'_{\ell_k} \end{bmatrix}$,

a $k \times N$ matrix, where $k \le p$ and $(\ell'_k) = (\ell_1, \ldots, \ell_k)$ is an ordered subset of $k$ integers from $1, 2, \ldots, p$. Let $\pi' = (\pi_1, \ldots, \pi_p)$ denote a permutation of the elements of $(1, 2, \ldots, p)$; thus $X = X^{(1, 2, \ldots, p)}$. Write $E^{(\ell'_k)}$ for the $N \times N$ projection matrix calculated from $X^{(\ell'_k)}$ as in (2.4), and define $E^{(\varnothing)} = 0_{N \times N}$, where $\varnothing$ denotes the empty set. Then we have

Lemma 2.4.

(2.19)  $E^{(\ell'_k)} = E^{(\pi(\ell'_k))}$ for any reordering $\pi(\ell'_k)$ of the subscripts,

(2.20)  $E^{(\ell'_k)} = (I_N - \tfrac{1}{N}\,1\,1')\, X^{(\ell'_k)\prime} \left[ X^{(\ell'_k)} (I_N - \tfrac{1}{N}\,1\,1') X^{(\ell'_k)\prime} \right]^{-} X^{(\ell'_k)} (I_N - \tfrac{1}{N}\,1\,1')$,

and

(2.21)  $E^{(\ell'_k)} = E^{(\ell'_{k-1})} + \dfrac{v^{(\ell'_k)}\, v^{(\ell'_k)\prime}}{v^{(\ell'_k)\prime}\, v^{(\ell'_k)}}$ when $v^{(\ell'_k)} \ne 0$, and $E^{(\ell'_k)} = E^{(\ell'_{k-1})}$ otherwise,

where

(2.22)  $v^{(\ell'_k)} = (I_N - E^{(\ell'_{k-1})})(I_N - \tfrac{1}{N}\,1\,1')\, y_{\ell_k}$

is the component of the $\ell_k$-th row of $Y = X(I_N - \tfrac{1}{N}\,1\,1')$ which is orthogonal to the vectors in the vector space generated by the $\ell_1$-th, $\ell_2$-th, ..., and $\ell_{k-1}$-th rows of $Y$, and

(2.23)  $w^{(\ell'_k)} = v^{(\ell'_k)} / \left( v^{(\ell'_k)\prime}\, v^{(\ell'_k)} \right)^{1/2}$.

Finally, $E = E^{(\pi_1, \pi_2, \ldots, \pi_p)}$, and $w^{(\ell'_k)}$ is a c-vector of $E$ associated with a c-root of $+1$ unless $v^{(\ell'_k)} = 0$.

Proof. Relationship (2.19) follows from the uniqueness of projection matrices and their invariance under transformations (1.5) on $X^{(\ell'_k)}$. The non-unique generalized inverse, "$-$", in equation (2.20) can be replaced by the uniquely determined Moore-Penrose inverse, "$+$", discussed in Rao [38]. The formula (2.21) then follows by noting that

$\begin{bmatrix} D & e \\ e' & f \end{bmatrix}^{+} = \begin{bmatrix} D^{+} + g^{-1} D^{+} e\, e' D^{+} & -g^{-1} D^{+} e \\ -g^{-1} e' D^{+} & g^{-1} \end{bmatrix}$ if $g = f - e' D^{+} e \ne 0$ and $e'(I - D^{+}D) = 0$,

with a corresponding expression when $g = 0$. In our case $D = D'$ is the centered sums of squares and products matrix of the first $k - 1$ variables, and $g = v^{(\ell'_k)\prime}\, v^{(\ell'_k)}$, so it is clear that $E^{(\ell'_k)} = E^{(\ell'_{k-1})}$ when $g = 0$. Following (2.23), we have given the geometrical interpretation of the results of the calculation (2.21). This concludes the proof.
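A sketch of the stepwise construction, assuming the Gram-Schmidt reading of (2.21)-(2.23) above: the projector is accumulated one (centered) variable at a time, and the result agrees with the one-shot formula (2.4) for any order of the variables.

```python
import numpy as np
rng = np.random.default_rng(4)

p, N = 3, 8
X = rng.normal(size=(p, N))
C = np.eye(N) - np.ones((N, N)) / N
Y = X @ C                                 # centered rows

# Stepwise construction of E as in (2.21): add one row of Y at a time,
# each time projecting out what the earlier rows already span.
E_k = np.zeros((N, N))
for l in range(p):
    v = (np.eye(N) - E_k) @ Y[l]          # (2.22): component orthogonal so far
    if not np.allclose(v, 0):
        E_k = E_k + np.outer(v, v) / (v @ v)

E = C @ X.T @ np.linalg.pinv(X @ C @ X.T) @ X @ C
print(np.allclose(E_k, E))                # same projector, any variable order
```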
Geometrical intuition can be brought to bear upon the question of how the variables $(X_1, \ldots, X_p)$ interact by considering the stepwise construction of $W$ using Lemma 2.4 as the variables (rows of $X$) are added in different orders. The elements of $w^{(\ell'_k)}$ in (2.23) are interpreted as the coordinates of $N$ points to be plotted on the interval $[-\sqrt{(N-1)/N},\; +\sqrt{(N-1)/N}]$. The resulting one dimensional configuration displays the contribution of variable $X_{\ell_k}$ to the representation in the presence of $X_{\ell_1}, X_{\ell_2}, \ldots$, and $X_{\ell_{k-1}}$ only. Similarly, $w^{(\ell'_k)}$ and $w^{(\ell'_{k+1})}$ can be plotted on orthogonal axes in $R^2$ to represent the contribution of $X_{\ell_k}$ and $X_{\ell_{k+1}}$ together after adjusting for the presence of $X_{\ell_1}, \ldots, X_{\ell_{k-1}}$. Note that $w^{(\ell'_k)}$ is the projection of the data onto some direction, the orientation of which is determined using only the $\ell_1$-th, $\ell_2$-th, ..., and $\ell_k$-th rows of $X$. Thus $w^{(\ell'_k)}$ is clearly not invariant under transformations (1.5), although $E$ is invariant.

The question of how to modify $E$ when another individual is added to the sample is also of interest. Unfortunately, if this process changes $\bar{X}$, no simple result like that of Lemma 2.4 can be stated.
Now let

(2.24)  $d^2_0 = (d^2_{10}, \ldots, d^2_{N0})'$,

where $d^2_{m0}$ denotes the squared distance between the representation of the $m$-th point and the origin. Then we have

Lemma 2.5.

(2.25)  $d^2_{m0} = p_{mm} = (X_m - \bar{X})' S^{-} (X_m - \bar{X})$,

where

$0 \le d^2_{m0} \le \dfrac{N-1}{N}$ and $\sum_{m=1}^{N} d^2_{m0} = r \le N - 1$.

Proof. The result (2.25) follows immediately from (2.9) as before. To give another interpretation of this result, form the $p \times (N+1)$ matrix $X^{*} = (X \;\; \bar{X})$. Then

$d^2_{m0} = p_{mm} + 0 - 2(0)$

is the squared distance between the representation of the $m$-th individual and the representation of $\bar{X}$, using Lemma 2.3. Now $I_N - \tfrac{1}{N}\,1\,1' - E$ is symmetric, idempotent, of rank $N - r - 1$, and orthogonal to both $E$ and $1$. The squared distance between the $m$-th point and the origin in this subspace is thus $1 - \tfrac{1}{N} - d^2_{m0} \ge 0$. Finally,

$\sum_{m=1}^{N} d^2_{m0} = \sum_{m=1}^{N} p_{mm} = \operatorname{tr} E = \operatorname{rank} E = r$.

Lemma 2.6. $r = N - 1 \le p$ implies that

$E = I_N - \tfrac{1}{N}\,1\,1'$,

and a possible choice for a co-ordinate representation is

(2.26)  $W = Q(N)'$,

where $Q(N)$ is our notation for the semi-orthogonal matrix associated with the Helmert transformation (see NOTATION). The representation is of $N$ equally spaced points in $N - 1$ dimensions; each point is of distance $\sqrt{(N-1)/N}$ from the origin and of distance $\sqrt{2}$ from every other point.

Proof. $r \le \operatorname{rank}(I_N - \tfrac{1}{N}\,1\,1') = N - 1$ and $r = N - 1$ imply that $E$ is the projection matrix for the $N - 1$ dimensional space of $N$-vectors orthogonal to the constant vector $1$. Thus $E = I_N - \tfrac{1}{N}\,1\,1'$. On the other hand, $Q(N)'(I_N - \tfrac{1}{N}\,1\,1') = Q(N)'$ is obvious by matrix multiplication, since $1'Q(N) = 0'$. Thus we can take $W = Q(N)'$, and then $w'_\ell w_m = p_{\ell m} = \delta_{\ell m} - \tfrac{1}{N}$, where $\delta$ denotes the Kronecker delta. Therefore $d^2_{m0} = \tfrac{N-1}{N}$ for every $m$ and $d^2_{\ell m} = 2$ for every $\ell \ne m$, concluding the proof.

The above lemma shows that the maximal invariant under $L(p)$ is a constant when many variables are measured on only a few individuals. We conclude that invariance under $L(p)$ as a model for data analysis is most reasonable when $p$ is much less than $N$.
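A sketch of the Helmert-type matrix of the NOTATION section and the properties used in Lemma 2.6 (our construction of $Q(n)$ from its stated entries):

```python
import numpy as np

def helmert_Q(n):
    # Q(n) of the NOTATION section: column j (1-based) has -1/sqrt(j(j+1))
    # in rows 1..j, +j/sqrt(j(j+1)) in row j+1, and 0 below.
    Q = np.zeros((n, n - 1))
    for j in range(1, n):
        Q[:j, j - 1] = -1.0 / np.sqrt(j * (j + 1))
        Q[j, j - 1] = j / np.sqrt(j * (j + 1))
    return Q

n = 5
Q = helmert_Q(n)
print(np.allclose(np.ones(n) @ Q, 0))                        # 1'Q(n) = 0'
print(np.allclose(Q @ Q.T, np.eye(n) - np.ones((n, n)) / n)) # QQ' = I - 11'/n

# W = Q(n)' as in (2.26): every point at distance sqrt((n-1)/n) from the
# origin and sqrt(2) from every other point.
W = Q.T
G = W.T @ W
print(np.allclose(np.diag(G), (n - 1) / n))
```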
Lemma 2.7. $E$ and $Q^2$ have $N^2$ elements, and $W$ has $rN$ elements. However, each of these matrices has only $r(N - r - 1)$ functionally independent entries; the distributions associated with these statistics are singular.

Proof. If $X$ is considered to contain $pN$ variates which are not functionally dependent, then $(p - r)N$ "independent" variates are lost by restricting attention to linearly combinable variables. This loss will usually be zero because only linear redundancies can be detected, and even these may be overlooked if there is random error in recording the observed values of $X$. Next, $r$ constraints are added in achieving translation invariance and imply (2.11), while $r^2$ constraints are added in achieving nonsingular linear transformation invariance (2.2). Of these last $r^2$ constraints, $\tfrac{1}{2}r(r+1)$ imply (2.10) and $\tfrac{1}{2}r(r-1)$ are lost because coordinate representations are non-unique as expressed by (2.12). This concludes the proof.
James [15] shows how locally defined exterior differential forms can be used to derive sampling distributions associated with a Multivariate Normal population. His result on the decomposition of a random sample was reinterpreted in Obenchain [33], from which the following result is quoted for completeness.

Theorem 2.8. If

(2.27)  $X \sim N(\mu\,1'\,;\; \Sigma \otimes I_N)$,

then

(2.28)  $dF(X) = dF(\bar{X}) \cdot dF(S) \cdot dF(W)$,

where $P(r = p) = 1$ if $\Sigma$ is positive definite and $N > p$. The distribution of $X$ thus splits into mutually independent distributions; $W$ is an ancillary statistic because its known distribution does not depend upon $\mu$ or $\Sigma$. If, however, $c > 1$ Multivariate Normal populations are under consideration and at least two of them differ in location or dispersion, the distribution of $X$ does not factor as in (2.28), and the unknown distribution of $W$ depends upon the $\mu$'s and/or the $\Sigma$'s.

The linearly invariant canonical form $W$ is a representation of the data $X$ adjusted for $\bar{X}$ and $S$. While $\bar{X}$ and $S$ describe individuals in the population(s) in terms of the $p$ specific characteristics measured, $W$ describes individuals in the population(s) in terms of all possible linear combinations of the characteristics measured. If it were known that $X$ had the Multivariate Normal distribution (2.27), $\bar{X}$ and $S$ would be sufficient for $\mu$ and $\Sigma$, and $W$ would contain no information about the population as a whole. However, $W$ would still describe the specific $N$ individuals chosen for analysis. On the other hand, given some experimental data, one might assume that $X$ has the distribution (2.27) but discover that $W$ contains empirical evidence that casts doubt on this assumption (e.g., as in Figure 2.4). Finally, if $c > 1$ different populations are being sampled, although "average" location and dispersion have been removed, $W$ reveals "relative" location and dispersion of the populations.
2.6. Distinctions Between the Univariate and Multivariate Cases

When only one variable, $X_1$, is being measured on the $N$ individuals under consideration, one certainly does not have to assume anything about combinability of the values of different variates or orientation of axes in a co-ordinate representation, (2.8). In fact, since the matrix $B$ of (1.5) and (2.2) is a scalar, $b$, in this case, we can even restrict attention to $b > 0$. We have

(2.29)  $W = \dfrac{1}{s}\,(X_{11} - \bar{X}_1, \ldots, X_{1N} - \bar{X}_1)$,

where

(2.30)  $s^2 = \sum_{k=1}^{N} (X_{1k} - \bar{X}_1)^2$

is written for the $S$ matrix of (2.3) when $p = 1$, and $\bar{X}_1$ is as in (2.1). Similarly, (2.16) becomes

(2.31)  $d^2_{\ell m} = \dfrac{(X_{1\ell} - X_{1m})^2}{s^2}$.

We will be interested in ranking the (squared) distances (2.31) or the radii (2.25). It is clear that division of all these quantities by $s$ has no effect on the rankings. Thus univariate linearly invariant rank tests can be constructed from quantities such as $|X_{1\ell} - X_{1m}|$ or $|X_{1m} - \bar{X}_1|$.

Let us now examine the role played by $\bar{X}_1$, the arithmetic mean of $X_{11}, \ldots, X_{1N}$, in the theory of univariate procedures invariant under transformations

(2.32)  $X_{1k} \to a + b\,X_{1k}$, $k = 1, \ldots, N$.

Defining the median in the usual way -- i.e., as the $m$-th largest of $X_{11}, \ldots, X_{1N}$ when $N = 2m - 1$ is odd, and as the mean of the $m$-th and $(m+1)$-th largest $X_{1k}$'s when $N = 2m$ is even -- we see that the median as well as the mean enjoys the following desirable property: the location of the transformed values is the transformed value of the location under (2.32). We formalize this notion as follows for the multivariate as well as the univariate case. We say that a $p \times 1$ vector $m$, defined as a function of the data matrix $X$, is a "central measure compatible with (1.5)" if

(2.33)  $m(a\,1' + B\,X) = a + B\,m(X)$

and

(2.34)  $m(x\,1') = x$.

This last requirement implies that, if all the columns of $X$ are identical, the central measure is that common column. Although the mean and the median both satisfy (2.33) when $p = 1$, it should be obvious that the vector of variable medians does not satisfy (2.33) in the multivariate case, $p > 1$. The most general class of compatible central measures known to the present author is the class of "weighted means." For any $N \times 1$ vector $c$ such that $1'c = 1$, we note that

(2.35)  $m_c(X) = X\,c$

satisfies (2.33) and (2.34). Furthermore, one notes that these measures act within the rows of $X$ and thus do not assume combinability of variables. In particular, $\bar{X} = m_{\frac{1}{N}1}(X)$ is a compatible central measure. This is not the only acceptable multivariate measure of location, but it has the property, not shared by its competitors with $c \ne \tfrac{1}{N}\,1$, that it is a symmetric function of all the observations (columns of $X$). Finally, we note that the property (2.33) of $\bar{X}$ is essential when rewriting (1.5) as (2.2). We can state that the transformations (1.5) are equivalent to

(2.2')  $X - m\,1' \to B\,(X - m\,1')$ and $m \to a + B\,m$,

where $m$ is a function of $X$ satisfying (2.33); $m$ clearly contains no information invariant under (1.5).
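A quick numerical check (ours) of the compatibility property (2.33): any weighted mean $m_c(X) = Xc$ satisfies it, while the vector of coordinate-wise medians generally does not when $p > 1$.

```python
import numpy as np
rng = np.random.default_rng(5)

p, N = 3, 7
X = rng.normal(size=(p, N))
a = rng.normal(size=(p, 1))
B = rng.normal(size=(p, p))                 # almost surely nonsingular
XT = a + B @ X                              # a 1' + B X

# Any weighted mean m_c(X) = Xc with 1'c = 1 satisfies (2.33):
c = rng.random(N); c = c / c.sum()
print(np.allclose(XT @ c, a.ravel() + B @ (X @ c)))    # True

# The vector of coordinate-wise medians does not, in general (p > 1):
med = lambda M: np.median(M, axis=1)
print(np.allclose(med(XT), a.ravel() + B @ med(X)))    # generally False
```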
2.7. Other Models For Invariance

So far in this chapter, we have considered only the properties of the maximal invariant under $L(p)$ defined by (1.5). This model is important in standard (parametric) multivariate analysis. For example, Hotelling's $T^2$ for detecting difference in location of two populations is a statistical procedure with these invariance properties. Indeed, the further assumption that $X$ has a multivariate normal distribution with dispersion $\Sigma \otimes I_N$ must be made before the significance of $T^2$ can be judged with respect to the central F distribution, but the permutation distribution of $T^2$ can be constructed and a conditional test can be performed without making further assumptions than those stated in Sections 1.1 and 1.3. The likelihood ratio criterion of Wilks and the trace criteria of Lawley-Hotelling and Pillai are examples of test procedures invariant under $L(p)$ for comparing $c \ge 2$ populations.

Fisher's sample linear discriminant function is another statistic invariant under $L(p)$ if the individual to be identified and the samples known to be from the respective populations are simultaneously represented by $E$, $W$, or $Q^2$. The sample measures of distance and affinity between multivariate normal distributions considered by Matusita [29] are invariant under $L(p)$.

Writing $S = Q\,D\,Q'$, where $S$ is as in (2.3), $D$ is the diagonal matrix of c-roots of $S$, and $Q$ is the orthogonal matrix of c-vectors of $S$, one notes that

(2.36)  $W = D^{-\frac{1}{2}}\, Q' (X - \bar{X}\,1')$

is a possible choice for a co-ordinate representation if $r = p$. Thus the sample principal components of $X$, normed to mean zero and variance $\tfrac{1}{N}$, constitute a decomposition of the maximal invariant, $E$.

Suppose, however, that one is not willing to assume that all the variables can be combined together, but that the $p$ variables can be divided into $k$ mutually exclusive subsets and each subset is thought to conform to the invariance model (1.5). Then, it can easily be shown that a maximal invariant is achieved when the separate representations are combined to form a total representation

(2.36')  $W = \begin{bmatrix} W^{[1]} \\ \vdots \\ W^{[k]} \end{bmatrix}$,

where $p = \sum_{\ell=1}^{k} p_\ell$. When $k = 2$, the above calculations could be performed before use of Hotelling's sample canonical correlation analysis. Note that the extreme case $k = p$ ($p_\ell = 1$ for every $\ell$) is that considered in (1.7).
When $c > 1$ populations, $P_1, \ldots, P_c$, are under consideration, it may be of interest to consider invariance under the class of transformations specified by

(2.37)  $X \to [\,a^{(1)}\,1'_{n_1} \cdots a^{(c)}\,1'_{n_c}\,] + B\,X$,

where $a^{(1)}, \ldots, a^{(c)}$ are arbitrary $p \times 1$ vectors, and $B$ is an arbitrary, nonsingular $p \times p$ matrix. In this way, any possible differences in location among the $c$ populations can be ignored. Now write

(2.38)  $S = S^{(w)} + S^{(a)}$,

where $S$ is as in (2.3),

(2.39)  $S^{(w)} = \sum_{i=1}^{c} \sum_{j \in P_i} (X_j - \bar{X}^{(i)})(X_j - \bar{X}^{(i)})' = \sum_{i=1}^{c} S^{(i)}$

is the within populations matrix of sample sums of squares and products, $P_i$ denotes the individuals in the sample from the $i$-th population,

(2.40)  $\bar{X}^{(i)} = \dfrac{1}{n_i} \sum_{j \in P_i} X_j$

is the sample mean vector from the $i$-th population, and

(2.41)  $S^{(a)} = \sum_{i=1}^{c} n_i\,(\bar{X}^{(i)} - \bar{X})(\bar{X}^{(i)} - \bar{X})'$

is the among populations matrix of sample sums of squares and products. It follows that a maximal invariant under (2.37) is

(2.42)  $E^{[c]} = T\, X'\, S^{(w)-}\, X\, T$,

where

(2.43)  $T = I_N - \begin{bmatrix} \frac{1}{n_1}\,1\,1' & & 0 \\ & \frac{1}{n_2}\,1\,1' & \\ 0 & & \ddots \end{bmatrix}$

is a symmetric, idempotent matrix of rank $N - c$, and $S^{(w)} = X\,T\,X'$. Suppose individual $\ell$ is from $P_i$ and $m$ is from $P_j$; then

(2.44)  $d^{2\,[c]}_{\ell m} = \left[ (X_\ell - \bar{X}^{(i)}) - (X_m - \bar{X}^{(j)}) \right]' S^{(w)-} \left[ (X_\ell - \bar{X}^{(i)}) - (X_m - \bar{X}^{(j)}) \right]$.

Expressions (2.42) and (2.44) should be compared with (2.4) and (2.16) to see the distinction between (1.5) and (2.37).
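A sketch (ours) of $E^{[c]}$ and its invariance under (2.37): the block-centering matrix $T$ of (2.43) annihilates separate within-sample shifts, so only the common nonsingular $B$ remains, and it cancels as in Theorem 2.1.

```python
import numpy as np
rng = np.random.default_rng(6)

p, ns = 2, [4, 3, 5]                        # c = 3 samples
N = sum(ns)
X = rng.normal(size=(p, N))

# T of (2.43): I_N minus block-diagonal (1/n_j) 1 1' blocks.
T = np.eye(N)
start = 0
for n in ns:
    T[start:start + n, start:start + n] -= np.ones((n, n)) / n
    start += n

Sw = X @ T @ X.T                            # within matrix, since T = T T'
Ec = T @ X.T @ np.linalg.pinv(Sw) @ X @ T   # (2.42)

# Apply (2.37): a separate shift a^(j) per sample, one common B.
A = np.concatenate([rng.normal(size=(p, 1)) @ np.ones((1, n)) for n in ns],
                   axis=1)
B = rng.normal(size=(p, p))
X2 = A + B @ X
Ec2 = T @ X2.T @ np.linalg.pinv(X2 @ T @ X2.T) @ X2 @ T
print(np.allclose(Ec, Ec2))                 # True
```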
At the opposite extreme from (2.37), we can consider invariance under the group of transformations

(2.45)  $X \to B\,X$

when $c \ge 1$. Thus invariance under translation is not required, and the maximal invariant becomes

(2.46)  $E^{0} = X'\,(X X')^{-}\,X$.

We note that $X X' = S + N\,\bar{X}\bar{X}'$ for $\bar{X}$ as in (2.1) and $S$ as in (2.3). Thus, when $r = \operatorname{rank} X = p$, we can write

(2.47)  $Z = S^{-\frac{1}{2}}\,\bar{X}$,

where $S^{-\frac{1}{2}}$ is any matrix (necessarily non-singular) such that $S^{-\frac{1}{2}}\, S\, S^{-\frac{1}{2}\prime} = I_p$, and we note that (2.46) becomes

(2.48)  $E^{0} = X'\, S^{-\frac{1}{2}\prime}\, (I_p + N\,Z Z')^{-1}\, S^{-\frac{1}{2}}\, X$,

whereas (2.4) can be written

(2.49)  $E = (S^{-\frac{1}{2}} X - Z\,1')'\,(S^{-\frac{1}{2}} X - Z\,1')$,

where

(2.50)  $(S^{-\frac{1}{2}} X - Z\,1')(S^{-\frac{1}{2}} X - Z\,1')' = S^{-\frac{1}{2}}\, S\, S^{-\frac{1}{2}\prime} = I_p$.

Thus it is clear that $E^{0} = E$ if and only if $Z = 0$. In fact, a possible choice for a co-ordinate representation of $E^{0}$ is

(2.51)  $W^{0} = \begin{cases} \left( I_p - \dfrac{N}{\sqrt{1 + N Z'Z}\,\left[\,1 + \sqrt{1 + N Z'Z}\,\right]}\; Z Z' \right) S^{-\frac{1}{2}} X & \text{if } Z \ne 0, \\[2ex] S^{-\frac{1}{2}} X & \text{if } Z = 0, \end{cases}$

while such a decomposition of $E$ is

(2.52)  $W = S^{-\frac{1}{2}} X\,(I_N - \tfrac{1}{N}\,1\,1') = S^{-\frac{1}{2}} X - Z\,1'$.

We note further that the squared distance between the representations of the $\ell$-th and $m$-th individuals when considering invariance under (2.45) is

(2.53)  $d^{0\,2}_{\ell m} = d^2_{\ell m} - \dfrac{N\left[\,\bar{X}' S^{-} (X_\ell - X_m)\,\right]^2}{1 + N\,\bar{X}' S^{-} \bar{X}}$,

the first term of which is just $d^2_{\ell m}$ of (2.16). We note finally that a representation of $E^{0}$ does not generally have mean at the origin, does not generally have equal variation in all directions, and does not generally have covariance zero between orthogonal directions.
The model (2.45) for invariance may be of interest in one sample problems, $c = 1$. For example, the statistic

(2.54)  $T^2 = n_1 (n_1 - 1)\,(\bar{X} - \mu_0)'\, S^{-1}\,(\bar{X} - \mu_0)$

has been suggested by Hotelling for use in testing that the mean value of a population is the specified value, $\mu_0$. This statistic is commonly said to be invariant under (2.45) in the restricted sense that $\mu_0 \to B\,\mu_0$ must accompany $X \to B\,X$. The statistic $T^2$ is clearly not ancillary when $X$ has the Multivariate Normal distribution (2.27). On the other hand, the statistic (2.54) can also be thought to be invariant under (1.5) in the following way. As in the proof of Lemma 2.5, we form a data matrix $X^{*} = (X \;\; \mu_0)$ so that $N = n_1 + 1$. Then it can be shown that the squared distance between the representation of $\mu_0$ and the origin is $n_1 T^2 / [N(N + T^2)]$, and that the average squared distance between the representation of $\mu_0$ and the remaining $n_1$ points is $p/n_1 + T^2/(N + T^2)$, when considering invariance under (1.5). Similar calculations using $X^{*}$ and the invariance model (2.45) are actually more difficult to interpret because they involve the factor $(\bar{X} - \mu_0)' S^{-} \bar{X}$. The artificial device of forming the matrix $X^{*}$ converts a $c$ sample problem into a $c + 1$ sample problem and shows that translation invariance, in some sense, can often be thought to be part of the model. We have, however, emphasized the distinction between (2.45) and the models (1.5), (2.36'), and (2.37) so that procedures can be considered which are not translation invariant in any sense. Thus, if the data is measured on a ratio scale (interval scale with a true origin), measures such as geometric means and coefficients of variation can be used in constructing procedures invariant only under (2.45).
2.8. The Distribution-Free Content of E

We are now in a position to state and prove the basic theorem which enables us to construct distribution-free tests invariant only with respect to $L(p)$ specified by (1.5).

Theorem 2.9. Under the null hypothesis

(2.55)  $H^{(c)}: P_1 = P_2 = \cdots = P_c$,

that all $c$ populations are identical, the columns of a co-ordinate representation $W$, (2.8), (2.36'), or (2.51), are interchangeable random vectors. $H^{(c)}$ clearly holds when $c = 1$. The columns of a co-ordinate representation derived from $E^{[c]}$ by (2.42) for $c > 1$ are not interchangeable.

Proof. Under $H^{(c)}$, the columns of $X$ are independent and identically distributed random vectors. Furthermore, it is clear that the columns of $X - \bar{X}\,1'$ are random vectors with a joint distribution which is a symmetric function of the columns, $X_\ell - \bar{X}$; the columns are interchangeable random vectors with a non-random sum, Chernoff and Teicher [9], case (a). This is the point at which the proof breaks down when considering invariance under (2.37), because the columns of $X\,T$ are not interchangeable in (2.42), where $T$ is defined by (2.43). Now one examines equation (2.36) and notes that $D$ and $Q$ are derived from $S$, which is expressed by (2.3) as a symmetric function of the columns of $X - \bar{X}\,1'$; and so the columns of $W$ defined by (2.36), each having been multiplied by the same symmetric function of the columns of $X - \bar{X}\,1'$, are interchangeable. Indeed, these random vectors are interchangeable in an even more restricted sense than those of Chernoff and Teicher [9], case (a), because they have properties like (2.10), or perhaps even (2.26), as well as (2.11), as pointed out in Lemma 2.7. This concludes the proof.

Conditionally distribution-free tests invariant under $L(p)$ can now be constructed by considering the permutation distribution of the columns of $W$. Tests of this type are sometimes called randomization tests for homogeneity. Kendall and Sundrum [18] survey the subject of distribution-free methods and consider conditional tests in §24. Wald and Wolfowitz [43] give some results which make these techniques practical in large samples.
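A sketch of such a conditional (randomization) test, under Theorem 2.9's interchangeability: every assignment of $n_1$ of the $N$ individuals to the first sample is conditionally equally likely, so an exact permutation p-value can be enumerated. The two-sample split and the radius-sum statistic below are our illustrative choices, not the text's.

```python
import numpy as np
from itertools import combinations
rng = np.random.default_rng(7)

p, n1, n2 = 2, 4, 4
N = n1 + n2
X = rng.normal(size=(p, N))
C = np.eye(N) - np.ones((N, N)) / N
E = C @ X.T @ np.linalg.pinv(X @ C @ X.T) @ X @ C
radii = np.diag(E)                           # d^2_m0 of (2.25)

obs = radii[:n1].sum()                       # observed first-sample radius sum
ref = [radii[list(idx)].sum() for idx in combinations(range(N), n1)]
center = np.mean(ref)
p_val = np.mean([abs(s - center) >= abs(obs - center) for s in ref])
print(round(float(p_val), 3))                # exact conditional two-sided p
```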
Suppose, however, that we wish to construct unconditional nonparametric tests invariant under $L(p)$, or tests conditional on only some of the properties of the sample realization $W$. Since tests of this type do not use all of the information contained in the maximal invariant, $W$, they cannot be easily characterized. These procedures are of interest because they are strictly distribution-free, so that their null distribution and critical values can be easily tabulated, or because they provide protection against outliers.

In Chapter III, we shall consider tests based upon the rank order of the quantities $d^2_{m0}$ defined by (2.25), the squared distance between the origin and the representation of the $m$-th individual. In Chapter IV, we consider tests based upon the rank order of all (squared) distances, $d^2_{\ell m}$ defined by (2.16), for $\ell < m$. In the terminology of Kendall and Sundrum [18], §15, these tests should be called "distribution-free under the null hypothesis" because the content of the critical region under alternative hypotheses of interest is not distribution-free. This latter requirement is (d) of §13 of Kendall and Sundrum [18]. The so-called "Lehmann Alternatives," Lehmann [23], provide examples of alternative hypotheses under which the content of the critical region of standard rank tests is distribution-free. The group of transformations, $L(p)$, that we are considering is not sufficiently "large" to construct nonparametric alternatives using the technique of Lehmann [23], §8.
CHAPTER III

DISTRIBUTION-FREE TESTS FOR GENERALIZED DISPERSION

3.1. Expressing E in Polar Co-ordinates

In Section 2.2, we considered expressing $E$ in terms of rectangular co-ordinates, $W$. However, we saw in Lemma 2.2 that such co-ordinate representations are not unique because the $r$ orthogonal axes can be oriented arbitrarily. If, however, a polar co-ordinate system is used, the distances of the representations of the $N$ individuals from the origin, the radii, will be the same in whatever way the "angles" are measured. When $r = 1$, all angles will be $0$ or $\pi$; there will be some question as to which angle to assign to all the points on the same side of the origin, but the effect of angle is to multiply the radius by $+1$ or by $-1$ in any case. When $r = 2$, angle can be measured from an arbitrary axis in either of two arbitrary directions. In the general case, the "angles" can be thought to specify the co-ordinates of $N$ points located on the $(r-1)$ dimensional surface of a hypersphere of radius 1 in $r$ dimensions. These non-unique co-ordinates specify the projections of the representation, $W$, onto the hypersphere.

In an interesting paper, Mardia [28] has given a distribution-free test for detecting difference in location of two bivariate ($r = 2$) samples which uses only rank on angle as defined above. Mardia shows that his statistic, a monotone function of Hotelling's $T^2$ suggested by Wald and Wolfowitz [43] calculated from $N$ "scores" equally spaced on the circumference of a circle, does not depend upon the way the angles are measured and that, in fact, the angles can be measured on a plot of $X$ with origin at $\bar{X}$ as well as on a plot of $W$. The idea of using rank on angle and scores on a circle appears to have originated with Blumen [5] while considering a bivariate sign test for two matched samples, but a similar setup was considered by Hodges [13]. Both of these sign tests are invariant under $L(p)$, but angle is measured with the origin (rather than the representation of $\bar{X}$) as center, and the differences of matched individuals are taken as points. The tests of Blumen and Mardia both have asymptotic relative efficiency with respect to $T^2$ for normal shift alternatives of $\pi/4 \approx .79$. This result is somewhat surprising in that the univariate version of Mardia's test is a median type test applied at the combined sample mean. A simple calculation shows that the efficacy of such a procedure is $2 n_1 n_2 / [\pi (n_1 + n_2)]$ when the two populations are normal, differing in location; the asymptotic relative efficiency with respect to the t-test of the procedure is thus only $2/\pi \approx .64$ in this case -- the same as for the median test. Mardia [28], §15, considers the extension of his technique to the multivariate multi-sample problem, but his suggestions here are not well motivated.

The $N \times 1$ vector $d^2_0$ defined by (2.24) and (2.25) clearly contains interchangeable random variables under the null hypothesis (2.55). Thus it is possible to construct distribution-free tests using only rank on radius. These tests will be invariant under $L(p)$ when radius is measured using $E$ by (2.25). Rank on radius cannot be determined by examining a plot of $X - \bar{X}\,1'$. For example, individual 5 is closer to the origin in Figure 2.2 than individual 4, but further from $\bar{X}$ in Figure 2.1.

Since nonparametric tests sensitive to difference in location based upon rank on angle and nonparametric tests sensitive to difference in dispersion based upon rank on radius can be constructed, the question arises of how these techniques can be combined. Tests of this type must, however, be conditional because the variables rank on angle and rank on radius are dependent in general. Obvious examples of this fact are provided by the populations with unusual equal probability density contours discussed in connection with Figures 2.3 and 2.4, or more simply by asymmetrical univariate distributions. Therefore, conditional tests of this type will not be considered because the conditional tests of Chapter IV are intuitively superior.
3.2. The Univariate Case

When $r = 1$, (2.25) becomes

(3.1)  $d^2_{m0} = \dfrac{(X_{1m} - \bar{X}_1)^2}{s^2}$,

where $s^2$ is defined by (2.30) and $1 \le m \le N$. Thus, as in Section 2.6, the ranks of the radii could be obtained from the quantities

(3.2)  $d^{*}_{m0} = |X_{1m} - \bar{X}_1|$

for $1 \le m \le N$. The "radius rank," $R_{k0}$, of the $k$-th individual is thus defined to be

(3.3)  $R_{k0} = \tfrac{1}{2} + \sum_{m=1}^{N} c(d^{*}_{k0} - d^{*}_{m0})$,

where $c(u)$ is as in (1.4). Then

(3.4)  $R'_0 = (R_{10}, R_{20}, \ldots, R_{N0})$

is a random vector whose elements are a permutation of the integers $1, 2, \ldots, N$ when there are no ties. Under the null hypothesis that all $c$ univariate populations are identical, all possible realizations of $R_0$ are equally likely -- the common distribution having been assumed absolutely continuous so that ties occur with probability zero.

Let us consider, first of all, the two sample problem ($c = 2$). The null hypothesis is then

(3.5)  $H: P_1 = P_2$,

and the alternative of interest is

(3.6)  $K: P_1$ and $P_2$ differ only in dispersion.

Then, in analogy with the Wilcoxon [44]-Mann-Whitney [27] $U$ test statistic, we write

(3.7)  $V = \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{N} c(d^{*}_{j0} - d^{*}_{i0})$

and note that the null distribution of this statistic is the same as the null distribution of $U$. Thus, in particular, the null distribution of $V$ is symmetric about

(3.8)  $E(V) = \dfrac{n_1 n_2}{2}$

with variance

(3.9)  $\operatorname{Var}(V) = \dfrac{n_1 n_2 (N + 1)}{12}$.

Thus, $H$ can be tested against $K$ when $n_1 + n_2$ is small using the statistic $V$ and existing tables of the null distribution of $U$.
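A sketch (ours, under the reconstruction of (3.7) above) of the $V$ statistic, printed alongside its null mean (3.8) and variance (3.9):

```python
import numpy as np
rng = np.random.default_rng(8)

def V_stat(x, n1):
    # (3.7) as we read it: score each (first-sample, second-sample) pair
    # with c(d*_j0 - d*_i0), c as in (1.4), using the radii (3.2).
    d = np.abs(x - x.mean())
    c = lambda u: 1.0 if u > 0 else 0.5 if u == 0 else 0.0
    return sum(c(d[j] - d[i]) for i in range(n1) for j in range(n1, len(x)))

n1, n2 = 5, 6
x = rng.normal(size=n1 + n2)
v = V_stat(x, n1)
print(v, n1 * n2 / 2, n1 * n2 * (n1 + n2 + 1) / 12)  # V, E(V), Var(V) under H
```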
3.2.1. The Asymptotic Efficiency of the V Test
There is certainly no problem of establishing the fact that $(V - E(V))/\sqrt{\operatorname{Var}(V)}$ is asymptotically Normal $(0, 1)$ under the null hypothesis, $H$; see Dwass [11], Example 2, for one approach. However, in order to justify the calculation we are about to perform, we need to establish that the asymptotic distribution of $V$ is normal under alternative hypotheses in some neighborhood of the null hypothesis. We will consider a slightly more general problem in Subsection 3.2.2 and show that, as a special case, the distribution of $V$ is asymptotically normal if the distributions in the two populations are symmetric about their common mean value and have bounded densities.

We will adopt the approach of Mood [30] to calculate the asymptotic relative efficiency (A.R.E.) of the $V$ test with respect to the standard $F$ test when the populations have normal distributions. Let $X_1, \ldots, X_{n_1}$ be independent random variables with $N_1(\mu; 1)$ distribution, and $X_{n_1+1}, \ldots, X_N$ be independent random variables with $N_1(\mu; \sigma^2)$ distribution, and write $n_i = \lambda_i N$ for $i = 1, 2$. For $1 \le i \le n_1 < j \le N$, the pair

(3.10)  $Y = X_i - \bar{X}$, $Y' = X_j - \bar{X}$

has a bivariate normal distribution with

(3.11)  $\operatorname{Var}(Y) = 1 - \dfrac{2}{N} + \dfrac{n_1 + n_2 \sigma^2}{N^2}$,

(3.12)  $\operatorname{Var}(Y') = \sigma^2 - \dfrac{2\sigma^2}{N} + \dfrac{n_1 + n_2 \sigma^2}{N^2}$,

and

(3.13)  $\operatorname{Cov}(Y, Y') = -\dfrac{1 + \sigma^2}{N} + \dfrac{n_1 + n_2 \sigma^2}{N^2}$.

Now

(3.14)  $E(V) = n_1 n_2\, P[\,|Y'| > |Y|\,]$

(3.15)  $\phantom{E(V)} = n_1 n_2\, P[\,Y' + Y \text{ and } Y' - Y \text{ both} < 0 \text{ or both} > 0\,]$.

Write

(3.16)  $\begin{bmatrix} U \\ U' \end{bmatrix} = \Delta\,\Gamma \begin{bmatrix} Y \\ Y' \end{bmatrix}$,

where the orthogonal matrix

(3.17)  $\Gamma = \dfrac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$

performs a rotation through $45°$, and the diagonal matrix

(3.18)  $\Delta = \begin{bmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{bmatrix}$, $\delta_1, \delta_2 > 0$,

performs a change of scale so that $U$ and $U'$ have unit variances. Then

(3.19)  $r = \delta_1 \delta_2\, \dfrac{(N - 2)(\sigma^2 - 1)}{2N}$

is the correlation coefficient between $U$ and $U'$, and the probability in (3.15) is the probability that $U$ and $U'$ have the same sign. Thus we have that

(3.20)  $E(V) = n_1 n_2 \left[ \dfrac{1}{2} + \dfrac{1}{\pi} \arcsin r \right]$.

The efficacy of the $V$ test is therefore

(3.21)  $\left. \left( \dfrac{dE(V)}{d\sigma} \right)^2_{\sigma = 1} \right/ \text{Null } \operatorname{Var}(V) = \dfrac{12\, n_1 n_2\, (N - 2)}{\pi^2\, N\, (N + 1)}$,

and

(3.22)  A.R.E.$(V, F) = \dfrac{6}{\pi^2} \approx .608$

when the populations have normal distributions with common mean. Other efficiency results will be presented in Table 3.1; the technique of calculation is different from the above and is based upon the results of the next subsection.

Under the same conditions, the A.R.E. of the Sukhatme [42] and Ansari-Bradley [3] tests is the same as that of the $V$ test above. This result is not surprising because all of these tests are intuitively similar, although these other tests do not assume the data to be measured on an interval scale and the $V$ test does make this assumption.

There is reason to believe that a relatively good rank test for dispersion invariant only with respect to linear transformations exists. Moses [32], §3, argues that rank tests invariant with respect to monotonic transformations cannot be very good for testing dispersion. He gives an example of a pair of distributions absolutely continuous with respect to each other and with the same dispersion in all the usual senses, and a pair of monotonic transformations which provide either distribution with the greater dispersion and yet leave the ranks unaltered. No such example can be constructed with the transformations being linear. Indeed, a "normal scores" test for dispersion, Capon [6], can be constructed which will have A.R.E. with respect to the $F$ test of 1 when the distributions are normal as above, but such a test cannot be good for all distributions by the argument of Moses [32]. The question, of course, arises of whether greater efficiency can be obtained if greater weight is assigned to the extreme ranks in the $V$ statistic. If one assigns weights $F^{-1}(R_{i0}/(N+1))$, where $F$ here denotes the distribution function of the chi-square distribution with one degree of freedom, it can be conjectured that the A.R.E. of the $V$ statistic, so modified, with respect to the $F$ statistic is equal to one. Such an argument will not be attempted here, however, because such a test would not be of use in practice without tables of small sample weights and critical values as provided by Klotz [19] for the test of Capon. More powerful rank tests for dispersion invariant only with respect to linear transformations will be considered in Chapter IV, but these tests will not even be asymptotically nonparametric, whereas $V$ type tests are distribution-free even in small samples.
3.2.2. An Asymptotically Nonparametric Test For Scale

In an important paper, Sukhatme [42] considers modifying U-statistics (Hoeffding [14], Lehmann [22] and [24], §6) by centering them at an estimate of location. We will adopt his approach and notation here and consider generalizing the hypotheses (3.5) and (3.6) to

(3.23)  $H^{*}: P_1$ and $P_2$ differ at most in location,

and

(3.24)  $K^{*}: P_1$ and $P_2$ differ in dispersion.

Write

(3.25)  $U(\alpha, \beta) = \dfrac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=n_1+1}^{N} \phi(X_i - \alpha,\; X_j - \beta)$,

where

(3.26)  $\phi(x, y) = \begin{cases} 1 & \text{if } |y| > |x|, \\ 0 & \text{otherwise}, \end{cases}$

and consider a two tailed test of hypothesis with acceptance region $t_{\gamma/2} \le U(\alpha, \beta) \le 1 - t_{\gamma/2}$ at significance level $\gamma$. Three special cases when $\beta = \alpha$ are of interest: (i) when $\alpha$ approaches plus or minus infinity, the test is equivalent to the Wilcoxon-Mann-Whitney two-tailed test for location; (ii) when $\alpha = \bar{X}$, the test is equivalent to the $V$ test we have been studying; and (iii) when $\alpha = \mu$ is the (known) common mean value of the two populations, $U(\mu, \mu)$ is a two-sample U-statistic (whereas $U(\bar{X}, \bar{X})$ is not) appropriate for testing $H$ against $K$. Similarly, for testing $H^{*}$ against $K^{*}$ we have the statistic $U(\bar{X}^{(1)}, \bar{X}^{(2)})$, where $\bar{X}^{(i)}$ is as in (2.40), which is invariant under (2.37) but not distribution-free by Theorem 2.9, and the statistic $U(\mu_1, \mu_2)$, where $\mu_i$ is the mean value in population $P_i$, which is a two-sample U-statistic and is distribution-free under $H^{*}$.
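A sketch (ours) of the centered statistic (3.25)-(3.26); the sample sizes and distributions below are illustrative assumptions.

```python
import numpy as np
rng = np.random.default_rng(9)

def U(x, y, alpha, beta):
    # (3.25) with the kernel (3.26) as reconstructed above:
    # phi(u, v) = 1 when |v| > |u| after centering, 0 otherwise.
    return float(np.mean([1.0 if abs(v - beta) > abs(u - alpha) else 0.0
                          for u in x for v in y]))

x = rng.normal(0.0, 1.0, size=40)           # sample from P1
y = rng.normal(0.0, 2.0, size=50)           # sample from P2, larger scale
print(U(x, y, x.mean(), y.mean()))          # centered at the sample means
print(U(x, y, 0.0, 0.0))                    # centered at the known means
```

Both values should be well above 1/2 here, reflecting the larger dispersion of the second population.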
Thus we have

Theorem 3.1. If the distributions in populations $P_1$ and $P_2$ are symmetrical about their respective mean values and have bounded densities, the test of $H^{*}$ against $K^{*}$ based upon $U(\bar{X}^{(1)}, \bar{X}^{(2)})$ is asymptotically distribution-free. In particular, with $L$ indicating distribution "law,"

(3.27)  $\lim_{N \to \infty} L\!\left( \sqrt{N}\,\big( U(\bar{X}^{(1)}, \bar{X}^{(2)}) - \xi^{*} \big) \right) = \lim_{N \to \infty} L\!\left( \sqrt{N}\,\big( U(\mu_1, \mu_2) - \xi^{*} \big) \right) = N_1(0, \sigma^{*2})$,

where $\xi^{*} = E\,U(\mu_1, \mu_2)$ and $\sigma^{*2}$ is the asymptotic variance of $\sqrt{N}\,U(\mu_1, \mu_2)$; and, if furthermore $\mu_1 = \mu_2 = \mu$,

(3.28)  $\lim_{N \to \infty} L\!\left( \sqrt{N}\,\big( U(\bar{X}, \bar{X}) - \xi \big) \right) = \lim_{N \to \infty} L\!\left( \sqrt{N}\,\big( U(\mu, \mu) - \xi \big) \right) = N_1(0, \sigma^2)$,

where $\xi = E\,U(\mu, \mu)$ and $\sigma^2$ is the asymptotic variance of $\sqrt{N}\,U(\mu, \mu)$.

Proof. One notes the similarity of the kernel $\phi$ given by (3.26) to the kernel of Sukhatme [42], eqn. 5.2, given by
(3.29)  $\psi(x, y) = \begin{cases} 1 & \text{if } 0 < y < x \text{ or } x < y \le 0, \\ 0 & \text{otherwise}. \end{cases}$

Instead of centering at medians as Sukhatme did, we are centering the two-sample U-statistic at the means $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$, which are one-sample U-statistics but can be thought to be two-sample U-statistics in a trivial sense by setting the kernel equal to $X_i$ or to $X_j$ in a formula like (3.25). The asymptotic joint distribution of the statistics thus follows from well known theorems. Our theorem is thus little more than a restatement of Theorem 5.1 of Sukhatme [42]. One need only follow his arguments, making a few slight changes, to solve our problem. One notes that a factor of $\binom{n}{2s-c}$ makes a mysterious appearance in Lemma 3.2 and Theorem 3.1 of Sukhatme [42], whereas Hoeffding [14] shows that there are $\binom{n}{s}\binom{s}{c}\binom{n-s}{s-c}$ terms with $c$ arguments in common. Sukhatme's results are valid, but his proof can be clarified.

We now verify a calculation which is slightly different for our problem than in Sukhatme's. Let $F$ and $G$ denote the distribution functions in populations $P_1$ and $P_2$ respectively, and write

(3.30)  $\lambda(t_1, t_2) = P\left[\,|Y - t_2| > |X - t_1|\,\right]$

(3.31)  $\phantom{\lambda(t_1, t_2)} = \int_{-\infty}^{0} [\,G(t_2 + x) + 1 - G(t_2 - x)\,]\,dF(t_1 + x) + \int_{0}^{\infty} [\,1 - G(t_2 + x) + G(t_2 - x)\,]\,dF(t_1 + x)$

(3.32)  $\phantom{\lambda(t_1, t_2)} = \int_{-\infty}^{0} [\,F(t_1 - x) - F(t_1 + x)\,]\,dG(t_2 + x) + \int_{0}^{\infty} [\,F(t_1 + x) - F(t_1 - x)\,]\,dG(t_2 + x)$.

Then (3.31) implies that

(3.33)  $\left. \dfrac{\partial \lambda(t_1, t_2)}{\partial t_2} \right|_{t_1 = t_2 = 0} = \int_{0}^{\infty} [\,g(-x) - g(x)\,][\,f(x) + f(-x)\,]\,dx$,

and (3.32) implies that

(3.34)  $\left. \dfrac{\partial \lambda(t_1, t_2)}{\partial t_1} \right|_{t_1 = t_2 = 0} = \int_{0}^{\infty} [\,f(x) - f(-x)\,][\,g(x) + g(-x)\,]\,dx$.

Finally, one notes that

(3.35)  $\left. \dfrac{\partial \lambda(t, t)}{\partial t} \right|_{t = 0} = 0$

if $F$ and $G$ are both symmetric about the origin, an important point in establishing the equality of the asymptotic distributions. This concludes the comments on the proof.
The results of the preceding theorem will now be used to calculate
the asymptotic relative efficiency of the tests based upon V or
(XJX)
u(x(l) Jx(2»
U
when ~l = ~2 and upon
for symmetric distributions differing in scale.
We assume that the expected values of the
populations exist and J without loss of generalitYJ take
We write
F(X)
~l
= ~2 = O.
for the cumulative distribution function (c.d.f.) of the
first population and assume
F(x)
=1
the c.d. f. of the second population is
- F(-x).
G(x)
FinallYJ we assume that
= F(O'x) where a
Then Theorem 3.1 implies the results summarized in Table 3.1.
-e
>
O.
The ex-
pression for efficacy is written so as to allow easy comparison with the
results of Sukhatme [4l]J (3.8) page 194 J applied to [42]J page 75 J and
47
TABLE 3.1.
The Asymptotic Relative Efficiency of
of
(x(1),x(2) )
P(!X1 '
1 - F(-x) ,
<
x)
when
~l
~2
=
and
for symmetric distributions differing in scale.
U
F(x)
u(X,X)
G(x)
F(crx)
for
a >
= F(x) - F(-x) = 2F(x) - 1
for
Asymptotic Expected Value
=
foo
°
x >
°
(2G(x) - 1)d(2F(x) - 1)
o
= 4foo F(O'x)f(x)dx - 1
o
d~
-e
Asymptotic Expected Value 10' = 1
Null Variance
=
=
2
2
xf (x)dx)
Efficacy
where
.60
A.R.E. (V,F)
=
.94
{
A.R.E. (V,M)
=
00
.80
{ .94
A.R.E.(U(O,O), M)
=
f(x) = f(-x)
for the uniform distribution on (-~,~)
for the double exponential distribution
for the double Beta II (1, 3)
for the normal distribution
for the double Beta II (1, 3)
1.06
for the double Beta II (1, 1)
48
of Ansari-Bradley [3], (37) page 1184.
The three procedures thus have
the same asymptotic efficiency for symmetric distributions differing in
scale.
This will not be the case if the distributions are not symmet-
rical, but Theorem 3.1 does not apply in this case,
is not
u(~'~), and A.R.E. 's of the V test
asymptotically equivalent to
and of the Sukhatme [42] test are unknown.
we compare the
u(X,X)
At the bottom of Table 3.1,
V test with the squared rank test for dispersion of
Mood [30], a test that is good for distributions with large tails.
example,
A.R.E.(M,F)
For
= ~ ~ .760 for normal distributions differing
2~2
in scale -- making it superior to the other tests considered in this
case.
The Double Beta II (1,8)
distribution is defined by (4.36).
We note that the study of the statistics
u(~'~)
and
U(~1'~2)
involves work with the distribution of the absolute values of deviations
-e
from the population mean.
The expected values of' these distributions
are the well known measures of dispersion called "mean deviations about
the mean."
Since we are considering rank tests, however, we can ex-
amine squares rather than absolute values -- i.e., we can change (3.26)
if
to
2
Yl
<
2
Y2 .
Thus, instead of working with the
half-normal distribution, we can consider the chi-square distribution
with one degree of freedom.
If the mean values of the populations are known, it is easy to
suggest statistics which will lead to tests which are more powerful
than tests based upon
natives.
u(~'~)
or
U(~1'~2) against specified a1ter-
The data from the i-th population is replaced by the squares
(or absolute values) of its deviations from the population mean value,
and the theory of Chernoff and Savage [8] is then applied.
The prob-
lem,of course, is to show that the corresponding statistics with
~
49
or
es tima ted by
and
asymptotically.
or
-(1)
and
X
-(2)
X
are just as good
In the null hypothesis case, the approach of Quade [36]
for showing asymptotic normality for modified statistics and general
scores is particularly appealing.
Finally let us comment on the extension of these univariate techniques to
c > 2
populations.
It is logical to perform a one-way anal-
ysis of variance on the radius ranks,
dispersion.
R '
mO
to detect differences in
The null distribution of the resulting statistic is, of
course, related to that of the Kruskal-Wallis [21] statistic,
H.
The
non-null distribution problems are handled by combining Sukhatme centering at
X
with the techniques of Andrews [2].
The work of Puri [34]
and Quade [36] can be used if general scores are assigned.
3.3.
The Multivariate Case
When
r > 1,
the expression (2.25) for
does not simplify as
to
it does in (3.1) when
r = 1.
It is not possible to measure distance
as in (3.2), so radius rank in the multivariate case is defined to be
(3.36)
where
\:0
c(u)
= l,~,
sample problem
(c
=
or
0
2)
we wish to test
(3.6) using the statistic
as in (1.4).
V of (3.7).
Then for the multivariate twoH of (3.5) against
K of
The null distribution of
V is
the same as before, but we shall see that the non-null distribution of
V is much harder to study than in the univariate case and that the
test is relatively inefficient because "generalized dispersion" is
only a univariate measure of dispersion.
50
Note in particular that the squared radii,
are computed in
X,
such a way that the data are not only "centered" at an estimator,
of the average location,
i =
(nl~l
+
p 1
of the two populations
n2~2)/N,
but are also "adjusted for variance" by an estimator,
§,
of
N
times
N f = nlI + n I ,
of the two pop2 2
l
p p
If the populations are such that expected values and variance-
the average dispersion matrix,
ulations.
covariance matrices do not exist, the large sample properties of the
test, except under the null hypothesis
H of (3.5), are unknown; better
X
estimators of location and dispersion than
called for.
and
S
~
appear to be
We were able to use the techniques of Sukhatme [42] in the
univariate non-null case to disregard the fact that
2
2
d lO ' ••. , dNO
are
not independent, but that approach will not work here because of adjustment for variance.
Thus let us suppose that the distributions under
consideration are such that
X and §/N
are consistent extimators of
It follows immediately that any
as
N
approaches infinity.
finite set of
k
of the squared radii -- say
~
and
2
2
d lO ' •.• , dkO
fixed -- converge in law (joint distribution) as
to the variables
••• , e
(X
(3.37)
-m
2
kO
(Cramer [10], p.254) where
- ~)I f- l (X - ~)/N .
-m
~
totically independent, we cannot argue that
U-statistic.
k
N approaches infinity
But, although we have that any finite number of the
sample
for
d 2 IS
mO
are
asymp-
V is asymptotically a two-
Note in particular that the above argument and
the theorem of Cramer cannot be used to show the convergence of the
order statistics of the
dIs
to the order statistic of the
els
as
approaches infinity because the order statistics may not have a proper
limiting distribution.
We can, however, state a theorem like Theorem
N
51
3.1 for the multivariate case by making the if part of the statement so
strong that the result is obvious from well known results on U-statistics.
THEOREM 3.2.
If the non-null distribution of
••• , d
that they are asymptotically independent, then
2
NO
V
is such
is asymp-
totically normal.
Of course we do not need the above theorem to show asymptotic normality
under the null hypothesis.
Suppose for simplicity that we are considering two bivariate normal populations.
c-vectors
-e
.£1
II and I 2 have the same orthonormal
Suppose that
and "£2'
and write
Then, suppose
further that
(3.38)
c' ,
=
=
~
so that
(3.39)
where
f
o
£
=
Ai = ni/N
as before.
£'
Note that (3.38) with
Y1 f Y2
and pos-
itive merely implies that the bivariate equal probability density contours of the two populations are ellipses (not circles) and are rotated
90
o
with respect to each other.
(3.40)
while
d
u
Then, if
2
ll
+
-
..E.1 = .H.2 = ..E.,
Y2
u
2
21
N(A1 Y2 + A2 y 1)
we have that
52
d
(3.41)
where
and
+
are independent
random variables.
We
say that the two populations differ in "generalized dispersion" if
0.42)
with
P1
having the relatively greater generalized dispersion if this
probability is greater than
Al = A2
=~)
~.
However, if
n
1
=
n
2
(so that
we see that (3.40) and (3.41) imply that the two popu1a-
tions do not differ in "generalized dispersion" even though they differ
On the other hand, if
-e
2
d
for
k
::f
1
and positive, then
2
= ke
and the two populations must differ in generalized
n +1,0
10
1
dispersion. Thus the multivariate v test is reasonable in situations
e
in which the alternative of interest, rather than being
is that the populations differ in that all
changed by the same multiplicative factor.
p
K of (3.6),
scales of measure have
CHAPTER IV
CONDITIONAL RANK TESTS ON ALL DISTANCES
4.1.
The Conditionally Nonparametric Distribution
Q2
The matrix
a null diagonal.
defined by (2.13) and (2.16) is symmetric and has
Thus we can focus attention on the
which are, say, above the maj or diagonal.
-e
ference rank,"
R..
1J
of the i-th and j-th individuals (for
R .. ,
1J
=
rank of
=
1:. + ~
2
d ..
1J
N
(4.1)
where
2
~,
c(u) = 1,
or
~
£.=2
a
butions in all
zero, there are
trix
(4.2)
R.
i < j.
c
>
1
(~)I
among
£.-1
k.=l
c(d
R
ij
{d
2
U
;
1
:5:
k.
i < j)
as
< £. :5: N}
2
2
ij - dk.£.)'
as in (1.4).
trix with null diagonal and
j-th column for
Then we define the "dif-
Denote by
R
the symmetric ma-
as the element in the i-th row and
Assuming absolute continuity of the distripopulations so that ties occur with probability
possible realizations of the difference rank ma-
Consider the null hypothesis
- p
c
and let the alternative of interest be
54
(4.3)
p
K(C) :
m
t.
1 ::; m
for some
p ,
m
E
Note in particular that all realizations of
even under the null hypothesis,
H(c)'
< m' ::; c •
are not equally likely
However, given realizations of
~
and
E, all N! permutations of. the col~mns of ~ are
p N
NN
p N
equally likely under H(c)' and therefore all N! permutations applied
E
simultaneously to the rows and columns of
likely under
H(c)
are conditionally equally
and are called permutationally equivalent reali-
zations.
c
On the other hand,
II
j=l
permutations of the columns of
n. !
J
and of the rows and columns of
E
within (but not among) the
ples are conditionally equally likely under
hypothesis of interest.
Thus for testing
K(c)'
sam-
the alternative
vs
H(c)
c
the
c
n. !)
NI/( II
J
j=l
permutationally equivalent realizations conditionally
equally likely under
H(c)
but not conditionally equally likely under
are of interest.
4.2.
The Standard Form of the Difference Rank Matrix,
In Section 1.1, the
N individuals in the
c
E
samples were num-
bered so that all the observations from the same population were in
contiguous columns of the data matrix,
~
-- the numbering of obser-
vations within populations being arbitrary.
Let us now consider renum-
bering the individuals so that the difference rank matrix may be put
into a standard form.
bered
k.l.
The individual numbered
i
will now be renum-
using an algorithm yet to be explained, and we define
where
i
and
j
are such that
t = k.
l.
< m
= k.J
(if
i
> j
55
we take
R
ij
to mean
Individuals
i
R.. = (N)
2
1J
if
and
and
min(R. ;
1m
m of i)
and
is less than
if
i
J
1<.. = j.
1
N respectively
min(R. ;
Jm
m of j).
Then
N
R* = (2) , and the renumbering algorithm is essentially completed
IN
if
=2
N
or 3.
However, if
a new pair of individuals
N
i
The individuals
integers
=
K[2]
i of
{i;
N
~
2q
pair of individuals
i
tively.
K[q] =
1 :::; i
:::;
4 we go to the second step and pick
j
to be renumbered
i of j
2
and
N-l
are to be chosen from the set of
and 1 :::; i :::; N}
I<.~l or 1<.-1
N
= max (Rlm ; l, m E
In general, if
~
and
respectively.
R.. -- R*2,N-l
1J
-e
I<.~l
R.. ). Finally we define
J1
j are to be renumbered 1
such that
R'I<.-l is less than R'I<.-l
1 1
J 1
q-th step will be necessary in which a new
a
and
j
and
K[2] )
are renumbered
q
and
N-q+l
respec-
The new individuals are to be chosen from the set
-1
i of 1<.1 '
{i;
N}
such that
1<.-1
N '
R..
1J
1<.-1
2 '
,,-;1
-1'
= R*q,N-q+l
... ,
1<.-1
or
q-l'
max(R.e.m;
l, m
-1
~-q+2
E
K[q])
and
and
.
is less than R'I<._l
Of course, i f N = 2t - 1 is odd, the
Ri l<.-l
J q-l
q-l
algorithm is finished by renumbering the single individual in K[t]'
giving it the number
q :::; l < m :::; N-q+l).
t.
Note that
R*q,N-q+l
From the above construction, the following should
be obvious.
LEMMA 4.1.
Two difference rank matrices are "permutationally equivalent"
if and only if they have the same standard form,
We now demonstrate the following.
E*.
56
LEMMA 4.2.
=
If the representation of the data is univariate (r
rankE
=
~,
then
for
1
~
t <m~
~
i
j ~ N, and
(4.4)
R~{)
~(~)
...-t..
R*
mj
R~N -- (N ) - 1.
Th ~s
' 1atter resu 1 t ~mp
. l'~es t h at
Furthermore, the
2
ordinary rank of the i-th individual (that defined by (1.3) when
p
= 1)
is
(4.5)
for every
=
or
i}
{N+1-k.,
for every
~
i} .
Thus knowing the difference ranks of univariate data implies know1edge of the ordinary ranks; the converse is not true.
Proof.
Remember that the individuals to be renumbered
1
to be those points represented as being furthest apart in
of the two points to be renumbered
1
and
Q2
N are
which
having been decided by use of the
arbitrary convention that it be the one having the closest neighboring
point.
Of the remaining points in the representation, the individuals
to be renumbered
closer to
1
2
than
and
N-1
N-1
must be furthest apart, and
is, etc.
with
1
~
i
~
t <m
~ j
~
t
N are such that points
the (closed) line segment connecting
2
must be
We now utilize the geometry of the
univariate case and note that the individuals renumbered
j
2
i
and
j.
i,
and
t, m, and
m lie on
Thus
2
and the first part of (4.4) follows. The final
dk.- 1 k.- 1 ~ dk.: lk.: 1
t m
~
J
part of (4.4) follows from the obvious fact that, for points on a line,
~e
if
.e.
is closer. to
i
than
m is to
j,
then
t
must be further
57
from
j
than
j
= N,
R~N ~
m is from
i.
Taking
= 1, £ = 2,
i
Rf2
we see that the convention of taking
Rf ,N-l·
see that
m
= N-l,
~ ~-l,N
and
implies that
Combining this result with the first part of (4.4), we
R~N ~
RI'm'
for all
£'
< m'
except that
(£',
m') ~ (1, N).
It should now be clear that the points are being renumbered so that
(4.5) holds.
The redundancy in (4.5) is due to the fact that sign
changes are contained in
data,
~,
L(l); one would have to return to the original
to see which order is conventional.
This concludes the
1 N
proof.
It should be obvious that there are
N x N difference rank matrices.
forms.
(~)!/N! standard forms for
Thus when
N
=
4, there are 30 standard
The realization
o 136
045
(4.6)
o
2
o
cannot occur for four points on a line because
Rf3
= 3.
R* = 4
23
is greater than
The only possible invariant representation of four points in
three dimensional space is such that each point is of distance
the other three points and
/3/4
from the origin by Lemma 2.6.
all six difference ranks are tied at the value
N
=
4.
when
r
=3
Thus
and
The geometrical configurations implied by the various realiza-
tions of
E*
are summarized in Figure 4.1.
five univariate possibilities with
occur with
"-
7/2
/2 from
r = 2
and
R~4 =
4.
E*
Case (a) corresponds to all
satisfying (4.4) but can also
The realization (4.6) is included in
Case (b), while Case (c) includes the possibilities in which the
58
FIGURE 4.1.
Realizations of
There are 30 standard forms with
R!2
E*
N= 4
R~4' R~4
R!3'
<
for
and
(a1)
5 + 3
with
R~4
= 5,
= 8 realizations
and
and
(a2)
with
Rb
5.
-e
(b)
6 realizations with
Rh = 5.
6 realizations with
R~3 = 4.
2 realizations with
Rh = 3.
8
(c)
realizations
with
and
where
R~4 < R~4
R*
23
<
R13
Rf3' R*34
or
R~4
,
=
5.
59
individual to be renumbered
2
is contained in the convex hull of the
representations of the other three individuals.
erties of
4.3.
E*
The large sample prop-
will be considered in Section 4.5.
Rank Tests Which Ignore Dimensionality
•
Let us now consider tests which utilize the difference rank matrix
E
but which make no attempt to account for the dimensionality,
the configuration.
r,
of
We will consider applying a One-Way Analysis of
Variance to the
treatment combinations corresponding to dif-
ference ranks arising from individuals within or among the
c
samples.
For this purpose, let us define
(4.7)
L:
i,mEP.
if
Rim
i
j,
J.
-e
the total within
the i-th sample,
T(i,j)
(4.8)
L:
L:
iEP.J.
mEP.
Rim
J
if
i :f j , the total between
the i-th and j-th
samples.
We could consider the more general situation in which a real valued
score,
~(Rim)'
~(i) ~ ~(i')
for
is used in place of the difference rank,
i
the two-sample case
testing
H(2)
>
i' .
(c
2)
Rim'
where
Without loss of generality, let us consider
summarized in Figure 4.2.
of (4.2) against
K(2)
Then for
of (4.3), we can use the small
sample permutation distribution of the standard variance ratio criterion utilizing the similarity or dissimilarity of
T(l,2)/n n 2 ,
1
and
T(l,l)/(~l),
T(2,2)/(~2). For alternatives other than K(2)'
different statistics will be of interest.
For example, if the
60
Setup of the Difference Rank ANOVA for
FIGURE 4.2.
R13 • •. Rln
N
= n l +n 2
Average Rank
R
2n
=R
l
c
=
2
R
l,nl+l
R1N
R
nl,nl+l
R
nl'N
l
= ..!.«N)
2
2
Sum of Squares of Ranks
= ~R2 =
(N)[(N) + 1][2(N)
222
Sum of Squares due to Mean Rank
=M=
R
nl+l,N
(N) [(N) + 1]2/ 4
2
2
Adjusted Sum of Squares
= ~R 2
-e
- M = (N)[(N)
2
2 + 1][ (~) - 1]
12
Within Sample 1
R
12
Between Samples
Within Sample 2
R
.l,nl+l
R
.n +l,n +2
l
l
~lN
R
.nl+l,N
R
:l,nl+l
R
nlN
~-l,N
T(1,1)
T(1,2)
T(2,2)
Totals
(nl)
2
n n
l 2
(n2)
2
Number of Observations
R
13
R
nl-l,n l
.-
61
alternative is that the populations differ only in location,
and
T(2,2)
ignored and
could be poo1edo
T(l,l)
On the other hand,
compared only with
T(2,2)
T(1,2)
T(l,l)
could be
if the populations can
differ in location even under the null hypothesis.
There are also appro-
priate variance ratio criteria when the hypotheses order the population
analogs of the within and between sample measures of difference.
A question of major interest concerns whether or not the large samp1e distributions of the variance ratio criteria follow the F-distribution -- and, if so, with what degrees of freedom in the numerator and
denominator and what
non~centra1ity
under different hypotheses.
This
problem will be considered in the univariate two-sample case in the
following subsection.
That discussion is followed by subsections on
estimating the asymptotic variance and the general multivariate
-e
problem.
4.3.1.
The Two-Sample U-statistics Approach
We now write
F
of the random variables,
P1'
and
Y ,
j
to be observed from
for the distribution function
to be observed from the first population,
X. ,
~
G for the distribution function of the random variables,
P •
2
Then we define
n
(4.9)
U(3,1)
=
E
1
~
i
< j
<
k
~
m
l~1$31(Xi' Xj , Xk ; Yl ) ,
where
(4.10)
3$31(s,t,u; v)
I I - lu-vl)
c( Is-ul It-vi)
c( It-u I - Is-vi)
c( s-t
=
+
+
62
and
c(u)
=
~,
1,
or 0
as in (1.4).
interchanging the roles of
m and
Similarly we define
n
and of
X and
Y.
U(1,3)
by
Next we
consider
(4.11)
U(2,2)
1
=
where
(4.12)
1Ji
22
c(ls-tl - lu-vl) ,
(s, t; u, v)
a statistic previously considered by Lehmann [22].
(4.13)
U(2,1)
=
:::; i
-e
L:
< j
n
L:
:::; m k=l
1Ji
Finally we define
X.; Y ),
2l (X.,
k
1.
J
where
(4.14)
21Ji
21
(s, t; u)
c(ls-tj - Js-uJ)
=
+ c(ts-tl - !t-ul),
and
of
U(1,2)
X and
is defined by interchanging the roles of
m and
nand
Y.
It is also necessary to revise the notation for difference ranks,
(4.1), by writing
R(X., X.,),
1.
R(X., Y.) = R(Y j , Xi)'
1.
J
1.
Then, from the formula (2.31) for
2
d (X., Y.),
1.
J
or
R(Y j , Yj ,).
it is clear that
(4.15)
R(Zi' Zj)
where
Zi
=
X.
1.
~ + N(l,l) (Zi' Zj) + N(1,2) (Zi' Zj) + N(2,2) (Zi' Zj)'
or
Y.,
1.
Z. = X.
J
J
or Y.,
J
63
N(l,l)
(4.16)
N(1,2)
(4.17)
m-1
=
(Zi,Zj)
J:
k=l
l=k+1
m
(Zi,Zj)
=
N(2,2) (Z.,Z.)
=
m
J:
J:
n
J:
k=l l=l
c( 1Zi
- Zj
I-
c(!z. - z.1 ~
J
I~
-
Xli),
-
Yl
!),
I~
and
(4.18)
J
~
We now write
N
= m + nand p = miN,
difference rank is
R = «~) +
1)/2
and we note that the average
and thus grows as
2
N •
After some
calculation, it follows that
(4.19)
2
=
[(1-;)
vN(U(2,2) - ~) + p(1-p)vN(U(3,1) - ;)]
1
r.;
1
+ N [2(1-p)vN(U(2,1) - 2)
-
(l-p) r.;
1
2- vN(U(2,2) - 2)
- 2(bp) vN(U(3,1) - ;)],
(4.20)
T)
N3~2 { (~n~·
- R}
2
=
[-
~ vN(U(2,2) - ~) + p(1-p)vN(U(1,3) - ;)]
2
+ ~ [2pvN(U(1,2) - ~) +
%vN(U(2,2)
- 2(1-p)vN(U(1,3) - ~)],
and
"-
- ;)
64
(4.21)
_1_ {
N3/2
T
(1,2) _
mn
2
iU =
[.yvN(UC3,l) -
~) +
. )2
(1-;
vN(U(1,3) -
- pvN(U(2,1) - ~) - (1-p)vN(U(1,2) -
~)]
t)]
+ J:...2 [similar terms] •
N
Now it is easily seen that, under the null hypothesis
(F= G),
~.
the above two-sample U-statistics have expected value equal to
now define a functional,
L(F),
of a distribution,
F,
all of
We
by the formula
(4.22)
-e
Then it follows from the work of Lehmann [22] and [24], §6, that in the
limit as
N goes to infinity and the null hypothesis holds
U(3,1)
(4.23)
IN
1
2
1
U(1,3)
-2"
U(2,2)
1
2
d
N
3
[~ ]
0'2
a
1
-1
2
[ -12
1
-2
-2
4
]
where
(4.24)
Since the asymptotic distribution of these statistics is seen to be
singular, we have that.
65
(4.25)
asymptotically.
U(1,2)
One also notes that the contributions of
are less important by a factor of
~.
If
p
=
are just as important as
U(3,1)
and
U(1,3)
and
~ (m=n),
notes that the leading term in (4.21) vanishes so that
U(1,2)
U(2,1)
U(2,1)
one
and
in contributing
to the variability of the between sample difference rank total,
It is easy to argue that the statistics
U(1,3)
T(1,2).
U(3,1), U(2,2), and
are not asymptotically singular under alternative hypotheses of
interest.
One notes that, as the populations undergo a shift in loca-
tion with respect to each other, the distribution of the statistic
U(2,2)
smaller.
-e
is unaffected while
U(3,1)
and
are stochastically
On the other hand, if the dispersion of the
creases relative to the dispersion of the
U(3,1)
U(1,3)
are stochastically smaller but
larger -- and vice versa if the
Y variables in-
X variables,
U(1,3)
U(2,2)
and
is stochastically
Y dispersion decreases relatively.
Thus, instead of using a variance ratio criterion as originally
suggested, we see that the statistics summarized in Table 4.1 lead to
large sample tests of
H(2)
noted that an estimate of
standard error,
00'
versus the alternatives specified.
L(F)
It is
is required to specify the asymptotic
of these statistics.
66
TABLE 4.1.
Large Sample Univariate Tests Based Upon Difference Ranks
F :: G
Xl' ••• , Xm i. i. d. from
Alternative of Interest
Test Statistic
VN(U(2,2)
t
l
=
2
1
-)
2
F
=
-
F
1
and
2
3
=
differ in location
Var(Y)
-
1
2
F
and
-)
G
1\
°0
1
L(F) - -
4
>
i.i.d. from
Var(Y)
G
Critical Region
Two Tailed:
ItIl
>
Z
1-0./2
One Tailed:
t
Var(X)
differ in location
Var(X)
p (l-p)
>
and/or
where
and
differ in
and/or
1\
vN(U(1,3)
G
-)
°
-e
G
variance
2o 0
0
t
and
1\
vN(U( 3,1)
t
-
Y , ••• , Y
n
l
F,
2
<-Z
1-0.
One Tailed:
t
3
<-Z
1-0.
67
4.3.2.
L(F)
versus
A(F)
The functional
L(F)
specified by (4.22) was studied by Sukhatme
[41] who showed that it took on different values for the uniform and
double exponential distributions.
Unfortunately, his formulas (2.10)
and (2.11), page 192, are missing a factor of four in the numerator
from (2.2), page 191.
It follows that the statistic
t
l
(Lehmann [22]) is not asymptotically distribution-free.
of Table 4.1.
A somewhat
similar functional,
(4.26)
A(F)
was considered by Lehmann [25], who showed that
distributions and that
A(F)
Z~
A(F)
~ ;4 for all
was close to its upper bound for the uni-
form, normal and cauchy distributions.
LEMMA 4.3.
(4.27)
Proof.
L(F) -
i~
L(F)
~2:
.
Rather than adopt the approach of Lehmann [25], we note that
Z=
for the statistic
sOl
U(2,2)
in the notation of Lehmann
[24], §6, the two-sample extension of the quantities
studied by Hoeffding [14], Theorem 5.1.
••• ,
l;
q
It follows that
(4.28)
for
1
~
i
< j
~
q,
where
q
is the number of arguments of the
U-statistic from the second sample.
and
-e
Now in
U(2,2)
we have
q
=
2
68
(4.29)
F _ G •
if
The proof is completed by noting that (4.28) with
i = 1
and
j
2
implies (4.27).
A short summary of known exact values of
Table 4.2.
The value of
.0171
is presented in
L(F)
reported for the normal distribution
resulted from a numerical integration of
L(N)
(4.30)
=
),), .~
x - -][iP(
1
.:I..-)
[lP(-)
f2
2
f2
-
I
-]
2
2
2
• cosh (~) exp ( -x 3-y ) dx dy
3
'"
The function
and
• 26712329
2cosh(~) was broken up into its two terms
exp(-xy/3) ,
and the accuracy of the calculation was checked by
noting that the exact value of
places.
A(N)
was reproduced to seven decimal
It is possible to write the exact value of
ni te series, however.
If the
4
x
1
random vector,
variate Normal distribution with mean vector
1
(4.31)
y(p)
4 4
exp (xy 13)
£.
2
1
1
...e....
12
12
..E-
1
1
p
rz rz
1
0
4-1
L(N)
y,
as an infihas a Multi-
and variance matrix
69
TABLE 4.2.
Some Exact Values for
A(F)
=
and
A(F)
P[X - X2 - X + X < 0 and Xl - X - X + X
1
5
6
7
3
4
1
1
1
o ~ L(F) - 4'
A(F) - "4 ~ 24 = .04166 •••
F
L(F)
Uniform
1
-4
--=
.01854
17
4
2 .3 3
=
.03935
1
.00555 t
103
3 2
2 ·3 ·5·7
=
.04087 *
.0171
arc sin
21T
=
.040215 *
22. 3 2. 5
Normal
A(F)
0]
409
27 .3 4
=
1
2.3 2
Exponential
1
-4
<
.02064 t
107
26 .3 4
Double Exponential
-_
L(F)
=,
=
"41
.03945
.0379 *
Cauchy
*
Reported by Lehmann [25]
t
Reported by Sukhatme [41] •
70
then the probability
(4.32)
=
P(Y1
>
0,
Y2
0,
>
is such that
(4.33)
L(N)
=
X(N)
=
and
(4.34)
.!4 + -!..
2'IT
1
arc sin 4
=
1
- - arc
.
212
'IT
Now t h e
.
quant~ty
P+(p)
1
s~n---
•
is a special case of the quadravariate prob-
abilities expressed in a general tetrachoric series by Kendall [16] and
Moran [31].
Unfortunately, the series for
p+(~) and p+(- ~)
do not
converge fast enough to be of any use in evaluating these probabilities.
Given a random sample of size
n
assumed to be from a single pop-
u1ation with an unknown distribution function,
L(F)
the functional
can be estimated by a one sample U-statistic which averages over
all seven-tup1ets in the sample.
In order for the kernel of such a
statistic to be symmetric, it must account for
orderings of its arguments.
--
F,
We note that
7!/4
possible
Theorem 5.2 of Hoeffding [14] implies that
by writing the kernel of
which are zero or one and applying Lemma 4.3.
U
n
as a sum of terms
The lower limit above is
71
the asymptotic variance of
U .
unfortunately the expression for
n'
1';1
involves a functional utilizing thirteen independent random variables
with distribution function
Another estimator
of
F.
L(F)
can be recommended on the grounds of
its computational simplicity, namely
L(F )
n
where
F (x)
pirica1 distribution function of the random sample.
presented in Table 4.3.1.
dividua1s
(k
~ l)
i
and
j
(i
n
is the em-
The details are
If we say that the differences between in~ j)
and between individuals
k
and
l
are in the same "class" if there is exactly one integer in
common among
i, j, k, and l,
then the next to last expression for
L(F)
in Table 4.3.1 shows it to be a sample intrac1ass rank covariance.
L(F)
can be written as a linear combination of U-statistics with
n
n
kernels of size less than or equal to seven and is thus seen to have
·e
bias
O(l)
n
as an estimator of
L(F).
The statistic can, however, be
shown to be asymptotically equivalent to the single U-statistic considered in the previous paragraph;
in particular, its asymptotic var-
iance is unknown although bounded above by a factor of
49/24n
as
before.
A naive estimator of
L(F)
will now be considered to get some
idea of the sample sizes required to estimate or simulate
proposed that the sample be divided into
n'
L(F).
It is
= n/7 seven-tuplets and
that only one ordering of the observations within each seven-tuplet be
considered.
Table 4.3.2 contains a summary of the relevant material.
Essentially, we are estimating
L(F)
by running a sequence of
n'
Bernoulli trials in which "success" is the event that
(IX1 - x21
>
IX 3 - x41
and
IX 1 - xsi
(!X l - x21
<
IX3 - x41
and
IX l - xsi < IX6 - X71).
>
IX 6 - x71)
or
In the last
72
TABLE 4.3.1.
Estimation of
F (x)
n
=
1
-(number of
n
G (x)
n
=
1
n(n-1) (number of
=
-_
H (x,y)
n
=
~
for
Xi'Xj
1
n(n...l) (n-2) (number of
such that
L(F )
n
=
foo
0
=
n
4
E
3
n (n-1) (n-2)
R.
o~
such that
i < j
Xi,Xj'~
such that
E
Ix. - x.1 s; x
~
J
i=1
-2
Rio
=
2..-1
n-
E
j;&i
R ..
~J
2
E
i;&j < k;&i
IX.-x.1 s; x)
~
J
Ix. - x.1 s; x)
~
J
i ;& j ;& k ;& i
for
and
n
n -1
[n(n-1)(n-2)(2)]
where
=
i;&j
G (x)G (y)dH (x,y)
n
n
n
i=1
=
n
~
for
Xi 'X j
L(F)
X. s; x)
such that
X.
1
-(number of
(n)
2
~
L(F)
R.. R' k
~J
~
[(~) + 1][2(~) + 1]
2
2
3n (n-1) (n-2)
IX i - Xkl s; y)
73
TABLE 4.3.Z.
Estimation and Simulation of
(i)
L(F)
=
p[IX 1 - xzl > IX3 - X4 '
=
~
where
1\
P
=
1\
E(p)
-e
(iii)
Xl' •.. , X
number of successes
Zn'
= L(F),
and
L(F))
<
F.
=
N (O,1)
1
+
1\
d
Zn'p = Bin(n' ,ZL(F))
is such that
Var(pl\)
IZL(F)(l- ZL(F))
ZL(F)(l - ZL(F)) <
7
- 16n
4n'
as
n'
+
00
,
where
1\
1\
Zip - L(F)
I
s;
P - L(F)
s;
IZL(F) (1 - ZL(F))
(iv)
For
n'
large
IX 6 - x71J
are independent random variables
7
Z&(P -
IX 1 - xsi >lx6 - x71J
and
p[IX1 - XZ ' > IX 3 - x41 and IX1 - xsi
-
with distribution function
(ii)
L(F)
p{lp _ L(F)I
>
Z~
Ip - L(F) I
k_} ~ Z~(-k)
4 Tn'
74
section of Table 4.3.2, we have tabulated the values of
to estimate
L(F)
n'
required
such that, with stated probability, the estimate will
not differ from the true value by more than
ample, we must have
n'
~
82,700
.01
or
.001.
For ex-
in order that we might estimate
L(F)
and have only one chance in four of missing the true value by more than
.001
o5
with our naive estimator.
L(F) -
L(F)
t
.04166
5
to within
.01
Since we know that
it is absolutely necessary to estimate
or else not even try.
To get some idea of the values assumed by
L(F)
for different
distribution functions, a simulation study was run with
n'
= 90,000.
Two types of errors, besides sampling variation, affect the results of
such a study.
First of all, pseudo-random numbers were generated using
an unsophisticated overflow algorithm.
Secondly, the appropriate in-
verse distribution functions were only approximate
particularly in
the case of the normal distribution simulations.
The results of the study are presented in Table 4.4; some typical
values are plotted in Figure 4.3.
and
33
The range of variables
1, 2, 3, 4,
was adjusted to match the largest number that could be
handled by the computer -- none of these densities would have a finite
integral from
to
_00
+00.
Much use was made of Beta distributions
because of the relative simplicity of their inverse distribution functions.
The Double Beta distribution of "type I" and with parameters
and
has density
0
(4.35)
f(x)
=
~
(1 _ Ixl)o-l
for
Ixl < 1
whereas the Double Beta distribution of "type II" and with parameters
1
75
TABLE 4.4.
Results of the Simulation Study
Distribution Types
LT
J
and
Distribution
-e
0_
Large Tai1(s)
M:
All Moments Exist
B
Bounded Range and Finite Density
U
Unbounded Density
~
Range
(0, e 174 )
l.
Density
= k/(l+x)
LT
2.
Density
= k/(l-x)
J
3.
Density
= k/(l+lxl)
LT
4.
Density
= k/lylZnlyl
LT
5.
Double Beta II (1, 1/16)
6.
t(F)
~(F)
.2862
.2867
(0, 1 - e-19)
.2858
.2864
( -e 174 ,e 174)
.2857
.2869
174
Ixl + e < e
.2855
.2870
LT
(_ 00, 00)
.2847
.2866
Double Beta II (1, 1/8)
LT
(_ 00, 00)
.2846
.2868
7.
Double Beta I I (1, 1/4)
LT
(_ 00, 00)
.2839
.2865
8.
Beta II (1, 1/4)
LT
(0, 00)
.2837
.2871
9.
Beta II (1, 1/2)
LT
(0, 00)
.2816
.2877
10. Double Beta I I (1, 1/2)
LT
(- 00, 00)
.2814
.2864
11. Double Beta I I (1, 1)
LT
(- 00, 00)
.2795
.2874
12. Cauchy
LT
(_ 00, 00)
.2778
.2875
13. Beta II (1, 1)
LT
(0, 00)
.2772
.2879
14. Double Beta I I (1, 2)
LT
(- 00, 00)
.2764
.2880
15. Double Beta I I (1, 4)
LT
(- 00, 00)
.2739
.2888
16. Double Beta II (1, 8)
LT
(- 00, 00)
.2729
.2891
17. Beta I (1, 1/4)
J
(0, 1)
.2720
.2893
18. Double Exponential
M
(- 00, 00)
.2706
.2895
19. Exponential
M
(0, 00)
.2689
.2883
Iyl
=
76
e
Table 4.4. (continued)
Dis t rib ution
·e
t(F)
~(F)
20. Logistic
M
(- 00 t 00)
.2682
.2898
21- Double Beta I (I t 4)
B
(-1 t 1)
.2675
.2896
22. Beta I (I t 1/3)
J
(0 t 1)
.2663
.2899
23. Normal
M
.2654
.2902
24. Beta I (l t 4)
B
(0, 1)
.2639
.2897
25. Double Beta I (1, 2)
B
(-I t 1)
.2635
.2907
26. Triangular
B
(0, 1)
.2609
.2901
27. Beta I (I t 1/2)
J
(0 t 1)
.2598
.2914
28. Beta I (I t 3/2)
B
(0 t 1)
.2585
.2906
29. Uniform
B
(0, 1)
.2566
.2909
30. Double Beta I (I t 1/2)
U
(-I t 1)
.2531
.2912
31- Double Beta I (I t 1/3)
U
(-I t 1)
.2521
.2907
32. Double Beta I (I t 1/4)
U
(-I t 1)
.2514
.2909
-19 ) .2507
1 - e
.2905
33. Density
'e
Range
~
= k/(l-lxl)
U
(-
(-1 + e
oot
-19
t
00)
77
FIGURE 4.3. Some Typical Results of the Simulation of
L(F)
vs.
A(F)
Distribution Types
LT:
M:
B:
and U:
J
Large Tail(s) -- e.g. Cauchy
All Moments Exist -- e.g. Normal
Bounded Range and Finite Density--e.g. Uniform
Bounded Range but Density Unbounded
-- e.g. Beta I (1, 1/2) is
and
J
Double Beta I (1, 1/2) is
~
o
Predominant
Types
.29
-_
Density = k/(l+!XI)
Double Beta II (1,1/4)
Double Beta II (1,1/2)
LT
.28
~
~.
ll.
~.
t
Double Beta II (1,1)
Cauchy
Double Beta II (1,4)
~
LL
""'-""
---J
Double Exponential
Exponential
Logistic
Normal
Double Beta I (1,2)
~
.27
f
-I
f
U
M
. 26
....
~.
~
>.
..
"
Triangular
B
.
~
Beta I (1,3/2)
~.
Uniform
~ Double Beta I (1,1/2)
= k/(l-IXI)
Density
• 25
'-
A(F)
.25
.26
.
~.
.27
••
~
~
.28
.29
78
1
and
8
has density
(4.36)
f(y)
for
-
00
< y <
00
•
The distributions considered are classified into one of five types,
and, as is obvious from Figure 4.3, a great range of combined values of
L(F)
and
A(F)
are possible with certain.types of distributions pre-
dominating in·certain ranges.
.273
(1)
~
L(F)
~
.286
is a region in which distributions
with Large Tails (LT) predominate.
Typical members are
the Cauchy distribution and (Double) Beta II (1, 8)
for
(2)
8::; 1.
.265::; L(F) ::; .272
is a region in which distributions
possessing finite Moments (e.g. Exponential, Logistic,
Normal) predominate.
(3)
.256::; L(F) ::; .264
contains mainly distributions with
finite range and Bounded density -- e.g. the Triangular
(i.e., Beta I (1, 2)) and Uniform (i.e., Beta I (1, 1)).
(4)
.250::; L(F) ::; .255
contains distributions ona finite
range but with unbounded "U-shaped" densities
particular the Double Beta I (1, 8) with
in
8 < 1.
One notes that Beta I (1, 8) has been called "J-shaped" for
does not fit into the above classification.
bution is in the
for
8 =
41
B region, .for
it is approaching the
1
8 = 3
LT
For
1
8 =-
2'
it is in the
region.
8
<
1
and
this distri-
M region, and
Distribution 2 is
79
J-shaped and has one of the largest simulated values of
x
the linear transformation
-+
1 - x
does not effect
L(F).
Making
L(F),
of
course, and shows that there is an essential difference between symmetric distributions on
the origin
at
±1
(L(F)
(L(F)
€
LT
€
(~1,
1)
region)
U region).
which have density unbounded only at
and those with density unbounded only
Some general nonsymmetrica1 simulations
were attempted by placing different distributions in the positive and
negative tails; without exception the resulting estimates were intermediate between their values for the separate distributions.
The most striking implication of the simulation study is that
is much less robust a functional of
F
than
ment leading to this result is as follows.
A heuristic argu-
A(F).
Suppose
Xl' X ,
2
are independent random variables with distribution function
·e .
L(F)
and
F.
X
3
We
and note that the correlation,
between
Y
l
and
Y
2
is
(if it exists at all) and
that the marginal distributions are symmetrical.
(4.37)
where
We note that
A(F)
ps(Y , Y )
1
2
is the "grade correlation" between
Kendall [17], page 136.
Y
1
and
Y ,
2
On the other hand,
(4.38)
but we do not know
p (IY11, IY ')
2
without knowing
FI
Finally, in
view of the results of the simulation study, there is reason to conjecture that
L(F)
of
only a small range of values for
L(F),
is always less than
A(F)
and that, for a given value
A(F)
is possible.
80
4.3.3.
Comments On the General Multivariate Problem
The conditional test utilizing the permutation distribution of the
variance ratio criterion of the One-Way Analysis of Variance of the
within and among sample treatment combinations is valid.
A
large sample procedure based upon this statistic is of· doubtful interest
because it would not be asymptotically nonparametric.
there can be no more than
(c - 1)
We have seen that
degrees of freedom in the numerator
of such a statistic and that the statistic would have to be modified by
an estimate of an asymptotic variance.
The resulting test would dis-
regard the dimensionality of the data without gaining any desirable
properties.
4.4.
The Assignment of General Rank Scores
S*
of withP N
in variable ranks defined by (1.2) and (1.3) can be thought to replace
Multivariate procedures which utilize the matrix,
the data,
~
by standardized scores or
c~rdinates
with respect to
P N
p
orthogonal axes.
The squared "distance" between the
l-th and m-th
~ (R~o - R~ )2. On the other hand, the approi=l J.-tJ.m
priate measure of distance between the l-th and m-th individuals when
individuals is thus
one is considering procedures invariant under the general nonsingular
linear transformation group,
rank order of these
ference ranks,
r = rank
r xN
E,
matrix
,B.
L(p),
is
dim
given by (2.16).
The
quantities is expressed in the matrix of difIn order to utilize the known dimensionality,
N N
of the sample, general rank scores in the form of an
:I must be associated with the maximal invariant
Rather than being a complete co-ordinate representation of
E
E.
as is
~
81
of Section 2.2, we wish
E
only through
Suppose
o<
u
<
1
.6
and
N ,r
to be robust in that it depends upon
r•
is a real valued function of argument
(u)
which may depend upon
(4.39)
O:';.6
N ,r
(u):';.6
~~,r(R)
._
Now d enote b y
N,r
the
~~
r.
N and
o
for
(u')
Let
<
.6
N ,r
(u)
u
for
be such that
u < u' < 1 .
N x N symmetric matrix wi th null diagonal
whose element in the i-th row and j-th column is
(4.40)
Now
.6
N,r
~.N,r(R)
._
(~~i: 1]
for
i
:f
j
.
'
d as a matr1x
'f
'
' d '1stances
can b
e V1ewe
0 square d 1nterp01nt
~
-- its elements being measures of dissimilarity between the corresponding individuals in the sample.
the function
.6 ,r(u)
We now ask if it is possible to choose
so that the matrix of general rank scores,
N
~,
can be calculated using standard techniques such as the "Principal Coordinate Analysis" of Gower [12] or the "Analysis of Proximities" of
Sheppard [39].
It should be clear that
(4.41)
~!
=
~ ,J;'
r N r
=
~
must have the following properties:
o
r 1
r N 1
and
(4.42)
I
~r
The property (4.41) actually implies no restrictions from a geometrical
point of view; we are merely asking that our rank scores be measured
with origin at their mean value.
The requirement (4.42) then implies
82
~'~
that
is a projection matrix -- i.e., symmetric and idempotent.
N r'N
The difference ranks derived from our scores,
as
E.
~,
are to be the same
In practice, we may be satisfied if there is only almost com-
p1ete agreement.
If, however,
~
does not have property (4.42), the
squared distances between the representations of its points invariant
under
itself.
L(p)
(~~')
must be measured from
-k
2~
rather than from
~
This fact is illustrated by the numerical examples plotted in
Figure 2.1 and Figure 2.2.
0
The rank order of interpoint distances is
6.5
8
6.5
9
10
0
2
2
0
5
2
4
0
(4.43)
0
(allowing for ties) in the raw data, while the rank order in the invariant representation is
0
(4.44)
3
2
10
6
0
4
7
9
0
5
1
0
8
0
A logical measure of the discrepancy between two difference rank matrices is the average,
~ =
1
(N)
2
N-1
L
i=l
of the squares of the differences between their
above diagonal
83
elements.
82
The measure has the value
=
5.15
for the discrepancy
between (4.43) and (4.44).
4.4.1.
Principal Co-ordinate Analysis of
We used the work of Gower [12] in Section 2.2 to derive a co-ordinate representation,
elements of
~~,r(B.)
._
~
~,
of the association matrix
£.
Although the
are no t actua11y square d d'1S t ances b etween
points in Euclidean space of
r
N
dimensions, we now propose to use a
modification of Gower's "Principal Co-ordinate Analysis" to derive the
co-ordinate representation
form the association matrix
~.
The first step in his procedure is to
-1
f.JN,r(R).
2 '"
'"
column means of this matrix so that
Now we remove the row and
will have property (4.41); the
~
resulting matrix
(4.46)
(1 - 1 1 1')(- 1 f.JN,r(R))(I - 1 1 1')
=
.-
N - -
2 '"
'"
'"
N - -
will obviously have a c-root of zero so that the technique never tries
to represent
and vectors of
N points in more than
~
are computed next.
N-l
dimensions.
The c-roots
However, so that
will have
~
property (4.42), we do not scale the c-vectors so that the sum of
squares of their elements is equal to their corresponding c-roots.
fact,
~
will not necessarily be non-negative definite.
If
In
has
~
negative c-roots, imaginary axes are required to actually achieve the
interpoint squared "distances,"
to be
r
orthonormal c-vectors of
(most positive) c-roots.
the first
f.J (R ).
ij
r
~
We propose that
associated with the
~
be taken
r
largest
The ideal case in practice would be to have
c-roots all equal even though this would imply that there
would be no unique way to pick the corresponding c-vectors.
The
84
technique is expected to be satisfactory if the first
r
c-roots are
positive and of approximately the same order of magnitude.
82
The measure
of (4.45) must be computed in any case to see how well the procedure
has worked.
As an example of this technique, the results of the calculation
are presented in Table 4.5 for all
when
(4.47)
N=4
30
E*,
standard forms,
possible
and for
!.:J
R..
1.J
[ (~)
]
+ 1
=
or
(4.48)
·e
!.:J
R..
1.J
[ (~)
]
2
R.•
1.J
+ 1
A careful study of the results of this, the simplest possible non-trivia1 case, reveals that either score function, (4.47) or (4.48), leads
to satisfactory results in general.
more natural; standard form
2
The measure (4.48) is perhaps the
is recovered exactly in this case.
rows of the table are arranged so that, using (4.48), the first
standard forms are best represented by the method when
becoming more and more two dimensional, and the last
are best represented by the method when
r = 2
r
13
=
1
The
16
but are
standard forms
and are becoming more
and more two dimensional. The ordering of the rows would be slightly
different using the same algorithm with score function (4.47); in particu1ar, standard forms
table.
12
and
14
would come much later in the
Actually, neither method is satisfactory for standard form
which cannot be achieved in one dimension.
5
There is reason to believe
that score function (4.48) will produce better results in practice for
85
TABLE 4.5.
Principal Co-ordinate Analysis of
r
=1
or 2,
;,S (u)
and
iCE)
=u
when
or u
N
= 4,
2
The standard forms are indicated by listing the above diagonal elements
by rows.
For types "a", "b", and "c", refer to Figure 4.1.
Columns (1) and (2) give the values of
;,S(u)
=u
for
r
=1
and 2
2
r = 1 and 2
for
achieved using
respectively.
Columns (3) and (4) give the values of
;,S(u) = u
82
82
achieved using
respectively.
Columns (5) and (6) give the largest and
secon~
largest c-roots of
2
;,S (u) = u •
using
The rows are numbered
1
to
30
and are arranged according to the
values in columns (3) and (4).
~e
Row
Standard Form
~
(1)
(2)
(3)
(4)
(5)
(6)
1.
136
25
4
al
0.0
2.6
0.0
2.6
20.7
2.2
2.
146
35
2
al
0.0
4.3
0.0
2.3
22.8
0.0
3.
146
25
3
al
0.3
2.6
0.0
1.3
21.6
2.6
4.
156
34
2
a2
0.3
3.6
0.3
3.6
22.7
1.5
5.
126
35
4
b
0.3
3.0
0.3
2.3
20.7
3.9
6.
236
14
5
c
0.3
1.0
0.3
2.0
19.8
3.8
7.
136
45
2
b
0.3
2.3
0.3
1.6
22.4
4.2
8.
246
15
3
al
0.3
1.0
0.3
1.3
20.9
3.9
9.
236
15
4
al
0.3
1.0
0.3
1.0
20.2
2.7
86
e
'e
Table 4.5.
(concluded)
Row
Standard Form
(1)
(2)
(3)
(4)
(5)
(6)
10.
156
43
2
b
0.6
1.6
0.6
1.3
22.3
4.6
11.
156
24
3
a2
1.0
1.6
1.0
1.6
21.2
4.1
12.
246
13
5
c
1.0
0.3
1.0
1.6
18.6
7.1
13.
136
24
5
c
0.6
1.3
1.0
1.3
19.8
4.3
14.
256
13
4
c
1.0
0.3
1.0
1.3
18.9
7.1
15.
136
54
2
b
1.0
1.3
1.0
1.3
21.1
9.5
16.
126
34
5
b
1.3
2.3
1.3
2.0
20.3
4.6
17.
126
45
3
b
1.0
1.6
1.3
1.3
21.2
6.1
18.
156
23
4
c
2.3
1.3
1.6
1.3
19.2
7.1
19.
146
53
2
b
1.3
1.0
1.3
21.0
9.6
20.
256
14
3
a2
1.0
0.3
1.0
20.6
4.8
21.
156
42
3
b
1.6
1.0
1.6
20.6
7.4
22.
126
43
5
b
1.6
1.0
1.6
LO
0.6
0.6
0.6
20.4
7.3
23.
146
23
5
c
1.3
0.6
1.6
0.6
18.7
7.4
24.
126
54
3
b
1.6
0.6
2.3
0.6
19.4
11.6
25.
146
52
3
b
2.3
0.3
2.3
0.3
19.1
12.0
26.
136
42
5
b
2.6
0.3
3.0
0.3
19.9
8.2
27.
156
32
4
c
3.0
0.3
3.0
0.3
18.9
8.1
28.
126
53
4
b
3.6
0.3
3.6
0.3
18.7
12.4
29.
146
32
5
c
2.6
0.0
2.6
0.0
18.7
8.5
30.
136
52
4
b
4.3
0.0
4.6
0.0
18.4
12.8
Note:
~
Columns (1), (2), (3), and (4) contain only repeated decimals.
Thus,
4.6
should be read
4.666···.
have been rounded to the nearest tenth.
-e
Columns (5) and (6)
87
r = 1,
while (4.47) could be used for
assign to
R..
r > 1.
the expected value of the
1J
Score functions which
R.. -th
largest of
1J
(N )
2
in-
r
de-
dependent random variables (each distributed as chi-square with
grees of freedom, say) are reasonable.
Of course, score functions which
recognize that the difference ranks and squared distances are dependent
random variables or which assign tied scores to tied or adjacent ranks
are also of interest.
..
Ana 1YS1S
.
P rox1m1ty
4 . 4 .2•
0
~~,r(E.)
._
f
~
The mathematical formulation and numerical.method of Kruskal [20]
is recommended for this calculation with Euclidean distance as metric.
Of course, the property (4.42) is not a standard part of this procedure,
so that, if
;[,*
r
is chosen by the algorithm as an improvement on the
N
i-th approximation,
of a solution
then the
(i+l)-th
ap-
proximation should be taken to be
(4.49)
-k
(;[,*;[,*') 2;[,*
=
r
r N
R..
Although the quantities
.6
1J
(
)
N,r (N)+l
are not squared distances, they
2
are "dissimilarities."
In fact, we know that Kruskal's algorithm will
lead to a solution with "perfect" fit
i.e., the "stress" will be
zero because a monotonic relationship will exist between dissimilarity
and the constructed distances or squared distances.
case because any co-ordinate representation,
iant
-e
E
~,
This must be the
of the maximal invar-
is a solution to the problem with zero "stress."
that this implies that
~
Note further
must not be taken as a starting point for
the algorithm of Kruskal because the result would be a final solution
88
of
=~
~
in one iteration.
thus very important if
representation of
E
~
The starting point of the algorithm is
is to be a robust estimator of a co-ordinate
depending only upon
E
and
r.
The principal
co-ordinate solution is thus recommended as a starting point, and a
solution for
~ with
82 = 0
in
r
dimensions can then be constructed.
A more simple starting point could be recommended for the univariate case so that the c-vector analysis of
avoided.
~
given (4.46) can be
The starting co-ordinates for the Proximity Analysis can be
chosen so that the distances between adjacent points are proportional
to the
N- 1
R!2' R~3' R~4' ""
difference ranks
contains the recommended starting co-ordinates for
~-l,N'
N
=4
Table 4.6
before ad-
justment to mean zero and sum of squares one.
'_
TABLE 4.6.
Possible Univariate Starting Points for
Row Reference
to Table 4.5.
R
23
R
24
Starting Point
(j1' j2' j3' j4)
1.
1
2
4
(0, 1, 3, 7)
2.
1
3
2
(0, 1, 4, 6)
3.
1
2
3
(0, 1, 3, 6)
8.
2
1
3
(0, 2, 3, 6)
9.
2
1
4
(0, 2, 3, 7)
S(7r(N), a)
Suppose we wish to test
with
(4.50)
c
~
2.
=4
R
l2
Tests of Structure
4.5.
N
We can rewrite
H(c):
Invariant Under
H(c)
of (4.2) against
H(c)
as
The columns of
distributed.
~
L(p)
K(c)
of (4.3)
are independent and identically
89
Suppose we restrict attention to procedures invariant under
and base our test upon a real valued statistic,
erty.
t(~),
L(p)
with this prop-
Then
t(~)
(4.51)
The statistic
t(Ji) •
=
t,
to be of any use for testing, must not be invariant
~.
under permutations of the columns of
We can write this latter
group of transformations as
(4.52)
~
~f
p N N
-+
P N
where
f
is contained in a finite subset,
orthogonal matrices of order
N:
One notes, however, that
TI(N)
as we restrict attention to tests of
L(p),
invariant under
(4.53)
H(C):
of the group of
those with exactly one unity in each
row and column and zeros elsewhere.
invariant in distribution under
TI(N) ,
we can replace
The distribution of
when
H(c)
H(c)
against
H(c)
holds.
K(c)
H(c)\ holds.
is
Thus, as long
which are
by
Ji is invariant under TI(N).
Note that we are certainly not claiming that the columns of
dependent when
t
are in-
We are now in a position to apply the
results of Lehmann and Stein [26] who showed that the theory of optimum tests of
H(c)
is the same as the theory of optimum similar tests
of the hypothesis that the columns of
tica11y distributed.
-e
(4.54)
Ji are independent and iden-
For this purpose, define
t(k)(~) = k-th largest value of t(~) for fE TI(N) ,
and let .a randomized test function be defined by
90
e
(4.55)
=
lJiaQO
QD ,
tQD
a(~)
if
t(~) =
t(k)(~)
0
if
t(~) <
t(k)(~)
aQp
can be so chosen that the test is of "structure
E
[0,1]
S(1T(N), a)"
the integer
and is invariant under
(4.56)
k
,
0
<
1,
t (k)
if
where, given
<
a
>
1
L(p).
and the function
Thus,in particular
aN!
making the test similar of size a when
holds; actually, we
c
need consider only N!/( IT n.! ) possibilities for t(~) but this
j=l J
complicates the notation.
·e
Before continuing, let us comment that most powerful tests of
against a simple alternative can be constructed .in theory.
Let
denote the generalized density of the distribution
respect to some measure on the Grassmann manifold
t(~)
= g(E)
by definition.
Gr,N-r- l'
with
and take
Unfortunately, no examples of such a den-
sity are known; and example of the measure which arises under
(so that
H(c)
c. can be regarded as equal to one) for a Multivariate Normal
distribution is given in Obenchain [33].
Let us now consider the large sample properties of the test
as compared with
lJia(~)
= lJia(E).
lJi a (J)
~
Sheppard [40] reiterates a theme that
he and other writers have suggested on heuristic and empirical grounds:
the rank order alone of the dissimilarities is sufficient to approximately recover the true distances up to a change of scale when
much larger than
r.
In terms of co-ordinate representations
N is
91
satisfying (4.41) and (4.42), this means that
(4.57)
~
r N
where
r,
is some
~
B W
r r N
~
r
~
r
x
orthogonal matrix and
N is much larger than
so it follows that
(4.58)
The argument for the result (4.57) has not been made mathematically
rigorous by any workers in the field of non-metric scaling; heuristic
arguments on this point note that the number of interpoint distances to
be rank ordered is increasing as
creasing at the rate
r
=1
or 2
N.
2
N while the number of points is in-
Sheppard [40] states that
gives good results.
N
= 10 or 15 with
The advantage of the methods included
under the heading of "Analysis of Proximities," rather than being simp1e mathematical properties of their solutions, is that they have
proven satisfactory in practice.
There are, however, certain "degenerate" cases in which the methods
of Sheppard [39] and Kruska1 [20] cannot recover interpoint distances.
An example is cited by Sheppard [40] in which the data breaks up into
two clusters in such a way that all between cluster distances are
greater than all within cluster distances.
However, we have modified
the usual non-metric scaling techniques by requiring the property
(4.42).
Thus there is no problem in finding the desired solution to
this and other "pathological" examples.
At any rate, it is now obvious that the choice of
-e
important in the limit as
N approaches infinity with
~(u)
r
is not
fixed.
It
92
c
IT n.! possible standard forms, E*,
j=l J
N is much larger than r because some of them must
should be obvious that all
cannot occur when
N!I
correspond to configurations which do not have equal variation in all
directions.
Furthermore, rank tests invariant only under linear trans-
formations which preserve the dimensionality,
r,
of the data act like
their parametric analogs in large samples -- they are certainly not
asymptotically nonparametric.
On the other hand, interest is now focused upon the small sample
properties of
~a(l)
because this procedure will be robust.
The fact
that the test is conditional is not a major drawback because the samples
are small.
An example of such a test when
from Rotelling's
T2.
c
=2
can be constructed
The permutation distribution of interest is
greatly simplified if we consider instead the monotonic function of
suggested by Wald and Wolfowitz [43].
(4.58)
where
=
7(i)
~
Because of (4.41), we have that
0
is the mean vector for the i-th sample as in (2.40).
(4.42) implies that the
~
matrix for the scores,
~,
is
Wald-Wolfowitz statistic (p.368, (7.3)) is
(4.59)
n l n 2 (N-l)
N
T2
(7
~
(1)
(2)
- ; - ) ' (7
-~
(1)
(2)
7)
~-~
I,
Then
and the
93
REFERENCES
[1]
ANDERSON, ToW. (1966).
Some nonparametric multivariate pro-
cedures based on statistically equivalent blocks.
Multivariate Analysis (Edited by P.R. Krishnaiah).
Academic Press, New York. 5-27.
[2]
ANDREWS, F.C. (1954).
Asymptotic behavior of some rank tests
for analysis of variance.
[3]
ANSARI, A.R. and BRADLEY, R.A. (1960).
dispersions.
[4]
-e
Ann. Math. Statist. 25 724-736.
Rank-sum tests for
Ann. Math. Statist. 31 1174-1189.
BHAPKAR, V.P. (1966).
Some nonparametric tests for the mu1ti-
variate several sample location problem.
Analysis (Edited by P.R. Krishnaiah).
Multivariate
Academic Press,
New York. 29-41.
[5]
BLUMEN, I. (1958).
A new bivariate sign test.
J. Amer.
Statist. Assoc. 53 448-456.
[6]
CAPON, J. (1961).
Asymptotic efficiency of certain locally
most powerful rank tests.
[7]
Ann. Math.
----
CHATTERJEE, S.K. and SEN, P.K. (1964).
Statist. 32 88-100.
Non-parametric tests
for the bivariate two-sample location problem.
Calcutta
Statist. Assoc. Bull. 13 18-58.
--
[8]
-
CHERNOFF, H. and SAVAGE, l.R. (1958).
Asymptotic normality
and efficiency of certain nonparametric test statistics.
Ann. Math. Statist. 29 972-994.
94
[9 ]
CHERNOFF, H. and TEICHER, H. (1958).
A central limit theorem
for sums of interchangeable random variables.
Ann. Math.
Statist. 29 118-130.
[10]
CRAM~R, H. (1946).
Mathematical Methods of Statistics.
Princeton University Press.
[11]
DWASS, M. (1953).
On the asymptotic normality of certain rank
order statistics.
[12]
GOWER, J .C. (1966).
Ann. Math. Statist. 24 303-306.
Some distance properties of latent root
and vector methods used in multivariate analysis.
Biometrika 53 325-338.
[13]
HODGES, J.L., Jr. (1955).
Ann. Math.
----
[14]
A bivariate sign test.
Statist. 26 523-527.
HOEFFDING, W. (1948).
A class of statistics with asymptot-
ica1ly normal distribution.
Ann. Math. Statist. 19
293-325.
[15]
JAMES, A.T. (1954).
Normal multivariate analysis and the
orthogonal group.
[16]
KENDALL, M.G. (1941).
Ann. Math. Statist. 25 40-75.
Relations connected with the tetrachoric
series and its generalization.
[17]
KENDALL, M.G. (1962).
Biometrika 32 196-198.
Rank Correlation Methods.
Charles Griffin, London.
[18]
KENDALL, M.G. and SUNDRUM, R.M. (1953).
methods and order properties.
Distribution-free
Rev. Internat. Statist.
Inst. 3 124-134.
[19]
KLOTZ, J. (1962).
Nonparametric tests for scale.
Statist. 33 498-512.
Ann. Math.
95
[20]
KRUSKAL, J.B. (1964).
Multidimensional scaling by optimizing
goodness of fit to a nonmetric hypothesis.
Psychometrika
~
1-27.
Nonmetric multidimensional
scaling: A numerical method.
[21]
Psychometrika
KRUSKAL, W.H. and WALLIS, W.A. (1952).
criterion analysis of variance.
~
115-129.
Use of ranks in one
J. Amer. Statist.
Assoc. 47 583-621.
[22]
LEHMANN, E.L. (1951).
Consistency and unbiasedness of certain
nonparametric tests.
[23]
LEHMANN, E.L. (1953).
Ann. Math.
----
[24]
[25]
Statist. 22 165-179.
The power of rank tests.
Statist. 24 23-43.
--
LEHMANN, E.L. (1963).
variance.
Ann. --Math.
--
Robust estimation in analysis of
Ann. Math.
----
LEHMANN, E.L. (1964).
Statist. 34 957-966.
Asymptotically nonparametric inference
in some linear models with one observation per cell.
Ann. Math. Statist. 35 726-734.
[26]
LEHMANN, E.L. and STEIN, C. (1949).
non-parametric hypotheses.
On the theory of some
Ann. Math. Statist. 20
28-45.
[27]
MANN, H.B. and WHITNEY, D.R. (1947).
On a test of whether one
of two random variables is stochastically larger than the
other.
[28]
Ann. Math.
----
MARDIA, K.V. (1967).
Statist. 18 50-60.
A non-parametric test for the bivariate
two-sample location problem.
B 29 320-342.
-e
~.
Roy. Statist. Soc. Sere
96
[29]
MATUS ITA , K. (1966).
A distance and related statistics in
multivariate analysis.
P.R. Krishnaiah).
[30]
MOOD, A.M. (1954).
Multivariate Analysis (Edited by
Academic Press, New York. 187-200.
On the asymptotic efficiency of certain
nonparametric two-sample tests.
Ann. Math. Statist.
25 514-522.
[31]
MORAN, P.A.P. (1948).
correlation.
[32]
Rank correlation and product moment
Biometrika 35 203-206.
MOSES, L.E. (1963).
Rank tests of dispersion.
Ann. Math. Statist. 34 973-983.
[33]
OBENCHAIN, R.L. (1969).
A linearly invariant canonical form
for multivariate data.
Series No. 606.
Institute of Statistics Mimeo
University of North Carolina,
Chapel Hill, N. C.
[34]
PURl, M.L. (1964).
Asymptotic efficiency of a class of
c-sample tests.
[35]
Ann. Math.
----
Statist. 35 102-121.
PURl, M.L. and SEN, P.K. (1966).
On a class of multivariate
multisample rank-order tests.
[36]
QUADE, D. (1966).
problem.
[37]
On analysis of variance for the k-sample
Ann. Math. Statist. 37 1747-1758.
RAO, C.R. (1965).
Applications.
[38]
Sankhya Sere A 28 353-376.
RAO, C.R. (1966).
Linear Statistical Inference and Its
Wiley, New York.
Generalized inverse for matrices and its
applications in mathematical statistics.
Research Papers
'J
in Statistics (Edited by F.N. David).
263-279.
Wiley, London.
97
[39 ]
SHEPARD, R.N. (1962).
The analysis of proximities:
Mu1ti-
dimensional scaling with an unknown distance function,
(
Parts I and II.
J
[40 ]
SHEPARD, R.N. (1966).
~.
[41]
Psychometrika
125-140 and 219-246.
Metric structures in ordinal data.
Math. Psycho1. 3 287-315.
SUKHATME, B.V. (1957).
On certain two-sample nonparametric
tests for variances.
[42]
~
SUKHATME, B.V. (1958).
Ann. Math. Statist. 28 188-194.
Testing the hypothesis that two
populations differ only in location.
Ann. Math.
Statist. 29 60-78.
[43]
WALD, A. and WOLFOWITZ, J. (1944).
Statistical tests based on
permutations of the observations.
Ann. Math. Statist. 15
358-372.
[44]
WILCOXON, F. (1945).
methods.
Individual comparisons by ranking
Biometrics Bull. 1 80-83.
© Copyright 2026 Paperzz