A LINEARLY INVARIANT CANONICAL FORM FOR MULTIVARIATE DATA
by
ROBERT L. OBENCHAIN
Department of Statistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 606
JANUARY 1969
This research was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health, Grant No. GM 12868-05.
TABLE OF CONTENTS

ABSTRACT
NOTATION
1. INTRODUCTION
2. CONSTRUCTION OF THE REPRESENTATION
   2.1. Invariance with respect to Translation
   2.2. Invariance with respect to Nonsingular Linear Transformation
   2.3. Individual Analysis of P
        Table 2.1; Figure 2.1 and Figure 2.2
3. GENERAL PROPERTIES OF THE REPRESENTATION
        Figure 3.1 and Figure 3.2
4. SOME EXACT DISTRIBUTIONS FOR NORMAL POPULATIONS
   4.1. A Single Multivariate Normal Population
        Figure 4.1 and Figure 4.2
   4.2. Comments on Several Multivariate Normal Populations
   4.3. Two Normal Populations Differing in Location
   4.4. Two Normal Populations Differing in Dispersion
5. SELECTION AND EVALUATION OF VARIABLES
6. SUMMARY AND USES
REFERENCES
ABSTRACT
Suppose that data on p ≥ 1 variables, in the form of random samples from c ≥ 1 populations, is to be analyzed using statistical procedures invariant with respect to translation and nonsingular linear transformation. Then it is shown that the only information of use from the data is that contained in its canonical form. This representation of the data is constructed using the work of Gower [1] and has many interesting properties, implied by assumptions not usually realized, when viewed as a scatter of points in Euclidean space of p or fewer dimensions. The exact distribution of the representation for a single Multivariate Normal population follows from the work of James [2]. Possible uses of the canonical form are considered.
NOTATION

x is a p×1 column vector, and x' = (x_1, ..., x_p) is the corresponding row vector. 1 is the p×1 column vector of all unities, and 1_p 1_q' is a p×q matrix of all unities. I_p is the identity matrix of order p. ((x_ij)) is a p×q matrix with x_ij as the element in the i-th row and j-th column. A ⊗ B denotes the Kronecker or Right Direct Product of the matrices A and B, and etr(A) denotes exp(tr A).

Q(n) is the negative of the n×(n−1) semiorthogonal matrix associated with the Helmert Transformation:

$$ Q(n) = \begin{pmatrix}
-1/\sqrt{1\cdot 2} & -1/\sqrt{2\cdot 3} & \cdots & -1/\sqrt{(n-1)n} \\
1/\sqrt{1\cdot 2} & -1/\sqrt{2\cdot 3} & \cdots & -1/\sqrt{(n-1)n} \\
0 & 2/\sqrt{2\cdot 3} & \cdots & -1/\sqrt{(n-1)n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & (n-1)/\sqrt{(n-1)n}
\end{pmatrix}. $$

Then

$$ 1'_n\,Q(n) = 0'_{n-1}, \qquad Q(n)'Q(n) = I_{n-1}, \qquad Q(n)Q(n)' = I_n - \tfrac{1}{n}\,1_n 1'_n. $$
1. INTRODUCTION

Suppose p characteristics (X_1, ..., X_p) are to be measured on individuals from c populations. Suppose n_i individuals are selected at random from the i-th population, and let N = Σ_{i=1}^c n_i. For simplicity of notation, let the individuals be numbered 1, 2, ..., N, and for j = 1, 2, ..., N let X_j = (X_{1j}, X_{2j}, ..., X_{pj})' denote the random vector to be observed on the j-th individual. Write

$$ X_{p \times N} = ( X_1 \;\; X_2 \;\; \cdots \;\; X_N ) \tag{1.1} $$

so that the first n_1 columns are independent and identically distributed (= i.i.d.) random vectors from the first population, columns n_1+1 through n_1+n_2 are i.i.d. from the second population, etc. Thus, if c > 1, only certain of the columns of X are interchangeable. A row of X contains a particular variable to be measured on all N individuals.

We have already imposed a structure on the columns of X. Let us now assume a model for the rows of X. In this paper, we will be concerned with situations in which it makes sense to the experimenter to consider linear combinations of the variables. Although this assumption is not always reasonable, this mechanism for combining variables is often assumed in standard techniques of multivariate analysis. In particular, let us suppose that the information we intend to collect by observing X is only that information which is invariant with respect to all transformations of the form

$$ X \to X + a\,1'_N \tag{1.2} $$

and

$$ X \to B\,X \tag{1.3} $$

where a is an arbitrary p×1 vector and B is an arbitrary, nonsingular p×p matrix. Note that transformations of this type do not combine data for different individuals (columns) or populations; this is obvious when a and B are not functions of X.
It will be shown that, under the above assumptions, the data X can be replaced by its canonical form

$$ W_{r \times N}, \tag{1.4} $$

a random matrix with the following properties:

(i) W is a function of X only and can be uniquely determined.

(ii) r is an integer valued random variable taking values between zero and min(p, N−1) inclusive.

(iii) W is maximal invariant under the group of transformations (1.2) and (1.3) on X.

(iv) W is distributed on a subset of the Grassmann manifold G_{r,N−r} homeomorphic to G_{r,N−r−1}.

(v) w_j, the j-th column of W, gives the coordinates of a point, to be plotted using r orthogonal axes in Euclidean space, which represents the j-th individual.
The canonical form will be derived from a somewhat pedagogical point of view so that the properties of W will be systematically revealed. The experienced user of multivariate techniques may find that the considerations of Lemma 3.3 are a more obvious starting point.

Hotelling's T² for detecting difference in location of two populations is one example of a statistical procedure applicable to data which conforms to our model. Indeed, the further assumption that X has a Multivariate Normal distribution with dispersion I_N ⊗ Σ could be made so that the significance of T² could be judged with respect to the central F distribution. On the other hand, the permutation distribution of T² can be constructed and a conditional test can be performed without making further assumptions. In this paper, we wish to study only the implications of our basic assumptions about translation invariance (1.2) and linear transformation invariance (1.3).
2. CONSTRUCTION OF THE REPRESENTATION

2.1. Invariance with respect to Translation.

Write

$$ Y = X \left( I_N - \tfrac{1}{N}\,1\,1' \right) = X - \bar{X}\,1'_N \tag{2.1} $$

where X̄ = (1/N) Σ_{j=1}^N X_j, and note that Y is invariant under transformations of the form (1.2) on X. Note further that this invariance could be achieved by writing Y = X T for any matrix T such that 1'T = 0'. The choice specified in (2.1) recognizes the fact that T combines the columns (individuals) of X and yet must retain the interpretation that the columns of Y represent the corresponding columns of X. Note that I_N − (1/N)11' is symmetric, idempotent, and of rank N−1; it is the projection matrix for the space of dimension N−1 orthogonal to the N×1 constant vector, 1.

If c > 1, it might be of interest to consider

$$ T = I_N - \operatorname{diag}\!\left( \tfrac{1}{n_1}\,1\,1', \; \ldots, \; \tfrac{1}{n_c}\,1\,1' \right), \tag{2.2} $$

a projection matrix of rank N−c. In this way, any possible differences in location among the c populations can be ignored. The choice for T specified in (2.2) will not, however, be considered in the sequel because it does not lead to a maximal invariant statistic under the group of transformations (1.2).
2.2. Invariance with respect to Nonsingular Linear Transformation.

Write

$$ P = Y' \left( Y Y' \right)^{-} Y \tag{2.3} $$

where (YY')⁻ denotes any generalized inverse (= g-inverse) for the p×p matrix YY'. It is well known that P is symmetric, idempotent, and uniquely determined regardless of the choice of the g-inverse involved in its definition; see, for example, Lemma 5 of Rao [4]. P is the projection matrix associated with the vector space generated by the rows of Y. It follows that P is invariant with respect to transformations (1.3) on X and, therefore, P is invariant with respect to transformations (1.2) and (1.3) on X. The proof that P is maximal invariant must be delayed until the end of the next section.

Note that

$$ Y Y' = \sum_{j=1}^{N} (X_j - \bar{X})(X_j - \bar{X})' = S, \text{ say.} \tag{2.4} $$

If we are considering c > 1 populations, we have S = S(w) + S(a) where

$$ S(w) = \sum_{i=1}^{c} S(i) = \sum_{i=1}^{c} \sum_{j \in \Pi_i} \left( X_j - \bar{X}(i) \right)\left( X_j - \bar{X}(i) \right)' \tag{2.5} $$

is the within populations matrix of sample sums of squares and products, Π_i denotes the individuals in the sample from the i-th population,

$$ \bar{X}(i) = \frac{1}{n_i} \sum_{j \in \Pi_i} X_j \tag{2.6} $$

is the sample mean vector from the i-th population, and

$$ S(a) = \sum_{i=1}^{c} n_i \left( \bar{X}(i) - \bar{X} \right)\left( \bar{X}(i) - \bar{X} \right)' \tag{2.7} $$

is the among populations matrix of sample sums of squares and products.
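The decomposition S = S(w) + S(a) is easy to verify numerically. The following minimal numpy sketch (an illustrative addition, not part of the original mimeo, using hypothetical grouped data) checks (2.4) through (2.7):

```python
import numpy as np

rng = np.random.default_rng(6)
# Two hypothetical groups of sizes 4 and 5 on p = 3 variables.
groups = [rng.standard_normal((3, n)) + mu for n, mu in ((4, 0.0), (5, 2.0))]
X = np.concatenate(groups, axis=1)
Y = X - X.mean(axis=1, keepdims=True)
S = Y @ Y.T                                               # (2.4)
Sw = sum((g - g.mean(axis=1, keepdims=True))
         @ (g - g.mean(axis=1, keepdims=True)).T
         for g in groups)                                 # (2.5)
xbar = X.mean(axis=1, keepdims=True)
Sa = sum(g.shape[1] * (g.mean(axis=1, keepdims=True) - xbar)
         @ (g.mean(axis=1, keepdims=True) - xbar).T
         for g in groups)                                 # (2.7)
assert np.allclose(S, Sw + Sa)
```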
2.3. Individual Analysis of P.

Gower [1] has discussed characteristic root (= c-root) and vector techniques in which a geometrical interpretation is given to the information contained in N×N association matrices. These are techniques in "Individual Analysis" because the element in the ℓ-th row and m-th column of an association matrix compares the ℓ-th and m-th individuals in the sample taking all p variables measured into consideration. Techniques which analyze p×p matrices such as S are more common in multivariate analysis and proceed from the point of view of "Collective Analysis".

We will apply Gower's technique to the matrix P given by (2.3). Let

r = rank P = rank (X − X̄ 1') ≤ min(p, N−1).

If r = 0, there is some indication that the data is non-stochastic. Since P is idempotent, it has r c-roots of +1 and N−r c-roots of 0. Thus the work of Gower implies that the data should be represented using exactly r orthonormal c-vectors of P associated with c-roots of +1. If r = 1, the c-vector associated with the single c-root of +1 is determined uniquely up to a constant multiple which can change its length and the signs of all its elements. Since the non-zero c-roots of P are all equal, there is no automatically unique way to pick the r > 1 orthogonal c-vectors of interest. To overcome difficulties in definition, we take W (r×N) to be the almost everywhere unique matrix with the following four properties:

$$ \text{(i)} \quad W'W = P \tag{2.8} $$

$$ \text{(ii)} \quad W W' = I_r \tag{2.9} $$

$$ \text{(iii)} \quad W 1 = 0 \tag{2.10} $$

$$ \text{(iv)} \quad W = \left( W(1) \;\; W(2) \right) \tag{2.11} $$

where W(1) is r×(N−r) and W(2) is an r×r matrix with all elements on the secondary diagonal (lower left to upper right) greater than zero and all elements below the secondary diagonal equal to zero.

Equation (2.8) implies that the rows of W are c-vectors of P associated with c-roots of +1. Equation (2.9) implies that the rows of W are orthonormal. Equation (2.10) states that the elements in each row of W sum to zero. This property of W follows from the fact that Y 1 = 0, as is obvious from equation (2.1). Then equation (2.3) implies that P 1 = 0, so that 1 is a c-vector of P associated with a c-root of zero. Thus (2.10) follows because c-vectors corresponding to unequal c-roots are orthogonal. Equation (2.11) orients the representation in a unique way.

In the terminology of James [2], (2.9) implies that W is an "r-frame" in the Stiefel manifold, V_{r,N}, with the special property (2.10). However, (2.11) makes it possible to interpret W as an element of the Grassmann manifold, G_{r,N−r}, with the special property (2.10). Actually, James [2] interprets the elements of G_{r,N−r} as the r-dimensional planes (passing through the origin) in Euclidean N-space; we are considering the dual interpretation that certain elements of G_{r,N−r} are very special scatters of N points in Euclidean r-space.

In geometrical terms, equation (2.11) states that the right-handed system of r orthogonal axes should be "rotated" (an orthogonal transformation with determinant +1 or −1) in r-space until

(1) the individual numbered N (the last column of W) lies on the positive half of the axis numbered one,

(2) the individual numbered N−1 lies in the plane spanned by the axes numbered one and two and has its coordinate with respect to the axis numbered two greater than zero,

(k) the individual numbered N − k + 1 lies in the k-plane spanned by the axes one through k and has its coordinate with respect to the k-th axis greater than zero, and so forth while k ≤ r.
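To make the construction concrete, here is a minimal numpy sketch (an illustrative addition, not part of the original mimeo) that computes r and W from X. The SVD stands in for the explicit c-root extraction of the text, and the QR-based rotation is one way to realize the orientation (2.11):

```python
import numpy as np

def canonical_form(X, tol=1e-10):
    """Sketch: the linearly invariant canonical form W of a p x N matrix X."""
    p, N = X.shape
    Y = X - X.mean(axis=1, keepdims=True)       # (2.1): remove row means
    # The leading right singular vectors of Y are orthonormal c-vectors of
    # P = Y'(YY')^- Y with c-roots +1, so their matrix satisfies (2.8)-(2.10).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    r = int((s > tol * max(s.max(), 1.0)).sum())
    W = Vt[:r]
    if r > 0:
        # Orient as in (2.11): QR-decompose the trailing columns
        # (w_N, w_{N-1}, ..., w_{N-r+1}); rotating by Q' puts individual N
        # on positive axis one, individual N-1 in the first two axes, etc.
        M = W[:, ::-1][:, :r]
        Q, R = np.linalg.qr(M)
        sgn = np.where(np.diag(R) < 0.0, -1.0, 1.0)
        W = (Q * sgn).T @ W
    return r, W

# Example: r, W = canonical_form(np.random.randn(2, 5)); afterwards
# W @ W.T is I_r, per (2.9), and W @ np.ones(5) is 0, per (2.10).
```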
The set of all possible transformations (1.2) and (1.3) can be termed the general nonsingular linear transformation group, L(p). An element of L(p) will be denoted by (a, B) and transforms the matrix X into the matrix a1' + BX. The "product" of (a(1), B(1)) and (a(2), B(2)) is defined to be (a(1) + B(1)a(2), B(1)B(2)); (0, I_p) is the identity transformation, and (−B⁻¹a, B⁻¹) is the transformation inverse to (a, B). We are now in a position to prove the following:

THEOREM 2.1. The statistic P, defined as a function of X by (2.1) and (2.3), is maximal invariant under L(p). W, or any other equivalent of P, also has this property.
Proof. We can state that P is maximal invariant if P is invariant under L(p) and, whenever P(X(1)) = P(X(2)), there exists an element of L(p) which transforms X(1) into X(2). Since we have already shown that P is invariant under L(p), suppose that P(X(1)) = P(X(2)) and write r = rank P.

If r = p, let

$$ Z(i) = (S(i))^{-1/2} \left( X(i) - \bar{X}(i)\,1' \right), \qquad i = 1, 2, \tag{2.12} $$

where S(i) denotes the matrix (2.4) computed from X(i) and (S(i))^{-1/2} is any (necessarily nonsingular) matrix such that ((S(i))^{-1/2})'(S(i))^{-1/2} = (S(i))^{-1}. Then Z(i)'Z(i) = P(X(i)) and Z(i)Z(i)' = I_p, so P(X(1)) = P(X(2)) implies that Z(2) = Γ Z(1) for some p×p orthogonal matrix Γ. The element (a*, B*) of L(p) which transforms X(1) into X(2) is then given by

$$ B^* = \left( (S(2))^{-1/2} \right)^{-1} \Gamma \, (S(1))^{-1/2} \tag{2.13} $$

and

$$ a^* = \bar{X}(2) - B^* \bar{X}(1). \tag{2.14} $$

If r = rank P < p, however, the proof is somewhat complicated by the fact that (S(i))^{-1/2} is not uniquely determined. Since rank (X(i) − X̄(i)1') = r, there exist nonsingular p×p matrices B(i) such that

$$ B(i) \left( X(i) - \bar{X}(i)\,1' \right) = \begin{pmatrix} X^*(i) \\ 0 \end{pmatrix}, \qquad i = 1, 2, \tag{2.15} $$

where X*(i) is r×N with rank X*(i) = r and the null block is (p−r)×N. Now there exists (a*, B*) ∈ L(r) which transforms X*(1) into X*(2), as was shown above when r = p. It follows that (a**, B**) ∈ L(p) transforms X(1) into X(2), where

$$ B^{**} = (B(2))^{-1} \begin{pmatrix} B^* & 0 \\ 0 & C \end{pmatrix} B(1) \tag{2.16} $$

and

$$ a^{**} = \bar{X}(2) - B^{**} \bar{X}(1), \tag{2.17} $$

where C is any (p−r)×(p−r) nonsingular matrix. □

The calculation of the linearly invariant canonical form, W, will now be illustrated by a numerical example using artificial data with p = r = 2 and N = 5. The results of the calculations are presented in Table 2.1. W* contains a pair of orthonormal c-vectors of P associated with c-roots of +1, calculated using the familiar "power method". Since the matrix W* does not have the property (2.11), "rotation" was required to get W. The raw data is plotted in Figure 2.1, and the representation is displayed in Figure 2.2. Note that the linear relationship between individuals 2, 3, and 5 is preserved in the representation. The matrix D² displayed in Table 2.1 contains the squared distances between the five individuals in the representation. Note that, as was not obvious in the raw data, our model implies that individual 5 is evidently closer to or more like individual 1 than individual 2 is like individual 4.

TABLE 2.1: Numerical Example for N = 5 and p = r = 2. The table presents the raw data X (2×5), the projection matrix P (5×5), the unoriented c-vectors W* (2×5), and the canonical form W (2×5), together with

$$ \bar{X} = \begin{pmatrix} 0.4 \\ 0.2 \end{pmatrix}, \qquad S = \begin{pmatrix} 26.8 & 5.6 \\ 5.6 & 7.2 \end{pmatrix}, \qquad D^2 = \begin{pmatrix} 0 & 0.569 & 0.401 & 1.931 & 0.933 \\ 0.569 & 0 & 0.822 & 1.114 & 1.849 \\ 0.401 & 0.822 & 0 & 0.846 & 0.205 \\ 1.931 & 1.114 & 0.846 & 0 & 1.329 \\ 0.933 & 1.849 & 0.205 & 1.329 & 0 \end{pmatrix}. $$

[FIGURE 2.1. The Raw Data, X (2×5).]

[FIGURE 2.2. The Representation, W (2×5), shown within a circle of radius 1.]
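The strict invariance asserted by Theorem 2.1 can also be exercised numerically. This sketch (an illustrative addition, reusing the hypothetical canonical_form helper given in Section 2.3) checks that W is unchanged by an arbitrary element (a, B) of L(p):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((2, 6))               # hypothetical 2 x 6 data
a = np.array([[5.0], [-2.0]])                 # arbitrary translation a
B = np.array([[2.0, 0.0], [1.5, 0.5]])        # arbitrary nonsingular B
r1, W1 = canonical_form(X)
r2, W2 = canonical_form(a + B @ X)            # (a, B) acting as in (1.2)-(1.3)
assert r1 == r2 and np.allclose(W1, W2)       # W is strictly invariant
```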
3. GENERAL PROPERTIES OF THE REPRESENTATION

The following lemma is useful in the stepwise construction of projection matrices. Let x_ℓ' denote the ℓ-th row of X, and write

$$ i_k = (i_1, \ldots, i_k) \tag{3.1} $$

where k ≤ p and i_k consists of the first k elements of (π_1, ..., π_p), a permutation of the elements of (1, 2, ..., p); thus i_k is an ordered subset of k of the integers from 1 to p. Write X(i_k) for the k×N matrix whose rows are the rows of X numbered i_1, ..., i_k, let P(i_k) denote the N×N projection matrix calculated from X(i_k) as in (2.1) and (2.3), and define P(i_0) = 0 where i_0 denotes the empty set. Then we have

Lemma 3.1.

$$ P(i_k) = P(j_k) \quad \text{whenever } j_k \text{ is a reordering of } i_k, \tag{3.2} $$

$$ P(i_k) = (I_N - \tfrac{1}{N}11')\,X(i_k)' \left[ X(i_k)(I_N - \tfrac{1}{N}11')X(i_k)' \right]^{-} X(i_k)\,(I_N - \tfrac{1}{N}11'), \tag{3.3} $$

$$ P(i_k) = P(i_{k-1}) + w^*(i_k)\,w^*(i_k)', \tag{3.4} $$

where

$$ v(i_k) = \left( I_N - P(i_{k-1}) \right)\left( I_N - \tfrac{1}{N}11' \right) x_{i_k} \tag{3.5} $$

is the component of the i_k-th row of Y = X(I_N − (1/N)11') which is orthogonal to the vectors in the vector space generated by the rows of Y numbered i_1, ..., i_{k−1}, and

$$ w^*(i_k) = \begin{cases} v(i_k)\,/\,\sqrt{v(i_k)'v(i_k)} & \text{if } v(i_k) \neq 0 \\ 0 & \text{otherwise.} \end{cases} \tag{3.6} $$

Finally, w*(i_k) is a c-vector of P(i_k) associated with a c-root of +1 unless v(i_k) = 0.

Proof. Relationship (3.2) follows from the uniqueness of projection matrices and their invariance under transformations (1.3) on X(i_k). The non-unique generalized inverse, "−", in equation (3.3) can be replaced by the uniquely determined Moore-Penrose inverse, "+", discussed in Rao [4]. The formula (3.4) then follows by noting that the Moore-Penrose inverse of the bordered matrix

$$ \begin{pmatrix} Q & d \\ d' & f \end{pmatrix} $$

takes a simple partitioned form when g = f − d'Q⁺d ≠ 0 and when g = 0 with d'(I − Q⁺Q) = 0'. In our case g = v(i_k)'v(i_k), so it is clear that P(i_k) = P(i_{k−1}) when g = 0. Following (3.5), we have given the geometrical interpretation of the results of the calculation (3.4). □
Lemma 3.2. The sample mean of the projections of the representation on any direction in r-space is zero. The sample sum of squares of the projections on any direction is one. The sample covariance between the projections on any two orthogonal directions is zero.

Proof. Let e be any r×1 vector such that e'e = 1. Then e'W is the 1×N row vector of projections of the representation on some direction. Then e'W1 = 0 by relation (2.10), and e'WW'e = e'e = 1 by relation (2.9). Finally, e and e* represent orthogonal directions if and only if e'e = e*'e* = 1 and e'e* = 0; then e'WW'e* = e'e* = 0. □
In the following four lemmas, we will study the representation in terms of the distances between its points. Let d²_{ℓm} denote the squared distance between the representations of the ℓ-th and m-th individuals. Then we have

Lemma 3.3.

$$ d^2_{\ell m} = p_{\ell\ell} + p_{mm} - p_{m\ell} - p_{\ell m} \tag{3.7} $$

$$ = (X_m - X_\ell)'\,S^{-}\,(X_m - X_\ell) \tag{3.8} $$

where 0 ≤ d²_{ℓm} ≤ 2 and S is given by (2.4).

Proof. The expression (3.7) follows immediately from (2.8), as pointed out by Gower [1]. Then (3.8) follows by making use of the fact that

$$ p_{\ell m} = (X_\ell - \bar{X})'\,S^{-}\,(X_m - \bar{X}). \tag{3.9} $$

Indeed, a particular choice for S⁻ may not be symmetric although S is symmetric, but P is symmetric and is uniquely determined. Now I_N − P is symmetric, idempotent, of rank N−r, and orthogonal to the row space of Y. Thus the squared distance between the ℓ-th and m-th individuals computed in the space orthogonal to the row space of Y is 2 − d²_{ℓm} ≥ 0. □
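The agreement of (3.7) and (3.8) is easy to verify numerically. A sketch with hypothetical data (an illustrative addition, with the Moore-Penrose inverse standing in for S⁻):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))                    # hypothetical 2 x 5 data
Y = X - X.mean(axis=1, keepdims=True)
S = Y @ Y.T                                        # (2.4)
P = Y.T @ np.linalg.pinv(S) @ Y                    # (2.3), g-inverse = pinv
d = np.diag(P)
D2_proj = d[:, None] + d[None, :] - 2.0 * P        # (3.7)
diff = Y[:, :, None] - Y[:, None, :]               # columns X_l - X_m
D2_maha = np.einsum('ilm,ij,jlm->lm',
                    diff, np.linalg.pinv(S), diff)  # (3.8)
assert np.allclose(D2_proj, D2_maha)
assert D2_proj.max() <= 2.0 + 1e-12                # bound from Lemma 3.3
```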
Before proceeding to the next lemma, it should be noted that the specific measure of distance (3.8) was considered by Gower [1] in §4.2. Gower states that use of this measure of distance "reflects the attitude that the group of individuals is homogeneous", essentially because of the properties of the representation that we demonstrated in Lemma 3.2. There is, however, reason to question this assertion. First of all, one notes that the distances between points will almost surely be greater than zero and relatively different. Suppose two characteristics are measured on a population such that the bivariate density of the characteristics has unusual equal probability density contours. These contours would be circles or ellipses for the Bivariate Normal distribution, so suppose the contours are squares or crescents or are not even closed or connected. Then data from this population and its corresponding representation would display the same behavior in probability. For example, the artificial raw data of Figure 3.1 has the canonical form shown in Figure 3.2, in which the original circles become ellipses. If actual data were to split like this into two clusters, there would be empirical evidence for arguing that the population consists of two types of individuals (e.g. men and women) so that the characteristics are distributed as a mixture of two distinct distributions. The linearly invariant canonical form can thus display considerable heterogeneity among the individuals in the sample.
[FIGURE 3.1. The Artificial Raw Data, X (2×15).]

[FIGURE 3.2. The Representation, W (2×15).]

Now let d²_{mo} denote the squared distance between the representation of the m-th point and the origin. Then we have

Lemma 3.4.

$$ d^2_{mo} = p_{mm} \tag{3.10} $$

where d²_{mo} ≤ (N−1)/N and Σ_{m=1}^N d²_{mo} = r ≤ N−1.
Proof. The result (3.10) follows immediately from (2.8) as before: d²_{mo} = p_{mm} + 0 − 2(0). To give another interpretation of this result, form the p×(N+1) matrix Y* = (Y ⋮ 0) and note that the corresponding projection matrix is

$$ P^* = \begin{pmatrix} P & 0 \\ 0' & 0 \end{pmatrix}. $$

Thus d²_{mo} is the squared distance between the representation of the m-th individual and the representation of an individual observed to equal X̄. Now I_N − (1/N)11' − P is symmetric, idempotent, of rank N−r−1, orthogonal to P, and orthogonal to 1. The squared distance between the m-th point and the origin computed in this subspace is thus (N−1)/N − d²_{mo} ≥ 0, using the argument of Lemma 3.3. Finally, Σ_{m=1}^N d²_{mo} = Σ_{m=1}^N p_{mm} = tr P = rank P = r. □

Lemma 3.5. If r = N−1, then P = I_N − (1/N)11' and, in the orientation required by (2.11),

$$ W_{(N-1) \times N} = \begin{pmatrix}
-1/\sqrt{(N-1)N} & -1/\sqrt{(N-1)N} & \cdots & -1/\sqrt{(N-1)N} & (N-1)/\sqrt{(N-1)N} \\
-1/\sqrt{(N-2)(N-1)} & -1/\sqrt{(N-2)(N-1)} & \cdots & (N-2)/\sqrt{(N-2)(N-1)} & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
-1/\sqrt{1\cdot 2} & +1/\sqrt{1\cdot 2} & \cdots & 0 & 0
\end{pmatrix}, \tag{3.11} $$

that is, W is Q(N)' with its rows taken in reverse order. The representation is of N equally spaced points in N−1 dimensions; each point is of distance √2 from every other point and of distance √((N−1)/N) from the origin.

Proof. P = P' = P², P1 = 0, and rank P = N−1 imply that P is the projection matrix for the (N−1)-dimensional space of N-vectors orthogonal to the constant vector 1. Thus P = I_N − (1/N)11'. On the other hand, it is obvious by matrix multiplication that the matrix displayed in (3.11) satisfies (2.8) through (2.11), where Q(N) is our notation for the semiorthogonal matrix associated with the Helmert transformation. Then w_ℓ'w_m = p_{ℓm} = δ_{ℓm} − 1/N, where δ_{ℓm} denotes the Kronecker Delta. Therefore (w_ℓ − w_m)'(w_ℓ − w_m) = 2 for ℓ ≠ m and w_ℓ'w_ℓ = (N−1)/N. □
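A short numpy sketch (an illustrative addition) of the Helmert matrix Q(n) from the NOTATION section and of the equal-spacing claims of Lemma 3.5; the row-reversed Q(N)' below satisfies (2.8) through (2.11):

```python
import numpy as np

def helmert_Q(n):
    """The negative semiorthogonal Helmert matrix Q(n), n x (n-1)."""
    Q = np.zeros((n, n - 1))
    for j in range(1, n):
        c = 1.0 / np.sqrt(j * (j + 1))
        Q[:j, j - 1] = -c            # -1/sqrt(j(j+1)) in the first j rows
        Q[j, j - 1] = j * c          # j/sqrt(j(j+1)) on the step
    return Q

N = 6
W = helmert_Q(N).T[::-1]             # rows reversed, as displayed in (3.11)
assert np.allclose(W @ W.T, np.eye(N - 1))               # (2.9)
assert np.allclose(W @ np.ones(N), 0.0)                  # (2.10)
assert np.allclose(W.T @ W, np.eye(N) - 1.0 / N)         # (2.8): P
D2 = np.linalg.norm(W[:, :, None] - W[:, None, :], axis=0) ** 2
assert np.allclose(D2[~np.eye(N, dtype=bool)], 2.0)      # equal spacing
assert np.allclose((W ** 2).sum(axis=0), (N - 1) / N)    # distance to origin
```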
In the next lemma, we will investigate the properties of average squared distance in the representation with special attention to the case when c > 1 populations are under consideration. Let

$$ \bar{d}^2(i, j) = \frac{1}{n_i n_j} \sum_{\ell \in \Pi_i} \sum_{m \in \Pi_j} d^2_{\ell m} \;\; \text{if } i < j, \qquad \bar{d}^2(i, i) = \frac{2}{n_i (n_i - 1)} \sum_{\substack{\ell < m \\ \ell,\, m \in \Pi_i}} d^2_{\ell m}, \tag{3.12} $$

where Π_i is as used in (2.5). Then we have

Lemma 3.6. For i < j,

$$ \bar{d}^2(i, j) = \operatorname{tr}\left\{ S^{-} \left[ \tfrac{1}{n_i} S(i) + \tfrac{1}{n_j} S(j) + \left( \bar{X}(i) - \bar{X}(j) \right)\left( \bar{X}(i) - \bar{X}(j) \right)' \right] \right\} \tag{3.13} $$

and, for each i,

$$ \bar{d}^2(i, i) = \frac{2}{n_i - 1} \operatorname{tr}\left\{ S^{-} S(i) \right\}, \tag{3.14} $$

where S(i) and X̄(i) are as in (2.5) and (2.6). Furthermore,

$$ \sum_{i=1}^{c} \frac{n_i(n_i - 1)}{2}\, \bar{d}^2(i, i) + \sum_{i < j} n_i n_j\, \bar{d}^2(i, j) = Nr. \tag{3.15} $$

Finally,

$$ \bar{d}^2 = \frac{2}{N(N-1)} \sum_{\ell < m} d^2_{\ell m} = \frac{2r}{N-1} \tag{3.16} $$

is non-stochastic except for its dependence upon r.

Proof. Write d²_{ℓm} = p_{ℓℓ} + p_{mm} − 2p_{ℓm} from Lemma 3.3, and let d²_o = (d²_{1o}, ..., d²_{No})' = diag(P) as in Lemma 3.4. Then diag(D²) = 0' expresses the fact that each individual is of distance zero from itself, and 1'd²_o = trace P = r. Then 1'D²1 = 2Nr shows result (3.16), while the other results of the lemma follow from direct algebraic manipulation. □
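The identity (3.16) can be checked on simulated data (an illustrative sketch, reusing the hypothetical canonical_form helper from Section 2.3):

```python
import numpy as np

rng = np.random.default_rng(2)
r, W = canonical_form(rng.standard_normal((3, 8)))    # hypothetical p=3, N=8
D2 = np.linalg.norm(W[:, :, None] - W[:, None, :], axis=0) ** 2
assert np.isclose(np.triu(D2, 1).sum(), 8 * r)        # sum_{l<m} d2_lm = N r
assert np.isclose(D2.sum() / (8 * 7), 2 * r / 7)      # (3.16): 2r/(N-1)
```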
The many properties of the representation we have considered imply restrictions. Thus we have

Lemma 3.7. W has rN elements but only r(N−r−1) functionally independent variables; the distribution of W is highly singular.

Proof. If X is considered to contain pN variates which are not functionally dependent, then (p−r)N "independent" variates are lost by restricting attention to linearly combinable variables. This loss will usually be zero because only linear redundancies are detected and these usually occur with probability zero. Next, r constraints are added in achieving translation invariance (1.2) and imply property (2.10), while r(r+1)/2 constraints are added in achieving invariance with respect to nonsingular linear transformations (1.3) and imply property (2.9). Finally, r(r−1)/2 constraints are added to make the choice of W unique and imply the property (2.11). Thus rN − r − r(r+1)/2 − r(r−1)/2 = r(N−r−1) functionally independent variables remain. □
4. SOME EXACT DISTRIBUTIONS FOR NORMAL POPULATIONS

4.1. A Single Multivariate Normal Population.

James [2] shows how locally defined exterior differential forms can be used to derive sampling distributions associated with a Multivariate Normal population. His result on the decomposition of a random sample will be reinterpreted here.

Take c = 1, N = n_1, and suppose that

$$ X \sim N_{pN}\left( \mu\,1'_N, \; I_N \otimes \Sigma \right) \tag{4.1} $$

where Σ (p×p) is positive definite. Then Pr(r = p) = 1 if N > p, and we have r = p almost surely. Now (4.1) implies

$$ dF(X) = \frac{|\Sigma|^{-N/2}}{(2\pi)^{pN/2}} \operatorname{etr}\left\{ -\tfrac{1}{2}\,\Sigma^{-1}(X - \mu 1')(X - \mu 1')' \right\} \prod_{\substack{1 \le i \le p \\ 1 \le j \le N}} dX_{ij}. \tag{4.2} $$

We now consider the Helmert Transformation to get

$$ \tilde{X}_{p \times (N-1)} = X\,Q(N). \tag{4.3} $$

We now transform X̃ as in James [2], page 69:

$$ \tilde{X} = C\,L\,Z \tag{4.4} $$

where L² is the diagonal matrix of c-roots of X̃X̃', C is the orthogonal matrix of c-vectors, and Z = L⁻¹C'X̃. Then, because Y = X(I_N − (1/N)11') = X̃Q(N)',

$$ P = Y'(YY')^{-}Y = Q(N)\,Z'Z\,Q(N)' \tag{4.5} $$

does not depend upon C or L, while

$$ Z'Z = Q(N)'\,P\,Q(N) \tag{4.6} $$

does not depend on C or L either. Now we can write

$$ Z = \Gamma\,H \tag{4.7} $$

where the p×p orthogonal matrix Γ is chosen so that H (p×(N−1)) has property (2.11) with r = p and N replaced by N−1. Note that H does not have property (2.10). Finally we have

$$ W = H\,Q(N)'. \tag{4.8} $$

We can now restate Theorem 8.1 of James [2] as summarized in equation (8.22) of [2].

THEOREM 4.1. Let X (p×N) have the probability element (4.2), and let W be the Canonical Form of X. Let B ((N−p−1)×N) be chosen so that the N×N matrix

$$ \begin{pmatrix} W \\ B \\ \tfrac{1}{\sqrt{N}}\,1' \end{pmatrix} $$

is orthogonal. Then the distribution of X can be decomposed into four independent distributions:

$$ dF(X) = dF(\bar{X})\;dF(S)\;dF(W)\;dF(\Gamma) \tag{4.9} $$
where

(1) X̄ ~ N_p(μ, Σ/N), so that

$$ dF(\bar{X}) = \frac{N^{p/2}\,|\Sigma|^{-1/2}}{(2\pi)^{p/2}} \exp\left\{ -\tfrac{N}{2} (\bar{X} - \mu)' \Sigma^{-1} (\bar{X} - \mu) \right\} \prod_{i=1}^{p} d\bar{X}_i; \tag{4.10} $$

(2) S ~ W_p(N−1, Σ), so that

$$ dF(S) = \frac{|S|^{(N-p-2)/2}}{2^{p(N-1)/2}\,\pi^{p(p-1)/4}\,|\Sigma|^{(N-1)/2} \prod_{i=1}^{p} \Gamma\!\left(\tfrac{N-i}{2}\right)} \operatorname{etr}\left\{ -\tfrac{1}{2}\,\Sigma^{-1} S \right\} \prod_{i \le j} dS_{ij} \tag{4.11} $$

and does not depend upon μ;

(3) W is distributed on the subset of G_{p,N−p} with property (2.10) [homeomorphic to G_{p,N−p−1} because of (4.8)], so that

$$ dF(W) = \left[ \pi^{-p(N-p-1)/2} \prod_{i=1}^{p} \frac{\Gamma\!\left(\tfrac{N-i}{2}\right)}{\Gamma\!\left(\tfrac{i}{2}\right)} \right] \prod_{j=1}^{N-p-1} \prod_{i=1}^{p} b_j'\,dw_i \tag{4.12} $$

and does not depend upon μ or Σ; and

(4) Γ, as chosen in (4.7), has the invariant distribution on the orthogonal group, so that

$$ dF(\Gamma) = \left[ 2^{-p}\,\pi^{-p(p+1)/4} \prod_{i=1}^{p} \Gamma\!\left(\tfrac{i}{2}\right) \right] \prod_{i < j} \gamma_j'\,d\gamma_i $$

and does not depend upon μ or Σ, where Γ = (γ_1 ... γ_p).

The above results follow because James shows that the plane element of Z in (4.7) has the "invariant distribution" on G_{p,N−p−1}. However, this invariance is with respect to the transformations

$$ Z \to Z\,Q \tag{4.13} $$

where Q is any orthogonal (N−1)×(N−1) matrix.
Thus we have
COROLLARY 4.2. If X has the distribution (4.1), then W is invariant in distribution under transformations of the form

$$ X \to \bar{X}\,1' + X\,V \tag{4.14} $$

where

$$ V = Q(N)\,Q\,Q(N)' \tag{4.15} $$

and Q is orthogonal, as well as strictly invariant under transformations of the form (1.2) and (1.3).

Note that the transformation matrices (4.15) have the property

$$ V\,V' = V'\,V = I_N - \tfrac{1}{N}\,1\,1'. $$

Thus V acts like a projection matrix to remove the row means of X and then "rotates" these projections. Note also that all reorderings of the columns of X are included in (4.14). For example,

$$ Q = Q(N)'\,E_{12}\,Q(N), \tag{4.16} $$

where E_{12} is the N×N permutation matrix interchanging the first two coordinates, implies V = E_{12} − (1/N)11' and X̄1' + XV = X E_{12}, so that the data for the first and second individuals are interchanged. In fact, the distribution of W has to be invariant under reordering of the columns of X when c = 1, whether or not X has the distribution (4.1), because the columns of X are independent and identically distributed in this case.
COROLLARY 4.3. Let X (p×N) have the distribution (4.1), let P be the projection matrix calculated from X, and let D² be the matrix of squared distances between points in the representation as in Lemma 3.6. Let the rows of W* (p×N) be any orthonormal set of c-vectors of P associated with c-roots of +1, and let the rows of B* ((N−p−1)×N) be any orthonormal set of c-vectors of I_N − (1/N)11' − P associated with c-roots of +1. Then dF(W) in decomposition (4.9) can be replaced by

$$ dF(P) = K(p, N) \prod_{j=1}^{N-p-1} \prod_{i=1}^{p} b_j^{*\prime}\,(dP)\,w_i^* \tag{4.12'} $$

or

$$ dF(D^2) = K(p, N) \prod_{j=1}^{N-p-1} \prod_{i=1}^{p} b_j^{*\prime} \left( -\tfrac{1}{2}\,dD^2 \right) w_i^* \tag{4.12''} $$

where K(p, N) is the constant term from (4.12).

Proof. Since P = W*'W*,

$$ dP = dW^{*\prime}\,W^* + W^{*\prime}\,dW^*, $$

and, because B*W*' = 0 and W*W*' = I_p,

$$ B^*\,(dP)\,W^{*\prime} = B^*\,dW^{*\prime}. $$

Also, from Lemma 3.3, p_{ℓm} = ½(d²_{ℓo} + d²_{mo} − d²_{ℓm}), so that

$$ dP = \tfrac{1}{2}\left( d\,d_o^2\,1' + 1\,d\,d_o^{2\prime} - dD^2 \right) $$

and, because B*1 = 0 and W*1 = 0,

$$ B^*\,(dP)\,W^{*\prime} = B^*\left( -\tfrac{1}{2}\,dD^2 \right) W^{*\prime}. $$

So, taking W* rather than W as our "reference" p-frame in V_{p,N} (James [2], p. 61), we arrive at the probability elements (4.12') and (4.12'') by noting that the double product of linear differential forms in (4.12) is simply the exterior product of the elements of B* dW*'. □

Note that P can be constructed from D², without knowing d²_o, the vector of squared distances from the origin, by using the technique of "Principal Co-ordinate Analysis" due to Gower [1]. The set of possible realizations of P and the set of possible realizations of D² are thus both homeomorphic to the Grassmann manifold G_{p,N−p−1}.
The probability element (4.12) for W is expressed in terms of locally defined exterior differential forms and cannot be simplified in general. Let us consider the very simple case p = 1 and N = 3 to get some idea of the sort of distribution implied by (4.12). Then W (1×3) contains p(N−p−1) = 1 functionally independent variable because (2.9) implies W_1² + W_2² + W_3² = 1, (2.10) implies W_1 + W_2 + W_3 = 0, and (2.11) implies W_3 > 0. We have

$$ f(W_3) = \frac{2\sqrt{3}}{\pi\,\sqrt{2 - 3W_3^2}}, \qquad 0 < W_3 \le \sqrt{2/3}, \tag{4.17} $$

and

$$ W_1 = \tfrac{1}{2}\left( -W_3 + a\,\sqrt{2 - 3W_3^2} \right), \qquad W_2 = \tfrac{1}{2}\left( -W_3 - a\,\sqrt{2 - 3W_3^2} \right), $$

where a = +1 with probability 1/2 and a = −1 otherwise. Note that the marginal distribution of (3/2)W_i² for i = 1, 2, or 3 is Beta with parameters 1/2 and 1/2. Figure 4.1 displays the marginal density of W_3, and Figure 4.2 shows the possible values of W_1 and W_2 given W_3.

Thus we see that, starting with random variables having independent and identical distributions on the entire real line and central tendency displayed by the familiar bell shaped curve, we end up with dependent, bounded random variables that have a tendency to assume their extreme values.

The case p = 1 and general N can be handled by making a polar transformation as in James [2], equation (5.7). It follows that the marginal distribution of W (1×N) is closely related to the (N−2)-variate Dirichlet distribution with parameters all equal to 1/2. It is then seen that the distribution of N W_i²/(N−1) is Beta with parameters 1/2 and (N−2)/2 for i = 1, 2, ..., N; this distribution is positively skew for large N rather than negatively skew as in Figure 4.1 for N = 3.
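The Beta result for N = 3 is easy to confirm by Monte Carlo. An illustrative sketch (an addition, assuming numpy and scipy; for p = 1 the canonical form reduces to the normalized, sign-oriented centered row):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
X = rng.standard_normal((20_000, 3))             # 20,000 samples, p=1, N=3
Y = X - X.mean(axis=1, keepdims=True)
W = Y / np.linalg.norm(Y, axis=1, keepdims=True)
W *= np.sign(W[:, -1:])                          # orient so W_3 > 0, (2.11)
# (3/2) W_3^2 should be Beta(1/2, 1/2); the KS test should not reject.
print(stats.kstest(1.5 * W[:, -1] ** 2, stats.beta(0.5, 0.5).cdf))
```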
[FIGURE 4.1. The marginal density of W_3 for N = 3, plotted on (0, √(2/3)].]

[FIGURE 4.2. The possible values max(W_1, W_2) and min(W_1, W_2) given W_3, for N = 3.]
4.2. Comments on Several Multivariate Normal Populations.

The distribution of W is unknown in this case. If attention is restricted to the special case Σ_1 = ... = Σ_c, S will have the noncentral Wishart distribution. The distribution of W does not follow from the type of analysis performed by James [2] when c = 1 because S and W are not independent. Thus, when James [3] derived the marginal distribution of S, he did not integrate over the possible realizations of W. Instead, he considered the integral of the density of the Multivariate Normal distribution N(M Q', I_{N−1} ⊗ Σ) with respect to the invariant measure for Q ∈ O(N−1), the orthogonal group of order N−1. He thus replaced the joint density of S and W by the density of S and a statistic independent of S. If one were to integrate the density of N(C M, I_{N−1} ⊗ Σ) over C ∈ O(p), one could replace the joint density of X̃ = C L Z, that is of L², C, and Z, by the joint density of L², Z, and an independent statistic.

The very simple case c = 2, p = 1, n_1 = 2, and n_2 = 1 can, however, be attacked from first principles for comparison with (4.17). Write

$$ s^2 = \sum_{j=1}^{3} (X_j - \bar{X})^2. $$
4.3. Two Normal Populations Differing in Location.

Suppose

$$ X' \sim N_3\left( (0, 0, \mu),\; I_3 \right) \tag{4.18} $$

where μ ≥ 0 without loss of generality. Then X̄ ~ N(μ/3, 1/3), (4.19), while (4.20) gives the density of s², which is noncentral chi-square with 2 degrees of freedom, and the marginal density f(W_3), (4.21), can be expressed in terms of the modified Bessel function I_ν(z) and the incomplete gamma function γ(a, x) = ∫_0^x t^{a−1} e^{−t} dt. Furthermore, f(W_1, W_2, W_3) follows from f(W_3) as in (4.17).

Note that μ = 0 reduces (4.21) to (4.17), reduces (4.20) to the chi-square distribution with 2 degrees of freedom, and implies that s² and W_3 are independent. However, μ > 0 implies that s² has non-centrality 2μ²/3, that s² and W_3 are not independent (although both are independent of X̄), and that (4.21) is even more negatively skew than (4.17) as plotted in Figure 4.1. Note that, as recorded in (4.22), the distribution (4.21) can be related to Φ(z), the cumulative distribution of the standard normal distribution. Thus one can claim that (4.21) retains some direct relationship with the normal distribution when μ > 0.
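The dependence of s² and W_3 when μ > 0 can be seen in simulation. An illustrative sketch (an addition to the text; for p = 1 the canonical form is the normalized, sign-oriented centered row):

```python
import numpy as np

rng = np.random.default_rng(5)
for mu in (0.0, 3.0):
    X = rng.standard_normal((50_000, 3)) + np.array([0.0, 0.0, mu])
    Y = X - X.mean(axis=1, keepdims=True)
    s2 = (Y ** 2).sum(axis=1)                # s^2 of Section 4.2
    W = Y / np.sqrt(s2)[:, None]
    W *= np.sign(W[:, -1:])                  # orient so W_3 > 0
    # Correlation is near zero for mu = 0 (independence) and clearly
    # nonzero for mu = 3 (dependence), as the text asserts.
    print(mu, np.corrcoef(s2, W[:, -1])[0, 1])
```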
4.4. Two Normal Populations Differing in Dispersion.

Suppose

$$ X' \sim N_3\left( (0, 0, 0),\; \operatorname{diag}(1, 1, \sigma^2) \right). \tag{4.23} $$

Then (4.24) and (4.25) give the joint behavior of s² and W_3; in particular, the conditional density of s² given W_3 contains the factor

$$ \exp\left\{ -\frac{s^2}{2} \left( 1 + \frac{3(1 - \sigma^2)\,W_3^2}{(1 + 2\sigma^2)(2 - 3W_3^2)} \right) \right\}, \tag{4.24} $$

while the marginal density of W_3, (4.26), can be expressed in terms of ₁F₁(a, b, z), Kummer's form of the confluent hypergeometric function. Here σ² = 1 reduces (4.26) to (4.17) and shows statistical independence. However, if σ² ≠ 1, s² and W_3 are not only dependent on each other but also dependent upon X̄.
5. SELECTION AND EVALUATION OF VARIABLES

In order for the model for X introduced in Section 2 to be valid in a given experimental situation, the number of variables (p) to be observed, the number of individuals (N) to be examined, and the particular characteristics (X_1, ..., X_p) to be measured should be selected in such a way that the properties of W demonstrated in Section 3 have meaningful physical interpretations.

Since there will be exactly r equally important orthogonal axes of variation in the invariant representation and r ≤ min(p, N−1), we should take p ≤ N−1 and select (X_1, ..., X_p) such that P(r = p) = 1. A principal components analysis of S or of the corresponding correlation matrix might be used to get some idea of the dimensionality of the data. In view of Lemma 3.5, it is desirable to have N much larger than p. If r is observed to be less than p, there is some linear redundancy between the rows of the data matrix X, and we should reduce the number of variables under consideration to only r. In particular, there should be no physical reason to believe that the chosen characteristics (X_1, ..., X_p) are not equally important in distinguishing the individuals and populations under consideration.

Geometrical intuition can be brought to bear upon the question of how the variables (X_1, ..., X_p) interact by considering the stepwise construction of W using Lemma 3.1 as the variables (rows of X) are added in different orders. The elements of w*(i_1) in (3.6) are interpreted as the coordinates of N points to be plotted on [−√((N−1)/N), √((N−1)/N)] ⊂ R¹.
The resulting one dimensional configuration displays the contribution of variable X_{i_1} to the representation W in the presence of X_{i_1} only. Similarly, w*(i_k) and w*(i_{k+1}) can be plotted on orthogonal axes in R² to represent the contribution of X_{i_k} and X_{i_{k+1}} together after adjusting for the presence of X_{i_1}, ..., X_{i_{k−1}}. Note that w*(i_k) is the projection of W onto some direction, the orientation of which is determined using only the rows i_1, ..., i_k of X. Thus w*(i_k) is clearly not invariant under transformations (1.3) although W is invariant.
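A sketch of this stepwise construction (a hypothetical helper implementing (3.5) and (3.6), an addition for illustration):

```python
import numpy as np

def stepwise_wstar(X, order, tol=1e-10):
    """Yield w*(i_k) of (3.5)-(3.6) as rows of X are added in 'order'."""
    N = X.shape[1]
    Y = X - X.mean(axis=1, keepdims=True)       # center, as in (2.1)
    basis = np.zeros((0, N))                    # orthonormal w*'s found so far
    for i in order:
        v = Y[i] - basis.T @ (basis @ Y[i])     # (3.5): remove P(i_{k-1})
        norm = np.linalg.norm(v)
        w = v / norm if norm > tol else np.zeros(N)   # (3.6)
        if norm > tol:
            basis = np.vstack([basis, w])
        yield w

# Example: coordinates for the one and two dimensional plots of this section.
# ws = list(stepwise_wstar(np.random.randn(3, 12), order=[2, 0, 1]))
```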
Note that if r is less than p, at least two of the vectors w*(i_p) will be null. One of the corresponding variables, X_{i_p}, which makes no contribution to the representation in the presence of the other p−1 variables, should be dropped from consideration. If r is less than p−1, the above calculations should be repeated until only r variables are retained. Besides these techniques based upon the theoretical model for X, there may be physical considerations to keep in mind when selecting (X_1, ..., X_p).

Consider the following artificial experimental situation. A large number of solid metal balls are to be examined. Their respective diameters and weights can easily be measured and would seem to be of interest. Thus data might be collected with p = 2. However, if the experimenter notices that all the balls are made of the same uniform metal, diameters and weights are clearly related. This redundancy would probably not cause the representation to have rank one because the relationship is nonlinear. However, except for errors in measurement, the points of the two dimensional representation would lie on a smooth curve and thus reveal the redundancy. Since the problem is really one dimensional in this case, only one variable should be observed. Weight might be considered critical if the balls are to be lifted and carried by hand; diameter might be considered critical if the balls are to be manipulated by a mechanical device which opens only to certain widths.
6. SUMMARY AND USES

The linearly invariant canonical form W (r×N) is a representation of the data X adjusted for X̄ and S. While X and X̄ describe individuals in the population(s) in terms of the specific characteristics measured, W describes individuals in the population(s) in terms of all possible linear combinations of the characteristics measured. If it were known that X had the Multivariate Normal distribution (4.1), then X̄ and S would be sufficient for μ and Σ, and W would contain no information about the population as a whole. However, W would still describe the specific N individuals chosen for analysis. On the other hand, given some experimental data, one might assume that X has the distribution (4.1) but discover that W contains empirical evidence that casts doubt on this assumption. Finally, if c > 1 different populations are being sampled, although "average" location and dispersion have been removed, W reveals "relative" location and dispersion of the populations.

The invariant model for X considered here is inadequate and misleading when the variables chosen for analysis are linearly dependent but are measured subject to random experimental error. Let r* < p be the true rank of the distribution of the p variables. Because of measurement error, the linear dependence will probably not be discovered (r = p). Furthermore, the errors in measurement will be given weight equal to (p−r*) dimensions of variation in W. Thus the greatest weakness of the representation W is with respect to the type of redundancy in the data it was hoped to automatically detect.
The invariance of W under (1.3) implies, in particular, that the signs of all the variables can be reversed. Thus procedures appropriate for the analysis of data conforming to the model we have considered can detect differences but do not automatically indicate a direction of difference. For example, in the univariate (p = 1) two sample testing problem, we are considering two-sided tests only.

The canonical form W can be used to bring geometrical intuition to bear on problems in which linear invariance is assumed. The one and two dimensional representations discussed in Section 5 are the easiest to interpret. The results of Lemma 3.6 imply that multivariate test criteria based upon sums of characteristic roots may have an interpretation as average squared distances within and among samples. The representation can be constructed before the use of the T² statistic as mentioned in the introduction. This is also the case when Fisher's sample linear discriminant function is to be used, if the individual to be identified and the samples known to be from the respective populations are simultaneously represented.

If one is not willing to assume that all the variables can be combined together, but k distinct subsets of the variables are each thought to conform to the invariant model, then the separate representations W[1] (p_1×N), ..., W[k] (p_k×N) can be combined to form a total representation with p = Σ_{i=1}^k p_i rows. When k = 2, the above calculations could be performed before use of Hotelling's sample canonical correlation analysis.
The amount of calculation required to find W increases rapidly with N. A non-singular g-inverse for the p×p matrix S can be found by sweeping S out to its Hermite row canonical form. When P is formed using (2.3), the accuracy of the calculations is indicated by the number of digits to which P is symmetric and idempotent. Since the non-null rows (columns) of P are all c-vectors of P associated with c-roots of +1, any orthonormalization of these vectors will produce r non-null rows for W. The c-vectors found by this or any other technique will usually not have the uniqueness of orientation property (2.11), but this is not really necessary for geometric interpretation of the representation; only the inter-point distances are of interest. Thus, writing S = C L² C' as in (4.4), we see that L⁻¹C'(X − X̄1') has properties (2.8), (2.9), and (2.10); the sample principal components normed to mean zero and variance one constitute an un-oriented decomposition of P.
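This un-oriented decomposition via normed principal components can be verified numerically (an illustrative sketch with hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((3, 10))               # hypothetical 3 x 10 data
Y = X - X.mean(axis=1, keepdims=True)
lam, C = np.linalg.eigh(Y @ Y.T)               # S = C L^2 C', as in (4.4)
W_un = np.diag(lam ** -0.5) @ C.T @ Y          # normed principal components
assert np.allclose(W_un @ W_un.T, np.eye(3))   # (2.9)
assert np.allclose(W_un @ np.ones(10), 0.0)    # (2.10)
assert np.allclose(W_un.T @ W_un,
                   Y.T @ np.linalg.pinv(Y @ Y.T) @ Y)   # (2.8): P
```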
Many univariate nonparametric techniques based upon ranks are invariant with respect to monotonically increasing transformations of the data. These transformations include translation and change of scale. Multivariate, conditionally nonparametric procedures have been considered which rank data on each variable separately. In a future communication, the author will discuss the use of the linearly invariant canonical form in the assignment of "rank scores". New nonparametric and conditionally nonparametric, univariate and multivariate ranking procedures will be introduced which are invariant only with respect to translation and nonsingular linear transformation.
REFERENCES

[1] Gower, J.C. (1966). "Some distance properties of latent root and vector methods used in multivariate analysis". Biometrika 53, 325-338.

[2] James, A.T. (1954). "Normal multivariate analysis and the orthogonal group". Ann. Math. Statist. 25, 40-75.

[3] James, A.T. (1955). "The noncentral Wishart distribution". Proc. Roy. Soc. London Ser. A 229, 364-366.

[4] Rao, C.R. (1966). "Generalized inverse for matrices and its applications in mathematical statistics". Research Papers in Statistics (Festschrift J. Neyman, edited by F.N. David), Wiley, New York, 263-279.