RELATIONS AMONG SEVERAL SETS OF VARIATES

by

Jon R. Kettenring

University of North Carolina

Institute of Statistics Mimeo Series No. 538

This research was supported by the Air Force Office of Scientific Research, Contract No. AF-AFOSR-760-65.

Department of Statistics
University of North Carolina
Chapel Hill, N. C.
Summary

This paper commences with a discussion of the relations, particularly those represented by canonical correlations, between two sets of variates. The main purpose, however, is to investigate two extensions of canonical correlation theory to deal with several sets of variates. An old criterion of minimal generalized variance is redeveloped, and a new criterion based on sums of squared correlations is formulated. Solutions in both cases are obtained by an iterative procedure which can be readily carried out on a computer.
CHAPTER I

Relations Between Two Sets of Variates

Preliminary Remarks

The study of relations between two sets of random variables was begun by Hotelling in his brief 1935 paper [6], "The Most Predictable Criterion." His definitive study of the problem appeared the next year in a fifty-seven page paper [7] which still stands as a key reference in the multivariate literature.
Let $X_1\ (p_1 \times 1)$ and $X_2\ (p_2 \times 1)$, $p_1 \le p_2$, represent the two sets of variates. At this point the only assumption about $X' = (X_1', X_2')$ is that

$$\operatorname{var}(X) = \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

exists and is positive definite. The relations between the two sets are found through analysis of $\Sigma$.
An illustration given by Hotelling [7] compared a set of mental test scores with a set of physical measurements on the same persons. "The questions then arise of determining the number and nature of the independent relations of mind and body ... and of extracting from the multiplicity of correlations in the system suitable characterizations of these independent relations." Of particular interest might be a pair of linear combinations of scores and measurements yielding the maximum or minimum squared correlation.
The formal extraction procedure will be loosely referred to as a "canonical analysis." We start by finding two new variables, each with unit variance, formed from linear combinations of $X_1$ and $X_2$ yielding the largest possible correlation. The new variables are known as canonical variables and together they form the first pair of canonical variables. The associated correlation is called the first canonical correlation. When one of the initial sets consists of criterion variables and the other of predictor variables, then the canonical variable derived from the criterion set corresponds to Hotelling's most predictable criterion [6].

The analysis may be continued until $p_1$ pairs of canonical variables and $p_1$ canonical correlations have been identified. At each stage it is required that the variable derived from each set be uncorrelated with other canonical variables formed from that set, that the correlation between the derived variables be as large as possible, and finally that the new variables have unit variance. If $p_1 < p_2$, a formal completion of the problem may be made by finding an additional $(p_2 - p_1)$ canonical variables as linear combinations of the second set; besides having unit variance these variables must be uncorrelated amongst themselves as well as with the other canonical variables from the second set. These latter $(p_2 - p_1)$ variables are usually of little interest.

A consequence of the canonical analysis is that there is zero correlation between a canonical variable from the first set and a canonical variable from the second set whenever they do not constitute one of the canonical pairs. In addition some of the canonical correlations may be zero, the number being equal to the difference between $p_1$ and the rank of $\Sigma_{12}$.
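In modern matrix-computation terms the whole construction may be sketched with a singular value decomposition of the whitened cross-covariance block. The following sketch (Python with numpy/scipy; the function name and the use of symmetric square roots are our illustrative choices, not the paper's prescription) computes the canonical correlations and weight matrices from a partitioned covariance matrix.

```python
import numpy as np
from scipy.linalg import sqrtm

def canonical_analysis(S, p1):
    """Two-set canonical analysis of a partitioned covariance matrix S.

    S is (p1 + p2) x (p1 + p2) and positive definite, with the first p1
    rows/columns belonging to the first set.  Returns the canonical
    correlations lambda_i and weight matrices B1, B2 satisfying
    B1 S11 B1' = I, B2 S22 B2' = I, B1 S12 B2' = (diag(lambda)  0).
    """
    S11, S12, S22 = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    S11_isqrt = np.linalg.inv(np.real(sqrtm(S11)))   # Sigma_11^{-1/2}
    S22_isqrt = np.linalg.inv(np.real(sqrtm(S22)))   # Sigma_22^{-1/2}
    R12 = S11_isqrt @ S12 @ S22_isqrt                # whitened cross-block
    U, lam, Vt = np.linalg.svd(R12, full_matrices=True)
    B1 = U.T @ S11_isqrt       # rows are coefficient vectors for set one
    B2 = Vt @ S22_isqrt        # rows are coefficient vectors for set two
    return lam, B1, B2
```

The singular values of the whitened block are the canonical correlations, and the required orthogonality conditions are inherited from the orthogonality of the singular vectors.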
Already it should be clear that the canonical correlations are invariant under internal nonsingular transformations of either set; this fact was pointed out by Hotelling [7] and used advantageously by Horst ([3], [4], and [5]), Steel [13], and others. Hence, without any loss of generality, one may replace $\Sigma$ by

$$R = \begin{bmatrix} I & R_{12} \\ R_{21} & I \end{bmatrix},$$

where $R$ is obtained from $\Sigma$ by the transformation $R = D_T^{-1}\Sigma D_T^{-1\prime}$ with

$$D_T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix}, \qquad \Sigma_{11} = T_1T_1', \quad \Sigma_{22} = T_2T_2'.$$

$T_1$ and $T_2$ may be taken to be lower triangular, a computationally convenient form, or symmetric, in which case it is customary to write $T_i = \Sigma_{ii}^{1/2}$. A canonical analysis of two sets of variates with variance-covariance matrix $R$ simply involves the selection of appropriate internal orthogonal transformations for each set. Thus if a variable is dropped from the first set, the induced canonical variables for the second set will be related to the original canonical variables for the second set by an orthogonal transformation.
Basic Theorem of Canonical Analysis

Hotelling's derivation of the results needed to determine the canonical variables and correlations was based on calculus and Lagrange multipliers. It is possible, however, to give a rigorous derivation using only matrix theory. Lancaster [8] has done essentially this. Other writers have presented or discussed results which could be expanded into such a proof; see, for example, Vinograde ([15], pp. 160-161), Roy ([12], pp. 142, 153), and Horst ([3], pp. 131-132). The proof given here is similar to Lancaster's. It illustrates, more clearly than the calculus approach, the structural significance of the canonical correlations.
Theorem 1.1  Suppose $\Sigma > 0$ and $p_1 \le p_2$. Then there exists a matrix

$$D_B = \begin{bmatrix} B_1 & 0 \\ 0 & B_2 \end{bmatrix}$$

and a unique matrix

$$\Phi = D_B\Sigma D_B' = \begin{bmatrix} I & \Phi_{12} \\ \Phi_{21} & I \end{bmatrix} > 0$$

such that

$$\Phi_{12} = (D \;\; 0), \qquad D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{p_1}), \quad 1 > \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{p_1} \ge 0,$$

and $\Phi_{21} = \Phi_{12}'$. Here $B_1 = B_1^*\Sigma_{11}^{-1/2}$, where $B_1^*$ is any orthogonal matrix such that

$$B_1^* R_{12}R_{21} B_1^{*\prime} = D^2,$$

and $B_2 = B_2^*\Sigma_{22}^{-1/2}$, where $B_2^*$ is an orthogonal matrix satisfying

$$B_1^* R_{12} B_2^{*\prime} = \Phi_{12}.$$

This equation will determine the first $j$ rows of $B_2^*$, $j$ being the number of nonzero eigenvalues $\lambda$; the remaining $(p_2 - j)$ rows are any orthogonal completion of $B_2^*$.
Proof. Write $\Sigma_{11} = \Sigma_{11}^{1/2}\Sigma_{11}^{1/2}$ and $\Sigma_{22} = \Sigma_{22}^{1/2}\Sigma_{22}^{1/2}$, and form

$$R = D_T^{-1}\Sigma D_T^{-1\prime} = \begin{bmatrix} I & R_{12} \\ R_{21} & I \end{bmatrix},$$

where $R_{12} = \Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}$ and $R_{21} = R_{12}'$. By Lemma A2 (see Appendix), there exist orthogonal matrices $B_1^*$ and $B_2^*$ such that

$$B_1^* R_{12} B_2^{*\prime} = \Phi_{12} = (D \;\; 0),$$

where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{p_1})$, $1 > \lambda_1 \ge \cdots \ge \lambda_{p_1} \ge 0$, and $(\lambda_1^2, \lambda_2^2, \ldots, \lambda_{p_1}^2)$ are the eigenvalues of $R_{12}R_{21}$. Moreover, $B_1^*$ may be any orthogonal matrix such that $B_1^* R_{12}R_{21} B_1^{*\prime} = D^2$. Note that the eigenvalues of $R_{12}R_{21}$ are the same as the eigenvalues of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.

Let $D_B = D_{B^*}D_T^{-1}$, where $D_{B^*} = \operatorname{diag}(B_1^*, B_2^*)$. This definition of $D_B$ yields the desired decomposition $\Phi$.

Suppose now $D_B$ is any other matrix yielding a decomposition of $\Sigma$ in the form $\Phi$. Then

$$\Phi_{12} = (B_1\Sigma_{11}^{1/2})\,\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1/2}\,(\Sigma_{22}^{1/2}B_2')$$

and

$$D^2 = \tilde B_1^*\,\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}\,\tilde B_1^{*\prime},$$

where $\tilde B_1^* = B_1\Sigma_{11}^{1/2}$ is orthogonal; hence $D$, and with it $\Phi$ itself, is unique. The properties of $B_2$ follow from the two equations $B_1\Sigma_{12}B_2' = \Phi_{12}$ and $B_2\Sigma_{22}B_2' = I$.
Corollary 1.1.1  $B_1$ is any matrix such that

$$B_1\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}B_1' = D^2 \qquad\text{and}\qquad B_1\Sigma_{11}B_1' = I.$$

Alternatively, the rows of $B_1$ are any complete set of eigenvectors of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, such that the $i$th row of $B_1$ is associated with the $i$th largest eigenvalue and $B_1\Sigma_{11}B_1' = I$. The first $j$ rows of $B_2$ are determined by the equation

$$B_1\Sigma_{12}\Sigma_{22}^{-1} = \Phi_{12}B_2.$$

The last $(p_2 - j)$ rows of $B_2$ are chosen so that $B_2\Sigma_{22}B_2' = I$.

Proof. The two conditions give

$$(B_1\Sigma_{11}^{1/2})\,\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2}\,(B_1\Sigma_{11}^{1/2})' = D^2 \qquad\text{and}\qquad (B_1\Sigma_{11}^{1/2})(B_1\Sigma_{11}^{1/2})' = I,$$

so that $B_1\Sigma_{11}^{1/2}$ is orthogonal and may be taken as $B_1^*$ in the theorem. These hold if and only if

$$(\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1/2} - \lambda_i^2 I)\,(B_1\Sigma_{11}^{1/2})_i' = 0, \quad i = 1, 2, \ldots, p_1, \qquad\text{and}\qquad (B_1\Sigma_{11}^{1/2})(B_1\Sigma_{11}^{1/2})' = I,$$

so that, again, $B_1\Sigma_{11}^{1/2}$ is orthogonal and may be taken as $B_1^*$ in the theorem. The last two statements of the corollary are immediate consequences of the theorem.
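Numerically, the corollary says the whole analysis for the first set is one generalized symmetric eigenproblem. A small check follows (random positive definite input; the two-argument form of `scipy.linalg.eigh` is an implementation choice, not part of the corollary):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p1, p2 = 2, 3
A = rng.standard_normal((p1 + p2, p1 + p2))
S = A @ A.T + (p1 + p2) * np.eye(p1 + p2)     # positive definite Sigma
S11, S12 = S[:p1, :p1], S[:p1, p1:]
S21, S22 = S[p1:, :p1], S[p1:, p1:]

# Rows of B1: eigenvectors of Sigma_11^{-1} Sigma_12 Sigma_22^{-1} Sigma_21,
# solved as K v = theta Sigma_11 v with K symmetric; eigh normalizes the
# eigenvectors so that B1 Sigma_11 B1' = I automatically.
K = S12 @ np.linalg.solve(S22, S21)
theta, V = eigh(K, S11)                        # ascending eigenvalues
B1 = V[:, ::-1].T                              # largest eigenvalue first
print(np.round(B1 @ S11 @ B1.T, 8))            # identity, as required
print(np.sqrt(theta[::-1]))                    # the canonical correlations
```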
Corollary 1.1.2  Suppose $\tilde B_1^*$ and $\tilde B_2^*$ are any two orthogonal matrices producing a decomposition of the form

$$\tilde B_1^* R_{12} \tilde B_2^{*\prime} = \tilde\Phi_{12} = (\tilde D \;\; 0),$$

where $\tilde D = ((d_{ij}))$ is not necessarily a diagonal matrix but has diagonal elements $d_{11} \ge d_{22} \ge \cdots \ge d_{p_1p_1} \ge 0$. Then it is impossible for $d_{ii} > \lambda_i$, $i = 1, 2, \ldots, j$, unless

$$d_{ik} = 0, \qquad i = 1, 2, \ldots, j, \quad k = 1, 2, \ldots, p_1\ (k \ne i),$$

in which case there is equality for $i = 1, 2, \ldots, j$. This is true for $j = 1, 2, \ldots, p_1$.

Proof. $\tilde B_1^* R_{12}R_{21} \tilde B_1^{*\prime} = \tilde D\tilde D' = F$ (say), where $F = ((f_{ij}))$, and $B_1^* R_{12}R_{21} B_1^{*\prime} = D^2 = \operatorname{diag}(\lambda_1^2, \lambda_2^2, \ldots, \lambda_{p_1}^2)$. Now $d_{ii}^2 \le f_{ii}$, with equality if and only if $d_{ij} = 0$, $j = 1, 2, \ldots, p_1\ (j \ne i)$. Also $\lambda_1^2 \ge f_{11}$ by Lemma A1; hence if $\lambda_1 = d_{11}$ then $d_{1j} = 0$ for $j = 2, 3, \ldots, p_1$, and the first rows of $\tilde B_1^*$ and $B_1^*$ are in the same eigenspace. Now $\lambda_2^2 \ge f_{22}$ by Lemma A1; hence if $\lambda_2 = d_{22}$, then $d_{2j} = 0$ for $j = 1, 2, \ldots, p_1\ (j \ne 2)$, and the second rows of $\tilde B_1^*$ and $B_1^*$ are in the same eigenspace. Continuing in this way, it follows that $d_{ii} \le \lambda_i$ for $i = 1, 2, \ldots, p_1$, with equality for every $i$ if and only if $\tilde D$ is diagonal (in which case $d_{ii} = \lambda_i$, $i = 1, 2, \ldots, p_1$).
Theorem 1.1 together with Corollary 1.1.1 provide the essential information for carrying out a canonical analysis. The corollary, in particular, ties the theorem to the procedures given in Anderson [1]. The second corollary is needed to show that the $\lambda$'s are in fact the canonical correlations. Now it follows that there is zero correlation between canonical variables not belonging to the same canonical pair and that the number of nonzero canonical correlations is the same as the rank of $\Sigma_{12}$.
There are two other useful consequences of Theorem 1.1. First suppose $p_1 = p_2$. Take any $p_1$ linear combinations of variables from the first set and pair them with an equal number of linear combinations of variables from the second. Then if any two derived variables not in the same pair are uncorrelated, the correlations between the paired variables must, apart from sign, be canonical. Here the smallest canonical correlation is the minimal nonnegative correlation attainable between any pair. Second, when $p_1 < p_2$ there will always be $(p_2 - p_1)$ uncorrelated variables, linear combinations of the variates in the second set, which are uncorrelated with any variable derived as a linear combination of the first set.

It is sometimes said that "the ensemble of canonical variables accounts for all of the existing relations between the two sets of variables." In what sense is this so?
Writing $R_{12} = ((r_{ij}))$, the $r_{ij}$ represent all that may be explained by analysis of $\Sigma$ or $R$. Let

$$\lambda^2 = \sum_{i=1}^{p_1} \lambda_i^2 = \sum_i \sum_j r_{ij}^2,$$

the second equality holding because the $\lambda_i^2$ are the eigenvalues of $R_{12}R_{21}$ and $\operatorname{tr}(R_{12}R_{21}) = \sum_i\sum_j r_{ij}^2$. By dint of the last equation, it is reasonable to call $\lambda^2$ a measure of the "total relationship" between the two sets. Hence the $i$th canonical pair isolates and accounts for a proportion $\lambda_i^2/\lambda^2$ of this total. And, together, the $p_1$ pairs account for all of the total.
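The trace identity is immediate to verify numerically; a minimal sketch (any real matrix serves as a stand-in for the whitened block $R_{12}$, since the identity is purely algebraic):

```python
import numpy as np

rng = np.random.default_rng(1)
R12 = 0.4 * rng.random((2, 3)) - 0.2          # stand-in whitened cross-block
lam = np.linalg.svd(R12, compute_uv=False)    # canonical correlations
# sum of squared canonical correlations equals the sum of squared r_ij
assert np.isclose((lam ** 2).sum(), (R12 ** 2).sum())
```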
The next corollary relates the eigenvalues of $R$ to the canonical correlations. This result underlies certain generalizations to be discussed in Chapter II.

Corollary 1.1.3  The largest $p_1$ eigenvalues of $R$ are $1 + \lambda_1, 1 + \lambda_2, \ldots, 1 + \lambda_{p_1}$. A corresponding set of eigenvectors consists of $({}_1\mathbf b_i^{*\prime},\ {}_2\mathbf b_i^{*\prime})'$, $i = 1, 2, \ldots, p_1$, where ${}_j\mathbf b_i^{*\prime}$ is the $i$th row of $B_j^*$, $j = 1, 2$.

Proof.

$$\begin{bmatrix} I & R_{12} \\ R_{21} & I \end{bmatrix}\begin{bmatrix} {}_1\mathbf b^* \\ {}_2\mathbf b^* \end{bmatrix} = \rho\begin{bmatrix} {}_1\mathbf b^* \\ {}_2\mathbf b^* \end{bmatrix} \iff \begin{cases} (1-\rho)\,{}_1\mathbf b^* + R_{12}\,{}_2\mathbf b^* = 0, \\ R_{21}\,{}_1\mathbf b^* + (1-\rho)\,{}_2\mathbf b^* = 0. \end{cases}$$

Suppose $\rho \ne 1$; then ${}_1\mathbf b^* = (\rho-1)^{-1}R_{12}\,{}_2\mathbf b^*$ and ${}_2\mathbf b^* = (\rho-1)^{-1}R_{21}\,{}_1\mathbf b^*$. Substituting,

$$[R_{12}R_{21} - (1-\rho)^2 I]\,{}_1\mathbf b^* = 0.$$

Hence $\rho = 1 \pm \lambda$ and, for each positive $\lambda$, $1 + \lambda$ is one of the $p_1$ largest eigenvalues of $R$, and ${}_1\mathbf b^*$ and ${}_2\mathbf b^*$ may be taken as the corresponding rows of $B_1^*$ and $B_2^*$. When $\lambda = 0$ is a root of multiplicity $K$, then $\rho = 1$ is a root of $R$ of multiplicity at least $2K$; thus the remaining of the $p_1$ largest eigenvalues of $R$ (if any) are all unity. For each $\lambda = 0$, ${}_1\mathbf b^*$ and ${}_2\mathbf b^*$ may be taken as the corresponding rows of $B_1^*$ and $B_2^*$, since then $R_{12}\,{}_2\mathbf b^* = 0$ and $R_{21}\,{}_1\mathbf b^* = 0$.
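The corollary is easily checked numerically; in the sketch below (hypothetical random data) the top $p_1$ eigenvalues of $R$ coincide with $1 + \lambda_i$:

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(2)
p1, p2 = 2, 3
A = rng.standard_normal((p1 + p2, p1 + p2))
S = A @ A.T + (p1 + p2) * np.eye(p1 + p2)
T1i = np.linalg.inv(np.real(sqrtm(S[:p1, :p1])))
T2i = np.linalg.inv(np.real(sqrtm(S[p1:, p1:])))
R12 = T1i @ S[:p1, p1:] @ T2i
R = np.block([[np.eye(p1), R12], [R12.T, np.eye(p2)]])

lam = np.linalg.svd(R12, compute_uv=False)      # canonical correlations
eigs = np.sort(np.linalg.eigvalsh(R))[::-1]     # eigenvalues of R, descending
print(np.round(eigs[:p1], 8))                   # equals 1 + lambda_i
print(np.round(1 + lam, 8))
```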
Some Special Results

Let $V(M)$ designate the vector space generated by the row vectors of the matrix $M$. If

$$B_1 = \begin{bmatrix} {}_1\mathbf b_1' \\ \vdots \\ {}_1\mathbf b_{p_1}' \end{bmatrix} \qquad\text{and}\qquad B_2 = \begin{bmatrix} {}_2\mathbf b_1' \\ \vdots \\ {}_2\mathbf b_{p_2}' \end{bmatrix},$$

then the standard canonical analysis involves the selection of ${}_1\mathbf b_i \in V(I_{p_1})$, $i = 1, 2, \ldots, p_1$, and ${}_2\mathbf b_j \in V(I_{p_2})$, $j = 1, 2, \ldots, p_2$, subject to the conditions of Theorem 1.1. The intent now is to show how to modify this procedure to allow for arbitrary homogeneous linear restrictions on the coefficient vectors.
Consider the (possibly) more restrictive setting where ${}_1\mathbf b_i \in V(M_1)$ and ${}_2\mathbf b_j \in V(M_2)$, with $M_1\ (q_1 \times p_1)$ and $M_2\ (q_2 \times p_2)$ of full rank and $q_1 \le q_2$, dropping the assumption $p_1 \le p_2$.

Theorem 1.2  Let $\Sigma$ be as in Theorem 1.1. Then there exist

$$\bar B_1 = \begin{bmatrix} {}_1\bar{\mathbf b}_1' \\ \vdots \\ {}_1\bar{\mathbf b}_{q_1}' \end{bmatrix} \qquad\text{and}\qquad \bar B_2 = \begin{bmatrix} {}_2\bar{\mathbf b}_1' \\ \vdots \\ {}_2\bar{\mathbf b}_{q_2}' \end{bmatrix},$$

with rows in $V(M_1)$ and $V(M_2)$ respectively, such that

$$\bar B_1\Sigma_{11}\bar B_1' = I, \qquad \bar B_2\Sigma_{22}\bar B_2' = I, \qquad \bar B_1\Sigma_{12}\bar B_2' = \bar\Phi_{12} = (\bar D \;\; 0),$$

where

$$\bar D = \operatorname{diag}(\bar\lambda_1, \bar\lambda_2, \ldots, \bar\lambda_{q_1}), \qquad 1 \ge \bar\lambda_1 \ge \bar\lambda_2 \ge \cdots \ge \bar\lambda_{q_1} \ge 0,$$

and $\bar\lambda_1^2, \ldots, \bar\lambda_{q_1}^2$ are the eigenvalues of $(M_1\Sigma_{11}M_1')^{-1}M_1\Sigma_{12}M_2'(M_2\Sigma_{22}M_2')^{-1}M_2\Sigma_{21}M_1'$. If the rows of $\tilde B_1$ are any complete set of eigenvectors of this matrix, normalized so that $\tilde B_1 M_1\Sigma_{11}M_1'\tilde B_1' = I$, then $\bar B_1$ may be taken as $\tilde B_1 M_1$. Also $\bar B_2 = \tilde B_2 M_2$, where the first $j$ rows of $\tilde B_2$ are solutions of the equations

$$\tilde B_1 M_1\Sigma_{12}M_2'(M_2\Sigma_{22}M_2')^{-1} = \bar\Phi_{12}\tilde B_2$$

($j$ being the number of nonzero eigenvalues $\bar\lambda$). The last $(q_2 - j)$ rows of $\tilde B_2$ are chosen so that $\tilde B_2 M_2\Sigma_{22}M_2'\tilde B_2' = I$. Finally, if $S$ and $T$ are nonsingular transformations, replacing $M_1$ and $M_2$ by $SM_1$ and $TM_2$ leaves $\bar D$ unchanged.
Proof. The eigenvalues of this matrix, to wit, $\bar\lambda_1^2, \bar\lambda_2^2, \ldots, \bar\lambda_{q_1}^2$, are those obtained by applying Theorem 1.1 to the positive definite matrix

$$\begin{bmatrix} M_1\Sigma_{11}M_1' & M_1\Sigma_{12}M_2' \\ M_2\Sigma_{21}M_1' & M_2\Sigma_{22}M_2' \end{bmatrix}.$$

By the same theorem and Corollary 1.1.1, there exist $\tilde B_1$ and $\tilde B_2$ such that the stated decomposition holds. For the transformation $S$, since $V(M_1)$ and $V(SM_1)$ are isomorphic, the class of admissible coefficient vectors is unchanged. The effect of the transformation $T$ is seen similarly. Now

$$(SM_1\Sigma_{11}M_1'S')^{-1}SM_1\Sigma_{12}M_2'T'(TM_2\Sigma_{22}M_2'T')^{-1}TM_2\Sigma_{21}M_1'S' = S'^{-1}(M_1\Sigma_{11}M_1')^{-1}M_1\Sigma_{12}M_2'(M_2\Sigma_{22}M_2')^{-1}M_2\Sigma_{21}M_1'S',$$

which has the same eigenvalues as $(M_1\Sigma_{11}M_1')^{-1}M_1\Sigma_{12}M_2'(M_2\Sigma_{22}M_2')^{-1}M_2\Sigma_{21}M_1'$.
The assumption $\Sigma > 0$ is not necessary for the theorem to hold. It is sufficient that the partitioned matrix displayed in the proof be positive definite. Hence, it is easy to see how one would complete a canonical analysis for singular sets of variables. Suppose $\operatorname{rank}(\Sigma_{11}) = r_1 \le p_1$ and $\operatorname{rank}(\Sigma_{22}) = r_2 \le p_2$. Then there exist matrices $M_1\ (r_1 \times p_1)$ and $M_2\ (r_2 \times p_2)$ of full rank such that $M_1\Sigma_{11}M_1' = I_{r_1}$ and $M_2\Sigma_{22}M_2' = I_{r_2}$. Now, working with the nonsingular sets $M_1X_1$ and $M_2X_2$, one obtains a variance-covariance matrix of the reduced variates, from which point Theorem 1.1 may be applied.
An implication of Theorem 1.2 is that the reduction is invariant under the choice of $M_1$ and $M_2$.
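In computational terms, Theorem 1.2 reduces the restricted problem to an ordinary two-set analysis of the reduced covariance matrix. A sketch follows (the function name, the random restriction matrices implied by the caller, and the symmetric square roots are our assumptions):

```python
import numpy as np
from scipy.linalg import sqrtm

def restricted_cancorr(S, p1, M1, M2):
    """Restricted canonical correlations in the sense of Theorem 1.2.

    M1 (q1 x p1) and M2 (q2 x p2) are full-rank restriction matrices;
    coefficient vectors are confined to the row spaces V(M1), V(M2).
    """
    S11, S12, S22 = S[:p1, :p1], S[:p1, p1:], S[p1:, p1:]
    A11 = M1 @ S11 @ M1.T            # reduced covariance blocks
    A12 = M1 @ S12 @ M2.T
    A22 = M2 @ S22 @ M2.T
    # Ordinary two-set analysis (Theorem 1.1) applied to the reduced
    # positive definite matrix [[A11, A12], [A12', A22]].
    W1 = np.linalg.inv(np.real(sqrtm(A11)))
    W2 = np.linalg.inv(np.real(sqrtm(A22)))
    return np.linalg.svd(W1 @ A12 @ W2, compute_uv=False)
```

The singular values returned are the $\bar\lambda_i$; squaring them gives the eigenvalues named in the theorem.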
The next theorem was motivated originally by problems to be discussed in Chapter II, but is of interest here for the further insight it gives into the relations between two sets of variates. In essence, one set of variables will be held fixed, after which attention is directed to certain invariants of the system under internal nonsingular transformations of the other set.
Revising the original assumptions on $X_2' = ({}_2X_1, {}_2X_2, \ldots, {}_2X_{p_2})$ slightly, suppose $\operatorname{var}({}_2X_i) = 1$, $i = 1, 2, \ldots, p_2$. Let $A$, $B$, and $C$ be nonsingular matrices

$$A = \begin{bmatrix} \mathbf a_1' \\ \vdots \\ \mathbf a_{p_1}' \end{bmatrix}, \qquad B = \begin{bmatrix} \mathbf b_1' \\ \vdots \\ \mathbf b_{p_1}' \end{bmatrix}, \qquad C = \begin{bmatrix} \mathbf c_1' \\ \vdots \\ \mathbf c_{p_1}' \end{bmatrix},$$

satisfying

(1.1) $A\Sigma_{11}A' = B\Sigma_{11}B' = C\Sigma_{11}C' = I$.

Define new variables

$$U = \begin{bmatrix} U_1 \\ \vdots \\ U_{p_1} \end{bmatrix} = AX_1, \qquad V = \begin{bmatrix} V_1 \\ \vdots \\ V_{p_1} \end{bmatrix} = BX_1, \qquad W = \begin{bmatrix} W_1 \\ \vdots \\ W_{p_1} \end{bmatrix} = CX_1.$$

Let

$$\alpha_i = \sum_{j=1}^{p_2}\{\operatorname{corr}(U_i, {}_2X_j)\}^2, \qquad \beta_i = \Big\{\sum_{j=1}^{p_2}\operatorname{corr}(V_i, {}_2X_j)\Big\}^2, \qquad \gamma_i = \left|\operatorname{var}\begin{pmatrix} W_i \\ X_2 \end{pmatrix}\right|$$

for $i = 1, 2, \ldots, p_1$. Let $\mathbf 1$ be a vector of ones and $\theta_i(\cdot)$ the $i$th largest eigenvalue of the matrix in the argument.

Now for $i = 1, 2, \ldots, p_1$ in turn, select $\mathbf a_i$, $\mathbf b_i$, and $\mathbf c_i$ to maximize $\alpha_i$ and $\beta_i$ and to minimize $\gamma_i$, satisfying at each step the restrictions in (1.1). The technique for doing this and the values attained are given in Theorem 1.3. [Theorem 1.3(i) was discussed recently by Fortier [2] and earlier by Rao [10].]
Theorem 1.3

(i) $\alpha_i = \theta_i(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21})$,

(ii) $\beta_i = \theta_i(\Sigma_{11}^{-1}\Sigma_{12}\mathbf 1\mathbf 1'\Sigma_{21})$,

(iii) $\gamma_i = |\Sigma_{22}|\,\{1 - \theta_i(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})\}$,

for $i = 1, 2, \ldots, p_1$. These values are attained for each $i$ when $\mathbf a_i$, $\mathbf b_i$, and $\mathbf c_i$ are eigenvectors (subject to (1.1)) corresponding to the $i$th largest eigenvalues of the matrices appearing in (i), (ii), and (iii) respectively.
Proof. Making repeated reference to Lemma A1,

$$\alpha_1 = \sup_{\mathbf a \ne 0}\frac{\mathbf a'\Sigma_{12}\Sigma_{21}\mathbf a}{\mathbf a'\Sigma_{11}\mathbf a} = \theta_1(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21}),$$

attained at $\mathbf a_1 = \Sigma_{11}^{-1/2}\mathbf e_1$ with

$$(\Sigma_{11}^{-1/2}\Sigma_{12}\Sigma_{21}\Sigma_{11}^{-1/2} - \alpha_1 I)\mathbf e_1 = 0, \qquad\text{that is,}\qquad (\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21} - \alpha_1 I)\mathbf a_1 = 0.$$

Next,

$$\alpha_2 = \sup_{\substack{\mathbf a \ne 0 \\ \mathbf a_1'\Sigma_{11}\mathbf a = 0}}\frac{\mathbf a'\Sigma_{12}\Sigma_{21}\mathbf a}{\mathbf a'\Sigma_{11}\mathbf a} = \theta_2(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21}),$$

attained at $\mathbf a_2 = \Sigma_{11}^{-1/2}\mathbf e_2$ with $(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21} - \alpha_2 I)\mathbf a_2 = 0$; and, continuing subject to $\mathbf a_j'\Sigma_{11}\mathbf a = 0$, $j = 1, 2, \ldots, p_1 - 1$, down to

$$\mathbf a_{p_1} = \Sigma_{11}^{-1/2}\mathbf e_{p_1}, \qquad (\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{21} - \alpha_{p_1} I)\mathbf a_{p_1} = 0.$$

The other results follow in precisely the same manner. Note that to obtain results for the $\beta_i$'s one needs to consider variation of

$$\frac{\mathbf b'\Sigma_{12}\mathbf 1\mathbf 1'\Sigma_{21}\mathbf b}{\mathbf b'\Sigma_{11}\mathbf b};$$

for the $\gamma_i$'s, one needs to consider variation of

$$\frac{\begin{vmatrix} \mathbf c'\Sigma_{11}\mathbf c & \mathbf c'\Sigma_{12} \\ \Sigma_{21}\mathbf c & \Sigma_{22} \end{vmatrix}}{\mathbf c'\Sigma_{11}\mathbf c} = \frac{|\Sigma_{22}|\,(\mathbf c'\Sigma_{11}\mathbf c - \mathbf c'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\mathbf c)}{\mathbf c'\Sigma_{11}\mathbf c} = |\Sigma_{22}|\left(1 - \frac{\mathbf c'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\mathbf c}{\mathbf c'\Sigma_{11}\mathbf c}\right)$$

with respect to $\mathbf c$. For the latter problem, since the object is to seek the minimum value for $\gamma_i$, the lemma may be applied directly to

$$\frac{\mathbf c'\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\mathbf c}{\mathbf c'\Sigma_{11}\mathbf c}.$$

The $\gamma_i$'s are particularly noteworthy because of their intimate relation to the canonical correlations: since $\theta_i(\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}) = \lambda_i^2$, we have $\gamma_i = |\Sigma_{22}|\,(1 - \lambda_i^2)$. Moreover, the $W$'s are a set of canonical variables for the first set.
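All three optimizations are generalized symmetric eigenproblems and can be sketched as follows (assuming, as in the text, unit variances for the variates of the second set; the helper name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def theorem_1_3(S11, S12, S22):
    """Optimal values of Theorem 1.3 via generalized eigenproblems."""
    one = np.ones((S12.shape[1], 1))
    # (i)   alpha_i: eigenvalues of S11^{-1} S12 S21
    alpha = eigh(S12 @ S12.T, S11, eigvals_only=True)[::-1]
    # (ii)  beta_i: eigenvalues of S11^{-1} S12 1 1' S21 (rank one)
    beta = eigh(S12 @ one @ one.T @ S12.T, S11, eigvals_only=True)[::-1]
    # (iii) gamma_i = |S22| (1 - theta_i(S11^{-1} S12 S22^{-1} S21))
    theta = eigh(S12 @ np.linalg.solve(S22, S12.T), S11,
                 eigvals_only=True)[::-1]
    gamma = np.linalg.det(S22) * (1.0 - theta)
    return alpha, beta, gamma
```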
CHAPTER II

Relations Among Several Sets of Variates

Preliminary Remarks

Hotelling [7], in addition to solving the two set problem, recognized the need to generalize his solution to deal with more than two sets of variates. Only a small amount of research has been done in this direction, however. A résumé of this work will follow the introduction of necessary notation and assumptions.
Let $X_1\ (p_1 \times 1)$, $X_2\ (p_2 \times 1)$, $\ldots$, $X_m\ (p_m \times 1)$ represent the $m\ (>2)$ sets of random variables, arranged so that $p_1 \le p_2 \le \cdots \le p_m$, and let $X' = (X_1', X_2', \ldots, X_m')$. Assume that

$$\operatorname{var}(X) = \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \cdots & \Sigma_{1m} \\ \Sigma_{21} & \Sigma_{22} & \cdots & \Sigma_{2m} \\ \vdots & \vdots & & \vdots \\ \Sigma_{m1} & \Sigma_{m2} & \cdots & \Sigma_{mm} \end{bmatrix}$$

is positive definite. Then there exist nonsingular matrices $T_i$ such that $\Sigma_{ii} = T_iT_i'$, $i = 1, 2, \ldots, m$. If $U_i = T_i^{-1}X_i$ and $U' = (U_1', U_2', \ldots, U_m')$, then

$$\operatorname{var}(U) = R = \begin{bmatrix} I & R_{12} & \cdots & R_{1m} \\ R_{21} & I & \cdots & R_{2m} \\ \vdots & \vdots & & \vdots \\ R_{m1} & R_{m2} & \cdots & I \end{bmatrix}.$$

The final variables of interest are $Z_i = B_i^*U_i$, where $B_i^*$ is an orthogonal matrix. The matrices

$$B_i^* = \begin{bmatrix} {}_i\mathbf b_1^{*\prime} \\ \vdots \\ {}_i\mathbf b_{p_i}^{*\prime} \end{bmatrix} \qquad\text{and}\qquad B_i = B_i^*T_i^{-1}$$

are defined so that $Z_i = B_i^*T_i^{-1}X_i = B_iX_i$.
Introducing quasi-diagonal matrices

$$D_T = \operatorname{diag}(T_1 \mid T_2 \mid \cdots \mid T_m), \qquad D_B = \operatorname{diag}(B_1 \mid B_2 \mid \cdots \mid B_m), \qquad D_{B^*} = \operatorname{diag}(B_1^* \mid B_2^* \mid \cdots \mid B_m^*),$$

we have

(2.1) $D_{B^*}D_{B^*}' = D_{B^*}'D_{B^*} = I$, $D_{B^*} = D_BD_T$, $R = D_T^{-1}\Sigma D_T^{-1\prime}$,

and

$$\Phi = D_B\Sigma D_B' = D_{B^*}RD_{B^*}'.$$

Two more quasi-diagonal matrices,

$$D_{B^*(i)} = \operatorname{diag}({}_1\mathbf b_i^{*\prime} \mid {}_2\mathbf b_i^{*\prime} \mid \cdots \mid {}_m\mathbf b_i^{*\prime}) \qquad\text{and}\qquad \Phi_{(i)} = D_{B^*(i)}RD_{B^*(i)}',$$

are defined for $i = 1, 2, \ldots, p_1$. $\Phi_{(i)}$ is just the variance-covariance (or correlation) matrix of $({}_1Z_i, {}_2Z_i, \ldots, {}_mZ_i)$. It follows from (2.1) that

(2.2) $D_{B^*(i)}D_{B^*(i)}' = I$,

and

(2.3) $D_{B^*(i)}D_{B^*(j)}' = 0$ for $i = 1, 2, \ldots, p_1$; $j = 1, 2, \ldots, i-1$.

From the point of view of the present research, a bona fide generalization of Hotelling's procedure must

(i) define a logical criterion for $\Phi_{(i)}$, $i = 1, 2, \ldots, p_1$;

(ii) satisfy (2.2) and (2.3);

(iii) reduce to Hotelling's solution when $m = 2$.

When all the $p_i$'s are not equal, the remaining elements of $D_{B^*}$ may be specified by an appropriate orthogonal completion process. But this point is of little importance and will not be considered further. (Even if all the $p_i$'s are equal, $D_{B^*}$ need not be unique.) Here the term "canonical analysis" will refer to the formal extraction of relations among the sets as defined by the criterion.
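For later reference, a sketch of the bookkeeping in (2.1)-(2.3) (the function names are ours; each $T_i$ is taken to be the symmetric square root of $\Sigma_{ii}$):

```python
import numpy as np
from scipy.linalg import sqrtm

def whiten_blocks(S, sizes):
    """Return R = D_T^{-1} S D_T^{-1}' for block sizes p_1, ..., p_m."""
    idx = np.cumsum([0] + list(sizes))
    D = np.zeros_like(S)
    for a, b in zip(idx[:-1], idx[1:]):
        D[a:b, a:b] = np.linalg.inv(np.real(sqrtm(S[a:b, a:b])))
    return D @ S @ D.T

def phi_i(R, B_star, sizes, i):
    """Phi_(i): m x m correlation matrix of (1Z_i, ..., mZ_i), where
    B_star[j] is the orthogonal matrix B_j^* for set j."""
    idx = np.cumsum([0] + list(sizes))
    m = len(sizes)
    Dbi = np.zeros((m, R.shape[0]))
    for j in range(m):
        Dbi[j, idx[j]:idx[j + 1]] = B_star[j][i]   # ith row of B_j^*
    return Dbi @ R @ Dbi.T
```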
Previous work on generalizations has been done by Vinograde (1950), Steel (1951), Horst (1961), and McKeon (1966). Not all of their proposals satisfy conditions (i) and (ii) of the last paragraph although all satisfy condition (iii).

In 1935 Wilks [16] developed the likelihood ratio test for independence of sets of variates. The parametric analogue of his test statistic is just $|R|$, or the product of the eigenvalues of $R$. Hotelling [7] observed that the eigenvalues of $(R - I)$ (and hence of $R$) are invariant under internal nonsingular transformations of the $X_i$'s. McKeon [9] has defined the generalized canonical correlation as $(\gamma - 1)/(m - 1)$, where $\gamma$ is the largest eigenvalue of $R$. Recall from Corollary 1.1.3 that when $m = 2$ this is just the first canonical correlation.
However, when $m > 2$ there will usually not be an associated eigenvector whose components generate $m$ standardized variates. Nevertheless, if these $m$ derived variates are subsequently standardized, then the resulting variance-covariance matrix, $\Phi_{(1)}$, will give the best least squares or Euclidean norm approximation to a rank one matrix in the class of possible $\Phi_{(1)}$ matrices associated with $R$. This criterion was suggested by Horst ([4], [5]) and called by him the rank one approximation method. It extends logically to $p_1$ stages in accordance with (2.2) and (2.3).

Two similar techniques, the oblique maximum variance method and the orthogonal maximum variance method, have also been suggested by Horst ([4], [5]). The former is based upon the $p_1$ largest eigenvalues and associated vectors of $R$. The induced variates ($p_1$ per set) are subsequently standardized, but they usually will not be uncorrelated within sets. The latter technique is a revision of the former which incorporates (2.3) as a constraint.
Horst named his most recognized contribution the maximum correlation method ([2], [4], and [5]). A better name from our point of view would be the "sum of correlations" (SUMCOR) method. One starts by selecting ${}_1Z_1, {}_2Z_1, \ldots, {}_mZ_1$ (subject to (2.2)) which maximize the sum of the correlations, $\mathbf 1'\Phi_{(1)}\mathbf 1 - m$, or, equivalently, minimize

$$\sum_{i=1}^m\sum_{j=1}^m \operatorname{var}({}_iZ_1 - {}_jZ_1).$$

To continue, a $D_{B^*(j)}$ which maximizes $\mathbf 1'\Phi_{(j)}\mathbf 1$, subject to (2.2) and (2.3), is found for $j = 2, 3, \ldots, p_1$. At the $s$th stage one must solve

(2.4) $\left(I - \sum_{i=1}^{s-1} D_{B^*(i)}'D_{B^*(i)}\right)(R - I)\left(I - \sum_{i=1}^{s-1} D_{B^*(i)}'D_{B^*(i)}\right)D_{B^*(s)}' = D_{B^*(s)}'\Lambda_s$

simultaneously for $D_{B^*(s)}$ and $\Lambda_s = \operatorname{diag}({}_1\lambda_s, \ldots, {}_m\lambda_s)$ such that $\mathbf 1'\boldsymbol\lambda_s\ (= \mathbf 1'\Phi_{(s)}\mathbf 1 - m)$ is a maximum. Horst suggests an intuitively appealing iterative procedure to do this; the properties of the procedure, however, are unknown.
Horst's other three methods, being essentially eigenvalue problems, admit straightforward solutions. For the rank one approximation method, the largest eigenvalue of $\Phi_{(s)}$ is, as Horst remarked, a measure of the fit to a rank one matrix. If all the off-diagonal elements of $\Phi_{(s)}$ are near unity, then the largest eigenvalue will be near $m$ and $\Phi_{(s)}$ will, loosely speaking, be "nearly of rank one." This suggests that the rank one method may yield useful approximations for the more difficult SUMCOR problem.
Vinograde [15] proposed a quite different method. The crux of his idea is to select a $B_i^*$ ($i = 1, 2, \ldots, m$) which diagonalizes $\sum_{j=1}^m R_{ij}R_{ji}$. Referring to Theorem 1.1 this is easily seen to be a generalization of the two set canonical analysis. This approach is mainly of mathematical interest in that it provides a simple canonical form for partitioned positive definite matrices.

The earliest idea of statistical interest came from Steel [13], a student of Vinograde. He established a system of equations from which one, in theory, could find a $D_{B^*(1)}$ yielding a minimum value of $|\Phi_{(1)}|$. In other words, the resulting standardized variables ${}_1Z_1, {}_2Z_1, \ldots, {}_mZ_1$ possess minimum generalized variance (GENVAR). Steel argued that, for any $D_{B^*}$, $|\Phi_{(1)}|$ would appear as a diagonal element of the $m$th compound matrix of $\Phi$. Fixing attention on this element, derivative equations are found which, along with orthogonality restrictions, are $\sum_{i=1}^m p_i^2$ in number with as many unknowns, the elements of $B_1^*, B_2^*, \ldots, B_m^*$. These equations are nonlinear and difficult to solve. Some solution will minimize $|\Phi_{(1)}|$.
Finally, we supplement this list with a new technique, the sum of squares of correlations (SSQCOR) method. Here ${}_1Z_1, {}_2Z_1, \ldots, {}_mZ_1$ are generated by a $D_{B^*(1)}$ which maximizes $(\operatorname{tr}\Phi_{(1)}^2 - m)$ or

$$\sum_{i=1}^m\sum_{\substack{j=1 \\ j\ne i}}^m \{\operatorname{corr}({}_iZ_1, {}_jZ_1)\}^2.$$

For $j = 2, 3, \ldots, p_1$, the same criterion determines $D_{B^*(j)}$ subject to (2.2) and (2.3).
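All three criteria are elementary functions of the $m \times m$ matrix $\Phi_{(i)}$; a sketch (the function names are ours):

```python
import numpy as np

def sumcor(phi):
    """SUMCOR: sum of off-diagonal correlations, 1' phi 1 - m."""
    return phi.sum() - phi.shape[0]

def ssqcor(phi):
    """SSQCOR: sum of squared off-diagonal correlations, tr(phi^2) - m."""
    return (phi ** 2).sum() - phi.shape[0]

def genvar(phi):
    """GENVAR: generalized variance |phi| (to be minimized)."""
    return np.linalg.det(phi)
```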
For statistical purposes, the SUMCOR, GENVAR and SSQCOR criteria seem to be the most relevant. The choice of criterion should, of course, be related to the problem at hand. The present research is primarily concerned with the GENVAR and SSQCOR criteria. To see that they may be superior in certain situations to the SUMCOR criterion, consider the following example.
Let $m = 3$, $p_1 = p_2 = p_3 = 2$, and

(2.5)

$$R = \begin{bmatrix} 1.0 & 0.0 & 0.3 & 0.0 & 0.3 & 0.0 \\ 0.0 & 1.0 & 0.0 & 0.1 & 0.0 & 0.1 \\ 0.3 & 0.0 & 1.0 & 0.0 & -0.3 & 0.0 \\ 0.0 & 0.1 & 0.0 & 1.0 & 0.0 & 0.1 \\ 0.3 & 0.0 & -0.3 & 0.0 & 1.0 & 0.0 \\ 0.0 & 0.1 & 0.0 & 0.1 & 0.0 & 1.0 \end{bmatrix}.$$

Then

(2.6) $D_{B^*(1)} = \operatorname{diag}\{(1.0\ \ 0.0) \mid (1.0\ \ 0.0) \mid (1.0\ \ 0.0)\}$

satisfies (2.4) with $\mathbf 1'\Phi_{(1)}\mathbf 1 = 3.6$, as does

(2.7) $D_{B^*(1)} = \operatorname{diag}\{(0.0\ \ 1.0) \mid (0.0\ \ 1.0) \mid (0.0\ \ 1.0)\}$

with $\mathbf 1'\Phi_{(1)}\mathbf 1 = 3.6$. Evidently there is indifference between the two weighting systems (2.6) and (2.7) with respect to the SUMCOR criterion. Intuitively one may feel that the first system of weights provides a better summary of the relations among the three sets. Both the GENVAR and the SSQCOR criteria would corroborate this feeling.
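A sketch reproducing the comparison for this example (using the $R$ of (2.5) as reconstructed above; the helper follows the conventions of the earlier sketches):

```python
import numpy as np

G = np.diag([0.3, 0.1])                        # coupling for pairs (1,2), (1,3)
H = np.diag([-0.3, 0.1])                       # coupling for pair (2,3)
I2 = np.eye(2)
R = np.block([[I2, G, G],
              [G, I2, H],
              [G, H, I2]])

def phi1(b):
    """Correlation matrix of (1Z_1, 2Z_1, 3Z_1) for unit weight vectors."""
    D = np.zeros((3, 6))
    for j, bj in enumerate(b):
        D[j, 2 * j: 2 * j + 2] = bj
    return D @ R @ D.T

for b in ([np.array([1.0, 0.0])] * 3,          # system (2.6)
          [np.array([0.0, 1.0])] * 3):         # system (2.7)
    P = phi1(b)
    print(P.sum(),                             # SUMCOR: 3.6 in both cases
          (P ** 2).sum() - 3,                  # SSQCOR: 0.54 versus 0.06
          np.linalg.det(P))                    # GENVAR: 0.676 versus 0.972
```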
Other situations may favor SUMCOR over GENVAR or SSQCOR. More often, in the selection of ${}_1Z_1, {}_2Z_1, \ldots, {}_mZ_1$, the three criteria produce similar results. This tendency has been observed in a number of real data studies.
The Generalized Variance Criterion

Steel's problem may be handled without recourse to compound matrices by repeated use of a simple determinantal expansion. The final equations are in a form amenable to canonical analysis.

Writing $[A]_{(ij)}$ for the matrix obtained from $A$ by deleting the $i$th row and $j$th column, define

$$f_i = |\Phi_{(i)}|, \qquad {}_jM_i = [\Phi_{(i)}]_{(jj)},$$

$${}_jN_i = (\Sigma_{j1}\,{}_1\mathbf b_i \mid \cdots \mid \Sigma_{j\,j-1}\,{}_{j-1}\mathbf b_i \mid \Sigma_{j\,j+1}\,{}_{j+1}\mathbf b_i \mid \cdots \mid \Sigma_{jm}\,{}_m\mathbf b_i),$$

$${}_jN_i^* = (R_{j1}\,{}_1\mathbf b_i^* \mid \cdots \mid R_{j\,j-1}\,{}_{j-1}\mathbf b_i^* \mid R_{j\,j+1}\,{}_{j+1}\mathbf b_i^* \mid \cdots \mid R_{jm}\,{}_m\mathbf b_i^*),$$

$${}_jP_i = {}_jN_i\,{}_jM_i^{-1}\,{}_jN_i', \qquad {}_jP_i^* = {}_jN_i^*\,{}_jM_i^{-1}\,{}_jN_i^{*\prime}.$$
The $\lambda$'s and $\gamma$'s will be Lagrange multipliers corresponding to conditions (2.2) and (2.3) respectively. $f_i$ may now be expanded into the form

(2.8) $f_i = |{}_jM_i|\,(1 - {}_j\mathbf b_i^{*\prime}\,{}_jP_i^*\,{}_j\mathbf b_i^*), \qquad j = 1, 2, \ldots, m.$

The salient feature of (2.8) is that ${}_jM_i$ and ${}_jP_i^*$ do not involve ${}_j\mathbf b_i^*$. At the first stage ($i = 1$), form the Lagrangian equation

$$g_1 = f_1 - \sum_{j=1}^m (1 - {}_j\mathbf b_1^{*\prime}\,{}_j\mathbf b_1^*)\,|{}_jM_1|\,{}_j\lambda_1.$$

Differentiating symbolically with respect to ${}_j\mathbf b_1^*$ and equating the derivative to zero yields

(2.9) $({}_jP_1^* - {}_j\lambda_1 I)\,{}_j\mathbf b_1^* = 0, \qquad j = 1, 2, \ldots, m.$

The simultaneous solution of these $m$ "eigenvalue" equations involves the determination of a $D_{B^*(1)}$ and $\boldsymbol\lambda_1$ such that (2.9) holds with $f_1$ as small as possible.
Moving to the second stage, the Lagrangian equation is

$$g_2 = f_2 - \sum_{j=1}^m (1 - {}_j\mathbf b_2^{*\prime}\,{}_j\mathbf b_2^*)\,|{}_jM_2|\,{}_j\lambda_2 - 2\sum_{j=1}^m {}_j\mathbf b_2^{*\prime}\,{}_j\mathbf b_1^*\,|{}_jM_2|\,{}_j\gamma_{21},$$

which, equating the derivative to zero, becomes

(2.10) $0 = {}_jP_2^*\,{}_j\mathbf b_2^* - {}_j\lambda_2\,{}_j\mathbf b_2^* - {}_j\gamma_{21}\,{}_j\mathbf b_1^*$

with

$${}_j\lambda_2 = {}_j\mathbf b_2^{*\prime}\,{}_jP_2^*\,{}_j\mathbf b_2^*, \qquad {}_j\gamma_{21} = {}_j\mathbf b_1^{*\prime}\,{}_jP_2^*\,{}_j\mathbf b_2^*, \qquad f_2 = |{}_jM_2|\,(1 - {}_j\lambda_2) \quad\text{for } j = 1, 2, \ldots, m.$$

Rewriting (2.10),

$$0 = [(I - {}_j\mathbf b_1^*\,{}_j\mathbf b_1^{*\prime})\,{}_jP_2^* - {}_j\lambda_2 I]\,{}_j\mathbf b_2^*$$

for $j = 1, 2, \ldots, m$.

For higher stages $s$ (up to $p_1$) the situation is completely analogous. Only the key equations are given:
(2.11) $0 = \left[\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right){}_jP_s^* - {}_j\lambda_s I\right]{}_j\mathbf b_s^*,$

(2.12) ${}_j\lambda_s = {}_j\mathbf b_s^{*\prime}\,{}_jP_s^*\,{}_j\mathbf b_s^*, \qquad {}_j\gamma_{si} = {}_j\mathbf b_i^{*\prime}\,{}_jP_s^*\,{}_j\mathbf b_s^*, \qquad f_s = |{}_jM_s|\,(1 - {}_j\lambda_s),$

for $i = 1, 2, \ldots, s-1$; $j = 1, 2, \ldots, m$.

(2.11) and (2.12) may be reformulated in terms of the original variables; then, for $j = 1, 2, \ldots, m$,

(2.13) $0 = \left[\left(I - \Sigma_{jj}^{1/2}\sum_{k=1}^{s-1}{}_j\mathbf b_k\,{}_j\mathbf b_k'\,\Sigma_{jj}^{1/2}\right)\Sigma_{jj}^{-1/2}\,{}_jP_s\,\Sigma_{jj}^{-1/2} - {}_j\lambda_s I\right]\Sigma_{jj}^{1/2}\,{}_j\mathbf b_s,$

(2.14) ${}_j\lambda_s = {}_j\mathbf b_s'\,{}_jP_s\,{}_j\mathbf b_s, \qquad {}_j\gamma_{si} = {}_j\mathbf b_i'\,{}_jP_s\,{}_j\mathbf b_s, \qquad f_s = |{}_jM_s|\,(1 - {}_j\lambda_s).$

In practice one would solve either (2.11) or (2.13). An iterative procedure for solving (2.11) will be described. The procedure, with only minor modification accounting for the transformation of variables, is equally applicable to (2.13).
The following argument should provide some insight into the procedure. Suppose $\bar D_{B^*(s)}$ and $\bar{\boldsymbol\lambda}_s$ comprise an optimal solution of (2.11). Then, for any $j$, ${}_j\bar\lambda_s$ and ${}_j\bar{\mathbf b}_s^*$ are consistent with a canonical analysis subject to $(s-1)$ independent linear homogeneous constraints, defined by $(s-1)$ independent rows of $\sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}$, on the coefficient vectors of the first set (cf. Theorem 1.2). This may be seen by rewriting (2.11) in the form

(2.15) $\left[\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right){}_jP_s^* - {}_j\lambda_s I\right]\mathbf t^* = 0, \qquad \mathbf t^* \in V_{p_j+1-s} = V\!\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right),$

where ${}_j\mathbf b_s^*$ is in the same eigenspace as $\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right){}_j\mathbf b_s^*$. Also $\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right){}_j\mathbf b_i^* = 0$ for $i = 1, 2, \ldots, s-1$, and

$$\operatorname{rank}\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right) = \operatorname{tr}\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right) = p_j - \operatorname{tr}\left(\sum_{k=1}^{s-1}{}_j\mathbf b_k^{*\prime}\,{}_j\mathbf b_k^*\right) = p_j + 1 - s.$$

The first step is a consequence of the idempotency of $\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right)$. The restriction $\mathbf t^* \in V_{p_j+1-s}$ is equivalent to (2.3). A final observation is that $0 \le {}_j\lambda_s < 1$, from which one may show that $0 < f_s \le 1$.
Suppose now that variables have been chosen for the first $(s-1)$ stages and consider the following iterative procedure for generating an optimal $s$th stage solution:

(i) specify initial variables ${}_jZ_s^{(0)}$ by vectors ${}_j\mathbf b_s^{*(0)}$ which satisfy orthogonality conditions like (2.2) and (2.3) for the required ${}_j\mathbf b_s^*$ vectors;

(ii) for $n = 1, 2, \ldots$ solve

$$\left[\left(I - \sum_{k=1}^{s-1}{}_j\mathbf b_k^*\,{}_j\mathbf b_k^{*\prime}\right){}_jP_s^{*(n)} - {}_j\lambda_s^{(n)} I\right]{}_j\mathbf b_s^{*(n)} = 0, \qquad j = 1, 2, \ldots, m,$$

obtaining the largest eigenvalue ${}_j\lambda_s^{(n)}$ and an associated (normalized) eigenvector ${}_j\mathbf b_s^{*(n)}$. Here ${}_jP_s^{*(n)}$ and ${}_jM_s^{(n)}$ (used below) are computed as ${}_jP_s^*$ and ${}_jM_s$ using ${}_1\mathbf b_s^{*(n)}, \ldots, {}_{j-1}\mathbf b_s^{*(n)}, {}_{j+1}\mathbf b_s^{*(n-1)}, \ldots, {}_m\mathbf b_s^{*(n-1)}$. The generalized variance of ${}_1Z_s^{(n)}, \ldots, {}_jZ_s^{(n)}, {}_{j+1}Z_s^{(n-1)}, \ldots, {}_mZ_s^{(n-1)}$ is then

$${}_jf_s^{(n)} = |{}_jM_s^{(n)}|\,(1 - {}_j\lambda_s^{(n)}).$$

This procedure, as will be shown, must converge (monotonically in ${}_jf_s^{(n)}$) to a solution of (2.11). The solution will be optimal if the initial variables are appropriately chosen. A sufficient condition for the solution to be optimal is for the generalized variance of the ${}_jZ_s^{(0)}$ variables to be less than the generalized variance of any non-optimal ${}_jZ_s$ variables yielding a solution to (2.11).
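A first-stage ($s = 1$) rendering of the procedure in code may make the sweep structure plainer. In this sketch (loop organization, tolerance, and function names are our choices) each pass solves the eigenvalue equation (2.9) for one set while the others are held fixed:

```python
import numpy as np

def phi_matrix(R, sizes, b):
    """m x m correlation matrix Phi_(1) of the current Z variables."""
    idx = np.cumsum([0] + list(sizes))
    D = np.zeros((len(sizes), R.shape[0]))
    for j, bj in enumerate(b):
        D[j, idx[j]:idx[j + 1]] = bj
    return D @ R @ D.T

def genvar_stage1(R, sizes, b0, tol=1e-6, max_iter=200):
    """GENVAR iteration for the first-stage weight vectors.

    R is the whitened covariance matrix with identity diagonal blocks;
    b0 is a list of m initial unit vectors, one per set.
    """
    idx = np.cumsum([0] + list(sizes))
    m = len(sizes)
    b = [v / np.linalg.norm(v) for v in b0]
    lam = np.zeros(m)
    for _ in range(max_iter):
        lam_old = lam.copy()
        for j in range(m):
            aj, bj_ = idx[j], idx[j + 1]
            # jN_1*: correlations of set j with the current Z's of the others
            N = np.column_stack([R[aj:bj_, idx[k]:idx[k + 1]] @ b[k]
                                 for k in range(m) if k != j])
            # jM_1: current Phi_(1) with row and column j deleted
            M = np.delete(np.delete(phi_matrix(R, sizes, b), j, 0), j, 1)
            P = N @ np.linalg.solve(M, N.T)          # jP_1*
            w, V = np.linalg.eigh(P)
            lam[j] = w[-1]                           # largest eigenvalue
            b[j] = V[:, -1]                          # associated eigenvector
        if np.abs(lam - lam_old).sum() < tol:        # termination as in text
            break
    return b, lam
```

The generalized variance after each update is $|{}_jM_1|\,(1 - {}_j\lambda_1)$, so keeping the largest eigenvalue drives $f_1$ downward, exactly the monotone behavior claimed above.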
The nature of the calculations is such that (cf. Theorems 1.2 and 1.3(iii))

(2.16) ${}_1f_s^{(1)} \ge {}_2f_s^{(1)} \ge \cdots \ge {}_mf_s^{(1)} \ge {}_1f_s^{(2)} \ge \cdots.$

Thus, since ${}_jf_s^{(n)}$ is positive,

(2.17) $\lim_{n\to\infty} {}_jf_s^{(n)} = \bar f_s$ (say), $j = 1, 2, \ldots, m$.

Let

$${}_j\tilde\lambda_s^{(n)} = {}_j\mathbf b_s^{*(n-1)\prime}\,{}_jP_s^{*(n)}\,{}_j\mathbf b_s^{*(n-1)}$$

and ${}_0f_s^{(n)} = {}_mf_s^{(n-1)}$. Then

$$\lim_{n\to\infty}\frac{{}_jf_s^{(n)}}{{}_{j-1}f_s^{(n)}} = \lim_{n\to\infty}\frac{|{}_jM_s^{(n)}|\,(1 - {}_j\lambda_s^{(n)})}{|{}_jM_s^{(n)}|\,(1 - {}_j\tilde\lambda_s^{(n)})} = 1,$$

(2.18) and $\lim_{n\to\infty}\left({}_j\lambda_s^{(n)} - {}_j\tilde\lambda_s^{(n)}\right) = 0$ for $j = 1, 2, \ldots, m$ (since $0 < 1 - {}_j\lambda_s^{(n)} \le 1$).

Hence, in the limit, ${}_j\tilde\lambda_s^{(n)} = {}_j\lambda_s^{(n)}$, and ${}_j\mathbf b_s^{*(n-1)}$ and ${}_j\mathbf b_s^{*(n)}$ are in the same eigenspace, the space corresponding to the largest eigenvalue ${}_j\lambda_s^{(n)}$ of ${}_jP_s^{*(n)}$.

Suppose the ${}_jZ_s$ generated by ${}_j\bar{\mathbf b}_s^*$, $j = 1, 2, \ldots, m$, have generalized variance $\bar f_s$. Let

$${}_j\bar\lambda_s = {}_j\bar{\mathbf b}_s^{*\prime}\,{}_j\bar P_s^*\,{}_j\bar{\mathbf b}_s^*;$$

then

$$\bar f_s = |{}_j\bar M_s|\,(1 - {}_j\bar\lambda_s), \qquad j = 1, 2, \ldots, m.$$

Furthermore, $\bar D_{B^*(s)}$ and $\bar{\boldsymbol\lambda}_s$ comprise a solution to the partial derivative equations (2.11), since $\bar f_s$ cannot be decreased by changing any one of the ${}_j\bar{\mathbf b}_s^*$.

In practice, where ${}_j\lambda_s^{(n)}$ will always be of multiplicity one, ${}_j\mathbf b_s^{*(n-1)}$ will equal ${}_j\mathbf b_s^{*(n)}$, apart from a scalar factor $(-1)$, in the limit. With no loss in generality the ${}_j\mathbf b_s^{*(n)}$ may be adjusted in sign so that

(2.19) $\lim_{n\to\infty} {}_j\mathbf b_s^{*(n)} = {}_j\bar{\mathbf b}_s^*$ (to be used as above),

(2.20) $\lim_{n\to\infty} {}_jM_s^{(n)} = {}_j\bar M_s$,

(2.21) $\lim_{n\to\infty} {}_j\lambda_s^{(n)} = {}_j\bar\lambda_s$,

(2.17) holds, and $\bar D_{B^*(s)}$ and $\bar{\boldsymbol\lambda}_s$ form a solution of (2.11).
The Sum of Squares of Correlations Criterion

The SSQCOR method is developed in much the same way as the GENVAR method. The main changes are in the definition of the criterion, the interpretation of the Lagrange multipliers, and the composition of the matrices of the "eigenvalue" equations. Redefining some notation from the last section,

$$f_i = \operatorname{tr}\Phi_{(i)}^2 - m \qquad\text{and}\qquad {}_jP_i^* = {}_jN_i^*\,{}_jN_i^{*\prime} \quad\text{for } i = 1, 2, \ldots, p_1 \text{ and } j = 1, 2, \ldots, m.$$

At the $s$th stage ($1 \le s \le p_1$) the Lagrangian equation is

$$g_s = f_s + 2\sum_{j=1}^m (1 - {}_j\mathbf b_s^{*\prime}\,{}_j\mathbf b_s^*)\,{}_j\lambda_s - 4\sum_{i=1}^{s-1}\sum_{j=1}^m {}_j\mathbf b_s^{*\prime}\,{}_j\mathbf b_i^*\,{}_j\gamma_{si},$$

which, equating the derivative to zero, gives

$$0 = {}_jP_s^*\,{}_j\mathbf b_s^* - {}_j\lambda_s\,{}_j\mathbf b_s^* - \sum_{i=1}^{s-1}{}_j\mathbf b_i^*\,{}_j\gamma_{si},$$

with

$${}_j\gamma_{si} = {}_j\mathbf b_i^{*\prime}\,{}_jP_s^*\,{}_j\mathbf b_s^*, \qquad {}_j\lambda_s = {}_j\mathbf b_s^{*\prime}\,{}_jP_s^*\,{}_j\mathbf b_s^* = \sum_{\substack{k=1 \\ k\ne j}}^m ({}_j\mathbf b_s^{*\prime} R_{jk}\,{}_k\mathbf b_s^*)^2,$$

and

$$f_s = \sum_{j=1}^m {}_j\lambda_s, \qquad i = 1, 2, \ldots, s-1; \quad j = 1, 2, \ldots, m.$$
The key equations (2.11)-(2.15) are also valid here with, of course, the new definitions of ${}_jP_s$ and ${}_jP_s^*$. To obtain an optimal solution one finds some $D_{B^*(s)}$ (or $D_{B(s)}$) and $\boldsymbol\lambda_s$ such that (2.11) (or (2.13)) holds with $f_s$ as large as possible. The iterative procedure described in the last section may be used to produce these solutions. In this context

$${}_j\tilde\lambda_s^{(n)} = {}_j\mathbf b_s^{*(n-1)\prime}\,{}_jP_s^{*(n)}\,{}_j\mathbf b_s^{*(n-1)}.$$

The calculations guarantee that (cf. Theorem 1.3(i))

$${}_1f_s^{(1)} \le {}_2f_s^{(1)} \le \cdots \le {}_mf_s^{(1)} \le {}_1f_s^{(2)} \le \cdots.$$

Since $0 \le {}_jf_s^{(n)} \le m(m-1)$, equation (2.17) is valid here. Also

$${}_jf_s^{(n)} - {}_{j-1}f_s^{(n)} = 2\left({}_j\lambda_s^{(n)} - {}_j\tilde\lambda_s^{(n)}\right),$$

so that (2.18) holds too.
Now it is apparent that the SSQCOR iterative procedure has convergence properties like those mentioned for the GENVAR procedure. In particular (2.19), (2.20), and (2.21) hold, as well as (2.17); $\bar D_{B^*(s)}$ and $\bar{\boldsymbol\lambda}_s$ make up a solution of (2.11). A sufficient condition for the solution to be optimal is for $\sum_{i=1}^m\sum_{j\ne i}\{\operatorname{corr}({}_iZ_s^{(0)}, {}_jZ_s^{(0)})\}^2$ to be greater than the corresponding sum for any sub-optimal ${}_jZ_s$ variables connected with a solution of (2.11).
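The corresponding first-stage SSQCOR sweep differs from the GENVAR sketch only in the matrix of the eigenvalue problem (here ${}_jP_1^* = {}_jN_1^*\,{}_jN_1^{*\prime}$, with no inner inverse) and in seeking a maximum of $f_s$. A sketch under the same conventions as before:

```python
import numpy as np

def ssqcor_stage1(R, sizes, b0, tol=1e-6, max_iter=200):
    """First-stage SSQCOR iteration: maximize the sum of squared
    off-diagonal correlations of (1Z_1, ..., mZ_1)."""
    idx = np.cumsum([0] + list(sizes))
    m = len(sizes)
    b = [v / np.linalg.norm(v) for v in b0]
    lam = np.zeros(m)
    for _ in range(max_iter):
        lam_old = lam.copy()
        for j in range(m):
            N = np.column_stack([R[idx[j]:idx[j + 1], idx[k]:idx[k + 1]] @ b[k]
                                 for k in range(m) if k != j])
            P = N @ N.T                      # jP_1* = jN_1* jN_1*'
            w, V = np.linalg.eigh(P)
            lam[j] = w[-1]                   # largest eigenvalue
            b[j] = V[:, -1]
        if np.abs(lam - lam_old).sum() < tol:
            break
    return b, lam
```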
Some Calculations and Comparisons

Horst ([3] and [4]; see also [5]) illustrated his methods using data from Thurstone and Thurstone [14]. The variables consisted of nine test items broken down into three sets of three items each. Each item measured one of three abilities, and each set contained one measure of each ability.
$$R = [\text{the } 9 \times 9 \text{ correlation matrix of the nine items, partitioned into three } 3 \times 3 \text{ sets}]$$

Horst transformed the original $R$ matrix into a second $9 \times 9$ matrix of the same block form.
The iterative procedure for the SSQCOR criterion was applied to $R$ using five different sets of initial vectors:

(i) equal weight on all variables

(2.22) ${}_j\mathbf b_1^{*(0)\prime} = (0.5735,\ 0.5735,\ 0.5735)$, $j = 1, 2, 3$;

(ii) all weight on first variables

(2.23) ${}_j\mathbf b_1^{*(0)\prime} = (1.0,\ 0.0,\ 0.0)$, $j = 1, 2, 3$;

(iii) all weight on last variables

(2.24) ${}_j\mathbf b_1^{*(0)\prime} = (0.0,\ 0.0,\ 1.0)$, $j = 1, 2, 3$;

(iv) Horst's second stage ($s = 2$) solution to (2.4)

(2.25) ${}_1\mathbf b_1^{*(0)\prime} = (-0.6806,\ 0.5743,\ 0.4550)$; ${}_2\mathbf b_1^{*(0)\prime} = (-0.7524,\ 0.5557,\ 0.3536)$; ${}_3\mathbf b_1^{*(0)\prime} = (-0.7324,\ 0.5477,\ 0.4045)$;

(v) Horst's third stage ($s = 3$) solution to (2.4)

(2.26) ${}_1\mathbf b_1^{*(0)\prime} = (0.0228,\ -0.6372,\ 0.7703)$; ${}_2\mathbf b_1^{*(0)\prime} = (-0.0122,\ -0.5485,\ 0.8361)$; ${}_3\mathbf b_1^{*(0)\prime} = (0.0604,\ -0.5395,\ 0.8398)$.
The iterative procedure for the GENVAR criterion was tested using (2.22) as a starting point. The iterative procedure was terminated in each case as soon as

$$\sum_{j=1}^3 \left|{}_j\lambda_1^{(n)} - {}_j\lambda_1^{(n-1)}\right| < 0.0001.$$

The values of ${}_3f_1^{(n)}$ are shown in Table 1. (The SSQCOR figures in this table are subject to error in the sixth decimal place.) Table 2 displays the final ${}_j\mathbf b_1^{*(n)}$ vectors including those obtained by Horst for the SUMCOR criterion. Table 3 gives the values of the three criteria for each of the solutions listed in Table 2.
It is noteworthy that the five different sets of initial vectors generated essentially the same solution to the SSQCOR problem. And apparently this solution is an optimal one. [To see that the procedure does not always work independent of the initial vectors, apply the SSQCOR procedure to (2.5) starting with (2.7).] In fact the solutions are nearly the same for all of the criteria. This is not surprising in view of the high positive correlations on the diagonals of the $R_{ij}$ blocks and the generally low absolute correlations elsewhere in the $R_{ij}$ blocks.
Computations for this example were done on an IBM 360 Model 75 computer. The time required to compile the program and investigate three or four different starting points was about one minute.
Table 2

Final Iterated Vectors

Criterion:        SUMCOR   SSQCOR   SSQCOR   SSQCOR   SSQCOR   SSQCOR   GENVAR
Starting Point:     --     (2.22)   (2.23)   (2.24)   (2.25)   (2.26)   (2.22)

 1b1*             0.7323   0.7338   0.7347   0.7338   0.7338   0.7348   0.7371
                  0.5139   0.5123   0.5115   0.5124   0.5123   0.5115   0.5076
                  0.4468   0.4462   0.4456   0.4462   0.4461   0.4455   0.4460

 2b1*             0.6586   0.6603   0.6612   0.6603   0.6604   0.6613   0.6636
                  0.6247   0.6233   0.6226   0.6233   0.6233   0.6226   0.6197
                  0.4195   0.4189   0.4185   0.4190   0.4189   0.4185   0.4192

 3b1*             0.6781   0.6795   0.6802   0.6795   0.6796   0.6803   0.6811
                  0.6395   0.6383   0.6378   0.6383   0.6383   0.6378   0.6361
                  0.3621   0.3616   0.3612   0.3617   0.3616   0.3612   0.3626
Table 3

Comparison of Criteria Values

Solution          Value of SUMCOR   Value of SSQCOR   Value of GENVAR
SUMCOR                 4.4693            3.32926          0.161680
SSQCOR (2.22)          4.4695            3.32981          0.161615
SSQCOR (2.23)          4.4695            3.32981          0.161613
SSQCOR (2.24)          4.4695            3.32981          0.161615
SSQCOR (2.25)          4.4695            3.32981          0.161615
SSQCOR (2.26)          4.4695            3.32981          0.161613
GENVAR                 4.4694            3.32976          0.161606
Appendix

Lemma A1  Let $A$ and $B$ be $(p \times p)$ matrices with $A$ symmetric and $B$ positive definite. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ be the eigenvalues of $B^{-1/2}AB^{-1/2}$, and $\mathbf e_1, \mathbf e_2, \ldots, \mathbf e_p$ a corresponding set of orthonormal eigenvectors. Then

$$\sup_{\mathbf y \ne 0}\frac{\mathbf y'A\mathbf y}{\mathbf y'B\mathbf y} = \lambda_1 \qquad\text{and}\qquad \sup_{\substack{\mathbf y \ne 0 \\ \mathbf e_i'B^{1/2}\mathbf y = 0,\ i = 1, 2, \ldots, k}}\frac{\mathbf y'A\mathbf y}{\mathbf y'B\mathbf y} = \lambda_{k+1} \quad\text{for } k = 1, 2, \ldots, p-1.$$

Proof.

$$\sup_{\substack{\mathbf y \ne 0 \\ \mathbf e_i'B^{1/2}\mathbf y = 0 \\ i = 1, 2, \ldots, k}}\frac{\mathbf y'A\mathbf y}{\mathbf y'B\mathbf y} = \sup_{\substack{\mathbf x \ne 0 \\ \mathbf e_i'\mathbf x = 0 \\ i = 1, 2, \ldots, k}}\frac{\mathbf x'B^{-1/2}AB^{-1/2}\mathbf x}{\mathbf x'\mathbf x}, \qquad \mathbf x = B^{1/2}\mathbf y.$$

The lemma now follows from [11], 1f.2.1 and 1f.2.3, page 50.

Lemma A2  Let $R$ be any $(p \times q)$ matrix with $p \le q$. Then there exist orthogonal matrices $\Gamma$ and $\Delta$ such that

$$\Gamma R\Delta' = (D \;\; 0),$$

where $D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ is of dimension $(p \times p)$ and $\lambda_1^2, \lambda_2^2, \ldots, \lambda_p^2$ are the eigenvalues of $RR'$.

Proof. See Vinograde [15].
References

[1] Anderson, T. W. (1958), An Introduction to Multivariate Statistical Analysis. Wiley, New York.

[2] Fortier, J. J. (1966), "Simultaneous linear prediction," Psychometrika, 31, 369-381.

[3] Horst, Paul (1961), "Relations among m sets of measures," Psychometrika, 26, 129-149.

[4] Horst, Paul (1961), "Generalized canonical correlations and their applications to experimental data," J. Clinical Psych. (monograph supplement), 14, 331-347.

[5] Horst, Paul (1965), Factor Analysis of Data Matrices. Holt, Rinehart and Winston, New York.

[6] Hotelling, Harold (1935), "The most predictable criterion," J. Educ. Psych., 26, 139-142.

[7] Hotelling, Harold (1936), "Relations between two sets of variates," Biometrika, 28, 321-377.

[8] Lancaster, H. O. (1966), "Kolmogorov's remark on the Hotelling canonical correlations," Biometrika, 53, 585-588.

[9] McKeon, J. J. (1966), "Canonical analysis: some relations between canonical correlation, factor analysis, discriminant function analysis, and scaling theory," Psychometric Monographs, 13.

[10] Rao, C. Radhakrishna (1964), "Use and interpretation of principal components," Sankhyā, 26, 329-358.

[11] Rao, C. Radhakrishna (1965), Linear Statistical Inference and Its Applications. Wiley, New York.

[12] Roy, S. N. (1957), Some Aspects of Multivariate Analysis. Wiley, New York.

[13] Steel, Robert G. D. (1951), "Minimum generalized variance for a set of linear functions," Ann. Math. Statist., 22, 456-460.

[14] Thurstone, L. L. and Thurstone, Thelma Gwinn (1941), "Factorial studies of intelligence," Psychometric Monographs, 2.

[15] Vinograde, Bernard (1950), "Canonical positive definite matrices under internal linear transformations," Proc. Amer. Math. Soc., 1, 159-161.

[16] Wilks, S. S. (1935), "On the independence of k sets of normally distributed statistical variables," Econometrica, 3, 309-326.