E.J. Hannan; (1960)The Canonical Correlation of Functions of A Random Vector."

\
'
...
THE CANONICAL CORRELATION OF
FUNCTI~NS
OF A RANDOM VECTOR
by
E.
J. Harman
University of North Carolina
and
Canberra University College
This research was supported by the Office of
Naval Research under Contract No. Nonr 855 (09)
for research in probability and statistics at
Chapel Hill. Reproduction in whole or in part
for any purpose of the United states Government
is permitted
Institute of Statistics
Mimeograph Series No. 250
April 1960
THE CANONICAL
Cli£iRELLi'IiLJ1~
0.1" liUNCl'J.ONS OF A
rlAl~DOH
VECTOR 1
by
E. J. Harman
The University of North Carolina
and The Canberra University College
1.
Introduction
The classical theory of canonical correlation is concerned with
a standard description of the relationship between any linear combination of p random variables x t and any linear combination of q random variables It insofar as this relation can be describeo in terms
of correlation.
Lancaster
~~/
has extended this theory, for p =
q = 1, to include a description of the correlation of any functions of
x and y (which have finite variances) for a class of joint distributions of x and y which is very general.
It is the purpose of this
paper to derive Lancaster's results in another 1'ashim W:lich lends itself easily to fenera1ization- to the case where p andq are not
finite.
In the case of Gaussian, stationary, processes this gene-
ra1ization is equivalent to the classical spectral theory and
corresponds to a canonical reduction of a (finite) sample of data
1This research was supported by the Office of Naval Research
under Contract No. Nonr-855 (09) for research in probability and
statistics at Chapel Hill. Reproduction in whole or in part for
any purpose of the United states Government is permitted.
2
which is basic.
cesses.
The theory also then extends to any number of pro-
In the Gaussian case, also, the present discussion is in-
timately connected with the results of Gelfand and Yaglom ~2:7 relating to the amount of information in one r81 dom process about
another.
2.
The Canonical Correlation of Stochastic Processes.
We consider two stochastic processes, that is two families of
random variables
, seS
, teT
; Yt
lI'here Sand T are two index sets.
two sets
be independent.
\~e
~IJe
do not require that the
have mainly in mind the case where S
and T are finite or are the set of all integers or all reals but the
discussion is general.
We consider the space f2. which is the cartesian product of a
family of copies of the real line, one copy for each point in S and
T.
It is a classic· f~Qt:Kolmogoroff J:37)that the joint distribu-
tions of finite sets of the X and Yt may be used to institute a
s
probability measure, ~ , on a Borel field of sets in n. which includes all cylinder sets having a Borel set in a finite dimensional
Euclidean space as base. ·We call
it
the Hilbert space of all square
integrable complex valued functions of
product in
£a
it
en. and indicate the inner
by the square bracket,
(w), b (uJ)J
..
r a, ~
, .- ,\~\I a .11,2
. •
~
(..J
fa
.n.
(w) b (w)
3
We shall use the letters f, g, h, •• _ for those elements of ,t
which are functions only of the coordinates of w belonging to S
for those depending only on the coordinates be-
and u, v, w, •
longing to To
t
The set of al such f forms a subspace (closed) of
which we call
we call
n.
In ,
and the set of all u form a subspace which
We shall use the notation
gJ = (f,
Lu,!l = (u,
["f,
2
IIf: 11 11
V)N' (u, u)N = 'lI.u /I,2N_
g)M' (f, f)M -
By a well known property of bounded linear functionals on a Uil·
bert space we know that
Where A is a linear operator from 1'1 to N, and putting u = Af
in (1) we derive, from Schwarz IS inequality, that
nf: liM
1I Af.ll 'N ~
$0
that
A
,
is bounded by 1_
["f, u_7 •
Of course we also have
(f, A~)M
where A* is the adjoint of A..
We may write ( Riesz and Nagy
A
[Zi_7
p. 286)
= WB
where B = (A*A)1/2 (taking the positive square root) and Wis"partially isometric ll in that it maps
Blf'lisometrically on to Ailland is
the null operator on (~).J., the orthogonal complement of
Thus
(2)
A ... W S~-
f dEC p..>
Boo. in m.
4
where E(
P)
is a resolution of the ioentity in in relative to the
Borel subsets of LQ,1_7.
It is evident from (1) that 1 is always
a characteristic value of B corresponding to the characteristic
function which is identically 1.
Now putting
(J )
f =
51
0-
dE (p)f , u =
.L
where ul e(Mn.)
Lf,
dE (p) w~~ u + u l '
, we have
k- f d (WE (p)f, ~VE (p) li1T* u)N
uJ =
= J~- f
(4)
wJ-'0-
d (E
(f
)f ,E'r)'w"u)M
The relations (1), (2), (3), (4) are the generalizations of
the classic relations of the theory of
Jt
c~onical
correlation•. There
is the space of all (real) linear combinations of the Xl' •••
xp ' Yl".
Yt while 1'Y1 and It are similarly constructed from the X
and y sets respectively. The inner product ~f, u~ is usually
written out as a p x q matrix and A corresponds to the matrix of regressions of the x set on the y set.
3.
The Canonical Correlation of Two Finite Sets.
It is shown in
t:2~
that the information (in Shannon's sense)
in one set of p random variables about another set of q can be
finite only i f the probability distribution, in the p + q demensional
space, induced by the joint distribution is absolutely continuous with
respect to the product measure induced by the marginal distributions
of the two sets.
tion
functions~
If we use H(x, y), H(x), N(y) for these three distributhen when this absolute continiuty condition is satis-
fied, we shall put
...
A (x, y)
~H
0:
(x, y)
oM (x)
~N(Y)
for the Radon-Nikodym derivative of the H measure with respect to the
product measure.
(5)
At
Thus A in (1) is defined by
= fA(~)f(X)dM(X).
The typical case where this is not so is that forpe.t q-l, where
there is a concentration of mass along a line. When H(x,y) is Gaussian the amount of information becomes (~2~ p. 217).
1
2
- ~ Z log (1 - r j)
wherer j is the j th canonical correlation in the classical theory.
This last expression is
(6)
~ log tr B2
where the trace is defined as
and the ';k from any complete orthonormal sequence in the (separable)
Helbert space ~. The expression (6) follows from the fact that
the spectrum of B2 is now discrete with jumps at all points
Q
,
=
TJJ r~Pj
J
,p. integral,
J
the typical jump corresponding to a characteristic function which is
the product of the p. -th standardized Cheybeyshev-Hermite polynoJ
mial (see for example
L 5_7
p. 133) in the j-th x variable canoni-
cal in the classical theory.
If (7) is required to be finite in any case then evidently B2
must be a compact (completely oontinuous) operator with finite trace
so that A (x,y) must be square integrable with respect to d M (x)
dN(y).
This is the case studied (under a slightly different guise)
6
by Lancaster
L 1 J.
Though (6) gives no loneer a satisfactory mea-
sure of information when the distributions are not Gaussian (since it
may be infinite when the maximum correlation between an f and a u is
very smaLl) the restriction to square integrable A (x"y) appears not
unreas anable.
The analogy with the classical theory is now almost complete for
B will be generated by t he square intergrable kernel
= t~ Pj~j
B(x"z)
~. (x)
so that the
fj
J
(x)
~j
EO~
p;
< 00
become the canonical x variables while
(y) = W ~j (x)
are the canonical y variables.
with
(I),
Finally
["f" uJ = E ~ ~j f j
f = Eco0 a j P" j (x ) + f,,, U = E00
0
~ j UJ
rj ( ~) +
U" ,
-mere f l is orthogonal to all ~. and u, to all If ..
J
J
4. The Canonical Correlation of Stationary Gaussian Processes.
We consider first the case of two stationary Gaussian processes
of the form
(8 )
x = E~ (~1' cos t). + ~2' sin tA.)
s
J
lJ
J
J
E~o . ('\1...! cos tA. + 1')2' sin tA.)
1 .J 'J.J
J
J
J
Y
J
t :=
2
2
Ea . < 00" Eb. < CD
J
where the
<
0
<
Aj
::
IT
J
~.
and
~j
are Gaussian and S and T are the integers.
vIe al-
so take these variables to have zero mean and unit variance and assume
fJ'i)-
all of thelariances zero save for
7
If we form the random variables
CJ.j = ( l / J fl/+
.J
( aj = (1 /~
f
2
lj
+
f 2j
2
2
f2j
) ( fij 'Ylij - f2j 'Yl2j)
.) ( f2j 'Yl2j + Pl j
then the only non !ero correlations among the
~ij
~j)
and
~
ij
Now the functions of the form
where there are only a finite number of terms in the product, span
the space }n,.
This follows from the fact that the elements of
~ may be approximated (in the strong topology) arbitrarily close-
ly by polynomials in the X and any X may be approximated strongly,
s
s
uniformly in s, by a truncated sum of the form
is Gaussian the result follows.
s,n
To orthonomaliae these products we replace the powers of
Since x
s
- x
~1j
by the standardized Cheybeyshev-Hermite polynomials, the pth of
8
which we indicate by H (.; .. ). We do the same with the ~1.' J"
P
J.J
Then the products of the H (.; .. ) form an orthonormal base for
p
l.J
and the H ( , .. ) perform the same function for
l.J
upon which E (
P)
,
11.
h1.
The space
in (2), projects is thus spanned by the pro-
ducts
for which
(10)
This serves to describe, in a somewhat unsatisfactory fashion,
the E
( f ) for a general pair of Gaussian stationary processes.
Let x
and Yt have the spectral representations (~6~ p. 481)
(11)
t
x
J~
cos t A dU
= I~
cos t A dU
=
t
Yt
l
2
(A) +
J~ sin tAdv ()..)
l
()..) +
f~ sin t A dV2 ()..)
E'r1.dul()..) 2] = 2dFll (A)
=
[ [ dV (A)2]
l
E[ dU2(A)2}
=
E[ dv2()..)2 J
with
[
= 2dF
22 ()..)
fdUl ()..) dU2 ()..)]...
[f
dUl(A) dV2 ()..)J
[~Vl()..)
= - EfdVl (A)
dV2 ()..)J =
dU 2
0{ [2dF12 ()..)]
P-1= J{2dF12 (A) J
9
where
M and· j
denote the real and imaginary parts and all
other cross products have zero expectation.
The F.. (A) are the
~J
spectrdLdistribution functions.
We may now approximate to the processes (11) by two sequences
of processes ~ (n)
t
'
y (n)
t
of the form (8) with
j =
1, ••• ,
n
(n.)
(n)
b J 1) 2j
If En (
p ) projects
upon the subspace 11111. of
the elements defined by (9) and (10) for the
converges strongly to. E (
P)
for every
to the point spectrum of A (in (2».
An = Qn
A
P""
....
1rt
spanned by
.;(~~
then E
~J
n
(P )
P which does not belong
Indeed, if we put
10
wh9ra ~ projects onto the subspace
JIl n
n spanned by the
of
m
H ( ~ .. (n» and P projects onto
then A is the operator dep
J.J
n
n
n
fined by (1) for the xs(n) and'l (n) processes, on the subspace
t
-'Vv1
{nn' since if
then
f
=fl
+ f , f
2
l
e
m
n,
f
u
=~
+ u 2 ' ul £.
fl n ,
u
(Qn A Pn f, u)N
=
2
2
£.
rll..ln
£.
J'ZJ.
n
(A f l , Ul)N ., ["f1 , ui7·
However, P and Q converge strongly to the identity operators
n
n
on
for
fYL and
7t (since{x(~f
Ly~n)} p.
(LI.:JJ
Thus
converges stron(::;ly to
X
t
P and similarly
Q A P converges strongly to A and it follows
n
n
p 369) that En(p) converges strongly to E.(f~ for each
not in the point spectrum of A.
If we put
r
L FiJ·
(kn)
n- -
then the inequality
(10)
F
«k-l)n>j
n
= 6 (n)
F
ij
(A)
k
may be written in the form
A characterization of E(
I.d F
(A)
12
I
.p )diractly
."
in terms of
p
11
would be preferable to the indirect one given above.
The situation for Gaussian stationary processes is simpler
than that obtained in general for the theory extends to any number
r\
, as beof processes. Let x.1., t be the i th process. We use ~4
for~for the space of all realizations of the vector process and
It
again for the hilbert space of square integrable functions on
but ·~i for the subspace of functions of the realizations
of the i th process only.
12 ,
Now
r f.,
f.
1. J
L
J
= (A.. f , f.) M
1.J i
J j
where
A.. )1/2 •
1.J
It is not difficult to show that
If now, in addition, the x.1., t are jointly Gaussian and
stationary the B •. , for fixed i and j varying, commute. This is
1.J
easily seen to be so for the B1.. .(n), corresponding to the x~n)
1.,t
J
as defined earlier in this section, and will be so also for their
strong limits.
B';J'
...
=
Thus we may write (L~_7 p. 36~)
JI
0-
p..
1.J
(r.) dEi
(r.)
12
where the p.. (A) are defined, measurable and take values in
l.J
~o,
(E
1_7,
i (A)
almost everywhere with respect to all of the measures
f, f')M.' f
mi-
&
l.
Thus
rf.,
1.
-
f. 7
J-
f ,..
=
l.J
0-
(A) d (Ei (A) fi' E. (A) WoOf ')M
1.
J 1. J i
and we have, speaking loosely, simultaneously diagonalised all of the
For the present situation the information defined in one process about the other can again be shown to be
1
2
2' log tr (B )
This is infinite in general for a p air of processes With continuous
spectrum but for a process whose matrix of spectral distribution
functions has only a denumerable sequence of jumps, at the points
Aj let us say, we have
1
2
'2" log tr (B )
1
=
2p.
~ log Z II Pjt J t ,
where the summation is over all different products, each such product
being repeated twice, however, since Hp (io:lj ) and Hp
the same correlation as H (';2 .) and H
2 .) ,
P
1
= '2"
log
P
J
II (
J
1 -
(t
rj2
)-2
(t 1j ) have
J
=-Z log (1-f. 2 )
J
13
= -
Z. log (1 J
agreeing with the formula given in
~
2-7 p. 226.
The methods used in this section extend to a number of other
cases, in all of which the processes are Gaussian, of course.
(a)
A ~quare
on the real line.
continuous vector valued stationary process
Indeed the theory may be extended to any mean
square continuous process (vector valued) defined on a separable,
loeally compact, AbeJias group,
g , for which the
covariance func-
tion
, s, t
&
g
is a function only of the translation required to take s into t.
Indeed we may write the vector xt,having x it in the i th
place, in the form
=
1\
where
Z
§
J:ff
(ll,t) d Z (a.)
is the character group of ~ , (a,t) is a character and
11-
(a) is a vector valued function defined on;;
for which
The development then follows the same line as that given
above.
(b)
If only two processes are involved there is a range of
other relevant cases of which we shall mention only that of two meaa
14
square continuous (but not necessarily stationary) processes on a finite interval on the real line.
It is now not very difficult to
show that we may put
ex:>
=
i:
1
where wk and zk are random variables with unit mean square and ak(s)
k (s) are constants. The cross products between the random vari-
J9
ables vanish save for those between wk and zk for the same k.
In this case the information, 1/2 log tr B2, is
where
'fk
is the expansion (essentially by
i~rcerls
(t)
theorem) of the continuous
function
r 12 (s, t)
5.
~anomical
=
Analysis of Sample Sequences
For completeness we shall here briefly indicate the analysis
of a sample of n
consecutive observations on a vector stationary
process x t of p components.
We form the transforms
1
J
n
(A., x)
"n x t
= r= oWl
" n
e
itA
whose real and imaginary parts correspond to the u (A) and
v (A) previously discussed.
f [In (A, x)
(12)
It is easy to show that
I n (\-L,x) '}
~
null matrix ,
A
*F
J
if A and \-L are not points of jump of F (A), the matrix of spectral
distribution functions, and that
However, we shall want to form J
n
(A,
x) for
Yl
equi-
spaced values, say
A.
J
=
2nj
j
n
=
0, • ••
In - 1
Now it is easy to show that
(14) If
where
E[Jn (Aj,x)
p=
I n (Ak,X) '] /1 < K
sin np
I
n
-1 n""l
I
l:
1
II r
(h)lll sin nph
n
Ij-kl
n-1
<
-
K Z:
1 1
~r(h)1I
h
n,p~
1,
which converges to zero as n increases if
00
II r
l:
(h) II
<
00
o
which does not appear to be an over strong condition.
Thus the re a1 and imaginary parts of J n (A, x), for the
case where x
t
is Gaussian can tm{e the part played by u (A) and v (A)
l
16
in the previous section, at least asymptotioally. Of course, the
f ij
0-
are not known unless F (A) is prescribedAPriori, which is
not likely, so that the effects of estimation need to be considered.
Of course a considerable improvement in the rate at which (14)
converges to zero can be obtained under suitable stronger conditions.
For example if
00
xt
= i:
-00
A.6 .
J /;-J
where the E t form a sequence of random vectors with
=
E~ s, t G
and
00
i:
<
00
8
in the same way as I n (A,X)
-00
then
(~7_7
section 111.1)
li:
L
A e
j
ijA
}
where I n (A, e) isformed from the
was formed from the x • Here
t
Kn
t
-1
while
I )
J
n
(!J.,
8
)
j
=
-e
17
References
L1J
Lancaster, H. O. liThe Structure of Bivariate Distributionsl!,
Ann. Math. Statist., 29, (1958) 719-736.
/:2J
Ge1fan, I. 1"1. and Yag1om, A. M.
"Calculation of the Amount
of Information about a Random tunction Contained in
Another Such FUnction, American Mathematical Society
Translations, Vol. 12 (1959) 199-246.
1:3J
Ko1mogoroff, A. N.
Foundations of Probability Theory,
Chelsea (1956).
~J
Riesz, F. R. and S~-Nagy, B.
Functional Analysis
B1ackie (1956)
["5_7
Cramer, H.
l1athematica1 Methods of Statistics
Princeton (1946)
L:6_7
Doob, J. L.
["7_7
Hannan, E. J.
Stochastic Processes.
Time Series Analysis.
Wiley (1953)
Methuens (1960)