E. W. Stacy (1959). "An Estimate of Correlation Corrected for Attenuation and Its Distribution."

AN ESTIMATE OF CORRELATION CORRECTED FOR ATTENUATION
AND ITS DISTRIBUTION
By
Edney Webb Stacy
University of North Carolina
This research was supported by the
Office of Naval Research under
Contract No. Nonr-855(06) for research in probability and statistics
at Chapel Hill.
Reproduction in
whole or in part for any purpose
of the United States Government is
permitted.
Institute of Statistics
Mimeograph Series No. 233
August 1959
ACKNOWLEDGEMENTS
It is a pleasure to acknowledge indebtedness to
Professor Harold Hotelling for suggesting the research
presented here and for illuminating its course with further
suggestion and assistance at every
stage.
The opportunity
of working under his distinguished direction has been a
privilege.
I wish to express my deep appreciation to
Professors R. C. Bose, W. Hoeffding, and S. N. Roy for
their advice, criticism, and assistance in preparation of
the paper.
The Office of Naval Research supported initial
stages of this research; and the Operations Analysis
Office of the Air Force has made possible my recent stay
at the University.
To both of these organizations, I
acknowledge my gratitude.
Reproduction of this work for
any purpose of the United States Government is freely
permitted.
TABLE OF CONTENTS

Acknowledgements . . . . . . . . . . . . . . . . . . . . .   ii
Introduction . . . . . . . . . . . . . . . . . . . . . . .    v

CHAPTER

I.   MODEL FOR TWO SETS OF P VARIATES
     1.1  Basic structure . . . . . . . . . . . . . . . .     1
     1.2  The 2-variate case  . . . . . . . . . . . . . .     1
     1.3  A generalization  . . . . . . . . . . . . . . .     3
     1.4  Covariance matrix . . . . . . . . . . . . . . .     4
     1.5  Illustration  . . . . . . . . . . . . . . . . .     9

II.  DENSITY FUNCTIONS
     2.1  Density considerations  . . . . . . . . . . . .    11
     2.2  Canonical correlations  . . . . . . . . . . . .    13
     2.3  Canonical variates and quasi-canonical
          correlations  . . . . . . . . . . . . . . . . .    16

III. MAXIMUM LIKELIHOOD ESTIMATES
     3.1  Likelihood function . . . . . . . . . . . . . .    18
     3.2  A lemma . . . . . . . . . . . . . . . . . . . .    19
     3.3  Estimation  . . . . . . . . . . . . . . . . . .    20
     3.4  Estimates using new variates  . . . . . . . . .    23
     3.5  Asymptotic distributions  . . . . . . . . . . .    29
     3.6  Distribution of the sample quasi-canonical
          correlation . . . . . . . . . . . . . . . . . .    36
     3.7  Alternate derivation of estimates . . . . . . .    38
     3.8  Anomalies of ρ̂ and w . . . . . . . . . . . . .    40

IV.  DISTRIBUTION OF THE STATISTIC w
     4.1  General . . . . . . . . . . . . . . . . . . . .    43
     4.2  Geometry  . . . . . . . . . . . . . . . . . . .    45
     4.3  Lemmas for the distribution of w  . . . . . . .    50
     4.4  Distribution of w . . . . . . . . . . . . . . .    58
     4.5  Illustration  . . . . . . . . . . . . . . . . .    61

V.   SUMMARY WITH ALLIED PROBLEMS
     5.1  Basic assumptions and techniques  . . . . . . .    77
     5.2  Summary of results  . . . . . . . . . . . . . .    78
     5.3  Extension of the theory . . . . . . . . . . . .    80
     5.4  Additional problems . . . . . . . . . . . . . .    83

BIBLIOGRAPHY  . . . . . . . . . . . . . . . . . . . . . .    85
INTRODUCTION*
Measurement of mental characteristics is subject to
considerable error as a rule.
Test scores are regarded as
having two components--a true score plus a variable error.
In the fields of psychology and education, statisticians
are concerned both with the reliability and, more importantly, with the correlation between true measures of different characteristics.
With regard to the latter problem, if
perfectly reliable tests were available, correlation between
two characteristics could be estimated directly from a
sample product-moment correlation.
The existence of test
errors means that correlation between test scores may not
be an adequate indication of correlation between characteristics which the tests purport to measure.
C. Spearman, G. U. Yule, and others noted this
phenomenon around the turn of the century.
In 1904,
*This research, begun in 1948, was supported until
its interruption in 1950, by the United States Department
of the Navy, Office of Naval Research, under Project
NR 042031, Contract No. N 7onr-284(02). In September
1958, research was renewed in conjunction with the United
States Air Force policy regarding administration of those
portions of Public Law 57, 84th Congress, which pertain to
contract training and as a part of the Air Force Academic
Training Program for Operations Analysts, established by
order of the Chief of Staff via letter to the Commander-in-Chief, Continental Air Defense Command and dated December 3,
1956. Further assistance of several kinds, including typing,
was provided through the Office of Naval Research Project
NR 042031, now under Contract No. Nonr-855(06) for research
in probability and statistics at Chapel Hill. Reproduction
in whole or in part is permitted for any purpose of the
United States Government.
Spearman¹ [17] proposed a method for correcting the correlation between test scores to obtain a better estimate of
correlation between characteristics.
Errors of measurement
tend to diminish or "attenuate" the correlation estimates.
Hence Spearman's method is called a correction for attenuation.
The correction (to be specified later) involves
reliability coefficients of the tests.
Though the meaning
and method of estimating reliability have been the subject
of much discussion in the literature, Spearman's concept
was that of the "self-correlation" of a test.
Suppose, for example, that an arithmetic test is
administered twice to the same group of students and that
memory of the test items plays no part in determining the
second set of scores.
Discrepancies between the two sets
of test scores are expected, and they evidence test error
in measuring the mental characteristics involved.
The
sample correlation between the test scores is a measure
of test reliability in the Spearman sense.
Current methods
for estimating reliability are summarized and evaluated by
Kuder and Richardson [10] and Loevinger [11].
Statistical textbooks such as [7], [8], and [12]
include reliability and correlations corrected for attenuation as topics.
The Spearman correlation corrected for attenuation
is r/√(r₁r₂), where r denotes the sample correlation
between the two sets of test scores and r₁, r₂ are measures
of self-correlation for the respective tests. Two anomalies
are apparent: the modulus of the statistic may exceed
unity, and the statistic may have an imaginary value.

¹Numbers in square brackets refer to the bibliography
listed at the end.
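Both anomalies can be exhibited with a small numerical sketch (not part of the thesis); the sample values below are hypothetical, and `cmath` is used so the imaginary case appears as a complex number rather than an error.

```python
import cmath

def spearman_corrected(r, r1, r2):
    """Spearman's correction for attenuation, r / sqrt(r1 * r2).

    r      sample correlation between the two sets of test scores
    r1, r2 sample reliabilities (self-correlations) of the tests
    cmath.sqrt lets the anomalous case r1 * r2 < 0 appear as a
    complex number rather than an exception.
    """
    return r / cmath.sqrt(r1 * r2)

# Anomaly 1: the modulus may exceed unity.
print(abs(spearman_corrected(0.6, 0.5, 0.6)))    # about 1.095

# Anomaly 2: the value may be imaginary, since a sample
# reliability can be negative.
print(spearman_corrected(0.4, -0.1, 0.5))        # purely imaginary
```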
A central difficulty in determining reliability is
an ambiguity of definition.
For example, the split-half
method [10] requires the test items to be divided into two
groups, each group containing half of the items.
A test
containing more than 100 items, say, may be "split" in a
multitude of ways, each providing an estimate of reliability.
This difficulty has received moderate attention in the
literature of psychology, e.g. [10] and [12]; but no generally accepted methods have been found to escape it.
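The "multitude of ways" is easily quantified: a test with 2k items admits C(2k, k)/2 distinct splits into unordered halves. A brief sketch (illustrative only, not from the thesis):

```python
import math

# A test with 2k items can be "split" into two k-item halves in
# C(2k, k) / 2 distinct ways (the two halves are unordered).
def split_half_count(items):
    k, rem = divmod(items, 2)
    if rem:
        raise ValueError("split-half requires an even number of items")
    return math.comb(items, k) // 2

print(split_half_count(10))    # 126
print(split_half_count(100))   # about 5.04e28 distinct splits
```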
In Chapters I through V, we shall consider two
random variables ξ and η, which may be interpreted as mental
traits. The variable ξ, for example, might be the arithmetic
ability of college freshmen; and η might be the
reading comprehension ability of those students. We shall
suppose that each trait may be measured (inaccurately) in
more than one way, e.g., by more than one test. If we
denote two measures of ξ by X₁ and X₂, for example, we write

X₁ = ξ + E₁ ,   X₂ = ξ + E₂ ,

where E₁ and E₂ are random errors which are independent of
each other and of ξ. The error variables E₁ and E₂ are
assumed to have zero means and equal standard deviations.
The scores X1 and X2 may stem from a variety of situations.
However, we shall avoid ambiguity if we always think of
X1 and X2 as having been produced by the test-and-retest
situation previously illustrated.
Then the correlation between X₁ and X₂ may be called a reliability coefficient.
(Similar remarks pertain to the mental trait which we denote
by η.)
Extension of these concepts to cases in which more
than two measures are available for each trait is straightforward.
The correlation between ξ and η is denoted by ρ_ξη
and is called the correlation corrected for attenuation.

Study of correlation corrected for attenuation involves
two sets of variates: one set measuring ξ and one
set measuring η. Hotelling [5] has developed the theory of
relations between two sets of variates and has shown that,
referring to features which depend only upon correlations
and to non-singular internal linear transformations within
the sets, canonical correlations and functions of them are
the only invariants of the two sets. Texts by Anderson [1]
and Roy [15] include many subsequent developments in the
theory. The present study gives consideration to canonical
correlations for the special cases, i.e., models, which are
discussed.
The theory of simple correlation is widely known and
is included in modern texts in statistics.
A historical
account of its development is given by Hotelling along with
his research in [6].
The primary subject to be discussed herein is the
sample correlation between variates whose measurements are
subject to error.
Thus, in the sense of the preceding paragraphs,
the study concerns correlations corrected for attenuation.
A statistic, derived from maximum likelihood considerations,
is proposed, and its exact probability distribution is given.
The new statistic cannot result in an imaginary number,
as is possible with Spearman's correction.
However, its
modulus may exceed unity, just as Spearman's may.
This
illustrates a class of problems which has been drawn to the
author's attention by Professor Hotelling and which apparently has not been treated in the literature of statistics.
Suppose, for example, that independent random variables
X and Y are observable only in the form X and X + Y.
Suppose further that information concerning X and X + Y
is obtained separately from independent samples. The variances
of X and X + Y may be estimated directly from the
samples. But how should such information be used to estimate
the variance of Y? The variance of X + Y is the sum of
the variance of X and the variance of Y. Yet, simple subtraction
of variance estimates for X + Y and X (based
upon independent samples) may result in a negative value.
Obviously, some modification is required if such a result
is to become the basis for estimating the variance of Y.
Apparently, similar considerations are required in applications
of reliability coefficients and correlations corrected for attenuation.
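A minimal simulation (illustrative only; the sample sizes and variances are hypothetical) shows how easily the subtraction of independent variance estimates goes negative:

```python
import random
import statistics

random.seed(5)
n, var_y = 40, 0.05     # true Var(Y) is small but positive

def var_diff_estimate():
    # Two independent samples, as in the example: one of X alone
    # and one of X + Y.
    x_only = [random.gauss(0, 1) for _ in range(n)]
    x_plus_y = [random.gauss(0, 1) + random.gauss(0, var_y ** 0.5)
                for _ in range(n)]
    # Var(X + Y) = Var(X) + Var(Y), so subtract the two estimates:
    return statistics.variance(x_plus_y) - statistics.variance(x_only)

negatives = sum(var_diff_estimate() < 0 for _ in range(200))
print(f"{negatives} of 200 estimates of Var(Y) were negative")
```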
A mathematical model is specified in Chapter I. The
present study of reliability and correction for attenuation
postulates two sets of variates. Each set contains p ≥ 2
variates, and each variate includes a random error as one of
its two components. The two-component concept, with attendant
assumptions, is called the structure of the variate.
Both the case p = 2 and its generalization, p ≥ 2,
are treated. The covariance matrix, as determined by the
assumed structures, is established for the variates in each
case. The concepts are illustrated at the end of Chapter I.
Density functions for the study are specified in
Chapter II. It is assumed that the joint distribution of
the variates is multivariate normal with zero means and
specified positive definite covariance matrix. The symbol
ρ₁ (or ρ₂) always refers to correlation between variates
in the same set, and ρ to that between variates of different
sets. Canonical correlations stemming from the model
are discussed for ρ₁ ≠ ρ₂ and for ρ₁ = ρ₂. Under the
assumptions of Chapter II, an exact test is provided for the
hypothesis ρ₁ = ρ₂ versus the hypothesis ρ₁ ≠ ρ₂.
The covariance matrix for the variates is denoted
by Σ̃ or, when ρ₁ = ρ₂, by Σ. Chapter III concerns the
estimation of elements in Σ. It is assumed that the variates
have a common variance σ² > 0 and that the correlation
between any two variates from distinct sets is ρ.
It is shown that, if the specified structures are adopted,
ρ₁ > 0 and |ρ/ρ₁| < 1. These restrictions are waived
to facilitate estimation of σ², ρ₁, and ρ. The maximum
likelihood estimates of ρ₁ and ρ, denoted by ρ̂₁ and ρ̂,
may or may not satisfy the inequalities ρ̂₁ > 0 and
|ρ̂/ρ̂₁| < 1, as is discussed at the end of Chapter III.
Asymptotic distributions of the estimators are derived,
and canonical correlations are considered. Chapter III
includes an alternate derivation of maximum likelihood
estimates and a discussion of their anomalous properties.
The next chapter, Chapter IV, provides the distribution
of the statistic w. It is shown that cumulative
probabilities for w can be expressed as linear functions
of four types of integrals. These integrals are evaluated
and give the cumulative distribution function, F(w), as a
terminating series which involves incomplete Beta-functions.
F(w) and the previously mentioned asymptotic distributions
are functions of ρ₁ and ρ. Methods of calculation are
illustrated by numerical example.
Chapter V is a summary of assumptions, techniques
and results.
It includes a discussion of some possible
extensions of the results and of some problems which are
associated with the present research.
CHAPTER I
MODEL FOR TWO SETS OF P VARIATES
1.1 Basic structure

The introductory remarks suggest the study of variables
whose structures are of the form

X = ξ + ε ,

where X is a test score, say, ξ is the true component, and
ε is a random error which is independent of ξ. We adopt
this structure and assume throughout that the expectations
of ξ and ε are zero.¹

1.2 The 2-variate case
Suppose the random variables X₁, X₂; X₃, X₄ have structures

X_i = ξ + ε_i ,   i = 1, 2 ,
X_i = η + ε_i ,   i = 3, 4 ,

where

(a) ε₁,…,ε₄ constitute an independent set of
variates with means zero and variances σ²_{ε_i},
i = 1,…,4, respectively, and

(b) ξ and η have zero means and variances σ²_ξ, σ²_η,
are independent of the error variables ε₁,…,ε₄,
respectively, and have covariance σ_ξη > 0.

¹It is well known [6] that for complete samples of
n + 1 individuals from a p-variate normal population, the
distribution of functions of the covariances, computed by
dividing the sums of the products of deviations by the number
of degrees of freedom n, and with arbitrary population
means, is the same as if sample means were not thus eliminated
but the means were known to be zero. Accordingly we have
assumed that the population means of all variables are zero.

Denoting the variance of X_i by σ²_i (i = 1,…,4)
and the covariance of X_i with X_j by σ_{ij}, we have

(1.1)  σ²_i = σ²_ξ + σ²_{ε_i} ,   i = 1, 2 ,
       σ²_i = σ²_η + σ²_{ε_i} ,   i = 3, 4 ,
       σ₁₂ = σ²_ξ > 0 ,   σ₃₄ = σ²_η > 0 ,
       σ_{ij} = σ_ξη ,   i = 1, 2;  j = 3, 4 .
Using a notation similar to that above in connection with
the letter ρ to indicate correlations between variates,
we note that ρ₁₂, ρ₃₄ > 0. Further,

(1.2)  ρ_ξη = σ_ξη/(σ_ξ σ_η) = σ_{ij}/√(σ₁₂ σ₃₄)   (i = 1, 2;  j = 3, 4).

Now we impose two conditions which apply to the whole of
the remaining discussion, viz., σ_{ε₁} = σ_{ε₂} and σ_{ε₃} = σ_{ε₄}.
This implies σ₁ = σ₂ and σ₃ = σ₄, so that

(1.3)  ρ_ξη = ρ_{ij}/√(ρ₁₂ ρ₃₄)   (i = 1, 2;  j = 3, 4).
Denote the variables X₁ + X₂ and X₃ + X₄ by Z₁
and Z₂, respectively. Then, using customary notation for
correlations between variables, we have

(1.4)  ρ_{Z₁Z₂} = 4σ_ξη / √[(4σ²_ξ + 2σ²_{ε₁})(4σ²_η + 2σ²_{ε₃})]
                = 2σ₁₃ / √[(σ²₁ + σ₁₂)(σ²₃ + σ₃₄)] ,

from which

(1.5)  ρ_{Z₁Z₂} = 2ρ₁₃ / √[(1 + ρ₁₂)(1 + ρ₃₄)] .
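The relations (1.4), (1.5), and (1.3) can be checked numerically; the parameter values below are hypothetical, chosen only to satisfy the assumptions of this paragraph:

```python
import math

# Hypothetical parameter values for the model of paragraph 1.2,
# with the equal-error-variance conditions imposed above.
var_xi, var_eta, cov_xieta = 1.0, 1.5, 0.9
var_e1, var_e3 = 0.8, 0.6    # sigma^2(eps1) = sigma^2(eps2), etc.

var1 = var_xi + var_e1       # sigma_1^2 = sigma_2^2, by (1.1)
var3 = var_eta + var_e3      # sigma_3^2 = sigma_4^2
s12, s34, s13 = var_xi, var_eta, cov_xieta

# (1.4): correlation of Z1 = X1 + X2 with Z2 = X3 + X4
rho_z = 4 * cov_xieta / math.sqrt(
    (4 * var_xi + 2 * var_e1) * (4 * var_eta + 2 * var_e3))

# (1.5): the same quantity from the pairwise correlations
r12, r34 = s12 / var1, s34 / var3
r13 = s13 / math.sqrt(var1 * var3)
rho_z_alt = 2 * r13 / math.sqrt((1 + r12) * (1 + r34))
print(rho_z, rho_z_alt)      # the two forms agree

# and (1.3): the corrected correlation recovers rho_xi_eta
print(cov_xieta / math.sqrt(var_xi * var_eta),
      r13 / math.sqrt(r12 * r34))
```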
1.3 A generalization

Generalization to the case of p > 2 variates in
each set is straightforward. Let X₁,…,X_p; X_{p+1},…,X_{2p}
be random variables with structures

X_i = ξ + ε_i ,   i = 1,…,p ,
X_i = η + ε_i ,   i = p+1,…,2p ,

and make assumptions which are direct extensions of (a)
and (b) of paragraph 1.2. If Z₁ = Σ_{i=1}^{p} X_i and
Z₂ = Σ_{i=p+1}^{2p} X_i, then the conditions on the error
variances imply σ₁ = σ_i (i = 1,…,p) and σ_{p+1} = σ_i (i = p+1,…,2p), and
(1.6)  ρ_{Z₁Z₂} = p²σ_{1,p+1} / √{[(p² − p)σ₁₂ + pσ²₁][(p² − p)σ_{p+1,p+2} + pσ²_{p+1}]} ,

from which

(1.7)  ρ_{Z₁Z₂} = p ρ_{1,p+1} / √{[1 + (p−1)ρ₁₂][1 + (p−1)ρ_{p+1,p+2}]} ,

and

(1.8)  ρ_ξη = ρ_{Z₁Z₂} / √{ [p ρ₁₂ / (1 + (p−1)ρ₁₂)] · [p ρ_{p+1,p+2} / (1 + (p−1)ρ_{p+1,p+2})] } .
1.4 Covariance matrix

We assume hereafter, in all of the discussions, that
p ≥ 2, σ_i = σ > 0 (i = 1,…,2p), and adopt the notation
ρ₁₂ = ρ₁, ρ_{p+1,p+2} = ρ₂, and ρ_{1,p+1} = ρ, maintaining all
assumptions of paragraph 1.3. The parameters ρ₁ and ρ₂ may
be regarded as correlations between like forms of the same
test and will be referred to as reliability coefficients.

We denote the covariance matrix for the variables by Σ̃
and define

(1.9)  Σ̃ = σ² [ Σ₁₁  Σ₁₂ ]
               [ Σ₂₁  Σ₂₂ ] ,

where
(1.10)  Σ₁₁ = [ 1   ρ₁  ⋯  ρ₁ ]
              [ ρ₁  1   ⋯  ρ₁ ]
              [ ⋮           ⋮ ]
              [ ρ₁  ρ₁  ⋯  1  ] ,

(1.11)  Σ₂₂ = [ 1   ρ₂  ⋯  ρ₂ ]
              [ ρ₂  1   ⋯  ρ₂ ]
              [ ⋮           ⋮ ]
              [ ρ₂  ρ₂  ⋯  1  ] ,

and

(1.12)  Σ₁₂ = Σ₂₁ = [ ρ  ρ  ⋯  ρ ]
                    [ ⋮         ⋮ ]
                    [ ρ  ρ  ⋯  ρ ] ,

specifying that

1 + (p−1)(ρ₁ + ρ₂)/2 + pρ > 0 ,   1 + (p−1)(ρ₁ + ρ₂)/2 − pρ > 0 .
Now consider an orthogonal transformation of the variables.
Let X and Y be random column vectors with elements
X₁,…,X_{2p} and Y₁,…,Y_{2p}, respectively. Let these vectors
be related by Y = AX, where A is the orthogonal matrix

(1.13)  A = [ A₁  A₂ ]
            [ A₃  0  ]
            [ 0   A₄ ] ,

where A₁ and A₂ are the 2 × p submatrices
(1.14)  A₁ = (1/√(2p)) [ 1  1  ⋯  1 ]
                       [ 1  1  ⋯  1 ] ,

(1.15)  A₂ = (1/√(2p)) [  1   1  ⋯   1 ]
                       [ −1  −1  ⋯  −1 ] ,

and A₃ and A₄ are the (p − 1) × p submatrices

(1.16)  A₃ = [ (p−1)/a₁   −1/a₁      −1/a₁   ⋯  −1/a₁     ]
             [ 0          (p−2)/a₂   −1/a₂   ⋯  −1/a₂     ]
             [ ⋮                                          ]
             [ 0  ⋯  0    1/a_{p−1}  −1/a_{p−1}           ] ,

(1.17)  A₄ = [ −1/a₁      ⋯  −1/a₁   −1/a₁      (p−1)/a₁  ]
             [ −1/a₂      ⋯  −1/a₂   (p−2)/a₂   0         ]
             [ ⋮                                          ]
             [ −1/a_{p−1}  1/a_{p−1}  0  ⋯  0             ] ,

where

a_j = √((p−j)(p−j+1)) ,   j = 1,…,p−1 ,

and where all elements of A not defined by (1.13)-(1.17) are zero.

We denote the covariance matrix for the elements of Y
by D̃ and, after noting that the means of Y₁,…,Y_{2p} are zero,
state the following theorem:
Theorem 1.1  If the elements of X have covariance matrix
Σ̃ and the other assumptions of this paragraph obtain, then

(1.18)  D̃ = σ² [ B        0        0       ]
                [ 0   (1−ρ₁)I      0       ]
                [ 0        0    (1−ρ₂)I    ] ,

where the zeros represent zero submatrices with appropriate
numbers of rows and columns,

(1.19)  B = [ 1 + (p−1)(ρ₁+ρ₂)/2 + pρ     (p−1)(ρ₁−ρ₂)/2          ]
            [ (p−1)(ρ₁−ρ₂)/2             1 + (p−1)(ρ₁+ρ₂)/2 − pρ ] ,

and where I is a (p − 1) × (p − 1) identity matrix.

Proof: Since Y₁ = (1/√(2p))(Z₁ + Z₂), the definitions
of Z₁ and Z₂, together with the structures of X_i (i = 1,…,2p),
permit us to write the variance of Y₁ as

(1.20)  σ²(Y₁) = σ²[1 + (p−1)(ρ₁+ρ₂)/2 + pρ] .

Since Y₂ = (1/√(2p))(Z₁ − Z₂), we have the variance of Y₂ as

(1.21)  σ²(Y₂) = σ²[1 + (p−1)(ρ₁+ρ₂)/2 − pρ] .

We have Y₁Y₂ = (1/(2p))(Z₁² − Z₂²), with expectation

(1.22)  E(Y₁Y₂) = σ²(p−1)(ρ₁−ρ₂)/2 ,
which establishes the matrix B.
Using customary notation
for variances and covariances, we have for j = 1,…,p−1

(1.23)  σ²(Y_{2+j}) = (1/a_j²)[(p−j)²σ² + (p−j)σ² − 2(p−j)²σ₁₂ + (p−j)(p−j−1)σ₁₂]
                    = σ²(1 − ρ₁) .

Similarly, for j = p,…,2(p−1), σ²(Y_{2+j}) = σ²(1 − ρ₂), with
the notation agreement (here and elsewhere) that a_p = a₁,
a_{p+1} = a₂, etc.

The covariance of Y_{2+j} and Y₁ (j = 1,…,p−1) is

(1.24)  σ(Y₁, Y_{2+j}) = (σ²/(a_j √(2p)))[(p−j)(p−1)ρ₁ + (p−j) + (p−j)pρ
                         − (p−j){(p−1)ρ₁ + 1 + pρ}]
                       = 0 .

We consider next the covariance of Y_{2+j} and Y_{2+k}
for j, k < p and j ≠ k. Taking j < k, we have

(1.25)  a_j a_k σ(Y_{2+j}, Y_{2+k}) = (p−j)(p−k)σ₁₂ − (p−j)(p−k)σ₁₂ − (p−k)σ²
                                      − (p−j−1)(p−k)σ₁₂ + (p−k)σ² + (p−j−1)(p−k)σ₁₂
                                    = 0 .

By similar considerations, when j, k ≥ p and j ≠ k,
σ(Y_{2+j}, Y_{2+k}) = 0 .
The only remaining case is j < p with k ≥ p. The
algebra is the same as in the previous case except that the
signs are reversed, no variances appear, and σ₁₂ (or
σ_{p+1,p+2} as the particular case demands) is replaced by
σ_{1,p+1}. This gives

σ(Y_{2+j}, Y_{2+k}) = 0 ,   j < p ,  k ≥ p .
This establishes Theorem 1.1, and a corollary follows
immediately by inspection of the matrix B.

Corollary 1.1  If ρ₁ = ρ₂, the elements of Y are independent.
1.5 Illustration

It is appropriate to illustrate Theorem 1.1. We
choose an example in which p = 3 and σ² = 1. The matrix A
becomes

A = [ 1/√6   1/√6   1/√6   1/√6   1/√6   1/√6 ]
    [ 1/√6   1/√6   1/√6  −1/√6  −1/√6  −1/√6 ]
    [ 2/√6  −1/√6  −1/√6    0      0      0   ]
    [  0     1/√2  −1/√2    0      0      0   ]
    [  0      0      0    −1/√6  −1/√6   2/√6 ]
    [  0      0      0    −1/√2   1/√2    0   ]
and is orthogonal. After transformation by A, the covariance
matrix for the new variables is, by Theorem 1.1,

[ 1+(ρ₁+ρ₂)+3ρ   ρ₁−ρ₂           0      0      0      0    ]
[ ρ₁−ρ₂          1+(ρ₁+ρ₂)−3ρ    0      0      0      0    ]
[ 0              0               1−ρ₁   0      0      0    ]
[ 0              0               0      1−ρ₁   0      0    ]
[ 0              0               0      0      1−ρ₂   0    ]
[ 0              0               0      0      0      1−ρ₂ ] .
We could have established the theorem by computing
the product A Σ̃ A′ for the covariance matrix of the Y
elements. Hence we have the corollary [1]:

Corollary 1.2  A Σ̃ A′ = D̃ .

We denote the inverses of Σ̃ and D̃ by Λ̃ and Q,
respectively. Then, by well-known [2] properties of
matrices and determinants, and since A is orthogonal,

Theorem 1.2  Q = A Λ̃ A′ .
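Corollary 1.2 is easy to check numerically for the p = 3 illustration; the parameter values in this sketch are hypothetical, and only stdlib Python is used:

```python
import math

# p = 3, sigma^2 = 1, with hypothetical values of rho1, rho2, rho.
p, rho1, rho2, rho = 3, 0.7, 0.5, 0.4
s6, s2 = math.sqrt(6), math.sqrt(2)

A = [
    [1/s6] * 6,
    [1/s6] * 3 + [-1/s6] * 3,
    [2/s6, -1/s6, -1/s6, 0, 0, 0],
    [0, 1/s2, -1/s2, 0, 0, 0],
    [0, 0, 0, -1/s6, -1/s6, 2/s6],
    [0, 0, 0, -1/s2, 1/s2, 0],
]

def block(r):    # p x p intraclass correlation block
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]

cross = [[rho] * p for _ in range(p)]
S = [b + c for b, c in zip(block(rho1), cross)] \
  + [c + b for b, c in zip(block(rho2), cross)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(6)) for j in range(6)]
            for i in range(6)]

At = [list(col) for col in zip(*A)]
D = matmul(matmul(A, S), At)
for row in D:
    print([round(v, 10) for v in row])
# Diagonal: 3.4, 1.0, 0.3, 0.3, 0.5, 0.5; D[0][1] = D[1][0] = 0.2;
# every other off-diagonal element vanishes, as Theorem 1.1 asserts.
```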
CHAPTER II
DENSITY FUNCTIONS
2.1 Density considerations

Apart from the introduction of means, variances,
and covariances, references to distributions have been
avoided in Chapter I.

Let the variables X₁,…,X_{2p} have means zero and
positive definite covariance matrix Σ̃, the form of which is
specified by (1.9). Let the joint density be

(2.1)  (2π)^{−p} |Λ̃|^{1/2} exp( −½ Σ_{i,j=1}^{2p} λ_{ij} x_i x_j ) ,

where Λ̃ has elements λ_{ij} and is the inverse of the matrix
Σ̃. That is to say, we assume the variables have a joint
(2p)-variate normal distribution, with parameters as specified
above.

Let X be a column vector with elements X₁,…,X_{2p} and
x be a column vector with elements x₁, x₂,…,x_{2p}. If
Y = AX, where A is defined by (1.13), the joint density of
the elements of Y is obtained from (2.1) by means of the
transformation y = Ax. A is orthogonal, so that the Jacobian
of the transformation is unity. Thus, the elements of Y
have joint density

(2.2)  (2π)^{−p} |Q|^{1/2} exp( −½ Σ_{i,j=1}^{2p} ω_{ij} y_i y_j ) ,

where Q is the inverse of D̃, (1.18), and whose elements are
denoted by ω_{ij}.
Here we have used Theorems 1.1 and 1.2.
By a well-known theorem in multivariate analysis (page 29
of [1]), the joint density of Y₁ and Y₂ is, therefore,

(2.3)  (1/(2π)) |B⁻¹|^{1/2} exp( −½ Σ_{i,j=1}^{2} c_{ij} y_i y_j ) dy₁ dy₂ ,

where B⁻¹ denotes the inverse of the covariance matrix σ²B
of Y₁ and Y₂, (1.18) and (1.19), and has elements c_{ij}.

Let the respective variances of Y₁ and Y₂ be ν₁² and ν₂²
and the correlation between Y₁ and Y₂ be τ. Theorem 1.1
indicates that

(2.4)  ν₁² = σ²[1 + (p−1)(ρ₁+ρ₂)/2 + pρ] ,
       ν₂² = σ²[1 + (p−1)(ρ₁+ρ₂)/2 − pρ] ,

and

(2.5)  τ = σ²(p−1)(ρ₁−ρ₂) / (2ν₁ν₂) .

Our assumption that Σ̃ is positive definite requires the right-hand
sides of equations (2.4) to be positive.

Let the columns of the 2 × n matrix

[ y₁₁  y₁₂  ⋯  y₁ₙ ]
[ y₂₁  y₂₂  ⋯  y₂ₙ ]
be independently and identically distributed according to
(2.3). The joint density of the variables is

(2.6)  (2π)^{−n} |B⁻¹|^{n/2} exp( −½ Σ_{α=1}^{n} Σ_{i,j=1}^{2} c_{ij} y_{iα} y_{jα} ) dy₁₁ ⋯ dy₂ₙ .

If τ = 0 and

(2.7)  r = Σ_{α=1}^{n} y_{1α} y_{2α} / √( (Σ_{α=1}^{n} y²_{1α})(Σ_{α=1}^{n} y²_{2α}) ) ,

we have, from the theory of simple correlation (page 64 of [1]),
the distribution of r√(n−1)/√(1−r²) as the "Student" t-distribution
with n − 1 degrees of freedom. A test of τ = 0 versus τ ≠ 0
is a test of ρ₁ = ρ₂ versus ρ₁ ≠ ρ₂ and may be made by
means of the standard "two-tail" t-test with n − 1 degrees
of freedom (page 65, [1]).

Later in the discussion we shall adopt a model which
requires ρ₁ = ρ₂, so that the above considerations provide
a partial test of that model.
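A minimal sketch of this test (not from the thesis; the simulated data and the approximate critical value are illustrative assumptions):

```python
import math
import random

# Sketch of the test of rho1 = rho2 via tau = 0 on the transformed
# variates Y1, Y2.  Zero means are assumed, so n - 1 degrees of
# freedom are used; 2.009 is (approximately) the two-tail 5% point
# of Student's t with 49 degrees of freedom.
random.seed(1)
n = 50

def sample_r(tau):
    y1 = [random.gauss(0, 1) for _ in range(n)]
    y2 = [tau * a + math.sqrt(1 - tau * tau) * random.gauss(0, 1)
          for a in y1]
    s12 = sum(a * b for a, b in zip(y1, y2))
    s11 = sum(a * a for a in y1)
    s22 = sum(b * b for b in y2)
    return s12 / math.sqrt(s11 * s22)          # (2.7)

rejections = 0
for _ in range(300):
    r = sample_r(0.0)                          # tau = 0 holds
    t = r * math.sqrt(n - 1) / math.sqrt(1 - r * r)
    rejections += abs(t) > 2.009
print(f"{rejections} rejections in 300 trials")  # near 5% of 300
```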
2.2 Canonical correlations

Hotelling [5] has shown that canonical correlations
are the non-negative roots of the determinant equation which,
in terms of the present model, is

(2.8)  | λΣ₁₁  Σ₁₂  |
       | Σ₂₁   λΣ₂₂ | = 0 ,

or, by [1],

(2.9)  | Σ₁₁⁻¹ Σ₁₂ Σ₂₂⁻¹ Σ₂₁ − λ² I | = 0 ,

a result frequently used by Roy [15], and in which I is
a p × p identity matrix.
To obtain roots of (2.9), we first examine the matrix
Σ₁₁⁻¹ Σ₁₂ Σ₂₂⁻¹ Σ₂₁. Let j be a 1 × p matrix with unit
elements. Then j′j is a p × p matrix with unit elements,
jj′ = p, and j′jj′j = p j′j. From (1.10), (1.11), and
(1.12), we have

(2.10)  Σ₁₁ = (1 − ρ₁)I + ρ₁ j′j ,
        Σ₂₂ = (1 − ρ₂)I + ρ₂ j′j ,
        Σ₁₂ = Σ₂₁ = ρ j′j .

Inverses of Σ₁₁ and Σ₂₂ will be obtained with the
aid of a result by Roy and Sarhan [16]. Let D be a (k × k)
diagonal matrix with diagonal elements p_i; let q and r
be 1 × k matrices having elements q_i and r_i, respectively;
and let λ be a scalar. Paraphrasing the result in paragraph
4 of [16], we have
Lemma 2.1  If

(2.11)  C = D + λ q′r ,

then

(2.12)  C⁻¹ = D⁻¹ − [ λ / (1 + λ Σ_{i=1}^{k} q_i r_i / p_i) ] q̃′ r̃ ,

where q̃ and r̃ are row matrices (1 × k) whose elements are
q_i/p_i and r_i/p_i, respectively.
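The identity (2.12) (a rank-one-update inverse of Sherman-Morrison type) can be checked directly; the k = 4 values below are hypothetical:

```python
# Numerical check of Lemma 2.1 with hypothetical k = 4 values.
k, lam = 4, 0.7
pdiag = [1.5, 2.0, 1.2, 1.8]
q = [0.3, -0.4, 0.5, 0.2]
r = [0.6, 0.1, -0.3, 0.4]

# C = D + lam * q'r  (q'r is the outer product, entries q_i r_j)
C = [[(pdiag[i] if i == j else 0.0) + lam * q[i] * r[j]
      for j in range(k)] for i in range(k)]

# (2.12): C^{-1} = D^{-1} - [lam / (1 + lam * sum q_i r_i / p_i)] q~' r~
factor = lam / (1 + lam * sum(q[i] * r[i] / pdiag[i] for i in range(k)))
Cinv = [[(1 / pdiag[i] if i == j else 0.0)
         - factor * (q[i] / pdiag[i]) * (r[j] / pdiag[j])
         for j in range(k)] for i in range(k)]

prod = [[sum(C[i][m] * Cinv[m][j] for m in range(k)) for j in range(k)]
        for i in range(k)]
ok = all(abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-10
         for i in range(k) for j in range(k))
print(ok)    # True: C * Cinv is the identity
```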
Direct application of the lemma gives

(2.13)  Σ₁₁⁻¹ = (1/(1 − ρ₁)) [ I − (ρ₁/(1 + (p−1)ρ₁)) j′j ] ,

and similarly

(2.14)  Σ₂₂⁻¹ = (1/(1 − ρ₂)) [ I − (ρ₂/(1 + (p−1)ρ₂)) j′j ] .

These relations together with (2.10) give

(2.15)  Σ₁₁⁻¹ Σ₁₂ Σ₂₂⁻¹ Σ₂₁
     = (ρ² / [(1−ρ₁)(1−ρ₂)]) [I − (ρ₁/(1+(p−1)ρ₁)) j′j] j′j [I − (ρ₂/(1+(p−1)ρ₂)) j′j] j′j
     = (ρ² / [(1−ρ₁)(1−ρ₂)]) · ((1−ρ₁)/(1+(p−1)ρ₁)) · ((1−ρ₂)/(1+(p−1)ρ₂)) · j′j j′j
     = ( p ρ² / [(1+(p−1)ρ₁)(1+(p−1)ρ₂)] ) j′j ,

since [I − (ρ_i/(1+(p−1)ρ_i)) j′j] j′j = ((1−ρ_i)/(1+(p−1)ρ_i)) j′j
and j′j j′j = p j′j.
Thus the positive roots of (2.8) are simple functions (square
roots) of the characteristic roots of the matrix given by
the last line of (2.15). We denote the scalar associated
with that matrix by μ̃ and write (2.9) as

| λ² I − μ̃ j′j | = | μ̃ ((λ²/μ̃) I − j′j) | = 0 .

Hence the square of any root of (2.8) is proportional to some
characteristic root of the matrix j′j, the proportionality
factor being μ̃. The general result A.1.18 of [15] applies,
so that the non-negative roots of j′j are the same as those
for jj′ = p. Hence λ² = μ̃p, and

(2.16)  λ = p|ρ| / √[(1 + (p−1)ρ₁)(1 + (p−1)ρ₂)]

is the only positive root of (2.8). We have proved:

Theorem 2.1  Provided −1/(p−1) < ρ₁, ρ₂ < 1, there is a
unique canonical correlation λ̃, say, where

(2.17)  λ̃ = p|ρ| / √[(1 + (p−1)ρ₁)(1 + (p−1)ρ₂)] ,

and the positive square root is taken.
Also if ρ₁ ≠ ρ₂
and ρ = 0, we have λ̃ = 0.

2.3 Canonical variates and quasi-canonical correlation
Assumptions of this chapter are compatible with those
of paragraph 1.3 provided ρ₁₂ = ρ₁, ρ_{p+1,p+2} = ρ₂,
ρ_{1,p+1} = ρ, and σ₁ = σ_{p+1} = σ, and the new notation
is adopted. Now assume that the variables X₁,…,X_{2p}
satisfy other requirements of this chapter and may be written

X_i = ξ + ε_i ,   i = 1,…,p ,
X_i = η + ε_i ,   i = p+1,…,2p ,

where ξ, η, and ε₁,…,ε_{2p} are defined by paragraph 1.3 with
σ₁ = σ_{p+1} = σ. Equation (1.3) applies and may be written

ρ_{Z₁Z₂} = pρσ² / √{[(p−1)ρ₁σ² + σ²][(p−1)ρ₂σ² + σ²]}
         = pρ / √{[1 + (p−1)ρ₁][1 + (p−1)ρ₂]} ,

where Z₁ = Σ_{i=1}^{p} X_i and Z₂ = Σ_{i=p+1}^{2p} X_i. By
Theorem 2.1, |ρ_{Z₁Z₂}| = λ̃, which is a unique non-vanishing
canonical correlation if ρ ≠ 0. It is evident that, for the
present model, the population canonical variates are independent
of σ², ρ₁, and ρ₂, depending only upon general properties
of the model. We arbitrate possible ambiguities by the
following conventions: If ρ is non-negative, we take Z₁ and
Z₂ as population canonical variates. Otherwise, we take Z₁
and (−Z₂).

We shall call the random variables Z₁ and Z₂ quasi-canonical
variates and their correlation (regardless of sign)
the quasi-canonical correlation, denoted by ρ*.
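Theorem 2.1 and the agreement of (2.17) with |ρ_{Z₁Z₂}| can be checked numerically; the p = 4 parameter values in this sketch are hypothetical, and the matrix algebra is written out in plain Python:

```python
import math

# Hypothetical values; p = 4.
p, rho1, rho2, rho = 4, 0.6, 0.3, 0.25

def intraclass_inv(r):    # (2.13)/(2.14)
    c = r / (1 + (p - 1) * r)
    return [[((1.0 if i == j else 0.0) - c) / (1 - r)
             for j in range(p)] for i in range(p)]

def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(p)) for j in range(p)]
            for i in range(p)]

S12 = [[rho] * p for _ in range(p)]            # Sigma12 = Sigma21
M = matmul(matmul(intraclass_inv(rho1), S12),
           matmul(intraclass_inv(rho2), S12))

# The vector of ones is the eigenvector of j'j belonging to its
# nonzero root, so M applied to it gives lambda^2 times it.
Mv = [sum(row) for row in M]
lam_from_M = math.sqrt(Mv[0])

lam_217 = p * abs(rho) / math.sqrt(
    (1 + (p - 1) * rho1) * (1 + (p - 1) * rho2))   # (2.17)
print(lam_from_M, lam_217)    # agree; also equals |rho_Z1Z2|
```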
CHAPTER III
MAXIMUM LIKELIHOOD ESTIMATES
3.1 Likelihood function

Let the columns of the matrix

(3.1)  (x_{iα}) ,   i = 1,…,2p;  α = 1,…,n ,

be independently and identically distributed according to
(2.1), where the covariance matrix, positive definite, is
defined by (1.9) with the restriction ρ₁ = ρ₂ and has
elements whose values are unknown. We denote the covariance
matrix (with ρ₁ = ρ₂) by Σ and have

(3.2)  Σ = σ² [ Σ₁₁  Σ₁₂ ]
              [ Σ₂₁  Σ₁₁ ] ,

where Σ₁₁, Σ₁₂, and Σ₂₁ are defined by (1.10) and (1.12).
The likelihood function [1] of the sample is

(3.3)  L = (2π)^{−np} |Λ|^{n/2} exp( −½ Σ_{i,j=1}^{2p} λ_{ij} S_{ij} ) ,

where Λ, with elements λ_{ij}, is the inverse of Σ and
S_{ij} = Σ_{α=1}^{n} x_{iα} x_{jα} .

In this chapter, maximum likelihood estimates
of the parameters σ², ρ₁, and ρ are derived.
3.2 A lemma

Use of the symbol Λ here differs slightly from the
previous use. We have imposed the condition ρ₁ = ρ₂;
otherwise, the meaning of Λ is unchanged. We shall use the
temporary expedient of denoting the elements of Σ by c_{ij}.
Lemma 3.1  If in Σ we have c_{ij} = c_{kℓ} , then λ_{ij} = λ_{kℓ} .

Proof: Consider the matrix

(3.4)  (c_{ij}) = σ² [ 1   ρ₁  ⋯  ρ₁   ρ   ρ   ⋯  ρ  ]
                     [ ρ₁  1   ⋯  ρ₁   ρ   ρ   ⋯  ρ  ]
                     [ ⋮                              ]
                     [ ρ₁  ρ₁  ⋯  1    ρ   ρ   ⋯  ρ  ]
                     [ ρ   ρ   ⋯  ρ    1   ρ₁  ⋯  ρ₁ ]
                     [ ⋮                              ]
                     [ ρ   ρ   ⋯  ρ    ρ₁  ρ₁  ⋯  1  ] .

Cofactors of its elements (C_{ij}) are proportional to the elements
(λ_{ji}) of Λ. The cofactors of c_{ii} for i = 1,…,p are
identical by inspection of (3.4). When i > p, an even number
of row and column interchanges in the matrix which determines
the cofactor produces a matrix which is identical with the
matrix determining the cofactor of c_{ii} for i ≤ p. Hence,
the cofactor of c_{ii} equals that for c_{jj}, i, j = 1,…,2p.

Consider two elements c_{ij} and c_{i,j+1} which lie above
the main diagonal and which are equal by definition (3.2).
The cofactor of the former is based upon the sub-matrix obtained
from Σ by deleting the i-th row and j-th column. The
i-th row and (j+1)-st column are deleted in connection with
the element c_{i,j+1}. It is observed that the corresponding
sub-matrices differ only in two rows and that they would be
equal if the appropriate two rows in one of them were interchanged.
Hence, except for sign, the corresponding determinants
are equal. The cofactors of c_{ij} and c_{i,j+1} are equal
if both elements lie above the main diagonal and are equal
by (3.2). Reference to the symmetry of Σ completes the proof.
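Lemma 3.1 can be confirmed numerically: inverting Σ for small p shows the inverse inherits the three-value pattern. The p = 3 values below are hypothetical, and the inversion is a plain Gauss-Jordan sketch:

```python
# Check of Lemma 3.1 for p = 3: equal elements of Sigma correspond
# to equal elements of its inverse.  Hypothetical values, rho1 = rho2.
p, rho1, rho = 3, 0.5, 0.3
m = 2 * p

S = [[1.0 if i == j else (rho1 if (i < p) == (j < p) else rho)
      for j in range(m)] for i in range(m)]

def inverse(M):
    k = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
         for i, row in enumerate(M)]
    for col in range(k):
        piv = max(range(col, k), key=lambda rr: abs(A[rr][col]))
        A[col], A[piv] = A[piv], A[col]
        d = A[col][col]
        A[col] = [v / d for v in A[col]]
        for rr in range(k):
            if rr != col and A[rr][col] != 0.0:
                f = A[rr][col]
                A[rr] = [v - f * w for v, w in zip(A[rr], A[col])]
    return [row[k:] for row in A]

L = inverse(S)
diag = {round(L[i][i], 9) for i in range(m)}
within = {round(L[i][j], 9) for i in range(m) for j in range(m)
          if i != j and (i < p) == (j < p)}
cross = {round(L[i][j], 9) for i in range(m) for j in range(m)
         if (i < p) != (j < p)}
print(len(diag), len(within), len(cross))    # 1 1 1
```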
3.3 Estimation

The maximum of the likelihood function and its logarithm
occur at the same point in the parameter space. Let
us denote the logarithm of (3.3) by L, adhering to the notation
λ_{ij} and σ_{ij} for the elements of Λ and its inverse,
respectively. We have

(3.5)  L = −np log 2π + (n/2) log |Λ| − ½ Σ_{i,j=1}^{2p} λ_{ij} S_{ij} ,

and, following customary maximum likelihood technique, put

(3.6)  ∂L/∂σ² = (n/2)(1/|Λ|) Σ_{i,j=1}^{2p} (∂|Λ|/∂λ_{ij})(∂λ_{ij}/∂σ²)
                − ½ Σ_{i,j=1}^{2p} S_{ij} (∂λ_{ij}/∂σ²) = 0 .

The equations ∂L/∂ρ₁ = 0 and ∂L/∂ρ = 0 have a similar form.
By [1] (Appendix 1, Theorem 7),
(3.7)  ∂|Λ|/∂λ_{ii} = Λ_{ii}

and

(3.8)  ∂|Λ|/∂λ_{ij} = 2Λ_{ij} ,   i ≠ j ,

where Λ_{ij} is the cofactor of λ_{ij} (i, j = 1,…,2p). The
matrix Σ is symmetric and is the inverse of Λ, so that

(3.9)  Λ_{ij}/|Λ| = σ_{ji} = σ_{ij} .

Lemma 3.1 establishes various equalities between the elements
λ_{ij} of Λ. Using those results, together with (3.7), (3.8),
and (3.9), the equation (3.6) may be written

(3.10)  (∂λ₁₁/∂σ²) Σ_{i=1}^{2p} (σ_{ii} − S_{ii}/n)
        + (∂λ₁₂/∂σ²) Σ_{i,j∈R₁} (σ_{ij} − S_{ij}/n)
        + (∂λ_{1,p+1}/∂σ²) Σ_{i,j∈R₂} (σ_{ij} − S_{ij}/n) = 0 ,

where the notation Σ_{i,j∈R₁} indicates summation over those
values of the subscripts which satisfy either of the
restrictions

(3.11)  i, j ≤ p , with i ≠ j ,   or   i, j > p , with i ≠ j ,

and Σ_{i,j∈R₂} indicates similar summation for either

(3.12)  i ≤ p and j > p ,   or   i > p and j ≤ p .
Analogous development for ∂L/∂ρ₁ and ∂L/∂ρ produces results similar
to (3.10). Those equations and (3.10) are satisfied if

(3.13)  Σ_{i=1}^{2p} (σ_{ii} − S_{ii}/n) = 2pσ² − (1/n) Σ_{i=1}^{2p} S_{ii} = 0 ,

        Σ_{i,j∈R₁} (σ_{ij} − S_{ij}/n) = 2p(p−1)ρ₁σ² − (1/n) Σ_{i,j∈R₁} S_{ij} = 0 ,  and

        Σ_{i,j∈R₂} (σ_{ij} − S_{ij}/n) = 2p²ρσ² − (1/n) Σ_{i,j∈R₂} S_{ij} = 0 .

Equations (3.13) establish

Theorem 3.1  Under our assumptions, the maximum likelihood
estimates of σ², ρ₁, and ρ are

(3.14)  σ̂² = (1/(2pn)) Σ_{i=1}^{2p} S_{ii} = (S₁² + S₂² + ⋯ + S²_{2p}) / (2pn) ,

(3.15)  ρ̂₁ = (2/(p−1)) · (S₁₂ + S₁₃ + ⋯ + S_{p−1,p} + S_{p+1,p+2} + ⋯ + S_{2p−1,2p}) / (S₁² + S₂² + ⋯ + S²_{2p}) ,

(3.16)  ρ̂ = (2/p) · (S_{1,p+1} + S_{1,p+2} + ⋯ + S_{p,2p}) / (S₁² + S₂² + ⋯ + S²_{2p}) ,

where S_i² = S_{ii} (i = 1,…,2p).
The numerators of the rightmost expressions in (3.14),
(3.15), and (3.16) occur frequently in subsequent discussion
and will be denoted by S, 2(S₁ + S₂), and 2S₃, respectively,
where

S = S₁² + S₂² + ⋯ + S²_{2p} ,
S₁ = S₁₂ + S₁₃ + ⋯ + S_{p−1,p} ,
S₂ = S_{p+1,p+2} + S_{p+1,p+3} + ⋯ + S_{2p−1,2p} ,  and
S₃ = S_{1,p+1} + S_{1,p+2} + ⋯ + S_{p,2p} .

In passing it should be noted that the estimators
σ̂², ρ̂₁, and ρ̂ give the maximum of (3.3) without regard to
the inequalities ρ₁ > 0 and |ρ/ρ₁| < 1. Hence, it may
happen that ρ̂₁ < 0 or |ρ̂/ρ̂₁| > 1, or both.
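The estimates (3.14)-(3.16) are simple to compute; the following sketch (not from the thesis) applies them to data simulated from a hypothetical p = 2 instance of the model. Nothing in the formulas constrains ρ̂₁ or ρ̂/ρ̂₁, which is the source of the anomalies just noted.

```python
import random

# Sketch of (3.14)-(3.16) on simulated data from the model with
# p = 2, sigma^2 = 1, rho1 = rho2 = 0.5, rho = 0.4 (hypothetical).
random.seed(9)
p, n = 2, 30
rho1, rho = 0.5, 0.4

def draw():
    # Var(xi) = Var(eta) = rho1 and Var(eps) = 1 - rho1, so each
    # X_i has unit variance, within-set correlation rho1, and
    # cross-set correlation rho.
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    xi = rho1 ** 0.5 * z1
    c = rho / rho1
    eta = rho1 ** 0.5 * (c * z1 + (1 - c * c) ** 0.5 * z2)
    e = [random.gauss(0, (1 - rho1) ** 0.5) for _ in range(2 * p)]
    return [xi + e[0], xi + e[1], eta + e[2], eta + e[3]]

X = [draw() for _ in range(n)]
S = [[sum(row[i] * row[j] for row in X) for j in range(2 * p)]
     for i in range(2 * p)]

trace = sum(S[i][i] for i in range(2 * p))
sig2_hat = trace / (2 * p * n)                                     # (3.14)
rho1_hat = 2 / (p - 1) * (S[0][1] + S[2][3]) / trace               # (3.15)
rho_hat = 2 / p * (S[0][2] + S[0][3] + S[1][2] + S[1][3]) / trace  # (3.16)
print(sig2_hat, rho1_hat, rho_hat)
```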
3.4 Estimates using new variates

Regard the α-th column of (3.1) as a column vector
x_α (α = 1,…,n) and apply the transformation y_α = A x_α,
where A is defined by (1.13). This gives

(3.17)  √(2p) y_{1α} = Σ_{i=1}^{2p} x_{iα} ,
        √(2p) y_{2α} = Σ_{i=1}^{p} x_{iα} − Σ_{i=p+1}^{2p} x_{iα} ,

with

(3.18)  2p y²_{1α} = Σ_{i=1}^{2p} x²_{iα} + 2(T_{1α} + T_{2α} + T_{3α}) ,
        2p y²_{2α} = Σ_{i=1}^{2p} x²_{iα} + 2(T_{1α} + T_{2α} − T_{3α}) ,

where

(3.19)  T_{1α} = x_{1α}x_{2α} + x_{1α}x_{3α} + ⋯ + x_{p−1,α}x_{pα} ,
        T_{2α} = x_{p+1,α}x_{p+2,α} + x_{p+1,α}x_{p+3,α} + ⋯ + x_{2p−1,α}x_{2p,α} ,
        T_{3α} = x_{1α}x_{p+1,α} + x_{1α}x_{p+2,α} + ⋯ + x_{pα}x_{2p,α} .
The subscript a will be deleted when no ambiguity arises. We also have

(3.20)    √((p−1)p) y₃ = (p−1)x₁ − Σ_{i=2}^{p} x_i ,
          ⋯
          √((p−j)(p−j+1)) y_{2+j} = (p−j)x_j − Σ_{i=j+1}^{p} x_i ,

          √((p−j)(p−j+1)) y_{p+1+j} = (p−j)x_{2p−j+1} − Σ_{i=p+1}^{2p−j} x_i ,

for j = 1, ⋯, p−1 .
Let us examine p! Σ_{i=1}^{p−1} y²_{2+i} and p! Σ_{i=1}^{p−1} y²_{p+1+i} . We have

(3.21)    (p−1)p y₃² = (p−1)²x₁² + Σ_{i=2}^{p} x_i² − 2[(p−1) Σ_{i=2}^{p} x₁x_i − Σ_{i=2}^{p−1} Σ_{k=i+1}^{p} x_i x_k] ,
          ⋯
          (p−j)(p−j+1) y²_{2+j} = (p−j)²x_j² + Σ_{i=j+1}^{p} x_i² − 2[(p−j) Σ_{i=j+1}^{p} x_j x_i − Σ_{i=j+1}^{p−1} Σ_{k=i+1}^{p} x_i x_k] ,

for j = 1, ⋯, p−1 .

We multiply the equation involving y²_{2+j} by p(p−1)⋯(p−j+2)·(p−j−1)! , j = 1, ⋯, p−1 . The coefficient of x_j² in p! Σ_{i=1}^{p−1} y²_{2+i} is b_j , say, where
(3.22)    b_j = (p−2)! + p(p−3)! + p(p−1)(p−4)! + ⋯ + [p(p−1)⋯(p−j+3)](p−j)! + p(p−1)⋯(p−j+2)(p−j−1)!(p−j)² ,

with

(3.23)    b_{j+1} = b_j − p(p−1)⋯(p−j+2)(p−j−1)!(p−j)² + p(p−1)⋯(p−j+2)(p−j−1)!
                        + p(p−1)⋯(p−j+1)(p−j−2)!(p−j−1)² ,

                 = b_j − p(p−1)⋯(p−j+2)(p−j−1)! [(p−j)² − {1 + (p−j+1)(p−j−1)}] ,

                 = b_j .

But

(3.24)    b₁ = (p−2)!(p−1)² = (p−1)(p−1)! ,

          b₂ = (p−2)! + p(p−3)!(p−2)² = (p−2)![1 + p(p−2)] = (p−1)(p−1)! .
We have completed an induction which shows that the squared terms of p! Σ_{i=1}^{p−1} y²_{2+i} each have coefficient (p−1)(p−1)! . Similarities in the last two equations of (3.20) show that the squared terms of p! Σ_{i=1}^{p−1} y²_{p+1+i} have coefficient (p−1)(p−1)! .

Consider the coefficient of x_j x_ℓ for ℓ > j in the sum p! Σ_{i=1}^{p−1} y²_{2+i} . We follow the previous pattern and denote this coefficient by c_j , the subscript ℓ being unimportant so long as ℓ > j . We have
(3.25)    c_{j+1} = c_j + p(p−1)⋯(p−j+2)·(p−j−1)!(p−j) + p(p−1)⋯(p−j+2)·(p−j−1)!
                        − p(p−1)⋯(p−j+1)·(p−j−2)!(p−j−1) ,

                 = c_j + p(p−1)⋯(p−j+2)·(p−j−1)! [p−j+1 − (p−j+1)] ,

                 = c_j .

This, together with

(3.26)    c₁ = −(p−1)(p−2)! = −(p−1)! ,

          c₂ = (p−2)! − (p−2)p(p−3)! = (p−2)![1 − p] = −(p−1)! ,

completes an induction which shows the coefficient of each cross-product term in p! Σ_{i=1}^{p−1} y²_{2+i} to be [−2(p−1)!] . Hence

(3.27)    p! Σ_{i=1}^{p−1} y²_{2+i} = (p−1)! [(p−1) Σ_{i=1}^{p} x_i² − 2 Σ_{i=1}^{p−1} Σ_{k=i+1}^{p} x_i x_k] .
The previously mentioned similarities in equation (3.20) give

(3.28)    p! Σ_{i=1}^{p−1} y²_{p+1+i} = (p−1)! [(p−1) Σ_{i=p+1}^{2p} x_i² − 2 Σ_{i=p+1}^{2p−1} Σ_{k=i+1}^{2p} x_i x_k] .

We restore the subscript a and use (3.19) to write

(3.29)    (p/(p−1)) Σ_{i=3}^{2p} y²_{ia} = Σ_{i=1}^{2p} x²_{ia} − 2(T_{1a} + T_{2a})/(p−1) ,

based upon (3.27) and (3.28).
Also,

(3.30)    p(y²_{1a} + y²_{2a}) = Σ_{i=1}^{2p} x²_{ia} + 2(T_{1a} + T_{2a}) ,

          p(y²_{1a} − y²_{2a}) = 2T_{3a} ,

from (3.18).

The paragraph following (3.16) defined the sums S, S₁, S₂, S₃, which refer to summation upon the subscript a. We use the additional notation

(3.31)    2t = Σ_{a=1}^{n} y²_{1a} ,   2u = Σ_{a=1}^{n} y²_{2a} ,   2v = Σ_{a=1}^{n} Σ_{i=3}^{2p} y²_{ia} .

It is further observed that

(3.32)    Σ_{a=1}^{n} Σ_{i=1}^{2p} x²_{ia} = S   and   Σ_{a=1}^{n} T_{ka} = S_k ,   k = 1, 2, 3 .

By summation on both sides of each equation, (3.29) and (3.30) yield

(3.33)    2p(t + u) = S + 2(S₁ + S₂) ,

          2p(t − u) = 2S₃ ,

          (2p/(p−1)) v = S − 2(S₁ + S₂)/(p−1) .
Then

(3.34)    S/2 = t + u + v ,

          (S₁ + S₂)/(p−1) = t + u − v/(p−1) ,

          S₃/p = t − u .
Equation (3.34) and Theorem 3.1 establish the following theorem:

Theorem 3.2

    The maximum likelihood estimates of σ², ρ₁, and ρ may be written:

(3.35)    σ̂² = (t + u + v)/(pn) ,

          ρ̂₁ = (t + u − v/(p−1))/(t + u + v) ,

          ρ̂ = (t − u)/(t + u + v) .

Since ρ/√(ρ₁ρ₂) = ρ/ρ₁ , the ratio ρ̂/ρ̂₁ will be denoted by w and proposed as an estimate of ρ/√(ρ₁ρ₂) , the correlation corrected for attenuation. We have

(3.36)    w = (t − u)/(t + u − v/(p−1)) ,

which reduces to

(3.37)    w = (t − u)/(t + u − v)

in the special case p = 2 .
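Theorem 3.2 and (3.36) make w computable from the sample matrix without forming the full orthogonal transformation, because v can be recovered from the identity S/2 = t + u + v of (3.34). A sketch, assuming the sample is stored as a 2p-by-n list of lists:

```python
def tuvw(x, p):
    """t, u, v of (3.31) and the attenuation-corrected estimate w of (3.36).

    y1 and y2 come from (3.17); v is recovered from S/2 = t + u + v, see (3.34)."""
    n = len(x[0])
    t = u = s_total = 0.0
    for a in range(n):
        col = [x[i][a] for i in range(2 * p)]
        y1 = sum(col) / (2 * p) ** 0.5                      # (3.17), first variate
        y2 = (sum(col[:p]) - sum(col[p:])) / (2 * p) ** 0.5  # (3.17), second variate
        t += y1 * y1 / 2
        u += y2 * y2 / 2
        s_total += sum(c * c for c in col)
    v = s_total / 2 - t - u                                 # (3.34)
    w = (t - u) / (t + u - v / (p - 1))                     # (3.36)
    return t, u, v, w
```

On simulated data with ρ₁ = 0.8 and ρ = 0.4, w settles near ρ/ρ₁ = 0.5 as n grows, and (t + u + v)/(pn) reproduces σ̂² of (3.35).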
3.5 Asymptotic Distributions

The statistics σ̂², ρ̂₁, and ρ̂ are functions of the sample moments and are consistent estimates of σ², ρ₁, and ρ [1]. The same is true for w as an estimate of ρ/√(ρ₁ρ₂) = ρ/ρ₁ . Hence, the means in the asymptotic distributions of σ̂², ρ̂₁, and ρ̂ are σ², ρ₁, and ρ, respectively. The remainder of this chapter is devoted to the variance of σ̂² and the asymptotic distributions of σ̂², ρ̂₁, and ρ̂ .

The elements of the sample matrix (3.1) and variables obtained from them by transformation may be regarded as random variables. The marginal distributions of the elements in (3.1) are normal [1] with means zero and variances σ². Hence each variable y_{ij} obtained in paragraph (3.4) is a linear combination of normal variates and is normally distributed with mean zero and variance according to the matrix D (1.18), with ρ₁ = ρ₂ .

We define α to be the variance of y_{1j} (j = 1, ⋯, n), β to be the variance of y_{2j} (j = 1, ⋯, n), and γ to be the common value of the variances of y_{3j}, y_{4j}, ⋯, y_{2p,j} . Reference to Theorem 1.1 shows that, with ρ₁ = ρ₂ ,

(3.38)    α = σ²[1 + (p−1)ρ₁ + pρ] ,

          β = σ²[1 + (p−1)ρ₁ − pρ] ,

          γ = σ²(1 − ρ₁) .

Then y²_{1j}/α , y²_{2j}/β , y²_{3j}/γ , y²_{4j}/γ , ⋯ , y²_{2p,j}/γ are independently (Corollary 1.1) distributed as χ² variables, each with one degree of freedom. The variables 2t, 2u, and 2v as defined by (3.31) are sums of independent sets of independent χ² variables, the individual χ² variables having one degree of freedom. Hence we have:
Theorem 3.4

    The random variables 2t, 2u, 2v are independent as a set and

    a.  2t/α is distributed as χ² with n degrees of freedom,

    b.  2u/β is distributed as χ² with n degrees of freedom,

    c.  2v/γ is distributed as χ² with 2(p−1)n degrees of freedom.
The frequency functions of t, u, and v are therefore of the form

(3.39)    x^ν e^{−x/θ} / [Γ(ν+1) θ^{ν+1}] ,   x ≥ 0 ,  θ > 0 ,  ν > 0 .

This "gamma" type distribution [13] has mean and variance θ(ν+1) and θ²(ν+1), respectively. The variables t, u, and v require θ = α, β, γ and ν + 1 = n/2, n/2, (p−1)n (the order is t, u, v).
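The moments quoted for (3.39) are easy to confirm by simulation; a sketch assuming Python's random.gammavariate(shape, scale) parameterization, with shape = ν + 1 and scale = θ (the numerical values are arbitrary examples):

```python
import random

# Check of the "gamma" law (3.39): mean theta*(nu+1), variance theta^2*(nu+1).
random.seed(3)
theta, nu = 2.8, 1.0        # illustrative values; theta > 0, nu > 0
shape = nu + 1              # (3.39) is a gamma density with shape nu+1 and scale theta
draws = [random.gammavariate(shape, theta) for _ in range(200_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The sample mean and variance agree with θ(ν+1) and θ²(ν+1) to Monte Carlo accuracy.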
Table 3.1 contains the means and variances of some functions which appear in (3.35) and (3.36) and will be used in computing the variances of σ̂², ρ̂₁, and ρ̂ .

Table 3.1
Means and Variances (Ancillary)

Function               Mean                          Variance

t − u                  (n/2)(α − β)                  (n/2)(α² + β²)
t + u + v              (n/2)[α + β + 2(p−1)γ]        (n/2)[α² + β² + 2(p−1)γ²]
t + u − v/(p−1)        (n/2)[α + β − 2γ]             (n/2)[α² + β² + 2γ²/(p−1)]
The following covariances are useful also:

    cov(t − u, t + u + v) = (n/2)(α² − β²) ,

    cov(t + u − v/(p−1), t + u + v) = (n/2)(α² + β² − 2γ²) ,

    cov(t − u, t + u − v/(p−1)) = (n/2)(α² − β²) .

Using Theorem 3.2, the variance of σ̂² is

(3.40)    Var(σ̂²) = [α² + β² + 2(p−1)γ²] / (2p²n) ,

an exact expression. By Theorem 3.2 and equations (3.17), (3.18), and (3.31), it is seen that σ̂² is a linear function of sample second order moments from a multivariate normal distribution. It follows [3] that, for large n, σ̂² is approximately normally distributed about mean σ², with variance given by (3.40).
The asymptotic distributions of the estimators ρ̂₁, ρ̂, and w will be obtained by means of a lemma which is a special case of a general result by Hoeffding and Robbins [4].

Let

(3.41)    (X₁*, Y₁*), (X₂*, Y₂*), ⋯

be a sequence of random vectors (real elements) which are independently and identically distributed. For all values of i, let the vector elements X_i* and Y_i* have zero means, finite third absolute moments, variances σ₁² and σ₂² (respectively), and covariance σ₁₂ . Finally, let

    H(x, y) = (x + δ₁)/(y + δ₂) ,

where δ₁ and δ₂ are constants with δ₂ ≠ 0 . H(x, y) and its derivatives are continuous at the point (0,0). All conditions of Theorem 4 from the cited research [4] are satisfied. Hence,

Lemma 3.1

    As n → ∞ , the function

    √n [H(X̄*, Ȳ*) − H(0, 0)] ,

where X̄* and Ȳ* denote the means of the first n vector elements, has a limiting normal distribution in which the mean is zero and the variance is

    H₁²σ₁² + 2H₁H₂σ₁₂ + H₂²σ₂² ,

where H₁ and H₂ are the first order derivatives with respect to x and y, evaluated at the point (0,0).
Each of the estimators ρ̂₁, ρ̂, and w may be written in the form

(3.42)    Q(N, D) = N/D = [(1/n)(N − nν) + ν] / [(1/n)(D − nb) + b] ,

where (nν) and (nb) are the respective means of N and D and each of the functions (N − nν) and (D − nb) is a sum of n identically and independently distributed random variables whose means are zero and third absolute moments are finite. This may be verified by inspection of equations (3.35), (3.36), (3.17) and (3.18). By appropriate pairing of a random variable from (N − nν) with a random variable from (D − nb), we can construct a sequence of random vectors with the properties of (3.41).

We have assumed that the covariance matrix Σ is positive definite so that α, β, γ > 0 . By Table 3.1, then, b > 0 in the cases of ρ̂₁ and ρ̂ . The estimator w is used only when ρ₁ > 0 and |ρ/ρ₁| < 1 . These inequalities lead to

    1 − ρ₁ < 1 + (p−1)ρ₁ − p|ρ| ≤ 1 + (p−1)ρ₁ + p|ρ| .

Hence, we have α, β > γ > 0 and α + β − 2γ > 0 in cases for which w is appropriate; and b > 0 .

The conditions of Lemma 3.1 are satisfied in all three cases. The variance of N (and D), in the present context, is n times the variance of one of its n independent components.
In the sequence (3.41) which is established from (N − nν) and (D − nb), denote the vector elements from the former by X_i* and those from the latter by Y_i* . We observe that, if i ≠ j , the expected value of (X_i* Y_j*) is zero. Hence, the covariance of N and D is n times the covariance of X_i* and Y_i* .

This completes preparation for applying Lemma 3.1 to the functions Q given by (3.42). We have:

Lemma 3.2

    √n [Q(N, D) − Q(nν, nb)]

has a limiting normal distribution in which the mean is zero and the variance is

(3.43)    σ_N²/(nb²) + ν²σ_D²/(nb⁴) − 2νσ_ND/(nb³) ,

where σ_N², σ_D², and σ_ND denote the variance of N, the variance of D, and their covariance.
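Lemma 3.2 is a delta-method statement for a ratio of sums, and (3.43) can be checked by simulation. The component distributions below are illustrative assumptions chosen only to exercise the formula, not taken from the thesis:

```python
import random

def ratio_var(sig_n2, sig_d2, sig_nd, nu, b, n):
    """Asymptotic variance (3.43) of sqrt(n)*(N/D - nu/b), where N and D are sums of
    n iid components with means nu and b; sig_n2 = Var N, sig_d2 = Var D, sig_nd = Cov."""
    return sig_n2 / (n * b**2) + nu**2 * sig_d2 / (n * b**4) - 2 * nu * sig_nd / (n * b**3)

def simulate_sqrt_n_gap(nu, b, corr, n, reps, seed=5):
    """Monte Carlo draws of sqrt(n)*(N/D - nu/b), with unit-variance normal
    components whose covariance is corr (an assumed illustrative design)."""
    rng = random.Random(seed)
    out = []
    root = (1 - corr * corr) ** 0.5
    for _ in range(reps):
        sn = sd = 0.0
        for _ in range(n):
            g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
            sn += nu + g1
            sd += b + corr * g1 + root * g2
        out.append(n ** 0.5 * (sn / sd - nu / b))
    return out
```

With ν = 1, b = 2, component covariance 0.5, and n = 100, formula (3.43) gives 0.1875, which the simulated variance reproduces.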
Let us apply this result to ρ̂₁ with the aid of equation (3.35), Table 3.1, and the covariance information which follows that table. The variance in the asymptotic distribution of √n(ρ̂₁ − ρ₁) is

(3.44)    2[α² + β² + 2γ²/(p−1) − 2ρ₁(α² + β² − 2γ²) + ρ₁²(α² + β² + 2(p−1)γ²)] / [α + β + 2(p−1)γ]²

        = 2{(α² + β²)(1 − ρ₁)² + (2γ²/(p−1))[1 + (p−1)ρ₁]²} / [α + β + 2(p−1)γ]² .

An interesting sidelight is the fact that if ρ/√(ρ₁ρ₂) = 0 and p = 2 , the variance (3.44) reduces to (1 − ρ₁²)²/2 . It follows that the variance in the large sample distribution of ρ̂₁ under this set of special conditions is (1 − ρ₁²)²/(2n) . This expression is equivalent to the asymptotic variance of a sample product-moment correlation coefficient for a sample of size 2n from a bivariate normal distribution.
Similarly, the respective variances in the asymptotic distributions of √n(ρ̂ − ρ) and √n(w − ρ/√(ρ₁ρ₂)) are

(3.45)    2{α²(1 − ρ)² + β²(1 + ρ)² + 2(p−1)γ²ρ²} / [α + β + 2(p−1)γ]²

and

(3.46)    2{α²(1 − ρ/ρ₁)² + β²(1 + ρ/ρ₁)² + 2γ²(ρ/ρ₁)²/(p−1)} / (α + β − 2γ)² .

The preceding discussion is summarized by

Theorem 3.5

    If ρ₁ > 0 and |ρ/ρ₁| < 1 , each of the variables √n(ρ̂₁ − ρ₁) , √n(ρ̂ − ρ) , and √n(w − ρ/√(ρ₁ρ₂)) has a limiting normal distribution as n → ∞ ; the means in these respective distributions are zero; and the corresponding variances are given by (3.44), (3.45) and (3.46).

It follows that the asymptotic distributions for ρ̂₁, ρ̂, and w are normal, have means ρ₁, ρ, and ρ/√(ρ₁ρ₂) , respectively, the variances being easily obtained from (3.44), (3.45) and (3.46).

Theorems (3.2) and (3.4) together with (3.36) and the definitions of α, β, and γ indicate that ρ̂₁, ρ̂, and w may be expressed directly as functions of χ² variables and that each is independent of σ² . Hence, when distributions for ρ̂₁, ρ̂, and w are discussed, there is no loss of generality in assuming that σ² = 1 .
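The variances (3.44)-(3.46) depend on (p, ρ₁, ρ) only, since σ² cancels. A sketch collecting them in one function; the expression coded for (3.46) is the parallel form obtained by substituting ρ/ρ₁ and the denominator (α + β − 2γ)², as the derivation above indicates, and should be read as a reconstruction rather than a verbatim transcription:

```python
def asymptotic_variances(p, rho1, rho, sigma2=1.0):
    """Variances (3.44)-(3.46) of sqrt(n)*(rho1_hat - rho1), sqrt(n)*(rho_hat - rho),
    and sqrt(n)*(w - rho/rho1), built from alpha, beta, gamma of (3.38)."""
    a = sigma2 * (1 + (p - 1) * rho1 + p * rho)
    b = sigma2 * (1 + (p - 1) * rho1 - p * rho)
    g = sigma2 * (1 - rho1)
    theta = rho / rho1
    v1 = 2 * ((a**2 + b**2) * (1 - rho1) ** 2
              + (2 * g**2 / (p - 1)) * (1 + (p - 1) * rho1) ** 2) / (a + b + 2 * (p - 1) * g) ** 2
    v2 = 2 * (a**2 * (1 - rho) ** 2 + b**2 * (1 + rho) ** 2
              + 2 * (p - 1) * g**2 * rho**2) / (a + b + 2 * (p - 1) * g) ** 2
    # (3.46), reconstructed by parallel substitution of theta = rho/rho1:
    v3 = 2 * (a**2 * (1 - theta) ** 2 + b**2 * (1 + theta) ** 2
              + 2 * g**2 * theta**2 / (p - 1)) / (a + b - 2 * g) ** 2
    return v1, v2, v3
```

Setting p = 2 and ρ = 0 reproduces the sidelight above: (3.44) collapses to (1 − ρ₁²)²/2, and the result does not change with σ².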
3.6 Distribution of the sample quasi-canonical correlation

Developments in the present chapter permit us to comment upon paragraph (2.3) and, earlier, Theorem 2.1. Expression (2.17) gave the canonical correlation as a function of the parameters ρ₁, ρ₂, and |ρ| . Paragraph (2.3) gives the quasi-canonical variates Z₁ and Z₂ when ρ > 0 . It was assumed in those discussions that the basic variables X₁, ⋯, X_{2p} had the particular structure given in paragraph (2.3). We make the corresponding assumption now for the elements of (3.1).

Let ρ be positive and the columns of (3.1) be transformed by

    Z_{1j} = Σ_{i=1}^{p} x_{ij} ,   Z_{2j} = Σ_{i=p+1}^{2p} x_{ij} ,   and   Z_{ij} = x_{ij}  (i = 3, ⋯, 2p) ,

for j = 1, ⋯, n . The joint distribution of the variables Z_{ij} (i = 1, ⋯, 2p; j = 1, ⋯, n) is multivariate normal with means zero and covariance matrix Σ_z , say. Then by [1] (page 24), Z_{1j} and Z_{2j} have a joint bivariate normal distribution in which the means are zero, the variances can be specified, and the correlation coefficient is the quasi-canonical correlation (2.18). We denote this canonical correlation by η* . The parameter η* is estimated by z , say, where

(3.48)    z = Σ_{j=1}^{n} Z_{1j} Z_{2j} / √(Σ_{j=1}^{n} Z²_{1j} · Σ_{j=1}^{n} Z²_{2j})

is distributed like a standard sample product moment correlation coefficient, the distribution having parameters η* and n.

In addition to the above assumptions, let ρ₁ = ρ₂ . Then the variances of Z₁ and Z₂ are equal. It is easy to see that [15] (Z₁ − η*Z₂) and Z₂ are uncorrelated and have a joint bivariate normal distribution with means zero and positive variances. Let r* be the sample product moment correlation between (Z₁ − η*Z₂) and Z₂ . Then, following the development of Roy [15, page 93], we note that

    r*√(n−1) / √(1 − r*²) = t , say,

has a "Student" t-distribution with n − 1 degrees of freedom and that we can set confidence intervals about η* . Let s₁² and s₂² be sample variances for Z₁ and Z₂ respectively. Then, with confidence 1 − a , limits for η* are obtained by inverting this t statistic, where t_a is the value of t (in the t-distribution) for which P(t > t_a) = a/2 when (n − 1) degrees of freedom are used.
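The inversion of the t statistic can be carried out numerically on a grid of hypothesised values of η*. The function names, the grid, and the use of the normal critical value 1.96 in place of a tabled t-value are assumptions for illustration:

```python
import math

def t_stat(z1, z2, eta):
    """Student-t statistic (n-1 df) for a hypothesised quasi-canonical correlation
    eta: the sample correlation r* between (Z1 - eta*Z2) and Z2, mapped through
    r*sqrt(n-1)/sqrt(1-r*^2), as in the text."""
    n = len(z1)
    d = [a - eta * b for a, b in zip(z1, z2)]
    num = sum(di * bi for di, bi in zip(d, z2))
    r = num / math.sqrt(sum(di * di for di in d) * sum(bi * bi for bi in z2))
    return r * math.sqrt(n - 1) / math.sqrt(1 - r * r)

def confidence_set(z1, z2, t_crit, grid):
    """Hypothesised values of eta not rejected at critical value t_crit:
    a grid-inversion sketch of the confidence interval indicated in the text."""
    return [eta for eta in grid if abs(t_stat(z1, z2, eta)) <= t_crit]
```

The statistic vanishes exactly at η = Σ Z₁Z₂ / Σ Z₂², so the accepted set brackets that point estimate.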
3.7 Alternate derivation of estimates

It was seen in Theorem 3.4 that the statistic w may be expressed as a function of independent χ² variables with specified parameters. These variables are t, u, and v and we denote their joint frequency by g , with t, u, v ≥ 0 ; α, β, γ > 0 ; and m = 2(p−1)n .

Consider the space of the parameters α, β, and γ. The function g is continuous with respect to the parameters and its derivatives with respect to the parameters are continuous. Further, as one or more of the parameters tends to 0 or tends to ∞ , g → 0 . Since g is non-negative it therefore must have a smooth maximum at which its derivatives with respect to α, β, and γ are zero. In fact we shall see that g has a unique maximum in the octant α, β, γ > 0 .

Let log(g) = M so that

    M = −(n/2) log αβ − (m/2) log γ − (t/α + u/β + v/γ) + (n/2 − 1) log tu + (m/2 − 1) log v + const ,

and

(3.53)    ∂M/∂α = −n/(2α) + t/α² ,

          ∂M/∂β = −n/(2β) + u/β² ,

          ∂M/∂γ = −m/(2γ) + v/γ² .

Since g and log g have maxima at the same point in the parameter space, we maximize g by setting equations (3.53) equal to zero and solving for α, β, and γ. The unique solution is

(3.54)    α̂ = 2t/n ,   β̂ = 2u/n ,   γ̂ = 2v/m .

Let us put

    σ²[1 + (p−1)ρ₁ + pρ] = 2t/n ,   σ²[1 + (p−1)ρ₁ − pρ] = 2u/n ,   σ²(1 − ρ₁) = 2v/m ,

and determine values of σ², ρ₁, and ρ which satisfy these equations. Direct substitution from equations (3.35) verifies that σ² = σ̂² , ρ₁ = ρ̂₁ , and ρ = ρ̂ provide a solution.
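The system just displayed inverts in closed form, which is the content of the verification above; a sketch:

```python
def invert_alpha_beta_gamma(alpha, beta, gamma, p):
    """Solve (3.38) for sigma^2, rho1, rho given alpha, beta, gamma.

    Feeding in the MLEs alpha = 2t/n, beta = 2u/n, gamma = 2v/m of (3.54)
    reproduces the estimates (3.35)."""
    sigma2 = (alpha + beta + 2 * (p - 1) * gamma) / (2 * p)
    rho1 = 1 - gamma / sigma2
    rho = (alpha - beta) / (2 * p * sigma2)
    return sigma2, rho1, rho
```

For example, σ² = 2, ρ₁ = 0.7, ρ = 0.3, p = 3 gives (α, β, γ) = (6.6, 3.0, 0.6), and the inversion recovers the original triple.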
3.8 Anomalies of ρ̂₁, ρ̂, and w

Before proceeding to the distribution of w, we shall examine the function g in the ρ₁,ρ plane. It has been pointed out that g vanishes as α, β, and γ tend to zero either jointly or independently. This fact and the definitions of α, β, and γ indicate that g must vanish along the lines

(3.56)    ρ = −[1 + (p−1)ρ₁]/p ,

          ρ = [1 + (p−1)ρ₁]/p ,

          ρ₁ = 1 ,

in the ρ₁,ρ plane. These lines form a triangle illustrated by the largest triangle shown in Figure 3.1.

Figure 3.1
Region in which g > 0
(The largest triangle in the ρ₁,ρ plane has vertices (−1/(p−1), 0), (1, 1), and (1, −1).)

Outside and on the boundaries of the largest triangle in the figure, g = 0 . Within that triangle α, β, γ > 0 and g > 0 . The smaller, unshaded, isosceles triangle indicates the region in which |ρ/ρ₁| ≤ 1 , the equality holding on the non-vertical boundaries. Inside and on the boundaries of this smaller triangle α, β ≥ γ .

It is evident that g > 0 does not necessarily imply |ρ/ρ₁| < 1 . Either or both of the following inequalities may be satisfied:

    a.  ρ̂₁ < 0 ;

    b.  |ρ̂/ρ̂₁| > 1 .

The structure which has been assumed prevents ρ₁ < 0 , so that ρ̂₁ < 0 cannot be used directly to estimate ρ₁ > 0 . Likewise, w cannot be used directly to estimate ρ/√(ρ₁ρ₂) when we have |ρ̂/ρ̂₁| = |w| > 1 or when ρ̂₁ < 0 .

It may be possible to derive estimates which do not possess the present anomalies, perhaps by combining the method of maximum likelihood with the use of some method to introduce the conditions ρ₁ > 0 and |ρ/ρ₁| ≤ 1 (or alternately α, β ≥ γ).

The practicing statistician may be able to alleviate the trouble by taking advantage of another property shown by Figure 3.1. The left vertex of the largest triangle is at the point (−1/(p−1), 0) , so that increasing values of p diminish the size of the shaded area. However, the precise effects of increasing the value of p have not been investigated.

Further guidance in these matters is provided by the consistency property of each of the present estimates. As the sample size n increases, ρ̂₁ and w converge in probability to ρ₁ and ρ/√(ρ₁ρ₂) , respectively.
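The boundaries (3.56) amount to sign conditions on α, β, γ, so membership in the region of Figure 3.1, and the anomaly checks of this section, reduce to a few comparisons; a sketch with σ² = 1:

```python
def g_positive(rho1, rho, p):
    """True when (rho1, rho) lies inside the largest triangle of Figure 3.1,
    i.e. alpha, beta, gamma of (3.38) are all positive (sigma^2 = 1)."""
    alpha = 1 + (p - 1) * rho1 + p * rho
    beta = 1 + (p - 1) * rho1 - p * rho
    gamma = 1 - rho1
    return alpha > 0 and beta > 0 and gamma > 0

def estimate_usable(rho1_hat, w):
    """True when the estimates avoid the anomalies of section 3.8:
    rho1_hat > 0 and |w| <= 1."""
    return rho1_hat > 0 and abs(w) <= 1
```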
CHAPTER IV
DISTRIBUTION OF THE STATISTIC w

4.1 General

Much of Chapter III was devoted to expressing the statistic w in terms of independent variables t, u, and v. The development culminated in (3.36), giving

(4.1)    w = (t − u)/(t + u − v/(p−1)) .

We shall use capital letters to denote random variables which correspond to t, u, v, and w so that the smaller letters may be used in integration processes. For example, we let the cumulative probability function of W be

    F(w) = P(W ≤ w) .

The frequency functions of T, U, and V are given by (3.39) and their joint density is

(4.2)    K (tu)^{n/2−1} v^{m/2−1} e^{−(t/α + u/β + v/γ)} ,

for t, u, v ≥ 0 ; α, β, γ > 0 ; and m = 2(p−1)n .

Let t = t′ , u = u′ , and v = (p−1)v′ , with Jacobian (p−1). Then (4.2) becomes

(4.3)    K′ (t′u′)^{n/2−1} (v′)^{m/2−1} e^{−(t′/α + u′/β + v′/γ′)} dt′ du′ dv′ ,

where γ′ = γ/(p−1) . The ratio (T − U)/(T + U − V′) , where V′ = V/(p−1) , is positive or negative according to whether numerator and denominator have like or unlike signs. The ratio is real valued, having range (−∞, ∞).

We shall find it convenient to omit the prime (′) associated with the variables v′ and γ′ . The notation will be renewed when clarity of expression requires it. Section 4.4 of this chapter is the first in which the primes reoccur.
Now let us put w > 0 . Then

(4.4)    (T − U)/(T + U − V) ≤ w

if the ratio is negative or if either of the following sets of inequalities is satisfied:

(4.5)    T ≥ U ,   T + U > V ,   (w − 1)T + (w + 1)U ≥ wV ,

or

(4.6)    T ≤ U ,   T + U < V ,   (w − 1)T + (w + 1)U ≤ wV .
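Before any of the integrations that follow, F(w) can be approximated directly by Monte Carlo from the gamma laws of Theorem 3.4, using exactly the case analysis (4.5)-(4.6); the function name and sampling sizes are illustrative assumptions:

```python
import random

def F_monte_carlo(w, alpha, beta, gamma_p, n, p, reps=100_000, seed=0):
    """Estimate F(w) = P(W <= w) for W = (T-U)/(T+U-V'), with
    T ~ gamma(n/2, alpha), U ~ gamma(n/2, beta), V' ~ gamma((p-1)n, gamma_p)."""
    rng = random.Random(seed)
    m2 = (p - 1) * n                      # m/2 with m = 2(p-1)n
    hits = 0
    for _ in range(reps):
        t = rng.gammavariate(n / 2, alpha)
        u = rng.gammavariate(n / 2, beta)
        v = rng.gammavariate(m2, gamma_p)
        # (4.5) and (4.6), written so that negative ratios are covered as well
        if ((t - u) <= w * (t + u - v) and (t + u) > v) or \
           ((t - u) >= w * (t + u - v) and (t + u) < v):
            hits += 1
    return hits / reps
```

When α = β the distribution of W is symmetric about zero, so the estimate of F(0) sits near 1/2; with a fixed seed the estimate is also monotone in w, as a distribution function must be.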
4.2 Geometry

The above sets of restrictions define boundaries of regions in the t, u, v space over which (4.3) must be integrated to derive the probability (4.4). The above discussion, taken with late developments in Chapter III, reduces the multiplicity of the integration to 3, so that three dimensional geometric figures can be used to summarize the problem and to guide the integration processes.

Decreasing or increasing the value of w rotates the plane

(4.7)    (w − 1)t + (w + 1)u − wv = 0

about the common intersection, clockwise for decreasing w and counterclockwise for increasing w. At w = 0 , the plane coincides with t − u = 0 ; as w → ∞ it approaches coincidence with t + u − v = 0 . For 0 < w < 1 , the intersection of v = 0 and (4.7) is the line u = t(1 − w)/(1 + w) , v = 0 .

We have, for V ≤ T + U ,

(4.8)    P(W > −|w|) = P[|w|V ≤ (|w| − 1)U + (|w| + 1)T] .

This could have been obtained from (4.6) by interchanging the roles of T and U and substituting |w| for w. The relation (4.8), the illustration of Figure 4.1, and the density (4.3) each suggests that integration results for w > 0 can be modified to give results for the case w < 0 . Specifically, if we write F(w) = F(α, β, γ; w) and have w < 0 ,

    F(w) = 1 − F(β, α, γ; |w|) .
Only the positive octant of the t, u, v space concerns us. Boundary planes such as those defined by (4.5) and (4.6) extend indefinitely in the positive directions. To depict the situation, we truncate the planes

(4.9)    t = u ,   v = t + u ,   wv = (w − 1)t + (w + 1)u

in Figure 4.1, showing only the unit cube.

Figure 4.1
Boundaries for Integration Regions
(The planes t = u , t + u − v = 0 , and wv = (w − 1)t + (w + 1)u , shown within the unit cube of the t, u, v space.)

The origin and the point (1, 1, 2) satisfy equations (4.9). Hence, the corresponding planes of (4.9) intersect in a line.
Suppose 0 < w < 1 . The probability that W does not exceed w is given by

(4.10)    P(W ≤ w) = ∫₀^∞ ∫_{at}^{t} ∫₀^{bt+cu} f dv du dt + ∫₀^∞ ∫_{t}^{∞} ∫_{bt+cu}^{∞} f dv du dt

                    + ∫₀^∞ ∫_{t}^{∞} ∫₀^{t+u} f dv du dt + ∫₀^∞ ∫₀^{t} ∫_{t+u}^{∞} f dv du dt ,

where f is the frequency function determined from (4.3) by deleting the differential elements and recalling that the primes (′) are omitted temporarily. Here,

    a = (1 − w)/(1 + w) ,   b = (w − 1)/w ,   and   c = (w + 1)/w .

We shall use the first integral on the right hand side of (4.10) to illustrate other geometric concepts. We have

(4.11)    ∫₀^∞ ∫_{at}^{t} ∫₀^{bt+cu} f dv du dt ,

and transform by

(4.12)    t = αzx ,   u = βzy ,   v = γz(1 − x − y) ,

which has Jacobian αβγz² . The integrand becomes
(4.13)    K″ (xy)^{n/2−1} (1 − x − y)^{m/2−1} z^{n+m/2−1} e^{−z} .

The transformation was chosen to yield (t/α) + (u/β) + (v/γ) = z . Before the transformation, the region of integration was defined by

(4.14)    t, u, v ≥ 0 ,   at ≤ u ≤ t ,   0 ≤ v ≤ bt + cu .

Substituting from (4.12), these inequalities define the new region as

(4.15)    x, y, z ≥ 0 ,   (aα/β)x ≤ y ≤ (α/β)x ,   y ≤ 1 − x ,   y ≥ [γ − (γ + bα)x]/(γ + cβ) .

The variable z is restricted only by z ≥ 0 . Hence, we eliminate z by integration over the entire positive range. The remaining integration becomes

(4.16)    const · ∬_S (xy)^{n/2−1} (1 − x − y)^{m/2−1} dx dy ,

the region S being illustrated in Figure 4.2 by the unshaded area of the largest triangle shown.
Figure 4.2
Region S in the xy-plane
(The triangle bounded by the coordinate axes and the line y = 1 − x , cut by the lines y = (α/β)x , y = (aα/β)x , and the boundary line of (4.15).)

We note that (4.16) is proportional to

(4.17)    ∬_S x^{n/2−1} y^{n/2−1} (1 − x − y)^{m/2−1} dx dy .

If the region S contained all of the area between the coordinate axes and the line y = 1 − x , the integral (4.17) could be evaluated as a Dirichlet type integral. Hence we regard (4.17) as a truncated Dirichlet integral in the same sense that an incomplete Beta-function may be regarded as a 2-dimensional truncated Dirichlet integral. There is little hope for a tidy, easily computed solution to (4.16), but the integral can be expressed as a function of incomplete Beta-functions, as will be shown in paragraph 4.3.
If the variables T, U, and V are transformed in the same manner as (4.12),

(4.18)    (T − U)/(T + U − V) = (αx − βy)/[(α + γ)x + (β + γ)y − γ] ,

and (4.13) indicates that x and y (as random variables) are not independent. However, if we transform by

(4.19)    x = r(1 − s) ,   y = s ,

with Jacobian (1 − s) , the integrand, (4.13), becomes proportional to

    r^{n/2−1} (1 − r)^{m/2−1} s^{n/2−1} (1 − s)^{n/2+m/2−1} .

Hence, random variables corresponding to r and s are independent. The new region of integration is bounded by non-linear curves and will not be used in further developments. Also

(4.20)    (αx − βy)/[(α + γ)x + (β + γ)y − γ] = [αr(1 − s) − βs]/[(α + γ)r(1 − s) + (β + γ)s − γ]

is a result of the transformation, but will receive no further study in this paper.
4.3 Lemmas for the distribution of w

Reference is made to Figure 4.1, observations which follow equation (4.9), and to the example provided by equation (4.10). Cumulative probabilities for W = (T − U)/(T + U − V) can be expressed as functions of four types of integrals, viz.,

(4.21)    ∫₀^∞ ∫₀^∞ ∫₀^∞ ,   ∫₀^∞ ∫₀^{at} ∫₀^∞ ,   ∫₀^∞ ∫₀^∞ ∫₀^{bt+cu} ,   ∫₀^∞ ∫₀^{at} ∫₀^{bt+cu} ,

the integration being with respect to v, then u, then t. The first of these is unity since the integrand is a frequency function. The remaining three will be evaluated in sequence and will be denoted by

(4.22)    A_{n,m}(α,β,γ; a) ,   C_{n,m}(α,β,γ; b,c) ,   D_{n,m}(α,β,γ; a,b,c) .

The symbols (4.22) will be abbreviated A, C, and D when no ambiguity results; and the incomplete Beta-function,

(4.23)    ∫₀^λ x^{h−1} (1 − x)^{k−1} dx ,

will be denoted by B_λ(h, k) . The density (4.3) is repeated (without the primes) here for future reference:

(4.24)    f dt du dv = K (tu)^{n/2−1} v^{m/2−1} e^{−(t/α + u/β + v/γ)} dt du dv ,

where m = 2(p−1)n .
4.3.1 The Integral A

We have

(4.25)    A = ∫₀^∞ ∫₀^{at} ∫₀^∞ f dv du dt = K_A ∫₀^∞ ∫₀^{at} (tu)^{n/2−1} e^{−(t/α + u/β)} du dt ,

where K_A = {(αβ)^{n/2} [Γ(n/2)]²}^{−1} . Now let

(4.26)    t = αs(1 − r) ,   u = βsr ,

the Jacobian being αβs . This gives

    A = {[Γ(n/2)]²}^{−1} ∫₀^{aα/(aα+β)} r^{n/2−1} (1 − r)^{n/2−1} dr ∫₀^∞ s^{n−1} e^{−s} ds .

Hence, we have

Lemma 4.1

(4.27)    A = [Γ(n)/[Γ(n/2)]²] B_{aα/(aα+β)}(n/2, n/2) .
4.3.2 The Integral C

Expressions (4.12), (4.13) and (4.16) indicate that we may write

(4.28)    C = K_C ∬_{S′} (xy)^{n/2−1} (1 − x − y)^{m/2−1} dx dy ,

where K_C = Γ(n + m/2)/{[Γ(n/2)]² Γ(m/2)} and S′ will be defined. The integration limits for C give

    t, u, v ≥ 0 ,   0 ≤ v ≤ bt + cu ,

and substitution from (4.12) leads to

    x, y ≥ 0 ,   1 − x ≥ y ≥ γ/(γ + cβ) − [(γ + bα)/(γ + cβ)]x ,

as illustrated by Figure 4.3.

Figure 4.3
Region S′ in the x,y-plane
(The region between the line y = 1 − x and the line γ = (γ + bα)x + (γ + cβ)y .)

It is recalled that m = 2(p−1)n so that m/2 must be an integer. This fact permits the expansion of (1 − x − y)^{m/2−1} in the terminating series

(4.29)    (1 − x − y)^{m/2−1} = Σ_{i=0}^{m/2−1} binom(m/2−1, i) (−1)^i (x + y)^i

                            = Σ_{i=0}^{m/2−1} Σ_{j=0}^{i} (−1)^i binom(m/2−1, i) binom(i, j) x^{i−j} y^j .

Multiplying by K_C (xy)^{n/2−1} and integrating term by term, the integrals involved are of the form
(4.30)    ∬_{S′} x^{ℓ−1} y^{q−1} dx dy ,

where

(4.31)    ℓ = n/2 + i − j   and   q = n/2 + j .

Figure 4.3 indicates that (4.30) is the difference between a Beta-function and an integral which, except for the matter of scale, is a Beta-function. With reference to the second integral, we shall have

(4.32)    x, y ≥ 0 ,   y ≤ γ/(γ + cβ) − [(γ + bα)/(γ + cβ)]x .

We make the change of scale by letting

    r = [(γ + bα)/γ] x ,   s = [(γ + cβ)/γ] y ,

with Jacobian [γ/(γ + bα)][γ/(γ + cβ)] . Use of that transformation in (4.30) and combining the result with (4.28) and (4.29) gives

Lemma 4.2

(4.33)    C = K_C Σ_{i=0}^{m/2−1} Σ_{j=0}^{i} (−1)^i binom(m/2−1, i) binom(i, j) [Γ(ℓ)Γ(q)/Γ(ℓ+q+1)] {1 − [γ/(γ + bα)]^ℓ [γ/(γ + cβ)]^q} .
4.3.3 The Integral D

Evaluation of D follows the same development. We begin with

(4.34)    D = K_C ∬_{S″} (xy)^{n/2−1} (1 − x − y)^{m/2−1} dx dy ,

where K_C is defined following (4.28) and the region S″ is obtained from

(4.35)    t, u, v ≥ 0 ,   0 ≤ u ≤ at ,   0 ≤ v ≤ bt + cu ,

which lead to

(4.36)    x, y ≥ 0 ,   y ≤ (aα/β)x ,   γ/(γ + cβ) − [(γ + bα)/(γ + cβ)]x ≤ y ≤ 1 − x .

The region defined by (4.36) is illustrated by the unshaded region of Figure 4.4.

Figure 4.4
Region S″ in the x,y-plane
(Bounded by the lines y = (aα/β)x , y = 1 − x , and γ = (γ + bα)x + (γ + cβ)y .)
The region T, shaded with dots, and the point (λ, 0) are shown because of their usefulness in subsequent development. In fact, integration over S″ may be accomplished by subtracting the result of integrating over T from the result of integrating over S″ + T .

We have, after duplicating the step indicated by (4.29) and (4.30),

(4.37)    ∬_{S″+T} x^{ℓ−1} y^{q−1} dx dy = ∫₀^λ ∫₀^{(aα/β)x} x^{ℓ−1} y^{q−1} dy dx + ∫_λ^1 ∫₀^{1−x} x^{ℓ−1} y^{q−1} dy dx ,

where λ is the x-coordinate of the intersection of the lines y = (aα/β)x and y = 1 − x . Specifically,

(4.38)    λ = β/(β + aα) .
After a change of scale for each variable, integration over the region T is similar to that for the region S″ + T . Hence, we let

(4.39)    r = x/λ ,   s = y/μ ,

with Jacobian λμ , where μ is the corresponding scale factor for y defined by

(4.40)    μ = λ(γ + cβ)/γ .

The line y = (aα/β)x transforms into s = (aαλ/(βμ)) r , and the r-coordinate of its intersection with the line s = 1 − r is δ₁ , the last relation defining δ₁ . Then

(4.41)    ∬_T x^{ℓ−1} y^{q−1} dx dy = λ^ℓ μ^q { (aαλ/(βμ))^q δ₁^{ℓ+q} / [q(ℓ+q)] + (1/q) B_{δ₁}(ℓ, q+1) } .

The results (4.37) and (4.41), combined with steps paralleling those which lead to (4.29), give

(4.42)    D = K_C Σ_{i=0}^{m/2−1} Σ_{j=0}^{i} (−1)^i binom(m/2−1, i) binom(i, j) [∬_{S″+T} x^{ℓ−1} y^{q−1} dx dy − ∬_T x^{ℓ−1} y^{q−1} dx dy] ,

where the symbols have been defined previously.
4.4 Distribution of w

It was shown in paragraph (4.2) that the cumulative distribution of W can be expressed as a function of A, C, and D. To use these integrals it is necessary to restore the prime notation (′). We shall associate the symbol w with the random variable W and denote the latter's cumulative distribution function by F(w) .

Consider the case 0 < w < 1 . The probability P(W ≤ w) was discussed in connection with (4.10), which gave

(4.43)    P(W ≤ w) = ∫₀^∞ ∫_{at′}^{t′} ∫₀^{bt′+cu′} f dv′du′dt′ + ∫₀^∞ ∫_{t′}^{∞} ∫_{bt′+cu′}^{∞} f dv′du′dt′

                    + ∫₀^∞ ∫_{t′}^{∞} ∫₀^{t′+u′} f dv′du′dt′ + ∫₀^∞ ∫₀^{t′} ∫_{t′+u′}^{∞} f dv′du′dt′ .

Each of these integrals will be examined in turn. We have

(4.44)    ∫₀^∞ ∫_{at′}^{t′} ∫₀^{bt′+cu′} f dv′du′dt′ = ∫₀^∞ ∫₀^{t′} ∫₀^{bt′+cu′} f − ∫₀^∞ ∫₀^{at′} ∫₀^{bt′+cu′} f

                                            = D_{n,m}(α,β,γ′; 1,b,c) − D_{n,m}(α,β,γ′; a,b,c) ,

where
(4.45)    a = (1 − w)/(1 + w) ,   b = (w − 1)/w ,   c = (w + 1)/w ,

and, taking σ² = 1 ,

    α = 1 + (p−1)ρ₁ + pρ ,   β = 1 + (p−1)ρ₁ − pρ ,   γ′ = (1 − ρ₁)/(p−1) .

The second integral of (4.43) is

(4.46)    ∫₀^∞ ∫_{t′}^{∞} ∫_{bt′+cu′}^{∞} f dv′du′dt′

        = 1 − A_{n,m}(α,β,γ′; 1) − C_{n,m}(α,β,γ′; b,c) + D_{n,m}(α,β,γ′; 1,b,c) .
The next integral is

(4.47)    ∫₀^∞ ∫_{t′}^{∞} ∫₀^{t′+u′} f dv′du′dt′ = ∫₀^∞ ∫₀^∞ ∫₀^{t′+u′} f − ∫₀^∞ ∫₀^{t′} ∫₀^{t′+u′} f

                                        = C_{n,m}(α,β,γ′; 1,1) − D_{n,m}(α,β,γ′; 1,1,1) .

Finally,

(4.48)    ∫₀^∞ ∫₀^{t′} ∫_{t′+u′}^{∞} f dv′du′dt′ = ∫₀^∞ ∫₀^{t′} ∫₀^{∞} f − ∫₀^∞ ∫₀^{t′} ∫₀^{t′+u′} f

                                        = A_{n,m}(α,β,γ′; 1) − D_{n,m}(α,β,γ′; 1,1,1) .

We combine this result with (4.44), (4.46) and (4.47) to obtain
Theorem 4.1

    For 0 < w < 1 ,

(4.49)    F(w) = 1 − C_{n,m}(α,β,γ′; b,c) + C_{n,m}(α,β,γ′; 1,1) + 2D_{n,m}(α,β,γ′; 1,b,c)

                 − D_{n,m}(α,β,γ′; a,b,c) − 2D_{n,m}(α,β,γ′; 1,1,1) .

If we denote the right hand side of (4.49) by F(α,β,γ′; w) , then F(−|w|) = 1 − F(β,α,γ′; |w|) . [The definitions of a, b, and c must be interpreted with |w| in place of w when w < 0 .]

The case for w > 1 gives F(w) in the same form as (4.43) with the following exception. The first integral on the right hand side of (4.43) must be replaced by

(4.50)    ∫₀^∞ ∫₀^{t′} ∫₀^{bt′+cu′} f dv′du′dt′ = D_{n,m}(α,β,γ′; 1,b,c) .

This case gives

(4.51)    F(w) = 1 − C_{n,m}(α,β,γ′; b,c) + C_{n,m}(α,β,γ′; 1,1) + 2D_{n,m}(α,β,γ′; 1,b,c) − 2D_{n,m}(α,β,γ′; 1,1,1) .

The value of F(−w) , w ≥ 1 , is obtained by appropriate changes in the roles of α and β, as in the previous case.
4.5 Illustration

Computation by means of the functions A, C, and D will be illustrated next. Let us consider a numerical example in which n = 4 , p = 2 , and the reliability coefficient is ρ₁ = .9 . We take σ² = 1 since this introduces no loss of generality. The only remaining parameter is the correlation corrected for attenuation which, for this example, we take to be ρ/√(ρ₁ρ₂) = .5 .

These assumptions establish

(4.52)    α = 1 + (p−1)ρ₁ + pρ₁ (ρ/√(ρ₁ρ₂)) = 2.8 ,

          β = 1 + (p−1)ρ₁ − pρ₁ (ρ/√(ρ₁ρ₂)) = 1 ,

          γ = 1 − ρ₁ = 0.1 .

These values may be substituted into equation (4.27) to find

    A = 6 B_{2.8a/(1 + 2.8a)}(2, 2) .

Evaluation of the incomplete Beta-function appearing on the right hand side is accomplished by reference to [14].
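The tabled incomplete Beta-function values used here can be reproduced numerically. For integer arguments, which is the case throughout since n/2 and m/2 are integers, the regularized incomplete Beta-function has a binomial-sum form; a sketch:

```python
from math import comb

def reg_inc_beta(x, h, k):
    """Regularized incomplete Beta I_x(h, k) for integers h, k >= 1, via the
    binomial-tail identity I_x(h,k) = sum_{j=h}^{h+k-1} C(h+k-1,j) x^j (1-x)^(h+k-1-j)."""
    nn = h + k - 1
    return sum(comb(nn, j) * x**j * (1 - x) ** (nn - j) for j in range(h, nn + 1))

def integral_A(a, alpha, beta, n):
    """Lemma 4.1: A = [Gamma(n)/Gamma(n/2)^2] B_{a*alpha/(a*alpha+beta)}(n/2, n/2).
    The Gamma ratio is 1/B(n/2, n/2), so A is the regularized function at that argument."""
    x = a * alpha / (a * alpha + beta)
    return reg_inc_beta(x, n // 2, n // 2)
```

For the example above, A = I_{2.8a/(1 + 2.8a)}(2, 2), since Γ(4)/[Γ(2)]² = 6 = 1/B(2, 2).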
The functions C and D will be used to compute P(W ≥ 1) . Interest in this problem stems from the consideration of W (or, alternately, the statistic w) as an estimate of the correlation coefficient ρ/√(ρ₁ρ₂) . Of course, |ρ/√(ρ₁ρ₂)| ≤ 1 , and the probability that its "estimate" shall exceed unity should be considered.

An upper bound for P(W > 1) can be obtained in the general case through the use of C alone. We have

    P(W > 1) = P(T + U − V′ > 0 and 2U − V′ < 0) + P(T + U − V′ < 0 and 2U − V′ > 0)

             ≤ P(2U < V′) + P(T + U < V′) .

This may be written

    P(W > 1) ≤ 2 − P(V′ < 2U) − P(V′ < T + U) ,

or

    P(W > 1) ≤ 2 − C(α,β,γ′; 0,2) − C(α,β,γ′; 1,1) .
Returning to the example, we find

(4.54)    γ′ = γ/(p−1) = γ = 0.1 ,

and

(4.55)    P(W ≥ 1) = ∫₀^∞ ∫₀^{t′} ∫_{2u′}^{t′+u′} f dv′du′dt′ + ∫₀^∞ ∫_{t′}^{∞} ∫_{t′+u′}^{2u′} f dv′du′dt′ ,

as may be determined from Figure 4.1.
The right hand side of the last equation may be written

(4.56)    P(W ≥ 1) = [∫₀^∞ ∫₀^{t′} ∫₀^{t′+u′} − ∫₀^∞ ∫₀^{t′} ∫₀^{2u′}] f + [∫₀^∞ ∫₀^∞ ∫₀^{2u′} − ∫₀^∞ ∫₀^{t′} ∫₀^{2u′}] f

                     − [∫₀^∞ ∫₀^∞ ∫₀^{t′+u′} − ∫₀^∞ ∫₀^{t′} ∫₀^{t′+u′}] f

                   = 2[D(1,1,1) − D(1,0,2)] + C(0,2) − C(1,1) ,

where the subscripts n and m together with the first three arguments of each function have been deleted to simplify the notation. Figure 4.1 indicates that the region of integration for D(1,1,1) includes that for D(1,0,2) . Hence, we must have D(1,1,1) ≥ D(1,0,2) . The relative values of C(1,1) and C(0,2) cannot be determined from the figure.

It is seen from (4.56) that we shall encounter two cases: a = b = c = 1 and a = 1, b = 0, c = 2 . Table 4.1 contains pertinent information for each of the cases. Completed work sheets for computing P(W > 1) are appended to this chapter as Tables 4.2, ⋯, 4.12 . The tables are related and must be used with equations (4.33), (4.42), and (4.56) to determine the desired probability.
C(1,1) is obtained by combining items labeled 5
",.
in Table 4.4 with items labeled 3 in Table 4.3.
-e
this is done and the result multiplied by
When
64
Table 4.1
Ancillary Data
Function
aa
a=b=c= 1
a
=
1, b
= 0,
2.8
2.8
~=\
0.090909
0.047619
X + b~ = ~
y+c
2.636363
0.047619
-\
0.034482
1 .0
~
0.26316
0.26316
~1
0.797101
0.882353
T
-e
Case
~
c = 2
65
C(1,1)
= 840(0.001190) = 0.999600
•
Similarly, using items 2 in Table 4.5, we have
C(0,2)
840(0.001157)
=
=
0.971880 •
Had the interest been in finding an upper bound, we
have:
P(W> 1)
~
2 - C(0,2) - C(1,1) - .0285 (from the
paragraph which precedes (4.54) ).
The quantity δ = λ/(λ + aα) is not a function of b and c. Thus, after examining equation (4.42), we conclude that parts of D(1,1,1) and D(1,0,2) will cancel when we take the difference D(1,1,1) - D(1,0,2). Computation of the part which cancels is unnecessary but has been completed for illustrative purposes. Where values of the incomplete Beta-function were required, linear interpolation in [14] was used.
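As a quick check on this quantity, a short computation reproduces the tabled value of δ. The value λ = 1 is an assumption consistent with the tabled figures; aα = 2.8 is taken from Table 4.1.

```python
# Check that delta = lambda/(lambda + a*alpha) agrees with Table 4.1 in
# both cases (it involves neither b nor c).  lambda = 1 is an assumed
# value consistent with the tabled figures; a*alpha = 2.8 is from Table 4.1.
lam = 1.0
a_alpha = 2.8
delta = lam / (lam + a_alpha)
print(round(delta, 5))   # 0.26316, as tabled for both cases
```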
Items 5 in Table 4.7 and 3 in Table 4.10 are combined with items 3 in Table 4.3 to yield

D(1,1,1) = 840(0.000557 - 0) = 0.467880.

Similarly, using Tables 4.7 (again) and 4.12 with Table 4.3, we have

D(1,0,2) = 840(0.000557 - 0.000028) = 0.444360.

The computation of P(W > 1) is complete except for substituting our results in equation (4.56). This gives

(4.57)    P(W > 1) = 2(0.467880 - 0.444360) + 0.971880 - 0.999600 = 0.01932.
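The arithmetic of the example can be retraced in a few lines; the constants below (the multiplier 840 and the worksheet totals 0.001190, 0.001157, 0.000557, and 0.000028) are those quoted in the text.

```python
# Retrace the worked example: equation (4.56)/(4.57) and the upper bound
# from the paragraph preceding (4.54).  All constants are the worksheet
# totals quoted in the text.
C11  = 840 * 0.001190                  # C(1,1)  = 0.999600
C02  = 840 * 0.001157                  # C(0,2)  = 0.971880
D111 = 840 * (0.000557 - 0.0)          # D(1,1,1) = 0.467880
D102 = 840 * (0.000557 - 0.000028)     # D(1,0,2) = 0.444360

p_exact = 2 * (D111 - D102) + C02 - C11    # equation (4.56)
p_bound = 2 - C02 - C11                    # upper bound for P(W > 1)
print(round(p_exact, 5), round(p_bound, 5))   # 0.01932 0.02852
```

This confirms that the exact value 0.01932 of (4.57) is well inside the cruder bound 0.0285.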
Table 4.2
General Data (1)

Item 1 is 2 + i.  Item 2 is 2 + j.  Item 3 is 3 + j.  Item 4 is 4 + i.  Item 5 is (2 + j)(4 + i).

[Entries are these expressions evaluated at i, j = 0, 1, 2, 3.]
Table 4.3
General Data (2)
Table 4.4
Computation - C(1,1)

Item 5 is [Γ(t)Γ(q)/Γ(t + q)](γ/cβ)^q. [Remaining item definitions and worksheet entries illegible in this copy.]
Table 4.5
Computation - C(0,2)

[Item definitions and worksheet entries illegible in this copy.]
Table 4.7
Computation - D(1,1,1) and D(1,0,2) (2)

Item 1 is B_δ₁(2+i-j, 3+j).  Item 2 is [1/(2+j)] B_δ₁(2+i-j, 3+j).  Item 3 adds to Item 2 the term (aαλ/T)^{2+j} δ₁^{4+i} / [(2+j)(4+i)]. [Worksheet entries illegible in this copy.]
Table 4.8
Computation - D(1,1,1) (1)

Item 1: see Item 2, Table 4.4. [Remaining item definitions and worksheet entries illegible in this copy.]
Table 4.9
Computation - D(1,1,1) (2)

Item 1 is B_δ₁(t, q+1). [Remaining item definitions and worksheet entries illegible in this copy.]
Table 4.10
Computation - D(1,1,1) (3)

Item 1 is (λ/μ)^t. [Remaining item definitions and worksheet entries illegible in this copy.]
Table 4.11
Computation - D(1,0,2) (1)

[Item definitions and worksheet entries illegible in this copy.]
Table 4.12
Computation - D(1,0,2) (2)

Item 1 is B_δ₁(2+i-j, 3+j).  Item 2 is [1/(2+j)] B_δ₁(2+i-j, 3+j). [Remaining item definitions and worksheet entries illegible in this copy.]
CHAPTER V
SUMMARY AND SOME ALLIED PROBLEMS

5.1 Basic assumptions and techniques

The foregoing chapters have been based upon the general concepts of test reliability and correlations corrected for attenuation arising in the field of statistics in psychology.
Basic to the discussion has been the assumption that the observable variates are jointly distributed according to the multivariate normal distribution in which the means are zero and the covariance matrix Σ̃ has a specified pattern as well as being positive definite. The pattern of the covariance matrix is determined by the assumed structures of the variates, but the structures cannot be deduced from the covariance matrix. In order that Σ̃ be consistent with the assumed structures, defined just prior to (1.6), we must have positive reliability coefficients ρ₁ and ρ₂ together with |ρ_ξη| = |ρ/√(ρ₁ρ₂)| < 1. During much of the analysis the condition ρ₁ = ρ₂ obtained, and with it the conditions ρ₁ > 0 and |ρ/ρ₁| < 1.
Multivariate analysis techniques, e.g. [1], [3], and [15], were employed throughout. Estimates of covariance matrix elements were derived by the method of maximum likelihood, without reference to the restrictions ρ₁ > 0 and |ρ/ρ₁| < 1. It was proposed that the resulting estimates be used when it is known that the two inequalities are satisfied. Consequent anomalies were pointed out.
Canonical correlations were established by the methods of [15]. It was found that the canonical variates derived from Σ̃ (or Σ) are independent of the specific values of the parameters, these variates being determined when the sign of ρ is known. This finding gave rise to the terms quasi-canonical correlations and quasi-canonical variates. Methods of simple correlation analysis were found applicable to this quasi-canonical correlation.
The asymptotic distributions of the estimators σ̂², ρ̂₁, ρ̂, and w were derived by the methods of [3] and [4]. The exact distribution of w was found by standard techniques which reduced the sample density to the joint distribution of a set of independent statistics whose distributions are known. From that point in the development, derivation of cumulative probabilities for w became a three-dimensional calculus problem.
5.2 Summary of results

Consequences of the assumed structures are set forth in Chapter I. The covariance matrix Σ for the observable variates was established there. An orthogonal matrix A yielded transformed variates whose covariance matrix became the basis for a simple test of the hypothesis ρ₁ = ρ₂ versus ρ₁ ≠ ρ₂. The covariance matrix for the transformed variates is diagonal when ρ₁ = ρ₂.
Canonical correlations associated with two models, one in which ρ₁ ≠ ρ₂ and one in which ρ₁ = ρ₂, were derived, and the previously mentioned results are given in Chapters II and III.
The maximum likelihood estimates, denoted by σ̂², ρ̂₁, and ρ̂, are determined in Chapter III. These estimates are expressed in terms of independent χ² variates t, u, and v defined by (3.31). The asymptotic distributions of σ̂², ρ̂₁, ρ̂, and w were found to be normal with means and variances specified in paragraph 3.5. Anomalies of the estimators ρ̂₁ and w are alluded to in paragraph 5.1, preceding.
The distribution of the statistic w was derived in terms of integrals denoted by A, C, and D. Values of these functions are given as linear combinations of the complete and incomplete Beta-functions. Calculations using these results are illustrated for a sample of size 4 in which the number p of variates in each set is 2 and values for ρ₁ and ρ_ξη are specified. It is true that the cumulative distribution F(w), given by (4.49), is a function of α, β, and γ, and hence of ρ₁ and ρ_ξη. Similarly these parameters are present in the asymptotic distributions to which reference has been made in this chapter. Hence, the exactitude of these distributions is limited to cases in which one of the two parameters is known.
An upper bound for P(w ≥ 1) is given prior to (4.54) in terms of the function C. The consistency property was established for w, so that, if ρ_ξη < 1, P(w > 1) tends to zero with increasing values of n (the sample size).

5.3 Extensions of the theory
This paper sheds some light upon distributional problems connected with reliability coefficients and correlations corrected for attenuation. But the assumptions which lead to the covariance matrices Σ̃ and Σ limit the usefulness of the results to a relatively small class of problems. It may be possible, for example, to relax the conditions of equal variances and ρ₁ = ρ₂ to derive results which are applicable to a much larger class of problems.
Even more radical changes in the model may be amenable to mathematical treatment similar to the present research. Professor Hotelling has suggested the following example. Let ξ, η, e₁, ..., e_2p be defined as in paragraph 1.3, and let ξ' and η' be the standardized variables associated with ξ and η, respectively. Consider random variables X₁, ..., X_2p with the structures

X_i = a_i ξ' + e_i    (i ≤ p),
X_i = a_i η' + e_i    (i > p),

where the coefficients a_i are positive constants depending upon i. Then σ_ξ'η' = ρ_ξ'η', where standard notation is used. Let e₁, ..., e_p have a common variance σ² and e_p+1, ..., e_2p have a common variance σ₁².
The constants a_i introduce a variety of relations between elements of the covariance matrix for X₁, ..., X_2p. We simplify the notation by putting a_i = 1 and X_i = Y_i whenever i > p, and leave the remaining notation unchanged.
We have, for example,

σ²_X_i = a_i² + σ²  and  σ²_Y_i = 1 + σ₁²,

and, when i ≠ j,

σ_X_iX_j = a_i a_j,    σ_Y_iY_j = 1,    σ_X_iY_j = a_i ρ_ξ'η'.

Among other relations which can be established, when p ≥ 3, i ≠ j ≠ k, and ℓ ≠ m ≠ q, we have

ρ_X_iX_j ρ_X_iX_k / ρ_X_jX_k = a_i²/(a_i² + σ²) = ρ_X_iX_j ρ_X_iY_k / ρ_X_jY_k.
Thus, for example,

ρ_X_iX_j ρ_X_iX_k / ρ_X_jX_k ≠ ρ_Y_ℓY_m ρ_Y_ℓY_q / ρ_Y_mY_q,

again with p ≥ 3, i ≠ j ≠ k, and ℓ ≠ m ≠ q. Even when p = 3, it is apparent that ρ_ξ'η' may be expressed in several forms, each involving elements of the correlation matrix. Further consequences of the structures have not been investigated. But the model, which may be suitable for some practical problems, is seen to be much more complicated than those to which Chapters I through IV refer. Models in those chapters may be regarded as special cases of the above.
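The product relation above is easy to verify numerically. In the sketch below the constants a_i and σ² are hypothetical illustration values, and the variances and covariances follow the structures as stated (σ²_X_i = a_i² + σ², σ_X_iX_j = a_i a_j).

```python
# Numerical check of the relation
#   rho_{X_i X_j} * rho_{X_i X_k} / rho_{X_j X_k} = a_i^2 / (a_i^2 + sigma^2)
# under the suggested structures.  The constants a_i and sigma^2 are
# hypothetical illustration values.
a = [1.3, 0.8, 2.1]      # a_1, a_2, a_3  (so p = 3)
sigma2 = 0.5             # common error variance sigma^2

def corr(i, j):
    # correlation of X_i and X_j for i != j:
    # covariance a_i a_j, variances a_i^2 + sigma^2 and a_j^2 + sigma^2.
    return (a[i] * a[j]) / ((a[i] ** 2 + sigma2) * (a[j] ** 2 + sigma2)) ** 0.5

lhs = corr(0, 1) * corr(0, 2) / corr(1, 2)
rhs = a[0] ** 2 / (a[0] ** 2 + sigma2)
print(abs(lhs - rhs) < 1e-12)   # True
```

The cancellation of a_j, a_k, and the companion variances in the quotient is what makes the left side depend on i alone.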
5.4 Additional problems

In addition to unsolved problems mentioned in the introduction and at the end of Chapter III, there are others directly connected with the present study. Consider, for example, the upper bound for P(w > 1) as given in paragraph 4.5. Computation of this bound is straightforward but lengthy if n is large. It may be possible to sharpen this result or to establish a serviceable upper bound which is more easily computed. Professor W. Hoeffding has suggested an approach to the latter alternative. We can take the Tchebycheff inequality given by [9, page 42] and apply it to the inequality which precedes (4.54).
This gives

P(W > 1) ≤ P(2U - V' < 0) + P(T + U - V' < 0)
         ≤ E{e^(-λ(2U - V'))} + E{e^(-μ(T + U - V'))}

for all λ, μ > 0. Selection of values for λ and μ can be based upon minimizing the indicated expectations.
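A minimal sketch of this minimization, bounding the first term P(2U - V' < 0), follows. It assumes for illustration that U and V' are independent chi-square variates; the degrees of freedom used (2 and 2) are hypothetical and do not come from the text.

```python
# Minimize E{exp(-L(2U - V'))} over L, using the chi-square moment
# generating function E{exp(sX)} = (1 - 2s)^(-k/2) for s < 1/2.  The
# degrees of freedom k_u = k_v = 2 are hypothetical illustration values.
def mgf_chisq(s, k):
    return (1.0 - 2.0 * s) ** (-k / 2.0)

def chernoff_bound(L, k_u=2, k_v=2):
    # E{exp(-2LU)} * E{exp(LV')}, valid for 0 < L < 1/2
    return mgf_chisq(-2.0 * L, k_u) * mgf_chisq(L, k_v)

grid = [i / 1000.0 for i in range(1, 500)]      # admissible values of L
L_best = min(grid, key=chernoff_bound)
print(L_best, round(chernoff_bound(L_best), 4))   # 0.125 0.8889
```

The second expectation, E{e^(-μ(T + U - V'))}, can be handled in exactly the same way.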
In connection with the observation that both the exact and the asymptotic distributions of w contain nuisance parameters, no general methods have been found for testing hypotheses concerning ρ_ξη and determining confidence intervals when the reliability coefficients are unknown. Such methods would be useful. Professor S. N. Roy suggests the possibility that, in view of the findings of paragraph 3.6 concerning canonical correlations, the function ρ* may be a serviceable substitute for ρ_ξη; ρ* is given by (2.18), and its value is approximately ρ/√(ρ₁ρ₂) when the reliability coefficients have values close to unity.
Another advantage in replacing the usual definition of correlation corrected for attenuation would be that, under the very reasonable assumption that the population dispersion matrix Σ̃ of the 2p variates must be positive definite, we would have -1 < ρ* < 1. This means that we would not encounter the possibility of this correlation being greater than unity in absolute value, which plagues the usual definition in terms of ρ/√(ρ₁ρ₂); and with the usual definition it is hard to suggest any such "natural" way to get around the difficulty. The function ρ* is the correlation between sums of the first and second sets of p variates each, and well-known distributions can be used in connection with ρ* for the usual statistical purposes. Suppose, for example, that two equivalent forms are available for each of two tests, that the reliability coefficients are unknown, and that otherwise the present results apply. Possibly, the purposes of correction for attenuation can be accomplished through the use of correlations between test totals.
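A brief sketch makes the closing remark concrete. With unit variances, intra-set correlations ρ₁ and ρ₂, and a common inter-set correlation ρ, the correlation ρ* between the two test totals has a closed form; the particular parameter values below are hypothetical.

```python
# rho_star: correlation between the sums of the first and second sets of
# p variates, computed from a patterned covariance matrix with unit
# variances, intra-set correlations rho1 and rho2, and inter-set
# correlation rho.  The parameter values in the examples are hypothetical.
def rho_star(rho, rho1, rho2, p=2):
    num = p * p * rho                                   # covariance of the totals
    den = ((p + p * (p - 1) * rho1) ** 0.5
           * (p + p * (p - 1) * rho2) ** 0.5)           # s.d.'s of the totals
    return num / den

def corrected(rho, rho1, rho2):
    # the usual correlation corrected for attenuation, rho/sqrt(rho1*rho2)
    return rho / (rho1 * rho2) ** 0.5

# With reliabilities near unity the two quantities nearly agree:
print(rho_star(0.60, 0.95, 0.95), corrected(0.60, 0.95, 0.95))
# rho_star stays below 1 even where the usual definition exceeds it:
print(rho_star(0.80, 0.70, 0.70), corrected(0.80, 0.70, 0.70))
```

The second pair of values illustrates the point of the paragraph above: the attenuation-corrected value exceeds unity while ρ* does not.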
BIBLIOGRAPHY

[1] Anderson, T. W. An Introduction to Multivariate Statistical Analysis. New York: John Wiley and Sons, Inc., 1958.

[2] Browne, E. T. Introduction to the Theory of Determinants and Matrices. Chapel Hill: University of North Carolina Press, 1958.

[3] Cramér, Harald. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1956.

[4] Hoeffding, Wassily, and Herbert Robbins. "The Central Limit Theorem for Dependent Random Variables," Duke Mathematical Journal, Vol. 15 (1948), 773-780.

[5] Hotelling, Harold. "Relations between Two Sets of Variates," Biometrika, Vol. XXVIII (1936), 321-377.

[6] --------. "New Light on the Correlation Coefficient and its Transforms," Journal of the Royal Statistical Society, Series B, Vol. XV (1953), 193-232.

[7] Johnson, P. O. Statistical Methods in Research. New York: Prentice-Hall, Inc., 1949.

[8] Kelley, T. L. Fundamentals of Statistics. Cambridge: Harvard University Press, 1947.

[9] Kolmogorov, A. N. Foundations of the Theory of Probability. New York: Chelsea Publishing Company, 1956.

[10] Kuder, G. F., and M. W. Richardson. "The Theory of the Estimation of Test Reliability," Psychometrika, Vol. 2 (1937).

[11] Loevinger, Jane. "A Systematic Approach to the Construction and Evaluation of Tests of Ability," Psychological Monographs, Vol. 64 (1947).

[12] McNemar, Quinn. Psychological Statistics. New York: John Wiley and Sons, Inc., 1955.

[13]

[14]

[15]

[16] --------, and A. E. Sarhan. "On Inverting a Class of Patterned Matrices," Biometrika, Vol. 43 (1956), 227-231.

[17] Spearman, C. "The Proof and Measurement of Association between Two Things," American Journal of Psychology, Vol. 15 (1904), 72-101.

[18] --------. "Correlation Calculated from Faulty Data," British Journal of Psychology, Vol. 3 (1909), 271-295.