Heck, D. L. (1958). "Some uses of the distribution of the largest root in multivariate analysis." (Air Research and Development Command)

SOME USES OF THE DISTRIBUTION OF THE LARGEST
ROOT IN MULTIVARIATE ANALYSIS
by
D. L. Heck
University of North Carolina
This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of the United States Government.
"
..
,
.'
,,;
~.;
Institute of Statistics
Mimeograph Series No. 194
March, 1958
SOME USES OF THE DISTRIBUTION OF THE LARGEST
ROOT IN MULTIVARIATE ANALYSIS*
by D. L. Heck
Summary. There has been only limited application of the distribution of characteristic roots for testing purposes and the construction of confidence intervals in multivariate analysis, largely because a comprehensive set of tables and easy instructions have not been available. The present report is a partial attempt to fill this gap.
In this paper, testing procedures for three tests in multivariate analysis are given, and for two of the test situations which require the distribution of the largest characteristic root, numerical examples are worked out. Charts of the upper 1%, 2.5%, and 5% points of this distribution for the degrees of freedom s = 2(1)5, m = -1/2, 0(1)10, and n <= 1000 are included, as well as a procedure for obtaining the points for n > 1000. The method used in computing the percentage points is also described.
Multivariate confidence bounds which may be set up using these percentage points are not considered at this time, but for papers on this subject, the interested reader may refer to [3, 21, 22, 23, 24, 25, 26, 27].
* This research was supported by the United States Air Force through the Air Force Office of Scientific Research of the Air Research and Development Command, under Contract No. AF 18(600)-83. Reproduction in whole or in part is permitted for any purpose of the United States Government.
1. Introduction and notation.
In multivariate normal analysis, certain characteristic roots of matrices of sample quantities, or a simple function of one of these roots, provide statistics for testing three important hypotheses, as well as for the construction of confidence bounds on parametric functions which might be regarded as departures from the null hypothesis. The three hypotheses which may be tested are:

(a) Equality of the covariance (dispersion) matrices in two p-variate normal populations.

(b) Independence between a set of p correlated variates and a set of q correlated variates in a (p+q)-variate normal population, i.e., the problem of canonical correlation.

(c) The general multivariate linear hypothesis, assuming multivariate normal populations. Multivariate analysis of variance and covariance are special cases of this.

In each case the joint distribution of the s non-zero characteristic roots involved is of the form

\[
(1.1)\qquad f(Q_1,\ldots,Q_s) = C(s,m,n)\prod_{i=1}^{s} Q_i^{\,m}(1-Q_i)^{\,n}\prod_{i<j}(Q_j-Q_i),
\qquad 0 < Q_1 \le \cdots \le Q_s < 1,
\]

where C(s,m,n) is a normalizing constant, and the parameters (or degrees of freedom) s, m, and n will be given for each of the three cases in the discussion below.
In order to test (b) and (c), or put confidence bounds on associated parametric functions, we require the upper percentage points of the distribution of the largest characteristic root (Q_s) in (1.1), and to test (a), we require the joint distribution of the largest and smallest roots (Q_s and Q_1). Inasmuch as only the percentage points of the distribution of the largest root have been considered in this paper, our primary interest will be in cases (b) and (c). Case (a), after a statement of the testing procedure involved, will not be discussed further; for testing purposes, but not for confidence bounds (see Chapter III of [31]), a likelihood-ratio test is suggested [30].
Notation

x(p x 1)       column vector (x_1, x_2, ..., x_p)'
x'             row vector (x_1, x_2, ..., x_p)
A(p x k)       matrix with p rows and k columns
A'             transpose of A
A^{-1}         inverse of A
ch(A)          characteristic roots of A
R(A)           rank of A
I(p)           (p x p) identity matrix
p.d.           positive definite
a.e.           almost everywhere
N[mu, Sigma]   p-variate normal distribution with mean vector mu and covariance matrix Sigma
P[ ]           probability of [ ]
x_a(s,m,n)     upper 100a% point of the distribution of the largest characteristic root corresponding to the degrees of freedom s, m, and n; sometimes written x_a, for brevity
E( )           expected value of ( )
d.f.           degrees of freedom
c.d.f.         cumulative distribution function
2. Testing the equality of two covariance matrices.

We are given two random samples, one of size N_1 from a p-variate N[mu_1, Sigma_1], the other of size N_2 from N[mu_2, Sigma_2], where mu_1 and mu_2 are (p x 1) vectors of (unknown) parameters and Sigma_1 and Sigma_2 are the (unknown) covariance matrices of the two populations. Also, p <= N_1, N_2.

Denote the respective (p x p) sample covariance matrices, based on n_1 = N_1 - 1 and n_2 = N_2 - 1 d.f., by S_1 and S_2, and the p roots of the matrix S_1 S_2^{-1}, which are a.e. greater than zero, by c_1 <= c_2 <= ... <= c_p. Then the test of equality of the two covariance matrices, i.e., H_0: Sigma_1 = Sigma_2 against H_1: Sigma_1 != Sigma_2, is as follows: accept H_0 at the a level of significance if c_p/(1+c_p) = Q_s <= x_{aU} and c_1/(1+c_1) = Q_1 >= x_{aL}, and reject otherwise, where x_{aU} and x_{aL} are determined from the joint distribution of the largest and smallest roots such that

\[
P[\,x_{aL} \le Q_1 \le Q_s \le x_{aU}\,] = 1 - a.
\]

The values of the degrees of freedom s, m, and n in (1.1) are s = p, m = (n_1 - p - 1)/2, and n = (n_2 - p - 1)/2.
3. Testing the independence between a set of p correlated variates and a set of q correlated variates.

We are given a random sample of N individuals from a population, with p+q (< N) characters measured on each individual. Each (p+q)-variate observation vector is assumed N[mu, Sigma], where mu is a (p+q x 1) vector of (unknown) parameters, and Sigma is a symmetric (p+q x p+q) p.d. (unknown) covariance matrix with the structure

\[
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12}\\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}
\begin{matrix} p\\ q \end{matrix}
\]

Denote the sample covariance matrix, based on N-1 d.f., by

\[
S = \begin{pmatrix} S_{11} & S_{12}\\ S_{12}' & S_{22} \end{pmatrix},
\]

and the largest root of the matrix S_12 S_22^{-1} S_12' S_11^{-1} by c. Then the test of independence between the p-set of variates and the q-set, i.e., H_0: Sigma_12 = 0 against H_1: Sigma_12 != 0, is as follows: accept H_0 at the a level of significance if c = Q_s <= x_a, and reject otherwise, where x_a is determined from the distribution of the largest characteristic root such that

\[
P[\,Q_s \le x_a\,] = 1 - a.
\]

If 2 <= min(p,q) <= 5, then x_a, for a given a, may be obtained by entering the charts in section 7 with the following degrees of freedom:

s = min(p,q),    m = (|p-q| - 1)/2,    and    n = (N-p-q-2)/2.

If min(p,q) = 1, the test is equivalent to the test for rho^2 = 0, where rho is the multiple correlation of the p set on the q set; for it will be noted that the characteristic roots of S_12 S_22^{-1} S_12' S_11^{-1} are the squares of the canonical correlations, and in the above case we are testing the significance of the largest canonical correlation. A numerical example is given in section 6 to illustrate the procedure just outlined.
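The computation of the test statistic of this section is direct; the sketch below, again assuming numpy (the names are ours), forms the largest root c and the degrees of freedom for entering the charts.

```python
import numpy as np

def independence_test_stat(S, p, q):
    """Largest root c of S12 S22^{-1} S12' S11^{-1}, i.e. the largest
    squared canonical correlation between the p-set and the q-set."""
    S11, S12, S22 = S[:p, :p], S[:p, p:p+q], S[p:p+q, p:p+q]
    A = S12 @ np.linalg.inv(S22) @ S12.T @ np.linalg.inv(S11)
    return np.linalg.eigvals(A).real.max()

def chart_dof(p, q, N):
    """Degrees of freedom (s, m, n) for entering the charts of section 7."""
    return min(p, q), (abs(p - q) - 1) / 2, (N - p - q - 2) / 2
```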
4. The general multivariate linear hypothesis.

To study generalizations of the analysis of variance and covariance to the p-variate case, it is useful to proceed from the regression model (general linear hypothesis). The test statistic in the univariate case, the F-statistic consisting of the ratio of applicable mean squares, is replaced in multivariate analysis by the largest characteristic root of the "ratio" (the product of one matrix and the inverse of the second) of two matrices whose elements are the corresponding sums of squares and sums of cross-products for the p variates, corrected for the means. In addition, the presence of a matrix M as a postfactor in the multivariate hypothesis permits testing of hypotheses concerning certain linear relationships between the p variates themselves.
We are given N independent (p x 1) observation vectors x_1, x_2, ..., x_N (p < N), where each consists of the measurements on p characters and is assumed N[E(x_i), Sigma]. Here E(x_i') = a_i xi, where xi is an (m x p) matrix of (unknown) parameters and a_i is a (1 x m) row vector specified by the design; generally E(X') = A xi, where X'(N x p) is the matrix containing all the observations. Sigma is a symmetric (p x p) p.d. (unknown) matrix, fixed for a given design. If A(N x m) is of rank r < m, we can partition it into A = [A_1 A_2], where A_1(N x r) is an arbitrary basis of A. The hypothesis to be tested is H_0: C xi M = 0 against H_1: C xi M != 0 (say), where C(g x m) is of rank g <= r and M(p x u) is of rank u <= p. C can be partitioned into C = [C_1 C_2], where C_1(g x r) and C_2(g x m-r) must be chosen so that C xi = C_1 xi_1 + C_2 xi_2, with xi_1 and xi_2 representing the partitioning induced in xi by A xi = A_1 xi_1 + A_2 xi_2.

Consider now the two matrices

\[
H = M'X A_1(A_1'A_1)^{-1}C_1'\left[C_1(A_1'A_1)^{-1}C_1'\right]^{-1}C_1(A_1'A_1)^{-1}A_1'X'M
\]

and

\[
E = M'X\left[I(N) - A_1(A_1'A_1)^{-1}A_1'\right]X'M,
\]

where H is the matrix of sums of squares and cross-products due to the hypothesis, and E is the matrix of sums of squares and cross-products due to error.

Now let the largest characteristic root of the (u x u) matrix HE^{-1} be denoted by c_s, where s = min[R(C), R(M)]. Then if C obeys the testability condition, i.e., if C_2 = C_1(A_1'A_1)^{-1}A_1'A_2, the test of H_0 is as follows: accept H_0 at the a level if c_s/(1+c_s) = Q_s <= x_a, and reject otherwise, where x_a is determined from the distribution of the largest root such that

\[
P[\,Q_s \le x_a\,] = 1 - a.
\]

If 2 <= s <= 5, x_a for a given a may be obtained by entering the charts in section 7 with the following degrees of freedom:

s = min[R(C), R(M)] = min(g, u),
m = (|R(C) - R(M)| - 1)/2 = (|g - u| - 1)/2,    and
n = (N - R(A) - R(M) - 1)/2 = (N - r - u - 1)/2.

If s = 1, then m and n should be increased by unity and the Tables of the Incomplete Beta-Function [14] should be used.
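A sketch of the full-rank case (A_1 = A and C_1 = C, so that the testability condition does not arise, as in the one-way layout of section 5(a) below) may make the matrix formulae concrete; it assumes numpy, and the function name is ours. The argument Xp is the N x p data matrix, i.e., the X' of the text.

```python
import numpy as np

def roy_largest_root_test(Xp, A, C, M):
    """H, E, and Q_s = c_s/(1+c_s) for H0: C xi M = 0, assuming A of
    full rank.  Xp is N x p (the X' of the text), A is N x r,
    C is g x r, M is p x u."""
    G = np.linalg.inv(A.T @ A)                          # (A'A)^{-1}
    P = A @ G @ C.T @ np.linalg.inv(C @ G @ C.T) @ C @ G @ A.T
    H = M.T @ Xp.T @ P @ Xp @ M                         # hypothesis s.s.c.p.
    R = np.eye(Xp.shape[0]) - A @ G @ A.T               # residual projector
    E = M.T @ Xp.T @ R @ Xp @ M                         # error s.s.c.p.
    c = np.linalg.eigvals(H @ np.linalg.inv(E)).real.max()
    return H, E, c / (1 + c)
```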
5. Examples of the general multivariate linear hypothesis.

(a) Multivariate analysis of variance with one-way classification.

Consider k random samples of n_i (i = 1,2,...,k) individuals each, from k different groups, with p characters measured on each individual. The (p x 1) observation vectors are assumed N[xi_i, Sigma], where xi_i is a (p x 1) vector of (unknown) parameters and Sigma is a symmetric (p x p) p.d. (unknown) covariance matrix. It is desired to test the hypothesis xi_1 = xi_2 = ... = xi_k, i.e., equality of the mean vectors among the k groups. Denoting by x_{ij}^{(v)} the v-th variate (v = 1,2,...,p) of the j-th individual (j = 1,2,...,n_i) from the i-th group, and by xi_i^{(v)} the mean of the v-th variate of the i-th group, we have the model E(X') = A xi:
\[
\mathcal{E}\begin{pmatrix}
x_{11}^{(1)} & x_{11}^{(2)} & \cdots & x_{11}^{(p)}\\
\vdots & \vdots & & \vdots\\
x_{1n_1}^{(1)} & x_{1n_1}^{(2)} & \cdots & x_{1n_1}^{(p)}\\
x_{21}^{(1)} & x_{21}^{(2)} & \cdots & x_{21}^{(p)}\\
\vdots & \vdots & & \vdots\\
x_{2n_2}^{(1)} & x_{2n_2}^{(2)} & \cdots & x_{2n_2}^{(p)}\\
\vdots & \vdots & & \vdots\\
x_{kn_k}^{(1)} & x_{kn_k}^{(2)} & \cdots & x_{kn_k}^{(p)}
\end{pmatrix}_{(N\times p)}
=
\begin{pmatrix}
1 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}_{(N\times k)}
\begin{pmatrix}
\xi_1^{(1)} & \xi_1^{(2)} & \cdots & \xi_1^{(p)}\\
\xi_2^{(1)} & \xi_2^{(2)} & \cdots & \xi_2^{(p)}\\
\vdots & \vdots & & \vdots\\
\xi_k^{(1)} & \xi_k^{(2)} & \cdots & \xi_k^{(p)}
\end{pmatrix}_{(k\times p)},
\]

where the i-th block of the design matrix consists of n_i identical rows with a 1 in the i-th column, N = sum_{i=1}^{k} n_i, and the hypothesis C xi M = 0, or xi_1 = xi_2 = ... = xi_k, is
\[
\begin{pmatrix}
1 & -1 & 0 & \cdots & 0\\
1 & 0 & -1 & \cdots & 0\\
\vdots & & & & \vdots\\
1 & 0 & 0 & \cdots & -1
\end{pmatrix}_{(k-1\times k)}
\begin{pmatrix}
\xi_1^{(1)} & \xi_1^{(2)} & \cdots & \xi_1^{(p)}\\
\xi_2^{(1)} & \xi_2^{(2)} & \cdots & \xi_2^{(p)}\\
\vdots & \vdots & & \vdots\\
\xi_k^{(1)} & \xi_k^{(2)} & \cdots & \xi_k^{(p)}
\end{pmatrix}_{(k\times p)}
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}_{(p\times p)}
= 0.
\]

It is seen that R(A) = k, R(C) = k-1, and R(M) = p, and since A is of rank k, the partitioning of A into A_1 and A_2 does not occur, and A replaces A_1 in the matrix formulae for H and E above.
To make the test, we first calculate H and E, where H turns out to be the (p x p) matrix of between groups sums of squares and cross-products of the p variates, and E is the (p x p) matrix whose elements are the within groups sums of squares and cross-products. We then calculate the largest root of the matrix HE^{-1}, c_s (say), where s = min(k-1, p), and if 2 <= s <= 5, we may obtain x_a for the desired a by entering the charts in section 7 with the following degrees of freedom:

s = min(k-1, p),    m = (|k-p-1| - 1)/2,    and    n = (N-k-p-1)/2.

If c_s/(1+c_s) = Q_s <= x_a, we accept the hypothesis of equality of mean vectors among the k groups, and reject it otherwise. If s = 1, then the statement at the end of section 4 applies. A numerical example is given in section 6 to illustrate the above testing procedure.
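In the one-way case the matrix formulae reduce to the familiar between and within groups sums of squares and cross-products, and the test may be sketched as follows (numpy assumed; the function name is ours).

```python
import numpy as np

def manova_one_way(groups):
    """Between (H) and within (E) s.s.c.p. matrices for k groups of
    p-variate observations; groups is a list of (n_i x p) arrays."""
    grand = np.vstack(groups).mean(axis=0)          # grand mean vector
    H = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
            for g in groups)                        # between groups
    E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0))
            for g in groups)                        # within groups
    c = np.linalg.eigvals(H @ np.linalg.inv(E)).real.max()
    return H, E, c / (1 + c)                        # Q_s = c_s/(1+c_s)

# Compare Q_s with x_a(s, m, n), where s = min(k-1, p),
# m = (|k-p-1| - 1)/2, and n = (N-k-p-1)/2.
```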
(b) Multivariate analysis of variance with two-way classification: the randomized block design.
Consider a single replication of a randomized complete block experiment with r_1 blocks and r_2 treatments. On the yield of each of the r_1 r_2 plots, p characters are measured, and we express each observation as a (p x 1) vector, x_ij (say) (i = 1,2,...,r_1; j = 1,2,...,r_2), where the x_ij's are assumed independent N[xi + beta_i + eta_j, Sigma]. Here xi is a (p x 1) vector of (unknown) parameters due to an overall effect, beta_i is a (p x 1) vector of (unknown) block effects due to the i-th block, eta_j is a (p x 1) vector of (unknown) treatment effects due to the j-th treatment, and Sigma is a symmetric (p x p) p.d. (unknown) covariance matrix. It is desired to test the hypothesis eta_1 = eta_2 = ... = eta_{r_2}, i.e., that there is no difference between the treatment vectors. We denote by x_{ij}^{(f)} the f-th variate (f = 1,2,...,p) of the observation from the plot in the i-th block to which the j-th treatment has been applied. Also, denoting by xi^{(f)} the overall effect associated with the f-th variate, by beta_i^{(f)} the effect of the i-th block on the f-th variate, and by eta_j^{(f)} the effect of the j-th treatment on the f-th variate, we have the model E(X') = A xi:
\[
\mathcal{E}\begin{pmatrix}
x_{11}^{(1)} & \cdots & x_{11}^{(p)}\\
\vdots & & \vdots\\
x_{1r_2}^{(1)} & \cdots & x_{1r_2}^{(p)}\\
x_{21}^{(1)} & \cdots & x_{21}^{(p)}\\
\vdots & & \vdots\\
x_{r_1 r_2}^{(1)} & \cdots & x_{r_1 r_2}^{(p)}
\end{pmatrix}_{(r_1 r_2\times p)}
=
A
\begin{pmatrix}
\xi^{(1)} & \cdots & \xi^{(p)}\\
\beta_1^{(1)} & \cdots & \beta_1^{(p)}\\
\vdots & & \vdots\\
\beta_{r_1}^{(1)} & \cdots & \beta_{r_1}^{(p)}\\
\eta_1^{(1)} & \cdots & \eta_1^{(p)}\\
\vdots & & \vdots\\
\eta_{r_2}^{(1)} & \cdots & \eta_{r_2}^{(p)}
\end{pmatrix}_{(r_1+r_2+1\times p)},
\]

where the (r_1 r_2 x r_1+r_2+1) design matrix A has, in the row corresponding to the plot in the i-th block receiving the j-th treatment, a 1 in the first column, a 1 in the (i+1)-th column, a 1 in the (r_1+1+j)-th column, and zeros elsewhere.
The hypothesis C xi M = 0, or eta_1 = eta_2 = ... = eta_{r_2}, is

\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & -1\\
0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 & -1\\
\vdots & & & & & & & & \vdots\\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 & -1
\end{pmatrix}_{(r_2-1\times r_1+r_2+1)}
\begin{pmatrix}
\xi^{(1)} & \cdots & \xi^{(p)}\\
\beta_1^{(1)} & \cdots & \beta_1^{(p)}\\
\vdots & & \vdots\\
\eta_{r_2}^{(1)} & \cdots & \eta_{r_2}^{(p)}
\end{pmatrix}
\begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & 1
\end{pmatrix}_{(p\times p)}
= 0,
\]

where the first r_1+1 columns of C are zero. In this case we have R(A) = r_1+r_2-1, R(C) = r_2-1, and R(M) = p. For the arbitrary basis A_1 of A, we may choose for A_2 the first column and, in addition, some other column of A. C_1 is the matrix which remains after selecting for C_2 those columns of C corresponding to the columns of A chosen for A_2. The testability condition C_2 = C_1(A_1'A_1)^{-1}A_1'A_2 is satisfied.
To make the test, we calculate H and E, where H is the (p x p) matrix whose elements are the between treatments sums of squares and cross-products for the p variates, and E is the (p x p) matrix whose elements are the error or residual sums of squares and cross-products. The largest root of the matrix HE^{-1}, c_s, where s = min(r_2-1, p), is then calculated, and if 2 <= s <= 5, we may obtain x_a for the desired a by entering the charts in section 7 with the following degrees of freedom:

s = min(r_2-1, p),    m = (|r_2-p-1| - 1)/2,    and    n = [(r_1-1)(r_2-1) - p - 1]/2.

We then accept the hypothesis of equality of treatment mean vectors if c_s/(1+c_s) = Q_s <= x_a, and reject otherwise. If s = 1, the statement at the end of section 4 applies.
If we are interested in equality of block mean vectors, this can be tested in a similar manner, with H now being the matrix of between blocks sums of squares and cross-products, E remaining the same as in the above test for equality of treatment mean vectors. As in univariate analysis, in order to test for significant block by treatment interaction, more than one replication of the experiment must be available. If this is the case, then H will consist of the interaction sums of squares and cross-products, and E will consist of the error or residual sums of squares and cross-products. In general, the correspondence between univariate and multivariate analysis of variance, no matter what the experimental design [e.g., Latin squares, balanced incomplete blocks, etc.], is as follows: the mean square occurring in the numerator of the F-statistic in univariate analysis of variance is replaced by the matrix H consisting of the corresponding corrected sums of squares and cross-products of the p variates. The residual or error mean square occurring in the denominator of the F-statistic is replaced by the matrix E whose elements are the residual or error sums of squares and cross-products of the p variates. The test statistic in the multivariate case is then the largest root of the matrix HE^{-1}, and the test is carried out as stated above.
(c) Profile analysis.

The following problem, suggested by R. E. Bargmann, provides an application of the general multivariate linear hypothesis in which the postfactor hypothesis matrix M is other than the identity matrix, as was the case in (a) and (b) above. We are given samples (of size n_i, i = 1, 2, ..., k) of observations from k different racial groups, where each observation consists of p body measurements. There will undoubtedly be significant differences in the means of each measure, but we may ask whether the "group profiles" of all measurements are similar. The model in this case is the same as that in (a) above, and writing x_ij' as the j-th (j = 1,2,...,n_i) (1 x p) observation row vector from the i-th group, and xi_i' as the (1 x p) mean row vector of the i-th group, we have for E(X') = A xi:
\[
\mathcal{E}\begin{pmatrix}
x_{11}'\\
\vdots\\
x_{1n_1}'\\
\vdots\\
x_{kn_k}'
\end{pmatrix}_{(N\times p)}
=
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}_{(N\times k)}
\begin{pmatrix}
\xi_1'\\
\xi_2'\\
\vdots\\
\xi_k'
\end{pmatrix}_{(k\times p)},
\]

where N = sum_{i=1}^{k} n_i. The hypothesis of equal profiles states:

\[
\xi_{i_1}^{(f)} - \xi_{i_2}^{(f)} = d_{i_1 i_2}\qquad\text{for all } f = 1,2,\ldots,p
\]

and all k(k-1)/2 pairs of the k groups.
Figure 1 indicates what is meant by the similarity of profiles hypothesis. The differences d_12, d_13, etc. can be different.

FIGURE 1. Example of similarity of profiles. [Chart: response (e.g., inches) plotted against variate no. (1), (2), ..., (p-1), (p), with one parallel profile line for each of Group 1, Group 2, ..., Group k.]
We may state this hypothesis as C xi M = 0:

\[
\begin{pmatrix}
1 & -1 & 0 & \cdots & 0\\
1 & 0 & -1 & \cdots & 0\\
\vdots & & & & \vdots\\
1 & 0 & 0 & \cdots & -1
\end{pmatrix}_{(k-1\times k)}
\begin{pmatrix}
\xi_1^{(1)} & \xi_1^{(2)} & \cdots & \xi_1^{(p)}\\
\xi_2^{(1)} & \xi_2^{(2)} & \cdots & \xi_2^{(p)}\\
\vdots & \vdots & & \vdots\\
\xi_k^{(1)} & \xi_k^{(2)} & \cdots & \xi_k^{(p)}
\end{pmatrix}_{(k\times p)}
\begin{pmatrix}
1 & 1 & \cdots & 1\\
-1 & 0 & \cdots & 0\\
0 & -1 & \cdots & 0\\
\vdots & \vdots & & \vdots\\
0 & 0 & \cdots & -1
\end{pmatrix}_{(p\times p-1)}
= 0,
\]

and it is seen that R(A) = k, R(C) = k-1, and R(M) = p-1.
Denoting by x_{ij}^{(v)} the observation on the v-th variate of the j-th subject from the i-th group (i = 1,2,...,k; j = 1,2,...,n_i; v = 1,2,...,p), we form new observations y_{ij}^{(f)} by subtracting each observation on the f-th variate from the corresponding observation on the 1st variate, i.e.,

\[
y_{ij}^{(f)} = x_{ij}^{(1)} - x_{ij}^{(f)}\qquad (f = 2,3,\ldots,p).
\]
The elements of the (p-1 x p-1) matrix H are given by

\[
h_{fg} = \sum_{i=1}^{k} n_i\,(\bar{y}_i^{(f)} - \bar{y}^{(f)})(\bar{y}_i^{(g)} - \bar{y}^{(g)}),
\qquad f, g = 2, 3, \ldots, p,
\]

where

\[
\bar{y}_i^{(f)} = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}^{(f)},
\qquad
\bar{y}^{(f)} = \frac{1}{N}\sum_{i=1}^{k} n_i\,\bar{y}_i^{(f)},
\qquad
N = \sum_{i=1}^{k} n_i,
\]

and the elements of the (p-1 x p-1) matrix E are given by

\[
e_{fg} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij}^{(f)} - \bar{y}_i^{(f)})(y_{ij}^{(g)} - \bar{y}_i^{(g)}).
\]

To test the hypothesis, we first obtain the largest characteristic root of HE^{-1}, c_s, where s = min(k-1, p-1), and calculate c_s/(1+c_s) = Q_s. The percentage point x_a, for the desired significance level a, may then be obtained by entering the charts in section 7 with the following degrees of freedom:

s = min(k-1, p-1),    m = (|k-p| - 1)/2,    and    n = (N-k-p)/2.

If Q_s <= x_a, we accept the hypothesis of equal profiles, and reject otherwise.
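Since the profile test is simply the one-way test of 5(a) applied to the differenced observations, it can be sketched in a few lines (numpy assumed; the function name is ours).

```python
import numpy as np

def profile_test_stat(groups):
    """Q_s for the equal-profiles hypothesis: transform each (n_i x p)
    group to the p-1 differences y^(f) = x^(1) - x^(f), f = 2,...,p,
    then apply the one-way MANOVA of section 5(a) to the y's."""
    ys = [g[:, [0]] - g[:, 1:] for g in groups]        # n_i x (p-1)
    grand = np.vstack(ys).mean(axis=0)                 # grand y means
    H = sum(len(y) * np.outer(y.mean(0) - grand, y.mean(0) - grand)
            for y in ys)                               # between groups
    E = sum((y - y.mean(0)).T @ (y - y.mean(0)) for y in ys)  # within
    c = np.linalg.eigvals(H @ np.linalg.inv(E)).real.max()
    return c / (1 + c)

# Enter the charts with s = min(k-1, p-1), m = (|k-p| - 1)/2, n = (N-k-p)/2.
```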
6. Numerical examples.

(a) Testing the independence between two sets of variates.

The data for this illustration are an excerpt from L. L. and T. G. Thurstone [28], and are available in the form of a correlation matrix based on 437 observations of five variates, the variates representing scores on five different psychological tests. Two of the variates, scores on sentence and vocabulary tests, are representative of verbal factors, and the remaining three variates, scores on tests involving flags, cards, and figures, are representative of spatial factors. The hypothesis which we desire to test is that there is no correlation between verbal and spatial abilities. We will follow the test procedure given in section 3, where, in the present case, we have p = 2, q = 3, and N = 437.
The correlation matrix, R (say), is given by

                Sentences  Vocabulary  Flags   Cards   Figures
  Sentences        1          .829     .108    .033     .108
  Vocabulary      .829         1       .115    .061     .125
  Flags           .108        .115      1      .636     .626
  Cards           .033        .061     .636     1       .709
  Figures         .108        .125     .626    .709      1

and is partitioned as

\[
R = \begin{pmatrix} R_{11} & R_{12}\\ R_{12}' & R_{22} \end{pmatrix}
\begin{matrix} 2\\ 3 \end{matrix}
\]

where R_11 is the (2 x 2) block for the verbal variates, R_22 the (3 x 3) block for the spatial variates, and R_12 the (2 x 3) block of correlations between the two sets.
We may work in terms of this correlation matrix rather than the covariance matrix, S, as given in section 3, because of the fact that the characteristic roots of S_12 S_22^{-1} S_12' S_11^{-1} are unaltered when the covariances are replaced by the corresponding correlations. To make the test, we require ch_max[R_12 R_22^{-1} R_12' R_11^{-1}].
From the matrix R above, we obtain

\[
R_{11}^{-1} = \begin{pmatrix} 3.197350 & -2.650603\\ -2.650603 & 3.197350 \end{pmatrix},
\qquad
R_{12} = \begin{pmatrix} .108 & .033 & .108\\ .115 & .061 & .125 \end{pmatrix},
\]

and

\[
R_{22}^{-1} = \begin{pmatrix}
1.873114 & -.723779 & -.659410\\
-.723779 & 2.290453 & -1.170846\\
-.659410 & -1.170846 & 2.242920
\end{pmatrix},
\]

whence

\[
R_{12}R_{22}^{-1}R_{12}'R_{11}^{-1} =
\begin{pmatrix} .013446 & .009861\\ .010508 & .012661 \end{pmatrix}.
\]

The characteristic equation of this matrix is

\[
c^2 - (.026107)c + (.000067) = 0,
\]

and solving for c_max, we obtain

c_max = .0232 = Q_2 (say).
We enter the charts in section 7 with a = .05 and the degrees of freedom

s = p = 2,    m = (|p-q| - 1)/2 = 0,    and    n = (N-p-q-2)/2 = 215,

and from Chart III, we obtain x_.05(2,0,215) = .024. Since Q_2 = .0232 < .024, we accept, at the 5% level, the null hypothesis that there is no correlation between the two factors.
Incidentally, if a more precise value of the percentage point is desired than that read from the chart, we may, since n is large, calculate the 5% point using the value of z_.05(2,0) given in Table 8.1 and the relation

\[
y = \frac{z_a(s,m)}{m+2n+s+1},
\qquad
x_a(s,m,n) = 1 - e^{-y} = y - \frac{y^2}{2!} + \frac{y^3}{3!} - \frac{y^4}{4!} + \cdots.
\]

From Table 8.1 we obtain z_.05(2,0) = 10.7393, and hence y = 10.7393/433 = .024802. Using the first three terms of the series, we obtain

x_.05(2,0,215) = .0245,

as compared with the value .024 which was read from the chart.
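This series computation is easily checked by machine; the following few lines (Python assumed) reproduce the figures just given.

```python
import math

z = 10.7393                    # z_.05(2, 0) from Table 8.1
s, m, n = 2, 0, 215
y = z / (m + 2 * n + s + 1)    # = 10.7393 / 433 = .024802
x = 1 - math.exp(-y)           # = .0245, vs .024 read from Chart III
print(round(y, 6), round(x, 4))
```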
(b) Multivariate analysis of variance with one-way classification.

The data are from a study by R. E. Bargmann [2], and provide us with an example involving four groups (of size twenty-five each) and six variates. The groups have been formed by dividing 100 original observations for each variate into four sets of twenty-five each. Due to this method of choosing the groups, it is not likely that the 6-variate mean vectors will differ from group to group, and this is the hypothesis which we desire to test.

Using the notation of section 5(a), we have the following values: k (number of groups) = 4, p (number of variates) = 6, n_i (group size) = 25 (i = 1,2,3,4), and N (total sample size) = 100.
We first calculate the matrix H, whose elements are the between groups sums of squares and sums of cross-products. This may be done in two ways, either by the usual method as in univariate analysis of variance and covariance, or by means of the matrix expression given for H in section 4. In the present case (and in general, when the number of groups is less than or equal to the number of variates), by using the matrix expression, we can simplify numerical calculations at a later stage. For this later simplification we write
=(1) -(1»
=(1) -(1»
n (x
3
n
3
-x
n4 (x
3
(x=(2) -x-(2»
3
=
-x4
-(2»
b4 (x=(2) -x4
,
•
=(6) -(6»
n (x
3
-x
=(6) -(6»
n4 ( x
-x4
3
where ~(l), ~(2), ••• , x(6) are the grand means (over all four groups)
of the six variates, and
\[
C_1(A_1'A_1)^{-1}A_1'X' =
\begin{pmatrix}
\bar{x}_1^{(1)} - \bar{x}_2^{(1)} & \bar{x}_1^{(2)} - \bar{x}_2^{(2)} & \cdots & \bar{x}_1^{(6)} - \bar{x}_2^{(6)}\\
\bar{x}_1^{(1)} - \bar{x}_3^{(1)} & \bar{x}_1^{(2)} - \bar{x}_3^{(2)} & \cdots & \bar{x}_1^{(6)} - \bar{x}_3^{(6)}\\
\bar{x}_1^{(1)} - \bar{x}_4^{(1)} & \bar{x}_1^{(2)} - \bar{x}_4^{(2)} & \cdots & \bar{x}_1^{(6)} - \bar{x}_4^{(6)}
\end{pmatrix},
\]

i.e., the matrix of mean differences versus the first group. From the given data, we obtain
\[
XA_1(A_1'A_1)^{-1}C_1'\left[C_1(A_1'A_1)^{-1}C_1'\right]^{-1} = 25 \times
\begin{pmatrix}
1.33 & 1.49 & -.79\\
-.40 & -.92 & 2.28\\
1.06 & 1.50 & 1.62\\
.47 & -.65 & .59\\
1.81 & 2.05 & -.43\\
-.17 & .11 & 2.79
\end{pmatrix}
\]

and

\[
C_1(A_1'A_1)^{-1}A_1'X' =
\begin{pmatrix}
3.36 & .56 & 5.24 & .88 & 5.24 & 2.56\\
3.52 & .04 & 5.68 & -.24 & 5.48 & 2.84\\
1.24 & 3.24 & 5.80 & 1.00 & 3.00 & 5.52
\end{pmatrix},
\]
whence

\[
H =
\begin{pmatrix}
218.35 & -43.88 & 271.26 & .57 & 319.11 & 81.89\\
-43.88 & 178.16 & 147.56 & 53.72 & -7.44 & 223.72\\
271.26 & 147.56 & 586.76 & 54.82 & 465.86 & 397.90\\
.57 & 53.72 & 54.82 & 28.99 & 16.77 & 65.35\\
319.11 & -7.44 & 465.86 & 16.77 & 485.71 & 202.05\\
81.89 & 223.72 & 397.90 & 65.35 & 202.05 & 381.95
\end{pmatrix},
\]

which is the matrix of between groups sums of squares and cross-products.
For the matrix E, consisting of the sums of squares and cross-products due to error, we first obtain the matrix T (say), consisting of the overall sums of squares and cross-products (corrected for the grand means), and then, using the relation T = H + E, obtain E by subtraction. We obtain for T,
\[
T =
\begin{pmatrix}
9910.91 & 3974.24 & 5356.50 & 4080.09 & 5982.47 & -597.23\\
3974.24 & 9993.36 & 5223.00 & 5126.76 & 550.08 & 5980.28\\
5356.50 & 5223.00 & 11477.00 & -1128.50 & 5685.50 & 4832.50\\
4080.09 & 5126.76 & -1128.50 & 10482.91 & 4393.53 & 4904.23\\
5982.47 & 550.08 & 5685.50 & 4393.53 & 10562.99 & 3924.09\\
-597.23 & 5980.28 & 4832.50 & 4904.23 & 3924.09 & 10250.19
\end{pmatrix},
\]
and from E = T - H,

\[
E =
\begin{pmatrix}
9692.56 & 4018.12 & 5085.24 & 4079.52 & 5663.36 & -679.12\\
4018.12 & 9815.20 & 5075.44 & 5073.04 & 557.52 & 5756.56\\
5085.24 & 5075.44 & 10890.24 & -1183.32 & 5219.64 & 4434.60\\
4079.52 & 5073.04 & -1183.32 & 10453.92 & 4376.76 & 4838.88\\
5663.36 & 557.52 & 5219.64 & 4376.76 & 10077.28 & 3722.04\\
-679.12 & 5756.56 & 4434.60 & 4838.88 & 3722.04 & 9868.24
\end{pmatrix}.
\]

For E^{-1}, which is required below, we obtain

\[
E^{-1} = 10^{-4} \times
\begin{pmatrix}
36.1985414 & 3.0649974 & -37.5116552 & -36.8517637 & 2.0472603 & 34.8582807\\
3.0649974 & 26.6629266 & -29.4720498 & -28.7240118 & 23.3719743 & 3.1709850\\
-37.5116552 & -29.4720498 & 67.1426781 & 64.9377648 & -26.4445828 & -37.4298204\\
-36.8517637 & -28.7240118 & 64.9377648 & 64.4595419 & -25.7133798 & -36.8712179\\
2.0472603 & 23.3719743 & -26.4445828 & -25.7133798 & 22.4834247 & 2.5190879\\
34.8582807 & 3.1709850 & -37.4298204 & -36.8712179 & 2.5190879 & 35.5123725
\end{pmatrix}.
\]
For our test, we require ch_max[HE^{-1}], and it is at this point that we may simplify calculations as mentioned above. Making use of the fact that, for non-zero roots, ch[AB] = ch[BA], we have

\[
ch_{max}[HE^{-1}] = ch_{max}\left[\,XA_1(A_1'A_1)^{-1}C_1'\left[C_1(A_1'A_1)^{-1}C_1'\right]^{-1}\cdot C_1(A_1'A_1)^{-1}A_1'X'E^{-1}\,\right]\quad (p\times p)
\]
\[
= ch_{max}\left[\,C_1(A_1'A_1)^{-1}A_1'X'E^{-1}\cdot XA_1(A_1'A_1)^{-1}C_1'\left[C_1(A_1'A_1)^{-1}C_1'\right]^{-1}\,\right]\quad (k-1\times k-1).
\]

We thus reduce our task of finding the largest root of a (6 x 6) matrix to that of finding the largest root of a (3 x 3) matrix.
We have

\[
C_1(A_1'A_1)^{-1}A_1'X'E^{-1} = 10^{-4} \times
\begin{pmatrix}
-5.6822851 & -23.8941742 & 32.0394028 & 27.9624838 & -16.9682774 & -5.5676617\\
33.5361910 & -11.5681556 & -18.6515639 & -23.1145348 & -5.5294069 & 33.7354395\\
-1.0430906 & -21.8536616 & 26.4150470 & 21.6673303 & -19.4725212 & 3.1196431
\end{pmatrix}
\]

and

\[
C_1(A_1'A_1)^{-1}A_1'X'E^{-1}\cdot XA_1(A_1'A_1)^{-1}C_1'\left[C_1(A_1'A_1)^{-1}C_1'\right]^{-1}
= \frac{1}{100}
\begin{pmatrix}
4.8345713 & 2.0005284 & 2.5436423\\
.7131640 & 10.0085859 & -.0556434\\
2.4405366 & 1.1286155 & 5.9126956
\end{pmatrix}
= \frac{1}{100}\,Z \ \text{(say)}.
\]

Then ch_max[HE^{-1}] = (1/100) ch_max[Z], where ch_max[Z] is the largest lambda satisfying the characteristic equation of Z, given by

\[
\lambda^3 - (20.7558528)\lambda^2 + (128.5785355)\lambda - (217.6107096) = 0.
\]

From this we obtain lambda_max(Z) = 10.42239, whence

ch_max[HE^{-1}] = .1042239 = c_3 (say).

We next form Q_3 = c_3/(1+c_3) = .0944. Entering the charts in section 7 with a = .05 and the degrees of freedom
s = min(k-1, p) = 3,    m = (|k-p-1| - 1)/2 = 1,    and    n = (N-k-p-1)/2 = 44.5,

we obtain from Chart VI, x_.05(3,1,44.5) = .184. Since Q_3 < .184, this indicates that the mean vector differences are non-significant at the 5% level, and we accept the null hypothesis. For purposes of comparison, the group means are displayed in Table 6.1 below.
TABLE 6.1
Group means (based on 25 observations)

                          Variate No.
Group    (1)      (2)      (3)      (4)      (5)      (6)
  1     32.00    31.04    34.68    30.44    33.92    33.32
  2     28.64    30.48    29.44    29.56    28.68    30.76
  3     28.48    31.00    29.00    30.68    28.44    30.48
  4     30.76    27.80    28.88    29.44    30.92    27.80
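Since T and E are tabulated in full above, the largest root of this example may be checked by machine from H = T - E; the following sketch (numpy assumed, matrices transcribed from the tables above) may be compared with the report's values c_3 = .1042239 and Q_3 = .0944.

```python
import numpy as np

# T (total) and E (error) s.s.c.p. matrices as tabulated above.
T = np.array([
    [9910.91, 3974.24, 5356.50, 4080.09, 5982.47, -597.23],
    [3974.24, 9993.36, 5223.00, 5126.76,  550.08, 5980.28],
    [5356.50, 5223.00, 11477.00, -1128.50, 5685.50, 4832.50],
    [4080.09, 5126.76, -1128.50, 10482.91, 4393.53, 4904.23],
    [5982.47,  550.08, 5685.50, 4393.53, 10562.99, 3924.09],
    [-597.23, 5980.28, 4832.50, 4904.23, 3924.09, 10250.19]])
E = np.array([
    [9692.56, 4018.12, 5085.24, 4079.52, 5663.36, -679.12],
    [4018.12, 9815.20, 5075.44, 5073.04,  557.52, 5756.56],
    [5085.24, 5075.44, 10890.24, -1183.32, 5219.64, 4434.60],
    [4079.52, 5073.04, -1183.32, 10453.92, 4376.76, 4838.88],
    [5663.36,  557.52, 5219.64, 4376.76, 10077.28, 3722.04],
    [-679.12, 5756.56, 4434.60, 4838.88, 3722.04, 9868.24]])

H = T - E                                    # between groups s.s.c.p.
c3 = np.linalg.eigvals(H @ np.linalg.inv(E)).real.max()
Q3 = c3 / (1 + c3)
print(c3, Q3)    # compare with the report's .1042239 and .0944
```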
7. Charts of the upper 1%, 2.5%, and 5% points of the distribution of the largest characteristic root.

7.1 Description. Charts I-XII enable finding x_a(s,m,n) such that P[Q_s <= x_a(s,m,n)] = 1 - a, where Q_s is the largest non-zero root. On each page appear the graphs for a particular s and a (s = 2(1)5; a = .01, .025, .05) for m = -1/2, 0(1)10 and n from 5 to 1000. The curves corresponding to the twelve values of m on each page are in two sections, the lower section being the continuation of the upper section, with an overlap occurring from x_a = .50 to .55. Of the two scales for x_a at the bottom of the page, the upper scale corresponds to the upper set of curves and the lower scale to the lower set. The lowest curve in each case (with the exception of Chart III) corresponds to m = -1/2, the next lowest to m = 0, the next to m = 1, etc., to the uppermost curve, which corresponds to m = 10. The scale for n is on the left margin of the page and is logarithmic.
7.2 Instructions for use. To find the percentage point x_a(s,m,n) corresponding to a given combination (s,m,n) and a desired significance level a, first find the appropriate chart for s and a, select n on the left margin, read across from this value to the appropriate curve for m, and then read down to the proper x_a scale at the bottom of the page to obtain the desired x_a(s,m,n).
7.3 Example. Find the upper 1% point of the distribution for s = 3, m = 4, and n = 82. For s = 3 and a = .01, we choose Chart IV. Selecting n = 82 on the left margin, we read over to the sixth curve from the left, and reading down to the upper x_a scale, find x_a = .182.

7.4 Note. For a more precise value of x_a(s,m,n), when n > 100, the method described in section 8 is suggested.
[Charts I-XII are graphs and are not reproducible in text form. Their contents are:

CHART I     s = 2, a = .01          CHART VII   s = 4, a = .01
CHART II    s = 2, a = .025         CHART VIII  s = 4, a = .025
CHART III   s = 2, a = .05          CHART IX    s = 4, a = .05
CHART IV    s = 3, a = .01          CHART X     s = 5, a = .01
CHART V     s = 3, a = .025         CHART XI    s = 5, a = .025
CHART VI    s = 3, a = .05          CHART XII   s = 5, a = .05

Each chart plots the twelve curves m = -1/2, 0(1)10, with n (logarithmic scale, 5 to 1000) on the left margin and x_a on the two scales at the bottom, as described in section 7.1.]
8. Asymptotic z_a(s,m) values.

8.1 Description. In Table 8.1 are listed values of z_a(s,m) for s = 2(1)5, m = -1/2, 0(1)10, and a = .01, .025, .05. For n > 100, these may be used to obtain x_a(s,m,n), with an error of at most five units in the fourth decimal.

8.2 Instructions for use. For a given combination (s,m,n) and a desired significance level a, first find the corresponding z_a(s,m) in Table 8.1. Then compute

\[
y = \frac{z_a(s,m)}{m+2n+s+1}.
\]

The desired percentage point is given by

\[
x_a(s,m,n) = 1 - e^{-y}.
\]

8.3 Example. Find the upper 2.5% point of the distribution for s = 4, m = 3, and n = 200. From Table 8.1 we obtain z_.025(4,3) = 33.0074. Also,

m+2n+s+1 = 408,    y = .0809,    and    e^{-y} = .9223,

whence

x_.025(4,3,200) = 1 - e^{-y} = .0777.
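The rule of 8.2 is a one-line computation; the following sketch (Python assumed; the function name is ours) reproduces example 8.3.

```python
import math

def x_alpha(z_alpha, s, m, n):
    """Asymptotic percentage point of section 8: y = z/(m+2n+s+1),
    x = 1 - exp(-y).  Intended for n > 100."""
    y = z_alpha / (m + 2 * n + s + 1)
    return 1 - math.exp(-y)

# Example 8.3: z_.025(4,3) = 33.0074, n = 200
print(round(x_alpha(33.0074, 4, 3, 200), 4))   # .0777
```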
TABLE 8.1
Values of z_a(s,m)

              s = 2                              s = 3
  m      a=.01     a=.025    a=.05       a=.01     a=.025    a=.05
-1/2   12.1601   10.1465    8.5941     17.1762   14.9006   13.1141
  0    14.5680   12.4157   10.7393     19.5012   17.1192   15.2389
  1    18.7346   16.3599   14.4873     23.6906   21.1262   19.0866
  2    22.4664   19.9086   17.8762     27.5181   24.7971   22.6216
  3    25.9526   23.2352   21.0641     31.1203   28.2597   25.9635
  4    29.2755   26.4145   24.1192     34.5647   31.5768   29.1708
  5    32.4795   29.4870   27.0779     37.8905   34.7848   32.2774
  6    35.5920   32.4773   29.9628     41.1230   37.9071   35.3050
  7    38.6311   35.4018   32.7886     44.2795   40.9597   38.2685
  8    41.6098   38.2722   35.5658     47.3726   43.9542   41.1785
  9    44.5375   41.0970   38.3021     50.4118   46.8993   44.0431
 10    47.4215   43.8827   41.0033     53.4042   49.8017   46.8684

              s = 4                              s = 5
  m      a=.01     a=.025    a=.05       a=.01     a=.025    a=.05
-1/2   21.9646   19.4847   17.5183     26.6206   23.9697   21.8538
  0    24.2395   21.6713   19.6277     28.8613   26.1339   23.9515
  1    28.4328   25.7078   23.5278     33.0524   30.1861   27.8835
  2    32.3175   29.4540   27.1543     36.9748   33.9834   31.5731
  3    35.9964   33.0074   30.5996     40.7087   37.6027   35.0938
  4    39.5253   36.4207   33.9135     44.3009   41.0883   38.4880
  5    42.9387   39.7262   37.1265     47.7814   44.4688   41.7829
  6    46.2593   42.9454   40.2588     51.1710   47.7639   44.9971
  7    49.5034   46.0934   43.3246     54.4847   50.9876   48.1441
  8    52.6831   49.1815   46.3345     57.7338   54.1508   51.2340
  9    55.8073   52.2182   49.2964     60.9269   57.2615   54.2745
 10    58.8833   55.2102   52.2166     64.0709   60.3264   57.2717
9. Computation of the percentage points.

The charts in section 7 were prepared from percentage points which were computed using two types of approximations to the c.d.f. of the largest characteristic root. The first type of approximation, obtained by K. C. S. Pillai [15], was used to compute, in general, the points for n <= 100. These formulae by Pillai are available for each value of s = 1,2,...,5. For large values of n, generally n > 100, asymptotic approximations based on Pillai's approximations were used, which were obtained by J. R. B. Whittlesey [29].

To compute the percentage points from Pillai's approximations, denoted by p_s(x,m,n), the value of p_s(x,m,n) for a particular combination (s,m,n) was first calculated at the 100 values of x from .01 to 1.0 at intervals of .01. Then, on the resulting ordinates, a method of inverse interpolation was used to obtain the upper 1%, 2.5%, and 5% points, i.e., x_a such that

\[
p_s(x_a,m,n) = 1 - a.
\]

The overall computational procedure for each value of s was as follows: For a fixed integral m and an initial (small) n, the percentage points were computed; n was then stepped up by unit increments until the desired set of values of n was covered. Then the expression was modified for the next value of m, and the percentage points for this value of m were computed for all desired n. This procedure was continued to m = 10, which is a fairly large value for practical purposes.

As a partial check on the accuracy of these percentage points, a number of the points were substituted in the expression for the exact c.d.f., and the largest error which occurred was found to be less than two units in the fourth decimal.

Whittlesey's asymptotic approximations (for integral values of m) may be obtained from Pillai's approximations by using Stirling's approximation and the substitution

\[
(9.1)\qquad z = -(m+2n+s+1)\ln(1-x),
\]

and then letting n become large. From the resulting expressions, denoted by w_s(z,m), a method of inverse interpolation was used to obtain z_a(s,m) (or z_a) such that for fixed s and m,

\[
w_s(z_a,m) = 1 - a.
\]

From these "asymptotic" z_a(s,m) values, the percentage points x_a(s,m,n) were obtained by inverting (9.1). A selected group of these percentage points was checked by substitution in the expression for the exact c.d.f., and of those points used in the final tabulation, the error for the most unfavorable combination of s and m (s=5, m=10) was found to be five units in the fourth decimal. This error, which is primarily an error of asymptotic approximation, is considerably smaller for smaller values of s and m, and, because of the asymptotic nature of the approximation, decreases in all cases for increasing n.

Computation of the percentage points and the z_a(s,m) values was carried out on the IBM 650, with the programs coded in the Bell Interpretive System [32]. The program of the exact c.d.f. of the largest root, for s = 2(1)6, m = 0(1)10, and n >= 0, was coded in DOPSIR [1], and is available at the IBM Laboratory, The Institute of Statistics, Raleigh, North Carolina. A more detailed account of the computing procedures used in obtaining the percentage points for integral values of m, as well as the explicit expressions for Pillai's and Whittlesey's approximations, may be found in [10]. The computation of the points for m = -1/2 was done subsequent to the computation for integral-valued m, and Pillai's and Whittlesey's approximations were used, after appropriate modifications were made in the latter.
10. Notes and references.

Likelihood-ratio methods for dealing with multivariate tests such as those considered in this paper have been advanced by Wilks [30] and Bartlett [4], and comprehensive accounts of some applications of these techniques are given by Wilks [31] and Rao [17]. Procedures based on the largest characteristic root were proposed by Roy [19, 20, 23] for testing (i) independence between two sets of variates and (ii) multivariate linear hypotheses, and also for obtaining confidence bounds on certain parametric functions associated with both cases. These procedures involve a knowledge of the c.d.f. of the largest root, which was given in terms of a chain of recursion formulae by Roy [19] and Nanda [12], both starting from the joint distribution of the roots obtained earlier. However, the numerical computation of the c.d.f., based on these recursions, becomes extremely laborious when the total number of non-zero roots involved is even moderately large, say greater than three. It was the aim of the author, by making use of available electronic computing equipment, and by means of the approximations obtained by Pillai [15] and Whittlesey [29] working under Roy, to construct a set of tables of some upper percentage points of the distribution of the largest root, which could be used to carry out the testing procedures and to set up the confidence bounds mentioned above.
The distribution function itself, for a particular number (s) of non-zero roots, is a two-parameter curve, and for the situations to which we restrict ourselves, the first parameter (m) is in general small, while the second parameter (n) is a function of sample size, and in many practical applications is large. Within recent years, tables of the percentage points and a set of tables of the c.d.f. itself (for s=2) have appeared, but in most cases the range of the parameters s, m, and n has been rather limited. One of the most extensive sets of tables to date is that by Pillai [16], giving the upper 1% and 5% points based on his own approximations to the exact c.d.f., and covering the range s = 2(1)5, m = 0(1)4, and n = 5(5)40(20)100(100)500, 1000. Other tables, all of which are based on the exact c.d.f., include Nanda's [13], the upper 1% and 5% points for s = 2, m = 0(1/2)2, and n = 1/2(1/2)10; S. B. Chaudhuri's [5], the upper 1% and 5% points for s = 2, m = n = 1/2(1/2)5(1)8, and for s = 3, m = n = 2 1/2(1/2)5(1)11; D. H. Rees', for s = 3, m = 0(1/2)2, and n = 1/2(1/2)2; F. G. Foster and Rees' [7], the upper 1%, 5%, 10%, 15%, and 20% points for s = 2, m = -1/2, 0(1)9, and n = 1(1)19(5)49, 59, 79; and Foster's [8], the upper 1%, 5%, 10%, 15%, and 20% points for s = 3, m = -1/2(1/2)3, and n = 0(1)95.
In the present work, the upper 1%, 2.5%, and 5% points of the distribution were calculated, based on the approximations by Pillai and asymptotic approximations (for large n) derived by Whittlesey. The range of parameters which we have considered is s = 2(1)5, m = -1/2, 0(1)10, and n = 5(1)100(10)500(50)1000, and in addition, a value z_a(s,m) is available for each s, m, and a combination such that for large n, and in particular n > 1000, the corresponding percentage point may be calculated simply and directly.

It is felt that the charts of the percentage points presented herein will be adequate for the purposes of most investigations; we therefore refrain at this time from reproducing the rather extensive set of tables required for the actual numerical values.

In conclusion, the author wishes to express his sincere thanks to S. N. Roy, R. E. Bargmann, and the staff of the Institute of Statistics for their helpful advice and assistance in preparing this report.
References

[1] Adams, H. E., "DOPSIR: Double precision floating point SOAP interpretive routine," 650 Program Library, File No. 2.0.010, IBM, Washington, D. C. (1956).

[2] Bargmann, R. E., "A demonstration study on the effectiveness of factor-analytical methods," Hochschule f. Internationale Paed. Forschung, Frankfurt/Main, Forschungsbericht (June 1955).

[3] ________, "A study of independence and dependence in multivariate normal analysis," North Carolina Institute of Statistics Mimeograph Series No. 186 (1957).

[4] Bartlett, M. S., "Further aspects of the theory of multiple regression," Proc. Cambridge Phil. Soc., Vol. 34 (1938), pp. 33-40.

[5] Chaudhuri, S. B., "Statistical tables and certain recurrence relations connected with p-statistics," Calcutta Stat. Assoc. Bulletin, Vol. 6 (1956), pp. 181-188.

[6] Fisher, R. A., "The sampling distribution of some statistics obtained from non-linear equations," Ann. Eugenics, Vol. 9 (1939), pp. 238-249.

[7] Foster, F. G. and Rees, D. H., "Upper percentage points of the generalized beta distribution. I," Biometrika, Vol. 44 (1957), pp. 237-247.

[8] Foster, F. G., "Upper percentage points of the generalized beta distribution. II," Biometrika, Vol. 44 (1957), pp. 441-453.

[9] Girshick, M. A., "On the sampling theory of the roots of determinantal equations," Ann. Math. Stat., Vol. 10 (1939), pp. 203-224.

[10] Heck, D. L., "Construction and applications of certain tables for multivariate analysis," unpublished MA thesis, University of North Carolina (1958).

[11] Hsu, P. L., "On the distribution of roots of certain determinantal equations," Ann. Eugenics, Vol. 9 (1939), pp. 250-258.

[12] Nanda, D. N., "Distribution of a root of a determinantal equation," Ann. Math. Stat., Vol. 19 (1948), pp. 47-57.

[13] Nanda, D. N., "Probability distribution tables of the largest root of a determinantal equation with two roots," Journal of the Indian Soc. of Agricultural Stat., Vol. 3 (1951), pp. 175-177.

[14] Pearson, K., Tables of the Incomplete Beta-Function, London, The Biometrika Office (1934).

[15] Pillai, K. C. S., "On the distribution of the largest or the smallest root of a matrix in multivariate analysis," Biometrika, Vol. 43 (1956), pp. 122-127.

[16] ________, Concise Tables for Statisticians, Manila, Bookman, Inc. (1957).

[17] Rao, C. R., Advanced Statistical Methods in Biometric Research, New York, John Wiley and Sons (1952).

[18] Roy, S. N., "p-statistics or some generalizations in analysis of variance appropriate to multivariate problems," Sankhya, Vol. 4 (1939), pp. 381-396.

[19] ________, "The individual sampling distributions of the maximum, the minimum and any intermediate of the p-statistics on the null hypothesis," Sankhya, Vol. 7 (1945), pp. 133-158.

[20] ________, "On a heuristic method of test construction and its use in multivariate analysis," Ann. Math. Stat., Vol. 24 (1953), pp. 220-238.

[21] ________ and Bose, R. C., "Simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 24 (1953), pp. 513-536.

[22] ________, "Some further results in simultaneous confidence interval estimation," Ann. Math. Stat., Vol. 25 (1954), pp. 752-761.

[23] ________, "A report on some aspects of multivariate analysis," North Carolina Institute of Statistics Mimeograph Series No. 121 (1954).

[24] ________ and Gnanadesikan, R., "Further contributions to multivariate confidence bounds," North Carolina Institute of Statistics Mimeograph Series No. 155 (1956).

[25] ________ and Potthoff, R. F., "Confidence bounds on the 'ratio of means' and the 'ratio of variances' for correlated variates," North Carolina Institute of Statistics Mimeograph Series No. 170 (1957).

[26] Roy, S. N. and Bargmann, R. E., "Tests of multiple independence and the associated confidence bounds," North Carolina Institute of Statistics Mimeograph Series No. 175 (1957).

[27] ________ and Gnanadesikan, R., "Further contributions to multivariate confidence bounds," Biometrika, Vol. 44 (1957), pp. 399-410.

[28] Thurstone, L. L. and Thurstone, T. G., "Factorial studies of intelligence," Psychometric Monograph No. 2, Chicago Univ. Press (1941).

[29] Whittlesey, J. R. B., "Summary of results obtained under the Ford Foundation contract," unpublished research report, Dept. of Statistics, Univ. of North Carolina (1954).

[30] Wilks, S. S., "Certain generalizations in the analysis of variance," Biometrika, Vol. 24 (1932), pp. 471-494.

[31] ________, Mathematical Statistics, Princeton, N. J., Princeton Univ. Press (1943).

[32] Wolontis, V. M., "A complete floating-decimal interpretive system for the IBM 650 magnetic drum calculator," IBM Applied Science Div., Tech. Newsletter No. 11 (1956).