S.N. Roy and Whitfield Cobb; (1959)Contributions to Univariate or Multivariate Analysis of Variance with Fixed Effects, Normal or Nonnormal Random Effects, and Normal Error."

•
CONTRIBUTIONS TO UNIVARIATE OR MULTIVARIATE
ANALYSIS OF VARIANCE WITH FIXED EFFECTS,
NORMAL OR NONNORMAL RANDOM EFFECTS,
AND NORMAL ERROR
by
.~
I,
/
S. N. Roy and Whitfield Cobb
University of North Carolina
,
lit
.·011
1
:1,
.
1
J
This research was supported by the United
States Air Force through the Air Force
Office of Scientific Research of the Air
Research and Deyelopment Command, under
Contract No. A~ 49(638)-213. Reproduction
in whole or in part is permitted for any
purpose of the United States Government.
Institute of Statistics
Mimeograph Series No. 214
. January, 1 959
ACKNOWLEDGMENTS
It is with sincere appreciation and humility that
•
I acknowledge my indebtedness, first, to my wife, who
willingly endured prolonged separation, reduced income,
and doubled parental responsibility while I belatedly prepared for and then undertook this research, and, next, to
Professor S.N. Roy, who delineated the problem, made many
fruitful suggestions, never lost patience, and never failed
to inspire as well as guide my hesitant efforts.
Also Professors R.C. Bose, Harold Hotelling, W.R.
Mann, and G.E. Nicholson, Jr., I take this opportunity to
thank for their personal encouragement as well as for the
•
formal instruction I received from them.
To Mrs. Ouida Taylor of the Institute of Natural
Science I am deeply grateful for reading the manuscript
with discernment and typing it with due regard for appearance and meaning.
Miss Martha Jordan and Miss Marianne Byrd should
also be mentioned for their helpfulness throughout this
undertaking.
..
At one time or another during the past three years I
have received financial assistance from Guilford College,
•
Southern Fellowships Fund, the United States Air Force, and
the Office of Naval Research.
TABLE OF CONTENTS
-
CHAPTER
•
. . . . . .. . . . .
INTRODUCTION . • • •
NOTATION. . . • • •
I.
Page
•
•
•
•
•
GO.
•
•
•
•
MATHEMATICAL PRELIMINARIES. • . • • . . • . . .
1.1 Properties of direct sums and direct
products. • • • • . • . . • • • • . • • •.
1 .2 Properties of h(m 2 ), J:!(m-1 x m), and
~(m x m..1)
• •
0
•
•
0
•
•
•
•
•
•
•
D
•
••
1.3 Inverses of partitioned and patterned
matrices. • • • • • • • • • • • . • • . .•
1.4 Some lemmas on matrix factorization, completion, stationary values and characteristic roots. • • • • . • . • • . • • ••
1.5 Miscellaneous lemmas. • • • • . . . . . •.
II.
.
UNIVARIATE MODEL: DESIGNS AND STATISTICS • . • .
2.1 General two-factor models • . • • • • . • .
2.2 Derivation of a-related b-free statistics •
2.3 Derivation of b-related a-free statistics •
2.4 Derivation of a-freeb-free statistics
and the error sum of squares • • • • • • . .
2.5 Certain simplifications of the previously
defined statistics • • • • • • • • . • • • .
2.6 Simplification possible for randomized
block statistics • • • • • • • • • • • •
2.7 General multifactor models • • • • • . • • •
2.8 Multifactor ortho9~nal designs and their
statistics. • • • • • . • . . • • . • . • •
2.9 Summary . . . . .
•
iii
It
..
•
•
•
•
•
0
•
•
•
•
0
vi
xiv
1
1
2
5
9
10
20
20
22
25
26
28
33
35
36
45
iv
CHAPTER
"
III.
•
•
UNIVARIATE MODEL: ESTIMATES, TESTS, AND
CONF IDENCE BOUNDS • • . • • • • . •
••••
3.1 Testing the hypothesis of equal fixed
effects • • • • • • • • • • • • • • • • • •
3.2 Estimating linear functions of fixed
effects • . • • • • • • • • • . • • • •
3.3 Testing other testable hypotheses on
fixed effects • • • • • • • • • • • • . • •
3.4 Testing the hypothesis of zero variance
of block effects. • • • • • • • • • • • • •
3.5 Confidence bounds on a~ when .£ is N(~i,a~l)
3.6 Preliminary or quasi-confidence bounds
on l2. t Ii 'Hb • • • • • • • • • • • • • • • . •
3.7 Alternative confidence bounds on a~ when
b is normal • • • • • • • • • • • • • • • .
3.8 Confidence bounds when b is two-valued.
3.9 A quadratic form in m-tile differences. • •
3.10 A priori probability of bounding loci. • .
3.11 Inner and outer boundaries of bounded
regions. • • • • • • • • • • • • • • .
3.12 Summary of development when b is not
known to be normal •
0
IV.
"
V.
•
•
•
•
•
•
•
...
MULTIVARIATE MODEL: DESIGNS AND STATISTICS. • •
4.1 General two-factor models • • . • • • .
4.2 Suitable statistics for MANOVA. • • • • . .
4.3 Variances and covariances for a matrix of
normal variates • • • . • • • • • • • • • •
4.4 Restricted designs and multifactor models.
4.~ Multivariate analogs of ANOVA sums of
squares • • • • • • • • • • • • • • • • • •
47
47
48
~1
~4
55
57
58
60
62
71
76
82
84
84
86
88
91
93
MULTIVARIATE MODEL: TESTS AND CONFIDENCE BOUNDS 96
5.1 IIStep-down" and multivariate m-tiles. • • • 96
~.2 Testing the hypothesis of equal fixed
effects • • • • • • •
102
5.3 Estimating ~ and certain simultaneous
confidence bounds pertinent to ~.
. .• 106
0
•
•
•
"
•
•
•
•
•
•
v
CHAPTER
(V)
5.4 Estimating ~1 when ~ is normal • • • • . • • 109
5.5 Obstacles to confidence bounds when
~1 -I O'~~. • • • • • • • • • • •
• • • • 110
5.6 Preliminary or quasi-confidence bounds
on lit!:! t!::!!i • • • • • • • • • • • • • • • • • 11 5
5.7 Confidence bou;lnds on ch(Z1) when B is
2
normal and ~1
0'1~".'
. • •- . • ' •• 119
5.8 Confidence regions when laC s x 2) is not
necessarily normal . • . • . • • • • • • • • 121
BIBLIOGRAPHY. • • • • • • . • • . • •
•
•
Page
...
132
INTRODUCTION
Statistics is concerned with making inferences (or
decisions) which have a calculable risk of being wrong. Both
the kind of inference to be drawn and the magnitude of the
risk depend upon certain assumptions relating the observations from which the inference is drawn to the ultimate
unobservables about which the conclusions are stated. These
assumed relations constitute the statistical model--e.g., the
regression model, the analysis of variance (ANOVA) model, the
covariance model, the variance components model, etc.
•
Actu-
ally reaching conclusions from observations is the business
of statistical analysis, but the model determines the kind
of analysis to be made.
Thus we have regression analysis,
ANOVA, analysis of covariance, variance components analysis,
etc.
Variance components analysis, as understood here, pre-
supposes a model in which each observation is a sum of at
least three components of which two or more are random
variables, each having a frequency distribution.
Variance
components analysis then proceeds to ascertain and appraise
the variations of these random components.
Traditionally--that is, until a few years ago--the
variance components model stipulated a normal distribution
for each of its random components, and variance components
analysis stopped with estimates of the variances of these
vi
vii
random
compon~nts.
But this has changed in recent years so
that today variance components analysis may include testing
certain hypotheses about these variances
and determining
confidence bounds on either the variances themselves or
simple functions of them, but on the assumption that each
random component is normal.
However, even now for the ANOVA
model with all fixed effects and a normal error component
(to say nothing of the variance components model in which
some effects are random) there is no adequate treatment of
the statistical analysis of multivariate experiments--that
is, experiments in which observations are made on not just
•
one characteristic but several characteristics of each one
of the experimental units available under the design and
the sampling scheme.
Here then are two areas in which
analysis of variance may advance--univariate models with
nonnormal random components and multivariate models with
either fixed, normal, or nonnormal random components.
The
present study attempts to extend the theory of variance
components analysis in both of these areas.
Gnanadesikan [6]1 recently considered estimates,
tests, and confidence bounds for variance components in
both univariate and multivariate experiments, but on the
assumption of normality for both the random block effects
and the error.
In his multivariate model not only were all
1The numbers in square brack~s refer to items
listed in the biDliography at the end.
viii
random block effects normal but there
~as
also the further
restriction that the variance matrices of these block effects were all proportional to the variance matrix of the
normal error.
This was recognized as a severe limitation,
but it was thought at the time to be necessary for the
analysis to go through.
One aim of the present investi-
gation, so far as the multivariate part is concerned, is
to remove from Gnanadesikan's model this restriction of
proportionality and still obtain confidence bounds on some
functions (actually on characteristic roots) of these variance matrices when the block effects are normal.
•
Although
difficulties were encountered, this attempt has been
successful.
A more comprehensive aim of this study is to remove
from the model (whether univariate or multivariate) the
stipulation that the random components are normally distributed and still obtain confidence bounds on some measure
of the dispersion of those random components, e.g., the
interquartiledifference if not the variance.
The under-
lying motive of the whole study has been to remove from the
model restrictive assumptions even including its designation as an ANOVA model or a variance components model. Thus
another specific aim is to obtain for each unobservable of
the model a statistic which can be used either in ANOVA or
in components of variance analysis, regardless of whether
other
unob$erva~l~s
of the model are fixed or random.
ix
Chapter I states without proof some of the less
familiar propetties of matrix algebra which are used repeatedly in subsequent chapters.
Also in Chapter I are
stated and proved certain lemmas, mathematical rather than
statistical in nature but specifically needed in the sequel
and not encountered by the author in any available literature.
Chapters II and III are concerned with statistical
inference from univariate data.
The observations are
arrayed as the components of a vector
r
and statistics are
regarded as the result of premultiplying
matrix.
.
r
by a suitable
Chapter II is primarily concerned with deriving
these matrices--expressing them in terms of a matrix denoted
by
Mand
here called the structure matrix (although commonly
called design matrix) without specifying the design--and
ascertaining that the resulting statistics will have certain
desirable properties regardless of whether the unobservables
of the model are (i) fixed, or (ii) random and normal, or
(iii) random but not necessarily normal.
Chapter III shows
how the statistics defined in Chapter II may be used for
estimation, tests of hypotheses, and confidence bounds when
anyone of these three alternative assumptions is incorporated into the model.
The most novel feature of Chapter III is its treatment of random effects that are not necessarily normal.
If
the distribution is of unknown type, its variance might not
even exist;
and~
in any
~ase,
an
~stimate
of the variance
x
would not
nece~sarily
reveal much about the distribution as
a whole.
On the other hand, quartiles, sextiles, octiles l
or deciles would indicate progressively more about the
nature of that distribution.
The actual values of these
m-tiles of the distribution, like the actual values of
fixed effects, are not estimable; but we succeed in obtaining confidence bounds for the differences between successive
odd m-tiles.
The confidence bounds we get are regions in
the space of these m-tile differences--not separate intervals for the individual m-tile differences.
But somewhat
like the indeterminacy principle of quantum physics, for a
given set of observations if we increase m in order to approximate the distribution more closely, the confidence
coefficient is correspondingly decreased.
Chapters IV and V are concerned with statistical
inference from multivariate experiments.
of each of p variates are arrayed as an
The n observations
n xp
matrix, y,
and statistics are regarded as other matrices (with p columns) obtained by premultiplying y by suitable matrices
(with n columns).
In Chapter IV it is shown that useful
multivariate statistics are obtained from y by exactly the
same transformations used on y in Chapter II.
Again it is
emphasized that the statistics are defined and some desirable properties are demonstrated without the assumption of
normality for random effects and without even specifying
which
~ffect6 ~r$
fixed and which are random.
xi
Chapter V indicates how the statistics defined in
Chapter IV may be uSed with different models for appropriate
kinds of inference.
But here again the emphasis is on what
can be inferred when the model is purposely left somewhat
indefinite.
For example, it is shown that contrasts among
one set of fixed effects may be estimated, or testable hypotheses about one set of fixed effects may be tested,
whether the other factors are represented by fixed or random effects, normal or nonnormal.
As indicated earlier,
if the model has a normally distributed random factor,
confidence bounds are also found, not on either the marginal
variances or the conditional variances but on the characteristic roots of the variance matrix whether or not that variance matrix is proportional to the variance matrix of the
error.
Also in Chapter V marginal and conditional m-tiles
are defined for a multivariate distribution, and the problem
of finding confidence bounds on differences between successive odd m-tiles is posed.
For the general mUltivariate
(but not necessarily normal) case, no solution has yet been
attempted.
For the interquartile ranges of a bivariate,
confidence bounds, of a sort, have been found; but the
exact values of the confidence coefficient and the exact
shape of the confidence region have not been determined
except for the simplest possible type of relation between
the two variates.
xii
It is left for a future investigation, perhaps for
another investigator, to obtain these details and bring
the confidence statements about m-tile differences for
multivariate models to the same degree of explicitness
herein achieved for univariate models.
Since these m-tiles
.
for a multivariate model are defined in a sequential manner,
it seems reasonable to suppose that the step-down procedure
of J. Roy [12] might be adapted to this problem.
Whether
this particular approach is successful, some distribution
other than that of the characteristic roots might be found
which would put confidence bounds on simpler functions of
the m-tile differences.
The present work also leaves unattempted the univariate problem of comparing the confidence statements on,
say, the interquartile range of a random effect as obtained
by the methods presented here for an unspecified type of
distribution with corresponding confidence statements when
the type of distribution is specified and the known relation of its interquartile range to its variance is utilized.
Presumably smaller confidence regions would be obtained
when more specifications are assumed in the model, but a
comparison for several types of distribution might give
some indication of the usefulness of having a method which
does not depend upon a specified distribution for the
random effects.
Appended to this report is a brief list of recent
articles and books.
It is by no means an exhaustive
xiii
bibliography of the many topics in statistics and mathe-
..
matics presupposed by the theory of univariate and multivariate components of variance analysis.
For relevant
material published or otherwise made public before 1955,
the report by R. Gnanadesikan previously referred to [6]
has a rather complete bibliography.
Hence the bibliography
attached hereto includes only those works bearing directly
on the problems considered here and the particular approach
made toward their solution in the present study.
Thus, for
example, we do not list either the recent articles on
sampling from a finite population by J. W. Tukey using
"polykays" or the early (1937-38) attempts by B.L. Welch
and E.J.G. Pitman to use the F test even when the test
statistic was not a ratio of chi square variates.
A much
more questionable omission is the report by W.A. Thompson
in the mimeo series of the Institute of Statistics of the
University of North Carolina. but the rigid application of
the above criterion would exclude it.
Although seemingly
allied to the general problem of the present study.
Thompson's work is primarily concerned with determining
suitable designs for a particular approach to components
of variance analysis due to Walde
Moreover. his only model
was univari.ate "with errors arising from two sources"--i.e.,
one set of fixed effects and two normal random components.
NOTATION
•
N.1
Alphabetical abbreviations of words and phrases.
a.e.
almost everywhere
ANQVA
analysis of variance
BIB
balanced incomplete block
cumulative distribution function
ch( )
characteristic roots of
chmax ( )
the largest characteristic root of
chmin ( )
the smallest characteristic root of
col.
column
cols.
columns
const.
an unspecified constant
cov( )
the covariance of
Defn.
definition
d.f.
degree(s) of freedom
~( )
the expectation of
exp( )
the exponential function of
'H:
the hypothesis (that)
inf ( )
the infimum of
MANOVA
multivariate analysis of variance
max(m,n)
the larger of the integers m and n
min(m,n)
the smaller of the integers m and n
NSC
necessary and sufficient condition(s)
xiv
xv
PBIB
partially balanced incomplete block
p.d.
positive defihite
p.d.f.
probability density function
Pre )
probability that
Prop.
property
p.s.d.
positive semidefinite
sup( )
the supremum of
sym.
symmetric
tr( )
the trace of
var( )
the variance of
N.2
a(m),
Typographical conventions and special symbols.
~(m)
different matrices with m rows and m cols.
a(mxn), B(mxn)
different matrices with m rows and n cols.
lal
the determinant of a
a- 1
the inverse of h
h'(n x m)
the transpose of a(m x n)
~(m),
Q(m) different col. vectors each with m components
the transpose of
~(m)
the i-th col. of a(m) or hem x n)
the i-th component of
~
a ..
the element in the i-th row and j-th col. of
I(m)
the m-th order identity matrix
I(m)
the
m x m matrix with all elements unity
I(m x n)
the
mx n
i(m)
the col. vector with all m components unity
1J
matrix with all elements unity
a
xvi
Q(m x
n)
the
mx n
matrix with all elements zero
,g,(m)
11 (m),
la(m)
.a(m x n),
~(m x
[ Q,(p x n),
Q(p x
8,
or
[
no.
eols.
!!J
~,
12
n
q
no.
rows
m
p
denotes an m + p x n + q matrix partitioned
into four submatriees.
CHAPTER I
MATHEMATICAL PRELIMINARIES
1.1
Properties of direct sums and direct products.
A(mX n),
De f n • 1. 1 • 1 :
ACm x n) .;. B( P x q) ::
[ Q(p x n),
-
Q(mx q) ]
.!l(p x q)
and is called the direct sum of A and
Defn. 1.1.2:
e
b 11 8(mxn) •
b 1 28(mxn).
b 21 A(mxn) ,
b 22A(mxn) ,
A(m x n) • x .§(p x q) ::
•
••
b p1 A(mxn) ,
•
"'•
b
p2
A(mxn) ,
·.. ,
·.. ,
·.. ,
§.
b 1 q6(mxn)
b 2 q8-(mxn)
•
••
b p q6(rnxn)
and is called the left direct product or Kronecker product
of A and
~.
Prop. 1.1.1:
Prop. 1.1.2:
Prop. 1.1.3:
[A(rnx n) + .§(px q)][~(nx r) +.Q(qx 5)]=
e&. +lill
[A(rnxn) 'xla(pxq)][C(nxr)'x D(qxs)] =1:&. ·x!ill. •
1
2
Prop. 1.1.4:
•
•
L~(mxn)+.§(px q)]+[~(mxn)+ D(pxq)]
= [a +
but [a(mx n) ·x
Prnp.1.1.5:
Q.] .;. [!i + Q]
~(px
q)] + [Q(mx n) ·x Q(p x q)]
I- [8.
+ ~] • x [ia + 0] •
[A(mxn)·x ia(pxq)] + [~(mxn) ·x ia(pxq)]
= [A
+ C] ·x ia
[a(mx n) ·x .§(px q)] + L~(mx n) ·x ~(px q)]
= A • x [ia + ~] •
Prop. 1. 1 .6:
A.x [B, QJ [A.X ~,
0,
.',g
=
A·x 12,
B,
but
-
[
o
-'
·x A,
8. ·x ~J
a ·x ,g
AJ .
~
·x
oS ·x ii
1.2 Properties of h(m 2 ). H(m-1 x m), ~ ~(m x
m=1).
In addition to the identity matrix I(m) and the more
-
or less familiar j(m) and J(m xn) consisting of unity for
-
every element, certain other matrices with special properties
have been found useful and will appear repeatedly in the
subsequent chapters.
Special symbols are given here and
used consistently to denote these matrices.
Their usefulness
rests upon the peculiar properties which are listed below.
No proofs are given, because the properties cited are directly
verifiable though not always obvious without verification •
.Q.rl.n..1.2.1:
h'(4):: (1,0.0,1); h'(9) :: (1,0,0,0,1,0,0,0,1);
. ,t·
3
and, in general, !lema) denotes the column vect~r wh~se components are all zero except for unity in the [i(m+ 1) + 1 ]-th
pesiti~n
fer
i
= O,1, ••• ,m-1
¥2
-1
..1.
~
~
-1
v'6
••
•
••
•
..1.
••
•
_=='==
1
1
.v' (m-2)
(m-fY , ¥'(m-2) (m-1 ) , ¥"(m-2) (m-1 ) ,
1
.vI(m-1 5m
1
,
,
if'(m-1)m
1
oV"(m-1 )m
0
,
•••
0
0
0
,
•••
0
0
••
••
•
••
•
•
-(m-2)
1
• • • .¥'(m-2) (m-1 ) ,
if'(m-2) (m-1 ) ,
1
if'(m-1 )m
•••
[I(m)
~l (m)
Defn. 1 .2.3:
~(m+1
Prop. 1 .2.4:
!It(m 2 )hCm 2 ) = m ;
Prop. 1.2.5:
-h(m )h-t Cm
2
x m) ::
2)
1
¥' (m-1 )m
J
0
-(m-1 )
.v(m-1 )m
•
-h t (m )j(m
2
2)
= .!l(m 2 ) ex.!lt(m 2 ) •
= m •
•
.~:
4
frop. 1.2.6:
[A(m x n) 'x l(n)]h(n 2 ) = ,2,(mn), where the m
elements in the j-th col. of a become respectively the
(jm -m+ 1 )-th through jm-th components
~.
1.2.7:
~f
,2, for j= 1,2, ••• ,n.
Li(m)'x h t (n 2 )][,2,(mn) 'x len)] = a(mx n) ,
where the (jm - m+ 1) -th through jm-th components of A become
respectively the elements of the j-th col. of b for
j
..
=1
t
2, ••• ,n •
~(m
frop. 1.2.8:
!i(m-1 x m).i(m) =
Prop. 1.2.9:
.l:i<m:Txm>!::!'(mxm-1)
- 1) •
= 1(m-1)
•
J:>rop. 1 .2.10: !i' (m x m:r)!i(m:f x m) = l(m) - ~ .J:(m) •
jj(m-r x m)
Prop. 1.2.11:
-1 j'(m)
,;m-
is orth9gonal.
Prop. 1.2.12: j'(m + 1)K(m+1xm) = j'(m) •
-
-
-
Prop. 1.2.13: K,'(mxm+1)K(m+1xm)
= l(m)
•
~. 1.2.14: Postmultiplying A(mxn) by K(nxn:r) removes
the n-th eel. of A.
Premultiplying A(m x n) by ~,(rn-r x m)
removes the m-th row of
8.
Prop. 1.2.15: Postmultiplying A(mxn) by K,t(nxn+1) adjoins
.Q.(m) as an (n+1) -th col.
Premultiplying A(m x n) by
K,(m+1 x m) adj~ins .2.' (n) as an (m+1 )-th row.
Prop. 1 .2.16: Postmultiplying A(m x n) by K,(n x n::r)~' (0-1 x n)
replaces the n-th col. of
8
with .2.(m).
Premultiplying a(mx n)
by ~(m x m:1)!$.' (m:T x m) replaces the m-th row of 8 with
.Q.'
(n) •
Prop. 1 .2.17; .!S:.' (m-1x m>!j(m
X m+1 )~(m+f
x m) = .tl(m-1 x m) •
- (m:f x m)H(m
- x m+1 )K(m+1
- x m)K'
- (m x m+1 .....)H' (m+1 x m)K(mxm-1)
-
Hence K'
= I(m-1)
•
Pro.n. 1.2.18: The n-th col. of anyA(mxn) may be expressed
-
-
-
-
as [A(m x n) - A(m x n)K(n x n:T)Kt(n-1 x n)]j(n) •
-
Hence any
A(m x n) may be written as [AK, (A - AKK' )j,) •
Prop. 1.2.19: bIl'
A(m x n)..1( n)
= A[KK'
= .1 (m) ,
bA t
+ (1 - ~t)~H1 -~')]A I •
= ~ IAI
~. 1 .2.20: H(m-1 x m)~(m x
Hence
HK[l + J]K I HI
..
=1
+ J - p.KJ -
•
k I(m-1 )][li~]-' = 1(m-1)
HK{~'[I(m) - ~ I(m)]K}-'K'H'
l':rop. 1.2.22: If A(mx n)j(n)
=A(mx
1.3
= .Q.(m),
= l(m-1)
•
= A[K,
-~J.J
then A
and
n)K(n x n:f)[K' (n,;,,1 x n)12.(n) - b 1.(n-1)] •
n
-
Prop. 1.2.23: If A(m x n)j (n)
-
lK IAt + AKJK tA' •
m:1) [1(m-1 ), -1 (m-1 )] = .tl(m-1 x m) •
Prop. 1.2.21: [K'H' ]-' [1(m-1) -
A(mx n)b(n)
And if
= Q(m),
then 8!:!'.tl
Inverses of partitioned and patterned
=A
•
matr~~.
For a statistical model with k factors, the symmetric,
nonsingular matrix M~MI may be "naturally" and conveniently
partitioned into k 2 submatrices.
Moreover, the particular
design imparts a distinctive pattern to the structure matrix
M and to MiMI.
Hence we consider in this section some of
the types of matrices whose inverses will be needed in later
chapters.
6
Lemma 1.;j .1': If .o(m) and !len) are both nonsingular.
~(m)
,
Proof:
This relation may be verified
~(nxm).
both members by
[a,
~J.
by multiplying
It may also be derived by solving
D
g"
four simultaneous matrix equations.
The lemma is stated as
an exercise in [1].
Corollary 1.3.2:
Lemma 1.3.3:
provided
Proof:
If
a and
~
are symmetric and nonsingular,
[al(m) + bI(m)] -1
bm + a
I
= [~l(m)
- a (b~ + a) I(m)]
0 •
This relation may be verified by multiplication or
derived by solving two simUltaneous (scalar) equations.
Lemma 1.3.4:
[ 0 (m -1 x
Proof:
Ltl(m-1 x m)lS,(m x m:f) ]-1
=lS,' (m:1' x m)J:p (m x m:f)
+
m:2),. m-1
j (m -1 )] •
.v'"m(m-1) Since !:!(m-1 xm)lS,(mx m:T) is, by Prop. 1.2.11, a sub-
matrix of an orthogonal matrix, the transpose of this latter
7
matrix is a nonsingular matrix to which Lemma. 1.3.1 may be
applied.
Identification of correspondin9 submatrices in the
first orth090nal matrix and the inverse of its transpose
completes the proof.
~emma
1.3.5:
A matrix which can be partitioned into (m+1)2
submatrices with the followin9 pattern,
no. of
rows
...1..
--1-
1
90 -
--L..
J
9091 -'
e
1
9'9
J,
02-
J
9091 -'
--1- J,
9092 -
...1..
9 09m -
90
J
9192 -' • • •
--L :l
91 - 1
--1-
92 - 1
..1- 1 ,
J
9192 -'
·••
••
1
1
90
91 -1
92 -
91 9m
•••
J
929m -
·••
•
••
•
J-
1 J.,
•••
929m
99
1., 9"9 I,
1 m
o m
no. of
eols.
-l- J
, --L
1
91 -
--L
•••
I
9m -
92- 1
9m -1
9 m-1
has an inverse with the followin9 pattern,
901+ (91+92+".+9 m- m)J,
-91 J ,
-92I
,
••
•
-9 mJ
,
-91I
91 (1 + .J),
Q
-92l.
, •••
,
-9 mJ
•••
Q
92 (1 + I), • • •
Q
0
••
•
•••
Q
Q
••
•
,
• •• 9m(1+ ~V
8
where the submatrices of this inverse have the same dimensions as the corresponding submatrices of the given matrix.
Proof:
The method of Roy and Sarhan [14] was used to derive
the inverse.
Verification is quite simple.
Corollary 1.3.6:
(m +
A matrix which can be partitioned into
1)2 submat"rices with the following pattern,
no. of
rows
J, • • •
t
J.,
tI,
J.
-
:1.,
tI,
I, • • •
:1.,
:1.,
tl, • • •
•
•
•
••
••
•
J.,
J.,
J.,
t
t-1
t-1
·
e
no. of
cols.
t-1
I
:1.
t-1
•
••
•••
t-1
tl
t-1
has an inverse with the following pattern,
1 1+ m(t-1) ;I,
t -
t2
1
-t:1. ,
1
1
- -t -J
- -t -J
1
,
•••
1
- -t -J
Q
,
•••
.Q
1
, tel
+ J.),
•••
Q.
•••
tel + J.)
,
1
tel
+ :1.),
J
t -
.Q
•
•
•
••
•
•
•
1
Q.
Q
·
t -J
,
For the particular case when m = 2, this result was obtained
by Roy and Sarhan [14].
9
1.4
Some lemmas on matrix factorization. completion.
stationary values o..J\4- characteri stic roots.
Lemma 1.4.1:
If M(n) is symmetric and p.d., then there
~
exists a nonsingular lower triangular matrix I such that
M= II' · I
~~
~
can be made unique by specifying the sign of
the elements in the principal diagonal.
This lemma is
(A.3.9) of [16].
Lemma 1.4.2:
If~,
(m x n) is orthonormal for m < n, then
there exists another orthonormal matrix 1a(n:IDxn) such that
-LL'a'l
[_ ~ is orthogonal.
of
~,
Lemma
•
~a
is called a (not unique) completion
This lemma is (A.1.?) of [16].
1 .4.3:
-
- -
If M(m x n) is of rank r < m < n and such that
the first r rows of
Mcan
be a basis for
M,
then there
exists a nonsingular triangular matrix I(r) and an orthonormal matrix I.(r x n) such that
M(mxn) =
i(r)
[.
J
I(M x rl
~(r x n)
•
This lemma is (A.3.11) of [16].
Lemma
1 .4.4:
If M(m) is sym. and 2(m) is nonnull, the sta-
tionary values of ~tM9/2'g (under variation of g) are ch(M) •
This is (A.2.1) of [16].
Lemma 1.4.5:
The stationary values of
(2f~)2,
under
10
~ and~,
variation of the unit vectors
is any mx n matrix.
~emma
1.4.6:
are ch(MM'), where M
This is Lemma 1.2e of [6].
Ii M1 is sym., M2 is sym. p.d., and d is non-
null, the stationary values of
of g) are ch(M1M;1).
Lemma 1.4.7:
~'M1g/~~g
(under variation
This is (A.2.2) of [16].
If M2(m) is sym. p.d. and M1 is sym. at least
91 ~ g,'M1s!/g'M2g, ~ 92 for all nonnull
gem) < ~ 91 ~ Chmin (M1Mi 1) ~ ch max (M1M;1) ~ 92. In
slightly different notation this is (A.2.6) of [16].
p.s.d., then
Lemma 1.4.8:
then
91
S
If
M,
and M2 are each sym. and at least p.s.d.
chmin (M,M2') ~ chmax (M,M2') ~ 92
i
{ g;chmin(M,) ~
i
(
chi
min (M2) ~ 9T chmin(M,) and g; chmax M,)
~ chmax (M2) ~ ~ chmax (M1)J
~
ch max (M2)
~
1
~ ~
chmax(M,) •
~ ~ chmin(M,) ~ ch min (M2)
Although not stated there as
an explicit lemma, these implications are shown to hold on
pp. 6-7 of [6].
1.5
Miscellaneous lemmas.
bemma 1.5.1:
If
Cj
=0
-:-.-.
where -C(m) is sym. and of rank m-1,
then
Proof:
The equivalence of (1.5.1a) and (1.5.1b) is obvious.
Hence we shall proceed to prove the latter.
Since
~
is sym.
11
..
and Cj = Q , then
~
may be partitioned as follows:
J
'~K~K'
KICK
(1.5.1c)
-K'CKj
~ = [ --J' '~CK'
- --'
1- -1
•
This is merely an extension of Prop. 1.2.22 for sym. matrices.
And from Lemma 1.3.4 we get
~(l:!!$.)-1ti = ~I!i'ti+ lS.[g(m::'fxm-2),· m-1
)ti
.y'm(m-1 ) 1
= KK t [ I (m -1 ) -
-
1m-J] + -mK[ 1 J (m-1 ), _ m-1
m
fJ (m-n. ~ _lfI(m-n,
=
~I
=
~-
m~I
oj
(m-1)' -lJ
Q'
J']
-
~ + 1!J(m-1 ).
~ m~I
.
0
Substituting this and (1.5.1c) into the left member of
(1 .5.1 b), we get
~ b~'CK
[ ~'
,
o I -:J" K ' CK
:.J - - - '
-l
.·
~~~~:~J[~: -~
which
by (1.5.1c) is C •
m
L (~)[ L (_1)i-
Lemma 1.5.2:
i=1
Proof:
When
above becomes
i
j
= m,
j
(;)jS]
j=1
the coefficient of jS on the left side
(:)(_1)°(:), which is 1.
coefficient of jS is
•
For
j
< m, the
12
m
.
L(T)
m
(-1 ) i - j
(3) = L(j)
i=j
m-j
(-1 )i '" j
i=j
~
• • •
r
!..J
~
m
x. y. z. .. (
111
i=1
i-1
rL1
x,xj(Y3.
r
~
m
(L
1 1
i=1
x. Z. ) /
1 1
i=1
L Xi
x. =
1
i=1
m
- y.)(z. - Zj)/
J
1
~
L
•
The above equality is obviously equivalent to
m
m
m
m i-1
L XiYiZi)= (L
i=1
(Z. - z.).
1
J
m
m
i=1
Xi )(
i=1
rL
x. y. ) (
i=2 j=1
Proof:
m
XiYi)(
i=1
L XiZi)+ L L XiXj(Yi -Yj)
i=1
The right member of this may be written as
m
m i-1
L
m
m
(~x.y.)( ~ x.z.)+ r
L 1 3. !..J 3. 1
!..J
i=1
i=1
or
m
m
or
m
(1 -b . . }x.x.y.z." r
r (1-b 1J
.. ) x
1J
3. J J J
L!..J
i=1 j=1
i=1 j=1
m
X.X.y.Z.
1 J 1 J
x
i=2 j=1
(LXiYi)(LXiZi)+
LXiXj(YiZi+YjZj-YiZj-YjZi)
i=1
i=1
i=2 j=1
m
•
k=O
Lemma 1 5 3- m
m
(T: 3)= (j) L (-1) k ( mkj ) = 0
m
r
L
L XiYi L XjZ j +
i=1
j=1
m
L:
i=1
m
m
Xi
L:
j=1
xjYjZj"
L xjYjZj +
j=1
t3
In
i-1
I I
Lemma 1.5.5:
1-1
i=2 j=1
m-1
j
I (I
j=1 k=1
Pr~of:
I
k= j+1
=
k:::lj
m-1 i-1
m
xk )(
Zzk)2
XiX j (
j
xk ) Z + 2
m
j
I I (I
I
Xk)(
i=2 j=1 k=1
Xk)ZiZj •
k=i+1
The (~) terms of the double sum in the left member
above may be arranged as elements of a triangular matrix
X2X' Z,2 ,
e
0
0
Xs X, (z, + Z2) 2,
XSX2 Z22
X4r X, (Z, + Z2 + Za ) 2,
X4 X2 (z 2 + za) 2,
•
•
•
I
k=1
•
•
i X2(
X
XmXt(
LZk)2,
i-1
I
zk)2
k=1
Xi xa (
•
•
•
k=2
Zk) 2,
• • •
zk)2,
• • •
m-1
m-1
Lzk)2,
I
k=3
k=2
•
XmX2(
• • •
•
•
m-1
,
•
•
•
• • •
2
X4 Xa Za
i-1
zk)2
• • •
0
•
•
•
i;;,.1
XiX, (
,
,
XmXa(
I
k=3
,
14
given j, 1
< j 5. m-1, zj occurs in mj -
j2
terms, namely
those arrayed in the rectangular submatrix consisting of
the first j cols. and the j-th through (m-1)-th rows of the
above matrix.
The sum of the coefficients in the first j
j
cols. of the i-th row is
m-1
of
zj
is
i= j
k=1
Lxk •
.k=1
j
Lxi +1 Lxk
xi + 1
or
(t
k=1
xk )(
Hence the coefficient
m
L xk )·
Also from the
k=j+1
same triangular display it is apparent that for a given i
and j, 1 5. j < i 5. m-1, the factor 2z i Zj occurs in mi - ij
terms, namely those arrayed in the rectangular submatrix
consisting of the first j cols. and the i-th through the
15
(m-1)-th rows of the triangular matrix.
m-1
the coefficients of 2z i z j is
L xh+1 L xk
or
h=i
k=1
The equality of the two members is thus
m
j
Hence the sum of
j
( Lxk )( L xk )·
k=1
k=i+1
demonstrated.
m i-1
Corollary 1.5.6:
i-1
L L xix j ( L Zk)a
is a quadratic form in
i=2 j=1
k=j
z"
Za, ••• , zm_1 whose matrix has as its element in the
.t.
m
i-th row and j-th col.
xk )(
xk ) where .t.
min(i,j)
k=1
k=n+1
and n
max(i,j).
(L
L
=
=
e
Lemma 1.5.7:
m
X1 (
xk )
k=2
m
x,(
xk )
k=3
L
L
The determinant
m
x,( [x k )
k=3
m
2
( [ x k )(
xk )
k=3
k=1
L
·m••
X1 (
L
k=i+1
2
xk ),
·••
L
k=m-1
x,x m
.~ xk ) (
k=1
• ••
k=i+1
xk )
L
k=i+1
2
m
(L xk ) ( L
i
xk ),
2
xk ) ,
•
•
• ••
m
(L xk ) ( L
k=i+1
k=1
i
m
k=m-1
(L xk )xm
k=1
•
••
xk ),
xk ),
•
(L xk ) ( L
k=1
2
k=i+1
k=1
m
L
x, (
•
m
x, (
(
•
••
·..
m
xk ),
••
m
·.. (L xk ) ( L
k=1
i
•••
k=m-1
(Lxk)xm
k=1
xk ) ,
•••
i6
•••
X,X
m
2
•••
(L xk)x m
k=1
•
•
• ••
•
·..
•••
Proof:
The given determinant is obtained from a sym. matrix
whose element in the i-th row and j-th col., for
-r..
m
i,j
1,2, ••• ,(m-1), is
xk )(
xk ) where -r..
k=1
k=n+1
(L
=
and
n;: max(i,j).
Adding (
L
I
Xk)/x,
= min(i,j)
times the first
k=j+1
column to the j-th col. of this matrix (provided x,
I
0)
m
makes (
Lxk )
a common factor of all elements in the j-th
k=1
m
\'
m-2
col. for j = 2,3, ••• ,(m-1). Thus (L xk )
is a factor of
k==1
the original determinant, and the remaining factor is the
17
(m-1)-th order determinant obtained from a matrix whose
first col. is the same as the
f~rmer
matrix and whose ele-
ment in the i-th row and j-th col., for
i = 1,2, ••• ,(m-1),
m
and j = 2,3, ••• ,(m-1), is
L xk
; where
n
= max(i,j)
•
k=n+1
Now subtracting the second row from the first row, then subtracting the third row from the second, ••• , and finally
subtracting the (m-1)-th row from the (m-2)-th row leaves
only zeros above the principal diagonal.
this diagonal are
j-th col. for
The elements in
in the first col. and x j + 1 in the
j = 2,3, ••• ,(m-1). Hence the value of this
X1X2
m
TT
latter determinant is
for
X1
I
x • The lemma is thus proved
k=1 k
0, and it is trivially true when X1 = o.
Corollary 1.5.8:
The above determinant vanishes if and only
m
if either
L xk = 0
or
k=1
Coroll~y
1.5.9:
xk = 0
for at least one k •
The principal minor obtained from the first
h rows of the matrix from which the above determinant was
h+1
obtained has the value (
TT
k=1
Proof:
xk )(
Lm xk ) h-1 •
k=1
The same steps taken in the proof of Lemma 1.5.7 can
be taken on the submatrix consisting of just the first h rows
and the first h cols.
Lemma 1.5.10:
The real sym. (m-1)-th order matrix whose
,f.,
element in the i-th row and j-th col. is (
m
L xk )( L
k=1
k=n+1
xk )
18
= min(i,j)
t
where
and n - max{i,j), is p.d. if
xk > 0
for k ; 1,2, ••• ,m
Proof:
By Lemmas 1.5.7 and 1.5.9, the determinant of the
entire matrix and each principal minor are positive if each
xk ' >
o.
Then by Gundelfinger's rule (cf. [3J) it follows
that each characteristic root of the given matrix is
positive.
m
1&mma 1.5.11:
Lxk > 0 and
If
no
xk
< 0,
the real sym.
k;1
(m-1)-th order matrix whose element in the i-th row and j-th
t
m
col. is
xk ), where .t :: min(i,j) and n :: max(i,j),
xk ) (
k;1
k;n+1
is at least p.s.d. with its vacuity equal to the number of
(L
L
values of k for xk ;
o.
m
Proof:
say
X1
Since
> O.
L k > 0,
x
xk > 0 for at least one value of k,
k=1
Then the steps taken in proving Lemma 1.5.7 are
elementary transformations on the matrix of the present
lemma.
These transformations put into triangular form suc-
cessively larger submatrices in a sequence of nested 5ubmatrices consisting of the first j rows and first j cols.
If
xk ; 0 for v distinct values of k, 1
~
v
~
m-1, then
other elementary transformations could reduce the entire
matrix to diagonal form with zeros in the last v rows.
Then
the rank of the original matrix is clearly m - v - 1, and for
j
= 1,2, ••• ,
m- v - 1
in this rearranged numbering
19
j+1
(lJ
k-1
m
~
xk )( i..J xk )
. 1
J-
>
0 ,
Hence by Lemma 1.5.9 and Gundel-
k=1
finger's rule there are
m- v - 1 posi ti ve characteristic
roots of this and of the original matrix.
CHAPTER II
UNIVARIATE MODEL:
2.1
DESIGNS AND STATISTICS
General two-factor models.
We start with a general two-factor model in which
each observation is assumed to be the sum of three terms,
the first two corresponding to the two factors or criteria
by which the experimental unit is classified and the third
term being of the nature of an error.
Included in this
model is the postulate that the n errors, in a set of n
observations, are independently and identically distributed
normal deviates, each with distribution denoted by N(o,a 2 ).
We shall suppose that there are t categories in the first
classification and s in the second.
of the model we denote by ai' i
j
= 1,2, ••• ,s.
The terms or elements
= 1,2, ••• ,t,
and b j ,
For convenience we shall refer to these a's
and b's as "treatment effects" and "block effects," respectively, but this designation should not limit the application or prejudice the interpretation of the model.
One
further point is important: the a's and b's may be fixed
constants or random variates having their own (not necessarily normal) distributions, but even in the latter case
the errors are still N(o,a 2 ) and independent of other
variates.
20
21
In any planned experiment the n observations will
come from experimental units located in the different
categories of the two factors in accordance with some
scheme determined (in part) by the experimental design
chosen.
If the observations, treatment effects, block
effects, and errors are respectively ordered and written
as column vectors--l(n), £(t), b(s),
~(n)--then
the two-
factor model described above can be represented by
1 (n ) = M(n x s+t) [£(t
(2.1.1)
~
12.( s ~
where
M(n x S'+'t)
= [Mo. (n x t),
+ .§. (n ) ,
M1 (n x s) J .
The fact that M is partitioned into two submatrices and the
,',
f act that the rank (M)
~
s+ t - 1
result from the basic
assumptions of the model as stated above.
Hence
Mmay
with
some justification be called the "model matrix" for the
observations
y.
On the other hand the actual elements of
M
are not determined until a specific experimental design has
been selected and each experimental unit uniquely classified
according to a "field plan" consistent with this design.
Hence
Mhas
more frequently been called the "design matrix. II
Throughout this investigation the general pattern or structure of
Mwill
usually be known, whereas the design will
usually not be specified.
For these reasons we shall call
til by the less specific name "structure matrix."
Since an experimental unit will belong to one and
only one category of each classification, each row of each
22
submatrix, Mi' i
= 1,2,
will consist of zeros except for
unity in some one col.
Thus for a two-factor model, the
structure matrix will always be such that
O(t]
M(n x -s+ t ) oJ.
[-i(s)
= 2.( n )
In what follows we shall consider this to be the only independent linear relation among the cols. of
~
useful design will have
n
that rank (M) = s + t - 1
and any
consti tute a basis for M.
M.
Since any
s + t, it follows from (2.1 .3)
s + t - 1 cols. of M would
We agree to use the first s + t - 1
cols. and to denote this basis by MI (n x s+ t - 1).
s
MI (n x s + t - 1) = M(n x "5+t) ~ (s+t x + t - 1 )
= [Mo (n x t), M1 (n x s),K( s x s::-f)]
(2.1.4)
using the notation of (1.2.3).
( 2 •1 •5 )
where
2.2
Thus
M(n x S+t)
Then it follows that
= M1 (n x s+ t -1 )[ 1( s+ t -1 ),
i ( s+ t -1 ) ]
i' (s + t - 1) = Ci' (t), -1' (s - 1 ) ] •
Derivation of a-related b-free statistics.
Whether the ultimate purpose is estimation, testing
hypotheses, or confidence bounds, and whether the treatment
effects are regarded as parameters or variates, statistical
inference about those treatment effects will require a set
of statistics whose distribution depends upon A(t) and not
upon !2-(s).
From the basis MI{n x s+t-1) we form the
nonsingular sym. matrix
2'3
MoMo
'
no. of
cols.
Then
(2 • 2 • 2)
is a set of
[
,
MoM1!S.
t
KtM~Mo,
no. of
rows
t
J
~'M~M1l$.
s-1
s-1
t
t
]-1 ,
[ MIMI
MIl
s + t - 1 statistics defined in terms of the
observations yen) but whose relation to A(t) and Q(s) can
be found by substituting (2.1.1) and (2.1.5) into (2.2.2).
Thus
(2.2.3)
(t),
= - _
~
=
~
-
Q(s-1xt),1(s-1)
A(t) + bsJ.(t)
~I
(
J[s.j LMIM, I ] - x
O(tx s-1),j(t)
-
-s -1 x s) b ( s)
1
.+
,-1(s-1)!2.
J
- b s1( s -1 )
+
t
[MIMI]
t
MI §.
-1
t
MI~·
From this final form it is apparent that any £ontrast (linear
combination the sum of whose coefficients is zero) among the
first t rows of (2.2.2) will be a statistic entirely unaffected by Q.
There are at most t -1 linearly independent
contrasts possible.
How many and which contrasts are needed
will depend upon the purpose of the experiment.
But one set
of t - 1 linearly independent contrasts among the first t
rows of (2.2.2) is obtained by multiplying (2.2.2) by the
matrix
(2.2.4)
[J:f{t='f x t), Q( t-1 x s:f)] •
24
Sometimes it is desirable to use a set of statistics which
is derived from the observations y by an orthonormal transformation.
From Lemma 1.4.1 we know we can find a lower
triangular matrix To (t - 1) such that
(
2.2.5 )
""
"'"
1010=
Then using this
[ -
-
-
I
H(t-1 x t) ,Q.(t-1 x s-1 )[MrMr ]
10
-1
-!:!'(txt-1)J
__
Q(s-1 x t-1)
we define a set of statistics
The matrix of this transformation is orthonormal, since
(2.2.7)
fl01[H,Q](MiMr]-1MiJrIo1[!:!,Q][MiMr]-1MiJ'
=
To' [J;!,Q] [j!\iMr r'[ :] 1~-'
..... -1 .......... ' ..... '-1
= 10
ToToT o
=1·
The components of y(t-1) are a-related and b-free, since
y. =
10' (t-1 )!:!<t-1 x t )A( t) + 10 1 (t-1 ) [!:!(t=1' x t),
-) ] [ '
]
Q ( -t-1 x s-1M
rMr
-1
,
Mr§...
25
Thus yt y is a scalar statistic expressible as a quadratic
form in the observations
y
without actually solving for
For convenience we shall denote the
n xn
To.
matrix of this
quadratic form by the single symbol Qo •
2.3 Derivation of b-related a-free statistics.
From (2.2.3) it is apparent that each of the last
s-1 components of (2.2.2) is a statistic whose distribution
depends upon £ but is entirely unaffected by A.
components of Q
we~e
If the
fixed parameters, these s-1 statistics
would be unbiased estimates of
=
bj - b s ' j
1,2, ••• ,s-1.
Other contrasts among the b components could also be estimated by appropriate linear combinations of these.
But if the bls are variates, these individual contrasts would have little relevance.
Hence we need a sym-
metric function involving all of the bls, such as the
corrected sum of squares of the bls.
Knowing that QIHltlQ
would be just this sum of squares, we make the transformation on y.. '
(2.3.1)
which gives us s-1 statistics whose distributions are free
of A.
If it is desirable to have a set of statistics ob-
tained from y by an orthonormal transformation, we first
form the symmetric nonsingular matrix of known constants,
Gertrude M. Cox
INSTITUTE OF S1'ATlSTlC8
pox 5457
,-rATE COLLE~ STATION
. ~AL€lGI-l.
N('r.'T'.f
"'A~"...tNA
-~- !:,!*~-
..
-
26
~ """"t
~
(
and using Lemma 1.4.1 factor it into 1111,
where 11
s-1
is a lower triangular matrix.
Then premultiplying (2.3.1)
..... -1
by 11
we obtain s-1 statistics which we call
which are related to band
-
( 2 •3 .3)
s-1
)
and
Thus
~ ( s -1) :: 11'1 [.Q( s -1 x t) ,1j( s -1 x s) JS.( s x 5-1'")] x
~(s-1)
= T,11j(s-1 x s)b(s-1)+
~'y
I
MIMI
]-1 ,
MIl:
.b1~(n) ,
From (2.3.3) we obtain the
as a quadratic form in l and then replacing
We denote the matrix of this quadratic form by
2.4
(
E
where L, (s-1 x n) is orthonormal.
scalar
~
- but are free of -a.
[
(2.3.4)
)
Q1
•
of a-free b-free statistics and the error
sum of sguare.§..
~rivation
Gi ven that oMI (n x s+t-1) is of rank s+t-1, using
Lemma 1.4.3 we can factor oMi into TL, where I(s+t-1) is
lower triangular and 1( s+t-1 x n) is orthonormal.
Then by
Lemma 1.4.2 we can find another (not unique) orthonormal
matrix .b*(n-s-t+1 x n) which "completes"
1. and makes [L
I
,L~]
27
an
nxn
orthogonal matrix.
Using thi s 1* we de£ ine a
third set of statistics as an orthonormal transformation on
(2.4.1 )
Using (2.1.1) and (2.1.5) we get from (2.4.1)
(2.4.2)
!\\ = h{M1[1(S+ t - 1l. i] [ : ]
=
l...ll'[J,(s+
t-1l, i]
+d
[:J+ ~ ·
But since 1~ = Q(n-s-t+1 x n), ~ is seen to be merely 1~
and hence is distributed as N(Q, a 2 I), unaffected by A or £
whether these latter are fixed or stochastic.
As with y and
~,
we form the sum of squares
w'w = y'L'L Y
---~*-'
which is such that
wi th
n- s -t+1
~t~a2
d. f •
has the chi square distribution
Since [,~, ,1:] is orthogonal,
Then
= l(n) - L'(T'I,-1)(I- 1T)L
= l(n)
- (TL)'(TLL'T,)-1
(IL)
-'
] -1 ,
= l(n) oMI [ MIMI
&11 •
=
Thus although 1* was not unique, 1~* is unique and ex,.,.
pressible in terms of MI without actually obtaining T, 1,
or 1*. We shall denote ~* by the single symbol Q. It
r:
28
is a
symmetri~
p.s.d. matrix of rank
where r*:: s + t - 1
~'n
= !.'~
n- s- t - 1
is the rank of M.
Since
or n - r *
~ = 1.~
,
is called the error sum of squares, and ~'~I 0'2
ha s the chi square di stribution with
n - r * d. f •
2.5 Certain simplifications of the p;eviouslydefined
statistics.
In addition to what are called optimum properties for
the specific use intended, there are certain other more
general properties desirable in all statistics--viz.,
(1)
freedom from nuisance parameters, (2) independence of
other statistics intended for joint use, (3) meaningfulness
with respect to the unobservables of the model, and (4) simplicity of the computations required.
Consideration of the
first two of these has already led to the definitions of
~,
~,
and
~
above.
Now we shall see to what extent these
definitions achieve the third and fourth properties also.
Substituting (2.2.3) into (2.3.5) gives
~IMI[MiMlr'G~liJ ([Q,!:Jl$.J(MiM1r' ~~li}
-,
x
[Q, tlK) [MiMI] -1 EdI !. +
Since the
invers~
of the matrix (2.3.2)
occurs in each of
29
the three terms of (2.5.1), we shall
possible simplification.
investi~ate
it for
First we substitute (2.2.1) and
then use Corollary 1 .3.2 for the inverse of a partitioned
matrix.
Thus
(2.5. 2)
{[.Q, ~][MI' M1 ] -1
~
OJ J
-1
-
~'!i'
.
= {!il)[K' Md:41 ~ -~' l~~l1 Mo x .
M'M ]-1M'M
K]-1 K
_0_1_
_'H,}-1
_
[ _0_0
= [~'tl'] -1 {~'MHd1 K -~' ftH Mo[M~Mo]
-1
M6M1~}[~J-1
= [~'l:!,]-1f'hMM1 - M~Mo[{~1~Mo]-1M61~hJ~[HK]-1 •
Hence the first term of the right-hand member of (2.5.1) may
be written as
But the model stipulates that
(
2. 5. 4 )
{
oM,' !~11 -
'
]
~11~ Mo [MeMo
-1
Mol
=i
and M11
= i.
Hence
,
J'1. = M1, J. - lllb,Mo [M'
M' .
MoM1
_oMo ] -1 -01
'-M"
M"M" - M'M
M'M
= _11,
_1 _0 [M'M
_0 _0 ]-1 _0
~1
- _11 - _11.
- Q.
Moreover, we know that
nonsingular.
•
!S.'{M~M,-M.jMoU:1.~Mo]-'M~r~l1JK is
Therefore (2.5.4) is the only independent
relation among the columns (or rows), and the rank of
{M~ M, - M1 Mo[ M~Mo] -1 MbM1
J
iss - 1 •
Thus the conditions of
Lemma 1.5.1 are satisfied, and (2.5.3) can thereby be
reduced to
' ]-1 MoM1
t
Jb_ •
t2.' {Mo1, M1 - M,'Mo[MoMo'
Moreover, the matrix
~'~
= -y'Q1Y
--
9"
which was defined by the relation
and which appears in the second term of the
30
right-hand memberof (2.5.1), can also be simplified to
(2.5.6)
M1 rt!liM1
r' ~}M;!!h - !!Ii Mo[I!Jc\Mo r' ~M,}[Q..Kl •
~
[
,
MIMI
] -1
1
MI·
Then using (2.2.1) and Corollary 1.3.2, we get
'
t
[ I ] -1
t ] -1 MoI J&11!S.{
( 2.5.7 ) 31 = {1- Mo [ MOMo
!S. t~hM1!S.
-!S. t ~l1Mo
MoMo
x
M~MdS} -1!S., M; {l- Mo[r~l~Mo] -1 MbJ
In this final form
g,
•
involves inverses of one t-th order and
one (s -1 )-th order matrix, whereas (2.3.4) involved inverses
of one (s + t - 1 ) -th order and one (s - 1 ) -th order matrix.
The above simplifications in form were implicit in
the original definition and have required no new assumptions
or restrictions.
But still further simplifications are pos-
sible for certain designs.
When b is a variate,
2,'[1- ~ I]Q. I(s - 1) is the customary mean square used to
estimate the variance of b.
Thus
1
12'[,1- S
I]Q. is certainly
a meaningful expression, and under certain circumstances to
be considered later it will be desirable to have (2.5.5)
proportional to it.
Thus we shall in the sequel refer to
the "restricted designs" and mean those for which
where b 2 is a constant depending upon the particular design
and size of the experiment.
It is not our purpose here to
determine just which designs do and which do not imply
(2.5.8).
But we shall next derive some sufficient
31
conditions and then show that many types of familiar, prac-
..
tical designs satisfy these conditions •
Suppose (1) every treatment is applied to r experimental units (in r distinct blocks).
M~Mo = r!(t);
Then
U:1oMo]-1 = ~ let)
Suppose (2) every block contains q experimental units (to
which q distinct treatments are
~pplied).
M'M ]-1
[ _1_1
(2.5.10)
Then
= 1q 1(s)
_
•
Then any design meeting these two specifications is such that
' [MoMo
' ] -1,
M
' - Mdlio
( 2.5.11 ) Mdth
MoM1 -_ q!. - r1[M'oM, ]" MO_1
But
,
MoM1
•
is the so-called incidence matrix, whose elements
are either one or zero according as a particular treatment
(associated with a particular row) has or has not been applied to some one experimental unit of a particular block
(associated with a particular column).
above implies that
MoM ,
Restriction (2)
has q ones in each column and
that each diagonal element of [MoM1]'M~M1 is q.
henc~
But neither
(1)
nor (2) nor both determines the nondiagonal elements,
say
)"r j
,
of [M~M1]tM~M1' which elements are the numbers of
treatments common to experimental units in the i-th and in
the j-th blocks.
But for (2.5.8) to hold when (2.5.11) is
*.. must be equal, to A* say.
given, it is obvious that all ).. 1J
' ]" .Molth may be written as (q -).. * )1. + )" *J.. ComHence [.MoM1
bining this with (2.5.11) and (2.5.8) gives
32
I"fI~l[(q-A.*)I+A.*J]=
b 2 [I-l
J]
'1.=.
r
_ $. __
which as an identity in 1 and
both A. * and b 2 ,
~
t
is sufficient to determine
Thus we obtain the third property, restric-
tion (3), of those designs henceforth to be referred to as
restricted designs:
A* = q(r - 1 )
. s- 1
For these restricted designs
b2 -
-
qs(r-1) _ sx.*
r(s-1)-r-
but we repeat that these three restrictions are sufficient
and may not be necessary for (2.5.8) to hold.
To show that our class of restricted designs is not
too restrictive to be useful, we shall now consider some
familiar, practical designs which do satisfy these three
conditions.
For randomized block designs
X. * = t, and b 2 = t.
q=t,r=s,
For symmetric BIB designs, q=r, s=t,
X. * (t-;1)=r(q-1), and b 2
- ir - 1 ) s
For PBIB designs
s -1 •
which are duals of BIB designs, the so-called linked block
(r-1)
designs of Youden [21], A. * (s - 1 ) = q(r - 1) and b 2 -- qsr(s-1)
•
The three restrictions considered above not only
imply (2.5.8) but also permit further simplification of 91:
in which form no inversion of matrices is indicated.
further consequence of these three restrictions is
obtaine~
by substituting (2.5.8) into (2.5.2) and then applying
Prop. 1.2.21:
A
33
.-
Hence (2.3.2) becomes b- 2 1 and
•
"'"
""'I
-1
11
= 11 = b 1
(2.5.17)
Thus for restricted designs (2.3.1) is already an orthonormal transformation except for a scalar multiplier b.
Of
course (2.5.17) should not be surprising since (2.5.8) is
equivalent to
2.6
§1mplificstion possible for randomized block statistics.
Because randomized block designs are so widely used
and because their simple structure permits greater mathematical simplicity in their statistics, we give this special
consideration to the statistics for randomized block design
experiments.
Certain simplifications, such as (2.5.9) and
(2.5.10), are shared by many designs.
But peculiar to
randomized block designs is the possibility of so ordering
the
n = st
(2.6.1)
observations that
Mo(n x t) = let) ·x 1(s) and M1 (n x s)
=j,(t)
·x l(s) •
Throughout the remainder of this section we shall assume
that the observations are so ordered and that the structure
matrix is given by (2.6.1).
Then we can obtain by a single
transformation the three vectors
y(t-1 )
~(s-1
)
~(st-s-t+1 )
-H(t-1 x t)
= t-ti
, (t)
·x S-tjl(S)
-
·x l:i<s-1 x s)
H(t-1 x t) ·x H(s:r x s)
y( st)
34
Replacing r by means of (2.1.1) and then substituting (2.6.1)
gives
(2.6.3)
11
st!:l(t-1 x t).s.(t)
y
= tttl(s:1 x s)2,(s) +
Q.(st-s-t+1 )
~
free of both.
~(st).
!:!(t-1 xt)·x H(s-1 x s)
Thus we see that y is free of b,
(2.6.4)
-H(t-1 x t) ·x s-tjf- (s)
t -t j'( t) •x H(s:r x 5)
-, is free of a, and
~
~
is
Moreover
-H(t=f x t)
•x s
-t j
- , ( s)
t-t j , (t) ·x H(s-1 x s)
-
-
-H(N x t)
·x s-t j , (s)' '
-H(t=f x t)
·x H(s-1 x s)
-
• t -t j , (t) .x H(s:1 x s) ;
-
H(N x t) 'x l:!(s-1 x 5)
-
-
l(t-1) ,O(t-1 x s:T) ,Q(t-1 x st-s-t+1)
=
O(s:1 x t:r), I (s-1 ) ,Q(s:f x st-s-t+1 )
Q( st-s-t+1
x t=r), O( st-s-t+1 x s-1 ),1( t-1 ) • x I (s-1 )
= 1( st - 1) •
Thus the entire transformation on
From
(2.6~2)
~
is orthonormal.
we obtain the three scalar statistics as
quadratic forms in y:
(2.6.5)
_
u t _u = y"
(2.6.6)
yf y = yl{~ J(t) 'x [1(s) -~ l(s)]Jr ;
(2.6.7)
yt'Xt = y'{[I(t) -
{I (t)
--
-
1t -J (t)]
t I(t)]
•x
1s J-(s)
Jv •
.Lo'
·x [1(s) - ~ I(s)]Jr •
And from (2.6.3) we find their expressions in terms of the
unobservables of the model.
35
(2.6.8)
y'Y.
::ts~'[l(t) - t~(t)]A+ §,'{Li(t) - tI(t)] ·x ~(S)J§, +
2~'!:!' (t
(2.6.9)
x t':1 )[!:!(t:T x t) ·x i' (s)]§, •
y.'y.= tll'[l(5) - ~J(5)].E+ §,,{tJ(t) ·x [1(s) - ~J(5)]J§, +
2l2.'!:!' (5
X
5-1 )[i' (t) ·x !:!(s:1" X 5)]§, •
Of course (2.6.5), (2.6.6), and (2.6.7) above look more
familiar when expressed in terms of Mo and M1 without
direct products:
(2.6.11 )
1
, 1- y'] (st) Y
-u'u
- = -s -y'McMoY
- - - -st
.... 1
' 1 y'J{st)y
-v'v=- t -y'M-M
__ 1_-1 _y--st - -
( .6.13
2)
1,
.1tl'~=Y'y.-s y.'MoMoY
1
,1, (
)
-t y,'M 1 M1 Y+'St y J st 1. •
2.7 General multifactor models.
Quite analogous to the general two factor models considered in §2.1, multifactor models are possible, in which
each observation is regarded as the sum of a normal error
term and three or more other terms which correspond to different criteria by which the experimental units are classified.
As in the two factor model, we shall suppose that there
are n observations arranged as components of the vector y(n).
The corresponding error terms are e(n), with the postulate
that
~
is
N(~,a21).
As for two factor models, we suppose
here that each observation includes one and only one of the
components of A(t), called treatment effects, according to
36
..
which one of the t treatments has been applied to the experimental unit on which that observation is made.
But here,
for m > 1, we suppose that each observation includes the sum
of one component from each of m vectors
Qm(sm) called block effects.
~1(S1)'
b a (s2), ••• ,
At the outset we do not specify
whether these block effects are to be regarded as constants
or as random variables; but if the latter, we do specify that
the several criteria are mutually independent and independent
of the error.
(Of course, the n observations are not mutu-
ally independent under these conditions.)
s=
If we let
S1
+ S2 + ••• + sm' then we can write,
just as (2.1.1),
=
M(n x -s+t)
~a(t)J
+ §,(n) ,
~(s)
where each row of the m+ 1 submatrices consists of all zero
elements except for a single unity.
The value of n and the
location of these unities depend upon the experimental design.
2.8 Multifactor grthogonal designs and their statistics.
In a model with two or more factors each regarded as
consisting of fixed effects, if the sums of squares due to
the hypotheses of equal fixed effects for each of the several
factors are all mutually independent, then the design is
37
said to be orthogonal.
If y'gol' ltg1y, ..• , y'9ml
sums of squares due to the hypotheses of equal
and if
r
is normal with the n error terms
are the
fix~d
effects
N(~,a21),
Gnanadesikan [6] has shown that gigj = Q, for i
I
NSC for the independence of these sums of squares.
j, is a
We do
not know of any NSC for orthogonality in terms of constants
of the designs themselves.
On the other hand, the familiar designs are readily
classified as orthogonal or nonorthogonal.
And those designs
known to be orthogonal are of two distinct kinds.
The first
kind requires no special relation between the numbers
t, S1, S2, ••• , sm' and orthogonality is achieved by having
one observation (or in some instances another fixed number
of observations) for every possible category in the multiple
classification.
Thus for this type of design n is equal to
(or possibly a multiple of) t·S1 ·S2 ••• sm.
Complete fac-
torial designs are familiar examples of this type.
The
randomized block design is also an example when m = 1.
The
second kind of orthogonal design depends upon specific numerical relations and consequent geometric configurations.
Familiar examples here are Latin Squares (m = 2, t = s, = S2)'
Graeco-Latin Square s (m = 3, t = S1 = S2 = sa), and HyperGraeco-Latin Squares (4.s. mit = S1
of these latter type n = t 2 •
=S2 =••• =sm).
For each
If we denote by r i the number of observations including as a term the treatment effect ai' we see that for both
~8
.
of the above types of orthogonal designs, r i is a constant r
(say) for i = 1 ,2, ••. ,to But for designs of the first kind
m
(2.8.1)
r
= 1T
i=1
s.;
n
l.
= rt
and for designs of the second kind
(2.8.2)
r
=t
n = rt •
;
With this distinction understood (and the relation of r left
ambiguous), it is possible to give one set of conditions
which formally characterizes both types of orthogonal designs:
t
.MoMo
M~M.
-l.
-
(2.8.3)
,
M.M.
-1-1
M!M.
-1-J
= r1.(t) ,
i = 1,2, ••• ,m,
= f:I(txs
i)
1.
I(s.) ,
i = 1,2, ••• ,m,
= tl
s i - l.
= s:~. I ( six s j ) , i ;' j = 1,2, ••• ,m.
1
J
It should be noted that conditions (2.8.3) have not
been proved necessary for orthogonal designs, and it may
well be that orthogonal designs will be discovered or devised
which do not satisfy (2.8.3).
On the other hand, the rela-
tions (2.8.3) are sufficient conditions for orthogonality
and may be satisfied by more designs than we now know.
In
the sequel, when we refer to orthogonal designs, we shall
mean any which satisfy (2.8.3).
Moreover, in our subsequent
study of multifactor models, we shall restrict ourselves to
those with orthogonal designs.
Since in all practical designs n
~
s + t , the rank
39
of M(n x S+t) will be
5 + t - m , and a basis for Mean be
•
Mr(n x s+t-m) = [Mo,M1 K , M2~' ""
(2.8.4)
Mm~] •
Using (2.8.3) we obtain the "pattern" of the nonsingular
matrix
r
57
1 ,
r
llr
51 -
r
-s
J,
1-
(2.8.5)
r
-5
1, ,
r
-5 J ,
2 -
,
J,
•
rt
ss
J,
1 m
no.
eols.
...
••
•
•
••
•
•
•
••
..LJ
5
m -'
J..J
sm -
•••
,
2 -
·••
•••
rt
55
1" ...
llr
s m
2 m
t
•••
r-
rt-'
51
S1[ 1 + J],
rt
- it
(2.8.6)
J,
52
- rt 1"
•
•
•
sm
- rt
no.
eols.
51
- -rt -J
t
J,
Q
52
- -rt -J
Q
,
...
J
•••
•
••
Q
Q
51 -1
S2- 1
m-1
no.
rows
t
Q
!a[r+ J] • • •
rt - - '
••
•
5
s m-1
And by Lemma 1 .3.5 we know its inverse will be
1r + s-m J
no.
rows
t
51 -1
Q
·••
••
•
,
s
• ••
r~[l + J]
·..
sm -1
5
-1
m
40
-
In order to express the structure matrix M explicitly
in terms of the basis (2.8.4), we need two other matrices
defined as follows:
2"
.Q., • • •
.Q.
0,
1, • • •
0
1
•
•
•
.Q.,
Q.,
••
•
Q.
••
1 ,
·••
-
0,
0,
2"
2"
, I , Q,
,
0' ,
0'
' 0'
Q , 0 , I ,
0
,
•••
Q
,
Q.,
_ - , ·.. -
0'
,
1,
0,
,
Q
,
-
,
0'
-0' ,
•
•
•
•
•
•
••
Q
0
.5,
0
-,
•
t
0'
-0' ,
•
Q
-
-0' , -0' ,
(2.8.8)
0
,
••
no.
cols.
•••
••• 0
0'
-:-'
e
S1
,
0
_
-
Q.
Q
Q ,
0
.Ii
.Q., • ••
,
I ,
(2.8.7)
.Q.
no.
rows
t
-1
0'
,52 -1
£,( s+t-m x m)
It is easily seen that
• • • 0'
·••
Q
,
_
0'
•••
J
• •• 0'
,
, sa -1 , • • • sm -1 , 1 ,
•••
1
1, • • •
1
i
.Q.,
·..
·..
no.
rows
t
.Q.
s, -1
-i,
20,
• ••
.Q.
S2- 1
2"
•
•
•
2.,
-1,
·..
.Q.
Sa
•
••
·••
Q.,
Q.,
2"
• ••
-1
1,
1,
1,
·..
1
i,
i.
j,
-i,
.Q.,
.Q.,
-
no.
cols.
• ••
·..
-1
1
S2 -1
•
5
m-1
1
-1
•••
·••
5
m-1
.E ( s+ t x 'S+'t) is an orthogonal matrix
which, when used as a postmultiplier of M(n x S+t), shifts
41
to the last m positions those columns of M not included in
Mr'
On the other hand, E(s+t-mxm) is such that, when used
as a postmultiplier of Mr (n x s+t-m), the product is just
those m columns of
M.
In symbols,
= MI (n x s+t-m)(l,( s+t-m) ,E (s+t-m x m)]B t
•
Now making this substitution in (2.7.1) and then premultiplying by [M~MI]-1Mi gives
(2.8.10)
[MiMI r ' Mix =
a.pl!l.{:J
+
[MiMI r ' Mi§. ·
But it follows from (2.8.7) and (2.8.8) that
(2.8.11) [1,f,]B.'
.s.
a + (b 1 + b
+ ••• + b
)j
2 52
51
ms m -
b1
~
!2.2
-
no.
rows
t
tb 1 - b15,j
s, -1
K't2.a - b 2s2 .1
52- 1
•
!2m
K'b
- --m.
- bms .1.
m
sm -1
where b.;.... 5. is the last (si-th) component of the vector b i •
~
Now recalling Prop. 1.2.8 and Prop. 1.2.22, we premultiply
--
.
.-
( 2.8. 10) by [b 01:1< t -1 x t) + b 1H( 51 -1 x s 1 ) IS.( s 1
.................
+ ••• +
X S 1-1)
bmH(sm-1 xsm)lS.(sm x sm-1)], where b i is a constant yet to be
chosen. We know the first term of the product will be
42.
•
(2.8.12)
Since
~(n)
•
•
was given as
N(Q,c 2 1), we know that each com-
ponent of the second term in the product will be a normal
variate, but we must determine the variances and covariances
of the set of such components.
components into
m+ 1
If we group the
s+ t - m- 1
vectors
!o(t-1)
'h(S1- 1 )
•
•
•
(2.8.13)
and then use (2.8.6) and properties of ti, we get for 'the
variance matrix of (2.8.13)
bB
1 I ,
r Q
(2.8.14)
0'2
, ...
Q
s,
b~ r:t I '
f
0
s, - 1
• • •
•
•
•
•
•
Q
no. cols.
Q
no •. rows
t-1
t - 1
Thus we see that by choosing
•
Q.
s, - 1
sm
1 • • •
b~ rt 1sm - 1
sm - 1
43
2
~o
(2.8.15)
=r
i=1,2, ••• ,m
and
we can make the sum of (2.8.12) and (2.8.13) the result of
an orthonormal transformation on "i..
The
tistics thus obtained are grouped into
s + t - m- 1
m+ 1
sta-
ve'ctors
Yo (t-1)
~1(S1-1)
(2.8.16)
~2(S2-1)
•
••
•
~(sm-1 )
These
m+ 1
vectors are mutually independent and each vector
involves the normally distributed errors and the set of effects associated with just one criterion of classification
or factor.
Now using exactly the same approach as in §2.4 for
the two factor model, we define by still another orthonormal
transformation on
r
a statistic
~(n
- s - t - m)
N(Q,a 2 1) and independent of (2.8.16).
not use
(2.8.17)
~
which is
In practice we shall
as such but the scalar
i
~'~ = y'{I-Mr [MiMr ]-1 M Jr
·
Substituting (2.8.4) and (2.8.6) into (2.8.17), we obtain
(2.8.18)
w' w
...
-
=-y' {-r
1
,
- -MaMa
r- -
s -m
s1[
.,
]
- - rt -J - -rt. --..
M1 M'1 - -J -
44
Similarly we substitute (2.8.4) and (2.8.6) into the transformation on l which gives (2.8.16) and thus obtain
.
t
s.
,
- - 1.
,
,
v.v.
-1. -J,
r t 'i M.M.'i
-1. -1.
-
1
rt
-
'i'li:
t
sible to find two orthonormal matrices
l.t.2(S2-1 x n)
such that
- = b'!:!1 &2.1
~,
-
1.,y
~2
-
LaY
(2.8.20)
= baH2Q.a
+ 1.,§.
+ 1.2§. •
i
= 1 2, ••• ,m •
t
1.1 (S1- 1 x n)
4~
Thus the sums of squares due to 1+1 and tf2 are :l~Y1 = y'Q1 y
= Y Q2l:,
and y~Y2
where Q1
t
now instead of ~ and
*
'H1:
~, 2.1
*
and 7+2:
=0
rank (~i)
< si -
1
=~~ k.1
W
2 there
C 2b 2
= rank
=.2
92 =L~!:.2.
and
are two other hypotheses
such that
for i
(Hi)
Suppose
C. j
-~-
=1,2.
=-0
and
Then· by virtue
of Prop. 1.2.23
(2.8.21)
=C.H~ (s.
-~-~
~
x s. -1 )L.y
~
-~-
is a statistic whose expectation is zero if and only if ~:
is true. Since var(v i ) = O' 2I(si -1), Var(~itli:li) = O'2Cici •
Hence the sum of squares due to
is
1*r
(2.8.22)
It follows from (2.8.20) and (2.8.22) that
t
'[
,
Q.* = L.H.C.C.C.
(2.8.23)
-~
]
-1
-~-~-~ -~-~
,
C.H.L.
-~-~-~
•
Using the NSC previously referred to, it remains to be seen
that
Q1Q2
(2.8.24 )
But
Q, Q2
k.1g'Q2L~
2.9
=Q
implies
**
9,Q2
* * = ~1t ti1 C1, [ ~1 C,t ]
919.2
= ~~ k.1 !:.~1.2 =Q
= ~1L~ = Q and
and
by
-1
= Q.
From (2.8.23) we get
t
t [
C11:Itt k.1 ~H2~
= I( si - 1 ).
* * =0
(2.8.24) Q1Q2
1.i k..1
, ]
~~2
-1
C2H2t 12·
Hence
•
Summary.
In this chapter we have described those two-factor .
and multifactor models for which certain methods of analysis
46
will be presented in the next chapter.
Certain statistics
were derived and expressed in terms of the observations and
the structure matrix as well as in terms of the unobservables
of the model.
For multifactor models only orthogonal de-
signs were considered.
For two-factor models it was pointed
out that three restrictions would effect a great simplification in several matrices of quadratic forms, but statistics were obtained without these restrictions.
case
~
is a normal vector independent of M and
In every
~i'
But
these latter vectors are mutually independent only for
orthogonal designs.
CHAPTER III
UNIVARIATE MODEL:
ESTIMATES, TESTS,
AND CONFIDENCE BOUNDS
3.1
Testing the hypothesis of equal fixed effects.
To the general two-factor model of §2.1, or the
general multifactor model of §2.7, we now adjoin the postulate that
~(t)
is a set of constants called fixed effects.
Then testing the equality of the effects of the different
treatments is a legitimate aim of the experiment and of the
statistical analysis which follows.
~(t)
equality of components of
The hypothesis of
may be expressed'in various
algebraic forms including
(3.1.1 )
H(t-1 x t).s.(t) = Q(t-1) •
When this null hypothesis is true, the statistic
defined by (2.2.6), has zero expectation.
Yo (t - 1)
where
to (t - 1)
is
square variate with
N(g"
= to (t 0'21.),
t -1
u(t-1),
In fact
1) ,
and Y. t Yo/ 0'2 becomes a chi
d. f •
But regardless of the truth of this hypothesis,
~t~/ 0'2
is an independent chi square variate with
n - r * d.f.
And depending upon these d.f. and a chosen significance level
ao, there is determined a constant F which is the upper ao
ao
47
48
point of the variance-ratio distribution.
Hence we reject
the null hypothesis (3.1.1) if and only if
Y.'Y./ (t - 1)
------:':'- >
yj.,tYJ../(n-r*)
F
ao
•
Note that the test statistic and the critical region are
quite free of £i thus the test of this hypothesis is indifferent to whether 2. is fixed, normal, or random but not
normal. Although r * would be numerically different, the
test of equality of components of A(t) is formally the same
whether there is one other set of effects £(s) or mother
sets.
3.2
~stimating
tinear functions of fixed effeci!.
Although our primary interest is in the random effects, the estimation of functions of fixed parameters may
be one of several aims of an experiment when the model includes both fixed and random components.
Conditions for
the estimability of such functions are sometimes stated in
such a way that they seem to apply only when the observations
themselves are normally distributed.
We make no such re-
strictive assumption here and show that it is unnecessary
to do so.
We have already seen that regardless of whether £(s)
is fixed, normal, or nonnormal random
where
to
is
constants,
N(~,~2l).
Thus if A(t) consists of (unknown)
49
and any linear function of
linear function of £.
~
is an unbiased estimate of some
We wish to find which linear functions
of A have unbiased estimates among the linear functions of
~.
Let ~'a be such a function and let g'~ be its estimate
(estimator).
That is
~ (g'~) =
(3.2.3)
&'A
must be derivable from (3.2.2).
is
t -1, (3.2.2) with
~
Since the rank of 1:!(t-1xt)
replacing
& e.ll)
uniquely but cannot be inconsistent.
~'A
cannot determine.!
On the other hand,
can be obtained (uniquely) from this modified form of
(3.2.2) if there exists some arbitrary vector
t, = ,g,'ti
or
~
such that
•
In this latter form we have the well known situation of t
linear equations in
t - 1
unknowns.
The NSC for their con-
sistency is rank of [H' ,&J equals rank of !:!' •
seems to tell us more than (3.2.4).
is known to be
This hardly
But since rank of H'
t - 1 , we now know there must be one and
only one linearly independGnt constraint among the rows or
columns of the
that j'H'
= ~'
t - 1 columns.
txt
matrix [H' ,t].
Also we already know
is a constraint obtaining among the first
Hence
it i
=
0
is the NSC for the existence of an
~
satisfying (3.2.4).
j'~
=0
or
50
Hence (3.2.5) is also the NSC for the existence of a
that (3.2.3) is satisfied.
-!:tHt
g such
From (3.2.4) we get
= ~t!:ll:!t = ~tl = ~t
•
From (3.2.2) and (3.2.3) it follows that
Thus provided (3.2.5) is satisfied,
(3.2.6)
is an unbiased estimator of
Now although
~
~Ia
•
may not be normally distributed, Mis.
Hence the variance of the estimator (3.2.6) is
The final simplification in (3.2.7) was possible because of
(3.2.5) and Prop. 1.2.23.
This final form of (3.2.7) is
recognized as the variance of the unbiased minimum variance
estimator of ~'a obtainable by the usual methods when it is
assumed that
~
also consists of fixed effects.
Hence, regardless of whether
~
is fixed, normal, or
51
nonnormal random, we may estimate any contrast among the components of A by using the statistic
~,
and we may put con-
fidence bounds on such a contrast by using
preassigned confidence coefficient
1 - ao, there is deter-
n - r * , the number
mined a constant tao (depending also upon
of d.f. of
For a
~ and~.
that
~'~)such
t'H'Tou
-tQo.vt'R
lwtw/ (n-r*'
- .... - -- ..,.. -
_<
t_'_a
_<
t'H'Io11 + t
A
ao ~lt:J.'wl(n-r*)
where
B.(txt)
,
= { MoMo
v r . ] -1
MoM11$.d~' MdtldS.
-
,
t
~!tl1
t
Mo
J-1
•
For the restricted designs of §2.5, (3.2.9) reduces to
(3.2.10)
3.3
_ [
B. ( txt ) -
()
rl t
-
1
, , '] -,
q
MoM1 ~ M1 Mo
•
Testing other testable hypotheses on fixed effects.
For the sake of a complete consideration of the stat-
istical analysis of data from an experiment we consider
briefly the testing of hypotheses on the fixed effects other
than the hypothesis of equality of all fixed effects.
We
suppose the hypothesis to be tested can be expressed in the
form
~(g
where
rank (g,)
=g
x t) A(t)
i t -
1
and
= ~(g)
Cj
= 2..
Thus each row of
52
Ca is a contrast among the components of A and is itself
estimable.
A recently written and as yet unpublished paper
by S.N. Roy considers the broad question of test@bilitx of
hypotheses in terms of the relation between the hypothesis
matrix
~
and the structure matrix
M,
without explicit ref-
erence to the concept of estimability.
He starts with the
general relation
rank(M) i
rank(["~J)
i rank (M) + rank
(~)
•
where the two equalities cannot be simultaneously attained.
Now if we use the least squares method or any other standard
method to derive an F-statistic for
1+:
Ca
=2
, it can be
shown that we obtain a test which would have the same power
(equal to a, the probability of the first kind of error) for
a possibly weaker hypothesis ;+* as for the given hypothesis
/+.
In other words, instead of testing
be testing
by
1+*
1+*,
~?f.
where the relation of
If
we would really
»* to
If is indicated
The three possible relations among the ranks
in (3.3.2) correspond to three different possibilities as
regards the testability of
rank(!4)
If·
=
=rank ([ ~J)
if*c:
testable.
If.
< rank(!4)
(i) If
+
rank(~).
then
it is said to be completely
< rank([~J) < rank (M) +
Model, in which case
(ii) 1£
rank (~) , then
rank (M)
If c. 1+* C
Model, in which case
Jf is said
53
to be weakly testable.
(iii) If
rank (M) + rank (~), then
rank (M)
if f 1f*
}f is said to be untestable.
< rank
= Model,
([~J) =
in which case
This last is, of course, a
trivial case having no intrinsic interest.
Returning now to the results of the last section, we
have that ~(~'Io~)
=~ ,
and hence if the hypothesis
(3.3.1) is true, ~(~'Io~) = ~.
But regardless of whether
the hypothesis is true
Hence
~'~~'[~'IoT~~,]-1Qtl'To~ ~ as
is a central chi
square variate if the hypothesis is true.
the symbol
a defined
If we use here
in (3.2.9), we may write
_ _0_0_
CH'T
T"HC' = CRC' •
Then for a chosen level of significance ao, we reject the
hypothesis (3.3.1) if and only if
~ 'lotlQ.' [~']
'!l''Ii/
-1
Qi'I.o~ / g
(n - r
*)
> Fao
where
Fao is the upper ao point of the variance ratio distribution with g and n-r* d.f.
It is worth noting that this test presupposes nothing
about the nature of the block effects Q--whether they are
fixed, normal, or nonnormal random--except the assumption
already made that the error is independent of block effects.
54
3.4
Testing the hypothesis of zero variance of block effects.
Just as testing the null hypothesis of equality of
treatment effects is appropriate when treatment effects
are assumed to be fixed, so testing the null hypothesis of
zero variance of block effects is appropriate when block
effects are assumed to be random.
Since
!:!i =
~ and
~(§.) =~,
~ (:£) =~.
And
= I.~1 H[ var(!2.) ]!:!' (1~1 )' + ~1 [var(§.}J~~
= 11"' 1j[ var(!2.) ]Ji' (11"') t + 0'21(s - 1)
On the null hypothesis that var(!2.) = ~, var(:£) = 0' 2 1, and
e(;x:':£) = tr{ var(:£)} = (s - 1 )(1 2 • Thus t"n this hypothesis.
var(:£)
:£':£/0'2 is a chi square variate with ·s . . 1 d.f-.·· ."Regardless'
Y1.'Y1./ (12 is a chi square variate ·with·
of the variance of &,
n - r * d.f.
Hence we reject the null hypothesis ·var(!2.) =.
if and only if
(3.4.1 )
:£':£/ (s - 1)
Yi,.'Y:J.,/
* >F
(n - r )
a,
where Fa, is the upper a, point of the variance ratio distribution with
s - 1 and n - r * d. f. and a, is a predeter-
mined significance level.
Note that this test is formally
the same, mutatis mutandis, as the test in §3.1.
But for
fixed effects when the null hypothesis is false, the test
statistic in (3.1.3) has a non-central F distribution,
whereas for random effects when the null hypothesis is
false, the test statistic in (3.4.1) may not have the F
distribution at all.
~
The two test statistics in (3.1.3) and (3.4.1) are not
independent even when both null hypotheses are true, since
they have the same sum of squares in their denominators.
In
general their numerators are not independent either, but for
orthogonal designs with m+ 1 factors,
~~~1/a2,
••• , and
Y:J.'YiJ a 2 ,
Y'Y.! a 2 ,
~~a2 are all independent and, if the re-
spective null hypotheses are true, are all chi square variates.
When this is the case, the m+1 ratios with Yi.'Yi./a 2 in the
denominator are quasi-independent in the sense of Ramachandran
[11], and any two or more of the null hypotheses may be subjected to a simultaneous test as developed by Ghosh [5].
If we include in the model the assumption that £(s)
is normally distributed with
unknown constant and with
~(£) = ~i(s)
= a~l(s),
var(£)
also normally distributed with ~ (:t) =
postulated independence of £ and
var(~)
The
~,
2,.
where ~ is an
then ~(s - 1) is
Recalling the
we get
= var(1;1~)+ var(~1~)
= T;1tl[a~l]tll(1;1)'+ ~1[a2l]1J
s - 1 components of
~
variates if and only if
will thus be uncorrelated normal
1;11
= I·
const., that is, if and
only if ""T, is proportional to an orthogonal matrix.
This
NSC is certainly satisfied by the restricted designs of
§2.5 for which 11
= t l.
These designs have been called
56
"linked block" by Youden [21].
The class of such includes
all orthogonal designs but also
~ome
designs.
BIB and even some PBIS
Gnanadesikan [6] states the restriction in other
terms--g1 [var(y) ],91 = Q1 • const. --and remarks that "in
general" the conditions are not met by incomplete designs.
Throughout the remainder of this chapter we shall further
assume that the design is linked block if for two factors,
and orthogonal if for more than two factors.
Thus we have
Then X'X is the sum of
s- 1 squares of independent normal
variates, each with zero mean.
Thus
X'Y../(b 2 0'2 +
d.f., while
0'2)
yj'Yj,,/0'2
is a chi square variate with
is a chi square variate with
and the two are independent.
s- 1
n- r
*
Then proceeding exactly as in
[6], we specify a set of confidence coefficients
1 - a.J.
and
find the pairs of constants X~i and X0i such that if X2 is
a chi square variate with the appropriate d.f.
Then
Pr { -rXU1
w'w
Pr{-2- ~
XU2
Y..'~
2 2
< b 0' 1
< X'v
0'2 _
2
J=
1 -
a1
and
XL1
w'w
0'2
+
< -2-J
XL2
=1
- a2
d. f.,
may be combined into
57
~-
vt_.v
X!'~~}2
Xu
XU2
---
2
-
wi th a confidence coefficient equal to
3.6
b
2
(1 -
a, ) (1
~
a2).
~'Ht~
Preliminary or gyasi-confidence bounds on
We have defined
-
•
so that its distribution is free of
A, and then we have restricted consideration to those designs
which simplify the relation of
~ to~.
Thus we have
where b is a constant of the design and!, is
may solve (3.6.1) for
Now although
~
1',
N(~,a21).
We
obtaining
appears formally in the right member of (3.6.2),
it must be the case that the right member no less than the
left is a function of
~
only and hence independent of
~.
Then
i~t,/0'2 and Yi.'~/ a 2 are independent chi square variates with
s - 1 and n - r * d.f. respectively.
Hence for a chosen a, we
can find Fa,' the upper a, point of the variance ratio distribution with the appropriate d.f.
Then
(.Yo - bt!Q.) , (.Yo - bt!Q.)/ (s - 1 )
Pr{
*
'!J.''!l/(n-r)
<F
-
J
a,
= 1 - a,
•
It is important to note that unlike (3.4.1), (3.6.3) is
true regardless of the nature or value
of~.
with the use of the Cauchy inequality we get
From (3.6.3)
58
.1.
(b'H'Hb)~
.. - -
s - 1 )w'wF ~t
- - a,
.
* J~
t
0 2 (n - r
1 -
(vtv)t
< - -
-
a,
+
~
•
On the assumption that £ is a sample of size s from some
population of block effects,
nor a parametric function.
a'tlf~
is neither a parameter
Although a function of random
variables, a'H'tlQ. is unlike a proper statistic in that it is
not explicitly definable in terms of the observations
y.
Thus (3.6.4) is neither confidence bounds nor tolerance
limits in the strict sense of those concepts but is merely
a probability statement on a set defined by three random
variables--the one in the middle and the two at the extremes
within the braces.
It should be noticed that this probabil-
ity statement is true no matter what the distribution of
£,
subject of course to the broad restrictions indicated
earlier.
Now from (3.6.4) we may go on to obtain genuine confidence bounds on a parameter or a parametric function when
~
has a specified type of distribution.
preliminary or quasi-confidence bounds.
Thus we call (3.6.4)
In the next two
sections we shall illustrate using (3.6.4) as preliminary
to finding confidence bounds, once for a continuous distribution and once for a discrete distribution.
3.7
Alternative confidence bounds on a~ ~
a is
normal.
59
Then!lttlttiQ./a~ is a chi square variate with s-1 d.f.
Hence
for a chosen as, we can find a pair of constants Xf3 and X~3
such that for X2 , a chi square variate with
Pr{xf3 5. X2 5. X~3J = 1 - as·
=1
- as
(3.7.1)
Since
Then
s -1
d.f.,
Pr{xf3 5. !2.'Ht!:!Q./a 2 5. X~3J
and
pr{!2.tl:itHb/X~3 5. O'~
rx~=1t < fx; ~t , we
~u~
< !2,ttltJ:!Q./Xf3 J = 1 - as •
can obtain from the three-member
l:L3J
inequality within the braces of (3.6.4), the following fourmember inequality which will be true with probability at
least as large as
1 - a, •
•
Since according to our model b is distributed independently
of
~,
we may combine the inequality of (3.7.1) with (3.7.2)
to obtain the following confidence bounds with confidence
coefficient not less·than (1 - a,)(1 - as).
60
This resulting inequality and its eonfidence coefficient may
be compared with (3.5.4) and its confidence coefficient.
No
claim is made that the present confidence bounds are in any
sense better than those found by Gnanadesikan in [6] and
given in §3.5 above.
The only merit claimed for the pre-
liminary or quasi-confidence bounds when
a is
normal is that
with their use we may find confidence bounds on a, without
having to use the more complicated distribution of the
actual observations I(n).
3.8
Cgnf~dence
bgunds when b
~§
two-valued.
As a first illustration of the use of the preliminary
confidence bounds when
g is nonnormal, we take an extremely
simple example of a discrete distribution.
that each b j for
j
= 1,2, ••• ,s
equal probability.
distribution of
~,
or
~8
with
Then we must derive the probability
a·H'~.
the s components of
has the value
We postulate
a,
Since
a'H'H2
is a sym. function of
its distinct values are determined by
the numbers of components having each value rather than by
which components have which value.
If we denote by si the
number of components of a(s) which have the value
of course, s, +
S8
(where,
0 So si So s), then it is easily
= sand
shown that
a'H'H!!. -_
~i
2
2
S1~' + S8~8 -
1( t:l
~ )2
S
S11-'1 + S81-'8
•
61
From this it is apparent that with probability 1/2 s - 1
£"!i'!:i2. = 0, and that with probability 1 _1/2 s - 1 , (£IJ:i'Hb)t
is proportional to f ~s - ~1
I,
which is thus a parametric
function of this two-valued, equiprobable distribution
assumed for
parametric function "naturally" associ-
b-~a
---
ated with b'H'Hb.
To obtain bounds on I~s -~11, we must first determine the extreme values of
S1SS/S.
Accordingly as s is odd
or even
(3.8.2)
i=1 < !J..!i. <
s
-
s
-
2
S -1
s
or .2.:l < !.1.!a. < .1
s s -4
•
Whence it follows that
(~
- p~11 _< (...L
s2-1 -b'H'Hb)t
- -- -< lAs
~
s-1 -bH.I'Hb)t
u -if s is odd or
(3.8.4)
s -btl.HUb)t
u. ~
(!
< lAs
-..,
-1:1. 1
t"
1
<
-
t
b'H'Hbr
s-1 -- -
(-L.
if s is even.
Whichever of (3.8.3) or (3.8.4) is applicable
is thus true with a known probability of (1 - 1/2 s - 1 ). Except for the fact that this probability cannot be chosen
arbitrarily as
1 -as
could be, the above inequalities are
confidence bounds for this distribution comparable to (3.7.1)
for the normal distribution--for which the variance is a
parameter "naturally" associated with .Q'H'!:iQ..
And just as
(3.7.1) was combined with the preliminary confidence bounds
(3.6.4) to obtain (3.7.3), so we now combine (3.8.3) with
(3.6.4) to obtain
62
when s is odd, and similarly from (3.8.4) and (3.6.4) obtain
(3.8.6)
tx.~t _
sl,)2
t(S-l )~'!ltFa9t
s ( n-r *) l,) 2
- G
sv'v
7,
i r ~8 - ~1' i
tt
)b 2
+
- - a1~t
sw'wF
(n-r*)l,)2
for the case when s is even.
fidence coefficient is
In either case the final con~ (1 - a1 )(1 - 1/2 s - 1 ).
In itself this two-valued equiprobable distribution
for each b j is seemingly artificial and of very limited interest. It has been used to illustrate a method, and in
subsequent sections of this chapter we hope to show that
even this example has wide applicability when properly
interpreted.
3.9
A quadratic form in m-tile differences.
In the two preceding sections we have considered the
case of a normal distribution and the case of an equiprobable two-valued distribution for the block effects b.
Mathematical convenience might recommend either of these,
but certainly a realistic approach to experimentation would
recognize that in many situations the block effects actually
63
affecting the observations would have neither of these distributions.
Can the statistical analysis of an experiment
yield useful conclusions even when the nature of the underlying distribution is not previously known or postulated?
If so, what properties of distributions-in-general would be
useful and also possible to infer?
These questions we
attempt to answer by the type of analysis developed in this
section and the remainder of the present chapter.
The median, the quartiles, the sextiles, the octiles,
and the deciles of a distribution give successively more
information about the distribution--information whose acquisition does not presuppose a particular type of distribution.
Unlike moments and measures of skewness and kurtosis based
on moments, the above properties necessarily exist and have
a direct probabilistic interpretation.
by
m~n
Thus if we denote
the n-th m-tile of the population from which b is a
sample,
Pr{ mt-'n
Q
-< b
< mjJn+1
~
J
= 1m
•
It would be very desirable to be able to infer these m-tiles
and very satisfying if components of variance analysis could
yield the m-tiles of general block effects as well as variances of normal block effectsJ
expect.
But this is too much to
We know that even for fixed effects, not the effects
themselves but only contrasts among the effects can be estimated by ANOVA.
Perhaps, somewhat
an~logously,
the
64
differences between certain pairs of m-tiles could be estimated?
These differences would be lengths of the intervals
into which the range is subdivided by the requirement of
equiprobability.
And if we wanted to represent such an
interval by a single value, the most appropriate would be
not the mid-value of that interval but the (conditional)
median of the portion of the distribution lying in thQt
interval.
Thus the intervals whose boundaries are the
m-tiles would be represented by certain 2m-tiles (those
which are not also m-tiles).
To be more specific, let us
consider the first interval determined by the k-tile divisions
of the range of b.
Then for any b it is a true statement that
But for mathematical convenience we represent and replace
the interval from
k~O
to
k~1
by the conditional median of
that portion of the distribution, viz.,
the interval from
2k~3.
k~1
to
k~2
2k~1.
Similarly
would be represented by
And in general the k intervals between the successive
pairs of k-tiles are represented by the k odd 2k-tiles. Thus
instead of such probability statements as (3.9.1), we shall
use the following:
m=1,2, ••• ,k,
where the subscript 2k on the variate b indicates that it
is not the original variate, having a presumably continuous
density function, but is the replacement or substitute
•
65
variate, which takes on only k discxete values (the odd
2k-tiles of the distribution of the original b) with equal
probabilities.
For k
= 2,
2kb is the kind of variate b was
assumed to be in §3.8.
In this and subsequent sections we do not postulate
that b
1£
discrete, but we attempt to infer something (viz.,
differences between successive odd 2k-tiles) about the distribution of b through appropriate use of (3.9.2).
In other
words, in replacing the original variate b, for which (3.9.1)
holds, with the substitute variate 2kb defined by (3.9.2),
we are representing the unknown shape of a frequency curve
by a histogram of k cols., the boundaries of these cols.
being the k-tiles of the true distribution of b.
To the
extent that the curve differs from the histogram, the results
obtained using 2kb will differ from the corresponding properties of the true distribution of b.
But this kind of dis-
crepancy is admitted, and it can be made smaller and smaller-thereby giving a closer and closer approximation to the true
distribution--by increasing k.
However, we shall see in
§3.10 that there is another phenomenon which, as k is increased, reduces the associated probability or confidence
coefficient.
Thus the attempt to increase the closeness of
approximation of the histogram to the true distribution of b
leads to what might be called a saturation point in the
final confidence bounds and confidence coefficient.
The vector £(s), consisting of a sample of size s
66
..
from the actual population of block effects, is thus xeplaced
by 2k~(s), where each component is
we denote by
for each
(3.9.3)
2k~2m-1
for some m.
If
2k~2m-1
s2m-1 the number of components equal to
m= 1,2, ••• ,k, then
[2k!l(s)] '!:l'.(s x s:r)!:l(s:f x s)[2k!l(s)]
k
k
=
L(s2m_1)(2k~2m_1)2- t[ m=1L(s2m-1)(2k~2m_1)]2.
m=1
Also we define for any integer
k
~
2
and
m= 1 ,2, ••• , k-1 ,
Then by applying Corollary 1.5.3 to the right member of
(3.9.3) we get
~
For
k
m-1
L L (s2m_1) (s2n-1 ) (2k~2m-1 - 2k~2n-1 ) 2 •
m=2 n=1
m> n ,
m-1
2k~2m-1
-
2k~2n-1
=
L
(2k~2i+1 - 2k~2i-1)
i=n
•
Substituting (3.9.4) into (3.9.6) and then the result into
(3.9.5) gives
k
m-1
m-1
t m=2
L n=1L (s2m-1) (s2n-1 ) ( i=n
L
Applying Lemma 1.5.5 to this gives
67
~
k-1
n
k
{L(L
s2i_1)( L ~2i_1)(2kdn)2 +
n=1 i=1
i=n+1
k-1 m-1
n
k
L L(L
L
2
s2i-1)(
s2i-1)(2k dm)(2k d n)] ,
m=2 n=1 i=1
i=m+1
which, by Corollary 1.5.6 can be written as the quadratic
form
k
j
real sym. matrix which has
t(
Ls2m-1)( L s2m-1)
m=1
as the
m=i+1
element in its i-th row and j-th col. (and in j-th row,
i-th col.) for
i
~ j
•
We must carefully distinguish three expressions
which look very similar.
First there is
~tHtH~,
a non-
negative scalar function of the continuous stochastic vector
~(s).
Since
~
is not directly observable,
~tHtHb
is not
computable, but in (3.6.4) we have upper and lower bounds
on (~tHttl£)t.
statistics
These bounds depend upon the computatable
and the chosen confidence level
~t~, ~t~,
a,.
Suppose we denote these bounds by single symbols as follows:
.t,
-
(3.9.9)
.t 2 -
f>2(n-r )
f>
(~t y.)~
f>
t
t
S-1 )!!l:a~t
(~t~)t
+
S-1 )~t;FaJ
f>2(n-r )
it > 0 J of her-wise
11 =0.
- )
•
68
In these terms (3.6.4) becomesa
Pr{t,
~
(!:l t!:it!i£)* 5. tal
~
1 - a,
•
Second there is (2k!:l) t!:it!:i(2k!:l) , which is a nonnegative scalar
function of the hypothetical, discrete stochastic vector
2k£(s), each element of which can take on anyone of the k
odd 2k-tile values of the population from which £(s) is a
sample of size s.
The subscript 2k prefixed to'b
guishes the second expression from the first.
distin~
But if the
statement within the braces of (3.6.4) is true for any
distribution of !:l with probability not less than 1 -a"
then
it follows that the altered statement in terms of the substitute variate 2kb instead of the original b would also be
true with probability not less than 1 - a1.
Thus
What this replacement does to the final confidence coefficient
and confidence bounds on the 2k-tile differences of the true
distribution of b has already been indicated.
(2kh)tHt!:i(2k~)
Now
has its own theoretical distribution.
With
probabilities obtainable a priori, it takes on discrete
values expressible in terms of
in terms of
2kdn
2k~2m-1
for n = 1 ,2, ••• , k-1.
for m= 1 ,2, ••• ,k
or
These expressions
we already have in (3.9.3) and (3.9.7) respectively.
Statis-
tically we regard 2kdn as a fixed but unknown parameter in
the distribution of (3.9.7) and regard the partitioning of s
69
into S1,sa, ••• ,sk_1 as stochastic.
But purely mathematically
we may think of 2ks! as the k - 1 Cartesian coordinates of a
point and think of the s1,sa, ••• ,sk_1 as given constants
which enter into the matrix g of the quadratic form (3.9.8).
When (2kQ)'H'H(2k£) is thus regarded as a function of these
2kd coordinates, a function whose coefficients are determined by a given partition of s, then and only then
(2kb)'H'ti(2k£) will be denoted by the symbol
6~(g).
It
should be noted that the a priori probability of a specific
partitioning of s into
by
~)
S1,S8, ••• ,
s2k-1 (hereafter denoted
is
sl
k
s
k
1T
m=1
(s2m-1)!
Thus for a given vector
2kg ,
=
L
k
s
k
sJ
1T
(s2m-1)!
m=1
where the summation is over all (integral) partitions of s
for which
6 s (g) = const.
From the above definition of
6~(s!)
as a quadratic
form in g for a given S, it follows that
determines a locus in the (k-1)-dimensional space of which g
corresponds to one point.
For all possible partitions S,
70
13.9.13) would determine a discrete family of such loci.
If for some one
A~1
~,
say
~1,
all real points on the locus of
(d) = const. are interior to some other loci in the
~
family (i.e., loci determined by other partitions
but the
same const.), then we call the locus of As (d)
= const.
inner boundary.
~a,
_1
Similarly, if for some
= const.
points on the locus of As (d)
_2 -
~,
-
say
an
all real
are exterior to some
loci determined by other choices of £ but the same const.,
this locus of As (d)
_2 -
= const.
we call an outer boundary.
For several different values of k we shall investigate the
geometrical aspects of
A~(g)
to determine the existence of
inner and outer boundaries and their shapes if they do
exist.
What use is to be made of this investigation?
From
(3.9.10) we shall obtain a confidence region in the space
whose coordinates are the k - 1 components of 2kd.
region will be bounded (in part) by A£1(g)
As_2 (g)
= ~:.
= ~f
This
and
The confidence coefficient will be determined
in part by the 1 - a, of (3.9.10) and in part by the sum of
appropriate probabilities given by (3.9.12).
Now by Lemma 1.5.10, G is p.d. if no s2m-1
long as no s2m-1
=0
= o.
So
then, the locus of (3.9.13) is an
ellipsoid in (k - 1) -dimensional space.
one or more values of m,
~
When s2m-1
=0
for
is p.s.d. (by Lemma 1.5.11).
When s2m-1 ~ 0 only for m = 1 and m = k, (3.9.5) reduces to
k-1
t s1 s2k-1( m=1L
2k dm)
2
71
and for each const. in (3.9.13) the s -1 'Possibilities for
~
determine s - 1 pairs of "parallel" hyperplanes in the
(k- 1)-dimensional space.
and all s2m-1
=0
These two cases--no s2m-1
=0
except for m = 1 and m = k--we call bound-
ing cases, because under such conditions the loci of (3.9.13)
and the coordinate planes bound regions in which 2kS has
finite, positive components.
loci.
Such loci we call bounding
Henceforth we shall have no interest in loci of
(3.9.13) which are not bounding loci.
3.10
A priori probability of bounding loci.
We have from (3.9.12) the a priori probability of a
particular partition
what conditions
space of 2kS.
~
~
of s, and we have just seen under
corresponds to a bounding locus in the
The problem now is to find the probability of
the union of all
~
corresponding to bounding loci.
This
total we call the bounding probability and denote by y •
We know that the total number of possible partitions of s
into k nonnegative integers is (
s+k-1
k-1
).
We would like a
scheme--preferably a visualizable scheme--for classifying
the possible partitions in such a way that those corresponding to bounding loci are easily distinguished and their
associated probabilities are readily summed.
At first we disregard the fact that each s2m-1 must
be a nonnegative integer, and we think of each
~
as
72
coordinates of a point in k-dimensional space.
The condition
k
Ls2m-1
=s
confines the points under consideration to a
m=1
hyperplane or a (k - 1 ) -f lat.
The further restriction that
each s2m-1 be nonnegative confines the points to that portion
of the (k - 1) -f lat bounded by the coordinate hyperplanes.
These latter are also (k - 1 ) -f lats and thus with the first
hyperplane form k distinct intersections or boundaries of
k - 2 dimensions.
But this is a rather familiar geometrical
figure known as a (k - 1) -dimensional simplex and often denoted by S(k).
The intersection of the original (k - 1 )-flat
and (k - 1 - r) ether (k - 1) -f lats in an r-f lat, which is
called an r-boundary of the simplex.
k
S(k) has (
I'
r+1
) r-boundaries for
= 1,2, ••• ,k-2
(I' -
each
r-bounda~y
I'
It is known that an
= O,1, ••• ,k-2
and that for
is bounded by r+ 1 distinct
1) -f lats, making ita simplex S(r + 1 ).
Since in our
scheme the r-boundaries are determined by (k - 1 - r) coordinate
hyperplanes--i. e., by s2m-1
=0
for (k - 1 -
r)
particular
values of m--the points on a particular r-boundary correspond
to the (
r+s
) partitions of s into r+ 1 nonnegative summands.
r.
From geometry it is known that the number of q-boundaries
r+1
•
lying in a given r-boundary of a simplex is (
I'
= 1,2, ••• ,k-2.
) for q
q+1
But for q
< r,
> 0, these q-boundaries have
lower dimensional boundaries in common. Hence in order to
s+k+1
make our classification of the (
) partitions ~ mutually
k-1
73
exclusive, we must consider
points~n
an
r-b~undary
which
are not also on any boundary cf dimension lower than r.
We introduce the symbol Nr for the number of such points
and Yr for the total a priori probability associated with
the union of those points (on a single r-boundary).
Now
even though a simplex S(k) has no boundary of k- 1 dimensions,
we define Nk _1 as the number of points in the interior of
S(k), i.e., with none of their k coordinates zero, and denote
by Yk-1 the total a priori probability associated with the
union of those points.
(3.10.1)
Th~n
it follows that
s+k-1
k-1 k
() =
)N
r
k-1
r=O r+1
L(
,
and
k-1
=
1
L(
k
)Yr
•
r=O r+1
Since each r-boundary is an S(r+1), the single equation
(3.10.1) may be replaced by the set of k - 1 equations
(
s+r
) =
r
r r+1
L(
i=O i+1
)N.
, r= 1,2, ••• ,k-1 •
1.
These equati.ns together with the trivial No
(3.10.4)
s-1
N
r
=(
r
),
r
=1
determine
= 0,1 , ••• , k-1 •
We now seek Yr , especially Y1 and Yk-1. We recall
that the a priori probability associated with any point of
74
S(k) is k- s multiplied by some coefficient.
If each member
of (3.10.2) is multiplied by k S , we obtain a relation among
these coefficients Cr
•
Since each r-boundary of a simplex is also a simplex, we
have k - 2 analogous relations among these same coefficients
r
(r+1)s =
L (r+1 )C.
1=0 i+1
,
r= 1,2, ••• ,k-2 •
l.
Moreover, we already know Co = 1.
Rewriting (3.10.6)
r+1 r+1
(r+1)s = i.J
~ { l.. )C.l.- l ' r=1,2, ••• ,k-2
1=1
and using Lemma 1.5.2, we get
i
L(_1)i-j(~)jS
=
·-1
J-
r+1
(3.10.8)
..
.
L
C
r
=
yr
=k- s
i=2,3, ••• ,k-1.
J
<_1)r+1- j <r
+1
j
)jS, r=1,2, ••• ,k-2.
j=1
, r=1,2, ••• ,k-2.
We still do not know Yk-1' but we substitute (3.10.8) and
Yo = k- s into (3.10.2). Then
1 = k- S [
k
k-2 k r+1
.
. r+1
k
1
(1 ) +
r+ 1)
(-1) r+ - J ( j ) j S + (k) Ck -1 ] •
r=1
j=1
L(
k-2
(3.10.9)
L
k r+1
r+1
1 =k- [
(r+1)
(-1 )r+1-j( j )js] + Yk-1
r=O
j=1
S
I
I
•
Again using Lemma 1.5.2, we get from (3.10.9)
k
(3 •10• 10)
k
s ~ (-1)k- j (J.)jS ,
Yk-1 = k- lJ
j=1
which is just what (3.10.8) would give if known to be valid
for r = k - 1.
Since only one of the one-dimensional boundaries
of S (k) and the (k - 1 ) -dimensional interior correspond to
bounding loci of (3.9.13), for k> 2 the a priori probability
of the union of bounding loci of (3.9.13) is
(3.10.11)
For k= 2 the interior is one-dimensional.
Hence the a priori
probability of the union of bounding loci of (3.9.13) is
From (3.10.10) it can be seen that increasing k indefinitely,
which seemed desirable from consideration of closeness of
approximation, would decrease the probability with which
assertions about 2kS would be true.
Thus we have the situ-
ation described as a saturation point and also compared
with the Heisenberg indeterminacy principle.
76
3.11
Inner and outer boundaries of bounded regions.
Having obtained the a priori probability of the union
of bounding loci for k=2,(3.10.12), and for k>2, (3.10.11),
we now proceed to investigate the shape of the region which
is the union and the region bounded in part by the inner and
outer boundaries.
Since the k- 1 components of
2k~
are neces-
-sarily nonnegative, the region in which we are ultimately
interested has positive coordinates and is bounded in part
by the coordinate hyperplanes.
In §3.9 we defined inner and
outer boundaries in such a way that they were not unique.
But in order to use the probability of the union of all
bounding loci--i.e., as large a probability as possible--we
must find the outer boundary which is exterior to all other
bounding loci and the inner boundary which is interior to
all other bounding loci, for the same const. in (3.9.13).
Then using ~: as the const. for the outer boundary and ~~ as
the const. for the inner boundary, we shall have a bounded
region in the 2kg space for which
o < t~ < li.§. (d) i~:
.
Since the family of loci represented by (3.9.13) is discrete
rather than continuous, not all points in this bounded region
represent possible values for
2k~;
but all possible values
of 2kd correspond to points within the bounded region.
When k
(3.11.2)
=2
li (d) =
.§. -
S1
5
s
8
4d~
•
77
The bounded region for the single coordinate is an interval
whose outer boundary is the point, with 4d1
s-1
- S
4
~
0, such that
d 2 _ 02
1 -
""2
and whose inner boundary is the point, with 4d1
(3.11.4)
or
S
4'
4
d2 _
02
1 -
""1
~
0, where
accordingly as s is odd or even.
When k
= 3,
•
The outer boundary is the positive quadrant portion of the
ellipse
(3.11.6)
The inner boundary is the positive quadrant portion of the
line (with nonnegative intercepts) determined by
(3.11.7)
accordingly as s is odd or even.
All other bounding loci
of (3.9.13) lie between the inner and outer boundaries.
For
small values of s, this fact may be demonstrated by mere
enumeration of cases.
But for a general s, it is clear that
the ellipse with largest int€rcepts corresponds to the
smallest coefficients of its square terms.
is satisfied by
~'(3)
= (1,
s-2, 1).
This condition
On the other hand,
78
of the family of lines whose equation is
(3.11.8)
the line with smallest positive intercepts corresponds to the
largest coefficient.
When 5 is odd, ~I
~'= (5;1, 0, 5;1) makes S1 S5/S largest.
.2.'
= (~,
0, ~) makes S1S5/S largest.
= (5;1,
0, 5;1)
or
When s is even
Moreover, for s
>
2,
S/4 ~ (s-1)/s; and for s ~ 3, (s2-1)/4s ~ (s-1)/s, with the
equalities holding only for the smallest values of s.
for k
= 3,
Thus
the inner boundary and the outer boundary are
uniquely determined.
To justify our calling these two loci the inner and
~
outer boundaries, we should show that all other members
of the family (3.11.8) and all other members of the family
of ellipses
= const.
do in fact lie entirely between the above two loci so long as
(3.11.10)
Since all lines in (3.11.8) have the same slope, a larger
intercept does imply greater distance from the origin in
other directions too.
But for a family of concentric
ellipses, a member with smaller intercepts could cross
another
me~r
with largex
interc~pts
if the eccentricity
79
ot the forIller wa& enough smaller than the eccentricity of
the latter or if the orientations of their major axes were
sufficiently different.
Hence we must show that neither of
these possibilities is the case here.
substitute
od,
=p
cos
e
and eda
=p
Into (3.11.9) we
sin
e,
obtaining a&
the coefficient of p 2/ 5, 51 (sa + 55) cos 2e + 25, 55 cos e sin 6
+ (s, + SS)S5 sin 26. The derivative of this coefficient (with
respect to
e)
is 25,55 at
e =0
and -25,55 at
e = n/2.
Thus
the major axis of every member of (3.11.9) lies in the first
quadrant.
p
Moreover, the magnitude of the rate of change of
with respect to 6 is least when
5, = S5 = 1.
In other words,
the particular ellipse (3.11.6) has the smallest eccentricity
of any ellipse in the family (3.11.9).
For k
= 4,
the bounding loci belong to the family of
surfaces
s,(Sa+ 55 + s?) 2
2s, (55+ S7)
+
ad,
(ad,)(ada) +
5
5
(3.11.11)
(5, + Sa)(S5+ 5?)
2s,s?
a
s (ad,) (ada) +
ada +
5
2(S1+ sa)s?
(s,+ sa+ S5)S? 2
da)
S
-(ad2)(a
+
5
ada
= const.
Since the surface with the largest intercepts corresponds to
the equation with the smallest coefficients of its squareo
terms, and since s, or s? is a factor of all but one such
coefficient, we set 5,=
becomes (55+ 1 )(55+ 1 )
5'
S7= 1.
This remaining coefficient
which is then made small by making
80
either Sa = 1 or &5
= 1.
Which of these choices is made
does not change the intercepts or the point of intersection
with the equiangular line, but it does interchange the traces
of the surface in the ada
=0
plane and the ad, = 0 plane.
Thus there is no one member of the family (3.11.11) which
can be the outer boundary, but in the positive octant every
member of this family for which (3.11.10) holds would be
interior to the ellipsoid of revolution whose equation is
d 2 + s-1 d 2 _ D2
+ 2(5-2)
- 5
a 2
- 5
e a-"\;2
•
This surface may be called the outer boundary.
The inner
boundary is a member of (3.11.11), viz.,
(3.11.13)
t(ed, + ed2 + eds)2 = t~,
or
s2 -1 ( 8 d 1 + 8 d 2 + 8 d S ) 2 --
~
accordingly as
5
,
is even or odd.
When k = 0,
which have s, or
D2
"\;,
Sg
A~(d)
has ten distinct terms, seven of
as a factor.
The remaining three terms
have either (5, + sa) or (57+ 59) as a factor.
all intercepts as large as possible, we use
Hence to make
~'=
(1,1,5-4,1,1).
This gives as the outer boundary the hyperellipsoid
81
2(5-2)[
( od ) (1 ada) + 10 dS2 +
5s-1( ' od,a
+a
,od)
S
10 d a2+'
4 +
1
(10 d s)(10 d4)]+ f(10 da)(,ods)+ ~[(10d')(10dS) +
(3.11.14)
(10 d a)(,od 4 )]+ ~(10d1 )(10d 4)
=~~
•
The inner boundary is easily found to be one of the hyperplanes determined by
5 2 -1
a
~(1 od, + , od 2 + , ods + 1od 4 )
= ~1a
,
or
(3.11.15)
accordingly as s is odd or even.
It is interesting to note that the intersection of the
equiangular line and the outer boundary is independent of s,
provided 5
~
k, when k
=3
and k
=5
but not when k
= 4.
This property is in striking contrast to the fact that for
k
= 3,4,
or 5, the intercepts of the outer boundary are
decreasing functions of s.
Another property worth noting here is that increasing
s has more effect in decreasing the intercepts on the axes
of the "middle" d's than on the 2kd1 axis or the 2kdk-1
axis.
Recalling the definition of 2kdm' (3.9.4), we realize
that this geometrical conclusion merely confirms a plausible
belief that increasing the sample size in order to reduce
the uncertainty about the inferred distribution is more
effective for the middle range of the distribution than for
the tails.
/e
82
3.12
Summary of development when b
!!
not known to be normal.
Since these last few sections have been rather mathematical and seemingly far removed from the original purpose
of making statistical inference about an underlying population of block effects, we close this chapter with a brief
summary of the latter portion.
assumption that
~(s)
In §3.6 we dropped the
was normal, and instead of aiming
directly at confidence bounds on some parameter such as the
variance, we obtained preliminary or quasi-confidence bounds
in the form
(3.12.1)
where
t,
and
t2
are computable from the observations and a, ,
as defined in (3.9.9).
In §3.9 we p=esented a motivation for
replacing b'ti'Hb with a discrete random variable whose values
are quadratic forms,
~~(~),
in the differences, 2kdm' between
successive odd 2k-tiles of the distribution of b.
we found particular pairs of quadratic forms
~~1 (~)
In §3.11
and
As (g), for k= 2,3,4,5, such that
_2
for y as large as possible.
In §3.10 we had already found
this largest y as a function of k and s.
(3.12.3)
and
~
(d)
.&2 -
The loci of
= t 22
we called inner and outer boundaries, respectively, of a
region in the (k - 1 )-dimensional space of
2k~.
Since a,
83
and yare independent probabilities, (3.12.1) and (3.12.2)
may be combined to give statements with a joint probability
~
(1 - a,)y.
The particular statements we make are that
the k -1 differences between pairs of successive odd 2k-tiles
of the distribution of b are nonnegative and have the simultaneous confidence region bounded by (3.12.3).
CHAPTER IV
.
MULTIVARIATE MODEL: DESIGNS
AND STATISTICS
4.1
General two-factor models.
Multivariate analysis of variance (hereafter called
MANOVA) is not as old or as familiar as its univariate
counterpart now widely known as ANOVA.
In this chapter and
the next, we shall, in so far as is feasible, treat the
multivariate model like the univariate model, thus utilizing
the results of Chapters II and III.
By a multivariate experiment we mean the kind of
experiment intended to show the effect of one or more factors
on a certain response (just as a univariate experiment for
ANOVA) but called "multivariate" because both the response
and the factors are evaluated by observations on not just
one characteristic but several characteristics of the experimental units involved in the experiment.
Thus a certain
experiment may be planned in which t treatments are applied
to n experimental units arranged in s blocks according to
some experimental design.
If the experimenter observes only
one characteristic of each experimental unit, the model is
univariate, and the
as components of the
observa~ion~
ve~tor
84
l(n).
are conveniently arranged
But if his model is
85
multivariate, the experimenter observes the same p different
(but presumably related) characteristics of every experimental unit.
Such observations are conveniently arranged as
elements of the matrix yen x p).
The model would tell the
number of factors--number of ways of classification--and the
number of characteristics believed necessary to characterize
each cell of the cross classification.
Some of these char-
acteristics might be qualitative and others quantitative.
The experiment might reveal that one or more of the characteristics originally believed necessary could conveniently
be dropped from the model.
But whatever characteristics are
to be observed for multivariate analysis of variance, the
experimental design tells how to distribute the experimental
units among the cells of the cross classification.
The p
cols. of yen x p) are related to corresponding cols. of the
matrix of postulated treatment effects and the matrix of
postulated block effects by the identical structure matrix.
Thus in MANOVA the p different characteristics could not
possibly have different experimental designs.
Each level of
each factor is assumed, tentatively at least, to have an
"influence" on each of the p characteristics.
Hence for each
treatment and for each block there is not just a single effect
but a set of p effects, written as a row vector.
Thus for
a two-factor multivariate model we have, analogous to (2.1.1),
(4.1.1)
Y (n x p)
= M(n x s+ t ) - (t x
~
pJ
B(s x p)
+ E(n x p) ,
86
where the structure matrix
Mis
exactly the same as in §2.1.
The elements of g are assumed to be normally distributed
and of the nature of errors.
ANOVA model, the rows of
~
Analogous to the errors in the
are assumed to be uncorrelated,
but we do allow and expect correlation between elements in
the same row.
Thus we say every col. of g' (p x n) is
N[~,~(p)],
and the i-th col. of g is N[Q,aiii(n)], where a ii
is the i-th element in the principal diagonal of ~(p). As
in Chapter II we do not specify the nature of the treatment
effects A(t x p) or the block effects la(s x p) except to say
that if random, they are independent of E and
~
is independent
of A.
4.2
Suitable statistics for MANOVA.
Because of the fact that the structure matrix
M(n x 5+1) of (4.1.1) is the same in every respect as the
structure matrix of §2.1, we have in the first four sections
of Chapter II given both the motivation and the detailed
derivation of certain matric functions of M which we now use
to transform the multivariate observations
y. Analogous to
(2.2.4) we get
(4.2.1)
[ Ii(t:r x t) , Q (r:r x s:T') ] [MiMI ] -1 Mi'i.
which is a set of (t='f x p) statistics entirely free of block
effects.
And analogous to (2.3.1) we get
which is a s~t of .(5=1 x p) statistics entirely free of
87
treatment effects.
\
'"
lo(t-1) and
Using the same triangular matrix factors
11 (s-1)
defined in §2.2 and §2.3 r~spectively,
we define
Y( -t-1 x P )
-
I ] -, I
To !i,Q ] [MIMI
MIl:
[ .... -1
and
y( s -1
x p) :: [ Q,
1;1 ~][ MiMI] -, Mi Yo
which are analogous to (2.2.6) and (2.3.3) in that each set
of statistics is related to the original matrix of observations by an orthonormal transformation.
The third set of
statistics, free of both treatment effects and block effects,
is defined by the same matrix L*(n-s-t+1 x n) used in §2.4:
•
Using the same steps as in Chapter II, we can express these
new matrices of statistics in terms of the unobservables of
the model.
o
Y = T 1 tIa + 10
y
= T;'!:ill. + 11
~
=
•
b~
Thus regardless of the nature of a(t':T' x p) and £!(s:1" x p),
each of Wo , W and! results from an orthonormal transfor"
mation on E alone. Using this fact and the nature of E as
stipulated in §4.1. we can determine the nature of these
error matrices.
88
4.3
Variances and covariances for a matrix Qf normal variates.
It is • well established custom to exhibit the n vari-
•
ances and (~} covariances of a set of n normal variates as
elements of an n x n sym. matrix, displaying each covariance
twice.
Thus we say a stochastic vectQr has a variance-
covariance !latrix.
In similar manner the elements of an mx n
matrix couW first be written as components of a vector, and
then the m~ variances and the (m;) covariances of those elements could be displayed as elements of an (mn x mn) sym.
matrix.
But it would certainly be desirable to arrange the
variance~
and covariances in such a way that properties of
rows of the original variates and properties of cols. of the
original variates are readily apparent from the large matrix.
Starting with the matrix k(m x n), we replace it with
the vector ..(mn) :: [k(m x n) ·x len) Jh(n 2 ), which by
Prop. 1.2.6 separates the rows of
~
but keeps consecutive
in .. consecutive elements in the same col.
of~.
Hence the
n • (~) intracol. covariances will appear (twice) as the nondiagonal elements of the n principal mx m submatrices.
On
the other hand, the m' (~) intrarow covariances will appear
(twice) as the diagonal elements of the remaining (nonprincipal) mx m submatrices.
The mn(m-1) (n-1 )/2 covariances
of elements not in either the same row or the same col. of
will appear (four times) as the nondiagonal elements of the
nonprincipal mx m submatrices.
The variances of the ele-
ments of Z are the elements of the principal diagonal,
~
89
consecutive elements in the same col. of
~
having their
variances consecutively ordered in the same subset of
diagonal elements.
In choosing to use the above scheme for rearranging
the elements of Z(m x n) into the components of
~(mn),
we
imposed a certain pattern on the matrix we have been describing which suggests that this latter matrix might be a
Kronecker product of two simpler matrices, say
(4.3.1)
But (4.3.1) is a further restriction on the kind of interdependence among the elements of
Prop. 1.2.6.
~
and does not follow from
However, if each col. of
has the variance
~
matrix L (m) and if the n cols. are independent,
-c
(4.3.2)
And if each row of
~
has the variance matrix Lr(n) and it
the m rows are independent,
L(mn) = -I(m) ·x -r
L (n)
-
•
Starting with a matrix £,(m x n) whose variances and
covariances are given by the matrix
~(mn)
tion described above, we may make a
transfo~mation
(4.3.4)
in th9 configura-
y(q x n) :: l(q x m)Z(m x n)
and then want the covariances and variances of the elements
of
y. Applying Prop. 1.2.6 to (4.3.4) we get a vector
90
which, by Prop. 1.1.3, can be written as
which is easily recognized as
Thus premul tiplying the mattix
~(m
x n) by the matrix l{q x m)
corresponds to premultiplying the vector 1(mn) by the matrix
[l(q x m) ·x l(n)].
the elements of
(4.3.6)
~
Hence the variances and covariances of
appear a$ elements of the matrix
[I (q x m) •x 1 ( n) ] ~ (mn )[ l' (m x q) •x 1.( n)] •
This qn x qn matrix will also have distinctive patterns if
~(mn)
does.
Thus corresponding respectively to (4.3.1),
(4.3.2), and (4.3.3) we have (4.3.6) reducing to
(4.3.7)
I
~1(m)1'
·x
(4.3.8)
I
~c(m)l'
·x len)
(4.3.9)
I I' ·x
~a{n)
.,
.,
~r(n) •
Now comparing (4.3.9) and (4.3.3) , we see that they would
have the same form if and only if II'
= l(m).
In other words,
an orthogonal or orthonormal transformation applied to a
matrix whose rows are uncorrelated yields a new matrix whose
rows are also uncorrelated, and each row of the new matrix
has the same variance matrix as each row of the given matrix.
Since ,&(n x p) as described in §4.1 would have a variance matrix of the form of (4.3.3), viz., len) ·x
~(p),
then
10 (N x p), 11 (s:T x p), and !(n-r* x p) are matrices of normal
91
variates (with zero means) whose variance matrices are
respectively
1(s-1)
·x~(p).
Thus the orthonormal transformations on
matrices
U.
of y and
~
~.
and l(n-r *)
·x~(p)
•
y which gave us the
and! preserved for the normal error components
and for! itself the same type or pattern of vari-
ance matrix as was assumed for li.
4.4
Rest;icted designs and multifactor models.
In §2.5 we considered some restrictions on the designs
in order to obtain in
~t~
a particularly simple and meaning-
ful quadratic form in Q(s-1).
With a multivariate model no
single quadratic form will be an adequate representative of
the block effects R(s:1 x p), but the matrix Q1 (n x n) defined
in §2.3 will now determine both quadratic forms and bilinear
forms in the rows and cols.
~1
Hence simplification of
of~.
is still desirable and will be effected by the same three
restrictions on designs which were introduced in §2.5.
Multifactor or multidimensional models are possible in MANOVA no less than in ANOVA. Such a model may be expressed by

(4.4.1)    Y(n x p) = M(n x t+s) [A'(p x t), B'(p x s)]' + E(n x p) ,

where now

(4.4.2)    B'(p x s) = [B_1'(p x s_1), B_2'(p x s_2), ..., B_m'(p x s_m)]

and (4.4.3) partitions the structure matrix M(n x t+s) exactly as in (2.7.2), with, of course, s = s_1 + s_2 + ... + s_m. Here (4.4.1) corresponds to (2.7.1), but (4.4.3) is identical with (2.7.2).
Thus when m > 1, the only designs compatible with the subsequent development in Chapter V are the orthogonal designs as discussed and characterized in §2.8. Whether or not m > 1, we specify that the n cols. of E' are mutually independent and each col. of E' is N[0, Σ(p)]. Using on Y(n x p) the transformations (2.8.10) and (2.8.11) used on y(n), we obtain m+1 matrices of statistics whose relation to the unobservables of the model is thus known to be
(4.4.4)    U(t-1 x p) = b_0 H̃(t-1 x t) A(t x p) + U_0(t-1 x p) ,
           V_1(s_1-1 x p) = b_1 H̃(s_1-1 x s_1) B_1(s_1 x p) + W_1(s_1-1 x p) ,
           . . .
           V_m(s_m-1 x p) = b_m H̃(s_m-1 x s_m) B_m(s_m x p) + W_m(s_m-1 x p) ,

where the b's are the same constants defined in (2.8.10).
And from the structure matrix M of (4.4.3) we obtain the basis M_1 defined in (2.8.4), factor it as in §2.4, and then from L obtain the completion L*(n-r* x n) which makes [L', L*'] an n x n orthogonal matrix. Then L* is orthonormal, and the matrix of statistics W(n-r* x p) is defined by

(4.4.5)    W(n-r* x p) = L*(n-r* x n) Y(n x p)

and known from the argument in §2.4 to be free of all treatment effects and all block effects. In fact,
(4.4.6)    W(n-r* x p) = L*(n-r* x n) E(n x p) .

Here, as elsewhere, r* denotes the rank of the structure matrix regardless of m. In agreeing to use only orthogonal designs when m > 1, we have insured the independence of W_1, W_2, ..., W_m. If B_1, B_2, ..., B_m are known or assumed to be mutually independent (as well as independent of E) when stochastic, then V_1, V_2, ..., V_m will also be mutually independent and each of the m V's can be used in the same way as V for a two-factor model. Hence, as in §2.9, we shall not continue to consider separately two-factor and multifactor models. In §4.5, and in Chapter V, we shall avoid unnecessary complexities in notation by letting the single symbol V(s-1 x p) represent any one of V_1(s_1-1 x p), V_2(s_2-1 x p), ..., V_m(s_m-1 x p) whenever m > 1.
4.5  Multivariate analogs of ANOVA sums of squares.

In Chapter II, after defining u, v, and w, we formed the sums of squares u'u, v'v, and w'w. Analogously we use the matrices defined in (4.2.3), (4.2.4), and (4.2.5) to form

(4.5.1)    U'(p x t-1) U(t-1 x p) = Y'(p x n) Q_0 Y(n x p) ;
           V'(p x s-1) V(s-1 x p) = Y'(p x n) Q_1 Y(n x p) ;
           W'(p x n-r*) W(n-r* x p) = Y'(p x n) Q Y(n x p) .

The three Q's occurring in (4.5.1) are exactly the same sym. matrices defined in Chapter II, but the complete expressions are also sym. matrices rather than scalar sums of squares.
94
AS was indicated in the last sectioh, the same restrictions
on designs found desirable for univariate variance components
analysis would produce corresponding simplifications in
(4.5.1) and will prove equally desirable for multivariate
variance components analysis.
Henceforth the 9's used will
be understood to be those obtained when the experimental
design is subject to the three restrictions of §2.5 (and
is thus the kind of design called "linked block").
y'u are sums of squares,
which are quadratic forms in the p cols. of y. The nondiagonal elements of y'y are sums of products, which are
The diagonal elements of
bilinear forms in two different cols. of Y.
But each ele-
y'u, whether a quadratic form or a bilinear form,
ment of
is obtained from the appropriate data in accordance with the
same pattern given by 90.
said about
and
!'~
Of course, what has just been
y'u can be said, mutatis mutandi,§" about Y.'Y.
.
From the description of E(n x p) given in §4.1 we know that the p.d.f. of E' may be written as the joint p.d.f. of n independent and identically distributed p-variate normals:

(4.5.2)    (2π)^{-np/2} |Σ|^{-n/2} exp{-(1/2) tr[Σ^{-1} E'E]} .

Also it follows that

ℰ(E'E) = nΣ .

Now so long as n ≥ p, it is generally said that E'E/n has the Wishart distribution with n d.f. If n < p, E'E becomes p.s.d. instead of p.d. and its p.d.f. becomes zero. Nevertheless, Gnanadesikan introduced in [6] the term "pseudo-Wishart distribution" and applied it to at least p.s.d. matrices of the form E'E/n where (4.5.2) is the p.d.f. of E'.
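The rank deficiency behind the pseudo-Wishart case is easy to exhibit. A small sketch (ours; it assumes NumPy, and the sizes are arbitrary) shows that E'E is p.d. when n ≥ p but only p.s.d. of rank n when n < p:

    import numpy as np

    rng = np.random.default_rng(1)
    p = 6
    for n in (10, 4):                          # first n >= p, then n < p
        E = rng.standard_normal((n, p))        # n independent p-variate normal rows
        S = E.T @ E                            # the p x p sum-of-products matrix
        print(n, np.linalg.matrix_rank(S))     # rank is min(n, p): 6, then 4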
In §4.3 we saw that U, V, and W were such that U_0, W_1, and W were matrices of normal variates with the same pattern of variance matrix as E. In other words, if n were replaced by t-1, s-1, and n-r*, respectively, (4.5.2) would be the p.d.f. of U_0, W_1, and W. Hence U_0'U_0/(t-1), W_1'W_1/(s-1), and W'W/(n-r*) would be said to have pseudo-Wishart distributions with t-1, s-1, and n-r* d.f. respectively. The matrices U'U/(t-1) and V'V/(s-1) will not in general have pseudo-Wishart distributions but will have under certain conditions to be considered later.
CHAPTER V

MULTIVARIATE MODEL: TESTS AND CONFIDENCE BOUNDS

5.1  "Step-down" and multivariate m-tiles.

Before taking up specific tests, it seems advisable to consider a feature of multivariate models sometimes overlooked but recently noted and utilized in a new test procedure.
The theoretician advancing from univariate to multivariate models is inclined to postulate some generalization of the univariate normal which will retain many of the latter's simplicities. But the resulting multivariate normal may have unintended simplicities of a distinctively multivariate kind. One of these is the fact that the multinormal distribution allows for no natural or preferential ranking of the p different characteristics being observed on each experimental unit--all variates enter symmetrically into the p.d.f. However desirable this may be as a mathematical convenience, we should recognize that it is just that. For in a practical approach to a multivariate experiment, the p variates to be observed are apt to be selected, one at a time, in the order of their interest or presumed relevance in the total context of the experiment. In keeping with what seems the more realistic approach, we adopt the convention
that in all subsequent references to multivariate models, the subscripts 1, 2, ..., p on col. vectors of A, B, E, Y, U, V, or W will indicate, respectively, the most important, the next most important, ..., the least important characteristic.
This natural ordering of the p characteristics prior to the experiment underlies the recently proposed "step-down procedure" of J. Roy [12], in which hypotheses about a p-variate distribution are accepted if and only if a particular sequence of conditional univariate hypotheses would be accepted. But since he applies this procedure only to a multivariate normal, and since the quantities he calls "step-down variances" are so defined that they would be variances of conditional distributions only when (as in the normal) conditional variances do not depend upon specific values of the given (fixed) variates, it thus appears that his only use for the a priori order among the variates is to justify using a sequence of univariate tests. On the other hand, we believe that this notion of a priori order among the characteristics is essential to a general treatment of multivariate models in MANOVA.
In Chapter III we proposed to characterize a nonnormal (or not-necessarily-normal) population by its 2k-tiles and then showed how to obtain confidence bounds on differences between successive pairs of odd 2k-tiles. In the present chapter we shall attempt a similar approach to the multivariate model with random block effects. Consistent with the notation introduced in §3.9, in a p-variate context mβ_{n_1} will denote the n_1-th m-tile of the marginal distribution of the first variate. The symbol mβ_{n_1 n_2} will denote the n_2-th m-tile of the conditional distribution of the second variate given that the first variate lies below the (n_1+1)-th m-tile and not below the (n_1-1)-th m-tile of its marginal distribution. Similarly mβ_{n_1 n_2 n_3} will denote the n_3-th m-tile of the conditional distribution of the third variate given that the second variate lies below the (n_2+1)-th m-tile and not below the (n_2-1)-th m-tile of its conditional distribution, etc.
As in Chapter III we consider only even values for m, say 2k, and only odd values for the n_j. If now we use b_i for any element in the i-th col. of B(s x p), we have

(5.1.1)    2kβ_{n_1 - 1} ≤ b_1 ≤ 2kβ_{n_1 + 1} ,
           2kβ_{n_1, n_2 - 1} ≤ b_2 ≤ 2kβ_{n_1, n_2 + 1} ,  etc.

Combining the above we get

2kβ_{n_1 ... n_{p-1}, n_p - 1} ≤ b_p ≤ 2kβ_{n_1 ... n_{p-1}, n_p + 1} .
Although these concepts and notational conventions are general enough for any integral values of k and p, we shall in this chapter be concerned with those situations in which both k and p are small.

For the simplest such situation, with k = 2 and p = 2, we shall now illustrate the ability of the differences between pairs of these m-tiles to describe and distinguish certain types of relationship between the variates. In each of the following diagrams the abscissa represents values of the first variate and the ordinate, the second variate. No scale is shown because the diagrams show types of relations, not particular distributions. In each diagram there is a vertical dashed line at the median, and there are vertical solid lines at the first and third quartiles of the first variate. There are also in each diagram horizontal dashed half-lines at the medians and horizontal solid half-lines at the odd quartiles of the conditional distributions of the second variate (given which side of the median the first variate is on).
100
f--'-+._-'- . 4~ 3 3
~-- -
--:.---1----
4~ 81
I
I
I
4~ 1
Type 1:
4~S
No simple relation.
4~83
4~ 18 ----+-....,
I
- 'l
_-+---- 4~S 1
4~1
Type
2:
4~18
-
4~11
=4~BS
4~8
4~3' and 4~3'
-
-
4~11
=4Bs3
-
4~'3
•
4~38
__1
_
1
I
4~3
Type 3: 4~SS-4~13 = -(4~31-4B,,) and 4~38-4~S1 = 4~'8-4P11
-
2(4~31-4P11)
4~ 1 S ---+---1----+---.-- 4~S8
I
______ L
_
I
I
4
~ 1 ~ ---r---rI----I----4~S 1
I
I
4~'
I
4~3
•
101
Of these four types it is obvious that 1 is the most general and 4 is the most restricted, with 2 and 3 as distinctive special cases. In type 2, conditional distributions of the second variate, given different values of the first variate, have the same dispersion, as measured by interquartile differences; but the variates are "dependent" in the sense (i) that medians of the conditional distributions of the second variate are different for different values of the first variate. In type 3, on the other hand, the variates are not "dependent" in this sense but are "dependent" in the sense (ii) that conditional distributions of the second variate have different dispersions, as measured by interquartile differences, for different values of the first variate. In type 4, the variates are not "dependent" in either sense (i) or sense (ii). In type 1, it may be noted that the variates are dependent in both senses. (Type 4 for a normal distribution would imply independence.)
In each of the above diagrams the network of dashed and solid lines and half-lines partitions the total range into sixteen equiprobable regions. What corresponds to the interquartile range of a single variate is here represented by the innermost four of these sixteen regions. This union of four rectangular regions is itself rectangular only for type 4; in general it will be a re-entrant octagon. To determine the octagon exactly requires knowing seven quartiles--4β_1, 4β_2, 4β_3, 4β_11, 4β_13, 4β_31, 4β_33--which are coordinates of the vertices of the octagon. But the trapezoid whose vertices have as coordinates (4β_1, 4β_11), (4β_3, 4β_31), (4β_3, 4β_33), and (4β_1, 4β_13) has two sides in common with the octagon and almost equals it in area. Because of its greater simplicity at slight loss in accuracy, it is this trapezoid (which becomes a parallelogram for type 2 and a rectangle for type 4) rather than the octagon which we shall use as the bivariate analog of the univariate interquartile range and whose shape and size we shall try to infer. Of course it should be remembered that the trapezoid is associated with the probability of 1/4, not 1/2. Analogously for a trivariate distribution, the central region would be a prism associated with the probability of 1/8. The trapezoid for the first two variates would be a cross-section of this prism, and the third coordinate of each of its eight vertices would be either the first or the third quartile of the conditional distribution of the third variate given that the first two variates are on a particular side of their medians.
5.2  Testing the hypothesis of equal fixed effects.

Throughout Chapter IV we used the labels "treatment effects" and "block effects" for the elements of the matrices A(t x p) and B(s x p), respectively. But in deriving the statistics U(t-1 x p), V(s-1 x p), and W(n-r* x p), we did not need to know whether A and B consisted of fixed effects or random effects. This fact is worth noting because, on the one hand, theoretical developments usually state all their postulates at the outset and, on the other hand, the experimenter himself may not know whether a certain factor being studied should be regarded as fixed or random. Thus even if one subsequent procedure is more appropriate than another, it is desirable to be able to start the analysis by computing the same set of statistics for both procedures. And although the chief concern of this monograph as a whole and of this chapter in particular is with random effects, we shall consider briefly some procedures for fixed effects when there are also random effects in the model.
In a univariate model all effects, whether fixed or random, are dimensionally the same and can, conceptually if not actually, be added, subtracted, or equated. But in a multivariate model the p different characteristics being observed are usually dimensionally different, so that equality of all elements of the matrix A(t x p) would not be meaningful even if numerically possible. Thus in MANOVA the phrase "equality of fixed effects" refers to equality of the t elements within each col. separately. Symbolically expressed,

(5.2.1)    a_ik = a_jk ,   i, j = 1, 2, ..., t ;  k = 1, 2, ..., p .

Expressed differently, equality of fixed effects implies the existence of some vector of constants c'(p) = [c_1, c_2, ..., c_p] such that

A(t x p) = j(t) c'(p) .

This latter form suggests using Prop. 1.2.8 and replacing the set of pt(t-1)/2 equalities in (5.2.1) by the set of only p(t-1) equalities

(5.2.2)    H̃(t-1 x t) A(t x p) = 0(t-1 x p) .

And U(t-1 x p) was so defined in (4.2.3) that

U = T_0^{-1} H̃ A + U_0 ,

where T_0^{-1} is determined by known constants of the structure matrix M and U_0 is normal with zero expectations. Thus regardless of whether Y has other fixed effects or random effects representing a second factor, the hypothesis of equality of fixed effects among the elements of A is equivalent to the hypothesis that each row of U, which is a p-variate normal, have all means equal to zero.
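The matrix H̃ itself is defined in §1.2 and is not reproduced here; for a concrete check, any (t-1) x t matrix with orthonormal rows each orthogonal to the vector of 1's will do. A sketch (ours, assuming NumPy; the Helmert-type construction below is one such choice, not necessarily the text's) verifies that H̃A = 0 exactly when A = j(t)c'(p):

    import numpy as np

    def h_tilde(m: int) -> np.ndarray:
        # (m-1) x m, rows orthonormal and each orthogonal to the vector of 1's
        H = np.zeros((m - 1, m))
        for i in range(1, m):
            H[i - 1, :i] = 1.0
            H[i - 1, i] = -float(i)
            H[i - 1] /= np.sqrt(i * (i + 1))
        return H

    t, p = 4, 3
    H = h_tilde(t)
    print(np.allclose(H @ np.ones(t), 0), np.allclose(H @ H.T, np.eye(t - 1)))

    c = np.array([1.0, 2.0, 3.0])
    A = np.ones((t, 1)) * c            # A = j(t) c'(p): equal fixed effects
    print(np.allclose(H @ A, 0))       # (5.2.2) holds; unequal rows of A would fail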
Although far more complicated than testing the hypothesis of zero mean for a univariate normal, the analogous multivariate hypothesis, e.g.,

(5.2.3)    ℰ(U) = 0 ,

has had at least four tests proposed for it. The oldest of such tests is based on the well-known λ-criterion usually associated with the names of J. Neyman and E.S. Pearson, but its extensive use in tests on multivariate normal distributions is due to S.S. Wilks. The distributional and computational aspects of Wilks' statistic have been investigated by M.S. Bartlett, C.R. Rao, and A. Wald and R.J. Brookner. More recently this approach has been adapted to general MANOVA by S.N. Roy and J. Roy. Using the λ-criterion we would reject (5.2.3) and hence (5.2.2) if and only if

|W'W| / |W'W + U'U| < c_α ,

where c_α is a constant determined by the chosen α and the given constants p, t-1, and n-r*.
A second test due to D.N. Lawley and H. Hotelling would reject (5.2.3) if and only if

tr[U'U (W'W)^{-1}] > c_α

for a suitable constant c_α determined by the chosen significance level and the same constants.

A third test was originated by and has been extensively developed by S.N. Roy. It is based upon the fact that the equality of two matrices can be replaced by a simpler determinantal equality and the fact that the distribution of the characteristic roots of certain matrices is free of nuisance parameters, depending only on certain constants called d.f. Using this test, we would reject (5.2.3) if and only if

ch_max[U'U (W'W)^{-1}] > c_α ,

where c_α depends upon the significance level α and the known constants p, t-1, and n-r*.

A fourth test for a multivariate hypothesis has been devised by J. Roy [12] from a procedure used for a different purpose by S.N. Roy and R.E. Bargmann [2]. This test consists of a sequence of F tests which are independent if taken
in a predetermined order. Although the distribution involved is familiar, the notation used by J. Roy is not familiar, and the successive statistics become more and more complicated when p > 2. Moreover, it now appears that this test is not applicable, or at least not readily adaptable, to the multifactor model in which there are nonnormal stochastic effects.

Of course it is not the purpose of this section to trace the history of tests of multivariate hypotheses or to appraise the methods currently in use. The point to be made here is that by using the statistics U and W as defined earlier, it is quite possible to test the hypothesis of equality of fixed effects A(t x p) regardless of whether B(s x p) consists of fixed effects, normal random effects, or nonnormal random effects.
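Computationally, the first three criteria above are functions of the two matrices U'U and W'W alone. A sketch (ours, assuming NumPy; the trace and largest-root expressions are the standard statements of the Lawley-Hotelling and Roy criteria, and the data below are mere placeholders):

    import numpy as np

    rng = np.random.default_rng(2)
    p, t, n, r_star = 3, 5, 30, 8
    U = rng.standard_normal((t - 1, p))        # stand-in for U(t-1 x p)
    W = rng.standard_normal((n - r_star, p))   # stand-in for W(n-r* x p)

    SU, SW = U.T @ U, W.T @ W
    M = SU @ np.linalg.inv(SW)

    wilks = np.linalg.det(SW) / np.linalg.det(SW + SU)   # the lambda-criterion
    lawley_hotelling = np.trace(M)                       # trace criterion
    roy = np.linalg.eigvals(M).real.max()                # largest characteristic root
    print(wilks, lawley_hotelling, roy)

Each statistic is then referred to its own critical constant c_α for the given p, t-1, and n-r*.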
5.3  Estimating Σ and certain simultaneous confidence bounds pertinent to Σ.

From what has already been shown in §4.5, we know

(5.3.1)    Σ̂ = W'W/(n-r*) ,  with ℰ(Σ̂) = Σ ,

regardless of whether Y has random components other than the normal elements of E. Thus W'W/(n-r*) has the Wishart distribution even when Y is not normal, and the methods used by S.N. Roy in [16] and by R. Gnanadesikan in [6] can be applied to W instead of Y. In other words, for given n-r* and p and chosen α_1, we can find two constants c_1α and c_2α such that

(5.3.2)    ch_min(W'W)/c_1α ≥ ch_min(Σ) ≥ ch_min(W'W)/c_2α
with probability ≥ 1 - α_1. Moreover, (5.3.2) and

(5.3.3)    ch_max(W'W)/c_1α ≥ ch_max(Σ) ≥ ch_max(W'W)/c_2α

hold simultaneously with joint probability ≥ 1 - α. Finally we may obtain a total of 2^p - 1 sets of simultaneous confidence intervals of the form

(5.3.4)    d'(W'W)d/c_2α ≤ d'Σd ≤ d'(W'W)d/c_1α ,

where d(p) is any nonnull vector and where successively 0, 1, ..., p-1 of the components of d are chosen to be zero. In this way (called "truncation" in [6]) simultaneous confidence bounds can be found on the individual variances which are the diagonal elements of Σ. But it must be remembered that the confidence coefficient is ≥ 1 - α for all possible choices of d in (5.3.4) and not for each choice.

It is also possible to obtain simultaneous confidence bounds on certain functions of Σ using the step-down procedure of J. Roy [12]. Using the convention that (W'W)_i and (Σ)_i denote the submatrices from the first i rows and cols. of W'W and Σ, respectively, we define for i = 1, 2, ..., p-1,
(5.3.5)    w_1^2 = (W'W)_1 ,   w_{i+1}^2 = |(W'W)_{i+1}| / |(W'W)_i| ;

(5.3.6)    σ_1^2 = (Σ)_1 ,   σ_{i+1}^2 = |(Σ)_{i+1}| / |(Σ)_i| .
Denoting the i-th col. of W(n-r* x p) by w_i, the marginal distribution of w_1 is N[0, σ_1^2 I]. The conditional distribution of w_{i+1} given w_1, w_2, ..., w_i is N[μ_{i+1}, σ_{i+1}^2 I], where the conditional means μ_{i+1} are expressible in terms of the given w's and a parameter β(i) called the i-th order step-down regression coefficient. Specifically, μ_{i+1} = [w_1, w_2, ..., w_i] β(i), where

(5.3.7)    β(i) = [(Σ)_i]^{-1} [σ_{1,i+1}, σ_{2,i+1}, ..., σ_{i,i+1}]' ,   i = 1, 2, ..., p-1 .

Analogous to the parameter β(i), there is a statistic

b(i) = [(W'W)_i]^{-1} [w_1'w_{i+1}, w_2'w_{i+1}, ..., w_i'w_{i+1}]' ,   i = 1, 2, ..., p-1 .
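Both (5.3.5) and the statistics b(i) are direct computations on W'W. A sketch (ours, assuming NumPy; W below is a placeholder for the error statistic):

    import numpy as np

    rng = np.random.default_rng(3)
    p, df = 4, 20
    W = rng.standard_normal((df, p))
    S = W.T @ W

    # step-down variances (5.3.5): successive determinant ratios
    w2 = [S[0, 0]]
    for i in range(1, p):
        w2.append(np.linalg.det(S[:i + 1, :i + 1]) / np.linalg.det(S[:i, :i]))

    # statistics b(i) = [(W'W)_i]^{-1} [w_1'w_{i+1}, ..., w_i'w_{i+1}]'
    b = [np.linalg.solve(S[:i, :i], S[:i, i]) for i in range(1, p)]
    print(np.round(w2, 3))
    print([np.round(v, 3) for v in b])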
Although the step-down procedure does not give confidence bounds on the individual elements of Σ, it does provide confidence bounds on the p "step-down variances," (5.3.6), viz.,

(5.3.9)    Pr{w_i^2/χ_{iU}^2 ≤ σ_i^2 ≤ w_i^2/χ_{iL}^2} = 1 - α_i ,

where χ_{iU}^2 and χ_{iL}^2 are upper and lower values obtained from the chi square distribution for n-r*-i+1 d.f. so chosen that the union of the regions χ^2 > χ_{iU}^2 and χ^2 < χ_{iL}^2 is a critical region of size α_i. Moreover it is possible to obtain simultaneous confidence bounds on all the step-down variances and on certain functions of the step-down regression coefficients, (5.3.7).
If a(m) is any vector of m components and T is any subset of k of the first m natural numbers, say n_1, n_2, ..., n_k, for k ≤ m, then (a_{n_1}^2 + a_{n_2}^2 + ... + a_{n_k}^2)^{1/2} is denoted by T[a(m)] and is called a T-norm of a. Similarly the submatrix consisting of the elements from the n_1-th, n_2-th, ..., n_k-th rows and cols. of any square matrix A is called a T-submatrix of A and is denoted by A(T). Then

(5.3.10)    T[b(i)] - (c_i w_{i+1}^2 ch_max{([(W'W)_i]^{-1})(T)})^{1/2} ≤ T[β(i)] ≤ T[b(i)] + (c_i w_{i+1}^2 ch_max{([(W'W)_i]^{-1})(T)})^{1/2} ,

where c_i is a preassigned positive constant, represents 2^p - p - 1 statements all of which would be true with a certain probability depending upon the variance ratio distribution. In [12] it is shown how (5.3.10) and (5.3.9) may be combined and the joint confidence coefficient found.
5.4  Estimating Σ_1 when B is normal.

For a "Model II" or "Mixed Model" in which B(s x p) consists of s independent samples, each from a p-variate normal distribution with mean μ' and variance Σ_1, Gnanadesikan obtained in [6] an estimate of Σ_1. Using exactly the same approach we get an estimate in terms of the V and W defined earlier. From (4.2.6) with the substitution of bI(s-1) for T_1^{-1}(s-1), we get

(5.4.1)    V(s-1 x p) = b H̃(s-1 x s) B(s x p) + W_1(s-1 x p) .
Because of the postulated independence of B and E,

(5.4.2)    ℰ(V'V) = b^2 ℰ(B'H̃'H̃B) + ℰ(W_1'W_1) .

And since H̃(s-1 x s) and L*(n-r* x n) are orthonormal,

ℰ(V'V) = (s-1)[b^2 Σ_1 + Σ] ,   ℰ(W'W) = (n-r*)Σ .

Thus we have for an unbiased estimate of Σ_1

(5.4.3)    Σ̂_1 = (1/b^2)[V'V/(s-1) - W'W/(n-r*)] .
Except for notation this is the same estimate as that obtained in [6], but we have shown it unnecessary to assume that Y itself is normal or that Σ_1 is proportional to Σ.
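Computationally (5.4.3) is a one-line estimate. A sketch (ours, assuming NumPy; V, W, and the linked block constant b are whatever the design supplies):

    import numpy as np

    def sigma1_hat(V: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
        """Unbiased estimate (5.4.3): (V'V/(s-1) - W'W/(n-r*)) / b^2."""
        s_minus_1 = V.shape[0]          # V is (s-1) x p
        n_minus_rstar = W.shape[0]      # W is (n-r*) x p
        return (V.T @ V / s_minus_1 - W.T @ W / n_minus_rstar) / b**2

Like its univariate counterpart, the estimate is unbiased but, being a difference of two matrices, need not be p.s.d. in any particular sample.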
5.5  Obstacles to confidence bounds when Σ_1 ≠ σ_b^2 Σ.
In §5.3, besides obtaining an unbiased estimate of Σ, we placed confidence bounds on ch_max(Σ) and ch_min(Σ)--as had been done before, assuming Y normal. Might it also be possible, when B is normal and Σ_1 is the variance matrix of each row of B, to find confidence bounds for ch_max(Σ_1) and ch_min(Σ_1)? Analogous to (5.3.2) and (5.3.3), we have

(5.5.1)    ch_min(V'V)/c_1α* ≥ ch_min(b^2 Σ_1 + Σ) ≥ ch_min(V'V)/c_2α* ,
           ch_max(V'V)/c_1α* ≥ ch_max(b^2 Σ_1 + Σ) ≥ ch_max(V'V)/c_2α* ,

with joint confidence coefficient ≥ 1 - α*, where c_1α* and c_2α* are the lower and upper values suitably chosen from the distribution of characteristic roots for given s-1 and p.
Since V and W are independent, (5.5.1) could be combined with (5.3.3) and a joint confidence coefficient easily found. However, since ch(b^2 Σ_1 + Σ) and b^2 ch(Σ_1) + ch(Σ) are not in general equal, combining (5.5.1) and (5.3.3) does not yield confidence bounds on ch(Σ_1) explicitly. Of course, if Σ_1 = σ_b^2 Σ, where σ_b^2 is some unknown scalar proportionality constant, then ch(b^2 Σ_1 + Σ) = (b^2 σ_b^2 + 1) ch(Σ), and (5.5.1) could be combined with (5.3.3) to yield

(5.5.2)    [ch_max(V'V)/c_2α* - ch_max(W'W)/c_1α]/b^2 ≤ ch_max(Σ_1) ≤ [ch_max(V'V)/c_1α* - ch_max(W'W)/c_2α]/b^2 .

Here in this dual statement, with confidence coefficient ≥ (1 - α)(1 - α*), we have the multivariate analog of the univariate confidence bounds given in (3.5.4) above and in (3.7.5) of [6]. But (5.5.2) was possible only on the
assumption that Σ_1 is known to be proportional to Σ prior to the experiment. The MANOVA model with this assumption was called the "restricted Model II" in [6]. Although this assumption was made in [6], the results were expressed not as (5.5.2) above but as simultaneous confidence bounds on ch(Σ) and on σ_b^2.
It would be gratifying if we could obtain confidence bounds on ch(Σ_1) from the estimate of Σ_1 given in (5.4.3) by the same steps used to obtain the confidence bounds (5.3.2) from the estimate Σ̂ given in (5.3.1). But this latter derivation was possible because W'W/(n-r*) had a Wishart distribution and the distribution of W'WΣ^{-1} is nuisance-parameter-free, whereas the distribution of [V'V/(s-1) - W'W/(n-r*)]Σ_1^{-1} is not known and almost certainly is not of the same type as that of W'WΣ^{-1}.
Still another possible approach to confidence bounds on Σ_1 when Σ_1 ≠ σ_b^2 Σ is through the step-down procedure mentioned in §5.1 and illustrated more explicitly in §5.3. Using the same convention that (A)_i denotes the i x i submatrix from the first i rows and cols. of a square matrix A, we define

(5.5.3)    v_1^2 = (V'V)_1 ,   v_{i+1}^2 = |(V'V)_{i+1}| / |(V'V)_i| ;

(5.5.4)    ν_1^2 = (b^2 Σ_1 + Σ)_1 ,   ν_{i+1}^2 = |(b^2 Σ_1 + Σ)_{i+1}| / |(b^2 Σ_1 + Σ)_i| ;

(5.5.5)    z(i) = [(V'V)_i]^{-1} [v_1'v_{i+1}, v_2'v_{i+1}, ..., v_i'v_{i+1}]' ;

(5.5.6)    ζ(i) = [(b^2 Σ_1 + Σ)_i]^{-1} [ν_{1,i+1}, ν_{2,i+1}, ..., ν_{i,i+1}]' ,

where ν_{j,i+1} is the element from the j-th row, (i+1)-th col. of [b^2 Σ_1 + Σ]. In all of the above definitions, i = 1, 2, ..., p-1.
It should be noted that for k ≤ j < i, the k-th component of ζ(i) is not the same as the k-th component of ζ(j). The p quantities of the form ν^2, like the p quantities defined in (5.3.6), are called step-down variances. The p(p-1)/2 quantities of the form ζ, like the quantities defined in (5.3.7), are called step-down regression coefficients.
Now provided s-1 ≥ p, V'V/(s-1) has the Wishart distribution, where ℰ(V'V)/(s-1) = b^2 Σ_1 + Σ rather than Σ. The marginal distribution of v_1^2/ν_1^2 is that of a chi square variate with s-1 d.f. And for a given (fixed) (V'V)_i, the conditional distribution of v_{i+1}^2/ν_{i+1}^2 is that of a chi square variate with s-1-i d.f. for i = 1, 2, ..., p-1. Moreover, for given (V'V)_i, the conditional distribution of

(5.5.7)    [z(i) - ζ(i)]' (V'V)_i [z(i) - ζ(i)] / ν_{i+1}^2
is that of a chi square variate with i d.f. In [12] it is indicated how to obtain simultaneous confidence bounds on the ν_i^2 and T-norms of ζ(i), or on the σ_i^2 and T-norms of β(i). Suppose these confidence statements included such inequalities as c_1 ≤ ν_i^2 ≤ c_2 and c_3 ≤ σ_i^2 ≤ c_4. Then they would imply

(5.5.8)    c_1 - c_4 ≤ ν_i^2 - σ_i^2 ≤ c_2 - c_3 .

Is ν_i^2 - σ_i^2 an element of, or a function of elements of, Σ_1 alone? It is not difficult to combine (5.5.4) with (5.5.6) and show that

(5.5.9)    ν_{i+1}^2 = ν_{i+1,i+1} - (ν_{1,i+1}, ν_{2,i+1}, ..., ν_{i,i+1}) ζ(i) ,

and hence also from (5.3.6) and (5.3.7) show that

(5.5.10)   σ_{i+1}^2 = σ_{i+1,i+1} - (σ_{1,i+1}, σ_{2,i+1}, ..., σ_{i,i+1}) β(i) .

Now substituting (5.5.9) and (5.5.10) into (5.5.8) gives

(5.5.11)   c_1 - c_4 ≤ ν_{i+1,i+1} - σ_{i+1,i+1} - (ν_{1,i+1}, ..., ν_{i,i+1})ζ(i) + (σ_{1,i+1}, ..., σ_{i,i+1})β(i) ≤ c_2 - c_3 .
But for i = 0, 1, ..., p-1,

(5.5.12)   ν_{i+1,j} = b^2 (σ_1)_{i+1,j} + σ_{i+1,j} ,

where (σ_1)_{i+1,j} denotes the element in the (i+1)-th row, j-th col. of Σ_1. Substituting (5.5.12) into (5.5.11) yields

(5.5.13)   (c_1 - c_4)/b^2 ≤ (σ_1)_{i+1,i+1} + (1/b^2)(σ_{i+1,1}, ..., σ_{i+1,i})[β(i) - ζ(i)] - ((σ_1)_{i+1,1}, ..., (σ_1)_{i+1,i})ζ(i) ≤ (c_2 - c_3)/b^2 .

But it is clear that (5.5.13) could not be the form of confidence statements on the elements of Σ_1 because of the unknown parameters β(i) and ζ(i). Thus to date the step-down procedure has not been successful as a method of obtaining confidence bounds on Σ_1.
5.6  Preliminary or quasi-confidence bounds on B'H̃'H̃B.

Using the matrix V(s-1 x p) defined in (4.2.4), we can formally obtain a matrix denoted by W_1(s-1 x p) in (4.2.6):

(5.6.1)    W_1(s-1 x p) = V(s-1 x p) - T_1^{-1} H̃ B .

This is not a matrix of statistics, in the strict sense, because it cannot be computed directly from the observations Y(n x p). Moreover, although the matrix B(s x p) formally appears in the right member of (5.6.1), we know from the postulated independence of B and E that V - T_1^{-1}H̃B is a p-variate normal, quite independent of B, even when B is stochastic. From §4.3 we know that
(5.6.2)    ℰ(W) = 0(n-r* x p) ;   var(W) = I(n-r*) ·x Σ(p) ;
           ℰ(V - T_1^{-1}H̃B) = 0(s-1 x p) ;   var(V - T_1^{-1}H̃B) = I(s-1) ·x Σ(p) .

Since we agree to use n ≥ p + r*, W'W is p.d. (a.e.), but the rank of [V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B] is min(s-1, p). Moreover,

ℰ{[V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]/(s-1)} = Σ ,   ℰ{W'W/(n-r*)} = Σ .
Thus [V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]/(s-1) and W'W/(n-r*) have pseudo-Wishart distributions of the same type, differing only in d.f., regardless of B or Σ_1.

With any nonnull vector d(p), we can obtain from the above two sym. matrices two scalars, d'[V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]d and d'W'Wd. By Lemma 1.4.6 we know

(5.6.3)    sup_d {d'[V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]d / d'W'Wd} = ch_max{[V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B](W'W)^{-1}} .

But for two pseudo-Wishart matrices related to each other as [V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B] and W'W, the right member of (5.6.3) has a distribution which is known and partially tabulated. Thus depending upon the known constants s-1, n-r*, and p, and the chosen α, we can find a constant c_α such that

(5.6.4)    Pr{ch_max([V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B](W'W)^{-1}) ≤ c_α} = 1 - α .

Then by Lemma 1.4.8

(5.6.5)    Pr{ch_max([V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]) ≤ c_α ch_max(W'W)} ≥ 1 - α .

The inequality within the braces of (5.6.5) implies, by Lemma 1.4.4, the inequality

d'[V - T_1^{-1}H̃B]'[V - T_1^{-1}H̃B]d / d'd ≤ c_α ch_max(W'W)

for all nonnull vectors d. By Lemma 1.4.5 this inequality may be replaced by

(5.6.6)    |e'[V - T_1^{-1}H̃B]d| ≤ [c_α ch_max(W'W)]^{1/2}
for all unit vectors d(p) and e(s-1). From this it follows that

(5.6.7)    e'Vd - [c_α ch_max(W'W)]^{1/2} ≤ e'T_1^{-1}H̃Bd ≤ e'Vd + [c_α ch_max(W'W)]^{1/2}

for all unit vectors d and e. It is easy to see that e'Vd ≤ sup(e'Vd) for all choices of d and e, including that choice which maximizes e'T_1^{-1}H̃Bd. Similarly e'T_1^{-1}H̃Bd ≤ sup{e'T_1^{-1}H̃Bd} for all unit vectors d and e, including the choice which maximizes e'Vd. Then (5.6.7) implies (but is not implied by)

sup(e'Vd) - [c_α ch_max(W'W)]^{1/2} ≤ sup{e'T_1^{-1}H̃Bd} ≤ sup(e'Vd) + [c_α ch_max(W'W)]^{1/2} .

Using Lemma 1.4.5 again, we may write this as
(5.6.8)    [ch_max(V'V)]^{1/2} - [c_α ch_max(W'W)]^{1/2} ≤ [ch_max(B'H̃'(T_1^{-1})'T_1^{-1}H̃B)]^{1/2} ≤ [ch_max(V'V)]^{1/2} + [c_α ch_max(W'W)]^{1/2} .

In the central term of this inequality is the s x s matrix [H̃'(T_1^{-1})'T_1^{-1}H̃], which in §2.5 has been expressed as [M_1'M_1 - M_1'M_0(M_0'M_0)^{-1}M_0'M_1] in terms of the submatrices of the structure matrix M for any design and has been further simplified to b^2 H̃'H̃ for those designs called linked block. Making this latter replacement in (5.6.8), we get

(5.6.9)    [b^{-2} ch_max(V'V)]^{1/2} - [c_α b^{-2} ch_max(W'W)]^{1/2} ≤ [ch_max(B'H̃'H̃B)]^{1/2} ≤ [b^{-2} ch_max(V'V)]^{1/2} + [c_α b^{-2} ch_max(W'W)]^{1/2} .

Since (5.6.9) is implied by the inequality within the braces of (5.6.4), (5.6.9) is true with probability not less than 1 - α. For the restricted designs of §2.5, we thus have preliminary or quasi-confidence bounds on B'H̃'H̃B analogous to (3.6.4) for the univariate model. We call them quasi-confidence bounds because B'H̃'H̃B is not a parametric function, but neither is it a matrix of computable statistics. We call them preliminary because they are a useful step toward bounds on parametric functions.
A further result can be obtained from (5.6.6). Its left member is ≥ |e'Vd| - |e'T_1^{-1}H̃Bd|. Hence

(5.6.10)   |e'Vd| - |e'T_1^{-1}H̃Bd| ≤ [c_α ch_max(W'W)]^{1/2}

and

|e'Vd| - [c_α ch_max(W'W)]^{1/2} ≤ |e'T_1^{-1}H̃Bd| .

Now since |e'Vd| ≥ inf|e'Vd| for all choices of unit vectors d and e, including that choice which minimizes |e'T_1^{-1}H̃Bd|, (5.6.10) implies (but is not implied by)

inf|e'Vd| - [c_α ch_max(W'W)]^{1/2} ≤ inf|e'T_1^{-1}H̃Bd| .

Then by Lemma 1.4.5 we may write this as
(5.6.11)   [ch_min(V'V)]^{1/2} - [c_α ch_max(W'W)]^{1/2} ≤ [ch_min(B'H̃'(T_1^{-1})'T_1^{-1}H̃B)]^{1/2} .

Combining (5.6.11) with (5.6.8) gives

(5.6.12)   [ch_min(V'V)]^{1/2} - [c_α ch_max(W'W)]^{1/2} ≤ [ch_min(B'H̃'(T_1^{-1})'T_1^{-1}H̃B)]^{1/2} ≤ [ch_max(B'H̃'(T_1^{-1})'T_1^{-1}H̃B)]^{1/2} ≤ [ch_max(V'V)]^{1/2} + [c_α ch_max(W'W)]^{1/2} .

For the restricted designs of §2.5, this becomes

(5.6.13)   [b^{-2} ch_min(V'V)]^{1/2} - [c_α b^{-2} ch_max(W'W)]^{1/2} ≤ [ch_min(B'H̃'H̃B)]^{1/2} ≤ [ch_max(B'H̃'H̃B)]^{1/2} ≤ [b^{-2} ch_max(V'V)]^{1/2} + [c_α b^{-2} ch_max(W'W)]^{1/2} .

Since this inequality is implied by the inequality within the braces of (5.6.4), it provides quasi-confidence bounds on ch(B'H̃'H̃B) with a confidence coefficient not less than 1 - α. Thus (5.6.13), as well as (5.6.9), is a multivariate analog of (3.6.4).
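Given V, W, the linked block constant b, and the tabulated largest-root constant c_α for (s-1, n-r*, p), the quasi-confidence bounds (5.6.13) are directly computable. A sketch (ours, assuming NumPy; c_alpha must be supplied from the tables referred to above):

    import numpy as np

    def quasi_bounds(V, W, b, c_alpha):
        """Bounds (5.6.13) on ch(B'H'HB)^(1/2) for a linked block design."""
        roots_V = np.linalg.eigvalsh(V.T @ V)     # ascending characteristic roots
        w_max = np.linalg.eigvalsh(W.T @ W)[-1]
        lower = np.sqrt(roots_V[0]) / b - np.sqrt(c_alpha * w_max) / b
        upper = np.sqrt(roots_V[-1]) / b + np.sqrt(c_alpha * w_max) / b
        return max(lower, 0.0), upper

The returned pair is exactly the (t_1, t_2) defined again in (5.8.3) below.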
5.7  Confidence bounds on ch(Σ_1) when B is normal and Σ_1 ≠ σ_b^2 Σ.

Let us again suppose that the rows of B(s x p) are independently distributed and that each row has the identical p-variate normal distribution with variance matrix Σ_1(p). Then regardless of ℰ(B), ℰ(H̃B) = 0(s-1 x p) and ℰ(B'H̃'H̃B) = (s-1)Σ_1. Moreover H̃B is normal, and B'H̃'H̃B/(s-1) has a pseudo-Wishart distribution. Thus the distribution of ch(B'H̃'H̃BΣ_1^{-1}) is nuisance-parameter-free. Depending only on the known constants s-1 and p and the chosen α*, we can find two constants c_1* and c_2* such that

(5.7.1)    Pr[c_1* ≤ ch_min(B'H̃'H̃BΣ_1^{-1}) ≤ ch_max(B'H̃'H̃BΣ_1^{-1}) ≤ c_2*] = 1 - α* .

Then by steps similar to those just used in §5.6, we can derive
(5.7.2)    Pr{ch_min(B'H̃'H̃B)/c_2* ≤ ch(Σ_1) ≤ ch_max(B'H̃'H̃B)/c_1*} ≥ 1 - α* .

Since by hypothesis B and E are independently distributed, we may combine (5.7.2) and (5.6.13) to obtain confidence statements with confidence coefficient ≥ (1 - α)(1 - α*). Since each term within the braces of (5.7.2) is nonnegative, we may take square roots. Then

(5.7.3)    Pr{[ch_min(V'V)/(c_2* b^2)]^{1/2} - [c_α ch_max(W'W)/(c_2* b^2)]^{1/2} ≤ [ch(Σ_1)]^{1/2} ≤ [ch_max(V'V)/(c_1* b^2)]^{1/2} + [c_α ch_max(W'W)/(c_1* b^2)]^{1/2}} ≥ (1 - α)(1 - α*) .

At last we have confidence bounds on ch(Σ_1) when B is normal but Σ_1 ≠ σ_b^2 Σ. Thus (5.7.3) is a multivariate analog of (3.7.3) and is more generally applicable than (5.5.2).
Of course the characteristic roots of a variance matrix are not in themselves easily interpreted parameters like standard deviations. Nevertheless they do constitute a set of measures of the dispersion, and confidence bounds on these roots may be useful.

Since Σ_1 is p.d., confidence bounds on ch_max(Σ_1) might be of interest even without bounds on ch_min(Σ_1). Using the distribution of the maximum characteristic root only, we can find two constants, c_3 and c_4, such that for chosen α_3

(5.7.4)    Pr[c_3 ≤ ch_max(B'H̃'H̃BΣ_1^{-1}) ≤ c_4] = 1 - α_3 .

Then by similar reasoning to that used above, it follows that

Pr{ch_max(B'H̃'H̃B)/c_4 ≤ ch_max(Σ_1) ≤ ch_max(B'H̃'H̃B)/c_3} ≥ 1 - α_3 .

Then combining (5.7.4) with (5.6.9), we get

(5.7.5)    Pr{[ch_max(V'V)/(c_4 b^2)]^{1/2} - [c_α ch_max(W'W)/(c_4 b^2)]^{1/2} ≤ [ch_max(Σ_1)]^{1/2} ≤ [ch_max(V'V)/(c_3 b^2)]^{1/2} + [c_α ch_max(W'W)/(c_3 b^2)]^{1/2}} ≥ (1 - α)(1 - α_3) .

This confidence statement may be used as an alternative to (5.7.3).
5.8  Confidence regions when B(s x p) is not necessarily normal.

In §5.6 we obtained preliminary or quasi-confidence bounds on ch(B'H̃'H̃B) regardless of the distribution of B, and then in §5.7 we used these quasi-confidence bounds as a preliminary stage in the process of finding genuine confidence bounds on ch(Σ_1) when B was normal, each row of B having Σ_1 as its variance matrix. But what if B is not normal or not assumed to be normal prior to the experiment? Can we still go on from the quasi-confidence bounds to genuine confidence bounds on some parameter or some other feature of the distribution from which each row of B is a sample? In particular, can we obtain confidence bounds on m-tile differences for a multivariate model as we did in Chapter III for a univariate model?

As the simplest and hence most hopeful multivariate model we consider the case when p = 2. Then B'H̃'H̃B is merely the 2 x 2 matrix
(5.8.1)    [ b_1'H̃'H̃b_1   b_1'H̃'H̃b_2 ]
           [ b_2'H̃'H̃b_1   b_2'H̃'H̃b_2 ] ,

whose characteristic equation is

λ^2 - (b_1'H̃'H̃b_1 + b_2'H̃'H̃b_2)λ + (b_1'H̃'H̃b_1)(b_2'H̃'H̃b_2) - (b_1'H̃'H̃b_2)^2 = 0

and whose characteristic roots can thus be found explicitly:

(5.8.2)    λ_1 = (1/2)(b_1'H̃'H̃b_1 + b_2'H̃'H̃b_2) - (1/2)[(b_1'H̃'H̃b_1 - b_2'H̃'H̃b_2)^2 + 4(b_1'H̃'H̃b_2)^2]^{1/2}

and

           λ_2 = (1/2)(b_1'H̃'H̃b_1 + b_2'H̃'H̃b_2) + (1/2)[(b_1'H̃'H̃b_1 - b_2'H̃'H̃b_2)^2 + 4(b_1'H̃'H̃b_2)^2]^{1/2} .

Thus when p = 2, we can get explicit algebraic expressions for ch_min(B'H̃'H̃B) and ch_max(B'H̃'H̃B) and substitute them into (5.6.13). Now even though [ch_min(B'H̃'H̃B)]^{1/2} is nonnegative,
the lower bound given for it in (5.6.13) may be negative. Hence we define

(5.8.3)    t_1 = [ch_min(V'V)]^{1/2}/b - [c_α ch_max(W'W)]^{1/2}/b , if the right member > 0 ;
           t_1 = 0 , otherwise ;
           t_2 = [ch_max(V'V)]^{1/2}/b + [c_α ch_max(W'W)]^{1/2}/b .

Thus (5.6.13) may be replaced by

(5.8.4)    Pr{t_1 ≤ λ_1^{1/2} ≤ λ_2^{1/2} ≤ t_2} ≥ 1 - α ,

where the t's are computable, by (5.8.3), and the λ's are, by (5.8.2), expressible in terms of unobservables of the model.
Now suppose λ_i^{1/2} = t_i. Then 2λ_i = 2t_i^2, or

(5.8.5)    b_1'H̃'H̃b_1 + b_2'H̃'H̃b_2 - 2t_i^2 = ±[(b_1'H̃'H̃b_1 - b_2'H̃'H̃b_2)^2 + 4(b_1'H̃'H̃b_2)^2]^{1/2} .

Squaring (5.8.5) gives, upon some further simplification,

(5.8.6)    (b_1'H̃'H̃b_1 - t_i^2)(b_2'H̃'H̃b_2 - t_i^2) = (b_1'H̃'H̃b_2)^2 .

Thus for i = 1 and i = 2, (5.8.6) determines the extreme conditions on b_1 and b_2 permitted by (5.8.4).
In §3.9 we approximated the shape of the curve of the unknown p.d.f. of b by two rectangles (columns of a histogram) with common boundary at the median, which approximation would be the actual p.d.f. of a two-valued variate we denoted by 4b. Similarly we now approximate the surface of the unknown p.d.f. of a bivariate by four rectangular parallelopipeds with common boundaries at the median of the first variate and the conditional medians of the second variate. (The vertical projections of these boundaries appear as the dashed lines in the figures of §5.1.) The tops of these parallelopipeds would be the surface of the actual p.d.f. of a bivariate taking on four pairs of values with equal probability. Thus we define a bivariate analog of the substitute variate, 2kb, defined in §3.9. Using the notation of §5.1, the possible values of this substitute bivariate are (4β_1, 4β_11), (4β_1, 4β_13), (4β_3, 4β_31), and (4β_3, 4β_33). And we define s_ij as the frequency of the pair (4β_i, 4β_ij) in a sample of s pairs of values of this bivariate. Thus, for quartiles, s_11 + s_13 + s_31 + s_33 = s.
As noted in §3.9 for the univariate problem, the preliminary or quasi-confidence statements would be true probability statements for either the original variates or the substitute variates or for block effects with any other distribution. It is only when the substitute variates are made to "do duty for" the original variates that the element of approximation enters into the final confidence statements. This approximation is, of course, quite aside from the question of replacing an exact probability statement by a statement true with probability greater than or equal to a preassigned quantity--a kind of replacement which stems from quite different considerations.

The ultimate approximation for this bivariate problem is the one involved in replacing the actual density surface by a histogram consisting of four parallelopipeds, just as in §3.9 for the univariate problem, it was observed that the approximation there resulted from replacing the density curve by a histogram consisting of 2k rectangles.
The preliminary or quasi-confidence bounds (5.6.13) are independent of the distribution from which B is a sample of size s. Hence the same limits hold with at least as large a confidence coefficient when the s rows of B are replaced by this substitute bivariate having four possible pairs of values with equal probability. Thus (5.6.13) becomes (writing 4B for the s x 2 matrix of substitute values)

t_1 ≤ {ch_min[(4B)'H̃'H̃(4B)]}^{1/2} ≤ {ch_max[(4B)'H̃'H̃(4B)]}^{1/2} ≤ t_2

with probability ≥ 1 - α. We also make appropriate replacements of b_1(s) and b_2(s) by s values of the substitute variates 4b_1 and 4b_2 in (5.8.6). Thus we find

(4b_1)'H̃'H̃(4b_1) = Σ_j (4b_j1)^2 - (1/s)[Σ_j 4b_j1]^2
                  = (s_11 + s_13)(4β_1)^2 + (s_31 + s_33)(4β_3)^2 - (1/s)[(s_11 + s_13)4β_1 + (s_31 + s_33)4β_3]^2 ,

(5.8.7)    (4b_1)'H̃'H̃(4b_1) = (1/s)(s_11 + s_13)(s_31 + s_33)(4β_3 - 4β_1)^2 .
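Identity (5.8.7) is the two-valued special case of the corrected sum of squares: since H̃ has orthonormal rows each orthogonal to the vector of 1's, H̃'H̃ = I(s) - (1/s)jj'. A sketch of the check (ours, assuming NumPy; the values and frequencies are arbitrary):

    import numpy as np

    beta1, beta3, n1, n3 = -0.7, 1.9, 6, 4      # 4beta_1, 4beta_3 and frequencies
    s = n1 + n3
    b1 = np.array([beta1] * n1 + [beta3] * n3)  # the substitute variate 4b_1

    css = b1 @ b1 - b1.sum() ** 2 / s           # b'H'Hb = sum b^2 - (sum b)^2 / s
    print(np.isclose(css, n1 * n3 * (beta3 - beta1) ** 2 / s))   # True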
126
5
Simi 1 a r 1 y
( 4 b 2 ) I!:!, 1:1< 4 b 2
)
=
5
L 4 b j t[ L 4 b j
2 -
j=1
( 4r.Q..3) I!:! IH( 4£2)
(5.8.8)
=~[ 51 1 5 U~ (4 ~ 1:3 -
2]
2
•
j=1
4 ~ 1 1 ) 2 + 5 1 1 S 8 1 (4 ~:3 1 - 4 ~ 1 1 ).3 +
511588(4~38 -4~1,)2+ S1:3531(4~13 -4~81)2 +
S
(4 b 1 ) 'H'H(4 b 2)
5
= L (4 b j1) (4 b j2) - t[ L:
j=1
4 b j1]
X
[ L b j2]
•
j=1
5
4
j=1
(4£1) 'H'H(4 b 2)
=;(4~8 -
4~1)[ (511
+ 518)(S81 +
S8a)(4~S1
-
4~1S)
+ 511(SS1+ 58S)(4~1S -4~1,)+ S83(S11+ 518) X
Replacing (4β_31 - 4β_13) by (4β_33 - 4β_13) - (4β_33 - 4β_31) and (4β_33 - 4β_11) by (4β_33 - 4β_13) + (4β_13 - 4β_11), (5.8.8) becomes

(5.8.10)   (4b_2)'H̃'H̃(4b_2) = (1/s)[s_11(s_13 + s_33)(4β_13 - 4β_11)^2 + 2 s_11 s_33 (4β_13 - 4β_11)(4β_33 - 4β_13) + s_33(s_11 + s_13)(4β_33 - 4β_13)^2 + s_13 s_31 [(4β_33 - 4β_13) - (4β_33 - 4β_31)]^2 + s_11 s_31 (4β_31 - 4β_11)^2 + s_31 s_33 (4β_33 - 4β_31)^2] .
Similarly (5.8.9) becomes

(5.8.11)   (4b_1)'H̃'H̃(4b_2) = (1/s)(4β_3 - 4β_1)[s_31(s_11 + s_13)(4β_31 - 4β_11) + (s_11 s_33 - s_13 s_31)(4β_13 - 4β_11) + s_33(s_11 + s_13)(4β_33 - 4β_13)] .

Now (5.8.7), (5.8.10), and (5.8.11) could be substituted into
(5.8.6), but the latter would then become very complicated. Hence we consider only the simpler forms of (5.8.7), (5.8.10), and (5.8.11) corresponding to the special types of bivariate distributions characterized in §5.1.

For type 2, (4β_13 - 4β_11) = (4β_33 - 4β_31) and (4β_31 - 4β_11) = (4β_33 - 4β_13). Making these simplifications, (5.8.10) becomes

(5.8.12)   (4b_2)'H̃'H̃(4b_2) = (1/s)[(s_11 + s_31)(s_13 + s_33)(4β_13 - 4β_11)^2 + (s_11 + s_13)(s_31 + s_33)(4β_31 - 4β_11)^2 + 2(s_11 s_33 - s_13 s_31)(4β_13 - 4β_11)(4β_31 - 4β_11)] .

And (5.8.11) becomes

(5.8.13)   (4b_1)'H̃'H̃(4b_2) = (1/s)(4β_3 - 4β_1)[(s_31 + s_33)(s_11 + s_13)(4β_31 - 4β_11) + (s_11 s_33 - s_13 s_31)(4β_13 - 4β_11)] .

For type 3, (4β_33 - 4β_13) = -(4β_31 - 4β_11) and (4β_33 - 4β_31) = (4β_13 - 4β_11) - 2(4β_31 - 4β_11). If we make these substitutions in (5.8.10), it becomes

(5.8.14)   (4b_2)'H̃'H̃(4b_2) = (1/s)[(s_11 + s_31)(s_13 + s_33)(4β_13 - 4β_11)^2 + (s_11 s_31 + s_11 s_33 + s_13 s_31 + s_13 s_33 + 4 s_31 s_33)(4β_31 - 4β_11)^2 - 2(s_11 s_33 + s_13 s_31 + 2 s_31 s_33)(4β_13 - 4β_11)(4β_31 - 4β_11)] .
And similarly for type 3, (5.8.11) becomes

(5.8.15)   (4b_1)'H̃'H̃(4b_2) = (1/s)(4β_3 - 4β_1)[(s_31 - s_33)(s_11 + s_13)(4β_31 - 4β_11) + (s_11 s_33 - s_13 s_31)(4β_13 - 4β_11)] .
Now with these somewhat simpler expressions for (4b_2)'H̃'H̃(4b_2) and (4b_1)'H̃'H̃(4b_2) we could substitute into (5.8.6). Straightforward algebra is all that would be required, but we wish to avoid complicating details at this point. Hence we leave types 2 and 3 for later investigation. But for type 4 still further simplification is easily obtained. We can obtain type 4 as a special case of either type 2 or type 3. In the expressions for (4b_2)'H̃'H̃(4b_2) and (4b_1)'H̃'H̃(4b_2) we let (4β_31 - 4β_11) = (4β_33 - 4β_13) = 0 and (4β_13 - 4β_11) = (4β_33 - 4β_31). This gives

(5.8.16)   (4b_2)'H̃'H̃(4b_2) = (1/s)(s_11 + s_31)(s_13 + s_33)(4β_13 - 4β_11)^2

and

(5.8.17)   (4b_1)'H̃'H̃(4b_2) = (1/s)(s_11 s_33 - s_13 s_31)(4β_3 - 4β_1)(4β_13 - 4β_11) .

Now for this simplest type of relation between the two variates, the "equations of extreme conditions," viz., (5.8.6) for i = 1, 2, become

(5.8.18)   (1/s^2)[(s_11 + s_13)(s_31 + s_33)(s_11 + s_31)(s_13 + s_33) - (s_11 s_33 - s_13 s_31)^2](4β_3 - 4β_1)^2 (4β_13 - 4β_11)^2 - (t_i^2/s)[(s_11 + s_13)(s_31 + s_33)(4β_3 - 4β_1)^2 + (s_11 + s_31)(s_13 + s_33)(4β_13 - 4β_11)^2] + t_i^4 = 0 .
For a given partition of s into s_11, s_13, s_31, s_33 and for a given t_i, (5.8.18) may be regarded as determining a locus in the plane of (4β_3 - 4β_1)^2 and (4β_13 - 4β_11)^2. All possible partitions of s thus determine a finite family of conics for each value of t_i. If any three of the s's are zero, (5.8.18) becomes a contradiction. If s_11 = s_13 = 0 or if s_31 = s_33 = 0, (5.8.18) becomes a condition on only one of the "coordinates," (4β_3 - 4β_1)^2 or (4β_13 - 4β_11)^2. These possibilities thus do not correspond to what were called bounding loci in §3.9. But if s_11 = s_33 = 0, s_13 ≠ 0, and s_31 ≠ 0, or if s_13 = s_31 = 0, s_11 ≠ 0, and s_33 ≠ 0, then the locus of (5.8.18) is a straight line with equal positive intercepts. If s_11 s_33 = s_13 s_31 ≠ 0, the locus of (5.8.18) consists of two straight lines, parallel to the coordinate axes and intersecting in the first quadrant. For all other possible partitions of s, the locus of (5.8.18) is a rectangular hyperbola whose asymptotes are parallel to the coordinate axes.
Now we can find for this bivariate model (as we found in §3.10 for a univariate model) the a priori probability associated with each partition of s and hence the total probability associated with all bounding loci of (5.8.18). This latter probability is

γ = 1 - Pr{only one s_ij ≠ 0} - Pr{only s_11 ≠ 0 and s_33 ≠ 0} - Pr{only s_13 ≠ 0 and s_31 ≠ 0}
  = 1 - 4·4^{-s} - 2[2^{-s} - 2·4^{-s}] ,

(5.8.20)   γ = 1 - 2^{1-s} .

It is interesting to note that the total probability associated with bounding loci is the same for p = 2 and k = 2 (where there are four equiprobable values) as for p = 1 and k = 2 (where there are only two equiprobable values).
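(5.8.20) can be confirmed by enumerating the 4^s equiprobable assignments of the s rows to the four cells. A sketch (ours; pure standard-library Python, with s kept small so the enumeration is cheap):

    from itertools import product

    s = 6
    bounding = 0
    for cells in product(range(4), repeat=s):   # cells 0,1,2,3 = (11),(13),(31),(33)
        occupied = set(cells)
        # excluded partitions: all rows within {11,33} only, or within {13,31} only
        # (these subsets include the four single-cell cases)
        if occupied <= {0, 3} or occupied <= {1, 2}:
            continue
        bounding += 1
    print(bounding / 4 ** s, 1 - 2.0 ** (1 - s))   # both 0.96875 for s = 6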
Now for the family of loci in the plane of (4β_3 - 4β_1)^2 and (4β_13 - 4β_11)^2 we must determine the inner and the outer boundaries as was done in §3.11. The intercepts of the sloping lines are st_i^2/(s_13 s_31) and st_i^2/(s_11 s_33), whereas the intercepts of the hyperbolas are st_i^2/(s_11 + s_13)(s_31 + s_33) and st_i^2/(s_11 + s_31)(s_13 + s_33). The intercepts of the vertical and horizontal lines are, for the same t_i, the same as the corresponding intercepts of the hyperbola. As in §3.11 we determine the extreme values of these intercepts and find that 4st_i^2/(s^2 - 1) when s is odd, or 4t_i^2/s when s is even, is the smallest intercept possible, and it can be the intercept for both the lines and the hyperbolas. The largest intercepts of the vertical and horizontal lines are st_i^2/2(s-2); the largest intercepts of the sloping lines and of the hyperbolas are st_i^2/(s-1). Thus among the loci of (5.8.18) the inner boundary is the straight line whose equation is

(5.8.21)   (4β_3 - 4β_1)^2 + (4β_13 - 4β_11)^2 = 4st_i^2/(s^2 - 1)  or  4t_i^2/s ,

depending upon whether s is odd or even. But there is no unique outer boundary among the loci of (5.8.18). Hence as an outer boundary of all bounding loci of (5.8.18), we use segments of three different loci:

(5.8.22)   (4β_3 - 4β_1)^2 + (4β_13 - 4β_11)^2 = st_i^2/(s-1) ,
           (4β_3 - 4β_1)^2 = st_i^2/2(s-2) ,
           (4β_13 - 4β_11)^2 = st_i^2/2(s-2) .

If for t_i in (5.8.21) we use the t_1 of (5.8.3) and for t_i in (5.8.22) we use the t_2 of (5.8.3), then (5.8.21) and (5.8.22) bound a region in the first quadrant of the plane of (4β_3 - 4β_1)^2 and (4β_13 - 4β_11)^2 such that values of (4β_3 - 4β_1) and (4β_13 - 4β_11) which satisfy (5.6.13) will also, with probability ≥ 1 - α, correspond to points in this region with conditional probability γ. Thus we have a confidence region for the interquartile ranges whose confidence coefficient is not less than (1 - α)γ, where α may be chosen but γ is given by (5.8.20).
BIBLIOGRAPHY

[1]  Aitken, A.C. Determinants and Matrices (Eighth Edition). Edinburgh: Oliver and Boyd, 1954.

[2]  Bargmann, Rolf. "A Study of Independence and Dependence in Multivariate Normal Analysis." Institute of Statistics, University of North Carolina, Mimeograph Series No. 186 (1957).

[3]  Browne, Edward T. Introduction to the Theory of Determinants and Matrices. Chapel Hill: University of North Carolina Press, [c. 1958].

[4]  Coxeter, H.S.M. Regular Polytopes. London: Methuen and Co., Ltd., 1948.

[5]  Ghosh, M.N. "Simultaneous Tests of Linear Hypotheses," Biometrika, XLII (1955), 441-449.

[6]  Gnanadesikan, R. "Contributions to Multivariate Analysis Including Univariate and Multivariate Variance Components Analysis and Factor Analysis." Institute of Statistics, University of North Carolina, Mimeograph Series No. 158 (1956).

[7]  Heck, D.L. "Some Uses of the Distribution of the Largest Root in Multivariate Analysis." Institute of Statistics, University of North Carolina, Mimeograph Series No. 194 (1958).

[8]  MacDuffee, C.C. The Theory of Matrices. Berlin: J. Springer, 1933.

[9]  Nair, K.R. "Simplified Analysis of Singly Linked Blocks," Biometrics, XII (1956), 369-380.

[10] Ramachandran, K.V. "On the Simultaneous Analysis of Variance Test," Annals of Mathematical Statistics, XXVII (1956), 521-528.

[11] --------. "Contribution to Simultaneous Confidence Interval Estimation," Biometrics, XII (1956), 51-56.

[12] Roy, J. "Step-down Procedure in Multivariate Analysis." Institute of Statistics, University of North Carolina, Mimeograph Series No. 187 (1957).

[13] Roy, S.N., and R. Gnanadesikan. "Further Contributions to Multivariate Confidence Bounds," Biometrika, XLIV (1957), 399-410.

[14] Roy, S.N., and A.E. Sarhan. "On Inverting a Class of Patterned Matrices," Biometrika, XLIII (1956), 227-231.

[15] Roy, S.N. "A Note on Some Further Results in Simultaneous Confidence Interval Estimation," Annals of Mathematical Statistics, XXVII (1956), 856-857.

[16] --------. Some Aspects of Multivariate Analysis. New York: John Wiley and Sons, Inc., 1957.

[17] Sommerville, D.M.Y. An Introduction to the Geometry of N Dimensions. New York: E.P. Dutton and Company, Inc. [n.d.].

[18] Wilks, S.S. "Certain Generalizations in the Analysis of Variance," Biometrika, XXIV (1932), 471-494.

[19] --------. Mathematical Statistics. Princeton: Princeton University Press, 1943.

[20] Wishart, J. "The Generalized Product Moment Distribution in Samples from a Normal Multivariate Population," Biometrika, XX A (1928), 32-52.

[21] Youden, W.J. "Linked Blocks: A New Class of Incomplete Block Designs" (abstract), Biometrics, VII (1951), 124.