Kleinbaum, D. G. (1970). "Estimation and Hypothesis Testing for Generalized Multivariate Linear Models."

ESTIMATION AND HYPOTHESIS
TESTING FOR GENERALIZED
MULTIVARIATE LINEAR MODELS

by

David G. Kleinbaum
Department of Statistics
University of North Carolina at Chapel Hill

FEBRUARY 1970
ESTIMATION AND HYPOTHESIS TESTING FOR
GENERALIZED MULTIVARIATE LINEAR MODELS

by

David G. Kleinbaum

A thesis submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics

Chapel Hill
1970

Approved by:

Adviser
DAVID GEORGE KLEINBAUM. Estimation and Hypothesis Testing for Generalized Multivariate Linear Models. (Under the direction of NORMAN L. JOHNSON.)

This study investigates estimation and testing problems for multivariate linear models in which observations may be missing and/or in which different design matrices correspond to different response variates. Two new models, called the More General Linear Multivariate (MGLM) model and the Generalized Growth Curve Multivariate (GGCM) model, are defined to deal with these situations.

For the MGLM model, the problem of obtaining best linear unbiased estimates of estimable linear functions and sets of the unknown treatment parameters is discussed. Also, unbiased and consistent estimates of the variance-covariance matrix are obtained. Another method of estimation discussed is that of obtaining estimators which are best asymptotically normal (BAN). Several such estimators are proposed for the MGLM and GGCM models.

For the problem of testing linear hypotheses in the MGLM and GGCM models, a testing method for large samples is described which uses a test criterion given in general form by Wald. The asymptotic distribution of the test statistic is a central chi-square variable. A BAN estimator of a linear set of the treatment parameters and consistent estimators of the variance-covariance parameters are needed for its computation.
TABLE OF CONTENTS

ACKNOWLEDGMENTS

NOTATION AND ABBREVIATIONS

INTRODUCTION AND SUMMARY

CHAPTER

I.   MULTIVARIATE LINEAR MODELS
     1.1  The Standard Multivariate General Linear Model
          1.1.1  Estimation in the SM Model
          1.1.2  Testing Linear Hypotheses in the SM Model
     1.2  The Growth Curve Model
          1.2.1  Estimation in the GCM Model
          1.2.2  Testing Hypotheses in the GCM Model
     1.3  Experimental Situations to Which the SM Model Does Not Apply
     1.4  The More General Linear Multivariate Model
          1.4.1  Special Cases of the MDM, GIM, HM and Other Models
     1.5  A Generalization of the Growth Curve Multivariate Model

II.  BLUE ESTIMATION FOR THE MGLM MODEL
     2.1  Introduction
     2.2  BLUE Estimation of Linear Functions h'ξ̃
     2.3  BLUE Estimation of Linear Sets H'ξ̃
     2.4  Unbiased and Consistent Estimation of Σ

III. BAN ESTIMATION FOR THE MGLM AND GGCM MODELS
     3.1  Introduction
     3.2  Derivation of the Asymptotic Variance Matrix of a BAN Estimator for the MGLM Model
     3.3  Estimation for the GIM Model--A Review of the Literature with Some Extensions
          3.3.1  Anderson's Method
          3.3.2  Derivation of MLE's for the HIM Model
          3.3.3  Hocking and Smith's Approach
     3.4  BAN Estimation for the MDM Model--A Literature Review
          3.4.1  Comparison of the Estimators by Monte Carlo Methods
     3.5  A BAN Estimator for the MGLM Model
          3.5.1  Special Cases
          3.5.2  Iterative Techniques
     3.6  BAN Estimation for the GGCM Model

IV.  TESTING LINEAR HYPOTHESES FOR THE MGLM AND GGCM MODELS
     4.1  Introduction
     4.2  Review of the Literature
     4.3  The Wald Statistic
     4.4  Testing Linear Hypotheses in the MGLM Model
     4.5  Testing Linear Hypotheses in the GGCM Model

V.   PRACTICAL EXAMPLES
     5.1  Introduction
     5.2  Example 1: An HIM Model
     5.3  Example 2: An MDM Model with p = 2
     5.4  Example 3: An MDM Model with p = 4
     5.5  Example 4: A GGCM Model
     5.6  Conclusions

VI.  RECOMMENDATIONS FOR FURTHER RESEARCH

REFERENCES

APPENDIX
ACKNOWLEDGMENTS
I wish to express particular gratitude to Dr. Norman L. Johnson and Dr. James E. Grizzle for the guidance that made this study possible. I also wish to thank the other members of my Doctoral Committee--Dr. P. K. Sen, Dr. I. M. Chakravarti, Dr. M. R. Leadbetter and Dr. A. R. Bolstein--for their various contributions. In addition, I wish to thank Dr. S. John of the Australian National University for his advice during the early part of this study and Dr. A. W. Davis of CSIRO, Australia, for his help during the latter stages of this study. Also, I am grateful to Mr. James O. Kitchen, who provided invaluable assistance for the computer work of Chapter Five.

Finally, a special tribute must be given my wife, Anna, who typed both the original and final drafts, provided encouragement and companionship, and generally was instrumental in helping me keep my sanity during my course of graduate study.

This research was supported in part by the Department of Biostatistics through the National Institutes of Health, Institute of General Medical Sciences Grant No. 12868-06, and in part by the Department of Statistics through Air Force Grant No. AFOSR-68-1415.
NOTATION AND ABBREVIATIONS

a(q × 1) is a (q × 1) column vector, and a'(1 × q) is the corresponding row vector.

A(p × q) = ((a_ij)) is a (p × q) matrix with a_ij as the element in the i-th row and j-th column.

a_ij is the element in the i-th row and j-th column of the matrix A.

A = ((A_ij)) is a partitioned matrix in which A_ij is the sub-matrix in the i-th row and j-th column.

R(A) is the rank of the matrix A.

V(A) is the vector space spanned by the rows of A.

A' is the transpose of A.

tr A is the trace of A.

A⁻¹ is the unique inverse of a square matrix A of full rank.

A⁻ is any generalized inverse of the matrix A.

ch_i(A) is the i-th largest characteristic root of A.

A ⊗ B is the Kronecker product of the matrices A and B, defined by A ⊗ B = ((a_ij B)), where A = ((a_ij)).

I_q is the identity matrix of order q.

O_{p,q} is the (p × q) matrix of zeros.

0_p is the (p × 1) vector of zeros.

1_n is the (n × 1) column vector of all unities.

Ω^{1/2} is the square root of a symmetric matrix Ω, defined by Ω^{1/2} = C'ΛC, where C is an orthogonal matrix and Λ is a diagonal matrix such that Ω = C'Λ²C.

For x = (x₁, ..., x_n)' and y = (y₁, ..., y_m)', Cov(x, y) is the (n × m) matrix with Cov(x_i, y_j) in the i-th row and j-th column; Var(x) is the (n × n) matrix Cov(x, x).

For Y(n × p) = ((y_ij)), a matrix of random variables, E(Y) is the (n × p) matrix of expectations of the elements of Y, i.e., E(Y) = ((E y_ij)); Var(Y) is the (np × np) variance-covariance matrix of the (np × 1) vector defined by putting the columns of Y underneath each other in a long column vector.

x ~ N_p(μ, Σ) means that the random variable x has a p-variate multinormal distribution with mean vector μ and variance-covariance matrix Σ.

x ~ χ²_c means that the scalar random variable x is distributed as a central chi-square variable with c degrees of freedom.

x_n →_L x means that the random variable x_n converges in law to the random variable x.

ℒ(x_n) → F means that x_n has distribution function F.

X_n →_p A means that each element of the matrix X_n converges in probability to the corresponding element of A.

p.d. means positive definite.

i.i.d. means independently and identically distributed.
INTRODUCTION AND SUMMARY
The main purpose of this study is to investigate estimation
and testing problems for multivariate linear models which describe
experimental situations that cannot be analyzed under the standard
parametric multivariate general linear model (hereafter referred to as
the SM model) .
Chapter One contains a discussion of the SM model and the
growth curve multivariate (GCM) model, which are the two linear models
that have been extensively dealt with in the literature and used in
applications.
Situations which cannot be described by these models
are then discussed.
These occur when there are missing data and when
different response variates have different design matrices.
A new model generalizing the SM model, called the More General Linear Multivariate (MGLM) model, is then proposed to incorporate the new situations. Special cases of the MGLM model which have been dealt with in the literature are described. In addition, the GCM model, which is not a special case of the MGLM model, is generalized to allow for gaps in the data.
Chapter Two considers the problem of BLUE estimation in the MGLM model of estimable linear functions and sets of the unknown treatment parameters ξ̃ = (ξ₁', ..., ξ_p')', and unbiased and consistent estimation of the variance-covariance matrix Σ. In the discussion of BLUE estimation, necessary and sufficient conditions are given for the existence of a unique BLUE in terms of known linear functions (or sets) of the data for the entire experiment, that is, functions which do not depend on the unknown Σ. The method of proof simplifies and clarifies considerably the work of Srivastava [60]¹ for some special cases of the MGLM model. An unbiased and consistent estimate of Σ is then obtained and used to determine an unbiased estimator of the variance of any linear function of the data.

¹The numbers in square brackets refer to the bibliography.
Chapter Three deals with Best Asymptotically Normal (BAN) estimation of estimable linear sets H'ξ̃. BAN estimators are usually non-linear functions of the data. They have variances which in large samples are the minimum that would be achieved by linear estimators if Σ were known. Also, they are needed for computing the Wald Statistic described in Chapter Four, which is used for testing linear hypotheses in the MGLM and Generalized Growth Curve (GGCM) models. Section 3.2 considers the calculation for the MGLM and GGCM models of the information matrix for ξ̃ and the asymptotic variance matrix of any estimable set H'ξ̃, where H(M × w) is of full rank w. Use of the rules of matrix differentiation (as described in the Appendix) simplifies these derivations considerably. In the next two sections the literature is reviewed for some important special cases of the MGLM model for which BAN estimators have been obtained. One of these is a general multivariate model for missing observations called the General Incomplete Multivariate (GIM) model. For a special case of the GIM model called the Hierarchal Incomplete (HIM) model, in which the missing data have a simple monotonic structure, maximum likelihood estimators are
obtained using a method of Anderson [7]. This result generalizes some work of Bhargava [13]. A third special case of the MGLM model allows different response variates to have different design matrices. This model is called the Multi-Design Multivariate (MDM) model. The MDM model has been dealt with in the literature (e.g., [61], [72]) under the title "Seemingly Unrelated Regressions." In Section 3.5 a BAN estimator for the MGLM model is then proposed which generalizes an estimator obtained by Zellner [72] for the MDM model. Alternative estimators may be obtained by a method of iteration suggested by Zellner or by the method of scoring. Finally, a BAN estimator for the GGCM model is obtained.
In Chapter Four a method is proposed for obtaining test criteria for testing linear hypotheses for the MGLM model when the underlying data are normally distributed. The test statistics obtained are quadratic forms called Wald Statistics, and they are constructed from BAN estimators of estimable sets H'ξ̃ and consistent estimators of the variance-covariance parameters. The asymptotic distribution of the Wald Statistic under the null hypothesis is a central chi-square variable. When applied to the special case of the SM model, the method yields Hotelling's trace criterion. In addition, the testing method is applied to growth curve models. For the GCM model, a competitor to Khatri's trace [31] which uses S, the sample covariance matrix, is obtained. For the GGCM model, two competitors are proposed, one being the Wald Statistic and the other a generalization of Khatri's trace.

In Chapter Five some practical examples are analyzed to illustrate the proposed BAN estimator and the corresponding Wald Statistic.
Chapter Six contains recommendations for further research.
Some matrix results which are important for use in the main
text are given in the Appendix.
Although some of these results are
new (in particular, those dealing with the differentiation of a matrix
with respect to another matrix), all are easily established from the
definitions of generalized inverses, Kronecker products and
matrix differentiation.
Proofs are therefore not given.
CHAPTER I

MULTIVARIATE LINEAR MODELS

1.1  The Standard Multivariate General Linear Model (SM)
Consider an experiment in which there are n individuals or
experimental units and on each individual p measurements are taken.
Each measurement is a univariate random variable which represents the
value of a particular characteristic (e.g., height, weight, blood
pressure) of interest or the same characteristic measured under different conditions.
The different characteristics or conditions, as
the case may be, are frequently referred to as response variates.
If we label the p response variates as V_1, V_2, ..., V_p and denote the measurement of the j-th variate V_j on the i-th individual by y_ij, then the results of the experiment may be described as in Table 1.1:

                              Response Variate
     Individual        V_1       V_2       ...       V_p
          1            y_11      y_12      ...       y_1p
          2            y_21      y_22      ...       y_2p
          .              .         .                   .
          n            y_n1      y_n2      ...       y_np

                               TABLE 1.1
                   DATA FOR A MULTIVARIATE EXPERIMENT
The data obtained from the whole experiment may thus be compactly summarized by the (n × p) matrix Y = ((y_ij)). If we assume that the standard Multivariate General Linear Model (SM) describes the experimental situation, then we can write

(1.1.1)     E(Y) = Aξ,     Var Y = I_n ⊗ Σ,

where A is an (n × m) known design matrix of rank R(A) = r (≤ m), ξ is an (m × p) matrix of unknown parameters, and Σ = ((σ_rs)) is a (p × p) positive definite (p.d.) matrix of (usually) unknown parameters which represents the variance-covariance matrix of any row of Y.
It is clear from (1.1.1) that in the SM model the measurements on different individuals are assumed to be uncorrelated, whereas the measurements of the p response variates on the same individual may be correlated. The expectation of each measurement is assumed to be a linear function of the unknown parameters of ξ. For testing hypotheses and interval estimation in the SM model, it is usually assumed that the rows of Y are each distributed according to the multivariate normal distribution.
An alternative way to represent the SM model, which will
provide some insight for analysis of the generalized models discussed
later in the chapter, can be given if we look at the experiment
variate-wise, and this will be referred to as the variate-wise
representation of the model:
If we let y_s(n × 1) denote the s-th column of Y and ξ_s(m × 1) denote the s-th column of ξ, so that Y = [y_1 y_2 ... y_p] and ξ = [ξ_1 ξ_2 ... ξ_p], then the SM model may be described as

(1.1.2)     E(y_s) = Aξ_s,   Var(y_s) = σ_ss I_n,   s = 1, ..., p,
            Cov(y_r, y_s) = σ_rs I_n   when r ≠ s.

Thus the variate-wise representation gives, for each response variate, the univariate model that would describe the experiment if only this variate were observed. These p separate univariate models are related by the p(p−1)/2 covariances between the different variate pairs.
A third way to represent the SM model, which will be referred to as the vector version, results from rewriting Y and ξ as

            y = (y_1', ..., y_p')'   and   ξ̃ = (ξ_1', ..., ξ_p')',

respectively. Then we have

(1.1.3)     E(y) = Dξ̃,     Var y = Ω,

where D = I_p ⊗ A and Ω = Σ ⊗ I_n.

If Ω is known, it is clear that the vector version is equivalent to the univariate weighted least squares model.
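The equivalence can be checked numerically. The following sketch (an illustration, not part of the thesis; it assumes simulated data and a full-rank A) builds the vector version D = I_p ⊗ A and Ω = Σ ⊗ I_n of (1.1.3) and verifies that the weighted least squares estimate coincides with ordinary least squares computed variate by variate, so that Σ drops out for the SM model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 12, 3, 2                         # units, parameters per variate, variates

A = rng.standard_normal((n, m))            # common design matrix (full rank here)
xi = rng.standard_normal((m, p))           # unknown parameter matrix
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]]) # covariance of a row of Y

E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = A @ xi + E                             # SM model: E(Y) = A xi

# Vector version (1.1.3): y = (y_1', ..., y_p')', D = I_p (x) A, Omega = Sigma (x) I_n
y = Y.T.reshape(-1)                        # columns of Y stacked
D = np.kron(np.eye(p), A)
Omega = np.kron(Sigma, np.eye(n))

gls = np.linalg.solve(D.T @ np.linalg.solve(Omega, D), D.T @ np.linalg.solve(Omega, y))
ols = np.concatenate([np.linalg.lstsq(A, Y[:, s], rcond=None)[0] for s in range(p)])

print(np.allclose(gls, ols))               # True: Sigma drops out when D = I_p (x) A
```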
1.1.1  Estimation in the SM Model
In the SM model, it is usually of interest to estimate linear functions of the form ℓ'ξd and linear sets of the form CξD, where ℓ(m × 1), d(p × 1), C(g × m) and D(p × v) are known. The estimates are usually required to be linear functions or sets of the data, and the usual estimation criteria are unbiasedness (estimability) and minimum variance. Under the assumption of normality on the rows of Y, a maximum likelihood approach may yield such estimates. It is also of interest to determine unbiased estimators of Σ and of the variances of the estimators of ℓ'ξd and CξD.

If we write

(1.1.1.1)     ξ̃ = (ξ_1', ξ_2', ..., ξ_p')',

where ξ_s is the s-th column of ξ, s = 1, ..., p, and we let d' = (d_1, d_2, ..., d_p), it is easily seen that

            ℓ'ξd = Σ_{s=1}^p d_s ℓ'ξ_s = Σ_{s=1}^p c_s'ξ_s = h'ξ̃,

where c_s' = d_s ℓ', s = 1, ..., p, and h' = (c_1', ..., c_p').
Thus we can consider the more general problem of estimating h'ξ̃ where h is arbitrary and apply the results obtained to the special case ℓ'ξd. Similarly, instead of CξD we can consider the more general problem of estimating H'ξ̃ = Σ_{s=1}^p C_s'ξ_s, where H' = [C_1' : ... : C_p'] is an arbitrary matrix of known constants, and we can apply the results obtained to the special case CξD by letting C_s'(gv × m) = d_s ⊗ C, s = 1, ..., p, where d_s'(1 × v) is the s-th row of D, so that H' = D' ⊗ C.
Now it is well known (e.g., Roy [49], Srivastava [60]) that the conditions for estimability of h'ξ̃ = Σ_{s=1}^p c_s'ξ_s are that c_s' ∈ V(A), s = 1, ..., p, where V(A) is the vector space spanned by the rows of A. For the special case ℓ'ξd the conditions simplify to ℓ' ∈ V(A), without any requirement on d. Similarly, the conditions for estimability of H'ξ̃ = Σ_{s=1}^p C_s'ξ_s are that the rows of C_s' belong to V(A) for each s. For the special case CξD this simplifies to the condition that the rows of C belong to V(A), without any requirement on D.
Rao [41], using generalized inverses, and Roy [49], using the reparameterization technique, have shown that the best linear unbiased estimate (BLUE) of h'ξ̃ = Σ_{s=1}^p c_s'ξ_s, when it is estimable, is given by the sum of the BLUE's for c_s'ξ_s obtained separately from the univariate models E(y_s) = Aξ_s, Var y_s = σ_ss I_n, s = 1, ..., p. More specifically, the BLUE for h'ξ̃ is given by

(1.1.1.2)     Σ_{s=1}^p c_s'(A'A)⁻A'y_s,

where (A'A)⁻ denotes any generalized inverse of A'A. Alternatively, we can write this BLUE as

(1.1.1.3)     Σ_{s=1}^p c_sI'(A_I'A_I)⁻¹A_I'y_s,

where A = (A_I : A_D), A_I(n × r) is a basis for A, c_s' = (c_sI' : c_sD'), and c_sI' is (1 × r). The variance of this estimate is given by

            Σ_{r=1}^p Σ_{s=1}^p c_s'(A'A)⁻c_r σ_rs   or   Σ_{r=1}^p Σ_{s=1}^p c_sI'(A_I'A_I)⁻¹c_rI σ_rs.
-r
p
For estimating an estimable linear set H \; = E e'I:'
s2s'
s=l
Roy [49] suggests using the sum of the BLUE's for the linear sets e'E.
s--s
obtained separately from the univariate models E(ls) = AE. ,
---a
Var 1..s
(1.1.1.4)
0ssI , s=l, ... ,po
n
The estimate is thus
p ,
_
p
-1
)" Cs'(A'A) A'v or
<;"
C' (A'A) A'
"'-S'"
51 I I
IZs
5=1
5=1
11
.e
where C'
sI
is a matrix of r
C's =
columns obtained from
[e's I; C']
sD' s= 1 , ... , p .
I
The dispersion matrix of this estimate is given by
p
p
r
r
r=l s=l
C'(A'A) C cr
s
r rs
or
(Henceforth we will only use the generalized inverse description.)
Roy has shown that if Σ_{s=1}^p B_s'y_s is any competitor to Σ_{s=1}^p C_s'(A'A)⁻A'y_s, then

(1.1.1.5)
   (i)   Var(Σ_{s=1}^p B_s'y_s) − Var(Σ_{s=1}^p C_s'(A'A)⁻A'y_s) is positive semi-definite,
   (ii)  ch_max[Var(Σ_{s=1}^p B_s'y_s)] ≥ ch_max[Var(Σ_{s=1}^p C_s'(A'A)⁻A'y_s)],
   (iii) tr[Var(Σ_{s=1}^p B_s'y_s)] > tr[Var(Σ_{s=1}^p C_s'(A'A)⁻A'y_s)],
   (iv)  |Var(Σ_{s=1}^p B_s'y_s)| > |Var(Σ_{s=1}^p C_s'(A'A)⁻A'y_s)|.

In fact, a strict inequality like (1.1.1.5) (iii) or (iv) would hold for all the symmetric functions of the characteristic roots, such as the trace and determinant functions. Thus (1.1.1.4) is a unique solution in terms of the minimization of the trace and generalized variance criteria, but it is not a unique solution, though it still yields a minimum, in terms of the maximum root criterion.
If it is assumed for the SM model that the rows of Y are each normally distributed, then maximum likelihood estimators of functions of ξ may be considered. In this case we can write the likelihood function for the SM model as

(1.1.1.6)     φ = (2π)^{−np/2} |Σ|^{−n/2} exp[−½ tr Σ⁻¹(Y − Aξ)'(Y − Aξ)].

A solution to the maximum likelihood equations obtained from φ is then given by

            ξ̂ = (A'A)⁻A'Y   and   Σ̂ = (1/n) Y'[I − A(A'A)⁻A']Y.

It then follows that unique maximum likelihood estimators of estimable functions of ξ of the form CξD and H'ξ̃ are given by Cξ̂D and H'ξ̃̂, respectively, where ξ̃̂ is obtained by rolling out the columns of ξ̂. Σ̂ is biased for Σ, however, although the bias can be corrected by replacing n by n − r. Also, Σ̂ is the unique MLE of Σ and it is also a consistent estimator of Σ, i.e., Σ̂ →_p Σ. Note further that the MLE's of CξD and H'ξ̃ are the same as the respective BLUE estimators given above.
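As a concrete illustration of these estimators (not from the thesis; a minimal sketch assuming simulated normal data and a full-rank A), the following computes ξ̂ = (A'A)⁻A'Y, the biased MLE Σ̂ = (1/n)Y'[I − A(A'A)⁻A']Y, and the bias-corrected version with divisor n − r.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 30, 4, 3
A = rng.standard_normal((n, m))                    # design matrix, rank r = m here
xi = rng.standard_normal((m, p))
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
Y = A @ xi + rng.multivariate_normal(np.zeros(p), Sigma, size=n)

AtA_inv = np.linalg.pinv(A.T @ A)                  # a generalized inverse of A'A
xi_hat = AtA_inv @ A.T @ Y                         # MLE / BLUE of xi
resid_proj = np.eye(n) - A @ AtA_inv @ A.T         # I - A(A'A)^- A'

r = np.linalg.matrix_rank(A)
Sigma_mle = (Y.T @ resid_proj @ Y) / n             # biased MLE of Sigma
Sigma_unbiased = (Y.T @ resid_proj @ Y) / (n - r)  # bias-corrected estimator
```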
1.1.2  Testing Linear Hypotheses in the SM Model

The general linear hypothesis for the SM model is usually given in the form

            H_0:  CξD = 0,

where C(g × m) is of full rank g ≤ m, D(p × v) is of full rank v, and CξD is estimable. A number of different test criteria have been proposed to test H_0. Three of these--Wilks' Likelihood Ratio, Hotelling's Trace (T_0²) and Roy's Largest Root--have been used in most applications. The test criteria are defined as follows. Let

            S_H = D'Y'A(A'A)⁻C'[C(A'A)⁻C']⁻¹C(A'A)⁻A'YD,
            S_E = D'Y'[I − A(A'A)⁻A']YD,

and let λ_1 ≥ λ_2 ≥ ... ≥ λ_v denote the characteristic roots of S_H S_E⁻¹. Then Wilks' Likelihood Ratio Criterion is given by

(1.1.2.1)     Λ = Π_{j=1}^v (1 + λ_j)⁻¹,

Hotelling's Trace (T_0²) Criterion is given by

(1.1.2.2)     T_0² = tr(S_H S_E⁻¹) = Σ_{j=1}^v λ_j,

and Roy's Largest Root Criterion is given by

(1.1.2.3)     λ_1, the largest characteristic root of S_H S_E⁻¹.

Some other proposed criteria are

(1.1.2.4)     tr(S_H[S_E + S_H]⁻¹) = Σ_{j=1}^v λ_j/(1 + λ_j)

and

(1.1.2.5)     U_1 = Π_{j=1}^v λ_j/(1 + λ_j),

both due to Pillai [37], and Roy's Minimum Root Criterion:

(1.1.2.6)     λ_v, the smallest characteristic root of S_H S_E⁻¹.
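To make the criteria concrete, the sketch below (an illustration, not part of the thesis; it assumes a full-rank A, simulated normal data under H_0, and an arbitrary contrast matrix C with D = I) computes S_H, S_E and the roots λ_j, and from them Wilks' Λ, Hotelling's T_0², Roy's largest root and Pillai's trace.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, p = 40, 3, 3
A = rng.standard_normal((n, m))
Y = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)   # data under H0

C = np.array([[1.0, -1.0, 0.0]])         # C (g x m), g = 1
D = np.eye(p)                            # D (p x v), v = p

AtA_inv = np.linalg.pinv(A.T @ A)
P_A = A @ AtA_inv @ A.T                  # projection onto the column space of A

B = C @ AtA_inv @ A.T @ Y @ D            # estimate of C xi D
S_H = B.T @ np.linalg.inv(C @ AtA_inv @ C.T) @ B
S_E = D.T @ Y.T @ (np.eye(n) - P_A) @ Y @ D

lam = np.linalg.eigvals(S_H @ np.linalg.inv(S_E)).real   # roots lambda_j

wilks     = np.prod(1.0 / (1.0 + lam))   # (1.1.2.1)
hotelling = lam.sum()                    # (1.1.2.2)  tr(S_H S_E^{-1})
roy       = lam.max()                    # (1.1.2.3)
pillai    = np.sum(lam / (1.0 + lam))    # (1.1.2.4)
```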
In most practical applications, the use of Wilks' Λ criterion has been restricted to large sample situations. It is well known¹ that −[n − r − ½(v − g + 1)] log Λ is distributed as a chi-square variate with gv degrees of freedom as the sample size n tends to infinity. The above statistic can thus be used when the sample size is large by comparing it with the appropriate percentage point of χ²_gv. Various researchers, in particular Box [14], have investigated closer approximations to the distribution of Λ. A comprehensive treatment of these results has been given by Anderson [8].

The distribution of T_0² has recently been obtained by Siotani [52] and Davis [18], although no tables have been published as yet. Use of T_0² has thus also been restricted to its large sample distribution. It can be easily seen that (n − r)T_0² is distributed as χ²_gv as n tends to infinity.

To use Roy's Largest Root Criterion, one can use the charts of Heck [26]. Here, however, it is necessary to use θ_1 = λ_1/(1 + λ_1) instead of λ_1, and to compute the parameters

            s = min{g, v},   m = ½(|g − v| − 1),   n = ½(n − r − v − 1).

¹Recently the exact distribution of Λ has been tabled by Schatzoff [51] and Pillai [38], but use of their results is not widespread at this time.
The acceptance region is θ_1 ≤ x_{α;s,m,n}, where x_{α;s,m,n} is the upper 100α percentage point obtained from the appropriate Heck chart.

Although none of the other test criteria is usually used, Pillai's trace given in (1.1.2.4) has been tabled by Pillai [37].

At present, relatively little is known of the comparative power of the three test criteria. Nevertheless, in a recent study by Schatzoff [50], simulation techniques were used to compare the six test criteria given above with regard to power. It was concluded that Wilks' and Hotelling's criteria provide the best protection against a wide spectrum of alternatives, although Roy's criterion is best for certain restricted alternatives. Ito [29] has compared the power probabilities of the T_0² and Λ statistics for a simple class of alternatives and has shown that when N − r = 100 or 200, the powers differ at most in the second place, and for N − r = ∞ the powers are identical for all alternatives.
1.2  The Growth Curve Model (GCM)
Experimental techniques which observe and record responses of
an individual for a period of time, perhaps over a sequence of doses of
some drug, are generally discussed under the title of growth curve
experiments.
Growth curve models have been studied extensively in the
literature by Rao [40], [42], [43], [45], Khatri [31], Potthoff and
Roy [39], Allen and Grizzle [2], and others (e.g., [15], [22], [23]).
The model considered by all these authors may be given in general
form as
(1.2.1)     E(Y) = AξP,     Var(Y) = I_n ⊗ Σ,

where Y(n × q), A(n × m), ξ(m × p), P(p × q), Σ(q × q) is p.d., R(A) = r (≤ m), R(P) = p, and the rows of Y are independently normally distributed. Implicit in the model is the assumption that each individual's response can be described by a linear model of the form

(1.2.2)     E(y_i) = P'β_i,

where y_i(q × 1) is the observation vector on the i-th individual and β_i is a (p × 1) vector of unknown parameters. We call (1.2.2) the within individual model. It is also implicitly assumed that the expected value of each individual's response comes from the same family (e.g., it is a (p−1)-th degree polynomial).
1.2.1  Estimation in the GCM Model

As with the SM model, it is usually of interest to estimate for the GCM model linear functions and sets of ξ of the form

            h'ξ̃ = Σ_{s=1}^p c_s'ξ_s   and   H'ξ̃ = Σ_{s=1}^p C_s'ξ_s,

where ξ_s is the s-th column of ξ and ξ̃ is the column vector defined in (1.1.1.1) by rolling out the columns of ξ into one long (mp × 1) vector. Of particular interest is the estimation of ξ itself (i.e., H'ξ̃ when H = I_mp).

If it is assumed that R(A) = m (i.e., A is of full rank), then it is easy to see that ξ (i.e., ξ̃) itself is estimable and an unbiased estimate of ξ can be given by

(1.2.1.1)     ξ̂ = (A'A)⁻¹A'YG⁻¹P'(PG⁻¹P')⁻¹,
where G(q × q) is any symmetric p.d. matrix, either non-stochastic or independent of Y, for which PG⁻¹P' is of full rank (e.g., G = Σ). The matrix G⁻¹ is frequently referred to as the weights or the weight matrix. Use of G was first suggested by Potthoff and Roy [39]. Provided G is non-stochastic, it is easy to see that ξ̃̂ can be put in the form F'ỹ, where F is non-stochastic and ỹ is defined by rolling out the columns of Y = [y_1 ... y_q] into one long vector. Thus H'ξ̃̂ is a linear set in ỹ. Since R(A) = m, it follows that h'ξ̃ and H'ξ̃ are estimable for arbitrary h and H, and unbiased estimators are given by h'ξ̃̂ and H'ξ̃̂, respectively, where ξ̃̂ is defined by (1.2.1.1).

If A is not of full rank, then ξ is not estimable. In this case it is easy to see that if we want to estimate c'ξd and CξD, and if we require our estimators to be of the form ℓ'Yw and T'YW, where ℓ, w, T and W are non-stochastic, then the estimability conditions are that c' ∈ V(A) and d ∈ V(P') for c'ξd, and that the rows of C belong to V(A) and the columns of D belong to V(P') for CξD.
More generally, if we consider arbitrary

            h'ξ̃ = Σ_{s=1}^p c_s'ξ_s   and   H'ξ̃ = Σ_{s=1}^p C_s'ξ_s

and we require our estimators to be of the form

            ℓ'ỹ = Σ_{i=1}^q ℓ_i'y_i   and   F'ỹ = Σ_{i=1}^q F_i'y_i,

respectively, where Y = [y_1 ... y_q], then the estimability conditions become, for h'ξ̃:

(1.2.1.2)     c_s' = p_s'LA,   s = 1, ..., p,

where L' = [ℓ_1 ... ℓ_q] is arbitrary and p_s' is the s-th row of P; and for H'ξ̃:

(1.2.1.3)     C_s' = L_s'A,   s = 1, ..., p,

where L_s' = Σ_{i=1}^q p_si F_i' for arbitrary F_i. The conditions given in (1.2.1.2) are more restrictive than just c_s' ∈ V(A) alone, the estimability requirement for the SM model, since the matrix P is involved. Similarly, (1.2.1.3) is more restrictive than the requirement that the rows of C_s' belong to V(A). These conditions are obtained by observing that if Σ_{i=1}^q F_i'y_i is unbiased for Σ_{s=1}^p C_s'ξ_s, then

            E[Σ_{i=1}^q F_i'y_i] = Σ_{i=1}^q F_i'A(Σ_{s=1}^p p_si ξ_s) = Σ_{s=1}^p (Σ_{i=1}^q p_si F_i'A)ξ_s = Σ_{s=1}^p C_s'ξ_s,

where P = ((p_si)), so that

            C_s' = (Σ_{i=1}^q p_si F_i')A = L_s'A

as in (1.2.1.3).
In Roy's paper [49] it is pointed out that if h'ξ̃ is estimable with regard to functions of the form f'ỹ, then

            Σ_{s=1}^p c_s'(A'A)⁻A'x_s,

where x_s is the s-th column of X = YP'(PP')⁻¹, is the BLUE of h'ξ̃ in terms of linear functions of the elements of X. This can be extended to show that for any choice of G as in (1.2.1.1),

            Σ_{s=1}^p c_s'(A'A)⁻A'x_s*,

where x_s* is the s-th column of X* = YG⁻¹P'(PG⁻¹P')⁻¹, is the BLUE of h'ξ̃ in terms of linear functions of the elements of X*. We would like to choose G to obtain a BLUE of h'ξ̃ in terms of linear functions of the elements of Y. We can do this by using Lagrange multipliers with respect to the estimability conditions (1.2.1.2); that is, we wish to find ℓ to minimize Var(ℓ'ỹ) subject to the estimability conditions that c_s' = p_s'LA, where ℓ'ỹ = Σ_{i=1}^q ℓ_i'y_i and L' = [ℓ_1 ... ℓ_q].
Now

            Var(ℓ'ỹ) = Σ_{i=1}^q Σ_{j=1}^q ℓ_i'ℓ_j σ_ij,

so that we must consider the Lagrange expression

            Ψ = Σ_{i=1}^q Σ_{j=1}^q ℓ_i'ℓ_j σ_ij + Σ_{s=1}^p λ_s'(c_s − A'(Σ_{i=1}^q p_si ℓ_i)).

Taking derivatives with respect to ℓ_j we get

            2 Σ_{i=1}^q ℓ_i σ_ij − A Σ_{s=1}^p p_sj λ_s = 0,   j = 1, ..., q.
Now, writing the stationarity condition in matrix form,

            ∂Ψ/∂L' = 2L'Σ − AMP = 0,   where M(m × p) = [λ_1 λ_2 ... λ_p],

i.e.,       2A'L'ΣΣ⁻¹P' = A'AMPΣ⁻¹P',
i.e.,       2A'L'P' = A'AMPΣ⁻¹P',
i.e.,       2C' = A'AMPΣ⁻¹P',   where C' = [c_1 c_2 ... c_p],

so that M = 2(A'A)⁻C'(PΣ⁻¹P')⁻¹ and thus

            L' = ½AMPΣ⁻¹ = A(A'A)⁻C'(PΣ⁻¹P')⁻¹PΣ⁻¹.

Thus

            Σ_{i=1}^q ℓ_i'y_i = Σ_{s=1}^p c_s'(A'A)⁻A'(Σ_{i=1}^q p̃_is y_i) = Σ_{s=1}^p c_s'(A'A)⁻A'x_s*,

where p̃_is is the (i, s)-th element of Σ⁻¹P'(PΣ⁻¹P')⁻¹ and x_s*
is the s-th column of X* = YΣ⁻¹P'(PΣ⁻¹P')⁻¹. Thus the optimum choice of G is given by G = Σ, and we have the following theorem:

THEOREM 1.2.1. For the GCM model, the BLUE of an estimable linear function of the form h'ξ̃ = Σ_{s=1}^p c_s'ξ_s is given by

(1.2.1.4)     Σ_{s=1}^p c_s'(A'A)⁻A'x_s*,

where x_s* is the s-th column of X* = YΣ⁻¹P'(PΣ⁻¹P')⁻¹.

Theorem 1.2.1 is unfortunate since Σ is usually unknown. This leads us to consider estimates of h'ξ̃ which are not necessarily linear sets of Y alone. In particular, the theorem suggests that we use for G a good estimate of Σ based on the data or based on previous experiments. In fact, it has been shown (e.g., Khatri [31], Rao [43], Grizzle and Allen [2]) that if CξD is estimable and normality is assumed on the rows of Y, then the maximum likelihood estimator of CξD is given by Cξ̂D, where

(1.2.1.5)     ξ̂ = (A'A)⁻A'YS⁻¹P'(PS⁻¹P')⁻¹

and S is the sample covariance matrix obtained from the data and defined by

(1.2.1.6)     S = Y'[I − A(A'A)⁻A']Y.

The maximum likelihood estimator of Σ is Σ̂ = (1/n)S, and use of Σ̂ instead of S in (1.2.1.5) clearly does not change ξ̂.
The dispersion matrix of Cξ̂D was derived by Rao [45] and Williams [70], and it is given by

(1.2.1.7)     Var(Cξ̂D) = [(n − m − 1)/(n − m − (q − p) − 1)] C(A'A)⁻C' ⊗ D'(PΣ⁻¹P')⁻¹D.
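The estimator (1.2.1.5) is easy to compute. The sketch below (an illustration, not from the thesis; it assumes simulated growth-curve data with a full-rank across-individual design A and a polynomial within-individual design P) forms S from the residuals and then the weighted estimate ξ̂.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p, q = 20, 2, 3, 5                  # units, groups, polynomial terms, time points

A = np.kron(np.eye(m), np.ones((n // m, 1)))     # across-individual design (n x m)
t = np.linspace(0.0, 1.0, q)
P = np.vstack([t**k for k in range(p)])          # within-individual design (p x q)
xi = rng.standard_normal((m, p))
Sigma = 0.5 * np.eye(q) + 0.5                    # q x q covariance of a row of Y

Y = A @ xi @ P + rng.multivariate_normal(np.zeros(q), Sigma, size=n)

AtA_inv = np.linalg.inv(A.T @ A)                 # A has full rank here
S = Y.T @ (np.eye(n) - A @ AtA_inv @ A.T) @ Y    # (1.2.1.6)
S_inv = np.linalg.inv(S)

# (1.2.1.5): xi_hat = (A'A)^- A' Y S^{-1} P' (P S^{-1} P')^{-1}
xi_hat = AtA_inv @ A.T @ Y @ S_inv @ P.T @ np.linalg.inv(P @ S_inv @ P.T)
```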
1.2.2  Testing Hypotheses in the GCM Model
As for the SM model, hypotheses of interest for the GCM model are usually in the form

            H_0:  CξD = 0,

where C(g × m) is of full rank g, D(p × v) is of full rank v, and CξD is estimable. In the analysis of Potthoff and Roy [39], H_0 was tested by transforming the GCM model (1.2.1) into the SM model by use of a weights matrix G⁻¹, where G is as described in the previous section, and then using either Wilks', Hotelling's or Roy's criterion as the test statistic. The transformed data matrix in this case is given by

(1.2.2.1)     X = YG⁻¹P'(PG⁻¹P')⁻¹,

and the hypothesis and error matrices are given by

(1.2.2.2)     S_H = D'X'A(A'A)⁻C'[C(A'A)⁻C']⁻¹C(A'A)⁻A'XD

and

(1.2.2.3)     S_E = D'X'[I − A(A'A)⁻A']XD.
Potthoff and Roy's approach did not allow G to be stochastic unless it was independent of Y. Thus G could be chosen as an estimate of kΣ, where k is any constant, provided this estimate was obtained independently of the experiment under consideration. Choosing G⁻¹ = S⁻¹, where S is the sample covariance matrix based on the data in Y, was therefore not allowed. Khatri [31] used the method of maximum likelihood to obtain test criteria in which S⁻¹ is used as the weights. Khatri's hypothesis and error matrices simplify to

(1.2.2.4)     S_H = D'X'A(A'A)⁻C'[CRC']⁻¹C(A'A)⁻A'XD

and

(1.2.2.5)     S_E = D'[PS⁻¹P']⁻¹D,

where X is defined by (1.2.2.1) with G = S, S is defined by (1.2.1.6), and

(1.2.2.6)     R = (A'A)⁻ + (A'A)⁻A'Y[S⁻¹ − S⁻¹P'(PS⁻¹P')⁻¹PS⁻¹]Y'A(A'A)⁻.
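A direct computation of Khatri's matrices is sketched below (an illustration, not from the thesis; it assumes a full-rank A, with C and D supplied by the user, and packages (1.2.2.4)-(1.2.2.6) as a helper function).

```python
import numpy as np

def khatri_matrices(Y, A, P, C, D):
    """Hypothesis and error matrices (1.2.2.4)-(1.2.2.6) for H0: C xi D = 0."""
    n = Y.shape[0]
    AtA_inv = np.linalg.inv(A.T @ A)                      # assumes A of full rank
    S = Y.T @ (np.eye(n) - A @ AtA_inv @ A.T) @ Y         # (1.2.1.6)
    S_inv = np.linalg.inv(S)
    W = np.linalg.inv(P @ S_inv @ P.T)                    # (P S^{-1} P')^{-1}
    X = Y @ S_inv @ P.T @ W                               # (1.2.2.1) with G = S

    R = AtA_inv + AtA_inv @ A.T @ Y @ (
        S_inv - S_inv @ P.T @ W @ P @ S_inv) @ Y.T @ A @ AtA_inv   # (1.2.2.6)

    B = C @ AtA_inv @ A.T @ X @ D                         # C(A'A)^- A' X D
    S_H = B.T @ np.linalg.inv(C @ R @ C.T) @ B            # (1.2.2.4)
    S_E = D.T @ W @ D                                     # (1.2.2.5)
    return S_H, S_E
```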
Rao [43] criticized the use of G⁻¹ by Potthoff and Roy by pointing out that not all of the information available in the sample Y would be utilized unless G⁻¹ is chosen to be the unknown Σ⁻¹. Using the method of covariance adjustment, Rao [42], [43] showed how the additional information could be utilized by incorporating into the model (q − p) covariables which give a column basis of the matrix [I − P'(PP')⁻¹P]. (Note that V([I − P'(PP')⁻¹P]) is the within individual error space.) Rao also pointed out in [45] that weighting did not always produce shorter confidence intervals. Grizzle and Allen [2] unified the previous approaches and described how the results could be used on a computer in applications. Also, they showed that Rao's results using covariance adjustment were identical to those of Khatri, and they described how to use stochastic weights other than S⁻¹ if it was desired not to use all the vectors in the error space as covariables.
24
1.3
Experimental Situations to Which the SM Model Does Not Apply
The SM model as defined by (1.1.1), (1.1.2) or (1.1.3) contains
two assumptions inherent to the experimental situation which are not
always met in practice. These are:

(i)  each response variate is measured on each experimental unit (i.e., on each experimental unit is observed a p-variate vector);

(ii) the design matrix, A, is the same for each response variate (e.g., the same blocking system is applicable to each variate).
In general, condition (i) will not hold whenever it is physically impossible, uneconomical or inadvisable to observe each response variate on each experimental unit or, in other words, when there are missing observations, either by design or at random. Condition (ii) will not hold when different blocking systems are applicable to different response variates and when some of the response variates are insensitive to certain treatments. In addition, in a somewhat different context, a relaxation of (ii) is required for models which describe certain kinds of econometric problems which involve the estimation of a set of different regression equations which are not independent of each other (i.e., "seemingly unrelated regressions" as
Several examples of experiments in which (i) is not applicable
are described by Trawinski [63], Trawinski and Bargmann [64], and
Srivastava [57], [58].
In each of these, data is missing by design.
For example, if the process of taking measurements is time
consuming and the experimental material has a limited life or the
experiment has a fixed time limit, then the experimenter may have to
be content with measuring fewer than p
response variates on each experimental unit.
As a second example, a psychologist may wish to administer 20 different tests to a group of subjects, but realizes that giving any subject more than, perhaps, four tests might result in a bias on later tests obtained from practice on earlier tests.

Thirdly, the experimenter may not be equally interested in the different response variates, and he might like to observe some variates on a larger number of experimental units than on others. This might be the case if there is a limited budget for the experiment and the measurement of responses is so costly that the experimenter cannot afford to measure each response variate on each experimental unit.
As a fourth example, suppose u experiments are conducted in different localities involving the same set of treatments. At the i-th locality, a subset of q_i out of p response variates is measured on each experimental unit.
Trawinski [63] describes an experiment in which there are 3600
individuals divided into two groups (boys and girls) of 1800 each, and
there are four response variates of interest labeled V_1, V_2, V_3 and V_4.
It is desired to administer three treatments (drugs, for example) on
each group in such a way that each individual is observed on only two
of the four variates, all variate pairs are observed, and an equal
number of observations are made on each variate pair.
Such an experi-
ment requires choosing missing observations to provide a sort of
balance among the variate pairs observed.
In this example, the 1800
in each group are divided randomly into 3 subgroups of 600 individuals,
each subgroup getting a different treatment.
A given subgroup is
then divided randomly into 6 sub-subgroups of 100 individuals each.
Individuals in a given sub-subgroup are observed for one of the 6 variate pairs (V_1, V_2), (V_1, V_3), (V_1, V_4), (V_2, V_3), (V_2, V_4), (V_3, V_4).
In this way, each variate pair is observed on the same
number of individuals in the entire experiment, 600, the same number
of individuals within each group, 300, and the same number of individuals for each treatment within each group, 100.
Note that the
variate design is a BIBD with v = 4, b = 6, k = 2, r = 3 and λ = 1.
Other examples of variate designs are described by Trawinski.
Srivastava [57] gives a number of examples for which condition (ii) is inapplicable. For instance, consider an agricultural experiment in which seven varieties or treatments are applied to 28 experimental units according to the following plan:
                        I    II   III   IV    V    VI   VII
              I         3     1     5    2    7     4     6
   Row       II         6     3     7    5    1     2     4
   Blocks   III         7     4     1    6    2     3     5
             IV         2     6     3    1    4     5     7

                               TABLE 1.2
                          SRIVASTAVA'S EXAMPLE
Suppose that two response variates V_1 and V_2 are measured on each experimental unit, but that the pattern of heterogeneity is such that the 7 column blocks are a good block system for V_1 and the 4 row blocks are a good block system for V_2. If we denote by y_1(28 × 1) and y_2(28 × 1) the vectors representing the data observed on V_1 and V_2, respectively, and let Y = [y_1 y_2], then the expectation model for the experiment is

            E(Y) = [A_1ξ_1   A_2ξ_2],

where A_1(28 × 14) is a design matrix for a BIB experiment with v = b = 7, r = k = 4, λ = 1, A_2(28 × 11) is a design matrix for a Randomized Blocks experiment,

            ξ_1(14 × 1) = [β_1'  τ']',   ξ_2(11 × 1) = [β_2'  τ']',

and

            β_1(7 × 1) denotes the column block effects,
            β_2(4 × 1) denotes the row block effects,
            τ(7 × 1) denotes the treatment effects.
For another example, consider a one-fourth fractional factorial experiment without blocks from a 2⁵ factorial. Denote the factors by A, B, C, D, E and the eight treatment combinations used by (00000), (00101), (10110), (10011), (01010), (01111), (11100), (11001). Suppose two response variates V_1 and V_2 are observed on each experimental unit, but that V_1 is insensitive to levels of D and E, and V_2 is insensitive to levels of B and C. Suppose further that we are interested in estimating effects up to and including two-factor interactions.
Then the expectation model for the experiment is described by

            E(Y) = [A_1ξ_1   A_2ξ_2],

where Y(8 × 2) is the data matrix,

            ξ_1(7 × 1) = (μ_1, A, B, C, AB, AC, BC)',
            ξ_2(7 × 1) = (μ_2, A, D, E, AD, AE, DE)',

            A_1(8 × 7) =
              1  -1  -1  -1   1   1   1
              1  -1  -1   1   1  -1  -1
              1   1  -1   1  -1   1  -1
              1   1  -1  -1  -1  -1   1
              1  -1   1  -1  -1   1  -1
              1  -1   1   1  -1  -1   1
              1   1   1   1   1   1   1
              1   1   1  -1   1  -1  -1

and

            A_2(8 × 7) =
              1  -1  -1  -1   1   1   1
              1  -1  -1   1   1  -1  -1
              1   1   1  -1   1  -1  -1
              1   1   1   1   1   1   1
              1  -1   1  -1  -1   1  -1
              1  -1   1   1  -1  -1   1
              1   1  -1  -1  -1  -1   1
              1   1  -1   1  -1   1  -1

Note that in this example A_1 ≠ A_2.
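The two design matrices can be generated mechanically from the listed treatment combinations under the usual ±1 coding of factor levels; the sketch below (an illustration, not from the thesis) does this and confirms that A_1 ≠ A_2.

```python
import numpy as np

# Treatment combinations (levels of A, B, C, D, E) for the fractional factorial
combos = ["00000", "00101", "10110", "10011",
          "01010", "01111", "11100", "11001"]
X = 2 * np.array([[int(c) for c in t] for t in combos]) - 1   # 0/1 -> -1/+1
a, b, c, d, e = X.T

# Columns of A1: mean, A, B, C, AB, AC, BC;  columns of A2: mean, A, D, E, AD, AE, DE
one = np.ones(len(combos), dtype=int)
A1 = np.column_stack([one, a, b, c, a * b, a * c, b * c])
A2 = np.column_stack([one, a, d, e, a * d, a * e, d * e])

print(np.array_equal(A1, A2))    # False: the two variates have different designs
```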
Gourlay [24] and others have considered a model (see Table 1.3 below) for an experimental design involving a complete set of 3 × 3 Latin Squares corresponding to the six possible orderings of three treatments A, B and C. The squares are replicated n times, so that each of the six groups consists of n individuals chosen at random from a total of 6n individuals to whom the treatments are applied.

                       Ordinal Positions
          Groups        1      2      3
            1           A      B      C
            2           B      C      A
            3           C      A      B
            4           A      C      B
            5           B      A      C
            6           C      B      A

                           TABLE 1.3
                        GOURLAY'S MODEL

Each individual goes through a sequence (row) of three treatments. It is assumed that the results for a given individual are correlated. Each of the n observations in any one of the eighteen cells is assumed to have the following model:
            Var(Y_{ij(k)q}) = σ²,

            Cov(Y_{ij(k)q}, Y_{i'j'(k')q'}) = σ_{jj'} if j ≠ j', i = i', q = q',
                                            = 0       if i ≠ i' and/or q ≠ q',

where

            i = 1, ..., 6 (groups),
            j = 1, 2, 3 (ordinal positions),
            k = A, B, C (treatments),
            q = 1, ..., n (replications).
If the observations on each individual were not correlated, then the usual univariate analysis of variance model of the form

            E(y) = Aξ,   Var(y) = σ² I_{18n},

where A is an (18n × 24) design matrix of zeros and ones and ξ = (Q', T', N')', with Q = (Q_1, Q_2, ..., Q_6)' the group effects, T = (T_A, T_B, T_C)' the treatment effects and N the ordinal position effects, would be appropriate. However, because the rows are correlated, we actually have a model of the form

            E(y) = Aξ,   Var(y) = Σ ⊗ I_{6n},

where Σ = ((σ_{jj'})) and σ_jj = σ² for all j.
By considering the ordinal positions as response variates V_1, V_2, and V_3, respectively, and by letting y_s(6n × 1), s = 1, 2, 3, denote the observations on V_s, the model can also be expressed variate-wise as

            E(y_s) = A_sξ_s,   s = 1, 2, 3,

where A_s(6n × 10) is the design matrix corresponding to V_s, s = 1, 2, 3. The three design matrices A_1, A_2, and A_3 will all be different, although any one of these matrices can be obtained from any other by column permutations.
Zellner [72] has pointed out that there are many econometric situations which call for linear models in which different design matrices correspond to different response variates. The usual problem can be described as the estimation of a set of correlated regression equations. The different response variates would correspond to the different regression equations. Typical examples of such situations include analysis of temporal cross-section data, time series regression analysis of demands for a variety of goods, or a cross-section budget study involving regressions for several commodities.² Another application occurs when each regression equation refers to different points in space, as in Barten and Koerts' analysis of voters' transitions from party to party within various voting districts [10].

²With regard to the latter, a set of data may have been obtained from a survey of household expenditures during some time period. The regression equation used to explain outlays on some commodity may generate residuals that are correlated with the residuals of another regression equation that was used to explain expenditures on a different commodity.
1.4  The More General Linear Multivariate Model (MGLM)
To deal with situations such as described in the previous section, and also with situations which involve both missing observations and different design matrices corresponding to different response variates, we now define a model which we call the More General Linear Multivariate Model (MGLM). "More" is used instead of "Most" in the
model name because there are meaningful models such as growth curve
models which are not special cases of the MGLM model.
Let us assume that there are n experimental units and p response variates V_1, ..., V_p in total. The n experimental units are divided into u disjoint sets of experimental units S_1, S_2, ..., S_u with n_j units in S_j. On each unit in the set S_j we measure q_j (≤ p) responses V_{ℓ_j1}, V_{ℓ_j2}, ..., V_{ℓ_jq_j}. (The remaining p − q_j response variates are not measured in S_j.) Then the MGLM model is given by

(1.4.1)     E(Y_j) = [A_{jℓ_j1}ξ_{ℓ_j1}   A_{jℓ_j2}ξ_{ℓ_j2}   ...   A_{jℓ_jq_j}ξ_{ℓ_jq_j}],
            Var(Y_j) = I_{n_j} ⊗ B_j'ΣB_j,   j = 1, ..., u,

where 1 ≤ ℓ_j1 < ℓ_j2 < ... < ℓ_jq_j ≤ p and, for each s, s = 1, ..., p, there exists at least one pair (j, r) such that ℓ_jr = s (so that the total set of unknown parameters is {ξ_1, ..., ξ_p}). Here

            Y_j(n_j × q_j) is the matrix of observations for the j-th set S_j,

            A_{jℓ_jr}(n_j × m_{ℓ_jr}) is the design matrix for variate V_{ℓ_jr} in the set S_j, and R(A_{jℓ_jr}) = a_{jr},

            ξ_{ℓ_jr}(m_{ℓ_jr} × 1) is the parameter vector for variate V_{ℓ_jr} (and m_{ℓ_jr} = m_{ℓ_j'r'} whenever ℓ_jr = ℓ_j'r'),

            B_j(p × q_j) is the incidence matrix of rank R(B_j) = q_j for the j-th set of experimental units. It consists of 0's and 1's and is defined by B_j = ((b^(j)_sr)), where

(1.4.2)     b^(j)_sr = 1 if ℓ_jr = s, i.e., if variate V_s is the r-th ordered variate measured in the j-th experimental set S_j, and b^(j)_sr = 0 otherwise.

And let us further assume that

(1.4.3)     Y_j and Y_j' are independent if j ≠ j',

and that

(1.4.4)     the rows of Y_j are also independent and distributed as a q_j-variate multinormal vector with variance-covariance matrix B_j'ΣB_j.
The MGLM model can alternatively be described by its variate-wise representation (as was done for the SM model in (1.1.2)), provided the following definitions are made:

Let x_s, s = 1, ..., p, be the vector of length N_s, say, corresponding to all observations on V_s in the entire experiment.

Let D_s(N_s × m_s), s = 1, ..., p, be the design matrix corresponding to x_s, i.e., D_s is determined by E(x_s) = D_sξ_s.
34
Let Q (N
rs r
x
N ), r
s
<
s
denote the incidence matrix of O's
and lIs defined by Q = «q..
))
rs
~J (rs)
(1.4.5)
where
1 if the i-th component of x
q ..
-r
~J (rs)
and the j-th
component of x are observed on the same
-s
experimental unit
o otherwise.
Then the variate-wise representation of the MGLM model is given by
E(x
)
-s
(1. 4.6)
= Ds--s
e;
}
Var(x
) = cr ss IN
-s
s
Cov(x , x ) = cr Q
r < s
-r -s
rs rs'
Cov(x , x ) = cr Q' , r
-r -s
rs sr
s,
>
r , s=l, .•. , p.
Note that if we let

            u_s = the number of sets S_i, i = 1, ..., u, in which V_s is measured,

and let S_{s_1}, S_{s_2}, ..., S_{s_us} denote the different sets in which V_s is measured, then x_s, D_s, and N_s are given by

(1.4.7)     x_s(N_s × 1) = (x_{s_1}', ..., x_{s_us}')',   D_s(N_s × m_s) = [A_{s_1}' ... A_{s_us}']',   and   N_s = Σ_{i=1}^{u_s} n_{s_i},

where x_{s_i} denotes the observations on V_s in the set S_{s_i} and A_{s_i} the corresponding design matrix.
With the above definitions, we can also present the vector version of the MGLM model (as was similarly obtained for the SM model in (1.1.3)) as follows:

(1.4.8)     E(x) = Dξ̃,     Var(x) = Ω,

where

(1.4.9)     x(N × 1) = (x_1', x_2', ..., x_p')',   D(N × M) = diag(D_1, D_2, ..., D_p),   ξ̃(M × 1) = (ξ_1', ..., ξ_p')',

            Ω(N × N) = ((σ_rs Q_rs)),   with Q_ss = I_{N_s} and Q_rs = Q_sr' for r > s,

            N = Σ_{s=1}^p N_s,   and   M = Σ_{s=1}^p m_s.
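The covariance structure Ω is determined entirely by Σ and the incidence matrices Q_rs of (1.4.5). The sketch below (an illustration, not from the thesis) builds the Q_rs and Ω for a small incomplete layout with p = 2 variates, recording explicitly which experimental unit each observation belongs to.

```python
import numpy as np

# Units on which each variate is observed (a small incomplete layout, p = 2)
units = {1: [0, 1, 2, 3, 4], 2: [2, 3, 4, 5]}       # variate -> list of unit labels
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])

N = {s: len(u) for s, u in units.items()}

def Q(r, s):
    """Incidence matrix (1.4.5): 1 where two observations share an experimental unit."""
    return np.array([[1.0 if ur == us else 0.0 for us in units[s]] for ur in units[r]])

# Omega = ((sigma_rs Q_rs)) with Q_ss = I_{N_s}
Omega = np.block([
    [Sigma[0, 0] * np.eye(N[1]), Sigma[0, 1] * Q(1, 2)],
    [Sigma[1, 0] * Q(2, 1),      Sigma[1, 1] * np.eye(N[2])],
])
```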
1.4.1  Special Cases of the MDM, GIM, HM and Other Models
The special cases of the MGLM model to be described in this section have all been dealt with to some degree in the literature. We will defer a thorough discussion of this literature until Chapters Two, Three and Four, which consider estimation and testing hypotheses.

Srivastava [55], [57], [60], Zellner [72], Zellner and Huang [74], Telser [61], and Kmenta and Gilbert [33] have considered the special case of the MGLM model where u = 1, n_1 = n, q_1 = p, Y_1 = Y and B_1 = I_p. This reduces to a model given by

(1.4.1.1)     E(Y) = [A_1ξ_1   A_2ξ_2   ...   A_pξ_p],     Var(Y) = I_n ⊗ Σ.

Srivastava has called (1.4.1.1) the Multiple Design Multivariate Model (MDM). It is clear from (1.4.1.1) that for the MDM model different design matrices correspond to different response variates. The other authors have referred to (1.4.1.1) as seemingly unrelated regressions.
In the latter context the MDM model may be put into the following variate-wise representation:

(1.4.1.2)     E(y_s) = A_sξ_s,   Var(y_s) = σ_ss I_n,   Cov(y_r, y_s) = σ_rs I_n,   r, s = 1, ..., p,

where Y = [y_1 y_2 ... y_p]. Thus from (1.4.1.2) it can be seen that the MDM model describes a situation where there are p regression equations involving different
regression coefficients, so that they are seemingly unrelated, and yet
any two regressions are correlated according to the covariance structure
defined in Σ.
The vector version of the MDM model is given by

(1.4.1.3)     E(y) = Dξ̃,     Var(y) = Σ ⊗ I_n,

where

            D(np × Σ_{s=1}^p m_s) = diag(A_1, A_2, ..., A_p)   and   ξ̃(Σ_{s=1}^p m_s × 1) = (ξ_1', ξ_2', ..., ξ_p')'.

Clearly D is of full rank R(D) = Σ_{s=1}^p m_s if and only if each of the A_s matrices is of full rank.
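For the MDM (seemingly unrelated regressions) structure, the vector version makes the generalized least squares computation explicit. The following sketch (an illustration, not from the thesis; it assumes Σ is known and each A_s is of full rank) stacks the design matrices as in (1.4.1.3) and computes (D'Ω⁻¹D)⁻¹D'Ω⁻¹y with Ω = Σ ⊗ I_n. When Σ is unknown it would have to be replaced by a consistent estimate, which is the kind of substitution taken up in Chapter Three.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 2
m = [3, 2]                                          # parameters per regression

A = [rng.standard_normal((n, ms)) for ms in m]      # different design matrices
xi = [rng.standard_normal(ms) for ms in m]
Sigma = np.array([[1.0, 0.7], [0.7, 1.5]])

E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = np.concatenate([A[s] @ xi[s] + E[:, s] for s in range(p)])   # stacked response

# D = diag(A_1, ..., A_p) as in (1.4.1.3)
D = np.zeros((n * p, sum(m)))
col = 0
for s in range(p):
    D[s * n:(s + 1) * n, col:col + m[s]] = A[s]
    col += m[s]

Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
xi_gls = np.linalg.solve(D.T @ Omega_inv @ D, D.T @ Omega_inv @ y)
```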
A second special case of the MGLM model is a general multivariate linear model for missing observations, which is obtained when A_{jℓ_jr} = A_j, r = 1, ..., q_j, and when ξ_s is (m × 1), where m is independent of s. This model is then given by

(1.4.1.4)     E(Y_j) = A_jξB_j,     Var(Y_j) = I_{n_j} ⊗ B_j'ΣB_j,   j = 1, ..., u.

Srivastava [57], [58], [60] has called (1.4.1.4) the General Incomplete Multivariate Model (GIM), and we will henceforth refer to it by this name. Trawinski [63] and Trawinski and Bargmann [64] have considered the special case of the GIM model where q_j = q, A_j = A, n_j = n/u, j = 1, ..., u. Another special case of the GIM model which has been considered separately by Srivastava [57], [60] occurs when the missing observations are structured so that q_j = j, u = p and

            B_j(p × j) = [I_j   O_{j,p−j}]'.

We will refer to this latter model as the Hierarchal Incomplete Multivariate Model (HIM).

It should be noted that if there are no missing observations (i.e., the model is complete), then (1.4.1.4) reduces to the SM model where the design matrix A of (1.1.1) is given by

            A = [A_1'  A_2'  ...  A_u']'.

Thus for the GIM model it is clear that each of the u mutually exclusive sets of experimental units (e.g., S_j) corresponds to a unique subset of response variates (e.g., V_{ℓ_j1}, ..., V_{ℓ_jq_j}) which are observed together only on the given set of experimental units (S_j). The largest value that u can take is 2^p − 1, which will be obtained when the sets of missing observations correspond to all possible combinations of the p variates taken 1, 2, ..., and p at a time.
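The incidence matrices B_j make the missing-data pattern explicit: B_j'ΣB_j simply extracts the rows and columns of Σ belonging to the variates observed in S_j. A minimal sketch (an illustration, not from the thesis) for a HIM-type pattern:

```python
import numpy as np

p = 4
Sigma = 0.3 * np.ones((p, p)) + 0.7 * np.eye(p)

def B(observed, p):
    """Incidence matrix (p x q_j) whose r-th column selects the r-th observed variate."""
    Bj = np.zeros((p, len(observed)))
    for r, s in enumerate(observed):
        Bj[s, r] = 1.0
    return Bj

# HIM pattern: set S_3 observes variates V_1, V_2, V_3, i.e., B_3 = [I_3  O]'
B3 = B([0, 1, 2], p)
cov_S3 = B3.T @ Sigma @ B3                    # = leading 3 x 3 block of Sigma
print(np.allclose(cov_S3, Sigma[:3, :3]))     # True
```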
The HIM model described above has been dealt with in a more general context by Srivastava and Roy in [55]. By letting u = p, q_j = j for j = 1, ..., p, ℓ_jr = r for r = 1, ..., j and

            B_j = [I_j   O_{j,p−j}]'

in the MGLM model, we obtain:

(1.4.1.5)     E(Y_j) = [A_{j1}ξ_1   A_{j2}ξ_2   ...   A_{jj}ξ_j],     Var(Y_j) = I_{n_j} ⊗ B_j'ΣB_j.

It is clear that (1.4.1.5) incorporates both missing observations and different design matrices for different response variates. Srivastava has called this model simply the Hierarchal Multivariate (HM) model. Obviously, when A_{jr} = A_j for r = 1, ..., j, the HM becomes the HIM model.

Note that if we let

            U_s = the set of all experimental units on which response variate V_s is measured,

then for the HM model we have

            U_p ⊆ U_{p−1} ⊆ ... ⊆ U_1.

Thus the HM model deals with a situation in which response variates can be graded monotonically in descending order of importance. In addition, if V_r is more important than V_s, i.e., r < s, then V_r is observed on each experimental unit on which V_s is observed. The variate-wise representation in (1.4.6) and the vector versions in
(1.4.8) simplify considerably for the HM model, since we have

(1.4.1.6)     D_s = [A_{ss}'  A_{s+1,s}'  ...  A_{ps}']',   Q_rs = [O_{N_s, N_r−N_s}   I_{N_s}]'   for r < s,   and   N_s = Σ_{r=s}^p n_r,

where x_s is obtained by stacking the observations on V_s from the sets S_s, S_{s+1}, ..., S_p.
1.5  A Generalization of the Growth Curve Multivariate Model
Assume that there are n experimental units and q time points at which measurements are taken. The n experimental units are divided into u disjoint sets of experimental units S_1, S_2, ..., S_u, where S_j contains n_j units. For j = 1, ..., u let t_{j1}, ..., t_{jq_j} be the q_j (≤ q) time points at which observations are taken in S_j. Let B_j be the (q × q_j) incidence matrix specifying which of the q time points is observed in S_j.

Assume that each individual's response can be described by a linear model of the form

(1.5.1)     E(y_{jk}) = P_j'β_{jk},

where y_{jk}(q_j × 1) is the observation vector on the k-th individual from S_j, P_j(p × q_j) has rank p for all j, where p ≤ q_j, and β_{jk}(p × 1) is a vector of unknown parameters. We call (1.5.1) the within individual model.
We assume that the expected value of each individual in the total experiment is from the same family, so that P_j = PB_j, where P(p × q) is of rank p. The complete model expressing both the within individual and across individual design is given by

(1.5.2)     E(Y_j) = A_jξPB_j,     Var(Y_j) = I_{n_j} ⊗ B_j'ΣB_j,   j = 1, ..., u,

where

            Y_j(n_j × q_j) is the observation matrix for the j-th set S_j,
            A_j(n_j × m) is the design matrix for the j-th set S_j,
            ξ(m × p) is the matrix of unknown parameters,
            P(p × q) is a regression matrix of rank p,
            B_j(q × q_j) is the incidence matrix of 0's and 1's for S_j.
We will henceforth refer to model (1.5.2) as the Generalized Growth
Curve Multivariate Model (GGCM).
We will return to the GGCM model in Chapter Three when we discuss BAN estimation of estimable linear sets of the treatment parameters.
CHAPTER II

BLUE ESTIMATION FOR THE MGLM MODEL

2.1  Introduction
This chapter considers BLUE estimation for the MGLM model defined in (1.4.1), (1.4.6) or (1.4.8) of linear functions and linear sets of the treatment parameters ξ_1, ..., ξ_p of the form

(2.1.1)     θ = h'ξ̃ = Σ_{s=1}^p c_s'ξ_s

and

(2.1.2)     Θ = H'ξ̃ = Σ_{s=1}^p C_s'ξ_s.

We seek linear estimators of the form

(2.1.3)     θ̂ = f'x = Σ_{s=1}^p f_s'x_s

and

(2.1.4)     Θ̂ = F'x = Σ_{s=1}^p F_s'x_s,

respectively, where x_s(N_s × 1), s = 1, ..., p, is the observation vector for V_s alone as defined by (1.4.7), and x(N × 1) = (x_1', x_2', ..., x_p')'.
Srivastava [60] has considered BLUE estimation for the special cases of the HIM, GIM and MDM models. The approach described below generalizes Srivastava's results to the MGLM model. The method of proof simplifies Srivastava's work considerably. In addition, the results obtained for linear functions are extended to linear sets Θ. Furthermore, a consistent and unbiased estimate of the covariance matrix is obtained and may be used to determine unbiased estimators of Var(θ̂).
BLUE Estimation of Linear Functions
Definition 2.2.1:
(Srivastava [60)
function of the form
=
h'~
if each component sum
h'~
For the MGLM model, a linear
P
E -s--s
c'~
s=l
is said to be piecewise estimable
has a BLUE
under the univariate
model for Valone as given in (1.4.6);
s
E(x
)
-s
= Ds--s
~
,
Var(x)
-s
= cr S5 IN
(This is equivalent to requiring that c'
-s
£
5
V(D ) for each s.)
s
Lemma 2.2.1 (Conditions for the estimability of h'ξ̃): For the MGLM model, θ = h'ξ̃ is estimable if and only if it is piecewise estimable.

Proof: The method of proof used by Srivastava [60] for the special cases of the HIM, GIM and MDM models can be directly extended to the MGLM model.
Theorem 2.2.1: If, for the MGLM model, the variance-covariance matrix Σ = ((σ_rs)) is known, then every estimable linear function θ = h'ξ̃ has a unique BLUE given by

(2.2.1)     θ̂ = h'(D'Ω⁻¹D)⁻D'Ω⁻¹x,

where D and Ω are the design matrix and variance-covariance matrix from the vector version definition of the MGLM model as given in (1.4.8). The variance of this BLUE is given by

(2.2.2)     Var(θ̂) = h'(D'Ω⁻¹D)⁻h.

Proof: Clearly Ω is known, symmetric and positive definite if Σ is known, symmetric and positive definite. Thus (1.4.8) is equivalent to the univariate weighted least squares regression model, which yields the BLUE of (2.2.1) and variance of (2.2.2).   q.e.d.
Because Σ is assumed unknown, Ω must be unknown, too. Thus, if we restrict our estimators to be known linear functions of x, then we cannot use the θ̂ in (2.2.1) unless it is independent of Ω. Fortunately, when considering the SM model, the structure of D and Ω allows Σ to drop out of (2.2.1), so that this estimate becomes the sum of the BLUE's for c_s'ξ_s obtained separately for each s as given in (1.1.1.2). This is shown as follows, using the notation of (1.1.1), (1.1.2) and (1.1.3):
            h'(D'Ω⁻¹D)⁻D'Ω⁻¹y = h'[(I_p ⊗ A')(Σ⁻¹ ⊗ I_n)(I_p ⊗ A)]⁻(I_p ⊗ A')(Σ⁻¹ ⊗ I_n)y
                              = h'[I_p ⊗ (A'A)⁻A']y
                              = Σ_{s=1}^p c_s'(A'A)⁻A'y_s.
For the MGLM model, in general, Σ does not drop out of (2.2.1), so that if we restrict estimators of θ to be linear, we might have to be satisfied with a θ̂ that does not achieve the minimum variance (2.2.2). It is, therefore, of interest to determine when the minimum variance is attained by a known linear function of x, that is, a linear function which does not depend on the elements of Σ. Srivastava [60] has obtained for the special cases of the HIM, GIM and MDM models necessary and sufficient conditions for h'ξ̃ as defined in (2.1.1) to have a BLUE of the form f'x where f is known. These are conditions on the vector spaces spanned by the columns of the design matrices D_s corresponding to the different response variates. Srivastava showed that if the BLUE is not a function of the elements of Σ, then it is given by the sum of the BLUE's for c_s'ξ_s obtained separately for each s from the univariate model given in (1.4.6) corresponding to V_s alone. This result is thus the same as that obtained for the SM model, except that in the latter the BLUE is always independent of Σ if h'ξ̃ is estimable, whereas in the GIM and MDM models other conditions aside
from estimability conditions must be satisfied.
These results of Srivastava extend to the MGLM model. The proofs given by Srivastava have been simplified considerably by this author by use of the following lemma:
a
Lemma 2.2.2:
p.
=
E
P
¢S
is a BLUE for
E c'~
s=l --s--s
a =
s=l
p
if and only if
E(z)
=
Proof:
cov(a, z)
=
0
whenever
z
= E d'x
s=l -s-s
satisfies
O.
The reader is referred to Rao [41], page 257, for a proof of
this lemma.
It is important to note that if for the MGLM model a
be a linear function of x
is restricted to
which does not involve E, then any a
satisfying Lemma 2.2.2 will be the same estimator given in (2.2.1) and
thus will have minimum variance for all functions of the data.
Theorem 2.2.2:
Let
If the BLUE for a
a
= ~'I be an estimable linear function of I.
is not a function of the elements of E, then it is
given by
(2.2.4)
a
p
=
E c'(D'D )-D'x
s=l -s
s s
s-s
i. e. , the sum of the BLUE's of c'~
--s-"'"S
obtained separately for each s
from the univariate model for Valone.
s
Proof:
Since the BLUE is independent of E, it must be given by the
BLUE obtained when
E = I.
This is clearly (2.2.4).
q.e.d.
~
47
,•
Lemma 2.2.3:
and if
Proof:
~(t x
If
~(t x
and
1)
1)
~(q x
and
~(q x
1)
are arbitrary non-stochastic vectors,
1)
are arbitrary random variables
This is a well known result which can be proved by direct
computation.
Lemma 2.2.4:
If for the MGlli model
are any two linear functions of
(2.2.5)
Cov(f'~,
e.
Proof:
d'x)=.
~ f'd a
s';l -s-s ss
f'x
s=l -s-s
and d'x =
p
L:
d'x
s=l -s-s
then
P
Q ( N x N)
rs r
s
where
~,
f'x =
P
L:
P
tL: .f'Q
r
s
<
-r
p
+ n -s
d 'Q f a
rs-s rs
sr-r rs
r > s
d a
is defined by (1.4.5).
From the variate-wise represent at ion of the MGUI model a5 given
= a 5S I N
in (1.4.6), we know that Var(x )
-5
Cov(x , x )
-r -s
= a r s Qr 5 '
r < sand
s
Cov(x, x )
-r -s
= a r s Q'sr , r > 5 .
= -r
and z = x
f , S =d , w =x
-s
-r
-s
Thus, using Lemma 2.2. 3 where a
we get
Cov(f'x , d'x )
-r-r
f' Cov(x , x )d
-r
-r -s-s
-s-s
f'da
ifr=s
-s-s ss'
=
f'Q d a
if r<s
-r rs-s rs'
f '0' d a
if r> s .
-r 'sr-s rs'
Using Lemma 2.2. 3 again where
·e
and
z
= (dl'x
, " ' , d'x )'
- -1
-p-p
a
= -p
1 =
we obtain
_3, w
=
(f1'x , " ' , f 'x )'
- -1
-~
48
p
P
L dlx)
= Cov( L fix
-s-s' s=l -s-s
s=l
COV<'f.I~, ~I~)
=
p
P
L
L
Cov(f'x , dlx )
-r-r -s-s
r=l s=l
P
I:
=
s=1
fld cr
~--s
ss
+
p
LI:
p
flQ d cr
+ LL -s
d'Q f cr
.
sr-r rs
r<s -r rs-s rs
r>s
q.e.d.
Corollary 2.2.3:
If
e
P
I:
cr
h'I has a BLUE of the form e =
gl~
where
is known, then
l
(2.2.6) Var(6) =
s=1
P
I:
5=1
e
+2H cr c ' (D' D 5"D'Q I D (D' D ) c
-s
rs-s s s s rs r r r -r
r<s
s s
cl(D'D )-D I
where ~ = -s
5 s
s
~
p
Var
ss-s
P
(D 'D ) - c
From Theorem 2.2.2 we know that
Proof:
=
C I
is defined by (1.4.5).
where Q
rs
e
=
=
e.
is given by
Clearly since
p
Cov( L
s=l
= -5
f
=
e
~,
~),
I:
we can use (2.2.5) of Lemma 2.2.4 with
5=1
to obtain the result (2.2.6).
d
~
q.e.d.
We now prove a
which will enable us to give necessary
l~nma
and sufficient conditions for
e = h'I
to have a unique BLUE in terms
of known linear functions of x.
Ler:una 2.2. 5 :
and
Suppose
p
~
~IX
""~s'
s=l
e = h'I =
where
2'
~
P
L c'e;
-s--s
5=1
is piecewise estimable
=c'(DID )-D'
sss
s
whenever d' x
=
, is the unique BLUE of 6.
P
L: d'x
sal ~
satisf ies
49
E(~'~)
i
= 0 for all
(2.2.7)
if and only if
~'Q d
""""r rs--s
= 0,
r < s
d'Q g
-s sr-'l.r
= 0,
r
>
s
E(~l~)
=
° for all I
and
(2.2.8)
both hold whenever
From Lemma 2.2.4 we know that
Proof:
p
Cov(£'~, ~'~)=
for any .8.
and d
if dID = 0
-s s
.~
.
g'd 0
+ H .8;Q r s--sd0rs + H d IQ [0
s=l -s-s ss
r<s
r>s -s sr r rs
L:
'd
=
0
po
d
'Q
0
-'l.r rs-s rs
whenever
Cov(.8.'~' ~'~)
+
H
r>s
e~ch
and only if
L: d' D ;
s=l -s s--s
= 0
=
= O.
d'Q 0" 0
-s sr-'l.r rs
E(~'~)
0
i f and only
if and only if d'
-s
Therefore, we must have
for each s
less of whether
r<s
P
=
for each s. Now since ~' = c'(D'D )-D'
--s
-s s s
s
for each s.
~
E(~'x)
for each s, i.e.,
gonal to V(D')
s
.
1..e.,
E(~'~)
Furthermore,
~
.a.g,
is ortho-
we must have
orthogonal to -s
d ,
= 0 and this is so regard-
Thus we need only show
= 0 whenever -s
dID = 0 for each s if
s
term in the summation is zero whenever dID =
-s s
°
for
each s.
Sufficiency follows trivially.
As for necessity, we note first
that, by assumption, the result should be true regardless of the actual
values of Z
we fix
5
€
E(d'x
-s-s ) = 0
=
«0
rs
))
provided
is positive definite.
L:
{l, •.. , p} and choose d
-s
to satisfy dID
-5
5
=
O.
Now suppose
Clearly
so that by the assumption of necessity. we must have
50
p
Cov( [ g'X
d'x)
--c-t' -s-s
t=l
0
s-l
p
[ g'Q d cr
+ [ d'Q ~ cr
= 0 .
t=l --c ts-s ts t=s+l-s st~t ts
L e. ,
Let us now choose r
{l, ... , p}
£
such that r
s
<
and then choose [
so that its diagonal elements are all non-zero and its off diagonal
elements are all zero except for
ocr
ss rr
-0
2
This [
~O
sr
0
=
rs
0
sr
which satisfies·
is positive definite and the above equation
p
reduces to Cov( [
t=l
~'x
d'x) = g'Q
d cr
-r rs-s .rs
~t-t' ~
= O.
Thus for any r < s
we can find a [
so that g'Q
regardless of [
we therefore must always have g'Q
-s s
d'Q g
-s sr-r
-r rs-s
= 0'
Similarly, for fixed s
r >s
range from 1
Since the result must hold true
d
-r rs-s
= O.
whenever dID
d
= O.
.
always, whenever dID
-s s
to p
= 0
'
r
<
s
we can show that
= O.
By allowing s
to
we thus get the desired result, Le., (2.2.7) and
(2.2.8) are satisfied.
q.e.d.
Theorem 2.2.4. (Main Theorem of BLUE Estimation):
let
g'
-os
for
e = .!!,'f
=
p
[
c'~
s=l -s-s
be piecewise estimable and let
= -s
c'(D'D )-D' , s=l, ... ,po
s s
s
.8..'~ =
p
[
~
Then necessary and sufficient conditions
to be the unique BLUE for
s=l
linear functions of x
(2.2.9)
C1 '
~
Q
rs
are given by
V (D ')
E:
s
,
r < s
and
(2.2.10)
g' Q'
--r
sr
E:
For the MGLM model
V (D ')
s
, r > s
e
in terms of
kno"~
51
where Q (N x N ), r
r5 r
5
5
<
i5 defined by (1.4.5).
p
Proof:
~
From Lemma 2.2.2 and Theorem 2.2.2 we know that
~
i5
5=1
e
the unique BLUE for
whenever dID = 0
-s s
for each s.
g IX
[ dlx) = 0
.w.s-s ' 5=1 -s-s
s=l
only if glQ d = 0, r
-r rs-s
for each s.
:
glX
~
d'x)
5=1 .w.s-s ' 5=1 -s-s
<
=0
From Lemma 2.2.5 we know that
o
sand dlQ g = 0, r
-s sr-r
Clearly £;Q~~ ~ 0, r
<
s
for each s
>
5
>
s
if and
whenever dID =
-s s
~henever ~Ds
glQ' d = 0, r
-r sr-s
valent to (2.2.9) and dlQ 9
-s sr"'r
.
~
whenever d'D
-ss
~
p
P
p
p
Cov(
if and only if Cov(
=
a
° is equi-
whenever dID = 0
-s s
is equivalent to (2.2.10) .
q.e.d.
An example of a
MGL~
model which is not one of the special cases
considered by Srivastava and for which the conditions of Theorem 2.2.4
are satisfied is described as follows:
Let
and consider the expectation model
E[Y1 (12 x 2)J = [All S-l
A12 S-2]
E[Y (8 x 2)] =
2
[A 2l S-1
A22 ~3]
E[Y (8 x 2)]
3
[A 3l £2
A32
.i.3 ]
where
All l12 x 6)
..-
.-
= .4 ~ ~
-4
~
~ .Q 4
-4
~
-4
0,
~ ~ ~
=-"I
A (12 x6) =
12
~
.4
~ ~
-4
4
~
~
4
~ ~
~ ~
4
~ ~
~
.4
52
A (8 x 6)
21
x
A31 (8 6)
=
[~
-4
~
-4
~
~ ~
4
~
-4
~l
Qq
A (8 x 6)
22
x
A3Z (8 6)
and D
3
=
~
~ ~
94 :
~
-4
~
~
~
J
J
-4
-4 -4
are given by
DZ· r1ZJ ,D 3= rzzJ
fAll)'
-4
= [~ -4 Qq Qq ~
-4 Qq- Qq Qq -4_
The matrices D , D
2
1
D
1
r~
~ ~
_.~
= [~ ~ Qq Qq -4 ~],
~
=
A
3l
_A 2l
.
A
32
A piecewise estimable linear function of the treatment parameters can
e = £{ I l +
be given by
£2
I2 +
I3
£3
£{ =
(2, 0, -3, 2, -3, 0)
.£.2 =
(3, -2, 0, 3, 0, -2)
£) =
(0, 0, 0, 0, 0, 0).
where
Note that we can write
c' = a'D
where a' =
c' = a'D
-2 2
where ~2
= (Q4 '
c' = a'D
-3 3
-3
where ~3
= -16
0'
-1
-1 1
-2
-1
(1:.2 ' Q{4'
-1 1:..3 ' 0)
.!.3 ' Q9'
-1
1:.2 '
0' )
-2
After some calculations we will obtain
£{
=
£i(Di Dl) D'1
.&2
=
1
£Z(D D2 ) D'2 = 4
Z
= -14
(2
14,
(Q4 '
3
Q4,
14,
0'
~
Q4,
Q4, Q4,
-3 1')
-=-4
-2 I')
-=-4
53
Q12(20
=
20)
x
fI
12
°8,8
_°8,12
and
Q23(16
16)
x
Q (20
13
Ol2.8J '
[OS.8
°8.8
1
°8,8
8
x
16)
= [012.8
1
J
012 ,8]
°8,8
8
Thus we have
g' Q
12
;::
1
£2 Qiz
and
;::
(t~, Q4, Q4, Q4, Q4)
l' Q4' .Q;4, Q4)
(Q4, 4"3 ~'
£i
Q ;::
13
3 l' Q4, Q4)
(Q4 ' -4=-4'
£2
Q ;::
Z3
(Q4 ' Q4, .Q4,
£j Qb
;::
.8.j Q23
;::
Qio
Qi6
It is easy to see that £iQ1Z
and .8.2Q23
€
V(Dj)
..: 1 1 ')
Z~
€
V(D Z)
, .8.z Qiz
€
V(Di) , £i Q13
€
V(Dj)
since we can write
£i Q1Z ;:: -f'D'Z where f'
;::
-h'D'1
where h'
;::
a'D'
- 3
where a'
;::
;:: b'D'
3
where b'
;::
.8.
2 Qiz
;::
£i Q13
;::
1
(0, 1, 1, 0, 0,-1)
2
-3 (1, 1, 0, -1, 0, 0)
4
3
-4
-
(0, 0, 1, 0, 0, 0)
1
2 (0, 0, 0, 0, 1, 0) .
Thus, for this example, the conditions of Theorem 2.Z.4 are satisfied
and we may conclude that the BLUE of the chosen
£i~l
+
.802~2+ £j~3
where £1 ,£Z
and £3
e is given by
are given above.
1
lIt can be established later from Theorem 2.Z.10 that for this
model every estimable function will have a BLUE which does not depend
on the elements of Z.
54
It should be noted here that if conditions (2.2.9) and (2.2.10)
are not satisfied for some (r,s)
variance of
a'~
pair, then Theorem 2.2.4 tells us the
is not minimum in terms of known functions of x.
Theorem 2.2.4 does not contradict the well known result that the sum of
minimum variance unbiased estimators is the minimum variance unbiased
estimator of the sum 'of the expected values of each of the estimators.
Whereas c'(D'D )-D'x
-s s s
s-s
is the BLUE for c'~
-s--s
based on only those obser-
vat ions on V , it is not necessarily the BLUE for c'~
s
-s---s
based on all
the observations in the enSire experiment unless the appropriate conditions of Theorem 2.2.4 are satisfied.
This might seem startling at
first, but we can see that this result makes intuitive sense if, for
example, in our experiment p=2, VI
represents height, V represents
2
weight, and there are missing observations on height.
Assuming height
and weight are correlated, then those weight measurements corresponding
to missing heights would provide some additional information for estimating any estimable function of heights alone.
Thus we might not be
surprised if some function of the height measurements alone is not as
good an estimate as a function of both heights and weights.
from Theorem 2.2.1 the actual BLUE for
function involving E.
Cl~
-s--s
Of course,
would be given by a linear
It is thus clear that Theorem 2.2.1 together
with Theorem 2.2.4 suggests that linear estimation might not be as
worthwhile as nonlinear estimation involving estimates of E.
dealt
with in the
na~t
chapter on
B~~
This is
estimation.
Furthermore, note that it is incorrect to claim that the BLUE
of
c~;s
for the entire experiment can always be obtained by taking
tne wei6hted avera~e of the BLI:E' s for ~
obtained from each
e·
55
independent set (e.g., Sj)
ple, if for Zl (n j
1)
x
and ZZ(n j
satisfies E(Z) = A~s' Var(z)
Clearly, for exam-
1), we have that Z =
(Z{ Z2)'
= a ss I nl+n? , A = [A l' : Al'] , , then the
where
-s-s
2 -
= (AlA + AZA ) -Aiz + (AlA + A AZ) AZZ
l
l
Z
Z
l
-
and not in general by c'C;
-s--s
n
Is
x
units.
is given by c'';
BLUE of an estimable -s--s
c'C;
is
ex~erimental
of
=
n
l
l
+ n
Z
where
(AiAl)
n
A1Zl
+
n
l
Z
+ n
-
Z
(A~AZ) AZZZ
:.$
Thus one cannot claim that for the GIH model the BLUE for c'C;
is
-s--s
u
c 'c;
-s-'-S
n,
~ -.J. (A ~ A. ) -A ~ Y . b ( . )
-s j=l Ns
J J
J J- J s
= c'
where we are using the notation given by (1.4.1.4) and b('.)
- J
to be the s-th row of B..
J
p
~
c't,;
s= 1 -s"""':S
is the BLUE for
S
is defined
Similarly, it is incorrect to claim that
P
~
c'C; , where c'C;
s=l -s--s
-s-s
is the weighted average
estimate for each s.
Continuing with our theory, it is easy to see that Theorem Z.2.4
incorporates the results obtained by Srivastava [60] for the HIM, GIM
and
MD~1
models separately.
For the HIH and
~IDM
models the theorem
simplifies considerably:
Corollary Z.2.5. (Srivastava [60]):
For the HE-I model as given in
p
Section 1.4.1, let
.e
e
.hli =
Z
etc
s=l-s~
be estimable and gl = c'(DID )-D'
-S
-s s s
s
p
s=l, ... , p .
Then necessary and sufficient conditions for
a'~
=
56
e
to be the unique BLUE for
are given by
(2.2.11)
where
the (N
=
~(s)
x
s
1)
vector given by the last N
s
of £r ' r
the (N s
Proof:
x
>
s
vector given by CQ~ -N
s r
For the HIM model the matrix Q (N x N) r
rs r
s'
]'.
Qrs =[0 N ,N -N .,:1N
s r s
s
r
s
<
1)
elements
= -r
g'
, r
(s)
Thus g'Q
-r rs
<
<
s
~]',
r > s
s, is given by
and &;Q~r = ~(s)'
by direct computation and the result follows from Theorem 2.2.4.
q.e.d.
Corollary 2.2.6. (Srivastava [60]):
g'
-s
p
e=
(1.4.1.1) let
= -s
c'(A'A )-A '
s s
s
h'~
=
r
c'~
s=l -s--s
, s=l, ... ,po
be estimable and let
Then necessary and sufficient conditions
p
for
£'~
=
r
~.
For the MDM model as given in
~s
to be the unique BLUE for
e
are given by
s=l
p
(2.2.12)
Proof:
S;Q rs
.£ t
r
£
f\ V (A '), r=l, •.• , p .
s=l
S
For the MDM model the matrix Qrs(n
=~ ,
r < s
from Theorem 2.2.4.
and ~Q~r
=~ ,
r > s
x
n) = In ' r
<
s.
Thus
and the result follows
q.e.d.
It is important to note that whereas Theorem 2.2.4 provides
necessary and sufficient conditions for a given estimable function
£'i
to have a BLUE independent of the elements of E, we may want to find
57
necessary and sufficient conditions for every estimable function
to have a BLUE independent of the elements of E.
restrictive requirement than the former.
The latter is a more
Srivastava [60] has considered
this problem for the HIM, GIM and MDM models.
directly to the MGLM model.
~'I
His results extend
Before presenting them, we make the fo1-
lowing definitions: .
Definition 2.2.3. (Srivastava [60]):
An orthogonal multi-response
design (OMD) is a design (i.e., model) in which every piecewise estimap
ble function
e = ~'I = E c'~
has a BLUE which does not depend on
s=l-s---5
the elements of E.
Definition 2.2.4:
For the MGLM model., let Drs (N rs. x mr )
the matrix consisting of those rows of D
r
'. >
••
rls
denote
which correspond to those
•
sets of experimental units in which both variates V
r
measured and let D* «N - N )
rs
r
rs
x
m)
r
and V
s
are
be the matrix consisting of the
remaining rows.
The following theorems due to Srivastava [60], give necessary
and suff icient conditions for the HDl1, HIH and GIM models to be OMD:
Theorem 2.2.7. (Srivastava [60]):
that a
~IDM
design is
spaces; i. e. , VeAl)
1
.e
O~ID
A necessary and sufficient condition
is that AI' ... , A
= V(A')
= ... = VeAl)
2
p
P
have identical column
2
2We note that these conditions are satisfied for the fractional
factorial design of Srivastava and the Latin Squares design of Gourlay
as described in Section 1.3. A design for which AlA = 0, rfs, however,
r s
will not be mID .
58
Theorem 2.2.8. (Srivastava [60J):
A necessary and sufficient condition
that a GU1 design is OMD is R(D ) = R(D ) + R(D* ) , r:!'s , r,s=l, .•. ,po
r
rs
rs
Corollary 2.2.9. -(Srivastava [60]):
An HIM design is OMD i f and only
P
i f R(A
l
A'
2
:
.
A')
p
= E R(A. )
~
i=l
Theorems 2.2.7 and 2.2.8 can be directly extended to the MGLM
model using Srivastava's method of proof to obtain the following
theorem:
Theorem 2.2;10:
Necessary and sufficient conditions for a MGLM design
to be OMD are
R(D ) = R(D ) + R(D* )
r
LS
rs
(2.2.13 )
and
(2.2.14)
V(D' ) = V(D' )
rs
r,s
r:!'s;
2.3
sr
= 1 ,. ... ,p. 3
H'~
BLUE Estimation of Linear Sets
We now extend the results obtained for linear functions to
p
linear sets of the form
e
= H'r., =
E
s=l
C'~
s---s
as given in (2.1.2).
We
p
seek estimators of the form
e
= F'x =
E
s=l
Definition 2.3.1:
H'~
=
For the
MGL~
F'x
s-s
as given in (2.1.4).
model a linear set of the form
P
L
s=l
C'
~
A
is said to be a piecewise estimable set if for each s
each component of C'r
s2-s
has a BLUE under the univariate model for V
s
3 Note that for the GIH model D
= Dsr ,r:Fs whereas for the
MGLN model, in which different respons~Svariates have different design
matrices, we may have D ~ D
,r:Fs.
LS
sr
e
59
alone.
4
Lemma 2.3.1. (Conditions for the Estimability of H'i):
H'i is estimable
if and only if it is a piecewise estimable set.
Proof:
This is a direct extension of Lemma 2.3.1.
For the MGLM model, ~"
Definition 2.3.2:
A
will.be called a BLUE
= H'i if F'x is known and unbiased for H'i and if whenever
set for e
-e = G'x
= F'x
is known and unbiased
= Var(~)
-
Var(~)
for H'i
we must have
is non-negative definite.
It is well known Je.g. , Rao [41)) ·that for any least squares
model of the form (1.4.8), the components of any BLUE set must be BLUE's
of their respective expected values.
for e
and if e
ch
~
is any other known estimable set for e
[Var(e)]~
max
Furthermore, if
-
ch
we must have
-
max
[Var(e)]
-
-
tr(Var(~)]
is a BLUE set
< tr(Var(~)]
,
and
I
Var <.~)
<
- I.
Var(~)
Thus a BLUE set is a "good" estimate in the sense of minimization of
the maximum root, the trace and the generalized variance.
Theorem 2.3.1:
For the MGLM model,
p
let
e = H's. =
[
s=l
C'E;
s-'-s
be estimable.
If L
is known then H'S. has a
unique BLUE set given by
4
This is equivalent to requiring that each of the rows of C'
s
belong to V(D ) for each s.
s
60
(2.3.1)
whose variance-covariance matrix is given by
(2.3.2)
This follows directly from Theorem 2.2.1 since each component
Proof:
of
e
e: .
is the BLUE of its corresponding component of
p
Theorem 2.3.2:
e
For the MGLM model, let
r
= H'£ =
be estimable
s=l
and let G' = C'(D'D )-D' , s=l, •.• ,p.
s
s s s
s
Then necessary and sufficient
p
conditions for G'x =
r
s=l
G'x
to be the unique BLUE sei for
s-s
e
in
terms of known linear sets are given by:
(2.3.3)
each row of G'Q
r rs
€
V(D') , r
< sand
(2.3.4)
each row of G'Q'
r sr
€
V(D') , r
> s
Proof:
s
s
_
..
where
From Theorem 2.2.4 for linear functions it is clear that each
component of G'x
is a BLUE for its corresponding component of H'f
and only if (2.3.3) and (2.3.4) both hold.
if
But we know that G'x
is a BLUE set if and only if each component of G'x
is a BLUE for its
expected value.
q.e.d.
p
Corollary 2.3.3:
If an estimable set
e
= H'f =
r
C'';
s=l
BLUE set 8 = G'x
~
has a unique
in terms of known linear sets, then
P
I
S& 1
cr S5 C'(D'D)
S
S S
C+ 2
5
rrcr
C'(D'D )-D'Q
r<s rs r
r r
D (DID) C
r rs s
S
s
S
61
where Q
rs
Proof:
e
This follows from direct calculation of
P
=
is given by (1.4.5).
1:
s=l
G'x
Var(~)
using
q.e.d.
as given in Theorem 2.3.2.
s-s
It is obvious that
and only if each D
s
(i.e., H' £
when H = 1 )
M
is estimable if
is of full rank, and in this case an unbiased esti-
_~
mator is given by
£
=
(~I
~)
-1
I
where
-1
~
-'-S
(2.3.6)
= (DID) D'x , s=l, ... ,p .
s s
s-s
Each element of -s
~
is the BLUE of the corresponding element of
~
-'-S
,
s=l, ... ,p, based on the observations on Valone.
s
We then have the following corollary:
Corollary" 2.3.4:
£
=
(£{
Suppose D
s
£;)'
is of full rank for each s.
is a BLUE set for
£
where
~
Then
, s=l, ... ,p
is given by
(2.3.6) if and only if (2.3.3) and (2.3.4) hold where
G I = [G I
:
l'
GI
2
G I] =
P
o
o
-1
(DID) D'
P P
P.
Unbiased and Consistent Estimation of Z
2.4
Lemma 2.4.1:
Let Y1
and Y
2
be (n x 1)
random vectors and let
Suppose that Var(Z) = I
f(2
y
2) •
«Y i i »
..J
is positive definite.
n
~
r
where
Then for any non-stochastic
62
matrix G(n
Proof:
Let
n)
x
v
"-1
we have
=
y
11
Yln.
Y2n -
Then
E(1..iG1.2) =
n
n
E
E
i=l j=l
gijE(YliY2j)
}
=
n
n
E
E
i=l j=l
g 1J
.. [Cov(Yl"1
n
=
E
i=l
Y2j ) + E(Y 1i )E(y 2j )]
n
E gijE(Yli)E(Y2j)
+
i=l j=l
n
gii Y 12
E
since Var(Z)
= In
~
r
by assumption
= y tr G + E(1.{) G E(1. )
12
2
q.e.d.
Theorem 2.4.1:
model is given by
An unbiased estimator
E
« 0 rs ))
of L
for the MGLM
5
1
(2.4.1) o
x 1 [1 N - Dr (D'D
rr = N - R(D) -r
r r )-D']x
r-r ,r=l ..... p
r
r
r
and
(2.4.2) o
rs
= _1_ x' [I
- D (D' D )-D' ][I
- D (D' D )-D']x
N
~
-rs N
rs rs rs
rs
sr sr sr
sr-sr
rs
rs
rs
where
SIt is shown in Chapter Five that the Z of Theorem 2.4.1 is
not necessarily P9sitive definite. The implications of this on estimation of i using L and on testing hypotheses about linear sets H'i are
discussed in Section 5.3 of Chapter Five.
63
(2.4.3)
fl
= tr[I N rs
rs
N (~2)
r
D
rs
(D'D
rs rs
)-D' ][1
rs
- D
N
sr
rs
(D' D
sr sr
) -D' ]
sr
is the number of experimental units on which V
r
is observed,
N
rs
(~2)
is the number of experimental units on which both
V and V
r
x (N
-r r
are observed together,
s
is the vector of all observations on V
x 1)
r
x (N x 1) ,r#s
-rs rs
is the vector of observations on V
r
which correspond to experimental units on which both
D (N
r
and
Proof:
D
rs
V
r
an~
x
m)
r
(N
rs
¥s
is the design matrix' corresponding to x
-r
r
is the design matrix corresponding to x
-rs
x m )
r
=
For r#s , we have that if Zrs
= IN
Var(Z rs )
L
rs
=
~ L
rs
(~rs' ~r)
6
then
rs ,where
[:rr :rsJrs
Letting G
~ogether,
are observed
- D
rs
ss
(D' D
rs rs
) - D' ] [1
rs
N
rs
- D
sr
(D' D
sr sr
) -D'
sr
]
we may, use Lemma 2.3.1 to obtain
E(x' Gx )
-rs -sr
= cr rs fl rs + E(x'
)GE(x )
-rs
-sr
But
E(x' )GE(x )
-rs
-sr
.e
6
=
(D
:
rs~r
)'[1
N
rs
-D
rs
(D'D
rs rs
Note that in general x
, x
-rs
-sr
)-D'
rs
D
rs
](1
N
rs
, D
sr
-D
sr
(D'D
sr sr
)-D' ]D
sr
:
sr~
but N = N
r#s.
r s ' sr'
64
D (D' D )-D' D ~]
sr sr sr
sr s~
D (D' D )-D'] [D [
rs rs rs
rs
s~
- D (D' D )-D l ][D ~ - D t;]
rs rs rs
rs
s~
sr~
by virtue of the properties of
(D~rDsr)
= O.
Thus we have that
o
i.e. ,
rs
E(o
= E(__l__
~
rs
rs
Xl
-rS
= 0 rs
)
It is clear that E(o
rr
)
t
G
X
-sr
r~s
)
t
r~s.
= 0 rr
since
t
0
is the usual estimate of
rr
error obtained for the univariate model:
E(x
-r )
= Dr---r
~
,Var(x)
-r
= orrIN
r
q.e.d'It is important to note here that by requiring N ,2 t
rs
r~s,
Theorem 2.4.1 restricts the use of the MGLM model to situations for
which every pair of response variates is observed together for at
least one set (S.) of experimental units and for at least two units in
J..
total.
This is clearly not a severe restriction t however.
An alternative unbiased estimator of
Corollary 2.4.2:
the
HGL'1
(2.4.4)
where cr
rs'
r~s
for
model is given by
0
rr
0
rs
and cr
cr
cr
rr
=
0
rs + kl{o ss
0
ss
} + k 2{o rr
are given by (2.4.1), cr
ss
=
-
N
rs
rs
-
0
rr
}
is given by (2.4.2),
1
x' [I
- D (D' D ) - D' ]x
R(D rs) -rs N
rs rs rs
rs -rs
rs
-
1
Xl (I
,
- D (D' D ) D' ]x
ss • N
sr sr sr
sr -sr
- R(Dsr ) -sr N
rs
rs
65
and k , k
2
l
Proof:
-
It is obvious that E(a tt ) = E(a ) = Ott
tt
~.
= E(a rs )
E(a rs )
Thus
are arbitrary constants.
for t=r,s.
+ 0 + 0
= a rs
Corollary 2.4.3:
q.e.d.
For the
MG~!
defined by (2.4.1) and
model, the L
(2.4.2) is a consistent estimator of L
provided the following condi-
tions both hold:
N
(2.4.5)
s exis~s and is non-zero, s=l, ... ,p.
lim
n-+<>o n
and
N
(2.4.6)
e.
lim ~ exists and is non-zero, r#s .
n
n-+<:o
Proof:
Using the assumption of normality on the data for the MGLM
model, it is clear that for each r, a
rr
as given by (2.4.1) is the
usual pooled estimate of error based on the N
observations on V
s
s
alone which is known (e.g., Rao [41], Anderson [8]) to be consistent
for a
ss
as N
s
=.
~
(2.4.5), then a
ss
Clearly if N
s
~
= as
is consistent for a
ss
n
~
as n
= according
~
=
to condition
Similarly for r#s
a a s given by (2.4.2) is the usual pooled estimate of covariance
rs
based on the N
rs
observations on V
r
and V
s
together which is known
(e.g., Rao [41], Anderson [8]) to be consistent for cr rs
Clearly if N ~
rs
consistent for a
00
rs
as n
as Nrs ~
00.
~
00
according to condition (2.4.6), then a
as n
~
=.
rs
is
q.e.d.
66
We now use the L given in Theorem 2.4.1 to obtain an unbiased
estimator of the variance of the BLUE of an estimable linear function
of the treatment paranleters.
Corollary 2.4.4:
= c' (D'D ) D'
-s s 5
s
~
mable and let
MG~1
For the
e
model let
.
=
£'i
p
L
=
c'~
be esti-
s=l --s--s
Then an unbiased estimator of the
p
e
variance of
=
L
¢.s
is given by
5=1
P
= r
Var(e)
(2.4.7)
A
as
s=l
s-s-s +
g' g
2 Hog' Q g
r<s
r s-s r s-r
where
o
and
ss
0
rs
are obtained from (2.4.1) and (2.4.2),
respectively, of Theorem 2.4.1 and Q
,r
rs
<
s
is given by (1.4.5).
This is obvious from Corollary 2.2.3 and Theorem 2.4.1.
Proof:
We now apply the results of Theorem 2.4.1 and its corollaries
to the special cases of the MDI1, Gil1 and HIM models.
Corollary 2.4.5:
of L
For the MOM model, a consistent and unbiased estimate
is given by L = «ors»
(2.4.8)
ass = n -
(2.4.9)
=
-l-
where
, s=l, ... , p
v'[I
~rs ~
n
- A (A'A )-A'][I - A (A'A )-A']v ,r#s
s s s
s
n
r r r
r ~
and
U
rs
= tr[I n
Corollary 2.4.6:
of
=
- A (A'A )-A'][I - A (A'A )-A']
s s 5
S
n
r r r
r
For the GIM model, a consistent and unbiased estimate
is given by L
= «ors»
where
4It
67
(2.4.10)
and
(2.4.11)
o
rs
Corollary 2.4.7:
estimate of [
1
x' [I
N - R(D ) -rs N
rs
. rs
rs
=
D
rs
(D' D
rs rs
)-D' ]x
, r'fs.
rs-sr
7
For the HIM model a consistent and unbiased
is given by [=
«0
rs
))
where
0
ss
,s=l, ... ,?
is given
by (2.4.10) and
(2.4.12)
cr
(2.4.13)
0
7
~ote
8
x
-sr
= -5
x
rs
rs
1
-
1
-
=
x' [I - D (D'D ) D']Q' x , r
s s s
s rs-r
N - R(D ) -s N
s
s . :~ s
=
x' [I - D (D'D ) D']Q' x
N - R(D ) -r N
r r r
r sr-s
r
r
r
that D = D
, r'fs
rs
sr
Note that
for r
<
(J
s
rs
a
cr
sr
,
r
<
s
>
s
8
for the GIM model.
for r'fs
and also D
rs = Ds
x
-rs
= Q'rs-r
x
CHAPTER III
BAN ESTIMATION FOR THE MGLM AND GGCM MODELS
3.1
Introduction
We have seen in the previous chapter that for the MGLM model
the unique BLUE of any estimable linear function or linear set of the
treatDent parameters is given by a linear function (2.2.1) or linear
set (2.3.l)t respectively, which involves the unknown parameters of
the variance matrix E.
By restricting linear estimates to be known
functions which do not involve E we will obtain estimates whose
variances are not minimum unless certain additional conditions on the
model (given in Theorem 2.2.4 and
Theor~l
2.3.2) are satisfied.
We
have also seen in the first chapter that for the GCM model the unique
BLUE of any estimable linear function of the treatment parameters is
given by a linear function (1.2.1.4) of the data which involves the
unknown variance matrix.
These results naturally lead us to consider
nonlinear methods of estimation for the MGLM and GGCM models which
use estimates of E and which give variances that are t in large samples t
the minimum that could be achieved by linear estimators if E were
known.
In this chapter we consider for these models a method of estimation which generally yields nonlinear estimators whose variances are
minimum in large samples.
Estimators obtained by this method are
called Best Asymptotically
~ormal
(BAN) estimators.
BAN estimators and
e·
69
their limiting variances may sometimes, as with many linear models, be
determined without knowing the likelihood function for the data of the
experiment.
In general, however, the latter is needed.
Specifically,
we define a BAN estimator as follows:
Definition 3.1.1:
An estimator
~(u
x 1) . based on a sample of n
observations is said to be a BAN estimator for the parameter
= (6 1 , " ' , 6u )'
Q.(u x 1)
provided
~ ( -6
- 6 » ~
-n.:-o
.-:t..,( In B
(3.1.1)
n
.\
N
(0 , I )
u --u
u
where
B (u
(3.1.2)
n
x
u)
= Fisher's
e··
Information Matrix
1 ;,2 log 1>n
n
;, 6 2
e=
e
is the true value of Q. ,
~
't'n
is the likelihood function for the sample,
-0
and matrix differentiation is defined in the Appendix.
It is well known (e. g .• Rao [41] ) that in general, i f
1
positive definite, then n
B
-1
n
1
-1
n
n
estimate of
e.
-n
•
-
-B
1
-1
Var(a*) - - B
-n
n
n
is
is the multiparameter Cramer-Rao lower
bound for the variance of an unbiased estimate of
Var(e")
B
is non-negative definite whenever
Furthermore, i f B-1
n
~
B
-1
is non-negative definite.
as n
~
~,
e'~
-n
0:>
i. e. ,
is any unbiased
then
It is in this sense that
we consider a BAN estimator to have a minimum limiting variance
70
It is also well known (e.g., Cramer [17], Rao [41]) that when
the data for an experiment consists of independent and identically
distributed (i.i.d.) random variables, maximum likelihood estimators
(MLE's) are BAN provided the underlying density function satisfies
certain regularity conditions.
In this case, if we let -xl' •.. , -n
x
denote the data and p(x, !) denote the density function where !(u x 1)
is an unknown parameter, then
<P
n
2
n
IT
=
p(x., !) and
i=l
-~
-Bti(u x u)
=
E
a
8
-0
p(x,
e)
ae 2
e = -0
e
-
J
•
Now in many experiments (e.g., the 8M and the MGLM models) the
original data are independent but not identically distributed.
Never-
theless, it may often be shown for such experiments that the solution
~
to the MLE equations or any other statistic based on the data is BAN
merely be calculating its limiting distribution.
This can be done for
many simply structured statistics and for iterative estimators based
initially on BAN estimators.
For the 8H model, for example, it is well
known that by transforming the original data of (1.1.1) to X
we obtain i.i.d. data.
Thus the MLE of L
is BAN.
=Y-
A~,
Furthermore, irre-
spective of this, the solution of the original untransformed maximum
likelihood equations for;
when estimable is given by
-1
(A'A)
A'Y, which is clearly normally distributed (if the rows of Y
are normal), BLUE and therefore BAN.
from Theorem 2.3.1 that if n
set!
For the MGLM model it is clear
is known, then the MLE of an estimable
= H'f is given by the weighted least squares estimate (using the
vector version notation)
~
71
(3.1.3)
This is clearly normally distributed, since x is assumed normal, and
efficient (i.e., minimum variance).
If n
is unknown but can be
estimated consistently by n, it is also clear that the restricted MLE
of 8
obtained by replacing n
efficient and normal, since n
by n
+
p
is estimated by n(L), where L
L, then the restricted MLE of 8
n.
in (3.1.3) is asymptotically
If,
furthermore,~n
is unknown and
is the solution to the MLE equations for
using n(L)
is asymptotically normal
and efficient since n(L) is consistent from general MLE theory.
In this chapter, ~e describe for the MGLM and GGCM models a
number of different methods for obtaining BAN estimators which use
different consistent estimators of n
in (3.1.3).
from above that all these estimators are
Whereas it is clear
asymptotically equivalent,
their small sample properties are different and ought to be considered
whenever we are not completely sure that the sample size is large.
We
will later describe a computer study which has dealt with such small
sample comparisons.
variance matrix for
We begin this chapter by deriving the limiting
B&~
estimators of estimable sets
I
and
H'I.
This
matrix must be obtained for the calculation of the Wald Statistic
described in Chapter Four and used for testing hypotheses.
The limiting
variance matrix solution is fairly obvious when using the vector version
notation.
Nevertheless, it is of considerable practical utility because
of the unwieldy size of n
to express this result in terms of the
original notation (involving much smaller matrices) given in (1.4.1).
This can be done with some effort using a number of valuable matrix
differentiation rules.
We then describe several BAN estimators for
2
72
(when estimable)
which have been proposed in the literature for the
GIM and MDM models.
~
In addition, we give an extension of some results
of Bhargava [13] using a method of Anderson [8] to obtain solutions to
~
the MLE equations for
in the HIM model.
A BAN estimator for H'I
for the MGLM model is then proposed and two methods of iteration which
use this estimator in.itially are described.· Finally ,we consider BAN
estimation of
3.2
H'~
for the GGCM model.
Derivation of the ASymptotic Variance Matrix .
of a BAN Estimator for the MGLM Model
Recalling the notation of (1.4.1) to (1.4.8) defining the MGLM
model, we can write its likelihood function
~
in two different ways:
n
(3.2.l)
where N, D, nand
i
are defined from the vector-version model
(1.4.8), or
(3.2.2)
log ~
=-
u n,q.
2: -Lllog 2'IT -
j=l 2
n
1
u
- 2" 2:
j=l
tr[ B ~ LB . ]
J
u
2:
n.
f
j=l
log IB~2:B.1
J
J
-1
[Y. - E(Y , ) ] I [Y. - E(Y . ) ]
J
J
J
J
J
using the notation of (1.4.1), where
£(y ,)
J
=
[A,"
f;
J "'j 1-.e. j 1
To obtain the information matrix for
i
as given in (3.1.4) we will
use four of the matrix differentiation rules described in the Appendix.
These are:
(3.2.3)
••
73
0
(3.2.4)
-BlJ
oJ:!.. 0
(3.2.5)
oA
= B' ,
tr U-1P'P
where U
-2B'PU- 1M'
=
= M'fM
and P
=X
- BAM
and
3
aA AAB = B'
(3.2.6)
~
A'
Now using (3.2.3) on (3.2.1) we obtain
(3.2.7)
We then use (3.2.4) on (3.2.7) to obtain
2
3 log ~n _ _
~
o
(3.2.8)
o!:...z
[0
log ~~n]
- of:..
of:..
= -D'n- 1 D
Thus the information matrix for; in the MGLM model is given by
(3.2.9)
where
n
is the true variance-covariance matrix.
The asymptotic
variance matrix for; in the HGLM model is then given by
(3.2.10)
provided D
is of full rank and
n
is positive definite.
In terms of
the notation used in the original definition of the MGLM model as given
in (1.4.1), the
infor~ation
can, nevertheless, obtain B
matrix B
n,f:..
c
n,~
is not easy to present.
We
in the original notation using the
likelihood function in (3.2.2) for the nicer cases of the GIM and MOM
models and then inferring from the information matrices so obtained the
•
information matrix for..2.
in the MGU1 model.
The likelihood function for the GIM model is given by:
74
log ep
(3.2.11:)
n
n.q.
u
=- E
~ J log 21T -
j=l
1
- -2
.;
--p
u
u
E
j=l
tr[B~EB.]
J
-1
J
n.
-t- log IBJ~ EB.J I
E
j=l
[Yo - A.';B.J'[Y. - A.';B.] ,
J
J J
J
J J
).
Using (3.2.5) on (3.2.11) we obtain
=
-1
u
L
j=l
A~(Y
-
J
A.';B.)(B~EB.)
J
J
J
-1
u
= E
j=l
(3.2.12)
J
J
J
-1
u
A~Y(B~EB.)
J
B~
B~
J
-
J
E
j=l
A~A.';B.(B~EB.)
J J
J
J
J
B!
J
•.. .1
We then use (3.2.6) on (3.2.12) to obtain
-1
(3.2.13)
[B.(B~EB.)
J
J
Thus the information matrix for .;
u
1
Bn,_
.;(GIM) = -n E [B.
J
j=l
(3.2.14)
J
B~ 8 A~A.]
J J
J
for the GIM model is given by
-1
(B~EB.)
J
J
B~
J
8 A~A.]
J J
The likelihood function for the MDH mod e1 can be given by
(3.2.15)
log
<P
n =-¥ log 2rr -
Letting Y = [Y ...
l
[Y -
E(Y)]' [Y
Yp]
E(Y)]
= «a
rs
)) .
t
trE-l[Y-E'(Y)] '[Y-E(Y)]
is given by (z.r - ArS-r ) , (1 s - AJ-s)
p
1
10giEI -
it is easy to see that the (r,s)-th element of
trE-l[y - E(Y)] '[Y - E(Y)] =
where E-
¥-
so that
p
E
Ears (y - A E; )' (y - A E; )
~
r-r
"-S
S-'-S
r=l s=l
Now using (3.2.3) on (3.2.15) we obtain
,
75
=-
1"f [Pr P cr rs
a
-2 -"-
i:
i:
= 1 5= 1
t
(y
'-I'
- A ~ ) I (y
r-r
"-S
- A ~ )~
s--s
P
-2 i:
5=1
S:ft
=
(3.2.16)
P
ts
i: cr A'(y s=l
t "-S
A~')
•
s-s
Using (3.2.4) on (3.2.16) we obtain
(3.2.17)
so that
(3.2.18)
e.
Thus the information matrix for
B
(3.2.19)
_(HDM) =
n,~
1
n
in the MOM model is given by
~
«crrsA'A ))
r s
From (3.2.14) and (3.2.19) the information matrix for the MGLM
model can be easily inferred.
The result is given in the following
theorem:
Theorem 3.2.1:
For the MGLM model the information matrix for
f as
defined by (3.1.4) is given by the
(M x H)
matrix:
u
(3.2.20)
=
1«
n
rs
i:
. 1
cr rs
t
~=
.
Al
rs~
t
.r
rs~
A
t
))
.5
rs~
where
D and
u
rs
[l
are given by the vector-version of the HGUI model,
is the number of sets S.
J
in which V
r
and V
5
are
76
observed together,
t
is the index of the i-th set S
.
t
rs~
V
At
~ t
rs 1 < t rs 2 < ••• < t rsu
~
u,
rs
is the design matrix for V
.r
rS1
rs
°t
rsi
and
and
are observed together where
s
1
in which both V
r
.
rS1
in set St
r
rsi'
-1
is the element of [B I
t
!:B
rsi
t
]
which corresponds to
rsi
the variate pair (V , V )
r
s
}
It is important to note that
V
r
and V , r1 S
s
o
rs
t
= osr
t
.
rS1
and that if variates
.
rS1
are not measured together in the entire experiment,
then we must replace
u
rs
A
!: ors AI
t
. t
,r t
.s
i=l
rS1 rS1
rS1
by Oem
r
x m )
s
in (3.2.20).
Corollary 3.2.2:
any
Bfu~
For the MGLM model, the limiting variance matrix of
estimator of
~
is given by
=
(3.2.21)
u rs
«!: ors
'=1
1
t
-1
A'
. t
rs~
.r
rs~
A
t
rS1.s
»
provided D is of full rank.
If we wish to consider instead of I
is not necessary to assume that D
limiting variance matrix.
BA~;
any estimable set HII, it
is of full rank to obtain a unique
We first need to extend the definition of a
estimator to functions of unknown parameters:
77
Definition 3.2.1:
.8.(~) =,
Suppose
(gl <.~), ... ,
vector function of an unknown parameter
~(u x
Let B 8 be the information matrix for
n,-o
~
6
gw<'~»'
1)
=
is a (w
(8 , ... ,
1
x
1)
eu )'
.
given by (3.1.2) and define
by
(u x w)
a
6 =
a
(~)
.8.
« a8J i -
=
a~
-
Then an estimator .8.(8)
-n
.8.(~)
1
~ (rn c~"2 [a ~
where
8
=
0
6\
e=
-
and
observations is said
provided
- .8. (~) J) -+
~ w(.9w'
I w)
is the true value of 8 ,
-0
6
»
based on a sample of n
to be a BAN estimator for
(3.2.22)
g. (8)
= 6'
C
n
0
8
-0
B-
n,8
-0
6
0
is symmetric, positive definite and of
full rank w
Rao [41], page 265, has shown that if f(8*)
- -n
of ~0(8), then Var(f(8*»
--n
1
- -n Cn
is any unbiased estimator
is non-negative definite.
this sense, then, that we consider a
Bfu~
-
estimator .8.(8)
-n
It is in
to have mini-
mum variance.
Now consider for the HGLM model an estimable linear set H'i
where H'(w x M)
38then 6 =
B
n,i
df
H
is of full rank w.
-
If we let B
n,f
Clearly, if we let a(i)
= H'i
,
denote any generalized inverse of
(where we assume f' to be the true parameter), then using
78
Theorem A.l.l. (iii) of the Appendix we have
=
C
n
~'B
~~
n,s..
= H'B- H is unique,symmetric, positive
n,{
definite and of full rank.
Theorem 3.2.3:
For the MGLM model, the asymptotic variance matrix of
~stimable
any BAN estimator of.an
where
Thus we have the following theorem:
linear set H'l,
is of full rank w, is given by
H (M x w)
(3.2.23)
1
n
Note that - C
is the same as the variance matrix of the unique BLUE
n
set (2.2.1) for H'I
when n
is known.
We turn now to the GGCM model defined by (1.5.2), and we derive
the information matrix for I
H'I
where
H(~p x
w).
and the asymptotic variance matrix for
The likelihood function for the GGCM model is
given by
u
(3.2.24)
log
<p
n
n.q.
u
j=l
1
- -2
n.
L ~ log IB~LB. I
= -L ~ log 2rr -
j=l
J
J
-1
u
L
j=l
tr [B ~ LB. ]
J
J
[Y
j
- A~PB.] , [Y. - A~PB.]
J
J ..
J
.
Using (3.2.5) on (3.2.24) we obtain
a
(3.2.25)
log 4>n
de;
u
=
-1
A~(Y
L
j=l
J
-
A~PB.)(B~LB.)
J
J
J
B~P'
J
Then using (3.2.6) on (3.2.25) we obtain
(3.2.26)
a210g 4>n
u
-a-~""'2;---
= -L
-1
PB.(B~LB.)
j=l
3 2 10g ;Pn
and since
3;L
.
3 210g qln
i
a~L
J
J
J
B~P' ~ A~A.
J
J J
from Theorem A.2.1. (x) of the Appendix
~
79
we thus obtain the
Theor~n
theorem:
For the GGCM model, the information matrix for
3.2.4:
given by the (mp
(3.2.27)
followin~
mp)
x
is
matrix:
-1
u
1
=
Bn,t;
~
[PB.(B~EB.)
E
n
J
j=l
J
J
B~P' ~ A~A.]
J J
J
.
Using Definition 3.2.1 for the BAN estimator of any transformation of
we then obtain the following corollary:
Corollary 3.2.5:
For the GGCM model, the limiting variance matrix of
any BAN estimator of an estimable linear set
H'~,
where H(mp
x
w) is
of full rank w, is given by
(3.2.28)
3.3
lC
= H'[
n n
-1
u
E PB.(B~EB.)
j=l
J J J
B~P' ~ AJ~AJ.]-H
J
Estimation for the Gut Model--A Review
of the Literature with Some Extensions
In this section we review some methods proposed in the litera-
ture for estimating the parameters
in (1.4.1.4) of Chapter One.
~
and E
of the GIM model defined
For convenience we repeat the definition
of the model:
0.3.1)
=
E (Y .)
J
A. ~ B .
J J
=
Var(Y.)
J
I
where the rows of Y.(n.
J
~
p)
x
A. (n.
J
B. (p
J
x
J
x
x
g B~EB.
J J
qJ')
[~l ~2
(m x p)
E(p
.e
J
n.
J
=
m)
q.)
J
«a
rs
))
, j:;:l, ... ,u ,
are normally distributed,
... ~]
is unknown,
is unknown and positive definite,
is known,
is the incidence matrix of 0' sand l' s for the
~
80
j-th set (S.)
J
~
q.
J
and
=
n
of experimental units,
p, j =1, ... ,u ,
u
L
n.
j=l
J
variate linear model for missing observations.
The problem of missing
observations in multivariate experiments has been discussed often in
.s
the literature.
In a recent paper by Afifi and Elashoff [3], the liter-
ature is quite thoroughly reviewed and a complete bibliography through
1966 is given.
Subsequent papers have been written by Afifi and
Elashoff [4], [5], [6] and Hocking and Smith [28].
Most of the earlier
papers (e.g., Wilks [69], Yates [71], Bartlett [11], Tocher [62],
Buck [16] and Dear [19]) and the recent papers of Afifi and Elashoff
consider randomly occurring missing observations.
Except for Wilks,
all these authors considered the regression of a preferred response
variate on the rest of the variates and did not assume normality on
the data.
In some early papers by Lord [34], Edgett [21], Nicholson
[36], Anderson [7] and more recently by Bhargava [131, Trawinski and
Bargmann [63], Hocking and Smith [28], and Srivastava [58], [60]
attention has focused on the problem of missing observations which
follow some pattern.
In all of these papers except those of Srivas-
tava, who considers BLUE estimation only, and Trawinski and Bargmann,
=A
whose paper assumes A.
:1.
A.
:1.
= -n
1
.
:1.
and
~
= ~'(l
x
for all i, the very special case of
p) is considered.
We will discuss below the
4It',
81
papers by Anderson [7] and Hocking and Smith [28], whose approaches
were general enough to include many models discussed in other papers
(e.g., Lord, Edgett, Nicholson).
Anderson's method will be extended
to the HIM model to obtain maximum likelihood estimators of
~
and E.
Hocking and Smith's method can be used for the general missing observations model of (1.3.1), although it seems to be computationally
unfeasible except for simply structured situations.
3.3.1
Anderson's Method
Anderson [7J has described a method for obtaining the MLE's of
~
= i.' (p
x
and E when A.
1)
J
=
.- - The usual method for obtaining
1
-rl
j
MLE's of given parameters involves solving the likelihood equations
Using the likelihood function given in (3.2.11) and
for the model.
the rules of matrix differentiation as given in the Appendix, we obtain
the following likelihood equations from
a
log ep =
a
and
at;
a log
aE ~
= 0,
respectively, for the GIM model:
-1
u
A.t;B.)(B~EB.)
E Al'(Y; i=l
...
u
E
i=l
~
~
~
B' =
~
i
a
-1
[n.B.(B~EB.)lBl' - B.(B~LB.) (Y.-A.t;B.) '(Y.-A.t;B.)
~
~
~
~
~
~
~
~
~
~
~
~
-1
(B ~ EB . )
~
It is clear that these equations
r~E's
t;
and E
canno~
~
B~] =
~
~
a .
be solved directly for the
and that a solution by iteration techniques using a
computer, similar to that used for estimating factor loadings in the
factor analysis model, would require a costly and laborious procedure .
.e
To avoid this difficulty, Anderson has given an alternative Method for
obtaining solutions to the maximum likelihood equations.
82
To use Anderson's method it is first necessary to be able to
partition the p
WI' W2 , ... , Wk
mutually exclusive groups
function
~
... ,
response variates VI'
V
P
into, say,
k(i p)
so that the likelihood
for the entire experiment can be written as
k
~
where
=
~i(8i)
II
i=l
~ i (8 i)
is the likelihood function for the data observed on
the variates in
W.
in
W.~-1
conditioned on the data observed on the variates
~
together, and 8
i
denotes the unknown (condi-
tional) parameters of the i-th conditional likelihood.
meters
8 , •.. , 0
k
1
form a mutually independent set,
1
If the parathen
~
can be
e·
01
maximized by finding 0
i.
i
which maximize
The maximum likelihood solutions
~
~.
~
(0.) . separately for each
~
= [II' "',
~]
and E
are then found, usually recursively, using the identities which relate
and L:.
Anderson's method, unfortunately, does not apply to the GIM
model in general.
It works only for special cases in which the B.
fairly simply structured.
~
are
Such models are characterized by not having
any pair of variates for which the experimental units common to both is
a non-empty proper subset of each of the sets of experimental units of
lIhis will be the case provided none of the original parameters
8 , ... , 8
both conditionally and uncondition1
k
ally, and provided the total number of parameters in (;)1' ••• , 8 equals
k
the total number in ~ and L.
iI' ... , i p appear in
e
83
the variates separately.
model if V
2
For example, we would have an unacceptable
was the only variate observed in the j-th set and V was
3
the only variate observed in the k-th set, and yet both V
observed together in the t-th set.
If however,
V
2
and
V
and V
2
3
were
were ob-
3
served separately in different sets but were not observed together in
any set, then the model would still be acceptable.
Some examples of
acceptable structures include:
a)
The HIM model, where
= I P , B1;~ = [ I p-1;. +1
0p-i+1,i-1] , i=2, ... ,p,
k=p, W = {V }, j=l, ... ,po
j
j
b)
Edgett's model, vlhere
p=3, u=2, q1 = 3, q = 2, B = 1 , B' =
2
3
2
1
k=2, W
1
c)
1
~J
'
{V ' V2 }, W = {V3}'
2
1
Lord's model, where
p=3, u=2, ql = 2, q2 = 2, Bi • [~
d)
[~
a
o
a
1
o
p=s, u=3, q1 = 3, q2 = 3, q3 = 2,
B' =
1
[~
k=S, W
1
0
0
0
1
0
0
0
0
1
~l
B'
2
. [~
0
0
0
1
0
0
0
0
0
n
Bj
=
[~
0
0
0
0
1
0
{V }, W = {V 2 }, i~3 = {V }, W = {V }, W = {V3}.
1
2
4
4
s
S
Some unacceptable structures include:
~J
84
e)
2 [~
B = 1 , B
1
3
=
f)
e
p=3, u=3, q1 = 3, qz = 2, q3 = 2,
0
1
~J'
Bj -
[~
0
0
~J
p=Z, u=3, q1 = 2, qz = 1, q3 = 1,
2
B = 1 , B = [1
2
1
3
0], B = [0
I}
We can present these examples diagramatica11y as fo11ows: 2
a)
b)
.
c)
J
d)
e·
.
e)
e)
TABLE 3.1
EXAMPLES OF GIM SITUATIONS
Z A vertical line in the diagrams represents the experimental
units observed on the response variate above the line. Any group of response variates is observed together on the ~ experimental unit if a
horizontal line can be drawn which intersects each of the vertical lines
corresponding to each response variate in the group.
~
..,
85
In his paper, Anderson did not give explicit results except for
the HIM model when p=2
and for Edgett's and Lord's model.
[13], however, did obtain explicit,
the HIM model with general p
Bhargava
though recursive, solutions for
and A. = 1
-n
~
, t; =
s..
I
(1
p).
x
We now
i
extend Bhargava's results to the HIM model:
3.3.2
Derivation of MLE's for the HIM Model
Noting that B. (p
j) = [1. :
x
J
J
model, it is easy to see that N > N
r
s
P
N =
s
E
j=s
Letting x. (N
n.
J
"""""S
S
x 1)
o.J,P-J.]'
j=l, ..• ,p in the HU1
if r < s, where
and D .(N
s s
m)
x
denote the obser-
vation vector and design matrix, respectively, for response Valone,
s
s=l, ... ,p, we have
x
-s
(N
x 1) =
s
y
""-Ss
and
D (N
s
s
x
m)
=
A
s
Zs+l,s
As +1
Ips
A
P
where
and
A.(n.
J
x
J
m) , j=l, ... ,u
is the design matrix for the j-th
set of experimental units.
We also define the following additional notation:
For s=l, ... ,p, let
E (s
s
x s) =
B'EB
s s
86
and for s=2 t ... tPt let
a «s-l)
x
s « s- 1)
x
-s
6
a
1)
1)
)1
a = (6 s'1 t "'t 6
= l:-1
s-1-s
s's-1
t
a
ss'12 ... (s-1)
all:
ss
-1
a
-s s-l-s
~(m x 1) = ~s - ;(s-1)~
and
Lemma 3.3.2.1:
x
-s
I
For s=2 t .•. ,P
we have
X
I""' NN
(D s-s
a + Xs-s
0 ,a ss' 12 . ,. (1)
-xl t ... t -sJ.
s- IN ) ,
s
s
Proof:
ClearlYt the distribution of -s
x
as that of ~
I
Xs
independent of -s
x
zs
= [X
I
X 1
-xl' "'t -s-
since the first Nr - Ns
by assumption,
1 x]
s ' -s
is a (N
is the same
rows of x r ' r
<
S,
are
Now
s
x s)
random matrix whose rows
are each s-variate normal and
Ds-s
; ] = Ds; (s)
E[Z 5 ] = [EX s
Since
~
5S
we can use for each row of Z
s
]
,
a well known theorem (e.g., Rao [41] t
page 441) which gives the conditional distribution of any subset of
components of a multinormal vector.
independent t we then obtain
Noting that the rows of Z
s
are
~~
87
s
·x
-s
+
X-N. (D
s
N
~
s-s
s
(X - D E;(
s
'
s
0,[-11°
s-1))[-11°,
s- -s (0 ss - -s
s- -s )IN) .
-1
Since D~s
~
+ (X s - Ds s( s-1))[ s- 1a-s = Ds-s
a + Xs-s
0
ass
- a ,,,,-1 a
-s~s-l-s
= a 55'12 ... (5-1)
and
by definition, we have the desired
result.
q.e.d.
We can now write the likelihood function $
where
NN
1
$1 (~l' all)
(Dl~l'
allI N )
1
function for
NN
~
s
is the'density function for
and $ (a , 0 , a
I )
.
s -s -s
ss·12 ... (s-1) N
s
e~
~
will be maximized by maximizing $
Clearly, the estimators of
-1
=
D
Dlx l
(Di
1)
~l
~l
0
-s'
a
-1
ss·12 ... (s-l)
ss-12 ... (s-l)
separately
'(
1
N~] I - Dl (DiDl) Dl)~l
1
and all
)
3
Also
is the likelihood function for univariate
covariate ANOVA and thus we can obtain estimators of
a
s
Using
are
and all
is the likelihood function for univariate ANOVA.
~ (a
~s -s'
is the density
(D a + X 0 , ass' 12. . . (5 -1) IN ) , s= 2 , . . . ,p.
s-s
s-s
s
Anderson's approach,
for each s.
for our model as
~,
is
and
as follows:
-1
(ID
s
; X ]' [D
s
s
X ]) [D
s
s
X ] 'x
s -s
-1
and
a
1 x' (IN - (D s 11Xs ]{ [D s !Xs ]' [D s : Xs ]} [D s'~ Xs ]') -s
x .
ss·12 ... (s-l) N -s
s
s
=
Now
.e
is assumed to be estimable, the design matrices D
s
must be of full rank for each s.
88
[D
xs ] [D s
xs ]
I
s
=
s s
[D'D
D'X
s s
X'D
X'X
s s
Assuming [D
; X]
s
is of full rank, which will be the case with
s
probability 1
s
D'X
s s]-1
s s
[
is of full rank, we can write (Rao [41J):
when D
DID
(3.3.2.1)
s s
X'D
s s
X'X
s s
where
•
".J
(D'D )-lD,X
(3.3.2.2)
F
(3.3.23)
E
s s
= X'X
s S
s s
- X'D (D'D )-lD,X
S S
S 5
S S
= X'[I
S
- D (D'D )-l DI ]X
S
S S
5
•
S
def
= N5 [ 5- l '
Now
-1
1
= - li-5-1
XID (DID)
N
s S s s
s
and
E
-1
=
--1
1
N
[
s
s-l
Thus
-1
= (D'D)
£.g
s s
- lN
s
s-s
N
s
(D'D ) D'X ~-1 X'D (D'D)
s s
s s~s-l s s
-1
D'x
s s
s-s
-1
D'X i-I X'x
(D'D)
s s
s s s-l s-s
-1
=
-1
D'x + l
(D'D)
s s
D'x
s-s
1
-N
5
-1.
(D'D) D'X [-1 x'rr
s s
S 5 5-1 SL
-
-1
D (D'D) D']x
s
s
S
s-s
89
and letting
def 1
(3.3.2.4)
-1
= -N X's [I - Ds (D's D)
D's ]x
s
-s
cr
-s
s
-1
we can write
--1 = (D'D)
D'(x
- Xs r s-l-s
cr)
s s
s -s
CL
-s
Also,
8
-s
-1
1
-"N
=-
Es-1
-1 X'D (D' D) D'x + L E-IX' x
s s s s
s-s
N
s-l s-s
s
=
1
N
s
~-l X'[I - D (D'D)
s-1 s
s
s
-1
D']x
s s
s -s
--1 -
= rs-l.£..s :i'
Now using the identity relationship
recursive formula for
~,
obtain~,
~ - E; (s-l)is
~,
the MLE of
CL
+ E; (s-I)-s
8
-s
Thus to
~
given by
where E;(s-l) = [£1
we calculate
s=2, ... ,p
we obtain a
previous values obtained for £1' ... ,
and -s
c
CL
-s
~-1
and then use the
in the recursive formula.
Using (3.3.2.1) through (3.3.2.4) we also have that
x'[I-D (D'D)
cr ss·12 ... (s-1) = L
N -s
s s s
5
1
-N..'
5
s
1
7':'"""
~
s
N
s
-1
s s
-1
D'X ~-l X'{I-D (D'D ) D'}
s s s-l s
s
D'}]x
5-5
-1
- D (D' D)
s
D '] X
s s
s -s
-1
x'[I-u (D'D)
-s
S
5
x' [I-D CD I D)
~
s
S
1
-(~ f
=
-1
D'+ 1- D (D'D)
xs~-5- lX'{I-D
(D'D)
5
5
5 5
= ..~ -s
x' [I
s
.e
-1
S
S 5
S
-1
D']X ~-1 X'[I-D (D'D)
S
S
5-1 s
- - -1 D' ] x - ,.. ' 7"
,..
S ~
~-s-l~s
s
s s
-1
D']x
s
-S
s s
S
90
=
-
~'-l
-
We then obtain MLE's ars' r < s
where
ars' r
<
a
and where r
of r s- l '
using r
s-
ss
= a
s
is the r-th element of -s
a
s
A
ss·12 ..• (s-1)
A_I
DI ] x
s-s
recursively,
= r s-l-s
0
A
a'r a
-s s-l-s
-
is the previously calculated maximum likelihood estimate
s-l
Here it is necessary to compute a
-s
1
- D (D I D)
s ss
I
and ass' s=2, ... ,p
A
and
-1
J:-x
[I
N-s
a ss - -s
a'r s- la,
where a
-s
ss
and 0
-s
before computing a
=
.
'"
... , a s,s-l )'
(a l'
s
Thus we have the following
ss
theorem:
Theorem 3.3.2.1:
and r
=
«a rs »
~ =
The maximum likelihood estimators of
[~l
-
...
~
--p
]
for the HTI1 model with full rank design matrices D
s
are given by
-1
£..1 = (DID l )
~
=
1
CL
-s + ~(s-l)is
a ss= a
'" "'-1 '"
where
~ (s-1) = [~l
s
Zs_l~
and Zs-l
maximum likelihood estimators of
-1
(D'D)
s s
CL
=
8
0
= z: S-1-8
-s
-s
"-1 -
Di]x 1
for s=2, ••• ,p,
= the r-th element of -s
° = r s-l-s
0
ars' r
<
-1
Dl (DiDl)
for s=2, ... ,p,
a'r s- 10-s ,
ss'12 ... (s-l) + -s
and
£s =
DI~l ' all = ~ ~l [I
DI (x
s-s
are recursively calculated
~(s-1)'
r-
Zs-lis
l ; )
X
s s-l-s
and Z
s-
l' respectively,
91
xs =
Ys,s-l
;'1
;',s-l
-1
Es-l
=
a =
-s
Lx'
[I
N
s
LX'
N s [I
s
- 0 (0' D)
s s s
-1
- 0 (0' D)
s
s
a sso12 ... (s-l)
ass =
~ ~[I
s
0 ' ]X
s s
a
--
0 ' ]x
s -s
s s
- --1 aIL:
a
ss
-s s-l-s
0s(O~Os)
-1
0s]!.s' for s=2, ..• ,p
p
and
N
e
x
s
-s
=
L:
i=s
=
l.ss
n.
~
os
=
A
s
, for s=1 , •
0
•
,
p.
AS +l
3.3.3
Hocking and Smith's Approach
A second approach for obtaining BAN estimators of
for the GIM model was developed by Hocking and Smith [28].
and L:
~
Their
general methodology applies to any GIM situation for which B. = I
J
p
for
Some j, although it may be modified in case no set of experimental units
is complete
(i.e., without missing observations).
For the special
case of the HIM model Hocking and Smith have shown that their approach
.e
yields the maximum likelihood solutions obtained by Anderson's method .
They have not established this for the general GIM model, however, but
92
they have shown that their estimators of
and ~
s
are consistent and
~
asymptotically efficient;i.e., they are BAN if we assume normality.
The estimation technique can be briefly summarized as follows:
(i)
Initial estimates of the parameters are obtained from
that group of observations with no missing observations (e.g. group j=l).
(ii)
These initial estimators are modified by adjoining
optimally the information in all the remaining groups in a sequential
manner by the addition of linear combinations of zero expectations.
By "optimally" it is meant that at each stage in the estimation procedure, the newly defined-
e~timators
will be consistent and asymptotically
efficient when only the partial data used in making the new estimate is
considered.
A severe drawback to Hocking and Smith's method,at least from a
practical point of view, is the extremely cumbersome notation which
would be reqUired to write out for the general model the formulae for
calculating the estimators at each stage.
•
Thus a general computer
program for obtaining the estimators will not only be costly to write,
but might also be difficult for the applied statistician to use.
In
any case, such a computer program has not been developed as yet, and
use of Hocking and Smith's results at the present must be restricted to
sL~ply
structured models and to small u (e.g., u=2 or 3).
Because of the notational difficulties for the general model,
Hocking and Smith were obliged in presenting their method to consider
the Nested Multivariate (NN) model defined by
(3.3.3.1)
where
E (Y .)
J
= 0, Var (Y. ) = I
B = 1 p'
1
J
B. = [1
J
n.
J
q.
J
0
3 B~l:B.
J
J
]
qj , p-qj
j=l, ... ,u
, , j=2, ... ,u
~
93
and
(Note that when q.
J
= p-j+l we have the HIM model variance structure.)
We will not describe the estimator [
obtained for [
reader can refer to Hocking and Smith's paper.
here, since the
We will, however, de-
scribe how their method can be extended for the nested situation when
the expectation structure is given by
(3.3.3.2)
E (Y .) = A,
J
J
~B
., j = 1 , ••• ,u
J
and B.
where A. is of full rank for each j
J
We fir::;t write
estimator
i
~ = [_E;l'
(1)
~
-p
]
as -E; =. (~,
' ~'
-1 -2
is defined by (3.3.1).
•
t,'
E; ')'
-p
•
The initial
is then obtained by rolling out the columns of
~*= (AlA )-lA'Y
1 1
.<}.
J
into one long (mp
1 1
x
1) vector.
Additional data are
then adjoined in stages so that the j-stage estimator
~
(j)
is given
by:
(3.3.3.3)
where
E; (j) = E; (j-l)
\.1. (mq,
-J
J
x 1)
=
-
+ Ij!'j
\.1,
-J
j =2, ... ,u
-fl (j-l)
-
5.2
- J. 'c-*
->2
(j-l)
E;''<
j-l
j~, ]
J
•
:,; (j-l) =
5.1
(j-l)
is the previously obtained (j-l)stage estimator of
~
(j-l) ....
i '
94
-~. (mq.
J
where
-E
J
mp)
=
~.
J
s.. = s..
(j-l),
-
E • E (j -1)
(j-l) is the (j-l)-th stage estimator of E, and
~j(mqj x
and;
x
mp) is a matrix function of. the
eleme~ts
which is determined so that each of the mp
of the (mp
mp)
x
V(j)
of
~
diagonal elements
variance matrix
-
= Var (5.. (j -1) + If' jH.j)
are minimum variances.
Clearly we can write V(j)
V(.)
J
where
= Var(~ (j-l»
--
E. (mq.
x
mq.)
F.(mq.
x
mp)
J
J
J
J
as
+ 1f'~[E.If'.+ 2F j ]
J J J
= Var (lJ.)
J
and
-J
-
= Cov(lJ., _; (j-l».
--J
Thus the optimum value of If'j
= -F j ,
is given by the solution to Ejlf'j
1. e. ,
(3.3.3.4)
=
~.
J
-1
-E. F . .
J
J
Therefore. we can directly
and
-r
(j -1)
to obtain
-~j'
calculate~.
J
provided we can calculate Ej
is clear that as the number of sets S.
J
E.
J
and F.
J
and then substitute
s..- (j-l)
and F . .
J
It
of experimental units increases,
will be increasingly difficult to compute.
Thus even with
the nice structure of the nested model. explicit formulae for the
estimators of
~
and Z would be very messy indeed.
It is understand-
able that Hocking and Smith did not present such formulae even for r
in their paper.
So, despite the theoretical soundness of their
e"
95
approach, their method is not presently suitable for practical use in
general GIM situations.
~
A simple example when p=2,
= I'(l x 2)
and A . = 1
J
-no
J
which
illustrates their method is presented in [53].
3.4
BAN Estimation for the MDH Model-A Literature Review
In this section, we discuss several BAN estimators for
I = (Ii ••.
~)
which have been proposed for the MDM model.
I
For con-
venience we repeat definitions (1.4.1.1), (1.4.1.2) and (1.4.1.3) for
the model:
(3.4.1)
Ap--p
~ ]
Var(Y) = I
r
@
n
= A .;
(3.4.2)
Er"
\L )
(3.4.3)
E(l.) = DI , Var(y) = r
where
Is(m s x 1)
r(p
p)
x
x
D(np
x
is
1)
1.(np x 1)
and
In '
is unknown
s
Yg(n
@
= (1 rs I n
is unknown, s=l, ... ,p
A (n x m)
s
Cov(.lr' Yg)
,
~
s
kno~m,
s=l, ... ,p
is the s-th column of Y, s=l, ... ,p
= (1.i 1.2 ...
p
r m ) =
s
s=l
A
1
A
Z
~)
I
0
0
A
It is assumed
throu~hout
-
P
this section that A
s
is of full rank m
s
for
96
each s, so that
~
B~~
Four
is estimable.
estimators for
~
have been proposed and discussed in the literature and recently [33] have been compared by Monte Carlo methods. Two of these estimators are due to Zellner [72], [73], who was the first to deal with this subject. They are called Zellner's two-stage Aitken estimator (ZEF) and Zellner's iterative Aitken estimator (IZEF). A third estimator is given by Telser [61] and called Telser's iterative estimator (TIE). The last estimator is the maximum likelihood estimator (MLE), which has been obtained by Kmenta and Gilbert [33] using the assumption that the rows of Y come from a multivariate normal population.
(i) ZEF. To derive the ZEF estimator, one needs to use the vector version of the model given in (3.4.3). We have already established in Theorem 2.3.1 that if \Sigma were known, the BLUE of \xi would be given by

    \xi^* = [D'(\Sigma^{-1} \otimes I_n)D]^{-1} D'(\Sigma^{-1} \otimes I_n)\, y .

Since \Sigma is assumed unknown, Zellner proposed replacing \Sigma by its consistent estimator \hat{\Sigma} = ((\hat{\sigma}_{rs})) defined by

(3.4.7)    \hat{\sigma}_{rs} = \frac{1}{n-m}\; y_s'[I_n - A_s(A_s'A_s)^{-1}A_s'][I_n - A_r(A_r'A_r)^{-1}A_r']\; y_r ,
where m is the number of independent variables in each regression, which is thus assumed to be the same for each regression. (If this assumption is relaxed, it would be possible to use, instead of n - m, either n or tr[I_n - A_r(A_r'A_r)^{-1}A_r'][I_n - A_s(A_s'A_s)^{-1}A_s'].)

The ZEF can therefore be written either as

    \hat{\xi} = [D'(\hat{\Sigma}^{-1} \otimes I_n)D]^{-1} D'(\hat{\Sigma}^{-1} \otimes I_n)\, y ,

or
(3.4.8)    \hat{\xi} =
\begin{bmatrix}
\hat{\sigma}^{11}A_1'A_1 & \hat{\sigma}^{12}A_1'A_2 & \cdots & \hat{\sigma}^{1p}A_1'A_p \\
\hat{\sigma}^{12}A_2'A_1 & \hat{\sigma}^{22}A_2'A_2 & \cdots & \hat{\sigma}^{2p}A_2'A_p \\
\vdots & & & \vdots \\
\hat{\sigma}^{1p}A_p'A_1 & \hat{\sigma}^{2p}A_p'A_2 & \cdots & \hat{\sigma}^{pp}A_p'A_p
\end{bmatrix}^{-1}
\begin{bmatrix}
A_1' \sum_{s=1}^{p} \hat{\sigma}^{1s} y_s \\
A_2' \sum_{s=1}^{p} \hat{\sigma}^{2s} y_s \\
\vdots \\
A_p' \sum_{s=1}^{p} \hat{\sigma}^{ps} y_s
\end{bmatrix} ,

where \hat{\Sigma}^{-1} = ((\hat{\sigma}^{rs})).
Kakwani [30] has shown that the ZEF is unbiased even in small samples. Also, the asymptotic variance-covariance matrix of \hat{\xi} is given by (D'[\Sigma^{-1} \otimes I_n]D)^{-1}, which is the limiting variance matrix already obtained for the MDM model. Thus \hat{\xi} is asymptotically efficient, and since \hat{\Sigma} is consistent for \Sigma, the ZEF is BAN.
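The two-stage computation lends itself to a direct numerical sketch. The following Python fragment is an illustration only (the function name zef and the use of numpy helpers are my own choices, not part of the original text); it assumes the design matrices A_1, ..., A_p and response columns y_1, ..., y_p are available, computes \hat{\Sigma} by (3.4.7) with divisor n - m, and then evaluates the ZEF as in (3.4.8).

```python
import numpy as np

def zef(A_list, y_list):
    """Two-stage Aitken (ZEF) sketch: OLS residual covariance, then feasible GLS."""
    p = len(A_list)
    n = y_list[0].shape[0]
    m = A_list[0].shape[1]                    # assumed equal across equations
    # Stage 1: residual projectors M_s = I - A_s(A_s'A_s)^{-1}A_s'
    M = [np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T) for A in A_list]
    S = np.empty((p, p))
    for r in range(p):
        for s in range(p):
            S[r, s] = y_list[r] @ M[r] @ M[s] @ y_list[s] / (n - m)   # (3.4.7)
    # Stage 2: feasible GLS on the stacked model y = D xi + e
    D = np.zeros((n * p, sum(A.shape[1] for A in A_list)))
    col = 0
    for s, A in enumerate(A_list):
        D[s * n:(s + 1) * n, col:col + A.shape[1]] = A
        col += A.shape[1]
    y = np.concatenate(y_list)
    W = np.kron(np.linalg.inv(S), np.eye(n))  # Sigma-hat^{-1} (x) I_n
    xi_hat = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
    return xi_hat, S
```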
(ii) IZEF. Zellner's ZEF can be used for calculating a new set of residuals leading to a new estimate of \Omega = \hat{\Sigma} \otimes I_n , which in turn can be used for obtaining new estimates of \xi , and so on. Thus if \hat{\xi} is the ZEF as given in (3.4.8), we would define \hat{\Omega}^{(2)} from the ZEF residuals and use it to obtain our first iterative estimate of \xi . Continuing in this way we would have, for the (k-1)-th iteration, \hat{\Omega}^{(k)} computed from the residuals y - D\hat{\xi}^{(k-1)} and \hat{\xi}^{(k)} computed from \hat{\Omega}^{(k)} . The process would continue until the estimator of \xi converges, and the final estimator would be the IZEF.
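As a rough illustration of the iteration (a sketch only; the convergence tolerance, the divisor n for the residual covariance, and the reuse of the zef helper above are my own choices, not prescriptions from the text), the IZEF can be computed by alternating the residual covariance and the feasible GLS step until the coefficient estimates stabilize:

```python
import numpy as np

def izef(A_list, y_list, tol=1e-9, max_iter=200):
    """Iterate residual covariance <-> feasible GLS until xi-hat converges (IZEF)."""
    p, n = len(A_list), y_list[0].shape[0]
    xi, S = zef(A_list, y_list)               # start from the two-stage estimator
    D = np.zeros((n * p, sum(A.shape[1] for A in A_list)))
    col = 0
    for s, A in enumerate(A_list):
        D[s * n:(s + 1) * n, col:col + A.shape[1]] = A
        col += A.shape[1]
    y = np.concatenate(y_list)
    for _ in range(max_iter):
        resid = (y - D @ xi).reshape(p, n)    # row s: residuals of equation s
        S = resid @ resid.T / n               # new covariance estimate
        W = np.kron(np.linalg.inv(S), np.eye(n))
        xi_new = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
        if np.max(np.abs(xi_new - xi)) < tol:
            return xi_new, S
        xi = xi_new
    return xi, S
```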
(iii) TIE. Letting u_s = y_s - A_s\xi_s , s = 1, ..., p, Telser showed that

(3.4.9)    u_s = \sum_{r \ne s} a_{sr} u_r + v_s ,    s = 1, ..., p,

where E(u_r v_s) = 0 for r \ne s and a_{sr} = -\sigma^{sr}/\sigma^{ss} . The main idea of the TIE is to substitute the right hand side of (3.4.9) for u_s in each equation y_s = A_s\xi_s + u_s . Then estimates of \xi_s as well as of the a_{sr} are obtained beginning with s = 1 and proceeding to s = p , and then repeating the cycle until the estimates of \xi_s converge.
(iv) MLE. Under normality assumptions on the rows of Y, the likelihood function for the MDM model is given by

    \phi = (2\pi)^{-np/2}\,|\Sigma|^{-n/2}\, \exp\{ -\tfrac{1}{2}\, tr[(\Sigma^{-1} \otimes I_n)(y - D\xi)(y - D\xi)'] \} .

The maximum likelihood estimators \hat{\xi} and \hat{\Sigma} were obtained by Kmenta and Gilbert [33] using computer iteration techniques to solve the following equations:

    \hat{\xi} = [D'(\hat{\Sigma}^{-1} \otimes I_n)D]^{-1} D'(\hat{\Sigma}^{-1} \otimes I_n)\, y ,

    \hat{\sigma}_{rs} = \frac{1}{n}\,(y_r - A_r\hat{\xi}_r)'(y_s - A_s\hat{\xi}_s) ,    r, s = 1, ..., p .

3.4.1  Comparison of the Estimators by Monte Carlo Methods
Kmenta and Gilbert's study was organized around four experiments, each containing a number of different models. The experiments were distinguished from one another by the values of the explanatory variables, that is, the independent variables corresponding to the columns of the A_s matrices. The models within each experiment were characterized by having the same explanatory variables and were distinguished by the structures of the variance matrix \Sigma. Only design matrices of full rank (i.e., regression matrices) were used in the study. For every experiment and every model considered, the computer runs consisted of 100 samples of sizes 10, 20 and 100. The iterative process used in calculating the TIE, IZEF and MLE estimators was stopped when there was no change in the ninth decimal place. All calculations were carried out in double precision.
The most striking result obtained from the study was that the TIE, IZEF and MLE estimates were identical in every sample. Thus these three methods appear to be different ways of obtaining the same solution. In comparing the ZEF with the MLE, the IZEF and the TIE, the results were somewhat mixed. When the correlation between explanatory variables was low and the correlation between disturbances (i.e., \rho_{rs} = \sigma_{rs}/\sqrt{\sigma_{rr}\sigma_{ss}} , r \ne s) was high, the MLE had smaller variances than the ZEF. Also, when the disturbances u_s = y_s - A_s\xi_s were autoregressive and correlated across equations, the MLE had smaller variances regardless of the correlation between explanatory variables. In all other models, regardless of sample size, the ZEF estimator had variances smaller than or equal to the variances of the MLE estimator. Thus, overall, the results favor the ZEF over the other estimators with regard to efficiency in small samples. In addition, since the MLE, TIE and IZEF methods are considerably more time consuming than the procedure of calculating the ZEF, Kmenta and Gilbert recommended the latter. Furthermore, the ZEF is always unbiased, whereas the MLE, TIE and IZEF estimators may not be unbiased.
3.5  A BAN Estimator for the MGLM Model

The method used to obtain the ZEF estimator for the MDM model can be easily extended to the MGLM model. This is described in the following theorem:

Theorem 3.5.1: For the MGLM model, a BAN estimator which is unbiased for any estimable set \theta = H'\xi is given by

(3.5.1)    \hat{\theta}_n = H'\hat{\xi} = H'(D'\hat{\Omega}^{-1}D)^{-} D'\hat{\Omega}^{-1}\, y ,

where \Omega and D are given by (1.4.8), and \hat{\Omega} is obtained from \Omega by substituting the elements of \hat{\Sigma} = ((\hat{\sigma}_{rs})) given in Theorem 2.4.1 for the corresponding elements of \Sigma = ((\sigma_{rs})).

Proof: We have already seen that the estimator \theta_n^* = H'(D'\Omega^{-1}D)^{-} D'\Omega^{-1}\, y is BAN for \theta . Of course, we cannot use \theta_n^* in practice, since \Omega is unknown. But since \hat{\Sigma} = ((\hat{\sigma}_{rs})) is consistent for \Sigma by Corollary 2.4.3, we must have \hat{\Omega} - \Omega \to_p 0 . Thus, using a well known result by Cramer [17], page 254, we can conclude that \hat{\theta}_n and \theta_n^* have the same asymptotic distribution; that is, \hat{\theta}_n is BAN for \theta .

Unbiasedness follows from the argument of Kakwani [30], who showed for the MDM model that \hat{\xi} is symmetrically distributed about \xi whenever y - D\xi follows a continuous symmetric probability law.  q.e.d.
It is important to note here that although (3.5.1) gives a BAN estimator, we are not claiming that it is the "best" BAN estimator in the sense of having a variance matrix which approaches the limiting variance matrix (3.2.23) at the fastest rate as the sample size increases. We shall describe below a method of iteration using (3.5.1) which will give another BAN estimator for the MGLM model. A computer study similar to the one made by Kmenta and Gilbert [33] for the MDM model would be needed to compare these different BAN estimators in small samples, although it can be said that the results for the MDM model indicate that the estimator (3.5.1) would compare favorably with iterative-type estimators.4 In any case, it is clear that for large samples, (3.5.1) has minimum variance. We also show in the next chapter that (3.5.1) may be used in large sample tests of hypotheses for the MGLM model.
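For concreteness, the computation in (3.5.1) can be sketched numerically. The following Python fragment is an illustration only (the function name ban_estimate and the use of a pseudoinverse in place of a generalized inverse are my own choices); it assumes the stacked design matrix D, the stacked observation vector y, the estimate \hat{\Omega}, and the hypothesis matrix H are already available:

```python
import numpy as np

def ban_estimate(D, Omega_hat, y, H):
    """Sketch of (3.5.1): theta-hat = H'(D'Omega^{-1}D)^- D'Omega^{-1} y."""
    W = np.linalg.inv(Omega_hat)              # Omega-hat^{-1}
    G = D.T @ W @ D
    xi_hat = np.linalg.pinv(G) @ D.T @ W @ y  # pseudoinverse stands in for a g-inverse
    return H.T @ xi_hat, xi_hat
```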
3.5.1  Special Cases

It is easily seen that if \xi itself is estimable and we consider the special case of the MDM model, then for H = I , (3.5.1) gives us Zellner's ZEF estimator. If we consider the SM model and we recall from Section 2.2 of Chapter Two that \Omega, and therefore \hat{\Omega}, drops out of (D'\Omega^{-1}D)^{-}D'\Omega^{-1}, we see that (3.5.1) gives us the usual MLE and BLUE for the SM model.

4 It should also be pointed out that although Kmenta and Gilbert's study compared the small sample variances of different estimators, it did not directly compare the different estimators as to the rate at which they approached their limiting variances. A recent paper by Bahadur [9] discusses the concept of the rate at which an estimator approaches its limiting variance.

As for the GIM model, we will now show that we can write (3.5.1) in terms of the original data matrices Y_j by considering the original definition of the model:
    E(Y_j) = A_j \xi B_j ,    Var(Y_j) = I_{n_j} \otimes B_j'\Sigma B_j ,    j = 1, ..., u.

Letting Y_j (n_j x q_j) = [y_{j1}, y_{j2}, ..., y_{jq_j}] , we can define

    y_j (n_j q_j x 1) = (y_{j1}', y_{j2}', ..., y_{jq_j}')' .

Clearly E(y_{jr}) = A_j \xi b_{jr} , where b_{jr} is the r-th column of B_j . Thus

    E(y_j) = (B_j' \otimes A_j)\, \xi .

Also

    Var(y_j) = (B_j'\Sigma B_j) \otimes I_{n_j} .

If we now define y (N x 1) = (y_1', y_2', ..., y_u')' , we have

(3.5.1.1)    E(y) = D\xi ,    Var(y) = \Omega ,

where

(3.5.1.2)    D (N x mp) = [ (B_1' \otimes A_1)' , (B_2' \otimes A_2)' , ..., (B_u' \otimes A_u)' ]'

and

(3.5.1.3)    \Omega (N x N) = diag( (B_1'\Sigma B_1) \otimes I_{n_1} , ..., (B_u'\Sigma B_u) \otimes I_{n_u} ) .

Thus if \theta = H'\xi is estimable, we can write the BAN estimator (3.5.1) as

    \hat{\theta}_n = H'\hat{\xi} = H'[ \sum_{j=1}^{u} (B_j \otimes A_j')\{(B_j'\hat{\Sigma} B_j)^{-1} \otimes I_{n_j}\}(B_j' \otimes A_j) ]^{-} \sum_{j=1}^{u} (B_j \otimes A_j')\{(B_j'\hat{\Sigma} B_j)^{-1} \otimes I_{n_j}\}\, y_j

             = H'[ \sum_{j=1}^{u} B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j' \otimes A_j'A_j ]^{-} \sum_{j=1}^{u} [ B_j(B_j'\hat{\Sigma} B_j)^{-1} \otimes A_j' ]\, y_j .

Hence we have the following corollary to Theorem 3.5.1:

Corollary 3.5.1.1: For the GIM model a BAN estimator for an estimable linear set \theta = H'\xi is given by

(3.5.1.4)    \hat{\theta}_n = H'\hat{\xi} = H'[ \sum_{j=1}^{u} B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j' \otimes A_j'A_j ]^{-} \sum_{j=1}^{u} [ B_j(B_j'\hat{\Sigma} B_j)^{-1} \otimes A_j' ]\, y_j ,

where \hat{\Sigma} is defined in Theorem 2.4.1, and y_j is formed by putting the columns of Y_j underneath each other.
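As a numerical sketch of (3.5.1.4) (illustrative only; the function name gim_ban and the use of numpy pseudoinverses are my own assumptions, not part of the text), the estimator can be assembled directly from the matrices A_j, B_j, the stacked vectors y_j, and a consistent estimate of \Sigma:

```python
import numpy as np

def gim_ban(A_list, B_list, y_list, Sigma_hat, H):
    """Sketch of (3.5.1.4): BAN estimator of H'xi for the GIM model."""
    G = 0.0
    g = 0.0
    for A, B, y in zip(A_list, B_list, y_list):
        C = np.linalg.inv(B.T @ Sigma_hat @ B)          # (B_j' Sigma B_j)^{-1}
        G = G + np.kron(B @ C @ B.T, A.T @ A)           # sum_j B_j C_j B_j' (x) A_j'A_j
        g = g + np.kron(B @ C, A.T) @ y                 # sum_j [B_j C_j (x) A_j'] y_j
    xi_hat = np.linalg.pinv(G) @ g                      # g-inverse via pseudoinverse
    return H.T @ xi_hat
```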
3.5.2  Iterative Techniques
We now describe two iterative methods which use the estimator (3.5.1) to obtain alternative BAN estimators. The first is analogous to that used for the IZEF estimator described in Section 3.4. It uses (3.5.1) to calculate a new set of residuals leading to a new estimate of \Omega. This estimate in turn is used for obtaining new estimates of \xi , and so on. Thus, for \hat{\theta} = H'\hat{\xi} as given in (3.5.1), we would define \hat{\Omega}^{(2)} from the residuals y - D\hat{\xi} and use it to obtain our first iterative estimate of \theta . Continuing in this manner we would have for the (k-1)-th iteration

(3.5.2.1)    \hat{\theta}^{(k)} = H'\hat{\xi}^{(k)} = H'(D'\hat{\Omega}^{(k)-1}D)^{-} D'\hat{\Omega}^{(k)-1}\, y ,

where \hat{\Omega}^{(k)} is computed from the residuals y - D\hat{\xi}^{(k-1)} . This process would continue until the estimator of \theta converges.
An alternative procedure, which should yield the same result as the above, uses the well-known (e.g., Rao [41]) method of scoring. In order to use this method, it is necessary to calculate B_{n,\sigma} , the information matrix defined in (3.1.2) for the p(p+1)/2 independent parameters in \Sigma = ((\sigma_{rs})) .
We thus need the following result:
Theorem 3.5.2.1: For the MGLM model, the (r,s)-th element of \partial \log \phi_n / \partial \sigma is given by

(3.5.2.2)    \frac{\partial \log \phi_n}{\partial \sigma_{rs}} = -\frac{1}{2}\,\delta_{rs} \sum_{j=1}^{u} \Big[ n_j \langle B_j(B_j'\Sigma B_j)^{-1}B_j' \rangle_{rs} - \langle B_j(B_j'\Sigma B_j)^{-1} P_j'P_j (B_j'\Sigma B_j)^{-1} B_j' \rangle_{rs} \Big] ,

where \delta_{rs} = 2 if r \ne s , \delta_{rs} = 1 if r = s , P_j = Y_j - E(Y_j) , and \langle M \rangle_{rs} denotes the (r,s)-th element of the matrix M. Also, the element of B_{n,\sigma} corresponding to \sigma_{rs} and \sigma_{wz} is given by the formula

(3.5.2.3)    \beta_{rs,wz}(\sigma) = \frac{1}{2n}\,\delta_{rs} \sum_{j=1}^{u} n_j \Big\langle B_j(B_j'\Sigma B_j)^{-1} B_j' \frac{\partial \Sigma}{\partial \sigma_{wz}} B_j(B_j'\Sigma B_j)^{-1} B_j' \Big\rangle_{rs} ,

where

    \Big\langle \frac{\partial \Sigma}{\partial \sigma_{wz}} \Big\rangle_{mn} = 1 if (m,n) = (w,z) or (z,w), and 0 otherwise.
Proof: (3.5.2.2) is obtained using (viii) and (xii) of Theorem A.2.1 in the Appendix. Letting b_{(j)r} (1 x q_j) denote the r-th row of B_j , we can write (3.5.2.2) as

    \frac{\partial \log \phi_n}{\partial \sigma_{rs}} = -\frac{1}{2}\,\delta_{rs} \sum_{j=1}^{u} \Big[ n_j\, b_{(j)r}(B_j'\Sigma B_j)^{-1} b_{(j)s}' - b_{(j)r}(B_j'\Sigma B_j)^{-1} P_j'P_j (B_j'\Sigma B_j)^{-1} b_{(j)s}' \Big] .

Differentiating again with respect to \sigma_{wz} , using (i) and (ii) of Theorem A.2.1, gives

    \frac{\partial^2 \log \phi_n}{\partial \sigma_{rs}\,\partial \sigma_{wz}} = -\frac{1}{2}\,\delta_{rs} \sum_{j=1}^{u} \Big[ n_j\, b_{(j)r}\, \frac{\partial}{\partial \sigma_{wz}}\{(B_j'\Sigma B_j)^{-1}\}\, b_{(j)s}' - b_{(j)r}\, \frac{\partial}{\partial \sigma_{wz}}\{(B_j'\Sigma B_j)^{-1} P_j'P_j (B_j'\Sigma B_j)^{-1}\}\, b_{(j)s}' \Big] .

Now

    E[P_j'P_j] = E[(Y_j - E(Y_j))'(Y_j - E(Y_j))] = n_j\, B_j'\Sigma B_j .

Thus

    E\Big[ -\frac{1}{n} \frac{\partial^2 \log \phi_n}{\partial \sigma_{rs}\,\partial \sigma_{wz}} \Big] = \frac{1}{2n}\,\delta_{rs} \sum_{j=1}^{u} n_j\, b_{(j)r}(B_j'\Sigma B_j)^{-1} B_j' \frac{\partial \Sigma}{\partial \sigma_{wz}} B_j (B_j'\Sigma B_j)^{-1} b_{(j)s}' ,

which is the element given in (3.5.2.3).  q.e.d. 5
Using the results of Theorem 3.5.2.1, we can proceed with the method of scoring for the MGLM model as follows. Let \hat{\sigma}^{0} be the estimate of \sigma which uses the \hat{\Sigma} of Theorem 2.4.1. The first iterative estimator of \sigma is then given by

    \hat{\sigma}^{(2)} = \hat{\sigma}^{0} + B_{n,\hat{\sigma}^{0}}^{-1} \Big[ \frac{\partial \log \phi_n}{\partial \sigma} \Big]_{\sigma = \hat{\sigma}^{0}} ,

from which we define the corresponding \hat{\Omega}^{(2)} , which we use to obtain the first iterative estimator of \theta = H'\xi . At the (k-1)-th iteration we thus have

    \hat{\sigma}^{(k)} = \hat{\sigma}^{(k-1)} + B_{n,\hat{\sigma}^{(k-1)}}^{-1} \Big[ \frac{\partial \log \phi_n}{\partial \sigma} \Big]_{\sigma = \hat{\sigma}^{(k-1)}}

and \hat{\theta}^{(k)} = H'\hat{\xi}^{(k)} given by (3.5.2.1).

5 Note also that the (t,v)-th element of \partial(B_j'\Sigma B_j)/\partial \sigma_{wz} is \tfrac{1}{2}\delta_{wz}\,(b_{(j)wt}\,b_{(j)zv} + b_{(j)wv}\,b_{(j)zt}) , where b_{(j)cd} is the (c,d)-th element of B_j .
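A minimal sketch of one scoring step follows (illustrative only; score_step, the flattened parameter vector sigma, and the callables score and info standing for (3.5.2.2) and (3.5.2.3) are hypothetical names of mine). The same update is simply repeated until the covariance estimate stabilizes.

```python
import numpy as np

def score_step(sigma, score, info):
    """One Fisher-scoring update: sigma_new = sigma + B^{-1} d(log phi)/d(sigma).

    sigma : 1-D array of the p(p+1)/2 distinct covariance parameters
    score : callable returning the gradient (3.5.2.2) at sigma
    info  : callable returning the information matrix (3.5.2.3) at sigma
    """
    B = info(sigma)
    g = score(sigma)
    return sigma + np.linalg.solve(B, g)
```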
Since Kmenta and Gilbert's study yields identical IZEF, TIE and MLE estimators, it would be expected that generalized versions of their estimators for the MGLM model would yield identical results. We thus do not suggest the use of a method analogous to the TIE method nor one which solves the likelihood equations directly, since both methods are difficult to use on the computer.
3.6  BAN Estimation for the GGCM Model

Recalling Section 1.5 of Chapter One, the GGCM model is given by

    E(Y_j) = A_j \xi P B_j ,    Var(Y_j) = I_{n_j} \otimes B_j'\Sigma B_j ,    j = 1, ..., u,

where Y_j is (n_j x q_j), A_j is (n_j x m), \xi is (m x p), \Sigma is (q x q), B_j (q x q_j) is the incidence matrix for the j-th set S_j of experimental units, and P (p x q) is a regression matrix of full rank p.
We have obtained for the GGCM model, in Corollary 3.2.4 of this chapter, the limiting variance matrix of any BAN estimator of an estimable linear set H'\xi , where \xi = (\xi_1', ..., \xi_p')' and H (mp x w) is of full rank w. The limiting variance matrix is given in (3.2.28) as

    C_n = n H'\Big[ \sum_{j=1}^{u} P B_j (B_j'\Sigma B_j)^{-1} B_j' P' \otimes A_j'A_j \Big]^{-1} H .

To obtain an estimator which is both unbiased and BAN for H'\xi , we can use a method similar to that used for the GIM model in Section 3.5.1.
We first give an unbiased and consistent estimator of \Sigma :6

Lemma 3.6.1: For the GGCM model, an unbiased and consistent estimator of \Sigma (q x q) = ((\sigma_{rs})) is given by \hat{\Sigma} = ((\hat{\sigma}_{rs})) , where

(3.6.1)    \hat{\sigma}_{rs} = \frac{1}{N_{rs} - R(D_{rs})}\; x_{rs}'\,[I - D_{rs}(D_{rs}'D_{rs})^{-}D_{rs}']\; x_{sr} ,    r, s = 1, ..., q,

where
    N_{rs} is the number of experimental units on which both V_r and V_s are observed,
    x_{rs} (N_{rs} x 1) is the observation vector on V_r corresponding to those experimental units on which both V_r and V_s are observed,7 and
    D_{rs} (N_{rs} x m) is the design matrix, consisting of a row of A_j matrices, corresponding to x_{rs} .

Proof: Using Lemma 2.4.1 of Chapter Two with G = I - D_{rs}(D_{rs}'D_{rs})^{-}D_{rs}' and noting that E(x_{rs}')\,G\,E(x_{sr}) = 0 , it is easily seen that

    E(\hat{\sigma}_{rs}) = \frac{tr\,G}{N_{rs} - R(D_{rs})}\; \sigma_{rs} = \sigma_{rs} .

Consistency follows from the fact that \hat{\sigma}_{rs} is the usual pooled estimate of covariance, which is consistent for \sigma_{rs} provided \lim_{n \to \infty} N_{rs}/n exists and is non-zero for all r, s.  q.e.d.
6 As for the MGLM model, it is shown by example in Chapter Five that \hat{\Sigma} is not necessarily positive definite, although it does have a positive definite limit.

7 Note that when r = s , N_{ss} = N_s , x_{ss} = x_s , and D_{ss} = D_s correspond to all those experimental units on which V_s is observed.
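The estimator (3.6.1) amounts to a pooled covariance computed from the pairwise-available data. A rough Python sketch (the function name pairwise_sigma and the argument layout are illustrative assumptions; x_rs, x_sr and D_rs are supplied per pair exactly as in the lemma) is:

```python
import numpy as np

def pairwise_sigma(x_rs, x_sr, D_rs):
    """Sketch of (3.6.1): sigma-hat_rs from the units observing both V_r and V_s."""
    N = D_rs.shape[0]
    rank = np.linalg.matrix_rank(D_rs)
    G = np.eye(N) - D_rs @ np.linalg.pinv(D_rs)   # I - D(D'D)^- D'
    return (x_rs @ G @ x_sr) / (N - rank)
```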
Theorem 3.6.1: For the GGCM model an unbiased BAN estimator of an estimable linear set \theta = H'\xi is given by

(3.6.2)    \hat{\theta}_n = H'[ \sum_{j=1}^{u} P B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j'P' \otimes A_j'A_j ]^{-} \sum_{j=1}^{u} [ P B_j(B_j'\hat{\Sigma} B_j)^{-1} \otimes A_j' ]\, y_j ,

where y_j (n_j q_j x 1) is formed by putting the columns of Y_j underneath each other and \hat{\Sigma} = ((\hat{\sigma}_{rs})) is defined by (3.6.1).

Proof: Letting Y_j = [y_{j1}, y_{j2}, ..., y_{jq_j}] , we can write y_j = (y_{j1}', y_{j2}', ..., y_{jq_j}')' . Now if we let P B_j = [p_{j1}, p_{j2}, ..., p_{jq_j}] , then E(y_{jr}) = A_j \xi p_{jr} . Thus

    E(y_j) = (B_j'P' \otimes A_j)\, \xi .

Also

    Var(y_j) = (B_j'\Sigma B_j) \otimes I_{n_j} .

If we now define y (N x 1) = (y_1', y_2', ..., y_u')' , then clearly

    E(y) = D\xi ,    Var(y) = \Omega ,

where

    D (N x mp) = [ (B_1'P' \otimes A_1)' , ..., (B_u'P' \otimes A_u)' ]'

and

    \Omega (N x N) = diag( (B_1'\Sigma B_1) \otimes I_{n_1} , ..., (B_u'\Sigma B_u) \otimes I_{n_u} ) .

Recalling the proof of Theorem 3.5.1 for the MGLM model, it is obvious that

(3.6.3)    \hat{\theta}_n = H'(D'\hat{\Omega}^{-1}D)^{-} D'\hat{\Omega}^{-1}\, y

is an unbiased BAN estimator of \theta , where \hat{\Omega} is obtained from \Omega by replacing \Sigma by \hat{\Sigma} . Now by direct computation we obtain

    D'\hat{\Omega}^{-1}D = \sum_{j=1}^{u} (P B_j \otimes A_j')\{(B_j'\hat{\Sigma} B_j)^{-1} \otimes I_{n_j}\}(B_j'P' \otimes A_j) = \sum_{j=1}^{u} P B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j'P' \otimes A_j'A_j

and

    D'\hat{\Omega}^{-1}y = \sum_{j=1}^{u} [ P B_j(B_j'\hat{\Sigma} B_j)^{-1} \otimes A_j' ]\, y_j ,

so that \hat{\theta}_n is given by (3.6.2).  q.e.d.
Alternative estimators to (3.6.2) can be obtained by iterative
techniques which are directly analogous to those given in Section 3.5.2
for the MGLM model.
CHAPTER IV

TESTING LINEAR HYPOTHESES FOR THE MGLM AND GGCM MODELS
4.1  Introduction
In this chapter, a method is proposed for testing linear
hypotheses in the MGLM and GGCM models when the underlying data is
normally distributed.
The test statistics are quadratic forms called
Wald Statistics, and they are constructed from BAN estimators of linear
functions of the treatment parameters and consistent estimators of the
variance parameters.
The asymptotic distribution of a Wald Statistic
is a central chi-square variable.
Thus when the sample size is large,
the test criteria yield chi-square tests.
Even for small sample sizes
the test criterion may have desirable properties, since it can be
regarded as a generalization of Hotelling's trace for the SM model.
Nevertheless, to use the Wald Statistic in small samples, it is necessary to obtain its exact distribution.
This has not yet been done and
appears difficult.
4.2  Review of the Literature

A number of different approaches have been used in the literature to obtain tests of hypotheses for the MDM, HM and GIM models. The tests obtained are generally impractical with regard to computation and require more assumptions on the hypothesis and design matrices than the usual estimability requirement. For example, the necessary and sufficient conditions given by Srivastava in [57] for reparameterizing the MDM model to the SM model are that the vector spaces spanned by the columns of the different design matrices are the same. In [58], Srivastava restricts the GIM model 1 to be what he calls a "strongly regular" design. In [57], he proves to be not strongly regular a particular example of the kind of design likely to be considered in practice. In [58], however, he works through a GIM model which is strongly regular and specifies the general computational method for reparameterization. Nevertheless, the class of strongly regular designs needs to be better described in terms of GIM models likely to be used in practice. This problem seems formidable, since the computations needed for validating the strongly regular property are quite complex. Furthermore, the reparameterized form of the model has a \Sigma matrix which is restricted in terms of coefficients \lambda_{ij} , which may be less than 1 for some i \ne j , where \rho_{ij} is the correlation between variates V_i and V_j . The problem of testing a hypothesis in a SM model with the above type of restriction on \Sigma remains to be solved and is likely to require a laborious computer procedure.
In [64], Trawinski and Bargmann summarize the main results of Trawinski's thesis [63], in which a likelihood ratio test is obtained for the special case of the GIM model in which u = p and A_j = A , j = 1, ..., p. The maximum likelihood equations for their model require for solution an iterative technique based on the Newton-Raphson method. However, their method cannot be directly generalized when not all the A_j matrices are the same. In a demonstration of their method for a particular set of data, the initial estimate \hat{\Sigma}_0 used in the iteration yielded an estimate of \xi which was comparable to the final estimate obtained after several iterations. \hat{\Sigma}_0 is a pooled estimate, in which the (r,s)-th element \sigma^0_{rs} of \hat{\Sigma}_0 is obtained by averaging estimates of \sigma_{rs} from all experimental units in which both V_r and V_s are measured together.2

1 Srivastava's model in [58] allows \xi_j to be (m_j x 1) and A_j to be (n_j x m_j), whereas we have assumed that m_j = m for all j.
In [59], Srivastava has suggested several different union-intersection type tests for the HM model, one of which was obtained in the earlier paper [55] and which involves a generalization of Roy's step-down procedure [47]. The testing procedure involves making a sequence of independent F tests and rejecting the hypothesis if any of the F tests are rejected. Srivastava points out, however, that such a procedure may only be used when the response variates can be ordered as in the case of the HM model.
The Wald Statistic proposed below was first suggested by Wald [66] and has been used by Allen [1] in connection with nonlinear multivariate models, and by Bhapkar [12] and others [25] for problems with categorical data.

2 \hat{\Sigma}_0 is the same as the \hat{\Sigma}_0 of Corollary 2.4.6. Thus \sigma^0_{rs} is given by

    \sigma^0_{rs} = \frac{1}{N_{rs} - R(A)}\; x_{rs}'\,[I_{N_{rs}} - D_{rs}(D_{rs}'D_{rs})^{-}D_{rs}']\; x_{sr} ,

where D_{rs} (N_{rs} x m) = [A' ... A']' and x_{rs} (N_{rs} x 1) is the data vector for V_r corresponding to all experimental units on which both V_r and V_s are observed together.
Allen showed that the likelihood ratio criterion and
the Wald criterion for his nonlinear model are asymptotically equivalent.
For small samples, however, the Wald Statistic was not a good
approximation to a chi-square variable.
Nevertheless, this may have
been due to the use of a linear approximation of the test criterion,
which is not necessary for the linear MGLM model.
4.3  The Wald Statistic

In Section 3.2 of the previous chapter we have defined a BAN estimator \hat{\alpha}(\hat{\theta}_n) for \alpha(\theta) by requiring

    \lim_{n\to\infty} \mathcal{L}\{ \sqrt{n}\, C_n^{-1/2} [\hat{\alpha}(\hat{\theta}_n) - \alpha(\theta_0)] \} = N(0, I_w) ,

where
    \theta_0 (u x 1) is the true value of \theta ,
    \alpha(\theta) is a continuous (w x 1) vector function of \theta with continuous partial derivatives,
    C_n = \Delta_0' B_{n,\theta_0}^{-1} \Delta_0 is symmetric, positive definite,
    \Delta_0 = [\partial \alpha(\theta)/\partial \theta]_{\theta = \theta_0} ,
    B_{n,\theta_0} is the information matrix defined in (3.1.2), evaluated at \theta_0 , and
    \phi_n is the likelihood function for a sample of size n.

Now if \hat{\alpha}(\hat{\theta}_n) is BAN for \alpha(\theta_0), then it is well known, e.g., Cramer [17], that

(4.3.1)    n\,[\hat{\alpha}(\hat{\theta}_n) - \alpha(\theta_0)]'\, C_n^{-1}\, [\hat{\alpha}(\hat{\theta}_n) - \alpha(\theta_0)]

is asymptotically distributed as \chi^2_w . Furthermore, if \hat{\theta}_n \to_p \theta_0 , we can claim, e.g., using [17], that

(4.3.2)    n\,[\hat{\alpha}(\hat{\theta}_n) - \alpha(\theta_0)]'\, \hat{C}_n^{-1}\, [\hat{\alpha}(\hat{\theta}_n) - \alpha(\theta_0)]

is asymptotically distributed as \chi^2_w , where \hat{C}_n is obtained from C_n by substituting \hat{\theta}_n for \theta_0 .

Now suppose we want to test the hypothesis H_0 : \alpha(\theta_0) = 0 . Wald [66] suggested using the statistic

(4.3.3)    W_n = n\, [\hat{\alpha}(\hat{\theta}_n)]'\, \hat{C}_n^{-1}\, [\hat{\alpha}(\hat{\theta}_n)] ,

which is asymptotically \chi^2_w when H_0 is true; i.e., we would reject H_0 whenever W_n exceeded the (1-\alpha)-th percentage point of \chi^2_w , where \alpha is a chosen level of significance. We call the statistic W_n the Wald Statistic, and we will use it to obtain test criteria for the MGLM and GGCM models.
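In computational terms the test reduces to a quadratic form compared against a chi-square quantile. A small Python sketch (the name wald_statistic and the use of scipy's chi-square quantile are my own illustrative choices) is:

```python
import numpy as np
from scipy.stats import chi2

def wald_statistic(alpha_hat, C_hat, n, level=0.05):
    """W_n = n * alpha_hat' C_hat^{-1} alpha_hat, compared with the chi-square cutoff."""
    w = len(alpha_hat)
    W_n = n * alpha_hat @ np.linalg.solve(C_hat, alpha_hat)
    reject = W_n > chi2.ppf(1.0 - level, df=w)
    return W_n, reject
```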
We note here that Wald suggested using W_n when the data in the sample were i.i.d. and when \hat{\theta}_n was the MLE of \theta . It is clear from the above, however, that the asymptotic distribution of W_n will remain chi-square when \hat{\alpha}(\hat{\theta}_n) is any BAN estimator for \alpha(\theta) , although the closeness of the approximation of W_n to \chi^2_w in small samples may vary with different BAN estimators. For the i.i.d. situation using MLE's, Wald showed that a test using W_n had a number of desirable asymptotic properties; namely, it had asymptotically best average power, asymptotically best constant power, and was asymptotically most stringent. It still remains to be shown that these properties can be carried over to W_n formed from independent but not i.i.d. data using arbitrary BAN estimators.
4.4  Testing Linear Hypotheses in the MGLM Model

We wish to test the hypothesis H_0 : H'\xi = 0 , where H'\xi is estimable, H (M x w) is of full rank w (\le M) , and \xi = (\xi_1', ..., \xi_p')' is the vector of unknown treatment parameters in the MGLM model. Since we have already obtained the asymptotic variance matrix of any BAN estimator for H'\xi in (3.2.23) of Chapter Three, we can form the corresponding Wald Statistic. Letting \alpha(\xi) = H'\xi , it is easily seen that \Delta = \partial \alpha(\xi)/\partial \xi = H . Thus we have the following theorem:

Theorem 4.4.1: For the MGLM model, let H'\xi be estimable where H (M x w) is of full rank w. Then under H_0 : H'\xi = 0 we have that

(4.4.1)    W_n = (H'\hat{\xi})'\, [H'(D'\hat{\Omega}^{-1}D)^{-}H]^{-1}\, (H'\hat{\xi})

is asymptotically distributed as a central chi-square variable with w degrees of freedom, where
    \hat{\Sigma} is any positive definite consistent estimator of \Sigma ,3
    \hat{\Omega} and D are defined by the vector version (1.4.8),4 and
    H'\hat{\xi} is any BAN estimator of H'\xi .

In particular we may use H'\hat{\xi} as given in Theorem 3.5.1 of Chapter Three and \hat{\Sigma} as given in Theorem 2.4.1 of Chapter Two.

3 If \hat{\Sigma} is not positive definite, however, then its use will lead to negative values of W_n , thus giving spurious results. In Chapter Five it is shown that the \hat{\Sigma} of Theorem 2.4.1 is not always positive definite.

4 D'\hat{\Omega}^{-1}D may be replaced by its equivalent form in terms of the original design matrices and \hat{\Sigma} as given in Theorem 3.2.1 of Chapter Three.
To test the hypothesis H_0 : H'\xi = 0 , we may thus reject H_0 if W_n \ge \chi^2_{w,1-\alpha} and accept otherwise. When the sample size is small, W_n can be used as a test statistic provided its exact distribution can be obtained. The following corollary indicates that W_n might have desirable test properties even when n is small.
Corollary 4.4.2: The Wald Statistic obtained for the SM model is equivalent to Hotelling's Trace Criterion (T_0^2) when testing H_0 : C\xi D = 0 , where C (g x m) and D (p x v) are of full rank g and v, respectively.

Proof: Using the notation of Section 1.1 we have that

    T_0^2 = tr\, S_H S_E^{-1} ,  where
    S_H = (C\hat{\xi} D)'\,[C(A'A)^{-}C']^{-1}\,(C\hat{\xi} D)  and
    S_E = D'\,Y'[I - A(A'A)^{-}A']Y\,D .

The Wald Statistic for testing C\xi D = 0 (i.e., (D' \otimes C)\xi = 0) is clearly given by

    W_n = [(D' \otimes C)\hat{\xi}]'\,[(D' \otimes C)(\hat{\Sigma}^{-1} \otimes A'A)^{-}(D \otimes C')]^{-1}\,[(D' \otimes C)\hat{\xi}] ,

where \hat{\Sigma} = \frac{1}{n}\, Y'[I - A(A'A)^{-}A']Y and \hat{\Sigma}^{-1} \otimes A'A is the information matrix for \xi . Now using the properties of Kronecker product multiplication and generalized inverses, we have that

    W_n = [(D' \otimes C)\hat{\xi}]'\,[(D' \otimes C)\{\hat{\Sigma} \otimes (A'A)^{-}\}(D \otimes C')]^{-1}\,[(D' \otimes C)\hat{\xi}]
        = [(D' \otimes C)\hat{\xi}]'\,[D'\hat{\Sigma} D \otimes C(A'A)^{-}C']^{-1}\,[(D' \otimes C)\hat{\xi}]
        = n\,[(D' \otimes C)\hat{\xi}]'\,[S_E^{-1} \otimes (C(A'A)^{-}C')^{-1}]\,[(D' \otimes C)\hat{\xi}]
        = n \sum_{i=1}^{v} \sum_{j=1}^{v} s_E^{ij}\,(C\hat{\xi} d_i)'\,[C(A'A)^{-}C']^{-1}\,(C\hat{\xi} d_j)
        = n\, tr\, S_H S_E^{-1} = n\, T_0^2 ,

where d_j is the j-th column of D and s_E^{ij} is the (i,j)-th element of S_E^{-1} , so that W_n is a monotone function of T_0^2 .  q.e.d.
It is easily seen that even for the MGLM model, W_n can be written as a trace statistic:

    W_n = tr\,\{ [H'(D'\hat{\Omega}^{-1}D)^{-}H]^{-1}\,(H'\hat{\xi})(H'\hat{\xi})' \} ,

so that we may consider W_n to be a generalization of Hotelling's T_0^2 . Since T_0^2 has been shown in the computer studies of Schatzoff [50] to provide good protection for a wide spectrum of alternative hypotheses, we might suspect that W_n will have this property, too. Nevertheless, work still needs to be done here, and a study similar to that of Schatzoff might be used to compare different tests for the MGLM model.
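The trace form is convenient computationally. As a rough sketch (the function name wald_trace is mine; H, the estimate xi_hat, and the inner matrix D'\hat{\Omega}^{-1}D are assumed to be available as numpy arrays, with a pseudoinverse standing in for the g-inverse):

```python
import numpy as np

def wald_trace(H, xi_hat, DWD):
    """W_n as a trace statistic: tr{ [H'(D'W D)^- H]^{-1} (H'xi)(H'xi)' }."""
    inner = H.T @ np.linalg.pinv(DWD) @ H          # H'(D'Omega^{-1}D)^- H
    t = H.T @ xi_hat                               # H' xi-hat
    return float(np.trace(np.linalg.solve(inner, np.outer(t, t))))
```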
We note in addition that if we consider the hypothesis H_0 : C\xi D = 0 for the GIM model, then we obtain the following corollary:

Corollary 4.4.3: For the GIM model, the Wald Statistic for testing the hypothesis H_0 : C\xi D = 0 , where C\xi D is estimable and C (g x m) and D (p x v) are of full rank g and v, respectively, is given by

(4.4.2)    W_n = [(D' \otimes C)\hat{\xi}]'\,\Big[(D' \otimes C)\Big\{\sum_{j=1}^{u} B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j' \otimes A_j'A_j\Big\}^{-}(D \otimes C')\Big]^{-1}\,[(D' \otimes C)\hat{\xi}] ,

where
    d_j (p x 1) is the j-th column of D,
    (D' \otimes C)\hat{\xi} is any BAN estimator of (D' \otimes C)\xi , and
    \hat{\Sigma} is any consistent positive definite estimator of \Sigma .
4.5  Testing Linear Hypotheses in the GGCM Model

For the GGCM model, we wish to test the hypothesis H_0 : H'\xi = 0 , where H'\xi is estimable, H (mp x w) is of full rank w (\le mp) , and \xi = (\xi_1', ..., \xi_p')' is the vector of unknown treatment parameters. Since we have already obtained the asymptotic variance matrix of any BAN estimator of H'\xi in Theorem 3.2.4 of Chapter Three, we can form the corresponding Wald Statistic. We thus have the following theorem:

Theorem 4.5.1: For the GGCM model, let H'\xi be estimable where H (mp x w) is of full rank w. Then under H_0 : H'\xi = 0 , we have that

(4.5.1)    W_n = (H'\hat{\xi})'\,\Big[H'\Big\{\sum_{j=1}^{u} P B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j'P' \otimes A_j'A_j\Big\}^{-}H\Big]^{-1}\,(H'\hat{\xi})

is asymptotically distributed as \chi^2_w , where \hat{\Sigma} is any consistent positive definite estimator of \Sigma 5 and H'\hat{\xi} is any BAN estimator of H'\xi . In particular, we may use \hat{\Sigma} as given in Lemma 3.6.1 and H'\hat{\xi} as given in Theorem 3.6.1 of Chapter Three.

5 \hat{\Sigma} must be positive definite, however.
When restricted to the GCM model and the hypothesis C\xi D = 0 , we obtain the following corollary:

Corollary 4.5.2: For the GCM model, let C\xi D be estimable where C (g x m) and D (p x v) are of full rank g and v, respectively. Then the Wald Statistic is given by

(4.5.2)    W_n = n\, tr\, S_H S_E^{-1} ,

where
    S_H = (C\hat{\xi} D)'\,[C(A'A)^{-}C']^{-1}\,(C\hat{\xi} D) ,
    S_E = D'(P S^{-1} P')^{-1} D ,
    S = Y'[I - A(A'A)^{-}A']Y ,  and
    \hat{\xi} = (A'A)^{-}A'Y S^{-1}P'(P S^{-1}P')^{-1} .

Proof: We have already seen in Chapter One that C\hat{\xi} D is BAN for C\xi D and that \hat{\Sigma} = \frac{1}{n}\, S is consistent for \Sigma . Thus the Wald Statistic (4.5.1) becomes

    W_n = [(D' \otimes C)\hat{\xi}]'\,[(D' \otimes C)(P\hat{\Sigma}^{-1}P' \otimes A'A)^{-}(D \otimes C')]^{-1}\,[(D' \otimes C)\hat{\xi}]
        = [(D' \otimes C)\hat{\xi}]'\,[(D' \otimes C)\{(P\hat{\Sigma}^{-1}P')^{-1} \otimes (A'A)^{-}\}(D \otimes C')]^{-1}\,[(D' \otimes C)\hat{\xi}]
        = [(D' \otimes C)\hat{\xi}]'\,[D'(P\hat{\Sigma}^{-1}P')^{-1}D \otimes C(A'A)^{-}C']^{-1}\,[(D' \otimes C)\hat{\xi}]
        = n \sum_{i=1}^{v}\sum_{j=1}^{v} s_E^{ij}\,(C\hat{\xi} d_i)'\,[C(A'A)^{-}C']^{-1}\,(C\hat{\xi} d_j)
        = n\, tr\,(C\hat{\xi} D)'[C(A'A)^{-}C']^{-1}(C\hat{\xi} D)\, S_E^{-1} ,

where d_j is the j-th column of D and s_E^{ij} is the (i,j)-th element of S_E^{-1} .  q.e.d.

Note that S_E is the same as that obtained by Khatri in (1.2.2.5), but that S_H differs from Khatri's (1.2.2.4) in that

    R = (A'A)^{-} + (A'A)^{-}A'Y[S^{-1} - S^{-1}P'(PS^{-1}P')^{-1}PS^{-1}]Y'A(A'A)^{-}

is now replaced by (A'A)^{-}. Clearly Khatri's trace and \frac{1}{n} W_n are asymptotically equivalent, since they have the same asymptotic distribution. Also, both of these statistics use the information in S, the sample covariance matrix. It is not obvious, however, which of these statistics is better in small samples, either as an approximation of a chi-square variable or with regard to protection against alternative hypotheses. A study similar to that of Schatzoff [50] would perhaps be needed to compare the two with regard to the latter.

Use of Khatri's trace for the GCM model clearly suggests an analogous competitor to (4.5.1) for the GGCM model. We can arrive at such a competitor by using the method of covariance adjustment.
Since we have assumed that P_j = P B_j is of rank p for each j = 1, ..., u , we can thus find matrices F_j ((q_j - p) x q_j) of full rank q_j - p for each j such that

(4.5.3)    P_j F_j' = 0 (p x (q_j - p)) ,    j = 1, ..., u.

Note that we can choose F_j' to be a column basis for I_{q_j} - P_j'(P_jP_j')^{-1}P_j . We can now consider the following competitor to (4.5.1):
(4.5.4)    W_n^* = (H'\hat{\xi})'\,\Big[H'\Big\{\sum_{j=1}^{u} P B_j(B_j'\hat{\Sigma} B_j)^{-1}B_j'P' \otimes Q_j\Big\}^{-}H\Big]^{-1}\,(H'\hat{\xi}) ,

where

(4.5.5)    Q_j = A_j'A_j - A_j'Y_jF_j'(F_jY_j'Y_jF_j')^{-1}F_jY_j'A_j ,    j = 1, ..., u .

To complete the analogy, we would like to show that Q_j is independent of the choice of F_j , in the sense that Q_j has a generalized inverse R_j which is analogous to R. We first state without proof an important result due to Khatri:
Lemma 4.5.1 (Khatri [31]): Suppose P (p x q) has full rank p (\le q) and PF' = 0 , where F ((q - p) x q) is of full rank. Then if S (q x q) is any positive definite matrix, we must have

(4.5.6)    F'(FSF')^{-1}F = S^{-1} - S^{-1}P'(PS^{-1}P')^{-1}PS^{-1} .
F' (FSF') F
We can now use Khatri's lemma to prove the following result:
Lemma 4.5.2:
Let P
be of full rank q
and F
be defined as in Lemma 4.5.1.
and let Seq
x
q) = y'[I
generalized inverse of
-1
Q = A'A - A'YF'(FY'YF') Fy'A
is given by
n
- A(A'A)-A']Y
Let yen
Then a
x
q)
123
-1
R = (A'A)-+ (A'A)-A'Y[S-l- S-lp'(pS-lp') pS-l]Y'A(A'A)-
(4.5.7)
is any generalized inverse of A'A.
where (A'A)
Furthermore, if A
is of full rank m, then (4.5.7) gives a unique inverse of Q .
Proof:
Let U = A'A , Y = A'YF', W = Fy'YF' .
we have R = U + U-Y(FSF,)ly'U-.
Then, using Lemma 4.5.1,
Clearly using Theorem A.l.l (ii) of
the Appendix
UU Y = (A'A) (A'A)-A'YF' = A'YF' = Y
(4.5.8)
always.
Also
(4.5.9)
YW-1FSF' = y - YW-ly'U-y
since
FSF'
= FY'[I
-1
Furthermore QR = [U - VW-ly'][U- + U-Y (FSF') y'U-]
-1
- YW-1Y'U-]Y(FSF') V'U
= UU
YW-ly'U-+ [UU
= UU -
-1
VW-ly'U- + [V - YW-ly'U-Y] (FSF') Y'U-
-
using (4.5.8)
= UU
Thus
=Q
so that R is a generalized inverse of Q.
~ow
.e .
suppose A is of full rank, so that A'A
is of full rank
-1
and (A'A)
= (A'A) , the unique inverse of A'A.
Then following the
124
= UU = I = UU = RQ, so that (4.5.7)
above proof it is clear that QR
will be the unique inverse of Q
q.e.d.
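A quick numerical check of the lemma can be reassuring. The sketch below is entirely illustrative (the random dimensions and helper calls are my own choices): it builds P, F, A and Y at random, forms Q and R as in (4.5.5) and (4.5.7), and verifies Q R Q = Q to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q, p, m = 30, 6, 3, 4
A = rng.normal(size=(n, m))
Y = rng.normal(size=(n, q))
P = rng.normal(size=(p, q))
# F spans the orthogonal complement of the rows of P, so that P F' = 0
F = np.linalg.svd(P)[2][p:, :]                     # (q-p) x q
S = Y.T @ (np.eye(n) - A @ np.linalg.pinv(A)) @ Y
Q = A.T @ A - A.T @ Y @ F.T @ np.linalg.inv(F @ Y.T @ Y @ F.T) @ F @ Y.T @ A
Si = np.linalg.inv(S)
R = (np.linalg.pinv(A.T @ A)
     + np.linalg.pinv(A.T @ A) @ A.T @ Y
       @ (Si - Si @ P.T @ np.linalg.inv(P @ Si @ P.T) @ P @ Si)
       @ Y.T @ A @ np.linalg.pinv(A.T @ A))
print(np.allclose(Q @ R @ Q, Q))                   # True: R is a g-inverse of Q
```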
Using Lemma 4.5.2 we obtain the following theorem:

Theorem 4.5.3: For the GGCM model let P_j = P B_j , S_j = Y_j'[I - A_j(A_j'A_j)^{-}A_j']Y_j , and define F_j to satisfy (4.5.3). Then a generalized inverse of Q_j as defined in (4.5.5) is given by

(4.5.8)    R_j = (A_j'A_j)^{-} + (A_j'A_j)^{-}A_j'Y_j[S_j^{-1} - S_j^{-1}P_j'(P_jS_j^{-1}P_j')^{-1}P_jS_j^{-1}]Y_j'A_j(A_j'A_j)^{-} .

Thus a competitor to (4.5.1) for testing H_0 : H'\xi = 0 is given by (4.5.4).
CHAPTER V
PRACTICAL EXAMPLES
5.1  Introduction

In this chapter, four examples of MGLM and GGCM models are presented to illustrate the kinds of results obtained when using the BAN estimators proposed in Theorems 3.5.1 and 3.6.1 of Chapter Three and the Wald tests of linear hypotheses described in Theorems 4.4.1 and 4.5.1 of Chapter Four. The four models used are (1) HIM (p=3), (2) MDM (p=2), (3) MDM (p=4) and (4) GGCM (p=7). The data for the first three of these models were generated using a normal random number generator. Thus the BAN estimators obtained for these models could be compared with the true values of the parameters, since the latter (usually unknown) were known. Also, the results of tests of hypotheses could be compared with the actual truth or falsity of the hypotheses in question. Furthermore, the procedure used to generate the data enabled an experiment with a fixed set of underlying parameters {\xi, \Sigma} to be repeated to give additional data sets and thus provide some indication of the way in which the estimates and test statistics vary from sample to sample. It was also possible to vary \Sigma while keeping the model and the true \xi fixed, to see how different covariance structures affected the estimates and test statistics.
The data for the fourth model were obtained by deleting observations from growth curve (GCM) data given in Allen and Grizzle's paper [2]. Thus although the unknown parameters in this case remain unknown, the BAN estimators and test results obtained could be compared with the results of Allen and Grizzle for the complete (i.e., without missing observations) data.

The numerical calculations for all the examples were made with the assistance of a GE-265 Time-Sharing Call-a-Computer (CAC). It is important to note that the CAC system does not allow calculations to be done in double precision. Thus it is possible that some of the results obtained were not as accurate as could be achieved from a computer such as the 360, which does allow double precision.
5.2  Example 1: An HIM Model 1
The model used for Example 1 is given by
E (Y .)
J
where
A(15
x
2)
= AE; B •
J
=[1
1
1
1
1
1
1 ·1
1
-7 -6 -5 -4 -3 -2 -1
o
1
2
3
4
6
1
1
1
1
1
5
and
The parameters
~
and [
used in the simulation are
lThis model, except for different B. 's, was used by Trawinski
J
in her thesis [63].
127
.e
;(2
3)
x
15
and
2:(3
3) =
x
2
The (15
x
3) data matrices Yl' Y
and Y
3
2
14
6
6
3
-4
0
-4
o
32
were obtained by deleting
the appropriate observations from the complete data matrix
Y* (45 x 3) =
Y*
1
Y*
2
.y -I<
- 3
where y~ = A~ + E. , j=l,2,3
J
and the rows of E.
J
J
are distributed as
Four sample data sets were generated. The estimators of \Sigma as given in Corollary 2.4.7 of Chapter Two are given in Table 5.1. Also given in Table 5.1 are the estimators of \Sigma obtained from the complete data. The BAN estimators \hat{\xi} as given in Corollary 3.5.1.1 of Chapter Three are given in Table 5.2, together with the corresponding estimators for the complete data (which are BLUE as well as BAN). Also provided is D^2 = (\hat{\xi} - \xi)'(\hat{\xi} - \xi) , the squared distance of \hat{\xi} from \xi .

Four tests of hypotheses of the form H'\xi = 0 were performed for each of the four samples using the Wald Statistic described in Chapter Four. The hypotheses tested are as follows:

H 1-1)
HI (2 x 6)
= D'
-e
o -2
0
0
0 --3
0
0
:]
~ C
or
HI
where
C (1 x 2) :: (1, 0)
and DC) • 2) • [:
-2
0
01'
-3
J
TABLE 5.1
ESTll1ATES OF L:
=
~ -6J
32
[1:
-4
FOR
Since
H'~ =
~
(0, -20)'
SM~LES
I-IV OF
0
EXfu~PLE
1
(0, 0)' , the test statistic should reject this
hypothesis.
H 1-2)
or
H'(2 x 6)
H'
=
D'
@
C
o
-2
0
0
o
a
0
-2
129
-
-
§..(IV)
§..(IV)
9.250
9.25
9.885
9.885
2.807
2.807
2.871
2.870
~
§..(II)
10
10.122
10.12
10.080
3
3.000
15
14.928
2
1.997
20
19.404
1
0.736
0.977
1.290
1.105
1.195
1.110
D2
0.47
1.45
0.26
0.02
2.59
0.55
2.999
§..(II)
3.086
14.99
3.086
14.915
1.986
15.09
2.048
20.447
18.84
19.13
~
Since H'§.. = (0, 0) ,
14.78
1.945
14.741
1.934
18.512
14.85
1. 912
19.38
16.688
1.909
18.87
0.899
1l.02
0.780
1.29
TABLE 5.2 2
FOR SAMPLES I-IV OF EXAMPLE 1
C (1 x 2) = (1, 0)
where
14.607
2.036
"1
ESTIMATES OF
I(III)
10.08
-
A
~(III)
~(I)
..
e·
-
§..(I)
and
D(3
[:
2) =
x
-2
a
T
-2
, the test statistic should not rej ect the
hypothesis.
HI (4
H 1-3)
x
6) =
;-
•
or
H' = D' @ C
where
C (2 x 2)
a
-1
0
a
a
0
1
0
-1
a
a
1
a
0
a
-1
a
0
1
0
0
0
-1
= 12 and D(3
x 2) =
-1
G
0
T
-1
estimate for the HIX data and S. denotes the
corresponding estimate for the complete data. Also, D2= (£-f) , (I-I)
2
A
f denotes the
e
1
BAJ.~
-
where i is replaced by i for the complete data.
130
where t.;
t.; = -3
t.;
II = -2
--s
This is equivalent to testing
column of t.;.
H 1-4)
is the s-th
The test stat ist ic should rej ect the hypothesis.
H' (4
x
6) =
3
0
-2
0
0
0
0
2
0
-3
0
0
4
0
0
0
-2
0
0
2
0
0
0
-6
!r
This hypothesis ,matrix cannot be expressed as H' = D' 8 C.
H'i =
Since
(0, 0, 0, 0)', the test statistic should not reject the hypothesis.
The Wald Statistics obtained for each sample and each hypothesis
are presented in Table 5.3.
The values obtained were compared with
percentage points of a chi-square random variable with the appropriate
degrees of freedom and were deemed significant if they exceeded the 95th
percentage point.
Incorrect test results are circled.
For the complete ~
data sets, test statistics were also calculated. 3 ,4
Tables 5.2 and 5.3 show that the estimators \hat{\xi} and test statistics compare well with the true \xi and the truth or falsity of the hypotheses, respectively. Furthermore, these results compare favorably with those for the complete data. Those cases in which the HIM data estimate is better than the complete data estimate may be attributed to sampling variability. The estimators of \Sigma in Table 5.1 for
3This was done using program ~WGLM*, a library CAC program which
could not handle hypothesis H 1-4 since it was not of the form C t.;D=O.
Furthermore, only likelihood ratio statistics were provided for H 1-1
and H 1-2. For H 1-3 both likelihood ratio and T~ statistics were
given. Thus the test results for the complete model could not be compared numerically with those of the HIM model except for H 1-3 , and
they are therefore described only with regard to whether the hypothesis
was rej ected.
4For each of the complete data samples, hypotheses H 1-1 and
H 1-3 were rejected with .01 significance and hypothesis H 1-2 was
not rejected.
~
~
131
e
I
Sample
II
III
IV
D.F.
23.97**
13.11**
2
Hypothesis
li 1-1
21. 59**
33.02**
H 1-2
0.26
0.66
2.94
(i.oiB>
2
H 1-3
359.62**
902.71**
374.57**
754.41**
4
H 1-4
0.93
0.10
..s.72
9.06
4
5
TABLE 5.3
WALD STATISTICS FOR EXAMPLE 1

the HIM data are not very good, but since the estimators for the complete data are not accurate either, it is possible that these results can be attributed to sampling variability and too small a sample size. The results also indicate that the estimators \hat{\xi} are fairly accurate despite small differences in \hat{\Sigma} , although it should be remembered that only four simulations were made. Clearly \hat{\Sigma}(IV) was the worst of the estimators of \Sigma and was considerably poorer than \hat{\Sigma}(I) and \hat{\Sigma}(II). Yet \hat{\xi}(IV) was inaccurate only in its estimate of 16.688 for 20.
5 * denotes .05 significance but not .01 significance. ** denotes .01 significance. A circle denotes a test result that leads to an incorrect conclusion about the given hypothesis. D.F. denotes degrees of freedom.

5.3  Example 2: An MDM Model 6 with p=2

The model used for Example 2 is given by

    E(Y) = [A_1\xi_1 , A_2\xi_2] ,    Var(Y) = I_{30} \otimes \Sigma ,

6 This model was used by Kmenta and Gilbert in their study [33].
where
Al (30
x
3)
=
1
1
1
I
1
3'
3
3
0
0
0
I
1
1
1
I
1
.J
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
9
9
9
1
1
1
1
1
1
1
1
1
the rows of Y(30
and
The parameter
.;')'
i = (.;'
-1 -2
£.1 (3
x
x
x
3)
=
I
1
I
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
10
10
10
8
8
8
0
-0
0
16
16
16
0
0
0
0
0
0
4
4
I
1
1
0
0
0
0
0
0
0
0
0
0
0
0
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
6
1
6
6
3
3
3
1
1
0
0
0_
8
8
8
1
1
1
4
6
6
6
7
7
7
0
0
0
0
0
0
1
1
1
1
1
1
10
10
10
14
14
14
2)
are each normally distributed.
1
•
is given by
1) · UJ
and
5..2 (3
Five sample data sets were simulated.
E
A (30
2
0
0
0
1
=
[:.6 :.6]
=
[: l:J
x
1)
=
[1:]
Samples I-III used
Sample IV used
E
and Sample V used
e
.
133
E
=
20
-18] .
[ -18
29
The estimators \hat{\Sigma} of \Sigma as given in Corollary 2.4.5 of Chapter Two are presented in Table 5.4, and the BAN estimators \hat{\xi} of \xi as given by (3.4.8) of Chapter Three are presented in Table 5.5. Four tests of hypotheses of the form H'\xi = 0 were performed for each of the five samples using the Wald Statistic described in Chapter Four. The hypotheses tested are as follows:
H' (3
HI-I)
6) =
x
.
.."
-1
0
0
-1
0
0
0
1
0
-0 . -1
0
0
0
1
0
This hypothesis is equivalent to testing
e·
0
-1
~l = ~2
The test statistic
should rej ect the hypothesis.
Sample
I
II
III
IV
V
E
:.6]
[ :.6
[
[
[
[ 0.708
0.474
1
0.6
1
0.6
:.6J
[ 0.853
:.6J
[
5
5
5
10
[20
-18
-18
29
0.851 _
0.368 J
0.368
0.599
1.105
0.370 J
0.370
0.873
3.434
4.982
J
4.982
10.827
1
J
[21.834
I
ESTD~TES
0.474 J
-21.799
TABLE 5.4
OF r FOR ~~LE 2
J
-21. 799 J
29.107
134
"
S-
i(I)
i(II)
s..(UI)
i(IV)
s..(V)
10
10.528
10.328
10.053
10.466
8.371
2
1. 956
1.944
1. 930
1.885
1.909
-5
-5.276
-5.164
-4.794
-4.637
-3.847
-10
-10.298
-9.701
-10.393
-9.430
-10.133
6
6.036
5.996
5.995"
5.993
6.062
3
3.360
2.721
3.444
2.399
5.108
0.59
0.32
0.37
0.94
4.03
n2
BA.J.~
H 2-2)
H' (2
x
TABLE 5.5
ESTIMATES OF s.. FOR EXAMPLE 2
6) =
[:
Since H's.. = (0, 0)
I
0
0
1
0
3
0
0
-1
:]
e
for this hypothesis, the test statistic should
not rej ect the hypothesis.
H 2-3)
H' (2
x
6) =
[:
Here again H't; = (0, 0)
r
2
5
0
0
0
-:]
1
°
°
so the test statistic should not
rej ect this
hypothesis.
H 2-4)
H' (2
x
6)
= [:
Since for this hypothesis H's..
5
2
o
0
o
o
°
1. 2
°1 .
-2
(0, 1.2)' # (0, 0)' , the test statistic
should reject the hypothesis.
The Wald Statistics obtained for all samples and for all hypo theses are presented in Table 5.6.
Examination of Table 5.5 shows that i
was very good for
e
135
-
Sample
I
II
IV
III
V
D.F.
507.68**
3
Hypothesis
H 2-1
8433.98**
6672.53**
4429.45**
2114.31**
H 2-2
1.63
2.35
2.77
2.63
1.64
2
H 2-3
2.63
2.64
1. 75
0.48
2.62
2
H 2-4
~
13.35**
~
@
@
2
TABLE 5.6
WALD STATISTICS FOR EXAMPLE 2
Samples I-III, a little worse for Sample IV (where |\Sigma| = 25), and not very good for Sample V (where |\Sigma| = 256). These results may indicate that the estimates \hat{\xi} will decrease in accuracy as |\Sigma| increases.

The results given in Table 5.6 provide good test performance except for the fourth hypothesis, for which four of the five samples did not give significance. It is not obvious why the results for the fourth hypothesis were poor since, when comparing H'\hat{\xi} for this hypothesis in all five samples, H'\hat{\xi}(II) does not have greatest distance from the origin. In fact H'\hat{\xi}(IV) and H'\hat{\xi}(V) have larger distances from the origin, and yet the test statistic was not significant for these samples. The estimate of the information matrix clearly plays an important role here also.
5.4  Example 3: An MDM Model 7 with p=4

The model used for Example 3 is defined by

    E(Y) = [A_1\xi_1 , A_2\xi_2 , A_3\xi_3 , A_4\xi_4] ,    Var(Y) = I_{20} \otimes \Sigma ,

7 This model was also used by Kmenta and Gilbert [33].
e
where
Al (20
A (20
2
x
x
3)
3)
-n
1
1
0
1
1
0
1
=G
1 1 1
0 10 10
1 0 0
1
8
0
1
8
0
1
2
0
1
0
1
1 1 1
0 18 18
1 1 1
A (20
3
x
3) = A1
A (20
4
x
3)
. [~
1
3
1
1
6
1
1
3
1
6
1
1
0
1
the rows qf 1(20
and
1
1
9
9
1
1
1
1
0
1
0
0
1
1
6
1
1
6
1
1
1
3
3
0
1
1
16
0 0
1
0
1
1
0
1
1
1 1 1
2 12 12
0 1 1
o 16
0
4)
x
1
1
0
The parameter; = ( ; ' ; ' ; ' ; ' ) '
-1-2-324
2
0
1
6
0
1
6
0
1
~.
0
1
8
1
1
8
1
1
0
1
1
0
1
1
4
1
1 1 1 1
4 10 10 14 14
1 1 1 1 1
1
6
0
1 1 .1 1 1
6 16 16 12 12
0 1 1 0 0
7
0
~'
1
7
0
~l'
are each normally distributed.
used in the simulation is given
by
~l (3 x
1) =
~3(3 x
10
1) =
and ; (3
-2
x
1) = ; (3
24
2
x
1) •
-1:]
-5
Five data sets were simulated.
E =
•
Samples I-III used
1.
0.64
0.64
0.64
0.64
1.
0.64
0.64
0.64
0.64
1.
0.64
0.64
0.64
0.64
1.
and Samples IV and V used
L: =
3
3
-4
3
3
6
-3
6
-4
-3
14
-14
3
6
-14
21
The estimators \hat{\Sigma} for Samples I-III are presented in Table 5.7 and the estimators \hat{\xi} in Table 5.8. Nine tests of hypotheses of the form H'\xi = 0 were performed for each sample (including Samples IV and V, which will be dealt with later) using the Wald Statistic. The hypotheses tested are as follows:

H 3-1)    H'(3 x 12) =
[1 0 0 0 0 0-1 0 0 ° 0:0]
°1
0
0
°
0
0 -1
0
a a
1
a
0
0
a a
-1
a a
a 0
138
A
A
£
f.(I)
f.(II)
f.(III)
10
9.692
9.823
9.615
2
2.160
2.000
1. 967
-5
-5.723
-4.784
-4.346
-10
-10.700
-9.695
-9.907
6
6.028
5.975
6.016
3
3.367
2.509
2.669
10
10.325
9.951
10.152
2
1. 989
1. 948
1.944
-5
. -5.863
;"4.612
-4.563
-10
-9.800
-10.762
-9.912
6
6.040
5.992
5.963
3
1.865
4.253
3.479
e
.~
2
D
BAN
3.02
ESTI~~TES
OF
2.92
~
1.22
TABLE 5.8
FOR SM1PLES I-III OF EXAMPLE 3
This hypothesis is equivalent to testing
= ~3'
~l
The test statistic
should not reject the hypothesis.
H 3-2)
HI (3 x 12)
=
r:
0
0
1
0
0
0
0
0 -1
0
0
0
1
0
0
0
_0
0
0
0
0
1
0
0
This is equivalent to testing
~2
rej ect the hypothesis.
=
I;
~
0
0
0
0 -1
0
0
0
0 -1
The test statistic should not
e-
139
e
1
0
0
0
0
o -1
0
1'0
0
0
0
0
1
0
0
0
0
0
0
0
0
This is equivalent to testing
1..1
H 3-3)
H'(6 x 12) =
=
0
0
0
0
0
0
o -1
0
0
0
0
0
0
0
0 -1
0
0
0
1
0
0
0
0
0 -1
0
0
0
0
1
0
0
0, 0
0 -1
0
0
0
O. 1
0
0
and
~
-3
1..2
=
0
0
0 -1
The test statis-
~
i4
tic should not rej ect the hypothesis.
H 3-4)
H' (9 x 12) =
••
1
0
0
0
0
0 -1
0
1
0 .0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0 -1
0
0
0
0
0
0
0
0 -1
0
0
0
1
0
0
0
0
0 -1
0
0
0
0
1
0
0
0
0
0 -1
0
0
0
0
0
1
0
0
0
0
0 -1
1
0
0 -1
0
0
0
0
0
0
0
0
0
1
0
o -1
0
0
0
0
0
0
0
0
0
1
0
0 -1
0
0
0
0
0
0
~
= ~ = ~
This is equivalent to testing II = -2
-3
i4
.
The test statistic
should rej ect the hypothesis.
H 3-5)
H' (3 x 12) =
1
0
0
0
0
0 -1
0
0
0
0
0
0
0
0
1
0
0
0
0
0 -1
0
0
0
0
0
0
0
1
0
0
0
0
0 -1
Here H'I • (0, 0, 0)' so that the test statistic should not reject the
hypothesis.
H 3-6)
H' (2 x 12)
[:
e
Here H'S. • (-1,0) ,
hypothesis.
1
0
0
0 -1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
-:]
so that the test statistic should reject the
140
H ' (2
H 3-7)
Here H'S.
=
x
12)
=
C
0
0
0
0
1
0
0
0 -1
0 -1
0
0
0
0
0
0
0
0
0
:]'
(0, -1)' so that the test statistic should rej ect the
hypothesis.
H' (2 x 12)
H 3-8)
=
[:
Since H's.
=
H 3-9)
(0, 0) "
H' (4
x
H'~
=
0
0
0
0 -1
0
0
0
0
0
0
0
0
1
0
0
0
0
0
-:]
the test statistic should not rej ect the hypothesis.
12)
=
J
Here
0
(0, 0, 0, 2)'
1
0
0
0
0
0 -1
0
1
0
0
0
0
0
1
0
0
1
0
0
0
0
0
0
0
0
0 -1
0
0
0
0
0
0
0
0 -1
0
0
0
0
0
0
0
0
0
0
0
so that the test statistic should reject the
hypothesis.
The Wald Statistics obtained for Samples I-III are presented in Table 5.9. The estimators of \xi given in Table 5.8 are fairly accurate, with the noticeable exception of the estimate of the last parameter for \hat{\xi}(I) and \hat{\xi}(II). \hat{\xi}(III) seems to be superior to the other two estimates. The accuracy of the test results of Table 5.9 appears to be correlated with the squared distance of \hat{\xi} from \xi given in Table 5.8, since the number of circles in the latter table decreases as the distance increases. The results for Sample I were particularly poor, as four of the nine test statistics gave wrong answers.

Estimators of \Sigma for Samples IV and V are presented in Table 5.10 and estimators of \xi are presented in Table 5.11. Upon taking determinants it was shown that the estimators of \Sigma based on Corollary 2.4.5
TABLE 5.9
WALD STATISTICS FOR SAMPLES I-III OF EXAMPLE 3

of Chapter Two were not positive definite (since the determinants were negative). Although this does not seem to noticeably affect the accuracy of the BAN estimator \hat{\xi} , it is clear that lack of positive definiteness could (and did) result in negative Wald Statistics. Thus for the purpose of testing hypotheses it is necessary to adjust the estimator \hat{\Sigma} to be positive definite. Noting that the estimator is given by

    \hat{\sigma}_{rs} = \frac{1}{\mu_{rs}}\; y_s'[I_n - A_s(A_s'A_s)^{-1}A_s'][I_n - A_r(A_r'A_r)^{-1}A_r']\; y_r ,    r, s = 1, ..., p,

where

    \mu_{rs} = tr\{[I_n - A_s(A_s'A_s)^{-1}A_s'][I_n - A_r(A_r'A_r)^{-1}A_r']\}

and y_s is the s-th column of the data matrix Y, it is clear that replacing \mu_{rs} for each (r,s) pair by some constant would yield an
Sample IV
Sample V
3
3
-4
3
3
3
-4
3
3
6
-3
6
3
6
-3
6
-4
-3
14
-14
-4
-3
14
-14
3
6
-14
21
3
6
-14
21
Irl
36
=
3.030
1.651
-3.190
0.256
1.372
1.735
-3~142
3.175
1.651
4.806
1.548
1.331
1.735
4.733
-2.214
6.372
-3.190
1.548
13.054
-7.789
-3.142
-2.214
0.256
1.331
-7.789
10.549
3.175
III = -25.06
16.088 -15.327
6.372 -15.327
lEI
21.727
-1.25
=
2.576
1.285
-2.712
0.216
1.166
1.350
-2.671
2.677
1.285
4.085
1.204
1.036
1.350
4.023
-1.723
4.962
-2.712
1.204
11.096
-6.576
-2.671
-1.723
0.216
1.036
-6.576
8.967
2.677
r*
4.962 -12.941
I[I
1[1 = 50.23
=
18.468
0.54
1.372
1.589
-3.142
3.150
1.589
A.733
~2.027
5.838
-3.142
-2.027
3.150
16.088 -15.225
5.838 -15.225
Ir*l=
ESTI}~TES
13.675 -12.941
21.727
0.46
TABLE 5.108
OF r FOR SAMPLES IV AND V OF EXAMPLE 3
8 In this table, : is the estimator obtained from Corollary 2.4.5,
, and [*
rs
is the adjusted estimator which uses n - 3 = 17 instead of u
rs
i: is the adjusted estimator which uses n • 20 instead of u
~
~
143
e<
-
s.(IV)
S.
s.(IV)
-
s.(V)
~*(V)
s.(V)
10
10.849
10.876
10.191
10.179
10.179
,
2
1. 954
1.943
2.083
2.082
2.082
-5
-5.092
-5.042
-5.342
-5.310
-5.310
-10
-9.872
-9.729
-9.641
-9.585
-9.585
6
6.040
6.039
6.026
6.030
6.030
3
3.296
3.060
2.634
2.505
2.505
10
10.048
10.030
8.748
8.978
8.978
2
2.091
2.092
2.010
1. 952
1. 952
-5
-6.032
~6.010
-3.787
-3.733
-3.733
-10
-10.661
-10.593
-8.314
-8.538
-8.538
6
5.938
5.924
5.905
5.934
5.934
3
4.549
4.638
1. 784
1.762
1.762
2
D .
4.51
4.86
7.93
6.93
6.93
BAN ESTIHATES OF
estimate of
~
S.
9
TABLE 5.11
FOR SAMPLES IV AND V OF EXAMPLE 3
which was positive definite (since the estimate could
then be written as XX'
for some matrix X).
To keep the adjusted esti-
mate consistent (although it would clearly be biased) the obvious
replacement for
would be n-£
~rs
would be n (in this case 20).
where £
Another possibility
was some positive constant between 0
and n .
Table 5.10 contains for Samples IV and V adjusted estimators of
n instead of
~
rs
~
using
and for Sample Valone an adjusted estimator using
9~(IV)and i(v) are the BAl~ estimates using ~(IV) and ~(V), _
respectively. ~(IV) and ~(V) are the BAN estimates using Z(IV) and Z(V),
respectively. ~*(V) is the B~~ estimate using Z*(V).
144
n-3.
(The value t
=3
was chosen because
,
corresponding adjusted BAN estimators
~
~
and
= 17
ss
~*
for each s.)
The
were also calculated
and presented in Table 5.11 with no noticeable change in accuracy from
the original estimators
~.
It should be noted here that neither Corollary 2.4.5 for the MDM model nor its parent Theorem 2.4.1 for the MGLM model claims that the estimator \hat{\Sigma} will be positive definite. They do claim instead that this estimator will be positive definite in the limit (since \Sigma is assumed positive definite); that is, as the sample size n is increased, the estimator will eventually (in probability) be positive definite. Since the n chosen for Example 3 is 20, which is relatively small, it should not be surprising that positive definiteness was not obtained. Nevertheless, the problem remains in general to determine a method of adjustment of \hat{\Sigma} whenever it does not turn out positive definite. Probably the most practical procedure is the use of scoring as described in Section 3.5.2 of Chapter Three to calculate successive estimators of \Sigma until a positive definite \hat{\Sigma} is obtained. Another possibility is the use of Hocking and Smith's procedure [28], although it remains to be shown that their estimator of \Sigma is positive definite.
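One simple way to carry out the adjustment described above can be sketched as follows (a sketch under my own naming conventions; the functions take the response columns y_s and design matrices A_s of the MDM model). The eigenvalues of \hat{\Sigma} are checked and, if any are non-positive, every \hat{\sigma}_{rs} is recomputed with the common divisor n, which makes the estimate expressible as X X' and hence positive semidefinite:

```python
import numpy as np

def adjusted_sigma(A_list, y_list):
    """Residual covariance with common divisor n; positive semidefinite by construction."""
    n = y_list[0].shape[0]
    resid = np.column_stack([
        (np.eye(n) - A @ np.linalg.pinv(A)) @ y for A, y in zip(A_list, y_list)
    ])                                   # n x p matrix of OLS residuals
    return resid.T @ resid / n           # equals X X' / n with X = resid'

def is_positive_definite(S, eps=1e-10):
    return bool(np.min(np.linalg.eigvalsh(S)) > eps)
```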
Wald Statistics were calculated for Samples IV and V and
hypotheses H 3-1 through H 3-9. These are presented in Table 5.12 for both the original and the first adjusted estimators.10 Negative Wald Statistics were circled in addition to those leading to incorrect test conclusions.

10 The second adjusted estimator was not used since \hat{\Sigma}(V) and \hat{\Sigma}^*(V) were equal.
145
-
----
IV
(original)
IV
(adjusted)
V
(original)
V
(adjusted)
D.F .
H 3-1
0.77
0.85
1.47
1. 72
3
H 3-2
1. 28
3.63
0.80
3
3-3
3.16
5.77
5.54
6
7040.84**
9
0.58
3
C0
2
Sample
.
Hypothesis
H
H 3-4
H
3-5
H
3-6
H
3-7
H
3-8
H
3-9
~
-6.22
7158.68** €08-30.!)
1. 59
1. 61
1. 93
GT!0
9.15*
E0
@
1. 65
G82.7}) .
0.78
1. 66
~:rt
1387.
3l~':* E30~
~
2
0.53
2
2888.85**
4
TABLE 5.12
WALD STATISTICS FOR SAMPLES IV AND V OF EXM1PLE 3
5.5  Example 4: A GGCM Model

The model used in Example 4 was obtained by deleting some data from a growth curve (GCM) experiment described by Allen and Grizzle [2]. Their model is given by

    E(Y) = A \xi P ,    Var(Y) = I_{36} \otimes \Sigma ,

where

    A (36 x 4) =
!:l L:
.9.9
.9.9
.Q.g
.9.10
1 10
.9.10
.9.10
.9.8
.9.8
19
~
.9.9
.9.9
.9.9
19
!19
146
P(4 x 7) =
1
1
1
1
1
1
1
-3
-2
-1
0
1
2
3
5
0
-3
-4
-3
0
5
1
1
1
0
-1
-1
-1
1-
4)
E; (4
x
'[.(7
x 7)
e
"
is a matrix of unknown treatment parameters
,
and
'.
is the unknown covariance matrix.
The GGCM model formed from this data was obtained by deleting
for a given individual the measurements on either the last three variates» the last variate or no variates.
The model thus had a hierarchal
pattern of missing observations and was given by
E(Y. ) = A. E;PB.
J
J
J
var(Y ) = I
3 B ~ L:B . » j=1»2»3
j
n.
J J
e
J
where
n
1
= 14» n = 12, n = 10
2
3
B (7 x 7) = 1
1
7
Y (14
1
x 7)
=
Al (14 x 4) =
B (7 x 6) =
2
[~]
4.0
4.2
3.7
3.4
3.0
3.8
4.2
4.1
3.1
3.5
3.4
4.0
4.1
3.5
4.0
4.4
3.9
3.4
3.2
3.9
4.0
4.2
3.3
3.9
3.4
4.6
4.1
3.6
4.1
4.6
3.9
3.5
3.0
4.0
4.2
4.3
3.2
5.8
3.5
4.8
3.7
3.6
3.6
4.9
4.8
3.1
3.0
3.9
4.1
4.3
3.1
5.4
3.3
4.9
4.0
4.2
1.3
.9.3
.9.3
Q.S
.!.5
Q.5
-3
Q.5
Q.2
.9.2
2:.2
.9.2
Q.2
Q.2
o -,
I
Q.2
.!.2J
B (7
3
3.6
5.3
5.2
3.1
3.1
3.5
4.2
4.2
3.2
4.9
3.4
5.4
4.1
4.8
3.8
5.6
5.4
3.7
3.2
3.5
4.0
4.0
3.1
5.3
3.2
5.6
4.6
4.9
x
4)
[::,4]
3.1
4.9
4.2
3.3
3.1
3.4
4.0
4.2
3.1
5.6
3.4
4.8
4.7
5.0
e
147
.e
Y2 (12 x 6) =
A (12 x 4) =
2
4.3 4.7
4.4 .5.3
4.6 4.4
3.1 3.2
3.3 3.3
4.4 4.3
3.3 3.8
3.4 3.5
4.0 4.4
3.5 3.5
3.2 3.6
4.5
3.9
-
4.7
5.6
4.6
3.0
3.4
4.5
3.8
4.6
4.2
3.2
3. T
4.7
4.8
5.9
5.4
3.3
3.6
4.3
4.4
4.9
4.6
3.0
3.7
3.9
1:.3
Q3
Q3
Q3
Q3
l3
Q3
Q3
Q3
Q3
l3
Q3
Q3
Q3
Q3
1:.3
Y (10 x 4) =
3
4.3
5.2
5.2
3.2
3.1
3.7
5.7
5.0
4.3
4.3
4.5
5.3
5.6
3 •..3
3.0
3.7
4.9
5.4
3.9
3.6
5.8
4.2
5.4
3.1
3.0
3.6
4.0
4.4
3.4
3.8
5.4
4.1
4.7
3.1
3.0
3.7
4.0
3.9
3.5
3.7
A (10 x 4) =
3
1:.3
Q3
Q3
Q3
Q2
1:.2
Q2
Q2
2.3
Q3
1
-3
Q3
Q2
Q2
Q2
1:.2
e
5.0
5.9
5.9
3.0
3.1
4.4
4.2
5.2
4.8
3.0
4.2·
3.8
5.2
5.3
5.6
3.0
3.1
4.4
4.0
4.4
5.4
3.2
4.4
3.7
As defined in (3.6.1) of Chapter Three, a consistent unbiased estimator of \Sigma is given by \hat{\Sigma} = ((\hat{\sigma}_{rs})) , where

    \hat{\sigma}_{rs} = \frac{1}{N_{rs} - R(D_{rs})}\; x_{rs}'\,[I - D_{rs}(D_{rs}'D_{rs})^{-1}D_{rs}']\; x_{sr} ,    r, s = 1, ..., 7.
In this case
«N
)) =
rs
DIs = Al
D =
rs
-14
14
14
14
14
14
14
14
26
26
26
26
26
26
14
26
26
26
26
26
26
14
26
26
36
36
36
36
14
26
26
36
36
36
36
14
26
26
36.
36
36
14
26
26
36
36
36
.,
36
36
for s=l, •.• , 7
[::]
for r=2,3
and s=r, .•• ,7
}
and
D =
rs
Al
for r=4,5,6,7
and s=r, ..• ,7.
A
2
e
A
3
Also, of course,
Drs. = D
.
sr
for all r,s.
The estimator L calculated from the GGCM data is given by
L=
0.1577
0.1545
0.2130
0.1993
0.1735
0.1698
0.1892
0.1545
0.1785
0.2078
0.2108
0.1966
0.2187
0.2300
0.2130
0.2078
0.4162
0.3674
0.2778
0.2983
0.3416
0.1993
0.2108
0.3674
0.4407
0.3690
0.2871
0.2583
0.1735
0.1966
0.2778
0.3690
0.4337
0.3734
0.3178
0.1698
0.2187
0.2983
0.2871
0.3734
0.5235
0.4607
0.1892
0.2300
0.3416
0.2583
0.3178
0.4607
0.5131
This compares fairly well with the estimator obtained by Allen and
Grizzle [2] for the complete data, which is given by
149
-L: =
0.2261
0.1721
0.1724
0.2054
0.1705
0.1958
0.1817
0.1721
0.1696
0.1840
0.1919
0.1628
0.1700
0.1644
0.1724
0.1840
0.3917
0.3473
0.2370
0.1876
0.2194
0.2054
0.1919
0.3473
0.4407
0.3689
0.2870
0.2582
0.1705
0.1628
0.2370
0.3689
0.4337
0.3733
0.3178
0.1958
0.1700
0.1876
0.2870
0.3733
0.5235
0.4606
0.1817
0.1644
0.2194 .0.2582
0.3178
0.4606
0.5131
Whereas it is clear that L:
was found that
definite.
lEI
=
must be positive definite by definition, it
-3.35 x 10- 9
which implies that L:
Thus as in the previous example, use of L:
Wald Statistics might
to negative.
values.
.
~e~4
)
-
rison, therefore, it was decided to use L:
BAN estimators of
(3.6.2) of Chapter Three.
e·:
~
is not positive
for calculating
For purposes of compa-
as an adjusted estimator.
were obtained using the formula given in
The estimators obtained using both Σ̂ and Σ̄ are presented in Table 5.13 along with the two estimators obtained by Allen and Grizzle using the complete data.  These estimators are given in the form PP'ξ̂.  Six tests of hypotheses of the form Cξ = 0 (or H'ξ = 0, where H is obtained from C and ξ in the usual way) were performed using the Wald test of hypotheses.  The hypotheses used and the Wald Statistics obtained are described in Table 5.14.  Negative values are circled.  Since Allen and Grizzle did not present any test results in their paper, no comparisons were made.  As described earlier, the negative values in the table are the result of Σ̂'s not being positive definite.
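For orientation, the sketch below shows the generic Wald quadratic form for a hypothesis H'ξ = 0; the function and variable names, the use of the Moore–Penrose inverse, and the assumption that an estimated covariance matrix V̂ of the estimate is supplied are mine rather than the thesis's (the computational formula actually used is the one developed in Chapter Four).  When V̂ is built from a Σ̂ that is not positive definite, the statistic can come out negative, which is the behaviour circled in Table 5.14.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(xi_hat, V_hat, H):
    """Generic Wald criterion for testing H' xi = 0.

    xi_hat : estimate of the (strung-out) parameter vector
    V_hat  : estimated covariance matrix of xi_hat
    H      : matrix whose columns define the linear functions being tested
    """
    h = H.T @ xi_hat
    W = float(h @ np.linalg.pinv(H.T @ V_hat @ H) @ h)   # can be negative if V_hat is indefinite
    df = np.linalg.matrix_rank(H)
    return W, df, chi2.sf(W, df)                         # statistic, d.f., asymptotic p-value
```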
TABLE 5.13

ESTIMATES OF PP'ξ FOR EXAMPLE 4

[Entries not reproduced here.]


TABLE 5.14

TESTS OF HYPOTHESES USING WALD STATISTICS FOR EXAMPLE 4

Hypothesis                                        Wald Statistic   Wald Statistic
Number      Description 1/    C Matrix            Using Σ̂          Using Σ̄          D.F.

H 4-1       ξ.1 = ξ.4         (1  0  0 -1)           36.94**          21.96**         4
H 4-2       ξ.1 = ξ.3         (1  0 -1  0)           23.19**          14.50**         4
H 4-3       ξ.1 = ξ.2         (1 -1  0  0)           42.62**          33.26**         4
H 4-4       ξ.2 = ξ.3         (0  1 -1  0)          (negative)        17.36**         4
H 4-5       ξ.2 = ξ.4         (0  1  0 -1)          (negative)         5.27           4
H 4-6       ξ.3 = ξ.4         (0  0  1 -1)          (negative)         5.65           4

1/ As usual, ξ.s is the s-th column of ξ.


5.6  Conclusions

It is not possible, because of the limited scope of this computer study, to make any definite statements with regard to the general reliability of the BAN estimators and the Wald test of linear hypotheses.  Nevertheless, for all four examples the BAN estimators of ξ appeared to be fairly accurate, and when the estimator of Σ was positive definite the Wald test also seemed on the whole reliable.  If the BAN estimator of H'ξ was poor, however, the test results were not likely to lead to correct conclusions, although this does not completely explain the reason for bad results (as was seen with hypothesis H 2-4 of Example 2).  If the estimator of Σ was not positive definite, the BAN estimators of ξ still appeared accurate, but tests based on the Wald Statistics were obviously unreliable.  It is clearly important, therefore, to plan one's experiment with a large enough n to insure that Σ̂ is positive definite, but until further studies are made it is not clear what n to pick.  Thus it is also important to determine a method for adjusting Σ̂ to make it positive definite if in fact it is not.  This can be easily done for the MDM model.  In general, the most practical procedure would probably be the use of the method of scoring until a positive definite estimator is obtained.
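One crude way to carry out such an adjustment, shown below purely as an illustration, is to clip the non-positive eigenvalues of the symmetric estimate to a small positive floor.  This is not the method of scoring referred to above, and the floor value is an arbitrary choice.

```python
import numpy as np

def clip_to_positive_definite(S, floor=1e-6):
    """Replace non-positive eigenvalues of a symmetric estimate by a small positive floor."""
    S = (np.asarray(S, dtype=float) + np.asarray(S, dtype=float).T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.maximum(vals, floor)) @ vecs.T

# Example: an indefinite estimate becomes positive definite after clipping.
S = np.array([[1.0, 1.2], [1.2, 1.0]])
print(np.linalg.eigvalsh(clip_to_positive_definite(S)))   # both eigenvalues now positive
```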
CHAPTER VI
RECOMMENDATIONS FOR FURTHER RESEARCH
From the discussion in the preceding chapters, it is evident that a number of important problems requiring both theoretical and computational research still remain to be studied.
In Chapter Two we have obtained for the MGLM model necessary and sufficient conditions for the unknown variance matrix Σ to drop out of the BLUE estimate H'(D'Ω⁻¹D)⁻D'Ω⁻¹x for H'ξ.  Furthermore, it is easily seen that when the conditions hold, H'(D'D)⁻D'x is the MLE of H'ξ, since the likelihood equations for ξ will then have a solution independent of the MLE for Σ.  These conditions are restrictions on the design matrices that can be used in the experiment.  A study which describes more precisely the type of designs that can be used will be extremely helpful to the experimenter in planning an experiment to which the MGLM model applies.
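For concreteness, a direct computation of such an estimate (with Ω treated as known, x the strung-out observation vector, and the Moore–Penrose pseudoinverse standing in for the generalized inverse) might look as follows; the function and variable names are mine.

```python
import numpy as np

def blue_linear_set(D, Omega, x, H):
    """Compute H' xi_hat with xi_hat = (D' Omega^{-1} D)^- D' Omega^{-1} x."""
    Omega_inv = np.linalg.inv(Omega)
    xi_hat = np.linalg.pinv(D.T @ Omega_inv @ D) @ D.T @ Omega_inv @ x
    return H.T @ xi_hat
```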
In Chapter Three we have obtained for the MGLM model a BAN estimator H'ξ̂ for H'ξ, where ξ̂ = (D'Ω̄⁻¹D)⁻D'Ω̄⁻¹x and Ω̄ is an estimate of Ω using a consistent unbiased estimate Σ̂ of Σ.  We have also described an IZEF-type iterative procedure for obtaining additional BAN estimators.  It is easily seen that at each iteration the likelihood equation for ξ is satisfied by ξ̂.  A theoretical proof that the limiting estimator obtained by the iteration technique is indeed the MLE for H'ξ would corroborate the computational results of Kmenta and Gilbert [33] for the MDM model and would allow for direct computation of the MLE using the IZEF approach.  Also, a study similar to that of Kmenta and Gilbert for the MGLM and GGCM models would be helpful in comparing different BAN estimators with regard to their small sample variances.
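A minimal sketch of an IZEF-type iteration is given below for the simplest complete-data, seemingly-unrelated-regressions setting of the kind studied by Kmenta and Gilbert; it is not the MGLM or GGCM algorithm itself, and the function and variable names are mine.  Each pass re-estimates the coefficients by generalized least squares with the current Σ estimate and then updates Σ from the residuals.

```python
import numpy as np

def izef_iteration(X_list, Y, n_iter=20):
    """Iterated Zellner-type estimation: the r-th column of Y has design matrix
    X_list[r], and the rows of the error matrix have common covariance Sigma."""
    n, u = Y.shape
    q = sum(X.shape[1] for X in X_list)
    D = np.zeros((n * u, q))                    # stacked block-diagonal design matrix
    col = 0
    for r, X in enumerate(X_list):
        D[r * n:(r + 1) * n, col:col + X.shape[1]] = X
        col += X.shape[1]
    y = Y.T.reshape(-1)                         # responses stacked variate by variate
    Sigma = np.eye(u)                           # starting value: ordinary least squares
    for _ in range(n_iter):
        Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
        beta = np.linalg.pinv(D.T @ Omega_inv @ D) @ D.T @ Omega_inv @ y
        resid = (y - D @ beta).reshape(u, n)
        Sigma = resid @ resid.T / n             # updated covariance estimate
    return beta, Sigma
```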
We have described in Chapter Three Hocking and Smith's approach [28] for obtaining asymptotically efficient estimators of ξ and Σ for the GIM model.  At present, because of notational and computational difficulties, use of their results must be restricted to simply structured models and to small u (e.g., u = 2 or 3).  Although this author is somewhat skeptical as to the possibility of writing a general program for large u and arbitrary structure for the GIM model, it is nevertheless recommended here that an attempt be made to explore the possibility of such a program.
In Chapter Four the Wald Statistic has been proposed for testing linear hypotheses of the form H'ξ = 0 for the MGLM and GGCM models.  A computer study similar to that of Schatzoff [50] is recommended for comparing Wald Statistics using different BAN estimators with regard to small sample power considerations.  In particular, a comparison of the proposed estimators for the GCM and GGCM models would be of interest.  In addition, a computer study similar to that of Kmenta and Gilbert [33] would be desirable for determining the rate of convergence to a chi-square variable of any Wald Statistic.
It has been shown in Chapter Four that the Wald Statistic for the MGLM model is a generalization of Hotelling's T₀².  It would be desirable to obtain the exact distribution of the Wald Statistic which uses the BAN estimator or the proposed MLE given in Chapter Three, for we would then have an exact test for the MGLM model.  The exact distribution appears to be quite complex, however.
It can be easily seen that when considering the GCM model, the Wald Statistic determined after covariance adjustment is the same as Khatri's trace described in Chapter One and at the end of Chapter Four.  In particular, the BAN estimator of CξD obtained from the conditional model is independent of the choice of error matrix F, and consequently its conditional and unconditional distributions are the same.  If it can be shown for the GGCM model that the conditional BAN estimator for H'ξ is independent of the choice of the F_j matrices defined in (4.5.3), then with some difficulty it can be established that the Wald Statistic determined after covariance adjustment is the same as the generalization of Khatri's trace given in (4.5.4).
Another approach to testing hypotheses for the MGLM model might
possibly use the concept of sufficiency.
It is therefore recommended
that such an approach be explored and compared to the Wald Statistic
method.
In Chapter Five it has been shown that the estimators Σ̂ given in Theorem 2.4.1 and Lemma 3.6.1 may not be positive definite.  It is therefore necessary to determine an alternative estimator to Σ̂ which will always be positive definite and consistent for Σ.  The most practical approach here appears to be the method of scoring as described in Section 3.5.2 of Chapter Three, but this needs to be demonstrated.  Another possibility is the use of the method proposed by Hocking and Smith [28].
REFERENCES

[1] ALLEN, D. M. (1967). Multivariate analysis of nonlinear models. Institute of Statistics Mimeo Series No. 558. University of North Carolina, Chapel Hill, N. C.

[2] ALLEN, D. M. and GRIZZLE, J. E. (1968). Analysis of growth and dose response curves. Institute of Statistics Mimeo Series No. 576. University of North Carolina, Chapel Hill, N. C.

[3] AFIFI, A. and ELASHOFF, R. M. (1966). Missing observations in multivariate statistics I. Review of the literature. J. Amer. Statist. Assoc. 61 595-604.

[4] AFIFI, A. and ELASHOFF, R. M. (1967). Missing observations in multivariate statistics II. Point estimation in simple linear regression. J. Amer. Statist. Assoc. 62 10-29.

[5] AFIFI, A. and ELASHOFF, R. M. (1969). Missing observations in multivariate statistics III. Large sample analysis of a simple linear regression. J. Amer. Statist. Assoc. 64 337-358.

[6] AFIFI, A. and ELASHOFF, R. M. (1969). Missing observations in multivariate statistics IV. A note on a simple linear regression. J. Amer. Statist. Assoc. 64 359-365.

[7] ANDERSON, T. W. (1957). Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. J. Amer. Statist. Assoc. 52 200-203.

[8] ANDERSON, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley and Sons, New York.

[9] BAHADUR, R. R. (1967). Rates of convergence of estimates and test statistics. Ann. Math. Statist. 38 303-324.

[10] BARTEN, A. P. and KOERTS, J. (1962). Transition frequencies in voting behavior. Unpublished paper.

[11] BARTLETT, M. S. (1937). Some examples of statistical methods of research in agriculture. J. Roy. Statist. Soc. Supplement 4 137-183.

[12] BHAPKAR, V. P. (1966). A note on the equivalence of two test criteria for hypotheses in categorical data. J. Amer. Statist. Assoc. 61 228-235.

[13] BHARGAVA, R. (1962). Multivariate tests of hypotheses with incomplete data. Applied Mathematics and Statistical Laboratories, Technical Report 3.

[14] BOX, G. E. P. (1949). A general distribution theory for a class of likelihood criteria. Biometrika 36 317-346.

[15] BOX, G. E. P. (1950). Problems in the analysis of growth and wear curves. Biometrics 6 362-389.

[16] BUCK, S. F. (1960). A method of estimation of missing values in multivariate data suitable for use with an electronic computer. J. Roy. Statist. Soc. Ser. B 22 302-307.

[17] CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, N. J.

[18] DAVIS, A. W. (1968). A system of linear differential equations for the distribution of Hotelling's generalized T₀². Ann. Math. Statist. 39 815-832.

[19] DEAR, R. E. (1959). A principal-component missing-data method for multiple regression models. System Development Corporation Report SP-86. Santa Monica, California.

[20] DWYER, P. S. (1967). Some applications of matrix derivatives in multivariate analysis. J. Amer. Statist. Assoc. 62 607-625.

[21] EDGETT, G. L. (1956). Multiple regression with missing observations among the independent variables. J. Amer. Statist. Assoc. 51 122-131.

[22] ELSTON, R. C. and GRIZZLE, J. E. (1962). Estimation of time response curves and their confidence bands. Biometrics 18 148-159.

[23] GLESER, L. and OLKIN, I. (1966). A K sample regression model with covariance. Multivariate Analysis (Edited by P. R. Krishnaiah), Academic Press, New York.

[24] GOURLAY, N. (1955). F-test bias for experimental designs of the latin-square type. Psychometrika 20 273-287.

[25] GRIZZLE, J. E., STARMER, C. F. and KOCH, G. G. (1969). Analyses of categorical data by linear models. Biometrics 25 489-503.

[26] HECK, D. L. (1960). Charts of some upper percentage points of the distribution of the largest characteristic root. Ann. Math. Statist. 31 625-642.

[27] HOADLEY, A. B. (1968). Asymptotic properties of MLE's when the observations are independent but not iid with applications to estimation under variable censoring. Bell Telephone Laboratories Memorandum, Holmdel, N. J. Unpublished.

[28] HOCKING, R. R. and SMITH, W. B. (1968). Estimation of parameters in the multivariate normal distribution with missing observations. J. Amer. Statist. Assoc. 63 159-173.

[29] ITO, K. (1962). A comparison of the powers of two multivariate analysis of variance tests. Biometrika 49 455-462.

[30] KAKWANI, N. C. (1967). The unbiasedness of Zellner's seemingly unrelated regression equations estimators. J. Amer. Statist. Assoc. 62 141-142.

[31] KHATRI, C. G. (1966). A note on a MANOVA model applied to problems in growth curves. Ann. Inst. Statist. Math. 18 75-86.

[32] KLEINBAUM, D. G. (1969). A general method for obtaining test criteria for multivariate linear models with more than one design matrix and/or incomplete in response variates. Institute of Statistics Mimeo Series No. 614. University of North Carolina, Chapel Hill, N. C.

[33] KMENTA, J. and GILBERT, R. F. (1967). Small sample properties of alternative estimators of seemingly unrelated regressions. J. Amer. Statist. Assoc. 62 1180-1200.

[34] LORD, F. M. (1955). Estimation of parameters from incomplete data. J. Amer. Statist. Assoc. 50 870-876.

[35] MORRISON, D. F. (1967). Multivariate Statistical Methods. McGraw-Hill, Inc., New York.

[36] NICHOLSON, G. E. (1957). Estimation of parameters from incomplete multivariate samples. J. Amer. Statist. Assoc. 52 423-426.

[37] PILLAI, K. C. S. (1960). Statistical Tables for Tests of Multivariate Hypotheses. The Statistical Center, University of the Philippines, Manila.

[38] PILLAI, K. C. S. and GUPTA, A. K. (1969). On the exact distribution of Wilks' criterion. Biometrika 56 109-118.

[39] POTTHOFF, R. F. and ROY, S. N. (1964). A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51 313-326.

[40] RAO, C. R. (1959). Some problems involving linear hypotheses in multivariate analysis. Biometrika 46 49-58.

[41] RAO, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley and Sons, New York.

[42] RAO, C. R. (1965). The theory of least squares when the parameters are stochastic and its application to growth curves. Biometrika 52 447-458.

[43] RAO, C. R. (1966). Covariance adjustment and related problems in multivariate analysis. Multivariate Analysis (Edited by P. R. Krishnaiah), Academic Press, New York, 87-103.

[44] RAO, C. R. (1966). Generalized inverses for matrices and its applications in mathematical statistics. Research Papers in Statistics (Edited by F. N. David), Wiley, London, 263-269.

[45] RAO, C. R. (1967). Least squares theory using an estimated dispersion matrix and its application to measurement of signals. Proceedings of the Fifth Berkeley Symposium 1 355-372.

[46] RAO, C. R. (1967). Calculus of generalized inverses of matrices. Part I: General theory. Sankhya Ser. A 29 317-342.

[47] ROY, J. (1958). Step-down procedure in multivariate analysis. Ann. Math. Statist. 29 1177-1187.

[48] ROY, S. N. (1957). Some Aspects of Multivariate Analysis. Wiley and Sons, New York.

[49] ROY, S. N. (1964). A report on some results in simultaneous or joint linear (point) estimation. Institute of Statistics Mimeo Series No. 394. University of North Carolina, Chapel Hill, N. C.

[50] SCHATZOFF, M. (1966). Sensitivity comparisons among tests of the general linear hypothesis. J. Amer. Statist. Assoc. 61 415-435.

[51] SCHATZOFF, M. (1966). Exact distributions of Wilks' likelihood ratio criterion. Biometrika 53 347-351.

[52] SIOTANI, M. (1968). Some methods for asymptotic distributions in the multivariate analysis. Institute of Statistics Mimeo Series No. 595. University of North Carolina, Chapel Hill, N. C.

[53] SMITH, W. B. (1968). Bivariate samples with missing values, II. Answer to Query 29. Technometrics 10 867-868.

[54] SRIVASTAVA, J. N. (1965). A multivariate extension of the Gauss-Markov theorem. Ann. Inst. Statist. Math. 17 63-66.

[55] SRIVASTAVA, J. N. and ROY, S. N. (1965). Hierarchal and p-block multiresponse designs and their analysis. Mahalanobis Dedicatory Volume (Edited by C. R. Rao), Pergamon Press, New York.

[56] SRIVASTAVA, J. N. (1966). Incomplete multiresponse designs. Sankhya Ser. A 28 377-388.

[57] SRIVASTAVA, J. N. (1966). Some generalizations of multivariate analysis of variance. Multivariate Analysis (Edited by P. R. Krishnaiah), Academic Press, New York, 129-145.

[58] SRIVASTAVA, J. N. (1968). On a general class of designs for multiresponse experiments. Ann. Math. Statist. 39 1825-1843.

[59] SRIVASTAVA, J. N. (1968). Some studies of intersection tests in multivariate analysis of variance. Paper presented at the Symposium on Multivariate Analysis at Dayton, Ohio. Unpublished.

[60] SRIVASTAVA, J. N. (1967). On the extension of Gauss-Markov theorem to complex multivariate linear models. Ann. Inst. Statist. Math. 19 417-437.

[61] TELSER, L. G. (1964). Iterative estimation of a set of linear regression equations. J. Amer. Statist. Assoc. 59 842-862.

[62] TOCHER, K. D. (1952). The design and analysis of block experiments. J. Roy. Statist. Soc. Ser. B 14 45-100.

[63] TRAWINSKI, I. M. (1961). Incomplete-variable designs. Unpublished thesis. Virginia Polytechnic Institute, Blacksburg, Virginia.

[64] TRAWINSKI, I. M. and BARGMANN, R. E. (1964). Maximum likelihood estimation with incomplete multivariate data. Ann. Math. Statist. 35 647-658.

[65] WALD, A. (1941). Asymptotically most powerful tests for statistical hypotheses. Ann. Math. Statist. 12 1-19.

[66] WALD, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54 426-482.

[67] WALD, A. (1948). Asymptotic properties of the maximum likelihood estimate of an unknown parameter of a discrete stochastic process. Ann. Math. Statist. 19 40-46.

[68] WALD, A. (1949). A note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 595-601.

[69] WILKS, S. S. (1932). Moments and distributions of estimates of population parameters from fragmentary samples. Ann. Math. Statist. 3 163-195.

[70] WILLIAMS, J. S. (1967). The variance of weighted regression estimators. J. Amer. Statist. Assoc. 62 1290-1301.

[71] YATES, F. (1933). The analysis of replicated experiments when the field results are incomplete. Empire J. Exper. Agric. 1 129-142.

[72] ZELLNER, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57 348-368.

[73] ZELLNER, A. (1963). Estimators for seemingly unrelated regression equations: some exact finite sample results. J. Amer. Statist. Assoc. 58 977-992.

[74] ZELLNER, A. and HUANG, D. S. (1962). Further properties of efficient estimators for seemingly unrelated regression equations. Internat. Econ. Rev. 3 300-313.
APPENDIX
We present below (without proof) a number of important matrix
results used in the main text dealing with generalized inverses, matrix
derivatives and Kronecker products. These topics have been treated by
a number of authors, notably Rao [41], [44], [46], Dwyer [20] and
Anderson [8].
A.1  Generalized Inverses

Definition A.1.1:  A⁻ is a generalized inverse of A if and only if AA⁻A = A.

Theorem A.1.1:  Given A (p x q), Γ (p x p) and C (v x q) such that R(C) = v, C = L'A for some L (p x v), and Γ is symmetric and positive definite, then

(i)   A(A'A)⁻A' and I_p − A(A'A)⁻A' are uniquely determined, symmetric and idempotent, R(A[A'A]⁻A') = R(A) and R(I_p − A[A'A]⁻A') = p − R(A).

(ii)  (A'ΓA)(A'ΓA)⁻A' = A'.

(iii) C(A'ΓA)⁻C' is unique and positive definite.

(v)   S = Y'[I − A(A'A)⁻A']Y is of full rank with probability one if Y is a matrix random variable whose rows are normally distributed and E(Y) = Aξ for some matrix ξ.
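As a quick numerical illustration of Definition A.1.1 and the first part of Theorem A.1.1 (using the Moore–Penrose pseudoinverse as one particular generalized inverse; the example matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4)) @ np.diag([1.0, 1.0, 1.0, 0.0])   # rank-deficient 6 x 4 matrix
A_g = np.linalg.pinv(A)                                       # one choice of generalized inverse

print(np.allclose(A @ A_g @ A, A))                            # A A^- A = A
P = A @ np.linalg.pinv(A.T @ A) @ A.T                         # A (A'A)^- A'
print(np.allclose(P, P.T), np.allclose(P @ P, P))             # symmetric and idempotent
print(np.linalg.matrix_rank(P) == np.linalg.matrix_rank(A))   # R(A(A'A)^- A') = R(A)
```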
A.2  Matrix Differentiation

Definition A.2.1:  Let A (p x q) = ((a_ij)), B (r x s) = ((b_ij)) = [b_1 ... b_s], let α be any scalar function, and let ∂ denote partial differentiation.  Then

    ∂α/∂B (r x s) = ((∂α/∂b_ij)) ,        ∂A/∂α (p x q) = ((∂a_ij/∂α)) ,

and, when r > 1 or s > 1, ∂A/∂B (rq x ps) denotes the partitioned matrix assembled from the derivatives of A with respect to the individual elements of B.
Theorem A.2.1:  The following is a list of matrix differentiation rules:

(i)    ∂(AB)/∂α = (∂A/∂α)B + A(∂B/∂α).

(ii)   ∂B⁻¹/∂μ = −B⁻¹(∂B/∂μ)B⁻¹.

(iii)  ∂(μ'B)/∂μ = B'.

(iv)   ∂/∂μ (z − Aμ)'Γ(z − Aμ) = −2A'Γ(z − Aμ), where Γ is symmetric.

(v)    The (i,j)-th element of ∂(x'Γx)/∂Γ, where Γ is symmetric, is δ_ij(xx')_ij, where δ_ij = 1 if i = j and δ_ij = 2 if i ≠ j.

(vi)   The (i,j)-th element of ∂(x'Γ⁻¹x)/∂Γ, where Γ is symmetric, is −δ_ij(Γ⁻¹xx'Γ⁻¹)_ij.

(vii)  The (i,j)-th element of ∂ ln|U|/∂Γ, where U = M'ΓM is of full rank and Γ is symmetric, is δ_ij(MU⁻¹M')_ij.

(viii) ∂ tr(V⁻¹P'P)/∂μ = −2B'PV⁻¹M', where V = M'ΓM is of full rank, P = X − BμM, and Γ is symmetric.

(ix)   ∂(AμB)/∂μ = B' ⊗ A', where the elements of μ are mutually independent (e.g., μ is not symmetric).

(x)    [A rule for the second derivatives ∂²α/∂μ_i∂μ_j', where (μ_1', ..., μ_p')' is the strung-out parameter vector, applicable whenever α is of the form α = tr Γ⁻¹(BμM)'(BμM) with Γ symmetric and B and M arbitrary; expression not reproduced here.]

(xi)   [A rule for the derivative of tr(U⁻¹P'P), where U = M'ΓM is of full rank, P = X − BμM, and Γ is symmetric; expression not reproduced here.]

(xii)  The (i,j)-th element of ∂ tr(U⁻¹P'P)/∂Γ is −δ_ij(MU⁻¹P'PU⁻¹M')_ij, where Γ, U, M and P are as in (xi) and δ_ij is defined in (v).
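Rules of this kind are easy to spot-check numerically.  The sketch below verifies rule (iv) by central differences on randomly generated A, Γ, z and μ (all of which are arbitrary illustrative values).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 3))
M = rng.normal(size=(5, 5))
Gamma = M @ M.T + np.eye(5)          # symmetric positive definite
z = rng.normal(size=5)
mu = rng.normal(size=3)

def f(m):
    r = z - A @ m
    return r @ Gamma @ r             # (z - A mu)' Gamma (z - A mu)

analytic = -2 * A.T @ Gamma @ (z - A @ mu)                      # rule (iv)
eps = 1e-6
numeric = np.array([(f(mu + eps * e) - f(mu - eps * e)) / (2 * eps) for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-4))                # True
```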
A.3  Kronecker Products

Definition A.3.1:  A ⊗ B = ((a_ij B)) is the Kronecker product of the matrices A = ((a_ij)) and B.

Theorem A.3.1:  The following are some results involving Kronecker products:

(i)   (A ⊗ B) + (A ⊗ C) = A ⊗ (B + C).

(ii)  (A ⊗ B) + (D ⊗ B) = (A + D) ⊗ B.

(iii) (A ⊗ B)(C ⊗ D) = (AC ⊗ BD) if the products AC and BD are defined.

(iv)  (A ⊗ B)⁻ = A⁻ ⊗ B⁻.

(v)   AξB and (B' ⊗ A)ξ̲, where ξ = [ξ_1 ... ξ_c] and ξ̲ = (ξ_1', ..., ξ_c')', consist of the same set of elements.
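Result (v) is the familiar "vec" identity.  The following check (with arbitrary dimensions) confirms that stacking the columns of AξB gives the same vector as applying B' ⊗ A to the stacked columns of ξ.

```python
import numpy as np

rng = np.random.default_rng(4)
A, xi, B = rng.normal(size=(3, 4)), rng.normal(size=(4, 2)), rng.normal(size=(2, 5))

left = (A @ xi @ B).reshape(-1, order="F")                # columns of A xi B, stacked
right = np.kron(B.T, A) @ xi.reshape(-1, order="F")       # (B' ⊗ A) applied to the stacked xi
print(np.allclose(left, right))                           # True: same set of elements
```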