Giesbrecht, Francis and Peter Burrows (1976). "Estimating variance components for the nested model, MINQUE or restricted maximum likelihood."

ESTIMATING VARIANCE COMPONENTS
FOR THE NESTED MODEL, MINQUE
OR RESTRICTED MAXIMUM LIKELIHOOD

by

Francis Giesbrecht and Peter Burrows

Institute of Statistics
Mimeograph Series No. 1075
Raleigh - June 1976
Estimating Variance Components for the
Nested Model, MINQUE or Restricted Maximum Likelihood
by
Francis Giesbrecht and Peter Burrows
ABSTRACT
This paper outlines an efficient method for computing MINQUE or restricted maximum likelihood estimates of variance components for the nested or hierarchical classification model. The method does not require storage of any large matrices and consequently does not impose limits on the size of the data set. Numerical examples indicate that the computations converge very rapidly.
I. Introduction
In a series of papers Rao (1970, 1971a, 1971b, 1972) has developed the
MINQUE (MInimum Norm Quadratic Unbiased Estimation) theory of estimating
variance components.
These papers provide a complete solution to the problem in the sense that if normality holds then the method will give locally best (minimum variance) quadratic unbiased estimates of the variance components. Unfortunately, the form of the estimation equations given is such as to tax even a very large computer if one has a reasonably large data set, since they involve inversion of an n x n matrix, where n is the number of data points. Also, the procedure requires prior information about the relative magnitudes of the variance components. The possibility of an iterative scheme seems reasonable, but places an even larger premium on efficient computations.
Hemmerle and Hartley (1973) develop a method for computing maximum likelihood estimates of the variance components. Corbeil and Searle (1976) modify the Hemmerle and Hartley algorithm to give restricted maximum likelihood estimates by partitioning the likelihood into two parts and then maximizing the part which is free of the fixed effects. However, both algorithms require the repeated inversion of a q x q matrix, where q is the total number of random levels in the model. Typically q will be much smaller than n, though for large data sets even that rapidly becomes unreasonable. Both algorithms are iterative and as such require only reasonable initial starting values for the unknown variance components rather than prior estimates.
The object of this paper is to show that for the hierarchical model the MINQUE formulas can be re-written in a form that requires only that a P x P matrix be inverted, where P is the number of variance components or levels of nesting in the model. It is shown that all necessary totals can be obtained by one pass through the data set. It is also shown that if the MINQUE process is iterated (with some provision to replace negative components with small positive values) and converges, then the final answers will also satisfy the equations obtained by applying the restricted maximum likelihood method.
II. MINQUE With Invariance and Normality
Following Rao (1971a, 1971b) let Y be an n-vector of observations with structure

Y = Xβ + U_1 ξ_1 + U_2 ξ_2 + ... + U_p ξ_p ,   (1)

where X is a known n x s matrix of rank s, β is an s-vector of unknown parameters, U_i is a known n x c_i matrix and ξ_i is a c_i-vector of normal random variables with zero means and dispersion matrix I σ_i^2 of appropriate dimension. It is also assumed that ξ_i and ξ_j are uncorrelated for all i ≠ j. Clearly

E[Y] = Xβ

and

D[Y] = σ_1^2 V_1 + ... + σ_p^2 V_p ,

where V_i = U_i U_i' for i = 1, ..., p.
The quadratic unbiased and invariant (invariant in the sense that the estimates are unchanged if we replace the vector of observations Y by Y + Xb for any s-vector b) estimate of p_1 σ_1^2 + ... + p_p σ_p^2 given by the MINQUE principle is

λ_1 Q_1 + ... + λ_p Q_p ,

where

Q_i = Y'R V_i R Y ,
R = V*^{-1}(I - P) ,
P = X(X'V*^{-1}X)^{-1}X'V*^{-1} ,

and the {λ_i} are suitable (derivable) constants for arbitrary {p_i}. Replacing V* by

V = α_1 V_1 + ... + α_p V_p

amounts to selecting a different norm and disturbs neither the unbiasedness nor invariance. Further, if the {α_i} happen to be proportional to the true but unknown {σ_i^2}, then the estimates obtained will also be the minimum variance quadratic unbiased invariant estimates at that particular point in the parameter space.
If all variance components are to be estimated simultaneously, rather than only a particular linear function, then the {λ_i} need never be computed. The procedure is to compute the {Q_i}, equate them to their expectations and solve the resulting system of equations. This can be written as

Σ_j S_ij σ̂_j^2 = Q_i ,  for i = 1, ..., p ,   (2)

where S_ij = tr(R V_i R V_j).
It seems reasonable that if prior knowledge of the relative magnitudes of the unknown variance components is available then this information should be used to define a set of {α_i} and the matrix V* replaced by V in (2). Since (2) is a linear equation and the right hand sides are quadratic forms, the variance-covariance matrix of the estimated variance components follows immediately (in terms of the true variance components and {α_j}).
Formally this provides a complete solution for the variance component estimation problem. However, the computational aspects of obtaining the elements of (2) directly are formidable. They involve inverting an n x n matrix. Consequently the examination of special cases is in order. Of particular interest here is the nested analysis of variance model, since La Motte (1972) has obtained V^{-1} in a manageable form.
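Before specializing, the direct dense-matrix computation of (2) can be sketched for a small example. The one-way layout, sizes and values below are all invented, and the priors are set to α_1 = α_2 = 1; this is a minimal sketch of the general formulas, not the paper's efficient routine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small two-component layout: group effects (V1 = U1 U1') and error (V2 = I).
groups = np.repeat(np.arange(4), [3, 2, 4, 3])     # unbalanced group sizes, n = 12
n = groups.size
X = np.ones((n, 1))                                 # single fixed effect (overall mean)
U1 = (groups[:, None] == np.arange(4)[None, :]).astype(float)
V_list = [U1 @ U1.T, np.eye(n)]                     # V_i = U_i U_i'
y = 2.0 + U1 @ rng.normal(0.0, 1.0, 4) + rng.normal(0.0, 0.5, n)

def minque(y, X, V_list, alpha):
    # V* = sum alpha_i V_i ; R = V*^{-1}(I - P), P = X(X'V*^{-1}X)^{-1}X'V*^{-1}
    Vstar = sum(a * Vi for a, Vi in zip(alpha, V_list))
    Vinv = np.linalg.inv(Vstar)
    P = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
    R = Vinv @ (np.eye(len(y)) - P)
    # S_ij = tr(R V_i R V_j),  Q_i = y'R V_i R y ;  solve the system (2)
    S = np.array([[np.trace(R @ Vi @ R @ Vj) for Vj in V_list] for Vi in V_list])
    Q = np.array([y @ R @ Vi @ R @ y for Vi in V_list])
    return np.linalg.solve(S, Q)

est = minque(y, X, V_list, alpha=[1.0, 1.0])
print(est)  # unbiased estimates of the two variance components
```

Because RX = 0, the estimates are invariant to shifts Y + Xb, which is easy to verify numerically.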
III. Connection with Restricted Maximum Likelihood
In order to obtain the maximum likelihood estimates of the variance components we proceed in two steps. The first is to assume σ_i^2 = α_i for i = 1, ..., p and estimate β. This yields

β̂ = (X'V^{-1}X)^{-1} X'V^{-1} Y .

The second step is to compute residuals

Z = Y - Xβ̂ = Q_v Y ,

where Q_v = I - X(X'V^{-1}X)^{-1}X'V^{-1}, and apply maximum likelihood to obtain estimates of {σ_i^2}.
Define

Z* = T_v Y ,

where T_v consists of n - s linearly independent rows of Q_v. The likelihood can now be written as

L = K |T_v V T_v'|^{-1/2} exp{ -(1/2) Y'T_v'(T_v V T_v')^{-1} T_v Y } ,

where K is a constant. Treating T_v as a constant (in forming T_v the {σ_i^2} have been replaced by the {α_i}) gives the likelihood equations,

∂ ln L / ∂σ_i^2 = -(1/2) tr[(T_v V T_v')^{-1} T_v V_i T_v'] + (1/2) Y'T_v'(T_v V T_v')^{-1} T_v V_i T_v'(T_v V T_v')^{-1} T_v Y ,

for i = 1, 2, ..., p.
Equating the partial derivatives to zero and rearranging terms leads to

tr[T_v'(T_v V T_v')^{-1} T_v V_i] = Y'T_v'(T_v V T_v')^{-1} T_v V_i T_v'(T_v V T_v')^{-1} T_v Y ,   (3)

for i = 1, 2, ..., p.
Khatri (1966) has shown that for any symmetric positive definite matrix C,

T_v'(T_v C T_v')^{-1} T_v = C^{-1} - C^{-1}X(X'C^{-1}X)^{-1}X'C^{-1} .

Direct substitution into (3) gives

tr(V^{-1} Q_v V_i) = Y'Q_v' V^{-1} V_i V^{-1} Q_v Y ,

for i = 1, 2, ..., p.
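The identity is easy to check numerically. The sketch below uses invented random matrices, building T_v as an orthonormal basis of the complement of the column space of X so that T_v X = 0:

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 8, 2
X = rng.normal(size=(n, s))                  # arbitrary full-rank n x s matrix
A = rng.normal(size=(n, n))
C = A @ A.T + n * np.eye(n)                  # symmetric positive definite C

# T: n - s linearly independent rows with T X = 0
Qfull, _ = np.linalg.qr(X, mode="complete")
T = Qfull[:, s:].T

Cinv = np.linalg.inv(C)
lhs = T.T @ np.linalg.inv(T @ C @ T.T) @ T
rhs = Cinv - Cinv @ X @ np.linalg.inv(X.T @ Cinv @ X) @ X.T @ Cinv
print(np.allclose(lhs, rhs))  # True
```

The identity does not depend on which basis T is chosen, only on T having full rank n - s and annihilating X.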
Also we can replace V by V̂ to obtain

tr(V̂^{-1} Q̂_v V_i) = Y'Q̂_v' V̂^{-1} V_i V̂^{-1} Q̂_v Y ,   (4)

for i = 1, 2, ..., p, where V̂ = σ̂_1^2 V_1 + ... + σ̂_p^2 V_p and Q̂_v is Q_v with V replaced by V̂.
The quadratic forms in (4) are similar to those required by MINQUE, with the only difference being the nature of V or V̂. In MINQUE theory V is a function of {α_i} while in maximum likelihood estimation theory V̂ is a function of {σ̂_i^2}. Also, if the {α_i} are chosen correctly, i.e. agree with {σ̂_i^2}, then the expected values of the quadratic forms obtained under the MINQUE principle can be written as

E[Q_i] = Σ_j tr(R V_i R V_j) σ_j^2 ,  for i = 1, 2, ..., p.
It follows immediately that if one uses an iterative scheme for solving the maximum likelihood equations in which the {α_i} are updated to agree with the {σ̂_i^2} (with some provision to replace negative values with some small positive constants), the solution will agree exactly with that obtained from the MINQUE equations using the same rules for iteration.
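Such an iteration can be sketched with dense matrices on a small invented one-way layout (two components, between-group and error); the paper's nested-model formulas avoid the n x n inverses used here, but a fixed point of this scheme also satisfies the restricted maximum likelihood equations (4).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical unbalanced one-way layout, n = 16 observations in 5 groups.
groups = np.repeat(np.arange(5), [2, 4, 3, 5, 2])
n = groups.size
X = np.ones((n, 1))
U1 = (groups[:, None] == np.arange(5)[None, :]).astype(float)
V_list = [U1 @ U1.T, np.eye(n)]
y = 1.5 + U1 @ rng.normal(0.0, 2.0, 5) + rng.normal(0.0, 1.0, n)

def minque_step(y, X, V_list, alpha):
    V = sum(a * Vi for a, Vi in zip(alpha, V_list))
    Vinv = np.linalg.inv(V)
    P = X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
    R = Vinv @ (np.eye(len(y)) - P)
    S = np.array([[np.trace(R @ Vi @ R @ Vj) for Vj in V_list] for Vi in V_list])
    Q = np.array([y @ R @ Vi @ R @ y for Vi in V_list])
    return np.linalg.solve(S, Q)

# Iterate MINQUE, replacing negative components by a small positive constant.
alpha = np.ones(2)
for _ in range(25):
    est = minque_step(y, X, V_list, alpha)
    new = np.maximum(est, 1e-6)
    if np.allclose(new, alpha, rtol=1e-8, atol=1e-12):
        break
    alpha = new
print(alpha)  # converged estimates of the two variance components
```

On small synthetic data the loop typically settles within a few cycles, in line with the numerical behavior reported later in the paper.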
IV. Estimation in the Nested Model
The development in sections II and III has been in terms of a relatively general linear model. For the remainder of this paper we will specialize to the 3-level nested model with only the one fixed parameter μ. The 3-level nested model was selected because it is sufficiently complex to display the general pattern of terms that arise when more levels are introduced without the notation becoming too complex. The extension to the case where the random effects are nested within levels of a fixed effect is obvious and will not be discussed.
The 3-level nested model is given by

Y_hijk = μ + a_h + b_hi + c_hij + e_hijk   (5)

for

k = 1, 2, ..., n_hij ;  j = 1, 2, ..., m_hi ;  i = 1, 2, ..., m_h ;  h = 1, 2, ..., m ,

and where {Y_hijk} are the observations, μ is an unknown constant, {a_h} are independent N(0, σ_1^2), {b_hi} are independent N(0, σ_2^2), {c_hij} are independent N(0, σ_3^2), {e_hijk} are independent N(0, σ_4^2), and the latter 4 sets are also mutually independent.
Since we will have occasion to refer to the value obtained by summation over certain subscripts, we will use the convention of indicating such a sum by replacing the subscript with the + symbol. For example,

n_hi+ = Σ_j n_hij ,  Y_hij+ = Σ_k Y_hijk ,

etc.
To establish the connection between (1) and (5), note that Y corresponds to the n_+++ elements {Y_hijk} in dictionary order, β to the one element μ, X is a column of 1's, U_1 corresponds to an n_+++ x m matrix of 0's and 1's, U_2 to an n_+++ x m_+ matrix of 0's and 1's, U_3 to an n_+++ x m_++ matrix and U_4 to the n_+++ x n_+++ identity matrix.
The covariance between Y_hijk and Y_h'i'j'k' can be written as

δ_hh' σ_1^2 + δ_hh'δ_ii' σ_2^2 + δ_hh'δ_ii'δ_jj' σ_3^2 + δ_hh'δ_ii'δ_jj'δ_kk' σ_4^2 ,

where

δ_lm = 1 if l = m ,  δ_lm = 0 if l ≠ m .

The corresponding element of V can be written as

δ_hh' α_1 + δ_hh'δ_ii' α_2 + δ_hh'δ_ii'δ_jj' α_3 + δ_hh'δ_ii'δ_jj'δ_kk' α_4 ,

where α_i represents the prior estimate for σ_i^2. Since only the relative magnitudes of the {α_i} are important they can be scaled so that one of them is 1. Also, if one wishes the more primitive form of the MINQUE, Rao (1972), all can be set equal to 1. The extension to more levels of nesting is immediate. The corresponding element of V^{-1} can be written as

δ_hh'δ_ii'δ_jj'δ_kk' α_4^{-1} - δ_hh'δ_ii'δ_jj' w_hij α_4^{-1} - δ_hh'δ_ii' w_hij w_hi w_hij' α_3^{-1} - δ_hh' w_hij w_hi w_h w_hi' w_hi'j' α_2^{-1} ,
where

w_hij = α_3 (α_3 n_hij + α_4)^{-1} ,
w_hi = α_2 (α_2 n_hi + α_3)^{-1} ,
w_h = α_1 (α_1 n_h + α_2)^{-1} ,
w = (Σ_h w_h n_h)^{-1} ,

and

n_hi = Σ_j n_hij w_hij ,  n_h = Σ_i n_hi w_hi .

Summation over h', i', j' and k' yields

w_hij w_hi w_h α_1^{-1}

as the element of the vector V^{-1}X corresponding to Y_hijk in Y. Also note that the scalar X'V^{-1}X is α_1^{-1} w^{-1}. The general element of R now follows as the corresponding element of V^{-1} minus

w_hij w_hi w_h w w_h' w_h'i' w_h'i'j' α_1^{-1} .

The pattern for the case with more levels is obvious.
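The w's and weighted counts follow from the cell counts alone, so they can be accumulated bottom-up without forming any matrix. In the sketch below the cell layout and the prior values α_i are invented, and a dictionary keyed by (h, i, j) stands in for a pass over sorted data:

```python
# Weights for the 3-level nested model, built from the cell counts n_hij:
#   w_hij = a3/(a3 n_hij + a4),  n_hi = sum_j n_hij w_hij,
#   w_hi  = a2/(a2 n_hi  + a3),  n_h  = sum_i n_hi  w_hi,
#   w_h   = a1/(a1 n_h   + a2),  w    = 1 / sum_h w_h n_h.
def nested_weights(n_hij, alpha):
    a1, a2, a3, a4 = alpha
    w_hij = {c: a3 / (a3 * n + a4) for c, n in n_hij.items()}
    n_hi = {}
    for (h, i, j), n in n_hij.items():
        n_hi[h, i] = n_hi.get((h, i), 0.0) + n * w_hij[h, i, j]
    w_hi = {c: a2 / (a2 * n + a3) for c, n in n_hi.items()}
    n_h = {}
    for (h, i), n in n_hi.items():
        n_h[h] = n_h.get(h, 0.0) + n * w_hi[h, i]
    w_h = {c: a1 / (a1 * n + a2) for c, n in n_h.items()}
    w = 1.0 / sum(w_h[h] * n_h[h] for h in n_h)
    return w_hij, w_hi, w_h, w

# Invented cell layout: n_hij observations per cell (h, i, j), priors all 1.
cells = {(1, 1, 1): 3, (1, 1, 2): 2, (1, 2, 1): 4, (2, 1, 1): 2, (2, 1, 2): 1}
w_hij, w_hi, w_h, w = nested_weights(cells, alpha=(1.0, 1.0, 1.0, 1.0))
# With these weights the scalar X'V^{-1}X is w^{-1}/a1; no n x n inverse is needed.
```

The last comment can be verified by building the dense V for this small layout and comparing, which is a useful sanity check when extending the recursion to more levels.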
The typical element of Z = RY is

z_hijk = α_4^{-1} Y_hijk - w_hij Y_hij+ α_4^{-1}
         - w_hij w_hi (Σ_j' w_hij' Y_hij'+) α_3^{-1}
         - w_hij w_hi w_h (Σ_i' Σ_j' w_hi' w_hi'j' Y_hi'j'+) α_2^{-1}
         - w_hij w_hi w_h w (Σ_h' Σ_i' Σ_j' w_h' w_h'i' w_h'i'j' Y_h'i'j'+) α_1^{-1} .
The quadratic forms in (2) are now obtained as

Σ_h Σ_i Σ_j Σ_k z_hijk^2 ,  Σ_h Σ_i Σ_j z_hij+^2 ,  Σ_h Σ_i z_hi++^2  and  Σ_h z_h+++^2 ,

where the + symbol indicates summation over the subscript before squaring. We note at this point that these quadratic forms are obtained by two passes through the data file and are not affected by size of data set.
In order to obtain expressions for the elements in RV_1, RV_2, RV_3, RV_4 and the various products in a systematic pattern it is convenient to define

t_h = α_2^{-1} + w_h w α_1^{-1} ,
t_hi = α_3^{-1} + w_hi w_h t_h ,
t_hij = α_4^{-1} + w_hij w_hi t_hi .

The elements in RV_1, RV_2, RV_3 and RV_4 are given in Table I in a convenient form. Expressions for tr(RV_i RV_j) for 4 ≥ i ≥ j ≥ 1 are given in Appendix A.
These expressions can (and should) be condensed before being used for computations. However, when this is done the pattern is not as clear. We also notice that if one arrives at a point where the estimates obtained agree with the prior values then the variance-covariance matrix of the estimates can be approximated by twice the inverse of the left side of (2).
Table I. Elements in the RV_1, RV_2, RV_3 and RV_4 Matrices

Coefficient δ_hh' δ_ii' δ_jj' δ_kk' :
  RV_4:  α_4^{-1} - w_hij t_hij
  RV_3:  w_hij (α_3^{-1} - n_hij w_hij w_hi t_hi)
  RV_2:  w_hij w_hi (α_2^{-1} - n_hi w_hi w_h t_h)
  RV_1:  w_hij w_hi w_h (α_1^{-1} - n_h w_h w α_1^{-1})

Coefficient δ_hh' δ_ii' δ_jj' (1 - δ_kk') :
  RV_4:  -w_hij t_hij
  RV_3, RV_2, RV_1:  as in the preceding row.

Coefficient δ_hh' δ_ii' (1 - δ_jj') :
  RV_4:  -w_hij w_hi w_hij' t_hi
  RV_3:  -w_hij w_hi w_hij' n_hij' t_hi
  RV_2, RV_1:  as in the first row.

Coefficient δ_hh' (1 - δ_ii') :
  RV_4:  -w_hij w_hi w_h w_hi' w_hi'j' t_h
  RV_3:  -w_hij w_hi w_h w_hi' w_hi'j' n_hi'j' t_h
  RV_2:  -w_hij w_hi w_h w_hi' n_hi' t_h
  RV_1:  as in the first row.

Coefficient (1 - δ_hh') :
  RV_4:  -w_hij w_hi w_h w w_h' w_h'i' w_h'i'j' α_1^{-1}
  RV_3:  -w_hij w_hi w_h w w_h' w_h'i' w_h'i'j' n_h'i'j' α_1^{-1}
  RV_2:  -w_hij w_hi w_h w w_h' w_h'i' n_h'i' α_1^{-1}
  RV_1:  -w_hij w_hi w_h w w_h' n_h' α_1^{-1}
V. Computational Aspects
An examination of the formulas developed in the previous section shows that a very efficient computing routine is possible. If the data are sorted or arranged according to the nesting structure then all totals required in equation (2) can be obtained in one pass. Alternatively, if one wishes to minimize the programming effort, all the accumulations can be obtained rather easily by using any one of a number of standard programs for computing means and totals of subsets of observations in a data set. Summaries need to be obtained at each level of nesting. These summaries are then merged and a final set of accumulations obtained.
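A minimal sketch of such an accumulation pass: cell-level counts and totals are gathered in one sweep and then merged upward. The column layout and the small inline data set are invented stand-ins for a sorted data file.

```python
import csv
import io
from collections import defaultdict

# Invented records: h, i, j, y per row, sorted by the nesting structure.
data = io.StringIO("1,1,1,2.3\n1,1,1,1.9\n1,1,2,3.1\n1,2,1,2.8\n2,1,1,0.7\n")

# One pass: per cell (h, i, j) keep n_hij, sum of y and sum of y^2.
cell_n = defaultdict(int)
cell_sum = defaultdict(float)
cell_ss = defaultdict(float)
for h, i, j, y in csv.reader(data):
    key = (h, i, j)
    cell_n[key] += 1
    cell_sum[key] += float(y)
    cell_ss[key] += float(y) ** 2

# Merge upward: subclass (h, i) and class (h) totals from the cell summaries.
sub_sum = defaultdict(float)
for (h, i, j), s in cell_sum.items():
    sub_sum[h, i] += s
class_sum = defaultdict(float)
for (h, i), s in sub_sum.items():
    class_sum[h] += s

print(cell_n[("1", "1", "1")])  # 2
```

The same merge pattern extends to any number of nesting levels, and the memory used depends only on the number of cells, not the number of observations.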
In order to examine the behavior of the suggested iterative scheme
several analyses were performed on synthetic, unbalanced data sets.
The
results of three typical cases are given in Table II.
The striking feature
of these computations is the very rapid convergence.
In fact it appears that
2 or at most 3 cycles are sufficient.
The extent of the unbalance in the
data can be judged from Table III.
VI. Acknowledgement

The authors wish to thank T. M. Gerig for his valuable comments and suggestions in the preparation of this paper.
TABLE II

Summary of Three Analyses in Simulated Data Sets

For each of the three cases (A, B and C) the table lists, for σ_1^2, σ_2^2, σ_3^2 and σ_4^2: the true parameter values, the starting values, the estimates at each cycle of the iteration, and the analysis-of-variance estimates.
TABLE III

Pattern of n_hij's

The cell sizes n_hij are listed by class h and subclass i, for sets A & B and for set C.
REFERENCES

Corbeil, R. R. and Searle, S. R. (1976). Restricted maximum likelihood (REML) estimation of variance components in the mixed model. Technometrics 18, 31-38.

Hemmerle, W. J. and Hartley, H. O. (1973). Computing maximum likelihood estimates for the mixed A.O.V. model using the W-transformation. Technometrics 15, 819-832.

Khatri, C. G. (1966). A note on a MANOVA model applied to problems in growth curve. Ann. Inst. Stat. Math. 18, 75-86.

La Motte, L. R. (1972). Notes on the covariance matrix of a random nested ANOVA model. Ann. Math. Statist. 43, 659-662.

Rao, C. R. (1970). Estimation of heteroscedastic variances in linear models. J. Amer. Statist. Assoc. 65, 161-172.

Rao, C. R. (1971a). Estimation of variance and covariance components - MINQUE theory. J. Mult. Analysis 1, 257-275.

Rao, C. R. (1971b). Minimum variance quadratic unbiased estimation of variance components. J. Mult. Analysis 1, 445-456.

Rao, C. R. (1972). Estimation of variance and covariance components in linear models. J. Amer. Statist. Assoc. 67, 112-115.
APPENDIX

Expressions for tr(RV_i RV_j), 4 ≥ i ≥ j ≥ 1. Each trace is a sum over the subscripts h, i and j of products of the cell counts n_hij, the weights w_hij, w_hi, w_h and w, the quantities t_h, t_hi and t_hij, and the α_i^{-1}, following the block pattern of the elements in Table I.