Sriburi, Songsiri; (1978)Properties of Variance Component Estimators Obtained by Restricted Maximum Likelihood and by Minque."

a
PROPERTIES OF VARIANCE OOMPONENT ESTrnATORS OBTAINED BY
RESTRICTED MAXIMUM LIKELIHOOD AND BY MmQUE
Songsiri Sriburi
Institute of Statistics
Mimeograph Series No. 1175
RaJ.eigh - May 1978 .
iv
TABLE OF CONTENTS
Page
LIS'! OF TABLES •
LIS'I OF FIGURES.
.............
·....
.. ...... ·..
. . · . . ... . . . · . . . . . . · . .
1.
INTRODUCTION. •
2.
ASYMP'IO'IIC PROPERTIES OF RESTRICTED MAXIMUM
LIKELmOOD ES'IDfATES IN THE MIXED MODEL •
~
.·......····
. · · ·· ··
3.
Proof of COtldition a.
Proof of Condition b.
Proof of Condition c.
Summarizing Theorem •
··
·· ·· .. ·· •
•
· ·
..··
·
·
• · ·
· .. ·
·
·..· ···
'.
THE DISTRIBU'IION OF VARIANCE COMPONENT ESTDf.ATORS
3.1 Introduction • • • • • • • • • • • • •
3.2 The Method of Maxtures • • • • • • • •
3.3 The Distribution of Variance Component
Estimators (Positive Quadratic Forms) ••
The
Distr1bution of Var1ance Component
3.4
Estimators (Indefinite Quadratic Forms).
4.
viii
1
7
2.1 Introduction
2.2 The Mixed Medel and Its Assumptions.
2.3 The General Asymptotic Theorem •
2.4 A Sequence of Experiments.
2.5 COtlsistency, Asymptotic NOl:mality, and
Asymptotic Efficiency of REML Estimates.
2•.5.1
2.5.2
2.5.3
2.5.4
v
·.
7
8
12
14
16
17
18
21
27
31
31
31
·..
·..
33
36
COMBINING INFORMATION FROM SEVERAL EXPERIMENTS.
50
Introduction. • • • • • • • • • • • • •
Model and Methods. . • • • • • • • • • . • • • •
Distribution and Variances of Estimators •
A Comparison of Four Methods of Estimation • . • ••
The Effect of the Number of Levels of Each
Random Factor. • • • • • • • • • • • • •
4.6 The Effect of the True Variance Components. •
50
51
57
60
4.1
4.2
4.3
4.4
4.5
....
·..
5.
SUMMARY
6.
LIST OF REFERENCES.
7.
APPENDIX. • • • • • •
·..·
·.
·..
72
84
99
• 105
• • 107
v
LIST OF TABLES
Page
3.4.1
The cummulative distribution obtained from
the method developed and the Monte Carlo
,
-3
method of T with e • 10
4.4.1
••••••••
....
47
Variances of &2, probabilities of negative
r
.. 2
a , and the 95th percentiles of the distri-
r
.. 2
butions of a from four methods and eight
r
combined experiments when a
2
2
• 1. 0, a c • .4,
2
2
r
a rc • .8, and a e • 1.0• • • • •
4.4.2
.. 2
.. 2
c
c
61
Variances of a , probabilities of negative a ,
and the 95th percentiles of the distributions
of &2 from four methods and eight combined
c
2
2
2
experiments when a • 1.0, a • .4, a
• .8,
2
r
c
rc
and a • 1. O.
• • • • •
e
4.4.3
Variances of &2 , probabilities of negative &2 ,
rc
rc
and the 95th percentiles of the distributions
of &2 from four methods and eight combined
rc
2
2
2
e:cperiments when a • 1.0, (j .: .4, a
• .8,
.
2
r
c
rc
an.d a • 1.0 • . . . . . . . .
e
4.4.4
67
2 pro ba b i 1 ities 0 f negat~ve
...2
Vari~ces 0 f ..a,
a ,
e
e:
and the 95th percentiles of the distributions
of a2 from four methods and eight combined
e:
2
2
2
experiments when a • 1.0, a • .4, a
• .8,
2
r
c
rc
and a
4.5.1
64
e
• 1. o.
. . . . . . . .
70
a2
.. 2
, probabilities of negative a ,
Variances of
r
r
and the 95th percentiles of the distributions
of 2 when using method d and sixteen combined
ar
2
2
2
experiments with a • 1.0, a • .4, arc
r
c
2
an.d a • 1.0. . . . . . . . . .
e
= .8,
73
vi
LIst OF tABLES (continued)
Page
4.5.2
.. 2
.. 2
Variances of a , probabilities of negative a ,
c
c
and the 95th percentiles of the distributions
.. 2
of a when. using method d and sixteen combined
2'
c
4.5.3
4.5.4
2
2
experiments with a • 1.0, a • .4, a
• .8,
2
r
c
rc
and a .' 1. O. • . . • . . • . . • •
e
.. 2
.. 2
Variances of a ,probabilities of negative a ,
rc
rc
and the 95th percentiles of the distributions
.. 2
of a
when. using method d and sixteen combined
rc
2
2
2
experiments with a • 1.0, a • .4, a
• .8,
2
r. c
rc
and a •. 1. O. . . . . . . . . . • •
• • •
e
.. 2
Variances of a
€
..
76
..
79
.. 2
probabilities of negative a ,
J
€
and the 95th percentiles of tne distributions
2
of a when using method d and sixteen combined
2
€
2
2
experiments with a • 1.0, a • '.4, arc • .8,
2
r
c
and ae: • 1.0.
• . • • • •
4.6.1
.. 2
Variances of a
r
81
A2
J
r
' .
probabilities of negat1ve a
J
and the 95th percentiles of the distributions
of
a2r when using method
2
experiments with a
cr
4.6.2
2
rc
.=..
8, and
2
(j
e:
d and three combined
2
• .01, 1.0, 2.0, cr . : .4,
r
c
1. O. . . . . . . . . . . . . .
a
.. 2
•
85
A2
Variances of a , probabilities of negat1ve a ,
r
r
and the 95th percentiles of the distributions
of
;2 when
r
using method d and three combined
2
2
experiments with a • 1.0, a • .01, .4, 2.0,
2
2 r
c
a
.:. 8 , and a • 1. o. . . . . . . . . . . .
rc
e:
89
vii
LIS'!' OF TABLES (continued)
Page
4.6.,3
A2
A2
t
probabilities of negative a ,
r
r
Variances of a
and the 95th percentiles of the distributions
A2
of a
~
when using method d and three combined
2
r
2
2
experiments with a • 1.0, a • .4, a
• .01,
2
r
c
rc
.8, 2.0, and a • 1.0 • • • •
e:
4.6.4
A2
r
92
A2
r
Variances of a , probabilities of negative a ,
and the 95th percentiles of the distributions
of &2 when using method d and three combined
r
experiments with a
2
• 1.0, a
2
2
r
c
and a e: • .01, 1.0, 2.0 • • . •
7.1
2
• .4, arc • .8,
Seven terms with appropriate choices of A, B, and C
95
124
viti
LIS'! OF FIGURES
4
3.4.1
Graphs of the p.d.f. of T, where T·
4
- j-1r aj~r+"J
r
AiU~
i-1
for three sets of (A1,AZ,A3,A4,-al
.....
4.6.1
.. 2
2
where .01 < a < 2.0
r
- r-
The graphs of variances of a
for three combined experiments..
4.6.2
• • • • • •
The graphs of variances of ~2 where .01
r
<
a2
<
c-
4.6.4
·86
2.0
for three combined experiments. •
4.6.3
49
90
.. 2
2
< 2.0
The graphs of variances of a where .01 < a
r
rc for three combined experiments. •
.. 2
The graphs of variances of a . where .01 < a2 < 2.0
........
r
for three combined experiments. •
-
93
e:-
. .• . . . . . .
96
1.
INTRODUCTION
Estimation of variance components in the mixed model of the analysis of variance has been the subject of considerable discussion for
several years.
If the data are balanced, then estimation relies_almost
exclusively on the analysis of variance method.
This method consists
of making an analysis of variance table, equating expected values to
observed mean squares, and using the solutions to the resulting equations as the estimates.
It has been shown by Graybill (1954) and
Graybill and Wortham (1956) that-under the assumption of normality,
these estimates obtained from balanced data sets have minimum possible
variances, in the class of unbiased estimates.
Henderson (1953) has
suggested analogous techniques for unbalanced data.
Recent17, several new methods have been proposed.
maximum likelihood method by Hartley and Rao (1967).
One is the
This method
yields simultaneous estimations of both fixed effects and variance
components by maximizing the likelihood function with respect to each
of the fixed effects and the variance components.
Even though the
maximum likelihood estimators of variance components have some desirable properties, their use has been limited.
The major reason for this
is that effective algorithms are not readily available.
Another
criticism is that the maximum likelihood estimators do not take into
account the loss in degrees of freedom due to estimating fixed effects.
Recently, several attempts have been made to improve on the maximum
likelihood method.
Patterson and Thompson (1971) eliminated the second
problem through their Restricted Maximum Likelihood (REML) approach by
~
2
partitioning the likelihood function into two parts; one part entirely
free of fixed effects.
The maximization of this part yields what we
call REML estimates of variance components.
Another development was MINQUE, Minimum Norm Quadratic Unbiased
Estimation (or Estimator) by Rao (1970, 1971a, 1971b, 1972).
In the
general mixed model
! '"' XA + Ue
:II
XA + Uof.o +
suppose the random effect e: is observable.
!
i-O
PiO'~
+ U e: ,
(1.1)
p-p
A natural estimator of
would be e:' Ae: where A is a suitably defined diagonal matrix.
Since, in practice, ! rather than e: is observable, the MINQUE principle
leads one to select an estimator of the form r'AY where matrix A is
selected to minimize the nOTm of the difference,
to AX
:II
a and tr(AV i )
,
:II
I IU'AU
Pi' Vi '"' UiU i for i-O,l,.,p.
... 2
• •• , 0'2)
p is given by ~
-
AI I,
subject
The MINQUE of
-
:II
S S, where S is a (P+l)x(p+l)
matrix with element Sij equal to tr(QViQVj
), Q -
, - L -1_ -1
r-1 -E-L
-X(X r -X) -x't ,
S- denotes a generalized inverse of S, and q is a (p+l) x 1 vector whose
ith element is
!' QVi Q!.
An inversion and numerous multiplications of
nxn matrices, where n is the number of observations, are reqUired to
obtain the MINQUE.
Some developments on reducing the size of matrices
needed to be inverted have been made.
Liu and Senturia (1977) have
2
shown that it is possible to obtain the MINQUE of.£ by manipulati?i
3
matrices of size g
x
gt where g is the sum of the numbers of levels of
all random factors omitting the error, the number of fixed effects, and
one.
Giesbrecht and Burrows (1978) have shown that for the nested
model one need only invert a p x p matrix, where p is the number of
variance components.
MINQUE also requires that one has prior values for the variance
components.
The properties of the estimators depend on the quality
of these prior values.
Since good prior values are not always avail-
able, several authors have proposed an iterative MINQUE procedure,
where the prior values are replaced by the estimates from the previous
cycle and the negative estimates are replaced by zeros.
Harville
(1977) and Giesbrecht and Burrows (1978) have pointed out that if the
process converges, then iterative MINQUE is identical to REML, i.e.,
these estimates satisfy the equations obtained from the REML method.
In two recent papers Weiss (1971, 1973) has discussed the asymp-
totic properties of maximum likelihood estimates for some nonstandard
cases.
He has shown that a general asymptotic theorem holds for a
class of cases where the observed random variables are not necessarily
independent "and identically distributed.
This theorem also allows the
possibility of different normalizing sequences for different sequences
of estimates.
Miller (1977) used Weiss's general asymptotic theorem to
obtain asymptotic properties of maximum likelihood estimates in the
mixed model of the analysis of variance under some mild restrictions
on the design sequences.
In this paper it is shown that similar asymptotic properties hold
for the REML estimates.
In particular, these asymptotic results obtain
4
when the number of levels of each random factor increases to infinity
or when the ex-periment is repeated.
The sequences of the estimates
may require normalizing sequences which differ in order of magnitude
in order to eliminate the problem of a degenerate limiting distribution.
Truncation does not affect the asymptotic results because the
true variance components are assumed to be positive and the estimates
are consistent with high probability.
There exists a number of nOn-iterative schemes for estimating
variance components.
A common feature of these methods is that the
resulting estimates are obtained as quadratic functions of the original
observations.
The selection of the actual quadratic function is often
open to debate.
If one is willing to use prior information about the
components, then the MINQUE
p~inciple
corporate these prior values.
leads to specific forms that in-
Even if the original observations are
assumed to have a normal distribution, in general the distributions of
the quadratic
fo~
used as estimators have remained intractable.
Some progress can be made by noting that quadratic functions of normal
random variables are distributed as linear functions of
single degree of freedom chi-square random variables.
independ~~t,
In particular,
Robbins and Pitman (1949) have shown that a positive quadratic form in
normal variates is distributed as a mixture of chi-square distributions.
Press (1966) extended this to obtain the distribution of the difference
of two positive definite quadratic forms, and hence to the indefinite
quadratic forms in normal variates.
Wang (1967) used a similar tech-
nique to study the distribution of several variance component estimators
in
the one-way balanced model.
5
In Chapter 3 Press's (1966) results are extended to
~~ine ~he
distribution of variance component estimators in a large class of
models.
The first step is to derive the distribution of a quadratic
estimator
l' AY,
where the eigen values of At are all positive, in terms
of an infinite series of chi-square distributions.
It is shown that
the coefficients in the series can be evaluated recursively.
Next,
,
the distribution of a quadratic estimator ! AY, where the eigen values
of At are not all positive, is derived in terms of the confluent hypergeometric function.
Finally, the probability density function of a
,
•
quadratic estimator! AY, where the eigen
v~lues
of At are
no~
all
positive, is expressed as infinite series of chi-square density functions when both the number of positive eigen values and the number of
negative eigen values are even integers.
This series is useful in
studying the behavior of variance component estimators.
In Chapter 4 the distribution derived in Chapter 3 is used to study
the effects of changes in experimental design and true variance components on MINQUE.
The designs used for this study consisted of pairs of
independent two-way balanced experiments.
Recently, Giesbrecht (1977)
has examined methods of estimation and shown by a simulation study that
the estimates obtained from an iterative MINQUE procedure have smaller
variances than the estimates obtained from method of pooling sums of
squares and method of computing mean of the analyses of variance estimates.
It is notable that all these estimates are obtained by equating
the quadratic forms to their expected values and solving the system of
equations.
His work is extended to comparing the three methods togethar
with method of averaging mean squares by using several combined
·e
6
experiments.
In this .work, these methods are compared by using three
criteria, variances, the probabilities of negative estimates, and the
95th percentiles of the distributions.
The last two are obtained from
the distribution derived in Chapter 3.
Recall that the distribution
of a variance component estimator
values of matrix A!.
,
! AI
depends only on the e1gen
The true variance components are assumed to be
known, and the prior values required by MINQUE are replaced by the
true variance components.
7
2.
ASYMPTOTIC PROPERIIES OF RESTRICTED MAXIMUM
LIKELIHOOD ESTUfA,TES
2.1
m
THE MIXED MODEL
Introduction
In a recent paper, Miller (1977) has shown that in the mixed model
of the analysis of variance there is a sequence of roots of the likelihood equations which is consistent, asymptotically normal, and efficient
in the sense of attaining the Cramer-Rao lower bound for the covariance
matrix.
In this chapter it will be shown that similar results hold for
the Restricted Maximum Likelihood (REML) estimators of the variance
components defined by Patterson and Thompson (1971) and Corbeil and
Searle (1976).
One can view REML estimators as estimators obtained by
factoring the likelihood into two parts, one a function of contrasts
among fixed effects and the other a function of contrasts with zero
expectation and then maximizing the latter.
Reasons for considering
REML as opposed to conventional maximum likelihood include the
following:
a)
The REML estimates of variance components agree with the
values obtained from the analysis of variance when the data set is
balanced in the sense that there is an adjustment for the "degree of
freedom" lost due to the fixed effects in the model.
b)
The system of non-linear equations
,. 0
for i-O,l, ••• ,p
where (P+1) is the number of variance components in the model which can8
be reformulated as a system of linear equations,
for i-O,l, ••• ,p
Y'A Y •
-
i-
where the {Ai} and {cij } depend on assumed prior values of the variance components.
These have been discussed by Harville (1977) and
Giesbrecht and Burrows (1978).
The equations obtained are exactly the
equations obtained by applying MINQUE theory, developed by Rao (1970,
1971a, 1971b, 1972), when one has prior information about the variance
components.
Consequently if the iterative scheme suggested by the
linearMINQUE equations converges (with proper allowance to replace
negative estimates by zeros) then one has the REML estimates.
c)
Occasionally REML estimates are easier to obtain than con-
ventional maximum likelihood estimates.
2.2
The Mixed MOdel and Its Assumutions
Consider the model
Y
a
XB
--
+
U €1
1-
+ ••• + U €
+
(2.1)
U~
p-p~
where Y is an nxl vector of observations; X is an nxk matrix of known
constants; U , iaO,l, ••. ,p, are nxc matrices of known constants with
i
i
U
oa
I ; 8 is a kxl vector of unknown constants; e:., i=O,l, ••• ,p, are
n
~
-
cixl vectors of random variables such that
~
2
'" N(O,Dio ), where Di are
i
,
cixc i matrices of known constants with DO ... In' and E (~i.f..j) "" 0 for i~j.
9
From the model (2.1) it follows that E(Y) • XB and Var(Y) • 1:
Let d
i
be the rank of the symmetric matrix. D •
i
Then there exists
,
....ic'
a cixd i matrix Gi with full column rank such that Vi· ViDiU i • Uiui
if
where Ui • UiG i •
Define
2
~
2 2
2
• (oO,ol' ••• 'op).
2
is positive definite, provided 00
n
~
k + P + 1.
>
O.
Note that the matrix 1:
It will be assumed that
'.
The following three assumptions on the X and U matrices
i
are required:
a)
X has a full column rank.
b)
Ranks of the augmented matrices [X
U ], i-O,l, ••• ,p, are
i
greater than rank of X.
c)
U , i-O,l, ••• ,p, have full column ranks.
i
The first assumption can be satisfied by a suitable reparameterization.
The second and the third assumptions require that the fixed effects are
not confounded with any
oth~r
random effects and the random effects are
not confounded with each other.
If we assume!
~ N(~,t),
then the log-
likelihood function of Y is
(2.2)
The maximum likelihood procedure yields estimates of both fixed effects
and variance components simultaneously by maximizing the likelihood of
Y with respect to each element of
~
and each of the variance components.
As stated in Chapter 1, the method of maximum likelihood is rarely used
in practice because the arithmetic is difficult and because the method
fails to take into account the loss in degrees of freedom resulting from
10
estimating fix,d effects.
The approach of this paper will be to follow
Patterson and Thompson (1971) and partition the log-likelihood into two
parts, one part entirely free of fixed effects.
restricted to the latter.
of the variance components.
nonsingular transformation
Our attention will be
Maximization will provide REML estimates
Partitioning is accomplished by using the
rlxt1:-~1
-z •
Y
where T is an (n-k) xc. matri."t
r
~ ~ ~[[x't-lxs~'Lo
tT '
o
such that TX - O.
It follows that
0 11
X't-~
The model of the part which is entirely free of fixed effects is given
by
(2.3)
and the parameter space is defined as
(2.4)
, -1 .
2
The log-likelihood of TY and X
1 denoted by Ll (TY;Q. ) and
1:
(2.5)
and
(2.6)
Using the rules for matrix differentiation given by Graybill (1969),
2
differentiation of L with respect to elements of {cr } gives
l
i
11
for i-O,l, ••• ,p,
where tr(A) is the trace of a
mat~ix
A.
(2.7)
The second-order partial
2
derivatives of L (TY;,2, ) are
l
(2.8)
- Y'T' (TtT,)-lTU~U*'T'(TtT')-lTU*U*'T' (TtT,)-l TY
-'.
Joi
,
'-1
Define Q • T (TtT)
j j . -
T.
Th~
for i,J-O,l, ... ,p.
(2.7) and (2.8) can be written as
(2.9)
for i-O,l, ••• ,p
and
for i,j-O,l, ••• ,p.
(2.10)
Also note that Khatri (1966) has shown that
rt!'\
A value of E.2 E~
that maximizes L1 (T!),2,2 ) is referred to as a REML
A
est 1mate oX- E..2 •
Frequently numerical techniques will be needed to
solve the system given by (2.9).
beyond the scope of this paper.
The discussion of such techniques is
For the remainder of this chapter, the notation L (TY ;a 2 ) is
1n - 0 . -
12
2
used instead of Ll (T!;.£ ) to emphasize the dependence on n, the size
of the sample.
2.3
The General Asymptotic Theorem
A general asymptotic theorem, proved by Weiss (1971), is presented
by using notation which fit our needs.
This theorem is very general
and concerns the asymptotic properties of roots of the likelihOod
equation in some nonstandard cases, where the observed random variables
are neither independent nor identically distributed.
The theorem also
allows normalizing sequences of different orders of magnitude for
estimates of different parameters.
Miller (1977) has applied Weiss's theorem to prove asymptotic
properties of maximum likelihood estimates in the mixed model.
In
his work, Weiss's theorem has been stated in essentially this form.
In Section 2.5, the asymptotic properties of REML estimates in
the mixed model of the analysis of variance -.Jill be proved by applying
this theorem.
The general asymptotic theorem is stated as follows:
Theorem 2.3.1.
For a sequence of random variables TY
with the
-n
log-likelihood functions L n (TY
;0'2) where -0'2E~,
suppose the true
-n~
l
parameter pOint
~ is an interior point of CE>. Let there be 2 (P+l)
sequences {ni(n)} and {mien)}, iaO,l, ••• ,p, of positive constants such
that lim°ni(n) •
tl""-
=,
lim mien) no+-
00,
mien)
and lim n. ()
n
n-+- 1.
= O. Denote
13
2• 2
a-r.
,a
)
-nl n (TY
a2 e N
2
(.Q:n).
n-v
-
a)
, i,j-O,l, .•• ,p, fOT all
If
theTe exist I ij
(~), i,j-O,l, ••• ,p, such that Bij (n,fL2)
IfL-20
2 2
2
2
conveTges in probability to I ij (20)
as n ... ca, I ij (20)
is a continuous
2
. 2
function of ~, and a matTiX [I ij (~)] is positive definite, and
b)
t~ere
·22
exist sequences ( y(n,~)} and (o(n,~)} of positive
constants, which conveTge to zeTO as n ...
2
for all cr E
2
Nn(~)'
ca,
such that for each n
then there exists a sequence of estimates
... 2
~
(n),
2
which are roots of the equations
aLln (TY
;a ).
-n-
--~--2~--- ~
0,
i~o,l, •••
,p, such
ocr .
1.
that &2(n) is consistent and the vector whose ith component ni(n)(cr~(n)
- a
2
Oi
) conveTges in distribution to a normal random vector with mean
vector.Q. and covariance matriX [ I ij
2 ]-1 •
(.2:0)
In applying Theorem 2.3.1 to REML estimates in the mixed model
of the analysis of variance, we need to show that under some assumptions
on the mixed model the model (2.3) implies the requirements of Theorem
14
2.3.1.
These rill be shown in Section 2.5.
The assumptions on the
mixed model have been discussed in Section 2.2, and some further
assumptions required on the sequence of experiments will be discussed
in the following section.
2.4 A Seguence of Experiments
The asymptotic results will be established for a sequence of
experiments where the size of the experiments increases, that is, the
number of levels of each random effect increases.
One can visualize
extensions of previous experiments or an entirely different experiment
at each stage.
These assumptions are required to eliminate sequences
of experiments in which the limiting dist'l:ibucions are degenerate.
Without loss of generality, for each n let TU., i-O,l, ••• ,p, in
J.
model (2.3) be 1aaeled so that the ci(n) are in decreasing order of
magnitude.
Generate the partition of the integer {O,l, ••• ,p}, SO'
so that for any two indices i and j in a set S , the
s
associated ci(n) and cj(n) have the same order of magnitude.
Note
that there are a + 1 sets in the partition, SO' Sl, ••• ,Sa' where
Ss = {i s , i s +1, ••• ,is+l -1} for s-O,l, •.. ,a-l and Sa
= {ia , i a+l, ••• ,p}.
Define
(2.11)
for i E S
s
vi(n) • rank [TUJ.. :TU +l: ••• :TU ] - rank[TU. : •• :TU. 1:TU.+1: ••• :TU ]
si s P
J. s
J.J.
P
•
15
where the matrix ['rUl : 'rUZ: ... :TUp] is the augmented matrix.
Thus,
v (n) is the dimension of the part of 'rU not dependent on the other
1
1
'rU where 1 < j < p, 1+j, and 1
j
s- -
e S.
s
It is notable that v (n),
1
i-O,l, ••• ,p, are related to the degrees of freedom of the
squares in the analysis of variance.
sums
Define
for i-O,l, ••• ,po
It is assumed that lim
vi (n)
()
n"'- c i
exists for i-O,l, •.• ,p.
n
of
(Z.12)
This assumption
implies that the ith random effect not become confounded with any other
random effects when n becomes large, that is, v (n) and ci(n) have the
1
same order of magnitude.
Recall from Theorem 2.3.1 that
(2.13)
To facilitate the latter proof, 0'2, where E,.2 e N (~), rl~l be indexed
n
by subscript k, k-0,l,2.
For each n and each
2
~
2
e Nn (.£0)
(2.14)
1: k -
Since the covariance matrix 1:
nonsingu1ar matrix
t\.
k
is positive definite, there exists a
such that 1: k ...
I
t\.~.
Define
(2.15)
16
and
2 2
a -a
-~
(2.16)
Using (2.16) and the expected value of a quadratic form, the expected
fork.-i
(2.17)
Finally, define
2
It is assumed that all elements of {Iij(£o)} exist and that the
matrL~
[I ij ~) 1 'is positive definite.
2.5
Consistency,
Asvm~totic
Normality, and Asymptotic
Efficiency of REML Estimates
In this section we will prove that the requirements of theorem
2.3.1 are satisfied by the sequence of experiments discussed in
17
We will first establish a set of three basic requirements
Section 2.4.
listed below and referred to as conditions a, b, and c and then prove
a summarizing theorem, Theorem 2.5.1.
Condition a)
The three basic requirements are:
There. exist 2(p+l) sequences {ni(n)} and {mien)},
.i-O,l, ••• ,p, of positive constants such that lim
ni(n)~,
0:+»
lim m (n)-=,
o:+»i
mien)
and
lim
tr+""
ni(n)
• O.
Condition b)
all i,j,
2
Bij(n,~)
There exist
i,j-O,l, ••• ,p, such that for
2
converges in probability to I ij (20) as n
2
I ij (~) is a continuous function of
Condition c)
2
Iij(~)'
+
=, and
2
2.0.
There exist sequences
2
{y(n,~)}
positive constants, which converge to zero as n
+
2
and {o(n,£o)} of
=,
such that for
each n
2.5.1.
Proof of Condition a
By the definition (2.12), ni(n) •
lim ni(n) •
n-
=.
v~(n),
i-O,l, ••• ,p, implies
Define
K(n) - max
i,j
IE2 [B ij (n,~)] ~
I ij
(~)
I·
(2.19)
18
It is easily seen that K(n) converges to zero as u
~
~
~
+~.
Now define
~
m(n) - min (uO(u), ~ (n), ••• ,up(u), K (n».
(2.20)
Without loss of generality, let
mien)
It follows that lim mien) • ~ and lim ui(n) n-
2.5.2.
(2.21)
for i-O,l, ••• ,po
mien) • m(n)
n-
o.
Proof of Condition b
To prove the second and third conditions, we will assume that the
2
true variance components 00i' i-O,l, ••• ,p, satisfy the following:
•
2
2mi (u)
°Oi > u (u)
Condition 1.
for i-O,l, ••• ,p.
i
Since m.(n)
• men), i-O,l, ••. ,p, condition b becomes (jOi
2
1.
i-O,l, ••• ,p.
2m(n)
> n (n)'
i
This requires that the true variance components be posi-
tive numbers.
Define A.(A), i-l,2, ••• ,n, as the eigen values of an nxn matrix
1.
Also X
~
n p
0 2 E N (~) means that for any fixed e:
-
n~
p {IX -xl> e:} <
2
n
o
>
0 and 0
(j
>
2
X for all
0 there e..~ists
o.
In the next two Lemmas establish that
Iij(~)' i,j-O,l, •.• ,p,
defined in (2.18) satisfy the second requirement of Theorem 2.3.1.
We
19
first prove that for all i,j, B (n,~) converges in probability
ij
to
2
Iij~)
+~.
as n
From this point on, the notation of dependence on
n will be suppressed, for example, using n , m , and
i
i
2
Bij~)
instead
2
of ni(n) , mien), and Bij(n,ak ).
2
Lemma 2.5.1.
2
If B (Eo) and I (Eo), i,j-O,l, ••• ,p, are as defined
ij
ij
'in (2.16) and (2.18), then for all i,j, Bij(~) converges in probability
n +
Proof.
~.'
It is sufficient to prove that Var
2
[Bij(~)l
converges to
2
~
zero as n
+~.
Inequality.
The desired result then follows from Chebyshev's
From the definition (2.16), we have
v~r [Bij(~)l • v~r
~
{-
n~nj [~tr(QOViQOVj) -
I'QoViQoVjQo!l}
~
* *'
Using Lemmas 7.3 and 7.6 and Vj • UjU
j , we have
20
It follows
By Lemma 7.8,
is bounded by some constant.
2
It follows that VarfBij (£:0)]
2
converges to zero as n
+
=, completing the first part of
~
~.
2
The second part of b, that is the continuity of I .. (cr ) is
~J
o
established by the following lemma.
Lemma 2.5.2.
If I
ij
(~), i,j-O,l, •• .,p, are as defined
and condition 1 is tr.le, then I
ij
i..."1 (2.18)
2
(.£0) is a continuous function of
2v2
for all i,j.
Proof.
that
Iij(~)
is a continuous function of
prove that there exist cri and
~,
it is sufficient to
cr~ e Nn(~) such that
21
Since
n:~jltr(~ViQ2Vj) -
tr(Q1ViQ1Vj )\ converge. to zero a'
n
by Lemma 7.11, then
n~njltr(Q2ViQ2Vj) -
+.
converges to zero as n
function of
2.5.3.
.2.02
+
CIS.
Therefore, I
ij
tr(Q1ViQ1Vj
2
(.2.0)
)I
is a continuous
for all i,j.
Proof of Condition c
Let L(A) be defined as the linear space formed by all linear
combinations of the columns of matrix A.
From (Z.ll) and for each n,
Vi is the dimension of the part of TU i not dependent on the other TUj ,
For s-1,2, ••• ,a-l, let H be an
s
orthonormal basis for the part of L(TU
TU i
S+l+l
: ... :'!U ).
P
: TU +l: .• ':TU ) orthogonal
i s is p
Let Ha be an orthonormal basis for
(TU i : TU +l: •.• :TU ), and let HO be an orthonormal basis for the
ia
a
p
orthogonal complement of L(TU : TU : ••• :TU ).
2
p
1
Let the dimension of Hs
be (n-k)xc s , s-O,l, ••• ,a, then H .. (H0 : Hl: •• :Ha ) is an orthogonal
matrix.
Since Ti:ZT
,
is positive definite, there exists a lower
triangular matrix D such that T!2T' .. DD'.
definite.
Also H'T!2T'H is positive
It follows that there exists an upper triangular matrix R
such that R'H'DD'HR .. I.
..
o
for i;'j
(2.ZZ)
22
t
The vector Z can be written as
and the vector
' t ,
(~,Z1.""
,;) ,
'I! can be. written as
a
r
5-0
Fs-s
Z
(2.23)
The condition of the bound of -5
Z , s-O,l, ••• ,a, will be set up to
facilitate later development.
This condition will not rule out any
design of interest since it appears that the probability of this
condition being true approaches one as n
~ <w.
The condition is as
follows:
,
-¥s
11
-<-
Condition 2.
-c
- 10
s
for s-O,l, ••• ,a.
Finally, to prove condition c, there exist sequences
and
2
{o(n,~)}
of positive constants which converge to zero as n
such that for each n,
P
~
P 2
L L
m
p
{
2 i-O j-O
. 2
for all a
-
fJ..
I
2
sup
Bij (~ )
2 E N ( 2)
n~
E N
2
(a~),
n-u
it is sufficient to prove that for all i,j,
This is proved in Lemma 2.5.3 as follows:
2
{y(n,~)}
~
..
23
Lemma 2.5.3.
i,j~O,l,
2
2
Let Bij (~) and Iij~)' where k-O,1,2 and
••• ,p, be as defined in (2.16) and (2.18).
If condition 1 and
condition 2 are true, then
(2.24)
Proof.
We
have that
2
2 sup
< m
2
.Q:.l E Nn (~)
+
m2IBij(~~)
- E
2
[Bij(O'i)]1 + m21 E
~2
+ mZIE
z
~2
[Bij(~)l
I
Z
[Bij(~)l\
2
2
0'
-2
Z
+m\ E
Z
I
.
[ B (0' 22) ] - E2[B (9{)]
2
ij
ij
~2
- E
2
Bij (0' 1) - Bij (0' 2)
(Bij(~)J
I
(2.25)
-
Iij(~)1
.
£.0
~
Next, we will show that each term on the right hand side of (2.25)
converges to zero as n
+
=.
For the first term on the right hand side of (2.25),
converges to zero as n
2
By the definition (2.16), m IBij(O'i) -
Bij(O'~)1
+
=.
24
+
1
u n
i j
P~t r (Q2Vi Q2Vj)
~ 2::njltr(Q1ViQ1Vj)
I
2
But n:n
j
- !' Q2Vi Q2Vj QZy]
- tr(Q2 Vi Q2Vj l
I
'- !'QzViQzVjQzyl converge to zero as n
Therefore, nmnZ
i j
to zero as n
+
l
2
tr(QlViQlVj l - tr(Q2 Vi Q2 Vj )
7.16, respectively.
I
I
'
and n:n)!' Q1V i Q1 VjQ1 Y
+
CD
by Lemmas 7.11 and
Z - Bij(oZ)
Z
Bij('£'l)
! converges
CD.
This has been proved
in Lemma 7.17.
Therefore, the second term on the right hand side
Z
of (Z.25), m \ Bij (.£.;) - E [B ij
Z
(o~) 11,
converges in probability
°z
to zero as n
+
CD.
Using (Z.17), the third term on the right hand side of (Z.Z5)
m21 E2 [Bij
.£Z
(cri']' - E2 [Bij (~lJI· m21
-
n~nJ"tr (Q2Vi Q2Vj l
- tr(Q2 ViQ2Vjll\
°2
+
n~nj [~tr(QOViQOVj)
- tr(QOViQOVjQO!2)1!
~
2
z:'1n j
Itr(~V1Q2Vj)
- tr(QoV1IloVj )
25
I
)I·
2
+ n:n !tr(Q2V1Q2Vj) - tr(Q OV1QOVj QOt 2
j
2
By Lemma 7.11, 2:1njltr(Q2V1Q2Vj) -
tr(IloV1~Vj)1
converges to
2
zero as n ..
era.
Next we will show that
n:njltr(Q2ViQ2Vj)
- tr(QOViQOVjQot2)! converges to zero as n ..
era.
Since
~.
Q2 - QO'
it follows that
2
n:njltr(Q2V1~Vj)
2
• n:n
j
- tr(QOV1QOVjQot2)I
Itr(~Vi~Vj)+tr(~ViQOVj)+tr(QOVi~Vj)+tr«QO-QOt2QO)ViQOVj)/
.
2 2 2
I
:;. n:n tr(,iV1,iVj )
j
I + n:nj Itr(,iV1QOVj) I + n:nj Itr(QOV1,iVj ) I
.
2
+ n:n Itr«QO-Qot2QO)ViQOVj)I •
j
By the definition of m and lemma 7.10, ';le have that
.L!tr(m.~vJlr'
nin
r
j
.
converge to zero as n
- tr(oviQOVj
)I
21 E2 [Bij (a 22 )]
m
E.2
~
era.
°2
2
These, together with 2:1nj!tr(Q2V1Q2Vj)
converging to zero as n"
2
- E2 [B ij (.20)]
l.
I converges
era,
~ply
that
to zero as n ..
era.
26
Using (Z.17), the fourth term on the right hand side of (2.Z5)
Z
2 ~ E (Bij(~)]
Z
mZ\ E (Bij(~)]
Z
Z
0'2
20
- tr(QOViQOV j )/.
I ~ nin
m
j
\ tr(QOviQOVjQOL2)
It has been shown in Lemma 7.10 that
mZltr(QOViQOVjQotz) - tr(QOViQOVj )\ converges to zero as n
Therefore, m21Ez (Bij
(~)]
- EZ (Bij
~
0'2
n +
=.
(~) JI converges
+
=.
to zero as
-
Finally, the fifth term on the right hand side of (Z.Z5),
Z
m \ EZ (B ij (~)] - I ij
(~)
I,
converges to zero as n
+
<»
by the"·
~
definition of limit.
These steps are assembled to find nO such that given
First, choose n1 such that for all n > n 1
P {condition 1 and condition Z are false}
2
Q.Z
Then choose u
z~
u
1
such that for all u
- E}Bij (Q.~)]I
Q.Z
> n
Z
<
~ •
27
21
2
This is true because m B (0'2) _- E
ij
2
0'2
Next, choose n
~
3
n 2 such that for all n
- E2 [B ij
(~) ]
I,
3
u '
3
21 2 [B ij ~) ]
m E
~2
0'2
choose nO > n
>
- E2 [B ij
~
such that for all n > nO'
<
Then we may conclude that for n
~
.i
5
nO'
for
2.5.4.
(~) ] I'
2
~2 E
2
Nn (.2:0) •
Summarizing Theorem
The results from Subsections 2.5.1, 2.5.2, and 2.5.3 can now be
summarized by the following theorem:
Theorem 2.5.1.
For a sequence of experiments, each described by
the mixed model (2.1) under the assumptions discussed in Section 2.2,
satisfying the assumptions discussed in Section 2.4, consider the model
(2.3), which is entirely free of fixed effects,
IT •
'!Uof-o +
TU1:€l
+ ... +
TU~
28
of~,
with the log-likelihood function
space
QD be defined as that defined
2
point ~ is an interior point of
2
Ll(TY;£) • .Let the parameter
in (2.4) where the true parameter
®.
Assume that there exist (P+l)
sequences of positive constants ni(n) which depend on n, i-O,l, ••• ,p,
2
2
2
such that the matri..~ I~) - [Iij~)]' where I ij (.£o) -
... 2
2
sequence of estimates £ (n) of 2. with the properties as follows:
a)
Given
~ >
0, there exist
such that for all n >
a(~)
such that
a
a(~) >
and
nO(~)
nO(~)
~
b)
!he (P+l)x1 vector whose ith component is
l-e:.
ni(n)(&~(n) - a~i)
2 -1
converges in distribution to a Np+l(Q,(I(£o»
) random vector.
Proof.
As stated above, if we can prove that the assumptions of
this theorem imply the requirements of Theorem 2.3.1, the asymptotic results will follow immediately.
Consider the following
three steps:
First of all, there exist 2(P+1) sequences {n.(n)} and {m (n)},
1.
i
i-O,l, ••• ,p, of positive constants, as defined in (2.12) and (2.21),
m. (n)
1.
such that lim ni(n) • =, lim mien) • =, and lim n.(n) • O.
nn~
n- 1.
implies the first requirement of Theorem 2.3.1.
This
29
2
Iij(~)'
Secondly, there exist
i,j-O,l, ••• ,p, as defined
2
in (2.18), such that for all i,j, Bij(n,£o) converges in probability to I ij
2
(~) ~s
And for all i,j,
n
-+
2
Iij~)
CD.
This has been proved in Lemma 2.5.I.
is a continuous function of
has been proved in Lemma 2.5.2.
2
£0.
This
Therefore, the second requirement
of Theorem 2.3.1 is satisfied.
2
Finally, by Lemma 2.5.3 we have that m.
..
sup
2
2
2
IBij(n'~l)
.2.1 E Nn(~)
Ihis implies the last
requirement of Theorem 2.3.1.
It has been shown that REML estimates of variance components in
the mixed model of the analysis of variance. are consistent and asymptotically normal.
It is of interest to inquire whether these estimates
are asymptotically efficient.
In the independent, identically distri-
buted case, the Cramer-Rao lower bound for the. covariance matrix is
the inverse of the Fisher information matrix for one observation.
The
bound, that is, the inverse of the information matrix, in the problem
conSidered here cannot be defined in the usual sense because the observations in the sequence are neither independent nor identically
distributed, and normalizing sequences of different orders of magnitude
may be required by estimates of different parameters.
Thus the
definition of an information matrix which is analogous to the definition
30
in the independent and identically distributed case must be considered.
We define an information matrix in our case as the matrix I where the
ijth element is
2
2}
for i.,j-O, .. .,p. (2.26)
(1 -(1
-
.::.0
If the estimates are consistent and asymptotically normal and.if the
asymptotic covariance matrix is the inverse of the information matrix,
then a sequence of estimates is said to be asymptotically efficient.
The REML estimates are consistent and asymptotically normal with the
2 -1 •
asymptotic covariance matrix (I(£Q))
2
Since the matrix I(£Q) is the
same as the information matrix defined in (2.26), the REML estimates
in
the mixed model of the analysis of variance are asymptotically
efficient in the sense of attaining the Cramer-Rao lower bound for the
covariance matrix.
31
3.
THE DISTRIBUTION OF VARIANCE COMPONENT ESTIMATORS
3.1
Introduction
In this chapter we develop a general method of examining the
statistical properties of variance component estimators obtained by
equating translation-invariant quadratic forms of normally distributed
\
.
random variables to their expected values and solving the resulting
equations.
Clearly the estimators obtained in this manner are again
translation-invariant quadratic functions of the observations.
It will
be shown that distributions of these estimators can be WTitten as
linear functions of chi-square distributions, i.e., mixtures of chisquare distributions.
In general the coefficients of the individual
chi-square distributions will be functions of the true but unknown
variance components.
In a subsequent chapter the method will be used to study a number
of specific variance component estimation techniques.
The behavior of
several methods will be compared for specific values of the components.
3.2
The Method of Mixtures
In order to derive the distribution of a quadratic form of normal
random variables we require the definition of a mixture of a sequence
of distributions.
Definition 3.2.1.
If FO(x),Fl(x), ••• is any sequence of distri-
bution functions and if cO,c l ' ••• is any sequence of constants such
~
that c j > 0, j • 0,1, ••. , and
L
cj • 1, then the function F(x) •
CjFj(X) is called a mixture of the sequence of distribution functions.
4It
32
The following lemma provides a starting point for the derivation
of the distribution of variance component estiJnators.
n
If ! '" N(XS,1:), then y'AY is distributed as )
Lemma 3.2.1.
AjU
J-l
2
j
where ~, j-l,2, ••• ,n, are independent noncentral chi.-square random
variables each with one degree of freedom and noncentrality parameter
1
.. T""
j
Note that 1: .. LL
" AXS for A "s. 0 and Tl
PjL
j
,
j
is arbitrary for Aj - O.
,
and P. is the eigen vector corresponding to the non-:J
zero eigen value Aj of the matrix At.
The most commonly used methods of estimating variance components
are based on equating translation-invariant quadratic forms to their
expected values and solving the system of linear equations.
It is
clear that the estimators obtained are also translation-invariant quadratic estimators.
The definition of a translation-invariant quadratic
form is adopted in this work as follows:
Definition 3.2.2.
A quadratic y'AX is called translation-i~variant
if and only' if (Y - XS)' A(! - XS) .. X' AY for all!.
This is equivalent
to AX .. O.
It follows from Lemma 3.2.1 that a translation invari.ant estimator
n
is distributed as
,~ AjU 2 where U2j , j-l, ••• ,n, are independent central
j
j-l
chi-square variables each with one degree of freedom.
33
3.3
The Distribution of Variance Component Estimators
(Positive Quadratic F01:Il1s)
,
Let A. be an arbitrary symmetric matrix such that Y AY is translation invariant.
The eigen values of A.t can be identified such that
Al ~ A2 ~ , ••• , ~ An > O.
2
a(Un
2
One can then write
n
f..t"
j-l
'jU2 as
j
1\
A_
_2
n j
+ a 1Un_ 1 + ... + an _ 1Ul) where a • An and a j - --X-'
jal, ••• ,n-1.
n
The cummulative distribution function of a(U
2 + a U2_
n
1 n 1
+ ... +
can be written as a mixture of an infinite sequence of distribution
functions.
Theorem 3.3.1.
j-I, ••• ,n, are independent chi-square variables each with one degree
of freedom, and a, a , j-1, ••• ,n-1, are positive constants such that
j
a
j
~
0 and a
>
O.
Then for z > 0
P(Z
n-1
where Co ..
1r
j-1
~
z)·
L
k-O
~Fn+2k
<:)
_~
1 k-1
a j , ~ - 2k
i\_tCt for k > 1 such that
L
g.-a
(3.1)
34
H ..
m
n-l
1 m
,\" (1 - --)
, and F (z) is the cummulative distribution funcj-1
P
aj
tioo of a chi-square random variable Z with P degrees of freed01Jl.
Proof.
(See Johnson and Kotz (1970»
2
Z
+ ••• + an _ 1U1 , the characteristic function of ~, 'z(t), is
a
(1 - 2it)-~
n-1
1T
(1 - 2iajt)~.
Let w .. (1 - 2it)-1.
Therefore,
j-1
the characteristic function 'Z(t) can be written as
a
n-1
It is clear that IT
j-1
function of
'1
r
1 n-1
aj
1
2
(1 - -)U ..
2 j-1
aj
J
Q .. -
~k(Q;2'(1 -
(1 - (1 - L)w) -~ is the moment generating
Define
1
1k
a-)
,... , 1
2'(1 - a-» .. E(Q ).
1
n-1
Then the characteristic function 4l Z (t) can be written in terms of
a
an infinite series expansion, that is,
3S
4lZ (t) ~
,g, n-1
Z
w
(IT"
-~}
CD'
aiL
j-1
~. k
k! W
k
k-O
Define
kl
n-1
It follows that
~.
-~
(.rr aJ
)
J-1
k!
where K
m
~(l
- a
l
is the mthcummulant of Q.
l»
•
n-
~(m-1) !Em where
By using
n-1
Em·
L
.
K
m
(Q;l (l - 1
a
Z
) , ... ,
1
1 m
and
a.
J
(1 - - )
j-1
, we have that
~
k-1
.. -k g.-OL ~_g. c
n-l
Therefore, cO·
Tr
j-l
Since W
=
a-; and c k -
for k .. l,Z, ....
g.
k
k-l
L ~-t
g.-O
c t for k · l,Z, ••••
(1 - Zit)-l is the characteristic function of
X~Z)'
CD
4l z(t), then the characteristic function 4>Z (t)..
-a
I
k-O
~4>n+2k (t) .
36
Z
'Ihis implies that P (<
a -
transformation Z
a
. !a'
Z
a
)
By the linear
..
we have that for z > 0
3.4 'Ihe Distribution of Variance Component
Es~1mator~
(Indefinite-Quadratic Forms)
When. the eigen values of AI are not all positive, an estimator
y' AI is di.stributed as the difference of two independent linear func-
tions of single degree of freedom chi-square variables.
'Ihe
distribution is derived in terms of the confluent hypergeometric
function and in terms of chi-square distributions when the number of
positive eigen values and the number of negative eigen values are
even.
Define
(3.2)
for c,x
tion.
>
0 where
~(c,d;x) deno~es
the confluent hypergeometric func-
'Ihis function satisfies the confluent hypergeometric differential
equation of Kummer:
2
x
.!..f
+
dx
(d-x)
'* -
cy .. O.
37
The probability density function of the difference of two chisquare variables with m and n degrees of freedom can be written in
terms of confluent hypergeometric function as follows:
2
Theorem 3.4.1.
a,b > O.
2
Suppose'! • aX - bY where X '" X (m)' Y '" X (n)' and
The probability density function of T is given by
mrn
-t
----1
1
2
t
e
2i
1jI
[11.!!!ta.
,(a+b) t'
2' Z ' Zab j
for t
~
0
(3.3)
p(m,n)(t) •
-(~:~)t]
for
t
<
0
•
geometric functions as defined in (3.Z).
Proof.
See Press (1966).
When both m and n are even positive integers, the probability
density function p
(t) defined in Theorem 3.4.1 can be expressed as
m,n'
a mixture of a finite number of chi-square distributions as follows:
Corollary 3.4.1.
and a,b
>
O.
Suppose T • aX - bY where X '" X\Zk)' Y '" X~21) ,
The probability density function of T is given by
for t
P2k,21 (t) -
~
0
(3.4)
~ k-~
(a+H
1 \ 1-1
1a.+b1 s-o
2
ill.§.~lS
Lt)
s! Ia+b
f z (1-s) \ b
for t
~
0
38
where (k)
is the ascending factorial such that (k)
s
s
... r (k+s)
d
r (k) an
f (x) is the probability density function of a chi-square random varir
able X with r degrees of freedom.
Proof.
The proof is given for the case where t
where t
~
~
0 only; the case
0 can be proved analogously.
From (3.3), we have that
for
PZk,Z.2. (t) •
t ~
From (3.Z),
la+bj t ]
tP [.2., k+.2.; \2ab
.. -r
1
<.Z)
-1
r (.2.)
.r 1
(.2.)
(a+b\
By using the linear transformation Z.. \ Zabi tY and the gamma
function
r(a)
.. Je-z za-l dz,
we have that
o
k-l
1
tP~ .e.,k+.e.; ,Zab tJ .. r(.e.)
r
a+bj ';
r
Ik-l)' (Zab\
\s
a+b/
s-O
• r
~.e.)
k-l
r
saO
J e-z zS+i-l dz
s+.e.
all
t
-(s+.2.)
0
f(k)f(s+.e.)
f(k-s)s!
~ab)S+.2. -(s+.2.)
G+b
t
O.
39
Therefore,
k-i
P2k, 21 (t)
~
2
-t
s-k s+t-k.. s
a
-0 r (s-H. )
''-'''--::''S+~t--=--=-~~
-
saO (a+b)
_L.!-) t-l{
\a+b
2a
e
t
k-s-1
str(R.)r(k-s)
-t
k 1
~ J!l!.lJL)
1 \
a+b1'
saO
st
ra.:+h
s{e
-
2a
(~l
k-s-1
k
}
r(k-s)2 -s
Since
-.L
f 2 (k-s)
(~.
2a
(..;J k-s-1
_e_.-...;..;:a1=--__
r(k-s)2 k- s
then we have that
PZk, 21 ( t ) ·
U-)
1p:rb
t-l
k-1
W
\i+bl
~
t.
s-O
~ L-2-' s
s!
\a+b)
f
rSo\
2 (k-s) iaJ
for t
~
0
Corollary 3.4.2 gives an alternate form of the probability density
function pm,n (t) defined in Theorem 3.4.1.
This form is more convenient
for this study.
Corollary 3.4.2.
and a,b
>
O.
Pm,n (t)·
Suppose T • aX - bY where X ~
X~m)'
Y~
X~n)'
The probability density function of T is given by
.L
ab
'"
f m (t:V)
J
0
f n (~) dv
for t
~
0
(3.5)
'"
-ab1 J f n
0
-(~
fmra) du
for t
~O
where fr(v) is the probability density function of a chi-square variable
V
with r degrees of freedom.
4It
40
Proof.
The proof is given for the case where t
where t
~
0 only; the case
0 can be proved analogously.
Using (3.2) and (3.3), we have that for t
e 2a t 2
Pm,n (t) - ';;;"'m+n-"";;;~m-n---
~ b2
2
2
m+n la+b}
n
1P [ 2'"2-; \2ab
0
GO
0
GO
J
0
]
1
f
By the linear transformation S -
1
ab
t
riB!)2
-t
m+n
--1
e 2a t 2
•
!!t!l m n
2 2 a2 b 2 r(~\
21
--
~
m+n
--1
-_t
Pm,n(t)
~
1
m
e
~
-\2a
e
- (~
2ab st
m
n 1·
--1 2(1+s)2
s
ds.
r (~l
JJ
t'
r:
for t > 0
m
--1
V
j2
r( ~} 22
n
v
-2 -1
e- 2b (v)
b
dv
n
f(n)22
2
GO
1
aab
f
f
,t+v,
'v
f (hj dv.
\ a
n
m '-)
0
,
As noted above, if the estimator Y AY is an indefinite quadratic
form, that is, Ar is not positive definite, then it may be treated as
the difference of two positive definite forms.
This implies that a
r
variance component estimator X'AY is distributed as
s
r
j-1
L
i-1
AiU~
ajU~. where a j • -Ar+j' j-1, .•• ,s, r is the number of positive
J
41
eigen values, and s is the number of negative eigen values.
can be represented as
clear that
It is
~~tures
of chi-square variables.
Write
h
b • al' a
b(b s U2r+s + b s-lU2r+s-l + ••• + ~2
~~l ) were
a - '
h
r'
i
i • 1,2, ... ,r-l, and b j • ~
b ' j • 2,3;... ,so
';\,
.
r-l.
.a
'Ihe probability density
2
function of a(u; +. alU;-l +. . •. + a r- l U1 ) - b (b s u2r+ s +
as- lulr+s-1
+ ••.
2
) is derived in term~ of the confluent hypergeometrlc function.
r+l
+. U
2
Let. 'I - a(U r2 + a u rl' +... + ar-l TTu 12 ) - b('o s U2r+s ~'
l
Thee rem 3. 4. 2.
b
.,
s-l r+s-l +
U
2
+ Ur+l)'
'Ihen the probability density function of 'I
is given by
for _
where
r
to
E
:Ie
~,cl
< t
<
=
(3.6)
are the constants defined as in Theorem 3.3.1 corresponding
2
0jUr+j' respectively, and Pm,n(t) is the probability
density function of the difference of two independent chi-square variables as defined in (3.3).
...
.,
42
and V are independent random variables.
From Theorem 3.3.1, the
cummulative distribution function of a random variable U
QII
Gu(u) •
Jo ~Fr+2k{~)
for u
~
(3.7)
0
where Fp (u) denotes the cummulative
distribution function of a chi,
square random variable U TNith P degrees of freedom.
It is clear
that Fr+2k (:), k • 0,1, ••• , are differentiable, that is,
~
fr+2kl:) .. ;
F~2k [~)
where f p (u) denotes the probability density
function of a chi-square random variable U with P degrees of freeco
dom.
Since the series
c
[.\' ...£
a f r+2k
k-O
(U)
a
converges uniformly on
every finite interval of u, then if converges to the function
Su(u) where gU(u) .. GU(u).
Therefore,
QII
8u(u)..
r .f fr+2k
k-O
(:)
for u
QII
By the similar argument, we have that
r
~
(3.8)
O.
c*
b i f s+2i
(~)
converges
i=O
uniformly on every finite interval of v to gv(v) where 8V(V) ..
,
G (v).
v
Therefore,
43
for v
(3.9)
> 0
By the convolution formula, the probability density function of
a random variable '! is given by
CD
~(t)
-
f
for t
iu(t+v) gV(v) dv
(3.10)
> 0
o
Substituting (3.8) and (3.9) into (3.10), we have that for t
CD
h.x(t)
-
f
CD
.
~ f r+2k I~]. :
[L
o
I
r !
r :1 f s+2.9. (;rJ'
dv
•
;,~fr+2k
~
(t+v)
-;- and the series
k-O
c.*
1-0
*
L 1-0 .
k-O
Since both the series
co
CD
~_O
,.,
fs+ .9.[ ~) converge uniformly on every finite interval of
2
v, then
h.r(t)...
r
ft+V
}
a
1<;-0
f s+22.
By Corollary 3.4.2, we have that
CD
co
for t
For t
~
>
0
0, h.r(t) can be derived analogously by using the con-
volution formula
44
h.r(t) -
f
gv(-t+u)
~(u)
for t
du
~
0
o
When both r and s are even positive integers, that is, r - 2p and
s - 2q, the probability density function of a random variable T is
given by
r
I
h.r(t) -
~
for t
0
I
I
(3.11)
\'
I
co
co
l ~o
r
*
.2.-0 ~c.2. \a+bl
i.JL) p+k-Lt.J:.j q+2.+1. (p+k)s (~ s
( t,l
.Ia+bl
s 1.
a+bJ f 2 (q+~-s) -bJ
sIo .
for t
~
o.
In applying the results from Theorem 3.4.2 when both r and s are
even positive integers, the formula which appears well-suited for
co~
puter calculation of the cummulative distribution function of a random
variable T is
,'/
i
co
r r
co
c*{~
~. k-O ~-O ~ .2. -a+H
t'saOt
q+.2. [.....k-l
(g+~)s
s.
S( 1 w
\a+bJ
F
{ta IJ J]
2 (P+k-s)
for t
>
0
H.r(t)
(g+k) s I a' s 1
s!
(a+b)
( -
F
t-t1)))]
2 (q+.2.-s) ;
for t- < O.
45
where F (t) denotes the cummulative distribution function of a chip
.
square random variable T with p degrees of freedom.
c.~ (a+bb} p+k
~
d... _ (t) -
r+t-l
,\
Define
(pZ~) S (a+ab) s (1 - F2 (q+1-s)
]
(-t
b ))
saO
where
k~..t
..
O~l~....
Then the cummulative distribution function of a
random variable T can be written as
for t
~
0
for t
~
0
H.r(t) ..
For our programming purposes the steps in the calculation of the
cummulative distribution function of a random variable ! for t
~
0 are
set out as follows:
a)
For any k, co~ute
N
r
d~k(t) where N is the number such that
2.-0
d~k(t) is less than a prescribed magnitude ~, usually 10
b)
Note that the
{~}
-4
for
2. ~
N+l.
are computed using the formulas given in
Theorem 3.3.1.
c)
Compute
4
in less than a prescribed magnitude €, usually 10- , for
it
k ~ N +1.
46
Therefo re,
H-(t) • I _
L
~* ~
~O
9.=0
* L.!...l q+9.[P+~-1 (g+9.)s W s
~c1
\a+bl
s=o
s!
la+bl
(t))]
(1-F2 (P+k_S);
for
t ~
0
In the case where t < 0, the steps in the calculation are similar to
those when t
~
O.
It is easily seen that the distribution depends only on a,b,a.,
J.
i • 1,2, ••• ,.r-l! and b j , j • 2,3, .... ,s.
In the case where one of these
numbers is large when compared with the other, the series will converge slowly.
One should change a prescribed magnitude
€
to 10-5 to
include more terms.
The cummulative distribution obtained from the method developed
has been compared with the cummulative distribution obtained from the
Monte Carlo method.
~(t)
As an example, we consider the evaluation of
2
2
.2
2
2
2
.2
• P(.4SU I + .2SU 2 + .2SU3 + .2SU4 - .13U S - .13U 6 - . 1307
-
2
.2lU 8
< t)
for different values of t.
The cummulative distribution from the Monte
Carlo method is obtained from the relative cummulative frequency distri2
2
bution of 30,000 random variates distributed as (.45X(1) + .2SX(1) +
222
222
.25X(1) + .2SX(1) - .13X(1) - .13X(1) - .13X(1) - .21X(1»·
results are shown in Table 3.4.1.
The
47
Table 3.4.1
The cummulative distribution obtained
from the method developed and the Monte Carlo method
of T with
€ •
10- 3
H.r(t)
t
Exact
Monte Carlo
-1.50
.00550
.00657
-1.00
.02415
.02427
- .50
.08662
.08990
.00
.26421
.26660
.50
.52368
.52187
1.00
.72299
.72017
1.50
.84694
.84533
2.00
.91768
.91430
2.50
.95663
.95327
3.00
.97735
.97440
3.50
.98871
.98603
4.00
.99507
.9-9300
e
48
Note that the method developed depends also on when the series
is terminated.
In this example, all series are terminated when the
contribution from the remaining terms is less than 10- 3 , the cummulative distribution obtained from the method developed is less than that
obtained from the Monte Carlo method when t
0 and greater than that
<
obtained from the Monte Carlo method when t > O.
Allowing more tems
of the series will give the more accurate results.
It is of interest to know what the graph of the probability denr
sity function of T, where T •
r
i~l
2
AiU i -
s
r
j-l
.
CL •
J
2
Ur+j' looks like.
In
Figure 3.4.1 the graphs of the probability density functions of T, where
4
T•
. 4
2
AU
CL.U "
r
r
4
i 1
j-1
i-1
?
J
~J
, for three sets of Ai' i • 1,2, 3 , 4, and
CL. ,
J
j .. 1,2,3,4, are shown.
It is noticeable that when the ratio
becomes large, the
graph of the probability density function shifts to the right and becomes flatter.
Also the tail of the graph on the right is larger than
the other, that is, the distribution is positively skewed.
.60
r-----~'=_----"-----,-----.----......,r-----I
i
I
I
(.45 •• 25 •• 25 • • 25. -.13. -.13. -.13. -.21)
.40
(.61 •• 33 • • 33 • • 33. -.13. -.13. -.13. -.21)
,.."
4.J
......,
r
.20
1.1625. 1.1625. 1.1625.
-.16. -.16. -.16. -.16
-2.0
0.0
2.0
4.0
6.0
8.0
t
lo'igure 3.4.1
4
Graphs of the p.d.L of T I where T ""
r
4
2
2
"iUiaju +jl for
r
i;;1
.
j;;i
l
three sets of ("l" "2 1 "3 1 "41 -all -a 21 ~a31 -a )
4
,f:-
\D
e
e
e
so
4.
COMBDUNG INFORMATION FROM SEVERAL EXPERIMENTS
4.1
Introduction
, It is not unusual to encounter situations where an experimenter
wishes to estimate variance components and has at his disposal data
from a number of experiments.
If these experiments have identical
designs, then it is clear that one should combine information by
averaging estimates from individual experiments.
If, however, these
experiments happen to have different designs, then he may select any
one of a number of possible teclmiques of analysis.
different choices may lead to different answers.
Unfortunately,
!he theory developed
in Chapter 3 is now used to study and compare properties of four
reasonable alternative variance component estimators.
The rationale is
that this situation has sufficient structure to permit study and jet
give some insight to the fully general unbalanced case.
Despite the
limited nature of this study, that is, combining pairs of balanced
experiments, certain general trends appear.
We find, for example, that
the straightforward technique of simply computing averages of estimates
obtained from separate analyses of variance may be very inefficient.
The following four methods of obtaining estimates are considered:
a)
ance.
Pool the sums of squares from the separate analyses of variThis can be thought of as an adaptation of either Henderson's
method 1 or method 3.
b)
Compute the mean of the estimates obtained from the separate
analyses of variance.
51
c)
Compute the averages of the mean squares in the analyses of
variance, equate to their expected values, and solve for the estimates.
d)
Apply the MINQUE theory discussed in Chapter 2 to the
unbalanced data set.
It can be shown that in this case the estimators
are weighted functions of the sums of squares in the individual
analyses of variance with weights that depend on the true variance
components.
In practice prior estimates of the components must be
used when computing these weights.
For purposes of this study it was
assumed that the true values were known and consequently the results
must be interpreted as providing a bound for the technique.
The estimates will be compared on the basis of variance, probability of yielding negative estimates, and the 95th percentile of the
distribution.
By their very nature, methods a, b, and c are unbiased.
Method d is also unbiased if one uses
fL~ed
prior values for the
components and accepts the occasional negative estimate.
Truncating
the distribution by replacing negative values by zeros destroys the
unbiasedness.
The effects of iterating the process are not investi-
gated in this study.
4.2
MOdel and Methods
We assume the conventional linear model for the rxc table with
interaction and n sub samples
where ~ is an unknown constant, the {r },' {c j }, {rc }, and {£ijk}
ij
i
are all independent normal random variables with zero mean and variances
4It
52
a 2 ' a2 , a 2 , and a 2 , respectively.
c
rc
e
r
For purposes of this chapter, it is
convenient to rewrite the model as
(4.1)
It follows that
itions.
When information from a series of m experiments with common vari-
2
2
2
2
ance components, a , a , a ,and a , is to be combined, the'model (4.1)
r
c
rc
e
will be indexed by subscript k, k-1,2, ••• ,m.
Four methods of combining
information and obtaining the estimates to be investigated are the
method of pooling sums of squares (method a), the method of computing
mean of the analyses of variance estimates (method b), the method of
averaging mean squares (method c), and the MINQUE method (method d).
In each case, estimates are obtained by equating quadratic fOrMS to
their expected values and solving the resulting system of equations.
These equations will be listed as follows:
a)
The system of equations obtained by pooling sums of squares
(method a) is given by
m
m
I
ck~(rk-l)
r
(r.-l)
k-l
m
j-l
J
m
I
~(rk-l)
I
(rj-l)
k-l
a"2r + m
jal
.. 2
.. 2
cr
+
cr
rc
e
I
s~
r
(rj-l)
k=l
=
m
j-l
53
m
r
.,-;;k~-;;;.l
rk~('it-1)
m
r
j-l
(cj
m
r
.c
-1)
~('it-1)
&2 + ._k-___.1
m
&2
m
+ &2
rc
r
•
r.SSSc.
__·_ _
~k.-~·;;;.l
m
€
r
(cj-l)
j-l.
(Cj-l)
j-l
(4.2)
b)
For the method of computing the mean of the analyses of vari-
ance estimates (method b), we first find the analysis of variance
estimates obtained by solVing the system of equations (4.3) for each
experiment and then find the mean of these estimates from m experiments.
!he system of equations for each experiment is given by
2 +"2
cna.. r2 + na.. rc
a€
= MSR
rna.. 2 + na.. 2 + a.. 2
c
rc
€
= MSC
(4.3)
.. 2
.. 2
na rc + a €
= MSRC
&2
• MSE •
€
c)
The system of equations obtained by averaging mean squares
(method c) is given by
m
l:
2
o
m
r
2
o
n c
+
n
+
k-l k k r k-l k rc
2
ma €
m
•
I
MSR..
1<.-1
nk
54
m
2
~ tl l 0
k-1 1 k C
- m
2
+ ~ tlk 0 +
1<.-1
rc
r
m
2
ma. e:
m
=
(4.4)
-2
-2
m
~crrc + mcr e: • ~ MSRCk
k-1
k-1
2
ma e:
d)
r MSCk
k-1
m
•
r MSE-
~
k-1
The system of equations obtained by the MINQUE method (method
d) is g:1.ven by
m (ri.-1)cktl 2 -2
m (rk-1)ckn -2
k
k
cr
+
cr
cr
+
L
k-1 (E (MS~» 2 e:
k-1 (E(MS~»2 rc
k-1 (E(MSR ~ » 2 r
m
2 2
(rk-1)~nk -2
r
r
•
m ~tlkS~
r
1<.-1 (E(M~»2
2 2
2
m (ck-1)rktl
m (ck-1)rk~ -2
m (ck -l)rk~ .2
k .2
cr
+ E
cr
cr· + E
1<.-1 (E(MS~»2 rc
k-1 (E(MSC »2 C
k-1 (E(MSC »2 e:
k
k
r"
-E
t
L
(4.5)
m rk~SSCk
kw1 (E(MSC »2
k
2
2
.. (
) 2
m· (~-l)rk~
.2
m
rk-1
~
cr-2 + L
cr +
E k
k-l (ECMSC »2 c
1<.=1 (E(MS~»2 r
1<.=1 (ECM~»2
k
m
(r
-1)~~
2
+
(~-1)~
CECMSC »2
k
55
(rk-l)(~-l)
(E(MSRC ))2
k
+
SS~ )
(E(MS~))2
.
It is easily seen that the estimates obtained in each case are
weighted linear functions of the sums of squares in the separate
analyses of variance, i.e.,
The"
~~ected
SS~,
SSC , SSRC , and
k
k
SS~,
k-l,2,.,m•
•
values of these sums of squares are
(4.6)
E(SSRC.) • (r, -l)(c, -1)0'
l<.
t(.
l<.
2
€
?
+.,'-kl<.
(r, -1) (ck-1)0'rc
and the variances of these sums of squares are
Var(SS~)
2(E(S~)]2
•
(rk-1)
(4.7)
Var(SSC k ) ·
2(E(SSC )]2
k
(~-1)
56
2[E(SSRC )]2
k
Var(SSR~) .. (rk-l)(~-l)
Var(SS~)
&t
..
2[E(SSE )]2
k
for k-l,2, ••• ,m.
(1)
rk~~-
In order to apply the theory developed in Chapter 3 the sums of squares
must be written in the form:
J
V
kl
N
k
C
k3
C~, - - -r-"-n- , C
c:. t1.
k2
-K K
k-k"K.
element is equal to one, and
~k
is an N x
k
~k
fore, the estimates obtained can be written in
fo~ y' A..!.
, and
identity matriX.
te~s
There-
of a quadratic
These estimates are translation-invariant quadratic esti-
mates because we have the condition AX = 0 in all cases.
Recall that the distribution of a translation-invariant quadratic
- - .
estimator y' AY delJends only on the eigen values of the matr:ix AI: where
4 .. Var(Y).
values.
positive.
Clearly the estimates of interest can assume negative
These may happen because the eigen values of AI: are not all
The method developed in Chapter 3 of finding the distribution
of variance component estimators will be discussed in more detail in
Section 4.3.
In Section 4.4 to Section 4.6 the method will be applied
57
to find the probabilities of negative estimates and the 95th percentiles
of the distributions of the estimates.
4.3
Distribution and Variances of Estimators
The four estimates will be compared, in Section 4.4 for a set of
unbalanced designs, by using three criteria, variances, the probabilities of negative estimates, and the 95th percentiles of the
distributions of the estimates.
method developed in Chapter 3.
The last two are obtained from the
Also, the properties of the MINQUE
method for selected unbalanced designs
~d
selected values of the true
variance components will be examined in Sections 4.5 and 4.6.
The estimates are linear functions of the independent sums of
squares in the separate analyses of variance.
of these
estima~es
Consequently variances
are simply obtained by using (4.7).
,
The estimates are also of the form YAY.
Consequently, the method
of finding the distribution of quadratic forms can be applied to find
the probabilities of negative estimates and the 95th percentiles of
the distributions.
Recall that the variance component estimator
!
,
AY
is distributed as
-Qj' j-l,2, ••• ,s, such that Ai and Q
j are positive numbers, are the
eigen values of a matrix At.
2
2
This can be written as a(U r + a1Ur- 1
i-l,2, ••• ,r-l, and bj
.. ~b'
j-2,3, ••• ,s.
This
58
study is restricted to the case where both the number of positive
eigen values and the number of negative eigen values are even, that is,
both r and s are even.
Define T ..
2
AU i
i
s
L
j-l
2-
C%jUr+' ~
J
The
probability density function of T when r - 2p and s • 2q is given by
for
h.r(t) ..
~
t:O
~
t
>
0
(4.8)
lo·
c*(.JL)p+k-l/...L) q+t-l !P+k)s a s
(_ !')
t a+b
'a+b
s! (a+b) f 2 (q+t-s)
b
for t
~
O.
The distribution of the estimators obtained from the four methods is
the same as that of T.
For discussion purposes, a random variable T
is used to represent any estimator.
The probability of negative estimates and the 95th percentile of
the distribution are obtained from the cumulative distribution function
of T.
The formula which appears well-suited for computer calculation
of the cumulative distribution function of T is
*(.....L)q+t [P+k-l (o+.7.)s
b s
t ~
r
r
~Ct a+b
L
sl
(a+b) (l-F2 (P+k-s) (a»J
k-O taO
saO
CD
1 ~(t)
..
CD
for t > 0
for t
<
O.
(4.9)
59
since the cumulative distribution function HT(t) is of the infinite series form, the results in Section 4.4 to Section 4.6 are then
calculated by asing
for t
~
0
for t
~
O.
(4.10)
where N is the number such that
is less than 10-4 for t
N+l,
~
N* is the number such that
is less than 10-4 for k
N+l,
~
M is the number such that
r....E-)P+k [
Ck\a+b
q+9.-l
\'
'0
s·
-4
is less than 10
for k
(P+k)s ( a )s(
sl
~
-;H;"
'- t»]
I-FZ(q+R._s)l;""b
M+l,
M* is the number such that
*
CNd
b p+k q+9.-1 (~k)
a s
t
~1(.\I.a+b)
[ \ ,'
~S! s I.\~) (1 - F 2(q+t-s) (-b
k-O
s-O
M
\'
(.
-»
]
.
-4
.
is less than 10
for 1
By calculating
1i: (t)
60
~
M*+l.
for several values of t a.n.d interpolating,
the 95th percentile of the distribution of T is obtained.
4.4 A COmDarison of Four Methods of Estimation
In this section the four methods of estimation discussed in
Section 4.2 will be compared for eight experiments.
The combined
experiment is defined as a pair of dissimilar but balanced exper1ments.
The behavior, 1. e., variances, the probabilities of yi.elding negative
estimates, and the 95th percentiles of the distributions, of &2, ;2,
r
c
&2 , and &2 obtained from all four methods and for eight combined
rc
e:
experiments are considered under a set of true variance components,
2 · ' 4 , cr 2rc • .8, an'd cr e:.
2
1 ••
0
cr 2r =- 10
• , crc
To facilitate the later discussion, the methods will be referred
r
to as simply methods a, b, c, and d.
The seructure
be used to represent the combination of
~~o
(r
l
2
experiments,
the number of levels of row factor in the kth experiment,
~here
~
l.<:.
number of levels of column factor in the kth experiment, and
number of subsamples in the kth
~~eriment,
r, is
«.
is the
~
is the
where kw l,2.
Variances, the probabilities of negative eseimates and the 95th
percentiles of the distributions of
ar2
obtained f~om four methods of
estimation and eighe combined e."tperiments are shown in Table 4.4.10
61
Table 4.4.1
~2
~2
Variances of a , probabilities of negative a , and the 95th
r
r
~2
percentiles of the distributions of cr
r
from four methods and eight
2
2
2
2
combined experiments when ar • 1.0, ac • .4, a~ • .8, and a& • 1.0
•
No
Design
Method
1
2 2 2)
(422
a
1.573
.194
3.417
b
2.097
.218
3.758
c
2.097
.218
3.758'
d
1.569
.193
3.393
a
2.259
.255
3.960
b
2.468"
.256
4.086
c
2.291
.256
3.965
d
2.236
.255
3.963
a
.999
.116
2.978
b
1.871
.178
3.549
c
1.230
.143
3.143
d
.992
.105
2.936
a
1.587
.190
3.431
b
1. 825
.200
3.593
c
2.545
.223
4.047
d
1.489
.187
3.366
a
.753
.078
2.636
b
.823
.098
2.680
c
.764
.080
2.640
d
.745
.082
2.613
2
2 2 2)
(242
3
4
5
4 2 21
(4 4 2)
95th value
For each combined experiment, variances of &2 obtained from methr
ods a and d are appreciably less than those from methods b and c, and
variance of
02r
obtained from method d is less than that from method a.
But there is an almost equal distribution of incidences in which
variance
0f
aA2 obtained f rom metho d b is greater than, equal to, or
r
less than that from method c.
Therefore, when considering the criterion
of variance, methods a and d yield more accurate results than methods b
2
and c, and method d yields the most accurate results on 0 of all four
r
63
methods.
Note that method d yields the minimum variance quadratic
unbiased estimates of the variance components.
In general, methods b and c are more likely to yield negative
estimates for (12 than a and d.
r
... 2
The probability of a negative (1
r
obtained from method d is less than from method a, except for combined
experiment number 5.
In this combined experiment, the magnitude of
the difference between these two methods is small.
The probabilities
of yielding negative
a; from methods b and c depend on the experiments
which are combined.
It is notable that methods band c frequently give
the same results.
When the criterion
o~
95th percentile of the distribution is used,
one has to give some consideration to the probability of yielding
... 2
negative (1 because the shape of the density function is different.
r
The results, in general, agree with the results when the variance
criterion is used.
In some cases, it happens that one method gives a
... 2
smaller probability of yielding negative (1 and a larger 95th percentile
r
of the distribution, while the other gives a larger probability of
... 2
yielding negative (1 and a smaller 95th percentile of the distribution
r
(see combined experiment number 5; compare methods a and d).
Therefore,
these sometimes make a comparison impossible, and in this case variance
may be the best criterion.
From the above discussion, it can be seen that a conclusion cannot
be made as to the desirability of methods band c because they depend
on the experiments which are combined.
But it is notable that they
64
frequently give the same results.
Therefore, one may say that when
all are considered, method d is the most desirable, method a the
second most desirable, and methods band c the least desirable of all
four methods.
Variances, the probabilities of negative estimates, and the 95th
percentiles of the distributions of ;2 obtained from four methods of
c
estimation and eight combined experiments are shown in Table 4.4.2 •
... 2
... 2
Variances of a , probabilities of negative a , and the 95th
c
c
percentiles of the -distributions of·&2 from four methods and eight
c
2
combined experiments when a • 1.0, a 2 • .4, a2 • .8, and a2 • 1.0
c
r
rc
€
Table 4.4.2
e
... 2
Var(a )
c
P(a 2
0)
No.
Design
Method
1
~ ~
a
.806
.375
2.135
b
1. 043
.366
2.302
c
.837
.364
2.144
d
.803
.374
2.130
a
.763
.328
2.012
b
1.017
.348
2.217
c
1.017
.348
2.217
d
.761
.325
1.989
a
.330
.259
1.529
b
.856
.309
2.042
c
.505
.280
1. 711
d
.329
.250
1.490
2
3
2
2
{~
(~
2 2'
4 21
2
4
~
<
c-
95% value
65
Table 4.4.2 (continued)
No.
Design
4
~i~
5
6
7
8
(4 2 2)
442
~
i :)
(4 4 2)
424
I~ : ~
Method
var(&2)
c
p(&2 < 0)
c-
95% value
a
.797
.367
2.124
b
.846
.352
2.144
c
.846
.352
2.144
d
.739
.362
2.074
a
.280
.226
1.439
b
.374
.257
1.592
c
.374
.257
1.592
d
.280
.229
1.426
a
.642
.346
1.952
b
.797
.341
2.093
c
.662
.340
1.965
d
.641
.347
1.954
a
.282
.223
1.439
b
.324
.234
1.509
c
.452
.274
1.717
d
.266
.216
1.409
a
.266
.225
1.376
b
.282
.212
1.370
c
.282
.212
1.370
d
.246
.212
1.346
~
e
66
For each combined exper~ent, variances of
a2c
obtained from
methods a and d are significantly less than those obtained from methods
b and c, and variance of &2 obtained from method d is less than that
c
obtained from method a.
Changes in variances
0f
a.. 2 obtained f rom
c
methods b and c are directly related to the experiments which are
combined.
Therefore, when considering a criterion of variance, methods
... 2
a and d yield more accurate results on a than methods b and c, and
c
... 2
method d yields the most accurate results on a •
c
For four out of eight combined experiments, that is, combined
experiment numbers 2, 3, 5, and 7, the probabilities of observing nega... 2
tive a when using methods a and d are significantly smaller than when
c
using methods band c.
Method d, in general, gives a smaller proba-
... 2
bility of a negative a c than method a.
... 2
obtaining negative cr
c
As before, the probabilities of
when using methods b and c depend on the experi-
ments which are combined.
When considering the 95th percentile of the distribution of ;2
c
... 2
together with the probability of yielding negative a
c
as the criteria,
the results are the same as the results when considering variance as a
criterion.
These have been discussed in the preceding paragraph.
We
conclude that method d yields slightly better results on &2 than method
c
a and considerably better results than methods b and c.
67
Variances, the probabilities of negative estimates, and the 95th
..2
percentiles of the distributions of cr
obtained from four methods of
rc
estimation and eight combined exper::fJnents are shown in Table 4.4.3 •
Table 4.4.3
.. 2
... 2
Variances of arc' probabilities of negative arc' and the
... 2
95th percentiles of the distributions of arc from four methods and
2
2
2
eight combined experiments when a r • 1.0, cr c • .4, arc • .8, and
No.
1
2
3
Design
·Method
95% value
a
.887
.181
2.644
b
1.174
.213
2.905
c
1.174
.213
2.905
d
.873
.181
2.604
a
.887
.181
2.644
b
1.174
.213
2.905
c
1.174
.213
2.905
d
.882
.184
2.615
a
.363
.062
1.914
b
.978
.139
2.744
c
.978
.139
2.744
d
.359
.066
1.903
68
Table 4.4.3 (continued)
69
..2
For each combined experiment, variances of a when using methods a
rc
and d are significantly smaller than when using methods b and c.
Also
",2
method d leads to a smaller variance f or a than method a. It is
rc
..2
notable that variances of a
obtained from methods b and c are often
rc
the same. Therefore, when considering variance as the criterion,
methods a and d yield more accurate estimates of all.
'the probabilities of obtaining negative estimates when using methods a and d are, in general, appreciably· smaller than when using methods
b and c except for combined experiment number 4.
In combined experiment
number 4, the probabilities of negative &2 when using the four methods
rc
are in the order a<b<d<c, but the magnitude of the differences between
pairs of methods are small.
When methods a and d are compared, it is.
.. 2
clear that the probabilities of negative a
from these two methods are
rc
nearly equal.
For five out of eight combined e."tperiments methods b and
.. 2
c have the same probability of a negative a • In the remaining three,
rc
method b leads to a smaller probability of a negative estimate.
,,2
When considering the 95th percentile of the distribution of cr
,,2
together with the probability of yielding negative cr
rc
rc
as the criteria,
the results are the same as the results when considering variance as a
criterion.
These have been discussed in the preceding paragraph.
,,2
Therefore, method d proves to yield the most accurate results on cr
rc
of all methods.
Variances, the probabilities of negative estimates, and the 95th
percentiles of the distributions of &2 obtained from four methods and
e:
eight combined experiments are shewn in Table 4.4.4.
70
Table 4.4.4
... 2
Variances of a e , probabilities of negative a... e2 , and the
.. 2
95th percentiles of the distributions of a
e
from four methods and eight
2
2
2
combined experiments when a r ,. 1.0, a - .4, a 2 ,. .8, and a e • 1.0
c
rc
No.
Desiga
1
{; ; ~
2
3
4
5
(~
(~
2
4
2
4
~)
~)
Ii 22 Z)
(: 42 ~
Method
... 2
Var(a e )
95% value
a
.167
1.753
b
.188
1.827
c
.188
1.827
d
.167
1.753
a
.167
1.753
b
.188
1.826
c
.188
1.826
d
.167
1.753
a
.100
1.616
b
.156
1.750
c
.156
1.750
d
.100
1.616
a
.100
1.616
b
.104
1.711
c
.104
1.711
d
.100
1.616
a
.083
1.537
b
.094
1.582
c
.094
1.582
d
.083
1.537
71
Table 4.4.4 (continued)
No.
Design
6
7
8
Method
2
Var(a )
e
95% value
a
.056
1.451
b
.063
1.471
c
.063
1.471
d
.056
1.451
a
.050
1.437
b
.052
1.446
c
.052
1.446
d
.050
1.437
a
.050
1.437
b
.052
1.446
.052
1.446
.050
1.437
d
It is rather striking that methods a and d appear to give the same
variance
01:~
.. 2 f or all co mb me
. d experJ.Il1ents.
.
a
e
This condition also occurs
'Nith methods b and c.
But methods a and d yield more accurate estimates
than methods b and c.
w"hen using the 95th percentile of the dist=ibu-
tion of &2 as the criterion, similar conclusions obtain except that in
€
combined
e~eriment
number 4 the 95th percentile of the distribution of
&2 for method d is larger than the corresponding value for method a.
e:
When the results from Table 4.4.1 to Table 4.4.4 are combined, one
may conclude that method d is the most desirable, method a the second
most desirable, and methods b and c the least desirable of all four methods.
Since method d appears to be the most desirable of the four, the
next tvo sections TNill be devoted to examining the effects of number of
e
72
levels of the random factors and the effects of size of the true
variance components.
4.5
!'he Effect of the Number of Levels of Each Rand01U Fa.ctor
In this section the effects of the nu:tber of levels of each random
#
h
... 2 ... 2 ... 2
... 2
...actor on t e estimates, ~ , ~ ,~ ,and ~ , are e..'"tamined whc using
r
c
rc:
~
method d and sixteen combined experi:1ents.
The combined e:tper:iments
are ~a:in defined as pa irs of independent t"'.JO-way balanced e:tperiments.
There are sixteen. observations in the combined exper..:nent denoted by
2
2
2
2
~
and twenty-four in the combined e.."qler..:nents denoted by
42 . 2J2' '
~
(2
and ~2
,
2
2
4'
21·,
For the rest of the combined
e:tperiments eacit involves th1r-::y-two observations.
T:1.e behavior, i. e. ,
variances, the probabilities of negative estimates, and the 95th percentiles 6f the distributions,
•
o~
the estimates,
are examined under a set of true variance
...2
~
r
... 2 ... 2
, a ,a
... 2
and a_,
~c:'
c
~
co~ouents, a 2 .. 1.0, J2 .. .4,
r
c
~2:,c ... 8, and ~:... .. 1.0.
Variances, the probabilities of negative
pe::'c:!!ltiles of the disttibutions of
... '?
~-
1:'
esti~tes,
and the 95th
r,.onc using :lethod d and
combined exper-=:ents under a set of true variance compouents,
J
2
c
. . . 4,
'?
a-
re
• .8, and a
2
e
• 1.0, are shown in Table 4.5.1.
si:~een
2
~_
..
.. 1.0,
As defined
above, a combined expert:ent is a pair of experi:ents so to read Table
e..~eri-
4.5.1 locate :he row and the column labeled -Mitn the pair of
cents, for ezample, the results for combined e:Qeri.:lent \;
1-
found at the intersection of row 1 and colucn 4.
4.5.1 is symmetric.
~otice
2
2
2\
2J
are
that Table
Table 4.5.1
Variances of aA2 , probabilities of negative aA2 , and tbe 95th percentiles
r
r
0
f the distributions
0
f
"2
2
2
2
2
a •• 4, a rc • .8, and a £ • 1.0
a r when using method d and sixteen combbled experiments witb a r • 1.0,
,c
(4
2
(2
2)
4
2)
(2
2
(2
4)
2
2)
Experiment
A2
Var(a )
pea.. 2<0 )
95% value
Var(02)
p«(j2<0)
. r-
95% value
Var(a.. 2)
r
r
r-
r
95% value
p(a~O) 95% value var(o~) p«(j2<0)
r-
(4
2 2)
1.048
.145
2.927
1.291
.166
3.207
1.489
.187
3.366
1.569
.193
3.393
(2
4 2)
1.291
.166
3.207
1. 791
.213
3.727
2.096
.241
3.918
2.236
.255
3.963
(2
2 4)
1. 489
.187
3.366
2.096
.241
3.918
2.601
.258
4.136
2.842
.272
4.306
(2
2 2)
1.569
.193
3.393'
2.236
.255
3.963
2.842
.272
4.306
3.145
.283
4.500
.'
w
'"
e
e
e
74
Clearly combined experiment
2
estimating a •
2)
2z· 2
( "4'
is the most desirable for
When considering only combined experiments involvi..."1g
r
thirty-two observations, one obtains a better estimate of a2 when using
r
eight rows than when using only siX rows, and when using six rows than
when using only four rows.
This likewise occurs in combined experiments
involVing twenty-four observations.
With the same number of observa-
tlons and the same number of rows, combined experiments which have
larger degrees of freedom for the interaction between row and column
2
effects provide better estimates of a , .that is, using combined experir
(~
ment
2
4
~)
with six degrees of freedom for interaction is better
than using combined experiment
and combined' experiment
(~
4
2
~)
(~
4
4
~)
or combined experiment
experiment
(~
4
2
2)
2
2)
2
2
(i
with four degrees of freedom,
is better than combined e."t'periment
(~
2 4,
2
4
(~
2 .4)
However, there are large
22'
differences among combined experiments (;
2
2
using combined
.~so
with four degrees of freedom for interaction is
bet,ter than combined experiment
(~
J •
4
4
~ ) , (~
Also, there is a
:)- with the last being the least desirable.
large difference between combined experiments
(~
i .~ ) , and
4
2
iJ
and
(~
2
2
iJ
2
It is notable that one can obtain a better estimate of a r when the
total number of observations is twenty-four than when it is thirty-two.
This can happen when the combined experiment with a smaller number of
observations has larger degrees of freedom for interaction, for
e.~ample,
75
using combined experiment (4
\2
2
2
-(2
2)
2 or 2
4
2
~
each with four degrees
of freedom is better than using combined experiment
degrees of freedom.
(i
2
2
:)
with
two
Also, when the size of O'r2 not being so small
combined experiment with a smaller number of observations but a larger
2
number of rows may provide a better estimate of 0'.
r
combined experiment
(~
4
4
~) , (~
(i
2
2
For example, using
22) is better than using combined experiment
2
2
4
2
Variances, the probabilities of negative estimates, and the 95th
... 2
percentiles of the distributions of O'c when using method d and sixteen
combined experiments under a set of true variance components, 0'
2
2
-2
O'c • .4, 0' rc • .8, and 0' e: • 1.0, are shown in Table 4.5.2.
2
r
a
1.0,
The struc-
e
ture of Table 4.5.2 is the same as that of Table 4.5.1.
Clearly combined experiment (
estimating 0' 2 •
C
2
2
:
;) is the most desirable for
When considering only combined experiments involVing
thirty-two observations, using eight columns is better for estimating
0'2 than using only four or six columns.
c
This also occurs in combined
experiments involving twenty-four observations, that is, using six
columns is better for estimating 0'2 than using only four columns.
c
With
the same number of observations and the same number of columns, combined
experiments which have larger degrees of freedom for the interaction
between row and column
effects provide better estimates of O'~, that is,
e
e
e
Table 4.5.2
Variances of
A2
0 •
c
probabilities of negative
when using method d and sixteen combined experiments with
(2
(4
4 2)
2
A2
0 •
c
0
2
r
e·
.
and the 95th percentiles of the distributions
-
0
f A2
0
c
1.0. a 2 - .4. a 2 - .8. and a 2 • 1.0
c
rc
I:
2)
(2
2 4)
(2
2 2)
Experiment
A2
Var(o )
c
A2
p (0 <0)
95% value
A2
Var(o )
c
A2
p(o <0)
95% value
A2
Var(o )
A2
p (0 <0)
95% value
A2
Var(o )
c
A2
p (0 <0)
95% value
c-
c-
c
c-
e-
(2
4
2)
.508
.290
1.689
.519
.305
1.159
.610
.322
1.951
.161
.325
1.989
(4
2
2)
.519
.305
L 159
.561
.346
1.895
.139
.362
2.014
.803
.314
2.130
(2
2
4)
.610
.322
1.951
.139
.362
2.014
1.131
.363
2.365
1.291
.314
2.529
(2
2
2)
.161
.325
1.989
.803
.374
2.130
1.297
.374
2.529
1.525
.383
2.640
"
0\
77
.
(2
using combined experJ.ment
4
4
2
~) with six degrees of freedom for
(~
interaction is better than using combined experiment
Combined experiment (44
four degrees of freedom.
4
~)
2
il with
2
2
.
degrees of freedom is better than using combined experJ.ment
with four degrees of freedom or combined experiment
degrees of freedom.
Also combined experiment
(i
(~
2
degrees of freedom.
4
combined experiments (4
(42
2
2
:)
with four
(~
~) with
2
2
However, there are large differences among
2
2
2
2
last being the least desirable.
·
six
:) with two
2
i)
2
degrees of freedom is better than combined ext'eriment
~o
2
with
betiJeen combined experiments
(42
2
2
:) , with the
Also, there is a large difference
2
2
~)
and
(i
2
2
i) .
For combined
experiments involving thirty-ewo observations, one may obtain a better
estimate of a 2 when using a smaller number of columns but a larger numc
ber of rows, for example, using combined experiment
for estimating a 2 than using combined experiment. (22
c
where this can happen is that combined experiment
degrees of freedom for interaction.
l:
~)
l~
2
2
4
z)
2
2
2
is better
One reason
~) has larger
One can obtain a better estimate
of a 2 when the total number of observations is twenty-four than it is
c
thirty-two.
This can happen when the combined experiment with a smaller
number of observations has larger degrees of freedom for interaction,
78
(2. '42 '2)'
2
for example, using combined experiment ?Z
than using combined experiment
~
(i
or
2
r
2
2
i)
is better
2
2
It may be sufficient to state that the general
same as for a.
(22
conc1~sions
are the
However now, looking at more columns has a larger
effect than before.
This appears to be related to the relative magni-
tudes of a 2 and a 2 •
r
c
Variances, the probabilities of negative estimates, and the 95th
percentiles of the distributions
0f
... 2
arc when using method d and sixteen'
combined experimem:s under a set of true variance components, a 2 .. 1. 0,
r
a
2
2
2
c ... 4, a rc ... 8, and a & • 1.0, are shown in Table 4.5.3.
The struc-
ture of Table 4.5.3 is the same as that of Table 4.5.1.
Clearly combined experiment
estimating a
2
rc
•
(~
2
4
~)
is the most desirable for
When considering only combined experiments involVing
thirty-two observations, one obtains a better estimate of a
2
when
rc
using combined experiment with six degrees of freedom for the
inte~
action between row and column effects than using combined experiment
with four degrees of freedom, and using combined experiment with four
degrees of freedom for interaction is better than using combined experiment with two degrees of freedom.
This also occurs in combined experi-
ments involVing twenty-four observations, that is, using combined
experiment with four degrees of freedom for interaction is better than
using combined experiment with two degrees of freedom.
There are large
Table 4.5.3
Variances of
when using method d and
(4
.. 2
G
rc
,probabilities of negative
si~teen
.. 2
0
rc
,and the 95th percentiles of the distributions of
.. 2
G
rc
222
2
combined experiments with a - 1.0, G - .4, a - .8, and a - 1.0
r
c
rc
£
2 2)
(2
4
(2
2)
2 4)
(2
I
2
2)
Experiment
.. 2
.. 2
Var(o"2 ) p(a.. 2 <0) 95% value Var(02 ) p(a 2 <0) 95% valuQ Var(o . ) p(a <0) 95 % value
rc
rcrc
rcrc
rc-
..2
"2
Var(a ) p(a <0) 95% value
rc
rc-
(4
2 2)
.595
.123
2.272
.580
.123
2.254
.750
.155
2.515
.813
.181
2.604
(2
4
2)
.580
.123
2.254
.595
.123
2.272
.758
.159
2.526
.882
.184
2.615
(2
2 4)
.750
.155
2.515
.758
.159
2.526
1.108 . .220
2.947
1.350
.256
3.101
(2
2
.813
.181
2.604
.882
".184
2.615
1.350
3.107
1.753
.311
3.450
2)
.256
......
\0
e
e
e
80
~
differences among combined experiments with different degrees of freedom for interaction even though the total number of observations remains
constant.
However, there are small differences among combined experi-
ments which involve the same number of observations and the same degrees
of freedom for interaction, for exampl~ combined experiments (:
4
2\
~ , combined experiments (2
2
4
, and (22 4
4
4
2
, and combined experiments
obtain a better estimate of a
2
rc
(i
~ ~)
and
(~
2
2
2
2
4
2
~)
.
One can
when the total number of observations is
twenty-four than when it is thirty-t"'.Jo.
This can happen when the com-
bined experiments with smaller numbeJ;'s of observations have larger
degrees of freedom for interaction, for example, using combined experi(4
ment \ 2
(~
2
2
~)
or
(~
~)
4
2
is better than using combined experiment
2
2
Variances and the 95th percentiles of the distributions
0f
.. 2
a_0:. when
using method d and sixteen combined experiments under a set of true
variance components, a
shown in Table 4.5.4.
of Table 4.5.1.
2
2
r = 1.0, a c =
.,
.4, a-
rc
1.0, are
'!he structure of Table 4.5.4 is the same as that
Note that the probabilities of negative ;2 are all
c:
equal to zeros.
Clearly combined experiment (;
2
2
1)
is the most desirable for
2
estimating a • When considering only combined experiments involving
c:
thirty-two observations, using combined experiment where the number of
Table 4.5.4
A2
Variance.. of 0 £ and the 95th
combined el'ller1l111mta with a
(2
2
"'crc~ntUe8
r
... 2
of the diatributionll of G£ when uains aethod d and sixteen
222
2
- 1.0, 0 - .4, 0
- .8. and" - 1.0
r
c
rc
£
4)
(2
4
(4
2)
2
(2
2)
2 2)
Experilllellt
~2
~2
~2
Var(" )
951 value
Var(" )
95% value
Var(" )
95% value
£
£
£
Var aj2) .
£
95% value
(2
2 4)
.083
1.537
.100
1.611
.100
1.616
.125
1.685
(2
4
2)
.100
1.611
.125
1.671
.125
1.671
.167
1.753
(4
2
2)
.100
1.616
.125
1.671
.125
1.677
.167
1. 753
(2
2
2)
.125
1.685
.167
1. 753
.167
1.753
.250
1.960
!
QO
~
e
e
e
82
subsamples is four is better for estimating a
2
e:
than using combined
experiments where the numbers of subsamples are two and four or combined experiment where the number of subsamples is two, and using
combined experiment where the numbers of subsamples are two and four
is better than using combined experiment where the number of sub samples
is two.
This likewise occurs in combined experiments involving twenty-
four observations.
With the same number of observations, combined
experiments which involve the same numbers of subsamples or the same
number of sub samples provide the same results on estimating a e:2 ' that is,
combined experiments
(:
2
2
2
2
~)
, (i
~J
and
(~
2
2
~)
l~
an'd
~)
2
4
~) ,
, combined experiments
4
~)
, and
(i
(~
4
2
~)
However, there are small differences among
2
.
4
4
all sixteen combined experiments.
and combined exp erimen ts
It is observable that combined
experiments which involve t'r.venty-four observations may provide the same
results as combined experiments which involve thirty-t'r..To observations,
combined experiment
(~
2
2
i)
tions and combined experiment (:
2
2
~)
,
for
~xample,
with twenty-four observa-
(i
2
4
2\
21 , or
(~
4
4
~J
each with thirty-two observations provide the same results On estimating
a~.
When considering the results from Table 4.5.1 to Table 4.5.4, one
would prefer to use combined experiment with a larger number of rows
or a larger number of columns when the total number of observations
remains constant.
The reason is that combined experiment with larger
83
numbers of subsamples improves the results on estimating cr 2 , but the
E
improvement is small when comparing with a larger number of rows or a
larger number of columns which improves the results on estimating
222
cr , cr , and cr •
rc
r
c
However, a larger number of rows or a larger number
of columns does not yield much different results on estimating cr
2
rc
and
cr 2 when the total number of observations remains the sam~ but it proE
2
r and cr c • When the value
2
2
of cr is large when comparing with the value of cr , it·is better to
r
c
duces a noticeable improvement on estimating cr
have the ·number of rows increased.
increased otherwise.
.
increas~g
t he numb er
2
The number of columns should be·
The reason is that when the size of cr
0f
2
r
is large,
rows pro vid es muc h sma11er vari ance of crA2
r
than that when increasing the number of columns or increasing the numbers of subsamples when the total number of observations remaining the
same.
Ihis will be discussed in more details in the next section.
And
also, increasing the number of rows and increasing the number of columns
do not yield much different results on estimating cr
cr
2
c
is small, cr
2 ,.
.4 in the present work.
2
when the size of
It is noticeable that one can
c
2
2
c
2
and cr
when the total number of
obtain a better estimate of cr r' cr
c'
rc
observations is twenty-four than when it is thirty-two.
This happens
when combined experiments with smaller numbers of observations have
larger degrees of freedom for interaction, for example, using combined
84
experiment
{~
2
2
~1
or
(~
4
2
~
each
~N.ith
four degrees of freedom
2
2
2
is better for estimating a , a , and a
than using combined experir
c
rc
\i
meut
2
2
4.6
:} with two degrees of freedom.
The Effect of the True Variance Components
2
In this section the effect of the true var:t.ance components, a ,
1:
2
2
2
... 2
ac' arc' and a e:' on the estimates a r
combined e:tperiments are examined.
when using method d and three
The combined
~~eriments
are de-
fined as in Section 4.5 each involving t"oJenty-four observations.
The
behavior, i.e., variances, the probabilities of negative estimates,
and the 95th percentiles of the distributions, of the estimates
under several sets of true variance components are e:tamined.
2
2
the effect of a , where a
r
r
where cr
2
c
a.
a
ar2
That is
"'2
_ 2
.01, 1.0, 2.0, on a , the effect ot cr ,
r
c
...2
2
2
01, .4, 2. 0, on a J the effect of a , T.vhere cr
• • 01,
.
r
rc
rc
...2
2?
"'2
.8, 2.0, on ar' and the effect of ae:' where a~ • .01, 1.0, 2.0, on are
Variances, the probabilities of negative esti~tes, and the 95th
... ?
percentiles of the distributions of cr- when using method d and three
r
2
combined experiments T.with a · . 01, 1. 0, and 2. a are shewn in Table
r
4.6.1.
85
... 2
.. 2
and the 95th
Table 4.6.1 Variances of a , probabilities-- of negative-a,
r
r
.. 2
eercenti1es of the d1strib~~ions of ar when using method d and thl:ee2
2
combined experiments with a • .01, 1.0, 2.0, a 2 • • 4, a
c
r
rc • .8, and
0
2 • 1.0
e
Design
r;
•
(~
(~
2
2
4
2
2
2
0
~t
i)
i)
2
r
... 2
Var(o )
r
p(.;2 < 0)
r-
95% value
.01
.252
.567
.• 936
1.00
1.569
.193
3.393
2.00
3.719
.099
5.753
.01
.426
.490
1.146
1.00
2.236
.255
3.963
2.00
6.217
.147
7.503
.01
.679
.493
1. 361
1.00
2.842
.272
4.306
2.00
7.017
.183
8.306
e
-
e
e
2 2 4)
( 222
2 4 2)
(222
6.0
4.0
r-.
N
<0
J.4
'-"
J.4
l1S
I
//
/
(~ ~ ~)
P
2.0
0.0
1.0
2.0
o
Figure 4.6.1
2
r
The graphs of variances
0
~2
f 0
r
where .01 <
-
0
2
< 2.0 for
r-
three combined experiments
00
0\
87
2
It is noticeable that when the value of or increases, the probability of observing a negative
the distribution of
cr 2r
.. z
0
r
decreases while the 95th percentile of
increases.
We consider the behavior of the
.. 2
.. 2
2
2
estimate 0 by looking at variance of 0 as a function of 0 with 0 ,
r
0
2
rc' an d
of
0
0
r
r
The graphs of variances of
2e: b e ing fixed.
.. 2
0
c
as functions
r
2 for three combined experiments, data from Table 4.6.1, are
r
shown in Figure 4.6.1.
For each value of
0
2
, one obtains a better estimate of
r
fi
using combined experiment
{~
ment
~}
4
2
or
(~
2
2
~ ~)
0
2
when
r
than when using combined experi-
and using combinid experiment
is better than using combined experiment
(~
2
2
three utilize the same number of observations.
il
i
l~
~)
even though all
When 0
Z
is small,
r
there are small differences among these three combined experiments.
The rate of increase in variance of
r2
for combined experiment l2
2
2
ment
~
4
2
22
"Z as a function of
0
r
0
2 is largest
r
4) somewhat smaller for combined experi2'
, and much smaller for combined experiment
(~
2
2
~) .
It
is noticeable that the rates of increase are nearly the same for
combined experiments
2
(~
4
2
2
2
.. 2
~.
This implies that each
value of or affects variances of or for combined experiments
2
2
bined
i}
by nearly the same amount.
~~eriments
12
l 2 24
2\
21
One can conclude that for com-
involving twenty-four observations using combined
experiment with six rows is better than using that with six columns or
88
that with numbers of subsamples being two and four when the value of
0
2
r
is small and much better when the value of
.
0
2
r
is large.
Variances, the probabilities of negative estimates, and the 95th
percentiles of the distributions of
combined experiments with
o~
A2
0
r
when using method d and three
• • 01, .4, and 2.0 are shown in
Table 4.6.2.
Clearly for each combined experiment the probability of a nega-
A2
tive o
r
value of
remains the same when the value of
0
2
c
tile of the
A2
the probability of a negative
0
.
different combined experiments.
distribution~
2 increases, but for each
c
0
r changes when using
These also hold for the 95th percen-
In order to examine the behavior of &2, we
•
r
2
2
2
2
consider variance of &2 as a function of 0 with 0 , 0 , and 0
r
c
r
rc
€
The graphs of variances of
being fixed.
A2
r as functions of
0
0
2
c for
three combined experiments, data from Table 4.6.2, are shown in
Figure 4.6.2.
For each combined experiment there are no differences among
variances of
A2
0
.• 2
r for different values of
2
does not affect variance of cr •
not depend on the value or size of
0;
than using combined experiment
2
combined experiment (
12
2
2
4
2
~l
c , that is, the value of
This implies that estimating 0
r
obtains a better estimate of
0
0
2
c
•
For each value of
0
e~eriment
(i
i) ,
4
~)
or
(~
2
2
2
c
does
2
one
c
when using combined
2
2
r
0
(;
2
2
and using
is better than using combined
~~eriment
89
",2
"'2
Variances of cr , probabilities of negative cr r , and the
r
.. 2
95th percentiles of the distributions of cr when using method d and
r
Table 4.6.2
2
2
2
three combined experiments with cr r • 1.0, cr • .01, .4, 2.0, cr
c
rc • .8,
2
and cr e: • 1.0
Design
(i
(~
(~
2
2
4
2
2
2
~)
~) .
iJ
cr
2
c
Var(cr"2 )
r
p(a
2 < 0)
r-
95% value
.01
1.568
.193
3~.386
.40
1.569
.193
3.393
2.00
1.572
.196
3.414
.01
2.236
.255
.40
2.236
.255
3.963
2.00
2.236
.255
3.963
.01
2.842
.272
4.300
.40
2.842
.272
4.300
2.00
2.842
.272
4.300
•
3.963
e
e
e
e
3.0 I-
(~ i ~)
'"
-
4
(2
2)
222
2.0
N ~
<"b
'-'
(4222
2 2)
I
~
C\1
l>
1.0
0.0
1.0
2.0
2
a
c
A2
2
The graphs of variances of a where .01 < a < 2.0 for
r
- cthree combined experiments
Figure 4.6.2
\0
o
91
Variances, the probabilities of negative estimates, and the 95th
..2
percentiles of the distributions of a when using method d and three
r
combined experiments with
2
a
· . 01,
rc
• 8, and 2.0 are shown in
Table 4.6.3.
",2
For each combined experiment the probability of a negative a
r
increases when a
2
increases.
rc
Also, for each value of a
2
the probabirc
lity of a negative a"'2 increases when using cott1bined experiment
r
(i
4
2
i)
rather than using cott1bined experiment
combined _experiment
4
i) .
2
2
2
;)
2
2
~
and using
rather than using combined experiment
These also hold for the 95th percentile of the distribu-
.. 2
tion of a •
r
.
funct~on
as a
(~
(;
.. 2
We consider the behavior of cr
of a
r
by looking at variance of
2 with a 2
22
, a , and cr . being fi%ed.
rc
r
c
~
"'2
The graphs of
2
variances
. of a r as functions of cr rc for three combined experiments,
data from Table 4.6.3, are shown in Figure 4.6.3.
For each value of a
2
rc
, one obtains a better estimate of a
using combined experiment (;
(~
i
~)
or
(~ ~
iJ,
~ ~)
2
r
when
than when using combined experiment
and using combined experiment
better than using combined experiment ( 22
2
2
(~
i
4 ) even though
2
2 ) is
2
all three
combined experiments utilize the same number of observations.
The rate
",2
2
as a function of a
is smallest for
r
rc
of increase in variance of a
4
combined experiment (
2
4
2
2
2
2\
~
, somewhat larger for combined experiment
;), and much larger for combined experiment
(~
2
2
;) .
How.ever,
92
"2
...2
Variances of a , probabilities of negative ar , and the
Table 4.6.3
r
... 2
95th percentiles of the distributions of a r when using method d and
2
2
2
three combined experiments with a r • 1.0, a c • .4, arc • .01, .8, 2.0,
and a
2
e
-
1.0
Design
(i
(~
/2
\2
2
2
4
2
2
2
~)
~)
~)
..2
Var(a )
r
p(.;2 < 0)
r-
.01
.820
.078
2.786
.80
1.569
.193
3.393
2.00
3.293
.286
4.430
.01
1.424
.142
3.412
.80
2.236
.255
3.963
2.00
3.819
.336
4.792
.01
1. 437
.116
3.452
.80
2.842
.272
4.306
2.00
6.172
.350
5.622
a
2
rc
95% value
2 2 4)
( 222
6.0
4.0 lr-.
N
~
<0'-'
/
I
~
~
~ ~
/'
(; ; ~
(4 2 2)
222
l\t
l>
2.0
0.0
1.0
2.0
a
2
rc
A2
2
The graphs of variances of a where .01 < a
< 2.0 for
r
- rcthree combined experiments
Figure 4.6.3
'D
LJ
e
e
e
94
when the size of a
2
is small, there are small differences among these
rc
three combined experiments.
It is notic.eable that the rates of
increase are nearly the same for the combined experiments
4
2
~) .
(~
2
2
2
This implies that each value of arc affects variances
fi
of ;; for combined experiments
~
4
2
2
2
, which have the
same degrees of freedom for interaction, by nearly the same amount.
Variances, the probabilities of negative estimates, and the 95th
... 2
percentiles of the distributions of a r when using method d and three
combined experiments with
a~
~Ol, 1.0, 2.0 are shown in Table
•
For each combined experiment the probability of a negative
increases when the value of a
probability of a negative
2
combined experiment
~) .
2
2
... 2
increases, and for each value of a
€
(i
4
2
2
€
the
increases when using combined ~~eriment
21~ rather than using combined experiment
2
a.
r
;2r
2
~
4
(22
2
~
and using
rather than using combined experiment
These also hold for the 95th percentile of distribution of
... 2
We also consider the behavior of the estimate a
r
by looking at
variance of ;2 as a function of a 2 with a 2 , a 2 , and (12 being fixed.
r
E:
r
c
rc
... 2
2
The graphs of variances of a as functions of a for three combined
r
E:
experiments, data from Table 4.6.4, are shown in Figure 4.6.4.
2
r
For each value of (12 one obtains a better estimate of a when
€
using combined experiment
ment
(~
4
2
~j
or
(~
2
2
2
2
~
rather than using combined experi-
and using combined experiment
l~
4
2
~
is
95
,,2
"2
Variances of a , probabilities of negative a , and the
r
r
,,2
95th percentiles of the distributions of a when using method d and
r
2
2
2
three combined experiments with a • 1.0, ac • .4, arc • .8, and
r
Table 4.6.4
0'2 • .01, 1.0, and 2.0
e:
Design
(~
(;
(~
2
2
4
2
2
2
;)
~
i)
2
ae:
,,2
Var(a )
r
.01
1.064
.128
3.0ll
1.00
1.569
.193
3.393
2.00
2.202
.238
3.844
.01
1. 706
.194
3.634
1.00
2.236
.255
3.963
2.00
2.845
.294
4.300
.01
2.128
.223
3.876
1.00
2.842
.272
4.306
2.00
3.659
.295
4.705
p(a
2 < 0)
r-
95% value
e
e
e
e
2 2 4)
( 222
3.0
2 4 2)
( 222
~
~
N
'~
~
2.0
I
(4 2 2)
-------
2 2 2
1.0
0.0
1.0
2.0
a
Figure 4.6.4
2
£
.
A2
2
The graphs of variances of a where .01 < a < 2.0 for
r
-
£-
three combined experiments
'0
0\
97
2
2
better tha;!. using combined experiment (2
2
4)
2
even though all three
combined experiments utilize the same number of observations.
rate of increase in variance of
[~ ~ ~),
combined experiment
0r2
as a function of a
2
The
is largest for
€
smaller for combined experiment ( 22
and much smaller for combined experiment (;
~ ~.
4
2
There' are large
differences among these three combined experiments when the value of
2
a 2€
is small, and the
differences become larger when the value of a €
.
In other words, using combined experiment
increases.
better than using combined experiment
(~
4
2
2)
2
(2
or \2
(
'+2'
2
2
2}2 JoS
.
2
2
~)
when the
2
value of a 2 is small, and much better when the value of a i.s large.
€
€
When considering the results from Table 4.6.1. to Table 4.6.4,
2
2
2
the value of ar' arc·' and a2 have a large effect on estimating a
r
c::
all three combined experiments •
..2
affect the results on a.
r
for
2
However, the value of a c does not
There are small differences among three
2
.
is small, but large differr
combined e..'"q)eriments when the value of cr
ences when the value of a 2 is large.
r
value of a
It is noticeable that when the
2
is large, using a larger number of rows prOVides much
r
.. 2
better results on a than using a larger number of columns or larger
r
numbers of subsamples even though the total number of observations
remains constant.
Also, there are small differences among three
combined experiments when the value of a
2
rc
is small, but combined
experiment with a larger number of rows or a larger number of columns
.. 2
provides much better results on a than larger numbers of subsamples
r
98
when the value of
0'2.
rc
is large.
However, a larger number of rows is
better than a larger number of columns, that is, combined experiment
2
2
~
2J
4
2
~.
changes.
is better for estimating
2 than combined experiment
0' r
The same conclusion can be made when the value of
0'2
E
99
5.
SUMMARY
The aims of the present work are restated as follows:
a)
Examine the asymptotic properties of REML estimates in the
mixed model of the analysis of variance.
b)
Derive the distribution of translation-invariant variance
component estimators.
c)
Compare four methods of variance component estimation for
several combined experiments.
d)
Study the effect of the number of random levels and the
effect of the true variance components on MINQUE.
It is shown in Chapter 2 that the REML estimates are consistent,
asymtotically normal, and asymptotically efficient in the sense of
attaining the Cramer-Rao lower bound for the covariance matrix under
some assumptions on the mixed model and some assumptions on a sequence
of experiments.
A class of design sequences where the number of levels
of each random factor increases to infinity is used.
In considering
the asymptotic properties for the mixed model, it is different from the
usual method of proof of such properties because the observed random
variables are not independent and identically distributed.
Different
normalizing sequences which are related to the degrees of freedom in
the analysis of variance are used for different sequences of estimates.
We prove the asymptotic theory by showing that under some assumptions
on the mixed model and a sequence of experiments the requirements of a
theorem proved by Weiss are satisfied.
100
For the purpose of studying the behavior of the
estima~ors,
the
exact distribution of estimators which are translation-invariant
quadratic forms is derived in Chapter 3 by using the method of mixtures.
The distribution is shown to depend only on the eigen values of At which,
in general, are not all positive.
When all eigen values are positive,
the distribution can be expressed as an infinite series of chi-square
distributions.
When there are both positive and negative eigen values,
the distribution can be expressed in terms of the confluent hypergeometric function.
However, the distribution can be represented in
terms of the chi-square distribution when both the number of positive
and negative eigen values are even.
Even though the distribution derived
is in terms of infinite series, an adequate approximation can be
obtained by truncating the series.
The accuracy of the distribution
depends on when the series is terminated.
The method has been checked
with the distribution obtained from the Monte Carlo method.
In Chapter 4, the four methods of variance component estimation
for eight combined experiments are compared under a set of true variance components by using three criteria, that is, variance, the
probability of a negative "estimate, and the 95th percentile of the
distribution, where the last two are obtained from the distribution
derived.
It appears that the MINQUE method and method of pooling
sums of squares, in general, yield better estimates than method of
computing mean of analyses of variance estimates and method of averaging
mean squares.
In the meantime, one obtains a better estimate when using
the MmQUE method than using the method of pooling sums of squares.
A2
For all estimates, aA2r , aA2 ' aA2rc , an d a,
e:
c
'0
t
h e same cone l
'
b
can be
us~on as a ove
101
made.
However, a conclusion cannot be made so as to the desirability
of method of computing mean of the analyses of variance estimates and
method of averaging mean squares because they depend on the experiments
which are combined.
same results.
It is noticeable that they frequently give the'
It appears that the probability of a negative &2 is
e:
equal to zero in all cases.
Since the MINQUE method is the mest
2
2
2
2
desirable of all four methods for estimating or' a c
,ar e
,and
e:
°,
it is
then worthy to use this method to estimate variance components when one
experiment~,
has a series of dissimilar but balanced
at least for a
series of two-way balanced e."'q)er1ments.
w~en
the effects of the number of levels of each random factor
A2
~ d it l.'S
on MINQUE, that is, aA2 , aA2 ,aA2 ,and ae:'
are cons id e_e,
r
c
rc
2
clear that using more rows yields a better estimate of a , using more
r
2
columns yields a better estimate of a , and using larger numbers of
c
subsamples yields a better esticate of
observations remaining constant.
0
2
e:
when the total number of
\ol1th the SaJ!le number of observations
and the same number of rows, combined experiments which nave larger
degrees of freedom for the interaction between row and column effects
pro vid e b etter
.
est~tes
~
2
r
ot a .
With the same number of observations
and the same number of columns, combined experiments which have larger
degrees of freedom for interaction provide better estimates of cr~.
Combined
e.~periments
with the same number of observations but larger
2
degrees of freedom for interaction provide better estimates of arc
102
However, there are small differences among the results on
2
arc
obtained
from combined experiments with the same number of observations and the
same degrees of freedom for interaction.
For estimating
0
2
e:
,
the re-
sults obtained from different combined experiments are not much
different even though using combined experiments with larger numbers of
subsamples prOVide better estimates.
All these results suggest one to
have rows increased or columns increased, when the total number of
observations remains constant, to improve his estimation.
or2 and
0
The sizes of
2
c also have an effect on deciding that rows or columns should
be increased.
When. the size of
0
a large improvement on estimating
2
is large, increasing rows produces
r
0
2•
Also, increasing columns pro-
r
duces a large improvement on estimating o~ when the size of o~ is
large.
Hence, it is better to have rows increased when the size of
is large when comparing with the size of
0
0
2
r
2
and to have columns inc
creased otherwise.
It is not always true that one obtains better estimates w.hen using
combined experiments with larger numbers of observations.
In some
cases, combined experiments with smaller numbers of observations but
larger degrees of freedom for interaction prOVide better estimates, for
example, combined experiments
( "'2'
2
2
~)
and
(~
4
2
~
each with twenty-
four observations and four degrees of freedom for interaction provide
better estimates for
0;, o~,
and O;c than combined experiment
[~ '~ ~)
103
with thirty-two observations and two degrees of freedom for interaction, and all three combined experiments provide the same results on
The variance component estimates depend not only on the number of
random levels but also on the size of some true variance components.
2
2
2
2
For estimating cr , the sizes of cr ,cr
and cr have a large effect on
r
r
rc'
€
... 2
r
cr , but the size of cr
95th percentil e
0f
2
... 2
has no effect on cr •
c
r
· t r ib ut i on
the d~s
0f
Variance of
.
h
he
cr... 2 ~ncrease
went
r
... 2
increases, but the probability of a negative cr
2
ar2 and
r
.
s~ze
the
0f
cr 2
r
decreases when the size
When the size of cr 2 increases, variance of cr"'2 , the
rc
r
... 2
probability of a negative cr , and the 95th percentile of the distribuof cr
r
increases.
r
tion of
ar
2
increase.
•
The same concluSion on variance of
a2r ,
probability of a negative
the
and the 95th percentile of the distribu-
tion of a~ can be made when the size of cr~ increases.
... 2
a2r ,
When considering
2
variance of or as a function of or' the rates of increase in variance
of cr 2 for different combined ~~periments with the same number of obserr
vations are not the same.
Combined experiments with larger numbers of
rows provide much better estimates of
0
2 than combined ~~eriments with
r
larger numbers of columns when the total number of observations remains
constant and the size of
0
2
r
is large.
However, there are small differ-
ences among combined experiments with the same number of observations
104
when the size of a
2
r
is small.
When considering variance of &2 as a
r
2
A2
function of arc' the rates of increase in variance of a for different
r
combined experiments with the same number of observations are not the
same, but they are almost the same for combined experiment with the
same number of observations and the same degrees of freedom for interaction.
When the size of a
2
is small, there are small differences
rc
among combined experiments with the same numbers of observations.
The
A2
2
general conclusion when considering variance of a as a function of a
r
€
is the same as that when considering variance of a 2 as a function of
r
105
6.
LIST OF REFERENCES
Corbeil, R. R. and S. R~ Searle. 1976. Restricted maximum likelihood
(REML) estimation of variance components in the mi:md model.
Tecbnometrics 18:31-38.
Giesbrecht, G. F. 1977. Combining experiIl1ents to estimate variance
components. Personal communication.
Giesbrecht, G. F. and P. Burrows. 1978. Estimating variance co:nponents
using MINQUE or restricted maximum likelihood when the error
structure has a nested model. To be published in Communications
in Statistics.
Graybill, F. A. 1954. On quadratic estimates of variance components.
AImals of Mathematical Statistics 25:367-372.
Graybill., F. A. and A. W. Wortham. 1956~ A note on unifor.nly best
1mbiased estimators for variance components. Journal of the
American Statistical Association 51:266-268.
Graybill, F. A. 1969. Introduction to Matrices with Application in
Statistics. Wadst~\,)rth Publishing, Inc., Belmont, California.
Bartley, E. O. and J. N. K. Rao: 1967. Maximum likelihood estimation
for the mixed an'alysis of variance medel. Biometrika 54 :93-108.
Harville, D. A. 1977. Maximum. likelihood approaches to variance
component estimation and to related problems. Journal of the
American Statistical Association 72:320-340.
Henderson, C. R. 1953. Estimation of variance and covariance
ponents. Biomet=ics 9:226-252.
co~
Khatri, C. G. 1966. A note on a manova model applied to proble!I1S
in growth curve. Annals of the Institute of Statistical
Mathematics 18:75-86
Johnson, N. L. and S. Katz. 1970. Continuous Univariate
Distributions-2. Boughton Mifflin, Inc., Boston.
Liu, L. and J. Senturia. 1977. Computation of MDlQUE variance
component estimates. Journal of the American Statistical Association 72:867-868.
Miller, J. J. 1973. Asymptotic properties and computation of maxi~um
likelihood estimates in the mixed model of the analysis of
variance. Technical Report No. 12, Depar:nent of Statistics.
Stanford Univeristy, Stanford, California.
106
Miller, J. J. 1977. Asymptotic properties of maximum likelihood
estimates in the mixed model of the analysis of variance.
Annals of Statistics 5:746-762.
Patterson, H. D. and R. Thompson. 1971. Recovery of Interblock
inf ormation when block sizes are 1.mequal. Biometrika 58: 543-554.
Press, S. J. 1966.. Linear combinations of non-central chi-square
variates. Annals of Mathematical Statistics 37:480-487.
Rao, C. R. 1970. Estimation of heteroscedastic variances in
linear models. Journal of the American Statistical Association
65: 161-172.
Rao, C. R. 1971a.
MIQUE theory.
Estimation of variance and covariance components
Journal of Multivariate Analysis 1:257-275.
Rao, C. R. 1971b. Minimum variance quadratic 1.mbiased estimation
of variance components. JourtIB.l of Multivariate Analysis
1:443-456.
Rae, C. R. 1972. Estimation of variance and cOl7ariance components
in linear models. Journal of the American Statistical Association
67:112-115.
Robbins, H. E. and E. J. G. Pitman. 1949. Application of the method
of mixtures to quadratic forms in normal variates. Annals of
Mathematical Statistics 20:552-560.
Wang, Y. Y. 1967. A comparison of several variance component
estimators. Biometrika 54:301-305.
Weiss, L. 1971. Asymptotic properties of maximum likelihood estimators
in some nonstandard cases. Journal of the American Statistical
Association 66:345-350.
Weiss, L. 1973. Asymptotic properties of maximum likelihood
estimators in some nonstandard cases, II. Journal of the
American Statistical Association 68:428-430.
107
7.
APPENDIX
The following lemmas have been used in the proofs of Lemma 2.5.1
to Lemma 2.5.3 and Theorem 2.5.1.
Some lemmas, TNhich were collected
from reference materials, are stated in order to facilitate the proofs
of the consequent lemmas.
In the next four lemmas, the basic results on the eigen values of
an
tlXll
matrix A'A (A), TNhere i-l,2, •• ,n, TNith A (A)
l
i
~
A (A)
2
~ •••
> A (A) are stated.
n
Lemma 7.1.
The eigen values of an
tlXll
symmetric ide!l1t'otent matrix
are either O·or 1.
Lemma. 7.2.
If A is any nonsingular matrix and B is any n:m matrix,
Lenma 7.3.
If A and B' are mxn matrices, then the nonzero eigen
values of A.B and the nonzero aigen values of BA are the same.
Lemma 7.4.
The eigen values of A and A' are the same, and the
matrix AP , TNhere p is a positive intager, has the eigen values Ai(A).
The notations TNoich ....ill be used in the re!!lainder of this appendi."'t
are the same as those defined in Chapter 2.
From the definition (2.13) and condition 1, if $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, then $|\sigma_{ji}^2 - \sigma_{0i}^2| \le \frac{m}{n_i}$ and $\frac{\sigma_{0i}^2}{2} < \sigma_{ji}^2 < \frac{3\sigma_{0i}^2}{2}$, where $j = 1, 2$ and $i = 0, 1, \dots, p$.
In the following lemma, the results on the bounds of the maximum of the eigenvalues and the bounds of the maximum of the absolute values of the eigenvalues of some particular matrices are paraphrased from Miller (1977).
Lemma 7.5. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $m$, $\Sigma_k$, where $k = 0, 1, 2$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then

a) $\lambda_1(\Sigma_0^{-1}V_i) \le \dfrac{1}{\sigma_{0i}^2}$,

b) $\lambda_1(\Sigma_1\Sigma_0^{-1}) \le 2$,

c) $\lambda_1(\Sigma_2\Sigma_0^{-1}) \le 2$,

d) $\max_{\ell=1,2,\dots,n} |\lambda_\ell(\Sigma_0^{-1}(\Sigma_1 - \Sigma_0))| \le \dfrac{m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$,

e) $\max_{\ell=1,2,\dots,n} |\lambda_\ell(\Sigma_0^{-1}(\Sigma_2 - \Sigma_0))| \le \dfrac{m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$,

f) $\max_{\ell=1,2,\dots,n} |\lambda_\ell(\Sigma_0^{-1}(\Sigma_1 - \Sigma_2))| \le \dfrac{2m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$, and

g) $\max_{\ell=1,2,\dots,n} |\lambda_\ell(\Sigma_2^{-1}(\Sigma_1 - \Sigma_0))| \le \dfrac{2m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$.

Proof. See Miller (1977).
The bound of the maximum of the absolute values of the eigenvalues of a matrix which is of the form $C'BC$ is given as follows:

Lemma 7.6. If $C$ is an $n \times m$ matrix and $B$ is an $n \times n$ symmetric matrix, then
$\max_{k=1,2,\dots,m}|\lambda_k(C'BC)| \le \lambda_1(C'C)\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(B)|.$

Proof. See Miller (1973).
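A small numerical check of the Lemma 7.6 bound (an editorial sketch; the matrices below are arbitrary) is:

```python
import numpy as np

# Editorial sketch checking the Lemma 7.6 bound
#   max_k |lambda_k(C'BC)| <= lambda_1(C'C) * max_l |lambda_l(B)|
# for an arbitrary C and symmetric B.
rng = np.random.default_rng(2)
n, m = 7, 4
C = rng.standard_normal((n, m))
B = rng.standard_normal((n, n))
B = (B + B.T) / 2                                    # make B symmetric
lhs = np.max(np.abs(np.linalg.eigvalsh(C.T @ B @ C)))
rhs = np.max(np.linalg.eigvalsh(C.T @ C)) * np.max(np.abs(np.linalg.eigvalsh(B)))
print(lhs <= rhs)                                    # -> True
```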
Recall that $Q_k = T'(T\Sigma_kT')^{-1}T$, where $\Sigma_k$, $k = 0, 1, 2$, and $T$ are as defined in Chapter 2, and that there exists a nonsingular matrix $\Lambda_k$ such that $\Sigma_k = \Lambda_k\Lambda_k'$, where $k = 0, 1, 2$. Define $P_k = I - \Lambda_k^{-1}X(X'\Sigma_k^{-1}X)^{-1}X'\Lambda_k^{-t}$; then
$Q_k = \Sigma_k^{-1} - \Sigma_k^{-1}X(X'\Sigma_k^{-1}X)^{-1}X'\Sigma_k^{-1} = \Lambda_k^{-t}P_k\Lambda_k^{-1}.$
It is clear that $P_k$ is a symmetric idempotent matrix.
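The two representations of $Q_k$ recalled above can be verified numerically; the sketch below (illustrative, with an arbitrary design matrix X and covariance Sigma, and T built as a basis of the null space of X') checks that $T'(T\Sigma T')^{-1}T$ and $\Sigma^{-1} - \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$ agree.

```python
import numpy as np

# Editorial sketch (arbitrary X and Sigma) of the two representations of
# Q_k recalled above: with T a basis of the null space of X' (so T X = 0),
#   T'(T Sigma T')^{-1} T = Sigma^{-1} - Sigma^{-1} X (X'Sigma^{-1}X)^{-1} X'Sigma^{-1}.
rng = np.random.default_rng(3)
n, k = 8, 3
X = rng.standard_normal((n, k))                  # full-column-rank design
_, _, Vt = np.linalg.svd(X.T)
T = Vt[k:]                                       # (n-k) x n rows with T X = 0
G = rng.standard_normal((n, n))
Sigma = G @ G.T + n * np.eye(n)                  # positive definite covariance

Q_T = T.T @ np.linalg.inv(T @ Sigma @ T.T) @ T
Si = np.linalg.inv(Sigma)
Q_X = Si - Si @ X @ np.linalg.inv(X.T @ Si @ X) @ X.T @ Si
print(np.allclose(Q_T, Q_X))                     # -> True
```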
Applying Lemma 7.6, the bounds of the maximum of the eigenvalues and the bounds of the maximum of the absolute values of the eigenvalues of some particular matrices are as follows:

Lemma 7.7. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then

a) $\lambda_1(\Lambda_0'Q_0\Lambda_0) \le 1$,

b) $\lambda_1(\Lambda_0'Q_1\Lambda_0) \le 2$,

c) $\lambda_1(\Lambda_0'Q_2\Lambda_0) \le 2$,

d) $\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'\Delta\Lambda_0)| \le \dfrac{2m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$, and

e) $\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'\Delta^*\Lambda_0)| \le \dfrac{2m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}$.
Proof. The proofs are given for (a) and (d); the other cases can be proved analogously.

a) Using $Q_0 = \Lambda_0^{-t}P_0\Lambda_0^{-1}$, where $P_0$ is a symmetric idempotent matrix, and Lemma 7.1, we get $\lambda_1(\Lambda_0'Q_0\Lambda_0) = \lambda_1(P_0) \le 1$.

d) By the definition of $\Delta$,
$\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'\Delta\Lambda_0)| = \max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'Q_0(\Sigma_0-\Sigma_2)Q_2\Lambda_0)|.$
Applying Lemmas 7.3, 7.4, and 7.6, and then using (a), (c), and Lemma 7.5, we can conclude that
$\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'\Delta\Lambda_0)| \le \frac{2m}{\min_{i=0,1,\dots,p}(n_i\sigma_{0i}^2)}.$
In the next lemma, we will show that $\frac{\min(d_i, d_j)}{n_i n_j}$ is bounded for $i, j = 0, 1, \dots, p$.
Lemma 7.8. If $d_i$ and $n_i$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then $\frac{\min(d_i, d_j)}{n_i n_j}$ is bounded for all $i, j$.
Proof. Since $c_i$ and $n_i^2$ have the same order of magnitude, $\frac{\min(c_i, c_j)}{n_i n_j}$ is bounded. It is known that $\min(d_i, d_j) \le \min(c_i, c_j)$ for all $i, j$. Therefore, $\frac{\min(d_i, d_j)}{n_i n_j}$ is bounded for all $i, j$.
The bound of the trace of a matrix which is of the form $AV_iBV_j$ is given as follows:

Lemma 7.9. Let $A$ and $B$ be symmetric $n \times n$ matrices. If $V_i$ and $d_i$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then for all $i, j$,
$|\mathrm{tr}(AV_iBV_j)| \le \frac{\min(d_i, d_j)}{\sigma_{0i}^2\sigma_{0j}^2}\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'A\Lambda_0)|\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'B\Lambda_0)|.$

Proof. See Miller (1973).
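The device behind Lemma 7.9 is the elementary fact that the trace of a matrix is the sum of its (at most rank-many) nonzero eigenvalues, so $|\mathrm{tr}(M)| \le \mathrm{rank}(M)\max_\ell|\lambda_\ell(M)|$; a numerical sketch (illustrative only) is:

```python
import numpy as np

# Editorial sketch of the device behind Lemma 7.9: the trace is the sum of
# at most rank(M) nonzero eigenvalues, so
#   |tr(M)| <= rank(M) * max_l |lambda_l(M)|.
rng = np.random.default_rng(4)
M = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))   # rank-2, 6x6
lhs = abs(np.trace(M))
rhs = np.linalg.matrix_rank(M) * np.max(np.abs(np.linalg.eigvals(M)))
print(lhs <= rhs)                                               # -> True
```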
Applying Lemma 7.9, the bounds of the traces of matrices which are of the form $\frac{1}{n_in_j}AV_iBV_j$ for different matrices $A$ and $B$ are obtained as follows:

Lemma 7.10. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $d_i$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then for all $i, j$,

a) $\frac{1}{n_in_j}|\mathrm{tr}(\Delta V_i\Delta V_j)| \le \frac{\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2}\Big(\frac{2m}{\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}\Big)^2$,

b) $\frac{1}{n_in_j}|\mathrm{tr}(\Delta^* V_i\Delta^* V_j)| \le \frac{\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2}\Big(\frac{2m}{\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}\Big)^2$,

c) $\frac{1}{n_in_j}|\mathrm{tr}((Q_0 - Q_0\Sigma_2Q_0)V_iQ_0V_j)| \le \frac{2m\,\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2\,\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}$, and

d) $\frac{1}{n_in_j}|\mathrm{tr}((Q_2 - Q_2\Sigma_1Q_2)V_iQ_2V_j)| \le \frac{8m\,\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2\,\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}$.

Proof. The proofs are given only for (a) and (c); the other cases can be proved analogously.
a) Applying Lemmas 7.7 and 7.9, we have that
$\frac{1}{n_in_j}|\mathrm{tr}(\Delta V_i\Delta V_j)| \le \frac{\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2}\Big(\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'\Delta\Lambda_0)|\Big)^2 \le \frac{\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2}\Big(\frac{2m}{\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}\Big)^2.$

c) Applying Lemmas 7.6 and 7.9, we get
$\frac{1}{n_in_j}|\mathrm{tr}((Q_0-Q_0\Sigma_2Q_0)V_iQ_0V_j)| \le \frac{\min(d_i,d_j)}{n_in_j\,\sigma_{0i}^2\sigma_{0j}^2}\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'(Q_0-Q_0\Sigma_2Q_0)\Lambda_0)|\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'Q_0\Lambda_0)|,$
and, since $Q_0 - Q_0\Sigma_2Q_0 = Q_0(\Sigma_0-\Sigma_2)Q_0$,
$\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'Q_0(\Sigma_0-\Sigma_2)Q_0\Lambda_0)| \le \frac{2m}{\min_{k=0,1,\dots,p}(n_k\sigma_{0k}^2)}$
by Lemmas 7.1 and 7.5.
Applying Lemma 7.10, the bounds of the differences of the traces of matrices which are of the form $\frac{1}{n_in_j}AV_iBV_j$ for different matrices $A$ and $B$ are obtained as follows:

Lemma 7.11. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $d_i$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then for all $i, j$,

a) $\frac{1}{n_in_j}|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_0V_iQ_0V_j)|$ converges to zero as $n \to \infty$, and

b) $\frac{1}{n_in_j}|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_1V_iQ_1V_j)|$ converges to zero as $n \to \infty$.
Proof. a) Using $\Delta = Q_2 - Q_0$, it follows that
$\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_0V_iQ_0V_j) = \mathrm{tr}(\Delta V_iQ_0V_j) + \mathrm{tr}(Q_0V_i\Delta V_j) + \mathrm{tr}(\Delta V_i\Delta V_j).$
Hence
$\frac{1}{n_in_j}|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_0V_iQ_0V_j)| \le \frac{1}{n_in_j}\big[|\mathrm{tr}(\Delta V_iQ_0V_j)| + |\mathrm{tr}(Q_0V_i\Delta V_j)| + |\mathrm{tr}(\Delta V_i\Delta V_j)|\big]. \qquad (7.1)$
The first two terms on the right-hand side of (7.1) converge to zero as $n \to \infty$ by Lemmas 7.8 and 7.10 and by the definition of $m$. The third term likewise converges to zero as $n \to \infty$.

b) Using $\Delta^* = Q_1 - Q_2$, it follows in the same way that
$\frac{1}{n_in_j}|\mathrm{tr}(Q_1V_iQ_1V_j) - \mathrm{tr}(Q_2V_iQ_2V_j)| \le \frac{1}{n_in_j}\big[|\mathrm{tr}(\Delta^* V_iQ_2V_j)| + |\mathrm{tr}(Q_2V_i\Delta^* V_j)| + |\mathrm{tr}(\Delta^* V_i\Delta^* V_j)|\big]. \qquad (7.2)$
The first two terms on the right-hand side of (7.2) converge to zero as $n \to \infty$ by Lemmas 7.8 and 7.10 and the definition of $m$. The third term likewise converges to zero as $n \to \infty$. Then for all $i, j$, $\frac{1}{n_in_j}|\mathrm{tr}(Q_2V_iQ_2V_j) - \mathrm{tr}(Q_1V_iQ_1V_j)|$ converges to zero as $n \to \infty$.
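The three-term expansion behind (7.1) is a purely algebraic identity in $\Delta = Q_2 - Q_0$, so it holds for arbitrary square matrices of matching size; a numerical sketch (illustrative only) is:

```python
import numpy as np

# Editorial sketch: the expansion behind (7.1) is a purely algebraic
# identity in Delta = Q2 - Q0.
rng = np.random.default_rng(5)
n = 5
Q0, Q2, Vi, Vj = (rng.standard_normal((n, n)) for _ in range(4))
Delta = Q2 - Q0
lhs = np.trace(Q2 @ Vi @ Q2 @ Vj) - np.trace(Q0 @ Vi @ Q0 @ Vj)
rhs = (np.trace(Delta @ Vi @ Q0 @ Vj)
       + np.trace(Q0 @ Vi @ Delta @ Vj)
       + np.trace(Delta @ Vi @ Delta @ Vj))
print(np.allclose(lhs, rhs))                     # -> True
```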
Lemma 7.12. If $c_s$, where $s = 0, 1, \dots, a$, $n_i$, and $\sigma_{0i}^2$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then for all $s$,
$\frac{c_s}{\min_{i=0,1,\dots,i_{s+1}-1} n_i^2}$ and $\frac{c_s}{\min_{i=0,1,\dots,i_{s+1}-1} n_i^2\sigma_{0i}^4}$
are bounded.
Proof. The proof is given only for the first case; the second case can be proved analogously. For any $i \in S_s$, $c_s$ and $n_i^2$ have the same order of magnitude. It is therefore sufficient to show that $c_s$ and $\min_{i=0,1,\dots,i_{s+1}-1} n_i^2$ have the same order of magnitude. Since $c_s = \dim\{L(TU_{i_s} : \cdots : TU_p)\}$, the ratio $c_j/c_s$ is a positive finite number for $j \in S_{s+1}$, so that $c_s$ and $n_i^2$ have the same order of magnitude for every $i = 0, 1, \dots, i_{s+1}-1$. Hence $\frac{c_s}{\min_{i=0,1,\dots,i_{s+1}-1} n_i^2}$, $s = 0, 1, \dots, a$, are bounded.
The bound of a quadratic form which is of the form $Y'T'ABCTY$ is given as follows:

Lemma 7.13. If $TY$ is an $(n-k) \times 1$ vector, $A$ and $C'$ are $(n-k) \times n$ matrices, $B$ is an $n \times n$ matrix, and $\Lambda_0$ is as defined in Chapter 2, then
$|Y'T'ABCTY| \le \big[Y'T'A\Lambda_0^{-t}\Lambda_0^{-1}A'TY\big]^{1/2}\,\max_{\ell=1,2,\dots,n}|\lambda_\ell(\Lambda_0'B\Lambda_0)|\,\big[Y'T'C'\Lambda_0^{-t}\Lambda_0^{-1}CTY\big]^{1/2}.$
The next step is to find the bounds of quadratic forms which are of the form $Y'T'A\Lambda_0^{-t}\Lambda_0^{-1}A'TY$ for different matrices $A$. Recall from (2.23) that under $TY \sim N(0, T\Sigma_2T')$, $TY$ can be written as $TY = \sum_{s=0}^{a} DF_sZ_s$. This gives
$Y'T'A\Lambda_0^{-t}\Lambda_0^{-1}A'TY = \sum_{s=0}^{a}\sum_{t=0}^{a} Z_s'F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_tZ_t$
for any matrix $A$. By the Cauchy-Schwarz inequality, and by condition 2, to prove that $Y'T'A\Lambda_0^{-t}\Lambda_0^{-1}A'TY$ is bounded for different matrices $A$, we may first find the bound of $\lambda_1(F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_s)$ and then combine this result with the fact discussed in the preceding paragraph.

In the following lemma, the bounds of the maximum of the eigenvalues of matrices which are of the form $F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_s$ are given.
Lemma 7.14. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $\Sigma_k$, where $k = 0, 1, 2$, $F_s$, where $s = 0, 1, \dots, a$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, \dots, p$, be as defined in Chapter 2, and let $D$ be a nonsingular matrix such that $T\Sigma_2T' = DD'$. If condition 1 and condition 2 are true, then for all $s$,

a) $\lambda_1(F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_s)$ is bounded for the matrices $A$ considered above whenever $i \notin \bigcup_{j=s+1}^{a} S_j$, and

b) $F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_s = 0$ otherwise.

Proof. Recall from Chapter 2 that $F = D'HR$, where $H$ is an orthogonal matrix, $D$ is a lower triangular matrix such that $T\Sigma_2T' = DD'$, and $R$ is an upper triangular matrix, and that $F$, i.e., $F = (F_0 : F_1 : \cdots : F_a)$, can be written as $D'(H_0^* : H_1^* : \cdots : H_a^*)$, where $H_s^* = D^{-t}F_s$.

a) Using $(T\Sigma_2T')^{-1} = D^{-t}D^{-1}$, we have $V_iT'H_s^* = 0$ for $i \in \bigcup_{j=s+1}^{a} S_j$. But $H_s^* = D^{-t}F_s$, $s = 0, 1, \dots, a$, and, with $E_s^* = \sum_{i=0}^{i_{s+1}-1}(\sigma_{1i}^2 - \sigma_{2i}^2)V_i$,
$(\Sigma_1 - \Sigma_2)T'H_s^* = E_s^*T'H_s^* = E_s^*T'D^{-t}F_s.$
This, together with the definition of the maximum of the eigenvalues and Lemmas 7.3 and 7.6, implies that $\lambda_1(F_s'D'A\Lambda_0^{-t}\Lambda_0^{-1}A'DF_s)$ is bounded, by Lemmas 7.1 and 7.5.

b) By the same argument as in (a), $V_iT'D^{-t}F_s = 0$ for $i \in \bigcup_{j=s+1}^{a} S_j$, which gives the zero case. For $i \notin \bigcup_{j=s+1}^{a} S_j$, applying Lemmas 7.2, 7.3, and 7.6, we have that
$\max_{j=1,2,\dots,n}|\lambda_j(\Lambda_2'Q_2\Lambda_2)| = \max_{j=1,2,\dots,n}|\lambda_j(P_2)| \le 1,$
and, applying Lemmas 7.1 and 7.5, the stated bound follows.
The bounds of the maximum of the eigenvalues of matrices which are of the form $\Lambda_0'A\Lambda_0\,B\,\Lambda_0'A\Lambda_0$ for different matrices $A$ and $B$ are given as follows:
Lemma 7.15. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. If condition 1 is true, then for all $i$,

a) $\lambda_1\Big(\Lambda_0'Q_2\frac{V_i}{n_i}\Sigma_0^{-1}\frac{V_i}{n_i}Q_2\Lambda_0\Big) \le \frac{16}{n_i^2\sigma_{0i}^4}$, and

b) the same bound holds with $Q_1$ in place of $Q_2$.

Proof. The proof is given only for (a); the other case can be proved analogously. Applying Lemma 7.6, we have that
$\lambda_1\Big(\Lambda_0'Q_2\frac{V_i}{n_i}\Sigma_0^{-1}\frac{V_i}{n_i}Q_2\Lambda_0\Big) \le \lambda_1(\Lambda_0'Q_2\Lambda_0)^2\,\lambda_1\Big(\Sigma_0^{-1}\frac{V_i}{n_i}\Big)^2 \le \frac{16}{n_i^2\sigma_{0i}^4}$
for all $i$, by Lemmas 7.5 and 7.7.
By assembling the results from Lemma 7.13 to Lemma 7.15, we will show in the following lemma that the difference of the quadratic forms converges to zero as $n \to \infty$.
Lemma 7.16. Given $\sigma_1^2$ and $\sigma_2^2 \in N_n(\sigma_0^2)$, let $\Delta$ and $\Delta^*$ be as defined above, and let $m$, $\Sigma_k$, $Q_k$, where $k = 0, 1, 2$, $n_i$, $\sigma_{0i}^2$, and $V_i$, where $i = 0, 1, \dots, p$, be as defined in Chapter 2. Under $TY \sim N(0, T\Sigma_2T')$, if condition 1 and condition 2 are true, then for all $i, j$,
$\frac{1}{n_in_j}\,\big|Y'Q_2V_iQ_2V_jQ_2Y - Y'Q_1V_iQ_1V_jQ_1Y\big|$
converges to zero as $n \to \infty$.
Proof. Writing $Q_1 = Q_2 + \Delta^*$ and expanding $Y'Q_1V_iQ_1V_jQ_1Y - Y'Q_2V_iQ_2V_jQ_2Y$ gives seven terms. (7.3) Since $Q_k = T'(T\Sigma_kT')^{-1}T$, each of the seven terms on the right-hand side of (7.3) can be written in the form $\frac{1}{n_in_j}|Y'T'ABCTY|$ for different matrices $A$, $B$, and $C$, as shown in Table 7.1.
Table 7.1. Seven terms with appropriate choices of A, B, and C.

Term 1: A = $(T\Sigma_2T')^{-1}T(\Sigma_1-\Sigma_2)Q_1$, B = $V_i/n_i$, C = $Q_2(V_j/n_j)T'(T\Sigma_2T')^{-1}$
Term 2: A = $(T\Sigma_2T')^{-1}T$, B = $V_i/n_i$, C = $Q_1(\Sigma_1-\Sigma_2)Q_2(V_j/n_j)T'(T\Sigma_2T')^{-1}$
Term 3: A = $(T\Sigma_2T')^{-1}T$, B = $V_i/n_i$, C = $Q_2(V_j/n_j)Q_1(\Sigma_1-\Sigma_2)T'(T\Sigma_2T')^{-1}$
Term 4: A = $(T\Sigma_2T')^{-1}T(\Sigma_1-\Sigma_2)Q_1$, B = $V_i/n_i$, C = $Q_1(\Sigma_1-\Sigma_2)Q_2(V_j/n_j)T'(T\Sigma_2T')^{-1}$
Term 5: A = $(T\Sigma_2T')^{-1}T(\Sigma_1-\Sigma_2)Q_1$, B = $V_i/n_i$, C = $Q_2(V_j/n_j)Q_1(\Sigma_1-\Sigma_2)T'(T\Sigma_2T')^{-1}$
Term 6: A = $(T\Sigma_2T')^{-1}T$, B = $V_i/n_i$, C = $Q_1(\Sigma_1-\Sigma_2)Q_2(V_j/n_j)Q_1(\Sigma_1-\Sigma_2)T'(T\Sigma_2T')^{-1}$
Term 7: A = $(T\Sigma_2T')^{-1}T(\Sigma_1-\Sigma_2)Q_1$, B = $V_i/n_i$, C = $Q_1(\Sigma_1-\Sigma_2)Q_2(V_j/n_j)Q_1(\Sigma_1-\Sigma_2)T'(T\Sigma_2T')^{-1}$

The proof that the first of these terms, with A, B, and C as in Term 1, converges to zero as $n \to \infty$ is given below; the rest can be proved analogously by using Lemma 7.13 to Lemma 7.15. Recall that
$Y'T'ABCTY = \sum_{s=0}^{a}\sum_{t=0}^{a} Z_s'F_s'D'ABCDF_tZ_t$
for any matrices $A$, $B$, and $C$.
By the Cauchy-Schwarz inequality, it suffices to bound the diagonal terms of this double sum. By Lemma 7.12, there exists a constant $B$ such that for $s = 0, 1, \dots, a$, $\frac{c_s}{n_j^2} \le B$ for $j \notin \bigcup_{\ell=s+1}^{a} S_\ell$.
Applying Lemma 7.14, we have that for $s = 0, 1, \dots, a$, the corresponding maximum eigenvalues are bounded, and therefore (7.4) follows.
Applying Lemma 7.14 again, we have that for $s = 0, 1, \dots, a$ and $j \notin \bigcup_{\ell=s+1}^{a} S_\ell$,
$\lambda_1\Big(F_s'D'(T\Sigma_2T')^{-1}T\frac{V_j}{n_j}\,\Lambda_0^{-t}\Lambda_0^{-1}\,\frac{V_j}{n_j}T'(T\Sigma_2T')^{-1}DF_s\Big)$
is bounded by a constant multiple of $\frac{c_s}{n_j^2\sigma_{0j}^4}$, and hence, by Lemma 7.12, by a constant multiple of $\frac{B}{\sigma_{0j}^4}$. Therefore (7.5) follows.
By Lemma 7.15,
$\lambda_1\Big(\Lambda_0'Q_2\frac{V_i}{n_i}\Sigma_0^{-1}\frac{V_i}{n_i}Q_2\Lambda_0\Big) \le \frac{16}{n_i^2\sigma_{0i}^4}. \qquad (7.6)$
Combining (7.4), (7.5), and (7.6), we have that the first term,
$\frac{1}{n_in_j}\,\big|Y'\Delta^*V_iQ_2V_jQ_2Y\big|,$
converges to zero as $n \to \infty$.
For the remainder of the terms of the form $\frac{1}{n_in_j}|Y'T'ABCTY|$, one can prove by a similar argument that each term converges to zero as $n \to \infty$. Hence $\frac{1}{n_in_j}\,|Y'Q_2V_iQ_2V_jQ_2Y - Y'Q_1V_iQ_1V_jQ_1Y|$ converges to zero as $n \to \infty$.

The variance of $B_{ij}(\sigma_2^2)$ is bounded in the following lemma.

Lemma 7.17. If $B_{ij}(\sigma_2^2)$, $d_i$, $n_i$, and $\sigma_{0i}^2$, where $i = 0, 1, \dots, p$, are as defined in Chapter 2, then
$\mathrm{Var}\big[B_{ij}(\sigma_2^2)\big] \le \frac{32\,\min(d_i, d_j)}{n_i^2 n_j^2\,\sigma_{0i}^4\sigma_{0j}^4}$
for all $i, j$.
By using the definition of variance of a quadratic
fo~,
?
Var(y'AY) • 2tr(AZ )-'
Z
2
1.2
v~r [Bij~~)l • v~r {n~~j [~tr(Q2ViQ2Vj)
1.2
22
- I'QzViQZVjQ2Y1J
128
But there are at most min(di,d.) nonzero eigen values of Q2ViQ2Vj'
i,j
J
Therefore,
Applying Lemmas 7.3 and 7. 6,
we
~ Al(U;'A~tA~1u;)
e·
have
Al(A;Q2ViQ2AO)
• Al (!;lvj ) Al (A;Q 2AOAOlv iAotA;Q2AO)
-L_
"
-1
~ Al(!O-Vj ) Al (AOQ2AOAOQ2Aa) Al(r O Vi)'
Hence, Al(Q2ViQ2Vj)
then
e"
~
4
2 2
O'OiO'Oj
by Lemma 7.5 and
L~
7.7.