de Andrade, Dalton Francisco and Helms, Ronald W. (1984). "ML Estimation and LR Tests for the Multivariate Normal Distribution with Patterned Mean and Covariance Matrix. Complete and Incomplete-Data Cases."

ML ESTIMATION AND LR TESTS FOR THE MULTIVARIATE NORMAL
DISTRIBUTION WITH PATTERNED MEAN AND COVARIANCE MATRIX.
COMPLETE AND INCOMPLETE-DATA CASES.
by
Dalton Francisco de Andrade
and
Ronald W. Helms
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1455
January 1984
ML ESTIMATION AND LR TESTS FOR THE MULTIVARIATE
NORMAL DISTRIBUTION WITH PATTERNED MEAN
AND COVARIANCE MATRIX. COMPLETE AND
INCOMPLETE-DATA CASES.
by
Dalton Francisco de Andrade
A dissertation submitted to the faculty of the
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for
the degree of Doctor of Philosophy in the
Department of Biostatistics
Chapel Hill
1984
DALTON F. ANDRADE. ML Estimation and LR Tests for the Multivariate Normal Distribution with Patterned Mean and Covariance Matrix. Complete and Incomplete-Data Cases. (Under the direction of RONALD W. HELMS.)
In this study we investigate maximum likelihood (ML) estimation and likelihood ratio (LR) tests for the multivariate normal distribution with mean vector \mu and covariance matrix \Sigma following the linear structures \mu = X\beta and \Sigma = \sum_{g=1}^m \tau_g G_g, where X and the G_g are properly defined known matrices, and \beta and \tau = (\tau_g) are vectors of unknown parameters. Likelihood equations are obtained when the parameters are subject to three types of constraints: (i) L^t\beta = \beta_0, (ii) S^t\tau = \tau_0, and (iii) L^t\beta = \beta_0 and S^t\tau = \tau_0, with L and S, and \beta_0 and \tau_0, known matrices and vectors respectively. Also, LR tests of hypotheses related to the linear structures and constraints referred to above are developed.
In the estimation problem, necessary and sufficient conditions for a particular type of solution to the likelihood equations, and examples of practical situations where these conditions are satisfied, are presented. Two iterative procedures are suggested to solve the equations when they do not have an explicit solution. It is shown that one iteration of one of the procedures gives asymptotically efficient estimates of the parameters provided we have a consistent estimate of the covariance matrix. Also, asymptotic null and nonnull distributions of the ML estimators are presented.
In the hypothesis testing problem, asymptotic null and nonnull distributions of the LR statistics are discussed. The asymptotic nonnull distributions are obtained under sequences of "local" alternatives and are shown to have a noncentral \chi^2 distribution.
Both complete and incomplete-data cases are considered in this
study.
The results obtained for the complete-data case are extended
to situations where we have more than one population with some
common parameters.
Two numerical examples are presented in order to illustrate part
of the results obtained.
TO
Francisco, Ana, Janete,
Cristina and Fernando
ACKNOWLEDGEMENTS
I would like to express my gratitude to my advisor, Dr. Ronald W. Helms, for his friendship and guidance during my stay in Chapel Hill.
I am also thankful to the other members of my committee,
Dr. Pranab K. Sen, Dr. James E. Grizzle, Dr. Paul Stewart, Dr. David
G. Kleinbaum and the late Dr. Ralph C. Patrick.
Many people have helped me during my academic life. To all of them, I am thankful. In particular, I would like to express my sincere gratitude to Clóvis A. Peres, Eliseo R. A. Alves and João G. C. Silva.
All the financial support for this research came from EMBRAPA (Brazilian Agency for Agricultural Research), to which I am deeply indebted.
In addition, I am grateful to Dr. Gerald Strope for providing the data used in Example 2, Chapter VI.
Finally, I must express warm thanks to Ms. Judy Harrelson for the excellent typing job.
Table of Contents

                                                                        PAGE
CHAPTER I: INTRODUCTION AND LITERATURE REVIEW                              1
  1.0 Introduction                                                         1
  1.1 Complete data                                                        5
      1.1.0 Estimation: One and K-population cases                         7
      1.1.1 Hypothesis testing: One and K-population cases                14
      1.1.2 More on asymptotic distribution                               18
  1.2 Incomplete data                                                     18
      1.2.0 Estimation: One and K-population cases                        24
      1.2.1 Hypothesis testing: One and K-population cases                30
      1.2.2 More on asymptotic distribution                               30
  1.3 Outline of the research proposal                                    31

CHAPTER II: COMPLETE DATA: ONE-POPULATION CASE. ESTIMATION
  AND ASYMPTOTIC RESULTS                                                  33
  2.0 Introduction                                                        33
  2.1 Estimation: Likelihood equations and explicit solutions             33
      2.1.0 Unconstrained parameters                                      35
      2.1.1 Constraint L^t\beta = \beta_0                                 38
      2.1.2 Constraint S^t\tau = \tau_0                                   44
      2.1.3 Constraint L^t\beta = \beta_0 and S^t\tau = \tau_0            53
  2.2 Estimation: Iterative procedures                                    54
      2.2.0 The Method of Scoring                                         55
      2.2.1 The EM algorithm                                              56
  2.3 Asymptotic distribution and efficient estimates                     62
      2.3.0 Asymptotic distribution                                       65
      2.3.1 Efficient estimates                                           70

CHAPTER III: COMPLETE DATA: ONE-POPULATION CASE. LIKELIHOOD
  RATIO TESTS AND ASYMPTOTIC DISTRIBUTIONS                                74
  3.0 Introduction                                                        74
  3.1 Testing H_{0,(1)}: L^t\beta = \beta_0                               76
  3.2 Testing H_{0,(2)}: S^t\tau = \tau_0                                 78
  3.3 Testing H_{0,(3)}: L^t\beta = \beta_0 and S^t\tau = \tau_0          82

CHAPTER IV: COMPLETE DATA: K-POPULATION CASE. ESTIMATION,
  LIKELIHOOD RATIO TESTS AND ASYMPTOTIC RESULTS                           84
  4.0 Introduction                                                        84
  4.1 No common parameters among the populations                          85
  4.2 Common parameters among the populations                             86
  4.3 Case \beta_1 = ... = \beta_K = \beta and \tau_1 = ... = \tau_K = \tau   87
      4.3.0 Estimation: Unconstrained parameters                          88
      4.3.1 Estimation: Constrained parameters                            91
      4.3.2 Estimation: Iterative procedures                              94
      4.3.3 Asymptotic distribution of the MLE's                         100
      4.3.4 Efficient estimates of \beta and \tau                        109
      4.3.5 Hypothesis testing                                           111

CHAPTER V: INCOMPLETE DATA: ONE-POPULATION CASE. ESTIMATION,
  LIKELIHOOD RATIO TESTS AND ASYMPTOTIC RESULTS                          116
  5.0 Introduction                                                       116
  5.1 Estimation: Likelihood equations and explicit solutions            118
      5.1.0 Unconstrained parameters                                     118
      5.1.1 Constrained parameters                                       120
  5.2 Estimation: Iterative procedures                                   121
      5.2.0 The Method of Scoring                                        121
      5.2.1 The EM algorithm                                             122
  5.3 Hypothesis testing and asymptotic results                          123

CHAPTER VI: NUMERICAL EXAMPLES                                           124
  6.0 Introduction                                                       124
  6.1 Example 1                                                          124
  6.2 Example 2                                                          131

CHAPTER VII: SUMMARY AND RECOMMENDATIONS FOR FURTHER RESEARCH            143
  7.0 Summary                                                            143
  7.1 Recommendations for further research                               145

APPENDIX                                                                 147

BIBLIOGRAPHY                                                             150
CHAPTER I

INTRODUCTION AND LITERATURE REVIEW
1.0 Introduction
The problem of testing and/or estimation in the multivariate normal distribution is an important subject in statistics because of the great number of situations in which this distribution is suitable.
The basic idea is that we have independent observations Y_i (p \times 1), i = 1, \ldots, N, associated with a random vector Y (p \times 1) normally distributed with a p \times 1 mean vector \mu = E(Y) and a p \times p positive definite covariance matrix \Sigma = E(Y - \mu)(Y - \mu)^t, and we want to make some inference(s) on \mu and/or \Sigma. For example, we might want to estimate \mu and \Sigma and test the hypothesis that \mu is equal to a known vector \mu_0 (p \times 1), i.e. H_0: \mu = \mu_0. Another important hypothesis is whether \Sigma has a specific form, e.g. a certain submatrix of \Sigma is equal to a null matrix, or the off-diagonal elements of \Sigma are all equal.

The above situation is called the one-population case. Extension to the K-population case is straightforward. Important hypotheses in this case are whether the means are all equal and/or the covariance matrices are all equal.
It is also known that in many practical situations, e.g. longitudinal data arising in epidemiological or clinical studies, we have to face the situation in which the observed data are not complete, i.e. instead of observing all the p components of Y we observe M Y (u \times 1), where M is a known u \times p matrix of rank u \le p. The incompleteness can occur either by design or at random. (See Kleinbaum (1970) and Rubin (1976) for examples and additional references.)
Thus, what could be a simple
analysis can be changed into a complex one just because some of the
information is missing.
As pointed out by Hartley and Hocking (1971),
the occurrence of incomplete data is so frequent that it has become one
of the more important problems in statistical analysis.
This work will deal mainly with the complete and incomplete-data situations where the mean vector and the covariance matrix have the linear structure considered by Anderson (1969, 1970, 1973). Specifically,

\mu = \mu(\beta) = \sum_{j=1}^r x_j \beta_j = X\beta,   (1.1)

X = [x_1, \ldots, x_r], \beta = (\beta_1, \ldots, \beta_r)^t, where the x_j's are known, linearly independent (for convenience) p \times 1 column vectors and the \beta's are unknown scalars. The covariance matrix is given by

\Sigma = \Sigma(\tau) = \sum_{g=1}^m \tau_g G_g,   (1.2)

m \le p(p+1)/2, \tau = (\tau_1, \ldots, \tau_m)^t, where the G's are known, linearly independent p \times p symmetric matrices and the \tau's are unknown scalars. It is also assumed that there exists at least one value of \tau such that \Sigma(\tau) > 0, i.e. \Sigma is positive definite (p.d.). See (1.4) and (1.5) for the K-population case.
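As a concrete illustration of (1.2), the short sketch below (in Python) assembles \Sigma(\tau) from known basis matrices and checks that it is positive definite; the compound-symmetry basis and parameter values are illustrative choices, not taken from the text.

```python
# A minimal sketch of the linear covariance structure (1.2), assuming an
# illustrative compound-symmetry basis: G1 = I_p (variances) and
# G2 = 1 1^t - I_p (covariances).
import numpy as np

p = 4
G = [np.eye(p), np.ones((p, p)) - np.eye(p)]   # known symmetric basis matrices
tau = np.array([2.0, 0.5])                     # unknown scalars tau_1, ..., tau_m

# Sigma(tau) = sum_g tau_g G_g, required to be positive definite for some tau
Sigma = sum(t * Gg for t, Gg in zip(tau, G))
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # check Sigma(tau) > 0
```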
The idea of considering this linear structure is that most of the linear models considered in the literature can be studied with this approach. Also, when we have incomplete data, some constraint on the covariance matrix can be very helpful for the estimation and/or testing problem. For instance, we can, in a certain way, try to compensate for the loss of information in the data with additional information on the covariance matrix. Of course, expression (1.2) allows the inclusion of those constraints.
The random effects and mixed models (e.g. Anderson (1970, 1973), and Szatrowski and Miller (1980)), the moving average stationary stochastic process of finite order (e.g. Anderson (1973)), and the general model of factor analysis with the matrix of factor loadings specified (e.g. Anderson (1970)) are examples where (1.1) and (1.2) can be applied. Also, the multivariate general linear model and the multiple design multivariate (MDM) model discussed by Kleinbaum (1970) are models where the linear structure given before can be applied. To show this, all we need is to write the multivariate model in a variate-wise representation.
For instance, for the multivariate general linear model

\begin{pmatrix} y_{i,11} & \cdots & y_{i,1s} \\ \vdots & & \vdots \\ y_{i,t1} & \cdots & y_{i,ts} \end{pmatrix} = \begin{pmatrix} x_{i,11} & \cdots & x_{i,1r^*} \\ \vdots & & \vdots \\ x_{i,t1} & \cdots & x_{i,tr^*} \end{pmatrix} \beta^* + \epsilon_i,

i = 1, \ldots, n(=N), in the situation where we have t treatments, s measurements on each experimental unit and n replications, we define

Y_i = (Y_{i,11}, \ldots, Y_{i,1s}, \ldots, Y_{i,t1}, \ldots, Y_{i,ts})^t, \quad p(=ts) \times 1,

X = X^* \otimes I_s, p \times r(=r^*s), where X^* is the design matrix given above, and \Sigma = I_t \otimes \Sigma^*, where \Sigma^* is the covariance matrix of any row of \epsilon_i. Clearly \Sigma is of the form given by (1.2) if \Sigma^* is of that form.
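The Kronecker-product form of this representation can be written down directly; the following sketch assumes illustrative sizes for t, s and r^* and uses numpy's kron for the products.

```python
# Sketch of the variate-wise representation of the multivariate linear model,
# assuming illustrative sizes (t treatments, s measurements, r* design columns).
import numpy as np

t, s, r_star = 2, 3, 2
X_star = np.random.rand(t, r_star)        # design matrix X* for one replication
Sigma_star = 0.5 * np.eye(s) + 0.5        # covariance of any row of the errors

X = np.kron(X_star, np.eye(s))            # X = X* (x) I_s, (ts) x (r*s)
Sigma = np.kron(np.eye(t), Sigma_star)    # Sigma = I_t (x) Sigma*, (ts) x (ts)
```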
Takeuchi, Yanai and Mukherjee (1982, Ch. 10) present a very interesting and up-to-date discussion of inference on patterned covariance matrices, which they call covariance structures, and also several situations where those structures arise.
Maximum likelihood estimates (MLE's) and likelihood ratio (LR) tests will be considered for estimation and hypothesis testing purposes respectively. The reason is their nice asymptotic properties, which are well known for the cases where \mu is unstructured (r = p in (1.1)) and \Sigma has the most general pattern (m = p(p+1)/2 in (1.2)), and are expected to hold for the cases where r < p and/or m < p(p+1)/2.
Because of the nature of the likelihood functions that will be considered throughout this work, models whose associated covariance matrix \Sigma does not follow (1.2) but \Sigma^{-1} does (e.g. Anderson (1969, 1970)) can also be studied with this approach. So, we keep (1.1) and replace (1.2) by

\Sigma^{-1} = \sum_{g^*=1}^{m^*} \tau_{g^*} H_{g^*}   (1.3)

where the same considerations on \tau and the G's in (1.2) also apply here to \tau^* and the H's respectively. Results for this new situation can easily be obtained from the ones obtained when (1.2) is applied.
Unfortunately, a covariance matrix of the form

\Sigma = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{p-1} \\ \rho & 1 & \rho & \cdots & \rho^{p-2} \\ \vdots & & & & \vdots \\ \rho^{p-1} & \rho^{p-2} & \rho^{p-3} & \cdots & 1 \end{pmatrix},

which is considered in some repeated time series, satisfies neither (1.2) nor (1.3) (see Anderson (1970)).
The linear structures (patterns) given by (1.1) and (1.2) were originally introduced for the complete-data situation.
The results related
to this situation will be presented first.
1.1 Complete data

Major results related to estimation and hypothesis testing in the multivariate normal distribution with complete data can be found in many good textbooks (e.g. Anderson (1958), Rao (1973), Morrison (1976), Arnold (1981), and Takeuchi et al. (1982)), and are also formally discussed in many courses. Tests for the means, for some special patterns of the covariance matrix, e.g. the validity of the repeated measures model, i.e. the equal-variance, equal-covariance pattern, and for the equality of covariance matrices are presented in these books. Of course, the estimation problem is also discussed. Searle (1971, ch. 9) and Arnold (1981, ch. 15) present important results for the patterns generated by random effects and mixed models. Results related to the general model of factor analysis can be found in Anderson (1958, ch. 14) and Morrison (1976, ch. 9), and in Anderson (1971) and Box and Jenkins (1976) for time series.
"
6
Harville (1977) presents an interesting review paper on restricted and unrestricted MLE for the random and mixed models. Anderson (1970) discusses the LR test of whether the covariance matrix has the pattern given by (1.2) with G_1, \ldots, G_m and m specified. No structure on \mu is assumed. Rogers and Young (1975) and Young (1976) present some LR tests related to the mean vector \mu and the covariance matrix \Sigma when \Sigma is totally reducible, i.e. when G_g G_h = G_h G_g, g,h = 1, \ldots, m (see definition 2.3). Rogers and Young (1977) also study the special situation where both \Sigma and \Sigma^{-1} have linear structures as defined by (1.2) and (1.3) respectively. For instance, when \Sigma has the "most general pattern" (m = p(p+1)/2). In fact, for this case m = m^* and G_g = H_{g^*}, g = g^* = 1, \ldots, m. Explicit representations of the MLE's and asymptotic distributions of the MLE's and LR tests are discussed in these papers.
Cole (1969) discusses the problem of obtaining MLE's for the parameters of the well-known multivariate linear model Y^* = X^*\beta^* + \epsilon^* when \Sigma^*, the covariance matrix of any row of Y^*, is a matrix that follows a pattern as in (1.2). This same problem is studied by Rogers and Young (1978), but with the assumption that \Sigma^* and \Sigma^{*-1} both have the same pattern, e.g. \Sigma^* has the most general pattern. Notice that the multivariate linear model is a special case of our approach, i.e. it can be represented by our vector notation as shown before in section 1.0.
Szatrowski (1979) gives a list of references where MLE's and/or LR tests involving special cases of (1.1) and (1.2) were studied. For some of the patterns the MLE's and LR tests can be explicitly obtained; for some others they cannot. He also presents LR tests with their asymptotic null and nonnull distributions for the one and K-population cases. In 1980, the same author discusses the problem of obtaining necessary and sufficient conditions for explicit representation of the MLE's of \mu and \Sigma when they follow (1.1) and (1.2).

In subsections 1.1.0 and 1.1.1 we will consider in detail the works by Anderson (1970, 1973) and Szatrowski (1978, 1979, 1980), where the general linear structures given by (1.1) and (1.2), and/or (1.4) and (1.5), are studied. As said before, a number of authors have considered special cases of these structures, but our interest is limited to the general case.
Notation for the K-population case. The notation for the one-population case was introduced at (1.1) and (1.2). The linear structures which have been considered in the literature for the K-population case (see Szatrowski (1979, 1981)) are

\mu_d = \sum_{j=1}^r x_j \beta_{dj} = X\beta_d   (1.4)

and

\Sigma_d = \sum_{g=1}^m \tau_{dg} G_g   (1.5)

d = 1, \ldots, K, where the structure of the parameters and matrices for population d is the same as the one that was specified at (1.1) and (1.2).
1.1.0 Estimation: One and K-population cases

Let Y_1, \ldots, Y_N be a random sample from N_p(\mu, \Sigma), where \mu and \Sigma are given by (1.1) and (1.2) respectively. The likelihood function can be written as

L(\beta, \tau) = (2\pi)^{-Np/2} |\Sigma|^{-N/2} \exp\{(-N/2)\,\mathrm{tr}\,\Sigma^{-1} C\}   (1.6)

where C = C(\mu = X\beta) = A + (\bar{Y} - \mu)(\bar{Y} - \mu)^t with \bar{Y} = (1/N) \sum_{i=1}^N Y_i and A = (1/N) \sum_{i=1}^N (Y_i - \bar{Y})(Y_i - \bar{Y})^t. The log likelihood function is

\ell(\beta, \tau) = (-Np/2)\log 2\pi - (N/2)\log|\Sigma| - (N/2)\,\mathrm{tr}\,\Sigma^{-1} C   (1.7)

and the likelihood equations, which are obtained from (\partial \ell(\beta,\tau)/\partial \beta_j) = 0, j = 1, \ldots, r, and (\partial \ell(\beta,\tau)/\partial \tau_g) = 0, g = 1, \ldots, m, are

X^t \hat{\Sigma}^{-1} X \hat{\beta} = X^t \hat{\Sigma}^{-1} \bar{Y}   (1.8)

and

[(\mathrm{tr}\,\hat{\Sigma}^{-1} G_g \hat{\Sigma}^{-1} G_h)_{gh}] \hat{\tau} = [(\mathrm{tr}\,\hat{\Sigma}^{-1} G_g \hat{\Sigma}^{-1} \hat{C})_g]   (1.9)

where [(\ )_{gh}] is an m \times m matrix whose elements are \mathrm{tr}\,\hat{\Sigma}^{-1} G_g \hat{\Sigma}^{-1} G_h, g,h = 1, \ldots, m, and [(\ )_g] is an m \times 1 column vector whose g-th element, g = 1, \ldots, m, is \mathrm{tr}\,\hat{\Sigma}^{-1} G_g \hat{\Sigma}^{-1} \hat{C}. The general mean, \bar{Y}, and \hat{C} = C(\hat{\mu} = X\hat{\beta}) are given at (1.6), \hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_r)^t, \hat{\tau} = (\hat{\tau}_1, \ldots, \hat{\tau}_m)^t, and \hat{\Sigma} = \Sigma(\hat{\tau}) = \sum_{g=1}^m \hat{\tau}_g G_g.

Notice that X^t \hat{\Sigma}^{-1} X and [(\mathrm{tr}\,\hat{\Sigma}^{-1} G_g \hat{\Sigma}^{-1} G_h)_{gh}] are positive definite (p.d.) matrices provided \hat{\Sigma} is p.d. (see Appendix, Lemma A.0).

Explicit solutions for these equations have been obtained only for very special structures of X and \Sigma; for instance, when X is the vector representation of the multivariate linear model considered by Rogers and Young (1978).
Szatrowski (1980) presents necessary and sufficient conditions for what we call S-explicit representation (see definitions 1.1 and 1.2) of \hat{\beta} and \hat{\tau}, the MLE's of \beta and \tau. The conditions are based on the assumption (without loss of generality) that the problem is in what Szatrowski calls "canonical form", viz., there exists a value of \tau, say \tau^+, such that \Sigma(\tau^+) = I_p. In other words, I_p is one of the allowable values of \Sigma. If the problem is not in the canonical form, it can be rotated to this form. For instance, let \tau^+ be any allowable value of \tau and V = \Sigma(\tau^+) > 0. Also, consider the linear transformation Y^* = V^{-1/2} Y. Then, \mu^* = E(Y^*) = X^*\beta with X^* = V^{-1/2} X, and \Sigma^* = E(Y^* - \mu^*)(Y^* - \mu^*)^t = \sum_{g=1}^m \tau_g G_g^* with G_g^* = V^{-1/2} G_g V^{-1/2}. The problem with \mu^*, \Sigma^* and Y^* is in the canonical form because \Sigma^*(\tau^+) = V^{-1/2} \Sigma(\tau^+) V^{-1/2} = V^{-1/2} V V^{-1/2} = I_p. Below, we rigorously define what we mean by S-explicit representation.
Definition 1.1 We say that the MLE \hat{\beta} of \beta has S-explicit representation if and only if \hat{\beta} can be obtained from (1.8) with \hat{\Sigma} replaced by \Sigma_0 = \sum_{g=1}^m \tau_{0,g} G_g, where \tau_0 = (\tau_{0,1}, \ldots, \tau_{0,m})^t is any value of \tau such that \Sigma_0 > 0, in particular \Sigma_0 = I_p. \square

Definition 1.2 We say that the MLE \hat{\tau} of \tau has S-explicit representation if and only if \hat{\beta} has S-explicit representation and \hat{\tau} can be obtained from (1.9) with \hat{\Sigma} replaced by any matrix \Sigma_0 defined above. \square

Clearly if both \hat{\beta} and \hat{\tau} have S-explicit representation, the likelihood equations (1.8) and (1.9) can be solved directly and the MLE's will be given by

\hat{\beta} = (X^t X)^{-1} X^t \bar{Y}   (1.10)

and

\hat{\tau} = [(\mathrm{tr}\,G_g G_h)_{gh}]^{-1} [(\mathrm{tr}\,G_g \hat{C})_g].   (1.11)

In most of the important situations \mathrm{tr}\,G_g G_h = 0 for g \ne h.
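To make (1.10) and (1.11) concrete, here is a minimal sketch under the canonical-form assumption, with simulated data and a compound-symmetry basis; all particular values (sample size, true parameters) are illustrative, not from the text.

```python
# A sketch of the S-explicit solutions (1.10) and (1.11); data are simulated
# for illustration only.
import numpy as np

rng = np.random.default_rng(0)
p, N = 4, 200
X = np.ones((p, 1))                                   # x_1 = 1_p, r = 1
G = [np.eye(p), np.ones((p, p)) - np.eye(p)]          # G_1, G_2

true_Sigma = 2.0 * G[0] + 0.5 * G[1]
Y = rng.multivariate_normal(np.full(p, 3.0), true_Sigma, size=N)
Ybar = Y.mean(axis=0)
A = (Y - Ybar).T @ (Y - Ybar) / N

beta = np.linalg.solve(X.T @ X, X.T @ Ybar)           # (1.10)
C = A + np.outer(Ybar - X @ beta, Ybar - X @ beta)    # C(mu = X beta) from (1.6)
M = [[np.trace(Gg @ Gh) for Gh in G] for Gg in G]     # [(tr G_g G_h)_gh]
v = [np.trace(Gg @ C) for Gg in G]                    # [(tr G_g C)_g]
tau = np.linalg.solve(M, v)                           # (1.11)
```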
Notice that it is possible to have explicit representation, i.e. equations (1.8) and (1.9) can be solved directly, and not S-explicit representation for the MLE's. In other words, there exist situations where the likelihood equations can be solved directly but the MLE's are different from (1.10) and (1.11). For instance, when \Sigma has the most general pattern (m = p(p+1)/2) the MLE of \beta is given by \hat{\beta} = (X^t A^{-1} X)^{-1} X^t A^{-1} \bar{Y}, where A is given at (1.6) (e.g. Kabe (1975)). Clearly the problem is in the canonical form and \hat{\beta} is not given by (1.10), except for very special values of A.
Szatrowski's necessary and sufficient conditions for S-explicit representation of the MLE's can be described as:

(i) the MLE of \beta (and thus of \mu) has S-explicit representation if and only if the r columns of X are spanned by exactly r eigenvectors of \Sigma;

(ii) assuming that \Sigma is totally reducible (see definition 2.3) and \hat{\beta} has S-explicit representation, the MLE of \tau (and thus of \Sigma) has S-explicit representation if and only if the eigenvalues of \Sigma consist of exactly m linear combinations of \tau_1, \ldots, \tau_m.

Condition (i) was also presented by Anderson (1970) as a sufficient condition for obtaining the MLE of \mu independently of \Sigma. Also, Young (1976) presents condition (ii) as a sufficient condition for obtaining an explicit MLE of \Sigma when \mu is not patterned. Szatrowski (1980) also presents a necessary and sufficient condition for S-explicit representation of \hat{\tau} when \Sigma is not totally reducible. These conditions will be treated in detail in the next chapter.
The block and nonblock forms of complete (Wilks (1946)), compound (Votaw (1948)) and circular symmetry (Olkin and Press (1969), and Olkin (1972)) are shown by Szatrowski (1978, 1980) to satisfy the conditions. For instance, one situation of complete symmetry is when \mu = x_1 \beta_1, where x_1 = 1_p, a p \times 1 column vector of one's, and

\Sigma = a I_p + b(1_p 1_p^t - I_p) = \begin{pmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & & & \vdots \\ b & b & \cdots & a \end{pmatrix}.

Note that r = 1, X = [x_1], \beta = (\beta_1), m = 2, \tau_1 = a, \tau_2 = b, G_1 = I_p, and G_2 = 1_p 1_p^t - I_p. To verify conditions (i) and (ii) we can see that:

(a) \Sigma 1_p = a 1_p + (p-1) b 1_p = 1_p [a + (p-1)b], i.e. 1_p is an eigenvector of \Sigma and therefore (i) is satisfied;

(b) let P be a p \times p orthogonal matrix, P P^t = P^t P = I_p, with first row equal to p^{-1/2} 1_p^t. Then, the succeeding rows are contrasts and

P \Sigma P^t = (a-b) P I_p P^t + b P 1_p 1_p^t P^t = (a-b) I_p + b B,

where B is a p \times p matrix with all elements equal to zero, except the element of the first row and first column, which is equal to p. Thus,

P \Sigma P^t = \mathrm{diag}(a + (p-1)b,\ a-b,\ \ldots,\ a-b),

i.e. condition (ii) is also verified because the eigenvalues of \Sigma, a + (p-1)b and (a-b) of multiplicity p-1, consist of exactly m(=2) linear combinations of \tau_1(=a) and \tau_2(=b).
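The two conditions can also be checked numerically; the sketch below does so for arbitrary illustrative values of a and b.

```python
# Numerical check of conditions (i) and (ii) for the complete-symmetry example
# (values of p, a, b chosen arbitrarily for illustration).
import numpy as np

p, a, b = 5, 2.0, 0.7
Sigma = a * np.eye(p) + b * (np.ones((p, p)) - np.eye(p))
one = np.ones(p)

# (i): 1_p is an eigenvector of Sigma with eigenvalue a + (p-1)b
assert np.allclose(Sigma @ one, (a + (p - 1) * b) * one)

# (ii): the eigenvalues are a + (p-1)b and a - b, i.e. m = 2 linear combinations
eig = np.sort(np.linalg.eigvalsh(Sigma))
assert np.allclose(eig, [a - b] * (p - 1) + [a + (p - 1) * b])
```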
Szatrowski and Miller (1980) applied these results in the mixed model of the analysis of variance. Also, Rubin and Szatrowski (1982) give one more example where condition (ii) is satisfied.

In general, the likelihood equations (1.8) and (1.9) cannot be solved directly, but they can be solved iteratively. Two iterative procedures have been proposed by T. W. Anderson: the Newton-Raphson (N-R) method (Anderson (1970)) and a second one (Anderson (1973)) which can be shown to correspond to the Method of Scoring (see Rao (1973, p. 366) and Harville (1977) for additional results on this method). The Newton-Raphson method was also considered by Cole (1969) in his work on MANOVA using patterned covariance matrices. Notice that the likelihood equations obtained by Cole are a particular case of (1.8) and (1.9). Some results on the convergence of this method, based on a Monte Carlo study, are presented by this author.
As in Anderson (1973), the rth, r = 1, 2, \ldots, iteration of the Method of Scoring consists of replacing \hat{\Sigma} by \Sigma^{(r-1)} = \Sigma(\tau^{(r-1)}) in (1.8) to obtain \beta^{(r)}, and thus \mu^{(r)} = X\beta^{(r)} and C^{(r)} = C(\mu = \mu^{(r)}). With \hat{\Sigma} replaced by \Sigma^{(r-1)} and \hat{C} by C^{(r)} in (1.9) we obtain \tau^{(r)} and then \Sigma^{(r)} = \Sigma(\tau^{(r)}) = \sum_{g=1}^m \tau_g^{(r)} G_g to yield the next value of \Sigma. The procedure stops when the values of \beta^{(r)} - \beta^{(r-1)} and \tau^{(r)} - \tau^{(r-1)} are small. Convergence is not guaranteed for either procedure.
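A compact sketch of one way to implement this iteration is given below; it is not the author's code, and the starting value, tolerance and iteration cap are illustrative choices.

```python
# A minimal sketch of the Method of Scoring for equations (1.8)-(1.9).
# Inputs: design X (p x r), basis list G, sample mean Ybar, sample covariance A.
import numpy as np

def scoring(X, G, Ybar, A, tau0, n_iter=50, tol=1e-10):
    tau = np.asarray(tau0, dtype=float)
    for _ in range(n_iter):
        Sigma = sum(t * Gg for t, Gg in zip(tau, G))
        Si = np.linalg.inv(Sigma)
        # (1.8): beta from generalized least squares at the current Sigma
        beta = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Ybar)
        d = Ybar - X @ beta
        C = A + np.outer(d, d)
        # (1.9): linear equations for the next tau
        Mgh = [[np.trace(Si @ Gg @ Si @ Gh) for Gh in G] for Gg in G]
        vg = [np.trace(Si @ Gg @ Si @ C) for Gg in G]
        tau_new = np.linalg.solve(Mgh, vg)
        if np.max(np.abs(tau_new - tau)) < tol:
            return beta, tau_new
        tau = tau_new
    return beta, tau
```

For example, scoring(X, G, Ybar, A, tau0=[1.0, 0.0]) returns approximate MLE's of \beta and \tau when the iteration converges.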
Both procedures utilize second-order partial derivatives. In fact, they are identical except that the N-R method utilizes the observed second derivatives whereas the Method of Scoring utilizes the expected values of the second derivatives, i.e. the information. There seems to be some empirical evidence that the second one is to be preferred (Cox and Hinkley (1974), p. 308), and as the expected values of the second derivatives are ordinarily easier to compute than the observed ones (Harville (1977)), the Method of Scoring, as described before, may be preferred. Anderson (1973) has shown that one iteration of this algorithm gives asymptotically efficient estimates of \beta and \tau provided that \hat{\Sigma} is replaced by a consistent estimate of \Sigma.
Another iterative procedure, the EM algorithm, will be discussed later. This algorithm is usually not directly applicable to the complete-data situation. However, many authors have suggested the use of certain strategies that generate an artificial incomplete-data framework; the algorithm can then be applied to this new situation to find MLE's for the original problem, i.e. the complete-data situation.
Szatrowski (1980) shows that the MLE's of \beta and \tau have S-explicit representation if and only if the Method of Scoring converges in one iteration from any positive definite starting value \Sigma^{(0)} = \Sigma(\tau^{(0)}). Notice that, as the problem is assumed to be in the canonical form, \tau^{(0)} can be chosen to be equal to \tau^+, i.e. \Sigma^{(0)} = I_p. The iterative procedure can thus be used to check whether the necessary and sufficient conditions for S-explicit representation are satisfied or not.
For the K-population case the mean vectors and covariance matrices have the structures given by (1.4) and (1.5) respectively. As pointed out before, here we have K one-population cases, and the MLE's of \beta_d (and thus of \mu_d) and \tau_d (and thus of \Sigma_d), d = 1, \ldots, K, based on a random sample Y_{d1}, \ldots, Y_{dN_d} from N_p(\mu_d, \Sigma_d), are a solution of the likelihood equations

X^t \hat{\Sigma}_d^{-1} X \hat{\beta}_d = X^t \hat{\Sigma}_d^{-1} \bar{Y}_d   (1.12)

and

[(\mathrm{tr}\,\hat{\Sigma}_d^{-1} G_g \hat{\Sigma}_d^{-1} G_h)_{gh}] \hat{\tau}_d = [(\mathrm{tr}\,\hat{\Sigma}_d^{-1} G_g \hat{\Sigma}_d^{-1} \hat{C}_d)_g]   (1.13)

where the structure of the parameters and matrices for population d is the same as the one specified at (1.8) and (1.9). Notice that now we have A_d = (1/N_d) \sum_{i=1}^{N_d} (Y_{di} - \bar{Y}_d)(Y_{di} - \bar{Y}_d)^t and \hat{C}_d = C_d(\hat{\mu}_d = X\hat{\beta}_d). Also, the matrices X^t \hat{\Sigma}_d^{-1} X and [(\mathrm{tr}\,\hat{\Sigma}_d^{-1} G_g \hat{\Sigma}_d^{-1} G_h)_{gh}] are positive definite provided \hat{\Sigma}_d is positive definite (see Appendix, Lemma A.0).

All the results given in the one-population case related to the solution of the likelihood equations are directly extended to this case.
1.1.1 Hypothesis testing: One and K-population cases

As noticed before, only LR tests will be considered in this work. Let \omega be the region in the parameter space \Omega specified by the null hypothesis and L(\theta) = L(\beta, \tau) be given by (1.6). Then, the likelihood ratio statistic is

\lambda = \sup_{\theta \in \omega} L(\theta) / \sup_{\theta \in \Omega} L(\theta).   (1.14)

The LR test consists in rejecting the null hypothesis when \lambda is less than some predetermined constant. In some situations the exact null distribution of \lambda can be obtained; in some it cannot. One important result is that under the null hypotheses that are considered in this work, and also under certain regularity conditions that are satisfied here, -2 \log \lambda has a \chi^2 limiting distribution with number of degrees of freedom equal to the difference between the number of independent parameters in \Omega and \omega.
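In practice the test reduces to comparing -2 log \lambda with a \chi^2 quantile; a minimal sketch follows, in which the maximized log-likelihood values and the degrees of freedom are placeholders for illustration.

```python
# Sketch of the chi-squared approximation for -2 log(lambda); the inputs
# below are illustrative placeholders, not values from the dissertation.
from scipy.stats import chi2

loglik_omega, loglik_Omega = -512.3, -508.1   # maximized under omega and Omega
df = 3                                        # difference in free parameters
stat = -2.0 * (loglik_omega - loglik_Omega)   # -2 log(lambda)
p_value = chi2.sf(stat, df)
```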
Two hypothesis testing procedures related to the patterns given by (1.1) and (1.2), and based on a random sample Y_1, \ldots, Y_N from N_p(\mu, \Sigma), have been proposed in the literature.

(i) (Anderson (1970)). Testing whether the covariance matrix is of the form given by (1.2) with G_1, \ldots, G_m and m specified. No structure on \mu is assumed. The LR statistic is

\lambda = (|\hat{\Sigma}_\Omega| / |\hat{\Sigma}_\omega|)^{N/2}   (1.15)

where \hat{\Sigma}_\Omega = A, with A given at (1.6), and \hat{\Sigma}_\omega is the MLE of \Sigma given by (1.9) with \hat{C} = C(\hat{\mu} = \bar{Y}). It can be shown that if \mu follows the structure given by (1.1), then \hat{C} = A + (\bar{Y} - \hat{\mu})(\bar{Y} - \hat{\mu})^t with \hat{\mu} = X\hat{\beta}, \hat{\beta} = (X^t A^{-1} X)^{-1} X^t A^{-1} \bar{Y} (see, e.g., Kabe (1975)), and \hat{\Sigma}_\omega is the MLE of \Sigma given by (1.8) and (1.9). In both cases, the limiting distribution of -2 \log \lambda under the null hypothesis is \chi^2 with [p(p+1)/2] - m degrees of freedom.
Definition 1.3 Let G be a p \times p symmetric matrix. \langle G \rangle is defined to be a [p(p+1)/2] \times 1 column vector consisting of the upper triangle of elements of G, i.e.

\langle G \rangle = (g_{11}, g_{12}, \ldots, g_{1p}, g_{22}, g_{23}, \ldots, g_{2p}, \ldots, g_{pp})^t   (1.16)

where G = (g_{jk}), j,k = 1, \ldots, p. \square

Definition 1.4 The matrix \Delta is defined to be a [p(p+1)/2] \times m matrix such that

\Delta = [\langle G_1 \rangle : \langle G_2 \rangle : \cdots : \langle G_m \rangle]   (1.17)

where the G's are defined at (1.2). Notice that

\langle \Sigma \rangle = \Delta \tau.   (1.18)
(ii) (Szatrowski (1979)). Let X = [X_0 : X_1], \beta = (\beta_0^t, \beta_1^t)^t, \Delta = [\Delta_0 : \Delta_1] and \tau = (\tau_0^t, \tau_1^t)^t, where X_0 and \beta_0 are p \times r_0 and r_0 \times 1 respectively, and \Delta_0 and \tau_0 are [p(p+1)/2] \times m_0 and m_0 \times 1 respectively, with at least one of the inequalities r_0 < r and m_0 < m assumed to be true. Consider also the hypotheses H_0: \beta_1 = 0 and \tau_1 = 0 versus H_a: they are not all equal to zero. Then, the LR statistic to test H_0 vs. H_a is

\lambda^{2/N} = |\hat{\Sigma}_\Omega| / |\hat{\Sigma}_0|,   (1.19)

where \hat{\Sigma}_\Omega is the MLE of \Sigma given by (1.8) and (1.9), and \hat{\Sigma}_0 is the MLE of \Sigma given by (1.8) and (1.9) with X, \beta, \Delta, \tau, and \hat{C} replaced by X_0, \hat{\beta}_0, \Delta_0, \hat{\tau}_0 and \hat{C}_0 respectively. Notice that \hat{\mu}_0 = X_0 \hat{\beta}_0, \hat{\tau}_0 = (\hat{\tau}_{0,1}, \ldots, \hat{\tau}_{0,m_0})^t and \hat{\Sigma}_0 = \Sigma(\langle \hat{\Sigma}_0 \rangle = \Delta_0 \hat{\tau}_0). In this case, the limiting distribution of -2 \log \lambda, under H_0, is \chi^2 with (m+r) - (m_0 + r_0) degrees of freedom.
For the K-population case, three hypotheses were proposed by Szatrowski (1979). They are:

(a) H_0: \mu_1 = \cdots = \mu_K = \mu and \Sigma_1 = \cdots = \Sigma_K = \Sigma vs. H_a: they are not all equal,

(b) H_0: \Sigma_1 = \cdots = \Sigma_K = \Sigma vs. H_a: they are not all equal, and

(c) H_0: \mu_1 = \cdots = \mu_K = \mu vs. H_a: they are not all equal, with the additional information that \Sigma_1 = \cdots = \Sigma_K = \Sigma under H_0 and H_a.

It can be seen that the parameter space \Omega is the same in (a) and (b), but it is different in (c), i.e. in (a) and (b) we have independent random samples from N_p(\mu_d, \Sigma_d), d = 1, \ldots, K, whereas in the last one we have random samples from N_p(\mu_d, \Sigma). Now, the region \omega in the parameter space \Omega specified by the null hypothesis is the same in (a) and (c), but it is different in (b), i.e. in (a) and (c) we have independent random samples from N_p(\mu, \Sigma), whereas in (b) we have samples from N_p(\mu_d, \Sigma). Thus, the region \omega in (b) is equal to the parameter space \Omega in (c), and only three ML estimation problems need to be solved to test all the above hypotheses.
Let \hat{\Sigma}_d, d = 1, \ldots, K, be the MLE of \Sigma_d given by (1.12) and (1.13); let \hat{\Sigma}_{(a)} be the MLE of \Sigma given by (1.8) and (1.9) with \bar{Y} = (1/N) \sum_{d=1}^K \sum_{i=1}^{N_d} Y_{di}, A = (1/N) \sum_{d=1}^K \sum_{i=1}^{N_d} (Y_{di} - \bar{Y})(Y_{di} - \bar{Y})^t and \hat{\tau} = \hat{\tau}_{(a)}; and also let \hat{\Sigma}_{(b)} be the MLE of \Sigma given by the likelihood equations

X^t \hat{\Sigma}_{(b)}^{-1} X \hat{\beta}_{(b),d} = X^t \hat{\Sigma}_{(b)}^{-1} \bar{Y}_d, \quad d = 1, \ldots, K,   (1.20)

and

[(\mathrm{tr}\,\hat{\Sigma}_{(b)}^{-1} G_g \hat{\Sigma}_{(b)}^{-1} G_h)_{gh}] \hat{\tau}_{(b)} = [(\mathrm{tr}\,\hat{\Sigma}_{(b)}^{-1} G_g \hat{\Sigma}_{(b)}^{-1} \sum_{d=1}^K (N_d/N) \hat{C}_{(b),d})_g]   (1.21)

where \hat{C}_{(b),d} is given at (1.13) with \hat{\mu}_d replaced by \hat{\mu}_{(b),d} = X \hat{\beta}_{(b),d}, and N = \sum_{d=1}^K N_d. The Method of Scoring, discussed before, can be used to solve these likelihood equations.
The LR statistics to test (a), (b) and (c) are:

\lambda_{(a)} = |\hat{\Sigma}_{(a)}|^{-N/2} \Big( \prod_{d=1}^K |\hat{\Sigma}_d|^{-N_d/2} \Big)^{-1},   (1.22)

\lambda_{(b)} = |\hat{\Sigma}_{(b)}|^{-N/2} \Big( \prod_{d=1}^K |\hat{\Sigma}_d|^{-N_d/2} \Big)^{-1},   (1.23)

and

\lambda_{(c)} = \lambda_{(a)} / \lambda_{(b)} = (|\hat{\Sigma}_{(b)}| / |\hat{\Sigma}_{(a)}|)^{N/2},   (1.24)

respectively. For asymptotic results, it is assumed that 0 < \alpha \le N_d/N < 1. Under H_0, the limiting distribution of -2 \log \lambda_{(i)}, i = a, b, c, is \chi^2_{f(i)} with f(a) = (m+r)(K-1), f(b) = m(K-1) and f(c) = r(K-1).
1.1.2 More on asymptotic distribution

For both the one and K-population cases Szatrowski (1979) also presents asymptotic nonnull distributions for the LR statistics given by (1.22)-(1.24). The author shows that the expressions for the variance of those distributions are greatly simplified when the MLE's have S-explicit representation under the null hypothesis. The same author, in 1981, gives a much nicer form for the variance and also presents asymptotic joint distributions for the MLE's of \beta and \tau in the one and K-population cases. Some of those results will be presented later.

1.2 Incomplete data
Unlike the complete-data situation, major results on this case can only be found in relatively recent papers, instead of in textbooks. Of course, there are some exceptions, e.g. Searle (1971, ch. 10-11), where methods of estimating variance components from unbalanced (incomplete in our case) data are discussed. Also, only courses on special topics discuss those major results. Two very good reviews of the literature until 1970 are presented by Kleinbaum (1970), and Hartley and Hocking (1971).
Kleinbaum (1970) introduced what he called the More General Linear Multivariate (MGLM) model to incorporate both missing data and different design matrices in the analysis of multivariate data. Also, a generalization of the Growth Curve Multivariate (GCM) model was introduced to allow incomplete data. Estimation and testing problems were both investigated in those models, and quadratic forms called Wald statistics were proposed for testing purposes when the underlying distribution is multivariate normal. One critical point in using these quadratic forms is that it is necessary to have a positive definite estimate of the covariance matrix, which is not an easy task for many structures of missing data. The Method of Scoring, discussed before in subsection 1.1.0, is suggested by the author as a practical approach to deal with this problem.
Hartley and Hocking (1971) defined a procedure to derive the likelihood equations for the situation where N observations, sampled from a p-variate normal distribution, are divided into R groups, each one having N_q, q = 1, \ldots, R, observations with the same pattern of missing values. This situation is a particular case of the incomplete-data one-population case that will be discussed below. In their study, the covariance matrix \Sigma is allowed only to have the most general pattern, i.e. m = p(p+1)/2 in (1.2), and \mu is not structured as in (1.1). An iterative procedure, similar to the one described in subsection 1.1.0, is proposed and it is shown to converge quickly for some missing value patterns. Also, they show that: (i) the likelihood equations can be solved analytically for the special cases of the nested groups considered by Anderson (1957), and (ii) their procedure yields the same likelihood equations as the ones obtained by Trawinski and Bargmann (1964) for the multivariate general linear model with incomplete data.

Hocking and Marx (1979) improve the results obtained by Hartley and Hocking (1971) by writing the likelihood equations in a more efficient way, giving closed form solutions for the general nested case, and by presenting some computational considerations on a computer program that was implemented to give iterative solutions to the likelihood equations.
Another method of obtaining MLE's was introduced by Orchard and Woodbury (1972), which they called the "Missing Information Principle". The idea is to consider the values of the missing data as random variables, so that estimates which are well defined when all the data are present become random variables (being functions of the missing data). The likelihood for the sample involves the conditional distribution of the complete data given the observed data.
Dempster, Laird and Rubin (1977) present the iterative two-step EM algorithm for obtaining MLE's from incomplete data, and show that the method given by Orchard and Woodbury (1972) is an example of an EM algorithm. The estimation (E) step, in one cycle, involves calculating the conditional expectation of the complete-data sufficient statistics given the observed data and the current estimate of the parameters. The maximization (M) step, in the same cycle, then consists of using these conditional expectations as if they were the observed complete-data sufficient statistics, and finds the values of the parameters which maximize the likelihood function of the complete-data case.
For instance, in the one-population case, \bar{Y} and A are sufficient statistics for the parameters. Thus, the E step evaluates the conditional expectations of \bar{Y} and A given the observed data and the current estimate of \beta and \tau. The M step solves the likelihood equations (1.8) and (1.9) with \bar{Y} and A replaced by their conditional expectations evaluated at the E step. The Method of Scoring can be used in the M step. Convergence properties are presented. For certain problems the algorithm has guaranteed convergence. One important result is that each cycle of the algorithm always increases (at least does not decrease) the likelihood function of the original problem.
Rubin and Szatrowski (1982) suggest the use of this algorithm for finding MLE's of patterned covariance matrices whose MLE's do not have S-explicit representation, but are submatrices of larger patterned covariance matrices whose MLE's have S-explicit representation. The algorithm is also applied by Dempster, Rubin and Tsutakawa (1981), Laird (1982), and Laird and Ware (1982) to estimate variance components. Andrade and Helms (1983) suggest the use of this algorithm in the one-population complete-data case described in section 1.1. They introduce a general strategy to artificially treat the complete data as incomplete data in order to apply the algorithm. The last three papers cited above use a special case of this general strategy. Additional applications of the EM algorithm can be found in, for instance, Morgan and Titterington (1977), Aitkin, Anderson and Hinde (1981) and Chen (1981).
When comparing the two procedures, the EM algorithm and the Method of Scoring, it can be seen that the first has linear convergence whereas the second has quadratic convergence, i.e. when they both converge, the Method of Scoring converges faster than the EM algorithm. On the other hand, the EM algorithm is computationally and conceptually much simpler than the scoring algorithm. Also, as said before, at each cycle of the EM algorithm the likelihood function of the observed data always increases (at least does not decrease). This is not the case for the Method of Scoring.
Notice that the likelihood function of the observed data, whether the data are complete or not, may have several local maxima; hence several starting values may be necessary to ensure that the solution obtained represents in fact the global maximum. Szatrowski (1981) presents some special situations where the EM algorithm is shown to be more dependent upon the starting points than the Method of Scoring. Additional studies on the convergence and dependence upon the starting values of the EM algorithm are given by Wu (1983).
Rubin (1974, 1976) discusses the problem of estimability of parameters in a multivariate data set with missing data, and the problem of ignoring in the analysis the process that causes the missing data. In the first paper, a factorization table which indicates the amount of data available to estimate each parameter is proposed. In the second one, the author defines "missing at random" and presents conditions under which inferences about the distribution of the data are not affected by ignoring the process that causes the missing data.
The paper by Harville (1977) cited before also contains results on missing data for variance component estimation.

Woolson, Leeper and Clarke (1978), Woolson and Leeper (1980), and Leeper and Woolson (1982) suggest the generalization of the GCM model introduced by Kleinbaum (1970) for the growth curve analysis of incomplete longitudinal data. The third paper deals with the problem of finding a positive definite estimate for the covariance matrix.
Szatrowski (1981, 1983) studies the problem of estimation and hypothesis testing when we have both missing data and the linear structures given by (1.1) and (1.2), or (1.4) and (1.5). As our interest is limited to the situations where we have the general structures cited above, the results of Szatrowski (1981, 1983) are presented in greater detail in the next subsections.
Notation for the one-population case. Here, we extend the notation considered in subsection 1.1.0 to allow incomplete data. Assume that there are R \le N different patterns of missing values, which can be arranged in R subsets of N_q, q = 1, \ldots, R, \sum_{q=1}^R N_q = N, elements each, and let Y_{qi}, i = 1, \ldots, N_q, denote the p \times 1 complete-data random vector associated with the (qi)th subject (experimental unit). The incomplete-data notation is defined by assuming that instead of observing the complete-data vector Y_{qi}, we observe the random vector Z_{qi} = M_q Y_{qi}, where M_q is a known p_q \times p matrix with rank p_q \le p, rank([M_1^t : \cdots : M_R^t]^t X) = r, and for each g there exists a q such that M_q G_g M_q^t \ne 0, g = 1, \ldots, m. If those conditions on M_q did not hold, data would not be available for estimating one or more of the unknown parameters, and some matrices that appear in the likelihood equations would not be positive definite. Usually, M_q is a matrix consisting of a subset of rows of an identity matrix; M_q Y_{qi} "selects" or "picks off" the non-missing elements of the complete random vector Y_{qi}. More generally, M_q may create linear combinations of the components of Y_{qi}. Notice that Y_{qi} \sim N_p(\mu, \Sigma) with \mu and \Sigma defined by (1.1) and (1.2), and so Z_{qi} \sim N_{p_q}(\mu_q, \Sigma_q) with

\mu_q = M_q \mu = X_q \beta, \quad X_q = M_q X,   (1.25)

and

\Sigma_q = \Sigma_q(\tau) = M_q \Sigma(\tau) M_q^t = \sum_{g=1}^m \tau_g G_{qg}, \quad G_{qg} = M_q G_g M_q^t.   (1.26)

In fact, we have a situation similar to the K-population case described before, but with different definitions for the mean vectors and covariance matrices.
1.2.0 Estimation: One and K-population cases

Based on the arguments presented before in subsection 1.1.0, only the EM algorithm and the Method of Scoring will be considered here for finding the MLE's of the parameters.

Let Y_{11}, \ldots, Y_{1N_1}, \ldots, Y_{R1}, \ldots, Y_{RN_R} be a random sample of size N = \sum_{q=1}^R N_q taken from N_p(\mu, \Sigma), where \mu and \Sigma are defined by (1.1) and (1.2), and assume that instead of observing Y_{qi}, i = 1, \ldots, N_q, q = 1, \ldots, R, what we observe is Z_{qi} = M_q Y_{qi}, with M_q as defined before. Then, the likelihood function of the observed (incomplete) data is

L(\beta, \tau) = (2\pi)^{-\sum_{q=1}^R N_q p_q/2} \Big( \prod_{q=1}^R |\Sigma_q|^{-N_q/2} \Big) \exp\{(-1/2) \sum_{q=1}^R N_q\,\mathrm{tr}\,\Sigma_q^{-1} C_q\}   (1.27)

where C_q = C_q(\mu_q = X_q\beta) = A_q + (\bar{Z}_q - \mu_q)(\bar{Z}_q - \mu_q)^t, with \bar{Z}_q = (1/N_q) \sum_{i=1}^{N_q} Z_{qi} and A_q = (1/N_q) \sum_{i=1}^{N_q} (Z_{qi} - \bar{Z}_q)(Z_{qi} - \bar{Z}_q)^t. The log likelihood is

\ell(\beta, \tau) = (-\log 2\pi) \sum_{q=1}^R N_q p_q/2 - (1/2) \sum_{q=1}^R N_q \log|\Sigma_q| - (1/2) \sum_{q=1}^R N_q\,\mathrm{tr}\,\Sigma_q^{-1} C_q   (1.28)

and the likelihood equations, which are obtained from \partial \ell(\beta,\tau)/\partial \beta_j = 0, j = 1, \ldots, r, and \partial \ell(\beta,\tau)/\partial \tau_g = 0, g = 1, \ldots, m, can be written as

\Big( \sum_{q=1}^R N_q X_q^t \hat{\Sigma}_q^{-1} X_q \Big) \hat{\beta} = \sum_{q=1}^R N_q X_q^t \hat{\Sigma}_q^{-1} \bar{Z}_q   (1.29)

and

\Big[ \Big( \sum_{q=1}^R N_q\,\mathrm{tr}\,\hat{\Sigma}_q^{-1} G_{qg} \hat{\Sigma}_q^{-1} G_{qh} \Big)_{gh} \Big] \hat{\tau} = \Big[ \Big( \sum_{q=1}^R N_q\,\mathrm{tr}\,\hat{\Sigma}_q^{-1} G_{qg} \hat{\Sigma}_q^{-1} \hat{C}_q \Big)_g \Big].   (1.30)

Here, [(\ )_{gh}] and [(\ )_g] are a matrix and a column vector respectively, as described at (1.9); \bar{Z}_q and \hat{C}_q = C_q(\hat{\mu}_q = X_q\hat{\beta}) are given at (1.27); \hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_r)^t, \hat{\tau} = (\hat{\tau}_1, \ldots, \hat{\tau}_m)^t; and \hat{\Sigma}_q = \Sigma_q(\hat{\tau}) is given by (1.26). Notice that the matrices \sum_{q=1}^R N_q X_q^t \hat{\Sigma}_q^{-1} X_q and [(\sum_q N_q\,\mathrm{tr}\,\hat{\Sigma}_q^{-1} G_{qg} \hat{\Sigma}_q^{-1} G_{qh})_{gh}] are both positive definite provided \hat{\Sigma} = \Sigma(\hat{\tau}) is positive definite (see Appendix, Lemma A.0).
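For reference, the observed-data log likelihood (1.28) can be evaluated directly; the sketch below assumes the per-pattern summaries M_q, \bar{Z}_q, A_q and N_q are available (all argument names are illustrative).

```python
# A sketch of the observed-data log likelihood (1.28).
import numpy as np

def incomplete_loglik(beta, tau, X, G, M_list, Zbar_list, A_list, N_list):
    Sigma = sum(t * Gg for t, Gg in zip(tau, G))
    mu = X @ beta
    ll = 0.0
    for Mq, Zbar, Aq, Nq in zip(M_list, Zbar_list, A_list, N_list):
        Sq = Mq @ Sigma @ Mq.T                       # Sigma_q of (1.26)
        muq = Mq @ mu                                # mu_q of (1.25)
        Cq = Aq + np.outer(Zbar - muq, Zbar - muq)   # C_q of (1.27)
        logdet = np.linalg.slogdet(Sq)[1]
        ll += -0.5 * Nq * (Mq.shape[0] * np.log(2 * np.pi) + logdet
                           + np.trace(np.linalg.solve(Sq, Cq)))
    return ll
```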
Only in very special situations, e.g. the nested pattern of missing values described by Anderson (1957), have the above likelihood equations been solved directly, with X_q replaced by M_q and \hat{\beta} by \hat{\mu}. In those situations the mean vector \mu is assumed not to be structured as in (1.1). Thus, in most practical situations iterative procedures will be necessary to obtain the MLE's of \beta and \tau with incomplete data. The two iterative procedures suggested are described below for this problem.
(i) Method of Scoring. The algorithm is similar to the one presented in subsection 1.1.0. Using (1.29) with \hat{\Sigma}_q replaced by \Sigma_q^{(r-1)} = M_q \Sigma(\tau^{(r-1)}) M_q^t, r = 1, 2, \ldots, \beta^{(r)} is obtained, and then \mu_q^{(r)} = X_q \beta^{(r)} and C_q^{(r)} = C_q(\mu_q^{(r)}) as at (1.30); \tau^{(r)} is obtained using (1.30) with \hat{\Sigma}_q and \hat{C}_q replaced by \Sigma_q^{(r-1)} and C_q^{(r)} respectively; and finally \Sigma^{(r)} = \sum_{g=1}^m \tau_g^{(r)} G_g yields the next estimate of \Sigma and, of course, of \Sigma_q = M_q \Sigma M_q^t. The procedure stops when \beta^{(r)} - \beta^{(r-1)} and \tau^{(r)} - \tau^{(r-1)} are small. Convergence of this procedure is not guaranteed. Szatrowski (1981, 1983) extends the results by Anderson (1973) and shows that one iteration of this procedure gives asymptotically efficient estimates for \beta and \tau provided that \hat{\Sigma} is replaced by a consistent estimate of \Sigma.
(ii) EM algorithm. As noted above, this procedure uses the conditional expectation of the complete-data sufficient statistics given the observed data Z_{11}, \ldots, Z_{1N_1}, \ldots, Z_{R1}, \ldots, Z_{RN_R} and a current estimate of \beta and \tau. Notice that we use the same notation for the random vector and its observed value. The complete-data sufficient statistics are

\bar{Y} = (1/N) \sum_{q=1}^R \sum_{i=1}^{N_q} Y_{qi}   (1.31)

and

A = (1/N) \sum_{q=1}^R \sum_{i=1}^{N_q} (Y_{qi} - \bar{Y})(Y_{qi} - \bar{Y})^t   (1.32)

and, following Szatrowski (1981), it can be shown that

E(\bar{Y} \mid Z_{11}, \ldots, Z_{RN_R}; \beta, \tau) = (1/N) \sum_{q=1}^R N_q [M_q^t : T_q^t]^{-t} \eta_q   (1.33)

where T_q is a (p - p_q) \times p matrix such that [M_q^t : T_q^t]^t is of full rank, B_q = T_q \mu + (T_q \Sigma M_q^t) \Sigma_q^{-1} (\bar{Z}_q - \mu_q) with \mu and \Sigma given by (1.1) and (1.2), and \eta_q = (\bar{Z}_q^t, B_q^t)^t; and

E(A \mid Z_{11}, \ldots, Z_{RN_R}; \beta, \tau) = (1/N) \sum_{q=1}^R [M_q^t : T_q^t]^{-t} \Big\{ N_q (1 - (N_q/N)) \eta_q \eta_q^t + \begin{pmatrix} \Omega_{11,q} & \Omega_{12,q} \\ \Omega_{21,q} & \Omega_{22,q} \end{pmatrix} \Big\} [M_q^t : T_q^t]^{-1} - (1/N^2) \sum_{q \ne q'} N_q N_{q'} [M_q^t : T_q^t]^{-t} \eta_q \eta_{q'}^t [M_{q'}^t : T_{q'}^t]^{-1}   (1.34)

where \Omega_{11,q} = N_q A_q, \Omega_{12,q} = \Omega_{21,q}^t = N_q A_q \Sigma_q^{-1} (M_q \Sigma T_q^t), and \Omega_{22,q} = N_q (1 - (1/N)) (T_q \Sigma T_q^t - T_q \Sigma M_q^t \Sigma_q^{-1} M_q \Sigma T_q^t) + N_q (T_q \Sigma M_q^t) \Sigma_q^{-1} A_q \Sigma_q^{-1} (M_q \Sigma T_q^t), with A_q given at (1.27). T_q and B_q are defined above. Usually, [M_q^t : T_q^t] can be chosen to be I_p.
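The quantity B_q in (1.33) is the usual conditional mean of the unobserved part T_q Y given the observed part; a minimal sketch of that formula (function and argument names are illustrative) is:

```python
# Sketch of the conditional mean used in (1.33):
# B_q = T_q mu + (T_q Sigma M_q^t) Sigma_q^{-1} (Zbar_q - M_q mu).
import numpy as np

def conditional_mean_missing(Tq, Mq, mu, Sigma, Zbar_q):
    Sq = Mq @ Sigma @ Mq.T                                  # Sigma_q of (1.26)
    return Tq @ mu + Tq @ Sigma @ Mq.T @ np.linalg.solve(Sq, Zbar_q - Mq @ mu)
```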
The rth, r = 1, 2, \ldots, cycle of the algorithm is described as:

E step: Evaluate (1.33) and (1.34), the conditional expectations, with \beta and \tau replaced by \beta^{(r-1)} and \tau^{(r-1)}.

M step: Solve the likelihood equations (1.8) and (1.9) with \bar{Y} and A replaced by the results obtained above for (1.33) and (1.34) respectively. The solutions will provide the values of \beta and \tau for the next E step.

Each cycle of the EM algorithm involves both E and M steps, although one M step may require an additional iterative procedure to solve the "complete-data" likelihood equations. If the Method of Scoring is used in the M step and the MLE's in the complete-data situation have S-explicit representation, then only one iteration of this method will take place. Clearly the EM algorithm is not recommended for situations where the complete-data likelihood equations do not have explicit solutions.
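A compressed sketch of the E step of one such cycle follows, assuming M_q selects coordinates and T_q is a complementary selector; the accumulation of (1.34) is only indicated in a comment, and the M step would reuse a complete-data solver such as the scoring function sketched earlier.

```python
# A sketch of (part of) one EM cycle for the incomplete-data problem
# (illustrative only; not the author's implementation).
import numpy as np

def em_e_step_mean(beta, tau, X, G, M_list, T_list, Zbar_list, N_list):
    Sigma = sum(t * Gg for t, Gg in zip(tau, G))
    mu, N = X @ beta, sum(N_list)
    Ebar = np.zeros(len(mu))
    for Mq, Tq, Zbar, Nq in zip(M_list, T_list, Zbar_list, N_list):
        W = np.vstack([Mq, Tq])                       # [M_q^t : T_q^t]^t, p x p
        Sq = Mq @ Sigma @ Mq.T
        Bq = Tq @ mu + Tq @ Sigma @ Mq.T @ np.linalg.solve(Sq, Zbar - Mq @ mu)
        eta = np.concatenate([Zbar, Bq])              # eta_q = (Zbar_q^t, B_q^t)^t
        Ebar += Nq * np.linalg.solve(W, eta) / N      # contribution to (1.33)
    # E(A | Z) of (1.34) would be accumulated analogously; the M step then
    # solves (1.8)-(1.9) with Ybar and A replaced by these expectations.
    return Ebar
```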
Notation for the K-population case. The notation considered in subsection 1.1.0 in the complete-data situation with K populations is extended here to allow incomplete data. Assume that for population d, d = 1, \ldots, K, there are R_d \le N_d different patterns of missing values, which can be arranged in R_d subsets of N_{dq}, q = 1, \ldots, R_d, \sum_{q=1}^{R_d} N_{dq} = N_d, \sum_{d=1}^K N_d = N, observations each, and let Y_{dqi}, i = 1, \ldots, N_{dq}, denote the p \times 1 complete-data random vector associated with the (dqi)th subject (experimental unit). The incomplete-data notation is defined by assuming that instead of observing the complete-data random vector Y_{dqi}, we observe Z_{dqi} = M_{dq} Y_{dqi}, where M_{dq} is a known p_{dq} \times p matrix with rank p_{dq} \le p and rank [M_{d1}^t : \cdots : M_{dR_d}^t]^t = p \le \sum_{q=1}^{R_d} p_{dq}, d = 1, \ldots, K. The same comments on the M_q matrices in the one-population case hold here for M_{dq}. Notice that Z_{dqi} \sim N_{p_{dq}}(\mu_{dq}, \Sigma_{dq}) with

\mu_{dq} = M_{dq} \mu_d = X_{dq} \beta_d, \quad X_{dq} = M_{dq} X,   (1.35)

and

\Sigma_{dq} = \Sigma_{dq}(\tau_d) = M_{dq} \Sigma_d(\tau_d) M_{dq}^t = \sum_{g=1}^m \tau_{dg} G_{dqg}, \quad G_{dqg} = M_{dq} G_g M_{dq}^t.   (1.36)

As pointed out before in the complete-data situation, results for the K-population case can easily be obtained as an extension of those obtained when we have only one population. For population d, d = 1, \ldots, K, we have a random sample Z_{dqi}, i = 1, \ldots, N_{dq}, q = 1, \ldots, R_d, which is taken from N_{p_{dq}}(\mu_{dq}, \Sigma_{dq}). The log likelihood function can be written as the sum over the K populations of terms of the form (1.28):

\ell(\beta_1, \ldots, \beta_K, \tau_1, \ldots, \tau_K) = \sum_{d=1}^K \Big[ (-\log 2\pi) \sum_{q=1}^{R_d} N_{dq} p_{dq}/2 - (1/2) \sum_{q=1}^{R_d} N_{dq} \log|\Sigma_{dq}| - (1/2) \sum_{q=1}^{R_d} N_{dq}\,\mathrm{tr}\,\Sigma_{dq}^{-1} C_{dq} \Big]   (1.37)

with C_{dq} defined as at (1.27). The likelihood equations are similar to those given by (1.29) and (1.30) and can easily be obtained from (1.37). Also, the Method of Scoring and the EM algorithm can easily be extended to this new situation.
1.2.1 Hypothesis testing: One and K-population cases

The same hypotheses discussed in subsection 1.1.1 were considered by Szatrowski (1981, 1983) in the incomplete-data situation. Following the same notation used in that subsection, based on the random sample Z_{qi}, i = 1, \ldots, N_q, q = 1, \ldots, R, we are interested in testing H_0: \beta_1 = 0 and \tau_1 = 0 vs. H_a: they are not all equal to zero. The likelihood ratio statistic is

\lambda = \prod_{q=1}^R (|\hat{\Sigma}_{q,\Omega}| / |\hat{\Sigma}_{q,0}|)^{N_q/2}   (1.38)

where \hat{\Sigma}_{q,\Omega} = M_q \Sigma(\hat{\tau}) M_q^t is the MLE of \Sigma_q given by the likelihood equations (1.29) and (1.30), and \hat{\Sigma}_{q,0} = M_q \Sigma(\hat{\tau}_0) M_q^t is the MLE of \Sigma_q given by the same equations but with X_q = M_q X replaced by X_{q,0} = M_q X_0, \hat{\Sigma}_q by \hat{\Sigma}_{q,0}, \hat{\beta} by \hat{\beta}_0, \hat{\tau} by \hat{\tau}_0, and \hat{C}_q by \hat{C}_{q,0} = \hat{C}_q(\hat{\mu}_{q,0} = X_{q,0}\hat{\beta}_0). Under H_0, the limiting distribution of -2 \log \lambda is \chi^2 with (m+r) - (m_0 + r_0) degrees of freedom.

As in the one-population case discussed above, LR statistics equivalent to the three presented in the complete-data situation and given by (1.22)-(1.24) can easily be obtained for incomplete data. They are given in Szatrowski (1981). Notice that it is necessary to derive likelihood equations, based on the random sample Z_{dqi}, i = 1, \ldots, N_{dq}, q = 1, \ldots, R_d, d = 1, \ldots, K, under the different situations (hypotheses).
1.2.2 More on asymptotic distribution

For both the one and K-population cases, Szatrowski (1981) presents the asymptotic distributions of the MLE's discussed before. Also, he gives the asymptotic null and nonnull distributions of the four LR tests that are discussed in his work. Part of those results will be presented later.
1.3 Outline of the research proposal

The purpose of this dissertation is to extend the works by Szatrowski (1979, 1980, 1981, 1983) by considering a more general linear structure for the mean vectors and covariance matrices, and more general constraints on the parameters, which are estimated through the method of maximum likelihood.

In Chapter II, the problem of estimation in the complete-data one-population case is studied. MLE's of \beta and \tau are discussed when the parameters are subject to three different types of constraints: (1) L^t\beta = \beta_0, (2) S^t\tau = \tau_0 and (3) L^t\beta = \beta_0 and S^t\tau = \tau_0, with \beta_0 and \tau_0 defined properly. Asymptotic joint distributions of the MLE's and efficient estimates of \beta, \tau, \mu and \Sigma are also studied.

Chapter III discusses LR tests of hypotheses associated with the constraints given in Chapter II. Both the null and nonnull asymptotic distributions for the LR statistics are presented.

Chapter IV also deals with the complete-data situation, but for K populations instead of only one. Estimation, LR tests and asymptotic results are presented. This chapter is basically a generalization of Chapters II and III to the case where the K populations have some common parameters.

Chapter V deals mainly with the same estimation and hypothesis testing problems discussed in the previous chapters. The difference is that here we have incomplete data. Asymptotic results are also discussed. Most of the results in this chapter are extensions of the results obtained in Chapter IV.

In Chapter VI, the methods and results are illustrated through two numerical examples.

Finally, in Chapter VII we present a summary of this work and some topics for further research.
CHAPTER II

COMPLETE DATA: ONE-POPULATION CASE. ESTIMATION AND ASYMPTOTIC RESULTS

2.0 Introduction

In this chapter, we discuss the problem of finding MLE's when we have one population and the parameters, or part of them, are subject to constraints. For each different type of constraint, the likelihood equations are presented. Necessary and sufficient conditions for S-explicit representation of the MLE's, and iterative procedures to solve the likelihood equations, are given. Also, asymptotic distributions of the MLE's and efficient estimates of the parameters are presented.

The linear structures given by (1.1) and (1.2) are assumed to hold throughout this chapter.
2.1 Estimation: Likelihood equations and explicit solutions

Before we start discussing the estimation problem with constrained parameters, we will write again some important results already presented in the previous chapter. The idea is to have all the major results related to estimation in the one-population case with complete data together in one chapter.
Example: A completely randomized experiment with two treatments and N replications (2N experimental units) is considered here to illustrate the results presented in the next 4 subsections. Three measurements of a random variable are made on each experimental unit, each one at a different time. The usual linear model associated with this situation can be described as: for the ith (i = 1, \ldots, N) replication,

E(Y_i) = X^* \beta^*,

where Y_i = (Y_{i1}^t, Y_{i2}^t)^t stacks the two treatment vectors and \Sigma^*, the covariance matrix of each Y_{ij}, has the equal-variance, equal-covariance pattern. The Y_{ij}'s, j = 1, 2, i = 1, \ldots, N, are assumed to be independent. Also, normality is assumed. Here,

Y_{ijt}: observation made on the (ij)th experimental unit at time t, t = 1, 2, 3;
\beta_t^*: reference cell mean at time t; and
\beta_{3+t}^*: differential treatment effect at time t.

Now, in the notation established in Chapter I, the same problem is represented in the following way: for the ith (i = 1, \ldots, N) replication,

\mu = X\beta, \quad X = \begin{pmatrix} I_3 & 0 \\ I_3 & I_3 \end{pmatrix} (= X^*), \quad \beta = \beta^*, \quad \Sigma = \mathrm{diag}(\Sigma^*, \Sigma^*) = I_2 \otimes \Sigma^*.

In this case, p = 6, r = 6, m = 2, G_1 = I_2 \otimes (1_3 1_3^t - I_3) and G_2 = I_6.
Also, it can be seen that the orthogonal matrix

P = \begin{pmatrix}
1/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} \\
1/\sqrt{6} & 1/\sqrt{6} & 1/\sqrt{6} & -1/\sqrt{6} & -1/\sqrt{6} & -1/\sqrt{6} \\
1/2 & -1/2 & 0 & 1/2 & -1/2 & 0 \\
1/2 & 0 & -1/2 & -1/2 & 0 & 1/2 \\
1/\sqrt{12} & 1/\sqrt{12} & -2/\sqrt{12} & 1/\sqrt{12} & 1/\sqrt{12} & -2/\sqrt{12} \\
-1/\sqrt{12} & 2/\sqrt{12} & -1/\sqrt{12} & 1/\sqrt{12} & -2/\sqrt{12} & 1/\sqrt{12}
\end{pmatrix}

is such that

P \Sigma P^t = \Omega = \mathrm{diag}(\tau_2 + 2\tau_1,\ \tau_2 + 2\tau_1,\ \tau_2 - \tau_1,\ \tau_2 - \tau_1,\ \tau_2 - \tau_1,\ \tau_2 - \tau_1),

i.e. the columns of P^t are 6 linearly independent eigenvectors of \Sigma, and they do not depend on \tau. The covariance matrix \Sigma is said to be totally reducible (see definition 2.3) and its eigenvalues are the diagonal elements of \Omega. \square
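The claims in this example can be verified numerically; the sketch below assumes the forms G_1 = I_2 \otimes (1_3 1_3^t - I_3) and G_2 = I_6 read off above, with illustrative values of \tau_1 and \tau_2.

```python
# Numerical check that the example covariance is totally reducible:
# G_1 and G_2 commute, and the eigenvalues are the stated combinations.
import numpy as np

J3 = np.ones((3, 3)) - np.eye(3)
G1, G2 = np.kron(np.eye(2), J3), np.eye(6)   # assumed forms of G_1, G_2
assert np.allclose(G1 @ G2, G2 @ G1)         # total reducibility (definition 2.3)

tau1, tau2 = 0.4, 1.5                        # illustrative parameter values
Sigma = tau1 * G1 + tau2 * G2
eig = np.sort(np.linalg.eigvalsh(Sigma))     # tau_2 - tau_1 (x4), tau_2 + 2 tau_1 (x2)
assert np.allclose(eig, [tau2 - tau1] * 4 + [tau2 + 2 * tau1] * 2)
```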
Based on a random sample Y_1, \ldots, Y_N from N_p(\mu, \Sigma), where \mu and \Sigma are defined in (1.1) and (1.2), we want to find the MLE's of \beta (and thus of \mu) and \tau (and thus of \Sigma) when the parameters are either subject to or not subject to constraints. The likelihood function for this situation is given by (1.6) and reproduced here as

L(\beta, \tau) = (2\pi)^{-Np/2} |\Sigma|^{-N/2} \exp\{(-N/2)\,\mathrm{tr}\,\Sigma^{-1} C\}.   (2.1)

The problem here is to find the values of \beta and \tau that maximize (2.1). As noted above, the parameters may or may not be subject to constraints.
2.1.0 Unconstrained parameters

In this case, the MLE's of \beta and \tau are \hat{\beta}_{(0)} and \hat{\tau}_{(0)} respectively, which are a solution of the likelihood equations (1.8) and (1.9), reproduced here as

X^t \hat{\Sigma}_{(0)}^{-1} X \hat{\beta}_{(0)} = X^t \hat{\Sigma}_{(0)}^{-1} \bar{Y}   (2.2)

and

[(\mathrm{tr}\,\hat{\Sigma}_{(0)}^{-1} G_g \hat{\Sigma}_{(0)}^{-1} G_h)_{gh}] \hat{\tau}_{(0)} = [(\mathrm{tr}\,\hat{\Sigma}_{(0)}^{-1} G_g \hat{\Sigma}_{(0)}^{-1} \hat{C}_{(0)})_g].   (2.3)

The subscript (0) denotes that the MLE's are being evaluated with the parameters unconstrained. Below, we present, without proof, major results obtained by Szatrowski (1980) on the solution of (2.2) and (2.3).
Definition 2.1 (Szatrowski (1980)). We say that the problem is in the canonical form when there exists a value \tau^+ such that \Sigma(\tau^+) = I_p. See subsection 1.1.0 for discussion and example. \square
Definition 2.2 (Anderson (1969) and Szatrowski (1980)). We define \Phi = \Phi(\Sigma) to be a [p(p+1)/2] \times [p(p+1)/2] matrix whose elements are

\phi_{ij,k\ell} = [N^2/(N-1)]\,\mathrm{cov}(a_{ij}, a_{k\ell}) = \sigma_{ik}\sigma_{j\ell} + \sigma_{i\ell}\sigma_{jk}, \quad i \le j, \ k \le \ell,

(i, j, k, \ell = 1, \ldots, p), where \sigma_{ik} is the (ik)th element of \Sigma and a_{ij} is the (ij)th element of A defined at (1.6). The notation \phi_{ij,k\ell} represents the element of \Phi with row in the same position as the element a_{ij} in \langle A \rangle (see definition 1.3) and column in the same position as a_{k\ell} in \langle A \rangle. \square

It can be shown that \Phi is N^2/(N-1) times the covariance matrix of \langle A \rangle, and that \Phi_I = \Phi(\Sigma = I_p) = \mathrm{diag}(2 I_p, I_{p(p-1)/2}). The next three theorems are based on the assumption (without loss of generality) that the problem is in the canonical form.
Theorem 2.1 (Szatrowski (1980)). The MLE of \beta (and thus of \mu) has S-explicit representation (see definition 1.1) if and only if the r columns of X are spanned by exactly r eigenvectors of \Sigma. \square

Theorem 2.2 (Szatrowski (1980)). The MLE of \tau (and thus of \Sigma) has S-explicit representation (see definition 1.2) if and only if \hat{\beta}_{(0)} has S-explicit representation and the m columns of \Delta (see definition 1.4) are spanned by exactly m eigenvectors of \Phi_I. \square

The last result comes from the fact that equation (2.3) can be written as

\Delta^t \hat{\Phi}_{(0)}^{-1} \Delta \hat{\tau}_{(0)} = \Delta^t \hat{\Phi}_{(0)}^{-1} \langle \hat{C}_{(0)} \rangle   (2.4)

where \hat{\Phi}_{(0)} = \Phi(\hat{\Sigma}_{(0)}); a straightforward application of Lemma A.6 (see Appendix).
Definition 2.3 (Rogers and Young (1975)). We say that \Sigma(\tau) is totally reducible if and only if there exists an orthogonal matrix P, independent of \tau, such that P \Sigma(\tau) P^t is diagonal. This is equivalent to saying that G_g G_h = G_h G_g, g, h = 1, \ldots, m. \square

Theorem 2.3 (Szatrowski (1980)). Let \Sigma be totally reducible and assume that \hat{\beta}_{(0)} has S-explicit representation. Then, the MLE of \tau (and thus of \Sigma) has S-explicit representation if and only if the eigenvalues of \Sigma consist of exactly m linear combinations of \tau_1, \ldots, \tau_m. \square
As said before in Chapter I, if \hat{\beta}_{(0)} and \hat{\tau}_{(0)} both have S-explicit representation, then equations (2.2) and (2.3) can be solved explicitly and the solutions are

\hat{\beta}_{(0),E} = (X^t X)^{-1} X^t \bar{Y},   (2.5)

and

\hat{\tau}_{(0),E} = [(\mathrm{tr}\,G_g G_h)_{gh}]^{-1} [(\mathrm{tr}\,G_g \hat{C}_{(0),E})_g]   (2.6)

where \hat{C}_{(0),E} = C(\hat{\mu}_{(0),E} = X \hat{\beta}_{(0),E}). The subscript E denotes that the MLE's have S-explicit representation. Notice that, if only \hat{\beta}_{(0)} has S-explicit representation, then it is given by (2.5) and \hat{\tau}_{(0)} is given by (2.3) instead of (2.6). Also, when X and the G's only consist of zeroes and ones, equations (2.5) and (2.6) are equivalent to obtaining the MLE's directly by "averaging" (see Szatrowski (1978, 1980)).

In the example presented above we can see that

(i) the 6 columns of P^t form a basis for R^6 and therefore they span the 6 columns of X;

(ii) \Sigma is totally reducible and its distinct eigenvalues are given by exactly two linear combinations of \tau_1 and \tau_2.

From theorems 2.1 and 2.3 we can conclude that both \hat{\beta}_{(0)} and \hat{\tau}_{(0)} have S-explicit representation and are given by (2.5) and (2.6) respectively.

In many situations (2.2) and (2.3) cannot be solved directly. Iterative procedures for obtaining the MLE's will be discussed later. Next we start discussing maximum likelihood estimation where the parameters are subject to constraints.
2.1.1
Constraint
D = ~O
Here, we want to find the MLE's of .§ and r when .§ is subject to
t
the constraint 1t .§ = go' where 1 is an ~xr matrix of rank ~ < r and
go
is a known
~xl
vector.
39
Theorem 2.4 The likelihood equations for this situation can be written
as
(2.7)
and
~-l
A-I
-1
A-I
~-l A
= [(trl:~(l)~g~(l)Nh
G l: r... ) gh ] [ (trl:~(l)~g~(l)~(l)
G l: C )]
g ,
~
L
~(l)
(2.8)
where ~(O) is the unconstrained ~rrE of ~ given by C2.2) with
1CO )
replaced by ~Cl)' ~(l) = ICr(l)) and ~(l) = f(M(l) = ~(l))·
The
M and
~ of dimensions rxQ, and rxCr-Q,) respectively, are such
1
that [t~:BL] = [1:gL]-t, where g~ is any (r-Q,)xr matrix which produces
[1:g1] of full rank. The subscript Cl) denotes that the MLE's are
matrices
#"W
~
f'OV
f'OV
being evaluated when the parameters are subject to the constraint
t
1 ~ = ~o·
Proof Let
~(l)(Q,xl)
be the Lagrange multiplier vector and
Q,C~(l)'l(l)) be the loglikelihood function given by (1.7) with ~ and I
replaced by ~Cl) and i(l) respectively.
~
~
t
We maximize
t~
QCl ) = Q,(~Cl),ICl)) + ~Cl)(1 ~Cl)- ~O)
with respect to ~(l)' ~(l) and rCl).
The partial derivatives are Csee
Appendix, Lemmas A.l and A.2, for special results on matrix derivative)
(1)
(ii)
and
tA-l - t~-l ~
dQCl)/d~Cl)= N(~ I(l)Y-~ kCl)~(l))+ 1Q Cl ) ,
~
A-I
~-l
A-I A
dQCl)/dICl)g = C-N/2)Ctrk(1)gg-trICl)ggI(1)fCl))' g=l, ... ,m
A
40
From (i)
= Q and (iii) = Q, we can see that
t~-l
(-1/N)1
~ ~(l)~
1
t
Q
~(l)
=
t~-l ~ ~(l)r
~(1)
~O
Using the results for the inverse of partitioned matrices (see Appendix,
Lemma A.3) the first expression for ~(l) follows.
The second expres-
sion in (2.7) can be obtained from the first one by noticing that
and 1~1 =1~ (see Appendix, Lemma A.4).
This second expression is
exactly the expression we would have obtained if we had had used
reparametrization.
For equation (2.8) we can see that
Thus, (ii) = 0, g=l, ... ,m, implies that
and equation (2.8) follows directly.
o
t~-l
t tA-l
-1
Notice that the matrices ~ ~(l)~' 1 (~~(l)~) 1,
(~1)tf(i)(~1) and [(tr~(i)Qg~(i)~)gh] are positive definite matrices
provided ~(i) is positive definite (see Appendix, Lemma A.O).
Corollary 2.1
The MLE of ~ subject to the constraint 1t~ = ~O is
invariant to the reparametrization, i.e. ~(1) is the same whatever the
41
value of UL ( and thus of
~\
o
and R ) as defined before.
L
Below we present two theorems on S-explicit representation for
"
~(l) and 1(1)·
Definitions 1.1 and 1.2 introduced in Chapter I are
easily extended for this situation.
The basic idea is to find condi-
tions where the ~fiE's can be obtained from (2.7) and (2.8) with %(~)
replaced by!P.
before, the problem is considered to be in the
As
canonical form.
Theorem 2.5 The ~~E ~(l) has S-explicit representation if and only if
whatever the value of B/ as defined before, the r-£ columns of
1
are spanned by exactly r-£ eigenvectors of
"
~(l)
I.
~
Under this condition,
is given by
"
"
~ (1) ,E = ~ (0) ,E
(2.9)
where
"
~(O),E
is the unconstrained
~fiE
of § given by (2.5).
Proof The proof here is a straightforward application of Lemma A.5
(see Appendix) and by noticing that the second expressions in both
equations (2.7) and (2.9) are equivalent to each other for all the
values of y-~~o if and only if
Now, from Lemma A.5, the above equality is true if and only if the
r-£ columns of
vectors of
I;
~1
are linear combinations of exactly r-t eigen-
hence the theorem follows.
The first expression in (2.9)
can be obtained from the second one by noticing that (see Appendix,
42
Lemma A.4)
o
8L[(~L)t(~L)]-18i= (~t~)-1_(~t~)-11[1t(~t~)-11]-11t(~t~)-1
t'"oJ
,....,
,....,
~
Notice that because of Corollary 2.1, it is sufficient to verify
this necessary and sufficient condition for only one value of 8 , In
1
other words, if for a fixed value 8 = 8t we have that the columns of
1
~t are spanned by r-£ eigenvectors of ~, then the columns of llli will
L
~
also be spanned by r-£ eigenvectors of
defined before.
~,
whatever the value of
Unfortunately this result is in terms of
~
~
as
instead of
Corollary 2.2 Assume (without loss of generality) that 1 t = [11t :1t2] ,
where 1i is an ~x~ matrix of full rank. Also, let ~ = [~1:~2] with
~l of dimension px£.
Then, the MLE ~(l) has S-eA~licit representation
if and only if the r-~ columns of ~2-~11it1~ are linear combinations
of exactly
r-~
eigenvectors of
~.
Proof As ~(l) is invariant to the choice of ~L (see corollary 2.1) and
1i is assumed to be nonsingular,
nonsingular.
~I
=
[Q:l(r-~~]
will make
[1:~1]t
Thus, applying Lemma A.3 (see Appendix) we can see that
-1
81 = [-11 12:1(r-~)]
Now we have
t
-t t
o
and ~1 = ~2-~11l 1 2 ,
llliL in terms of
~
and 1.
This result can then be used
to generate 1 matrices which will produce MLE's with S-explicit repret
For instance, in some situations 12 = Q and ~1 = ~2' a
matrix fonned by the last r-~ columns of~. Thus, for a fixed ~, any
t
constraint with 1 = [1i:Q] will produce~~E's with S-explicit representation.
sentation as long as the columns of
~2
are linear combinations of
43
exactly
r-~
eigenvectors of
~,
regardless of the value of the full rank
matrix 11 . Notice that S-explicit representation for ~(l) does not
depend on the value of ~o at all.
Consider again the example discussed before.
Two different
constraints:
~
OOOOJ
°o 10 10 00 °0 00
and ~O = (O,O,O)t, which is equivalent
to: "the two treatments have the same effect at all the
three times:, and
1t = [1 0 0 1 -1 0] and QO = 0, which is equivalent to:
(ii)
"treatment 1 at time 1 and treatment 2 at time 2 have the
same effect" ,
are considered here to illustrate the above results.
1i
r-£ = 3,
[13: 13]t.
= 13 which is of full rank, 1~ =
In (i),
~=3,
Q and ~ = ~2 =
It can be seen that the three columns of ~2 are given by
(2/I6)El+E3+(2/IIT)Es' (2/I6)El-E3+(2/IIT)Es and (2/I6)El-(4/If2)Es'
where p.,
i=1,3,S, is the ith column of pt.
Nl
~
Thus, the 3 columns of
~1 are linear combinations of exactly 3 eigenvectors of ~ and ~(l) has
S-exp1icit representation according to theorem 2.4.
r-£=S,
~=
1i
In (ii), £=1,
= [1] which is of full rank, 1~ = [0 0 1 -1 0] and
00010
1 001 0
010 0 1
00100
o 001 0
°° 0 °1
5 eigenvectors of
In this case, it can be shown that there are no
k which
span the 5 columns of ~ and therefore, ~(1)
44
does not have S-explicit representation according to the same theorem.
Notice that we may also have S-explicit representation for 8(1)
and not for 8(0)' i.e. we may have situations where the r-~ columns of
ffi are spanned by exactly
1
columns of
~
r-~
eigenvectors of
spanned by exactly r
eigenv~ctors
I and do not have the r
of
I.
From equations (2.3) and (2.8) we can see that the constraint on
~
does not alter the form of the likelihood equations for the
~~E
of 1.
Results on S-explicit representation for 1(1) are the ones given in the
previous subsection for
i(o).
Thus, if 1(1) has S-explicit representa-
tion, it is given
(2.10)
where £(l),E
= ~(~(l),E = ~(l),E)' f(l),E given by (2.9).
Again, if ~(l) and 1(1) both have S-explicit representation then
the likelihood equations (2.7) and (2.8) can be solved explicitly and
the solution is given by (2.9) and (2.10).
If only f(l) has S-explicit
representation, it is given by (2.9) and 1(1) is given by (2.8) instead
of (2.10).
The idea of "averaging", see end of subsection 2.10, does
not hold at all for obtaining the MLE of
obtaining the
r~
~
but it still holds for
of 1 when both MLE's have
S-ex~licit
representation.
Later, we will present iterative procedures to solve the likelihood equation (2.7) and (2.8).
2.1.2
Constraint ~-""-A.O
st T = v
-----
In this subsection, we present the likelihood equations and
conditions for S-explicit representation for the MLE's of
when
~
and 1,
r is subject to the constraint §tr = ~1' where ~t is a sxm matrix
45
of rank s
<
rO
m and
is a sxl known vector.
It is assumed that there
exists at least one I such that it satisfies the constraint and I(1»O.
Theorem 2.6 The likelihood equations for this situation can be written
as
(2.11)
and
A
A-I A
A-I
tA-l
-1 tA-l A
)
1(2) = ~(2)~(2) - ~(2)~(~ ~(2)~) (~~(2)~(2) - Xo
=
M~ro
+
C2.12)
%(2),1 (i(2)-%(2)M~!0) ,
A
A
A
t
-1 t A
where I(2) = ICI(2))' ~(2) ,1 = Bs(ES~(2)Bs) B'"s ' ~(2) =
A
.
,..",,..,,,.
[(tr%(~)Qg%(~)~)gh] and
~(~(2)= ~(2))'
[(trf(~)Qgf(~)~(2))g] with
i(2)=
The matrices
M
s and Bs
mx(m-s) respectively, such that
(m-s)xm matrix which produces
~
-
-
[~§:~]
[~:g~]
f(2) =
are of dimensions mxs and
= [§:g§]
-t
of full rank.
t .
, where US
IS
any
The subscript (2)
denotes that the MLE's are being evaluated with the parameters subject
to the constraint ~t1 = XO'
Proof The proof here parallels the one given for theorem 2.4.
~(2) (sx1)
Let
be the Lagrange multiplier vector and ~(~(2) '1(2)) be the
loglike1ihood function given by (1.7) with ~ and 1 replaced by ~(2) and
A
1(2) respectively.
We maximize
with respect to ~(2)' ~(2) and i(2)'
(1)
The partial derivatives are:
46
(3)
where ~g is the gth column of §t. Equation (2.11) follows directly
A-I
A-I
A-I
tA
from (1) = Q. Now, as tr~(2)Qg = [(tr~(2)Qg~(2)~)h] 1(2)'
g,h=l, ... ,m, we can see that (2) = 0 and (3) = Q imply
(-2/N)§
(4)
Q
Therefore, using the results for the inverse of partitioned matrices
given by Lemma A.3 (see Appendix), the first expression for r(2)
follows.
The second expression for r(2) can be obtained from the first
one by noticing that
§(§t%(~)§)-l§t = %(2)-
%(2)%(2),1%(2)
~d §~§
=
(see Appendix, Lemma A.4).
Is
0
tA-1
A
tA-1
t
Notice that the matrices ~ k(2)~' k(2)' § k(2)§ and B§k(2)~ are
A
p.d. provided f(2) is p.d. (see Appendix, Lemma A.O).
Corollary 2.3 The MLE of ! subject to the constraint §tr = ~O is
invariant to the choice of !J§ (and thus of M§ and
%)
as defined
o
before.
As we can see from equations (2.2) and (2.11), the constraint on
! does not change the form of the MLE of
unconstrained case.
g when comparing it to the
So, we can use theorem 2.1 to state necessary and
sufficient conditions for S-exp1icit representation for ~(2)'
If
~(2) has S-explicit representation, then it is given by (2.11) with
1(2) replaced by
lp'
i.e. ~(2),E = g(O),E given by (2.5).
Again, we
are assuming that the problem is in the canonical form, i.e. there
47
exists I
.-
+
+
t +
such that L(T ) = l'l
I and, of course, S T = YO'
/"'f>J
I"J
-
vector of parameter 1, the fonn of its
need to be obtained.
~E
~
I"J
For the
has changed and new results
Before presenting them, it is convenient to
re\VTite equation (2.12).
Corollary 2.4 The MLE i(2) of 1 can be written as
(2.13)
where ~(2) = ~(%(2)) and < >, ~ and ~( ) have already been defined.
(See definitions 1.3, 1.4 and Z.2.)
Proof Using Lennna A.6 (see Appendix), we can see that
A-I
A-I
tA-l
A-I
A-I A
trk(Z)~gk(2)~ = 2<Qg> ~(Z) <~>, g,h=l, ... ,m, and trk(2)Qgk(2)~(2) =
tA-l A
..
A
_ rtA - l I A
Z<Qg> ~(Z)<~(2»' WhlCh lmply that k(Z) - 2~ ~(2)~' ~(2) =
2~ti(~)<~(2»' and the first expression in (2.13) follows. The second
expression can easily be obtained from the first one by noticing that
S[st(wt~-l W)-lS]-lSt= Wt¢-l w_Wt~-l WR_[(WR )t¢-l (WR_)]-l
-
~
-
-(2)~
~
x
~
~ -(2)~
-
_(2)NN~
~§
~(2) NN~
l\\TR-) t¢-l W
......~ ~(2)~
and §~ = Is (see Appendix, Lennna A.4).
o
Below we use the fonn of 1(2) given by the above corollary to
state necessary and sufficient conditions for S-explicit representation
of i(2)'
48
Theorem 2.7 The MLE i(2) of 1 has S-explicit representation if and
only if ~(2) has this representation and whatever the value of B§ as
defined before, the m-s columns of
-1
eigenvectors of 2(2)21
~S
are spanned by exactly m-s
~.
A
Under this condition, 1(2) can be written as
A
A
-1
t -1
-1 tA
1(2),E = 1CO),E ~(2),E§C§ ~(2),E§) C§ lCO) ,E -
ro)
C2.l4)
= M~o+ ~(2),lE(fC2),E- ~(2),E M~o) ,
where iCO),E is the unconstrained ~fiE of I given by (2.6), ~(2),lE =
t
B§(B~(2),EB~)
-1 t
[(tr~g£(2),E)g]
B§'
A
~(2),E = [(tr~g~)gh] and z(2),E =
with £(2),E
=
f(M(2),E
=
~(2),E)
.
Proof As for theorem 2.5, the proof here is a straightforward application of Lemma A.5 (see Appendix) in the second form of 1(2) given by
(2.13).
Notice that the second expression in (2.13) is equal to a
similar expression with !(2) replaced by ~1' for all the values of
- WMS~O' if and only if
<C(2»
tA-l
[(~S) ~(2)(11Es)]
"""'I
,......
-1
tA-l
(~S) ~(2)
("'oJ
=
t -1
-1
t -1
[(11Es) ~I (~S)] (11Es) ~1 .
t"'o.J""""""'"
""
Now, by Lemma A.5 the above equality is true if and only if the m-s
columns of
-1
~(2)~I
~S
~
are linear combinations of exactly m-s eigenvectors of
; hence the theorem follows.
Expression (2.14) is easily
obtained from the expressions in (2.13) with I(2) replaced by
!p'
0
Using the comments that followed theorem 2.5, subsection 2.1.1, we
can conclude that it is sufficient to verify the above necessary and
sufficient condition for S-explicit representation of T(2) for only one
value of
Rs.
Unfortunately, the condition is in terms of Wand
¢
which are
matrices that need to be evaluated and, in general, are large matrices.
49
t
When f(z)' the covariance matrix 1(!) subject to §! = XO' is totally
reducible (see definition 2.3), much simpler
results can be obtained.
First notice that the linear transformation
y~
~1
= "-"1
PY., i=l, ... ,N,
where £ is the orthogonal matrix which diagonalizes f(2)' induces the
linear tra~sformations ~* =m~'
Q(2 ) = diag (dC2 ),1,···,dC2 ),p)'
~(2)£t = L t(2) £§ £t = I T(2) diag(hj(g) , ... ,h*(g)), where
g=l
,g g
g=l
,g
P
d(2 ),k,k=l, ... ,p, is an eigenvalue of f(Z) and hkCg ) is the kth
diagonal element of the diagonal form of §g' and ~Cz) = ~(2)£t
last transformations can be written as <PG pt > = B <Gg > and
~g__
<~(2»
= ~ <~(2»
l"V
The
__
for a suitable nonsingu1ar matrix ~, and !(Q(2))
~!Cf(2))~t (see Anderson 1969, p.61).
=
Clearly the likelihood eauations
C2.ll) and (2.12), and also equations (2.13) and (2.14) are invariant
to the linear transformation, i.e. we can find conditions for explicit
solution to the likelihood equations by considering X* and D(2 ), the
diagonal form of
L(2 )'
instead of
~
and k(2)'
Assuming lwithout loss of generality) that §t = [§i:§~] with
§i
of order sand nonsingu1ar, and k(2) is totally reducible, it can be
shown that £(2) = Cd(2),l, ... ,dC2),P)t, the set of eigenvalues of
·
d* -- H*S-t
H* s-tst)
h
f(2)' can be wrItten
as ~(2)
~l~l !o+ (H*
~2-~lx~1
~2 la' were
!a =
(T +1 , ... ,T m)t, H* = [Hi:Hi] is a pxm matrix whose gth, g=l, ... ,m,
s
column is (h*(g)
= ..-..g~'
PG pt
1
' . . ., h*(g))t
p
, the diagonal elements of G*
~g
Hi
is pxs, and rank (H~-Hi§lt§~) = m-s, i.e. the eigenvalues of f(2)
are given by at least m-s linear combinations of ls+l, ... ,Tm.
so
Theorem 2.8 Let ~(2) be totally reducible, §t=[§i:§~] with §i of order
sand nonsingular,
~*,
Q(2) and
§~
be as above, the problem be
in the canonical form and B(2) have S-explicit representation.
Then,
the ~~E of!, i(2)' has S-explicit representation if and only if the
eigenvalues of k(2) are exactly m-s linear combinations of
Proof As shown before, we can use Q(2)' instead of k(2)' in the search
for conditions for S-explicit representation for i(2).
Also, as
D*(2) is a diagonal matrix, D*( ) = diag(d*
d*) =
2
(2) ,1'" ., (2),p
m
.
\ T
G* with G* = PG pt = diag(h*(g) ... h*(g)) it can be shown
g~l (2)'-g
-g --g1"
P
,
diag(2d(~),1'''· ,2d(~) ,p '2) with ~ a row vector, 2OJ(2 ) ).t?=
diag(d(~),l, ... ,dC~),p' ~), ~* = [< Qi >, •.. ,< ~ >] = [~*t:Q]t and
~*~ = [~*t:Q]t x [-§2§i l :l]t = [(~2-~i§it§2)t:Q]t, because E§ ~~n be
that 2 CQ(2)) =
given by [Ms:B ]
-
s
-
=~Si §~J-l
-0 -I
Here, we want to find conditions where
the columns of W*Rs are spanned by m-s eigenvectors of ~(D(2))~il in
order to apply theorem 2.7.
We know that the eigenvalues of I subject
to the constraint are given by at least m-s linear combinations of
(i)
Assl.UI1e that they are given by exactly m-s linear combinations.
2
2
Then, ~(D*(2))~I-l = diag(8 +l I
, ... ,8 1 ,a), where the 8'5
s Ps+l
m pm
are the distinct eigenvalues, and the matrix ~2- ~i§it§~ has to have
exactly m-s different rows, which implies that
~
51
k(s+l) ••• k(s+l) ••• kCs+l) ••• k(s+l) 0 ••• a t
s+l
s+l
m
m
W*R_
=
...... N-S
......
....
k(m) • • •• kern) ••••
s+l
s+l
o ....•
0
" the jth distinct row of
where (k j(s+l) , ... ,k j(m)"_
), J-s+1, ... ,m, lS
-t t
~~-~r~l ~2· Clearly the m-s columns of ~*Bs are spanned by exactly m-s
eigenvectors of
-1
~cn*(2))~I
...
and by theorem 2.7, the
~~E
A
1(2) of
T
has S-explicit representation.
Cii)
Assume that the elements of Q(2) are given by more than
m-s, let us say m-s+f, linear combinations.
In this case
~CQ(2))~Il = diagC8;+11r
, ... ,8~r' 0ilt , ... ,o~lt ,~), where the
s+l
m
1
f
8's and o's represent the distinct eigenvalues, and the matrix
-t t
~~-~r~l ~2
has to have r different rows,
m-s eigenvectors of
of the new form of
-1
~C~(2)) ~l
~*%
m-s~r~m-s+f.
Clearly more than
are necessary to span the m-s columns
and according to theorem 2. 7, we do not have
0
S-exp1icit representation for r(2).
...
The next corollary shows that the necessary and sufficient
condition stated by theorem 2.8 can never be achieved, unless
!o
=
Q.
Corollary 2.5 The number of distinct eigenvalues of k subject to the
constraint has to be greater than m-s, i.e. the r~E of 1,r(2)' cannot
have S-explicit representation, unless !o
=
Q.
Proof We already know that the set of eigenvalues of k subject to the
"d*
"
* ) t .-constra1nt,
. . . (2)' can b e
WT1tten
as d*
. . . (2) -- (d*(2),1'···' d(2),p
5Z
~i§it~o + (~~-~i§lt§~)la and rank (~~-~i§lt§~) = m-s, i.e. the p
elements of £(Z) have to be given by at least m-s linear combinations
of ls+l, ... ,lm' the elements of lao
Let us assume that they are given
by exactly m-s linear combinations, i.e. £(Z) has exactly m-s distinct
elements.
such that
Then, there exists a [p-(m-s)]xp matrix 8 of rank p-(m-s)
~(Z)
=
Q. For instance, if £(Z) = (8 l ,8 l ,8 l ,8 Z) t , one of
1-1 0
the possible values of 8 is 8 =Lo 1
-t
-t t
<~-> ~i§l ~O + 8(~~-~i§1 §z)la =
0oJ.
-1
-1
Q <-->
~(Z)
Now,
-t
~i§l ~O =
Q and
-t t
8m~-~~§1
§Z) = Q <=> ~i = Q and ~2 = Q (assuming ~O ~ Q)
From the last equality, we can see that
~
~
= Q.
(~)
= m.
Thus,
Q implies that the number of distinct elements of £(Z) has to be
greater than m-s.
<
<--> ~
has to have exactly m-s
distinct rows, which is not possible because rank
Xo
=Q
When Xo = Q, ~(Z) =
Q<
>8(~2-~i§1
-t t
§Z)
=
~
Q
-t t
>~~-~i§l
§Z has exactly m-s distinct rows, which is possible
because its rank is equal to m-s.
0
As an illustration, consider the example described before with
§t = [1 0] and !O =
Q, i.e. II = O. In this case, £(Z)= lz16 and by
theorem Z.8 the
of 1 has S-explicit representation.
~rrE
A second example.
k=
Let
l3
lZ
II lZ
lZ
l3
lZ
T
II lZ l3
T
~z
T
I
lZ
l
and
~
be such that there exists S-explicit
Z
l3
representation for the MLE
It can be shown that there exists
t
orthogonal and independent of !, such that m = diag(l3+ ZT Z+ l l'
of~.
£,
53
13-212+11,13-11,13-11)'
The 3 distinct eigenvalues of
k are
exactly 3
linear combinations of 11 ,1 2 and 1 and by theorem 2.3 the MLE of 1 has
3
S-exp1icit representation. The following constraints on I are
considered:
(i) §t = [1 -1 0] and
!o = 0,
i.e. 11 = 1 2 , and
In (i), 2.Cz) = (1 3+31 2 ,
(ii) st = [1 0 0] and 2':0 = 0, Le. 11=0.
13-12,13-12,13-12)t and there exists S-exp1icit representation for
A
1(2)'
On the other hand, such explicit representations do not exist
in (ii) because in that case 2.(2) = (13+212,13-212,13-12,13-12)t.
It is imPOrtant to notice that, when
tl~
0
has exactly m-s distinct
rows, all constraints of the form §t = [§~:Q] and
lio = Q will provide
S-exp1icitrepresentation for 1(2) given that ~(2) has S-exp1icit
representation.
The same considerations made at the end of subsection 2.1.1,
which are related to S-exp1icit representation for both
for the MLE of .§, can also be made here.
~fi£'s
or just
Also, the idea of "averaging"
(see Szatrowski (1978, 1980)) can be considered here for the MLE of
.§.
Iterative procedures to solve the likelihood equations (2.11) and
(2.12) will be presented later.
2.1.3 Constraint 1~ and §~ = Xo
Here, we want to obtain the MLE's of .§ and 1 when both vectors of
parameters are subject to the constraints already defined in the last
2 subsections.
Clearly the likelihood equations, necessary and sufficient
conditions for S-explicit representation for the MLE's, and other
results can easily be obtained from those results presented in sub-
54
sections 2.1.1 and 2.1.2.
For instance, the likelihood equations for
this case can be written as
(2.15)
and
(2.16)
= M§Yo
+
%(3),1 (i(2) -%(2)t!~O) ,
where g(O) is the unconstrained MLE of ~ given by (2.2) with %(0)
~
A
A
~
t~
-1 t
BS,
A
replaced by k(3)' ~(3) = k(1(3))' ~(3) ,1 = B,.....,....,
s (BSk(3)B,....,s) ,...., ~(3) =
A-I
A-I
A
A-I
~-l A
[(tr~(3)gg~(3)~)gh]' and ~(3) = [(tr~(3)ggk(3)£(3))g]' g,h=l, ... ,m,
with £(3) = £(M(3)= ~(3))'
The matrices M , L~, B and ~ are"aefined
1
1
in the last two subsections. The subscript (3) indicates that the
are being evaluated with the parameters subject to the constraint
t
t
.
tA-l
t tA-l
-1
1 ~ = go and § 1 = 10' Also, the matrIces ~ k(3)~' 1 (~k(3)~) 1,
t tA-l
tA-l
tA
A
E1~ k(3)~1'k(3)' § k(3)§ and E§~(3)B~ are p.d. provided k(3) is p.d.
~~E's
A
(see Appendix, Lemma A.O).
In the next section, we present two iterative procedures to find
~~E's
of
~
and 1 when the likelihood equations already discussed do not
have explicit solution.
2.2 Estimation:
Iterative procedures
In the last section, we introduced the likelihood equations
associated with each different situation of possible constraints on
~
and/or 1.
Special cases where those likelihood equations can be
solved explicitly, were presented.
In this section, we present two
55
iterative procedures, the Method of SCoring and the EM algorithm, to
obtain the HLE' s when the likelihood equations do not have explicit
solution.
2.2.0 The
~~thod
of Scoring
The way we will describe the algoritlnn is different from the way
it is commonly described.
This fonn was first suggested by Anderson
(1973) and it can easily be shown to be equivalent to the traditional
fonn.
The rth, r=1,2, ... , iteration of this procedure is described as
lOt follows'.
f
Wl°th ~(w)'
W
~(r-l) we 0 bt aln
.
= 0 , 1 , 2 , 3 , rep1aced by k(w)
(r)
.@(w) from (2.2), (2.7), (2.11) or (2.15), depending on the situation
Then C(r) = C(ll(r) = x~(r)) and T(Y) from (2.3)
~(w)
~ N(w)
~(w)
~(w)
,
(r)
(r-l)
(2.8), (2.12) or (2.16) with ~(w) replaced by ~(w) and k(w) by k(w)
m
r )) G is the value of I( ) for the next
The value of ~L((r))
=
\L T(e w
w
g~g
~ w
g=l
'
step. The procedure stops lvhen ~(r) - ~(r-l) and T(r) - T(r-l) are
we are studying.
A
h
small.
A
k:(w)
k:(w)
~(w)
~(w)
Convergence of this procedure is not guaranteed.
see that, when the
~{LE's
of
.@ and
~
It is eas)' to
have S-explicit representation, this
method converges in one iteration from any starting value of ~(w)'
Also, later in subsection 2.3.2 we will show that
.@~:~
asymptotically efficient estimates of .@ and 1 provided
and
~~:~
are
k~~~ is a
consistent estimate, i.e. one iteration of this method gives asymptotically efficient estimates provided we start with a consistent
estimate of k(w)'
56
2.2.1 The EM algorithm
Despite the fact that this iterative procedure is commonly used
in the incomplete-data situation, it is possible to generate an artificial incomplete-data situation where the algorithm is applied to
find MLE's of parameters in the original problem.
been described by Andrade and Helms (1983).
Here, we extend their
ideas to situations where the vector of parameters
constraint 1t~ = ~O.
This strategy has
~
is subject to the
The results obtained by Andrade and Helms (1983)
can be viewed as a special case of the ones obtained here with 1 = Q,
R = I and 80 = O.
L
Consider the situations in which y.
=~
XR + ~1
0. + E.,
i=l, ... ,N,
~1
~1
~~
~
~
~
where
with kl =
I = kl + wI· The unknown parameters Yh and the pxp known matrices
R
are defined in the same way as
~
expression which defines k.
L
g and ,G , g=l, ... ,m, in (1.2), the
Also, w is an unknown positive parameter,
and X and 8 are given by (1.1). If we artificially assume that
t t t , i=l, ... ,N, is our "complete data", the likelihood function
(Y.,o.)
1
1
for these new data can be written as
-N/2
kl+wIp kl
kl
-1
exp{(-N/2) [trkl
~l +
kl
(2.17)
57
t
where Y = (Yl,···,Ym ) , Cl
l
~
~
N
(l/N).L (~i-~)t(~i-~)' ~i = Yi-~i· It can be shown that the likeli1=1
hood equations for this artificial "complete data", when ~ is subject
to the constraint 1t~ = ~O' can be written as
~ = (~t~)-l~t~ _ (~t~)-11[1t(~t~)-11]-1[1t(~t~)-1~t~ - ~O]
(2.18)
- M e + R [(XR_) t (XR_) ] -1 (XR ) t (d XM e )
- ~L~O
~L NNL
NNL
~L
- -"""L-O '
~
......
""'-I
......
......
......
(2.19)
and
~ = (1/p)C 2 '
(2.20)
N
\
t
(l/N).L (Yi-~i)' kl = kl(r) with r = (Yl,···,Y ) , and
ml
1=1
Cz = CZ(M=~~). The matrices ~ and
are defined in subsection Z.l.l,
£=
where
A
A
A
A
A
E1
and
~l
is given at (2.17).
N
If the "complete-data" sufficient statistic ( .L\ -1-1
d~d. ,
N
N
1=1
\ Xtd., . \L ""'
o.o~)
• L _ ~1
1""'1 were known, the above likelihood equations could be
1=1
1=1
solved explicitly for ~ and~. Explicit solution for will depend
i
solely upon the structure of kl and not upon
subsection 2.1.1.
~1'
as before in
Of course, the "complete-data" sufficient statistic
described above is not known.
Only Yl, ... ,YN, which in this context
are supposed to be the incomplete data, are observed.
So, we first
"complete" the unobserved "complete data" and then use the above 1ike1ihood equations to obtain the
~~E's
of
~
and !' thepararneters of the
original problem.
The rth, r=1,2, ... , iteration of the EM algorithm has two steps:
58
(i) E step At the rth iteration, the E step "completes" the
artificial "complete data", i. e. it computes the conditional
expectation of the "complete-data" sufficient statistic given:
(1) the observed data Yl' ... 'Y ' and (2) the estimated values of the
N
parameters from the (r-l)th iteration.
are:
I
.
t
E[ ~L ~.~.
y·, ... ,yN,
.1=1 1 1 ~1
""
~
In this context, the equations
(r-l) , Y(r-l) , w(r-1)_
] ""
N
\ ( ._o~r))t( ._o~r)) + Ntry(r)
r1
.L
1= l
E[
~L
. 1
1=·
""I
(2.21)
.-
I - ~I""Xt (v. - ,0.(r) ),
Xt d. # ] ""I
r1
""I
.
L
~1
1=
(2 ° 22)
""I
and
(2.23)
. °
f or " ll'··· ,IN; ,t::j'.«(r-l)
wh ere '#'.IS a typograph 1ca1 a bbrev1at1on
O
y
(r-l)
,
(r-1)" ,
, w
""
(2.24)
with Llr - l ) = L (y(r-l))
~l
""I ""
'
and
y(r) = L(r-l) _ L(r-l) [L(r-l)
.-
""1
""
= w(r-1)[lp
""1
+
+
w(r-l)I ]-l (r-1)
L1
""p
""
w(r-1) (kir-l))-l]-l
(2.25)
(ii) M step At the rth iteration, the tf step evaluates the
values of the parameters which maximize the "complete-data" likelihood function with the sufficient statistics replaced by their
conditional expectations evaluated at the E step.
In this context,
59
the values of the parameters are given by
= (~t~)-l~t~(r) _ (~t~)-11[1t(~t~)-11]-1[1t(~t~)-1~t~(r)_~O]
= ULQo+EL[(~)t(XBL)]-l(~~)t(Q(r)_~ ~O)
--
y
(r)
,..,.,
= [(tr~(r)
1
,....,
-1
,..,.,
(2.26)
,....,
-1·
-1-1
H..)
(l[(tr~(r) H ~(r) C(r))]
-h 1
--h' hh'
1
h 1
1 h
H. ~(r)
(2.27)
h,h' = l, •.. ,m , and
l
(2.28)
C(r) =
where a(r) is the arithmetic mean of d~r) = y._o~r)
,..,.,
""""I
,....,1 """"1
'''''''1
N
t
N
y(r) + (l/N).I ~ir)~ir) , cir) = (l/N).I (Qi r ) _ ~(r))t(Q~r)_ ~(r))+
1=1
1=1
tr y(r) and k~r) = kl(r(r)). ~fr) and y(r) are given by l2.24) and
r ), X(r) and w(r) 'oJill be the value of the
(2.25) respectively.
i
parameters for the next Estep.
Notice that one should have kl such that equation (2.27) can be
solved explicitly.
Otherwise, an iterative technique, for instance
the Method of Scoring, to solve (2.27) at each iteration of the 91
algorithm is necessary.
This would render the entire method unusable.
When kl is such that !(r) (and thus kir)) has S-explicit representa~
tion, then !(r) is obtained from (2.27) with ~ir) replaced by
!p.
Also, notice that L(](r), l(r)) ~ L(](r-l), l(r-l)), 'vhere L(],l)
IS
the likelihood function for the original problem given by (2.1), a
nice property of the EM algorithm.
In some situations, for instance the example given below, the
problem is overparameterized;
y
and ware not separately identifiable.
~
However, the parameter being estimated is 1, which is identifiable.
The practical consequence is that one should check for convergence of
60
l(r) rather than separately checking for convergence of r(r) and w(r) .
As an illustration, consider the first example presented in subsection 2.1.1 with 1t = [1 0 0 1 -1 0] and
k = I2 l
~
g=l g g
~ ~
with Q1 = 12 ~ 1 10 1
1 1 0
and g2
~O
= O.
= !6'
In this example,
.
and!
= ~I3
0
"'"
1~
I .
""'3
So, if we let m1=2, tl1=Q1' tl2=~2' Y1=ll and Y2+w = l2' k can be
written as k = k1 + wI and X(r) has S-exp1icit representation because
k1 has the same structure as k (see theorem 2.3). Here, the values
of the parameters ~(r) and !(r) are given by
((I(r) -2d(r.) +d(r)) I~
1
4
S
I
( _(I(r)+2(I(r)-a(r))/2
1
2
S
i
r )=
where a~r), j=1, ... ,6, is the jth
J
(f(r) _(fer)
3
6
a(r)
4
((I( r ) +a( r) ) 12
1
S
(f(r)
6
element of a(r) , and
c(r) + c(r) + c(r) + c(r) + c lr ) + c(r)
12
13
23
4S
46
S6
y(r) = (1/6)
6
I d~)
j=l JJ
j,k=1, ... ,6, is the (jk)th element of ~ir).
(2.28).
Also, w(r) is given by
As said before, at the rth iteration we check convergence for
~(r) and r(r) = (yi r ) ,y~r)+w(r)).
61
In other situations, there exists a natural way of splitting 1 in
two components.
In these situations, for instance the mixed model of
analysis of variance (e.g. Anderson (1973), Szatrowski and Miller
(1980), and Laird (1982)), the random vectors Xi' i=l, ... ,N, can be
written as
Yt=M +Mt
+fi' where ~J
-NIDp+q[~l,[~t w~
m·
k, is a known pxq matrix of rank q~p, ki = L1Yh~' Qi =
Mi
and
.
h=l
Ll = Var(o.)
= ZLl*zt. Also, y].1 and Hh* are as l g and """g
G in (1.2).
"""'1"""'''''''' ,....,
Therefore, using the results obtained before and considering
row
(""oJ
•
d. = y.-o.=Y.-ZM, the rth, r=1,2, ...., iteration of the algorithm can
~1
~1
~1
~1
~1
be described as:
(i) E step Evaluates (2.21) and (2.22) with o~r) = Zo~(r) and
""1
N
""""1
N
E[ \ o~o~tl#] = NV*(r) + \ o~(r)o~(r)t where
.L ""1""1
""
.L ""1
""1
'
1=1
1=1
o~(r)
""1
= r*(r-l)zt[ZL*(r-l)zt + w(r-l)I ][ ._XA(r)]
""1
"" ~""l
""
""'P
2:1 ~
= [k,tk, + w(r-l) (ki(r-l))-l]-l[2:i-!§(r-l)]
with r*(r-l) = r*(y(r-l))
""1
""1 ""
'
and
v*(r) = L*(r-1)_ r*(r-l)zt[zt r *(r-l)zt+w(r-1)I ]-l zr*(r-1)
~
~1
~1
"" ~ ""1
~
~
~""1
= w(r-1) [k,tk,+w(r-l) (ki(r-1))-1]-1
(ii) M step Using the conditional expectations obtained above,
it evaluates ](r) from (2.26), w(r) from (2.28) and r(r) from (2.27)
modified by adding a superscript asterisk to each instance of 1~r) ,
~,
_
h-1, ... ,ml , and
(r)
~l
,and
62
In general, the matrix
ki
is such that y(r) has S-explicit representa-
tion and it is given by
y(r) = [(trH*R*)
]-l[(trR*C*(r))]
~hNh' hh'
Nh~l
h
as for instance, when
1
~
k~
has the most general pattern or it is equal to
k where k has the most general pattern.
We finish this chapter presenting some asymptotic results for the
MLE's of ] and 1 discussed in section 2.1.
2.3 Asymptotic distribution and efficient estimates
Below, we present two well known results on asymptotic distributions that will play an important role in obtaining most of asymptotic
results.
The second result can be found in Bishop, Fienberg and
Holland (1975 - theorem 14.6.2).
Theorem 2.9
(Szatrowski (1979)).
N~[IYJj
-[[~J]
L<~>
<'pp
distributed according to N(Q, diag(I,!), where yand
is asymptotically
~
are given at
(1.6), and < > and! by definitions 1.3 and 2.2 respectively.
Theorem 2.10
(Standard Delta Method).
random vector and
lim
N-+oo
!,; "
L(N2C~N-ft))
~
Let ~N be at-dimensional
a t-dimensional vector of parameters such that
= NCQ,ICft)). Also, let i be an u-dimensional
function defined on an open subset of a t-dimensional space, with a
differential at ft, with the expansion given by
0
~
63
Then,
lim
N-+oo
L(~(i(§N)-f(~))) = NCQ,(af/a~)Ic~)caf/a~)t).
o
In this section, the u-dimensional function f is given by
f( )C(yt,<A>t)t)
= c~tc
),~t(
))t, w = 0,1,2,3.
~
~
~w
~w
~w
and
IC~)
Notice that u = r+m
= diag(1,2)·
Definition 2.4
The partial derivative of f = (fl, ... ,fu)t in relation
to ~ = Cel, ... ,et)t is defined as
.2f/d~ =
, a uxt matrix.
af lae
u
1
•••
dimension 1, i.e. u=l, f=f and
When
af lae
u
t
af/a~
= [af/ael, ... ,af/aetJ, a lxt row
vector.
0
For instance, for fCw)
have
af
~w
C
)/au =
~
, where, for
instance
with
i has
<~>jk'
j
$
k=l,.",p, the (jk)th element of
<~>,
64
The above theorems provide us the proper results for evaluating
the asymptotic joint distribution of the MLE's of
~
and
r.
First,
notice that the likelihood equations obtained in section 2.1 and given
by (2. 2) and (2. 3), (2. 7) and (2. 8), (2 .11) and (2 .12), and (2.1 5) and
(2.16) can be summarized as
(2.29)
and
T
-(w)
=M
v
~§~O
+
Z
~(w),l
(z-(w) - Z
My)
-(w)~~O
(2.30)
'
A-I
D
- ~(w)~
D-1 L(L~t~-l
D
L) -1 -Lt ~(w)
D-1
~(w)
~(w)A
=
~(w) = [(trf(~)§g%(~)§h)gh]
~
-1 t
t~
k(w),l
= B§e~ew)~) ~
A-I A
~-1
~
z
~ew)
=
A
= 1, ... ,m,
, g,h
.
[(trW( w)G L( w)C( w))]
g
~
~g~
_A
t
_'"
A
_
WIth C( w) - A +
~
~
A.
~
A
Be w)
~
A
~(w) = (Y-~(w))(Y-~(w)) , ~(w) = ~](w) and A given at (1.6).
AI so, M;6
=
Q, 8;6
=
4,
k
=
Q, ~
=
a and
!2. 0 = Q when w = 0, 2
(lll1constrained situations for ]), and they are as defined in subsection
2.1.1 when
and 10
w =
1,3.
= Q when
On the other hand, M§ = Q, ~ =
lm,
§ = Q, s
=
0
w = 0,1 (lll1constrained situations for ;r), and they are
as defined in subsection 2.1.2 when w
= 2,3.
65
2.3.0 Asymptotic distribution
Here, we present the asymptotic joint distribution of B(w) and
l(w) , w = 0,1,2,3, the MLE's of ~ and 1 discussed in section 2.1.
Theorem 2.11 The asymptotic j oint distribution of the HLE' s of .§ and
1 evaluated at the true parameter value
1*
(Y,6)
= (};!.*
,k.*) , where .k!.* and
need not be patterned, is given by
lim L [J2[
N ~
[~(W) ]
00
,[Yew) ,1 ~ (w) '2J
t
lew)
v~(w), 2
,(2.31 )
~(w),3
w = 0,1,2,3, where
v~(w),l
•
=
D
XtT
XD
-2[K~(w)~(w),l
Z
(I -Z
F )-tQt
~ ~(w),l~(w)
(w)
~(w),l~ ~(w)~(w),l
+ Q (I -Z
F )-lZ
Kt · ]+2Q V
Qt
(w) ~ -(w),l~(w)
~(w),l~(w)
(w)~(w),3 (w)
is a rxr variance-covariance matrix related to the elements of g(w) ,
V
= 2K~Cw)~(w),l
Z
(I~ -F~(w) ) -t -2Q Cw)~(w),3
V
~(w),2
is a rxm matrix related to the covariance between the elements of ~(w)
and lCw) , and
7
V
= 2(1~ -Z~(w),l~(w)
F )-1 ~(w),l
(2H +J)Z
(I -Z
F )-t
~(w),3
~(w) ~(w) ~(w),l ~ ~(w),l~(w)
is an mxm variance-covariance matrix related to the elements of
i Cw)·
DCw ) ,1 and ZCw),l are the values of D(w) ,1 and ZCw) ,1 respectively,
given at (2.29) and C2.30) with ~(w) replaced by ~(w) and ~(w) by
~Cw)'
66
T
~(w)
= ~(w)~
z:-l Z:*L- l
~(w)
K
- D
X!.r
[( t
)]t u t
~(w) - ~(w),l~ ~(w) ~(w),g g '-(w),g
= (g*_g
(w)
)tL-1 G
~(w)~g
x
l XD
(I~ _LXt ) where ( ) g represents the gth row of the matrix
~(w)~(w),l~
[ ( )gl ,
-1
t
:t
~h~(w)~~(w)~ )) gh] , ~Cw) = (~* -~(w) )(~* -~(w))
Q(w)
= [((
G L- l XD
)]t
g* -gCw) )tz:-1
~(w)~g~(w)~(w),l g
,
,
H
= [cut
T ut
) ] and ~(w)
J
= [(trT~(w) G
T G) ] '
~Cw)
-(w),g~(w)~(w),h gh
~g~(w)~h gh
g ,h=l,
... ,m.
Also, ~Cw) and lew) are the "MLE' s" of .§ and 1 obtained
from (2.29) and (2.30) with ¥ replaced by g* and A by
~*.
Notice that
m
Q(w)
(and thus g(w) = ~(w)) and lew) (and thus ~Cw) = g~lT(w),g~g) are
not statistics.
Proof The proof here is an application of theorems 2.9 and 2.10 with
-.t
t t
A
At
At t
. .
A
QN = CX , < 6 »
and few) (QN) = C8(w)' lCw)) . CombInIng those two
theorems we can see that the variance of the asymptotic joint distribution, evaluated at the true parameter value (¥,6) =
Cg*.~*),
is
given by
Y(w) ,1 =
caL(w/a¥)~*(a~Cw/ay)t+ca~(w/a <A»~*(af(w/a <6»t ,
Yew) ,2 =
(a~(w/ay)k.* (al(w)!ay) t+(a~(w)/a <6 >)~* (arCw/ a <6 » t and
Yew) ,3 = cai(w/aY)E* (ai(w/a~ t+(a!(w/ a
<
~ >)!* (ai (w/a < ~ » t ,
e
67
where the partial derivatives are evaluated at
2* = 2(1*)·
Cy,~)
= CM*,1*) and
The first step is then to evaluate the partial deriva-
tives.
Let U be an element of Cyt,
< f'j >t) t,
then applying Lemmas A.l
and A.2 Csee Appendix) we have that
(1)
A-I
mIl
a~Cw)/au = -5=1
L (aT()
G fw,s lau)f~(w)~s~Cw)
(2)
aD
~(w),l
m
lau = \ caT
s~l
(w),s
Xtf-l G f- l x6
laU)D
~(w),l~ ~(w)~s~(w)~(w),l
'
(5)
(6)
a~(w)/au = a~au + a~Cw)/au
(7)
a~(w)/au = [(tr%(~)~g%(~)(a~/aU))g]
m
and
+
"1
2[(tr%(:)~g%(:)3Cw) ,l)g]
AlB
A-I
A-I
+2S~1 (aT(w),s/aU)[(tr1(w)~g1(w)(~(w)~(w)8s~(w)
A
XD
Xt
~Cw),l~
-
G f- l
x
e ))]g
~s~Cw)~(w)
Using Cl) - (7) it can be shown that the partial derivatives of the
MLE' 5 with respect to one element of cxt, < ~ >t) t, evaluated at the
true parameter value
Cy,~)
=
(M*,~*),
can be written as
68
(a~(w)/aU)
(8)
I
= £(w)(aY/au)I
(~* ,1*)
= ~D(w) ,1~xt~-l
where ~P(til)
~ (w)
(~* ,I*)
- Q( )(a1( )/au) 1
W
W
(g* ,I*)
and
I
where ID( ) = [(tr1(-1 )§ ~(-1 ) ((a6/aU)
)) ] +2 [C~(t )
w
w g w
(g * ,~L:*) g
w ,g
x ~-(1)((aY/aU)
w
I
(g*,~*)
)) ],
g
g=l, ... ,m.
Equation (8) can be rewritten
as
CIa)
(a~( w)/aU) I( * L:*) = £( w) (aYlaU) I( * L:*) - Q( w) (I-m -k( w,
) 1E( ))-1
w
g ,~
g ,~
x
Z
~(w)
m
, l~ (w)
.
Therefore, using the fact that ay/au
= ~l
e. if U = Y.,
,!,j
1
U = <6>jk' and a6/au =
J.
:$
a if
a if U = Yi , Jjk if U = <6>jk' i=l, ... ,r,
k = 1, ... ,p,
where e. is a rx1 column vector with 1 in the ith
.
~l
position and a elsewhere, and Jjk is a pxp matrix 'llith 1 in the jk and
kjth positions and 0 elsewhere, we obtain
(11)
iS laY" I
(a~(w)
~)
Cg* ,1*)
(12)
(13)
= ~(w)
P
-Q (w) (aT~(w) laY"
~)
I
I(g* ,1*)
(a~C )/a< 8 »
= -Q( ) (ai( )Ia< 8 »
w
w
w
(g*,1*)
(aT
~(w)
I
laY"
=
~) (l!*,1*)
MCw),! =
t
-1
(I
-z
I l!* L:*)
,~
F ) -l z
M
~(w),l~(w),Y
-m ~(w),l~(w)
2[(~(w),g1(w))g] ,
C
g=l, ... ,m, and
with
'
69
C14 )
The
calc
)/a<A»1
=
,- w
.CM*,l;*)
theor~n
(I
"1Tl
-z~Cw),l~Cw)
F
)-l Z
M
~Cw),1~Cw),<6>
follows by applying (11) - (14) in the expressions for
Y(w),l' Y(w),2 and YCw),3 given before, and also by noticing that
t
t
(16)
H
-L:*M
-= 4[(u
T u
) ] - 4H~(w) ,
~Cw),X ~ ~Cw),X
~Cw),g~(w)~(w),h gh -
(18)
M
~*Mt
"'(w), < 6 >
"'(w) , < 6 >'"
-1
= 4Wt~-1
~*~-l W= 4[«G >t~~l
'" ~(w)", ~(w)~
"'g "'lw)
x
= 2[(trT(
'" w)G T( w)G'"h) gh] = 2J()
'" w (see Appendix,
x ~*~(
"'''' w)<G'"h » gh]
~g'"
o
Lerrona A. 7) .
The above theorem gives the asymptotic joint distribution of the
MLE's evaluated at the true parameter value (M*,l;*) which can be either
included or not in the parametrization under which the MLE's were
derived.
In fact, M* and l;* do not need to be patterned, i.e. do not
have to have the structures defined by (1.1) and (1.2).
When
(M*,l;*) is included in the reparametrization, we have that
(MCw),l;(w))
= (M*,l;*) and the so called asymptotic joint null distri-
bution of the HLE's.
have that 11*
~
For instance, for both
= X~*
with
~
Lt~*
~ ~
~
m
and
r
constrained we
= "'0
e and ~L:* = \L T*G
with ~st ~T* = y
.
g~g
~O
g=l
70
Replacing
y by
~~* and
m
8 by I
g=l
T~~g in C2.l5) and C2.l6) we obtain
~(3) = ~* and 1(3) = 1*, i.e. Cg(3),kC3)) = Cg*,k*)·
When Cg*'k*) is
not in the parametrization, we have the so called asymptotic joint
nonnull distribution of the MLE's.
This later distribution is useful
for constructing confidence intervals.
The following corollary shows how the expressions for the variance
in theorem 2.11 are simplified where Cg*'k*) is included in the
parametrization.
Corollary 2.6 The asymptotic joint null distribution of the
~ and I is given by C2.3l) with YCw),l = RCw),l' YCw ),2 =
~~E's
of
Q and
YCw ),3 = 2kCw),1 .
e
Proof As shown above, when Cg*'k*) is in the parametrization,
-1
CgCw)'kCw)) = Cg*'k*) and thus, l Cw ) = kCw)' ~Cw) = Q, fCw) = Q,
QCw ) = Q, tl Cw ) = Q and ~Cw) = kCw)· The result follows by using these
equalities in theorem 2.11.
0
Notice that, when the parameters are not constrained, w = 0,
t -1
-1
-1
-1
-1
-1
QCw),l = Rca) = C~ kCO)~)
and tCO),l = ,tCO) =[Ctr.t(O)§g.tCO)~)gh]'
g,h=l, ... ,m.
Following, we extend the result obtained by Anderson (1973) on
asymptotic efficient estimates.
2.3.1
Efficient estimates
In this subsection, we extend the result by Anderson (1973)
obtaining asymptotically efficient estimates of
~
and ! from one
71
iteration of the Method of Scoring described in subsection 2.2.0.
The
parameters mayor may not be subject to constraint and equations
(2.29) and (2.30) are going to be considered here as a summary of all
the likelihood equations obtained before.
The basic idea is to show that the estimate of
~,
obtained from
(2.29) with I(w) replaced by a consistent estimate of ~, is an
asymptotically efficient estimate of
~
in the sense of attaining the
Cramer-Rao lower bound for the covariance matrix of unbiased estimates.
Clearly it may not be the asymptotically most efficient estimate of
~.
For more details on asymptotic efficiency see, e.g. Cox and Hincley
The same idea can be extended for 1.
(1976 - p.304).
1
First notice that Nlim L(N~(~(w),N - ~(w)))
+00
= N(Q,2(w),1)'
where ~(w),N is the solution of (2.29) with I(w) replaced by k(w)·
This result is true even when the underlying distribution is not
normal.
Here,
~(w)
and k(w) are values of the parameters
~
and
~
subject to the constraints, if any.
Theorem 2.12
Let Yl' ... 'YN be identically and independently distri-
buted with mean
~
and covariance
(1.2) respectively.
~(w).
~,
which are defined by (1.1) and
Also, let I(w),N be a consistent estimate of
Then, if §Cw),N is the solution of (2.29) with ~(w) replaced by
k
A
A
k(w),N' Nlim L(N2(~(w),N - ~(w))) = N(Q,2(w),1)·
+00
"
If ~(w),N is
asymptotically efficient, so is ~(w),N (in the same sense).
Proof
72
_
A.
~
tA-1
- N {n11fto+Q(w),N1~ k(w),N(Y-~kftO)-.@(W)]-[~1fto
+D~(w),l~Xt ~(w)
t- 1 (Y-XM
e ) _R ]} ( where ~(w),Nl
D
. D
~ ~k~O ~(w)
IS ~(w) ,1
with k(w) replaced by ~(w),N)
=
(D~(w),Nl
Xt f-1
converges stochastically to
t -1
- ~(w),l~
D
Xt ~(w)
t- 1 )N~(Y-XR
)
~ ~(w)
~ ~(w),N
Q because p lim ~(w) Nl~tf(-:) N =
N-+oo'
A-I
,
,~
Q(w),l~ k(w)(k(w),N is a consistent estimate of k(w)) and N2(Y-~(W))
has a limiting distribution.
Notice that
matrices M
L,
BL and ~L are defined in subsection 2.1.1. From the
stOChastic~con~ergenc;wecan conclude that ~(~(W),N-~(W)) and
N~(~(w),N- ~(w)) have the same limiting distribution and if ~(w),N is
asymptotically efficient, then ~(w),N is (in the same sense).
0
In this work, the underlying distribution is the normal distribution and therefore ~(w),N is asymptotically efficient because it is
maximum likelihood estimate.
In other words, equation (2.29) with
"-
k(w) replaced by a consistent estimate of k(w) gives us an asymptotically efficient estimate of
~(w).
Theorem 1 in Anderson (1973) can be
obtained from the above theorem with w =
a
(unconstrained case).
73
Similar results can be obtained for T by writing equation (2.30)
in the same form as equation (2.29), i.e. by writing %(w)' Z(w),l and
i Cw )
as functions of !(w) and ~ instead of %Cw) and the gls.
See
Corollary 2.4.
Following Anderson (1973), initial estimates
(0)
(0)
.
.
T(w),l, ... ,TCw),m can be obtaIned from (2.30) wIth
y and LCw )
replaced by
by any positive definite matrix that satisfies the constraint
on 1, if any.
LCw )
~
~(w)
Then,
~~~~ and T~~~ obtained from (2.29) and (2.30) with
m
replaced by
k~~~ = gI1T~~~,ggg
are asymptotically efficient.
The next chapter deals with the hypothesis testing problem.
Likelihood ratio tests are generated to test hypothesis related to the
constraints defined before.
rnAPTER III
CCMPLETE DATA:
ONE -POPULATION CASE.
LIKELIHOOD RATIO TESTS AND ASYMPTanC DISTRIBUTIONS
3.0
Introduction
Three important hypothesis testing problems are considered in this
chapter.
They are related to the three tyPes of constraints discussed
in the previous chapter.
For each one, a likelihood ratio (LR) test
is developed and its asymptotic null and nonnull distributions are
presented.
The nonnull distribution is obtained under "local"
alternatives.
As already presented in Chapter I (subsection 1.1.1) the LR
Statistic is defined as
A=
sup L(~)/ sup L(~)
~
E
W
~
E
(3.1)
n
where L is the likelihood function, n is the parameter space and w is
the region in the parameter space specified by HO' the null hypothesis.
t
t t
A
At
At
t
Also, for ~ = (i ,r) and ~ = (~(O),r(O)) , the MLE's of ~ and r
presented in Chapter II (subsection 2.1.0), we have from corollary 2.6
with w =
0
lim
N+oo
M
A
L(v~(~-~)) =
N(Q,le-1 )
~
(3.2)
'IS
=
where I (Q)
fit~p'-l~
L.-
~[(trL
/'OJ
J'
-1 Q -1
GL G
/'OJg'"
,. . ,h) gh]
and from theorem 2.1 0
lim L(IN(h(~)-h(Q)) = N(Q, H(Q)I-l(Q)Ht(Q)),
(3.3)
N-+oo
where h(Q) is a q-dimensional function related to the constraints, as
for instance h(Q) = [1t Q] - 80 when only ~ is constrained, and
H(Q)= ah(Q)/aQ of rank q. Therefore, following Serfling (1980, section
4.4) we can say that under h(Q) = Q, i.e. under the null hypothesis,
lim L(Nht(~)[H(Q)I-l(Q)Ht(Q)]-lh(Q)) = x2 ,
q
N-+oo
2
a central X with q degrees of freedom, and under h(Q) =
(3.4)
~
Q+N2~,~
a
.,
fixed qXl vector, i. e. under a sequence of "local" alternatives,
(3.5)
a noncentral X2 with q degrees of freedom and parameter of noncentrality
equal to
~
t
[H(Q)1 -1 (Q)H t (Q)] -1 ~. The above results will play an
important role when obtaining asymptotic distributions for the LR
statistic.
All the tests are based on a random sample, Yl , ... ,YN, of size N
from a p-variate normal distribution with mean vector ~ and covariance
matrix
~
described by (1.1) and (1.2).
76
3.1
Testing
HO,(l)~ 1~ = ~O
Here, we want to construct the LR test to test the null hypothesis
t
t
r
HO,(l): 1 ~ = ~O vs. Ha ,(l): 1 ~ ~O' where 1 and ~O are defined in
subsection 2.1.1. For this situation, the parameter space n is the
~, ~
space
where
~
is given by (1.1) and
~
by (1.2) with m specified.
w is the region in this space with ~ satisfying the constraint 1t~=20.
Theorem 3.1 The LR statistic to test the null hypothesis HO,(l) vs.
Ha,(l) is given by
(3.6)
where %(0) is the MLE of ~ discussed in subsection 2.1.0 and f(l) is
the MLE of
~
discussed in subsection 2.1.1.
The null hypothesis is
rejected if 1..(1) is "too small".
Proof
As said before, we need to evaluate Sup L when the parameters
belong to
n and
w.
Notice that L is the likelihood function given by
(2.1) .
(i)
\Vhen
(~,~)
exp{ -Np/2} because
n, we have Sup L = (2TI) -Np/2 I~(o) ,-N/2
A-I A
A
€
tr~(O)~(O)
= p.
.
x
This last equality comes from the
.
A-I
A-I A
fact that OL/OT g = O,g=l, ... ,m, ImplIes tr~(O)Qg~(O)~(O) =
trf(~)gg ~ trf(~)(gIli(o),~g) x ~(~)£(o) = trf(~)(gIli(o),ggg) ~>
trk(~)%(o)f(~)~(o)
(ii)
=
\Vhen (};I. , D
exp{-Np/2} because
trf(~)f(o)
€
=>
tr%(~)~(O)
= trIp = p.
w, we have Sup L = (2 TI)
A-I A
tr~(I)~(I)
-Np/2
Ik lI) I -N/2
A
x
= p (see above).
The result follows using (i) and (ii) in the expression (3.1).
0
77
In general, the exact distribution of A(l) is not known or it is
difficult to evaluate.
Asymptotic results, which follow, are then
necessary.
Theorem 3.2.
Under certain regularity conditions that are satisfied
here and also under the null hypothesis HO,(I): 1t~-fto = Q
lim
N+oo
L(-2Iog A(l)) = x~
(3.7)
The null hypothesis HO,(I) is rejected when -2log A(l) is "too large".
Proof This is a straightforward application of the theorem given by
Serfling (1980, p. 158) and expression (3.4) with
h(~)
= 1 t ~-~o' tl(ft) = [1t Q] and q = £. From that theorem we can see
that under HO,(I)
lim
L(-2log A(l)) = lim L(Nht(~)[tl(ft)I-I(ft)tlt(g)]-lh(§))
N+oo
N+oo
o
and the result follows from (3.4).
Theorem 3.3 Under certain regularity conditions that are satisfied here
and also under a sequence of "local" alternatives Ha,(l)N: 1t~-.§0
=
-k
N 2Q1 , Ql a fixed £xl vector,
(3.8)
a noncentral
x2 with
£ degrees of freedom and noncentrality parameter
~t[Lt-XtL-IX)-lL]-l~ .
a1 = ~l
~ l~ ~ ~
~
~l
78
Proof This is an application of the concept of contiguity (Hajek and
v
Sidak (1967, Chapter VI - section 1)) and expression (3.5) with
h(~) = 1t~-~0 ' B(~) = [1t Q] and q = ~.
Let
~
be the likelihood function given by (2.1) subject to
a,(l)N
t
-1.:
t
1 ~-~o = N 2Ql and ~ be (2.1) subject to 1 B-~o = Q. Then,
o
T~ = log 1ia ,(l)N 11io,(1) = -~(~Ql)t~-1(~Ql)+(~1Ql)t~-1[/N(~i~lki)]
where k.i
=
Xi-~LQO - ~LQ; , Q;
=
""-I
f'.j
1!~~ , and M
L, BL and 1!L are defined in
'"
,...."
I"'oJ
'"
As under the null hypothesis """'1
Z.
subsection 2.1.1.
~
NIDp (O,~),
we have
,. ,." '"
N
lim L(IN(~ I k.i)) = N(Q'k) (applying the central limit theorem)
i=l
N~
and lim L(T A) = N(- (~)o 2, 0 2) with 0 2 = (~LQl) tk.-1 (~LQl), i. e.
that
00
N~oo
~
a, (l)N
"',...."
~
is contiguous to
corollary).
~
v
0
(Hajek and Sidak (1967, p. 204/
~
Therefore, as
is contiguous to
1
alternatives
lim
N
~
L(-2log
00
A
)= lim
(1)
N ~ 00
~
, under the "local"
0
L(Nht (8) [H(8)I- 1 (8)Ht (8)]-1 x
~
-
~ ~
-
~ ~
-
b(Q) and the result follows from (3.5).
3.2 Testing
o
~,(2)~~=Xa
Unlike the previous section, in this section we test hypothesis
on 1 and not on .§ .
t
HO, (2): ~ 1=2:0 and
2:0' where § and 2:0 are defined in subsection 2.1.2.
The null and alternative hyPOtheses are:
Ha :(2): §t1 r
Here, n is the space
~,k.,
where
~
is given by (1.1) and k. by (1.2) with
m specified, and w is the region in this space with 1 satisfying the
constraint §t1=Xo'
one in section 3.1.
Notice that the parameter space
n is
the same as the
79
Theorem 3.4
The LR statistic to test HO,(2) vs Ha ,(2) is given by
(3.12)
where ~(O) is the MLE of ~ discussed in subsection Z.l.O,
fez)
is the
MLE of ~ discussed in subsection Z.l.Z and
A
t tA-l
-1 tA-l A
.
A
A
V(Z) = rO(§ k(Z)§) C§ ~(Z)~(2) -XO) wlth ~(2) and ~(2) given at (Z. 30).
The null hypothesis is rejected if A(2) is "too small".
Proof As in the last theorem, we want to maximize the likelihood
function
1(~,1)
(i) (~'k) E n, and (ii) (~'k)
Ik(O)I- N/ Z exp{-Np/2}. In the second
given by (2.1) when:
For (i), Sup L = (2n)-Np/2
case, Sup L = (2n)
-Np/2
x
x
1~(2)
I -N/Z
A
exp{-(N/2) (P+Y(2))}'
E
w.
This last
equality comes from previous results obtained in the proof for theorem
A-I A
t
In that proof, (2) = 0, g=l, ... ,m --> trk(Z)~(2) = p+(2/N)!Oa(Z),
tA-l
-1
tA-l A
.
where a(2) = (N/2)(S k(2)§) x (§ k(Z)~(2)-rO); a result obtalned by
2.6.
solving (4).
The result follows using (i) and (ii) in (3.1), the
o
expression for the LR statistics.
The LR test given by Anderson (1970) to test whether the covari-
ance matrix is given by (1.2) with m specified, and the extension for this
test when M=
theorem.
~~
given in Chapter I, are special cases of the above
In both cases
fn = f(o)
is obtained explicitly (see subsection
1.1.1) .
In general, the exact distribution of A(2) is not known, or it is
difficult to be determined, and asymptotic results are necessary.
next two theorems are on these asymptotic results.
The
80
Theorem 3.5 Under certain regularity conditions that are satisfied here
and also under the null hypothesis HO,(2): §t1 - Xo = Q
2
lim L(-2log A(2)) = x .
(3.13)
s
N-+oo
The null hypothesis HO,(2) is rejected when -2log A(2) is "too large".
Proof It is exactly as the proof for theorem 3.2 with
tl(Q)
h(g)=
§t1 - Xo
'
= [Q §t] and q = s.
0
Theorem 3.6 Under certain regularity conditions that are satisfied
here and also tmder a sequence of "local" alternatives
Ha , (2)N:
t
-!.:
§ 1-2:0 = N 2~2' ~2 a fixed sxl vector,
(3.14)
a noncentral x2 with s degrees of freedom and noncentrality parameter
aZ
t t
-1
-1
-1-1
= 2~2{S
[(trr G r Nh
~) gh] S} ~Z'
~
~
~
~g~
~
~
Proof It follows the same ideas as the proof for theorem 3.3.
All we
~
need to show is that
, the likelihood ftmction given by (2.1)
a,(2)N
subject to §t1 -10 = N-~k2' is contiguous to ~
, the likelihood
0, (2)
ftmction subject to ~t!-ro
Let
ktI
= Q.
be the covariance matrix
k
subject to
~
t
1-XO
= N-!.:2~2
and
a
ktI
o
be k subject to
t
~
1-XO = Q.
Then kH = 4I
a
+
-~.
N
~,
where M is a
0
symmetric pxp matrix whose elements are ftmction of the elements of §,
the Q's and
~Z'
Also,
81
= N-~tr(1HlMO
o
- (~)N-ltr(1Hl~2 + (1/3)N- 3/ 2q ,
0
.
(see Appendix, Lemma A.9 with ~ = kH and ~ = N-~) where Q is a pxp
o
matrix with finite eigenvalues and q <
00.
Combining the above results
we can see that
T
= log T__
/
1~a,(2)N
~0,(2)
=
(-N/2)log
l~l~
I - (~) ¥~~(~~l_~~l)k'
·-n6~a
i=l 1 ·~a ·~O 1
82
N
N-~[N-1
.2
1=1
k~Q ki] converges stochastically to zero t
-1 ~ t
-1 2-1
N
l ki[e~ ~ ~H ]k- converges stochastically to its expected value
i=l
0
0 1
tr(~lM)
o
and lim
N + 00
N(Ot 2tr(~lM)2).
o
L(~{N-1 I k~[ek~l~~l]k. - tr(~lM)})
i=l
0
1
0
0
1
=
From Hajek and ~idak (1967, p. 204/coro11ary)
r"
is contiguous to r"
and it follows that the asymptotic
a t (2)t N
Ot(2)
distribution of -21og 1.(2) under the sequence of "local" alternatives
is given by (3.5) with beg) = ~t!-Xo'~(~)
= [Q
§t] and q
= s.
0
This last section deals with the problem of testing
. t _
t _
. t
t ,J
HOt (3)· b ~ - ~O and §! - 10 versus Hat (3)· 1 ~ f ~O and/or § Lr 10'
where ~ and ~, and ~ and
respectively.
~,~
where
~
Xo
are defined in subsections 2.1.1 and 2.1.2
For this situation, the parameter space n is the space
is given by (1.1) and k. by (1.2) with rn specified.
region w is the region in n with
straints.
Notice that
n is
~
The
and 1 satisfying the above con-
the same as in the last
t~~
section.
Be10w t we present the LR statistic to test HO,(3) and its asymptotic null and nonnu11 distributions. The proofs for these theorems are
extensions of the proofs given in sections 3.1 and 3.2, and are omitted
t
here.
~~J
For the asymptotic distributions we have that h(e)
·
H(e) =
~t iJ
and q =
~+s.
-_ f1t .J:1 s
~
e
83
Theorem 3.7 The LR statistic to test HO,(3) vs. Ha ,(3) is given by
(3.15)
ECO )
and f (3 ) are the MLE's of k discussed in subsections 2.1.0
.
A
t t A-1
-1 tA-l A
and 2.1.3 respectIvely, and Y(3)=XO(§ k(3)§) (§ k(3)~(3)-XO). The
where
0
null hypothesis is rejected if A(3) is "too small".
Theorem 3.8 Under certain regularity conditions that are satisfied
here and also under the null hypothesis
(3.16)
o
The null hypothesis is rejected if -2log A(3) is "too large".
Theorem 3.9 Under certain regularity conditions that are satisfied
here and also under a sequence of "local" alternatives
~
,...,
Ha , (3)N:
~2
tl<_e~
""'0
~
st_~
~ ! 1(1
=
-!.:
~ 2~,~
=
~
f::,~
~l
~2
a fixed t+s vector with
~l
and
defined in sections 3.1 and 3.2 respectively,
lim L(-2log A(2)) = xi+s(a 3) ,
C3.17)
N+oo
a noncentral x2 with t+s degrees of freedom and noncentrality parameter
a = ~i[1t(~tk-l~)-11]-1~1 + 2~~{§tX[(trk-lQgk-l~)gh]-1§}-1~2 = a1+a 2
3
with a1 and a 2 defined in the last two sections.
0
rnAPTER IV
C(J·1PLETE DATA:
K-POPULATION CASE.
ESTIMATION, LIKELIHOOD RATIO TESTS AND ASYMPTOTIC RESULTS
4.0
Introduction
In this chapter, the problems of estimation and hypothesis testing
for the complete-data situation, when we have more than one population,
are studied.
As
in Chapters II and III, we are interested in obtaining
HLE's for the paraJTleters when they are both subject to or not subject
to constraints and also, in developing appropriate LR tests to test
hypotheses associated with those constraints.
asymptotic distributions for the
~fiE's
Both null and nonnu1l
and LR statistics are obtained
as a natural extension of the ones obtained in the last two chapters.
Also, asymptotically efficient estimates of the parameters are given.
The studies are based on a random sample of size N, which is the
result of K independent random samples, each one with
K
N , d=l, ... ,K, N = I Nd , elements taken from K distinct populations.
d
d=l
For population d, we describe our random sample as Yd1"."YdN
d
taken from Np (~,l;d)'
n.
.
The mean l!d is given by
(4.1)
~ = [?$d1'·" '~r ],
f\1
= (Bd1 ,· .. , Bdr ) t, where the ?$d' s are knOloJl1,
d
linearly independent (for convenience) Pdx1 column vectors and the
d
85
6d 'S are unknown scalars.
The covariance matrix is given by
md
kd = kd(!d) =
md
5
L Ldg%g
(4.2)
g=l
Pd(Pd+l)/2, !d = (Ldl,· .. ,Ldm )t , where the Qd's are known,
d
linearly independent PdXPd symmetric matrices and the l d 'S are unknown
scalars. It is also assumed that there exists at least one value of
!d such that
ku (!d)
> 0, i. e. kd is positive defini te (p. d. ) .
4.1 No common parameters among the populations
When the populations do not have cormnon parameters, each population is studied separately from the others.
The reason is that the
random sample Ydl' ... 'YdN does not contain any information related
d
to the parameters .§d' and !d"
d' # d = 1, ... ,k.
K one-population cases to study.
Therefore, we have
For each case, MLE's and asymptotic-
ally efficient estimates, LR tests and asymptotic distributions can
easily be obtained from the results presented in Chapters II and III.
For instance, for population d, d=l, .•• ,k, the likelihood equations
are extended from (1.8) and (1.9), and
~ = (~fdl~) -l~~id lYd
\~itten
here as
(4.3)
and
(4.4)
86
The next section deals with situations where the populations have
some common parameters.
4.2 Common parameters among the populations
Basically, in the more than one population case, the main goal is
to make comparisons among the parameters of different populations, i.e.
to test hypotheses which relate parameters from distinct populations,
as for instance, the hypothesis M1 =... = Mk and
k1
=... =
kk'
Clearly
under those hypothesis the populations have some common parameters.
So, to obtain the MLE's of the parameters for the LR tests, part (or
all) of the K random samples have to be grouped according to the
common parameters.
Of course, it is impossible to list all the different situations
where we can have common parameters among the populations.
But,
there are some situations which are very important and appear frequently
in data analysis.
5zatrowski (1979, 1981) discussed ML estimation,
LR tests, and asymptotic results for the following situations:
(i)
(ii)
(iii)
M1 =
= Mk = Mand kl =... = kk =
k1
=
=
kk
=
k
k
and
MI =..• = Mk = M given that kl =•.. =
where, in each case, rd=r, Pd=P,
!!!d=m,
~=~,
(these quantities are defined in Chapter I).
kk
=
k,
and %g=gg' d=l, ... ,k
In (i) one tests HO:
Expression (i) is true vs Ha : Expression (i) is not true. In (iii)
87
one assumes tl = ... = Ik=L and tests
HO:~l=
... =Bk = B vs Ha:H O is
not true.
Throughout this chapter we will consider the potential restrictions:
(iv)
~l
= ... = ~
=~
and 11
= ... ='1k
=1
in which we do not require Xd = X or Gd = G. The number of elements
- g -g
of Y
di can vary from population to population and, therefore, the
number of rows of
rd
= rank(~) =
~d
and the dimension of
~
can vary, but we require
r, and the number of elements of .1d' md=m.
The
restrictions (iv) can be shown to be suitable for several applied
situations, as for instance, the covariance components models described
by Dempster, Rubin and Tsutakawa (1981) and the random-effects models
for longitudinal data described by Laird and Ware (1982).
The results
obtained for this case will also be useful for the next chapter, where
incomplete data are studied.
Notice that two other situations similar to (ii) and (iii) could
also be considered here.
and
(v)
(vi)
They are:
~l
=... = ~ = ~
given that 11
=... = 1k = 1·
The problems of estimation and hypothesis testing for these two situations can be studied following the ideas discussed in the next section.
In this section, we discuss the problems of estimation and hypothesis testing related to the parameters of the K populations, when
38
they are supposed to follow .§l = ..• = ~ = .§ and .11 = ... = .1k =
,1.
MLE's of .§ and .1, when the parameters are either subject to or not
subject to constraints, asymptotic distributions for those MLE's and
efficient estimates for the parameters
~
and .1 are discussed here.
Also, LR tests related to the constraints and their asymptotic distributions are presented.
In some sense, this section parallels
Chapters II and III.
Under the above assumption, the random sample of size N can be
described as XU'··· ,XlN , ••. ,Xkl , ... ,XkN
1
where
k
m
~i .... NIDpd(lI.d = ~d~' kd =
I T Q ), and the likelihood function can
g=l g dg
be written as
where ~d = ~d(l!u = ~.§) = ~+(Yd-l!u)(~-lI.d)t ,with Yd and 8d given at
(4.4). The MLE's of B and T are the values of these parameters that
maximize (4.5).
The next two subsections deal with this problem.
4.3.0 Estimation:
Unconstrained parameters
This subsection parallels subsection 2.1.0, which deals with the
estimation of unconstrained parameters in the one-population case.
Next, the likelihood equations for this new situation are presented.
Theorem 4.1 The likelihood equations for the K-population case when
the parameters are not constrained can be written as
(4.6)
89
and
k
IlO),k =
k
[(d~lNdtrf(~) ,d~dgf(~),d~dh)gh]-l[(d~lNdtr~(~),d~g
x
(4.7)
g,h=l, .•. ,m, where %(O),d = kd(!(O) ,k) and ~(O),d = ~d(~(O),d =
~d~(O),d)'
The subscript (0), k denotes that the ~fiE's are being evalu-
ated with the parameters unconstrained and are related to the
K-population case.
Proof Let
£(~,!)
be the loglikelihood function for this problem.
Then,
and the likelihood equations are the equations defined by
a£(~,!)/a~
= Q and
a£(~,!)/aTg
= 0, g=l, ... ,m.
The partial derivatives
are (see Appendix, Lemmas A.l and A.2 for special results on matrix
derivative):
and
~
Expression (4.4) comes directly from (1) = Q and expression (4.5) comes
from (2) = 0, g=l, ..• ,m, by noticing that tr~dlQag
=
90
o
.
As pOInted out before, the matrices
k
L Nd~)
d=l
and .Ed =
~d
=
[(tr~-(~) ' dgdg~-(~) , dgdh) gh]
tA-l
~dk(o),d~
(and thus
k
(and thus
,
L Nd.Ed)
d=l
are
p.d. provided%(O),d is p.d. (see Appendix, Lemma A.O).
Notice that equation (4.6) can be rewritten as
(4.8)
tt A
.
A
A
t
where ~ = [~l: ..• :~k] , k(O) = dIag( (l/Nl)k(O) ,1" .. , (l/Nk )k(O) ,k)
and
y = (Yi, ... ,Y~) t,
and therefore (see Appendix, Lemma A. 5) ~(O),k
can be obtained from (4.8) with f(O) replaced by diag(l/Nl, ... ,(l/Nk))
if and only if the r columns of
of diag(kl"",kk)'
are spanned by exactly r eigenvectors
~
Of course, we are assuming that the problem is in
the canonical form (see definition 2.1), i.e. there exists 1+ such
+
A
that kd(1 ) = lpd' d=l, ... ,k.
The MLE
~lO),l
is expected to have this
explicit representation only in very special situations.
it can be shown that even when we have the r columns of
exactly r eigenvectors
of~,
For instance,
~d
spanned by
d=l, ... ,k, the necessary and sufficient
condition, described above, is not satisfied, t.mless some additional
assumption is made (e.g. all the
population case).
~'s
and k's are the same; the one-
In fact, explicit solution for the likelihood equa-
tions is expected in only a few cases; hence conditions for S-explicit
representation (see definitions (1.1) and (1.2)) of the MLE's are no
longer going to be discussed throughout this chapter.
Iterative pro-
cedures will be discussed later.
MLE's of the parameters subject to constraints are discussed next.
4It
91
4.'3.1 Estimation:
Constrained parameters
Below, we give the likelihood equations for the situation where
both
~
and ! are subject to constraint.
The equations for the situa-
tions where only one of the parameters is constrained are easily
obtained from the ones given in the next theorem.
In fact, even
equations (4.6) and (4.7), l.D1constrained case, can be shown to be a
special case of these equations.
Theorem 4.2 The likelihood equations for the K-population problem
when
~1 = ... = ~ = ~
to the constraints kt~
and !,
= ••• =
!k
=
!' and
~
and ! are subject
= ~o and §t! = lo' where k' ~O'
§ and rO are
defined in subsections 2.1.1 and 2.1.2, can be written as
~(3),k = ~(~),k~(3),k-Q(~),kk(k~(~),kk)-1(L~(~),k~(3),k-~O)
(4.9)
and
A
!(3),k
A-I
A
~-I
= k(3),k~(3),k--(3),k§(§
tA-I
-1 tA-I A
k(3),k§) (§ k(3),k~(3),k-ro)
(4.10)
A
A
A
tA
-1 t
where k(3),d = kd(!(3),k)' n(3),kl = BkCBkn(3),kBk) Bk ,
~
tA-l
A
~
tA-1 _
A
n(3),k = d;lNJ~dk(3),d~' £(3),k = d;lNd~k(3),dYd
A
tA
-1 t A
k
A-I
A-I
k(3),kl= ~(B~k(3),~) B§' k(3),k= [(d!lNdtrkC3),d~gk(3),dgdh)gh]'
A
~
A-I
A-I
A
~(3),k = [(d;INdtrk(3),d~gk(3),d~(3),d)g]
~~(3) ,k)'
and
A
A
~(3),d=~(~(3),d =
The matrices ~ and ~, and ~ and ~ are defined in Chapter
92
II, subsections 2.1.1 and 2.1.2 respectively. Also, the subscript
(3),k denotes that the MLE's are being evaluated when the parameters are
subject to the constraints and we are in the K-population case.
matrices
!S~f(~),d!Sd
"-1
(and thus Q(3),k'
"-1
1~(~),k1 and:E1ij(3), kE1)
The
and
"t"-l
[(trk(3),d~gk(3),d~h)gh] (and thus k(3),k' § k(3),k§ and
t"
E§k(3),k~)
"
are p.d. provided k(3),d' d=l, ... ,k is p.d. (see Appendix,
Lemma A. 0).
Proof Let
0.
1
(£xl) and 0. 2 (sxl) be the Lagrange multiplier vectors and
£(~(3),k'!(3),k) = logL(~(3),k'!(3),k)' L given by (4.5) with ~ and
replaced by ~(3),k and !(3),k respectively.
with respect to ~(3),k' !(3),k' ~ and Q2·
l
We maximize
The partial derivatives
(see Appendix, Lemmas A.l and A.2) are:
"
k
t"-l
t"-l
"
(1)
dQ/d~(3),k = dIlNd(~k(3),dYd-!Sdk(3),d!Sd~(3),k)
(2)
dQ/d~(3).kg= (-~)dllNd(tri(~).d~g-trf(~).d~gf(~).d£(3).d)+~~22'
g=l, .•. ,m, where ~g is the gth column of §t ,
From (1) = Q and (3) = Q, we can see that
+
1Q1 '
93
and the first expression in (4.9) follows by using Lemma A.3 (see
Appendix). The second expression in (4.9) can be obtained from the
.
. .
t"'-l
-1 t
'"
'"
'"
. fIrst one by notIcIng that 1(1 2(3),k1) 1 = Q(3),k- Q(3),k2 (3),kl x
~(3),k and
1111 = 1£
(see Appendix, Lemma A.4).
k
.noticing that
On the other hand, by
k
d~INdtrf(~),dQdg= [(d~lNdtrk(~),dgdgf(~),dgdh)h]ti(3),k'
we obtain from (2) = 0, g=l, ... ,m, and (4) = Q the following system of
equations
The two expressions for i(3),k given by (4.8) follow from the above
system of equations by using again Lemmas A.3 and A.4 (see Appendix). 0
Explicit solutions for these equations are not expected to exist,
except for very special cases, and iterative procedures, which will be
presented in the next subsection, are necessary.
Below we give the likelihood equations related to the situations
where only one of the parameters is constrained.
Corollary 4.1 The 1ikelihood equations for the situation where onl y
~
is constrained are given by (4.9) with the subscript (3) replaced
,..".
A-I
'"
by (1) and !(l),k = ~(l),k!(I),k .
0
94
Corollary 4.2 The likelihood equations for the situation where only r
"
... - 1 "
is constrained are given by ~(2),k = ~(2),k2C2),k and (4.10) with the
subscript (3) replaced by (2).
Q
Clearly equations C4.6) and (4.7), unconstrained case, are given
by the HLE's of
~
and
r
given by corolla'ries 4.2 and 4.1 respectively,
.
with the subscripts (1) and (2) replaced by (0).
4.3.2 Estimation:
As
Iterative procedures
in the one-population case, the Method of Scoring and the Ht
algorithm are the ones suggested to be used here in order to obtain
the MLE's of
(i)
~
and
r.
The Method of Scoring
First notice that the likelihood equations given before for both
the constrained and unconstrained cases can be summarized as
= B- 1
R
~Cw),k
a
_B-1
LCLt B- l
a
L)-l cLt fi- l
-8 )
- ~Cw),k-Cw),k ~O
~Cw),k~Cw),k ~lw),k~ ~ ~Cw),k~
(4.11)
and
T
= Z-1
~Cw),k
_2- 1
i
scst 2- 1 S)-l cst 2- 1 Z
-y )
- ~Cw),k~
- ~Cw),k~(w),k ~O
~Cw),k-Cw),k -Cw),k~
(4.12)
w = 0,1,2,3, where kCw),d = ~CiCw),k)' fCw),d= fdCQCw),d=~gCw),W
~d
_
~(w),dd with ~d given at C4.4) and ~Cw),dd = CYd-~Cw),d)
+
~
t
-'
CYd-~(w),d) , ~(w),k1
t"
-1 t
-'
x
81 , ~Cw),k =
k
t-'-l
"
k
tA-l _ -'
t'"
-1 t
L
L
Nd~dICw)
d~d,2lw)
k
=
d
Nd~kCw)
dYd'
kCw),k1=
Bs(BS~Cw),kBs)
ES,
d=l
'
,
=1'
~
~
~.
~
=
~C81~(w),kB1)
=
95
A
~
A-I
A-I
A
k(w),k = [(d~lNdtrk(w),d~dg~(w),d~dh)gh] and ~(w),k =
~
A-I
A-I
A
[(d;lNdtrk(W),d~dgL(w),df(w),d)g]' g,h=l, ... ,m.
1 = Q,
Notice that,
£
= 0,
= Q, ~-;1 = ~0 and R~L = -r
I for w = 0,2, and they are as defined
in subsection 2.1.1 for w = 1,3. On the other hand, s = 0 '1"'0.1S = .0. . . . ,
~O
10 = Q,
t'J
t~
= Q and B.§ =
1m for
subsection 2.1.2 for w = 2,3.
w = 0,1, and they are as defined in
Also, it is easy to see that the values
of ~(w),k and !(w),k do not change when we replace Nd by f d = Nd/N in
the expressions for Q(w),k' ~(w),k' %(w),k and i(w),k·
These later
forms will be more appropriate when studying asymptotic results.
As in subsection 2.2.0, the rth, r=1,2, ... , iteration of this pro-
cedure can be described as it follows: with t(w),d' w = 0,1,2,3,
_
(r-l).
(r)
d-l, ... ,k, replaced by k(w),d we obtaIn ~(w),k from (4.11). Then,
~~~~,d = ~d~~~~,k' ~~~~,dd
(r)
.
=
(Y-~~~~,d )(Y-~~~~,d)t,f~~~,d= ~d+~~~~,dd
A
(r)
A
and !(w),k from (4.12) wIth f(w),d replaced by f(w),d and ~(w),d by
(r-l)
L(w),d
(r)
_ m (r)
The value of kCw),d - g~lTCw),kg§dg is the value of
I Cw ),d
for
(r)
(r-l)
(r)
the next step. The procedure stops when ~Cw),k - ~(w),k and !(w),k
(r-l)
!Cw),k are small. Convergence of this procedure is not guaranteed.
Two important results are:
(1)
The algorithm converges in one iteration from any starting value of
~(w),d provided the ~UB's have S-explicit representation, and
(2)
~~~~,k and !~~~,k are asymptotically efficient estimates of ~ and
provided
4.3.4).
~~~~,d is a consistent estimate of ~C~) (see subsection
1
96
(ii)
The EM algorithm
Again, as in subsection 2.2.1, we use the strategy described by
Andrade and Helms (1983) to generate an artificial incomplete-data
situation where the B-1 algorithm can be applied to find the
parameters in the original problem.
strained is discussed here.
Only the case where
Results for when
~
and
r
~
~·n..E' s
of the
is con-
are unconstrained
can easily be obtained from the ones presented here with
1 = Q, E1 = lr
and QO = Q.
For population d, d=l, ... ,k, consider the situations where
Ydi
= ~d~
+ ~i +
fdi ' i=l, ... ,Nd , where
Qdi]
[Ed'
~
1
and k.d = k.d 1 + wIn·
,
matrices
~dh
1'd
The unknown parameters Yh and the pdXP d knO\\11
dg = Tg and Qdg in (4.2),
t
t t
Also, w > O. If (Ydi , ~di) ,
are defined in the same way as
the expression that defines kd.
T
i=l, ... ,Nd , d=l, ... ,k, is assLUTled to be our "complete-data" random
sample, the likelihood function is given by
k
- L NdPd/ 2
L(~'X,wl' ... '~) = (2n) d=l
(4.13)
97
N
d
Cd
,
2(~d = ~d~) = (lINd) .L (Ydi-~d~)t6cdi-~d~)' Ydi=Xdi-~di·
Also, it
1=1
can be shown that the likelihood equation for this artificial situation
with ~ subject to 1t~ = QO' can be written as
.§
=
(4.14 )
t k
t.
-1 t k
t
k
t
= ~L~O+ BL[BL( L Nd~d2Sd)EL] EL[ L Nd2SdYd- ( L N~d2Sd)ML~O] ,
- - d=l
- d=l
d=l
k
Y=
-
[(
k
L Ndtr%dll~dh%dll~dh')hh,]-l[( L Ndtrldll~dh1dllfd
d=l
"
d=l"
,
l)h]'
(4.15)
h,h' = l, ... ,m , and
l
w
= (II
k
k
"-
d=l
d=l
L NdPdH L NdCd
'
2),
(4.16)
~
where ~ = (l/Nd)i~l~di' kd,l = kd,l(Y) with
Cd ,2
= Cd,Z(Qd = 2Sd~)·
The matrices
M
and
1
X= (Yl ,··· ,Yml )
t
.
and
E are defined in sub1
section 2.1.1.
f
The above likelihood equations can be solved explicitly for
Nd
Nd
Nd
and ~ as long as we know (.L X~iXdi' .L 2S~Xdi' .L £di£~i)' d=l, ... ,k,
1=1
1=1
1=1
the "complete-data" sufficient statistic. Explicit solution for
i
will depend solely upon the structure of
statistics described above are not known.
kct ,1.
Clearly the sufficient
In fact, only the I's, which
in this context are suppose to be the incomplete data, are observed.
So, we first "complete" the lDlobserved "complete data" and then use
98
(4.14)-(4.16) to obtain the MLE's of ....B and ....T, the parameters of the
original problem.
The rth, r=1,2, ... , iteration of the
~1
algorithm can be described
as:
(1)
Estep
Computes the conditional expectation of the "comp1ete'data"
sufficient statistic given the observed data X11, ... ,XkNk' and the
estimated values of parameters from the (r-1)th iteration.
In this con-
text the equations are:
Nd
\ t
I
~(r-l)
(r-1)
(r-1)]
E[i;lXdiXdi r11,···,rk~; ~
, X
' w
N
\d
1=
(r) t
(r)
= .1....
L (Yd'-£d'
1
1 ) (Yd'-£d'
.... 1
1 )
+
(r)
NdtrYd '
(4.17)
N
d
t
(r)
(4.18 )
= . I ,...Xd(Yd'-Od')
.... 1 ,... 1
1=1
and
(4.19)
"t"Ion f or "r11, ...
where '#" IS a typograph'lca 1 abbrevla
y
....
(r-l) ,
W
(r-l)"
Q(r-1) ,
,rkNk;~
,
(4.20)
(4.21 )
99
(2)
M step
Evaluates the values of the parameters which maximize (4.13) with
the sufficient statistics replaced by their condition expectations
evaluated at the E step.
Here,
(4.22)
(4.23)
h,h' = 1, ... ,m1 , and
(4.24 )
-(r)
where V
d ' d-1
- , ... , k , l"S the arl"thmetl"c mean of vCr)
-di --)T-di _6(r)
-di '
r(r) = r
~d,l
(y(r)) with yCr) = CyCr), ... ,yCr))t, CCr ) =
~,1 ~
~
C1/N ) I 6(:)6 C:)
d i=l~dl ~dl
fV(:)~~dl
~
t
+
~1
~d,l
~
vCr) and C(r) = (liN) I (v(:)-x .§Cr))t x
~d
d,2
d i=l ~dl ~
X
A(r)) + try~r).
~d~
-u
respectively.
~1
6 C:) and VCr)
are given by (4.20) and (4.21)
~d
~dl
100
Notice that, one should have
(4.23) can be solved explicitly.
~
, l' d=l, ... ,k, such that equation
Otherwise, an iterative technique,
for instance the Method of Scoring, is necessary to solve (4.23) at
each iteration of the procedure.
method unusable.
This fact would render the entire
When ~,1 is such that r(r) has S-exp1icit represen-
tation, X{r) is obtained from (4.23) with kd(r ) replaced by I
,1
""'Pd
In some situations, for instance the covariance components
models described by Dempster, Rubin and Tsutakawa (1981) and the
random-effects model for longitudinal data described by Laird and Ware
(1982), there exist a natural way of writing kd as a sum of two
matrices.
£di]
[ Ed'
- 1
In those situations, YdI" = XdA
+ ZdCd*'+
Ed"
_K,
-I I where
Q
- NID
qd+Pd
wI
""'Pd
l
J
m
1
Pdxqd matrix of rank qd
s;
Pd' kd,l = h~lYh!:!dh' £di '" ~d£di and
~,1 = ~dkd,l~~' Also, Yh and !:!dh are as Tdg = Tg and Qdg in (4.2).
Thus, using the results obtained above and considering
~di
=
Yd' - -cd'1 = ....,Yd,-Zd§~'
, the rth, r=1,2, ... , iteration of the algorithm
1 ....,
...., 1
~1
can be described as:
(1)
Estep
Evaluates
and
101
°th ~*(r-l) = ~* ((r-l)) and
kd,l
kd,l
r
WI
(2)
Mstep
Using the conditional expectations obtained at the E step, it
evaluates ~(r) from (4.22), wd r ) from (4.24) and !(r) from (4.23)
modified by adding a subscript asterisk to each instance of kdri
, '
(r)
t!dh' and £d ,1 ' and
c*(r)
t 4.25)
-d,l
In general, qd = q, ~d ,1 = ~i, d=l, ... ,k, and l(r) has S-eA~licit
representation, i.e.
k
y(r)
= (l/N) [(trR*R*
)
]-1[( \l. NdtrR*C*(r))
] .
_
N"hN"h' hh'
N"h.....d 1 h
d=l
'
For instance, when
~i
or it is equal to 1
~
(4 ° 26)
has the most general pattern eml = q(q+l)/2)
Q, where
~
has the most general pattern.
Following, we present the asymptotic joint null and nonnull
distributions of ~{w),k and l(w),k .
4.3.3 Asymptotic distribution of the MLE's
In this subsection, we present the asymptotic joint distribution
of the MLE's of .@ and;E.
Results for the case where the K populations
102
do not have any common parameters, can easily be obtained from the ones
presented in subsect ion 2. 3.1. Theorems 2.9 and 2.10 , given in sect j on
.
t
t
t
t t
~
2.3, W1th Q = (~1' < II > '···'~k' < Ik > ) , a[ L Pd(Pd+ 3)/2] x 1
d=l
"
~
. t
~
t t
.
"vector, ~N = (~l' < ~ > ,···,!k' < ~ »
and fCw) = f(w)(~N) =
(~~w),k' lCw),k)t, a (r+m)xl vector, w =.0,1,2,3, provide the basic
results for this study.
f
It is assumed that 0 <.lim f < 1, where
N-+oo n
n = Nn/N, n=l, ... ,k.
From theorem 2.9 cited above, we can see that for population n,
n
= l, ... ,k,
~ = ¢(L ) (see definition 2.2)
-n
--n
1
>
(by noticing that N
n
1
= ~~)
n
0, f -1
n
[
~
>
lim L
N-+oo
0
II
ld1
< 81 >
k1 >
-----
----Ik
l ~k
<
->
lim L(J2(~N-~))
N-+oo
>
= N(Q,I),
<
= N(O,
-
[In
-0
~J]
T) ,
~
~k
< kk >
where ~N' ~ and
I are given above. Theorem
103
.
A
At
At
t .
.
2.10 wIth f(W)(~N) = (~(w),k' !(w),k) IS then used to obtaIn the
asymptotic joint null and nonnu11 distributions of ~(w),k and l(w),k.
The expressions for ~(w),k and !((u),k used here are the second forms of
(4.11) and (4.12) with Nd replaced by f d in the expressions for QCw),k'
~(w) ,k' %(w) ,k
and few) ,k·
Theorem 4.3 The asymptotic joint distribution of the MLE's of
~
and!
evaluated at the true parameter value ry
, A ) = (11*
r*), n=l, ... ,k,
~n ....n
Nn ....n
where M~ and k~ need not be patterned, is given by
lim
N
-+
1
L ~
[r~(w) 'kl _[:(w) .k]
_.!Cw),k_
00
_.... lw),k
N [~~w)
J = [Q'
,k!
YCw)
'k2lJ C4.27)
.... Cw),k2 V
.... Cw) ,k3_
w = 0,1,2,3, where
+n
~Cw),k
(I
-z+
F
)-t +
Kt
]+2n
V
_ot
m .... Cw),k1.... Cw),k ....ZCw) ,k1.... Cw) ,k ~Cw),k.... Cw),k~(w),k
(4.28)
is a rxr variance-covariance matrix related to the elements of ~(w),k'
V
.... (w),k2
= 2K.... (w) ,k....Z+(w) ,k1 (I""ffi -Z+
F
)-t_ 2n
V
.... (w),k1.... (w),k
~(w),k.... (w),k3
(4.29)·
is a rxm matrix related to the covariance between the elements of
~Cw),k and i(w),k' and
V
.... Cw),k3
= 2(1""ffi _zt
F
)-l Z+
(2H
+J,
)Z+
(I~Cw),k1~(w),k
~Cw),k1 ~(w),k ~~w),k .... (w),kl ""ffi
(4.30)
104
is a mxm variance-covariance matrix related to the elements of T(w),k'
+
+
+
""
Here, Q(w),k1' £Cw),k and ~(w),k1 are the values of £(w),k1' D(w),k and
z( )
- w , k1 given at (4.12) with -f()
w ,n replaced by -r()
w ,n and Nn by f n
k
T
= \ f XtT
X T
= r- 1 r*r 1
-(w),k n;l ~(w),n-n' -(w),n -(w),n-n-(w),n
k
K
-(w),k
= \
f K
K
= D+
Xt
T
n;l n-(w),n' -(w),n -(w),k1-n-(w),n
[ (u t
)]
-(w),ng g ,
t
-(w),ng =
u
k
= 2[(d;l
\ f trr- 1
G r- 1
(G- 2:- 1
(2:
-2:*-B
)
-(w),k
d -(w),d-dg-(w),d~h-(w),d -(w),d -d -(w),dd
F
k
+X D+
( \ f
xt r- 1
G
2:- 1
B
))]
-d-(w),k1 ¢;1 ¢-¢-(w),¢-¢h-(w),¢-(w),¢d gh
'
k
t
and
=
L
f H
,
~(w),n=
[(~(w),ngI(w),~(w),nh)gh]
-(w),k n=l n-(w),n
H
k
\L f n-(w)
J
-(w) , k = n=l
'n
J
, J()
- w ,n
= [(trT()
G T()
G h) gh]' g,h=l, ... ,m.
- w ,n-ngw ,n-n
Also, 8(w),k and T(w),k are the ''MLEts'' of 8 and T obtained from (4.11)
Y
and (4.12) with d replaced by
~d
and 8d by
kd·
Notice that .§(w),k
(and thus ~(w),d = ~d.§(w),k) and !(w),k (and thus k(w),d =
m
L T( ) k gd ) are not statistics.
g=l w, g g
105
Proof This proof follows the same lines as the one presented for
theorem 2.11, which gives us the asymptotic joint distribution of the
MLE's in the one-population case.
N was replaced by f
d
that
The superscript "+" denotes that
d in the expression.
From the above, we can see
.~*(aTl
. .n ..... w) , k/3< .....An »t], and
k
(3)
Yew) ,k3
=
n~lf~I[ai(u\),k/ClYn)~(dllW) ,kldYn) t+(ClICw) ,k/ Cl < fJn
»
x
'*
t] ,
.~. .n(dT. . . Cw) , k a< A
.....n »
A
/
where the partial derivatives are evaluated at (Y
E*) true
. . .n'.A)
. . n = (11*
~n~ . . . n '
Parameter values, and .~*
. .n = .~(E*)
. . -n . So, we need to evaluate those partial
derivatives first.
vt
tt
Let Un ge an element of CYn' < An > ) , n=l, ... ,k.
applying Lemmas A.I and A.2 (see Appendix) we have that
(6)
Then,
106
(9)
m
+
k
1
1
ak(w),k/aUn = -2 5=1
L (aT( w,
) k5laun)[( d=1
L f dtrl-( w,
) d~d g1-( w) , d
G f- 1
G)
-d5-(W),d~
gh
]
x
'
(10)
k
m
(a%+() k/au )i( ) k = -2 L (aT( ) k ;au ) [( L f dtr%-(I) d§d
w,
n w,
5=1
w, 5 n d=1
w, g
(13)
t
t
aB
lau = B
+B
+B
+B
with
-(w),dd n -(w),dnl -(w),dnl -(w),dn2 -(w),dn2
B
-(w),dnl
= (~
x
I -f X 0+
Xt f-l
)(aV lau )(V _~
)t
dn-Pd n~-(w),kl-n-(w) ,n -n n -d ~(w),d
and
m
k
B
=
\
(aT
lau )lX D+
\ £ Xt I- 1 G I-I
x
-(w),dn2 5~1 -(W),k5 n -d-(w),kl~~1 ~-~-(w),~-~s-(w),¢
, and
(14)
az+() k/au
- w,
n
1 G r- 1 (aA lau )) ]+
= [fntrr-(w),n-ng-(w),n -n n g
1
2[(£ trrx
n -(w),n
107
k
l
(aY lau )(Y -~
)t) ]-2[(f \ f trf- l
G
x
~ng~(wJ,n ~n
n ~n ~(w),n g
nd~l d ~(w),d~dg~(w),d
r-
G f- l
m
~dQ(w),kl~~%(~),n(a~/aUn)(Yd-g(w),d)t)g]+2s~1 (a~(w),ks/aUn)x
k
{ref trf- l
G f- l X 5+
( \ xtf-l G £-1 B
)) ]n ~(w),n-ng~(w),n-n~(w),kl ¢~lN¢~(w),¢~¢S~(w),¢~(w),¢d g
k
[( \ f
d~l
trf~(w),d~dg~(w),d~ds~(w),d~(w),d
-1 G f- l
G f- l
e )g]}
d
.
From (4)-(14) and noticing that
we can see that the partial derivatives of the MLE's with respect to U
n
evaluated at the true parameter values
(15)
(a~
(w),k
I
/dU )
n (11* r*)
=
~n'~n
p
~(w),n
I
(Yn'~)
=
(~'k~)'
I
(aY /dU )
-n n (11* r*)
Q(w),k (aT~(w),k /dUn) (* r*)
M;.;'~n
are given by
-
~n'~n
'
t -1
+
where p()
) klX r()
w ,n = f D(w,
w ,n and
~
(16)
n~
(a"l(w),klaun)
~n~
I
(ll.~ ,~~)
= (I~ -z+
F
) -lZ+
m
~(w),kl~(w),k
~(w),kl~(w),n'
where
m
~(w),n
1 G r- 1 (aA lau )1
= f n{[(trr~(w),n~g~(w),n ~n
n (11*
) ]+2[(11*-11
)t x
r*) g
~n ~(w),n
~T'I'~n
108
Equation (15) can be rewritten by replacing (ax(w),k/aUn) by its value
in (16).
In order to get the partial derivatives with respect to
Y
~n
and
<A > , we also evaluate
~n
aY;au
~n
n
=
e "if Un
-n,l
aA-n laun
= 0
- if Un
=
-n,l"' -0 if Un
= Y
=
Y ., -n,J
J ·t if U
-n,l
n
<A
-n >"J t and
=
<A
-n >"t
] '
i=l, ... ,p , j $ t = l, ... ,p , where e . is a p xl column
n
n
-n,l
n
I in the ith position and 0 elsewhere, and J "t is a p x
-n,J
n
with I in the (jt) and (tj)th positions and 0 elselvhere.
vector with
p matrix
n
Thus, using
the partial derivatives results in (15) and (16),
(17)
(a~
lay)
(w),k -n
I(* r*)
ld;i'-n
= -(w),n
P
-Q
(I -2 +
F
) -1
(w),k -m -(w),k1-(w),k
x
where
k
M
-(w)'Y
n
2f {[((
= n
-1
*
)t r -1
G r- 1
) ] [( ~ f (
)t x
~n-~(w),n -(w),n-ng-(w),n
g - d~l d ~d-~(w),d
'Ie
-1
+
t -1
~(w),d~dgk(w),d~Q(w),kl~k(w),n)g]
(18)
(a~( ) k/a<~ »1
w ,
n
(11* L*)
~n'-n
}
,
109
where
with -¢()
= -¢(E()
W ,n
- w ,n ),
(19)
and
The result follows by substituting (17)-(20) in (1)-(3) and also by
noticing that
~*pt
P
-(w)
,nkry-(w) ,n
= f2 D+
XtT
X n+
n~(w),kl-n-(w),n~n-(w),kl'
\1
~*\lt
=
~(w)'Ynkn~(w),~
P
E*!'-1t
= 2f2K
and
-Cw),n-n-(w)'Y
n-(w),n
n
2 t ",-1 "'*\\1 = 2f2J
"1
"'*~lt
= 4f W
~(w),<A > ~n-(w),<A >
n-n~(w),~n
n-(w),n .
-n
-n
4f 2H
n-(w),n
0
Below, we present the asymptotic joint null distribution of
~(w),k and l(w),k'
This is the distribution obtained when 0d~'k~)
are included in the parametrization under which the MLE's were
m
derived, i.e. -n
l.I~ = X B* and E* = \ 1"*G
which ~'r
imnlies that
-~
-n g=l
L g-ng
.
<y*>
,E() ) = (1I*,L*),
n=l, ... ,k.
-n = <W
-n >1"*
- and therefore (lI()
~ w ,n - w ,n
~n -n
Corollary 4.3 The asymptotic joint null distribution of the MLE's is
given by (4.27) with Y(w),kl = QCw),kl' Y(w),k2 = Q and
110
v~(w),k3 = 2Z+
~(w),kl
.
Proof ~fuen (B(wJ ,n 'kC)
w ,n ) = CB*,E*),
n ~ we have ICw) , k = ~+Cw) , k'
.
+
K
= -'
0 -(w),k
F
= ~'~(w),k
0 n
= -'
0 H
= ~0 and J-(w),k = ~Cw),k'
Z
and
-Cw),k
~lw),k
and the result follows.
0
In the case where
(B~'k~)
does not belong to the parametrization
cited above, we have the so called asymptotic joint nonnull distribution of the MLE's.
4.3.4 Efficient estimates of
~
and r
Here, we extend the results obtained in subsection 2.3.2, which
deals with the same problem but for only one population.
First notice that
lim LC~(A
-~
)) -- N(O~'~Cw),k1'
n+
)
~Cw),kN ~(w),k
N-+oo
+
'"
'"
where ~(w),kl is the value of ~(w),kl given at (4.12) with I(w),d
replaced by kCw),d and Nd by f d , and ~(w),kN is the solution of C4.l1)
with ICw),d replaced by k(w),d·
.§Cw),k and kCw),d are values of the
parameters subject to the constraints, if any.
This result is true
even when the underlying distribution is not normal.
Theorem 4.4
Let Xdl, ... ,XdN
d
'
d=l, ... ,K, be K independent random
samples taken from K populations whose distributions have mean Bd
and covariance kd defined by C4.1) and (4.2) with
~d
=
'"
Also, let k(w),dN
be consistent estimates of k(W) ,d·
~
and
!ct
=
"*
If .§(w),kN
is
the solution of (4.11) with ICw),d replaced by %(w),dN then,
1im LC~C~(w),kN-.§(w),k)) = N(Q, ~~w),kl)·
N -+00
asymptotically efficient, so is ~(w),kN .
Also, if
r·
~(w),kN is
111
Proof
~~w),klN is ~(w),kl with kCw),d replaced by fCW) ,dN)
k
xtf
- D
XtL:
)[f ~cv -x p.
.
~(w),kl~d~Cw),dN ~(w),kl~d~(w),d
d ~d ~~Cw),k)]}
t {(D
=d~l
converges stochastically to zero because for each d, d=l, ... ,K,
1
1
~[Nd(Yd-~Cw),d)] which has a limiting distribution.
Therefore,
~(~(w),kN-.§Cw),k) has the same limiting distributions as
~C~(w),k-.§(w),k)'
which is NCQ,
~(w),kl)
and if
~Cw),k
is as)TIptotic-
ally efficient, so is ~(w),kN (in the same sense).
o
In this work, the underlying distribution is the normal distribution and therefore, ~(w),k is asymptotically efficient in the sense
of attaining the
Cram~r-Rao
lower bound for the covariance matrix of
unbiased estimates, because it is the
known.
~~E
of .§ when
ka
= kd (1 ) is
In other words, equation (4.11) with few) ,d replaced by a
consistent estimate of k(w),d gives an asymptotically efficient
estimate of
~
(and thus
~
=
~d~).
112
Similar results can be obtained for
T
by writing equation C4.12)
in the same form as equation (4.11), i.e. by writing ~(w),k' %(w),k1
and iCw),k as functions of
~Cw),d and ~d instead of %Cw),d and the 9'5
matrices.
Following Anderson (1973), as at
t~e
end of subsection 2.3.2,
. . . 1·
(0)
(0)
b
b
d
InItla
estlJllates 2: (w)
,kl' ... ,1 (w)
,km can e 0 taine from C4.12)
,nth ~Cw),d replaced by Y
d and ~Cw),d' d=l, ... ,k, by any positive
matrices which satisfy the constraint on 1, if any. Then ~~:~,k and
!~:j,kmobtained
CO )
~(O)
= g~l
\ T (w),kg
-(w),d
and
from (4.11) and (4.12) with
~(w),d
replaced by
G 'I.e. one .
. 0 f teet
h M h0 d 0 f ScorIng
.
-dg'
IteratIon
are asymptotically efficient.
4.3.5 Hypothesis testing
In this last subsection, we present LR tests, and their asymptotic
null and nonnu11 distributions to test hypotheses associated with the
three different types of constraints on the parameters discussed
before.
This study parallels the one done in sections 3.1-3.3, in
Chapter III; hence the proofs here are basically a repetition of those
presented in the cited sections and are not going to be presented
here.
A LR test of HO: ~l= ... = ~ = ~ and !l= .•. =!k = !, and the
asymptotic null distribution of the LR statistic is also presented.
(0)
Testing HO,k~
~
:: ~ and !d ::!' d=l, ... ,k
Here, we want to construct the LR test to test the null hypothesis
HO,k:
~
=
~
and!d = 1, d=l, ... ,k versus Ha,k: they are not equal,
113
based on a random sample X
di
~
~
~ NIDPd(~'kd)'
i=l, ... ,Nd , where
~d
and
are given by (4.1) and (4.2) with r d = r and m = m. The parameter
d
is the space ldd,kd where ~ and kd are given above, and the region I"
is the region in this space where
~d
=
~
and Id = I,. d=l, ... ,k.
Theorem 4.5 The LR statistic to test the null
h)~othesis
HO,k is given
by
(4.31 )
I
where d is the MLE of kd given in section 4.1 and 1(0),d is the HLE of
~ given in subsection 4.3.0.
The null hypothesis is rejected if Ak is
o
"too small".
In general, the exact distribution of Ak is not known or it is
difficult to be evaluated. Asymptotic results are necessary.
Theorem 4.6 Under certain regularity conditions that are satisfied
here the asymptotic distribution of -2log Ak lIDder the null hypothesis
is X~d-1)(m+r)' The null hypothesis is rejected if -2log Ak is too
"large".
o
Next, we present the LR tests to test the hypotheses associated
with the constraints on the common parameters
(i)
Testing
~
and I.
HO,(l)k~~ ~~O' given ~d~ and!d~
Assuming that
~d
=
~
and Id=I, d=l, ... ,k, we construct the LR
test for testing the null hypothesis HO,(l)k:kt~ = ~ versus
Ha,(l)k:!t~ :f
20,
is given by (4.5).
based on the random sample whose likelihood ftmction
Also, asymptotic null and nonnull distributions of
114
the LR statistic are presented.
The proofs for these results are very
similar to the ones given in section 3.1; hence they will not be given
here.
~d
Notice that, now the parameter space
and kd are given by C4.l) and C4.2) with
n is the space ~d'kd where
gd = g'.!d = 1, r d = r and
md = m. The region w is the region in ~his space where 1 t g
1 and ~O defined in subsection 2.1.1.
= ~O with
Theorem 4.7 The LR statistic to test HO,Cl)k vs Ha ,(1)k is given by
-2
A(1 ),k
k
A
A
= d=l
IT CII (1 ) dl/1kCO) d!)
'
,
Nd
(4.32 )
where kCO),d and k(1),d are the ~fi£'s of Id given in subsection 4.3.0
and Corollary 4.1 respectively.
The null hypothesis is rejected if
A(1),k is "too small".
o
Theorem 4.8 Under certain regularity conditions that are satisfied
here, the asymptotic distribution of -2log A(1),k when HO,Cl)k is true
is x~. The null hypothesis is rejected if -2log A(1),k is "too
large".
o
Theorem 4.9 Under certain regularity conditions that are satisfied
here and also tmder a sequence of "local" alternatives
Ha ,(1)kN: 1tg-~o = N-~~l' ~l a txl fixed vector, the asymptotic
2
distribution of -2log A(1),k is xical,k), a noncentral x with £
degrees of freedom and parameter of noncentra1ity a l k =
'
t t k\
t -1
-1 -1
~l [1 C L f d~d~ ~d) 1] ~l·
d=l
0
115
Here, we present a study similar to the one presented in (i),
but with!, instead of
~,
constrained, i.e. given that
~
=
~
and
!d = !, d=l, ... ,k, we construct the LR test to test
HO, (2)k: .~t£ =:YO vs. Ha , (2)k: §t,! of :YO ' where § and :YO are
defined in subsection 2.1.2. This test is based on the random sample
described in (i) and asymptotic distributions are also obtained.
Again, as in (i), the proofs are omitted because they are similar to
previous proofs given in section 3.2.
The parameter·space
n is
the
same as in (i) and the region w is the region in this space where
t
S
T
~~
= YO'
~
Theorem 4.10 The LR statistic to test the null hypothesis described
above is given by
(4.33)
where
A
t
t
A
Y(2),k = rO[§ (k(2),k)
-1
-1
§]
t
A
[§ Ck(2),k)
-lA
~(2),k - ra]
A
with ~(2),k and
k(2),k given at (4.12), and %(O),k and i(2),k are the MLE's of kd given
in subsection 4.3.0 and corollary 4.2 respectively.
The null
hypothesis is rejected when A(2),k is "too small".
o
Theorem 4.11 Under certain regularity conditions that are satisfied
here, the asymptotic distribution of -21og A(2),k when HO,(2)k is true
is X;. The null hypothesis is rejected if -21og A(2),k is "too
large".
o
116
Theorem 4.12 Under certain regularity conditions that are satisfied
here and also tmder a sequence of "local" alternatives Ha ,(2)kN:
st T - YO = N-~62' 62 a fixed sxl vector, the asymptotic distribution of
-21og A(2),k is x;(a 2 ,k)' a noncentral x~ with s degrees of freedom and
.
t
t
\
-1
-1
-1-1
noncentra11ty parameter a 2 k = 2~2{.§ [( L fdtrkd gdgkd ggh)gh] §} k. Z· O
,
d=l
Finally, we have the situation where we want to test HO, (3),k:
t
1 ~ = 20 and
1,
~o'
t
§!
t
= ro vs. Ha ,(3),k: 1
~
r ~O and/or §t ! r rO'
where
§ and !O are defined in subsections 2.1.1 and 2.1.2, based on
the same random sample considered in (i) and (ii).
Clearly the null
hypothesis here is just a combination of the ones discussed in (i) and
(ii) and therefore, the LR statistic and its asymptotic null and nonnull distributions can easily be obtained from the results presented
there.
The next chapter deals with the incomplete-data situation.
CHAPTER V
INCOMPLETE DATA:
ESTIM~TION,
5.0
ONE-POPULATION CASE.
LIKELIHOOD RATIO TESTS AND ASYMPTOTIC RESULTS
Introduction
In Chapters II-IV, it is assumed that the random vectors -1
Y. (one-
population case) and Y
di (K-population case) are observed in full, i.e.
all the components of each of those vectors are individually observed.
Here in this chapter, we assume that some of the components of the
random vectors may not be observed, i.e. we allow missing value to
occur, either by design or at random but, if the values are missing at
random, the process which "deletes" values is assumed not to affect
inferences about the distribution of the data (see Rubin (1976)).
Part of the estimation and hypothesis testing problems discussed before
are considered here in the context of this new framework.
In fact, our
study will be restricted to the one-population case, because we think
that the most important situations for the K-popu1ation case have
already been covered by Szatrowski (1981).
Following the notation introduced in section 1.2, we assume that
there are R ~ N different patterns of massing values, which are
arranged in R subsets (or populations) of Nq elements each one,
R
L N = N. Those elements are represented by
q=l q
kn i
't
=
~rni'
'1
't
i=l, ... ,Nq ,
mn
where Y
vectors, represented bv, Y.
-q1. are the co"y 1ete-data Dx1
-1 in
118
Chapter II, arranged in such a way that the first N of those vectors
I
are associated with one pattern of missing value, the next N2 with a
different pattern of missing value, and so on.
The M
are known
--q
Pq xp matrices which "define" the patterns of missing values and are
such that rank ~ = Pq :5: P , rank [M~: ••,. :~~]t~ = r and for each g,
there exists a q such that ~§g~ r Q, g=l, ... ,m. If those conditions
on M
did not hold, data would not be available for estimating one or
---q
JOOre of the unknown parameters and some matrices that appear in the
likelihood equations would not be positive definite.
Usually, each
M contains a subset of the rows of an identity matrix; ~1 Y .
-q
--q-ql
"selects" the non-missing elements of the como1ete
random vector -ql
Y"
.
~bre
generally,
Mq
may create linear combinations of the elements of
Y
where
-ql.. As Y
-ql. - NIDP (ll,L),
~ .::::ni - NID (lJ_,L), where
't
pq ""'4 -q
= -qX l3 with
1lq = Hll
-q
X
---q
11
~
and -L are given by (1.1) and .(1.2),
,
= MX
-q-
(5.1)
and
kq
with
m
= -qL (T) = M L (-r )~1t = LTG
-q- - -q
g=l g-qg
(5.2)
M G ~lt , q=l, ... ,R
~g = -q-g-q
Clearly our random sample -ql
Z ., i=l, ... ,Nq , q=l, ... ,R, can be
considered as R independent random samples, each one coming from a
distinct normal distribution with mean and covariance given by (5.1)
and (5.2).
In other words, this new framework can easily be associ-
ated with the special K-population case discussed in section 4.3,
where the mean vectors and covariance matrices are functions of the
.e
119
same .§ and .I vectors respectively.
Therefore, we can extend the
results obtained in that section to this incomplete-data case by
replacing
r by
~,
5.1 Estimation:
d by q and K by R.
We also define f q = NqIN.
Likelihood equations and explicit solutions
~11""'~1~1""'~R1"'"
Based on the random sample of size N:
~RN
R
taken from Np (ldq 'k.q)' where Pq' .Bq and k.q ' q=l, ... ,R, are
q
defined in the previous section, we want to find the HLE's of
In
P.(and thus of
~
(and thus of ~l: = \;'L L ~g
G ). Notice that X
~
~
g=1°
and the G's are known matrices related to the complete-data situation
11
=~
XP.) and
L
and defined at (1.1) and ll.2).
The likelihood function for this case is given by (1.27) and it
is reproduced here as
R
- ~ NqPq/2 R
= (2n) q-1
X P.) = A
where ~q
C = ~q.l:q
C (11 = ~qJ;;
~
-N 12
R
( TI Ik. I q )exp{(-~) L N trk.-lf }, (5.3)
q=l q
q=l q q q
+
(2
-lJ_) (2
-lJ_) t with 2
=
~ -q
~q '-(.1
~q
N
N
q
q
t
\;' 2 . and A = (lIN) \;' (2 .-Z )(2 .-Z). Both the uncon(lIN)
~q
q i~l ~ql ~q ~ql ~q
q i;l~ql
strained and constrained parameter situations are studied.
5.1.0 Unconstrained parameters
In this situation, the MLE's of S and
8CO ),I
and l
CO ),1
L
are represented by
respectively, and are a solution of the likelihood
equations given by (1.29) and (1.30), and reproduced here as
..
(5.4)
120
and
G
f- 1
C
)]
g
(5.5)
-qg~(OJ,q~(U),q
where I(O),q = LqCIco) ,I)' i(O),I=(l(O),Il,···,i(o),Im)t and f(o),q =
Cq(UlO),q = Xq~(O),I)'
The subscript (0),1 denotes that the ~~E's are
being evaluated with the parameters uncontrained and they are related
to the incomplete-data case. From Lemma A.O (see Appendix), we can
.
~
tA-l
~
A-I
c-l
see that the matrlces L N ~ L(u) X and [( L N trL lO ) . G k(O) x
q=l q q
,q-q
q=l q
,q-qg
,q
Qqh)gh] are positive definite provided I is positive definite.
Notice that expressions (5.3)-(5.5) can also be obtained from
(4.5)-(4.7) respectively, the corresponding expressions for
th~
special
K-population case studied in section 4.3.
As mentioned earlier, only in very special situations, e.g. the
nested pattern of missing values described by Anderson (1957), will the
above equations, with ~ replaced by Uq and ~lO),I by B(O),I' have
explicit solutions. In those situations, the mean vectors are assumed
not to be structured, as in (1.1), and the
Jv~E' s
do not have
S-explicit representation (see definitions 1.1 and 1.2)).
Here, as in
the K-population case, explicit solutions for the likelihood equationsare not expected to occur and iterative procedures, which
,~ill
. be presented later, are necessary.
Next, we present the likelihood equations for the situation where
the parameters are subject to constraints.
121
5.1.1
Constrained parameters
Here, we want to find the t-tLE's of
~
and 1 when either
~
or 1, or
both are subject to the constraints 1t~ = ~O and/or ~t1 = rO' where
1,
~, ~O
and rO are defined in subsections 2.1.1 and 2.1.2.
Using the correspondence between this incomplete-data structure
and the special K-population case studied in section 4.3, we can see
that the likelihood equations can be written as
= fl~1-0
.1 e
A
A
A
ld
-D
~l A )
-lw) ,II -(w),I -(w),I~1~0 '
+ D
(5.6)
and
A
-(w),1
T
A-1
A
A-1
tA-l
-1 tA-l
A
= -(w)
Z
z
Z
5 (5 Z
5) l5 Z
z
-y )
,1-lw) ,I -~w),I- - -(w) ,1- -lw),I-(w),I -0
= ~L
Y
N~O
+
Z ,II lZ-(w) ,I --(w),1N~
Z
M_Y )
-(w)
0
(5.7)
'
"
"
'"
'"
w = 1, 2 , 3, where -L:" ()
L: (-T ( w,
) I), C()
C ( 11 (W) ,q = -qX l3 ( w) , I)
w ,q = '""q
- w ,q = -q.l;;
=A
+ B
A given at (5.3) and -(w),qq
B
= (2-q -11.l;;(w),q)lZ-q -11.l;;lw),q ) t ,
-q -(w),qq' -q
A
" - "
D
- R (R!fi TL)-IRt
-(w),Il - ~b Nb-lw),INb -1'
~
R
~(w),I
= \ N Xt f-l
X
q;1 q-q-(w),q-q ,
R
"
= \ N Xtf-l Z
t"
-1 t
Q(w),I q;1 q-q-(w),q-q' k(w) ,11= ~(B§~(w),1~)
E§
A
A
Z
-(w) ,I
R
l
= [(q;1
\ Nq trrG i-I G) ]
-(w),q-qg-lw),q-qh gh
R
= [( t N trr- 1 G £-1
z
-(w) ,I
q;l q -(w),q-qg-(w) ,q-(w) ,q g
A
e )]
and
g,h=l, .•. ,m.
- -
122
Also, L = 0, 80 = 0, ML = a and RL = I for w = 2 (situation where
~ is not constrained), and they are as defined in subsection 2.1.1 for
~
uJ = 1,3.
w
~
~
~
-~
~
~~
~
On the other hand, § = Q, 1.0 = Q, t~ = Q and
%= !m for
= 1 (situation where! is not constrained), and they are as defined
in subsection 2.1.2 for w = 2,3.
of ~( uJ ) , I and
i(uJ J , I
It is easy to see that the values
do not change when we rep1acy Nq by f q = Nq IN
in the expressions for n(w),I' ~(w),I' %(uJ) ,I and i(uJ) ,I'
These
later forms are more appropriate to obtain the asymptotic results.
No-
tice also that, expressions (5.4) and (5.5J can be obtained from (5.6)
and (5.7) with
and
%= lm.
1 = Q,
= Ir ,
In fact, the above equations are the likelihood equations
§ =
Q,
~O
=
Q, Xo = Q, t1 = Q,
l~
=
for both the unconstrained and constrained parameter cases.
Q,
~
Next,
iterative procedures are presented.
5.2 Estimation:
Iterative procedures
As we have noted before, the Method of Scoring and the fl.l
algorithm can be used to obtain the riLE's of .§ and
r.
The first
method is applied directly to the likelihood equations (5.6) and (5.7),
whereas the second one is applied to the likelihood equations associated with the complete-data-one-popu1ation case, which are given by
(2.2) and (2.3), (2.7) and (2.8), (2.11), and l2.12), and (2.15) and
(2.16), and summarized by (2.29) and (2.30).
5.2. a The Method of Scoring
As described before, the rth, r=1,2, ... , iteration of this procedure can be descrl"be d as:
'th L
~
WI,
= '1
~(r-1) x
we obtal'n ~(r-1)
~(uJJ,q
l'q~(W)
-0 1 2 3
1 d
~(r-l)
luJ ), uJ- , , , , rep ace by ~lw)
lit
-1" R
d SLr)
I~q' q- , ... , ,an
lw),I f rom (5 ' 6J "'l'tl1
n
123
~
replaced by ~(r-1).
-(w),q
-lw),q
Then C(r)
= C (~lr)
= X S lr) ) d
,q an
-(w),q -q lw),q -q-lw)
e
i~~~,I
C(r) .
from (5.7) with k(w),q replaced by ~~~)~~ and -(w),q by -lw),q
m
~lr)
= 1 T(r) G is the value of ~lW) for the next step. The
""lw),q g=1
~
(w),I-g
5.2.1 The
fl~
algorithm
Unlike the complete-data cases studied before, here we do not
need to create any artificial incomplete-data framework to be able to
apply this algorithm, because we already have missing data.
In this
situation, the algorithm is applied in the same way it was described
in Chapter I.
The rth, r=1,2, ... , iteration, of this procedure can be described
as:
(i)
Estep
Involves evaluating (1.33) and (1.34), the conditional
expectations of the complete-data suffIcient statistics
(r-1)
(r-1)
_
.§ and 1 replaced by .§(w)
and lew) , w - U,1,2,3.
(ii)
y and 6,
with
M step
Involves solving the complete-data likelihood equations (2.29)
and (2.30) with
y and
~
obtained in the E step.
replaced by their conditional expectations
The solutions are the values .§t~j and
l~~~ used in the next E step. The procedure stops when .§~~j _ .§~~)1)
124
(r)
(r-l)
and lew) - llw)
are both small.
Notice that the H step my requIre
an additional, entire iterative procedure to solve the complete-data
likelihood equations.
The results obtained in sections 2.1 and 2.2
are then very useful for this step.
Convergence for both procedures is not guaranteed.
See section
1.2 for some comparisons between the two procedures.
5.3 Hypothesis testing and asymptotic results
As in the case of the likelihood equations presented in section
5.1, asymptotic joint null and nonnul1 distributions of ~ew),I and
lew),I' and asymptotically efficient estimates of ~ and I can also
easily be obtained from section 4.3.
In the same way, LR tests and
their asymptotic null and nonnull distributions used to test hypotheses
related to the constraints on the parameters, can be obtained.
·~heoren
4.3 and corollary 4.3 give the asymptotic joint distributions of the
~~E's.
Asymptotically efficient estimates can be obtained from
subsection 4.3.4 and the LR tests and their asymptotic distributions
from subsection 4.3.5.
In the next chapter, we illustrate part of the results obtai.ned
in this work through two numerical examples.
CHAPTER VI
NUMERICAL EXAMPLES
6.0
Introduction
Here, two numerical examples are presented in order to illustrate
part of the results obtained in this work.
The first example is
related to the incomplete-data situation where there exists an explicit
solution to the likelihood equations when the parameters are not
constrained.
Results from Chapters II, IV and V are used.
The data
for this example were generated using a normal random number generator.
The second example is an application of section 4.3, the K-population
case with common parameters, using part of the data collected by the
"Pulmonary Scor:Ll.IDg Injury by Childhood Infections" project.
These
data were obtained through personal conummications with Dr. Ronald ,,',
Helms and Dr. Gerald Strope.
The numerical calculations were made at the Triangle Universities
Computation Center CTUCC) using an
I~1
3081 computer and computer
programs written using PROC MATRIX in SAS.
6.1
Example I
In this example, besides showing an application of some of the
results presented in Chapters II, IV and V we also show, at least for
situations similar to this example, that the two iterative procedures
suggested throughout this work have good performance when applied to
126
solve likelihood equations with explicit solution.
In other words, if
we do not know that the likelihood equations have explicit solution and
we apply these procedures, they are expected to converge very quickly
to the explicit solution, i.e. the MLE('s).
It is assumed that we have a bivariate normal d1stribution with
tDlpatterned (X = 12 in (1.1:)) and covariance matrix l;
having the most general pattern (m=3 in (1. 2)). Based on a random
mean vector
\.I
sample of size N with N1 complete observations and N , N = NI +N 2 , having
2
only the first component observed, we obtain the MLE's of the parameters
when:
(i) the parameters are not constrained, and (ii) the parameters
are subject to the constraint that the variances are equal.
Following the notation in previous chapters, we have r
==
2, p
=
2,
~ = 12 and l:!. = .§ = (\.11'\.12) t (see expression 1.1), m = 3 and
,1; =
gt'gfig with
!h
=
~~
•
fi z =
~~
and
fi 3
=
~~
(see
e~ession
(1.2)), §t = [1 0 -1] and Xo = 0 (see subsection 2.1.2), and R = 2,
M
1 = 1z and Mz
= [1 0] (see section 5.0).
The data were generated using l:!. = (ZO, IS)t and! = (4, 1.6, l)t,
and the sample size was chosen to be N = 30 divided as NI = ZO
(complete observations) and NZ = 10 (incomplete observations). The
~~E's
obtained in (i) and (ii) are used to test the null hypothesis
that the variances are the same, a false statement.
The data values
are shown in Table 6.0.
(i)
Unconstrained
par~ters.
When the parameters are not subject
to any constraint, we have from (5.4) and (5.5) that the likelihood
equations are given by
127
Table 6.0:
Bivariate normal data C*) generated using
t
t
~ = (20,15) and 1 = (4, 1.6, 1) .
21.36
23.89
21.94
19.65
22.90
19.34
22.34
21. 04
16.34
20.87
23.01
18.90
18.43
19.23
24.70
19.96
19.91
21.18
20.16
14.98
19.54
19.46
20.14
20.72
20.07
22.83
17.40
21.91
22.24
18.04
(*)
15.46
16.21
15.95
15.15
16.0'9
13.67
16.62
14.91
13.92
14.06
16.76
14.48
14.09
14.52
16.89
15.56
14.32
15.57
14.94
13.50
14.92
15.72
15.43
15.21
15.44
15.72
13.30
16.02
15.18
14.08
These values are an approximation of the values
generated by the computer. In terms of the analysis,
the values of the second component of the last 10
pairs are assumed to be missing.
128
2
M)-lCtNMtf- l ! )
q~l q-q-(O),q~
q~l q~-(O),q-q
2
~(O),1
A
= (tNMt f-1
= [(
and
2
2
t N trf- l
G f- l
G ) ]-1[( t N trf- l
G
q~l q -(O),q~g-(O),q-qh gh
q~l q -(O),q-qg
x
Anderson (1957) showed that these equations have an explicit solution,
Applying Anderson's results to our data we obtained Q(O),1 =
(20.416, 15.099)
t A t
and 1(0),1
= (4.519,
1.744, 0.936)
as the MLE's of
Mand 1 respectively.
Table 6.1 shows the results obtained by using the two iterative
procedures for finding those MLE's.
values were considered.
Three different sets of starting
Notice that the Method of Scoring uses the
above likelihood equations, while the EM algoritmn uses the comp1etedata likelihood equations.
2.1. 0 with
(1.33) and
Those equations are given in subsection
y and 8 replaced by their conditional
(1.34). As ~ = 12 and k has the most
expectations given by
general pattern, it
can be shown that at the rth iteration of this algorithm the ''!'vILE's''
have S-explicit representation and are given by
(r) _
B(O),1 -
r~(r)
(r) _ (r)
(r)
(r) t
and 1(0),1 - (c ll ' c l2 ' c 22 ) ,where
y*(r) is the conditional expection of Y and c~:) is the (ij)th element
D
"
1
'
f
A
be
h
'
(r-l) an d 1(0),1
(r-l) '
h
d
f
o t e con ltlona expectatlon 0 _, t glven MCO),1
Q1e can see that both methods converged to the Values obtained from
Anderson's results in all the three instances.
Q)
("-.I
~
Table 6.1:
Results obtained by applying the two iterative procedures to find the
StartinR Values
rial
Method
(0)
III
1
B-1
20.506(4)
Scoring
2
e-l
15
.
-
15.134 5.365
(5)
20
2
Scoring
3
B-1
Scoring
1
0.1
of
~
and
~
when the parameters are not constrained.
Converged Estimates
T(O)
2
T(O)
3
2.070
1.062
Same
-1
3
Same
6
~LE's
-0.002
S3I"e
1
"
A
L(2)
1
Number
of
Iterations
Time (3)
Min: Sec
T1
T?
T~
15.099
4.519
15.099
4.519
1.744
1. 744
0.932
0.936
-34.498
-34.268
-34.267
2
2
0:00.8
0:00.9
15.099
15.099
4.519
4.519
1.744
1. 744
0.933
0.936
-309.185
-34.268
-34.267
8
6
0:00.9
0:01.3
15.099
(0.188)
15.099
(0.189)
4.519
(0.890)
4.519
(0.893)
-34.268
11
0:01.0
-34.267
5
0:01.0
vI
11 2
20.416
20.416
20.416
20.416
20.416
(0.388)(6)
20.416
(0.388)
L~l)
I
1.744
(0.244)
1. 744
(0.226)
0.932
(0.019)
0.932
(0.019)
-33,907.100
(1) Value of the nonconstant part of the 10glikelihood function evaluated at the starting values.
(2) Value of the nonconstant part of the 10g1ikelihood function evaluated at the converged estimates.
(3) The time reported above includes CPU tlme for compiling the program as well as executing it. Such CPU times on J1llltiprocessing systems may vary from
one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of execution
time.
(4) These initial values for III and 11 2 are the MLE's of VI and Ili using the 20 complete observations.
(5)
(6)
e
The~~thod of Scoring does not use initial values of the Il·.
1
The first value for the vCr)
are v(l)
computed from equation (5.6) using ~E(T(O)).
1
1
~
Standard error of the estimate (asymptotic).
-
.
,
e
130
The criterion used for checking for convergence was lIl!.g~ ,I 3
l!.~~)~~11 / 1Il!.~~L111 < 10- and II !gL1-!g)~~II/11 !~~LIII < 10- 3,
and as an additional infonnat ion , the t-tLE' s of ld and 1 based on the
A
30 complete observations are l!.(0)
"= (20.416, 15.124) t and 1(0)
=
(4.519, 1.725, 0.915) t .
(ii)
Constrained parameters.
In this case, the likelihood equations
are given by (5.6) and (5.7) with w = 2, i.e. ~(2),I is equal to
~(O),I given in (i) with I(O),q replaced by %(2),q and
2
A
1(Z),I
= R_(Rt [( \ N trf- 1
N~
-§
G
r- 1
G )
q~l q -(2),q-qg-(Z),q-qh gh
]R )-lR!
N~
-§
x
Z
\ N trr- l G f- l
[( q~l
q -(2) ,q-qg-(2) ,q-(2),q g
e
%is chosen to be IT ~ ~ t
)]
and the equations are not supposed to have
e
explicit representation.
Table 6.2 shows the results obtained by using the same two
iterative procedures for finding the MLE's of
ject to the constraint Ll
values are considered.
~
and 1 when! is sub-
= L3' Again, three different sets of starting
While the Method of Scoring uses the above
equations for finding the MLE's, the EM algorithm uses the complete-data
likelihood equations given in subsection 2.1.2 with yand A replaced by
their conditional expectations as in (i).
As X =
12 and 1'(2)' the
covariance matrix 1 subject to the constraint, is totally reducible
with L3 + LZ and L3-LZ as its eigenvalues, it can be shown that at the
r th iteration of this algorithm the "M..E's" have S-explicit representation and are given by
.
r-t
l")
r-t
Table
flo ~:
Results obtained by applying the two i terati ve procedures to find the
Start in!! Val ues
Tri:ll
1
Method
EM
(0)
III
20.506 (4)
IJ
EM
.
15.134
SCoring
2
.
3.214
(5)
15
20
uonvergea
,(0)
2
2.070
2
-1
2
Same
6
1
0.1
-0.002 0.1
3
SCoring
(0)
'3
3.214
Same
SCoring
EM
)
~1LE' s
Same
III
,
11 2
20.416
20.416
15.076
15.076
20.416
20.416
20.416
(0.322)(6)
20.416
(0.322)
of
~
and
~
when
~
is subject to 'I .: '3 .
~s~una~es
,
,
3.102
3.113
'2
1.982
1.980
'3
3.102
3.113
15.077
15.076
3.102
3.113
1.982
1.980
3.102
3.113
15.076
(0.366)
15.076
(0.367)
3.102
(1.136)
3.113
(1.158)
1.982
(1. 088)
1.91l0
(1.105 )
3.102
(1.136)
3.113
(1.159)
'I
L(1 )
0
Li
2)
Number
of
Iterations
Time (3)
Min:Sec
-48.260
-48.213
-48.212
3
2
0:00.8
0:00.4
-327.915
-48.213
-48.212
S
4
0:00.9
0:01.0
-48.213
9
0:00.. 9
-48.212
3
0:00.9
-52,720.100
(1) Value of the nonconstant part of the 10glikelihood function evaluated at the starting values.
(2) Value of the nonconstant part of the 10g1ikelihood function evaluated at the converged estimates.
(3) The time reported above includes CPU time for compiling the program as well as executing it. Such CPU times on multiprocessing systems may vary
from one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of
execution time.
(4) These initial values for 1J and 11 2 are the MLE's of 1J 1 and 11 2 using the 20 complete observations.
1
(5) The Method of SCoring does not use initial values for the Il·. The first value for the Il(r) are 1J~1) computed from equation (5.6) using E('(O))'
1
1
1
~ (6) Standard error of the estimate (asymptotic).
e
-
e
...
"#
132
lI(r) = y*(r) and T(r)
";(2),1"'"
""'(2),1
where y(r) and
C{j)
= ((c(r)+ c(r))/2 c(r) (c(r)+c(r))/?)t
11
are as in (i).
22
'12'
11
22
'-
,
Again, one can see that the thlO
procedures converged to the same values in all the "three instances.
The MLE's of ld and! with Tl = T3 were found to be ~(2),1
t A t
15.076) and !(2),1 = (3.113, 1.980, 3.113) .
= (20.416,
The convergence criterion used in (i) was also used here and as an
additional infonnation, the MLE's of ld and ! with Tl = T based on the
3
A
t
A
30 complete observations are ld(2) = (20.416, 15.124) and !(2) =
(2.717,1.725, 2.717) t •
Finally, to test the null hypothesis HO:T
l
=
T3 , the LR statistic
is
2
2
N
A- = q~l (IE(2),ql/IE(0),ql) q
"
(see section 5.3), where k(O),q
"
t
H
_q_L:(T(2) r)M.
"""q
1"'01
,
At"
= ~k(!(O),r)~ and k(2),q =
Combining (i) and (ii), we can see that -21og A =
1.21 which corresponds to a p-value of 0.0005.
Thus, the null
h)~othe-
sis, which is known to be false, is rejected at a level of 0.01.
Later we will make some comments on computational differences
between the two procedures.
6.2 Example 2
A longitudinal study was perfonned over a period of eight years on
children starting as early as two and one half years of age.
One of the
objectives of the study was to study the relationship between forced
vital capacity (PVC), the volume of air that can be exhaled, measured
"
133
in liters (£) and height (lIT) measured in cm.
After preliminary
studies, it was found that a straight line would adequately describe
the data over a height range of 100-150 cm.
Based on those preliminary
studies, it was decided to use the following linear model for analysis
of the data:
where FYC1<
~ =I
(~Xl)
is the vector of observations for the kth child,
III
~~kl-115 ••• HTkpk-ll~
t •
il = (Bl'B )t
z
is the vector of popula-
tion parameters with 61 being the intercept at HT = 115 em and Sz the
slope; Qk = (bk ,l' bk , z)t is a random vector of individual differences
from the population values with h-k, 1 related to the intercept and bk , 2
to the slope; and f.k is the random measurement error term. Notice
that different children can have different numbers of observations
(Pk) and also, the observations can be made at different heights.
fact, no two subjects have the same
assumed that
b ] ~
~k
[ f.k
NID
.
0,
+
Zp
k
~
[L:*~
In this model it is
\mere I>C =
Q
and
...
matrix.
In
~ = diag (HT~l'
... ,HTkpz ) .
k
(Previous studies showed the variance·of measurement error is
approximately proportional to HT 2 .) Our objective here is not to dis-
134
cuss the model or the subject matter area results, but to illustrate
how the results obtained throughout this work can be applied to this
problem.
From the model described above we can see that the vector of
observations for each child can be considered as a random sample of
size 1 taken from a Pk-variate normal distribution with mean
Bk = ~k.§ and covariance matrix kk = kk(l) = T1~1
Ty....~3 + T4~4'
.
+ T2~2 +
t
where 1 = (T l ,· .. ,T 4) ,
In other words, we can say that we have observations taken from
different populations with conunon parameters 8 and T.
obtained in section 4.3 can be applied here to find the
T and also to do some hypotheses testing.
The results
~fi£'s
of 6 and
Notice that there are no
replications because the X matrices are different from child to child,
i .e. the number of children is the number of populations.
Two groups of kids were considered for illustration purposes: black male (BM) with 11 children and white male (WM) with 6 children. The likelihood equations for each group are not expected to have an explicit solution, so iterative procedures are necessary. Tables 6.3-6.5 show the results obtained by applying the two iterative procedures, as described in subsection 4.3.2, to find the MLE's of $\beta$ and $\tau$ for each group. Five sets of starting values were considered, and the criterion used for checking for convergence was
$$\|\beta^{(r)}-\beta^{(r-1)}\| / \|\beta^{(r)}\| < \epsilon \quad\text{and}\quad \|\tau^{(r)}-\tau^{(r-1)}\| / \|\tau^{(r)}\| < \epsilon,$$
with $\epsilon = 10^{-3}$ in Tables 6.3 and 6.5, and $\epsilon = 10^{-5}$ in Table 6.4.
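Stated in code, this stopping rule is just a relative-change test applied to each parameter vector; the sketch below is a minimal rendering (the function name and sample numbers are ours, not from the source).

import numpy as np

def converged(new, old, eps):
    # ||x(r) - x(r-1)|| / ||x(r)|| < eps, applied to beta and tau separately
    new, old = np.asarray(new, float), np.asarray(old, float)
    return np.linalg.norm(new - old) / np.linalg.norm(new) < eps

# e.g. with eps = 1e-3 (Tables 6.3 and 6.5) or 1e-5 (Table 6.4):
stop = converged([1.070, 0.029], [1.071, 0.029], 1e-3) and \
       converged([1.666e-2, 7.47e-4], [1.667e-2, 7.48e-4], 1e-3)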
Table 6.3: Results obtained by applying the two iterative procedures to find the MLE's of β and τ for the black male group (11 children).

[Starting values: in all five trials the initial β's were β1(0) = 1.108 and β2(0) = 0.035(5); each trial used a different set of starting values τ(0), several entries of which are illegible in the source.]

Trial  Method      β̂1       β̂2     τ̂1×10²   τ̂2×10⁴    τ̂3×10⁵    τ̂4×10⁶    L0(1)     L1(2)   Iterations  Time(3)
1      EM         1.765    0.250   48.413  1514.080  4908.950   0.773     92.67   1113.49       8      0:01.9
       Scoring(4) 1.070    0.029    1.666     7.472     3.872   0.898     92.67   1155.85       9      0:44.9
2      EM         1.116    0.032    1.857     8.558     4.763   0.814    209.22   1154.52      19      0:03.2
       Scoring    1.070    0.029    1.666     7.472     3.872   0.898    209.22   1155.85       9      0:44.9
3      EM         1.105    0.031    1.773     8.068     4.408   0.815   1015.41   1155.03      20      0:03.4
       Scoring    1.070    0.029    1.666     7.471     3.873   0.898   1015.41   1155.85      10      0:49.8
4      EM         1.111    0.031    1.816     8.302     4.597   0.814     -1.23   1154.94      19      0:03.2
       Scoring    1.070    0.029    1.666     7.471     3.873   0.898     -1.23   1155.85      11      1:18.5
5      EM         1.107    0.031    1.792     8.121     4.452   0.815    476.79   1155.01      19      0:03.2
               (0.042)(6) (0.002)  (0.817)   (3.938)   (2.252) (0.090)
       Scoring    1.070    0.029    1.666     7.472     3.872   0.898    476.79   1155.85       7      0:39.5
                 (0.040)  (0.002)  (0.764)   (3.630)   (2.082) (0.100)

(1) Value of the nonconstant part of the loglikelihood function evaluated at the starting values.
(2) Value of the nonconstant part of the loglikelihood function evaluated at the converged estimates.
(3) The time reported (min:sec) includes CPU time for compiling the program as well as executing it. Such CPU times on multiprocessing systems may vary from one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of execution time.
(4) The Method of Scoring does not use initial values for the βi. The first values for the βi(r) are the βi(1) computed from equation (4.6) using Σ(τ(0)).
(5) These initial values for β1 and β2 are the ordinary least squares estimates of β1 and β2.
(6) Standard error of the estimate (asymptotic).
Table 6.4: Additional trials for the black male group.(1)

Starting values:

Trial   Method      β1(0)   β2(0)   τ1(0)×10²  τ2(0)×10⁴  τ3(0)×10⁵  τ4(0)×10⁶
6(5)    EM          1.765   0.250   48.413     1514.080   4908.950   0.773
7       Scoring(6)  same
8(7)    EM          1.070   0.029    1.666        7.472      3.872   0.898
9       Scoring     same
10(8)   EM          1.107   0.031    1.792        8.121      4.452   0.815

Converged estimates:

Trial   Method    β̂1      β̂2     τ̂1×10²   τ̂2×10⁴    τ̂3×10⁵   τ̂4×10⁶    L0(2)     L1(3)   Iterations  Time(4)
6       EM        1.775   0.248   49.806  1523.060  4821.630   0.774   1113.49   1113.62     131     0:18.8
7       Scoring   1.070   0.029    1.665     7.477     3.866   0.899   1113.49   1155.85      13     1:14.5
8       EM        1.070   0.029    1.675     7.491     3.895   0.824   1155.85   1155.85       3     0:01.8
9       Scoring   1.070   0.029    1.665     7.477     3.866   0.899   1155.85   1155.85       7     0:16.6
10      EM        1.071   0.029    1.679     7.455     3.953   0.819   1155.01   1155.51     146     0:22.6

(1) Trials 6 and 10 did not converge. In those trials the total processing time was limited to 1:00.0 and the number of iterations to 150. Here the convergence criterion uses ε = 10⁻⁵ instead of 10⁻³ as in the other tables.
(2) Value of the nonconstant part of the loglikelihood function evaluated at the starting values. Notice that both β(0) and τ(0) are needed to obtain this value.
(3) Same as above but evaluated at the converged estimates.
(4) The time reported (min:sec) includes CPU time for compiling the program as well as executing it. Such CPU times on multiprocessing systems may vary from one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of execution time.
(5) The starting values are the converged estimates obtained at trial 1 with the EM algorithm.
(6) The Method of Scoring does not use initial values for the βi. The first values for the βi(r) are the βi(1) computed from equation (4.6) using Σ(τ(0)).
(7) The starting values are the converged estimates obtained at trial 5 with the Scoring.
(8) The starting values are the converged estimates obtained at trial 5 with the EM algorithm.
Table 6.5: Results obtained by applying the two iterative procedures to find the MLE's of β and τ for the white male group (6 children).

[Starting values: in all five trials the initial β's were β1(0) = 1.365 and β2(0) = 0.036(5); each trial used a different set of starting values τ(0). Cells marked — are illegible in the source.]

Trial  Method      β̂1       β̂2     τ̂1×10²   τ̂2×10⁴    τ̂3×10⁵    τ̂4×10⁶    L0(1)    L1(2)   Iterations  Time(3)
1      EM         1.789    0.031    2.336    22.556  1170.610  6824.150    58.54   780.93      7      0:01.2
       Scoring(4) 1.347    0.039    2.328       —        —        —        58.54   803.70     10      0:56.7
2      EM         1.338    0.039    2.334       —        —        —       148.29   803.49     27      0:02.8
       Scoring    1.347    0.039    2.328       —        —        —       148.29   803.70     10      0:56.7
3      EM         1.342    0.039    2.330       —      3.248     1.047    719.51   803.50     27      0:02.8
       Scoring    1.240    0.044    5.157       —     -5.760     1.025    719.51     —         6      0:34.4
4      EM         1.341    0.039      —         —      3.250     1.047     -0.86   803.50     28      0:02.9
       Scoring    1.258    0.041      —         —      0.470     0.943     -0.86     —         5      0:32.5
5      EM         1.341    0.039    2.330     2.327    3.250     1.047    338.24   803.50     26      0:02.7
               (0.066)(6) (0.003)  (1.488)   (4.197)  (2.370)   (0.136)
       Scoring    1.347    0.039    2.328     2.327    3.222     1.134    338.24   803.70      9      0:57.7
                 (0.066)  (0.003)  (1.494)   (4.213)  (2.384)   (0.148)

(1) Value of the nonconstant part of the loglikelihood function evaluated at the starting values.
(2) Value of the nonconstant part of the loglikelihood function evaluated at the converged estimates.
(3) The time reported (min:sec) includes CPU time for compiling the program as well as executing it. Such CPU times on multiprocessing systems may vary from one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of execution time.
(4) The Method of Scoring does not use initial values for the βi. The first values for the βi(r) are the βi(1) computed from equation (4.6) using Σ(τ(0)).
(5) These initial values for β1 and β2 are the ordinary least squares estimates of β1 and β2.
(6) Standard error of the estimate (asymptotic).
The five trials presented in Table 6.4 are additional trials that were made in order to verify some initial conclusions drawn from Table 6.3.
From Tables 6.3-6.5 we can see that, for each group, the two procedures converge to values that are basically the same. For instance, trial 10 in Table 6.4 shows that the EM algorithm at trial 5 will converge to the same estimates as the Scoring if we allow a greater number of iterations. The EM algorithm had a poor performance at trial 1 in both groups and at trial 6, whereas the Method of Scoring converged to spurious estimates of $\tau$ ($\hat{\Sigma}$ is not positive definite) at trials 3 and 4 in Table 6.5.
From the results obtained in Tables 6.3-6.5, the MLE's of $\beta$ and $\tau$ for each of the two groups were found to be
$$\hat{\beta}_{BM} = (1.070,\ 0.029)^t, \qquad \hat{\tau}_{BM} = (1.666\times 10^{-2},\ 7.472\times 10^{-4},\ 3.872\times 10^{-5},\ 0.898\times 10^{-6})^t,$$
and
$$\hat{\beta}_{WM} = (1.347,\ 0.039)^t, \qquad \hat{\tau}_{WM} = (2.328\times 10^{-2},\ 2.327\times 10^{-4},\ 3.222\times 10^{-5},\ 1.134\times 10^{-6})^t.$$
Their asymptotic standard errors are given in Tables 6.3 and 6.5, trial 5.
One natural hypothesis for this type of problem is whether $\beta_{BM} = \beta_{WM} = \beta$ and $\tau_{BM} = \tau_{WM} = \tau$, i.e. whether the relationship between FVC and HT in the two groups can be represented by the same straight line and the variance-covariance parameters are equal for these groups. Another hypothesis of interest is whether $\beta_{BM} = \beta_{WM} = \beta$ without any assumptions on $\tau_{BM}$ and $\tau_{WM}$.
Following subsection 4.3.5, it can be shown that the LR statistic to test the first hypothesis is
$$\lambda = \prod_{k=1}^{17} \left( |\hat{\Sigma}_k| \,/\, |\hat{\Sigma}_{c,k}| \right)^{1/2},$$
where $\hat{\Sigma}_k = \Sigma_k(\hat{\tau}_{BM})$ for $k=1,\ldots,11$, $\hat{\Sigma}_k = \Sigma_k(\hat{\tau}_{WM})$ for $k=12,\ldots,17$, and $\hat{\Sigma}_{c,k} = \Sigma_k(\hat{\tau})$ with $\hat{\tau}$ the MLE of $\tau$ under the hypothesis that $\beta_{BM} = \beta_{WM} = \beta$ and $\tau_{BM} = \tau_{WM} = \tau$. Clearly, under this hypothesis we have all 17 "populations" with the same parameters $\beta$ and $\tau$. Thus, the two iterative procedures, as described in subsection 4.3.3, can again be used for finding the MLE's of $\beta$ and $\tau$ under this new assumption.
Table 6.6 shows the results of the use of these procedures for obtaining $\hat{\beta}$ and $\hat{\tau}$. Again, five sets of starting values were considered, and the same convergence criterion used in Tables 6.3 and 6.5 was used in this case. Basically, the same conclusions drawn before from Tables 6.3-6.5 hold here, and the MLE's were found to be
$$\hat{\beta} = (1.175,\ 0.032)^t, \qquad \hat{\tau} = (3.627\times 10^{-2},\ 12.001\times 10^{-4},\ 6.360\times 10^{-5},\ 0.987\times 10^{-6})^t.$$
Their standard errors are given in Table 6.6, trial 5. Applying the results obtained above to the LR statistic, we can see that $-2\log\lambda = 17.95$, which corresponds to a p-value of 0.006. Thus, the hypothesis of equality of the $\beta$'s and $\tau$'s is rejected at a level of 0.01.
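As a degrees-of-freedom check (our bookkeeping, not stated explicitly at this point in the text): equating the two $\beta$'s removes 2 free parameters and equating the two $\tau$'s removes 4, so the first statistic is referred to a $\chi^2_6$; the second hypothesis, tested below, constrains only the $\beta$'s and is referred to a $\chi^2_2$.

from scipy.stats import chi2

print(chi2.sf(17.95, df=6))   # ~0.006: H0 equates beta (2 df) and tau (4 df)
print(chi2.sf(12.37, df=2))   # ~0.002: H0 equates beta only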
Next, we test the second hypothesis discussed before. As noted earlier in section 4.2, there are many different situations where one can have common parameters among the populations (children in our example). In that section, we listed six of those situations. Clearly, under this second hypothesis the parameters of
Table 6.6: Results obtained by applying the two iterative procedures to find the MLE's of β and τ for the two groups together (17 children).

[Starting values: in all five trials the initial β's were β1(0) = 1.201 and β2(0) = 0.038(5); each trial used a different set of starting values τ(0), several entries of which are illegible in the source.]

Trial  Method      β̂1       β̂2     τ̂1×10²   τ̂2×10⁴    τ̂3×10⁵    τ̂4×10⁶    L0(1)     L1(2)   Iterations  Time(3)
1      EM         1.844    0.271   48.574  1603.300  5678.780   0.870    151.21   1889.51       —      0:01.9
       Scoring(4) 1.175    0.032    3.627    12.001     6.360   0.987    151.21   1950.55       8      1:25.6
2      EM         1.216    0.035    3.766    12.794     6.947   0.898    357.57   1949.29      21      0:05.6
       Scoring    1.175    0.032    3.628    12.009     6.360   0.987    357.57   1950.55       8      1:25.6
3      EM         1.214    0.035    3.751    12.711     6.898   0.898   1721.07   1949.29      20      0:05.4
       Scoring    1.175    0.032    3.627    12.001     6.360   0.987   1721.07   1950.55       7      1:14.6
4      EM         1.216    0.035    3.764    12.795     6.955   0.897     -2.08   1949.27      17      0:05.3
       Scoring    1.175    0.032    3.627    12.001     6.360   0.987     -2.08   1950.55       7      1:15.4
5      EM         1.213    0.035    3.744    12.673     6.875   0.898    815.00   1949.29      17      0:05.4
               (0.048)(6) (0.002)  (1.345)   (5.204)   (2.672) (0.076)
       Scoring    1.175    0.032    3.627    12.001     6.360   0.987    815.00   1950.55      11      1:58.1
                 (0.047)  (0.002)  (1.307)   (4.963)   (2.526) (0.083)

(1) Value of the nonconstant part of the loglikelihood function evaluated at the starting values.
(2) Value of the nonconstant part of the loglikelihood function evaluated at the converged estimates.
(3) The time reported (min:sec) includes CPU time for compiling the program as well as executing it. Such CPU times on multiprocessing systems may vary from one run to another depending upon the system load at the time of execution and should, therefore, be used only as a general indication of execution time.
(4) The Method of Scoring does not use initial values for the βi. The first values for the βi(r) are the βi(1) computed from equation (4.6) using Σ(τ(0)).
(5) These initial values for β1 and β2 are the ordinary least squares estimates of β1 and β2.
(6) Standard error of the estimate (asymptotic).
the 17 "populations" do not belong to the situations listed as (i)-(iv), and new likelihood equations need to be derived. But the ideas introduced in section 4.3 can easily be extended to this new situation. Following those ideas, it can be shown that, under the hypothesis that $\beta_{BM} = \beta_{WM} = \beta$ without any assumption on $\Sigma_{BM}$ and $\Sigma_{WM}$, the likelihood equation for $\beta$ is analogous to expression (4.6), and the ones for $\hat{\Sigma}_{BM}$ and $\hat{\Sigma}_{WM}$ are similar to expression (4.7), using K = 11 and 6 respectively.
The Method of Scoring as described in subsection 4.3.2 can be applied to solve these new equations. New equations for the EM algorithm could also be obtained by following the ideas introduced in the same subsection, but we do not think that this is necessary because the Method of Scoring has already shown itself to be very efficient for this particular example.
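For concreteness, a minimal sketch of one Scoring pass for this common-$\beta$, patterned-covariance setting is given below. It is our own illustrative rendering of the generic updates (a GLS step for $\beta$ followed by a scoring step for $\tau$), not a transcription of equations (4.6)-(4.7); all function and variable names are hypothetical.

import numpy as np

def gls_beta(Xs, ys, Sigmas):
    # GLS step for the common beta: solve (sum X'S^-1 X) beta = sum X'S^-1 y
    A = sum(X.T @ np.linalg.solve(S, X) for X, S in zip(Xs, Sigmas))
    b = sum(X.T @ np.linalg.solve(S, y) for X, y, S in zip(Xs, ys, Sigmas))
    return np.linalg.solve(A, b)

def scoring_tau(tau, Gs_list, resids):
    # One scoring step for tau in Sigma_k(tau) = sum_g tau_g G_kg.
    # Gs_list[k] holds the G_kg for child k; resids[k] = y_k - X_k beta.
    m = len(tau)
    info = np.zeros((m, m))    # expected information for tau
    score = np.zeros(m)
    for Gs, r in zip(Gs_list, resids):
        S = sum(t * G for t, G in zip(tau, Gs))
        Sinv_G = [np.linalg.solve(S, G) for G in Gs]
        w = np.linalg.solve(S, r)
        for g in range(m):
            score[g] += 0.5 * (w @ Gs[g] @ w - np.trace(Sinv_G[g]))
            for h in range(m):
                info[g, h] += 0.5 * np.trace(Sinv_G[g] @ Sinv_G[h])
    return tau + np.linalg.solve(info, score)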
Three sets of starting values were considered, and the program converged to the same values in all three instances. The sets of starting values were the MLE's of $\tau_{BM}$ and $\tau_{WM}$ obtained before and the two sets of starting values considered in Tables 6.3 and 6.5, trials 2 and 3. Notice that this procedure does not require $\beta^{(0)}$. The MLE's were found to be
$$\hat{\beta}^* = (1.177,\ 0.032)^t,$$
$$\hat{\tau}^*_{BM} = (2.799\times 10^{-2},\ 11.012\times 10^{-4},\ 5.214\times 10^{-5},\ 0.887\times 10^{-6})^t,$$
$$\hat{\tau}^*_{WM} = (5.217\times 10^{-2},\ 13.966\times 10^{-4},\ 7.915\times 10^{-5},\ 1.136\times 10^{-6})^t.$$
Again, following subsection 4.3.5, we can see that the likelihood ratio statistic to test the above hypothesis is
$$\lambda = \prod_{k=1}^{17} \left( |\hat{\Sigma}_k| \,/\, |\hat{\Sigma}^*_{c,k}| \right)^{1/2},$$
where $\hat{\Sigma}_k$ is as above and $\hat{\Sigma}^*_{c,k} = \Sigma_k(\hat{\tau}^*_{BM})$ for $k=1,\ldots,11$, and $\Sigma_k(\hat{\tau}^*_{WM})$ for $k=12,\ldots,17$. Applying the values $\hat{\tau}_{BM}$, $\hat{\tau}_{WM}$, $\hat{\tau}^*_{BM}$ and $\hat{\tau}^*_{WM}$ obtained above to the LR statistic, we get $-2\log\lambda = 12.37$, which corresponds to a p-value of 0.002. Thus, the second null hypothesis is also rejected at a level of 0.01, i.e. different straight lines should be considered for the different groups. The MLE's of the parameters for each of the two groups are $\hat{\beta}_{BM}$, $\hat{\tau}_{BM}$, $\hat{\beta}_{WM}$ and $\hat{\tau}_{WM}$. The standard errors for those estimates are given in Tables 6.3 and 6.5, trial 5.
We finish this chapter by making some comments on computational differences between the two procedures. As can be seen from previous chapters, the two procedures are based on completely different equations; hence it is expected that, even when they converge to the same values, the number of iterations and the CPU time required for the calculations are different. For instance, consider the last two columns of Tables 6.3-6.6. From those results we can see that the Method of Scoring required much more CPU time than the EM algorithm, despite the fact that the number of iterations required by the EM was greater than the number required by Scoring. The reason is that, while the EM algorithm mostly requires operations with 2×2 matrices, the Method of Scoring requires operations with matrices of order equal to the number of observations for each child (a number equal to 30 for some kids).
Clearly, depending upon the equipment available, it may occur that the EM algorithm can be applied but the Scoring cannot. It is interesting to notice that 165 iterations of the EM algorithm (trials 5 and 10 in Tables 6.3-6.4) needed 25.8 sec, while only 7 iterations of the Scoring (trial 5 in Table 6.3) needed 39.5 sec. On the other hand, this difference between the CPU times required by the two procedures is not expected to occur in the incomplete-data case. In that case, the equations used by the two procedures contain matrices of approximately the same orders.
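The cost asymmetry described above is easy to see directly; the toy comparison below (ours) times the dominant linear-algebra operation in each procedure's iteration, a 2×2 inversion for the EM algorithm versus a $p_k \times p_k$ inversion for the Scoring with $p_k = 30$.

import time
import numpy as np

rng = np.random.default_rng(2)
for name, p in [("2x2 (EM-style)", 2), ("30x30 (Scoring-style)", 30)]:
    M = rng.normal(size=(p, p))
    M = M @ M.T + np.eye(p)            # a positive definite matrix of order p
    t0 = time.perf_counter()
    for _ in range(10000):
        np.linalg.inv(M)               # dominant per-iteration operation
    print(name, time.perf_counter() - t0)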
As noted before in Chapter I, the Method of Scoring has quadratic convergence, while the EM has linear convergence. Thus, the EM algorithm is expected to require a greater number of iterations than the other procedure.

Finally, we can see from Tables 6.3-6.6 that the EM algorithm depends more on the starting values than the Method of Scoring. In most of the situations, Scoring converged to exactly the same values, whereas this did not happen with the EM algorithm. For instance, a simple change in the starting value for $\tau$ produced completely different results (see trials 1 and 2). Also, it is important to notice that the Method of Scoring does not require a starting value for $\beta$.

Clearly, we cannot say that one procedure is better than the other based only on these two examples. Specific studies need to be done, in particular for situations like the one considered in example 2. Most of the studies already done compare the two procedures only in the incomplete-data case.
CHAPTER VII

SUMMARY AND RECOMMENDATIONS FOR FURTHER RESEARCH

7.0 Summary

In this study we investigate maximum likelihood (ML) estimation and likelihood ratio (LR) tests for the multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$ following the linear structures given by (1.1) and (1.2).

In Chapter I, we present a review of the literature and the results obtained by Anderson (1969, 1970, 1973) and Szatrowski (1979, 1980, 1981, 1983), which are the basis for this work.
Chapter II discusses the estimation problem when the parameters $\beta$ and $\tau$ are subject to three types of constraints: (i) $\Delta^t\beta = \delta_0$, (ii) $S^t\tau = \gamma_0$, and (iii) $\Delta^t\beta = \delta_0$ and $S^t\tau = \gamma_0$. Likelihood equations are obtained, and necessary and sufficient conditions for S-explicit representation (see definition 1.1) of the ML estimates are given. It is shown that it is not possible to have this explicit representation when $\Sigma$ is totally reducible and $\tau$ is constrained, unless $\gamma_0 = 0$. The strategy suggested by Andrade and Helms (1983) is extended here to allow the use of the EM algorithm for solving the likelihood equations when they do not have an explicit solution and only $\beta$ is constrained. Another iterative procedure, the Method of Scoring, is also suggested for obtaining the ML estimates in (i)-(iii). It is also shown that one iteration of this second procedure gives asymptotically efficient estimates of the parameters when we have a consistent estimate of $\Sigma$. The asymptotic distributions of the ML estimators are also given.
In Chapter III, LR tests to test hypotheses related to the linear structures and constraints given above are developed. It is shown that the expressions for the LR statistics have an additional term, besides the usual quotient of determinants of the ML estimates of the covariance matrix $\Sigma$, when $\tau$ is constrained. This additional term depends on $\gamma_0$ and it vanishes for $\gamma_0 = 0$. Null and nonnull distributions of the LR statistics are discussed. The nonnull distributions are obtained under sequences of "local" alternatives and are shown to follow noncentral $\chi^2$ distributions.
Chapter IV discusses the same points considered in Chapters II and III, but for situations where we have more than one population. The linear structures given by (1.1) and (1.2) are extended for this new approach and are given by (4.1) and (4.2). The situations where the populations have equal $\beta$'s and also equal $\tau$'s are studied in much more detail. ML estimates, LR tests and asymptotic results are presented.
The incomplete-data case is discussed in Chapter V. It is shown that different patterns of missing values can be considered as different populations with the same $\beta$'s and also the same $\tau$'s. Hence the results obtained in section 4.3 are easily extended to this incomplete-data case. New equations for the EM algorithm are considered because, for this particular iterative procedure, the strategy suggested in that section is no longer applicable to this case.

Finally, in Chapter VI two numerical examples are introduced to illustrate part of the results presented throughout this work. Some comparisons between the two iterative procedures suggested to solve the likelihood equations are made.
7.1 Recommendations for further research

Five points that might be considered for further research are discussed in the following paragraphs.

1. Study of practical situations where the covariance matrix is given by the linear structure (1.2) plus some known symmetric matrix, i.e. $\Sigma = \Sigma_0 + \sum_{g=1}^{m} \tau_g G_g$. The reason is that this structure can be written as $\Sigma = \sum_{g=0}^{m} \tau_g G_g$, subject to $S^t\tau = \gamma_0$ with $S^t = [1\ 0\ \cdots\ 0]$, $G_0 = \Sigma_0$ and $\tau_0 = 1$, which has already been covered in Chapters II and III. In other words, determine if there are practical situations where $\gamma_0 \neq 0$.
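A tiny sketch of this reparameterization (our illustration; the matrices and values are hypothetical placeholders):

import numpy as np

# Sigma = Sigma0 + tau1*G1 + tau2*G2 rewritten as sum_{g=0}^{2} tau_g G_g
# with G0 = Sigma0 and the constraint S^t tau = gamma0 fixing tau0 = 1.
Sigma0 = np.eye(3)
G1 = np.ones((3, 3))
G2 = np.diag([1.0, 2.0, 3.0])

Gs = [Sigma0, G1, G2]              # G0, G1, G2
S = np.array([1.0, 0.0, 0.0])      # S^t = [1 0 0]
gamma0 = 1.0                       # S^t tau = gamma0, i.e. tau0 = 1

tau = np.array([1.0, 0.4, 0.2])    # satisfies the constraint
assert S @ tau == gamma0
Sigma = sum(t * G for t, G in zip(tau, Gs))
assert np.allclose(Sigma, Sigma0 + 0.4*G1 + 0.2*G2)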
2. Use of the necessary and sufficient conditions presented in Chapter II for constructing experimental designs that provide S-explicit representation of the MLE's of the parameters. This is particularly important in situations where incomplete data are expected to occur and the EM algorithm is used for finding MLE's. Clearly, the class of positive definite reducible matrices has an important role in this study.
3. Extend the results obtained in section 4.3 to different structures of common parameters among different populations, as for instance when the populations are grouped in several different groups (treatments), with the same $\beta$ and $\tau$ within each group, and we want to make comparisons among the $\beta$'s and $\tau$'s.
4. Extend the strategy suggested by Andrade and Helms (1983) for applying the EM algorithm to complete data to situations where $\tau$ is constrained. As this strategy involves partitioning $\Sigma$ (and thus $\tau$), different partitions might imply different results.

5. Evaluate the computational performance of the EM algorithm in complete-data situations as suggested in Chapters II and IV. As noted in the second example discussed in Chapter VI, in many of those situations this algorithm is much simpler to apply than the Method of Scoring. Actually, in most situations one does not expect to have an explicit solution of the likelihood equations; we think that a broad study of iterative procedures to solve those equations needs to be done.
APPENDIX

Lemma A.0
(i) $A^{-1}$ is positive definite (p.d.) provided $A$ is p.d.
(ii) $\sum_{i=1}^{n} A_i$ is p.d. provided the $A_i$, $i=1,\ldots,n$, are p.d.
(iii) $A^t B^{-1} A$ is p.d. provided $A$ is of full column rank and $B$ is p.d.
(iv) $[(\operatorname{tr} B^{-1}C_s B^{-1}C_{s'})]$, $s,s' = 1,\ldots,m$, is p.d. provided $B$ is p.d. and given by $B = \sum_{s=1}^{m} b_s C_s$, where the $C$'s are symmetric matrices and the $b$'s are scalars.

Proof (an expanded version of Szatrowski (1983)'s proof). Only the proof for (iv) is presented; the first three results are well known. Let $z = (z_1,\ldots,z_m)^t$ with real elements. Then
$$z^t[(\operatorname{tr} B^{-1}C_s B^{-1}C_{s'})]z = \sum_{s,s'=1}^{m} z_s(\operatorname{tr} B^{-1}C_s B^{-1}C_{s'})z_{s'} = \operatorname{tr} B^{-1}C(z)B^{-1}C(z) = \operatorname{tr} B^{-\frac{1}{2}}C(z)B^{-\frac{1}{2}}B^{-\frac{1}{2}}C(z)B^{-\frac{1}{2}} = \operatorname{tr} V^2,$$
where $C(z) = \sum_{k=1}^{m} z_k C_k$ and $V = B^{-\frac{1}{2}}C(z)B^{-\frac{1}{2}} = (v_{ij})$, with all $v_{ij} = 0 \iff C(z) = 0 \iff z = 0$. Now, as $V$ is symmetric, the diagonal elements of $V^2$ are $\sum_j v_{ij}^2$, and therefore $\operatorname{tr} V^2 = \sum_{i,j} v_{ij}^2 > 0$ for $z \neq 0$. □
Lemma A.1 (Szatrowski (1981)). If $\Sigma$ is a matrix given by (1.2), then … . □
Lemma A.2 (Szatrowski (1981)). If $Y$ is a matrix function of a matrix $X = (x_{ij})$, then
$$\partial\operatorname{tr}(AYB)/\partial x_{ij} = \operatorname{tr}\,A(\partial Y/\partial x_{ij})B \quad\text{and}\quad \partial(Y^{-1})/\partial x_{ij} = -Y^{-1}(\partial Y/\partial x_{ij})Y^{-1}.\ \square$$

Lemma A.3 (Arnold (1981, p. 450)). Let
$$\Delta = \begin{pmatrix} \Delta_{11} & \Delta_{12} \\ \Delta_{21} & \Delta_{22} \end{pmatrix}$$
be such that $\Delta$ and $\Delta_{11}$ are both nonsingular. Then
$$\Delta^{11} = \Delta_{11}^{-1} + \Delta_{11}^{-1}\Delta_{12}\Psi^{-1}\Delta_{21}\Delta_{11}^{-1}, \quad \Delta^{12} = -\Delta_{11}^{-1}\Delta_{12}\Psi^{-1}, \quad \Delta^{21} = -\Psi^{-1}\Delta_{21}\Delta_{11}^{-1}, \quad \Delta^{22} = \Psi^{-1},$$
where $\Psi = \Delta_{22} - \Delta_{21}\Delta_{11}^{-1}\Delta_{12}$ and $\Delta^{ij}$ denotes the $(i,j)$ block of $\Delta^{-1}$. A similar result can be obtained when $\Delta_{22}$, instead of $\Delta_{11}$, is nonsingular. In this case, $\Delta^{11} = (\Delta_{11} - \Delta_{12}\Delta_{22}^{-1}\Delta_{21})^{-1}$. □
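Lemma A.3 is easy to check numerically; the snippet below (our illustration) compares the block formulas against a directly inverted partitioned matrix.

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
D = M @ M.T + 5*np.eye(5)          # a positive definite (hence nonsingular) Delta
D11, D12, D21, D22 = D[:3, :3], D[:3, 3:], D[3:, :3], D[3:, 3:]

Psi = D22 - D21 @ np.linalg.inv(D11) @ D12
Dinv = np.linalg.inv(D)

# Upper-left block of the inverse via Lemma A.3:
B11 = np.linalg.inv(D11) + np.linalg.inv(D11) @ D12 @ np.linalg.inv(Psi) @ D21 @ np.linalg.inv(D11)
assert np.allclose(Dinv[:3, :3], B11)
# ... and via the Delta22-nonsingular form:
assert np.allclose(Dinv[:3, :3], np.linalg.inv(D11 - D12 @ np.linalg.inv(D22) @ D21))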
Lemma A.4 (Andrade). Let $A^t$, $B^t$, $C$ and $D$ be matrices such that $B^t$ is of dimension $a_2 \times a_1$ and rank $a_2 \leq a_1$, and $[A : B]^{-t} = [C : D]$ with … . Then … , where $E$ is any $a_2 \times a_2$ matrix of full rank.

Proof. Part (i) follows directly from the definition of the inverse of a matrix. For (ii) we can see that … , which implies that … . Then, using Lemma A.3, we can see that
$$A^t\Sigma^{-1}A = [\,\cdots\,]^{-1} \implies A(A^t\Sigma^{-1}A)^{-1}A^t = \cdots,$$
and the result follows by noticing that $AC^t = I - BD^t$ (see (i)). Part (iii) follows directly from (ii). □

Lemma A.5 (Szatrowski (1980)). Let $A$ and $B$ be symmetric, positive definite $a \times a$ matrices and let $C$ be an $a \times b$ matrix of rank $b \leq a$. A necessary and sufficient condition for
$$(C^t A^{-1} C)^{-1} C^t A^{-1} = (C^t B^{-1} C)^{-1} C^t B^{-1}$$
is that the columns of $C$ are spanned by exactly $b$ eigenvectors of $B^{-1}A$. When $B = I_a$, we have that $(C^t A^{-1}C)^{-1}C^t A^{-1} = (C^t C)^{-1}C^t$ if and only if the $b$ columns of $C$ are spanned by exactly $b$ eigenvectors of $A$. □

Lemma A.6 (Szatrowski (1979)). Let $A$ and $B$ be symmetric matrices and let $\Sigma = \Sigma(\tau)$ and $\langle\,\cdot\,\rangle$ be given by definition 2.2. Then $\langle A \rangle^t \cdots \langle B \rangle = \cdots$ . □

Lemma A.7 (Szatrowski (1981)). Let $\Sigma$, $\Delta$ and $\Gamma$ be positive definite matrices and let $A$ and $B$ be symmetric matrices, all of the same dimension. Then … . □
Lemma A.8 (Helms). Let $A$ be a $p \times p$ symmetric matrix with eigenvalues $\lambda_i$ satisfying $|\lambda_i| < 1$, $i = 1,\ldots,p$. Then
$$\log|I_p + A| = \sum_{k=1}^{\infty} (-1)^{k+1} k^{-1} \operatorname{tr}(A^k),$$
and the series is absolutely convergent.

Proof. Let the spectral decomposition of $A$ be $A = \Gamma\Lambda\Gamma^t$, where $\Lambda = \operatorname{Diag}(\lambda_1,\ldots,\lambda_p)$ and $\Gamma\Gamma^t = \Gamma^t\Gamma = I_p$. Then $|I_p + A| = |\Gamma|\,|I_p + \Lambda|\,|\Gamma^t| = |I_p + \Lambda| = \prod_{i=1}^{p}(1+\lambda_i)$, so that $\log|I_p + A| = \sum_{i=1}^{p}\log(1+\lambda_i)$. Note that $|\lambda_i| < 1$ implies $1+\lambda_i > 0$, so that $\log|I_p + A|$ is well defined and
$$\log(1+\lambda_i) = \sum_{k=1}^{\infty} (-1)^{k+1}\lambda_i^k / k$$
is convergent. Then,
$$\log|I_p + A| = \sum_{i=1}^{p}\sum_{k=1}^{\infty}(-1)^{k+1}\lambda_i^k/k = \sum_{k=1}^{\infty}(-1)^{k+1}k^{-1}\sum_{i=1}^{p}\lambda_i^k = \sum_{k=1}^{\infty}(-1)^{k+1}k^{-1}\operatorname{tr}(A^k),$$
using the fact that $\operatorname{tr}(A^k) = \sum_{i=1}^{p}\lambda_i^k$. Since $|\sum_{i=1}^{p}\lambda_i^k| \leq p\,[\max_i|\lambda_i|]^k < p\cdot 1$, this power series is absolutely convergent and the interchange of summations is justified. □
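Lemma A.8 lends itself to a quick numerical check; the sketch below (ours) compares $\log|I_p+A|$ with a truncated series.

import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B + B.T                                        # symmetric
A *= 0.9 / np.max(np.abs(np.linalg.eigvalsh(A)))   # scale so all |lambda_i| < 1

direct = np.log(np.linalg.det(np.eye(4) + A))
series = sum((-1)**(k+1) / k * np.trace(np.linalg.matrix_power(A, k))
             for k in range(1, 200))
assert np.isclose(direct, series, atol=1e-6)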
Lemma A.9 (Helms). Let $A$ and $C$ be both $p \times p$ symmetric matrices, let $A$ be positive definite, and let the eigenvalues of $A^{-1}C$ be $\lambda_1, \ldots, \lambda_p$ (these eigenvalues are all real). If all $|\lambda_i| < 1$, then
$$\log|I_p + A^{-1}C| = \sum_{k=1}^{\infty}(-1)^{k+1}k^{-1}\operatorname{tr}[(A^{-1}C)^k],$$
and the series is absolutely convergent.

Proof. Let $A = LL^t$ be a square root decomposition of $A$, note that $A^{-1} = L^{-t}L^{-1}$, and let $B = L^{-1}CL^{-t}$. Then
$$|I_p + A^{-1}C| = |L^t|^{-1}\,|I_p + L^{-1}CL^{-t}|\,|L^t| = |I_p + B|.$$
Now, $B = B^t$ and the eigenvalues of $B$ are identical to the eigenvalues of $A^{-1}C$. Application of the preceding lemma proves the result. □
BIBLIOGRAPHY

Aitkin, M., Anderson, D. and Hinde, J. (1981). Statistical modelling of data on teaching styles. J.R.S.S., A144, 419-61.

Anderson, T.W. (1957). Maximum likelihood estimates for a multivariate normal distribution when some observations are missing. J.A.S.A., 52, 200-3.

Anderson, T.W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.

Anderson, T.W. (1969). Statistical inference for covariance matrices with linear structure. Proceedings of the Second International Symposium on Multivariate Analysis (P.R. Krishnaiah, ed.), 55-66. Academic Press, New York.

Anderson, T.W. (1970). Estimation of covariance matrices which are linear combinations of given matrices. Essays in Probability and Statistics, 1-24. University of North Carolina Press, Chapel Hill, NC.

Anderson, T.W. (1971). The Statistical Analysis of Time Series. Wiley, New York.

Anderson, T.W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure. Ann. Statist., 1, 135-41.

Andrade, D.F. and Helms, R.W. (1983). Maximum likelihood estimates in the multivariate normal with patterned mean and covariance via the EM algorithm. University of North Carolina at Chapel Hill; Institute of Statistics Mimeo Series #1445.

Arnold, S.F. (1981). The Theory of Linear Models and Multivariate Analysis. Wiley, New York.

Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, Massachusetts.

Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control, revised ed. Holden-Day, San Francisco.

Chen, Chan-Fu (1981). The EM approach to the multiple indicators and multiple causes model via the estimation of the latent variable. J.A.S.A., 76, 704-8.

Cole, James W.L. (1969). Multivariate analysis of variance. University of North Carolina at Chapel Hill; Institute of Statistics Mimeo Series #640.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman and Hall, London.

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J.R.S.S., B39, 1-38.

Dempster, A.P., Rubin, D.B., and Tsutakawa, R.K. (1981). Estimation in covariance components models. J.A.S.A., 76, 341-53.

Hájek, J. and Šidák, Z. (1967). Theory of Rank Tests. Academic Press, New York.

Hartley, H.O. and Hocking, R.R. (1971). The analysis of incomplete data. Biometrics, 27, 783-823.

Harville, D.A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. J.A.S.A., 72, 320-40.

Hocking, R.R. and Marx, D.L. (1979). Estimation with incomplete data: An improved computational method and the analysis of nested data. Commun. Statist., A8, 1155-81.

Kabe, D.G. (1975). Some results for the GMANOVA model. Commun. Statist., 4, 813-20.

Kleinbaum, D.G. (1970). Estimation and hypothesis testing for generalized multivariate linear models. Ph.D. thesis, University of North Carolina, Chapel Hill, NC.

Laird, N.M. (1982). Computation of variance components using the EM algorithm. J. Statist. Comput. Simul., 14, 295-303.

Laird, N.M. and Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics, 38, 963-74.

Leeper, J.D. and Woolson, R.F. (1982). Testing hypotheses for the growth curve model when the data are incomplete. J. Statist. Comput. Simul., 15, 97-107.

Morgan, B.J.T. and Titterington, D.M. (1977). A comparison of iterative methods for obtaining maximum likelihood estimates in contingency tables with a missing diagonal. Biometrika, 64, 265-9.

Morrison, D.F. (1976). Multivariate Statistical Analysis, 2nd ed. McGraw-Hill, New York.

Olkin, I. (1972). Testing and estimation for structures which are circularly symmetric in blocks. Proceedings of the Symposium on Multivariate Analysis, 183-95. Dalhousie, Nova Scotia.

Olkin, I. and Press, S.J. (1969). Testing and estimation for a circular stationary model. Ann. Math. Statist., 40, 1358-73.

Orchard, T. and Woodbury, M.A. (1972). A missing information principle: theory and applications. Proceedings of the 6th Berkeley Symposium on Math. Statist. and Prob., I, 697-715.

Rao, C.R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New York.

Rogers, G.S. and Young, D.L. (1975). Some likelihood ratio tests when a normal covariance matrix has certain reducible linear structures. Commun. Statist., 4, 537-54.

Rogers, G.S. and Young, D.L. (1977). Explicit maximum likelihood estimators for certain patterned covariance matrices. Commun. Statist., A6, 121-33.

Rogers, G.S. and Young, D.L. (1978). On testing a multivariate linear hypothesis when the covariance matrix and its inverse have the same pattern. J.A.S.A., 73, 203-7.

Rubin, D.B. (1974). Characterizing the estimation of parameters in incomplete-data problems. J.A.S.A., 69, 467-74.

Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581-92.

Rubin, D.B. and Szatrowski, T.H. (1982). Finding maximum likelihood estimates of patterned covariance matrices by the EM algorithm. Biometrika, 69, 657-60.

Searle, S.R. (1971). Linear Models. Wiley, New York.

Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Szatrowski, T.H. (1978). Explicit solutions, one iteration convergence and averaging in the multivariate normal estimation problem for patterned means and covariances. Ann. Inst. Statist. Math., A30, 81-8.

Szatrowski, T.H. (1979). Asymptotic nonnull distributions in the multivariate normal patterned mean and covariance matrix testing problem. Ann. Statist., 7, 823-37.

Szatrowski, T.H. (1980). Necessary and sufficient conditions for explicit solutions in the multivariate normal estimation problem for patterned means and covariances. Ann. Statist., 8, 802-10.

Szatrowski, T.H. (1981). Missing data in the multivariate normal patterned mean and covariance matrix testing and estimation problem. Program Statistics Research, Technical Report No. 81-17. Princeton, NJ.

Szatrowski, T.H. (1983). Missing data in the one-population multivariate normal patterned mean and covariance matrix testing and estimation problem. Ann. Statist., 11, 947-58.

Szatrowski, T.H. and Miller, J.J. (1980). Explicit maximum likelihood estimates from balanced data in the mixed model of the analysis of variance. Ann. Statist., 8, 811-9.

Takeuchi, K., Yanai, H., and Mukherjee, B.N. (1982). The Foundations of Multivariate Analysis. Wiley Eastern, New Delhi.

Trawinski, I.M. and Bargmann, R.E. (1964). Maximum likelihood estimation with incomplete multivariate data. Ann. Math. Statist., 35, 647-57.

Votaw, D.F. (1948). Testing compound symmetry in a normal multivariate distribution. Ann. Math. Statist., 19, 447-73.

Wilks, S.S. (1946). Sample criteria for testing equality of means, equality of variances and equality of covariances in a normal multivariate distribution. Ann. Math. Statist., 17, 257-81.

Woolson, R.F., Leeper, J.D., and Clarke, W.R. (1978). Analysis of incomplete data from longitudinal and mixed longitudinal studies. J.R.S.S., A141, 242-52.

Woolson, R.F. and Leeper, J.D. (1980). Growth curve analysis of complete and incomplete longitudinal data. Commun. Statist., A9, 1491-513.

Wu, C.F.J. (1983). On the convergence properties of the EM algorithm. Ann. Statist., 11, 95-103.

Young, D.L. (1976). Inference concerning the mean vector when the covariance matrix is totally reducible. J.A.S.A., 71, 696-9.