XI. Estimation of linear model parameters

XI.A Linear models for designed experiments
XI.B Least squares estimation of the expectation parameters in simple linear models
XI.C Generalized least squares (GLS) estimation of the expectation parameters in general linear models
XI.D Maximum likelihood estimation of the expectation parameters
XI.E Estimating the variance
XI.A Linear models for designed experiments

• In the analysis of experiments, the models that we have considered are all of the form
      E[Y] = ψ = Xθ   and   var[Y] = V = Σ_i σ_i² S_i
• where
  – ψ is an n×1 vector of expected values for the observations,
  – Y is an n×1 vector of random variables for the observations,
  – X is an n×q matrix with n ≥ q,
  – θ is a q×1 vector of unknown parameters,
  – V is an n×n matrix,
  – σ_i² is the variance component for the ith term in the variance model and
  – S_i is the summation matrix for the ith term in the variance model.
• Note that generally the X matrices contain indicator variables, each indicating the observations that received a particular level of a (generalized) factor.
Summation matrices

• S_i is the matrix of 1s and 0s that forms the sums for the generalized factor corresponding to the ith term.
• Now, M_i = (f_i/n) S_i so that S_i = (n/f_i) M_i
• where
  – M_i is the mean operator for the ith term in the variance model,
  – n is the number of observational units and
  – f_i is the number of levels of the generalized factor for this term.
• For equally-replicated generalized factors, n/f_i = g_i where g_i is the no. of repeats of each of its levels.
• Consequently, the Ss could be replaced with Ms (a small numerical sketch follows).
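The sketch below is a minimal illustration, not code from these notes, of the relation between S_i and M_i for the Blocks term of an RCBD with b = 3 and t = 4; the response values are arbitrary.

```python
import numpy as np

# Summation and mean matrices for the Blocks term of an RCBD with b = 3 blocks
# and t = 4 units per block, observations in standard order (block by block).
b, t = 3, 4
n, f_B, g_B = b * t, b, t                 # n units, f_B levels, g_B repeats per level

S_B = np.kron(np.eye(b), np.ones((t, t)))  # summation matrix: sums over units in a block
M_B = S_B * f_B / n                        # mean operator: M_i = (f_i/n) S_i

y = np.arange(1.0, n + 1)                  # any response vector
print(S_B @ y)                             # each element is its block's total
print(M_B @ y)                             # each element is its block's mean
assert np.allclose(M_B, S_B / g_B)         # since n/f_B = g_B
```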
For example, variation model with Blocks random for an RCBD with b = 3 and t = 4

[12×12 variance matrix, rows and columns indexed by Block (I, II, III) and Unit (1–4) within Block: each diagonal element is σ²_BU + σ²_B, the off-diagonal elements for two units in the same block are σ²_B, and the elements for units in different blocks are 0.]

Thus the model is of the form:
      E[Y] = X_T τ
      V = σ²_BU I_n + σ²_B (I_b ⊗ J_t)
Now M_BU = I_n and M_B = (1/t)(I_b ⊗ J_t); note that n = bt so that n/b = t,
so V = σ²_BU M_BU + t σ²_B M_B = σ²_BU S_BU + σ²_B S_B.
Classes of linear models

• Firstly, linear models split into those whose
  – expectation model is of full rank
  – expectation model is of less than full rank.
• The rank of an expectation model is given by the following definitions — the 2nd is equivalent to that given in chapter I.
• Definition XI.1: The rank of an expectation model is equal to the rank of its X matrix.
• Definition XI.2: The rank of an n × q matrix X, with n ≥ q, is the number of linearly independent columns in X.
• Such a matrix is of full rank when its rank equals q and is of less than full rank when its rank is less than q.
• It will be of less than full rank if some linear combination of columns equals a linear combination of other columns.
Simple vs general models

• Secondly, linear models split into those for which the variation model is
  – simple, in that it is of the form σ²I_n
  – of a more general form that involves more than one term.
• For general linear models, the S matrices in the variation model can be written as direct products of I and J matrices.
• Definition XI.3: The direct-product operator is denoted ⊗ and, if A_r and B_c are square matrices of order r and c, respectively, then
      A_r ⊗ B_c = [ a_11 B … a_1r B ]
                  [   ⋮         ⋮   ]
                  [ a_r1 B … a_rr B ]
Expressions for S matrices in general models

• When all the variation terms correspond to only unrandomized, and not randomized, generalized factors, use the following rule to derive these expressions.
• Rule XI.1: The S_i for the ith generalized factor from the unrandomized structure that involves s factors is the direct product of s matrices, provided the s factors are arranged in standard order.
• Taking the s factors in the sequence specified by the standard order,
  – include an I matrix in the direct product for each factor that is in the ith generalized factor and
  – a J matrix for each factor that is not.
  – The order of a matrix in the direct product is equal to the number of levels of the corresponding factor.
  (A sketch applying this rule appears below.)
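The following is a minimal sketch, not from the notes, of rule XI.1 applied to the penicillin RCBD (5 Blends/4 Flasks) with Blends random; the variance-component values are illustrative assumptions only.

```python
import numpy as np

b, t = 5, 4
I = np.eye
J = lambda m: np.ones((m, m))

sigma2_BF, sigma2_B = 2.0, 1.5            # assumed values, not from the notes

S_BF = np.kron(I(b), I(t))                # both Blends and Flasks occur in BlendsFlasks
S_B  = np.kron(I(b), J(t))                # Blends is in the term, Flasks is not
V = sigma2_BF * S_BF + sigma2_B * S_B

# V has sigma2_BF + sigma2_B on the diagonal and sigma2_B within 4x4 blend blocks.
print(V[:5, :5])
```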
Example IV.1 Penicillin yield (continued)

• This experiment was an RCBD and its experimental structure is

  Structure      Formula
  unrandomized   5 Blends/4 Flasks
  randomized     4 Treatments

• Two possible models depending on whether Blends is random or fixed.
• For Blends fixed,
  – and taking the observations to be ordered in standard order for Blends then Flasks,
  – the only random term is BlendsFlasks and the maximal expectation and variation models are
      E[Y] = X_B β + X_T τ
      V = σ²_BF S_BF = σ²_BF (I_5 ⊗ I_4)
Example IV.1 Penicillin yield (continued)

– Suppose the observations are in standard order for Blend then Treatment, with Flask renumbered within Blend to correspond to Treatment as in the prerandomization layout.
– The analysis for the prerandomization and randomized layouts must be the same, as the prerandomization layout is one of the possible randomized arrangements.
– Then the vectors and matrices in the maximal expectation model E[Y] = X_B β + X_T τ are:

  [X_B is the 20×5 matrix whose columns are the indicator variables for Blends 1–5, with β = (β1, …, β5)'; X_T is the 20×4 matrix whose columns are the indicator variables for Treatments 1–4, with τ = (τ1, …, τ4)'.]

– Can write the Xs as direct products:
      X_B = I_5 ⊗ 1_4
      X_T = 1_5 ⊗ I_4
Example IV.1 Penicillin yield (continued)

[X_B (20×5), β, X_T (20×4) and τ from the previous slide written out in full.]

• Clearly, the sum of the indicator variables (columns) in X_B and the sum of those in X_T both equal 1_20.
• So the expectation model involving the X matrix [X_B X_T] is of less than full rank — its rank is b + t − 1.
• The linear model is a simple linear model because the variance matrix is of the form σ²I_n.
• That is, the observations are assumed to have the same variance and to be uncorrelated.
Example IV.1 Penicillin yield (continued)

• On the other hand, for Blends random,
  – the random terms are Blends and BlendsFlasks and
  – the maximal expectation and variation models are
      E[Y] = X_T τ
      V = σ²_BF S_BF + σ²_B S_B = σ²_BF (I_5 ⊗ I_4) + σ²_B (I_5 ⊗ J_4)
• In this case the expectation model is of full rank as X_T consists of t linearly independent columns.
• However, it is a general, not a simple, linear model as the variance matrix is not of the form σ²I_n.
Example IV.1 Penicillin yield (continued)

• The form is illustrated for b = 3, t = 4 below, the extension to b = 5 being obvious.

[12×12 variance matrix, rows and columns indexed by Blend (I, II, III) and Flask (1–4) within Blend: each diagonal element is σ²_BF + σ²_B, the off-diagonal elements for two flasks in the same blend are σ²_B, and the elements for flasks in different blends are 0.]

• The model allows for covariance, and hence correlation, between flasks from the same blend.
• Clearly, σ²_BF occurs down the diagonal and σ²_B occurs in diagonal 4×4 blocks.
Example IV.1 Penicillin yield (continued)

• The direct-product expressions for both Vs conform to rule XI.1.
• Recall, the units are indexed with Blends then Flasks.
• Hence, the direct product for σ²_BF is I_5 ⊗ I_4 and it involves two I matrices because both unrandomized factors, Blends and Flasks, occur in BlendsFlasks.
• The direct product for σ²_B is I_5 ⊗ J_4 and it involves an I and a J because the factor Blends is in this term but Flasks is not.
Example IX.1 Production rate experiment (revisited)

• Layout:

[Layout table: 4 Factories, each with 3 Areas (rows I–III) of 3 Parts; the numerals inside the boxes represent the methods of organisation and the letters A–C represent the source of raw material assigned to each part.]

• The experimental structure for this experiment was:

  Structure      Formula
  unrandomized   4 Factories/3 Areas/3 Parts
  randomized     3 Methods*3 Sources
Example IX.1 Production rate experiment (revisited)

• Suppose all the unrandomized factors are random.
• So the maximal expectation and variation models are, symbolically,
  – E[Y] = MethodsSources
  – var[Y] = Factories + FactoriesAreas + FactoriesAreasParts.
• That is,
      E[Y] = X_MS (αβ)
      var[Y] = V = σ²_FAP S_FAP + σ²_FA S_FA + σ²_F S_F
             = σ²_FAP M_FAP + 3 σ²_FA M_FA + 9 σ²_F M_F
  (with n = 36, the multipliers n/f_FA = 36/12 = 3 and n/f_F = 36/4 = 9).
Example IX.1 Production rate experiment (revisited)

• Now if the observations are arranged in standard order with respect to the factors Factories, Areas and then Parts, we can use rule XI.1 to obtain the following direct-product expression for V:
      V = σ²_FAP S_FAP + σ²_FA S_FA + σ²_F S_F
        = σ²_FAP (I_4 ⊗ I_3 ⊗ I_3) + σ²_FA (I_4 ⊗ I_3 ⊗ J_3) + σ²_F (I_4 ⊗ J_3 ⊗ J_3).
• In this matrix
  – σ²_FAP will occur down the diagonal,
  – σ²_FA will occur in 3×3 blocks down the diagonal and
  – σ²_F will occur in 9×9 blocks down the diagonal.
• So this model allows for
  – covariance between parts from the same area and
  – a different covariance between areas from the same factory.
Example IX.1 Production rate experiment (revisited)

• Actually, in ch. IX, Factories was assumed fixed and so
      E[Y] = X_F δ + X_MS (αβ)
      V = σ²_FAP S_FAP + σ²_FA S_FA
        = σ²_FAP (I_4 ⊗ I_3 ⊗ I_3) + σ²_FA (I_4 ⊗ I_3 ⊗ J_3).
• So the rules still apply as the terms in V only involve unrandomized factors.
• This model only allows for
  – covariance between parts from the same area.
XI.B Least squares estimation of the
expectation parameters in simple
linear models
• Want to estimate the parameters in the
expectation model by establishing their
estimators.
• There are several different methods for
doing this.
• A common method is the method of least
squares, as in the following definition
originally given in chapter I.
a) Ordinary least squares estimators for full-rank expectation models

• Definition XI.4: Let Y = Xθ + e where
  – X is an n×q matrix of rank m ≤ q with n ≥ q,
  – θ is a q×1 vector of unknown parameters,
  – e is an n×1 vector of errors with mean 0 and variance σ²I_n.
  The ordinary least squares (OLS) estimator of θ is the value of θ that minimizes
      e'e = Σ_{i=1}^{n} e_i².
• Note that this definition applies also to the less-than-full-rank case.
Least squares estimators of θ

• Theorem XI.1: Let Y = Xθ + e where
  – Y is an n×1 vector of random variables for the observations,
  – X is an n×q matrix of full rank with n ≥ q,
  – θ is a q×1 vector of unknown parameters,
  – e is an n×1 vector of errors with mean 0 and variance σ²I_n.
  The ordinary least squares estimator of θ is given by
      θ̂ = (X'X)⁻¹X'Y
• (The ‘^’ denotes an estimator.)
• Proof: Need to set the differential of e'e = (Y − Xθ)'(Y − Xθ) to zero (see notes for details);
• this yields the normal equations X'X θ̂ = X'Y whose solution is given above (a small numerical sketch follows).
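The sketch below is a minimal illustration of theorem XI.1, not code from the notes; the design matrix and "true" parameter values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])   # a full-rank X
theta = np.array([2.0, -1.0, 0.5])                                # assumed values
y = X @ theta + rng.normal(scale=0.3, size=n)

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations X'X theta = X'y
psi_hat = X @ theta_hat                          # estimator of the expected values
print(theta_hat)
```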
Estimator of ψ

• Using definition I.7 (expected values), an estimator of the expected values is
      ψ̂ = Xθ̂ = X(X'X)⁻¹X'Y
• Now for experiments
  – the X matrix consists of indicator variables and
  – the maximal expectation model is generally the sum of terms, each of which is an X matrix for a (generalized) factor and an associated parameter vector.
• In cases where the maximal expectation model involves a single term corresponding to just one generalized factor,
  – the model is necessarily of full rank and
  – the OLS estimator of its parameters is particularly simple.
Example VII.4 2² Factorial experiment

• To aid in the proofs coming up, consider a two-factor factorial experiment (as in chapter 7, Factorial experiments).
• Suppose there are the two factors A and B with a = b = 2, r = 3 and n = 12.
• Assume that the data is ordered for A, then B, then replicates:
      Y' = (Y111, Y112, Y113, Y121, Y122, Y123, Y211, Y212, Y213, Y221, Y222, Y223)
• The maximal model involves a single term corresponding to the generalized factor AB — it is
      ψ_AB = E[Y] = X_AB μ_AB
  where X_AB is the 12×4 matrix whose columns are the indicator variables for the combinations (1,1), (1,2), (2,1), (2,2) of the levels of A and B (each column containing three 1s, one for each replicate), and μ_AB = (μ11, μ12, μ21, μ22)'.
• That is, each column of X_AB indicates which observations received one of the combinations of the levels of A and B.
Example VII.4 2² Factorial experiment (continued)

• Now, from theorem XI.1 (OLS estimator), the estimator of the expectation parameters is given by
      μ̂_AB = (X'_AB X_AB)⁻¹ X'_AB Y
• So we require (X'_AB X_AB)⁻¹ and X'_AB Y.
Example VII.4 2² Factorial experiment (continued) — obtaining the inverse

      X'_AB X_AB = 3I_4   so that   (X'_AB X_AB)⁻¹ = (1/3)I_4

• Notice in the above that
  – the rows of X'_AB are the columns of X_AB;
  – each element in the product is of the form
        x'_i x_j = Σ_{k=1}^{n} x_ki x_kj
  – where x_i and x_j are the n-vectors for the ith and jth columns of X, respectively;
  – i.e. the sum of pairwise products of elements of the ith and jth columns of X.
Example VII.4 2² Factorial experiment (continued) — obtaining the product

      X'_AB Y = (Y11., Y12., Y21., Y22.)'
  where, for example, Y11. = Y111 + Y112 + Y113.

• That is, Yij. is the sum over the replicates for the ijth combination of A and B.
Example VII.4 2² Factorial experiment (continued) — obtaining parameter estimates

      μ̂_AB = (X'_AB X_AB)⁻¹ X'_AB Y = (1/3)(Y11., Y12., Y21., Y22.)' = (Ȳ11., Ȳ12., Ȳ21., Ȳ22.)'

• where Ȳij. is the mean over the replicates for the ijth combination of A and B.
Example VII.4 2² Factorial experiment (continued) — obtaining estimator of expected values

      ψ̂ = X_AB μ̂_AB = M_AB Y,
  the n-vector whose element for an observation is the mean Ȳij. for that observation's combination of A and B.

• where M_AB = X_AB (X'_AB X_AB)⁻¹ X'_AB is the mean operator producing the n-vector of means for the combinations of A and B (a small numerical check follows).
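A minimal numerical check of this example, with arbitrary illustrative data (not from the notes): the OLS estimates are the cell means and M_AB replaces each observation by its cell mean.

```python
import numpy as np

a, b, r = 2, 2, 3
X_AB = np.kron(np.eye(a * b), np.ones((r, 1)))     # 12x4 indicator matrix
rng = np.random.default_rng(1)
y = rng.normal(size=a * b * r)

XtX = X_AB.T @ X_AB                                # equals 3*I_4
mu_hat = np.linalg.solve(XtX, X_AB.T @ y)          # the four cell means
M_AB = X_AB @ np.linalg.inv(XtX) @ X_AB.T          # mean operator
assert np.allclose(mu_hat, y.reshape(a * b, r).mean(axis=1))
assert np.allclose(M_AB @ y, X_AB @ mu_hat)
```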
Proofs for expected values for a single-term model

• First prove that the model is of full rank and then derive the OLS estimator.
• Lemma XI.1: Let X be an n × f matrix with n ≥ f.
  Then rank(X) = rank(X'X).
  Proof: not given.
Proofs for expected values for a single-term model (continued)

• Lemma XI.2: Let Y = Xθ + e where X is an n × f matrix corresponding to a generalized factor, F say, with f levels and n ≥ f.
  Then the rank of X is f, so that X is of full rank.
• Proof: Lemma XI.1 states that rank(X) = rank(X'X) and it is easier to derive the rank of X'X.
• To do this we need to establish a general expression for X'X, which we do by considering the form of X.
Proofs for expected values for a
single-term model (continued)
• Now
– n rows of X correspond to observational units in the
experiment and
– f columns to the levels of the generalized factor.
• All the elements of X are either 0 or 1
– In the ith column, a 1 occurs for those units with the ith
level of the generalized factor
so that, if the ith level of the generalized factor is
replicated gi times, there will be gi 1s in the ith column.
– In each row of X there will be a single 1 as only one
level of the generalized factor can be observed with
each unit
if it is the ith level of the generalized factor that occurs
on the unit, then the 1 will occur in the ith column of X.
Proofs for expected values for a single-term model (continued)

• Considering the f × f matrix X'X,
  – its ijth element is the product of the ith column, as a row vector, with the jth column, as given in the following expression (see the example above):
        x'_i x_j = Σ_{k=1}^{n} x_ki x_kj
• When i = j, the two columns are the same and the products are either of two 1s or two 0s, so
  – the product is 1 for units that have the ith level of the generalized factor and
  – the sum will be g_i.
• When i ≠ j, we have the product of the vector for the ith column, as a row vector, with that for the jth column.
  – As each unit can receive only one level of the generalized factor, it cannot receive both the ith and jth levels and the products are either two 0s or a 1 and a 0.
  – Consequently, the sum is zero.
Proofs for expected values for a single-term model (continued)

• It is concluded that X'X is an f × f diagonal matrix with diagonal elements g_i.
• Hence, in this case, rank(X) = rank(X'X) = f and so X is of full rank.
Proofs for expected values for a single-term model (continued)

• Theorem XI.2: Let Y = Xθ + e where
  – X is an n × f matrix corresponding to a generalized factor, F say, with f levels and n ≥ f,
  – θ is an f × 1 vector of unknown parameters,
  – e is an n×1 vector of errors with mean 0 and variance σ²I_n.
  The ordinary least squares estimator of θ is denoted by θ̂ and is given by
      θ̂ = F̈_{f×1}
  where F̈_{f×1} is the f-vector of means for the levels of the generalized factor F (the double over-dots indicate an f-vector, not an n-vector).
Proofs for expected values for a single-term model (continued)

• Proof: From theorem XI.1 (OLS estimator) we have that the OLS estimator is
      θ̂ = (X'X)⁻¹X'Y
• We need expressions for X'X and X'Y in this special case.
• In the proof of lemma XI.2 it was shown that X'X is an f × f diagonal matrix with diagonal elements g_i.
• Consequently, (X'X)⁻¹ is a diagonal matrix with diagonal elements 1/g_i.
Proofs for expected values for a single-term model (continued)

• Now X'Y is an f-vector whose ith element is the total of the elements of Y for the ith level of the generalized factor.
  – This can be seen when it is realized that the ith element of X'Y is the product of the ith column of X with Y.
  – Clearly, the result will be the sum of the elements of Y corresponding to elements of the ith column of X that are one:
    those units with the ith level of the generalized factor.
• Finally, taking the product of (X'X)⁻¹ with X'Y divides each total by its replication, forming the f-vector of means as stated.
Proofs for expected values for a single-term model (continued)

• Corollary XI.1: With the model for Y as in theorem XI.2, the estimator of the expected values is given by
      ψ̂ = Xθ̂ = F̄_{n×1} = M_F Y
  where
  – F̄_{n×1} is the n-vector an element of which is the mean for the level of the generalized factor F for the corresponding unit and
  – M_F is the mean operator that replaces each observation in Y with the mean for the corresponding level of the generalized factor F.
Proofs for expected values for a single-term model (continued)

• Proof: Now ψ̂ = Xθ̂ and, as stated in the proof of lemma XI.2 (rank X), each row of X contains a single 1;
  – if it is the ith level of the generalized factor that occurs on the unit, then the 1 will occur in the ith column of X.
• So the corresponding element of ψ̂ = Xθ̂ will be the ith element of θ̂ = F̈_{f×1}, which is the mean for the ith level of the generalized factor F, and hence ψ̂ = F̄_{n×1}.
• To show that ψ̂ = M_F Y, we have from theorem XI.1 (OLS estimator) that ψ̂ = Xθ̂ = X(X'X)⁻¹X'Y.
• Letting M_F = X(X'X)⁻¹X' yields ψ̂ = M_F Y.
• Now, given ψ̂ = F̄_{n×1} also, M_F must be the mean operator that replaces each observation in Y with the mean for the corresponding level of F.
Example XI.1 Rat experiment
• See lecture notes for this example that involves a
single, original factor whose levels are unequally
replicated.
b) Ordinary least squares estimators for less-than-full-rank expectation models

• The model for the expectation is still of the form E[Y] = ψ = Xθ but in this case the rank of X is less than the number of columns.
• Now rank(X) = rank(X'X) = m < q and so there is no inverse of X'X to use in solving the normal equations
      X'X θ̂ = X'Y
• Our aim is to show that, in spite of this and the difficulties that ensue, the estimators are functions of means.
• For example, the maximal, less-than-full-rank, expectation model for an RCBD is:
      ψ_B+T = E[Y] = X_B β + X_T τ
• We will show, with some effort, that the estimator of the expected values is:
      ψ̂_B+T = B̄ + T̄ − Ḡ
Less-than-full-rank vs full-rank model

1. In the full-rank model it is assumed that the parameters specified in the model are unique.
   There exists exactly one set of real numbers {θ1, θ2, …, θq} that describes the system.
   In the less-than-full-rank model there are infinitely many sets of real numbers that describe the system.
   For our example, there are infinitely many choices for {β1, …, βb, τ1, …, τt}.
   The model parameters are said to be nonidentifiable.
2. In the full-rank model X'X is nonsingular and so (X'X)⁻¹ exists and the normal equations have the one solution:
      θ̂ = (X'X)⁻¹X'Y
   In a less-than-full-rank model there are infinitely many solutions to the normal equations.
3. In the full-rank model all linear functions of {θ1, θ2, …, θq} can be estimated unbiasedly.
   In the less-than-full-rank model some functions can while others cannot.
   We want to identify functions with invariant estimated values.
Example XI.2 The RCBD

• The simplest model that is less than full rank is the maximal expectation model for the RCBD,
      E[Y] = Blocks + Treats.
• Generally, an RCBD involves b blocks with t treatments so that there are n = bt observations.
• The maximal model used for an RCBD, in matrix terms, is:
      ψ_B+T = E[Y] = X_B β + X_T τ   and   var[Y] = σ²I_n
• However, as previously mentioned, the model is not of full rank because the sums of the columns of both X_B and X_T are equal to 1_n.
• Consequently, for X = [X_B X_T],
      rank(X) = rank(X'X) = b + t − 1
Illustration for b = t = 2 that the parameters are nonidentifiable

• Suppose the following information is known about the parameters:
      β1 + τ1 = 10
      β1 + τ2 = 15
      β2 + τ1 = 12
      β2 + τ2 = 17.
• Then the parameters are not identifiable, for:
  1. if β1 = 5, then τ1 = 5, τ2 = 10 and β2 = 7;
  2. if β1 = 6, then τ1 = 4, τ2 = 9 and β2 = 8.
• Clearly we can pick a value for any one of the parameters and then find values of the others that satisfy the above equations.
• There are infinitely many possible values for the parameters.
• However, no matter what values are taken for β1, β2, τ1 and τ2, the value of β2 − β1 = 2 and of τ2 − τ1 = 5;
  – these functions are invariant.
Coping with less-than-full-rank
models
• Extend estimation theory presented in
section a), Ordinary least squares
estimators for full-rank expectation models.
• Involves obtaining solutions to the normal
equations using generalized inverses.
Introduction to generalized inverses

• Suppose that we have a system of n linear equations in q unknowns such as Ax = y
• where
  – A is an n × q matrix of real numbers,
  – x is a q-vector of unknowns and
  – y is an n-vector of real numbers.
• There are 3 possibilities:
  1. the system is inconsistent and has no solution;
  2. the system is consistent and has exactly one solution;
  3. the system is consistent and has many solutions.
Consistent vs inconsistent equations

• Consider the following equations:
      x1 + 2x2 = 7
      3x1 + 6x2 = 21
• They are consistent in that a solution of one will also satisfy the second because the second equation is just 3 times the first.
• However, the following equations are inconsistent in that it is impossible to satisfy both at the same time:
      x1 + 2x2 = 7
      3x1 + 6x2 = 24
• Generally, to be consistent, any linear relations on the left-hand side of the equations must also be satisfied by the right-hand side of the equations.
• Theorem XI.3: Let E[Y] = Xθ be a linear model for the expectation. Then the system of normal equations
      X'X θ̂ = X'Y
  is consistent.
  Proof: not given.
Generalized inverses

• Definition XI.5: Let A be an n × q matrix. A q × n matrix A⁻ such that
      AA⁻A = A
  is called a generalised inverse (g-inverse for short) for A.
• Any matrix A has a generalized inverse but it is not unique unless A is nonsingular, in which case A⁻ = A⁻¹.
Example XI.3 Generalized inverse of a 2×2 matrix

Take A = [ 2 0 ]   then A⁻ = [ 1/2 0 ]
         [ 0 0 ]             [  0  0 ]

since [ 2 0 ] [ 1/2 0 ] [ 2 0 ]  =  [ 2 0 ]
      [ 0 0 ] [  0  0 ] [ 0 0 ]     [ 0 0 ]

It is easy to see that the following matrices are also generalized inverses for A:

      [ 1/2 0 ]   and   [ 1/2  0 ]
      [  1  1 ]         [  4  −3 ]

This illustrates that generalized inverses are not necessarily unique.
An algorithm for finding generalized inverses

To find a generalized inverse A⁻ for an n × q matrix A of rank m:
1. Find any nonsingular m × m minor H of A, i.e. the matrix obtained by selecting any m rows and any m columns of A (chosen so that H is invertible).
2. Find H⁻¹.
3. Replace H in A with (H⁻¹)'.
4. Replace all other entries in A with zeros.
5. Transpose the resulting matrix. (A sketch of this algorithm in code follows.)
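The following is a minimal sketch of this algorithm, not code from the notes; it brute-force searches for a nonsingular minor, which is adequate for the small matrices used in these examples. It is checked against example XI.3.

```python
import numpy as np
from itertools import combinations

def g_inverse(A, m):
    """Find a g-inverse of A (rank m) by the minor-based algorithm above."""
    n, q = A.shape
    for rows in combinations(range(n), m):
        for cols in combinations(range(q), m):
            H = A[np.ix_(rows, cols)]
            if abs(np.linalg.det(H)) > 1e-10:            # step 1: nonsingular minor
                G = np.zeros_like(A, dtype=float)         # step 4: other entries zero
                G[np.ix_(rows, cols)] = np.linalg.inv(H).T  # steps 2-3
                return G.T                                # step 5: transpose
    raise ValueError("rank of A is less than m")

A = np.array([[2.0, 0.0], [0.0, 0.0]])
Aminus = g_inverse(A, m=1)
print(Aminus)                            # [[0.5, 0.], [0., 0.]]
assert np.allclose(A @ Aminus @ A, A)    # defining property of a g-inverse
```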
Properties of generalized inverses

• Let A be an n × q matrix of rank m with n ≥ q ≥ m.
• Then
  1. A⁻A and AA⁻ are idempotent.
  2. rank(A⁻A) = rank(AA⁻) = m.
  3. If A⁻ is a generalized inverse of A, then (A⁻)' is a generalized inverse of A'; that is, (A⁻)' = (A')⁻.
  4. (A'A)⁻A' is a generalized inverse of A so that A = A(A'A)⁻(A'A) and A' = (A'A)(A'A)⁻A'.
  5. A(A'A)⁻A' is unique, symmetric and idempotent as it is invariant to the choice of a generalized inverse. Furthermore rank(A(A'A)⁻A') = m.
Generalized inverses and the normal equations

• Lemma XI.3: Let Ax = y be consistent.
  Then x = A⁻y is a solution to the system, where A⁻ is any generalized inverse of A.
• Proof: not given.
Generalized inverses & the normal equations (cont’d)

• Theorem XI.4: Let Y = Xθ + e
  where X is an n × q matrix of rank m ≤ q with n ≥ q, θ is a q × 1 vector of unknown parameters, and e is an n × 1 vector of errors with mean 0 and variance σ²I_n.
  The ordinary least squares estimator of θ is denoted by θ̂ and is given by θ̂ = (X'X)⁻X'Y.
• Proof: In our proof for the full-rank case, we showed that the LS estimator is the solution of the normal equations.
• Nothing in our derivation of this result depended on the rank of X.
  – So the least squares estimator in the less-than-full-rank case is still a solution to the normal equations X'X θ̂ = X'Y.
• Applying theorem XI.3 (normal eqns consistent) and lemma XI.3 (g-inverse gives soln) to the normal equations yields θ̂ = (X'X)⁻X'Y as a solution of the normal equations, as claimed. (It is not unique — see notes.)
A useful lemma

• The next theorem derives the OLS estimators for the less-than-full-rank model for the RCBD.
• But first, a useful lemma.
• Lemma XI.4: The inverse of a(I_m − (1/m)J_m) + b((1/m)J_m), where a and b are scalars, is
      (1/a)(I_m − (1/m)J_m) + (1/b)((1/m)J_m).
• Proof: First show that
      (I_m − (1/m)J_m)² = I_m − (1/m)J_m,   ((1/m)J_m)² = (1/m)J_m   and   (I_m − (1/m)J_m)((1/m)J_m) = 0.
  (Think of J matrices as “sum-everything” matrices.)
• Consequently,
      [a(I_m − (1/m)J_m) + b((1/m)J_m)] [(1/a)(I_m − (1/m)J_m) + (1/b)((1/m)J_m)] = (I_m − (1/m)J_m) + (1/m)J_m = I_m
• so that (1/a)(I_m − (1/m)J_m) + (1/b)((1/m)J_m) is the inverse of a(I_m − (1/m)J_m) + b((1/m)J_m).
OLS estimators for the less-than-full-rank model for the RCBD

• Theorem XI.6: Let Y be an n-vector of jointly-distributed random variables with
      E[Y] = X_B β + X_T τ   and   var[Y] = σ²I_n
  where β, τ, X_B and X_T are the usual vectors and matrices.
• Then a nonunique estimator of θ = (β', τ')', obtained by deleting the last row and column of X'X in computing the g-inverse, is
      β̂ = B̈_{b×1} + T̈_t 1_b − Ȳ 1_b
      τ̂ = ( T̈_{(t−1)×1} − T̈_t 1_{t−1} )
          ( 0                          )
  where
  – B̈_{b×1} is the b-vector of block means of Y,
  – T̈_{(t−1)×1} is the (t−1)-vector of the first t−1 treatment means of Y,
  – T̈_t is the mean of Y for treatment t and
  – Ȳ is the grand mean of Y.
  (In the original notes, triple over-dots indicate a modified double-over-dots vector; in this case the t-vector T̈_{t×1} is the one modified.)
Proof of theorem XI.6

• According to theorem XI.4 (OLS for LTFR), θ̂ = (X'X)⁻X'Y
• and so we require a generalized inverse of X'X where X = [X_B X_T], so that

      X'X = [ X'_B X_B   X'_B X_T ]  =  [ tI_b      J_{b×t} ]
            [ X'_T X_B   X'_T X_T ]     [ J_{t×b}   bI_t    ]

  [X_B, β, X_T and τ as written out earlier; X'_B X_T = J_{b×t} because each treatment occurs once in each block.]
An algorithm for finding generalized inverses (from before)

To find a generalized inverse A⁻ for an n × q matrix A of rank m:
1. Find any nonsingular m × m minor H of A, i.e. the matrix obtained by selecting any m rows and any m columns of A.
2. Find H⁻¹.
3. Replace H in A with (H⁻¹)'.
4. Replace all other entries in A with zeros.
5. Transpose the resulting matrix.
Proof of theorem XI.6 (continued)

      X'X = [ tI_b      J_{b×t} ]
            [ J_{t×b}   bI_t    ]

• Our algorithm tells us that we must take a minor of order b + t − 1 and obtain its inverse.
• So delete any row and column and find the inverse of what remains
  – say the last row and column.
• To remove the last row and column of X'X,
  – let Z = [X_B X_T−1] where X_T−1 is the n×(t−1) matrix formed by taking the first t−1 columns of X_T.
• Then
      Z'Z = [ X'_B X_B       X'_B X_T−1    ]  =  [ tI_b          J_{b×(t−1)} ]
            [ X'_T−1 X_B     X'_T−1 X_T−1  ]     [ J_{(t−1)×b}   bI_{t−1}    ]
  is X'X with its last row and column removed.
• Now Z'Z is clearly symmetric so that its inverse will be also. Our generalized inverse will be (Z'Z)⁻¹ with a row and column of zeros added (there is no need to transpose).
Proof of theorem XI.6 (continued)

• Now to find the inverse of Z'Z we note that
  – for a partitioned, symmetric matrix H where
        H = [ A   C ]
            [ C'  D ]
  – then
        H⁻¹ = [ U   V ]   with   W = (D − C'A⁻¹C)⁻¹,   V = −A⁻¹CW,   U = A⁻¹ − VC'A⁻¹.
              [ V'  W ]
• So first we need to find W, then V and lastly U.
• In our case
      Z'Z = [ tI_b          J_{b×(t−1)} ]  =  [ A   C ]
            [ J_{(t−1)×b}   bI_{t−1}    ]     [ C'  D ]
Proof of theorem XI.6 (continued)

      W = (D − C'A⁻¹C)⁻¹
        = ( bI_{t−1} − J_{(t−1)×b} (1/t)I_b J_{b×(t−1)} )⁻¹
        = ( bI_{t−1} − (b/t)J_{t−1} )⁻¹
        = ( b[I_{t−1} − (1/(t−1))J_{t−1}] + (b/t)[(1/(t−1))J_{t−1}] )⁻¹
        = (1/b)[I_{t−1} − (1/(t−1))J_{t−1}] + (t/b)[(1/(t−1))J_{t−1}]
        = (1/b)( I_{t−1} + J_{t−1} ).

• Note that lemma XI.4 (special inverse) is used to obtain the inverse in this derivation.
Proof of theorem XI.6 (continued)

      V = −A⁻¹CW = −(1/t)I_b J_{b×(t−1)} (1/b)(I_{t−1} + J_{t−1}) = −(1/(bt))[J_{b×(t−1)} + (t−1)J_{b×(t−1)}] = −(1/b)J_{b×(t−1)}
      U = A⁻¹ − VC'A⁻¹ = (1/t)I_b + (1/b)J_{b×(t−1)} J_{(t−1)×b} (1/t)I_b = (1/t)I_b + ((t−1)/(bt))J_b.

• Consequently,
      (Z'Z)⁻¹ = [ (1/t)I_b + ((t−1)/(bt))J_b    −(1/b)J_{b×(t−1)}         ]
                [ −(1/b)J_{(t−1)×b}             (1/b)(I_{t−1} + J_{t−1})  ]
• and
      (X'X)⁻ = [ (1/t)I_b + ((t−1)/(bt))J_b    −(1/b)J_{b×(t−1)}         0_{b×1}     ]
               [ −(1/b)J_{(t−1)×b}             (1/b)(I_{t−1} + J_{t−1})  0_{(t−1)×1} ]
               [ 0_{1×b}                        0_{1×(t−1)}               0           ]
Proof of theorem XI.6 (continued)

• Now we want to partition X'Y similarly. X'Y is a vector of totals (a total = mean × no. of observations), so, expressing the totals as means for notational economy,

      X'Y = [ X'_B Y   ]     [ tB̈_{b×1}     ]
            [ X'_T−1 Y ]  =  [ bT̈_{(t−1)×1} ]
            [ x'_t Y   ]     [ bT̈_t         ]

  where
  – X_T−1 is the n×(t−1) matrix formed by taking the first t−1 columns of X_T,
  – x_t is column t of X_T,
  – B̈_{b×1} is the b-vector of block means of Y,
  – T̈_{(t−1)×1} is the (t−1)-vector of the first t−1 treatment means of Y, and
  – T̈_t is the mean of Y for treatment t.
Proof of theorem XI.6 (continued)

• Hence the nonunique estimators of θ are
      θ̂ = (X'X)⁻X'Y
        = [ (I_b + ((t−1)/b)J_b) B̈_{b×1} − J_{b×(t−1)} T̈_{(t−1)×1}      ]
          [ −(t/b)J_{(t−1)×b} B̈_{b×1} + (I_{t−1} + J_{t−1}) T̈_{(t−1)×1} ]
          [ 0                                                            ]
• Now to simplify this expression further we look to simplify
      J_b B̈_{b×1},   J_{(t−1)×b} B̈_{b×1},   J_{t−1} T̈_{(t−1)×1},   J_{b×(t−1)} T̈_{(t−1)×1}.
Proof of theorem XI.6 (continued)

• Firstly, each element of J_b B̈_{b×1} is the sum of the b Block means and
      Σ_{j=1}^{b} B̈_j = bȲ
  where Ȳ is the grand mean of all the observations.
• Hence, J_b B̈_{b×1} = bȲ 1_b
• Similarly, J_{(t−1)×b} B̈_{b×1} = bȲ 1_{t−1}
• Next, each element of J_{t−1} T̈_{(t−1)×1} is the sum of the first (t−1) Treatment means and
      Σ_{i=1}^{t−1} T̈_i = Σ_{i=1}^{t} T̈_i − T̈_t = tȲ − T̈_t
• Hence, J_{t−1} T̈_{(t−1)×1} = (tȲ − T̈_t) 1_{t−1}
• Similarly, J_{b×(t−1)} T̈_{(t−1)×1} = (tȲ − T̈_t) 1_b
Proof of theorem XI.6 (continued)

• Summary:
      J_b B̈_{b×1} = bȲ 1_b
      J_{(t−1)×b} B̈_{b×1} = bȲ 1_{t−1}
      J_{t−1} T̈_{(t−1)×1} = (tȲ − T̈_t) 1_{t−1}
      J_{b×(t−1)} T̈_{(t−1)×1} = (tȲ − T̈_t) 1_b
• Therefore
      θ̂ = [ (I_b + ((t−1)/b)J_b) B̈_{b×1} − J_{b×(t−1)} T̈_{(t−1)×1}      ]
          [ −(t/b)J_{(t−1)×b} B̈_{b×1} + (I_{t−1} + J_{t−1}) T̈_{(t−1)×1} ]
          [ 0                                                            ]
        = [ B̈_{b×1} + (t−1)Ȳ 1_b − (tȲ − T̈_t) 1_b               ]
          [ −tȲ 1_{t−1} + T̈_{(t−1)×1} + (tȲ − T̈_t) 1_{t−1}      ]
          [ 0                                                     ]
        = [ B̈_{b×1} + T̈_t 1_b − Ȳ 1_b   ]
          [ T̈_{(t−1)×1} − T̈_t 1_{t−1}   ]
          [ 0                            ]
• So the estimator is a function of block, treatment and grand means, albeit a rather messy one (a small numerical check follows).
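The following is a minimal numerical check of theorem XI.6 with arbitrary illustrative data (not from the notes): build X for a small RCBD, form the g-inverse of X'X by inverting the minor left after deleting its last row and column, and compare θ̂ with the stated functions of means.

```python
import numpy as np

b, t = 3, 4
X_B = np.kron(np.eye(b), np.ones((t, 1)))
X_T = np.kron(np.ones((b, 1)), np.eye(t))
X = np.hstack([X_B, X_T])

y = np.random.default_rng(2).normal(size=b * t)

XtX = X.T @ X
G = np.zeros_like(XtX)
G[:-1, :-1] = np.linalg.inv(XtX[:-1, :-1])     # g-inverse: (Z'Z)^-1 padded with zeros
theta_hat = G @ X.T @ y

B_bar = y.reshape(b, t).mean(axis=1)           # block means
T_bar = y.reshape(b, t).mean(axis=0)           # treatment means
Y_bar = y.mean()                               # grand mean
expected = np.concatenate([B_bar + T_bar[-1] - Y_bar, T_bar[:-1] - T_bar[-1], [0.0]])
assert np.allclose(theta_hat, expected)
```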
c) Estimable functions

• In the less-than-full-rank model the problem we face is that θ cannot be estimated uniquely, as θ̂ changes for the many different choices of (X'X)⁻.
• To overcome this, consider those linear functions of θ that are invariant to the choice of (X'X)⁻ and hence to the particular θ̂ obtained.
• That is, consider functions ℓ'θ whose estimators ℓ'θ̂ are invariant to the different θ̂s, in that their values remain the same regardless of which solution to the normal equations is used.
• This property is true for all estimable functions.
Definition of estimable functions

• Definition XI.6: Let E[Y] = Xθ and var[Y] = σ²I_n where X is n × q of rank m ≤ q.
• A function ℓ'θ is said to be estimable if there exists a vector c such that E[c'Y] = ℓ'θ for all possible values of θ.
• This definition is saying that, to be estimable, there must be some linear combination of the random variables Y1, Y2, …, Yn that is a linear unbiased estimator of ℓ'θ.
• In the case of several linear functions, we can extend the above definition as follows:
• If L is a k × q matrix of constants, then Lθ is a set of k estimable functions if and only if there is a matrix C of constants so that E[CY] = Lθ for all possible values of θ.
• However, how do we identify estimable functions?
• The following theorems give us some important cases that will cover the quantities of most interest.
Expected value results

• Lemma XI.5: Let
  – a be a k × 1 vector of constants,
  – A an s × k matrix of constants and
  – Y a k × 1 vector of random variables, or random vector.
  Then
      E[a] = a and var[a] = 0;
      E[a'Y] = a'E[Y] and var[a'Y] = a' var[Y] a;
      E[AY] = A E[Y] and var[AY] = A var[Y] A'.
Estimable if linear combination of X

• Theorem XI.7: Let E[Y] = Xθ and var[Y] = σ²I_n where X is n × q of rank m ≤ q.
• Then ℓ'θ is an estimable function if and only if there is a vector of constants c such that ℓ' = c'X.
• (Note that ℓ' = c'X implies that ℓ' is a linear combination of the rows of X.)
• Moreover, for a k × q matrix L, Lθ is a set of k estimable functions if and only if there is a matrix C of constants such that L = CX.
Proof

• First suppose that ℓ'θ is estimable.
• Then, by definition XI.6 (estimable function), there is a vector c so that E[c'Y] = ℓ'θ.
• Now, using lemma XI.5 (E rules), E[c'Y] = c'E[Y] = c'Xθ.
• But also E[c'Y] = ℓ'θ and so c'Xθ = ℓ'θ.
• Since this last equation is true for all θ, it follows that c'X = ℓ'.
• Second, suppose that ℓ' = c'X.
• Then ℓ'θ = c'Xθ = E[c'Y] and, by definition XI.6 (estimable function), ℓ'θ is estimable.
• Similarly, if E[CY] = Lθ, CXθ = Lθ for all θ and thus CX = L.
• Likewise, if L = CX, then Lθ = CXθ = E[CY] and hence Lθ is estimable.
Expected values are estimable

• Theorem XI.8: Let E[Y] = Xθ and var[Y] = σ²I_n where X is n × q of rank m ≤ q.
  Each of the elements of ψ = Xθ is estimable.
• Proof: From theorem XI.7, Lθ is a set of estimable functions if L = CX.
• We want to estimate Xθ, so that Lθ = Xθ, and so to have L = CX we need to find a C such that CX = X.
  Clearly, setting C = I_n we have that CX = X = L and as a consequence Xθ is estimable.
A linear combination of estimable functions is itself estimable

• Theorem XI.9: Let ℓ'_1 θ, ℓ'_2 θ, …, ℓ'_k θ be a collection of k estimable functions.
  Let z = a_1 ℓ'_1 θ + a_2 ℓ'_2 θ + … + a_k ℓ'_k θ be a linear combination of these functions.
  Then z is estimable.
• Proof: If each of ℓ'_i θ, i = 1, …, k, is estimable then, by definition XI.6 (estimable function), there exists a vector of constants c_i such that E[c'_i Y] = ℓ'_i θ.
• Then
      z = Σ_{i=1}^{k} a_i ℓ'_i θ = Σ_{i=1}^{k} a_i E[c'_i Y] = E[ Σ_{i=1}^{k} a_i c'_i Y ] = E[d'Y]
  where d' = Σ_{i=1}^{k} a_i c'_i, and so z is estimable.
Finding all the estimable functions

• First find ψ = Xθ, each element of which is an estimable function.
• It can be shown that all estimable functions can be expressed as a linear combination of the entries in Xθ.
• So find all linear combinations of Xθ and you have all the estimable functions.
Example XI.2 The RCBD (continued)

• For the RCBD, the maximal model is E[Y] = X_B β + X_T τ.
• So for b = 5, t = 4,
      ψ = (β1+τ1, β1+τ2, β1+τ3, β1+τ4, β2+τ1, β2+τ2, …, β5+τ3, β5+τ4)'.
• From this we identify the 20 basic estimable functions, βi + τk.
• Any linear combination of these 20 is estimable and any estimable function can be expressed as a linear combination of these.
• Hence βi − βi' and τk − τk' are estimable.
• On the other hand, neither βi nor τk + τk' is estimable (a small numerical check follows).
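The sketch below is a minimal numerical illustration with arbitrary data (not from the notes): for the RCBD model, the estimate of an estimable function such as τ1 − τ2 is identical for different solutions of the normal equations, whereas the estimate of τ1 alone is not.

```python
import numpy as np

b, t = 5, 4
X = np.hstack([np.kron(np.eye(b), np.ones((t, 1))),     # X_B
               np.kron(np.ones((b, 1)), np.eye(t))])     # X_T
y = np.random.default_rng(3).normal(size=b * t)

# Two different solutions to the normal equations: the Moore-Penrose g-inverse,
# and the "delete last row and column" g-inverse used in theorem XI.6.
sol1 = np.linalg.pinv(X.T @ X) @ X.T @ y
G = np.zeros((b + t, b + t)); G[:-1, :-1] = np.linalg.inv((X.T @ X)[:-1, :-1])
sol2 = G @ X.T @ y

l_est = np.zeros(b + t); l_est[b] = 1; l_est[b + 1] = -1   # tau_1 - tau_2: estimable
l_not = np.zeros(b + t); l_not[b] = 1                       # tau_1 alone: not estimable
print(l_est @ sol1, l_est @ sol2)   # equal
print(l_not @ sol1, l_not @ sol2)   # generally differ
```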
d) Properties of estimable functions
• So we have seen what estimable functions are
and how to identify them, but why are they
important?
• Turns out that they can be unbiasedly and
uniquely estimated using any solution to the
normal equations and that they are, in some
sense, best.
• It is these properties that we establish in this
section.
• But first a definition.
The LS estimator of an estimable function

• Definition XI.7: Let ℓ'θ be an estimable function and let θ̂ = (X'X)⁻X'Y be a least squares estimator of θ.
• Then ℓ'θ̂ is called the least squares estimator of ℓ'θ.
• Now the implication of this definition is that ℓ'θ̂ is the unique estimator of ℓ'θ.
• This is not at all obvious because there are infinitely many solutions θ̂ to the normal equations when they are less than full rank.
• The next theorem establishes that the least squares estimator of an estimable function is unique.
Recall properties of g-inverses

• Let A be an n × q matrix of rank m with n ≥ q ≥ m.
• Then
  1. A⁻A and AA⁻ are idempotent.
  2. rank(A⁻A) = rank(AA⁻) = m.
  3. If A⁻ is a generalized inverse of A, then (A⁻)' is a generalized inverse of A'; that is, (A⁻)' = (A')⁻.
  4. (A'A)⁻A' is a generalized inverse of A so that A = A(A'A)⁻(A'A) and A' = (A'A)(A'A)⁻A'.
  5. A(A'A)⁻A' is unique, symmetric and idempotent as it is invariant to the choice of a generalized inverse. Furthermore rank(A(A'A)⁻A') = m.
LS estimator of estimable function is unique

• Theorem XI.10: Let ℓ'θ be an estimable function.
  Then the least squares estimator ℓ'θ̂ is invariant to the least squares estimator θ̂.
  That is, the expression for ℓ'θ̂ is the same for every expression θ̂ = (X'X)⁻X'Y.
• Proof: From theorem XI.7 (estimable if ℓ' = c'X), we have ℓ' = c'X and, from theorem XI.4 (LS/LTFR), θ̂ = (X'X)⁻X'Y.
• Using these results we get that ℓ'θ̂ = c'X(X'X)⁻X'Y.
• Since, from property 5 of generalized inverses, X(X'X)⁻X' is invariant to the choice of (X'X)⁻, it follows that c'X(X'X)⁻X'Y has the same expression for any choice of (X'X)⁻.
• Hence ℓ'θ̂ is invariant to (X'X)⁻ and hence to θ̂.
Unbiased and variance

• Next we show that the least squares estimator of an estimable function is unbiased and obtain its variance.
• Theorem XI.11: Let E[Y] = Xθ and var[Y] = σ²I_n where X is n × q of rank m ≤ q.
• Also let ℓ'θ be an estimable function and ℓ'θ̂ be its least squares estimator.
• Then ℓ'θ̂ is unbiased for ℓ'θ.
• Moreover, var[ℓ'θ̂] = σ² ℓ'(X'X)⁻ℓ, which is invariant to the choice of generalized inverse.
Proof of theorem XI.11

• Since θ̂ = (X'X)⁻X'Y and ℓ' = c'X we have
      E[ℓ'θ̂] = ℓ'E[θ̂]
             = ℓ'(X'X)⁻X'E[Y]          since (X'X)⁻X' is a matrix of constants
             = ℓ'(X'X)⁻X'Xθ
             = c'X(X'X)⁻X'Xθ
             = c'Xθ                     from property 4 of generalized inverses
             = ℓ'θ.
• This verifies that ℓ'θ̂ is unbiased for ℓ'θ.
Proof of theorem XI.11 (continued)

• Next
      var[ℓ'θ̂] = var[ℓ'(X'X)⁻X'Y]
               = ℓ'(X'X)⁻X' var[Y] X((X'X)⁻)'ℓ          since var[a'Y] = a' var[Y] a
               = σ² ℓ'(X'X)⁻X'X((X'X)⁻)'ℓ               since var[Y] = σ²I_n
               = σ² c'X(X'X)⁻X'X((X'X)')⁻X'c             using ℓ' = c'X and property 3 of g-inverses
               = σ² c'X(X'X)⁻X'c                          from property 4 of g-inverses
               = σ² ℓ'(X'X)⁻ℓ                             as ℓ' = c'X.
Unbiased only for estimable functions

• The above theorem is most important.
• This follows from the fact that ℓ'θ̂ is not unbiased for any ℓ'θ whatsoever.
• In general,
      E[ℓ'θ̂] = ℓ'E[θ̂] = ℓ'(X'X)⁻X'Xθ
  which is not necessarily equal to ℓ'θ.
• Thus the case when ℓ'θ is estimable, and hence ℓ'θ̂ is an unbiased estimator of it, is a very special case.
BLUE = Best Linear Unbiased Estimator

• Definition XI.8: Linear estimators are estimators of the form AY, where A is a matrix of real numbers.
• That is, each element of the estimator is a linear combination of the elements of Y.
• The unbiased least squares estimator developed here is an example of a linear estimator with A = ℓ'(X'X)⁻X'.
• Unfortunately, unbiasedness does not guarantee uniqueness.
• There might be more than one set of unbiased estimators of ℓ'θ.
• Another desirable property of an estimator is that it be minimum variance.
• Theorem XI.12, the Gauss-Markoff theorem, guarantees that, among the class of linear unbiased estimators of ℓ'θ, the least squares estimator is the best in the sense that the variance of the estimator, var[ℓ'θ̂], is minimized.
• For this reason the least squares estimator is called the BLUE (best linear unbiased estimator).
A result that will be required in the proof of the Gauss-Markoff theorem

• Lemma XI.6: Let U and V be random variables and a and b be constants. Then
      var[aU + bV] = a² var[U] + b² var[V] + 2ab cov[U,V]
• Proof:
      var[aU + bV] = E[(aU + bV − E[aU + bV])²]                                        using definition I.5 of the variance
                   = E[({aU − E[aU]} + {bV − E[bV]})²]
                   = E[(aU − E[aU])² + (bV − E[bV])² + 2(aU − E[aU])(bV − E[bV])]
                   = a²E[(U − E[U])²] + b²E[(V − E[V])²] + 2abE[(U − E[U])(V − E[V])]
                   = a² var[U] + b² var[V] + 2ab cov[U,V].
  Note that the last step uses definitions I.5 and I.6 of the variance and covariance.
Gauss-Markoff theorem

• Theorem XI.12 [Gauss-Markoff]: Let E[Y] = Xθ and var[Y] = σ²I_n where X is n × q of rank m ≤ q. Let ℓ'θ be estimable.
• Then the best linear unbiased estimator of ℓ'θ is ℓ'θ̂, where θ̂ is any solution to the normal equations.
• This estimator is invariant to the choice of θ̂.
• Proof: Suppose that a'Y is some other linear unbiased estimator of ℓ'θ so that E[a'Y] = ℓ'θ.
• The proof proceeds as follows:
  1. Establish a result that will be used subsequently.
  2. Prove that ℓ'θ̂ is the minimum variance estimator by proving that var[ℓ'θ̂] ≤ var[a'Y].
  3. Prove that ℓ'θ̂ is the unique minimum variance estimator.
Proof of theorem XI.12

• Firstly, E[a'Y] = a'E[Y] = a'Xθ, and since also E[a'Y] = ℓ'θ,
      a'Xθ = ℓ'θ for all θ.
• Consequently, a'X = ℓ';
  this is the result that will be used subsequently.
• Secondly, we proceed to show that
      var[ℓ'θ̂] ≤ var[a'Y]
• Now, from lemma XI.6 (var[aU + bV]), as ℓ'θ̂ and a'Y are scalar random variables and ℓ'θ̂ − a'Y is their linear combination with coefficients 1 and −1, we have
      var[ℓ'θ̂ − a'Y] = var[ℓ'θ̂] + var[a'Y] − 2cov[ℓ'θ̂, a'Y]
Proof of theorem XI.12 (continued)

      cov[ℓ'θ̂, a'Y] = cov[ℓ'(X'X)⁻X'Y, a'Y]
                    = E[(ℓ'(X'X)⁻X'Y − E[ℓ'(X'X)⁻X'Y])(a'Y − E[a'Y])]      using definition I.6 of the covariance
                    = E[(ℓ'(X'X)⁻X'Y − ℓ'(X'X)⁻X'Xθ)(a'Y − a'Xθ)]           as E[Y] = Xθ
                    = E[ℓ'(X'X)⁻X'(Y − Xθ) a'(Y − Xθ)]
                    = E[ℓ'(X'X)⁻X'(Y − Xθ)(Y − Xθ)'a]                        as a'(Y − Xθ) is a scalar
                    = ℓ'(X'X)⁻X'E[(Y − Xθ)(Y − Xθ)']a
                    = ℓ'(X'X)⁻X' var[Y] a                                    using definition I.4 for V
                    = σ² ℓ'(X'X)⁻X'a                                          as var[Y] = σ²I_n
                    = σ² ℓ'(X'X)⁻ℓ                                            as a'X = ℓ'
                    = var[ℓ'θ̂].
  (The last step uses theorem XI.11 — LS and var of est. func.)
Proof of theorem XI.12 (continued)

• So,
      var[ℓ'θ̂ − a'Y] = var[ℓ'θ̂] + var[a'Y] − 2cov[ℓ'θ̂, a'Y]
                      = var[ℓ'θ̂] + var[a'Y] − 2var[ℓ'θ̂]
                      = var[a'Y] − var[ℓ'θ̂]
• Since var[ℓ'θ̂ − a'Y] ≥ 0, var[a'Y] ≥ var[ℓ'θ̂].
• Hence, ℓ'θ̂ has minimum variance amongst all linear unbiased estimators.
Proof of theorem XI.12 (continued)

• Finally, to demonstrate that ℓ'θ̂ is the only linear unbiased estimator with minimum variance, we suppose that var[a'Y] = var[ℓ'θ̂].
• Then var[ℓ'θ̂ − a'Y] = 0 and hence ℓ'θ̂ − a'Y is a constant, say c.
• Since E[ℓ'θ̂] = E[a'Y], E[ℓ'θ̂] − E[a'Y] = 0, so that E[ℓ'θ̂ − a'Y] = 0.
• But ℓ'θ̂ − a'Y = c so that E[ℓ'θ̂ − a'Y] = E[c] = c.
• Now we have that c = 0 and so ℓ'θ̂ = a'Y.
• Thus any linear unbiased estimator of ℓ'θ which has minimum variance must equal ℓ'θ̂.
• So, provided we restrict our attention to estimable functions, our estimators will be unique and BLU.
e) Properties of the estimators in the full rank case

• The above results apply generally to models either of less than full rank or of full rank.
• However, in the full-rank case, some additional results apply.
• In particular, E[θ̂] = E[(X'X)⁻¹X'Y] = (X'X)⁻¹X'Xθ = θ, so that in this particular case θ itself is estimable.
• So θ̂ is BLUE for θ.
• Further, since ℓ'θ is a linear combination of θ, it follows that ℓ'θ is estimable for all constant vectors ℓ.
f) Estimation for the maximal model for the RCBD

• In the next theorem, because each of the parameters in θ' = [β' τ'] is not estimable, we consider the estimators of ψ:
  – theorem XI.8 tells us that these are estimable,
  – theorem XI.10 that they are invariant to the solution to the normal equations chosen and
  – theorem XI.12 that they are BLUE.
Estimator of expected values for RCBD

• Theorem XI.13: Let Y be an n-vector of jointly-distributed random variables with
      E[Y] = X_B β + X_T τ   and   var[Y] = σ²I_n
  where β, τ, X_B and X_T are the usual vectors and matrices.
• Then ψ̂ = B̄ + T̄ − Ḡ, where B̄, T̄ and Ḡ are the n-vectors of block, treatment and grand means, respectively.
Proof of theorem XI.13

• Proof: Using the estimator of θ established in theorem XI.6 (LS estimator of θ in RCBD), the estimator of the expected values is given by
      ψ̂ = Xθ̂
         = [X_B  X_T−1  x_t] [ B̈_{b×1} + T̈_t 1_b − Ȳ 1_b ]
                             [ T̈_{(t−1)×1} − T̈_t 1_{t−1} ]
                             [ 0                           ]
         = X_B (B̈_{b×1} + T̈_t 1_b − Ȳ 1_b) + X_T−1 (T̈_{(t−1)×1} − T̈_t 1_{t−1}).
Proof of theorem XI.13 (continued)

1. Now each row of [X_B X_T−1] corresponds to one observation: say the observation in the ith block that received the jth treatment.
2. In the row for the observation in the ith block that received the jth treatment:
   – the element in the ith column of X_B will be 1,
   – as will the element in the jth column of X_T−1;
   – all other elements in the row will be zero.
3. Note that if the treatment is t, the only nonzero element will be in the ith column of X_B.

[X_B and X_T−1 as displayed previously, X_T−1 being X_T without its last column.]

(Put the B̄ and T̄ vectors on the board and consider what is obtained for, say, observations 7 and 8 when the expected value expression is evaluated.)
Proof of theorem XI.13 (continued)

4. So from ψ̂ = X_B (B̈_{b×1} + T̈_t 1_b − Ȳ 1_b) + X_T−1 (T̈_{(t−1)×1} − T̈_t 1_{t−1}) we find that the estimator of the expected value for the observation in the ith block with the jth treatment is the sum of:
   – the ith element of B̈_{b×1} + T̈_t 1_b − Ȳ 1_b and
   – the jth element of T̈_{(t−1)×1} − T̈_t 1_{t−1},
   unless the treatment is t (j = t), when it is just the ith element of B̈_{b×1} + T̈_t 1_b − Ȳ 1_b.
5. The estimator of the expected value for the observation
   – with treatment t in the ith block: B̈_i + T̈_t − Ȳ
   – with the jth treatment (j ≠ t) in the ith block: B̈_i + T̈_t − Ȳ + T̈_j − T̈_t = B̈_i + T̈_j − Ȳ.
   That is, ψ̂_ij = B̈_i + T̈_j − Ȳ for all i and j, and clearly ψ̂ = B̄ + T̄ − Ḡ (a small numerical check follows).
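The following is a minimal numerical check of theorem XI.13, with arbitrary illustrative data (not from the notes): the fitted values Xθ̂ equal block mean + treatment mean − grand mean, and they do not depend on which g-inverse is used.

```python
import numpy as np

b, t = 5, 4
X = np.hstack([np.kron(np.eye(b), np.ones((t, 1))),
               np.kron(np.ones((b, 1)), np.eye(t))])
y = np.random.default_rng(4).normal(size=b * t)

psi_hat = X @ np.linalg.pinv(X.T @ X) @ X.T @ y   # any g-inverse gives the same psi-hat

B = np.repeat(y.reshape(b, t).mean(axis=1), t)    # n-vector of block means
T = np.tile(y.reshape(b, t).mean(axis=0), b)      # n-vector of treatment means
G = np.full(b * t, y.mean())                      # n-vector of the grand mean
assert np.allclose(psi_hat, B + T - G)
```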
Notes on estimator for RCBD

• Now,
      ψ̂ = B̄ + T̄ − Ḡ = Ḡ + (B̄ − Ḡ) + (T̄ − Ḡ).
• That is, the estimator of the expected values can be written as the sum of the grand mean plus main effects.
• Note that the estimator is a function of means with
      B̄ = M_B Y,  T̄ = M_T Y,  Ḡ = M_G Y  and  M_i = X_i(X'_i X_i)⁻¹X'_i  where X_i = X_B, X_T or X_G.
• That is, M_B, M_T and M_G are the block, treatment and grand mean operators, respectively.
• Further, if the data in the vector Y has been arranged in prerandomized order with all the observations for a block placed together, the operators are:
      M_G = n⁻¹(J_b ⊗ J_t) = n⁻¹J_n,   M_B = t⁻¹(I_b ⊗ J_t),   M_T = b⁻¹(J_b ⊗ I_t).
Example XI.2 The RCBD (continued)

• For the estimable function ψ_23 = β2 + τ3,
      ψ̂_23 = β̂2 + τ̂3 = Ȳ2. + Ȳ.3 − Ȳ.. = Ȳ.. + (Ȳ2. − Ȳ..) + (Ȳ.3 − Ȳ..)
• In this case, the mean operators are as follows:
      M_G = 20⁻¹(J_5 ⊗ J_4)
  [the 20×20 matrix with every element equal to 1/20].
Example XI.2 The RCBD (continued)

      M_B = 4⁻¹(I_5 ⊗ J_4)
  [the 20×20 block-diagonal matrix with 4×4 blocks of 1/4 down the diagonal and zeros elsewhere].
Example XI.2 The RCBD (continued)

      M_T = 5⁻¹(J_5 ⊗ I_4)
  [the 20×20 matrix made up of 5×5 blocks, each block being (1/5)I_4].
XI.C Generalized least squares (GLS) estimation of the expectation parameters in general linear models

• First define a GLS estimator.
• Definition XI.9: Let Y be a random vector with E[Y] = Xθ and var[Y] = V where X is an n × q matrix of rank m ≤ q, θ is a q × 1 vector of unknown parameters, V is an n × n positive-definite matrix and n ≥ q.
• Then the generalized least squares estimator of θ is the estimator that minimizes the "sum of squares"
      (y − Xθ)' V⁻¹ (y − Xθ)
GLS estimator

• Theorem XI.14: Let Y be a random vector with E[Y] = Xθ and var[Y] = V where X is an n × q matrix of rank m ≤ q, θ is a q × 1 vector of unknown parameters, V is an n × n positive-definite matrix and n ≥ q.
• Then the generalized least squares estimator of θ is denoted by θ̂ and is given by
      θ̂ = (X'V⁻¹X)⁻ X'V⁻¹Y
• Proof: similar to the proof for theorem XI.1.
• Unsurprisingly, the properties of GLS estimators parallel those for OLS estimators.
• In particular, they are BLUE, as the next theorem states (a small numerical sketch follows).
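The sketch below is a minimal illustration of the GLS formula, not code from the notes; the design matrix, the positive-definite V and the "true" parameter values are arbitrary assumptions. It also shows that GLS and OLS generally differ when V is not of the form σ²I_n.

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 12, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])
A = rng.normal(size=(n, n))
V = A @ A.T + n * np.eye(n)                   # an arbitrary positive-definite variance matrix
y = X @ np.array([1.0, 2.0, -1.0]) + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
theta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # (X'V^-1 X)^-1 X'V^-1 y
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_gls, theta_ols)                   # generally differ when V != sigma^2 * I
```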
Gauss-Markoff for GLS

• Theorem XI.15 [Gauss-Markoff]: Let Y be a random vector with
      E[Y] = Xθ and var[Y] = V
  where X is an n × q matrix of rank m ≤ q, θ is a q × 1 vector of unknown parameters, V is an n × n positive-definite matrix and n ≥ q.
  Let ℓ'θ be an estimable function.
  Then the generalized least squares estimator
      ℓ'θ̂ = ℓ'(X'V⁻¹X)⁻ X'V⁻¹Y
  is the best linear unbiased estimator of ℓ'θ, with
      var[ℓ'θ̂] = ℓ'(X'V⁻¹X)⁻ℓ
• Proof: not given.
Estimation of fixed effects for the RCBD when Blocks are random

• We now derive the estimator of τ in the model for the RCBD where Blocks are random.
• Symbolically, the model is
      E[Y] = Treats   and   var[Y] = Blocks + BlocksUnits.
• One effect of making Blocks random is that the maximal, marginality-compliant model for the experiment is of full rank and so the estimator of τ will be BLUE.
• Before deriving estimators under this model, we establish the properties of a complete set of mutually-orthogonal idempotents as they will be useful in proving the results.
Complete set of mutually-orthogonal idempotents

• Definition XI.10: A set of idempotents {Q_F: F ∈ F} forms a complete set of mutually-orthogonal idempotents (CSMOI) if
      Q_F Q_F' = Q_F' Q_F = δ_FF' Q_F for all F, F' ∈ F   and   Σ_{F∈F} Q_F = I_n
  where δ_FF' = 1 if F = F' and δ_FF' = 0 otherwise.
• Of particular interest is that the set of Q matrices from a structure formula based on crossing and nesting relationships forms a CSMOI.
• e.g. RCBD
Inverse of CSMOI

• Lemma XI.7: For a matrix that is a linear combination of a complete set of mutually-orthogonal idempotents, such as V = Σ_{F∈F} η_F Q_F, its inverse is
      V⁻¹ = Σ_{F∈F} η_F⁻¹ Q_F
• Proof:
      V V⁻¹ = (Σ_{F∈F} η_F Q_F)(Σ_{F'∈F} η_F'⁻¹ Q_F')
            = Σ_F Σ_{F'} η_F η_F'⁻¹ Q_F Q_F'
            = Σ_F η_F η_F⁻¹ Q_F
            = Σ_F Q_F
            = I_n.
Estimators for only Treatments fixed

• In the following theorem we show that the GLS estimator for the expectation model when only Treatments is fixed is the same as the OLS estimator for a model involving only fixed Treatments, such as would occur with a CRD.
• However, first a useful lemma.
• Lemma XI.8: Let A, B, C and D be square matrices and a, b and c be scalar constants. Then, provided A and C, as well as B and D, are conformable,
      (A ⊗ B)(C ⊗ D) = AC ⊗ BD
      (A ⊗ B)' = A' ⊗ B'
      (aA) ⊗ (bB) = ab(A ⊗ B)
      A ⊗ (bB + cC) = b(A ⊗ B) + c(A ⊗ C)
      trace(A ⊗ B) = trace(A) trace(B).
Estimators for only Treatments fixed (cont’d)

• Theorem XI.16: Let Y be the random vector for the observations from an RCBD with the Y_ijk arranged such that those from the same block occur consecutively in treatment order, as in a prerandomized layout.
• Also, let
      E[Y] = X_T τ,   X_T = 1_b ⊗ I_t
      V = σ²_B S_B + σ²_BU S_BU = σ²_B (I_b ⊗ J_t) + σ²_BU (I_b ⊗ I_t)
• Then the generalized least squares estimator of τ is given by
      τ̂ = (X'_T V⁻¹ X_T)⁻¹ X'_T V⁻¹ Y = (1/b) X'_T Y
  and the GLS estimator of the expected values is T̄_{n×1} = M_T Y, where M_T = X_T (X'_T X_T)⁻¹ X'_T.
Proof of theorem XI.16

• From theorem XI.14 (GLS estimator), the GLS estimator of τ is given by
      τ̂ = (X'_T V⁻¹ X_T)⁻¹ X'_T V⁻¹ Y
  the generalized inverse being replaced by an inverse here because X_T is of full rank.
• To prove the second part of the result, we
  i.   derive an expression for V⁻¹,
  ii.  use this to obtain X'_T V⁻¹ X_T,
  iii. invert the last result, and
  iv.  show that (X'_T V⁻¹ X_T)⁻¹ X'_T V⁻¹ Y = (1/b) X'_T Y.
Proof of theorem XI.16 (continued)

• We start by obtaining an expression for V in terms of Q_G, Q_B and Q_BU as these form a complete set of mutually orthogonal idempotents.
• Now,
      V = σ²_B S_B + σ²_BU S_BU = t σ²_B M_B + σ²_BU M_BU
• But, from section IV.C, Hypothesis testing using the ANOVA method for an RCBD, we know that
      Q_G = M_G,   Q_B = M_B − M_G,   Q_BU = M_BU − M_B
• so that
      M_B = Q_B + Q_G,   M_BU = Q_BU + Q_B + Q_G
Proof of theorem XI.16 (continued)

• Consequently
      V = t σ²_B M_B + σ²_BU M_BU
        = t σ²_B (Q_B + Q_G) + σ²_BU (Q_BU + Q_B + Q_G)
        = (t σ²_B + σ²_BU)(Q_B + Q_G) + σ²_BU Q_BU
        = η_B (Q_B + Q_G) + η_BU Q_BU
  where η_B = t σ²_B + σ²_BU and η_BU = σ²_BU.
• Since Q_G, Q_B and Q_BU form a complete set of mutually orthogonal idempotents, we use lemma XI.7 (CSMOI inverse) to obtain
      V⁻¹ = η_B⁻¹ (Q_G + Q_B) + η_BU⁻¹ Q_BU
          = η_B⁻¹ (M_G + M_B − M_G) + η_BU⁻¹ (M_BU − M_B)
          = (η_B⁻¹ − η_BU⁻¹) M_B + η_BU⁻¹ M_BU
          = (η_B⁻¹ − η_BU⁻¹)(1/t)(I_b ⊗ J_t) + η_BU⁻¹ (I_b ⊗ I_t)
  and we have our expression for V⁻¹.
Proof of theorem XI.16 (continued)
• Next, to derive an expression for X_T′ V⁻¹ X_T, note that X_T = 1_b ⊗ I_t and so, using lemma XI.8,
    (1_b ⊗ I_t)′ (J_b ⊗ J_t) = b (1_b′ ⊗ J_t),
    (1_b ⊗ I_t)′ (I_b ⊗ J_t) = 1_b′ ⊗ J_t,
    (1_b ⊗ I_t)′ (I_b ⊗ I_t) = 1_b′ ⊗ I_t,
    1_b′ 1_b = b.
• We then see that
    X_T′ V⁻¹ X_T = (1_b ⊗ I_t)′ [ (λ_B⁻¹ − λ_BU⁻¹)(1/t)(I_b ⊗ J_t) + λ_BU⁻¹ (I_b ⊗ I_t) ] (1_b ⊗ I_t)
                 = [ (λ_B⁻¹ − λ_BU⁻¹)(1/t)(1_b′ ⊗ J_t) + λ_BU⁻¹ (1_b′ ⊗ I_t) ] (1_b ⊗ I_t)
                 = (λ_B⁻¹ − λ_BU⁻¹)(1/t) b J_t + λ_BU⁻¹ b I_t
                 = b [ (λ_B⁻¹ − λ_BU⁻¹)(1/t) J_t + λ_BU⁻¹ I_t ].
Proof of theorem XI.16 (continued)
• Thirdly, to find the inverse of this:
  – note that I_t − (1/t)J_t and (1/t)J_t are two symmetric idempotent matrices whose sum is I_t and whose product is zero,
  – so that they form a complete set of mutually-orthogonal idempotents — see also lemma XI.4 (special inverse).
• Now
    (X_T′ V⁻¹ X_T)⁻¹ = { b [ (λ_B⁻¹ − λ_BU⁻¹)(1/t) J_t + λ_BU⁻¹ I_t ] }⁻¹
                     = (1/b) [ λ_B⁻¹ (1/t) J_t + λ_BU⁻¹ (I_t − (1/t) J_t) ]⁻¹
                     = (1/b) [ λ_B (1/t) J_t + λ_BU (I_t − (1/t) J_t) ].
Proof of theorem XI.16 (continued)
• Finally,
    τ̂ = (X_T′ V⁻¹ X_T)⁻¹ X_T′ V⁻¹ Y
       = (1/b) [ λ_B (1/t) J_t + λ_BU (I_t − (1/t) J_t) ] [ (λ_B⁻¹ − λ_BU⁻¹)(1/t)(1_b′ ⊗ J_t) + λ_BU⁻¹ (1_b′ ⊗ I_t) ] Y
       = (1/b) [ λ_B (λ_B⁻¹ − λ_BU⁻¹)(1/t)(1_b′ ⊗ J_t) + λ_B λ_BU⁻¹ (1/t)(1_b′ ⊗ J_t) + 1_b′ ⊗ (I_t − (1/t) J_t) ] Y
            (note that (1/t) J_t J_t = J_t and, using the CSMOI property, (I_t − (1/t) J_t)(1_b′ ⊗ J_t) = 0)
       = (1/b) [ (1/t)(1_b′ ⊗ J_t) + (1_b′ ⊗ I_t) − (1/t)(1_b′ ⊗ J_t) ] Y
       = (1/b) (1_b′ ⊗ I_t) Y
       = (1/b) X_T′ Y.
Proof of theorem XI.16 (continued)
• The GLS estimator of the expected values is then given by
    ψ̂ = X_T τ̂ = (1/b) X_T X_T′ Y.
• But, as X_T′ X_T = (1_b ⊗ I_t)′ (1_b ⊗ I_t) = b I_t and, from corollary XI.1 (expected values for a single-term model),
    M_T = X_T (X_T′ X_T)⁻¹ X_T′ = X_T (b I_t)⁻¹ X_T′ = b⁻¹ X_T X_T′,
  we have
    ψ̂ = (1/b) X_T X_T′ Y = M_T Y = T.
• So the estimator of the expected values under the model
    ψ = E[Y] = X_T τ  and  V = σ_B² (I_b ⊗ J_t) + σ_BU² (I_b ⊗ I_t)
  is just the treatment means, the same as for the simple linear model with E[Y] = X_T τ and V = σ_BU² (I_b ⊗ I_t).
• This result is generally true for orthogonal experiments, as stated in the next theorem, which we give without proof.
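• The following sketch (block and treatment numbers, variance components and data all invented for illustration) confirms theorem XI.16 numerically: under the random-Blocks variance matrix the GLS estimate of τ equals b⁻¹ X_T′ y, the vector of treatment means.

import numpy as np

b, t = 3, 4
XT = np.kron(np.ones((b, 1)), np.eye(t))            # X_T = 1_b x I_t
sigB2, sigBU2 = 2.0, 1.0                            # illustrative variance components
V = sigB2 * np.kron(np.eye(b), np.ones((t, t))) + sigBU2 * np.eye(b * t)

rng = np.random.default_rng(2)
y = rng.multivariate_normal(XT @ np.arange(1.0, t + 1), V)

Vinv = np.linalg.inv(V)
tau_gls = np.linalg.solve(XT.T @ Vinv @ XT, XT.T @ Vinv @ y)  # GLS estimate
tau_means = XT.T @ y / b                                      # (1/b) X_T' y = treatment means
assert np.allclose(tau_gls, tau_means)
print(tau_means)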
Orthogonal experiments
• Definition XI.11: An orthogonal experiment is one for which Q_F Q_F* equals either Q_F, Q_F* or 0 for all F, F* in the experiment.
• All the experiments in these notes are orthogonal.
• Theorem XI.17: For orthogonal experiments with a variance matrix that can be expressed as a linear combination of a complete set of mutually-orthogonal idempotents, the GLS estimator is the same as the OLS estimator for the same expectation model.
• Proof: not given
• This theorem applies to all orthogonal experiments for which all the variance terms correspond to unrandomized generalized factors.
XI.D Maximum likelihood estimation of
the expectation parameters
• Maximum likelihood estimation: based on
obtaining the values of the parameters that
make the data most 'likely'.
• Least squares estimation minimizes the
sum of squares of the differences between
the data and the expected values.
• Need likelihood function to maximize.
The likelihood function
• Definition XI.12: The likelihood function is the joint distribution function, f(y; θ), of n random variables Y1, Y2, …, Yn evaluated at y = (y1, y2, …, yn).
• For fixed y the likelihood function is a function of the k parameters, θ = (θ1, θ2, …, θk), and is denoted by L(θ; y).
• If Y1, Y2, …, Yn represents a random sample from f(y; θ) then L(θ; y) = f(y; θ).
• The notation f(y; θ) indicates that
  – y is the vector of variables for the distribution function,
  – the values of θ are fixed for the population under consideration.
• This is read 'the function f of y given θ'.
• The likelihood function reverses the roles:
  – the variables are θ and y is considered fixed,
  – so we have the likelihood of θ given y.
Maximum likelihood estimates
• Definition XI.13: Let L(θ; y) = f(y; θ) be the likelihood function for Y1, Y2, …, Yn.
• For a given set of observations, y, the value θ̂ that maximizes L(θ; y) is called the maximum likelihood estimate (MLE) of θ.
• An expression for this estimate as a function of y can be derived.
• The maximum likelihood estimators are defined to be the same function as the estimate, with Y substituted for y.
Procedure for obtaining the
maximum likelihood estimators
a) Write the likelihood function L(θ; y) as the joint distribution function for the n observations.
b) Find ln[L(θ; y)].
c) Maximize ln[L(θ; y)] with respect to θ to derive the maximum likelihood estimates.
d) Obtain the maximum likelihood estimators.
• The quantity ln[L(θ; y)] is referred to as the log likelihood.
• Maximizing ln[L(θ; y)] is equivalent to maximizing L(θ; y) since ln is a monotonic function.
• It will turn out that the expression for ln[L(θ; y)] is often simpler than that for L(θ; y).
Probability distribution function
• Now the general linear model for a response variable is
    ψ = E[Y] = Xθ  and  var[Y] = V_Y = Σ_F σ_F² S_F = Σ_F λ_F Q_F.
• We have not previously specified a probability distribution function to be used as part of the model.
• However, one is needed for maximum likelihood estimation (least squares estimation does not require one).
• A distribution function commonly used for continuous variables is the multivariate normal distribution.
• Definition XI.14: The multivariate normal distribution function for Y is
    f(y; ψ, V) = (2π)^(−n/2) |V|^(−1/2) exp{ −½ (y − ψ)′ V⁻¹ (y − ψ) },
  where
  – Y is a random vector with E[Y] = ψ = Xθ,
  – X is an n × q matrix of rank m ≤ q with n ≥ q,
  – θ is a q × 1 vector of unknown parameters, and var[Y] = V.
Deriving the maximum likelihood estimators of θ
• Theorem XI.18: Let Y be a random vector whose distribution is multivariate normal.
• Then the maximum likelihood estimator of θ is given by
    θ̂ = (X′V⁻¹X)⁻ X′V⁻¹Y.
Proof of theorem XI.18
The likelihood function L(θ; y)
• Given the distribution function for Y, the likelihood function L(θ; y) is
    L(θ; y) = (2π)^(−n/2) |V|^(−1/2) exp{ −½ (y − Xθ)′ V⁻¹ (y − Xθ) }.
Find ln[L(θ; y)]
• Taking logarithms,
    ln L(θ; y) = ln[ (2π)^(−n/2) |V|^(−1/2) exp{ −½ (y − Xθ)′ V⁻¹ (y − Xθ) } ]
               = −(n/2) ln(2π) − ½ ln|V| − ½ (y − Xθ)′ V⁻¹ (y − Xθ).
Proof of theorem XI.18 (continued)
Maximize with respect to θ
• The maximum likelihood estimates of θ are then obtained by maximizing ln L(θ; y) with respect to θ, that is, by differentiating with respect to θ and setting the result equal to zero.
• Clearly, the maximum likelihood estimates will be the solution of
    ∂[(y − ψ)′ V⁻¹ (y − ψ)] / ∂θ = ∂[(y − Xθ)′ V⁻¹ (y − Xθ)] / ∂θ = 0.
• Now,
    (y − Xθ)′ V⁻¹ (y − Xθ) = y′V⁻¹y − 2θ′X′V⁻¹y + θ′X′V⁻¹Xθ,
Proof of theorem XI.18 (continued)
• and applying the rules for differentiation given in the proof
of theorem XI.1, we obtain
    ∂[y′V⁻¹y − 2θ′X′V⁻¹y + θ′X′V⁻¹Xθ] / ∂θ = −2X′V⁻¹y + X′V⁻¹Xθ + (X′V⁻¹X)′θ
                                             = −2X′V⁻¹y + 2X′V⁻¹Xθ.
• Setting this derivative to zero we obtain
    −2X′V⁻¹y + 2X′V⁻¹Xθ = 0,  or  X′V⁻¹Xθ = X′V⁻¹y.
Obtain the maximum likelihood estimators
• Hence, θ̂ = (X′V⁻¹X)⁻ X′V⁻¹Y, as claimed.
So for general linear models the maximum likelihood and generalized least squares estimators coincide.
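• As an illustration of this conclusion (design, V and data invented), the sketch below maximizes the multivariate-normal log likelihood numerically over θ and compares the result with the closed-form GLS/ML estimator; scipy.optimize.minimize is applied to the negative log likelihood.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(8), np.arange(8.0)])   # hypothetical design matrix
V = 0.3 * np.ones((8, 8)) + np.eye(8)               # hypothetical variance matrix
y = rng.multivariate_normal(X @ np.array([1.0, 0.5]), V)

negloglik = lambda theta: -multivariate_normal.logpdf(y, mean=X @ theta, cov=V)
theta_ml = minimize(negloglik, x0=np.zeros(2)).x    # numerical MLE of theta

Vinv = np.linalg.inv(V)
theta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # closed-form GLS = ML
print(theta_ml, theta_gls)                                   # agree to optimizer tolerance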
XI.E Estimating the variance
a) Estimating the variance for the simple linear model
• As stated in theorem XI.11, var[θ̂] = σ² (X′X)⁻¹, and so to determine the variance of our OLS estimates we need to know σ².
• Generally, it is unknown and has to be estimated.
• Now the definition of the variance is
    var[Y] = E[(Y − E[Y])²].
• The logical estimator is one that parallels this definition:
  – put in our estimate for E[Y] and
  – replace the first expectation operator by the mean.
Fitted values and residuals
• Definition XI.15: The fitted values are:
  – the estimated expected values for each observation, Ê[Y_i];
  – obtained by substituting the x-values for each observation into the estimated equation;
  – given by Xθ̂.
• The residuals are:
  – the deviations of the observed values of the response variable from the fitted values;
  – denoted by e = y − Xθ̂;
  – the estimates of ε = Y − Xθ.
Estimator of σ²
• The logical estimator of σ² is:
    σ̂_n² = (Y − Xθ̂)′(Y − Xθ̂) / n = ε̂′ε̂ / n,
• and our estimate would be e′e/n.
• Notice that the numerator of this expression measures the spread of observed Y-values from the fitted equation.
• We expect this to be random variation.
• It turns out that this is the ML estimator, as claimed in the next theorem.
• Theorem XI.19: Let Y be a normally distributed random vector representing a random sample with E[Y] = Xθ and var[Y] = V_Y = σ²I_n, where X is an n × q matrix of rank m ≤ q with n ≥ q, and θ is a q × 1 vector of unknown parameters.
• The maximum likelihood estimator of σ² is denoted by σ̂_n² and is given by
    σ̂_n² = (Y − Xθ̂)′(Y − Xθ̂) / n = ε̂′ε̂ / n.
• Proof: left as an exercise
Is the estimator unbiased?
• Theorem XI.20: Let
    σ̂_n² = (Y − Xθ̂)′(Y − Xθ̂) / n,
  with
  – Y the n × 1 random vector of sample random variables,
  – X an n × q matrix of rank m ≤ q with n ≥ q,
  – θ̂ a q × 1 vector of estimators of θ.
• Then
    E[σ̂_n²] = [(n − q)/n] σ²,
• so an unbiased estimator of σ² is
    σ̂_{n−q}² = (Y − Xθ̂)′(Y − Xθ̂) / (n − q).
• Proof: The proof of this theorem is postponed.
• The latter estimator indicates how we might proceed more generally,
  – once it is realized that this estimator is the one that would be obtained using the Residual MSq from an ANOVA based on simple linear models.
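• A short simulation sketch (the design and the true value σ² = 4 are assumed purely for illustration) contrasting the ML estimator ε̂′ε̂/n with the unbiased estimator ε̂′ε̂/(n − q):

import numpy as np

rng = np.random.default_rng(4)
n, q, sigma2 = 30, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, q - 1))])  # full-rank design
theta = np.array([1.0, 2.0, -1.0])
y = X @ theta + rng.normal(scale=np.sqrt(sigma2), size=n)

theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimate
e = y - X @ theta_hat                              # residuals
print(e @ e / n)        # ML estimate sigma^2_n (biased downwards)
print(e @ e / (n - q))  # unbiased estimate sigma^2_{n-q} (the Residual MSq)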
b) Estimating the variance for the
general linear model
• Generally, the variance components (σ²s) that are the parameters of the variance matrix, V, can be estimated using the ANOVA method.
• Definition XI.16: The ANOVA method for estimating the variance components consists of equating the expected mean squares to the observed values of the mean squares and solving for the variance components.
Example IV.1 Penicillin yield
• This example involves 4 treatments applied using an
RCBD employing 5 Blends as blocks.
• Suppose it was decided that the Blends are random.
• The ANOVA table, with E[MSq]s, for this case is
  Source            df   MSq    E[MSq]                F      Prob
  Blends             4   66.0   σ_BF² + 4σ_B²         3.50   0.041
  Flasks[Blends]    15
    Treatments       3   23.3   σ_BF² + q_T(ψ)        1.24   0.339
    Residual        12   18.8   σ_BF²
• The estimates of σ_B² and σ_BF² are obtained as follows:
    σ̂_BF² = 18.8,
    σ_BF² + 4σ_B² = 66.0,  so that  σ̂_B² = (66.0 − σ̂_BF²) / 4 = (66.0 − 18.8) / 4 = 11.8.
• That is, the magnitude of the two components of variation is similar.
Example IV.1 Penicillin yield
• Now
    var[Y_ijk] = σ_BF² + σ_B²
  (see the variance matrix in section XI.A, Linear models for designed experiments, above),
• so that the estimated
    var[Y_ijk] = σ̂_BF² + σ̂_B² = 18.8 + 11.8 = 30.6.
• Clearly, the two components make similar contributions to the variance of an observation.
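• The hand calculation above amounts to solving two linear equations in the variance components; the short sketch below (using the mean squares from the table, with 4 flasks per blend as in the example) just reproduces it.

# ANOVA method for the penicillin RCBD (mean squares taken from the table above)
msq_blends, msq_residual, g = 66.0, 18.8, 4

sigBF2_hat = msq_residual                   # E[Residual MSq] = sigma_BF^2
sigB2_hat = (msq_blends - sigBF2_hat) / g   # E[Blends MSq]  = sigma_BF^2 + 4 sigma_B^2
var_obs = sigBF2_hat + sigB2_hat            # estimated var[Y_ijk]
print(sigBF2_hat, sigB2_hat, var_obs)       # 18.8, 11.8, 30.6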
XI.G Exercises
• Ex. 11-1 asks you to provide matrix expressions for
models and the estimator of the expected values.
• Ex. 11-2 involves a generalized inverse and its use.
• Ex. 11-3 asks you to derive the variance of the estimator
of the expected values.
• Ex. 11-4 involves identifying estimable parameters and
the estimator of expected values for a factorial
experiment.
• Ex. 11-5 asks for the E[MSq], matrix expressions for V and estimates of variance components.
• Ex. 11-6 involves max. likelihood estimation (not
covered).
• Ex. 11-7 requires the proofs of some properties of Q and
M matrices for the RCBD.