SOME CONSIDERATIONS ABOUT THE EFFECT OF REDUNDANCIES AND
RESTRICTIONS IN THE GENERAL LINEAR REGRESSION MODEL
by
Gary G. Koch
University of North Carolina
Institute of Statistics Mimeo Series No. 459
February 1966
This research was supported by the National Institutes of
Health Institute of General Medical Sciences Grant No.
GM-12868-02.
The following paper represents a further study of topics considered
in Statistics 150 and Statistics 254 at the University of North Carolina
at Chapel Hill and Statistics 691 at North Carolina State University at
Raleigh. Its primary purpose is to investigate the effect of restrictions
and redundancies in linear models. An additional topic considered is the
effect of a variance-covariance matrix which is different from the
identity matrix and may be singular.
In the paper, Section 2 is based upon the notes of R. C. Bose for
Statistics 150; Section 6, upon the notes of R. C. Bose for Statistics
254; Section 4, upon the notes of H. L. Lucas for Statistics 691. Results
in the other sections are discussed also in Rao [1965].
DEPARTMENT OF STATISTICS
UNIVERSITY OF NORTH CAROLINA
Chapel Hill, N. C.
ACKNOWLEDGMENTS
The author is indebted to R. C. Bose for providing him with
the tools with which to deal with linear models and to H. L. Lucas
for stimulating his interest in linear models subject to restrictions.
Some Considerations About the Effect of Redundancies and
Restrictions in the General Linear Regression Model
1. Introduction.
In this paper, we consider the ways in which the presence of
redundancies in the design matrix and/or restrictions on the parameters of
a general linear regression model influence the class of estimable functions
of the parameters.
Depending on the model, this class may be partitioned
into disjoint subsets to which correspond disjoint uncorrelated sets of
linear functions of the observations.
The sets of these partitions play a fundamental role in the construction of best linear unbiased estimators (b.l.u.e.) of estimable linear functions of the parameters and in the construction of tests of linear hypotheses involving estimable functions.
Finally, the theory developed provides a means of dealing with linear
models in which the variance-covariance matrix of the observations may be
singular.
It should be noted that most of the results given here are already well-known (see Lucas [1965], Rao [1945b, 1946, 1965], Plackett [1960]). However, the paper is of interest since its principal purpose is an interpretation of these results.
2. The non-restricted model. Let Y₁, Y₂, …, Yₙ be uncorrelated random variables with a common unknown variance σ² and with expected values which are linear functions with known coefficients of m unknown parameters β₁, β₂, …, βₘ as follows:

(1)  E(Y₁) = a₁₁β₁ + a₂₁β₂ + … + aₘ₁βₘ
     E(Y₂) = a₁₂β₁ + a₂₂β₂ + … + aₘ₂βₘ
     ...
     E(Yₙ) = a₁ₙβ₁ + a₂ₙβ₂ + … + aₘₙβₘ

where the aᵢⱼ's are known constants. Let Y = Y(n × 1) denote the random column vector

(2)  Y = (Y₁, Y₂, …, Yₙ)'

and let β = β(m × 1) denote the column vector of unknown parameters β = (β₁, β₂, …, βₘ)'. Also let A = A(m × n) denote the matrix

     A = ( a₁₁  a₁₂  …  a₁ₙ )
         ( a₂₁  a₂₂  …  a₂ₙ )
         (  …    …   …   …  )
         ( aₘ₁  aₘ₂  …  aₘₙ )

The model can now be compactly written as

(3)  E(Y) = A'β,  Var(Y) = Iₙσ²;

the matrix A' is called the design matrix and is assumed to have rank n₀ ≤ m < n. The following approach to linear estimation in the model specified by (3) is due to Bose [1950].
Definition: A linear function of β₁, β₂, …, βₘ will be said to be (linearly) estimable with respect to the model (3) if there exists a linear function of Y₁, Y₂, …, Yₙ which is an unbiased estimator of it.

Thus, if ℓ' = (ℓ₁, ℓ₂, …, ℓₘ), then ℓ'β is estimable if there exists c' = (c₁, c₂, …, cₙ) such that c'Y is an unbiased estimator of it. Hence, we see that ℓ'β will be estimable if and only if there exists c' such that E(c'Y) = c'A'β = ℓ'β. As a result, we see that ℓ'β will be estimable if and only if the equations Ac = ℓ have a solution, which will be the case if and only if ℓ' lies in the vector space spanned by the rows of the design matrix A'. As a result, we see that every estimable function of the parameters must be of the form c'A'β = E(c'Y).

Definition: A linear function of Y₁, Y₂, …, Yₙ is said to belong to error if its expectation vanishes independently of the parameters; i.e., if e' = (e₁, e₂, …, eₙ) and e'Y belongs to error, then E(e'Y) = e'A'β = 0 for all values of β.

Thus, we see that e'Y belongs to error if and only if Ae = 0; i.e., if and only if e' is orthogonal to the rows of A. With this in mind, we call the set of (1 × n) vectors orthogonal to the rows of A the error space and the corresponding set of linear functions of Y₁, Y₂, …, Yₙ the error set. Since the rank of A is n₀, there exist nₑ = n − n₀ independent (1 × n) vectors orthogonal to the rows of A; thus the rank of the error space is nₑ.
Definition: A linear function of Y₁, Y₂, …, Yₙ is said to be an estimating function if it is uncorrelated with all linear functions belonging to error; i.e., if c' = c'(1 × n) and c'Y is an estimating function, then Cov(c'Y, e'Y) = c'e σ² = 0 for every e'Y belonging to error.

The set of (1 × n) vectors which are the coefficient vectors of estimating functions is called the estimation space and the corresponding set of linear functions of Y₁, Y₂, …, Yₙ is called the estimation set. For the model specified by (3), we see that c'Y is an estimating function if c' is orthogonal to all vectors in the error space, which will be the case if and only if c' lies in the vector space spanned by the rows of A. Thus, in this case, the rows of A span the estimation space. Finally, we note that the estimation space and the error space are disjoint (i.e., the only vector they have in common is the zero vector) since vectors which are orthogonal to one another are necessarily linearly independent.
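These definitions can be made concrete numerically. The following sketch (an illustration added here, not part of the original text) assumes NumPy is available and uses a small rank-deficient matrix A of my own choosing; it tests estimability of ℓ'β by testing whether the equations Ac = ℓ are consistent, and exhibits a basis of the error space:

```python
import numpy as np

# m = 3 parameters, n = 4 observations; A has rank n0 = 2 because its
# third row is the sum of the first two (a redundancy).
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
n0 = np.linalg.matrix_rank(A)            # n0 = 2

def is_estimable(l):
    # l'beta is estimable iff A c = l is consistent, i.e. iff l lies in
    # the vector space spanned by the columns of A (rows of A').
    c, *_ = np.linalg.lstsq(A, l, rcond=None)
    return bool(np.allclose(A @ c, l))

print(is_estimable(np.array([1., 0., 1.])))   # beta1 + beta3: True
print(is_estimable(np.array([0., 0., 1.])))   # beta3 alone: False

# The error space is the null space of A; its rank is n_e = n - n0.
_, _, Vt = np.linalg.svd(A)
E1 = Vt[n0:]                              # basis rows of the error space
assert np.allclose(A @ E1.T, 0.0)
print(E1.shape[0])                        # n_e = 2
```

Here β₁ + β₃ is estimable because (1, 0, 1) is a combination of the columns of A, while β₃ alone is not.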
We now quote some useful lemmas, due to Bose [1950], about vectors, matrices, and linear equations.

Lemma 1: Given a matrix A = A(m × n), the matrix A^g = A^g(n × m) is defined to be a conditional (generalized) inverse of A if AA^gA = A. For every matrix A, there exists a conditional inverse A^g.

Lemma 2: If the equations Au = v are consistent, then u₀ = A^g v is a solution if A^g is any conditional inverse of A. Conversely, if u₀ = A*v is a solution of the equations Au = v for every v for which they are consistent, then A* is a conditional inverse of A.

Lemma 3: The equations AA'λ = AY for determining λ have a solution, and A'λ is uniquely determined.

Lemma 4: If A = A(m × n), we can find X = X(m × n) such that AA'X = A; in fact, X is given by X = (AA')^g A; moreover, A'X = A'(AA')^g A is uniquely determined, symmetric, and idempotent.
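The content of Lemmas 1 and 4 can be checked numerically. In the sketch below (NumPy assumed; the Moore-Penrose pseudoinverse is used as one convenient conditional inverse, and A is an illustrative matrix, not one from the paper) we verify the defining property of a conditional inverse of AA' and the symmetry and idempotency of A'(AA')^g A:

```python
import numpy as np

# A rank-deficient A as in the text: m = 3 parameters, n = 4 observations,
# rank 2 (the third row is the sum of the first two).
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])

# The Moore-Penrose pseudoinverse is one particular conditional inverse
# of AA' (Lemma 1): AA' (AA')^g AA' = AA'.
AAg = np.linalg.pinv(A @ A.T)
assert np.allclose(A @ A.T @ AAg @ A @ A.T, A @ A.T)

# Lemma 4: A'(AA')^g A is symmetric and idempotent.
P = A.T @ AAg @ A
assert np.allclose(P, P.T)             # symmetric
assert np.allclose(P @ P, P)           # idempotent
print(np.round(P, 6))
```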
From the above lemmas and the concepts of estimability, error space, and estimation space, the following fundamental theorem of linear estimation in the model specified by (3) may be proven.

Theorem 1: If ℓ'β is any estimable function of the parameters β₁, β₂, …, βₘ, then

(i) there exists a unique linear function c'Y of the random variables Y₁, Y₂, …, Yₙ such that c' belongs to the estimation space and E(c'Y) = ℓ'β; also, the variance of c'Y is smaller than the variance of any other linear unbiased estimator of ℓ'β;

(ii) the best estimator of ℓ'β may be written as either

(a) q'AY where q'AA' = ℓ', or

(b) ℓ'β̂ where β̂ is a value of β which minimizes the sum of squares S² = (Y − A'β)'(Y − A'β); i.e., β̂ satisfies the equations AA'β̂ = AY;

(iii) the variance of the best estimator of ℓ'β is uniquely determined and given by ℓ'(AA')^g ℓ σ².

For a proof of Theorem 1, the reader is referred to Bose [1950]. However, certain basic parts of the proof will be duplicated in the derivation of the results of Sections 3 and 4.
Let us now consider the problem of finding a solution to the equations

(4)  AA'β̂ = AY

which are called the normal equations. From Lemmas 1, 2, 3, it follows that the equations (4) have a solution which is given by

(5)  β̂ = (AA')^g AY.

Alternatively, we may obtain a solution of (4) as follows. Since A has rank n₀, there exist (m − n₀) linearly independent (m × 1) column vectors which are linearly independent of the columns of A (if n₀ = m, what follows is formally correct but has a vacuous meaning). Let these column vectors form the columns of a matrix R = R(m × (m − n₀)). We may then say that the matrix B = B(m × m) defined by

(6)  B = [A  R][A  R]' = [AA' + RR']

has rank m and hence is non-singular. If we let t = t((m − n₀) × 1) be a vector of arbitrary constants, then we may consider the equations

(7)  AA'β = AY,  R'β = t

which may be solved by noting that RR'β = Rt and thus

(8)  Bβ = [AA' + RR']β = AY + Rt.

Hence, a solution of (7) and also a solution of (4) is given by

(9)  β̂ = B⁻¹(AY + Rt).

Let Q = Q(n₀ × m) be defined to be a matrix of rank n₀ whose rows are orthogonal to the columns of R; i.e., QR = 0(n₀ × (m − n₀)). If ℓ' is a linear function of the rows of QB, then there exists b' = b'(1 × n₀) such that ℓ' = b'QB and

(10)  ℓ'β̂ = b'QBβ̂ = b'Q(AY + Rt) = b'QAY.

Since QB = Q[AA' + RR'] = QAA', it follows that linear functions of the rows of QB are linear functions of the rows of A'. Conversely, since B is non-singular, the rank of QB is n₀, which implies that the rank of QA is n₀. Hence, the rows of QB represent n₀ linearly independent linear functions of the rows of A' and thus form a basis of the vector space generated by the rows of A'. As a result, it follows that ℓ'β is estimable if and only if it can be written as a linear combination of the rows of QB, in which case its best estimate is given by (10). Also, one can see that from the standpoint of estimation of estimable functions, we may take t = 0 without any loss of generality; in this case (9) becomes

(11)  β̂ = B⁻¹AY

which implies that B⁻¹ may be taken as (AA')^g and that A'B⁻¹A is uniquely determined (in the sense that it does not depend on R).
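The construction of B in (6) and the solution (11) can be illustrated as follows (NumPy assumed; A, the true β, and hence Y are illustrative choices of mine, with the columns of R taken as a basis of the space orthogonal to the columns of A):

```python
import numpy as np

# Rank-deficient model: m = 3, n = 4, rank n0 = 2, with noiseless Y
# generated from an illustrative "true" beta for a deterministic check.
A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
n0 = np.linalg.matrix_rank(A)

beta_true = np.array([2., -1., 0.5])
Y = A.T @ beta_true

# Columns of R: a basis of the null space of A' (vectors r with r'A = 0').
_, _, Vt = np.linalg.svd(A.T)
R = Vt[n0:].T                            # m x (m - n0)
assert np.allclose(A.T @ R, 0.0)

# B = AA' + RR' has full rank m, and beta_hat = B^{-1} A Y solves the
# normal equations AA' beta_hat = A Y, as in (11).
B = A @ A.T + R @ R.T
beta_hat = np.linalg.solve(B, A @ Y)
assert np.allclose(A @ A.T @ beta_hat, A @ Y)

# beta_hat itself depends on the choice of R, but any estimable l'beta
# is recovered exactly: here l' = (1, 0, 1), a combination of columns of A.
l = np.array([1., 0., 1.])
print(l @ beta_hat, l @ beta_true)       # both print 2.5 (up to rounding)
```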
Finally, we note that a particular choice of R has a meaningful interpretation when we take t = 0.

Definition: The vector r' = r'(1 × m) is said to correspond to a redundancy in the model (3) if it is orthogonal to the columns of A; i.e., r'A = 0'.

Hence redundancies are in a 1-1 correspondence with linear relationships among the rows of A. Also, we see that all (1 × m) vectors corresponding to redundancies are independent of the rows of A' and thus are not estimable. As a result, we may take the rows of R' to be (m − n₀) independent (1 × m) vectors corresponding to redundancies. If this is done, the equations (7) may be written as

(12)  AA'β = AY,  R'β = 0

where the rows of AA' and the rows of R' are orthogonal; equations of the form (12) arise in many applied problems (e.g., those involving analysis of variance models).
3. Models subject to non-estimable restrictions. Suppose now that we know that the parameters of the model specified by (3) satisfy s₁ ≤ (m − n₀) linear restrictions of the form

(13)  R₁'β = t₁

where R₁' = R₁'(s₁ × m) and t₁ = t₁(s₁ × 1) are known. Without loss of generality, we may assume that the rows of R₁' are linearly independent.
We further assume that the restrictions (13) are non-estimable.

Definition: The restrictions specified by (13) are said to be non-estimable if the rows of R₁' are linearly independent of the rows of A'.

We now compactly write the model (3) subject to (13) as

(14)  E(Y) = A'β,  Var(Y) = Iₙσ²,  t₁ = R₁'β

where the matrix [A  R₁] has rank (n₀ + s₁).

Let φ = ℓ'β be a linear function of the parameters. We will say that φ is estimable with respect to the model (14) if there exists c' = c'(1 × n) and d' = d'(1 × s₁) such that E(c'Y + d't₁) = c'A'β + d'R₁'β = ℓ'β. If φ is estimable, φ̂ = c'Y + d't₁ is its best estimate provided E(φ̂) = ℓ'β and Var(φ̂) = c'c σ² is a minimum; we can find φ̂ by the method of Lagrangian multipliers to minimize c'c subject to c'A' + d'R₁' = ℓ'. Let

(15)  v = c'c − 2λ'(Ac + R₁d − ℓ)

where λ' = λ'(1 × m) is the vector of Lagrangian multipliers. Differentiating with respect to the elements of c, d, λ, we obtain the following system of equations for the stationary values of v:

(16)  c = A'λ,  0 = R₁'λ,  Ac + R₁d = ℓ.
We may obtain a solution of the equations (16) as follows. First, c = A'λ implies Ac = AA'λ and hence λ = (AA')^g Ac; also, 0 = R₁'λ = R₁'(AA')^g Ac. Multiplying the third equation of (16) by R₁'(AA')^g then gives

(17)  R₁'(AA')^g Ac + R₁'(AA')^g R₁d = R₁'(AA')^g ℓ,  i.e.,  R₁'(AA')^g R₁d = R₁'(AA')^g ℓ.

One possible choice for (AA')^g is the matrix B⁻¹, where B = B(m × m) is defined by

(18)  B = [A  R₁  R₂][A  R₁  R₂]' = AA' + R₁R₁' + R₂R₂'

and R₂ = R₂(m × (m − n₀ − s₁)) is any matrix whose columns are independent of the columns of A and the columns of R₁. It follows that B has rank m and is non-singular. Hence a solution of (16) is given by

(19)  d = [R₁'B⁻¹R₁]⁻¹ R₁'B⁻¹ℓ,
      c = A'λ = A'B⁻¹(ℓ − R₁d) = A'B⁻¹ℓ − A'B⁻¹R₁[R₁'B⁻¹R₁]⁻¹R₁'B⁻¹ℓ.

Hence, we have that

(20)  φ̂ = c'Y + d't₁ = ℓ'B⁻¹AY − ℓ'B⁻¹R₁[R₁'B⁻¹R₁]⁻¹(R₁'B⁻¹AY − t₁).

If ℓ' = d*'R₁' where d*' = d*'(1 × s₁), then φ̂ = d*'t₁ and Var(φ̂) = 0. On the other hand, suppose there exists b' = b'(1 × n₀) such that

(21)  ℓ' = b'QB

where Q = Q(n₀ × m) is defined to be a matrix of rank n₀ whose rows are orthogonal to the columns of R = R(m × (m − n₀)) defined by R = [R₁  R₂];
i.e., QR = [QR₁  QR₂] = 0(n₀ × (m − n₀)). Then, we would have by substituting (21) into (20)

(22)  φ̂ = b'QBB⁻¹AY − b'QBB⁻¹R₁(R₁'B⁻¹R₁)⁻¹(R₁'B⁻¹AY − t₁)
        = b'QBB⁻¹AY − b'QR₁(R₁'B⁻¹R₁)⁻¹(R₁'B⁻¹AY − t₁)
        = b'QBB⁻¹AY
        = ℓ'B⁻¹AY;

but again we recognize that

(23)  QB = Q[AA' + R₁R₁' + R₂R₂'] = QAA';

and hence, by an argument similar to that following (10), we observe that (21) holds if and only if ℓ'β is estimable with respect to the basic model (3). Thus, because of (10), (22), and the uniqueness of A'(AA')^g A, it follows that if ℓ'β is estimable with respect to (3), then its best estimate with respect to the model (14) coincides with its best estimate with respect to the model (3). Hence, we see that the non-estimable restrictions specified by (13) have no influence on the construction of b.l.u.e.'s of linear functions which are estimable with respect to (3). Their only effect is to widen the class of estimable functions from an n₀-dimensional space spanned by the rows of A' to an (n₀ + s₁)-dimensional space spanned by the rows of A' and R₁'. For the linear functions of the parameters belonging to this wider class (i.e., those estimable with respect to (14)), the b.l.u.e.'s are given by (20).
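A sketch of formula (20) (NumPy assumed; the matrices and the value t₁ are illustrative choices of mine) shows both conclusions of this section: a function estimable with respect to (3) keeps the same best estimate, while a previously non-estimable function becomes estimable:

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
beta_true = np.array([2., -1., 0.5])
Y = A.T @ beta_true                              # noiseless illustration

# One non-estimable restriction: R1' = (1, 1, -1) is a redundancy vector
# (R1'A = 0'), and we impose R1'beta = t1; t1 is generated from beta_true
# purely so the illustration is consistent.
R1 = np.array([[1., 1., -1.]]).T                 # m x s1 with s1 = 1
t1 = R1.T @ beta_true                            # = 0.5

# Here m - n0 = s1 = 1, so R2 is empty and B = AA' + R1 R1'.
B = A @ A.T + R1 @ R1.T
Binv = np.linalg.inv(B)

def best_estimate(l):
    # Formula (20): l'B^{-1}AY - l'B^{-1}R1 [R1'B^{-1}R1]^{-1} (R1'B^{-1}AY - t1)
    G = np.linalg.inv(R1.T @ Binv @ R1)
    return float(l @ Binv @ A @ Y
                 - l @ Binv @ R1 @ G @ (R1.T @ Binv @ A @ Y - t1))

# Estimable with respect to (3): the restriction changes nothing.
print(best_estimate(np.array([1., 0., 1.])))     # 2.5, as without restrictions

# beta3 alone was not estimable under (3); under (14) it now is.
print(best_estimate(np.array([0., 0., 1.])))     # 0.5 = true beta3
```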
4. Models subject to estimable restrictions. In this section, we assume that the parameters of the model specified by (3) are known to satisfy s₂ < n₀ linear restrictions of the form

(24)  L'β = γ

where L' = L'(s₂ × m) and γ = γ(s₂ × 1) are known. Without loss of generality, we may assume that the rows of L' are linearly independent. Finally, we assume that the restrictions (24) are estimable.

Definition: The restrictions specified by (24) are said to be estimable if the rows of L' are linearly dependent on the rows of A'; i.e., if the rows of L' can be written as linear combinations of the rows of A'.

Since the restrictions (24) are assumed to be estimable, there exists a matrix K = K(s₂ × n) of rank s₂ such that

(25)  L' = KA'.

We now can compactly write the model (3) subject to the restrictions (24) as

(26)  E(Y) = A'β,  Var(Y) = Iₙσ²,  γ = L'β = KA'β.

We first observe that the class of estimable functions associated with the model (26) is the same as the class of estimable functions associated with the model (3) since (25) implies that the vector space spanned by the rows of A' and the rows of L' is the same as the vector space spanned by the rows of A' alone. Hence, if we assume that φ = ℓ'β is an estimable linear function of the parameters, then there exists k' = k'(1 × n) such that ℓ' = k'A'. With c' = c'(1 × n) and d' = d'(1 × s₂), the best estimate of φ is φ̂ = c'Y + d'γ provided

(27)  E(φ̂) = c'A'β + d'KA'β = k'A'β = ℓ'β

and the variance of φ̂, which is given by c'c σ², is a minimum. We can determine φ̂ by the method of Lagrangian multipliers. Let

(28)  v = c'c − 2λ'(Ac + AK'd − Ak)

where λ' = λ'(1 × m) is the vector of Lagrangian multipliers. Differentiating with respect to the elements of c, d, λ, we obtain the following system of equations for the stationary values of v:

(29)  c = A'λ,  0 = KA'λ,  Ac + AK'd = Ak.
We may obtain a solution of the equations (29) as follows. First we observe that

(30)  Ac = AA'λ implies λ = (AA')^g Ac

and hence 0 = KA'λ = KA'(AA')^g Ac; multiplying the third equation of (29) by KA'(AA')^g then gives KA'(AA')^g AK'd = KA'(AA')^g Ak. From Lemma 4, we recall that A'(AA')^g A is uniquely determined, symmetric, and idempotent; hence this equation may be written as

      KA'(AA')^g AA'(AA')^g AK'd = KA'(AA')^g Ak

or

      [KA'(AA')^g A][KA'(AA')^g A]'d = [KA'(AA')^g A]k

or

(31)  K̃K̃'d = K̃k

where K̃ = K̃(s₂ × n) is defined by K̃ = KA'(AA')^g A; but from Lemmas 2 and 3, it follows that a solution to the equations (31) is given by

(32)  d = (K̃K̃')^g K̃k;

using (29), (30), and (32), we obtain

(33)  c = A'λ = A'(AA')^g Ac = A'(AA')^g (Ak − AK'd).

Combining (32) and (33), we have that

(34)  φ̂ = c'Y + d'γ = k'A'(AA')^g AY − k'K̃'(K̃K̃')^g (K̃Y − γ)
         = k'A'(AA')^g AY − k'A'(AA')^g L[L'(AA')^g L]^g (L'(AA')^g AY − γ)
         = ℓ'HAY + ℓ'γ*

where H = H(m × m) and γ* = γ*(m × 1) are defined by

(35)  H = (AA')^g − (AA')^g L[L'(AA')^g L]^g L'(AA')^g,
      γ* = (AA')^g L[L'(AA')^g L]^g γ.

Note that Lemma 4 implies that

(36)  A'HA = A'(AA')^g A − K̃'(K̃K̃')^g K̃

is uniquely determined; and thus, whenever ℓ'β is estimable, ℓ'HAY is uniquely determined, and this causes ℓ'γ* to become uniquely determined from (27).
To proceed further, we take (AA')^g to be given by the matrix B⁻¹ where B is defined by (6); that this is legitimate can be seen by observing that (11) is always a solution of (4) and by applying Lemma 2. We now re-write (34) and (35) with B⁻¹ replacing (AA')^g and obtain

(37)  φ̂ = ℓ'B⁻¹AY − ℓ'B⁻¹L(L'B⁻¹L)⁻¹(L'B⁻¹AY − γ) = ℓ'HAY + ℓ'γ*

where

(38)  H = B⁻¹ − B⁻¹L(L'B⁻¹L)⁻¹L'B⁻¹,  γ* = B⁻¹L(L'B⁻¹L)⁻¹γ,

since L' is assumed to have the full rank s₂. We observe that the following relations are satisfied:

(39)  HBH = H,  L'H = 0(s₂ × m),  γ − L'γ* = 0.

With (37), (38), (39), we are now in a position to consider the partitioning of the class of estimable functions arising from the presence of the restrictions (24). If ℓ' is a linear function of the rows of L' and can be written, say, as ℓ' = d*'L' where d*' = d*'(1 × s₂), then the best estimate of ℓ'β is given by

(40)  ℓ'HAY + ℓ'γ* = d*'L'HAY + d*'L'γ* = d*'γ.

Thus, if ℓ' is a linear function of the rows of L', we have that the best estimator of ℓ'β is the corresponding linear function of the γ's and the variance of this best estimator is zero.
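The quantities H and γ* of (38), the relations (39), and the zero-variance property (40) can be checked directly (NumPy assumed; A, R, L, γ, and Y are all illustrative choices of mine):

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
# Columns of R span the redundancy space, so B = AA' + RR' has full rank m.
R = np.array([[1., 1., -1.]]).T / np.sqrt(3.0)
B = A @ A.T + R @ R.T
Binv = np.linalg.inv(B)

# One estimable restriction L'beta = gamma, with L' = (1, 0, 1) lying in the
# row space of A'; gamma is an arbitrary known value for the illustration.
L = np.array([[1., 0., 1.]]).T
gamma = np.array([2.5])

# (38): H and gamma*.
M = np.linalg.inv(L.T @ Binv @ L)
H = Binv - Binv @ L @ M @ L.T @ Binv
gamma_star = Binv @ L @ M @ gamma

# Relations (39): HBH = H, L'H = 0, gamma - L'gamma* = 0.
assert np.allclose(H @ B @ H, H)
assert np.allclose(L.T @ H, 0.0)
assert np.allclose(L.T @ gamma_star, gamma)

# (40): for l' = L', the best estimate l'HAY + l'gamma* equals gamma exactly,
# whatever Y is observed, i.e. it has zero variance.
Y = np.array([3.1, -0.4, 2.2, 0.7])              # arbitrary observations
est = L.T @ (H @ A @ Y + gamma_star)
print(est)                                        # [2.5]
```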
On the other hand, let Q = Q((n₀ − s₂) × m) be defined to be a matrix of rank (n₀ − s₂) whose rows are orthogonal to the columns of the matrix R = R(m × (m − n₀)) used in defining B by (6) and to the columns of L; i.e., we have

(41)  Q[R  L] = 0((n₀ − s₂) × (m − n₀ + s₂)).

If ℓ' is a linear function of the rows of QB and can be written, say, as

(42)  ℓ' = q'QB = q'Q[AA' + RR'] = q'QAA'

where q' = q'(1 × (n₀ − s₂)), then the best estimate of ℓ'β is given by

(43)  ℓ'HAY + ℓ'γ* = q'QBB⁻¹AY − q'QBB⁻¹L(L'B⁻¹L)⁻¹(L'B⁻¹AY − γ)
                   = q'QBB⁻¹AY
                   = ℓ'B⁻¹AY = ℓ'β̂.
However, (43) is the best estimate of such ℓ'β with respect to the model (3) in which there are no restrictions, where β̂ is given by (11). Thus, one sees that the restrictions provide no information at all about linear functions of the parameters whose coefficient vectors lie in the vector space generated by the rows of QB = QAA'.

Next, we observe that the vector spaces generated by the rows of QB and the rows of L' are disjoint. For suppose there existed d*' and q' such that d*'L' = q'QB; then we would have q'QBQ'q = d*'L'Q'q = 0, which implies q'Q = 0' since B is symmetric positive definite; but the rows of Q are independent, which implies q' = 0', and this implies d*' = 0'. Since the only vector common to the two spaces is the null vector, we have that they are disjoint. Also, we have by assumption that L' has rank s₂; and, by the non-singularity of B, that QB has rank (n₀ − s₂). Finally, since L' = KA' and QB = QAA', the vector spaces generated by the rows of L' and the rows of QB are contained in the vector space generated by the rows of A'. In conclusion, the vector spaces generated by the rows of L' and the rows of QB are disjoint vector spaces which span the vector space generated by the rows of A', that is, the vector space of coefficient vectors of estimable functions. As a result, if ℓ'β is estimable, then its coefficient vector ℓ' may be uniquely written as

(44)  ℓ' = q'QB + d*'L'

and its best estimate has the form

(45)  φ̂ = q'QAY + d*'γ.
We also note here that

(46)  Var(φ̂) = q'QAA'Q'q σ² = q'QBQ'q σ².

However, (46) is a difficult expression to compute since it involves Q and q, which are not easy to obtain. We may obtain an alternative expression for the variance of the best estimate of estimable ℓ'β as follows. First we observe that the best estimate of ℓ'β is given by (37), where ℓ' may be written as ℓ' = k'A'; hence we have that

(47)  Var(ℓ'β̂) = ℓ'HAA'H'ℓ σ² = k'A'HAA'HAk σ².

Next, we note that A'HA may be written as

(48)  A'HA = A'B⁻¹A − A'B⁻¹L(L'B⁻¹L)⁻¹L'B⁻¹A
           = A'B⁻¹A − A'B⁻¹AK'(KA'B⁻¹AA'B⁻¹AK')⁻¹KA'B⁻¹A.

If we let W = W(n × n) be defined by

(49)  W = A'B⁻¹A

and recall from Lemma 4 that W is uniquely determined, symmetric, and idempotent, then we have that

(50)  A'HA = W − WK'[KWWK']⁻¹KW

and

(51)  A'HAA'HA = WW − WK'[KWWK']⁻¹KWW − WWK'[KWWK']⁻¹KW + WK'[KWWK']⁻¹KWWK'[KWWK']⁻¹KW
              = W − WK'[KWWK']⁻¹KW − WK'[KWWK']⁻¹KW + WK'[KWWK']⁻¹KW
              = W − WK'[KWWK']⁻¹KW
              = A'HA.

Hence, we may re-write (47) as

(52)  Var(ℓ'β̂) = k'A'HAA'HAk σ² = k'A'HAk σ² = ℓ'Hℓ σ²
where H is given by (38); a similar argument could be given to show that (52) is valid when H is given by the more general form (35), since A'HA is uniquely determined. Note that the expression given by (52) is never bigger than ℓ'B⁻¹ℓ σ² since (L'B⁻¹L)⁻¹σ² is symmetric positive definite; the two expressions are equal only if ℓ'B⁻¹L = 0', which is the case if and only if ℓ' lies in the vector space generated by the rows of QB. Thus, the variance of the best estimate of estimable ℓ'β corresponding to the model (26) is smaller than that corresponding to the model (3) unless ℓ' lies in the vector space generated by the rows of QB, in which case the two are equal.

Having studied the partitioning of the class of estimable functions, we now consider the induced partitioning on the set of linear functions of the observations. First of all, there exists a matrix E₁ = E₁((n − n₀) × n) of rank (n − n₀) whose rows are orthogonal to the rows of A; i.e., the rows of E₁ form a basis of the error space. The linear functions E₁Y form a basis of the error set.

In the non-restricted model (3), the set of linear functions L'β are all estimable because of (25), and their best estimates are given by

(53)  L'β̂ = L'B⁻¹AY = KA'B⁻¹AY.

If we define E₂ = E₂(s₂ × n) to be the matrix

(54)  E₂ = L'B⁻¹A = KA'B⁻¹A = KW

then we observe that

(55)  E₂E₁' = KA'B⁻¹AE₁' = 0(s₂ × (n − n₀))

and hence that the sets E₁Y and E₂Y are uncorrelated.
Finally, in either the non-restricted model (3) or the restricted model (26), the set of linear functions QBβ = QAA'β are all estimable, and their best estimates are given by

(56)  QBβ̂ = QBB⁻¹AY = QAY.

If we define C = C((n₀ − s₂) × n) to be the matrix

(57)  C = QA

then we observe that

(58)  CE₁' = QAE₁' = 0((n₀ − s₂) × (n − n₀))

and

(59)  CE₂' = QAA'B⁻¹L = QL = 0((n₀ − s₂) × s₂).

Hence, the sets E₁Y, E₂Y, and CY are mutually uncorrelated and have variance-covariance matrices as follows:

(60)  Var(E₁Y) = E₁E₁' σ²,
      Var(E₂Y) = KA'B⁻¹AA'B⁻¹AK' σ² = KA'B⁻¹AK' σ² = L'B⁻¹L σ²,
      Var(CY) = QAA'Q' σ² = QBQ' σ².

From (60), we see that E₁Y, E₂Y, CY all have symmetric positive definite variance-covariance matrices because B is symmetric positive definite and L' has the full rank s₂, Q has the full rank (n₀ − s₂), and E₁ has the full rank (n − n₀). But this implies that E₂ has rank s₂ and C has rank (n₀ − s₂).
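The mutual uncorrelatedness (55), (58), (59) of the sets E₁Y, E₂Y, CY can be verified numerically (NumPy assumed; the matrices are illustrative choices of mine, with Q picked orthogonal to [R  L] as in (41)):

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
R = np.array([[1., 1., -1.]]).T          # redundancy direction (columns of R)
L = np.array([[1., 0., 1.]]).T           # estimable restriction L'beta = gamma
B = A @ A.T + R @ R.T
Binv = np.linalg.inv(B)

# E1: basis of the error space (rows orthogonal to the rows of A).
_, _, Vt = np.linalg.svd(A)
E1 = Vt[np.linalg.matrix_rank(A):]       # (n - n0) x n = 2 x 4

# E2 = L'B^{-1}A as in (54) and C = QA as in (57).
E2 = L.T @ Binv @ A                      # s2 x n = 1 x 4
Q = np.array([[1., -2., -1.]])           # Q[R L] = 0, as in (41)
C = Q @ A                                # (n0 - s2) x n = 1 x 4

# (55), (58), (59): the three sets are mutually uncorrelated under Var(Y) = I.
assert np.allclose(Q @ np.hstack([R, L]), 0.0)
assert np.allclose(E2 @ E1.T, 0.0)
assert np.allclose(C @ E1.T, 0.0)
assert np.allclose(C @ E2.T, 0.0)
print("ranks:", E1.shape[0], np.linalg.matrix_rank(E2), np.linalg.matrix_rank(C))
```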
Thus, the sets CY, E₂Y, and E₁Y represent a partitioning of the set of linear functions of Y₁, Y₂, …, Yₙ into three mutually uncorrelated sets. We call the set E₂Y the induced error set due to restrictions, since the expected value of E₂Y is γ, which is known. We call CY the reduced estimation set due to restrictions.

Finally, let us consider least squares estimation in the model (26). We apply the method of Lagrangian multipliers to minimize

(61)  S² = (Y − A'β)'(Y − A'β)

subject to the restrictions (24). Let

(62)  v = (Y − A'β)'(Y − A'β) + 2λ'(KA'β − γ)

where λ' = λ'(1 × s₂) is the vector of Lagrangian multipliers. Differentiating with respect to the elements of β and λ, we obtain the following system of equations for the stationary values of v,

(63)  A(Y − A'β) = AK'λ,  KA'β = γ,

which may be re-written as

(64)  AA'β + AK'λ = AY,  KA'β = γ.
The equations (64) may be solved by proceeding as follows. First,

(65)  β = (AA')^g (AY − AK'λ) = (AA')^g AY − (AA')^g AK'λ.

Hence we have

(66)  KA'β = KA'(AA')^g AY − KA'(AA')^g AK'λ = K̃Y − K̃K̃'λ

where K̃ is defined in (31). However, we also have

(67)  γ = KA'β = −K̃K̃'λ + K̃Y.

Hence, a solution to (64) is given by

(68)  λ = (K̃K̃')^g (K̃Y − γ),
      β̂ = (AA')^g AY − (AA')^g AK'(K̃K̃')^g (K̃Y − γ).

We may rewrite the solution β̂ as

(69)  β̂ = [(AA')^g − (AA')^g L[L'(AA')^g L]^g L'(AA')^g] AY + (AA')^g L[L'(AA')^g L]^g γ
        = HAY + γ*

where H and γ* are defined by (35). Thus, the best estimate of estimable ℓ'β given by (34) may be re-written as

(70)  φ̂ = ℓ'HAY + ℓ'γ* = ℓ'β̂

where β̂ is given by (69). Thus, the least squares estimators of estimable ℓ'β are their best linear unbiased estimators. In addition, let us find the value of Min S². To do this, let us first replace (AA')^g by B⁻¹. Then

      Min S² = (Y − A'β̂)'(Y − A'β̂)
             = (Y − A'HAY − A'γ*)'(Y − A'HAY − A'γ*)
             = [(Y − A'B⁻¹AY) + A'B⁻¹L(L'B⁻¹L)⁻¹(L'B⁻¹AY − γ)]' [transpose same]
             = [(Iₙ − A'B⁻¹A)Y + E₂'(E₂E₂')⁻¹(E₂Y − γ)]' [transpose same].
Since Lemma 4 implies that A[Iₙ − A'B⁻¹A] = A − AA'B⁻¹A = 0, we may take Iₙ − A'B⁻¹A to be given by E₁'(E₁E₁')⁻¹E₁ (that this is all right follows also from Lemma 5) and hence find that

      Min S² = Y'E₁'(E₁E₁')⁻¹E₁Y + (E₂Y − γ)'(E₂E₂')⁻¹(E₂Y − γ) = S²_e1 + S²_e2

where S²_e1 = Y'E₁'(E₁E₁')⁻¹E₁Y is called the sum of squares due to error in the non-restricted model (3), S²_e2 = (E₂Y − γ)'(E₂E₂')⁻¹(E₂Y − γ) is called the sum of squares due to induced error, and S²_e = S²_e1 + S²_e2 is the total sum of squares due to error in the model (26).
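The decomposition Min S² = S²_e1 + S²_e2 can be checked against a direct evaluation of the restricted residual sum of squares via (69) (NumPy assumed; all matrices and data are illustrative choices of mine):

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
R = np.array([[1., 1., -1.]]).T
L = np.array([[1., 0., 1.]]).T
gamma = np.array([2.5])
Y = np.array([3.1, -0.4, 2.2, 0.7])

B = A @ A.T + R @ R.T
Binv = np.linalg.inv(B)
M = np.linalg.inv(L.T @ Binv @ L)
H = Binv - Binv @ L @ M @ L.T @ Binv
gamma_star = Binv @ L @ M @ gamma

# Restricted least squares solution (69) and its residual sum of squares.
beta_hat = H @ A @ Y + gamma_star
resid = Y - A.T @ beta_hat
min_S2 = float(resid @ resid)

# The two-part decomposition Min S^2 = S_e1^2 + S_e2^2.
n0 = np.linalg.matrix_rank(A)
_, _, Vt = np.linalg.svd(A)
E1 = Vt[n0:]
E2 = L.T @ Binv @ A
S_e1 = float(Y @ E1.T @ np.linalg.inv(E1 @ E1.T) @ E1 @ Y)
d = E2 @ Y - gamma
S_e2 = float(d @ np.linalg.inv(E2 @ E2.T) @ d)
print(round(min_S2, 8), round(S_e1 + S_e2, 8))   # the two agree
```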
Lemma 5: Let F = F(m × n) and G = G(m × n) be two matrices satisfying the following conditions:

(i) Rank F = Rank G = n₀ ≤ m < n;

(ii) there exists P_c = P_c(n × t) of rank (n − n₀) ≤ t such that

(71)  FP_c = GP_c = 0(m × t);

(iii) there exists P_r = P_r(s × m) of rank (m − n₀) ≤ s such that

(72)  P_r F = P_r G = 0(s × n);

(iv) there exists S = S(n × m) such that

(73)  FSF = F,  GSG = G.

Then F = G.

Proof: Now (i) and (ii) imply the existence of C₁ = C₁(m × m) such that F = C₁G, since the rows of F and the rows of G generate the same vector space, namely the vector space orthogonal to the space generated by the columns of P_c. Similarly, (i) and (iii) imply the existence of C₂ = C₂(n × n) such that G' = C₂F' and hence G = FC₂'. Thus, the following relation holds because of (iv):

      F = C₁G = C₁GSG = FSG = FSFC₂' = FC₂' = G

and this verifies the conclusion of the lemma.

If in (71), (72), and (73), we set

(74)  F = Iₙ − A'B⁻¹A,  G = E₁'(E₁E₁')⁻¹E₁,  P_c = A',  P_r = A,  S = Iₙ,

then the conditions of Lemma 5 are satisfied and it follows that Iₙ − A'B⁻¹A = E₁'(E₁E₁')⁻¹E₁.
We now define the sum of squares due to regression in the model (26) as the sum of squares due to the linear set CY, and this is given by

(75)  S²_r = Y'C'(CC')⁻¹CY = Y'A'Q'(QBQ')⁻¹QAY.

We apply Lemma 5 in order to obtain a more readily computed formula for S²_r. Let

(76)  F = A'Q'(QBQ')⁻¹QA,  G = A'HA;

then the conditions of Lemma 5 are satisfied if

(77)  P_c = [E₁'  E₂'],  P_r = P_c',  S = Iₙ,

because idempotency of G follows from (51) while idempotency of F follows from QAA' = QB; FP_c = 0 follows from (58) and (59) while GP_c = 0 follows from (49), (50), (54); and

(78)  Rank F = trace F = n₀ − s₂  and  Rank G = trace G = trace W − trace I(s₂) = n₀ − s₂.

Thus, we may re-write (75) as

(79)  S²_r = Y'A'HAY.
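That (75) and (79) give the same number, as Lemma 5 guarantees, is easy to confirm numerically (NumPy assumed; illustrative matrices and data of my own choosing):

```python
import numpy as np

A = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 1., 1., 1.]])
R = np.array([[1., 1., -1.]]).T
L = np.array([[1., 0., 1.]]).T
Q = np.array([[1., -2., -1.]])           # rows orthogonal to [R  L], as in (41)
Y = np.array([3.1, -0.4, 2.2, 0.7])

B = A @ A.T + R @ R.T
Binv = np.linalg.inv(B)
M = np.linalg.inv(L.T @ Binv @ L)
H = Binv - Binv @ L @ M @ L.T @ Binv

# (75): S_r^2 from the reduced estimation set C = QA ...
C = Q @ A
S_r_75 = float(Y @ C.T @ np.linalg.inv(C @ C.T) @ C @ Y)
# ... and (79): the computationally simpler form S_r^2 = Y'A'HAY.
S_r_79 = float(Y @ A.T @ H @ A @ Y)
print(round(S_r_75, 8), round(S_r_79, 8))   # equal, as Lemma 5 guarantees
```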
The decomposition of the total sum of squares in the model (26) is not necessarily additive since

(80)  Y'Y − S²_r − S²_e = Y'Y − Y'A'HAY − (Y − A'HAY − A'γ*)'(Y − A'HAY − A'γ*)
                        = +2(A'γ*)'(Y − A'HAY) − γ*'AA'γ*
                        = 2γ*'AY − 2γ*'AA'HAY − γ*'AA'γ*
                        = 2γ*'AY − γ*'AA'γ*

since γ*'AA'HA = 0' follows from (38), (54), (49), (50). If we define S_d to be the departure from additivity, then

(81)  S_d = 2γ*'AY − γ*'AA'γ*.
When we assume that Y₁, Y₂, …, Yₙ have a joint normal distribution with means and variances specified by the model (26), then our previous results tell us that CY, E₁Y, E₂Y are independent sets of random variables having the normal distributions N(CA'β, CC'σ²), N(0, E₁E₁'σ²), N(L'β, L'B⁻¹Lσ²) respectively. Thus, we have that S²_r, S²_e1, S²_e2 all are independently distributed, with S²_r being distributed as non-central χ²(n₀ − s₂; β'AA'HAA'β/2σ²), S²_e1 being distributed as central χ²(n − n₀), and S²_e2 being distributed as central χ²(s₂); also, S²_e = S²_e1 + S²_e2 is distributed as central χ²(n − n₀ + s₂). We also note that S_d is distributed as N(γ'(L'B⁻¹L)⁻¹γ, 4γ'(L'B⁻¹L)⁻¹γσ²) and is independent of S²_r and S²_e1. We thus have the following analysis of variance for the model (26):

(82)
Source       Degrees of Freedom   Sum of Squares              Mean Square          Variance Ratio
Regression   n₀ − s₂              S²_r = Y'A'HAY              S²_r/(n₀ − s₂)       [S²_r/(n₀ − s₂)] / [S²_e/(n − n₀ + s₂)]
Error        n − n₀ + s₂          S²_e = Y'Y − S²_r − S_d     S²_e/(n − n₀ + s₂)
Deviation                         S_d = 2γ*'AY − γ*'AA'γ*
Total        n                    Y'Y
Thus, we see that the presence of estimable restrictions does not alter the class of estimable functions of the parameters. However, by allowing us to estimate certain functions with zero variance, the restrictions permit the construction of estimates at least as good as, and often better than, those obtainable in the unrestricted model. Finally, they permit us to obtain a better estimate of error with more degrees of freedom.
5. Models subject to restrictions. Here, we assume that the parameters of the model specified by (3) are known to satisfy s linear restrictions of the form

(83)  Π'β = t

where Π' = Π'(s × m) and t = t(s × 1) are known. Without loss of generality, we may assume that the rows of Π' are linearly independent. To deal with the restricted model specified by

(84)  E(Y) = A'β,  Var(Y) = Iₙσ²,  Π'β = t,

we need to partition the restrictions (83), say as

(85)  L'β = γ,  R₁'β = t₁,

such that s₂ ≤ n₀ and s₁ = s − s₂ ≤ m − n₀, and the rows of L' lie in the vector space generated by the rows of A' while the rows of R₁' are linearly independent of the rows of A'; i.e., we assume the restrictions (83) can be partitioned into an estimable part like (24) and a non-estimable part like (13).

From Sections 3 and 4, we see that the class of estimable functions of the parameters can be partitioned into three disjoint subsets, namely the one generated by the rows of R₁', the one generated by the rows of L', and the one generated by the rows of QAA', where Q = Q((n₀ − s₂) × m) is defined to be a matrix of rank (n₀ − s₂) such that

(86)  Q[R₁  R₂  L] = 0((n₀ − s₂) × (m − n₀ + s₂))

where R₂' = R₂'((m − n₀ − s₁) × m) is any matrix of rank (m − n₀ − s₁) whose rows are independent of the rows of A' and the rows of R₁'.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I·
I
The class of linear functions of the observations may be partitioned in the same way as in Section 4, and the analysis of variance given by (82) also applies here. This holds true because, as long as we deal with estimable functions (with respect to (3)), non-estimable restrictions have no effect; as before, the only effect of non-estimable restrictions is to widen the class of linear functions of the parameters which can be estimated unbiasedly.
Finally, we give an expression for the best estimate φ̂ of φ = λ'p such that λ' can be written as λ' = k'A' + a'R'_1. It is given by

(87)   φ̂ = c'y + a't_1 ,

where c' and a' are given by the solution of

(88)   c' = λ'B^{-1}A ,   a' = λ'B^{-1}R_1 ,

where B = AA' + R_1R'_1 + R_2R'_2.
6. The non-restricted model with a general but non-singular variance-covariance matrix. Let y_1, y_2, ..., y_n be random variables whose expected values are given by (1) and whose variance-covariance matrix is non-singular and is known except for a multiplicative constant σ^2. If we let V = V(n × n) be the matrix defined by

(89)   Var(y) = Vσ^2 = [ v_11 v_12 ... v_1n ; v_21 v_22 ... v_2n ; ... ; v_n1 v_n2 ... v_nn ] σ^2 ,
then the model can be compactly written as

(90)   e(y) = A'p ,   Var(y) = Vσ^2 .

The set of estimable functions of the parameters is the same here as for the model specified by (3), since the design matrix A' is the same for both cases. Also, the error spaces associated with the models (3) and (90) are identical because both are generated by the set of (1 × n) vectors which are orthogonal to the rows of A. However, an important distinction can be made between the estimation spaces corresponding to these models. By definition, a linear function of y_1, y_2, ..., y_n is an estimating function if it is uncorrelated with all linear functions belonging to error. Hence, c'y is an estimating function if and only if

(91)   c'Ve = 0

for every e'y belonging to error. As before, the set of all estimating functions is called the estimation set.
Lemma 6: A non-null linear function of y_1, y_2, ..., y_n cannot, at the same time, belong to the estimation set and the error set.

Proof: Let f'_1 y, f'_2 y, ..., f'_{n_e} y be n_e linearly independent non-null linear functions belonging to error (where, as before, n_e is the rank of the error space). Let F = F(n_e × n) be the matrix whose rows are f'_1, f'_2, ..., f'_{n_e}:

(92)   F = [ f'_1 ; f'_2 ; ... ; f'_{n_e} ] ;
There exists an orthogonal matrix P = P(n_e × n_e) such that

(93)   PFVF'P' = D_λ ,

where D_λ is a diagonal matrix with positive diagonal elements λ_1, λ_2, ..., λ_{n_e}, since FVF' is symmetric positive definite. Defining E = E(n_e × n) and z = z(n_e × 1) by

(94)   PF = E = [ e'_1 ; e'_2 ; ... ; e'_{n_e} ] ,   z = PFy = Ey = [ e'_1 y ; e'_2 y ; ... ; e'_{n_e} y ] ,

we have that e'_1 y, e'_2 y, ..., e'_{n_e} y are uncorrelated linear functions of the y's belonging to error; they also form a basis of the error set. Hence, if e'y is a non-null linear function belonging to error, then it can be expressed as

(95)   e'y = φ_1 e'_1 y + φ_2 e'_2 y + ... + φ_{n_e} e'_{n_e} y = φ'Ey .

If e'y also belongs to the estimation set, then

(96)   0' = Cov(e'y, Ey) = φ'EVE'σ^2 = φ'D_λ σ^2 ;

but (96) implies that φ' = 0' since all the λ's are positive. As a result, the only vector common to the estimation space and the error space is the null vector.

We now investigate the structure of the estimation space. Because of (91) and (95), the linear function c'y is an estimating function if and only if c satisfies the equations
(97)   EVc = 0 .

Since V is non-singular and E has the full rank n_e, the above represent a set of n_e independent linear homogeneous equations to determine c. As a result, the equations (97) have n_0 = n − n_e independent solutions. One set of solutions to (97), which is of interest, is

(98)   C = [c_1, c_2, ..., c_m] = V^{-1}A' .

Note that EVC = EVV^{-1}A' = EA' = 0_{n_e, m} since the rows of E and A are orthogonal by construction; also note that C has rank n_0 since V^{-1} is non-singular. Thus, the columns of C generate the estimation space. We can now prove the fundamental theorem of linear estimation in the model specified by (90).
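As a numerical illustration (not in the original text), the defining property of (97) and (98) can be verified directly: the columns of C = V^{-1}A' are uncorrelated with every error function, since Cov(c'y, e'y) = c'Ve σ^2. A numpy sketch with an arbitrary positive-definite V chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3

# Illustrative non-singular variance-covariance matrix V.
M = rng.normal(size=(n, n))
V = M @ M.T + 5 * np.eye(n)

A_t = rng.normal(size=(n, m))      # design matrix A' of full column rank
A = A_t.T

# Basis of the error set: rows orthogonal to the rows of A (null space of A).
_, _, Vh = np.linalg.svd(A)
E = Vh[m:]                         # (n - m) x n

# (98): columns of C = V^{-1}A' solve (97), E V c = 0.
C = np.linalg.solve(V, A_t)
assert np.allclose(E @ V @ C, 0.0)
```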
Theorem 2: If λ'p is any estimable function of the parameters p_1, p_2, ..., p_m, then

(i) there exists a unique linear function c'y of y_1, y_2, ..., y_n such that c'y belongs to the estimation space and e(c'y) = λ'p; also, the variance of c'y is smaller than the variance of any other linear unbiased estimator of λ'p;

(ii) the best estimator of λ'p may be written as either

   (a) q'AV^{-1}y, where q'(AV^{-1}A') = λ', or

   (b) λ'p̂, where p̂ is a value of p which minimizes the sum of squares S^2 = (y − A'p)'V^{-1}(y − A'p); i.e., p̂ satisfies the equations AV^{-1}A'p̂ = AV^{-1}y;

(iii) the variance of the best estimator of λ'p is uniquely determined and given by λ'(AV^{-1}A')^g λ σ^2.
Proof: It follows from Lemma 6, (96), and (98) that the n linear functions e'_1 y, ..., e'_{n_e} y and c'_{i_1} y, c'_{i_2} y, ..., c'_{i_{n_0}} y are independent, where c'_{i_1}, c'_{i_2}, ..., c'_{i_{n_0}} represent n_0 independent columns of C, and that any arbitrary linear function of y_1, y_2, ..., y_n may be expressed in a unique manner as a linear combination of them. Since λ'p is estimable, there exists a linear function b'y such that e(b'y) = λ'p. From what has been indicated previously, we can uniquely write

(99)   b'y = c'y + e'y ,

where c'y is in the estimation set and e'y is in the error set. As a result,

(100)   e(c'y) = e(b'y) = λ'p .

Suppose c'_0 y is another linear function in the estimation set which is an unbiased estimator of λ'p. Then e{(c_0 − c)'y} = 0, and hence (c_0 − c)'y belongs to the error set; but it is also in the estimation set since both c'_0 y and c'y are in the estimation set. From Lemma 6, it follows that c_0 − c = 0_n, or c_0 = c; hence c'y is uniquely defined. Next we observe that

(101)   Var(b'y) = Var(c'y) + Var(e'y) ≥ Var(c'y) ,

since c'y and e'y are uncorrelated by construction and Var(e'y) is positive except when e' = 0'. This completes the proof of (i).

From (i) and (98), it follows that the best estimate of λ'p can be written as a linear function of c'_1 y, c'_2 y, ..., c'_m y; i.e.,

(102)   λ'p̂ = q'AV^{-1}y .
But (102) and the fact that λ'p = e(q'AV^{-1}y) = q'AV^{-1}A'p imply that q' satisfies

(103)   λ' = q'AV^{-1}A' .

Alternatively, if p̂ is a solution to the equations

(104)   AV^{-1}A'p̂ = AV^{-1}y ,

and if λ'p is estimable with λ' satisfying (103), then

(105)   λ'p̂ = q'AV^{-1}A'p̂ = q'AV^{-1}y = best estimate of λ'p .

The equations (104) are called the generalized least squares (normal) equations. Also, the equations

(106)   ∂S^2/∂p' = −2(y − A'p)'V^{-1}A' = 0'_m ,

which simplify to (104), determine the stationary values at which S^2 attains its minimum. Finally, we note that

(107)   Var(λ'p̂) = q'AV^{-1}VV^{-1}A'q σ^2 = q'AV^{-1}A'q σ^2 = λ'q σ^2 .

Since the equations (103) are consistent when λ'p is estimable, a solution to them is given by

(108)   q = (AV^{-1}A')^g λ .

Substituting (108) into (107), we obtain

(109)   Var(λ'p̂) = λ'(AV^{-1}A')^g λ σ^2 .
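As a numerical check (not in the original), the generalized least squares equations (104) can be solved directly and compared with ordinary least squares after whitening the data by a matrix T with TVT' = I, here taken from the Cholesky factor of an illustrative V:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 3

# Illustrative non-singular V and design A' of full column rank.
M = rng.normal(size=(n, n))
V = M @ M.T + np.eye(n)
A_t = rng.normal(size=(n, m))
A = A_t.T
y = rng.normal(size=n)

Vinv = np.linalg.inv(V)

# (104): generalized least squares normal equations A V^{-1} A' p = A V^{-1} y.
p_hat = np.linalg.solve(A @ Vinv @ A_t, A @ Vinv @ y)

# Equivalent route: ordinary least squares after whitening with T V T' = I,
# where T is the inverse of the lower Cholesky factor of V.
T = np.linalg.inv(np.linalg.cholesky(V))
p_ols = np.linalg.lstsq(T @ A_t, T @ y, rcond=None)[0]
assert np.allclose(p_hat, p_ols)
```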
Since λ'p is estimable and V is non-singular, symmetric, and positive definite, Lemma 4 implies that (109) is uniquely determined. This completes the proof of the theorem.

Finally, we define the sum of squares due to regression in the model (90) by

(110)   S^2_r = p̂'AV^{-1}y = y'V^{-1}A'(AV^{-1}A')^g AV^{-1}y ,

and the sum of squares due to error by

(111)   S^2_e = y'V^{-1}y − S^2_r = Min S^2 .

The above follow by procedures analogous to those given in Bose [1950] for determining the sum of squares due to a set of linear functions of the y's. Alternatively, one can use the transformation

(112)   y* = Ty ,

where T is a non-singular matrix such that TVT' = I_n, and consider ordinary least squares estimation in the model

(113)   e(y*) = TA'p ,   Var(y*) = I_n σ^2 .

For the model (113), the sum of squares due to regression is given by

(114)   S^2_r = y*'TA'(AT'TA')^g AT'y* = y'V^{-1}A'(AV^{-1}A')^g AV^{-1}y ,

because T'T = V^{-1}, and

(115)   S^2_e = y*'y* − S^2_r = y'V^{-1}y − S^2_r = Min S^2 .

The expressions (114) and (115) in terms of the y*'s are well known and are derived in Bose [1950].
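The equalities (110), (111), (114), and (115) can likewise be verified numerically (a sketch, not in the original, using the Moore-Penrose inverse for the generalized inverse (AV^{-1}A')^g):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3
M = rng.normal(size=(n, n))
V = M @ M.T + np.eye(n)            # illustrative non-singular V
A_t = rng.normal(size=(n, m))
A = A_t.T
y = rng.normal(size=n)

Vinv = np.linalg.inv(V)
G = np.linalg.pinv(A @ Vinv @ A_t)           # (A V^{-1} A')^g

# (110): sum of squares due to regression in the model (90).
S2_r = y @ Vinv @ A_t @ G @ A @ Vinv @ y

# (114): the same quantity from ordinary least squares on y* = T y.
T = np.linalg.inv(np.linalg.cholesky(V))     # T V T' = I
y_star, X = T @ y, T @ A_t
S2_r_star = y_star @ X @ np.linalg.pinv(X.T @ X) @ X.T @ y_star
assert np.allclose(S2_r, S2_r_star)

# (111): S^2_e = y'V^{-1}y - S^2_r equals the minimized S^2.
p_hat = np.linalg.solve(A @ Vinv @ A_t, A @ Vinv @ y)
r = y - A_t @ p_hat
S2_e = y @ Vinv @ y - S2_r
assert np.allclose(S2_e, r @ Vinv @ r)
```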
Last of all, it is worthwhile to observe that the partitioning of the
sum of squares in the model (90) is not additive in the sense that
I
-33(116)
•
If' we def'ine
S~ by
..9
(117)
0-d
= -y'v
- -y'V
£..
-1
v
lL.
,
2
then it f'ollows that Sd represents the amount by which S2 and S2 f'a11 to
r
e
add to the total sum of' squares l'il...
This lack of' additivity is due to
the f'ailure of' V to be an identity matrix and the resulting lack of' orthogonality between the estimation set and the error set.
In conclusion, we observe that the ef'fect of the variance covariance
matrix V on the linear model is exerted through the transf'or.mation of' the
1
estimation set f'rom Al. to .A2:- 1..
As a result of this, best estimates of'
linear f'unctions and sums of' squares are suitably modif'ied.
estimable
However, the class of' estimable f'unctions of' the parameters and the class
of' linear f'unctions of' the observations belonging to error remain unchanged.
7. The model of Section 6 subject to restrictions. Suppose we have the model

(118)   e(y) = A'p ,   Var(y) = Vσ^2 ,   R'p = t ,

where V is given by (89) and the restrictions are given by (83). Let us make the transformation (112). Then

(119)   e(y*) = TA'p ,   Var(y*) = I_n σ^2 ,   R'p = t .

The results of Section 5 may now be applied to (119). This completes the analysis of this case.
8. The general linear model. Let y_1, y_2, ..., y_n be random variables whose expected values are given by (1) and whose variance-covariance matrix is given by (89); here, however, we don't require V to be non-singular. The model can be compactly written as

(120)   e(y) = A'p ,   Var(y) = Vσ^2 ,

where we assume that V has rank r. The set of estimable functions of the parameters is the same here as for the model specified by (3). Also, the error spaces associated with the models (3) and (120) are identical because both are generated by the set of (1 × n) vectors which are orthogonal to the rows of A. However, the same distinction can be made between the estimation spaces corresponding to these models as was made in Section 6. Hence, c'y is an estimating function if and only if

(121)   c'Ve = 0

for every e'y belonging to error. Again the set of all estimating functions is called the estimation set. Lemma 6 may be generalized here to

Lemma 7: A non-null linear function of y_1, y_2, ..., y_n having positive variance cannot, at the same time, belong to the estimation set and the subset of functions in the error set having positive variances; i.e., the only linear functions of the y's which are common to the estimation set and the error set are those having zero variances.
Proof: Let f'_1 y, ..., f'_{n_e} y be n_e linearly independent non-null linear functions belonging to error, and let F be given by (92). Re-define P to be an orthogonal matrix such that
(122)   PFVF'P' = [ D_λ 0 ; 0 0 ] ,

where D_λ = D_λ(r_e × r_e) is a diagonal matrix with positive diagonal elements λ_1, λ_2, ..., λ_{r_e}, and r_e ≤ n_e. Let E and z be again defined by

(123)   PF = E = [ E_1 ; E_2 ] ,

where E_1 = E_1(r_e × n) has rows e'_1, ..., e'_{r_e} and E_2 = E_2((n_e − r_e) × n) has rows e'_{r_e+1}, ..., e'_{n_e}. Then e'_1 y, ..., e'_{n_e} y are uncorrelated linear functions of the y's belonging to error which form a basis of the error set; also, e'_1 y, ..., e'_{r_e} y represents a subset of the error set in which the elements all have positive variances. If e'y is a non-null linear function belonging to error, then it can be written as

(124)   e'y = φ_1 e'_1 y + ... + φ_{r_e} e'_{r_e} y + ... + φ_{n_e} e'_{n_e} y = φ'Ey .

Since

(125)   Var(e'y) = Var(φ'Ey) = φ'PFVF'P'φ σ^2 = Σ_{j=1}^{r_e} φ^2_j λ_j σ^2 ,

we have that φ^2_j must be positive for at least one j = 1, 2, ..., r_e if e'y is to have positive variance. If e'y also belongs to the estimation set, then

(126)   0 = Cov(e'y, e'_j y) = φ'EVe_j σ^2 = φ_j e'_j V e_j σ^2 ,   j = 1, 2, ..., n_e ;
but (126) implies that φ_j = 0, j = 1, 2, ..., r_e, since e'_j V e_j = λ_j > 0 for j = 1, 2, ..., r_e. However, this implies that e'y has zero variance. As a result, the only linear functions of the y's which belong to both the estimation set and the error set are those which have zero variance.

Let us now examine more closely the linear set E_2 y. This set consists of linear functions of the y's which are identically zero (i.e., their expected values and variances are all zero), since

(127)   AE'_2 = 0_{m, n_e − r_e} ,   VE'_2 = 0_{n, n_e − r_e} ;

i.e., the rows of E_2 represent a basis of the vector space orthogonal to both the rows of A and the rows of V. Moreover, since the relations

(128)   0 = E_2 y ,   E_2 A'p = 0_{n_e − r_e}

provide no information at all about the parameters, the linear set E_2 y can be ignored.

Next let us consider the linear set E_1 y. This represents the true error set, since all elements of it have their expected values equal to zero and their variances positive. The rank of this true error set is r_e.

Since V is of rank r ≥ r_e, there exist (n − r) independent (n × 1) vectors which are orthogonal to the rows of V; moreover, (n − r) − (n_e − r_e) = (n − n_e) − (r − r_e) = n_0 − r_0 (where r_0 = r − r_e ≤ n_0 = Rank A) of these may be taken to be orthogonal to the rows of E.

Hence, if we let R'_1 = R'_1((n_0 − r_0) × n) be an orthogonal completion to
the vector space generated by the rows of V and the rows of E, and if we define z*_1 = R'_1 y, then

(129)   Var(R'_1 y) = R'_1 V R_1 σ^2 = 0_{n_0 − r_0, n_0 − r_0} .

From (129), it follows that

(130)   z*_1 = R'_1 y = R'_1 e(y) = R'_1 A'p .

As a result, we see that the parameters satisfy the (n_0 − r_0) restrictions specified by (130).
Finally, if we let T = T(r_0 × n) be an orthogonal completion to the vector space generated by the rows of E and the rows of R'_1, and if we define y* by

(131)   y* = Ty ,

then

(132)   Var(y*) = TVT'σ^2 ,

where TVT' is symmetric positive definite of rank r_0.

In summary, we have partitioned the set of linear functions of y_1, y_2, ..., y_n into four disjoint and exhaustive sets Ty, R'_1 y, E_1 y, E_2 y, each with a meaningful interpretation. Using this, we consider the following transformation

(133)   [ y* ; z*_1 ; z_1 ; z_2 ] = [ T ; R'_1 ; E_1 ; E_2 ] y .
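As a numerical illustration of this partition (not in the original), the four sets can be constructed for a small singular V by successive null-space completions. The construction below is illustrative: the null space of V is deliberately chosen to contain one error direction (giving E_2) and one estimation direction (giving R'_1):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

def null_basis(X, tol=1e-10):
    """Orthonormal rows spanning {v : X v = 0}."""
    _, s, vh = np.linalg.svd(X)
    rank = int(np.sum(s > tol))
    return vh[rank:]

# Illustrative design A' with n_0 = Rank A = 2.
A = rng.normal(size=(2, n))
E = null_basis(A)                            # error directions, n_e = 4

# Singular V (rank 4) whose null space holds one error direction (u1)
# and one direction in the row space of A (u2).
u1 = E[0]
u2 = A[0] / np.linalg.norm(A[0])
B = null_basis(np.vstack([u1, u2]))          # 4 orthonormal rows
V = B.T @ np.diag(rng.uniform(1.0, 2.0, size=4)) @ B

# The four disjoint sets of Section 8:
E2 = null_basis(np.vstack([A, V]))           # identically zero functions
R1 = null_basis(np.vstack([V, E]))           # zero-variance restrictions (130)
T  = null_basis(np.vstack([E, R1]))          # remaining r_0 directions

assert E2.shape[0] == 1 and R1.shape[0] == 1 and T.shape[0] == 1
assert np.allclose(R1 @ V @ R1.T, 0)                     # (129): Var(R1 y) = 0
assert np.all(np.linalg.eigvalsh(T @ V @ T.T) > 1e-10)   # (132): T V T' p.d.
```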
The transformed model is

(134)   e(y*) = TA'p ,   Var(y*) = TVT'σ^2 ,   R'_1 A'p = z*_1 ,

where the rows of R'_1 A' are independent of the rows of TA'. The model (134) is characterized by a non-singular variance-covariance matrix and the presence of non-estimable restrictions. It may be analyzed by the methods of Section 7.

The computations may be simplified somewhat by using the following modification to the above approach. Let E_2 be defined as before by the equations

(135)   VE'_2 = 0_{n, n_e − r_e} ,   AE'_2 = 0_{m, n_e − r_e} ;

let E_1 be defined by

(136)   AE'_1 = 0_{m, r_e} ,   E_2 E'_1 = 0_{n_e − r_e, r_e} ;

let R'_1 be defined by

(137)   VR_1 = 0_{n, n_0 − r_0} ,   with the rows of R'_1 independent of the rows of E_2 ;
and let T be defined by

(138)   ET' = 0_{n_e, r_0} ,   R'_1 T' = 0_{n_0 − r_0, r_0} .

Then the transformed model is

(139)   e[ y* ; z_1 ] = [ TA' ; 0 ] p ,   Var[ y* ; z_1 ] = [ TVT' TVE'_1 ; E_1VT' E_1VE'_1 ] σ^2 ,   R'_1 A'p = z*_1 ,

which can be analyzed by the methods of Section 7. The computations leading to E_2, E_1, R'_1, T can be formidable, in which case this type of analysis probably should be avoided. In any event, it does demonstrate that the effect of singularities in V appears in the form of the non-informative identities (128) and, more importantly, the non-estimable restrictions (130).

Finally, if there should be any restrictions associated with the model (120), then they can be applied to either of the transformed models (134) or (139) and dealt with accordingly.
8.A. Solution of Equations (121). We restrict e' to be such that e'y has positive variance. Hence e' lies in the vector space generated by E_1. Now we may write E_1 = U_1 V. Consider the equations
(140)   E_1 V c = 0 .

Two sets of solutions to (140), which are of interest, are the columns of

(141)   C = [c_1, c_2, ..., c_m] = V^g A'

and the columns of R_1; by Lemma 4, we have

(142)   E_1 V C = U_1 V V V^g A' = U_1 V A' = E_1 A' = 0 ;   also   E_1 V R_1 = 0   since   VR_1 = 0 .

Note that the computations involved with this approach may be as extensive as those associated with the transformed models (139). In any event, this result may be of use in generalizing Theorem 2.
BIBLIOGRAPHY
Bose, R. C. [1950], Least Squares Aspects of the Analysis of Variance, Institute of Statistics Mimeo Series No. 9, University of North Carolina.

Lucas, H. L. [1965], Lecture Notes, Statistics 691, North Carolina State University.

Plackett, R. L. [1960], Principles of Regression Analysis, Clarendon Press, Oxford.

Rao, C. R. [1945a], Generalization of Markov's theorem and tests of linear hypotheses, Sankhya 7, 9-15.

Rao, C. R. [1945b], Markov's theorem with linear restrictions on parameters, Sankhya 7, 16-19.

Rao, C. R. [1946], On the linear combination of observations and the general theory of least squares, Sankhya 7, 237-256.

Rao, C. R. [1965], Linear Statistical Inference and Its Applications, John Wiley and Sons, Inc., New York.