DIRECTIONALLY MINIMAX MEAN SQUARE
ERROR ESTIMATION
IN LINEAR MODELS

by
Guillermo Pedro Zarate de Lara

Institute of Statistics
Mimeograph Series No. 1102
Raleigh, N.C.
ABSTRACT

ZARATE DE LARA, GUILLERMO PEDRO. Directionally Minimax Mean Square Error Estimation In Linear Models. (Under the direction of THOMAS M. GERIG and ROBERT J. MONROE.)
A minimax criterion, using the trace of the mean squared error matrix, for estimating regression parameters in a linear model is proposed. It is shown that with respect to this criterion non-homogeneous linear estimators are inadmissible. Estimators are derived using the criterion for (i) the class of ordinary least squares estimators computed subject to false restrictions, (ii) the class of shrinkage estimators and (iii) the class of general linear estimators.

The criterion proposed above is then generalized by applying the minimax argument directly to the mean square error matrix. A matrix ordering, and a least upper bound with respect to this ordering, are defined, and a set of sufficient conditions for the existence of this least upper bound is presented. It is shown that the estimator obtained by using this criterion can be seen as a generalization of the Ridge Regression Estimator of Hoerl and Kennard. The form of the estimator is used to propose a joint estimator for the regression coefficients of a set of dependent regression equations describing different variables but sharing the same design matrix. The estimator is compared with a naive Ridge Regression type and some asymptotic results for both estimators are presented.
BIOGRAPHY
The author was born on August 1, 1946, in Mexico City, Mexico. He was reared primarily in Oaxaca, Mexico, where he received his elementary and secondary education.

He received the Bachelor of Science degree with a major in Soil Science from the National School of Agriculture in 1969. Under the support of the "Consejo Nacional de Ciencia y Tecnologia" (CONACYT), he then received the Master of Science Degree in Statistics from the "Centro de Estadistica y Calculo (CEC), del Colegio de Postgraduados" of the National School of Agriculture in 1972, and has been a professor there since 1972.

In 1973, a scholarship was granted to him from the CONACYT to pursue his studies in statistics towards a Doctorate at North Carolina State University.

The author is married to the former Graciela Gonzalez Kauffmann and they have a daughter, Graciela.
ACKNOWLEDGEMENTS
No man knows fully all the forces and significant people who
shaped his thinking and his work.
Yet, upon the completion of one of
the most demanding undertakings, a dissertation, one cannot escape
a feeling of tremendous debt owed to others.
Therefore of the many
persons who actively contributed to its development, I must acknowledge
the following.
Professor Thomas M. Gerig, who from the onset of our relationship as student and advisor, had a genuine interest in my graduate
program and supported me with his invaluable counsel and guidance
throughout the preparation of this thesis.
In addition to showing me
the rigors of statistical thought, he showed me the virtue of
being a person without pretence, challenged me to learn and gave me
his friendship.
Professor Robert G. D. Steel, who from the first hours of my
arrival on campus and throughout my entire graduate program, has
supported me with his counsel and friendship.
Professor Robert J. Monroe, for serving on my committee.
Professor Stephen L. Campbell, who always gave me advice with
patience, friendship and an enthusiastic attitude.
Professor John W. Bishir, for his counsel and his sincere
friendship throughout these years of study.
Professor Ignacio Mendez R., for introducing me to the study of
statistics.
Professors Alfonso Carrillo L. and Eduardo Casas D., for their
encouragement and help in studying statistics.
My wife, who encouraged me and assisted me through the difficult
times, for the belief that I could do it, and for her waiting until it
was finished.
My daughter, who always brightened my day.
To all
my
fellow graduate students, who have been an exemplary
model of intellectual and academic excellence, in addition to being
true friends.
To Linda Bielawski, who spent many hours typing this thesis.
Finally an eternal thanks to
my
parents, without whose motivation
and devotion this entire educational program would never have been made
possible.
A grant from the "Consejo Nacional de Ciencia y Tecnologia" has made possible my graduate program. I want to express my gratitude to this institution.
TABLE OF CONTENTS

                                                                   Page

1. INTRODUCTION AND SUMMARY . . . . . . . . . . . . . . . . . . . .  1

2. NOTATION . . . . . . . . . . . . . . . . . . . . . . . . . . . .  4

3. LITERATURE REVIEW  . . . . . . . . . . . . . . . . . . . . . . .  7

4. DIRECTIONALLY MINIMAX TRACE MEAN SQUARED ERROR ESTIMATION  . . . 30

   4.1 DMTMSE Estimation In The Class of OLS Estimators
       Computed Subject To False Restrictions . . . . . . . . . . . 32

       4.1.1 The Single Constraint Case . . . . . . . . . . . . . . 35
       4.1.2 The General Case . . . . . . . . . . . . . . . . . . . 38
       4.1.3 Some Properties of The Estimator  . . . . . . . . . . . 42

   4.2 The DMTMSE Estimator For The Class of Shrinkage
       Estimators  . . . . . . . . . . . . . . . . . . . . . . . . . 45

   4.3 The DMTMSE Estimator For The Class of General
       Linear Functions  . . . . . . . . . . . . . . . . . . . . . . 47

5. DIRECTIONALLY MINIMAX MEAN SQUARED ERROR ESTIMATION  . . . . . . 57

   5.1 A Matrix Ordering  . . . . . . . . . . . . . . . . . . . . . 57
   5.2 Geometric Interpretation of The Matrix Ordering  . . . . . . 58
   5.3 Definition of a Supremum in The Matrix Ordering . . . . . . . 61
   5.4 Directionally Minimax Mean Squared Error Estimation  . . . . 67
   5.5 An Alternative Procedure To Obtain The DMMSE
       Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . 77
   5.6 Relation Between The DMMSE Estimator And The
       Ridge Regression Estimator  . . . . . . . . . . . . . . . . . 80
   5.7 Minimization of The Tr[S(Ay,k)] For The Restriction
       Case  . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6. JOINT DIRECTIONALLY MINIMAX MEAN SQUARED ERROR ESTIMATION  . . . 85

   6.1 Some Basic Definitions and Notation . . . . . . . . . . . . . 85
   6.2 The Joint Generalized DMMSE Estimator . . . . . . . . . . . . 86

       6.2.1 Asymptotic Distribution Results For The
             Joint DMMSE Estimator And The Joint
             Ridge Regression Estimator  . . . . . . . . . . . . . . 91

   6.3 Comparison Between The Variances of The Joint LS,
       Joint DMMSE and Joint Ridge Regression Estimators  . . . . . 101

       6.3.1 Comparison Between Var(β̂) and Var(β̃_R) . . . . . . . 102
       6.3.2 Comparison Between Var(β̂) and Var(β̃)  . . . . . . . 105
       6.3.3 Comparison Between Var(β̃) and Var(β̃_RR)  . . . . . . 106

7. PROBLEMS FOR FURTHER RESEARCH  . . . . . . . . . . . . . . . . . 112

8. LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . . 114
1.
INTRODUCTION AND SUMMARY
The most commonly used estimators of the regression coefficients
in general linear models are the best linear unbiased estimators (or
ordinary least squares estimators).
By dropping the unbiasedness
criterion, linear estimators can be obtained that have smaller
variances.
In this situation, it is common to adopt some type of mean
squared error criterion which balances increased bias against reduced
variance.
Unfortunately, estimators derived from this type of
criterion invariably lead to expressions that involve the parameters
themselves and, as such, are useless in practice.
Several authors have proposed classes of biased regression
estimators which can be used as alternatives to the ordinary least
squares (OLS) estimator.
These estimators have been studied and compared to OLS estimators with respect to mean squared error.
Typically,
they are not uniformly better in this respect but in some situations
can be dramatically superior.
Some of the common methods of biased regression estimation include Hoerl and Kennard's [9] ridge regression, Marquardt's [13] generalized inverse regression, Mayer and Willke's [14] shrunken OLS regression, and Toro and Wallace's [18] false restrictions regression. Many others have proposed modifications and extensions to these.
The purpose of this work is to develop meaningful criteria and use them to derive biased estimators which will be superior to OLS with respect to mean squared error.
In Chapter 4 a criterion is proposed which is based on a minimax argument.
In the formula for the trace of the mean squared error matrix, the value of the regression parameters which is, in a sense, the least favorable to estimation (that is, which results in the greatest bias) is substituted for the unknown parameters.
By this,
it is hoped that the derived estimators will protect the user from the
worst case that nature can produce.
It is shown that, with respect to
this criterion, non-homogeneous linear estimators are inadmissible.
The criterion is applied to three classes of (homogeneous) linear estimators. First, it is applied to the class of estimators obtained by computing OLS estimators subject to false constraints.
In this case it is found that the generalized inverse estimator of Marquardt [13] is optimum. Secondly, the criterion is applied to the class of shrinkage estimators defined by Goldstein and Smith [7]. Here the optimum is found to be a member of the subclass of shrunken OLS estimators defined by Mayer and Willke [14].
class of linear estimators is studied.
Finally, the general
For this, the optimum turns
out to be the same as that for the shrinkage class.
It is shown as
well that the estimator of a linear combination of the parameters is
not in general equal to the same linear combination of the estimators
of the parameters.
In Chapter
5, the criterion proposed before is generalized, in
the sense that the same minimax argument is applied to the mean square
error matrix.
In order to do so, a matrix ordering and a supremum or least upper bound with respect to this ordering are defined, and a set of sufficient conditions for the existence of the supremum is presented. When this criterion is minimized, the resulting estimator is shown to be a generalization of the ridge regression estimator of Hoerl and Kennard [9].
It is shown also that, in general, for this criterion, the estimator of a linear combination of the parameters is equal to the same linear combination of the estimators of the parameters.
In Chapter 6, the form of the estimator obtained in Chapter 5 is
used to propose a joint estimator for the regression coefficients of a
set of dependent regression equations describing different variables
but sharing the same design matrix.
These estimators are similar to the "seemingly unrelated regression" estimators of Zellner [21].
The
estimator is compared with a naive ridge regression type competitor
and some asymptotic results for both estimators are presented.
Finally, in Chapter 7,
some ideas and problems that arose in the
course of this research are proposed for further studies.
2.
NOTATION

Consider the linear model

    y = Xβ + e ,                                              (2.1)

where y is an (n x 1) vector of observations, X is an (n x m) matrix of rank m (m ≤ n) of known constants, β is an (m x 1) vector of unknown regression parameters, and e is an (n x 1) vector of unobservable random variables with E(e) = 0 and E(ee') = Σσ². For convenience, we shall refer to this model by (y, Xβ, Σσ²).

Define

    S = X'X                                                   (2.2)

and

    β̂ = S^{-1}X'y ,                                           (2.3)

and write the singular value decomposition of X as

    X = UΛ^{1/2}V' ,                                          (2.4)

where U is (n x m), V is (m x m), Λ^{1/2} is the (m x m) diagonal matrix diag(λ_1^{1/2}, ..., λ_m^{1/2}) with λ_1^{1/2} ≥ ... ≥ λ_m^{1/2} > 0, U'U = I_m, and V'V = VV' = I_m, where I_m is the (m x m) identity matrix. From this we write

    S = VΛV' .                                                (2.5)

Define the partitioned matrices

    U = [U_1, U_2]                                            (2.6)

and

    V = [V_1, V_2] ,                                          (2.7)

where U_1 is (n x m-u), U_2 is (n x u), V_1 is (m x m-u), and V_2 is (m x u), for some u. Similarly, define

    Λ^{1/2} = diag(Λ_1^{1/2}, Λ_2^{1/2}) ,                    (2.8)

where Λ_1^{1/2} is [(m-u) x (m-u)] and Λ_2^{1/2} is (u x u) for some u.
We shall be concerned with the simultaneous estimation of the regression parameters, β, using linear functions of the observations Ay + b, where A is some (m x n) matrix and b an (m x 1) vector. For this we shall need the mean squared error matrix,

    M(Ay + b, β) = E[(Ay + b - β)(Ay + b - β)']
                 = AΣA'σ² + [b + (AX - I)β][b + (AX - I)β]' ,  (2.9)

and we shall need the trace of M,

    S(Ay + b, β) = tr[M(Ay + b, β)] .                          (2.10)

Furthermore,

    M(Ay, P'β; β, σ²) = AΣA'σ² + (P' - AX)ββ'(P' - AX)'        (2.11)

will be used as the MSE matrix for the estimator Ay of the parameter P'β. The arguments β and σ² are included to emphasize the dependence of the above expressions on them. They will be omitted from the notation when there is no danger of confusion.

Finally, define:

(i)   Ch_1(A) is the largest characteristic root of the matrix A,   (2.12)
(ii)  Ch_m(A) is the smallest characteristic root of the matrix A,  (2.13)
(iii) C(A) is the vector space generated by the columns of A,       (2.14)
(iv)  R(A) is the vector space generated by the rows of A,          (2.15)
(v)   C⊥(A) is the vector space orthogonal to the vector space
      generated by the rows of A,                                   (2.16)
(vi)  ‖A‖ = sup_{x≠0} ‖Ax‖/‖x‖ , where A is a (p x q) matrix.       (2.17)
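The MSE matrix (2.9) and its trace (2.10) can be illustrated numerically. The sketch below (all inputs invented; A is taken to be the OLS choice A = S^{-1}X', b = 0, with Σ = I) compares the analytic trace against a Monte Carlo average:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
X = rng.standard_normal((n, m))
beta = np.array([1.0, -2.0])
sigma2 = 0.5
Sigma = np.eye(n)                          # E(ee') = Sigma * sigma2

A = np.linalg.solve(X.T @ X, X.T)          # OLS choice: A = S^{-1}X'
b = np.zeros(m)

bias = b + (A @ X - np.eye(m)) @ beta      # b + (AX - I)beta
M = sigma2 * A @ Sigma @ A.T + np.outer(bias, bias)   # eq. (2.9)
trace_mse = np.trace(M)                    # eq. (2.10)

# Monte Carlo check of the analytic trace
reps = 20000
e = rng.multivariate_normal(np.zeros(n), Sigma * sigma2, size=reps)
y = X @ beta + e                           # (reps x n) simulated observations
est = y @ A.T + b                          # Ay + b for each replicate
mc_trace = np.mean(np.sum((est - beta) ** 2, axis=1))
assert abs(mc_trace - trace_mse) < 0.2 * trace_mse
```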
3.
LITERATURE REVIEW

A problem that can occur in regression analysis is that, even when the model (y, Xβ, Σσ²), |Σ| ≠ 0, is correctly specified, some columns of the X matrix come quite close to being linearly dependent. This results in some very small but positive eigenvalues of the X'X matrix, and, therefore, in large variances for the least squares (LS) estimates of some linear combinations of the regression coefficients.
Some comments about how this problem arises in econometrics are given by Chipman [3], who says that it can arise in two different ways:
(1)
It is very common that a "true" specification
would usually result in the number of variables
exceeding the number of observations. In demand analysis, from the point of view of pure
theory all prices should, in principle, be included as determining variables. This fact is
usually obscured by the practice of neglecting
some variables that do not seem individually
to be of great importance, or by combining
some variables linearly in order to obtain
aggregate variables. Such practices are
usually ad hoc, finding justification neither
in economics nor statistical theory.
(2)
Even if the number of observations exceeds
the number of variables, a number of columns
of the observation matrix may be linearly dependent, or nearly so. This occurs in
econometric time series analysis when some
variables (such as prices) move up and down
together or remain stationary; or in cross
section analysis when different consumers
face the same set of market prices.
The two key properties of the OLS estimator β̂ are that it is unbiased,

    E(β̂) = β ,

and that it has minimum variance among all linear unbiased estimators. The variance is

    Var(β̂) = S^{-1}σ² .
By dropping the unbiasedness criterion, linear estimators can be
obtained that have smaller variances.
In this situation it is common
to adopt some type of mean square error (MSE) cri.terion which balances
increased bias against reduced variance.
Some of the earlier work done in the area of estimation using a MSE criterion includes that of Chipman [3]. He uses the concept of a minimum MSE procedure to obtain estimates of the parameter β. The approach given in his paper is delineated in the following lines.
Let the (m x 1) vector β have a prior probability distribution with

    E(β) = β̄ ,    Var(β) = C .

Similarly, let the (n x 1) random vector e have mean and variance

    E(e) = 0 ,    E(ee') = σ²Ω = D .

Assume further that β and e are uncorrelated; i.e.,

    E(β - β̄)e' = 0 .

He defines the deviation of β from its mean as

    s = β - β̄ ,

thus, the joint vector (s', e')' has mean zero and variance

    Var[(s', e')'] = [ C  0 ; 0  D ] .

From the probability distribution of β and e, Chipman derives the conditional distributions of y = Xβ + e given, respectively, β and e, with mean and variance

    E(y|β) = Xβ ,        Var(y|β) = D ,
    E(y|e) = Xβ̄ + e ,    Var(y|e) = XCX' .

Thus, the unconditional distribution of y has mean and variance

    E(y) = Xβ̄ ,    Var(y) = XCX' + D ≡ W ,

which defines W.
Under this setup, he introduces the following definition:

Definition 3.1
A linear estimator β* = Ay + b is said to be a minimum mean squared error estimator of β if A and b are such that the matrix

    R = R(A,b) = E(β* - β)(β* - β)'

is minimum. R may be called the matrix of MSE, or more briefly the risk matrix. The precise meaning of minimum is stated in Section 5.1.

He then proceeds, as in Foster [6, p. 388], first to minimize R with respect to b and then with respect to A, obtaining that R(A,b) is minimized with respect to b when

    b = (I - AX)β̄ ,

so that the problem is reduced to finding a matrix A such that

    R(A) = (I - AX)C(I - AX)' + ADA'

is a minimum.
The choice of the matrix A is then given by the following
theorem:
Theorem 3.1
Let X be an (n x m) matrix, and let C, D be positive definite matrices of order m and n, respectively. Then there is a unique (m x n) matrix A = A_0 (called the optimum inverse of X) which minimizes

    R = (I - AX)C(I - AX)' + ADA' ,

and it is equal to

    A_0 = CX'(XCX' + D)^{-1} = CX'W^{-1} ;

the minimum risk then becomes

    R(A_0) = C - CX'W^{-1}XC ,

so the estimator becomes

    β* = β̄ + CX'W^{-1}(y - Xβ̄) .
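The optimality of Chipman's matrix can be checked numerically. In the sketch below (all numerical values invented), A_0 = CX'(XCX' + D)^{-1} is verified to be the minimizer of the risk R(A) = (I - AX)C(I - AX)' + ADA' by comparing the trace of the risk at A_0 with nearby perturbations:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 10, 3
X = rng.standard_normal((n, m))
C = np.eye(m)                              # prior dispersion of beta
D = 0.5 * np.eye(n)                        # error dispersion E(ee') = D
beta_bar = np.array([1.0, 0.0, -1.0])      # prior mean of beta

W = X @ C @ X.T + D                        # Var(y) = XCX' + D = W
A0 = C @ X.T @ np.linalg.inv(W)            # the "optimum inverse" of X
b0 = (np.eye(m) - A0 @ X) @ beta_bar       # b = (I - AX) beta_bar

def risk(A):
    """R(A) = (I - AX) C (I - AX)' + A D A'."""
    B = np.eye(m) - A @ X
    return B @ C @ B.T + A @ D @ A.T

# The trace of the risk at A0 is not beaten by nearby perturbations,
# since R(A) is a positive-definite quadratic in A.
base = np.trace(risk(A0))
for _ in range(5):
    assert base <= np.trace(risk(A0 + 0.01 * rng.standard_normal((m, n))))
```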
Toro and Wallace [18] consider using LS estimators in linear models (y, Xβ, Iσ²), calculated subject to false restrictions, in hopes of reducing the MSE of estimation. They propose a uniformly most powerful test to check, for a given data set, whether a particular set of restrictions will reduce MSE. Since the MSE criterion suggests a framework for thinking about the problem of multicollinearity in a linear model, they present some examples to illustrate the linkage of the MSE criterion with multicollinearity.
Toro and Wallace's work consists first in obtaining a class of biased estimators by imposing a set of false restrictions on the parameter space, of the form

    Rβ = h ,

where R is a (u x m) matrix of known constants with rank u ≤ m and h a vector of known constants. Under this setup, the estimator β̃ is of the form

    β̃ = β̂ - S^{-1}R'(RS^{-1}R')^{-1}(Rβ̂ - h) ,

where β̂ is the OLS estimator defined in (2.3) and S^{-1} is given in (2.2). The distribution of β̃ is normal, with

    E(β̃) = β - S^{-1}R'(RS^{-1}R')^{-1}(Rβ - h)

and

    Var(β̃) = σ²[S^{-1} - S^{-1}R'(RS^{-1}R')^{-1}RS^{-1}] .

Toro and Wallace make the remark that "the restrictions come into play in reducing variances and one should also note that the restrictions reduce variances no matter whether they are true or not".
The problem that arises now is to determine whether a specific set of restrictions leads to better estimates, where the criterion of "betterness" is taken to be the MSE criterion. To be precise, we have: "For β̃ to be better than β̂ in MSE it is required that for every (m x 1) vector d,

    MSE(d'β̃) ≤ MSE(d'β̂) ."

Under this definition, Toro and Wallace found that

    M(β̂, β) - M(β̃, β)
        = S^{-1}R'(RS^{-1}R')^{-1}[σ²(RS^{-1}R') - (Rβ - h)(Rβ - h)'](RS^{-1}R')^{-1}RS^{-1} ,   (3.2)

where M(β̂, β) and M(β̃, β) are the MSE matrices of β̂ and β̃ respectively, and M is defined in Chapter 2. Moreover, they found that (3.2) is a positive-semidefinite matrix if, and only if,

    (Rβ - h)'(RS^{-1}R')^{-1}(Rβ - h) ≤ σ² ,

and this occurs if, and only if,

    λ ≡ (Rβ - h)'(RS^{-1}R')^{-1}(Rβ - h)/(2σ²) ≤ 1/2 .
Next they point out:

    Getting the criterion into the form λ ≤ 1/2 is an important step, because λ is a parameter in a well-defined probability density function. Thus we can make a test of the hypothesis that a particular set of restrictions yields better structural estimates than the OLS estimator according to the MSE criterion.
In order to develop the statistical test, let

    Q_0 = [SSE(β̃) - SSE(β̂)]/σ²    and    Q_1 = SSE(β̂)/σ² ,

where SSE denotes the residual sum of squares. Toro and Wallace have shown that Q_1 and Q_0 are independent, and that Q_1 has the central chi-square distribution with (n-m) degrees of freedom; therefore the ratio

    W = (n-m)Q_0/(uQ_1)

has the non-central F distribution with parameters (u, n-m, λ). So the hypothesis that a set of constrained estimators β̃ is better than the unconstrained estimators β̂, according to the generalized MSE criterion, can be written as

    H_0: λ ≤ 1/2    versus    H_1: λ > 1/2 .

We will accept H_0 when W ≤ W_α, where W_α is the appropriate critical value. Some critical points and power computations are given in [18].
Sclove [19] has obtained point estimators for the coefficients in orthogonal linear regression which are "better" than OLS estimators when at least three coefficients are to be estimated. The measure of goodness of an estimator is a weighted sum of the componentwise MSE. The extension of the results to the general case of non-orthogonal regression is given, where the measure of goodness of an estimator is the mean of a quadratic form in the componentwise error.

The precise meaning of "better" that will be used subsequently is the following: "One estimator is better than another if the sum of its componentwise MSE's is smaller than that of the other, for all parameter values."

Sclove considers first the model for the regression on orthogonal independent variables,

    y_j = Z_j'β + e_j ,    j = 1, ..., n ,

where the e_j are i.i.d. N(0, σ²) random variables, and

    Σ_{j=1}^{n} Z_j Z_j' = I .
The OLS estimator for β, since Z'Z = I, is

    β̂ = Z'y .

The residual sum of squares is

    v = y'y - β̂'β̂ .

It is well-known that the statistics β̂ and v are independent, β̂ ~ N(β, σ²I) and v ~ σ²χ²_{n-m}. The measure of goodness of an estimator β̃ will be a weighted sum of componentwise MSE,

    Σ_{i=1}^{m} w_i E(β̃_i - β_i)² ,

where the w_i, i = 1, ..., m, are given weights. Thus, β̃ is "better" than β̂ if

    Σ_{i=1}^{m} w_i E(β̃_i - β_i)² ≤ Σ_{i=1}^{m} w_i E(β̂_i - β_i)²

for all β. A special case is w_i = 1.

An important series of theorems and corollaries, given by James and Stein [10] and by Baranchik [1], that are used intensively by Sclove and subsequent authors, are the following:
Theorem 3.2 (James-Stein [10])
For m ≥ 3, the estimator

    β̃ = (1 - cv/(β̂'β̂))β̂ ,

where 0 < c < 2(m-2)/(n-m+2), has the property that

    Σ_i E(β̃_i - β_i)² ≤ Σ_i E(β̂_i - β_i)²

for all β.
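The domination claimed in Theorem 3.2 is easy to see in simulation. The sketch below (simulated model invented; σ² = 1, orthogonal design so that β̂ ~ N(β, I) and v ~ χ²_{n-m}) compares the accumulated squared error of the shrinkage estimator with that of LS:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 20, 5
beta = np.array([0.3, -0.2, 0.1, 0.0, 0.2])   # small beta: shrinkage helps most
c = (m - 2) / (n - m + 2)                     # the value singled out below

reps = 5000
sse_ls, sse_js = 0.0, 0.0
for _ in range(reps):
    beta_hat = beta + rng.standard_normal(m)  # beta_hat ~ N(beta, I)
    v = rng.chisquare(n - m)                  # residual SS, v ~ chi^2_{n-m}
    beta_js = (1 - c * v / (beta_hat @ beta_hat)) * beta_hat
    sse_ls += np.sum((beta_hat - beta) ** 2)
    sse_js += np.sum((beta_js - beta) ** 2)

assert sse_js < sse_ls    # shrinkage beats LS in total squared error here
```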
Theorem 3.3
For any β, Σ_i E(β̃_i - β_i)² is smallest when c = (m-2)/(n-m+2).

(Baranchik [1])
Let F = β̂'β̂/v and let a(·) be any function such that a(·) is monotone nondecreasing and 0 < a(·) < 2(m-2)/(n-m+2). Let

    φ(β̂, v) = (1 - a(F)/F)β̂ .

Corollary 3.1
The estimator φ(β̂, v) is better than β̂, in the sense that

    Σ_i E(φ_i(β̂, v) - β_i)² ≤ Σ_i E(β̂_i - β_i)²    for all β.
It should be stressed that all these results are based upon the
assumption that the errors are normal.
Sclove applies the results obtained above to the case where we can partition β as β' = (β(1)', β(2)'), where β(1) is an m vector and β(2) is a q vector (so that β has m + q components). Similarly, partition β̂ as β̂' = (β̂(1)', β̂(2)'). Let

    β̃(2) = (1 - cv/(β̂(2)'β̂(2)))β̂(2) .

Then, by Theorem 3.2, β̃(2) is better than β̂(2), and from Corollary 3.1 we have:

Corollary 3.2
The estimator β*' = (β̂(1)', β̃(2)'), where 0 < c < 2(q-2)/(n-m+2), has the property that

    Σ_i E(β*_i - β_i)² ≤ Σ_i E(β̂_i - β_i)²    for all β.

He points out that, when β(2) = 0,

    F* = (β̂(2)'β̂(2)/q)/(v/(n-m)) ~ F_{q,n-m} ,

and the estimate of β(2) is zero when F* < ((n-m)/q)c. Hence, using β* corresponds to making a preliminary test of the hypothesis β(2) = 0, at a level of significance α = P(F_{q,n-m} > ((n-m)/q)c); it is very much like estimating β(2) by β̃(2) when the hypothesis is rejected and by β(2) = 0 when the hypothesis is accepted. Some critical points are given in [19].
As an extension of the past results to nonorthogonal regression, he considers the model

    y_j = X_j'β + e_j ,    j = 1, ..., n .

Letting

    S = Σ_{j=1}^{n} X_j X_j' ,

suppose that L'SL = I, so that the transformation from the independent variables X_j to the orthogonal independent variables Z_j is Z_j = L'X_j, with β = Lβ_1. The estimator (3.4), applied to the transformed model, then gives as an estimator of β the statistic Lβ*_1 when F* ≥ ((n-m)/q)c.

The applicability of the results is limited because of the requirement of estimating at least three regression coefficients.
Another important work in the area of biased estimation in linear regression problems is the one written by Hoerl [9], who showed that in linear regression analysis with nonorthogonal data it is helpful to augment the diagonal of the normal equation matrix by a small positive quantity in order to prevent "inflation" of the elements of the vector of regression coefficients; later, Hoerl and Kennard [9] developed a comprehensive theory supporting Hoerl's procedure, showing that it is possible to improve linear estimation from nonorthogonal data by employing biased estimation, focusing on small MSE rather than least squares.

Hoerl and Kennard's formulation of the problem is as follows: Consider the model (y, Xβ, Iσ²), and assume that the matrix X'X has the form of a correlation matrix. If the eigenvalues of the X'X matrix are denoted as in (2.5) by λ_i, i = 1, ..., m, then a seriously "ill-conditioned" problem is characterized by the fact that the smallest eigenvalue λ_m is very much smaller than 1.
They have summarized the dramatic inadequacy of LS for nonorthogonal problems in this situation by noting that σ²/λ_m is a lower bound for the average squared distance between β̂ and β. Thus, for ill-conditioned data, the LS estimated regression coefficient vector β̂ is expected to be far distant from the true vector β. Moreover, the LS coefficient vector is much too long, on the average, since λ_m ≪ 1. The LS solution yields coefficients whose absolute values are too large and whose signs may actually reverse with negligible changes in the data.
The ridge regression estimator is obtained by solving

    (X'X + dI)β̂* = X'y ,

yielding

    β̂* = (X'X + dI)^{-1}X'y

for d ≥ 0.
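The ridge estimator can be sketched directly from the normal equations above (the nearly collinear design below is invented). Two checks: d = 0 recovers the LS solution, and for d > 0 the coefficient vector is shorter, in line with the remark that LS vectors are too long:

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 30, 3
X = rng.standard_normal((n, m))
X[:, 2] = X[:, 1] + 0.01 * rng.standard_normal(n)   # nearly collinear columns
beta = np.array([1.0, 1.0, 1.0])
y = X @ beta + rng.standard_normal(n)

def ridge(X, y, d):
    """Solve (X'X + dI) beta = X'y."""
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + d * np.eye(k), X.T @ y)

beta_ls = ridge(X, y, 0.0)                 # d = 0: ordinary least squares
beta_d = ridge(X, y, 1.0)                  # a positive ridge constant

assert np.allclose(beta_ls, np.linalg.lstsq(X, y, rcond=None)[0])
assert np.linalg.norm(beta_d) < np.linalg.norm(beta_ls)   # ridge shrinks
```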
About the form of the estimator, Hoerl and Kennard say:

    "Estimation based on the matrix [X'X + dI], d ≥ 0, rather than X'X has been found to be a procedure that can be used to help circumvent many of the difficulties associated with the usual LS estimates. In particular, the procedure can be used to portray the sensitivity of the estimates to the particular set of data being used, and it can be used to obtain a point estimate with smaller mean squared error."
There is an "optimum" value of d for any problem which cannot be determined in practice, so it is desirable to examine the ridge solution for a range of admissible values of d. They introduce the term "Ridge Trace" to describe the solution viewed as a function of d. They also discuss methods for choosing d. Some critical comments on the properties and goodness of the method have been given by Conniffee and Stone [4], and an extension and detailed study of the properties of the Ridge estimator has been done by Chapman [2].

Marquardt [13] proposed a class of regression estimators he called generalized inverse estimators. He argues that:
"It is important to recognize that practical
estimation problems give rise to matrices X'X
having eigenvalues that may be grouped qualitatively into three types - substantially greater than
zero, slightly greater than zero, precisely zero
(except for rounding error). In computations, it
may be difficult to distinguish between adjacent
types."
As an alternative to using precise methods to obtain "exact" solutions
for this case, he suggests assuming a lower rank for
obtaining a solution under this assumption.
X'X
and
For this he says:
"Use of multiple precision arithmetic will not guarantee reliable results. Furthermore, inspection of the eigenvalue spectrum usually suggests that there is no "rank" clearly assignable to the matrix. Rather, there is a range of ranks that may be reasonable choices. One would like to be able to determine the generalized inverse for any assigned rank in this reasonable range."
The class of generalized inverse regression estimators is given by

    β̂⁺ = Σ_{i=1}^{r} λ_i^{-1} V_i V_i' X'y ,                  (3.5)

where the generalized inverse is based upon the assigned rank r, and V_i is the eigenvector of X'X corresponding to λ_i. In general, there is an "optimum" value of r for any problem, but it is desirable to examine the generalized inverse solution for a range of admissible values of r. Marquardt has shown that if β̂⁺ is the solution of the normal equations X'Xβ = X'y obtained by assigning rank r to X'X, and using the generalized inverse (3.5), then β̂⁺ minimizes the sum of squares of residuals.
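The rank-r generalized inverse estimator can be sketched as a truncated eigen-expansion of X'X (design and rank choice invented). The check below is consistent with the least-squares property just stated: within the span of the leading r eigenvectors, no perturbation of β̂⁺ reduces the residual sum of squares:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, r = 25, 4, 3
X = rng.standard_normal((n, m))
X[:, 3] = X[:, 0] + 1e-3 * rng.standard_normal(n)   # near-dependent column
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + rng.standard_normal(n)

# Eigen-decomposition X'X = V Lambda V', ordered lambda_1 >= ... >= lambda_m
lam, V = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]

# Rank-r generalized inverse of X'X and the estimator of eq. (3.5)
Sr_plus = sum(np.outer(V[:, i], V[:, i]) / lam[i] for i in range(r))
beta_plus = Sr_plus @ X.T @ y

# beta_plus minimizes the residual SS over the leading-r eigenspace
rss = np.sum((y - X @ beta_plus) ** 2)
for _ in range(5):
    pert = V[:, :r] @ (0.01 * rng.standard_normal(r))
    assert rss <= np.sum((y - X @ (beta_plus + pert)) ** 2) + 1e-9
```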
Marquardt makes a distinction between ridge and generalized inverse estimates. Precisely, he says:

    "Although the ridge and generalized inverse estimators share many desirable properties, the ridge estimator is not a generalized inverse estimator. For example, the ridge inverse does not satisfy AA⁺A = A. The ridge can be viewed as an approximate generalized inverse."
Rao [16] makes the remark that not much work has been done on BLE's (minimum MSE estimators) compared to that on BLUE's. Restricting attention to the model proposed in (2.1), and letting t'y be an estimator of p'β, he points out that the MSE of t'y, namely,

    E(t'y - p'β)² = σ²t'Σt + [(X't - p)'β]² ,

which involves both the unknown parameters σ² and β, is not a suitable criterion for minimizing. He proposes the following possibilities:

(i) Choose an a priori value of β, say b, based on previous knowledge, and set up the criterion as σ²S_1, where

    S_1 = t'Σt + (X't - p)'W(X't - p)

and W = bb'.
(ii) If β is considered to have an a priori distribution with a dispersion matrix σ²W, where W is known, then the criterion is σ²S_1.

(iii) We observe that the expression (3.6) is the sum of two parts, one representing the variance and the second the bias. In such a case the choice of W in S_1 represents the relative weight we attach to bias compared to variance.

Taking S_1 as a criterion with an appropriate symmetric W, Rao gives the optimum choice of t in the form of the following theorem.
Theorem 3.4
The BLE of any function p'β is p'β̃, where

    β̃ = WX'(XWX' + Σ)^{-1}y .
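The closed form t_0 = (Σ + XWX')^{-1}XWp is the stationary point of the quadratic criterion S_1 = t'Σt + (X't - p)'W(X't - p), which gives t_0'y = p'β̃ with β̃ = WX'(XWX' + Σ)^{-1}y; the sketch below (invented numbers) checks both facts numerically:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 10, 3
X = rng.standard_normal((n, m))
Sigma = np.eye(n)
W = np.diag([1.0, 0.5, 2.0])               # symmetric weight matrix
p = np.array([1.0, 0.0, -1.0])

G = np.linalg.inv(Sigma + X @ W @ X.T)
t0 = G @ X @ W @ p                         # minimizer of S1

def S1(t):
    """S1 = t' Sigma t + (X't - p)' W (X't - p)."""
    r = X.T @ t - p
    return t @ Sigma @ t + r @ W @ r

# S1 is a positive-definite quadratic in t, minimized at t0
for _ in range(5):
    assert S1(t0) <= S1(t0 + 0.01 * rng.standard_normal(n)) + 1e-12

# Equivalently, t0'y = p' beta_tilde with beta_tilde = W X'(X W X' + Sigma)^{-1} y
y = rng.standard_normal(n)
beta_tilde = W @ X.T @ G @ y
assert np.isclose(t0 @ y, p @ beta_tilde)
```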
He later points out:

    "If we have some knowledge about the domain in which β is expected to lie, we may be able to choose W suitably to assure that BLE's have uniformly smaller mean dispersion error than BLUE's."

He also remarks that:

    "Further investigation in this direction such as comparison of the estimators given in Theorem 3.4 with the ridge estimator of Hoerl and Kennard will be useful."
Mayer and Willke [14] have studied the ridge estimators, and they consider them as a subclass of the class of linear transformations of LS estimators. At the same time, they propose an alternative class of estimators that they call shrunken estimators. Mayer and Willke have shown that these estimators satisfy the admissibility condition proposed by Hoerl and Kennard, and both are derived as minimum norm estimators in the class of linear transformations of LS estimators. In addition, they obtain a class of estimators that are minimum variance linear transformations of the LS estimators, and the members of this class are shown to be stochastically shrunken estimators.
The first class that they propose has as a typical member

    C_λ = λβ̂ ,

where β̂ is the OLS estimator and λ is labelled the shrinkage factor. If λ is a fixed scalar, then C_λ is called a deterministically shrunken estimator; alternatively, if λ = F(β̂'β̂) is a scalar function of β̂'β̂, then C_λ is called a stochastically shrunken estimator and is written C(F).

Although the shrunken estimator C_λ may seem a rather simplistic alteration, Mayer and Willke have shown that it satisfies the following admissibility condition:

Proposition 3.1
For every β there exists a fixed λ in [0,1] such that

    E(C_λ - β)'(C_λ - β) < Var(β̂) ,

and, thus, the subclass of deterministically shrunken estimators is admissible.
The second class of estimators that they propose are the ones that belong to the following class:

Definition 3.2
Let G denote the class of linear transformations of β̂; if β* ∈ G, then

    β* = Aβ̂    for some (m x m) matrix A .

Definition 3.3 introduces, for each value T of the sum of squares loss, the subclass G(T) of estimators in G attaining that value. Thus, G(T) is actually an equivalence class or orbit within the class G, the equivalence defined with respect to the sum of squares loss function.

Mayer and Willke show that both the ridge and the deterministically shrunken estimators can be characterized as minimum norm estimators in the class G(T). In addition, they discuss some methods for choosing the proper shrinkage factor.
The third class of estimators that they proposed arises from the fact that different norms lead to different estimators and there is no obvious reason for preferring one norm over another. They consider estimators which have minimum total variance among all estimators in the given class; these estimators, denoted d_0, are minimum variance within each equivalence class. Although d_0 looks quite complex, Mayer and Willke have shown that these estimators belong to the class of shrunken estimators; in fact, d_0 is a stochastically shrunken estimator.
Finally, they propose that, if we use a shrinkage estimator J_k with

    λ = 1 - k s²/(β̂'Sβ̂)

as the shrinkage factor, where s² = y'y - β̂'Sβ̂, then the class of estimators {J_k} satisfies a stronger admissibility condition than the one presented above. In particular, if we let

    W(B) = E(B - β)'S(B - β)

denote the weighted total MSE of the estimator B, then the following proposition is given by Sclove [19] and is based on results of James and Stein [10].

Proposition 3.2
If m ≥ 3 and 0 < k < 2(m-2)(n-m+2)^{-1}, then W(J_k) < W(β̂) for all β; and if k_0 = (m-2)(n-m+2)^{-1}, then W(J_{k_0}) ≤ W(J_k). The class {J_k} is strongly admissible with respect to weighted MSE, in the sense that it is known exactly which elements are better (in terms of MSE) than the LS estimator.
27
Goldstein and Smith [7] examine the mean square error properties of a class of shrinkage estimators for the normal regression model $(y, X\beta, I\sigma^2)$, which leads to a new derivation of the Hoerl and Kennard [9] ridge estimator and its generalizations; they also compare the proposed class with the James and Stein [10] estimator and the estimator proposed by Marquardt [13].

Goldstein and Smith decompose the model as follows. Let $U$, $\Lambda^{1/2}$ and $V$ be as given in (2.4), and let $F = [U : \bar U]'$, where $\bar U$, an $(n \times (n-m))$ matrix, is the orthogonal complement of $U$, be such that $FXV = D$. Writing $Z = Fy$, $\delta = V'\beta$, $v = Fe$, they obtain

$$Z = FX\beta + Fe = \begin{bmatrix} \Lambda^{1/2}\delta \\ 0 \end{bmatrix} + v,$$

where $v$ is normally distributed with mean $0$ and covariance matrix $I\sigma^2$. Explicitly,

$$Z_i \sim N(\lambda_i^{1/2}\delta_i,\ \sigma^2) \quad (i = 1, 2, \ldots, m), \qquad Z_i \sim N(0,\ \sigma^2) \quad (i = m+1, \ldots, n). \tag{3.7}$$
They confine attention to estimators of the form

$$\delta_i^* = c_i(\lambda_i, d)Z_i,$$

where $\hat\delta_i = Z_i/\lambda_i^{1/2}$ are the OLS estimators. The requirements that they impose on the $c_i$ are summarized in conditions (i) and (ii) of (3.8). Assuming from now on that (3.8) is satisfied, Goldstein and Smith have proven the following lemma:

Lemma 3.1

For any $\delta \ne 0$ there exists $d > 0$ such that $\delta_i^* = c(\lambda_i, d)Z_i$ has smaller MSE than $\hat\delta_i = Z_i/\lambda_i^{1/2}$ for all $i = 1, 2, \ldots, m$.
Later on they make some comments about the class of shrinkage estimators proposed by James and Stein [10]. These estimates are based on the first line of (3.7) and take the form

$$\theta_i^* = \Big[1 - (m-2)\sigma^2\big/\textstyle\sum_{i=1}^m Z_i^2\Big]Z_i, \qquad i = 1, \ldots, m. \tag{3.9}$$

They say of (3.9):

"With respect to quadratic loss for $\theta_i = \lambda_i^{1/2}\delta_i$ it is well known that this has smaller average MSE than the LS estimator $\hat\theta_i = Z_i$. We note however that the corresponding estimator for $\delta_i$, $\delta_i^* = \theta_i^*/\lambda_i^{1/2}$, is now implicitly derived with respect to the loss function

$$\sum_{i=1}^m \lambda_i(\delta_i^* - \delta_i)^2,$$

corresponding to

$$\sum_{i=1}^m (\theta_i^* - \theta_i)^2,$$

rather than with respect to

$$\sum_{i=1}^m (\delta_i^* - \delta_i)^2.$$

On an intuitive level, we see, therefore, that the James-Stein form is inappropriate in the sense that it implicitly takes less account of the loss in precisely those directions where estimation is most inaccurate."
!:
They have shown too that in the original parameters in the model
(y ,X[3 ,Il) , for any
13*
= VO*
0i*
f3
there exists ad'" 0 such that for
= C(Ai,d)Zi
• We will have that,
,..
than the corresponding 18 estimator Si
Si* has smaller MSE
for all i - I , ••• , m •
Hoerl and Kennard's '.ridge regression estimates and Marquardt's
estimators belong to the class of estimators defined by
letting
0i*
c(Ai,d)
= Zi/Ai
for
= A.~/2/A.
1
i
=1,
Marquardt's estimator.
1
13* since by
+ d we get the ridge estimator and letting
••• , y,
0i*
=0
1
= y+l,
••• , m we get
30
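As a numerical aside (not part of the original text), the canonical-coordinate rule $\delta_i^* = c(\lambda_i, d)Z_i$ can be sketched in a few lines of Python; the data values and the ridge constant below are hypothetical, chosen only to show how the OLS, ridge, and Marquardt choices of $c$ act coordinate by coordinate.

```python
import math

def shrink(Z, lam, c):
    # Apply a canonical-coordinate rule delta_i* = c(lambda_i) * Z_i.
    return [c(l) * z for z, l in zip(Z, lam)]

lam = [4.0, 1.0, 0.01]     # characteristic roots of X'X (hypothetical)
Z = [2.1, 0.9, 0.05]       # canonical observations Z_i (hypothetical)

ols = shrink(Z, lam, lambda l: l ** -0.5)            # c = lambda^{-1/2}
d = 0.1                                              # ridge constant
ridge = shrink(Z, lam, lambda l: math.sqrt(l) / (l + d))
gamma = 2                                            # Marquardt's assigned rank
marquardt = [z / math.sqrt(l) if i < gamma else 0.0
             for i, (z, l) in enumerate(zip(Z, lam))]

# Ridge multiplies each OLS coordinate by lambda_i / (lambda_i + d) < 1,
# shrinking hardest exactly where lambda_i is small.
print(all(abs(r) < abs(o) for r, o in zip(ridge, ols)))   # True
```

Note how Marquardt's estimator keeps the first $\gamma$ OLS coordinates unchanged and simply zeroes the rest, while ridge shrinks all of them smoothly.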
4. DIRECTIONALLY MINIMAX TRACE MEAN SQUARED ERROR ESTIMATOR

The goal in this chapter is to obtain an estimator of $\beta$ which depends as little as possible on the parameters themselves and which, in some sense, minimizes the trace mean squared error, $T(Ay+b, \beta)$. Adopting a minimax philosophy leads to attempting to replace $\beta$ by the value which maximizes $T(Ay+b, \beta)$. This, of course, is not helpful, as $T$ is unboundedly increasing with $\|\beta\|$. The modification adopted in this work is to express the parameters in the form $\beta = k\alpha$, where $\alpha$ are its direction cosines and $k$ its length. Then, for fixed values of $k$, the expression $T(Ay+b, k\alpha)$ can be maximized by choice of $\alpha$. By Theorem 4.1 and the discussion which follows it, the ultimate choice of $\alpha$ is independent of the value of $k$. This fact indicates that, emanating from the origin, there is a direction corresponding to the worst choice of $\beta$ with respect to mean squared error (or, equivalently, squared bias). It is from along this ray that the value of $\beta$ is chosen to maximize $T$. The exact location on the ray is set by choice of $k$.

The ideas expressed above are important enough to warrant the following definition.

Definition 4.1

A linear estimator $\tilde\beta = Ay + b$ is said to be the Directionally Minimax Trace Mean Squared Error (DMTMSE) Estimator of $\beta$ if $A$ and $b$ are such that they minimize

$$S_T(A, b, k) = \sup_{\alpha:\,\alpha'\alpha = 1} T(Ay + b,\ k\alpha).$$
In order to simplify our work, so that we can accomplish our goal, we will first prove the following theorem.

Theorem 4.1

In the linear model $(y, X\beta, I\sigma^2)$, $S_T(A, b, k) \ge S_T(A, 0, k)$ for any $A$ and, in particular, the DMTMSE estimator is of the form $Ay$.

Proof: Let $F = AX - I$ and $\mathcal{B}_k = \{\beta : \beta'\beta = k^2\}$. Either $\|F\beta + b\|^2 \ge \|F\beta\|^2$ or $\|F\beta + b\|^2 \le \|F\beta\|^2$. Suppose the latter holds. Then it follows that $\|b\|^2 \le -2b'F\beta$ and, hence,

$$\|{-F\beta} + b\|^2 = \|F\beta\|^2 - 2b'F\beta + \|b\|^2 \ge \|F\beta\|^2 + 2\|b\|^2 \ge \|F\beta\|^2.$$

Thus, for any $\beta$,

$$\max\big\{\|F\beta + b\|^2,\ \|{-F\beta} + b\|^2\big\} \ge \|F\beta\|^2.$$

Now, since $\beta \in \mathcal{B}_k$ implies $-\beta \in \mathcal{B}_k$,

$$\sup_{\beta\in\mathcal{B}_k}\|F\beta + b\|^2 = \sup_{\beta\in\mathcal{B}_k}\max\big\{\|F\beta + b\|^2,\ \|{-F\beta} + b\|^2\big\} \ge \sup_{\beta\in\mathcal{B}_k}\|F\beta\|^2,$$

with equality when $b = 0$. Therefore, $S_T(A, b, k) \ge S_T(A, 0, k)$ for all $b$, $A$ and $k$. $\square$
In view of Theorem 4.1, the criterion can be reduced to

$$S_T(A, 0, k) = \sigma^2\,\mathrm{tr}(AA') + k^2\,\mathrm{Ch}_M(F'F), \tag{4.1}$$

where $F = AX - I$ and $\mathrm{Ch}_M(\cdot)$ denotes the largest characteristic root. It can be seen from (4.1) that the characteristic vector of $F'F$ associated with its largest characteristic root gives the direction cosines of the least favorable direction for estimation. This is the value of $\alpha$ which leads to maximum bias. Fortunately, this direction is independent of $k$, so it is meaningful to hold $k$ constant.
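Criterion (4.1) can be evaluated directly for any candidate $A$. Below is a small self-contained Python sketch (an illustration added here, not from the thesis) that computes $S_T(A,0,k)$ using power iteration for the largest characteristic root; the $2 \times 2$ design matrix is hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def largest_root(M, iters=200):
    # Power iteration for the largest characteristic root of a symmetric n.n.d. matrix.
    x = [1.0] * len(M)
    for _ in range(iters):
        y = [sum(m * v for m, v in zip(row, x)) for row in M]
        norm = max(abs(v) for v in y) or 1.0
        x = [v / norm for v in y]
    y = [sum(m * v for m, v in zip(row, x)) for row in M]
    num = sum(a * b for a, b in zip(x, y))
    den = sum(a * a for a in x)
    return num / den if den else 0.0

def S_T(A, X, sigma2, k2):
    # S_T(A, 0, k) = sigma^2 tr(AA') + k^2 Ch_M(F'F), with F = AX - I.
    AX = matmul(A, X)
    F = [[AX[i][j] - (1.0 if i == j else 0.0) for j in range(len(AX))]
         for i in range(len(AX))]
    return (sigma2 * trace(matmul(A, transpose(A)))
            + k2 * largest_root(matmul(transpose(F), F)))

X = [[2.0, 0.0], [0.0, 0.5]]      # hypothetical design; roots of X'X are 4 and 0.25
A_ols = [[0.5, 0.0], [0.0, 2.0]]  # A = X^{-1}: unbiased, so only the variance term
print(S_T(A_ols, X, sigma2=1.0, k2=1.0))   # 4.25
```

For the unbiased choice $A = X^{-1}$ the bias term vanishes and $S_T$ reduces to $\sigma^2\sum\lambda_i^{-1}$, matching the trace term in (4.1).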
4.1 DMTMSE Estimation in the Class of OLS Estimators Computed Subject to False Restrictions

One class of biased linear estimators of regression coefficients that has been proposed is that obtained by computing least squares estimates under sets of false restrictions. These estimators were studied by Toro and Wallace [18]. Their work is primarily concerned with testing whether or not a particular set of false restrictions will lead to estimators with smaller mean squared error. They leave the choice of restrictions up to the experimenter.

For this case, we shall restrict our attention to the linear model $(y, X\beta, I\sigma^2)$ and assume that $X$ is of full rank $(\mathrm{rank}(X) = m)$.
To obtain the class of estimators, we shall impose $u$ false restrictions on the parameters:

$$R\beta = h,$$

where $R$ is a $(u \times m)$ matrix of rank $u\ (\le m)$ and $h$ is a $(u \times 1)$ vector. So that the equations are consistent, we shall assume that $h \in \mathcal{C}(R)$.

It is well known (see, for example, Pringle and Rayner [15]) that under this setup the estimator is of the form

$$\tilde\beta(R, h) = \hat\beta - S^{-1}R'(RS^{-1}R')^{-1}(R\hat\beta - h),$$

where $S$ is given by (2.2).

In the following series of theorems, we will see that it is possible to limit the class of estimators of interest.

Theorem 4.1.1

In the class of least squares estimators subject to false restrictions for the setup $(y, X\beta, I\sigma^2)$, restrictions $R\beta = 0$ are preferred over $R\beta = h$ with respect to DMTMSE estimation.

Proof: The theorem follows from Theorem 4.1 since, for $R\beta = h$, the estimator will be of the form $Ay + b$, where $A$ is independent of $h$ and $b$ equals $0$ whenever $h = 0$. $\square$

In view of Theorem 4.1.1, attention will be limited to the class of least squares estimators obtained subject to $R\beta = 0$.
Theorem 4.1.2

The class of estimators, $\tilde\beta(R)$, obtained by least squares subject to constraints $R\beta = 0$, where $R \in \{R : R \text{ is } (u \times m) \text{ of rank } u\}$, is equivalent to that where $R \in \{R : R \text{ is } (u \times m) \text{ of rank } u \text{ and } RS^{-1}R' = I\}$, where $S$ is given by (2.2).

Proof: Since $\Lambda$ given in (2.4) is positive definite and $V$ is nonsingular, the rows of $\Lambda^{1/2}V'$ form a basis for Euclidean $m$-space and, hence, $R$ can be written as $R = B\Lambda^{1/2}V'$ for some $(u \times m)$ matrix $B$. Since $u = \mathrm{rank}(R) \le \mathrm{rank}(B) \le u$, $B$ must have full row rank. Thus, if we let $RS^{-1}R' = BB' = GG'$, where $|G| \ne 0$, it is easily seen that $GG'$ is positive definite. Also, if we let $\bar R = G^{-1}R$, we have

$$\tilde\beta(\bar R) = \tilde\beta(G^{-1}R) = \tilde\beta(R) \quad\text{and}\quad \bar RS^{-1}\bar R' = G^{-1}GG'(G')^{-1} = I.$$

Thus, for every $R$ there exists a corresponding $\bar R$ such that $\tilde\beta(\bar R) = \tilde\beta(R)$ and $\bar RS^{-1}\bar R' = I$. $\square$
Using Theorem 4.1.2, we can respecify the class of estimators of interest as

$$\tilde\beta(R) = (I - S^{-1}R'R)S^{-1}X'y$$

for all $R$ such that $RS^{-1}R' = I$. Within this class we wish to find the optimum with respect to the DMTMSE criterion set out in Chapter 4.

It should be noted that the value of $u$ is assumed to be given and fixed. Also, because of the assumption that $R$ is of full row rank, the ordinary least squares (OLS) estimator is not in the class. However, there exist members of the class that are arbitrarily "close" to the OLS estimator.

Subsections 4.1.1 and 4.1.2 below deal with deriving the optimum estimator for $u = 1$ and for general $u$, respectively, and subsection 4.1.3 examines some properties of the resulting estimators.
4.1.1 The Single Constraint Case

The first case we shall consider is when $u = 1$. In this case the restrictions being imposed are of the form $r'\beta = 0$, where $r$ is an $(m \times 1)$ vector satisfying $r'S^{-1}r = 1$.

By putting $A = (I - S^{-1}rr')S^{-1}X'$ in (4.1), and since $F'F = (r'S^{-2}r)\,rr'$, we get

$$S_T = \sigma^2\Big(\sum_{i=1}^m\lambda_i^{-1} - r'S^{-2}r\Big) + k^2\,r'r\;r'S^{-2}r = \sigma^2\sum_{i=1}^m\lambda_i^{-1} + (k^2\,r'r - \sigma^2)\,r'S^{-2}r,$$

where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m)$ and $\Lambda$ and $V$ are defined in (2.4) and (2.5).

The following theorem gives the vector $r$ which minimizes $S_T$.
Theorem 4.1.1.1

For the class of least squares estimates computed subject to one incorrect restriction,

$$S_T \ge \sigma^2\sum_{i=1}^{m-1}\lambda_i^{-1} + k^2,$$

with equality when $r = \lambda_m^{1/2}v_m$. Here $V = (v_1, v_2, \ldots, v_m)$ and $V$ is defined in (2.4). That is, the DMTMSE estimator for this class of estimators is

$$\tilde\beta = (I - v_mv_m')S^{-1}X'y.$$

Proof: Let $w = \Lambda^{-1/2}V'r$ so that $r = V\Lambda^{1/2}w$. Also, $r'S^{-1}r = 1$ implies $w'w = 1$. Now

$$S_T = \sigma^2\sum_{i=1}^m\lambda_i^{-1} + k^2\Big(\sum_{i=1}^m w_i^2\lambda_i\Big)\Big(\sum_{i=1}^m w_i^2\lambda_i^{-1}\Big) - \sigma^2\sum_{i=1}^m w_i^2\lambda_i^{-1}. \tag{4.1.1.1}$$

Since $\sum_{i=1}^m w_i^2 = 1$ and since the harmonic mean is always less than or equal to the arithmetic mean,

$$\Big(\sum_{i=1}^m w_i^2\lambda_i\Big)\Big(\sum_{i=1}^m w_i^2\lambda_i^{-1}\Big) \ge 1. \tag{4.1.1.2}$$

Furthermore, since the value of a weighted arithmetic mean is always less than or equal to the largest value being averaged,

$$\sum_{i=1}^m w_i^2\lambda_i^{-1} \le \lambda_m^{-1}. \tag{4.1.1.3}$$

Using (4.1.1.2) and (4.1.1.3) in (4.1.1.1), we get

$$S_T \ge \sigma^2\sum_{i=1}^{m-1}\lambda_i^{-1} + k^2.$$

Noting that equality holds in (4.1.1.2) whenever $\lambda_i = \lambda$ for all values of $i$ for which $w_i > 0$, and in (4.1.1.3) whenever $w_1 = w_2 = \cdots = w_{m-1} = 0$, $w_m = 1$, we see that both equalities hold when $w' = (0, 0, \ldots, 0, 1)$, in which case $r = V\Lambda^{1/2}w = \lambda_m^{1/2}v_m$. Thus, the bound is attained and the resulting estimator becomes

$$\tilde\beta = (I - v_mv_m')S^{-1}X'y. \;\square$$
The estimator thus derived can be seen to be of the form $\tilde\beta = P\hat\beta$, where $P$ is an orthogonal projection matrix and $\hat\beta$ is the OLS estimator. The estimator can be obtained by projecting $\hat\beta$ into the space orthogonal to the characteristic vector corresponding to the smallest characteristic root of $X'X$.

Another interesting feature of this result is the fact that the estimator does not depend on the value of $k$ and, hence, $k$ need not be specified beforehand. Of course, the resulting mean squared errors will involve $k$.
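In canonical coordinates the single-constraint DMTMSE estimator simply discards the OLS coordinate belonging to the smallest characteristic root. A minimal Python sketch (added for illustration; the numbers are hypothetical):

```python
def dmtmse_single(beta_hat, lam):
    # Zero the coordinate of the smallest root: in the original parameterization
    # this is the projection (I - v_m v_m') applied to the OLS estimate.
    i_min = min(range(len(lam)), key=lambda i: lam[i])
    return [0.0 if i == i_min else b for i, b in enumerate(beta_hat)]

lam = [5.0, 2.0, 0.01]        # roots of X'X; the last marks near-collinearity
beta_hat = [1.0, -0.5, 7.3]   # OLS estimate, inflated in the weak direction
print(dmtmse_single(beta_hat, lam))   # [1.0, -0.5, 0.0]
```

The wildly inflated third coordinate, which sits in the near-collinear direction, is exactly the one removed.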
4.1.2 The General Case

For the general case we shall assume that $u$ is a fixed number $(1 \le u \le m)$. The solution has the same appearance as that for $u = 1$, but the proof of the theorem is more involved.
Theorem 4.1.2.1

For the class of least squares estimates computed subject to a set of $u$ independently incorrect restrictions,

$$S_T \ge \sigma^2\sum_{i=1}^{m-u}\lambda_i^{-1} + k^2,$$

with equality holding whenever $R = \Lambda_2^{1/2}V_2'$, where $\Lambda_2$ and $V_2$ are given in (2.7) and (2.8). That is, for this class of estimators, the DMTMSE estimator is

$$\tilde\beta = (I - V_2V_2')S^{-1}X'y.$$

Proof: We can write $R = B\Lambda^{1/2}V'$, where $B$ is any $(u \times m)$ matrix of rank $u$. From this we have that

$$\sigma^2\,\mathrm{tr}(AA') = \sigma^2\Big(\sum_{i=1}^m\lambda_i^{-1} - \sum_{i=1}^m p_{ii}\lambda_i^{-1}\Big),$$

where $P_B = B'(BB')^{-1}B = (p_{ij})$. Since $P_B$ is a symmetric idempotent matrix, we have that $p_{ii} \ge 0$; further, if $p_{ii} \ne 0$, idempotency implies $p_{ii} \le 1$. It is well known that $\sum_{i=1}^m p_{ii} = u$. Therefore, $0 \le p_{ii} \le 1$, and $\sum_{i=1}^m p_{ii}\lambda_i^{-1}$ is maximized subject to $\sum_{i=1}^m p_{ii} = u$ by choosing $p_{11} = \cdots = p_{(m-u)(m-u)} = 0$ and $p_{(m-u+1)(m-u+1)} = \cdots = p_{mm} = 1$, giving

$$\sum_{i=1}^m p_{ii}\lambda_i^{-1} = \sum_{i=m-u+1}^m\lambda_i^{-1}.$$

This corresponds to taking $B = [0 : I_u]$ and

$$P_B = \begin{bmatrix} 0 & 0 \\ 0 & I_u \end{bmatrix}.$$

Using this we have

$$\sigma^2\,\mathrm{tr}(AA') \ge \sigma^2\sum_{i=1}^{m-u}\lambda_i^{-1}. \tag{4.1.2.1}$$
Next, we have that

$$\mathrm{Ch}_M(F'F) = \mathrm{Ch}_M(Q'Q),$$

where $Q = S^{-1}R'R$. Note that $Q^2 = Q$ but $Q$ is not symmetric. Now suppose that $\ell \in \mathcal{C}(Q)$; then $\ell = Qz$ for some $z$ and, hence, $Q\ell = \ell$. Furthermore, if $\ell^* = \ell/(\ell'\ell)^{1/2}$, then $Q\ell^* = \ell^*$ and $\|Q\ell^*\|^2 = \|\ell^*\|^2 = 1$. Therefore,

$$\mathrm{Ch}_M(Q'Q) = \sup_{x:\,x'x=1}\|Qx\|^2 \ge 1. \tag{4.1.2.2}$$

Using (4.1.2.1) and (4.1.2.2) we may conclude that

$$S_T \ge \sigma^2\sum_{i=1}^{m-u}\lambda_i^{-1} + k^2. \tag{4.1.2.3}$$

It remains to demonstrate that the lower bound is attainable. Putting $B = [0 : I_u]$, we get $R = \Lambda_2^{1/2}V_2'$, and substituting this into the left hand side of (4.1.2.3) gives the right hand side of (4.1.2.3). Finally, the estimator can take various forms:

$$\tilde\beta = (I - V_2V_2')S^{-1}X'y = (I - V_2V_2')\hat\beta = V_1V_1'\hat\beta, \;\square$$

where $\hat\beta$ is the OLS estimator.
4.1.3 Some Properties of the Estimator

Marquardt [13] proposed a class of regression estimators he called generalized inverse estimators. He argued that, as an alternative to using precise computing methods for obtaining "exact" solutions in situations where the $X'X$ matrix has some small but positive characteristic roots, it might be preferable to assign a lower rank to $X'X$ and obtain a solution under this assumption. In our notation, his estimator can be written as

$$\beta^+ = \Lambda^+X'y, \qquad \Lambda^+ = V_1\Lambda_1^{-1}V_1',$$

where $v$ is the assigned rank of $X'X$ and $\Lambda_1$ and $V_1$ are defined in (2.7) and (2.8), with the exception that the dimension of $V_1$ is $(m \times v)$. In this setting the first of a series of properties that the estimator possesses is introduced in the following theorem.

Theorem 4.1.3.1

The DMTMSE estimator for the class of least squares estimators computed subject to false restrictions is equivalent to Marquardt's generalized inverse estimator when the assigned rank of $X'X$ equals $m - u$.

Proof: Since we are assuming that $X'X$ is of full rank,

$$\beta^+ = V_1\Lambda_1^{-1}V_1'X'y = V_1V_1'S^{-1}X'y = V_1V_1'\hat\beta,$$

which, by Theorem 4.1.2.1, is the DMTMSE estimator when $v = m - u$. $\square$
Some further results which follow easily from previous results are summarized without proof in the following theorem.

Theorem 4.1.3.2

For any $(m \times 1)$ vector $t$:

(i) $\mathrm{Var}(t'\tilde\beta) \le \mathrm{Var}(t'\hat\beta)$, with equality if, and only if, $t \in \mathcal{C}(V_1)$, in which case $t'\tilde\beta = t'\hat\beta$.

(ii) $E(t'\tilde\beta) = t'\beta - t'V_2V_2'\beta$, and the second term (the bias) vanishes if, and only if, $t \in \mathcal{C}(V_1)$ or $\beta \in \mathcal{C}(V_1)$.

(iii) The estimator $\tilde\beta$ is shorter than $\hat\beta$, i.e., $\tilde\beta'\tilde\beta \le \hat\beta'\hat\beta$.

(iv) For the more general model $(y, X\beta, \Sigma)$, where $|\Sigma| \ne 0$, the corresponding estimator is of the form

$$\tilde\beta = \bar V_1\bar V_1'(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y,$$

where $X'\Sigma^{-1}X = \bar V\bar\Lambda\bar V'$, $\bar\Lambda$ is diagonal, $\bar V = (\bar V_1 : \bar V_2)$, and $\bar V_1$ is an $(m \times (m-u))$ matrix.
Toro and Wallace [18] give a statistical test for determining whether imposing a given set of false restrictions, $R\beta = 0$, will result in a reduced mean square error. This test is appropriate for use in this setting provided we make the assumption that the errors in the model are normally distributed. In this case, the form of the test is:

Reject $H_0$ if $W > w_\alpha$,

where

$$W = \big[y'U_2U_2'y/u\big]\big/\big[y'(I - UU')y/(n-m)\big]$$

and $U$ and $U_2$ are defined in (2.4) and (2.6). Note that the hypothesis $H_0$ states that $\tilde\beta$ is superior to the OLS estimator, $\hat\beta$, with respect to mean squared error, and accepting $H_0$ suggests that $\tilde\beta$ is preferred over $\hat\beta$. It can be seen that $W$ is a noncentral $F$ random variable with $u$ and $(n - m)$ degrees of freedom and noncentrality

$$\delta = \beta'V_2\Lambda_2V_2'\beta/2\sigma^2,$$

where $V_2$ and $\Lambda_2$ are defined in (2.7) and (2.8). It can be seen also that

$$\delta \le \lambda_{m-u+1}\,\beta'V_2V_2'\beta/2\sigma^2,$$

where $\lambda_{m-u+1}$ is defined in (2.4) and (2.5). Therefore, $\delta$ is small whenever the projection of $\beta$ on $\mathcal{C}(V_2)$ is small or $\lambda_{m-u+1}$ is small. The latter condition is precisely that for which the generalized inverse estimators are intended. Some critical points and power computations are given in [18].
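In canonical form the statistic $W$ reduces to sums of squared components of $U'y$. The sketch below is illustrative only (the data are hypothetical and no critical value $w_\alpha$ is computed); it simply evaluates $W$:

```python
def toro_wallace_W(z, yty, n, u):
    # z: the m canonical components U'y; yty: y'y; n: sample size;
    # u: number of restrictions.  The numerator uses the last u
    # components of z (those belonging to U_2).
    m = len(z)
    numerator = sum(v * v for v in z[m - u:]) / u
    denominator = (yty - sum(v * v for v in z)) / (n - m)
    return numerator / denominator

z = [3.0, 1.0, 0.5]        # U'y (hypothetical)
yty = 14.25                # y'y = 10.25 fitted + 4.0 residual
print(toro_wallace_W(z, yty, n=10, u=1))   # about 0.4375
```

A small $W$ (here driven by the small last component of $U'y$) is evidence in favor of imposing the restrictions.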
4.2 The DMTMSE Estimator for the Class of Shrinkage Estimators

Many regression estimators have been proposed that have the form

$$\tilde\beta = Ay = V\Gamma U'y,$$

where $U$ and $V$ are defined by (2.4) and $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_m)$. Included in this class are:

Generalized Inverse Regression: $\gamma_j = \lambda_j^{-1/2}$, $j = 1, \ldots, m-u$; $\gamma_j = 0$, $j = m-u+1, \ldots, m$;

Ridge Regression: $\gamma_j = \lambda_j^{1/2}/(\lambda_j + d)$, $j = 1, \ldots, m$;

as well as others. This class is discussed by Goldstein and Smith [7]. In the following theorem, the DMTMSE estimator for this class is derived.
Theorem 4.2.1

In the linear model $(y, X\beta, I\sigma^2)$, the DMTMSE estimator from the class of estimators of the form $\tilde\beta = V\Gamma U'y$, where $U$ and $V$ are defined by (2.4) and $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_m)$, is $\tilde\beta = t\hat\beta$, where $\hat\beta$ is the OLS estimator and

$$t = c^2\Big/\Big(\sum_{i=1}^m\lambda_i^{-1} + c^2\Big), \qquad c^2 = k^2/\sigma^2.$$

Proof: First,

$$S_T/\sigma^2 = \sum_{j=1}^m\gamma_j^2 + c^2\max_j\big(1 - \lambda_j^{1/2}\gamma_j\big)^2.$$

Suppose that, at the minimum, the value of $\max_j(1 - \lambda_j^{1/2}\gamma_j)^2$ is $\rho^2$. Then any set of $\gamma_j$'s that minimizes $S_T$ must satisfy

$$\big(1 - \lambda_j^{1/2}\gamma_j\big)^2 \le \rho^2 \quad\text{for } j = 1, \ldots, m,$$

and, hence, $\gamma_j \ge (1-\rho)\lambda_j^{-1/2}$. But $S_T$ is a minimum; therefore, $\gamma_j$ must equal $(1-\rho)\lambda_j^{-1/2}$, its smallest possible value, so as to make the first term as small as possible. This implies that

$$S_T/\sigma^2 = (1-\rho)^2\sum_{j=1}^m\lambda_j^{-1} + \rho^2c^2.$$

Since this is convex in $\rho$, the minimum can be found by setting the first derivative equal to zero, resulting in

$$\rho = \sum_{j=1}^m\lambda_j^{-1}\Big/\Big(\sum_{j=1}^m\lambda_j^{-1} + c^2\Big).$$

Thus,

$$\gamma_j = \Big[c^2\Big/\Big(\sum_{i=1}^m\lambda_i^{-1} + c^2\Big)\Big]\lambda_j^{-1/2},$$

where $c^2 = k^2/\sigma^2$. $\square$

The estimator obtained in Theorem 4.2.1 is a deterministically shrunken estimator as defined and studied in Mayer and Willke [14]. Its form is simply a scalar times the OLS estimator. The scalar factor is between 0 and 1 and, as such, has the effect of shortening the length of the vector $\hat\beta$.
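The minimization in Theorem 4.2.1 becomes one-dimensional once the bias term is fixed at $\rho^2c^2$, and the stationary point can be checked numerically. A short Python sketch (illustrative; the roots and $c^2$ are hypothetical):

```python
lam = [4.0, 1.0, 0.25]
c2 = 2.0                             # c^2 = k^2 / sigma^2 (hypothetical)
L = sum(1.0 / l for l in lam)        # sum of lambda_j^{-1} = 5.25

def st(rho):
    # S_T / sigma^2 = (1 - rho)^2 * L + rho^2 * c^2
    return (1 - rho) ** 2 * L + rho ** 2 * c2

rho_star = L / (L + c2)              # stationary point from the theorem
t = c2 / (L + c2)                    # shrinkage factor, t = 1 - rho_star

grid_min = min(st(i / 1000.0) for i in range(1001))
print(st(rho_star) <= grid_min + 1e-12)   # True: rho_star attains the minimum
```

A coarse grid search over $\rho \in [0, 1]$ confirms that the closed-form $\rho^*$ attains the minimum, and $t = 1 - \rho^*$ lies strictly between 0 and 1.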
4.3 The DMTMSE Estimator for the Class of General Linear Functions

In this section we will be concerned with the search for the DMTMSE estimator for the class of general linear functions of the form $Ay$, where $A$ is any $(m \times n)$ matrix.
The form of the estimators that we have obtained in previous sections, and standard results from linear model theory, suggest an admissibility condition which is elaborated in the following theorem.

Theorem 4.3.1

With respect to DMTMSE estimation, all estimators $Ay$ not of the form $A = CU'$, for some $C$, are inadmissible.

Proof: Suppose that $Ay$ is any linear estimator of $\beta$; then it can be seen that, with respect to DMTMSE estimation, $AUU'y$, where $U$ is defined by (2.4), is superior to it. This is because, since $I - UU'$ is n.n.d., we can write $I - UU' = DD'$, say, so that

$$0 \le \mathrm{tr}[(AD)(AD)'] = \mathrm{tr}[A(I - UU')A'] = \mathrm{tr}(AA') - \mathrm{tr}[(AUU')(AUU')'].$$

Thus, $\mathrm{tr}[(AUU')(AUU')'] \le \mathrm{tr}(AA')$ for any $A$, while, since $X = U\Lambda^{1/2}V'$ implies $UU'X = X$, the bias term is unchanged. $\square$
This theorem states that the original model can be partitioned into

$$y_1 = U'y = \Lambda^{1/2}V'\beta + U'e \quad\text{and}\quad y_2 = \bar U'y = \bar U'e,$$

where $\bar U$ is the orthogonal complement of $U$, and that all estimation should be based solely on $y_1$.

Thus, without loss of generality, we may restrict our attention to the problem of estimation of $\beta$ in the model:

$$y = X\beta + e, \tag{4.3.1}$$

where $y$ is $(m \times 1)$, $X = \Lambda^{1/2}V'$ is $(m \times m)$ of rank $m$, and $e$ is $(m \times 1)$, with $E(e) = 0$ and $\mathrm{Var}(e) = I\sigma^2$.

Restricting our attention to the model defined in (4.3.1), an admissibility condition for estimators of the form $Ay$ is introduced in the next theorem.
Theorem 4.3.2

In the model $y = X\beta + e$, estimators $Ay$ of $\beta$ are inadmissible with respect to DMTMSE estimation if $AX \ne (AX)'$.

Proof: For the model defined in (4.3.1), we have

$$S_T(A, k) = \sigma^2\,\mathrm{tr}(AA') + k^2\,\mathrm{Ch}_M[(AX - I)(AX - I)']. \tag{4.3.2}$$

Let

$$A_0 = AX, \tag{4.3.3}$$

so that

$$A = A_0X^{-1}. \tag{4.3.4}$$

By substituting (4.3.3) and (4.3.4) in (4.3.2), we have

$$S_T(A_0, k) = \sigma^2\,\mathrm{tr}[A_0(X'X)^{-1}A_0'] + k^2\,\mathrm{Ch}_M[(I - A_0)(I - A_0)'].$$

Consider now the symmetric matrix

$$A_1 = I - [(I - A_0)(I - A_0)']^{1/2},$$

where $[(I - A_0)(I - A_0)']^{1/2}$ is a positive symmetric square root. Then it follows easily that

$$\mathrm{Ch}_M[(I - A_1)(I - A_1)'] = \mathrm{Ch}_M[(I - A_0)(I - A_0)'],$$

so it remains to prove that

$$\mathrm{tr}[A_0V\Lambda^{-1}V'A_0'] \ge \mathrm{tr}[A_1V\Lambda^{-1}V'A_1'], \tag{4.3.5}$$

since $X = \Lambda^{1/2}V'$ gives $(X'X)^{-1} = V\Lambda^{-1}V'$.

Before proving (4.3.5), observe the following:

$$\mathrm{tr}[(I - A_1)'(X'X)^{-1}(I - A_1)] = \mathrm{tr}\big[[(I - A_0)(I - A_0)']^{1/2}(X'X)^{-1}[(I - A_0)(I - A_0)']^{1/2}\big] = \mathrm{tr}[(I - A_0)(I - A_0)'(X'X)^{-1}] = \mathrm{tr}[(I - A_0)'(X'X)^{-1}(I - A_0)];$$

therefore, expanding both sides,

$$\mathrm{tr}[A_1V\Lambda^{-1}V'A_1'] - \mathrm{tr}[A_0V\Lambda^{-1}V'A_0'] = 2\big(\mathrm{tr}[A_1V\Lambda^{-1}V'] - \mathrm{tr}[A_0V\Lambda^{-1}V']\big).$$

Hence,

$$\mathrm{tr}[A_1V\Lambda^{-1}V'] \le \mathrm{tr}[A_0V\Lambda^{-1}V']$$

will imply (4.3.5). Therefore, we have only to prove the latter. This will be done as follows:

$$\mathrm{tr}[V\Lambda^{-1}V'] - \mathrm{tr}[A_1V\Lambda^{-1}V'] = \mathrm{tr}[(I - A_1)V\Lambda^{-1}V'] = \mathrm{tr}\big[[(I - A_0)(I - A_0)']^{1/2}V\Lambda^{-1}V'\big] = \mathrm{tr}\big\{[V\Lambda^{-1}V'(I - A_0)(I - A_0)'V\Lambda^{-1}V']^{1/2}\big\} \ge \mathrm{tr}[(I - A_0)V\Lambda^{-1}V'] = \mathrm{tr}[V\Lambda^{-1}V'] - \mathrm{tr}[A_0V\Lambda^{-1}V'], \tag{4.3.6}$$

where the inequality holds since, for any real matrix $W$, $\mathrm{tr}[(WW')^{1/2}] \ge \mathrm{tr}(W)$ (see, for example, Marcus and Minc [12], Section 4.2). From (4.3.6) we may conclude

$$\mathrm{tr}[A_1V\Lambda^{-1}V'] \le \mathrm{tr}[A_0V\Lambda^{-1}V'],$$

as required. $\square$
Our main purpose now is to prove the following theorem.

Theorem 4.3.3

In the model $(y, X\beta, I\sigma^2)$, where $y$ is $(m \times 1)$ and $X = \Lambda^{1/2}V'$ is $(m \times m)$ of rank $m$, the DMTMSE estimator of $\beta$ is given by $\tilde\beta = t\hat\beta$, where

$$t = c^2\Big/\Big(\sum_{i=1}^m\lambda_i^{-1} + c^2\Big).$$

Proof: By employing the same transformation used in the previous theorem, we obtain $S_T(A_0, k)$, where, by Theorem 4.3.2, $A_0$ may be taken to be a symmetric matrix, which can be written as

$$A_0 = G\Gamma G', \qquad G'G = I.$$

From the above we have:

$$S_T(A_0, k) = \sigma^2\,\mathrm{tr}[\Gamma G'V\Lambda^{-1}V'G\Gamma] + k^2\max_i\big[(1 - \gamma_i)^2\big] = \sigma^2\sum_i\gamma_i^2\sum_k(g_i'v_k)^2\lambda_k^{-1} + k^2\max_i\big[(1 - \gamma_i)^2\big] = \sigma^2\sum_i\gamma_i^2z_i^2 + k^2\max_i\big[(1 - \gamma_i)^2\big],$$

where $z_i^2 = \sum_k(g_i'v_k)^2\lambda_k^{-1}$, $g_i$ is the $i$th column of $G$, and $v_k$ the $k$th column of $V$. Writing $\delta_i = \gamma_iz_i$, so that $\gamma_i = \delta_iz_i^{-1}$, and using the identical argument of Theorem 4.2.1, the minimizing values are

$$\delta_i = \Big[c^2\Big/\Big(\sum_{j=1}^mz_j^2 + c^2\Big)\Big]z_i, \qquad i = 1, \ldots, m,$$

and, therefore,

$$\gamma_i = c^2\Big/\Big(\sum_{j=1}^mz_j^2 + c^2\Big), \qquad i = 1, \ldots, m,$$

since $\gamma_i = \delta_iz_i^{-1}$. Furthermore, it is easily seen that the choice of $G$ is arbitrary. So we have

$$\sum_{j=1}^mz_j^2 = \mathrm{tr}[G'V\Lambda^{-1}V'G] = \sum_{j=1}^m\lambda_j^{-1},$$

which implies that $A_0 = tI$ and, therefore, from Theorem 4.3.1, the resulting DMTMSE estimator of $\beta$ is of the form

$$\tilde\beta = t\hat\beta. \;\square$$

From Theorem 4.3.3, we may conclude that the DMTMSE estimator for the class of general linear functions coincides with that of the class of shrinkage estimators.
Unfortunately, for the general linear model $(y, X\beta, I\sigma^2)$, where $y$ is $(n \times 1)$ and $X$ is an $(n \times m)$ matrix of rank $m \le n$, the DMTMSE estimator of $P'\beta$ is not, in general, equal to $P'[\mathrm{DMTMSE}(\beta)]$, where $P$ is any $(t \times m)$ matrix. The following counterexample proves the assertion.

Suppose we want to estimate $p'\beta$ with $\ell'y$, where $p'$ and $\ell$ are, respectively, $(1 \times m)$ and $(n \times 1)$ vectors. Taking the derivative of $S_T$ with respect to $\ell$ and equating it to zero, we have

$$\sigma^2\ell - k^2X(p - X'\ell) = 0,$$

so that

$$\ell = k^2(\sigma^2I + k^2XX')^{-1}Xp,$$

giving the estimator

$$\ell'y = p'(S + c^{-2}I)^{-1}X'y = p'V\Gamma V'\hat\beta,$$

where $\Gamma$ is a diagonal matrix with elements $\lambda_i/(\lambda_i + c^{-2})$, $i = 1, \ldots, m$. Thus, $\mathrm{DMTMSE}(p'\beta)$ is clearly different from

$$p'\,\mathrm{DMTMSE}(\beta) = p'\,t\hat\beta = p'\,tV\Lambda^{-1/2}U'y,$$

where $t = c^2\big/\big(\sum_{i=1}^m\lambda_i^{-1} + c^2\big)$.
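The disagreement is easy to exhibit numerically in canonical coordinates. The following Python sketch (added for illustration; all values are hypothetical) compares the ridge-type per-coordinate weights $\lambda_i/(\lambda_i + c^{-2})$ of the scalar-target estimator with the single constant factor $t$ of the vector estimator:

```python
lam = [4.0, 0.25]
c2 = 1.0
beta_hat = [1.0, 2.0]      # OLS estimate expressed in canonical coordinates
p = [1.0, 1.0]

# DMTMSE applied directly to p'beta: ridge-type weight per coordinate.
w = [l / (l + 1.0 / c2) for l in lam]
direct = sum(pi * wi * bi for pi, wi, bi in zip(p, w, beta_hat))

# DMTMSE applied to beta first, then projected with p: one constant factor t.
t = c2 / (sum(1.0 / l for l in lam) + c2)
projected = t * sum(pi * bi for pi, bi in zip(p, beta_hat))

print(direct, projected)   # the two rules give clearly different values
```

The scalar-target rule shrinks each coordinate according to its own root, so the two answers coincide only in degenerate cases (for example, when all roots are equal).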
5. DIRECTIONALLY MINIMAX MEAN SQUARED ERROR ESTIMATOR

In Chapter 4, we concentrated on the search for an estimator of $\beta$ which depends as little as possible on the parameters themselves and which, in some sense, minimizes the trace of the MSE, $T(Ay, \beta)$. This chapter will be devoted to the search for an estimator of $\beta$ which we will expect to be free, as much as possible, of the parameters themselves and which minimizes, in some sense, the mean squared error matrix. The strategy that we will adopt, similar to the last chapter, is to express the parameters in the form $\beta = k\alpha$, where the vector $\alpha$ contains its direction cosines and $k$ its length. Then, for a fixed value of $k$, the MSE matrix, $M(Ay, k\alpha)$, can be "maximized," in a sense that we will define later, by choice of $\alpha$. Then we will find the matrix $A$ that will "minimize" the resulting criterion.
5.1 A Matrix Ordering

Let $H$ be any square matrix. As usual, $H$ will be called positive definite (p.d.) if $x'Hx > 0$ for all $x \ne 0$; positive semidefinite (p.s.d.) if $x'Hx \ge 0$ for all $x$ and $x'Hx = 0$ for some $x \ne 0$; nonnegative definite (n.n.d.) if it is either p.d. or p.s.d.; and zero definite (z.d.) if $x'Hx = 0$ for all $x$. We write $H > 0$, $H \ge 0$, and $H \simeq 0$ for the p.d., n.n.d., and z.d. cases, respectively, where $0$ is the null matrix. Finally, we define $H \ge B$ to mean $H - B \ge 0$; this can also be written $B \le H$.

Chipman [3] has proved the following important lemma regarding this ordering among square matrices.

Lemma 5.1.1

The relation $\ge$ among square matrices is transitive and, among symmetric matrices, it is also antisymmetric.

In view of this lemma, we shall speak of minimizing a symmetric, nonnegative definite matrix over a set $\mathcal{H}$, where $\mathcal{H}$ is a certain class of matrices; that is, finding a matrix $B \in \mathcal{H}$ such that $B \le H$ for all $H \in \mathcal{H}$. Owing to the antisymmetry of the relation $\ge$, if a set of symmetric matrices has a minimum, the minimum matrix is a fortiori unique.
5.2 Geometric Interpretation of the Matrix Ordering

In order to develop more insight about the matrix ordering that we have defined, we shall state a theorem which will permit us to visualize what is meant by the relation $H \le B$. We will be considering only the case where $B \ge 0$ and $H \ge 0$.

Observe first that, according to our definition, $H \le B$ if, and only if, $x'Hx \le x'Bx$ for all $x$; therefore $\{x : x'Bx = 0\} \subseteq \{x : x'Hx = 0\}$, so that if $Bx = 0$ then clearly $Hx = 0$.

Theorem 5.2.1

Let $H$ and $B$ be any two n.n.d. symmetric matrices, and define $E_H = \{x : x'Hx = 1\}$ and $E_B = \{x : x'Bx = 1\}$. Then, for every $x \in E_H$ there exists a number $m$, with $0 < m \le 1$, such that $mx \in E_B$, if, and only if, $H \le B$.
Proof: Assume first $H \le B$ and let $x \in E_H$; therefore

$$1 = x'Hx \le x'Bx.$$

Now it is clear from the note above that $x'Bx > 0$, so that if we choose

$$m = \frac{1}{\sqrt{x'Bx}} \le 1,$$

we will obtain

$$(mx)'B(mx) = \frac{x'Bx}{x'Bx} = 1,$$

that is, $mx \in E_B$.

Now let $x$ be an arbitrary vector. Then either $x'Hx = 0$ or $m_1x \in E_H$ for some $m_1 \ne 0$. If the former holds, then $x'Hx = 0 \le x'Bx$. If the second assertion holds, we have

$$m_1^2\,x'Hx = 1,$$

and by hypothesis there exists $m_2$, with $0 < m_2 \le 1$, such that $m_2m_1x \in E_B$, or

$$m_2^2m_1^2\,x'Bx = 1.$$

Comparing these two expressions and dividing by $m_1^2$, we finally get

$$x'Hx = m_2^2\,x'Bx \le x'Bx.$$

Since $x$ was chosen arbitrarily, $H \le B$. $\square$

By Theorem 5.2.1, we can intuitively conclude that $H \le B$ means that the graph defined by the point set $E_B$ is entirely contained in that for $E_H$, in the sense that, if we choose a point in $E_H$ and travel towards the origin, we will cross $E_B$ before reaching the origin.
The following example in a two-dimensional space will help to illustrate the idea.

Example 5.2.1

Let $E_B = \{x : x'(Ia'a)x = 1\}$ and $E_H = \{x : x'aa'x = 1\} = \{x : x'a = 1 \text{ or } x'a = -1\}$, where $x' = (x_1, x_2)$ and $a' = (a_1, a_2)$ are $(1 \times 2)$ vectors, so that $B = Ia'a$ and $H = aa'$ are two $(2 \times 2)$ symmetric n.n.d. matrices. It follows from the Cauchy-Schwarz inequality that

$$x'aa'x = (x'a)^2 \le (x'x)(a'a) = x'(Ia'a)x$$

for all $x$ and, hence, $H = aa' \le Ia'a = B$.

It is clear from Figure 5.2.1 that if we choose a point that lies on either of the lines $x'a = 1$ or $x'a = -1$ and we travel towards the origin, we will cross the circle defined by $x'(Ia'a)x = 1$ before reaching the origin.

[Figure 5.2.1. Graphs for the point sets $E_B = \{x : x'(Ia'a)x = 1\}$ and $E_H = \{x : x'aa'x = 1\}$.]
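The ordering in Example 5.2.1 can also be checked numerically: the Cauchy-Schwarz inequality guarantees $x'(aa')x \le x'(Ia'a)x$ for every $x$. A small Python sketch (added for illustration; the vector $a$ is hypothetical):

```python
import random

def quad(M, x):
    # Quadratic form x'Mx.
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))

a = [1.0, 2.0]
H = [[a[i] * a[j] for j in range(2)] for i in range(2)]            # H = aa'
B = [[(a[0] ** 2 + a[1] ** 2) if i == j else 0.0 for j in range(2)]
     for i in range(2)]                                            # B = (a'a) I

random.seed(0)
ok = all(quad(H, x) <= quad(B, x) + 1e-9
         for x in ([random.uniform(-2, 2), random.uniform(-2, 2)]
                   for _ in range(200)))
print(ok)   # True: H <= B in the matrix ordering
```

Random probing of the quadratic forms is not a proof, of course, but it mirrors the pointwise definition $x'Hx \le x'Bx$ directly.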
5.3 Definition of a Supremum in the Matrix Ordering

In Section 5.1 we defined the meaning of the relation $H \le B$ for two n.n.d., symmetric matrices. In this section, we will be concerned with formalizing the concept of a supremum in the context of the ordering that we are considering. This concept will permit us to define a criterion that can be used to obtain biased estimators which will be superior to OLS with respect to MSE.
Before we formalize the definition of the supremum, let us consider the following sets of matrices:

$$\mathcal{V} = \{B : B \text{ is an n.n.d., symmetric matrix}\},$$

$\mathcal{Z} \subseteq \mathcal{V}$, where $\mathcal{Z}$ is a non-empty set, and

$$\mathcal{W} = \{B \in \mathcal{V} : H \le B \text{ for all } H \in \mathcal{Z}\} = \text{the set of upper bounds for } \mathcal{Z} \text{ in } \mathcal{V}.$$

Now the precise meaning of a supremum is contained in the following definition.

Definition 5.3.1

A square matrix $B \in \mathcal{V}$ is the least upper bound, or supremum, for the set $\mathcal{Z}$ in $\mathcal{V}$ if, and only if,

(1) $B$ is an upper bound for each $H \in \mathcal{Z}$, that is, $H \le B$ for all $H \in \mathcal{Z}$; and

(2) if $B_1 \in \mathcal{V}$ and $B_1 \ge H$ for all $H \in \mathcal{Z}$, then $B_1 \ge B$; that is, $B$ is a minimum in $\mathcal{W}$.

It is important for later work to point out that the supremum, or least upper bound, as it has been defined does not always exist, even in the case when the set $\mathcal{Z}$ under consideration is bounded above. The following counterexample will illustrate this.
Example 5.3.1

Let

$$H_1 = \begin{bmatrix} 3 & 0 \\ 0 & 2.5 \end{bmatrix} \quad\text{and}\quad H_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$

It is clear that $H_1$ is a p.d., symmetric matrix, and since the characteristic roots of $H_2$ are $3$ and $1$, it is also a p.d., symmetric matrix. Let

$$B_1 = \begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix} \quad\text{and}\quad B_2 = \begin{bmatrix} 4 & 0 \\ 0 & 2.5 \end{bmatrix}.$$

These are clearly two p.d., symmetric matrices. Moreover, since $B_1 - H_1$ and $B_2 - H_1$ are diagonal with nonnegative elements, it follows immediately that $B_1 \ge H_1$ and $B_2 \ge H_1$. Since the characteristic roots of the matrix $B_1 - H_2$ and the ones corresponding to the matrix $B_2 - H_2$ are nonnegative, the relations $B_1 \ge H_2$ and $B_2 \ge H_2$ also hold. Hence, we may conclude that the set $\mathcal{Z} = \{H_1, H_2\}$ is bounded above and that the matrices $B_1$ and $B_2$ are elements of $\mathcal{W}$. However, there does not exist an n.n.d., symmetric matrix $C$ such that the following relations hold:

(i) $B_1 \ge C$,

(ii) $B_2 \ge C$,

(iii) $C \ge H_1$, and

(iv) $C \ge H_2$.

In order to confirm the above assertion, suppose (i), (ii) and (iii) hold. Then it follows immediately that $c_{11} = 3$ and $c_{22} = 2.5$. Using this and (iii), we have that

$$\begin{bmatrix} 3 & c_{12} \\ c_{12} & 2.5 \end{bmatrix} \ge \begin{bmatrix} 3 & 0 \\ 0 & 2.5 \end{bmatrix},$$

which implies that, for any $x' = (x_1, x_2)$, $2c_{12}x_1x_2 \ge 0$. Thus, letting $x_1 = 1$ and $x_2 = 1$, we have

$$c_{12} \ge 0, \tag{5.3.1}$$

and, letting $x_1 = 1$ and $x_2 = -1$, we have

$$c_{12} \le 0. \tag{5.3.2}$$

From (5.3.1) and (5.3.2) we may conclude that $c_{12} = 0$.

So, if we assume (i), (ii), and (iii) true, we must conclude that $C = H_1$. But, since the characteristic roots of $H_1 - H_2$ are $1.78$ and $-0.28$, we can see that $H_1$ and $H_2$ are not comparable in the sense that we have defined, so (iv) fails. Therefore, we may conclude that there does not exist an n.n.d., symmetric matrix $C$ such that (i), (ii), (iii), and (iv) hold and, therefore, that the supremum for our set $\mathcal{Z} = \{H_1, H_2\}$ does not exist, even though the set is bounded above.
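The failure rests on the difference of two bounded matrices being indefinite; the quoted roots $1.78$ and $-0.28$ can be recovered with the two-by-two eigenvalue formula. A short Python check (added for illustration, using $H_1$ diagonal with entries 3 and 2.5 and $H_2$ with diagonal 2's and unit off-diagonals):

```python
def eigs2(M):
    # Characteristic roots of a symmetric 2x2 matrix via trace and determinant.
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = (tr * tr / 4.0 - det) ** 0.5
    return tr / 2.0 - disc, tr / 2.0 + disc

H1 = [[3.0, 0.0], [0.0, 2.5]]
H2 = [[2.0, 1.0], [1.0, 2.0]]
diff = [[H1[i][j] - H2[i][j] for j in range(2)] for i in range(2)]

lo, hi = eigs2(diff)
print(round(lo, 2), round(hi, 2))   # -0.28 1.78: one root of each sign,
                                    # so H1 and H2 are not comparable
```

One negative and one positive root means neither $H_1 \ge H_2$ nor $H_2 \ge H_1$, which is exactly why condition (iv) cannot be satisfied.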
Before we set out some sufficient conditions for the existence of a supremum, let us first develop some useful notation. Suppose that the n.n.d., symmetric matrix $H$ is a function of $a$, where $a$ belongs to a certain non-empty set $\mathcal{A}$. We shall denote it by $H(a)$, $a \in \mathcal{A}$, so that our set $\mathcal{Z}$ can be described as

$$\mathcal{Z} = \{H : H = H(a),\ a \in \mathcal{A}\}.$$

Under this setup we can obtain (if it exists) the supremum $B$ of the set of matrices $H(a)$ in $\mathcal{V}$ over all $a \in \mathcal{A}$, and we shall write this assertion as

$$B = \sup_{a\in\mathcal{A}}H(a).$$
Theorem 5.3.1

Let $E_B = \{x : x'Bx = 1\}$, and suppose the following conditions hold:

(i) $H(a) \in \mathcal{Z} \subseteq \mathcal{V}$ for all $a \in \mathcal{A}$, $B \in \mathcal{V}$, and $H(a) \le B$ for all $a \in \mathcal{A}$;

(ii) for every $x \in E_B$ there exists an $a \in \mathcal{A}$ such that $x'Bx = x'H(a)x$;

then

$$\sup_{a\in\mathcal{A}}H(a) = B.$$

Proof: By assumption (i), $B$ is an upper bound for $\mathcal{Z}$ in $\mathcal{V}$. So it remains to be proven that if $B_1 \in \mathcal{W}$ then $B \le B_1$, that is,

$$x'Bx \le x'B_1x \quad\text{for all } x.$$

To prove this, let $x$ be an arbitrary vector. Then one of the following two cases must occur:

(1) $Bx = 0$, in which case $x'Bx = 0 \le x'B_1x$, since $B_1 \in \mathcal{V}$.

(2) There exists $c_1 \ne 0$ such that $c_1x \in E_B$; that is, $c_1^2\,x'Bx = 1$. But by (ii) there exists an $a_0 \in \mathcal{A}$ such that

$$c_1^2\,x'Bx = c_1^2\,x'H(a_0)x \le c_1^2\,x'B_1x,$$

where the inequality holds since $B_1 \in \mathcal{W}$, and hence $x'Bx \le x'B_1x$.

By (1) and (2) we may conclude that $B \le B_1$ and, hence, $\sup_{a\in\mathcal{A}}H(a) = B$. $\square$
5.4 Directionally Minimax Mean Squared Error Estimation

The goal of this section is to develop a meaningful criterion to obtain an estimator of $\beta$ in the general linear model $(y, X\beta, \Sigma)$, $|\Sigma| \ne 0$, which depends, as little as possible, on the parameters themselves and which, in some sense, minimizes the mean squared error matrix. We shall be concerned with the joint estimation of some linear functions of $\beta$ using linear functions of the observations of the form $Ay$, where $A$ is a $(t \times n)$ matrix, $t \le n$.

The strategy adopted, as was done in Chapter 4, is to express the parameters in the form $\beta = k\alpha$, where $\alpha$ is the vector of its direction cosines and $k$ its length. Then, for fixed values of $k$, the matrix $M(Ay, k\alpha)$ can be maximized, in the sense defined in Section 5.3, by choice of $\alpha$. The precise meaning of what we will call a Directionally Minimax Mean Squared Error Estimator is contained in the following definition:
Definition 5.4.1

A linear estimator $\widehat{P'\beta} = Ay$ is said to be the Directionally Minimax Mean Squared Error (DMMSE) estimator of $P'\beta$ if $A$ is such that it minimizes

$$S_M(A, k) = A\Sigma A' + \sup_{\beta\in\mathcal{B}}(P' - AX)\beta\beta'(P' - AX)',$$

where $\mathcal{B} = \{\beta : \beta'\beta = k^2\}$.

In view of Definition 5.4.1, we will be concerned first with the problem of finding an n.n.d., symmetric matrix $B$ such that

$$B = \sup_{\beta\in\mathcal{B}}(P' - AX)\beta\beta'(P' - AX)'.$$

This problem will be confronted in the next two theorems, in which we will assume, without loss of generality, that $k^2 = 1$.
69
Theorem 5.4.1

Let T = DD', where D is a (t × r) matrix of rank r. Then

    Fββ'F' ≤ DD'   for all β ∈ ℬ                                   (5.4.1)

if, and only if, F = DC for some (r × m) matrix C with Ch_M(C'C) ≤ 1.

Proof: Assume first that Fββ'F' ≤ DD' for all β ∈ ℬ. This inequality holds if, and only if,

    x'Fββ'F'x ≤ x'DD'x   for all x and for all β ∈ ℬ.              (5.4.2)

Now if D'x = 0, we will have from (5.4.2) that β'F'x = 0 for all β ∈ ℬ and, hence, F'x = 0. Thus every null vector of D' is a null vector of F'; therefore the column space of F is contained in that of D, and from this we have that F = DC for some C. If, as we have assumed, Fββ'F' ≤ DD' for all β ∈ ℬ, then by (5.4.2) we have

    DCββ'C'D' ≤ DD'   for all β ∈ ℬ.

This last inequality holds if, and only if, x'DCββ'C'D'x ≤ x'DD'x for all β ∈ ℬ and all x. Hence, in particular,

    m'D'DCββ'C'D'Dm ≤ m'D'DD'Dm   for all β ∈ ℬ and all m.

This happens if, and only if,

    D'DCββ'C'D'D ≤ D'DD'D   for all β ∈ ℬ,                          (5.4.3)

where |D'D| ≠ 0, and, therefore, the relation holds if, and only if,

    Cββ'C' ≤ I   for all β ∈ ℬ.

By (5.4.3), this is true if, and only if, β'C'Cβ ≤ 1 for all β ∈ ℬ, which in turn is true if, and only if, Ch_M(C'C) ≤ 1.

Let us now assume that F = DC for some C with Ch_M(C'C) ≤ 1. Then

    Fββ'F' = DCββ'C'D',

and, by the Cauchy–Schwarz inequality, we will obtain

    x'DCββ'C'D'x ≤ x'DD'x · β'C'Cβ ≤ x'DD'x   for all β ∈ ℬ and all x.

This implies that DCββ'C'D' ≤ DD' and, hence, by our assumption we will conclude Fββ'F' ≤ DD' = T. □
Lemma 5.4.1

If β ∈ ℬ, then FF' is an upper bound for Fββ'F', that is,

    Fββ'F' ≤ FF'   for all β ∈ ℬ.

Proof: The proof follows immediately from the Cauchy–Schwarz inequality. □
Theorem 5.4.2

If T = DD' is such that Fββ'F' ≤ DD' = T for all β ∈ ℬ, then the n.n.d. symmetric matrix FF' satisfies FF' ≤ DD'.

Proof: By Theorem 5.4.1 we can write F = DC for some C with Ch_M(C'C) ≤ 1. Let CC' = GΔG', where G'G = GG' = I and Δ = diag(δᵢ) is a diagonal matrix. It follows immediately from Theorem 5.4.1 and Lemma 5.4.1 that δᵢ ≤ 1 for all i. Therefore,

    x'(Δ − I)x ≤ 0   for all x.

In particular,

    m'DG(Δ − I)G'D'm ≤ 0   for all m.

Therefore,

    m'(DCC'D' − DD')m ≤ 0   for all m.

Thus, since FF' = DCC'D', we may conclude that FF' ≤ DD'. □

Theorem 5.4.2 states that if T ∈ V is an upper bound for Fββ'F', then it satisfies T ≥ FF'. Then, in view of Lemma 5.4.1, which states that FF' is also an upper bound for Fββ'F',

    FF' = sup_{β∈ℬ} Fββ'F'.

Therefore, on dividing by k², our criterion can be reduced to the minimization of

    Q(A) = c⁻²AΣA' + (P' − AX)(P' − AX)',   where c² = k²/σ².        (5.4.4)

In Theorem 5.3.1 we have developed a sufficient condition for the existence of the supremum that will be used in the following example to illustrate another route for obtaining sup_{β∈ℬₖ} Fββ'F'.
Example 5.4.1

It is clear that

    Fββ'F' ≤ k²FF'   for all β ∈ ℬₖ.

Now let x ∈ ℰ(k²FF'), that is,

    x'(k²FF')x = 1.

We must be able to find a β₀ ∈ ℬₖ such that x'Fβ₀β₀'F'x = 1. To do so, let β₀ = cF'x. Since β₀'β₀ = k², that implies that

    c²x'FF'x = k²

and, hence, that c² = k²/x'FF'x. Thus,

    x'Fβ₀β₀'F'x = (x'FF'x)²c² = k²x'FF'x = x'(k²FF')x = 1,

and, hence, by virtue of Theorem 5.3.1 we may conclude

    sup_{β∈ℬₖ} Fββ'F' = k²FF'. □
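The conclusion of Example 5.4.1 can be checked numerically. The sketch below uses NumPy with an arbitrary synthetic F and k (illustrative values, not from the text): the direction β₀ = cF'x attains the bound k²·x'FF'x exactly, while random directions of length k never exceed it.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 3))   # plays the role of F = P' - AX
k = 2.0
x = rng.normal(size=4)

# Upper bound from the example: x'F beta beta' F' x <= k^2 x'FF'x
bound = k**2 * x @ F @ F.T @ x

# The maximizing direction beta0 = c F'x, scaled so that beta0'beta0 = k^2
beta0 = k * F.T @ x / np.linalg.norm(F.T @ x)
attained = (x @ F @ beta0)**2
assert np.isclose(attained, bound)

# Random directions of length k never exceed the bound (Cauchy-Schwarz)
for _ in range(100):
    b = rng.normal(size=3)
    b = k * b / np.linalg.norm(b)
    assert (x @ F @ b)**2 <= bound + 1e-9
```

The equality at β₀ is exactly the Cauchy–Schwarz equality case, β proportional to F'x.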
The problem on which we shall concentrate our attention now is the search for an (m × n) matrix A which minimizes our criterion given in (5.4.4) and which will yield a possibly biased estimator of the form β̂ = Ay which will be superior to OLS with respect to MSE.

In the process of solving a different problem, Foster [6] and Chipman [3] obtained the matrix A which minimizes the matrix

    AVA' + (I − AX)U(I − AX)',

where U and V are arbitrary p.d. matrices. The form of their solution is

    A₀ = UX'(XUX' + V)⁻¹.

This result can be applied to the criterion (5.4.4). Prior to discovering their work, the following theorem was proved. It solves the same problem for the case when U = k²I and V = Σσ². Because of its simplicity, the theorem along with its proof will be presented.
Theorem 5.4.3

In the linear model (y, Xβ, Σσ²), |Σ| ≠ 0,

    min_A Q(A) = min_A k⁻²S_M(Ay, β; k, Σσ²) = (I + X'Σ⁻¹Xc²)⁻¹,

and the minimum is attained when A = X'(Σc⁻² + XX')⁻¹, where c² = k²/σ². That is, the DMMSE estimator is

    β̂ = X'(Σc⁻² + XX')⁻¹y.

Proof: Let E = (Σc⁻² + XX'). From this we have

    Q(A) = c⁻²AΣA' + (I − AX)(I − AX)'
         = c⁻²AΣA' + I − AX − X'A' + AXX'A'
         = A(Σc⁻² + XX')A' + I − AX − X'A'
         = (A − X'E⁻¹)E(A − X'E⁻¹)' + I − X'E⁻¹X.

Since I − X'E⁻¹X = (I + X'Σ⁻¹Xc²)⁻¹, we have

    Q(A) = (A − X'E⁻¹)E(A − X'E⁻¹)' + (I + X'Σ⁻¹Xc²)⁻¹.            (5.4.5)

Observe that only the first term in (5.4.5) depends on A and is equal to the zero matrix whenever A = X'E⁻¹. From this, we may conclude

    min_A Q(A) = (I + X'Σ⁻¹Xc²)⁻¹,

and the equality holds when A = X'(Σc⁻² + XX')⁻¹. Finally, our estimator becomes

    β̂ = Ay = X'(Σc⁻² + XX')⁻¹y. □
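The completion-of-squares argument in the proof can be verified numerically. The sketch below (NumPy, with arbitrary synthetic X, a p.d. Σ, and an assumed scalar c²) checks that Q at A = X'E⁻¹ equals (I + X'Σ⁻¹Xc²)⁻¹, and that perturbing A only increases Q in the Loewner order.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
X = rng.normal(size=(n, m))
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)    # a p.d. Sigma
c2 = 0.5                           # c^2 = k^2 / sigma^2 (assumed value)

def Q(A):
    # Criterion (5.4.4) with P' = I
    return (1 / c2) * A @ Sigma @ A.T + \
           (np.eye(m) - A @ X) @ (np.eye(m) - A @ X).T

E = Sigma / c2 + X @ X.T
A_opt = X.T @ np.linalg.inv(E)

# Minimum value matches (I + c^2 X' Sigma^{-1} X)^{-1} (a Woodbury identity)
Q_min = np.linalg.inv(np.eye(m) + c2 * X.T @ np.linalg.inv(Sigma) @ X)
assert np.allclose(Q(A_opt), Q_min)

# Any perturbation increases Q: Q(A) - Q_min = (A - A_opt) E (A - A_opt)' is n.n.d.
for _ in range(20):
    A = A_opt + 0.1 * rng.normal(size=(m, n))
    assert np.linalg.eigvalsh(Q(A) - Q_min).min() >= -1e-9
```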
The following theorem states that, holding the model (y, Xβ, Σσ²), |Σ| ≠ 0, fixed, the DMMSE estimator, (P'β)^, of the linear function P'β can be obtained by forming the corresponding linear function, P'β̂, of the DMMSE estimator, β̂, of β.
Theorem 5.4.4

In the linear model (y, Xβ, Σσ²), |Σ| ≠ 0, and for any (t × m) matrix P',

    DMMSE(P'β) = P'[DMMSE(β)],

and hence (P'β)^ = P'β̂.

Proof: Let E = (Σc⁻² + XX'). From this we have

    Q(A) = c⁻²AΣA' + (P' − AX)(P' − AX)'
         = c⁻²AΣA' + P'P − P'X'A' − AXP + AXX'A'
         = A(c⁻²Σ + XX')A' + P'P − P'X'A' − AXP
         = (A − P'X'E⁻¹)E(A' − E⁻¹XP) + P'(I + X'Σ⁻¹Xc²)⁻¹P.        (5.4.6)

Observe that only the first term in (5.4.6) depends on A and is equal to the zero matrix whenever A = P'X'E⁻¹. From this, we may conclude

    min_A Q(A) = P'(I + X'Σ⁻¹Xc²)⁻¹P,

and the equality holds when A = P'X'E⁻¹. Finally, our estimator (P'β)^ becomes

    (P'β)^ = P'X'(Σc⁻² + XX')⁻¹y = P'β̂. □
5.5 An Alternative Procedure To Obtain The DMMSE Estimator

In view of Theorem 5.4.4, an alternative procedure for obtaining the DMMSE estimator is to apply the minimax argument that has been developed in the last sections to the MSE of an arbitrary linear combination of the observations, ℓ'y, used to estimate an arbitrary linear parametric function of the parameters, p'β, in the linear model (y, Xβ, Σσ²), |Σ| ≠ 0. That is, we will apply the minimax argument to M(ℓ'y, p'β).

Then, expressing the parameters in the form β = kα, where α is the vector of the direction cosines and k its length, then for fixed k we can maximize M(ℓ'y, p'β) by choice of α, as follows:

    M̃(ℓ'y, p'β) = sup_{β∈ℬ} M(ℓ'y, p'β) = σ²ℓ'Σℓ + k²(X'ℓ − p)'(X'ℓ − p),   (5.5.2)

where ℬ = {β : β = kα, ||α|| = 1}.
Theorem 5.5.1 gives the optimum choice of ℓ.

Theorem 5.5.1

The directionally minimax MSE estimator of any function p'β is

    (p'β)^ = p'X'[Σc⁻² + XX']⁻¹y,   where c² = k²/σ².

Proof: Taking the derivative of M̃ with respect to ℓ and equating it to zero, we have

    2σ²Σℓ + 2k²XX'ℓ − 2k²Xp = 0.

So the value of ℓ that minimizes M̃ satisfies the equation

    (Σc⁻² + XX')ℓ = Xp,

so that ℓ = (Σc⁻² + XX')⁻¹Xp, giving the estimator

    (p'β)^ = ℓ'y = p'X'(Σc⁻² + XX')⁻¹y = p'β̂,

where c² = k²/σ². □

The estimator β̂ = X'(Σc⁻² + XX')⁻¹y of β given in (5.5.4) is the DMMSE estimator of β.
The results which appear in Theorem 5.5.1 are certainly included in Theorem 5.4.4. The former were obtained early in the investigation and suggested that the DMMSE estimator of β might have the form above. Without this hint, the proof of Theorem 5.4.3 would have been impossible. The results are included to indicate the steps which provided the motivation for the main results of this chapter.
5.6 Relation Between The DMMSE Estimator And The Ridge Regression Estimator

In the next theorem we will show the relation of the DMMSE estimator and the ridge regression estimator. For this we will restrict our attention to the model (y, Xβ, Iσ²).

Theorem 5.6.1

The DMMSE estimator is equivalent to the ridge regression estimator

    β** = [I + c⁻²(X'X)⁻¹]⁻¹(X'X)⁻¹X'y

of Hoerl and Kennard [9].

Proof: The result follows immediately if we apply a formula given by Rao [17] to the form X'[c⁻²I + XX']⁻¹. Indeed,

    β̂ = X'[c⁻²I + XX']⁻¹y = [I + c⁻²(X'X)⁻¹]⁻¹β̃,

where β̃ is the OLS estimator. □
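The equivalence in Theorem 5.6.1 is a matrix "push-through" identity and is easy to confirm numerically. The NumPy sketch below (arbitrary synthetic X, y, and an assumed scalar c²; the ridge constant is c⁻²) checks that the DMMSE form, the Hoerl–Kennard form, and the familiar (X'X + c⁻²I)⁻¹X'y all coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 3
X = rng.normal(size=(n, m))
y = rng.normal(size=n)
c2 = 0.7                      # c^2 = k^2 / sigma^2; ridge constant is 1/c^2

# DMMSE form: X'(c^{-2} I + XX')^{-1} y
dmmse = X.T @ np.linalg.solve(np.eye(n) / c2 + X @ X.T, y)

# Hoerl-Kennard form: [I + c^{-2}(X'X)^{-1}]^{-1} (OLS)
XtX = X.T @ X
ols = np.linalg.solve(XtX, X.T @ y)
ridge = np.linalg.solve(np.eye(m) + np.linalg.inv(XtX) / c2, ols)
assert np.allclose(dmmse, ridge)

# ...and both equal the usual ridge form (X'X + c^{-2} I)^{-1} X'y
assert np.allclose(dmmse, np.linalg.solve(XtX + np.eye(m) / c2, X.T @ y))
```

Note that the DMMSE form inverts an (n × n) matrix while the ridge forms invert (m × m) matrices; numerically one would prefer whichever dimension is smaller.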
5.7 Minimization of Tr[S(Ay, k)] For The Restriction Case

In Chapter 4 we studied the class of biased estimators of regression coefficients obtained by computing LS estimates under sets of false restrictions. For that case we were restricting our attention to the linear model (y, Xβ, Iσ²), in which we assume that rank(X) = m. To obtain the class of estimators we impose u independent false restrictions on the parameter space, and by Theorem 4.1 we observed that restrictions of the form Rβ = 0 were preferred over Rβ = h. Moreover, in view of Theorem 4.2, we saw that we can specify the class of estimators of interest as

    β̃(R) = (I − S⁻¹R'R)S⁻¹X'y

for all R such that RS⁻¹R' = I, where S has been defined in (2.5). Within this class we found the DMTMSE estimator, given by Theorem 4.4.

By putting A = (I − S⁻¹R'R)S⁻¹X', Σ = I, and P' = I in (5.4.4), we will propose an alternative procedure for obtaining the DMTMSE estimator. This procedure consists of finding, for fixed k, the matrix R that minimizes

    T_S(R, k) = tr[S_M(R, k)].                                       (5.7.1)
The optimum value of R is given in the following theorem.

Theorem 5.7.1

For the class of least squares estimates computed subject to a set of u independent false restrictions,

    T_S(R, k) = tr[S_M(R, k)] ≥ σ² Σ_{i=1}^{m−u} λᵢ⁻¹ + k²u,

where the equality holds whenever R = Λ₂^{1/2}V₂', and where V₂, Λ₂ and the λᵢ are defined in (2.4), (2.7) and (2.8), respectively. That is, for this class of estimators, our estimator is

    β̂ = V₁V₁'S⁻¹X'y = (I − V₂V₂')S⁻¹X'y.

Proof: We can write R = (BB')⁻¹ᐟ²BΛ^{1/2}V', where B is any (u × m) matrix of rank u. From this we have that RS⁻¹R' = I and

    R'R = VΛ^{1/2}P_B Λ^{1/2}V',   where P_B = B'(BB')⁻¹B

and Λ has been defined in (2.4). In Theorem 4.4 we have proved that

    tr[Λ⁻¹ − Λ⁻¹ᐟ²P_B Λ⁻¹ᐟ²] ≥ Σ_{i=1}^{m−u} λᵢ⁻¹.                   (5.7.2)

Now let Q = Λ^{1/2}P_B Λ⁻¹ᐟ² and observe that Q² = Q and rank(Q) = rank(P_B) = u. Let vᵢ, i = 1, …, m, be an orthonormal basis for Eᵐ such that the first u of them form an orthonormal basis for ℛ(Q). Then we have

    tr(Q'Q) = Σ_{i=1}^m vᵢ'Q'Qvᵢ = Σ_{i=1}^u vᵢ'Q'Qvᵢ + Σ_{i=u+1}^m vᵢ'Q'Qvᵢ ≥ Σ_{i=1}^u vᵢ'vᵢ = u.   (5.7.3)

By using (5.7.2) and (5.7.3), we may conclude

    T_S(R, k) ≥ σ² Σ_{i=1}^{m−u} λᵢ⁻¹ + k²u.                         (5.7.4)

It remains to demonstrate that the lower bound in (5.7.4) is attainable. Putting B = (0 : I_u) and substituting into the left hand side of (5.7.4), we get

    σ² tr[V₁Λ₁⁻¹V₁'] + k² tr[V₂V₂'] = σ² Σ_{i=1}^{m−u} λᵢ⁻¹ + k²u,

which is the right hand side of (5.7.4). Finally, our estimator takes the form

    β̂ = (I − V₂V₂')β̃,

where β̃ is the OLS estimator. □
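The optimal restricted estimator of Theorem 5.7.1 is easy to construct. The sketch below (NumPy, synthetic data; it assumes the convention that V₂ and Λ₂ collect the u smallest eigenvalues of S = X'X, the directions that the bound discards) checks that R = Λ₂^{1/2}V₂' satisfies the normalization RS⁻¹R' = I and that the restricted LS estimator reduces to (I − V₂V₂')β̃.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m, u = 12, 4, 2
X = rng.normal(size=(n, m))
y = rng.normal(size=n)

S = X.T @ X
lam, V = np.linalg.eigh(S)            # eigenvalues in ascending order
V2, lam2 = V[:, :u], lam[:u]          # u smallest (assumed convention)

R = np.diag(np.sqrt(lam2)) @ V2.T     # R = Lam_2^{1/2} V_2'
# Normalization required of the class: R S^{-1} R' = I
assert np.allclose(R @ np.linalg.inv(S) @ R.T, np.eye(u))

ols = np.linalg.solve(S, X.T @ y)
restricted = (np.eye(m) - np.linalg.inv(S) @ R.T @ R) @ ols
# The restriction Rb = 0 simply zeroes the discarded eigen-directions
assert np.allclose(restricted, (np.eye(m) - V2 @ V2.T) @ ols)
```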
In view of Theorem 5.7.1, we may conclude the following important fact.

FACT: For the class of biased estimators computed subject to possibly false restrictions, the DMTMSE estimator can be obtained by minimizing with respect to R either of the two expressions

    {c⁻² tr[AA'] + sup_{β∈ℬ} tr[Fββ'F']}   or   {c⁻² tr[AA'] + tr[sup_{β∈ℬ} Fββ'F']},

where F = S⁻¹R'R.
6. JOINT DIRECTIONALLY MINIMAX MEAN SQUARED ERROR ESTIMATION

This chapter is a continuation of Chapter 5 in that the form of the estimator obtained there is utilized here. The main subject matter is the problem of estimating regression coefficients for a set of dependent regression equations describing different variables but sharing the same design matrix, when the X'X matrix has some small but positive eigenvalues.

DMMSE estimation applied equation-by-equation yields efficient coefficient estimators under special conditions. For conditions generally encountered we propose an estimator in which all parameters are estimated simultaneously.

In Section 6.2, the proposed estimator is presented and its asymptotic distribution is studied. In Section 6.3 a comparison between the variance-covariance matrices of the joint LS, joint ridge regression and joint DMMSE estimators is presented.
6.1 Some Basic Definitions and Notation

In this section we will define the Kronecker product of matrices, the order of magnitude of a sequence, and some notation concerning asymptotically multinormal random vectors.

Definition 6.1.1

Let A = (a_ij) and B = (b_ij) be (m × n) and (p × q) matrices, respectively. Then the Kronecker product A ⊗ B is an (mp × nq) matrix expressible as a partitioned matrix with a_ij·B as the (i, j)-th partition, i = 1, …, m and j = 1, …, n.
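Definition 6.1.1 is exactly what `numpy.kron` computes; the small check below confirms the partition structure for a pair of 2 × 2 matrices (illustrative values).

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])          # (m x n) = (2 x 2)
B = np.array([[0, 5],
              [6, 7]])          # (p x q) = (2 x 2)

K = np.kron(A, B)               # (mp x nq) = (4 x 4)

# The (i, j)-th partition of A (x) B is a_ij * B
assert np.array_equal(K[:2, :2], 1 * B)
assert np.array_equal(K[:2, 2:], 2 * B)
assert np.array_equal(K[2:, :2], 3 * B)
assert np.array_equal(K[2:, 2:], 4 * B)
```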
Definition 6.1.2

The sequence aₙ is said to be of smaller order than nᵏ, indicated by o(nᵏ), if the sequence n⁻ᵏaₙ converges to zero; moreover, we say that the sequence aₙ is at most of order nᵏ, written O(nᵏ), when the sequence n⁻ᵏaₙ is bounded.

Definition 6.1.3

If a random vector Wₙ converges in distribution to a multinormal distribution with a certain mean and variance-covariance matrix, we will write Wₙ ~ AN(mean, variance-covariance).
6.2 The Joint Generalized DMMSE Estimator

In Chapter 5 we have been considering a linear model of the form (y, Xβ, Σσ²), |Σ| ≠ 0, and we have derived what we call the DMMSE estimator for this model, namely

    β̂ = X'(Σc⁻² + XX')⁻¹y.

In this section we will suppose that we have p general linear models of the form that we have defined above; that is, we will have a set of models of the form

    y_j = Xβ_j + ε_j,   j = 1, …, p,                                 (6.2.1)

where we will assume Var[ε_j] = Iσ_jj and Cov[ε_i, ε_j] = Iσ_ij. The system described by (6.2.1) may be written as:

    [y₁ ]   [X 0 ⋯ 0] [β₁ ]   [ε₁ ]
    [y₂ ] = [0 X ⋯ 0] [β₂ ] + [ε₂ ]                                  (6.2.2)
    [ ⋮ ]   [⋮ ⋮   ⋮] [ ⋮ ]   [ ⋮ ]
    [y_p]   [0 0 ⋯ X] [β_p]   [ε_p]

Using the notation that we defined in Section 6.1, (6.2.2) may be written as

    y = (I ⊗ X)β + ε,

where y' = [y₁', …, y_p'], β' = [β₁', …, β_p'] and ε'_j = [ε_{j1}, …, ε_{jn}].

Furthermore, we will assume that ε ~ N(0, Σ ⊗ I), so that if we define the vector ε̃ᵢ = (ε_{1i}, …, ε_{pi})' we will have ε̃ᵢ ~ N(0, Σ), and ε̃₁, …, ε̃ₙ is a set of independent random vectors.

The setting that we have defined above suggests the form for what we shall call the joint DMMSE estimator, namely

    β̂* = (I ⊗ X')[(Σ ⊗ I)k⁻² + (I ⊗ XX')]⁻¹y                        (6.2.4)

(the variances σ_jj having been absorbed into Σ, so that here c² = k²). In the following theorem an alternative form for our estimator β̂* is proposed which will be very useful in future studies.

Theorem 6.2.1

The joint DMMSE estimator is a linear transformation of the joint least squares estimator. That is,

    β̂* = [I + k⁻²(Σ ⊗ (X'X)⁻¹)]⁻¹β̃,

where β̃ = (I ⊗ (X'X)⁻¹X')y is the joint LS estimator.

Proof: The proof follows immediately if we apply a formula proposed in Problem 2.9 by Rao [17, p. 33] to our estimator:

    β̂* = k²(I ⊗ X')[(Σ⁻¹ ⊗ I) − (Σ⁻¹ ⊗ I)(I ⊗ X)D(I ⊗ X')(Σ⁻¹ ⊗ I)]y,

where D = [(Σ⁻¹ ⊗ X'X) + k⁻²I]⁻¹. By doing some algebra in the expression above, we get the stated form. □
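Theorem 6.2.1 can be verified numerically without reproducing the intermediate algebra. The sketch below (NumPy, synthetic X, Y and an assumed 2 × 2 Σ and k²) checks that the direct form (6.2.4) agrees with the linear transformation of the joint LS estimator.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, p = 10, 3, 2
X = rng.normal(size=(n, m))
Y = rng.normal(size=(n, p))          # columns are y_1, ..., y_p
y = Y.T.reshape(-1)                  # stacked vector (y_1', ..., y_p')'
S = np.array([[1.0, 0.3],
              [0.3, 2.0]])           # plays the role of Sigma (assumed)
k2 = 4.0

# Direct form (6.2.4): (I x X')[(Sigma x I)/k^2 + (I x XX')]^{-1} y
IX = np.kron(np.eye(p), X)
direct = IX.T @ np.linalg.solve(
    np.kron(S, np.eye(n)) / k2 + np.kron(np.eye(p), X @ X.T), y)

# Theorem 6.2.1: [I + (Sigma x (X'X)^{-1})/k^2]^{-1} applied to the joint LS
XtX_inv = np.linalg.inv(X.T @ X)
joint_ls = np.kron(np.eye(p), XtX_inv @ X.T) @ y
transformed = np.linalg.solve(
    np.eye(p * m) + np.kron(S, XtX_inv) / k2, joint_ls)

assert np.allclose(direct, transformed)
```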
In most cases we are faced with Σ unknown. Following Zellner [21], a two-step procedure is proposed which starts with the estimation of Σ using the matrix of mean squares and products of the LS residuals:

    Σ̂ = (σ̂_ij),

where σ̂_ij, the (i, j)-th element of Σ̂, is

    σ̂_ij = (y_i − Xβ̃_i)'(y_j − Xβ̃_j)/n,   β̃_j = (X'X)⁻¹X'y_j,

so that Σ̂ is a consistent estimator of Σ.

Although the LS estimator β̃_j may be highly variable when X'X is "close" to collinearity, that is, when some of the eigenvalues are "very small relative to the others", the product σ̂_ij does not depend on the eigenvalues of X'X and, therefore, its use in this circumstance need not be avoided. Indeed, by using the decomposition of X defined in (2.4) we have that

    σ̂_ij = y_i'[I − UU']y_j /n,

which is clearly independent of the eigenvalues of X'X; hence Σ̂ itself is independent of the eigenvalues of X'X.

Next, we apply (6.2.4) with Σ replaced by Σ̂, so that the joint DMMSE estimator becomes

    β̂** = [I + k⁻²(Σ̂ ⊗ (X'X)⁻¹)]⁻¹β̃.                               (6.2.5)
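The claim that the residual cross-products depend on X only through the projector I − UU' is checked below (NumPy, synthetic data; U is the orthonormal factor of a thin SVD of X, so that UU' = X(X'X)⁻¹X').

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 12, 3, 2
X = rng.normal(size=(n, m))
Y = rng.normal(size=(n, p))

# LS residual mean squares and products
resid = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
S_hat = resid.T @ resid / n

# Same estimate through the projector I - UU' from the decomposition of X
U, _, _ = np.linalg.svd(X, full_matrices=False)
S_proj = Y.T @ (np.eye(n) - U @ U.T) @ Y / n

assert np.allclose(S_hat, S_proj)
```

The projector route makes it plain that Σ̂ involves no inverse eigenvalue of X'X, which is the point of the passage above.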
6.2.1 Asymptotic Distribution Results For The Joint DMMSE Estimator And The Joint Ridge Regression Estimator

In this section we will obtain some asymptotic results for the joint DMMSE and joint ridge regression estimators. The following theorem gives the asymptotic distribution of (1/√n)(Σ⁻¹ ⊗ X')ε. This will be useful in later work.
Theorem 6.2.1.1

For the model y = (I ⊗ X)β + ε, assume that

(i) (1/n)X'X converges, as n → ∞, to a positive definite matrix Q;

(ii) ||xᵢ|| ≤ t for all i, where t is a given number and xᵢ is the i-th column of the matrix X'.

Then (1/√n)(Σ⁻¹ ⊗ X')ε has a limiting multinormal distribution with mean zero and variance-covariance matrix (Σ⁻¹ ⊗ Q), that is,

    (1/√n)(Σ⁻¹ ⊗ X')ε ~ AN(0, Σ⁻¹ ⊗ Q).

Proof: Following the lines of the proof of Theorem 8.2 of Theil [20, p. 330], observe first that

    Sₙ = (1/n) Σ_{i=1}^n xᵢxᵢ' = (1/n)X'X,

and, moreover,

    (1/√n)(Σ⁻¹ ⊗ X')ε = (1/√n) Σ_{j=1}^n w_j,   w_j = (Σ⁻¹ ⊗ x_j)ε̃_j,

where the sum is over a set of independent random vectors. Therefore, by assumption (ii), the characteristic function of w_j can be expressed (see Cramer [5], p. 213) as

    φ_{w_j}(t) = 1 − ½ t'(Σ⁻¹ ⊗ x_j x_j')t + o(t't).

Thus,

    φ_{(1/√n)w_j}(t) = 1 − ½ t'(Σ⁻¹ ⊗ (1/n)x_j x_j')t + o(t't/n),   (6.2.1.1)

and from (6.2.1.1) we may conclude

    φ_W(t) = Π_{j=1}^n φ_{(1/√n)w_j}(t)
           = Π_{j=1}^n [1 − ½ t'(Σ⁻¹ ⊗ (1/n)x_j x_j')t + o(t't/n)],   (6.2.1.2)

where

    W = (1/√n) Σ_{j=1}^n w_j.

Taking the log of (6.2.1.2), we have

    log φ_W(t) = Σ_{j=1}^n log[1 − ½ t'(Σ⁻¹ ⊗ (1/n)x_j x_j')t + R_{j,n}],

where for each t, R_{j,n} = o(1/n). Therefore, if we take n sufficiently large to make

    |−½ t'(Σ⁻¹ ⊗ (1/n)x_j x_j')t + R_{j,n}| < 1,

we will obtain

    log φ_W(t) = −½ t'(Σ⁻¹ ⊗ (1/n) Σ_{j=1}^n x_j x_j')t + o(1),     (6.2.1.3)

since for sufficiently large n, Σ_{j=1}^n |R_{j,n}| = o(1). Finally, we will obtain from assumption (i) that (6.2.1.3) converges to

    −½ t'(Σ⁻¹ ⊗ Q)t,

and this corresponds to the multinormal distribution with mean zero and covariance matrix (Σ⁻¹ ⊗ Q). Therefore, by the Continuity Theorem, we may conclude that (1/√n)(Σ⁻¹ ⊗ X')ε is distributed asymptotically as N[0, Σ⁻¹ ⊗ Q]. □
Corollary 6.2.1.1

Under the assumptions of Theorem 6.2.1.1, (1/√n)X'ε_j has a limiting multinormal distribution with mean zero and variance-covariance matrix σ_jj Q.

Proof: Premultiply

    (1/√n)(I ⊗ X')ε = (Σ ⊗ I)(1/√n)(Σ⁻¹ ⊗ X')ε

by [0, …, 0, I, 0, …, 0], where I is located at the j-th position, and obtain from Theorem 6.2.1.1 that (1/√n)X'ε_j has a limiting multinormal distribution with mean zero and variance-covariance matrix σ_jj Q. □
One of the asymptotic distribution results about the joint DMMSE estimator is contained in the following theorem.

Theorem 6.2.1.2

For the model (y, (I ⊗ X)β, Σ ⊗ I), |Σ| ≠ 0, assume that

(i) (1/n)X'X converges, as n → ∞, to a p.d. matrix Q;

(ii) ||xᵢ|| ≤ m for all i, where m is a given but fixed number and xᵢ is the i-th column of X'.

Then √n(β̂** − β) has a limiting multinormal distribution with mean zero and variance-covariance matrix (Σ ⊗ Q⁻¹).

Proof: From the definition of β̂** given by (6.2.5), we obtain

    √n(β̂** − β) = Aₙ√n(β̃ − β) − (k⁻²/√n)Aₙ(Σ̂ ⊗ ((1/n)X'X)⁻¹)β,     (6.2.1.4)

where

    Aₙ = [I + k⁻²(Σ̂ ⊗ (X'X)⁻¹)]⁻¹.                                  (6.2.1.5)

Observe that, by defining Bₙ = (1/√n)(I ⊗ X')ε, we obtain

    √n(β̃ − β) = (I ⊗ ((1/n)X'X)⁻¹)Bₙ = (I ⊗ ((1/n)X'X)⁻¹)(Σ̂ ⊗ I)(1/√n)(Σ̂⁻¹ ⊗ X')ε,   (6.2.1.6)

where

    (1/√n)(Σ̂⁻¹ ⊗ X')ε = (1/√n)(Σ⁻¹ ⊗ X')ε + ((Σ̂⁻¹ − Σ⁻¹) ⊗ I)Bₙ.

The second vector on the right hand side of this last equation has as its j-th element

    Σ_{α=1}^p (σ̂^{jα} − σ^{jα})(1/√n)X'ε_α.                          (6.2.1.7)

It has been shown in Corollary 6.2.1.1 that (1/√n)X'ε_α ~ AN(0, σ_αα Q), and by the consistency of Σ̂, which implies (by the continuity of the inverse of nonsingular matrices) that σ̂^{jα} − σ^{jα} → 0, we may conclude that

    (σ̂^{jα} − σ^{jα})(1/√n)X'ε_α → 0   in probability,

and, therefore, (6.2.1.7) converges in probability to zero. Hence, by (6.2.1.6) and Theorem 6.2.1.1, √n(β̃ − β) has the same limiting distribution as

    (I ⊗ Q⁻¹)(Σ ⊗ I)(1/√n)(Σ⁻¹ ⊗ X')ε,

which is AN(0, Σ ⊗ Q⁻¹). Now, since (1/n)X'X → Q, we have Aₙ → I and

    (k⁻²/√n)Aₙ(Σ̂ ⊗ ((1/n)X'X)⁻¹)β → 0.

Therefore, by (6.2.1.4), we may conclude that

    √n(β̂** − β) ~ AN(0, Σ ⊗ Q⁻¹). □
In Section 5.6 we have found that, for the case when we restrict ourselves to the model (y, Xβ, Iσ²), the DMMSE estimator is equivalent to the ridge regression estimator of Hoerl and Kennard; namely,

    β̂ = [I + c⁻²(X'X)⁻¹]⁻¹β̃,

where β̃ is the OLS estimator. An obvious extension of the ridge regression estimator to a joint regression estimator is

    β̂_RR = [I + (γ̂/k²)(I ⊗ (X'X)⁻¹)]⁻¹β̃,                           (6.2.1.8)

where γ̂ is a generalized variance derived from Σ̂. In the following theorem the asymptotic distribution of the estimator defined in (6.2.1.8) is presented.
Theorem 6.2.1.3

Under the assumptions of Theorem 6.2.1.2, √n(β̂_RR − β) has a limiting multinormal distribution with mean zero and variance-covariance matrix Σ ⊗ Q⁻¹, that is,

    √n(β̂_RR − β) ~ AN(0, Σ ⊗ Q⁻¹).

Proof: Following the lines of the proof of Theorem 6.2.1.2, and because of the form of the joint ridge regression estimator, we have

    √n(β̂_RR − β) = −(γ̂k⁻²/√n)A₁ₙβ + A₁ₙBₙ,

where

    A₁ₙ = [(I ⊗ (1/n)X'X) + (γ̂k⁻²/n)I]⁻¹   and   Bₙ = (1/√n)(I ⊗ X')ε.

Now, A₁ₙ → (I ⊗ Q⁻¹) and

    (γ̂k⁻²/√n)A₁ₙβ → 0;

therefore, √n(β̂_RR − β) has the same limiting distribution as (I ⊗ Q⁻¹)Bₙ, which, by Theorem 6.2.1.1, is AN(0, Σ ⊗ Q⁻¹). □
From Theorems 6.2.1.2 and 6.2.1.3, and a result given by Theil [20, p. 400], we may conclude that the joint LS, joint DMMSE, and joint ridge regression estimators have the same limiting distribution.
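The finite-sample decomposition used in the proof of Theorem 6.2.1.3 is an exact algebraic identity, and can be checked without any asymptotics. The sketch below (NumPy, synthetic data; γ̂ and k² are assumed fixed values) verifies it for one realization.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m, p = 50, 3, 2
X = rng.normal(size=(n, m))
beta = rng.normal(size=p * m)
eps = rng.normal(size=p * n)
y = np.kron(np.eye(p), X) @ beta + eps
gam, k2 = 1.5, 4.0                        # assumed gamma-hat and k^2

# Joint ridge estimator (6.2.1.8)
XtX = X.T @ X
joint_ls = np.kron(np.eye(p), np.linalg.inv(XtX) @ X.T) @ y
rr = np.linalg.solve(
    np.eye(p * m) + (gam / k2) * np.kron(np.eye(p), np.linalg.inv(XtX)),
    joint_ls)

# Decomposition from the proof: sqrt(n)(rr - beta) = -(gam/k^2/sqrt(n)) A1n beta + A1n Bn
A1n = np.linalg.inv(np.kron(np.eye(p), XtX / n) + (gam / (k2 * n)) * np.eye(p * m))
Bn = np.kron(np.eye(p), X.T) @ eps / np.sqrt(n)
lhs = np.sqrt(n) * (rr - beta)
rhs = -(gam / (k2 * np.sqrt(n))) * A1n @ beta + A1n @ Bn

assert np.allclose(lhs, rhs)
```

As n grows, the first term on the right vanishes and A₁ₙ approaches I ⊗ Q⁻¹, which is the content of the asymptotic argument.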
6.3 Comparison Between The Variances of The Joint LS, Joint DMMSE and Joint Ridge Regression Estimators

We have concluded in the last section that the estimators under consideration have the same limiting distribution. The obvious next step is to compare, for small samples, their mean square errors. It was found that the expressions involved were too complicated and that nothing useful could be learned. It is for this reason that in this section we shall only be concerned with the comparison of their variances.

Note first that both the joint DMMSE and joint ridge regression estimators are transformations of the joint LS estimator; indeed,

    β̂** = Aβ̃   and   β̂_RR = Bβ̃,

where the variance of the joint LS estimator is

    Var(β̃) = D = Σ ⊗ (X'X)⁻¹,                                        (6.3.1)

so that

    Var(β̂**) = ADA,                                                  (6.3.2)

    Var(β̂_RR) = BDB,                                                 (6.3.3)

with

    A = [I + k⁻²(Σ ⊗ (X'X)⁻¹)]⁻¹,   B = [I + (γ/k²)(I ⊗ (X'X)⁻¹)]⁻¹,   (6.3.4)

and γ is any generalized variance derived from Σ̂ (e.g., γ = tr[Σ̂]/p or γ = |Σ̂|^{1/p}). For these variance comparisons Σ and γ are treated as fixed, so that A and B are symmetric constant matrices.
6.3.1 Comparison Between Var(β̃) and Var(β̂_RR)

The criterion that we will be using to compare variance-covariance matrices is the one we have defined in Chapter 5; that is, we shall say

    Var(β̂) ≤ Var(β̃)

if, and only if, Var(β̃) − Var(β̂) is n.n.d. In the next theorem we will present such a comparison.
Theorem 6.3.1.1

For the variance-covariance matrices of the joint LS and joint ridge regression estimators, the following inequality holds:

    Var(β̂_RR) ≤ Var(β̃).

Proof: By definition, and from (6.3.1) and (6.3.3), Var(β̂_RR) ≤ Var(β̃) if, and only if, D ≥ BDB. This occurs if, and only if,

    t'BDBt ≤ t'Dt   for all t.

Since D is p.d., this happens if, and only if,

    sup_t t'BDBt / t'Dt ≤ 1,

and, using the result in Problem 22 in Rao [17, p. 74], this supremum equals the largest eigenvalue, so that the condition is

    Ch_M(BDBD⁻¹) ≤ 1.                                                (6.3.1.1)

Now let P = I ⊗ (X'X)⁻¹ and observe that the following relations hold:

    D = (Σ ⊗ I)P = P(Σ ⊗ I),   BP = PB,   BD = DB.                   (6.3.1.2)

Using the above results, we find that (6.3.1.1) holds if, and only if,

    Ch_M(B²) ≤ 1.                                                    (6.3.1.3)

Since the eigenvalues of B are of the form (1 + (γ/k²)λ)⁻¹ with λ an eigenvalue of P, (6.3.1.3) holds if, and only if,

    (γ/k²)λ_pm ≥ 0,

where λ_pm is the smallest eigenvalue of P. Hence, the inequality holds, since γ > 0 and P is p.d. □
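A direct numerical check of Theorem 6.3.1.1 (NumPy, synthetic X and assumed fixed Σ, γ, k²): the difference D − BDB has no negative eigenvalues, up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, p = 10, 3, 2
X = rng.normal(size=(n, m))
S = np.array([[1.0, 0.4],
              [0.4, 1.5]])                       # assumed Sigma
gam, k2 = 1.2, 3.0                               # assumed gamma and k^2

XtX_inv = np.linalg.inv(X.T @ X)
P = np.kron(np.eye(p), XtX_inv)
D = np.kron(S, XtX_inv)                          # Var of the joint LS estimator
B = np.linalg.inv(np.eye(p * m) + (gam / k2) * P)

# Var(joint LS) - Var(joint RR) = D - BDB is n.n.d.
assert np.linalg.eigvalsh(D - B @ D @ B).min() >= -1e-8
```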
6.3.2 Comparison Between Var(β̃) and Var(β̂**)

The comparison between Var(β̃) and Var(β̂**) will be presented in the following theorem.

Theorem 6.3.2.1

For the variance-covariance matrices of the joint LS and joint DMMSE estimators, the following inequality holds:

    Var(β̂**) ≤ Var(β̃).

Proof: Using arguments similar to those in Theorem 6.3.1.1, and from (6.3.2),

    Var(β̃) − Var(β̂**) = D − ADA

is n.n.d. if, and only if,

    Ch_m[(k⁻²D + I)²] ≥ 1,

where A and D have been defined in (6.3.4) and (6.3.1), respectively (note that A⁻¹ = I + k⁻²D). This happens if, and only if,

    δ_pm / k² ≥ 0,

where δ_pm is the smallest eigenvalue of the D matrix. Since D is p.d., this is true. □
6.3.3 Comparison Between Var(β̂**) and Var(β̂_RR)

In this section we will be concerned with the comparison between the variances of the joint DMMSE and joint ridge regression estimators. This will be elaborated in the following theorem.

Theorem 6.3.3.1

The variance-covariance matrices of the joint DMMSE and ridge regression estimators are not comparable in the sense defined.

Proof: The proof of this theorem will be done by contradiction. We will assume first that

    Var(β̂_RR) − Var(β̂**) = BDB − ADA                                (6.3.3.1)

is a n.n.d. matrix, where D, A and B have been defined in (6.3.1) and (6.3.4), respectively. Now, by a similar argument to that in Theorem 6.3.1.1, and using the relations between P and D given in (6.3.1.2), we have that (6.3.3.1) is true if, and only if,

    Ch_m[B²(I + k⁻²D)²] ≥ 1.                                         (6.3.3.2)

Now, using the definitions given in (2.5), writing X'X = VΛV' with Λ = diag(λᵢ), and letting Σ = EΓE', where E'E = EE' = I and Γ = diag(γⱼ), we have from this

    I + (γ/k²)P = (I ⊗ V)(I ⊗ M)(I ⊗ V'),

where

    M = diag(1 + γ/(k²λᵢ)) = diag((γ + k²λᵢ)/(k²λᵢ)).

Hence,

    B² = (I + (γ/k²)P)⁻² = (I ⊗ V)(I ⊗ R)(I ⊗ V'),

where

    R = diag((k²λᵢ/(γ + k²λᵢ))²).

Now,

    (I + k⁻²D)² = (E ⊗ V)(I + Γ ⊗ (k²Λ)⁻¹)²(E' ⊗ V').

Thus,

    Ch_m[B²(I + k⁻²D)²]
        = Ch_m[(E' ⊗ V')(I ⊗ V)(I ⊗ R)(I ⊗ V')(E ⊗ V)(I + Γ ⊗ (k²Λ)⁻¹)²]
        = Ch_m[(I ⊗ R)(I + Γ ⊗ (k²Λ)⁻¹)²].

But

    (I ⊗ R)(I + Γ ⊗ (k²Λ)⁻¹)² = diag(((k²λᵢ + γⱼ)/(γ + k²λᵢ))²).

Therefore, (6.3.3.2) holds if, and only if,

    ((k²λᵢ + γⱼ)/(γ + k²λᵢ))² ≥ 1   for all i and j,

that is, if, and only if, γⱼ ≥ γ for all j. Since γ is an average of the eigenvalues γⱼ of Σ, the smallest γⱼ falls below γ whenever the γⱼ are not all equal, and we arrive at a contradiction.

Assume now that

    Var(β̂**) − Var(β̂_RR) = ADA − BDB                                (6.3.3.3)

is a n.n.d. matrix. Using the relationships between P and D given in (6.3.1.2), we may obtain that (6.3.3.3) is true if, and only if,

    Ch_m[(I + Γ ⊗ (k²Λ)⁻¹)⁻²(I ⊗ W)] ≥ 1,

where

    B⁻² = (I ⊗ V)(I ⊗ W)(I ⊗ V'),   W = diag(((k²λᵢ + γ)/(k²λᵢ))²).

The diagonal entries of the product are ((k²λᵢ + γ)/(k²λᵢ + γⱼ))², so this last inequality holds if, and only if,

    γ ≥ γ₁,                                                          (6.3.3.4)

where γ₁ is the largest eigenvalue of Σ = EΓE'. Clearly (6.3.3.4) is a contradiction.

In review, what we have is the following:

(i) If we assume Var(β̂_RR) − Var(β̂**) is n.n.d., we arrive at a contradiction.

(ii) If we assume Var(β̂**) − Var(β̂_RR) is n.n.d., we also arrive at a contradiction.

Therefore, from (i) and (ii) we may conclude that Var(β̂**) and Var(β̂_RR) are not comparable in the sense that we have defined. □

From Theorem 6.3.1.1 and Theorem 6.3.2.1, we may conclude that for small samples both the joint DMMSE and joint ridge regression estimators have "smaller" variance-covariance matrices than the joint LS estimator. Unfortunately, from Theorem 6.3.3.1 we must conclude that the variance-covariance matrices of the joint DMMSE and joint ridge regression estimators are not comparable in the sense we have defined before.
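The non-comparability of Theorem 6.3.3.1 can be exhibited concretely. In the sketch below (NumPy, synthetic X; Σ is assumed diagonal with distinct eigenvalues, and γ = tr[Σ]/p) the difference BDB − ADA is indefinite: it has both a negative and a positive eigenvalue, so neither variance dominates.

```python
import numpy as np

rng = np.random.default_rng(8)
n, m, p = 10, 3, 2
X = rng.normal(size=(n, m))
S = np.array([[1.0, 0.0],
              [0.0, 4.0]])               # Sigma with distinct eigenvalues
gam = np.trace(S) / p                    # generalized variance tr(Sigma)/p
k2 = 2.0

XtX_inv = np.linalg.inv(X.T @ X)
D = np.kron(S, XtX_inv)
A = np.linalg.inv(np.eye(p * m) + D / k2)
B = np.linalg.inv(np.eye(p * m) + (gam / k2) * np.kron(np.eye(p), XtX_inv))

diff = B @ D @ B - A @ D @ A             # Var(RR) - Var(DMMSE)
eig = np.linalg.eigvalsh(diff)
assert eig.min() < -1e-9 and eig.max() > 1e-9   # indefinite: not comparable
```

The signs track the proof: modes belonging to eigenvalues of Σ below γ favor the DMMSE estimator, and modes above γ favor the ridge estimator.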
7. PROBLEMS FOR FURTHER RESEARCH

In this section we will present some ideas and problems that arose in the course of this research and which warrant further investigation.

Problem 7.1

Considering the setting in Chapter 4, Obenchain¹ has proposed that, instead of using the Euclidean length of the parameter vector β, we could consider β whose squared length in the norm of a p.d. matrix Q is k², that is, k² = β'Qβ, and proceed along the same lines of Chapter 4 with this modification. The same modification could be introduced in Chapter 5.

Problem 7.2

Under the setting of Chapter 5, try to find (if it exists) the supremum in (7.1), or prove the inequality (7.2), with equality holding for b = 0. If (7.2) is not true, and if (7.1) exists, use it to obtain the DMMSE estimator.

Problem 7.3

Compare the MSE matrices for the joint DMMSE and joint ridge regression estimators.

Problem 7.4

Let θ̂ be a biased estimator of θ. Using θ̂, construct an at least (1 − α) confidence interval for θ.

Problem 7.5

For the matrix form (Chapter 5) of the minimax MSE criterion, the optimum estimator for the class of OLS estimators calculated subject to false restrictions has not been obtained.

¹Obenchain, R. L., September 13, 1976. Personal communication. Professor, Department of Applied Statistics, Bell Laboratories, Holmdel, N.J. 07733.
8. LIST OF REFERENCES

1. Baranchik, A. J. 1964. Multiple regression and estimation of the mean of a multivariate normal distribution. Tech. Report No. 51, Dept. of Statistics, Stanford Univ., Stanford, Calif.

2. Chapman, D. 1974. An extension and investigation of the properties of the ridge regression. Unpubl. Ph.D. Thesis, Dept. of Statistics, N.C. State Univ., Raleigh, N.C. Univ. Microfilms, Ann Arbor, Mich.

3. Chipman, J. S. 1964. On least squares with insufficient observations. J. Amer. Stat. Assoc. 59:1078-1111.

4. Conniffe, D. and J. Stone. 1973. A critical view of ridge regression. The Statistician 22:181-187.

5. Cramer, H. 1966. Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, N.J.

6. Foster, M. 1961. An application of the Wiener-Kolmogorov smoothing theory to matrix inversion. J. Soc. Ind. and App. Math. 9:387-392.

7. Goldstein, M. and A. F. M. Smith. 1974. Ridge type estimators for regression analysis. J. Royal Stat. Soc., Ser. B, 36:284-291.

8. Hoerl, A. E. 1962. Application of ridge analysis to regression problems. Chem. Eng. Progr. 58:54-59.

9. Hoerl, A. E. and R. W. Kennard. 1970. Ridge regression: Applications to nonorthogonal problems. Technometrics 12:55-67, 69-82.

10. James, W. and C. Stein. 1960. Estimation with quadratic loss. Proc. of Fourth Berkeley Symp. on Math. Stat. and Prob. Univ. of Calif., Berkeley, Calif.

11. Lancaster, P. 1969. Theory of Matrices. Academic Press, Inc., New York City, N.Y.

12. Marcus, M. and H. Minc. 1969. Survey of Matrix Theory and Matrix Inequalities. Allyn and Bacon, Boston, Mass.

13. Marquardt, D. W. 1970. Generalized inverse, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 12:591-612.

14. Mayer, L. S. and T. A. Willke. 1973. On biased estimation in linear models. Technometrics 15:497-508.

15. Pringle, R. M. and A. A. Rayner. 1971. Generalized Inverse Matrices With Applications to Statistics. Hafner Publ. Co., New York City, N.Y.

16. Rao, C. R. 1971. Unified theory of linear estimation. Sankhya, Ser. A, 33:371-394.

17. Rao, C. R. 1973. Linear Statistical Inference and Its Applications. (Second Edition). John Wiley and Sons, Inc., New York City, N.Y.

18. Toro-Vizcarrondo, C. and T. D. Wallace. 1968. A test of the mean square error criterion for restrictions in linear regression. J. Amer. Stat. Assoc. 63:558-572.

19. Sclove, S. L. 1968. Improved estimators for coefficients in linear regression. J. Amer. Stat. Assoc. 63:596-606.

20. Theil, H. 1971. Principles of Econometrics. John Wiley and Sons, Inc., New York City, N.Y.

21. Zellner, A. 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Stat. Assoc. 57:348-368.