•
WEIGHTED RIDGE REGRESSION
by
JOHN WARREN
A thesis submitted to the Graduate Faculty of
North Carolina State University
i~ partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
DEPARTMENT OF STATISTICS
RALEIGH
198 1
APPROVED BY:
';:::\
/'
..
/
T
,//:'/\'
.... :/ t \". . ~
a
,6
,
I,
/
f
/
V ......
A.. •. ,
,
'.
/
I
L,
f?7/~---
Co-chairman 'of Advisory Committee
~~tv~-~ ~V~ w:{/~q---Co-chairman of Advisory Committee
ABSTRACT
WARREN, JOHN.
Weighted Ridge Regression.
(Under the direction of
F. G. GIESBRECHT and A. R. MANSON.)
The biased estimation technique known as ridge regression assumes
that all the data points have the same importance or weighting.
The
technique of weighted ridge regression allows the points to carry
different weights and such an action has a profound effect on the
variance and bias-squared properties of the resulting estimators.
Many
of the properties of this weighted estimator are compared and contrasted
with those of ridge regression and ordinary least squares.
The
weightings for which weighted ridge estimation is an improvement over
ridge regression are derived, and the extension of the weighting
technique to specific cases of practical interest are examined.
ii
BIOGRAPHY
John Warren was born on November 16, 1945, in Barnet, London,
England.
He received his elementary education in Hillingdon, London,
and his secondary education at Harrow Technical College, London,
graduating in 1964.
He spent a year as a research technician with the Cape Asbestos
Group before continuing his studies at Sir John Cass College of the
University of London.
He was awarded a Bachelor of Science degree in
Physics and Mathematics in 1968.
He entered North Carolina State University in 1968 and received
the last Master of Experimental Statistics degree awarded by the
university in 1970.
He was appointed Visiting Instructor of Statistics
in 1975 in the Department of Statistics, at North Carolina State
University.
In June 1980 he accepted an appointment as Statistician in
the Environmental Protection agency, Washington, D.C.
He married the
former Miss Katherine Mary Brooks of Garner, North Carolina, in 1974.
iii
ACKNOWLEDGEMENTS
To give appreciative thanks to all persons who have inspired,
counseled, and assisted in this study, would be impossible.
The
author does, however, wish to single out for special mention
Dr. F. G. Giesbrecht and Dr. A. R. Manson for their invaluable advice
and encouragement in the completion of this study.
Appreciation also
is extended to the other members of the committee, Drs. R. G. D. Steel,
/
A. H. E. Grandage, R. A. Schrimper and T. W. Reiland.
The author
also wishes to express his sincere appreciation to Drs. D. J. Drummond
and P. M. Burrows for their encouragement and friendship through the
past years.
A special note of gratitude should be sounded for the author's
secretary, Judith B. Adcock, for her careful editing and typing of
the dissertation.
Finally, the author wishes to thank his wife, Kathy, for her
patience during the course of this study.
iv
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS •
iii
iv
LIST OF TABLES
LIST OF FIGURES . .
1.
INTRODUCTION
1.1
2.
3.
4.
v
Preliminary Notation
3
LITERATURE REVIEW
2.1
The Ordinary Least Squares Estimator . • . • • •
3
2.2
The Minimum Euclidean Squared-distance Estimator .
6
2.3
The Artificially Restricted Estimator
8
2.4
The James-Stein Estimator
2.5
The Marquardt Estimator
2.6
The Ridge Regression Estimator of Hoerl and Kennard
.•. .
• . ..
• • • .
9
• . . • • • • . • • . • . •
11
14
WEIGHTED ESTIMATION
3.1
Basic Properties. • • • . . • • . • . • . • • . . .
20
3.2
Weighted Estimation and the Design of Experiments
26
3.3
Restricted O.L.S., Weighted Estimation, Infinite-n
30
3.4
Changing the Sign by Use of Weighted Estimation
34
WEIGHTED RIDGE REGRESSION
4.1
Basic Properties of Weighted Ridge Regression
37
4.2
Relationship of Weighted Ridge to Other Estimators.
42
4.3
The Changing of the k-value for Fixed n-Weighting
47
4.4
Bias-Squared Comparisons for Different Estimators
51
v
5.
6.
4.5
Comparison of Bias-Squareds by a Geometric Argument
53
4.6
The Sequential Weighted Ridge Estimator
61
THE CALCULATION OF n-WEIGHTS
5.1
A-optimality of the Variance-covariance Matrix
70
5.2
A-optimality of the Bias-squared Matrix
71
5.3
D-optimality of the Variance-covariance Matrix
73
5.4
Minimum Residual Sum of Squares
75
5.5
Stability of Estimates of Parameters
76
.•••
...·
Q** Standard
····
S* Standard .
····
S** Standard . . . . .
.····
5.5.1
The Q* Standard
5.5.2
The
5.5.3
The
5.5.4
The
.
79
80
84
88
5.6
Sign Change of an Estimated Parameter
90
5.7
Iterative Sequential n-weighting
90
SPECIFIC WEIGHTED RIDGE REGRESSIONS
6.1
Functional Weighted Ridge Regression.
95
6.1.1
The single eigenvector case
95
6.1.2
The p-eigenvectors case •.
101
6.2
Prior Ridge Regression • • . • . • . • • .
104
6.3
Variance Weighted Ridge Regression
108
7•
AN EXAMPLE
111
8.
SUMMARY
130
REFERENCES
133
vi
LIST OF TABLES
Page
2.1
Commonly used techniques to estimate the k-value •
19
3.1
An interpretation of n-weighting • • •
24
7.1
Initial values for the example (p=5, n =9)
7.2
Transformed values for the example (p=5, n =9)
113
7.3
Ridge regression values
115
7.4
Iterative n-weightings for A-optimality of the variance-
112
o
o
covariance matrix
7.5
116
Iterative n-weightings for D-optimality of the variancecovariance matrix
7.6
122
n-weightings for 1st iteration, A-optimality of the
variance-covariance matrix . . • . . • . . .
7.7
n-weightings for 1st iteration, D-optimality of the
variance-covariance matrix . .
7.8
123
-124
One point infinitely weighted, A-optimality of the
bias matrix, then A-optimality of the variancecovariance matrix
7.9
.............
126
Two points infinitely weighted, A-optimality of the
bias matrix, then A-optimality of the variancecovariance matrix
127
7.10 The points infinitely weighted, A-optimality of the
bias matrix, the A-optimality of the variance-covariance
matrix • . •
128
vii
7.11 Four points infinitely weighted, A-optimality of the
bias matrix, the A-optimality of the variance-covariance
matrix
............
129
viii
LIST OF FIGURES
Page
2.1
Trace mean square error for the ridge estimator
2.2
The ridge-plot
4.1
The difference in bias-squareds resulting in all
....
18
pas i tive eigenvalues • • • • • • • • • • • •
4.2
16
56
The difference in bias-squareds resulting in
indeterminant eigenvalues due to different
orientation of axes
4.3
56
The differences in bias-squareds resulting in
indeterminant eigenvalues due to the interaction
of ellipses
4.4
59
The differences in bias-squareds resulting in
positive equivalent eigenvalues
= 2, n o = 6
5.1
An augmented design for p
7.1
The ridge-plot for estimates in standa.rd
ridge regression .
7.2
.
.........
117
118
Individual weighting of points for A-optimality of
the variance-covariance matrix
7.4
82
Iterative n-weightings for A-optimality of the
variance-covariance matrix
7.3
59
119
Beta-coefficients as a function of n-weights chosen,
k = 0.01
............... ...........
. 120
"
1.
INTRODUCTION
There are many situations when the possible linear relationship
of a response variable to a set of known explanatory variables may
be investigated.
denoted by
Y ,
i
variables by
If the ith observation of the response variable is
and the corresponding values of the set of known
X·l,X·Z,···,X.
,
~
~
~p
then the functional relationship between
the Y-variable and the X-variables may be expressed as
(1.1)
where
ao
is a constant,
Sj
is a regression coefficient
j
= 1,2, ••• ,p
,
and
is an unobservable error
i
= 1,2, •.• ,no
The estimation of the regression coefficients of equation (1.1) is
usually done by the technique known as ordinary least squares estimation
(O.L.S.).
When the model describing the linear relationship (equation
(1.1)) is correct, then O.L.S. estimation provides unbiased estimates
of the parameters
13 O,S 1" ••13 p •
There are, however, situations when these unbiased estimates do
not give physically meaningful values when put into the context of the
engineering, biological or economic situation under investigation.
2
It is under such circumstances that alternatives to a.L.S. estimation
are sought.
There are situations where the cause of meaningless
estimates may be an inadequate choice of model, choice of an incorrect
model, misspecification of the distribution of error terms in the
model, etc.
These problems are, to a certain degree, correctable
when appropriate statistical techniques are used.
However, situations
arise where the choice of model may be correct, the choice of distribution of error
terms appropriate, but a.L.S. estimation may fail to
yield satisfactory estimates.
If the initial set of known variables
is called the design matrix (of dimension
n
responses will be called the observed vector
o
x p), then the set of
(n
o
x 1).
When the design
matrix is of full column rank but at least one column can nearly be
expressed as a linear function of the remaining columns, the matrix is
said to be suffering from multicollinearity or ill conditioning.
Exact
multicollinearity would imply a perfect linear function existed between
the columns, and the resulting design matrix would be of less than full
column rank.
The key disadvantage associated with a.L.S. estimation, in the presence of multicollinearity, is that the estimates have unacceptably large
variances and consequently may be of unreasonable magnitude or sign.
It will be assumed that only the initial set of variables and responses
are available to the analyst.
If further controlled experimentation or
data collection were possible, then the "correlation bonds" between the
variables in the design matrix may be weakened by the judicious choice of
levels of the known variables.
This approach has been investigated by
Silvey (1969), Dykstra (1971), and others.
3
1.1
Preliminary Notation
The severity of multicollinearity may be reduced by transforming
the initial design matrix into the standardized form,
standardized form, the
matrix.
"x'x"
In this
matrix is in the form of a correlation
The linear model in the original variables may be written as
(1.1.1)
where
yO
is an
n
x 1
o
vector of observable random variables and
represents the response or dependent variable,
is the
n
o
x 1
unit vector,
is the
initial design matrix and represents the
regressor or independent variables,
=
000
[x ,x , •.. ,x I ,
-l -2
-p
SO
is the
p x 1
Q
f.>o
is an unknown constant,
E
is the
n
o
vector of unknown parameters,
x 1
vector of unobservable independent random
errors, assumed to be normally distributed with mean zero
and variance
a
2
The standardized model is
Y
= xa +
E
(1.1.2)
4
where
-
1
y = . n
x
=
Yj
o
--
[xl,xZ'""x
-p ]
o
-J
-J
(x. - x.)
o
-J
-J
~
0
{ (x. - x.) (x.
-J
x. = x. _j
-J
J
and
x
j
= -10
n
x. J •
-J -
The standardized coefficients may be related to the original
coefficients by
(1.1.3)
where
S
is a diagonal matrix with a jth element
- ~ 0 {(x o .-x.)
(x.-x.» ~ •
- - J -J
-J-J
This standardization is sometimes referred to as "the removal of
unnecessary multicollinearity" by analysts.
5
2.
LITERATURE REVIEW
The estimation of the unknown parameters,
~,
in the standard
model
Y
X13
+
E
occupies a considerable portion of modern statistical literature.
The
principal estimators and the paragraph numbers discussing them are:
2.1
The Ordinary Least Squares Estimator,
2.2
The Minimum Euclidean Squared-distance Estimator,
2.3
The Artificially Restricted Estimator,
2.4
The James-Stein Estimator,
2.5
The Marquardt Estimator, and
2.6
The Ridge Regression Estimator of Hoer1 and Kennard.
Other techniques of estimation, such as nonparametric estimation,
will not be considered in this dissertation.
2.1.
The Ordinary Least Squares Estimator.
The estimation technique and discussion of the properties of the
O.L.S. estimator
(2.1.1)
may be found in standard texts such as Goldberger (1964) or Searle (1971).
If the Euclidean squared-distance of the O.L.S. estimator from the
true parameter vector
§
is defined as
(2.1.2)
6
then
(2.1.3)
with
= 20 4 tr(X'X) -2 •
(2.1.4)
Equations (2.1.3) and (2.1.4) may be written in canonical form as
(2.1.5)
and
(2.1.6)
where
A.
1
is small,
is the ith eigenvalue of
2
E(L )
X'X.
It is clear that when
2.2.
P
of equation (2.1.5) is large, implying that the O.L.S.
estimates may be far from the true parameter values.
it follows that
A
2
V(L )
As
A
p
is small,
of equation (2.1.6) will be large.
The Minimum Euclidean Squared-distance Estimator
The estimator which minimizes the expected Euclidean squared-
distance from the estimates to the true parameter values is defined to
be
~E'
If the estimator is a linear function of the observed values
then
.§E
= F'Y
(2.2.1)
By definition of expected Euclidean squared-distance from estimate to
parameter,
7
(2.2.2)
Substitution of equation (2.2.1) into equation (2.2.2) yields
tr{(F'X-I)f3f3'(F'X-I)' + iFF'}
Minimization of the expression for
E(L
2
)
E
•
(2.2.3)
given in equation (2.2.3)
with respect to F yields,
(2.2.4)
By use of Corollary A.l.3 of the Appendix, Equation (2.2.4) may be
expressed as
F' =
..1:....2
0
f3f3'X'
--
_..1:....12
0
1
~2+.§ 'X
~
f3f3'X 'Xf3f3'X' ,
'x.§j --
--
(2.2.5)
and substituted into equation (2.2.1) to give
(2.2.6)
The expected value of
and consequently,
~E
§E
of Equation (2.2.6) is
underestimates
f3.
It should be noted that substitution of estimates for parameters
in Equation (2.2.6) yields an estimator whose exact properties are
unknown.
8
2.3.
The Artificially Restricted Estimator
It is well known that placing restrictions on
RS
=r
S
in the form
results in estimators with smaller variance than those obtained
by a.L.S. estimation.
Formally, this means that
is positive semi-definite.
~
placed on
V(~) - V(S
' t e d)
-res tr1C
-
This holds true even when the restrictions
are not valid (Goldberger, 1964).
Toro-Vizcarrondo and
Wallace (1968) investigated this phenomenon with respect to its
influence on mean square error estimation.
With the artificial restriction
RS = r
(where
R is
q x p
and
of rank q), the restricted estimator is
_
A
~restricted = .§
(2.3.1)
and has a variance-covariance matrix,
The difference in mean squared error, MSE(~) - MSE(~restricted)'
is
(2.3.3)
This matrix is positive semi-definite when
(2.3.4)
is positive semi-definite.
9
The condition for
QR to be positive semi-definite is
(R~-E) {R(X"X)-l R ..} -l(R~-E)
2a
The
(2.3.5)
2
A of Equation (2.3.5) may be related to the non-centricity
parameter
of a non-central F-distribution.
Although this technique of
artificially restraining an estimator has some merits, it is seldom
used by analysts owing to the difficulty of determining the optimum
restriction.
2.4.
The James-Stein Estimator
James and Stein (1961) defined an estimator that was a function
of the O.L.S. estimator of the form
_
A
~JS =
(2.4.1)
dB
Their criterion for the determination of
d
was to minimize the expected
Euclidean squared-distance between the estimates to the corresponding
values.
If "d" of equation (2.4.1) is
d
where
0 < c < 2(p-2),
=
2
ca
1--
(2.4.2)
then
(2.4.3)
If
0
of
c
.~
2
is unknown, but estimated from the data by
in equation (2.4.2) are
_·..··c··-
j'
s2, then the limits
10
o<
2(p-2)(n -p)
c <
o
-(u -p+2)
(2.4.4)
o
c =
(p-2)(n -p)
.......::..0_ _
(2.4.5)
(n -p+2)
o
Sc10ve (1968) points out that when the elements of the vector
S
can be separated into two distinct parts
a)
parameters where unbiased estimators are desired, and
b)
parameters where James-Stein estimators are desired,
then O.L.S. estimation may be combined with biased James-Stein estimation.
Let
S" = rS"
L:""a
with the O.L.S. estimate of
3
-a
James-Stein estimate of
denoted by
S
-b
denoted by
B (p a
-a
x 1),
~bJS(Pb x 1) •
and the
Define
(2.4.6)
~
where
~b
is the O.L.S. estimate of
~b'
and i f
is such that
11
then
(2.4.7)
with a minimum mean square error when
(p-2)(n -p)
o
(n
It may be shown that when
a
2
(2.4.8)
-P+2)
o
is known, the expected shrinkage in
the James-Stein estimator is more than for the case where
and estimated from the data.
a
2
is unknown,
The greatest disadvantage with James-Stein
estimation is that the theoretical properties of the estimator have only
been investigated for the case of complete orthogonality of design,
X~X
= I. When the design used is not orthogonal, James-Stein-type
estimation yields an estimator with intractable properties.
2.5
The Marquardt Estimator
The preceding estimators have assumed the design matrix
of full column rank which implies that
X X has
p
X to be
non-zero eigenvalues.
There are occasions when some of the smallest eigenvalues are very close
to zero and severe multicollinearity exists.
As Marquardt (1970) reasons,
12
practical estimation problems give rise to matrices
having eigenvalues that may be grouped into three
types: substantially greater than zero, slightly
greater than zero, precisely zero (except for rounding
error). In computations it may sometimes be difficult
to distinguish between adjacent types.
X~X
The Marquardt, or fractional rank estimator, replaces the
(X~X)+ , a matrix of rank r, (r
used in O. L. S. estimation by
(X "X)-l = • L~
1
i":"
J=l
J
~
~J' ~J' ,
,
PllP~
P
<; p) •
as
Marquardt defined
where
(X~X)-l
[~1""'~
J. He then defined
-p
(X"X)+ =
r
1
r
(X~X)+r as
..
I "\ ~.~.
j=l j J J
P lIP"
r
r
'
where
and
A
=
r~~-hQ_J.
[
I
p-:J
(2.5.1)
13
Suppose the matrix
r
or
(r+l).
X~X
has an unknown rank, but known to be either
The fractional rank estimator is defined as
(2.5.2)
where
(X'X); = r*
L Al
j=l j
~j~j~
J
+ [d
Ar
r*+l
~r*+l ~;*+l
'
and
d
r
=r
- r
*
0< d
From equation (2.5.2), note that
Ap-r
is the null matrix.
r
<1.
~~ is a biased estimator unless
The variance of the fractional rank estimator
defined in equation (2.5.2) is
(2.5.3)
and the bias-matrix is
(2.5.4)
For the mean square error of
~
to be less than the variance of
ordinary least squares, it is sufficient that
(2.5.5)
14
The major problem with the Marquardt estimator is the determination of
"r."
In
..
s~tuat~ons
where the rank of
X~X
is reasonably large, a cluster-
ing of groups of eigenvalues is common and the selection of appropriate
r*
2.6
(and hence r) difficult.
The Ridge Regression Estimator of Hoerl and Kennard
Hoerl and Kennard (1970) proposed adding a small positive constant
to the main diagonal of
procedure.
X~X
prior to its inversion in the D.L.S.
The ridge estimator is defined as
s,*
=
(X~X+kI) "'lx~~
,
(2.6.1)
This estimator has a variance-covariance matrix,
and bias-matrix,
(2.6.3)
The expected Euclidean squared-distance 'E(L*2}
is defined as
and may be written as '
= cr 2
I
i=l
by application of equations (2.6.2) and (2.6.3).
This expected
Euclidean squared-distance is equivalent to the trace of the mean square
error of
*
(3
If equation (2.6.4) is written in canonical form, then
15
+
I
(2.6.5)
i=l
where
ct = p~
.
The expected Euclidean squared-distance for the O.L.S. estimator
may be compared to that for the ridge regression estimator by defining
(A.a
2 + k 2ct.)
2
1.
1.
(A. + k)2
J
(2.6.6)
.
1.
Ridge regression has a smaller Euclidean squared distance if
~E
of
equation (2.6.6) is positive, which implies that
(6.2.7)
k<
It may be shown that the trace of the variance-covariance matrix of the
ridge estimator (the first term of equation (2.6.5», is a monotonically
decreasing function for positive k-value.
Similarly, the trace bias11
matrix (the second term of equation (2.6.5»
increasing function for positive k-value.
ii
is a monotonically
::il
The behavior of the trace of
il
ii
I'
the mean square error matrix of the ridge estimators is illustrated in
Figure 2.1.
minimizes
I
There is no simple method of obtaining the k-value which
E(L*2)
of equation (2.6.5).
A commonly used. technique is to
determine a k-value which stabilizes the estimates via visual examination
of the ridge-plot.
In the construction of a ridge-plot, individual
estimates are plotted as a
,,
. i
>
}
I
16
trace M.S.E.
A
trace variance-covariance of B
trace M.S.E. of
trace bias-matrix of
*
~
trace variance-covariance of
e*
--I-----...,j~-----...;;;=----------------_..oi~k
k
k
optimal
max
Figure 2.1.
Trace mean square error for the ridge estimator.
17
function of k, and a k
typical example.
sta
b'l'
~
~ty
chosen visually, Figure 2.2 being a
There are many techniques for the choosing of a k-va1ue
to be used in ridge estimation. Table 2.1 gives some of the more
commonly used k-va1ues.
It must be noted that a non-stochastic k-va1ue is needed for the
mathematical properties of the ridge estimator to be valid.
Many of the
k-va1ues proposed in the literature are stochastic and their properties
discussed with reference to simulation experiments.
In this dissertation,
it will be assumed the k-va1ue is of a non-stochastic nature, implying
that the observed values are not used to determine the appropriate
k-value for the ridge estimator.
18
Estimated
B-coefficients
All estimates tend to zero as
k tends to 00.
a.L.S. estimates denoted by
*
k
Region of approximate
stability of estimates'"
kstability
Figure 2.2.
The ridge-plot.
19
Table 2.1.
Commonly used techniques to estimate the
Technique
k~va1ue.
Proponent
(i)
Visual ridge-plot
(ii)
k =
(iii)
Choose k such that
Hoer1 and Kennard (1970)
,,2
_...;,.0_.,-
Newhouse and Oman (1971)
1
(§.. .§)~
§,*"2,* = c i f c'>O,
McDonald and Galarneau
(1975)
Choose k = 0 if c<O, where
=
c
2
(iv)
k - ~".e
Hoer1, Kennard and
Baldwin (1975)
(v)
Choose k via k-equa1ity pre-test
Obenchain (1975)
(vi)
po
k= .f?*".§*
-~
2
(vii)
k =
Hoer1 and Kennard (1976)
,,2
po
r
i=l
A.a..
1. 1.
Lawless and Wang (1976)
2
(viii)
Choose k via a stability index
Vinod (1976)
(ix)
Choose kwith reference to the
eigenvalues of X..X
Marquina (1978)
20
3.
WEIGHTED ESTIMATION
The estimation procedures discussed in Chapter 2 assumed each
observation had the same (or standard) weight.
allows observation to have unequal weights.
~-weighting
where 0
Weighted estimation
This weighting, called
00, is a monotone, smooth function.
< ~ <
with standard weight has a
~-weighting
equal to unity.
A point
With this
system of weighting imposed on the linear model the properties of the
weighted estimator may be investigated.
3.1.
Basic Properties
The observed values,
!,
are related to the independent values, X,
by the linear model,
(3.1.1)
where the error vector
2
10 , and
~,
is normally distributed with mean Q, variance
f is the vector of unknown parameters.
Without loss of generality, consider the weighting of the first
observation.
The observations have a weighting-matrix
~l'
where,
the identity matrix with first element replaced by a weighting,
~.
~ris
21
(3.1.2)
~(l)
It may be easily established that,
(3.1.3 )
with a variance-covariance matrix,
(3.1.4 )
From the Gauss-Markov theorem, the difference between variancecovariance matrices of weighted point and O.L.S. estimators is
positive semi-definite for any TI-weighting other than the trivial case
of all TI.
1
IS
= 1.
The weighting of more than a single point results in the weighted
estimator, defined as,
(3.1.5)
with,
E(~) =~,
(3.1.6 )
and variance-covariance matrix given by,
V(~)
The TI-weighting for use with weighted estimation is,
(3.1.7)
22
Q
n
(3.1.8)
=
o
From equation (3.1.1), the first observation, or data-point, may be
written,
where w is the first row of X, and y the first element of Y.
(3.1.2) may be written as,
- (1
S
-(1)
=
-
+
ww~
1f - -
where,
w is the 1.. th row of X,
-i
~
Yi is the
n
o
)' . W
~
i=l
.th
1.
element of
!.,
.T.T~ = X~X
-1..>L.
1.
and,
n
I
o
w.y. = X~Y.
-1. 1.
i=l
Therefore equation (3.1.10) may be written as,
Equation
23
0.1.11)
§.( 1)
and equation (3.1.7) as,
where,
1
n = -
7f
-
1.
The advantages of using n-weighting include simplicity in
computation, the direct evaluation of the effect of weighting the first
point, and a relationship with the design of an experiment (to be
discussed in Section 3.2 of Chapter 3).
An explanation of n-weighting
and 7f-weighting may be found in Table 3.1.
The extension to weighted estimation may.be easily accomplished
by defining,
n- 1
n
I + D,
o
1
where D is a diagonal matrix with i th diagonal element n. = -7f
1.
- 1.
i
From equation (3.1.5),
0.1.13)
However,
II
n0
L
X"DX =
i=l
and,
n
X"DX =
i
"
II
n.w
.w~,
1.-1.-1.
0
L
i=l
ni~iY i'
"
ii
Ii
24
Table 3.1.
An interpretation of nand n-weighting.
Interpretation
n
-1
-l<n<O
o
O<n<co
co
co
This implies the point under consideration has been completely
eliminated from the data set.
l<n<co
Represents the gradual elimination
of the point from the data set.
In other words, a "discounting" of
its "importance."
1
This is ordinary least squares.
O<n<l
Represents the assigning of greater
"importance" to this point. In
other words, the residual of the
fitted function at this point is
diminished with increasing n
o
The residual at this point is forced
to be zero, the fitted function
"passes" through this point.
25
giving.
n
o
I
i=l
(3.1.14)
n.w. y . ) .
1.-1. 1.
-
The variance-covariance matrix for S is obtained from equation (3.1.1Z).
-
V(~)
n
O
rlx ",x + .£r n.(n.
+
1.
1.
1.=1
(x.
•
no
v
J\.
...
+ r..\' n.w·w.
1.-1.-1.
i=l
where.
n
X
J
........ ZX
JJ
=
)-1
(JZ
2)W.W~]
-1.-1.
•
(3.1.15)
o
\' n:w.w·: .
r..
i=l
1.-1.-1.
These forms for the weighted estimator and its variance are too
cumbersome for efficient algebraic manipulation and so the following
nomenclature is adopted.
For weighted point estimation let
-X...X-
= X... X
+
noR oR ....
X"'y = X"'y
+
n.w.
with.
Equation (3.1.11) may be written.
-
~(l)
=
- - -1- (X .. X)
X.....Y.
(3.1.16)
26
and equation (3.1.12) as,
(3.1.17)
For weighted estimation,
n
and
n
X"'y = X"'y
+
o
o
I
i=l
n.w.y .•
1-1 1
Equation (3.1.14) may then be written as
(3.1.18)
and equation (3.1.15) as,
n
v(ih
-
=
(lC"'X) -1 [X'-X
+
()
I n. (n.
i=l
1
(3.1.19)
1
The potential confusion over dual definition of
cannct occur
as weighted point estimation, and weighted estimation, will be,clearly
identified by presence, or absence, of the subscript "(1)" on the
estimate vector.
3.2.
Weighted Estimation and the Design of Experiments.
One of the advantages of using the n-weighting system, as opposed
to the
~-weighting
system, is that there exists a simple relationship
between n-weighting and the design of an experiment.
The "design of
11
"
II
an experiment" aspect solely concerns the structure of the independent,
and dependent variables.
I:
Ii
27
Without loss of generality, suppose the first observation-point
is repeated "n" times.
All these "new" n points are perfectly
correlated with the first point.
For simplicity of notation, the new
(or augmented) matrices will have the same symbol as the standard
matrix but with the addition of a tilde.
The basic augmented matrices
and vectors are;
X
(n
x p)
.......
0
w"
X
(3.2.1)
(n0 + n) x p
(n x p)
w"
e
where "e" denotes the (unknown) error first point (e
= £1)'
The model under consideration is,
y.
X~
+
~,
(3.2.4)
28
and the repeated point estimator from this model is,
(3.2.5)
Obviously,
(3.2.6)
and,
(3.2.7)
where,
Now,
E(E E-")
--
f.E(E e-")
I
I
--
I
--------+-------I
,
E(~ ~-")
tI
E(~ ~-")
and
I
I
I
I
I
x n)
0
o I (n0 x n)
----------+--------I
(n
=
H
(n x n )
0
where
H-"
I
I
I
I
I
I
2
J
(n x n)
I is the identity matrix,
J has all elements equal to unity,
H has the first column with elements equal to
unity, all other elements equal to zero.
29
If the augmented X-matrix is written as,
then,
X...X +
X"'VX
wax
+
X~H"'W~
+
WJW~,
where,
wax
= (w, ••• ,~)
=
nw
(l,Q, ... ,Q)
X,
w~
and,
w~
It follows that equation (3.2.10) may be simplified to,
-
V(l(l)R)
= [ X"'X + nw w"'] -1 [X"'X +
2
-1 2
(2n+n)~ ~"'][X"'X + n~~~J 0 ,
(3.2.8)
which is seen to be the same as equation (3.1.12).
The implication
is that whenever the TI-weighting is such that the n-weighting is integer,
30
the weighted point estimator may be regarded as being the "ordinary
least squares" style estimator obtained from a situation where one of
the points has been identically repeated "n" times.
Of course, if the W matrix represents new independent observations,
then the repeated point estimator reduces to the O.L.S. estimator,
because H becomes the null matrix, and the unity matrix J becomes the
identity matrix I.
Algebraic manipulations of tedious length show that the weighted
estimator is of similar form to the repeated estimator.
This link
between n-weighting and experimental design enables the data analyst
to conceptually view the n-weightings as weakening the correlation bonds
between the independent variables for a positive n-weighting, with the
reverse being true for a negative weighting.
3.3.
Restricted O.L.S. and Weighted Estimation With Infinite n-Weighting
By reference to Table 3.1, it is seen that a TI-weighting of zero
implies an n-weighting of +
00.
With this extreme n-weighting, the fitted
residual at that point is zero, and the possible relationship between
weighted estimation, and O.L.S. estimation restricted to pass through
this point must be examined.
Suppose the technique of O.L.S. estimation is used with the
restriction that the fitted function has zero residual at the point
(w~y).
It must be observed that such a restriction is of a
stochastic nature as it was not formed independently of the existing
data set.
Denote the estimator resulting from minimizing the resicual
sum of squares with a restriction
~~~
- y
=0
by
"
~(l)R •
31
Define
where
S
A
as
is a Lagrangian multiplier.
X"X~(l)R
+
Differentiating
~A - X"'I
S
with respect
= Q,
and
(3.3.1)
Ji(l)R
,.
However, the restriction is
w"'~(l)R
- Y = 0, and hence,
(3.3.2)
If the results of equation (3.3.2) are substituted into equation (3.3.1)
then,
(3.3.3)
~(l)R = 1? -
where
...
Note that 1?(l)R of equation (J.3.3) is unbiased for
estimate is unbiased for
t
because the D.L.S.
J?.
Equation (3.1.11) may be algebraically rearranged to give
...
- y),
by use of Corollary (A.l.3) of the Appendix.
(3.3.4)
32
Consider the limit of f(l) as n
.urn
n+OO
(i(l»
..
which is seen to be
=i+[
~(l)R'
~
00;
1 -1 ] (X"X)-lw(y_;),
w" (X "X)
w
from equation (3.3.3).
Restricted O.L.S.
and weighted regression with infinite weighting yield the same numerical
estimate.
The variance of the stochastically restricted estimator is
=
(X"X)
- 2[
-1
X"[E(.£.£")]X(X"X)
1
w"(X"X)
+[
-1 ]
-1
(X ..X) -lw E[
w
1 -1 ] 2 (X"X)
w"(X"X) ~
(e~")-w"(X"X) -lx"(.£ .£") lX(X"X)-l
-l~[ E(ee)-~" (X ..X) -lX"E (.£e)-E(e.£")X(X"X) -l~
A
The following expressions are needed to rewrite V[~(l)R] in a more
usable form:
E(ee)
= a2 ,
= Icr 2 ,
and,
=
2
[O,O, ..• ,O,a ,0, .•• 0]
i
=
.. 2
h a.
th
l'
position
33
The observation w is linked to the X matrix by
variance-covariance of
~(l)R
X~
= w~,
and so the
is,
(3.3.5)
From equation (3.2.8) the variance-covariance matrix of the weighted
point estimator is,
•
If Corollary (A.l.3) of the Appendix is applied to equation (3.2.8) and
the limit of V(].(l)R) considered as n
.urn
V(S(l)R) = [
~~
n-loOO
1 -1 ]2
(X ~) w
+ "",
then
(X~X)-lw w~O{,10-l+W~(X~X)-lw,
(3.3.6)
which, on rearrangement of terms, is seen to be the same as that
~(l)R
obtained for the estimator
in equation (3.3.3).
It follows that
stochastically restricted a.L.S. and infinite weighted point estimators
are the same.
At first glance it would appear that as we have a form of restricted
least squares, the difference in variance-covariance matrices,
A
[V@) - V(S (l)R]' is positive semi-definite.
Inspection of equation
"
(3.3.5) indicates that [V(~) - V(S(l)R)] is negative semi-definite if
(3.3.7)
Since
-1
Wi~(X~X)·
-
potent matrix
w.
-l.
is the i th element on the main diagonal of the idem-
X(XAX)-lX, this condition is always satisfied.
34
The extension of these concepts to show that restricted least
squares estimator for "r" restrictions, is the same estimator as the
weighted estimator (for "r" points bearing infinite weighting), follows
by straightforward algebra.
It should be noted that the number of
restrictions cannot exceed the number of independent variables.
3.4.
Changing the Sign of an Estimate by Use of Weighted Estimation
When dealing with situations where near multicollinearity exists
between the predictor variables, estimates of beta-coefficients with
"the wrong sign" are often encountered.
sign of the estimated
~oefficient
The "wrong sign" implies the
is in contradiction to what prior
information, intuition, or physical constraints dictate.
Weighted
regression may be used to provide estimates with the "correct sign."
The technique may be regarded as having two stages; the first stage
consisting of O.L.S. estimation to provide estimates of the regression
coefficients, the second stage consisting of reanalyzing the problem
with a point weighted such that the resulting estimator has the desired
sign.
The specific magnitude of the estimate has to be determined by an
iterative search technique.
Suppose Sj is the coefficient that has the incorrect sign.
i th point be weighted so that the weighted estimator
sign to
Bj •
S.
J
Let the
is opposite in
If ~ is the zero-vector, except for the value 1 in the jth
element, then the weighted estimate of S is
j
- = sj l(l)·
-
Sj
A
If 6 has the incorrect sign then
j
(3.4.1)
35
(3.4.2)
which, from equation (3.4.1) may be expressed as,
A
_
~ ..SJ~l.j ~(l) - <
a.
0.4.3)
.
(3 ••
3 4) ,wi t h t h e 1. t h
.
.
.
From equat10n
p01nt
carrY1ng
n-we i gh t1ng
n.,
1
-
~(l)
= 8
A
-
+
t
ni
1 ] .(X.X) -1w. (y .-y.) ,
1 + n.w. "(X"X) - w . ·
-1 1 1
1-1
-1
A
and substituting for i(l) in equation (3.4.3) yields the condition for
change in sign, namely
i"~4"i + ni{~"::t4"i(~"(X"X)-1~) + l.9.j-":i"(X-"X)-~(Yi-Yi)}
(1 + n.w. -"(X"X)-lw.)
J.-:I.
<
-:I.
or,
(3.4.4)
because
Therefore, weighting the observation "i" by any n-weighting less than
the right hand side of equation (3.4.4), will result in the jth weighted
estimate being opposite in sign to the jth
a.L.S.
estimate.
There is no
guarantee that the condition in equation (3.4.4) can be attained, as
feasible n-weighting requires all n-weightings larger than -1.
a,
36
As a specific value is not required for the weighted estimate, it
follows that the principal use of this technique will be in conjunction
with other criteria.
greater depth.
Section 5.6 of Chapter 5 discusses this aspect in
31
4.
WEIGHTED RIDGE REGRESSION
The ridge regression technique of Hoer1 and Kennard (1970) will
be applied to weighted point and weighted estimation.
The main
properties of the ridge technique have been discussed in Section 2.5 of
Chapter 2.
First, the basic properties of the weighted point and
weighted estimators are derived as these properties will be used
repeatedly in comparing weighted ridge estimators with other estimators.
These relationships are investigated in Section 4.2 with emphasis given
to the comparison of weighted ridge estimators with the ridge estimator.
Section 4.3 investigates the properties of estimators derived by fixing
the n-weightings, and allowing the k-va1ue to vary, this being "the
reverse" of the usual ridge-procedure.
The problem of comparing bias-
squared for two estimators is examined in Section 4.5 where a geometric
argument is proposed and examined.
Finally, in Section 4.6, the
sequential ridge estimator is investigated.
This sequential ridge
estimator will be used in the calculation of specific n-weightings in
Chapter 5.
4.1.
Basic Properties of Weighted Ridge Regression
The weighted point ridge regression estimator is defined as
~*
f(l)
(4.1.1)
which may be expressed as,
(4.1.2)
by use of equation (3.1.11).
38
Similarly, the weighted ridge regression estimator is defined as
-*
_$
=
(X~Q
-1X + kI) -1-1
X~Q
y
n
no
(4.1.3)
0
and may be expressed as,
n
s*
=
(X~X
+
n
o
l
niwiwi +
i=l
kI)-l(X~Y
+
0
I
n.w.y.),
i=l
~---:J.. ~
(4.1.4)
by use of equation (3.2.21).
Since
and
equation (4.1.1) may be rewritten as,
-*
-
-1
-1~(1) = ~(1) - k(X~Q1 X + kI) ~(1)·
The variance of
-*
~(1)
(4.1.5)
is given by
V(i(l) ) = {(X"'Qi1X+kI ) -Ix ~Qi1X}V(S (1»){ (X~Qi1X+kI)-lX~Q~lX}'",
(4.1.6)
and by use of equation (3.1.8) gives,
(4.1.7)
Equations (3.2.8) and (3.1.12), permit (4.1.7) to be rewritten
as
*
V (E.(1»
=
The bias-matrix of
-*
~(1)
is defined as
•
(4.1.9)
..*
Substitution for !(1) from equation (4.1.5) in equation (4.1.9) yields,
(4.1.1u,
which may be rewritten as,
~*
B(,§(l»
2
1
-1
- k [(X1C + kI) + n~"]- ~~"[(X"X + kI) + n~1.
(4.1.11·
For the weighted ridge regression estimator, the variance is
V(~*)
r n .1- [x-x + nr/n
n
(n 1.2
~X-X+k1)
i=l i ...1-1 J
i = l : ? ' -1-1 l(
1
• [(X-X+kI) +
w w
)w
n
I
i-I
ni~i~i
J-
W -]
T
1
2
(4. L :
'J.
and the bias-matrix
(X 'X+kI) + n
[
r
i=l
(4.1.
One of the reasons an analyst might be willing to abandon D.L.S.
favor of weighted least squares is that the D.L.S. solution is
i1
1
too
extreme," in the sense that the Euclidean length of the vector is
unacceptably large.
This expected Euclidean squared-distance from
tht~
weighted estimator to the true parameter vector is equivalent to the
trace of the variance matrix.
By the Gauss-Markov theorem, D.L.S. is
the best linear unbiased estimator, and consequently there is no
weighting for which the expected Euclidean length of S is less than
Lt,,,
40
of~.
Hoerl and Kennard (1970) showed the Euclidean length of the ridge
regression estimator is smaller than that for the O.L.S. estimator, and
a similar statement may be made for the comparison of the weighted ridge
estimator with the weighted estimator.
estimator is
~*
~
~*
~~
the Euclidean norm
The Euclidean strength length of the
, and the positive square root of this quantity is
II ~-* II.
By definition of weighted ridge regression
and may be expressed as,
~~
(4.1.14)
= H~ •
The Euclidean norm of
-*
~
/I §* /I
in equation (4.1.14) is
=
II MS /I ,
<
"M II • II 8 II .
(4.1.15)
The Euclidean norm of the matrix M is
+...•.+
,
..
where
=
Aj
f. +
J
It therefore follows that
2
~Mp
II
~* /I
<
k
II § II
for positive k value.
This
indicates that the use of the ridge technique with weighted least
squares will always result in an estimator that has smaller Euclidean
41
length than the initial weighted estimator.
The Euclidean length of
weighted ridge regression when compared with the Euclidean length of
ridge regression, is a function of the specific n-weightings used.
The
implication is that it is possible for the weighted ridge estimator to
be larger (in Euclidean norm) than ridge regression estimator, which may
not be desirable from the data analyst's point of view.
In the development of criteria for selection of n-weightings it is
of importance to know the residual sum of squares.
For any estimator
(call it ~+), the residual sum of squares is defined as
~+ = (Y - XS+)~(Y - Xs+).
-
-
-
-
(4.1.16)
Equation (4.1.16) may be written as a difference of two parts, one a
"total sum of squares," and the other as a "regression stun of squares."
For weighted regression,
(4.1.17)
and for weighted ridge regression,
-* ).
~
(4.1.18)
If the relationship,
is used with equation (4.1.18) then,
(4.1.19)
42
If all the n-weightings are zero, the ridge regression residual sum
of squares is,
(4.1.20)
These expressions for residual sum of squares are used in Section
5.4 of Chapter 5, for the determination of specific n-weightings.
4.2.
The Relationship of Weighted Ridge Estimators to Other Estimators.
The relationship between weighted point ridge estimation and ridge
estimation can be expressed as
*
~(l)
= ~* + [
1
n] Fw-
+ nw"Fw
*
(y-y ),
(4.2.1)
where
F
=
y* =
(X~X
+ kI)
-1
,
w~l3*
and
13* = FX"y.
The derivation of equation (4.2.1) follows that of equation (3.3.4) with
(X~X)-l replaced by F, and
y replaced
by y*.
Weighted ridge regression
can also be related to standard ridge regression by writing
n
[(X~
+ kI) +
as
F - [(X ~X
+ kI) +
o
L
43
and substitution in equation (4.1.4) to obtain,
~* • [1 + F( io
,,]-1~.
1
nrnowowJF2[ni_~1 ni!!iY]~
J.-J.-~
Ilt!!i!!i
i-I
+
[(X-X + kI1 + i=1
(4.2.2)
Consider the comparison of the variance-covariance matrix of the
weighted point ridge estimator, and that of the weighted point estimator.
Rearrangement of terms in equation (4.1.8) yields,
-*
V(~(l»
- - -1 [X"X+n(n+1)~"][X"X+k1]
- - -1 2
= [X"X+k1]
(J
=
[I + k(X"X)-lr 1 V(~)[1 + k(X"X)-lr 1 •
Let the eigen-va1ues of
-
V(~)
2
be denoted by -A 1(J 2 > ••• > A (J.
- vp
v
- - -1 -1
the matrices [I + k(X"X)
]
and
-*
V(~(l»'
(4.2.3)
-
V(~)
Since
commute, the eigen-va1ues of
-* 2 , may be written as,
denoted by Avi(J
(4.2.4)
where ~.]. is the i th eigen-va1ue of
- -
X"X.
The eigen-va1ues Av].., A].. are both positive and so for positive k it
follows that,
(4.2.5)
implying that
IV(~(I»
-
V(~~l»]
extension of equation (4.2.5) is that the difference
positive definite.
A natural
is positive definite.
-
V(~)
-
-* )
V(~
is
44
The weighted ridge technique will be considered as being an improvement over weighted least squares if there exists a k-va1ue such that the
difference between the variance of the weighted estimator and the mean
square error of the weighted ridge estimator is positive definite.
Let the canonical forms be defined as,
n
0
P1\P'" = X...X +
n.w.w. ... ,
I
i=l
I.-I.-I.
no
x.. x + I
p1\ p'" =
k
i=l
n
PQP'" =
... + kI,
n.w.w.
I.-I.-I.
0
l
n.(n.+1) w.w.
i=l
I.
I.
-I.-I.
where,
P is orthogonal,
- 1\k = 1\ +
1\ is diagona1,bearing the eigenvalues of X"'X,
kI,
The variance of the weighted estimator may be written as
(4.2.6)
The mean square error of the weighted ridge estimator may be written as
(4.2.7)
The difference between the variance of -S and mean square error of -*
S
may then be written as,
45
(4.2.8)
where D is positive semi-definite as the kernal matrix of equation
(4.2.8) is positive semi-definite.
Consider any real, non-null vector
~
and define G(k) to be
(4.2.9)
which may be written as,
G(k)
=
l:
p
2
k ~2kAi
1
N}
i=l
A
L
p
i=l
i
(4.2.10)
For the O.L.S. case when k = 0,
2 2
2£iO"
G(O) =
-2
i=l A.
1
p
L
and is positive for all values of
neighborhood of k = O.
for any
&,
then
N*
MSE(~
~
(4.2.11)
and is continuous in the
If a positive k-value is taken such that G(k) > 0
) <
N
V(~),
and therefore a positive k-value does
exist such that the mean square error of weighted ridge regression is
less than that of weighted regression.
Suppose weighted point ridge regression is compared with standard
ridge regression.
v (i\~l» = [F
Equation (4.1.8) may be written in the form,
- (l-ln;'~) F~'F J[X'X-tn (n+2)~'] [F -(l"";'Fi/ )~'F]
2
0
(4.2.12)
,
46
On rearrangement of terms, equation (4.2.13) may be written as,
+{
(/
(l+nw"Fw)
2} {kAn
2
+ Bn }
(4.2.13)
where,
A = Fww"F 2 + F2ww"F,
--
--
(4.2.14)
and
Since F is positive definite, it follows that
~ .. ~
is positive.
ww" is positive semi-definite, of rank one, it follows that,
As
F~"F,
Fww"F 2 and F2~"F are all positive semi-definite.
From equation (4.2.14), the term
is always positive.
It follows that
-*
V(~(l»
positive n-weighting.
-
V(~
*)
is positive semi-definite for choice of a
This implies that ridge regression cannot be
improved upon by positive n-weighting.
If negative n-weighting is used,
it is possible to achieve an improvement.
The bias-matrices of weighted ridge regression and standard ridge
regression may be compared in a similar fashion.
be written as
Equation (4.1.11) may
47
(4.2.15)
where
The bias-squared for ridge regression is,
B(~
*)
=
2
-1
-1
k (X"X + kI) ~~"(X"X + kI) ,
(4.2.16)
B(~*) = k 2FF-1 F SB'F F-1F,
1 1-- 1 1
-1 - -*
-1
= FF 1 {B(g(1»}F1 F.
e
(4.2.17)
Now,
FF-1
1
= I + nFww",
and so equation (4.2.17) may be expressed as,
* =
B(~ )
+
-*
n{B(~(l»)}Eli"F
+ n
2-*
F~"{B(~(l)h~~"F.
(4.2.18)
In equation (4.2.18),
F~"
and
definite, giving B(~~ - B(~~l»
4.3.
-*
B(~(l»
are at least positive semi-
to be positive semi-definite for n > O.
The Changing of the k-Va1ue for Fixed n-Weighting.
In previous sections of this chapter, it has been assumed the k-va1ue
has been fixed and the effect of weighting the observations considered.
48
The analyst would want to investigate the "reverse" sequence i. e.,
fixing a set of n-weightings and then altering the k-value.
From equation (4.1.12) the variance-covariance matrix of the
weighted ridge estimator may be written as,
no
F[X~X
+
I ni(ni+2)RiRi~]FOz,
n
~
where
F
=
[(:X~X
(4.3.1)
i=l
o
I
+ kI) +
n.w.w.~]
i=l
1-1-1
-1 •
(4.3.Z)
For two different k-values (kl and k 21 equati.Qn (4.3.2) may be written as
.
n
o
F(J') = [(X~X + kJ.I) +
n
and
[X~X
+
.
o
I
1=1
n
i
(n.+Z)w.w.~]
1
-1-1
i=l
-1
n.w.w.~], j=l,Z.,
1-1-1
denoted by
~*
§(k )
Z
the variance-covariance matrix of
~*
I
~
~
~
~-1
V[~(k )] = F(l)GF(l)O
G.
by
Z
1
N
N
~-l
~
Z
= F(l)F(Z)F(Z)GF(Z)F(Z)F(l)O
(4.3.3)
However,
no
N
F
~-1
(l/ (Z)
=
F(l)[X~X
.
+
I niRiRi~
i=l
+ klI + (k Z - kl)I],
(4.3.4)
and so,
49
(4.3.5)
If (k Z - k1 ) is positive, then each matrix component of equation (4.3.5)
is positive definite.
It follows that the increasing of the k-va1ue
- V -(k ) , being positive definite t
Z
which may be regarded 'as an "improvement" in variance properties of the
from k 1 to k Z will result in
V -(k1)
resulting estimator.
From equation (4.1.13) the bias squared is
For k = k ,
1
and for k = k Zt
Now,
n . w.w. ~
1-1-1
+ kzI]
i. e. ,
N
~
kZF(Z) = I - F(Z)X~X,
and so the bias-squared matrix may be expressed aS t
Use of the re1ation t
I,
50
results in,
(4.3.6)
-
-
The difference F(l) - F(2) may be written as,
n
= [X~X +
n
o
t
n
i=l
w w ~ + k 1]-1
i-i-i
1
o
[X~X + I
n.w.w.
1-1-1
i=l
~+
k I)-I,
2
i.e.,
where
--1
Observe that Ak
1
--1
- Ak
has i
th
diagonal element,
2
[l i
~
k1 -
and for k 2 > k l is clearly> O.
A1 ~
kll
(4.3.7)
-
-
It follows that F(l) - F(2) is positive
definite, implying that all the matrix-terms of equation (4.3.6) are at
-*
least positive semi-definite, and therefore that B(~(k
2
positive definite.
» -
-*
B(~(k
)
(I)
is
51
Equations (4.3.1) to (4.3.7) show that the weighted estimation
system acts similarly to the O.L.S. system when the ridge-technique is
applied.
4.4.
Bias-Squared Comparisons for Different Estimators
As
discussed in Section 4.2 of Chapter 4, the use of positive n-
weightings reduces bias-squared at the expense of variance.
The use of
negative n-weightings will reduce variance but at a possible increase in
bias-squared.
Consider the case where all negative n-weightings have
been used and the objective is to minimize the trace of the variancecovariance matrix.
The decrease in trace of the variance-covariance matrix
matrix could have been obtained if a new k-value,
used with standard ridge regression.
k*(k*
k)~
had been
The comparison of bias-squareds of
the two possible estimators that yield equal variance reduction is now of
importance.
Let k* = k + d denote the k-variable such that,
*
tr[V(~)]
I
=
4
-*
tr[V(~)]
I
•
k
From equation (4.1.12) the trace of the variance-covariance matrix may
be written,
-* )] = ca 2 •
tr[V(~
For ridge regression with a k-va1ue of k*
*
tr[V(~)]
I
= tr[{(X~X +
=k+
(4.4.1)
d,
2
kI) + dI} -1X~X{(X~X + kI) + dI} -1 ]a,
k*
Equation (4.4.2) may be expressed in canonical form and equated to
equation (4.6.1), giving
(4.4.2)
52
p
~
l
c =
Ai
(4.4.3)
i=l (A. + k + d)2 '
1
which is then solved for d.
For the bias-squared, the difference between bias-matrices may be
expressed as
tr[B(§*)]l k
- tr[B(~*)]lk
=
(k+d)2 ~~[(X~X + kI) + dI]-2~
*
(4.4.4)
or
Zk
= ~~{(k+d) 2 [(X~X
+ kI) + dI]-
2
2
-2
- k [(X~X + kI) + X~DX] }~.
(4.4.5)
The kernel of equation (4.4.5) must be positive definite for the biassquared of weighted ridge regression to be less than that of standard ridge
regression.
Equation (4.4.5) may be written as
(4.4.6)
where,
A = k-2[(X~X + kI) + X~DX]2,
B
=
(k+d)-2[(X~X + kI) + dI]2,
(4.4.7)
and D is diagonal, with i th element equal to the weighting n •
i
The matrices A and B are both sYmmetric and positive definite, and if
A - B is positive definite, then B-1 - A-1 will also be positive definite
(Graybill, 1969).
The matrix A - B is,
53
- 2 (k+d)
-2
d (X lc + kI) - (k + d)
-2 2
d.
(4.4.8)
Rearrangement of terms results in the difference
(4.4.9)
where,
_f.
n.w.w.
1.-1.-1.
d
l(k+d)
2
]
I
(4.4.10)
'
and
(4.4.11)
If equation (4.4.9) is
posit~ve
definite,then Zk of equation (4.4.6)
will be positive, implying the bias-squared of weighted ridge regression
is less than that for k*-ridge regression for any value of
4.5.
B.
The Comparison of Bias-Squareds by a Geometric Argument
There are many instances where equation (4.4.9) is neither positive
definite or negative definite, but of indeterminant form.
It is then
difficult to determine the behavior of bias-squared for the two competing
estimators.
A technique of comparing bias-squared via a geometric
argument is now proposed.
Consider a biased estimator of
trace of this bias-matrix is
~
that has a bias-matrix
G~~~G~.
The
54
(4.5.1)
Let Q be an orthogonal matrix such that
A = S"QD Q"S,
-
g-
(4.5.2)
= y"D Y
-
g-'
where D is diagonal with the diagonal element equal to eigenvalues
g
of GG", i. e., ~ 2 , A2 , •••• , A2 •
g2
gl
gp
Equation (4.5.2) may be expressed as,
+...
t
(4.5.3)
••
This equation (4.5.3) is that of an ellipsoid in p-dimensions 1 with
the length of the i
th
semi~axis,
(4.5.4)
From elementary geometry, the volume of an ellipsoid in p-dimensions
is given by
volume =
[
TfP/2 ~
.E.)
r(2 + 1
[p ]
II
i=l
ai
•
Applying equation (4.5.4) to equation (4.5.5) gives
(4.5.5)
55
(4.5.6)
However,
....
and substitution into equation (4.5.6) yields
volume
=
1fp/2
[
r(t + 1)
]
P 2
[b. / ]
(4.5.7)
~
From equation (4.5.7) it follows that the bias-squared of an estimator
may be related to the volume of an ellipsoid centered on zero.
This
concept may now be applied to the case where the comparison of two
competing estimators resulted in a difference matrix whose eigenvalues
were indefinite.
As a geometric argument is being used it is instructive to look at
the behavior of bias-squareds for the simple case of two dimensions.
One estimator has a bias-squared b.* and the other estimator a biassquared b._.
Figure 4.1 shows the case where b._ is clearly smaller than
b.* and so the eigenvalues of the difference in bias-matrices are all
positive.
(Figure 4.2 shows b._ is smaller than b.* but, due to the
intersection of ellipses, the eigenvalues of the difference in biasmatrices being indeterminant and thus ooocuring 6., < b.*.
.1
If the ellipse
generated by b._ is rotated such that its axes coincide with the axes of
b.* the difference in bias-squareds becomes clear.
The theoretical
extension to the volume of an ellipsoid follows immediately.
56
(32
Figure 4.1.
The difference in bias-squareds
resulting in all positive eigenvalues.
Figure 4.2.
The difference in bias-squareds resu~ting
in indeterminant eigenvalues due to
different orientation of axes.
57
l
A
= d~l - d:- , and writing equation (4.5.11) in canonical
diff
Define
form yields
then 6
diff
must be positive for all values of i, (i=1,2, •••• p).
Because
~
Ai and Ai are positive, 6 diff of equation (4.5.12) may be expressed as,
(4.5.13)
and so
There are instances, however, where the rotation to equivalent
axes still results in a difference-matrix whose roots have indeterminant
form.
A further definition for comparison of bias-squareds is
necessary.
"Equivalent bias-squared" is defined to be a spheroid,
centered on zero, of the same volume as the former ellipsoid.
This is
illustrated in Figure 4.3 where the intersection of ellipses has obscured
the fact that 6* >
6~.
Conversion to equivalent bias-squareds of
Figure 4.4 clearly reveals 6* >
6~.
Equivalent bias-squared, in the form of a spheriod, implies that all
the semi-axes have the same length.
The equivalent bias-squared of
equation (4.4.9) may be written as
2
•• ••• •
~2 r*
which may be rewritten as,
1,
58
Let the bias-squared of the weighted ridge estimator be written,
n
6
n
= k2~~[(X~X + kI) +
o
L
i=l
n
w w ~)-2 S,
(4.5.8)
i-i-i
Let the bias-squared of ridge regression,with a k-value of k* such that
there is trace-equality of variances, be written
(4.5.9)
In each case the bias-squared has been represented in canonical form with
reference to its own axes.
These bias-squared equations may be converted
to ellipsoids on common axes by replacing both a and a by
The difference in volume due to 6* compared to 6
s.
is then
(4.5.10)
-
As D and D* are positive definite and symmetric, then for D* - D to be
~-l
positive, D
-1
- D* must be positive definite (Graybill, 1969).
implication is that the ith diagonal elements of
The
~-l
-1
D -D* satisfy the
condition
a-i l
- *
d- l >. 0 , f or a 11 va 1ues i=l, •••• p,
i
where
and
*-1
th
1
2
d i is the i
eigenvalue of [k~ X~X + I] •
(4.5.11)
59
~-~t---it------t----""'''''-''''''B
Figure 4.3.
The difference in bias-squareds resulting
in indeterminant eigenvalues due to the
interaction of ellipses.
f:"rv(Equivalent)
....---++---f----1H---..... s
Figure 4.4.
The difference in bias-squareds resulting
in positive equivalent eigenvalues.
60
where
2
r*
[ P
IT (1 +
. 1
1=
=
J
-1
~ A.)
1
IIp
-1
(4.5.14)
~.
If equivalent bias-squared is applied to weighted ridge regression
then equation (4.5.8) may be expressed in terms of the (rotated) B-axes as
(4.5.15)
where D is a diagonal matrix with i th diagonal element
d.1
For ridge regression with k-value k*,
(4.5.16)
where D* is a diagonal matrix, i th diagonal element d*
i
(1
-1
+ k*
Ai)
-2
•
These may be converted to equivalent bias-squared by use of equation
(4.5.14).
The condition for the equivalent bias-squared k*-ridge
regression
(~*)
E
r1'd ge regression
to be larger than the equivalent bias-squared k-weighted
(A~)
u_
may be expressed as
E
~*
and is clearly true if r; -
E
> ~~ ,
r2
>
o.
(4.5.17)
Equation (4.5.17) may be written as
(4.5.18)
~E
and when
due to
~
~
of equation (4.5.18) is positive, the equivalent bias-squared
is less than that due to
~
*•
In summary, the most desirable condition (from the analyst's
viewpoint) is that equation (4.4.9) be positive definite.
When this
61
fails, the weaker condition of equation (4.5.13) being positive may be
investigated.
The weakest of all the conditions is that of equivalent
bias-squared of equation (4.5.18).
Should an estimator fail to satisfy
equation (4.5.18), serious doubts as to the appropriateness of weighted
ridge regression should be entertained by the analyst.
4.6.
The Sequential Weighted Ridge Estimator
It is not possible to estimate all n
o
n-weights simultaneously.
One technique is the linear or sequential estimation of n-weights.
The
specific techniques of estimation of the lth weight will be deferred
until Chapter 5.
In this section it will be assumed that (l-l)
weightings have been determined, and the basic properties of the
estimator involving the lth n-weight, n , will be determined.
l
The
following definitions are necessary:
9.,
X9., '"X = X'"X +
x9., '"X 9.,
I
i=l
9.,
= X'"X +
L
i=l
n.w.w.
1.-1.-1.
(4.6.1)
n.W.Yi'
1.-1.
(4.6.2)
w '" = 9.,th row of X,
-9.,
Y9., = 9.,th observation of ~,
and
It may be shown that,
~*
~*
f3
= f3
-(9.,)
-(9.,-1)
(4.6.3)
62
(4.6.4)
(4.6.5)
Before investigating the properties of the sequential weighted ridge
estimator, a theorem concerning the behavior of the eigenvalues of
-.
X~X
is necessary.
The behavior of these eigenvalues is the key to the
understanding of the characteristics of the weighted ridge estimator
under a sequential estimation scheme.
X<X
Theorem 4.1. The ith eigenvalue of
ith eigenvalue of;
X~X
is greater than or equal to the
i f the n-.weighting is positive, but less than or
equal to if then-weighting is negative,
Proof.
By definition,
(4.6.6)
where
w~
represents the single observation weighted, n represents the
specific weighting for that observation.
sYmmetric, but only
X~X
Both
X~X
and
nww~
is known to be positive definite.
a non-singular matrix Q such that Q~X~XQ
..
=I
and Q~(n~~)Q
D is a diagonal matrix with diagonal elements equal to the
of the polynomial equation, I~~
-
AX~X
I
are
There exists
= nD,
where
~igenvalues
(Graybill, 1969) •
Thus,
=I
..
+ nD.
(4.6.7)
63
Note that when n = 0,
where A =
and P is an orthogonal matrix.
P~X~XP,
The characteristic
polynomial is by definition,
This polynomial may be expressed as
and the equation,
(4.6.8)
is the polynomial equation for the determination of the eigen-va1ues of
~~(X~X)-l. The matrix ~~ has a rank of unity and it follows that
-1
~~(X~X)
has exactly one positive eigen-va1ue.
The conclusion is
that D of equation (4.6.7) is positive semi-definite.
From equation (4.6.7) it follows that the i th root of X~X, denoted
by Ai' may be expressed as,
where Ai is the i
~
th
It follows that Ai
root of X~X, and
~
di
is the i
-
Ai if n is positive, and Ai
th
~
diagonal element of D.
Ai if n is negative.
Before the properties of the sequential weighted point estimator can be determined, a theorem concerning the maximum and minimum of the quantity w_ℓ'F_ℓ w_ℓ is necessary.

Theorem 4.2.  If positive n-weighting is chosen for the ℓth point, then w_ℓ'F_ℓ w_ℓ is bounded by zero and unity. If negative n-weighting is used, w_ℓ'F_ℓ w_ℓ is positive, but unbounded.

Proof.  First consider a positive n-weighting for the ℓth point. By applying Corollary A.1.3 of the Appendix to F_ℓ, then

    F_ℓ = F_{ℓ-1} - [n_ℓ / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ)] F_{ℓ-1} w_ℓ w_ℓ'F_{ℓ-1},      (4.6.9)

and therefore

    w_ℓ'F_ℓ w_ℓ = w_ℓ'F_{ℓ-1}w_ℓ / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ).                       (4.6.10)

However, F_{ℓ-1} is positive definite whenever n_i > 0 for i = 1, 2, ..., (ℓ-1). From equation (4.6.10) the conclusions are that

    0 ≤ w_ℓ'F_ℓ w_ℓ ≤ w_ℓ'F_{ℓ-1}w_ℓ,   for n_ℓ > 0,

and

    0 ≤ w_ℓ'F_{ℓ-1}w_ℓ ≤ w_ℓ'F_ℓ w_ℓ,   for n_ℓ < 0.
In the determination of the variance-covariance matrix and the bias-matrix of the sequential weighted estimator it is to be assumed that the first (ℓ-1) points have been weighted and the n-weighting (according to some criterion) for the ℓth point is sought. From equation (4.5.4),

    V(β*_(ℓ)) = F_ℓ [X'X + Σ_{i=1}^{ℓ} n_i(n_i+2) w_i w_i'] F_ℓ σ²,                (4.6.11)

where

    G_ℓ^{-1} = X'X + Σ_{i=1}^{ℓ} n_i(n_i+2) w_i w_i'.

Applying equation (4.6.9) to equation (4.6.11) yields

    V(β*_(ℓ)) = V(β*_(ℓ-1)) + A_ℓ + B_ℓ,                                           (4.6.12)

where

    A_ℓ = F_{ℓ-1}w_ℓw_ℓ'F_{ℓ-1}(I - G_{ℓ-1}^{-1}F_{ℓ-1}) + (I - F_{ℓ-1}G_{ℓ-1}^{-1})F_{ℓ-1}w_ℓw_ℓ'F_{ℓ-1},

    B_ℓ = F_{ℓ-1}w_ℓw_ℓ'F_{ℓ-1}[(w_ℓ'F_{ℓ-1}w_ℓ)G_{ℓ-1}^{-1}F_{ℓ-1}
          + (w_ℓ'F_{ℓ-1}G_{ℓ-1}^{-1}F_{ℓ-1}w_ℓ)(I - (w_ℓ'F_{ℓ-1}w_ℓ)F_{ℓ-1}G_{ℓ-1}^{-1})]F_{ℓ-1}w_ℓw_ℓ'F_{ℓ-1}.
It may be demonstrated by methods similar to those used in Section 4.2 of Chapter 4 that V(β*_(ℓ)) - V(β*_(ℓ-1)) is positive semi-definite for a choice of n_ℓ > 0, regardless of the preceding (ℓ-1) n-weightings. It follows there exists a negative n-weighting which makes V(β*_(ℓ)) - V(β*_(ℓ-1)) negative definite.

The bias-matrix of β*_(ℓ) is given by

    B(β*_(ℓ)) = k² F_ℓ β β' F_ℓ,          (4.6.13)

and by use of Corollary (A.1.3) of the Appendix may be rewritten in terms of B(β*_(ℓ-1)), the final term of the expansion being

    + n_ℓ² F_{ℓ-1} w_ℓ w_ℓ'{B(β*_(ℓ-1))} w_ℓ w_ℓ'F_{ℓ-1}.          (4.6.14)

The terms in equation (4.6.14) are at least positive semi-definite when n_ℓ > 0, and so the use of a positive n_ℓ-weighting will result in a decrease in bias-squared from the preceding stage. Use of a negative n_ℓ-weighting may possibly cause an increase in bias-squared.
If the experimenter has the reduction of the trace of the variance-covariance matrix of the weighted ridge estimator as the criterion of interest, then the change in trace of the variance-covariance matrix with n-weighting is of importance. From equation (4.1.11) expressed in canonical form,

    tr[V(β*_(1))] = Σ_{i=1}^{p} [λ_i + n(n+2)(w'p_i)²] σ² / (λ̃_i + k)².          (4.6.15)

If the Cauchy-Schwarz inequality is applied to (w'p_i)² of equation (4.6.15), then (w'p_i)² ≤ w'w, as P is an orthogonal matrix. An upper bound may therefore be placed on equation (4.6.15) as being

    tr{V_max(β*_(1))} = Σ_{i=1}^{p} [λ_i + n(n+1)w'w] σ² / (λ̃_i + k)².           (4.6.16)

From Theorem 4.1, λ̃_i = λ_i + n d_i, and as n becomes very large with respect to the other parameters, then λ̃_i → n d_i, and it follows that

    lim_{n→N} tr[V(β*_(1))] = Σ_{i=1}^{p} [N(N+1)w'w + N d_i] σ² / (N d_i + k)².   (4.6.17)

As N → ∞, equation (4.6.17) becomes in the limit

    σ² Σ_{i=1}^{p} w'w / d_i².          (4.6.18)

Each term of equation (4.6.18) is positive and monotonically increasing with increasing N, and it follows that equation (4.6.18) is monotonically increasing with increasing n-weighting. The practical implication is that V[β*_(1)] gradually increases as the n-weighting gradually increases.
Consider the trace of the bias matrix given in equation (4.1.15),

    tr[B(β*_(1))] = tr[k²(X̃'X̃ + kI)^{-1} β β'(X̃'X̃ + kI)^{-1}]
                  = k² β'(X̃'X̃ + kI)^{-2} β.          (4.6.19)

In canonical form this becomes

    tr[B(β*_(1))] = k² α̃'Λ̃_k^{-2} α̃ = k² Σ_{i=1}^{p} α̃_i² / (λ̃_i + k)².          (4.6.20)

From Theorem 4.1,

    λ̃_i = λ_i + n d_i,          (4.6.21)

and if equation (4.6.21) is applied to equation (4.6.20), then the bias-squared tr[B(β*_(1))] is monotonically decreasing for increasing positive n-weighting.

In conclusion, note that for positive n-weighting the trace variance increases while the trace bias-squared decreases. The reverse is not necessarily true for negative n-weighting, further discussion being left to Chapter 5.
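The trade-off may be illustrated numerically. In the following sketch the data, the true β, and the n-weightings are assumptions chosen only for illustration; it prints both traces for a single weighted point as the positive weighting grows:

    # Variance/bias-squared trade-off for one positively weighted point.
    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((9, 5))
    beta, k, sig2 = rng.standard_normal(5), 0.01, 1.0
    w = X[0]

    for n in (0.0, 1.0, 5.0, 25.0):
        F = np.linalg.inv(X.T @ X + n * np.outer(w, w) + k * np.eye(5))
        G_inv = X.T @ X + n * (n + 2) * np.outer(w, w)    # kernel of (4.6.11)
        tr_V = sig2 * np.trace(F @ G_inv @ F)
        tr_B = k**2 * beta @ F @ F @ beta                  # trace of (4.6.13)
        print(f"n = {n:5.1f}   tr(V) = {tr_V:10.5f}   tr(B) = {tr_B:10.6f}")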
5.  THE CALCULATION OF n-WEIGHTS
In Chapter 4 the variance and bias-squared of the weighted ridge estimator were determined assuming the n-weightings were known. The determination of these n-weightings will now be investigated. The advantage in using a sequential technique lies in the ease of computation and in the way an analyst can determine variance and (relative) bias improvements at any stage of the sequence. The disadvantage lies in the fact that, although each stage of the sequence may have an optimal n-weighting, there is no guarantee the final complete set of n-weightings is optimal.
The criteria for calculation of optimal n-weightings to be considered are:

5.1  A-optimality of the Variance-covariance Matrix
5.2  A-optimality of the Bias-squared Matrix
5.3  D-optimality of the Variance-covariance Matrix
5.4  Minimum Residual Sum of Squares
5.5  Stability of Estimated Parameters
     5.5.1  The Q* standard
     5.5.2  The Q** standard
     5.5.3  The S* standard
     5.5.4  The S** standard
5.6  Sign Change of an Estimated Parameter
5.7  Iterative Sequential n-weighting

This listing is not an exhaustive compilation of criteria, but a collection of the most commonly used, or literature-cited, criteria.
5.1  A-optimality of the Variance-covariance Matrix

The criterion of A-optimality seeks an n-weighting such that the trace of the variance-covariance matrix is a minimum. It is assumed that the first (ℓ-1) n-weightings have been determined and the A-optimal n-weighting for the ℓth point is desired. From equation (4.6.11) the variance-covariance matrix of β*_(ℓ) is

    V(β*_(ℓ)) = F_ℓ G_ℓ^{-1} F_ℓ σ²          (5.1.1)
              = [F_{ℓ-1}^{-1} + n_ℓ w_ℓ w_ℓ']^{-1}[G_{ℓ-1}^{-1} + n_ℓ(n_ℓ+2) w_ℓ w_ℓ'][F_{ℓ-1}^{-1} + n_ℓ w_ℓ w_ℓ']^{-1} σ².

If equation (4.6.9) is applied to equation (5.1.1) and the trace of the matrix taken, then

    tr[V(β*_(ℓ))] = tr[V(β*_(ℓ-1))] + (a quadratic expression in n_ℓ),          (5.1.2)

where a, b, and c are scalar functions of w_ℓ, F_{ℓ-1}, and G_{ℓ-1}. If tr[V(β*_(ℓ))] in equation (5.1.2) is minimized, then

    n_ℓ(A-optimal variance) = -a / (2b - ac),          (5.1.3)

and is non-zero providing a ≠ 0. The trace of the variance-covariance matrix for this particular A-optimal n-weighting is given by

    tr[V(β*_(ℓ))] = tr[V(β*_(ℓ-1))] - a² / [4(b - ac)].          (5.1.4)

Observe that tr[V(β*_(ℓ))] ≤ tr[V(β*_(ℓ-1))], provided (b - ac) of equation (5.1.4) is positive. The inference to be made is that the points may be sequentially weighted by use of equation (5.1.3), and the trace of the variance-covariance matrix reduced at each operation.
Tedious but straightforward algebra shows that a, b, and (b - ac) are all positive. Similar algebra shows that

    a - (2b - ac) < 0,

which is sufficient to guarantee that the n_ℓ-weighting of equation (5.1.3) is feasible (n_ℓ > -1).
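Because the scalars a, b, and c are problem specific, the A-optimal weighting may equivalently be found by a direct one-dimensional minimization of the trace in (5.1.1). The sketch below does this numerically; the data, the k-value, and the point chosen are assumptions:

    # A-optimal n-weighting for one point by direct scalar minimization.
    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.standard_normal((9, 5))
    k, sig2, ell = 0.01, 1.0, 0
    w = X[ell]

    def tr_V(n):
        F = np.linalg.inv(X.T @ X + n * np.outer(w, w) + k * np.eye(5))
        G_inv = X.T @ X + n * (n + 2) * np.outer(w, w)
        return sig2 * np.trace(F @ G_inv @ F)

    grid = np.linspace(-0.99, 2.0, 3000)       # feasible region n > -1
    n_opt = grid[np.argmin([tr_V(n) for n in grid])]
    print(n_opt, tr_V(n_opt), tr_V(0.0))       # optimum versus unweighted trace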
5.2  A-optimality of the Bias-squared Matrix

Here, the n-weighting which minimizes the trace of the bias-matrix is sought. From equation (4.6.13), if equation (4.6.9) is applied to B(β*_(ℓ)) and the trace taken, then tr[B(β*_(ℓ))] takes the form of equation (5.2.1), where

    r₁ = w_ℓ'F_{ℓ-1}w_ℓ   and   r₂² = β'F_{ℓ-1}w_ℓ w_ℓ'F_{ℓ-1}β.

If tr[B(β*_(ℓ))] of equation (5.2.1) is minimized, then either a finite stationary value

    n_ℓ(A-optimal bias)          (5.2.2a)

is obtained, or

    n_ℓ(A-optimal bias) = +∞.    (5.2.2b)

From Section 4.3 it is known that the trace bias-matrix decreases with increasing n-weighting. The conclusion to be drawn is that the n_ℓ(A-optimal bias) given by equation (5.2.2a) is negative. It may be shown, however, that this negative n-weighting is always less than -1, and consequently infeasible. The only recourse for the analyst is to give the ℓth point infinite n-weighting. Note that the trace of the bias-squared matrix using n_ℓ(A-optimal bias) = +∞ is given by equation (5.2.3).
5.3  D-optimality of the Variance-covariance Matrix

The criterion of D-optimality seeks the n-weighting such that the determinant of the variance-covariance matrix is a minimum. As in the A-optimality criterion, the n-weighting for the ℓth point is needed. From equation (4.6.11) the variance-covariance matrix of β*_(ℓ) is V(β*_(ℓ)) = F_ℓ G_ℓ^{-1} F_ℓ σ², and minimizing its determinant is

    min |F_ℓ G_ℓ^{-1} F_ℓ σ²|.          (5.3.1)

The optimal n_ℓ-weighting will therefore be that which maximizes the quantity V_D of equation (5.3.2). By repeated use of equation (A.2.2) of the Appendix it follows that maximizing V_D of equation (5.3.2) is equivalent to minimizing the quantity D_D of equation (5.3.3). Minimizing D_D of equation (5.3.3) yields the n-weighting

    n_ℓ(D-optimal variance) = [w_ℓ'F_{ℓ-1}w_ℓ - w_ℓ'G_{ℓ-1}w_ℓ] / [(1 - w_ℓ'F_{ℓ-1}w_ℓ) w_ℓ'G_{ℓ-1}w_ℓ].   (5.3.4)

The n_ℓ-weighting of equation (5.3.4) may be seen a little clearer by the following argument. Suppose

    w_ℓ'G_{ℓ-1}w_ℓ (1 - w_ℓ'F_{ℓ-1}w_ℓ) > 0.

Observe that both w_ℓ'F_{ℓ-1}w_ℓ and w_ℓ'G_{ℓ-1}w_ℓ are positive regardless of the choice of preceding n-weights. Consider

    w_ℓ'(G_{ℓ-1} - F_{ℓ-1})w_ℓ.          (5.3.5)

If G_{ℓ-1} - F_{ℓ-1} of equation (5.3.5) is positive definite, then this quantity must be positive. Requiring G_{ℓ-1} - F_{ℓ-1} to be positive definite is equivalent to requiring F_{ℓ-1}^{-1} - G_{ℓ-1}^{-1} to be positive definite (Graybill, 1969), where

    F_{ℓ-1}^{-1} - G_{ℓ-1}^{-1} = kI - Σ_{i=1}^{ℓ-1} n_i(n_i+1) w_i w_i'.          (5.3.6)

Equation (5.3.6) is certainly positive definite if all the n_i's are between -1 and 0. With a sequential procedure, the first n_ℓ(D-optimal variance) will be negative, and so all the remaining optimal n-weightings will be negative. If the quantity (1 - w_ℓ'F_{ℓ-1}w_ℓ) is negative, the n-weighting satisfying equation (5.3.4) may not exist.

As β β' of equation (4.6.13) is singular, it follows that the determinant of F_ℓ β β' F_ℓ is zero. This implies it is not possible to search for a D-optimal bias n-weighting.
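The same numerical strategy used for A-optimality applies here. In this sketch (assumed data and k-value) the determinant of (5.3.1) is minimized over the feasible range directly:

    # D-optimal n-weighting for one point by direct scalar minimization.
    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((9, 5))
    k, ell = 0.01, 0
    w = X[ell]

    def det_V(n):
        F = np.linalg.inv(X.T @ X + n * np.outer(w, w) + k * np.eye(5))
        G_inv = X.T @ X + n * (n + 2) * np.outer(w, w)
        return np.linalg.det(F @ G_inv @ F)     # sigma^2 factor omitted

    grid = np.linspace(-0.99, 2.0, 3000)
    n_opt = grid[np.argmin([det_V(n) for n in grid])]
    print(n_opt)    # typically negative, as the argument above suggests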
5.4  Minimum Residual Sum of Squares

The minimum residual sum of squares criterion is of use when ridge regression has caused the residual sum of squares to become unacceptably large. Rather than choose a different k-value, it is decided to use weighted ridge regression to reduce the residual sum of squares. It is necessary to make the assumption that the function fitted, up to and including the (ℓ-1)th point, does not pass through the ℓth point to be considered. Should this be the case, then any feasible weighting for n_ℓ would be acceptable, as the residual at that point is zero.

The residual sum of squares is defined as

    φ*_(ℓ) = (Y - Xβ*_(ℓ))'(Y - Xβ*_(ℓ)).          (5.4.1)

From equation (4.3.3),

    β*_(ℓ) = β*_(ℓ-1) + n_ℓ F_{ℓ-1} w_ℓ (y_ℓ - ŷ*_ℓ) / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ),

where ŷ*_ℓ = w_ℓ'β*_(ℓ-1). If equation (4.5.3) is used with equation (5.4.1), then

    φ*_(ℓ) = φ*_(ℓ-1) + [n_ℓ / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ)] (y_ℓ - ŷ*_ℓ) w_ℓ'F_{ℓ-1}X'X(β*_(ℓ-1) - β̂) + ··· .   (5.4.2)

Minimization of φ*_(ℓ) results in the optimal n-weighting of equation (5.4.3). The residual sum of squares for the n-weighting of equation (5.4.3) is given by

    φ*_(ℓ)(optimal residual S.Sqs.) = φ*_(ℓ-1) - (a non-negative reduction term).   (5.4.4)

The form of equation (5.4.4) suggests use of the secondary criterion, order of selection of weighting points. If the Cauchy-Schwarz inequality is applied to equation (5.4.4), the bound of equation (5.4.5) is obtained. For the minimum of equation (5.4.5) to be attained, the weighted point w_ℓ must be a multiple of β̂ and so may be written

    w_ℓ = c β̂,          (5.4.6)

where c is a constant.
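A sketch of the criterion follows, with assumed data, k-value, and point order; the residual sum of squares of the sequentially weighted estimator is minimized over n_ℓ using the recursion (4.6.3):

    # Minimum residual sum of squares over the n-weighting of one point.
    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.standard_normal((9, 5))
    Y = rng.standard_normal(9)
    k, ell = 0.01, 0
    w, y = X[ell], Y[ell]
    F0 = np.linalg.inv(X.T @ X + k * np.eye(5))
    b0 = F0 @ X.T @ Y                     # beta*_(0)

    def rss(n):
        b = b0 + n * (F0 @ w) * (y - w @ b0) / (1 + n * (w @ F0 @ w))
        r = Y - X @ b
        return r @ r

    grid = np.linspace(-0.99, 10.0, 5000)
    n_opt = grid[np.argmin([rss(n) for n in grid])]
    print(n_opt, rss(n_opt), rss(0.0))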
5.5  Stability of Estimates of Parameters
Stability of an estimate implies that a small change in the underlying data structure, or data base, should not unduly change the numerical estimate of that parameter. In multicollinear situations, a small change in the data base can often cause dramatic changes in the estimates. When using weighted ridge estimation, it is tempting to use a sufficiently large n-weighting so that small changes in n-weighting will produce very little change in the estimate. This large n-weighting is misleading, as the inherent instability of the estimates has only been masked by the large n-weighting and not properly alleviated.

It follows that a criterion other than "rate-of-change of estimate" with "n-weighting" is needed. This new criterion must not be influenced by large numerical values of n-weighting. From equation (4.5.4),

    β*_(ℓ) = β*_(ℓ-1) + n_ℓ F_{ℓ-1} w_ℓ (y_ℓ - ŷ*_ℓ) / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ).

The rate-of-change dβ*_(ℓ)/dn_ℓ of equation (4.3.4) is

    dβ*_(ℓ)/dn_ℓ = F_{ℓ-1} w_ℓ (y_ℓ - ŷ*_ℓ) / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ)²,          (5.5.1)

and this rate-of-change is clearly dependent on n_ℓ. Large values of n_ℓ will cause dβ*_(ℓ)/dn_ℓ to be small and so indicate an "apparent" stability.
Define a new variable,

    q_ℓ = n_ℓ w_ℓ'F_{ℓ-1}w_ℓ / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ),          (5.5.2)

and consider the rate-of-change of q_ℓ with respect to n_ℓ,

    dq_ℓ/dn_ℓ = w_ℓ'F_{ℓ-1}w_ℓ / (1 + n_ℓ w_ℓ'F_{ℓ-1}w_ℓ)².       (5.5.3)

By the chain-rule of differentiation,

    dβ*_(ℓ)/dq_ℓ = F_{ℓ-1} w_ℓ (y_ℓ - ŷ*_{ℓ-1}) / (w_ℓ'F_{ℓ-1}w_ℓ),   (5.5.4)

and is independent of n_ℓ. Notice that pre- and post-multiplication by w_ℓ yields equation (5.5.5). Define

    Q_ℓ = Σ_{i=1}^{ℓ} q_i = Σ_{i=1}^{ℓ} n_i w_i'F_i w_i.          (5.5.6)

This suggests that, instead of a visual plot of "estimate" against "n-weighting", one should use a plot of "estimate" against "Q_ℓ". Stability could then be estimated visually in the same spirit as the "m-plot" of Vinod (1976). If a visual estimate of stability is rejected as being too subjective, the following standards of stability may be applied.
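The plot variable is easily accumulated. The sketch below (assumed data and n-weights) computes the Q_ℓ path of equation (5.5.6) as points are weighted in turn:

    # Accumulating Q_l as successive points receive n-weights.
    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.standard_normal((9, 5))
    k = 0.01
    n_wt = [-0.4, -0.6, -0.5, -0.3]       # hypothetical n-weights

    F = np.linalg.inv(X.T @ X + k * np.eye(5))
    Q = 0.0
    for i, n in enumerate(n_wt):
        w = X[i]
        a = w @ F @ w
        Q += n * a / (1.0 + n * a)                          # q_i of (5.5.2)
        F -= n * np.outer(F @ w, w @ F) / (1.0 + n * a)     # update to F_i
        print(f"after point {i + 1}: Q = {Q:.4f}")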
5.5.1  The Q* Standard

The Q* standard compares the rate-of-change in the specific problem (with respect to n_ℓ) with the rate-of-change of an unspecified orthogonal problem (with respect to n_ℓ). For the unspecified orthogonal problem, the following definitions are the analogs of equations (4.6.4) and (5.5.1) to (5.5.6):

    β*°_(ℓ) = [(1+k)I + Σ_{i=1}^{ℓ} n_i w_i w_i']^{-1} (β̂ + Σ_{i=1}^{ℓ} n_i w_i y_i),   (5.5.1.1)

where

    F°_{ℓ-1} = [(1+k)I + Σ_{i=1}^{ℓ-1} n_i w_i w_i']^{-1}.          (5.5.1.2)

Also,

    q°_ℓ = n_ℓ w_ℓ'F°_{ℓ-1}w_ℓ / (1 + n_ℓ w_ℓ'F°_{ℓ-1}w_ℓ),         (5.5.1.3)

and

    Q°_ℓ = Σ_{i=1}^{ℓ} q°_i.                                        (5.5.1.4)

Define

    Q*_ℓ = (Q_ℓ - Q°_ℓ)².                                           (5.5.1.5)

When the problem has similar characteristics to an orthogonal situation, Q*_ℓ will be zero. As the orthogonal situation is the most stable situation by definition, the optimum n-weighting is that which minimizes Q*_ℓ of equation (5.5.1.5). Substitution of equations (5.5.1.1) and (5.5.1.3) into equation (5.5.1.5), and minimization of Q*_ℓ, yields the weighting of equation (5.5.1.6). Equation (5.5.1.6) implies either a positive or negative value of n_ℓ is possible, depending on the precise structure of F°_{ℓ-1} and F_{ℓ-1}. Secondary criteria such as variance or bias-squared considerations could be brought into play to determine the appropriate n-weighting.
5.5.2  The Q** Standard

The Q** standard is an extension of the Q* standard in that the theoretical, orthogonal situation is derived from the actual situation being investigated. The advantage of using a specific orthogonal matrix lies in the analyst being able to visualize his experiment as being a "fractional replicate" of a much larger orthogonal experiment. This orthogonal matrix uses the original design matrix X and the O.L.S. estimates β̂. The "new" design matrix X_a, of size (2^p n_o × p), is such that

    X_a'X_a = cI,          (5.5.2.1)

where c is a constant and I is the identity matrix. X_a is formed by stacking X together with the matrices X_aj, where, for example,

            | w_11   w_12   ...  -w_1j  ...  w_1p |
    X_aj =  | w_21   w_22   ...  -w_2j  ...  w_2p |
            |  .      .            .           .  |
            | w_n1   w_n2   ...  -w_nj  ...  w_np |

is X with its jth column negated, the stacking running over the sign patterns of the columns. With the "new" rows of X_a denoted by w_a1', w_a2', ..., the "new" Y-values at these points are then

    ŷ_ai = w_ai'β̂.          (5.5.2.2)

This concept of "creating" an orthogonal design may be seen clearly in Figure 5.1.

Figure 5.1.  An augmented design for p = 2, n_o = 6 (dots denote original points; asterisks denote artificial, augmented points).

If the ordinary least-squares technique is used with the augmented X_a and Y_a,

    β̂_augmented = (X_a'X_a)^{-1} X_a'Y_a,          (5.5.2.3)

where

    X_a'X_a = Σ_{i=1}^{n_o} w_i w_i' + Σ_{i=1}^{(2^p - 1)n_o} w_ai w_ai' = cI,

by definition of orthogonality. Also,

    X_a'Y_a = X'Y + Σ_{i=1}^{(2^p - 1)n_o} w_ai w_ai' β̂.
If the ridge technique is used with this augmented data set,

    β*_augmented = (X_a'X_a + kI)^{-1} X_a'Y_a          (5.5.2.4)
                 = [1 + k/2^p]^{-1} β̂.

The following quantities are established in the same way as their analogs in Section 5.5.1:

    β*^a_(ℓ) = [(2^p + k)I + Σ_{i=1}^{ℓ} n_i w_i w_i']^{-1} (2^p β̂ + Σ_{i=1}^{ℓ} n_i w_i y_i),   (5.5.2.5)

    dβ*^a_(ℓ)/dn_ℓ = F^a_{ℓ-1} w_ℓ (y_ℓ - ŷ^a_{ℓ-1}) / (1 + n_ℓ w_ℓ'F^a_{ℓ-1}w_ℓ)²,              (5.5.2.6)
and

    q^a_ℓ = n_ℓ w_ℓ'F^a_{ℓ-1}w_ℓ / (1 + n_ℓ w_ℓ'F^a_{ℓ-1}w_ℓ),          (5.5.2.7)

where

    F^a_{ℓ-1} = [(2^p + k)I + Σ_{i=1}^{ℓ-1} n_i w_i w_i']^{-1}.

Define

    Q**_ℓ = (Q_ℓ / Q^a_ℓ - 1)².          (5.5.2.8)

Substitution of equations (5.5.1.1) and (5.5.2.6) into equation (5.5.2.8), and minimization of Q**_ℓ with respect to n_ℓ, yields the weighting of equation (5.5.2.9). It is possible that the choice of a negative value from equation (5.5.2.9) results in a non-viable n-weighting (n_ℓ < -1). Such values should be automatically excluded and the positive value of n_ℓ used in the next stage of the sequential estimation process.
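The augmentation itself is mechanical. The following sketch builds X_a from every pattern of column sign flips and confirms X_a'X_a = 2^p I for an assumed X in correlation form:

    # Constructing the augmented orthogonal design of Section 5.5.2.
    import numpy as np
    from itertools import product

    rng = np.random.default_rng(7)
    Z = rng.standard_normal((9, 3))
    Z = Z - Z.mean(axis=0)
    X = Z / np.linalg.norm(Z, axis=0)     # columns scaled: X'X has unit diagonal
    p = X.shape[1]

    blocks = [X * np.array(s) for s in product((1.0, -1.0), repeat=p)]
    X_a = np.vstack(blocks)               # (2^p * n_o) x p augmented design

    assert np.allclose(X_a.T @ X_a, 2**p * np.eye(p))

The off-diagonal cross-products cancel across the sign patterns, which is why the constant c of (5.5.2.1) equals 2^p when X'X is in correlation form.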
5.5.3  The S* Standard

Vinod (1976) proposes a measure of stability of estimates that is not unduly affected by magnitudes of k-values or estimated parameters. Using a new variable (m), defined in terms of the eigenvalues of X'X and the k-value chosen, a standard (S) is proposed which compares the stability of the situation being investigated with an orthogonal situation. The k-value is then altered until the S standard is a minimum; this value of k is then known as k_stability.

The S*_ℓ standard is an adaptation of the S standard except that the focus is on n-weighting for stability, rather than k-value for stability. By definition of the weighted ridge regression estimator,

    β*_(ℓ) = (X_ℓ'X_ℓ + kI)^{-1} X_ℓ'Y_ℓ.          (5.5.3.1)

Equation (5.5.3.1) may be written in canonical form, where D_ℓ is a diagonal matrix with elements λ_ℓj/(λ_ℓj + k), j = 1, ..., p. Following Vinod, define

    m_ℓ = p - Σ_{j=1}^{p} λ_ℓj / (λ_ℓj + k).       (5.5.3.2)

The orthogonal counterpart of equation (5.5.3.2) may be written

    m° = p - p/(1 + k) = pk/(1 + k).               (5.5.3.3)

The orthogonal weighted ridge estimate is related to the orthogonal weighted estimator by following the development of equation (5.5.4); then

    dβ*_(ℓ)/dm_ℓ = (dβ*_(ℓ)/dn_ℓ) / (dm_ℓ/dn_ℓ),   (5.5.3.4)

but the quantity dn_ℓ/dm_ℓ of equation (5.5.3.4) cannot be easily obtained from equation (5.5.3.2) due to the extreme difficulty of expressing λ_ℓj in terms of λ_j and n_ℓ. If each λ_ℓj is bounded by the maximum each root could attain, a solution to equation (5.5.3.4) may be obtained. From equation (4.3.9) it may be deduced that λ_ℓj ≤ (1 + n_ℓ)λ_j, and therefore equation (5.5.3.2) may be written as

    m_ℓ = p - Σ_{j=1}^{p} (1 + n_ℓ)λ_j / [(1 + n_ℓ)λ_j + k].   (5.5.3.5)
The corresponding sequential weighted estimator is given by equation (5.5.3.6), where D̃_ℓ is a diagonal matrix with jth diagonal element (1 + n_ℓ)λ_j / [(1 + n_ℓ)λ_j + k]. If m_ℓ of equation (5.5.3.5) is differentiated with respect to n_ℓ, equation (5.5.3.7) is obtained. If the estimator of equation (5.5.3.6) is differentiated with respect to n_ℓ, equation (5.5.3.8) results. When the orthogonal forms of equations (5.5.3.7) and (5.5.3.8) are substituted into equation (5.5.3.4), the orthogonal rate

    [dβ*_(ℓ)/dm_ℓ]_orthogonality          (5.5.3.9)

is obtained, and the right-hand side of equation (5.5.3.9) is independent of n_ℓ. Define the S*_ℓ measure of stability as

    S*_ℓ = Σ_{j=1}^{p} (dβ*_(ℓj)/dm_ℓ - s)²,       (5.5.3.10)

where

    s = [dβ*_(ℓ)/dm_ℓ]_orthogonality.              (5.5.3.11)

S*_ℓ will be zero for a (replaced) orthogonal design matrix X, and a reasonable criterion would be to choose an n_ℓ that minimized S*_ℓ of equation (5.5.3.10). This would have to be done by numerical approximation, as a closed-form solution to equation (5.5.3.10) could not be found for p > 2.
5.5.4  The S** Standard

The S**_ℓ standard is similar to the S*_ℓ standard but differs in the concept of obtaining the maximum increase in any given root. From equation (4.3.9), using trace considerations, it may be deduced that λ_ℓj ≤ λ_j + n_ℓ w_ℓ'w_ℓ. Define

    m_ℓ = p - Σ_{j=1}^{p} (λ_j + n_ℓ w_ℓ'w_ℓ) / (λ_j + n_ℓ w_ℓ'w_ℓ + k),   (5.5.4.1)

and the corresponding estimator of equation (5.5.4.2), where D̄_ℓ is a diagonal matrix with jth element (λ_j + n_ℓ w_ℓ'w_ℓ) / (λ_j + n_ℓ w_ℓ'w_ℓ + k). If m_ℓ of equation (5.5.4.1) is differentiated with respect to n_ℓ, equation (5.5.4.3) is obtained. If the estimator of equation (5.5.4.2) is differentiated with respect to n_ℓ, equation (5.5.4.4) results. If the orthogonal forms of equations (5.5.4.3) and (5.5.4.4) are substituted into equation (5.5.3.4), then the orthogonal rate of equation (5.5.4.5) is obtained, and the right-hand side of equation (5.5.4.5) is independent of n_ℓ. Define the S**_ℓ measure of stability as

    S**_ℓ = Σ_{j=1}^{p} (dβ**_(ℓj)/dm_ℓ - s)²,     (5.5.4.6)

where the rate dm_ℓ/dn_ℓ is given by equation (5.5.4.7).

Although S*_ℓ and S**_ℓ are appealing in their approach to stability of estimates, the computational aspects of repeated iterations will tend to discourage an analyst from using them.
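The m variable underlying both S standards is simple to compute. The sketch below evaluates m for an assumed X over several k-values:

    # Vinod's multicollinearity allowance m = p - sum_j lambda_j/(lambda_j + k).
    import numpy as np

    rng = np.random.default_rng(8)
    X = rng.standard_normal((9, 5))
    lam = np.linalg.eigvalsh(X.T @ X)

    def m_of_k(k):
        return len(lam) - np.sum(lam / (lam + k))

    for k in (0.0, 0.01, 0.05, 0.5):
        print(f"k = {k:5.2f}   m = {m_of_k(k):.4f}")

The quantity m runs from 0 at k = 0 toward p as k grows, which is what makes it a convenient abscissa for stability plots.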
5.6  Sign Change of an Estimated Parameter

There are instances where the ridge regression solution to a problem contains an estimate with an incorrect sign. A reversal in the sign of a particular estimate is possible by judicious weighting of an observation. A single point (w_i) may be weighted such that an estimate known to carry an incorrect sign will have an acceptable sign after the weighting process. Let β*_j be the coefficient with the incorrect sign, and consider weighting the ith observation. By an argument that parallels the derivation of equation (3.6.9), it may be shown that the required weighting is any value of n satisfying equation (5.6.1).
5.7  Iterative Sequential n-weighting

The iterative sequential technique starts by first determining an initial set of n-weightings obtained by reference to some criterion (for example, D-optimality of the variance-covariance matrix). The points are then reweighted in a sequential fashion, and the process repeated until no further improvement on the solution can be made.

Consider the ℓth stage in the mth iteration and let the estimator, variance-covariance matrix, and bias-matrix be expressed in terms of the Q-matrices defined in Chapter 3. Define X_m to be the X-matrix for the mth iteration, U_m to be the weighting matrix for the mth iteration, so that

    X_m = U_m U_{m-1} ... U_2 U_1 X,

and β*{m} to be the weighted ridge regression estimate for the mth iteration. For the first iteration define

    β*{1} = (X_1'X_1 + kI)^{-1} X_1'Y_1 = (X'U_1X + kI)^{-1} X'U_1Y.   (5.7.1)

It may be shown that V[β*{1}] of equation (5.7.2) reduces to equation (4.1.7) as U_1 tends to the identity. Similarly, the bias-matrix of equation (5.7.3) reduces to equation (4.1.10).
For the mth iteration define the estimator β*{m}, its variance-covariance matrix V[β*{m}], and its bias-matrix B[β*{m}] (equations (5.7.4) to (5.7.6)), where

    U*_m = Π_{i=1}^{m} U_i.

Now consider the ℓth sequential point in the (m+1)th iteration. Define X*_m = U*_m X, so that, in terms of the actual points,

    X*_m' = [w*_{1,m}, ..., w*_{n_o,m}].

Let

    F_ℓ{m+1} = [(X_m'X_m + kI) + Σ_{i=1}^{ℓ} n_i w_{i,m} w_{i,m}']^{-1}   (5.7.7)

and

    G_ℓ{m+1}^{-1} = X*_m'X*_m + Σ_{i=1}^{ℓ} n_i(n_i+2) w_{i,m} w_{i,m}'.   (5.7.11)
The iterative analogs of Sections 5.1 and 5.3 are then easily established. For A-optimality of the variance-covariance matrix, the n-weighting for the ℓth point, (m+1)th iteration, is given by

    n_ℓ{m+1} = d / (2e - df),          (5.7.12)

where d and e are scalar functions of w_{ℓ,m}, F_{ℓ-1}{m}, and G_{ℓ-1}{m},

    f = w_{ℓ,m}'F_{ℓ-1}{m} w_{ℓ,m},

and where w_{ℓ,m}' is the ℓth row of X_m. For the criterion of D-optimality, the weighting for the ℓth point on the (m+1)th iteration is given by equation (5.7.13).

The advantage in using an iterative scheme is that it may be used in conjunction with a secondary criterion to produce a given optimal stopping-rule for the iterative sequences.
In summary, five different criteria for the selection of n-weightings have been discussed. The complexity in solution for the optimal n-weighting with the stability criterion will cause the analyst to consider A-optimality and D-optimality in a more favorable light. If the final criterion of iterative sequential n-weighting is used, the technique often finds n-weightings with favorable secondary features.
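A schematic of the full iterative sequential procedure follows, here with A-optimality of variance as the criterion; the data, k-value, grid, and the fixed number of passes used as a stopping rule are all assumptions:

    # Iterative sequential re-weighting under the A-optimality criterion.
    import numpy as np

    rng = np.random.default_rng(9)
    X = rng.standard_normal((9, 5))
    k, sig2 = 0.01, 1.0
    n_wt = np.zeros(9)                    # current n-weightings

    def tr_V(wts):
        S = X.T @ X
        F = np.linalg.inv(S + k * np.eye(5)
                          + sum(n * np.outer(X[i], X[i]) for i, n in enumerate(wts)))
        G_inv = S + sum(n * (n + 2) * np.outer(X[i], X[i]) for i, n in enumerate(wts))
        return sig2 * np.trace(F @ G_inv @ F)

    grid = np.linspace(-0.99, 1.0, 400)
    for it in range(3):                   # a few full passes over the points
        for ell in range(9):
            vals = []
            for n in grid:                # re-weight point ell, others fixed
                trial = n_wt.copy()
                trial[ell] = n
                vals.append(tr_V(trial))
            n_wt[ell] = grid[int(np.argmin(vals))]
        print(f"iteration {it + 1}: tr(V) = {tr_V(n_wt):.4f}")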
6.  SPECIFIC WEIGHTED RIDGE REGRESSIONS

The concept of weighting the observations will be extended to three specific situations. For each situation, comparisons with the ridge estimator will be made. The first situation concerns the estimation of a function of the unknown parameters; the remaining situations concern differing variance structures of a regression model.
6.1  Functional Weighted Ridge Regression

The estimation of β is sometimes not of primary importance, the principal interest being the estimation of a linear function of the parameters, which may be expressed as

    τ = d'β.

Suppose t is the minimum mean square error estimator of τ, with the understanding that M.S.E.(t) is equivalent to the expected Euclidean distance squared from t to τ. The eigenvalues of X'X are λ_1, ..., λ_p, with associated eigenvectors ξ_1, ξ_2, ..., ξ_p. The eigenvalues of XX' are those of X'X augmented by (n + n_o - p) zero eigenvalues, the eigenvectors corresponding to the p real roots being Xξ_1, Xξ_2, ..., Xξ_p, and the arbitrary eigenvectors corresponding to the (n + n_o - p) zero roots being ζ_{p+1}, ζ_{p+2}, ..., ζ_{n+n_o-p}.
To establish the form of the estimator, first consider the single function case, with the estimator denoted by β*_F(1).

6.1.1  The single eigenvector case

Suppose τ is simply a function of the ℓth eigenvector and the unknown vector of coefficients; i.e.,

    τ = ξ_ℓ'β.

The estimator t may be expressed as

    t = Σ_{i=1}^{p} c_i ξ_i'X'Y + Σ_{j=1}^{n*-p} d_j ζ_j'Y,          (6.1.1.1)

where the c_i and d_j are constants, and n* = n + n_o. Then

    M.S.E.(t) = E(t - τ)²                                            (6.1.1.2)
              = E[(Σ_{i=1}^{p} c_i ξ_i'X'Y + Σ_{j=1}^{n*-p} d_j ζ_j'Y) - ξ_ℓ'β]².

If equation (6.1.1.2) is expanded and all terms containing ξ_i and ξ_j (where i ≠ j) are eliminated due to the orthogonality of ξ_i with ξ_j, then E(t - τ)² takes the form of equation (6.1.1.3).
Consider each term of equation (6.1.1.3) separately. The first term may be simplified by using the definition of the estimator β̂, and so equation (6.1.1.4) follows. Now, using the fact that, in general,

    E(z²) = V(z) + [E(z)]²,

equation (6.1.1.5) follows, which may be rewritten as equation (6.1.1.6) using equation (3.4.8). The term

    E[Σ_{j=1}^{n*-p} d_j ζ_j'Y]²

may be expanded directly to obtain

    E[Σ_{j=1}^{n*-p} d_j ζ_j'Y]² = Σ_{j=1}^{n*-p} d_j² σ² + E[Σ_{j=1}^{n*-p} d_j ζ_j'Xβ]².   (6.1.1.7)

The remaining term of equation (6.1.1.3) is simplified if ζ_j is replaced by X'Xξ_j; i.e., equation (6.1.1.8).
Equation (6.1.1.8) may therefore be written as equation (6.1.1.9). Equation (6.1.1.3) may then be expressed as the sum of equations (6.1.1.6), (6.1.1.7) and (6.1.1.9), giving equation (6.1.1.10) for E(t - τ)².

Now, E(t - τ)² of equation (6.1.1.10) should be minimized with respect to c_ℓ. As V and X'VX are positive definite, it follows that the c_i (i = 1, ..., p, except ℓ) and the d_j (j = 1, ..., n*-p) should be set equal to zero, and the c_ℓ-term minimized directly. Differentiating equation (6.1.1.10) with respect to c_ℓ, equating to zero, and solving for c_ℓ yields equation (6.1.1.11), with all other coefficients zero.
The minimum M.S.E. estimator of τ = ξ_ℓ'β is therefore given by equation (6.1.1.12). For any i ≠ ℓ, the corresponding estimator

    t_i = (1/λ_i) ξ_i'X'Y          (6.1.1.13)

is unbiased for ξ_i'β. Equations (6.1.1.12) and (6.1.1.13) may be combined to form a single estimator by noting that P'β is estimated by

    [Λ + K]^{-1} P'X'Y,

where K is a diagonal matrix with diagonal elements k_ℓ defined by equation (6.1.1.14). It follows that the minimum M.S.E. estimator of τ = ξ_ℓ'β is given by equation (6.1.1.15), where the estimator β*_F is defined in equation (6.1.1.16). Notice how equation (6.1.1.16) is similar to the ordinary weighted ridge estimator except that a specific value for k is demanded and that the identity matrix used in β* is replaced by XΛX'.

6.1.2  The p-eigenvectors case

The extension to the more general case, where τ cannot be expressed as a single function of one of the eigenvectors, will now be investigated. Again, the minimum M.S.E. estimator, t, of τ = d'β is required. d may be written as a linear function of the eigenvectors of X'X; i.e.,

    d = Σ_{i=1}^{p} f_i ξ_i = Pf,          (6.1.2.1)

where f is a vector of constants. The estimator t may therefore be written as

    t = Σ_{i=1}^{p} c_i f_i ξ_i'X'Y + Σ_{j=1}^{n*-p} d_j ζ_j'Y,      (6.1.2.2)
and E(t - τ)² may be expressed as

    E(t - τ)² = E[Σ_{i=1}^{p} (f_i c_i ξ_i'X'Y - f_i ξ_i'β) + Σ_{j=1}^{n*-p} d_j ζ_j'Y]².   (6.1.2.3)

Using arguments similar to those used to derive equations (6.1.1.6), (6.1.1.7) and (6.1.1.9), E(t - τ)² takes the form of equation (6.1.2.4). By an argument paralleling that used to derive equation (6.1.1.11), it may be shown that the optimal c_i (i = 1, ..., p) are given by equation (6.1.2.5), with the d_j zero. It follows that t = Σ_{i=1}^{p} f_i c_i ξ_i'X'Y, and so the complete form of the estimator may be written as equation (6.1.2.6).
The estimator β*_F may be written in canonical form as

    β*_F = P(Λ + K)^{-1} P'X'Y,

and if β*_F = P α*_F, then

    α*_F = (Λ + K)^{-1} P'X'Y.

If Z = XP, such that Z'Z = Λ, equation (6.1.2.6) may be written in terms of α*_F, and hence β*_F may be regarded as being ridge regression applied to weighted canonical variables. Further, notice that

    β*_F = (X'X + PKP')^{-1} X'Y,

and so β*_F is only another form of generalized ridge regression with specific values for the k_i. From consideration of equation (6.1.2.5), the k_i take the form of equation (6.1.2.8). For ridge regression, Hoerl and Kennard (1976) argued for the use of a single k-value of

    k = p σ̂² / β̂'β̂,          (6.1.2.9)

and if equation (6.1.2.8) is considered for the case of standard ridge regression, then

    k_i = σ² / α_i²,           (6.1.2.10)

and equation (6.1.2.9) may only be attained if all the α_i are assumed equal, which is rarely to be found in practice. The disadvantage of functional ridge regression is that parameter estimates are demanded for determination of the k-value.
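A sketch of the resulting generalized ridge estimator follows; the data and true parameters are assumed, and in practice σ² and the α_i would themselves be replaced by estimates:

    # Generalized ridge with component-wise constants k_i = sigma^2 / alpha_i^2.
    import numpy as np

    rng = np.random.default_rng(10)
    X = rng.standard_normal((9, 5))
    beta = rng.standard_normal(5)
    Y = X @ beta + rng.standard_normal(9)
    sig2 = 1.0

    lam, P = np.linalg.eigh(X.T @ X)
    alpha = P.T @ beta                         # canonical parameters
    K = np.diag(sig2 / alpha**2)               # k_i of (6.1.2.10)

    beta_F = np.linalg.solve(X.T @ X + P @ K @ P.T, X.T @ Y)
    print(beta_F)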
6.2  Prior Ridge Regression

Consider the situation where it is known that some observations are more precise (in a variance sense) than others. Assume that E(ε) = 0 and V(ε) = Mσ², where M is a known non-singular matrix of correlation coefficients. The Aitken least squares unbiased estimate of β is

    β̂_A = (X'M^{-1}X)^{-1} X'M^{-1}Y,

with variance-covariance matrix (X'M^{-1}X)^{-1} σ². Use of the generalized ridge regression technique on the Aitken least squares structure yields

    β*_A = (X'M^{-1}X + kI)^{-1} X'M^{-1}Y,          (6.2.1)

and the variance-covariance matrix and bias-matrix are given by equations (6.2.2) and (6.2.3).

The disturbing feature lies in M^{-1}. If the initial set of observations is such that there exist high correlations between responses, then this M will be nearly singular and hence the computed M^{-1} unreliable. A possible solution would be to investigate the use of the ridge technique on M and then continue with standard ridge regression on this result. Let K* denote the n_o × n_o matrix to be added to M prior to its inversion in calculating equation (6.2.1). Then

    β*_P = (X'(M + K*)^{-1}X + kI)^{-1} X'(M + K*)^{-1}Y,          (6.2.4)

with variance-covariance matrix given by equation (6.2.5); the bias-matrix is given by equation (6.2.6).
Suppose K* is a diagonal matrix with all elements equal to k*, and K a diagonal matrix with all elements equal to the positive value k. Equation (6.2.4) may then be written as equation (6.2.7), where the variance-covariance matrix and bias-matrix are given by equations (6.2.8) and (6.2.9) respectively.

Define the estimator resulting from using ridge regression on the Aitken least squares structure as in equation (6.2.10), with bias-matrix (6.2.11). Consider the difference (D_B) between the traces of the bias-matrices of the two estimators. Equation (6.2.11) minus equation (6.2.9) is then equation (6.2.12).
Equation (6.2.12) will be positive definite if the kernel of equation (6.2.13) is positive definite. From Graybill (1969, p. 330), equation (6.2.13) will be positive definite if equation (6.2.14) is positive definite. Using equation (6.2.15), equation (6.2.14) may be expressed as the quantity Q of equation (6.2.16), where M = P_m Λ_m P_m'. Q of equation (6.2.16) can be positive definite only if k* is negative and [I + k* Λ_m^{-1}]^{-1} is positive definite, which is possible only if

    -λ_pm < k* < 0,          (6.2.17)

where λ_pm is the smallest root of M. The implication of equation (6.2.17) is clear. As M becomes more and more ill-conditioned (λ_pm → 0), the existence of a k* to achieve a decrease in the trace bias-matrix becomes confined to a very small negative range.

The variance-covariance matrix for the estimator defined by equation (6.2.10) is given by equation (6.2.18).
Consider the difference (D_V) between the traces of the variance-covariance matrices. Equation (6.2.8) minus equation (6.2.18) is then

    D_V = tr[V(β*_P)] - tr[{X'M^{-1}X + kI}^{-1} X'M^{-1}X {X'M^{-1}X + kI}^{-1}] σ².   (6.2.19)

As M_A = {X'M^{-1}X + kI}^{-1} is positive definite, D_V may be written in the form of equation (6.2.20). D_V of equation (6.2.20) may now be made positive by making the kernel of equation (6.2.20) positive definite, and examination of this kernel reveals the condition of equation (6.2.21).

From equations (6.2.17) and (6.2.21) the conclusion to be drawn is that prior ridge regression will be a trace mean square error improvement over ridge regression for any choice of k* satisfying equation (6.2.22).
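A sketch of the prior ridge computation follows; M, X, Y, and k are assumptions, and k* is placed inside the feasible range of equation (6.2.17):

    # Prior ridge regression: ridge M before inversion, then ridge the estimator.
    import numpy as np

    rng = np.random.default_rng(11)
    X = rng.standard_normal((9, 5))
    Y = rng.standard_normal(9)
    k = 0.01

    A = rng.standard_normal((9, 9))
    M = A @ A.T
    d = np.sqrt(np.diag(M))
    M = M / np.outer(d, d)                     # a known correlation matrix

    lam_min = np.linalg.eigvalsh(M).min()
    k_star = -0.5 * lam_min                    # inside the range (6.2.17)

    M_inv = np.linalg.inv(M + k_star * np.eye(9))      # ridged inversion of M
    beta_prior = np.linalg.solve(X.T @ M_inv @ X + k * np.eye(5),
                                 X.T @ M_inv @ Y)
    print(beta_prior)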
6.3  Variance Weighted Ridge Regression

Suppose the data points are uncorrelated but the variance (i.e., in a sense, the precision) for point i equals c_i σ². Define

    V(ε) = C σ²,          (6.3.1)

where C is a diagonal matrix with ith diagonal element c_i. Define the variance weighted ridge regression estimator as

    β*_C = (X'C^{-1}X + kI)^{-1} X'C^{-1}Y,   with   X'C^{-1}X = Σ_{i=1}^{n_o} (1/c_i) w_i w_i'.   (6.3.2)

The sequential variance weighted ridge estimator is defined as in equation (6.3.3), where the effective weighting is n*_ℓ = n_ℓ/c_ℓ². This estimator has a variance-covariance matrix (6.3.4) and a bias-matrix (6.3.5). The specific n-weightings may be determined in a manner analogous to those found in Sections 5.1 to 5.3. As an example, for the criterion of a minimal trace variance-covariance matrix, the optimal n*_ℓ-weighting is a ratio of quadratic forms in F_C(ℓ-1), w_ℓ, and y_ℓ (equation (6.3.6)). The trace of equation (6.3.4), with the n*_ℓ of equation (6.3.6), is given by

    tr[V(β*_C(ℓ))] = tr[V(β*_C(ℓ-1))] - (a non-negative reduction term).   (6.3.7)

The reduction in equation (6.3.7) may be shown to be greater than or equal to zero, thus indicating that negative n*_ℓ-weightings achieve a diminution in the sequential trace variance-covariance. Similar expressions may be found for the criteria of D-optimality of variance and A-optimality of the bias-matrix.
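A sketch of the estimator (6.3.2) under assumed data, k-value, and known relative variances c_i is:

    # Variance weighted ridge regression for uncorrelated, unequal variances.
    import numpy as np

    rng = np.random.default_rng(12)
    X = rng.standard_normal((9, 5))
    Y = rng.standard_normal(9)
    k = 0.01
    c = rng.uniform(0.5, 2.0, size=9)          # known relative variances

    C_inv = np.diag(1.0 / c)
    beta_C = np.linalg.solve(X.T @ C_inv @ X + k * np.eye(5),
                             X.T @ C_inv @ Y)
    print(beta_C)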
7.  AN EXAMPLE

Some of the techniques developed in earlier chapters will be demonstrated using an example. The parameter values are assumed known in order for a comparison of bias to be made. The parameters chosen came from repeated clinical experiments involving mice. There were nine observations and five parameters to be estimated. The data had been standardized so as to produce X'X in correlation form, and the error terms were drawn from a N(0,1) population. Table 7.1 shows the initial values used and Table 7.2 gives the transformed values.

Applying O.L.S. to the transformed data yields

    β̂ = [ ... , ... , ... , 0.58846, 3.42357]'.

The residual sum of squares is 2.7061 and the estimated variance 0.67653. The Euclidean distance from estimate to parameter is 12.2374. The F-test for regression is significant at α = 0.01. The variance-covariance matrix of the estimator is
    V(β̂) =  | 12.27670   0.72300  -11.86151   3.07188  -0.59535 |
             |            5.08051   -5.13252   1.79507   0.29601 |
             |                      16.58890  -4.54279  -0.24945 |
             |                                 2.34224   0.22013 |
             | (symmetric)                               1.31964 |

The trace of the variance-covariance matrix is 37.6079, and the determinant 78.6356. For an orthogonal situation, the trace of the variance-covariance matrix should equal 5, and the determinant equal 1. As the O.L.S. values are much larger than their orthogonal equivalents, ridge regression is used.
Table 7.1.  Initial values for the example (p = 5, n_o = 9).

    X(initial) = | 1  1  1  -1   1 |      Y(initial) = |  0.21549 |
                 | 1  2  1   0   0 |                   |  3.36302 |
                 | 1  3  1  -2  -2 |                   |  2.73027 |
                 | 2  2  2  -1   3 |                   |  4.52873 |
                 | 2  3  3   0   1 |                   |  6.15905 |
                 | 2  4  2  -2   4 |                   |  6.32341 |
                 | 4  4  3  -2   3 |                   |  5.27558 |
                 | 4  5  4  -1   0 |                   |  7.67087 |
                 | 4  6  4  -1   4 |                   | 11.21620 |

    Beta(initial) = (-0.2, 0.8, 1.0, 0.4, 0.6)',   Variance(σ²) = 1.0.
Table 7.2.  Transformed values for the example (p = 5, n_o = 9).

    X = | -0.356348  -0.521749  -0.384900   0.050252  -0.094967 |
        | -0.356348  -0.298142  -0.384900   0.502520  -0.265908 |
        | -0.356348  -0.074536  -0.384900  -0.402015  -0.607790 |
        | -0.089087  -0.298142  -0.096225   0.050252   0.246915 |
        | -0.089087  -0.074536   0.192450   0.502520  -0.094967 |
        | -0.089087   0.149071  -0.096225  -0.402015   0.417855 |
        |  0.445435   0.149071   0.192450  -0.402015   0.246915 |
        |  0.445435   0.372678   0.481125   0.050252  -0.265908 |
        |  0.445435   0.596285   0.481125   0.050252   0.417855 |

    Y = (-5.06036, -1.91282, -2.54558, -0.74711, 0.88320, 1.04756, 0.00027, 2.39503, 5.94034)',

    Beta = (-0.74833, 3.57771, 3.46410, 0.88443, 3.50999)',

           |  1.000000   0.836660   0.925820  -0.201456   0.472087 |
    X'X =  |             1.000000   0.839146  -0.269680   0.356753 |
           |                        1.000000   0.043519   0.411220 |
           |                                   1.000000  -0.188982 |
           | (symmetric)                                  1.000000 |

The eigenvalues of X'X are 3.036040, 1.045120, 0.725906, 0.157982, and 0.034948.
Table 7.3 illustrates the major properties of ridge estimation for various k-values. The behavior of the individual regression coefficients is displayed as a ridge-plot in Figure 7.1. As the k-value increases in magnitude, the coefficients tend to zero non-monotonically.
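The tabulated quantities are direct matrix computations. The sketch below evaluates one row of Table 7.3 for assumed data; substituting the transformed X, Y, and Beta of Table 7.2 with σ² = 1 should reproduce the corresponding printed row:

    # Ridge-regression summary quantities for a single k-value.
    import numpy as np

    rng = np.random.default_rng(13)
    X = rng.standard_normal((9, 5))
    beta = rng.standard_normal(5)
    Y = X @ beta + rng.standard_normal(9)
    sig2, k = 1.0, 0.01

    S = X.T @ X
    F = np.linalg.inv(S + k * np.eye(5))
    b = F @ X.T @ Y                                  # ridge estimate
    V = sig2 * F @ S @ F                             # variance-covariance
    B = k**2 * F @ np.outer(beta, beta) @ F          # bias matrix
    r = Y - X @ b

    print("tr(V)      =", np.trace(V))
    print("det(V)     =", np.linalg.det(V))
    print("tr(B)      =", np.trace(B))
    print("tr(M.S.E.) =", np.trace(V) + np.trace(B))
    print("E.dist.sq. =", (b - beta) @ (b - beta))
    print("Res.S.Sq.  =", r @ r)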
Weighted ridge regression can be used to produce an estimator that is superior (with a given criterion) to that obtained by ridge regression. With a non-stochastic k-value, the n-weightings are selected sequentially according to the methods of Chapter 5. The specific order of points weighted is determined by the point that yields the maximum improvement for a selected criterion. After all points have been weighted, the procedure is repeated until further improvement cannot be achieved. For the criterion of A-optimality of the variance-covariance matrix, negative n-weightings decline from zero to -1 on each successive iteration (Table 7.4). The physical interpretation of this decline is that the data points are being "discounted" in importance on each successive iteration until β is estimated by the zero vector, owing to the complete absence of data. The decreasing trace of the variance matrix is offset to some degree by the increase of bias, as illustrated in Figure 7.2. However, caution must be used in its interpretation. It is tempting to conclude that monotonicity holds for both variance and bias-squared as the points are sequentially weighted. This is not the case, as can be seen in Figure 7.3. The behavior of individual estimated coefficients is illustrated in Figure 7.4. In each instance, the final beta-value is closer to zero than its initial value, but the path taken by intermediate steps is not monotone.
If Figure 7.2 is examined, the maximum value of tr(B) is β'β, equal to 38.4622 in this instance, the minimum value of tr(V) being zero and occurring together with the maximum value of tr(B).

Table 7.3.  Ridge regression values.
    k      tr(V)    det(V)   tr(B)   tr(M.S.E.)  det(M.S.E.)  E.dist.sq.  Res.S.Sq.
    0      37.6079  78.6356  0       37.6079     78.6356      12.2374     2.7061
    0.005  30.4819  55.0644  0.0817  30.5636     55.1462       9.8022     2.7198
    0.010  25.5034  39.8767  0.2638  25.7672     40.1404       8.1165     2.7528
    0.015  21.8665  29.6629  0.4909  22.3574     30.1538       6.8913     2.7978
    0.020  19.1132  22.5550  0.7365  19.8497     23.2915       5.9674     2.8506
    0.025  16.9670  17.4680  0.9872  17.9542     18.4552       5.2516     2.9088
    0.030  15.2532  13.7408  1.2362  16.4894     14.9770       4.6859     2.9708
    0.035  13.8564  10.9552  1.4802  15.3366     12.4354       4.2324     3.0357
    0.040  12.6980   8.8373  1.7174  14.4154     10.5547       3.8656     3.1025
    0.045  11.7228   7.2029  1.9473  13.6701      9.1501       3.5671     3.1709
    0.050  10.8912   5.9249  2.1695  13.0608      8.0944       3.3239     3.2403

    Note:  k = 0 is the O.L.S. solution.
           tr(V) = trace (variance-covariance matrix).
           det(V) = determinant (variance-covariance matrix).
           tr(B) = trace (bias-matrix).
           tr(M.S.E.) = tr(V) + tr(B).
           det(M.S.E.) = det(V) + tr(B).
           E.dist.sq. = Euclidean distance-squared, estimate to parameter.
           Res.S.Sq. = residual sum of squares.
Table 7.4.  Iterative n-weightings for A-optimality of the variance-covariance matrix (k = 0.01).

    Point       0th iter.  1st iter.  2nd iter.  3rd iter.  4th iter.  5th iter.  6th iter.  7th iter.
    1           0          -0.554519  -0.669375  -0.786939  -0.771083  -0.901830  -0.986857  -0.999985
    2           0          -0.595583  -0.808176  -0.818234  -0.841049  -0.910934  -0.992629  -0.999970
    3           0          -0.596144  -0.749709  -0.728652  -0.876286  -0.941275  -0.996484  -0.999999
    4           0          -0.523406  -0.617180  -0.687183  -0.792396  -0.932013  -0.997265  -0.999998
    5           0          -0.587026  -0.828034  -0.729486  -0.853187  -0.966827  -0.999999  -1.000000
    6           0          -0.661790  -0.595277  -0.701792  -0.793460  -0.951005  -0.996362  -0.999997
    7           0          -0.736564  -0.831780  -0.863564  -0.829468  -0.884204  -0.998868  -1.000000
    8           0          -0.665498  -0.775772  -0.790434  -0.817356  -0.934152  -0.998843  -0.999999
    9           0          -0.466295  -0.663489  -0.680536  -0.779326  -0.837266  -0.985826  -0.999849

    tr(V)       25.5034    16.8648    6.20498    2.33323    0.519097   0.029791   0.000009   0.000000
    tr(B)       0.263772   1.18031    4.7156     9.24916    15.393     29.2681    38.2942    38.4622
    tr(M.S.E.)  25.7672    18.0451    10.9206    11.5824    15.9121    29.2979    38.2942    38.4622

    Note:  The ith n-weight refers to the ith row of X.
Figure 7.1.  The ridge-plot for estimates in standard ridge regression (estimated coefficient versus k, 0 ≤ k ≤ 0.05).
Figure 7.2.  Iterative n-weightings for A-optimality of the variance-covariance matrix (tr(M.S.E.) versus iteration number, k = 0.01).
Figure 7.3.  Individual weighting of points for A-optimality of the variance-covariance matrix (tr(B) and tr(V) versus cumulative order of points weighted, k = 0.01).
Figure 7.4.  Beta-coefficients as a function of n-weights chosen (numerical coefficient value versus cumulative order of points weighted, k = 0.01).
Empirically, it should be noted that the rate of decrease in the trace of the variance matrix is large for the first two or three iterations, while the rate of increase for the trace of the bias matrix is relatively small. This suggests the use of only two iterations to reduce variance when actual values for bias-squared are not known.

For the criterion of D-optimality of the variance-covariance matrix, negative n-weightings are determined. On the sequential recalculation of n-weightings, it is found that weighting further than the second iteration fails to decrease the determinant of the variance-covariance matrix (Table 7.5).
The specific n-weightings are a function of the k-value initially chosen, and Table 7.6 illustrates the steady increase of the negative initial n-weightings as the k-value increases. Somewhat similar characteristics may be observed for the D-optimality of variance (Table 7.7). For this particular example, the Euclidean distance-squared from estimate to parameter decreases as a function of the number of points included in the iteration step. Investigation of other examples showed this not to be a consistent phenomenon, however.

Reference to Figure 7.2 indicates the largest reduction in the trace of the variance-covariance matrix occurs when proceeding from iteration 1 to iteration 2 (a 42.88% reduction). If the problem is recalculated as a standard ridge regression problem with a k-value chosen such that the new trace of the variance-covariance matrix equals the trace of the variance-covariance matrix of the weighted ridge estimator (2nd iteration), bias comparisons may be made. Direct comparison of the trace of the bias matrix by inspection of the difference of two quadratic forms is
Table 7.5.  Iterative n-weightings for D-optimality of the variance-covariance matrix (k = 0.01).

    Point          0th iteration  1st iteration  2nd iteration
    1              0              -0.263390      -0.375763
    2              0              -0.449882      -0.467134
    3              0              -0.271684      -0.368888
    4              0              -0.208880      -0.366138
    5              0              -0.355609      -0.437679
    6              0              -0.288008      -0.407156
    7              0              -0.452579      -0.504362
    8              0              -0.333887      -0.445438
    9              0              -0.204051      -0.338000

    det(V)         39.8767        28.5215        15.4789
    tr(B)          0.263772       0.571228       1.25784
    det(M.S.E.)    40.1404        29.0927        16.7367

    Note:  The ith n-weight refers to the ith row of X.
Table 7.6.  n-weightings for the 1st iteration: A-optimality of the variance-covariance matrix.

    Point   k=0.005    k=0.010    k=0.015    k=0.020    k=0.025
    1       -0.360429  -0.554519  -0.611160  -0.654429  -0.682517
    2       -0.428367  -0.595583  -0.683696  -0.737321  -0.772768
    3       -0.442412  -0.596144  -0.760841  -0.774589  -0.785338
    4       -0.348915  -0.523406  -0.536271  -0.554325  -0.564309
    5       -0.414681  -0.587026  -0.672544  -0.718598  -0.743627
    6       -0.428357  -0.661790  -0.704456  -0.722674  -0.735219
    7       -0.521085  -0.736564  -0.814054  -0.840365  -0.848068
    8       -0.562955  -0.665498  -0.726453  -0.757317  -0.782553
    9       -0.324886  -0.466295  -0.520676  -0.568543  -0.616235

    (a) tr(V)       30.4819   25.5034   21.8665   19.1132   16.9670
        tr(B)       0.0817    0.2637    0.4909    0.7364    0.9871
        tr(M.S.E.)  30.5636   25.7671   22.3574   19.8490   17.9541
    (b) tr(V)       26.6392   16.8648   11.3329   8.3522    6.6172
        tr(B)       0.2278    1.1803    2.3969    3.4568    4.3036
        tr(M.S.E.)  26.8670   18.0451   13.7298   11.8090   10.9208

    Point   k=0.030    k=0.035    k=0.040    k=0.045    k=0.050
    1       -0.702640  -0.717725  -0.729300  -0.738281  -0.745277
    2       -0.797428  -0.815147  -0.828126  -0.837721  -0.844814
    3       -0.795802  -0.805551  -0.814188  -0.821611  -0.827891
    4       -0.570254  -0.573867  -0.576099  -0.577480  -0.578307
    5       -0.756152  -0.760682  -0.759879  -0.755447  -0.748548
    6       -0.745712  -0.754820  -0.762712  -0.769496  -0.775274
    7       -0.849509  -0.849363  -0.849272  -0.849684  -0.850614
    8       -0.782553  -0.785797  -0.785791  -0.783707  -0.780250
    9       -0.616235  -0.628441  -0.636423  -0.641463  -0.644366

    (a) tr(V)       15.2432   13.8564   12.6980   11.7228   10.8912
        tr(B)       1.2362    1.4802    1.7174    1.9473    2.1696
        tr(M.S.E.)  16.4894   15.3366   14.4154   13.6701   13.0608
    (b) tr(V)       5.5176    4.7671    4.2234    3.8110    3.4874
        tr(B)       4.9909    5.5694    6.0710    6.5150    6.9131
        tr(M.S.E.)  10.5085   10.3365   10.2944   10.3260   10.4005

    (a) = standard ridge regression;  (b) = weighted ridge regression.
Table 7.7.  n-weightings for the 1st iteration: D-optimality of the variance-covariance matrix.

    Point   k=0.005    k=0.010    k=0.015    k=0.020    k=0.025
    1       -0.145790  -0.263390  -0.357882  -0.435452  -0.500406
    2       -0.300620  -0.449882  -0.539324  -0.599043  -0.641837
    3       -0.150947  -0.271684  -0.367696  -0.445429  -0.509424
    4       -0.113432  -0.208880  -0.287447  -0.352946  -0.408328
    5       -0.208422  -0.355609  -0.455618  -0.524827  -0.574357
    6       -0.163237  -0.288008  -0.381809  -0.452575  -0.506659
    7       -0.266037  -0.452579  -0.578980  -0.665335  -0.725908
    8       -0.183511  -0.333887  -0.448740  -0.535257  -0.601079
    9       -0.113636  -0.204051  -0.277891  -0.340881  -0.396287

    (a) det(V)       55.0644   39.8767   29.6629   22.5551   17.4680
        tr(B)        0.0818    0.2637    0.4904    0.7364    0.9872
        det(M.S.E.)  55.1462   40.1404   30.1538   23.2915   18.4552
    (b) det(V)       50.1180   28.5215   15.3071   8.0493    4.2360
        tr(B)        0.1339    0.5712    1.2366    2.0098    2.8082
        det(M.S.E.)  50.2519   29.0927   16.5437   10.0592   7.0442

    Point   k=0.030    k=0.035    k=0.040    k=0.045    k=0.050
    1       -0.570468  -0.621630  -0.660742  -0.698786  -0.732201
    2       -0.674074  -0.699282  -0.719569  -0.736278  -0.750300
    3       -0.604398  -0.651424  -0.661047  -0.695295  -0.724753
    4       -0.493192  -0.539377  -0.553022  -0.586918  -0.616679
    5       -0.611075  -0.639195  -0.616683  -0.632880  -0.646579
    6       -0.652645  -0.686484  -0.660218  -0.684941  -0.705541
    7       -0.769779  -0.802577  -0.790074  -0.809415  -0.825506
    8       -0.557124  -0.602517  -0.733898  -0.760343  -0.782205
    9       -0.469110  -0.515938  -0.549420  -0.589257  -0.625609

    (a) det(V)       13.7408   10.9552   8.8373    7.2028    5.9249
        tr(B)        1.2362    1.4802    1.7174    1.9472    2.1695
        det(M.S.E.)  14.9770   12.4354   10.5547   9.1500    8.0944
    (b) det(V)       2.0936    1.0964    0.6899    0.3823    0.2142
        tr(B)        3.7556    4.5438    4.9016    5.5693    6.2088
        det(M.S.E.)  5.8492    5.6402    5.5915    5.9516    6.4230

    (a) = ridge regression quantities;  (b) = weighted ridge regression quantities.

    Note:  The ith n-weight refers to the ith row of X.
inconclusive, as the mixed eigenvalues of the resulting difference matrix make interpretation of the bias-squared difficult. Transformation to the equivalent bias-squared of Sections 4.6 and 4.7 shows weighted ridge regression to be clearly superior to standard ridge regression; this result holds true for both variance criteria.

If the A-optimality of the bias-matrix is investigated, the question arises as to which five points can carry infinite weighting. Selection of that point which gives the maximum reduction in bias-squared leaves the problem of the n-weighting for the remaining points unanswered. At each stage of the iterative scheme, the remaining points are weighted according to the A-optimality of variance criterion. In this way the increase in variance due to infinite n-weighting may be partially compensated by the decrease in variance due to the negative n-weighting. When the maximum of five points are given infinite n-weighting, the remaining points cannot be weighted as the fitted hyperplane is then fixed. Tables 7.8 to 7.11 illustrate these concepts. In each of the A-optimality of bias schemes, the iteration was terminated after the third cycle as being unable to improve the trace of the variance.
Table 7.8.  One point infinitely weighted: A-optimality of the bias matrix, then A-optimality of the variance-covariance matrix (k = 0.01).

                  Ridge       Iteration
    Point         Regression  0          1          2          3
    1             0           0          -0.411964  -0.549213  -0.687007
    2             0           0          -0.319462  -0.492398  -0.622249
    3             0           0          -0.481980  -0.568463  -0.719748
    4             0           0          -0.337697  -0.503713  -0.673792
    5             0           0          -0.402075  -0.543640  -0.741541
    6             0           0          -0.387483  -0.506198  -0.684797
    7             0           0          -0.444940  -0.491564  -0.633128
    8             0           0          -0.325724  -0.491166  -0.664518
    9             0           ∞          ∞          ∞          ∞

    tr(V)         25.5034     37.7580    23.5693    19.8199    13.5481
    tr(B)         0.2638      0.09421    0.3833     0.8540     3.0598
    tr(M.S.E.)    25.7671     37.8522    23.9526    20.6739    16.6079
    E.dist.sq.    8.1165      18.6119    6.6604     5.0753     2.8627
    Res.S.Sq.     2.7528      3.10409    2.8025     2.9372     3.5920

    Estimated
    β1            -2.59918    -4.09789   -2.37050   -2.10201   -1.61089
    β2            5.72552     5.27610    5.56332    5.32985    4.81205
    β3            3.61112     5.57199    3.57528    3.50738    3.33383
    β4            0.70566     0.71702    0.72401    0.73185    0.74014
    β5            3.35435     3.31833    3.29031    3.12561    2.76345
Table 7.9.  Two points infinitely weighted: A-optimality of the bias matrix, then A-optimality of the variance-covariance matrix (k = 0.01).

                  Ridge       Iteration
    Point         Regression  0          1          2          3
    1             0           0          -0.371154  -0.442476  -0.583319
    2             0           0          -0.251449  -0.385177  -0.525019
    3             0           0          -0.240978  -0.339945  -0.472031
    4             0           0          -0.261964  -0.404096  -0.519525
    5             0           ∞          ∞          ∞          ∞
    6             0           ∞          ∞          ∞          ∞
    7             0           0          -0.312694  -0.416048  -0.567961
    8             0           0          -0.366137  -0.437186  -0.608258
    9             0           0          -0.287630  -0.397284  -0.527131

    tr(V)         25.5034     50.4521    24.8170    23.4389    20.8298
    tr(B)         0.2638      0.0124     0.2657     0.2721     0.3043
    tr(M.S.E.)    25.7671     50.4645    25.0827    23.7110    21.1341
    E.dist.sq.    8.1165      18.0960    7.9984     8.0192     7.9352
    Res.S.Sq.     2.7528      3.4851     2.7584     2.7685     2.8026

    Estimated
    β1            -2.59918    -3.97284   -2.64642   -2.69587   -2.75655
    β2            5.72552     5.01805    5.63979    5.58303    5.47771
    β3            3.61112     5.80482    3.70373    3.76344    3.83437
    β4            0.70566     0.83804    0.64627    0.60273    0.54675
    β5            3.35435     3.13205    3.33855    3.32012    3.30723
Table 7.10.  Three points infinitely weighted: A-optimality of the bias matrix, then A-optimality of the variance-covariance matrix (k = 0.01).

                  Ridge       Iteration
    Point         Regression  0          1          2          3
    1             0           0          -0.225752  -0.287334  -0.374320
    2             0           0          -0.144884  -0.243457  -0.330142
    3             0           0          -0.280739  -0.322954  -0.407624
    4             0           0          -0.210001  -0.297203  -0.357431
    5             0           ∞          ∞          ∞          ∞
    6             0           ∞          ∞          ∞          ∞
    7             0           0          -0.267969  -0.275058  -0.375111
    8             0           ∞          ∞          ∞          ∞
    9             0           0          -0.192629  -0.282597  -0.362745

    tr(V)         25.5034     62.0380    25.1453    24.4961    23.3853
    tr(B)         0.2638      0.0075     0.2640     0.2646     0.2663
    tr(M.S.E.)    25.7671     62.0455    25.4093    24.7607    23.6516
    E.dist.sq.    8.1165      12.5183    7.9113     7.7760     7.6373
    Res.S.Sq.     2.7528      6.94725    2.7561     2.7589     2.7636

    Estimated
    β1            -2.59918    -3.85885   -2.57621   -2.57490   -2.59452
    β2            5.72552     4.43064    5.69772    5.66321    5.60004
    β3            3.61112     4.54606    3.62004    3.64972    3.71862
    β4            0.70566     1.70672    0.70543    0.69005    0.65000
    β5            3.35435     4.02831    3.37063    3.37526    3.37094
Table 7.11.  Four points infinitely weighted: A-optimality of the bias matrix, then A-optimality of the variance-covariance matrix (k = 0.01).

                  Ridge       Iteration
    Point         Regression  0          1          2          3
    1             0           0          -0.270553  -0.284363  -0.376099
    2             0           0          -0.141624  -0.236988  -0.316607
    3             0           ∞          ∞          ∞          ∞
    4             0           0          -0.203555  -0.275134  -0.348391
    5             0           ∞          ∞          ∞          ∞
    6             0           ∞          ∞          ∞          ∞
    7             0           ∞          ∞          ∞          ∞
    8             0           0          -0.234249  -0.284786  -0.361115
    9             0           0          -0.175140  -0.262986  -0.335639

    tr(V)         25.5034     61.3360    25.2058    24.6712    23.7734
    tr(B)         0.2638      0.0073     0.2640     0.2646     0.2659
    tr(M.S.E.)    25.7671     61.3433    25.4698    24.9358    24.0393
    E.dist.sq.    8.1165      19.7697    7.9567     7.8743     7.7323
    Res.S.Sq.     2.7528      9.2856     2.7558     2.7581     2.7638

    Estimated
    β1            -2.59918    -4.82746   -2.61123   -2.61810   -2.63105
    β2            5.72552     4.81672    5.66950    5.63809    5.57886
    β3            3.61112     4.55607    3.66810    3.70046    3.76145
    β4            0.70566     1.09132    0.67335    0.65499    0.62039
    β5            3.35435     2.90991    3.35339    3.35292    3.35204
8.  SUMMARY

Ordinary least squares, weighted estimation, and ridge regression are special cases of the more general class, weighted ridge regression. The investigation of some of the properties of this general class has been the subject of this dissertation.

The basic properties of weighted estimation were discussed in Chapter 3. The relationship between the artificial assigning of a variance-covariance structure to the error terms of the assumed linear model, and experimental design, was investigated. In general, weighted estimation is a poor competitor to O.L.S. except for the case when a change of sign of an estimate is desired.

The application of the ridge technique to weighted estimation was covered in Chapter 4. For a given k-value, positive n-weightings caused a decrease in bias-squared, but an increase in variance. Negative n-weightings have the effect of decreasing the variance, but often increase the bias-squared. It is shown, and demonstrated by the example of Chapter 7, that there exist negative n-weightings such that both variance and bias-squared are reduced. Comparison of bias-matrices for two estimates is seldom straightforward when the true parameter values are not known. A technique for comparing the bias-matrices of two estimates, when the true parameter values are unknown, was developed and named equivalent bias-squared.

Chapter 5 dealt with the actual determination of the n-weightings used in weighted ridge estimation. The criteria considered represent a considerable cross-section of those criteria commonly used. With each criterion, a sequential method of determining the n-weighting was used. This method examined each point in the data set in a sequential order, and the point yielding the maximum improvement (with respect to the criterion being used) over the preceding stage was chosen to be weighted.

The extension of the concept of n-weighting to more complex situations was investigated in Chapter 6, and the basic properties of each of the three situations discussed were examined.

An example of weighted ridge regression was given in Chapter 7, and was chosen after some deliberation. Examples and counter-examples, each showing the relative merits and drawbacks of biased estimation, are easy to construct. The example represented a "typical" weighted ridge regression problem with parameter values frequently met by data analysts. When information concerning parameter values is available to the data analyst, biased estimation in general, and weighted ridge regression in particular, have estimators superior (for a specified criterion) to O.L.S.
Further work in weighted ridge regression might be directed toward the following three areas:

(i)   Development of an efficient algorithm for the selection of the n-weightings. The sequential technique has the same kind of drawbacks as does the technique of forward selection in building a model in multiple regression. Possibly suitable algorithms may exist in the areas of non-linear programming or simplex-type search.

(ii)  Investigation of how the assumption of a model that is different from the true model is affected by both the ridge technique and the weighting of points.

(iii) Finally, a full extension from the ridge regression technique of adding an equal positive quantity to the main diagonal of X'X prior to inversion, to the general technique of unequal positive quantities, should be investigated.
REFERENCES

Dykstra, O. 1971. The augmentation of experimental data to maximize |X'X|. Technometrics 13, pp. 682-688.

Goldberger, A. S. 1964. Econometric Theory. Wiley, New York.

Graybill, F. A. 1969. Introduction to Matrices with Applications in Statistics. Wadsworth, Belmont, California.

Hemmerle, W. J. 1975. An explicit solution for generalized ridge regression. Technometrics 17, pp. 309-314.

Hoerl, A. E. and Kennard, R. W. 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, pp. 55-67.

Hoerl, A. E. and Kennard, R. W. 1976. Ridge regression: iterative estimation of the biasing parameter. Comm. in Stat. A5, pp. 77-88.

Hoerl, A. E., Kennard, R. W., and Baldwin, K. F. 1975. Ridge regression: some simulations. Comm. in Stat. A4, pp. 105-123.

James, W. and Stein, C. 1961. Estimation with quadratic loss. Proc. Fourth Berkeley Symposium 1, pp. 361-379. Univ. of Calif. Press, Berkeley, Calif.

Kmenta, J. 1971. Elements of Econometrics. Macmillan, New York.

Lancaster, P. 1969. The Theory of Matrices. Academic Press, New York.

Lawless, J. F. and Wang, P. 1976. A simulation study of ridge and other regression estimators. Comm. in Stat. A5, pp. 307-323.

Leamer, E. E. 1975. A result on the sign of restricted least squares estimates. J. Econometrics 3, pp. 387-390.

Marquardt, D. W. 1970. Generalized inverses, ridge regression, biased linear and non-linear estimation. Technometrics 12, pp. 591-612.

Marquina, N. 1978. Methods for improving non-orthogonal experimental designs. University of Texas, Austin, Tech. Report 126.

Mayer, L. S. and Willke, T. A. 1973. On biased estimation in linear models. Technometrics 15, pp. 497-508.

McDonald, G. C. and Galarneau, D. I. 1975. A Monte Carlo evaluation of some ridge type estimators. J. Amer. Stat. Assoc. 70, pp. 407-416.

Newhouse, J. P. and Oman, S. D. 1971. An evaluation of ridge estimators. Rand Corporation R-716-PR.

Obenchain, R. L. 1975. Ridge analysis following a preliminary test of the shrunken hypothesis. Technometrics 17, pp. 431-445.

Sclove, S. L. 1968. Improved estimators for coefficients in linear regression. J. Amer. Stat. Assoc. 63, pp. 596-606.

Searle, S. R. 1971. Linear Models. Wiley, New York.

Silvey, S. D. 1969. Multicollinearity and imprecise estimation. J. Roy. Stat. Soc. B 31, pp. 539-552.

Toro-Vizcarrondo, C. E. and Wallace, T. D. 1968. A test of the M.S.E. criterion for restrictions in linear regression. J. Amer. Stat. Assoc. 63, pp. 558-572.

Vinod, H. D. 1976. Application of new ridge regression methods to a study of Bell System scale economies. J. Amer. Stat. Assoc. 71, pp. 835-841.

Visco, I. 1978. On obtaining the right sign of a coefficient estimate by omitting a variable from the regression. J. Econometrics 7, pp. 115-117.