PARTIAL RIDGE REGRESSION¹

by

D. Raghavarao² and K.J.C. Smith

Department of Statistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 863

February, 1973
ABSTRACT
A partial ridge estimator is proposed as a modification of the Hoerl and Kennard ridge regression estimator. It is shown that the proposed estimator has certain advantages over the ridge estimator. The problem of taking an additional observation to meet certain optimality criteria is also discussed.
¹ This work was supported by NSF Grants GU-2059 and GU-19568 and by U.S. Air Force Grant No. AFOSR-68-1415. Reproduction in whole or in part is permitted for any purpose of the United States Government.

² On leave from Punjab Agricultural University (India).
1. Introduction. Consider the problem of fitting a linear model $y = X\beta + \epsilon$, where $y' = (y_1, y_2, \ldots, y_n)$ is a vector of $n$ observations on the dependent variable; $X = (x_{ij})$ is an $n \times p$ matrix of rank $p$, $x_i' = (x_{i1}, x_{i2}, \ldots, x_{ip})$ being the vector of $i$-th observations on the independent variables ($i = 1, 2, \ldots, n$); $\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$ is the vector of parameters to be estimated; and $\epsilon$ is an $n$-dimensional vector of random errors assumed to be distributed with mean vector $0$ and dispersion matrix $\sigma^2 I_n$, $0$ being a zero vector and $I_n$ the identity matrix of order $n$. Without loss of generality we assume that the dependent and independent variables are standardized so that $X'X$ is a correlation matrix.

Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$ be the eigenvalues of $X'X$ and let $\xi_1, \xi_2, \ldots, \xi_p$ be a set of orthonormal eigenvectors associated with the eigenvalues $\lambda_i$ ($i = 1, 2, \ldots, p$). Let $\alpha_i = \xi_i' \beta$ for $i = 1, 2, \ldots, p$.

The usual least squares estimator of $\beta$ is given by

(1.1)    $\hat{\beta} = (X'X)^{-1} X' y$

and has the unsatisfactory property, when $X'X$ differs substantially from an identity matrix, that the mean squared error, or expected squared distance from $\hat{\beta}$ to $\beta$, tends to be large compared to that of an orthogonal system.

Often an investigator is interested in obtaining a variance balanced design in which each parameter $\beta_i$ is estimated with equal precision. The departure of a design from variance balancedness increases the more $X'X$ differs from an identity matrix.

The ridge regression method proposed by Hoerl and Kennard (1970) estimates $\beta$ by the ridge estimator given by

(1.2)    $\beta^* = (X'X + k I_p)^{-1} X' y$,

where $k$ is a positive real number satisfying

(1.3)    $k < \sigma^2 / \alpha_{\max}^2$,

$\alpha_{\max}$ being the maximum of the $\alpha_i$ ($i = 1, 2, \ldots, p$). The estimator $\beta^*$ is a biased estimator of $\beta$ but has a smaller mean squared error than the least squares estimator $\hat{\beta}$.

We propose here, as an alternative to the ridge estimator $\beta^*$, the estimator

(1.4)    $\beta_p = (X'X + k_p \xi_p \xi_p')^{-1} X' y$,

where

(1.5)    $k_p = \sigma^2 / \alpha_p^2$.

This estimator may be called the partial ridge estimator of $\beta$. We show in Section 2 that the partial ridge estimator estimates $\xi_p' \beta$ with minimum mean squared error and estimates the $\xi_i' \beta$ ($i = 1, 2, \ldots, p-1$) unbiasedly. In Section 3 we consider the problem of taking an additional observation so as to remove the bias of the partial ridge estimator and to attain certain optimality criteria.
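As a small numerical sketch (not from the paper; the design, $\sigma^2$, $k$, and $\beta$ below are hypothetical, and in practice $\alpha_p$ would have to be estimated), the estimators (1.1), (1.2), and (1.4)-(1.5) can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 30, 3
Z = rng.normal(size=(n, p))
Z[:, 2] = Z[:, 0] + 0.05 * rng.normal(size=n)      # near-collinear columns
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(n))      # standardized: X'X is a correlation matrix
beta_true = np.array([1.0, 0.5, -0.5])             # hypothetical parameter vector
y = X @ beta_true + rng.normal(size=n)             # sigma^2 = 1

XtX = X.T @ X
lam, xi = np.linalg.eigh(XtX)                      # eigenvalues in ascending order
xi_p = xi[:, 0]                                    # eigenvector of the smallest eigenvalue lambda_p
alpha_p = xi_p @ beta_true                         # alpha_p = xi_p' beta (known here only by simulation)
k_p = 1.0 / alpha_p ** 2                           # (1.5) with sigma^2 = 1

beta_ls    = np.linalg.solve(XtX, X.T @ y)                               # (1.1) least squares
beta_ridge = np.linalg.solve(XtX + 0.1 * np.eye(p), X.T @ y)             # (1.2) ridge with arbitrary k = 0.1
beta_part  = np.linalg.solve(XtX + k_p * np.outer(xi_p, xi_p), X.T @ y)  # (1.4) partial ridge
print(beta_ls, beta_ridge, beta_part, sep="\n")
```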
2. Partial Ridge Estimator. To control the mean squared error of the estimator of the coefficient vector $\beta$ in the model $y = X\beta + \epsilon$, Hoerl and Kennard (1970) proposed the ridge estimator $\beta^*$ defined by (1.2) and showed that the mean squared error of $\beta^*$ was less than that of the least squares estimator $\hat{\beta}$ of $\beta$. Specifically, the mean squared error of $\beta^*$ is

(2.1)    $E[(\beta^* - \beta)'(\beta^* - \beta)] = \sigma^2 \sum_{i=1}^{p} \dfrac{\lambda_i}{(\lambda_i + k)^2} + k^2 \sum_{i=1}^{p} \dfrac{\alpha_i^2}{(\lambda_i + k)^2} = \gamma_1(k) + \gamma_2(k)$, say,

where $E[\,\cdot\,]$ denotes the expected value of the term in brackets. The term $\gamma_1(k)$ is the sum of the variances of the components of $\beta^*$ and the term $\gamma_2(k)$ is the bias component of the mean squared error. When $k = 0$, the ridge estimator coincides with the least squares estimator.
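The decomposition (2.1) can be evaluated numerically; the following sketch (with a hypothetical collinear design and arbitrary $k$ values) computes the variance component $\gamma_1(k)$ and the bias component $\gamma_2(k)$:

```python
import numpy as np

def ridge_mse_components(X, beta, sigma2, k):
    """gamma_1(k) and gamma_2(k) of (2.1) for the ridge estimator (1.2)."""
    lam, xi = np.linalg.eigh(X.T @ X)                      # eigenvalues/eigenvectors of X'X
    alpha = xi.T @ beta                                    # canonical coefficients alpha_i = xi_i' beta
    gamma1 = sigma2 * np.sum(lam / (lam + k) ** 2)         # sum of variances of the components of beta*
    gamma2 = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)  # bias component
    return gamma1, gamma2

# Hypothetical standardized design with two nearly collinear columns.
rng = np.random.default_rng(1)
Z = rng.normal(size=(30, 3))
Z[:, 2] = Z[:, 0] + 0.05 * rng.normal(size=30)
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(30))
beta = np.array([1.0, 0.5, -0.5])

for k in (0.0, 0.05, 0.2):
    g1, g2 = ridge_mse_components(X, beta, sigma2=1.0, k=k)
    print(k, g1, g2, g1 + g2)    # k = 0 reproduces the least squares mean squared error
```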
We propose as an alternative to the ridge estimator of $\beta$ a partial ridge estimator of $\beta$, denoted by $\beta_p$ and defined by (1.4). The partial ridge estimator has the following property:
Theorem 2.1. The partial ridge estimator $\beta_p = (X'X + k_p \xi_p \xi_p')^{-1} X' y$, where $k_p = \sigma^2 / \alpha_p^2$, is such that $\xi_p' \beta_p$ estimates $\xi_p' \beta$ with minimum mean squared error and $\xi_i' \beta_p$ is the best linear unbiased estimator of $\xi_i' \beta$ ($i = 1, 2, \ldots, p-1$).

Proof. Let $\xi_1, \xi_2, \ldots, \xi_p$ be a set of orthonormal eigenvectors of $X'X$ associated with the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$, and let $\eta_1, \eta_2, \ldots, \eta_{n-p}$ be a set of orthonormal eigenvectors of $XX'$ associated with the zero eigenvalue of $XX'$, which has multiplicity $n-p$. Since the vectors $X\xi_i$ ($i = 1, 2, \ldots, p$) and $\eta_j$ ($j = 1, 2, \ldots, n-p$) form a basis of an $n$-dimensional vector space, any linear estimator of $t = \xi_p' \beta$ can, without loss of generality, be taken to be

(2.2)    $\hat{t} = \sum_{i=1}^{p} c_i\, \xi_i' X' y + \sum_{j=1}^{n-p} d_j\, \eta_j' y$,

where the $c_i$ and $d_j$ are scalars. The mean squared error of $\hat{t}$ as an estimator of $t$ can be shown to be

(2.3)    $E[(\hat{t} - t)^2] = \sum_{i=1}^{p-1} c_i^2 (\lambda_i^2 \alpha_i^2 + \sigma^2 \lambda_i) + (c_p \lambda_p - 1)^2 \alpha_p^2 + c_p^2 \lambda_p \sigma^2 + \sum_{j=1}^{n-p} d_j^2 \sigma^2$.

Minimizing (2.3) with respect to the coefficients $c_i$ and $d_j$, we have

(2.4)    $c_1 = c_2 = \cdots = c_{p-1} = d_1 = d_2 = \cdots = d_{n-p} = 0$,  $c_p = \dfrac{1}{\lambda_p + \sigma^2/\alpha_p^2}$.

Choosing $k_p = \sigma^2 / \alpha_p^2$, the linear estimator of $\xi_p' \beta$ with least mean squared error is therefore $(\lambda_p + k_p)^{-1} \xi_p' X' y$. The best linear unbiased estimators of $\xi_i' \beta$ ($i = 1, 2, \ldots, p-1$) are the least squares estimators $\lambda_i^{-1} \xi_i' X' y$. Making a 1-1 correspondence of estimators of the $\xi_i' \beta$ with estimators of $\beta$, the required estimator $\beta_p$ of $\beta$ is given by

$\beta_p = \Bigl( \sum_{i=1}^{p-1} \lambda_i^{-1} \xi_i \xi_i' + \dfrac{1}{\lambda_p + k_p}\, \xi_p \xi_p' \Bigr) X' y = (X'X + k_p \xi_p \xi_p')^{-1} X' y$.
This completes the proof of Theorem 2.1.
The problem of estimating $k_p$ can be solved either by graphical or iterative procedures as described by Hoerl and Kennard (1970). From (2.1) and (2.3) we note that the bias component in the mean squared error of the partial ridge estimator is smaller than that of the ridge estimator.
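The closing step of the proof of Theorem 2.1 rests on the spectral identity $(X'X + k_p \xi_p \xi_p')^{-1} = \sum_{i<p} \lambda_i^{-1} \xi_i \xi_i' + (\lambda_p + k_p)^{-1} \xi_p \xi_p'$. A brief numerical check of this identity, and of the fact that $\xi_i' \beta_p$ ($i < p$) coincides with the least squares values, is sketched below with hypothetical data and an arbitrary $k_p$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 25, 4
Z = rng.normal(size=(n, p))
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(n))      # X'X is a correlation matrix
y = rng.normal(size=n)
k_p = 0.7                                          # arbitrary positive value; the identity holds for any k_p

lam, xi = np.linalg.eigh(X.T @ X)                  # ascending eigenvalues
order = np.argsort(lam)[::-1]
lam, xi = lam[order], xi[:, order]                 # now lambda_1 >= ... >= lambda_p
xi_p = xi[:, -1]

# Partial ridge estimator (1.4) and its spectral form from the proof of Theorem 2.1.
beta_p = np.linalg.solve(X.T @ X + k_p * np.outer(xi_p, xi_p), X.T @ y)
spectral = sum(np.outer(xi[:, i], xi[:, i]) / lam[i] for i in range(p - 1))
spectral = spectral + np.outer(xi_p, xi_p) / (lam[-1] + k_p)
assert np.allclose(beta_p, spectral @ X.T @ y)

# xi_i' beta_p equals the least squares value xi_i' beta_hat for i = 1, ..., p-1.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(xi[:, :-1].T @ beta_p, xi[:, :-1].T @ beta_hat)
```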
3. Optimum choice of an additional observation. The equation (1.4) defining the partial ridge estimator suggests taking an additional observation $y_{n+1}$ on the dependent variable corresponding to some choice of values of the independent variables. Let us assume without loss of generality that the design matrix with an additional observation is

(3.1)    $X_1 = \begin{pmatrix} X \\ \sqrt{w}\, x_{n+1}' \end{pmatrix}$,

where $x_{n+1}' x_{n+1} = 1$ and $w$ is a non-zero scalar. The least squares estimator of $\beta$ using the additional observation is

(3.2)    $(X'X + w\, x_{n+1} x_{n+1}')^{-1} X_1' y_1$,

where $y_1$ is the augmented vector of observations corresponding to $X_1$; this is an unbiased estimator of $\beta$.
Before discussing the optimal choice of the additional observation, we shall introduce the following:

Definition 3.1. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$ be the eigenvalues of $X'X$, where $X$ is an $n \times p$ design matrix. The departure from variance balancedness of the design $X$ is measured by

(3.3)    $Q(X) = \sum_{i=1}^{p} (\lambda_i - \bar{\lambda})^2$,  where $\bar{\lambda} = \dfrac{1}{p} \sum_{i=1}^{p} \lambda_i$.

An equivalent expression for $Q(X)$ is

(3.4)    $Q(X) = \mathrm{tr}[(X'X)^2] - \dfrac{1}{p}\bigl(\mathrm{tr}[X'X]\bigr)^2$,

where $\mathrm{tr}[A]$ denotes the trace of the matrix $A$.
Definition 3.2. [Kiefer (1959)] Of the class of all $n \times p$ design matrices $X$, the design $X$ is A-optimal if $\mathrm{tr}[(X'X)^{-1}]$ is minimum.

Definition 3.3. [Kiefer (1959)] Of the class of all $n \times p$ design matrices $X$, the design $X$ is D-optimal if $\det[(X'X)^{-1}]$ is minimum, where $\det[\,\cdot\,]$ denotes the determinant of the matrix in brackets.
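The measure $Q(X)$ and the A- and D-criteria of Definitions 3.1-3.3 can be computed directly; the sketch below (hypothetical design) also checks the equivalence of (3.3) and (3.4):

```python
import numpy as np

def Q(X):
    """Departure from variance balancedness, computed via (3.3) and checked against (3.4)."""
    XtX = X.T @ X
    lam = np.linalg.eigvalsh(XtX)
    via_eigenvalues = np.sum((lam - lam.mean()) ** 2)                   # (3.3)
    via_traces = np.trace(XtX @ XtX) - np.trace(XtX) ** 2 / X.shape[1]  # (3.4)
    assert np.isclose(via_eigenvalues, via_traces)
    return via_eigenvalues

rng = np.random.default_rng(4)
Z = rng.normal(size=(15, 3))
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(15))
print("Q(X)        =", Q(X))
print("A-criterion =", np.trace(np.linalg.inv(X.T @ X)))       # tr[(X'X)^{-1}] of Definition 3.2
print("D-criterion =", np.linalg.det(np.linalg.inv(X.T @ X)))  # det[(X'X)^{-1}] of Definition 3.3
```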
The following theorem gives the optimum choice of $w$ and $x_{n+1}$ for an additional observation:

Theorem 3.1. Given the $n \times p$ design matrix $X$, among possible choices of $w$ and $x_{n+1}$ in (3.1), the design

(3.5)    $X^* = \begin{pmatrix} X \\ \sqrt{\dfrac{1 - \lambda_p}{1 - 1/p}}\; \xi_p' \end{pmatrix}$

has the following properties:

(i) $Q(X^*) < Q(X)$;

(ii) among the class of designs $X_1$ in (3.1), $Q(X^*) \leq Q(X_1)$;

(iii) among the class of designs $X_1$ of (3.1) with $Q(X_1)$ minimum, $X^*$ is A- and D-optimal.
Proof. For the design $X_1$ of (3.1),

(3.6)    $Q(X_1) = Q(X) + 2w\, x_{n+1}' X'X\, x_{n+1} + w^2 (1 - 1/p) - 2w$.

The quadratic form $x_{n+1}' (X'X)\, x_{n+1}$ is minimized when $x_{n+1} = \xi_p$, and the minimum value is $\lambda_p$. Substituting this least value of $x_{n+1}' X'X\, x_{n+1}$ in (3.6) and minimizing with respect to $w$, we obtain the stationary value of $w$ to be

(3.7)    $w = \dfrac{1 - \lambda_p}{1 - 1/p}$.

Substituting into (3.6), the minimum value of $Q(X_1)$ is

(3.8)    $Q_{\min}(X_1) = Q(X) - \dfrac{(1 - \lambda_p)^2}{1 - 1/p}$.

Thus $Q(X^*) < Q(X)$, and moreover $Q(X^*)$ is the minimum value of $Q(X_1)$.
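The rank-one update formula (3.6) is easy to verify numerically; a minimal sketch with a hypothetical design and an arbitrary admissible pair $(x_{n+1}, w)$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 12, 3
Z = rng.normal(size=(n, p))
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(n))

def Q(M):
    lam = np.linalg.eigvalsh(M.T @ M)
    return np.sum((lam - lam.mean()) ** 2)

x = rng.normal(size=p)
x /= np.linalg.norm(x)
w = 0.6
X1 = np.vstack([X, np.sqrt(w) * x])                # the design X_1 of (3.1)

lhs = Q(X1)
rhs = Q(X) + 2 * w * x @ X.T @ X @ x + w ** 2 * (1 - 1 / p) - 2 * w   # right side of (3.6)
assert np.isclose(lhs, rhs)
```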
Now

(3.9)    $\det[(X_1' X_1)^{-1}] = \det[(X'X)^{-1}]\,\bigl(1 + w\, x_{n+1}' (X'X)^{-1} x_{n+1}\bigr)^{-1}$.

The maximum value of $x_{n+1}' (X'X)^{-1} x_{n+1}$ is $1/\lambda_p$, attained for $x_{n+1} = \xi_p$, so $\det[(X_1' X_1)^{-1}]$ is minimized with respect to $x_{n+1}$ when $x_{n+1} = \xi_p$. In order that $Q(X_1)$ be least, $w$ must be given by (3.7). Hence $X^*$ is D-optimal among the class of designs $X_1$ with minimum $Q(X_1)$.

To prove the A-optimality of $X^*$, we observe that

(3.10)    $\mathrm{tr}[(X_1' X_1)^{-1}] = \mathrm{tr}[(X'X)^{-1}] - \dfrac{w\, x_{n+1}' (X'X)^{-2} x_{n+1}}{1 + w\, x_{n+1}' (X'X)^{-1} x_{n+1}}$.

The maximum value of the second term on the right hand side of (3.10) is the maximum of

(3.11)    $\mu = \dfrac{1}{\lambda\,(\lambda/w + 1)}$,

where the $\lambda$'s are the eigenvalues of $X'X$. In order that $Q(X_1)$ be least, $w$ is given by (3.7), and the maximum of $\mu$ is then attained when $\lambda = \lambda_p$, that is, when $x_{n+1} = \xi_p$. Thus $X^*$ is A-optimal among the class of matrices $X_1$ with minimum $Q(X_1)$.
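As a numerical illustration of Theorem 3.1 (hypothetical design; not part of the paper), the sketch below forms $X^*$ from (3.5) and (3.7) and checks property (i), the minimum value (3.8), and property (ii) against randomly chosen admissible pairs $(x_{n+1}, w)$ with $w > 0$:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 18, 4
Z = rng.normal(size=(n, p))
X = (Z - Z.mean(0)) / (Z.std(0) * np.sqrt(n))

def Q(M):
    lam = np.linalg.eigvalsh(M.T @ M)
    return np.sum((lam - lam.mean()) ** 2)

lam, xi = np.linalg.eigh(X.T @ X)
lam_p, xi_p = lam[0], xi[:, 0]                     # smallest eigenvalue and its eigenvector
w_opt = (1 - lam_p) / (1 - 1 / p)                  # the stationary value (3.7)
X_star = np.vstack([X, np.sqrt(w_opt) * xi_p])     # the design X* of (3.5)

# Property (i) and the minimum value (3.8).
assert Q(X_star) < Q(X)
assert np.isclose(Q(X_star), Q(X) - (1 - lam_p) ** 2 / (1 - 1 / p))

# Property (ii): Q(X*) <= Q(X_1) for randomly chosen admissible x_{n+1} and w > 0.
for _ in range(200):
    x = rng.normal(size=p)
    x /= np.linalg.norm(x)
    w = rng.uniform(0.01, 3.0)
    assert Q(X_star) <= Q(np.vstack([X, np.sqrt(w) * x])) + 1e-10
```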
References
Hoerl, Arthur E. and Kennard, Robert W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems." Technometrics, 12, 55-67.

Kiefer, J. (1959). "Optimum Experimental Designs." J. Roy. Statist. Soc., Ser. B, 21, 272-304.