Ruppert, David. "On the Bounded-Influence Regression Estimator of Krasker and Welsch."

ON THE BOUNDED-INFLUENCE REGRESSION ESTIMATOR
OF KRASKER AND WELSCH
David Ruppert¹
¹David Ruppert is Associate Professor, Department of Statistics, University
of North Carolina, Chapel Hill, NC 27514. This research was supported by
National Science Foundation Grant MCS-8100748.
ABSTRACT
Recently, Krasker and Welsch (1982) considered a class of bounded-influence
regression estimators. They showed that within this class the so-called
Krasker-Welsch estimator is the only solution to a first-order necessary
condition for strong optimality, i.e., for minimizing, in the sense of
positive definiteness, the asymptotic covariance matrix. However, whether
any strongly optimal estimator in fact exists remained an open question. In
this note, an example is given where no strongly optimal estimator exists.
1. INTRODUCTION
In a recent article, Krasker and Welsch (1982) consider robust estimators
for the linear regression model
    $y_i = x_i \beta + \varepsilon_i$
where $x_i$ is a p-dimensional row vector, $(y_i, x_i)$, $i = 1, \ldots, n$,
are independent and identically distributed, and $\varepsilon_i$ is
distributed $N(0, \sigma^2)$ independently of $x_i$. Their interest is in
bounding the influence of outliers in both $x_i$ and $y_i$.
They consider M-estimators of the form
    $0 = \sum_{i=1}^{n} w(y_i, x_i; \hat\beta_n)\,(y_i - x_i \hat\beta_n)\, x_i^t$    (1.1)
where w is a scalar-valued weighting function.
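For fixed weights, an equation of the form (1.1) is the normal equation of a weighted least-squares fit, so one common way to solve it numerically is iteratively reweighted least squares. The following is a minimal Python sketch under that assumption. The names `solve_m_estimate` and `huber_type_weight`, the cutoff 1.345, and the simulated data are illustrative choices, not taken from Krasker and Welsch (1982), and convergence is not guaranteed for every weighting function.

```python
import numpy as np

def solve_m_estimate(X, y, weight_fn, max_iter=500, tol=1e-12):
    """Iteratively reweighted least squares for 0 = sum_i w_i (y_i - x_i b) x_i^t.

    weight_fn(y, X, beta) must return one nonnegative weight per observation.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # least-squares start
    for _ in range(max_iter):
        w = weight_fn(y, X, beta)
        XtW = X.T * w                                 # scale column i of X^t by w_i
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)  # weighted normal equations
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def huber_type_weight(y, X, beta, c=1.345):
    """Hypothetical weight min{1, c/|residual|}; it bounds the influence of y only."""
    r = np.abs(y - X @ beta)
    return np.minimum(1.0, c / np.maximum(r, 1e-12))

# Simulated data (an assumption for illustration): intercept 1, slope 2.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=200)
y[:5] += 20.0                                         # a few gross outliers in y

beta_hat = solve_m_estimate(X, y, huber_type_weight)
w_hat = huber_type_weight(y, X, beta_hat)
score = X.T @ (w_hat * (y - X @ beta_hat))            # left side of (1.1) at beta_hat
```

At a converged solution `score` should be numerically zero. A weight of this kind does not bound the influence of outlying rows of X, which is the gap the bounded-influence weights discussed in this note are meant to close.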
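If w depends on $\sigma$, then we substitute an estimator $\hat\sigma_n$
for $\sigma$. The influence function (Hampel, 1974) for this estimator is
    $\Omega(y, x; w) = B_w^{-1}\, w(y, x; \beta)\,(y - x\beta)\, x^t$
where
    $B_w = -E\,\frac{\partial}{\partial\beta}\left[ w(y, x; \beta)\,(y - x\beta)\, x^t \right]$.    (1.2)
The asymptotic covariance matrix is $V_w = B_w^{-1} A_w B_w^{-1}$, where
    $A_w = E\left[ w^2(y, x; \beta)\,(y - x\beta)^2\, x^t x \right]$.    (1.3)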
Krasker and Welsch discuss several measures of sensitivity. They choose to
use an invariant (and quite reasonable) measure $\gamma_w$ defined to be
    $\gamma_w = \sup_{y,x} \left[ \Omega^t(y,x;w)\, V_w^{-1}\, \Omega(y,x;w) \right]^{1/2} = \sup_{y,x} \left[ (x A_w^{-1} x^t)^{1/2}\, |y - x\beta|\, w(y,x;\beta) \right]$.
Because an estimator's influence function is normed by the asymptotic
covariance matrix of the estimator itself, Stahel (1981) calls $\gamma_w$
the self-standardized sensitivity.
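The equality of the two expressions for the sensitivity can be checked by direct substitution. The short derivation below is a sketch that assumes $B_w$ is symmetric and nonsingular (the form $V_w = B_w^{-1} A_w B_w^{-1}$ presupposes this), that $w \ge 0$, and it writes $\varepsilon = y - x\beta$ and $\Omega$ for the influence function.

```latex
\begin{aligned}
\Omega^t V_w^{-1}\,\Omega
  &= \bigl(w\,\varepsilon\, x B_w^{-1}\bigr)\,
     \bigl(B_w A_w^{-1} B_w\bigr)\,
     \bigl(B_w^{-1} x^t\, w\,\varepsilon\bigr) \\
  &= w^2 \varepsilon^2 \; x A_w^{-1} x^t ,
\end{aligned}
\qquad\text{so}\qquad
\bigl[\Omega^t V_w^{-1}\Omega\bigr]^{1/2}
  = (x A_w^{-1} x^t)^{1/2}\,|\varepsilon|\, w .
```

The matrix $B_w$ cancels, which is why the sensitivity depends on the weighting function only through $w$ and $A_w$.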
Now let a bound a > 0 be given, and consider the class of all weighting
functions w such that $\gamma_w \le a$ and w depends on y only through
$|y - x\beta|$. We say that w is strongly optimal within this class if
$(V_{\tilde w} - V_w)$ is positive semidefinite for all $\tilde w$ in the
class.
Krasker and Welsch show that if a strongly optimal w exists, then (up to a
scalar multiple) it must be (letting $\varepsilon = y - x\beta$)
    $w(y,x;\beta) = \min\{1,\ a / (|\varepsilon|\,(x A_w^{-1} x^t)^{1/2})\}$.
(This is an implicit definition since $A_w$ appears on the right-hand side.)
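For a given matrix A, a weight of the form min{1, a/(|ε| (x A⁻¹ xᵗ)^(1/2))} is simple to evaluate, and by construction it caps the product (x A⁻¹ xᵗ)^(1/2) |ε| w at the bound a. The Python sketch below illustrates this. The function name `kw_weight` is hypothetical, and treating A as known is an assumption made only for illustration, since in the estimator itself A is defined implicitly.

```python
import numpy as np

def kw_weight(eps, x, A, a):
    """Weight min{1, a / (|eps| * sqrt(x A^{-1} x^t))} for one observation.

    eps : residual y - x beta (scalar)
    x   : regressor row vector, shape (p,)
    A   : p x p positive definite matrix, treated here as given
    a   : sensitivity bound, a > 0
    """
    x = np.asarray(x, dtype=float)
    leverage = np.sqrt(x @ np.linalg.solve(A, x))   # (x A^{-1} x^t)^{1/2}
    scaled = abs(eps) * leverage
    if scaled == 0.0:
        return 1.0                                  # nothing to downweight
    return min(1.0, a / scaled)

# Illustrative values (assumptions, not from the paper).
A = np.eye(3)
x = np.array([1.0, 2.0, 0.0])
a = 2.5
w_large = kw_weight(5.0, x, A, a)    # large residual: weight below 1
w_small = kw_weight(0.1, x, A, a)    # small residual: weight equal to 1
capped = np.sqrt(x @ x) * 5.0 * w_large   # leverage * |eps| * w with A = I
```

Here `capped` equals the bound `a`, while the small-residual observation keeps full weight.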
They conjecture that this w is strongly optimal. In this note we show by
example that in general no strongly optimal estimator exists.
The Krasker-Welsch estimator is invariant to nonsingular reparametrization.
Invariance is not necessarily desirable when the parameters have
physical meanings. To show the practical significance of the lack of a
strongly optimal estimator, we discuss how one might choose alternatives to
the Krasker-Welsch estimator when there are nuisance parameters.
2. AN EXAMPLE
We will take $\sigma = 1$. Suppose $p = 3$ and
    $x_i = (1,\ (1 - U_i) Z_i,\ U_i Z_i)$
where $U_i$ is distributed Bernoulli(1/2), $Z_i$ is distributed N(0,1), and
$\varepsilon_i$, $U_i$, and $Z_i$ are mutually independent.
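For $\Delta > 0$ let
    $w_\Delta(y,x;\beta) = w(y,x;\beta,\Delta) = \begin{cases} \Delta \min\{1,\ a/(\Delta\,|\varepsilon|\,(x A_\Delta^{-1} x^t)^{1/2})\} & \text{if } x^{(2)} \ne 0 \\ \min\{1,\ a/(|\varepsilon|\,(x A_\Delta^{-1} x^t)^{1/2})\} & \text{if } x^{(2)} = 0, \end{cases}$
where $x^{(2)}$ is the second coordinate of x and $A_\Delta = A_{w_\Delta}$
is a solution to equation (1.3). By Theorem 1 of Maronna (1976),
$A_\Delta$ is unique. This also proves that $A_\Delta$ is diagonal, since
the symmetry of Z implies that the matrix obtained by multiplying the
off-diagonal elements of $A_\Delta$ by -1 is also a solution to (1.3). Say
$A_\Delta = \mathrm{diag}(A_{1,\Delta}, A_{2,\Delta}, A_{3,\Delta})$. Now,
equation (1.3) can be rewritten as
    $A_{1,\Delta} = \tfrac{1}{2} E \min\{(\Delta\varepsilon)^2,\ a^2 (A_{1,\Delta}^{-1} + A_{2,\Delta}^{-1} Z^2)^{-1}\} + \tfrac{1}{2} E \min\{\varepsilon^2,\ a^2 (A_{1,\Delta}^{-1} + A_{3,\Delta}^{-1} Z^2)^{-1}\}$    (2.1)
    $A_{2,\Delta} = \tfrac{1}{2} E \left[ \min\{(\Delta\varepsilon)^2,\ a^2 (A_{1,\Delta}^{-1} + A_{2,\Delta}^{-1} Z^2)^{-1}\}\, Z^2 \right]$    (2.2)
    $A_{3,\Delta} = \tfrac{1}{2} E \left[ \min\{\varepsilon^2,\ a^2 (A_{1,\Delta}^{-1} + A_{3,\Delta}^{-1} Z^2)^{-1}\}\, Z^2 \right]$    (2.3)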
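The three equations (2.1)-(2.3) for the diagonal elements A1, A2, A3 can be solved numerically. The Python sketch below is one way to do so, under stated assumptions: ε and Z are standard normal, the expectation over ε is taken in closed form, the expectation over Z uses Gauss-Hermite quadrature, and the solution is found by a plain fixed-point iteration started from upper bounds on each element. The function names, the node count, and the values a = 2.5 and Δ = 1, 10 are illustrative choices rather than part of the original argument, and convergence of the iteration is assumed, not proved.

```python
import math
import numpy as np

# Gauss-Hermite rule for expectations over Z ~ N(0,1); weights rescaled to sum to 1.
Z_NODES, Z_WTS = np.polynomial.hermite_e.hermegauss(80)
Z_WTS = Z_WTS / Z_WTS.sum()
Z2 = Z_NODES**2
_erf = np.vectorize(math.erf)

def mean_min_sq(c):
    """Closed form of E min{eps^2, c} for eps ~ N(0,1), elementwise in c > 0."""
    s = np.sqrt(c)
    Phi = 0.5 * (1.0 + _erf(s / math.sqrt(2.0)))
    phi = np.exp(-0.5 * c) / math.sqrt(2.0 * math.pi)
    return (2.0 * Phi - 1.0 - 2.0 * s * phi) + 2.0 * c * (1.0 - Phi)

def rhs(A, a, delta):
    """Right-hand sides of (2.1)-(2.3) evaluated at A = (A1, A2, A3)."""
    A1, A2, A3 = A
    cap0 = a**2 / (1.0 / A1 + Z2 / A2)             # cap for observations with U = 0
    cap1 = a**2 / (1.0 / A1 + Z2 / A3)             # cap for observations with U = 1
    m0 = delta**2 * mean_min_sq(cap0 / delta**2)   # E min{(delta*eps)^2, cap0} given Z
    m1 = mean_min_sq(cap1)                         # E min{eps^2, cap1} given Z
    return np.array([
        0.5 * np.sum(Z_WTS * m0) + 0.5 * np.sum(Z_WTS * m1),
        0.5 * np.sum(Z_WTS * m0 * Z2),
        0.5 * np.sum(Z_WTS * m1 * Z2),
    ])

def solve_A(a, delta, tol=1e-12, max_iter=5000):
    """Fixed-point iteration A <- rhs(A), started from upper bounds on A1, A2, A3."""
    A = np.array([0.5 * delta**2 + 0.5, 0.5 * delta**2, 0.5])
    for _ in range(max_iter):
        A_new = rhs(A, a, delta)
        if np.max(np.abs(A_new - A) / A) < tol:
            return A_new
        A = A_new
    return A

A_kw = solve_A(a=2.5, delta=1.0)    # Delta = 1: A2 and A3 coincide by symmetry
A_up = solve_A(a=2.5, delta=10.0)   # U = 0 observations upweighted by a factor of 10
```

Comparing `A_kw` with `A_up` gives a numerical impression of how the first two diagonal elements grow as Δ increases.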
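We will take $a > 2$. If we divide (2.1) by $A_{1,\Delta}$, then divide
(2.2) by $A_{2,\Delta}$, and add the resulting equations, we obtain
    $2 = \tfrac{1}{2} E \min\{(\Delta\varepsilon)^2 (A_{1,\Delta}^{-1} + A_{2,\Delta}^{-1} Z^2),\ a^2\} + \tfrac{1}{2} E \min\{\varepsilon^2,\ a^2 (A_{1,\Delta}^{-1} + A_{3,\Delta}^{-1} Z^2)^{-1}\}\, A_{1,\Delta}^{-1}$.    (2.4)
Now, choose a sequence $\Delta_m \to \infty$ such that for i = 1,2,3 the
limits
    $A_{i,\infty} = \lim_{m \to \infty} A_{i,\Delta_m}$
exist (but are possibly $+\infty$). If $A_{1,\infty} < \infty$ and
$A_{2,\infty} < \infty$, then by (2.4) we reach a contradiction, since the
first term on the right-hand side then tends to $a^2/2 > 2$. Similarly,
(2.2) shows that $A_{1,\infty} < \infty$ and $A_{2,\infty} = \infty$ is
impossible, and (2.3) shows that $A_{1,\infty} = \infty$ and
$A_{2,\infty} < \infty$ is impossible. Thus,
$A_{1,\infty} = A_{2,\infty} = \infty$ and $A_{3,\infty}$ solves
    $A_{3,\infty} = \tfrac{1}{2} E \min\{\varepsilon^2 Z^2,\ a^2 A_{3,\infty}\}$.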
Notice that $B_\Delta = B_{w_\Delta}$ is diagonal (as well as $A_\Delta$).
Thus $\hat\beta_n^{(1)}$, $\hat\beta_n^{(2)}$ and $\hat\beta_n^{(3)}$ are
asymptotically uncorrelated, and we can find the asymptotic variance of
$\hat\beta_n^{(3)}$ by examining its estimating equation alone. This is
    $0 = \sum_{i=1}^{n} e_i \min\{1,\ a\,(A_{1,\Delta}^{-1} + A_{3,\Delta}^{-1} Z_i^2)^{-1/2}\, |e_i|^{-1}\}\, U_i Z_i$    (2.5)
where $e_i = y_i - x_i \hat\beta_n$. The asymptotic variance of
$\hat\beta_n^{(3)}$ is the same as the asymptotic variance of
$\hat\beta_n$ in the model
    $y_i = U_i Z_i \beta + \varepsilon_i$    (p = 1)
with estimating equation given by (2.5). For this new model,
$\gamma_{w_\Delta} = a$ for any $\Delta$. Since the new model has a
univariate parameter we can use a result of Hampel (1968, Lemma 5 or 1974,
page 391) to show that $A_{1,\Delta}^{-1} = 0$ (i.e., $\Delta = \infty$) is
optimal, and that $\Delta = 1$ is strictly suboptimal.
Now $\Delta = \infty$ does not give us an estimator of type (1.1), but it
can be approximated arbitrarily closely by taking $\Delta$ sufficiently
large. Thus, for large $\Delta$, $w_\Delta$ is more efficient for
estimating $\beta^{(3)}$ than the Krasker-Welsch estimator. This fact is of
practical importance if $\beta^{(1)}$ and $\beta^{(2)}$ are nuisance
parameters, or at least are of only secondary importance compared with
$\beta^{(3)}$.
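The efficiency comparison for the third coefficient can be illustrated numerically. The Python sketch below rests on several assumptions. It uses the univariate model y = U Z β + ε with σ = 1, and an estimating function of the form clip(e, -c(Z), c(Z)) U Z with squared cutoff c(Z)² = a² / (A1⁻¹ + Z²/A3). For such an estimating function the asymptotic variance is taken to be E[ψ²] / (E[U Z² 1{|ε| ≤ c(Z)}])². The case Δ = 1 uses the solution of the fixed-point equations with A2 = A3, and the limiting case sets A1⁻¹ = 0, with A3 solving A3 = ½ E min{ε² Z², a² A3}. All function names, the node count, and the value a = 2.5 are illustrative.

```python
import math
import numpy as np

# Gauss-Hermite rule for Z ~ N(0,1); an even node count avoids a node at z = 0.
z, wz = np.polynomial.hermite_e.hermegauss(100)
wz = wz / wz.sum()
z2 = z**2
_erf = np.vectorize(math.erf)

def Phi(s):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(s / math.sqrt(2.0)))

def m(c2):
    """E min{eps^2, c2} for eps ~ N(0,1), elementwise in c2 > 0."""
    s = np.sqrt(c2)
    phi = np.exp(-0.5 * c2) / math.sqrt(2.0 * math.pi)
    return (2.0 * Phi(s) - 1.0 - 2.0 * s * phi) + 2.0 * c2 * (1.0 - Phi(s))

def variance_beta3(c2):
    """A / B^2 for psi = clip(e, -c, c) * U * Z, with squared cutoff c2 at each node."""
    A = 0.5 * np.sum(wz * z2 * m(c2))                          # E[psi^2]
    B = 0.5 * np.sum(wz * z2 * (2.0 * Phi(np.sqrt(c2)) - 1.0)) # E[U Z^2 1{|eps| <= c}]
    return A / B**2

def kw_cutoff(a, iters=2000):
    """Squared cutoff for Delta = 1, using the symmetry A2 = A3."""
    A1, A3 = 1.0, 0.5                                          # upper bounds as start
    for _ in range(iters):
        c2 = a**2 / (1.0 / A1 + z2 / A3)
        A1, A3 = np.sum(wz * m(c2)), 0.5 * np.sum(wz * z2 * m(c2))
    return a**2 / (1.0 / A1 + z2 / A3)

def limit_cutoff(a, iters=2000):
    """Squared cutoff in the limiting case where the inverse of A1 is 0."""
    A3 = 0.5
    for _ in range(iters):
        A3 = 0.5 * np.sum(wz * z2 * m(a**2 * A3 / z2))         # 0.5 E min{eps^2 Z^2, a^2 A3}
    return a**2 * A3 / z2

a = 2.5
v_kw = variance_beta3(kw_cutoff(a))      # Delta = 1
v_lim = variance_beta3(limit_cutoff(a))  # limiting case
```

Under these assumptions `v_lim` should come out smaller than `v_kw`, and both exceed 2, the asymptotic variance of least squares in the univariate model.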
3. DISCUSSION
The "psi-functions" of efficient bounded-influence M-estimators are
constructed by downweighting the maximum likelihood estimator's
ψ-function wherever the latter is too large. The downweighting can be equal
for all coordinates, as when one considers only estimators of form (1.1)
with w scalar. Alternatively, the coordinates can be downweighted
differently, e.g., by letting w in (1.1) be a p x p diagonal matrix. If
some parameters are nuisance parameters, then it might be advantageous to
downweight their coordinates severely. This allows one to maintain a given
bound on the sensitivity while estimating the non-nuisance parameters more
efficiently.
Carroll (1983) mentions the example of two-group analysis of covariance
with a balanced covariate and only the treatment difference of interest.
The
Krasker-Welsch estimator will cause a loss of efficiency for the treatment
difference in order to bound the sensitivity to outlying values of the covariate.
General classes of M-estimators which allow unequal downweighting of the
components of the score function have been discussed by Hampel (1978),
Krasker (1980), and Stahel (1981). Within such a general class, it is
generally impossible to find an estimator which is strongly optimal subject
to a bound on some measure of sensitivity.
The conjecture that the Krasker-Welsch estimator was optimal within the
class of estimators with w scalar seemed reasonable to the present author,
since having w scalar greatly restricts flexibility. It did not seem
possible, for example, to treat nuisance parameters differently than other
parameters. After repeated attempts to prove strong optimality, I stopped
work on the problem.
Recently, Roy Welsch (oral communication) mentioned that the
question of strong optimality was still open, but that unpublished work of
Peter Bickel suggested that the Krasker-Welsch estimator might not be strongly
optimal.
Shortly after this, I realized that nuisance parameters could be treated
differently than the other parameters, if some observations contain
information only about the nuisance parameters. In the example,
observations with $U_i = 0$ are upweighted, so in the limit as
$\Delta \to \infty$ the observations with $U_i = 1$ are used only for
estimating $\beta^{(3)}$.
ACKNOWLEDGMENT
I would like to thank R.J. Carroll for encouragement and useful discussions
about this research.
REFERENCES
CARROLL, R.J. (1983). Comment on "Minimax Aspects of Bounded-Influence
Regression" by Peter J. Huber. Journal of the American Statistical
Association, 78, 78-79.
HAMPEL, F.R. (1968). Contributions to the Theory of Robust Estimation.
Ph.D. Thesis, University of California, Berkeley.
HAMPEL, F.R. (1974). The Influence Curve and Its Role in Robust Estimation.
Journal of the American Statistical Association, 69, 383-393.
HAMPEL, F.R. (1978). Optimally Bounding the Gross-Error-Sensitivity and the
Influence of Position in Factor Space. 1978 Proceedings of the ASA
Statistical Computing Section. ASA, Washington, DC, 59-64.
KRASKER, W.S. (1980). Estimation in Linear Regression Models with Disparate
Data Points. Econometrica, 48, 1333-1346.
KRASKER, W.S. and WELSCH, R.E. (1982). Efficient Bounded-Influence Regression
Estimation. Journal of the American Statistical Association, 77,
595-604.
MARONNA, R.A. (1976). Robust M-estimators of Multivariate Location and
Scatter. Annals of Statistics, 4, 51-67.
STAHEL, W.A. (1981). Robuste Schaetzungen: Infinitesimale Optimalitaet und
Schaetzungen von Kovarianzmatrizen. Ph.D. Dissertation, Swiss Federal
Institute of Technology, Zurich.