Robust Regression by Trimmed Least-Squares Estimation
by
David Ruppert
and
Raymond J. Carroll

Koenker and Bassett (1978) introduced regression quantiles and suggested a method of trimmed least-squares estimation based upon them. We develop asymptotic representations of regression quantiles and trimmed least-squares estimators. The latter are robust, easily computed estimators for the linear model. Moreover, robust confidence ellipsoids and tests of general linear hypotheses based on trimmed least squares are available and are computationally similar to those based on ordinary least squares.
David Ruppert is Assistant Professor and Raymond J. Carroll is Associate
Professor, both at the Department of Statistics, the University of North
Carolina at Chapel Hill.
This research was supported by the National Science
Foundation Grant NSF MCS78-01240
and the Air Force Office of Scientific Research
under contract AFOSR-75-2796.
The authors wish to thank Shiv K. Aggarwal for his programming assistance.
1. INTRODUCTION
We will consider the linear model

    Y = Xβ + e                                                          (1.1)

where Y = (Y_1, ..., Y_n)', X is an n×p matrix of known constants, β = (β_1, ..., β_p)' is a vector of unknown parameters, and e = (e_1, ..., e_n)', where e_1, ..., e_n are i.i.d. with distribution F.
Recently there has been much interest in estimators of β which do not have two serious drawbacks of the least-squares estimator: inefficiency when the errors have a distribution with heavier tails than the Gaussian, and great sensitivity to a few outlying observations. In general, such methods, which fall under the rubric of robust regression, are extensions of techniques originally introduced for estimating location parameters. In the location model three broad classes of estimators, M, L, and R estimators, are available. See Huber (1977) for an introduction to these classes.
For the linear model, M and R estimators have been studied extensively.
Until recently only Bickel (1973) had studied regression analogs of L-estimators. His estimates have attractive asymptotic efficiencies, but they are computationally complex. Moreover, they are not equivariant under reparametrization (see the remarks after his Theorem 3.1).
There are compelling reasons for extending L-estimators to the linear
model.
For the location problem, L-estimators, particularly trimmed means,
are attractive to those working with real data.
Stigler (1977) recently
applied robust estimators to historical data and concluded that "the 10%
trimmed mean (the smallest nonzero trimming percentage included in the study)
emerges as the recommended estimator."
Jaeckel (1971) has shown that if F is symmetric then for each L-estimator of location there are asymptotically equivalent M and R estimators. However, without knowledge of F it is not possible to match up an L-estimator with its corresponding M and R estimators. For example, trimmed means are asymptotically equivalent to Huber's M-estimate, which is the solution b to

    Σ_{i=1}^{n} ρ((X_i − b)/s_n) = min!

where

    ρ(x) = x²/2            if |x| ≤ k
         = k|x| − k²/2     if |x| > k .

The value of k is determined by the trimming proportion α of the trimmed mean, F, and the choice of s_n. In the scale non-invariant case (s_n = 1), k = F^{-1}(1−α). The practicing statistician, who knows only his data, may find his intuition of more assistance when choosing α than when choosing k.
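For illustration, the correspondence k = F^{-1}(1−α) is easy to tabulate; the short sketch below (our own, not part of the original paper) assumes a standard Gaussian F and s_n = 1.

    from scipy.stats import norm

    # Cutoff k implied by the trimming proportion alpha when F is standard
    # Gaussian and s_n = 1, i.e. k = F^{-1}(1 - alpha).
    for alpha in (0.05, 0.10, 0.25):
        print(alpha, norm.ppf(1 - alpha))   # alpha = 0.10 gives k of about 1.28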
Recently Koenker and Bassett (1978) have extended the concept of quantiles to the linear model. Let 0 < θ < 1. Define

    ψ_θ(x) = θ          if x > 0
           = θ − 1      if x < 0,                                       (1.2)

and ρ_θ(x) = x ψ_θ(x). Then a θth regression quantile, β̂(θ), is any value of b which solves

    Σ_{i=1}^{n} ρ_θ(Y_i − x_i b) = min!                                 (1.3)

Their Theorem 4.2 shows that regression quantiles have asymptotic behavior similar to sample quantiles in the location problem.
Therefore, L-estimates
consisting of linear combinations of a fixed number of order statistics, for
example the median, trimean, and Gastwirth's estimator, are easily extended to the linear model and have the same asymptotic efficiencies as in the location model.
Moreover, as they point out, regression quantiles can be easily
computed by linear programming techniques.
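To make the linear programming connection concrete, the following sketch (an illustrative implementation, not the authors' program; the function name and the use of scipy are our own choices) computes a θth regression quantile by minimizing (1.3) as a linear program, with residuals split into positive and negative parts.

    import numpy as np
    from scipy.optimize import linprog

    def regression_quantile(X, y, theta):
        """Compute a theta-th regression quantile by solving (1.3) as an LP."""
        n, p = X.shape
        # Write y - Xb = u - v with u, v >= 0 and minimize
        # sum_i [theta * u_i + (1 - theta) * v_i] over (b, u, v).
        c = np.concatenate([np.zeros(p), theta * np.ones(n), (1 - theta) * np.ones(n)])
        A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * p + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
        return res.x[:p]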
They also suggest the following trimmed least-squares estimator, call it β̂_T(α): remove from the sample any observations whose residual from β̂(α) is negative or whose residual from β̂(1−α) is positive, and calculate the least-squares estimator using the remaining observations.
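A sketch of this trimmed least-squares estimator follows (it reuses the hypothetical regression_quantile helper and the numpy import from the previous sketch; the strictness of the inequalities does not matter asymptotically).

    def trimmed_least_squares(X, y, alpha):
        """Koenker-Bassett style trimmed least squares with trimming proportion alpha."""
        b_lo = regression_quantile(X, y, alpha)
        b_hi = regression_quantile(X, y, 1 - alpha)
        # Keep observations whose residual from beta_hat(alpha) is nonnegative
        # and whose residual from beta_hat(1 - alpha) is nonpositive.
        keep = (y - X @ b_lo >= 0) & (y - X @ b_hi <= 0)
        beta_T, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        return beta_T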
They conjecture that if lim_{n→∞} n^{-1}(X'X) = Q (positive definite), then the asymptotic covariance matrix of β̂_T(α) is n^{-1}σ²(α,F)Q^{-1}, where n^{-1}σ²(α,F) is the variance of an α-trimmed mean from a population with distribution F.

In this paper we develop asymptotic expansions for β̂(θ) and β̂_T(α) which provide simple proofs of Koenker and Bassett's Theorem 4.2 and their conjecture about the asymptotic covariance of β̂_T(α).
In the location model, if F is asymmetric then there is no natural parameter to estimate. In the linear model, if the design matrix is chosen so that one column, say the first, consists entirely of ones and the remaining columns each sum to zero, then our expansions show that for each 0 < α < 1/2,

    n^{1/2}(β̂_T(α) − β − δ̃(α)) →_L N(0, Q^{-1} σ²(α,F)),

where δ̃(α) is a vector whose components are all zero except for the first. Therefore, the ambiguity about the parameter being estimated involves only the intercept and none of the slope parameters.

Additionally, we present a large sample theory of confidence ellipsoids and general linear hypothesis testing which is quite similar to that of least-squares estimation with Gaussian errors.

The close analogy between the asymptotic distributions of trimmed means and of our trimmed least-squares estimator, β̂_T(α), is remarkable. Other reasonable definitions of a trimmed least-squares estimator do not have this property.
For example, define an estimator of β, call it K50, as follows: compute the residuals from β̂(.5) and, after removing the observations with the k = [αn] smallest and k largest residuals, compute the least-squares estimator. The asymptotic behavior of K50, which is complicated and is not identical to that of β̂_T(α), will be reported elsewhere.
Section 2 presents the formal model being considered, the main results are
in Section 3, and several examples are found in Section 4.
Proofs are in the
appendix.
2. NOTATION AND ASSUMPTIONS

Recall the form (1.1) of our model. Although Y, X, and e will depend on n, we will not make that explicit in the notation. Let x_i = (x_{i1}, ..., x_{ip}) be the ith row of X. Assume x_{i1} = 1 for i = 1, ..., n,

    lim_{n→∞} ( max_{j≤p; i≤n} (n^{-1/2} |x_{ij}|) ) = 0,               (2.1)

and there exists a positive definite Q such that

    lim_{n→∞} n^{-1}(X'X) = Q.                                          (2.2)

Without loss of generality, we assume F^{-1}(1/2) = 0. Let N_p(μ, Σ) denote the p-variate Gaussian distribution with mean μ and covariance Σ. Assume 0 < α₁ < α₂ < 1 and 0 < θ₁ < ... < θ_m < 1. For 0 < p < 1, let ξ_p = F^{-1}(p), and define ξ₁ = ξ_{α₁} and ξ₂ = ξ_{α₂}. Assume F is everywhere continuous and has a continuous positive density f in neighborhoods of ξ₁, ξ₂, and ξ_{θ₁}, ..., ξ_{θ_m}. Whenever r is a number, r̃ is the p-dimensional vector with (r̃)₁ = r and (r̃)_j = 0 for j = 2, ..., p. Let I_p be the p×p identity matrix.
3. MAIN RESULTS

In Section 1 we defined a θth regression quantile to be any value of b which solves (1.3). We now assume that some rule has been imposed which for each θ selects a single solution, which we denote by β̂(θ). Asymptotic results do not depend on the rule used.

Theorem 1: The solution β̂(θ) of (1.3) satisfies

    n^{-1/2} Σ_{i=1}^{n} x_i' ψ_θ(Y_i − x_i β̂(θ)) →_p 0.

Define β(θ) = β + ξ̃_θ. The next theorem shows that β̂(θ) − β(θ) is essentially a sum of i.i.d. random variables.

Theorem 2: The following representation holds:

    n^{1/2}(β̂(θ) − β(θ)) = [f(ξ_θ)]^{-1} Q^{-1} n^{-1/2} Σ_{i=1}^{n} x_i' ψ_θ(e_i − ξ_θ) + o_p(1).
Theorem 2 provides an easy proof of Theorem 4.2 of Koenker and Bassett (1978), which we state as a corollary.

Corollary 1: Let Ω = Ω(θ₁, ..., θ_m; F) be the symmetric m×m matrix defined by

    Ω_{ij} = θ_i(1 − θ_j) / [f(ξ_{θ_i}) f(ξ_{θ_j})],   i ≤ j.

Then

    n^{1/2}((β̂(θ₁) − β(θ₁))', ..., (β̂(θ_m) − β(θ_m))')' →_L N(0, Ω ⊗ Q^{-1}).    (3.1)
We now define trimmed least-squares estimators. Since F is not assumed to be symmetric, it is natural to allow asymmetric trimming. Let α = (α₁, α₂) and define β̂_T(α) to be a least-squares estimator calculated after removing all observations satisfying

    Y_i − x_i β̂(α₁) ≤ 0   or   Y_i − x_i β̂(α₂) ≥ 0.                    (3.2)

(Asymptotic results are unaffected by requiring strict inequalities in (3.2), which is Koenker and Bassett's suggestion.) Let a_i = 0 or 1 according as i satisfies (3.2) or not, and let A be the n×n diagonal matrix with A_{ii} = a_i. Then

    β̂_T(α) = (X'AX)^{-}(X'Ay),

where (X'AX)^{-} is a generalized inverse of (X'AX). (For n sufficiently large X'AX will be invertible.) Let

    φ(x) = ξ₁/(α₂ − α₁)    if x < ξ₁
         = x/(α₂ − α₁)     if ξ₁ ≤ x ≤ ξ₂                               (3.3)
         = ξ₂/(α₂ − α₁)    if ξ₂ < x .
Define

    δ(α) = (α₂ − α₁)^{-1} ∫_{ξ₁}^{ξ₂} x dF(x)

and

    σ²(α,F) = E[φ(e₁) − E φ(e₁)]².

By deWet and Venter (1974, equation (6)), σ²(α,F) is the asymptotic variance of a trimmed mean with trimming proportions α₁ and 1 − α₂ from a population with distribution F.
We have the asymptotic expansion

Theorem 3:

    n^{1/2}(β̂_T(α) − β) = Q^{-1} n^{-1/2} Σ_{i=1}^{n} x_i' [φ(e_i) − E φ(e_i) + δ(α)] + o_p(1).    (3.4)

Therefore, if

    Σ_{i=1}^{n} x_{ij} = 0   for j = 2, ..., p,                         (3.5)

then

    n^{1/2}(β̂_T(α) − β − δ̃(α)) →_L N(0, σ²(α,F) Q^{-1}).               (3.6)
Expression (3.4) is similar to a result of deWet and Venter (1974, equation (5)). Note that (3.6) verifies Koenker and Bassett's conjecture on the covariance of β̂_T(α). Moreover, since (β + δ̃(α))_j = β_j for j = 2, ..., p, the bias of β̂_T(α) for β involves only the intercept, β₁, and not the slopes. Also δ(α) = 0, so that β̂_T(α)₁ is asymptotically unbiased for β₁, if F is symmetric and the trimming is symmetric (α₁ = 1 − α₂).
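A small Monte Carlo sketch of (3.6) follows. The data-generating choices here (sample size, skewed errors with median zero, a single centered regressor) are our own, and the helpers are the hypothetical ones sketched in Section 1; the slope estimates should center on the true slope, while only the intercept picks up the shift δ(α).

    rng = np.random.default_rng(0)
    n = 200
    beta_true = np.array([1.0, 2.0])
    x1 = rng.normal(size=n)
    x1 -= x1.mean()                      # enforce (3.5): non-intercept column sums to zero
    X = np.column_stack([np.ones(n), x1])
    estimates = []
    for _ in range(200):
        e = rng.exponential(size=n) - np.log(2.0)   # asymmetric errors with F^{-1}(1/2) = 0
        y = X @ beta_true + e
        estimates.append(trimmed_least_squares(X, y, 0.10))
    estimates = np.asarray(estimates)
    # Mean slope estimate should be near 2.0; the intercept is shifted by roughly delta(alpha).
    print(estimates.mean(axis=0))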
We will now show that σ²(α,F) can be estimated consistently.

Theorem 4: Let S be the sum of squares of the residuals calculated from the trimmed sample,

    S = y'A(I_n − X(X'AX)^{-}X')Ay,

let â_j = (β̂(α_j) − β̂_T(α))₁ for j = 1, 2, and let

    s²(α,F) = (α₂ − α₁)^{-2} { n^{-1}S + α₁â₁² + (1 − α₂)â₂² − [α₁â₁ + (1 − α₂)â₂]² }.

Then

    s²(α,F) →_p σ²(α,F).
We have seen that applying least squares to the trimmed sample produces a robust point estimator. We will now see that one can also construct confidence ellipsoids and test general linear hypotheses by applying simple modifications of least-squares methods to the trimmed sample.
Theorem 5: Suppose m is the number of observations which have been removed by trimming. For 0 < ε < 1, let F(n₁, n₂, ε) denote the (1 − ε) quantile of the F distribution with n₁ and n₂ degrees of freedom, and let d(ℓ, n − m − p, ε) be the corresponding critical value computed from s²(α,F) and F(ℓ, n − m − p, ε). Suppose that for some integer ℓ, K and c are matrices of size ℓ×p and ℓ×1, respectively, and that K has rank ℓ. If Kβ_T(α) = c, where β_T(α) ≡ β + δ̃(α) is the parameter estimated by β̂_T(α) (see (3.6)), then

    P{ (Kβ̂_T(α) − c)'[K(X'AX)^{-}K']^{-1}(Kβ̂_T(α) − c) ≤ d(ℓ, n − m − p, ε) } → 1 − ε.    (3.7)

Letting K = I_p and c = β_T(α), the confidence ellipsoid

    { b : (β̂_T(α) − b)'(X'AX)(β̂_T(α) − b) ≤ d(p, n − m − p, ε) }        (3.8)

for β_T(α) has an asymptotic confidence coefficient of 1 − ε.
See Scheffé (1959) for a discussion of confidence ellipsoids and their use. Moreover, if we test

    H₀: Kβ_T(α) = c   versus   H₁: Kβ_T(α) ≠ c                          (3.9)

by rejecting H₀ whenever the quadratic form on the left-hand side of (3.7) exceeds d(ℓ, n − m − p, ε),   (3.10)

then the asymptotic size of our test is ε. Most hypotheses of interest in regression analysis are special cases of (3.9). See Searle (1971, section 3.6) for further discussion.
Of course, in the special case where α₁ = 0 and α₂ = 1 (so m = 0 and A = I), and F is Gaussian, (3.8) is an exact 1 − ε confidence ellipsoid and (3.10) is an exact size ε test.
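The following sketch shows one way to carry out such a test numerically. It is built directly from the limit law (3.6) rather than from the exact statistic in (3.10): the covariance of β̂_T(α) is approximated by s²(X'X)^{-1}, with s² taken as a consistent estimate of σ²(α,F) (for instance the estimator of Theorem 4) supplied by the user, and the critical value is the F quantile with n − m − p denominator degrees of freedom. The finite-sample scaling of the paper's own statistic may differ; numpy is assumed imported as in the earlier sketches.

    from scipy.stats import f as f_dist

    def wald_type_test(beta_T, X, s2, K, c, m, eps=0.05):
        """Asymptotic test of H0: K(beta + delta(alpha)~) = c based on (3.6).

        beta_T : trimmed least-squares estimate
        s2     : consistent estimate of sigma^2(alpha, F)
        m      : number of observations removed by trimming
        """
        n, p = X.shape
        ell = K.shape[0]
        cov = s2 * np.linalg.inv(X.T @ X)     # approximate Cov(beta_T) from (3.6) and (2.2)
        diff = K @ beta_T - c
        stat = diff @ np.linalg.solve(K @ cov @ K.T, diff) / ell
        crit = f_dist.ppf(1 - eps, ell, n - m - p)
        return stat, crit, stat > crit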
Finally, let us demonstrate that β̂(θ) and β̂_T(α) are equivariant under the transformation which maps β to A^{-1}β and X to XA, where A is any p×p nonsingular matrix. The equivariance of β̂(θ) is (iv) of Koenker and Bassett's (1978) Theorem 3.2. Thus the residuals are invariant, which implies that the observations trimmed are the same for the original model and the transformed model. Then the equivariance of β̂_T(α) follows from the equivariance of the least-squares estimator.
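As a quick numerical check of this equivariance (hypothetical data; the helpers and numpy import are the sketches from Section 1), reparametrizing the design by a nonsingular A should transform the trimmed least-squares estimate by A^{-1}, up to numerical error and possible non-uniqueness of the regression quantiles.

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(50), rng.normal(size=50)])
    y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=50)
    A = np.array([[1.0, 0.5],
                  [0.0, 2.0]])
    b_orig = trimmed_least_squares(X, y, 0.10)
    b_repar = trimmed_least_squares(X @ A, y, 0.10)
    print(np.allclose(b_orig, A @ b_repar))   # expected True, barring ties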
4. EXAMPLES

By presenting a plot of regression quantiles for a simple linear regression example and comparing trimmed least squares with several other estimation techniques in two cases of multiple regression, we hope to further illustrate the utility of regression quantiles. All computations were performed on the
IBM 360 at the University of North Carolina at Chapel Hill.
Regression quantiles were computed using the linear programming routine LPMPS described by
McKeown and Rubin (1977).
The SAS package was used for least-squares
calculations.
For any function ψ, following Huber (1977) we say b is a scale-invariant M-estimate corresponding to ψ if (b, σ̂) is a solution to the equations

    Σ_{i=1}^{n} x_i' ψ((y_i − x_i b)/σ̂) = 0

and

    Σ_{i=1}^{n} ψ²((y_i − x_i b)/σ̂) = (n − p) ∫ ψ²(x) dΦ(x),

where Φ is the standard Gaussian distribution function.
We will consider two specific M-estimates: the Huber, where ψ(x) = max(−1.5, min(x, 1.5)), and the Hampel, where

    ψ(x) = x             if 0 ≤ x ≤ 1.5
         = 1.5           if 1.5 < x ≤ 3.5
         = (8 − x)/3     if 3.5 < x ≤ 8                                 (4.1)
         = 0             if 8 < x,

and ψ(−x) = −ψ(x).
The Huber and Hampel estimators were computed iteratively
using a program written by one of the authors (RJC) and adapted from work of
Lenth (1976).
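A rough sketch of such an iteration is given below (this is our own illustrative code, not Lenth's algorithm or the authors' program): it alternates a Huber "Proposal 2" scale update with an iteratively reweighted least-squares step for the coefficients, here with the Huber ψ.

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def huber_psi(x, k=1.5):
        return np.clip(x, -k, k)

    def scale_invariant_m_estimate(X, y, k=1.5, iters=50):
        """Approximate solution of the two estimating equations displayed above."""
        n, p = X.shape
        # Right-hand constant of the scale equation: integral of psi^2 under Phi.
        chi = quad(lambda x: huber_psi(x, k) ** 2 * norm.pdf(x), -np.inf, np.inf)[0]
        b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least-squares start
        s = np.median(np.abs(y - X @ b)) / 0.6745      # crude starting scale
        for _ in range(iters):
            r = (y - X @ b) / s
            s *= np.sqrt(np.sum(huber_psi(r, k) ** 2) / ((n - p) * chi))
            r = (y - X @ b) / s
            w = np.where(r == 0, 1.0, huber_psi(r, k) / r)   # IRLS weights
            WX = X * w[:, None]
            b = np.linalg.solve(X.T @ WX, WX.T @ y)
        return b, s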
The dependent variable of the first example is average biweekly water
temperature in the shrimp nursery areas of the Pamlico Sound, North Carolina,
from March to May of 1972 to 1977 with several biweekly periods missing.
The
single independent variable is average air temperature at Cape Hatteras for the
same biweekly periods.
Figure A is a scatter plot of the data with the regression quantile lines for θ = .05, .50, and .95. (These and all regression quantiles mentioned in this section are unique.) There appear to be no serious outliers in this data set, and the least squares, Huber, Hampel, and trimmed least squares estimates are very similar for both slope and intercept.
In the second example, average biweekly water salinity in Pamlico Sound
is the dependent variable.
The three independent variables are a linear time
trend, water discharged by two rivers emptying into the Sound, and salinity in
the Sound lagged one month.
The data is from the shrimp harvest seasons of
1972 to 1977.
The last example is the stackloss data of Brownlee (1965, section 13.12), which has been further studied by Daniel and Wood (1971, chapter 5) and by Andrews (1974), who reviews the earlier work. We will compare several estimators, which are described in Table 1. For each estimator, let {r_i} be the residuals, let m be the median of {r_i}, and then define MAD to be the median of {|r_i − m|}.
Of course, as is usual with real data it is impossible to compare
estimators with respect to estimation error since the true parameters are
unknown.
Instead, we rank the estimators according to their MAD and thus
compare their relative success at fitting the data.
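For concreteness, the fit criterion can be computed as in the short sketch below (our own code; the residual vectors are assumed given).

    import numpy as np

    def mad(residuals):
        """Median absolute deviation of the residuals about their median."""
        m = np.median(residuals)
        return np.median(np.abs(residuals - m))
    # Estimators are then ranked by mad(...) within each data set.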
Table 2 presents the MAD's and their ranks for both examples. Least squares performs relatively poorly, as do Huber and Hampel, and K15 does about as well as Huber and Hampel. In the introduction it was mentioned that the performance of M-estimators depends upon, among other factors, the method of scale estimation. This should be reemphasized: the Hampel and Huber estimators may perform well on this data set if scale is estimated in a manner other than ours. The K05 estimator does somewhat better than Huber and Hampel. Since in these two cases (and often when n is small) K05 is least squares after removing 2p (here 8) observations, it might be expected that K05 would fare poorly since the proportion of observations removed is large. Yet K05 does well.
The KSO estimator's performance makes it worthy of further study.
The Andrews estimator is admirably suited to the stackloss data.
However,
this data is somewhat extreme in having four apparent outliers in 21 observations; here LAD outperforms all estimators except Andrews.
Since we did not
program a routine for computations of the Andrews estimator, it is not available
for the shrimp harvest data.
LAD ranks third there.
Perhaps LAD should be more
widely used despite its relative inefficiency in the Gaussian model; it works
well with some "dirty" data sets.
APPENDIX

Lemma A.1: With probability one there exists no vector b and p+1 rows of X, x_{i(1)}, ..., x_{i(p+1)}, such that y_{i(j)} = x_{i(j)} b for j = 1, ..., p+1.

Proof: Routine. Use the continuity of F.

Proof of Theorem 1: Let G(b) denote the left-hand side of (1.3) and let f_j be the p-vector with (f_j)_i = 0 or 1 according as i ≠ j or i = j. Define

    G'_j(b) = lim_{h↓0} [G(b + (1−θ)h f_j) − G(b − θh f_j)] / h.

Notice that the limit always exists. Let H_j(a) = −G'_j(β̂(θ) + a f_j). For ε > 0, we have H_j(−ε) ≤ H_j(0) ≤ H_j(ε), H_j(ε) ≥ 0, and H_j(−ε) ≤ 0. Thus, for all ε > 0,

    |H_j(0)| ≤ H_j(ε) − H_j(−ε)
             = Σ_{i=1}^{n} x_{ij} { ψ_θ(Y_i − x_i β̂(θ) + εx_{ij}) − ψ_θ(Y_i − x_i β̂(θ) − εx_{ij}) }.    (A.1)

Letting ε → 0 in (A.1), we see that

    |H_j(0)| ≤ Σ_{i=1}^{n} |x_{ij}| I(Y_i − x_i β̂(θ) = 0).

Theorem 1 now follows from Lemma A.1.
Lemma A.2: For Δ ∈ R^p, define

    M(Δ) = n^{-1/2} Σ_{i=1}^{n} x_i' ψ_θ(e_i − x_i Δ n^{-1/2} − ξ_θ)

and

    U(Δ) = M(Δ) − M(0) − E(M(Δ) − M(0)).

For all L > 0,

    sup_{0 ≤ ||Δ|| ≤ L} ||U(Δ)|| = o_p(1)                               (A.2)

and

    sup_{0 ≤ ||Δ|| ≤ L} ||M(Δ) − M(0) + f(ξ_θ) Q Δ|| = o_p(1).           (A.3)

Proof: Equation (A.2) follows from Lemma 4.1 of Bickel (1975) since F is continuous near ξ_θ. Equation (A.3) is verified by noting that, as n → ∞, E(M(Δ) − M(0)) + f(ξ_θ)QΔ → 0 uniformly on 0 ≤ ||Δ|| ≤ L.

Remark: Equation (A.2) is the conclusion of Theorem 4.1 of Jureckova (1977), which she proves under conditions different from ours. Her c_{ji} is our x_{ij} n^{-1/2}.

Lemma A.3: For all ε > 0, there exist K > 0, η > 0, and an integer n₀ such that if n > n₀, then

    P{ inf_{||Δ|| ≥ K} ||M(Δ)|| < η } < ε.

Proof: This is shown in exactly the same manner as Lemma 5.2 of Jureckova (1977).
Proof of Theorem 2: In M(Δ) replace Δ by n^{1/2}(β̂(θ) − β(θ)). By Theorem 1,

    M(n^{1/2}(β̂(θ) − β(θ))) = n^{-1/2} Σ_{i=1}^{n} x_i' ψ_θ(Y_i − x_i β̂(θ)) = o_p(1).    (A.4)

From Lemma A.3, n^{1/2}(β̂(θ) − β(θ)) = O_p(1). Theorem 2 follows from (A.3) and (A.4).
Proof of Corollary 1: Let γ(θ_i) = β̂(θ_i) − β(θ_i) and γ' = (γ(θ₁)', ..., γ(θ_m)'). We need only show that if c_j ∈ R^p for j = 1, ..., m and c' = (c₁', ..., c_m'), then

    n^{1/2} c'γ →_L N(0, c'(Ω ⊗ Q^{-1})c).                              (A.5)

Define η_{ij} = [f(ξ_{θ_j})]^{-1} c_j' Q^{-1} x_i' ψ_{θ_j}(e_i − ξ_{θ_j}) and η_{i·} = Σ_{j=1}^{m} η_{ij}. By Theorem 2,

    n^{1/2} c'γ = n^{-1/2} Σ_{i=1}^{n} η_{i·} + o_p(1).

Then (A.5) follows since routine calculations show that η_{1·}, η_{2·}, ... satisfy the conditions of Lindeberg's Central Limit Theorem. (See, for example, Loeve (1963, section 20.2, Theorem B).)

For any matrix A, let |A| = max_{i,j} |A_{ij}|.
Lemma A.4: Let D_{in} (= D_i) be an r×c matrix and suppose

    sup_n ( n^{-1} Σ_{i=1}^{n} |D_i|² ) < ∞.                            (A.6)

Let I be an open interval containing ξ₁ and ξ₂, and let the function g(x) be defined for all x and Lipschitz continuous on I. For Δ₁, Δ₂, and Δ₃ in R^p and Δ = (Δ₁, Δ₂, Δ₃), define

    T*(Δ) = n^{-1/2} Σ_{i=1}^{n} D_i g(e_i + x_i Δ₃ n^{-1/2}) I{ξ₁ + x_i Δ₁ n^{-1/2} ≤ e_i ≤ ξ₂ + x_i Δ₂ n^{-1/2}},
    T(Δ)  = T*(Δ) − E T*(Δ),                                            (A.7)
    S*(Δ) = T*(Δ) − T*(0),
    S(Δ)  = S*(Δ) − E S*(Δ).

Then for all M > 0,

    sup_{0 ≤ |Δ| ≤ M} |S(Δ)| = o_p(1).

Proof: Here we follow Bickel (1975, Lemma 4.1) closely. For convenience assume M = 1. Define for ℓ = 1, 2

    ρ_{iℓ}(Δ) = e_i − ξ_ℓ − x_i Δ_ℓ n^{-1/2}   and   I_{iℓ}(Δ) = I{ρ_{iℓ}(Δ) ≤ 0},

and let b_i(Δ) = g(e_i + x_i Δ₃ n^{-1/2}). Then for all n large enough,

    Var S(Δ) ≤ n^{-1} Σ_{i=1}^{n} |D_i|² Var[(b_i(Δ) − b_i(0)) I{e_i ∈ I}]
             + n^{-1} Σ_{i=1}^{n} |D_i|² Var[b_i(0)(I_{i1}(Δ) − I_{i1}(0))] + ⋯ ,

where the omitted terms are of the same form with I_{i2} in place of I_{i1}. Since g is Lipschitz on I, the first sum is o(1) by (A.6) and (2.1); since F is continuously differentiable in neighborhoods of ξ₁ and ξ₂, the remaining sums are O(n^{-1} Σ_{i=1}^{n} |D_i|² |x_i| n^{-1/2}) = o(1). Therefore, for any fixed Δ,

    S(Δ) →_p 0.                                                         (A.8)

Choose δ > 0. Now cover the p-dimensional cube [−1, 1]^p with a union of cubes having vertices on the grid J(δ) = {(j₁δ, j₂δ, ..., j_pδ): j_i = 0, ±1, ±2, ..., or ±([1/δ] + 1)}. If |Δ| ≤ 1, then for ℓ = 1, 2, 3 let V_ℓ(Δ) be the lowest vertex of the cube containing Δ_ℓ, and let V(Δ) = (V₁(Δ), V₂(Δ), V₃(Δ)). Then straightforward calculations show that, for some constant C,

    |S*(Δ) − S*(V(Δ))| ≤ W₁(Δ) + W₂(Δ),

where

    W₁(Δ) = n^{-1/2} Σ_{i=1}^{n} |D_i| |b_i(Δ) − b_i(V(Δ))| I{e_i ∈ I},
    W₂(Δ) = n^{-1/2} Σ_{i=1}^{n} |D_i| |b_i(V(Δ))| [ I{−a_i ≤ ρ_{i1}(V(Δ)) ≤ a_i} + I{−a_i ≤ ρ_{i2}(V(Δ)) ≤ a_i} ],

and a_i = C δ n^{-1/2} |x_i|. Since g is Lipschitz on I and F has a continuous derivative in neighborhoods of ξ₁ and ξ₂, there is a constant K₁ such that, by (2.2), (A.6), and the Cauchy-Schwarz inequality,

    E W_m(Δ) ≤ K₁ δ   for m = 1, 2 and all Δ ∈ (J(δ))³.

Exactly as in the argument leading to (A.8) we have (writing W_m(Δ) = W_m(δ, Δ)) W_m(Δ) − E W_m(Δ) = o_p(1). Thus, for all ε > 0 and δ > 0, there exists n₀ such that if n ≥ n₀, then for m = 1, 2,

    P{ max_{Δ ∈ (J(δ))³} W_m(Δ) > ε + K₁δ } < ε,

whence

    P{ sup_{0 ≤ |Δ| ≤ 1} |S*(Δ) − S*(V(Δ))| > ε + K₁δ } < ε.

Note that there exists a constant K₃ such that

    sup_{0 ≤ |Δ| ≤ 1} |E(S*(Δ) − S*(V(Δ)))| ≤ K₃δ.

Choosing δ = ε/max(K₁, K₃), we have that for all ε > 0 there exists n₀ such that

    P( sup_{0 ≤ |Δ| ≤ 1} |S(Δ) − S(V(Δ))| > 3ε ) < ε.                   (A.9)

By (A.8), for this δ and ε there exists n₁ such that if n ≥ n₁,

    P( max_{Δ ∈ (J(δ))³} |S(Δ)| > ε ) < ε.                              (A.10)

Inequalities (A.9) and (A.10) prove the lemma.
Proof of Theorem 3: By Theorem 2 we have, for ℓ = 1, 2, that

    n^{1/2}(β̂(α_ℓ) − β(α_ℓ)) = [f(ξ_ℓ)]^{-1} Q^{-1} n^{-1/2} Σ_{i=1}^{n} x_i' ψ_{α_ℓ}(e_i − ξ_ℓ) + o_p(1).    (A.11)

Define

    U*(Δ) = n^{-1} Σ_{i=1}^{n} x_i' x_i I{ξ₁ + x_i Δ₁ n^{-1/2} ≤ e_i ≤ ξ₂ + x_i Δ₂ n^{-1/2}},
    T*(Δ) = n^{-1/2} Σ_{i=1}^{n} x_i' e_i I{ξ₁ + x_i Δ₁ n^{-1/2} ≤ e_i ≤ ξ₂ + x_i Δ₂ n^{-1/2}}.

For M > 0, if we apply Lemma A.4 with D_i = x_i and g(x) = x, we obtain

    sup_{0 ≤ |Δ| ≤ M} |T*(Δ) − T*(0) − E(T*(Δ) − T*(0))| = o_p(1).       (A.12)

A Taylor expansion which uses the continuity of f about ξ₁ and ξ₂ shows that, uniformly on |Δ| ≤ M,

    E(T*(Δ) − T*(0)) → ξ₂ f(ξ₂) Q Δ₂ − ξ₁ f(ξ₁) Q Δ₁.                    (A.13)

Thus by (A.11), (A.12), and (A.13),

    n^{-1/2} Σ_{i=1}^{n} x_i' a_i e_i = n^{-1/2} (α₂ − α₁) Σ_{i=1}^{n} x_i' [φ(e_i) − E φ(e_i) + δ(α)] + o_p(1).    (A.14)

In a similar manner, n^{-1}(X'AX) = (α₂ − α₁)Q + o_p(1), which, together with (A.14), proves (3.4). Then (3.5) implies that n^{-1/2} Q^{-1} Σ_{i=1}^{n} x_i' δ(α) = n^{1/2} δ̃(α), and (3.6) follows by routine calculations.
Proof of Theorem 4: Define

    V(Δ) = n^{-1} Σ_{i=1}^{n} (e_i − x_i Δ₃ n^{-1/2} − δ(α))² I(ξ₁ + x_i Δ₁ n^{-1/2} ≤ e_i ≤ ξ₂ + x_i Δ₂ n^{-1/2}).

Applying Lemma A.4 with g(x) = x² and D_i = 1, we have for M > 0

    sup_{0 ≤ |Δ| ≤ M} |V(Δ) − V(0) − E(V(Δ) − V(0))| = o_p(1).

By a Taylor expansion of F and additional simple calculations, E(V(Δ) − V(0)) → 0, whence

    sup_{0 ≤ |Δ| ≤ M} |V(Δ) − V(0)| = o_p(1).

Therefore, by (3.1) and (3.6) we have n^{-1}S = V(0) + o_p(1). By Chebyshev's inequality Var(V(0)) → 0, so

    n^{-1}S = E V(0) + o_p(1) = E(e_1 − δ(α))² I(ξ₁ ≤ e_1 ≤ ξ₂) + o_p(1).

Furthermore, for j = 1, 2,

    â_j = ξ_j − δ(α) + o_p(1)

by (3.1) and (3.6), and Theorem 4 follows.
Proofs of Theorems 5 and 6:
These follow in a straightforward manner from
(3.6), Theorem 4, and Theorem 4.4 of Billingsley (1968).
Table 1. Regression Estimators

Name      Description
LS        Ordinary least squares.
Huber     M-estimate with ψ(x) = max(−1.5, min(x, 1.5)).
Hampel    M-estimate with ψ as in (4.1).
K15       Least squares with observations having positive residuals from the θ = .85 quantile hyperplane or negative residuals from the θ = .15 hyperplane removed.
K05       Least squares after deletion of observations having non-negative residuals from the θ = .95 hyperplane or non-positive residuals from the θ = .05 hyperplane.
Andrews   M-estimate using ψ(x) = sin(x/1.5) if |x| ≤ 1.5π and ψ(x) = 0 otherwise, as reported by Andrews (1974).
LAD       The θ = .50 regression quantile.
K50       Least squares after removing those observations with the kth smallest or kth largest residual from the θ = .50 hyperplane for k = 1, ..., [.05n] (n is the sample size).
Table 2. MAD and Rank of MAD for Estimators in Table 1

              Shrimp Harvest         Stackloss (n=21)
Estimator     MAD       Rank         MAD       Rank
LS            0.726     8            1.867     6
Huber         0.553     6            1.926     7
Hampel        0.551     5            1.928     8
Andrews       --        --           0.87      1
K15           0.699     7            1.407     4
K05           0.460     1.5          1.463     5
K50           0.460     1.5          1.204     3
LAD           0.508     3            1.182     2
[Figure A. Scatter plot and three regression quantile lines (θ = .05, .50, .95) for the simple linear regression example; water temperature is plotted against air temperature.]
REFERENCES

Andrews, David F. (1974), "A Robust Method for Multiple Linear Regression," Technometrics, 16, 523-531.

Bickel, Peter J. (1973), "On Some Analogues to Linear Combinations of Order Statistics in the Linear Model," The Annals of Statistics, 1, 597-616.

Bickel, Peter J. (1975), "One-Step Huber Estimates in the Linear Model," Journal of the American Statistical Association, 70, 428-433.

Billingsley, Patrick (1968), Convergence of Probability Measures, New York: John Wiley and Sons.

Brownlee, K. A. (1965), Statistical Theory and Methodology in Science and Engineering, 2nd Edition, New York: John Wiley.

deWet, T. and Venter, J. H. (1974), "An Asymptotic Representation of Trimmed Means with Applications," South African Statistics Journal, 8, 127-134.

Daniel, Cuthbert and Wood, Fred S. (1971), Fitting Equations to Data, New York: John Wiley.

Huber, Peter J. (1977), Robust Statistical Procedures, Philadelphia: SIAM.

Jaeckel, Louis A. (1971), "Robust Estimates of Location: Symmetry and Asymmetric Contamination," The Annals of Mathematical Statistics, 42, 1020-1034.

Jureckova, Jana (1977), "Asymptotic Relations of M-estimates and R-estimates in Linear Regression Model," The Annals of Statistics, 5, 464-472.

Koenker, Roger and Bassett, Gilbert, Jr. (1978), "Regression Quantiles," Econometrica, 46, 33-50.

Lenth, Russell V. (1976), "A Computational Procedure for Robust Multiple Regression," Tech. Report 53, University of Iowa.

Loeve, Michel (1963), Probability Theory, New York: Van Nostrand.

McKeown, P. G. and Rubin, D. S. (1977), "A Student Oriented Preprocessor for MPS/360," Computers and Operations Research, 4, 227-229.

Scheffe, Henry (1959), The Analysis of Variance, New York: John Wiley and Sons.

Searle, Shayle R. (1971), Linear Models, New York: John Wiley and Sons.

Stigler, Stephen M. (1977), "Do Robust Estimators Work with Real Data?" The Annals of Statistics, 5, 1055-1098.