
TRIMMED LEAST SQUARES ESTIMATION IN THE LINEAR MODEL
by
David Ruppert and Raymond J. Carroll
Abstract
We consider two methods of defining a regression analogue to a trimmed mean. The first was suggested by Koenker and Bassett and uses their concept of regression quantiles. Its asymptotic behavior is completely analogous to that of a trimmed mean. The second method uses residuals from a preliminary estimator. Its asymptotic behavior depends heavily on the preliminary estimate; it behaves, in general, quite differently than the estimator proposed by Koenker and Bassett, and it can be rather inefficient at the normal model even if the percent trimming is small. However, if the preliminary estimator is the average of the two regression quantiles used with Koenker and Bassett's estimator, then the first and second methods are asymptotically equivalent for symmetric error distributions.
Key Words and Phrases: regression analogue, trimmed mean, regression quantile, preliminary estimator, linear model, trimmed least squares
David Ruppert is an Assistant Professor and Raymond J. Carroll an Associate Professor, both at the Department of Statistics, the University of North Carolina at Chapel Hill. This research was supported by the National Science Foundation Grant MCS78-01240 and the Air Force Office of Scientific Research under Contracts AFOSR-75-2796 and AFOSR-80-0080.
1. Introduction.

This paper is concerned with the linear model

(1.1)    Y = Xβ + Z,

where Y' = (Y_1, ..., Y_n), X is an n×p matrix of known constants whose i-th row is x_i', β' = (β_1, ..., β_p) is a vector of unknown parameters, and Z' = (Z_1, ..., Z_n) is a vector of independent identically distributed random variables with unknown distribution function F.
Despite the advantages, including efficiency when F is normal, of the least squares estimator of β, this estimator is inefficient when F has heavier tails than the Gaussian distribution and possesses high sensitivity to spurious observations. This inefficiency under heavy-tailed F has been amply demonstrated for the location submodel by a Monte-Carlo study (Andrews et al. (1972)) and by asymptotics, e.g., Table 1 of this paper.
The presence of spurious data can be modelled by letting F be a mixture of the distribution function of the "good" data, say standard normal, and that of the "bad" data, say normal with variance exceeding 1. Such an F will have heavier tails than a normal distribution, and inefficiency with heavy-tailed F appears to be closely related to sensitivity to outliers. Huber (1977, p. 3) states that "for most practical purposes, 'distributional robust' and 'outlier resistant' are interchangeable."
For the location model, three classes of estimators have been proposed as alternatives to the sample mean: M, L, and R estimators; see Huber (1977) for an introduction. Among the L-estimates, the trimmed mean is particularly attractive because it is easy to compute and is rather efficient under a variety of circumstances.
As with M-estimates, trimmed means can be used to form confidence intervals. Let TM(α) be the α-trimmed mean, let Y_(i) be the i-th order statistic, and with k = [nα], define

S²(α) = n{ k(Y_(k+1) − TM(α))² + k(Y_(n−k) − TM(α))² + Σ_{i=k+1}^{n−k} (Y_(i) − TM(α))² } / {(n−2k)(n−2k−1)}.

If F is symmetric about μ, then n^{1/2}(TM(α) − μ)/S(α) is asymptotically N(0,1). (This can be easily seen from Theorems 1 and 2 of deWet and Venter (1974).) Therefore if we define z_γ to be the (1−γ)th quantile of the standard normal distribution,

TM(α) ± z_{γ/2} S(α) n^{-1/2}

is a large sample confidence interval.
Tukey and McLaughlin (1963) suggest replacing z_{γ/2} by the (1−γ/2)th quantile of the t distribution with (n−2k−1) degrees of freedom. Huber (1970) uses a heuristic argument to justify a different choice of degrees of freedom, which is somewhat too complex to give here. Gross's (1976) Monte-Carlo study of the distribution of TM(α)/S(α) indicates that the validity (agreement of nominal and actual significance level) of these confidence intervals will not be wholly satisfactory if n is small (he studies n = 10 and 20), but with F non-normal they appear to be as valid as the standard confidence interval based on the sample mean and standard deviation and the t distribution with n−1 degrees of freedom. Gross also suggests a more conservative interval procedure.
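To make the computations above concrete, here is a minimal Python sketch (ours, not the authors') of TM(α), S(α), and the Tukey-McLaughlin t-interval; the function name is our own.

```python
import numpy as np
from scipy import stats

def trimmed_mean_ci(y, alpha=0.10, gamma=0.05):
    """alpha-trimmed mean TM(alpha), scale S(alpha), and the
    Tukey-McLaughlin (1 - gamma) confidence interval."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    k = int(np.floor(n * alpha))            # observations trimmed per tail
    trimmed = y[k:n - k]
    tm = trimmed.mean()
    # S^2(alpha) as displayed above: Winsorized sum of squares
    ss = (k * (y[k] - tm) ** 2 + k * (y[n - k - 1] - tm) ** 2
          + ((trimmed - tm) ** 2).sum())
    s2 = n * ss / ((n - 2 * k) * (n - 2 * k - 1))
    s = np.sqrt(s2)
    # t quantile with n - 2k - 1 degrees of freedom in place of z
    t = stats.t.ppf(1 - gamma / 2, n - 2 * k - 1)
    half = t * s / np.sqrt(n)
    return tm, s, (tm - half, tm + half)
```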
Hogg (1974) favors trimmed means for the above reasons, and because they can serve as a basis for adaptive estimators. Stigler (1977) applied robust estimators to data from 18th and 19th century experiments designed to measure basic physical constants. He concluded that "the 10% trimmed mean (the smallest nonzero trimming percentage included in the study) emerges as the recommended estimator."
One might argue, of course, that although L-estimates have desirable
properties, they really offer no advantages over other estimators.
After
all, Jaeckel (1971) has shown that if F is symmetric then for each
L-estimator of location there are asymptotically equivalent M and R
estimators.
However, without knowledge of F it is not possible to match up
an L-estimator with its corresponding M and R estimators.
For example, trimmed means are asymptotically equivalent to Huber's M-estimate, which is the solution b to

(1.2)    Σ_{i=1}^n ρ((X_i − b)/s_n) = min!

where

(1.3)    ρ(x) = x²/2            if |x| ≤ k
              = k(|x| − k/2)    if |x| > k.

The value of k is determined by the trimming proportion α of the trimmed mean, F, and the choice of s_n. In the scale non-invariant case (s_n = 1), k = F^{-1}(1−α). The practicing statistician who knows only his data may find his intuition of more assistance when choosing α compared with k.
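As a concrete illustration of (1.2)-(1.3) in the scale non-invariant case (s_n = 1), the following Python sketch (ours) computes Huber's location M-estimate by iteratively reweighted least squares.

```python
import numpy as np

def huber_location(x, k=1.345, tol=1e-8, max_iter=100):
    """Solve sum_i rho(x_i - b) = min for Huber's rho with bend k
    and fixed scale s_n = 1, via iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    b = np.median(x)                        # robust starting value
    for _ in range(max_iter):
        r = x - b
        # weight w(r) = psi(r)/r: 1 in the middle, k/|r| in the tails
        w = np.where(np.abs(r) <= k, 1.0,
                     k / np.maximum(np.abs(r), 1e-12))
        b_new = np.sum(w * x) / np.sum(w)
        if abs(b_new - b) < tol:
            break
        b = b_new
    return b
```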
We do not believe that trimmed means are always preferable to M-estimates, but rather that they are worthwhile alternatives to M-estimates, particularly to Huber's M-estimate.
For the linear model, Bickel (1973) has proposed a class of one-step L-estimators depending on a preliminary estimate of β, but, while these have good asymptotic efficiencies, they are computationally complex and are apparently not invariant to reparameterization.
In this paper we consider two other methods of defining a regression analogue to the trimmed mean. In the location problem, both estimates reduce to the trimmed mean.

The first, which we call β̂_PE(α) for 0 < α < ½, requires a preliminary estimate, which is denoted by β̂_0. Suppose that the residuals from β̂_0 are calculated, and the observations corresponding to the [nα] smallest and [nα] largest residuals are removed. Then β̂_PE (= β̂_PE(α)) is defined to be the least squares estimator calculated from the remaining observations.

The definition of β̂_PE was motivated by the applied statistician's practice of examining the residuals from a least squares fit, removing the points with large (absolute) residuals, and recalculating the least squares solution with the remaining observations. Generally, there is no formal rule for deciding which points to remove, but β̂_PE is at least similar to this practice.
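A minimal sketch of β̂_PE(α) as just defined, in Python with numpy only (our code, with the preliminary estimate supplied by the caller):

```python
import numpy as np

def beta_pe(X, y, beta0, alpha=0.10):
    """Trimmed least squares with preliminary estimate beta0: drop the
    [n*alpha] smallest and [n*alpha] largest residuals, then refit LS."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    k = int(np.floor(n * alpha))
    r = y - X @ beta0                       # residuals from beta0
    order = np.argsort(r)
    keep = order[k:n - k]                   # remove k lowest, k highest
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta
```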
The second method of defining an analogue to the trimmed mean was proposed by Koenker and Bassett (1978), who extend the concept of quantiles to the linear model. Let 0 < θ < 1. Define

(1.4)    ψ_θ(x) = θ − I(x < 0)

and ρ_θ(x) = x ψ_θ(x). Then they call β̂(θ), any value of b which solves

(1.5)    Σ_{i=1}^n ρ_θ(y_i − x_i'b) = min! ,

a θth regression quantile. (Recall that x_i' is the i-th row of X.) Koenker and Bassett's Theorem 4.2 states that regression quantiles have asymptotic behavior similar to sample quantiles in the location problem. (Since β̂(θ) is an M-estimate, its large sample behavior can also be deduced from standard M-estimate theory, as we show later.)
Therefore, L-estimates consisting of linear combinations of a fixed number of order statistics -- for example, the median, trimean, and Gastwirth's estimator -- are easily extended to the linear model and have the same asymptotic efficiencies as in the location model. As they point out, regression quantiles can be computed by standard linear programming techniques.
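For instance, (1.5) is the linear program: minimize θ Σ u_i + (1−θ) Σ v_i subject to y = Xb + u − v with u, v ≥ 0. The sketch below is our formulation handed to scipy's LP solver (the authors used the MPS/360 package; see section 9).

```python
import numpy as np
from scipy.optimize import linprog

def regression_quantile(X, y, theta):
    """theta-th regression quantile of Koenker and Bassett: minimize
    sum_i rho_theta(y_i - x_i'b), posed as a linear program."""
    n, p = X.shape
    # variables: [b (free, p), u (n), v (n)] with y = Xb + u - v
    c = np.concatenate([np.zeros(p),
                        theta * np.ones(n), (1 - theta) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]
```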
They also suggest the following trimmed least squares estimator, call it β̂_KB: remove from the sample any observations whose residual from β̂(α) is negative or whose residual from β̂(1−α) is positive, and calculate the least squares estimator using the remaining observations. They conjecture that if lim n^{-1}(X'X) = Q (positive definite), then the covariance of β̂_KB(α) is n^{-1} σ²(α,F) Q^{-1}, where n^{-1} σ²(α,F) is the variance of an α-trimmed mean from a population with distribution F.
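In code, β̂_KB(α) is a two-step procedure built on the regression_quantile sketch above (again our illustration of the definition, keeping tied observations):

```python
import numpy as np

def beta_kb(X, y, alpha=0.10):
    """Koenker-Bassett trimmed least squares: drop observations with a
    negative residual from the alpha-th regression quantile plane or a
    positive residual from the (1-alpha)-th plane, then refit LS."""
    b_lo = regression_quantile(X, y, alpha)
    b_hi = regression_quantile(X, y, 1 - alpha)
    keep = (y - X @ b_lo >= 0) & (y - X @ b_hi <= 0)
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta
```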
In this paper we develop asymptotic expansions for β̂(θ) and β̂_KB(α) which provide simple proofs of Koenker and Bassett's Theorem 4.2 and their conjecture about the asymptotic covariance of β̂_KB(α).

The close analogy between the asymptotic distributions of trimmed means and the trimmed least squares estimator β̂_KB(α) is remarkable.
A result that is perhaps even more surprising is that the distribution of the estimator β̂_PE depends heavily on that of the preliminary estimator β̂_0. In particular, using least squares or least absolute deviations as the preliminary estimator results in versions of β̂_PE which are inefficient at the normal model and which are not regression analogues to the trimmed mean (as is β̂_KB).
(By a version of β̂_PE we mean β̂_PE for a particular β̂_0.)

Our results are such that we are able to find a version of β̂_PE which corresponds to a trimmed mean when the error distribution F is symmetric. The "right choice" for β̂_0 is the average of the αth and (1−α)th regression quantiles, i.e., β̂_0 = (β̂(α) + β̂(1−α))/2.
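In terms of the earlier sketches, this choice is a one-line composition (assuming the beta_pe and regression_quantile helpers defined above):

```python
def beta_pe_rq(X, y, alpha=0.10):
    """beta_PE with the average of the alpha-th and (1-alpha)-th
    regression quantiles as the preliminary estimate."""
    beta0 = 0.5 * (regression_quantile(X, y, alpha)
                   + regression_quantile(X, y, 1 - alpha))
    return beta_pe(X, y, beta0, alpha)
```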
Hogg (1974, p. 917) mentions that adaptive estimators can be constructed from estimators similar or identical to β̂_PE(α) with α a function of the residuals from β̂_0. The advantage of this class of adaptive estimators, he feels, is that they "would correspond more to the trimmed means for which we can find an error structure." However, from the above results, we can conclude that even if α is non-stochastic, estimators of the type suggested by Hogg will not necessarily have error structures which correspond to the trimmed mean.
Besides its nice asymptotic covariance, β̂_KB has another desirable property. In the location model, if F is asymmetric then there is no natural parameter to estimate. In the linear model, if the design matrix is centered so one column, say the first, consists entirely of ones and the remaining columns each sum to zero, then our expansions show that for each 0 < α < ½, β̂_KB(α) estimates β + δ(α), where δ(α) is a vector whose components are all zero except for the first. Therefore, the ambiguity about the parameter being estimated involves only the intercept and none of the slope parameters. However, this is also true for M-estimates (see, e.g., Carroll and Ruppert (1979) or Carroll (1979)).
We will present a large sample theory of confidence ellipsoids and general linear hypothesis testing, which is quite similar to that of least squares estimation. The same theory holds for β̂_PE when β̂_0 = (β̂(α) + β̂(1−α))/2, but only if F is symmetric.
The methods of this paper can be applied to other estimators. For example, let β̂_A(α) (= β̂_A) be the least squares estimate after the points with the [2nα] largest absolute residuals from β̂_0 are removed. In section 7 we state results for β̂_A. Their proofs are omitted, but are similar to the proofs of analogous results for β̂_PE.
In section 2 we give notation and assumptions. In section 3, asymptotic representations of β̂_PE are developed, and their significance is discussed in section 4. Section 5 contains asymptotic results for β̂_KB, and section 6 discusses conditions under which β̂_KB and β̂_PE are asymptotically equivalent. In section 7, we compare the asymptotic behavior of β̂_PE for several choices of β̂_0. Large sample inference is the subject of section 8. Several examples using real data are considered in section 9. All proofs are found in the appendix.
2. Notation and Assumptions.

Although β, X and Z in (1.1) depend upon n, this will not be made explicit in the notation. Let e' = (1,0,...,0) (1×p) and let I_p be the p×p identity matrix. Whenever r is a scalar, write r for the vector re. For 0 < p < 1, define ξ_p = F^{-1}(p). Also, suppose 0 < α_1 < ½ < α_2 < 1, and define ξ_1 = ξ_{α_1} and ξ_2 = ξ_{α_2}. Let N_p(μ, Σ) denote the p-variate Gaussian distribution with mean μ and covariance Σ. We will make the following assumptions throughout.
C1. F has a continuous density f which is positive on the support of F.

C2. Letting x_i' = (x_{i1}, ..., x_{ip}) be the i-th row of X, x_{i1} = 1 for i = 1, ..., n and Σ_{i=1}^n x_{ij} = 0 for j = 2, ..., p.

C3. ||x_i|| ≤ B, i = 1, ..., n, for some constant B.

C4. There exists positive definite Q such that lim_{n→∞} n^{-1}(X'X) = Q.

C5. (β̂_0 − β) = O_p(n^{-1/2}).

We will assume that β_1 = 0; by C2, this involves no loss in generality.

Assumption C5 is satisfied by many estimators, including the LAD (least absolute deviation or median regression) estimator (see Corollary 5.1) and, if E Z_1² < ∞, the LS (least squares) estimator.

The residuals r_1, ..., r_n from the preliminary estimate β̂_0 are r_i = Y_i − x_i'β̂_0. Let r_{1n} and r_{2n} be the [nα]th and [n(1−α)]th ordered residuals, respectively.
Then the estimate β̂_PE is a least squares (LS) estimate calculated after removing all observations satisfying

(2.1)    r_i ≤ r_{1n}   or   r_i ≥ r_{2n}.

Because of C1, asymptotic results are unaffected by requiring strict inequalities in (2.1). Let a_i = 0 or 1 according as i satisfies (2.1) or not, and let A be the n×n diagonal matrix with A_{ii} = a_i. The matrix A indicates which observations are not trimmed. Thus

β̂_PE(α) = (X'AX)⁻ X'AY,

where (X'AX)⁻ is a generalized inverse for X'AX. Since n^{-1}(X'AX) converges in probability to (1−2α)Q, P(X'AX is invertible) → 1.

Since (as we show later) β̂_KB behaves similarly to a trimmed mean, even for asymmetric F and for asymmetric trimming, we will not restrict ourselves to symmetric trimming when defining β̂_KB.
Let α = (α_1, α_2) and define β̂_KB(α) to be a least squares estimator calculated after removing all observations satisfying

(2.2)    Y_i − x_i'β̂(α_1) ≤ 0   or   Y_i − x_i'β̂(α_2) ≥ 0.

(Again asymptotic results are unaffected by requiring strict inequalities in (2.2), which is Koenker and Bassett's suggestion.) Let b_i = 0 or 1 according as i satisfies (2.2) or not, and let B be the n×n diagonal matrix with B_{ii} = b_i. Then

β̂_KB(α) = (X'BX)⁻ X'BY,

where (X'BX)⁻ is a generalized inverse of (X'BX). (Again, for n sufficiently large X'BX will be invertible.)
Let

(2.3)    φ(x) = ξ_1/(α_2 − α_1)   if x < ξ_1
              = x/(α_2 − α_1)     if ξ_1 ≤ x ≤ ξ_2
              = ξ_2/(α_2 − α_1)   if ξ_2 < x.

Define δ(α) and, letting η_j = (ξ_j − δ(α)), define σ²(α,F). By, for example, deWet and Venter (1974, equation (6)), σ²(α,F)/n is the asymptotic variance of a trimmed mean with trimming proportions α_1 and 1−α_2 from a population with distribution F.
3. Main Results for β̂_PE.

First we will find relationships of the form

(3.1)    n^{1/2}(β̂_PE − β) = n^{-1/2} Σ_{i=1}^n G(x_i, Z_i) + H(n^{1/2}(β̂_0 − β)) + o_p(1),

where G and H are given functions. We then show that in many special cases (including LS and LAD) the latter term in (3.1) can be further expanded so that

(3.2)    n^{1/2}(β̂_PE − β) = n^{-1/2} Σ_{i=1}^n G(x_i, Z_i) + n^{-1/2} Σ_{i=1}^n H*(x_i, Z_i) + o_p(1)

for some function H*. It is then a simple matter to obtain the limit distribution of n^{1/2}(β̂_PE − β) from (3.2).

In this section we only consider symmetric trimming, so we assume α_1 = 1 − α_2 = α. Now define a = ξ_2 f(ξ_2) − ξ_1 f(ξ_1) and c_i = (I − ee')x_i = (0, x_{i2}, ..., x_{ip})'. The specific relationship of form (3.1) is:

Theorem 3.1. As n → ∞,

(3.3)    n^{1/2}(β̂_PE − β) = (1−2α)^{-1} n^{-1/2} Σ_{i=1}^n Q^{-1} c_i Z_i I(ξ_1 ≤ Z_i ≤ ξ_2)
                             + (1−2α)^{-1} a (I − ee') n^{1/2}(β̂_0 − β)
                             + n^{-1/2} Σ_{i=1}^n e φ(Z_i) + o_p(1).

We will call the first entry of β̂_PE the intercept and the remaining entries the slopes. Since premultiplication of a vector by (I − ee') simply replaces the first coordinate by 0, the first two terms on the RHS of (3.3) represent the slope estimates. Note the similarity (and the difference!) between the first term and a representation of the (untrimmed) LS estimate β̂: since

(β̂ − β) = (X'X)^{-1} X'Z

and (n^{-1}X'X)^{-1} → Q^{-1}, it follows that

n^{1/2}(β̂ − β) = n^{-1/2} Σ_{i=1}^n Q^{-1} x_i Z_i + o_p(1).
The second term indicates the contribution of the preliminary estimate to the trimmed LS estimate; this contribution is only to the slope estimates. Since only the first coordinate of e is non-zero, the third term on the RHS of (3.3) is a representation of the intercept estimate and is identical to a representation of the trimmed mean in the location model (cf. Corollary 3.1).
To specify the relationship of form (3.2) we make the assumption:

C6. For some function g,

n^{1/2}(β̂_0 − β) = n^{-1/2} Σ_{i=1}^n Q^{-1} x_i g(Z_i) + o_p(1).

As indicated above, C6 holds with g(x) = x if β̂_0 is the LS estimate. By Theorem 5.3, C6 holds with g(x) = (f(0))^{-1}(½ − I(x < 0)) if β̂_0 is the LAD estimate.

As an immediate consequence of Theorem 3.1, we have our main result.
Theorem 3.2. As n → ∞,

(3.4)    n^{1/2}(β̂_PE − β) = (1−2α)^{-1} n^{-1/2} Σ_{i=1}^n Q^{-1} c_i {Z_i I(ξ_1 ≤ Z_i ≤ ξ_2) + a g(Z_i)}
                             + n^{-1/2} Σ_{i=1}^n e φ(Z_i) + o_p(1).

In the next section, limit distributions are obtained from (3.4) for various special cases. Both (3.3) and (3.4) show how the preliminary estimate influences the asymptotic behavior of β̂_PE.
As a special case of Theorem 3.2 we obtain:

Corollary 3.1. In the location model (p = 1 and x_i = 1 for all i),

n^{1/2}(β̂_PE − β) = n^{-1/2} Σ_{i=1}^n φ(Z_i) + o_p(1).

The key technical step in the proofs is an "asymptotic linearity" result for ordered residuals, which generalizes work of Bahadur (1966) and Ghosh (1971) for the location model.
Lemma 3.1. For 0 < θ < 1 let r_{θn} be the [nθ]th ordered residual from β̂_0. Then

n^{1/2}(r_{θn} − ξ_θ + e'(β̂_0 − β)) = (f(ξ_θ))^{-1} n^{-1/2} Σ_{i=1}^n ψ_θ(Z_i − ξ_θ) + o_p(1).

(Recall that ψ_θ(x) = θ − I(x < 0).)

4. Asymptotic Behavior of β̂_PE.
In this section we show that Theorem 3.2 leads to these basic conclusions about β̂_PE:

1) The intercept estimate is asymptotically unbiased if F is symmetric.

2) The slope estimates are asymptotically unbiased even if F is asymmetric.

3) The asymptotic variance of the intercept, which does not depend upon the choice of β̂_0, is that of the trimmed mean in the location model.

4) The asymptotic covariance matrix of the slopes depends upon β̂_0 and, in general, will be difficult to estimate.

Let 0 be a (p−1)×1 vector of zeroes. By C2, there is a Q̃ such that

Q = [1  0' ; 0  Q̃]   and   Q^{-1} = [1  0' ; 0  Q̃^{-1}].

Moreover,

(4.1)    lim_{n→∞} n^{-1} Σ_{i=1}^n c_i c_i' = [0  0' ; 0  Q̃]   and   Σ_{i=1}^n c_i = 0.

If we estimate β with β̂_PE, then the asymptotic bias of the intercept is

(1−2α)^{-1} ∫_{ξ_1}^{ξ_2} x dF(x),

which is zero if F is symmetric about zero. By (3.4) and (4.1) the slope estimates are asymptotically unbiased, even if F is asymmetric. The asymptotic variance of the normalized intercept is

σ²(α,F) = Var φ(Z_1),

the asymptotic variance of the normalized α-trimmed mean in the location model.
The intercept is asymptotically uncorrelated with the slopes, and the asymptotic covariance matrix of the normalized slopes is Q̃^{-1} σ²(α,g,F), where

σ²(α,g,F) = (1−2α)^{-2} Var{Z_1 I(ξ_1 ≤ Z_1 ≤ ξ_2) + a g(Z_1)}.

We see that the asymptotic distribution of the intercept estimate does not depend upon the choice of β̂_0 provided (β̂_0 − β) = O_p(n^{-1/2}). On the other hand, we see from (3.4) that the slope estimates depend upon β̂_0, since the unusual situation where a = 0 is ruled out by assumption C1. Using the Lindeberg central limit theorem and Theorem 3.2, it is easy to show that under C3 and C4, n^{1/2}(β̂_PE − β − e E φ(Z_1)) converges in distribution to a normal law.

In general, large sample statistical inference based on β̂_PE will be a challenging problem because of the difficulties of estimating a = (ξ_2 f(ξ_2) − ξ_1 f(ξ_1)). Obtaining reasonably good estimates of the density f might take very large sample sizes.
5. Main Results for β̂_KB.

In section 1, we defined a θth regression quantile to be any value of b which solves (1.5). There may be multiple solutions, though in our few examples we found that the solution was always unique. However, the asymptotic results we present do not depend upon the rule used to select one. We suppose, then, that a definite rule has been used, and we denote this solution by β̂(θ).
For β̂_KB we obtain an asymptotic representation which is similar to those for β̂_PE but perhaps simpler.

Theorem 5.1. The estimator β̂_KB satisfies

(5.1)    n^{1/2}(β̂_KB − β − e δ(α)) = n^{-1/2} Σ_{i=1}^n Q^{-1} x_i (φ(Z_i) − δ(α)) + o_p(1),

and therefore

(5.2)    n^{1/2}(β̂_KB − β − e δ(α)) converges in law to N_p(0, σ²(α,F) Q^{-1}).

Expression (5.1) is similar to a result of deWet and Venter (1974, equation (5)). Note that (5.2) verifies Koenker and Bassett's hypothesis on the covariance of β̂_KB. Moreover, the bias of β̂_KB for β involves only the intercept, β_1, and not the slopes. Also, β̂_KB is asymptotically unbiased if F is symmetric.
Theorem 5.1 requires the next result on regression quantiles. Define β(θ) = β + ξ_θ e. The next theorem, which is a special case of a general result for M-estimators, shows that n^{1/2}(β̂(θ) − β(θ)) is essentially a sum of independent random variables.

Theorem 5.2. The following representation holds:

n^{1/2}(β̂(θ) − β(θ)) = (f(ξ_θ))^{-1} n^{-1/2} Q^{-1} Σ_{i=1}^n x_i ψ_θ(Z_i − ξ_θ) + o_p(1).

Theorem 5.2 and the Lindeberg central limit theorem provide an easy proof of Theorem 4.2 of Koenker and Bassett (1978), which we state as a corollary.
Corollary 5.1. Let Ω = Ω(θ_1, ..., θ_m; F) be the symmetric m×m matrix defined by

Ω_{ij} = θ_i(1 − θ_j) / (f(ξ_{θ_i}) f(ξ_{θ_j})),   i ≤ j.

Then n^{1/2}(β̂(θ_1) − β(θ_1), ..., β̂(θ_m) − β(θ_m)) converges in law to the mp-variate Gaussian distribution with mean zero and covariance matrix Ω ⊗ Q^{-1}.
6. A Choice of β̂_0 For Which β̂_KB and β̂_PE Are Asymptotically Equivalent.

We have seen that β̂_KB is a close analogue to the trimmed mean, but the behavior of β̂_PE depends upon β̂_0 and is not similar, in general, to that of a trimmed mean. One might ask whether β̂_0 can be chosen so that β̂_PE has the same asymptotic distribution as β̂_KB. The answer is yes, provided F is symmetric and we allow only symmetric trimming.

Let β̂_PE(RQ, α) (= β̂_PE(RQ)) be β̂_PE when β̂_0 is the average of the αth and (1−α)th regression quantiles, i.e., β̂_0 = (β̂(α) + β̂(1−α))/2. Then, by Theorem 5.2, this β̂_0 satisfies C6 with

g(x) = ½{(f(ξ_1))^{-1} ψ_α(x − ξ_1) + (f(ξ_2))^{-1} ψ_{1−α}(x − ξ_2)}.

If F is symmetric, then ξ_1 = −ξ_2, f(ξ_1) = f(ξ_2), and therefore

(6.1)    Z_1 I(ξ_1 ≤ Z_1 ≤ ξ_2) + a g(Z_1) = (1−2α) φ(Z_1).

By (3.4) and (6.1),

n^{1/2}(β̂_PE(RQ) − β) = n^{-1/2} Σ_{i=1}^n Q^{-1} x_i φ(Z_i) + o_p(1),

and therefore, since δ(α) = 0, (5.1) implies

(6.2)    n^{1/2}(β̂_PE(RQ) − β̂_KB) = o_p(1),

so that asymptotically there is no difference between trimming with this preliminary estimate and using Koenker and Bassett's (1978) proposal. (However, (6.2) does not necessarily hold if F is asymmetric.)

Notice that (5.1) and (6.2) imply that n^{1/2}(β̂_PE(RQ) − β) converges in law to N_p(0, σ²(α,F) Q^{-1}).
7. Comparisons of Several Choices of β̂_0.

The choice of β̂_0 should be based on the efficiency of the resulting β̂_PE, not its similarity to β̂_KB. In this section we find further support for using β̂_PE(RQ) by comparing it with two other estimators, β̂_PE(LS) and β̂_PE(LAD), which are β̂_PE with β̂_0 equal to the least squares estimator and β̂(.5), respectively.

Comparisons are made within the family of contaminated normal distributions, which was introduced by Tukey (1960) to study the behavior of statistical procedures under heavy-tailed distributions. These distributions have the form

F(x) = (1 − ε)Φ(x) + ε Φ(x/b),

where 0 < ε < 1 and Φ is the standard normal distribution. Typically, b > 1 and Φ(x/b) is the distribution of the "bad" data, while ε is the proportion of "bad" observations.
Recall that the asymptotic variance of the intercept does not depend upon β̂_0, and that the asymptotic covariance matrix of the slopes is Q̃^{-1} σ²(α,g,F), where Q̃^{-1} depends only on the sequence of design matrices. Therefore, we can compare the estimators by using only σ²(α,g,F).
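Entries of Table 1 can be reproduced by one-dimensional numerical integration, using the section 4 expressions σ²(α,F) = Var φ(Z_1) and σ²(α,g,F) = (1−2α)^{-2} Var{Z_1 I(ξ_1 ≤ Z_1 ≤ ξ_2) + a g(Z_1)}. A sketch (ours, under those expressions) for β̂_PE(LS), i.e. g(x) = x, at the contaminated normal:

```python
import numpy as np
from scipy import integrate, optimize, stats

def slope_variance_pe_ls(alpha, eps, b):
    """sigma^2(alpha, g, F) for g(x) = x (LS preliminary estimate) at
    F = (1 - eps) * Phi(x) + eps * Phi(x / b), symmetric trimming."""
    F = lambda x: (1 - eps) * stats.norm.cdf(x) + eps * stats.norm.cdf(x / b)
    f = lambda x: (1 - eps) * stats.norm.pdf(x) + eps * stats.norm.pdf(x / b) / b
    xi2 = optimize.brentq(lambda x: F(x) - (1 - alpha), 0.0, 50.0)
    xi1 = -xi2                                   # F is symmetric here
    a = xi2 * f(xi2) - xi1 * f(xi1)
    h = lambda x: x * (xi1 <= x <= xi2) + a * x  # Z*I(.) + a*g(Z), mean 0
    m2 = integrate.quad(lambda x: h(x) ** 2 * f(x), -np.inf, np.inf)[0]
    return m2 / (1 - 2 * alpha) ** 2

# slope_variance_pe_ls(0.05, 0.0, 1.0) gives about 1.30, the NORMAL
# entry of Table 1 for the LS preliminary estimate with alpha = .05.
```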
Table 1 displays σ²(α,g,F) for several choices of α, ε, and b, and for g corresponding to β̂_PE(LS), β̂_PE(LAD), and β̂_PE(RQ). For comparison, we include the standardized asymptotic variance (that is, σ² where σ² Q^{-1} is the asymptotic covariance matrix) for the LS estimate and two M-estimates, a Huber and a Hampel.
Both of the M-estimates use Huber's Proposal 2 to obtain scale equivariance. The Huber uses

ψ(x) = min(2, |x|) sign(x),

and the Hampel uses

ψ(x) = x           if 0 ≤ x ≤ 1.5
     = 1.5         if 1.5 ≤ x ≤ 3.5
     = (8 − x)/3   if 3.5 ≤ x ≤ 8
     = 0           if 8 ≤ x,

and ψ(−x) = −ψ(x). For discussion of Huber's Proposal 2 see Carroll and Ruppert (1979).
Several conclusions emerge from Table 1.

1) β̂_PE(LS) and β̂_PE(LAD) are rather inefficient at the normal distribution.

2) β̂_PE(RQ) is quite efficient at the normal model.

3) Under heavy contamination (b large or ε large) β̂_PE(LS), β̂_PE(LAD) and β̂_PE(RQ) are relatively efficient compared with LS. Also, β̂_PE(RQ) and β̂_PE(LAD) compare well against the M-estimates, but β̂_PE(LS) does poorly compared to the M-estimates if ε = .25, b = 10, and α = .25.
(Intuitively, one can expect that when α = .25, β̂_PE(LS) will be heavily influenced by its preliminary estimate, which estimates β poorly for these b and ε.)

Because of 1) and 3), the practice of fitting by least squares or LAD, removing points corresponding to extreme residuals, and computing the least squares estimate from the trimmed sample is not an adequate substitute for robust methods of estimation.
If, instead of removing those observations with the [nα] smallest and [nα] largest residuals from β̂_0, we remove those observations with the [2nα] largest absolute residuals, then the asymptotic variance of the intercept is the same as that of the slopes. Specifically, let β̂_A(α) (= β̂_A) be the estimate formed in this manner. Then, if F is symmetric,

n^{1/2}(β̂_A − β) = (1−2α)^{-1} { n^{-1/2} Σ_{i=1}^n Q^{-1} x_i Z_i I(ξ_1 ≤ Z_i ≤ ξ_2) + a n^{1/2}(β̂_0 − β) } + o_p(1),

and if C6 holds, then

n^{1/2}(β̂_A − β) = (1−2α)^{-1} n^{-1/2} Σ_{i=1}^n Q^{-1} x_i {Z_i I(ξ_1 ≤ Z_i ≤ ξ_2) + a g(Z_i)} + o_p(1),

which in the location case reduces to

n^{1/2}(β̂_A − β) = (1−2α)^{-1} n^{-1/2} Σ_{i=1}^n {Z_i I(ξ_1 ≤ Z_i ≤ ξ_2) + a g(Z_i)} + o_p(1).

The proofs are similar to those of Theorems 3.1 and 3.2 and are omitted.
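A sketch of β̂_A, which differs from the beta_pe sketch of section 1 only in ranking residuals by absolute value (our code):

```python
import numpy as np

def beta_a(X, y, beta0, alpha=0.10):
    """Trimmed least squares removing the [2*n*alpha] observations with
    the largest absolute residuals from beta0, then refitting LS."""
    n = len(y)
    k = int(np.floor(2 * n * alpha))
    r = np.abs(y - X @ beta0)
    keep = np.argsort(r)[:n - k]            # keep all but the k largest |r|
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta
```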
Since β̂_A is particularly easy to compute in the location model, it is very suitable for Monte-Carlo studies. It is hoped that such studies will indicate the degree of agreement between the asymptotic and finite sample variances of β̂_PE as well as β̂_A. Table 1 displays the variance of β̂_A(LS), i.e., β̂_A with β̂_0 the LS estimate, for sample sizes of n = 50, 100, 200, 300, and 400. The Monte-Carlo swindle (Gross (1973)) was employed as a variance reduction technique. One sees from this table that convergence of the variance to its asymptotic value can be extremely slow for some distributions, e.g., b = 10 and ε = .10 or .25.
8. Large Sample Inference.

Here we sketch a large sample methodology of confidence ellipsoids and hypothesis testing based upon β̂_KB. For symmetric trimming and symmetric F, the theory is applicable to β̂_PE(RQ) as well. The asymptotic covariance matrix, σ²(α,F) Q^{-1}, can be consistently estimated since n^{-1} X'X → Q, and a consistent estimate of σ²(α,F) is provided by the next theorem.

Theorem 8.1. Let S be the sum of squares for residuals calculated from the trimmed sample, i.e.,

S = Y' B(I − X(X'BX)⁻ X') B Y.

Let c_j = e'(β̂(α_j) − β̂_KB(α)), j = 1, 2, and let s²(α,F) be the estimate of σ²(α,F) formed from S and c_1, c_2. Then

s²(α,F) converges in probability to σ²(α,F).
Theorem 8.2. Suppose m is the number of observations which have been removed by trimming. For 0 < E < 1, let F(n_1, n_2, E) denote the (1−E) quantile of the F distribution with n_1 and n_2 degrees of freedom, and let d(n_1, n_2, E) = n_1 s²(α,F) F(n_1, n_2, E). Suppose for some integer ℓ, K and c are matrices of sizes p×ℓ and ℓ×1, respectively, and that K has rank ℓ. If K'(β + δ(α)) = c, then

lim_{n→∞} P{ (K'β̂_KB(α) − c)' [K'(X'AX)^{-1}K]^{-1} (K'β̂_KB(α) − c) > d(ℓ, n−m−p, E) } = E.

Letting K = I_p and c = β + δ(α), the confidence ellipsoid

(8.1)    { b : (β̂_KB(α) − b)' (X'AX) (β̂_KB(α) − b) ≤ d(p, n−m−p, E) }

for β + δ(α) has an asymptotic confidence coefficient of (1−E). Moreover, if we test

H_0: K'(β + δ(α)) = c   versus   H_1: K'(β + δ(α)) ≠ c

by rejecting H_0 whenever

(8.2)    (K'β̂_KB(α) − c)' [K'(X'AX)^{-1}K]^{-1} (K'β̂_KB(α) − c) > d(ℓ, n−m−p, E),

then the asymptotic size of our test is E.
Of course, in the special case where α_1 = 0, α_2 = 1 (so m = 0 and A = I) and F is Gaussian, (8.1) is an exact 1−E confidence ellipsoid and (8.2) is an exact size E test.
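A sketch of how (8.2) would be computed in practice, assuming a trimmed fit (beta_hat, the matrix X'AX, the trimmed count m) and a consistent estimate s2 of σ²(α,F) from Theorem 8.1 are already in hand; the helper name and argument layout are ours.

```python
import numpy as np
from scipy import stats

def trimmed_f_test(K, c, beta_hat, XAX, s2, n, m, p, level=0.05):
    """Reject H0: K'(beta + delta) = c when the quadratic form of
    Theorem 8.2 exceeds l * s2 * F(l, n - m - p, level)."""
    l = K.shape[1]                          # K is p x l with rank l
    diff = K.T @ beta_hat - c
    quad = diff @ np.linalg.solve(K.T @ np.linalg.solve(XAX, K), diff)
    crit = l * s2 * stats.f.ppf(1 - level, l, n - m - p)
    return quad, crit, quad > crit
```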
9. Examples.

In this section we contrast the results obtained for different estimates when applied to two data sets: (i) the stackloss data set given by Andrews (1974) and (ii) a set of measurements of water salinity and river discharge taken in North Carolina's Pamlico Sound (see Table 2). The estimates we consider are listed in Table 3. Both HUBER and ANDREW are M-estimates and are calculated by the iterative solution to

Σ_{i=1}^N ψ((y_i − x_i'β)/s) x_i = 0,

where s = MAD/C, C is a constant, and MAD = median of the absolute values of the residuals. For HUBER, C = .6745 and ψ(z) = max(−1.25, min(z, 1.25)). This choice of ψ should give results for normal data similar to those for the regression analogues of a 10% trimmed mean. The estimate ANDREW uses C = 1 and ψ(z) = sin(z) I(|z| ≤ π).

We defined β̂_KB a bit differently than in section 2. Both data sets have four independent variables and each regression quantile hyperplane passes through four observations. Therefore, if one defines β̂_KB as in section 2, at least eight observations are trimmed. Instead, we defined β̂_KB by requiring strict inequality in (2.2). If α = (.1, .9), this leads to no trimming for the stackloss data, and only two observations trimmed for the salinity data, so we use α = (.15, .85). Then observations 4, 9, and 21 are trimmed in the stackloss data and observations 1, 13, 15, and 17 in the salinity data.
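A sketch of the HUBER and ANDREW iterations just described (our code, not the original SAS/MPS programs): iteratively reweighted least squares with the scale s = MAD/C recomputed at each pass.

```python
import numpy as np

def m_estimate(X, y, psi, C=0.6745, n_iter=50):
    """Iteratively solve sum_i psi((y_i - x_i'b)/s) x_i = 0 with
    s = MAD / C, MAD = median absolute residual, as in section 9."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # LS starting value
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / C
        u = r / s
        w = np.ones_like(u)                          # psi(u)/u weights
        nz = u != 0
        w[nz] = psi(u[nz]) / u[nz]
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta

huber_psi = lambda z: np.clip(z, -1.25, 1.25)
andrew_psi = lambda z: np.where(np.abs(z) <= np.pi, np.sin(z), 0.0)
```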
An important advantage of β̂_PE(RQ) over β̂_KB is that residuals from a preliminary estimate are rarely tied (at least in these data sets), and with β̂_PE(RQ) one can have the actual percent trimming close to any specified α. The observations deleted when calculating β̂_PE(RQ,.10) are 1, 3, 9, and 21 for the stackloss data and 1, 11, 13, 15, 16, and 17 for the salinity data.

Since both data sets have outliers, asymptotic theory and Monte-Carlo studies for the location problem (Andrews et al. (1972)) lead us to expect that LSE will be worst, ANDREW will do very well, and HUBER, β̂_PE(RQ), and β̂_KB will have roughly comparable performances. Of course, with these data the true parameters are unknown, and we can only measure performance by closeness of fit to the bulk of the observations, say with MAD or IQR (= interquartile range of the residuals). Using either MAD or IQR as criteria, our study does seem to agree with our expectations. The redescending M-estimator (ANDREW) appears to be best overall.
Also, we have included β̂(.5), the least absolute deviation estimate. Its performance was quite good here, but of course it is known to have rather poor efficiency at the normal model.
In Table 3, we list the regression coefficients, MAD, and IQR for each
estimator.
Figures 1 and 2 are box plots of the residuals and were obtained
from the SAS package.
Least squares computations were performed on SAS.
Regression quantiles
were computed using MPS/360, a linear programming package, and LPMPS, a
preprocessor for MPS/360 (McKeown and Rubin (1977)).
10. Summary.

We have considered two methods of defining a trimmed least squares estimator: β̂_KB, which uses Koenker and Bassett's (1978) regression quantiles, and β̂_PE, which uses a preliminary estimate.

Despite its intuitive appeal, β̂_PE based on an arbitrary preliminary estimate will not be very satisfactory. Its behavior will depend heavily upon the choice of the preliminary estimate. Some choices (e.g. median regression) result in very inefficient trimmed estimates at the normal distribution, even if the trimming proportion is small. Other choices (e.g. least squares) can lead to low efficiency for heavy-tailed distributions, especially if the trimming proportion is high. Moreover, the contribution of the preliminary estimate to the variance of β̂_PE depends on the density of the error distribution and might be difficult to estimate in practical situations.

The estimate β̂_KB behaves analogously to a trimmed mean. Also, for a particular choice of preliminary estimate, the average of two regression quantiles, β̂_PE (which for this preliminary estimate we call β̂_PE(RQ)) is asymptotically equivalent to β̂_KB, provided the error distribution is symmetric.

For moderately-sized data sets, β̂_PE(RQ) has one major advantage over β̂_KB; with β̂_PE(RQ) the proportion of observations rejected can be made quite close to any specified α. Since the number of observations lying on a regression quantile hyperplane is typically equal to the number of independent variables, β̂_KB does not share this property with β̂_PE(RQ).

The trimmed estimates β̂_KB and β̂_PE(RQ) seem to be worthwhile alternatives to M-estimates based on Huber's ψ, but perhaps not to redescending M-estimates.
Appendix

Lemma A.1. With probability one there exists no vector b and p+1 rows of X, x_{i(1)}, ..., x_{i(p+1)}, such that y_{i(j)} = x_{i(j)}'b for j = 1, ..., p+1.

Proof. Routine. Use the continuity of F. □

Lemma A.2. Let r_1, ..., r_n be the residuals from β̂_0, suppose 0 < θ < 1, and let ζ_n be a sequence of solutions to

Σ_{i=1}^n ρ_θ(r_i − ζ_n) = min.

Then

(A1)    n^{-1/2} Σ_{i=1}^n ψ_θ(r_i − ζ_n) → 0  a.s.

In addition, the sequence of solutions β̂(θ) of (1.5) satisfies

(A2)    n^{-1/2} Σ_{i=1}^n x_i ψ_θ(y_i − x_i'β̂(θ)) → 0  a.s.
Proof. We will prove only (A2) because (A1) can be demonstrated in a similar manner. Let {e_j}_{j=1}^p be the standard basis of R^p. Define

G_j(a) = Σ_{i=1}^n ρ_θ(y_i − x_i'(β̂(θ) + a e_j))

and let H_j(a) be the derivative from the right of G_j. Notice that H_j(a) is non-decreasing. Therefore, for ε > 0,

H_j(−ε) ≤ H_j(0) ≤ H_j(ε),

and because G_j(a) achieves its minimum at a = 0, H_j(−ε) ≤ 0 and H_j(ε) ≥ 0. Consequently,

(A3)    |H_j(0)| ≤ H_j(ε) − H_j(−ε).

Letting ε ↓ 0 in (A3), we see that

|H_j(0)| ≤ Σ_{i=1}^n |x_{ij}| I(y_i − x_i'β̂(θ) = 0).

Now (A2) follows from Lemma A.1. □
Lemma A.3. For Δ ∈ R^p, define

M(Δ) = n^{-1/2} Σ_{i=1}^n x_i ψ_θ(Z_i − x_i'Δ n^{-1/2} − ξ_θ).

Then for all L > 0,

(A4)    sup_{0 ≤ ||Δ|| ≤ L} ||M(Δ) − M(0) + f(ξ_θ) Q Δ|| = o_p(1).
Proof. The result follows from Lemma 4.1 of Bickel (1975). □

Remark. Equation (A4) is a special case of the conclusion of Jureckova's (1977) Theorem 4.1, which she proves under conditions different from ours. Her c_{ji} is our x_{ij} n^{-1/2}.
Proof of Lemma 3.1. Since ζ_n = r_{θn} is a solution to

Σ_{i=1}^n ρ_θ(r_i − ζ_n) = min,

(A1) implies that

(A5)    n^{-1/2} Σ_{i=1}^n ψ_θ(r_i − r_{θn}) → 0  a.s.

Define

V(Δ) = n^{-1/2} Σ_{i=1}^n ψ_θ(Z_i − x_i'Δ n^{-1/2} − ξ_θ).

Using the method of Jureckova (1977, proof of Lemma 5.2) and (A4), we can show that for all ε > 0 there exist η, K, and n_0 such that

(A6)    P( inf_{||Δ|| > K} |V(Δ)| < η ) < ε   for n ≥ n_0.

Next, (A5) and (A6) allow us to conclude that

(A7)    n^{1/2}(β̂_0 − β + e(r_{θn} − ξ_θ)) = O_p(1).

By (A7) and C5 we may substitute n^{1/2}(β̂_0 − β + e(r_{θn} − ξ_θ)) for Δ in (A4) and complete the proof by examining first coordinates, using C2. □
Lemma A.4. Let D_i (= D_{in}) be an r×c matrix. Suppose

sup_n ( n^{-1} Σ_{i=1}^n ||D_i||² ) < ∞,

where ||D_i||² = Tr D_i'D_i is the Euclidean norm of D_i. Let I be an open interval containing ξ_1 and ξ_2, and let the function g(x) be defined for all x and Lipschitz continuous on I. For Δ_1, Δ_2, and Δ_3 in R^p and Δ = (Δ_1, Δ_2, Δ_3), define

T(Δ) = n^{-1/2} Σ_{i=1}^n D_i g(Z_i + Δ_3'x_i n^{-1/2}) I{ξ_1 + x_i'Δ_1 n^{-1/2} < Z_i < ξ_2 + x_i'Δ_2 n^{-1/2}}.

Then, for all M > 0,

sup_{||Δ|| ≤ M} |T(Δ) − T(0) − E(T(Δ) − T(0))| = o_p(1).

Proof. The proof is very similar to that of Bickel's (1975) Lemma 4.1 and is omitted here, but it can be found in Ruppert and Carroll (1978).
Proof of Theorem 3.1. For Δ_1, Δ_2 in R^p and Δ = (Δ_1, Δ_2), define

U(Δ) = n^{-1} Σ_{i=1}^n x_i x_i' I{ξ_1 + x_i'Δ_1 n^{-1/2} ≤ Z_i ≤ ξ_2 + x_i'Δ_2 n^{-1/2}}

and

W(Δ) = n^{-1/2} Σ_{i=1}^n x_i Z_i I{ξ_1 + x_i'Δ_1 n^{-1/2} ≤ Z_i ≤ ξ_2 + x_i'Δ_2 n^{-1/2}}.

Using Lemma A.4, it is easy to show that for all M > 0,

(A8)    sup_{0 ≤ ||Δ|| ≤ M} |U(Δ) − (1−2α)Q| = o_p(1)

and

(A9)    sup_{0 ≤ ||Δ|| ≤ M} |W(Δ) − W(0) − E(W(Δ) − W(0))| = o_p(1).

Then, using the fact that x_i'e = 1, we have

(A10)    n^{-1}(X'AX) = (1−2α)Q + o_p(1)

and

(A11)    n^{-1/2}(X'AZ) = W(Δ̂_1, Δ̂_2),   where Δ̂_j = n^{1/2}(β̂_0 − β + e(r_{jn} − ξ_j)), j = 1, 2.

By (A10),

(A12)    n^{1/2}(β̂_PE − β) = (1−2α)^{-1} Q^{-1} n^{-1/2}(X'AZ) + o_p(1).

By (A9), (A11), (A12) and Lemma 3.1, (3.3) then follows from the definition of W(0). □
Proof of Theorem 5.3. Using (A4) and the method of Jureckova (1977, proof of Lemma 5.2) we can show that

n^{1/2}(β̂(θ) − β(θ)) = O_p(1).

Therefore, we can substitute n^{1/2}(β̂(θ) − β(θ)) for Δ in (A4) and use (A2) to obtain

M(0) = f(ξ_θ) Q n^{1/2}(β̂(θ) − β(θ)) + o_p(1),

and Theorem 5.3 follows easily. □

Proof of Theorem 5.1. The proof is quite similar to the proof of Theorem 3.1 and can be found in Ruppert and Carroll (1978). □
Proof of Theorem 8.1. For Δ_1, Δ_2, Δ_3 in R^p define Δ = (Δ_1, Δ_2, Δ_3) and

V(Δ) = n^{-1} Σ_{i=1}^n (Z_i − δ(α) − Δ_3'x_i n^{-1/2})² I{ξ_1 + x_i'Δ_1 n^{-1/2} ≤ Z_i ≤ ξ_2 + x_i'Δ_2 n^{-1/2}}.

Applying Lemma A.4 with g(x) = x² and D_i = 1, we have for M > 0

sup_{||Δ|| ≤ M} |V(Δ) − V(0) − E(V(Δ) − V(0))| = o_p(1).

By a Taylor expansion of F and additional simple calculations,

E(V(Δ) − V(0)) → 0,

whence

sup_{||Δ|| ≤ M} |V(Δ) − V(0)| = o_p(1).

Therefore by Corollary 5.1 and (5.2) we have

n^{-1} S = V(0) + o_p(1).

Now Var V(0) → 0, so by Chebyshev's inequality,

n^{-1} S = E V(0) + o_p(1) = E(Z_1 − δ(α))² I(ξ_1 ≤ Z_1 ≤ ξ_2) + o_p(1).

Furthermore, for j = 1, 2,

c_j = ξ_j − δ(α) + o_p(1)

by Corollary 5.1 and (5.2), and Theorem 8.1 follows. □
Proof of Theorem 8.2. This follows in a straightforward manner from (5.2), Theorem 8.1, and Theorem 4.4 of Billingsley (1968). □
Table 1 - Variances of the asymptotic distribution of slope estimators for contaminated normal distributions. (The asymptotic covariance matrix is Q̃^{-1} multiplied by the displayed quantity.)

                                        Trimmed Least Squares
                Least   Huber   Hampel  β̂_PE(LS)              β̂_PE(LAD)             β̂_PE(RQ)
  ε*     b**   Squares                  α=.05  α=.10  α=.25   α=.05  α=.10  α=.25   α=.05  α=.10  α=.25
NORMAL           1.00    1.04    1.04    1.30   1.36   1.26    1.54   1.83   2.14    1.03   1.06   1.19
 0.05    3.0     1.40    1.16    1.17    1.38   1.51   1.58    1.54   1.88   2.26    1.16   1.17   1.29
 0.05    5.0     2.20    1.20    1.23    1.43   1.71   2.15    1.51   1.87   2.28    1.20   1.20   1.31
 0.05   10.0     5.95    1.23    1.28    1.68   2.66   4.81    1.46   1.85   2.30    1.25   1.23   1.33
 0.10    3.0     1.80    1.30    1.32    1.44   1.64   1.88    1.56   1.93   2.39    1.32   1.30   1.39
 0.10    5.0     3.40    1.40    1.47    1.45   1.96   2.99    1.46   1.90   2.44    1.46   1.38   1.45
 0.10   10.0    10.90    1.49    1.61    1.48   3.32   8.09    1.34   1.85   2.47    1.65   1.45   1.49
 0.25    3.0     3.00    1.90    1.94    1.79   1.97   2.74    1.82   2.12   2.87    2.14   1.85   1.80
 0.25    5.0     7.00    2.46    2.68    2.49   2.09   5.13    2.37   1.92   2.99    4.11   2.39   2.01
 0.25   10.0    25.75    3.20    4.26    6.50   1.88  15.66    5.51   1.65   3.06   13.65   3.69   2.19

β̂_PE(LS): least squares as preliminary estimate. β̂_PE(LAD): least absolute deviation as preliminary estimate. β̂_PE(RQ): average of the αth and (1−α)th regression quantiles as preliminary estimate. Both M-estimates use Huber's Proposal 2 for scale.
*Proportion of contamination.  **Standard deviation of contamination.
Table 2

The water salinity data set. The values are biweekly averages of SALINITY at time period i; SALLAG = salinity at time i−1; TREND = one of the six biweekly periods in March-May; and H2OFLOW = river discharge in time i.

OBS  SALINITY  SALLAG  TREND  H2OFLOW  YEAR
  1     7.6      8.2     4    23.005    72
  2     7.7      7.6     5    23.873    72
  3     4.3      4.6     0    26.417    73
  4     5.9      4.3     1    24.868    73
  5     5.0      5.9     2    29.895    73
  6     6.5      5.0     3    24.200    73
  7     8.3      6.5     4    23.215    73
  8     8.2      8.3     5    21.862    73
  9    13.2     10.1     0    22.274    74
 10    12.6     13.2     1    23.830    74
 11    10.4     12.6     2    25.144    74
 12    10.8     10.4     3    22.430    74
 13    13.1     10.8     4    21.785    74
 14    12.3     13.1     5    22.380    74
 15    10.4     13.3     0    23.927    75
 16    10.5     10.4     1    33.443    75
 17     7.7     10.5     2    24.859    75
 18     9.5      7.7     3    22.686    75
 19    12.0     10.0     0    21.789    76
 20    12.6     12.0     1    22.041    76
 21    13.6     12.1     4    21.033    76
 22    14.1     13.6     5    21.005    76
 23    13.5     15.0     0    25.865    77
 24    11.5     13.5     1    26.290    77
 25    12.0     11.5     2    22.932    77
 26    13.0     12.0     3    21.313    77
 27    14.1     13.0     4    20.769    77
 28    15.1     14.1     5    21.393    77
Table 3

Regression coefficients, MAD and IQR for the Stackloss data.

Code             Intercept  Air Flow  Temperature  Acid   MAD   IQR
LSE                -39.92      .72       1.30      -.15   1.92  3.12
β̂(.50)            -39.69      .83        .57      -.06   1.18  1.71
β̂_KB(.15)         -42.83      .93        .63      -.10   1.60  2.49
β̂_PE(RQ,.10)      -40.37      .72        .96      -.07   1.37  2.59
HUBER              -41.00      .83        .91      -.13   1.63  3.07
ANDREW             -37.20      .82        .52      -.07    .99  1.50

Water Salinity Data

Code             Intercept  SALLAG  TREND  H2OFLOW  MAD   IQR
LSE                  9.59    .777   -.026   -.295    .72  1.38
β̂(.50)             14.21    .740   -.111   -.458    .50   .98
β̂_KB(.15)           9.69    .800   -.128   -.290    .67  1.36
β̂_PE(RQ,.10)       14.49    .774   -.160   -.488    .60  1.05
HUBER               13.36    .756   -.094   -.439    .56  1.02
ANDREW              17.22    .733   -.196   -.578    .47   .83
Figure 1. Boxplots of Residuals for the Stackloss Data (LSE, HUBER, β̂_PE(RQ,.10), β̂_KB(.15,.85), β̂(.5), ANDREW).
Figure 2. Boxplots of Residuals for the Water Salinity Data.
References

Andrews, D.F. (1974). A robust method for multiple linear regression. Technometrics 16, 523-531.

Andrews, D.F., et al. (1972). Robust Estimates of Location: Survey and Advances. Princeton, NJ: Princeton University Press.

Anscombe, F.J. (1961). Examination of residuals. In Proc. Fourth Berkeley Symp. Math. Statist. Prob. (J. Neyman, ed.), pp. 1-36. Berkeley and Los Angeles, CA: University of California Press.

Bahadur, R.R. (1966). A note on quantiles in large samples. Ann. Math. Statist. 37, 577-580.

Bickel, Peter J. (1973). On some analogues to linear combinations of order statistics in the linear model. Ann. Statist. 1, 597-616.

Bickel, Peter J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70, 428-433.

Billingsley, Patrick (1968). Convergence of Probability Measures. New York: John Wiley and Sons.

Carroll, Raymond J. (1979). On estimating variances of robust estimators when the errors are asymmetric. J. Amer. Statist. Assoc. 74, 674-679.

Carroll, Raymond J. and Ruppert, David (1979). Almost sure properties of robust regression estimates. Institute of Statistics Mimeo Series #1240, Department of Statistics, University of North Carolina at Chapel Hill.

deWet, T. and Venter, J.H. (1974). An asymptotic representation of trimmed means with applications. South African Statistical Journal 8, 127-134.

Ghosh, J.K. (1971). A new proof of the Bahadur representation of quantiles and an application. Ann. Math. Statist. 42, 1957-1961.

Gross, Alan M. (1973). A Monte-Carlo swindle for estimators of location. Appl. Statist. 22, 347-356.

Gross, Alan M. (1976). Confidence interval robustness with long-tailed symmetric distributions. J. Amer. Statist. Assoc. 71, 409-416.

Hogg, Robert V. (1974). Adaptive robust procedures: A partial review and some suggestions for future application and theory. J. Amer. Statist. Assoc. 69, 909-927.

Huber, Peter J. (1970). Studentizing robust estimates. In Nonparametric Techniques in Statistical Inference (M.L. Puri, ed.), pp. 453-463. Cambridge: Cambridge University Press.

Huber, Peter J. (1977). Robust Statistical Procedures. Philadelphia, PA: SIAM.

Jaeckel, Louis A. (1971). Robust estimates of location: Symmetry and asymmetric contamination. Ann. Math. Statist. 42, 1020-1034.

Jureckova, Jana (1977). Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5, 464-472.

Koenker, Roger and Bassett, Jr., Gilbert (1978). Regression quantiles. Econometrica 46, 33-50.

McKeown, P.G. and Rubin, D.S. (1977). A student oriented preprocessor for MPS/360. Computers and Operations Research 4, 227-229.

Ruppert, David and Carroll, Raymond J. (1978). Robust regression by trimmed least squares estimation. Institute of Statistics Mimeo Series #1186, Department of Statistics, University of North Carolina at Chapel Hill.

Stigler, Steven M. (1977). Do robust estimators work with real data? Ann. Statist. 5, 1055-1098.

Tukey, John W. (1960). A survey of sampling from contaminated distributions. In Contributions to Probability and Statistics (I. Olkin, ed.). Stanford, CA: Stanford University Press.

Tukey, John W. and McLaughlin, D.H. (1963). Less vulnerable confidence and significance procedures for location based on a single sample. Sankhya Ser. A 25, 331-352.