Carroll, R.J. and Ruppert, DavidRobust Estimators for Radom Coefficient Regression Models."

ROBUST ESTIMATORS FOR RANDOM COEFFICIENT REGRESSION MODELS
by
R.J. Carroll
1
and
David Ruppert
Running Title:
4It
2
RANDOM COEFFICIENT REGRESSION
lNational Heart, Lung, and Blood Institute and the University of North
Carolina. Supported by the Air Force Office of Scientific Research
Grant AFOSR F49620 82 C 0009.
2University of North Carolina.
Grant MCS 8100748.
Key Words:
Supported by National Science Foundation
M-estimation, random coefficients, heteroscedasticity, estimated
weights, Monte Carlo.
AMS 1970 Subject Classifications:
62G35, 62J05.
i
ABSTRACT
Random coefficient regression models have received considerable attention,
especially from econometricians.
have normal distributions.
Previous work has assumed that the coefficients
The variances of the coefficients have, in previous
papers, been estimated by maximum likelihood or by least squares methodology
applied to the squared residuals from a preliminary (unweighted) fit.
Maximum likelihood estimation poses difficult numerical problems.
Least
squares estimation of the variances is inefficient because the squared residuals
have a distribution with a heavy right tail.
In this paper we propose several robust estimators for random coefficients
models.
We compare them by Monte Carlo with estimators based on least squares
applied to the squared residuals.
The robust estimators are best overall, even
at the normal model.
Among the different robust estimators, none stands out as best.
All are
rather satisfactory and can be tentatively recommended for routine use.
1.
Introduction.
There is now a sizable literature on linear regression models where the re-
gression coefficients are random rather than fixed parameters.
See, for example,
Dent and Hildreth (1977), Hildreth and Houck (1968), Theil and Mennes (1959),
Froehlich (1973), Fisk (1967), Spjotvoll (1977), and Swamy (1971).
Such models
can be expressed in the form
y.1
(1.1)
where x!
-1
B!
-1
= (x.1 l , ... ,
= (Bo1 l , ... ,
= -x!1 -8.1
i = 1, ... , n
x. ) is a known vector of independent variables, and
1k
S·k)
is the unobserved vector of regression coefficients for the
1
ith observation.
We will assume, as is usual, that 8 , ... , 8 are i.i.d. with
-
means B and covariance matrices diag(a ,
l
solely with estimation of 8 and
~ =
(a ,
l
~l'
of the individual coefficient vectors
1
-'-fl
,
~).
Moreover, we will be concerned
,
~)'
• We will not consider estimation
,
~,
and in fact we will not assume
that there is enough information in the sample to do so.
Allowing -1
a. to have a
non-diagonal covariance matrix would not present great theoretical difficulties.
However, such models have a considerable number of parameters to estimate.
Monte-Carlo studies with diagonal covariance models, such as the work presented
here, suggest that rather large sample sizes would be necessary before estimation would be reasonably accurate for models with non-diagonal covariance matrices.
Notice that model (1.1) can be re-expressed as
y. = x!B
(1.2)
where EE.1
(1.3)
1
=
-1-
+
E.
1
0 and
2
Ec =
1
•
x!a
-1-
.
Here and throughout this paper, for any vector or matrix A, A is obtained by
2
squaring each entry of A.
We see then that for purposes of estimation, the ran-
dom. coefficient model can be treated as a fixed coefficient model with heteroscedasticity.
In particular, it would be possible to ignore
~
and simply estimate
However, there are two reasons for estimating a:
unweighted method.
~
by an
(1) the
variances of the random coefficients may be of interest in themselves, or (2)
only
~
may be of intrinsic interest but one hopes to obtain improved estimates
of ~ by using weights based upon ~, rather than using an unweighted estimate.
Estimators of
~
and
~
have been studied by Theil and Mennes (1959), Hildreth
and Houck (1968), Froehlich (1973), and Dent and Hildreth (1977).
tors each fall into one of two categories.
These estima-
The first category consists of
various methods which attempt to find the maximum of the likelihood function.
The second consists of finding a preliminary (unweighted) least squaresestima~
tor of ~, then estimating
£
by applying least squares (possibly weighted) to
the squared residuals from this fit, and finally re-estimating
~
by least
squares using weights from the estimate of a.
Until now it has been assumed in the literature that -1
8. has a normal distribution.
In this paper we introduce robust estimators for the random coef-
ficients model.
These estimators offer three possible advantages over previous
estimators:
(i)
They should be more efficient when coordinates of 8. have heavy tailed
-1
distributions, and they should be less sensitive to gross errors in
the response.
(ii)
Even when
B. is normally distributed, the squared residuals from a
-1
preliminary estimator of
mating
~,
~
will have a heavy right tail.
When esti-
robust methods might be superior to least squares.
3
(iii)
When re-estimating
~
using estimated weights, robust methods might
protect against poor estimates of the weights.
The purpose of this paper is to suggest several robust estimators and to
study the extent to which these possible advantages are realized.
and (ii) can be explored theoretically by asymptotics.
Monte-Carlo studies.
Point (iii) requires
This is because finite sample exact results seem impos-
sible to obtain, and asymptotically any consistent estimate of
which, when
Points (i)
estimating~,
~
gives weights
are just as good as the true weights (in terms of
first order asymptotic distributions).
In section 2, we describe those estimates that have performed best in previous Monte Carlo studies.
In section 3 we introduce some new methods which
are distribution robust.
For the homoscedastic linear model, the so-called bounded influence estimates studied by e.g. Krasker and Welsch (1981) are robust when there are outliers in the design (in the x vectors).
The need for bounded influence estima-
tors has been questioned by Huber (1981, section 7.9).
Nonetheless, we feel
that bounded influence estimators should be developed for heteroscedastic
models, including the random coefficients model.
In the present paper, however,
we do not consider estimators which bound the influence of the design vectors.
In section 4, we present a Monte Carlo study and draw some tentative conclusions from it.
We see that all three possible advantages of robust estima-
tion, i, ii, and iii above, are realized for the sampling situations we studied.
4
2.
Previous Estimators.
We will only mention the estimators which proved best in the Monte Carlo
studies of Froehlich (1973) and in Dent and Hildreth (1977).
Equation (1.2)
can be put in matrix form:
(2.1)
If M = I
l.
- X(X'X)-l X' and
G
..
= MX,
= X~
+
e
.
then as noted by Hildreth and Houck (1968),
the vector of squared residuals from a least squares fit to (1. 2) is
i: = Go.
(2.2)
where Ew
= O.
j=l, ... ,k.
In so doing, one should utilize the constraints:
a.. ;;:: 0,
J
One could simply truncate the least squares estimators below by
Alternatively, Dent and Hildreth (1977) and Froehlich (1973) define ~ to be
the minimizer of
4It
w
Therefore one can estimate a. by applying least squares method-
ology to (2.2).
O.
+
Iii - G~1I2
subject to these nonnegativity constraints.
~
Froehlich's (1973) Monte Carlo study shows that
is superior to the unrestric-
ted least squares estimator, either with or without truncation at O.
culation of
&is
The cal-
a quadratic programming problem, and Dent and Hildreth (1977)
used Lemke's (1965) method, in particular Ravindran's (1972) FORTRAN algorithm,
1\
to calculate cx.
Froehlich (1973) and Dent and Hildreth (1977) found value in using the
weighted estimate of
~
that was suggested by Theil and Mennes (1959).
1\
cuI ate this estimator one starts with w
=
•
r -
1\
G~,
which is the vector of resi-
duals from the restricted least square fit to (2.2).
w is estimated by ~'.
To cal-
The covariance matrix of
Theil and Mennes (1959) have argued that one may ignore
1\
the off-diagonal elements, so let us define
elements replaced by 0; formally
/\
I = I*(~')
I
to be ~, with the off-diagonal
5
where * denotes Hadamard product.
The Theil-Mennes estimators, which Dent and
Hildreth (1977) call ~(2), is the solution to
minimize (; - G~ rl(i - Go.)
subject to o..
1
~
a
i=l, ... , k .
This is, of course, a restricted, weighted least squares estimate.
2· 2
Note that if -1
(3. has a multivariate normal distribution, then r. = e. and
1
e~/E~
has a chi-square distribution with one degree of freedom.
1
1
l.
Thus, r~ will
1
have a very heavy right tail, and we can expect that least squares estimation
applied to (2.2) will not be very efficient.
Even restricted and weighted
least squares should be inferior to a good robust regression estimator.
If the (3. are normally distributed, then of course one can obtain asymp-l.
totically efficient estimators by maximum likelihood.
difficulties with the MLE.
There are, however, two
As is typical with normal-based likelihood methods,
the MLE will be sensitive to heavy-tailed deviations from the normality assumption and to the presence of a few gross outliers.
The second difficulty is in finding the maximum of the likelihood function.
Froehlich (1973) tried to find the MLE using the Newton-Raphson algorithm, but
this failed to converge in approximately twenty percent of the samples.
Dent
and Hildreth (1977) tried three methods of maximizing the likelihood function:
the Davidson-Fletcher-Powell algorithm, Fisher's method of scoring, and Brent's
(1973) PRAXIS algorithm.
The three algorithms often gave different optima, and
in almost all cases the solution from PRAXIS gave a higher value of the likelihood than the solutions from the two other algorithms.
ALGOL version of PRAXIS.
version.
Brent (1973) gave an
Dent and Hildreth (1977) report using a FORTRAN
We have been unable to obtain a FORTRAN version of PRAXIS.
6
3.
Robust Estimators.
First we review robust estimators for homoscedastic linear models.
Readers
unfamiliar with this material are referred to Huber (1981) for further discussion.
Consider model (1.2) but with the e. 's i.i.d. with a symmetric distri1
bution F.
An M-estimator of 6 is a solution to
A
Y·-X!6
\jJ( 1 -1-) x. = 0
A
-1
a
(3.1)
a is a scale estimate.
where \jJ is an appropriate function and
A
odd so that 6 will be consistent, and \jJ bounded so that
sensitive to outliers in the response.
for some k between 1 and 2 is COID~on.
A
~
Typically, \jJ is
will not be highly
The choice of \jJk(x)
= max(-k,min(x,k))
a are the standardized
Typical choices of
median absolute deviation, that is, MAD/(.674S), and Huber's proposal 2 which
solves (3.1) and
",n
Li=l X(
simultaneously.
Often, X(x)
= \jJ2(x)
y.-x!8
1 -1-
A
a
-
=
)
0
J \jJ2(x) d~(x), where ~ is the standard
normal distribution.
If F is asymmetric, and the model (1.2) has an intercept so that the
A
first (say) coordinate of x. is 1 for all i, then 8. consistently estimates 6.
-1
1
1
A
= 2, ... ,
k, but 6 need not be consistent; see Carroll (1979). Of course,
1
consistency will only be obtained under regularity conditions on the -1
x. and F
for i
and/or \jJ, but we will not pursue such niceties here.
A
Notice that if we apply an M-estimate to (2.2), then a.1 will be consistent
for i
= 2, ... ,
manner.
k.
We may, therefore, wish to estimate a
The M-estimate of
~
l
in an auxiliary
can be unweighted so that a robust analog of
the Hildreth-Houck estimator is obtained.
Alternatively, we could use a
weighted M-estimator with weights given by
I
A
and obtain a robust analog of the
7
Theil-Mennes estimator.
A weighted M-estimator with weights WI' ... , w is
n
defined to be an ordinary M-estimator applied to y./w. and x./w ..
1
1
-1
1
If the
ordinary M-estimator is scale equivariant because it utilizes a scale estimator
a as
in equation
(3.1), then the weighted M-estimator is unaffected by re-
placing w. by kw., k
1
1
> 0,
for i
= 1, ... ,
n.
We will now describe a robust estimator of a which, at the normal model; is
consistent for a
l
as well as a , ... ,~.
2
nary estimators of a and
Let
a and S be
consistent, prelimi-
S. Define
~
= J ~(z2
~
r.1
(3.2)
- 1) d¢(z) ,
= y. 1
x!8
-1-
and
(3.3)
Let G! be the ith row of G.
1
Then, let
/\
~
2 ,/\
r·-G·a
be the solution to
L~
1jJ(_1-1-~_ ~
1=1 {
w.
1
(3.4)
} G!
2
-:L
w.1
= 0 .
Although the r. have an asymmetric distribution, subtraction of
1
~
from
~
ensures consistency of ~ at the normal model.
Needless to say, the small sample properties of robustified Hildreth-Houck
and Theil-Mennes estimator, or of
preliminary estimates.
/\
~,
depend upon the particular choice of
In the next section, we define several specific versions
of these robustified estimators, and the study of them by Monte Carlo.
8
4.
Monte Carlo Comparison of the Estimators.
In our simulation study, we used six estimation algorithms.
Each algorithm
consisted of the following steps:
A
1)
A preliminary unweighted estimate, 8 , of S is computed.
2)
An unweighted estimate ~H' of ~ is found by fitting model (2.2).
3)
Using weights ii~H ' one computes a weighted estimate of S and calls it
~
-
A
~H·
4)
I
Let
I*~' where ~ is the vector of residuals from step 2.
=
Using these
weights, one computes a weighted estimate ~M of a.
5)
Using weights ii~M ' one finds a weighted estimate of
A
B and
A
calls it ~M.
Using ~M and ~M as preliminary estimates, one computes r i and wi from
equation (3.2) and (3.3). Then a is estimated by solving (3.4). Call the
6)
estimator
7)
A
~
•
Using weights
• A
~i2M
' one constructs an estimate of B and calls the result
A
~.
Note that in order to characterize an algorithm exactly, in steps 1, 2, 3, 4,
5, and 7 we must specify precisely which estimate is used.
Table 1 gives this
information- for each of the six algorithms that we employed.
estimate is either least squares or Huber's proposal 2 with
x
= ~2
used.
_
In each case the
~
= ~l.s
and
f ~2 d¢. When denoting an estimate, we include the specific algorithm
Thus
A
~(l)
is the estimate from step 3 of algorithm 1 and is, in fact,
the estimate ~ of Froehlich (1973) and of Dent and Hildreth (1977).
use HH(l) to denote both
A
~(l)
and
A
~(l).
We also
See table 7 for a summary of the
acronyms we used.
When estimating
•
d
by least squares (weighted or unweighted) we used re-
stricted least squares calculated by Ravindran's (1972) algorithm.
not develop a restricted M-estimate, so to estimate
the proposal 2 estimator at O.
d
We did
robustly, we truncated
Because the squared residuals have an asymmetric
9
distribution, in general the M-estimators in steps 2 and 4 will not estimate
a
consistently. Therefore, in algorithm 5 we tried a separate estimate of a l .
l
This was the average of
where
2.
(r l , "', rn)
A
Also, a
=
A
2
and a
3
r'
and G. is the ith row of G, where G is given in section
1
are the proposal 2 estimates truncated at O.
In algorithms 1 and 3-6, we solve equation (3.4) exactly in step 6.
Occasionally, the algorithm for solving (3.4) did
set
A
~
equal to
A
~M'
~ot
converge, and then we
In algorithm 2, we did not try to solve (3.4) exactly,
but rather we stopped after two steps towards the solution to (3.4).
A
A
Notice that a and a(2) of Dent and Hildreth (1977) are the same estimators
as HHI and TMI, respectively.
Of course, HHI
= HH2
and TMI
= TM2,
so HH2 and
TM2 are not included in the report of our results.
In our study, we used the same designs as Dent and Hildreth (1977), i.e.,
the first column is of ones, the second is harmonic, and the third is random
.
where "harmonic" and "random" are as described by Froehlich (1973).
sample sizes n
=
25 and n
=
We used
75, as did Dent and Hildreth, and 300 Monte Carlo
iterations.
We used
(1973).
~
=
(1.0, 1.0, 1.0)' as did Dent and Hildreth (1977) and Froehlich
For a we used a = (l.0, 0.2, 0.5)' which we call "mild heteroscedasti-
city" and which was used by Dent and Hildreth and Froehlich.
We also used
a = (1.0, 3.0, 3.0)', which we called "heavy heteroscedasticity."
see, when
~
= (1.0,
0.2, 0.5)' unweighted estimates of
heteroscedasticity) are rather good, but for a
.
=
~
As we will
(which ignore the
(1.0, 3.0, 3.0)' one always
does better by estimating the variance function and using a weighted estimate
of S.
10
The random coefficient error 6ik -
Bi was
;a; Zik'
where the Zik were
either independent N(O,l) variates (normal) or independent [.9N(0,1) + .IN(O,9)]
variates (contaminated normal).
normal sampling situation, but
ation.
Notice that the variance of B is
ik
1.8~
~
for the
for the contaminated normal sampling situ-
A robust scale functional which is Fisher consistent at the normal dis-
tribution, for example, the standardized MAD, will be
bution and close to
~,
not
1.8~,
~
for the normal distri-
for the contaminated normal distribution.
Therefore, for the contaminated distribution, the question arises as to the
parameter being estimated.
However, the ratios a(Bik)/a(B ik ,) are independent
of the particular scale functional, aCe), that is employed.
using the weights -x.a
to estimate
l.-
Moreover, when
B by a weighted proposal 2 (or weighted
least squares), only the ratios (~/~,) are relevant.
This is because 8 is
being estimated by a scale equivariant procedure.
To improve the accuracy of comparisons between the various estimators and
sampling situations, for each sample size we used the same stream of random
numbers for all estimators and sampling situations.
The only exception is
that for the contaminated normal distribution but not for the normal distribution we needed
Bernoulli (.1) random variables in order to decide if Zik was
to be N(O,l) or N(0,9).
Since we used two sample sizes, two choices of a, and two distributions we
had eight sampling situations.
In table 2, we give the mean square error (MSE) for the estimators of 8.
For each sampling situation and each of 8 , 82, and 83 we report the MSE of
1
only the seven estimators with the lowest MSE.
In table 3,. MSE are given for
estimates of a in the normal sampling situations.
To compare estimates of a
under the contaminated normal distribution, we use the parameters log(a /a ),
l 2
Of course, log (~./&.) is undefined if ~. =
l.
J
l.
° or
11
A
= 0,
a.
J
which is a common occurence for estimators truncated at zero.
There-
fore to compare the estimators of 10g(a./a.), we computed the median absolute
1
errors (MAE), which are the medians of
A
A
A
or a. = 0, we set log(a./a.) = 00
J
1
J
J
A
A
I10g(a./a.)
1
J
- log(a./a..)
J
1
I.
The MAE is finite provided ~.1
•
A
=0
If a.
1
F
° and
A
a. t 0 for more than fifty percent of the samples, which was always the case
]
in our study.
The advantage of using the parameter 10g(a./a.), rather than
1
J
(a./a.) is that the MAE of 10g(a./a.) is symmetric in i and j.
1
J
J
1
The MAE are
given in table 4.
From table 2 we can draw some conclusions pertaining to estimation of
1)
When n
=
~.
25 and the heteroscedasticity is mild, then unweighted estiA
mators are competitive with those which use a to estimate the optimal weights.
However, weighted estimates always work about as well or better than unweighted
estimates.
2)
The robust estimators of
~,
especially HH4, HHs, TM4, TMs, M4, Ms, and
M6, are in general quite good and can perhaps be recommended for routine use.
3)
Even at the normal model, robust methods typically outperform least
squares methods.
This may be partly due to the heavy right tail of the dis-
tribution of the squared residuals, which makes robust estimators of a more
efficient than least squares, even at the normal model.
However, HH6 and TM6
which utilize robust estimators of a and least squares estimators of
~,
are
not uniformly better, at the normal model, than HH4, HHs, TM4, TMs, and M4-M6.
These later estimators utilize robust estimators of both a and
B. This suggests
that robust estimators of B guard against poor choices of weights caused by
estimation errors for a.
4)
Whether Hildreth-Houck (HH) type estimators, Theil-Mennes (TH) type
estimators, or estimators based on equation (3.4) (which we denote by M) are
12
best depends upon the sampling situation and the particular coordinate of 13
being estimated.
Examining tables 3 and 4, we see that when estimating a, robust methods outperform non-robust methods, even at the normal model.
Moreover, algorithms 4
and 5 which use robust estimators throughout typically outperform algorithms 3
and 6 which use both robust estimates and least squares.
Although HH4 and TM4 do not produce consistent estimators of aI' in our
study, they worked well when estimating aI' log(al /a ), and log(a /a ). This
2
l 3
suggests that the biases of HH4 and TM4 for a are typically quite small compared to the standard deviations.
We constructed confidence intervals for 13 , 8 , and 8 as follows.
1
2
3
Define
S(a) = {n -1 \,n
!..
x .x !/ x• . a) }-l
i=l -1-1 - 1 -
e
and
2
o (~,m = (n-k)
-1
2
\,n
!..
i=l
•
(Yi - ~i8) /(xi~)
Let t(y,n) be the (1 - y) quantile of the t distribution with n degrees of
freedom, and let s(j)(~) be the jth diagonal element of Sea).
A
•
A
o.1 = (x!a)
-1-
Define
A
0
where ~ is a given estimate of a and ~ is the scale estimate from the proposal
2 estimate of
B.
Then, let
A
Y·-x!B
1 -1-
r. =
"
1
o.
1
~l
= n-
ti=l
1
$(r.) ,
1
"
-1 \,n
2
)
A2 = (n-k)
!..i=l ~ (r i
"
and
" =1
K
+
2k(1-A )
l
~l (h-2k)
13
where ~
= ~1.5' ~(x) =
A
(d/dx)~(x),
and
~
is a given estimate of 8.
A
K corres-
ponds to K of Huber (1981, page 174), but we use 2k instead of k as an ad hoc
adjustment for having extra parameters (a) to estimate.
A
If 6 is a weighted least squares estimator, then the confidence intervals
are
S.J
"t
o(~, S)t(y/2,n-k)S(j) (~ .
A
If S is the unweighted proposal 2 estimator, then the intervals are
A
A
6.
J
;\2
A
( .)
K S J (&)t(y/2,n-k)
oJ:: -
~2
The least squares intervals are obtained by applying standard methodology to
•
A
•
A
y./(x!a) and x./(x!a).
1
-1-
-1
-1-
The confidence intervals for proposal 2 are based on the
asymptotics in, for example, Huber (1981, chapter 7), again applied after
weighting by l/(x!&).
-1-
When constructing these intervals we make no adjustments
for the fact that we are using estimated weights,
•
A
x'~,
•
not x'a.
However, by
using techniques from other studies of heteroscedastic models, e.g., Carroll
and Ruppert (1982), one can show that each of the weighted estimation methods
considered here has the following property.
•
The asymptotic distribution of
A
!
•
is the same when the weights -x!a
are used as when the weights x!a
are used.
11A
The adjustment K is based on a higher order asymptotic expansion of Huber
(1973, 1981), and was found to be essential in a Monte Carlo study by Schrader
and Hettmansperger (1980).
Table 5 gives Monte Carlo coverage probabilities for n
estimators of S.
We used y
=
.05.
= 25
and for selected
The coverage probabilities are quite close
to .95, especially since the standard deviation of these estimated probabilities
is 0.0126, if the true probabilities are, in fact, .95.
The coverage proba-
bilities for n = 75 and for the other estimators are also close to .95, except
14
that HHI, HH3, TMI, TM3, MI, M2, and M3 typically have coverage probabilities
for
B1
between 0.83 and 0.90 when a
-
= (1.0,
3.0, 3.0)
I.
Among the estimators included in table 5 are the least squares estimator
and proposal 2.
By examining their coverage probabilities, we can see that
ignoring the heteroscedasticity, as these estimators do, does not seriously
degrade the validity of the confidence intervals.
15
5.
Conclusions.
The estimators which are robust for both
~ and~,
TM5, M4, and M6 are very satisfactory overall.
that is HH4, HH5, TM4,
We recommend them over the
standard Hildreth-Houck and Theil-Mennes estimators, even if there is no worry
about normality.
We did not consider maximum likelihood estimation in this
study, because of numerical difficulties which are involved.
As numerical
techniques improve and good algorithms become more available, maximum likelihood estimation will become a feasible technique.
Even then, however, maximum
likelihood estimators of variance parameters have quadratic influence functions
and are particularly nonrobust.
Thus we believe that robust estimators, either the ones studied here, or
perhaps ones which will be introduced in the future (e.g. estimators with
bounded influence), should be standard techniques and at the very least should
4It
be computed along with non-robust methods.
16
6.
Further work.
A
As mentioned previously, one can show that the distribution of
B for
each
of the weighted estimation methods considered here is the same asymptotically
when the estimated weights x!~ are used as when the weights x!a based on the
-1.-
true value of a.
-1.-
This conclusion may be established using techniques developed
in Carroll and Ruppert (1982), though the results there are not directly applicable since the random variables
y. -x! B
1. -1.-
(x. a) t;i
1.
are generally not identically distributed.
known previously.
The result may not have been widely
Froehlich (1973) states that he "had limited success so far
establishing the asymptotic distribution of the resulting estimators of B."
Dent and Hildreth (1977) make no mention of asymptotic properties.
The asymptotic distribution of ~ remains an open problem.
Influence func-
tions for the estimators, conditions for consistency, and further results on
asymptotic distributions, particularly for ~, are the focus of current research
which will be reported at a later date.
REFERENCES
Brent, Richard P. (1973). Algorithms for Minimization Without Derivatives.
wood Cliffs, N.J.: Prentice-Hall.
Eng1e-'
Carroll, R.J. (1979). On Estimating Variances When the Errors Are Asymmetric.
J. Am. Statist. Assoc~ 74. 674-679.
Carroll, R.J. and David Ruppert. (1982). Robust Estimation in Heteroscedastic
Linear Models. To appear in Ann. stat.
Dent, Warren T. and Hildreth, Clifford. (1977). Maximum Likelihood Estimation in
Random Coefficient Models. J. of Am. Statist. Assoc~ 72. 69-72.
Fisk, P.R. (1967). Models of the Second Kind in Regression Analysis.
tist. Soc. B~ 29. 266-281.
J.
Roy. Sta-
Froehlich, B.R. (1973). Some Estimators for a Random Coefficient Regression Model.
J. Am. Statist. Assoc.~ 68. 329-335.
Hildreth, Clifford and Houck, James. (1968).
with Random Coefficients. J. Am. Statist.
~
(1973).
Some Estimators for a Linear Model
63. 584-595.
Assoc.~
Huber, Peter J.
Carlo. Ann.
Robust Regression:
5. 799-821.
Asymptotics, Conjectures and Monte
Statist.~
Huber, Peter J.
(1981).
Robust Statistics.
John Wiley and Sons, N.Y.
Lemke, C.E. (1965). Bimatrix Equilibrium Points and Mathematical Programming.
Management Science~ 11. 681-689.
Krasker, William S. and Welsch, Roy E. (1981). Efficient Bounded-Influence Regression Estimation. Manuscript. (To appear in J. Am. Statist. Assoc.).
Ravindran, Arunacha1an. (1972). A Computer Routine for Quadratic and Linear Programming Problems [H]. Comrrrunications of the ACM~ 5. 818-820.
Spjotvo11, E. (1977). Random Coefficients Regression Models.
Operationsforsch. Statist.~ Sere Statist.~ 8. 69-93.
A Review.
Math.
Swamy, P.A.V.B. (1971). statistical Inference in Random Coefficient Regression
Models. Springer-Verlag, Berlin.
Theil, Henri and Mennes, L.B.M. (1959). Conception Stochastique de Coefficients
Mu1tip1icateurs dons L'ajustement Lineaire des Series Tempore11es. Publications
de l'Institut de Statistique de L'University de Paris~ 8. 211-227.
e
Algorithm
Step. parameter
being estimated
and notation for
estimation
1
e
'<
e
4
5
6
Proposal 2
Proposal 2
Proposal 2
Least squares
Least squares
Proposal 2,
Proposal 2,
Proposal 2.
-restricted
truncate at 0
truncate at 0,
truncate at 0
1.2
3
Least squares
(3
Step
=
2 Least squares
Parameter
=
a
Notation
-restricted
= HH
separate estimate
of a
1
Weighted least
Weighted
Weighted
Weighted
Weighted least
= ~
= HH
squares
Proposal 2
Proposal 2
Proposal 2
squares
Step
=
4
Weighted least
Weighted least
Weighted
Weighted
Weighted
Parameter
=
a
squares-
squares-
Proposal 2
Proposal 2. trun-
Proposal 2.
restricted
restricted
cate at O. sepa-
truncate at 0
Step
Parameter
Notation
Notation
=
3
= TM
rate estimate of a
=
l
Weighted least
Weighted
Weighted
Weighted
Weighted least
= (3
= TM
squares
Proposal 2
Proposal 2
Proposal 2
squares
Step
=
7
Weighted least
Weighted
Weighted
Weighted
Weighted
Parameter
=
~
squares
Proposal 2
Proposal 2
Proposal 2
Proposal 2
Notation
=
M
Step
Parameter
Notation
5
Table 1. Estimates used in the six algorithms used in our Monte Carlo study. Weighted Proposal 2 with weights
WI' . • . • wn is obtained by applying Proposal 2 to yl/w l ' ...• yn/w n • -xl/wI' ...• -n
x /w.
The separate estimate of
n
a is described in the text. In algorithms 1. 3. 4. 5. and 6, step 6 solves equation (3.4) with preliminary estil
mates coming from steps 4 and 5. Algorithm 2 differs in that in step 6 only two steps towards the solution of (3.4)
are used. Any esximator can be ide~tified by giving the step at which it was produced and the algorithm number used.
Thus TM4 denotes a from step 4 and B from step 5. both from algorithm 4. The estimator of (3 from step 6 is denoted
by M.
-
-
e
•
,
e
e
Sampling Situation
2
1
Sample size
Distribution
Heteroscedasticity
Rank for
A
MSE of 61
Rank for
A
MSE of 62
Rank for
A
MSE of 63
25
N
M
I
25 N
H
1
2
3
4
5
6
7
PT
LS
HHS
M5
HH4
M1
M2
0.0673
0.0675
0.0689
0.0690
0.0702
0.0715
0.0715
M5
M4
M6
M2
M3
HH4
TM3
1
2
3
4
5
6
7
LS
PT
HH5
M5
HH4
M4
M6
0.0827
0.0836
0.0853
0.0856
0.0868
0.0871
0.0871
1
2
3
4
5
6
7
HHS
LS
PT
M5
HH4
M4
M6
0.1004
0.1006
0.1009
0.1019
0.1036
0.1037
0.1070
3
u
0.2323
0.2388
0.2395
0.2452
0.2460
0.2467
0.2768
-1-
75
N
M
I
4
75
N
H
5
-u-
-----CN25
I M
6
25
CN
H
I-
I
7
8
75
CN
M
75
CN
H
0.0217
0.0217
0.0218
0.0219
0.0220
0.0220
0.0222
M5
M6
M4
TM5
HH5
M2
M3
0.0734
0.0735
0.0752
0.0786
0.0791
0.0794
0.0810
HH5 0.0993
HH4 0.1048
M50.1050
TM5 0.1058
PT 0.1070
M6 0.1089
M4 0.1095
M5
HH5
TM5
M4
M6
HH4
TM4
0.3175
0.3255
0.3279
0.3348
0.3402
0.3536
0.3537
M5 0.0319
HH5)0.0321
TH5 0.0321
M4 0.0325
M6 0.0326
HH4 0.0331
PT 0.0336
M3 0.1009
M2 0.1029
M4 0.1032
M6 0.1047
M5 0.1056
HH40.1103
TM5 0.1140
HH5)0.3566
HH6 0.3566
HH4 0.3599
TM5 0.3623
M5 0.3642
M4 0.3706
M6 0.3725
0.0205
0.0206
0.0206
M4
0.0206
M5
M6
0.0206
IH4jIH~. 0211
1\14,TM .0211
M4
M5)
M6
M2
M1
M3
HH
0.0867
0.0868
0.0868
0.0872
0.0873
0.0884
0.0886
M5
M3
PT
HH5
HH4
M4
M2
0.1221
0.1224
0.1234
0.1236
0.1237
0.1243
0.1250
HH5
HH4
TM3
M5
M4
TM4
M6
0.5265
0.5277
0.5387
0.5406
0.5435
0.5463
0.5535
M4) 0.0291
M5 0.0291
M6 0.0291
HH4 0.0296
TM4 0.0296
HH5)0.0297
TM5 0.0297
M4
M1
M3
M2
M6
M5
HH4
0.1217
0.1218
0.1220
0.1221
0.1225
0.1226
0.1251
HH5 0.4110
HH6 0.4137
HH4)0.4143
M5 0.4143
N2 0.4196
M4 0.4214
M3 0.4257
HH5
TM5
HH4
TM4
HH6
M1
M4
M6
M1
M2
HH4
HH5
0.1182
0.1184
0.1187
0.1195
0.1197
0.1201
0.1204
005
004
M5
TM5
PT
M4
M6
0.1223
0.1263
0.1271
0.1283
0.1289
0.1292
0.1303
HH1
HH5
M4
M6
TM4
PT
M2
0.5440
0.5453
0.5583
0.5658
0.5844
0.5849
0.5889
HH5
TM5
HH4
M5
TM4
M4
M6
M5
HH4
TM5
HH5
TM4
M4
M6
0.1544
0.1551
0.1555
0.1557
0.1560
0.1567
0.1577
HH5)
M2
M3
M5
TM5
M6
IH4, M4
M3
M2)
M5)
M6
0.0301
0.0303
0.0304
0.0306
0.0307
0.0308
0.0308
0.0373
0.0374
0.0382
0.0382
0.0289
0.0391
0.0392
Table 2. MSE of estimators of B. PT = Proposal 2 (unweighted). LS = least squares (unweighted). HH, TM, and M
are respectively Hildreth-Houck-type estimates, Thei1-Mennes type estimates, and M-estimates which solve equation
(3.4). Numbers after HH, TM, and M indicate the particular algorithm used. Ties are indicated by round brackets.
Heteroscedasticity is M (mild) and H (heavy) for a = (1.0, 0.2, 0.5) and a = (1.0, 3.0, 3.0) " respectively. N is
the normal distribution and CN is the contaminatea normal distribution. -
Sampling Situation
Sample size
Heteroscedasticity
1
2
3
4
25
M
25
H
75
M
75
H
Estimator MSE
-
of
1\
0.
1
of
1\
0.
2
1\
0.
3
Table 3.
Estimator MSE
-
7
2.45
2.50
2.82
2.98
3.01
3.02
3.09
HH5
TM5
HH4
HH6
M5
TM4
M6
0.128
0.129
0.169
0.171
0.176
0.182
0.182
M2
M1
M3
M6
M4
M5
TM4
1.09
1.13
1.13
1. 27
1.31
1.64
1. 70
1
2
HH4
HH5
0.183
0.183
0.192
0.199
0.200
0.213
0.214
M5
M4
M6
004
005
HH6
M2
4.53
4.70
4.70
4.72
4.72
4.82
5.06
HH4
005
006
TM5
TM4
TM6
M4,M5
0.049
0.049
0.050
0.053
0.054
0.055
0.066
M5
M2
M6
M1
M4
M3
TM5
1.93
1.96
1.96
1.97
1.99
2.00
2.30
0.289
0.289
0.311
0.336
0.343
0.360
0.362
HH4
HH5
006
TM1
M5
M4
M6
6.59
6.59
7.03
8.20
8.90
9.07
9.22
HH4
HH5
HH6
M5
M4
M6
TM5
0.093
0.093
0.093
0.096
0.097
0.097
0.098
M5
M6
M4
Ml
M2
M3
TM5
2.15
2.19
2.25
2.26
2.28
2.37
2.77
3
4
5
3
4
5
6
1
2
of
MSE
-
TM3
TM1
M2
TM4
M3
TM6
M5
7
Rank for MSE
~stimator
0.288
0.339
0.393
0.397
0.399
0.406
0.510
6
Rank for MSE
-
HH5
TM5
HH1
HH3
TM1
TM3
M5
1
2
Rank for MSE
Estimator MSE
3
4
~f5
M4
H6
M2
Ml
HH4
HH5
M6
M5
5
~14
6
7
H2
HI
MSE of estimators of a at the norm of model.
For notation see Table 2.
Sampling Situation
Sample size
Heteroscedasticity
1
2
3
4
25
M
25
M
75
M
75
M
Estimator MAE
Estimator MAE
-
Rank for MAE
of 10g(o. /Ci )
1 2
Rank for MAE
of log (Ci/ Ci 3 )
.
"
Rank for MAE
of 10g(o./o.3)
Table 4.
-
Estimator MAE
-
Estimator MAE
-
TM5
TM4
TM3
TM1
M1
M2
M4
0.634
0.713
0.835
0.827
0.881
0.881
0.898
M4
M5
M2
Ml
M3
HH5
HH4
0.451
0.582
0.906
0.907
0.935
1.154
1.399
TM5
TMl
TM3
TM4
M4
Ml
M3
0.383
0.590
0.605
0.735
0.861
0.964
0.965
M4
M1
M2
M3
M5
HH5
HH4
0.305
0.690
0.697
0.742
0.801
1.116
1.196
TM4
M4
M2
Ml
M3
M5
HH5
1.565
1.680
1. 713
1.727
1.754
1.877
2.057
M4
M5
Ml
M2
M3
HH4
HH5
0.263
0.313
0.366
0.367
0.378
1.059
1.071
M5
TM4
M4
TMI
TM5
TM3
Ml
1.004
1.036
1.066
1.078
1.150
1.153
1.583
M4
M5
M2
Ml
M3
TM5
TM4
0.206
0.269
0.295
0.301
0.343
0.844
0.933
TM4
TM5
TMI
TM3
M2
Ml
M4
1.031
1.058
1.108
1.108
1.174
1.200
1.202
M4
M5
M3
Ml
M2
HH4
HH5
0.367
0.392
0.397
0.409
0.411
1.006
1.028
TM3
TMI
M5
M4
TM4
TM5
M1
0.693
0.706
0.718
0.783
0.990
1.015
1.060
M4
M5
M2
M3
Ml
TMS
TM4
0.275
0.310
0.313
0.313
0.314
0.769
0.814
MAE for estimators of 10g(o../o..).
1
J
For notation see Table 2.
absolute error, which is defined in the text.
MAE is the median
Coverage Probabilities
13 1
..
Estimator
1
LS
PT
ffiI4
ffiI5
TM4
TM5
M4
M5
M6
0.977
0.973
0.943
0.953
0.920
0.940
0.930
0.940
0.937
0.967
0.967
0.950
0.957
0.953
0.953
0.947
0.947
0.947
0.930
0.937
0.953
0.953
0.953
0.960
0.950
0.953
0.950
LS
PT
ffiI4
HH5
TM4
TM5
M4
M5
M6
0.983
0.983
0.913
0.937
0.903
0.923
0.920
0.920
0.920
0.943
0.947
0.933
0.923
0.913
0.917
0.917
0.913
0.913
0.923
0.927
0.960
0.963
0.957
0.953
0.953
0.950
0.953
LS
ffiI4
HH5
TM4
TM5
M4
M5
M6
0.977
0.980
0.950
0.967
0.943
0.950
0.940
0.953
0.940
0.973
0.983
0.973
0.977
0.973
0.970
0.977
0.973
0.977
0.963
0.963
0.970
0.970
0.973
0.967
0.973
0.970
0.973
LS
PT
ffiI4
HH5
TM4
TM5
M4
M5
M6
0.980
0.980
0.933
0.947
0.910
0.947
0.930
0.950
0.927
0.950
0.960
0.933
0.943
0.933
0.940
0.923
0.927
0.923
0.943
0.947
0.953
0.960
0.947
0.957
0.957
0.960
0.957
2
(normal distribution,
heavy heteroscedasticity)
•
e
13 3
Sampling situation
(normal distribution,
mild heteroscedasticity)
po
13 2
5
(contaminated normal,
mild heteroscedasticity)
6
(contaminated normal,
heavy heteroscedasticity)
PT
Table 5. Coverage probabilities for n = 25 and selected estimators of S.
For notation see Table 2. Nominal coverage probabilities, based upon large sample approximations, are 0.95.