A COMPARISON BETWEEN MAXIMUM LIKELIHOOD
AND GENERALIZED LEAST SQUARES IN A HETEROSCEDASTIC LINEAR M:>DEL
by
R.J. Carroll* and David Ruppert**
Abstract
We consider a linear model with normally distributed but
heteroscedastic errors.
When the error variances are functionally
related to the regression parameter, one can use either maximum
"-
likelihood or generalized least squares to estimate the regression
parameter.
We show that maximum likelihood is much more sensitive to
small misspecifications in the functional relationship between the error
variances and the regression parameter.
Key Words:
Linear Models, Heteroscedasticity, Contiguity, Robustness,
Weighted Least Squares, Maxirntun Likelihood.
Subject Class:
Primary 62J05; Secondary 62G3S.
* University of North Carolina and National Heart, Lung and Blood
Institute. Work supported by the U.S. Air Force Office of Scientific
Research Contract AFOSR-80-0080. Part of this work completed at the
Universitat Heidelberg, with support from the Deutsche
Forschungsgemeinschaft.
** University of North Carolina.
Work supported by NSF Grant MCS78-01240.
2
1.
Introduction
There has been considerable recent interest in the heteroscedastic
linear model
Y.1 = x.1 8 +
(1.1)
E.
1
-~
[f(x.,8,8)]
1
where l3(p x l) is the regression coefficient, {xi (1 xp)} are the design
vectors,
{E.}
1
are independent and identically distributed with
distribution function F, and the function f (xi' 8,8) expresses the
possible heteroscedasticity.
Bickel (1978) considers various tests of
the hypothesis of homoscedasticity, i.e. tests of
0_
Ho : f(x.,8,8)
1
(1. 2)
= constant.
His work has been extended by Carroll and Ruppert (198la), and the tests
have been shown to be locally most powerful by Hammerstrom (1981).
Other
recent papers are Jobson and Fuller (1980), Carroll and Ruppert (198lb),
Box and Hill (1974), and Fuller and Rao (1978).
Box and Hill (1974), Ruppert and Carroll (1979), and Jobson and
Fuller (1980) suggest various forms of generalized weighted least squares
estimates (GLSE) of 8.
/\
Basically, the suggestion is to obtain preliminary
/\
/\
/\
-1
estimates (8p
,8) of (8,8), compute
estimated p
weights [f(x., 8 ,8)] ,
I
and then perfonn ordinary weighted least squares. Carroll and Ruppert
(198lb) emphasize robustness and develop methods which are robust against
outliers and non-normal distributions F; they prove that generalized
M-estimates of 13, which include GLSE estimates as special cases, are just
as good asymptotically as if the weights were really known.
The same
phenomenon has been found in other models of heteroscedasticity; see
Williams (1975) for a review.
3
Jobson and Fuller (1980) suggest using the infonnation about S in
the function f to improve upon the GLSE.
They state that their method is
asymptotically equivalent to the MLEfor S obtained by setting up the
normal likelihood based on (1.1) and maximizing it; this likelihood is
~
(1.3)
N
L
'1
1=
log(f(x.,S,8)) 1
~
N
L
'1
1=
(Y.-x.S)
11
2
f(x.,S,8) .
1
They have a very interesting result suggesting that as long as (1.1) is
correct and F is nonnal, then the MLE will be preferred to the GLSE.
In this heteroscedasticity problem, we have an additional robustness
consideration.
0_
Besides the usual goal (Huber (1977)) of protecting
ourselves against outliers and non-nonnal error distributions, we also
must protect ourselves against slight misspecifications in the functional
relationship between Var(Y.)
and (x.1 ,S,8).
1
Since this ftmctional
relationship expressed in (1.1) through f is typically at best an
approximation, and since our primary interest is estimating S, we would
prefer not to estimate 6 by a statistic which is adversely affected by
slight missspecifications of f.
In this note, we assume that the error distribution F is actually
nonnal.
We study the robustness of GLSE and MLE to small specification
errors in f by means of simple contiguity techniques.
We show that small
mistakes in specifying f can easily make GLSE preferable to the MLE.
2. A Contiguous Mldel
We consider small deviations from (1.1) in the form of
(2.1)
Y.1
= x.1 6
+
[gN(x.,
6,8)] -~ e:.,
1
1
4
where
gN(x.,S,8)
= f(x.,S,8){1 + 2BN -~ h(x.,S,8)}
I I I
(2.2)
-1 N 2
N
L h (x.,S,8)
. 1
1
1=
+ ~
(0 <
~
< (0)
{e: .} are i. i. d. standard normal.
1
One should note that the model (2.1) is very close to the assumed
model (1.1).
Thus the model (2.1) fits our needs because the variance
misspecification error is very small and decreases for larger sample
sizes.
An
estimate of S which is robust against specification errors
should have the same asymptotic properties under both models (1.1) and
"-
(2.1).
Thus the question at hand is to study the sensitivity of the MLE
and GLSE when (1.1) is assumed but (2.1) is true.
If R,l denotes the
log-likelihood for (1.1), and R,2 is the log-likelihood for (2.1), it is
quite simple to show that, when (1.1) is true, to order
.
. -B
2
N
'\
2
0
p
(1),
1
-~
~ + . L (e: i-I) Bh (~ i' ~ , ~ N
1=1
so that
(2.4)
L(R,*)
L
+
N(-B
2
lJ,
2B
2
lJ)
when model (1.1) holds,
where N(a,b) is the normal distribution with mean a and variance b.
From Corollary 1. 2 of Haj ek and
v
Sid~
(1967, p. 204), this means that
model (2.1) is contiguous to model (1.1).
5
3.
Limit Distributions for GLSE
Suppose that for some positive definite matrix S,
l
N-
(3.1)
N
L\'
i=l
' x. f (x
X
0
1
1
0
,
6,8)
-+
S •
1
Then, assuming normal errors and smoothness conditions on f, Carroll and
Ruppert (198lb) (as well as Jobson and Fuller (1980)) show that when model
A
(1.1) is true, the GLSE 6G satisfies
(3.Z)
0_
lA
1
~(6G -6) - N-"2
NIl
L S- x~ P(xo, 6,8)£.
i=l
1
1
£
0
1
(3.3)
A formal proof is possible as long as f is smooth, {f(x.,6,8)} is
1
A
A
bounded away from 00, and (6 ,8) satisfy
p
!,; A
N2 (6p - 6) = Opel)
(3.4)
1
A
N"2(8 -8) = Opel) .
Carroll and Ruppert (198lb) and Jobson and Fuller (1980) verify (3.4) in
the normal case under certain technical conditions.
Now, since
{Eo}
1
are symmetric random variables, one uses (Z.3) and
(3.Z) to show that £* = £Z - £1 and ~(SG-6) are independent, so that
v
by LeCam's third lemma (Hajek and Sidak (1967, p. Z08)),
(3.5)
and this under either modeZ (1.1) or (2.1).
This means that GLSE is
6
robust against small specification errors of the variance function f.
This encouraging result suggests that one will not go too wrong with GLSE
as long as model (1.1) is reasonable.
4.
Limi t Distribution for the MLE
While GLSE is robust against minor errors in specifying the function
f in model (1.1), the same cannot be said for the MLE.
by
A
~.
matrix
Denote this MLE
Jobson and Fuller (1980) show that for a particular covariance
r,
I
A
~(~-8)
(4.1)
L
+
N(O,r) .
The result of particular interest is that r is no larger than S (see
(3.1) and (3.3)) in the sense that S - r is positive semi-definite under
the model (1.1).
In addition to (4.1), from (2.3) and the proof of
1
A
Theorem 2 in Jobson and Fuller (1980), ~(I\i - 8) and £* are jointly
2
2
asymptotically normal with mean (0, _B ll), marginal variances (r, 2B ll) ,
and covariances Bq computed below, i. e.
;]) .
(4.2)
2B 11
We now indicate why it is true that the only cases in which the MLE
can be expected to be robust against variance specification errors is
when S
= r and the MLE is asymptotically equivalent to GLSE. To see
this, first consider model (1.1) to hold.
that
SG
Jobson and Fuller (1980) show
is essentially a linear function of h:.} and {£~ - l}, Le. for
1
scalars {v.} and {w.},
1
1
1
7
N
2
{v.£. + w. (£. -I)} + 0 (1) .
I
(4.3)
1 1
i=l
1
P
1
= 1, ... , N), then from (3. 2), (4 . 3), and
I f we have wi:: 0 (i
Gauss-Markov, we have that
and the estimates have the same limit distribution.
A
~
Thus the only way
A
to improve on B under (1.1) is for the {wi} to be non-zero.
G
In this case, however, we can perform contiguity calculations based on
for
(2.3) and (4.3), thus showing that under model (2.1),
"-
(4.4 )
_x
L
A
N·(~-B) -+
q
N(-2Bq, 1:) ,
-1
= lim N
N
I
·1
!I.~
I"'""'"
1=
w.h(x.,B,e) .
1
1
Of course, q will be non - zero in general if the {wi} are.
These results have important consequences for efficiency.
we wish to estimate the linear combination a.' B.
Then, under model
(1.1) ,
(4.5)
N MSE (a.'
A
~) -+
a.' Ea..
However, under the model (2.1),
(4.6)
A
N MSE(o.' B )
G
-+
a.' S
-1
a.
Suppose
(no change)
8
and of course a'
A
l\1
will be a rather poor estimate if a is not orthogonal
to q and B is large.
5.
Monte-Carlo
We performed a small Monte-Carlo study to illustrate the results
given above.
The model is that of Jobson and Fuller (1980), with a
sample size of N = 40:
The presumed model for variances is
(5.1)
We choose the design {x il ,x i2 } as in Table 1 of Jobson and Fuller (1980),
with their choices (8 0 ,81 ,8 2) = (10,-4,2) and (al ,a 2) = (300,.2). All
experiments were replicated 200 times.
The first departure from model (4.1) was quite moderate:
(5.2)
The second departure was quite substantial and reflects severe
heteroscedasticity:
(5.3)
Besides the estimator JLS defined by Jobson and Fuller (1980), three
others were considered.
The first (LSE) is ordinary least squares.
The
second (ROBUST) is simply a Huber Proposal 2 regression estimate which
9
for the model EY.
1
= Bx.1
simultaneously solves
EIjJ ( (Y. - Bx.) /0) x./0
111
=0
(5.4)
_1_ E [1jJ2 ((Y. - Bx. )/0)
N-3
1
1
where ljJ(x) = max(-2, min(x,2)).
dv ,
Note that the least squares estimate of
= x.
B can be generated from (5.4) by choosing IjJ (x)
further details about robustness.
See Huber (1977) for
The third estimate (WEIGHT) is a
weighted robust estimate (Carroll and Ruppert (198lb)) defined as follows.
(i)
-_
(ii)
(iii)
(iv)
(v)
A
Let B = Huber's Proposal 2 estimate.
•
Let r
A
2
•
- x. B) , M as in Jobson and Fuller (1980).
PrO~S:l 2 for the model Er : : [:1].
Perform
.
A2
Dehne 0i
A
A2
2
2
Perform weighted Proposal 2 by solving (5.4) with 0 replaced
A
by 0..
1
(vi)
= (Y.
A
= al
+ a (Xi B) •
A
Call the est imate B.
Repeat (ii)-(v).
The standard normal distribution N(ll
error distributions used.
= 0,
0
= 1)
was one of the
The second was a contaminated normal.
lIDiform (0,1) random variable U and a N(O, 0
have the error as Z lIDless U
~
= 1)
A
Z were chosen.
We
.90, in which case the error is 3Z.
We
used the IMSL routines GGUBS and GGNPM.
The starting seed was 123457.
We start the process by generating 50 lIDiforms from GGUBS and then SO
normals from GGNPM; this we repeated 350 times.
We then repeated the
experiment 200 times, each time generating 50 lIDiforms and then SO
normals.
10
The outcome of our experiment is given in Table 1.
The entries are
the ratios of mean square errors of different estimates to the LSE.
It
is clear from the results that the scoring estimate JLS is very sensitive
to the assumption of normality and somewhat sensitive to specification
errors of the model (5.1).
11
TABLE 1
Ratio of MSE for different estimates to the MSE of the LSE.
Standard
Nonna1
Contaminated
Nonna1
Correct Model (4.1)
JLS
ROBUST
WEIGI-IT
So
f\
S2
So
S1
S2
.80
.96
.86
.94
.99
.94
1.04
1.00
.99
.94
.77
.69
1.13
.80
.73
1.14
.77
.71
Incorrect Model (4.2)
"_
JLS
ROBUST
WEIGHT
So
Sl
S2
So
Sl
S2
.80
.96
.86
.94
.99
.94
1.04
1.00
.98
.96
.77
.69
1.09
.80
.73
1.09
.77
.71
Sl
S2
9.33
.57
.40
5.41
.54
.43
Incorrect Model (4.3)
JLS
ROBUST
WEIGI-IT
So
Sl
S2
4.08
.56
.42
5.40
.69
.46
7.94
.68
.56
So
11. 21
.45
.39
12
References
Bickel, P.J. (1978). Using residuals robustly I: tests for
heteroscedasticity, nonlinearity. Ann. statist. 6, 266-291.
Box, G.E.P. and Hill, W.J. (1974). Correcting inhomogeneity of variances
with power transformation weighting. Technometrics 16, 385-389.
Carroll, R.J. and Ruppert, D. (198la). On robust tests for
heteroscedasticity. Ann. statist. 9, 205-209.
Carroll, R.J. and Ruppert, D. (198lb). Robust estimation in
heteroscedastic linear models. Manuscript.
Fuller, W.A. and Rao, J.N.K. (1978). Estimation for a linear regression
model with unknown diagonal covariance matrix. Ann. Statist. 6,
1149-1158.
v
Hajek, J. and Sidak, Z. (1967).
New York.
Theory of Rank Tests.
Academic Press,
Hammerstrom, T. (1981). Asymptotically optimal tests for
heteroscedasticity in the general linear model. To appear in
Ann. statist.
Huber, P.J. (1977).
Robust statistical
~ocedures.
SIAM, Philadelphia.
Jobson, J.D. and Fuller, W.A. (1980). Least squares estimation when the
covariance matrix and parameter vector are functionally related.
J. Amer. statist. Assoc. 15, 176-181.
Ruppert, D. and Carroll, R.J. (1979). M-estimates for the
heteroscedastic linear model. Institute of Statistics Mimeo Series
#1243, University of North Carolina.
Williams, J.S. (1975). Lower bounds on convergence rates of weighted
least squares to best linear unbiased estimators. In A SUrvey of
statistical Design and Linear Models. J.N. Srivastava, ed.
North-Holland, Amsterdam .
•
SEl;URITY CLASSIFICATION OF THIS PAGE (WItI'M Da/a EnrMed)
UNCLASSIFIED
r
READ INSTRUCTIONS
BEFORE COMPLETING FORM
REPORT DOCUMENTAllON PAGE
3.
RECIPIENT'S CATALOG NlJM8FR
5.
TYPE OF REPORT & PEnlOD COVFRED
A Comparison Between Maximum Likelihood and
Generalized Least Squares in a Heteroscedastic
Linear Model
6.
PERFORMING
AUTHO"(a)
B.
1. REPORT NUMBER
4.
7.
GOVT ACCESSION NO.
TITLE (and Subtltla)
•
11.
PERFORMING ORGANIZATION NAME AND AODRESS
CONTROLLING OFFICE NAME AND ADDRESS
Air Force Office of Scientific Research
Bolling Air Force Base
Washington, DC 20332
f4.
MONITORING AGENCY NAME
a
ADDRESS(1t dlll",.",/ lrom
O~G.
REPORT NUMBER
Mimeo Series No. 1334
CONTRACT OR GRANT NUMBER(')
AFOSR Contract AFOSR-80-0080
NSF Grant MCS78-0l240
R.J. Carroll and David Ruppert
9.
TErnNICAL
Co"""11/",,
10
PROGRAM ELEMENT. PROJECT, TASK
AREA a WORK UNIT NUMBERS
12.
REPORT DATE
I-
March 1981
13.
NUMBER OF PAGES
15.
5ECURITY CLASS. (0/
12
Olllr~)
tI". '''1''''')
---
UNCLASSIFIED
15;'- -iirr:t-,,<.51fICATION-(JOWNGRAOING-SCHEDULE
_ _ _ _ L....
16.
DISTRIBUTION STATEMENT (of thla Repo,t)
"-
Approved for Publi't Release - - Distribution Unlimited
17.
DISTRIBUTION STATEMENT (01 the abat,a.,t enle,eu In Rlock 20, If dlffe'MI f,om U"pot')
18.
SUPPLEMENTARY NOTES
19.
KEY WORDS (Conl/nue on ,eve,.e "Ide If n ...,e••a,y and Identify by block numbe,)
Linear Models, Heteroscedasticity, Contiguity, Robustness, Weighted Least
Squares, Maxirm.un Likelihood
20.
ABSTRACT (Continue on ,eve,.e aide If nece •• a,y and Idontlfy by blo.,k nllmbe,)
We consider a linear model with normally distributed but heteroscedastic
errors. When the error variances are functionally related to the regression
parameter, one can use either maximum likelihood or generalized least squares
to estimate the regression parameter. We show that maxirm.un likelihood is much
more sensitive to small misspecifications in the functional relationship
between the error variances and the regression parameter.
DO
FORM
1 JAN 73
1473
EDITION OF 1 NOV 65 IS OBSOLETE
© Copyright 2026 Paperzz