Biometrika (1988), 75, 1, pp. 35-43
Printed in Great Britain
An asymptotic theory for weighted least-squares with weights
estimated by replication
BY RAYMOND J. CARROLL AND DAREN B. H. CLINE
Department of Statistics, Texas A&M University, College Station, Texas 77843, U.S.A.
SUMMARY
Some key words: Generalized least-squares; Heteroscedasticity; Regression; Replication.
1.
INTRODUCTION
Consider a heteroscedastic linear regression model with replication:
y ^ x j p + o-^j
(i = l,...,N;j
= l,...,m).
(1-1)
Here /3 is a vector with p- components, and the ev are independent and identically
distributed random variables with mean zero and variance one. The heteroscedasticity
in the model is governed by the unknown at. We have taken the number of replicates at.
each x, to be the constant m primarily as a matter of convenience. In practice, it is fairly
common that the number of design vectors N is large while the number of replicates m
is small. Our intention is to construct an asymptotic theory in this situation for weighted
least-squares estimates with estimated weights.
As a benchmark, let /3 WLS be the weighted least-squares estimate with weights 1/cr].
Of course, since the o-( are unknown this estimate cannot be calculated from data. If m
isfixedand
N
SWLs = plim AT1 £
7 ?
then
(Nm)i($wvs-p)^N(0,S^ls).
(1-2)
One common method for estimating weights uses the inverses of the sample variances,
tfl = s* = (m-l)-l£(ylJ-yl)2.
The resulting weighted least-squares estimator will be denoted by /3Sv-
(1-3)
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
We consider a heteroscedastic linear regression model with replication. To estimate
the variances, one can use the sample variances or the sample average squared errors
from a regressionfit.We study the large-sample properties of these weighted least-squares
estimates with estimated weights when the number of replicates is small. The estimates
are generally inconsistent for asymmetrically distributed data. If sample variances are
used based on m replicates, the weighted least-squares estimates are inconsistent for
m = 2 replicates even when the data are normally distributed. With between 3 and 5
replicates, the rates of convergence are slower than the usual square root of N. With
mss6 replicates, the effect of estimating the weights is to increase variances by
( m - 5 ) / ( m - 3 ) , relative to weighted least-squares estimates with known weights.
36
RAYMOND J. CARROLL AND DAREN B. H.
CLINE
Table 1. Summary of results when weights are inverses of sample variances based on m
replicates
m
Consistent?
2
3
4
5
6
7
8
9
10
No
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Yes
Asymptotically
normal?
No
No
No
Yes
Yes
Yes
Yes
Yes
Yes
Rate of
convergence
Relative
efficiency
logN
N^/log N
N'
N>
0
0
0
0
i
1
$
Standard error
factor
—
1-73
1-41
1-26
1-22
118
Relative efficiency calculated with respect to weighted least-squares with known weights. Column standard
error factor is number by which to multiply standard errors from a weighted least-squares package to obtain
asymptotically correct standard errors.
A second method for estimating weights is to use the linear structure of the means.
Write /3 L for the unweighted least-squares estimate and define the average squared error
estimate by
tfi = tnik) = m'1 I (y0 -xJfiL)2.
(1-4)
The resulting weighted least-squares estimate will be denoted by 0EL.
A third method is the normal theory maximum likelihood estimate pML, which is a
weighted least-squares estimate with weights the inverse of
£?3 = <??2(0ML).
This can be thought of as an iterated version of /3 E L .
(1-5)
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
This method is particularly convenient because it involves sending only the estimated
weights to a computer program with a weighting option. The obvious question is whether
/3 SV is any good, and whether the inferences made by the computer program have any
reliability. In §§ 3 and 4, we answer both questions in the negative, at least for normally
distributed data with less than 10 replicates at each x In many applied fields this is
already folklore (Garden, Mitchell & Mills, 1980). Yates & Cochran (1938) have a nice
discussion of the problems with using the sample variances to estimate the weights.
More precisely, for normally distributed data we are able to describe the asymptotic
distribution of j 3 s v for every m. For m 5s 6, this is an easy moment calculation and
we show that 0 S v is more variable than /3 WLS by a factor ( m - 3 ) / ( m - 5 ) . The same
result was obtained by Cochran (1937) for the weighted mean. Not only is /3 SV inefficient,
but if one uses an ordinary weighted regression package to compute /3 SV , the standard
errors from the package will be too small by a factor exceeding 20% unless m s= 10. For
example, if one uses m = 6 replicates, the efficiency with respect to weighted least-squares
with known weights is only 5, and all estimated standard errors should be multiplied by
>/3 = 1-732. For m =£ 5, we use the theory of stable laws and Cline (1988) to describe the
asymptotic distributions. Perhaps the most interesting result here is that, if only duplicates
(m = 2) are used, weighted least-squares with estimated weights is not even consistent.
The results are outlined in Table 1.
Weighted least-squares with weights estimated by replication
37
Remark 1. If ordinary least-squares is less than 3 times more variable than weighted
least-squares with known weights, then )3EL is more efficient than maximum likelihood.
Remark 2. If ordinary least-squares is more than 5 times more variable than weighted
least-squares with known weights, then maximum likelihood is more efficient.
Further, for normally distributed data, maximum likelihood is more variable than
weighted least-squares with known weights by a factor m/(m-2). This means a tripling
of variance for m = 3 even when using maximum likelihood.
2. ASSUMPTIONS AND CANONICAL DECOMPOSITION
We will assume throughout that (x,, cr,) are independent and identically distributed
bounded random vectors, distributed independently of the {ei;}> We define z, = x,/cr( and
dj = aja,. For any weighted least-squares estimator with estimated weights iv, = l/<x2,
I zfijft.
(2-1)
Assuming they exist, we note that the asymptotic covariance of the weighted and
unweighted least-squares estimators are, respectively,
x T )}- 1 .
(2-2)
3. WEIGHTING WITH SAMPLE VARIANCES
In this section, we describe consistency and asymptotic normality for weighted leastsquares estimates /? sv with the weights being the inverse of sample variances. We first
describe the general case assuming that sufficient moments exist. We then look more
closely at the case of normally distributed observations. In this set-up,
Define -qJk = E{ij,/^k) and vjk = £(|e,|7<?2*).
The first result indicates that we obtain consistency only when
Vu
= E(e,/ <??) = 0.
This is true for symmetrically distributed data, but generally not otherwise.
(3-1)
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
These methods have been discussed in the literature for normally distributed errors.
Bement & Williams (1969) use (1-3), and construct approximations, asm^oo, for the
exact covariance matrix of the resulting weighted least-squares estimate. They do not
discuss asymptotic distributions as N-*<x> with m fixed. Fuller & Rao (1978) use (1-4)
while Cochran (1937) and Neyman & Scott (1948) use (1-5). Both find limiting distributions as N -» oo for fixed m 5* 3, although the latter two papers consider only the case
that xT/3 = fj..
One striking result concerns consistency. The estimates /3SV, PEL and /3ML a r e always
consistent for symmetrically distributed errors but generally not otherwise: see Theorems
1 and 3. In § 5, we compute the limit distributions of/3EL and pML. The relative efficiency
of the two is contrasted in the normal case for m s* 3, as follows.
38
RAYMOND J. CARROLL AND DAREN B. H. CLINE
THEOREM
1. (a) Ifvu<oo
and Poi*^00, then
plim 0SV = P + (T7OI
so that consistency holds only if £(z) = 0 or (3-1) holds.
(b) / / vJk«x> for j=s2, &=£2 and (31) /iotas, then (Nm)\psw-P)
is asymptotically
Proof. This follows from the weak law of large numbers and the central limit theorem.
•
COROLLARY 1. Assume that the errors e<, are normally distributed For m3=6,
(Nm) 4 (0 s v -/3) is asymptotically N{0, (m -3)5^Ls/(m -5)}.
Comparing with (1-2), we see that the effect of using mss6 replicates to estimate
sample variances causes an inflation of variance by the factor (m—3)/(m —5) over
weighted least-squares with known weights. Even with m = 10, this results in a 40%
increase in variance.
If one uses a standard statistical package with weights l/s2,, then the resulting standard
errors will also be asymptotically incorrect^ Such packages estimate the asymptotic
covariance matrix of (Nm)*(psv-p) byCTIV^WLS,where
-py1 £ £
If m s= 6 and if the data are normally distributed, then <x|v converges in probability to E(l/<??) = ( m - l ) / ( m - 3 ) , while S W L S - ( ' n - l ) ( m - 3 ) " I 5 W L S ^ 0 . Thus,
((T|V5WLS-5WLS)-*0. Asymptotically, therefore, standard errors should be multiplied by
{(m-3)/(m-5)} J : see Table 1.
4. SAMPLE VARIANCES WITH m < 5 REPLICATES IN THE NORMAL CASE
In this section, we consider normally distributed data with m «£ 5 replicates and the
weights being the inverses of the sample variances. Here Theorem 1 does not apply since
Eil A] does not have finite variance. The results here are based on Cline (1988). We first
state a general result which may be of independent interest. The results for weighted
least-squares, assuming normal errors, are then derived as a corollary.
First, a few definitions are required. A positive function p is regularly varying with
exponent p, denoted by ^i€Rv(p), if fi(yt)/fi(t)^yp
as f-*oo for all y>0.
Let (z,, u,, Wi) be independent and identically distributed random variables with z,eRp
independent of (u,, w,), u, with a symmetric distribution and wt >0. Define /i,(f) =
E{wl(w^t)} and fi2{t) = E{(uw)2I(uw^t)}. Let (c 1N , c2N) be constants satisfying, as
N-*oo, NfjLt(clN)/c1N^\
and Nfi2(c2N)/c\N^ 1.
If a^ < 1, then Si = Si(er,) will denote a positive stable random variable with Laplace
transform £{exp (-r5,)} = exp {-r(2 - a,)f a '/ari}. If a, = 1 then 5, = 1 almost surely. We
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
For normally distributed observations, the assumption that vJk<oo for j ,fc*s 2 holds
only if there are at least 6 replicates. In this case, we have the following corollary.
Weighted least-squares with weights estimated by replication
39
denote by S2 = S2(a2) a symmetric stable random variable with characteristic function
£{exp (itS2)} = exp [-r(3 - a2) cos (^a2)\t\y
{a2(\ - o2)}].
Of course, if a2 = 2 then 5 2 is standard normal.
THEOREM
2. Assume that fite Rv(l-a,), jt 2 e Rv(2-a 2 ) and that
N
I
I C7N Z
N
*> C2N Z "fH-( I
1-1
\
\
W
i-l
(4-1)
/
bN = (clN/c2N)[
Z ZizJw, I
\f-i
/
Z Wi^i
i-i
is asymptotically distributed as Y\* Y2. Further, for any be Rp, bTYtb and bTY2 have the
same distributions, respectively, as
Proof. Consider first the case a, < 1. From Theorem 1 of Cline (1988) we get
(
N
N
\
CTN Z Z,z7>Vi, C2lN Z ^iUiWi I
1-1
is asymptotically distributed as (Yxi,- lY2). In the case
that a/t = l, then S, = 1 almost surely
by Feller (1971, p. 236). From unpublished work of D. B. H. Cline and from Gnedenko
& Kolmogorov (1954, p. 134), for each (j, k),
N
i-l
The convergence of the remaining terms, cj^ 1 z,UjW,, again follows from Theorem 1 of
Cline (1988).
In either case, convergence of the ratio bN follows. The limiting joint distribution is
difficult to describe, but the stated marginal distributions of bTYtb and bTY2 can be
inferred from Proposition 3 of Breiman (1965) and Theorem 3 of Mailer (1981). One
may also conclude that Yx and Y2 are independent if a, = 1, since then Y", is degenerate.
Also, Yt and Y2 are independent if a2 = 2, since then Y2 is Gaussian, and for such limits
the non-Gaussian stable component is always independent of the Gaussian component
(Sharpe, 1969).
•
Note that, in Theorem 2, V, and Y2 are not necessarily independent unless a, = 1 or
a 2 = 2. In the former case, Y, = £(zz T ) almost surely, while in the latter case Y2 is
normally distributed with mean zero and covariance E(zzT).
These are the following special cases.
If (//a,) r 'pr(H'> <)-* 1 for yi>0, then we have the following:
(i) if y , < l , then ai = yl,clN = al { N / ( l - a , ) } 1 / a ' , and 5, is positive stable;
(ii) if -y, = 1, then a, = 1, c1N = atN log TV, and S, ™ 1;
(iii) if y, > 1, then a, = 1, ctN = NE(w), and 5, a l.
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
is asymptotically distributed as (5,(a,), S2(a2)). Suppose that, for some 5 > 0 and all i, j ,
E(\z,j\r) <oo for y = min (2, max (2a,, a 2 ) + 5). 77ien f/iere exists Y2e Rp, Yxp xppositivedefinite, such that
40
RAYMOND J. CARROLL AND DAREN B. H. CLINE
If (f/a 2 ) Tl pr(|uH'|>
(i) if y 2 <2, then
(ii) if y2 = 2, then
(iii) if y2> 2, then
/)->1 for -y2>0, then we have the following:
a2 = y2, c2N = a2{N/(2-a2)}x/a\
and 52 is symmetric stable;
1
i
a2 = 2, c2N = 2 a 2 N log TV, and S2 is normal;
a2 = 2, c2N = 7Vi(£|«H'|2)1, and 5 2 is normal.
Consider the case of normally distributed errors in model (1-1), where we make the
identifications z, = xi/o-,, with E(zzT) = 5 WLS . Further, write
m
U, = £ , = m~' X Bu, Wt"
J-l
22\
Then (f/a) a pr(w>/)-»l, {'/(a*>)}a pr (|«H> ')-» 1 and, if a > l , £(w) = o / ( a - l ) .
With the indicated choices of c1N and c2N, Theorem 1 of Cline (1988) shows that (41)
holds. Thus the conditions of Theorem 2 are met.
COROLLARY 2. In the normally distributed case, with Slt S2, V,, Y2 as defined in Theorem
2, we have the following cases.
Case I (m = 2): a1 = a2 = ^, and (/3Sv~/3) is asymptotically distributed as
Case 2 (m = 3): a, = a 2 = l , and log N(f3sv-fi)
is asymptotically distributed as
{2/OTT)}^Y^Y2.
Case?) (m=4): a, = l, a2 = \, and N 1 / 3 (/3 s v -p) is asymptotically distributed as
2- i {r 2 G)/(i877)} I/3 y7'>V
Case 4 (m = 5): a, = 1, a2 = 2 and TVVlog 7V(/?sv-/3) is asymptotically distributed as
Case 5 (m > 6) is covered by Corollary 1.
Proof. In the notation of Theorem 2, bN = (cls/c2N)(fjSvfl) is asymptotically distributed as YV Y2. Thus, in each case it suffices to construct the constants (c]N, c2N).
Case 1 (m = 2). Here a=\, clN = (S/TT)N2 and c2N = {ST2(1)/(9TT2)}N2.
Case 2 (m = 3). Here a = 1, clN = NlogN and c2N ={2/(37r)}i/V.
3 (m = 4). Here a = § , c, N =3/V and c2N =2-4{3r2(|)/(2Tr)}l/3N2/3.
4 (m = 5). Here a =2, c 1N =27Vand c2N =2(5)-*7V4 log N.
D
5. ESTIMATING VARIANCES BY SAMPLE AVERAGE SQUARED ERRORS
One might reasonably conjecture that making use of the known linear structure for
the means results in improvements over using only sample variances. We show that this
is the case, at least for normally distributed data. Let /30 be any estimate of /3, and define
*?(/5o) = m"1 I (y,j-xjfio)2,
dM
= *,(/8o)/cr,.
We denote by /3G the weighted estimate with the estimated weights l/&2(f}0). As defined
in the introduction, /3 EL uses /3O = 0 L , the ordinary unweighted least-squares estimate,
and 0 ML used i30 = /3 ML . Our results here rely on the consistency of /3 0 , and two
other reasonable moment conditions for m large enough. Here are the assumptions. First
plim/30 = /3.
(5-1)
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
Of course, u, and w, are independent and E(u )<oo. Set
/a
, b = {2/(irm)}*.
Weighted least-squares with weights estimated by replication
41
For each c, > 0, there exists c2 > 0 such that
\
sup |d72(0*)-rfr2(/3)|Uc,
(5-2)
and such that
(5-3)
sup
In addition, we assume the finite existence of
Vjk = E{i{/d?(p)},
^ = £{|ei|V^2k(/3)}<oo
(j,k^2).
(5-4)
THEOREM
3. Assume (5-l)-(5-4). Then
plim pc = p + T/ U (7 ? O,5WLS)"'^(Z).
(5-5)
Thus pc is consistent only if r)u = 0 or E(z) = 0. Further, as N ->oo,
JVi(/8G-j8) = A^1{6N + C N N*(/3 0 -/3)},
(5-6)
w/iere
plim A N = TJOISWLS,
bjv = N~* X z,e,/a 2 (/3),
(-1
plim C N = 2TJ 22 SWLS-
Proof. Since the {z,} are bounded, the assumptions make possible the usual Taylor's
series argument leading to (5-6) and (5-5) is an immediate consequence of (5-6).
•
Assuming consistency of the maximum likelihood estimator, we can compute the limit
distributions of )3EL and @ML.
THEOREM 4. Make the assumptions of Theorem 3, with the {ey} being symmetrically
distributed.
(a) Let Vt = E(xxT) and V2 = E(a2xxT) be finite and positive-definite. Let S^1 =
1
Vj" V2VJ"1- Then with /30 chosen as the unweighted least-squares estimate,
(Nm)\$EL-/3)
is asymptotically N(0, 5EL), where
(b) With po = /3 M L, the maximum likelihood estimate, then (Nm) 4 (/3 ML -/3) is asymptotically N(0, 5MV),
)
SWLS.
WLS.
/ Parts (a) and (b) follow easily from Theorem 3, Slutsky's theorem and the fact
that (Nm) ! (j3 L -0) is asymptotically JV(O, 5 L ' ) 3. For normally distributed observations with m ^ 3 , (Nm) ! (/3 ML -/3) and
* - @ ) are asymptotically normally distributed with respective covariances
COROLLARY
{m/(m
7?22 =
-2)}5W'LS,
By direct calculation (Fuller & Rao, 1978), 7j0] = m/(m - 2 ) , 7j2, = l / m and
l/(m-2).
D
As noted by Fuller & Rao (1978), the asymptotic covariance of pEL consists of a
mixture of the weighted least-squares covariance SWLS a n ^ t n e unweighted least-squares
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
The first result describes the consistency of pc.
42
RAYMOND J. CARROLL AND DAREN B. H. CLINE
l
covariance SZ - Comparing /3EL with the maximum likelihood estimate /3 ML depends on
how much bigger 5[' is than SWLS- Detailed calculations verify Remarks 1 and 2 of § 1.
Thus, doing iterative weighted least-squares may actually hurt, unless the starting value
fiL is sufficiently bad.
6. DISCUSSION
(ii) Using sample variances as a basis for estimating weights is inefficient unless the
number of replicates m is fairly large, e.g. m 5* 10.
(iii) Using sample average squared errors from a preliminary fit to the regression
function as a basis for estimating weights is typically more efficient than using sample
variances. However, even here a fair number of replicates is helpful. For example, the
maximum likelihood estimate for normally distributed data based on 6 replicates still
has standard errors approximately 20% larger than ordinary weighted least-squares theory
would suggest.
There are at least two other methods for estimating the weights. The first is to model
the variances parametrically, for example a, = a(xjp)e (Carroll & Ruppert, 1987;
Davidian & Carroll, 1987). The second is to perform a nonparametric regression of (1-3)
and (1-4) against the predictors and use this regression to estimate the weights (Carroll,
1982).
ACKNOWLEDGEMENT
The work of R. J. Carroll was supported by the Air Force Office of Scientific Research.
REFERENCES
BEMENT, T. R. & WILLIAMS, J. S. (1969). Variance of weighted regression estimators when sampling errors
are independent and heteroscedastic. J. Am. Statist. Assoc 64, 1369-82.
BREIMAN, L. (1965). On some limit theorems similar to the arc-sine law. Theory Prob. Applic 10, 323-31.
CARROLL, R. J. (1982). Adapting for heteroscedasticity in linear models. Ann. Statist. 10, 1224-33.
CARROLL, R. J. & RUPPERT, D. (1987). Transformations and Weighting in Regression. London: Chapman
& Hall.
C L I N E , D. B. H. (1988). Joint stable attraction of two sums of products. /. Mult. AnaL To appear.
C O C H R A N , W. G. (1937). Problems arising in the analysis of a series of similar experiments. / R. Statist.
Soc, SuppL 4, 102-18.
D A V I D I A N , M. & CARROLL, R. J. (1987). Variance function estimation. /. Am. Statist. Assoc. 82. To appear.
FELLER, W. (1971). An Introduction to Probability Theory and its Applications, 2. New York: Wiley.
FULLER, W. A. & RAO, J. N. K. (1978). Estimation for a linear regression model with unknown diagonal
covariance matrix. Ann. Statist 6, 1149-58
G A R D E N , J. S., MITCHELL, D. G. & MILLS, W. N. (1980). Nonconstant variance regression techniques for
calibration curve based analysis. Analytical Chemistry 52, 2310-5.
G N E D E N K O , B. V. & K.OLMOGOROV, A. N. (1954). Limit Distributions for Sums of Independent Random
Variables. Cambridge, Mass: Addison-Wesley.
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
Our results can be summarized as follows.
(i) If nothing is known about the structure of the sample variances, then none of the
common weighted estimates can be assumed to be consistent for data from an asymmetric
distribution.
Weighted least-squares with weights estimated by replication
43
R. A. (1981). A theorem on products of random variables, with applications to regression. AusL
J. Statist 23, 177-85.
NEYMAN, J. & SCOTT, E. L. (1948). Consistent estimators based on partially consistent observations.
MALLER,
Econometrica 16, 1-32.
M. (1969). Operator stable distributions on vector spaces. Trans. Am. Math. Soc. 136, 51-65.
YATES, F. & COCHRAN, W. G. (1938). The analysis of groups of experiments. J. Agric Sci 28, 556-80.
Reprinted (1970) in Experimental Design: Selected Papers of Frank Yates, pp. 121-45. London: Griffin.
SHARPE,
[Received May 1987. Revised September 1987]
Downloaded from http://biomet.oxfordjournals.org/ at Penn State University (Paterno Lib) on September 17, 2016
© Copyright 2025 Paperzz