Gallo, Paul P. (1982). "Properties of Estimators in Errors-in-Variables Regression Models."

"_.
PROPERTIES OF ESTIMATORS
IN ERRORS-IN-VARIABLES REGRESSION MODELS
by
Paul P. Gallo
A Dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics.
Chapel Hill
1982
PAUL P. GALLO. Properties of Estimators in Errors-in-Variables Regression Models. (Under the direction of RAYMOND J. CARROLL.)
In this dissertation, we consider several facets of the "errors-in-variables" problem, the problem of estimating regression parameters when
variables are subject to measurement or observation error.
We consider a
general formulation in which some subset of the variables is subject to
such errors while some variables are observed exactly.
We consider two general classes of estimates which have appeared in
the literature:
one is maximum likelihood estimation under normality
(equivalent to weighted least squares estimation), and the other is a
modification of a class of "method-of-moments" estimates in which multiple
observations are made on each variable subject to error.
We consider
several particular cases, which differ in how much of the error
structure is assumed known.
We demonstrate that a number of estimates which have been considered
in the literature can be expressed in alternative forms which are easier
to compute and lend themselves to deeper investigation of their
properties.
We examine conditions on the values of the variables and the error
random variables under which the estimates considered are consistent; we
produce a set of conditions generally weaker than those previously known,
which is sufficient for consistency.
We also consider the ordinary least
squares estimate (computed as if the observed values are exact) and
demonstrate that certain contrasts of the elements of the parameter vector
can be estimated consistently without requiring specialized information as
with other types of estimates.
We demonstrate, making appropriate assumptions, that the estimates
under consideration have limiting normal distributions with covariance
matrices which can be consistently estimated.
By comparing limiting
variances, we are in a sense able to compare the relative merits of some
of these estimates.
Finally, in one of the models we are considering, we define a robust
M-estimate of the regression parameters, which generalizes one of our
other estimates and reduces to the usual regression M-estimate when all
variables are observed exactly. This estimate should be reasonably
efficient over a wide class of error distributions; we show the estimate
to be consistent and asymptotically normal.
ACKNOWLEDGEMENTS
I would like to express my gratitude to my major professor, Raymond
Carroll, who introduced me to many of the topics considered in this work
and, consequently, invested much time and effort in assisting me in its
preparation.
I am also indebted to many of the members of the faculty of the
Statistics Department at UNC for the contributions they made to my graduate
education, and in particular to the following for their assistance in the
preparation of this work:
David Ruppert, Stamatis Cambanis,
I.M. Chakravarti, and P.K. Sen.
I would also like to acknowledge the importance of the assistance and
support of fellow graduate students too numerous to mention.
Special
thanks is due to Ms. Elizabeth Blake for excellent work in the preparation
and typing of this dissertation, and to Ms. June Maxwell for valuable
assistance in administrative matters on numerous occasions.
In addition, I cannot overstate the importance of the support I
received from my family throughout my education.
This work was supported in part by a National Science Foundation
Graduate Fellowship and by the U.S. Air Force Office of Scientific Research
under Grant AFOSR-80-0080.
TABLE OF CONTENTS
CHAPTER I: Introduction and Preliminaries------------------------------ 1
1.0. Introduction------------------------------------------------- 1
1.1. The Errors-in-Variables Model-------------------------------- 3
1.2. Related Results---------------------------------------------- 6
1.3. Summary------------------------------------------------------13
1.4. Preliminary Results------------------------------------------14
CHAPTER II: Consistency of EIV Estimates-------------------------------23
2.0. Introduction-------------------------------------------------23
2.1. Some Relevant Considerations---------------------------------23
2.2. Basic Consistency Results------------------------------------26
2.3. Consistency with Many Replications---------------------------38
2.4. The Ordinary Least Squares Estimate--------------------------41
CHAPTER III: Limiting Distributions------------------------------------50
3.0. Introduction-------------------------------------------------50
3.1. Related Results----------------------------------------------50
3.2. Asymptotic Normality of Estimates of B-----------------------58
3.3. Some Comparisons---------------------------------------------73
3.4. Details------------------------------------------------------81
CHAPTER IV: M-Estimation in an EIV Model-------------------------------93
4.0. Introduction-------------------------------------------------93
4.1. Background---------------------------------------------------93
4.2. Notation and Assumptions------------------------------------- 96
4.3. Estimation of B---------------------------------------------- 99
4.4. One-Step Estimates-------------------------------------------104
4.5. Details------------------------------------------------------107
BIBLIOGRAPHY-----------------------------------------------------------122
CHAPTER I
Introduction and Preliminaries
1.0. Introduction.
The estimation of linear regression parameters when some variables
cannot be ascertained exactly due to measurement or observation error has
long been recognized as an important problem relevant to many fields of
statistical application.
The motivation of the ''Errors-in-Variables''
(EIV) problem we consider is exactly that of classical regression theory:
we wish to estimate a set of parameters which describes a linear relation
between the means of a group of random variables and a set of fixed
values.
However, unlike in the regression situation, these values are
not all exactly observable.
The wide potential applicability of EIV
techniques stems from the obvious fact that many types of numerical
quantities do not lend themselves to arbitrarily precise measurement.
Social scientists were among the first to realize the relevance of the
EIV problem, acknowledging that it can be very difficult to define
meaningful measures for many of the somewhat abstractly-defined variables
they must sometimes employ.
In physical science applications, many
variables are physical quantities which take values on some continuous
scale. Such quantities are generally not measurable with infinite
precision, and further error can be introduced due to variation in the
measuring instrument, say, or the perspective or even some physical
attribute of the observer (e.g., two people timing an event with
stopwatches will not record identical results).
A number of techniques have been designed for analyzing EIV
situations; in practice, it seems that these are not used nearly as often
as they should be.
Typically, the "faulty" observations are treated as
if they were the exact values of interest, and ordinary regression-type
estimates are computed.
The effects of doing this can be quite serious,
as will be demonstrated.
Such improper analysis could once be somewhat
justified by relative computational simplicity, but some EIV techniques
are nearly as easy to employ as those of ordinary regression analysis,
and modern computing capabilities make others feasible.
The more common
reason that EIV techniques are not more frequently employed is likely a
lack of understanding about how errors of the type we have described can
affect ordinary regression techniques.
If such errors have symmetric
distributions, say, fairly concentrated about the true values, it might
seem as if they would "cancel out" in some sense, perhaps having
consequences no worse than impairing the efficiency of the estimates
somewhat.
This is not the case; the presence of such errors does
systematically bias the regression estimates in a way which will become
clear shortly.
Among the techniques which have been proposed for analysis of EIV
situations, some apply only in rather restrictive situations, such as
simple linear regression (one independent variable). We will deal with
methods which apply in quite general formulations. Generally, we will
study estimators which have received consideration in the literature,
modifying them where appropriate; we undertake a study of their
properties with the general intention of demonstrating their applicability
to a wider variety of situations.
1.1. The Errors-in-Variables Model.
The following will be referred to as Model I:

    Y = X₁B₁ + X₂B₂ + ε ,    C = X₂ + U ,    (1.1.1)

where B₁ and B₂ are vectors of regression parameters to be estimated, Y
and C consist of observable random variables, X₁ and X₂ consist of
constants and have full column rank but X₁ is known and X₂ is not, and
ε and U are composed of random variables such that the joint distribution
of the elements of [U ε] is absolutely continuous with respect to
Lebesgue measure, and the rows of this matrix are i.i.d. with mean zero
and unknown non-singular covariance matrix

    Σ = [ Σ_u    Σ_εu ]
        [ Σ'_εu  σ²   ]

with σ² scalar (models such as this, with the independent variables
being constants, have generally been referred to under the title "linear
functional relationship," while a related model in which the variables
are stochastic has been called a "linear structural relationship"; see
Madansky (1959) for discussion).
Although in our discussions n will
vary, there should generally be no confusion if we do not subscript the
matrices we have defined above.
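As a concrete (and entirely illustrative) companion to this definition, Model I can be simulated directly; the dimensions, parameter values, and error covariance below are hypothetical choices, not part of the original development.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 500, 1, 2              # assumed dimensions (hypothetical)
X1 = np.ones((n, p1))              # known design columns (here, an intercept)
X2 = rng.uniform(-2, 2, (n, p2))   # true but unobservable values
B1, B2 = np.array([1.0]), np.array([2.0, -1.0])

# rows of [U eps] are i.i.d. mean zero with covariance Sigma
Sigma = np.array([[0.3, 0.0, 0.1],
                  [0.0, 0.3, 0.0],
                  [0.1, 0.0, 0.5]])
E = rng.multivariate_normal(np.zeros(p2 + 1), Sigma, size=n)
U, eps = E[:, :p2], E[:, p2]

Y = X1 @ B1 + X2 @ B2 + eps        # observable response
C = X2 + U                         # observed in place of X2
```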
The elements of U represent errors of measurement, observation, or
interpretation of the independent variables; each element of ε can be
viewed as representing some combination of the same type of error in the
dependent variable and random "equation error" (i.e., the amount by which
the relation for a particular subject differs from the average relation
in the population).
We will make use of the following notation:

    X = [X₁ X₂] ,    B' = [B₁' B₂'] ,    C* = [X₁ C] ,    U* = [0 U] ,    p = p₁ + p₂ ,

    Σ* = [ Σ*_u    Σ*_εu ]      Σ*_u = [ 0  0   ]      Σ*_εu = [ 0    ]      (1.1.2)
         [ Σ*'_εu  σ²    ] ,           [ 0  Σ_u ] ,             [ Σ_εu ] ,

and we assume X is of full column rank. Also, with I_n denoting the
identity matrix of order n, we define

    R = I_n − X₁(X₁'X₁)⁻¹X₁' ,    W* = [C* Y]' [C* Y] ,    W = [C Y]' R [C Y] ,
                                                                              (1.1.3)
    H = [X XB] ,    E = [U ε] .
In a related model which we will refer to as Model II, we assume
that independent repeat observations are available for each variable
observed with error (typically, perhaps, each is measured twice). To
define Model II, we replace (1.1.1) in Model I with

    Y_i = X₁B₁ + X₂B₂ + ε_i ,    ε_i = δ + V_i ,    i = 1, ..., s ,
    C_i = X₂ + U_i ,    i = 1, ..., r ,

with the V_i's mutually independent with mean zero and Var(V_i) = σ²_v,
the U_i's independent also, and both sets independent of δ.
We will basically consider two cases of Model II. In Model II(i),
we let s = r; each E_i = [U_i ε_i] is distributed as E in Model I. In
Model II(ii), we let s = 1; V = V₁ is independent of each U_i (Σ_εu = 0),
and each U_i is distributed as U in Model I. In both formulations, let

    C*_i = [X₁ C_i] ,    U*_i = [0 U_i] ,    i = 1, ..., r ,

    Ȳ = s⁻¹ Σ_{i=1}^s Y_i ,    C̄ = r⁻¹ Σ_{i=1}^r C_i ,

with C̄*, Ū, Ū*, ε̄ defined analogously; in Model II(i), let

    W_i = [C_i Y_i]' [C_i Y_i] ,    E*_i = [0 ε_i] ,    i = 1, ..., r ,

and in Model II(ii), define

    W_i = [C_i Ȳ]' [C_i Ȳ] .
In our analyses of Model II, it will be necessary to separate each
ε_i into the two components referred to above. In Model II(i), note that
δ, the random equation error, necessarily remains constant among all
replications ε₁, ..., ε_r. In Model II(ii), our (often reasonable) extra
assumption on Σ_εu will allow us to obtain good estimates of B without
replicating Y (in a special case of interest, where the dependent variable
is not subject to error so V = 0, such replication would be absurd).
1.2. Related Results.
The earliest consideration in the statistical literature of models
like those of the previous section seems to date back over a century to
when Adcock (1878) and Kummell (1879) addressed the problem of fitting a
straight line with errors in both variables.
With the two sources of
error uncorrelated and of equal variance, Adcock's sensible
recommendation was to minimize the sum of squared orthogonal distances of
points from the line.
The EIV problem first received detailed
consideration some half a century later in the early days of econometrics
as it became clear that EIV models were relevant in many studies of
economic behavior (e.g., see Frisch (1934)).
Nevertheless, early
proposals did not come under widespread consideration, due in large part
to a feeling that measurement errors were usually quite minor compared to
disturbing influences in the equation being studied.
Hence many
situations which were clearly well-represented by EIV models were
analyzed as if they were classical regression problems, that is, using
ordinary least squares techniques.
Ordinary least squares procedures are, in fact, not appropriate for
EIV models since, while they minimize equation error and errors
associated with the dependent variable, they do not take into account all
sources of variation present.
In Model I, if we treat the observed
values of the variables as if they were the exact values of interest and
compute the ordinary least squares estimate
    B̂_L = (C*'C*)⁻¹ (C*'Y) ,    (1.2.1)

it is easily demonstrated that under quite general conditions B̂_L "gets
close" not to B but to

    (I_p + n(X'X)⁻¹ Σ*_u)⁻¹ (B + n(X'X)⁻¹ Σ*_εu)    (1.2.2)

("gets close" in the sense that the difference between this quantity and
B̂_L converges in probability to zero as n gets large).
Looking at
(1.2.2), it seems that the relative magnitudes of the observation errors
and equation errors are not nearly of so much use in judging whether B̂_L
is acceptable as is the relationship between the size of the measurement
errors and the "spread" of the true values.
It is instructive here to specialize the preceding to the simple
linear regression (SLR) case; let

    y_i = α + βx_i + ε_i ,    c_i = x_i + u_i ,    i = 1, ..., n ,

    E(ε_i) = E(u_i) = Cov(ε_i, u_i) = 0 ,    E(ε_i²) = σ²_ε ,    E(u_i²) = σ²_u ,    (1.2.3)

    μ_x = lim_{n→∞} n⁻¹ Σ_{i=1}^n x_i ,    σ²_x = lim_{n→∞} n⁻¹ Σ_{i=1}^n (x_i − μ_x)² ,

where the latter two quantities are momentarily presumed to exist and the
{(ε_i, u_i)} are mutually independent. If the least squares estimates are
computed, it turns out that

    β̂_L − β σ²_x (σ²_x + σ²_u)⁻¹ → 0    and    α̂_L − (α + β μ_x σ²_u (σ²_x + σ²_u)⁻¹) → 0

in probability. Thus ordinary least squares generally underestimates the slope in
absolute value; the errors of observation obscure the relationship
between the variables, lessening the estimated correlation. The
intercept is over- or underestimated, depending on the sign of βμ_x.
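The attenuation just described is easy to reproduce numerically. The sketch below (all parameter values hypothetical) fits ordinary least squares to error-contaminated data and compares the fitted slope with the limit β σ²_x/(σ²_x + σ²_u).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha, beta = 1.0, 2.0
sigma2_x, sigma2_u, sigma2_e = 4.0, 1.0, 0.25

x = rng.normal(3.0, np.sqrt(sigma2_x), n)       # true regressor values
c = x + rng.normal(0.0, np.sqrt(sigma2_u), n)   # observed with error
y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2_e), n)

# naive OLS of y on the contaminated c
beta_ols = np.cov(c, y, bias=True)[0, 1] / np.var(c)
attenuated = beta * sigma2_x / (sigma2_x + sigma2_u)   # = 1.6 here
print(beta_ols, attenuated)
```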
By the late 1930's, it would not be unfair to characterize the EIV
literature as "confused." Some inappropriate techniques were being
employed, and some proper techniques were being used without much
understanding of why they were acceptable. A paper by Lindley (1947)
cleared up a good deal of misunderstanding and paved the way for future
work on the subject. Lindley showed that the correct form of least
squares estimation in EIV models is weighted least squares (WLS),
performed by assigning to each term in the sum of squares to be minimized
a weight proportional to the reciprocal of its variance.
In Model I, such minimization cannot be performed without some
further information concerning Σ. A similar problem arises if, under the
assumption that the rows of E have normal distributions, we attempt to
find maximum likelihood estimates; the supremum of the likelihood is
infinite. In fact, the methods of weighted least squares and maximum
likelihood (under normality) are equivalent for Model I in the following
sense: any condition on Σ which produces a solution for either of these
methods does the same for the other, and the two estimates are identical.
In Model I, the assumption most frequently made in the literature to
produce ML/WLS estimates is that Σ is known up to a scalar multiple,
i.e.,
    Σ = σ² Σ₀ ,    Σ₀ = [ Σ_u0    Σ_εu0 ]    known,    (1.2.4)
                        [ Σ'_εu0  σ²_0  ]

so we know the ratio of any pair of covariances. Often the sources of
error associated with different variables will be physically independent
so Σ is diagonal, in which case (1.2.4) will require only some knowledge
of the relative magnitudes of the errors associated with different
variables.
Clearly, information of the type required by (1.2.4) will not always
be available to an experimenter. Nevertheless, no other assumption on Σ
which alone allows one to maximize the likelihood is more general, and if
in a given situation one does not possess enough information to produce a
reasonable Σ₀, then different types of procedures must be found (for
example, using replication as in our Model II). In all our further
consideration of Model I, we will assume (1.2.4); we will need to refer
to the following, the definitions of which should be obvious from (1.2.4)
by analogy with (1.1.2): Σ*_u0, Σ*_εu0.
Lindley's paper was concerned with the SLR case, but the ideas it
contained were applicable in more general models. As soon as p₂ > 1,
however, the direct calculation of MLE's becomes quite complicated. A
more recent development which allowed concise expression of the MLE was
the realization that the estimates could be written in terms of
eigenvalues and eigenvectors of certain matrices computed from the data.
The most detailed work along these lines which is relevant here is due to
Gleser (1981) and Healy (1975). Both obtained results which yield
estimates of B in Model I; Healy's formulation is more general, while
Gleser considers the properties of the estimate in more detail (both
upgrade (1.2.4) to Σ = σ² I_{p₂+1}, which is no less general than our Model
I since the two models are equivalent up to a non-singular transformation
of the data; in referring to their results, we will generalize their
expressions to encompass our more general Σ₀).
In Model I, we define

    θ̂ = λ_{p₂+1}(Σ₀⁻¹ W) ,    (1.2.5)

where λ_i(A) denotes the i-th largest eigenvalue of a matrix A (i.e.,
λ_i(A) ≥ λ_j(A) if i < j), and let g' = [g₁' g₂], g₂ scalar, be an
associated eigenvector. Healy (1975) has shown that if g₂ ≠ 0, then the
normal theory MLE of B exists and is given by

    B̂_2M = −g₂⁻¹ g₁ ,    B̂_1M = (X₁'X₁)⁻¹ X₁' (Y − C B̂_2M) .    (1.2.6)

In a slightly less general model, Gleser (1981) has shown that g₂ ≠ 0
(and hence B̂_M exists) with probability one.
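In the simplest special case the eigenvector form can be made concrete: with X₁ a single intercept column and Σ₀ = I_{p₂+1}, R merely centers the data and the estimate reduces to orthogonal regression, the slope coming from the eigenvector of W = [C Y]'R[C Y] associated with its smallest eigenvalue. The sketch below is this special case only, not the general procedure of the text.

```python
import numpy as np

def orthogonal_slope(c, y):
    """Slope from the eigenvector of W = [C Y]' R [C Y] for the
    smallest eigenvalue, with R the centering projection (Sigma_0 = I)."""
    M = np.column_stack([c, y])
    Mc = M - M.mean(axis=0)       # R = I - 11'/n applied to [C Y]
    W = Mc.T @ Mc
    vals, vecs = np.linalg.eigh(W)
    g = vecs[:, 0]                # eigenvector for smallest eigenvalue
    g1, g2 = g[0], g[1]
    return -g1 / g2               # B_hat_2 = -g2^{-1} g1

# exact-line data: with no errors the smallest eigenvalue is 0
c = np.array([0.0, 1.0, 2.0, 3.0])
y = 0.5 + 2.0 * c
print(orthogonal_slope(c, y))    # 2.0 up to rounding
```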
Another situation in which we can obtain estimates of B is our Model
II; the replication provides enough information about the error variance
to allow us to produce estimates without an assumption such as (1.2.4).
Maximum likelihood estimation (assuming normality) has been
considered in the special case of Model II(i) in which δ = 0, so that
the relation is exactly linear and all variation is due to measurement
error (actually, such a formulation is what has been more traditionally
referred to as the "linear functional relationship"). Such situations
might be expected to arise mainly in physical science applications where
quantities are related by fixed laws, e.g., consider an attempt to
estimate the density of a substance using faulty measures of mass and
volume. Normal theory MLE's in such replication models were determined
by Anderson as early as 1951, but the following more general result can
be inferred from Healy (1980).
For any matrix A of n rows and full column rank, we define

    F_ij(A) = δ_ij I_n − r⁻¹ A(A'A)⁻¹ A' ;    (1.2.7)

now in Model II(i), let

    T₁ = Σ_{i=1}^r Σ_{j=1}^r [C_i Y_i]' F_ij(I_n) [C_j Y_j] ,
                                                               (1.2.8)
    T₂ = Σ_{i=1}^r Σ_{j=1}^r [C_i Y_i]' F_ij(X₁) [C_j Y_j] ,

and θ̂_R = λ_{p₂+1}(T₁⁻¹ T₂), with g_R' = [g₁R' g₂R] an associated
eigenvector; then (recalling δ = 0) if g₂R ≠ 0, the normal theory MLE of
B exists and is given by

    B̂_2MR = −g₂R⁻¹ g₁R ,    B̂_1MR = (X₁'X₁)⁻¹ X₁' (Ȳ − C̄ B̂_2MR) .    (1.2.9)
In our further discussions of B̂_MR, it should be kept in mind that we are
assuming δ = 0.
A "method-of-moments" approach to the EIV problem has recently been
considered by Fuller (1980).
Basically, Fuller asstunes that tmbiased
estimates of the elements of r are available; apart from asymptotically
negligible modifications which he claims should improve small sample
behavior, his estimate is
~ = CC*'C* _n~*)-l (C*' Y-n£* ) •
F
u
(1. 2 .10)
EU
This is clearly a modification of the ordinary least squares estimate
(1. 2.1)
.
It corrects for the extra sources of error in an EIV model and
can be considered a "method-of-moments" estimator in the following sense:
E(C*'C* - nr*)
"u
= X' X
E(C*' Y- nr~u)
"
= X' XB .
In our consideration of this class of estimators, we will be more
specific about the source of the external variance estimates needed. In
particular, we will assume that the extra information comes from
replication; that is, we will apply (1.2.10) to our Model II. To this
end, note that it is easy to show, with T₁ as in (1.2.8), that
n⁻¹(r−1)⁻¹ T₁ is generally a consistent unbiased estimator of

    [ Σ_u    Σ_εu ]
    [ Σ'_εu  σ²_v ] .
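As an illustration of this plan (hypothetical dimensions and parameters; no exactly-observed columns, Σ_εu = 0, and r = 2 replicates), the within-replication differences estimate the error covariance of the averaged observations, and plugging that estimate into (1.2.10) removes the bias of ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p2, r = 100_000, 2, 2
X2 = rng.uniform(-2, 2, (n, p2))
B2 = np.array([2.0, -1.0])
Su = 0.4 * np.eye(p2)                        # true error covariance (assumed)

Y = X2 @ B2 + rng.normal(0, 0.5, n)
C_reps = [X2 + rng.multivariate_normal(np.zeros(p2), Su, n) for _ in range(r)]
C = sum(C_reps) / r                          # averaged observations

# within-replication differences estimate the error covariance of C-bar:
D = C_reps[0] - C_reps[1]
Su_bar_hat = (D.T @ D) / (2 * n) / r         # Var(U-bar) = Su / r

B_ols = np.linalg.solve(C.T @ C, C.T @ Y)                  # biased toward 0
B_F = np.linalg.solve(C.T @ C - n * Su_bar_hat, C.T @ Y)   # moment-corrected
print(B_ols, B_F)
```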
1.3. Summary.
In subsequent chapters, we consider asymptotic properties of EIV
model estimators such as the ones defined in the previous section.
Our
primary interest will be with the maximum likelihood estimate in Model I
and the Fuller-type estimates in Model II since these apply to our most
general formulations.
The MLE in Model II is not, of course, as
generally applicable (recall that we require δ = 0), but we will
consider it nevertheless; part of its interest stems from the fact that
it aids in the comparison of the other two types of estimates since it
bears similarities to both.
In Chapter II, we consider conditions under which the estimates of B
are consistent (i.e., they converge in probability to B as n → ∞). For
the estimates we are dealing with, consistency has been shown in the
literature; but in our demonstration, we employ conditions a good deal
weaker than those which have previously been used, thus making the
techniques applicable to a broader class of problems. We also consider
the ordinary least squares estimate (1.2.1); while we have pointed out
that B̂_L is not generally consistent, there are conditions under which
this method can estimate B, or certain contrasts of the elements of B,
consistently. We would like to be aware of these, so as not to require
specialized information when it is not needed.
In Chapter III, we compute the limiting distributions of the
estimates, that is, we demonstrate asymptotic normality, in each of the
cases considered in Chapter II.
In some special cases, we use those
results to provide comparisons of the different types of estimators.
In Chapter IV, we consider robust estimation of B in an EIV model;
in particular, we extend M-estimation to our Model II. We seek to define
an estimate which is less sensitive than standard procedures to outliers
or heavy-tailed error distributions, while retaining reasonable efficiency
at an exact normal model. We defer a more complete discussion of the
method of M-estimation and a review of relevant literature until Chapter
IV.
We conclude Chapter I with some preliminary results which will
assist us in achieving the goals of Chapters II and III. We demonstrate
that the estimates discussed in Section 1.2 can be expressed in alternate
forms; these new expressions will facilitate our investigation of the
properties of the estimates.
1.4. Preliminary Results.
In subsequent chapters, we will make substantial use of the
following result, which will be proven after discussion.
Theorem 1.1. (i) In Model I, the normality-MLE of B exists almost
surely and can be expressed as

    B̂_2M = (C'RC − θ̂Σ_u0)⁻¹ (C'RY − θ̂Σ_εu0) ,    B̂_1M = (X₁'X₁)⁻¹ X₁' (Y − C B̂_2M) ,    (1.4.1)

with R and θ̂ given by (1.1.3) and (1.2.5), respectively; we also have

    B̂_M = (C*'C* − θ̂Σ*_u0)⁻¹ (C*'Y − θ̂Σ*_εu0) .    (1.4.2)

(ii) In Model II(i) with δ = 0, the normality-MLE exists a.s. and
can be expressed as

    B̂_2MR = (Σ_{i=1}^r Σ_{j=1}^r C_i' P_ij C_j)⁻¹ (Σ_{i=1}^r Σ_{j=1}^r C_i' P_ij Y_j) ,    (1.4.3)

where P_ij = (θ̂_R − 1) F_ij(I_n) − r⁻¹ R and θ̂_R is as in (1.2.8); we also
have

    B̂_MR = (Σ_{i=1}^r Σ_{j=1}^r C*_i' P'_ij C*_j)⁻¹ (Σ_{i=1}^r Σ_{j=1}^r C*_i' P'_ij Y_j) ,    (1.4.4)

with P'_ij = (θ̂_R − 1) F_ij(I_n) − r⁻¹ I_n.
1J n
Note that using (1.4.2) and (1.4.4) we could, if we wish, express
this result without explicit reference to our partitioning of X and B,
since a little algebra yields that in (1.4.2) θ̂ satisfies

    θ̂ = γ⁻¹ ,    γ = largest root of |Σ*₀ − γW*| = 0 ,

and in (1.4.4), θ̂_R satisfies

    θ̂_R = γ⁻¹ ,    γ = largest root of |T₁* − γT₂*| = 0 ,

with

    T₁* = Σ_{i=1}^r Σ_{j=1}^r [C*_i Y_i]' F_ij(I_n) [C*_j Y_j] ,    T₂* = Σ_{i=1}^r [C*_i Y_i]' [C*_i Y_i] .
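The agreement between the partitioned form (1.4.1) and the unpartitioned form (1.4.2) is a purely algebraic identity, so it can be checked numerically on arbitrary inputs; the sketch below uses random data and an arbitrary trial value in place of θ̂ (with Σ_u0 = I and Σ_εu0 = 0 as assumed inputs, both hypothetical choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p1, p2 = 50, 2, 3
X1 = rng.normal(size=(n, p1))
C = rng.normal(size=(n, p2))
Y = rng.normal(size=n)
theta = 0.1                          # any value keeping the inverses valid
Su0 = np.eye(p2)                     # Sigma_u0 (taken as I here)
Seu0 = np.zeros(p2)                  # Sigma_eps-u0 (taken as 0 here)

Cs = np.hstack([X1, C])              # C* = [X1 C]
Su0_star = np.zeros((p1 + p2, p1 + p2)); Su0_star[p1:, p1:] = Su0
Seu0_star = np.concatenate([np.zeros(p1), Seu0])

R = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

B_M = np.linalg.solve(Cs.T @ Cs - theta * Su0_star, Cs.T @ Y - theta * Seu0_star)
B_2M = np.linalg.solve(C.T @ R @ C - theta * Su0, C.T @ R @ Y - theta * Seu0)
B_1M = np.linalg.solve(X1.T @ X1, X1.T @ (Y - C @ B_2M))

# the unpartitioned estimate stacks the two partitioned pieces
assert np.allclose(B_M[p1:], B_2M) and np.allclose(B_M[:p1], B_1M)
```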
We will also work with the Fuller-type estimates for Model II (which
we will henceforth refer to as B̂_R) in an alternate form. Making precise
a statement that we made at the end of Section 1.2, note that in Model
II(i) we can write

    T₁ = (r−1) r⁻¹ Σ_{i=1}^r E_i'E_i − r⁻¹ Σ_{i=1}^r Σ_{j=1, j≠i}^r E_i'E_j ;    (1.4.5)

since the E_i's have finite variance and the {[U_i V_i]}'s are mutually
independent, it is straightforward to show that

    n⁻¹ (r−1)⁻¹ T₁ → [ Σ_u    Σ_εu ]  a.s.    (1.4.6)
                     [ Σ'_εu  σ²_v ]

Now in (1.2.10), using the averages of the C*_i's and Y_i's as C* and Y,
respectively, and the appropriate submatrices of (nr(r−1))⁻¹ T₁ for the
required variance estimates, after some simplification we obtain

    B̂_R = (Σ_{i=1}^r Σ_{j=1, j≠i}^r C*_i' C*_j)⁻¹ (Σ_{i=1}^r Σ_{j=1, j≠i}^r C*_i' Y_j) .    (1.4.7)

If we proceed using similar reasoning in Model II(ii), we obtain

    B̂_R = (r−1) (Σ_{i=1}^r Σ_{j=1, j≠i}^r C*_i' C*_j)⁻¹ (Σ_{i=1}^r C*_i' Y) .    (1.4.8)
There are several advantages to considering the estimates in these
forms. First of all, they are generally a good deal simpler
computationally than their counterparts in Section 1.2. Also, they have
much nicer interpretations in these forms. Each is, in a slightly
different sense, a modification of the ordinary least squares estimate
(1.2.1) (why these modifications are in some sense "appropriate" will, of
course, become clearer in subsequent chapters). Finally, in these forms
the estimates are much more "manageable" and lend themselves to deeper
investigations of their properties; it is these forms, rather than the
original expressions, which we will use to attempt to accomplish the
goals stated in Section 1.3.
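The sense in which (1.4.8) modifies (1.2.1) can be seen in a small simulation (hypothetical values; a Model II(ii)-style setting with Σ_εu = 0 and, for simplicity, no exactly-observed columns): since U_i and U_j are independent for i ≠ j, the cross-products C*_i'C*_j estimate X'X without the error-variance inflation that biases ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p2, r = 100_000, 2, 3
X2 = rng.uniform(-2, 2, (n, p2))
B2 = np.array([1.5, -0.5])
Y = X2 @ B2 + rng.normal(0, 0.5, n)              # Y observed once (s = 1)
C = [X2 + rng.normal(0, 0.7, (n, p2)) for _ in range(r)]

# (1.4.8): B_R = (r-1) (sum_{i != j} C_i'C_j)^{-1} (sum_i C_i'Y)
A = sum(C[i].T @ C[j] for i in range(r) for j in range(r) if i != j)
b = sum(C[i].T @ Y for i in range(r))
B_R = (r - 1) * np.linalg.solve(A, b)
print(B_R)   # should be close to B2
```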
It is instructive at this point to compare these estimates, noting
their interrelationships. First, note the similarity between (1.4.2) and
(1.2.10). Both seem to modify the least squares estimate in roughly the
same way. Although at this stage we must necessarily be rather vague, if
these estimates are both to be useful, they should modify (1.2.1)
similarly; both cannot be "well-behaved" unless θ̂Σ₀ behaves like nΣ,
i.e., unless n⁻¹θ̂ behaves like σ². This is the case; under conditions
assumed by Healy (1975), he shows that

    n⁻¹ θ̂ →P σ²  as  n → ∞ .

Thus while Fuller-type estimates require "external" variance estimates,
the method of maximum likelihood produces its own "internal" estimate,
and then the two methods operate similarly. Of course, the internal
estimate of maximum likelihood estimation is not gotten for free; the
price we pay is having to employ assumption (1.2.4).
The estimate B̂_MR effectively operates in the same way as the other
two types, although it is less obvious at this point; B̂_MR can be viewed
as taking the structure of the model into account in estimating Σ, while
B̂_R uses only the information available from replication. Healy (1980)
has shown, subject to certain conditions, that

    θ̂_R →P r(r−1)⁻¹ ;

note that if we substitute this limiting value of θ̂_R into (1.4.4), then
each P'_ii is replaced by zero and B̂_MR becomes identical to B̂_R.
The proofs of the two parts of Theorem 1.1 are quite similar. We
will prove Theorem 1.1(i) in detail, then just briefly outline the key
calculations in the proof of Theorem 1.1(ii). The following lemma will
be useful:
Lemma 1.2. In Model I, C'RC − θ̂Σ_u0 is non-singular with probability
one.

Proof. By symmetry of Σ₀ and W, we can write

    Σ₀⁻¹ W G = G Λ ,    (1.4.9)

where G is a matrix of eigenvectors of Σ₀⁻¹W, and F = G⁻¹, which we
partition similarly. Equation (1.4.9) implies

    C'RC − θ̂Σ_u0 = (Σ_u0 G₁₁ + Σ_εu0 G₂₁)(Λ₁ − θ̂ I_{p₂}) F₁₁ .    (1.4.10)

From Gleser (1981), we infer that Σ_u0 G₁₁ + Σ_εu0 G₂₁ and F₁₁ are
non-singular a.s. since the error distribution is assumed absolutely
continuous. It follows from a theorem of Okamoto (1973) that the
eigenvalues of Σ₀⁻¹W are distinct with probability one (all we need is
λ_{p₂}(Σ₀⁻¹W) > λ_{p₂+1}(Σ₀⁻¹W)), in which case Λ₁ − θ̂ I_{p₂} is
non-singular. From (1.4.10), C'RC − θ̂Σ_u0 is the product of non-singular
matrices and is thus itself non-singular. □
Proof of Theorem 1.1. With G as defined in the proof of Lemma 1.2,
θ̂ satisfies

    [ C'RC − θ̂Σ_u0     C'RY − θ̂Σ_εu0 ] [ G₁₂ ]
    [ Y'RC − θ̂Σ'_εu0   Y'RY − θ̂σ²_0  ] [ G₂₂ ]  =  0 .    (1.4.11)

Gleser (1981) has shown that G₂₂ ≠ 0 a.s., in which case the MLE exists.
As mentioned earlier, θ̂ is a.s. an eigenvalue of multiplicity one, so the
left-hand matrix in (1.4.11) has rank (p₂ + 1) − 1 = p₂, and solutions to
(1.4.11) will be determined by equations corresponding to any p₂
linearly independent rows of that matrix. In light of Lemma 1.2, the
first p₂ rows will do:

    (C'RC − θ̂Σ_u0) G₁₂ + (C'RY − θ̂Σ_εu0) G₂₂ = 0 ,

so

    −G₂₂⁻¹ G₁₂ = (C'RC − θ̂Σ_u0)⁻¹ (C'RY − θ̂Σ_εu0) .    (1.4.12)

By (1.2.6), this is B̂_2M, which demonstrates (1.4.1).
Now, writing Q = (C'RC − θ̂Σ_u0)⁻¹, the partitioned form of
(C*'C* − θ̂Σ*_u0)⁻¹ is

    [ (X₁'X₁)⁻¹ + (X₁'X₁)⁻¹X₁'C Q C'X₁(X₁'X₁)⁻¹    −(X₁'X₁)⁻¹X₁'C Q ]
    [ −Q C'X₁(X₁'X₁)⁻¹                              Q               ] ,

so

    (C*'C* − θ̂Σ*_u0)⁻¹ (C*'Y − θ̂Σ*_εu0)

      = [ (X₁'X₁)⁻¹X₁'Y − (X₁'X₁)⁻¹X₁'C Q (C'Y − θ̂Σ_εu0 − C'X₁(X₁'X₁)⁻¹X₁'Y) ]
        [ Q (C'Y − θ̂Σ_εu0 − C'X₁(X₁'X₁)⁻¹X₁'Y)                                ]

      = [ (X₁'X₁)⁻¹ X₁' (Y − C (C'RC − θ̂Σ_u0)⁻¹ (C'RY − θ̂Σ_εu0)) ]
        [ (C'RC − θ̂Σ_u0)⁻¹ (C'RY − θ̂Σ_εu0)                        ]

by (1.4.12) and (1.2.6), demonstrating (1.4.2).
In proving Theorem 1.1(ii), first, writing

    T_k = [ T_k11  T_k12 ] ,    k = 1, 2 ,    (1.4.13)
          [ T_k21  T_k22 ]

with T₁₂₂, T₂₂₂ scalar, we can demonstrate the following analog of Lemma
1.2: T₂₁₁ − θ̂_R T₁₁₁ is a.s. non-singular. With a matrix G_R defined
analogously to G in part (i), θ̂_R satisfies

    (T₂ − θ̂_R T₁) [G_R12' G_R22']' = 0 ;

this system of equations is equivalent to a system using only the first
p₂ rows:

    (T₂₁₁ − θ̂_R T₁₁₁) G_R12 + (T₂₁₂ − θ̂_R T₁₁₂) G_R22 = 0 .

This implies

    −G_R22⁻¹ G_R12 = (T₂₁₁ − θ̂_R T₁₁₁)⁻¹ (T₂₁₂ − θ̂_R T₁₁₂) ;    (1.4.14)

by (1.2.9), this is B̂_2MR. Now some routine algebra shows that

    Σ_{i=1}^r Σ_{j=1}^r [C_i Y_i]' P_ij [C_j Y_j] = −(T₂ − θ̂_R T₁) ;

applying this to the right-hand side of (1.4.14), we obtain expression
(1.4.3) as desired.
Finally, upon expressing (Σ_{i=1}^r Σ_{j=1}^r C*_i' P'_ij C*_j)⁻¹ in
partitioned form, with Q_R = (Σ_{i=1}^r Σ_{j=1}^r C_i' P_ij C_j)⁻¹ as the
lower-right block as in part (i), we obtain by straightforward (but messy)
calculation similar to what we did for part (i) that

    (Σ_{i=1}^r Σ_{j=1}^r C*_i' P'_ij C*_j)⁻¹ (Σ_{i=1}^r Σ_{j=1}^r C*_i' P'_ij Y_j) = [B̂'_1MR  B̂'_2MR]' ,

demonstrating (1.4.4) and completing the proof. □
CHAPTER II
Consistency of EIV Estimates
2.0. Introduction.
We begin our investigation of the asymptotic properties of EIV
estimates with a consideration of consistency. This will be our
criterion for "closeness"; if we can show that an estimate B̂ converges in
probability to B as n → ∞, then in practice we can feel somewhat
confident that the estimate is likely to be near the true value if we
have sufficiently many observations. We will consider conditions on X
and E which ensure consistency. This question has been considered in the
literature for some of the estimates considered in Chapter I, but we will
take advantage of the results of Section 1.4 to demonstrate that the
estimates are consistent under a weaker set of conditions than previously
known; hence the techniques we consider will be shown to be useful in a
wider class of situations.
2.1. Some Relevant Considerations.
An initial point that should be made before we consider consistency of B̂_M and B̂_MR is that the well-known results guaranteeing consistency of maximum likelihood estimates as long as certain regularity conditions are satisfied do not apply here. The elements of X₂ are what Neyman and Scott (1948) referred to as "incidental parameters." Each is the mean of only one observed random variable (or r in Model II), so clearly we cannot speak of consistency of an estimate of X₂; furthermore, when such parameters are present, other parameters may not be estimated consistently by maximum likelihood.
In their original discussion of this phenomenon, in fact, Neyman and Scott used an EIV example in the SLR case (1.2.3) and demonstrated that the MLE of σ² is inconsistent. When incidental parameters are present, they do not affect consistency of the MLEs of the other parameters if the ratio of the number of incidental parameters to the number of observations tends to zero, but in Model I and Model II(i), these limits are, respectively, p₂(p+1)⁻¹ and p₂(1+p₂+r⁻¹p₁)⁻¹. In these cases, we will have to demonstrate consistency directly.
Most of the authors referred to in Chapter I have considered consistency of the estimates which they have defined. Gleser (1981), Healy (1975, 1980), and Fuller (1980) all make use of the following assumption:

    Δ = lim_{n→∞} n⁻¹X'X exists and is positive definite.    (2.1.1)
In a model like our Model I, but with all independent variables (except possibly an intercept term) subject to error, Gleser (1981) demonstrates consistency of B̂_M using only our Model I assumptions and (2.1.1). Note that although the estimate was derived assuming a normal error distribution, his demonstration of consistency does not depend on that assumption (and neither will ours).
Healy (1975, 1980) maintains the assumption of normality in his demonstrations of consistency of B̂_M in Model I and B̂_MR in Model II(i), but it is not really necessary. Fuller does not specifically consider consistency of B̂_F, but it follows from his demonstration of the limiting distribution of that estimate, in which he makes use of the assumption that the error distribution is normal.
Assumption (2.1.1) seems somewhat restrictive. Consider, for example, an SLR model in which the values of the independent variable vary linearly with the sample size; for simplicity, say x_k = k, k = 1, 2, .... In such a situation, it seems that we have a great deal of good information about the line, and we would certainly expect to be able to estimate it well. Nevertheless, (2.1.1) does not hold, since

    Σ_{k=1}^n k² = O(n³)  as n → ∞.
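The rate claimed above is easy to confirm numerically. In the following sketch the design x_k = k is the one from the example; the code itself is only illustrative. It shows n⁻¹X'X diverging, so no finite positive definite limit Δ can exist:

```python
import numpy as np

def xtx(n):
    # X'X for the one-column design x_k = k, k = 1, ..., n (SLR through the origin)
    k = np.arange(1, n + 1, dtype=float)
    return k @ k  # sum of k^2 = n(n+1)(2n+1)/6 = O(n^3)

# n^{-1} X'X grows like n^2/3, so (2.1.1) fails even though
# the information about the line keeps increasing
for n in (10, 100, 1000):
    print(n, xtx(n) / n)
```

By contrast, the weaker condition (2.1.2) below clearly holds for this design, since X'X itself tends to infinity.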
To obtain some direction in possibly weakening (2.1.1), it is instructive to consider conditions known to be sufficient for consistency of estimates in the usual linear regression model (the special case of Model I with p₂ = 0). In recent years, results of increasing strength and generality on this matter have been produced: see Eicker (1963), Drygas (1976), Anderson and Taylor (1976), and Lai et al. (1979). Conditions on the errors vary somewhat among these papers, but the condition on the sequence of matrices X which is crucial to all of them is

    λ_p(X'X) → ∞  as n → ∞.    (2.1.2)

This condition is clearly much weaker than (2.1.1); it requires only that the independent variables are not highly correlated amongst each other
and that the values of these variables "spread out" fast enough.
In SLR with intercept, for example, (2.1.2) says that

    Σ_{i=1}^n (xᵢ − x̄)² → ∞  as n → ∞,

which is satisfied as long as the xᵢ's do not cluster about a particular value.
Our goal here will thus be to define behavior of X "intermediate" between (2.1.1) and (2.1.2), which will be sufficient for consistency of estimates in EIV models.

2.2. Basic Consistency Results.

Consider the following conditions on the sequence of matrices X:

    (A.1)  n^{−1/2} λ_p(X'X) → ∞  as n → ∞;
    (A.2)  λ₁⁻¹(X'X) λ_p²(X'X) → ∞  as n → ∞.
Our first consistency result is

Theorem 2.1. Under the assumptions of Model I, if (A.1) and (A.2) hold and the joint distribution of the errors E possesses finite fourth moment, then B̂_M →p B as n → ∞.

Before proving Theorem 2.1, we consider the value of our new conditions on X. First of all, note that these two conditions are trivially satisfied in the SLR no-intercept example we considered in the previous section (however, if we include an intercept, (A.2) is violated if x_k = k; a similar choice which satisfies (A.1) and (A.2) and not (2.1.1), though, is x_k = k^q, q < ½). Conditions (A.1) and (A.2) are
"intermediate" between (2.1.1) and (2.1.2) in the sense mentioned earlier: either one implies (2.1.2), while both are implied by (2.1.1). Furthermore, they seem closer in spirit to (2.1.2) than to (2.1.1). Condition (A.1) is clearly much like (2.1.2), requiring (again, apart from multicollinearity considerations) that the variables "spread out" fast enough, albeit at a faster rate than (2.1.2). Condition (A.2) also can be seen to hold much more generally than (2.1.1). It is easily demonstrated that it is satisfied, for example, whenever (A.1) holds and the independent variables are bounded.
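The behavior of (A.1) and (A.2) for the designs just mentioned can be checked directly; in this numerical sketch the eigenvalue computation and sample sizes are illustrative, and the two quantities printed are the ones appearing in (A.1) and (A.2):

```python
import numpy as np

def condition_quantities(x):
    """(A.1) and (A.2) quantities for the design [1, x]:
    n^{-1/2} lambda_p(X'X) and lambda_p(X'X)^2 / lambda_1(X'X)."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    lam_p, lam_1 = np.linalg.eigvalsh(X.T @ X)[[0, -1]]
    return lam_p / np.sqrt(n), lam_p ** 2 / lam_1

# q = 1 (x_k = k): the (A.2) quantity tends to zero, as claimed above;
# q = 0.4 < 1/2 (x_k = k^q): both quantities grow without bound
for q in (1.0, 0.4):
    for n in (100, 10000):
        k = np.arange(1, n + 1, dtype=float)
        print(q, n, condition_quantities(k ** q))
```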
In proving Theorem 2.1, we will make use of the following results, the second of which may be a special case of a result which is known but could not be found.
Lemma 2.2. (i) λ₁(X₂'RX₂) ≤ λ₁(X'X); (ii) letting (X'X)⁻¹ = [L₁  L₂], with L₁ p×p₁ and L₂ p×p₂,

    λ₁(L₂L₂') ≤ λ_p⁻²(X'X).

Proof. Let Z*' = [0'  Z₀'], where the zero block is 1×p₁ and Z₀ is a p₂×1 eigenvector associated with λ_{p₂}((X₂'RX₂)⁻¹) such that ||Z₀|| = 1. Since (X₂'RX₂)⁻¹ is a lower right-hand submatrix of (X'X)⁻¹,

    inf_{||Z||=1} Z'(X'X)⁻¹Z ≤ Z*'(X'X)⁻¹Z* = Z₀'(X₂'RX₂)⁻¹Z₀ = λ_{p₂}((X₂'RX₂)⁻¹);

since for an m×m nonsingular matrix A,

    λᵢ⁻¹(A) = λ_{m−i+1}(A⁻¹),

this implies (i). Noting that the non-zero eigenvalues of L₂L₂' and L₂'L₂ are identical, (ii) follows similarly, since L₂'L₂ is a lower right-hand submatrix of (X'X)⁻². □
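Both inequalities of Lemma 2.2 are easy to verify numerically; in the sketch below the design matrix is randomly generated, and the dimensions are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 50, 2, 3
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack([X1, X2])

R = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
XtX = X.T @ X
L2 = np.linalg.inv(XtX)[:, p1:]     # last p2 columns of (X'X)^{-1}
lam = np.linalg.eigvalsh            # eigenvalues in ascending order

# (i): lambda_1(X2' R X2) <= lambda_1(X' X)
print(lam(X2.T @ R @ X2)[-1] <= lam(XtX)[-1] + 1e-9)
# (ii): lambda_1(L2 L2') <= lambda_p(X' X)^{-2}
print(lam(L2 @ L2.T)[-1] <= lam(XtX)[0] ** -2 + 1e-9)
```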
For any matrix A, let

    ||A|| = (tr(A'A))^{1/2}.

Lemma 2.3. If {A_n}, n = 1, 2, ..., and {B_n}, n = 1, 2, ..., are sequences of m×m symmetric matrices such that ||A_n − B_n|| → 0 as n → ∞, then λ_m(A_n) − λ_m(B_n) → 0.
Proof. Letting z_n be a length 1 eigenvector associated with λ_m(B_n),

    λ_m(A_n) − λ_m(B_n) = inf_{||x||=1} x'A_n x − inf_{||x||=1} x'B_n x
                        = inf_{||x||=1} x'A_n x − z_n'B_n z_n
                        ≤ z_n'(A_n − B_n)z_n.

Similarly,

    λ_m(A_n) − λ_m(B_n) ≥ y_n'(A_n − B_n)y_n,

where y_n is a length 1 eigenvector associated with λ_m(A_n). Thus it is clear that for each n there exists a vector t_n of length 1 such that

    |λ_m(A_n) − λ_m(B_n)| ≤ |t_n'(A_n − B_n)t_n| ≤ ||A_n − B_n|| = o(1)

by hypothesis. □

Proof of Theorem 2.1. From (1.4.2),

    B̂_M = (C*'C* − θ̂Σ_U*)⁻¹ (C*'Y − θ̂Σ_εU*)
        = (I_p + (X'X)⁻¹(X'U* + U*'X + (U*'U* − nΣ_U*) + (nσ̂² − θ̂)Σ_U*))⁻¹
          × (X'X)⁻¹ (X'Y + U*'XB + (U*'ε − nΣ_εU*) + (nσ̂² − θ̂)Σ_εU*).

Clearly, it will suffice to show that

    (i)   (X'X)⁻¹X'U* →p 0;
    (ii)  (X'X)⁻¹U*'X →p 0;
    (iii) (X'X)⁻¹(U*'U* − nΣ_U*) →p 0;
    (iv)  (nσ̂² − θ̂)(X'X)⁻¹Σ_U* →p 0;
    (v)   (X'X)⁻¹X'Y →p B;
    (vi)  (X'X)⁻¹(U*'ε − nΣ_εU*) →p 0.

Eicker (1963) has shown (v) when λ_p(X'X) → ∞, which is of course true by (A.1); (i) also follows immediately from his work under the same condition.
Note that

    ||U*'U* − nΣ_U*|| = O_p(n^{1/2})

under our assumption of finite fourth error moment, so (iii) holds if ||(X'X)⁻¹|| = o(n^{−1/2}), and this follows from assumption (A.1). The same argument demonstrates (vi).
Now

    (X'X)⁻¹U*'X = L₂U'X;

the (i,j)th element has mean zero and variance

    (Σ_{k=1}^n x_{kj}²) L₂ᵢ'Σ_U L₂ᵢ,

where L₂ᵢ denotes the i-th column of L₂'. Thus (ii) is satisfied if

    max diag(X'X) · max diag(L₂L₂') → 0;

this is seen to be equivalent to (A.2) using Lemma 2.2(ii).
Letting k henceforth denote λ₁⁻¹(X'X), we need only demonstrate (iv), which is equivalent to

    k(θ̂ − nσ̂²) →p 0.

Note that k(θ̂ − nσ̂²) is the smallest eigenvalue of

    kΣ^{−1/2}[C*  Y]'R[C*  Y]Σ^{−1/2} − knσ̂²I_{p₂+1},

where Σ^{−1/2} is a symmetric square root of Σ⁻¹; we will show that this converges in probability to zero. Let

    D = kΣ^{−1/2}H'RHΣ^{−1/2}.

For each n, D is positive semi-definite of rank p₂ and thus has p₂ positive eigenvalues, its other eigenvalue being zero. Now using the fact that

    RX₁ = 0,   X₁'R = 0,

we can show that

    kΣ^{−1/2}[C*  Y]'R[C*  Y]Σ^{−1/2} − knσ̂²I_{p₂+1} = D + M₁ + M₂ + M₃,  say.    (2.2.1)

Using (A.2), Lemma 2.2(ii), and essentially the same arguments by which we previously showed (i) and (ii), M₁ →p 0. M₂ does likewise since k = o(n^{−1/2}). Finally, noting that I_n − R is idempotent with rank p₁, we deduce that

    E(M₃) = σ²kp₁ I_{p₂+1} = o(1).

The diagonal elements of M₃ are positive with expectations going to zero and thus go to zero themselves. Since each member of the sequence {M₃} is positive definite, we have M₃ →p 0. It now follows from Lemma 2.3 that the smallest eigenvalue of the left-hand side of (2.2.1) converges in probability to λ_{p₂+1}(D) = 0, so from our preceding discussion,

    k(θ̂ − nσ̂²) →p 0,

which demonstrates (iv) and completes the proof. □
It seems clear from looking at the demonstrations of (ii) and (iii) in the proof of Theorem 2.1 that (2.1.2) is too weak a condition for our model and we actually do need (A.1) and (A.2). Our requirement of finite fourth moments of the rows of E is not particularly restrictive, yet we could weaken it if we were willing to strengthen (A.1): it requires no major modification of the above proof if we assume that (A.2) holds and, for 0 ≤ δ ≤ 2, the rows of E have finite (2+δ)th moment and

    n^{−2(2+δ)⁻¹} λ_p(X'X) → ∞  as n → ∞.

In particular, we would need only finite variance if n⁻¹λ_p(X'X) → ∞. Such a condition would not be "intermediate" in the sense we desire; that is, it clearly doesn't follow from (2.1.1), which is why we stated the theorem in the form we did.
Not surprisingly, a result similar to Theorem 2.1 holds for B̂_MR. We state it here, and in proving it we will, as in Chapter I, proceed more quickly through the parts of the proof which are essentially the same as their counterparts in Theorem 2.1.

Theorem 2.4. In Model II(i), if (A.1) and (A.2) hold, then B̂_MR →p B as n → ∞. (Note that we do not assume here that any moments higher than the second are finite.)
Proof. By (1.4.4),

    B̂_MR = (Σ_{i=1}^r Σ_{j=1}^r Cᵢ*'Pᵢⱼ*Cⱼ*)⁻¹ (Σ_{i=1}^r Σ_{j=1}^r Cᵢ*'Pᵢⱼ*Yⱼ)
          = (I_p + (X'X)⁻¹(X'U* + U*'X) + (X'X)⁻¹{(1 − θ̂_R r(r−1)⁻¹) r⁻¹ Σ_{i=1}^r Uᵢ*'Uᵢ* + θ̂_R r⁻² Σ_{i≠j} Uᵢ*'Uⱼ*})⁻¹    (2.2.2)
            × (B + (X'X)⁻¹(X'ε + U*'XB) + (X'X)⁻¹{(1 − θ̂_R r(r−1)⁻¹) r⁻¹ Σ_{i=1}^r Uᵢ*'εᵢ + θ̂_R r⁻² Σ_{i≠j} Uᵢ*'εⱼ}).

Since r is fixed,

    (X'X)⁻¹(X'U* + U*'X) →p 0   and   (X'X)⁻¹(X'ε + U*'XB) →p 0

exactly as (i) and (ii) in the proof of Theorem 2.1. We still need to show that as n → ∞,

    (a) (1 − θ̂_R r(r−1)⁻¹)(X'X)⁻¹ r⁻¹ Σ_{i=1}^r Uᵢ*'Uᵢ* →p 0;
    (b) θ̂_R (X'X)⁻¹ r⁻² Σ_{i≠j} Uᵢ*'Uⱼ* →p 0;
    (c) (1 − θ̂_R r(r−1)⁻¹)(X'X)⁻¹ r⁻¹ Σ_{i=1}^r Uᵢ*'εᵢ →p 0;
    (d) θ̂_R (X'X)⁻¹ r⁻² Σ_{i≠j} Uᵢ*'εⱼ →p 0.

Now (c) and (d) will follow similarly if we can show (a) and (b). Also, since

    Σ_{i=1}^r Uᵢ*'Uᵢ* = O_p(n)

by the WLLN and

    Σ_{i≠j} Uᵢ*'Uⱼ* = O_p(n^{1/2})

by the independence of different replicates, (a) and (b) are equivalent to

    (a') (1 − θ̂_R r(r−1)⁻¹) n(X'X)⁻¹ →p 0;
    (b') n^{1/2} θ̂_R (X'X)⁻¹ →p 0.

Furthermore, if (a') holds, then

    n^{1/2}(1 − θ̂_R r(r−1)⁻¹)(X'X)⁻¹ = o_p(1)
    ⇒ n^{1/2}(X'X)⁻¹ − n^{1/2} θ̂_R r(r−1)⁻¹ (X'X)⁻¹ = o_p(1).

Since by (A.1) the first term on the left-hand side of the preceding statement is o_p(1), we have

    (a') ⇒ n^{1/2} θ̂_R (X'X)⁻¹ = o_p(1);

thus (a') ⇒ (b'), so we can complete the proof by demonstrating (a').
Again let k denote λ₁⁻¹(X'X); we want to show that

    nk(1 − θ̂_R r(r−1)⁻¹) →p 0.

Now with T₁^{1/2} denoting a symmetric square root of T₁, θ̂_R can be expressed through the smallest eigenvalue of T₁^{−1/2}T₂T₁^{−1/2}. Manipulating the definitions of T₂ and T₁, we can show that T₂ can be written through the quadratic form

    Σ_{i=1}^r Σ_{j=1}^r [Cᵢ  Yᵢ]'R[Cⱼ  Yⱼ],    (2.2.3)

so the quantity above is a smallest-eigenvalue expression (2.2.4). Also, recall the representation (1.4.5) of T₁ (and remember that δ = 0 in our discussions of B̂_MR, so σ̃² = σ²), so we can rewrite (2.2.4) as an expression (2.2.5) in

    Σ̄ = Var(ε̄) = r⁻¹Σ.

Now we can proceed just as we did after (2.2.1) in the proof of Theorem 2.1 to show that the difference between the right-hand side of (2.2.5) and D_R converges in probability to zero. As before, D_R is not of full rank for any n, so its smallest eigenvalue is zero. Again we apply Lemma 2.3 to conclude that the smallest eigenvalue in (2.2.5) converges in probability to zero, and hence

    nk(1 − θ̂_R r(r−1)⁻¹) →p 0

as desired. □
The final results of this section concern our Fuller-type estimates and are easily demonstrated using by now familiar arguments.

Theorem 2.5. If (A.1) and (A.2) hold, then in Model II(i) and in Model II(ii), B̂_R →p B as n → ∞.

Proof. In Model II(i), by (1.4.7),

    B̂_R = (I_p + (X'X)⁻¹(X'U* + U*'X) + (X'X)⁻¹ r⁻¹(r−1)⁻¹ Σ_{i≠j} Uᵢ*'Uⱼ*)⁻¹    (2.2.6)
          × (B + (X'X)⁻¹(X'ε + U*'XB) + (X'X)⁻¹ r⁻¹(r−1)⁻¹ Σ_{i≠j} Uᵢ*'εⱼ).

As in the preceding theorem,

    (X'X)⁻¹(X'U* + U*'X) →p 0   and   (X'X)⁻¹(X'ε + U*'XB) →p 0.

The other two terms in (2.2.6) which we want to show tend to zero do so by (A.1), since both

    Σ_{i≠j} Uᵢ*'Uⱼ*   and   Σ_{i≠j} Uᵢ*'εⱼ

are O_p(n^{1/2}). In Model II(ii), ε is replaced in (2.2.6) by ε̄ and Σ_{i≠j} Uᵢ*'εⱼ by (r−1) Σ_{i=1}^r Uᵢ*'ε̄, and the argument is exactly the same as above. □
Summarizing the results of this section, we have, for each of the estimates B̂_M, B̂_MR, and B̂_R, succeeded in weakening the assumption (2.1.1) to (A.1) and (A.2) in our demonstrations of consistency. For the Model II estimates, we have reduced the assumption of a normal error distribution to an assumption that the errors E have an absolutely continuous distribution with finite variance. We did need to assume finite fourth moments for the rows of E for consistency of B̂_M, which Gleser (1981) did not have to do in considering his version of Model I.
2.3. Consistency with Many Replications.

In our Model II formulations, if δ = 0, so that the relation in question is exactly linear and all variation is due to measurement error (the situation in which we defined these estimates), we will be able to obtain consistency in another manner. In cases of this type, we will consider consistency with n ≥ p fixed, but now r → ∞; that is, we have a fixed number of experimental units to observe, but we can take as many observations as we wish on each. There is no asymptotic behavior of X to consider; X is a fixed matrix of which we get an increasingly better view as r gets large.
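The "increasingly better view" of the fixed X can be seen in a small simulation sketch (the particular design values and error distribution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # fixed true values; n = 5 stays fixed

def max_error_of_mean(r):
    # C_i = x + U_i for i = 1, ..., r; the replicate average estimates x
    U = rng.standard_normal((r, len(x)))
    return np.max(np.abs((x + U).mean(axis=0) - x))

errs = [max_error_of_mean(r) for r in (10, 1000, 100000)]
print(errs)  # shrinking as r grows: X comes into focus
```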
Theorem 2.6. In Model II(i), if n is fixed and δ = 0,

    (i)  B̂_R →p B   as r → ∞;
    (ii) B̂_MR →p B  as r → ∞;

in Model II(ii), under the same conditions, B̂_R →p B as r → ∞.
Int-fodel II(i), by (2.2.6)
+ (X' X)-l (X'O* + 0*' X + r-l(r-l)-l
x
II
. .
l;tJ
U~, U~))-l
J
1
H
(B + (X' x)-l (X' £ + U*' XB + r-l(r_l)-l
i;tj
U~' e:.)) •
1
J
Since different replicates are independent, the following all converge in
probability to zero by WLLK:
n
r-l(r -1) -1
jj*
i~j
(note why we require that 6
U~'U~
r
J
1
= 0; othenvise £
-1
(r -1)
e 6).
-1
U~' E.
1
Thus ~R
J
e B.
The
argument is essentially identical for ~ in Model II (ii).
As for B̂_MR, first note that we can deduce from our representations of T₁ and T₂ — (1.4.5) and (2.2.3) — that with n fixed, as r → ∞,

    r⁻¹T₁ →p nΣ   and   r⁻¹T₂ →p nΣ + H'RH

(recall H = [X  XB]), so

    T₁^{−1/2}T₂T₁^{−1/2} →p I_{p₂+1} + n⁻¹Σ^{−1/2}H'RHΣ^{−1/2}.

By Lemma 2.3,

    λ_{p₂+1}(T₁^{−1/2}T₂T₁^{−1/2}) →p 1,    (2.3.1)

since Σ^{−1/2}H'RHΣ^{−1/2} is not of full rank and thus has smallest eigenvalue zero. Now using (1.4.4) or (2.2.2), it is not too difficult to show that

    B̂_MR − B̂_R = (I_p + (r(r−1)⁻¹θ̂_R⁻¹ − 1)(r⁻¹(r−1)⁻¹ Σ_{i≠j} Cᵢ*'Cⱼ*)(r⁻¹ Σ_{i=1}^r Cᵢ*'Cᵢ*)⁻¹)⁻¹
                × ((r(r−1)⁻¹θ̂_R⁻¹ − 1)(r⁻¹(r−1)⁻¹ Σ_{i≠j} Cᵢ*'Cⱼ*))(r⁻¹ Σ_{i=1}^r Cᵢ*'Cᵢ*)⁻¹ (r⁻¹ Σ_{i=1}^r Cᵢ*'Yᵢ).    (2.3.2)

It is easily seen that

    r⁻¹(r−1)⁻¹ Σ_{i≠j} Cᵢ*'Cⱼ*,   (r⁻¹ Σ_{i=1}^r Cᵢ*'Cᵢ*)⁻¹,   and   r⁻¹ Σ_{i=1}^r Cᵢ*'Yᵢ

are all O_p(1) as r → ∞. It follows from (2.3.1) that

    r(r−1)⁻¹θ̂_R⁻¹ − 1 →p 0  as r → ∞,

and we have already shown that B̂_R →p B. Therefore (2.3.2) implies B̂_MR →p B. □
2.4. The Ordinary Least Squares Estimate.
As demonstrated in Chapter I, the ordinary least squares estimate (1.2.1) is generally inconsistent in Model I. Nevertheless, it is worth considering whether or not B̂_L ever does tend to B, for a number of obvious reasons, not the least of which is that the EIV estimates we have been considering all require some sort of specialized information and are not always applicable. We define a new condition concerning the asymptotic behavior of X,

    (A.3)  n⁻¹λ_p(X'X) → ∞  as n → ∞,

and obtain our first result along these lines:

Theorem 2.7. In Model I, if (A.2) and (A.3) hold, then B̂_L →p B as n → ∞.
Proof. Recall that

    B̂_L = (C*'C*)⁻¹(C*'Y),

so that

    B̂_L − B = (I_p + (X'X)⁻¹(X'U* + U*'X) + (X'X)⁻¹U*'U*)⁻¹    (2.4.1)
             × ((X'X)⁻¹X'(ε − U*B) + (X'X)⁻¹(U*'ε − U*'U*B)).

Clearly (A.3) ⇒ (A.1), and just as in Theorem 2.1,

    (X'X)⁻¹(X'U* + U*'X) →p 0,   (X'X)⁻¹X'(ε − U*B) →p 0.

But since U*'U* and U*'ε are O_p(n), (A.3) implies also that

    (X'X)⁻¹U*'U* →p 0,   (X'X)⁻¹U*'ε →p 0.   □
This result might seem a bit disconcerting at first, making one wonder what all the fuss concerning measurement errors is about. After all, (A.3) might often be a not unreasonable assumption; it certainly holds in the SLR example we considered earlier, in which the value of the independent variable was linearly related to the sample size. It should be clear that B̂_L can be badly inconsistent if (A.3) does not hold, which occurs, for example, if the independent variables are bounded. The main reason, though, that Theorem 2.7 is not particularly important is evidenced by some results of Anderson (1976); whether or not (A.3) holds, for moderate sample sizes B̂_L is generally not as likely to be close to B as the estimates derived specifically for EIV models. We have avoided discussing unbiasedness of estimators since, in general, finite moments do not exist for the types of estimates we have considered. Nevertheless, the distributions of the estimates designed for the EIV models are "centered" about the true values, while the distribution of B̂_L is not, even if (A.3) holds. Bias is present in B̂_L, but Theorem 2.7 holds since if the X values "spread out" at a fast enough rate, they eventually "overwhelm" all sources of error present. We cannot be very precise about this since fixed sample size theory is generally lacking in EIV models and is very unwieldy even in the simplest cases. In Anderson's consideration of this problem, he approximated distributions of some estimators in the SLR case and showed that B̂_M is "better" than B̂_L, in the sense of having higher probabilities of lying in intervals centered at β, unless some function of |β| and n is small.
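The attenuation suffered by B̂_L when the independent variable stays bounded (so that (A.3) fails) is easy to exhibit; a simulation sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta = 200000, 2.0
x = rng.uniform(-1.0, 1.0, n)             # bounded design: (A.3) fails
c = x + rng.standard_normal(n)            # observed with error, sigma_u = 1
y = beta * x + 0.1 * rng.standard_normal(n)

b_ols = (c @ y) / (c @ c)                 # least squares through the origin
# classical attenuation: b_ols -> beta * Var(x) / (Var(x) + sigma_u^2) = 0.5
print(b_ols)
```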
In the remainder of this section, we will not allow X to increase at the rate specified by (A.3), and the problem above will not arise. The consistent estimates we find will be "centered" about B in the aforementioned sense. Further restricting the behavior of X, we can produce a condition equivalent to consistency of B̂_L. Consider

    (A.4)  There exist M₁ > 0 and M₂ < ∞ such that for all n ≥ p,
           M₁ ≤ n⁻¹λ_p(X'X)  and  n⁻¹λ₁(X'X) ≤ M₂.
Theorem 2.8. In Model I, if (A.4) holds, then B̂_L →p B as n → ∞ if and only if Σ_εU = Σ_U B₂.

Remark. Fuller (1980) noticed the sufficiency of a similar condition in a slightly less general setting, assuming (2.1.1).

Proof. From (2.4.1),

    B̂_L − B = (I_p + (X'X)⁻¹(X'U* + U*'X) + (X'X)⁻¹U*'U*)⁻¹
             × ((X'X)⁻¹X'(ε − U*B) + (X'X)⁻¹(U*'ε − U*'U*B)).

It is easily shown that (A.4) implies (A.1) and (A.2), so as in Theorem 2.1,

    (X'X)⁻¹(X'U* + U*'X) →p 0,   (X'X)⁻¹X'(ε − U*B) →p 0.
Since

    U*'ε = nΣ_εU* + O_p(n^{1/2}),   U*'U* = nΣ_U* + O_p(n^{1/2}),

and n(X'X)⁻¹ is bounded by (A.4), we can express B̂_L − B as

    B̂_L − B = (I_p + n(X'X)⁻¹Σ_U*)⁻¹ n(X'X)⁻¹(Σ_εU* − Σ_U*B) + o_p(1)    (2.4.2)
            = (n⁻¹X'X + Σ_U*)⁻¹(Σ_εU* − Σ_U*B) + o_p(1),

and the sufficiency is obvious since

    (Σ_εU* − Σ_U*B)' = [0'   Σ_εU' − B₂'Σ_U'].

As for demonstrating necessity, assume otherwise, i.e., Σ_εU ≠ Σ_U B₂ but B̂_L →p B. Then by (2.4.2) the following must vanish:

    (Σ_εU* − Σ_U*B)'(n⁻¹X'X + Σ_U*)⁻²(Σ_εU* − Σ_U*B).

This term exceeds

    λ_p²((n⁻¹X'X + Σ_U*)⁻¹) ||Σ_εU* − Σ_U*B||²,

so we must have

    λ_p((n⁻¹X'X + Σ_U*)⁻¹) → 0
    ⇒ λ_p((n⁻¹X'X)⁻¹) → 0
    ⇒ n⁻¹λ₁(X'X) → ∞,

which contradicts (A.4). □
45
Corollary 2.9.
If
£
is independent of U, then B is estimated
consistently by least squares if and only if B =
2
Proof.
LEU
o.
= 0, so we would need !:uB2 = 0, and
L
U
has full rank. 0
Now we certainly should not want the consistency of an estimate of B to depend on Σ, over which the experimenter generally has little control, much less on the value of B itself. The main value of the preceding result thus seems to be that the necessity of the condition of the theorem provides more evidence to discourage the use of ordinary least squares in estimating B. It says nothing, however, about the potential of consistently estimating certain linear functions of the elements of B. Particular contrasts are the main quantities of interest in many situations. The remainder of this section deals with an answer to this question and some applications.

We will let γ'B denote a contrast of the elements of B, with

    γ' = [γ₁'   γ₂'],   γ₁' 1×p₁,   γ₂' 1×p₂.
Theorem 2.10. Under the conditions of Theorem 2.8, γ'B̂_L →p γ'B as n → ∞ if and only if

    γ₂' − γ₁'(X₁'X₁)⁻¹X₁'X₂ → 0.

Proof. Letting

    Q_n = n⁻¹X₂'RX₂ + Σ_U,

we have by (2.4.2),

    γ'(B̂_L − B) = γ'(n⁻¹X'X + Σ_U*)⁻¹(Σ_εU* − Σ_U*B) + o_p(1)
                = (γ₂' − γ₁'(X₁'X₁)⁻¹X₁'X₂) Q_n⁻¹ (Σ_εU − Σ_U B₂) + o_p(1).

Since (A.4) bounds Q_n from above, our condition is sufficient. Since (A.4) also bounds Q_n from below, it is also necessary: arguing as in the preceding theorem, if γ'B̂_L were consistent but our condition did not hold, then we would need λ₁(Q_n) → ∞ (see Lemma 2.2), which violates our hypothesis. □
This result is potentially more appealing than the prior one: the conditions for consistency of a particular contrast now depend only on the behavior of X, over which the experimenter may well have control. We illustrate some uses of this result with examples, the first of which is simple but instructive:
Example 1. In the SLR model (1.2.3),

    X₁' = [1 ... 1],   X₂' = [x₁ ... x_n],

so

    (X₁'X₁)⁻¹X₁'X₂ = n⁻¹ Σ_{i=1}^n xᵢ = x̄,

and a contrast γ₁α + γ₂β is estimated consistently by least squares if and only if γ₂ − γ₁x̄ → 0, which holds if γ₂ − γ₁c̄ →p 0. It follows that we can consistently estimate exactly one point on the line (we certainly can't estimate two in light of Theorem 2.8):

    α̂_L + x̄β̂_L − (α + βx̄) →p 0.

Thus we can consistently estimate the response at the level which is the mean of the values of the independent variable (this mean value is itself being estimated). This is not surprising and could be deduced without the theorem: the least squares regression line must pass through the point (c̄, ȳ), which must approach (x̄, α + βx̄), which lies on the true line. This example illustrates the operation of the theorem, though, and can be used to motivate Wald's method of grouping (see Wald (1940), Madansky (1959)): if we divide the data into two groups and can from each group consistently estimate one point on the line, then the line connecting those two points should consistently estimate the true regression line.
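A simulation sketch makes both points of the example concrete (the design, error scales, and grouping rule are illustrative; the grouping is by design order so that it is independent of the measurement errors, as Wald's method requires):

```python
import numpy as np

rng = np.random.default_rng(3)
n, alpha, beta = 100000, 1.0, 2.0
x = np.linspace(0.0, 4.0, n)              # fixed design values
c = x + rng.standard_normal(n)            # observed with error
y = alpha + beta * x + 0.2 * rng.standard_normal(n)

# least squares on (c, y): the slope is attenuated toward zero...
b = np.cov(c, y, bias=True)[0, 1] / np.var(c)
a = y.mean() - b * c.mean()
# ...but the fitted value at c-bar equals y-bar, which is consistent
# for the response at the mean design point, alpha + beta * x-bar
print(a + b * c.mean(), alpha + beta * x.mean())

# Wald's grouping: split by design order, join the two estimated points
lo = np.arange(n) < n // 2
b_wald = (y[~lo].mean() - y[lo].mean()) / (c[~lo].mean() - c[lo].mean())
print(b, b_wald)  # b_wald recovers beta; b does not
```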
Our next example is less obvious and perhaps more useful than the first.

Example 2. Consider a two-class analysis of covariance model with one covariate:

    Y_ij = αᵢ + βx_ij + ε_ij,   C_ij = x_ij + u_ij,   i = 1, 2,   j = 1, ..., nᵢ,

    X₁' = [1 ... 1  0 ... 0 ; 0 ... 0  1 ... 1],   X₂' = [x₁₁ ... x_{1n₁}  x₂₁ ... x_{2n₂}],

so

    (X₁'X₁)⁻¹X₁'X₂ = [x̄₁.  x̄₂.]',

and a contrast γ₁'[α₁  α₂]' + γ₂β is estimated consistently if

    γ₂ − γ₁'[x̄₁.  x̄₂.]' → 0.

In particular, if γ₁' = [1  −1], γ₂ = 0, and x̄₁. − x̄₂. → 0, then

    α̂_1L − α̂_2L →p α₁ − α₂;

that is, we can consistently estimate the treatment difference, which is generally the quantity of most interest in such situations. We can guarantee that our condition on the x values holds if we can assign subjects to treatments in such a way as to keep the observed treatment means approximately equal.
Of course, while estimation of the treatment difference may be of chief
interest, it is rarely all one wishes to do in such situations:
confidence intervals, for example, or tests for equal slopes would require
information not obtainable through ordinary least squares.
Finally, both these examples generalize to broader settings: we can
consistently estimate the response when all independent variables are at
their mean levels, and in a k-class covariance model with any number of
covariates, we can consistently estimate any linear function of intercept
terms as long as the same function of the corresponding class means tends
to zero for each covariate.
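A sketch of the two-class case with one covariate (sample sizes, parameter values, and the matched assignment that keeps the class means equal are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
m, a1, a2, beta = 50000, 1.0, 3.0, 2.0
x1 = rng.uniform(0.0, 2.0, m)
x2 = np.copy(x1)                           # matched assignment: xbar_1 = xbar_2
c1 = x1 + rng.standard_normal(m)           # covariate observed with error
c2 = x2 + rng.standard_normal(m)
y1 = a1 + beta * x1 + 0.2 * rng.standard_normal(m)
y2 = a2 + beta * x2 + 0.2 * rng.standard_normal(m)

X = np.column_stack([
    np.r_[np.ones(m), np.zeros(m)],        # class 1 indicator
    np.r_[np.zeros(m), np.ones(m)],        # class 2 indicator
    np.r_[c1, c2],                         # observed covariate
])
coef, *_ = np.linalg.lstsq(X, np.r_[y1, y2], rcond=None)
print(coef[0] - coef[1], a1 - a2)          # treatment difference: consistent
print(coef[2], beta)                       # slope: attenuated, inconsistent
```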
CHAPTER III
Limiting Distributions
3.0. Introduction.
The consistency considerations of the previous chapter are important, but are not sufficient for achieving the goals of many statistical investigations. For purposes of statistical inference, we would like to have some knowledge of the distributions of our estimates. Fixed sample size distribution theory would depend on the specific forms of the error distributions, about which we often have little knowledge and would like to assume as little as possible; also, as mentioned earlier, even in the simplest EIV formulations, details of small sample theory have been shown to be extremely unwieldy. We will restrict our attention to asymptotic distributions; we should be able to do this in our general formulations with a minimum of new assumptions. In particular, we will demonstrate that, appropriately standardized, our estimates have limiting normal distributions.
3.1. Related Results.

Again, as our starting point we will consider the present state of the literature concerning limiting distributions of the estimates we are considering. Fuller (1980) demonstrated that for his class of estimates (1.2.10), n^{1/2}(B̂_F − B) has a limiting zero mean normal distribution; he assumed, as usual, that the sequence X satisfies (2.1.1) and, furthermore, that the errors themselves are normally distributed, an assumption we would certainly like to eliminate. The question of limiting distributions apparently has not been answered for B̂_MR in Model II(i) or for B̂_M in our general Model I (Healy did not address this question). Gleser (1981) did treat this problem in considerable detail in his version of our Model I (all variables except an intercept observed with error) and claimed that n^{1/2}(B̂_M − B) has a limiting normal distribution if (2.1.1) holds and the distribution of E possesses finite fourth moment. His argument, however, is in error; his conclusions depend directly on an erroneous result which he states as follows:
Let y₁, y₂, ... be a sequence of mutually independent s-dimensional random vectors, where yᵢ has mean vector zero and finite covariance matrix Vᵢ. If

    lim_{n→∞} n⁻¹ Σ_{i=1}^n Vᵢ = V* exists (and is finite),

then

    n^{−1/2} Σ_{i=1}^n yᵢ →d N_s(0, V*).
This result is not correct as stated, as is seen by considering the following example (due to D. Ruppert). For k = 1, 2, ..., let x_k be a random variable such that

    P(x_k = 2^{k/2})  = 2^{−(k+1)},
    P(x_k = −2^{k/2}) = 2^{−(k+1)},
    P(x_k = 0)        = 1 − 2^{−k}.

Now P(x_k ≠ 0) = 2^{−k}, so

    Σ_{k=1}^∞ P(x_k ≠ 0) < ∞,

and by the Borel–Cantelli lemma,

    P(x_k ≠ 0 infinitely often) = 0
    ⇒ P(∃ n₀ such that x_n = 0 for all n > n₀) = 1
    ⇒ P(n^{−1/2} Σ_{i=1}^n xᵢ → 0) = 1.

Hence n^{−1/2} Σ_{i=1}^n xᵢ → 0 a.s. Thus we have a sequence {xᵢ}, mutually independent, each with mean zero and variance one, yet n^{−1/2} Σ_{i=1}^n xᵢ is not asymptotically N(0,1).
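The two ingredients of the counterexample — unit variances but a summable sequence of nonzero probabilities — can be checked directly (the cutoff K = 30 is an illustrative truncation):

```python
import numpy as np

# Ruppert's example: P(x_k = +-2^{k/2}) = 2^{-(k+1)} each, P(x_k = 0) = 1 - 2^{-k}
k = np.arange(1, 31)
p_nonzero = 2.0 ** (-k.astype(float))

# each x_k has mean zero (by symmetry) and variance exactly one...
var_k = p_nonzero * 2.0 ** k.astype(float)
print(var_k.min(), var_k.max())            # 1.0 1.0

# ...yet sum_k P(x_k != 0) is finite, so by Borel-Cantelli only finitely
# many x_k are nonzero a.s., and n^{-1/2} * sum x_k -> 0 a.s.
print(p_nonzero.sum())                     # 1 - 2^{-30}, bounded by 1
```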
The conclusion of Gleser's lemma can be shown to be true if we add some condition which involves finite moments of order higher than two. As Gleser applies this result to Model I, the random variables of the lemma involve errors squared; thus if the lemma were restated to require more than two finite moments, we would need the errors to have more than four moments finite. We would like to avoid this if possible. Limiting distribution theory seems to be needed, then, in each of the cases we are considering. As in Chapter II, we may obtain some direction
by considering what occurs in the usual regression situation:

    B̂_L = (X'X)⁻¹X'Y  ⇒  B̂_L − B = (X'X)⁻¹X'ε;

properly standardized, this has a limiting multivariate normal distribution if and only if every linear combination has a limiting normal distribution (Cramér–Wold). Now for γ ∈ R^p,

    γ'(B̂_L − B) = γ'(X'X)⁻¹X'ε = γ*'ε = Σ_{i=1}^n γᵢ*εᵢ,  say.

We make the standard assumption that the εᵢ's are i.i.d. with mean zero and finite variance σ_ε²; it is well-known that this quantity has a limiting normal distribution if and only if γ* satisfies the so-called Noether condition:

    lim_{n→∞} max_{1≤i≤n} γᵢ*² / Σ_{j=1}^n γⱼ*² = 0.    (3.1.1)

Thus we need

    lim_{n→∞} max_{1≤i≤n} γ'(X'X)⁻¹XᵢXᵢ'(X'X)⁻¹γ / γ'(X'X)⁻¹γ = 0    (3.1.2)

for all γ ∈ R^p, where Xᵢ is the i-th row of X. A workable sufficient condition which is easily demonstrated to imply (3.1.2) is

    lim_{n→∞} max_{1≤i≤n} ||Xᵢ||² λ_p⁻¹(X'X) = 0.    (3.1.3)

If such a condition is satisfied, then

    σ_ε⁻¹ (X'X)^{1/2} (B̂_L − B) →d N_p(0, I_p).

(For a more detailed argument along the lines of the one we have sketched above, see Eicker (1963); he does not assume that the errors are i.i.d., and he shows that individual B̂ᵢ's are asymptotically normal without considering joint distributions.)
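Condition (3.1.3) is straightforward to evaluate for a concrete design; in this sketch the design with rows (1, k/n) is an illustrative choice:

```python
import numpy as np

def noether_3_1_3(n):
    """max_i ||X_i||^2 / lambda_p(X'X) for the design with rows (1, k/n)."""
    k = np.arange(1, n + 1, dtype=float)
    X = np.column_stack([np.ones(n), k / n])
    max_row = (X ** 2).sum(axis=1).max()
    lam_p = np.linalg.eigvalsh(X.T @ X)[0]
    return max_row / lam_p

print([noether_3_1_3(n) for n in (10, 100, 1000)])  # decreasing toward zero
```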
In demonstrating the asymptotic normality of our EIV estimates, we will use the following result:

Theorem 3.1. Let {xᵢ} and {yᵢ} be two sequences of random variables, each i.i.d. with zero mean, positive finite variances σ_x² and σ_y² respectively, and Cov(xᵢ, yⱼ) = δᵢⱼσ_xy. Let {aᵢ} be a sequence of constants satisfying

    lim_{n→∞} n⁻¹ Σ_{i=1}^n aᵢ² = a²,   0 < a² < ∞.    (3.1.4)

Then with

    s_n² = σ_x² Σ_{i=1}^n aᵢ² + nσ_y² + 2σ_xy Σ_{i=1}^n aᵢ,   s₁² > 0,

s_n⁻¹ Σ_{i=1}^n (aᵢxᵢ + yᵢ) converges in distribution to a standard normal random variable.
Remark. Equation (3.1.4) is far from necessary, but it is all we will require in the next section. Note that the conclusion of Theorem 3.1 also holds if n⁻¹ Σ_{i=1}^n aᵢ² → 0, or if n⁻¹ Σ_{i=1}^n aᵢ² → ∞ and {aᵢ} satisfies the Noether condition, since in those two cases s_n⁻¹ Σ_{i=1}^n (aᵢxᵢ + yᵢ) equals

    σ_y⁻¹ n^{−1/2} Σ_{i=1}^n yᵢ + o_p(1)   and   σ_x⁻¹ (Σ_{i=1}^n aᵢ²)^{−1/2} Σ_{i=1}^n aᵢxᵢ + o_p(1),

respectively.
In proving the theorem, we will use the following lemma, which seems to be well-known although a proof was not found. For completeness, we will include its proof in Section 3.4.

Lemma 3.2. If the sequence of constants {aᵢ} satisfies (3.1.4), then

    (i)  (Σ_{i=1}^n aᵢ²)⁻¹ max_{1≤k≤n} a_k² → 0  as n → ∞;
    (ii) n⁻¹ max_{1≤k≤n} a_k² → 0  as n → ∞.

Proof of Theorem 3.1.
Since

    s_n² = Σ_{i=1}^n Var(aᵢxᵢ + yᵢ),

we must verify that the Lindeberg condition holds, i.e., for all ε > 0,

    lim_{n→∞} s_n⁻² Σ_{i=1}^n E((aᵢxᵢ + yᵢ)² I(|aᵢxᵢ + yᵢ| > εs_n)) = 0.    (3.1.5)

In light of (3.1.4), this is equivalent to

    lim_{n→∞} n⁻¹ Σ_{i=1}^n E((aᵢxᵢ + yᵢ)² I(|aᵢxᵢ + yᵢ| > 2εn^{1/2})) = 0

(note that since by the Cauchy–Schwarz inequality

    n Σ_{i=1}^n aᵢ² ≥ (Σ_{i=1}^n aᵢ)²,

we have

    lim sup_{n→∞} n⁻¹ |Σ_{i=1}^n aᵢ| ≤ a).

Since (p+q)² ≤ 2(p² + q²) for all p and q, and |r + s| > t implies |r| > ½t or |s| > ½t, the limit above is bounded by

    2 lim_{n→∞} n⁻¹ Σ_{i=1}^n E((aᵢ²xᵢ² + yᵢ²)(I(|aᵢxᵢ| > εn^{1/2}) + I(|yᵢ| > εn^{1/2}))).
It will suffice to show

    (a) n⁻¹ Σ_{i=1}^n E(aᵢ²xᵢ² I(|aᵢxᵢ| > εn^{1/2})) → 0;
    (b) n⁻¹ Σ_{i=1}^n E(aᵢ²xᵢ² I(|yᵢ| > εn^{1/2})) → 0;
    (c) n⁻¹ Σ_{i=1}^n E(yᵢ² I(|aᵢxᵢ| > εn^{1/2})) → 0;
    (d) n⁻¹ Σ_{i=1}^n E(yᵢ² I(|yᵢ| > εn^{1/2})) → 0.

Now (d) is immediate by the necessity of the Lindeberg condition, since the sum of i.i.d. random variables with finite variance is well-known to be asymptotically normal. By Lemma 3.2, the Noether condition holds for the sequence {aᵢ}; this is equivalent to the Lindeberg condition applied to the random variables {aᵢxᵢ}, so (a) follows similarly. Now the term in (b) equals

    n⁻¹ Σ_{i=1}^n aᵢ² E(xᵢ² I(|yᵢ| > εn^{1/2}));

using (3.1.4), we have to show

    lim_{n→∞} E(x₁² I(|y₁| > εn^{1/2})) = 0.

The integrand is dominated by the integrable function x₁², so we have

    E(x₁² lim_{n→∞} I(|y₁| > εn^{1/2})) = 0.

As for (c),

    n⁻¹ Σ_{i=1}^n E(yᵢ² I(|aᵢxᵢ| > εn^{1/2})) ≤ E(y₁² I(|x₁| max_{1≤k≤n}|a_k| > εn^{1/2})).

From Lemma 3.2, we have

    lim_{n→∞} n^{−1/2} max_{1≤k≤n} |a_k| = 0;

as in (b), a dominated convergence argument shows that the expectation above goes to zero as n → ∞. □

3.2. Asymptotic Normality of Estimates of B.
In this section, we show that the estimates of B which we have been considering can be expressed in a form in which we can use Theorem 3.1 to demonstrate their asymptotic normality. Throughout this section, we will assume that X satisfies the condition referred to in equation (2.1.1):

    (A.5)  lim_{n→∞} n⁻¹X'X = Δ, with Δ positive definite.

This merits some explanation, since we went to such great efforts to avoid this assumption in Chapter II. First of all, it should be clear from the preceding section that normality results using (A.5) will be some contribution to the existing literature. If we try to avoid (A.5), we are building up a formidable list of assumptions which X must satisfy. In our arguments, we will use the fact that our estimates are consistent, so we will have to assume (A.1) and (A.2); from the preceding section, it seems clear at this stage that we will need some sort of Noether analog at least as strong as (3.1.2) or (3.1.3), and we still have a long way to go. One way to avoid such a list of assumptions is to assume (A.5), which implies all the conditions just referred to. Basically, we want to assume that the relationship between X'X and n does not "get out of hand"; we could obtain essentially the same results by similar arguments if we assume, say, (A.4) and (3.1.3), but using (A.5) instead will streamline our results without any real loss of generality. Also, it will shortly become clear that our standardized estimates will be expressed as the sum of two terms, one of which becomes negligible if n⁻¹X'X tends either to 0 or to ∞ (see the remark following Theorem 3.1). For moderate sample sizes, it may be more realistic to have variation associated with these terms expressed in the "limiting" variance; thus even if (A.5) does not hold, the results we obtain using it may be more useful in practice than results obtained using technically correct assumptions. A final point to be made on this subject is that it is not clear that the variances we derive can be consistently estimated without an assumption such as (A.5); more will be said of this later.
Theorem 3.3. If (A.5) holds, then:

(i) in Model I, if the distribution of the rows of E possesses finite fourth moment, then $n^{1/2}(\hat B_M - B)$ has a limiting zero-mean multivariate normal distribution as $n \to \infty$; if the third and fourth moments of the distribution of the rows of E are the same as those of the normal distribution, the limiting covariance matrix is given by (3.2.1), where

$d = [B_2' \;\; -1]\Sigma[B_2' \;\; -1]'$

and the second term of (3.2.1) involves $([I_{p_2} \;\; B_2]\Sigma^{-1}[I_{p_2} \;\; B_2]')^{-1}$;

(ii) in Model II(i), $n^{1/2}(\hat B_{MR} - B)$ has a limiting zero-mean multivariate normal distribution, with covariance matrix given by (3.2.2);

(iii) in Model II(i),

$n^{1/2}(\hat B_R - B) \to_L N_p\Big(0,\; \Delta^{-1}\big(r^{-1}(d - \sigma_\delta^2)\Delta + \sigma_\delta^2(\Delta + r^{-1}\Sigma_u^*) + r^{-1}(r-1)^{-1}\big((d - \sigma_\delta^2)\Sigma_u^* + D_R\big)\big)\Delta^{-1}\Big)$;  (3.2.3)

(iv) in Model II(ii),

$n^{1/2}(\hat B_R - B) \to_L N_p\Big(0,\; \Delta^{-1}\big(\sigma^2(\Delta + r^{-1}\Sigma_u^*) + r^{-1}d_R\Delta + r^{-1}(r-1)^{-1}(d_R\Sigma_u^* + D_R)\big)\Delta^{-1}\Big)$,  (3.2.4)

where $d_R = B'\Sigma_u^*B$ and $D_R = \Sigma_u^*BB'\Sigma_u^*$.

Remarks. (1) In the proof below, we simply demonstrate the asymptotic normality of the estimates; the computation of the variance matrices is somewhat tedious and is deferred to Section 3.4 for the interested reader.
(2) As stated in part (i), the assumption that the error moments are those of the normal distribution is not necessary for asymptotic normality of $\hat B_M$. Nevertheless, the limiting variance depends on the third and fourth moments; in general, the expression is very unwieldy and can be evaluated numerically in particular cases using formulas in Section 3.4. In stating the theorem, we thus assume that the moments are those of the normal distribution (as did Gleser), since this yields a concise expression more easily compared with those of other estimates.

(3) In parts (ii), (iii), and (iv), we have made no assumptions on the error distribution beyond those contained in the definition of Model II (i.e., we need only two finite moments).
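As a rough numerical illustration of the theorem's conclusion, the following sketch simulates a scalar Model II(ii) setting (r = 2 replicates of the carrier, one observation of Y) and checks that the standardized replication estimate behaves like a zero-mean normal. All settings below (the slope, the error standard deviations, the sample sizes) are arbitrary illustrative choices, not values from the text.

```python
import math, random

# Sketch: Monte Carlo check of asymptotic normality for the Model II(ii)
# replication estimate (r = 2, scalar slope, no intercept).
# All settings (beta, sig_u, sig_e, n, m) are arbitrary illustrative choices.
random.seed(1)
beta, sig_u, sig_e, n, m = 1.0, 0.7, 0.5, 100, 1000

zs = []
for _ in range(m):
    x = [random.uniform(-2.0, 2.0) for _ in range(n)]
    c1 = [xi + random.gauss(0.0, sig_u) for xi in x]
    c2 = [xi + random.gauss(0.0, sig_u) for xi in x]
    y = [beta * xi + random.gauss(0.0, sig_e) for xi in x]
    # cross-product form of the replication estimate: independent replicate
    # errors "cancel" in expectation, avoiding attenuation bias
    num = sum((a + b) * yi for a, b, yi in zip(c1, c2, y))
    den = 2.0 * sum(a * b for a, b in zip(c1, c2))
    zs.append(math.sqrt(n) * (num / den - beta))

mean = sum(zs) / m
sd = math.sqrt(sum((z - mean) ** 2 for z in zs) / m)
inside = sum(1 for z in zs if abs(z - mean) < 1.96 * sd) / m
print(round(mean, 3), round(inside, 3))
```

The standardized estimates have mean near zero and roughly 95% of them fall within 1.96 sample standard deviations of the mean, as the normal limit suggests.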
Proof of Theorem 3.3. Part (i): using (1.2.5) and (1.2.6), it is easily shown that

$W^*[\hat B_M' \;\; -1]' = \hat\theta\,\Sigma^*[\hat B_M' \;\; -1]'$.

Partitioning $W^*$ in an obvious manner, the first p rows of this equation give

$(W^*_{11} - \hat\theta\Sigma^*_{uo})\hat B_M = W^*_{12} - \hat\theta\Sigma^*_{\epsilon uo}$.  (3.2.5)

Since

$H = [X \;\; XB] = X[I_p \;\; B]$,

it is easy to show that

$E(W^*) = [I_p \;\; B]'X'X[I_p \;\; B] + n\Sigma^* = H'H + n\Sigma^*$.  (3.2.6)

Now $W^*_{11}$ is just $C^{*\prime}C^*$, so, after some algebra with the partitioned forms, (3.2.5) yields, up to terms which can be shown to be negligible,

$\hat B_M - B = -(C^{*\prime}C^* - \hat\theta\Sigma^*_{uo})^{-1}\,(W^* - E(W^*))[B' \;\; -1]'$,  (3.2.7)

where only the first p rows of the (p+1)-vector on the right enter. Now (A.5) implies both (A.1) and (A.2); under those conditions, we showed in the proof of Theorem 2.1 that

$(C^{*\prime}C^* - \hat\theta\Sigma^*_{uo})^{-1}X'X = I_p + o_p(1)$,

so by (A.5),

$n(C^{*\prime}C^* - \hat\theta\Sigma^*_{uo})^{-1} = \Delta^{-1} + o_p(1)$.

Also, we know that $\hat B_M \to_p B$. Thus if

$(W^* - E(W^*))[B' \;\; -1]' = O_p(n^{1/2})$,  (3.2.8)

we obtain from (3.2.7) that $n^{1/2}(\hat B_M - B)$ equals a matrix converging in probability to a constant, times

$n^{-1/2}(W^* - E(W^*))[B' \;\; -1]'$,  (3.2.9)

plus $o_p(1)$; so we have reduced the problem of finding a limiting distribution for $\hat B_M$ to that of finding a limiting distribution for $n^{-1/2}(W^* - E(W^*))[B' \;\; -1]'$.

Recalling that $H = [X \;\; XB]$ and $E^* = [U^* \;\; \epsilon]$, we have $W^* = (H + E^*)'(H + E^*)$; but $H[B' \;\; -1]' = 0$ and $E(W^*) = H'H + n\Sigma^*$, so for all $\gamma \in \mathbb{R}^{p+1}$,

$\gamma'(W^* - E(W^*))[B' \;\; -1]' = \gamma'(H'E^* + E^{*\prime}E^* - n\Sigma^*)[B' \;\; -1]' = \sum_{i=1}^n \gamma'(H_iE_i^{*\prime} + E_i^*E_i^{*\prime} - \Sigma^*)[B' \;\; -1]'$,

where $H_i$ and $E_i^*$ denote the $i$th rows of H and $E^*$, respectively. If $\gamma' = k[B' \;\; -1]$ for some constant $k \neq 0$, then each $\gamma'H_i = 0$, and limiting normality follows trivially since we have a sum of i.i.d. random variables with finite variance. For all other $\gamma$,

$n^{-1}\sum_{i=1}^n (\gamma'H_i)^2 = n^{-1}\gamma'H'H\gamma \to \gamma'[I_p \;\; B]'\Delta[I_p \;\; B]\gamma > 0;$

the sequences

$\{E_i^{*\prime}[B' \;\; -1]'\}$ and $\{\gamma'(E_i^*E_i^{*\prime} - \Sigma^*)[B' \;\; -1]'\}$

are each i.i.d. with zero means and finite variances, and Theorem 3.1 applies. Thus for all $\gamma \in \mathbb{R}^{p+1}$,

$n^{-1/2}\gamma'(W^* - E(W^*))[B' \;\; -1]' \to_L N\big(0,\; \gamma'(d([I_p \;\; B]'\Delta[I_p \;\; B] + \Sigma^*) + D)\gamma\big)$

with

$D = \Sigma^*[B' \;\; -1]'[B' \;\; -1]\Sigma^*$

(see Section 3.4 for the variance derivation), so (3.2.8) holds and the result (3.2.1) follows after some algebra.
Part (ii): this is somewhat similar to part (i), so we will omit some repetitious details. We know the equations that $\hat B_{MR}$ and $\hat B_R$ satisfy; with $T_1^*$ and $T_2^*$ partitioned similarly as in (1.4.13), these imply relations (3.2.10) analogous to (3.2.5). It is then easily shown, just as in part (i), that $n^{1/2}(\hat B_{MR} - B)$ equals a matrix converging in probability to a constant, times

$n^{-1/2}(T_2^* - r(r-1)^{-1}T_1^*)[B' \;\; -1]'$,  (3.2.11)

plus $o_p(1)$, provided the latter quantity is $O_p(1)$. But in Theorem 2.4 we showed that

$\Big(\sum_{i=1}^r\sum_{j=1}^r C_i^{*\prime}P_{ij}C_j^*\Big)^{-1}X'X = r^{-1}I_p + o_p(1)$;

using (A.5), this identifies the limiting matrix. Now

$T_2^* = \sum_{i=1}^r (H + E_i^*)'(H + E_i^*)$,  $T_1^* = r^{-1}\Big((r-1)\sum_{i=1}^r E_i^{*\prime}E_i^* - \sum\sum_{i\neq j} E_i^{*\prime}E_j^*\Big)$,

so, since $H[B' \;\; -1]' = 0$, for all $\gamma \in \mathbb{R}^{p+1}$,

$\gamma'(T_2^* - r(r-1)^{-1}T_1^*)[B' \;\; -1]' = \gamma'\Big(\sum_{i=1}^r H'E_i^* + (r-1)^{-1}\sum\sum_{i\neq j}E_i^{*\prime}E_j^*\Big)[B' \;\; -1]'$
$= \gamma'\sum_{k=1}^n\Big(\sum_{i=1}^r H_kE_{ik}^{*\prime} + (r-1)^{-1}\sum\sum_{i\neq j}E_{ik}^*E_{jk}^{*\prime}\Big)[B' \;\; -1]'$,  (3.2.12)

where $E_{ik}^*$ denotes the $k$th row of $E_i^*$. As before, the sequences

$\Big\{\sum_{i=1}^r E_{ik}^{*\prime}[B' \;\; -1]'\Big\}$ and $\Big\{\gamma'\sum\sum_{i\neq j}E_{ik}^*E_{jk}^{*\prime}[B' \;\; -1]'\Big\}$

are i.i.d., each with mean zero and finite variance; $\{\gamma'H_k\}$ is just as in part (i). Thus by Theorem 3.1, asymptotic normality of $n^{-1/2}(T_2^* - r(r-1)^{-1}T_1^*)[B' \;\; -1]'$, and hence of $n^{1/2}(\hat B_{MR} - B)$, follows as before.
Part (iii): recall the cross-product form of $\hat B_R$, which gives

$n^{1/2}(\hat B_R - B) = n\Big(\sum\sum_{i\neq j} C_i^{*\prime}C_j^*\Big)^{-1} n^{-1/2}\sum\sum_{i\neq j} C_i^{*\prime}(Y_j - C_j^*B)$  (3.2.13)

as long as

$\sum\sum_{i\neq j} C_i^{*\prime}(Y_j - C_j^*B) = O_p(n^{1/2})$,

since it is easy to show that

$n^{-1}\sum\sum_{i\neq j} C_i^{*\prime}C_j^* = r(r-1)\Delta + o_p(1)$.

But for all $\gamma \in \mathbb{R}^p$,

$\gamma'\sum\sum_{i\neq j} C_i^{*\prime}(Y_j - C_j^*B) = -\gamma'\sum_{k=1}^n\Big((r-1)X_k\sum_{j=1}^r E_{jk}^{*\prime} + \sum\sum_{i\neq j} U_{ik}^*E_{jk}^{*\prime}\Big)[B' \;\; -1]'$.  (3.2.14)

The sequences

$\Big\{\sum_{j=1}^r E_{jk}^{*\prime}[B' \;\; -1]'\Big\}$

are each i.i.d. with zero means and finite variances. Also

$n^{-1}\sum_{k=1}^n ((r-1)\gamma'X_k)^2 = n^{-1}(r-1)^2\gamma'X'X\gamma \to (r-1)^2\gamma'\Delta\gamma > 0;$

it follows from Theorem 3.1 and calculations in Section 3.4 that $n^{-1/2}\sum\sum_{i\neq j}C_i^{*\prime}(Y_j - C_j^*B)$ has a limiting normal distribution, and (3.2.3) follows.

Part (iv) is essentially the same, apart from some differences in the way we calculate the limit variances. In (3.2.13) and (3.2.14), we replace $Y_j$ with $Y$ and $E_{jk}$ with $[U_{jk} \;\; \epsilon_k]$ and argue just as above. □
Knowledge of the limiting distributions of the estimates of B is, of course, not of much practical use unless we can consistently estimate the covariance matrices. In each of the four parts of Theorem 3.3, we need estimates of B, $\Delta$, and some components of the error variance. Estimation of B is no longer a problem, of course; we consider the other estimates we need for each part of the theorem.
Part (ii): we need to estimate $\Delta$ and $\Sigma$. Recall that when we deal with $\hat B_{MR}$, we are assuming $\sigma_\delta^2 = 0$, in which case

$n^{-1}(r-1)^{-1}T_1 \to_p \Sigma$.

We mentioned in the proof of Theorem 3.3 that $n^{-1}\sum_i\sum_j C_i^{*\prime}P_{ij}C_j^* \to_p r\Delta$. Alternately, we could just use the information from replication:

$(nr(r-1))^{-1}\sum\sum_{i\neq j} C_i^{*\prime}C_j^* \to_p \Delta$.  (3.2.15)

Part (iii): we can estimate $\Delta$ as in (3.2.15). Again we need to estimate $\Sigma$, but we also need estimates of $\sigma_\delta^2$ and $\sigma_v^2$. In the current case, $\Sigma$ and $\sigma_v^2$ can be estimated from the within-replicate variation as in part (ii), and we can consistently estimate $\sigma_\delta^2$ upon noting that

$n^{-1}r^{-1}\sum_{j=1}^r Y_j'Y_j \to_p B'\Delta B + \sigma_v^2 + \sigma_\delta^2$.  (3.2.16)

Part (iv): essentially the same as part (iii) except we know that $\Sigma_{\epsilon u} = 0$, and $r = 1$ in (3.2.16).

Part (i): we need to estimate $\Delta$ and $\sigma^2$. We showed in the proof of Theorem 3.3 that

$n(C^{*\prime}C^* - \hat\theta\Sigma^*_{uo})^{-1} = \Delta^{-1} + o_p(1)$.

Also, Gleser and Healy have argued in their similar models that $\sigma^2$ cannot be consistently estimated without further conditions, and the argument is the same here: we can show that $n^{-1}W$ converges in probability to a singular finite matrix, the smallest eigenvalue of which is clearly zero (the MLE of $\sigma^2$ is, by the way, $n^{-1}\hat\theta$, which indicates what can go wrong when incidental parameters are present).
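The replication estimate of $\Delta$ in (3.2.15) can be illustrated numerically. The sketch below — with an arbitrarily chosen scalar carrier distribution, error variance, and number of replicates — checks that the average of the cross-replicate products recovers $\Delta$, while a single replicate's raw sum of squares is inflated by the error variance.

```python
import random

# Sketch of (3.2.15): with r replicates C_i = X + U_i, averaging the
# cross-products C_i'C_j over i != j consistently estimates
# Delta = lim n^-1 X'X, since independent errors cancel in expectation.
# All settings (n, r, sig_u) are arbitrary illustrative choices.
random.seed(2)
n, r, sig_u = 20000, 3, 1.0
x = [random.uniform(-2.0, 2.0) for _ in range(n)]          # Delta = E x^2 = 4/3
c = [[xi + random.gauss(0.0, sig_u) for xi in x] for _ in range(r)]

acc = 0.0
for i in range(r):
    for j in range(r):
        if i != j:
            acc += sum(a * b for a, b in zip(c[i], c[j]))
delta_hat = acc / (n * r * (r - 1))

# a single replicate's C'C/n is biased upward by sig_u^2
naive = sum(v * v for v in c[0]) / n
print(round(delta_hat, 3), round(naive, 3))
```

Here `delta_hat` lands near 4/3 while `naive` lands near 4/3 + 1, the attenuation-type bias that replication removes.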
We will conclude this section by considering, in a simple example, conditions under which we can consistently estimate $\sigma^2$ in Model I. It is not clear that conditions such as those of Theorem 2.1, say, are enough; as briefly alluded to at the start of this section, this provides a little more "justification" for assumption (A.5).

Consider Model I with $p_1 = 0$, $p_2 = 1$, $\Sigma = \sigma^2 I_2$, and, as a minimal condition on X, $\sum_{i=1}^n x_i^2 \to \infty$. Then

$W = \begin{bmatrix} \beta^2\sum x_i^2 + 2\beta\sum x_i\epsilon_i + \sum\epsilon_i^2 & \beta\sum x_i^2 + \beta\sum x_iu_i + \sum x_i\epsilon_i + \sum u_i\epsilon_i \\ \cdot & \sum x_i^2 + 2\sum x_iu_i + \sum u_i^2 \end{bmatrix}$,

since $n^{-1}W = \sigma^2 I_2 + \tilde W + o_p(1)$ (see Lemma 2.3). Then $\hat\sigma^2$ is consistent if $\lambda(\tilde W) \to 0$, where $\lambda$ denotes the smallest eigenvalue; we can show that this occurs if and only if $\det(\tilde W) \to 0$. Now

$\det(\tilde W) = -n^{-2}\Big(\sum_{i=1}^n x_i(\epsilon_i - u_i\beta)\Big)^2$,

and since $E\big(\sum x_i(\epsilon_i - u_i\beta)\big)^2 = \sigma^2(1 + \beta^2)\sum x_i^2$, we would need

$n^{-2}\sum_{i=1}^n x_i^2 \to 0$.  (3.2.17)

Admittedly this is a weaker condition than (A.5), but it seems to indicate that we should require a condition which "controls" the X values relative to some power of n. This is not the only way to estimate $\sigma^2$, of course, but the same types of considerations seem to arise if we try other methods; other natural candidates ultimately require (3.2.17) again.
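The example above can be checked numerically. In the sketch below, the $x_i$ are bounded, so $\sum x_i^2$ grows like $n$ and (3.2.17) holds; the smallest eigenvalue of $n^{-1}W$ should then recover $\sigma^2$. The slope, error standard deviation, and sample size are arbitrary illustrative choices.

```python
import math, random

# Sketch: Model I with p1 = 0, p2 = 1, Sigma = sigma^2 I_2. With bounded x's,
# condition (3.2.17) holds and the smallest eigenvalue of n^-1 W, for
# W = [Y C]'[Y C], recovers sigma^2. Settings are arbitrary choices.
random.seed(3)
beta, sigma, n = 2.0, 0.5, 40000
x = [random.uniform(-1.0, 1.0) for _ in range(n)]
c = [xi + random.gauss(0.0, sigma) for xi in x]
y = [beta * xi + random.gauss(0.0, sigma) for xi in x]

w11 = sum(v * v for v in y) / n
w12 = sum(a * b for a, b in zip(y, c)) / n
w22 = sum(v * v for v in c) / n

# smallest eigenvalue of the symmetric 2x2 matrix [[w11, w12], [w12, w22]]
tr, det = w11 + w22, w11 * w22 - w12 * w12
lam_min = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
print(round(lam_min, 3))  # should be near sigma^2 = 0.25
```

In the limit, $n^{-1}W \to \sigma^2 I_2 + m\,[\beta^2, \beta; \beta, 1]$ with $m = \lim n^{-1}\sum x_i^2$, and the second matrix is singular, so the smallest eigenvalue is exactly $\sigma^2$.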
3.3. Some Comparisons.

In this section, we quickly check some of the information yielded by Theorem 3.3 concerning the relative merits of the various estimates under consideration. Keep in mind that one must be careful not to overstate the importance of comparisons based on limiting variances, but up until this point we have had no basis of comparison with which to judge the relative worth of these estimates.

We cannot precisely compare $\hat B_M$ with the replication estimates using Theorem 3.3 because of the different circumstances under which the estimates are computed (although generally it seems that fewer sources of error are expressed in the limiting variances of the ML estimates than in those of the Fuller-type estimates). We can, however, compare $\hat B_{MR}$ and $\hat B_R$ in Model II(i) (remembering to set $\sigma_\delta^2 = 0$ in (3.2.3), since that is the situation in which $\hat B_{MR}$ is used). It should be interesting to see whether the extra effort in the computation of $\hat B_{MR}$ is justified by a decrease in variability. The following result is best described as a corollary to Theorem 3.3.

Corollary 3.4. In Model II(i) with $\sigma_\delta^2 = 0$, for all $\gamma \in \mathbb{R}^p$,

$\lim_{n\to\infty} \mathrm{Var}(n^{1/2}\gamma'(\hat B_{MR} - B)) \le \lim_{n\to\infty} \mathrm{Var}(n^{1/2}\gamma'(\hat B_R - B))$.

Proof. With $\sigma_\delta^2 = 0$, the limit variance of $\hat B_R$ in (3.2.3) becomes

$\Delta^{-1}\big(r^{-1}d\Delta + r^{-1}(r-1)^{-1}(d\Sigma_u^* + D_R)\big)\Delta^{-1}$.

Comparing this with (3.2.2), since $D_R$ is positive semidefinite it will suffice to show that for all $\gamma \in \mathbb{R}^{p_2}$,

$\gamma'[I_{p_2} \;\; B_2]\Sigma^{-1}[I_{p_2} \;\; B_2]'\gamma \ge \gamma'\Sigma_u^{-1}\gamma$.  (3.3.1)
Now since $\Sigma^{-1}$ can be expressed as

$\Sigma^{-1} = \begin{bmatrix} \Sigma_u^{-1} & 0 \\ 0 & 0 \end{bmatrix} + (\sigma^2 - \Sigma_{\epsilon u}'\Sigma_u^{-1}\Sigma_{\epsilon u})^{-1}\begin{bmatrix}\Sigma_u^{-1}\Sigma_{\epsilon u} \\ -1\end{bmatrix}[\Sigma_{\epsilon u}'\Sigma_u^{-1} \;\; -1]$,

we have

$[I_{p_2} \;\; B_2]\Sigma^{-1}[I_{p_2} \;\; B_2]' = \Sigma_u^{-1} + (\sigma^2 - \Sigma_{\epsilon u}'\Sigma_u^{-1}\Sigma_{\epsilon u})^{-1}(\Sigma_u^{-1}\Sigma_{\epsilon u} - B_2)(\Sigma_u^{-1}\Sigma_{\epsilon u} - B_2)'$  (3.3.2)

$\Rightarrow \gamma'[I_{p_2} \;\; B_2]\Sigma^{-1}[I_{p_2} \;\; B_2]'\gamma \ge \gamma'\Sigma_u^{-1}\gamma$

(note that there is equality if and only if $\Sigma_{\epsilon u} = \Sigma_uB_2$, which we have seen to be necessary and sufficient for consistency of $\hat B_L$). Equation (3.3.1) now follows immediately from (3.3.2) and Graybill (1969, Theorem 12.2.14(5)). □
Thus $\hat B_{MR}$ "beats" $\hat B_R$, possibly quite substantially (in the proof above, we effectively ignored the term in (3.2.3) involving $D_R$); we can perhaps interpret this as meaning that taking the structure of the model into account yields a much better estimate of $\Sigma$, and hence of B, than does replication alone. In a few special cases (for example, $B_2 = 0$), the limiting variances of $\hat B_{MR}$ and $\hat B_R$ can be written out explicitly and compared directly, which perhaps better illustrates what is occurring.
We now consider some results which are interesting, although perhaps not of much practical use in their present forms. It was pointed out in Section 2.4 that certain important contrasts $\gamma$ are estimated consistently by ordinary least squares. If we can compute limiting distributions, we will have some basis for comparing $\gamma'\hat B_L$ with $\gamma'\hat B_M$, say. We will continue to use assumption (A.5) here for convenience, although we did not in Section 2.4. Define

$M = \Delta_{22} - \Delta_{21}\Delta_{11}^{-1}\Delta_{12}$ and $B_{2L} = (M + \Sigma_u)^{-1}(MB_2 + \Sigma_{\epsilon u})$,

and note that since $\Delta$ is positive definite, M is also.

Theorem 3.5. In Model I, if (A.5) holds and $\gamma' = [\gamma_1' \;\; \gamma_2']$ satisfies

$n^{1/2}\big(\gamma_1'(X_1'X_1)^{-1}X_1'X_2 - \gamma_2'\big) \to 0$,  (3.3.3)

then

$n^{1/2}\gamma'(\hat B_L - B) \to_L N(0,\; d_L\,\gamma_1'\Delta_{11}^{-1}\gamma_1)$.

Remark. Condition (3.3.3) is of course stronger than the condition of Theorem 2.8, which in light of (A.5) could be restated

$\gamma_2' = \gamma_1'\Delta_{11}^{-1}\Delta_{12}$.  (3.3.4)

It is condition (3.3.3) which makes the theorem of dubious practical value as stated; it is not clear that this can ever be guaranteed.

Proof. Recall (see (1.2.2)) that

$\hat B_L \to_p (\Delta + \Sigma_u^*)^{-1}(\Delta B + \Sigma_{\epsilon u}^*) \;\Rightarrow\; \hat B_{2L} \to_p B_{2L}$.  (3.3.5)

Now since $\hat B_L = (C^{*\prime}C^*)^{-1}C^{*\prime}Y$, we have

$\hat B_{1L} = (X_1'X_1)^{-1}X_1'(Y - C\hat B_{2L}) = B_1 + (X_1'X_1)^{-1}X_1'X_2(B_2 - \hat B_{2L}) + (X_1'X_1)^{-1}X_1'(\epsilon - U\hat B_{2L})$.

Thus for all $\gamma \in \mathbb{R}^p$,

$\gamma'\hat B_L = \gamma_1'\hat B_{1L} + \gamma_2'\hat B_{2L} = \gamma_1'B_1 + \gamma_2'B_2 + \big(\gamma_1'(X_1'X_1)^{-1}X_1'X_2 - \gamma_2'\big)(B_2 - \hat B_{2L}) + \gamma_1'(X_1'X_1)^{-1}X_1'(\epsilon - U\hat B_{2L})$,

so if $\gamma$ satisfies (3.3.3), using (3.3.5) we obtain

$n^{1/2}\gamma'(\hat B_L - B) = n^{1/2}\gamma_1'(X_1'X_1)^{-1}X_1'(\epsilon - UB_{2L}) + o_p(1)$.

Now in a familiar notation, the $\{(\epsilon - UB_{2L})_k\}$, $k = 1, 2, \ldots$, are i.i.d. with mean zero and variance

$d_L = \sigma^2 - 2\Sigma_{\epsilon u}'B_{2L} + B_{2L}'\Sigma_uB_{2L}$,

and the sequence $\{\gamma_1'(X_1'X_1)^{-1}X_{1k}\}_{k=1,2,\ldots}$ satisfies the Noether condition (see Lemma 3.2). Thus we have asymptotic normality, and

$n\,d_L\,\gamma_1'(X_1'X_1)^{-1}\gamma_1 \to d_L\,\gamma_1'\Delta_{11}^{-1}\gamma_1$.

Finally, note that

$\Delta^{-1} = \begin{bmatrix}\Delta_{11}^{-1} & 0 \\ 0 & 0\end{bmatrix} + \begin{bmatrix}\Delta_{11}^{-1}\Delta_{12} \\ -I_{p_2}\end{bmatrix}M^{-1}\,[\Delta_{21}\Delta_{11}^{-1} \;\; -I_{p_2}]$,  (3.3.6)

so for $\gamma$ satisfying (3.3.4),

$\gamma'\Delta^{-1} = [\gamma_1'\Delta_{11}^{-1} \;\; 0]$. □
Corollary 3.6. In Model I, if (A.5) holds and $\gamma \neq 0$ satisfies (3.3.3), then

$\lim_{n\to\infty} \mathrm{Var}(n^{1/2}\gamma'(\hat B_L - B)) \le \lim_{n\to\infty} \mathrm{Var}(n^{1/2}\gamma'(\hat B_M - B))$,

with equality if and only if $\Sigma_{\epsilon u} = \Sigma_uB_2$.

Proof. Noting from (3.3.6) that

$\gamma'\Delta^{-1} = [\gamma_1'\Delta_{11}^{-1} \;\; 0]$,

the second variance term in (3.2.1) equals zero, so

$\lim_{n\to\infty} \mathrm{Var}(n^{1/2}\gamma'(\hat B_M - B)) = d\,\gamma'\Delta^{-1}\gamma$.

The result follows since $d_L \le d$ (this is not immediately obvious and is demonstrated in detail in Section 3.4). □
As an example, consider the relative values of the limiting variances in a case in which $\Sigma = \sigma^2 I_{p_2+1}$ and $M = mI_{p_2}$:

$d_L = \sigma^2\big(1 + m^2(m + \sigma^2)^{-2}B_2'B_2\big)$, while $d = \sigma^2(1 + B_2'B_2)$.

Thus in an admittedly idealized situation, least squares "beats" the maximum likelihood estimate.
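The comparison above is easy to verify numerically from the definitions. In the sketch below, $d_L$ is computed both from its definition as the variance of $(\epsilon - UB_{2L})_1$ and from the closed form, for arbitrary illustrative values of $\sigma^2$, $m$, and $B_2$ (with $\Sigma_{\epsilon u} = 0$ as in the idealized case).

```python
# Sketch of the idealized example: Sigma = sigma^2 I_{p2+1}, M = m I_{p2},
# Sigma_eu = 0, so B_2L = m/(m + sigma^2) * B_2 and d_L / d < 1.
# Numerical values are arbitrary illustrative choices.
sigma2, m = 1.0, 4.0
b2 = [1.0, -2.0, 0.5]                         # an arbitrary B_2
b2b2 = sum(v * v for v in b2)                 # B_2'B_2

# d = [B_2' -1] Sigma [B_2' -1]' = sigma^2 (1 + B_2'B_2)
d = sigma2 * (1.0 + b2b2)

# d_L from its definition: Var(eps - U'B_2L) = sigma^2 + B_2L' Sigma_u B_2L
shrink = m / (m + sigma2)
b2l = [shrink * v for v in b2]
d_l_def = sigma2 + sigma2 * sum(v * v for v in b2l)

# ... and from the closed form in the text
d_l_closed = sigma2 * (1.0 + m * m * (m + sigma2) ** -2 * b2b2)

print(round(d_l_def, 6), round(d_l_closed, 6), d_l_def < d)
```

The two routes to $d_L$ agree, and the ratio $d_L/d$ is strictly below one whenever $B_2 \neq 0$.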
Consider the analysis of covariance example of Section 2.4; not only can the treatment difference $\alpha_1 - \alpha_2$ be estimated consistently by least squares, but appropriately standardized it has a smaller limiting variance than the MLE if (3.3.7) holds. The problem is that (3.3.7) cannot be guaranteed. The fact that (3.3.3) is not in general a practical condition should not preclude all use of these results, however. After all, $\gamma'\hat B_L$ is consistent under the conditions of Theorem 2.8, which often can be guaranteed, and our results do not deny the possibility that it may in practical situations behave "better" than the MLE. Keep in mind that the MLE requires knowledge of $\Sigma$; we might be willing to allow some bias or sacrifice some efficiency to avoid such an assumption, and hypothesizing an incorrect $\Sigma$ may lead to disastrous behavior of the MLE.

Finally, for contrasts satisfying (3.3.3), it is interesting to look at the limiting distributions of the other types of estimates we have considered, using Theorem 3.3 in Model II(i) and in Model II(ii). Note that only for $\hat B_R$ can the limiting variance be made arbitrarily small by choosing r large. For $\hat B_{MR}$, as well as for $\hat B_M$ and $\hat B_L$ in Model I, the limit variance must exceed

$\sigma^2\gamma'\Delta^{-1}\gamma$,

which is the limiting variance of the least squares estimate in the usual regression case, i.e., no observation error (and this actually is trivially satisfied for $\hat B_{MR}$ also, since in that case recall that we are assuming $\sigma^2 \equiv 0$). □
3.4. Details.

In this section, we supply some details which were omitted earlier in this chapter in order to make our discussions easier to follow.

Proof of Lemma 3.2. Either of the two conclusions of the lemma follows easily from the other; we will show

$n^{-1}\max_{1\le k\le n} a_k^2 \to 0$ as $n \to \infty$.

Let $T_n = \sum_{k=1}^n a_k^2$; since $n^{-1}T_n \to a^2 < \infty$, we have

$(n+1)^{-1}a_{n+1}^2 = (n+1)^{-1}T_{n+1} - \frac{n}{n+1}\,n^{-1}T_n \to 0$,

so for all $\epsilon > 0$ there exists $N = N(\epsilon)$ such that

$n^{-1}a_n^2 < \epsilon$ if $n \ge N$.  (3.4.1)

Now we can choose $M = M(\epsilon, N)$ large enough so that $M > N$ and

$m^{-1}\max_{1\le k\le N} a_k^2 < \epsilon$ for all $m > M$.  (3.4.2)

From (3.4.1),

$\max_{N\le k\le m} k^{-1}a_k^2 < \epsilon$ for all $m \ge M$,

so

$\max_{N\le k\le m} m^{-1}a_k^2 < \epsilon$ for all $m > M$.  (3.4.3)

Combining (3.4.2) and (3.4.3), we have that for all $\epsilon > 0$ there exists $M(\epsilon)$ such that

$m^{-1}\max_{1\le k\le m} a_k^2 < \epsilon$ for all $m \ge M$. □
We next complete the proof of Theorem 3.3; asymptotic normality was demonstrated in Section 3.2, and here we need to derive the limiting variances for each part of the theorem.

Part (i): recall that we first need to find the limiting variance of

$\gamma'(W^* - E(W^*))A = \sum_{i=1}^n \gamma'(H_iE_i^{*\prime} + E_i^*E_i^{*\prime} - \Sigma^*)A$

with $A = [B' \;\; -1]'$. Now

$\mathrm{Var}\Big(\sum_{i=1}^n\big(\gamma'H_iE_i^{*\prime}A + \gamma'(E_i^*E_i^{*\prime} - \Sigma^*)A\big)\Big) = \sum_{i=1}^n \gamma'H_i\,\mathrm{Var}(E_i^{*\prime}A)\,H_i'\gamma + n\gamma'\,\mathrm{Var}(E_1^*E_1^{*\prime}A)\,\gamma + 2\sum_{i=1}^n \mathrm{Cov}(\gamma'H_iE_i^{*\prime}A,\;\gamma'E_i^*E_i^{*\prime}A)$.  (3.4.4)

For each i,

$\mathrm{Var}(E_i^{*\prime}A) = A'\Sigma^*A = d$.

Now in an obvious notation, the $(k,j)$th element of $\mathrm{Var}(E_1^*E_1^{*\prime}A)$ can be expressed as

$\mathrm{Cov}\big((E_1^*E_1^{*\prime}A)_k,\,(E_1^*E_1^{*\prime}A)_j\big) = \mathrm{Cov}\Big(\sum_{r=1}^{p+1}a_re_{1r}e_{1k},\;\sum_{r=1}^{p+1}a_re_{1r}e_{1j}\Big) = \sum_{r=1}^{p+1}\sum_{s=1}^{p+1}a_ra_s\big(E(e_{1r}e_{1s}e_{1k}e_{1j}) - \sigma_{rk}\sigma_{sj}\big)$,  (3.4.5)

where $a_i$ is the $i$th element of A and $e_{ij}$ and $\sigma_{ij}$ denote the $(i,j)$th elements of $E^*$ and $\Sigma^*$, respectively. With $h_{ij}$ defined correspondingly for H, we similarly obtain

$\mathrm{Cov}\big((H_iE_i^{*\prime}A)_k,\,(E_i^*E_i^{*\prime}A)_j\big) = \sum_{r=1}^{p+1}\sum_{s=1}^{p+1}a_ra_s h_{ir}\,E(e_{ik}e_{is}e_{ij})$.  (3.4.6)

Equation (3.4.4) can be evaluated using (3.4.5) and (3.4.6); as we have mentioned previously, we will assume that the third and fourth moments of the distribution of the rows of E are those of the normal distribution in order to obtain a workable and probably fairly reasonable expression. Since the errors have zero mean,

$E(e_{1r}e_{1s}e_{1k}e_{1j}) = \sigma_{rk}\sigma_{sj} + \sigma_{rs}\sigma_{kj} + \sigma_{rj}\sigma_{ks}$,  $E(e_{ik}e_{is}e_{ij}) = 0$,

so (3.4.6) is zero and (3.4.5) becomes

$\sum_{r=1}^{p+1}\sum_{s=1}^{p+1}a_ra_s(\sigma_{rs}\sigma_{kj} + \sigma_{rj}\sigma_{ks}) = (A'\Sigma^*A)\sigma_{kj} + (\Sigma^*AA'\Sigma^*)_{kj}$;

thus

$\mathrm{Var}(E_1^*E_1^{*\prime}A) = (A'\Sigma^*A)\Sigma^* + \Sigma^*AA'\Sigma^* = d\Sigma^* + D$,

and from (3.4.4) we obtain

$\mathrm{Var}\Big(n^{-1/2}\sum_{i=1}^n\big(\gamma'H_iE_i^{*\prime}A + \gamma'(E_i^*E_i^{*\prime} - \Sigma^*)A\big)\Big) = n^{-1}d\gamma'H'H\gamma + \gamma'(d\Sigma^* + D)\gamma \to \gamma'\big(d[I_p\;\;B]'\Delta[I_p\;\;B] + d\Sigma^* + D\big)\gamma$.

By our discussion in Section 3.2, we thus have

$n^{-1/2}(W^* - E(W^*))[B' \;\; -1]' \to_L N_{p+1}\big(0,\; d[I_p\;\;B]'\Delta[I_p\;\;B] + d\Sigma^* + D\big)$.  (3.4.7)

Hence $n^{1/2}(\hat B_M - B)$ is asymptotically normal, and we find its limiting variance using (3.2.9), sandwiching the covariance matrix in (3.4.7) between the limiting partitioned matrices and $\Delta^{-1}$ factors (3.4.8). We look at each of the three terms in (3.4.8) separately: the first reduces to $d\Delta^{-1}$; the second yields the term of (3.2.1) involving $([I_{p_2}\;\;B_2]\Sigma^{-1}[I_{p_2}\;\;B_2]')^{-1}$, which is annihilated whenever $\gamma'\Delta^{-1} = [\gamma_1'\Delta_{11}^{-1} \;\; 0]$ (this fact is used in Corollary 3.6); and the last term in (3.4.8) vanishes. Adding the terms above, we obtain the result (3.2.1).
Part (ii): referring to (3.2.12), we first need

$\mathrm{Var}\Big(\gamma'\sum_{k=1}^n\Big(\sum_{i=1}^r H_kE_{ik}^{*\prime} + (r-1)^{-1}\sum\sum_{i\neq j}E_{ik}^*E_{jk}^{*\prime}\Big)A\Big) = \sum_{k=1}^n r\gamma'H_k\,\mathrm{Var}(E_{1k}^{*\prime}A)\,H_k'\gamma + (r-1)^{-2}n\,\mathrm{Var}\Big(\gamma'\sum\sum_{i\neq j}E_{i1}^*E_{j1}^{*\prime}A\Big) + [\text{cross terms}]$.  (3.4.9)

Now, just as before, $\mathrm{Var}(E_{ik}^{*\prime}A) = d$. Next, since the replicates are independent,

$\mathrm{Var}\Big(\gamma'\sum\sum_{i\neq j}E_{i1}^*E_{j1}^{*\prime}A\Big) = \sum\sum_{i\neq j}\Big(\mathrm{Var}(\gamma'E_{i1}^*E_{j1}^{*\prime}A) + \mathrm{Cov}(\gamma'E_{i1}^*E_{j1}^{*\prime}A,\;\gamma'E_{j1}^*E_{i1}^{*\prime}A)\Big) = r(r-1)\gamma'(d\Sigma^* + D)\gamma$.

It is clear that the third term in (3.4.9) is zero. Thus

$n^{-1}\mathrm{Var}\Big(\Big(H'\sum_{i=1}^r E_i^* + (r-1)^{-1}\sum\sum_{i\neq j}E_i^{*\prime}E_j^*\Big)[B' \;\; -1]'\Big) = n^{-1}\big(rdH'H + (r-1)^{-2}nr(r-1)(d\Sigma^* + D)\big) \to rd[I_p\;\;B]'\Delta[I_p\;\;B] + r(r-1)^{-1}(d\Sigma^* + D)$,  (3.4.10)

so $n^{-1/2}(T_2^* - r(r-1)^{-1}T_1^*)[B' \;\; -1]'$ has a limiting normal distribution with zero mean and the variance above. Thus $n^{1/2}(\hat B_{MR} - B)$ is also asymptotically normal, with limiting variance obtained using (3.2.11) and (3.4.10); after the same partitioned-matrix reduction as in part (i), the remaining products involving D vanish, and we obtain the limiting variance (3.2.2). □
Part (iii): note that

$\sum\sum_{i\neq j}C_i^{*\prime}(Y_j - C_j^*B) = \sum\sum_{i\neq j}C_i^{*\prime}(\epsilon_j - U_j^*B) = \sum\sum_{i\neq j}C_i^{*\prime}\delta + \sum\sum_{i\neq j}C_i^{*\prime}(V_j - U_j^*B)$.  (3.4.11)

Since $\delta$ is independent of each $U_j$ and $V_j$, we can show that the two terms above are uncorrelated, and

$\mathrm{Var}\Big(\sum\sum_{i\neq j}C_i^{*\prime}\delta\Big) = (r-1)^2\,\mathrm{Var}\Big(\sum_{i=1}^r C_i^{*\prime}\delta\Big) = (r-1)^2\sigma_\delta^2(r^2X'X + rn\Sigma_u^*)$.

Now note from (3.2.12) that the last term in (3.4.11) looks and behaves much like

$-(r-1)[I_p \;\; 0](T_2^* - r(r-1)^{-1}T_1^*)[B' \;\; -1]'$,

with $T_1^*$ and $T_2^*$ as in part (ii) (and each $\epsilon_j$ replaced by $V_j$). We can thus use (3.4.10) to eliminate much of the detail in computing the limiting variance, replacing $\epsilon$ with V (and $\sigma^2$ with $\sigma_v^2$) in our part (ii) calculations wherever appropriate:

$\lim_{n\to\infty}\mathrm{Var}\Big(n^{-1/2}\sum\sum_{i\neq j}C_i^{*\prime}(Y_j - C_j^*B)\Big) = (r-1)^2\Big(\sigma_\delta^2(r^2\Delta + r\Sigma_u^*) + r(d - \sigma_\delta^2)\Delta + r(r-1)^{-1}\big((d - \sigma_\delta^2)\Sigma_u^* + D_R\big)\Big)$,

and by (3.2.13), the limiting variance of $n^{1/2}(\hat B_R - B)$ is (3.2.3).
Part (iv): note that

$\sum\sum_{i\neq j}C_i^{*\prime}(Y - C_j^*B) = (r-1)\sum_{i=1}^r C_i^{*\prime}\epsilon - \sum\sum_{i\neq j}C_i^{*\prime}U_j^*B$.  (3.4.12)

The two terms above can be shown to be uncorrelated since $\Sigma_{\epsilon u} = 0$ in Model II(ii), and similarly as in part (iii),

$\mathrm{Var}\Big(\sum_{i=1}^r C_i^{*\prime}\epsilon\Big) = \sigma^2(r^2X'X + rn\Sigma_u^*)$.

Now note that computing the variance of $\sum\sum_{i\neq j}C_i^{*\prime}U_j^*B$ is essentially the same as obtaining the variance of $T_2^* - r(r-1)^{-1}T_1^*$ in part (ii): in (3.4.9), we replace H, $E_i^*$, and A with X, $U_i^*$, and B, respectively. In the limiting variance (3.4.10), we correspondingly replace $[I_p\;\;B]$ with $I_p$, $\Sigma^*$ with $\Sigma_u^*$, d with $d_R$, and D with $D_R$ ($D_R$ is defined the same as in part (iii) except, of course, that $\Sigma_{\epsilon u} = 0$ here). Thus

$\lim_{n\to\infty}\mathrm{Var}\Big(n^{-1/2}\sum\sum_{i\neq j}C_i^{*\prime}(Y - C_j^*B)\Big) = (r-1)^2\Big(\sigma^2(r^2\Delta + r\Sigma_u^*) + rd_R\Delta + r(r-1)^{-1}(d_R\Sigma_u^* + D_R)\Big)$,

and it then follows as in part (iii) that the limiting variance is (3.2.4). □
We conclude the chapter by completing the proof of Corollary 3.6 and demonstrating that $d_L \le d$ in Model I. Recall that

$B_{2L} = (M + \Sigma_u)^{-1}(MB_2 + \Sigma_{\epsilon u})$, so that $B_2 - B_{2L} = (M + \Sigma_u)^{-1}(\Sigma_uB_2 - \Sigma_{\epsilon u})$;

write $q = (M + \Sigma_u)^{-1}(\Sigma_uB_2 - \Sigma_{\epsilon u})$. Then, starting from

$d_L = \sigma^2 - 2\Sigma_{\epsilon u}'B_{2L} + B_{2L}'\Sigma_uB_{2L}$ and $d = [B_2' \;\; -1]\Sigma[B_2' \;\; -1]' = \sigma^2 - 2\Sigma_{\epsilon u}'B_2 + B_2'\Sigma_uB_2$,

substitution of $B_{2L} = B_2 - q$ gives

$d_L - d = 2\Sigma_{\epsilon u}'q - 2B_2'\Sigma_uq + q'\Sigma_uq = -2(\Sigma_uB_2 - \Sigma_{\epsilon u})'q + q'\Sigma_uq = -2q'(M + \Sigma_u)q + q'\Sigma_uq = -q'(2M + \Sigma_u)q$.

The result follows since M and $\Sigma_u$ are positive definite: $d_L - d \le 0$, with equality only if $q = 0$, i.e., only if $\Sigma_{\epsilon u} = \Sigma_uB_2$. □
CHAPTER IV

M-Estimation in an EIV Model

4.0. Introduction.

Standard linear regression techniques are potentially subject to a number of problems and limitations which hinder their effectiveness in many situations. For many of these problems, remedies and modifications have received much consideration in the literature. Generally, the same types of difficulties can arise in EIV models, yet extensions of remedial measures for many of them have not been considered. In this chapter, we consider one such problem: we extend robust M-estimation to an EIV formulation. Our treatment will not be as far-ranging and detailed as some existing treatments of the standard regression case, but it is illustrative of the fact that there do exist avenues of future research in EIV models.
4.1. Background.

The efficiency of many statistical estimation procedures is highly dependent upon the assumption of a normal error distribution. Among these are classical least squares techniques for the usual linear regression model. By the Gauss-Markov theorem, the estimates obtained are optimal by almost any sensible standard if the errors are i.i.d. normal random variables. However, both non-normality of the errors and the presence of outliers can have the effect of seriously impairing the efficiency of these estimates. One rarely has very dependable knowledge concerning the exact form of the underlying error structure in a given situation; in practice, though, it often seems that error distributions with somewhat heavier tails than the normal are the rule rather than the exception (see Hampel (1973)). Also, it is clear that outliers can be very difficult to spot in regression situations, as opposed, say, to estimating a location parameter (see Hoaglin and Welsch (1978), Huber (1981, Chapter 7)).

Historically, although empirical evidence often was indicative of the violation of i.i.d. normality assumptions, the problem was largely ignored until relatively recently. This was due partly, perhaps, to misunderstanding of the Gauss-Markov and Central Limit Theorems, and partly to a lack of realization of the sensitivity of certain procedures to departures from normality, but the main culprit was likely the excessive computation that would have been involved in developing and implementing more general procedures, which would have rendered them impractical.
Some relief for these problems was, of course, provided by the development of distribution-free or non-parametric procedures. More recently, Huber (1964) introduced a new approach to the problem of estimating a location parameter; the method was termed M-estimation since it generalized the method of maximum likelihood. Rather than minimizing the sum of squared deviations, he proposed minimizing the sum of a different convex function of the deviations, presumably one which is less rapidly increasing. An example of such a function (one which was shown to be optimal in a certain minimax sense) is

$\rho(x) = \begin{cases} \tfrac12 x^2, & |x| < k \\ k|x| - \tfrac12 k^2, & |x| \ge k, \end{cases}$  (4.1.1)

with the value k determined by the experimenter (this corresponds to maximum likelihood estimation for a distribution which behaves as a normal one in the center and exponentially in the tails). An estimate so obtained generally turns out to be nearly as efficient as least squares at an exact normal model, while it can be vastly superior for certain error distributions which might well occur in practice. Modern computing capabilities make such robust techniques feasible, and the method extends to a large class of statistical estimation problems.
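The function (4.1.1) and its derivative are easily coded; the sketch below does so in a few lines, with the tuning constant k = 1.5 chosen arbitrarily for illustration.

```python
# Sketch of the Huber function (4.1.1) and its derivative psi; the tuning
# constant k is the experimenter's choice (k = 1.5 here is arbitrary).
def rho(x, k=1.5):
    return 0.5 * x * x if abs(x) < k else k * abs(x) - 0.5 * k * k

def psi(x, k=1.5):
    # derivative of rho: identity in the middle, clamped at +/- k in the tails
    return max(-k, min(k, x))

# rho is quadratic near zero and linear in the tails; the two pieces meet
# with equal value (and equal slope) at |x| = k:
k = 1.5
print(rho(k, k), psi(2.0, k), psi(-0.3, k))
```

Because psi is bounded, a single gross deviation contributes at most k to the estimating equation, which is the source of the robustness discussed above.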
A number of authors extended M-estimation to the linear regression model. An early treatment is due to Relles (1967); Huber (1973) considered more general formulations in which p, the order of the parameter vector, as well as n, was allowed to get large. An example of more recent work is Yohai and Maronna (1979).

The works of different authors tend to vary somewhat in terms of the conditions placed on the error distribution, design matrix, and function being minimized. Typically, for an appropriate function $\rho$, we minimize

$\sum_{i=1}^n \rho(y_i - x_i\hat\beta)$

(our use of notation here should be obvious from previous chapters); (almost) equivalently, the estimate $\hat\beta$ is chosen to satisfy

$\sum_{i=1}^n x_i'\psi(y_i - x_i\hat\beta) = 0$,  (4.1.2)

with $\rho$ differentiable and $\psi = \rho'$. Generally, the solution is consistent and asymptotically normal; for example, if the error distribution is symmetric, $\psi$ is differentiable, and (A.5) is satisfied, we have

$n^{1/2}(\hat\beta - \beta) \to_L N_p(0,\; \sigma^2(\psi)\Delta^{-1})$  (4.1.3)

with

$\sigma^2(\psi) = E(\psi^2(\epsilon_1))\,/\,(E(\psi'(\epsilon_1)))^2$.
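Equation (4.1.2) is commonly solved by iteratively reweighted least squares: writing the weight $w_i = \psi(r_i)/r_i$ for residual $r_i$ turns the psi-equation into a weighted least squares problem, which is iterated to a fixed point. The sketch below does this for a scalar slope through the origin with the Huber psi; the data are a made-up example (a clean line plus one gross outlier), not data from the text.

```python
# Sketch: solving the M-estimation equation (4.1.2) by iteratively
# reweighted least squares, scalar slope through the origin, Huber psi.
def psi(x, k=1.5):
    return max(-k, min(k, x))

x = [float(i) for i in range(1, 21)]
y = [2.0 * xi for xi in x]       # true slope 2
y[10] += 50.0                    # one gross outlier

def irls(x, y, k=1.5, iters=50):
    b = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)  # LS start
    for _ in range(iters):
        # weights w_i = psi(r_i)/r_i turn (4.1.2) into weighted least squares
        w = []
        for xi, yi in zip(x, y):
            r = yi - b * xi
            w.append(1.0 if abs(r) < 1e-12 else psi(r, k) / r)
        b = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / \
            sum(wi * xi * xi for wi, xi in zip(w, x))
    return b

b_ls = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
b_hub = irls(x, y)
print(round(b_ls, 3), round(b_hub, 3))  # LS pulled toward the outlier; Huber far less so
```

The least squares slope is pulled noticeably away from 2 by the single bad point, while the Huber solution stays close to 2 because the outlier's residual is clipped at k.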
The method of M-estimation has not yet been extended to EIV models. Clearly the same considerations which motivated extension of M-estimation to the regression situation are also present in EIV models. This defines our goal in this chapter: we would like to produce an estimate which simultaneously generalizes one of the EIV estimates we have considered earlier while extending M-estimation to the appropriate EIV model. Of course, we would like our estimate to operate in the spirit of robust estimation and have good efficiency for a class of non-normal error distributions. In particular, we will aim for an estimate which protects against "contamination" of the equation errors $\epsilon$; we would not generally expect the observation errors U to behave as unpredictably.
4.2. Notation and Assumptions.

In this section, we set up the situation in which we will define and consider the properties of an M-estimate of $\beta$; our model will be essentially similar to Model II(ii) of previous chapters with r = 2. We would like this chapter to be basically self-contained; hence we will renumber previous assumptions according to a new scheme and may even make some slight changes in previous notation (with explanation wherever there is potentially any confusion). We have

$Y = X\beta + \epsilon$, $C_i = X + U_i$, $i = 1, 2$,

with $\epsilon$, $U_1$, and $U_2$ mutually independent, $U_1$ and $U_2$ identically distributed, all with zero mean, and with $\epsilon_k$ the $k$th element of $\epsilon$ and $u_{ik}$ the $k$th row of $U_i$, $i = 1, 2$, $k = 1, \ldots, n$.

(In this chapter, we are suppressing the superscript "*", which in previous chapters indicated the presence of variables observed without error. We are still being as general; $U_1$ and $U_2$ may contain columns which are identically zero and $\Sigma_u$ may be singular.) The results of this chapter can be obtained if $\epsilon$ is correlated with $U_1$ and $U_2$ or r > 2; clarity and relative simplicity governed our choice of the model to be used here.

Let $y_k$ denote the $k$th element of Y, $k = 1, \ldots, n$, and $x_k$ and $c_{ik}$ the $k$th rows of X and $C_i$, respectively. Define

$r_{ik} = y_k - c_{ik}\beta$, $i = 1, 2$, $k = 1, \ldots, n$.

We consider a function $\psi$ satisfying

(B.1)  $\psi$ is odd
(B.2)  $\mathrm{Var}(\psi(r_{11})) < \infty$
(B.3)  $\psi$ is non-decreasing
(B.4)  $\psi$ satisfies a Lipschitz condition

and we further assume

(B.5)  For i = 1, 2, the distribution of $[u_{i1} \;\; \epsilon_1]$ is symmetric
(B.6)  $\Delta = \lim_{n\to\infty} n^{-1}X'X$ exists and is positive definite
(B.7)  There exists a constant A such that, for i = 1, 2 and k = 1, ..., n, with $\lambda(s) = E(\psi(r_{ik} + s))$, $s^{-1}\lambda(s)$ approaches A as $s \to 0$.

Condition (B.7) is not difficult to justify; if in addition to our other assumptions we assume that $\psi$ is twice boundedly differentiable, it follows immediately from a Taylor expansion (with $A = E(\psi'(r_{11}))$). It holds much more generally, though; for example, $\psi'$ can fail to exist at a finite number of points provided that the distribution function of $r_{11}$ is continuous at those points.

We will also make use of the following notation: for $\gamma \in \mathbb{R}^p$, let

$T_{nij}(\gamma) = n^{-1/2}\sum_{k=1}^n c_{ik}'\,\psi(r_{jk} + n^{-1/2}c_{jk}\gamma)$,

$S_{nij}(\gamma) = T_{nij}(\gamma) - T_{nij}(0)$.
4.3. Estimation of $\beta$.

Let $\hat\beta$ be a solution to

$C_1'\psi(Y - C_2\hat\beta) + C_2'\psi(Y - C_1\hat\beta) = 0$  (4.3.1)

(with $\psi$ applied coordinatewise); equivalently, $\hat\beta$ satisfies

$\sum_{k=1}^n\big(c_{1k}'\psi(y_k - c_{2k}\hat\beta) + c_{2k}'\psi(y_k - c_{1k}\hat\beta)\big) = 0$.  (4.3.2)

Note that with $\psi(x) = x$, (4.3.2) becomes

$C_1'(Y - C_2\hat\beta) + C_2'(Y - C_1\hat\beta) = 0$,

or

$\hat\beta = (C_1'C_2 + C_2'C_1)^{-1}(C_1 + C_2)'Y$,

which is the replication estimate $\hat\beta_R$ which we defined for Model II(ii) in Chapter I. Furthermore, if the independent variables are observed exactly, so that each $c_{ik} = x_k$, then (4.3.2) becomes

$\sum_{k=1}^n x_k'\psi(y_k - x_k\hat\beta) = 0$,

which we saw as (4.1.2) previously. Thus the estimate defined by (4.3.2) extends M-estimation in a usual regression model to an EIV model while generalizing one of our Fuller-type replication estimates. Since $y_k - c_{ik}\beta = \epsilon_k - u_{ik}\beta$, it seems that the estimate should operate in the spirit in which it was intended: it appears that this method should diminish the adverse effects of "bad" $\epsilon_k$'s and, to a lesser extent, "bad" U values.
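The scalar case of the cross-weighted estimating equation can be sketched directly — each replicate's residual is weighted by the other replicate's carrier, which is what removes the attenuation bias. All numerical settings below are arbitrary illustrative choices; the closed-form comparison assumes the (4.3.2) form just displayed.

```python
import random

# Sketch of (4.3.2) in the scalar case, solved by bisection on the score.
def psi_huber(x, k=1.5):
    return max(-k, min(k, x))

random.seed(4)
beta, sig_u, sig_e, n = 1.0, 0.5, 0.5, 4000
x = [random.uniform(-2.0, 2.0) for _ in range(n)]
c1 = [xi + random.gauss(0.0, sig_u) for xi in x]
c2 = [xi + random.gauss(0.0, sig_u) for xi in x]
y = [beta * xi + random.gauss(0.0, sig_e) for xi in x]

def score(b, psi):
    # left side of (4.3.2): residual from one replicate, carrier from the other
    return sum(a1 * psi(yi - a2 * b) + a2 * psi(yi - a1 * b)
               for a1, a2, yi in zip(c1, c2, y))

def solve(psi, lo=-5.0, hi=5.0):
    # the aggregate score decreases in b, so bisect on its sign change
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid, psi) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b_m = solve(psi_huber)
# with psi(x) = x, the equation has the closed form of the replication estimate
b_r = sum((a1 + a2) * yi for a1, a2, yi in zip(c1, c2, y)) / \
      (2.0 * sum(a1 * a2 for a1, a2 in zip(c1, c2)))
b_id = solve(lambda t: t)
print(round(b_m, 3), round(b_r, 3), round(b_id, 3))
```

With the identity psi, the bisection solution coincides with the closed-form replication estimate, and with the Huber psi the solution stays close to the true slope.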
In considering the asymptotic behavior of $\hat{\beta}$, our first major result is the following:

Theorem 4.1. Assume (B.1)-(B.7); then if $\hat{\beta}$ is a solution to (4.3.1), we have

$ n^{1/2}(\hat{\beta} - \beta) = (2A)^{-1}\,\Delta^{-1}\, n^{-1/2} \sum_{k=1}^{n} \left( c_{1k}\,\psi(r_{2k}) + c_{2k}\,\psi(r_{1k}) \right) + o_p(1) . $   (4.3.3)
(4.3.3)
The proof of this result is a bit involved and uses some intermediate
results.
We defer the proof in its entirety to Section 4.5.
From the representation (4.3.3), we immediately obtain the limiting
distribution of S:
define
d ..
1)
= E(1jJ(r.1 1 )1jJ(r.J 1 ))
we have
Corollary 4.Z.
If (B.l)-(B.7) hold, then
(4.3.4)
Proof. This will follow from (4.3.3) by the Cramér-Wold device: for all $a \in \mathbb{R}^p$, we have

$ a'\, n^{-1/2} \sum_{k=1}^{n} \left( c_{1k}\,\psi(r_{2k}) + c_{2k}\,\psi(r_{1k}) \right) = n^{-1/2} \sum_{k=1}^{n} (a'x_k)\left( \psi(r_{1k}) + \psi(r_{2k}) \right) + n^{-1/2} \sum_{k=1}^{n} \left( (a'u_{1k})\,\psi(r_{2k}) + (a'u_{2k})\,\psi(r_{1k}) \right) . $

Now by (B.6),

$ n^{-1} \sum_{k=1}^{n} (a'x_k)^2 \to a'\Delta a > 0 , $

and the sequences $\{\psi(r_{1k}) + \psi(r_{2k})\}$ and $\{(a'u_{1k})\psi(r_{2k}) + (a'u_{2k})\psi(r_{1k})\}$ are each i.i.d. with zero mean and finite variance, so asymptotic normality follows from Theorem 3.1. The two terms above can be shown to be uncorrelated using (B.1) and (B.5); again using (B.6), the result follows. $\Box$
Note that with $\psi(x) = x$, we have $A = 1$ and

$ D = 2(d_{11} + \sigma^2)\Delta + 2\,d_{11}\Sigma_u + 2\,\Sigma_u\beta\beta'\Sigma_u , $

and the limiting variance in (4.3.4) reduces to that of (3.2.4), which (keeping in mind that here we are suppressing the superscript "II") can be rewritten as

$ \sigma^2\left( \Delta^{-1} + \tfrac{1}{2}\Delta^{-1}\Sigma_u\Delta^{-1} \right) + \tfrac{1}{2}(\beta'\Sigma_u\beta)\left( \Delta^{-1} + \Delta^{-1}\Sigma_u\Delta^{-1} \right) + \tfrac{1}{2}\,\Delta^{-1}\Sigma_u\beta\beta'\Sigma_u\Delta^{-1} . $   (4.3.5)

With $\Sigma_u = 0$, $D = 4\,d_{11}\Delta$, and the limit variance in (4.3.4) becomes

$ A^{-2}\,d_{11}\,\Delta^{-1} , $

which is that of (4.1.3), the usual regression estimate.
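The exactly-observed reduction can be checked directly from the representation (4.3.3). The following is only a sketch, writing $D$ for the limiting covariance of the sum appearing in (4.3.3): with $\Sigma_u = 0$ we have $c_{ik} = x_k$ and $r_{ik} = \varepsilon_k$, so

```latex
\begin{align*}
n^{-1/2}\sum_{k=1}^{n}\bigl(c_{1k}\psi(r_{2k}) + c_{2k}\psi(r_{1k})\bigr)
  &= 2\,n^{-1/2}\sum_{k=1}^{n} x_k\,\psi(\varepsilon_k),\\
D &= 4\,d_{11}\lim_{n\to\infty} n^{-1}\sum_{k=1}^{n} x_k x_k'
   = 4\,d_{11}\,\Delta,\\
(2A)^{-2}\,\Delta^{-1} D\,\Delta^{-1} &= A^{-2}\,d_{11}\,\Delta^{-1},
\end{align*}
```

which is the familiar asymptotic variance of an M-estimate in the ordinary regression model.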
As a simple demonstration that $\hat{\beta}$ operates as desired, we consider a special case in which comparison of $\hat{\beta}$ and $\hat{\beta}_R$ is easy: let $\beta = 0$ (or, more generally, $\beta_2 = 0$, with $\beta$ partitioned as in previous chapters). Then the limit variance of $\hat{\beta}_R$ in (3.2.4) becomes

$ \sigma^2\left( \Delta^{-1} + \tfrac{1}{2}\,\Delta^{-1}\Sigma_u\Delta^{-1} \right) , $

whereas in (4.3.4), that of $\hat{\beta}$ is

$ A^{-2}\,d_{11}\left( \Delta^{-1} + \tfrac{1}{2}\,\Delta^{-1}\Sigma_u\Delta^{-1} \right) , $

and just as in the usual regression case, the asymptotic efficiency of any contrast $\gamma'\hat{\beta}_R$ relative to $\gamma'\hat{\beta}$ is

$ e = d_{11} / (\sigma^2 A^2) . $   (4.3.6)
Now, Huber (1973) has claimed that error distributions which frequently occur in practice are well-modeled by members of the following family of distribution functions:

$ (1 - \delta)\,\Phi(\sigma^{-1}x) + \delta\,\Phi((c\sigma)^{-1}x) , \qquad -\infty < x < \infty , $   (4.3.7)

where $\Phi$ is the standard normal distribution function (this does not specifically mean that the errors are normal with $\delta$-contamination by a more variable normal distribution; rather, this is just a convenient approximation to distributions heavier-tailed than the normal which often arise in practical situations).
Generally, $.01 < \delta < .15$, and $c = 3$ is typical. If with $\rho$ as in (4.1.1) we take $\psi = \rho'$, with $k$ being a multiple of some consistent estimate $\hat{\sigma}$ of $E^{1/2}(\varepsilon_1^2)$, it is easy to evaluate the ARE (4.3.6) for several typical values of $k$ and $\delta$ (and $c = 3$):
   $\delta$        $k = 1.50\hat{\sigma}$     $k = 2.00\hat{\sigma}$
   0                    1.037                      1.011
   .05                   .829                       .829
   .10                   .720                       .736

Thus, in this admittedly special case, $\hat{\beta}$ is nearly as efficient as $\hat{\beta}_R$ if $\varepsilon_1$ is exactly normal, while it can be vastly superior under error distributions such as those specified by (4.3.7). More closely examining (4.3.4) and (4.3.5) term by term, it seems that we should obtain the same type of results generally if, say, $r_{11}$ were distributed approximately as some member of the family (4.3.7).
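The tabled values can be reproduced numerically. The sketch below is a reconstruction under stated assumptions, not the dissertation's own computation: it takes Huber's $\psi$ with corner $k$, errors drawn from the family (4.3.7) with $\sigma = 1$ and $c = 3$, and evaluates the efficiency as $d_{11}/(\sigma_\varepsilon^2 A^2)$, where $\sigma_\varepsilon^2 = E(\varepsilon_1^2)$; only closed-form normal moments are needed.

```python
import math

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def huber_moments(k, s):
    """For X ~ N(0, s^2) and Huber's psi with corner k, return
    P(|X| <= k) = E(psi'(X)) and E(psi(X)^2)."""
    p = 2.0 * Phi(k / s) - 1.0
    ex2 = s * s * (p - 2.0 * (k / s) * phi(k / s))  # E(X^2 1{|X| <= k})
    return p, ex2 + k * k * (1.0 - p)

def efficiency(delta, k, c=3.0):
    """Efficiency d11 / (var * A^2) under the delta-contaminated
    normal (4.3.7) with sigma = 1 (an assumed reading of the tabled
    quantity; the corner k is on the same unit scale)."""
    comps = [(1.0 - delta, 1.0), (delta, c)]
    A = sum(w * huber_moments(k, s)[0] for w, s in comps)
    d11 = sum(w * huber_moments(k, s)[1] for w, s in comps)
    var = sum(w * s * s for w, s in comps)  # E(eps^2) under the mixture
    return d11 / (var * A * A)
```

Under these assumptions, `efficiency(0.0, 1.5)` is close to 1.037 and `efficiency(0.10, 1.5)` is close to .720, matching the first column of the table.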
4.4. One-Step Estimates.

In a given situation, it may be true that no solution $\hat{\beta}$ of (4.3.1) exists; even when one does exist, it may be very difficult to produce. Generally, (4.3.1) must be solved by iterative methods; these may involve a good deal of computing, and even then the sequence of solutions may not converge (much depends on the form of $\psi$ and on the choice of an appropriate preliminary estimate).

In the usual regression situation, Bickel (1975) has considered so-called "one-step" estimates. Starting with any $n^{1/2}$-consistent estimate of $\beta$ and performing one Newton-Raphson iteration of the system (4.1.2) yields an estimate which behaves asymptotically like the root. The advantages are obvious: an $n^{1/2}$-consistent estimate of $\beta$ is obtained without consideration of existence or convergence of a sequence of solutions (and, asymptotically, without loss of efficiency) and with substantial savings in computation. The same considerations lead us to consider one-step estimation in our EIV model.
We define some new assumptions concerning $\psi$:

(B.8)  $\psi$ is differentiable except possibly at a finite number of points, and at those points the distribution function of $r_{11}$ is continuous

(B.9)  $\lim_{M \to 0} \sup_{|e| < M} E\,|\psi'(r_{11} + e) - \psi'(r_{11})| = 0$

(B.10)  $\operatorname{Var}(\psi'(r_{11})) < \infty$

(Note that now the $A$ of assumption (B.7) is $E(\psi'(r_{11}))$.)

Assumptions (B.8) and (B.9) are easily seen to be satisfied if, say, $\psi'$ satisfies a Lipschitz condition. They hold much more generally, though; they are satisfied for the commonly used function $\psi$ corresponding to $\rho$ of (4.1.1).
Consider now an $n^{1/2}$-consistent estimate $\beta^*$ in our model ($\hat{\beta}_R$ as defined for Model II(ii) certainly suffices by discussions in Chapter III); let

$ r^*_{ik} = y_k - c_{ik}'\beta^* , \qquad i = 1, 2 , \quad k = 1, \ldots, n . $

One Newton-Raphson iteration of (4.3.2) (the linear approximation to (4.3.2) about $\beta^*$) yields

$ \sum_{k=1}^{n} \left( \psi'(r^*_{2k})\,c_{1k}c_{2k}' + \psi'(r^*_{1k})\,c_{2k}c_{1k}' \right)(\hat{\beta} - \beta^*) = \sum_{k=1}^{n} \left( c_{1k}\,\psi(r^*_{2k}) + c_{2k}\,\psi(r^*_{1k}) \right) , $   (4.4.1)

which implies

$ \hat{\beta} = \beta^* + \left( \sum_{k=1}^{n} \left( \psi'(r^*_{2k})\,c_{1k}c_{2k}' + \psi'(r^*_{1k})\,c_{2k}c_{1k}' \right) \right)^{-1} \sum_{k=1}^{n} \left( c_{1k}\,\psi(r^*_{2k}) + c_{2k}\,\psi(r^*_{1k}) \right) . $   (4.4.2)

Using Bickel's terminology, we let equation (4.4.2) define a Type I one-step estimate of $\beta$.
We define

$ \hat{A} = \hat{A}(\psi) = (2n)^{-1} \sum_{k=1}^{n} \left( \psi'(r^*_{1k}) + \psi'(r^*_{2k}) \right) . $

We now modify (4.4.2) and define a Type II one-step estimate as

$ \hat{\beta}_o = \beta^* + \hat{A}^{-1}\,(C_1'C_2 + C_2'C_1)^{-1}\, n^{1/2}\, T_n(n^{1/2}(\beta - \beta^*)) . $   (4.4.3)

(Note that $T_n(n^{1/2}(\beta - \beta^*)) = n^{-1/2}\sum_{k=1}^{n}( c_{1k}\psi(r^*_{2k}) + c_{2k}\psi(r^*_{1k}) )$, which does not involve the unknown $\beta$.) Our main result here is

Theorem 4.3. If (B.1)-(B.10) hold and $\hat{\beta}$ satisfies (4.3.1), then for both Type I and Type II one-step estimates,

$ n^{1/2}(\hat{\beta}_o - \hat{\beta}) = o_p(1) . $   (4.4.4)
We defer the proof of this result to Section 4.5.

Theorem 4.3 demonstrates that not only do our one-step estimates behave like the root of (4.3.1), but they are in fact asymptotically identical. Note also that $\hat{\beta}_o$ has the asymptotic normal distribution specified in Corollary 4.2.

It is clear that the one-step estimates (particularly Type II) are relatively easy to compute, especially if residuals from the computation of $\beta^*$ are available. We could, for example, write (4.4.3) as

$ \hat{\beta}_o = \beta^* + \hat{A}^{-1}\,(C_1'C_2 + C_2'C_1)^{-1}\left( C_1'R_2 + C_2'R_1 \right) , $   (4.4.5)

where $R_1$ and $R_2$ are vectors of "modified" residuals, $R_i = \psi(Y - C_i\beta^*)$; $(C_1'C_2 + C_2'C_1)^{-1}$ has presumably already been calculated (assuming we have used $\hat{\beta}_R$ as our preliminary estimate). If we use the commonly employed $\psi$ which corresponds to $\rho$ as in (4.1.1), computation as in (4.4.5) becomes quite simple. For some value $k$, we have

$ \hat{A} = 1 - (2n)^{-1} \times \{ \#\text{ of residuals exceeding } k \text{ in absolute value} \} , $

and in $R_1$ and $R_2$ most residuals will be unmodified, with extreme ones replaced by $\pm k$.
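The computation described around (4.4.5) can be sketched in a few lines. This is a hedged illustration rather than the dissertation's own code: the function and variable names are invented, Huber's $\psi$ is assumed, and $\hat{A}$ is computed as one minus the fraction of clipped residuals.

```python
import numpy as np

def one_step(y, C1, C2, beta_star, k):
    """Type II one-step estimate in the spirit of (4.4.3)/(4.4.5).

    y: (n,) responses; C1, C2: (n, p) replicate observations of the
    carriers; beta_star: preliminary root-n-consistent estimate
    (e.g. the replication estimate); k: Huber corner.
    """
    r1 = y - C1 @ beta_star              # residuals from each replicate
    r2 = y - C2 @ beta_star
    R1 = np.clip(r1, -k, k)              # "modified" residuals: extreme
    R2 = np.clip(r2, -k, k)              # ones replaced by +/- k
    # A-hat: 1 minus the fraction of the 2n residuals exceeding k
    n = len(y)
    A_hat = 1.0 - (np.sum(np.abs(r1) > k) + np.sum(np.abs(r2) > k)) / (2.0 * n)
    M = C1.T @ C2 + C2.T @ C1            # already in hand from beta_star
    return beta_star + np.linalg.solve(M, C1.T @ R2 + C2.T @ R1) / A_hat
```

One design point worth noting: if $k$ is large enough that no residual is clipped, a single step from any starting value lands exactly on the replication estimate $(C_1'C_2 + C_2'C_1)^{-1}(C_1 + C_2)'Y$, mirroring the $\psi(x) = x$ reduction of (4.3.2).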
4.5. Details.

We conclude our consideration of M-estimates with the proofs of Theorems 4.1 and 4.3. These will require several intermediate results.

Lemma 4.4. If (B.4) and (B.6) hold, then for all $\gamma \in \mathbb{R}^p$,

$ S_{n12}(\gamma) - E(S_{n12}(\gamma)) \to 0 \quad \text{in probability.} $

Proof. With $S^{(j)}_{n12}(\gamma)$ denoting the $j$th element of $S_{n12}(\gamma)$ and $c_{1kj}$ the $(k,j)$th element of $C_1$, it suffices to show that for $j = 1, \ldots, p$,

$ \operatorname{Var}(S^{(j)}_{n12}(\gamma)) \to 0 . $
By (B.4), for some $K$ we have

$ \operatorname{Var}(S^{(j)}_{n12}(\gamma)) = \operatorname{Var}\!\left( n^{-1/2} \sum_{k=1}^{n} c_{1kj}\left( \psi(r_{2k} + n^{-1/2}c_{2k}'\gamma) - \psi(r_{2k}) \right) \right) $

$ \leq K^2\, n^{-1} \sum_{k=1}^{n} E(c_{1kj}^2)\, E\!\left( (n^{-1/2}c_{2k}'\gamma)^2 \right) $

$ \leq K^2 \|\gamma\|^2\, n^{-2} \sum_{k=1}^{n} \left( x_{kj}^2 + \sigma_{jj}^2 \right)\left( x_k'x_k + \operatorname{tr}(\Sigma_u) \right) $

$ \leq K^2 \|\gamma\|^2 \left( n^{-1} \sum_{k=1}^{n} x_{kj}^2 + \sigma_{jj}^2 \right) n^{-1}\left( \max_{1 \leq k \leq n} \|x_k\|^2 + \operatorname{tr}(\Sigma_u) \right) $

$ = o(1) $ by (B.6) and Lemma 3.2 (here $\sigma_{jj}^2$ denotes the $(j,j)$th element of $\Sigma_u$). $\Box$
Lemma 4.5. Assume (B.4) and (B.6) hold; then there exists $K_0$ such that for all $\delta > 0$ and all $\gamma$,

$ P\!\left( \sup_{\|\gamma^* - \gamma\| \leq \delta} \| S_{n12}(\gamma^*) - S_{n12}(\gamma) \| \geq K_0\delta \right) \to 0 . $

Proof. By (B.4), for all $\gamma$, $\gamma^*$,

$ \| S_{n12}(\gamma) - S_{n12}(\gamma^*) \| \leq K\, n^{-1} \sum_{k=1}^{n} \|c_{1k}\| \|c_{2k}\|\, \|\gamma - \gamma^*\| . $   (4.5.1)

Thus

$ \sup_{\|\gamma^* - \gamma\| \leq \delta} \| S_{n12}(\gamma^*) - S_{n12}(\gamma) \| \leq K\delta\, n^{-1} \sum_{k=1}^{n} \|c_{1k}\| \|c_{2k}\| = W_n = W_n(\delta) , \quad \text{say.} $   (4.5.2)

By (4.5.1) and (4.5.2), it will suffice to show that there exists $K_0$ such that $P(W_n \leq K_0\delta) \to 1$. Now

$ E(W_n) = K\delta\, n^{-1} \sum_{i=1}^{n} E(\|c_{1i}\| \|c_{2i}\|) \leq K\delta\, n^{-1} \sum_{i=1}^{n} E(\|c_{1i}\|^2) = K\delta\left( \operatorname{tr}(\Sigma_u) + n^{-1} \sum_{i=1}^{n} \|x_i\|^2 \right) \to K\delta \operatorname{tr}(\Sigma_u + \Delta) $

by (B.6). Thus if we choose $K_1 > K \operatorname{tr}(\Delta + \Sigma_u)$, then for $n$ large enough we have

$ E(W_n) \leq K_1\delta . $   (4.5.3)

Note that $K_1$ is independent of $\delta$. Now

$ \operatorname{Var}(W_n) = K^2\delta^2\, n^{-2} \sum_{i=1}^{n} \operatorname{Var}(\|c_{1i}\| \|c_{2i}\|) \leq K^2\delta^2\, n^{-2} \sum_{i=1}^{n} E(\|c_{1i}\|^2)\, E(\|c_{2i}\|^2) = K^2\delta^2\, n^{-2} \sum_{i=1}^{n} \left( \|x_i\|^2 + \operatorname{tr}(\Sigma_u) \right)^2 . $

But by (B.6) and Lemma 3.2,

$ n^{-2} \sum_{i=1}^{n} \|x_i\|^4 \leq n^{-1} \max_{1 \leq i \leq n} \|x_i\|^2 \times n^{-1} \sum_{i=1}^{n} \|x_i\|^2 \to 0 . $

Hence

$ \operatorname{Var}(W_n) \to 0 . $   (4.5.4)

Now if we choose $K_0 > 2K_1$, then by (4.5.3) and (4.5.4),

$ P(W_n > K_0\delta) \leq P\!\left( W_n - E(W_n) > (K_0 - 2K_1)\delta \right) \leq P\!\left( |W_n - E(W_n)| > (K_0 - 2K_1)\delta \right) \leq \operatorname{Var}(W_n)\left( (K_0 - 2K_1)\delta \right)^{-2} \to 0 . $

Therefore

$ P(W_n \leq K_0\delta) \to 1 . \qquad \Box $
Lemma 4.6. If (B.1) and (B.4)-(B.7) hold, then for all $M > 0$,

$ \sup_{\|\gamma\| \leq M} \| S_n(\gamma) - 2\Delta\gamma A \| \to 0 \quad \text{in probability.} $

Proof. Clearly it suffices to show that

$ \sup_{\|\gamma\| \leq M} \| S_{n12}(\gamma) - \Delta\gamma A \| \to 0 \quad \text{in probability,} $

since a corresponding result must hold for $S_{n21}$. Fix $\delta > 0$; we can cover the set $\|\gamma\| \leq M$ with a finite number, say $J$, of balls of radius $\delta$. We call these $b_1, \ldots, b_J$, centered at $\gamma_1, \ldots, \gamma_J$. By Lemma 4.5, we can find $K_0$ and, for each $j \leq J$, $n_j$ such that for $n > n_j$,

$ P\!\left( \sup_{\gamma^* \in b_j} \| S_{n12}(\gamma_j) - S_{n12}(\gamma^*) \| \geq K_0\delta \right) < \delta/2J . $   (4.5.5)

For each $j \leq J$, we can by Lemma 4.4 find $n_j'$ such that if $n > n_j'$,

$ P\!\left( \| S_{n12}(\gamma_j) - E(S_{n12}(\gamma_j)) \| \geq \delta \right) < \delta/2J . $   (4.5.6)

Now choose $n_0 > \max(n_1, \ldots, n_J, n_1', \ldots, n_J')$; we have

$ P\!\left( \sup_{\|\gamma\| \leq M} \| S_{n12}(\gamma) - E(S_{n12}(\gamma)) \| \geq (1 + 2K_0)\delta \right) $

$ \leq P\!\left( \max_{j \leq J} \sup_{\gamma^* \in b_j} \left( \| S_{n12}(\gamma_j) - S_{n12}(\gamma^*) \| + \| E(S_{n12}(\gamma_j) - S_{n12}(\gamma^*)) \| \right) \geq 2K_0\delta \right) + P\!\left( \max_{j \leq J} \| S_{n12}(\gamma_j) - E(S_{n12}(\gamma_j)) \| \geq \delta \right) $

$ \leq J \times \delta/2J + J \times \delta/2J = \delta $ if $n > n_0$,

by (4.5.5) and (4.5.6). Thus

$ \sup_{\|\gamma\| \leq M} \| S_{n12}(\gamma) - E(S_{n12}(\gamma)) \| \to 0 \quad \text{in probability.} $   (4.5.7)

Now by (B.1), (B.5), and (B.7),

$ E(S_{n12}(\gamma)) = n^{-1} \sum_{i=1}^{n} x_i x_i'\,\gamma A + n^{-1/2} \sum_{i=1}^{n} x_i\, O\!\left( n^{-1} E((c_{2i}'\gamma)^2) \right) = n^{-1} \sum_{i=1}^{n} x_i x_i'\,\gamma A + h_n(\gamma) , \quad \text{say.} $   (4.5.8)

Since $E((c_{2i}'\gamma)^2) = \gamma'(x_i x_i' + \Sigma_u)\gamma$, for some constant $C$,

$ \| h_n(\gamma) \| \leq C\, n^{-1/2} \max_{1 \leq i \leq n} \|x_i\| \times \gamma'\!\left( n^{-1} \sum_{i=1}^{n} x_i x_i' + \Sigma_u \right)\!\gamma \to 0 $   (4.5.9)

uniformly on $\|\gamma\| \leq M$, by (B.6) and Lemma 3.2; and since $n^{-1} \sum_{i=1}^{n} x_i x_i' \to \Delta$, we have

$ \sup_{\|\gamma\| \leq M} \left\| n^{-1} \sum_{i=1}^{n} x_i x_i'\,\gamma A - \Delta\gamma A \right\| \to 0 . $   (4.5.10)

Thus by (4.5.8)-(4.5.10),

$ \sup_{\|\gamma\| \leq M} \| E(S_{n12}(\gamma)) - \Delta\gamma A \| \to 0 . $   (4.5.11)

The result follows upon combining (4.5.7) and (4.5.11). $\Box$
The following result is a slight modification of a result of Jurečková (1977).

Lemma 4.7. If (B.1)-(B.7) hold, then for all $\delta > 0$, $\eta > 0$, there exist $M$, $n_0$ such that for $n > n_0$,

$ P\!\left( \inf_{\|\gamma\| \geq M} \| T_n(\gamma) \| < \eta \right) < \delta . $

Proof. Fix $\delta$, $\eta$; since $T_n(0)$ has uniformly bounded variance, there exists $M_0 > 2\eta$ such that

$ P(\| T_n(0) \| > M_0) < \delta . $   (4.5.12)

Now choose $M$ to satisfy

$ A\,\lambda_p(\Delta)\, M \geq M_0 , $   (4.5.13)

where $\lambda_p(\Delta)$ denotes the smallest eigenvalue of $\Delta$. We have

$ P\!\left( \inf_{\|\gamma\| = M} \gamma' T_n(\gamma) < \eta M \right) \leq P\!\left( \inf_{\|\gamma\| = M} \gamma' T_n(\gamma) < \eta M \ \text{and} \ \inf_{\|\gamma\| = M} \left( \gamma' T_n(0) + 2\gamma'\Delta\gamma A \right) \geq 2\eta M \right) + P\!\left( \inf_{\|\gamma\| = M} \left( \gamma' T_n(0) + 2\gamma'\Delta\gamma A \right) < 2\eta M \right) = P_1 + P_2 , \quad \text{say.} $

Since $\sup(g - f) \geq \inf g - \inf f$,

$ P_1 \leq P\!\left( \sup_{\|\gamma\| = M} \gamma'\left( T_n(0) - T_n(\gamma) + 2\Delta\gamma A \right) \geq \eta M \right) $

$ \leq P\!\left( \sup_{\|\gamma\| = M} \|\gamma\|\, \| T_n(0) - T_n(\gamma) + 2\Delta\gamma A \| \geq \eta M \right) $

$ = P\!\left( \sup_{\|\gamma\| = M} \| S_n(\gamma) - 2\Delta\gamma A \| \geq \eta \right) \to 0 $ by Lemma 4.6.

In addition,

$ P_2 = P\!\left( \inf_{\|\gamma\| = M} \left( \gamma' T_n(0) + 2\gamma'\Delta\gamma A \right) < 2\eta M \right) \leq P\!\left( \inf_{\|\gamma\| = M} \gamma' T_n(0) + 2A\lambda_p(\Delta)M^2 < 2\eta M \right) $

$ \leq P\!\left( -M \| T_n(0) \| < 2\eta M - 2A\lambda_p(\Delta)M^2 \right) = P\!\left( \| T_n(0) \| > -2\eta + 2A\lambda_p(\Delta)M \right) $

$ \leq P\!\left( \| T_n(0) \| > -2\eta + 2M_0 \right) $ by (4.5.13)

$ \leq P(\| T_n(0) \| > M_0) < \delta $ by (4.5.12).

Thus there exists $n_0$ such that for $n > n_0$,

$ P\!\left( \inf_{\|\gamma\| = M} \gamma' T_n(\gamma) < \eta M \right) < \delta . $   (4.5.14)

Suppose $\|\gamma\| \geq M$; then define $\gamma^* = t^{*-1}\gamma$ with $t^* = \|\gamma\| M^{-1}$, and note that $\|\gamma^*\| = M$ and $t^* \geq 1$; thus

$ P\!\left( \inf_{\|\gamma\| \geq M} \| T_n(\gamma) \| < \eta \right) \leq P\!\left( \inf_{\|\gamma\| \geq M} \gamma' T_n(\gamma)\left( \|\gamma\| M^{-1} \right)^{-1} < \eta M \right) = P\!\left( \inf_{\|\gamma\| \geq M} \gamma^{*\prime}\, T_n(t^*\gamma^*) < \eta M \right) $

$ \leq P\!\left( \inf_{\|\gamma\| \geq M} \gamma^{*\prime}\, T_n(\gamma^*) < \eta M \right) $ by (B.3), since $t^* \geq 1$,

and this last term is

$ \leq P\!\left( \inf_{\|\gamma^*\| = M} \gamma^{*\prime}\, T_n(\gamma^*) < \eta M \right) < \delta $

if $n > n_0$ by (4.5.14), which completes the proof. $\Box$
Corollary 4.8. Under the conditions of Lemma 4.7, if $\hat{\beta}$ satisfies (4.3.1), then

$ n^{1/2}(\hat{\beta} - \beta) = O_p(1) . $

Proof. Let $\hat{\beta}$ solve $T_n(n^{1/2}(\hat{\beta} - \beta)) = 0$; then

$ \inf_{\|\gamma\| \geq M} \| T_n(\gamma) \| > \eta \implies n^{1/2} \| \hat{\beta} - \beta \| < M , $

so

$ P\!\left( n^{1/2} \| \hat{\beta} - \beta \| < M \right) \geq P\!\left( \inf_{\|\gamma\| \geq M} \| T_n(\gamma) \| > \eta \right) > 1 - \delta \quad \text{for } n > n_0 , $ by Lemma 4.7. $\Box$

We are now in position to prove Theorem 4.1.
Proof of Theorem 4.1. Fix $\delta > 0$; then by Corollary 4.8, there exist $M$, $n_0$ such that for $n > n_0$,

$ P\!\left( n^{1/2} \| \hat{\beta} - \beta \| > M \right) < \delta/2 . $   (4.5.15)

Now by Lemma 4.6 we can find $n_1 > n_0$ such that for all $n > n_1$,

$ P\!\left( \sup_{\|\gamma\| \leq M} \| S_n(\gamma) - 2\Delta\gamma A \| > \delta \right) < \delta/2 . $

Combining this with (4.5.15), for all $n > n_1$,

$ P\!\left( \| S_n(n^{1/2}(\hat{\beta} - \beta)) - 2\Delta\, n^{1/2}(\hat{\beta} - \beta)\, A \| > \delta \right) < \delta . $

But of course

$ S_n(n^{1/2}(\hat{\beta} - \beta)) = T_n(n^{1/2}(\hat{\beta} - \beta)) - T_n(0) = -T_n(0) , $

so we have, for large enough $n$,

$ P\!\left( \| T_n(0) - 2A\Delta\, n^{1/2}(\hat{\beta} - \beta) \| < \delta \right) > 1 - \delta . $

Since $\delta$ is arbitrary, this gives us

$ T_n(0) = 2A\Delta\, n^{1/2}(\hat{\beta} - \beta) + o_p(1) , $

and (4.3.3) follows. $\Box$

We will employ the following lemmas in our proof of Theorem 4.3.
Lemma 4.9. If (B.8) and (B.9) hold, then for $i = 1, 2$,

$ n^{-1} \sum_{k=1}^{n} \psi'(r^*_{ik}) \to A \quad \text{in probability.} $

Proof. It suffices to show that

$ n^{-1} \sum_{i=1}^{n} \psi'(r^*_{1i}) \to A \quad \text{in probability.} $

By the $n^{1/2}$-consistency of $\beta^*$, for any $\delta > 0$ we can find $M$ large enough so that for large enough $n$,

$ P\!\left( \| n^{1/2}(\beta - \beta^*) \| > M \right) < \delta . $   (4.5.16)

For $\gamma \in \mathbb{R}^p$, let

$ h_n(\gamma) = n^{-1} \sum_{i=1}^{n} \left( \psi'(r_{1i} + n^{-1/2} c_{1i}'\gamma) - \psi'(r_{1i}) \right) . $

We have

$ P\!\left( | h_n(n^{1/2}(\beta - \beta^*)) | > \delta \right) \leq P\!\left( \sup_{\|\gamma\| \leq M} | h_n(\gamma) | > \delta \right) + P\!\left( \| n^{1/2}(\beta - \beta^*) \| > M \right) . $   (4.5.17)

Now for any $\tau > 0$,

$ P\!\left( \sup_{\|\gamma\| \leq M} | h_n(\gamma) | > \delta \right) \leq P\!\left( n^{-1} \sum_{i=1}^{n} \sup_{|e| \leq \tau M} | \psi'(r_{1i} + e) - \psi'(r_{1i}) | > \delta \right) + P\!\left( \max_{1 \leq i \leq n} \| n^{-1/2} c_{1i} \| > \tau \right) $

$ \leq P\!\left( \max_{1 \leq i \leq n} \| n^{-1/2} c_{1i} \| > \tau \right) + o(1) $

by the law of large numbers and (B.9) (choosing $\tau$ small), and the remaining term tends to zero since $n^{-1/2} \max_{1 \leq i \leq n} \| c_{1i} \| \to 0$ in probability. Thus

$ P\!\left( \sup_{\|\gamma\| \leq M} | h_n(\gamma) | > \delta \right) \to 0 , $   (4.5.18)

and by (4.5.16)-(4.5.18) we have

$ h_n(n^{1/2}(\beta - \beta^*)) \to 0 \quad \text{in probability.} $

Recalling that

$ r^*_{1i} = r_{1i} + n^{-1/2} c_{1i}'\, n^{1/2}(\beta - \beta^*) , $

this gives us

$ n^{-1} \sum_{i=1}^{n} \psi'(r^*_{1i}) = n^{-1} \sum_{i=1}^{n} \psi'(r_{1i}) + o_p(1) . $

The members of the sequence $\{\psi'(r_{1i})\}_{i=1,\ldots,n}$ are i.i.d., and the result follows by the WLLN. $\Box$
Lemma 4.10. If (B.1), (B.5)-(B.6), and (B.8)-(B.10) hold, then

$ n^{-1} \sum_{i=1}^{n} \left( \psi'(r^*_{2i})\, c_{1i}c_{2i}' + \psi'(r^*_{1i})\, c_{2i}c_{1i}' \right) \to 2A\Delta \quad \text{in probability.} $

Proof. It suffices to show that

$ n^{-1} \sum_{i=1}^{n} \psi'(r_{2i})\, c_{1i}c_{2i}' \to A\Delta \quad \text{in probability, as } n \to \infty . $   (4.5.19)

Similarly as in Lemma 4.9, we obtain that

$ n^{-1} \sum_{i=1}^{n} \psi'(r^*_{2i})\, c_{1i}c_{2i}' = n^{-1} \sum_{i=1}^{n} \psi'(r_{2i})\, c_{1i}c_{2i}' + o_p(1) . $   (4.5.20)

Noting now that the sequences $\{u_{1i}\,\psi'(r_{2i})\}$ and $\{u_{2i}\,\psi'(r_{2i})\}$ are each i.i.d. with zero mean and finite variance (note in particular that since $\psi'$ is even, the symmetry of the error distribution implies $E(u_{2i}\,\psi'(r_{2i})) = 0$), we obtain

$ n^{-1} \sum_{i=1}^{n} \psi'(r_{2i})\, c_{1i}c_{2i}' = n^{-1} \sum_{i=1}^{n} \psi'(r_{2i})\, x_i x_i' + o_p(1) . $

In light of (4.5.19) and (4.5.20), we need only show that

$ n^{-1} \sum_{i=1}^{n} \psi'(r_{2i})\, x_i x_i' \to A\Delta \quad \text{in probability,} $   (4.5.21)

or, for $k, j = 1, \ldots, p$,

$ n^{-1} \sum_{i=1}^{n} x_{ik}x_{ij}\,\psi'(r_{2i}) - A\Delta_{kj} \to 0 \quad \text{in probability,} $

where $x_{st}$ and $\Delta_{st}$ denote the $(s,t)$th elements of $X$ and $\Delta$, respectively. Now

$ E\!\left( n^{-1} \sum_{i=1}^{n} x_{ik}x_{ij}\,\psi'(r_{2i}) - A\Delta_{kj} \right) = A\, n^{-1} \sum_{i=1}^{n} x_{ik}x_{ij} - A\Delta_{kj} , $

which goes to zero by (B.6), and

$ \operatorname{Var}\!\left( n^{-1} \sum_{i=1}^{n} x_{ik}x_{ij}\,\psi'(r_{2i}) \right) = \left( n^{-2} \sum_{i=1}^{n} x_{ik}^2 x_{ij}^2 \right) \operatorname{Var}(\psi'(r_{21})) \leq n^{-1} \max_{1 \leq i \leq n} x_{ik}^2 \times n^{-1} \sum_{i=1}^{n} x_{ij}^2 \times \operatorname{Var}(\psi'(r_{21})) \to 0 $

by (B.6), (B.10), and Lemma 3.2. $\Box$
We are now in position to quickly complete the proof of Theorem 4.3.

Proof of Theorem 4.3. Since $(\beta - \beta^*) = O_p(n^{-1/2})$, we can apply Lemma 4.6 to obtain

$ S_n(n^{1/2}(\beta - \beta^*)) = 2A\, n^{1/2}\Delta(\beta - \beta^*) + o_p(1) ; $

hence

$ T_n(n^{1/2}(\beta - \beta^*)) = T_n(0) + 2A\, n^{1/2}\Delta(\beta - \beta^*) + o_p(1) . $   (4.5.22)

Now letting

$ g(\mathrm{I}) = \sum_{i=1}^{n} \left( \psi'(r^*_{2i})\, c_{1i}c_{2i}' + \psi'(r^*_{1i})\, c_{2i}c_{1i}' \right) , \qquad g(\mathrm{II}) = \hat{A}\,(C_1'C_2 + C_2'C_1) , $

we have by (4.4.2), (4.4.3), and (4.5.22) that for $j = \mathrm{I}, \mathrm{II}$, the Type $j$ one-step estimate satisfies

$ n^{1/2}(\hat{\beta}_o - \beta^*) = n\,(g(j))^{-1}\, T_n(n^{1/2}(\beta - \beta^*)) . $

By Lemmas 4.9 and 4.10, we have immediately that for $j = \mathrm{I}, \mathrm{II}$,

$ n\,(g(j))^{-1} = (2A\Delta)^{-1} + o_p(1) , $

so both types of one-step estimates satisfy

$ n^{1/2}(\hat{\beta}_o - \beta^*) = (2A)^{-1}\Delta^{-1}\, T_n(0) + n^{1/2}(\beta - \beta^*) + o_p(1) $

$ \implies n^{1/2}(\hat{\beta}_o - \beta) = (2A)^{-1}\Delta^{-1}\, T_n(0) + o_p(1) . $

This is the same representation (4.3.3) which we demonstrated for the root of equation (4.3.1). $\Box$
BIBLIOGRAPHY

Adcock, R.J. (1878). A problem in least squares. The Analyst 5 53-54.

Anderson, T.W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal distributions. Ann. Math. Statist. 22 327-351.

Anderson, T.W. (1976). Estimation of linear functional relationships: approximate distributions and connections with simultaneous equations in econometrics. J. Roy. Statist. Soc. B 38 1-20.

Anderson, T.W. and Taylor, J.B. (1976). Strong consistency of least squares estimates in normal linear regression. Ann. Statist. 4 788-790.

Bickel, P.J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428-434.

Drygas, H. (1976). Weak and strong consistency of the least squares estimators in regression models. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 34 119-127.

Eicker, F. (1963). Asymptotic normality and consistency of the least squares estimators for families of linear regressions. Ann. Math. Statist. 34 447-456.

Frisch, R. (1934). Statistical Confluence Analysis by Means of Complete Regression Systems. University Institute of Economics, Oslo.

Fuller, W.A. (1980). Properties of some estimators for the errors-in-variables model. Ann. Statist. 8 407-422.

Gleser, L.J. (1981). Estimation in a multivariate "errors in variables" regression model: large sample results. Ann. Statist. 9 24-44.

Graybill, F.A. (1969). Introduction to Matrices with Applications in Statistics. Wadsworth Publishing Company, Belmont, Cal.

Hampel, F.R. (1973). Robust estimation: a condensed partial survey. Z. Wahrscheinlichkeitstheorie und verw. Gebiete 27 87-104.

Healy, J.D. (1975). Estimation and tests for unknown linear restrictions in multivariate linear models. Ph.D. thesis, Purdue University.

Healy, J.D. (1980). Maximum likelihood estimation of a multivariate linear functional relationship. J. Multivariate Anal. 10 243-251.

Hoaglin, D.C. and Welsch, R.E. (1978). The hat matrix in regression and ANOVA. The American Statistician 32 17-22.

Huber, P.J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73-101.

Huber, P.J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1 799-821.

Huber, P.J. (1977). Robust Statistical Procedures. SIAM, Philadelphia.

Huber, P.J. (1981). Robust Statistics. Wiley, New York.

Jurečková, J. (1977). Asymptotic relations of M-estimates and R-estimates in linear regression model. Ann. Statist. 5 464-472.

Kummell, C.H. (1879). Reduction of observation equations which contain more than one observed quantity. The Analyst 6 97-105.

Lai, T.L., Robbins, H. and Wei, C.Z. (1978). Strong consistency of least squares estimates in multiple regression. Proc. Natl. Acad. Sci. USA 75 3034-3036.

Lindley, D.V. (1947). Regression lines and the linear functional relationship. J. Roy. Statist. Soc. Suppl. 9 219-244.

Madansky, A. (1959). The fitting of straight lines when both variables are subject to error. J. Amer. Statist. Assoc. 54 173-205.

Neyman, J. and Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica 16 1-32.

Okamoto, M. (1973). Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1 763-765.

Relles, D.A. (1968). Robust regression by modified least squares. Ph.D. thesis, Yale University.

Wald, A. (1940). Fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11 284-300.

Yohai, V.J. and Maronna, R.A. (1979). Asymptotic behavior of M-estimators for the linear model. Ann. Statist. 7 258-268.