Gallo, Paul PConsistency of Regression Estimates When Some Variables are Subject to Error."

CONSISTENCY OF REGRESSION ESTIMATES
WHEN SOME VARIABLES ARE SUBJECT TO ERROR
by
Paul P. Gallo*
University of North Carolina at Chapel Hill
Abstract
For a general univariate "errors-in-variables" model, the maxiIm.nn
likelihood estimate of the parameter vector (assuming normality of the
errors), which has been described in the literature, can be expressed in
an alternative form.
In this form, the estimate is computationally
simpler, and deeper investigation of its properties is facilitated.
In
particular, we demonstrate that, under conditions a good deal less
restrictive than those which have been previously assumed, the estimate
is weakly consistent.
AMS 1980 Subject Classifications:
Key Words and Phrases:
62F10, 62F12, 62J99.
Errors-in-variables model, maximum likelihood
estimate, weak consistency.
*This research was supported in part by a National Science Foundation
Graduate Fellowship, and by the Air Force Office of Scientific Research
under Grant AFOSR-80-0080.
2
1.
Introduction.
The estimation of linear regression parameters when some variables
cannot be ascertained due to measurement or observation error is a
problem with a long history in the statistical literature, yet one with a
considerable recent emphasis.
We consider a general "errors-in-variab1es"
model in which some subset of the variables is observed with error (much
of the literature concerns the case in which all variables are subject to
error, with parhcu1ar emphasis on models with just one independent
variable; see Moran (1971) and Kendall and Stuart (1961, Chapter 27)).
Our model is
y
+
+
= X2
£
nxl
+ U ,
where 8 and 8 are vectors of regression parameters to be estimated, Y
1
2
and C consist of observable random variables, Xl and X2 consist of
constants but Xl is known and X2 is not, and £ and U are composed of
random variables such that the rows of [U £] are LLd. with mean zero
and unknown non-singular covariance matrix l: •
~:: :£zJ'
(Models such
as this with the independent variables being constants have generally
been referred to under the title "linear functional re1atlonship." A
related model in which the variables are stochastic has been called a
"linear structural relationship"; see Madansky (1959) for discussion.)
Although in our discussion n will vary, there should be no confusion if
we do not subscript the matrices involved.
We consider maximum likelihood estimation under the assumption
that the errors are jointly normally distributed.
It is well-known that
3
the supremum of the likelihood is infinite unless we impose additional
structure on
~
(and furthermore, that under any conditions on
~
which
yield a solution to the likelihood equations, the estimate obtained is
the same as that obtained by the method of weighted least squares).
The
assumption most frequently made in the literature, and one which we will
adopt, is
~uo
(1.1)
~
=
2
0 ~
0
=
~EUO
2
0
L'
Ella
1
with Lo known .
The most detailed results along these lines can be obtained from the work
of GIeser and his students, who considered multivariate regression
models.
In our model, let
(1. 2)
(Ai (A) denotes i th largest eigenvalue of A) ,
and
g' = (gi
g2) is an eigenvector associated with 8 .
lxl
Healy (1975) has shown that if g2
exist and are given by:
(1.3)
~ 0,
then the MLE' s of Sl and 02
4
In Section Z, we demonstrate that the MLE can be expressed in
alternate forms which are easier to interpret than (1.3), as well as
computationally simpler.
These "simpler forms" also facilitate deeper
investigation of certain properties of the estimate.
In Section 3, we
/\
/\
consider one such aspect:
we demonstrate that B and B are weakly
l
Z
consistent estimates under conditions weaker than those which have been
previously shown.
Z.
The MLE under Normality.
In this section, we will make use of the following obvious notation:
X = [Xl
XZ]
L:*
C* = [Xl
C]
B' = [Bi
B'Z]
-
~EOu
pxp
EU
EOJ
(JZ
L:*'
EU
P = PI
=
~ J
Pz
+
U* = [0 U]
we define
E~, E~o' L:~uo
analogously.
Also, let H
=
[C* Y]' [C* Y].
The main result of this section is:
Theorem 1.
In our model, if the joint distribution of the errors is
absolutely continuous with respect to Lebesgue measure, then the
normality-MLE of
B exists almost surely and is given by
/\
BZ = (C' RC - BEUO) -1 (C' RY- BE EUO)
(Z .1)
/\
Bl = (Xi Xl)
-1/\
Xi (Y - CB z)
with B andR given by (l.Z); we also have
,
5
B=
(2.2)
e=
with
y
-1
, y
(C*' C* - el:~J -1 (C*' y - el:~uJ '
= largest
root of
Il:*o -
yH I
=
o.
A
In the form (2.2), B can be viewed as a modification of the ordinary
least squares regression estimate, which is known to be inconsistent in
the errors-in-variab1es (E.I.V.) case.
In fact, the
est~te
seems to
operate much like the "method-of-moments" estimate described by Fuller
(1980).
In an E.I.V. model in which l:u and l: £u can be consistently and
independently estimated but are otherwise unknown, Puller has proposed
estimates such as
A
A
B = (c*' C* - nl:*)
u
-1
A
(C*' Y- nl:*£u ) •
Under the assLUnption that n -1 X' X converges to a finite matrix,
Healy (1975) showed that n-1 e consistently estimates a 2 ; hence
n-1 el:
uo
£ l:u
in our model.
Thus, while Fuller's method requires an
"external" variance estimate, the maxiJm.ml likelihood approach in effect
produces its own "internal" estimate.
Of course we do not get this for
free; the price we have paid is the additional structure that we have
imposed upon l:.
In proving Theorem 1, we will make use of the following result:
Lerrma 1.
Under the conditions of Theorem 1,
eigenvalue of C' RC with probability one.
e=
-1
A +1 (l:
P2
0
W)
is not an
6
Proof.
Let G
=
ll
[G
G
2l
be the matrix of normalized eigenvectors
G
G12J
22
lxl
associated with the (ordered) eigenvalues of l:-lW with F
o '
partitioned similarly. Thus
l:-l W = GDF
(2.3)
with D
= G- l
o
=~
diag(Al(L
-1
-1
W), ... , A (l: W)).
P2 0
o
Equation (2.3)
implies that
(2.4)
C'RC
=
(l:
uo
GIl + l:
G )(A- 8r )F +
EUO 2l
P2 ll
8rP2
From GIeser (1981), we infer that LuoG ll + l:EuoG2l and Fll are
non-singular a.s. if the error distribution is absolutely continuous; it
-1
follows from a result of Okamoto (1973) that the eigenvalues of L: Ware
o
distinct with probability one (all we need is 8 ~ A (l:-lW)), in which
P2 0
The
result
follows
since
(2.4) implies
A8r
is
non
-singular.
case
Pz
that C' RC -
8r
o
is non-singular a.s.
P2
Proof of Theorem 1.
' RC - el:
From the definition of 8 and G,
uo
C' RY -
(2.5)
[Y' RC - el:'EUO
GIeser (1981) has shown that G
22
eLe:u~
Y' RY - 8
~
f l=
12
0 •
J~zzJ
0 a. s., in which case the MLE exists.
As mentioned above, 8 has multiplicity one a.s., so the left-hand matrix
has rank P2 w. p. 1, and solutions to (2.5) will be determined by
7
equations corresponding to any P2 linearly independent rows of that
matrix.
In light of Lenuna 1, the first P2 rows will do:
(C' RC - el:UO )G12
(2.6)
+ ( C'
RY - 8l: EUO )G2 2
0
=
A
By (1.3), this is S2' which demonstrates (2.1).
For the second part of the theorem, note first that
from which it follows that 8 of eq. (2.2) is the same as that of (1.2)
A
(in this part of the theorem, we want to express S in a form which does
not explicitly refer to our partitions of the matrices involved).
Now according to (2.2),
~:J
=
~~
~
X' C J 1
C'X1
C' C SL
lX'11
X ) -1 + (X'1
uo
~ ~
X' Y
C' Y SL
~) - 1 x'1 CQC'
J
Ella
- (X' X ) -1 X' CQ
X1(X'
X )-1
11
1 1
X'Y
1
1
=
-QC' X (X' X )-1
1
(with Q
=
(C' RC- 8l:
llO
)-1)
(X' X)-l X' Y - (X' X )-1 X' C[Q(C' Y- 8l:
1 1
1
1 1
1
Ella
) - QC' X (X' X )-1 X' Y]
1
=
Q(C'Y-8l:
which implies
C'Y-8l: EUO
Q
1 1
Ella
) - QC'X (X' X )-1 X'Y
1
1 1
1
1 1
1
8
S2 = Q(C' RY - 8E cuo)
=
(C' RC -
8E
uo
) -1
(C' RY -
8E
cuo
)
o
These agree with (1.3) and (2.6).
3.
Consistency.
1\
Various results concerning weak and strong consistency of S in our
model and related models have been described by Healy (1975), Bhargava
(1975), and GIeser (1981).
(3.1)
Generally, all require that
limn -1 X' X exists and is positive definite .
~
Such a condition on X is Jm.lch stronger than conditions which have been
shown to be sufficient for consistency of the usual linear regression
estimate (the special case of our model with P2 = 0).
In recent years,
results of increasing strength and generality on this matter have been
produced:
see Eicker (1963), Drygas (1976), Anderson and Taylor (1976),
Lai et al. (1979).
Conditions on the errors vary somewhat among these
papers, but the condition on X which is crucial to all of them is
(3.2)
A (X I X)
P
-+
00
as n
'+
00
•
We would like to find conditions "intermediate" between (3.1) and (3.2)
1\
which are sufficient for weak consistency of S.
Theorem 2.
(A. 1)
If the following conditions on X are satisfied:
n -~ A (X I X)
P
-+
00
as n
-+
00
9
(A.2)
-+
00
as n
-+
00
,
and the joint distribution of the errors possesses finite fourth moment,
then
8 R.
B as n
-+
00 •
(Note that we are obtaining consistency without
using the assumption that the errors are normally distributed.)
The following simple lennna will be useful:
(i)
Lennna 2.
(ii)
Proof.
Al (XZRX 2 )
:S
Al (X' X) ;
letting (X' X) -1
=
L
1
PXPI
L ], Al (L LZJ
2
2
X
P P2
:S
Ai (X' X) -1 .
Since (X RX ) -1 is the lower right -hand submatrix of (X' X) -1 ,
Z 2
A (X' X) -1
P
=
inf
II ZII =1
Z' (X' X) -1 Z :s
Z ' (X' RX ) -1 Z = A
inf
2
Ilzll =1
2
Noting that the non-zero eigenvalues of L L
2
P2
2 and L2L2
(X' RX )-1
2
2
are identical,
(ii) follows similarly since L L2 is a lower right-hand submatrix of
2
(X' X) -2 •
0
Proof of Theorem 2.
B=
(C*' C* - 8l:* ) -1 (C*' Y - 8l:* )
uo
EUO
= (I
P
+ (X' X)-l (X'U* + U*' X + (U*'U* - nl:*) + (na
u
x (X' X)-l (X' Y + U*' XB + (U*t
E -
2 - 8)l:* ))-1
nl:* ) + (na
EU
00
2
- 8)l:*
EUO
) •
10
Clearly, it will suffice to show that:
R0
(X' X) -I X'U*
(i)
ex' X) -I
(ii)
U*' X R 0
(iii)
(X'X)-l (U*'U* - n~*)
u
(iv)
Ina Z - 81 (X' X) -I R 0
(v)
(X' X) -I X' y
Rs
(vi)
(X' X) -I (U*'
E
-
£0
nL* )
EU
R0
.
Eicker (1963) has shown (v) when A (X' X)
-+ 00 , which is of course true by
p
(A.l); (i) also follows immediately from his work under the same
condition.
Note that U*' U* - n~*
u
1
= 0p (n~)
if the errors have finite fourth
moments, so (iii) holds if (X' X)-l
= o(n-~),
The same argument demonstrates (vi).
(X' X)
-I
which follows from (A.I).
U*' X
L XZ. •
k kJ
Thus (ii) is satisfied if
(i ,j) th element has mean zero and variance
2.
the i th column of L
2) -+
max diag(X' X) • max diag(L L
Z
= LZ U' X; the
p~ ~
1
U
P., where P.
1
is
1
0; this is seen to be equivalent to
(A.Z) using Lemma Z(ii).
Letting k henceforth denote Al (X' X)
Z
(iv) , which is equivalent to k(e - na )
Z
if and only if k(e -na )
o
~
Xi RXZ
BzXi RX Z
R O.
,we need only demonstrate
Note that e
= A +1(k(~-lw-naZ I +1));
Pz
0
Pz
this converges to zero in probability.
D = kL- l
-I
Xi RXZ BJ
.•
= A +1(L- l
Pz
0
W)
we will show that
Let
As the product of a positive definite
SiXi RX Z S
matrix and a positive semi-definite matrix of rank
Pz'
D has
Pz
positive
11
eigenvalues, its other eigenvalue being zero.
Now
2
l
k(L:- W- n0 I +1) - D
o
P2
. l ~ Xi RlJ
= kL:-
o
+ U'
RX 2
S'X'RU + £' RX
222
+ kL: -1{(U
o
= ~ + M2 + ~,
£]' [U
U'RX2 S
n~ +
1J2 X'2 ru:..
Q ,
+XiRe:J
£
'RX 21J 2
Q
£] - nL:} + kL: -1 [U
0
£]' (R - I ) [U
n
£]
say.
Using arguments essentially the same as before, Ml ~ 0 by (A.2) and
_1
Lemma Ui). M2 does likewise since k = o(n ~). Finally, noting that
2
In - R is idempotent, we deduce that E[1%] = _0 kpl I +1' The diagonal
p2
elements of the positive definite matrix L: M3 are positive with
o
expectations going to zero; thus they are 0 (1 ) themselves.
p
Consequently, M £ O.
3
Since eigenvalues are continuous ftmctions of a sequence of
matrices, it follows from the above discussion that
Z
Z
l
A +l(k(L:- W - no I +1))
A +l(D) = 0, and hence k(6 - no )
P2
0
Pz
Pz
a
R O.
0
Our assumptions (A.l) and (A.Z) are intermediate in the sense
mentioned earlier:
(3.1).
either one implies (3.2), while both are implied by
Condition (A.l) requires that X' X "gets large" at a faster rate
than does (3.1) (it can be seen by considering the demonstration of
(iii), e.g. in the proof of Theorem 2, that (3.2) is too weak a condition
for our model).
A simple example in which (3.1) is too weak to ensure
consistency, but where (A.l) suffices, is a situation where p
= P2 = 1,
12
and the independent variable varies linearly with n.
Condition (A.2)
will also hold much more generally than (3.1); it is satisfied, for
example, if (A.l) holds and the independent variables arebOlmded.
Finally, while our requirement of fourth moments of the errors is not
particularly restrictive, we could weaken it if we were willing to
strengthen (A.l) (for example, we would require only finite (2+0)th
.
-[2(2+0)-1]
moment, 0 ~ 0 ~ 2, If n
A (X' X) -+- 00 as n -+- 00) •
p
Acknowledgment
The author gratefully acknowledges the value of many discussions on
this topic with Professor R.J. Carroll, as well as helpful suggestions
from Professors Stamatis Cambanis and P.K. Sen.
References
Anderson, T.W. and Taylor, J.B. (1976). Strong consistency of least
squares estimates in normal linear regression. Ann. statist. 4,
788-790.
Bhargava, A.K. (1975). Estimation of a linear transformation and
associated distributional problems. Ph.D. thesis, Purdue University.
Drygas, H. (1976). Weak and strong consistency of the least squares
estimators in regression models. Z. WahrsoheinZiohkeitstheorie und
Verw. Gebiete 34, 119-127.
Eicker, F. (1963). Asymptotic normality and consistency of the least
squares estimators for families of linear regressions. Ann. Math.
statist. 34, 447-456.
Fuller, W.A. (1980). Properties of some estimators for the errors-invariables model. Ann. Statist. 8, 407-422.
GIeser, L.J. (1981). Estimation in a multivariate "errors in variables"
regression model: large sample results. Ann. Statist. 9, 24-44.
Healy, J.D. (1975). Estimation and tests for unknown linear restrictions
in multivariate linear models. Ph.D. thesis, Purdue University.
13
Kendall, M.G. and Stuart, A. (1961).
Griffin, London.
The Advanced Theory of Statistics.
Lai, T.L., Robbins, 11. and Wei, C.Z. (1978). Strong consistency of least
squares estimates in multiple regression. Proc. Natl. Acad. Sci.
USA 75, 3034-3036.
Madansky, A. (1959). The fitting of straight lines when both variables
are subject to error. J. Amer. statist. Assoc. 54, 173-205.
Moran, P.A.P. (1971). Estimating structural and functional relationships.
J. MUltivariate Anal. 1, 232-255.
Okamoto, M. (1973). Distinctness of the eigenvalues of a quadratic form
in a multivariate sample. Ann. Statist. 1, 763-765.
SECURITY CLASSIFICATION OF THIS PAGE (Wh"11 !Jul-;:;;:;;.,..d)
UNCLASSIFIED
r
READ INSTRUCTIONS
BEFORE COMPLETING FORM
REPORT DOCUMENTArION PAGE
1. REPORT NUMBER
4.
GOVT ACCESSION NO.
TITLE (and Subtitle)
TYPE OF REPORT & PERIOD COVERED
6.
PERFORMING O~G. REPORT NUMBER
8.
CONTRACT OR GRANT NUMBER(s)
AFOSR Grant AFOSR-80-0080
9. PERFORMING ORGANIZATION NAME AND ADDRESS
14.
S
Mimeo Series No. 1328
AUTHOR(.)
Paul P. Gallo
11.
RECIPIENT'S CATALOG NlJMBf.R
TEQINICAL
Consistency of Regression Estimates When Some
Variables Are Subject to Error
7.
J
CPNT~OLLING8fFlCE N Ad:E ANQ ADO~ES.S
Air orce f1ce 0 SC1ent1f1c Research
Bolling Air Force Base
Washington, OC 20332
10.
PROGRAM ELEMENT. PROJECT. TASK
AREA & WORK UNIT NUMBERS
12.
REPORT DATE
February 1981
D.
NUMIJER OF PAGES
IS.
SECURITY CLASS. (01 fIJI. '''''0'')
13
MONITORING AGENCY NAME eo ADORESS(II dlllo'~/1' Itom CO/lt,olllllll Olllce)
...
UNCLASSIFIED
_._DFCL
.. _..... -.__ ._-_._._-----ASSI FICA TlON'
GRADING
1~",
-
16• DISTRIBUTION STATEMENT (01 this Rrpo,t)
•
SCHEDULE
OOW~j
--
Approved for Public:: Release - - Distribution Unlimited
4
__
~_ _ _
17. DISTRIBUTION STATEMENT (<>1 the a/)sttltcf "nlt''''d III Ill"d' 20, II dl/I",ollt lrom Hepart)
18.
SUPPLEMENTARY NOTES
19.
KEY WORDS (Conl/nue on teve,." side 1/ nocessa,y MId Idelltlly by block n"",b ..,)
errors-in-variables model, maximum likelihood estimate, weak consistency
20.
ABSTRACT (Continue on ,eve,.e .Ide II necessa,y and Idellii/y by block n/lm"",)
For a general tmivariate "errors-in-variables" model, the maximum
likelihood estimate of the parameter vector (assuming normality of the
errors), which has been described in the literature, can be expressed in an
alternative form. In this form, the estimate is computationally simpler, and
deeper investigation of its properties is facilitated. In particular, we
demonstrate that, tmder conditions a good deal less restrictive than those
which have been previously a~sumed, the estimate is weakly consistent.
DD
FORM
1 JAN 73
1473
EDITION OF 1 NOV 6S IS OBSOLETE
UNCLASSIFIED
•
SECURITY CLASSIFICATION OF
TU"
PAGE(Wh~n
V.I.
En/e,~d)