Stefanski, L.A. (1989). Measurement errors in generalized linear model explanatory variables.

MEASUREMENT ERRORS IN GENERALIZED
LINEAR MODEL EXPLANATORY VARIABLES
by
Leonard Stefanski
Department of Statistics
North Carolina State University
and
Department of Biostatistics
Harvard School of Public Health
Paper Presented
at the
Third International Workshop on Statistical Modelling
Vienna July 4-8, 1988
SUMMARY
Under the assumption that response and explanatory variables
follow a generalized linear model, estimating equations are derived
for the case in which the explanatory variables are measured with
error.
Although the estimating equations are shown to have multiple
solutions, a procedure is suggested for uniquely identifying the
appropriate root.
A by-product of the proposed computational methods
is an informative plot, called the measurement error trace, which
graphically illustrates the effect of measurement error on estimated
parameters.
The first half of the paper reviews the material in Stefanski and
Carroll (1987); the latter half focuses on computational issues.
0. INTRODUCTION
This paper studies the problem of fitting generalized linear
models to data when explanatory variables are measured with error.
Assuming that the measurement error is normally distributed and
independent of both the true explanatory variables and the response
variable, unbiased estimating equations for the generalized linear
model parameters are obtained by conditioning on certain sufficient
statistics.
The estimating equations are suitable for both the
functional and structural versions of the measurement error model.
For the structural semi-parametric version of the generalized linear
measurement-error model, efficient estimating equations are
identified.
Definitions and statement of the modelling assumptions are given
in Section 1.
Estimating equations for functional models are derived
in Section 2, and Section 3 contains material relevant to structural
measurement-error models.
To a great extent Sections 1-3 constitute a
review of the recent paper by Stefanski and Carroll (1987).
A major obstacle to the application of the theory in Sections 1-3
is the nonuniqueness of solutions to the proposed estimating
equations.
This problem was mentioned in Stefanski and Carroll
(1987), although a satisfactory solution to the problem was not given.
The latter half of this paper addresses the uniqueness problem and
certain computational issues which arise in applications.
Sections 4 and 5 explain the manner in which multiple solutions to the
estimating equations arise, and a criterion for identifying the desired
solution is proposed.  Section 6 presents a method of computation which
in turn suggests a graphical technique, called the measurement-error
trace, for displaying the effect of measurement error on parameter
estimates.
1. GENERALIZED LINEAR MEASUREMENT ERROR MODELS

1.1 Generalized linear models in canonical form
Throughout this paper attention will be restricted to generalized
linear models in canonical form (McCullagh & Nelder, 1983, Ch. 2).
That is, given a p-vector explanatory variable U = u, it is assumed that
the response variable Y has the density

    f(y|u;θ) = exp[{y(α+β^T u) − b(α+β^T u)}/a(φ) + c(y,φ)]    (1.1)

with respect to a sigma-finite measure m(·).  In (1.1), θ^T = (α, β^T, φ);
a(·), b(·) and c(·) are known functions; and the dominating measure
m(·) does not depend on θ or u.  Table 1 gives choices of a(·), b(·)
and m(·) for some common generalized linear models.
Table 1.  Choices of a(φ), b(η) and m(·) for some common generalized
linear models in canonical form, where η = α + β^T u.

    Model              a(φ)   b(η)                     m(·)
    Poisson            1      e^η                      counting measure on {0,1,...}
    Logistic           1      log(1+e^η)               counting measure on {0,1}
    Gamma              φ      log(−1/η)   (η<0)        Lebesgue measure on (0,∞)
    Inverse Gaussian   φ      −(−2η)^{1/2}  (η<0)      Lebesgue measure on (0,∞)
The key feature these models have in common is a natural sufficient
statistic for u when u is regarded as a parameter and all other
parameters are assumed known.  This is crucial to the theory presented
later, and hence the necessity of the restriction to canonical models.
Unfortunately, some canonical models entail restrictions on α + β^T u,
e.g. the Gamma and Inverse Gaussian, and thus they are less desirable,
from a modelling viewpoint, than certain noncanonical models.
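For concreteness, (1.1) and Table 1 translate directly into code.  The
following minimal sketch (an illustration added here, not part of the
original paper; the parameter values are hypothetical) evaluates the
canonical-form log density for the logistic and Poisson cases with
a(φ) = 1:

    import numpy as np
    from math import lgamma

    def log_density(y, u, alpha, beta, model="logistic"):
        # Canonical-form log density (1.1) with a(phi) = 1.
        #   logistic: b(eta) = log(1 + e^eta), c(y,phi) = 0
        #   Poisson:  b(eta) = e^eta,          c(y,phi) = -log(y!)
        eta = alpha + np.dot(beta, u)
        if model == "logistic":
            return y * eta - np.log1p(np.exp(eta))
        return y * eta - np.exp(eta) - lgamma(y + 1.0)

    # e.g. log pr(Y=1 | U=u) for hypothetical alpha = 0, beta = (1, -1):
    print(log_density(1, np.array([0.5, 2.0]), 0.0, np.array([1.0, -1.0])))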
1.2 The measurement error model
In a generalized linear measurement error model a proxy, X, is
observed in place of U.  It is assumed that, conditioned on U = u, the
observable random variable X has the normal density

    h_X(x;θ,u) = (2π)^{-p/2} |Σ|^{-1/2} exp{−(1/2)(x−u)^T Σ^{-1} (x−u)},    (1.2)

where Σ is the covariance matrix of the measurement error vector, X − u.
Note that like (1.1), (1.2) possesses a natural sufficient
statistic for u when the other parameters are assumed known.

A generalized linear measurement error model is obtained by
combining (1.1) and (1.2) under the assumption that Y and X are
conditionally independent given U.  The resulting density of (Y,X)
conditioned on U = u is then

    h_{Y,X}(y,x;θ,u) = f(y|u;θ) h_X(x;θ,u).    (1.3)
For an independent sequence of random variables (Y_i, X_i)
(i = 1,…,n), let (U_i) (i = 1,…,n) denote the corresponding sequence of
unobserved covariables.  Depending on the nature of the sampling and
the inferences to be drawn, it may be appropriate to regard (U_i)
either as a sequence of constants or as a sequence of independent and
identically distributed random variables.  In the former case a
functional model is obtained, while the latter case is termed a
structural model.  Structural models can be further characterized as
parametric or semiparametric depending on whether the distribution of
U is specified parametrically or nonparametrically.  Functional models
and nonparametric structural models are studied in this paper.
Not all of the parameters for all versions of model (1.3) are
identifiable, and thus some additional information is required.  It
will be assumed that

    Σ/a(φ) = Ω,    (1.4)

where Ω is known.  In simple linear regression (1.4) reduces to the
common identifiability assumption that the ratio of the measurement-error
variance to the equation-error variance is known.  In models for which
a(φ) = 1, for example logistic and Poisson regression, (1.4) requires
that the measurement-error covariance matrix be known.
2. FUNCTIONAL MODELS

2.1 The functional likelihood
Under the assumptions of the functional model there are n+p+2
unknown parameters, α, β^T, φ and u_1,…,u_n.  Given data (Y_i, X_i)
(i = 1,…,n) the functional log likelihood is

    L(θ, u_1,…,u_n) = Σ_{i=1}^n log{h_{Y,X}(Y_i, X_i; θ, u_i)}.    (2.1)
For the normal linear model it is known that maximization of
(2.1) with respect to (α, β^T, φ, u_1,…,u_n) results in consistent
estimators of the regression coefficients α and β (Gleser, 1981).
However, for nonlinear models maximization of (2.1) is neither
computationally attractive nor guaranteed to yield consistent
estimators.  In logistic regression it is known that the functional
maximum likelihood estimator of (α, β^T) is not consistent (Stefanski
and Carroll, 1985).  This is a classic example of the failure of the
method of maximum likelihood in the presence of an increasing number
of nuisance parameters (Neyman and Scott, 1948).
2.2 Unbiased estimating equations
Note that (1.3) can be written

    h_{Y,X}(y,x;θ,u) = q(δ,θ,u) v(y,x,θ),    (2.2)

where

    q(δ,θ,u) = exp[u^T Ω^{-1} δ/a(φ) − {u^T Ω^{-1} u + 2b(α+β^T u)}/{2a(φ)}],

    v(y,x,θ) = exp[(2αy − x^T Ω^{-1} x)/{2a(φ)} + c*(y,φ)],

    δ = δ(y,x,θ) = x + yΩβ,

    c*(y,φ) = c(y,φ) − (1/2) log[{2πa(φ)}^p |Ω|].

Thus, viewing u as a parameter and α, β and φ as fixed, the statistic

    Δ = Δ(Y,X,θ) = X + YΩβ    (2.3)

is sufficient for u.
As a consequence, the distribution of Y|Δ does
not depend on u, and this fact can be exploited to derive unbiased
estimating equations for θ which are independent of u.
Write h_{Y|Δ}(y|δ;θ) for the conditional density of Y given Δ = δ.  A
routine derivation establishes that

    h_{Y|Δ}(y|δ;θ) = exp[yη − (1/2)y² β^T Ω β/a(φ) + c(y,φ) − log{S(η,β,φ)}],    (2.4)

where η = (α+β^T δ)/a(φ) and S(·,·,·) is determined by the requirement
that

    ∫ h_{Y|Δ}(y|δ;θ) dm(y) = 1.    (2.5)

Since m(·) does not depend on θ it follows from (2.5) that

    ∫ ḣ_{Y|Δ}(y|δ;θ) dm(y) = 0,    (2.6)

where ḣ_{Y|Δ}(y|δ;θ) = (∂/∂θ) h_{Y|Δ}(y|δ;θ).
Define ψ_s(y,x,θ) = ḣ_{Y|Δ}(y|x+yΩβ;θ)/h_{Y|Δ}(y|x+yΩβ;θ); then it
follows from (2.6) that E_θ{ψ_s(Y,X,θ)} = E_θ[E_θ{ψ_s(Y,X,θ)|Δ}] = 0.
Any estimator θ̂_s solving

    Σ_{i=1}^n ψ_s(Y_i, X_i, θ̂_s) = 0    (2.7)

will be called a sufficiency estimator.
The score, ψ_s, is given by

    ψ_s(y,x,θ) = ( {y − E(Y|Δ=δ)}/a(φ),
                   {y − E(Y|Δ=δ)}δ/a(φ) − {y² − E(Y²|Δ=δ)}Ωβ/a(φ),
                   r(y,x,θ) − E{r(Y,X,θ)|Δ=δ} ),    (2.8)

with δ = δ(y,x,θ) = x + yΩβ.
A second unbiased estimating equation is found by adopting an
approach due to Lindsay (1980, 1982, 1983).  The conditional score, ψ_c,
is defined via

    ψ_c(y,x,θ) = ( {y − E(Y|Δ=δ)}/a(φ),
                   {y − E(Y|Δ=δ)} t(δ)/a(φ),
                   r(y,x,θ) − E{r(Y,X,θ)|Δ=δ} ),    (2.9)

where t(·) is a p-vector-valued function not depending on (Y,X).  With
this restriction on t(·) note that

    E[{Y − E(Y|Δ)} t(Δ)] = E(t(Δ) E[{Y − E(Y|Δ)}|Δ]) = 0,

and thus ψ_c is unbiased.  An estimator θ̂_c satisfying

    Σ_{i=1}^n ψ_c(Y_i, X_i, θ̂_c) = 0    (2.10)

will be called a conditional estimator.

The conditional score depends on t(·), which must be specified.
Ideally t(·) would be chosen to minimize the asymptotic variance of
θ̂_c.  However, in a functional model the optimal choice of t(·) depends
on the particular sequence, (U_i), of covariates (in fact no choice of
t(·) is better than t(Δ_i) = u_i), and thus no globally optimal solution
to this problem exists.  A related problem is noted by Cox and Hinkley
(1974, p. 146) in connection with hypothesis testing.
The fact that t(Δ_i) = u_i is optimal, and thus so too is any one-to-one
linear function of u_i, suggests choosing t(·) so that E{t(Δ_i)} is
a one-to-one linear function of u_i.  Thus simply taking t(Δ) = Δ is
suggested (E(Δ) = u + E(Y)Ωβ).  Another possibility is suggested by the
facts that X is unbiased for u and Δ is sufficient for u (assuming θ
fixed), so that t(Δ) = E(X|Δ) is a uniformly minimum variance unbiased
estimator of u (again assuming θ fixed).  Note that

    E(X|Δ) = Δ − E(Y|Δ)Ωβ.    (2.11)
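The statistics above translate into a short computation.  The sketch
below is an illustration under the assumption a(φ) = 1 (as in logistic
or Poisson regression), with hypothetical function names: it forms Δ
from (2.3) and evaluates the (α,β) components of the conditional score
(2.9) for either suggested choice of t(·), using (2.11) for
t(δ) = E(X|Δ=δ).  The model-specific conditional mean E(Y|Δ=δ) must be
supplied; for the logistic model it is given by (4.1) below.

    import numpy as np

    def delta(y, x, beta, Omega):
        # Sufficient statistic (2.3): delta = x + y * Omega beta.
        return x + y * (Omega @ beta)

    def psi_c(y, x, theta, Omega, cond_mean, t="delta"):
        # Conditional score (2.9), (alpha, beta) components, assuming
        # a(phi) = 1.  cond_mean(d, theta, Omega) must return the
        # model-specific E(Y | Delta = d).  t selects t(.): "delta" for
        # t(d) = d, "ex" for t(d) = E(X | Delta = d) computed via (2.11).
        beta = np.asarray(theta[1:])
        d = delta(y, x, beta, Omega)
        m = cond_mean(d, theta, Omega)
        td = d if t == "delta" else d - m * (Omega @ beta)
        return (y - m) * np.concatenate(([1.0], td))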
3. STRUCTURAL MODELS

3.1 The structural likelihood
Consider the nonparametric structural model defined in Section
1.2.  The joint density of (Y,X) is given by

    f_{Y,X}(y,x;θ,g) = ∫ h_{Y,X}(y,x;θ,u) g(u) dν(u),    (3.1)

where h_{Y,X} is defined in (1.3).  This constitutes a semiparametric
model with parametric component θ and nonparametric component g.  The
density g is assumed to be an element of G, a family of densities with
respect to Lebesgue measure, denoted ν(·).
Let ℓ(y,x,θ,g) = log f_{Y,X}(y,x;θ,g) and ℓ̇(y,x,θ,g) =
(∂/∂θ) ℓ(y,x,θ,g).  If g(·) were known then ℓ̇ would be the efficient
score for θ.

Assuming that differentiation and integration can be interchanged
in (3.1), a useful expression for ℓ̇ is obtained by noting that

    ℓ̇(y,x,θ,g) = ∫ {(∂/∂θ) log h} h g dν / ∫ h g dν
               = E{(∂/∂θ) log h(y,x;θ,U) | Y=y, X=x}.

Thus if g(·) is viewed as a prior for u, then ℓ̇ has the interpretation
of the posterior expectation of the functional maximum likelihood
θ-score.  Furthermore, since Δ = X + YΩβ is sufficient for u in the
conditional model (1.3), the conditional distribution of U|Y,X is the
same as that of U|Δ.  Therefore,

    ℓ̇(y,x,θ,g) = E{(∂/∂θ) log h(y,x;θ,U) | Δ = x + yΩβ}.    (3.2)

3.2 Efficient estimating equations
In model (3.1) interest lies primarily in estimation of θ; i.e.,
g(·) is a nuisance function.  The conditional and sufficiency scores
of Section 2 are appropriate for the structural model of this section
in the sense of being unbiased, but they are generally not efficient.

In this section the efficient θ-score is derived and is shown to
be equal to a conditional score (2.9) with t(δ) = E(U|Δ=δ).  Efficiency
is defined in the sense of Pfanzagl (1982, Ch. 14), Begun et al.
(1983) and Lindsay (1983, 1985).
Assume that the family of densities {h_Y(y;η)}, obtained by
setting η = α + β^T u and fixing φ in the right-hand side of (1.1), is a
regular exponential family for η ∈ H, where H is one of the three open
intervals (−∞,0), (0,∞), (−∞,∞).  Let θ = (α, β^T, φ) be an element of
Θ = R × R^p × R^+ and g an element of G.  Write τ = (θ,g) and, with
supp(g) denoting the support of g, define T = {τ : α + β^T u ∈ H for
u ∈ supp(g)}.  The parameter space for the nonparametric structural
model is T.
Let S be the class of estimating equations, ψ, satisfying for all
τ in T:

    (i)   E_τ{(∂/∂θ^T) ψ(Y,X,θ)} = −E_τ{ψ(Y,X,θ) ℓ̇^T(Y,X,θ)};
    (ii)  E_τ{‖ψ(Y,X,θ)‖²} < ∞;
    (iii) E_τ{ψ(Y,X,θ)} = 0.

If G is complete in the sense defined below, then it transpires
that every score in S must be conditionally unbiased with respect to Δ
(Theorem 3.1); i.e., if ψ is in S then

    E_τ{ψ(Y,X,θ)|Δ} = 0, for all τ in T.    (3.3)

This allows an easy derivation (Corollary 3.1) of the efficient
estimating equation for θ.
Definition.  A collection of functions, H, is said to be complete
with respect to a measure μ if a necessary condition for

    ∫ t(s) h(s) dμ(s) = 0, for all h ∈ H,

is t(·) ≡ 0 μ-almost surely.

For a fixed θ ∈ Θ let G_θ = {g ∈ G : (θ,g) ∈ T} and let ν_θ be Lebesgue
measure on {u ∈ R^p : α + β^T u ∈ H}.

Theorem 3.1.  Assume that for each fixed θ ∈ Θ, G_θ is complete
with respect to ν_θ.  Then if ψ ∈ S, E_τ{ψ(Y,X,θ)|Δ} = 0 for all τ ∈ T.
The positive-definite matrix V_ψ = {E(ψ ℓ̇^T)}^{-1} E(ψ ψ^T) {E(ℓ̇ ψ^T)}^{-1}
measures the efficiency of ψ as an estimating equation.  Under
regularity conditions V_ψ is the asymptotic covariance matrix of
n^{1/2}(θ̂ − θ) when θ̂ is a consistent estimator of θ satisfying
Σ_{i=1}^n ψ(Y_i, X_i, θ̂) = 0.

Let

    ψ* = ℓ̇(y,x,θ,g) − E{ℓ̇(Y,X,θ,g) | Δ = x + yΩβ}    (3.4)

and let V_ψ* be the associated covariance matrix.  The following result
states that ψ* is the efficient θ-score.

Corollary 3.1.  V_ψ ≥ V_ψ* for all ψ ∈ S.

Proofs of Theorem 3.1 and its corollary can be found in Stefanski
and Carroll (1987) and will not be given here.
To find ψ* note that Y and U are conditionally uncorrelated given
Δ.  This follows from the facts that the σ-field generated by Δ is
contained in the σ-field generated by (Y,X), and the conditional
distributions of U|(Y=y, X=x) and U|Δ=x+yΩβ are identical.  Thus

    E(YU|Δ) = E{E(YU|Y,X)|Δ} = E{Y E(U|Δ)|Δ} = E(Y|Δ) E(U|Δ).

Also, ℓ̇ = E(ḣ/h|Y,X), from which one finds

    ψ* = ℓ̇ − E(ℓ̇|Δ) = E(ḣ/h|Y,X) − E{E(ḣ/h|Y,X)|Δ} = E(ḣ/h|Y,X) − E(ḣ/h|Δ).

This fact and (3.2) imply that
    ψ*(y,x,θ) = ( {y − E(Y|Δ=δ)}/a(φ),
                  {y − E(Y|Δ=δ)} E(U|Δ=δ)/a(φ),
                  r(y,x,θ) − E{r(Y,X,θ)|Δ=δ} ).    (3.5)

Comparison with (2.9) shows that ψ* is a conditional score with
t(δ) = E(U|Δ=δ).

Note that Δ = X + YΩβ = U + Z + YΩβ, where Z has a N{0, a(φ)Ω}
distribution.  This implies that

    E(U|Δ) = Δ − E(Y|Δ)Ωβ − E(Z|Δ).

But by (2.11), Δ − E(Y|Δ)Ωβ = E(X|Δ); also it can be shown that

    E(Z|Δ=δ) = −a(φ)Ω ḟ_Δ(δ)/f_Δ(δ),

where f_Δ(·) is the density of Δ and ḟ_Δ(δ) = (∂/∂δ) f_Δ(δ).  Thus

    E(U|Δ=δ) = E(X|Δ=δ) + a(φ)Ω ḟ_Δ(δ)/f_Δ(δ).    (3.6)

Fully efficient estimators of (α, β^T) in the linear model have
been given by Bickel and Ritov (1987).
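The efficient choice (3.6) involves the unknown density f_Δ and its
derivative.  One simple approximation, sketched below for scalar Δ and
not prescribed by the paper, is to differentiate a Gaussian kernel
density estimate analytically (the bandwidth rule and names are
assumptions of this sketch):

    import numpy as np

    def kde_log_density_grad(d0, deltas, bandwidth=None):
        # Gaussian-KDE estimates of f(d0) and f'(d0) for scalar Delta.
        deltas = np.asarray(deltas, float)
        n = deltas.size
        h = bandwidth or 1.06 * deltas.std() * n ** (-0.2)  # Silverman's rule
        z = (d0 - deltas) / h
        k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)    # Gaussian kernel
        f = k.mean() / h                                    # density estimate
        fdot = (-z * k).mean() / h ** 2                     # its derivative
        return f, fdot

    def t_efficient(d0, deltas, ex_given_delta, a_phi, omega):
        # t(delta) = E(X|Delta=delta) + a(phi)*omega*f'(delta)/f(delta),
        # cf. (3.6); ex_given_delta supplies E(X|Delta=d0), e.g. via (2.11).
        f, fdot = kde_log_density_grad(d0, deltas)
        return ex_given_delta(d0) + a_phi * omega * fdot / f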
4. LOGISTIC REGRESSION
In this section the logistic model is used as a means of
illustrating and comparing the various estimating equations.  The
logistic model assumes that

    pr_θ(Y=1|U=u) = F(α+β^T u),

where F(t) = 1/(1+e^{−t}).  For this model a(φ) ≡ 1 and m(·) is counting
measure on {0,1}.  The conditional distribution of Y given Δ = δ is
given by

    pr(Y=1|Δ=δ) = F(α + β^T δ − (1/2)β^T Ω β).    (4.1)
The sufficiency score is, by (2.8),

    ψ_s(y,x,θ) = {y − F(α + β^T δ − (1/2)β^T Ω β)} {1, (δ − Ωβ)^T}^T,  δ = x + yΩβ.    (4.2)
Note that if δ* is defined as δ* = δ − (1/2)Ωβ, with the similar
notation Δ* = Δ − (1/2)Ωβ, then

    pr(Y=1|Δ*=δ*) = F(α + β^T δ*),    (4.3)

i.e., conditioned on Δ* = δ*, Y follows a logistic model.  This closure
property (equality of the conditional distributions of Y|U and Y|Δ*,
where Δ* is some function of Δ) seems only to hold for the normal and
logistic models.
The sufficiency score is a conditional score for the particular
choice t(δ) = δ − Ωβ.  In Section 3 it was suggested that taking
t(δ) = E(X|Δ=δ) should lead to promising estimating equations.  For the
logistic model

    E(X|Δ=δ) = δ − F(α + β^T δ − (1/2)β^T Ω β) Ωβ.    (4.4)

Fully efficient estimation requires that

    t(δ) = E(X|Δ=δ) + Ω ḟ_Δ(δ)/f_Δ(δ);

see equation (3.6).
All of the scores for the logistic model share a common problem:
the associated estimating equations may possess multiple roots.
Consider, for example, the conditional score with t(δ) = δ − (1/2)Ωβ,
and let

    G_n(α,β) = n^{-1} Σ_{i=1}^n ψ_c(Y_i, X_i, α, β).    (4.5)

Note that if Y_i = 1,

    ψ_c(Y_i, X_i, θ) = {1 − F(α + β^T X_i + (1/2)β^T Ω β)} {1, (X_i + Ωβ/2)^T}^T,

and if β^T Ω β → ∞, then ψ_c(Y_i, X_i, θ) → 0; also, if Y_i = 0,

    ψ_c(Y_i, X_i, θ) = −F(α + β^T X_i − (1/2)β^T Ω β) {1, (X_i − Ωβ/2)^T}^T,

and again ψ_c(Y_i, X_i, θ) → 0 as β^T Ω β → ∞.  Thus if ‖β‖ → ∞ in such a
way that β^T Ω β → ∞, then G_n(α,β) → 0.  The manner in which ψ_c depends
on β through β^T Ω β makes the score behave like the "redescending"
scores which find application in robust statistics.  A consequence of
this behavior is that G_n(·,·) may have multiple roots, not all of
which lead to consistent sequences of estimators.
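In code, the conditional score with t(δ) = δ − (1/2)Ωβ and the averaged
score (4.5) might be computed as follows (a sketch; the array layouts
and names are assumptions, with X stored as an n × p array):

    import numpy as np

    def F(t):
        # Logistic distribution function F(t) = 1/(1 + e^{-t}).
        return 1.0 / (1.0 + np.exp(-t))

    def G_n(alpha, beta, Y, X, Omega):
        # Average conditional score (4.5), logistic model,
        # t(delta) = delta - Omega beta / 2.
        beta = np.atleast_1d(beta)
        total = np.zeros(1 + beta.size)
        for y, x in zip(Y, X):
            d = x + y * (Omega @ beta)                           # Delta, (2.3)
            m = F(alpha + d @ beta - 0.5 * beta @ Omega @ beta)  # (4.1)
            t = d - 0.5 * (Omega @ beta)                         # t(delta)
            total += (y - m) * np.concatenate(([1.0], t))
        return total / len(Y)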
Figure 1 displays a graph of the second component of G_n(0,β) vs. β
for β ∈ [0.25, 10.25].  The sample size, n, was set at 100; the (U_i)
were distributed as standard normal random variates; the (X_i) were
generated according to the model

    X_i = U_i + Ω^{1/2} Z_i,

where the (Z_i) are standard normal random variables independent of the
(U_i); Ω = 1; and the Y_i were generated according to model (1.1) with
α = 0 and β = 1.  It is evident that G_n(0,β) contains multiple roots in
the interval [0.25, 10.25].
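The experiment is straightforward to replicate.  Continuing the
previous sketch (the random seed is an arbitrary assumption), the
following generates data as just described and traces the β-component
of G_n(0,β) over the same grid:

    # Continuing the previous sketch: the Figure-1 experiment
    # (scalar case, Omega = 1).
    rng = np.random.default_rng(0)
    n, alpha0, beta0, Omega = 100, 0.0, 1.0, np.eye(1)
    U = rng.standard_normal((n, 1))
    X = U + rng.standard_normal((n, 1))        # X_i = U_i + Z_i
    Y = (rng.random(n) < F(alpha0 + beta0 * U[:, 0])).astype(float)

    for b in np.arange(0.25, 10.26, 0.25):     # beta-component of G_n(0,beta)
        print(b, G_n(0.0, np.array([b]), Y, X, Omega)[1])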
The problem of multiple roots is not specific to logistic
regression.  It results from the fact that β enters model (2.4)
quadratically through the term β^T Ω β.  The next section discusses the
problem of multiple roots and suggests a strategy for locating the
appropriate root.
5. THE PROBLEM OF MULTIPLE ROOTS

It is often the case that (2.7) and (2.10) have multiple roots.
This is not a finite-sample problem; it persists asymptotically.  Thus
a strategy is required for selecting the correct solution to
equations (2.7) and (2.10).

As a means of motivation, the problem of multiple roots is
discussed first in the context of the simple linear errors-in-variables
version of model (1.3).  The insights gained from this investigation
are then generalized to nonlinear models and illustrated in the context
of logistic regression.
Let β_0 be a scalar and suppose that, given U = u, Y has a normal
distribution with mean α + β_0 u and variance σ².  The conditional
distribution of Y|Δ=δ is normal with variance σ²/(1+β_0² Ω) and mean
(α+β_0 δ)/(1+β_0² Ω).  It can be shown that for this model the estimating
equations (2.7) imply that α̂_s = Ȳ − β̂_s X̄ and that β̂_s solves

    −β̂_s² Ω S_{YX} + (S_{YY} Ω − S_{XX}) β̂_s + S_{YX} = 0,    (5.1)

where

    S_{YX} = Σ_{i=1}^n (Y_i − Ȳ)(X_i − X̄);  S_{YY} = Σ_{i=1}^n (Y_i − Ȳ)²;
    S_{XX} = Σ_{i=1}^n (X_i − X̄)².

This quadratic equation has two real roots, β̂_{s,1} and β̂_{s,2} (Kendall
and Stuart, 1979, Ch. 29), converging to β_0 and −1/(Ωβ_0) (β_0 ≠ 0).  A
number of root-selection criteria can be formulated for this model, but
unfortunately they do not generalize to nonlinear multiple-regression
models.  For example, the correct root, β̂_{s,1}, has, at least
asymptotically, the same sign as the so-called naive estimator,
β̂_N = S_{YX}/S_{XX}.
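A minimal sketch of (5.1) (scalar case; names are illustrative): form
the moment sums, solve the quadratic, and apply the sign criterion just
described:

    import numpy as np

    def eiv_slopes(Y, X, Omega):
        # Roots of the quadratic (5.1) for the simple linear
        # errors-in-variables model (scalar X, known Omega).
        Y, X = np.asarray(Y, float), np.asarray(X, float)
        Syx = np.sum((Y - Y.mean()) * (X - X.mean()))
        Syy = np.sum((Y - Y.mean()) ** 2)
        Sxx = np.sum((X - X.mean()) ** 2)
        roots = np.roots([-Omega * Syx, Syy * Omega - Sxx, Syx]).real
        naive = Syx / Sxx
        # the correct root has, asymptotically, the sign of the naive slope
        correct = roots[np.sign(roots) == np.sign(naive)]
        return roots, naive, correct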
Consider the function

    f(β,τ) = −β² τΩ S_{YX} + (S_{YY} τΩ − S_{XX}) β + S_{YX},    (5.2)

obtained by replacing Ω with τΩ in (5.1).  Note that when τ = 0 the
equation

    f(β,τ) = 0    (5.3)

has the unique solution β̂_N = S_{YX}/S_{XX}, while for τ = 1, (5.3) and
(5.1) are identical.  Since (5.3) has a unique solution at τ = 0, the
implicit function theorem guarantees the existence of a unique
function, β̂(τ), solving (5.3) for all τ in some neighborhood of τ = 0.
It transpires that β̂(τ) exists uniquely for all τ ∈ [0,1] and that
β̂(1) = β̂_{s,1}.  That is, the "correct" root of (5.1) is determined by
the fact that it lies on the locus of solutions, {β̂(τ) : 0 ≤ τ ≤ 1},
to (5.3), which in turn is uniquely determined by the condition that
β̂(0) = S_{YX}/S_{XX}.  These ideas are illustrated in Figure 2.
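This continuation argument suggests a direct computation: starting from
β̂(0) = S_{YX}/S_{XX}, follow the root of f(β,τ) = 0 nearest the previous
solution as τ increases.  The sketch below is an illustrative
implementation of (5.2)-(5.3) for scalar X, not the paper's
prescription:

    import numpy as np

    def beta_path(Y, X, Omega, taus=np.linspace(0.0, 1.0, 21)):
        # Trace beta_hat(tau) solving f(beta, tau) = 0, eq. (5.3), from
        # the naive slope at tau = 0, always continuing the nearest root.
        Y, X = np.asarray(Y, float), np.asarray(X, float)
        Syx = np.sum((Y - Y.mean()) * (X - X.mean()))
        Syy = np.sum((Y - Y.mean()) ** 2)
        Sxx = np.sum((X - X.mean()) ** 2)
        b, path = Syx / Sxx, []
        for tau in taus:
            if tau > 0.0:
                roots = np.roots([-tau * Omega * Syx,
                                  Syy * tau * Omega - Sxx, Syx]).real
                b = roots[np.argmin(np.abs(roots - b))]
            path.append((tau, b))
        return path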
Now consider logistic regression and the estimating equations
given in (4.5).  Define G_n(α,β,τ) for 0 ≤ τ ≤ 1 by

    G_n(α,β,τ) = n^{-1} Σ_{i=1}^n ψ_c(Y_i, X_i, α, β),    (5.4)

where ψ_c is computed with τΩ in place of Ω.  Note that when τ = 0,
G_n(α,β,0) is the gradient of an ordinary logistic log likelihood.
Consequently, except in cases of quasi- or complete separation
(Santner and Duffy, 1986), the equation

    G_n(α,β,0) = 0

possesses a unique finite solution, θ̂_N = (α̂_N, β̂_N^T)^T, which is easily
found using a Newton-Raphson iteration; θ̂_N is the so-called naive
estimator.  As in the linear model, the implicit function theorem
guarantees the existence of a unique family of solutions, θ̂(τ), to the
equation

    G_n(α,β,τ) = 0

in some neighborhood of τ = 0, such that θ̂(0) = θ̂_N.  If θ̂(τ) exists
uniquely for all τ ∈ [0,1], then the asymptotic arguments in Appendix 1
suggest that θ̂(1) is the consistent estimator we seek.
A simple example illustrates the preceding argument.  Consider the
problem of fitting a no-intercept (α = 0) logistic model to the data
described at the end of Section 4.  Figure 1 indicates that the
estimating equation (4.5) has at least three roots.  In Figure 3 the
second component of G_n(α,β,τ) is plotted for τ = 0.00, 0.25, 0.50, 0.75
and 1.00.  The figure clearly shows which root is connected continuously
to the naive estimator.  Figure 4 contains plots of the same functions
depicted in Figure 3 but for a data set of size n = 1000.  This figure
makes it clear that the problem of multiple roots persists
asymptotically and that the suggested root-finding procedure locates
the correct root in this case.
For models other than logistic or linear regression the root-finding
argument proceeds similarly.  Let ψ(y,x,θ,τ) denote either (2.8) or
(2.9) with τΩ in place of Ω.  Then θ̂(τ) solves

    0 = G_n(θ,τ) = n^{-1} Σ_{i=1}^n ψ(Y_i, X_i, θ, τ).    (5.5)

6. THE MEASUREMENT ERROR TRACE

Let θ̂(τ), 0 ≤ τ ≤ 1, be the solution locus discussed in the
previous section.
methods; this is the naive estimator.
For
~>O,
e(~)
can be found by
employing a Newton-Raphson interation of (5.5) starting from e(O).
iteration will converge to the desired solution provided
sufficiently close to zero.
generated on a grid of
~
The solution locus,
values
can be
successively by using a
...
e(~i)
is
e(~),(O~~~l),
{O-~O<~l< .. <~k-l}
Newton-Raphson iteration starting at
~
This
A
to compute
e(~i+l).
The
iteration scheme converges to the desired solution provided the grid mesh
is sufficiently small.
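A sketch of this grid continuation for the logistic score of Section 4,
reusing G_n from the earlier sketch; using scipy.optimize.root as the
inner Newton-type solver is an assumption of this illustration rather
than the paper's prescription:

    import numpy as np
    from scipy.optimize import root

    def me_trace(Y, X, Omega, k=20):
        # Solve G_n(theta, tau) = 0, eq. (5.5), on the grid
        # 0 = tau_0 < ... < tau_k = 1, warm-starting each solve at the
        # previous solution; returns [(tau, theta_hat(tau)), ...].
        p = X.shape[1]
        theta = root(lambda th: G_n(th[0], th[1:], Y, X, 0.0 * Omega),
                     x0=np.zeros(1 + p)).x     # theta_hat(0): the naive fit
        trace = [(0.0, theta)]
        for tau in np.linspace(1.0 / k, 1.0, k):
            theta = root(lambda th, t=tau: G_n(th[0], th[1:], Y, X, t * Omega),
                         x0=theta).x           # continue from last solution
            trace.append((tau, theta))
        return trace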
Note that θ̂(τ) is the estimator one would obtain if the measurement
error covariance had been assumed to be proportional to τΩ instead of Ω.
It is often the case that a measurement error model is fit to data
primarily for examining the effects of measurement error on estimated
parameters.  In these cases Ω may not be known, but represents a best
guess or crude estimate of the measurement error covariance matrix.  A
plot of θ̂(τ) versus τ illustrates the nature of the dependence of the
estimated parameters on the magnitude of the (assumed) error covariance.
Since it is similar in design and intent to a ridge trace, the plot of
θ̂(τ) versus τ is called a measurement-error trace.  Provided θ̂(τ),
(0 ≤ τ ≤ 1), has been computed by the procedure suggested in the
preceding paragraph, the measurement error trace is easily constructed.
Figures 5-8 display examples of the measurement error trace for some
different logistic regression measurement-error models.  In each example
one thousand observations were generated according to the logistic model
with α = 1, β^T = (0.00, 0.25, 0.50), U ~ N(0, I_3), Z ~ N(0, I_3) and
X = U + Ω^{1/2} Z.  Figures 5-8 differ only with respect to the choice
of Ω: for Figure 5, Ω = Ω_5 = I_3; for Figures 6, 7 and 8, Ω = Ω_6, Ω_7
and Ω_8 respectively, alternative matrices with unit diagonals and
off-diagonal entries of 0 or 2^{-1/2}.

Only the estimated slope coefficients, β̂(τ), are plotted in Figures
5-8.  Recall that β̂(0) is the naive estimate, i.e., the estimate obtained
by fitting a logistic model to the observed data ignoring measurement
error; β̂(1) is the errors-in-variables estimate.  Figures 5-8 clearly
illustrate which parameters are affected by the covariable measurement
error.
ACKNOWLEDGEMENTS
Support for this work was provided in part by a Cooperative
Agreement between Harvard University, SIMS, and the Environmental
Protection Agency, and the National Science Foundation.
REFERENCES
Begun, J.M., Hall, W.J., Huang, W.M. & Wellner, J.A. (1983).
Information and asymptotic efficiency in parametric-nonparametric
models.  Ann. Statist. 11, 432-52.

Bickel, P.J. & Ritov, Y. (1987).  Efficient estimation in the
errors-in-variables model.  Ann. Statist. 15, 513-40.

Cox, D.R. & Hinkley, D.V. (1974).  Theoretical Statistics.  London:
Chapman and Hall.

Gleser, L.J. (1981).  Estimation in a multivariate 'errors-in-variables'
regression model: large sample results.  Ann. Statist. 9, 24-44.

Kendall, M.G. & Stuart, A. (1979).  The Advanced Theory of Statistics,
2.  London: Griffin.

Lindsay, B.G. (1980).  Nuisance parameters, mixture models, and the
efficiency of partial likelihood estimators.  Philos. Trans. Roy.
Soc. London Ser. A 296, 639-65.

Lindsay, B.G. (1982).  Conditional score functions: some optimality
results.  Biometrika 69, 503-12.

Lindsay, B.G. (1983).  Efficiency of the conditional score in a
mixture setting.  Ann. Statist. 11, 486-97.

Lindsay, B.G. (1985).  Using empirical partially Bayes inference for
increased efficiency.  Ann. Statist. 13, 914-32.

McCullagh, P. & Nelder, J.A. (1983).  Generalized Linear Models.
London: Chapman and Hall.

Neyman, J. & Scott, E.L. (1948).  Consistent estimates based on
partially consistent observations.  Econometrica 16, 1-32.

Pfanzagl, J. (1982).  Contributions to a General Asymptotic
Statistical Theory.  New York: Springer-Verlag.

Santner, T.J. & Duffy, D.E. (1986).  A note on A. Albert and J.A.
Anderson's conditions for the existence of maximum likelihood
estimates in logistic regression models.  Biometrika 73, 755-58.

Stefanski, L.A. & Carroll, R.J. (1985).  Covariate measurement error
in logistic regression.  Ann. Statist. 13, 1335-51.

Stefanski, L.A. & Carroll, R.J. (1987).  Conditional scores and
optimal scores for generalized linear measurement error models.
Biometrika 74, 703-16.
APPENDIX 1
An argument is given which suggests that the root selection
procedure presented in Section 5 works asymptotically.

Let λ denote an element of R^s and τ an element of [0,1].  Let
{G(·,·), G_1(·,·), G_2(·,·), …} be a sequence of functions defined on
R^s × [0,1] and taking values in R^s, such that G_n(·,·) converges to
G(·,·) uniformly on compact sets in R^s × [0,1].

Assume that G(λ,0) has a unique root, i.e., that the equation

    G(λ,0) = 0    (A.1)

has a unique solution, λ_0.  Also assume that there exists a unique
element of C^s[0,1], θ, such that

    G{θ(τ),τ} = 0 for all τ ∈ [0,1].    (A.2)

Note that θ(0) = λ_0.  Finally, make the assumption that for each fixed
τ, θ(τ) is an isolated root of G(λ,τ) = 0, uniformly in τ.  More
specifically it is assumed that:

    There exists an η > 0, not depending on τ, such that G(λ,τ) = 0
    and ‖λ − θ(τ)‖ ≤ η imply λ = θ(τ), for all τ.    (A.3)
Under these conditions it is possible to prove the following
result.

Proposition A.1.  If for each n there exists θ_n(·) ∈ C^s[0,1] such
that G_n{θ_n(τ),τ} = 0 for all τ ∈ [0,1] and θ_n(0) → λ_0, then

    lim_n sup_τ ‖θ_n(τ) − θ(τ)‖ = 0.

Proof.  It is sufficient to show that given any ε > 0,

    limsup_n sup_τ ‖θ_n(τ) − θ(τ)‖ ≤ ε,    (A.4)

and it is also sufficient to consider only those ε for which 0 < ε < η,
where η is defined in (A.3).  Suppose there exists some ε, 0 < ε < η,
for which (A.4) does not hold.  It is shown that this leads to a
contradiction.
Define D_n(·), α_n and τ_n by the equations

    D_n(τ) = ‖θ_n(τ) − θ(τ)‖;  α_n = sup_τ D_n(τ) = D_n(τ_n).

If (A.4) does not hold then it is possible to find a subsequence
{n_k} such that α_{n_k} > ε for all n_k.  Since D_{n_k}(0) → 0,
D_{n_k}(τ_{n_k}) = α_{n_k} > ε, and D_{n_k}(·) is a continuous function,
it follows that for n_k large enough D_{n_k}(τ) = ε for some τ ≤ τ_{n_k};
call it τ*_{n_k}.  Note that the sequences {τ*_{n_k}} and
{θ_{n_k}(τ*_{n_k})} are both contained in compact sets.  Thus a further
subsequence {n_j} can be found along which τ*_{n_j} → τ*,
θ_{n_j}(τ*_{n_j}) → λ*, and

    ε = D_{n_j}(τ*_{n_j}) → ‖λ* − θ(τ*)‖.

Let λ_{n_j} = θ_{n_j}(τ*_{n_j}); since G_n → G uniformly on compact sets,

    G_{n_j}(λ_{n_j}, τ*_{n_j}) − G(λ_{n_j}, τ*_{n_j}) → 0,

and, by continuity of G,

    G_{n_j}(λ_{n_j}, τ*_{n_j}) − G(λ*, τ*) → 0.

But G_{n_j}(λ_{n_j}, τ*_{n_j}) = 0, and this means that G(λ*, τ*) = 0.

So it has been shown that G(λ*, τ*) = 0 and ‖λ* − θ(τ*)‖ = ε ≤ η;
thus (A.3) implies that λ* = θ(τ*).  Since ‖λ* − θ(τ*)‖ = ε, this
contradicts the assumption that ε > 0.
In the application of the proposition to the root selection
problem, say for example in structural logistic regression, G_n is
given by (5.4) and

    G(α,β,τ) = E_{θ_0}{G_n(α,β,τ)}.

Pointwise convergence of G_n to G, either almost surely or in
probability, follows from the corresponding law of large numbers.
Smoothness conditions on ψ_c and regularity conditions on the joint
density of (Y,X) will generally guarantee that the convergence is
uniform on compact sets.

The existence of a unique solution to the equation G(α,β,0) = 0
follows from the fact, again under regularity conditions, that
G(α,β,0) is the expected gradient of a concave log likelihood; λ_0
is just the limit of the so-called naive estimator.  Note also that,
by design, G(α_0,β_0,1) = 0.
Now provided that ∂G(α,β,τ)/∂(α,β) is nonsingular for 0 ≤ τ ≤ 1
(at τ = 0,1 this follows trivially), the Implicit Function Theorem
guarantees the existence of a unique solution to G{α(τ),β(τ),τ} = 0
for all τ in some neighborhood of zero (or one).  Thus (A.2) and (A.3)
simply rule out exceptional behavior of G(α,β,τ) for 0 ≤ τ ≤ 1 and
will generally hold under sufficient regularity conditions.

Finally, concavity of the usual logistic likelihood insures that
G_n(α,β,0) = 0 has, for n large enough, a unique convergent solution.
Thus, provided {α̂_n(τ), β̂_n^T(τ)} solves G_n{α̂_n(τ), β̂_n(τ), τ} = 0 for
all τ ∈ [0,1], the proposition establishes the convergence of α̂_n(τ),
β̂_n^T(τ) for all τ, and hence that of (α̂_n(1), β̂_n^T(1)) to (α_0, β_0^T).
Figure 1.  Plot of G_n(0,β) vs. β, (0.25 < β < 10.25); model, logistic
regression; sample size, 100; (Score = G_n(0,β)).
Figure 2.  Plots of f(β,τ), τ = 0(0.25)1, vs. β; model, simple linear
regression; sample size, 100; (Score = f(β,τ)).
Figure 3.  Plot of G_n(0,β,τ), τ = 0(0.25)1, vs. β, (0.25 < β < 10.25);
model, logistic regression; sample size, 100; (Score = G_n(0,β,τ)).
Figure 4.  Plot of G_n(0,β,τ), τ = 0(0.25)1, vs. β, (0.25 < β < 10.25);
model, logistic regression; sample size, 1000; (Score = G_n(0,β,τ)).
Figure 5.  Plot of β̂(τ) vs. τ, 0 < τ < 1; model, logistic regression,
(α, β_1, β_2, β_3) = (1, 0.00, 0.25, 0.50), U ~ N(0, I_3), Z ~ N(0, I_3),
X = U + Ω^{1/2} Z, Ω = Ω_5 (see Section 6); sample size, 1000.
Figure 6.  Plot of β̂(τ) vs. τ, 0 < τ < 1; as in Figure 5 but with
Ω = Ω_6 (see Section 6); sample size, 1000.

Figure 7.  Plot of β̂(τ) vs. τ, 0 < τ < 1; as in Figure 5 but with
Ω = Ω_7 (see Section 6); sample size, 1000.

Figure 8.  Plot of β̂(τ) vs. τ, 0 < τ < 1; as in Figure 5 but with
Ω = Ω_8 (see Section 6); sample size, 1000.