10/01/85
CONDITIONAL SCORES AND OPTIMAL SCORES FOR GENERALIZED LINEAR
MEASUREMENT-ERROR MODELS

by

Leonard A. Stefanski
Department of Economic and Social Statistics
Cornell University
Ithaca, New York 14853

and

Raymond J. Carroll
Department of Statistics
University of North Carolina
Chapel Hill, North Carolina 27514
SUMMARY

In this paper we study estimation of the parameters of generalized linear models in canonical form when the explanatory vector is measured with independent normal error. For the functional case, i.e., when the explanatory vectors are fixed constants, unbiased score functions are obtained by conditioning on certain sufficient statistics. This work generalizes results obtained by the authors (Stefanski & Carroll, 1986) for logistic regression. In the case that the explanatory vectors are independent and identically distributed with unknown distribution, efficient score functions are obtained using the theory developed in Begun et al. (1983). Related results can be found in Bickel & Ritov (1986).

Some key words: Conditional score function; Efficient score function; Functional model; Generalized linear model; Measurement error; Structural model.
1. INTRODUCTION
Given a covariate p-vector u, assume that Y has the density

    h_Y(y; θ, u) = exp[{y(α + β^T u) − b(α + β^T u)}/a(φ) + c(y, φ)]                    (1.1)

with respect to a σ-finite measure m(·); in (1.1), θ^T = (α, β^T, φ) and a(·), b(·) and c(·,·) are known functions.
The density (1.1) is that of a generalized linear model in canonical form (McCullagh & Nelder, 1983, Ch. 2).
Suppose now that u cannot be observed but that M independent measurements X = (X_1, ..., X_M) of u are available. When the measurement error is normally distributed, the matrix X has density

    h_X(x; θ, u) = ∏_{j=1}^{M} (2π)^{−p/2} |a(φ)Ω|^{−1/2} exp{−(x_j − u)^T Ω^{−1}(x_j − u)/2a(φ)}.        (1.2)
Together (1.1) and (1.2) define a generalized linear measurement-error model with normal measurement error. If for a sample (Y_i, X_i) (i = 1, ..., n) the covariables (u_i) are unknown constants, a functional model is obtained; if the (u_i) are independent and identically distributed random vectors from some unknown distribution, a structural model is obtained (Kendall & Stuart, 1979, Chapter 29). In this paper the problem of deriving unbiased scores for θ in both functional and structural models is studied.

There is a vast literature on this problem in the special case that (1.1) is a normal density. This dates back to Adcock (1878) and has been reviewed by Anderson (1976); see also Moran (1971). Recently there has been considerable interest in nonlinear measurement-error models; see Prentice (1982), Wolter & Fuller (1982a, 1982b), Carroll et al. (1984), Stefanski (1985) and Stefanski & Carroll (1986).
The density (1.1) includes normal, Poisson, logistic and gamma regression models. The key feature these models have in common is the existence of a natural sufficient statistic for u when all other parameters are fixed. The same is true of the normal density in (1.2). In fact, (1.2) could be replaced with any density possessing a natural sufficient statistic for u when other parameters are fixed, and much of the following theory holds with little or no modification. However, in the framework of measurement-error models no other assumption on the error distribution is more palatable than that of normality, and thus the added generality is sacrificed for a reduction in notational complexity.
In Section 2 functional models are studied and unbiased score functions for estimating θ in the presence of the unknown u_i's are presented. This work generalizes and extends results of Stefanski & Carroll (1986) for logistic regression. Structural models are studied in Section 3 and efficient score functions for estimating θ in the presence of the unknown distribution for u are identified. These results are obtained using the theory of efficient estimation developed by Begun et al. (1983). Other work in this area includes that of Bickel & Ritov (1986).
In the case that the covariates u_1, ..., u_n are observed without error, the maximum likelihood estimator of θ maximizes

    Σ_{i=1}^{n} log h_Y(Y_i; θ, u_i)

with respect to θ. Let X̄_i be the mean of the M measurements of u_i; that value of θ which maximizes

    Σ_{i=1}^{n} log h_Y(Y_i; θ, X̄_i)

will be referred to as the naive estimator. This estimator is usually inconsistent (Stefanski, 1985), although when Ω/M is small its bias will be small.
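The bias of the naive estimator is easy to see in a small simulation. The sketch below is illustrative only and is not from the paper: it uses a scalar normal linear model with hypothetical values α = 1, β = 2, a single error-prone measurement per covariate, and the classical result that the naive least-squares slope converges to β times the attenuation factor σ_u²/(σ_u² + measurement error variance).

```python
import random

# Illustrative simulation (values hypothetical, not from the paper): scalar
# normal model Y = alpha + beta*u + eps, with one error-prone measurement
# X = u + e observed in place of each covariate u.
random.seed(1)
n = 50_000
alpha, beta = 1.0, 2.0
sigma_u2 = 1.0        # limiting variance of the true covariates
me_var = 0.5          # measurement error variance

u = [random.gauss(0.0, sigma_u2 ** 0.5) for _ in range(n)]
y = [alpha + beta * ui + random.gauss(0.0, 0.5) for ui in u]
x = [ui + random.gauss(0.0, me_var ** 0.5) for ui in u]

# Naive estimator: ordinary least squares of Y on the mismeasured X.
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
beta_naive = sxy / sxx

# The naive slope is attenuated toward zero by sigma_u2/(sigma_u2 + me_var)
# rather than estimating beta itself.
print(beta_naive, beta * sigma_u2 / (sigma_u2 + me_var))
```

With these values the naive slope settles near 2 × (1/1.5) ≈ 1.33 rather than 2, which is the inconsistency referred to above.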
2. FUNCTIONAL MODELS

2.1 The functional likelihood
Consider the functional version of model (1.1) & (1.2). In this section the case M = 1 is studied under the additional assumption that

    Ω and φ are known.                                                                (2.1)

Throughout this section the random variables (Y_i, X_i), (i = 1, ..., n), are independent but not identically distributed, since their distributions depend on the true regressors u_i, which vary with i. However, for notational convenience the subscript i will be dropped when referring to (Y_i, X_i) in those situations where it causes no confusion. Under (1.1), (1.2) and (2.1) the joint density of (Y, X) takes the form

    h_{Y,X}(y, x; θ, u) = h_Y(y; θ, u) h_X(x; θ, u).                                   (2.2)
For a set of n observations the log-likelihood is

    L(θ, u_1, ..., u_n) = Σ_{i=1}^{n} log{h_{Y,X}(Y_i, X_i; θ, u_i)}.                  (2.3)

In the case that Y is normally distributed, it is known that under (2.1) maximizing (2.3) with respect to (α, β^T, φ, u_1, ..., u_n) results in consistent estimators of the regression coefficients α and β (Gleser, 1981). For any model other than the normal, the task of maximizing (2.3) with respect to its n+p+2 parameters is formidable and not likely to be undertaken. More importantly, it is not generally true that maximizing (2.3) produces consistent estimators. It follows from results in the first author's University of North Carolina Ph.D. thesis that in the case of logistic regression the functional maximum likelihood estimator of (α, β) is not consistent under assumption (2.1); see also Stefanski & Carroll (1986). The unwieldy functional likelihood, and its failure to produce consistent estimators in some important cases, point to the need for an alternative theory of estimation, which is now pursued.
2.2 Unbiased score functions

In this section unbiased score functions for the functional model are obtained by conditioning on certain sufficient statistics. Note that (2.2) can be written as
    h_{Y,X}(y, x; θ, u) = q{δ(y, x, θ), θ, u} r(y, x, θ),                              (2.4)

where

    q(δ, θ, u) = exp[u^T Ω^{−1} δ/a(φ) − {u^T Ω^{−1} u + 2b(α + β^T u)}/2a(φ)];
    r(y, x, θ) = exp[{2αy − x^T Ω^{−1} x}/2a(φ) + c*(y, φ)];                           (2.5)
    δ = δ(y, x, θ) = x + yΩβ.

Thus, viewing u as a parameter and α, β and φ as fixed, the statistic

    Δ = Δ(Y, X, θ) = X + YΩβ                                                           (2.6)

is sufficient for u.
As a consequence, the distribution of Y|Δ depends only on the observed variables Y and X and θ, but not on u. From this conditional distribution it is possible to derive unbiased estimating equations for θ which are independent of u.

Let h_{Y|Δ}(y|δ; θ) denote the conditional density of Y given Δ = δ. To find h_{Y|Δ}, note that the Jacobian of the transformation which takes (Y, X) into (Y, X + YΩβ) has a determinant of one. Thus

    pr(Y = y, Δ = δ) dm(y) dδ = pr(Y = y, X = δ − yΩβ) dm(y) dδ,

and after some routine calculations one finds

    h_{Y|Δ}(y|δ; θ) = J(y, δ, θ) / ∫ J(y, δ, θ) dm(y),                                 (2.7)

where
    J(y, δ, θ) = exp[{y(α + β^T δ) − y²β^T Ωβ/2}/a(φ) + c*(y, φ)].                     (2.8)

Define

    S(η, β, φ) = ∫ exp{yη − y²β^T Ωβ/2a(φ) + c*(y, φ)} dm(y).

This allows (2.7) to be written as

    h_{Y|Δ}(y|δ; θ) = exp{yη − y²β^T Ωβ/2a(φ) + c*(y, φ)}/S(η, β, φ),   η = (α + β^T δ)/a(φ).   (2.9)

Note that (2.9) is an exponential-family density with Y as the natural sufficient statistic for η. Thus moments of Y|Δ = δ can be computed from the partial derivatives of S(η, β, φ) with respect to η, e.g.,

    E_θ(Y|Δ = δ) = (∂/∂η) log{S(η, β, φ)}|_{η = (α+β^Tδ)/a(φ)};
    var_θ(Y|Δ = δ) = (∂²/∂η²) log{S(η, β, φ)}|_{η = (α+β^Tδ)/a(φ)}.                    (2.10)
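The derivative formulas (2.10) can be checked numerically. The sketch below is illustrative only: it takes the scalar normal model treated in Section 2.3, for which a(φ) = σ² and c*(y, φ) = −y²/2σ² up to constants that cancel, uses hypothetical parameter values, approximates S by a Riemann sum over y, and differentiates log S numerically; the result should agree with the closed-form conditional mean (α + βδ)/(1 + Ωβ²) of Section 2.3.

```python
import math

# Numerical check of (2.10) in the scalar normal model (illustrative values).
alpha, beta, sigma2, omega, delta = 0.5, 1.5, 1.0, 0.4, 0.8

def log_S(eta):
    # S(eta) = integral of exp{y*eta - y^2*omega*beta^2/(2*sigma2) + c*(y)} dy
    # with c*(y) = -y^2/(2*sigma2), approximated by a Riemann sum over y.
    h, total, y = 0.01, 0.0, -50.0
    while y <= 50.0:
        total += math.exp(y * eta - y * y * (1.0 + omega * beta * beta) / (2.0 * sigma2)) * h
        y += h
    return math.log(total)

eta = (alpha + beta * delta) / sigma2        # a(phi) = sigma2 for this model
d = 1e-4
cond_mean = (log_S(eta + d) - log_S(eta - d)) / (2 * d)   # (d/d eta) log S

closed_form = (alpha + beta * delta) / (1.0 + omega * beta * beta)
print(cond_mean, closed_form)
```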
Since h_{Y|Δ} is an exponential-family density, it is true that

    ∫ ḣ_{Y|Δ}(y|δ; θ) dm(y) = 0,                                                      (2.11)

where ḣ_{Y|Δ} = (∂/∂θ) h_{Y|Δ}. Thus, defining

    ψ_S(y, x, θ) = (∂/∂θ) log{h_{Y|Δ}(y|δ; θ)}|_{δ = δ(y,x,θ)},                        (2.12)

it follows that ψ_S(·,·,·) is unbiased for θ, i.e.,

    E_θ{ψ_S(Y, X, θ)} = E_θ[E_θ{ψ_S(Y, X, θ)|Δ}] = 0.

The inner conditional expectation is zero by virtue of (2.11). The score ψ_S will be called the sufficiency score, and any estimator θ̂_S = (α̂_S, β̂_S^T, φ̂_S)^T which satisfies

    Σ_{i=1}^{n} ψ_S(Y_i, X_i, θ̂_S) = 0                                               (2.13)

will be called a sufficiency estimator.
Consider the density in (2.4) and let ḣ_{Y,X} = (∂/∂θ) h_{Y,X}. Note that

    ḣ_{Y,X}(y, x; θ, u)/h_{Y,X}(y, x; θ, u) − E_θ{ḣ_{Y,X}(Y, X; θ, u)/h_{Y,X}(Y, X; θ, u) | Δ = δ}

        = ( {y − E(Y|Δ = δ)}/a(φ),  {y − E(Y|Δ = δ)}u/a(φ),  r_*(y, x, θ) − E{r_*(Y, X, θ)|Δ = δ} )^T,

where

    r_*(y, x, θ) = (∂/∂φ) c*(y, φ) − {2αy − x^T Ω^{−1} x} a′(φ)/2a²(φ).

As the expression in brackets above depends on the unknown covariate u only as a 'weight', this suggests the class of score functions

    ψ_C(y, x, θ) = ( {y − E(Y|Δ = δ)}/a(φ),
                     {y − E(Y|Δ = δ)} t(δ),
                     r_*(y, x, θ) − E{r_*(Y, X, θ)|Δ = δ} )^T |_{δ = x + yΩβ},          (2.14)

indexed by the vector-valued function t(·). The score (2.14) will be called a conditional score following Lindsay (1980, 1982, 1983). Some natural choices for t(δ) might be t(δ) = δ and t(δ) = E_θ(X|Δ = δ). Note that since X is unbiased for u and Δ is sufficient for u, the latter choice corresponds to replacing u by its uniformly minimum variance unbiased estimator.
Also, since

    E_θ(X|Δ = δ) = δ − E_θ(Y|Δ = δ)Ωβ,                                                (2.15)

only the conditional moments of Y|Δ are needed to find E_θ(X|Δ = δ). More will be said on appropriate choices for t(·) in Section 3.3.
Any estimator θ̂_C which satisfies

    Σ_{i=1}^{n} ψ_C(Y_i, X_i, θ̂_C) = 0                                               (2.16)

will be called a conditional estimator.
The estimating equations in (2.13) and (2.16) are both unbiased. Although it should be possible to show that under reasonable conditions there always exist consistent sequences of estimators θ̂_S and θ̂_C satisfying (2.13) and (2.16) respectively, it is not generally true that (2.13) and (2.16) define θ̂_S and θ̂_C uniquely. More importantly, there can exist sequences of solutions to (2.13) and (2.16) which are not consistent, and thus care must be taken when defining θ̂_S and θ̂_C. In practice a couple of solutions to this dilemma are possible. The first consists of defining the estimators θ̂_S and θ̂_C as the solutions to (2.13) and (2.16) which are closest to the naive estimator introduced in Section 1. This rule is justifiable when measurement error is small; however, it can break down when measurement error is large. This is discussed in greater detail for the normal model in the next section. The second solution entails doing one or two steps of a Newton-Raphson iteration of (2.13) and (2.16) starting from the naive estimator. Again, this is generally appropriate only when the measurement error is small. However, in some realistic sampling situations, Stefanski & Carroll (1986) show that such an approach substantially improves upon the naive estimator in their study of measurement error in logistic regression. Finally, preliminary work by the authors suggests that it is possible to deconvolute the empirical distribution function of the observed X_i's to obtain an estimator of the empirical distribution function of the u_i's, which under regularity conditions can be used to construct consistent estimators for the functional model. These estimators can then be used to uniquely define the more manageable M-estimators, θ̂_S and θ̂_C.

When consistent sequences of solutions to (2.13) and (2.16) are obtained, the asymptotic distributions of θ̂_S and θ̂_C are easily derived since both are M-estimators; see Huber (1967).
2.3 Normal, logistic and Poisson regression

In this section the strengths and limitations of the estimation theory are illustrated by studying it in three particular generalized linear models.
Consider first the case in which Y has a normal distribution with mean α + β^T u and variance σ². For this model φ = σ², a(φ) = φ and m(·) is Lebesgue measure. Using (2.7) one finds that the distribution of Y|Δ = δ is normal with variance σ²/(1 + β^T Ωβ) and mean μ, where

    μ = (α + β^T δ)/(1 + β^T Ωβ).                                                     (2.17)
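In the scalar case the fact that this conditional mean is free of u can be confirmed exactly: given u, the pair (Y, Δ) is bivariate normal, and the usual normal regression of Y on Δ reproduces (2.17) with every u-term cancelling. A minimal check, with hypothetical parameter values:

```python
# Check that E(Y|Delta=delta) in the scalar normal model is free of u.
# Given u: Y ~ N(alpha + beta*u, sigma2), X ~ N(u, sigma2*omega),
# Delta = X + Y*omega*beta.  All values below are illustrative only.
alpha, beta, sigma2, omega, delta = 0.5, 1.5, 1.0, 0.4, 0.8

def cond_mean_given_u(u):
    mean_y = alpha + beta * u
    mean_d = u + mean_y * omega * beta
    var_d = sigma2 * omega + (omega * beta) ** 2 * sigma2  # var(X) + (omega*beta)^2 var(Y)
    cov_yd = omega * beta * sigma2                         # cov(Y, X + Y*omega*beta)
    # Bivariate normal regression of Y on Delta:
    return mean_y + cov_yd / var_d * (delta - mean_d)

mu = (alpha + beta * delta) / (1.0 + omega * beta * beta)  # formula (2.17)
print(cond_mean_given_u(0.0), cond_mean_given_u(5.0), mu)
```

Both calls return the same value, equal to μ, even though the two hypothetical u values are far apart.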
Corresponding to (2.12) one finds

    ψ_S(y, x, θ) = ( σ^{−2}(y − μ),
                     −σ^{−2}{(y − μ)²Ωβ − (y − μ)(δ − 2μΩβ)} + Ωβ/(1 + β^T Ωβ),
                     −1/2σ² + (y − μ)²(1 + β^T Ωβ)/2σ⁴ )^T,

where μ is defined in (2.17). Define

    δ*_i = (I + Ωββ^T)^{−1}{δ(Y_i, X_i, θ) − αΩβ},

where δ(·,·,·) is given by (2.6), and consider the equations

    Σ_{i=1}^{n} (Y_i − α − β^T δ*_i) (1, δ*_i^T)^T = 0.                               (2.18)
It is a simple matter to show that every solution to (2.18) is also a solution to Σψ_S(Y_i, X_i, θ) = 0, i.e., any solution to (2.18) is a sufficiency estimator. The similarity of (2.18) to the usual normal equations is readily apparent. However, keep in mind that δ*_i depends on α and β, and thus (2.18) is nonlinear in the parameters.

Note that δ*_i is also a sufficient statistic for u_i when α, β and σ² are fixed. Because of (2.18), and the fact that given δ*_i, Y_i is normal with mean α + β^T δ*_i, δ*_i will be called a conjugate sufficient statistic. Also, since δ*_i is the functional maximum likelihood estimator for u_i in this model (Gleser, 1981), equation (2.18) shows that the functional maximum likelihood estimator is a sufficiency estimator.
From (2.18) it follows that α̂_S = Ȳ − β̂_S^T X̄, and using this it is possible to deduce that β̂_S satisfies

    Σ_{i=1}^{n} (X*_i + Y*_i Ωβ̂_S)(Y*_i − β̂_S^T X*_i) = 0,                          (2.19)

where Y*_i = Y_i − Ȳ and X*_i = X_i − X̄. Consider (2.19) for the case p = 1, i.e., β̂_S is a scalar. This quadratic equation has two real roots (Kendall & Stuart, 1979, Chapter 29); unfortunately the sufficiency principle does not indicate which root is appropriate. Had the equations (2.18) been derived as the gradient of the functional log-likelihood, the appropriate root would have been dictated by the maximizing principle.

In the previous section it was suggested that in the case of multiple solutions to (2.13) and (2.16) one should pick that solution closest to the naive estimator, and that this selection rule would work as long as the measurement error variance was small.
In this particular case the two roots of (2.19) converge to β₀ and −σ²/(β₀λ²), where λ² = σ²Ω is the measurement error variance. The naive estimator converges to σ_u²β₀/(σ_u² + λ²), where σ_u² is the limiting value of the sample variance of the true u_i's. Thus the suggested selection rule will asymptotically choose the right root whenever

    |β₀ − σ_u²β₀/(σ_u² + λ²)| < |σ²/(β₀λ²) + σ_u²β₀/(σ_u² + λ²)|.

This inequality is satisfied if

    β₀²λ²(λ² − σ_u²) < σ²(λ² + σ_u²).

Thus whenever λ² < σ_u² the selection rule works no matter what the values of σ² and β₀; however, if λ² > σ_u² and σ²/β₀² is sufficiently small, then the selection rule chooses the wrong root. This is encouraging, for it is unusual to have measurement error so large that λ² ≥ σ_u².
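For p = 1 the quadratic referred to after (2.19) can be written in the centered sample moments as Ω S_{xy} β² + (S_{xx} − Ω S_{yy})β − S_{xy} = 0. The sketch below is illustrative only, with hypothetical parameter values and σ² and Ω treated as known: it solves the quadratic on simulated data in a regime with λ² < σ_u² and applies the closest-to-the-naive-estimator selection rule.

```python
import math, random

# Illustrative p = 1 simulation (hypothetical values): both roots of the
# sufficiency quadratic and the closest-to-naive selection rule.
random.seed(7)
n = 40_000
alpha, beta0, sigma2, sigma_u2, omega = 1.0, 2.0, 1.0, 1.0, 0.3

u = [random.gauss(0.0, sigma_u2 ** 0.5) for _ in range(n)]
y = [alpha + beta0 * ui + random.gauss(0.0, sigma2 ** 0.5) for ui in u]
x = [ui + random.gauss(0.0, (sigma2 * omega) ** 0.5) for ui in u]  # lambda^2 = sigma2*omega

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x) / n
syy = sum((b - ybar) ** 2 for b in y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n

# Sufficiency quadratic: omega*sxy*b^2 + (sxx - omega*syy)*b - sxy = 0.
aa, bb, cc = omega * sxy, sxx - omega * syy, -sxy
disc = math.sqrt(bb * bb - 4 * aa * cc)
roots = [(-bb + disc) / (2 * aa), (-bb - disc) / (2 * aa)]

beta_naive = sxy / sxx                      # attenuated least-squares slope
beta_hat = min(roots, key=lambda r: abs(r - beta_naive))
print(roots, beta_naive, beta_hat)
```

Here λ² = 0.3 < σ_u² = 1, so the rule selects the root near β₀ = 2; the second root approaches −σ²/(β₀λ²) ≈ −1.67.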
To gain some additional insight into the performance of the sufficiency estimator, suppose that u_1, ..., u_n are independent normal variates with mean μ_u and variance σ_u², i.e., assume a structural model. In this case (Kendall & Stuart, 1979, Chapter 29) the structural and functional maximum likelihood estimators are the same, and in light of the previous discussion this common estimator is also a sufficiency estimator. Thus in this particular case the sufficiency score is an efficient score.

Finally, for the normal model E_θ(Y_i|Δ_i) = α + β^T δ*_i and, from (2.15), E_θ(X_i|Δ_i) = δ*_i. Thus ψ_S and ψ_C define the same estimators, i.e., θ̂_S = θ̂_C, when t(δ) = E_θ(X|Δ = δ).
Now consider logistic regression, in which

    pr_θ(Y = 1|u) = F(α + β^T u),   F(t) = (1 + e^{−t})^{−1}.

For this model a(φ) ≡ 1 and m(·) is counting measure on {0, 1}. Using (2.7) one obtains

    pr_θ(Y = 1|Δ = δ) = F{α + (δ − Ωβ/2)^T β},                                         (2.20)

and corresponding to (2.12) is the logistic sufficiency score

    ψ_S(y, x, θ) = [y − F{α + (δ − Ωβ/2)^T β}] (1, (δ − Ωβ)^T)^T |_{δ = x + yΩβ}.      (2.21)

Setting Σψ_S(Y_i, X_i, θ) = 0 results in the equivalent equations

    Σ_{i=1}^{n} {Y_i − F(α + β^T Δ*_i)} (1, Δ*_i^T)^T = 0,                             (2.22)

where Δ*_i = Δ_i − Ωβ/2; note that Δ*_i is a conjugate sufficient statistic.
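The property behind (2.20)-(2.22), namely that the conditional distribution of Y given Δ is free of u, can be checked directly for p = 1. The sketch below uses hypothetical values: it computes pr(Y = 1|Δ = δ, u) from the joint density of (Y, Δ) given u, using X = Δ − YΩβ and a unit Jacobian, and compares the result with the closed form in (2.20) for two widely separated values of u.

```python
import math

# Scalar logistic check of (2.20): pr(Y=1 | Delta=delta, u) is free of u.
# Measurement error is N(0, omega) and a(phi) = 1; values are illustrative.
alpha, beta, omega, delta = -0.5, 1.2, 0.6, 0.9

def F(t):
    return 1.0 / (1.0 + math.exp(-t))

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def pr_y1_given_delta(u):
    # Joint density of (Y, Delta) given u, via X = Delta - Y*omega*beta.
    a1 = F(alpha + beta * u) * normal_pdf(delta - omega * beta, u, omega)   # y = 1
    a0 = (1.0 - F(alpha + beta * u)) * normal_pdf(delta, u, omega)          # y = 0
    return a1 / (a1 + a0)

closed_form = F(alpha + (delta - omega * beta / 2.0) * beta)   # formula (2.20)
print(pr_y1_given_delta(-3.0), pr_y1_given_delta(4.0), closed_form)
```

The u-terms cancel in the likelihood ratio, so both conditional probabilities agree with (2.20) to floating-point accuracy.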
Stefanski & Carroll (1986) introduced this estimator and show in a Monte Carlo study that, in spite of the possibility of multiple solutions to (2.22), a modified one-step version of (α̂_S, β̂_S^T)^T starting from the naive estimator performed well in some realistic sampling situations. Unlike the normal model, the logistic sufficiency estimator does not correspond to the functional maximum likelihood estimator, which in this case is not consistent; see the first author's University of North Carolina Ph.D. thesis and Stefanski & Carroll (1986). In Section 3.3 it is shown that the logistic sufficiency score is optimal for a particular structural model.
For logistic regression it is not true that θ̂_S = θ̂_C when t(δ) = E(X|Δ = δ). Indeed, with E_θ(Y|Δ = δ) given by (2.20),

    E(X|Δ = δ) = δ − F{α + (δ − Ωβ/2)^T β}Ωβ,

and corresponding to (2.16) are the equations obtained by replacing Δ*_i in (2.22) with t(Δ_i) = Δ_i − F{α + (Δ_i − Ωβ/2)^T β}Ωβ.
The final model considered is that of Poisson regression, in which

    pr_θ(Y = k|u) = (k!)^{−1} exp{k(α + β^T u) − exp(α + β^T u)}   (k = 0, 1, ...).

For this model a(φ) ≡ 1 and m(·) is counting measure on {0, 1, ...}. From (2.7) it follows that

    pr_θ(Y = k|Δ = δ) = (k!)^{−1} exp{k(α + β^T δ) − k²β^T Ωβ/2} / Σ_{j=0}^{∞} (j!)^{−1} exp{j(α + β^T δ) − j²β^T Ωβ/2}.   (2.23)
Since (2.23) has no closed form, the sufficiency scores ψ_S and ψ_C are quite messy and are not given. Note that there is no conjugate sufficient statistic. Also, as in logistic regression, the estimators θ̂_S and θ̂_C are not equal for this model when t(δ) = E(X|Δ = δ).

The conditional distribution (2.23) is more typical of generalized linear models than are those from the logistic and normal models. Since in (2.8) the factor y²β^T Ωβ/2a(φ) appears in the exponent, it is only in special cases that the denominator of (2.7) can be obtained in closed form. Thus implementation of the sufficiency estimators will often require numerical integration or summation.
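As an illustration of the summation involved, the sketch below (scalar case, hypothetical values) computes E(Y|Δ = δ) under (2.23) by truncating the series, with the weights evaluated on the log scale for stability. When β^T Ωβ = 0 the weights reduce to Poisson probabilities with mean exp(α + βδ), which provides a check.

```python
import math

# Truncated-series conditional mean for the Poisson model, from (2.23):
# weights proportional to exp{k*(alpha + beta*delta) - k^2*c/2}/k!,
# with c = beta^T Omega beta (scalar here).  Values are illustrative.
def poisson_cond_mean(alpha, beta, omega, delta, kmax=200):
    c = beta * omega * beta
    log_w = [k * (alpha + beta * delta) - 0.5 * k * k * c - math.lgamma(k + 1)
             for k in range(kmax + 1)]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]        # stabilized weights
    return sum(k * wk for k, wk in enumerate(w)) / sum(w)

alpha, beta, delta = 0.2, 0.8, 1.0

# With Omega = 0 the conditional law is Poisson with mean exp(alpha + beta*delta).
check = poisson_cond_mean(alpha, beta, 0.0, delta)
print(check, math.exp(alpha + beta * delta))

# With measurement error (Omega = 0.5) large counts are downweighted.
print(poisson_cond_mean(alpha, beta, 0.5, delta))
```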
3. STRUCTURAL MODELS

3.1 The structural likelihood

In this section the model studied is the structural version of (1.1) & (1.2), i.e., u_1, ..., u_n are independent and identically distributed observations with unknown density g_U(u). Since it should cause no confusion, the subscript U on g_U(u) is omitted. The density g is an element of G, a family of densities with respect to the measure ν(·). As in Section 2, it is assumed that M = 1, along with the identifiability condition (2.1). Under these conditions the joint density of (Y, X) is

    f_{Y,X}(y, x; θ, g) = ∫ h_{Y,X}(y, x; θ, u) g(u) dν(u),                            (3.1)

where h_{Y,X} is defined in (2.2).
Let ḟ_{Y,X}(y, x; θ, g) = (∂/∂θ) f_{Y,X}(y, x; θ, g) and assume that

    ḟ_{Y,X}(y, x; θ, g) = ∫ ḣ_{Y,X}(y, x; θ, u) g(u) dν(u),

where ḣ_{Y,X}(y, x; θ, u) = (∂/∂θ) h_{Y,X}(y, x; θ, u), i.e., assume that differentiation and integration can be interchanged in (3.1). If g(·) were known, then the efficient score for θ would be

    ℓ(y, x, θ, g) = ḟ_{Y,X}(y, x; θ, g)/f_{Y,X}(y, x; θ, g),

and the information available in (Y, X) for estimating θ would be E{ℓ(Y, X, θ, g) ℓ^T(Y, X, θ, g)}. Throughout this section interest lies in estimating θ when g, and hence ℓ, are unknown. Note that both the sufficiency and conditional scores of Section 2 are unbiased for the structural model (3.1) also. Attention therefore is directed to the problem of finding efficient score functions.
3.2 Efficient score functions and information bounds

Efficient score functions for estimation of θ = (α, β^T, φ)^T in the presence of the nuisance function g(·) are now derived. As with the theory in Section 2, the existence of certain sufficient statistics plays a key role here. The derivation draws heavily on the results of Begun, Hall, Huang & Wellner (1983); see also Pfanzagl (1982, Chapter 14). The structural model studied here is a generalization of a model considered by Bickel & Ritov (1986). Whereas they study simple linear regression under a number of conditions, including that of replicated measurements and our assumption (2.1), we consider the more general model only under the latter assumption. However, the approach used here to derive efficient scores extends quite naturally to the case of replicated measurements when (2.1) is not assumed.
In the following, μ denotes the product measure of m(·) × Lebesgue measure on p-dimensional Euclidean space. As in Begun et al. (1983), let L₂(μ) and L₂(ν) denote the L₂-spaces of square-integrable functions with respect to the measures μ and ν respectively. Norms and inner products on these spaces are denoted by ‖·‖_μ and ⟨·,·⟩_μ, and ‖·‖_ν and ⟨·,·⟩_ν. The theory of Begun et al. (1983) requires Hellinger-differentiability of the square root of (3.1) with respect to (θ, g); see their Definition 2.1. It is assumed here, and can be proven under regularity conditions, that f^{1/2}_{Y,X}(y, x; θ, g) satisfies condition (2.1) of Begun et al. and hence is Hellinger-differentiable. Its differential, for sequences (θ_n, g_n) satisfying ‖θ_n − θ‖ + ‖g_n^{1/2} − g^{1/2}‖_ν converging to zero, is given by

    ρ^T(θ_n − θ) + A(g_n^{1/2} − g^{1/2}),

where

    ρ = (1/2) f^{1/2}_{Y,X}(y, x; θ, g) ℓ(y, x, θ, g),

and the linear operator A taking L₂(ν) into L₂(μ) is defined for f in L₂(ν) via

    Af = ∫ h_{Y,X}(y, x; θ, u) f(u) dν(u) / {2 f^{1/2}_{Y,X}(y, x; θ, g)}.

When necessary to indicate dependence on (y, x, θ, g), ρ is written ρ(y, x, θ, g) and Af as Af(y, x, θ, g).
The key result of Begun et al. (1983) used here is that when g is unknown the efficient score for θ is

    ψ*(y, x, θ, g) = 2{ρ − Af*}/f^{1/2}_{Y,X}(y, x; θ, g),                             (3.2)

where the L₂(ν) function f* satisfies

    ⟨ρ − Af*, Af⟩_μ = 0                                                               (3.3)

for all functions f obtained as L₂(ν) limits of sequences (f_n) of the form f_n = n^{1/2}(g_n^{1/2} − g^{1/2}). In terms of expectations, (3.3) becomes

    E_{θ,g}[{ℓ(Y, X, θ, g) − 2Af*(Y, X, θ, g)/f^{1/2}_{Y,X}(Y, X; θ, g)}{Af(Y, X, θ, g)/f^{1/2}_{Y,X}(Y, X; θ, g)}] = 0.   (3.4)

Note that for any f ∈ L₂(ν),

    2Af(y, x, θ, g)/f^{1/2}_{Y,X}(y, x; θ, g) = ∫ h_{Y,X}(y, x; θ, u) f(u) dν(u) / ∫ h_{Y,X}(y, x; θ, u) g(u) dν(u),

and thus, in view of (2.4),

    2Af(y, x, θ, g)/f^{1/2}_{Y,X}(y, x; θ, g) = ∫ q(Δ, θ, u) f(u) dν(u) / ∫ q(Δ, θ, u) g(u) dν(u),                (3.5)

where Δ = X + YΩβ is defined in (2.6). The important fact here is that the right hand side of (3.5) depends on (Y, X) only through the complete sufficient statistic Δ, irrespective of f; this is a consequence of the sufficiency of Δ for u when u is regarded as a parameter. It follows now that (3.3) holds for all f when

    2Af*(y, x, θ, g)/f^{1/2}_{Y,X}(y, x; θ, g) = E_{θ,g}{ℓ(Y, X, θ, g)|Δ = δ}.

Thus the efficient score given by (3.2) is

    ψ*(y, x, θ, g) = ℓ(y, x, θ, g) − E_{θ,g}{ℓ(Y, X, θ, g)|Δ = δ}|_{δ = x + yΩβ},      (3.6)

and the 'information' available in (Y, X) for estimating θ in the presence of g is

    I* = E{ψ*(Y, X, θ, g) ψ*^T(Y, X, θ, g)};

see Equation (3.4) of Begun et al. (1983).
Explicit expressions for ℓ and ψ* are now obtained; recall that q(δ, θ, u), r(y, x, θ) and δ(y, x, θ) are given in (2.5). Writing q*(θ, u) = q{δ(y, x, θ), θ, u} and using (2.4),

    f_{Y,X}(y, x; θ, g) = ∫ q*(θ, u) r(y, x, θ) g(u) dν(u),

and thus

    ℓ(y, x, θ, g) = ∫ q̇*(θ, u) g(u) dν(u) / ∫ q*(θ, u) g(u) dν(u) + ṙ(y, x, θ)/r(y, x, θ),

where q̇* = (∂/∂θ)q* and ṙ = (∂/∂θ)r. Now since

    (∂/∂α) log q*(θ, u) = −b′(α + β^T u)/a(φ);
    (∂/∂β) log q*(θ, u) = {y − b′(α + β^T u)}u/a(φ);
    (∂/∂φ) log q*(θ, u) = {u^T Ω^{−1} u + 2b(α + β^T u) − 2u^T Ω^{−1} δ(y, x, θ)} a′(φ)/2a²(φ),

and

    (∂/∂α) log r(y, x, θ) = y/a(φ);   (∂/∂β) log r(y, x, θ) = 0;   (∂/∂φ) log r(y, x, θ) = r_*(y, x, θ),

where

    r_*(y, x, θ) = (∂/∂φ) c*(y, φ) − {2αy − x^T Ω^{−1} x} a′(φ)/2a²(φ),

it follows that ℓ takes the form

    ℓ(y, x, θ, g) = ( {y − f₁(δ, θ, g)}/a(φ),
                      y R(δ, θ, g) − f₂(δ, θ, g),
                      r_*(y, x, θ) − f₃(δ, θ, g) )^T |_{δ = x + yΩβ},

where the scalar-valued functions f₁ and f₃, and the p-vector-valued functions R and f₂, depend on (y, x) only through δ = x + yΩβ. It follows that

    ψ*(y, x, θ, g) = ℓ(y, x, θ, g) − E{ℓ(Y, X, θ, g)|Δ = δ}
                   = ( {y − E(Y|Δ = δ)}/a(φ),
                       {y − E(Y|Δ = δ)} R(δ, θ, g),
                       r_*(y, x, θ) − E{r_*(Y, X, θ)|Δ = δ} )^T |_{δ = x + yΩβ}.        (3.7)
Define

    w(δ) = ∫ q(δ, θ, u) g(u) dν(u)                                                     (3.8)

and

    ẇ(δ) = (∂/∂δ) w(δ) = a^{−1}(φ) ∫ Ω^{−1} u q(δ, θ, u) g(u) dν(u).

The function R(δ, θ, g) appearing in (3.7) is then given by

    R(δ, θ, g) = Ω ẇ(δ)/w(δ).                                                          (3.9)

Using the relation x = δ − yΩβ,

    r_*(y, x, θ) = (∂/∂φ) c*(y, φ) − [2αy − (δ − yΩβ)^T Ω^{−1}(δ − yΩβ)] a′(φ)/2a²(φ),

and thus (3.7) involves only expectations of functions of Y|Δ = δ.
3.3 Efficiency of the sufficiency and conditional scores in a structural setting

In the discussion of the normal linear functional model in Section 2.3, it was deduced that the sufficiency score is equivalent to the efficient score for the structural version of this model when the true predictors (u_1, ..., u_n) are themselves normally distributed. A similar result for logistic regression is now derived. Compare (2.21) to the logistic efficient score given by

    ψ*(y, x, θ, g) = [y − F{α + (δ − Ωβ/2)^T β}] (1, R(δ, θ, g)^T)^T |_{δ = x + yΩβ};   (3.10)
equation (3.10) is just (3.7) for the case of logistic regression. For (2.21) and (3.10) to be equivalent, the function R(δ, θ, g) must be linear in δ. Since by (3.9)

    R(δ, θ, g) = Ω ẇ(δ)/w(δ),

this means that log{w(δ)} must be a quadratic form in δ, call it Q(δ), i.e., using (3.8) with q(δ, θ, u) chosen accordingly for logistic regression,

    ∫ exp(u^T Ω^{−1} δ) exp(−u^T Ω^{−1} u/2) g(u)/{1 + exp(α + β^T u)} dν(u) = exp{Q(δ)}.

Now using a moment generating function argument, it follows that

    exp(−u^T Ω^{−1} u/2) g(u)/{1 + exp(α + β^T u)}

must be proportional to a p-variate normal density. This means that g(u) must be a mixture of two p-variate normal densities with different means and common covariance matrix. The picture is now clear; the sufficiency score (2.21) is efficient in a structural setting only when (Y, U) satisfy the assumptions of the normal discrimination model. Of course, if all of this information were known a priori, then the linear discriminant, α + β^T u, would most likely be estimated using the full likelihood as opposed to using logistic regression; see Efron (1975) and Michalik & Tripathi (1980).

A theorem is now proved which indicates when the conditional score ψ_C defined in (2.14) is the efficient score in some structural setting. This provides some insight into appropriate choices for t(·) when choosing a conditional score (2.14).
THEOREM. The conditional score ψ_C is the efficient score in a structural model for some density g(·) and some measure ν(·) if and only if there exists a real-valued function T(·) such that t(δ) = Ω(∂/∂δ)T(δ), where exp[T{a(φ)Ωs}] is a moment generating function for some probability density with respect to ν(·).

PROOF. Assume that ψ_C is the efficient score in a structural model with density g(·) and measure ν(·). Then comparing (2.14) and (3.7), it follows that

    t(δ) = R(δ, θ, g) = Ω ẇ(δ)/w(δ),

where w(δ) is given by (3.8). Let T(δ) = log{w(δ)} − k for a constant k to be determined later. Clearly t(δ) = Ω(∂/∂δ)T(δ), and furthermore, using (3.8) and (2.5),

    exp[T{a(φ)Ωs}] = e^{−k} ∫ exp(s^T u) exp[−{u^T Ω^{−1} u + 2b(α + β^T u)}/2a(φ)] g(u) dν(u).

Thus with k chosen accordingly, M(s) = exp[T{a(φ)Ωs}] is a moment generating function of the density

    exp[−k − {u^T Ω^{−1} u + 2b(α + β^T u)}/2a(φ)] g(u)

with respect to ν(·). The steps in this argument can be reversed to prove the theorem in the other direction. ∎
The discussion of efficiency in a functional setting is difficult. In light of this, a reasonable approach is to choose a conditional score (2.14) which is known to be efficient for some structural model. The theorem indicates that the class of appropriate functions t(·) is fairly restricted.
3.4 Efficient estimation

Since the efficient score in (3.7) depends on the unknown density g(·), it is not readily apparent how one constructs a sequence of estimators with asymptotically minimum variance. Begun et al. (1983) suggest in general solving

    Σ_{i=1}^{n} ψ*(Y_i, X_i, θ, ĝ) = 0,                                               (3.11)

where ĝ(·) is some suitable initial estimator of g(·). Since the empirical distribution function, F_n, of the observed X's converges to the convolution of G with a normal distribution function, it should in theory be possible to deconvolute F_n to obtain consistent estimators of G, which could then be smoothed to obtain estimators of g(·). In practice this is quite difficult, and technical problems might arise when p > 1. Also, given a ĝ(·), it is still possible that (3.11) will have multiple solutions, not all yielding consistent sequences.
This last problem can be avoided if a root-n consistent preliminary estimator, θ̃, is available. Again let ĝ(·) be an estimator of g(·), and define

    θ̂ = θ̃ + Î*^{−1} n^{−1} Σ_{i=1}^{n} ψ*(Y_i, X_i, θ̃, ĝ),

where Î* is an estimator of I*, e.g.,

    Î* = −n^{−1} Σ_{i=1}^{n} ψ̇*(Y_i, X_i, θ̃, ĝ),

with ψ̇*(y, x, θ, g) = (∂/∂θ)ψ*(y, x, θ, g). Then θ̂ will generally be asymptotically efficient provided Î* and ĝ(·) are good estimates of I* and g(·) respectively. This approach still requires an estimator ĝ(·) of g(·). Note that ψ*(y, x, θ, g) depends on g(·) only through the function w(·) in (3.8) and its derivative. In work in progress the authors are investigating a one-step construction of an asymptotically efficient estimator which estimates w(·) directly, avoiding the intermediate step of estimating g(·).
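The mechanics of the one-step update can be sketched generically. The example below is a toy illustration only: the score used is the scalar normal location score ψ(y; t) = y − t, not the efficient measurement-error score ψ*, and it simply shows how a single Newton-type step with an estimated information matrix moves a preliminary estimate to the root of the estimating equation.

```python
# Toy illustration of the one-step update
#     theta_hat = theta_tilde + I_hat^{-1} * n^{-1} * sum psi(Y_i; theta_tilde).
# The score here is the scalar normal location score, not the efficient score
# of Section 3; it is chosen so the update can be followed by hand.
def one_step(data, theta_tilde, psi, dpsi):
    n = len(data)
    info_hat = -sum(dpsi(y, theta_tilde) for y in data) / n   # estimate of I*
    correction = sum(psi(y, theta_tilde) for y in data) / n
    return theta_tilde + correction / info_hat

data = [1.0, 2.0, 4.0, 5.0]
psi = lambda y, t: y - t
dpsi = lambda y, t: -1.0

# For this linear score a single step lands exactly on the root (the sample
# mean), whatever the preliminary estimate.
print(one_step(data, 0.0, psi, dpsi), one_step(data, 10.0, psi, dpsi))
```

For a nonlinear score such as ψ*, one step from a root-n consistent start yields the stated first-order efficiency rather than an exact root.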
4. CONCLUDING REMARKS
In conclusion, we reiterate that the assumption of normal errors, (1.2), is not crucial to the theory developed herein; the existence of a complete sufficient statistic for u, when u is regarded as a parameter, is crucial. The situation in which (2.1) is replaced with an assumption of replicated measurements, i.e., M > 1 in (1.2), is conceptually no different from that in which (2.1) is assumed, with the exception that both Ω and φ can now be estimated; thus there will be an additional p(p+1)/2-dimensional component to all the scores.

Although no distributional assumption on the measurement errors is more reasonable than that of normality, it is still an unverifiable assumption unless replicate measurements are made. The sufficiency, conditional and efficient scores lose their unbiasedness when the assumption of normal errors is erroneous. Thus when measurement error is nonnormal, estimates derived from these scores will generally be biased, and the bias will generally not be computable. Approximations to the bias can probably be obtained using the small-measurement-error asymptotics employed by Stefanski (1985), although we have not attempted these calculations.
The work of R. J. Carroll was supported by the U.S. Air Force Office
of Scientific Research.
References

Adcock, R. J. (1878). A problem in least squares. The Analyst 5, 53-4.

Anderson, T. W. (1976). Estimation of linear functional relationships (with discussion). J. Roy. Statist. Soc. Ser. B 38, 1-36.

Begun, J. M., Hall, W. J., Huang, W. M. & Wellner, J. A. (1983). Information and asymptotic efficiency in parametric-nonparametric models. Ann. Statist. 11, 432-52.

Bickel, P. J. & Ritov, Y. (1986). Efficient estimation in the errors-in-variables model. Ann. Statist., to appear.

Carroll, R. J., Spiegelman, C. H., Lan, K. K., Bailey, K. T. & Abbott, R. D. (1984). On errors-in-variables for binary regression models. Biometrika 71, 19-25.

Efron, B. (1975). The efficiency of logistic regression compared to normal discriminant analysis. J. Amer. Statist. Assoc. 70, 892-98.

Gleser, L. J. (1981). Estimation in a multivariate 'errors-in-variables' regression model: large sample results. Ann. Statist. 9, 24-44.

Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1, Ed. L. M. LeCam & J. Neyman, 221-33. Berkeley: University of California Press.

Kendall, M. G. & Stuart, A. (1979). The Advanced Theory of Statistics, 2. London: Griffin.

Lindsay, B. G. (1980). Nuisance parameters, mixture models, and the efficiency of partial likelihood estimators. Philos. Trans. Roy. Soc. London Ser. A 296, 639-65.

Lindsay, B. G. (1982). Conditional score functions: some optimality results. Biometrika 69, 503-12.

Lindsay, B. G. (1983). Efficiency of the conditional score in a mixture setting. Ann. Statist. 11, 486-97.

McCullagh, P. & Nelder, J. A. (1983). Generalized Linear Models. London: Chapman and Hall.

Michalik, J. E. & Tripathi, R. C. (1980). The effect of errors in diagnosis and measurement on the estimation of the probability of an event. J. Amer. Statist. Assoc. 75, 713-21.

Moran, P. (1971). Estimating structural and functional relationships. J. Multivariate Anal. 1, 232-55.

Pfanzagl, J. (1982). Contributions to a General Asymptotic Statistical Theory. New York: Springer-Verlag.

Prentice, R. L. (1982). Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69, 331-42.

Stefanski, L. A. (1985). The effects of measurement error on parameter estimation. Biometrika, to appear.

Stefanski, L. A. & Carroll, R. J. (1986). Covariate measurement error in logistic regression. Ann. Statist., to appear.

Wolter, K. M. & Fuller, W. A. (1982a). Estimation of nonlinear errors-in-variables models. Ann. Statist. 10, 539-48.

Wolter, K. M. & Fuller, W. A. (1982b). Estimation of the quadratic errors-in-variables model. Biometrika 69, 175-82.