VARIANCE FUNCTION ESTIMATION IN
HETEROSCEDASTIC REGRESSION MODELS
by
Marie Davidian
A dissertation submitted to the faculty of the University of North Carolina at Chapel Hill in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Statistics
Chapel Hill
1986
MARIE DAVIDIAN. Variance Function Estimation in Heteroscedastic Regression Models. (Under the direction of RAYMOND J. CARROLL.)
Heteroscedastic regression models are used to analyze data in a variety of fields, including economics, engineering and the biological and physical sciences. Often, the heteroscedasticity is modeled as a function of the regression and other structural parameters. Standard asymptotic theory implies that under reasonable conditions, how one estimates the structural parameters, in particular the variance function, has no effect on the first order properties of the estimates of the regression parameters; however, it has been noted in practice that how one estimates the variance function does matter. Furthermore, in some settings, estimation of the variance function is of independent interest or plays an important role in the properties of estimates of other quantities besides the regression parameters.
We develop a general theory for variance function estimation in regression which includes most methods in common use. In particular, we focus on estimation of the structural parameters. In our development, we note that most variance function estimation procedures can be looked upon as regressions with "responses" being transformations of absolute residuals from a preliminary fit or sample standard deviations from replicates at a design point. Our theory allows us to conclude that, typically, using residuals is more efficient than using sample standard deviations, but not uniformly so. For variance function estimates based on transformations of absolute residuals, efficiency is a monotone function of the efficiency of the preliminary fit when the errors are symmetric, so that one should iterate so that the residuals are based on generalized least squares.
Robustness issues are of even more importance for variance function
estimation than for estimation of a regression function.
To illustrate the implications of these results and to show how variance function estimates play a role in the estimation of other important quantities, we focus on the analysis of assay data, which are often fit by a nonlinear heteroscedastic regression model. An additional component of assay analysis is the estimation of auxiliary constructs such as the minimum detectable concentration, for which many definitions exist. We consider one such definition and show how the properties of the standard estimate for minimum detectable concentration are dependent to first order on how one estimates the structural variance parameters. Simulation results and an example support the asymptotic theory.
ACKNOWLEDGEMENTS
While I could probably go on for several pages to thank individually each of the kind souls without whom this dissertation would have never been completed, I would like especially to acknowledge those people whose help and understanding were of the highest order of magnitude.
My most heartfelt appreciation goes to my advisor, Ray Carroll.
His vast expertise, availability for help and guidance, boundless enthusiasm (sometimes to excess) and, most importantly, his patience were instrumental in my completion of my degree.
I cannot do justice with mere words to the depth of my gratitude to my roommate, compatriot, partner in misery (and sometimes crime) and friend, Stena Kettl. This makes two degrees we've been through together, not to mention all the associated pain, disgust and self-doubt; as fun as it's been, though, I don't think I'm up for it again. See, Stena, I didn't even mention the Virginian (or Papagayo's, or Molly Maguire's, or our living room floor, or ... ).
The understanding and support of my friends has been far above and beyond the call of duty; their phone calls, visits and patient tolerance of my sometimes hysterical ravings have been a lifeline for me throughout my stay here. True "best" friends are rare, and I consider myself blessed that I have two -- Joyce Maxwell and Ellen Bierschenk. I cannot even begin to express how important their friendship and love have been these past four years.
If it had not been for the support and love of Jed Frees, I think
I'd be aiming for a tenure-track position at a different kind of state
institution than the kind that awards degrees.
His encouragement and
understanding were nothing short of legendary.
Along the way, I had the opportunity to meet people whose friendship I will continue to cherish long after I've left Chapel Hill. The mere existence of June Maxwell, Courtney Stark, Doug Vass and Jane Wille made the day-to-day travail a bit more bearable, maybe even enjoyable. The importance of their patient endurance of my ongoing tirade on the concept of doom cannot be emphasized enough.
The appreciation and gratitude I have for the love, encouragement and tolerance of my family during the past four years is simply inexpressible. To my mother and brother, thank you for believing in me, and, more importantly, putting up with me. To my wonderful father, who did not get the opportunity to actually see me complete my degree: even though you didn't say it often, thank you for your love and encouragement; knowing you were up there rooting for me left me no option but to finish my degree. I dedicate this dissertation to you.
This research was supported by the Air Force Office of Scientific
Research AFOSR-F-49620-85-C-0144.
TABLE OF CONTENTS

CHAPTER I     INTRODUCTION
    1.0   Introduction and overview
    1.1   Model and basic assumptions
    1.2   Estimation in heteroscedastic regression models - background
    1.3   Estimation of θ
          1.3.1   Regression methods
          1.3.2   Other methods
    1.4   Summary
    1.5   Characterization of restricted maximum likelihood

CHAPTER II    AN ASYMPTOTIC THEORY OF VARIANCE FUNCTION ESTIMATION
    2.0   Introduction
    2.1   Methods based on functions of absolute residuals
    2.2   Methods based on sample standard deviations
    2.3   Methods not depending on the regression function
    2.4   Small σ asymptotics
    2.5   Examples and further results
          2.5.1   Pseudo-likelihood, restricted maximum likelihood and weighted squared residuals
          2.5.2   Logarithms of absolute residuals
          2.5.3   Absolute residuals
    2.6   Proofs of major results

CHAPTER III   OTHER ESTIMATORS
    3.0   Introduction
    3.1   Maximum likelihood
    3.2   Extended quasi-likelihood
    3.3   Modified maximum likelihood

CHAPTER IV    COMPARISON AND DISCUSSION
    4.0   Introduction
    4.1   Comparison of methods based on residuals
    4.2   Methods based on sample standard deviations
    4.3   Discussion and conclusions
    Tables 4.1 - 4.3

CHAPTER V     APPLICATION - THE ROLE OF θ IN THE PROPERTIES OF THE
              ESTIMATOR FOR MINIMUM DETECTABLE CONCENTRATION
    5.0   Introduction
    5.1   Analysis of assay data and the minimum detectable concentration
    5.2   Asymptotic theory
    5.3   A simulation
    5.4   An example
    5.5   Summary
    5.6   Proofs
    Tables 5.1 - 5.7

REFERENCES
CHAPTER I
INTRODUCTION
1.0  Introduction and overview
Heteroscedastic regression models are accepted as appropriate in a wide variety of fields, including radioimmunoassay (Rodbard and Frazier (1975), Finney (1976), Raab (1981)), econometrics (Hildreth and Houck (1968), Amemiya (1977)), pharmacokinetic modeling (Bates, Wolf and Watts (1985)), enzyme kinetics (Haaland, et al. (1986)) and chemical kinetics (Pritchard, Downie and Bacon (1977)), among others. In such settings, the mean response is modeled as a possibly nonlinear function of known explanatory variables and unknown regression parameters. The heteroscedasticity may be regarded as of unknown form or may be modeled as a function of the explanatory variables, known constants exogenous to the model and the regression parameters. This function may be completely known, specified up to additional unknown parameters or completely unknown. The usual goal is to obtain estimates of the regression parameters in order to investigate the character of the mean response.
Due to efficiency considerations, estimation methods for the regression parameters incorporate a method for estimation of the variances; the variance estimates also allow for better understanding of the variability in the data. As will be discussed shortly, it has become increasingly apparent that, despite the implications of standard asymptotic theory, the better one's estimates of the variances, the better one's estimates of the regression parameters will be.
Furthermore, in some applications, estimation of the variances is of independent interest or plays an important theoretical role in estimation of quantities other than the regression parameters. Thus, while much effort has been focused on the study of properties of estimators for the regression parameters, it is of some interest to investigate the properties of estimators for variance as well.

We shall be interested, then, in investigating and comparing the properties of different methods for estimating variances in heteroscedastic regression models. In particular, we focus on settings for which variance is modeled as a known function of explanatory and exogenous variables, the regression parameters and additional unknown structural parameters; our analysis reduces to determining the properties of various estimators for these additional variance function parameters. As will become clear shortly, many such estimators, while motivated by different considerations, have similar formulations; we exploit this fact to pursue a simple and concise unified theory for variance function estimators from which specific estimators arise as special cases. This theory makes general observations and comparisons particularly straightforward.
1.1  Model and basic assumptions
Consider a general possibly nonlinear heteroscedastic regression model for observable data Y given by

(1.1)    E Y_i = μ_i = f(x_i,β);    var Y_i = σ_i²;    i = 1, ..., N.

Here, {x_i (k x 1)} are the design vectors, β (p x 1) is the regression parameter, f is the possibly nonlinear mean response function, and the {σ_i} express the heteroscedasticity; N is the total sample size.

The model (1.1) is quite general.
When replicate observations are available at each design point, one might choose to regard the form of the {σ_i} as entirely unknown and use the replicate observations to estimate the {σ_i}; see Jacquez, Mather and Crawford (1968), Jacquez and Norusis (1973), Fuller and Rao (1978) and the discussion of Section 1.2. Often, however, a model to describe the heterogeneity may be suggested by the data, application or convention. Indeed, if no replication is available the above approach is not feasible and some additional assumption must be made.
One common situation in practice is that in which σ_i² increases as a function of explanatory or exogenous variables or the mean response. As a result, much recent interest has focused on modeling the variance as a function of these quantities; see Box and Hill (1974), Rodbard and Frazier (1975), Finney (1976), Raab (1981) and Carroll and Ruppert (1982a) among many others. A general parametric model for the variance which subsumes all of the above can be written as

(1.2)    σ_i² = σ² g²(z_i,β,θ),

where σ is an unknown scale parameter, the variance function g expresses the heteroscedasticity, {z_i (ℓ x 1)} are known vectors, possibly containing the {x_i}, and θ (r x 1) is an unknown parameter. In practice as well as for theoretical investigations, g is taken to satisfy appropriate smoothness conditions.
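For concreteness, the following minimal sketch (added here in Python; all names are hypothetical) simulates data from (1.1) under the parametric variance model (1.2) with the common "power of the mean" choice g(z_i,β,θ) = f(x_i,β)^θ:

    # A minimal sketch, assuming a two-parameter exponential mean response and
    # the "power of the mean" variance function g(x, beta, theta) = f(x, beta)**theta.
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x, beta):
        # hypothetical nonlinear mean response f(x, beta)
        return beta[0] * np.exp(beta[1] * x)

    def g(x, beta, theta):
        # variance function of (1.2); here z_i = x_i
        return f(x, beta) ** theta

    N = 100
    x = np.linspace(0.1, 2.0, N)
    beta_true, theta_true, sigma = np.array([2.0, 0.8]), 0.75, 0.05
    # model (1.1)-(1.2): Y_i = f(x_i, beta) + sigma * g(x_i, beta, theta) * eps_i
    y = f(x, beta_true) + sigma * g(x, beta_true, theta_true) * rng.standard_normal(N)

The later sketches reuse these names.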
In a model such as (1.2), estimation of the variances essentially reduces to estimation of θ, since β will be estimated routinely and the final estimates of β and θ may be used to obtain a final estimate of σ. Thus, our investigation of properties of variance estimators for (1.2) focuses on properties of estimators for θ.
Another approach to modeling variances as a function of the above-mentioned quantities is to assume that

    σ_i² = h(z_i)  or  h(μ_i),

where h is an unknown but "smooth" function. The smoothness assumption separates this approach from that in which the {σ_i} are completely unknown and invites estimation schemes from the realm of nonparametric function estimation; see Carroll (1982) and Matloff, Rose and Tai (1985). We do not consider this model here as we restrict our investigation to properties for parametric models.
1.2  Estimation in heteroscedastic regression models - background
Before we consider methods for variance estimation in (1.1) when (1.2) holds, we briefly review the literature for estimation in heteroscedastic regression, as the well known results provide more concrete justification for our endeavor. We discuss particular methods for estimation of θ in the next section.
Much of the literature focuses on the linear model μ_i = x_iᵗβ; however, with sufficient smoothness conditions on f we may consider most of the results for the linear model applicable for (1.1).
In what follows, when we consider the possibility of replication at each design point, we write in obvious fashion {Y_ij}, j = 1, ..., m_i, to denote the m_i ≥ 2 observations at x_i and let N = Σ_{i=1}^M m_i be the total sample size, where M is the number of design points. In the discussion below, as N → ∞, consider the {m_i} as fixed.
When (1.1) holds and the {σ_i} are known, the Gauss-Markov theorem shows that, under regularity conditions as N → ∞,

    N^{1/2}(β̂_LS − β) →_L H(0, Σ_LS),    N^{1/2}(β̂_WLS − β) →_L H(0, Σ_WLS),    Σ_WLS ≤ Σ_LS

in the sense of nonnegative definiteness, where β̂_LS and β̂_WLS are the ordinary and weighted least squares estimators, respectively, and H(a,B) is the appropriate multivariate normal distribution with mean a and covariance matrix B.
When the {σ_i} are unknown, a natural attempt to capitalize on this result is to replace the {σ_i} by estimates {σ̂_i} and perform weighted least squares to obtain the generalized least squares estimator β̂_GLS. In this popular approach, the only difference between various estimation schemes is the choice of the {σ̂_i}.

Early suggestions for choosing the {σ̂_i} require replication. Jacquez, Mather and Crawford (1968) and Jacquez and Norusis (1973) study β̂_GLS empirically when

(1.3)    σ̂_i² = (m_i − 1)^{-1} Σ_{j=1}^{m_i} (Y_ij − Ȳ_i·)²,    Ȳ_i· = m_i^{-1} Σ_{j=1}^{m_i} Y_ij.

They find that using these weights can be disastrous in the estimation of β, particularly if the {m_i} are small.
Fuller and Rao (1978) exploit the form of the mean response by choosing

(1.4)    σ̂_i² = m_i^{-1} Σ_{j=1}^{m_i} { Y_ij − f(x_i,β̂_LS) }².

They show that when the data are normally distributed, β̂_GLS computed under (1.4) satisfies a central limit theorem whose asymptotic covariance matrix in general differs from Σ_WLS. As we discuss shortly, it is possible to construct estimators for the {σ_i} such that β̂_GLS is asymptotically equivalent to β̂_WLS, so in an asymptotic sense it is possible to improve on (1.3) and (1.4).
The difficulty with these naive forms of nonparametric estimation of the {σ_i} is that the estimated variances can be widely different even when the {σ_i} are close, especially if the {m_i} are small. The above results thus suggest that poor, "nonsmooth" estimation of the variances can lead to poor estimation of β.

Because of results such as the above as well as considerations such as those described in Section 1.1, more recent work has focused on making assumptions about the form of the {σ_i}. Most often, the approach is parametric as in (1.2). Examples are plentiful; a very abbreviated list consists of the references in Sections 1.0 and 1.1 as well as Jobson and Fuller (1980) and Nelder and Pregibon (1986) among a host of others. Other nonparametric approaches not to be considered here are discussed, for example, by Carroll (1982).
When (1.2) holds, clearly one only need estimate g(z_i,β,θ) to obtain β̂_GLS. The most common method for estimating β is that in which one obtains estimates of g(z_i,β,θ) by using an estimate of θ and a preliminary estimate of β and then performs weighted least squares; see, for example, Carroll and Ruppert (1982a) and Box and Hill (1974). We will henceforth use the term generalized least squares to refer exclusively to an estimator of this type. The following result is proved precisely by Jobson and Fuller (1980) and Carroll and Ruppert (1982a).
Theorem A. Suppose that (1.2) holds and that θ̂ and β* are estimators for θ and β such that N^{1/2}(θ̂ − θ) = O_p(1) and N^{1/2}(β* − β) = O_p(1). Then, under regularity conditions as N → ∞, the generalized least squares estimator computed with σ̂_i = g(z_i,β*,θ̂) satisfies

    N^{1/2}(β̂_GLS − β) →_L H(0, Σ_WLS);

that is, β̂_GLS is asymptotically equivalent to β̂_WLS.
This result shows that while we can improve on (1.3) and (1.4) under the assumptions that (1.2) holds and β* and θ̂ are "good" estimates, standard asymptotic theory is not very helpful for deciding which estimates β* and θ̂ to use.
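To make the generalized least squares scheme underlying Theorem A concrete, a minimal sketch follows (assuming f, g, x and y from the sketch of Section 1.1; the starting values and the fixed value for θ̂ are illustrative assumptions, not recommendations):

    # A minimal sketch of generalized least squares with estimated weights.
    import numpy as np
    from scipy.optimize import least_squares

    def fit_wls(x, y, weights, beta_start):
        # weighted nonlinear least squares: minimize sum_i w_i {y_i - f(x_i, beta)}^2
        resid = lambda b: np.sqrt(weights) * (y - f(x, b))
        return least_squares(resid, beta_start).x

    beta_hat = fit_wls(x, y, np.ones_like(y), np.array([1.0, 1.0]))  # unweighted start
    theta_hat = 0.75  # stand-in for a N^{1/2}-consistent estimate of theta

    for _ in range(2):
        w = 1.0 / g(x, beta_hat, theta_hat) ** 2  # estimated weights 1/g^2(z_i, beta*, theta-hat)
        beta_hat = fit_wls(x, y, w, beta_hat)     # updated generalized least squares estimate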
Despite standard asymptotics as in Theorem A, there is evidence for finite samples to suggest that the choices of θ̂ and β* do matter. It has become increasingly apparent that the better one's estimate of θ, the better one's final estimate of β will be. Williams (1975) states that "both analytic and empirical studies ... indicate that ... ordering of efficiency (of estimates of β) ... in small samples is in accordance with the ordering by efficiency (of estimates of θ)."
Rothenberg (1984) shows via second order calculations that in case g does not depend on β, when the data are normally distributed the covariance matrix of the generalized least squares estimator of β is an increasing function of the covariance matrix of the estimator of θ; see also Carroll and Ruppert (1985). Carroll and Ruppert (1982a,b) and Carroll, Ruppert and Wu (1986) cite evidence to suggest that standard asymptotic theory for β can be highly optimistic.
Carroll, Ruppert and Wu (1986) consider iteration of the generalized least squares method by letting β* = β̂_GLS after estimation by an initial inefficient preliminary estimator for β such as β̂_LS, and repeating the process C − 1 more times, the aim being to reduce the effect of the initial estimator. Their results suggest that the choice of θ̂ and β* will play a role in determining the optimal number of cycles C. A Monte-Carlo study of Goldfeld and Quandt (1972, pages 96-120) shows that it is possible to construct a badly inefficient generalized least squares estimator as well as a quite efficient one.
Second order asymptotics provide only part of the justification for studying the properties of variance function estimators. Estimation of θ is of independent interest and has important consequences in many settings. In some applications, estimation of β is not the only problem of interest.
In chemical and biological assay problems, for example, issues of prediction and calibration arise. In such problems, the estimator of θ plays a central role; in radioimmunoassay the statistical properties of prediction intervals and constructs such as the minimum detectable concentration are highly dependent on how one estimates θ. We discuss this application and exhibit this result precisely in Chapter 5.
In engineering applications an important goal is to estimate the error made in predicting a new observation; this can be obtained directly from the variance function estimate. In off-line quality control, the emphasis is not only on the mean response but also on its variability; Box and Meyer (1986) state that "one distinctive feature of Japanese quality control improvement techniques is the use of statistical experimental design to study the effect of a number of factors on variance as well as the mean;" see also Taguchi and Wu (1980). Effective estimation of the variance function could play a major role in this application.
The above discussion suggests that there are numerous practical situations in which choice of the method for estimating the {σ_i} will be crucial. In the case of our model (1.2), this choice is defined by how we choose to estimate the variance function g, and in particular, θ. Our brief review indicates that the parameter θ can play an important part in a statistical analysis far beyond that of a nuisance parameter.

Variance function estimation may be thought of as a type of regression problem in which we try to understand variance as a function of known or estimable quantities, where θ is a "regression parameter." Many of the methods for estimation of θ that have been proposed in the literature are (possibly weighted) regression methods based on functions of either absolute residuals from the current regression fit or, in the case of replication at each design point, sample standard deviations based on (1.3). Still other methods are joint estimation methods in which (σ,β,θ) are in principle estimated simultaneously under assumptions about the underlying distributions. In Section 1.3 we describe specific approaches to estimation of θ.
1.3  Estimation of θ
We now discuss the form and motivation for several estimators of θ in (1.1) and (1.2). We confine our attention to methods which are simple or are in common use; in particular, we do not discuss the robust methods of Carroll and Ruppert (1982a) or Giltinan, Carroll and Ruppert (1986). In what follows, let β* be a preliminary estimator for β. This could be unweighted least squares or the current estimate in an iterative reweighted least squares calculation. Denote the residuals by

    r_i = Y_i − f(x_i,β*)

and define the errors {ε_i} by

    ε_i = { Y_i − f(x_i,β) } / { σ g(z_i,β,θ) }.
1.3.1  Regression methods
Pseudo-likelihood. Given β*, the pseudo-likelihood estimator for θ maximizes the normal log-likelihood ℓ(β*,θ,σ), where

(1.5)    ℓ(β,θ,σ) = −N log σ − Σ_{i=1}^N log g(z_i,β,θ) − (2σ²)^{-1} Σ_{i=1}^N { Y_i − f(x_i,β) }² / g²(z_i,β,θ);

see Carroll and Ruppert (1982a). The terminology is borrowed from Gong and Samaniego (1981). While this method does not appear in this form to be based on a regression using absolute residuals, examination of the estimating equations for θ and σ based on (1.5) shows that they have the form of equations for weighted regression. Generalizations of pseudo-likelihood have been studied by Carroll and Ruppert (1982a) and Giltinan, Carroll and Ruppert (1986).
Least squares on squared residuals. Besides pseudo-likelihood, other methods using squared residuals have been proposed. The motivation for these methods is that the squared residuals have approximate expectation σ² g²(z_i,β,θ); see Jobson and Fuller (1980) and Amemiya (1977). This suggests a nonlinear regression problem in which the "responses" are the {r_i²} and the "regression function" is σ² g²(z_i,β*,θ). The estimator θ̂_SR minimizes in σ and θ

    Σ_{i=1}^N { r_i² − σ² g²(z_i,β*,θ) }².
For normal data, the squared residuals have approximate variance 2σ⁴ g⁴(z_i,β,θ); in the spirit of generalized least squares, this suggests the weighted estimator which minimizes in θ and σ

(1.6)    Σ_{i=1}^N { r_i² − σ² g²(z_i,β*,θ) }² / g⁴(z_i,β*,θ*),

where θ* is a preliminary estimator for θ, θ̂_SR, for example. Full iteration, when it converges, would be equivalent to pseudo-likelihood.
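A minimal sketch of minimizing the weighted criterion (1.6) follows (assuming the names of the earlier sketches, with β* = beta_hat and θ* = theta_hat; the optimizer and starting values are illustrative):

    # A minimal sketch of (1.6): weighted least squares on squared residuals.
    import numpy as np
    from scipy.optimize import minimize

    r2 = (y - f(x, beta_hat)) ** 2                 # squared residuals r_i^2

    def criterion(pars):
        sigma, theta = pars
        fitted = sigma ** 2 * g(x, beta_hat, theta) ** 2   # sigma^2 g^2(z_i, beta*, theta)
        weights = g(x, beta_hat, theta_hat) ** 4           # g^4(z_i, beta*, theta*)
        return np.sum((r2 - fitted) ** 2 / weights)

    sigma_sr, theta_sr = minimize(criterion, x0=[0.1, 0.5]).x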
Accounting for the effect of leverage. One objection to methods such as pseudo-likelihood and least squares based on squared residuals is that no compensation is made for the loss of degrees of freedom associated with preliminary estimation of β. For example, the effect of applying pseudo-likelihood directly seems to be a bias depending on p/N. For settings such as fractional factorials where p is large relative to N, this bias could be substantial.

Bayesian ideas have been used to account for loss of degrees of freedom; see Harville (1977) and Patterson and Thompson (1974). When g does not depend on β, the restricted maximum likelihood approach of the latter authors suggests in our setting that one estimate θ from the mode of the marginal posterior density for θ, assuming normal data and a prior for the parameters proportional to σ^{-1}. When g depends on β, one may extend the Bayesian arguments and use a linear approximation as in Box and Hill (1974) and Beal and Sheiner (1986) to define a restricted maximum likelihood estimator.
Let Q be the N x p matrix with ith row f_β(x_i,β)ᵗ / g(z_i,β,θ), where f_β(x_i,β) = ∂/∂β {f(x_i,β)}, and let H = Q(QᵗQ)^{-1}Qᵗ be the "hat" matrix with diagonal elements h_ii = h_ii(β,θ); the values {h_ii} are the leverage values. See Cook and Weisberg (1982) for a discussion of leverage. It turns out that the restricted maximum likelihood estimator is equivalent to an estimator obtained by modifying pseudo-likelihood to account for the effect of leverage; this characterization, while not unexpected, is new. We exhibit the derivation of this estimator and its equivalence to a modification of pseudo-likelihood in Section 1.5.
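The leverage values are directly computable; a minimal sketch follows (assuming the earlier sketches, with a finite-difference approximation standing in for f_β):

    # A minimal sketch of the hat matrix H = Q (Q'Q)^{-1} Q', where Q has ith row
    # f_beta(x_i, beta)' / g(z_i, beta, theta); assumes f, g from earlier sketches.
    import numpy as np

    def leverages(x, beta, theta, eps=1e-6):
        p = len(beta)
        fb = np.column_stack([                      # finite-difference f_beta(x_i, beta)
            (f(x, beta + eps * np.eye(p)[j]) - f(x, beta - eps * np.eye(p)[j])) / (2 * eps)
            for j in range(p)
        ])
        Q = fb / g(x, beta, theta)[:, None]
        H = Q @ np.linalg.solve(Q.T @ Q, Q.T)
        return np.diag(H)                           # the leverage values h_ii

    h = leverages(x, beta_hat, theta_hat)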
The least squares approach using squared residuals can also be modified to account for the effect of leverage. Jobson and Fuller (1980), for example, essentially note that for nearly normally distributed data we have the approximations

    E r_i² ≈ σ² (1 − h_ii) g²(z_i,β,θ),    var r_i² ≈ 2σ⁴ (1 − h_ii)² g⁴(z_i,β,θ).

To exploit these approximations, modify (1.6) to minimize in θ and σ

(1.7)    Σ_{i=1}^N { r_i² − σ² (1 − ĥ_ii) g²(z_i,β*,θ) }² / { (1 − ĥ_ii)² g⁴(z_i,β*,θ*) },

where ĥ_ii = h_ii(β*,θ*). An asymptotically equivalent variation of this estimator, in which one sets the derivatives of (1.7) with respect to θ and σ equal to zero and then replaces θ* by θ̂, can easily be seen to be equivalent to pseudo-likelihood in which one replaces standardized residuals by studentized residuals. While this estimator also takes into account the effect of leverage, it is different from restricted maximum likelihood.
Least squares on absolute residuals. Squared residuals are skewed and long-tailed, which has led many authors to propose using absolute residuals to estimate θ; see Glejser (1969) and Theil (1971). Assume that

    E|Y_i − f(x_i,β)| = η g(z_i,β,θ),

which is satisfied if the errors {ε_i} are independent and identically distributed. Mimicking the least squares approach based on squared residuals, one obtains the estimator θ̂_AR by minimizing in η and θ

    Σ_{i=1}^N { |r_i| − η g(z_i,β*,θ) }².

In analogy to (1.6), the weighted version is obtained by minimizing

    Σ_{i=1}^N { |r_i| − η g(z_i,β*,θ) }² / g²(z_i,β*,θ*),

where θ* is a preliminary estimator for θ, probably θ̂_AR. As for least squares estimation based on squared residuals, one could presumably modify this approach to account for the effect of leverage.
Logarithm method. The suggestion of Harvey (1976) is to exploit the fact that the logarithm of the absolute residuals has approximate expectation log{σ g(z_i,β,θ)}. Estimate θ by ordinary least squares regression of log|r_i| on log{σ g(z_i,β*,θ)}, since if the errors are independent and identically distributed, the regression should be approximately homoscedastic. If one of the residuals is near zero the regression could be adversely affected by a large "outlier," hence in practice one might wish to delete a few of the smallest absolute residuals, perhaps trimming the smallest few percent, for example.
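A minimal sketch of the logarithm method follows, specialized to the "power of the mean" model discussed immediately below (assuming the earlier sketches; the trimming fraction is an illustrative choice):

    # A minimal sketch of Harvey's logarithm method for g = f(x, beta)**theta:
    # ordinary least squares of log|r_i| on log f(x_i, beta*).
    import numpy as np

    r = y - f(x, beta_hat)                                  # residuals from preliminary fit
    keep = np.abs(r) > np.quantile(np.abs(r), 0.05)         # trim the smallest few percent
    X = np.column_stack([np.ones(keep.sum()), np.log(f(x[keep], beta_hat))])
    coef, *_ = np.linalg.lstsq(X, np.log(np.abs(r[keep])), rcond=None)
    theta_log = coef[1]                                     # slope estimate of theta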
The logarithm method invites further interpretation for θ when, for example, g can be written

(1.8)    g(z_i,β,θ) = g(μ_i,z_i,θ);

if we have the "power of the mean" model g(μ_i,z_i,θ) = μ_i^θ, the natural interpretation of θ is as the slope parameter.
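To see the slope interpretation explicitly, note that (a short added derivation, with the errors identically distributed so that E log|ε_i| is a constant)

    log|r_i| ≈ log{ σ g(μ_i,z_i,θ) } + log|ε_i| = log σ + θ log μ_i + log|ε_i|,

so that in the regression of log|r_i| on log μ_i the coefficient of log μ_i is θ.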
1.3.2  Other methods
Maximum likelihood. In a parametric model such as (1.1) and (1.2), joint maximum likelihood estimation is possible, where we use the term maximum likelihood to mean normal theory maximum likelihood. When the variance function does not depend on β, it can be easily shown that the maximum likelihood estimator of θ is asymptotically equivalent to weighted least squares methods based on squared residuals; in the important situation in which the variance function depends on β, as in (1.8), this is not the case. These results will be exhibited in Chapter 3. Jobson and Fuller (1980) have in fact shown that under normality the maximum likelihood estimator of β has asymptotic covariance matrix at least as small as that of any generalized least squares estimator of Theorem A. However, it has been observed by Carroll and Ruppert (1982b) and McCullagh (1983) that while maximum likelihood estimators enjoy asymptotic optimality when the model and distributional assumptions are exactly correct, the maximum likelihood estimator of β can suffer severe problems under departures from these assumptions. This suggests that joint maximum likelihood estimation should not be applied blindly, although we consider the properties of the maximum likelihood estimator of θ for comparative purposes because of optimality when the assumptions are correct.
Extended quasi-likelihood. When θ is known and the variance function has form (1.8), quasi-likelihood estimation of β is a form of generalized least squares which is iterated so that β* is β̂; see Wedderburn (1974), McCullagh (1983) and McCullagh and Nelder (1983). The extended quasi-likelihood method of Nelder and Pregibon (1986) is a joint estimation scheme which attempts to extend the notion of quasi-likelihood to include estimation of θ assuming (1.8). The method is based on the assumption that the data arise from a class of distributions depending on θ and involves estimation of θ by minimizing in β, θ and σ the "extended quasi-likelihood"

(1.9)    Q⁺(β,θ,σ) = Σ_{i=1}^N [ log{ σ g(Y_i,z_i,θ) } − σ^{-2} Q(Y_i,μ_i,z_i) ],

where

    Q(y,μ,z) = ∫_y^μ (y − u) / g²(u,z,θ) du.

The major motivation for this method is that it includes an extended parametric family which "nearly" includes the gamma and Poisson distributions. For example, when g(μ_i,z_i,θ) = μ_i^θ and θ = 1/2, σ = 1, Q⁺ differs from the Poisson log-likelihood by replacing Y_i! by its Stirling approximation; for θ = 1, Q⁺ differs from the gamma log-likelihood by a factor depending on σ. For a related formulation, see Efron (1985).
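A minimal sketch of the building block Q(y,μ,z) of (1.9) follows, for the power-of-the-mean model and computed by numerical integration (assuming positive responses; names are illustrative):

    # A minimal sketch of Q(y, mu) = integral from y to mu of (y - u)/g^2(u, theta) du
    # for g(u, theta) = u**theta, and of the criterion (1.9).
    import numpy as np
    from scipy.integrate import quad

    def Q(y, mu, theta):
        val, _ = quad(lambda u: (y - u) / u ** (2 * theta), y, mu)
        return val

    def eql_criterion(y, mu, theta, sigma):
        # sum over i of log{sigma * g(Y_i, theta)} - Q(Y_i, mu_i)/sigma^2, as in (1.9)
        q = np.array([Q(yi, mi, theta) for yi, mi in zip(y, mu)])
        return np.sum(np.log(sigma * y ** theta) - q / sigma ** 2)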
Methods requiring replication. Methods requiring m_i ≥ 2 replicate observations at each x_i have been proposed in the assay literature; for simplicity, we consider only the case of equal replication, m_i = m for all i. These methods do not depend on the postulated form of the regression function; one reason that this might be advantageous is that in many assays, along with observed pairs {Y_ij, x_i}, there will also be pairs in which only Y_ij is observed.

A popular and widely used method in radioimmunoassay is that of Rodbard and Frazier (1975). If we assume (1.8), the method is identical to the logarithm method previously discussed, except that one replaces |r_i| by the sample standard deviation s_i and f(x_i,β*) in the "regression function" by the sample mean Ȳ_i·.
Under the assumption of independence and (1.8), the modified maximum likelihood method of Raab (1981) estimates θ by joint maximization of a "modified" normal likelihood, (1.10), in the (M + r + 1) parameters σ², θ, μ_1, ..., μ_M; the modification serves to make the estimator of σ² unbiased.
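A minimal sketch of the Rodbard and Frazier regression under the power-of-the-mean model follows (the replicate data layout yrep, with m replicates at each of M design points, is a hypothetical assumption):

    # A minimal sketch of the Rodbard-Frazier method: the slope of the ordinary
    # least squares regression of log s_i on log Ybar_i estimates theta.
    import numpy as np

    def rodbard_frazier_theta(yrep):
        ybar = yrep.mean(axis=1)                    # sample means Ybar_i
        s = yrep.std(axis=1, ddof=1)                # sample standard deviations s_i
        X = np.column_stack([np.ones(len(ybar)), np.log(ybar)])
        coef, *_ = np.linalg.lstsq(X, np.log(s), rcond=None)
        return coef[1]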
1.4  Summary
The discussion of Sections 1.2 and 1.3 suggests the need for a unified investigation of variance functions in a model such as (1.2), in particular, estimation of the structural parameter θ. Previous work in the literature tends to be scattered, in that it appears in the efforts of researchers from a variety of fields of application and it treats special cases of (1.2) as different models with their own estimation methods. Our intent is to study parametric variance function estimation in a unified way, so that we may make general observations about different methods as a class and comparisons among the methods of Section 1.3 under various conditions. The major insight which allows for a unified study is that the absolute residuals, or the sample standard deviations in the case of replication, are the basic building blocks for analysis.
In Chapter 2 we develop an asymptotic theory for a general class of estimators for θ whose construction encompasses the regression methods of Section 1.3.1. In Chapter 3 we investigate the properties of the estimators of Section 1.3.2 which do not fit into this general class. Chapter 4 contains comparisons of the theoretical properties of the estimators of Section 1.3 based on the results of Chapters 2 and 3 and a general discussion of the implications of our work. In Chapter 5 we discuss the notion of minimum detectable concentration from the field of radioimmunoassay and show precisely how the estimator of θ plays a central role in the estimation of this important quantity. Throughout, our presentation is brief and in many cases heuristic in order that general insights and results are not overshadowed by laborious technical details.
1.5  Characterization of restricted maximum likelihood
We now specify the form of the restricted maximum likelihood estimator for θ in (1.1) and (1.2) and show its equivalence to a modification of pseudo-likelihood. Let β* be a preliminary estimator for β and define

    r_i = Y_i − f(x_i,β*).

Assume first that g does not depend on β. Thus, writing in this case g_i(θ) to denote g at z_i, the likelihood is proportional to

    p(β,θ,σ) = { Π_{i=1}^N g_i²(θ) }^{-1/2} σ^{-N} exp[ −(2σ²)^{-1} Σ_{i=1}^N { Y_i − f(x_i,β) }² / g_i²(θ) ].

Let the prior distribution π(β,θ,σ) for the parameters be proportional to σ^{-1}. Then, by Bayes' theorem, the joint posterior density p(β,θ,σ|Y) is proportional to

(1.11)    p(β,θ,σ) π(β,θ,σ).
The marginal posterior for θ may be computed by integration of p(β,θ,σ|Y). For a nonlinear regression model the integral is hard to compute in closed form, so, following Box and Hill (1974) and Beal and Sheiner (1986), note that if β̂ is a generalized least squares estimator evaluated at the true θ, we have the linear approximation

    f(x_i,β) ≈ f(x_i,β̂) + f_β(x_i,β̂)ᵗ (β − β̂).

Replacing f(x_i,β) by its linear expansion in (1.11), one can compute the marginal posterior for θ exactly as proportional to

(1.12)    p(θ) = { Π_{i=1}^N g_i²(θ) }^{-1/2} / [ σ̂^{(N−p)}(θ) { Det S_G(θ) }^{1/2} ],

where σ̂(θ) is the corresponding scale estimate, N S_G(θ) = QᵗQ with Q as in Section 1.3.1, and Det A = determinant of A. If the variances depend on β, we extend the Bayesian arguments by replacing g_i(θ) by g(z_i,β*,θ); see Box and Hill (1974) and Beal and Sheiner (1986) for related discussion.
Let Ĥ be the hat matrix H evaluated at β*, and let ĥ_ii = h_ii(β*,θ). From (1.5), pseudo-likelihood solves in (θ,σ)

(1.13)    Σ_{i=1}^N [ r_i² / { σ² g²(z_i,β*,θ) } − 1 ] ν_θ(i,β*,θ) = 0,

where ν_θ(i,β,θ) = ∂/∂θ { log g(z_i,β,θ) }. From the discussion in Section 1.3 and the fact that Ĥ is idempotent, the left-hand side of (1.13) has approximate expectation

(1.14)    −Σ_{i=1}^N ĥ_ii ν_θ(i,β*,θ).

To modify pseudo-likelihood to account for loss of degrees of freedom, the suggestion is to equate the left-hand side of (1.13) to its approximate expectation; i.e., solve (1.13) with the right-hand side replaced by (1.14). We now show that this is equivalent to restricted maximum likelihood. From (1.13) and (1.14), the modified pseudo-likelihood estimator solves

(1.15)    Σ_{i=1}^N [ r_i² / { σ² g²(z_i,β*,θ) } − 1 + ĥ_ii ] ν_θ(i,β*,θ) = 0.

Taking logarithms in (1.12) and setting equal to zero the derivative with respect to θ yields (1.15) with the term Σ_{i=1}^N ĥ_ii ν_θ(i,β*,θ) on the left-hand side replaced by −(1/2) ∂/∂θ [log{Det S_G(θ)}]; thus, to show equivalence of the two estimators we wish to show that

(1.16)    (1/2) ∂/∂θ [log{Det S_G(θ)}] = −Σ_{i=1}^N ĥ_ii ν_θ(i,β*,θ).

From Nel (1980),

(1.17)    ∂/∂θ [log{Det S_G(θ)}] = tr[ S_G^{-1}(θ) ∂/∂θ {S_G(θ)} ].

Letting V = diag[[ ν_θ(i,β*,θ) ]] gives

    ∂/∂θ { Q̂ᵗQ̂ } = −2 Q̂ᵗVQ̂,

where Q̂ is Q of Section 1.3 evaluated at β*. Plugging this into (1.17) and using the fact that N S_G(θ) = Q̂ᵗQ̂ and the definition of Ĥ yields (1.16).
CHAPTER II
AN ASYMPTOTIC THEORY OF VARIANCE FUNCTION ESTIMATION
2.0  Introduction
In this chapter we construct an asymptotic theory for a general class of regression-type estimators for θ based on residuals from some regression fit. This formulation includes all the estimators of Section 1.3.1 as well as maximum likelihood. We comment briefly on the properties of the estimator of Rodbard and Frazier and Raab's functional maximum likelihood estimator in the course of our discussion as well. We also investigate the effect of replacing, in the methods of Section 1.3.1, absolute residuals by sample standard deviations and, when (1.8) holds, of replacing predicted values f(x_i,β*) by sample means Ȳ_i· in the "response function" part of the regression.
The technical assumptions necessary for investigating a nonlinear model such as (1.1) and (1.2) are detailed; see, for example, Jennrich (1969) and Carroll and Ruppert (1982a). Since our major interest lies in obtaining general insights, we do not explicitly state such technical assumptions, so that important results are not obscured by a complicated exposition. The nature of the necessary assumptions is apparent in the proofs of major results in Section 2.6. Assume throughout that the {ε_i} are independent random variables.

2.1  Methods based on functions of absolute residuals
Write d_i(β) = |Y_i − f(x_i,β)|. Let H₁ be a smooth function and define H_{2,i} by

    H_{2,i}(η,θ,β) = E[ H₁{d_i(β)} ],

where η is a scale parameter which is usually a function of σ only. If η*, θ* and β* are any estimators for η, θ and β, define η̂ and θ̂ to be the solutions of

(2.1)    N^{-1/2} Σ_{i=1}^N H_{4,i}(η,θ,β*) H_{3,i}(η*,θ*,β*)^{-1} [ H₁{d_i(β*)} − H_{2,i}(η,θ,β*) ] = 0,

where H_{3,i}(η,θ,β) is a smooth function and H_{4,i} is usually the partial derivative of H_{2,i} with respect to (η,θ). The class of estimators solving (2.1) includes, directly or in an asymptotically equivalent version, the estimators of Section 1.3.1; we exhibit this in Section 2.5. Note that for methods which account for the effect of leverage, H_{2,i}, H_{3,i} and H_{4,i} will depend on the {ĥ_ii}. It may be shown that in this case we need the additional assumption that if h̄ = max_{1≤i≤N} ĥ_ii, then N^{1/2} h̄ converges to zero. Such an assumption is typical in studies of robust regression estimators.
Theorem 2.1. Let η*, θ* and β* be N^{1/2}-consistent for estimating η, θ and β. Let H₁' be the derivative of H₁ and define

    C_i = (H_{4,i}/H_{3,i}) [ H₁{d_i(β)} − H_{2,i}(η,θ,β) ];
    B_{1,N} = N^{-1} Σ_{i=1}^N (H_{4,i}/H_{3,i}) [ ∂/∂(η,θ) {H_{2,i}(η,θ,β)} ]ᵗ;
    B_{2,N} = −N^{-1} Σ_{i=1}^N (H_{4,i}/H_{3,i}) [ ∂/∂β {H_{2,i}(η,θ,β)} ]ᵗ;
    B_{3,N} = −N^{-1} Σ_{i=1}^N (H_{4,i}/H_{3,i}) E[ H₁'{d_i(β)} sign(ε_i) ] f_β(x_i,β)ᵗ.

Then, under regularity conditions as N → ∞,

(2.2)    N^{1/2} [ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ = B_{1,N}^{-1} [ N^{-1/2} Σ_{i=1}^N C_i + (B_{2,N} + B_{3,N}) N^{1/2}(β* − β) ] + o_p(1).

We may immediately make some general observations about the estimator θ̂ solving (2.1). Note that if the variance function does not depend on β, then H_{2,i} does not depend on β and hence B_{2,N} = 0. For the estimators of Section 1.3.1, H₁'{d_i(β)} sign(ε_i) is an odd function of ε_i; thus, if the errors {ε_i} are symmetrically distributed, E[ H₁'{d_i(β)} sign(ε_i) ] = 0 and hence B_{3,N} = 0. The following result is then immediate.
26
Corollary 2.lla1.
Suppose that the variance function does not depend
on p and the errors are symmetrically distributed.
Then the asymptotic
distributions of the regression estimators of Section 1.3.1 do not
depend on the method used to obtain p*,
If both of these conditions do
not hold simultaneously. then the asymptotic distributions will depend
in general on the method of estimating p.
a
The implication is that in the situation for which the variance
function
does
symmetrically
not
depend
distributed.
on
for
p
and
large
the
data
sam~~e
are
sizes
approximately
the
preliminary
estimator for p will play little role in determining the properties of
9.
Note also from (2.2) that for weighted methods, the effect of the
preliminary estimator of 9 is asymptotically negligible regardless of
the underlying distributions.
The preliminary estimator β* will be in general the unweighted least squares estimator, a generalized least squares estimator or some robust estimator. See, for example, Huber (1981) and Giltinan, Carroll and Ruppert (1986) for examples of robust estimators for β. It is easily shown that for some vectors {v_{N,i}}, these estimators admit an asymptotic expansion of the form

(2.3)    N^{1/2}(β* − β) = N^{-1/2} Σ_{i=1}^N v_{N,i} ψ(ε_i) + o_p(1).

Here ψ is odd in the argument ε. In case the variance function depends on β, B_{2,N} ≠ 0 in general; however, if the errors are symmetrically distributed and β* has an expansion of form (2.3), then the two terms on the right-hand side of (2.2) are asymptotically independent. The following is then immediate.
Corollary 2.1(b). Suppose that the errors are symmetrically distributed and that β* has an asymptotic expansion of the form (2.3). Then for the estimators of Section 1.3.1, the asymptotic covariance matrix of θ̂ is a monotone nondecreasing function of the asymptotic covariance matrix of β*. □

By the Gauss-Markov theorem and the results of Jobson and Fuller (1980) and Carroll and Ruppert (1982a), the implication of Corollary 2.1(b) is that in general using unweighted least squares estimates of β will result in inefficient estimates of θ. This phenomenon is exhibited in small samples in the Monte Carlo study of Chapter 5.
The result suggests that if one has available as β* the unweighted least squares estimate, one ought to iterate the process of estimating θ -- use the current value β* to estimate θ from (2.1), use these β* and θ̂ to obtain an updated β* by generalized least squares, and repeat the process C − 1 more times. It is clear that the asymptotic distribution of θ̂ will be the same for C ≥ 2, with larger asymptotic covariance for C = 1, so in principle, asymptotically at least, one ought to iterate this process at least twice.
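A minimal sketch of this cycling follows (reusing fit_wls and the data of the Chapter 1 sketches; C = 2 cycles and all starting values are illustrative):

    # A minimal sketch of C cycles: beta* -> theta-hat -> updated beta* by
    # generalized least squares -> ... , as suggested above.
    import numpy as np
    from scipy.optimize import minimize

    def estimate_theta(beta_star, theta_star):
        # weighted squared-residual estimate of theta given current beta*, theta*
        r2 = (y - f(x, beta_star)) ** 2
        obj = lambda p: np.sum((r2 - p[0] ** 2 * g(x, beta_star, p[1]) ** 2) ** 2
                               / g(x, beta_star, theta_star) ** 4)
        return minimize(obj, x0=[0.1, theta_star]).x[1]

    beta_star = fit_wls(x, y, np.ones_like(y), np.array([1.0, 1.0]))  # unweighted start
    theta_hat = 0.5
    for _ in range(2):                                                # C = 2
        theta_hat = estimate_theta(beta_star, theta_hat)
        beta_star = fit_wls(x, y, 1.0 / g(x, beta_star, theta_hat) ** 2, beta_star)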
2.2  Methods based on sample standard deviations

Assume that at each of M design points we have m ≥ 2 replicate observations, so that N = Mm represents the total number of observations.
We may compute then the sample standard deviations {s_i}, where

    s_i² = (m − 1)^{-1} Σ_{j=1}^m (Y_ij − Ȳ_i·)².

Sample standard deviations have themselves been proposed as estimators of the variance, with disastrous results, as described in Chapter 1. When replication exists, however, practitioners feel comfortable with the notion that the {s_i} may be used as a basis for estimating variances; thus, one might reasonably seek to estimate θ by replacing d_i(β*) by s_i in the general equation (2.1).

The following result is almost immediate from the proof of Theorem 2.1 in Section 2.6.
Theorem 2.2. Here we let N → ∞ such that m remains fixed. If d_i(β*) is replaced by s_i in (2.1), then under the conditions of Theorem 2.1 the resulting estimator for θ satisfies (2.2) with B_{3,N} = 0 and the redefinitions

(2.4)    C_i = (H_{4,i}/H_{3,i}) { H₁(s_i) − H_{2,i} },    H_{2,i} = H_{2,i}(η,θ,β) = E[ H₁(s_i) ]. □
If the errors are symmetrically distributed, then from (2.2) and the result of Theorem 2.2 we have that whether one is better off using absolute residuals or sample standard deviations in the methods of Section 1.3.1 depends only on the differences between the expected values and variances of H₁{d_i(β)} and H₁(s_i). In Chapter 4 we exhibit such comparisons explicitly and show that absolute residuals can be preferred to sample standard deviations in situations of practical importance.
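For comparison with the residual-based sketches, a minimal sketch of the sample-standard-deviation analogue follows (again assuming a hypothetical replicate layout yrep of shape (M, m) and the power-of-the-mean model):

    # A minimal sketch: replace d_i(beta*) by s_i in a squared-scale regression,
    # with "regression function" sigma^2 * Ybar_i**(2*theta).
    import numpy as np
    from scipy.optimize import minimize

    ybar = yrep.mean(axis=1)
    s = yrep.std(axis=1, ddof=1)

    def sd_criterion(pars):
        sigma, theta = pars
        return np.sum((s ** 2 - sigma ** 2 * ybar ** (2 * theta)) ** 2)

    sigma_sd, theta_sd = minimize(sd_criterion, x0=[0.1, 0.5]).x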
2.3  Methods not depending on the regression function

We assume throughout this discussion that the variance function has form (1.8) and that m ≥ 2 replicates are available at each x_i.
From Section 1.3.1 we see that the "regression function" part of the estimating equation (2.1) depends on f(x_i,β*), so that in the general equation (2.1) H_{2,i}, H_{3,i} and H_{4,i} all depend on f(x_i,β*). In some settings, one may not wish to postulate a form for the μ_i for estimating θ; the method of Rodbard and Frazier (1975), for example, uses s_i in place of d_i(β*) as in Section 2.2 and replaces f(x_i,β*) by the sample mean Ȳ_i·. We now consider the effect of replacing predicted values by sample means for the general class (2.1).
The presence of the sample means in the variance function in (2.1) requires more complicated and restrictive assumptions than the usual large sample asymptotics applied heretofore. The method of Rodbard and Frazier, and more generally the general method (2.1) with sample means, are functional nonlinear errors in variables problems as studied by Wolter and Fuller (1982) and Stefanski and Carroll (1985). In our setting, the standard asymptotics for these problems correspond to letting σ go to zero at rate N^{-1/2}. In Section 2.4 we discuss the practical implications of assuming σ to be small; for now, we state the following result.
Theorem 2.3. Redefine H_{2,i}, H_{3,i} and H_{4,i} as in Theorems 2.1 and 2.2 and adopt the assumptions of those theorems. Further, suppose that as N → ∞, σ → 0 simultaneously, and that:

(i) N^{1/2} σ → A, 0 ≤ A < ∞;

(ii) N^{-1/2} Σ_{i=1}^N C_i has a nontrivial asymptotic normal limit distribution;

(iii) the {ε_i} are independent and identically distributed random variables, and either the {ε_i} are symmetric or A = 0;

(iv) { |Y_ij − μ_i| / σ }² has uniformly bounded kth moments for some k > 2.

Then the results of Theorems 2.1 and 2.2 hold with B_{2,N} = B_{3,N} = 0. □
This result shows that under certain restrictive assumptions, one may replace predicted values by sample means under replication; however, it is important to realize that the assumption of small σ is not generally valid and hence the use of sample means may be disadvantageous in situations where these asymptotics do not apply.

The estimator of Raab (1981) discussed in Section 1.3.2 is also a functional nonlinear errors in variables estimator, complicated by a parameter space whose size is O(N). Sadler and Smith (1985) have observed that the Raab estimator is often indistinguishable from the same estimator with μ_i replaced by Ȳ_i· in (1.10); such an estimator is contained in the general class (2.1). Note that (iv) is trivially satisfied in this case. In Chapter 3 we show that under the asymptotics of Theorem 2.3 and additional regularity conditions the two estimators are asymptotically equivalent. We may thus consider the result of Theorem 2.3 relevant to this estimator as well.
2.4  Small σ asymptotics

In Section 2.3 we were forced by technical considerations to pursue an asymptotic theory in which σ is small. It turns out that in some situations of practical importance these asymptotics are relevant. In particular, in assay data values for σ are often observed which are quite small relative to the means and for which assumption (i) of Theorem 2.3 is reasonable. Further, such asymptotics are used in the study of data transformations in regression. It is thus worthwhile to consider the effect of small σ on the results of Sections 2.1 and 2.2 and to comment on some other implications of letting σ → 0.
The following observations can be made from the discussion in Section 2.5. In the situation of Theorem 2.1, if the errors are symmetrically distributed, then for the estimators of Section 1.3.1, if σ → 0 as N → ∞, there is no effect due to estimating the regression parameter β. In the situation of Theorem 2.2, the errors need not even be symmetrically distributed. The major insight provided by these results is that in certain practical situations in which σ is small, the choice of β* may not be too important even if the variance function depends on β.
Small σ asymptotics may be used also to provide insight into the behavior of other estimators for θ which do not fit into the general framework of (2.1). We use small σ asymptotics in Chapter 3 to evaluate the properties of the extended quasi-likelihood estimator and to gain insight into the behavior of the maximum likelihood estimator. We show that the extended quasi-likelihood estimator is asymptotically equivalent to weighted regression estimators based on squared residuals and the maximum likelihood estimator when σ → 0.

2.5  Examples and further results
The asymptotic theory constructed in this chapter has allowed us to state some general characteristics of regression-type estimators of θ. In this section we use the theory to exhibit the specific forms for the various estimators of Section 1.3.1. Throughout, define

    ν(i,β,θ) = log g(z_i,β,θ),

and let ν_θ(i,β,θ) and ν_β(i,β,θ) be the column vectors of partial derivatives of ν with respect to θ and β. Further, let

    τ(i,β,θ) = ν_θ(i,β,θ) − ν̄_θ(β,θ),
    D_{1,N}(β,θ) = N^{-1} Σ_{i=1}^N τ(i,β,θ) τ(i,β,θ)ᵗ,
    D_{2,N}(β,θ) = N^{-1} Σ_{i=1}^N τ(i,β,θ) ν_β(i,β,θ)ᵗ,

where ν̄_θ(β,θ) is the mean of the ν_θ(i,β,θ). For simplicity, assume that the errors {ε_i} are independent and identically distributed with kurtosis κ; κ = 0 for normality.

2.5.1  Pseudo-likelihood, restricted maximum likelihood and weighted squared residuals
If, when accounting for the effect of leverage, we let h̄ → 0 such that N^{1/2} h̄ → 0, then these methods are asymptotically equivalent. Writing η = log σ, we have

    H₁(x) = x²,    H_{2,i} = exp(2η) g²(z_i,β,θ),    H_{3,i} = H_{2,i}²,

and E[ H₁'{d_i(β)} sign(ε_i) ] = 2 E[ Y_i − f(x_i,β) ] = 0, so that B_{3,N} = 0 regardless of the form of the underlying distributions. Algebra yields B_{1,N} = 4D_{1,N}, B_{2,N} = 4D_{2,N} and C_i = 2(ε_i² − 1) τ(i,β,θ). If g does not depend on β, or if as σ → 0

(2.5)    N^{1/2}(β* − β) = σ N^{-1/2} Σ_{i=1}^N v_{N,i} ε_i + o_p(1)

for bounded vectors {v_{N,i}}, which is satisfied by unweighted least squares and generalized least squares estimators and the maximum likelihood estimator, then θ̂ is asymptotically normally distributed with mean θ and covariance matrix

(2.6)    (2 + κ) { 4N D_{1,N}(β,θ) }^{-1}.

Thus, the regression estimators considered here are asymptotically equivalent to the maximum likelihood estimator when σ is small or g does not depend on β; we exhibit an asymptotic expansion for the maximum likelihood estimator in Chapter 3.
As mentioned in Section 2.4, under the small σ asymptotics of Theorem 2.3, the extended quasi-likelihood estimator is asymptotically equivalent to the maximum likelihood estimator, with asymptotic covariance matrix (2.6). Thus, if σ → 0, the estimators considered here, weighted squared residuals, pseudo-likelihood, restricted maximum likelihood, maximum likelihood and, under these conditions, extended quasi-likelihood, are all asymptotically equivalent. In addition, all of these estimators have influence functions which are linear in the squared errors, indicating substantial nonrobustness.
We may also observe that efficiency considerations dictate that these methods are preferable to unweighted regression on squared residuals. To see this, note that we may write (2.6) as

(2.7)    (2 + κ) (Vᵗ W^{-1} V)^{-1},

where W is the N x N diagonal matrix with elements H_{3,i} = exp(4η) g⁴(z_i,β,θ) and V is the N x (r+1) matrix with ith row {∂/∂(η,θ) H_{2,i}(η,θ,β)}ᵗ. For the unweighted estimator based on squared residuals, calculations similar to those above show that the asymptotic covariance matrix when either σ → 0 or g does not depend on β is given by

(2.8)    (2 + κ) (VᵗV)^{-1} (Vᵗ W V) (VᵗV)^{-1}.

The comparison between (2.7) and (2.8) is simply that of the Gauss-Markov theorem, so that (2.7) is no larger than (2.8) in the sense of nonnegative definiteness.
2.5.2  Logarithms of absolute residuals

We do not consider deletion of the few smallest absolute residuals. Here H₁(x) = log x, so that H₁'(x) = x^{-1}. Letting η = log σ and assuming independent and identically distributed errors, we have

    H_{2,i} = η + ν(i,β,θ) + E log|ε|,    H_{3,i} = 1,    H_{4,i} = ∂/∂(η,θ) H_{2,i},

and algebra yields B_{1,N} = D_{1,N}, B_{2,N} = D_{2,N}, C_i = { log|ε_i| − E log|ε| } τ(i,β,θ) and

(2.9)    B_{3,N} = −σ^{-1} E[ |ε|^{-1} sign(ε) ] N^{-1} Σ_{i=1}^N τ(i,β,θ) ρ(i,β,θ)ᵗ,

where ρ(i,β,θ) = f_β(x_i,β) / g(z_i,β,θ). Under the assumption of symmetry of the errors, with g not depending on β or σ → 0, and assuming (2.5), algebra shows that θ̂ is asymptotically normally distributed with mean θ and covariance matrix

(2.10)    var{ log ε² } { 4N D_{1,N}(β,θ) }^{-1}.

The influence function for this estimator is linear in the logarithm of the absolute errors, indicating less severe nonrobustness than for the squared residual estimators. Note that for the squared residuals estimators of Section 2.5.1, B_{3,N} = 0 in general, whereas for the logarithm method B_{3,N} ≠ 0 in general unless the errors are symmetrically distributed. From (2.9) we see that even if σ → 0 and (2.5) holds, if the errors are not symmetric then there will be an additional effect due to estimating β not present for the methods of Section 2.5.1, even if g does not depend on β.
2.5.3  Absolute residuals

Here, assume that the errors are independent and identically distributed and let exp(η) = σ E|ε|. Consider the weighted estimator. We have H₁(x) = x, H₁'(x) = 1, H_{2,i} = exp(η) g(z_i,β,θ) and H_{3,i} = H_{2,i}², so that

    B_{1,N} = D_{1,N},    B_{2,N} = D_{2,N},    C_i = [ { |ε_i| / E|ε| } − 1 ] τ(i,β,θ),

and B_{3,N} is proportional to { σ E|ε| }^{-1} E[ sign(ε) ]. Thus, if the errors are symmetrically distributed and either g does not depend on β or σ → 0, θ̂ is asymptotically normally distributed with mean θ and covariance matrix

(2.11)    { γ / (E|ε|)² } { N D_{1,N}(β,θ) }^{-1},

where γ = var|ε|. The influence function for this estimator is linear in the absolute errors, indicating less lack of robustness than methods based on squared residuals. Note that if the errors are not symmetric, even if g does not depend on β there will be an additional effect due to estimating β not shared by the estimators based on squared residuals.

By an argument similar to that at the end of Section 2.5.1, we may conclude that when the effect of β* is negligible one should use a weighted estimator and iterate the method.
2.6  Proofs of major results

We now present sketches of the proofs of Theorems 2.1, 2.2 and 2.3. Our exposition is brief and nonrigorous, as our goal is to provide general insights. In what follows, we assume that β* satisfies (2.5) for all σ and that

(2.12)    N^{1/2} [ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ = O_p(1);

under sufficient regularity conditions it is possible to prove (2.12). Such a proof would be long, detailed and essentially noninformative; see Carroll and Ruppert (1982a) for a proof of N^{1/2} consistency in a special case.
Sketch of proof of Theorem 2.1: From (2.1), a Taylor series, the fact that E[ H₁{d_i(β)} ] = H_{2,i}, and laws of large numbers, we have

(2.13)    N^{1/2} [ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ = B_{1,N}^{-1} N^{-1/2} Σ_{i=1}^N (H_{4,i}/H_{3,i}) [ H₁{d_i(β*)} − H_{2,i}(η,θ,β*) ] + o_p(1).

By the arguments of Ruppert and Carroll (1980) or Carroll and Ruppert (1982a),

(2.14)    N^{-1/2} Σ_{i=1}^N (H_{4,i}/H_{3,i}) [ H₁{d_i(β*)} − H₁{d_i(β)} ] = N^{-1/2} Σ_{i=1}^N (H_{4,i}/H_{3,i}) H₁'{d_i(β)} { d_i(β*) − d_i(β) } + o_p(1).

Applying this result to (2.13) along with a Taylor series in H_{2,i} gives

    N^{1/2} [ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ = B_{1,N}^{-1} [ N^{-1/2} Σ_{i=1}^N C_i + (B_{2,N} + B_{3,N}) N^{1/2}(β* − β) ] + o_p(1),

which is (2.2). □

The result of Theorem 2.2 follows by a similar argument; in this case the representation (2.14) is unnecessary.
Sketch of proof of Theorem 2.3: We consider Theorem 2.2; the proof for Theorem 2.1 is similar. Recall here that (1.8) holds. In the following, all derivatives are with respect to the mean μ_i, and the definitions of C_i and H_{2,i} are as in (2.4). Assumption (iv) implies that N^{1/2} max_{1≤i≤M} |Ȳ_i· − μ_i|² → 0, so that a Taylor series in η, θ and Ȳ_i· gives

(2.15)    N^{-1/2} Σ_{i=1}^N Ĉ_i = N^{-1/2} Σ_{i=1}^N C_i − N^{-1/2} Σ_{i=1}^N ( H_{2,i}' H_{4,i} / H_{3,i} ) (Ȳ_i· − μ_i) + o_p(1),

where Ĉ_i denotes C_i with μ_i replaced by Ȳ_i·. Since

    Ȳ_i· − μ_i = σ g(μ_i,z_i,θ) ε̄_i· = A N^{-1/2} g(μ_i,z_i,θ) ε̄_i·,

where ε̄_i· is the mean of the errors at x_i, we can write the remaining terms on the right-hand side of (2.15) as

(2.16)    A N^{-1} Σ_{i=1}^N ε̄_i· ( q_{i,1} + q_{i,2} C_i )

for constants {q_{i,j}}. Since ε̄_i· has mean zero, (2.16) converges in probability to zero if E( ε̄_i· C_i ) = 0, which holds under the assumption of symmetry. If A = 0, then (2.16) is trivially equal to zero. Thus, (2.16) converges to zero, which from (2.15) completes the proof. □

Note that if we drop the assumption of symmetry and if A ≠ 0, (2.16) implies that the asymptotic normal distribution of N^{1/2}(θ̂ − θ) will have mean

    p-lim_{N→∞} { A B_{1,N}^{-1} N^{-1} Σ_{i=1}^N ε̄_i· q_{i,2} C_i }.
CHAPTER III
OTHER ESTIMATORS
3.0  Introduction
In this chapter we investigate the properties of some of the estimators for θ described in Section 1.3.2. While the maximum likelihood estimator fits into the framework of Section 2.1, we treat it separately here. The extended quasi-likelihood estimator and the modified maximum likelihood estimator of Raab are not explicitly included in the formulation of Chapter 2; the properties of these estimators are derived in this chapter.

We show in this chapter that under certain conditions the maximum likelihood and extended quasi-likelihood estimators are asymptotically equivalent to the estimators of Section 2.5.1. We also verify that under certain conditions the Raab estimator is asymptotically equivalent to that of Sadler and Smith as described in Section 2.3. Further results and discussion of all the estimators are given in Chapter 4. Assume throughout that the {ε_i} are independent.
3.1  Maximum likelihood

The maximum likelihood estimator for θ is obtained by solving (1.5) for σ and θ, where β* is the maximum likelihood estimator of β. Thus, we can use the result of Theorem 2.1 to obtain an asymptotic expansion for the maximum likelihood estimator. To do this, we derive an expansion for N^{1/2}(β̂ − β) in terms of N^{1/2}[ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ and substitute the result into (2.2), where B_{1,N}, B_{2,N}, B_{3,N} and C_i are as in Section 2.5.1. Write η = log σ, and let the generics (β̂,θ̂,η̂) denote the maximum likelihood estimators of (β,θ,η). Define

    ρ(i,β,θ) = f_β(x_i,β) / g(z_i,β,θ);
    G_{1,N} = N^{-1} Σ_{i=1}^N ρ(i,β,θ) ρ(i,β,θ)ᵗ;
    G_{2,N} = N^{-1} Σ_{i=1}^N ν_β(i,β,θ) ν_β(i,β,θ)ᵗ;
    G_N = G_{1,N} + 2σ² G_{2,N}.

From (1.5), the maximum likelihood estimator of (β,θ,η) solves

(3.1)    [ M_{1,N}(β̂,θ̂,η̂)ᵗ, M_{2,N}(β̂,θ̂,η̂)ᵗ ]ᵗ = 0,

where, with r_i = Y_i − f(x_i,β),

    M_{1,N}(β,θ,η) = N^{-1/2} Σ_{i=1}^N [ r_i ρ(i,β,θ) / { exp(2η) g(z_i,β,θ) } + { r_i² / { exp(2η) g²(z_i,β,θ) } − 1 } ν_β(i,β,θ) ]

and

    M_{2,N}(β,θ,η) = N^{-1/2} Σ_{i=1}^N [ r_i² / { exp(2η) g²(z_i,β,θ) } − 1 ] τ(i,β,θ).
Under regularity conditions, the maximum likelihood estimator is consistent; thus, by a Taylor series in (3.1) and laws of large numbers, we obtain by tedious calculations that

(3.2)    [ σ^{-2} G_N    −B_{2,N}ᵗ ] [ N^{1/2} (β̂ − β)                ]   [ M_{1,N}(β,θ,η) ]
         [ −B_{2,N}      B_{1,N}  ] [ N^{1/2} ((η̂ − η), (θ̂ − θ)ᵗ)ᵗ ] = [ M_{2,N}(β,θ,η) ] + o_p(1).

The second equation in (3.2) yields, upon rearrangement, simply (2.2) for weighted squared residual estimators, as it should be. The first equation in (3.2) yields

(3.3)    N^{1/2}(β̂ − β) = σ² G_N^{-1} M_{1,N}(β,θ,η) + σ² G_N^{-1} B_{2,N}ᵗ N^{1/2} [ (η̂ − η), (θ̂ − θ)ᵗ ]ᵗ + o_p(1);

(3.3) affirms our earlier assertion that (2.5) holds for the maximum likelihood estimator of β. Substituting into (2.2) gives the following result.
Theorem
3.1.
Under regularity conditions.
as N ...
00,
the maximum
likelihood estimator admits the expansion
(3.4)
To compare (3.4) with the result for the methods of Section 2.5.1,
we state explicitly the expansion for these methods.
To do so we need
an expression for N1/2CD* - P), where here again /3* is a preliminary
estimator for p.
The discussion at the end of Section 2.1 suggests
that we consider as p* a generalized least squares estimator of p as in
Theorem A of Chapter 1.
(3.5)
Then (2.3) becomes
44
where v
-1
(3.5)
. = G
p(i,p,9).
I,N
,N,1
shows how
generalized least squares estimator of p.
(2.5)
holds for the
If (3.5) holds, we obtain
the following expansion for weighted squared residuals methods:
- q ]
- 9
(3.6)
2 N-
+
1/2
X~=1 (E.~
a N- 1/2 ~i=1
~N
E.
i
-
B
1) r(i,p,9)
1
(. It 9)
2,N G-I,N pI,,...,
+ () (1).
P
(3.4) and (3.6) show that the maximum likelihood estimator and the
estimators of Section 2.5.1 are in general different asymptotically.
Thus, if normality obtains and the model is correct, the latter (as
well as any other estimators in Chapter 2) are not efficient.
In many
situations, however, the following result is relevant.
Corollary 3.llal.
Suppose that either the variance function does not
depend on p, so that G = G ,N and B ,N = 0, or that a
N
1
2
G1 ,N
~
0 so that G ~
N
Then the maximum likelihood estimator of 9 and the estimators of
Section 2.5.1 are asymptotically equivalent.
D
The corollary is a reaffirmation of the implication of Corollary 2.1(a)
and what we would expect from the discussion of Sections 2.4 and 2.5.1
when a
implies
~
O.
that
Note that no symmetry is required here.
in
many
situations
of
practical
The result
importance
the
computationally simpler weighted squared residual type estimators equal
e-
45
the asymptotic performance of the maximum likelihood estimator and so
when
the data are approximately normally distributed,
squared residual methods are asymptotically efficient.
the weighted
This result has
special relevance to the analysis of assay data to which the small a
asymptotics apply.
A further
comparison
cur ious resul t .
of
(3.4)
and
(3.6)
yields
the following
From (3.6), we see that the asymptotic covar iance
matrix of the estimators of Section 2.5.1 when P* is the generalized
least squares estimator and
without bound in a.
asymptotic
normality
obtains
increases
In (3.4), however, note that
This shows that the asymptotic covariance of the maximum likelihood
estimator of 9 will stay bounded for all a.
While this observation may
have implications favoring maximum likelihood in some situations,
result
of
application.
likelihood
Corollary
3.1(a)
Further,
discussed
is
the
in
of
more
difficulties
Section
1.3.2
practical
associated
make
relevance
with
preference
the
in
maximum
for
this
estimator tenuous.
3.2
Extended quasi-likelihood
Recall for the extended quasi-likelihood estimator we assume that
(1.8) holds; we write g(JJ.,z.,9) and JJ.
1
fact.
1
1
= f(x.,P)
to emphasize this
1
Throughout this section, let B ,N and C be as in Section 2.5.1,
i
1
46
let q = log a and let the generics (p,6,q) represent the joint extended
quasi-likelihood estimators for (p,6,q), the true values.
J.J
fy
(3.7)
From (1.9),
y - u
2
Write
duo
g (u,z,9)
the extended quasi-likelihood estimator for
(P,9,q)
solves in ,0,9, and q
Ql,N(P,9,q)
o
(3.8)
Q2 ,N(,6,9 ,q)
Q3,N(,6,9,q)
e-
where
1/2 IN
Ni = 1 {2
- e -217 H9 ( Yi ,J.J i ' Z i)
-}
I ; an d
(3.9)
- {a/a9 log g(Y., Z. ,9)} ].
1
1
The fact that the extended quasi-likelihood estimator is consistent as
N~
~,
a
~
0 follows from the result below.
47
Lemma 3.2.
(3.10)
Under regularity conditions,
N
-1
N
E. 1 H (Y ,
1=
A
,Il • ,
1"'1
l:I
z 1. )
(3.11)
+ 0
3
N-1~N
"'I' =1 s2
3
4
. E.. + () (0 );
,lIp
(3.12)
for constants {So .}, j
J,l
Sketch of oroof:
series in
0
E.
about O.
1, 2, 3, where
The proofs of (3.10) Write
(y - IJ) /
{a g (IJ ' Z ,9 ) },
(3.12) follow from Taylor
48
and note that
(3.13)
.,
fJ +
- f
H (y,fJ,z)
0
g(fJ,z,9)E.
2
(y - u) / g (u,z,") duo
J..i
Here, 9 represents the true value of the parameter.
For (3.11) and
(3.12), assuming that g is sufficiently regular so that the interchange
of differentiation and integration is legitimate, we obtain
(3.14)
a/a., H (Y.J./,z)
"t
fJ +
2
0
f
g(fJ. z ,9)E.
3
(y - u) g.,(u,z,9)/g (u,z,9) du
fJ
(3.15)
2
a /0"
2
H (y,J./,Z)
'Y
2
f
fJ
+ 0
g (fJ • Z ,9
)E.
.,., (u,Z,.,)/g3 (u,z,'Y)
(y - u)[ g
J.i
4
t
- 3 g.,(u.z."t) g.,(u.z ..,)/g (u.z,.,) ] duo
For illustration, we show the form of the expansion for (3.10); the
others are similar.
A Taylor series in
0
about
a in (3.13) gives, upon
simplification and under regularity conditions that
49
"'Y(y,J.l,Z)
-
(y - u) g(J.l,z,6) e. / g 2 (J.l,z,'Y)
0
+
~2{
+
~
3
g2(J.l,Z''Y)4+ 2 g'(J.l,z,'Y) (Y-J.l) } l(J.l,Z,6) e.2
g (J.l, Z ,'Y )
2
[ g (J.l, Z,'Y) g (J.l, Z,'Y) - g (J.l, Z,'Y) g
I
3
5
g (J.l, Z , 'Y
+ 4 (g'(J.l,z.'Y)}
2
(y-U)]
5
g (J.l, Z , 'Y
g
3(
I
,
(J.l, Z,'Y )
)
"') ",3
J.l,z,~
~
)
4
+ CJ (0 ),
where g' (J.l,z,9) and g" (J.l,z,e) are the first and second derivatives of
g
with
respect
to
its
first
argument.
Thus,
under
regularity
conditions, for some constant sl'
"6 (y ,J.l,
12233
0
+ sl e. 0
- 2' e.
z)
+ CJ
(0
4
).
Applying the above calculations to the left-hand side of (3.10) yields
under regularity conditions the result.
0
From (3.10) and (3.11) of Lemma 3.2, it is easily seen that with
sufficient smoothness conditions on g, the estimating equations (3.8)
are unbiased as N ~
(3.16)
N-1/2
~,
0
~
0, i.e., as N ~
Q. N (p,9,q ) _p 0, j
J,
implying consistency of the solution.
~,
0
~
0,
1, 2, 3,
The theory of M-estimation (see,
50
for example.
sufficient
Huber
(1981)
regularitY
or Serfling
conditions,
(1980»
if
the
implies
solution
that.
to
under
(3.8)
is
consistent in general for all fixed a. then (3.16) should hold as N
00.
a
This is clearly not apparent from
fixed.
(3.9).
~
A simple
heuristic argument to see that the extended quasi-likelihood estimator
of 9 need not be consistent in general can be pursued as follows.
For
simplicity. assume p is known so that we need only consider the second
two equations of (3.8).
N-
+
1
These equations imply that we solve in 9
1:~1= 1
-1
{2 N
a/a9 H (Y .• J.I •• Z.)
9
N
1: i = 1
1
1
He (Yi .J.I i ' Z i ) }{N
If the solution 9 is consistent,
that as N -+
1
-1
N
1: i = 1 a / a9
log g (Yi . Z i .9 ) } .
then for all fixed a we should have
00
(3.17)
Lemma 3.2 implies that as N ~
If (3.17) holds for all
that
(3.18)
0,
00,
a
~
0
then we should also have as N
~ 00.
0
~
0
e·
51
It can be shown by detailed calculations that (3.18) does not hold in
general.
These calculations involve expansions of (3.10) and (3.11) to
terms of order
The
0
4.
implication
of
the above discussion
quasi-likelihood method may not
estimator for 9.
relevant
the
parameters.
even admit
is
does
provide
consistent
We now show that in the small
the extended
in general
but in situations where the small
method
that
0
a
consistent
asymptotics are
estimates
of
the
asyaptotics, the extended
0
quasi-likelihood estimator of 9 is in fact asymptotically equivalent to
the maximum likelihood estimator and the estimators of Section 2.5.1.
Theorem 3.3.
Under regularity conditions,
0,
distributed,
and
either
A
=
if, as N ...
0 or the
{e..}
1
and
00
0
...
O.
are symmetrically
then the extended quasi-likelihood estimator admits
the
expansion
-1/2
2 N
~:
o
~N
"'i=l (E.
2
i
- 1) T(i,P.
{J
) + v p (l).
A Taylor series in (3.8) using consistency, a Taylor series in
about 0 using Lemma 3.2 and
simplification
laws
of
large
numbers
yield
after
52
G1 ,N
0
0
1
2' 8 1 ,N
fJ - fJ
N1/ 2
q - q
9 - 9
(3.19)
a N
1 / 2 ... N
(1
~'1E..P"
1=
1
fJ 9)
1 2
From Lemma 3.2, (3.10) and (3.11), the lJ (N / a) term will be of the
p
form
(3.20)
S.
1
for some constants
{si}'
3
E..
1
thus,
if
the
{E..}
1
are symmetric, we only
1 2
require N / a = lJ(1) in order that the remainder term be a (1), while
p
1 2
if the {E. } are not necessarily symmetric we require N / a ~ O.
i
(3.19)
implies that
which
shows
upon
comparison
quasi-likelihood estimator of fJ
with
is
(3.5)
that
asymptotically
the
extended
equivalent
to
a
generalized least squares estimator of fJ as in Theorem A of Chapter 1
when a
~
O.
(3.19) also implies that, as N ~~, a
~
0,
53
~ :]
which from (3.4) and (3.6) is the expansion for the maximum likelihood
estimator and the estimators of Section 2.5.1 when
0
o.
~
0
The asymptotic results presented so far suggest that in situations
where small
0
asymptotics are relevant, the extended quasi-likelihood
estimator is a
reasonable competitor to maximum likelihood and
the
weighted squared residual methods in terms of asymptotic performance.
Note that technically we need the additional requirement on the rates
at which N
and
~ ~
0
~
0 for this result to hold.
While we see that
the estimator may not be consistent when these asymptotics do not
apply, the seriousness of this result is unclear from our discussion.
We now investigate the character of
the
inconsistency by means of
several examples.
For s impl ic ity , assume that the {J.i.} (and hence fJ) are known so
1
that
our
focus
is
restricted
to
estimation
of
9
and
0.
For
definiteness, assume also the power of the mean model
(3.20)
Note in (1.9) that for (3.20), g(Y ,zi,9)
i
becomes
infinite.
NeIder
and
g(Y,z,9) by g(Y+c,z,9) with c
from
the
fact
that
when
Q+
=
Pregibon
used
for
the
(1986)
suggest
replacing
1/6 in (1.9); this suggestion arises
is
distribution such as the Poisson,
approximation
= 0 when Yi = 0 so that Q+
an
approximation
to
a
discrete
the problem lies with the Stirling
factorials,
and
the
factor
c
1/6
54
represents the proper correction for
series.
the first
term in the Stirling
In the investigation described below we use this correction.
Since we assume fJ to be known, we consider only solution of the
second two equations of (3.8) and write Q. N(fJ,9.q) = Q. N(9.0). j = 2.
J.
J.
3.
to
emphasize
Briefly.
this
fact
and
the
fact
the theory of M-estimation as
that
we are
in Huber
implies that under regularity conditions. if 9 and
Q2.N(9 ,0) ]
estimating o.
(1981.
0
p.
130-132)
are such that
p
-.;.......... 0,
Q3.N(9.0)
and if
Q2.N(9,O) ]
A (9 ,0 )
Q3,N(9,O)
exists for all 9 and
0
and has a unique 0 at some (9*,0*). then
and
It is easily seen that under (3.20), A(9.0) exists and is finite only
i f expected values of
the form E { ya.
(log y)k } exist for various
values of a. and k; we thus restrict our investigation to distributions
for Y and values for 9 and
consider
the
requirement
example.
0
for which this requirement is met and
correction described in the preceding paragraph.
precludes
consideration
of
strictly
normal
data,
This
for
55
The
above
theory can be used
examples when the {J.i.}
to determine 9 *;
we ci te
take on a finite number of values
1
three
in equal
proportions:
(i)
a
Vi distributed as Poisson with mean
= 1.
However,
if
proportions, then 9*
then 9*
=
0.640.
J.i
i
.
In this case. 9 = 0.5 and
the {J.i.} take on the values 1 and 4 in equal
1
0.675; if the {J.i.} take on the values 1 and 5.
1
In these cases, a is large relative to the means.
If
the {J.i.} take on larger values, such as 30, 40 and 50 or 50, 75 and 100
1
in equal proportions, however,
This
example
shows
that
then 9 * is very nearly equal to 0.5.
there
are
cases
in
which
the
extended
quasi-likelihood estimator can be badly inconsistent, but also invites
the following conjecture.
The
consistent.
Poisson
It appears that as the {J.i.} increase, 9 is
1
distribution
is
one
.
+
for WhICh Q
is an
approximation to the actual distributional likelihood. and for large J.i.
1
appears to be behaving very nearly like the normal likelihood.
This
leads one to suspect that the extended quasi-likelihood estimator may
have properties similar to those of the maximum likelihood and weighted
squared residual estimators when the means are large.
We investigate
+
this conjecture below; we now consider examples for which Q is not an
approximation to the distributional likelihood in this way.
(ii )
distributed as Poisson with mean
as long as a
J.i
i
(9-1/2) $ 1.
If a
J.i
i
.
In this case the theory is valid
1 so that
J.i
i
$ 1. this setting is a
perhaps slightly artificial example of a case where a is large relative
to the means.
If a
is considered known so that only 9
need
be
56
{~.}
estimated, and the
take on the values shown in equal proportions,
1
we obtain the following results:
9
~i
9*
1.0
0.3,0.5
1.194
0.8
0.3,0.6
0.992
0.8
0.1,0.9
0.952
0.6
0.3,0.7
0.825
0.6
0.1,0.9
0.845
0.55
0.1,0.9
0.815
This example represents a case in which the small a asymptotics are not
valid and the extended likelihood estimator may be poor.
(iii)
+
a
9
~.
1
E..
1
. h
WIt
E. .
l
v.lw
1/2
1
, where v.
1
is truncated
standard normal on (-a,a) and w = 1 - (2/n)I/2{aexp(-a2/2)}/{~(a)-1}.
The theory is valid i f ~~ 1-8) ~ a a/w I /2 so that the means must be
relatively large compared to a.
For the following values of the
{~.},
1
a and 9 with a = 1, in virtually every case 9* is very nearly equal to
9:
9
~i
a
0.2
5,10
2.5
0.5
10,20
2.5
0.5
50,100
2.5
0.5
10,20
2.0
0.5
50,75
2.0
57
It can also be shown that if the e. are centered Uniform (a,b) random
1
variables with variance I,
similar results obtain for a variety of
values for the means.
The above examples show that while the extended quasi-likelihood
estimator may be inconsistent in some instances, this need not always
be the case.
We
now
more
examine
closely
the
behavior
of
the
extended
quasi-likelihood estimator when the means are large as suggested by the
discussion at the end of Example (i).
More specifically, we show in
the case of (3.20) for fixed a, under regularity conditions, that as
long as 9 < I, when the means increase the extended quasi-likelihood
estimator behaves as it does in the small a asymptotics, thus in some
sense verifying the conjecture.
(and hence the
a.
Let
~O,N
{~.})
For the following discussion,
let f3
be known so that we focus on estimation of 9 and
1
be the median of the
{~i}
and define
and
so that e i
=
that as N ...
(Y *i -
*
~i)
/ (5
*
~i)'
where 5
(9 -1)
a ~O,N
.
Assume further
00,
min
1SiSN ~i
and
_00
in such a way that as N ...
00
the
~O.N -
*
{~i}
00
* as well as various sums
and {Vi}
58
of functions of the
{~.}
1
are well-behaved in a sense that will become
obvious in the calculations below.
Ii
fY
(y - u)/u
Note that as long as e < 1. 6
above
conditions.
results
He (y .~). so that as N
for the small
Lemma 3.4.
0
~ ~
~
Rewrite (3.7) as
2e
duo
0 as N ~~.
similar
to
We now show that under the
those of Lemma 3.2 hold for
the asymptotic calculations parallel those
case.
Under regularity conditions. if B < 1. then.
(3.21)
N- 1z N
i=l He (Y i '~i)
(3.22)
N- 1x N
i=l
0
2
-1 N
2
- '2 N Z.1= IE.·1
alae He(Yi'~i) =
0
-1 N
3
N Z.1= 1t 1 ,1.E..1
+ 6
2 N-1x~
1=1
2
log
1
E..
~.
1
(3.23)
Sketch of proof:
and
~
=
(e -1)
0
~O
E.
{~.}. ~
1
median of the
.
Let
e
(y - ~)/(o ~ )
(y
*
- ~
*
)/(6
~
*e
).
*
+
2
(6 );
(J
P
59
By a change of variables, note that
0
-
(3.24)
*
*9
2 r}.l + o}.l Eo
~
o
J *
(y
*
2,-
- w)/w
}.I
(3.25)
a/a,- H (y ,}.I )
20
- 02
(3.26)
2
a la,-
2
fy
-2
'1
2
}.I
*
(y - u) log u/u
+ 0 }.I
*9
Eo
f }.I *
2,-
4
(y,}.I)
40
2
0
2
du
(y * -w)(log z
}.I
H,-
dw;
}.I
2
(y - u) (log u) lu
fy
*
}.I + o}.l
*9
Eo
*
(y -w) (log w
f}.I *
+
29
roles of
}.I,
y and a, respectively.
}.I
*
2,-
dw;
du
log
The integrals on the right-hand sides of (3.24) same form as those of (3.13) - (3.15) with
log }.Io)/w
+
,2
}.Io'
(3.26)
'
2,dw.
w
are of the
y * and 0 playing the
,
Thus, by the same calculations used
in Lemma 3.2. where the Taylor series are now in 0 about 0, we obtain
that for constants {t.}, j = 1, 2, 3, depending
J
=a2
Eo
2
log}.l
+ ~
L3
t 2"
on}.l
+
*•
(J ( ~
2
);
60
Applying these calculations to the left-hand sides of (3.21) - (3.23)
yields under regularity conditions the results.
C
We may now use arguments entirely similar to those in the small a
asymptotics to conclude that the extended quasi-likelihood estimator is
consistent under these conditions and, by a Taylor series in the second
two equations of (3.8) using consistency, a Taylor series in 6 about 0
using Lemma 3.4 and laws of large numbers, that
(3.27)
where here V9 (i,P,9)
=
the proof of Theorem
right-hand side be
0
p
log
~i
3.3,
in the definitions of B ,N and Ci .
1
in
order
that
the
second
term
As in
on
the
(1), we require that either the {e.} be symmetric
and N1 / 26 = a{N1/2~~~;1)} ~ A* ~ 0 or that N1/2~ ~ O.
1
From the results
of Sections 2.5.1 and 3.1, we see that i f p is known and the means
become large in such a way that certain sums of functions of the means
remain bounded, the maximum likelihood and weighted squared residual
estimators will admit expansion (3.27) with remainder term which is
o (1).
P
The implication of the preceding discussion is that the conjecture
at the end of Example (i) is valid in the sense that for "large" values
61
of the means, the extended quasi-likelihood estimator will again behave
similarly to the
estimators.
maximum
Thus,
likelihood
while
the
and
weighted
formulation
and
squared
motivation
estimator may differ from those of the latter estimators,
quasi-likelihood
can
be
competitive
under
certain
residual
for
this
extended
conditions.
Computation of the extended quasi-likelihood estimator may not be as
straightforward
as
for
the
weighted
squared
residual
estimators,
however, since the latter may be computed using standard software while
the former requires iterative solution of a "nonstandard" problem which
includes the requirement of the correction factor described previously
in certain settings.
3.3
Modified maximum likelihood
Throughout this section,
we use
repl ication as in equation (1.10).
such a way that m remains fixed.
the notation for
Recall that N
=
the case of
Mm and N
-+
00
in
For modified maximum likelihood, we
require that (1.8) holds; thus, we write g(J.I.,z.,9) for the variance
1
function and v(J.I.,z.,9)
1
1
=
log g(J.I.,z.,9).
1
1
1
As before, v
derivative of v with respect to 9; similarly, let vee
9
represents the
=
{alae ae t} v
and let a superscript " , " denote differentiation of the quantity once
with respect to its first argument.
quantity represents its mean, e.g.,
Also as before, a
above a
62
log a and write
Let l"J
~(J.I.z.6)
It
turns
out
that
it
is
notationally
simpler
to
derive
an
expansion for the modified maximum likelihood estimator of 6 directly
rather than for the estimators of 6 and l"J simultaneously.
In Chapter
2, we note that the estimator of Sadler and Smith (1985) can be shown
to be equivalent to the modified maximum likelihood estimator.
Theorem
2.3
and
algebra.
we
can
show
that
the
Sadler
and
From
Smith
estimator satisfies
~(
') J-l,z, 9) N1/2(9~ - 9)
(3.28)
where the (} (N 1/ 2a) term is (} (l) if N1/ 2a
P
are
P
symmetric
or
A
o.
-+
A ~ 0 and either the {E. .. }
1J
We now show that the modified maximum
likelihood estimator of Raab admits expansion (3.28).
For the following.
let 9. l"J and J-l '
1
.•.•
J-l
M
denote the
modified maximum likelihood estimators of 9, l"J and J-l ' ... , J-l .
1
M
joint
Define
the following conditions:
(i)
( i i)
The {E. .• } are independent and identically distributed
1J
random variables.
Either the {E. •. } are symmetric or A
1J
0;
63
(iii)
1 2
Under (i) and (ii), N / (; - 9) = ~ (1) and
p
N
(iv)
(v)
1/2
A
(1] - 1])
The~.
1
= ~p(1) ;
are uniformly consistent for
For every
.,
> 0, P{
-
J.J i
max
l~ i~N
~.
1
the~.;
1
k
I
oj
k = 4, j = 3, and k = 3, j
> ., } _
0 for
2.
Before we state the result, we comment on (iii) and (iv) above.
the
general
estimator of Chapter 2,
to prove
(iii)
As for
would be very
detailed and essentially noninformative; as we show in Chapter 4, under
the assumption of (iii), the mod if ied maximum likelihood estimator is
not competi ti ve with several other important estimators so to prove
(iii) would be a laborious task resulting in I i ttle gain.
have not verified
(iv)
and
While we
(v) for the modified maximum likelihood
estimator itself, under reasonable conditions these assertions hold for
Y.l '
replacing~.
1
and for a one-step estimator of J.J. starting from Y.
1
and the Sadler and Smith estimator of
e.
expect
and
that
themselves.
conditions
such
as
(iv)
Since our major aim
l'
Thus, it is reasonable to
(v)
is to obtain
hold
for
the
{~i}
insight mainly for
comparative purposes we do not pursue these points further.
We now state the main result.
Theorem 3.5.
hold.
Under
Suppose that as N
these and further
maximum likelihood estimator of
e
~ ~,
0
~
0 simultaneously, (i) - (v)
regularity conditions,
satisfies
the modified
64
~
(J.I, z ,9) N
1/2
Sketch of oroof:
A
(9 - 9)
We present a heuristic argument and, as the algebra
involved is lengthy,
(1.10),9, q and J.l '
1
only briefly summarize the major steps.
... , J.l
M
From
solve
(3.29)
(m-1)
0,
m
i
(3.30)
By the Mean Value Theorem and (iii), (3.30) implies that
(3.31)
where
1,
... , M;
o.
65
-1
(3.32)
N
H1 ,N = -2-
~M
~m
.lo
.lo.
a
(3.33)
i =1
- 2
(Y ij - jii)
x
2 x
g (ji., z .. 9 )
1 1
J= 1
N- 1/2 M
H
2,N
1:
2
a
(Y
.Em
i=1 j=l
~(ji .• z .• 9).
1
1
- 2
ij - jii)
{ v 9 (J.l i . Z i
2 x
x
g (J.l.,Z.,9)
1 1
.9 )
- V9 (ji,z,9)
}
and
~
A
(ji . , Z • ,9 )
1
1
Taylor series of
(3.32)
in jii about jii using (iv) and (v) yields
after much tedious algebra that
-1
N
+ N
M
m
.E. 1X, 1
1= J=
-1 M
m
X. 1X'
1=
J=
2
Eo • • ~
1J
1 [ -2
+ h
(ji . , Z • ,9 )
1
Eo ••
1J
1
~(ji.,z.,9)/{0
1
2
.
1 ,1
Eo.,
1J
]
1
g(ji.,z .. 9)}
1 1
(ji1' - ji. )
1
(3.34)
-1 M
+ N
X.
1=
m
1X, 1 [ 2 ~ (ji. , Z • ,9) I
J=
+ h
for
constants
{h
.},
k ,1
1
1
k
,Eo.,/O
2 ,1 1J
1,
+ h
2,
{a
2
,Eo
2
g (ji., Z •• 9 ) }
1
2
1
.
3 .1 i J
3.
Similarly,
after
much
66
simplification a Taylor series of (3,33) yields
N -l/2~M,
"'1=1
H 2,N
+
2 {(
9)
- (
9)}
j=1 E. ij
V9 J.li,Zi'
- V 9 J.I,Z,
Em
-1/2 M m
N
E, IE, 1[-2E., ,{V (J.I"z,,9) - V (J.I,z,9)}/{a g(J.l "ZI,.a)}
9
1= J=
IJ 9 1 1
1
*
+ hI
2
,I' E."IJ ] (J.l 1, - J.I,1 )
(3.35)
+
N-l/2~M,
[2 {
'" 1= I'"~m,
va( J.I 1.• Z.1 , 9) - -va (r11, Z, 9)}/{ 0 2g 2( r11 , ,Z 1. , 9)}
J= 1
1
+
-1/2
+ N
M
h * ' E., , /0
2 ,I IJ
+
h * ' E.,2 .
3 ,1 IJ
m
-
2 2
E. IE, 1 [-2{V (J.I·,z"a) -Va(J.I,z,e)}/{o g (J.I.,z.,9)}
e
1= J=
1 1
1 1
+ {Va' (J.I. ,z . , e)
1
1
-
-
2 2
Vel (J.I, z ,9 ) } / {o g (J.I., z .
1
1
,e ) }
+ h * ' E..,/o + h * . E. 2..
4 ,I
5 ,I
IJ
IJ
*
for constants {hk,i}'
k = 1, ... ,5.
To obtain an expression for (J.l
i
that using (iii) and (iv) we have that
- J.I.)
1
we focus on (3.29).
Note
e-
67
a 2v
am E. .
().i.
1
I'
- ).i. )
1
1
(J-I . , Z . ,9 )
11m
1
(m - 1)
2
}
{E. ..
1J
A.
A. g (J-I . , Z . • 9 )
1
I
- - - - - - : L .J= 1
+
m
1
(3.36)
mE..
I'
v~
{ 1 +
(J-I .• Z . ,9)
'"
A. g(J-I.,Z.,9)
1
}
1
III
+---
m
(m - 1)
2
.L j=1 { E. ij - - - -
m
m
2
J=
1J
- 2 .L. 1 E.. . v' (J-I . , Z . ,9) { 1 + va (IJ • , Z . ,9) }],
1
'" r 1
1
1
where
2
2
{mig (J-l 'Zi ,9)} + (m - 1) v' I (J-l 'Zi,9) a
A.
i
i
1
+ {4 a m
v
I
(J-I . , Z . •9) ~.
1
1
I'
I g (J-I
. , Z . ,9 ) }
1
1
2{V'(J-I.,z.,9)}2].L~
1 E.~ ..
1
1
J=
1J
A Taylor series in a about 0 in (3.36) then gives that
,
(J-l
i
- J-I.) = a
g(J-I.,z.,9) E..
1
1
I'
1
+
a
2 2
g
-
(J-I.,z.,9)v'(J-I .• z.,9)[-4E..
1
1
1
1
I'
-1 m
+ m
:L
j =
2
(m-1)}]
1{E. .. -....:....-~
1J
m
(3.37)
+ (N
-1/2 2
2 ,
a) g (J-I .• z . ,9)v (J-I .• z .• 9) [ 8 {I + v
1
1
1
1
-2
9
(J-I . , z .• 9 ) }E. .
1
1
I'
68
m
2
+ {1J~(f.l.,z.,9)/IJ'(f.l.,z.,9)} X. 1{E. ..
Q
1
1
1
1
J=
IJ
- 2 {I +
IJ"
Q
Using (3.37)
-1 m
(f.l . , z . ,9) m X. 1
1
1
J=
(m-l)}
- ...;".--.;..
m
2
E... ].
IJ
in (3.34) and (3.35) yields, after much tedious algebra
and repeated use of (i), that
H
(3.38)
N-Ix M
Xm
i=1 j=1
I,N
t
(f.l
z 9) (E.
- ~. ) 2 +
i' i'
ij
I'
0.
P
(1)
and
H
N-l/2E~1= lX~J= 1
2,N
{1J,,(IJ.,z.,9) - v,,(f.l,z,9)
Q
r 1
1
Q
}
(E.
ij
_~.)2
I'
(3.39)
+
(N1/20) N-1 E.M IE.m 1 b.. +
1=
J=
IJ
0.
P
(l),
where the {b ij } are functions of various constants and the {E. } only
ij
through third powers of the {E. .. }.
IJ
From (i) and (ii) and the nature of
the {b .. }, the second term on the right-hand side of (3.38) is
IJ
0.
P
(1).
Multiplying (3.38) and (3.39) by m/(m-l) and abusing summation notation
slightly to facilitate the comparison with (3.28), we obtain that
2
and, by laws of large numbers and the fact that E si
1, that
69
Using the above in (3.31) now yields the result.
Theorem 3.5 shows that under the small a
2.3,
the
modified
maximum
likelihood
0
asymptotics of Theorem
estimator
of
Raab
asymptotically equivalent to the estimator of Sadler and Smith,
is
thus
supporting their empirical observations and the discussion at the end
of Section 2.3.
CHAPTER IV
COMPARISON AND DISCUSSION
4.0
Introduction
In this Chapter we use the results of Chapters 2 and 3 to offer
some
theoretical
comparisons based on asymptotic eff iciency of
variance function estimators of Section 1.3.
the
Our findings. along with
the developments in previous chapters, allow us to make several general
conclusions
regarding
efficient
and
appropriate
variance
function
estimation.
4.1
Comparison of methods based on residuals
In
order
to
make
simple
comparisons
among
the
methods
of
pseudo-likelihood. restricted maximum likelihood or weighted residuals,
maximum likelihood, extended quasi-likelihood. the logarithm method and
weighted absolute residuals, we assume that the errors are symmetric
and independent and identically distributed and that either g does not
depend
on fJ
likelihood
or a
and
asymptotically
the
is
small.
Recall
weighted
squared
equivalent
here
and
from Chapter 3 that maximum
residual
that
the
methods
same
are
methods
asymptotically equivalent to extended quasi-likelihood when a
~
o.
thus
are
71
From
(2.6)
and
(2.11),
we
see
that
the
asymptotic
relative
efficiency of the weighted absolute residual method with respect to
pseudo-likelihood and the other methods of Section 2.5.1 is
(4.1)
{(2 + K)(l - O)} /
This
is
the
asymptotic
(46).
relative
efficiency
of
the
mean
absolute
deviation with respect to the sample variance for a single sample, thus
the problem of comparing these two estimation methods is identical to
that of the Eddington-Fisher dispute, see Huber (1981, page 3).
normal
errors,
using
absolute
residuals
results
in a
12% loss
For
in
efficiency while for standard double exponential errors there is a 25%
gain in efficiency for using absolute residuals.
From (2.6) and (2.10), the asymptotic relative efficiency of the
logari thm method with respect to those based on squared residuals is
given by
(2
(4.2)
For
normal
2
[ var {log (£. )}]
+ K)
errors,
the
logarithm
-1
.
method represents a 59% loss of
efficiency with respect to pseudo-likelihood.
Huber
(1981,
page 3)
presents a table
of
asymptotic
relative
efficiencies for mean absolute deviation with respect to mean square
deviation for various contaminated normal distributions.
Here,
we
adapt part of that table as our Table 4.1, augmenting it to include the
result
(4.2).
The table shows that while at normality neither the
absolute residuals nor the
logarithm methods are efficient,
a very
72
slight
fraction
of
"bad"
observations
is
enough
to
superiority of squared residuals in a dramatic fashion.
offset
the
For example.
just two bad observations in 1000 negate the superiority of squared
residuals.
If 1% or 5% of the data are "bad." absolute residuals and
the logarithm method. respectively, show substantial gains over squared
residuals.
The implication is that while it is commonly perceived that
methods based on squared residuals are to be
these methods can be highly non-robust.
result
for
maximum
likelihood.
preferred
in general,
Our formulation includes this
showing its
inadequacy under
slight
departures from the assumed distributional structure.
4.2
Methods based on saMole standard deviations
Assume that m
design point.
It
~
2 replicate observations are available at each
is often
the case
in practice that m is
usually no more than 4 and most often 2,
see Raab (1981).
small.
We now
compare using absolute residuals to using sample standard deviations in
the estillators of Section 1. 3.1.
For silplici ty.
assume
that
the
errors are independent and identically and sY'Metrically distributed
and that either g does not depend on p or
0
is small.
If the errors
are not symlletric and C is not small or the variance depends on
fJ.
using sample standard deviations will be more efficient than suggested
in the discussion below.
2
Let s m be the salllpl e var iance of
I
errors
{E.
1
.···, E. m} .
It
is
easily shown by calculations analogous to those of section 4.1 that
replacing
absolute
residuals
by sample standard deviations has
effect of changing the asymptotic covariance matrix (2.6) to
the
73
{(2
(4.3)
so
+ K)
+
2/(m - 1)} {4N ~(P.9)}-1
that the asymptotic relative efficiency of using sample standard
deviations to weighted squared residuals is given by
(4.4)
{(2
+
K)(m - 1)} / {(2
+
K)(m - 1)
+
2}.
From the discussion in Sections 2.3 and 3.3 we see that under
addi tional conditions presented there.
relative efficiency of
the
(4.4) represents the asymptotic
the Raab estimator to
the
weighted
squared
residual methods.
By calculations similar to those in Section 2.5.2. we find that
replacing absolute residuals
by
sample
standard
deviations
in
the
logarithm method changes (2.10) to
(4.5)
Note that this is thus the asymptotic covariance for the Rodbard and
Frazier estimator.
When the errors are normally distributed. it can be
shown routinely that the asymptotic relative efficiency of using sample
standard deviations to absolute residuals is given by
2
/ [2m ~r{(m - 1)/2}].
(4.6)
n
where .,.'
is the trigamma function.
see Abramowitz and Stegun (1972,
chapter 6).
Replacing absolute residuals by sample standard deviations in the
weighted absolute residual method of Section 2.5.3 can be shown to
74
change the asymptotic covariance matrix of this estimator from (2.11)
to
{m 6
where 6*
=
* /
var
(1 - 6
*)}
{N ~ (13 ,9 ) }
-1
,
The asymptotic
(s ).
m
relative
efficiency of using
sample standard deviations to absolute residuals in this method is thus
(4.7)
Table 4.2 contains
the asymptotic
relative
efficiencies (4.4),
(4.6) and (4.7) for various values of m when the errors are standard
normal.
The values in the table for HI (x)
=
x
2
and x indicate that i f
the data are approximately normally distributed, using sample standard
deviations
can
entail
a
loss
residuals i f m is small.
in efficiency with respect
For substantial replication (m
~
to using
10), using
sample standard deviations produces a slight edge in efficiency with
x.
respect to residuals for HI
4.1
that
for
normal
data,
It is interesting to note from Table
the
asymptotic
relative
efficiency
of
weighted squared residuals wi th respect to absolute residuals is the
reciprocal
of
the
limit
of
the
asymptotic
relative
efficiency of
weighted absolute residuals with respect to weighted absolute sample
standard deviations as m ...
00
This implies that for large m, using
sample standard deviations for H (x) = x is the same as using weighted
1
squared
residuals.
encountered
For
in practice
the
(m
~
degree
4),
of
however,
replication
the
likely
implication
to
is
be
that
residuals are to be preferred to sample standard deviations in both
methods when the data are approximately normal.
e·
75
The second column of Table 4.2
shows
that
for
the
logarithm
method, using sample standard deviations surpasses using residuals in
terms of efficiency except when m '" 2 and
eff icient for large m.
In its raw form.
because, at least occasionally,
the regression.
decrease
the
Ir i I
::l::
is more than
log
Ir i I
twice
is very unstable
0, producing a wild "outlier" in
The effect of using sample standard deviations is to
possibility
of
such
outliers;
the
sample
standard
deviations will be likely more uniform, especially as m increases.
implication
residuals
is
as
that
unless
the
logarithm
remedial
measures
approximately normally distributed.
method
are
should
taken
not
if
be
the
based
data
The
on
are
The suggestion to trim a few of
the smallest absolute residuals before using this method is clearly
supported by the theory;
presumably,
such trimming would reduce or
negate the theoretical superiority of using sample standard deviations.
Table
4.3
contains
the
asymptotic
relative
efficiencies
of
weighted squared sample standard deviations and logarithms of these to
weighted squared residuals under normality of the errors.
The first
column is thus the eff iciency of Raab I s method to pseudo-likelihood.
and the second column is the efficiency of the Rodbard and Frazier
method to pseudo-likelihood.
Dividing the second column by the first
yields the asymptotic relative efficiency of the Rodbard and Frazier
method to that of Raab; these numbers agree quite well with Monte Carlo
efficiencies for m
= 2, 3, and 4 of 39%, 62% and 77% and 39%, 64% and
74% reported by Raab (1981) and Sadler and Smith (1985), respectively.
The results of the table imply that using the Raab and Rodbard and
Frazier methods, which are popular in the analysis of radioimmunoassay
data, can entail a dramatic loss of efficiency when compared to methods
based on weighted squared residuals when the data are approximately
76
normally distributed.
In Chapter 5 we exhibit a specific implication
of this result for an important application in the analysis of assay
data.
Note from (4.4) that the squared residual methods will always be
more
efficient
than
Raab's
distribution of the data.
method.
regardless
is
not
as
the
underlying
When the distribution is not normal, the
superiority of the methods of Section 2.5.1
Frazier
of
clear
When
cut.
the
to that
{~.}
1
are
of
Rodbard
and
independen t
and
identically distributed double exponential random variables such that
E~i
= 0 and
2
E~i
the Rodbard and
residual
methods
normal errors.
= 1 and m = 2. the asymptotic relative efficiency of
Frazier
is
estimator with
0.448,
respect
to weighted
squared
which is more than twice that given for
For some highly contaminated normal distributions,
it
can be shown that in some instances the Rodbard and Frazier estimator
can be preferable to weighted squared
relative
efficiency.
When
m
=
2,
residual
methods
in
terms
of
1% contamination by observations
which are normal with standard deviation six times that of standard
normal yields an asymptotic relative efficiency of 2.203.
4.3
Discussion and conclusions
In Chapter 2 we constructed a general theory of regression-type
estimation for 9 in the heteroscedastic model (1.1) and
(1.2).
This
theory includes as special cases common methods described in Section
1.3.1 and allows for the regression to be based on absolute residuals
from the current regression fit as well as sample standard deviations
in the event of replication at each design point.
In Chapter 3 we
e·
77
showed that the
implications of
the general theory are valid under
certain conditions for other estimators not fitting strictly into this
Under various restrictions such as symmetry or small a. when
class.
the variance function g does not depend on
P.
we showed in that we can
draw simple general conclusions about this class of estimators as well
as estimators which do not
fall
into
this
class
and
make
simple
symmetric
error
comparisons among the various methods.
Our
conclusions
apply
strictly
only
to
distributions. but they are fairly definitive and one is unlikely to be
too successful ignoring them in practice.
that
robustness
plays
function estimation.
approximately
a
great
Squared
normally
role
in the efficiency of variance
residual
distributed
Our results indicate first
methods
data.
but
are
preferable
this
preference
for
is
tenuous. as methods based on these can be highly nonrobust under only
slight departures from normal ity.
Methods based on logarithms or the
absolute residuals themselves exhibit relatively more robust behavior.
A second
residuals.
conclusion
is
that
when
employing
methods
one should weight the residuals appropriately.
based
on
Further.
because asymptotic efficiency of the variance function estimators is an
increasing function of the current regression fit.
iterative
weighted
fitting so that
one should use an
the estimate of 9
is based on
generalized least squares residuals.
Another conclusion concerns the use of residuals versus that of
sample standard deviations.
For the small amount of replication found
in practice. using sample standard deviations rather than residuals can
entail a large loss of efficiency if estimation is based on the squares
of these quantities or the quantities themselves.
For the logarithm
method of Harvey based on residuals. trimming the smallest few absolute
78
residuals
is essential, since for normal data using sample standard
deviations is almost always more efficient than using residuals.
for a small number of replicates.
even
Popular methods in applications such
as radioimmunoassay based on sample means and standard deviations can
be less efficient than methods based on weighted squared residuals.
Efficient
variance
function
estimation
in
heteroscedastic
regression analysis is an important problem in its own right.
There
are important differences in estimators for variance when it is modeled
parametrically.
e-
79
Table 4.1
Asymptotic
residuals
relative
for
function F(x)
efficiency
contaminated
= (1 -
a)~(x)
with
normal
respect
distributions
weighted
with
squared
distribution
+ a~(x/3).
weighted absolute
residuals
contamination
fraction a
to
logarithms of
absolute residuals
0.000
0.876
0.405
0.001
0.948
0.440
0.002
1.016
0.480
0.010
1.439
0.720
0.050
2.035
1.220
Table 4.2
Asymptotic relative efficiency of using sample standard deviations to
using absolute residuals under normality for H (x) (weighted methods) .
1
.H 1W
2
~
~
0.500
0.500
0.500
3
0.667
1.000
0.696
4
0.750
1.320
0.801
9
0.889
1.932
0.986
10
0.900
1.984
1.001
00
1.000
2.467
1.142
.!!!
~
2
80
Table 4.3
Asymptotic relative efficiency of using sample standard deviations to
weighted squared residuals under normal errors for H (x).
1
l!l"LAl
.m
~
2
~
2
0.500
0.203
3
0.667
0.405
4
0.750
0.535
5
0.800
0.620
6
0.833
0.680
7
0.857
0.723
8
0.875
0.757
9
0.889
0.783
10
0.900
0.804
e·
CHAPTER V
APPLICATION - THE ROLE OF 9 IN THE PROPERTIES
OF THE ESTIMATOR FOR MINIMUM DETECTABLE CONCENTRATION
5.0
Introduction
In this chapter we investigate a specific setting in which the
choice of the estimator for 9 plays an important role in determining
the
theoretical
quantities.
estimator
properties
of
estimators
for
other
important
In particular, we study the theoretical properties of an
for
the
minimum
detectable
concentration,
an
important
calibration quantity in the analysis of assay data, and show how the
properties of this estimator are controlled by those of the estimator
for 9.
We present a Monte Carlo study and an example which not only
support the asymptotic results of this chapter but our results on
iterating the generalized least squares algorithm from Chapter 2.
5.1
Analysis of assay data and the minimum detectable concentration
The analysis of assay data has long been an important problem in
clinical chemistry and
the
biological
sciences;
Finney (1964) and Oppenheimer. et al. (1983).
see,
for
example,
The most common method
of analysis is to fit a nonlinear regression model to the data.
Much
82
recent work suggests that these data can be markedly heteroscedastic;
in radioimmunoassay, for example, this characteristic has been observed
repeatedly and incorporated into the analysis as discussed by Finney
(1976), Rodbard (1978), Tiede and Pagano (1979)
analyses
are
for
heteroscedastic
the
most
nonlinear
part
special
regression
and Raab (1981).
cases
model
Such
of
the
general
(1.1 )
and
(1.2).
Specif ically, we observe independent counts Y . at concentrations x.
iJ
1
for i = 1, ... ,M and j = 1, ... m with mean and variances given by (1.1)
i
and
(1.2).
Since
replication
is
involved
we
use
this
notation
throughout for convenience; for our analysis we adopt the simplifying
assumption that mi
= m vi.
A standard model for the
mean
in
a
radioimmunoassay is the four parameter logistic model
e-
(5.1)
Almost without exception, the variances have been modeled as functions
of the mean response as
in
(1.8),
usually either
as
a
quadratic
function or as a power of the mean, i.e.,
(5.2)
The assay problem does not always stop with estimating p, but also
addresses
issues
of
calibration.
intervals for a true x. given a
problem.
These
issues
new Y.,
the
include confidence
classic
calibration
Also of interest is determination of the sensitivity of the
assay using such concepts as the minimum detectable concentration of
Rodbard
(1978)
and
the
critical
level,
detection
level
and
83
determination limit of Oppenheimer, et al. (1983).
In this chapter we
show that a unique feature of these calibration problems is that the
efficiency of estimation is essentially determined by how well one
estimates the variance parameter 9.
We show that far from being a
nuisance parameter, 9 plays a central role in these problems.
For
definiteness,
we
detectable concentration.
concept;
we
use
the
focus
on
the
There
is
no
following
determination
unique
of
definition
minimum
of
this
based on the definition of Rodbard
(1978).
Definition.
Let Y(x,m) be the mean response based on m replicates at
concentration level x, taken independently of the calibration data set
Let f(O,P) be the expected response at zero concentration based
{Yo .}.
1J
on the calibration data set.
at level
(5.3)
(1~)
The minimum detectable concentration x
c
is the smallest concentration x for which
Pr {Y (x, m) ~ f (0 ,p )} > 1 - a.
If t(a,N-p) is the (l~)th percentile of the t-distribution with
N-p degrees of freedom, where N
=
Mm is the total sample size and M is
the number of concentrations, the usual estimate x
(5.4)
{ f (x
c
,P) - f( 0
,p)} 2
c
of x
2 A2 2 A
{t(a,N-p)} {a g (x ,p,9)/m
c
c
+
satisfies
var[f(O,p)]),
A2
where var[f(O,p)] is an estimate of the variance of f(O,P) and a is
the usual mean squared error from the weighted fit:
84
A2
a
In
(N-p)
-1 M
m
E. 1E . l{Y"
1=
application,
J=
1J
A 2 -29
A
- f(X.,It)} f
(X.,It).
1 ~
practitioners
1 ~
involved
in
the
analysis
of
radioimmunoassay data commonly use the methods of Rodbard and Frazier
and Raab to estimate 9 and then perform generalized least squares with
Thus, in our investigation, we will first investigate the properties of
x
c
in general and then for illumination use the general
examine the behavior of x
c
when 9
we
will
refer
in
to
is estimated by the methods of
Rodbard and Frazier, Raab and any of
methods;
results
the weighted squared residual
particular
to
the
latter
methods
as
pseudo-likelihood for brevity.
5.2
Asymptotic theory
In this section we outline an asymptotic theory for the estimator
As mentioned
of minimum detectable concentration.
in Section 2.4,
values of a which are small relative to the means are typical of assay
data.
Furthermore, as in Chapters 2 and 3, the assumption of small a
is needed from a technical standpoint for study of common estimators
for 9 used in this application such as the Rodbard and Frazier and Raab
estimators.
Thus,
the asymptotic theory we develop here is based on
the assumption that N ~
remains
fixed.
For
~
and a
~
0 simultaneously in such a way that m
simplicity
of
presentation
and
immediate
e-
85
application, we express our results in terms of the power of the mean
model (5.2).
Throughout.
assume
that
identically distributed.
v.
1
s
0
2
m
2
V
the
errors
{~ij}
are
independent
and
Define
log f ( x. . p) ,
1
((m-1)
-1
lim", _
!~
m
-
~.
E. 1(~ ..
J=
1J
(M-1)
l'
-1 M
) 2 }, and
-
2
E. 1 (v. - v) .
1=
1
For the minimum detectable concentration, note that in (5.4) the term
var{f(O,p)} is of the order Nother
terms.
1
and is hence small relative to all the
It turns out that
in our asymptotic framework,
the
solution to (5.4) is estimating the quantity x * , where
c
(5.5)
o
2 2 29
{z(a)} 0
f
*
*
(xc,p)/m
- {f(xc'P)
- f(O,P)} 2 ,
where z (a) is the (1 - a) th percentile point of the standard normal
distribution.
Here are the major results,
given in Section 5.4.
the technical details for which are
Define
(5.6)
The main theorem follows from the results of Chapter 2.
86
Let x (RF),
c
TheQrem 5! 1 .
minimum
detectable
estimate of
9,
x (MML)
c
cQncentratiQns
and x (PL)
c
denQte the estimated
using
RQdbard
the
and
Frazier
the modified maximum likelihood estimate of Raab and the
pseudo-likelihood estimate ,respectively.
Then there is a constant A
o
and a sequence b for which
N
(5.7)
(5.8)
(5.9)
Froll
maximum
(4.3),
•
the asymptQtic relative efficiency of the modified
likelihood
estimate
of
minimum
detectable
concentration
relative to the pseudo-likelihQQd estimate is
(5.10)
which
(2
is
Similarly,
+ 1C)(Oe +
less
than
d~)/{(2
1 fQr
the asymptQtic
+ 1C)(Oe +
all
II
d~)
+
2d~/(m - 1)}
regardless
Qf
relative efficiency Qf
the
the
value
Qf
K.
lQg-linearized
estimate Qf minimum detectable cQncentratiQn tQ the pseudQ-likelihQQd
estiJlate is
(5.11)
It fQllQWS frQm TheQrem 5.1 and (5.10) and (5.11) that the Qrdering in
efficiency Qf estimated minimum detectable cQncentratiQn is the same as
e-
87
the
ordering
for estimating
e and thus from Chapter 4 will favor
pseudo-likelihood for normally distributed data in the case of the
Rodbard and Frazier estimator and for all distributions in the case of
the mod i f i ed maximum 1ike 1ihood method.
depend on the
example,
in
The numerical
2
efficiencies
2
logar ithm of the true means through do and a v'
the
simulation
discussed
in
the
next
section,
For
the
asymptotic relative efficiency of the Rodbard and Frazier estimator is
27% for m
The
= 2 and 63% for m = 4.
asymptotic
theory
thus
suggests
that
inefficiencies
in
estimating the variance parameter e translate into inefficiencies for
estimating the minimum detectable concentration.
5.3
A simulation
To check the qualitative nature of the asymptotic theory, we ran a
small simulation.
We
restrict our focus
here to the
Frazier method and weighted squared residual methods,
pseudo-likelihood iterated with number of cycles
Rodbard
and
in particular,
~.
The responses Y.. were normally distributed with mean and variance
1J
satisfying (5.1), (5.2), where PI = 29.5274, P
2
P4 = 1.0022, e = 0.7 and a =.0872.
given in Table 5.1.
= 1.8864, P3 = 1.5793,
The 23 concentrations chosen are
The parameters and the concentrations were chosen
=2
to represent some assays we have observed.
We studied the case m
or duplicates and m
For each situation, there
=
4 or quadruplicates.
were 500 simulated data sets.
A limited second simulation was run with
the larger value a = 0.17, but there did not appear to be significant
88
qualitative differences from the case reported here.
The estimators chosen were unweighted least squares for P.
the
Rodbard and Frazier method and the pseudo-likelihood/generalized least
squares combination which we report only for
algorithm.
The
methods
of
~
estimating
= 1 and 2 cycles of the
the
concentration are as discussed in Section 5.1.
constrained to lie in the interval 0
~
9
~
minimum
detectable
The estimates of 9 were
1.50.
In Table 5.2. we compare the estimators of 9 on the basis of bias
and variance.
The biases are large relative to the standard error. so
that mean-squared error comparisons are artificial and dramatic.
bias in the pseudo-likelihood estimate of 9 when doing only
The
'e
= 1
cyc les of the algorithm has been previously observed as mentioned in
the discussion of Chapter 1.
the replicates
One sees here that the effect of doubling
from two to four
for a given set of concentrations
improves the Monte-Carlo efficiency of the Rodbard and Frazier estimate
of 9
from 0.351
to 0.585.
compared
to
the
theoretical
asymptotic
increase from 0.203 to 0.535. supporting the theory of Chapter 4.
This
example indicates that pseudo-likelihood estimation of 9 can in some
circumstances be a considerable improvement over the method of Rodbard
and Frazier.
In Table 5.3 we consider the mean-squared errors for estimating
the regression paraMeters (PI 'P .P .P 4) .
2
here.
3
The value of a is very sJlall
Larger values of a would presumably cause greater differences.
As expected.
the unweighted least squares estimate is
unacceptably
inefficient.
Finally.
chose a
we turn to the minimum detectable concentration.
= 0 . 05 .
For all
of the methods used in
the
study.
We
the
e-
89
probability requirement
(5.3)
was
easily satisfied;
rather than 95%
exceedance probability, every case was more than 97%.
of
The mean values
the minimum detectable concentrations are reported in Table 5.4,
with variances given in Table 5.5.
Note that in both of these tables,
we give results for the case that 9 is known as well as estimated.
The
relatively poor behavior of unweighted least squares is evident.
To
quote from Oppenheimer, et a!.
have
been
observed
inappropriate
depending
unweighted
parameter 9 is known,
(1983):
on
analysis
"Rathel' dramatic differences
whether
a
used."
is
valid
When
weighted
the
or
variance
there is little difference between any of the
weighted methods.
When
9
is
differences.
unknown,
The figures
there
are
rather
large
proportional
in Table 5.4 show that the mean minimum
detectable concentration for
the Rodbard and Frazier method
is
10%
larger than for the pseudo-likelihood method based on 'e = 2 cycles;
whether the raw numerical difference is of any practical consequence
will depend on the context.
For m
detectable
=
2 replicates,
the pseudo-likelihood estimate of minimum
concentration with unknown 9 has mean 3.934 x
standard deviation 0.05 x 10
-4
10
-2
and
; the corresponding figures for M = 4 are
2.722 x 10- 2 and 0.028 x 10- 4 .
Proportionately, when 9 is unknown, the
method of estimating it seems to have important consequences for the
estimate
of
variability
minimum detectable concentration,
of
the
estimate.
For
the
case
particularly
of
in
the
duplicates,
the
Monte-Carlo variance of pseudo-likelihood is only 0.368 as
large as
that based on the Rodbard and Frazier estimate, while the asymptotics
90
suggest
0.273,
increasing
to
0.709
and
0.629
respectively
for
quadruplicates.
5.4.
An example
the three estimators of a and the subsequent
Differences among
estimators of minimum detectable concentration which are reminiscent of
the qualitative implications of the asymptotic theory and simulation
can
be
seen
in
the
following
example.
The
data
are
from
a
radioimmunoassay based on 4 replicates at each of the 23 concentrations
given in Table 5.1 and are presented
in Table 5.6.
The analysis
presented here is for illustrative purposes only; we do not claim to be
analyzing these data fully.
Our aim is to exhibi t the fact that the
three methods of estimation of 9 considered in this chapter can lead to
nontrivially different results.
We assumed in all cases the mode I (5.1) and (5.2).
For the full
data set and reduced data sets considering all possible permutations of
duplicates
application
(except
of
the
2
one set for which an s.
1
Rodbard
estimates of a, a 2 and x
Frazier
methods
and
c
the
and
Frazier
= 0, complicating the
method),
we
computed
the
using the pseudo-likelihood and Rodbard and
estimate
of
Sadler
and
Smith
(1985)
as
described in Section 2.3 in place of the more computationally difficult
modified maximum likelihood estimate.
We also computed the estimate of
minimum detectable concentration based on ordinary least squares.
results are given in Table 5.7.
The
91
An investigation of both the full and reduced data sets suggests
that there are no massive outliers and that design points I, 22 and 23
are possible high leverage points.
For our purposes of illustration we
do not pursue this point; we could investigate the behavior of one of
the methods that account for leverage, for example.
The resul ts of Table 5.7 show that the three estimates can vary
greatly.
As a measure
deviations of 9 and x
c
of
this,
consider
the
means
and
standard
for the five data sets obtained by considering
duplicates (ignoring the fact that these data sets are not strictly
independent) .
Using these, we list "relative efficiencies" for the
estimators below:
"Relative efficiencies" for estimators of 9 and x
c
for data in example when m =2
RF to PL
9
x
c
MML to PL
RF to MML
.222
.351
.632
.529
.659
.802
Qualitatively, the estimates exhibit the type of behavior predicted by
the asymptotic
theory;
quanti tati vely,
the values compare favorably
with what the theory would predict given the crudity of the comparison.
This example shows that there can be wide differences among the
various estimation methods for 9 and minimum detectable concentration
in application and that the qualitative way in which the differences
manifest themselves is predicted by the asymptotic theory of Chapter 2
and Section 5.2.
92
5.5
Summary
The point made in this chapter is that the relative efficiency of
the estimated minimum detectable concentration can be affected by the
method used to estimate the variance parameter 9.
While for estimation
of 13 the effect of how one estimates the variance function is only
second order,
first order.
of
the
for estimation of other quanti ties,
While we have shown this result in the specific instance
minimum
detectable concentration,
probably a much more general
general.
the effect can be
Development
of
a
result for
unified
we conjecture that
calibration
theory
for
the
it
is
quantities
in
estimation
of
calibration quantities would be a worthwhile endeavor.
5.6
Proofs
The analysis of the estimator for minimum detectable concentration
is complicated by the behavior of the derivative of f(x,fJ) with respect
to 13 at x=O, especially for the standard model (5.1).
f (x ,13 ) = h (t ,13), where t
=
e (x ,13 ),
the model (5.1), for example, t
throughout
smooth.
that f(O,fJ)
=
t
*
c
=
e (x,fJ)
e (x *c ,13)
exp (13
and t
4
We will write
c
=
e (x c ,13).
log x).
In
We assume
> 0 and that all functions are sufficiently
Assume further that
93
(5.12)
e (0 ,P)
(5.13)
a/a~
(5.14)
a/ap e. (0 ,P)
(5.15)
If w ~ 0 and v is a random variable such that
= 0
;
h(O,P)
~
0
o
p
e.(v,P)/e.(w,P)
sup{
p
I
O~a~l:
I, then
~
{a/a~ e.(av+(1--a)w,p)}/{a/a~
e.(w,P) - I}
I }~
O.
These assumptions are satisfied for the model (5.1) if P4 > O.
We need the following results.
Define c = {z(a)}2.
will be at the end of the section.
Lemma 5.2
~
Proof:
*
c
=
As a
aa
+ 0'
c
0
for f(O,P) > 0,
(a 2 ),
a
c
A Taylor series expansion of (5.5) in
Lemma 5.3
Assume that as N ~~, a ~ 0, (t
2ma
Then as N
expansion
(5.16)
~
The proofs of Lemmas 5.4 and 5.5
c
{a/a~
~ ~,
a
~
2
h(O,P)} /{cf
0,
if N
29
112 (9
c
*
~c
and around zero.
2
- t*)
= 0' (aN 1/ ).
c
p
0
Define
(O,P)}.
-
9)
0' (I),
p
we have the asymptotic
94
Lemma 5.4
Consider Lemma 5.3.
Then
•N l/2(~2
a - a 2)/a 2
so that
where dO is defined in (5.6).
e-
C
1 2
Proposition 1 : The limit results (5.7)-(5.9) hold for A N /
o
where (in obvious notation)
Proof of Proposition 1
~
c
= ~
c
(RF). ~
c
(MML) or
~
c
(t c -~*)/o,
c
(PL).
From the results of Chapter 2, we have that
(in obvious notation)
E(log q~)}(v. -
v)
(vi -
v)
+ 0 (1);
l I P
( 1/2) N-1/2 m-1/2 EM
i=1
Em
j=1
(6. 2 - 1)
ij
+ 0
P
(1),
95
so that by Lemmas 5.2 - 5.4 and equation (5.16), we have
N
-1/2
M m
2
1:. 11:. 1 (t:. . . 1=
J=
IJ
{
1
1)
2
do (vI' - v) /0 }
+
V
+ 0.
P
(l),
which with the central limit theorem gives the same limit distribution
as in (5.9).
We also have that
+
d
o
mN
-1/2
M
1:. 1 (v. 1=
v) log
2 2
q. /0
+ 0.
-1/2 M
m
2
1
Simple
do mN
central
-1/2
M
- 2 2
1:. l(v. - v)q./o
1=
limit
1
theorem
1
final
Result (5.8) is based on
fit of
V
V
+ 0.
P
P
(l)
(1).
calculations
distribution as in (5.7) and (5.8).
Remark:
1
1:.1= 11:.J= 1(t:. IJ
.. - 1)
N
+
M m
2
1:.1= 11:.J= 1 (t:. IJ
.. -1)
-1/2
N
yield
the
same
0
0
obtained from the residuals of the
the mean response function as for
the log-linearized
method and pseudo-likelihood, so that Lemma 5.4 holds.
The modified
maximum likelihood method also provides a joint estimate of
wi th the estimate of 9.
limit
0
along
If one considers this estimator in place of
0
in Lemma 5.4, it can be shown that the resulting estimator of minimal
detectable concentration has even larger asymptotic variance than that
96
in (5.8).
This follows from calculations using the results of Section
3.3 and has interesting implications for practice.
By (5.16), for any of the estimators x . since
c
Proof of Theorem 5.1 :
t = e(x,p), we have the limit result that for
(5.17)
N
Thus, for
~
W
N
where
1/2
c
c
,P) -
between x
e x (~ c ,P)}
ex (v,P)
e(v,p).
{e(x
I
e(x*,p)
c
}/o
-~__• N(O,~).
and x * , defining
c
c
e x (x c* ,P)
It thus suffices through (5.15) to prove that
p
1.
But this follows from (5.17) since t *10
c
a , see Lemma
c
[]
Proof of Lemma 5.3 : By a series of Taylor expansions and using Lemma
5.2, for some β̃,

(5.18)   N^{1/2} {h(t̂_c,β̂) - h(0,β)}²/σ²
            = N^{1/2} {h(t̂_c,β) - h(0,β)}²/σ² + O_p(N^{-1/2})
            = N^{1/2} {h(t*_c,β) - h(0,β)}²/σ²
                 + 2{h(t*_c,β) - h(0,β)}{∂/∂t h(t*_c,β)} N^{1/2}(t̂_c - t*_c)/σ²
                 + O_p(N^{-1/2})
            = N^{1/2} {h(t*_c,β) - h(0,β)}²/σ²
                 + 2 t*_c {∂/∂t h(0,β)}² N^{1/2}(t̂_c - t*_c)/σ² + o_p(1)
            = N^{1/2} {h(t*_c,β) - h(0,β)}²/σ²
                 + 2 a_c {∂/∂t h(0,β)}² N^{1/2}(t̂_c - t*_c)/σ + o_p(1).

Similar calculations, taking into account that N^{1/2}(β̂ - β) = O_p(1),
yield the companion expansion (5.19), whose correction terms are

   (2θ/m) h^{2θ-1}(t*_c,β) h_β(t*_c,β) N^{1/2}(β̂ - β) + O_p(σ)

and

   (2c/m) h^{2θ}(0,β) {log h(0,β)} N^{1/2}(θ̂ - θ) + o_p(1),

where h_β denotes the derivative of h with respect to its second
argument.  Combining (5.4), (5.5), (5.18) and (5.19) yields (5.16).  □
Proof of Lemma 5.4 : Define

   σ̃² = N^{-1} Σ_{i=1}^{M} Σ_{j=1}^{m} [{Y_ij - f(x_i,β)}²/f^{2θ}(x_i,β)],

so that N^{1/2}(σ̃² - σ²)/σ² = N^{-1/2} Σ_{i=1}^{M} Σ_{j=1}^{m} (ε²_ij - 1).
Then, since N^{1/2}(β̂ - β) = O_p(1),

   N^{1/2}(σ̂² - σ̃²)/σ²
      = N^{-1/2} Σ_{i=1}^{M} Σ_{j=1}^{m} [{Y_ij - f(x_i,β̂)}²/f^{2θ̂}(x_i,β̂)
           - {Y_ij - f(x_i,β)}²/f^{2θ}(x_i,β)] σ^{-2} + o_p(1)
      = -2 v̄ N^{1/2}(θ̂ - θ) + O_p(σ) + o_p(1),

completing the proof.  □
Table 5.1
Concentration Levels Used in the Simulation

   0.00        2.50
   0.075       3.25
   0.1025      4.50
   0.135       6.00
   0.185       8.25
   0.25       11.25
   0.40       15.00
   0.55       20.25
   0.75       27.50
   1.00       37.00
   1.375      50.00
   1.85
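The simulation design can be mimicked in a few lines.  The sketch below
generates heteroscedastic assay-type responses on the Table 5.1
concentration grid under the power-of-the-mean variance model
σ f^θ(x,β); the four-parameter logistic mean and all parameter values
here are hypothetical placeholders, since the actual mean function and
parameters of the simulation are specified earlier in the chapter:

    import numpy as np

    rng = np.random.default_rng(1)

    # Concentration grid of Table 5.1
    x = np.array([0.00, 0.075, 0.1025, 0.135, 0.185, 0.25, 0.40, 0.55,
                  0.75, 1.00, 1.375, 1.85, 2.50, 3.25, 4.50, 6.00,
                  8.25, 11.25, 15.00, 20.25, 27.50, 37.00, 50.00])

    def f(x, b1, b2, b3, b4):
        # Hypothetical four-parameter logistic mean; the thesis model (5.1)
        # is specified earlier in the chapter and may differ in form.
        return b1 + b2 * x**b4 / (b3 + x**b4)

    beta = (2.0, 28.0, 5.0, 1.0)    # made-up values, chosen only to mimic Table 5.6's range
    theta, sigma, m = 0.5, 0.05, 2  # power parameter, scale, replicates (illustrative)

    mu = f(x, *beta)
    # Y_ij = f(x_i, beta) + sigma * f(x_i, beta)**theta * eps_ij
    Y = mu[:, None] + sigma * mu[:, None]**theta * rng.standard_normal((x.size, m))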
Table 5.2
Three Estimates of the Variance Parameter θ

                                            m = 2 Replicates   m = 4 Replicates
Bias (Monte-Carlo)
  Rodbard and Frazier                            0.15               0.001
  Pseudo-likelihood, ℓ = 1                       0.045              0.022
  Pseudo-likelihood, ℓ = 2                       0.000              0.001
  Pseudo-likelihood, ℓ = 3                       0.004              0.000

Variance of Pseudo-likelihood with ℓ = 2
Relative to Variance of:
  Rodbard and Frazier       Monte-Carlo          0.351              0.203
                            Asymptotic           0.585              0.535
  Pseudo-likelihood, ℓ = 1                       1.010              1.010
  Pseudo-likelihood, ℓ = 3                       1.000              1.000
Table 5.3
Mean Squared Error Ratios for Estimating the Regression Parameter
Pseudo-likelihood with ℓ = 2 Steps Relative to:

                              m = 2 Replicates    m = 4 Replicates
Least Squares                  0.621  0.330        0.704  0.424
                               0.694  0.383        0.725  0.418
Rodbard and Frazier            0.901  0.917        0.901  0.893
                               1.000  0.971        1.000  0.990
Pseudo-likelihood, ℓ = 1       0.980  0.961        0.990  0.971
                               1.000  1.000        0.990  0.990
Pseudo-likelihood, ℓ = 3       1.000  1.000        1.000  1.000
                               1.000  1.000        1.000  1.000
Table 5.4
100 × Mean Minimum Detectable Concentrations

                              m = 2 Replicates           m = 4 Replicates
                              θ Known    θ Estimated     θ Known    θ Estimated
Unweighted Least Squares      13.106       13.106         9.173       9.173
Rodbard and Frazier            3.937        4.346         2.718       2.785
Pseudo-likelihood, ℓ = 2       2.809        4.216
Pseudo-likelihood, ℓ = 1       3.927        3.934         2.715       2.722
Table 5.5
Ratio of Monte-Carlo Variance of the Estimate of Minimum Detectable
Concentration -- Pseudo-likelihood with ℓ = 2 Cycles Relative to:

                              m = 2 Replicates           m = 4 Replicates
                              θ Known    θ Estimated     θ Known    θ Estimated
Unweighted Least Squares       0.067        0.118         0.053       0.073
Rodbard and Frazier            0.980        0.368         1.000       0.709
Pseudo-likelihood, ℓ = 1                    0.840                     0.910
Table 5.6
Data for Example of Section 5.4

Concentration (x)    Response (Y)
  0.000              1.700, 1.660, 1.950, 2.070
  0.075              1.910, 2.270, 2.110, 2.390
  0.1025             2.220, 2.250, 3.260, 2.920
  0.135              2.800, 2.940, 2.380, 2.700
  0.185              2.780, 2.640, 2.710, 2.850
  0.250              3.540, 2.860, 3.150, 3.320
  0.400              3.910, 3.830, 4.880, 4.210
  0.550              4.540, 4.470, 4.790, 5.680
  0.750              6.060, 5.070, 5.000, 5.980
  1.000              5.840, 5.790, 6.100, 7.810
  1.375              7.310, 7.080, 7.060, 6.870
  1.850              9.880, 10.120, 9.220, 9.960
  2.500             11.040, 10.460, 10.880, 11.650
  3.250             13.510, 15.470, 14.210, 13.920
  4.500             16.070, 14.670, 14.780, 15.210
  6.000             17.340, 16.850, 16.740, 16.870
  8.250             18.980, 19.850, 18.750, 18.510
 11.250             21.666, 21.218, 19.790, 22.669
 15.000             23.206, 22.239, 22.436, 22.597
 20.250             23.922, 24.871, 23.815, 24.871
 27.500             25.748, 25.874, 24.907, 24.871
 37.000             24.441, 25.874, 25.748, 27.270
 50.000             29.580, 26.698, 26.536, 27.181
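Since Table 5.6 provides four replicates at each concentration, the
power θ can be gauged directly by regressing the log sample standard
deviation on the log sample mean, in the spirit of the Rodbard and
Frazier method; the following is a minimal sketch of that idea, not the
thesis's exact estimator:

    import numpy as np

    # Responses from Table 5.6 (four replicates per concentration)
    Y = np.array([
        [1.700, 1.660, 1.950, 2.070], [1.910, 2.270, 2.110, 2.390],
        [2.220, 2.250, 3.260, 2.920], [2.800, 2.940, 2.380, 2.700],
        [2.780, 2.640, 2.710, 2.850], [3.540, 2.860, 3.150, 3.320],
        [3.910, 3.830, 4.880, 4.210], [4.540, 4.470, 4.790, 5.680],
        [6.060, 5.070, 5.000, 5.980], [5.840, 5.790, 6.100, 7.810],
        [7.310, 7.080, 7.060, 6.870], [9.880, 10.120, 9.220, 9.960],
        [11.040, 10.460, 10.880, 11.650], [13.510, 15.470, 14.210, 13.920],
        [16.070, 14.670, 14.780, 15.210], [17.340, 16.850, 16.740, 16.870],
        [18.980, 19.850, 18.750, 18.510], [21.666, 21.218, 19.790, 22.669],
        [23.206, 22.239, 22.436, 22.597], [23.922, 24.871, 23.815, 24.871],
        [25.748, 25.874, 24.907, 24.871], [24.441, 25.874, 25.748, 27.270],
        [29.580, 26.698, 26.536, 27.181]])

    mean, sd = Y.mean(axis=1), Y.std(axis=1, ddof=1)
    # Under SD(Y) = sigma * f(x, beta)**theta:
    #   log SD ~ log sigma + theta * log mean,
    # so the fitted slope is a crude estimate of theta.
    slope, intercept = np.polyfit(np.log(mean), np.log(sd), 1)
    print("theta estimate from replicate SDs:", slope)

The fitted slope plays the role of θ̂ and the intercept that of
log σ̂.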
Table 5.7
Estimates of θ, σ and x_c Based on the Example of Section 5.4

              Least Squares    Pseudo-likelihood, ℓ = 2   Rodbard and Frazier     Modified Maximum Likelihood
              x̂_c    σ̂²        x̂_c    θ̂      σ̂²          x̂_c    θ̂      σ̂²        x̂_c    θ̂      σ̂²

Full          .1554  .4721     .0790  .4750  .0487        .0793  .4757  .0485      .0822  .4500  .0487

Duplicates
  1 & 2       .2230  .5200     .0728  .7000  .0063        .0476  .9404  .0158      .0659  .7500  .0063
  2 & 3       .2385  .4013     .1555  .3500  .0809        .1870  .1950  .1603      .1739  .2500  .1253
  3 & 4       .2513  .5361     .1324  .5750  .0356        .1112  .6940  .0197      .1104  .7000  .0186
  1 & 4       .1593  .4737     .0612  .5500  .0302        .0601  .5931  .0252      .0695  .5000  .0377
  1 & 3       .1859  .4692     .0938  .4500  .0500        .0981  .4233  .0564      .0909  .4750  .0452

Mean          .2116  .4800     .1031  .5250  .0406        .1008  .5692  .0554      .1021  .5350  .0466
SD            .0763  .0471     .0357  .1183  .0246        .0491  .2511  .0544      .0439  .1997  .0466

Note: Means and SDs are based only on the five reduced permutations of
the data with duplicates.
REFERENCES

Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical
     Functions. New York: Dover Publications.

Amemiya, T. (1977). A note on a heteroscedastic model. Journal of
     Econometrics 6, 365-370 and corrigenda 8, 265.

Bates, D. M., Wolf, D. A. and Watts, J. A. (1985). Nonlinear least
     squares and first order kinetics. Unpublished manuscript.

Beal, S. L. and Sheiner, L. B. (1985). Heteroscedastic nonlinear
     regression with pharmacokinetic type data. Preprint.

Box, G. E. P. and Hill, W. J. (1974). Correcting inhomogeneity of
     variance with power transformation weighting. Technometrics 16,
     385-389.

Box, G. E. P. and Meyer, R. D. (1986). Dispersion effects from
     fractional designs. Technometrics 28, 19-28.

Carroll, R. J. (1982). Adapting for heteroscedasticity in linear
     models. Annals of Statistics 10, 1224-1233.

Carroll, R. J. and Ruppert, D. (1982a). Robust estimation in
     heteroscedastic linear models. Annals of Statistics 10, 429-441.

Carroll, R. J. and Ruppert, D. (1982b). A comparison between maximum
     likelihood and generalized least squares in a heteroscedastic
     linear model. Journal of the American Statistical Association 77,
     878-882.

Carroll, R. J. and Ruppert, D. (1985). Power transformations when
     fitting theoretical models to data. Journal of the American
     Statistical Association 79, 321-328.

Carroll, R. J., Ruppert, D. and Wu, C. F. J. (1986). Variance
     expansion and the bootstrap in generalized least squares.
     Preprint.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in
     Regression. New York: Chapman and Hall.

Efron, B. (1985). Double exponential families and their use in
     generalized linear regression. Technical Report #107, Division of
     Biostatistics, Stanford University.

Finney, D. J. (1964). Statistical Methods in Biological Assay.
     London: Griffin.

Finney, D. J. (1976). Radioligand assay. Biometrics 32, 721-740.

Fuller, W. A. and Rao, J. N. K. (1978). Estimation for a linear
     regression model with unknown diagonal covariance matrix. Annals
     of Statistics 6, 1149-1158.

Giltinan, D. M., Carroll, R. J. and Ruppert, D. (1986). Some new
     methods for weighted regression when there are possible outliers.
     Technometrics 28, 219-230.

Glejser, H. (1969). A new test for heteroscedasticity. Journal of the
     American Statistical Association 64, 316-323.

Goldfeld, S. M. and Quandt, R. E. (1972). Nonlinear Methods in
     Econometrics. Amsterdam: North-Holland.

Gong, G. and Samaniego, F. J. (1981). Pseudo maximum likelihood
     estimation: theory and applications. Annals of Statistics 9,
     861-869.

Harvey, A. C. (1976). Estimating regression models with multiplicative
     heteroscedasticity. Econometrica 44, 461-465.

Harville, D. (1977). Maximum likelihood approaches to variance
     component estimation and to related problems. Journal of the
     American Statistical Association 72, 320-338.

Hildreth, C. and Houck, J. P. (1968). Some estimators for a linear
     model with random coefficients. Journal of the American
     Statistical Association 63, 584-595.

Huber, P. J. (1981). Robust Statistics. New York: John Wiley and Sons.

Jacquez, J. A., Mather, F. J. and Crawford, C. R. (1968). Linear
     regression with non-constant, unknown error variances: sampling
     experiments with least squares and maximum likelihood estimators.
     Biometrics 24, 607-626.

Jacquez, J. A. and Norusis, M. (1973). Sampling experiments on the
     estimation of parameters in heteroscedastic linear regression.
     Biometrics 29, 771-780.

Jennrich, R. I. (1969). Asymptotic properties of nonlinear least
     squares estimators. Annals of Mathematical Statistics 40,
     633-643.

Jobson, J. D. and Fuller, W. A. (1980). Least squares estimation when
     the covariance matrix and parameter vector are functionally
     related. Journal of the American Statistical Association 75,
     176-181.

Matloff, N., Rose, R. and Tal, R. (1985). A comparison of two methods
     for estimating optimal weights in regression. Unpublished report.

McCullagh, P. (1983). Quasi-likelihood functions. Annals of
     Statistics 11, 59-67.

McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models.
     New York: Chapman and Hall.

Nel, D. G. (1980). On matrix differentiation in statistics. South
     African Statistical Journal 14, 87-101.

Nelder, J. A. and Pregibon, D. (1986). An extended quasi-likelihood
     function. Preprint.

Oppenheimer, L., Capizzi, T. P., Weppelman, R. M. and Mehta, H.
     (1983). Determining the lowest limit of reliable assay
     measurement. Analytical Chemistry 55, 638-643.

Patterson, H. D. and Thompson, R. (1971). Recovery of inter-block
     information when block sizes are unequal. Biometrika 58, 545-554.

Pritchard, D. J., Downie, J. and Bacon, D. W. (1977). Further
     consideration of heteroscedasticity in fitting kinetic models.
     Technometrics 19, 227-236.

Raab, G. M. (1981). Estimation of a variance function, with
     application to radioimmunoassay. Applied Statistics 30, 32-40.

Rodbard, D. and Frazier, G. R. (1975). Statistical analysis of
     radioligand assay data. Methods of Enzymology 37, 3-22.

Rothenberg, T. J. (1984). Approximate normality of generalized least
     squares estimates. Econometrica 52, 811-825.

Ruppert, D. and Carroll, R. J. (1980). Trimmed least squares
     estimation in the linear model. Journal of the American
     Statistical Association 75, 828-838.

Sadler, W. A. and Smith, M. H. (1985). Estimation of the
     response-error relationship in immunoassay. Clinical Chemistry
     31/11, 1802-1805.

Serfling, R. J. (1980). Approximation Theorems of Mathematical
     Statistics. New York: John Wiley and Sons.

Stefanski, L. A. and Carroll, R. J. (1985). Covariate measurement
     error in logistic regression. Annals of Statistics 13, 1335-1351.

Taguchi, G. and Wu, Y. (1980). Introduction to Off-line Quality
     Control. Nagoya: Central Japan Quality Control Association.

Theil, H. (1971). Principles of Econometrics. New York: John Wiley
     and Sons.

Tiede, J. J. and Pagano, M. (1979). The application of robust
     calibration to radioimmunoassay. Biometrics 35, 567-574.

Wedderburn, R. W. M. (1974). Quasi-likelihood functions, generalized
     linear models and the Gauss-Newton method. Biometrika 61,
     439-447.

Williams, J. S. (1975). Lower bounds on convergence rates of weighted
     least squares to best linear unbiased estimators. In A Survey of
     Statistical Design and Linear Models, J. N. Srivastava, editor.
     Amsterdam: North-Holland.

Wolter, K. M. and Fuller, W. A. (1982). Estimation of nonlinear
     errors-in-variables models. Annals of Statistics 10, 539-548.