Carroll, R. J. and Ruppert, David (1986). "Diagnostics and Robust Estimation When Transforming the Regression Model and the Response."

Diagnostics and Robust Estimation When Transforming the Regression Model and the Response

R. J. Carroll and David Ruppert
Department of Statistics, University of North Carolina

Revised: September 1986

Abstract: In regression analysis the response is often transformed to remove heteroscedasticity. When a model already exists for the untransformed response, then it can be preserved by applying the same transform to both the model and the response. This methodology, which we call "transform both sides", has been applied in several recent papers, and appears highly useful in practice. When a parametric transformation family such as the power transformations is used, the transformation can be estimated by maximum likelihood. The MLE, however, is very sensitive to outliers. In this article, we propose diagnostics to indicate cases influential for the transformation or regression parameters. We also propose a robust bounded-influence estimator similar to the Krasker-Welsch regression estimator.
Acknowledgement

The research of Professor Carroll was supported in full and the research of Professor Ruppert was supported in part by Air Force Office of Scientific Research Contract AFOSR-S-49620-85-C-0144. Professor Ruppert's research was also partially supported by National Science Foundation Grant DMS-8400602. We thank the referees for their helpful criticism and the encouragement to revise this article. We also thank Brian Aldershof for producing the figures.
1. Introduction
In regression analysis, the response y is often transformed for two distinct purposes: to induce normally distributed, homoscedastic errors and to improve the fit to some simple model involving an explanatory variable x. In many situations, however, y is already believed to fit a known model f(x,β), β being a p-dimensional parameter. If a transformation of y is still needed to remove skewness and/or heteroscedasticity, then this model can be preserved by transforming y and f(x,β) in the same manner. Specifically, let y^(λ) be a transformation indexed by the parameter λ and assume that for some value of λ

(1)    y_i^(λ) = f^(λ)(x_i,β) + σ ε_i,

where ε_1, ..., ε_N are independent and at least approximately normally distributed with variance 1. Notice the difference between (1) and the usual approach of transforming only the response, not f(x,β), i.e.,

(2)    y_i^(λ) = f(x_i,β) + σ ε_i.

It should be emphasized that model (1) is not intended as a substitute for (2). Both models are appropriate, but under quite different circumstances. Model (2) has been amply discussed by Box and Cox (1964) and others, e.g., Draper and Smith (1981), Cook and Weisberg (1982), and Carroll and Ruppert (1981).
Typically, in model (2), f(x,β) is linear, but in principle nonlinear models can be used. Model (1), which we call "transform both sides", has been investigated by Carroll and Ruppert (1984), Snee (1986), and Ruppert and Carroll (1985), and we will only summarize those discussions. According to (1), f(x,β) has two closely related interpretations: f(x,β) is the value of y when the error is zero, and it is the median of the conditional distribution of y given x. In Carroll and Ruppert (1984), we were concerned with situations where a physical or biological model provides f(x,β), but where the error structure is a priori unknown. Examples by Snee (1986), Carroll and Ruppert (1984), Ruppert and Carroll (1985), and Bates, Wolf, and Watts (1986) show that transforming both sides can be highly effective with real data, both when a theoretical model is available and, as Snee shows, when f(x,β) is obtained empirically.
'1 Lj'..Jf)-3"':, o~
By estimating
II,
0,
~·$;"'~).di'E,;f':
and p
simultaneously.
;.f~.:~t23L;'} 't~,~)('
i --r:,:Cl"1
fitting the original response y to f(x,P),
rather
than
simply
t-f~l.rl
we achieve two purposes.
First, p is estimated efficiently and therefore we obtain an efficient
c~'
F:"0i£1~L~:tp~1~fi-i ,..;;~.'j~}
·!l.nt~
estimate of the conditional median of y.
~
.2'1:.
Second. we model the entire
.1Jejr-'
condtional distribution of y given x. and.
::c ( .; ~t I
f·d:·~··~~f;:
b
in particular, we have a
Sif i:JtlO'1,(.'
model which can account for the skewness and heteroscedasticity in the
data.
Carroll and Ruppert (1984) discuss the importance of modeling the
or:",,:.]
·';_ti~~&·.~, ~
conditional distribution of y in a
. ",,',
~;,
'J ~
special case.
analysis of the Atlantic menhaden population.
a spawner-recruit
In section 6 we discuss
the estimation of the conditional mean and conditional quantiles of y
given x.
t
Page
5
Many data sets we have examined have had severe outliers in the untransformed response y, but not in the residuals e_i(β,λ) = [y_i^(λ) - f^(λ)(x_i,β)]; the transformation has accommodated, or explained, the outlying y's. There is still the danger, however, that a few outliers in y can greatly affect β̂ and λ̂. Outliers should not be automatically deleted or downweighted, especially when they appear to be part of the normal variation in the response, but it should be standard practice to detect and scrutinize influential cases.

When influential cases are present and they have an unacceptable effect on the MLE, then the best remedy is not simply to delete these cases but rather to apply a robust estimator. There are several reasons why this is so. First, as Hampel (1985) illustrates, robust estimates are generally somewhat more efficient than outlier rejection combined with a classical estimator such as the MLE. Moreover, outlier rejection affects the sampling distribution of classical methods in ways that have not been fully studied. In contrast, the large-sample distribution of most robust estimators, in particular the M-estimators which will be used here, can be easily calculated.
In this paper we propose a case-deletion diagnostic and a "bounded-influence" estimator.

Case deletion diagnostics for linear regression are discussed in Belsley, Kuh, and Welsch (1980) and Cook and Weisberg (1982), and have been extended to the response transformation model (2) by Cook and Wang (1983) and Atkinson (1986). The last two papers approximate the change in λ̂ as single cases or subsets of cases are deleted.
Another approach to influence diagnostics is measuring changes in the statistical analysis under infinitesimal perturbations in the model. Cook (1986) gives an introduction to this theory, which he calls "local influence". We will not consider local influence for transformation models, but this seems a promising area for research.

When the response transformation model is used, β̂ depends heavily on λ̂, and it seems better to examine influence for β only after λ has been determined. In contrast, under the "transform both sides" model β and λ are only weakly related, with β determined by the median of the untransformed response, and λ determined by the skewness and heteroscedasticity. Therefore, influence for β and λ can be treated simultaneously.
For the "transform both'sidea.?liIllGdelwe propose two approximations
T T
to the changes in 9 = (P ,A) as cases are deleted.
next section, the
MLE"}'~aDtd)tt, fOWld
step of an iterative procedure for
As shown in the
a'l",the eleast"?:squares estimate of a
fi.nMmJ'(,;tb~!;estimate
without that
case.
procedure,
in effect udng
matr ix
the
of
approximation is
SUIL
'9Q
~QCAl1i'at"'11J.PPJ'oxilllation
of.- "'SQUarnrsd'D.tt;:x1:hed:pseudo-lIodel.
to the Hessian
The
second
and is equivalent
basedon.Atk1ilsOllJ~II"qulckfestimate",
to using one step of the Gauss.-Newton ratherL than the Newton-Raphson
algorithm.
It is considerably less accurate than the first, but is more
easily implemented on standard software and is useful for diagnostic
purposes.
Subset deletion is simple in theory but can be unwieldy in practice because of the large number of possible subsets. If influential subsets are to be detected, some strategy is needed to search for them. An alternative to subset deletion, and a good supplement to single case deletion diagnostics, is to compare the fit of a highly robust estimator with that of the MLE. In the example of section 6, a robust estimator reveals an observation whose influence was masked by another observation and could not be detected by single case-deletion diagnostics.
Bounded-influence estimators place a bound on the influence function of each observation. Bounded-influence regression estimators have been proposed by Krasker (1980), Hampel (1978), and Krasker and Welsch (1982). The last paper and Hampel et al. (1986) provide a good overview. Huber (1983) has questioned the value of bounded-influence regression estimation, but it is for apparently gross response outliers that bounded-influence estimation can do the most good. Response outliers can be caused by gross errors in the measurement or recording of the explanatory variables x. In addition, response outliers can be caused by model breakdown. Both an incorrect model for the median response, or an incorrect specification of the variance function, say a constant variance model where the variance actually depends on the median or mean, will lead to response outliers. In the latter case, the y may be outlying only relative to the assumed variance, not the true variance, but this is enough to cause problems. For all these reasons, outlying y values are more likely at unusual x values.
Huber's objection to bounded-influence estimation follows logically from his model that the errors are identically distributed with a heavy-tailed density, but this model is unrealistic in many modeling situations.
Moreover, even if one accepts Huber's conclusions about regression modeling, they apply only when there is no transformation parameter to estimate. The analog of leverage for λ is the derivative of the residual e_i(θ) with respect to λ. A response outlier will usually make this derivative large. Therefore, response outliers can induce high leverage points in transformation models even at x's that are in no way unusual.
Carroll and Ruppert (1985) obtained a bounded-influence transformation (BIT) estimator by extending the Krasker-Welsch estimator to the response transformation model (2). In this paper we adapt this estimator to the "transform both sides" model. We also discuss computational aspects and propose a simple one-step estimator that can be implemented with standard software packages such as SAS.
2. Weighted Maximum Likelihood Estimation

All estimators used in this paper are found by maximizing a weighted log-likelihood. When the weights are identically 1, the estimator is the MLE. The robust estimators introduced in section 4 are weighted MLE's with the weights less than 1 for influential cases and equal to 1 otherwise.

Let w_1, ..., w_N be fixed weights. For now, they will depend on the y's but not the parameter θ = (β^T, λ)^T. In section 4 the weights will depend on θ, but θ will be fixed at a preliminary estimate θ_p.

Throughout this paper y^(λ) is the modified power transformation family used by Box and Cox (1964):

    y^(λ) = (y^λ - 1)/λ   if λ ≠ 0,
          = log(y)        if λ = 0.
log(y) if A = O.
Our analysis will be oondlt1onalort.: the observed x' s.
This is
appropriate both for fixed and random x IS • • ILetc 1j9 = (pT. A ) T be the
vector
of
the
transfor.a~4oDQ and;~2':'regress1Da..w'paraJleters,
x ..
be
log-likelihood for y'i is: ,:
1
11.:
2
2 -1
2
[ e i (9 )] ,
- (1 / 2) log (21(0 ) + ( A - 1) log (y 1) - ( 20)
where
and
let
The
Page
10
The weighted log-likelihood for Yl .··· 'YN is
(3)
For fixed 9
maxiJRizes L(9.a) over 9.
Let Y be the weighted geometric mean of
YI .··· 'YN defined by
L.
,(t~'iZ)
(4)
ax
(8) : L(9,a('»
\ A)..
1
: f.. i ,
l'
'1·
"
,I;:
N
A2
Xi=l wi {- (1/2)log(2Xa ) + (A - l)log(y i ) - l/2}
Since the weights wi are fixed, 9 minimizes
(5)
=
•
SS(9 )
Page
Following Box and Cox (1964), θ̂ can be computed as follows. For fixed λ, minimize SS(θ) in β by ordinary (typically nonlinear) least-squares and call the minimizer β̂(λ). Plot L_max(β̂(λ),λ) on a grid and maximize graphically or numerically. This technique is particularly attractive when f is not transformed and f(x,β) = x^T β, for then (5) can be minimized in β by linear least-squares. When transforming both sides, this technique is less attractive computationally, but for the unweighted MLE it does give the confidence interval

    { λ : L_max(β̂(λ),λ) ≥ L_max(θ̂) - (1/2) χ²_1(1-α) },

where χ²_1(1-α) is the (1-α) quantile of the chi-square distribution with one degree of freedom.

Minimizing (5) simultaneously in λ and β is straightforward with standard nonlinear least-squares software. One first creates a dummy variable 0_i which is 0 for all cases. Define

    z_i(θ) = e_i(θ) / ẏ^{λ-1} = [ y_i^(λ) - f^(λ)(x_i,β) ] / ẏ^{λ-1}.

One fits the "pseudo-model"

(6)    0_i = z_i(θ) + error,

with "response" 0_i, "regression function" z_i(θ), "regression parameter" θ, and "independent variable" (y_i, x_i). The dummy variable 0_i is created because nonlinear least-squares software typically does not allow the response to depend on the parameters. In (6) the real response is incorporated into the model so that it can be transformed by λ.
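As an illustration, a minimal Python sketch of this computation (our own naming and choice of optimizer; it assumes scipy and a user-supplied mean function f, and is a sketch rather than the software described above) minimizes (5) directly, which is equivalent to fitting the pseudo-model (6):

    import numpy as np
    from scipy.optimize import least_squares

    def power_transform(y, lam):
        """Modified power transformation of Box and Cox (1964)."""
        return np.log(y) if lam == 0 else (y**lam - 1.0) / lam

    def tbs_mle(f, x, y, theta0, w=None):
        """Minimize SS(theta) in (5) over theta = (beta, lambda)."""
        y = np.asarray(y, dtype=float)
        w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
        ydot = np.exp(np.sum(w * np.log(y)) / np.sum(w))   # weighted geometric mean (4)
        def z(theta):
            beta, lam = theta[:-1], theta[-1]
            e = power_transform(y, lam) - power_transform(f(x, beta), lam)
            return np.sqrt(w) * e / ydot**(lam - 1.0)      # pseudo-model residuals z_i
        fit = least_squares(z, theta0)                     # fits 0 = z_i(theta) + error
        return fit.x                                       # (beta_hat, lambda_hat)

For the Ricker model of section 6, for instance, one could pass f = lambda S, beta: beta[0] * S * np.exp(beta[1] * S).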
The standard error of λ̂ that is output when fitting (6) with a least-squares package should not be used. It is not the same as the standard error from the inverse Fisher information, and it does not consistently estimate the large-sample standard deviation of λ̂; see section 5.
3. Di&postic8
A sillple and easily
interpr~.ted-' way
f,?f measuring the influence of
the ith case is to recoapute 9 with this case deleted.
Since nonlinear
approximations to these case-deletion diagnostics.
Let 9 and 9 (1)- ~ {'~~ .M~~ . ~~ ~ ~it~.,~~p 1 wi t~0ut the ith case, and
E
E
E
let od. = (od fJ i' od Ai) = 9 - 9 (i) be the exact change in 9 upon deletion
A
,.
A
-.
,
1
of this case.
2
Let v~1i (~J a~d, ~v ~."j~'), ,b~ ,~he.._,gradient and Hessian of
~
zi(9).
E
The first step.}pJ:h~H~PPf8~f,a~~~n~}t9,4i.Willbe to ignore the
change in y when y i is,~ele~~<!;'il~Pf~t~n~)~r:evious section, 9 (i) is the
solution to
(7)
o.
Page
An approximate solution to (7) is obtained by taking one step of the Newton-Raphson algorithm beginning at θ̂. This requires the differential of the left-hand side of (7), which is

(8)    B_(i)(θ) = Σ_{j≠i} [ ∇z_j(θ) ∇z_j(θ)^T + z_j(θ) ∇²z_j(θ) ].

Since Σ_{j=1}^N z_j(θ̂) ∇z_j(θ̂) = 0, the left-hand side of (7) at θ̂ equals -z_i(θ̂) ∇z_i(θ̂), and the one-step approximation to θ̂_(i) is θ̂ + B_(i)(θ̂)^{-1} z_i(θ̂) ∇z_i(θ̂).

If ẏ is held fixed, y_j enters z_j(θ) only through y_j^(λ)/ẏ^{λ-1}, a term that does not involve β. Thus only the (λ,λ) element of ∇²z_j(θ) depends on y_j, and the remaining entries of ∇²z_j(θ) are functions of x_j alone. Therefore, letting θ_0 be the true parameter, E[z_j(θ_0)] ≈ 0 implies that z_j(θ) ∇²z_j(θ) can be approximated by the matrix D_j(θ) whose only nonzero entry is the (λ,λ) element z_j(θ) ∂²z_j(θ)/∂λ². Letting

(9)    B_j = B̂ - [ ∇z_j(θ̂) ∇z_j(θ̂)^T + D_j(θ̂) ],   where   B̂ = Σ_{k=1}^N [ ∇z_k(θ̂) ∇z_k(θ̂)^T + D_k(θ̂) ],

we have B_(j)(θ̂) ≈ B_j. The approximate influence diagnostic for the ith case is

    Δ_i^A = - B_i^{-1} z_i(θ̂) ∇z_i(θ̂).
The inverse of B_j can be easily computed using the well-known identity [Rao (1973, page 33)]

(10)    (A - u v^T)^{-1} = A^{-1} + A^{-1} u v^T A^{-1} / (1 - v^T A^{-1} u),

which holds for nonsingular p x p matrices A and p-dimensional vectors u and v such that v^T A^{-1} u ≠ 1 (so that A - u v^T is nonsingular). Let

    u_j = (0, ..., 0, | z_j(θ̂) ∂²z_j(θ̂)/∂λ² |^{1/2})^T,   v_j = sign( z_j(θ̂) ∂²z_j(θ̂)/∂λ² ) u_j,

and let C_j = B̂ - ∇z_j(θ̂) ∇z_j(θ̂)^T, so that B_j = C_j - u_j v_j^T. Using (10) twice we have

(11)    C_j^{-1} = B̂^{-1} + B̂^{-1} ∇z_j(θ̂) ∇z_j(θ̂)^T B̂^{-1} / (1 - ∇z_j(θ̂)^T B̂^{-1} ∇z_j(θ̂)),

and then

(12)    B_j^{-1} = C_j^{-1} + C_j^{-1} u_j v_j^T C_j^{-1} / (1 - v_j^T C_j^{-1} u_j).

Using (11) and (12) allows us to compute B_1^{-1}, ..., B_N^{-1} using only the single matrix inversion needed to calculate B̂^{-1}.
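A minimal Python sketch of this computation (our own naming; z, grad_z, and the second λ-derivatives of the z_i are assumed to be supplied, all evaluated at θ̂) is:

    import numpy as np

    def sherman_morrison_inv(A_inv, u, v):
        """Inverse of (A - u v^T) from A^{-1}, identity (10)."""
        Au = A_inv @ u
        return A_inv + np.outer(Au, v @ A_inv) / (1.0 - v @ Au)

    def accurate_deletion_diagnostics(z, grad_z, d2z_dlam2):
        """Rows approximate Delta_i^A; d2z_dlam2 holds the second lambda-derivatives."""
        N, p1 = grad_z.shape
        c = z * d2z_dlam2                      # the (lambda,lambda) entry of D_j
        B = grad_z.T @ grad_z
        B[-1, -1] += c.sum()                   # B-hat of equation (9)
        B_inv = np.linalg.inv(B)               # the single matrix inversion needed
        e = np.zeros(p1); e[-1] = 1.0
        sq = np.sqrt(np.abs(c))
        delta = np.empty((N, p1))
        for j in range(N):
            Cj_inv = sherman_morrison_inv(B_inv, grad_z[j], grad_z[j])               # (11)
            Bj_inv = sherman_morrison_inv(Cj_inv, sq[j] * e, np.sign(c[j]) * sq[j] * e)  # (12)
            delta[j] = -Bj_inv @ (z[j] * grad_z[j])   # Delta_j^A
        return delta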
If we ignore the D_j(θ̂) in (9), then we obtain an even simpler, but less accurate, approximation Δ_i^Q, which is analogous to the "quick estimate" diagnostic used by Atkinson (1986) for the response transformation model. The advantage of Δ_i^Q is that it can be computed using standard software packages that calculate linear regression diagnostics. One creates a linearized model with the pseudo-model residuals

    (-z_1(θ̂), ..., -z_N(θ̂))^T

as the vector of dependent variables and

    (∇z_1(θ̂), ..., ∇z_N(θ̂))^T

as the design matrix. Then Δ_i^Q, i = 1, ..., N, are the diagnostics DFBETA of Belsley, Kuh, and Welsch (1980, page 13).
Some software packages compute a scaled version, DFBETAS, which is equally useful for diagnostics. Cook's D or DFFITS from the linearized model can be used as measures of total influence for β and λ.
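A minimal Python sketch of Δ_i^Q (our own naming; the sign convention matches Δ_i = θ̂ - θ̂_(i)) is:

    import numpy as np

    def quick_deletion_diagnostics(z, grad_z):
        """Rows approximate Delta_i^Q = theta_hat - theta_hat_(i)."""
        X = np.asarray(grad_z, dtype=float)    # N x (p+1) gradients of z_i at theta_hat
        r = -np.asarray(z, dtype=float)        # pseudo-model residuals 0 - z_i(theta_hat)
        XtX_inv = np.linalg.inv(X.T @ X)
        h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverages of the linearized model
        # At theta_hat, X'r = 0, so the full-data OLS fit of the linearized model is
        # zero and its residuals equal r; DFBETA_i = (X'X)^{-1} x_i r_i / (1 - h_i).
        return (X @ XtX_inv) * (r / (1.0 - h))[:, None]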
If we used one step of the Gauss-Newton, rather than the Newton-Raphson, algorithm when solving (7), then we would obtain Δ_j^Q, not Δ_j^A. The difference between the Gauss-Newton and the Newton-Raphson algorithms is that the former uses an approximate Hessian, which in the present notation consists of ignoring the term Σ_{j=1}^N z_j(θ̂) ∇²z_j(θ̂) in the Hessian of the sum of squares. For regression models without a transformation parameter, the residuals are uncorrelated with their Hessians and the Gauss-Newton approximation is acceptable.
In the example of section 6 and in all other examples that we have examined, |Δλ_i^Q| substantially overestimates large values of |Δλ_i^E|, while Δ_i^A is a good approximation to Δ_i^E. The overestimation results from positive correlation between z_i(θ̂) and ∂²z_i(θ̂)/∂λ², causing Σ_{j=1}^N z_j(θ̂) ∂²z_j(θ̂)/∂λ² to be positive and of the same magnitude as Σ_{j=1}^N [ ∂z_j(θ̂)/∂λ ]².
If we use Δ_i^Q only as a diagnostic, then the overestimation of |Δλ_i^E| is not a serious problem and it does not prevent us from detecting influential cases.

The diagnostic Δ_i^E and its approximations are vectors showing influence separately for each parameter. An overall measure of influence of the ith case is the exact likelihood distance defined by Cook and Weisberg (1982, section 5.2) as
(13)    LD_i^E = 2[ L_max(θ̂) - L_max(θ̂_(i)) ] ≈ -(Δ_i^E)^T [ ∇²L_max(θ̂) ] Δ_i^E.

The approximation in (13) is also found in Cook and Weisberg (1982) and follows from a Taylor expansion using ∇L_max(θ̂) = 0. We can define an accurate approximation LD_i^A by replacing Δ_i^E in (13) with Δ_i^A. As a quick approximation LD_i^Q one can use Δ_i^Q and replace -∇²L_max(θ̂) by a constant multiple of the cross-products matrix of the linearized model. From the above discussion it is clear that LD_i^Q will not be an accurate approximation to LD_i^E.
"
4. Robust Bstlaatlon
Once influential cases have been identified, we must decide how
they should be treated.
In some situations, the statistician will feel
Page
18
that they are valid data, that they do not indicate model deficiencies,
and that they should be allowed full influence on the analysis.
Then
the MLE can be used.
In other situations,
gross errors.
the influential cases will be suspected as
Alternatively, the influential cases may indicate a model
deficiency, but either the sparsity of data near their x values will not
allow a better model to be developed or the data analyst will hesitate
to add complexity to the model merely to accommodate one or at most a
few observations.
Then a robust estimator should be used.
This section is concerned with the robust estimation of θ. The scale parameter σ can be estimated separately with a robust scale functional, e.g., the median absolute deviation (MAD) applied to the residuals from a robust fit. Let s_i(y_i,θ) = ∇l_i(y_i,θ,σ̂(θ)) be the score function, i.e., the gradient of the log-likelihood for y_i. The weighted MLE satisfies

    Σ_{i=1}^N w_i s_i(y_i,θ̂) = 0.

The ordinary (unweighted) MLE is sensitive to cases with large values of s_i: in particular, to cases with large values of the residual e_i(β,λ), of ∂f^(λ)(x_i,β)/∂β, or of ∂e_i(β,λ)/∂λ, corresponding to response outliers, high leverage points, and points having high influence for λ, respectively.
A robust bounded-influence estimator can be found by letting w_i decrease as some norm of s_i(y_i,θ) increases. To do this the weights must be allowed to depend on θ. Thus we define the estimator θ̂ as the solution to

(14)    Σ_{i=1}^N ψ_i(y_i,θ) ≡ Σ_{i=1}^N w_i(y_i,θ) s_i(y_i,θ) = 0,

where w_i is a suitable scalar weighting function and, as in section 2, σ̂²(θ) is the weighted variance of the residuals. The minimum requirement for robustness is to have ψ_i(y,θ) bounded as a function of i, y, and θ. Otherwise, a single outlier can cause an arbitrarily large change in θ̂.
It should be mentioned that bounded-influence estimators may not have a high breakdown point. The breakdown point is the largest percentage of contamination that an estimator can tolerate before it can be overwhelmed by the contaminants. This means that if the percentage of contaminants exceeds the breakdown point, then the estimate can be forced to take an arbitrary value by choosing the contaminants in a sufficiently nasty way. The estimators that we define here are related to the Krasker-Welsch regression estimator, and like that estimator they will have a breakdown point at most 1/(p+1), where (p+1) is the total number of parameters. In the case of the linear model, Rousseeuw (1984) and Rousseeuw and Yohai (1984) have proposed estimators with nearly 50% breakdown points, the best possible. However, these estimators have poor efficiency.
Yohai (1985) shows how to achieve both high asymptotic efficiency and a near 50% breakdown point, but again only in the case of linear regression. In the future, we hope to develop similar estimators for transformation models. A major difficulty will be computational.
When choosing w_i, asymptotic efficiency measured by the covariance matrix of θ̂ must be balanced against robustness measured by the supremum of some norm of ψ_i(y,θ), the so-called gross-error sensitivity. For univariate parameters there is a unique optimal w_i which minimizes the asymptotic variance subject to a given bound on the gross-error sensitivity (Hampel 1968, 1974). For multivariate parameters such as θ, the balancing of robustness and variance raises philosophical questions, since there are many ways of comparing covariance matrices or of norming vector functions. Different norms on ψ_i(y_i,θ) give rise to different definitions of gross-error sensitivity.
The approach we take generalizes the bounded-influence Krasker-Welsch (1982) regression estimator. Whether the Krasker-Welsch estimator optimizes the asymptotic covariance matrix in any meaningful sense is an open question, but its efficiency at the normal model is usually close to that of the MLE (Ruppert 1985). Krasker and Welsch (1982) bound the so-called self-standardized gross-error sensitivity, which we denote by γ_s. We will describe γ_s only briefly; the interested reader is referred to the original paper of Krasker and Welsch or to Hampel et al. (1986, chapter 4) for further details.
First we note that the influence function of θ̂ satisfying (14) is B^{-1} ψ_i(y_i,θ), where

    B = - N^{-1} Σ_{i=1}^N ∇_θ E[ ψ_i(y_i,θ) ].

This definition of the influence function is conditional on x_1, ..., x_N, but coincides with the usual definition when the x's are independent and identically distributed and summation over i is replaced by expectation with respect to the x's. The definition of B is analogous to that on page 5 of Carroll and Ruppert (1985), where w is incorrectly squared. The asymptotic covariance matrix of θ̂ is M = B^{-1} A (B^{-1})^T, where

    A = N^{-1} Σ_{i=1}^N E[ ψ_i(y_i,θ) ψ_i(y_i,θ)^T ].
An intuitively reasonable way to norm the influence function is to use the asymptotic covariance matrix M; see Krasker and Welsch (1982) for further motivation and discussion. The resultant measure of sensitivity is defined as

    γ_s = sup_{i,y} || B^{-1} ψ_i(y,θ) ||_M,

where ||v||_M = (v^T M^{-1} v)^{1/2} for any vector v and positive definite matrix M. This definition of γ_s is analogous to equation (15) of Carroll and Ruppert (1985), where w² has been incorrectly omitted from the last term.
To robustify the MLE, we will choose the weights so that γ_s does not exceed a predetermined bound; γ_s must be at least (p+1)^{1/2} (Krasker and Welsch 1982). From experience with this and other problems we suggest bounding γ_s by a(p+1)^{1/2}, where "a" is between 1.1 and 1.6. The choice a = 1.5 worked well on the data set in section 6.

If an observation has low influence, then it should not be downweighted. Otherwise, it should be downweighted just enough to keep γ_s below the given bound. Therefore the weights should be

(15)    w_i(y_i,θ) = min{ 1, a(p+1)^{1/2} / || s_i(y_i,θ) ||_A }.
In (15) A must be replaced by an estimate Â. To calculate θ̂ we used a simple iterative scheme:

(1) Fix a > 1. Set c = 1. Let C be the total number of iterations that will be used. Let θ_p be a preliminary estimate, possibly the MLE. Set w_i = 1 for all i.

(2) Define Â = N^{-1} Σ_{i=1}^N w_i² s_i(y_i,θ_p) s_i(y_i,θ_p)^T.

(3) Using (15), update the weights: w_i = min{ 1, a(p+1)^{1/2} / || s_i(y_i,θ_p) ||_Â }.

(4) Using the methods of section 2, find the weighted MLE with these weights, and call it θ̂.

(5) If c < C, set θ_p = θ̂, c = c + 1, and return to step (2). Otherwise, stop.

It is possible to implement this algorithm with standard software packages, in particular SAS. Steps (2) and (3) can be computed with a matrix language such as PROC MATRIX in the 1982 version of SAS. Step (4) can be performed using a weighted nonlinear least-squares routine such as PROC NLIN in SAS. By using macros in SAS or a similar package, it is possible to put the matrix computations and the least-squares routines into an iterative loop. We initially used SAS, but now prefer the matrix programming language GAUSS.
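A minimal Python sketch of the iterative scheme (our own naming; `score` is assumed to return the N x (p+1) matrix of scores s_i(y_i,θ) and `weighted_mle` to be any routine, such as a weighted version of the section 2 least-squares fit, returning the weighted MLE) is:

    import numpy as np

    def bounded_influence_fit(score, weighted_mle, theta_mle, a=1.5, n_iter=10):
        theta_p = np.asarray(theta_mle, dtype=float)     # step (1): preliminary estimate
        p1 = theta_p.size                                # p + 1 parameters
        w = np.ones(score(theta_p).shape[0])             # step (1): w_i = 1 for all i
        for _ in range(n_iter):                          # steps (2)-(5)
            s = score(theta_p)                           # N x (p+1) scores at theta_p
            A = (s * (w**2)[:, None]).T @ s / len(w)     # step (2): A-hat
            norms = np.sqrt(np.einsum('ij,jk,ik->i', s, np.linalg.inv(A), s))
            w = np.minimum(1.0, a * np.sqrt(p1) / norms) # step (3): the weights (15)
            theta_p = weighted_mle(w, theta_p)           # step (4): weighted MLE
        return theta_p, w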
One or two iterations seem adequate for diagnostic purposes, but this algorithm sometimes converges slowly, particularly when there are extremely influential points. This was the case with the example in section 6, where the algorithm did not stabilize until ten iterations. Unfortunately, the slow convergence gave the appearance that the algorithm had converged after only two or three iterations.

We found that a fully iterative version of the algorithm could be easily implemented with GAUSS on an IBM PC/AT, and computation time for ten or more iterations was acceptable.
Although bounded-influence estimation limits the effect of any case on the estimate, all cases, regardless of how deviant from the bulk of the data, will have some influence. Hampel et al. (1986) discuss the need for robust estimators that completely reject extreme outliers. For estimation of a location parameter, this can be done with a redescending "psi-function". We can define an analog to a redescending psi-function here. Let ψ(x) be an odd function with ψ(x) ≥ 0 for x ≥ 0, and define the weights

(16)    w_i(y_i,θ) = ψ( || s_i(y_i,θ) ||_A ) / || s_i(y_i,θ) ||_A.
Equation (16) reduces to (15) if ψ(x) is the Huber psi-function

    ψ_b(x) = x             if |x| ≤ b,
           = b sign(x)     otherwise,

with b = a(p+1)^{1/2}. If for some R > 0, ψ(x) = 0 for all |x| > R, then extreme outliers are completely rejected.
In the example we use Hampel's three-part redescending psi-function

(17)    ψ(x) = x,                           0 ≤ x ≤ b_1,
             = b_1,                         b_1 ≤ x ≤ b_2,
             = b_1 (b_3 - x)/(b_3 - b_2),   b_2 ≤ x ≤ b_3,
             = 0,                           b_3 ≤ x,

extended to negative x as an odd function, with (b_1, b_2, b_3) = (p+1)^{1/2} (1.5, 3.5, 8.0).
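A minimal Python sketch of these weight functions (our own naming; it assumes the norms || s_i ||_A have already been computed and are positive) is:

    import numpy as np

    def hampel_psi(x, b1, b2, b3):
        """Hampel's three-part redescending psi-function, equation (17)."""
        ax = np.abs(x)
        out = np.where(ax <= b1, ax,
              np.where(ax <= b2, b1,
              np.where(ax <= b3, b1 * (b3 - ax) / (b3 - b2), 0.0)))
        return np.sign(x) * out

    def redescending_weights(norms, p1, scale=(1.5, 3.5, 8.0)):
        """Weights (16): psi(||s_i||_A)/||s_i||_A with (b1,b2,b3) = sqrt(p+1)*(1.5,3.5,8.0)."""
        b1, b2, b3 = np.sqrt(p1) * np.asarray(scale)
        return hampel_psi(norms, b1, b2, b3) / norms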
In section 6, the redescending estimate was computed by the same algorithm used to calculate the bounded-influence estimate. In step (1), the preliminary estimate was the bounded-influence estimate. In step (3), the weights were calculated using (16) and (17) instead of (15).
5. Estimating The Covariance Matrix of θ̂

The asymptotic covariance matrix of the MLE can be found by (i) inverting the Fisher information matrix or (ii) using the covariance matrix of the influence function, the covariance being with respect to the empirical distribution of (y_i, x_i), i = 1, ..., N. Method (ii) is based on the asymptotic theory of M-estimation; see for example Hampel et al. (1986, section 4.2c). One advantage of (ii) is that the asymptotic covariance is consistently estimated even if the ε_i are not normally distributed (Huber 1967) or do not have a constant variance. In fact, method (ii) is similar to the jackknife, which Wu (1986) advocates as a consistent estimate of the least-squares covariance matrix under heteroscedasticity. Moreover, method (ii) can be used for the bounded-influence and redescending estimators as well.
Method (i): The observed Fisher information matrix for (θ,σ) is

(18)    - ∇² L(θ̂,σ̂),

where the Hessian is taken with respect to (θ,σ). We could invert (18) and then take the upper left (p+1) x (p+1) corner corresponding to θ, but by Patefield (1977, 1985) this is equivalent to the easier computation of inverting the (p+1) x (p+1) matrix -∇²L_max(θ̂), where ∇² now is the Hessian with respect to θ only.
Method (ii): The MLE, the bounded-influence estimator, and the redescending estimator are all defined by an equation of the form

(19)    Σ_{i=1}^N ψ_i(y_i,θ) = 0,

and only differ in the choice of the weights. The asymptotic covariance matrix of any estimator solving an equation of the form (19) can be estimated by B̂^{-1} Â (B̂^{-1})^T, where

    B̂ = - Σ_{i=1}^N ∇_θ ψ_i(y_i,θ̂)   and   Â = Σ_{i=1}^N ψ_i(y_i,θ̂) ψ_i(y_i,θ̂)^T.
The asymptotic theory of M-estimation also shows that the standard errors obtained when fitting the pseudo-model (see section 2) are incorrect. The pseudo-model finds the MLE by minimizing Σ_{i=1}^N z_i²(y_i,θ), or solving

(20)    Σ_{i=1}^N z_i(y_i,θ) ∇z_i(y_i,θ) = 0.

When the pseudo-model is fit by a nonlinear least-squares package, the estimated covariance matrix is

(21)    σ̂² [ Σ_{i=1}^N ∇z_i(y_i,θ̂) ∇z_i(y_i,θ̂)^T ]^{-1}.

The asymptotic covariance of the solution to (20) is consistently estimated by

(22)    B̂_LS^{-1} Â_LS (B̂_LS^{-1})^T,

where

    B̂_LS = Σ_{i=1}^N ∇[ z_i(y_i,θ̂) ∇z_i(y_i,θ̂) ]   and   Â_LS = Σ_{i=1}^N z_i²(y_i,θ̂) ∇z_i(y_i,θ̂) ∇z_i(y_i,θ̂)^T.

It is not hard to see that (21) and (22) have different limits, so that (21) is inconsistent. In practice (21) and (22) can be considerably different, especially in the estimated variance of λ̂.
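A minimal Python sketch contrasting (21) and (22) (our own naming; z and grad_z are the pseudo-model residuals and gradients at θ̂, and B_LS would typically be obtained by numerical differentiation) is:

    import numpy as np

    def naive_ls_covariance(z, grad_z):
        """Equation (21): what a least-squares package reports; inconsistent here."""
        sigma2 = np.mean(z**2)          # packages may divide by N - p - 1 instead
        return sigma2 * np.linalg.inv(grad_z.T @ grad_z)

    def m_estimation_covariance(z, grad_z, B_LS):
        """Equation (22): the consistent sandwich estimate for the solution of (20)."""
        A_LS = (grad_z * (z**2)[:, None]).T @ grad_z   # sum of z_i^2 grad z_i grad z_i^T
        B_inv = np.linalg.inv(B_LS)
        return B_inv @ A_LS @ B_inv.T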
6. An Example

In this section we look at an example. Our goals are (a) to see the type of data-analytic information that can come from the diagnostics and the robust estimators, (b) to see how well the robust estimators handle outlying data, and (c) to compare the accuracy of the "accurate" approximation Δ_i^A to the "quick" approximation Δ_i^Q.
When managing a fish stock, one must model the relationship between the annual spawning stock size and the eventual production of new catchable-sized fish (returns or recruits) from the spawning. Ricker and Smith (1975) give numbers of spawners (S) and returns (R) from 1940 until 1967 for the Skeena River sockeye salmon stock. Using some simple assumptions about factors influencing the survival of juvenile fish, Ricker (1954) derived the theoretical model

    R = f(S,β) = β_1 S exp(β_2 S)

relating R and S. Other models have been proposed, e.g., by Beverton and Holt (1957). However, the Ricker model appears to fit adequately and, in particular, gives almost the same fit to this stock as the Beverton-Holt model.
From Figure 1, a plot of R against S, it is clear that recruitment is highly variable and heteroscedastic, with the variance of R increasing with its mean. Several cases appear somewhat outlying, in particular #5, #18, #19, and #25.

An index plot of (LD_i^A)^{1/2} is found in Figure 2. Clearly case #12 stands out as the most influential by this measure, and #5, #19, and #25 also stand out. We will examine these cases further. The exact case-deletion statistics and the approximations Δ_i^A and Δ_i^Q are given in table 1 for these four cases.
Case #12 has a high influence on all three parameters. In particular, Δλ_12^E = .51, showing that the MLE of λ decreases from .31 to -.2 when #12 is deleted. Why is #12 so influential? To see why, look again at Figure 1. Observation #12 is not far removed from the median relative to the variation in all the data, but it is quite far removed relative to the variation in the other data with similar values of the independent variable S.
The 27 data points besides #12 suggest that the data are extremely heteroscedastic, with the variation in recruitment increasing very rapidly with the median recruitment. The effect of #12 is to increase the apparent recruit variation for low median recruitment and to suggest that the heteroscedasticity is not nearly so pronounced. Seen in this light, the much more severe transformation (λ = -.2 instead of λ = .31) when #12 is deleted is not surprising.

Besides suggesting less heteroscedasticity than seen in the remainder of the data, #12 has a large negative residual which suggests less right skewness as well.
To further analyze the influence of #12 on λ̂ we will introduce two alternative estimators of λ. These will be discussed fully in a forthcoming paper by Aldershof and Ruppert. For fixed λ, let β̂(λ) be the MLE of β and define e_i(λ) = e_i(β̂(λ),λ). The skewness estimator λ̂_sk is the value of λ such that the skewness coefficient of the residuals {e_i(λ)} is 0. The heteroscedasticity estimator λ̂_het is the value of λ such that the correlation between {e_i²(λ)} and {log f(x_i,β̂(λ))} is 0. When case #12 is omitted the value of λ̂_sk only changes from .58 to .46, but λ̂_het changes much further, from .46 to -.86. The major effect of deleting #12 is to increase the apparent heteroscedasticity.
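A minimal Python sketch of these two estimators (our own naming; residuals(lam) and medians(lam) are assumed to refit β̂(λ) internally, and each bracket [lo, hi] is assumed to contain a sign change) is:

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import skew, pearsonr

    def lambda_skew(residuals, lo=-2.0, hi=2.0):
        """lambda_sk: the residual skewness coefficient equals zero."""
        return brentq(lambda lam: skew(residuals(lam)), lo, hi)

    def lambda_het(residuals, medians, lo=-2.0, hi=2.0):
        """lambda_het: corr(e_i^2, log f(x_i, beta_hat(lam))) equals zero."""
        g = lambda lam: pearsonr(residuals(lam)**2, np.log(medians(lam)))[0]
        return brentq(g, lo, hi)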
Case #12 was the year 1951, when a rock slide drastically reduced recruitment (Ricker and Smith 1975). For this reason we are quite comfortable downweighting it severely. In fact, it seems best to reject #12 entirely. This, in effect, is what the redescending estimator does; see below.
Compared to case #12, cases #5, #19, and #25 all have the opposite effect on λ̂. Deleting any of them decreases the apparent recruitment variance when the number of expected recruits is large, in effect suggesting less heteroscedasticity. For this reason the change in λ̂ upon deletion is positive for i = 5, 19, and 25.
The effect of #12 on β̂ is not as easily analyzed as its effect on λ̂. Deleting #12 increases β̂_1 from 3.29 to 3.77 and decreases β̂_2 from -7.00 to -9.54. These effects tend to cancel, but not completely. As shown in Figure 3, the net effect of including #12 is a decrease in f(S,β̂) for small S and an increase for large S. The former effect is plausible since #12 has a low value of S and a negative residual. The increase in f(S,β̂) for large S is not so plausible, but it is a consequence of using the Ricker model for the median recruitment.
Having analyzed the exact changes Δ_i^E, we turn to the accuracy of the approximations Δ_i^A and Δ_i^Q. The accuracy of Δ_i^Q is poor for cases #5 and #12, especially #12, and these are precisely the cases of interest. The same is true of the approximations to LD_i^E; the quick approximation is rather inaccurate. The quick estimate does better for cases #19 and #25, but this is merely fortuitous. The overestimation by the quick estimate is small here and cancels the error from not recalculating ẏ after the case-deletion.
To see this we can examine the changes in the MLE induced by case-deletion with ẏ kept equal to the geometric mean of all the y's. These changes are -.17, .77, -.013, and -.011 for i = 5, 12, 19, and 25, respectively. In all four cases, the change is closer to Δλ_i^A than to Δλ_i^Q.
We calculated 20 iterations of the bounded-influence estimate with the tuning constant a = 1.5, and using this as a starting value we calculated 10 iterations of the redescending estimate with (b_1, b_2, b_3) = (p+1)^{1/2} (1.5, 3.5, 8.0).
The bounded-influence estimate changed rapidly for the first five iterations, more slowly for the next five, and then was stable for the last ten iterations. To show the behavior of the algorithm, the values of θ̂ and the non-unity values of w_i are given in table 2 for iterations 1, 2, 3, 4, 5, 10, and 20. It is interesting to examine w_4. This is 1 for the first three iterations but eventually decreases to .65, less than w_19 and w_25. Case #4 is influential, but its influence is masked by #12. Robust estimation or subset deletion diagnostics are necessary to detect the influence of #4.
The values of θ̂ and non-unity values of w_i are also given in table 2 for the redescending estimator. Case #12 is completely rejected, i.e., w_12 = 0, for all iterations. The value of w_4 decreases slowly for three iterations and then jumps downward to almost 0 on the fourth iteration. With #4 nearly deleted, the weights w_16, w_19, and w_25 readjust on the fifth and sixth iterations. The redescending estimate is stable after the sixth iteration.
The first iteration of the redescending estimator is nearly identical to the MLE without #12. After #4 is strongly downweighted, λ̂ decreases to about -.3, which suggests even stronger heteroscedasticity. As #12 is completely rejected and #4 is nearly completely rejected, #16 becomes slightly influential and is slightly downweighted.
We do not necessarily advocate using the fully-iterated redescending estimator. The bounded-influence estimator or the first iterate of the redescender give a good fit to the bulk of the data and reject the outlier #12. Without further information about these data, we are not comfortable downweighting #4 and #16 as much as the fully-iterated redescending estimator downweights them.
If forced to choose one estimate, we would choose the one-step redescender. The residuals {e_i(θ̂)} from this estimate are plotted in Figure 4. Ignoring #12, the remaining residuals show only slight heteroscedasticity and almost no skewness.
The redescending estimator completely rejects #12, so there is no need to refit with this anomalous case removed. However, it is instructive to see what happens if this is done. Both the bounded-influence and the redescending estimators converge rapidly, with the first iterate equal for practical purposes to the fully-iterated estimate. This suggests that the algorithm converges slowly only in the presence of an extremely influential point such as #12.
In table 3 we give the standard errors of the MLE by method (i), inverting the observed Fisher information matrix. We also give the method (ii) standard errors for the MLE, the one-step and iterated (c = 10) bounded-influence estimator, and the one-step and fully iterated (c = 10) redescending estimator.
The method (i) and method (ii) standard errors of the MLE of λ are moderately different, but both are considerably smaller than the standard error of λ̂ for the robust estimators. Downweighting #12 seems to increase the variability of λ̂, but since #12 is known to be an unusual year it seems wise to use a robust estimator despite the higher variability. The standard errors of β̂_1 and β̂_2 are smaller for the robust estimators than for the MLE.
At least in this example, robust estimation appears to cause a loss in efficiency for λ̂ but a gain in efficiency for β̂. However, this loss in efficiency for λ̂ is only apparent, not real. There are two ways of looking at this, either conditioning or not conditioning on the event that a rock slide occurred in 1951. Conditional on there being exactly one rock slide and it occurring in a year when the number of spawners was low, the MLE has a large mean square error relative to the robust estimators. If we do not condition on the occurrence of exactly one slide, but rather admit that some other number of slides could have occurred and that these could have occurred at any years, then it is clear that the MLE is really much more variable than its standard error shows. Notice that Δλ_12^E is large relative to the standard errors of λ̂: the change in λ̂ when #12 is deleted is over twice the standard error of the MLE and about 1.65 times the standard error of the one-step redescending estimator.
Table 1. Deletion diagnostics for the most influential cases. The superscripts denote the method of calculation: E = exact, A = accurate approximation, Q = quick approximation.

                            CASE NUMBER
    DIAGNOSTIC       5        12        19        25
    Δλ_i^E         -.10       .51     -.034      .033
    Δλ_i^A         -.14       .61     -.015     -.015
    Δλ_i^Q         -.32      1.57     -.036     -.034
    Δβ1_i^E        -.22      -.48      .08       .12
    Δβ1_i^A        -.22      -.45      .11       .15
    Δβ1_i^Q        -.34      -.43      .10       .14
    Δβ2_i^E        1.48      2.54    -1.02     -1.25
    Δβ2_i^A        1.47      2.57    -1.14     -1.38
    Δβ2_i^Q        1.80      3.72    -1.12     -1.36
    LD_i^E          .58      9.64      .32       .38
    LD_i^A          .72      8.7       .27       .34
    LD_i^Q         1.29     24.3       .27       .34
Table 2. Estimates of β and λ and the case weights for the robust estimators. BIE = bounded-influence estimator. RE = redescending estimator.

                      ESTIMATES                        CASE WEIGHTS
                 β1       β2       λ       w4     w5     w12    w16    w19    w25
    MLE         3.29    -7.00     .31      1      1      1      1      1      1
    BIE: c=1    3.49    -8.00     .27      1      .71    .58    1      1      1
         c=2    3.59    -8.54     .20      1      .63    .34    1      1      1
         c=3    3.66    -8.87     .12      1      .62    .20    1      1      .99
         c=4    3.70    -9.11     .06      .98    .62    .13    1      1      .98
         c=5    3.74    -9.31     .02      .91    .62    .087   1      1      .96
         c=10   3.84    -9.71    -.07      .69    .63    .041   .97    1      .91
         c=20   3.85    -9.74    -.08      .65    .64    .038   .92    1      .91
    RE:  c=1    3.91   -10.1     -.21      .65    .64    0      .92    1      .91
         c=2    3.93   -10.1     -.23      .52    .71    0      .86    .97    .86
         c=3    3.92   -10.0     -.23      .46    .75    0      .81    .94    .82
         c=4    3.91    -9.97    -.23      .43    .76    0      .78    .92    .81
         c=5    4.11   -10.7     -.36      .006   .77    0      .76    .90    .79
         c=10   4.00    -9.94    -.34      .005   .91    0      .64    .71    .61
Table 3. Standard errors of β̂ and λ̂.

    ESTIMATOR                                       β1      β2       λ
    MLE (inverse Fisher information, method (i))    .75    3.40     .21
    MLE (as an M-estimator, method (ii))            .66    3.46     .16
    BIE: One-step (c=1)                             .54    3.01     .42
         Iterated (c=10)                            --     3.51     .39
    RE:  One-step (c=1)                             .51    2.79     .35
         Fully iterated (c=10)                      .43    2.37     .35
LIST OF FIGURES

Figure 1: Plot of returns (or recruits) against spawners, with median recruitment estimated using the one-step redescending estimate. Returns and spawners are in thousands of fish. Selected cases are identified.

Figure 2: Index plot of (LD_i^A)^{1/2}, the square root of the approximate likelihood distance.

Figure 3: Difference in median recruitment estimated with and without case #12.

Figure 4: Residuals from the one-step redescending estimator.
REFERENCES

Atkinson, A. C. (1986). "Diagnostic tests for transformations." Technometrics, 28, 29-38.

Bates, D. M. and Watts, D. G. (1980). "Relative curvature measures of nonlinearity." Journal of the Royal Statistical Society, Series B, 42, 1-25.

Bates, D. M., Wolf, D. A., and Watts, D. G. (1986). "Nonlinear least squares and first-order kinetics." In Proceedings of Computer Science and Statistics: Seventeenth Symposium on the Interface, David Allen, ed. New York: North-Holland.

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Box, G. E. P. and Cox, D. R. (1964). "An analysis of transformations (with discussion)." Journal of the Royal Statistical Society, Series B, 26, 211-252.

Box, G. E. P. and Hill, W. J. (1974). "Correcting inhomogeneity of variance with power transformation weighting." Technometrics, 16, 385-389.

Carr, N. L. (1960). "Kinetics of catalytic isomerization of n-pentane." Industrial and Engineering Chemistry, 52, 391-396.

Carroll, R. J. and Ruppert, D. (1981). "On prediction and the power transformation family." Biometrika, 68, 609-615.

Carroll, R. J. and Ruppert, D. (1984). "Power transformations when fitting theoretical models to data." Journal of the American Statistical Association, 79, 321-328.

Carroll, R. J. and Ruppert, D. (1985). "Transformations in regression: a robust analysis." Technometrics, 27, 1-12.

Cook, R. D. (1986). "Assessment of local influence." Journal of the Royal Statistical Society, Series B. To appear.

Cook, R. D. and Wang, P. C. (1983). "Transformations and influential cases in regression." Technometrics, 25, 337-343.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. New York and London: Chapman and Hall.

Draper, N. and Smith, H. (1981). Applied Regression Analysis, 2nd Edition. New York: Wiley.

Hampel, F. R. (1974). "The influence curve and its role in robust estimation." Journal of the American Statistical Association, 69, 383-393.

Hampel, F. R. (1978). "Optimally bounding the gross-error-sensitivity and the influence of position in factor space." In Proceedings of the ASA Statistical Computing Section, ASA, Washington, D.C., 59-64.

Hampel, F. R. (1985). "The breakdown points of the mean combined with some rejection rules." Technometrics, 27, 95-107.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics. New York: John Wiley and Sons.

Krasker, W. S. and Welsch, R. E. (1982). "Efficient bounded-influence regression estimation." Journal of the American Statistical Association, 77, 595-604.

Patefield, W. M. (1977). "On the maximized likelihood function." Sankhya, Series B, 39, 92-96.

Patefield, W. M. (1985). "Information from the maximized likelihood function." Biometrika, 72, 664-668.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd Edition. New York: John Wiley and Sons.

Ricker, W. E. (1954). "Stock and recruitment." Journal of the Fisheries Research Board of Canada, 11, 559-623.

Ricker, W. E. and Smith, H. D. (1975). "A revised interpretation of the history of the Skeena River sockeye salmon (Oncorhynchus nerka)." Journal of the Fisheries Research Board of Canada, 32, 1369-1381.

Rousseeuw, P. J. (1984). "Least median of squares regression." Journal of the American Statistical Association, 79, 871-880.

Rousseeuw, P. J. and Yohai, V. (1984). "Robust regression by means of S-estimators." In Robust and Nonlinear Time Series Analysis, Franke, J., Hardle, W., and Martin, R. D., eds. Berlin: Springer-Verlag.

Ruppert, D. (1985). "On the bounded-influence regression estimator of Krasker and Welsch." Journal of the American Statistical Association, 80, 205-208.

Ruppert, D. and Carroll, R. J. (1985). "Data transformations in regression analysis with applications to stock-recruitment relationships." In Resource Management: Proceedings of the Second Ralf Yorque Workshop held in Ashland, Oregon, July 23-25, 1984, S. Levin, ed. Berlin: Springer-Verlag.

Snee, R. D. (1986). "An alternative approach to fitting models when re-expression of the response is useful." Journal of Quality Technology. To appear.

Wu, C. F. J. (1986). "Jackknife, bootstrap and other resampling methods in regression analysis." Annals of Statistics, 14. To appear.

Yohai, V. (1985). High Breakdown Point and High Efficiency Robust Estimates for Regression. Technical Report No. 66, Department of Statistics, University of Washington, Seattle, Washington.