Diagnostics and Robust Estimation When Transforming the Regression Model and the Response

Revised: September 1986

R. J. Carroll and David Ruppert
Department of Statistics, University of North Carolina

Abstract: In regression analysis, the response is often transformed to remove heteroscedasticity. If a model already exists for the untransformed response, then it can be preserved by applying the same transform to both the model and the response. This methodology, which we call "transform both sides", has been applied in several recent papers and appears highly useful in practice. When a parametric transformation family such as the power transformations is used, the transformation can be estimated by maximum likelihood. The MLE, however, is very sensitive to outliers. In this article we propose diagnostics to indicate cases influential for the transformation or regression parameters. We also propose a robust bounded-influence estimator similar to the Krasker-Welsch regression estimator.

Acknowledgement. The research of Professor Carroll was supported in full, and the research of Professor Ruppert in part, by Air Force Office of Scientific Research Contract AFOSR-S-49620-85-C-0144. Professor Ruppert's research was also partially supported by National Science Foundation Grant DMS-8400602. We thank the referees for their helpful criticism and the encouragement to revise this article.
We also thank Brian Aldershof for producing the figures.

1. Introduction

In regression analysis, the response y is often transformed for two distinct purposes: to induce normally distributed, homoscedastic errors and to improve the fit to some simple model involving a known explanatory variable x. In many situations, however, y is already believed to fit a model f(x,β), β being a p-dimensional parameter. If a transformation of y is still needed to remove skewness and/or heteroscedasticity, then this model can be preserved by transforming y and f(x,β) in the same manner. Specifically, let y^(λ) be a transformation indexed by the parameter λ and assume that for some value of λ

(1)    y_i^(λ) = f^(λ)(x_i,β) + σ ε_i,

where ε_1,...,ε_N are independent and at least approximately normally distributed with variance 1. Notice the difference between (1) and the usual approach of transforming only the response, not f(x,β), i.e.,

(2)    y_i^(λ) = f(x_i,β) + σ ε_i.

It should be emphasized that model (1) is not intended as a substitute for (2). Both models are appropriate, but under quite different circumstances. Model (2) has been amply discussed by Box and Cox (1964) and others, e.g., Draper and Smith (1980), Cook and Weisberg (1982), and Carroll and Ruppert (1981). Typically, in model (2), f(x,β) is linear, but in principle nonlinear models can be used. Model (1), which we call "transform both sides", has been investigated by Carroll and Ruppert (1984), Snee (1986), and Ruppert and Carroll (1985), and we will only summarize those discussions. According to (1), f(x,β) has two closely related interpretations: f(x,β) is the value of y when the error is zero, and it is the median of the conditional distribution of y given x. In Carroll and Ruppert (1984), we were concerned with situations where a physical or biological model provides f(x,β), but where the error structure is a priori unknown.
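Model (1) can be read constructively: the transformed response equals the transformed median function plus homoscedastic normal error. The following minimal Python sketch (ours, not from the paper; the median function and all parameter values are hypothetical) simulates data from (1) with λ = 0 and checks that f(x,β) is indeed the conditional median of the untransformed y.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x, beta):
    # Arbitrary positive median model, for illustration only.
    return beta[0] * np.exp(beta[1] * x)

# Transform-both-sides model (1) with lambda = 0: log y = log f + sigma*eps,
# i.e., multiplicative lognormal error on the original scale.
beta, sigma = (2.0, 0.8), 0.3
x = np.full(100_000, 1.5)                     # one fixed x value
y = f(x, beta) * np.exp(sigma * rng.standard_normal(x.size))

# The conditional median of y at this x should be f(x, beta),
# even though the mean of y is larger (lognormal skewness).
med = np.median(y)
```

Because the error enters on the transformed scale, the back-transformed model describes the median, not the mean, of y given x, which is exactly the first interpretation given above.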
Examples by Snee (1986), Carroll and Ruppert (1984), Ruppert and Carroll (1985), and Bates, Wolf, and Watts (1985) show that transforming both sides can be highly effective with real data, both when a theoretical model is available and, as Snee shows, when f(x,β) is obtained empirically.

By estimating λ, σ, and β simultaneously, rather than simply fitting the original response y to f(x,β), we achieve two purposes. First, β is estimated efficiently, and therefore we obtain an efficient estimate of the conditional median of y. Second, we model the entire conditional distribution of y given x, and, in particular, we have a model which can account for the skewness and heteroscedasticity in the data. Carroll and Ruppert (1984) discuss the importance of modeling the conditional distribution of y in a special case, a spawner-recruit analysis of the Atlantic menhaden population. In section 6 we discuss the estimation of the conditional mean and conditional quantiles of y given x.

Many data sets we have examined have had severe outliers in the untransformed response y, but not in the residuals e_i(β,λ) = [y_i^(λ) − f^(λ)(x_i,β)]; the transformation has accommodated, or explained, the outlying y's. There is still the danger, however, that a few outliers in y can greatly affect β̂ and λ̂. Outliers should not be automatically deleted or downweighted, especially when they appear to be part of the normal variation in the response, but it should be standard practice to detect and scrutinize influential cases. When influential cases are present and they have an unacceptable effect on the MLE, then the best remedy is not simply to delete these cases but rather to apply a robust estimator. There are several reasons why this is so.
First, as Hampel (1985) illustrates, robust estimates are generally somewhat more efficient than outlier rejection followed by a classical estimator such as the MLE. Moreover, outlier rejection affects the sampling distribution of classical methods in ways that have not been fully studied. In contrast, the large-sample distribution of most robust estimators, in particular the M-estimators which will be used here, can be easily calculated.

In this paper we propose a case-deletion diagnostic and a "bounded-influence" estimator. Case-deletion diagnostics for linear regression are discussed in Belsley, Kuh, and Welsch (1980) and Cook and Weisberg (1982), and have been extended to the response transformation model (2) by Cook and Wang (1983) and Atkinson (1986). The last two papers approximate the change in λ̂ as single cases or subsets of cases are deleted.

Another approach to influence diagnostics is measuring changes in the statistical analysis under infinitesimal perturbations in the model. Cook (1986) gives an introduction to this theory, which he calls "local influence". We will not consider local influence for transformation models, but this seems a promising area for research.

When the response transformation model is used, β depends heavily on λ, and it seems better to examine influence for β only after λ has been determined. In contrast, under the "transform both sides" model, β and λ are only weakly related, with β determined by the median of the untransformed response and λ determined by the skewness and heteroscedasticity. Therefore, influence for β and λ can be treated simultaneously.
For the "transform both sides" model we propose two approximations to the changes in θ = (β^T, λ)^T as cases are deleted. As shown in the next section, the MLE can be found as the least-squares estimate of a "pseudo-model". The first approximation to the estimate without a given case uses one step of an iterative procedure, in effect one Newton-Raphson step, and requires the Hessian matrix of the sum of squares of the pseudo-model. The second approximation is based on Atkinson's "quick estimate" and is equivalent to using one step of the Gauss-Newton, rather than the Newton-Raphson, algorithm. It is considerably less accurate than the first, but it is more easily implemented on standard software and is useful for diagnostic purposes.

Subset deletion is simple in theory but can be unwieldy in practice because of the large number of possible subsets. If influential subsets are to be detected, some strategy is needed to search for them. An alternative to subset deletion, and a good supplement to single-case-deletion diagnostics, is to compare the fit of a highly robust estimator with that of the MLE. In the example of section 6, a robust estimator reveals an observation whose influence was masked by another observation and could not be detected by single-case-deletion diagnostics.

Bounded-influence estimators place a bound on the influence function of each observation. Bounded-influence regression estimators have been proposed by Krasker (1980), Hampel (1978), and Krasker and Welsch (1982). The last paper and Hampel et al. (1986) provide a good overview. Huber (1983) has questioned the value of bounded-influence regression estimation. Apparent response outliers can result from gross errors in the measurement or recording of the explanatory variables x.
In addition, response outliers can be caused by model breakdown. Both an incorrect model for the median response and an incorrect specification of the variance function, say a constant-variance model where the variance actually depends on the median or mean, will lead to response outliers. In the latter case, the y may be outlying only relative to the assumed variance, not the true variance, but this is enough to cause problems. For all these reasons, outlying y values are more likely at unusual x values. Huber's objection to bounded-influence estimation follows logically from his model that the errors are identically distributed with a heavy-tailed density, but this model is unrealistic in many modeling situations. Moreover, even if one accepts Huber's conclusions about regression modeling, they apply only when there is no need to estimate a transformation parameter. The analog of leverage for λ is the derivative of the residual e_i(θ) with respect to λ. A response outlier will usually make this derivative large. Therefore, response outliers can induce high leverage points in transformation models even at x's that are in no way unusual.

Carroll and Ruppert (1985) proposed a bounded-influence transformation (BIT) estimator, extending the Krasker-Welsch estimator to the response transformation model (2). In this paper we adapt this estimator to the "transform both sides" model. We also discuss computational aspects and propose a simple one-step estimator that can be implemented on standard software packages such as SAS.

2. Weighted Maximum Likelihood Estimation

All estimators used in this paper are found by maximizing a weighted log-likelihood. When the weights are identically 1, the estimator is the MLE. The robust estimators introduced in section 4 are weighted MLE's with the weights less than 1 for influential cases and equal to 1 otherwise. Let w_1,...,w_N be fixed weights.
For now, the weights may depend on the y's but not on the parameter θ = (β^T, λ)^T. In section 4 the weights will depend on θ, but there θ will be fixed at a preliminary estimate θ_p. Throughout this paper y^(λ) is the modified power transformation family used by Box and Cox (1964):

    y^(λ) = (y^λ − 1)/λ   if λ ≠ 0,
          = log(y)        if λ = 0.

Our analysis will be conditional on the observed x's. This is appropriate both for fixed and random x's. Let θ = (β^T, λ)^T be the vector of the transformation and regression parameters, and let e_i(θ) = y_i^(λ) − f^(λ)(x_i,β). The log-likelihood for y_i is

    −(1/2) log(2πσ²) + (λ − 1) log(y_i) − (2σ²)^{−1} [e_i(θ)]².

The weighted log-likelihood for y_1,...,y_N is

(3)    L(θ,σ) = Σ_{i=1}^N w_i { −(1/2) log(2πσ²) + (λ − 1) log(y_i) − (2σ²)^{−1} [e_i(θ)]² }.

For fixed θ,

    σ̂²(θ) = Σ_{i=1}^N w_i e_i²(θ) / Σ_{i=1}^N w_i

maximizes L(θ,σ) over σ. Let ẏ be the weighted geometric mean of y_1,...,y_N defined by

(4)    log(ẏ) = Σ_{i=1}^N w_i log(y_i) / Σ_{i=1}^N w_i.

Then

    L_max(θ) = L(θ, σ̂(θ)) = Σ_{i=1}^N w_i { −(1/2) log(2πσ̂²(θ)) + (λ − 1) log(y_i) − 1/2 }.

Since the weights w_i are fixed, θ̂ maximizes L_max(θ) if and only if it minimizes

(5)    SS(θ) = Σ_{i=1}^N w_i z_i²(θ),

where z_i(θ) is defined below. Following Box and Cox (1964), θ̂ can be computed as follows. For fixed λ, minimize SS(θ) in β by ordinary (typically nonlinear) least-squares and call the minimizer β̂(λ). Plot L_max(β̂(λ),λ) on a grid of λ values and maximize graphically or numerically. This technique is particularly attractive when f is not transformed and f(x,β) = x^T β, for then (5) can be minimized in β by linear least-squares. When transforming both sides, this technique is less attractive computationally, but for the unweighted MLE it does give the confidence interval

    { λ : L_max(β̂(λ),λ) ≥ L_max(θ̂) − (1/2) χ²_1(1 − α) },

where χ²_1(1 − α) is the (1 − α) quantile of the chi-square distribution with one degree of freedom.

Minimizing (5) simultaneously in λ and β is straightforward with standard nonlinear least-squares software. One first creates a dummy variable 0_i which is 0 for all cases. Define

    z_i(θ) = e_i(θ) / ẏ^(λ−1) = [y_i^(λ) − f^(λ)(x_i,β)] / ẏ^(λ−1).
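As a concrete sketch (ours, not the authors' SAS/GAUSS code; the model f, the data, and all parameter values are hypothetical), the unweighted version of (5) can be minimized jointly in (β,λ) by handing the scaled residuals z_i(θ) to a standard nonlinear least-squares routine:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Hypothetical data: median model f(x; b0, b1) = b0*exp(b1*x) with
# multiplicative lognormal error, so the true transformation is lam = 0.
x = rng.uniform(0.5, 3.0, 300)
f = lambda x, b0, b1: b0 * np.exp(b1 * x)
y = f(x, 2.0, 0.8) * np.exp(0.5 * rng.standard_normal(x.size))

def boxcox(u, lam):
    """Modified power transformation of Box and Cox (1964)."""
    return np.log(u) if lam == 0.0 else (u**lam - 1.0) / lam

ydot = np.exp(np.mean(np.log(y)))   # unweighted geometric mean, eq. (4)

def z(theta):
    """Scaled residuals z_i(theta); SS(theta) of (5) is sum(z(theta)**2)."""
    b0, b1, lam = theta
    return (boxcox(y, lam) - boxcox(f(x, b0, b1), lam)) / ydot**(lam - 1.0)

theta0 = np.array([2.0, 0.8, 0.5])   # start near the truth
fit = least_squares(z, theta0, bounds=([1e-6, -5.0, -2.0], [50.0, 5.0, 2.0]))
b0_hat, b1_hat, lam_hat = fit.x
```

Because the "response" handed to the routine is identically zero and z_i depends on y_i, this is exactly the pseudo-model device described next; as noted in section 5, the standard errors reported by the least-squares routine should not be trusted.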
One fits the "pseudo-model"

(6)    0_i = z_i(θ) + error,

with "response" 0_i, "regression function" z_i(θ), "regression parameter" θ, and "independent variable" (y_i, x_i). The dummy variable 0_i is created because nonlinear least-squares software typically does not allow the response to depend on the parameters. In (6) the real response is incorporated into the model so that it can be transformed by λ. The standard error of λ̂ that is output when fitting (6) with a least-squares package should not be used. It is not the same as the standard error obtained by inverting the Fisher information, and it does not consistently estimate the large-sample standard deviation of λ̂; see section 5.

3. Diagnostics

A simple and easily interpreted way of measuring the influence of the ith case is to recompute θ̂ with this case deleted. Since refitting a nonlinear model N times can be expensive, we propose approximations to these case-deletion diagnostics. Let θ̂ and θ̂_(i) be the estimates with and without the ith case, and let Δ_i^E = (Δ_{βi}^E, Δ_{λi}^E) = θ̂ − θ̂_(i) be the exact change in θ̂ upon deletion of this case. Let ∇z_i(θ) and ∇²z_i(θ) be the gradient and Hessian of z_i(θ). The first step in the approximation to Δ_i^E will be to ignore the change in ẏ when y_i is deleted. By the previous section, θ̂_(i) is the solution of

(7)    Σ_{j≠i} z_j(θ) ∇z_j(θ) = 0.

An approximate solution to (7) is obtained by taking one step of the Newton-Raphson algorithm beginning at θ̂. This requires the differential of the left-hand side of (7), which is

(8)    Σ_{j≠i} [ ∇z_j(θ) ∇z_j(θ)^T + z_j(θ) ∇²z_j(θ) ],

and the one-step approximation to θ̂_(i) follows by inverting (8) at θ̂. If ẏ is held fixed in z_j(θ), then z_j(θ) depends on the responses only through y_j; moreover, only the (λ,λ) entry of
∇²z_j(θ) depends on y_j, so that z_j(θ) is independent of all the other entries of ∇²z_j(θ). Therefore, letting θ_0 be the true parameter, every entry of z_j(θ_0)∇²z_j(θ_0) other than the (λ,λ) entry has expectation approximately zero, and we treat D_j(θ) = z_j(θ)∇²z_j(θ) as the rank-one matrix containing only its (λ,λ) entry. Letting Ĥ = Σ_{k=1}^N [∇z_k(θ̂)∇z_k(θ̂)^T + z_k(θ̂)∇²z_k(θ̂)] denote the full-data Hessian of the sum of squares, the differential (8) evaluated at θ̂ is approximately

(9)    B_j = Ĥ − ∇z_j(θ̂)∇z_j(θ̂)^T − D_j(θ̂).

The approximate influence diagnostic for the ith case is

    Δ_i^A = −B_i^{−1} z_i(θ̂) ∇z_i(θ̂).

The inverse of B_j can be computed easily using the well-known identity [Rao (1973, page 33)]

(10)    (A + u v^T)^{−1} = A^{−1} − A^{−1} u v^T A^{−1} / (1 + v^T A^{−1} u),

which holds for nonsingular p × p matrices A and p-dimensional vectors u and v such that 1 + v^T A^{−1} u ≠ 0 (so that A + u v^T is nonsingular). Let

    u_j = (0, ..., 0, | z_j(θ̂) (∂²/∂λ²) z_j(θ̂) |^{1/2})^T,    v_j = sign( z_j(θ̂) (∂²/∂λ²) z_j(θ̂) ) u_j,

so that D_j(θ̂) is approximated by u_j v_j^T, and let C_j = Ĥ − ∇z_j(θ̂)∇z_j(θ̂)^T, so that B_j = C_j − u_j v_j^T. Using (10) twice we have

(11)    C_j^{−1} = Ĥ^{−1} + Ĥ^{−1} ∇z_j(θ̂) ∇z_j(θ̂)^T Ĥ^{−1} / (1 − ∇z_j(θ̂)^T Ĥ^{−1} ∇z_j(θ̂))

and then

(12)    B_j^{−1} = C_j^{−1} + C_j^{−1} u_j v_j^T C_j^{−1} / (1 − v_j^T C_j^{−1} u_j).

Using (11) and (12) allows us to compute B_1^{−1},...,B_N^{−1} using only the single matrix inversion needed to calculate Ĥ^{−1}. If we ignore the D_j(θ̂) in (9), then we obtain an even simpler, but less accurate, approximation

    Δ_i^Q = −C_i^{−1} z_i(θ̂) ∇z_i(θ̂).

Δ_i^Q is analogous to the quick-estimate diagnostic used by Atkinson (1986) for the response transformation model. The advantage of Δ_i^Q is that it can be computed using standard software packages that calculate linear regression diagnostics; one creates a linearized model with (z_1(θ̂),...,z_N(θ̂))^T as the vector of dependent variables and (∇z_1(θ̂),...,∇z_N(θ̂))^T as the design matrix. Then the Δ_i^Q, i = 1,...,N, are the diagnostics DFBETA of Belsley, Kuh, and Welsch (1980, page 13). Some software packages compute a scaled version, DFBETAS, which is equally useful for diagnostics. Cook's D or DFFITS from the linearized model can be used as measures of total influence for β and λ. If we used one step of the Gauss-Newton, rather than Newton-Raphson, algorithm when solving (7), then we would also obtain Δ_i^Q.
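The rank-one updates (10)-(12) can be sketched in a few lines of Python (our illustration, with a random well-conditioned test matrix standing in for Ĥ); the point is that every deleted-case inverse comes from the single inverse Ĥ^{-1}:

```python
import numpy as np

def downdate_inverse(A_inv, u, v):
    """Inverse of (A - u v^T) from A^{-1}: identity (10) applied with
    u replaced by -u; valid when 1 - v^T A^{-1} u != 0."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv + np.outer(Au, vA) / (1.0 - v @ Au)

# Check against direct inversion, mimicking (11) and then (12).
rng = np.random.default_rng(2)
H = rng.standard_normal((4, 4))
H = H @ H.T + 25.0 * np.eye(4)        # well-conditioned stand-in for the Hessian
g = rng.standard_normal(4)            # plays the role of grad z_j(thetahat)
u = rng.standard_normal(4)            # plays the role of u_j = +/- v_j
C_inv = downdate_inverse(np.linalg.inv(H), g, g)   # eq. (11)
B_inv = downdate_inverse(C_inv, u, u)              # eq. (12)
direct = np.linalg.inv(H - np.outer(g, g) - np.outer(u, u))
```

With N cases this replaces N matrix inversions by one inversion plus O(N) rank-one corrections.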
The difference between the Gauss-Newton and Newton-Raphson algorithms is that the former uses an approximate Hessian which, in the present notation, omits the term Σ_{j=1}^N z_j(θ̂) ∇²z_j(θ̂) from the Hessian of the sum of squares. For regression models without a transformation parameter, the residuals are uncorrelated with their Hessians and the Gauss-Newton approximation is acceptable. In the example of section 6, and in all other examples that we have examined, |Δ_{λi}^Q| substantially overestimates large values of |Δ_{λi}^E|, while Δ_{λi}^A is a good approximation to Δ_{λi}^E. The overestimation results from positive correlation between z_i(θ̂) and (∂²/∂λ²) z_i(θ̂), causing Σ_{j=1}^N z_j(θ̂) (∂²/∂λ²) z_j(θ̂) to be positive and of the same magnitude as Σ_{j=1}^N [(∂/∂λ) z_j(θ̂)]².

If we use Δ_i^Q only as a diagnostic, then the overestimation of |Δ_{λi}^Q| is not a serious problem, and it does not prevent us from detecting influential cases. The diagnostic Δ_i^E and its approximations are vectors showing influence separately for each parameter. An overall measure of influence of the ith case is the exact likelihood distance defined by Cook and Weisberg (1982, section 5.2) as

(13)    LD_i^E = 2[ L_max(θ̂) − L_max(θ̂_(i)) ] ≈ −(Δ_i^E)^T [∇² L_max(θ̂)] Δ_i^E.

The approximation in (13) is also found in Cook and Weisberg (1982) and follows from a Taylor expansion using ∇L_max(θ̂) = 0. We can define an accurate approximation LD_i^A by replacing Δ_i^E in (13) with Δ_i^A. As a quick approximation one can use LD_i^Q, which is a constant multiple of Cook's D computed from the linearized model. From the above discussion, one can see that LD_i^Q will not be an accurate approximation to LD_i^E.

4. Robust Estimation

Once influential cases have been identified, we must decide how they should be treated.
In some situations, the statistician will feel that the influential cases are valid data, that they do not indicate model deficiencies, and that they should be allowed full influence on the analysis. Then the MLE can be used. In other situations, the influential cases will be suspected of being gross errors. Alternatively, the influential cases may indicate a model deficiency, but either the sparsity of data near their x values will not allow a better model to be developed, or the data analyst will hesitate to add complexity to the model merely to accommodate one or at most a few observations. Then a robust estimator should be used.

This section is concerned with the robust estimation of θ. The scale parameter σ can be estimated separately with a scale functional, e.g., the median absolute deviation (MAD) applied to the residuals from a robust fit. Let s_i(y_i,θ) = ∇l_i(y_i,θ,σ̂(θ)) be the score function, i.e., the gradient of the log-likelihood for y_i. The weighted MLE satisfies

    Σ_{i=1}^N w_i s_i(y_i,θ) = 0.

The ordinary (unweighted) MLE is sensitive to cases with large values of s_i, in particular to cases with a large residual e_i(β,λ), a large (∂/∂β) f^(λ)(x_i,β), or a large (∂/∂λ) e_i(β,λ), corresponding to an outlying response, high leverage points, and points having high influence for λ, respectively.

A robust bounded-influence estimator can be found by letting w_i decrease as some norm of s_i(y_i,θ) increases. To do this, the weights must be allowed to depend on θ. Thus we define the estimator θ̂ as the solution to

(14)    Σ_{i=1}^N w_i(y_i,θ) s_i(y_i,θ) = 0,

where w_i is a suitable scalar weighting function and, as in section 2, σ̂²(θ) is the weighted variance of the residuals. Write Ψ_i(y,θ) = w_i(y,θ) s_i(y,θ). The minimum requirement for robustness is that Ψ_i(y,θ) be bounded as a function of i, y, and θ. Otherwise, a single outlier can cause an arbitrarily large change in θ̂. It should be mentioned that a bounded-influence estimator may not have a high breakdown point.
The breakdown point is the largest percentage of contamination that an estimator can tolerate before it can be overwhelmed by the contaminants. This means that if the percentage of contaminants exceeds the breakdown point, then the estimate can be forced to take an arbitrary value by choosing the contaminants in a sufficiently nasty way. The estimators we define here are related to the Krasker-Welsch regression estimator, and like that estimator they will have a breakdown point of at most 1/(p+1), where (p+1) is the total number of parameters. In the case of linear regression, Rousseeuw (1984) and Rousseeuw and Yohai (1984) have proposed estimators with nearly 50% breakdown points, the best possible. However, these estimators have poor efficiency. Yohai (1985) shows how to achieve both high asymptotic efficiency and a near-50% breakdown point, but again only in the case of linear regression. In the future, we hope to develop similar estimators for transformation models. A major difficulty will be computational.

When choosing the w_i, asymptotic efficiency, measured by the covariance matrix of θ̂, must be balanced against robustness, measured by the supremum of some norm of Ψ_i(y,θ), the so-called gross-error sensitivity. For univariate parameters there is a unique optimal w_i which minimizes the asymptotic variance subject to a given bound on the gross-error sensitivity (Hampel 1968, 1974). For multivariate parameters such as θ, the balancing of robustness and variance raises philosophical questions, since there are many ways of comparing covariance matrices or of norming vector functions. Different norms on Ψ_i(y_i,θ) give rise to different definitions of gross-error sensitivity.
The approach we take generalizes the Krasker-Welsch (1982) bounded-influence regression estimator. Whether the Krasker-Welsch estimator optimizes the asymptotic covariance matrix in any meaningful sense is an open question, but its efficiency at the normal model is usually close to that of the MLE (Ruppert 1985). Krasker and Welsch (1982) bound the so-called self-standardized gross-error sensitivity, which we denote by γ_s. We will describe γ_s only briefly; the interested reader is referred to the original paper of Krasker and Welsch or to Hampel et al. (1986, chapter 4) for further details.

First we note that the influence function of the θ̂ satisfying (14) is B^{−1} Ψ_i(y_i,θ), where

    B = −N^{−1} Σ_{i=1}^N ∇_θ E[Ψ_i(y_i,θ)].

This definition of the influence function is conditional on x_1,...,x_N, but it coincides with the usual definition when the x's are independent and identically distributed and summation over i is replaced by expectation with respect to the x's. The definition of B is analogous to that on page 5 of Carroll and Ruppert (1985), where w is incorrectly squared. The asymptotic covariance matrix of θ̂ is B^{−1} A (B^{−1})^T, where

    A = N^{−1} Σ_{i=1}^N E[ Ψ_i(y_i,θ) Ψ_i(y_i,θ)^T ].

An intuitively reasonable way to norm the influence function is to use the asymptotic covariance matrix; see Krasker and Welsch (1982) for further motivation of this estimator and a discussion of the resultant measure of sensitivity, defined as

    γ_s = max_i sup_y ‖ Ψ_i(y,θ) ‖_A,

where ‖v‖_M = (v^T M^{−1} v)^{1/2} for any vector v and positive definite matrix M. This definition of γ_s is analogous to equation (15) of Carroll and Ruppert (1985), where w² has been incorrectly omitted from the last term. To robustify the MLE, we will choose the weights so that γ_s does not exceed a predetermined bound. γ_s must be at least (p+1)^{1/2} (Krasker and Welsch 1982). From experience with this and other problems, we
suggest bounding γ_s by a(p+1)^{1/2}, where "a" is between 1.1 and 1.6. The choice a = 1.5 worked well on the data set in section 6.

If an observation has low influence, then it should not be downweighted. Otherwise, it should be downweighted just enough to keep γ_s below the given bound. Therefore the weights should be

(15)    w_i = min{ 1, a(p+1)^{1/2} / ‖ s_i(y_i,θ) ‖_Â }.

In (15), A must be replaced by an estimate Â. To calculate θ̂ we used a simple iterative scheme:

(1) Fix a > 1. Set c = 1. Let C be the total number of iterations that will be used. Let θ̂_p be a preliminary estimate, possibly the MLE. Set w_i = 1 for all i.

(2) Define Â = N^{−1} Σ_{i=1}^N w_i² s_i(y_i,θ̂_p) s_i(y_i,θ̂_p)^T.

(3) Using (15), update the weights.

(4) Using the methods of section 2, find the weighted MLE with these weights and call it θ̂.

(5) If c < C, set θ̂_p = θ̂ and c = c + 1, and return to step (2). Otherwise, stop.

It is possible to implement this algorithm on standard software packages, in particular SAS. Steps (2) and (3) can be computed with a matrix language such as PROC MATRIX in the 1982 version of SAS. Step (4) can be performed using a weighted nonlinear least-squares routine such as PROC NLIN in SAS. By using macros in SAS or a similar package, it is possible to put the matrix computations and the least-squares routines into an iterative loop. We initially used SAS, but now prefer the matrix programming language GAUSS.

One or two iterations seem adequate for diagnostic purposes, but this algorithm sometimes converges slowly, particularly when there are extremely influential points. This was the case with the example in section 6, where the algorithm did not stabilize until ten iterations. Unfortunately, the slow convergence gave the appearance that the algorithm had converged after only two or three iterations. We found that a fully iterative version of the algorithm could be easily implemented in GAUSS on an IBM PC/AT, and computation time for ten or more iterations was acceptable.
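Steps (2)-(3), which self-standardize the scores and cap their norm, can be sketched as follows. This is our Python illustration on synthetic scores; the score array S and the tuning choices are hypothetical, and the weighted-MLE refit of step (4) is omitted.

```python
import numpy as np

def bounded_influence_weights(S, a=1.5, iters=10):
    """Iterate step (2), Ahat = N^{-1} sum_i w_i^2 s_i s_i^T, with the
    weight rule (15), w_i = min(1, a*sqrt(q) / ||s_i||_Ahat), where
    ||v||_M = sqrt(v^T M^{-1} v) and q plays the role of p + 1."""
    N, q = S.shape
    w = np.ones(N)
    bound = a * np.sqrt(q)
    for _ in range(iters):
        Ahat = (S * (w**2)[:, None]).T @ S / N
        M_inv = np.linalg.inv(Ahat)
        norms = np.sqrt(np.einsum("ij,jk,ik->i", S, M_inv, S))
        w = np.minimum(1.0, bound / norms)
    return w

rng = np.random.default_rng(3)
S = rng.standard_normal((50, 3))      # 50 cases, 3 parameters
S[0] = [25.0, -30.0, 20.0]            # one grossly outlying score vector
w = bounded_influence_weights(S)
```

On this toy input the outlying case receives a small weight while the bulk of the cases keep weight 1; in the estimator itself these weights would feed the weighted MLE of section 2 at each pass.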
Although bounded-influence estimation limits the effect of any case on the estimate, all cases, no matter how deviant from the bulk of the data, will have some influence. Hampel et al. (1986) discuss the need for robust estimators that completely reject extreme outliers. For estimation of a location parameter, this can be done with a redescending "psi-function". We can define an analog of a redescending psi-function here. Let ψ(x) be an odd function with ψ(x) ≥ 0 for x ≥ 0, and define the weights

(16)    w_i = ψ( ‖s_i(y_i,θ)‖_Â ) / ‖s_i(y_i,θ)‖_Â.

Equation (16) reduces to (15) if ψ is the Huber psi-function

    ψ(x) = x          if |x| ≤ b,
         = b sign(x)  otherwise,

with b = a(p+1)^{1/2}. If, for some R > 0, ψ(x) = 0 for all |x| > R, then extreme outliers are completely rejected. In the example we use Hampel's three-part redescending psi-function, given for x ≥ 0 by

(17)    ψ(x) = x                          0 ≤ x ≤ b_1,
            = b_1                        b_1 ≤ x ≤ b_2,
            = b_1 (b_3 − x)/(b_3 − b_2)  b_2 ≤ x ≤ b_3,
            = 0                          x ≥ b_3,

with (b_1, b_2, b_3) = (p+1)^{1/2} (1.5, 3.5, 8.0).

In section 6, the redescending estimate was computed by the same algorithm used to calculate the bounded-influence estimate. In step (1), the bounded-influence estimate was used as the preliminary estimate. In step (3), the weights were calculated using (16) and (17) instead of (15).

5. Estimating the Covariance Matrix of θ̂

The asymptotic covariance matrix of the MLE can be found by (i) inverting the Fisher information matrix or (ii) using the covariance matrix of the influence function, the covariance being with respect to the empirical distribution of (y_i,x_i), i = 1,...,N. Method (ii) is based on the asymptotic theory of M-estimation; see, for example, Hampel et al. (1986, section 4.2c). One advantage of (ii) is that the asymptotic covariance is consistently estimated even if the ε_i are not normally distributed (Huber 1967) or do not have a constant variance.
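As a generic sketch of method (ii) (ours; the least-squares scores below are just a stand-in for the Ψ_i of this paper), the empirical sandwich estimate B̂^{-1} Â (B̂^{-1})^T can be assembled directly from the case-wise score vectors:

```python
import numpy as np

def sandwich_cov(Psi, Bhat):
    """Estimated covariance of thetahat: N^{-1} * Bhat^{-1} Ahat (Bhat^{-1})^T,
    with Ahat = N^{-1} sum_i Psi_i Psi_i^T built from the N x q score array Psi."""
    N = Psi.shape[0]
    Ahat = Psi.T @ Psi / N
    B_inv = np.linalg.inv(Bhat)
    return B_inv @ Ahat @ B_inv.T / N

# Stand-in example: least-squares scores Psi_i = x_i * e_i for a linear
# model with heteroscedastic errors, where method (i) would be inconsistent.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(200), rng.uniform(0.0, 2.0, 200)])
y = X @ np.array([1.0, 2.0]) + X[:, 1] * rng.standard_normal(200)
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
V = sandwich_cov(X * e[:, None], X.T @ X / 200)
```

For least squares this reproduces the familiar heteroscedasticity-consistent form (X^T X)^{-1} (Σ e_i² x_i x_i^T) (X^T X)^{-1}, which is the linear-model analog of the jackknife-type estimate discussed next.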
In fact, method (ii) is similar to the jackknife, which Wu (1986) advocates as a consistent estimate of the least-squares covariance matrix under heteroscedasticity. Moreover, method (ii) can be used for the bounded-influence and redescending estimators as well.

Method (i): The observed Fisher information matrix for (θ,σ) is

(18)    −∇² L(θ̂,σ̂),

the Hessian being with respect to (θ,σ). We could invert (18) and then take the upper-left (p+1) × (p+1) corner corresponding to θ, but by Patefield (1977, 1985) this is equivalent to the easier computation of inverting the (p+1) × (p+1) matrix −∇² L_max(θ̂), where ∇² is now the Hessian with respect to θ only.

Method (ii): The MLE, the bounded-influence estimator, and the redescending estimator are all defined by an equation of the form

(19)    Σ_{i=1}^N Ψ_i(y_i,θ) = 0,

and they differ only in the choice of the weights. The asymptotic covariance matrix of any estimator solving an equation of the form (19) can be estimated by B̂^{−1} Â (B̂^{−1})^T, where

    B̂ = −N^{−1} Σ_{i=1}^N ∇_θ Ψ_i(y_i,θ̂)   and   Â = N^{−1} Σ_{i=1}^N Ψ_i(y_i,θ̂) Ψ_i(y_i,θ̂)^T.

The asymptotic theory of M-estimation also shows that the standard errors output when fitting the pseudo-model (see section 2) are incorrect. The pseudo-model finds the MLE by minimizing (5), or equivalently solving

(20)    Σ_{i=1}^N z_i(θ) ∇z_i(θ) = 0.

When the pseudo-model is fit by a nonlinear least-squares package, the estimated covariance matrix is

(21)    σ̂² [ Σ_{i=1}^N ∇z_i(θ̂) ∇z_i(θ̂)^T ]^{−1}.

The asymptotic covariance of the solution to (20) is consistently estimated by

(22)    B̂_LS^{−1} Â_LS ( B̂_LS^{−1} )^T,

where

    B̂_LS = N^{−1} Σ_{i=1}^N [ ∇z_i(θ̂)∇z_i(θ̂)^T + z_i(θ̂)∇²z_i(θ̂) ]   and   Â_LS = N^{−1} Σ_{i=1}^N z_i²(θ̂) ∇z_i(θ̂)∇z_i(θ̂)^T.

It is not hard to see that (21) and (22) have different limits, so that (21) is inconsistent. In practice (21) and (22) can be considerably different, especially in the estimated variance of λ̂.

6. An Example

In this section we look at an example. Our goals are (a) to see the type of data-analytic information that can come from the diagnostics and the robust estimators, (b) to see how well the robust estimators handle outlying data, and (c) to compare the accuracy of the "accurate" approximation Δ_i^A to the "quick" approximation Δ_i^Q.
When managing a fish stock, one needs a model relating the size of the annual spawning stock to the eventual production of catchable-sized new fish (returns or recruits) from that spawning. Ricker and Smith (1975) give numbers of spawners (S) and returns (R) from 1940 until 1967 for the Skeena River sockeye salmon stock. Using some simple assumptions about the factors influencing the survival of juvenile fish, Ricker (1954) derived the theoretical model

    R = f(S,β) = β_1 S exp(β_2 S)

relating R and S. Other models have been proposed, e.g., by Beverton and Holt (1957). However, the Ricker model appears to fit adequately and, in particular, gives almost the same fit to this stock as the Beverton-Holt model.

From Figure 1, a plot of R against S, it is clear that recruitment is highly variable and heteroscedastic, with the variance of R increasing with its mean. Several cases appear somewhat outlying, in particular #5, #18, #19, and #25. An index plot of Δ_{λi}^E was constructed; see Figure 2. Clearly case #12 stands out as the most influential by this measure, and #5, #19, and #25 are also notable. We will examine these four cases. The exact case-deletion statistics and the approximations Δ_i^A and Δ_i^Q are given in table 1 for these cases. Case #12 has high influence on all three parameters; in particular, Δ_{λ,12}^E = .51, showing that the MLE of λ decreases from .31 to −.2 when #12 is deleted.

Why is #12 so influential? To see why, look again at Figure 1. Observation #12 is not far removed from the median relative to the variation in all the data, but it is quite far removed relative to the variation in the other data with similar values of the independent variable S. The 27 data points besides #12 suggest that the data are extremely heteroscedastic, with the variation in recruitment increasing very rapidly with the median recruitment.
The effect of #12 is to increase the apparent recruitment variation for low median recruitment and to suggest that the heteroscedasticity is not nearly so pronounced. Seen in this light, the much more severe transformation (λ = -.2 instead of λ = .31) when #12 is deleted is not surprising. Besides suggesting less heteroscedasticity than seen in the remainder of the data, #12 has a large negative residual which suggests less right skewness as well.

To further analyze the influence of #12 on λ̂ we will introduce two alternative estimators of λ. These will be discussed fully in a forthcoming paper by Aldershof and Ruppert. For fixed λ, let β̂(λ) be the MLE of β and define θ̂(λ) = (β̂(λ), λ). The skewness estimator λ̂_sk is the value of λ such that the skewness coefficient of the residuals {e_i(θ̂(λ))} is 0. The heteroscedasticity estimator λ̂_het is the value of λ such that the correlation between the absolute residuals {|e_i(θ̂(λ))|} and {log f(x_i, β̂(λ))} is 0. When case #12 is omitted the value of λ̂_sk only changes from .58 to .46, but λ̂_het changes much more, dropping to -.86. The major effect of deleting #12 is to increase the apparent heteroscedasticity.

Case #12 was the year 1951, when a rock slide drastically reduced recruitment (Ricker and Smith 1975). For this reason we are quite comfortable downweighting it severely. In fact, it seems best to reject #12 entirely. This, in effect, is what the redescending estimator does; see below.

Compared to case #12, cases #5, #19, and #25 all have the opposite effect on λ̂. Deleting any of them decreases the apparent recruitment variance when the number of expected recruits is large, suggesting less heteroscedasticity. For this reason Δ^E_{λ,i} is negative for i = 5, 19, and 25.

The effect of #12 on β̂ is not as easily analyzed as its effect on λ̂. Deleting #12 increases β̂₁ from 3.29 to 3.77 and decreases β̂₂ from -7.00 to -9.54.
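The skewness estimator can be viewed as a root-finding problem: choose the λ at which the residual skewness coefficient vanishes. The bisection scheme and names below are the editor's sketch, not Aldershof and Ruppert's construction; it also assumes the caller supplies resid_fn, which refits β at each trial λ and returns the residuals.

```python
import numpy as np

def skewness(e):
    # standardized third moment of the residuals
    e = e - e.mean()
    return np.mean(e**3) / np.mean(e**2) ** 1.5

def lambda_sk(resid_fn, lo=-2.0, hi=2.0, tol=1e-6):
    """Bisection for the lambda at which the residual skewness is zero.
    resid_fn(lam) must return residuals e_i(theta_hat(lam)); a sign change
    of the skewness over [lo, hi] is required."""
    g = lambda lam: skewness(resid_fn(lam))
    glo, ghi = g(lo), g(hi)
    if glo * ghi > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gm = g(mid)
        if gm * glo <= 0:
            hi = mid
        else:
            lo, glo = mid, gm
    return 0.5 * (lo + hi)
```

λ̂_het is analogous, zeroing instead the correlation between the absolute residuals and the log fitted values.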
These effects tend to cancel, but not completely. As shown in Figure 3, the net effect of including #12 is a decrease in f(S, β̂) for small S and an increase for large S. The former effect is plausible since #12 has a low value of S and a negative residual. The increase in f(S, β̂) for large S is not so plausible, but it is a consequence of using the Ricker model for the median recruitment.

Having analyzed the exact changes Δ^E_i, we turn to the accuracy of the approximations Δ^A_i and Δ^Q_i. The accuracy of Δ^Q_i is poor for cases #5 and #12, especially #12, and these are precisely the cases of interest. The same is true of the approximations to LD_i: the quick approximation is rather inaccurate. The quick estimate does better for cases #19 and #25, but this is merely fortuitous. The overestimation by the quick estimate is small here and cancels the error from not recalculating ẏ after the case-deletion. To see this we can examine the changes in the MLE induced by case-deletion when ẏ is kept equal to the geometric mean of all the y's. These changes are -.17, .77, -.013, and -.011 for i = 5, 12, 19, and 25, respectively. In all four cases, the change is closer to Δ^E_{λ,i} than to Δ^Q_{λ,i}.

We calculated 20 iterations of the bounded-influence estimate with the tuning constant a = 1.5, and using this as a starting value we calculated 10 iterations of the redescending estimate with (b₁, b₂, b₃) = (p + 1)^{1/2} (1.5, 3.5, 8.0). The bounded-influence estimate changed rapidly for the first five iterations, more slowly for the next five, and then was stable for the last ten iterations. To show the behavior of the algorithm, the value of θ̂ and the non-unity values of w_i are given in Table 2 for iterations 1, 2, 3, 4, 5, 10, and 20. It is interesting to examine w₄. This is 1 for the first three iterations but eventually decreases to .65, less than w₂₅ and w₁₉.
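For linear least squares the case-deletion change in β̂ has a closed form, (X'X)⁻¹ x_i r_i / (1 - h_ii), and that formula is exact; in the nonlinear transformation model the corresponding one-step quantities Δ^A_i and Δ^Q_i are only approximations, which is why Table 1 compares them with exact deletions. The editor's sketch of the linear identity:

```python
import numpy as np

def exact_deletion(X, y):
    """Exact case-deletion changes beta_hat - beta_hat_(i) for OLS,
    compared with the classic one-step (hat-matrix) formula."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    H = X @ XtX_inv @ X.T
    r = y - X @ beta
    exact, approx = [], []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        beta_i = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
        exact.append(beta - beta_i)
        # DFBETA identity: beta - beta_(i) = (X'X)^-1 x_i r_i / (1 - h_ii)
        approx.append(XtX_inv @ X[i] * r[i] / (1 - H[i, i]))
    return np.array(exact), np.array(approx)
```

For OLS the two agree to machine precision; it is in nonlinear settings that the analogous one-step approximation degrades at highly influential points such as #12.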
Case #4 is influential, but its influence is masked by #12. Robust estimation or subset deletion diagnostics are necessary to detect the influence of #4.

The values of θ̂ and the non-unity values of w_i are also given in Table 2 for the redescending estimator. Case #12 is completely rejected, e.g., w₁₂ = 0, for all iterations. The value of w₄ decreases slowly for three iterations and then jumps downward to almost 0 on the fourth iteration. With #4 nearly deleted, the weights w₁₆, w₁₉, and w₂₅ readjust on the fifth and sixth iterations. The redescending estimate is stable after the sixth iteration.

The first iteration of the redescending estimator is nearly identical to the MLE without #12. After #4 is strongly downweighted, λ̂ decreases to about -.3, which suggests even stronger heteroscedasticity. As #12 is completely rejected and #4 is nearly completely rejected, #16 becomes slightly influential and is slightly downweighted.

We do not necessarily advocate using the fully-iterated redescending estimator. The bounded-influence estimator or the first iterate of the redescender give a good fit to the bulk of the data and reject the outlier #12. Without further information about these data, we are not comfortable downweighting #4 and #16 as much as the fully-iterated redescending estimator downweights them. If forced to choose one estimate, we would choose the one-step redescender. The residuals {e_i(θ̂)} from this estimate are plotted in Figure 4. Ignoring #12, the remaining residuals show only slight heteroscedasticity and almost no skewness.

The redescending estimator completely rejects #12, so there is no need to refit with this anomalous case removed. However, it is instructive to see what happens if this is done.
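The iteration traced in Table 2 is a reweighting scheme. As a self-contained sketch, here is an iteratively reweighted redescending M-estimate of a simple location parameter with a Hampel-type three-part weight. The constants (1.5, 3.5, 8.0) echo the text, but the weight form, the MAD scale, the omission of the (p+1)^{1/2} scaling, and the location-only setting are the editor's simplifying assumptions, not the paper's full regression-and-transformation estimator.

```python
import numpy as np

def hampel_weight(u, a=1.5, b=3.5, c=8.0):
    """Hampel three-part redescending weight psi(u)/u:
    1 on [0, a], declining beyond, exactly 0 past c (an assumed form)."""
    u = abs(u)
    if u <= a:
        return 1.0
    if u <= b:
        return a / u
    if u <= c:
        return a * (c - u) / ((c - b) * u)
    return 0.0

def irls_location(y, steps=20):
    """Iteratively reweighted redescending M-estimate of location."""
    mu = np.median(y)
    s = np.median(np.abs(y - mu)) / 0.6745   # MAD scale estimate
    for _ in range(steps):
        w = np.array([hampel_weight((yi - mu) / s) for yi in y])
        mu = np.sum(w * y) / np.sum(w)       # weighted-mean update
    return mu
```

Points beyond the outer constant receive weight exactly 0, which is the sense in which the redescender "completely rejects" a case like #12.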
Both the bounded-influence and the redescending estimators then converge rapidly, with the first iterate equal for practical purposes to the fully-iterated estimate. This suggests that the algorithm converges slowly only in the presence of an extremely influential point such as #12.

In Table 3 we give the standard errors of the MLE by method (i), inverting the observed Fisher information matrix. We also give the method (ii) standard errors for the MLE, the one-step (c=1) and iterated (c=10) bounded-influence estimator, and the one-step (c=1) and fully iterated (c=10) redescending estimator. The method (i) and method (ii) standard errors of the MLE of λ are moderately different, but both are considerably smaller than the standard error of λ̂ for the robust estimators. Downweighting #12 seems to increase the variability of λ̂, but since #12 is known to be an unusual year it seems wise to use a robust estimator despite the higher variability. The standard errors of β̂₁ and β̂₂ are smaller for the robust estimators than for the MLE. At least in this example, robust estimation appears to cause a loss in efficiency for λ̂ but a gain in efficiency for β̂.

However, this loss in efficiency for λ̂ is only apparent, not real. There are two ways of looking at this, either conditioning or not conditioning on the event that a rock slide occurred in 1951. Conditional on there being exactly one rock slide, and it occurring in a year when the number of spawners was large, it is clear that the MLE is really much more variable than its standard error shows, and that it has a large mean square error relative to the robust estimators. If we do not condition on the occurrence of exactly one slide, but rather admit that some other number of slides could have occurred, and could have occurred in any years, then the change in λ̂ when #12 is deleted is large relative to the standard errors of λ̂.
Notice that Δ^E_{λ,12} is over twice the standard error of the MLE of λ and about 1.65 times the standard error of the one-step redescending estimator.

Table 1. Deletion diagnostics for the most influential cases. The superscripts denote the method of calculation: E = exact, A = accurate approximation, Q = quick approximation.

                            CASE NUMBER
DIAGNOSTIC          5        12        19        25
Δ^E_{λ,i}         -.10      .51      -.034     -.033
Δ^A_{λ,i}         -.14      .61      -.015     -.015
Δ^Q_{λ,i}         -.32     1.57      -.036     -.034
Δ^E_{β1,i}        -.22     -.48       .08       .12
Δ^A_{β1,i}        -.22     -.45       .11       .15
Δ^Q_{β1,i}        -.34     -.43       .10       .14
Δ^E_{β2,i}        1.48     2.54     -1.02     -1.25
Δ^A_{β2,i}        1.47     2.57     -1.14     -1.38
Δ^Q_{β2,i}        1.80     3.72     -1.12     -1.36
LD^E_i             .58     9.64       .32       .38
LD^A_i             .72     8.7        .27       .34
LD^Q_i            1.29    24.3        .27       .34

Table 2. Estimates of β and λ and the case weights for the robust estimators. BIE = bounded-influence estimator, RE = redescending estimator.

                      ESTIMATES                      CASE WEIGHTS
               β₁       β₂       λ      w₄     w₅     w₁₂    w₁₆    w₁₉    w₂₅
MLE           3.29     -7.00    .31     1      1      1      1      1      1
BIE: c=1      3.49     -8.00    .27     1     .71    .58     1      1      1
     c=2      3.59     -8.54    .20     1     .63    .34     1      1      1
     c=3      3.66     -8.87    .12     1     .62    .20     1      1     .99
     c=4      3.70     -9.11    .06    .98    .62    .13     1      1     .98
     c=5      3.74     -9.31    .02    .91    .62    .087    1      1     .96
     c=10     3.84     -9.71   -.07    .69    .63    .041   .97     1     .91
     c=20     3.85     -9.74   -.08    .65    .64    .038   .92     1     .91
RE:  c=1      3.91    -10.1    -.21    .65    .64    0      .92     1     .91
     c=2      3.93    -10.1    -.23    .52    .71    0      .86    .97    .86
     c=3      3.92    -10.0    -.23    .46    .75    0      .81    .94    .82
     c=4      3.91     -9.97   -.23    .043   .76    0      .78    .92    .81
     c=5      4.11    -10.7    -.36    .006   .77    0      .76    .90    .79
     c=10     4.00     -9.94   -.34    .005   .91    0      .64    .71    .61

Table 3: Standard errors of β̂ and λ̂.
ESTIMATOR                                          β₁       β₂       λ
MLE (inverse Fisher information, method (i))      .75      3.40     .21
MLE (as an M-estimator, method (ii))              .66      3.46     .16
BIE: One-step (c=1)                               .54      3.01     .42
     c=10                                                  3.51     .39
RE:  One-step (c=1)                               .51      2.79     .35
     Fully iterated (c=10)                        .43      2.37     .35

LIST OF FIGURES

Figure 1: Plot of returns (or recruits) against spawners, with median recruitment estimated using the one-step redescending estimate. Returns and spawners are in thousands of fish. Selected cases are identified.

Figure 2: Index plot of (LD^Q_i)^{1/2}, the square root of the approximate likelihood distance.

Figure 3: Difference in median recruitment estimated with and without case #12.

Figure 4: Residuals from the one-step redescending estimator.
REFERENCES

Atkinson, A. C. (1986). "Diagnostic tests for transformations." Technometrics, 28, 29-38.

Bates, D. M. and Watts, D. G. (1980). "Relative curvature measures of nonlinearity." Journal of the Royal Statistical Society, Series B, 42, 1-25.

Bates, D. M., Wolf, D. A., and Watts, D. G. (1986). "Nonlinear least squares and first-order kinetics." In Proceedings of Computer Science and Statistics: Seventeenth Symposium on the Interface, David Allen, ed. New York: North-Holland.

Belsley, D. A., Kuh, E., and Welsch, R. E. (1980). Regression Diagnostics. New York: Wiley.

Box, G. E. P. and Cox, D. R. (1964). "An analysis of transformations (with discussion)." Journal of the Royal Statistical Society, Series B, 26, 211-246.

Box, G. E. P. and Hill, W. J. (1974). "Correcting inhomogeneity of variance with power transformation weighting." Technometrics, 16, 385-389.

Carr, N. L. (1960). "Kinetics of catalytic isomerization of n-pentane." Industrial and Engineering Chemistry, 52, 391-396.

Carroll, R. J. and Ruppert, D. (1981).
"On prediction and the power transformation family." Biometrika, 68, 609-616.

Carroll, R. J. and Ruppert, D. (1984). "Power transformations when fitting theoretical models to data." Journal of the American Statistical Association, 79, 321-328.

Carroll, R. J. and Ruppert, D. (1985). "Transformations in regression: a robust analysis." Technometrics, 27, 1-12.

Cook, R. D. (1986). "Assessment of local influence." Journal of the Royal Statistical Society, Series B. To appear.

Cook, R. D. and Wang, P. C. (1983). "Transformations and influential cases in regression." Technometrics, 25, 337-343.

Cook, R. D. and Weisberg, S. (1982). Residuals and Influence in Regression. New York and London: Chapman and Hall.

Draper, N. and Smith, H. (1981). Applied Regression Analysis, 2nd Edition. New York: Wiley.

Hampel, F. R. (1974). "The influence curve and its role in robust estimation." Journal of the American Statistical Association, 69, 383-393.

Hampel, F. R. (1978). "Optimally bounding the gross-error sensitivity and the influence of position in factor space." In Proceedings of the ASA Statistical Computing Section. Washington, D. C.: ASA, 59-64.

Hampel, F. R. (1985). "The breakdown points of the mean combined with some rejection rules." Technometrics, 27, 95-107.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust Statistics. New York: Wiley.

Krasker, W. S. and Welsch, R. E. (1982). "Efficient bounded-influence regression estimation." Journal of the American Statistical Association, 77, 595-604.

Patefield, W. M. (1977). "On the maximized likelihood function." Sankhya, Series B, 39, 92-96.

Patefield, W. M. (1985). "Information from the maximized likelihood function." Biometrika, 72, 664-668.

Rao, C. R. (1973).
Linear Statistical Inference and Its Applications, 2nd Edition. New York: Wiley.

Ricker, W. E. (1954). "Stock and recruitment." Journal of the Fisheries Research Board of Canada, 11, 559-623.

Ricker, W. E. and Smith, H. D. (1975). "A revised interpretation of the history of the Skeena River sockeye salmon (Oncorhynchus nerka)." Journal of the Fisheries Research Board of Canada, 32, 1369-1381.

Rousseeuw, P. J. (1984). "Least median of squares regression." Journal of the American Statistical Association, 79, 871-880.

Rousseeuw, P. J. and Yohai, V. (1984). "Robust regression by means of S-estimators." In Robust and Nonlinear Time Series Analysis, Franke, J., Härdle, W., and Martin, R. D., eds. Berlin: Springer-Verlag.

Ruppert, D. (1985). "On the bounded-influence regression estimator of Krasker and Welsch." Journal of the American Statistical Association, 80, 205-208.

Ruppert, D. and Carroll, R. J. (1985). "Data transformations in regression analysis with applications to stock-recruitment relationships." In Resource Management: Proceedings of the Second Ralf Yorque Workshop held in Ashland, Oregon, July 23-25, 1984, S. Levin, ed. Berlin: Springer-Verlag.

Snee, R. D. (1986). "An alternative approach to fitting models when re-expression of the response is useful." Journal of Quality Technology. To appear.

Wu, C. F. J. (1986). "Jackknife, bootstrap and other resampling methods in regression analysis." Annals of Statistics, 14. To appear.

Yohai, V. (1985). High Breakdown Point and High Efficiency Robust Estimates for Regression. Technical Report No. 66, Department of Statistics, University of Washington, Seattle, Washington.