Diagnostics and Robust Estimation When Transforming The Regression Model and The Response R. J. Carroll and David Ruppert Department of Statistics University of North Carolina Chapel Hill, N.C. Abstract: In regression analysis, 27514 the response is often transformed to remove heteroscedastici ty and/or skewness. When a model already exists for the untransformedresponse, then it can be preserved by transforming both • the model methodology, several and which recent the we papers, response call and II with transform appears the same both highly sides II useful transformation. has been appl ied in practice. • parametric transformation family such as power transformations ~en the transformation can be estimated by maximum likelihood. however is very sensitive to outliers. In this article, This we When in a is used, The MLE propose 2 diagnostics which regression estimator parameters. similar diagnostics software. indicate cases and to the We also influential propose the Krasker-Welsch robust estimator a for the transformation or robust bounded-influence regression can be estimate. implemented on Both the standard e 3 ACKNOWLEDGEMENT The research of Professor Carroll was supported by Air Force Office of Scientific Research Contract AFOSR-S-49620-8S-C-0144. Professor Ruppert's research was supported by the National Science Foundation Grant DMS-8400602. 4 1. INTRODUCTION In regression analysis, the response y is often transformed for two di stinct purposes, and to improve variables x. known to induce the fit normally to some di stributed, simple model homoscedastic errors involving In many si tuations, however, y is already believed to fit a f(~;!) model transformation being 6 of y is a p-uimensional still needed parameter to manner. Specifically, parameter ~ be a ~ ~) = f ( ~) (x; B) €l""'€N are '4IJistributed. - - + skewness f(~;!) transformation a and/or in the same indexed by the ~ and assume that for some value of 1 where y(~) let If vector. remove heteroscedasticity, then one can transform both y and y explanatory e-€. (1) 1 independent and at least approximately normally Notice the difference between (1) and the usual approach of transforming only the response, not f(x;!), i.e., y ( ~) = f ( x; B) + e-€ .• - - ( 2) 1 It should be emphasized that model is not a substitute for (1) models are appropriate, but under different circumstances. been amply discussed by Box and Cox Smith f (~;!) (1980) and Cook and (1964) Weisberg and others, Model e.g., Typically (1982). (2). in is linear but in principle nonlinear models can be Both (2) has Draper and model used. (2), MOdel (1), which we call "transform both sides", has been discussed extensively in Carroll & Ruppert (1984), and Snee (1985), and Ruppert (1986) and we will only summarize those discussions. f(~;!) . when has two closely related interpretations; the error ~stribution of is y zero given and x. it In is the Carroll and Accordi nq to f(~;!) median and Carroll of Ruppert (1), is the value of y the conditional (1984), we were concerned wi th si tuations where a physical or biological model provides 5 f (~.i~'>' but where the error structure is a priori Snee (1985), Carroll and Ruppert Bates, Wolf, highly and Watts effective with (1985) real ~, ~ Exampl es by (1984), Ruppert and Carroll (1986), show that transforming both sides data, available and, as Snee shows, when By estimatinq unknown. both when f(~;!) a theoretical e and can be model is is obtained empirically. and! simultaneously, rather than simply fitting (~;!), the or iqi nal response y to f we achieve two purposes. Firstly, ! is estimated efficiently and therefore we obtain an efficient estimate of the condi tional median of y. distribution of y given ~, Secondl y, and, in we model partic~lar, the entire condi tional we have a model wich can account for the skewness and heteroscedasticity in the data. (1984) Ruppert discuss di str ibution of y in a the importance special case, Atlantic menhaden population. y, for and fixed let F be a of modeling the spawner-recrui t of the To specify the conditional distribution of distribution approximately normal, conditional analysi s ~ let h(y,~) be the inverse of y(~), Le. the Carroll and function of but not necessarily e 1.. We exactly h(y(~) ,~) =. y, assume normal ~. that since if F is y( U is the Box-Cox (1964) modified power family, y(~) = (Y~-l)/X ~~O ( 3) = log(y) then given F must have ~ ~=o finite , support whenever ~~O. 'The p-th quantile of y is h{[f(~)(~;!) + ~F-l(p)],~} and the condi tiond 1 )fl'",] r) ( 4) of Y is (5 ) where -a..s.x~a estimation of is (4) the and support (5). of F. E(ylx) Ruppert and Carroll (1986) is easily estimated by Duan's discuss (1983) ~ 6 "smearing" estimate, ~the which estimates F by the empirical distribution of residuals; see section 5. Many data sets we examined in the untransformed response y, but not in the residuals y ( ~) - f ( ~) (~;!) ; the transformation has accommodated, have substantial or explained, outliers the outlying y's. There is still the danger, however, that a few outliers in y can greatly affect x and J3. outliers should not be automatically deleted or downweighted, especially when they appear to be part response, but it should be of the standard practice normal variation to detect and in the scrutinize influential cases and when outliers are present to compare the MLE with a robust estimtor. In this paper we propose a diagnostic and a "bounded- influence" estimator which can be used together for detecting influential ~ cases and for robustly estimating Case deletion diagnostics ·e Belsley, Kuh, and Welsch for (1980) and !. linear regression and Cook and Wei sberg been extended to the response transformation model (1983) and Atkinson in ~ (1986). (2) are (1982), the influential be searching for them. to have The last two papers approximate the change can be unwieldly because of are and in by Cook and Wang as single cases or subsets of cases are deleted. subsets di scussed large number detected, one of Subset deletion possible subsets. needs some If strategy to Alternatively, one can examine weights from a robust estimator with good breakdown properties. Bounded-influence regression estimators, so-called because they place a bound on the influence of. each observation, have been proposed by Krasker (1980), Hampel (1978), and Krasker and Welsch last paper provides a good overview. tit bounded (1982), and this Carroll and Ruppert (1985) proposed influence transformation (BIT) estimator extending the Krasker- Welsch estimator to the response transformation model (2). 7 In this paper we adapt Atkinson's (1986) estimator to the "transform both sides" model. diagnostics and the BIT e The basic technique is to linearize the model (1) by a Taylor approximation at the MLE, and then to apply ordinary regression diagnostics and bounded-influence estimates. Our methods software. are designed to be easily implemented on All our computations were performed on the SAS package using PROC NLIN and rather simple data manipulations in PROC MATRIX steps. standard and DATA The computations would also be straightforward on other software packages. influence Our computational estimate for the eliminating the need for a techniques can be applied response-transformation to model a bounded(2) , lengthy FORTRAN program used in Carroll thus and Ruppert (1985). e· 8 MAXIMUM LIKELIHOOD ESTIMATION Throughout this paper y(~) is the modified power transformation Under model (1) the log-likelihood is (3). N = \ log f (!, ~, 0- ) L i=l where .10g f(!,>--"cr) = -tl0g(2Jlcr 2 ) + (~-l)log(y.) 1 -1/ ( 2cr 2 ) [y. ( >--.) - f ( ~) (x. ; B ) ] 2 • 1 _4Ifor fixed B and maximizes L(.!!, -1 (6) - ~ ~,cr) in cr. Thus, the MLE of B and ~ maximizes N =- (N(2) 10g{N- l ) (7 ) i=l where y is the geometric mean of Yl' ..• IY N. Therefore, B and ~ minimize ( 8) 9 Following Box and Cox For fixed >.-, minimize (8) (1964), in ! 13 and by ordinary squares and call the minimizer !( >.-). maximize graphically or >.- can be computed as follows. Plot numerically. (typi c~lly nonlinear) Lmax (!(>'-) ,>.-) This technique = !.T! attractive when f is not transformed and f(!.i!) minimized the in 13 by technique is linear less least-squares. at tracti ve When on is a grid and particularly for then (8) can be transforming both computationally least- but it does sides, gi ve the confidence interval ( 9) where xi (1-0() one degree of is the (I-a:) freedom. quanti Ie of the chi-square distribution wi th Minimizing (8) simultaneously in >.- straightforward with standard nonlinear regression software. fits the dummy variable Di D. 1 = [Y ~ >.-) - f 1 -1 matrix of 13 is One simply to the pseudo-model (!,~). using e- ( 10) - Not only is the least-squares estimate (!, >.-) the MLE, but for small values of covariance 13 ( >.-) (x. i 13 ) ] /y >.- -1 with regression parameter of =0 and the 0- pseudo and large N estimating the regression model (10) essentiaLly equivalent to inverting the Fisher information for (>.-,13 is ,0-) , see the appendix. • 10 3. and DIAGNOSTICS '!( i) ) be the ( ).. (i) MLE' s wi th A The changes .6~i respectively. interpreted measures of = influence, called case i, I\a. = (a-a.) are easily - -1. and (>-"-)..0» and without -1. the sample influence curve A (Cook and Weisberg (1982». Unlike 1\).. 1.. and -I\a. 1. in linear regression, cannot be computed exactly without actually recomputing the MLE wi th case i deleted. 1\).. 1.. However, and - 1\13. -1. can be approximated by applying Atkinson's (1986) "quick estimate" to a linearization of model (1). A 1\).. 0) and I\a( i ) we To approx ima te 1 i near i z e the model (11) Let w( = )..,!) u . ( ).. , a) 1 - ( )..) (~;!) ] I~ ).. -1 , = [y ( )..) _ f z()..,!) (J/J)..)z()..,a), = (J 1 J a . ) z ( ).. , a) -1 - = - ( J 1 J -1 a . ) f ( ~) (x; B ) 1'1).. -1 , - and = (u. ()..,a) , .. .,u ()..,a» 1. p- Sometimes we will z(y,~;!,)..) write dependence on y and x. = z z we If ~ = = z()..,!), w = fit G. tki nson '" -()..-)..)w - (!-!) equation T (12) If instead we fit IS (1986) II • instead of z(!,)..) The same holds for w()..,!) w()..,!), and A T ~ = ~()..,!). ~ and to emphasize ~()..,!). Also let Then (11) is approximated by + error. to the (12) full the (12) data, then of course with the ith case deleted, ).. =).. and then we obtain quick estimate II approx ima tioD, which we ca 11 ).."'Q( i) and 11 "Q !( i) • .6~.1 Let Q = " "Q (~-~(i» " ;\~. " and ;\fL Q -1 ;\B .• " "Q (!-!(i» = be (12) the resulting linear, refitting approximations to without case i is easy using standard matrix identities, which have been - and 1 Because --1 is e programmed in many statistical packages. We used the following computational scheme on (~,!), minimized by PROC NLIN to obtain in a DATA step, and finally then z, w, the linear model (12) was (;\}... - not 1 Q "Q ,;\13. -1 part is a ) is DFBETA. 1 of in the Belsley, standard SAS relatively large val ues of (1986 ) Atkinson's calculating " ;\~. - 1 ;\~. 1 1 Kuh, If we output. "Q ,;\B. ). --1 Welsch are (19) gives We feel, however, When only the response is transformed, ! both sides , ! ~ The unsealed nomenclature and is interested in simple a cases wi th formula for that influence for both 13_ should probably be assessed together and DFBETAS.1 . sensible to estimate fit on PROC REG. then the scaling is immaterial. equation Q alone. "Q (.6~. scaled version of was (Belsley, Kuh, and 1 which (8) and u were generated PROC REG calculates the regression diagnostic DFBETAS. Welsch 1980), First SASe first and then is depends heavily on to estimate 13. is usually vet:y stable as ideal ~ When is perturbed and ~ ~ for and this. e- and it is transforming ~ and Bean be treated simultaneously. In the example in section 5 and in other examples ;\~. report, - and 1 ;\~. Q were 1 often - considerably that we will not different, which is surprising since the quick estimate is reasonably accurate when y alone is transformed (Atkinson 1986). ~ and 13. The approximation . The difference is that here u depends on ;\~.Q does indicate cases with relatively 1 - large val ues of ;\ ~., and ;\ ~. Q seems adequate for diagnostic purposes. - 1 - To obtain a compute Cook's D or DFFITS psuedo model (12). 1 single measure of (Belsley, joint Kuh, influence for and Welsch (~,!) one can (1980) ) for the • 12 4. A general approach to variance subject approach was Hampel to begun a by (1978), Krasker ROBUST ESTIMATION robust bound Hampel estimation on the <1968, is to minimize gross-error 1974), appl ied asymptotic sensitivity. to This regression by (1980) and Krasker and Welsch (1982), and used in the response transformation problem by Carroll and Ruppert (1985). Here we will find parameters \ and!. wi th a robus t Let an estimator scale functional, t{~,y;~,B) ~, We will ignore e. g. bounding the influence for the which can be estimated separately the MAD, appl ied to the residuals. be the score function -e <13 ) = z().."B)( - w().."B) -). ~()..,,!) Since the MLE minimizes (8) it solves N r(x. ,y. ;~,B) \ -1 1 / i=l at least when \ of B. = and ! 0, are unconstrained and f (~;!) is a smooth function The MLE is highly sensitive to cases with large values of z(\,!), w()..,,!), or ~()..,,!) corresponding to response outliers, high leverdge ')ints for \, and high leverage points for !, respectively. A robust bounded-influence estimator ()..,,!) is found by solving 13 N \ W(y. ,x. i~,J3)t(y. ,x. i~,J3) 1 -1 1 -1 / i=l = 0 The optimal where W is a scalar weight function such that wt is bounded. choice of W was first studied (~,!) measured by the covariance of (1968, Hampel When univariate parametric families. measured by the norm of wt by choosing w, 1974) for asymptotic general efficiency must be balanced against robustness (~,!), For a multivariate parameter such as this balancing raises philosophical questions since there are many ways , of comparing covariance matrices approach we take generalizes regression estimates. or of norming the Krasker-Welsch vector (1982) functions. The bounded-influence Whether the Krasker-Welsch estimator is optimal in any meaningful sense is an (Ruppert 1985), open question but it seems quite satisfactory in practice. Let any be the gradient of log f(.!, >.-,cr) Q, weighting function W, the influence with respect to function evaluated (.!, >.-). at For (y.,x.) 1 -1 This defini tion of IF coincides wii th the usual defini tion when the Xl are the will be defined as where N B = N -1"\ E {W(y,x. ,~,J3)t(y,x. ;>.-,13)9, y 1 -1 / T (y,x. ;>.-,B)}. -1 i=l i.i.d. definition for of B some is H, and replaced the by averaging over expectation with definition is appropriate for fixed or random XIS. {xl' ..• ,x N} respect to in H. but not here, t = 1. Our In the defini tion of B on page 5 of Carroll and Ruppert (1985), W is incorrectly squared. that article, s In The asymptotic covariance matrix of 14 N A An = -1 \ 2 • T N / E {W (y,x.,~,B)r(y,x.;~,B)t Cy,X.i~,B)}. _ y -1 -1 -1 i=l intuitively reasonable way covariance; see Krasker discussion. The resultant and to norm Welsch measure wt (1982) of is to for use the further influence, the asymptotic motivation so-called and self- standardized gross-error sensitivity is ... T -1 max [IFCy.,x.;>.-,B) V IF«y.,x.,>.-,B)] 1 -1 1 -1 i = max i ([tCy.,x.,>.-,B) T-l A t(y.,x.,~,B)] t W2 Cy.,x.;>.-,B)}. 1 -1 1 -1 1 -1 - _~Pte that w2 has been incorrectly omitted from the last term in equation C15) of Carroll and Ruppert experience wi th other problems we "a" is between satisfactory. 1.1 and To bound = Y2 Y2 mus t be at least Cp+l) t . From suggest bounding Y2 by a Cp+l) t, where (1985). 1.5, and a = 1.2 or 1.3 has generally been by a(p+l)t, we use the weighting function . .; T -1. -; m1n{1,a(p+l) [t(y,~;>.-,!) A r(y,xix,B)] }. BereA is defined implicitly since it depends upon Wand vice versa. In practice, >.-, !, and A are estimated iteratively. We used a simple iterative scheme: (1) Fix a>l. ~p e· Let C be the total number of cycles. be preliminary estimates, probably the MLEs. (2) Define Set c=l. Set W. =11 - Let >'-p and 15 In t use the weighted geometric mean y = \ w,lOgy.ILw.). \ exp(l 1. 1. 1. (3) Update the weights: W. 1. = t - - T--l - -t } • mi n {l, a (P+ I) [ t (y. , x. ,).. ,13 ) A t (y. , x. i).. ,13 )] 1. -1. P -p 1. -1. p-p (4) Solve N \ 1 W.t(y.,x.;)..,I3) = 0 1. 1. -1. - i=l 13 =!, c=c+l and return to (2). (5) If c<C, set >"p = ).. and , -p If c=C then stop. In the examples discussed will not report, ).. and 13 in section 5 and other examples stabilized at C=2. Therefore, we that we recommend C=2, or perhaps C=3 for small N or data sets with extremely influential points. In fact C=l seems adequate, at least for diagnostic purposes. We calculated step (3) with a short program in PROC MATRIX and step was performed in PROC NLIN. (4) PROC MATRIX is needed only to invert A, and the program should be easily modified when PROC MATRIX is replaced by an interactive matrix language. easy on other packages. This iterative transformation model. Undoubtedly, the computations would also be We will call the final estimate the BITBS. method can also be used for the response Instead of using (3) one sets It should be noted that this approach differs from the BIT estimate of e- 16 Carroll and Ruppert (1985), since the BIT estimates -Because the likelihood scores for Band 0- are, 0- simultaneously. respectively, linear and quadratic in the residual, the joint bounded-influence estimator of and 0- behaves as a rescending (Hampel 1978). outliers bounded. , "psi-function", e.g. Therefore, the influence on approaches zero when the \ influence \'!' a Hampel M-estimator and! of extreme response for 0- is simultaneously 5. AN EXAMPLE When managing a fish stock, one must model the relationship between the annual spawning stock size and the eventual production catchable-sized fish (returns or recruits) from the spawning. of· new Ricker and Smi th (1975) gi ve numbers of spawners (S) and returns (R) from 1940 until 1967 for the assumptions Skeena about River factors sockeye salmon influencing the stock. Using survival of some simple juvenile fish, Ricker (1954) derived the theoretical model R = a S exp(a S) = f(S:!) (14) 2 relating Rand S. Other models have been proposed, e.g. by Beverton and HoI t l (1957). However, the Ricker model appears adequate and, in particular, gives almost the same fit for this stock as the Beverton-Holt model. From figure 1, a plot of R against S, it is clear that R is highly e· variable and heteroscedastic, with the variance of R increasing with its mean. Several cases appear somewhat outlying, in particular #5, #19, and #25. The model (12) number: , - model (14) and the was linearized about square see figure 2. root of Cook's the MLE D was to form plotted the pseudo against case Case #5 and especially case #12 stand out. In figure 1 case #12 is somewhat masked by the heteroscedasticity since the residual on the or igi nal scale [R 12- f (S 12:!)] after transformation by the MLE >-is substantially larger though extremely high leverage point, model (12) is h 12 = 0.685. An = a leverage [R~U-f(~) (S. ,a)] 1 1 - point = e.1 by this relatively .314 the residual still not [R excessive. 12 small, but (>--)-f(>--) (S:,i») Case #12 is an and its Hat matrix diagonal according to h value exceeding considered high by Hoaglin and Welsch (1978). also is criterion. 2p/N = 6/28 Since h In S = figure are plotted against S aWl though 3, = .214 is .23, case #5 is the residuals #12 stands out, e 18 it does not seem extremely outlying until one accounts for leverage. -compensate for leverage, Belsley, Ruh, and (1980) Welsch To suggest standardizing A. by its estimated standard error to produce 1 RSTUDENT. = e./S(i)(l-h.)t, 111 where S(i) is the root mean square 8rror without case i. For this data set RSTUDENT 12 = -4.40! In table 1 we present influence diagnostics applied to model (12), exact change the qui ck es tima te 1\ ".- . Q, and another the 1\".., - 1 . t'10n --~i I\~ N . approxlma 1\".., 1 - and there It are at - is evident least that 1 I\~.Q is not always close to 1 - two possible causes of inaccuracy: (i) I\~. Q uses a linearization of the parameters and (ii) in model (12) y is - 1 held fixed rather than readjusted as cases are deleted. _.e effects of cause computed the (ii), we experimented wi th nonlinear least squares a To isolate the different approximation. - estimate ". (i) N minimiz ing (8) when the y is calculated from the full data but the sum of squares in ( 8 ) is over Then we j;ii. Of course, set is as difficult to compute as 1\".. itself and is not of interest as a practical - approximation; inaccurate. we have In table 1, 1 calculated I\~.N just to learn --1 II\".i' is large for i=5 and 12. why I\~.Q -1 is In both cases, I\~.N approximates --1 1\".. much better than I\~.Q which suggests that cause 1· 1 (i) It is clear, though, that small changes in y is the primary problem. from deleting an outlying y. 1 can have a notable effect on ".. We have compared I\~. and 1\".. Q for several other data sets, the kinetics data of -- Carr (1960) 1 - 1 analyzed in Box and Hill (1974) and Carroll and Ruppert (1984), the Atlantic menhaden spawner-recruit data in Carroll and Ruppert :1985), .arrOll and the (1986). "population In all A" spawner-recruit cases data though in not Ruppert an and accurate 19 If',,~. I, approXim.ation, did indicate large values of and 1 f'"l.. Q appears to ... 1 • be an effective diagnostic of high influence. We computed the BITBS estimate with bound a=1.2 and C=3 cycles. table 2, and )..., iteration of BITBS. W. non-uni t The changes are 1 given for the MLE In and in the estimates and weights each from the first to the second iteration are small, and one iteration seems adequate for most practical purposes; suffice~ certainly two iterations the BITBS estimate severely downweights case #12, Although the estimate of ~ only changes from .31 to .13 in two iterations of BITBS, while the MLE becomes ~ = -.2 if #12 is deleted. For these data, the BITBS influential points and reduces, but does not eliminate, detects their the influence. Case #12 was the year 1951 when a rock slide severely reduced recruitment (Ricker and Smith 1975). To model one would delete case #12 and refit. recrui tment under Without #12, normal condi tions the MLE and the BITBS estimate are similar (table 3), and one would probably use the MLE. The bulk of the data indicate that recruitment is highly heteroscedastic and a severe errors. transformation if is needed to induce homoscedastic Because the anomalous #12 occurs where S is small, less heteroscedastici ty used ()... = -.2) #12 is not and a deleted. more Since moderate case transformation #12 is not from it indicates ()... = .3) the is target popula ti:,n of normal spawni ng years, it seems safe to delete it. Thi s data set is an example where one might consider a robust estimator that gives essentially zero influence to extreme outliers, e.g. transformation models of a redescending-psi a generalization to M-estimator. We normally prefer using a redescending M-estimator to rejecting outliers, since an M~timatnr rejection has a known large sample distribution. methods asymptotically. upon the MLE are not The effects of outlier well understood, even Here we di stingui sh between rej ecting outliers based on e- 20 some statistical criterion, ~observations known say a hypothesis test, and deleting to come from a non-target population. Although the MLE and the BITBS are similar, when #12 is deleted, #4 and #5, are substantially downweighted by the BITBS. cases, had been z:nasked when #12 was included, the presence of #12. further final information If analysi s. deleted, two Case #4 but #5 was already influential in One might explore fur ther deletions though wi thout we feel #4 that #5 and in addition to #12, should case #4, then the MLE changes noticeably; be case retained #5, see Table 4. or in the both are The deletion of #5 and the deletion of #4 affect the MLE in somewhat opposite directions, though deleting #4 affects the MLE more severely. The BITBS downweights #4 somewhat more than #5, so effects of downweighting #4 and #5 tend to cancel. ·e a We do not view the "transform both sides" method chiefly as choosing new scale for conditional analyzing distribution the of y response, on the but rather original scale on which direct measurements have been made. (\) f\ = f (\) f\ modeling By scale. scale, we mean the scale of primary scientific interest, Y as the "original" usually also the The model (~;!) +€ leads to y = yeS) = We are = (f\(~;!)+~€)l/~ exp(log assuming approximately f(~;!) that € symmetric, \ ¥ 0, + \S) is and \ = o. approximately this last normal point and in suggests particular that the estimate of conditional median of y given x can be estimated by . h e condi tional mean ~an (1983). is easily estimated by Let r. by the i-th residual .1 the "smearing" 21 = r. 1 y. >...- f >.. (x. , B) 1 -1 - = >.. (y ~ >.. ) - f ( >...) (x. , B))• 1 -1 - Then the smearing estimate is N =N -1,\ (15) / i=l with obvious changes if >.. = o. If >.. is a or if >...-1 is an integer, then E(yl~) can be estimated by methods of Miller ~ when to a (1984). Miller's estimators are particularly simple It is a common practice to round >.. is in the set {-l,-t,a,!,l}. value in applicable. this set, in which case Miller's (1984) estimate is We do not necessarily advocate rounding >.. especially since there is some theoretical evidence against this practice (Carroll 1984), but when the rounded value is very plausible according to a then test, inference. the rounding The common should little have rationale for effect rounding >..., hypothesis on subsequent to make the transformation easily interpretable, is less compelling when one views as part of a model for y on the original scale. or is admittedly of little >.. In the present example, biological direct _- and economic interest, but the same is true of, say, 10g(R). A In figure 1, plotted. ~, m(RIS) A and E(R/S), both calculated without E(R/S) was also estimated by fixing>... = #12, are 0, re-estimating ! and and using Miller's (1984) estimate: ~ A f(~;!)e~ 2/ 2 The smearing estimate and Miller's estimate are so close that they would be barely distinguishable had Miller's estimate be included in figure 1. A To 8~e . the influence of case #12 on m(R/S) A and estimates were calculated both wi th and without case #12. E(R/S), When these #12 is tit 22 deleted then not only are \ and B set equal to the MLE without #12, but 4ILlso the averaging in (15) is over i¥12. The changes in the estimated median and mean caused by deletion of case #12 are graphed in figure 4. As might be expected, deleting #12 caused the estimated median and mean recruitment to increase for small S, especially for S near 300-400. The most dramatic change when deleting #12 is a decrease in estimated median and mean recrui tment for large S. This decrease is since B 2 largely brought about by the decrease in B , 2 controls the shape of the Ricker curve. The "transform both sides" model is certainly that would be appropriate for this example. not the only model Since R is heteroscedastic but not greatly skewed one should consider heteroscedastic models such as (16 ) R = f(S;~,> OC + o-f (S;!)e, where the variance of R is propor.tional to a power of S or of f. In Ruppert and Carroll (1986), the model R (\) = f ( ~) (S;!) + 0- Soc e ( 17> was fit to the Skeena River data, \ = • 75 by with case #12 and oc = . 5. However, both Ho·. oc 1 ikel ihood ratio tests at level .10, = included. 0 and Ho·• \ so models = (4) The MLE was 1 are accepted and (16) both appear reasonable for these data, would be of interest. We plan to study diagnostics and robust estimation for model (17) in the future. though a re-analysis wi thout case #12 23 6. When a response y is SUMMARY thought to fit a f(~~!), model heteroscedastic and/or nonnormally distributed, then y and transformed in the same manner to induce f(~~!) normal errors while retaining the model of y. Often outliers transformation~ in the approximately original but f(~~!) y is can be homoscedi-istic, for the conditional median response are accommodated that is, the outliers are seen to be the resul t by of the skewness or heteroscedasticity in the untransformed data. In some situations, an outlier will indicate a substantially different transformation than that fitting the bulk of the data. In our example with 28 observations, case #12 is a response outlier associated with conditional a small Therefore, the rest value case of of #12 the the counter-indicates data, = transformation from \ and deleting .3 to \ = median (and the severe #12 changes mean) response. heteroscedastici ty the estimated 4It~ in power -.2. Influential cases should be detected and scrutinized as a matter of standard case good #12 in influential statistical our practice. example, case. In other there cases, In are some situations, good the reasons for appropriate such as with removing treatment of an the outliers will be less clear-cut. In this paper, we propose an approximation to the sample influence curve. Although effective the diagnostic approximation for influence is not cases. highly We accurate also propose it a is bounded :nfluence estimator, which can be used to pinpoint influencial cases, to accommodate them, or both. The diagnostic can both be computed with standard software. and the robust an or estimator e APPENDIX There are at least three methods of estimating the covariance matrix of (!, ~) : (i) Evaluate the Hessian of -L(!, the observed Fisher information matrix, left submatrix of I (!,~). evaluated at -1 (i i) . Use (I I* Let * ) -1 . (iii) ~,o-) 1. Use the be the ~ obtained of estimation and we have described estimat~d in !(~), by Hessian of which Then 2. is max (!, ~) (10 ) we have the least squares y(~) is fit to f(~) (~:!) with ~ fixed at~. estimate when -L upper- Suppose we followed the section ...... ! (p+l)X(p+l) Also there is a fourth method which only estimates the covariance matrix of 6. method This is • Use the estimate from model treated as a nonlinear regression problem. Box-Cox <.!!' ~,o-) at This fit also !, which we will call the method gives an estimated covariance matrix for (iv) estimate. -e (i i i) Methods lmplemented on justified by standard (iv) are the nonlinear the well.;...known Method (ii) estimation. for and large ignores 0- easiest to regression sample use since software. theory of they can be Method (i ) is maximum likelihood as a likelihood and treats L (6,>--) max - (!, ~) . In theorem identical, general not for A.2 just below, for parametric we the show that methods (i) transformation problem under estimation where a parameter and is (ii) study are but in eliminated by maximizing the likelihood over that parameter. N ~ In general, methods (i) and (ii) are not equivalent to (iii) even as co, are the but by theorem A.l below all three estimates of var (B) same in the limit as N Carroll a;;r} Ruppert . .rovide a ~ ~ co and ~ (1984) co and 0- o. ~ have let N ~ simple asymptotic theory for 0- fixed theory Bickel and co and ()- ~ Doksum "Small and 0 simul taneously to transformations, is complicated. (1981) 0-" since the usual asymptotics have 25 often proved to checked against from be good approximations to finite Moreover, in many Monte Carlo. engineering and the physical sciences, cr sample data does results sets, when especi ally seem small in the sense that the model fits the data very well. In Carroll and Ruppert (1984) we show that methods equivalent estimates of var(!) as N In summary, methods (i) and ~ (ii) are that method (ii> does not estimate var(cr) and All >.-. var ( J3 ) as four N ~ methods and 00 cr give It does correctly estimates var(>'-) or cov(>'-,!). >.- can be used in place of A a O. identical, except of not equivalent appear that estimates method The confidence interval standard course or the covariance of cr with J3 asymptotically o. ~ ~ and cr 00 (i) and (iv) are error for but a of (iii) (9) for 6-method A standard error of E(yl~) or m(yl~) will require an estimate of cov(>'-,!) and var(>'-). Programming method (i i) by computing the analytic second deri va ti ve matrix of L is somewhat a bother, but the gradient of L max programmed and can be differentiated numerically. 2 . L (8,>.-) = -(N/2)log~ (!,>.-) max - max is easily Since where N = N -1 \ / z 2 (y.,x.;/3,>.-) 1 -1 - i=l and since the gradient of ~2 at N = _-2 -cr \ / i=l is zero, Cd /J >.-) the Hessian of L max at (z~)J (J/J~)(zw) where all quanti ties on the right hand side are evaluated at (!, >.-) . It e 26 is not difficult to numerically compute the derivatives of (zu) and ~ith respect to! and ~. In table A.l we compare errors for the Skeena River the data method with (ii>, case methods produce similar standard errors of 6 ~ of ~ #12 and 13 . 2 and (iv) deleted. standard The three The standard error by method (iii) seems substantially inflated. Theorem A.l ()- 1 (iii> (zw) 0, Suppose methods (i) that and ~1'~2' ... (iii> are i.i.d. Then as N ~ 00 and of estimating the covariance matrix of ! are asymptotically equivalent. Sketch of proof: t N 12 = j\ 22 u(y.,x.;a,Uw(y.,x.;B,~) 1 -1 1 -1 - - i=l N J_ t Define: = ~ j 2 w (y i ' ~i ; a , ~) A A i=l and '" t ~12 ~ll = ~12 T ~22 Then the estimated covariance matrix of (!,).,) .here s2 is the mean square error. Now as N ~ by method (iii) 0) is s2 t -1 27 and by Taylor expansions (No-) -Itt 12 ~ 0 and as N ( 1) • ) ~ and 00 0- o. ~ Therefore, by (Note that YI is a function of €I; method <iii) the estimate see equation Var«N!/o-)!,N'~) of converges to o D By -1 theorem 1 of Carroll and Ruppert (1984) method has (i) asymptotic estimate of var( (N'/o-)!) but a different estimate of Note: Some type of asymptotics to hold. regulari ty condi tions The assumption that on {x.} -1 but other assumptions could be used instead. {x.} -1 are same Var(N'~). needed are i. i. d. For a the for the is convenient rigorous result, an appropriate regularity condition on f would also be needed. Theorem A. 2: Let L(~,o-), e€R k and o-€R q , be a real-valued Let Le and Lo- be the first partial derivatives and let Lee' function. Leo-' and e 28 e L o-e- be the second partial derivatives of L. For each 9 suppose 0-(9) satisfies L(9,e-(9» = sup (A. 1 ) L(~,o-). 0- Then is the upper left kXk submatrix of -1 (A. 2) T L 90- (i!.,o- (i!.» Proof: It is enough to prove the theorem when g=l, for then the general case follows by induction. a Le-(~'O-(i!.» = By (A.l) 0, -.so that Therefore (A. 3) Next (A. 4) where all terms on the right-hand side of (A.4) are evaluated at i!.,o-(i!.). Substituting (A.3) into (A.4) we have 29 Using the identity, V" v.eR k (A- 1 +UV T )-1 = A-1_A-1UVTA-1/<l+VTA-1U) if A€R kXk and (see problem 2.8, page 33 of Rao 1973) we have (A. 5) By another identity (see problem 2.7, page 33 of Rao 1973), (A.5) is the kXk upper left submatrix of (A.2). e- 30 Table 1 Diagnostics for the Skeena River sockeye salmon data. Diagnostics Case Number 5 12 19 25 Residual 917 -939 -882 -922 RSTUDENT 2.25 -4.40 -1.93 -2.04 Hat diagonal .23 .68 .08 .08 Cook's D .43 8.09 .09 .11 1. 23 -6.49 -.55 -.62 -.46 -.71 .13 .19 .55 1. 38 -.33 -.41 -1.01 6.06 -.11 -.11 -.31 1. 56 -.04 -.04 -.17 .77 -.01 -.01 -.10 .51 -.03 -.03 DFFITS ~ DFBETAS-J3 -e DFBETAS-J3 1 2 ~ DFBETAS- ~ /\~.1 Q - ~ /\~ - .N 1 A /\~. - 1 31 Table 2 Maximum likelihood and bounded-influence estimates for the Skeena River data. C=o (MLE) f3 1 f3 2 3.295 -6.9998x10- 4 .3141 >-- No cases deleted. C=l C=2 3.590 3.619 -8.307x10 -4 -8.49x10 C=3 3.622 -4 -8.50x10 .1921 .1329 .1138 .579 .647 W s 1.0 .448 w6 1.0 .931 w12 1.0 .253 .188 .172 wI9 1.0 .811 .857 .874 w25 1.0 .733 .776 .790 1.0 -4 1.0 e~ 32 Table 3 Maximum likelihood and bounded-influence estimates for the Skeena River data. C=O (MLE) 13 1 A 13 2 3.78 -9.54xlo- 4 C=1 C=2 3.89 3.98 -10.2x10 Case #12 deleted. -4 -9.93xlO A ~ -.199 -.254 -.235 w 1.0 .377 .575 W s 1.0 .448 .753 w 6 1.0 1.0 .946 wg 1.0 1.0 .954 1.0 .904 4 ~eW12 This case is deleted .w 18 1.0 w 19 1.0 .781 .860 w 1.0 .703 .846 25 -4 Table 4 Maximum likelihood estimation for the SkeenaRiver data with selected cases removed Cases Removed 6 .... 1 6'2 >-- #12 #4, #12 #5, #12 3.78 4.20 3.89 -9.54X10- 4 -.199 -11.2Xl0- 4 -.428 -10.5Xl0- 4 -.126 #4, #5, #12 4.30 -12.1Xl0- 4 -.392 34 Table A.l Estimated standard errors for the Skeena River sockeye salmon data without case #12 Method s.e(/31) s.e. (/32) s.e. (>-,) ( ii> 0.698 3.17XIO- 4 0.369 <iii> 0~711 3.33XIO- 4 0.624 (iv) 0.694 3.06XIO- 4 35 LIST OF FIGURES Fig. 1 - median Plot of returns recruitment (or estimated recrui ts > against without case spawners #12. wi th mean Selected cases and are identified. Fig. 2 - Square root of Cook's distance plotted against case number. Fig. 3 - Residuals spawners. = [R\-f(S,i>\] from the full-data MLE plotted against Selected cases are identi.fied. Fig. 4 - Differences in mean and median recruitment estimated without and with case #12 plotted against spawners. e'" 3600 - + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - t 3 .' 2 Cl -.,:,,:. o o u ~ o +-> o o ex:: Q) \... Itl :::l CT (/) I) N 8 10 12 l il 16 Case Number 18 3 " *5 2 - * * * * - * *16 * >I< .ro 0 ·e~ ~ *>1< '" * * (J) 0:: 1-I -1 >I< ~ >I< *>1< -1 J .j * 22 *4 J * 6 -J I -i .j -2 _i J ~ 19* *12 * 25 J \' -3 -i \ 0 e 250 • , , I 500 750 Spawner's (in thousands of fish) 1000 *9 100 , ,, , , , 50 '\ , MEAN , \ MEDIAN \ \ N .- 'l<= \ 0 \ \ OJ \ \ Vl \ ItS U \ \ \ \ oJ::: \ .j-J \ \ 3: \ \ \ -0 C ItS \ \ -SO \ e· .j-J \ :;:, \ 0 \ oJ::: .j-J \ \ \ 3: \ \ OJ \ .j-J \ \ ItS \ E .j-J Vl -100 \ \ OJ \ \ \ C OJ OJ \ \ 3: \ .j-J OJ .0 OJ U c OJ -1 SO OJ 44- Cl OJ 0> C ItS oJ::: u -200 f -250 400 20 r Spawners (in thousands of fish) 1200 REFERENCES ATKINSON, A. C. (986). "Diagnostic tests for To appear in Technometrics. transformations." (980) • "Relative curvature BATES, D. M. and WATTS, D. G. Journal of the Royal Statistical measures of nonlinearity." Society, series ~, 42, 1-25. BATES, D. M., WOLF, D. A., and WATTS, D. G. (1986) . "Nonlinear least squares and first-order kinetics." Manuscript. BELSLEY, D. A., KUH, E., and WELSCH, Diagnostics. New York: Wiley. BEVERTON, R. J. H., and S. J. HOLT. exploited fish populations." Office, London. 533 p. R. E. (1980) • Regression (1957). "On the dynamics of Her Majesty's Stationary analysis of BOX, G. E. P. and COX, D. R. (964) . "An transformations (with discussion)." Journal of the Royal Statistical Society, Series B, 26, 211-246. -e BOX, G. E. P. and HILL, W. J. (1974). Correcting inhomogeneity of variance with power transformation weighting. Technometrics, 16, 385-389. CARR, N. L. pentane. (1960) . Kinetics of catalytic isomerization of nIndustrial and Engineering Chemistry, 52, 391-396. CARROLL, R. J. (1982). Prediction and power transformation when the choice of power is restricted to a fini te set. Journal of the American Statistical Association, 77, 908-915. CARROLL, R. J. and RUPPERT, D. (1984). "Power transformations Journal of the when fi tting theoretical models to data." American Statistical Association, 79, 321-328. CARROLL, R. J. regression: and RUPPERT, D. (1985) • "Transformations a robust analysis." Technometrics, 27, 1-12. in COOK, R. D. and WANG, P. C. (1983) . "Transforma tion and influential cases in regression." Technometrics, 25, 337-343. COOK, R. D. and WEISBERG, S. (982). Regression. New York and London. DRAPER, N. and SMITH, H. (981). 2nd Edition. Wiley: New York. Residuals and Influence in Chapman and Hall. Applied Regression Analysis, 37 DUAN, N. (1983). "Smearing estimate: a retransformation method." Journal of Statistical Association, 78, 605-610. HAMPEL, F. A. Estimation. Berkeley. (1968) . Ph.D. nonparametric the American contributions to the Theory of Robust Thesis. University of California, HAMPEL, F. R. (1974). "The influence curve and its role in robust estimation." Journal of the American Statistical Association, 62, 1179-1186. HAMPEL, F. R. (1978) . "Optimally boundi ng the gross-errorsensi ti vi ty and the influence of posi tion in factor space." 1978 Proceedings of the ASA Statistical Computing Section. ASA, Washington, D.C., 59-64. HOAGLIN, D. C. and WELSCH, R. (1978). "The hat matrix in regression and ANOVA." The American Statistician, 32, 17-22. KRASKER, W. S. (1980). "Estimation in linear regression models with disparate data points." Econometrica, 48, 1333-1346. KRASKER, w. S. and WELSCH, R. E. (1982). "Efficient influence regression estimation." Journal of the Statistical Association, 77, 595-604. boundedAmerican MILLER, D. M. (1984) . "Reducing transformation bias fitting. The American Statistician, 38, 124-126. in RAO, C. R. (1973). Linear statistical inference applications, second edition. Wiley: New York. and RICKER, W. E. (1954). "Stock and recruitment." Fisheries Research Board of Canada, 11, 559-623. curve Journal its of RICKER, W. E. and SMITH, H. D. (1975). "A revised interpretation of the hi story of the Skeena River sockeye salmon (Oncorhynchus nerka)." Journal of the Fisheries Research Board of Canada, 32, 1369-1381. RUPPERT, D. (1985) . "On the bounded-influence regression estimator of Krasker and Welsch." Journal of the American -Statistical Association, 80, 205-208. RUPPERT, D. and CARROLL, rp0cession analysis relationships." To Yorque Conference of New York. R. J. (1986). "Data transformations in with applications to stock-recruitment appear in the Proceedings of the Ralph Natural Resource Management. Springer: 38 SNEE, R. D. (1985). "An alternative approach to fitting models when re-expression of the response is useful. II' Manuscript. .. ·e •
© Copyright 2025 Paperzz