1st European Workshop on the Assessment of Diagnostic Performance 71

PSEUDO R-SQUARED MEASURES FOR GENERALIZED LINEAR MODELS

Martina Mittlböck* and Harald Heinzl
Department of Medical Computer Sciences, Medical University of Vienna, Austria

* Corresponding author: Martina Mittlböck, Department of Medical Computer Sciences, Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria. e-mail: [email protected]; Phone: +43 1 40400 2276; Fax: +43 1 40400 2278

Summary
R-squared is routinely used in linear models. It quantifies how much of the variation of the outcome variable can be explained by the model covariates. Pseudo R-squared measures are becoming increasingly popular for generalized linear models. This paper reviews different proposals for defining pseudo R-squared measures, together with corresponding adjustments for models with small sample size and/or many covariates. It also presents a solution for Poisson regression models with over- or under-dispersion.

Key words: coefficient of determination; pseudo-R-squared measure; generalized linear models; deviance; sums-of-squares; over/under-dispersion.

Introduction
Regression models provide rather simple images of an intrinsically complex reality. They are used to assess the importance of potential prognostic factors and to improve the prognosis of the outcome of interest. For instance, one may test which factors affect the occurrence of complications after surgery. Regression models thus model and describe the effect of prognostic factors on the outcome of interest. The closer the model is to reality, the more of the variability of the outcome variable it can explain. This fraction of explained variability can be quantified by the coefficient of determination, also called the R-squared measure. It provides information in addition to the parameter estimates, p-values and confidence intervals of the covariates.
In the ideal case of a perfect prognosis, R-squared would reach one; if the model explains nothing at all, R-squared attains its lower bound of zero.

R-squared measures in linear and generalized linear models
R-squared is well known and frequently used in linear regression models, and it is becoming increasingly familiar in generalized linear models with non-normal outcomes, e.g. logistic, Poisson, gamma and inverse Gaussian models (Mittlböck and Heinzl, 2003). The pseudo R-squared measures proposed for generalized linear models are based either on deviances ($R^2_D$) or on sums of squares ($R^2_{SS}$) and are defined as follows:

$$R^2_D = 1 - \frac{D(y;\hat{\mu};\phi)}{D(y;\bar{y};\phi)} \qquad \text{and} \qquad R^2_{SS} = 1 - \frac{\sum_i (y_i - \hat{\mu}_i)^2}{\sum_i (y_i - \bar{y}_i)^2},$$

where $D(y;\hat{\mu};\phi)$ and $D(y;\bar{y};\phi)$ denote the scaled deviances of the full model (including the covariates of interest and the intercept) and of the intercept-only model, respectively, and $\phi$ is the dispersion (scale) parameter. $y = (y_1,\dots,y_n)$, $\hat{\mu} = (\hat{\mu}_1,\dots,\hat{\mu}_n)$ and $\bar{y} = (\bar{y}_1,\dots,\bar{y}_n)$ are the vectors of observed values, of estimated values under the full model, and of estimated values under the intercept-only model, respectively. Note that usually $\bar{y}_1 = \dots = \bar{y}_n$; however, this does not hold in every case (consider, e.g., a Poisson model with an offset).

By definition, $R^2_D$ and $R^2_{SS}$ are identical for normal outcomes, but they may give quite different estimates for non-normal outcomes. $R^2_D$ is commonly preferred because it corresponds to the model-fitting criterion of maximum likelihood. Consequently, it cannot become negative and it cannot decrease when covariates are added. None of these properties holds for $R^2_{SS}$; it can even decrease when a statistically significant covariate is added to a model (Heinzl and Mittlböck, 2002). However, $R^2_{SS}$ is occasionally preferred because the proportional reduction in the error sum of squares is claimed to have an intuitive interpretation for non-normal outcome variables as well.
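To make the two definitions concrete, the following sketch computes $R^2_D$ and $R^2_{SS}$ for a Poisson regression with log link, where $\phi = 1$, so scaled and unscaled deviances coincide. It is written in Python with NumPy (an assumption of this rewrite, not the authors' software). The small data set, the IRLS fitting routine and the function names `fit_poisson_glm`, `poisson_deviance` and `pseudo_r2` are hypothetical and purely illustrative, and the intercept-only model is taken without an offset, so $\bar{y}_i = \bar{y}$ for all $i$.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS and return the fitted means.

    X must contain the intercept column of ones as its first column.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the intercept-only solution
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return np.exp(X @ beta)


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y log(y/mu) - (y - mu)], using 0 * log(0) = 0."""
    safe_y = np.where(y > 0, y, 1.0)
    return 2.0 * np.sum(np.where(y > 0, y * np.log(safe_y / mu), 0.0) - (y - mu))


def pseudo_r2(y, mu_full):
    """Return (R2_D, R2_SS) relative to an intercept-only model without offset."""
    mu_null = np.full_like(y, y.mean())
    r2_d = 1.0 - poisson_deviance(y, mu_full) / poisson_deviance(y, mu_null)
    r2_ss = 1.0 - np.sum((y - mu_full) ** 2) / np.sum((y - mu_null) ** 2)
    return r2_d, r2_ss


# Small made-up data set: counts rising with x1; x2 is an arbitrary extra covariate.
y = np.array([0, 1, 1, 2, 3, 3, 5, 6, 8, 11], dtype=float)
x1 = np.arange(10, dtype=float)
x2 = np.array([1, -1, 0, 2, -2, 1, 0, -1, 2, -2], dtype=float)
ones = np.ones_like(y)

r2_one = pseudo_r2(y, fit_poisson_glm(np.column_stack([ones, x1]), y))
r2_two = pseudo_r2(y, fit_poisson_glm(np.column_stack([ones, x1, x2]), y))
print("x1 only: R2_D = %.3f, R2_SS = %.3f" % r2_one)
print("x1, x2:  R2_D = %.3f, R2_SS = %.3f" % r2_two)
```

Because $R^2_D$ is tied to the maximum-likelihood fit, adding `x2` cannot lower it. $R^2_{SS}$ carries no such guarantee, which is the property discussed above.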
Adjustments of R-squared measures in generalized linear models
Regression models are often used to screen for prognostic factors, even in situations where the sample size is rather small compared with the number of covariates. Because $R^2_D$ cannot decrease when covariates are added to the model, the estimated predictive ability of that model cannot decrease as the number of covariates grows, even if the added covariates are not correlated with the outcome of interest at all. Unadjusted R-squared measures may therefore be substantially inflated, which jeopardizes valid interpretation. R-squared values of 30 percent or higher can easily be reached even when there is no association at all between the independent and dependent variables. For example, with n = 50 observations of a normally distributed outcome and k normally distributed covariates, R-squared reaches 37 % with k = 20 covariates, although there is no association at all between the covariates and the dependent variable (Table 1).

Table 1: Inflation of unadjusted R-squared measures in linear models (n = 50)
k          1      2      4      6      8      10     20
$R^2_D$    <0.01  0.03   0.06   0.09   0.10   0.12   0.37

Adjusted R-squared measures, which also take the number of fitted parameters into account, are well established in linear regression models. The adjusted measure is defined via mean squared errors:

$$R^2_{SS,df} = 1 - \frac{(n-k-1)^{-1}\sum_i (y_i - \hat{\mu}_i)^2}{(n-1)^{-1}\sum_i (y_i - \bar{y})^2}.$$

For generalized linear models with non-normal outcomes (logistic, Poisson, gamma and inverse-Gaussian models), corresponding corrections have been studied (Mittlböck and Schemper, 1996, 1999; Cameron and Windmeijer, 1996; Waldhör et al., 1998; Mittlböck and Waldhör, 2000; Mittlböck, 2002; Mittlböck and Heinzl, 2002; Heinzl and Mittlböck, 2002).
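For the linear model, the inflation described above can be checked with a small simulation. The sketch below (Python with NumPy, an assumption of this rewrite; the function names are hypothetical) draws an outcome that is independent of k standard normal covariates, fits ordinary least squares, and averages the unadjusted $R^2$ and the adjusted $R^2_{SS,df}$ over replicates. Under these null conditions the unadjusted $R^2$ follows a Beta$(k/2, (n-k-1)/2)$ distribution with mean $k/(n-1)$, while $R^2_{SS,df}$ has mean zero. The sketch is not intended to reproduce Table 1 exactly.

```python
import numpy as np


def r2_linear(X, y):
    """Unadjusted R2 and df-adjusted R2_SS,df for an OLS fit with intercept.

    X holds the k covariates only; the intercept column is added here.
    """
    n, k = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    r2_adj = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    return r2, r2_adj


def simulate_null(n=50, k=20, reps=2000, seed=1):
    """Mean (R2, R2_adj) over replicates in which no covariate affects the outcome."""
    rng = np.random.default_rng(seed)
    results = np.empty((reps, 2))
    for r in range(reps):
        X = rng.standard_normal((n, k))
        y = rng.standard_normal(n)        # generated independently of X
        results[r] = r2_linear(X, y)
    return results.mean(axis=0)


mean_r2, mean_r2_adj = simulate_null()
print(f"mean unadjusted R2: {mean_r2:.3f} (theory k/(n-1) = {20 / 49:.3f})")
print(f"mean adjusted R2:   {mean_r2_adj:.3f} (theory 0)")
```

The adjusted value is simply $1 - (1 - R^2)(n-1)/(n-k-1)$, so it removes the expected chance contribution $k/(n-1)$ on average, but individual values can fall below zero.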
A rather ad-hoc bias adjustment is based on the analogy with the adjusted R-squared measure $R^2_{SS,df}$ of the linear model and is denoted by

$$R^2_{D,df} = 1 - \frac{D(y;\hat{\mu};\phi)/(n-k-1)}{D(y;\bar{y};\phi)/(n-1)}.$$

Another type of adjustment of the deviance-based R-squared measure for generalized linear models is based on shrinkage:

$$R^2_{D,\gamma} = \hat{\gamma}\,R^2_D = 1 - \frac{D(y;\hat{\mu};\phi) + k}{D(y;\bar{y};\phi)} = 1 - \frac{D(y;\hat{\mu}) + k\phi}{D(y;\bar{y})},$$

where $D(y;\hat{\mu})$ and $D(y;\bar{y})$ denote the unscaled deviances of the full model and the intercept-only model, respectively. The subscript $\gamma$ indicates that $R^2_{D,\gamma}$ is a shrunken R-squared measure, with $\gamma$ the shrinkage factor (see Copas, 1983, 1987, 1997; Van Houwelingen and Le Cessie, 1990). It is estimated by

$$\hat{\gamma} = \frac{\hat{G} - k}{\hat{G}},$$

where $\hat{G} = D(y;\bar{y};\phi) - D(y;\hat{\mu};\phi)$ is the log-likelihood ratio $\chi^2$ statistic for testing whether any of the $k$ fitted covariates is associated with the response. The expectation of the adjusted R-squared measure $R^2_{D,\gamma}$ corresponds to the underlying population value. Note that $E(R^2_{D,\gamma}) = 0$ if the true $R^2$ equals zero, that is, if the covariates have no influence on the outcome. The resulting adjustment coincides with the adjusted R-squared measure in linear regression and with previously proposed adjustments of R-squared measures for some generalized linear models (Mittlböck and Waldhör, 2000; Mittlböck, 2002; Mittlböck and Heinzl, 2002; Heinzl and Mittlböck, 2002).

$R^2_{D,df}$ equals $R^2_{D,\gamma}$ when $\phi$ is estimated by the unscaled deviance $D(y;\hat{\mu})$ divided by its degrees of freedom, $n-k-1$ (Heinzl and Mittlböck, 2003). Therefore, in Poisson and logistic regression models, where usually $\phi = 1$, $R^2_{D,df}$ implicitly accounts for over- or under-dispersion, assuming the simple modelling of over-dispersion by $\mathrm{Var}(Y) = \phi\mu$.
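The algebra behind the last two statements can be checked numerically. Since $\hat{G} = \{D(y;\bar{y}) - D(y;\hat{\mu})\}/\phi$, one gets $\hat{\gamma}R^2_D = \{D(y;\bar{y}) - D(y;\hat{\mu}) - k\phi\}/D(y;\bar{y})$, and substituting $\hat{\phi} = D(y;\hat{\mu})/(n-k-1)$ gives $1 - D(y;\hat{\mu})(n-1)/\{(n-k-1)D(y;\bar{y})\}$, which is $R^2_{D,df}$. The plain-Python sketch below evaluates both sides from unscaled deviances. The function names and the deviance values are hypothetical and chosen only for illustration.

```python
import math


def r2_d(dev_full, dev_null):
    """Unadjusted R2_D; phi cancels, so unscaled deviances can be used."""
    return 1.0 - dev_full / dev_null


def r2_d_df(dev_full, dev_null, n, k):
    """Ad-hoc degrees-of-freedom adjustment R2_{D,df}."""
    return 1.0 - (dev_full / (n - k - 1)) / (dev_null / (n - 1))


def r2_d_gamma(dev_full, dev_null, k, phi=1.0):
    """Shrinkage-adjusted R2_{D,gamma} = 1 - (D(y; mu_hat) + k*phi) / D(y; y_bar)."""
    return 1.0 - (dev_full + k * phi) / dev_null


def shrinkage_factor(dev_full, dev_null, k, phi=1.0):
    """gamma_hat = (G - k) / G, with G the scaled likelihood-ratio chi-square statistic."""
    g = (dev_null - dev_full) / phi
    return (g - k) / g


# Invented unscaled deviances, for illustration only.
n, k = 87, 12
dev_null, dev_full = 100.0, 82.0

phi = 0.5  # any fixed dispersion value
lhs = shrinkage_factor(dev_full, dev_null, k, phi) * r2_d(dev_full, dev_null)
print(math.isclose(lhs, r2_d_gamma(dev_full, dev_null, k, phi)))  # True

phi_hat = dev_full / (n - k - 1)  # deviance divided by residual degrees of freedom
print(math.isclose(r2_d_gamma(dev_full, dev_null, k, phi_hat),
                   r2_d_df(dev_full, dev_null, n, k)))           # True
print(round(r2_d_df(dev_full, dev_null, n, k), 3))               # 0.047
```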
If $\phi > 1$, over-dispersion is modelled; if $\phi < 1$, an under-dispersed model is assumed, where $\phi$ is estimated by the deviance divided by the degrees of freedom. Therefore $R^2_{D,df}$ should only be used for logistic and Poisson regression models if over- or under-dispersion is diagnosed and the model results (e.g. standard errors of the parameter estimates, test statistics and p-values) are adjusted accordingly; otherwise the value of $R^2_{D,df}$ will not agree with the fitted model.

Example for inverse-Gaussian regression models
The following example, discussed in Heinzl and Mittlböck (2002), illustrates the inappropriateness of $R^2_{SS}$ for some generalized linear models. Gruenberger et al. (2002) studied the survival of 212 patients treated with intra-arterial chemotherapy for unresectable colorectal cancer liver metastases. The authors were interested in the prognostic relevance of the following six covariates for survival:
1) differentiation of the primary tumor (MERGDIFF; levels: well / moderate / poor),
2) the percentage of hepatic replacement by colorectal cancer liver metastases (PHR; levels: <25 % / 25-50 % / >50-75 % / >75 %),
3) the number of colorectal cancer liver metastases (METCAT; levels: <5 tumors / 5-10 tumors / >10 tumors),
4) the presence of extrahepatic disease at laparotomy (EXHEPP; levels: no / yes),
5) carcinoembryonic antigen (CEA) response to hepatic arterial infusion chemotherapy (CEARESPO; levels: non-responder / patient whose postoperative CEA level returned to normal / non-secretor / responder), and
6) echogenicity (ECHO; levels: hypo / hyper).
Only patients treated between 1992 and 1996 with complete data records are included. Because the underlying illness is so severe, 85 of these 87 patients died before the end of the study period. Thus only 2 censored overall survival times exist, which, for simplicity, are treated as uncensored.
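The number $k$ that enters $R^2_{D,df}$ and $R^2_{D,\gamma}$ counts estimated covariate parameters, not factors. Assuming reference-level (treatment) coding, in which a factor with $L$ levels contributes $L - 1$ dummy variables (the text does not name the coding scheme), $k$ follows directly from the levels listed above. The short Python sketch below, with hypothetical names, does this count for the full model and for the model without MERGDIFF.

```python
# Number of levels per factor, as listed in the text.
factor_levels = {
    "MERGDIFF": 3,  # well / moderate / poor
    "PHR": 4,       # <25 % / 25-50 % / >50-75 % / >75 %
    "METCAT": 3,    # <5 / 5-10 / >10 tumors
    "EXHEPP": 2,    # no / yes
    "CEARESPO": 4,  # four response categories
    "ECHO": 2,      # hypo / hyper
}


def n_dummies(levels, exclude=()):
    """Dummy variables under reference-level coding: (levels - 1) per factor."""
    return sum(n_lev - 1 for name, n_lev in levels.items() if name not in exclude)


k_full = n_dummies(factor_levels)
k_without_mergdiff = n_dummies(factor_levels, exclude=("MERGDIFF",))
print(k_full, k_without_mergdiff)  # 12 10
```

These counts agree with the 12 dummy variables of the model fitted in this example and with the 10 remaining when MERGDIFF is dropped.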
Overall survival time was used as the outcome variable in an inverse-Gaussian (IG) regression model with reciprocal link. All six potential prognostic factors were used as explanatory variables. Since all prognostic factors were considered to be measured on a qualitative scale, the model consisted of 12 dummy variables. Only MERGDIFF appeared to be a statistically significant prognostic factor in the IG regression model; the other 5 factors were not statistically significant. This is reflected in the values of $R^2_D$ and its adjustments (second row of Table 2): the unadjusted value of 18 percent is corrected to values between 5 and 8 percent. The values of $R^2_{SS}$ and its adjustment are negative.

Table 2: Deviance-based and sums-of-squares-based R-squared measure values (in percent) for the liver metastases data
MERGDIFF       k    $R^2_D$   $R^2_{D,df}$   $R^2_{D,\gamma}$   $R^2_{SS}$   $R^2_{SS,df}$
not in model   10   11        -1             1                  10           -2
in model       12   18        5              7                  -18          -37

In a next step we wanted to know what happens if the crucial prognostic factor MERGDIFF is removed from the model. When the IG regression model was refitted, none of the remaining 5 prognostic factors was statistically significant. The adjustments of the R-squared measures work well here: unadjusted values of around 10 percent are shrunk to values of around zero percent (first row of Table 2). That is, these 5 prognostic factors cannot explain any variation in the overall survival times when an IG regression model with reciprocal link is used. Finally, the striking case of MERGDIFF in Table 2 reveals an unwanted feature of $R^2_{SS}$: adding a statistically significant prognostic factor to the IG model can considerably decrease the observed value of $R^2_{SS}$, and this value can even become negative.

R-squared measures in over- or under-dispersed Poisson regressions
The variance for Poisson data is $\mathrm{Var}(Y) = \mu$.
However, over- or under-dispersed models can be fitted by assuming the variance function $\mathrm{Var}(Y) = \phi\mu$. The scale parameter $\phi$ is usually estimated by the deviance or by Pearson's $\chi^2$ statistic divided by the degrees of freedom. Heinzl and Mittlböck (2003) investigated the deviance-based pseudo R-squared measure and corresponding adjustments for such over- and under-dispersed Poisson regression models. The unadjusted $R^2_D$ is not affected by over- or under-dispersion, but its values are too high for models with many covariates and/or few observations. $R^2_{D,\gamma}$ is a good R-squared measure for models without over- or under-dispersion, but it gives values that are too low for under-dispersed and too high for over-dispersed models. If $\mathrm{Var}(Y) = \phi\mu$, an appropriate pseudo R-squared measure takes into account that $\phi \neq 1$. Heinzl and Mittlböck (2003) showed in a simulation study, in which pseudo-Poisson distributed random variables with over- and under-dispersion were generated, that

$$R^2_{D,\gamma\phi} = 1 - \frac{D(y;\hat{\mu}) + k\phi}{D(y;\bar{y})}$$

is nearly unbiased. $R^2_{D,\gamma D}$ denotes the version in which $\phi$ is estimated from the deviance, and $R^2_{D,\gamma P}$ the version in which Pearson's $\chi^2$ statistic is used to estimate $\phi$. Both $R^2_{D,\gamma D}$ and $R^2_{D,\gamma P}$ behave rather satisfactorily in the simulation study. Part of the results is illustrated in Figure 1, which shows boxplots of unadjusted and adjusted pseudo R-squared values (in percent) over 1000 simulated samples for a true population value of 40 percent. A rather extreme situation, with a very small sample size of 16 and 5 covariates, is chosen for an under-dispersed ($\phi = 0.25$), a standard ($\phi = 1$) and an over-dispersed ($\phi = 4$) Poisson regression model.
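The dispersion-corrected measures can be sketched as follows, again in Python with NumPy as an assumption of this rewrite. The code fits a Poisson log-link model by IRLS (quasi-likelihood, so non-integer responses are allowed), estimates $\phi$ from the deviance and from Pearson's $\chi^2$, each divided by $n-k-1$, and returns $R^2_D$, $R^2_{D,\gamma}$ (with $\phi = 1$), $R^2_{D,\gamma D}$ and $R^2_{D,\gamma P}$. The helper `pseudo_poisson` generates data as $\phi$ times a Poisson variable with mean $\mu/\phi$, following the simulation design described below. The sample size of 500 and all function names are hypothetical choices for illustration; the sketch does not reproduce Figure 1.

```python
import numpy as np


def fit_poisson_glm(X, y, n_iter=100, tol=1e-10):
    """Log-link Poisson (quasi-likelihood) fit by IRLS; X has an intercept column first.

    Non-integer responses are allowed, as produced by pseudo_poisson below.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return np.exp(X @ beta)


def poisson_deviance(y, mu):
    """Unscaled Poisson deviance, using 0 * log(0) = 0."""
    safe_y = np.where(y > 0, y, 1.0)
    return 2.0 * np.sum(np.where(y > 0, y * np.log(safe_y / mu), 0.0) - (y - mu))


def dispersion_corrected_r2(X, y):
    """R2_D, R2_{D,gamma} (phi = 1) and R2_{D,gamma phi} with phi from deviance / Pearson."""
    n, p = X.shape
    k = p - 1                                   # covariates besides the intercept
    df = n - k - 1
    mu_hat = fit_poisson_glm(X, y)
    dev_full = poisson_deviance(y, mu_hat)
    dev_null = poisson_deviance(y, np.full(n, y.mean()))  # intercept-only, no offset
    phi_d = dev_full / df                                  # deviance-based estimate
    phi_p = np.sum((y - mu_hat) ** 2 / mu_hat) / df        # Pearson-based estimate
    return {
        "R2_D": 1.0 - dev_full / dev_null,
        "R2_D_gamma": 1.0 - (dev_full + k) / dev_null,
        "R2_D_gammaD": 1.0 - (dev_full + k * phi_d) / dev_null,
        "R2_D_gammaP": 1.0 - (dev_full + k * phi_p) / dev_null,
        "phi_D": phi_d,
        "phi_P": phi_p,
    }


def pseudo_poisson(rng, mu, phi):
    """phi * Poisson(mu / phi): mean mu and variance phi * mu."""
    return phi * rng.poisson(mu / phi)


rng = np.random.default_rng(2024)
n, k = 500, 5
x = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), x])
mu = np.exp(2.0 + 0.5 * x[:, 0])   # only x1 affects the mean; beta0 = 2
for phi in (0.25, 1.0, 4.0):
    res = dispersion_corrected_r2(X, pseudo_poisson(rng, mu, phi))
    print(phi, {name: round(float(val), 3) for name, val in res.items()})
```

With $\phi = 4$ the estimated dispersion should lie well above 1, so $R^2_{D,\gamma D}$ and $R^2_{D,\gamma P}$ fall below $R^2_{D,\gamma}$; with $\phi = 0.25$ the ordering reverses. Note that $R^2_{D,\gamma D}$ is algebraically identical to $R^2_{D,df}$.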
Poisson-distributed random values were generated with mean $\mu/\phi$ and multiplied by $\phi$, so that the resulting random variable $Y$ mimics an over- or under-dispersed Poisson distribution with mean $\mu$ and variance $\phi\mu$, which works well for simulation purposes. As usual, the mean is modelled as $\mu = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$. Only the first covariate $x_1$ was assumed to influence the mean; the prognostic effect of all other covariates was eliminated by setting $\beta_2 = \dots = \beta_k = 0$, and $\beta_0$ was set to an arbitrary value of 2.

[Figure 1: boxplots of $R^2_D$, $R^2_{D,\gamma}$, $R^2_{D,\gamma D}$ and $R^2_{D,\gamma P}$ for each of $\phi$ = 0.25, 1 and 4]
Fig. 1: Boxplots of the distributions of 1000 simulated samples for unadjusted and adjusted pseudo R-squared values (in percent) for a sample size of 16, 5 covariates and $\phi$ = 0.25, 1 and 4, respectively. Horizontal lines are drawn at the true population value of 40 percent.

R-squared measures of Poisson and logistic regression models in epidemiological studies
If the Poisson regression model can be used as an approximation to the logistic regression model (which happens quite frequently in epidemiological research), then the model results concerning parameter estimates, standard errors, confidence intervals and p-values will be identical; however, the corresponding R-squared estimates will differ dramatically (Mittlböck and Heinzl, 2001). The Poisson model may give an R-squared estimate of, say, 90 percent, which would indicate nearly perfect prediction, whereas the corresponding logistic model may achieve an R-squared value of, say, 10 percent, which is rather poor. This apparent contradiction becomes understandable if we keep in mind that the Poisson regression model is used to estimate group means. For instance, the Poisson model predicts, say, 3.9 % cures for a group.
If, say, 4.1 % cures are actually observed, the prediction is very good with respect to this group. The logistic regression model, on the other hand, attempts a prognosis for the individual patient. Naturally, it is much easier to predict that one out of 25 will be cured than to predict who will actually be cured.

Conclusions
In summary, correctly adjusted R-squared values provide essential information in addition to the usual modelling results, as they try to quantify the knowledge (or nescience) about the outcome variable of interest. Sums-of-squares-based R-squared measures cannot be recommended in general. Proper adjustments are necessary when models with small sample size and/or many covariates are fitted.

References
Cameron, A.C. and Windmeijer, F.A.G. (1996). R2 measures for count data regression models with applications to health-care utilization. Journal of Business and Economic Statistics, V14, 209-220.
Copas, J.B. (1983). Regression, prediction and shrinkage (with discussion). Journal of the Royal Statistical Society B, V45, 311-354.
Copas, J.B. (1987). Cross-validation shrinkage of regression predictors. Journal of the Royal Statistical Society B, V49, 175-183.
Copas, J.B. (1997). Using regression models for prediction: shrinkage and regression to the mean. Statistical Methods in Medical Research, V6, 167-183.
Gruenberger, T., Zhao, J., King, J., Chung, T., Clingan, P.R. and Morris, D.L. (2002). Echogenicity of liver metastases from colorectal cancer is an independent prognostic factor in patients treated with regional chemotherapy. Cancer, V94, 1753-1759.
Heinzl, H. and Mittlböck, M. (2002). R-squared measures for the inverse Gaussian regression model. Computational Statistics, V17, 525-544.
Heinzl, H. and Mittlböck, M. (2003). Pseudo R-squared measures for Poisson regression models with over- or underdispersion. Computational Statistics and Data Analysis, V44, 253-271.
Mittlböck, M. (2002). Calculating adjusted R2 measures for Poisson regression models. Computer Methods and Programs in Biomedicine, V68, 205-214.
Mittlböck, M. and Heinzl, H. (2001). A note on R2 measures for Poisson and logistic regression models when both models are applicable. Journal of Clinical Epidemiology, V54, 99-103.
Mittlböck, M. and Heinzl, H. (2002). Measures of explained variation in gamma regression models. Communications in Statistics - Simulation and Computation, V31, 61-73.
Mittlböck, M. and Heinzl, H. (2003). Measures of explained variation for nonnormal outcomes. Proceedings of the Second Workshop on Research Methodology, June 25-27, 2003, Amsterdam, 47-50.
Mittlböck, M. and Schemper, M. (1996). Explained variation for logistic regression. Statistics in Medicine, V15, 1987-1997.
Mittlböck, M. and Schemper, M. (1999). Computing measures of explained variation for logistic regression models. Computer Methods and Programs in Biomedicine, V58, 17-24.
Mittlböck, M. and Waldhör, T. (2000). Adjustments for R2-measures for Poisson regression models. Computational Statistics and Data Analysis, V34, 461-472.
Van Houwelingen, J.C. and Le Cessie, S. (1990). Predictive value of statistical models. Statistics in Medicine, V9, 1303-1325.
Waldhör, T., Haidinger, G. and Schober, E. (1998). Comparison of R2 measures for Poisson regression by simulation. Journal of Epidemiology and Biostatistics, V3, 209-215.