1st European Workshop on the Assessment of Diagnostic Performance
PSEUDO R-SQUARED MEASURES FOR GENERALIZED
LINEAR MODELS
Martina Mittlböck∞ and Harald Heinzl
Department of Medical Computer Sciences, Medical University of Vienna,
Austria
Summary
R-squared is routinely used in linear models, where it quantifies how much
of the variation in the outcome variable can be explained by the model
covariates. Pseudo R-squared measures are becoming increasingly popular for
generalized linear models. Different suggestions for defining pseudo
R-squared measures are reviewed, together with corresponding adjustments
for models with small sample size and/or many covariates. A solution for
Poisson regression models with over- or under-dispersion is also presented.
Key words: coefficient of determination; pseudo-R-squared measure;
generalized linear models; deviance; sums-of-squares; over/under-dispersion.
Introduction
Regression models provide rather simple images of an intrinsically complex
reality. They are used to model and describe the effect of potential
prognostic factors on the outcome of interest, to assess the importance of
these factors, and thereby to improve the prognosis of the outcome. For
instance, one may test which factors affect the occurrence of complications
after surgery. The closer the model is to reality, the more variability of
the outcome variable can be explained by the model. This fraction of
explained variability can be quantified by the coefficient of determination,
also called the R-squared measure. It provides information in addition to
the parameter estimates, p-values and confidence intervals of the
covariates. In the ideal case of a perfect prognosis, R-squared attains its
upper bound of one. If, on the other hand, nothing at all can be explained
by the model, the lower bound of zero is attained.

∞ Corresponding Author: Martina Mittlböck, Department of Medical Computer Sciences,
Medical University of Vienna, Spitalgasse 23, A-1090 Vienna, Austria
e-mail: [email protected]
Phone: +43 1 40400 2276, Fax: +43 1 40400 2278
R-squared measures in linear and generalized linear models
R-squared is well known and frequently used in linear regression models,
and it is becoming increasingly familiar in generalized linear models with
non-normal outcomes, e.g. logistic, Poisson, gamma and inverse Gaussian
models (Mittlböck and Heinzl, 2003). Proposed pseudo R-squared measures
for generalized linear models are based on deviances ($R^2_D$) and on
sums-of-squares ($R^2_{SS}$) and are defined as follows:

\[
R^2_D = 1 - \frac{D(y; \hat\mu; \phi)}{D(y; \bar y; \phi)}
\quad\text{and}\quad
R^2_{SS} = 1 - \frac{\sum_i (y_i - \hat\mu_i)^2}{\sum_i (y_i - \bar y_i)^2},
\]

where $D(y; \hat\mu; \phi)$ and $D(y; \bar y; \phi)$ denote the scaled
deviances of the full model including the covariates of interest and of the
intercept-only model, respectively. $y = (y_1, \ldots, y_n)$,
$\hat\mu = (\hat\mu_1, \ldots, \hat\mu_n)$ and
$\bar y = (\bar y_1, \ldots, \bar y_n)$ are the vectors of observed values,
estimated values under the full model, and estimated values under the
intercept-only model, respectively. Note that usually
$\bar y_1 = \ldots = \bar y_n$; however, this does not hold in every case
(consider e.g. a Poisson model with offset).
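The two definitions can be sketched for a small Poisson example. This is only an illustration under stated assumptions: the counts and the grouping are hypothetical, the function names are chosen here, and the dispersion parameter is taken as one so that scaled and unscaled deviances coincide. With a log link, an intercept and one binary covariate, the maximum likelihood fitted values are the group means, so no iterative fitting is needed.

```python
import math

def poisson_deviance(y, mu):
    """Unscaled Poisson deviance 2 * sum[y*log(y/mu) - (y - mu)], with 0*log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total -= yi - mi
    return 2.0 * total

def r2_deviance(y, mu_full, mu_null):
    """Deviance-based pseudo R-squared: 1 - D(full) / D(intercept-only)."""
    return 1.0 - poisson_deviance(y, mu_full) / poisson_deviance(y, mu_null)

def r2_sums_of_squares(y, mu_full, mu_null):
    """Sums-of-squares-based pseudo R-squared."""
    sse = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_full))
    sst = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_null))
    return 1.0 - sse / sst

# Hypothetical counts and a hypothetical binary covariate (not from the paper).
y = [0, 1, 2, 1, 3, 5, 4, 8, 6, 7]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Fitted values: group means under the full model, overall mean under the
# intercept-only model (no offset, so all intercept-only values are equal).
group_mean = {g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
              for g in set(group)}
mu_full = [group_mean[g] for g in group]
mu_null = [sum(y) / len(y)] * len(y)

r2_d = r2_deviance(y, mu_full, mu_null)
r2_ss = r2_sums_of_squares(y, mu_full, mu_null)
print(f"R2_D = {r2_d:.3f}, R2_SS = {r2_ss:.3f}")
```

For these hypothetical counts the two measures do not coincide, which is the kind of disagreement to be expected for non-normal outcomes.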
By definition, $R^2_D$ and $R^2_{SS}$ are identical for normal outcomes,
but they may result in quite different estimates for non-normal outcomes.
$R^2_D$ is commonly preferred as it corresponds to the model fitting
criterion of maximum likelihood. Thus it cannot become negative and it
increases monotonically with an increasing number of covariates. None of
these properties holds for $R^2_{SS}$; it can even decrease if a
statistically significant covariate is added to a model (Heinzl and
Mittlböck, 2002). However, $R^2_{SS}$ is occasionally preferred because it
is claimed that the proportional reduction in the error sums-of-squares has
an intuitive interpretation also for non-normal outcome variables.
Adjustments of R-squared measures in generalized linear models
Regression models are often used to screen for prognostic factors, even in
situations where the sample size is rather small compared to the number of
covariates. As $R^2_D$ increases monotonically if covariates are added to
the model, the estimated predictive ability of that model cannot decrease
with an increasing number of covariates, even if they are not associated
with the outcome of interest at all. Unadjusted R-squared measures may
therefore be substantially inflated, jeopardizing valid interpretation.
R-squared values of 30 percent or higher can easily be reached, even when
no association between independent and dependent variables exists at all.
For example, with n=50 observations, a normally distributed outcome and k
normally distributed covariates that are unrelated to the outcome,
R-squared increases up to 37 % for k=20 covariates (Table 1).
Table 1: Inflation of unadjusted R-squared measures in linear models (n=50)

k          1       2      4      6      8      10     20
$R^2_D$    <0.01   0.03   0.06   0.09   0.10   0.12   0.37
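The inflation effect can be reproduced with a small simulation. The sketch below is not the procedure used for Table 1; it is an assumed setup in which both the outcome and the covariates are independent standard normal noise, the least-squares fit is obtained by Gram-Schmidt orthogonalisation, and the seed and number of replications are arbitrary. For this null setting the expected R-squared of a linear model is known to be k/(n-1), which is printed for comparison; individual samples, such as the values in Table 1, scatter around it.

```python
import math
import random

def centre(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def null_r_squared(n, k, rng):
    """R-squared of a least-squares fit of pure-noise y on k pure-noise covariates."""
    # Centring y and the covariates is equivalent to fitting an intercept.
    resid = centre([rng.gauss(0.0, 1.0) for _ in range(n)])
    total_ss = dot(resid, resid)
    basis = []  # orthonormal basis of the centred covariate space
    for _ in range(k):
        col = centre([rng.gauss(0.0, 1.0) for _ in range(n)])
        for q in basis:  # Gram-Schmidt: remove directions already in the basis
            c = dot(col, q)
            col = [x - c * qi for x, qi in zip(col, q)]
        norm = math.sqrt(dot(col, col))
        q = [x / norm for x in col]
        basis.append(q)
        c = dot(resid, q)  # remove the part of y explained by the new direction
        resid = [r - c * qi for r, qi in zip(resid, q)]
    return 1.0 - dot(resid, resid) / total_ss

rng = random.Random(1)   # arbitrary seed
n, reps = 50, 200        # reps is an arbitrary choice for this sketch
mean_r2 = {}
for k in (1, 2, 4, 6, 8, 10, 20):
    mean_r2[k] = sum(null_r_squared(n, k, rng) for _ in range(reps)) / reps
    print(f"k={k:2d}  mean R-squared={mean_r2[k]:.3f}  k/(n-1)={k / (n - 1):.3f}")
```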
Adjusted R-squared measures, which also take the number of fitted
parameters into account, are well established in linear regression models.
The adjustment is defined in terms of mean squared errors:

\[
R^2_{SS,df} = 1 - \frac{(n-k-1)^{-1} \sum_i (y_i - \hat\mu_i)^2}{(n-1)^{-1} \sum_i (y_i - \bar y)^2}.
\]
For generalized linear models with non-normal outcome (logistic, Poisson,
gamma and inverse-Gaussian models) corresponding corrections have been
studied (Mittlböck and Schemper, 1996, 1999; Cameron and Windmeijer,
1996; Waldhör et al., 1998; Mittlböck and Waldhör, 2000; Mittlböck, 2002;
Mittlböck and Heinzl, 2002; Heinzl and Mittlböck, 2002). A rather ad-hoc
bias adjustment is based on the analogy to the adjusted R-squared measure
$R^2_{SS,df}$ of the linear model and is denoted by

\[
R^2_{D,df} = 1 - \frac{D(y; \hat\mu; \phi) / (n-k-1)}{D(y; \bar y; \phi) / (n-1)}.
\]

Another type of adjustment of the deviance-based R-squared measure for
generalized linear models is based on shrinkage:

\[
R^2_{D,\gamma} = \gamma R^2_D
= 1 - \frac{D(y; \hat\mu; \phi) + k}{D(y; \bar y; \phi)}
= 1 - \frac{D(y; \hat\mu) + k\phi}{D(y; \bar y)},
\]

where $D(y; \hat\mu)$ and $D(y; \bar y)$ denote the unscaled deviances of
the full model and the intercept-only model, respectively. The subscript
$\gamma$ indicates that $R^2_{D,\gamma}$ is the shrunk R-squared measure
and $\gamma$ is the shrinkage factor (see Copas, 1983, 1987, 1997; Van
Houwelingen and Le Cessie, 1990). It is estimated by

\[
\hat\gamma = \frac{\hat G - k}{\hat G},
\]

where $\hat G = D(y; \bar y; \phi) - D(y; \hat\mu; \phi)$ is the
log-likelihood ratio $\chi^2$-statistic for testing whether any of the k
fitted covariates are associated with the response. The expectation of the
adjusted R-squared measure $R^2_{D,\gamma}$ corresponds to the underlying
population value. Note that $E(R^2_{D,\gamma}) = 0$ if the true $R^2$
equals zero, that is, if the covariates have no influence on the outcome.
The resulting adjustment coincides with the adjusted R-squared measure in
linear regression and with already proposed adjustments for R-squared
measures for some situations in generalized linear models (Mittlböck and
Waldhör, 2000; Mittlböck, 2002; Mittlböck and Heinzl, 2002; Heinzl and
Mittlböck, 2002).
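The step from the estimated shrinkage factor to the closed form of the shrunk measure can be written out in two lines. The shorthand $D_0 = D(y; \bar y; \phi)$ and $D_1 = D(y; \hat\mu; \phi)$ is introduced here only for readability and is not notation from the cited papers.

```latex
\begin{align*}
R^2_D &= 1 - \frac{D_1}{D_0} = \frac{D_0 - D_1}{D_0} = \frac{\hat G}{D_0}, \\
\hat\gamma R^2_D &= \frac{\hat G - k}{\hat G} \cdot \frac{\hat G}{D_0}
  = \frac{D_0 - D_1 - k}{D_0}
  = 1 - \frac{D_1 + k}{D_0}.
\end{align*}
```

Since $\hat G$ is asymptotically $\chi^2$-distributed with k degrees of freedom when none of the covariates is associated with the response, its expectation is then approximately k, which is consistent with the adjusted measure having expectation zero in that case.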
$R^2_{D,df}$ is equal to $R^2_{D,\gamma}$ when $\phi$ is estimated by the
unscaled deviance $D(y; \hat\mu)$ divided by the degrees of freedom (Heinzl
and Mittlböck, 2003). Therefore, in the case of Poisson and logistic
regression models, where usually $\phi=1$, $R^2_{D,df}$ implicitly accounts
for over- or under-dispersion, assuming the simple modelling of dispersion
by $Var(Y)=\phi\mu$. If $\phi>1$, over-dispersion is modelled, and if
$\phi<1$, an under-dispersed model is assumed, where $\phi$ is estimated by
the deviance divided by the degrees of freedom. Therefore $R^2_{D,df}$
should only be used for logistic and Poisson regression models if over- or
under-dispersion is diagnosed and the model results (e.g. standard errors
of the parameter estimates, test statistics and p-values) are adjusted
accordingly; otherwise the results of $R^2_{D,df}$ will not agree with the
fitted model.
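The stated equality can be checked numerically. The following sketch uses hypothetical deviances and sample sizes; the function names are chosen here for illustration, and the degrees of freedom are taken as n-k-1.

```python
def r2_d_df(dev_full, dev_null, n, k):
    """Degrees-of-freedom adjusted deviance-based R-squared."""
    return 1.0 - (dev_full / (n - k - 1)) / (dev_null / (n - 1))

def r2_d_gamma(dev_full, dev_null, k, phi=1.0):
    """Shrinkage-adjusted deviance-based R-squared from unscaled deviances."""
    return 1.0 - (dev_full + k * phi) / dev_null

# Hypothetical unscaled deviances of the full and the intercept-only model.
n, k = 40, 5
dev_full, dev_null = 61.2, 95.0

phi_hat = dev_full / (n - k - 1)   # deviance-based estimate of the scale parameter

via_df = r2_d_df(dev_full, dev_null, n, k)
via_shrinkage = r2_d_gamma(dev_full, dev_null, k, phi=phi_hat)
with_phi_one = r2_d_gamma(dev_full, dev_null, k, phi=1.0)

print(f"phi_hat={phi_hat:.2f}  R2_D,df={via_df:.4f}  "
      f"R2_D,gamma(phi_hat)={via_shrinkage:.4f}  R2_D,gamma(phi=1)={with_phi_one:.4f}")
```

In this hypothetical case the estimated scale parameter is larger than one, so the two versions with the estimated scale agree with each other, while the version with the scale fixed at one gives a larger value.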
Example for inverse-Gaussian regression models
The following example, discussed in Heinzl and Mittlböck (2002), illustrates
the inappropriateness of $R^2_{SS}$ for some generalized linear models.
Gruenberger et al. (2002) studied the survival of 212 patients treated with
intra-arterial chemotherapy for unresectable colorectal cancer liver
metastases. The authors were interested in the prognostic relevance of the
following 6 covariates for survival: 1) differentiation of the primary tumor
(MERGDIFF; levels: well / moderate / poor), 2) the percentage of hepatic
replacement by colorectal cancer liver metastases (PHR; levels: <25 % /
25-50 % / >50-75 % / >75 %), 3) the number of colorectal cancer liver
metastases (METCAT; levels: <5 tumors / 5-10 tumors / >10 tumors), 4) the
presence of extrahepatic disease at laparotomy (EXHEPP; levels: no / yes),
5) carcinoembryonic antigen (CEA) response to hepatic arterial infusion
chemotherapy (CEARESPO; levels: non-responder / patient whose
postoperative CEA level returned to normal / non-secretor / responder) and
6) echogenicity (ECHO; levels: hypo / hyper). Only data of patients treated
from 1992 to 1996 with complete data records are used. Since the
underlying illness is so critical, 85 of these 87 patients died before the
end of the study period. Thus only 2 censored overall survival times exist,
which, to simplify matters, are treated as uncensored values.
Overall survival time was used as the outcome variable in an inverse-Gaussian
(IG) regression model with reciprocal link. All six potential prognostic
factors were used as explanatory variables. Since we considered all prognostic
factors to be measured on a qualitative scale, the model consisted of 12
dummy variables. Only MERGDIFF appeared to be a statistically significant
prognostic factor in the IG regression model; the other 5 factors were
not statistically significant. This fact is reflected in the values of
$R^2_D$ and its adjustments (second row of Table 2). The unadjusted value of
18 percent is corrected to values between 5 and 8 percent. The values for
$R^2_{SS}$ and its adjustments are negative.
Table 2: Deviance-based and sums-of-squares-based R-squared measure values in
percent for liver metastases data.

MERGDIFF       k    $R^2_D$   $R^2_{D,df}$   $R^2_{D,\gamma}$   $R^2_{SS}$   $R^2_{SS,df}$
not in model   10   11        -1             1                  10           -2
in model       12   18        5              7                  -18          -37
In a next step we wanted to know what happens if the crucial prognostic
factor MERGDIFF is eliminated from the model. When refitting the IG
regression model we found that none of the remaining 5 prognostic factors
was statistically significant. The adjustments for the R-squared measures
work fine here: unadjusted values of around 10 percent are shrunk to
values of around zero percent (first row of Table 2). That is, these 5
prognostic factors cannot explain any variation in the overall survival times
when an IG regression model with reciprocal link is used.
Finally, the striking case of MERGDIFF in Table 2 reveals an unwanted
feature of $R^2_{SS}$: adding a statistically significant prognostic factor
to the IG model can considerably decrease the observed value of $R^2_{SS}$.
On top of that, this value can even become negative.
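The degrees-of-freedom adjusted entries of Table 2 can be recomputed from the unadjusted ones as a plausibility check. The sketch assumes that the columns of Table 2 read k, $R^2_D$, $R^2_{D,df}$, $R^2_{D,\gamma}$, $R^2_{SS}$, $R^2_{SS,df}$ and that n=87 patients entered the analysis. It starts from the rounded percentages printed in the table, so agreement can only be expected after rounding. The helper name is chosen here for illustration.

```python
def df_adjust(r2, n, k):
    """Degrees-of-freedom adjustment: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

n = 87  # patients in the analysed data set

# (k, unadjusted R2_D, unadjusted R2_SS) in percent, as printed in Table 2.
rows = {
    "not in model": (10, 11, 10),
    "in model": (12, 18, -18),
}

adjusted = {}
for label, (k, r2_d, r2_ss) in rows.items():
    adjusted[label] = (
        round(100 * df_adjust(r2_d / 100, n, k)),   # R2_D,df in percent
        round(100 * df_adjust(r2_ss / 100, n, k)),  # R2_SS,df in percent
    )
    print(label, adjusted[label])
```

The same formula is applied to both measures because $1 - R^2$ is the ratio of full-model to intercept-only deviances or sums-of-squares, which is then rescaled by the ratio of degrees of freedom.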
R-squared measures in over- or under-dispersed Poisson regressions
The variance for Poisson data is $Var(Y)=\mu$. However, over- or
under-dispersed models can be fitted by assuming that the variance function
is $Var(Y)=\phi\mu$. The scale parameter $\phi$ is usually estimated by the
deviance or the Pearson $\chi^2$ statistic divided by the degrees of
freedom. Heinzl and Mittlböck (2003) investigated the deviance-based pseudo
R-squared measure and corresponding adjustments for such over- and
under-dispersed Poisson regression models. The unadjusted $R^2_D$ is not
affected by over- or under-dispersion, but the results are too high for
models with many covariates and/or few observations. $R^2_{D,\gamma}$ is a
good R-squared measure for models with no over- or under-dispersion, but it
results in too low values for under-dispersed and too high values for
over-dispersed models. If $Var(Y)=\phi\mu$, then an appropriate pseudo
R-squared measure takes into account that $\phi \neq 1$. Heinzl and
Mittlböck (2003) have shown in a simulation study, where pseudo-Poisson
distributed random variables with over- and under-dispersion were
generated, that

\[
R^2_{D,\gamma\phi} = 1 - \frac{D(y; \hat\mu) + k\phi}{D(y; \bar y)}
\]

is nearly unbiased. $R^2_{D,\gamma D}$ means that the estimation of $\phi$
is based on the deviance, and $R^2_{D,\gamma P}$ that the Pearson $\chi^2$
statistic is used to estimate $\phi$. Both $R^2_{D,\gamma D}$ and
$R^2_{D,\gamma P}$ behave rather satisfactorily in the simulation study.
Part of the results are illustrated in Figure 1, where boxplots of the
distributions of 1000 simulated samples for unadjusted and adjusted pseudo
R-squared values (in percent) are shown for a true population value of 40
percent. A rather extreme situation with a very small sample size of 16 and
5 covariates is chosen for an under-dispersed ($\phi=0.25$), standard
($\phi=1$) and over-dispersed ($\phi=4$) Poisson regression model.
Poisson-distributed random values were generated with mean $\mu/\phi$ and
multiplied by $\phi$, so that the resulting random variable Y mimics an
over- or under-dispersed Poisson distribution with mean $\mu$ and variance
$\phi\mu$, which works well for simulation purposes. The mean $\mu$ is
usually expressed as
$\mu = \exp(\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k)$. Only the first
covariate $x_1$ was assumed to influence the mean; the prognostic effect of
all other covariates was eliminated by setting
$\beta_2 = \ldots = \beta_k = 0$, and $\beta_0$ was set to an arbitrary
value of 2.
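The generating mechanism described above, a Poisson draw with mean $\mu/\phi$ multiplied by $\phi$, can be sketched as follows. This is not the simulation code of the cited study. The Poisson sampler (Knuth's multiplication method), the seed, the number of draws and the choice of evaluating the mean at covariate values of zero are all assumptions of this sketch.

```python
import math
import random

def poisson_draw(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def pseudo_poisson_draw(mu, phi, rng):
    """Scaled Poisson draw with mean mu and variance phi * mu."""
    return phi * poisson_draw(mu / phi, rng)

rng = random.Random(2003)   # arbitrary seed
mu = math.exp(2.0)          # beta_0 = 2 with all covariates set to zero
summary = {}
for phi in (0.25, 1.0, 4.0):
    draws = [pseudo_poisson_draw(mu, phi, rng) for _ in range(20000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    summary[phi] = (mean, var)
    print(f"phi={phi:4.2f}  mean={mean:.2f} (target {mu:.2f})  "
          f"variance={var:.2f} (target {phi * mu:.2f})")
```

Note that the resulting variable takes values on the grid 0, $\phi$, $2\phi$, ... rather than on the integers, which is why it only mimics an over- or under-dispersed Poisson distribution.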
Fig. 1: Boxplots of the distributions of 1000 simulated samples for unadjusted and adjusted
pseudo R-squared values (in percent) for a sample size of 16, 5 covariates and
$\phi$=0.25, 1 and 4, respectively. Each of the three panels shows $R^2_D$,
$R^2_{D,\gamma}$, $R^2_{D,\gamma D}$ and $R^2_{D,\gamma P}$. Horizontal lines are
drawn for the true population value of 40 percent.

R-squared measures of Poisson and Logistic regression models in
epidemiological studies
If the Poisson regression model can be used as an approximation to the
logistic regression model (which happens quite frequently in epidemiological
research), then the model results concerning estimates of the parameters,
standard errors, confidence intervals and p-values will be identical;
however, the corresponding R-squared estimates will differ dramatically
(Mittlböck and Heinzl, 2001). The Poisson model may give an R-squared
estimate of, say, 90 percent, which would indicate a nearly perfect
prediction, while the corresponding logistic model may achieve an R-squared
value of, say, 10 percent, which is rather poor. This apparent
contradiction becomes understandable if we keep in mind that the Poisson
regression model is used to estimate group means. For instance, the Poisson
model may predict, say, 3.9 % cures for a group. If in reality, say, 4.1 %
cures are observed, then the prediction is very good with respect to this
group. The logistic regression model, on the other hand, attempts a
prognosis for the individual patient. Naturally, it is much easier to
predict that one out of 25 patients will be cured than to predict who will
actually be cured.
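The contrast between group-level and individual-level prediction can be made concrete with a deliberately extreme toy case. All numbers below are hypothetical. There are two groups and one binary covariate, so the group-level Poisson model for the counts (with offset log of the group size) reproduces the observed counts exactly, while the logistic model predicts each individual's 0/1 outcome from the same two fitted cure rates. In realistic settings with more groups than parameters the Poisson value would lie below one.

```python
import math

# Hypothetical study: number of cured patients in two groups.
n_g = [500, 500]   # group sizes
y_g = [10, 30]     # cures per group

p_g = [y / n for y, n in zip(y_g, n_g)]   # fitted group-specific cure rates
p_all = sum(y_g) / sum(n_g)               # intercept-only cure rate

def poisson_deviance(y, mu):
    """Unscaled Poisson deviance for counts y with fitted means mu."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))

def bernoulli_deviance(y, n, p):
    """Deviance of the individual 0/1 outcomes, summed within groups."""
    return -2.0 * sum(yi * math.log(pi) + (ni - yi) * math.log(1.0 - pi)
                      for yi, ni, pi in zip(y, n, p))

# Group-level Poisson model: with two groups and two parameters the fitted
# counts equal the observed counts.
mu_full = [float(y) for y in y_g]
mu_null = [n * p_all for n in n_g]
r2_poisson = 1.0 - poisson_deviance(y_g, mu_full) / poisson_deviance(y_g, mu_null)

# Individual-level logistic model with the same fitted cure rates.
r2_logistic = 1.0 - (bernoulli_deviance(y_g, n_g, p_g)
                     / bernoulli_deviance(y_g, n_g, [p_all] * len(n_g)))

print(f"Poisson R2_D = {r2_poisson:.3f}, logistic R2_D = {r2_logistic:.3f}")
```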
Conclusions
In summary, correctly adjusted R-squared values provide essential information
in addition to the usual modelling results, as they quantify the knowledge
(or lack of knowledge) about the outcome variable of interest.
Sums-of-squares-based R-squared measures cannot be recommended in general.
Proper adjustments are necessary if models with small sample size and/or
many covariates are fitted.
References
Cameron, A. C. and Windmeijer, F. A. G. (1996). R² measures for count data
regression models with applications to health-care utilization. Journal of
Business and Economic Statistics, V14, 209-220.
Copas, J. B. (1983). Regression, prediction and shrinkage (with discussion).
Journal of the Royal Statistical Society B, V45, 311-354.
Copas, J. B. (1987). Cross-validation shrinkage of regression predictors. Journal
of the Royal Statistical Society B, V49, 175-183.
Copas, J. B. (1997). Using regression models for prediction: shrinkage and
regression to the mean. Statistical Methods in Medical Research, V6, 167-183.
Gruenberger, T., Zhao, J., King, J., Chung, T., Clingan, P. R. and Morris, D. L.
(2002). Echogenicity of liver metastases from colorectal cancer is an independent
prognostic factor in patients treated with regional chemotherapy. Cancer, V94,
1753-1759.
Heinzl, H. and Mittlböck, M. (2002). R-squared measures for the inverse Gaussian
regression model. Computational Statistics, V17, 525-544.
Heinzl, H. and Mittlböck, M. (2003). Pseudo R-squared measures for Poisson
regression models with over- or underdispersion. Computational Statistics and
Data Analysis, V44, 253-271.
Mittlböck, M. (2002). Calculating adjusted R² measures for Poisson regression
models. Computer Methods and Programs in Biomedicine, V68, 205-214.
Mittlböck, M. and Heinzl, H. (2001). A note on R² measures for Poisson and logistic
regression models when both models are applicable. Journal of Clinical
Epidemiology, V54, 99-103.
Mittlböck, M. and Heinzl, H. (2002). Measures of explained variation in gamma
regression models. Communications in Statistics - Simulation and Computation,
V31, 61-73.
Mittlböck, M. and Heinzl, H. (2003). Measures of explained variation for
non-normal outcomes. Proceedings of the Second Workshop on Research
Methodology, June 25-27, 2003, Amsterdam, 47-50.
Mittlböck, M. and Schemper, M. (1996). Explained variation for logistic regression.
Statistics in Medicine, V15, 1987-1997.
Mittlböck, M. and Schemper, M. (1999). Computing measures of explained variation
for logistic regression models. Computer Methods and Programs in Biomedicine,
V58, 17-24.
Mittlböck, M. and Waldhör, T. (2000). Adjustments for R²-measures for Poisson
regression models. Computational Statistics and Data Analysis, V34, 461-472.
Van Houwelingen, J. C. and Le Cessie, S. (1990). Predictive value of statistical
models. Statistics in Medicine, V9, 1303-1325.
Waldhör, T., Haidinger, G. and Schober, E. (1998). Comparison of R² measures for
Poisson regression by simulation. Journal of Epidemiology and Biostatistics,
V3, 209-215.