Compare observed and fitted proportions

- Categorize/group the continuous x-variables into intervals.
- Calculate the observed rate of success in each interval.
- Compare with the estimated proportion of success from the model.

[Figure: observed and predicted probabilities, prop vs x.]

See towards the end of f8a.R for an example.

Residuals in logistic regression: Pearson residuals

Simple standardization, since Y_i ~ Bin(1, p_i) with E(Y_i) = p_i and V(Y_i) = p_i(1 - p_i):

  \tilde r_i = (Y_i - \hat p_i) / \sqrt{\hat p_i (1 - \hat p_i)}

(not N(·,·), not even asymptotically!)

In R: infl <- influence(model), then infl$pear.res.

In general, however, the problem with residual analysis for logistic regression is that such plots are not very revealing, because of the binary nature of Y. Also, the leverage values, i.e. the diagonal elements of P = X(X'WX)^{-1}X'W, now depend on both X and Y, so they are no longer indicators of outliers with respect to X alone.

Residuals in logistic regression: Standardized residuals

Take the leverages v_ii from the diagonal elements of P = X(X'WX)^{-1}X'W:

  r_i = (Y_i - \hat p_i) / \sqrt{\hat p_i (1 - \hat p_i)(1 - v_ii)} ≈ N(0, 1)   (for large n)

If |r_i| > |λ_{α/2}| it might be considered suspiciously large. Plots of r_i vs i can be useful, although it is sometimes more revealing to plot their squares, r_i^2 vs i. Notice that the v_ii can be obtained using infl <- influence(model), then infl$hat.

Residual plots

[Figure, three panels: data with \hat p_i and 95% CI; r_i^2 vs i; r_i vs i with 95% CI.]

The residuals in logistic regression always show a pattern, but there should be few extreme values!

Cook's distance

There is a version of Cook's distance for logistic regression:

  D_i^{Cook} = r_i^2 / (p + 1) · v_ii / (1 - v_ii)

Hosmer & Lemeshow consider influential those cases with D_i^{Cook} > 1.

[Figure: D_i vs i.]
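The diagnostics above can be sketched in R. The data below are simulated (the model formula and seed are illustrative, not from the course file f8a.R); the residual and Cook's distance formulas follow the slides, so the Cook's distances may differ slightly from R's built-in cooks.distance(), which uses the estimated dispersion.

```r
# Simulated binary data (illustrative, not the course data set)
set.seed(1)
n <- 100
x <- seq(0, 100, length.out = n)
y <- rbinom(n, size = 1, prob = plogis(-4 + 0.08 * x))

model <- glm(y ~ x, family = binomial)
infl  <- influence(model)

p_hat <- fitted(model)
v     <- infl$hat                    # leverages v_ii

# Pearson residuals: (Y_i - p_hat_i) / sqrt(p_hat_i (1 - p_hat_i))
r_pearson <- (y - p_hat) / sqrt(p_hat * (1 - p_hat))

# Standardized residuals, approximately N(0, 1) for large n
r_std <- r_pearson / sqrt(1 - v)

# Cook's distance as on the slide: r_i^2/(p+1) * v_ii/(1 - v_ii),
# with p = number of slope parameters
p_slopes <- length(coef(model)) - 1
D <- r_std^2 / (p_slopes + 1) * v / (1 - v)

# Flag suspiciously large cases: |r_i| > z_{alpha/2} or D_i > 1
alpha <- 0.05
which(abs(r_std) > qnorm(1 - alpha / 2))
which(D > 1)
```

The hand-computed r_pearson should agree with infl$pear.res, and r_std with rstandard(model, type = "pearson"), since the binomial dispersion is fixed at 1.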