Math 141
Lecture 30: GLM Examples

Albyn Jones
Library 304
[email protected]
www.people.reed.edu/~jones/courses/141

Generalized Linear Models

Binomial GLM: Logistic Regression

Given $X$, $Y \sim \mathrm{Binomial}(n, p)$ and

$$\log(\text{odds}) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X$$

or equivalently:

$$p = \frac{\text{odds}}{1 + \text{odds}} = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Example: Death in the North Atlantic

    > titanic
       Surv   N  Class   Age    Sex
    1    20  23   Crew Adult Female
    2   192 862   Crew Adult   Male
    3     1   1  First Child Female
    4     5   5  First Child   Male
    5   140 144  First Adult Female
    6    57 175  First Adult   Male
    7    13  13 Second Child Female
    8    11  11 Second Child   Male
    9    80  93 Second Adult Female
    10   14 168 Second Adult   Male
    11   14  31  Third Child Female
    12   13  48  Third Child   Male
    13   76 165  Third Adult Female
    14   75 462  Third Adult   Male

Additive Model

Ignoring the Age category, we fit the additive model.

    Call:
    glm(cbind(Surv, N - Surv) ~ Class + Sex, family = binomial, data = titanic)

    Coefficients:
                Estimate Std. Error z value
    (Intercept)  1.18740    0.15746   7.541
    ClassFirst   0.88081    0.15697   5.611
    ClassSecond -0.07178    0.17093  -0.420
    ClassThird  -0.77742    0.14231  -5.463
    SexMale     -2.42133    0.13909 -17.408

Interaction Model

    Call:
    glm(cbind(Surv, N - Surv) ~ Class * Sex, family = binomial, data = titanic)

    Coefficients:
                        Estimate Std. Error z value
    (Intercept)          1.89712    0.61914   3.064
    ClassFirst           1.66535    0.80027   2.081
    ClassSecond          0.07053    0.68630   0.103
    ClassThird          -2.06075    0.63551  -3.243
    SexMale             -3.14690    0.62453  -5.039
    ClassFirst:SexMale  -1.05911    0.81959  -1.292
    ClassSecond:SexMale -0.63882    0.72402  -0.882
    ClassThird:SexMale   1.74286    0.65139   2.676

Interpretation of Coefficients: Third Class Males

The interaction coefficient for Third Class Males was about 1.74. What does that mean?

$$\log\left(\frac{p}{1-p}\right) = 1.897 + (-2.06) + (-3.15) + 1.74 \approx -1.57$$

Add up the coefficients for the intercept (baseline group), the dummy variable for third class, the dummy variable for males, and the interaction term (third class and male). Inverting the logit gives $p = e^{-1.57}/(1 + e^{-1.57}) \approx 0.17$ for third class males.

Compute predicted probabilities

The interaction coefficient tells us that third class males did better than predicted by the additive model. Third class females must have done worse, else the additive model would fit well, and the effect would be entirely captured by the coefficient for the Third Class dummy variable.

    > P1 <- round(predict(T1,type="response"),3)
    > P2 <- round(predict(T2,type="response"),3)

Here T1 is the fitted additive model and T2 is the fitted interaction model.

Compare predicted probabilities

    > data.frame(titanic, P1, P2)
       Surv   N  Class   Age    Sex    P1    P2
    1    20  23   Crew Adult Female 0.766 0.870
    2   192 862   Crew Adult   Male 0.225 0.223
    3     1   1  First Child Female 0.888 0.972
    4     5   5  First Child   Male 0.413 0.344
    5   140 144  First Adult Female 0.888 0.972
    6    57 175  First Adult   Male 0.413 0.344
    7    13  13 Second Child Female 0.753 0.877
    8    11  11 Second Child   Male 0.213 0.140
    9    80  93 Second Adult Female 0.753 0.877
    10   14 168 Second Adult   Male 0.213 0.140
    11   14  31  Third Child Female 0.601 0.459
    12   13  48  Third Child   Male 0.118 0.173
    13   76 165  Third Adult Female 0.601 0.459
    14   75 462  Third Adult   Male 0.118 0.173

Example: Death in the Snow

From the alr3 package, the donner dataset, with 91 observations.

The Donner Party was the most famous tragedy in the history of the westward migration in the United States. In the winter of 1846-47, about ninety wagon train emigrants were unable to cross the Sierra Nevada Mountains of California before winter, and almost one-half starved to death... These data include some information about each of the members of the party from Johnson (1996).

The Donner Dataset

Variables

Age: Approximate age in 1846.
Outcome: 1 if survived, 0 if died.
Sex: Male or Female.
Family.name: family name, hired or single.
Status: Family, single or hired.

First Try: 3-way interactions

    > donner.glm0 <- glm(Outcome ~ Age*Sex*Status, data=donner,family=binomial)

    Coefficients: (3 not defined because of singularities)
                                Est    SE  z value  Pr(>|z|)
    <...omitted table entries...>
    SexMale:StatusHired      -19.06  3956   -0.005      0.99
    SexMale:StatusSingle         NA    NA       NA        NA
    Age:SexMale:StatusHired      NA    NA       NA        NA
    Age:SexMale:StatusSingle     NA    NA       NA        NA

What happened?

'Singularities' means exact collinearity!

    > with(donner,table(Sex,Status))
            Status
    Sex      Family Hired Single
      Female     34     1      0
      Male       34    17      5

There is no way to estimate the interaction between Sex and Status, let alone the higher order interactions. In fact, we have almost no data on Females outside of families.

Second Try: Omit the troublesome terms

    glm(formula = Outcome ~ Age * Sex + Age * Status, family = binomial, data = donner)

                         Est  Std. Error  z value  Pr(>|z|)
    (Intercept)        1.885        0.67    2.810     0.005
    Age               -0.046        0.02   -1.833     0.067
    SexMale           -1.476        0.84   -1.755     0.079
    StatusHired        1.036        1.96    0.527     0.598
    StatusSingle     -17.975    14764.86   -0.001     0.999
    Age:SexMale        0.036        0.03    1.130     0.259
    Age:StatusHired   -0.070        0.07   -0.889     0.374
    Age:StatusSingle   0.010      472.85    0.000     1.000

Sequential Anova Table

From here on, donner.glm0 refers to the second-try fit, as the terms in the table show.

    > anova(donner.glm0,test="Chisq")
    Analysis of Deviance Table

    Terms added sequentially (first to last)

               Df Deviance Resid. Df Resid. Dev Pr(>Chi)
    NULL                          87     120.86
    Age         1   6.8368        86     114.02  0.00893
    Sex         1   5.1509        85     108.87  0.02323
    Status      2   5.8995        83     102.97  0.05235
    Age:Sex     1   0.9841        82     101.98  0.32120
    Age:Status  2   1.0131        80     100.97  0.60258

What Next?

When dropping terms from complex models, start at the bottom of the anova table; that is, drop interactions before single variables. Neither interaction appears statistically significant, so let's drop the two-way interactions!
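
The Pr(>Chi) column in the sequential table can be checked by hand: each entry is the upper-tail chi-square probability of the deviance drop on its degrees of freedom. The sketch below copies the numbers from the table; the variable names (term, dev_drop, df_drop, and so on) are made up for illustration and are not part of the lecture code. It also adds the two interaction rows together, which is one way to test both interactions jointly.

```r
# Deviance drops and degrees of freedom copied from the sequential table above.
term     <- c("Age", "Sex", "Status", "Age:Sex", "Age:Status")
dev_drop <- c(6.8368, 5.1509, 5.8995, 0.9841, 1.0131)
df_drop  <- c(1, 1, 2, 1, 2)

# Each p-value is the upper-tail chi-square probability of the deviance drop.
p_value <- pchisq(dev_drop, df = df_drop, lower.tail = FALSE)
print(data.frame(term, dev_drop, df_drop, p_value = round(p_value, 5)))

# Joint test of the two interactions: add their deviance drops and their df.
joint_dev <- sum(dev_drop[4:5])
joint_df  <- sum(df_drop[4:5])
joint_p   <- pchisq(joint_dev, df = joint_df, lower.tail = FALSE)
print(c(deviance = joint_dev, df = joint_df, p = joint_p))
```

A large joint p-value is consistent with dropping both interactions at once rather than one at a time.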

Reduced Model: only main effects

    > donner.glm1 <- glm(Outcome ~ Age + Sex + Status, data=donner,family=binomial)
    > anova(donner.glm1,donner.glm0)
    Analysis of Deviance Table

    Model 1: Outcome ~ Age + Sex + Status
    Model 2: Outcome ~ Age * Sex + Age * Status
      Resid. Df Resid. Dev Df Deviance
    1        83     102.97
    2        80     100.97  3   1.9971

Coefficients

    glm(Outcome ~ Age + Sex + Status, data=donner,family=binomial)

                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)     1.487      0.493   3.019    0.003
    Age            -0.028      0.015  -1.868    0.062
    SexMale        -0.728      0.517  -1.407    0.159
    StatusHired    -0.599      0.628  -0.953    0.341
    StatusSingle  -17.456   1765.537  -0.010    0.992

Do we see anything peculiar here?

Gigantic SEs are another symptom of approximate confounding/collinearity!

Reduce again!

The non-Family dummy variables are not statistically significant. Combine?

    > Family <- donner$Status == "Family"
    > donner.glm2 <- glm(Outcome ~ Age+Sex+Family, data=donner,family=binomial)

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  0.59543    0.81523   0.730   0.4652
    Age         -0.02984    0.01519  -1.964   0.0495
    SexMale     -0.75663    0.51985  -1.455   0.1455
    FamilyTRUE   0.94053    0.60548   1.553   0.1203

Check the Fine Print!

        Null deviance: 120.86  on 87  df
    Residual deviance: 106.37  on 84  df
      (3 observations deleted due to missingness)
    AIC: 114.37

    Number of Fisher Scoring iterations: 4

    > qchisq(.95,84)
    [1] 106.39484

What does a large residual deviance mean?

Final Points

    Analysis of Deviance Table

    Model 1: Outcome ~ Age
    Model 2: Outcome ~ Age + Sex + Family
      Resid. Df Resid. Dev Df Deviance Pr(>Chi)
    1        86     114.02
    2        84     106.37  2   7.6468  0.02185

Sex and Family were strongly associated, so it isn't clear we can sort out their individual contributions!

There may be within-family correlation, so a mixed model might be in order!

Poisson Generalized Linear Models

The Poisson GLM: a loglinear model

Given $X$, $Y \sim \mathrm{Poisson}(\mu)$ and

$$\log(\mu) = \beta_0 + \beta_1 X$$

or equivalently:

$$\mu = e^{\beta_0 + \beta_1 X}$$

Poisson Regression Models: Interpretation

Since the exponential function

$$\mu(X) = e^{\beta_0 + \beta_1 X} = e^{\beta_0} e^{\beta_1 X}$$

is monotone increasing if $\beta_1 > 0$ and decreasing if $\beta_1 < 0$, qualitative interpretation is again completely analogous to interpreting coefficients for the linear model:

$\beta_1 > 0$ implies positive association with $X$
$\beta_1 < 0$ implies negative association with $X$
$\beta_1 = 0$ implies no association with $X$

The intercept term $\beta_0$ determines the mean value when $X = 0$:

$$\mu(0) = e^{\beta_0}$$

Summary

GLMs are a lot like linear models!
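
As a postscript to the Poisson interpretation slides, the factorization $\mu(X) = e^{\beta_0} e^{\beta_1 X}$ also implies that each one-unit increase in $X$ multiplies the mean by $e^{\beta_1}$. The sketch below illustrates this with hypothetical coefficients (beta0 = 0.5 and beta1 = 0.2 are invented for the example and do not come from any fitted model in the lecture).

```r
# Hypothetical coefficients, chosen only for illustration.
beta0 <- 0.5
beta1 <- 0.2

# Mean function of the Poisson loglinear model: mu(x) = exp(beta0 + beta1 * x).
mu <- function(x) exp(beta0 + beta1 * x)

x     <- 0:5
means <- mu(x)

# Ratio of successive means; it is constant and equal to exp(beta1).
ratios <- means[-1] / means[-length(means)]

print(round(means, 3))
print(round(ratios, 3))
print(round(exp(beta1), 3))
```

With a negative beta1 the same ratio falls below 1, so the mean shrinks by a constant factor per unit of $X$, and with beta1 = 0 the ratio is exactly 1.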