
Mathematics 243
Logistic Regression
May 5, 2014
1. Response variable: categorical with two levels ("success" and "failure")
2. Example: MedGPA in Stat2Data package.
   Acceptance   0 or 1
   GPA          college GPA
   MCAT         score on MCAT test
   Sex          M or F
data(MedGPA)
xyplot(Acceptance ~ GPA, data = MedGPA)
[Figure: xyplot of Acceptance (0/1) versus GPA for the MedGPA data]
3. Two kinds of models: classification, likelihood
(a) The output of the model is: success or failure
(b) The output of the model is a proportion or probability
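The two kinds of output are related: a likelihood-style model's probability can be converted into a classification by thresholding. A minimal sketch, assuming a 0.5 cutoff and hypothetical fitted probabilities:

```r
# Hypothetical probabilities from a likelihood-style model
p_hat <- c(0.12, 0.48, 0.51, 0.93)

# Classification-style output: predict "success" (1) when p_hat >= 0.5
class_hat <- as.numeric(p_hat >= 0.5)
class_hat  # 0 0 1 1
```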
4. Problem: a probability is bounded between 0 and 1, so it can't be a linear function of any predictor.
5. Solution: the logit transformation. For all real numbers y,

      p = e^y / (1 + e^y)        (ilogit)

   and equivalently

      y = log( p / (1 − p) )     (logit)
plotFun(logit(p) ~ p, xlim = c(0.05, 0.95))
plotFun(ilogit(y) ~ y, xlim = c(-3, 3))
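The logit() and ilogit() used above come from the mosaic package; assuming they match base R's qlogis() and plogis() (which compute the same formulas), a quick sketch confirming the two transformations undo each other:

```r
# Base-R equivalents of mosaic's logit() and ilogit():
#   qlogis(p) = log(p / (1 - p))      -- the logit
#   plogis(y) = exp(y) / (1 + exp(y)) -- the inverse logit
p <- seq(0.05, 0.95, by = 0.05)
y <- qlogis(p)                              # probabilities -> real line
stopifnot(all(abs(plogis(y) - p) < 1e-12))  # ilogit undoes logit
```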
[Figure: logit(p) versus p (left panel) and ilogit(y) versus y (right panel)]
Chapel: Kindness and Goodness, Libby Huizenga (senior student)
6. A generalized linear model.
(a) Fit a linear model y ∼ 1 + x to get a "link" value.
(b) Transform the link value with the inverse link function (here ilogit) to get the predicted response.
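This two-step view can be checked directly with glm(). A sketch using simulated data (an assumption for illustration, not the MedGPA data): the "response" predictions are exactly the inverse logit of the "link" predictions.

```r
# Simulate a binary response whose true probability follows a logistic curve
set.seed(1)
x <- rnorm(100)
p <- plogis(-1 + 2 * x)               # true probabilities
y <- rbinom(100, size = 1, prob = p)  # observed 0/1 responses

mod <- glm(y ~ x, family = binomial)

eta  <- predict(mod, type = "link")      # step (a): linear "link" values
phat <- predict(mod, type = "response")  # step (b): predicted probabilities
stopifnot(all(abs(plogis(eta) - phat) < 1e-12))
```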
logModel <- glm(Acceptance ~ GPA, data = MedGPA, family = binomial)
logModel
Call:
glm(formula = Acceptance ~ GPA, family = binomial, data = MedGPA)
Coefficients:
(Intercept)
-19.21
GPA
5.45
Degrees of Freedom: 54 Total (i.e. Null);  53 Residual
Null Deviance:      75.8
Residual Deviance:  56.8    AIC: 60.8
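To read these coefficients, plug them into the two-step recipe: the link value for a given GPA is −19.21 + 5.45 · GPA, and the predicted probability is the inverse logit of that. A sketch using the rounded coefficients printed above (so the result is approximate), for a hypothetical GPA of 3.5:

```r
# Rounded coefficients from the glm() output above
b0 <- -19.21
b1 <- 5.45

eta <- b0 + b1 * 3.5    # step (a): link value, here -0.135
p35 <- plogis(eta)      # step (b): exp(eta) / (1 + exp(eta))
round(p35, 2)           # about 0.47
```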
f <- makeFun(logModel)
xyplot(Acceptance ~ GPA, data = MedGPA)
plotFun(f(GPA) ~ GPA, add = TRUE)
[Figure: Acceptance versus GPA with the fitted logistic curve f(GPA) overlaid]