Proportional Odds Logistic Regression

stat 557
Heike Hofmann
Outline
• Proportional Odds Logistic Regression
• Model Definition
• Properties
• Latent Variables
• Intro to Loglinear Models
Comparing observed and fitted cell counts gives an idea of the sign of the residuals. We will use the same residuals as before, e.g. Pearson's residuals

\[ \frac{o_{ij} - e_{ij}}{\sqrt{e_{ij}}} \]

Ordinal Response
• Y is a categorical variable with J > 2 levels that have a natural ordering
• Assume y1 < y2 < ... < yJ

Proportional Odds Model
If the response variable Y is ordinal, we can take a different approach to modeling it: based on the cumulative probabilities P(Y ≤ j | x) for j = 1, ..., J − 1 we define the cumulative log odds as

\[ \log \frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \log \frac{\pi_1(x) + \dots + \pi_j(x)}{\pi_{j+1}(x) + \dots + \pi_J(x)} \]

The cumulative odds model or proportional odds model is then given as

\[ \log \frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \alpha_j + \beta' x, \quad \text{for } j = 1, \dots, J - 1 \]

POLR - properties
The values α_j are ordered, i.e. α_j1 ≤ α_j2 for j1 < j2: for j1 < j2, the cumulative probabilities have the same ordering, P(Y ≤ j1 | x) ≤ P(Y ≤ j2 | x). Since the logit is a monotone increasing function, logit P(Y ≤ j1 | x) ≤ logit P(Y ≤ j2 | x). Using the above model hypothesis, this implies α_j1 ≤ α_j2.
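The following is a minimal R sketch of the model definition above. The values of `alpha`, `beta` and `x` are made up for illustration and are not estimates from any data in these slides. It computes the cumulative probabilities, recovers the cumulative log odds, and lets one check that ordered intercepts go together with ordered cumulative probabilities.

```r
# Hypothetical parameters for J = 4 response levels (3 finite intercepts)
alpha <- c(-1, 0.5, 2)   # ordered intercepts alpha_1 <= alpha_2 <= alpha_3
beta  <- 0.8             # common slope
x     <- 1.5             # a single covariate value

# Cumulative probabilities P(Y <= j | x) = exp(a) / (1 + exp(a)), a = alpha_j + beta * x
cum_prob <- plogis(alpha + beta * x)

# Cumulative log odds; these should equal alpha_j + beta * x again
cum_logit <- log(cum_prob / (1 - cum_prob))

print(cum_prob)
print(cum_logit)
```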
• Intercepts follow the ordering of levels: −∞ = α_0 < α_1 ≤ α_2 ≤ ... ≤ α_{J−1} < α_J = ∞

The curves for the estimated probabilities are shifts to the left for higher levels of Y (for β > 0; to the right for β < 0), because for a continuous variable X and β ≠ 0:

\[ \operatorname{logit} P(Y \le j \mid x) = \operatorname{logit} P\bigl(Y \le k \mid x - (\alpha_k - \alpha_j)/\beta\bigr), \]

since logit P(Y ≤ k | x − (α_k − α_j)/β) = α_k + β · (x − (α_k − α_j)/β) = α_k − α_k + α_j + βx = α_j + βx, i.e. for j < k the curve for P(Y ≤ k) is the same curve as for P(Y ≤ j), translated by (α_k − α_j)/β units in direction X.
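The shift property can be checked numerically. This is a small sketch with made-up parameter values (`alpha`, `beta`, and the helper `cum_prob` are illustrative assumptions): evaluating the curve for level k at x minus the shift should reproduce the curve for level j at x.

```r
# Hypothetical intercepts and slope
alpha <- c(-1, 0.5, 2)
beta  <- 0.8

# P(Y <= j | x) under the proportional odds model
cum_prob <- function(j, x) plogis(alpha[j] + beta * x)

j <- 1
k <- 3
shift <- (alpha[k] - alpha[j]) / beta   # horizontal distance between the two curves

x   <- seq(0, 10, by = 0.5)
lhs <- cum_prob(j, x)            # curve for level j
rhs <- cum_prob(k, x - shift)    # curve for level k, evaluated at shifted x

max_diff <- max(abs(lhs - rhs))  # should be zero up to rounding error
print(max_diff)
```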
• Estimated probabilities: curves for P(Y ≤ j) are shifts to the left from P(Y ≤ 1)

[Figure: Estimated Probabilities. Curves P(Y <= 1), P(Y <= 2) and P(Y <= 3) plotted against x from 0 to 10, on a scale from 0 to 1.]
Proportional Odds
• cumulative odds ratio:
The cumulative odds ratios are defined via

\[ \operatorname{logit} P(Y \le j \mid x_1) - \operatorname{logit} P(Y \le j \mid x_2) = \log \frac{P(Y \le j \mid x_1)\, P(Y > j \mid x_2)}{P(Y > j \mid x_1)\, P(Y \le j \mid x_2)} = \alpha_j + \beta' x_1 - \alpha_j - \beta' x_2 = \beta'(x_1 - x_2), \]

i.e. the odds of a response ≤ j at x1 compared to the odds of a response ≤ j at x2 is exp(β'(x1 − x2)). The odds ratio depends only on the distance between x1 and x2 and not on j; more specifically, the log odds ratio is proportional to x1 − x2 (this property gives the model its name).
• for univariate, continuous X: each unit increase in x multiplies the odds of Y ≤ j by e^β

The estimated response probabilities π̂_j(x) are:

\[ \hat\pi_j(x) = P(Y \le j \mid x) - P(Y \le j - 1 \mid x), \quad \text{where} \quad P(Y \le j \mid x) = \frac{\exp(\alpha_j + \beta' x)}{1 + \exp(\alpha_j + \beta' x)}. \]
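As a sketch of the proportional odds property, the R code below uses made-up values for `alpha`, `beta`, `x1` and `x2` (all illustrative assumptions). It computes the cumulative odds ratio for every level j and the category probabilities as differences of cumulative probabilities.

```r
# Hypothetical parameters for J = 4 levels
alpha <- c(-1, 0.5, 2)
beta  <- 0.8
x1 <- 3
x2 <- 1

odds <- function(p) p / (1 - p)

# Cumulative probabilities P(Y <= j | x) at the two covariate values
p1 <- plogis(alpha + beta * x1)
p2 <- plogis(alpha + beta * x2)

# Cumulative odds ratio, one value per level j; all should equal exp(beta * (x1 - x2))
odds_ratio <- odds(p1) / odds(p2)

# Category probabilities pi_j(x1) = P(Y <= j | x1) - P(Y <= j - 1 | x1)
pi_x1 <- diff(c(0, p1, 1))

print(odds_ratio)
print(pi_x1)
```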
Happiness
library(MASS)  # provides polr(); the data frame `happy` is assumed to be loaded already
happy.age <- polr(happy ~ poly(age, 4) * sex, data = na.omit(happy[, c("happy", "age", "sex")]))
[Figure: Estimated Probabilities against age (20 to 80) for not.too.happy, pretty.happy and very.happy, by sex (female, male).]
Housing Data
Residents of Copenhagen surveyed:
[Figure: four bar charts of counts: Satisfaction (Sat: Low, Medium, High), Perceived Influence (Infl: Low, Medium, High), Contact with Neighbors (Cont: Low, High) and housing Type (Apartment, Atrium, Terrace, Tower).]
Modelling Housing Data
library(MASS)
help(polr)
house.null <- polr(Sat~1, data=housing, weights=Freq)
house.twoway <- polr(Sat~Infl*Type*Cont-Infl:Type:Cont, weights=Freq, data=housing)
house.main <- polr(Sat~Infl+Type+Cont, weights=Freq, data=housing)
house.full <- polr(Sat~Infl*Type*Cont, weights=Freq, data=housing)
anova(house.null, house.main, house.twoway, house.full)
Likelihood ratio tests of ordinal regression models

Response: Sat
                                Model Resid. df Resid. Dev   Test    Df   LR stat.     Pr(Chi)
1                                   1      1679   3648.878
2                  Infl + Type + Cont      1673   3479.149 1 vs 2     6 169.728328 0.000000000
3 Infl * Type * Cont - Infl:Type:Cont      1662   3448.583 2 vs 3    11  30.565974 0.001290500
4                  Infl * Type * Cont      1656   3446.458 3 vs 4     6   2.124985 0.907852174
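The likelihood ratio statistics in this table are differences of residual deviances, compared to a chi-square distribution with the difference in residual degrees of freedom. A short sketch in R reproduces them from the (rounded) values printed above; the variable names are illustrative.

```r
# Residual deviances and residual df copied from the anova table above
dev    <- c(3648.878, 3479.149, 3448.583, 3446.458)   # null, main, 2-way, full
res_df <- c(1679, 1673, 1662, 1656)

lr_stat <- -diff(dev)      # drop in deviance between successive models
df      <- -diff(res_df)   # number of parameters added at each step

# Upper tail probabilities of the chi-square reference distribution
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(data.frame(lr_stat, df, p_value))
```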
Modelling Housing Data
Model         parameters  df     Deviance
Null          2           17+6   3648.878
Main Effects  2+6         6+11   3479.149
2-way         8+11        6      3448.583
full          19+6        0      3446.458
Re-fitting to get Hessian
Call:
polr(formula = Sat ~ Infl * Type * Cont - Infl:Type:Cont, data = housing,
    weights = Freq)

Coefficients:
                          Value Std. Error t value
InflMedium              0.96590     0.2097  4.6054
InflHigh                1.58008     0.2353  6.7145
TypeAtrium              0.72480     0.2926  2.4773
TypeTerrace             0.12959     0.2962  0.4376
TypeTower               1.18063     0.2386  4.9488
ContHigh                0.49799     0.1943  2.5631
InflMedium:TypeAtrium  -0.43450     0.3150 -1.3795
InflHigh:TypeAtrium    -0.90645     0.3578 -2.5332
InflMedium:TypeTerrace -0.35442     0.3004 -1.1800
InflHigh:TypeTerrace   -0.04487     0.3828 -0.1172
InflMedium:TypeTower   -1.08932     0.2686 -4.0558
InflHigh:TypeTower     -0.69573     0.3365 -2.0674
InflMedium:ContHigh    -0.02763     0.2181 -0.1267
InflHigh:ContHigh       0.05679     0.2600  0.2184
TypeAtrium:ContHigh    -0.21621     0.2882 -0.7503
TypeTerrace:ContHigh   -0.73529     0.2837 -2.5922
TypeTower:ContHigh      0.06047     0.2417  0.2501

Intercepts:
            Value  Std. Error t value
Low|Medium  0.3799 0.1719     2.2102
Medium|High 1.5852 0.1763     8.9937

Residual Deviance: 3448.583
AIC: 3486.583
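Two numbers in this output can be checked by hand. The sketch below, using values copied from the summary (variable names are illustrative), computes the AIC as residual deviance plus twice the number of parameters, and a t value as the estimate divided by its standard error. The t value only agrees approximately because the printed standard error is rounded.

```r
# Values copied from the summary output above
deviance_twoway <- 3448.583
n_coef          <- 17   # 6 main effects + 11 two-way interaction terms
n_intercepts    <- 2    # Low|Medium and Medium|High

# AIC = residual deviance + 2 * number of parameters
aic <- deviance_twoway + 2 * (n_coef + n_intercepts)

# Wald t value for InflMedium: estimate / standard error
t_inflmedium <- 0.96590 / 0.2097

print(aic)
print(t_inflmedium)
```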
Model Selection
All 2-way interactions
Remove single 2-way interaction term

> anova(house.wotc,house.twoway)
Likelihood ratio tests of ordinal regression models

Response: Sat
                                       Model Resid. df Resid. Dev   Test    Df LR stat.    Pr(Chi)
1 Infl + Type + Cont + Infl:Type + Infl:Cont      1665   3456.508
2        Infl * Type * Cont - Infl:Type:Cont      1662   3448.583 1 vs 2     3 7.924569 0.04759664

> anova(house.woti,house.twoway)
Likelihood ratio tests of ordinal regression models

Response: Sat
                                       Model Resid. df Resid. Dev   Test    Df LR stat.     Pr(Chi)
1 Infl + Type + Cont + Infl:Cont + Type:Cont      1668   3470.301
2        Infl * Type * Cont - Infl:Type:Cont      1662   3448.583 1 vs 2     6 21.71739 0.001362226

> anova(house.woci,house.twoway)
Likelihood ratio tests of ordinal regression models

Response: Sat
                                       Model Resid. df Resid. Dev   Test    Df  LR stat.   Pr(Chi)
1 Infl + Type + Cont + Infl:Type + Type:Cont      1664   3448.695
2        Infl * Type * Cont - Infl:Type:Cont      1662   3448.583 1 vs 2     2 0.1115975 0.9457294
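The slides do not show how `house.wotc`, `house.woti` and `house.woci` were fitted. One plausible way, sketched here under the assumption that the `housing` data frame from the MASS package matches the data used in class, is to drop one two-way interaction at a time with `update()`. Coefficient names may differ from the output above if the factor levels are ordered differently, but the deviances do not depend on that.

```r
library(MASS)  # provides polr() and the housing data

# Model with all two-way interactions
house.twoway <- polr(Sat ~ Infl * Type * Cont - Infl:Type:Cont,
                     weights = Freq, data = housing)

# Remove a single two-way interaction term from the two-way model
house.wotc <- update(house.twoway, . ~ . - Type:Cont)  # without Type:Cont
house.woti <- update(house.twoway, . ~ . - Infl:Type)  # without Infl:Type
house.woci <- update(house.twoway, . ~ . - Infl:Cont)  # without Infl:Cont

# Likelihood ratio tests of each reduced model against the two-way model
print(anova(house.wotc, house.twoway))
print(anova(house.woti, house.twoway))
print(anova(house.woci, house.twoway))
```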
Non-significant 2-way interaction
xtabs(Freq ~ Infl + Cont + Sat, data = housing)
[Figure: plot of Sat (Low, Medium, High) by Infl and Cont (Low, High).]
All of these differences of proportions are similar (visual estimates of conditional log odds)
Significant 2-way interaction
xtabs(Freq ~ Type + Infl + Sat, data = housing)
[Figure: plot of Sat (Low, Medium, High) by Infl (Low, Medium, High) within each Type (Apartment, Atrium, Terrace, Tower).]
Tower has a different pattern from the other housing types (low and medium influence are about the same), as does Atrium (medium and high influence are about the same)
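The patterns read off the plot can also be inspected numerically. The sketch below assumes the `housing` data frame from the MASS package and computes the proportion of each satisfaction level within every combination of Type and Infl; `sat_prop` is an illustrative name.

```r
library(MASS)  # provides the housing data

# Three-way table of counts: Type x Infl x Sat
tab <- xtabs(Freq ~ Type + Infl + Sat, data = housing)

# Proportions of Sat within each Type x Infl combination
sat_prop <- prop.table(tab, margin = c(1, 2))

# Flattened display, rounded to two digits
print(ftable(round(sat_prop, 2)))
```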
Housing Models
[Figure: Sat by Infl and Type in three panels: Raw Data, Type*Infl, and Type*Infl + Type*Cont.]
Latent Variables
• Assume Y* is an (unobserved) continuous response
• We observe Y with Y = j for Y* in (α_{j−1}, α_j], where −∞ = α_0 < α_1 < α_2 < ... < α_{J−1} < α_J = ∞
• Assume: Y*|X ~ G, some continuous distribution
• Then: P(Y ≤ j|X) = P(Y* ≤ α_j|X)
• Then G^{-1} serves as link function
• β, the effect of X, is independent of the cutoffs (same as for a continuous variable)
[Figure: Y* axis with cutoffs α1, α2, α3, α4, α5.]
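The latent variable view can be illustrated by simulation. In this sketch all values (`beta`, `alpha`, the sample size and the seed) are made up. A latent Y* with logistic errors is cut at fixed thresholds, and `polr()` is fitted to the resulting ordinal variable. Note that `polr()` parameterizes the model as logit P(Y ≤ j | x) = ζ_j − βx, so its slope estimates the latent β directly and its intercepts estimate the cutoffs.

```r
library(MASS)  # provides polr()

set.seed(557)
n     <- 5000
beta  <- 1              # hypothetical effect of x on the latent scale
alpha <- c(-1, 1, 3)    # hypothetical cutoffs, giving J = 4 levels

x      <- runif(n, 0, 4)
y_star <- beta * x + rlogis(n)   # latent response with standard logistic errors

# Observed ordinal response: Y = j if Y* lies in (alpha_{j-1}, alpha_j]
y <- cut(y_star, breaks = c(-Inf, alpha, Inf), labels = 1:4,
         ordered_result = TRUE)

fit <- polr(y ~ x)
print(coef(fit))   # estimate of beta
print(fit$zeta)    # estimates of the cutoffs alpha_1, alpha_2, alpha_3
```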
Cumulative Link Function
Then Y = j if Y* ∈ (α_{j−1}, α_j] and

\[ P(Y \le j \mid x) = P(Y^* \le \alpha_j \mid x) \qquad (2) \]

In the case of two continuous variables X and Y* a linear relationship would be visualized in a scatterplot: from left to right a linear relationship between X and Y is shown, i.e. Y = X + ε; the difference between the pictures is the distribution of the error term ε: on the left the error terms are standard normal, in the middle they are uniform, on the right they have an extreme value distribution.

[Figure: three scatterplots of Y against X (X from 0 to 10) with normal, uniform and extreme value errors.]

Y|X ~ G
In each case, we can describe the conditional distribution of Y given X by Y − η(x) ∼ G, where G is a continuous distribution, and η(x) is a location parameter. In the model, η(x) = β'x.
The cumulative probability in equation 2 can then be written as:

\[ P(Y \le j \mid x) = P(Y^* \le \alpha_j \mid x) = G(\alpha_j - \eta(x)) = G(\alpha_j - \beta' x) \]

Then G^{-1} serves as a link function in a GLM (with parameter −β):

\[ G^{-1}\bigl(P(Y \le j \mid x)\bigr) = \alpha_j - \beta' x \qquad (3) \]

With G^{-1} = logit the model in equation 3 is the proportional odds model; G is then the standard logistic distribution.
If the errors are considered to be standard normal, i.e. G = Φ, the model in equation 3 is the probit model; if G is the extreme value distribution, it is the proportional hazards model.

Depending on the distribution G we get different models (polr):

G              Model
logistic       proportional odds
normal         probit
extreme value  proportional hazard
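To compare the three choices of G, the sketch below evaluates P(Y ≤ j | x) = G(α_j − βx) for made-up values of `alpha`, `beta` and `x`. The slides' definition of the extreme value distribution is cut off, so the form used here, G(u) = 1 − exp(−exp(u)), whose inverse is the complementary log-log link, is an assumption.

```r
# Hypothetical parameters for J = 4 levels
alpha <- c(-1, 0.5, 2)
beta  <- 0.8
x     <- 1

eta <- alpha - beta * x   # alpha_j - beta * x, as in equation 3

# Candidate distribution functions G
G <- list(
  logistic      = plogis,                        # proportional odds model
  normal        = pnorm,                         # probit model
  extreme_value = function(u) 1 - exp(-exp(u))   # assumed form; complementary log-log link
)

# One column per G, one row per level j
cum_prob <- sapply(G, function(g) g(eta))
print(round(cum_prob, 3))
```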
Loglinear Models
• Instead of modeling the relationship between X variables and a response Y, loglinear models do not single out a response variable
• They model the structure between categorical variables X

2d loglinear Model
• Assume the data are the cell counts of an I × J contingency table of X and Y (Y is not the response!)
• Let m_ij be the cell count of cell (i,j); assume the cell counts come from IJ independent Poisson distributed variables M_ij ~ Po(µ_ij) with E[M_ij] = µ_ij

\[ \log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY} \]
Interpretation of effects

\[ \log m_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY} \]

• For binary variables X and Y using baseline effects, i.e. all first effects are 0:

        Y=0   Y=1
X=0     m00   m01
X=1     m10   m11

• λ = log m00
• λ1^X = log m10 − log m00
• λ1^Y = log m01 − log m00
• λ11^XY = log (m11 m00)/(m01 m10)

Loglinear Models (Fall 2008) October
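The interpretation of the effects for a 2 × 2 table can be verified with a saturated Poisson loglinear model. The cell counts below are made up for illustration. With R's default treatment contrasts (first level as baseline) the fitted coefficients should match the hand-computed expressions in m00, m01, m10 and m11.

```r
# Made-up cell counts in the order m00, m01, m10, m11
counts <- data.frame(
  X = factor(c(0, 0, 1, 1)),
  Y = factor(c(0, 1, 0, 1)),
  m = c(20, 10, 15, 30)
)

# Saturated loglinear model: log m_ij = lambda + lambda^X_i + lambda^Y_j + lambda^XY_ij
fit <- glm(m ~ X * Y, family = poisson, data = counts)
lambda_hat <- unname(coef(fit))   # order: intercept, X1, Y1, X1:Y1

# Same quantities by hand
m00 <- 20; m01 <- 10; m10 <- 15; m11 <- 30
by_hand <- c(log(m00),
             log(m10) - log(m00),
             log(m01) - log(m00),
             log((m11 * m00) / (m01 * m10)))

print(cbind(lambda_hat, by_hand))
```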