John Coglianese
OH Monday 10:30am - 12:30pm Littauer M-42
Section 4: Binary dependent variables
1 Definition: Binary Dependent Variables
Yi takes on only two values, 0 and 1. Yi is called a binary dependent variable. Example: We
observe whether survey respondents engaged in an extramarital affair or not:
\[
Y_i = \begin{cases} 1, & \text{if affair} \\ 0, & \text{if no affair} \end{cases} \tag{1}
\]
Since our standard regression function is fitting a conditional expectation, the population regression
function is the probability that Y = 1, conditional on the regressors:
\[
E[Y_i \mid X_i] = 1 \cdot P(Y_i = 1 \mid X_i) + 0 \cdot P(Y_i = 0 \mid X_i) = P(Y_i = 1 \mid X_i). \tag{2}
\]
Note that 0 ≤ P (Yi = 1|Xi ) ≤ 1 because this is a probability.
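As a small numerical check on the identity above, the following sketch uses a made-up sample (the values are hypothetical and are not taken from the affairs data) to show that the average of a 0/1 outcome within a group equals the share of ones in that group. The function names are illustrative only.

```python
# Hypothetical (x, y) pairs: x is a binary regressor, y is a binary outcome.
# These values are invented for illustration; they are not from the handout's data.
sample = [(0, 0), (0, 1), (0, 0), (0, 0), (1, 1), (1, 1), (1, 0), (1, 1)]


def conditional_mean(data, x_value):
    """Sample analogue of E[Y | X = x_value]: the average of y within the group."""
    ys = [y for x, y in data if x == x_value]
    return sum(ys) / len(ys)


def share_of_ones(data, x_value):
    """Sample analogue of P(Y = 1 | X = x_value): the fraction of ones in the group."""
    ys = [y for x, y in data if x == x_value]
    return sum(1 for y in ys if y == 1) / len(ys)


for x_value in (0, 1):
    # The two numbers printed on each line coincide, as the identity says.
    print(x_value, conditional_mean(sample, x_value), share_of_ones(sample, x_value))
```

Because the outcome is coded 0/1, averaging it and counting the ones are the same operation, which is why a regression of Yi on Xi can be read as a model of P(Yi = 1|Xi).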
2 Linear probability model
• The LPM is just the standard OLS framework for regressing Yi on the Xi's:
\[
Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + u_i .
\]
• How do we interpret the coefficients? A unit increase in X1 is associated with a 100 · β1
percentage point change in the probability that Y is equal to one, holding the other regressors constant.
• Example: We want to regress affairs on gender, age, years of marriage, children (yes or no),
religious views, education and the happiness of the spouses with their marriage.
. reg affair male age ymarriage children religious yeducation rate_marriage, r
Linear regression                               Number of obs     =        601
                                                F(7, 593)         =       9.61
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1062
                                                Root MSE          =     .41189

-------------------------------------------------------------------------------
              |               Robust
       affair |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |   .0517161   .0389454     1.33   0.185    -.0247716    .1282038
          age |   -.007315   .0031355    -2.33   0.020     -.013473   -.0011571
    ymarriage |   .0160817   .0056303     2.86   0.004     .0050239    .0271394
     children |   .0501878    .044996     1.12   0.265     -.038183    .1385586
    religious |  -.0539744   .0153308    -3.52   0.000    -.0840837   -.0238651
   yeducation |   .0048674   .0078468     0.62   0.535    -.0105436    .0202783
rate_marriage |  -.0877282   .0170375    -5.15   0.000    -.1211894   -.0542669
        _cons |   .7296495   .1617954     4.51   0.000     .4118877    1.047411
-------------------------------------------------------------------------------
Q1: What is the predicted probability of having an affair for a religious female of age 60
with 2 years of marriage, no children, 0 years of education, who is very happy with her
marriage (rate_marriage = 5)?
P̂ (Yi = 1) =
Q2: What if we increase her education by 10 years?
P̂ (Yi = 1) =
Q3: What’s the change in predicted probabilities of having an affair?
Q4: What are some possible problems with the linear probability model?
To avoid predicted probabilities that are negative or greater than 1, we can use a probit
or logit model instead of the LPM.
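The LPM questions above reduce to plugging values into the fitted line. The sketch below does that arithmetic with the coefficients from the regression output. One input is an assumption: the handout does not state how `religious` is coded, so the value 1 is used purely for illustration. The function and variable names are also illustrative.

```python
# Robust-OLS (LPM) coefficients copied from the regression output in this handout.
coef = {
    "_cons": 0.7296495,
    "male": 0.0517161,
    "age": -0.007315,
    "ymarriage": 0.0160817,
    "children": 0.0501878,
    "religious": -0.0539744,
    "yeducation": 0.0048674,
    "rate_marriage": -0.0877282,
}


def lpm_prediction(x):
    """Fitted value b0 + sum_k b_k * x_k, read as P(Y = 1 | X) under the LPM."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


# Profile from Q1. ASSUMPTION: religious = 1 (the numeric coding is not given in the handout).
base = {"male": 0, "age": 60, "ymarriage": 2, "children": 0,
        "religious": 1, "yeducation": 0, "rate_marriage": 5}
more_educ = dict(base, yeducation=10)  # Q2: ten more years of education

p_base = lpm_prediction(base)
p_more = lpm_prediction(more_educ)
change = p_more - p_base  # Q3: equals 10 * coefficient on yeducation

print(round(p_base, 4), round(p_more, 4), round(change, 4))
```

Under that assumption both fitted values are negative, which illustrates the issue raised in Q4. Because the coefficient on `religious` is negative, any non-negative coding of `religious` also gives a negative fitted value for this profile. The change in Q3 does not depend on the other regressors: in a linear model it is always 10 times the education coefficient.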
3 Probit model
• The Probit model fits a nonlinear regression by assuming
\[
P(Y_i = 1 \mid X_i) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), \tag{3}
\]
where Φ(·) is the cumulative distribution function (CDF) of the standard normal distribution
and z = β0 + β1 X1i + . . . + βk Xki is the z-score.
• Interpretation of the coefficients: β1 is the effect on the z-score of a unit change in X1,
holding all the other regressors constant.
• Measure of fit: pseudo-R²
. probit affair male age ymarriage children religious yeducation rate_marriage, r
Iteration 0:   log pseudolikelihood = -337.68849
Iteration 1:   log pseudolikelihood = -305.48705
Iteration 2:   log pseudolikelihood = -305.25272
Iteration 3:   log pseudolikelihood = -305.25271

Probit regression                               Number of obs     =        601
                                                Wald chi2(7)      =      54.93
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -305.25271               Pseudo R2         =     0.0961

-------------------------------------------------------------------------------
              |               Robust
       affair |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |   .1888131   .1319274     1.43   0.152      -.06976    .4473861
          age |  -.0243995   .0111239    -2.19   0.028    -.0462019   -.0025971
    ymarriage |   .0546086    .018964     2.88   0.004     .0174399    .0917773
     children |   .2080876   .1662414     1.25   0.211    -.1177397    .5339148
    religious |  -.1860863   .0532404    -3.50   0.000    -.2904355   -.0817371
   yeducation |    .015506   .0263552     0.59   0.556    -.0361493    .0671612
rate_marriage |  -.2727109   .0533917    -5.11   0.000    -.3773567    -.168065
        _cons |   .7641333   .5343343     1.43   0.153    -.2831427    1.811409
-------------------------------------------------------------------------------
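The pseudo-R² reported in the probit output can be reproduced from the iteration log. The sketch below assumes the reported figure is McFadden's measure, 1 minus the ratio of the fitted log pseudolikelihood to that of a constant-only model, and that the iteration 0 value is the constant-only log pseudolikelihood. Both are assumptions about the output, not statements made in the handout.

```python
# Log pseudolikelihoods copied from the probit iteration log in this handout.
ll_constant_only = -337.68849  # iteration 0 (assumed to be the constant-only model)
ll_fitted = -305.25271         # final iteration

# McFadden-style pseudo-R2 (assumed definition): 1 - lnL_fitted / lnL_constant_only.
pseudo_r2 = 1 - ll_fitted / ll_constant_only

print(round(pseudo_r2, 4))
```

Rounded to four decimals this matches the 0.0961 shown in the output, which is consistent with the assumed definition. The measure is 0 when the regressors do not improve on the constant-only fit and moves toward 1 as the fitted log pseudolikelihood approaches 0.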
• Calculate changes in the predicted probabilities.
Q5: What is the predicted probability of having an affair for a religious female of age 60
with 2 years of marriage, no children, 0 years of education, who is very happy with her
marriage (rate_marriage = 5)?
P̂ (Yi = 1) =
Q6: What if we increase her education by 10 years?
P̂ (Yi = 1) =
Q7: What’s the change in predicted probabilities of having an affair?
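For the probit questions above, the calculation has two steps: compute the z-score from the coefficients, then pass it through the standard normal CDF. The sketch below follows those steps with the probit coefficients from the output. As before, `religious = 1` is an assumed value because the handout does not give the coding, and the helper `standard_normal_cdf` is an illustrative name built on Python's `math.erf`.

```python
import math

# Probit coefficients copied from the regression output in this handout.
coef = {
    "_cons": 0.7641333,
    "male": 0.1888131,
    "age": -0.0243995,
    "ymarriage": 0.0546086,
    "children": 0.2080876,
    "religious": -0.1860863,
    "yeducation": 0.015506,
    "rate_marriage": -0.2727109,
}


def z_score(x):
    """Probit index z = b0 + sum_k b_k * x_k."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


def standard_normal_cdf(z):
    """Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Profile from Q5. ASSUMPTION: religious = 1 (the numeric coding is not given in the handout).
base = {"male": 0, "age": 60, "ymarriage": 2, "children": 0,
        "religious": 1, "yeducation": 0, "rate_marriage": 5}
more_educ = dict(base, yeducation=10)  # Q6: ten more years of education

p_base = standard_normal_cdf(z_score(base))
p_more = standard_normal_cdf(z_score(more_educ))
change = p_more - p_base  # Q7: depends on the starting z-score, unlike the LPM

print(round(z_score(base), 4), round(p_base, 4), round(p_more, 4), round(change, 4))
```

Under the stated assumption the z-score is about -2.14, so both predicted probabilities are small but lie strictly between 0 and 1. Unlike the LPM, the change from ten more years of education is not 10 times a coefficient: it depends on where the profile sits on the normal CDF.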
4 Logit model
• The Logit model makes a different assumption about the CDF:
\[
P(Y_i = 1 \mid X_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}} \equiv F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), \tag{4}
\]
where F(·) is the CDF of the logistic distribution.
• Interpretation of the coefficients: β1 is the effect of a unit change in X1 on the input of
the logistic function, holding all the other regressors constant. This is not very interpretable
on its own, so use the after-before method: compare the predicted probabilities after and
before the change.
• Measure of fit: pseudo-R²
. logit affair male age ymarriage children religious yeducation rate_marriage, r
Iteration 0:   log pseudolikelihood = -337.68849
Iteration 1:   log pseudolikelihood =  -305.9881
Iteration 2:   log pseudolikelihood = -304.85286
Iteration 3:   log pseudolikelihood = -304.84845
Iteration 4:   log pseudolikelihood = -304.84845

Logistic regression                             Number of obs     =        601
                                                Wald chi2(7)      =      53.00
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -304.84845               Pseudo R2         =     0.0972

-------------------------------------------------------------------------------
              |               Robust
       affair |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         male |    .316425   .2290355     1.38   0.167    -.1324763    .7653262
          age |   -.043737   .0190218    -2.30   0.021    -.0810191   -.0064548
    ymarriage |   .0952297   .0323235     2.95   0.003     .0318767    .1585827
     children |   .3791783    .293144     1.29   0.196    -.1953733      .95373
    religious |  -.3254421   .0924561    -3.52   0.000    -.5066527   -.1442314
   yeducation |   .0305853   .0451684     0.68   0.498    -.0579432    .1191137
rate_marriage |  -.4700082   .0915246    -5.14   0.000    -.6493931   -.2906234
        _cons |   1.336659   .9258695     1.44   0.149    -.4780117     3.15133
-------------------------------------------------------------------------------
• Calculate changes in the predicted probabilities.
Q8: What is the predicted probability of having an affair for a religious female of age 60
with 2 years of marriage, no children, 0 years of education, who is very happy with her
marriage (rate_marriage = 5)?
P̂ (Yi = 1) =
Q9: What if we increase her education by 10 years?
P̂ (Yi = 1) =
Q10: What’s the change in predicted probabilities of having an affair?
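The logit questions above follow the same after-before recipe, with the logistic CDF in place of the normal CDF. The sketch below uses the logit coefficients from the output. Again `religious = 1` is an assumed value (the handout does not give the coding), and `logistic_cdf` is an illustrative helper name.

```python
import math

# Logit coefficients copied from the regression output in this handout.
coef = {
    "_cons": 1.336659,
    "male": 0.316425,
    "age": -0.043737,
    "ymarriage": 0.0952297,
    "children": 0.3791783,
    "religious": -0.3254421,
    "yeducation": 0.0305853,
    "rate_marriage": -0.4700082,
}


def index(x):
    """Logit index b0 + sum_k b_k * x_k (the input to the logistic function)."""
    return coef["_cons"] + sum(coef[name] * value for name, value in x.items())


def logistic_cdf(z):
    """F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


# Profile from Q8. ASSUMPTION: religious = 1 (the numeric coding is not given in the handout).
base = {"male": 0, "age": 60, "ymarriage": 2, "children": 0,
        "religious": 1, "yeducation": 0, "rate_marriage": 5}
more_educ = dict(base, yeducation=10)  # Q9: ten more years of education

p_base = logistic_cdf(index(base))      # "before"
p_more = logistic_cdf(index(more_educ))  # "after"
change = p_more - p_base                 # Q10: after minus before

print(round(index(base), 4), round(p_base, 4), round(p_more, 4), round(change, 4))
```

Under the stated assumption the predicted probabilities are roughly 0.02 and 0.03, close to the probit values for the same profile even though the logit coefficients are larger in magnitude. This is why the coefficients of the two models are compared through predicted probabilities rather than directly.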