John Coglianese
OH Monday 10:30am - 12:30pm, Littauer M-42

Section 4: Binary dependent variables

1 Definition: Binary Dependent Variables

$Y_i$ takes on only two values, 0 and 1. $Y_i$ is called a binary dependent variable.

Example: We observe whether survey respondents engaged in an extramarital affair or not:

$$Y_i = \begin{cases} 1, & \text{if affair} \\ 0, & \text{if no affair} \end{cases} \tag{1}$$

Since our standard regression function is fitting a conditional expectation, the population regression function is the probability that $Y_i = 1$, conditional on the regressors:

$$E[Y_i | X_i] = 1 \cdot P(Y_i = 1 | X_i) + 0 \cdot P(Y_i = 0 | X_i) = P(Y_i = 1 | X_i). \tag{2}$$

Note that $0 \le P(Y_i = 1 | X_i) \le 1$ because this is a probability.

2 Linear probability model

• The LPM is just the standard OLS framework for regressing $Y_i$ on the $X_i$'s: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + u_i$.
• How do we interpret the coefficients? A unit increase in $X_1$ is associated with a $100 \cdot \beta_1$ percentage point increase in the probability that $Y$ is equal to one.
• Example: We want to regress affairs on gender, age, years of marriage, children (yes or no), religious views, education, and the spouses' happiness with their marriage.

. reg affair male age ymarriage children religious yeducation rate_marriage, r

Linear regression                               Number of obs     =        601
                                                F( 7, 593)        =       9.61
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1062
                                                Root MSE          =     .41189

-------------------------------------------------------------------------------
              |                Robust
       affair |      Coef.  Std. Err.        t   P>|t|   [95% Conf.   Interval]
--------------+----------------------------------------------------------------
         male |   .0517161   .0389454     1.33   0.185    -.0247716    .1282038
          age |   -.007315   .0031355    -2.33   0.020     -.013473   -.0011571
    ymarriage |   .0160817   .0056303     2.86   0.004     .0050239    .0271394
     children |   .0501878    .044996     1.12   0.265     -.038183    .1385586
    religious |  -.0539744   .0153308    -3.52   0.000    -.0840837   -.0238651
   yeducation |   .0048674   .0078468     0.62   0.535    -.0105436    .0202783
rate_marriage |  -.0877282   .0170375    -5.15   0.000    -.1211894   -.0542669
        _cons |   .7296495   .1617954     4.51   0.000     .4118877    1.047411
-------------------------------------------------------------------------------

Q1: What is the predicted probability of having an affair for a religious female of age 60 with 2 years of marriage, no children, 0 years of education, and very happy about her marriage (rate_marriage = 5)?

$\hat{P}(Y_i = 1) =$

Q2: What if we increase her education by 10 years?

$\hat{P}(Y_i = 1) =$

Q3: What's the change in predicted probabilities of having an affair?

Q4: What are some possible problems with the linear probability model?

To avoid predicted probabilities that are negative or greater than 1, we can use a probit or logit model instead of the LPM.

3 Probit model

• The Probit model fits a nonlinear regression by assuming
$$P(Y_i = 1 | X_i) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), \tag{3}$$
where $\Phi(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution. $z = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$ is the z-score.
• Interpretation of the coefficients: $\beta_1$ is the effect on the z-score of a unit change in $X_1$, holding all the other regressors constant.
• Measure of fit: pseudo-$R^2$.
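Because a probit coefficient moves the z-score rather than the probability directly, the same change in z can produce very different changes in the predicted probability depending on where it starts. The following is a minimal sketch of that point, using only the Python standard library. The function name `probit_prob`, the step size, and the starting z-scores are hypothetical and chosen purely for illustration; they do not come from the regression output.

```python
import math


def probit_prob(z):
    """Standard normal CDF evaluated at the z-score z, i.e. Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical z-scores and step size, chosen only for illustration.
step = 0.5
for z in (-2.0, 0.0, 2.0):
    change = probit_prob(z + step) - probit_prob(z)
    print(f"z = {z:+.1f}: P = {probit_prob(z):.4f}, "
          f"change after +{step} in z = {change:.4f}")
```

The change in probability is largest when z starts near 0 and shrinks in the tails, which is why probit coefficients are usually interpreted by comparing predicted probabilities rather than by reading the coefficients alone.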
. probit affair male age ymarriage children religious yeducation rate_marriage, r

Iteration 0:   log pseudolikelihood = -337.68849
Iteration 1:   log pseudolikelihood = -305.48705
Iteration 2:   log pseudolikelihood = -305.25272
Iteration 3:   log pseudolikelihood = -305.25271

Probit regression                               Number of obs     =        601
                                                Wald chi2(7)      =      54.93
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -305.25271               Pseudo R2         =     0.0961

-------------------------------------------------------------------------------
              |                Robust
       affair |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
--------------+----------------------------------------------------------------
         male |   .1888131   .1319274     1.43   0.152      -.06976    .4473861
          age |  -.0243995   .0111239    -2.19   0.028    -.0462019   -.0025971
    ymarriage |   .0546086    .018964     2.88   0.004     .0174399    .0917773
     children |   .2080876   .1662414     1.25   0.211    -.1177397    .5339148
    religious |  -.1860863   .0532404    -3.50   0.000    -.2904355   -.0817371
   yeducation |    .015506   .0263552     0.59   0.556    -.0361493    .0671612
rate_marriage |  -.2727109   .0533917    -5.11   0.000    -.3773567    -.168065
        _cons |   .7641333   .5343343     1.43   0.153    -.2831427    1.811409
-------------------------------------------------------------------------------

• Calculate changes in the predicted probabilities.

Q5: What is the predicted probability of having an affair for a religious female of age 60 with 2 years of marriage, no children, 0 years of education, and very happy about her marriage (rate_marriage = 5)?

$\hat{P}(Y_i = 1) =$

Q6: What if we increase her education by 10 years?

$\hat{P}(Y_i = 1) =$

Q7: What's the change in predicted probabilities of having an affair?

4 Logit model

• The Logit model makes a different assumption about the CDF:
$$P(Y_i = 1 | X_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki})}} \equiv F(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}), \tag{4}$$
where $F(\cdot)$ is the CDF of the logistic distribution.
• Interpretation of the coefficients: $\beta_1$ is the effect on the input of the logistic function, $\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$, of a unit change in $X_1$, holding all the other regressors constant. This is not very interpretable on its own, so use the after-before method for interpretation: compare the predicted probabilities after and before the change.
• Measure of fit: pseudo-$R^2$.

. logit affair male age ymarriage children religious yeducation rate_marriage, r

Iteration 0:   log pseudolikelihood = -337.68849
Iteration 1:   log pseudolikelihood =  -305.9881
Iteration 2:   log pseudolikelihood = -304.85286
Iteration 3:   log pseudolikelihood = -304.84845
Iteration 4:   log pseudolikelihood = -304.84845

Logistic regression                             Number of obs     =        601
                                                Wald chi2(7)      =      53.00
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -304.84845               Pseudo R2         =     0.0972

-------------------------------------------------------------------------------
              |                Robust
       affair |      Coef.  Std. Err.        z   P>|z|   [95% Conf.   Interval]
--------------+----------------------------------------------------------------
         male |    .316425   .2290355     1.38   0.167    -.1324763    .7653262
          age |   -.043737   .0190218    -2.30   0.021    -.0810191   -.0064548
    ymarriage |   .0952297   .0323235     2.95   0.003     .0318767    .1585827
     children |   .3791783    .293144     1.29   0.196    -.1953733      .95373
    religious |  -.3254421   .0924561    -3.52   0.000    -.5066527   -.1442314
   yeducation |   .0305853   .0451684     0.68   0.498    -.0579432    .1191137
rate_marriage |  -.4700082   .0915246    -5.14   0.000    -.6493931   -.2906234
        _cons |   1.336659   .9258695     1.44   0.149    -.4780117     3.15133
-------------------------------------------------------------------------------

• Calculate changes in the predicted probabilities.

Q8: What is the predicted probability of having an affair for a religious female of age 60 with 2 years of marriage, no children, 0 years of education, and very happy about her marriage (rate_marriage = 5)?

$\hat{P}(Y_i = 1) =$

Q9: What if we increase her education by 10 years?

$\hat{P}(Y_i = 1) =$

Q10: What's the change in predicted probabilities of having an affair?
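The after-before calculations asked for in Q1 to Q10 follow the same recipe in all three models: compute the linear index from the estimated coefficients, pass it through the model's function (identity for the LPM, the standard normal CDF for probit, the logistic CDF for logit), and difference the two predictions. The sketch below is one possible way to set this up in Python, not the handout's own solution. The coefficients are copied from the three Stata tables. The value `religious = 5` is an assumption, since the questions only say "religious" without giving the numeric coding; all function and variable names are hypothetical.

```python
import math

# Coefficients copied from the reg, probit, and logit tables.
NAMES = ["_cons", "male", "age", "ymarriage", "children",
         "religious", "yeducation", "rate_marriage"]
COEFS = {
    "LPM": [0.7296495, 0.0517161, -0.007315, 0.0160817, 0.0501878,
            -0.0539744, 0.0048674, -0.0877282],
    "probit": [0.7641333, 0.1888131, -0.0243995, 0.0546086, 0.2080876,
               -0.1860863, 0.015506, -0.2727109],
    "logit": [1.336659, 0.316425, -0.043737, 0.0952297, 0.3791783,
              -0.3254421, 0.0305853, -0.4700082],
}


def normal_cdf(z):
    """Standard normal CDF, used by the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic CDF, used by the logit model."""
    return 1.0 / (1.0 + math.exp(-z))


# The LPM uses the linear index itself as the predicted probability.
LINKS = {"LPM": lambda z: z, "probit": normal_cdf, "logit": logistic_cdf}


def predict(model, x):
    """Predicted P(Y = 1) for regressor values x under the given model."""
    values = dict(x, _cons=1.0)  # the constant multiplies 1
    index = sum(b * values[name] for name, b in zip(NAMES, COEFS[model]))
    return LINKS[model](index)


# ASSUMPTION: religious = 5 is a guess at the coding of "religious".
before = dict(male=0, age=60, ymarriage=2, children=0,
              religious=5, yeducation=0, rate_marriage=5)
after = dict(before, yeducation=10)  # 10 more years of education

results = {}
for model in COEFS:
    p0, p1 = predict(model, before), predict(model, after)
    results[model] = (p0, p1, p1 - p0)
    print(f"{model:6s} before = {p0:.4f}  after = {p1:.4f}  "
          f"change = {p1 - p0:.4f}")
```

For this person the LPM prediction comes out negative for any non-negative value of religious, which is one of the problems Q4 points to, while the probit and logit predictions stay strictly between 0 and 1. The LPM change equals 10 times the yeducation coefficient regardless of the other regressors; the probit and logit changes depend on the starting values.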