Econ 388    R. Butler    2014 revisions
Lecture 14: Dummy Dependent Variables

I. Linear Probability Model: the regression model with a dummy variable as the dependent variable

Assumptions and their implications:

Regular multiple regression setup:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$, with $E(y_i|X) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$.

Linear probability model:
$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i$; since
$E(y_i|X) = 1 \cdot \Pr(y_i=1|X) + 0 \cdot \Pr(y_i=0|X) = \Pr(y_i=1|X)$,
we have $\Pr(y_i=1|X) = E(y_i|X) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$.

Assumption: $E(u_i|X) = 0$. Still true here:
$E(u_i|X) = \Pr(y_i=1)\,[1-(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i})] + \Pr(y_i=0)\,[0-(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i})] = 0$.

Assumption: $V(u_i)$ is constant, $V(u_i) = \sigma^2$. Cannot be true here:
$V(u_i) = \Pr(y_i=1)\,[1-(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i})]^2 + \Pr(y_i=0)\,[-(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i})]^2$
$\quad = [1-(\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i})]\,[\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i}] = [1-\Pr(y=1)] \times \Pr(y=1)$.
Solution: weighted least squares.

Assumption: linearity is a reasonable approximation. Cannot be true here: the "unboundedness" problem (Wooldridge, section 7.5). Solution: a nonlinear probability equation.

Problems with the linear probability model:
1. The linearity assumption (mapping all those values of the Xs into the (0,1) interval): see the picture, and why we use logistic regression while the big girls and boys use probits and logits.
2. Heteroskedasticity: use the weighted least squares procedure for the linear probability model, below, to handle this problem (another option is to use the "robust" procedure in Stata).

HETEROSKEDASTICITY ADJUSTMENT FOR LINEAR PROBABILITY MODELS ONLY:
1. Run OLS and get the predicted value of y; call it "predicted", or $P_i$.
2. If $P_i \geq 1.0$, set $P_i = .999$; if $P_i \leq 0$, set $P_i = .001$ (to keep the probability within the bounds $0 < P_i < 1$).
3. Compute $1/[P_i(1-P_i)]$ for each observation and place the output in a column to be used as a "weight". In Stata and SAS you literally compute the predicted value of the dependent variable, and use a column of $1/[P_i(1-P_i)]$ values as weights.
4. Run the weighted least squares regression.

The Stata code for doing weighted least squares of the linear probability model is:

# delimit ;
infile gpa tuce_scr psi a_grade using "e:\classrm_data\aldr_lpm.txt", clear;
summarize;
regress a_grade gpa tuce_scr psi;
predict YHAT;
replace YHAT=.999 if YHAT>=1;
replace YHAT=.001 if YHAT<=0;
gen WT = 1 / (YHAT*(1-YHAT));
list a_grade YHAT gpa tuce_scr psi;
regress a_grade gpa tuce_scr psi [w=WT];

The SAS code for the same problem is:

data one;
infile "e:\classrm_data\aldr_lpm.txt" delimiter='09'x dsd truncover;
* the option "delimiter='09'x dsd truncover" is for tab-delimited files;
input gpa tuce_scr psi a_grade;
run;
proc means; run;
proc reg;
model a_grade=gpa tuce_scr psi;
output out=two p=yhat;
run;
data two;
set two;
if YHAT>=1 then YHAT=.999;
if YHAT<=0 then YHAT=.001;
WT = 1 / (YHAT*(1-YHAT));
run;
proc print;
var a_grade YHAT gpa tuce_scr psi;
run;
proc reg;
model a_grade=gpa tuce_scr psi;
weight WT;
run;

[fastfood.do: A restaurant regional sales manager wants to find out what determines the likelihood that each restaurant in a fast-food chain reached its quota of $6,500 in fast-food sales. The restaurants are located in four different cities, and traffic flow on the street where the restaurant is located varies by location.]

OR, you can just use the "robust" option to correct for heteroskedasticity. It may not be as efficient as weighted least squares if you have modeled the heteroskedasticity correctly, but it is robust to alternative forms of heteroskedasticity.
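Why the weight in step 3 above is $1/[P_i(1-P_i)]$: a minimal sketch of the standard weighted least squares argument (added here for clarity; it is implicit in the procedure above). Writing $h_i = P_i(1-P_i)$ for the error variance, divide the regression equation through by $\sqrt{h_i}$:

$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{1i}}{\sqrt{h_i}} + \beta_2 \frac{x_{2i}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}, \qquad V\!\left(\frac{u_i}{\sqrt{h_i}} \,\Big|\, X\right) = \frac{h_i}{h_i} = 1.$$

The transformed error is homoskedastic, so OLS on the transformed variables is efficient. Weighting each observation by $1/h_i$ (the [w=WT] option in the Stata code, the weight statement in the SAS code) produces exactly the same coefficient estimates as this transformation.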
***Stata robust standard error option***

regress a_grade gpa tuce_scr psi, robust;

***SAS robust standard error option***

proc genmod;
class id;
model a_grade=gpa tuce_scr psi;
repeated subject=id;
run;

II. The General Set-up for Binary Choice Models

The outcome is zero or one, conditional on x (the observed characteristics). Hence binary choice models are Bernoulli processes (a one/zero outcome, with the probability fixed given x); the only difference from the usual Bernoulli processes you have studied (like flipping a coin) is that we are conditioning on x. Let $p(y=1|x)$ be the probability of a "one" outcome given x. Then we have the following:

$p(y=0|x) = 1 - p(y=1|x)$;
$E(y|x) = 1 \cdot p(y=1|x) + 0 \cdot (1 - p(y=1|x)) = p(y=1|x)$; and
$V(y|x) = p(y=1|x)\,(1 - p(y=1|x))$.

There are different functional-form choices for the $p(y=1|x)$ function; in particular, the following three are the most popular. Write

$p(y=1|x) = G(x\beta)$,

where x is the 1 x k vector of explanatory variables, the first element of which is one (the intercept), and G(.) is some appropriate function:

Linear probability model (LPM): $G(x\beta) = x\beta$

Logit: $G(x\beta) = \dfrac{\exp(x\beta)}{1+\exp(x\beta)} = \dfrac{1}{1+\exp(-x\beta)}$

Probit: $G(x\beta) = \int_{-\infty}^{x\beta} \phi(\nu)\, d\nu$, where $\phi(\nu)$ is the standard normal density function.

For all of these functions, the marginal effect is given by

$\dfrac{\partial p(y=1|x)}{\partial x_j} = \dfrac{\partial G(x\beta)}{\partial x_j} = \dfrac{\partial G(z)}{\partial z}\,\beta_j$, where $z = x\beta$,

and where $\partial G(z)/\partial z = 1$ for the linear probability model (LPM); $\partial G(z)/\partial z = G(z)(1-G(z)) = \Pr(y=1)\,(1-\Pr(y=1))$ for the logit model; and $\partial G(z)/\partial z = \phi(z)$ for the probit model (Leibniz rule for differentiating the probit integral).

To get the marginal effects for a probit in Stata, use the dprobit procedure: "dprobit a_grade gpa tuce_scr psi;". To get the marginal effects for a logit in Stata, add the "mfx compute" command after the logit procedure as follows: "logit a_grade gpa tuce_scr psi; mfx compute;". More particular information follows.

III. Logistic Regression Model

Whereas the probability of a success (getting an A in the first example, or meeting your sales quota in the second example above) for the linear probability model is

$\Pr(y=1) = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i}$,

in the logistic regression model it is

$\Pr(y=1) = \dfrac{\exp(\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2)}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2)}$,

which complicates things in two ways:

a. The estimation is "non-linear," based on searching for the best estimates rather than getting the estimates directly from a simple set of calculations (as we do in OLS). The estimation technique is known as maximum likelihood estimation, and it has good properties for moderately large and large samples (not only the tests, but the estimators themselves are nice in large samples).

b. The interpretation of the coefficients is somewhat different than for OLS estimates. In particular, to find the impact of increasing $x_i$ by one unit on $\Pr(y=1)$, we need to multiply the estimated coefficient $\hat\beta_i$ by $\Pr(y=1)(1-\Pr(y=1))$, as follows:

marginal effect $= \dfrac{\text{change in } \Pr(y=1)}{\text{change in } x_i} = \hat\beta_i \times \Pr(y=1) \times [1 - \Pr(y=1)]$

Do the logistic regressions for the samples above, and compare the resulting coefficients.

Stata: aldr_logit.do

# delimit ;
infile gpa tuce_scr psi a_grade using "e:\classrm_data\aldr_lpm.txt", clear;
summarize;
logit a_grade gpa tuce_scr psi;

SAS:

proc logistic descending;
model y=x1 x2;
run;
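Before turning to probits, here is how the logit marginal-effect formula above can be reproduced by hand. This is a minimal sketch, assuming the aldr_lpm.txt data are already loaded and the default Stata delimiter is in effect; PHAT and ME_gpa are illustrative variable names, not part of the original do-file:

logit a_grade gpa tuce_scr psi
predict PHAT
* PHAT is the predicted prob(y=1) for each observation; the observation-level
* marginal effect of gpa is then the coefficient times PHAT*(1-PHAT):
gen ME_gpa = _b[gpa]*PHAT*(1-PHAT)
* averaging over observations approximates what "margins, dydx(gpa)" reports below:
summarize ME_gpa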
Probit analysis is another way to model dichotomous choices (i.e., the probability of a success). It is also nonlinear and based on slightly different distributional assumptions (namely, the cumulative normal distribution). We will discuss these models further in the next lecture.

To get the marginal effects in Stata for probits and logits, use the margins command as indicated:

probit a_grade gpa tuce_scr psi
margins, dydx(gpa tuce_scr psi)

(Note that the dprobit option in Stata gives you the marginal effects at the means, which are not quite as accurate for most BYU research purposes as the ones given by the margins command above, which computes the marginal effect for every observation and then averages.)

To get the marginal effects for a logit in Stata, add the "margins, dydx(.)" command again after the logit procedure as follows:

logit a_grade gpa tuce_scr psi
margins, dydx(gpa tuce_scr psi)

To get the marginal effects for logits in SAS, use the following:

proc qlim data=one;
model a_grade=gpa tuce_scr psi / discrete(d=logistic); /* d=probit for probits */
output out=outqlim marginal;
run;
proc means data=outqlim;
var meff_p2_gpa meff_p2_tuce_scr meff_p2_psi;
run;

TIME TO PLAY: DO YOU WANT A WHOLE HERSHEY BAR?

1. An estimated age-coefficient value of .05 in a linear probability model of the probability of being married (with a zero-one dependent variable) indicates:
a. that 95 percent of the sample is not married
b. that for each additional year of age, the probability of marriage increases by 5 percent *
c. that for each additional year of age, the probability of marriage increases by less than 5 percent
d. none of the above

2. An estimated age-coefficient value of .05 in a binomial logit (or binary logit, logistic regression, or just logit) indicates:
a. that 95 percent of the sample is not married
b. that for each additional year of age, the probability of marriage increases by 5 percent
c. that for each additional year of age, the probability of marriage increases by less than 5 percent *
d. none of the above

3. The linearity or boundedness problem with the linear probability model is that:
a. the errors exhibit heteroskedasticity
b. the error is not normally distributed
c. the $R^2$ is not an accurate measure of goodness of fit
d. a regression line with any slope will tend to rise above 1, and fall below 0, for some values of the independent variables *

(* marks the correct answer.)

IV. Coefficients vary in these models: the a_grade example

Running aldr_lpm_probit.do (along with the prior results) yields:

                            Constant    Gpa              Tuce           Psi            log-likelihood
linear probability model    -1.498      .4639 (4.206)    .0105 (.670)   .379 (.482)    -12.978
probit model                -7.452      1.626 (3.409)    .0517 (.765)   1.426 (.513)   -12.819
logit model                 -13.021     2.826 (3.4252)   .0951 (.812)   2.279 (.493)   -12.890

V. Testing multiple hypotheses: the likelihood ratio statistic has a Chi-square distribution

Another question from the a_grade example: are pre-course standings predictive? We test whether coeff(gpa)=0 and coeff(tuce)=0 simultaneously, by estimating the model with and without gpa and tuce:

                                                           probit           logit
log-likelihood with gpa/tuce                               -12.819          -12.890
log-likelihood without gpa/tuce                            -17.671          -17.671
log-likelihood ratio statistic (eq. 17.1 in Wooldridge)    2*4.852=9.704    2*4.781=9.562

In this example, with two variable coefficients set equal to zero, the log-likelihood ratio statistic is distributed as a Chi-square variate with 2 degrees of freedom under the null hypothesis that these variables are unimportant (and therefore can be left out of the equation). Is the null hypothesis supported?
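The likelihood-ratio test can also be computed directly in Stata. A minimal sketch, assuming the same a_grade data are in memory ("full" and "restricted" are illustrative model names, not part of the original do-files):

probit a_grade gpa tuce_scr psi
estimates store full
* the restricted model drops gpa and tuce_scr, imposing the null hypothesis:
probit a_grade psi
estimates store restricted
* lrtest reports 2*(difference in log-likelihoods) = 2*(17.671 - 12.819) = 9.704:
lrtest full restricted

Since 9.704 exceeds 5.99, the 5 percent critical value of a Chi-square with 2 degrees of freedom, the null hypothesis is rejected: the pre-course standings (gpa and tuce) are jointly significant.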