Econ 388 R. Butler 2014 revisions Lecture 14
Dummy Dependent Variables
I. Linear Probability Model: the regression model with a dummy variable as the dependent
variable
For each assumption of the regular multiple regression model, the implication for the linear
probability model is as follows.

Setup:
- regular multiple regression: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \mu_i$, so that
  $E(y_i \mid X) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$.
- linear probability model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \mu_i$; since
  $E(y_i \mid X) = 1 \cdot \Pr(y_i = 1 \mid X) + 0 \cdot \Pr(y_i = 0 \mid X) = \Pr(y_i = 1 \mid X)$,
  we have $\Pr(y_i = 1 \mid X) = E(y_i \mid X) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$.

$E(\mu_i \mid X) = 0$:
- regular multiple regression: assumed true.
- linear probability model: still true here:
  $E(\mu_i \mid X) = \Pr(y_i = 1 \mid X)[1 - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})] + \Pr(y_i = 0 \mid X)[0 - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})] = 0$.

$V(\mu_i)$ is constant:
- regular multiple regression: assumed true, $V(\mu_i) = \sigma^2$.
- linear probability model: cannot be true:
  $V(\mu_i) = \Pr(y_i = 1)[1 - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})]^2 + \Pr(y_i = 0)[-(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})]^2$
  $= [1 - (\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})][\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}] = [1 - \Pr(y = 1)][\Pr(y = 1)]$.
  Solution: weighted least squares.

Linearity:
- regular multiple regression: assumed to be a reasonable approximation.
- linear probability model: cannot be true: the "unboundedness" problem (Wooldridge
  section 7.5). Solution: a nonlinear probability equation.
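The non-constant error variance is easy to see in a small simulation; here is a hypothetical
sketch (the data, seed, and variable names below are made up for illustration):

# delimit ;
* simulate a true linear probability model and look at the squared residuals;
clear;
set obs 1000;
set seed 12345;
gen x1 = runiform();
* true prob(y=1|x1), kept strictly inside (0,1);
gen p = .2 + .6*x1;
* Bernoulli outcome with success probability p;
gen y = runiform() < p;
regress y x1;
predict UHAT, residuals;
gen USQ = UHAT^2;
* squared residuals should track p*(1-p): largest where p is near .5;
twoway scatter USQ x1;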
Problems with the linear probability model:
1. Linearity assumption (mapping all those values of the Xs into the (0,1) interval): picture, and
why we use logistic regression, and why the big girls and boys use probits and logits.
2. Heteroskedasticity: a weighted least squares procedure for the linear probability model
handles this problem (another option is to use the "robust" option in Stata).
HETEROSKEDASTICITY ADJUSTMENT FOR LINEAR PROBABILITY MODELS
ONLY:
1. Run OLS and get the predicted value of y; call it "predicted", or $\hat{P}_i$.
2. Check: if $\hat{P}_i > 1.0$, then set $\hat{P}_i = .999$; if $\hat{P}_i < 0$, then set $\hat{P}_i = .001$ (to keep the
probability within the bounds $0 < P < 1$).
3. Compute $\frac{1}{\hat{P}_i(1-\hat{P}_i)}$ for each observation and place the output in a column to be used as a
"weight". In Stata and SAS you literally compute the predicted value of the dependent
variable and use a column of $\frac{1}{\hat{P}_i(1-\hat{P}_i)}$ values as weights.
4. Run weighted least squares regressions.
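Why this particular weight? A short sketch of the standard WLS logic (consistent with
Wooldridge's treatment of heteroskedasticity):
\[
\operatorname{Var}(\mu_i \mid X) = P_i(1 - P_i) \equiv h_i ,
\]
so dividing every term of the regression by $\sqrt{h_i}$ yields an error with constant variance.
Equivalently, WLS chooses the $\beta$'s to minimize
\[
\sum_i \frac{(y_i - \beta_0 - \beta_1 x_{1i} - \beta_2 x_{2i})^2}{\hat{P}_i(1 - \hat{P}_i)} ,
\]
which is exactly what the weight column $1/[\hat{P}_i(1-\hat{P}_i)]$ accomplishes in steps 3 and 4.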
The Stata code for doing weighted least squares of the linear probability model is

# delimit ;
infile gpa tuce_scr psi a_grade using "e:\classrm_data\aldr_lpm.txt", clear;
summarize;
* step 1: OLS and predicted probabilities;
regress a_grade gpa tuce_scr psi;
predict YHAT;
* step 2: keep the predicted probabilities within (0,1);
replace YHAT=.999 if YHAT>=1;
replace YHAT=.001 if YHAT<=0;
* step 3: construct the weight;
gen WT = 1 / (YHAT*(1-YHAT));
list a_grade YHAT gpa tuce_scr psi;
* step 4: weighted least squares;
regress a_grade gpa tuce_scr psi [w=WT];
SAS code for the same problem:

data one;
infile "e:\classrm_data\aldr_lpm.txt" delimiter='09'x dsd truncover;
* the option "delimiter='09'x dsd truncover" is for tab-delimited files;
input gpa tuce_scr psi a_grade ;
run;
proc means; run;
* steps 1-2: OLS, then predicted probabilities kept within (0,1);
proc reg; model a_grade=gpa tuce_scr psi;
output out=two p=yhat; run;
data two; set two;
if YHAT>=1 then YHAT=.999 ;
if YHAT<=0 then YHAT=.001;
* step 3: construct the weight;
WT = 1 / (YHAT*(1-YHAT));
run;
proc print; var a_grade YHAT gpa tuce_scr psi; run;
* step 4: weighted least squares;
proc reg; model a_grade=gpa tuce_scr psi ;
weight WT; run;
[[fastfood.do: A restaurant chain's regional sales manager wants to find out what determines the
likelihood that each fast-food restaurant reached its quota of $6,500 in fast-food sales. The
restaurants are located in four different cities, and traffic flow on the street where the
restaurant is located varies by location.]]
OR, you can just use the "robust" option to correct for heteroskedasticity. It may not be as
efficient as weighted least squares when the heteroskedasticity is modeled correctly, but it is
robust to alternative forms of heteroskedasticity.
***Stata robust standard error option***
regress a_grade gpa tuce_scr psi, robust;
***SAS robust standard error option***
proc genmod; class id; model a_grade=gpa tuce_scr psi;
repeated subject=id; run;
* id must uniquely identify each observation to get White-type robust standard errors;
III. The General Set-up for Binary Choice Models
The outcome is zero or one, conditional on x (the observed characteristics). Hence binary
choice models are Bernoulli processes (one/zero outcomes, with probability fixed given x);
the only difference from the usual Bernoulli processes you have studied (like flipping a coin)
is that we are conditioning on x. Let $P(y = 1 \mid x)$ = probability of a "one" outcome given x;
then we have the following: $P(y = 0 \mid x) = 1 - P(y = 1 \mid x)$;
$E(y \mid x) = 1 \cdot P(y = 1 \mid x) + 0 \cdot (1 - P(y = 1 \mid x)) = P(y = 1 \mid x)$; and
$\operatorname{Var}(y \mid x) = P(y = 1 \mid x)(1 - P(y = 1 \mid x))$. There are different
functional form choices for the $P(y = 1 \mid x)$ function; in particular, the following three
are most popular:
๐‘ƒ(๐‘ฆ = 1|๐‘ฅ) = ๐บ(๐‘ฅ ๏ข )
Where x is the 1 x k vector of explaining variables, the first element of which is one (the
intercept), and G(.) is some appropriate function. For the linear probability model,
Linear probability model (LPM): ๐บ(๐‘ฅ๐›ฝ) = ๐‘ฅ๐›ฝ
Logit: ๐บ(๐‘ฅ๐›ฝ) =
exp(๐‘ฅ๐›ฝ)
1+exp(๐‘ฅ๐›ฝ)
=
1
1+exp(โˆ’๐‘ฅ๐›ฝ)
x๏ข
Probit: G(x ๏ข )= ๏ƒฒ ๏ฆ (๏ฎ )d๏ฎ where ๏ฆ (๏ฎ ) is the standard normal density function.
๏€ญ๏‚ฅ
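A quick way to see how the three G(.) functions differ is to plot them over a grid of index
values; a minimal Stata sketch (the variable names are made up for illustration):

# delimit ;
* evaluate the three G() functions over a grid of index values z = xb;
clear;
set obs 81;
gen z = -4 + 0.1*(_n - 1);
* LPM is unbounded: it escapes the (0,1) interval;
gen G_lpm = z;
* logit and probit are both bounded between 0 and 1;
gen G_logit = exp(z)/(1 + exp(z));
gen G_probit = normal(z);
line G_lpm G_logit G_probit z, yline(0 1);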
For all functions, the marginal effect is given by
\[
\frac{\partial p(y \mid x)}{\partial x_j} = \frac{\partial G(x\beta)}{\partial x_j} = \frac{\partial G(z)}{\partial z}\,\beta_j
\]
where $\frac{\partial G(z)}{\partial z} = 1$ for the linear probability model (LPM),
$\frac{\partial G(z)}{\partial z} = G(z)(1 - G(z)) = \text{prob}(y=1)(1 - \text{prob}(y=1))$ for the logit model, and
$\frac{\partial G(z)}{\partial z} = \phi(z)$ for the probit model (Leibniz rule for differentiation). To get the
marginal effect for probit in Stata use the dprobit procedure: "dprobit a_grade gpa tuce_scr psi;"
To get the marginal effect for logit in Stata add the "mfx compute" command after the logit
procedure as follows: "logit a_grade gpa tuce_scr psi; mfx compute;" More particular
information follows:
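As a check on the logit formula above, the marginal effect can be computed by hand and
compared to Stata's built-in command; a hedged sketch, assuming the aldr_lpm data are
already loaded as in the code above:

# delimit ;
logit a_grade gpa tuce_scr psi;
predict PHAT, pr;
* by hand: beta_gpa * prob(y=1) * (1 - prob(y=1)), averaged over observations;
gen ME_GPA = _b[gpa]*PHAT*(1-PHAT);
summarize ME_GPA;
* built-in average marginal effect: should match the mean of ME_GPA;
margins, dydx(gpa);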
IV. Logistic Regression Model
Whereas the probability of a success (getting an A in the first example, or meeting your
sales quota in the second example above) for the linear probability model is
\[
\text{Prob}(y = 1) = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} ,
\]
in the logistic regression model it is
\[
\text{Prob}(y = 1) = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2)}{1 + \exp(\hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2)}
\]
which complicates things in two ways:
a. the estimation is "non-linear," based on searching for the best estimates rather
than getting the estimates directly from a simple set of calculations (as we do in OLS). The
estimation technique is known as maximum likelihood estimation, and it has good
properties for moderately large and large samples (not only the tests, but also the estimators
behave nicely in large samples).
b. the interpretation of the coefficients is somewhat different from that of OLS estimates.
In particular, to find the impact of increasing $x_i$ by one unit on prob(y=1), we need to
multiply $\hat\beta_i$, the estimated coefficient, by prob(y=1)*(1 - prob(y=1)) as follows:
\[
\text{marginal effect} = \frac{\Delta\,\text{prob}(y = 1)}{\Delta x_i} = \hat\beta_i \cdot \text{prob}(y = 1) \cdot [1 - \text{prob}(y = 1)]
\]
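A worked numeric illustration (using the logit estimates from the coefficient table in Section
V below, for a hypothetical student with gpa = 3, tuce = 20, and psi = 1):
\[
x\hat\beta = -13.021 + 2.826(3) + .0951(20) + 2.279(1) \approx -0.362,
\qquad
\text{prob}(y = 1) = \frac{1}{1 + \exp(0.362)} \approx .41,
\]
so the marginal effect of gpa for this student is roughly $2.826 \times .41 \times (1 - .41) \approx .68$.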
Do the logistic regressions for the samples above, and compare the resulting coefficients.
STATA: aldr_logit.do
# delimit ;
infile gpa tuce_scr psi a_grade using "e:\classrm_data\aldr_lpm.txt", clear;
summarize;
logit a_grade gpa tuce_scr psi;
SAS:
proc logistic descending; model y=x1 x2; run;
Probit analysis is another way to model dichotomous choices (i.e., the probability of a
success). It is also nonlinear and based on slightly different distributional assumptions
(namely, the cumulative normal distribution assumption). We will discuss these models
further in the next lecture.
To get the marginal effects in Stata for probits and logits use the margins command as
indicated:
probit a_grade gpa tuce_scr psi
margins, dydx(gpa tuce_scr psi)
((Note that the dprobit option in Stata gives you the marginal effects at the means, which are
not quite as accurate for most BYU research purposes as the ones given by the margins
command above (marginal effects computed for every observation, and then averaged).))
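To see the difference directly, a minimal sketch (again assuming the aldr_lpm data from
earlier are loaded):

# delimit ;
probit a_grade gpa tuce_scr psi;
* marginal effects at the means (what dprobit reports);
margins, dydx(gpa tuce_scr psi) atmeans;
* average marginal effects (computed for each observation, then averaged);
margins, dydx(gpa tuce_scr psi);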
To get the marginal effects for logit in Stata, add the "margins, dydx(.)" command again after
the logit procedure as follows:
logit a_grade gpa tuce_scr psi
margins, dydx(gpa tuce_scr psi)
To get the marginal effects for logits in SAS, use the following:
proc qlim data=one;
model a_grade=gpa tuce_scr psi/ discrete(d=logistic); /* d=probit for probits*/
output out=outqlim marginal;
run;
proc means data=outqlim;
var meff_p2_gpa meff_p2_tuce_scr meff_p2_psi; run;
[[TIME TO PLAY: DO YOU WANT A WHOLE HERSHEY BAR?
1. An estimated age-coefficient value of โ€œ.05โ€ in a linear probability model of the
probability of being married (with a zero-one dependent variable) indicates:
a. that 95 percent of the sample is not married
b. that for each additional year of age, the probability of marriage increases by 5
percent *
c. that for each additional year of age, the probability of marriage increases by
less than 5 percent
d. none of the above
2. An estimated age-coefficient value of โ€œ.05โ€ in a binomial logit (or binary logit,
logistic regression, or just logit) indicates:
a. that 95 percent of the sample is not married
b. that for each additional year of age, the probability of marriage increases by 5
percent
c. that for each additional year of age, the probability of marriage increases by
less than 5 percent *
d. none of the above
3. The linearity or boundedness problem with the linear probability model is that:
a. the errors exhibit heteroskedasticity
b. the error is not normally distributed
c. the ๐‘… 2 is not an accurate measure of goodness of fit
d. a regression line with any slope will tend to rise above 1, and fall below 0 for
some values of the independent variables *
]]
V. Coefficients vary in these models: the A_grade example
aldr_lpm_probit.do (along with prior results) yields:
                    linear probability model   probit model      logit model
Constant            -1.498                     -7.452            -13.021
Gpa                 .4639 (4.206)              1.626 (3.409)     2.826 (3.4252)
Tuce                .0105 (.670)               .0517 (.765)      .0951 (.812)
Psi                 .379 (.482)                1.426 (.513)      2.279 (.493)
log-likelihood      -12.978                    -12.819           -12.890

The coefficients differ in scale because each G(.) function scales the index differently; as a
rough rule of thumb, logit coefficients run about 1.6 times the probit coefficients, which
approximately holds here.
VI. Testing multiple hypotheses: the likelihood ratio statistic has a chi-square distribution
Another example from the A_grade data: are pre-course standings predictive? We test
whether coeff(gpa) = 0 and coeff(tuce) = 0 simultaneously.
Tests: with and without (gpa and tuce)
                                                         probit          logit
log-likelihood with gpa/tuce                             -12.819         -12.890
log-likelihood without gpa/tuce                          -17.671         -17.671
log-likelihood ratio statistic (17.1 in Wooldridge)      2*4.852=9.704   2*4.781=9.562
In this example with two variable coefficients set equal to zero, the log-likelihood ratio
statistic is distributed as a Chi-square variate with 2 degrees of freedom under the null
hypothesis that these variables are unimportant (and therefore, can be left out of the
equation). Is the null hypothesis supported?
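One quick way to check in Stata (a small sketch; chi2tail() returns the upper-tail probability
of a chi-square distribution):

# delimit ;
* 5% critical value for a chi-square with 2 degrees of freedom (about 5.99);
display invchi2tail(2, .05);
* p-values for the probit and logit likelihood ratio statistics;
display chi2tail(2, 9.704);
display chi2tail(2, 9.562);

Both statistics exceed the 5% critical value, so the null hypothesis is rejected: gpa and tuce
belong in the equation.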