Economics 345 Applied Econometrics Problem Set 7—Solutions

Prof: Martin Farnham
Problem sets in this course are ungraded. An answer key will be posted on the course
website within a few days of the release of each problem set. As noted in class, it is
highly recommended that you make every effort to complete these problems before
viewing the answer key.
1) Book Problems 7.1-7.3
7.1
i) Yes, there is evidence that men sleep more than women. The point estimate suggests
that men sleep about 88 minutes a week more than women, on average, controlling for the other
factors included on the RHS. The t-statistic for the null that the coefficient on male equals zero
is 87.75/34.33 ≈ 2.56, which is greater than 2. This indicates clear rejection of the null
hypothesis at the 5% level.
Whether this is a big effect is debatable. It amounts to about 13 minutes a night more
sleep for men than for women. Personally, I would consider this a moderate effect.
ii) Yes, there is a statistically significant tradeoff between working and sleeping. Each
extra minute worked per week is associated with about 0.16 fewer minutes of sleep per week,
holding the other factors fixed. Judging from the t-statistic (for a null of zero) of
-0.163/0.018 ≈ -9.06, this is clearly statistically significant.
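The t-statistics in parts i) and ii) can be checked with a few lines of arithmetic. This is only a sketch: the function name is illustrative, the coefficients and standard errors are the ones quoted above, and 1.96 is the usual large-sample two-sided 5% critical value.

```python
def t_stat(coef: float, se: float) -> float:
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / se

CRIT_5PCT = 1.96  # large-sample two-sided 5% critical value

# Coefficient and standard error on male (minutes of sleep per week).
t_male = t_stat(87.75, 34.33)
# Coefficient and standard error on totwrk.
t_totwrk = t_stat(-0.163, 0.018)

# Convert the weekly male-female difference into minutes per night.
minutes_per_night = 87.75 / 7

print(f"t(male) = {t_male:.2f}, reject at 5%: {abs(t_male) > CRIT_5PCT}")
print(f"t(totwrk) = {t_totwrk:.2f}, reject at 5%: {abs(t_totwrk) > CRIT_5PCT}")
print(f"about {minutes_per_night:.1f} minutes per night")
```

Both statistics exceed 1.96 in absolute value, though the one on totwrk is far larger, which matches the wording used in the answers.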
iii) Formally, this null hypothesis is $H_0: \beta_{age} = 0,\ \beta_{age^2} = 0$. To test the null
hypothesis we need to estimate the restricted and unrestricted form of the regression, and
conduct an F-test. We’ve estimated the unrestricted form already, so to estimate the
restricted form we’d need to estimate
$sleep = \beta_0 + \beta_1 totwrk + \beta_2 educ + \beta_5 male + v$.
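For reference, the F-statistic for this test can be written in its sum-of-squared-residuals form. This is the standard textbook formula rather than anything specific to this problem. Here q = 2 is the number of restrictions, and k = 5 is inferred from the coefficient labels above (five regressors in the unrestricted model), so treat the degrees of freedom as an assumption to check against the textbook.

```latex
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}, \qquad q = 2,\quad k = 5
```

Under the null, this statistic follows an F distribution with (2, n - 6) degrees of freedom, and we reject when it exceeds the relevant critical value.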
7.2
Note that interpreting coefficients on dummies in log-linear specifications is a bit more
complicated than interpreting coefficients on continuous x variables in log-linear
specifications (by “log-linear” I mean a model where the y variable is in logs, but the x
variables are in levels). Your textbook glosses over this point. For small values, the
coefficient estimate on a dummy variable in a log-linear specification, if multiplied by
100, gives roughly the percent change in y associated with changing the dummy from 0 to
1. But if you consult your lecture notes on dummy variables, you’ll realize this
interpretation is only a rough one, and that for large coefficient values, this
interpretation will be far from the truth. Because the estimates here are fairly small,
we’ll stick to the rough interpretation given by your textbook in these answers. But
consider doing the calculation correctly (per the lecture notes) as an exercise.
i) The coefficient estimate on cigs suggests that an extra cigarette smoked (on average)
daily during pregnancy will reduce birthweight by 0.44%. This would suggest that an
extra 10 cigarettes per day will reduce birthweight by 4.4%.
ii) A white child is expected to weigh 5.5% more than a non-white child on average,
holding other factors fixed. Since the t-stat (for a zero null) is 0.055/0.013 ≈ 4.2, this is clearly a
statistically significant effect.
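The note at the start of 7.2 suggests redoing the dummy-variable interpretation exactly. A minimal sketch follows, assuming the standard formula 100·[exp(β) - 1] for the exact percent change in a model with log(y) on the left-hand side (this may be presented differently in the lecture notes). The function name is illustrative, and 0.055 is the coefficient on white implied by the 5.5% figure above.

```python
import math

def exact_pct_change(coef: float) -> float:
    """Exact percent change in y when a dummy goes from 0 to 1 and y is in logs."""
    return 100 * (math.exp(coef) - 1)

coef_white = 0.055  # coefficient implied by the 5.5% approximation above

approx_pct = 100 * coef_white            # rough interpretation
exact_pct = exact_pct_change(coef_white)  # exact interpretation

print(f"approximate: {approx_pct:.2f}%, exact: {exact_pct:.2f}%")
```

For a coefficient this small the two numbers differ by less than a fifth of a percentage point, which is why the rough interpretation is acceptable here.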
iii) According to the estimates in the second equation, more mother’s education is
associated with lower birth weight (holding other factors fixed). According to the point estimate, an
extra year of mother’s education lowers birthweight by 0.3%. However, this estimate is
not statistically significant. The t-stat (for a null of zero) is only about 1 in absolute value.
iv) The sample sizes are different for the two equations. To do an F-test, one could use
the R-squared formula to compare equation 1 (the restricted equation) to equation 2 (the
unrestricted equation). But one must use the same sample of data in estimating each
regression. We can’t use the R-squareds provided here because they are calculated
using different samples. The first sample size is larger, presumably because data are
missing for motheduc and fatheduc (this causes the number of observations to shrink in
the second equation).
The F-test would be easy to conduct if this problem were corrected.
7.3
i) There is pretty strong evidence that $hsize^2$ should be included. With a t-stat greater
than 4, it’s clear that this term is statistically significant.
To find the optimal high school size, you want to find the size that maximizes SAT score.
Take the derivative with respect to hsize and set it equal to zero.
$\partial sat / \partial hsize = 19.30 - 2(2.19)\,hsize = 0$. This implies that $4.38\,hsize = 19.30$, or $hsize \approx 4.4$.
Recalling that hsize is expressed in units of 100s of students, this suggests that a high
school class of approximately 440 maximizes SAT performance.
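The first-order condition above can be checked numerically. This is a small sketch using the two coefficients from the derivative; the variable names are illustrative.

```python
# Coefficients on hsize and hsize^2, as used in the derivative above.
b_hsize = 19.30
b_hsize_sq = -2.19

# First-order condition: b_hsize + 2 * b_hsize_sq * hsize = 0.
hsize_star = -b_hsize / (2 * b_hsize_sq)

# A negative coefficient on the squared term means the turning point is a maximum.
is_maximum = b_hsize_sq < 0

print(f"hsize* = {hsize_star:.2f} (in hundreds of students), maximum: {is_maximum}")
```

Because the coefficient on the squared term is negative, the fitted relationship is an inverted U, so the turning point is a maximum rather than a minimum.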
ii) According to these estimates, nonblack females get a score of 45 points less than
nonblack males, on average. This estimated difference is very statistically significant, as
one can see by comparing the coefficient estimate and its standard error.
To see this most clearly, it helps to write out one equation for nonblack females (female=1,
black=0) and one equation for nonblack males (female=0, black=0). The difference
between the two (holding hsize constant) is the coefficient on female, -45.09.
iii) Black males receive a score that is about 170 points less than nonblack males, ceteris
paribus. This result is also highly statistically significant. To state the setup of the
hypothesis test formally:
$H_0: \beta_{black} = 0$
$H_1: \beta_{black} \neq 0$
The t-statistic is -169.81/12.71 = -13.36. This is statistically significant at any
conventional significance level.
iv) Controlling for other conditions, black females score about 107.5 points lower than
non-black females.
To see this, write out the equation for black females:
$sat = 1028 + 19.3\,hsize - 2.19\,hsize^2 - 45.09(1) - 169.81(1) + 62.31(1)$
Then write out the equation for nonblack females:
$sat = 1028 + 19.3\,hsize - 2.19\,hsize^2 - 45.09(1)$
The difference between these is -169.81 + 62.31 = -107.5.
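The group comparisons in parts ii) to iv) can be reproduced by plugging zeros and ones into the fitted equation. The sketch below uses the coefficients quoted above. The function name and the choice of hsize are illustrative, and hsize cancels out of every difference.

```python
def predicted_sat(hsize: float, female: int, black: int) -> float:
    """Fitted SAT score from the coefficients quoted above."""
    return (1028
            + 19.3 * hsize
            - 2.19 * hsize ** 2
            - 45.09 * female
            - 169.81 * black
            + 62.31 * female * black)

hsize = 4.4  # arbitrary; it drops out of the differences below

# ii) nonblack females vs nonblack males
diff_nonblack_females = predicted_sat(hsize, 1, 0) - predicted_sat(hsize, 0, 0)
# iii) black males vs nonblack males
diff_black_males = predicted_sat(hsize, 0, 1) - predicted_sat(hsize, 0, 0)
# iv) black females vs nonblack females
diff_black_females = predicted_sat(hsize, 1, 1) - predicted_sat(hsize, 1, 0)

print(f"{diff_nonblack_females:.2f}, {diff_black_males:.2f}, {diff_black_females:.2f}")
```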
To test whether the difference is statistically significant, we would just need to do a t-test
of the null that $\beta_4 + \beta_5 = 0$. This can be done in EViews. To calculate the
t-statistic by hand you’d need to calculate $se(\hat\beta_4 + \hat\beta_5)$. p149 of your text talks
about how to do this (though ultimately you’d probably still need EViews to do it this
way). One more way you could do it (also requiring EViews) is to define a parameter
$\theta = \beta_4 + \beta_5$, and rewrite the equation in terms of this parameter:
$sat = \beta_0 + \beta_1 hsize + \beta_2 hsize^2 + \beta_3 female + (\theta - \beta_5)\,black + \beta_5\,black \cdot female + u$
This can be rearranged to
$sat = \beta_0 + \beta_1 hsize + \beta_2 hsize^2 + \beta_3 female + \theta\,black + \beta_5 (black \cdot female - black) + u$
and estimated. The estimate of $\theta$ should equal -107.5, and its standard error
is $se(\hat\beta_4 + \hat\beta_5)$. Thus the t-statistic can be read directly off the regression
output for the reformulated model.
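For the by-hand route, the standard error of a sum of two estimates follows from the usual variance rule. The expression below is the standard formula, not something taken from this problem's output. It shows why the reported standard errors alone are not enough: the covariance term is not printed in a basic regression table.

```latex
se(\hat\beta_4 + \hat\beta_5)
  = \sqrt{\widehat{Var}(\hat\beta_4) + \widehat{Var}(\hat\beta_5)
          + 2\,\widehat{Cov}(\hat\beta_4, \hat\beta_5)},
\qquad
t = \frac{\hat\beta_4 + \hat\beta_5}{se(\hat\beta_4 + \hat\beta_5)}
```

The reparameterized regression avoids this calculation because the software reports the standard error of the estimate of θ directly.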
2) Book Problems 7.5-7.6
7.5.
i) Plugging in PC = 1 - noPC and rearranging yields
$ColGPA = (\beta_0 + \delta_0) - \delta_0\,noPC + \beta_1 hsGPA + \beta_2 ACT + u$
The fitted version of this is
$\widehat{ColGPA} = (1.26 + 0.157) - 0.157\,noPC + 0.447\,hsGPA + 0.008\,ACT$.
So the intercept should now be 1.417, and the coefficient on noPC is -0.157.
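One way to convince yourself that the two specifications are the same model is to check that they give identical fitted values. The sketch below uses the coefficients quoted above. The function names are illustrative, and the hsGPA and ACT values are arbitrary illustrations, not data.

```python
def colgpa_with_pc(pc: int, hs_gpa: float, act: float) -> float:
    """Fitted college GPA using the original PC dummy."""
    return 1.26 + 0.157 * pc + 0.447 * hs_gpa + 0.008 * act

def colgpa_with_nopc(no_pc: int, hs_gpa: float, act: float) -> float:
    """Fitted college GPA after substituting PC = 1 - noPC."""
    return (1.26 + 0.157) - 0.157 * no_pc + 0.447 * hs_gpa + 0.008 * act

# Compare the two versions for a PC owner and a non-owner.
gaps = []
for pc in (0, 1):
    original = colgpa_with_pc(pc, hs_gpa=3.0, act=24)
    rewritten = colgpa_with_nopc(1 - pc, hs_gpa=3.0, act=24)
    gaps.append(abs(original - rewritten))

print(max(gaps) < 1e-12)
```

Identical fitted values imply identical residuals, which is why the R-squared does not change.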
ii) Nothing should happen to the R-squared. It should remain the same, because we’re
controlling for exactly the same variables as before. We’ve just replaced PC with a linear
transformation of itself.
iii) No, they should not both be included, because each is an exact linear function of the
other (noPC = 1 - PC), so together with the intercept they are perfectly collinear. This
violates the assumption of no perfect collinearity. Only one should be included.
To include both would be to fall into the “dummy variable trap.”
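The dummy variable trap can be seen directly in a small numerical example. The sketch below uses five hypothetical PC values (made up for illustration, not data from the problem). It shows that PC + noPC reproduces the constant column, so the matrix X'X that OLS needs to invert has a determinant of zero.

```python
# Hypothetical PC values for five students (illustrative only).
pc = [1, 0, 1, 1, 0]
no_pc = [1 - v for v in pc]
const = [1] * len(pc)

# PC + noPC reproduces the constant column exactly.
columns_sum_to_const = [p + q for p, q in zip(pc, no_pc)] == const

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Build X'X for the regressors (constant, PC, noPC).
cols = [const, pc, no_pc]
xtx = [[dot(u, v) for v in cols] for u in cols]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

det_xtx = det3(xtx)
print(columns_sum_to_const, det_xtx)
```

A zero determinant means X'X cannot be inverted, so the OLS coefficients are not uniquely defined when both dummies and an intercept are included.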
7.6
I discussed this in class when discussing the use of dummy variables for program
evaluation. If we think less able people are more likely to receive training, then train and
ability are negatively correlated. If we think ability leads to higher wages, then the
omission of a control for ability should cause the coefficient on train to be negatively
biased.
The best way to fix this would be to find a variable that controls for ability and include it
in the regression.
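The direction of the bias can be seen from the usual omitted-variable formula. The expression below is the simple version with one included and one omitted variable, so it ignores the other controls and should be read as a sketch. The notation is assumed for illustration: the tilde marks the estimate from the regression that omits ability, and the delta term is the slope from regressing ability on train.

```latex
E(\tilde\beta_{train}) = \beta_{train} + \beta_{abil}\,\tilde\delta_1,
\qquad
\beta_{abil} > 0,\ \tilde\delta_1 < 0
\ \Rightarrow\ \text{bias} = \beta_{abil}\,\tilde\delta_1 < 0
```

A positive effect of ability on wages combined with a negative correlation between ability and training gives a negative product, so the estimated effect of training is too small on average.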
3) Book Problem 7.8
i) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1 (\#joints) + \beta_2 educ + \beta_3 exper + \beta_4 male + u$
ii) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1 (\#joints) + \beta_2 educ + \beta_3 exper + \beta_4 male + \beta_5 male \cdot (\#joints) + u$
As above, note the caveats about interpreting coefficients on dummies where you have a
log-linear specification.
100*beta1 gives the approximate percent change in the wage from smoking an extra joint for
women. 100*(beta1+beta5) gives the approximate percent change in the wage from smoking an
extra joint for men. So the difference in the effect is given by beta5. To test whether there
are differences in the effects of marijuana use for men and women, we could do a t-test
on beta5, the coefficient on male*(#joints). If we reject the null that this is zero, it would
be indicative of different effects of drug use for men and women.
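Written out, the effect of an extra joint in the interaction model is the derivative of the log wage with respect to usage. This is a restatement of the argument above in symbols, using the same coefficient labels as the model in part ii).

```latex
\frac{\partial \ln(wage)}{\partial(\#joints)} = \beta_1 + \beta_5\,male
\quad\Rightarrow\quad
\begin{cases}
\beta_1 & \text{for women } (male = 0) \\
\beta_1 + \beta_5 & \text{for men } (male = 1)
\end{cases}
```

The two cases differ only by β5, which is why a single t-test on the interaction coefficient answers the question.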
iii) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1 light + \beta_2 moderate + \beta_3 high + \beta_4 educ + \beta_5 exper + \beta_6 male + u$
Note that the reference group here is non-users. I don’t include a dummy for them,
because that would be perfectly collinear with the other drug use dummies.
iv) To test the null that marijuana use has no effect on the wage, an F-test would be
appropriate. We could test for the joint significance of beta1, beta2, and beta3.
We could take the R-squared from the above model, and also estimate a model omitting
light, moderate, and high, and use the R-squared from that (restricted) model to construct
the F-stat. Numerator degrees of freedom would be q=3. Denominator degrees of
freedom would be n-k-1, where k in this case is 6. At the 5% significance level, the
critical F-value given a large sample would be about 2.6. If the F-stat we obtained was
greater than this, we could reject the null that marijuana use has no effect on the wage.
v) There are a couple of potential problems. First, people may not honestly answer
questions about drug use if they are worried that someone else (e.g. their employer or
police) might hear about their answer. This can cause bias under certain circumstances.
Second, drug use may be correlated with other attributes that affect the wage. Nothing
against drug users, but they probably have a tendency to be less motivated and
hardworking (if you prefer, more “easy going”) than non-users. If we don’t control for
these other attributes that may be correlated with drug use, then we will obtain biased
estimates of the effect of drug use on the wage.
Another possibility is that low-wage workers may have less to lose from a drug
conviction. This could mean that low-wage workers choose to smoke more marijuana
than high wage workers (simply due to different cost-benefit calculus). In such a case,
the wages could actually be causing the drug use, rather than vice versa, as we’ve
assumed. This would also lead to a biased measure of the causal effect of drug use on
the wage.
4) Book Problem C7.2 (Computer Problem)
i) Holding other factors fixed, blacks earn a monthly salary that is about 18.8% less than
that of nonblacks. The t-statistic on the coefficient on black is about -5, so this difference is clearly
statistically significant.
ii) Here we need to do an F-test. We can use the R-squared formula to construct the
F-statistic:
$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} = \frac{(0.2550 - 0.2526)/2}{(1 - 0.2550)/925} \approx 1.49$
The critical F-value for the 10% significance level with df=(2,925) is about 2.30. Our
F-statistic of 1.49 is clearly less than that, so we cannot reject the null at the 10% level.
To test whether we can reject the null at the 20% significance level, we should
probably use EViews. The way to directly restrict the coefficients of interest is
described in Lab 5, ii.g. This will produce a p-value for the F-test, which will tell you the
lowest significance level at which you can reject the null. You should obtain a p-value of
0.226, which says that the 22.6% significance level is the lowest level at which you can
reject the null. So we cannot reject the null at the 20% level either.
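The F-statistic and its p-value can also be approximated without EViews. The sketch below is not the EViews procedure from the lab. It uses the R-squared form of the F-statistic together with a closed-form upper-tail probability that holds only when the numerator degrees of freedom equal 2, as they do here. The function names are illustrative, and because the R-squared values are rounded, the result may differ slightly from the software's.

```python
def f_stat_r2(r2_ur: float, r2_r: float, q: int, df_denom: int) -> float:
    """R-squared form of the F-statistic for q exclusion restrictions."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)

def f_pvalue_two_restrictions(f: float, df_denom: int) -> float:
    """Upper-tail probability of an F(2, df_denom) random variable.

    This closed form is valid only for 2 numerator degrees of freedom.
    """
    return (1 + 2 * f / df_denom) ** (-df_denom / 2)

# R-squared values and degrees of freedom as given above.
f = f_stat_r2(r2_ur=0.2550, r2_r=0.2526, q=2, df_denom=925)
p = f_pvalue_two_restrictions(f, df_denom=925)

print(f"F = {f:.2f}, p-value = {p:.3f}")
```

The p-value is above both 0.10 and 0.20, which agrees with the conclusions drawn from the critical value and from the EViews output.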
iii) To do this, we should estimate the model
$\ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + \beta_4 married + \beta_5 black + \beta_6 south + \beta_7 urban + \beta_8 black \cdot educ + u$
As above, note the caveats about interpreting coefficients on dummies where you have a
log-linear specification.
We obtain a coefficient of -0.0226, which suggests that an extra year of education has a
2.3 percentage point lower return for blacks than for non-blacks. To clarify: the return
to an extra year of education for a nonblack worker is a 6.7% increase in the monthly
salary. The return to an extra year of education for a black worker is a
(6.7% + (-2.3%)) = 4.4% increase in the monthly salary.
To test whether the difference between the return to education for blacks and
nonblacks is statistically significant, we need to test the null hypothesis
$H_0: \beta_8 = 0$ against the alternative that it is not equal to zero. A simple t-test will suffice.
The t-statistic is -1.12. Its absolute value is below 1.96, the critical t-value for a
2-sided alternative at the 5% level, so we cannot reject the null at standard significance
levels.
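The arithmetic in part iii) is easy to check. The sketch below uses the estimates quoted above. The variable names are illustrative, and 1.96 is the large-sample two-sided 5% critical value.

```python
# Estimates quoted above (log points per year of education).
return_nonblack = 0.067   # coefficient on educ
interaction = -0.0226     # coefficient on black*educ
t_interaction = -1.12     # t-statistic on black*educ

# Return to education for black workers: educ coefficient plus interaction.
return_black = return_nonblack + interaction

# Two-sided test: compare |t| with the critical value.
CRIT_5PCT = 1.96
reject_at_5pct = abs(t_interaction) > CRIT_5PCT

print(f"return for black workers: {100 * return_black:.1f}%")
print(f"reject H0 at 5%: {reject_at_5pct}")
```

The comparison uses the absolute value of the t-statistic because the alternative is two-sided.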
iv) Now we want to estimate
$\ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + \beta_4 married + \beta_5 black + \beta_6 black \cdot married + \beta_7 south + \beta_8 urban + u$
If you plug in zeros and ones where they belong you can find an expression for the
log(wage) for married blacks and married nonblacks. For married blacks this is
$\ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + \beta_4 + \beta_5 + \beta_6 + \beta_7 south + \beta_8 urban + u$   (1)
For married nonblacks, we get
$\ln(wage) = \beta_0 + \beta_1 educ + \dots + \beta_4 + \beta_7 south + \beta_8 urban + u$   (2)
Thus, the wage difference between married black and married nonblacks, holding
everything else constant, is beta5+beta6.
To estimate this model in EViews you need to first generate variables for the interaction
terms. Then estimate the model. Then obtain the coefficient estimates of beta5 and beta6.
These are -0.24 and 0.06 respectively. Thus the difference in log wage between married
blacks and married nonblacks is about -0.18. This translates to roughly 18% lower wages for married
blacks (relative to married nonblacks).
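Using the rounded estimates quoted above (-0.24 and 0.06), the sum and the corresponding exact percentage difference can be computed as follows. This is a sketch: the variable names are illustrative, and the exact conversion assumes the standard 100·[exp(x) - 1] formula for a difference in logs.

```python
import math

b_black = -0.24          # coefficient on black, as quoted above
b_black_married = 0.06   # coefficient on black*married, as quoted above

# Difference in log wage between married blacks and married nonblacks.
log_diff = b_black + b_black_married

# Rough and exact percentage interpretations of a log difference.
approx_pct = 100 * log_diff
exact_pct = 100 * (math.exp(log_diff) - 1)

print(f"log difference = {log_diff:.2f}")
print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The gap between the rough and exact figures is larger here than for small coefficients, which illustrates the caveat about interpreting dummies in log-linear specifications.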